                                        R. Swindell
Internet Draft                          Wind River Systems
Document:                               September 1999
Category: Informational                 Expires March 2000


                  Plain Text/Source Code File Header


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts. Internet-Drafts are draft documents valid for a
   maximum of six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   Anyone who has dealt at length with plain (ASCII [1]) text and
   source code files can testify that the lack of a global definition
   of the effect of the horizontal-tab character all too often causes
   ill-formed display and printed output of plain text files that
   utilize the horizontal-tab character for formatting.

   This document defines a common header for plain text and source
   code (PT/SC) files, whose primary purpose is to specify the
   tab-dependent formatting parameters to be used when displaying,
   printing, or editing such files. The defined header also addresses
   such issues as whether to use the line-feed character or the
   carriage-return/line-feed character sequence to terminate lines in
   the file.

   Widespread adoption and support of the header defined in this
   document could substantially improve the interoperability of text
   and source code files distributed across the Internet and other
   media.

Swindell                    [Page 1]                Expires March 2000

Internet Draft    Plain Text/Source Code File Header    September 1999

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119 [2].

   All numbers in this document are represented in decimal (base 10)
   format unless otherwise noted.

3. Introduction

   The tab key is typically used in plain (ASCII) text editors as a
   fast and convenient way to align text on predetermined column
   boundaries (tab-stops). When the tab key is pressed while editing
   or creating a file, the editor will usually place a horizontal-tab
   (ASCII 9) character at the current insert position and move the
   cursor forward to the next tab-stop position (i.e. indentation).

   The offset (tab-size) of each subsequent tab-stop is usually
   configurable in the editor, although the default value of this
   option differs from one editor to the next (typically in the range
   of two to eight character positions). Additionally, some editors
   allow asymmetric tab-stops (variable tab-size) where, for example,
   the first tab-stop may be at column five and the second at column
   eight.

3.1 The Problem

   The problem occurs when the file is printed, viewed, or loaded into
   a different editor, or perhaps loaded into the same editor, but
   with a different tab-size or tab-stop configuration. The resulting
   file image may or may not resemble the original formatting of the
   file. This is especially critical for the legibility of program and
   script source code (C/C++, Java, HTML, etc.) files.

   The problem is especially apparent in a multi-author environment,
   where inevitably, different authors will have their editors and
   other text processing applications configured with different
   tab-dependent formatting parameters, resulting in what is
   affectionately referred to as "tab-hell".

   Example:

   Bob creates the following text file with his editor configured with
   eight space (symmetric) tab-stops:

   +-------+-------+-------+---------------+---------------+
   | User  | Meat  | Dairy | Favorite Food | Favorite Band |
   +-------+-------+-------+---------------+---------------+
   | Bob   | No    | Yes   | Salad         | Meat Loaf     |
   | Sally | Yes   | No    | Burrito       | Cream         |
   | Mike  | Yes   | No    | Pasta         | Vanilla Fudge |
   +-------+-------+-------+---------------+---------------+

   Julie would like to add herself to the table, so she loads the file
   into her editor, which happens to be configured for two space
   tab-stops (her preference). This is what Julie is presented with:

   +-------+-------+-------+---------------+---------------+
   | User | Meat | Dairy | Favorite Food | Favorite Band |
   +-------+-------+-------+---------------+---------------+
   | Bob | No | Yes | Salad | Meat Loaf |
   | Sally | Yes | No | Burrito | Cream |
   | Mike | Yes | No | Pasta | Vanilla Fudge |
   +-------+-------+-------+---------------+---------------+

   Confused, but determined, Julie adds herself to the table and
   prints the file for the caterer and DJ of the upcoming company
   party:

   +-------+-------+-------+---------------+---------------+
   | User | Meat | Dairy | Favorite Food | Favorite Band |
   +-------+-------+-------+---------------+---------------+
   | Bob | No | Yes | Salad | Meat Loaf |
   | Sally | Yes | No | Burrito | Cream |
   | Mike | Yes | No | Pasta | Vanilla Fudge |
   | Julie | Yes | Yes | Milk | Roast Beef |
   +-------+-------+-------+---------------+---------------+

   Needless to say, the party is a disaster: Bob's a vegetarian, Sally
   and Mike are lactose intolerant, and the DJ brings only Modern
   Dance music.

   This is obviously an extreme hypothetical example. Typically,
   tab-related formatting problems are more an aesthetic than a
   logistical problem, but you get the idea.
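   The effect described above can be reproduced with a few lines of
   code. The following Python sketch is illustrative only: the
   function name expand_tabs and the sample rows are assumptions made
   for this example and are not part of the proposal. It expands
   horizontal-tab characters to symmetric tab-stops and prints the
   same two rows at tab-sizes of 8 and 2.

```python
def expand_tabs(line, tab_size):
    """Replace each horizontal-tab with spaces up to the next symmetric tab-stop."""
    out = []
    col = 0  # current column offset from the beginning of the line
    for ch in line:
        if ch == "\t":
            pad = tab_size - (col % tab_size)  # distance to the next tab-stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


# Hypothetical tab-separated rows, loosely modeled on Bob's table.
rows = ["User\tMeat\tDairy", "Sally\tYes\tNo"]

for size in (8, 2):
    print(f"tab-size {size}:")
    for row in rows:
        print(expand_tabs(row, size))
```

   With a tab-size of 8 the third column starts at offset 16 in both
   rows. With a tab-size of 2 it starts at offset 12 in the first row
   and at offset 10 in the second, so the columns no longer line up.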
   NOTE: This document must be viewed or printed using a
   non-proportional font for the above tables to appear as intended.

3.2 Existing Work-around

   Many existing text editors offer the option of writing the
   appropriate number of space (ASCII 32) characters to a file in
   place of horizontal-tab characters. While such an option can be a
   functional work-around for files created and subsequently edited
   with an editor configured in this manner, it does nothing to solve
   the problem of correctly displaying, printing, or editing files
   that utilize horizontal-tab characters.

   Subsequent editing of a file that uses spaces in place of
   horizontal-tab characters may still adversely affect the formatting
   of the file if the author utilizes the tab key for indentation and
   their editor is configured with different tab-stop parameters than
   the original editor configuration.

   Utilizing such an option also eliminates the possibility of
   convenient indentation/column resizing by simply adjusting the
   tab-stop configuration. Additionally, most editors allow quick
   cursor movement through white-space in a file that utilizes the
   horizontal-tab character (typically, one arrow-key press per
   tab-stop), while white-space in files that use spaces in place of
   horizontal-tabs must be navigated one character position at a time
   (e.g. eight arrow-key presses per tab-stop).

   An increasingly minor consideration is the fact that files
   (particularly source code files) that replace horizontal-tab
   characters with spaces can require as much as twenty percent more
   storage space than files that utilize horizontal-tab characters.

3.3 Proposed Solution

   While common computer users have migrated toward modern word
   processors and their elaborate document formats, this problem,
   although seemingly obscure, remains a thorn in the side of the
   minority of users who must still deal with plain text files.

   Ironically, the group of computer users most affected by this
   problem is programmers, the same people who are in a position to
   solve it by implementing a common solution.

   The solution proposed in this document is a plain text/source code
   (PT/SC) file header, whose primary purpose is to specify the
   tab-dependent formatting parameters to be used when printing,
   displaying, or editing such files. Other common text file
   formatting issues (such as whether to use the line-feed character
   or the carriage-return/line-feed character sequence to terminate
   lines) are also addressed in this header.

4. Application

   It is hoped that a significant percentage of the developers of
   plain text editors, specifically those designed for use in
   Integrated Development Environments (IDEs), will adopt this
   proposal. Adoption would include parsing any PT/SC headers (if
   present) in files opened in the editor and setting the formatting
   parameters accordingly, and, optionally, adding PT/SC headers (if
   not already present) to files written to disk.

   A supporting editor MUST update any PT/SC headers (if present) with
   the current formatting parameters when a file is written to disk.

   A supporting editor SHOULD allow the option of adding PT/SC headers
   to a file (if not already present) with all relevant formatting
   parameter values specified.

   Text editors that support multiple concurrently opened files MUST
   support a unique set of formatting parameters for each opened
   document.

   Many editors already support a unique set of parameters based on
   the extension (e.g. ".c", ".pas", ".txt") of the opened file. Such
   a feature would need to be extended to set the appropriate
   formatting parameters based on the values specified in any PT/SC
   headers present.
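   As a rough illustration of such an extension, the Python sketch
   below starts from per-extension defaults and overrides them with
   any PT/SC headers found in the file. It relies on the header format
   and variable names defined in sections 5 through 7 of this
   document. The names scan_headers, effective_params, and
   EXTENSION_DEFAULTS, as well as the default values, are hypothetical
   and chosen only for this example. Range checks and the rule against
   zero-padded decimal values are omitted for brevity.

```python
import os
import re

# "@format." must sit at the beginning of the file or follow a line-feed,
# space, or horizontal-tab; names and keywords are not case sensitive.
HEADER_RE = re.compile(
    r"(?:^|(?<=[ \t\n]))@format\."
    r"(tab-size|tab-stops|indent-size|line-length|new-line|use-tabs)"
    r"((?:[ \t]+(?:0x[0-9a-f]{1,2}|[0-9]{1,3}"
    r"|true|false|on|off|yes|no|(?:cr|lf)+)\b)+)",
    re.IGNORECASE,
)

# Hypothetical per-extension defaults, as an editor might already keep.
EXTENSION_DEFAULTS = {
    ".c": {"tab-size": ["8"]},
    ".txt": {"tab-size": ["4"]},
}


def scan_headers(text):
    """Return {variable: [values]} for the PT/SC headers found in text."""
    found = {}
    for match in HEADER_RE.finditer(text):
        name = match.group(1).lower()
        # Only the first occurrence of a variable is treated as valid.
        found.setdefault(name, match.group(2).lower().split())
    return found


def effective_params(filename, text):
    """Overlay PT/SC header values on the defaults for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    params = dict(EXTENSION_DEFAULTS.get(ext, {}))
    params.update(scan_headers(text))
    return params


source = "/* MyProgram.c, @format.tab-size 4 */\n// @format.new-line crlf\n"
print(effective_params("MyProgram.c", source))
```

   In this sketch a variable that is not specified by any header keeps
   its per-extension default, which matches the rule in section 6 that
   unspecified parameters are left in their default or
   user-configured state.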
   Two-way support is defined as linking the PT/SC header values and
   the corresponding configuration menu options (if applicable) such
   that a value changed in the file is reflected in the configuration
   menu and vice versa. Two-way support is RECOMMENDED, but not
   required.

   It is also desirable that developers of applications designed to
   view, print, or modify in any way plain text or source code files
   adopt this proposal. Such applications include (but are not limited
   to) version control systems, file comparison utilities, syntax
   verification utilities, source-level debuggers, and universal
   document viewing and printing utilities.

   Authors of plain text and source code files need not wait for PT/SC
   header support in text editors. The simplicity of the PT/SC header
   format allows authors to significantly help one another by at least
   "documenting" the original formatting parameters by hand-coding the
   PT/SC header so that other users and co-authors of such files need
   not "guess" at the correct formatting parameters. When applications
   supporting the PT/SC header become available, existing documents
   and source code files will immediately benefit from the automatic
   adjustment of formatting parameters based on the pre-existing PT/SC
   headers.

5. Header Format

   <bof|lf|wsp>@format.<variable><wsp><value(s)>

   Where <bof> is the beginning of the file (offset 0), <lf> is the
   line-feed (ASCII 10) character, <wsp> is one or more space (ASCII
   32) or horizontal-tab (ASCII 9) characters, <variable> is one of
   the supported format variable names (see section 6), and <value(s)>
   is one or more desired values (separated by white-space) in the
   appropriate format for the corresponding variable.

   The "@format." header token and the supported format variable names
   SHALL NOT be case sensitive (e.g. "@FoRmAt." is a valid header
   token).

   Decimal numeric values SHALL NOT be zero-padded (e.g. "08" is an
   invalid decimal value).

   Hexadecimal numeric values MAY be zero-padded (e.g. "0x08" is a
   valid hexadecimal value).

   Example:

      This is MyFile.txt, @format.tab-size 8

   PT/SC headers may appear anywhere in a file, though they SHOULD be
   located as close to the beginning of the file as possible.

   It is RECOMMENDED that no horizontal-tab characters precede the
   tab-dependent formatting variables in the header (if present). If
   horizontal-tab characters precede the definition of the
   tab-size/tab-stop variables (for example), horizontal-tab
   characters may not be expanded correctly if the file is processed
   in a single pass (e.g. first line read, processed, printed, next
   line read, processed, printed, etc.).

   Multiple formatting variables may be specified by including
   multiple headers. Multiple headers may be specified on a single
   line:

      This is MyFile.txt, @format.tab-size 8, @format.new-line crlf

   or multiple lines:

      This is MyFile.txt, @format.tab-size 8
      @format.new-line crlf

   Format variables SHOULD NOT be multiply defined. If a format
   variable is defined more than once in a file (e.g. this document),
   only the first occurrence SHALL be interpreted as valid.

5.1 Headers in Source Code Files

   PT/SC headers may be embedded in program or script source code
   files by including the header in the comment delimiters of the
   appropriate language for the file.

   Examples:

      /* MyProgram.c, @format.tab-size 4 */

      // MyProgram.cpp, @format.tab-size 3
      // @format.use-tabs true

      REM MyProgram.bas, @format.tab-size 8

      { MyProgram.pas, @format.tab-size 4 }

      ; MyProgram.asm, @format.tab-size 4

      # MyProgram.mak, @format.tab-size 4

      /**
       * MyProgram.java, JavaDoc comment
       * @author R. R. Swindell
       * @version 1.00
       * @format.tab-size 4
       * @format.new-line crlf
       */

   Since the comment delimiters are not part of the header format,
   PT/SC headers are not restricted to a specific set of programming
   or scripting languages and should remain compatible with any future
   languages provided they allow for free-form in-line comments.

   NOTE 1: PT/SC headers in source code files define formatting
   parameters for the display, editing, or printing of the source code
   itself and not the output of the resulting program or script.
   Although certain languages (e.g. HTML) could benefit from a
   standardized method of specifying formatting parameters for output
   (specifically tab-size/tab-stops), such a definition is beyond the
   scope of this document.

   NOTE 2: The PT/SC literal token, "@format.", may be present in
   source code files without accidental interpretation as a PT/SC
   header by nesting the character string in quotes or the appropriate
   string delimiters for the language (the header format requires this
   token be located at the beginning of the file or immediately
   following a line-feed, space, or horizontal-tab character).

6. Format Variables

   Supported PT/SC Format Variables:

      tab-size
      tab-stops
      indent-size
      line-length
      new-line
      use-tabs

   If any of the supported format variables are not specified by a
   PT/SC header in a file, the value of the corresponding format
   parameter (if supported by the application) SHALL be left in its
   default or user-configured state unless otherwise noted.

   NOTE: Format variable names SHALL NOT be case sensitive (e.g.
   "tab-size" and "TAB-Size" are both valid format variable names).

6.1 tab-size

   The <tab-size> variable was the initial inspiration for the PT/SC
   header and remains its primary purpose. The variable is used to
   specify the symmetric offset of each tab-stop in the file (in
   non-proportional character widths).

   Syntax:

      @format.tab-size <value>

   Where <value> is a positive decimal (base 10) number in the range
   of 1 to 60.

   Example:

      @format.tab-size 4

   Would result in tab-stops at offsets (from the beginning of each
   line) of 4, 8, 12, 16, 20, 24, etc.

   If asymmetric tab-stops are supported by the application and the
   <tab-stops> variable is defined, the application SHALL ignore any
   definition of the <tab-size> variable and use the values specified
   for the <tab-stops> variable instead.

6.2 tab-stops

   The <tab-stops> variable is to be used in files that utilize
   asymmetric tab-stops. The <tab-size> variable MAY still be
   specified as a backup for applications that do not support
   asymmetric tab-stops.

   Syntax:

      @format.tab-stops <value> <value>...

   Where each <value> is a positive decimal (base 10) number in the
   range of 1 to 255, increasing in order (e.g. 4 8 10). A minimum of
   two (2) values MUST be specified. A maximum of forty (40) values
   may be specified. All values MUST be separated by white-space.

   Any tab-stops on a line beyond the offset of the last specified
   tab-stop MUST be interpreted as symmetric tab-stops with the width
   determined by the difference of the last two (2) specified
   tab-stops.

   Example:

      @format.tab-stops 4 8 10

   Would result in tab-stops at offsets (from the beginning of each
   line) of 4, 8, 10, 12, 14, 16, 18, etc.

   If all tab-stops are symmetric, this variable MUST NOT be specified
   and the <tab-size> variable MUST be specified instead.

6.3 indent-size

   The <indent-size> variable is used in cases where the editor
   supports an indent-size configuration option and it has been
   configured with a value different from the configured tab-size. The
   variable is used to specify the symmetric offset of each
   indent-stop in the file (in non-proportional character widths).

   Indent-stops are very similar to symmetric tab-stops, except that
   they are used only in the editing of the file; they are not used in
   the display or printing of the file.
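   Because indent-stops are laid out the same way as symmetric
   tab-stops, it may help to make the offset rules of sections 6.1 and
   6.2 concrete. The Python sketch below is an illustration under
   stated assumptions, not a reference implementation: the function
   names next_tab_stop and first_stops are hypothetical, and
   validation of ranges and ordering is left out.

```python
def next_tab_stop(col, tab_size=None, tab_stops=None):
    """Return the offset of the first tab-stop strictly beyond col.

    If tab_stops is given it takes precedence over tab_size, mirroring
    the rule that a tab-stops definition overrides tab-size.
    """
    if tab_stops:
        for stop in tab_stops:
            if stop > col:
                return stop
        # Beyond the last listed stop: symmetric stops whose width is the
        # difference of the last two specified tab-stops.
        width = tab_stops[-1] - tab_stops[-2]
        last = tab_stops[-1]
        return last + ((col - last) // width + 1) * width
    return (col // tab_size + 1) * tab_size


def first_stops(count, **params):
    """List the first count tab-stop offsets on a line."""
    stops, col = [], 0
    for _ in range(count):
        col = next_tab_stop(col, **params)
        stops.append(col)
    return stops


print(first_stops(6, tab_size=4))            # offsets for "@format.tab-size 4"
print(first_stops(7, tab_stops=[4, 8, 10]))  # offsets for "@format.tab-stops 4 8 10"
```

   The two calls reproduce the offsets listed in the examples of
   sections 6.1 and 6.2: 4, 8, 12, 16, 20, 24 and 4, 8, 10, 12, 14,
   16, 18 respectively.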
   If the <indent-size> variable is specified and the editor is
   configured to use horizontal-tab characters, it MUST use a
   combination of horizontal-tab and space characters to indent the
   proper number of character positions when the tab key is pressed.

   If the editor supports an indent-size configuration option, but the
   <indent-size> variable has not been specified and the <tab-size>
   variable has, the indent-size option shall be set to the value
   specified for the <tab-size> variable.

   Syntax:

      @format.indent-size <value>

   Where <value> is a positive decimal (base 10) number in the range
   of 1 to 60.

   Example:

      @format.indent-size 4

6.4 line-length

   The <line-length> variable is used to specify the maximum allowable
   individual line length (excluding any end-of-line character
   sequences). It is used in cases where the editor has been
   configured to enforce a right-hand margin. In cases where the
   editor does not support a right margin, it may be specified by the
   author to notify the file's co-authors of the desired maximum line
   length.

   Syntax:

      @format.line-length <value>

   Where <value> is a positive decimal (base 10) number in the range
   of 1 to 255.

   Example:

      @format.line-length 79

6.5 new-line

   The <new-line> variable is used to specify the character sequence
   that signifies the end-of-line. This character sequence will be
   used to determine the end of each line when the file is read and to
   terminate individual lines when the file is printed or written to
   disk.

   Syntax:

      @format.new-line <value(s)>

   Where <value(s)> is one or more decimal (base 10) numbers in the
   range of 0 to 255 or hexadecimal (base 16) numbers (signified by a
   "0x" prefix) in the range of 0x00 to 0xff. A maximum of forty (40)
   values may be specified. If multiple numeric values are specified,
   they must be separated by white-space. The keywords "CR" and "LF"
   may also be used in place of a numeric value to signify the
   carriage-return (ASCII 13) and line-feed (ASCII 10) characters
   respectively. If the "CR" and "LF" keywords are used, they need not
   be separated by white-space. The keywords are not case sensitive.

   Example:

      @format.new-line lf

   Would result in lines being terminated by the ASCII line-feed
   character upon input or output.

   Example:

      @format.new-line crlf

   Would result in lines being terminated by the ASCII
   carriage-return/line-feed character sequence upon input or output.

6.6 use-tabs

   The <use-tabs> variable is used to specify whether the editor was
   configured to write horizontal-tab (ASCII 9) characters to the file
   or to use the appropriate number of space (ASCII 32) characters in
   place of each horizontal-tab character (see section 3.2).

   Syntax:

      @format.use-tabs <value>

   Where <value> is one of the following keywords (without quotes):
   "TRUE", "FALSE", "ON", "OFF", "YES", or "NO".

   The keywords "TRUE", "ON", and "YES" specify that horizontal-tab
   characters are to be written to the file. The keywords "FALSE",
   "OFF", and "NO" specify that the appropriate number of space
   characters are to be used in place of each horizontal-tab character
   when the file is read from or written to disk. The keywords are not
   case sensitive.

   Example:

      @format.use-tabs true

7. Formal Syntax

   The following syntax specification uses the augmented Backus-Naur
   Form (BNF) and Core Rules as described in RFC 2234 [3].

      file     = [header] *(*CHAR [escape header])
      header   = "@format." variable values
      escape   = WSP / LF
      variable = "tab-size" / "tab-stops" / "indent-size" /
                 "line-length" / "new-line" / "use-tabs"
      values   = 1*40((1*WSP) value)
      value    = numeric / keyword
      numeric  = 1*3DIGIT / ("0x" 1*2HEXDIG)
      keyword  = "true" / "false" / "on" / "off" / "yes" / "no" /
                 "cr" / "lf"

8. Security Considerations

   There are no known security issues with the solution proposed in
   this document.

9. References

   [1] ANSI X3.4-1986, "US-ASCII Coded Character Set--7-Bit American
       Standard Code for Information Interchange".

   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.

   [3] Crocker, D., "Augmented BNF for Syntax Specifications: ABNF",
       RFC 2234, November 1997.

10. Author's Addresses

   Robert R. Swindell
   Wind River Systems, Inc.
   3961 MacArthur Blvd., Suite 212
   Newport Beach, CA 92660
   United States of America
   Email: swindell@wrs.com

Full Copyright Statement

   "Copyright (C) The Internet Society (date). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into