Handling Long Lines in Inclusions in Internet-Drafts and RFCs

Watsen Networks, kent+ietf@watsen.net
Old Dog Consulting, adrian@olddog.co.uk
Huawei Technologies, bill.wu@huawei.com

Operations
NETMOD Working Group
Keywords: sourcecode, artwork

This document defines two strategies for handling long lines in width-bounded
text content. One strategy is based on the historic use of a single backslash
('\') character to indicate where line-folding has occurred, with the continuation
occurring with the first non-space (' ') character on the next line. The second
strategy extends the first strategy by adding a second backslash character to
identify where the continuation begins and is thereby able to handle cases not
supported by the first strategy. Both strategies use a self-describing header
enabling automated reconstitution of the original content.

The requirements for plain-text RFCs state that each line of an RFC (and
hence of an Internet-Draft) must be limited to 72 characters followed by
the character sequence that denotes an end-of-line (EOL).

Internet-Drafts and RFCs often include example text or code
fragments. Frequently, the example text or code
exceeds the 72-character line-length limit. The `xml2rfc`
utility does not attempt to wrap the content of such inclusions,
simply issuing a warning whenever lines exceed 69 characters.
According to the RFC Editor, there is currently no convention
in place for how to handle long lines in such inclusions, other than advising
authors to clearly indicate what manipulation has occurred.

This document defines two strategies for handling long lines in width-bounded
text content. One strategy is based on the historic use of a single backslash
('\') character to indicate where line-folding has occurred, with the continuation
occurring with the first non-space (' ') character on the next line. The second
strategy extends the first strategy by adding a second backslash character to
identify where the continuation begins and is thereby able to handle cases not
supported by the first strategy. Both strategies use a self-describing header
enabling automated reconstitution of the original content.

The strategies defined in this document work on any text content, but are
primarily intended for a structured sequence of lines, such as would be
referenced by the <sourcecode> element defined in Section 2.48 of the
xml2rfc v3 vocabulary, rather than for two-dimensional imagery, such
as would be referenced by the <artwork> element defined in Section
2.5 of that vocabulary.

Note that text files are represented as lines having their first
character in column 1, and a line length of N where the last
character is in the Nth column and is immediately followed by an end
of line character sequence.

The formats and algorithms defined in this document may be used
in any context, whether for IETF documents or in other situations
where structured folding is desired.

Within the IETF, this work primarily targets the xml2rfc v3
<sourcecode> element (Section 2.48 of the v3 vocabulary)
and the xml2rfc v2 <artwork> element (Section 2.5 of the
v2 vocabulary) that, for lack of a better option, is
currently used for both source code and artwork. This work may
also be used for the xml2rfc v3 <artwork> element
(Section 2.5 of the v3 vocabulary) but, as described later in
this document, it is generally not recommended.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14
when, and only when, they appear in all capitals, as shown here.

Automated folding of long lines is needed in order to support
draft compilations that entail a) validation of source input
files (e.g., XML, JSON, ABNF, ASN.1) and/or b) dynamic
generation of output, using a tool that doesn't observe line
lengths, that is stitched into the final document to be submitted.

Generally, in order for tooling to be able to process input
files, the files must be in their original/natural state, which
may entail them having some long lines. Thus, these source files
need to be modified before inclusion in the document in order to
satisfy the line length limits. This modification SHOULD be
automated to reduce effort and errors resulting from manual
processing.

Similarly, dynamically generated output (e.g., tree diagrams)
must also be modified, if necessary, in order for the resulting
document to satisfy the line length limits. When needed, this
modification SHOULD again be automated to reduce effort and errors
resulting from manual processing.

Automated reconstitution of the exact original text content is needed to
support validation of text-based content extracted from documents.

For instance, YANG modules are already
extracted from Internet-Drafts and validated as part of the
draft-submission process. Additionally, the desire to validate
instance examples (i.e., XML/JSON documents) contained within
Internet-Drafts has been discussed [yang-doctors].

While the solution presented in this document works on any
kind of text-based content, it is most useful on content that
represents source code (XML, JSON, etc.) or, more generally, on
content that has not been laid out in two dimensions (as diagrams are).

Fundamentally, the issue is whether the text content remains readable
once folded. Text content that is unpredictable is especially susceptible
to looking bad when folded; falling into this category are most
UML diagrams, YANG tree diagrams, and ASCII art in general.

It is NOT RECOMMENDED to use the solution presented in
this document on graphical artwork.

The solution presented in this document works generically
for all text-based content, as it only views content as plain
text. However, various formats sometimes have built-in mechanisms
that are better suited to prevent long lines.

For instance, both the `pyang` and `yanglint` utilities
have the command line option "--tree-line-length" that can
be used to indicate a desired maximum line length when
generating tree diagrams.

In another example, some source formats (e.g., YANG)
allow any quoted string to be
broken up into substrings separated by a concatenation
character (e.g., '+'), any of which can be on a different
line.

It is RECOMMENDED that authors do as much as possible
within the selected format to avoid long lines.

This document defines two nearly identical strategies for folding
text-based content.

Single backslash strategy: Uses a backslash
('\') character at the end of the line where folding occurs, and
assumes that the continuation begins at the first character that is not
a space character (' ') on the following line.

Double backslash strategy: Uses a backslash
('\') character at the end of the line where folding occurs, and
assumes that the continuation begins after a second backslash ('\')
character on the following line.

The first strategy produces more readable output; however, it is
significantly more likely to encounter unfoldable input (e.g.,
there exists a line anywhere in the input ending with a backslash
character, or there exists a long line containing only space
characters) and, for long lines that can be folded, automation
implementations may encounter scenarios that will produce errors
without special care.

The second strategy produces less readable output, but is
unlikely to encounter unfoldable input, there are no long lines
that cannot be folded, and no special care is required when
folding a long line.

It is RECOMMENDED for implementations to first attempt to fold
content using the single backslash strategy and, only in the
unlikely event that it cannot fold the input or the folding
logic is unable to cope with a contingency occurring on the
desired folding column, then fall back to the double backslash
strategy.

Text content that has been folded as specified by this strategy
MUST adhere to the following structure.

The header is two lines long.

The first line is the following 45-character string that
MAY be surrounded by any number of printable characters.
This first line cannot itself be folded.

The second line is a blank line. This line provides visual
separation for readability.

The character encoding is the same as described in Section 2
of , except that, per ,
tab characters are prohibited.

Lines that have a backslash ('\') occurring as the last character in
a line are considered "folded".

Really long lines may be folded multiple times.

This section describes a process for folding and unfolding long
lines when they are encountered in text content.

The steps are complete, but implementations MAY achieve the same
result in other ways.

When a larger document contains multiple instances of text content
that may need to be folded or unfolded, another process must
insert/extract the individual text content instances to/from the
larger document prior to utilizing the algorithms described in this
section. For example, the `xiax` utility does this.

1. Determine the desired maximum line length from input to the
   line-wrapping process, such as from a command line
   parameter. If no value is explicitly specified, the value "69"
   SHOULD be used.

2. Ensure that the desired maximum line length is not less than
   the minimum header, which is 45 characters. If the desired
   maximum line length is less than this minimum, exit (this text-based
   content cannot be folded).

3. Scan the text content for horizontal tab characters. If any
   horizontal tab characters appear, either resolve them to space
   characters or exit, forcing the input provider to convert them
   to space characters themselves first.

4. Scan the text content to ensure at least one line exceeds the
   desired maximum. If no line exceeds the desired maximum, exit
   (this text content does not need to be folded).

5. Scan the text content to ensure no existing lines already end with a
   backslash ('\') character, as this would lead to an ambiguous result.
   If such a line is found, exit (this text content cannot be folded).

6. If this text content needs to and can be folded, insert the header
   described above for this strategy, ensuring that any additional
   printable characters surrounding the header do not result in a
   line exceeding the desired maximum.

7. For each line in the text content, from top-to-bottom, if the line
   exceeds the desired maximum, then fold the line by:

   a. Determine where the fold will occur. This location MUST be before
      or at the desired maximum column, and MUST NOT be chosen such that
      the character immediately after the fold is a space (' ') character.
      If no such location can be found, then exit (this text content cannot
      be folded).

   b. At the location where the fold is to occur, insert a backslash
      ('\') character followed by the end of line character sequence.

   c. On the following line, insert any number of space (' ') characters.

   The result of the previous operation is that the next line starts
   with an arbitrary number of space (' ') characters, followed by the
   character that was previously occupying the position where the fold
   occurred.

8. Continue in this manner until reaching the end of the text content. Note
   that this algorithm naturally addresses the case where the remainder
   of a folded line is still longer than the desired maximum, and hence
   needs to be folded again, ad infinitum.

The process described in this section is illustrated by the "fold_it_1()"
function in the appendix.

1. Scan the beginning of the text content for the header described
   above for this strategy. If the header is not present
   on the first line of the text content, exit (this text content does not
   need to be unfolded).

2. Remove the 2-line header from the text content.

3. For each line in the text content, from top-to-bottom, if the line has
   a backslash ('\') character immediately followed by the end of line
   character sequence, then the line can be unfolded.
   Remove the backslash ('\') character, the end of line character
   sequence, and any leading space (' ') characters on the next line,
   which will bring up the next line. Then continue to scan each line in
   the text content starting with the current line (in case it was
   multiply folded).

4. Continue in this manner until reaching the end of the text content.

The process described in this section is illustrated by the "unfold_it_1()"
function in the appendix.

Text content that has been folded as specified by this strategy
MUST adhere to the following structure.

The header is two lines long.

The first line is the following 46-character string that
MAY be surrounded by any number of printable characters.
This first line cannot itself be folded.

The second line is a blank line. This line provides visual
separation for readability.

The character encoding is the same as described in Section 2
of , except that, per ,
tab characters are prohibited.

Lines that have a backslash ('\') occurring as the last character in
a line immediately followed by the end of line character sequence, when
the subsequent line starts with a backslash ('\') as the first non-space
(' ') character, are considered "folded".

Really long lines may be folded multiple times.

This section describes a process for folding and unfolding long
lines when they are encountered in text content.

The steps are complete, but implementations MAY achieve the same
result in other ways.

When a larger document contains multiple instances of text content
that may need to be folded or unfolded, another process must
insert/extract the individual text content instances to/from the
larger document prior to utilizing the algorithms described in this
section. For example, the `xiax` utility does this.

1. Determine the desired maximum line length from input to the
   line-wrapping process, such as from a command line
   parameter. If no value is explicitly specified, the value "69"
   SHOULD be used.

2. Ensure that the desired maximum line length is not less than
   the minimum header, which is 46 characters. If the desired
   maximum line length is less than this minimum, exit (this text-based
   content cannot be folded).

3. Scan the text content for horizontal tab characters. If any
   horizontal tab characters appear, either resolve them to space
   characters or exit, forcing the input provider to convert them
   to space characters themselves first.

4. Scan the text content to see if any line exceeds the desired maximum.
   If no line exceeds the desired maximum, exit (this text content does not
   need to be folded).

5. Scan the text content to ensure no existing lines already end with a
   backslash ('\') character while the subsequent line starts with a
   backslash ('\') character as the first non-space (' ') character,
   as this could lead to an ambiguous result. If such a line is found,
   and its width is less than the desired maximum, then it SHOULD be
   flagged for forced folding (folding even though unnecessary). If
   the folding implementation doesn't support forced foldings, it MUST
   exit.

6. If this text content needs to and can be folded, insert the header
   described above for this strategy, ensuring that any additional
   printable characters surrounding the header do not result in a
   line exceeding the desired maximum.

7. For each line in the text content, from top-to-bottom, if the line
   exceeds the desired maximum, or requires a forced folding, then
   fold the line by:

   a. Determine where the fold will occur. This location MUST be before
      or at the desired maximum column.

   b. At the location where the fold is to occur, insert a first
      backslash ('\') character followed by the end of line character
      sequence.

   c. On the following line, insert any number of space (' ') characters
      followed by a second backslash ('\') character.

   The result of the previous operation is that the next line starts
   with an arbitrary number of space (' ') characters, followed by a
   backslash ('\') character, immediately followed by the character that
   was previously occupying the position where the fold occurred.

8. Continue in this manner until reaching the end of the text content. Note
   that this algorithm naturally addresses the case where the remainder
   of a folded line is still longer than the desired maximum, and hence
   needs to be folded again, ad infinitum.

The process described in this section is illustrated by the "fold_it_2()"
function in the appendix.

1. Scan the beginning of the text content for the header described
   above for this strategy. If the header is not present
   on the first line of the text content, exit (this text content does not
   need to be unfolded).

2. Remove the 2-line header from the text content.

3. For each line in the text content, from top-to-bottom, if the line has
   a backslash ('\') character immediately followed by the end of line
   character sequence, and if the next line has a backslash ('\') character
   as the first non-space (' ') character, then the lines can be unfolded.
   Remove the first backslash ('\') character, the end of line character
   sequence, any leading space (' ') characters, and the second backslash
   ('\') character, which will bring up the next line. Then continue to
   scan each line in the text content starting with the current line (in case
   it was multiply folded).

4. Continue in this manner until reaching the end of the text content.

The process described in this section is illustrated by the "unfold_it_2()"
function in the appendix.

The following self-documenting examples illustrate folded
text-based content.

The source text content cannot be presented here, as it would
again be folded. Alas, only the results can be provided.

This example illustrates a boundary condition. The input contains
seven lines, each line one character longer than the previous line.
Numbers are used for counting purposes. The default desired maximum column
value "69" is used.

This example illustrates what happens when a very long line needs to
be folded multiple times. The input contains one line containing
280 characters. Numbers are used for counting purposes. The default
desired maximum column value "69" is used.

This example illustrates how readability can be improved via "smart"
folding, whereby folding occurs at format-specific locations and
format-specific indentations are used.

The text content was manually folded, since the script in the appendix
does not implement smart folding.

Note that the header is surrounded by different printable characters
than shown in the script-generated examples.

Below is the equivalent of the above, but it was folded using the
script in the appendix.

Below is the equivalent of the above, but it was folded using the
script in the appendix.

This BCP has no Security Considerations.

This BCP has no IANA Considerations.

[yang-doctors]  "automating yang doctor reviews"
[xiax]  "The `xiax` Python Package"

This non-normative appendix section includes a shell script
that can both fold and unfold text content using both the
single and double backslash strategies described earlier
in this document.

This script is intended to be applied to a single text content
instance. If it is desired to fold or unfold text content instances
within a larger document (e.g., an Internet-Draft or RFC), then
another tool must be used to extract the content from the larger
document before utilizing this script.

For readability purposes, this script forces the minimally
supported line length to be eight characters longer than the
raw header text defined for each strategy,
so as to ensure that the header
can be wrapped by a space (' ') character and three equal ('=')
characters on each side of the raw header text.

This script does not implement the "forced folding" logic
described for the double backslash strategy. In such cases
the script exits with an error message.

Shell-level end-of-line backslash ('\') characters have been
purposely added to the script so as to ensure that the script is
itself not folded in this document, thus simplifying the ability to
copy/paste the script for local use. As should be evident by the
lack of the mandatory header described for each strategy,
these backslashes do not designate a folded line, such as described
in this document.

The authors thank the following folks for their various
contributions (sorted by first name):
Benoît Claise, Gianmarco Bruno, Italo Busi, Joel Jaeggli,
Jonathan Hansford, Lou Berger, Martin Bjorklund, and Rob Wilton.

The authors additionally thank the RFC Editor for confirming
that there is no set convention today for handling long lines in
artwork/sourcecode inclusions.
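
As a non-normative illustration, the folding and unfolding steps described
earlier for the single and double backslash strategies can be sketched in
Python (rather than the shell used by the appendix script). This is a
minimal sketch under stated assumptions, not the appendix script: the names
fold(), unfold(), HEADER_1, and HEADER_2 are hypothetical; the header strings
are placeholders chosen only to match the stated 45- and 46-character
lengths, since the exact strings are not reproduced in this text; the header
is inserted without surrounding characters; no spaces are inserted on
continuation lines; and forced folding is not implemented, so the sketch
exits on every potentially ambiguous input.

```python
# ASSUMPTION: placeholder header strings matching only the stated lengths.
HEADER_1 = "NOTE: '\\' line wrapping per BCP XX (RFC XXXX)"    # 45 chars
HEADER_2 = "NOTE: '\\\\' line wrapping per BCP XX (RFC XXXX)"  # 46 chars


def _starts_with_backslash(line):
    """True if the first non-space character of the line is a backslash."""
    return line.lstrip(" ").startswith("\\")


def fold(text, maxcol=69, double=False):
    """Fold lines longer than maxcol; raise ValueError if folding is unsafe."""
    header = HEADER_2 if double else HEADER_1
    if maxcol < len(header):
        raise ValueError("desired maximum is shorter than the header")
    if "\t" in text:
        raise ValueError("convert tab characters to spaces first")
    lines = text.split("\n")
    if all(len(line) <= maxcol for line in lines):
        return text  # no line exceeds the maximum; nothing to fold
    # Ambiguity check. No forced folding here, so every suspect line is
    # rejected regardless of its width (a conservative simplification).
    for i, line in enumerate(lines):
        if not line.endswith("\\"):
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not double or _starts_with_backslash(nxt):
            raise ValueError("ambiguous input; cannot fold with this strategy")
    out = [header, ""]  # two-line header: header string, then a blank line
    for line in lines:
        while len(line) > maxcol:
            cut = maxcol - 1  # leaves room for the inserted backslash
            if not double:
                # The character right after the fold must not be a space.
                while cut > 0 and line[cut] == " ":
                    cut -= 1
                if cut == 0:
                    raise ValueError("no usable fold location")
            out.append(line[:cut] + "\\")
            line = ("\\" if double else "") + line[cut:]
        out.append(line)
    return "\n".join(out)


def unfold(text):
    """Reverse fold(); the strategy is detected from the first line."""
    lines = text.split("\n")
    if HEADER_2 in lines[0]:
        double = True
    elif HEADER_1 in lines[0]:
        double = False
    else:
        return text  # no header on the first line; nothing to unfold
    lines = lines[2:]  # drop the two-line header
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # Keep joining while the current line is folded (multiply folded).
        while line.endswith("\\") and i + 1 < len(lines):
            nxt = lines[i + 1].lstrip(" ")
            if double:
                if not nxt.startswith("\\"):
                    break  # trailing backslash is real content, not a fold
                nxt = nxt[1:]  # drop the second backslash
            line = line[:-1] + nxt
            i += 1
        out.append(line)
        i += 1
    return "\n".join(out)
```

A caller following the recommendation to prefer the single backslash
strategy could call fold(text) first and, if it raises ValueError, retry
with fold(text, double=True). In both cases unfold(fold(text)) is intended
to return the original text unchanged.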