Parser for files containing test cases consisting of <String,String>
pairs, where the first string is the input to the test case, and the
second string is the expected output of the test case.
A test-case file is a sequence of here documents
("heredocs"), very similar in syntax to Unix Shell heredocs.
Heredocs labeled "INPUT" indicate the start of a new case, and
these INPUT heredocs define the inputs of test cases. Following an
"INPUT" heredoc can come zero or more "expected-output" heredocs.
Each of these expected-output heredocs defines what we call a
subcase. The assumption here is that for each
interesting test input, there are often multiple different tests
one could run, each with different expected outputs.
Consumers of this class call the #find method to find all subcases
marked with a given label. For example, imagine the following
test-case file:
<<INPUT 0
<<VALUE 0
<<PPRINT 0
<<INPUT 1+1
<<VALUE 2
<<PPRINT 1 + 1
<<SEXP
(+ 1 1)
SEXP
Calling #find on the label "VALUE" will return two test cases, the
pair <"0","0"> and <"1+1","2">. Calling it on the label "PPRINT"
will return <"0","0"> and <"1+1","1 + 1">. Notice that there need
not be a subcase for every INPUT. In the case of "SEXP", for
example, #find will return only the single pair <"1+1","(+ 1 1)">.
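The pairing rule described above can be sketched as follows. This is only an illustration, not this class's actual implementation: the class name FindSketch, the representation of a parsed file as a list of {label, text} arrays, and the decision to skip heredocs that appear before the first INPUT are all assumptions made for the example.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class FindSketch {
    /**
     * Pairs each heredoc carrying the given label with the most recent INPUT.
     * heredocs holds {label, text} arrays in file order (hypothetical model).
     */
    static List<Map.Entry<String, String>> find(List<String[]> heredocs, String label) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        String currentInput = null;
        for (String[] heredoc : heredocs) {
            if (heredoc[0].equals("INPUT")) {
                // An INPUT heredoc starts a new test case.
                currentInput = heredoc[1];
            } else if (currentInput != null && heredoc[0].equals(label)) {
                // A matching subcase: pair it with the current input.
                result.add(new SimpleImmutableEntry<>(currentInput, heredoc[1]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The example file from the text, already split into heredocs.
        List<String[]> file = Arrays.asList(
            new String[] {"INPUT", "0"},
            new String[] {"VALUE", "0"},
            new String[] {"PPRINT", "0"},
            new String[] {"INPUT", "1+1"},
            new String[] {"VALUE", "2"},
            new String[] {"PPRINT", "1 + 1"},
            new String[] {"SEXP", "(+ 1 1)"});
        for (Map.Entry<String, String> pair : find(file, "SEXP")) {
            // prints <"1+1","(+ 1 1)">
            System.out.println("<\"" + pair.getKey() + "\",\"" + pair.getValue() + "\">");
        }
    }
}
```

Because an INPUT with no matching subcase contributes nothing to the result, the "SEXP" lookup yields one pair even though the file has two cases.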
There are two forms of heredocs, single-line and multi-line.
The examples above (except "SEXP") are single-line heredocs. The
general syntax for these is:
^<<([a-zA-Z][_a-zA-Z0-9]*) (.*)$
The first group in this regex is the label of the heredoc, and the
second group is the text of the heredoc. A single space separates
the two groups and is not part of the heredoc (subsequent spaces
are included in the heredoc). A "line terminator" as
defined by the Java language (i.e., CR, LF, or CR followed by LF)
terminates a single-line heredoc but is not included in the text
of the heredoc.
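As a rough illustration of the single-line syntax, the regex above can be applied with java.util.regex to one line whose terminator has already been removed. The class name SingleLineHeredocSketch and the choice to return null for non-matching lines are assumptions of this sketch, not part of the parser's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SingleLineHeredocSketch {
    // The single-line heredoc pattern quoted in the text.
    private static final Pattern SINGLE_LINE =
        Pattern.compile("^<<([a-zA-Z][_a-zA-Z0-9]*) (.*)$");

    /** Returns {label, text}, or null if the line is not a single-line heredoc. */
    static String[] parse(String line) {
        Matcher m = SINGLE_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] plain = parse("<<PPRINT 1 + 1");
        System.out.println(plain[0] + " -> [" + plain[1] + "]");   // PPRINT -> [1 + 1]

        // Only the first space is a separator; the other two belong to the text.
        String[] padded = parse("<<VALUE   2");
        System.out.println(padded[0] + " -> [" + padded[1] + "]"); // VALUE -> [  2]

        // No space after the label, so this is not a single-line heredoc.
        System.out.println(parse("<<INPUT") == null);               // true
    }
}
```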
As the name implies, multi-line heredocs are spread across
multiple lines, as in this example:
<<INPUT
1
+1 +
1
INPUT
<<VALUE 3
<<PPRINT 1 + 1 + 1
In this case, the input to the test case is spread across multiple
lines (the line terminators in these documents are preserved as
part of the document text). Multi-line heredocs can be used for
both the inputs of test cases and their expected outputs.
The syntax of multi-line heredocs obeys the following pseudo-regex:
^<<([a-zA-Z][_a-zA-Z0-9]*)$(.*)$^\1$
That is, as illustrated by the example, a multi-line heredoc named
"LABEL" consists of the text <<LABEL on a line by itself, followed
by the text of the heredoc, followed by the text LABEL on a line by
itself (if LABEL starts a line but is not the only text on that
line, then that entire line is part of the heredoc, and the heredoc
is not terminated by that line).
In multi-line heredocs, neither the line terminator that
terminates the start of the document nor the one just before the
label that ends the heredoc is part of the text of the heredoc.
Thus, for example, the text of the multi-line input from above
would be exactly "1\n+1 +\n1". If you want a newline at the end of
a multi-line heredoc, put a blank line before the label ending the
heredoc.
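The boundary rules above can be sketched by collecting the lines between the opening <<LABEL line and the closing LABEL line and joining them with '\n'. This is a simplified illustration over input that has already been split into lines; the class name MultiLineHeredocSketch, the method readBody, and returning null for an unterminated heredoc are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MultiLineHeredocSketch {
    /**
     * Reads the text of a multi-line heredoc. lines.get(start) must be the
     * opening "<<LABEL" line. Returns null if no closing label line exists.
     */
    static String readBody(List<String> lines, int start) {
        String label = lines.get(start).substring(2);
        List<String> body = new ArrayList<>();
        for (int i = start + 1; i < lines.size(); i++) {
            String line = lines.get(i);
            // Only a line consisting of exactly the label ends the heredoc.
            if (line.equals(label)) {
                // Joining adds terminators between lines only, so the one
                // just before the closing label is not part of the text.
                return String.join("\n", body);
            }
            body.add(line);
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("<<INPUT", "1", "+1 +", "1", "INPUT");
        System.out.println(readBody(input, 0).equals("1\n+1 +\n1")); // true

        // A blank line before the closing label yields a trailing newline.
        List<String> trailing = Arrays.asList("<<OUT", "done", "", "OUT");
        System.out.println(readBody(trailing, 0).equals("done\n")); // true

        // A line that merely starts with the label is part of the text.
        List<String> prefix = Arrays.asList("<<OUT", "OUT of range", "OUT");
        System.out.println(readBody(prefix, 0).equals("OUT of range")); // true
    }
}
```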
Also in multi-line heredocs, line terminators within the heredoc
are normalized to line feeds ('\n'). Thus, for example, when a
test file written on a Windows machine is parsed, the Windows-style
line terminators within heredocs are translated to Unix-style line
terminators, no matter what platform the tests are run on.
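One way to picture this normalization is a single regex replacement over the heredoc text. This is a sketch of the idea only; the class name LineTerminatorSketch and the method normalize are hypothetical, and the real parser may normalize while reading lines instead.

```java
final class LineTerminatorSketch {
    /** Rewrites CR LF and lone CR as LF; existing LFs are left alone. */
    static String normalize(String text) {
        // "\r\n" is listed first so a CR LF pair becomes one '\n', not two.
        return text.replaceAll("\r\n|\r", "\n");
    }

    public static void main(String[] args) {
        String windows = "1\r\n+1 +\r\n1";
        System.out.println(normalize(windows).equals("1\n+1 +\n1")); // true
    }
}
```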
Note that lines between heredocs are ignored, and can be used
to provide spacing between and/or commentary on the test cases.