Indicates whether a tag of this type is valid in the specified position of the specified source document.
(implementation assistance method)
This method is called immediately before
#constructTagAt(Source, int pos) to do a preliminary check on the validity of a tag of this type in the specified position.
This check is not performed as part of the
#constructTagAt(Source, int pos) call because the same
validation is used for all the standard tag types, and is likely to be sufficient
for all custom tag types.
Having this check separated into a different method helps to isolate common code from the code that is unique to each tag type.
A server tag (see
TagType#isServerTag()) is valid in any position except inside a
StartTagType#SERVER_COMMON_COMMENT,
but a non-server tag is not valid inside any other tag, nor inside elements with implicit CDATA content such as
HTMLElementName#SCRIPT and
HTMLElementName#STYLE elements.
The common implementation of this method behaves differently depending upon whether or not a
Source#fullSequentialParse() is being performed.
For server tags it simply checks that the position is not enclosed by a
StartTagType#SERVER_COMMON_COMMENT if a full sequential parse
is not being performed. If a full sequential parse is being performed, it always returns true
for server tags as the parser automatically skips over
all positions enclosed by server-side comments, so this method is only called in positions where a server tag is always valid.
When this method is called for non-server tags during a full sequential parse, the fullSequentialParseData
argument contains information
allowing the exact theoretical check to be performed, rejecting a non-server tag if it is inside any other tag.
See below for further information about the fullSequentialParseData
parameter.
When this method is called in parse on demand mode
(not during a full sequential parse, fullSequentialParseData == null),
practical constraints prevent the exact theoretical check from being carried out, and non-server tags are only rejected
if they are found inside HTML
StartTagType#COMMENT or
StartTagType#CDATA_SECTION.
This behaviour is configurable by manipulating the static
TagType#getTagTypesIgnoringEnclosedMarkup() array
to determine which tag types cannot contain non-server tags in parse on demand mode.
The documentation of
TagType#getTagTypesIgnoringEnclosedMarkup() contains
a more detailed analysis of the subject, detailing some potential problems with this approach and explaining why only the
StartTagType#COMMENT and
StartTagType#CDATA_SECTION tag types are included by default.
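The rules described above can be condensed into a small decision function. The following is a minimal sketch only, not the library's implementation. The class name, the method signature and the two boolean inputs are assumptions made for illustration: insideServerComment stands in for a check that the position is enclosed by a StartTagType#SERVER_COMMON_COMMENT, and insideIgnoringTagType stands in for a check that it is enclosed by one of the tag types listed in TagType#getTagTypesIgnoringEnclosedMarkup().

```java
// Hypothetical sketch of the decision described in the text; not the library's code.
final class ValidPositionDecisionSketch {

    static boolean isValidPosition(boolean serverTag,
                                   int pos,
                                   int[] fullSequentialParseData,
                                   boolean insideServerComment,
                                   boolean insideIgnoringTagType) {
        // A null array means parse on demand mode.
        boolean fullSequentialParse = fullSequentialParseData != null;
        if (serverTag) {
            // In a full sequential parse the parser has already skipped server-side comments.
            return fullSequentialParse || !insideServerComment;
        }
        if (fullSequentialParse) {
            // Exact check: no non-server tag before the end of the last tag found.
            return pos >= fullSequentialParseData[0];
        }
        // Parse on demand: only the configured enclosing tag types cause rejection.
        return !insideIgnoringTagType;
    }

    public static void main(String[] args) {
        // Non-server tag at position 10 while a tag ending at 50 is still open.
        System.out.println(isValidPosition(false, 10, new int[] {50}, false, false));
        // Same position in parse on demand mode, not inside a comment or CDATA section.
        System.out.println(isValidPosition(false, 10, null, false, false));
    }
}
```

The two calls in main show the same position being rejected in a full sequential parse but accepted in parse on demand mode, which is the practical difference the text describes.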
See the documentation of the tag parsing process for more information about how this method fits into the whole tag parsing process.
This method can be overridden in custom tag types if the default implementation is unsuitable.
The fullSequentialParseData
parameter:
This parameter is used to discard non-server tags that are found inside other tags or inside
HTMLElementName#SCRIPT elements.
In the current version of this library, the fullSequentialParseData
argument is either null
(in parse on demand mode) or an integer array containing only a single entry
(if a
Source#fullSequentialParse() is being performed).
The integer contained in the array is the maximum position in the document at which the end of a tag has been found,
indicating that no non-server tags should be recognised before that position.
If no tags have yet been encountered, the value of this integer is zero.
If the last tag encountered was the
StartTag of a
HTMLElementName#SCRIPT element,
the value of this integer is Integer.MAX_VALUE,
indicating that no other non-server elements should be recognised until the
EndTag of the
HTMLElementName#SCRIPT element is found.
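The three states of the array entry described above can be illustrated as follows. This is a sketch under stated assumptions: the class and the isBlocked helper are hypothetical names, and the example sets the array entry by hand rather than showing how the parser updates it or how it recognises the closing script tag.

```java
// Hypothetical illustration of the values held in fullSequentialParseData[0].
final class ParseDataStatesSketch {

    // True if a non-server tag starting at pos should not be recognised.
    static boolean isBlocked(int[] fullSequentialParseData, int pos) {
        return pos < fullSequentialParseData[0];
    }

    public static void main(String[] args) {
        int[] data = {0};                          // no tags encountered yet
        System.out.println(isBlocked(data, 0));    // false: nothing precedes position 0

        data[0] = 25;                              // a tag has been found ending at position 25
        System.out.println(isBlocked(data, 10));   // true: position 10 lies inside that tag
        System.out.println(isBlocked(data, 25));   // false: the previous tag has ended

        data[0] = Integer.MAX_VALUE;               // last tag was the start tag of a script element
        System.out.println(isBlocked(data, 1000)); // true: every practical position is blocked
    }
}
```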
The HTML 4 DTD defines script element content as a special type of CDATA. The XHTML DTD changed it to PCDATA, meaning that HTML elements should be parsed
inside script elements if they are not escaped by
StartTagType#COMMENT or an explicit
StartTagType#CDATA_SECTION.
The HTML 5 parsing rules reversed this
again, making it closer to the original HTML 4 rules. Because this parser is designed to facilitate parsing HTML rather than XHTML, it treats script element content
as implicit CDATA, consistent with HTML 4 and HTML 5.
According to the HTML 4.01 specification section 6.2,
the first occurrence of the character sequence "</" terminates the special handling of CDATA within
HTMLElementName#SCRIPT and
HTMLElementName#STYLE elements.
This library however only terminates the CDATA handling of
HTMLElementName#SCRIPT element content
when the character sequence "</script" is detected, in line with the behaviour of the major browsers and with
HTML 5 script element parsing rules.
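The difference between the two termination rules can be seen with a short sketch. The class and method names are hypothetical and the code does not reproduce the library's internals. It also assumes that the "</script" sequence is matched case-insensitively, which the text above does not state.

```java
// Hypothetical comparison of the two rules for ending script element content.
final class ScriptContentEndSketch {
    private static final String END_MARKER = "</script";

    // HTML 4.01 section 6.2 rule: the first "</" ends the CDATA content.
    static int html4ContentEnd(String source, int contentStart) {
        return source.indexOf("</", contentStart);
    }

    // Rule described for this library: only "</script" ends the content.
    // Case-insensitive matching is an assumption of this sketch.
    static int scriptContentEnd(String source, int contentStart) {
        for (int i = Math.max(contentStart, 0); i + END_MARKER.length() <= source.length(); i++) {
            if (source.regionMatches(true, i, END_MARKER, 0, END_MARKER.length())) {
                return i;
            }
        }
        return -1; // no terminating sequence found
    }

    public static void main(String[] args) {
        String source = "<script>var s = \"</b>\"; </SCRIPT>";
        int contentStart = source.indexOf('>') + 1;
        // Remainder of the document from where each rule ends the script content.
        System.out.println(source.substring(html4ContentEnd(source, contentStart)));
        System.out.println(source.substring(scriptContentEnd(source, contentStart)));
    }
}
```

Under the HTML 4.01 rule the string literal containing "</b>" would end the script content prematurely, whereas the "</script" rule leaves the literal intact.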
Note that the implicit treatment of
HTMLElementName#SCRIPT element content as CDATA also prevents the recognition of
StartTagType#COMMENT and explicit
StartTagType#CDATA_SECTION inside script elements.
All major browsers used to recognise comments inside script elements regardless, which is relevant if the script element contains a JavaScript string literal
"</script", which would terminate the script element unless it was enclosed in a comment.
Versions 3.0 to 3.2 of this parser therefore also recognised comments inside script elements in a full sequential parse to maintain compatibility with the
major browsers, but the latest versions of Gecko and WebKit browsers now correctly ignore comments inside script elements, so as of version 3.3 this parser
has also reverted to the correct behaviour.
Although
HTMLElementName#STYLE elements should theoretically be treated in the same way as
HTMLElementName#SCRIPT elements,
the syntax of Cascading Style Sheets (CSS) does not contain any constructs that
could be misinterpreted as HTML tags, so there is virtually no need to perform any special checks in this case.
IMPLEMENTATION NOTE: The rationale behind using an integer array to hold this value, rather than a scalar int value,
is to emulate passing the parameter by reference.
This value needs to be shared amongst several internal methods during the
Source#fullSequentialParse() process,
and any one of those methods needs to be able to modify the value and pass it back to the calling method.
This would normally be implemented by passing the parameter by reference, but because Java does not support this language construct, a container for a
mutable integer must be passed instead.
Because the standard Java library does not provide a class for holding a single mutable integer (the java.lang.Integer class is immutable),
the easiest container to use, without creating a class especially for this purpose, is an integer array.
The use of an array does not imply any intention to use more than a single array entry in subsequent versions.
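The technique in this implementation note can be demonstrated in isolation. The following is a minimal sketch with hypothetical class and method names, contrasting a single-entry int array with a scalar int parameter.

```java
// Hypothetical demonstration of a single-entry int array used as a mutable holder.
final class MutableIntHolderDemo {

    // The callee writes through the array, so the caller sees the new value.
    static void recordTagEnd(int[] maxEndHolder, int tagEnd) {
        if (tagEnd > maxEndHolder[0]) {
            maxEndHolder[0] = tagEnd;
        }
    }

    // A scalar int is passed by value, so this assignment is lost on return.
    static void recordTagEndScalar(int maxEnd, int tagEnd) {
        if (tagEnd > maxEnd) {
            maxEnd = tagEnd;
        }
    }

    public static void main(String[] args) {
        int[] holder = {0};
        recordTagEnd(holder, 17);
        recordTagEnd(holder, 5);
        System.out.println(holder[0]); // prints 17: the update is visible to the caller

        int scalar = 0;
        recordTagEndScalar(scalar, 17);
        System.out.println(scalar);    // prints 0: the caller's variable is unchanged
    }
}
```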