net.htmlparser.jericho.TagType.isValidPosition java code examples

Javadoc

Indicates whether a tag of this type is valid in the specified position of the specified source document.
(implementation assistance method)

This method is called immediately before #constructTagAt(Source, int pos) to do a preliminary check on the validity of a tag of this type in the specified position.

This check is not performed as part of the #constructTagAt(Source, int pos) call because the same validation is used for all the standard tag types, and is likely to be sufficient for all custom tag types. Having this check separated into a different method helps to isolate common code from the code that is unique to each tag type.

A server tag (see TagType#isServerTag()) is valid in any position except inside a StartTagType#SERVER_COMMON_COMMENT, whereas a non-server tag is not valid inside any other tag, nor inside elements with implicit CDATA content such as HTMLElementName#SCRIPT and HTMLElementName#STYLE elements.

The common implementation of this method behaves differently depending on whether or not a Source#fullSequentialParse() is being performed.

For server tags, if a full sequential parse is not being performed, it simply checks that the position is not enclosed by a StartTagType#SERVER_COMMON_COMMENT. If a full sequential parse is being performed, it always returns true for server tags, because the parser automatically skips over all positions enclosed by server-side comments, so this method is only called in positions where a server tag is always valid.

When this method is called for non-server tags during a full sequential parse, the fullSequentialParseData argument contains information allowing the exact theoretical check to be performed, rejecting a non-server tag if it is inside any other tag. See below for further information about the fullSequentialParseData parameter.

When this method is called in parse on demand mode (not during a full sequential parse, fullSequentialParseData==null), practical constraints prevent the exact theoretical check from being carried out, and non-server tags are only rejected if they are found inside HTML StartTagType#COMMENT or StartTagType#CDATA_SECTION.

This behaviour is configurable by manipulating the static TagType#getTagTypesIgnoringEnclosedMarkup() array, which determines which tag types cannot contain non-server tags in parse on demand mode. The documentation of TagType#getTagTypesIgnoringEnclosedMarkup() contains a more detailed analysis of the subject, describing some potential problems with this approach and explaining why only the StartTagType#COMMENT and StartTagType#CDATA_SECTION tag types are included by default.
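The parse on demand rule described above can be illustrated with a small standalone model. This is only a sketch and not the library's implementation: the class OnDemandPositionModel and its methods are hypothetical names, the enclosure test is a plain string search rather than a real tag lookup, and it assumes the default configuration in which only comments and CDATA sections block enclosed non-server tags.

```java
// Simplified standalone model; this is NOT the library's implementation.
// All names here are hypothetical and exist only for illustration.
final class OnDemandPositionModel {

    // Returns true if pos lies inside a span that starts with `open`
    // somewhere before pos and whose first following `close` ends after pos.
    static boolean enclosedBy(String text, int pos, String open, String close) {
        int start = text.lastIndexOf(open, pos - 1);
        if (start < 0) return false;              // no opener before pos
        int closeAt = text.indexOf(close, start + open.length());
        return closeAt >= 0 && pos < closeAt + close.length();
    }

    // Models the default on-demand rule: a non-server tag is rejected only
    // when it sits inside an HTML comment or an explicit CDATA section.
    static boolean acceptsNonServerTag(String text, int pos) {
        return !enclosedBy(text, pos, "<!--", "-->")
            && !enclosedBy(text, pos, "<![CDATA[", "]]>");
    }

    public static void main(String[] args) {
        String text = "<!-- <b>x</b> --><i>y</i>";
        // false: the <b> tag is inside the comment
        System.out.println(acceptsNonServerTag(text, text.indexOf("<b>")));
        // true: the <i> tag follows the comment
        System.out.println(acceptsNonServerTag(text, text.indexOf("<i>")));
    }
}
```

The model deliberately ignores every other kind of enclosing tag, which mirrors the practical compromise described for parse on demand mode.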

See the documentation of the tag parsing process for more information about how this method fits into the whole tag parsing process.

This method can be overridden in custom tag types if the default implementation is unsuitable.

The fullSequentialParseData parameter:

This parameter is used to discard non-server tags that are found inside other tags or inside HTMLElementName#SCRIPT elements.

In the current version of this library, the fullSequentialParseData argument is either null (in parse on demand mode) or an integer array containing only a single entry (if a Source#fullSequentialParse() is being performed).

The integer contained in the array is the maximum position in the document at which the end of a tag has been found, indicating that no non-server tags should be recognised before that position. If no tags have yet been encountered, the value of this integer is zero.

If the last tag encountered was the StartTag of a HTMLElementName#SCRIPT element, the value of this integer is Integer.MAX_VALUE, indicating that no other non-server elements should be recognised until the EndTag of the HTMLElementName#SCRIPT element is found.
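The bookkeeping described above can be sketched as a small standalone model. It is an illustration under stated assumptions rather than the library's code: the class SequentialParseModel and both of its methods are hypothetical, and the caller is assumed to report each recognised tag's end position in document order.

```java
// Simplified standalone model; this is NOT the library's implementation.
// All names here are hypothetical and exist only for illustration.
final class SequentialParseModel {

    // Decides whether a non-server tag may be recognised at pos.
    // data[0] holds the maximum tag end position found so far, or
    // Integer.MAX_VALUE while inside a script element.
    static boolean acceptsNonServerTag(String text, int pos, int[] data) {
        if (data[0] == Integer.MAX_VALUE) {
            // Inside script content: only the script end tag is accepted.
            return text.regionMatches(true, pos, "</script", 0, 8);
        }
        return pos >= data[0];
    }

    // Records a recognised tag that ends at tagEnd (exclusive).
    static void recordTag(int[] data, int tagEnd, boolean isScriptStartTag) {
        if (isScriptStartTag) {
            data[0] = Integer.MAX_VALUE;          // block tags until </script
        } else if (data[0] == Integer.MAX_VALUE) {
            data[0] = tagEnd;                     // leaving the script element
        } else {
            data[0] = Math.max(data[0], tagEnd);
        }
    }

    public static void main(String[] args) {
        String text = "<script>var s=\"<b>\";</script><p>";
        int[] data = {0};                         // no tags encountered yet
        recordTag(data, text.indexOf('>') + 1, true);
        // false: "<b>" is inside the script content
        System.out.println(acceptsNonServerTag(text, text.indexOf("<b>"), data));
        // true: the script end tag is always accepted
        System.out.println(acceptsNonServerTag(text, text.indexOf("</script"), data));
    }
}
```

Using Integer.MAX_VALUE as a sentinel means the ordinary position comparison would also reject every position, so the script case only needs one extra branch to let the end tag through.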

The HTML 4 DTD defines script element content as a special type of CDATA. The XHTML DTD changed it to PCDATA, meaning that HTML elements should be parsed inside script elements if they are not escaped by StartTagType#COMMENT or an explicit StartTagType#CDATA_SECTION. The HTML 5 parsing rules reversed this again, making it closer to the original HTML 4 rules. Because this parser is designed to facilitate parsing HTML rather than XHTML, it treats script element content as implicit CDATA, consistent with HTML 4 and HTML 5.

According to the HTML 4.01 specification section 6.2, the first occurrence of the character sequence "</" terminates the special handling of CDATA within HTMLElementName#SCRIPT and HTMLElementName#STYLE elements. This library however only terminates the CDATA handling of HTMLElementName#SCRIPT element content when the character sequence "</script" is detected, in line with the behaviour of the major browsers and with HTML 5 script element parsing rules.

Note that the implicit treatment of HTMLElementName#SCRIPT element content as CDATA also prevents the recognition of StartTagType#COMMENT and explicit StartTagType#CDATA_SECTION inside script elements. All major browsers used to recognise comments inside script elements regardless, which is relevant if the script element contains a JavaScript string literal "</script", which would terminate the script element unless it was enclosed in a comment. Versions 3.0 to 3.2 of this parser therefore also recognised comments inside script elements in a full sequential parse to maintain compatibility with the major browsers. The latest versions of Gecko and WebKit browsers now correctly ignore comments inside script elements, so as of version 3.3 this parser has also reverted to the correct behaviour.

Although HTMLElementName#STYLE elements should theoretically be treated in the same way as HTMLElementName#SCRIPT elements, the syntax of Cascading Style Sheets (CSS) does not contain any constructs that could be misinterpreted as HTML tags, so there is virtually no need to perform any special checks in this case.

IMPLEMENTATION NOTE: The rationale behind using an integer array to hold this value, rather than a scalar int value, is to emulate passing the parameter by reference. This value needs to be shared amongst several internal methods during the Source#fullSequentialParse() process, and any one of those methods needs to be able to modify the value and pass it back to the calling method. This would normally be implemented by passing the parameter by reference, but because Java does not support this language construct, a container for a mutable integer must be passed instead. Because the standard Java library does not provide a class for holding a single mutable integer (the java.lang.Integer class is immutable), the easiest container to use, without creating a class especially for this purpose, is an integer array. The use of an array does not imply any intention to use more than a single array entry in subsequent versions.
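The difference between passing a scalar int and passing a single-entry int array can be shown with plain Java. This is a generic illustration of the technique described in the note, not code from the library; the class and method names are hypothetical.

```java
// Generic illustration of emulating pass-by-reference with an int array.
// The names are hypothetical and not part of the library.
final class MutableIntDemo {

    // Java passes the int by value, so the caller never sees this change.
    static void reassignScalar(int value) {
        value = 42;
    }

    // The array reference is also passed by value, but both caller and
    // callee refer to the same array, so the write is visible to the caller.
    static void updateHolder(int[] holder) {
        holder[0] = 42;
    }

    public static void main(String[] args) {
        int scalar = 0;
        reassignScalar(scalar);
        System.out.println(scalar);               // prints 0

        int[] holder = {0};
        updateHolder(holder);
        System.out.println(holder[0]);            // prints 42
    }
}
```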

Popular methods of TagType

getStartDelimiter
Returns the character sequence that marks the start of the tag. (property [TagType.html#Property] me
register
Registers this tag type for recognition by the parser. (registration related [TagType.html#Registrat
constructTagAt
Constructs a tag of this type at the specified position in the specified source document if it match
deregister
Deregisters this tag type. (registration related [TagType.html#RegistrationRelated] method)
getDescription
Returns a description of this tag type useful for debugging purposes. (property [TagType.html#Proper
getLogger
getTagAt
getTagTypesIgnoringEnclosedMarkup
Returns an array of all the tag types inside which the parser ignores all non- #isServerTag() tags i
isServerTag
Indicates whether this tag type represents a server tag. (property [TagType.html#Property] method) S
tagEncloses
Indicates whether a tag of this type encloses the specified position of the specified source documen
