How to use getMetaTagValue method in edu.uci.ics.crawler4j.parser.HtmlParseData

Best Java code snippets using edu.uci.ics.crawler4j.parser.HtmlParseData.getMetaTagValue (Showing top 4 results out of 315)

/**
 * Classes that extend WebCrawler should override this function to tell the
 * crawler whether the given url should be crawled or not. The following
 * default implementation indicates that all urls should be included in the crawl
 * except those with a nofollow flag.
 *
 * @param referringPage
 *            the Page in which this url was found.
 * @param url
 *            the url which we are interested to know whether it should be
 *            included in the crawl or not.
 * @return true if the url should be included in the crawl,
 *         false otherwise.
 */
public boolean shouldVisit(Page referringPage, WebURL url) {
  if (myController.getConfig().isRespectNoFollow()) {
    // Skip the url if the referring HTML page carries a robots "nofollow"
    // meta tag, or if the link itself is marked rel="nofollow".
    return !((referringPage != null &&
        referringPage.getContentType() != null &&
        referringPage.getContentType().contains("html") &&
        ((HtmlParseData) referringPage.getParseData())
          .getMetaTagValue("robots")
          .contains("nofollow")) ||
        url.getAttribute("rel").contains("nofollow"));
  }
  return true;
}

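// Fragment: true when an HTML page's robots meta tag contains "noindex".
boolean noIndex =
    page.getContentType() != null &&
    page.getContentType().contains("html") &&
    ((HtmlParseData) page.getParseData())
      .getMetaTagValue("robots")
      .contains("noindex");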
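
The snippets above test the robots meta value with a plain substring match. The following is a minimal sketch of the same kind of check that runs without crawler4j on the classpath. The class RobotsMetaSketch, its two helper methods, and the use of a plain Map in place of HtmlParseData are all hypothetical stand-ins, not part of the library. Splitting the value on commas and comparing case-insensitively is one possible refinement, not what the library itself does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in: a plain Map plays the role of the parsed meta tags,
// so this sketch does not depend on crawler4j.
final class RobotsMetaSketch {

    // Looks up a meta tag by name; returns "" when the tag is absent
    // (an assumption that keeps the string checks below null-safe).
    static String metaTagValue(Map<String, String> metaTags, String name) {
        String value = metaTags.get(name);
        return value == null ? "" : value;
    }

    // True when the comma-separated robots value lists the directive,
    // compared case-insensitively and ignoring surrounding whitespace.
    static boolean hasDirective(Map<String, String> metaTags, String directive) {
        String robots = metaTagValue(metaTags, "robots").toLowerCase(Locale.ROOT);
        String wanted = directive.toLowerCase(Locale.ROOT);
        return Arrays.stream(robots.split(","))
                .map(String::trim)
                .anyMatch(wanted::equals);
    }

    public static void main(String[] args) {
        Map<String, String> metaTags = new HashMap<>();
        metaTags.put("robots", "NoIndex, nofollow");
        System.out.println(hasDirective(metaTags, "noindex"));         // true
        System.out.println(hasDirective(metaTags, "nofollow"));        // true
        System.out.println(hasDirective(new HashMap<>(), "nofollow")); // false
    }
}
```

Unlike contains("nofollow"), the token comparison in this sketch does not match a value that merely embeds the directive inside a longer word.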
