DefaultExtractor.getText

How to use getText method in de.l3s.boilerpipe.extractors.DefaultExtractor

Best Java code snippets using de.l3s.boilerpipe.extractors.DefaultExtractor.getText (Showing top 3 results out of 315)

origin: sujitpal/hia-examples

// Extracts the main text from raw HTML.
// Returns null for empty input or when extraction fails.
protected String parse(String rawText) {
 if (StringUtils.isEmpty(rawText)) {
  return null;
 }
 try {
  return DefaultExtractor.INSTANCE.getText(rawText);
 } catch (BoilerpipeProcessingException e) {
  LOGGER.error(e.getMessage(), e);
  return null;
 }
}
origin: ViDA-NYU/ache

public TargetModelElasticSearch(TargetModelCbor model) {
  URL url = Urls.toJavaURL(model.url);
  String rawContent = (String) model.response.get("body");
  Page page = new Page(url, rawContent);
  page.setParsedData(new ParsedData(new PaginaURL(url, rawContent)));
  this.html = rawContent;
  this.url = model.url;
  this.retrieved = new Date(model.timestamp * 1000);
  this.words = page.getParsedData().getWords();
  this.wordsMeta = page.getParsedData().getWordsMeta();
  this.title = page.getParsedData().getTitle();
  this.domain = url.getHost();
  try {
    this.text = DefaultExtractor.getInstance().getText(page.getContentAsString());
  } catch (Exception e) {
    this.text = "";
  }
  InternetDomainName domainName = InternetDomainName.from(page.getDomainName());
  if (domainName.isUnderPublicSuffix()) {
    this.topPrivateDomain = domainName.topPrivateDomain().toString();
  } else {
    this.topPrivateDomain = domainName.toString();
  }
}
origin: ViDA-NYU/ache

public TargetModelElasticSearch(Page page) {
  this.url = page.getURL().toString();
  this.retrieved = page.getFetchTime() > 0 ? new Date(page.getFetchTime()) : new Date();
  this.domain = page.getDomainName();
  this.responseHeaders = page.getResponseHeaders();
  this.topPrivateDomain = LinkRelevance.getTopLevelDomain(page.getDomainName());
  this.crawlerId = page.getCrawlerId();
  this.isRelevant = page.getTargetRelevance().isRelevant() ? "relevant" : "irrelevant";
  if (page.isHtml()) {
    String contentAsString = page.getContentAsString();
    this.html = contentAsString;
    ParsedData parsedData = page.getParsedData();
    if (parsedData != null) {
      this.words = parsedData.getWords();
      this.wordsMeta = parsedData.getWordsMeta();
      this.title = parsedData.getTitle();
    }
    if (page.getTargetRelevance() != null) {
      this.relevance = page.getTargetRelevance().getRelevance();
    }
    if (contentAsString != null) {
      try {
        this.text = DefaultExtractor.getInstance().getText(contentAsString);
      } catch (Exception e) {
        this.text = "";
      }
    }
  }
}
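
All three snippets above share one pattern: check the input, call getText, and fall back to a default value if extraction throws. The following is a minimal sketch of that pattern using only the JDK. The names SafeTextExtraction, TextExtractor, and extractOrEmpty are hypothetical and appear in none of the projects above, and the toy extractor in main is a plain regex tag stripper, not boilerpipe's algorithm.

```java
class SafeTextExtraction {

    /** Hypothetical stand-in for an extractor call that may throw a checked exception. */
    @FunctionalInterface
    interface TextExtractor {
        String getText(String html) throws Exception;
    }

    /**
     * Returns the extracted text, or "" when the input is null or empty,
     * the extractor returns null, or the extractor throws.
     */
    static String extractOrEmpty(TextExtractor extractor, String html) {
        if (html == null || html.isEmpty()) {
            return "";
        }
        try {
            String text = extractor.getText(html);
            return text == null ? "" : text;
        } catch (Exception e) {
            // Same fallback as the constructors above: swallow the failure and use "".
            return "";
        }
    }

    public static void main(String[] args) {
        // Toy extractor for illustration only: removes anything that looks like a tag.
        TextExtractor toy = html -> html.replaceAll("<[^>]*>", "").trim();
        // An extractor that always fails, to exercise the fallback path.
        TextExtractor failing = html -> { throw new IllegalStateException("boom"); };

        System.out.println(extractOrEmpty(toy, "<p>Hello</p>"));
        System.out.println(extractOrEmpty(failing, "<p>Hello</p>").isEmpty());
        System.out.println(extractOrEmpty(toy, null).isEmpty());
    }
}
```

With boilerpipe on the classpath, the method reference DefaultExtractor.getInstance()::getText should fit the TextExtractor shape, since the snippets above call getText with a String and catch BoilerpipeProcessingException. Treat that as an assumption to check against the library version in use.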

Popular methods of DefaultExtractor

  • getInstance
    Returns the singleton instance for DefaultExtractor.
