congrats Icon
New! Tabnine Pro 14-day free trial
Start a free trial
Tabnine Logo
HTMLHighlighter.process
Code IndexAdd Tabnine to your IDE (free)

How to use
process
method
in
de.l3s.boilerpipe.sax.HTMLHighlighter

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLHighlighter.process (Showing top 9 results out of 315)

origin: de.l3s.boilerpipe/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: com.syncthemall/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: pvdlg/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: de.l3s.boilerpipe/boilerpipe

public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 *            The processed {@link TextDocument}.
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 *            The processed {@link TextDocument}.
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param is
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * returns the article from an document with its basic html structure. 
 * 
 * @param HTMLDocument
 * @param URI the uri from the document for resolving the relative anchors in the document to absolute anchors
 * @return String
 */
public String process(HTMLDocument htmlDoc, URI docUri, final BoilerpipeExtractor extractor) {
  final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
  hh.setOutputHighlightOnly(true);
  TextDocument doc;
  String text = "";
  try {
    doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
    extractor.process(doc);
    final InputSource is = htmlDoc.toInputSource();
    text = hh.process(doc, is);
  } catch (Exception ex) {
    return null;
  }
  return removeNotAllowedTags(text, docUri);
}
de.l3s.boilerpipe.saxHTMLHighlighterprocess

Javadoc

Processes the given TextDocument and the original HTML text (as a String).

Popular methods of HTMLHighlighter

  • <init>
  • setExtraStyleSheet
    Sets the extra stylesheet definition that will be inserted in the HEAD element. To disable, set it t
  • setOutputHighlightOnly
    Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML docum
  • setPostHighlight
    Sets the string that will be inserted after any highlighted HTML block. To disable, set it to the em
  • setPreHighlight
    Sets the string that will be inserted prior to any highlighted HTML block. To disable, set it to the
  • newExtractingInstance
    Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enc

Popular in Java

  • Running tasks concurrently on multiple threads
  • onCreateOptionsMenu (Activity)
  • setScale (BigDecimal)
  • getOriginalFilename (MultipartFile)
    Return the original filename in the client's filesystem.This may contain path information depending
  • FileInputStream (java.io)
    An input stream that reads bytes from a file. File file = ...finally if (in != null) in.clos
  • Proxy (java.net)
    This class represents proxy server settings. A created instance of Proxy stores a type and an addres
  • Hashtable (java.util)
    A plug-in replacement for JDK1.5 java.util.Hashtable. This version is based on org.cliffc.high_scale
  • Timer (java.util)
    Timers schedule one-shot or recurring TimerTask for execution. Prefer java.util.concurrent.Scheduled
  • JarFile (java.util.jar)
    JarFile is used to read jar entries and their associated data from jar files.
  • DateTimeFormat (org.joda.time.format)
    Factory that creates instances of DateTimeFormatter from patterns and styles. Datetime formatting i
  • Top 17 Free Sublime Text Plugins
Tabnine Logo
  • Products

    Search for Java codeSearch for JavaScript code
  • IDE Plugins

    IntelliJ IDEAWebStormVisual StudioAndroid StudioEclipseVisual Studio CodePyCharmSublime TextPhpStormVimAtomGoLandRubyMineEmacsJupyter NotebookJupyter LabRiderDataGripAppCode
  • Company

    About UsContact UsCareers
  • Resources

    FAQBlogTabnine AcademyStudentsTerms of usePrivacy policyJava Code IndexJavascript Code Index
Get Tabnine for your IDE now