de.l3s.boilerpipe.sax.HTMLHighlighter.process java code examples

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
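
The String overload above simply wraps the HTML text in a SAX `InputSource` backed by a `StringReader` and delegates to the `InputSource` overload. The following is a minimal sketch of that wrapping step using only JDK classes; the class name `StringInputSourceSketch` and the helpers `toInputSource` and `readAll` are hypothetical and are not part of boilerpipe.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only (not part of boilerpipe).
final class StringInputSourceSketch {

  /** Wraps in-memory HTML text the same way the String overload does. */
  static InputSource toInputSource(String html) {
    return new InputSource(new StringReader(html));
  }

  /** Drains the character stream of an InputSource into a String. */
  static String readAll(InputSource is) {
    StringBuilder sb = new StringBuilder();
    try (Reader r = is.getCharacterStream()) {
      char[] buf = new char[1024];
      int n;
      while ((n = r.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String html = "<html><body><p>Hello</p></body></html>";
    // A SAX parser would consume this InputSource; here we just read it back.
    System.out.println(readAll(toInputSource(html)));
  }
}
```

Because the text is supplied through a `Reader`, the characters are passed along as-is and no byte-level charset decoding takes place at this step.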

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
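 * @param url
 *            The URL of the HTML document to fetch.
 * @param extractor
 *            The extractor used to process the fetched document.
 * @return The highlighted HTML.
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException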
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
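
The method above calls `htmlDoc.toInputSource()` twice: once to build the `TextDocument` and once for the highlighting pass. A likely reason is that an `InputSource` backed by a stream or reader can only be consumed once. The sketch below demonstrates that behaviour with JDK classes only; `TwoPassInputSketch` and its methods are hypothetical names, and the `Supplier` stands in for whatever produces a fresh source per pass.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Supplier;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only (not part of boilerpipe).
final class TwoPassInputSketch {

  /** Counts the characters that can still be read from the source. */
  static int countChars(InputSource is) {
    try {
      Reader r = is.getCharacterStream();
      int count = 0;
      while (r.read() != -1) {
        count++;
      }
      return count;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reuses one InputSource for both passes: the second pass sees nothing. */
  static int[] readTwiceFromSame(String html) {
    InputSource is = new InputSource(new StringReader(html));
    return new int[] {countChars(is), countChars(is)};
  }

  /** Asks for a fresh InputSource per pass: both passes see the full text. */
  static int[] readTwiceFromFresh(Supplier<InputSource> source) {
    return new int[] {countChars(source.get()), countChars(source.get())};
  }

  public static void main(String[] args) {
    String html = "<p>Hi</p>";
    int[] same = readTwiceFromSame(html);
    int[] fresh = readTwiceFromFresh(() -> new InputSource(new StringReader(html)));
    System.out.println(same[0] + " then " + same[1]);
    System.out.println(fresh[0] + " then " + fresh[1]);
  }
}
```

The same reasoning applies to any two-pass design over SAX input: keep the raw document in memory and create a new `InputSource` for each pass.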

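/**
 * Returns the article from a document, keeping its basic HTML structure.
 * 
 * @param htmlDoc the fetched HTML document
 * @param docUri the URI of the document, used to resolve relative anchors in the document to absolute anchors
 * @param extractor the extractor used to identify the article content
 * @return the extracted article HTML, or null if processing fails
 */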
public String process(HTMLDocument htmlDoc, URI docUri, final BoilerpipeExtractor extractor) {
  final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
  hh.setOutputHighlightOnly(true);
  TextDocument doc;
  String text = "";
  try {
    doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
    extractor.process(doc);
    final InputSource is = htmlDoc.toInputSource();
    text = hh.process(doc, is);
  } catch (Exception ex) {
    return null;
  }
  return removeNotAllowedTags(text, docUri);
}
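
The snippet above passes `docUri` to `removeNotAllowedTags`, whose body is not shown in the source. As a rough sketch of what resolving relative anchors against the document URI could look like, the following uses `java.net.URI.resolve`; the class `AnchorResolutionSketch` and the method `toAbsolute` are hypothetical and make no claim about how the original helper works.

```java
import java.net.URI;

// Hypothetical helper, for illustration only; the original
// removeNotAllowedTags implementation is not shown in the source.
final class AnchorResolutionSketch {

  /** Resolves an href against the document URI; malformed hrefs are returned unchanged. */
  static String toAbsolute(URI docUri, String href) {
    try {
      return docUri.resolve(href).toString();
    } catch (IllegalArgumentException e) {
      // URI.resolve(String) rejects strings that are not valid URI references.
      return href;
    }
  }

  public static void main(String[] args) {
    URI docUri = URI.create("https://example.com/news/article.html");
    System.out.println(toAbsolute(docUri, "../img/a.png"));
    System.out.println(toAbsolute(docUri, "/about"));
    System.out.println(toAbsolute(docUri, "https://other.org/x"));
  }
}
```

Already absolute links pass through unchanged, so the helper can be applied to every `href` without checking first.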

Javadoc

Processes the given TextDocument and the original HTML text (as a String).

Popular methods of HTMLHighlighter

<init>
setExtraStyleSheet
Sets the extra stylesheet definition that will be inserted in the HEAD element. To disable, set it t...
setOutputHighlightOnly
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML docum...
setPostHighlight
Sets the string that will be inserted after any highlighted HTML block. To disable, set it to the em...
setPreHighlight
Sets the string that will be inserted prior to any highlighted HTML block. To disable, set it to the...
newExtractingInstance
Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enc...

How to use the process method in de.l3s.boilerpipe.sax.HTMLHighlighter

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLHighlighter.process (Showing top 9 results out of 315)
