
How to use HTMLHighlighter in de.l3s.boilerpipe.sax

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLHighlighter (Showing top 20 results out of 315)

origin: com.syncthemall/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: de.l3s.boilerpipe/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: Netbreeze-GmbH/boilerpipe

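/**
 * Returns the article from a document with its basic HTML structure.
 * 
 * @param htmlDoc the HTML document to process
 * @param docUri the URI of the document, used to resolve relative anchors in the document to absolute anchors
 * @param extractor the extractor applied to the document before highlighting
 * @return the extracted article HTML, or null if processing fails
 */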
public String process(HTMLDocument htmlDoc, URI docUri, final BoilerpipeExtractor extractor) {
  final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
  hh.setOutputHighlightOnly(true);
  TextDocument doc;
  String text = "";
  try {
    doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
    extractor.process(doc);
    final InputSource is = htmlDoc.toInputSource();
    text = hh.process(doc, is);
  } catch (Exception ex) {
    return null;
  }
  return removeNotAllowedTags(text, docUri);
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
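
The String overload above only wraps the HTML in an InputSource backed by a StringReader. The following is a minimal sketch of that wrapping step using JDK classes alone; InputSourceSketch, fromString and readAll are hypothetical names introduced for illustration and are not part of boilerpipe.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of boilerpipe.
final class InputSourceSketch {

    /** Wraps an in-memory HTML string in a character-stream InputSource. */
    static InputSource fromString(String html) {
        return new InputSource(new StringReader(html));
    }

    /** Drains the character stream of an InputSource back into a String. */
    static String readAll(InputSource source) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = source.getCharacterStream()) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InputSource is = fromString("<html><body><p>Hello</p></body></html>");
        // The underlying Reader is consumed by this call, so a second pass
        // over the same HTML needs a fresh InputSource.
        System.out.println(readAll(is));
    }
}
```

Because the underlying reader can only be consumed once, a second pass over the same HTML needs a new InputSource. That is presumably why the URL-based snippets on this page call htmlDoc.toInputSource() a second time before highlighting.
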
origin: com.syncthemall/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: pvdlg/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("\n<style type=\"text/css\">\n"
        + "A:before { content:' '; } \n" //
        + "A:after { content:' '; } \n" //
        + "SPAN:before { content:' '; } \n" //
        + "SPAN:after { content:' '; } \n" //
        + "</style>\n");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return only the
 * extracted HTML text, including enclosed markup.
 */
public static HTMLHighlighter newExtractingInstance() {
  return new HTMLHighlighter(true);
}
origin: pvdlg/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: Netbreeze-GmbH/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("\n<style type=\"text/css\">\n"
        + "A:before { content:' '; } \n" //
        + "A:after { content:' '; } \n" //
        + "SPAN:before { content:' '; } \n" //
        + "SPAN:after { content:' '; } \n" //
        + "</style>\n");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: pvdlg/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Processes the given {@link TextDocument} and the original HTML text (as a
 * String).
 * 
 * @param doc
 *            The processed {@link TextDocument}.
 * @param origHTML
 *            The original HTML document.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final TextDocument doc, final String origHTML)
    throws BoilerpipeProcessingException {
  return process(doc, new InputSource(new StringReader(origHTML)));
}
origin: com.syncthemall/boilerpipe

private HTMLHighlighter(final boolean extractHTML) {
  if (extractHTML) {
    setOutputHighlightOnly(true);
    setExtraStyleSheet("\n<style type=\"text/css\">\n"
        + "A:before { content:' '; } \n" //
        + "A:after { content:' '; } \n" //
        + "SPAN:before { content:' '; } \n" //
        + "SPAN:after { content:' '; } \n" //
        + "</style>\n");
    setPreHighlight("");
    setPostHighlight("");
  }
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: de.l3s.boilerpipe/boilerpipe

public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return only the
 * extracted HTML text, including enclosed markup.
 */
public static HTMLHighlighter newExtractingInstance() {
  return new HTMLHighlighter(true);
}
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url
 *            The URL to fetch.
 * @param extractor
 *            The extractor used to process the retrieved HTML.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: com.syncthemall/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return only the
 * extracted HTML text, including enclosed markup.
 */
public static HTMLHighlighter newExtractingInstance() {
  return new HTMLHighlighter(true);
}
origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url
 *            The URL to fetch.
 * @param extractor
 *            The extractor used to process the retrieved HTML.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Creates a new {@link HTMLHighlighter}, which is set-up to return the full
 * HTML text, with the extracted text portion <b>highlighted</b>.
 */
public static HTMLHighlighter newHighlightingInstance() {
  return new HTMLHighlighter(false);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url
 *            The URL to fetch.
 * @param extractor
 *            The extractor used to process the retrieved HTML.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
de.l3s.boilerpipe.sax.HTMLHighlighter

Javadoc

Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.

Most used methods

  • <init>
  • process
    Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
  • setExtraStyleSheet
    Sets the extra stylesheet definition that will be inserted in the HEAD element. To disable, set it t
  • setOutputHighlightOnly
    Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML docum
  • setPostHighlight
    Sets the string that will be inserted after any highlighted HTML block. To disable, set it to the em
  • setPreHighlight
    Sets the string that will be inserted prior to any highlighted HTML block. To disable, set it to the
  • newExtractingInstance
    Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enclosed markup.
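
The two factory methods and the setters listed above interact through a single boolean passed to a private constructor. The toy model below sketches that pattern under loud assumptions: HighlighterSketch, Block and render are hypothetical stand-ins, the `<b>` markers are an assumed default chosen for this sketch only, and a page is modeled as a flat list of blocks rather than a SAX stream. It is not boilerpipe's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not boilerpipe's HTMLHighlighter.
final class HighlighterSketch {

    /** A piece of markup plus a flag saying whether it was marked as content. */
    static final class Block {
        final String html;
        final boolean content;

        Block(String html, boolean content) {
            this.html = html;
            this.content = content;
        }
    }

    private boolean outputHighlightOnly = false;
    private String preHighlight = "<b>";   // assumed default, for this sketch only
    private String postHighlight = "</b>"; // assumed default, for this sketch only

    private HighlighterSketch(boolean extractHtml) {
        if (extractHtml) {
            // Extracting mode: keep only content blocks and add no markers.
            outputHighlightOnly = true;
            preHighlight = "";
            postHighlight = "";
        }
    }

    static HighlighterSketch newHighlightingInstance() {
        return new HighlighterSketch(false);
    }

    static HighlighterSketch newExtractingInstance() {
        return new HighlighterSketch(true);
    }

    /** Wraps content blocks in the markers; drops other blocks in highlight-only mode. */
    String render(List<Block> blocks) {
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            if (b.content) {
                out.append(preHighlight).append(b.html).append(postHighlight);
            } else if (!outputHighlightOnly) {
                out.append(b.html);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Block> page = Arrays.asList(
                new Block("<p>menu</p>", false),
                new Block("<p>article</p>", true));
        System.out.println(newHighlightingInstance().render(page)); // <p>menu</p><b><p>article</p></b>
        System.out.println(newExtractingInstance().render(page));   // <p>article</p>
    }
}
```

Keeping the constructor private and exposing two named factories means callers choose a mode by name instead of passing a bare boolean, while the setters remain available for adjusting either mode afterwards.
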
