HTMLFetcher.fetch

How to use the fetch method in de.l3s.boilerpipe.sax.HTMLFetcher

Best Java code snippets using de.l3s.boilerpipe.sax.HTMLFetcher.fetch (Showing top 15 results out of 315)

origin: Netbreeze-GmbH/boilerpipe

/**
 * Returns the article from a URL with its basic HTML structure.
 * 
 */
public String process(final BoilerpipeExtractor extractor, final URL url)
    throws IOException, BoilerpipeProcessingException, SAXException, URISyntaxException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  return process(htmlDoc, url.toURI(), extractor);
}

origin: com.syncthemall/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly to be used for show case purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly to be used for show case purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: de.l3s.boilerpipe/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly to be used for show case purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: pvdlg/boilerpipe

/**
 * Extracts text from the HTML code available from the given {@link URL}.
 * NOTE: This method is mainly to be used for show case purposes. If you are
 * going to crawl the Web, consider using {@link #getText(InputSource)}
 * instead.
 * 
 * @param url  The URL pointing to the HTML code.
 * @return  The extracted text.
 * @throws BoilerpipeProcessingException
 */
public String getText(final URL url) throws BoilerpipeProcessingException {
  try {
    return getText(HTMLFetcher.fetch(url).toInputSource());
  } catch (IOException e) {
    throw new BoilerpipeProcessingException(e);
  }
}
origin: de.l3s.boilerpipe/boilerpipe

public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url The URL of the document to fetch.
 * @param extractor The extractor to use.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url The URL of the document to fetch.
 * @param extractor The extractor to use.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url The URL of the document to fetch.
 * @param extractor The extractor to use.
 * @return A List of enclosed {@link Image}s
 * @throws BoilerpipeProcessingException
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url The URL of the document to fetch.
 * @param extractor The extractor to use.
 * @return A List of enclosed {@link Image}s
 * @throws BoilerpipeProcessingException
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

origin: com.syncthemall/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the retrieved HTML using the specified
 * {@link BoilerpipeExtractor}.
 * 
 * @param url the url of the document to fetch
 * @param extractor extractor to use
 * 
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
@SuppressWarnings("javadoc")
public List<Media> process(final URL url, final BoilerpipeExtractor extractor) throws IOException,
    BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url The URL of the document to fetch.
 * @param extractor The extractor to use.
 * @return The highlighted HTML.
 * @throws BoilerpipeProcessingException
 */
public String process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * 
 * @param url The URL of the document to fetch.
 * @param extractor The extractor to use.
 * @return A List of enclosed {@link Image}s
 * @throws BoilerpipeProcessingException
 */
public List<Image> process(final URL url, final BoilerpipeExtractor extractor)
    throws IOException, BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
      .getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}

origin: pvdlg/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the retrieved HTML using the specified
 * {@link BoilerpipeExtractor}.
 * 
 * @param url the url of the document to fetch
 * @param extractor extractor to use
 * 
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
@SuppressWarnings("javadoc")
public List<Media> process(final URL url, final BoilerpipeExtractor extractor) throws IOException,
    BoilerpipeProcessingException, SAXException {
  final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
  final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
  extractor.process(doc);
  final InputSource is = htmlDoc.toInputSource();
  return process(doc, is);
}
origin: Netbreeze-GmbH/boilerpipe

/**
 * Fetches the given {@link URL} using {@link HTMLFetcher} and processes the
 * retrieved HTML using the specified {@link BoilerpipeExtractor}.
 * @param url the url of the document to fetch
 * @param extractor extractor to use
 *
 * @return A List of enclosed {@link Image}s
 * @throws IOException
 * @throws BoilerpipeProcessingException
 * @throws SAXException
 */
@SuppressWarnings("javadoc")
public List<Media> process(final URL url, final BoilerpipeExtractor extractor)
        throws IOException, BoilerpipeProcessingException, SAXException {
    final HTMLDocument htmlDoc = HTMLFetcher.fetch(url);
    final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource())
            .getTextDocument();
    extractor.process(doc);
    final InputSource is = htmlDoc.toInputSource();
    return process(doc, is);
}
de.l3s.boilerpipe.sax.HTMLFetcher.fetch

Javadoc

Fetches the document at the given URL, using URLConnection.
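
To make the Javadoc concrete, here is a minimal sketch of what fetching a document through URLConnection can look like using only the JDK. It is not boilerpipe's implementation: the class name UrlConnectionFetchSketch, the method charsetFrom, and the UTF-8 fallback are assumptions made for illustration. It returns a plain String, whereas the snippets above show HTMLFetcher.fetch returning an HTMLDocument.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of boilerpipe.
final class UrlConnectionFetchSketch {

  // Matches the charset parameter of a Content-Type header, quoted or not.
  private static final Pattern CHARSET =
      Pattern.compile("charset=\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

  // Picks the charset named in the Content-Type header, or the fallback
  // when the header is missing, has no charset, or names an unknown one.
  static Charset charsetFrom(String contentType, Charset fallback) {
    if (contentType == null) {
      return fallback;
    }
    Matcher m = CHARSET.matcher(contentType);
    if (!m.find()) {
      return fallback;
    }
    try {
      return Charset.forName(m.group(1));
    } catch (IllegalArgumentException e) {
      // Illegal or unsupported charset name.
      return fallback;
    }
  }

  // Opens the URL, reads the whole body, and decodes it.
  // Assumption: UTF-8 is used when the server does not declare a charset.
  static String fetch(URL url) throws IOException {
    URLConnection conn = url.openConnection();
    Charset cs = charsetFrom(conn.getContentType(), StandardCharsets.UTF_8);
    try (InputStream in = conn.getInputStream();
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), cs);
    }
  }
}
```

Calling UrlConnectionFetchSketch.fetch(new URL("https://example.com/")) would return the page body as a String. The sketch sets no timeouts and no User-Agent, and does not handle compressed responses, which is in line with the Javadoc note above that URL-based fetching is meant for showcase purposes rather than crawling.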
