TextExtractor

How to use TextExtractor in it.unimi.dsi.parser.callback

Best Java code snippets using it.unimi.dsi.parser.callback.TextExtractor (Showing top 4 results out of 315)

origin: it.unimi.di/mg4j-big

protected void init() {
  this.parser = new BulletParser();
  
  ComposedCallbackBuilder composedBuilder = new ComposedCallbackBuilder();
  composedBuilder.add( this.textExtractor = new TextExtractor() );
  composedBuilder.add( this.anchorExtractor = new AnchorExtractor( maxPreAnchor, maxAnchor, maxPostAnchor, delimiter ) ); 
  parser.setCallback( composedBuilder.compose() );
  Object o;
  try {
    o = defaultMetadata.get( PropertyBasedDocumentFactory.MetadataKeys.WORDREADER );
    wordReader = o == null ? new FastBufferedReader() : ObjectParser.fromSpec( o.toString(), WordReader.class, MG4JClassParser.PACKAGE );
  }
  catch ( Exception e ) {
    throw new RuntimeException( e );
  }
  text = new char[ DEFAULT_BUFFER_SIZE ];
}
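The try block above reads an optional WordReader specification from the default metadata, falls back to a FastBufferedReader when none is set, and rethrows any failure as a RuntimeException. The stdlib-only sketch below mimics that configured-or-default shape using plain reflection. The helper name configuredOrDefault, the string key, and treating the "spec" as a fully qualified class name with a public no-arg constructor are illustrative assumptions. This is not how MG4J's ObjectParser.fromSpec actually parses specs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ConfiguredOrDefault {
    // Hypothetical helper: instantiate the class named under `key`, or fall back to a default.
    // It only mimics the shape of the snippet; it is NOT MG4J's ObjectParser.fromSpec.
    static <T> T configuredOrDefault(Map<String, Object> metadata, String key,
                                     Class<T> type, Supplier<? extends T> fallback) {
        Object o = metadata.get(key);
        if (o == null) return fallback.get(); // nothing configured: use the default
        try {
            // Assumption: the value is a fully qualified class name with a public no-arg constructor.
            Class<?> cls = Class.forName(o.toString());
            return type.cast(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Like the snippet, surface configuration errors as unchecked exceptions.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        CharSequence a = configuredOrDefault(metadata, "WORDREADER", CharSequence.class, () -> "default");
        metadata.put("WORDREADER", "java.lang.StringBuilder");
        CharSequence b = configuredOrDefault(metadata, "WORDREADER", CharSequence.class, () -> "default");
        System.out.println(a + " / " + b.getClass().getSimpleName()); // prints "default / StringBuilder"
    }
}
```

Taking the fallback as a Supplier means the default object is only built when no configuration is present.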
origin: it.unimi.dsi/mg4j

private void init() {
  this.parser = new BulletParser();
  
  ComposedCallbackBuilder composedBuilder = new ComposedCallbackBuilder();
  composedBuilder.add( this.textExtractor = new TextExtractor() );
  composedBuilder.add( this.anchorExtractor = new AnchorExtractor( maxPreAnchor, maxAnchor, maxPostAnchor ) ); 
  parser.setCallback( composedBuilder.compose() );
  Object o;
  try {
    o = defaultMetadata.get( PropertyBasedDocumentFactory.MetadataKeys.WORDREADER );
    wordReader = o == null ? new FastBufferedReader() : ObjectParser.fromSpec( o.toString(), WordReader.class, MG4JClassParser.PACKAGE );
  }
  catch ( Exception e ) {
    throw new RuntimeException( e );
  }
  text = new char[ DEFAULT_BUFFER_SIZE ];
}
origin: it.unimi.dsi/mg4j-big (same code as the it.unimi.dsi/mg4j snippet above)
origin: it.unimi.di/mg4j

protected void init() {
  this.parser = new BulletParser();
  
  ComposedCallbackBuilder composedBuilder = new ComposedCallbackBuilder();
  composedBuilder.add( this.textExtractor = new TextExtractor() );
  composedBuilder.add( this.anchorExtractor = new AnchorExtractor( maxPreAnchor, maxAnchor, maxPostAnchor ) ); 
  parser.setCallback( composedBuilder.compose() );
  Object o;
  try {
    o = defaultMetadata.get( PropertyBasedDocumentFactory.MetadataKeys.WORDREADER );
    wordReader = o == null ? new FastBufferedReader() : ObjectParser.fromSpec( o.toString(), WordReader.class, MG4JClassParser.PACKAGE );
  }
  catch ( Exception e ) {
    throw new RuntimeException( e );
  }
  text = new char[ DEFAULT_BUFFER_SIZE ];
}
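All of these snippets follow the same pattern. ComposedCallbackBuilder combines a TextExtractor and an AnchorExtractor into a single callback, so one BulletParser pass feeds both. The following is a minimal stdlib-only sketch of that fan-out idea. The Callback interface, its single characters method, and the Composed and Collector classes are hypothetical simplifications, not the dsiutils API, whose callbacks also receive other events such as element starts and ends.

```java
import java.util.ArrayList;
import java.util.List;

public class ComposedCallbackSketch {
    // Hypothetical, simplified parser callback (NOT the dsiutils Callback interface).
    interface Callback {
        void characters(String text);
    }

    // Forwards every event to each registered callback, in registration order.
    static final class Composed implements Callback {
        private final List<Callback> callbacks = new ArrayList<>();

        Composed add(Callback c) {
            callbacks.add(c);
            return this;
        }

        @Override
        public void characters(String text) {
            for (Callback c : callbacks) c.characters(text);
        }
    }

    // Records all character data it receives.
    static final class Collector implements Callback {
        final StringBuilder seen = new StringBuilder();

        @Override
        public void characters(String text) {
            seen.append(text);
        }
    }

    public static void main(String[] args) {
        Collector a = new Collector();
        Collector b = new Collector();
        Callback composed = new Composed().add(a).add(b);
        // A single "parse" pass delivers each event to both collectors.
        composed.characters("Hello, ");
        composed.characters("world");
        System.out.println(a.seen + " | " + b.seen); // prints "Hello, world | Hello, world"
    }
}
```

Composition keeps each extractor focused on one job while the document is parsed only once.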
it.unimi.dsi.parser.callback.TextExtractor

Javadoc

A callback extracting text and titles.

This callback extracts all the text in the page, as well as the title. The resulting text is available through #text, and the title through #title.

Note that #text and #title are never trimmed.
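Here is a rough sketch of the behavior this Javadoc describes. It assumes a simplified, hypothetical event model (startDocument, startElement, endElement, characters with plain element names) rather than the real dsiutils callback signatures. Character data inside a title element is copied to the title buffer. As a further assumption, all character data, title included, goes to the text buffer. Neither buffer is trimmed.

```java
public class TextExtractorSketch {
    // Hypothetical simplified extractor: NOT the dsiutils TextExtractor API.
    static final class SimpleTextExtractor {
        final StringBuilder text = new StringBuilder();
        final StringBuilder title = new StringBuilder();
        private boolean inTitle;

        // Clears state so the same extractor can be reused for the next document.
        void startDocument() {
            text.setLength(0);
            title.setLength(0);
            inTitle = false;
        }

        void startElement(String name) {
            if (name.equalsIgnoreCase("title")) inTitle = true;
        }

        void endElement(String name) {
            if (name.equalsIgnoreCase("title")) inTitle = false;
        }

        void characters(String chars) {
            // Assumption: every run of character data, including the title, is page text.
            text.append(chars);
            if (inTitle) title.append(chars);
            // No trimming anywhere: leading/trailing whitespace is preserved.
        }
    }

    public static void main(String[] args) {
        SimpleTextExtractor ex = new SimpleTextExtractor();
        // Events a parser might emit for: <title> My page </title><p>Hello </p>
        ex.startDocument();
        ex.startElement("title");
        ex.characters(" My page ");
        ex.endElement("title");
        ex.startElement("p");
        ex.characters("Hello ");
        ex.endElement("p");
        // Brackets make the untrimmed spaces visible.
        System.out.println("[" + ex.title + "] [" + ex.text + "]"); // prints "[ My page ] [ My page Hello ]"
    }
}
```

Because nothing is trimmed, callers that need clean strings should trim the results themselves.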

Most used methods

  • <init>
