NGramTokenFilter

How to use NGramTokenFilter in org.apache.lucene.analysis.ngram

Best Java code snippets using org.apache.lucene.analysis.ngram.NGramTokenFilter (Showing top 12 results out of 315)

origin: org.apache.lucene/lucene-analyzers-common

 @Override
 public TokenFilter create(TokenStream input) {
  return new NGramTokenFilter(input, minGramSize, maxGramSize, preserveOriginal);
 }
}
origin: org.apache.lucene/lucene-analyzers

// excerpt from incrementToken(): copy the current gram into the term attribute
clearAttributes();
termAtt.copyBuffer(curTermBuffer, curPos, curGramSize);
if (hasIllegalOffsets) {
origin: org.apache.lucene/lucene-analyzers-common

// non-contiguous excerpt from incrementToken(): gram boundaries are
// computed by code point, so supplementary characters stay intact
 return false;
state = captureState();
restoreState(state);
final int start = Character.offsetByCodePoints(curTermBuffer, 0, curTermLength, 0, curPos);
final int end = Character.offsetByCodePoints(curTermBuffer, 0, curTermLength, start, curGramSize);
restoreState(state);
posIncrAtt.setPositionIncrement(0);
termAtt.copyBuffer(curTermBuffer, 0, curTermLength);
origin: org.infinispan/infinispan-embedded-query

/**
 * Creates NGramTokenFilter with given min and max n-grams.
 * @param input {@link TokenStream} holding the input to be tokenized
 * @param minGram the smallest n-gram to generate
 * @param maxGram the largest n-gram to generate
 */
public NGramTokenFilter(TokenStream input, int minGram, int maxGram) {
 super(new CodepointCountFilter(input, minGram, Integer.MAX_VALUE));
 this.charUtils = CharacterUtils.getInstance();
 if (minGram < 1) {
  throw new IllegalArgumentException("minGram must be greater than zero");
 }
 if (minGram > maxGram) {
  throw new IllegalArgumentException("minGram must not be greater than maxGram");
 }
 this.minGram = minGram;
 this.maxGram = maxGram;
 posIncAtt = addAttribute(PositionIncrementAttribute.class);
 posLenAtt = addAttribute(PositionLengthAttribute.class);
}
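The constructor above wraps the input in a CodepointCountFilter, and the filter walks the term by code points rather than by char indices, so supplementary characters (outside the Basic Multilingual Plane) are never split across two grams. A minimal, Lucene-free sketch of the same `Character.offsetByCodePoints` pattern (`codePointGrams` is a hypothetical helper written for illustration, not part of the Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointGrams {
    // Enumerate all grams of a fixed size, counting positions in code
    // points: offsetByCodePoints maps a code-point index to the char
    // index, so a surrogate pair is always kept whole.
    static List<String> codePointGrams(String term, int gramSize) {
        List<String> grams = new ArrayList<>();
        char[] buf = term.toCharArray();
        int codePoints = term.codePointCount(0, term.length());
        for (int pos = 0; pos + gramSize <= codePoints; pos++) {
            int start = Character.offsetByCodePoints(buf, 0, buf.length, 0, pos);
            int end = Character.offsetByCodePoints(buf, 0, buf.length, start, gramSize);
            grams.add(new String(buf, start, end - start));
        }
        return grams;
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) is a surrogate pair in UTF-16
        System.out.println(codePointGrams("a\uD834\uDD1Eb", 1));
        // [a, 𝄞, b]
    }
}
```

For BMP-only terms this degenerates to plain substring arithmetic; the code-point indirection only matters when the term contains surrogate pairs.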
origin: org.infinispan/infinispan-embedded-query

// excerpt from incrementToken() (older variant using CharacterUtils)
clearAttributes();
final int start = charUtils.offsetByCodePoints(curTermBuffer, 0, curTermLength, 0, curPos);
final int end = charUtils.offsetByCodePoints(curTermBuffer, 0, curTermLength, start, curGramSize);
origin: rnewson/couchdb-lucene

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
    return new TokenStreamComponents(components.getTokenizer(),
      new NGramTokenFilter(components.getTokenStream(),
        this.min, this.max));
  }
}
origin: com.strapdata.elasticsearch/elasticsearch

  @Override
  public TokenStream create(TokenStream tokenStream) {
    return new NGramTokenFilter(tokenStream, minGram, maxGram);
  }
}
origin: com.strapdata.elasticsearch/elasticsearch

  @Override
  public TokenStream create(TokenStream tokenStream, Version version) {
    return new NGramTokenFilter(tokenStream);
  }
},
origin: org.dspace.dependencies.solr/dspace-solr-core

 public NGramTokenFilter create(TokenStream input) {
  return new NGramTokenFilter(input, minGramSize, maxGramSize);
 }
}
origin: org.codelibs.elasticsearch.module/analysis-common

@Override
public TokenStream create(TokenStream tokenStream) {
  return new NGramTokenFilter(tokenStream, minGram, maxGram);
}
origin: org.infinispan/infinispan-embedded-query

 @Override
 public TokenFilter create(TokenStream input) {
  if (luceneMatchVersion.onOrAfter(Version.LUCENE_4_4_0)) {
   return new NGramTokenFilter(input, minGramSize, maxGramSize);
  }
  return new Lucene43NGramTokenFilter(input, minGramSize, maxGramSize);
 }
}
origin: org.codelibs.elasticsearch.module/analysis-common

// excerpt: registration of the deprecated [nGram] filter name
            + "Please change the filter name to [ngram] instead.");
  return new NGramTokenFilter(reader);
}));
org.apache.lucene.analysis.ngram.NGramTokenFilter

Javadoc

Tokenizes the input into n-grams of the given size(s). As of Lucene 4.4, this token filter:
  • handles supplementary characters correctly,
  • emits all n-grams for the same token at the same position,
  • does not modify offsets,
  • sorts n-grams by their offset in the original token first, then increasing length (meaning that "abc" will give "a", "ab", "abc", "b", "bc", "c").

If you were using this TokenFilter to perform partial highlighting, this won't work anymore since this filter doesn't update offsets. You should modify your analysis chain to use NGramTokenizer, and potentially override NGramTokenizer#isTokenChar(int) to perform pre-tokenization.
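The ordering described above (grams sorted by their start offset in the original token first, then by increasing length) can be reproduced with a small, Lucene-free sketch; `gramsInFilterOrder` is a hypothetical helper for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

public class NGramOrder {
    // Emit every gram with length in [minGram, maxGram], grouped by
    // start offset and, within each offset, by increasing length —
    // the order NGramTokenFilter has used since Lucene 4.4.
    static List<String> gramsInFilterOrder(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(gramsInFilterOrder("abc", 1, 3));
        // [a, ab, abc, b, bc, c]
    }
}
```

Note this sketch indexes by char, not by code point, so it matches the real filter only for BMP input; it exists to show the emission order, not the full algorithm.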

Most used methods

  • <init>
    Creates an NGramTokenFilter that, for a given input term, produces all contained n-grams with lengths between minGram and maxGram
  • clearAttributes
  • addAttribute
  • captureState
  • restoreState
