How to use NGramTokenFilter in org.apache.lucene.analysis.ngram

Best Java code snippets using org.apache.lucene.analysis.ngram.NGramTokenFilter (Showing top 12 results out of 315)

origin: org.apache.lucene/lucene-analyzers-common

 @Override
 public TokenFilter create(TokenStream input) {
  return new NGramTokenFilter(input, minGramSize, maxGramSize, preserveOriginal);
 }
}
origin: org.apache.lucene/lucene-analyzers

clearAttributes();
termAtt.copyBuffer(curTermBuffer, curPos, curGramSize);
if (hasIllegalOffsets) {
origin: org.apache.lucene/lucene-analyzers-common

 return false;
state = captureState();
restoreState(state);
final int start = Character.offsetByCodePoints(curTermBuffer, 0, curTermLength, 0, curPos);
final int end = Character.offsetByCodePoints(curTermBuffer, 0, curTermLength, start, curGramSize);
restoreState(state);
posIncrAtt.setPositionIncrement(0);
termAtt.copyBuffer(curTermBuffer, 0, curTermLength);
origin: org.infinispan/infinispan-embedded-query

/**
 * Creates NGramTokenFilter with given min and max n-grams.
 * @param input {@link TokenStream} holding the input to be tokenized
 * @param minGram the smallest n-gram to generate
 * @param maxGram the largest n-gram to generate
 */
public NGramTokenFilter(TokenStream input, int minGram, int maxGram) {
 super(new CodepointCountFilter(input, minGram, Integer.MAX_VALUE));
 this.charUtils = CharacterUtils.getInstance();
 if (minGram < 1) {
  throw new IllegalArgumentException("minGram must be greater than zero");
 }
 if (minGram > maxGram) {
  throw new IllegalArgumentException("minGram must not be greater than maxGram");
 }
 this.minGram = minGram;
 this.maxGram = maxGram;
 posIncAtt = addAttribute(PositionIncrementAttribute.class);
 posLenAtt = addAttribute(PositionLengthAttribute.class);
}
origin: org.infinispan/infinispan-embedded-query

clearAttributes();
final int start = charUtils.offsetByCodePoints(curTermBuffer, 0, curTermLength, 0, curPos);
final int end = charUtils.offsetByCodePoints(curTermBuffer, 0, curTermLength, start, curGramSize);
origin: rnewson/couchdb-lucene

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
    return new TokenStreamComponents(components.getTokenizer(),
      new NGramTokenFilter(components.getTokenStream(),
        this.min, this.max));
  }
}
origin: com.strapdata.elasticsearch/elasticsearch

  @Override
  public TokenStream create(TokenStream tokenStream) {
    return new NGramTokenFilter(tokenStream, minGram, maxGram);
  }
}
origin: com.strapdata.elasticsearch/elasticsearch

  @Override
  public TokenStream create(TokenStream tokenStream, Version version) {
    return new NGramTokenFilter(tokenStream);
  }
},
origin: org.dspace.dependencies.solr/dspace-solr-core

 public NGramTokenFilter create(TokenStream input) {
  return new NGramTokenFilter(input, minGramSize, maxGramSize);
 }
}
origin: org.codelibs.elasticsearch.module/analysis-common

@Override
public TokenStream create(TokenStream tokenStream) {
  return new NGramTokenFilter(tokenStream, minGram, maxGram);
}
origin: org.infinispan/infinispan-embedded-query

 @Override
 public TokenFilter create(TokenStream input) {
  if (luceneMatchVersion.onOrAfter(Version.LUCENE_4_4_0)) {
   return new NGramTokenFilter(input, minGramSize, maxGramSize);
  }
  return new Lucene43NGramTokenFilter(input, minGramSize, maxGramSize);
 }
}
origin: org.codelibs.elasticsearch.module/analysis-common

            + "Please change the filter name to [ngram] instead.");
  return new NGramTokenFilter(reader);
}));
filters.add(PreConfiguredTokenFilter.singleton("persian_normalization", true, PersianNormalizationFilter::new));
org.apache.lucene.analysis.ngram.NGramTokenFilter

Javadoc

Tokenizes the input into n-grams of the given size(s). As of Lucene 4.4, this token filter:
  • handles supplementary characters correctly,
  • emits all n-grams for the same token at the same position,
  • does not modify offsets,
  • sorts n-grams by their offset in the original token first, then increasing length (meaning that "abc" will give "a", "ab", "abc", "b", "bc", "c").

If you were using this TokenFilter to perform partial highlighting, that no longer works, because the filter does not update offsets. Modify your analysis chain to use NGramTokenizer instead, and, if you need pre-tokenization, override NGramTokenizer#isTokenChar(int).
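
The ordering and supplementary-character handling described in the Javadoc above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the library's implementation: the class name NGramOrderSketch and the method ngrams are hypothetical names chosen for this example, and the argument checks simply mirror the messages in the constructor snippet shown earlier. It measures gram length in Unicode code points and emits grams by start position first, then by increasing length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it does not use or reproduce Lucene code.
public class NGramOrderSketch {

    /**
     * Returns every n-gram of term whose length, counted in Unicode code
     * points, lies in [minGram, maxGram]. Grams are ordered by their start
     * position in the term first, then by increasing length.
     */
    public static List<String> ngrams(String term, int minGram, int maxGram) {
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be greater than zero");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not be greater than maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Count code points, not chars, so a surrogate pair counts as one character.
        int length = term.codePointCount(0, term.length());
        for (int pos = 0; pos < length; pos++) {
            // Translate the code-point position into a char index.
            int start = term.offsetByCodePoints(0, pos);
            for (int size = minGram; size <= maxGram && pos + size <= length; size++) {
                int end = term.offsetByCodePoints(start, size);
                grams.add(term.substring(start, end));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [a, ab, abc, b, bc, c], matching the order given in the Javadoc.
        System.out.println(ngrams("abc", 1, 3));
    }
}
```

Because positions are converted with String.offsetByCodePoints, a supplementary character such as an emoji is kept whole inside a gram rather than being split into two surrogate halves. The sketch returns plain strings only; it does not model Lucene's position or offset attributes.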

Most used methods

  • <init>
    Creates an NGramTokenFilter that, for a given input term, produces all contained n-grams with length
  • clearAttributes
  • addAttribute
  • captureState
  • restoreState
