// Factory hook: builds a front-edge n-gram tokenizer using the supplied
// attribute factory and this factory's configured gram-size range.
@Override
public Tokenizer create(AttributeFactory factory) {
    return new EdgeNGramTokenizer(factory, minGramSize, maxGramSize);
}
}
// Fragment of incrementToken(): emit the n-gram covering [start, start + gramSize)
// of the buffered input string.
clearAttributes();                // reset per-token attribute state before populating
int end = start + gramSize;
termAtt.setEmpty().append(inStr, start, end);
// correctOffset maps buffer positions back to offsets in the original input
// (accounts for any CharFilter in front of this tokenizer).
offsetAtt.setOffset(correctOffset(start), correctOffset(end));
gramSize++;                       // next call emits the next-longer gram from the same start
return true;                      // a token was produced
/**
 * Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range.
 *
 * @param input {@link Reader} holding the input to be tokenized
 * @param side the {@link Side} from which to chop off an n-gram
 * @param minGram the smallest n-gram to generate
 * @param maxGram the largest n-gram to generate
 */
public EdgeNGramTokenizer(Reader input, Side side, int minGram, int maxGram) {
    super(input);
    // Shared initialization used by all constructors; presumably validates the
    // side/min/max arguments — see init().
    init(side, minGram, maxGram);
}
@Override public void end() { // set final offset final int finalOffset = correctOffset(charsRead); this.offsetAtt.setOffset(finalOffset, finalOffset); }
/**
 * Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range.
 *
 * @param source {@link AttributeSource} to use
 * @param input {@link Reader} holding the input to be tokenized
 * @param side the {@link Side} from which to chop off an n-gram
 * @param minGram the smallest n-gram to generate
 * @param maxGram the largest n-gram to generate
 */
public EdgeNGramTokenizer(AttributeSource source, Reader input, Side side, int minGram, int maxGram) {
    super(source, input);
    // Shared initialization used by all constructors — see init().
    init(side, minGram, maxGram);
}
// Anonymous/enum factory variant: builds a tokenizer with the class defaults.
@Override
protected Tokenizer create(Version version) {
    // The version argument is not consulted here; defaults apply regardless.
    return new EdgeNGramTokenizer(EdgeNGramTokenizer.DEFAULT_MIN_GRAM_SIZE,
        EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE);
}
},
/**
 * Creates EdgeNGramTokenizer that can generate n-grams in the sizes of the given range.
 *
 * @param factory {@link org.apache.lucene.util.AttributeSource.AttributeFactory} to use
 * @param input {@link Reader} holding the input to be tokenized
 * @param side the {@link Side} from which to chop off an n-gram
 * @param minGram the smallest n-gram to generate
 * @param maxGram the largest n-gram to generate
 */
public EdgeNGramTokenizer(AttributeFactory factory, Reader input, Side side, int minGram, int maxGram) {
    super(factory, input);
    // Shared initialization used by all constructors — see init().
    init(side, minGram, maxGram);
}
// Builds a tokenizer over the given reader using this factory's configured
// side and gram-size range.
public EdgeNGramTokenizer create(Reader input) {
    return new EdgeNGramTokenizer(input, side, minGramSize, maxGramSize);
}
}
/**
 * Builds an edge n-gram tokenizer; when a custom matcher is configured,
 * token-character classification is delegated to it via an anonymous subclass.
 */
@Override
public Tokenizer create() {
    // Guard clause: no matcher configured — use the stock tokenizer.
    if (matcher == null) {
        return new EdgeNGramTokenizer(minGram, maxGram);
    }
    return new EdgeNGramTokenizer(minGram, maxGram) {
        @Override
        protected boolean isTokenChar(int chr) {
            return matcher.isTokenChar(chr);
        }
    };
}
}
/**
 * Builds an edge n-gram tokenizer; when a custom matcher is configured,
 * token-character classification is delegated to it via an anonymous subclass.
 */
@Override
public Tokenizer create() {
    // Guard clause: no matcher configured — use the stock tokenizer.
    if (matcher == null) {
        return new EdgeNGramTokenizer(minGram, maxGram);
    }
    return new EdgeNGramTokenizer(minGram, maxGram) {
        @Override
        protected boolean isTokenChar(int chr) {
            return matcher.isTokenChar(chr);
        }
    };
}
}
/**
 * Builds an edge n-gram tokenizer; when a custom matcher is configured,
 * token-character classification is delegated to it via an anonymous subclass.
 */
@Override
public Tokenizer create() {
    // Guard clause: no matcher configured — use the stock tokenizer.
    if (matcher == null) {
        return new EdgeNGramTokenizer(minGram, maxGram);
    }
    return new EdgeNGramTokenizer(minGram, maxGram) {
        @Override
        protected boolean isTokenChar(int chr) {
            return matcher.isTokenChar(chr);
        }
    };
}
}
// Version-gated factory: from Lucene 4.4 onward the current tokenizer is used,
// older match versions fall back to the 4.3-compatible implementation.
// NOTE(review): the modern branch builds an EdgeNGramTokenizer but the fallback
// is Lucene43NGramTokenizer (not an Edge variant) — confirm this is intentional
// and not a copy/paste from NGramTokenizerFactory.
@Override
public Tokenizer create(AttributeFactory factory) {
    if (luceneMatchVersion.onOrAfter(Version.LUCENE_4_4_0)) {
        return new EdgeNGramTokenizer(factory, minGramSize, maxGramSize);
    }
    return new Lucene43NGramTokenizer(factory, minGramSize, maxGramSize);
}
}
// Register the pre-configured (name-addressable) tokenizers.
tokenizers.add(PreConfiguredTokenizer.singleton("ngram", NGramTokenizer::new, null));
tokenizers.add(PreConfiguredTokenizer.singleton("edge_ngram",
    () -> new EdgeNGramTokenizer(EdgeNGramTokenizer.DEFAULT_MIN_GRAM_SIZE, EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE), null));
// "pattern" splits on non-word characters; -1 means emit the split tokens, not the matches.
tokenizers.add(PreConfiguredTokenizer.singleton("pattern",
    () -> new PatternTokenizer(Regex.compile("\\W+", null), -1), null));
tokenizers.add(PreConfiguredTokenizer.singleton("thai", ThaiTokenizer::new, null));
// FIX(review): the original contained a dangling lambda fragment
// `() -> new EdgeNGramTokenizer(...), null));` with no enclosing add() call,
// which cannot compile. Reconstructed as the legacy camelCase "edgeNGram"
// alias registration it evidently belonged to — confirm alias name against
// upstream CommonAnalysisPlugin.
tokenizers.add(PreConfiguredTokenizer.singleton("edgeNGram",
    () -> new EdgeNGramTokenizer(EdgeNGramTokenizer.DEFAULT_MIN_GRAM_SIZE, EdgeNGramTokenizer.DEFAULT_MAX_GRAM_SIZE), null));
tokenizers.add(PreConfiguredTokenizer.singleton("PathHierarchy", PathHierarchyTokenizer::new, null));