Class EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s).
This Tokenizer creates n-grams from the beginning edge or ending edge of an input token.
As of Lucene 4.4, this tokenizer
- can handle maxGram larger than 1024 chars, but beware that this will result in increased memory usage,
- doesn't trim the input,
- sets all position increments to 1 (instead of 1 for the first token and 0 for all other ones),
- doesn't support backward n-grams anymore,
- supports IsTokenChar(Int32) pre-tokenization,
- correctly handles supplementary characters.
Although highly discouraged, it is still possible to use the old behavior through Lucene43EdgeNGramTokenizer.
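A minimal sketch of front-edge n-gram tokenization with this class. It assumes Lucene.NET 4.8 and the `Lucene.Net.Analysis.NGram`, `Lucene.Net.Analysis.TokenAttributes`, and `Lucene.Net.Util` namespaces; the standard TokenStream workflow (`Reset`, `IncrementToken`, `End`, `Dispose`) applies:

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.NGram;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Emit n-grams of length 1..3 anchored at the front edge of the input.
using (var tokenizer = new EdgeNGramTokenizer(
    LuceneVersion.LUCENE_48, new StringReader("Quick"), 1, 3))
{
    // The term attribute exposes the text of the current token.
    var termAtt = tokenizer.GetAttribute<ICharTermAttribute>();
    tokenizer.Reset();
    while (tokenizer.IncrementToken())
    {
        Console.WriteLine(termAtt.ToString()); // "Q", "Qu", "Qui"
    }
    tokenizer.End();
}
```

Each emitted gram starts at the beginning of the input, so a maxGram of 3 yields at most three tokens per input regardless of its length.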
Inherited Members
Lucene.Net.Analysis.Tokenizer.SetReader(System.IO.TextReader)
System.Object.Equals(System.Object, System.Object)
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.GetType()
System.Object.MemberwiseClone()
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public class EdgeNGramTokenizer : NGramTokenizer, IDisposable
Constructors
| Name | Description |
|---|---|
| EdgeNGramTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader, Int32, Int32) | Creates an EdgeNGramTokenizer that can generate n-grams in the sizes of the given range |
| EdgeNGramTokenizer(LuceneVersion, TextReader, Int32, Int32) | Creates an EdgeNGramTokenizer that can generate n-grams in the sizes of the given range |
Fields
| Name | Description |
|---|---|
| DEFAULT_MAX_GRAM_SIZE | |
| DEFAULT_MIN_GRAM_SIZE | |