Class NGramTokenFilter
Tokenizes the input into n-grams of the given size(s).
You must specify the required LuceneVersion compatibility when creating an NGramTokenFilter. As of Lucene 4.4, this token filter:
- handles supplementary characters correctly,
- emits all n-grams for the same token at the same position,
- does not modify offsets,
- sorts n-grams by their offset in the original token first, then increasing length (meaning that "abc" will give "a", "ab", "abc", "b", "bc", "c").
You can make this filter use the old behavior by passing a version < LUCENE_44 to the constructor, but this is not recommended: it produces broken TokenStreams that cause highlighting bugs.
If you were using this TokenFilter to perform partial highlighting, that will no longer work, since this filter does not update offsets. You should modify your analysis chain to use NGramTokenizer instead, and potentially override IsTokenChar(Int32) to perform pre-tokenization.
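The ordering described above (by start offset first, then by increasing length) can be sketched as follows. This is an illustrative re-implementation in Python, not Lucene.NET's actual code; the function name and defaults are chosen here to mirror DEFAULT_MIN_NGRAM_SIZE and DEFAULT_MAX_NGRAM_SIZE:

```python
def ngrams(token, min_gram=1, max_gram=2):
    """Emit n-grams ordered by start offset in the original token first,
    then by increasing length, matching the Lucene 4.4+ ordering."""
    out = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                out.append(token[start:start + size])
    return out

# For "abc" with min=1, max=3 this yields: a, ab, abc, b, bc, c
print(ngrams("abc", 1, 3))
```

In the real filter, all of these n-grams are emitted at the same position and keep the original token's offsets, which is why offset-based highlighting on the filter's output no longer works.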
Assembly: Lucene.Net.Analysis.Common.dll
Syntax

```csharp
[Serializable]
public sealed class NGramTokenFilter : TokenFilter, IDisposable
```
Constructors
| Name | Description |
|---|---|
| NGramTokenFilter(LuceneVersion, TokenStream) | Creates NGramTokenFilter with default min and max n-grams. |
| NGramTokenFilter(LuceneVersion, TokenStream, Int32, Int32) | Creates NGramTokenFilter with given min and max n-grams. |
Fields
| Name | Description |
|---|---|
| DEFAULT_MAX_NGRAM_SIZE | The default maximum n-gram size. |
| DEFAULT_MIN_NGRAM_SIZE | The default minimum n-gram size. |
Methods
| Name | Description |
|---|---|
| IncrementToken() | Advances to the next token in the stream; returns false at end of stream. |
| Reset() | Resets this filter and its input TokenStream to a clean state. |