Class CJKBigramFilter

Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

CJK types are set by these tokenizers, but you can also use CJKBigramFilter(TokenStream, CJKScript) to explicitly control which of the CJK scripts are turned into bigrams.

By default, when a CJK character has no adjacent characters to form a bigram, it is output in unigram form. If you want to always output both unigrams and bigrams, set the outputUnigrams flag in CJKBigramFilter(TokenStream, CJKScript, Boolean). This can be used for a combined unigram+bigram approach.

In all cases, all non-CJK input is passed through unmodified.
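
The behaviour described above can be sketched as follows. This is a minimal example, not a definitive usage pattern: it assumes Lucene.Net 4.8 with the Lucene.Net.Analysis.Common assembly referenced, and `CjkBigramDemo` and `Bigrams` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cjk;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CjkBigramDemo
{
    // Hypothetical helper: collects the terms emitted by
    // StandardTokenizer followed by CJKBigramFilter.
    public static List<string> Bigrams(string text)
    {
        var terms = new List<string>();

        // Assumption: LUCENE_48 is the match version in use.
        Tokenizer source = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));

        // Single-argument constructor: default scripts, default unigram handling.
        using (TokenStream stream = new CJKBigramFilter(source))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                  // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }

        return terms;
    }
}
```

Calling `CjkBigramDemo.Bigrams("学生 test")` should return a list that contains the bigram `学生` alongside the non-CJK term `test`, which the filter passes through unchanged.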

Inheritance

System.Object

AttributeSource

TokenStream

TokenFilter

CJKBigramFilter

Inherited Members

TokenFilter.m_input

TokenFilter.End()

TokenFilter.Dispose(Boolean)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

System.Object.Equals(System.Object, System.Object)

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.GetType()

System.Object.MemberwiseClone()

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

[Serializable]
public sealed class CJKBigramFilter : TokenFilter, IDisposable

Constructors

Name	Description
CJKBigramFilter(TokenStream)	Calls CJKBigramFilter(TokenStream, CJKScript) with all CJK scripts (Han, Hiragana, Katakana, Hangul) enabled.
CJKBigramFilter(TokenStream, CJKScript)	Calls CJKBigramFilter(TokenStream, CJKScript, Boolean) with outputUnigrams set to false.
CJKBigramFilter(TokenStream, CJKScript, Boolean)	Create a new CJKBigramFilter, specifying which writing systems should be bigrammed, and whether or not unigrams should also be output.
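
As an illustration of the three-argument constructor, the sketch below restricts bigramming to Han ideographs and enables unigram output. It assumes Lucene.Net 4.8 and that the CJKScript enumeration exposes a HAN member; `CjkScriptDemo` and `HanBigramsWithUnigrams` are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cjk;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CjkScriptDemo
{
    // Hypothetical helper: bigrams Han text only and also emits unigrams.
    public static List<string> HanBigramsWithUnigrams(string text)
    {
        var terms = new List<string>();
        Tokenizer source = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));

        // Assumption: CJKScript.HAN selects Han ideographs.
        // The third argument (outputUnigrams) is true, so unigrams are
        // emitted in addition to bigrams.
        using (TokenStream stream = new CJKBigramFilter(source, CJKScript.HAN, true))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            stream.End();
        }

        return terms;
    }
}
```

With outputUnigrams enabled, an input such as `学生` should yield the single characters as well as the bigram, which suits a combined unigram+bigram index.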

Fields

Name	Description
DOUBLE_TYPE	When a bigram is emitted, it is marked as this type.
SINGLE_TYPE	When a unigram is emitted, it is marked as this type.
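
A downstream consumer can tell bigrams from unigrams by comparing each token's type against these two fields. The sketch below is one way to do that, assuming Lucene.Net 4.8 and that the token type is exposed through ITypeAttribute.Type; `CjkTypeDemo` and `TermsWithTypes` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cjk;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class CjkTypeDemo
{
    // Hypothetical helper: returns each emitted term paired with its token type.
    public static List<(string Term, string Type)> TermsWithTypes(string text)
    {
        var result = new List<(string Term, string Type)>();
        Tokenizer source = new StandardTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));

        using (TokenStream stream = new CJKBigramFilter(source))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            ITypeAttribute typeAtt = stream.AddAttribute<ITypeAttribute>();

            stream.Reset();
            while (stream.IncrementToken())
            {
                // Bigrams carry CJKBigramFilter.DOUBLE_TYPE, unigrams carry
                // CJKBigramFilter.SINGLE_TYPE; non-CJK tokens keep the type
                // assigned by the tokenizer.
                result.Add((termAtt.ToString(), typeAtt.Type));
            }
            stream.End();
        }

        return result;
    }
}
```

For example, a caller could keep only the entries whose `Type` equals `CJKBigramFilter.DOUBLE_TYPE` to isolate the bigrams.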

Methods

Name	Description
IncrementToken()
Reset()

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)