Class LowerCaseTokenizer

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts them to lower case. While it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.

Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.

You must specify the required LuceneVersion compatibility when creating LowerCaseTokenizer:

As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See IsTokenChar(Int32) and Normalize(Int32) for details.

Inheritance

System.Object

LowerCaseTokenizer

Inherited Members

LetterTokenizer.IsTokenChar(Int32)

CharTokenizer.IncrementToken()

CharTokenizer.End()

CharTokenizer.Reset()

Tokenizer.m_input

Tokenizer.Dispose(Boolean)

Tokenizer.CorrectOffset(Int32)

Lucene.Net.Analysis.Tokenizer.SetReader(System.IO.TextReader)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

System.Object.Equals(System.Object, System.Object)

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.GetType()

System.Object.MemberwiseClone()

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

[Serializable]
public sealed class LowerCaseTokenizer : LetterTokenizer, IDisposable

Constructors

Name	Description
LowerCaseTokenizer(LuceneVersion, AttributeSource.AttributeFactory, TextReader)	Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
LowerCaseTokenizer(LuceneVersion, TextReader)	Construct a new LowerCaseTokenizer.

Methods

Name	Description
Normalize(Int32)	Converts char to lower case ToLower(Int32).

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)