Class LetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as defined by the java.lang.Character.isLetter() predicate (System.Char.IsLetter in this .NET port). Note: this works reasonably well for most European languages but poorly for some Asian languages, where words are not separated by spaces.
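The splitting rule described above can be illustrated without the library. The following is a minimal sketch, not the Lucene.Net implementation: it assumes System.Char.IsLetter as the letter predicate, and the class name LetterSplitSketch and method Split are hypothetical names chosen for this example. It ignores offsets and any other bookkeeping the real tokenizer performs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; it does not use Lucene.Net.
public static class LetterSplitSketch
{
    // Returns the maximal runs of adjacent letters in the input,
    // using char.IsLetter as the (assumed) letter predicate.
    public static List<string> Split(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                // Still inside a run of letters: extend the current token.
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A non-letter ends the current token.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }

        // Emit the final token if the text ended on a letter.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }

        return tokens;
    }

    public static void Main()
    {
        // Prints: Hello|world|It|s|o|clock
        Console.WriteLine(string.Join("|", Split("Hello, world! It's 2 o'clock.")));
    }
}
```

Digits and punctuation act as separators, so "It's" becomes two tokens and "2" disappears. A string such as "日本語のテキスト" consists entirely of letters with no separators, so this rule returns it as a single token, which is the limitation noted for languages that do not separate words with spaces.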
Inheritance
System.Object
Lucene.Net.Util.AttributeSource
TokenStream
Tokenizer
CharTokenizer
LetterTokenizer
Inherited Members
Namespace:
Assembly: Lucene.Net.NetCore.dll
Syntax
public class LetterTokenizer : CharTokenizer, IDisposable
Constructors
Name | Description |
---|---|
LetterTokenizer(AttributeSource, IO.TextReader) | Construct a new LetterTokenizer using a given Lucene.Net.Util.AttributeSource. |
LetterTokenizer(AttributeSource.AttributeFactory, IO.TextReader) | Construct a new LetterTokenizer using a given Lucene.Net.Util.AttributeSource.AttributeFactory. |
LetterTokenizer(IO.TextReader) | Construct a new LetterTokenizer. |
Methods
Name | Description |
---|---|
IsTokenChar(Char) | Collects only characters which satisfy System.Char.IsLetter(Char). |