Class HMMChineseTokenizer
Tokenizer for Chinese or mixed Chinese-English text.
The tokenizer uses a hidden Markov model (HMM) to find the most probable word segmentation for Simplified Chinese text. The text is first broken into sentences, and each sentence is then segmented into words.
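As a minimal sketch of how the tokenizer is typically consumed, the following program prints each token with its character offsets. It assumes the class lives in the `Lucene.Net.Analysis.Cn.Smart` namespace and that the attribute interfaces come from `Lucene.Net.Analysis.TokenAttributes`, as in the Lucene.NET 4.8 packages; the sample text is hypothetical. The exact tokens depend on the bundled dictionary, so no output is shown.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Cn.Smart;        // assumed namespace of HMMChineseTokenizer
using Lucene.Net.Analysis.TokenAttributes; // assumed namespace of the attribute interfaces

public static class Program
{
    public static void Main()
    {
        // Hypothetical mixed Chinese-English input.
        const string text = "我购买了道具和服装。Lucene is fun.";

        using (var tokenizer = new HMMChineseTokenizer(new StringReader(text)))
        {
            // Attributes expose the current token's text and offsets.
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();
            IOffsetAttribute offset = tokenizer.AddAttribute<IOffsetAttribute>();

            // Reset() must be called before the first IncrementToken().
            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine($"{term} [{offset.StartOffset}, {offset.EndOffset})");
            }
            // End() performs end-of-stream operations such as setting the final offset.
            tokenizer.End();
        }
    }
}
```

The tokenizer on its own may also emit punctuation as tokens; downstream filters are usually responsible for removing them.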
Inheritance
System.Object
AttributeSource
TokenStream
Tokenizer
SegmentingTokenizerBase
HMMChineseTokenizer
Inherited Members
Lucene.Net.Analysis.Tokenizer.SetReader(System.IO.TextReader)
Assembly: Lucene.Net.Analysis.SmartCn.dll
Syntax
public class HMMChineseTokenizer : SegmentingTokenizerBase, IDisposable
Constructors
Name | Description |
---|---|
HMMChineseTokenizer(AttributeSource.AttributeFactory, TextReader) | Creates a new HMMChineseTokenizer that uses the supplied AttributeSource.AttributeFactory. |
HMMChineseTokenizer(TextReader) | Creates a new HMMChineseTokenizer. |
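The two-argument constructor can be sketched as follows. This assumes that `AttributeSource` is in the `Lucene.Net.Util` namespace and that the default factory is exposed as `AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY`, as in Lucene.NET 4.8; the `TokenizerFactoryExample` class and its `Create` method are hypothetical names used only for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Cn.Smart; // assumed namespace of HMMChineseTokenizer
using Lucene.Net.Util;              // assumed namespace of AttributeSource

public static class TokenizerFactoryExample
{
    // Hypothetical helper: builds a tokenizer with an explicit attribute factory.
    public static HMMChineseTokenizer Create(TextReader reader)
    {
        // Passing the default factory should behave like the
        // single-argument constructor; a custom factory could be
        // substituted here to control how attributes are instantiated.
        return new HMMChineseTokenizer(
            AttributeSource.AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY,
            reader);
    }
}
```

The caller owns the returned tokenizer and should dispose it when finished.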
Methods
Name | Description |
---|---|
IncrementWord() | |
Reset() | |
SetNextSentence(Int32, Int32) | |