Class HMMChineseTokenizerFactory
Factory for HMMChineseTokenizer
Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenate off), or use the SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
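
The `words="…"` attribute above is written in the style of a Solr schema analyzer definition. As a minimal sketch of the stoplist option, assuming a Solr-style configuration is in use, the tokenizer and stop filter could be chained as shown here. The field type name `text_smartcn` and the `solr.` class prefixes are illustrative assumptions, not taken from this page.

```xml
<!-- Sketch only: "text_smartcn" is a hypothetical field type name. -->
<fieldType name="text_smartcn" class="solr.TextField">
  <analyzer>
    <!-- Tokenizer produced by this factory; it still emits punctuation tokens. -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- The SmartChinese stoplist removes the punctuation tokens. -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
  </analyzer>
</fieldType>
```

The stop filter has to come after the tokenizer in the chain, because it operates on the tokens the tokenizer emits.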
@lucene.experimental
Assembly: Lucene.Net.Analysis.SmartCn.dll
Syntax
public sealed class HMMChineseTokenizerFactory : TokenizerFactory
Constructors
Name | Description |
---|---|
HMMChineseTokenizerFactory(IDictionary<String, String>) | Creates a new HMMChineseTokenizerFactory |
Methods
Name | Description |
---|---|
Create(AttributeSource.AttributeFactory, TextReader) | |