Namespace Lucene.Net.Analysis.Th
Classes
ThaiAnalyzer
Analyzer for the Thai language. It uses BreakIterator to break text into words.
You must specify the required LuceneVersion compatibility when creating ThaiAnalyzer:
- As of 3.6, a set of Thai stopwords is used by default
ThaiTokenizer
Tokenizer that uses BreakIterator to tokenize Thai text.
ThaiTokenizerFactory
Factory for ThaiTokenizer.
<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ThaiTokenizerFactory"/>
  </analyzer>
</fieldType>
ThaiWordFilter
TokenFilter that uses BreakIterator to break each Thai Token into separate Token(s), one for each Thai word.
Please note: From matchVersion 3.1 onward, this filter no longer lowercases non-Thai text. ThaiAnalyzer inserts a LowerCaseFilter before this filter, so the behaviour of the Analyzer does not change. As of version 3.1, the filter also handles position increments correctly.
ThaiWordFilterFactory
Factory for ThaiWordFilter.
<fieldType name="text_thai" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory"/>
  </analyzer>
</fieldType>