Namespace Lucene.Net.Analysis.Compound
Classes
CompoundWordTokenFilterBase
Base class for decomposition token filters.
You must specify the required LuceneVersion compatibility when creating CompoundWordTokenFilterBase:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
- As of 4.4, CompoundWordTokenFilterBase doesn't update offsets.
CompoundWordTokenFilterBase.CompoundToken
Helper class to hold decompounded token information.
DictionaryCompoundWordTokenFilter
A TokenFilter that decomposes compound words found in many Germanic languages.
"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a brute-force algorithm to achieve this.
You must specify the required LuceneVersion compatibility when creating DictionaryCompoundWordTokenFilter:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
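The brute-force approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Lucene.Net implementation: the function name `decompose` and its keyword arguments are hypothetical, chosen to mirror the factory parameters (minWordSize, minSubwordSize, maxSubwordSize, onlyLongestMatch), and case-insensitive dictionary lookup is an assumption of the sketch.

```python
def decompose(word, dictionary, min_word_size=5, min_subword_size=2,
              max_subword_size=15, only_longest_match=False):
    """Return the dictionary subwords found in `word` by brute-force scanning.

    Hypothetical helper for illustration; not part of Lucene.Net.
    """
    # Words shorter than min_word_size are not decomposed at all.
    if len(word) < min_word_size:
        return []

    # Assumption: dictionary lookups ignore case.
    entries = {entry.lower() for entry in dictionary}
    subwords = []

    # Try every start offset and every allowed subword length.
    for start in range(len(word) - min_subword_size + 1):
        longest = None
        for length in range(min_subword_size, max_subword_size + 1):
            if start + length > len(word):
                break
            candidate = word[start:start + length]
            if candidate.lower() in entries:
                if only_longest_match:
                    # Keep only the longest match starting at this offset.
                    if longest is None or len(candidate) > len(longest):
                        longest = candidate
                else:
                    subwords.append(candidate)
        if only_longest_match and longest is not None:
            subwords.append(longest)
    return subwords


words = {"donau", "dampf", "schiff"}
print(decompose("Donaudampfschiff", words))  # ['Donau', 'dampf', 'schiff']
```

In this sketch, only_longest_match is applied per start offset: with the entries "hand", "handball", "ball" and "spiel", the word "Handballspiel" yields Hand, Handball, ball, spiel by default, and Handball, ball, spiel when only the longest match is kept.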
DictionaryCompoundWordTokenFilterFactory
Factory for DictionaryCompoundWordTokenFilter.
<fieldType name="text_dictcomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt"
            minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="true"/>
  </analyzer>
</fieldType>
HyphenationCompoundWordTokenFilter
A TokenFilter that decomposes compound words found in many Germanic languages.
"Donaudampfschiff" becomes Donau, dampf, schiff so that you can find "Donaudampfschiff" even when you only enter "schiff". It uses a hyphenation grammar and a word dictionary to achieve this.
You must specify the required LuceneVersion compatibility when creating HyphenationCompoundWordTokenFilter:
- As of 3.1, CompoundWordTokenFilterBase correctly handles Unicode 4.0 supplementary characters in strings and char arrays provided as compound word dictionaries.
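A rough sketch of how hyphenation points and a word dictionary can be combined is shown below. It is illustrative Python, not the Lucene.Net implementation: the function `decompose_at_hyphens` is hypothetical, the hyphenation points are assumed to be computed already (by a hyphenation grammar, which is not shown) and to include both 0 and the word length, and the minWordSize check is omitted for brevity.

```python
def decompose_at_hyphens(word, hyphen_points, dictionary=None,
                         min_subword_size=2, max_subword_size=15):
    """Return subwords of `word` that start and end at hyphenation points.

    Hypothetical helper for illustration; not part of Lucene.Net.
    `hyphen_points` is an ascending list of offsets, assumed to include
    0 and len(word).
    """
    # Assumption: dictionary lookups ignore case; None means "no dictionary".
    entries = None if dictionary is None else {e.lower() for e in dictionary}
    subwords = []

    for i, start in enumerate(hyphen_points):
        # Candidate subwords run from this point to each later point.
        for end in hyphen_points[i + 1:]:
            length = end - start
            if length > max_subword_size:
                break
            if length < min_subword_size:
                continue
            part = word[start:end]
            # Without a dictionary, every candidate is accepted.
            if entries is None or part.lower() in entries:
                subwords.append(part)
    return subwords


# Assumed hyphenation: Do-nau-dampf-schiff
points = [0, 2, 5, 10, 16]
words = {"donau", "dampf", "schiff"}
print(decompose_at_hyphens("Donaudampfschiff", points, words))
# ['Donau', 'dampf', 'schiff']
```

Compared with scanning every offset, only spans between hyphenation points are checked, so far fewer candidates are looked up in the dictionary.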
HyphenationCompoundWordTokenFilterFactory
Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimum length of a word that gets decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
            dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
  </analyzer>
</fieldType>