Class WordDelimiterFilter

Splits words into subwords and performs optional transformations on subword groups. Words are split into subwords with the following rules:

split on intra-word delimiters (by default, all non-alphanumeric characters): "Wi-Fi" → "Wi", "Fi"
split on case transitions: "PowerShot" → "Power", "Shot"
split on letter-number transitions: "SD500" → "SD", "500"
leading and trailing intra-word delimiters on each subword are ignored: "//hello---there, 'dude'" → "hello", "there", "dude"
trailing "'s" are removed for each subword: "O'Neil's" → "O", "Neil"
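
The splitting rules above can be tried out with a short program. This is a minimal sketch rather than a reference implementation: it uses the four-argument constructor listed on this page, and it assumes that the `WordDelimiterFlags` values named below (taken from Lucene.NET 4.8, not from this page) enable the rules as described. The sample text and the class name `SplitRulesSketch` are illustrative only.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class SplitRulesSketch
{
    public static void Main()
    {
        // Assumed flag set: emit word and number parts, split on case and
        // letter-number transitions, and strip trailing "'s".
        var flags = WordDelimiterFlags.GENERATE_WORD_PARTS
                  | WordDelimiterFlags.GENERATE_NUMBER_PARTS
                  | WordDelimiterFlags.SPLIT_ON_CASE_CHANGE
                  | WordDelimiterFlags.SPLIT_ON_NUMERICS
                  | WordDelimiterFlags.STEM_ENGLISH_POSSESSIVE;

        string text = "Wi-Fi PowerShot SD500 O'Neil's";

        // WhitespaceTokenizer leaves intra-word delimiters in place for the filter.
        Tokenizer tokenizer = new WhitespaceTokenizer(
            LuceneVersion.LUCENE_48, new StringReader(text));

        // The CharArraySet argument is passed as null here (no protected words).
        using (TokenStream stream = new WordDelimiterFilter(
            LuceneVersion.LUCENE_48, tokenizer, flags, null))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Print one subword per line.
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

Removing a flag from the set and rerunning is a quick way to see which rule produced a given subword.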

The combinations parameter affects how subwords are combined:

combinations="0" causes no subword combinations: "PowerShot" → 0:"Power", 1:"Shot" (0 and 1 are the token positions)
combinations="1" means that in addition to the subwords, maximal runs of non-numeric subwords are catenated and produced at the same position as the last subword in the run: "PowerShot" → 0:"Power", 1:"Shot", 1:"PowerShot"

One use for WordDelimiterFilter is to help match words with different subword delimiters. For example, if the source text contains "wi-fi", one may want queries for "wifi", "WiFi", "wi-fi", and "wi+fi" to all match. One way to do so is to specify combinations="1" in the analyzer used for indexing and combinations="0" (the default) in the analyzer used for querying. Because the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that does not remove them (such as WhitespaceTokenizer).
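
The index-time versus query-time setup described above might look like the following sketch. The constructors listed on this page take a `WordDelimiterFlags` value rather than a combinations parameter, so the sketch assumes that adding `WordDelimiterFlags.CATENATE_WORDS` plays the role of combinations="1". The helper `Analyze`, the class name, and the trailing `LowerCaseFilter` (added so that "WiFi" and "wifi" produce comparable terms) are illustrative assumptions and do not come from this page.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class IndexVsQuerySketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Hypothetical helper: WhitespaceTokenizer -> WordDelimiterFilter -> LowerCaseFilter.
    // Returns each token as "position:term".
    private static List<string> Analyze(string text, WordDelimiterFlags flags)
    {
        var tokens = new List<string>();
        Tokenizer tokenizer = new WhitespaceTokenizer(MatchVersion, new StringReader(text));
        TokenStream delimited = new WordDelimiterFilter(MatchVersion, tokenizer, flags, null);

        // Lowercasing runs after the delimiter filter so case transitions can still split.
        using (TokenStream stream = new LowerCaseFilter(MatchVersion, delimited))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            var posInc = stream.AddAttribute<IPositionIncrementAttribute>();
            int position = -1;
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Accumulate increments to recover the absolute token position.
                position += posInc.PositionIncrement;
                tokens.Add(position + ":" + term.ToString());
            }
            stream.End();
        }
        return tokens;
    }

    public static void Main()
    {
        // Query side: subwords only (assumed equivalent of combinations="0").
        var queryFlags = WordDelimiterFlags.GENERATE_WORD_PARTS
                       | WordDelimiterFlags.GENERATE_NUMBER_PARTS
                       | WordDelimiterFlags.SPLIT_ON_CASE_CHANGE
                       | WordDelimiterFlags.SPLIT_ON_NUMERICS;

        // Index side: subwords plus catenated word runs (assumed equivalent of combinations="1").
        var indexFlags = queryFlags | WordDelimiterFlags.CATENATE_WORDS;

        Console.WriteLine("index  wi-fi: " + string.Join(" ", Analyze("wi-fi", indexFlags)));
        foreach (string query in new[] { "wifi", "WiFi", "wi-fi", "wi+fi" })
        {
            Console.WriteLine("query  " + query + ": " + string.Join(" ", Analyze(query, queryFlags)));
        }
    }
}
```

Comparing the printed index terms with each query's terms shows whether the chosen flags give the overlap the paragraph above aims for.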

Inheritance

System.Object

AttributeSource

TokenStream

TokenFilter

WordDelimiterFilter

Inherited Members

TokenFilter.m_input

TokenFilter.End()

TokenFilter.Dispose(Boolean)

TokenStream.Dispose()

AttributeSource.GetAttributeFactory()

AttributeSource.GetAttributeClassesEnumerator()

AttributeSource.GetAttributeImplsEnumerator()

AttributeSource.AddAttributeImpl(Attribute)

AttributeSource.AddAttribute<T>()

AttributeSource.HasAttributes

AttributeSource.HasAttribute<T>()

AttributeSource.GetAttribute<T>()

AttributeSource.ClearAttributes()

AttributeSource.CaptureState()

AttributeSource.RestoreState(AttributeSource.State)

AttributeSource.GetHashCode()

AttributeSource.Equals(Object)

AttributeSource.ReflectAsString(Boolean)

AttributeSource.ReflectWith(IAttributeReflector)

AttributeSource.CloneAttributes()

AttributeSource.CopyTo(AttributeSource)

AttributeSource.ToString()

System.Object.Equals(System.Object, System.Object)

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.GetType()

System.Object.MemberwiseClone()

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

[Serializable]
public sealed class WordDelimiterFilter : TokenFilter, IDisposable

Constructors

Name	Description
WordDelimiterFilter(LuceneVersion, TokenStream, WordDelimiterFlags, CharArraySet)	Creates a new WordDelimiterFilter using DEFAULT_WORD_DELIM_TABLE as its charTypeTable
WordDelimiterFilter(LuceneVersion, TokenStream, Byte[], WordDelimiterFlags, CharArraySet)	Creates a new WordDelimiterFilter

Fields

Name	Description
ALPHA
ALPHANUM
DIGIT
LOWER
SUBWORD_DELIM
UPPER

Methods

Name	Description
IncrementToken()
Reset()

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)