Class ShingleFilter
A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it creates combinations of tokens as a single token.
For example, the sentence "please divide this sentence into shingles" might be tokenized into shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".
This filter handles position increments greater than 1 by inserting filler tokens (tokens whose term text is "_"). It does not handle a position increment of 0.
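The behavior described above can be illustrated with a small, self-contained sketch. It does not use Lucene.Net and is not the library's implementation: `ShingleSketch`, `Bigrams`, and the `(Term, PositionIncrement)` tuple shape are hypothetical names chosen for illustration, and details such as how very large gaps are handled may differ in the real filter.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: NOT the Lucene.Net implementation.
// All names here (ShingleSketch, Bigrams) are hypothetical.
public static class ShingleSketch
{
    // Builds two-token shingles from (term, positionIncrement) pairs.
    public static List<string> Bigrams(
        IReadOnlyList<(string Term, int PositionIncrement)> tokens,
        string filler = "_",
        string separator = " ")
    {
        // Step 1: turn each position gap into filler tokens, so that a
        // position increment of N contributes N - 1 fillers before the term.
        var expanded = new List<string>();
        foreach (var (term, increment) in tokens)
        {
            if (increment < 1)
            {
                // Mirrors the documented limitation: increments of 0 are not handled.
                throw new ArgumentException("A position increment of 0 is not handled.");
            }
            for (int gap = 1; gap < increment; gap++)
            {
                expanded.Add(filler);
            }
            expanded.Add(term);
        }

        // Step 2: join every adjacent pair of tokens into one shingle.
        var shingles = new List<string>();
        for (int i = 0; i + 1 < expanded.Count; i++)
        {
            shingles.Add(expanded[i] + separator + expanded[i + 1]);
        }
        return shingles;
    }
}
```

For the input `("please", 1), ("divide", 1), ("sentence", 2)`, where the last token skips one position, this sketch returns "please divide", "divide _", and "_ sentence".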
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public sealed class ShingleFilter : TokenFilter, IDisposable
Constructors
Name | Description |
---|---|
ShingleFilter(TokenStream) | Constructs a ShingleFilter with the default shingle size: 2. |
ShingleFilter(TokenStream, Int32) | Constructs a ShingleFilter with the specified maximum shingle size from the TokenStream. |
ShingleFilter(TokenStream, Int32, Int32) | Constructs a ShingleFilter with the specified minimum and maximum shingle sizes from the TokenStream. |
ShingleFilter(TokenStream, String) | Constructs a ShingleFilter with the specified token type for shingle tokens and the default shingle size: 2. |
Fields
Name | Description |
---|---|
DEFAULT_FILLER_TOKEN | The filler token inserted when the position increment is greater than 1 ("_"). |
DEFAULT_MAX_SHINGLE_SIZE | The default maximum shingle size: 2. |
DEFAULT_MIN_SHINGLE_SIZE | The default minimum shingle size: 2. |
DEFAULT_TOKEN_SEPARATOR | The default string used to join adjacent tokens into a shingle (a single space). |
DEFAULT_TOKEN_TYPE | The default token type attribute value: "shingle". |
Methods
Name | Description |
---|---|
End() | |
IncrementToken() | |
Reset() | |
SetFillerToken(String) | Sets the string to insert for each position at which there is no token (i.e., when position increment is greater than one). |
SetMaxShingleSize(Int32) | Set the max shingle size (default: 2) |
SetMinShingleSize(Int32) | Sets the min shingle size (default: 2). The passed-in minShingleSize must not be greater than maxShingleSize, so set maxShingleSize before calling this method. The unigram output option is independent of the min shingle size. |
SetOutputUnigrams(Boolean) | Sets whether the output stream contains the input tokens (unigrams) as well as shingles (default: true). |
SetOutputUnigramsIfNoShingles(Boolean) | Sets whether to override the behavior of outputUnigrams==false when no shingles are available because the input stream has fewer than minShingleSize tokens (default: false). If outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available. |
SetTokenSeparator(String) | Sets the string to use when joining adjacent tokens to form a shingle |
SetTokenType(String) | Set the type of the shingle tokens produced by this filter. (default: "shingle") |
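The interaction between the size and unigram options above can be modeled with a short, self-contained sketch. It does not call Lucene.Net: `ShingleOptionsSketch` and `Shingles` are hypothetical names, the parameters simply mirror the setters in the table, and the lower bound of 2 on the sizes as well as the output ordering are assumptions about the real filter rather than something this page states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model of the documented options: NOT the Lucene.Net implementation.
// All names here (ShingleOptionsSketch, Shingles) are hypothetical.
public static class ShingleOptionsSketch
{
    public static List<string> Shingles(
        IReadOnlyList<string> terms,
        int minSize = 2,
        int maxSize = 2,
        bool outputUnigrams = true,
        bool outputUnigramsIfNoShingles = false,
        string separator = " ")
    {
        // Assumption: both sizes must be at least 2.
        if (minSize < 2 || maxSize < 2)
        {
            throw new ArgumentException("Shingle sizes must be at least 2.");
        }
        // Documented constraint: minShingleSize must not exceed maxShingleSize.
        if (minSize > maxSize)
        {
            throw new ArgumentException("minSize must not be greater than maxSize.");
        }

        // No shingle can be formed when there are fewer than minSize tokens.
        bool anyShingle = terms.Count >= minSize;

        // outputUnigrams == true always emits unigrams; otherwise they are
        // emitted only when the override is on and no shingle is available.
        bool emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && !anyShingle);

        var output = new List<string>();
        for (int start = 0; start < terms.Count; start++)
        {
            if (emitUnigrams)
            {
                output.Add(terms[start]);
            }
            // Emit every shingle size from minSize to maxSize that fits at this position.
            for (int size = minSize; size <= maxSize && start + size <= terms.Count; size++)
            {
                output.Add(string.Join(separator, terms.Skip(start).Take(size)));
            }
        }
        return output;
    }
}
```

With terms "a", "b", "c", `minSize: 2`, and `maxSize: 3`, this model yields "a", "a b", "a b c", "b", "b c", "c" when `outputUnigrams` is true, and only "a b", "a b c", "b c" when it is false. A single-token input with `outputUnigrams: false` yields nothing unless `outputUnigramsIfNoShingles` is true, in which case the lone unigram is returned.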