Class DirectSpellChecker
Simple automaton-based spellchecker.
Candidates are presented directly from the term dictionary, based on Levenshtein distance. This is an alternative to SpellChecker if you are using an edit-distance-like metric such as Levenshtein or JaroWinklerDistance.
A practical benefit of this spellchecker is that it requires no additional data structures (neither in RAM nor on disk) to do its work.
Inheritance
Assembly: Lucene.Net.Suggest.dll
Syntax
public class DirectSpellChecker : object
Constructors
Name | Description |
---|---|
DirectSpellChecker() | Creates a DirectSpellChecker with default configuration values |
Fields
Name | Description |
---|---|
INTERNAL_LEVENSHTEIN | The default StringDistance: Damerau-Levenshtein distance, implemented internally. Note: this is the fastest distance metric, because Damerau-Levenshtein is already used to draw candidates from the term dictionary, so this simply reuses that scoring. |
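To make the edit-distance terminology concrete, here is a minimal, self-contained sketch of the optimal-string-alignment variant of Damerau-Levenshtein distance. It is an illustration only: the class and method names are hypothetical, it is not the library's internal implementation, and it compares UTF-16 `char` values rather than Unicode code points.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class EditDistanceSketch
{
    // Counts insertions, deletions, substitutions, and transpositions of
    // two adjacent characters, each as a single edit.
    public static int Distance(string a, string b)
    {
        int n = a.Length, m = b.Length;
        var d = new int[n + 1, m + 1];

        // Turning a prefix into the empty string takes one deletion per character.
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // match or substitution

                // Adjacent transposition, e.g. "en" <-> "ne".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    best = Math.Min(best, d[i - 2, j - 2] + 1);

                d[i, j] = best;
            }
        }
        return d[n, m];
    }
}
```

Under this definition a swapped pair of letters is one edit, and a case difference is an ordinary substitution, which is why unlowercased terms cost an edit.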
Properties
Name | Description |
---|---|
Accuracy | Gets or sets the minimal accuracy required (default: 0.5f) from a StringDistance for a suggestion match. |
Comparer | Gets or sets the comparer for sorting suggestions. The default is DEFAULT_COMPARER. |
Distance | Gets or sets the string distance metric. The default is INTERNAL_LEVENSHTEIN. Note: because this spellchecker draws its candidates from the term dictionary using Damerau-Levenshtein, it works best with an edit-distance-like string metric. If you use a different metric than the default, you might want to consider increasing MaxInspections to draw more candidates for your metric to rank. |
LowerCaseTerms | True if the spellchecker should lowercase terms (default: true). This is a convenience option: if your index field has more complicated analysis (such as StandardTokenizer removing punctuation), it is probably better to turn this off and instead run your query terms through your Analyzer first. If this option is off, case differences count as an edit. |
MaxEdits | Gets or sets the maximum number of Levenshtein edits allowed when drawing candidate terms. This value can be 1 or 2. The default is 2. Note: a large number of spelling errors occur at an edit distance of 1; by setting this value to 1 you can increase both performance and precision at the cost of recall. |
MaxInspections | Gets or sets the maximum number of top-N inspections per suggestion. Increasing this number can improve the accuracy of results, at the cost of performance. |
MaxQueryFrequency | Gets or sets the maximum threshold (default: 0.01f) of documents a query term can appear in and still receive suggestions. Very high-frequency terms are typically spelled correctly. Additionally, this can increase performance, as no work is done for the common case of correctly spelled input terms. This can be specified as a relative fraction of documents, such as 0.5f, or as an absolute whole document frequency, such as 4f. Absolute document frequencies may not be fractional. |
MinPrefix | Gets or sets the minimal number of leading characters that must match exactly. This can improve both performance and accuracy of results, as misspellings commonly do not occur in the first character. |
MinQueryLength | Gets or sets the minimum length of a query term (default: 4) needed to return suggestions. Very short query terms will often cause only bad suggestions with any distance metric. |
ThresholdFrequency | Gets or sets the minimal threshold of documents a term must appear in to be considered a match. This can improve quality by only suggesting high-frequency terms. Note that very high values might decrease performance slightly, by forcing the spellchecker to draw more candidates from the term dictionary. This can be specified as a relative fraction of documents, such as 0.5f, or as an absolute whole document frequency, such as 4f. Absolute document frequencies may not be fractional. |
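MaxQueryFrequency and ThresholdFrequency both accept either a relative value below 1 or an absolute whole-number document frequency. The sketch below shows one plausible way such a value could be resolved against the number of documents in the index. The helper name, the rounding direction, and the exception types are assumptions made for illustration and are not taken from the library source.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class FrequencyThresholdSketch
{
    // Resolves a threshold to an absolute document count.
    // Values >= 1 are treated as absolute document frequencies and must be whole numbers.
    // Values in [0, 1) are treated as a fraction of maxDoc (rounded up here, by assumption).
    public static int ToAbsoluteDocFreq(float threshold, int maxDoc)
    {
        if (threshold < 0f)
            throw new ArgumentOutOfRangeException(nameof(threshold), "Threshold must not be negative.");

        if (threshold >= 1f)
        {
            if (threshold != (float)Math.Floor(threshold))
                throw new ArgumentException("Absolute document frequencies may not be fractional.", nameof(threshold));
            return (int)threshold;
        }

        return (int)Math.Ceiling(threshold * maxDoc);
    }
}
```

With this reading, `0.5f` on a 1,000-document index means 500 documents, while `4f` means four documents regardless of index size, and a value such as `4.5f` is rejected.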
Methods
Name | Description |
---|---|
SuggestSimilar(Term, Int32, IndexReader) | Calls SuggestSimilar(Term, Int32, IndexReader, SuggestMode) as SuggestSimilar(term, numSug, ir, SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX). |
SuggestSimilar(Term, Int32, IndexReader, SuggestMode) | Calls SuggestSimilar(Term, Int32, IndexReader, SuggestMode, Single) as SuggestSimilar(term, numSug, ir, suggestMode, this.accuracy). |
SuggestSimilar(Term, Int32, IndexReader, SuggestMode, Single) | Suggests similar words. Unlike SpellChecker, the similarity used to fetch the most relevant terms is an edit distance, so a low value for numSug typically works very well. |
SuggestSimilar(Term, Int32, IndexReader, Int32, Int32, Single, CharsRef) | Provide spelling corrections based on several parameters. |
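The following is a minimal usage sketch, assuming Lucene.NET 4.8 with the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.Suggest packages referenced. The field name `body`, the sample documents, and the misspelled query term are hypothetical. The index-building types (RAMDirectory, IndexWriter, StandardAnalyzer) and the SuggestWord members (`String`, `Score`, `Freq`) are not described on this page, so check them against the version you use.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search.Spell;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class DirectSpellCheckerExample
{
    public static void Main()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        using (var dir = new RAMDirectory())
        {
            // Build a tiny in-memory index; the spellchecker reads its
            // candidates straight from this index's term dictionary.
            using (var analyzer = new StandardAnalyzer(version))
            using (var writer = new IndexWriter(dir, new IndexWriterConfig(version, analyzer)))
            {
                foreach (var text in new[] { "lucene search library", "lucene spell checker" })
                {
                    var doc = new Document();
                    doc.Add(new TextField("body", text, Field.Store.NO));
                    writer.AddDocument(doc);
                }
                writer.Commit();
            }

            using (var reader = DirectoryReader.Open(dir))
            {
                var checker = new DirectSpellChecker
                {
                    MaxEdits = 1,   // only consider candidates one edit away
                    MinPrefix = 1   // first character must match exactly
                };

                // Three-argument overload: suggests only when the term is not in the index.
                SuggestWord[] suggestions =
                    checker.SuggestSimilar(new Term("body", "lucnee"), 3, reader);

                foreach (SuggestWord s in suggestions)
                    Console.WriteLine($"{s.String} (score {s.Score}, docFreq {s.Freq})");
            }
        }
    }
}
```

No separate spelling index is built here: the same reader that serves searches is passed directly to SuggestSimilar, which is the practical benefit described at the top of this page.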