Namespace Lucene.Net.Search.Suggest.Analyzing
Classes
AnalyzingInfixSuggester
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This also highlights the tokens that match.
This suggester supports payloads. Matches are sorted only by the suggest weight; it would be nice to support blended score + weight sort in the future. This means this suggester best applies when there is a strong a-priori ranking of all the suggestions.
This suggester supports contexts; however, the contexts must be valid UTF-8 (arbitrary binary terms will not work).
@lucene.experimental
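The matching and ordering rule described for AnalyzingInfixSuggester can be pictured with a small Python sketch. This is not the Lucene.NET implementation, which searches a Lucene index; the infix_suggest function and the sample entries are hypothetical, lowercasing plus whitespace splitting stands in for a real analyzer, and treating every query token as a prefix is a simplification.

```python
def infix_suggest(entries, query, num=5):
    """Return (text, weight) pairs in which every query token is a prefix of
    some token of the text, ordered by weight only (highest first)."""
    query_tokens = query.lower().split()
    if not query_tokens:
        return []
    hits = []
    for text, weight in entries:
        tokens = text.lower().split()
        if all(any(t.startswith(q) for t in tokens) for q in query_tokens):
            hits.append((text, weight))
    # Sorting uses the suggestion weight alone; no relevance score is blended in.
    hits.sort(key=lambda hit: -hit[1])
    return hits[:num]


# Hypothetical sample data: (suggestion text, weight)
entries = [
    ("lend me your ear", 8),
    ("a penny saved is a penny earned", 10),
    ("a nice ear ring", 3),
]
```

Because "ear" is a prefix of a token in all three entries, a lookup for "ear" returns all of them, ordered purely by weight.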
AnalyzingSuggester
Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).
This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could see the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so you should call the constructor with the preservePositionIncrements parameter set to false.
If SynonymFilter is used to map "wifi" and "wireless network" to "hotspot", then the partial text "wirele..." could suggest "wifi router". Token normalization such as stemming or accent removal would allow suggestions to ignore such variations.
When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
There are some limitations:
- A lookup from a query like "net" in English won't be any different from "net " (i.e., the user added a trailing space) because analyzers don't reflect when they've seen a token separator and when they haven't.
- If you're using StopFilter and the user intends to type "fast apple" but has so far typed only "fast a", StopFilter will remove that "a" (again because the analyzer doesn't convey whether it has seen a token separator after the "a"), causing far more matches than you'd expect.
- Lookups with the empty string return no results instead of all results.
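The analyze-then-match idea behind AnalyzingSuggester, together with the limitations listed above, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lucene.NET implementation: a sorted list scan replaces the weighted FST, and analyze, build, lookup, the stop word set, and the sample entries are all hypothetical.

```python
STOP_WORDS = {"the", "of", "a"}  # hypothetical stop word list


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stop words.
    Removed words leave no hole, i.e. position increments are not preserved."""
    return " ".join(t for t in text.lower().split() if t not in STOP_WORDS)


def build(entries):
    """Store (analyzed form, surface form, weight) for each suggestion."""
    return [(analyze(surface), surface, weight) for surface, weight in entries]


def lookup(index, query, num=5):
    """Match on the analyzed form, but return the surface form."""
    key = analyze(query)
    if not key:
        return []  # empty lookups return no results
    hits = [(a, s, w) for a, s, w in index if a.startswith(key)]
    # Highest weight first; equal weights are tie-broken by analyzed form.
    hits.sort(key=lambda h: (-h[2], h[0]))
    return [(s, w) for _, s, w in hits[:num]]


index = build([("The Ghost of Christmas Past", 50), ("Ghostbusters", 20)])
```

In this sketch lookup(index, "ghost chr") finds "The Ghost of Christmas Past" even though the query skips the stop words. The toy analyzer also reproduces the limitations: "ghost" and "ghost " analyze to the same key, and a trailing "a" in a query like "fast a" is dropped as a stop word.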
BlendedInfixSuggester
Extension of the AnalyzingInfixSuggester which transforms the weight after search to take into account the position of the searched term in the indexed text. Please note that it increases the number of elements searched and applies the weighting afterwards. It might be costly for long suggestions.
@lucene.experimental
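A rough Python sketch of position-based blending follows. The two coefficient formulas, weight * (1 - 0.10 * position) and weight / (1 + position), are assumptions taken from how the linear and reciprocal blender types are commonly described for Lucene; they are not stated on this page. The function names and sample data are hypothetical, and whitespace splitting stands in for real analysis.

```python
def blend(weight, position, blender="linear"):
    """Scale a suggestion weight by the position of the first matching token.
    Assumed formulas: linear = 1 - 0.10 * position, reciprocal = 1 / (1 + position)."""
    if blender == "linear":
        coefficient = 1 - 0.10 * position
    elif blender == "reciprocal":
        coefficient = 1 / (1 + position)
    else:
        raise ValueError("unknown blender: " + blender)
    return weight * coefficient


def first_match_position(text, term):
    """Index of the first token that starts with term, or None if no token does."""
    for i, token in enumerate(text.lower().split()):
        if token.startswith(term.lower()):
            return i
    return None


def rerank(entries, term, blender="linear"):
    """Apply the blended weight after matching, then sort by it (highest first)."""
    scored = []
    for text, weight in entries:
        position = first_match_position(text, term)
        if position is not None:
            scored.append((text, blend(weight, position, blender)))
    scored.sort(key=lambda item: -item[1])
    return scored
```

With the hypothetical entries ("a bright star", 10) and ("star wars", 9), a lookup for "star" ranks "star wars" first under either blender, because its match sits at position 0 while the heavier entry matches only at position 2.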
FreeTextSuggester
Builds an ngram model from the text sent to Build(IInputIterator, Double) and predicts based on the last grams-1 tokens in the request sent to DoLookup(String, IEnumerable<BytesRef>, Boolean, Int32). This tries to handle the "long tail" of suggestions for when the incoming query is a never before seen query string.
Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").
This uses the Stupid Backoff language model to smooth scores across ngram models; see "Large Language Models in Machine Translation" for details.
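From DoLookup(String, IEnumerable<BytesRef>, Boolean, Int32), the key of each result is the ngram token; the value is Int64.MaxValue * score (fixed point, cast to long). Divide by Int64.MaxValue to get the score back as a double.
onlyMorePopular is unused.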
@lucene.experimental
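The Stupid Backoff scoring mentioned for FreeTextSuggester can be sketched as follows. This is a simplified illustration, not the suggester's code: it scores a single candidate word against in-memory counts, whitespace splitting stands in for the analyzer, the backoff factor 0.4 is the value suggested in the Stupid Backoff paper, the corpus is assumed to be non-empty, and count_ngrams and score are hypothetical names.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor suggested in the Stupid Backoff paper


def count_ngrams(sentences, max_n):
    """Count all ngrams of length 1..max_n; also return the total token count."""
    counts = Counter()
    total = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        total += len(tokens)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts, total


def score(counts, total, context, word):
    """Relative frequency of word after context; when the ngram was never seen,
    drop the oldest context token and multiply the score by ALPHA."""
    context = tuple(context)
    penalty = 1.0
    while context:
        seen = counts[context + (word,)]
        if seen:
            return penalty * seen / counts[context]
        context = context[1:]
        penalty *= ALPHA
    # No context left: fall back to the unigram frequency.
    return penalty * counts[(word,)] / total


counts, total = count_ngrams(["the quick brown fox", "the quick red fox"], max_n=2)
```

Here score(counts, total, ["quick"], "brown") uses the seen bigram directly, while score(counts, total, ["the"], "fox") has never seen "the fox" and therefore backs off to the unigram frequency of "fox" scaled by ALPHA.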
FSTUtil
Exposes a utility method to enumerate all paths intersecting an Automaton with an FST.
FSTUtil.Path<T>
Holds a pair (automaton, fst) of states and accumulated output in the intersected machine.
FuzzySuggester
Implements a fuzzy AnalyzingSuggester. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false for the transpositions parameter.
At most, this query will match terms up to LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE edits.
NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.
Note: complex query analyzers can have a significant impact on the lookup performance. It's recommended to not use analyzers that drop or inject terms like synonyms to keep the complexity of the prefix intersection low for good lookup performance. At index time, complex analyzers can safely be used.
@lucene.experimental
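The difference between the two similarity measures named for FuzzySuggester can be shown with a small Python sketch. It uses a plain dynamic-programming table rather than the Levenshtein automata that Lucene builds, and edit_distance is a hypothetical helper written for illustration only.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b. With transpositions=True this is the
    optimal string alignment variant of Damerau-Levenshtein (swapping two
    adjacent characters costs one edit); with False it is classic Levenshtein."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]
```

For the typo "wfii" against "wifi", the swapped letters count as a single edit when transpositions are enabled and as two edits under classic Levenshtein.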
SuggestStopFilter
Like StopFilter except it will not remove the last token if that token was not followed by some token separator. For example, a query "find the" would preserve the "the" since it was not followed by a space, punctuation, or another separator, and mark it KEYWORD so that later stemmers won't touch it either, while a query like "find the popsicle" would remove "the" as a stopword.
Normally you'd use the ordinary StopFilter in your indexAnalyzer and then this class in your queryAnalyzer, when using one of the analyzing suggesters.
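The rule SuggestStopFilter applies to the last token can be sketched in Python. This is an illustration only: it works on a raw string instead of a token stream, it does not model the KEYWORD marking, and the stop word set and the suggest_stop_filter name are hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop word list


def suggest_stop_filter(query):
    """Drop stop words, except a final stop word that is not yet followed by
    a separator (the user may still be typing a longer word)."""
    tokens = re.findall(r"\w+", query.lower())
    # The last token is still "open" if the query ends in a word character.
    last_is_open = re.search(r"\w$", query) is not None
    kept = []
    for i, token in enumerate(tokens):
        is_last = i == len(tokens) - 1
        if token in STOP_WORDS and not (is_last and last_is_open):
            continue
        kept.append(token)
    return kept
```

In this sketch "find the" keeps "the", whereas "find the " (with a trailing space) and "find the popsicle" both drop it.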
Enums
BlendedInfixSuggester.BlenderType
The different types of blender.
SuggesterOptions
LUCENENET specific type for specifying AnalyzingSuggester and FuzzySuggester options.