Class FreeTextSuggester

Builds an ngram model from the text sent to Build(IInputIterator, Double) and predicts based on the last grams-1 tokens in the request sent to DoLookup(String, IEnumerable<BytesRef>, Boolean, Int32). This tries to handle the "long tail" of suggestions, for when the incoming query is a never-before-seen query string.

Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.

Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").

This uses the Stupid Backoff language model to smooth scores across ngram models; see "Large Language Models in Machine Translation" (Brants et al., 2007) for details.
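For reference, the Stupid Backoff score as defined in the cited paper can be written as follows. This is the general form from the paper, not a statement of how this class computes it internally; the symbols f (ngram count), N (total token count), k (ngram order) and α (backoff factor) follow the paper's notation rather than anything named in this API.

```latex
S(w_i \mid w_{i-k+1}^{i-1}) =
\begin{cases}
\dfrac{f(w_{i-k+1}^{i})}{f(w_{i-k+1}^{i-1})} & \text{if } f(w_{i-k+1}^{i}) > 0 \\[2ex]
\alpha \, S(w_i \mid w_{i-k+2}^{i-1}) & \text{otherwise}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
```

With α = 0.4 (the value given in the ALPHA field description), a trigram that never occurred but whose bigram would score 0.25 receives 0.4 × 0.25 = 0.10.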

From DoLookup(String, IEnumerable<BytesRef>, Boolean, Int32), the key of each result is the ngram token; the value is Int64.MaxValue * score (fixed point, cast to long). Divide by Int64.MaxValue to get the score back, which ranges from 0.0 to 1.0. The onlyMorePopular parameter is unused.

Note: this API is experimental.
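The fixed-point encoding can be sketched as below, assuming the scaling constant is Int64.MaxValue = 2^63 - 1 (the upstream Java Lucene documentation uses Long.MAX_VALUE here). The floor reflects truncation by the cast to long for non-negative scores.

```latex
\text{value} = \left\lfloor (2^{63}-1) \cdot \text{score} \right\rfloor,
\qquad
\text{score} \approx \frac{\text{value}}{2^{63}-1}
% Example: value = 4611686018427387904 = 2^{62} gives score \approx 0.5
```

Because a double cannot represent 2^63 - 1 exactly, the recovered score is an approximation of the original, which is adequate for ranking suggestions.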

Inheritance

System.Object

Lookup

FreeTextSuggester

Inherited Members

Lookup.CHARSEQUENCE_COMPARER

Lookup.Build(IDictionary)

Lookup.Load(Stream)

Lookup.Store(Stream)

Assembly: Lucene.Net.Suggest.dll

Syntax

public class FreeTextSuggester : Lookup

Constructors

Name	Description
FreeTextSuggester(Analyzer)	Instantiate, using the provided analyzer for both indexing and lookup, and a bigram model by default.
FreeTextSuggester(Analyzer, Analyzer)	Instantiate, using the provided indexing and lookup analyzers, and a bigram model by default.
FreeTextSuggester(Analyzer, Analyzer, Int32)	Instantiate, using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.).
FreeTextSuggester(Analyzer, Analyzer, Int32, Byte)	Instantiate, using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.). The `separator` is passed to SetTokenSeparator(String) to join multiple tokens into a single ngram token; it must be an ASCII (7-bit-clean) byte. No input token may contain this byte; otherwise an exception is thrown.

Fields

Name	Description
ALPHA	The constant used for backoff smoothing. During lookup, if a given trigram did not occur and we back off to the bigram, the overall score will be 0.4 times what the bigram model would have assigned.
CODEC_NAME	Codec name used in the header for the saved model.
DEFAULT_GRAMS	By default we use a bigram model.
DEFAULT_SEPARATOR	The default character used to join multiple tokens into a single ngram token. The input tokens produced by the analyzer must not contain this character.
VERSION_CURRENT	Current version of the saved model file format.
VERSION_START	Initial version of the saved model file format.

Properties

Name	Description
Count

Methods

Name	Description
Build(IInputIterator)
Build(IInputIterator, Double)	Build the suggest index, using up to the specified amount of temporary RAM while building. Note that the weights for the suggestions are ignored.
DoLookup(String, IEnumerable<BytesRef>, Boolean, Int32)
DoLookup(String, IEnumerable<BytesRef>, Int32)	Retrieve suggestions.
DoLookup(String, Boolean, Int32)
DoLookup(String, Int32)	Lookup, without any context.
Get(String)	Returns the weight associated with an input string, or null if it does not exist.
GetSizeInBytes()	Returns byte size of the underlying FST.
Load(DataInput)
Store(DataOutput)

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)