Class DefaultSimilarity

Expert: Default scoring implementation, which encodes norm values as a single byte (EncodeNormValue(Single)) before they are stored. At search time, the norm byte is read from the index Directory and decoded (DecodeNormValue(Int64)) back to a float norm value. This encoding/decoding reduces index size at the price of precision loss: it is not guaranteed that Decode(Encode(x)) = x. For instance, Decode(Encode(0.89)) = 0.75.

Compression of norm values to a single byte saves memory at search time, because once a field is referenced at search time, its norms - for all documents - are maintained in memory.
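The lossy single-byte round trip described above can be sketched in a few lines. This is a standalone illustration, not the Lucene.NET source: the class name `NormByteSketch` and its methods are hypothetical, and the bit layout is an assumption based on the scheme Lucene's SmallFloat utility is understood to use for the "3-bit mantissa, zero-exponent point at 15" format.

```csharp
using System;

// Standalone sketch; NOT the Lucene.NET implementation.
// Assumption: the byte keeps the low exponent bits and the top mantissa bits
// of an IEEE 754 float, offset so that the zero-exponent point sits at 15.
public static class NormByteSketch
{
    // Reinterpret a float's bits as an int, and back again.
    private static int ToBits(float f) => BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
    private static float FromBits(int bits) => BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);

    public static byte Encode(float f)
    {
        int bits = ToBits(f);
        int small = bits >> (24 - 3);        // drop the low-order mantissa bits
        int zeroPoint = (63 - 15) << 3;      // offset for the zero-exponent point
        if (small <= zeroPoint)
        {
            // Zero and negatives map to 0; tiny positives map to the smallest positive code.
            return (byte)(bits <= 0 ? 0 : 1);
        }
        if (small >= zeroPoint + 0x100)
        {
            return 0xFF;                     // too large: clamp to the largest code
        }
        return (byte)(small - zeroPoint);
    }

    public static float Decode(byte b)
    {
        if (b == 0) return 0f;               // zero has a dedicated code
        int bits = (b << (24 - 3)) + ((63 - 15) << 24);
        return FromBits(bits);
    }
}
```

Calling `NormByteSketch.Decode(NormByteSketch.Encode(x))` for a few positive values shows the effect: powers of two such as 1.0 survive unchanged, while most other in-range values come back slightly smaller because the discarded mantissa bits are truncated rather than rounded.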

The rationale for such lossy compression of norm values is that users find it difficult to express their true information need precisely in a query, so only big differences matter.

Finally, note that search time is too late to modify this norm part of scoring, for example by using a different Similarity for search, because norms are computed and stored at index time.

Inheritance

System.Object

Similarity

TFIDFSimilarity

DefaultSimilarity

SweetSpotSimilarity

Inherited Members

TFIDFSimilarity.IdfExplain(CollectionStatistics, TermStatistics)

TFIDFSimilarity.IdfExplain(CollectionStatistics, TermStatistics[])

TFIDFSimilarity.ComputeNorm(FieldInvertState)

TFIDFSimilarity.ComputeWeight(Single, CollectionStatistics, TermStatistics[])

TFIDFSimilarity.GetSimScorer(Similarity.SimWeight, AtomicReaderContext)

Assembly: DistributedLucene.Net.dll

Syntax

public class DefaultSimilarity : TFIDFSimilarity

Constructors

Name	Description
DefaultSimilarity()	Sole constructor: parameter-free

Fields

Name	Description
m_discountOverlaps	`True` if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.

Properties

Name	Description
DiscountOverlaps	Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing the norm. The default is true, meaning overlap tokens do not count when computing norms. @lucene.experimental
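To illustrate how this setting changes the length used for the norm, here is a minimal standalone sketch. It does not reference Lucene.NET types: `LengthNormSketch` and its parameters are hypothetical stand-ins for the field length, overlap count, and boost, and the `1/sqrt(numTerms)` length factor is an assumption, since this page does not spell it out.

```csharp
using System;

// Standalone illustration; does not reference Lucene.NET types.
public static class LengthNormSketch
{
    // length:     total number of tokens indexed for the field (hypothetical parameter).
    // numOverlap: tokens with a position increment of 0, e.g. injected synonyms.
    // boost:      index-time boost for the field.
    public static float LengthNorm(int length, int numOverlap, float boost, bool discountOverlaps)
    {
        // Overlap tokens are subtracted only when discounting is enabled.
        int numTerms = discountOverlaps ? length - numOverlap : length;

        // Assumption: the length factor is 1/sqrt(numTerms).
        return boost * (float)(1.0 / Math.Sqrt(numTerms));
    }
}
```

For a field with 10 tokens, 2 of which are overlaps, and a boost of 1, the sketch gives about 0.354 with discounting and about 0.316 without, so discounting keeps injected synonyms from penalizing the document for extra length.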

Methods

Name	Description
Coord(Int32, Int32)	Implemented as `overlap / maxOverlap`.
DecodeNormValue(Int64)	Decodes the norm value, assuming it is a single byte.
EncodeNormValue(Single)	Encodes a normalization factor for storage in an index. The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from around 7x10^9 to 2x10^-9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value.
Idf(Int64, Int64)	Implemented as `log(numDocs/(docFreq+1)) + 1`.
LengthNorm(FieldInvertState)	Implemented as `state.Boost * LengthNorm(numTerms)`, where `numTerms` is Length if DiscountOverlaps is `false`, else it's Length - NumOverlap. @lucene.experimental
QueryNorm(Single)	Implemented as `1/sqrt(sumOfSquaredWeights)`.
ScorePayload(Int32, Int32, Int32, BytesRef)	The default implementation returns `1`.
SloppyFreq(Int32)	Implemented as `1 / (distance + 1)`.
Tf(Single)	Implemented as `Math.Sqrt(freq)`.
ToString()
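The formulas listed in the table above can be written out directly. The following is a minimal standalone sketch rather than the library's code: the class name `TfIdfFormulasSketch` is hypothetical, `log` is assumed to be the natural logarithm, and the `Idf` parameter order (document frequency first, then document count) is an assumption.

```csharp
using System;

// Standalone sketch of the formulas in the Methods table; not the Lucene.NET source.
public static class TfIdfFormulasSketch
{
    // overlap / maxOverlap: fraction of query terms matched by the document.
    public static float Coord(int overlap, int maxOverlap) => overlap / (float)maxOverlap;

    // log(numDocs / (docFreq + 1)) + 1, assuming the natural logarithm.
    public static float Idf(long docFreq, long numDocs) =>
        (float)(Math.Log(numDocs / (double)(docFreq + 1)) + 1.0);

    // 1 / sqrt(sumOfSquaredWeights).
    public static float QueryNorm(float sumOfSquaredWeights) =>
        (float)(1.0 / Math.Sqrt(sumOfSquaredWeights));

    // 1 / (distance + 1): closer phrase matches contribute more.
    public static float SloppyFreq(int distance) => 1.0f / (distance + 1);

    // sqrt(freq): term frequency with diminishing returns.
    public static float Tf(float freq) => (float)Math.Sqrt(freq);
}
```

For example, `TfIdfFormulasSketch.Tf(4f)` evaluates to 2, and `TfIdfFormulasSketch.Idf(9, 10)` evaluates to 1 because the logarithm term is `log(10/10) = 0`.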

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)