Class BM25Similarity

BM25 similarity, introduced in: Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. "Okapi at TREC-3." In Proceedings of the Third Text REtrieval Conference (TREC 1994), Gaithersburg, USA, November 1994.

@lucene.experimental: this API is experimental and may change in incompatible ways in a future release.

Inheritance

System.Object

Similarity

BM25Similarity

Inherited Members

Similarity.Coord(Int32, Int32)

Similarity.QueryNorm(Single)

Assembly: DistributedLucene.Net.dll

Syntax

public class BM25Similarity : Similarity

Constructors

Name	Description
BM25Similarity()	BM25 with these default values: `k1 = 1.2`, `b = 0.75`.
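BM25Similarity(Single, Single)	BM25 with the supplied parameter values: `k1` controls non-linear term-frequency normalization (saturation), and `b` controls the degree to which document length normalizes term-frequency values.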
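To make the roles of `k1` and `b` concrete, here is a minimal C# sketch of the textbook BM25 contribution of a single query term to a single document. `Bm25Sketch` and `TermScore` are hypothetical names, not part of this API. The sketch takes plain numbers as input; BM25Similarity itself derives the document length from a lossy one-byte norm and also applies query boosts, so its scores may differ in detail.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the BM25Similarity API.
public static class Bm25Sketch
{
    // Textbook BM25 contribution of one term to one document.
    // idf:       inverse document frequency of the term
    // freq:      frequency of the term in the document's field
    // docLength: length of that field in this document (in terms)
    // avgLength: average field length across the collection
    public static double TermScore(double idf, double freq, double docLength,
                                   double avgLength, double k1 = 1.2, double b = 0.75)
    {
        // b decides how much the document length (relative to the average) matters:
        // b = 0 ignores length entirely, b = 1 normalizes fully by length.
        double lengthNorm = k1 * ((1 - b) + b * docLength / avgLength);

        // k1 controls saturation: freq * (k1 + 1) / (freq + lengthNorm)
        // grows with freq but never exceeds k1 + 1.
        return idf * freq * (k1 + 1) / (freq + lengthNorm);
    }
}
```

With `docLength == avgLength` and `freq = 1`, the term-frequency factor is exactly 1, so the score equals `idf`; with `b = 0`, the score no longer depends on document length at all.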

Properties

Name	Description
B	Gets the `b` parameter (the strength of document-length normalization).
DiscountOverlaps	Gets or sets whether overlap tokens (tokens with a position increment of 0) are ignored when computing norms. Defaults to `true`, meaning overlap tokens do not count toward the field length used for norms.
K1	Gets the `k1` parameter (term-frequency saturation).
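The following minimal C# sketch traces how `DiscountOverlaps` changes the field length, and how that length is then stored as a one-byte norm (`boost / sqrt(length)`) and decoded back (`1 / f²`), following the EncodeNormValue and DecodeNormValue descriptions in the Methods table. `NormSketch` and its members are hypothetical; the bit manipulation assumes the layout of Lucene's `SmallFloat.floatToByte315` and `byte315ToFloat` and is meant to illustrate the precision loss, not to replace the library's implementation.

```csharp
using System;

// Hypothetical sketch of the norm path; it re-implements the documented formulas
// and is not the library's code.
public static class NormSketch
{
    // Field length as counted for the norm. Overlap tokens have a position
    // increment of 0 (for example, synonyms injected at the same position).
    public static int FieldLength(int length, int numOverlap, bool discountOverlaps = true)
    {
        return discountOverlaps ? length - numOverlap : length;
    }

    // Encode boost / sqrt(length) into one byte, assuming the "315" small-float
    // layout of Lucene's SmallFloat.floatToByte315.
    public static byte EncodeNormValue(float boost, int fieldLength)
    {
        float f = boost / (float)Math.Sqrt(fieldLength);
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
        int smallFloat = bits >> 21;          // keep sign, exponent and top mantissa bits
        const int fzero = (63 - 15) << 3;     // offset mapping the exponent range onto a byte
        if (smallFloat <= fzero)
            return bits <= 0 ? (byte)0 : (byte)1; // zero/negative -> 0, underflow -> 1
        if (smallFloat >= fzero + 0x100)
            return 255;                       // overflow -> largest representable value
        return (byte)(smallFloat - fzero);
    }

    // Decode a norm byte to 1 / f^2, which approximates length / boost^2.
    public static float DecodeNormValue(byte b)
    {
        if (b == 0) return float.PositiveInfinity; // f = 0, so 1 / f^2 is infinite
        int bits = (b << 21) + ((63 - 15) << 24);
        float f = BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
        return 1.0f / (f * f);
    }
}
```

Under these assumptions, field lengths 3 and 4 both decode to 4, and a length of 2 decodes to about 2.56, so norms keep only a coarse approximation of the true length.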

Methods

Name	Description
AvgFieldLength(CollectionStatistics)	The default implementation computes the average as `sumTotalTermFreq / maxDoc`, or returns `1` if the index does not store sumTotalTermFreq (Lucene 3.x indexes or any field that omits frequency information).
ComputeNorm(FieldInvertState)
ComputeWeight(Single, CollectionStatistics, TermStatistics[])
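DecodeNormValue(Byte)	The default implementation returns `1 / f²`, where `f` is Byte315ToSingle(Byte) applied to the stored norm byte. Because the encoded value is `boost / sqrt(length)`, the decoded value approximates `length / boost²`.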
EncodeNormValue(Single, Int32)	The default implementation encodes `boost / sqrt(length)` with SingleToByte315(Single). This is compatible with Lucene's default implementation. If you change this, then you should change DecodeNormValue(Byte) to match.
GetSimScorer(Similarity.SimWeight, AtomicReaderContext)
Idf(Int64, Int64)	Implemented as `log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5))`, using the natural logarithm. The leading `1 +` keeps the result positive even for a term that appears in every document.
IdfExplain(CollectionStatistics, TermStatistics)	Computes a score factor for a simple term and returns an explanation for that score factor. The default implementation uses `Idf(docFreq, maxDoc)`, where docFreq comes from the TermStatistics and maxDoc from the CollectionStatistics. MaxDoc is used instead of Lucene.Net.Index.IndexReader.IntNumDocs because DocFreq is also used: both statistics still count deleted documents, so when DocFreq is inaccurate, MaxDoc is inaccurate in the same direction. In addition, MaxDoc is more efficient to compute.
IdfExplain(CollectionStatistics, TermStatistics[])	Computes a score factor for a phrase. The default implementation sums the idf factor for each term in the phrase.
ScorePayload(Int32, Int32, Int32, BytesRef)	The default implementation returns `1`.
SloppyFreq(Int32)	Implemented as `1 / (distance + 1)`.
ToString()
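The formulas in the Methods table can be checked with a minimal C# sketch that works on plain numbers instead of `CollectionStatistics` and `TermStatistics`. `Bm25Formulas`, `PhraseIdf`, and all parameter names are hypothetical. The sketch assumes, as in Lucene, that an unavailable `sumTotalTermFreq` is reported as a non-positive value (Lucene uses -1).

```csharp
using System;
using System.Linq;

// Hypothetical re-implementations of the documented formulas, for illustration only.
public static class Bm25Formulas
{
    // Idf(docFreq, numDocs) = ln(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5))
    public static float Idf(long docFreq, long numDocs)
    {
        return (float)Math.Log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Phrase idf: the sum of the per-term idf values, each computed against maxDoc.
    public static float PhraseIdf(long maxDoc, params long[] docFreqs)
    {
        return docFreqs.Sum(df => Idf(df, maxDoc));
    }

    // Average field length = sumTotalTermFreq / maxDoc, or 1 when the statistic
    // is unavailable (assumed here to be signalled by a value <= 0).
    public static float AvgFieldLength(long sumTotalTermFreq, long maxDoc)
    {
        if (sumTotalTermFreq <= 0) return 1f;
        return (float)(sumTotalTermFreq / (double)maxDoc);
    }

    // Sloppy phrase matches are weighted by 1 / (distance + 1): an exact match
    // (distance 0) counts fully, distance 1 counts half, and so on.
    public static float SloppyFreq(int distance)
    {
        return 1.0f / (distance + 1);
    }
}
```

For example, `SloppyFreq(1)` is `0.5`, and `Idf(10, 10)` is still positive (about 0.0465), reflecting the `1 +` inside the logarithm.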

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)