Namespace Lucene.Net.Search.PostingsHighlight
Classes
DefaultPassageFormatter
Creates a formatted snippet from the top passages.
The default implementation marks the query terms as bold, and places ellipses between unconnected passages.
Passage
Represents a passage (typically a sentence of the document).
A passage contains NumMatches highlights from the query, and the offsets and query terms that correspond with each match.
@lucene.experimental
PassageFormatter
Creates a formatted snippet from the top passages.
@lucene.experimental
PassageScorer
Ranks passages found by PostingsHighlighter.
Each passage is scored as a miniature document within the document.
The final score is computed as norm * ∑ (weight * tf). The default implementation is norm * BM25.
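Written out as an equation, the scoring rule above reads as follows. The symbols p (a passage) and t (a query term matched in that passage) are notation introduced here for illustration; the source names only norm, weight, and tf.

```latex
\[
  \mathrm{score}(p) \;=\; \mathrm{norm}(p) \cdot \sum_{t \in p} \mathrm{weight}(t) \cdot \mathrm{tf}(t, p)
\]
```

In the default implementation, the sum is a BM25-style score of the passage, and norm is a per-passage factor applied on top of it.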
@lucene.experimental
PostingsHighlighter
Simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields to be indexed with DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using IcuBreakIterator (for sentence breaking). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
You can customize the behavior by subclassing this highlighter. Some important hooks:
- GetBreakIterator(String): Customize how the text is divided into passages.
- GetScorer(String): Customize how passages are ranked.
- GetFormatter(String): Customize how snippets are formatted.
- GetIndexAnalyzer(String): Enable highlighting of MultiTermQuery subclasses such as WildcardQuery.
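As an illustration of the hooks listed above, a subclass might override two of them as in the minimal sketch below. The class name CustomHighlighter is hypothetical, and the tag strings and scorer parameters are arbitrary example values. The sketch assumes the hooks are protected virtual methods taking the field name, and that DefaultPassageFormatter and PassageScorer expose the constructors shown (pre-tag, post-tag, ellipsis, escape flag; and k1, b, pivot); verify these against the version of Lucene.NET you are using.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Hypothetical subclass for illustration only.
public class CustomHighlighter : PostingsHighlighter
{
    // Wrap matched terms in <em> tags instead of the default bold markup,
    // and join unconnected passages with " ... ".
    // Assumed constructor: (preTag, postTag, ellipsis, escape).
    protected override PassageFormatter GetFormatter(string field)
    {
        return new DefaultPassageFormatter("<em>", "</em>", " ... ", false);
    }

    // Rank passages with explicit BM25-style parameters.
    // Assumed constructor: (k1, b, pivot); the values are examples only.
    protected override PassageScorer GetScorer(string field)
    {
        return new PassageScorer(1.2f, 0.75f, 87f);
    }
}
```

An instance of such a subclass would be used exactly like PostingsHighlighter itself, since only the formatting and ranking steps change.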
WARNING: The code is very new and probably still has some exciting bugs!
Example usage:
// configure field with offsets at index time
// (FieldType is the concrete, mutable implementation of IndexableFieldType)
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
Field body = new Field("body", "foobar", offsetsType);
// retrieve highlights at query time
// (searcher is an existing IndexSearcher; n is the number of top hits to retrieve)
PostingsHighlighter highlighter = new PostingsHighlighter();
Query query = new TermQuery(new Term("body", "highlighting"));
TopDocs topDocs = searcher.Search(query, n);
string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);
This is thread-safe, and can be used across different readers.
@lucene.experimental
WholeBreakIterator
Produces a single fragment spanning the entire text.