Namespace Lucene.Net.Search.PostingsHighlight

Classes

DefaultPassageFormatter

Creates a formatted snippet from the top passages.

The default implementation marks the query terms in bold and places ellipses between unconnected passages.
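The following is a minimal sketch of that behavior in plain C#. It assumes a hypothetical tuple shape (a passage is `(Start, End, Matches)`, each match a `[Start, End)` character range sorted by `Start`), and the `Format` helper and its `preTag`/`postTag`/`ellipsis` defaults are illustrative assumptions rather than the Lucene.NET `Passage` or `DefaultPassageFormatter` API. Overlapping matches are handled by emitting only the part not already written.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical shapes for illustration (not the Lucene.NET Passage API):
// a passage is (Start, End, Matches); each match is a [Start, End) character
// range, and matches are sorted by Start.
string Format(string content,
              List<(int Start, int End, List<(int Start, int End)> Matches)> passages,
              string preTag = "<b>", string postTag = "</b>", string ellipsis = "... ")
{
    var sb = new StringBuilder();
    int pos = 0; // offset up to which content has been emitted
    foreach (var p in passages)
    {
        // No ellipsis before the first passage or between adjacent passages.
        if (p.Start > pos && sb.Length > 0)
            sb.Append(ellipsis);
        pos = p.Start;
        foreach (var m in p.Matches)
        {
            if (m.Start > pos)
                sb.Append(content, pos, m.Start - pos); // plain text before the match
            if (m.End > pos) // skip matches fully covered by an earlier overlapping one
            {
                int begin = Math.Max(pos, m.Start);
                sb.Append(preTag).Append(content, begin, m.End - begin).Append(postTag);
                pos = m.End;
            }
        }
        if (p.End > pos)
            sb.Append(content, pos, p.End - pos); // remainder of the passage
        pos = p.End;
    }
    return sb.ToString();
}

string text = "Hello world. Foo bar. Hello again.";
string snippet = Format(text, new List<(int Start, int End, List<(int Start, int End)> Matches)>
{
    (0, 13, new List<(int Start, int End)> { (0, 5) }),   // "Hello world. "
    (22, 34, new List<(int Start, int End)> { (22, 27) }) // "Hello again."
});
Console.WriteLine(snippet); // <b>Hello</b> world. ... <b>Hello</b> again.
```

The ellipsis is only inserted when a passage starts after the end of the previous one, so adjacent sentences are joined without a separator.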

Passage

Represents a passage (typically a sentence of the document).

A passage contains NumMatches highlights from the query, along with the offsets and query terms that correspond to each match.

@lucene.experimental

PassageFormatter

Creates a formatted snippet from the top passages.

@lucene.experimental

PassageScorer

Ranks passages found by PostingsHighlighter.

Each passage is scored as a miniature document within the document. The final score is computed as norm * ∑ (weight * tf). The default implementation is norm * BM25.

@lucene.experimental
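The formula norm * ∑ (weight * tf) can be illustrated with a minimal BM25-style sketch. Everything concrete here is an assumption for illustration: the constants `K1 = 1.2`, `B = 0.75` and `Pivot = 87`, the IDF-like `Weight` that treats the document as roughly `1 + contentLength / Pivot` pseudo-documents, and the positional `Norm`. It is not a statement of exactly what `PassageScorer` computes.

```csharp
using System;
using System.Collections.Generic;

// Assumed BM25-style constants (illustrative only).
const float K1 = 1.2f;     // term-frequency saturation
const float B = 0.75f;     // strength of passage-length normalization
const float Pivot = 87f;   // assumed typical passage length, in characters

// IDF-like weight: the single document is treated as a corpus of roughly
// 1 + contentLength / Pivot pseudo-documents.
float Weight(int contentLength, int totalTermFreq)
{
    float numDocs = 1 + contentLength / Pivot;
    return (K1 + 1) * (float)Math.Log(1 + (numDocs + 0.5) / (totalTermFreq + 0.5));
}

// Saturating term frequency, normalized by passage length.
float Tf(int freq, int passageLen)
{
    float lengthNorm = K1 * ((1 - B) + B * (passageLen / Pivot));
    return freq / (freq + lengthNorm);
}

// Positional norm: passages that start earlier score slightly higher.
float Norm(int passageStart) => 1 + 1 / (float)Math.Log(Pivot + passageStart);

// score = norm * sum over matched terms of (weight * tf)
float Score(int passageStart, int passageLen, int contentLength,
            IEnumerable<(int FreqInPassage, int FreqInDocument)> terms)
{
    float sum = 0;
    foreach (var t in terms)
        sum += Weight(contentLength, t.FreqInDocument) * Tf(t.FreqInPassage, passageLen);
    return Norm(passageStart) * sum;
}

// One query term that occurs 3 times in a 500-character document.
var matched = new[] { (FreqInPassage: 2, FreqInDocument: 3) };
float early = Score(0, 60, 500, matched);   // 60-character passage at the start
float late = Score(400, 60, 500, matched);  // same passage shape near the end
float sparse = Score(0, 60, 500, new[] { (FreqInPassage: 1, FreqInDocument: 3) });
Console.WriteLine($"early={early:F3} late={late:F3} sparse={sparse:F3}");
```

Because tf saturates, a second occurrence in the same passage adds less than the first; because the norm decreases with the passage's start offset, otherwise equal passages near the beginning of the document rank higher.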

PostingsHighlighter

Simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields indexed with IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

PostingsHighlighter treats the single original document as the whole corpus and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it uses IcuBreakIterator for sentence breaking. It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing the hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Finally, the passages are formatted into highlighted snippets with a PassageFormatter.
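The merge-and-coalesce step described above can be sketched in isolation. This is a simplified illustration, not the PostingsHighlighter implementation: the `Coalesce` helper and its inputs are hypothetical (`hitsByTerm` stands in for per-term offsets from the postings, `boundaries` for the output of a sentence break iterator), it uses .NET 6's `PriorityQueue<TElement, TPriority>` as the merge heap, and it omits end offsets, scoring, and top-N selection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical inputs: hitsByTerm maps each query term to the sorted start
// offsets of its occurrences; boundaries holds sorted passage boundaries that
// begin with 0 and end with the text length (as a sentence break iterator
// would supply).
List<(int Start, int End, List<(string Term, int Offset)> Hits)> Coalesce(
    Dictionary<string, int[]> hitsByTerm, int[] boundaries)
{
    // Min-heap keyed by offset, holding the next unread hit of every term.
    var queue = new PriorityQueue<(string Term, int Index), int>();
    foreach (var (term, termOffsets) in hitsByTerm)
        if (termOffsets.Length > 0)
            queue.Enqueue((term, 0), termOffsets[0]);

    var passages = new List<(int Start, int End, List<(string Term, int Offset)> Hits)>();
    int b = 0; // boundaries[b] is the start of the passage holding the current hit
    while (queue.TryDequeue(out var next, out int offset))
    {
        // Move forward to the passage [boundaries[b], boundaries[b + 1]) containing offset.
        while (b + 2 < boundaries.Length && offset >= boundaries[b + 1])
            b++;
        // Open a new passage when the hit lies beyond the most recent one.
        if (passages.Count == 0 || passages[^1].Start != boundaries[b])
            passages.Add((boundaries[b], boundaries[b + 1], new List<(string Term, int Offset)>()));
        passages[^1].Hits.Add((next.Term, offset));

        // Refill the heap with this term's following hit, if it has one.
        int[] offsets = hitsByTerm[next.Term];
        if (next.Index + 1 < offsets.Length)
            queue.Enqueue((next.Term, next.Index + 1), offsets[next.Index + 1]);
    }
    return passages;
}

var hits = new Dictionary<string, int[]>
{
    ["hello"] = new[] { 0, 22 },
    ["world"] = new[] { 6 },
};
int[] sentences = { 0, 13, 22, 34 }; // "Hello world. Foo bar. Hello again."
var result = Coalesce(hits, sentences);
foreach (var p in result)
    Console.WriteLine($"[{p.Start}, {p.End}): {string.Join(", ", p.Hits)}");
// [0, 13): (hello, 0), (world, 6)
// [22, 34): (hello, 22)
```

The heap always yields the globally smallest remaining offset, so hits arrive in document order regardless of which term they belong to; the sentence with no hits ("Foo bar.") never becomes a passage.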

You can customize the behavior by subclassing this highlighter. Some important hooks:

GetBreakIterator(String): Customize how the text is divided into passages.
GetScorer(String): Customize how passages are ranked.
GetFormatter(String): Customize how snippets are formatted.
GetIndexAnalyzer(String): Enable highlighting of MultiTermQuery instances such as WildcardQuery.

WARNING: The code is very new and probably still has some exciting bugs!

Example usage:

    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Lucene.Net.Search.PostingsHighlight;

    // configure field with offsets at index time
    FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
    offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
    Field body = new Field("body", "foobar", offsetsType);

    // retrieve highlights at query time
    // (searcher is an IndexSearcher over the index; n is the number of top hits)
    PostingsHighlighter highlighter = new PostingsHighlighter();
    Query query = new TermQuery(new Term("body", "highlighting"));
    TopDocs topDocs = searcher.Search(query, n);
    string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);

This class is thread-safe and can be used across different readers.

@lucene.experimental

WholeBreakIterator

Produces a single fragment covering the entire text.
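In terms of passage boundaries, the difference from sentence breaking can be sketched as follows. Both helpers are hypothetical stand-ins, not the Lucene.NET BreakIterator API: `WholeBoundaries` mimics the single-fragment behavior, and `NaiveSentenceBoundaries` is a deliberately crude splitter (assuming a sentence ends at `.`, `!` or `?` followed by whitespace) included only for contrast.

```csharp
using System;
using System.Collections.Generic;

// Whole-text behavior: a single passage spanning the entire text.
int[] WholeBoundaries(string text) => new[] { 0, text.Length };

// Naive sentence splitting, only for contrast (assumption: a sentence ends at
// '.', '!' or '?' followed by whitespace; real sentence break iterators handle
// abbreviations, quotes and many languages).
int[] NaiveSentenceBoundaries(string text)
{
    var result = new List<int> { 0 };
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] is '.' or '!' or '?')
        {
            int j = i + 1;
            while (j < text.Length && char.IsWhiteSpace(text[j]))
                j++; // trailing whitespace stays with the sentence it follows
            if (j > i + 1 && j < text.Length)
                result.Add(j);
        }
    }
    result.Add(text.Length);
    return result.ToArray();
}

string sample = "Hello world. Foo bar. Hello again.";
int[] whole = WholeBoundaries(sample);
int[] bySentence = NaiveSentenceBoundaries(sample);
Console.WriteLine(string.Join(", ", whole));      // 0, 34
Console.WriteLine(string.Join(", ", bySentence)); // 0, 13, 22, 34
```

With only two boundaries, every hit falls into the same passage, so the snippet is the whole field value with its matches marked; this may suit short fields where splitting into sentences gains nothing.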