Class PostingsHighlighter
Simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields to be indexed with IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using IcuBreakIterator (for sentence breaking). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
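
The grouping step described above can be pictured with a small, self-contained sketch. It is a toy model rather than the library's implementation: the `PassageSketch` and `ToyPassage` names are hypothetical, passage boundaries are passed in as plain offsets instead of coming from a BreakIterator, sorting the hits stands in for the merge over per-term postings, and the score is simply the hit count rather than whatever the real PassageScorer computes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a toy model of passage grouping, not the Lucene.NET code.
public static class PassageSketch
{
    // A passage covers the character range [Start, End) and records the hits inside it.
    public sealed class ToyPassage
    {
        public int Start;
        public int End;
        public List<string> Terms = new List<string>();

        // Hypothetical score: the number of hits in the passage.
        public double Score => Terms.Count;
    }

    // passageEnds: ascending end offsets of consecutive passages (e.g. sentence ends).
    // hits: (start offset, term) pairs for all query terms, in any order.
    public static List<ToyPassage> Group(
        int[] passageEnds, IEnumerable<(int Offset, string Term)> hits)
    {
        var result = new List<ToyPassage>();
        ToyPassage current = null;
        int boundary = 0; // index of the passage currently being filled

        // Visiting hits in offset order stands in for merging the per-term postings.
        foreach (var hit in hits.OrderBy(h => h.Offset))
        {
            // Advance to the passage that contains this hit.
            while (boundary < passageEnds.Length && hit.Offset >= passageEnds[boundary])
            {
                boundary++;
            }
            if (boundary == passageEnds.Length)
            {
                break; // the hit lies beyond the last passage
            }

            int start = boundary == 0 ? 0 : passageEnds[boundary - 1];
            if (current == null || current.Start != start)
            {
                // First hit in a new passage: open it.
                current = new ToyPassage { Start = start, End = passageEnds[boundary] };
                result.Add(current);
            }
            current.Terms.Add(hit.Term);
        }

        // Rank passages, best score first (ties keep document order).
        return result.OrderByDescending(p => p.Score).ToList();
    }
}
```

Calling `PassageSketch.Group` with three passage ends and a handful of hits returns one `ToyPassage` per passage that contains at least one hit, ordered by score; passages without hits never appear.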
You can customize the behavior by subclassing this highlighter. Some important hooks:
- GetBreakIterator(String): Customize how the text is divided into passages.
- GetScorer(String): Customize how passages are ranked.
- GetFormatter(String): Customize how snippets are formatted.
- GetIndexAnalyzer(String): Enable highlighting of MultiTermQuerys such as WildcardQuery.
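
As a rough illustration of these hooks, a subclass might look like the sketch below. It rests on several assumptions that should be checked against the installed version: that the hooks are protected virtual methods taking the field name, that the types live in the `Lucene.Net.Search.PostingsHighlight` namespace, that a `DefaultPassageFormatter(preTag, postTag, ellipsis, escape)` constructor is available, and that GetMultiValuedSeparator returns a `char`. The `MarkHighlighter` name is hypothetical.

```csharp
using Lucene.Net.Search.PostingsHighlight;

// Sketch only: signatures and the DefaultPassageFormatter constructor are assumptions.
public class MarkHighlighter : PostingsHighlighter
{
    // Wrap hits in <mark> tags and HTML-escape the surrounding text.
    protected override PassageFormatter GetFormatter(string field)
    {
        return new DefaultPassageFormatter("<mark>", "</mark>", "... ", true);
    }

    // Separate the values of a multi-valued field with a paragraph separator
    // so that a passage does not span two values.
    protected override char GetMultiValuedSeparator(string field)
    {
        return '\u2029'; // U+2029 PARAGRAPH SEPARATOR
    }
}
```

An instance of such a subclass would then be used in place of `new PostingsHighlighter()` at query time.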
WARNING: The code is very new and probably still has some exciting bugs!
Example usage:
// configure field with offsets at index time
FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
Field body = new Field("body", "foobar", offsetsType);
// retrieve highlights at query time
PostingsHighlighter highlighter = new PostingsHighlighter();
Query query = new TermQuery(new Term("body", "highlighting"));
TopDocs topDocs = searcher.Search(query, n);
string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);
This class is thread-safe and can be used across different readers. This API is experimental.
Inheritance
Assembly: Lucene.Net.ICU.dll
Syntax
public class PostingsHighlighter : object
Constructors
Name | Description |
---|---|
PostingsHighlighter() | Creates a new highlighter with DEFAULT_MAX_LENGTH. |
PostingsHighlighter(Int32) | Creates a new highlighter, specifying maximum content length. |
Fields
Name | Description |
---|---|
DEFAULT_MAX_LENGTH | Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content. |
Methods
Name | Description |
---|---|
GetBreakIterator(String) | Returns the BreakIterator to use for dividing text into passages. This instantiates an IcuBreakIterator by default; subclasses can override to customize. |
GetEmptyHighlight(String, BreakIterator, Int32) | Called to summarize a document when no hits were found. By default this just returns the first maxPassages sentences; subclasses can override to customize. |
GetFormatter(String) | Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize. |
GetIndexAnalyzer(String) | Returns the analyzer originally used to index the content for the given field. This is used to highlight some MultiTermQuerys. |
GetMultiValuedSeparator(String) | Returns the logical separator between values for multi-valued fields. The default value is a space character, which means passages can span across values, but a subclass can override, for example with U+2029 PARAGRAPH SEPARATOR (PS) if each value holds a discrete passage for highlighting. |
GetScorer(String) | Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize. |
Highlight(String, Query, IndexSearcher, TopDocs) | Highlights the top passages from a single field. |
Highlight(String, Query, IndexSearcher, TopDocs, Int32) | Highlights the top-N passages from a single field. |
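HighlightFields(String[], Query, IndexSearcher, TopDocs) | Highlights the top passages from multiple fields. Conceptually, this behaves as a more efficient form of calling Highlight(String, Query, IndexSearcher, TopDocs) once per field and collecting the results in a map keyed by field name. |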
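HighlightFields(String[], Query, IndexSearcher, TopDocs, Int32[]) | Highlights the top-N passages from multiple fields. Conceptually, this behaves as a more efficient form of calling Highlight(String, Query, IndexSearcher, TopDocs, Int32) once per field, with that field's passage limit, and collecting the results in a map keyed by field name. |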
HighlightFields(String[], Query, IndexSearcher, Int32[], Int32[]) | Highlights the top-N passages from multiple fields, for the provided int[] docids. |
HighlightFieldsAsObjects(String[], Query, IndexSearcher, Int32[], Int32[]) | Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom objects as returned by the PassageFormatter. Use this API to render to something other than String. |
LoadFieldValues(IndexSearcher, String[], Int32[], Int32) | Loads the string values for each field and docID pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate the string[fields.Length][docids.Length] array and fill all values. The returned strings must be identical to what was indexed. |
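
The per-field loop that HighlightFields(String[], Query, IndexSearcher, TopDocs) is described as a more efficient form of could be sketched as follows. The `HighlightFieldsSketch` helper is hypothetical, and the dictionary return type is an assumption about how the per-field results are grouped; only the single-field Highlight call is taken from the usage example above.

```csharp
using System.Collections.Generic;
using Lucene.Net.Search;
using Lucene.Net.Search.PostingsHighlight;

// Sketch only: a naive per-field equivalent, not the library's implementation.
public static class HighlightFieldsSketch
{
    public static IDictionary<string, string[]> PerField(
        PostingsHighlighter highlighter,
        string[] fields,
        Query query,
        IndexSearcher searcher,
        TopDocs topDocs)
    {
        var result = new Dictionary<string, string[]>();
        foreach (string field in fields)
        {
            // One full highlighting pass per field; the multi-field overload
            // is documented as doing the same work more efficiently.
            result[field] = highlighter.Highlight(field, query, searcher, topDocs);
        }
        return result;
    }
}
```

Each array in the result holds one snippet per document in topDocs, in the same order as the hits.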