Class PostingsHighlighter

Simple highlighter that neither analyzes fields nor uses term vectors. Instead, it requires fields to be indexed with IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.

PostingsHighlighter treats the single original document as the whole corpus, and then scores individual passages as if they were documents in this corpus. It uses a BreakIterator to find passages in the text; by default it breaks using IcuBreakIterator (for sentence breaking). It then iterates in parallel (merge sorting by offset) through the positions of all terms from the query, coalescing those hits that occur in a single passage into a Passage, and then scores each Passage using a separate PassageScorer. Passages are finally formatted into highlighted snippets with a PassageFormatter.
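The coalescing step described above can be illustrated with a small, dependency-free sketch. This is not the library's implementation: `PassageSketch`, `SplitSentences`, and `Coalesce` are hypothetical names, the punctuation-based splitter only stands in for a real sentence BreakIterator, and a plain hit count stands in for the PassageScorer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; none of these names exist in Lucene.NET.
public static class PassageSketch
{
    // Stand-in for a sentence BreakIterator: a span ends after '.', '!' or '?'.
    public static List<(int Start, int End)> SplitSentences(string text)
    {
        var spans = new List<(int Start, int End)>();
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '.' || text[i] == '!' || text[i] == '?')
            {
                spans.Add((start, i + 1));
                start = i + 1;
            }
        }
        if (start < text.Length)
        {
            spans.Add((start, text.Length));
        }
        return spans;
    }

    // Merges the per-term start offsets into one offset-ordered stream,
    // counts the hits that fall inside each sentence span, and ranks the
    // resulting passages by hit count (ties broken by position).
    public static List<(int Start, int End, int Hits)> Coalesce(
        string text, IEnumerable<IEnumerable<int>> offsetsPerTerm)
    {
        var sentences = SplitSentences(text);
        // Simplification: flatten and sort instead of a true k-way merge.
        var merged = offsetsPerTerm.SelectMany(o => o).OrderBy(o => o).ToList();
        var passages = new List<(int Start, int End, int Hits)>();
        int s = 0;
        int hits = 0;
        foreach (int offset in merged)
        {
            // Advance to the sentence containing this offset, emitting
            // the previous sentence as a passage if it collected any hits.
            while (s < sentences.Count && offset >= sentences[s].End)
            {
                if (hits > 0)
                {
                    passages.Add((sentences[s].Start, sentences[s].End, hits));
                }
                hits = 0;
                s++;
            }
            if (s == sentences.Count)
            {
                break; // offset lies beyond the end of the text
            }
            hits++;
        }
        if (hits > 0 && s < sentences.Count)
        {
            passages.Add((sentences[s].Start, sentences[s].End, hits));
        }
        return passages
            .OrderByDescending(p => p.Hits)
            .ThenBy(p => p.Start)
            .ToList();
    }
}
```

Calling `PassageSketch.Coalesce(text, offsets)` with one offset list per query term returns the sentence spans that contain at least one hit, best first. The real highlighter reads these offsets from the postings rather than from precomputed lists, and its scorer weighs more than raw hit counts.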

You can customize the behavior by subclassing this highlighter. Some important hooks:

GetBreakIterator(String): Customize how the text is divided into passages.
GetScorer(String): Customize how passages are ranked.
GetFormatter(String): Customize how snippets are formatted.
GetIndexAnalyzer(String): Enable highlighting of MultiTermQuerys such as WildcardQuery.
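As a minimal sketch of one of these hooks, the subclass below returns the index-time analyzer from GetIndexAnalyzer so that queries such as WildcardQuery can be highlighted. It assumes the hook is declared `protected virtual` with an `Analyzer` return type and that the class lives in the `Lucene.Net.Search.PostingsHighlight` namespace; neither is stated on this page, so check both against the assembly you reference. `MultiTermAwareHighlighter` is a hypothetical name.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Search.PostingsHighlight; // assumed namespace

// Hypothetical subclass, for illustration only.
public class MultiTermAwareHighlighter : PostingsHighlighter
{
    private readonly Analyzer indexAnalyzer;

    // Pass in the same analyzer that was used to index the content.
    public MultiTermAwareHighlighter(Analyzer indexAnalyzer)
    {
        this.indexAnalyzer = indexAnalyzer;
    }

    // Assumed signature: protected virtual Analyzer GetIndexAnalyzer(string field).
    protected override Analyzer GetIndexAnalyzer(string field)
    {
        return indexAnalyzer;
    }
}
```

The other hooks can be overridden the same way, each receiving the field name so that behavior can vary per field.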

WARNING: The code is very new and probably still has some exciting bugs!

Example usage:

    // configure field with offsets at index time
    FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
    offsetsType.IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
    Field body = new Field("body", "foobar", offsetsType);

    // retrieve highlights at query time
    PostingsHighlighter highlighter = new PostingsHighlighter();
    Query query = new TermQuery(new Term("body", "highlighting"));
    TopDocs topDocs = searcher.Search(query, n); // n: maximum number of hits to return
    string[] highlights = highlighter.Highlight("body", query, searcher, topDocs);

This class is thread-safe and can be used across different readers. This API is experimental and may change in incompatible ways in a future release.

Inheritance

System.Object

PostingsHighlighter

Assembly: Lucene.Net.ICU.dll

Syntax

public class PostingsHighlighter : object

Constructors

Name	Description
PostingsHighlighter()	Creates a new highlighter with DEFAULT_MAX_LENGTH.
PostingsHighlighter(Int32)	Creates a new highlighter, specifying maximum content length.

Fields

Name	Description
DEFAULT_MAX_LENGTH	Default maximum content size to process. Typically, snippets closer to the beginning of the document better summarize its content.

Methods

Name	Description
GetBreakIterator(String)	Returns the BreakIterator to use for dividing text into passages. This instantiates an IcuBreakIterator by default; subclasses can override to customize.
GetEmptyHighlight(String, BreakIterator, Int32)	Called to summarize a document when no hits were found. By default this just returns the first `maxPassages` sentences; subclasses can override to customize.
GetFormatter(String)	Returns the PassageFormatter to use for formatting passages into highlighted snippets. This returns a new PassageFormatter by default; subclasses can override to customize.
GetIndexAnalyzer(String)	Returns the analyzer originally used to index the content for `field`. This is used to highlight some MultiTermQuerys.
GetMultiValuedSeparator(String)	Returns the logical separator between values for multi-valued fields. The default value is a space character, which means passages can span across values, but a subclass can override, for example with `U+2029 PARAGRAPH SEPARATOR (PS)` if each value holds a discrete passage for highlighting.
GetScorer(String)	Returns the PassageScorer to use for ranking passages. This returns a new PassageScorer by default; subclasses can override to customize.
Highlight(String, Query, IndexSearcher, TopDocs)	Highlights the top passages from a single field.
Highlight(String, Query, IndexSearcher, TopDocs, Int32)	Highlights the top-N passages from a single field.
HighlightFields(String[], Query, IndexSearcher, TopDocs)	Highlights the top passages from multiple fields. Conceptually, this behaves as a more efficient form of: `IDictionary<string, string[]> m = new Dictionary<string, string[]>(); foreach (string field in fields) { m[field] = Highlight(field, query, searcher, topDocs); } return m;`
HighlightFields(String[], Query, IndexSearcher, TopDocs, Int32[])	Highlights the top-N passages from multiple fields. Conceptually, this behaves as a more efficient form of: `IDictionary<string, string[]> m = new Dictionary<string, string[]>(); foreach (string field in fields) { m[field] = Highlight(field, query, searcher, topDocs, maxPassages); } return m;`
HighlightFields(String[], Query, IndexSearcher, Int32[], Int32[])	Highlights the top-N passages from multiple fields, for the provided int[] docids.
HighlightFieldsAsObjects(String[], Query, IndexSearcher, Int32[], Int32[])	Expert: highlights the top-N passages from multiple fields, for the provided int[] docids, to custom objects as returned by the PassageFormatter. Use this API to render to something other than String.
LoadFieldValues(IndexSearcher, String[], Int32[], Int32)	Loads the string values for each field and docID pair to be highlighted. By default this loads from stored fields, but a subclass can change the source. This method should allocate a string array of dimensions [fields.Length][docids.Length] and fill all values. The returned strings must be identical to what was indexed.
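The GetMultiValuedSeparator description above mentions overriding the separator with U+2029 when each value holds a discrete passage. A minimal sketch of such an override follows, assuming the method is declared `protected virtual` and returns `char`, and that the class lives in the `Lucene.Net.Search.PostingsHighlight` namespace (neither is stated on this page). `PerValueHighlighter` is a hypothetical name.

```csharp
using Lucene.Net.Search.PostingsHighlight; // assumed namespace

// Hypothetical subclass, for illustration only.
public class PerValueHighlighter : PostingsHighlighter
{
    // Assumed signature: protected virtual char GetMultiValuedSeparator(string field).
    protected override char GetMultiValuedSeparator(string field)
    {
        // U+2029 PARAGRAPH SEPARATOR, so that a break iterator that honors
        // paragraph boundaries does not let a passage span two values.
        return '\u2029';
    }
}
```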

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)