Method GetTokenStream
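GetTokenStream(Terms)
Convenience overload that makes no assumptions about token position sequences; equivalent to calling GetTokenStream(vector, false).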
Declaration
public static TokenStream GetTokenStream(Terms vector)
Parameters
| Type | Name | Description |
|---|---|---|
| Terms | vector | |
Returns
| Type | Description |
|---|---|
| TokenStream | |
GetTokenStream(Terms, Boolean)
Low-level API. Returns a token stream generated from a Terms instance (a term vector). This can be used to feed the highlighter with a pre-parsed token stream. The Terms instance must have offsets available.
In my tests, the times to recreate 1000 token streams using this method were:

- with TermVector offset-only data stored: 420 milliseconds
- with TermVector offset AND position data stored: 271 milliseconds (note: the timings for TermVector with position data are based on a tokenizer with contiguous positions, that is, no overlaps or gaps)
For comparison, the cost of not using TermPositionVector to store pre-parsed content, and instead using an analyzer to re-parse the original content:

- reanalyzing the original content: 980 milliseconds

The re-analysis timings will typically vary depending on:

- the complexity of the analyzer code (the timings above were measured with a stemmer/lowercaser/stopword combination);
- the number of other fields (Lucene reads ALL fields off the disk when accessing just one document field, which can be costly);
- the use of compression on field storage, which could be faster (less disk I/O) or slower (more CPU time) depending on the content.
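The measurements above can be approximated with a small Stopwatch harness. The following is a minimal sketch, assuming Lucene.NET 4.8 (the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.Highlighter packages); the class `TokenStreamTimingSketch`, the field name `body` and the single-document in-memory `RAMDirectory` index are hypothetical choices for illustration. Each stream is fully consumed so that both paths do comparable work. Absolute timings will depend on hardware, analyzer and document size, and are not expected to match the figures quoted above.

```csharp
using System;
using System.Diagnostics;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class TokenStreamTimingSketch
{
    // Hypothetical field name, used only for this sketch.
    private const string FieldName = "body";

    // Consumes a token stream to the end, disposes it, and returns its token count.
    private static int Drain(TokenStream ts)
    {
        int count = 0;
        ts.Reset();
        while (ts.IncrementToken())
        {
            count++;
        }
        ts.End();
        ts.Dispose();
        return count;
    }

    public static (TimeSpan FromTermVector, TimeSpan FromReanalysis) Run(string text, int iterations = 1000)
    {
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var dir = new RAMDirectory();

        // Store the text plus a term vector with offsets AND positions.
        var fieldType = new FieldType(TextField.TYPE_STORED)
        {
            StoreTermVectors = true,
            StoreTermVectorPositions = true,
            StoreTermVectorOffsets = true
        };
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
        {
            var doc = new Document();
            doc.Add(new Field(FieldName, text, fieldType));
            writer.AddDocument(doc);
            writer.Commit();
        }

        using var reader = DirectoryReader.Open(dir);
        const int docId = 0; // the only document in the index

        // Path 1: rebuild each stream from the stored term vector.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Drain(TokenSources.GetTokenStream(reader.GetTermVector(docId, FieldName), false));
        }
        TimeSpan fromTermVector = sw.Elapsed;

        // Path 2: load the stored document and re-analyze the original text.
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            Drain(TokenSources.GetTokenStream(reader, docId, FieldName, analyzer));
        }
        TimeSpan fromReanalysis = sw.Elapsed;

        return (fromTermVector, fromReanalysis);
    }
}
```

Calling `TokenStreamTimingSketch.Run(text)` returns both elapsed times. Because the re-analysis path goes through the `IndexReader` overload, it also pays for loading the stored document on every iteration.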
Declaration
public static TokenStream GetTokenStream(Terms tpv, bool tokenPositionsGuaranteedContiguous)
Parameters
| Type | Name | Description |
|---|---|---|
| Terms | tpv | |
| System.Boolean | tokenPositionsGuaranteedContiguous | true if the token position numbers have no overlaps or gaps. If you are looking to eke out the last drops of performance, set to true. If in doubt, set to false. |
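Whether positions are contiguous is decided by the analyzer used at index time: a stop-word filter leaves a gap where a word was removed, and a synonym filter stacks extra tokens on an existing position (an overlap). The sketch below, assuming Lucene.NET 4.8, spot-checks a sample text by reading each token's position increment; `PositionContiguityCheck.HasContiguousPositions` is a hypothetical helper, not part of `TokenSources`. A contiguous sample does not prove that every document is contiguous, so when in doubt pass `false`.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class PositionContiguityCheck
{
    // Hypothetical helper: returns true if every token in the analyzed sample
    // advances the position by exactly 1. An increment of 0 is an overlap
    // (e.g. a stacked synonym); an increment greater than 1 is a gap
    // (e.g. a removed stop word).
    public static bool HasContiguousPositions(Analyzer analyzer, string field, string sampleText)
    {
        bool contiguous = true;
        using (TokenStream ts = analyzer.GetTokenStream(field, sampleText))
        {
            var posIncAtt = ts.AddAttribute<IPositionIncrementAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                if (posIncAtt.PositionIncrement != 1)
                {
                    // Keep consuming so the stream is ended and disposed properly.
                    contiguous = false;
                }
            }
            ts.End();
        }
        return contiguous;
    }
}
```

With `StandardAnalyzer` and its default English stop words, a sample such as "the quick brown fox" should report a gap, because "the" is removed before "quick"; `WhitespaceAnalyzer` keeps every word and so produces no gap.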
Returns
| Type | Description |
|---|---|
| TokenStream | |
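As a usage sketch, assuming Lucene.NET 4.8 (the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.Highlighter packages), the following indexes one document with a term vector that includes offsets and positions, rebuilds its token stream from that term vector, and feeds the stream to the highlighter. The class name `TermVectorHighlightSketch`, the `body` field and the in-memory index are hypothetical.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class TermVectorHighlightSketch
{
    private const string FieldName = "body"; // hypothetical field name

    // Returns the best highlighted fragment of `text` for a single-term query,
    // using a token stream rebuilt from the stored term vector.
    public static string Highlight(string text, string queryTerm)
    {
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var dir = new RAMDirectory();

        // Offsets are mandatory for GetTokenStream(Terms, bool); positions are optional.
        var fieldType = new FieldType(TextField.TYPE_STORED)
        {
            StoreTermVectors = true,
            StoreTermVectorPositions = true,
            StoreTermVectorOffsets = true
        };

        using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
        {
            var doc = new Document();
            doc.Add(new Field(FieldName, text, fieldType));
            writer.AddDocument(doc);
            writer.Commit();
        }

        using var reader = DirectoryReader.Open(dir);
        const int docId = 0; // the only document in the index

        Terms tpv = reader.GetTermVector(docId, FieldName);
        TokenStream tokenStream = TokenSources.GetTokenStream(tpv, tokenPositionsGuaranteedContiguous: false);

        var query = new TermQuery(new Term(FieldName, queryTerm));
        var highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query, FieldName));

        // The original text is still required: the token offsets index into it.
        string storedText = reader.Document(docId).Get(FieldName);
        return highlighter.GetBestFragment(tokenStream, storedText);
    }
}
```

Only the analysis step is skipped: the stored text is still read, because the highlighter cuts its fragments out of the original string using the token offsets.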
GetTokenStream(IndexReader, Int32, String, Analyzer)
Declaration
public static TokenStream GetTokenStream(IndexReader reader, int docId, string field, Analyzer analyzer)
Parameters
| Type | Name | Description |
|---|---|---|
| IndexReader | reader | |
| System.Int32 | docId | |
| System.String | field | |
| Analyzer | analyzer | |
Returns
| Type | Description |
|---|---|
| TokenStream | |
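This overload works from the stored field value and an analyzer rather than from a term vector, so it can serve fields indexed without term vectors as long as the field value itself is stored. A minimal sketch, assuming Lucene.NET 4.8; `StoredFieldReanalysisSketch`, its `Demo` helper and the `body` field are hypothetical, and the field is assumed to be indexed with `Field.Store.YES`.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class StoredFieldReanalysisSketch
{
    private const string FieldName = "body"; // hypothetical field name

    // Re-analyzes the stored value of one field of one document and returns its terms.
    public static List<string> ReadTerms(IndexReader reader, int docId, string field, Analyzer analyzer)
    {
        var terms = new List<string>();
        using (TokenStream ts = TokenSources.GetTokenStream(reader, docId, field, analyzer))
        {
            var termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();
        }
        return terms;
    }

    // Builds a one-document index whose field is stored but has NO term vectors,
    // then re-analyzes that field through the IndexReader overload.
    public static List<string> Demo(string text)
    {
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var dir = new RAMDirectory();
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
        {
            var doc = new Document();
            doc.Add(new TextField(FieldName, text, Field.Store.YES));
            writer.AddDocument(doc);
            writer.Commit();
        }
        using var reader = DirectoryReader.Open(dir);
        return ReadTerms(reader, 0, FieldName, analyzer);
    }
}
```

`Demo("The Quick Brown Fox")` should yield the lowercased terms `quick`, `brown` and `fox`, since `StandardAnalyzer` drops the stop word "the".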
GetTokenStream(Document, String, Analyzer)
Declaration
public static TokenStream GetTokenStream(Document doc, string field, Analyzer analyzer)
Parameters
| Type | Name | Description |
|---|---|---|
| Document | doc | |
| System.String | field | |
| Analyzer | analyzer | |
Returns
| Type | Description |
|---|---|
| TokenStream | |
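When the stored `Document` has already been loaded, for example while iterating over search hits, this overload lets the same instance supply both the token stream and the text to highlight, rather than loading the document once inside the `IndexReader` overload and again to fetch the text. A minimal sketch, assuming Lucene.NET 4.8; the class `HitHighlightingSketch`, the `body` field and the sample documents are hypothetical.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class HitHighlightingSketch
{
    // Highlights the top hits, loading each stored Document exactly once and
    // reusing it for both the token stream and the original text.
    public static List<string> HighlightHits(IndexSearcher searcher, Query query, string field, Analyzer analyzer, int maxHits)
    {
        var highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query, field));
        var fragments = new List<string>();
        foreach (ScoreDoc hit in searcher.Search(query, maxHits).ScoreDocs)
        {
            Document doc = searcher.Doc(hit.Doc);
            TokenStream ts = TokenSources.GetTokenStream(doc, field, analyzer);
            // Assumption: the highlighter ends and disposes the stream it consumes,
            // so the analyzer can hand out a fresh stream for the next hit.
            fragments.Add(highlighter.GetBestFragment(ts, doc.Get(field)));
        }
        return fragments;
    }

    public static List<string> Demo()
    {
        const string field = "body"; // hypothetical field name
        using var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using var dir = new RAMDirectory();
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
        {
            foreach (string text in new[] { "The quick brown fox", "A lazy dog" })
            {
                var doc = new Document();
                doc.Add(new TextField(field, text, Field.Store.YES));
                writer.AddDocument(doc);
            }
            writer.Commit();
        }
        using var reader = DirectoryReader.Open(dir);
        var searcher = new IndexSearcher(reader);
        return HighlightHits(searcher, new TermQuery(new Term(field, "fox")), field, analyzer, 10);
    }
}
```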
GetTokenStream(String, String, Analyzer)
Declaration
public static TokenStream GetTokenStream(string field, string contents, Analyzer analyzer)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | field | |
| System.String | contents | |
| Analyzer | analyzer | |
Returns
| Type | Description |
|---|---|
| TokenStream | |
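Judging by its parameters, this overload needs no index at all: it analyzes `contents` as if it were the value of `field`, which is convenient for text kept outside Lucene. The sketch below, assuming Lucene.NET 4.8, lists each term with its character offsets into `contents`; `StringAnalysisSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Search.Highlight;

public static class StringAnalysisSketch
{
    // Analyzes arbitrary text as if it were the value of `field` and returns each
    // term with its start and end character offsets into `contents`.
    public static List<(string Term, int Start, int End)> Analyze(string field, string contents, Analyzer analyzer)
    {
        var tokens = new List<(string Term, int Start, int End)>();
        using (TokenStream ts = TokenSources.GetTokenStream(field, contents, analyzer))
        {
            var termAtt = ts.AddAttribute<ICharTermAttribute>();
            var offsetAtt = ts.AddAttribute<IOffsetAttribute>();
            ts.Reset();
            while (ts.IncrementToken())
            {
                tokens.Add((termAtt.ToString(), offsetAtt.StartOffset, offsetAtt.EndOffset));
            }
            ts.End();
        }
        return tokens;
    }
}
```

The field name only changes the result when the analyzer behaves differently per field, for example a `PerFieldAnalyzerWrapper`.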