Class TrecContentSource
Implements a ContentSource over the TREC collection.
Assembly: Lucene.Net.Benchmark.dll
Syntax
public class TrecContentSource : ContentSource
Remarks
Supports the following configuration parameters (on top of ContentSource):
- work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
- docs.dir: specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
- trec.doc.parser: specifies the TrecDocParser class to use for parsing the content of the TREC documents (default=TrecGov2Parser).
- html.parser: specifies the IHTMLParser class to use for parsing the HTML parts of the TREC documents (default=DemoHTMLParser).
- content.source.encoding: if not specified, ISO-8859-1 is used.
- if true, do not append the iteration number to docname.
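As a rough illustration of how the directory and encoding parameters above might be set, the fragment below uses the key=value properties syntax of the benchmark's algorithm (.alg) files. That syntax and the comment style are assumptions, not something this page states; the values are simply the documented defaults written out explicitly. The class-valued parameters (trec.doc.parser, html.parser) are left out because this page does not show how a class name should be written.

```
# Hypothetical excerpt; values are the defaults listed above.
# docs.dir is relative here, so work.dir must also be set.
work.dir=work
docs.dir=trec
# Same as leaving the parameter unset.
content.source.encoding=ISO-8859-1
```

Omitting all three lines should give the same behavior, since each value matches its default.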
Fields
Name | Description |
---|---|
DOC | |
DOCNO | |
NEW_LINE | separator between lines in the buffer |
TERMINATING_DOC | |
TERMINATING_DOCNO | |
Methods
Name | Description |
---|---|
Dispose(Boolean) | |
GetNextDocData(DocData) | |
ParseDate(String) | |
ResetInputs() | |
SetConfig(Config) | |