Class TrecDocParser
Parser for TREC document content, invoked on the document text excluding <DOC> and <DOCNO>, which are handled in TrecContentSource. Implementations are required to be stateless and hence thread-safe.
Inheritance
System.Object
TrecDocParser
Assembly: Lucene.Net.Benchmark.dll
Syntax
public abstract class TrecDocParser : object
Fields
Name | Description |
---|---|
DEFAULT_PATH_TYPE | TREC parser type used for unknown extensions. |
Methods
Name | Description |
---|---|
Extract(StringBuilder, String, String, Int32, String[]) | Extract from the given StringBuilder the text of interest within the specified tags. |
Parse(DocData, String, TrecContentSource, StringBuilder, TrecDocParser.ParsePathType) | Parse the text prepared in docBuf into a result DocData; no synchronization is required. |
PathType(FileInfo) | Compute the path type of a file by inspecting the name of the file and its parents. |
StripTags(StringBuilder, Int32) | Strip tags from the given StringBuilder: each tag is replaced by a single blank. |
StripTags(String, Int32) | Strip tags from input. |
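
The tag-stripping rule described above ("each tag is replaced by a single blank") can be illustrated with a small standalone sketch. This is not the Lucene.Net implementation: the class `TagStripSketch` is hypothetical, and it assumes that the Int32 argument is the offset at which stripping starts and that a tag is any run from `<` to the next `>`.

```csharp
using System;
using System.Text;

// Hypothetical helper, NOT part of Lucene.Net. It only illustrates the
// documented behavior "each tag is replaced by a single blank".
public static class TagStripSketch
{
    // Assumption: `start` is the offset from which tags are stripped;
    // text before it is copied through unchanged.
    public static string StripTags(string input, int start)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var result = new StringBuilder(input.Length);
        int pos = Math.Max(0, Math.Min(start, input.Length));
        result.Append(input, 0, pos);

        while (pos < input.Length)
        {
            int open = input.IndexOf('<', pos);
            if (open < 0) break;                    // no more tags
            int close = input.IndexOf('>', open + 1);
            if (close < 0) break;                   // unterminated tag: keep the rest as-is

            result.Append(input, pos, open - pos);  // text before the tag
            result.Append(' ');                     // the tag itself becomes one blank
            pos = close + 1;
        }

        // Copy whatever follows the last complete tag.
        result.Append(input, pos, input.Length - pos);
        return result.ToString();
    }
}
```

Under these assumptions, `TagStripSketch.StripTags("a<b>c", 0)` returns `"a c"`. The method holds no fields, so it is stateless in the same sense the class description requires of parsers.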