Namespace Lucene.Net.Codecs.Lucene46
Classes
Lucene46Codec
Implements the Lucene 4.6 index format, with configurable per-field postings and docvalues formats.
If you want to reuse functionality of this codec in another codec, extend FilterCodec.
See Lucene.Net.Codecs.Lucene46 package documentation for file format details.
@lucene.experimental
Lucene46FieldInfosFormat
Lucene 4.6 Field Infos format.
Field names are stored in the field info file, with suffix .fnm.
FieldInfos (.fnm) --> Header, FieldsCount, <FieldName, FieldNumber, FieldBits, DocValuesBits, DocValuesGen, Attributes>^FieldsCount, Footer
Data types:
- Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
- FieldsCount --> VInt (WriteVInt32(Int32))
- FieldName --> String (WriteString(String))
- FieldBits, DocValuesBits --> Byte (WriteByte(Byte))
- FieldNumber --> VInt (WriteVInt32(Int32))
- Attributes --> IDictionary<String,String>
- DocValuesGen --> Int64 (WriteInt64(Int64))
- Footer --> CodecFooter (WriteFooter(IndexOutput))
- FieldsCount: the number of fields in this file.
- FieldName: name of the field as a UTF-8 string.
- FieldNumber: the field's number. Note that unlike previous versions of Lucene, the fields are numbered explicitly rather than implicitly by their order in the file.
- FieldBits: a byte containing field options.
- The low-order bit is one for indexed fields, and zero for non-indexed fields.
- The second lowest-order bit is one for fields that have term vectors stored, and zero for fields without term vectors.
- If the third lowest-order bit is set (0x4), offsets are stored in the postings list in addition to positions.
- Fourth bit is unused.
- If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field.
- If the sixth lowest-order bit is set (0x20), payloads are stored for the indexed field.
- If the seventh lowest-order bit is set (0x40), term frequencies and positions are omitted for the indexed field.
- If the eighth lowest-order bit is set (0x80), positions are omitted for the indexed field.
- DocValuesBits: a byte containing per-document value types. The type is recorded as two four-bit integers, with the high-order bits representing norms options, and the low-order bits representing DocValues options. Each four-bit integer can be decoded as follows:
- 0: no DocValues for this field.
- 1: NumericDocValues. (NUMERIC)
- 2: BinaryDocValues. (BINARY)
- 3: SortedDocValues. (SORTED)
- DocValuesGen is the generation count of the field's DocValues. If this is -1, there are no DocValues updates to that field. Anything above zero means there are updates stored by DocValuesFormat.
- Attributes: a key-value map of codec-private attributes.
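The FieldBits and DocValuesBits layouts described above can be illustrated with a small decoding sketch. This is not Lucene.NET code: it is written in Python for brevity, and the constant names, function names, and dictionary keys are all hypothetical, chosen only to mirror the bit descriptions in this list.

```python
# Illustrative masks for FieldBits, taken from the bit descriptions above.
# The names are assumptions for this sketch, not Lucene.NET identifiers.
IS_INDEXED = 0x01
STORE_TERM_VECTORS = 0x02
STORE_OFFSETS_IN_POSTINGS = 0x04
# 0x08 (the fourth bit) is unused.
OMIT_NORMS = 0x10
STORE_PAYLOADS = 0x20
OMIT_TERM_FREQ_AND_POSITIONS = 0x40
OMIT_POSITIONS = 0x80

# Codes for each four-bit half of DocValuesBits, as listed above.
DOC_VALUES_TYPES = {0: None, 1: "NUMERIC", 2: "BINARY", 3: "SORTED"}


def decode_field_bits(bits):
    """Expand a FieldBits byte into a dictionary of named boolean options."""
    return {
        "indexed": bool(bits & IS_INDEXED),
        "term_vectors": bool(bits & STORE_TERM_VECTORS),
        "offsets_in_postings": bool(bits & STORE_OFFSETS_IN_POSTINGS),
        "omit_norms": bool(bits & OMIT_NORMS),
        "payloads": bool(bits & STORE_PAYLOADS),
        "omit_term_freq_and_positions": bool(bits & OMIT_TERM_FREQ_AND_POSITIONS),
        "omit_positions": bool(bits & OMIT_POSITIONS),
    }


def _decode_type(code):
    # Codes not listed in this document are reported rather than guessed.
    return DOC_VALUES_TYPES.get(code, "unlisted code %d" % code)


def decode_doc_values_bits(bits):
    """Split a DocValuesBits byte into (doc_values_type, norms_type)."""
    low = bits & 0x0F          # low-order four bits: DocValues options
    high = (bits >> 4) & 0x0F  # high-order four bits: norms options
    return _decode_type(low), _decode_type(high)
```

For example, `decode_doc_values_bits(0x13)` returns `("SORTED", "NUMERIC")`: the low nibble 3 is the DocValues type and the high nibble 1 is the norms type.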
@lucene.experimental
Lucene46SegmentInfoFormat
Lucene 4.6 Segment info format.
Files:
- .si: Header, SegVersion, SegSize, IsCompoundFile, Diagnostics, Files, Footer
- Header --> CodecHeader (WriteHeader(DataOutput, String, Int32))
- SegSize --> Int32 (WriteInt32(Int32))
- SegVersion --> String (WriteString(String))
- Files --> ISet<String>
- Diagnostics --> IDictionary<String,String>
- IsCompoundFile --> Int8 (WriteByte(Byte))
- Footer --> CodecFooter (WriteFooter(IndexOutput))
- SegVersion is the code version that created the segment.
- SegSize is the number of documents contained in the segment index.
- IsCompoundFile records whether the segment is written as a compound file or not. If this is -1, the segment is not a compound file. If it is 1, the segment is a compound file.
- The Diagnostics Map is privately written by IndexWriter, as a debugging aid, for each segment it creates. It includes metadata like the current Lucene version, OS, .NET/Java version, why the segment was created (merge, flush, addIndexes), etc.
- Files is a list of files referred to by this segment.
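Because IsCompoundFile is a signed 8-bit value, the "not compound" marker -1 appears on disk as the byte 0xFF. The following sketch shows one way to interpret that byte. It is written in Python purely for illustration, and `read_is_compound_file` is a hypothetical helper, not part of Lucene.NET.

```python
import struct


def read_is_compound_file(raw):
    """Interpret the first byte of `raw` as the IsCompoundFile flag."""
    # "b" unpacks a signed 8-bit integer, so the byte 0xFF becomes -1.
    (value,) = struct.unpack("b", raw[:1])
    if value == 1:
        return True
    if value == -1:
        return False
    # Any other value is not described by the format above.
    raise ValueError("unexpected IsCompoundFile value: %d" % value)
```

Rejecting values other than -1 and 1 is a design choice of this sketch; the format description above defines only those two values.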
Lucene46SegmentInfoReader
Lucene 4.6 implementation of SegmentInfoReader.
@lucene.experimental
Lucene46SegmentInfoWriter
Lucene 4.6 implementation of SegmentInfoWriter.
@lucene.experimental