Namespace Lucene.Net.Documents
Classes
BinaryDocValuesField
Field that stores a per-document BytesRef value.
The values are stored directly with no sharing, which is a good fit when the fields don't share (many) values, such as a title field. If values may be shared and sorted it's better to use SortedDocValuesField. Here's an example usage:
document.Add(new BinaryDocValuesField(name, new BytesRef("hello")));
If you also need to store the value, you should add a separate StoredField instance.
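The difference between direct and shared storage can be shown with a toy model. BinaryDocValuesField keeps each document's value as-is. A sorted encoding instead keeps each distinct value once, in a sorted dictionary, and gives every document a small ordinal into it. This is a conceptual sketch in plain C#. The class and method names are hypothetical, and it is not Lucene's on-disk codec.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model of "sorted" doc values: each distinct value is stored
// once in a sorted dictionary, and each document stores only an ordinal into it.
public static class SortedValuesSketch
{
    public static (string[] Dictionary, int[] Ordinals) Encode(IReadOnlyList<string> perDocValues)
    {
        // Distinct values in ordinal (code-unit) order.
        string[] dictionary = perDocValues
            .Distinct()
            .OrderBy(v => v, StringComparer.Ordinal)
            .ToArray();

        int[] ordinals = new int[perDocValues.Count];
        for (int doc = 0; doc < perDocValues.Count; doc++)
        {
            // Binary search works because the dictionary is sorted with the same comparer.
            ordinals[doc] = Array.BinarySearch(dictionary, perDocValues[doc], StringComparer.Ordinal);
        }
        return (dictionary, ordinals);
    }

    // Reading a document's value back is a dictionary lookup by ordinal.
    public static string Decode(string[] dictionary, int[] ordinals, int doc) => dictionary[ordinals[doc]];
}
```

When many documents repeat the same few values, the dictionary stays small and each document pays only for an ordinal. When values are mostly unique, as with titles, the dictionary is as large as the data, so direct storage is the simpler fit.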
ByteDocValuesField
Field that stores a per-document byte value. Here's an example usage:
document.Add(new ByteDocValuesField(name, (byte) 22));
If you also need to store the value, you should add a separate StoredField instance.
CompressionTools
Simple utility class providing static methods to compress and decompress binary data for stored fields.
This class uses the
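A compress/decompress round trip for stored-field bytes can be sketched with the BCL's DeflateStream. Whether CompressionTools uses this exact stream, and which compression level it uses, is an assumption here. The helper names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helpers: a Deflate round trip over byte arrays.
public static class DeflateSketch
{
    public static byte[] Compress(byte[] input)
    {
        using var output = new MemoryStream();
        // leaveOpen: true keeps the MemoryStream usable after the DeflateStream
        // is disposed, which is what flushes the final compressed block.
        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            deflate.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var inflate = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflate.CopyTo(output);
        return output.ToArray();
    }
}
```

Highly repetitive data shrinks dramatically. Short or already-compressed values may not shrink at all, so compression is worth applying only to stored fields where it pays off.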
DateTools
Provides support for converting dates to strings and vice-versa. The strings are structured so that lexicographic sorting orders them by date, which makes them suitable for use as field values and search terms.
This class also helps you to limit the resolution of your dates. Do not save dates with a finer resolution than you really need, as then TermRangeQuery and PrefixQuery will require more memory and become slower.
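The idea behind DateTools can be sketched with plain .NET formatting. The UTC date is written most significant field first, zero-padded, and truncated to the chosen resolution, so ordinal string order matches chronological order. The yyyyMMddHHmmss layout, the SketchResolution enum and the helper are assumptions modeled on that idea, not DateTools' actual API. In real code, use DateTools and DateTools.Resolution.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for a date resolution setting.
public enum SketchResolution { Year, Month, Day, Hour, Minute, Second }

public static class DateStringSketch
{
    // Most-significant field first and zero-padded, so ordinal string order equals time order.
    private static readonly string[] Formats =
        { "yyyy", "yyyyMM", "yyyyMMdd", "yyyyMMddHH", "yyyyMMddHHmm", "yyyyMMddHHmmss" };

    public static string ToSortableString(DateTime value, SketchResolution resolution)
    {
        DateTime utc = value.ToUniversalTime();
        return utc.ToString(Formats[(int)resolution], CultureInfo.InvariantCulture);
    }
}
```

A coarser resolution maps nearby timestamps onto the same string. That reduces the number of distinct terms a TermRangeQuery or PrefixQuery has to visit.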
Another approach is NumericUtils, which provides a sortable binary representation (prefix encoded) of numeric values; date/time values can be represented as numbers, so this applies to them too.
For indexing a
DerefBytesDocValuesField
Field that stores a per-document BytesRef value. Here's an example usage:
document.Add(new DerefBytesDocValuesField(name, new BytesRef("hello")));
If you also need to store the value, you should add a separate StoredField instance.
Document
Documents are the unit of indexing and search.
A Document is a set of fields. Each field has a name and a textual value. A field may be stored (IsStored) with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields which uniquely identify it.
Note that fields which are not IsStored are not available in documents retrieved from the index, e.g. with Lucene.Net.Search.ScoreDoc.IntDoc or
DocumentExtensions
LUCENENET specific extensions to the Document class.
DocumentStoredFieldVisitor
A StoredFieldVisitor that creates a Document containing all stored fields, or only specific requested fields provided to DocumentStoredFieldVisitor(ISet<String>).
This is used by
@lucene.experimental
DoubleDocValuesField
Syntactic sugar for encoding doubles as NumericDocValues via DoubleToRawInt64Bits(Double).
Per-document double values can be retrieved via GetDoubles(AtomicReader, String, Boolean).
NOTE: In almost all cases this will be rather inefficient, requiring eight bytes per document. Consider encoding double values yourself with only as much precision as you require.
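To make the cost concrete: the raw-bits encoding keeps the full 64-bit pattern of each double, so it round-trips exactly but needs eight bytes per document. If two decimal places are enough, scaling to an integer is one way to keep only the precision you need. The sketch uses BitConverter.DoubleToInt64Bits, which also returns the raw bit pattern. The helper names and the scale factor of 100 are illustrative assumptions.

```csharp
using System;

public static class DoubleEncodingSketch
{
    // Full precision: reinterpret the 64 IEEE 754 bits as a long (8 bytes per document).
    public static long ToRawBits(double value) => BitConverter.DoubleToInt64Bits(value);
    public static double FromRawBits(long bits) => BitConverter.Int64BitsToDouble(bits);

    // Reduced precision (hypothetical): keep two decimal places as a scaled integer,
    // e.g. 12.34 -> 1234. Small integers like this may be packed into fewer bits,
    // depending on the doc-values codec.
    public static long ToCents(double value) => (long)Math.Round(value * 100.0, MidpointRounding.AwayFromZero);
    public static double FromCents(long cents) => cents / 100.0;
}
```

The scaled value could then be indexed with a NumericDocValuesField instead of a DoubleDocValuesField.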
DoubleField
Field that indexes double values for efficient range filtering and sorting. Here's an example usage:
document.Add(new DoubleField(name, 6.0, Field.Store.NO));
For optimal performance, re-use the DoubleField and Document instance for more than one document:
DoubleField field = new DoubleField(name, 0.0, Field.Store.NO);
Document document = new Document();
document.Add(field);
foreach (double value in values) // values: the doubles to index, one per document
{
    // ... update any other fields of the reused document here
    field.SetDoubleValue(value);
    writer.AddDocument(document);
}
See also Int32Field, Int64Field, SingleField.
To perform range querying or filtering against a DoubleField, use NumericRangeQuery<T> or NumericRangeFilter<T>. To sort according to a DoubleField, use the normal numeric sort types, e.g. DOUBLE. DoubleField values can also be loaded directly from IFieldCache.
You may add the same field name as a DoubleField to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However, sort behavior is not defined. If you need to sort, you should separately index a single-valued DoubleField.
A DoubleField will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise.
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in a larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom FieldType and invoke the NumericPrecisionStep setter if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery<T> or NumericRangeFilter<T>.
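A rough model of the brackets helps here. Each indexed term is the value with its lowest shift bits removed, for shift = 0, precisionStep, 2 × precisionStep, and so on, so a 64-bit value yields ceil(64 / precisionStep) terms. The sketch below only computes those lower-precision prefixes of an unsigned 64-bit value. The real encoding in NumericUtils also makes the value sortable and stores the shift in each term; that is omitted here, and the helper name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class TrieSketch
{
    // Returns the lower-precision "brackets" of a 64-bit value: the value with its
    // lowest `shift` bits dropped, for shift = 0, step, 2*step, ... < 64.
    public static List<ulong> Brackets(ulong value, int precisionStep)
    {
        if (precisionStep < 1) throw new ArgumentOutOfRangeException(nameof(precisionStep));
        var terms = new List<ulong>();
        for (int shift = 0; shift < 64; shift += precisionStep)
        {
            terms.Add(value >> shift);
        }
        return terms;
    }
}
```

Values that agree in their high bits share the coarser terms. A range query can therefore cover a whole bracket with one term instead of enumerating every value in it. A smaller precisionStep gives more, finer brackets at the cost of more terms per document.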
For low-cardinality fields, larger precision steps are good. If the cardinality is < 100, it is fair to use
For more information on the internals of numeric trie indexing, including the PrecisionStep (precisionStep) configuration, see NumericRangeQuery<T>. The format of indexed values is described in NumericUtils.
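Before a double can be prefix-encoded, it has to become an integer whose order matches numeric order. The sketch below uses the standard bit trick for this. The raw bit pattern of a positive double already sorts correctly. For negative doubles, every bit except the sign is flipped, which reverses their otherwise inverted order. The helper name is hypothetical; check NumericUtils for the library's actual method.

```csharp
using System;

public static class SortableDoubleSketch
{
    // Map a double to a long whose signed ordering matches the double ordering.
    public static long ToSortableInt64(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        if (bits < 0)
        {
            // Negative double: flip all non-sign bits so larger magnitudes sort lower.
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }
}
```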
If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of
More advanced users can instead use NumericTokenStream directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage.
@since 2.9
Field
Expert: directly create a field for a document. Most users should use one of the sugar subclasses: Int32Field, Int64Field, SingleField, DoubleField, BinaryDocValuesField, NumericDocValuesField, SortedDocValuesField, StringField, TextField, StoredField.
A field is a section of a Document. Each field has three parts: name, type and value. Values may be text (
NOTE: the field type is an IIndexableFieldType. Making changes to the state of the IIndexableFieldType will impact any Field it is used in. It is strongly recommended that no changes be made after Field instantiation.
Field.Byte
Field.Double
Field.Int16
Field.Int32
Field.Int64
Field.Number
Field.Single
FieldExtensions
LUCENENET specific extension methods to add functionality to enumerations that mimic Lucene
FieldType
Describes the properties of a field.
IndexableFieldExtensions
Extension methods to the IIndexableField interface.
Int16DocValuesField
Field that stores a per-document short value. Here's an example usage:
document.Add(new Int16DocValuesField(name, (short) 22));
If you also need to store the value, you should add a separate StoredField instance.
NOTE: This was ShortDocValuesField in Lucene
Int32DocValuesField
Field that stores a per-document int value. Here's an example usage:
document.Add(new Int32DocValuesField(name, 22));
If you also need to store the value, you should add a separate StoredField instance.
NOTE: This was IntDocValuesField in Lucene
Int32Field
Field that indexes int values for efficient range filtering and sorting. Here's an example usage:
document.Add(new Int32Field(name, 6, Field.Store.NO));
For optimal performance, re-use the Int32Field and Document instance for more than one document:
Int32Field field = new Int32Field(name, 6, Field.Store.NO);
Document document = new Document();
document.Add(field);
foreach (int value in values) // values: the ints to index, one per document
{
    // ... update any other fields of the reused document here
    field.SetInt32Value(value);
    writer.AddDocument(document);
}
See also Int64Field, SingleField, DoubleField.
To perform range querying or filtering against an Int32Field, use NumericRangeQuery<T> or NumericRangeFilter<T>. To sort according to an Int32Field, use the normal numeric sort types, e.g. INT32. Int32Field values can also be loaded directly from IFieldCache.
You may add the same field name as an Int32Field to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However, sort behavior is not defined. If you need to sort, you should separately index a single-valued Int32Field.
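The multi-valued semantics above amount to an "any value matches" test per document. In the minimal sketch below, a hypothetical in-memory list of per-document value arrays stands in for the index. It is not how Lucene evaluates the query internally.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MultiValuedRangeSketch
{
    // Each document holds zero or more int values for the same field name.
    // A document matches an inclusive range if ANY of its values falls inside it.
    public static List<int> Matches(IReadOnlyList<int[]> docs, int min, int max)
    {
        var hits = new List<int>();
        for (int doc = 0; doc < docs.Count; doc++)
        {
            if (docs[doc].Any(v => v >= min && v <= max))
            {
                hits.Add(doc);
            }
        }
        return hits;
    }
}
```

The same model shows why sorting is undefined: a document with values 1 and 15 has no single value to sort by.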
An Int32Field will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise.
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in a larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom FieldType and invoke the NumericPrecisionStep setter if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery<T> or NumericRangeFilter<T>.
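The disk-space side of this tradeoff is simple arithmetic. A value of b bits indexed with precisionStep p produces one term per shift 0, p, 2p, … below b, which is ceil(b / p) terms. A tiny helper with a hypothetical name makes the effect of different steps concrete.

```csharp
using System;

public static class TermCountSketch
{
    // Number of trie terms for a value of `valueBits` bits (32 for int, 64 for long)
    // indexed with the given precisionStep. 1 + (b - 1) / p equals ceil(b / p) for b >= 1
    // and, unlike (b + p - 1) / p, cannot overflow for very large p.
    public static int TermsPerValue(int valueBits, int precisionStep)
    {
        if (valueBits < 1) throw new ArgumentOutOfRangeException(nameof(valueBits));
        if (precisionStep < 1) throw new ArgumentOutOfRangeException(nameof(precisionStep));
        return 1 + (valueBits - 1) / precisionStep;
    }
}
```

With the default step of 4, each Int32Field value is indexed as 8 terms. A step of 8 halves that to 4.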
For low-cardinality fields, larger precision steps are good. If the cardinality is < 100, it is fair to use
For more information on the internals of numeric trie indexing, including the PrecisionStep (precisionStep) configuration, see NumericRangeQuery<T>. The format of indexed values is described in NumericUtils.
If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of
More advanced users can instead use NumericTokenStream directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage.
NOTE: This was IntField in Lucene
@since 2.9
Int64DocValuesField
Field that stores a per-document long value. Here's an example usage:
document.Add(new Int64DocValuesField(name, 22L));
If you also need to store the value, you should add a separate StoredField instance.
NOTE: This was LongDocValuesField in Lucene
Int64Field
Field that indexes long values for efficient range filtering and sorting. Here's an example usage:
document.Add(new Int64Field(name, 6L, Field.Store.NO));
For optimal performance, re-use the Int64Field and Document instance for more than one document:
Int64Field field = new Int64Field(name, 0L, Field.Store.NO);
Document document = new Document();
document.Add(field);
foreach (long value in values) // values: the longs to index, one per document
{
    // ... update any other fields of the reused document here
    field.SetInt64Value(value);
    writer.AddDocument(document);
}
See also Int32Field, SingleField, DoubleField.
Any type that can be converted to long can also be indexed. For example, date/time values represented by a
To perform range querying or filtering against an Int64Field, use NumericRangeQuery<T> or NumericRangeFilter<T>. To sort according to an Int64Field, use the normal numeric sort types, e.g. INT64. Int64Field values can also be loaded directly from IFieldCache.
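One way to convert a DateTime to a long is to use Unix milliseconds, via DateTimeOffset.ToUnixTimeMilliseconds. DateTime.Ticks would work as well. The choice of representation is an assumption of this sketch, and the helper names are hypothetical. Whatever you choose, queries must use the same conversion.

```csharp
using System;

public static class DateToInt64Sketch
{
    // Convert a DateTime (assumed to already represent UTC) to milliseconds since
    // 1970-01-01T00:00:00Z. Later instants map to larger longs, so numeric range
    // queries and sorting follow time order.
    public static long ToUnixMillis(DateTime utc) =>
        new DateTimeOffset(DateTime.SpecifyKind(utc, DateTimeKind.Utc)).ToUnixTimeMilliseconds();

    public static DateTime FromUnixMillis(long millis) =>
        DateTimeOffset.FromUnixTimeMilliseconds(millis).UtcDateTime;
}
```

The resulting long can be passed to the Int64Field constructor in the example usage above.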
You may add the same field name as an Int64Field to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However, sort behavior is not defined. If you need to sort, you should separately index a single-valued Int64Field.
An Int64Field will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise.
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in a larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom FieldType and invoke the NumericPrecisionStep setter if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery<T> or NumericRangeFilter<T>.
For low-cardinality fields, larger precision steps are good. If the cardinality is < 100, it is fair to use
For more information on the internals of numeric trie indexing, including the PrecisionStep (precisionStep) configuration, see NumericRangeQuery<T>. The format of indexed values is described in NumericUtils.
If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of
More advanced users can instead use NumericTokenStream directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage.
NOTE: This was LongField in Lucene
@since 2.9
LazyDocument
Defers actually loading a field's value until you ask for it. You must not use the returned Field instances after the provided reader has been closed.
LazyDocument.LazyField
@lucene.internal
NumericDocValuesField
Field that stores a per-document long value. Here's an example usage:
document.Add(new NumericDocValuesField(name, 22L));
If you also need to store the value, you should add a separate StoredField instance.
PackedInt64DocValuesField
Field that stores a per-document long value. Here's an example usage:
document.Add(new PackedInt64DocValuesField(name, 22L));
If you also need to store the value, you should add a separate StoredField instance.
NOTE: This was PackedLongDocValuesField in Lucene
SingleDocValuesField
Syntactic sugar for encoding floats as NumericDocValues via SingleToRawInt32Bits(Single).
Per-document floating point values can be retrieved via GetSingles(AtomicReader, String, Boolean).
NOTE: In almost all cases this will be rather inefficient, requiring four bytes per document. Consider encoding floating point values yourself with only as much precision as you require.
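One way to keep only the precision you need is to quantize floats into a small integer range. The sketch below maps a value known to lie in [0, 1] onto the levels 0..255, one byte per value. The range, the resolution and the helper names are assumptions for illustration. For comparison, it also shows the full-precision raw-bits path via BitConverter.SingleToInt32Bits, which is available in .NET Core 2.0 and later.

```csharp
using System;

public static class SingleEncodingSketch
{
    // Full precision: 32 raw IEEE 754 bits (four bytes per document).
    public static int ToRawBits(float value) => BitConverter.SingleToInt32Bits(value);
    public static float FromRawBits(int bits) => BitConverter.Int32BitsToSingle(bits);

    // Reduced precision (hypothetical): a value known to lie in [0, 1] stored in 256 levels.
    public static byte Quantize(float value)
    {
        float clamped = Math.Clamp(value, 0f, 1f);
        return (byte)MathF.Round(clamped * 255f);
    }

    // Decoding recovers the value to within half a level (0.5 / 255).
    public static float Dequantize(byte level) => level / 255f;
}
```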
NOTE: This was FloatDocValuesField in Lucene
SingleField
Field that indexes float values for efficient range filtering and sorting. Here's an example usage:
document.Add(new SingleField(name, 6.0F, Field.Store.NO));
For optimal performance, re-use the SingleField and Document instance for more than one document:
SingleField field = new SingleField(name, 0.0F, Field.Store.NO);
Document document = new Document();
document.Add(field);
foreach (float value in values) // values: the floats to index, one per document
{
    // ... update any other fields of the reused document here
    field.SetSingleValue(value);
    writer.AddDocument(document);
}
See also Int32Field, Int64Field, DoubleField.
To perform range querying or filtering against a SingleField, use NumericRangeQuery<T> or NumericRangeFilter<T>. To sort according to a SingleField, use the normal numeric sort types, e.g. SINGLE. SingleField values can also be loaded directly from IFieldCache.
You may add the same field name as a SingleField to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However, sort behavior is not defined. If you need to sort, you should separately index a single-valued SingleField.
A SingleField will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise.
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in a larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom FieldType and invoke the NumericPrecisionStep setter if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery<T> or NumericRangeFilter<T>.
For low-cardinality fields, larger precision steps are good. If the cardinality is < 100, it is fair to use
For more information on the internals of numeric trie indexing, including the PrecisionStep (precisionStep) configuration, see NumericRangeQuery<T>. The format of indexed values is described in NumericUtils.
If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of
More advanced users can instead use NumericTokenStream directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage.
NOTE: This was FloatField in Lucene
@since 2.9
SortedBytesDocValuesField
Field that stores a per-document BytesRef value, indexed for sorting. Here's an example usage:
document.Add(new SortedBytesDocValuesField(name, new BytesRef("hello")));
If you also need to store the value, you should add a separate StoredField instance.
SortedDocValuesField
Field that stores a per-document BytesRef value, indexed for sorting. Here's an example usage:
document.Add(new SortedDocValuesField(name, new BytesRef("hello")));
If you also need to store the value, you should add a separate StoredField instance.
SortedSetDocValuesField
Field that stores a set of per-document BytesRef values, indexed for faceting, grouping, and joining. Here's an example usage:
document.Add(new SortedSetDocValuesField(name, new BytesRef("hello")));
document.Add(new SortedSetDocValuesField(name, new BytesRef("world")));
If you also need to store the value, you should add a separate StoredField instance.
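Conceptually, sorted-set doc values keep one sorted dictionary of the distinct values across all documents. Each document then holds the sorted, de-duplicated set of ordinals of its own values, and faceting and grouping iterate over those per-document ordinal sets. The toy model below shows that shape. Its names are hypothetical, and it does not reflect the codec's real layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedSetSketch
{
    public static (string[] Dictionary, int[][] DocOrdinals) Encode(IReadOnlyList<string[]> perDocValues)
    {
        // One sorted dictionary of the distinct values across all documents.
        string[] dictionary = perDocValues
            .SelectMany(vs => vs)
            .Distinct()
            .OrderBy(v => v, StringComparer.Ordinal)
            .ToArray();

        // Each document keeps its values as a sorted, de-duplicated set of ordinals.
        int[][] docOrdinals = perDocValues
            .Select(vs => vs.Select(v => Array.BinarySearch(dictionary, v, StringComparer.Ordinal))
                            .Distinct()
                            .OrderBy(o => o)
                            .ToArray())
            .ToArray();
        return (dictionary, docOrdinals);
    }
}
```

In this model, the example usage above ("hello" and "world" on one document) gives that document the ordinal set {0, 1}.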
StoredField
A field whose value is stored so that
StraightBytesDocValuesField
Field that stores a per-document BytesRef value. If values may be shared it's better to use SortedDocValuesField. Here's an example usage:
document.Add(new StraightBytesDocValuesField(name, new BytesRef("hello")));
If you also need to store the value, you should add a separate StoredField instance.
StringField
A field that is indexed but not tokenized: the entire string value is indexed as a single token.
TextField
A field that is indexed and tokenized, without term vectors. For example, this would be used on a 'body' field that contains the bulk of a document's text.
Enums
DateTools.Resolution
Specifies the time granularity.
Field.Index
Specifies whether and how a field should be indexed.
Field.Store
Specifies whether and how a field should be stored.
Field.TermVector
Specifies whether and how a field should have term vectors.
NumericFieldType
Data type of the numeric IIndexableField value
NumericType
Data type of the numeric value.
@since 3.2