Namespace Lucene.Net.Documents
Classes
AbstractField
CompressionTools
Simple utility class providing static methods to compress and decompress binary data for stored fields. This class uses the java.util.zip.Deflater and Inflater classes to compress and decompress.
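As a rough illustration of the Deflater/Inflater round trip mentioned above, here is a minimal, self-contained Java sketch. It is not the CompressionTools source. The class name CompressionSketch and its methods are hypothetical, and the sketch relies only on the standard java.util.zip API.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not the CompressionTools implementation.
final class CompressionSketch {
    // Compresses the input with Deflater at the given level (0-9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Decompresses data produced by compress(); corrupt input becomes IllegalArgumentException.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length);
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress, not finished, and waiting for input: the stream is truncated.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

A higher level such as Deflater.BEST_COMPRESSION trades CPU time for smaller stored fields, while Deflater.BEST_SPEED does the opposite.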
DateField
Provides support for converting dates to strings and vice-versa. The strings are structured so that lexicographic sorting orders by date, which makes them suitable for use as field values and search terms.
Note that this class saves dates with millisecond granularity, which is bad for TermRangeQuery and PrefixQuery, as those queries are expanded to a BooleanQuery with a potentially large number of terms when searching. Thus you might want to use DateTools instead.
Note: dates before 1970 cannot be used, and therefore cannot be indexed when using this class. See DateTools for an alternative without such a limitation.
Another approach is Lucene.Net.Util.NumericUtils, which provides a sortable binary representation (prefix encoded) of numeric values, a category that includes date/time values. To index a date/time, convert it to a long timestamp (for example, milliseconds since the epoch), index this as a numeric value with NumericField, and use NumericRangeQuery<T> to query it.
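The sketch below shows one way to turn a millisecond timestamp into a fixed-width string that sorts lexicographically by date, and why negative (pre-1970) times do not fit such a scheme. It is an illustration in the spirit of DateField, not its actual code: the class DateFieldSketch and the 9-digit base-36 width are assumptions.

```java
// Hypothetical illustration, not the DateField implementation.
final class DateFieldSketch {
    // Assumed width: 9 base-36 digits, which covers 0 .. 36^9 - 1 milliseconds
    // (several thousand years after 1970).
    static final int WIDTH = 9;

    static String timeToString(long millis) {
        if (millis < 0) {
            // A negative number would need a '-' sign and would break fixed-width ordering.
            throw new IllegalArgumentException("time " + millis + " is before 1970");
        }
        String digits = Long.toString(millis, Character.MAX_RADIX); // base 36, lowercase
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("time " + millis + " is too late");
        }
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0'); // left-pad so string order equals numeric order
        }
        return sb.append(digits).toString();
    }

    static long stringToTime(String s) {
        return Long.parseLong(s, Character.MAX_RADIX);
    }
}
```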
DateTools
Provides support for converting dates to strings and vice-versa. The strings are structured so that lexicographic sorting orders them by date, which makes them suitable for use as field values and search terms.
This class also helps you to limit the resolution of your dates. Do not save dates with a finer resolution than you really need, as then RangeQuery and PrefixQuery will require more memory and become slower.
Compared to DateField, the strings generated by the methods in this class take slightly more space, unless your selected resolution is set to Resolution.DAY or lower.
Another approach is Lucene.Net.Util.NumericUtils, which provides a sortable binary representation (prefix encoded) of numeric values, a category that includes date/time values. To index a date/time, convert it to a long timestamp (for example, milliseconds since the epoch), index this as a numeric value with NumericField, and use NumericRangeQuery<T> to query it.
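To make the resolution idea concrete, this hypothetical sketch formats a timestamp in UTC with one pattern per granularity, so coarser resolutions produce shorter strings and fewer distinct terms. The Resolution enum and its yyyyMMdd-style patterns are assumptions for illustration, not the DateTools implementation.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration, not the DateTools implementation.
final class DateToolsSketch {
    // Each granularity keeps only the leading date/time fields; finer fields are dropped.
    enum Resolution {
        YEAR("yyyy"), MONTH("yyyyMM"), DAY("yyyyMMdd"), HOUR("yyyyMMddHH"),
        MINUTE("yyyyMMddHHmm"), SECOND("yyyyMMddHHmmss"), MILLISECOND("yyyyMMddHHmmssSSS");

        final DateTimeFormatter formatter;

        Resolution(String pattern) {
            // Format in UTC so the same instant always yields the same string.
            this.formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    static String timeToString(long epochMillis, Resolution resolution) {
        return resolution.formatter.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

With DAY resolution every instant within the same UTC day maps to one term, which keeps range and prefix queries small.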
DateTools.Resolution
Specifies the time granularity.
Document
Documents are the unit of indexing and search.
A Document is a set of fields. Each field has a name and a textual value. A field may be stored with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields which uniquely identify it.
Note that fields which are not stored are not available in documents retrieved from the index, e.g. with Doc, Doc(Int32) or Document(Int32).
Field
A field is a section of a Document. Each field has two parts, a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, URLs, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document.
FieldExtensions
LoadFirstFieldSelector
Loads the first field and stops loading.
See FieldSelectorResult.LOAD_AND_BREAK.
MapFieldSelector
A FieldSelector based on a Map of field names to FieldSelectorResult values.
NumberTools
Provides support for converting longs to Strings, and back again. The strings are structured so that lexicographic sorting order is preserved.
That is, if l1 is less than l2 for any two longs l1 and l2, then NumberTools.longToString(l1) is lexicographically less than NumberTools.longToString(l2). (Similarly for "greater than" and "equals".)
This class handles all long values (unlike DateField).
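NumberTools' own encoding is not reproduced here. As an alternative technique with the same ordering property, this sketch flips the sign bit (mapping the signed range onto unsigned values in the same order) and writes the result as 16 fixed-width hex digits. The class SortableLongSketch is hypothetical.

```java
// Hypothetical alternative encoding, not the NumberTools format.
final class SortableLongSketch {
    // XOR with Long.MIN_VALUE flips the sign bit: MIN_VALUE -> 0x000..., MAX_VALUE -> 0xfff...
    // Fixed-width lowercase hex then sorts lexicographically in numeric order.
    static String longToString(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    static long stringToLong(String s) {
        return Long.parseUnsignedLong(s, 16) ^ Long.MIN_VALUE;
    }
}
```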
NumericField
This class provides a Field that enables indexing of numeric values for efficient range filtering and sorting. Here's an example usage, adding an int value:
document.add(new NumericField(name).setIntValue(value));
For optimal performance, reuse the NumericField and Document instances for more than one document:
NumericField field = new NumericField(name);
Document document = new Document();
document.add(field);
for (int value : values) {   // values: the numbers to index, one per document
  // ... set the document's other fields here ...
  field.setIntValue(value);
  writer.addDocument(document);
}
The .NET native types int, long, float and double are directly supported. However, any value that can be converted into these native types can also be indexed.
For example, date/time values represented by a java.util.Date can be translated into a long value using the java.util.Date.getTime method. If you don't need millisecond precision, you can quantize the value, either by dividing the result of java.util.Date.getTime or by using the separate getters (for year, month, etc.) to construct an int or long value.
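Both quantization options described above can be sketched in plain Java. The class DateQuantizeSketch is hypothetical; it assumes UTC and uses java.time for the calendar-field variant because the java.util.Date getters for year, month and day are deprecated.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helpers for quantizing a date before indexing it as a NumericField.
final class DateQuantizeSketch {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Option 1: divide getTime(). floorDiv keeps pre-1970 instants in the correct day;
    // plain '/' would round -1 ms toward zero, i.e. into day 0.
    static int daysSinceEpoch(Date date) {
        return Math.toIntExact(Math.floorDiv(date.getTime(), MILLIS_PER_DAY));
    }

    // Option 2: build a sortable int such as yyyyMMdd from calendar fields (UTC assumed).
    static int yyyymmdd(Date date) {
        LocalDate d = date.toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        return d.getYear() * 10_000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }
}
```

Either int can then be indexed with setIntValue, and a day-level range query becomes a range over these ints.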
To perform range querying or filtering against a NumericField, use NumericRangeQuery<T> or NumericRangeFilter<T>. To sort according to a NumericField, use the normal numeric sort types, e.g. SortField.INT. NumericField values can also be loaded directly from FieldCache.
By default, a NumericField's value is not stored but is indexed for range filtering and sorting. You can use the NumericField(String, Field.Store, Boolean) constructor if you need to change these defaults.
You may add the same field name as a NumericField to the same document more than once. Range querying and filtering will be the logical OR of all values, so a range query will hit all documents that have at least one value in the range. However, sort behavior is not defined. If you need to sort, you should separately index a single-valued NumericField.
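The OR semantics for multi-valued fields amount to the following predicate. This is a plain-Java model of the matching rule, not Lucene code; MultiValuedRangeSketch is a hypothetical name and the range is assumed inclusive on both ends.

```java
// Hypothetical model of how a range matches a multi-valued numeric field.
final class MultiValuedRangeSketch {
    // A document matches if ANY of its values lies in [lower, upper] (logical OR).
    static boolean matches(long[] docValues, long lower, long upper) {
        for (long v : docValues) {
            if (v >= lower && v <= upper) {
                return true;
            }
        }
        return false; // no single value falls inside the range
    }
}
```

A document with values 1 and 100 does not match [40, 60] even though its values straddle the range: each value is tested on its own.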
A NumericField will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise.
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in a larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can use the expert constructor NumericField(String, Int32, Field.Store, Boolean) if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery<T> or NumericRangeFilter<T>.
For more information on the internals of numeric trie indexing, including the precisionStep configuration, see NumericRangeQuery<T>. The format of indexed values is described in Lucene.Net.Util.NumericUtils.
If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of Int32.MaxValue. This will minimize disk space consumed.
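The bracket idea can be modeled by repeatedly dropping precisionStep low-order bits from a sign-flipped 64-bit value. This is a simplified sketch (TrieTermsSketch is hypothetical); the actual term format described in Lucene.Net.Util.NumericUtils also records the shift in each term, which this model omits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of trie brackets for one long value; not the NumericUtils encoding.
final class TrieTermsSketch {
    // Returns one prefix per bracket, from full precision (shift 0) to the coarsest bracket.
    static List<Long> prefixes(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        long sortable = value ^ Long.MIN_VALUE; // flip sign bit so unsigned order == signed order
        List<Long> out = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            out.add(sortable >>> shift); // drop 'shift' low bits: a lower-precision bracket
        }
        return out;
    }
}
```

In this model a 64-bit value produces 16 prefixes with precisionStep 4 and 8 with precisionStep 8; with a precisionStep of 64 or more, only the full-precision prefix remains. Nearby values share their coarser prefixes, which is what lets a range be covered by a few coarse terms instead of many fine ones.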
More advanced users can instead use NumericTokenStream directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage.
NOTE: This class is only used during indexing. When retrieving the stored field value from a Document instance after search, you will get a conventional IFieldable instance where the numeric values are returned as strings (according to toString(value) of the used data type).
NOTE: This API is experimental and might change in incompatible ways in the next release.
SetBasedFieldSelector
Declares which fields to load normally and which fields to load lazily.
Interfaces
FieldSelector
Similar to a java.io.FileFilter, the FieldSelector allows one to decide which Fields get loaded on a Document by Document(Int32, FieldSelector).
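The selector classes listed above (LoadFirstFieldSelector, MapFieldSelector, SetBasedFieldSelector) differ only in how they decide per field. This hypothetical Java sketch models that decision with a small enum named after some FieldSelectorResult values; Selector, setBased, mapBased, loadFirst and load are illustrative names, not the Lucene.Net API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of field selection; not the Lucene.Net FieldSelector API.
final class FieldSelectorSketch {
    // Subset of decisions, named after FieldSelectorResult values.
    enum Result { LOAD, LAZY_LOAD, NO_LOAD, LOAD_AND_BREAK }

    interface Selector {
        Result accept(String fieldName);
    }

    // Set-based: eager set -> LOAD, lazy set -> LAZY_LOAD, everything else -> NO_LOAD.
    static Selector setBased(Set<String> eager, Set<String> lazy) {
        return name -> eager.contains(name) ? Result.LOAD
                : lazy.contains(name) ? Result.LAZY_LOAD
                : Result.NO_LOAD;
    }

    // Map-based: explicit decision per field name, NO_LOAD for unlisted fields.
    static Selector mapBased(Map<String, Result> decisions) {
        return name -> decisions.getOrDefault(name, Result.NO_LOAD);
    }

    // Load-first: load whichever field comes first, then stop.
    static Selector loadFirst() {
        return name -> Result.LOAD_AND_BREAK;
    }

    // Walks a stored document's field names in order and returns those that would be
    // loaded (eagerly or lazily), honoring LOAD_AND_BREAK.
    static List<String> load(List<String> storedFieldNames, Selector selector) {
        List<String> loaded = new ArrayList<>();
        for (String name : storedFieldNames) {
            Result r = selector.accept(name);
            if (r == Result.NO_LOAD) {
                continue;
            }
            loaded.add(name);
            if (r == Result.LOAD_AND_BREAK) {
                break;
            }
        }
        return loaded;
    }
}
```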
IFieldable
Synonymous with Field.
Enums
Field.Index
Specifies whether and how a field should be indexed.
Field.Store
Specifies whether and how a field should be stored.
Field.TermVector
Specifies whether and how a field should have term vectors.
FieldSelectorResult
Provides information about what should be done with this Field.