Namespace Lucene.Net.Index
Classes
AlcoholicMergePolicy
Merge policy for testing; it is like an alcoholic: it drinks (merges) at night, randomly deciding what to drink. During the daytime it sleeps.
If tests pass with this, then they are likely to pass with any bizarro merge policy users might write.
It is a fine bottle of champagne (Ordered by Martijn).
AllDeletedFilterReader
Filters the incoming reader and makes all documents appear deleted.
AssertingAtomicReader
A FilterAtomicReader that can be used to apply additional checks for tests.
AssertingAtomicReader.AssertingBinaryDocValues
Wraps a BinaryDocValues but with additional asserts
AssertingAtomicReader.AssertingBits
Wraps a Bits but with additional asserts
AssertingAtomicReader.AssertingDocsEnum
Wraps a DocsEnum but with additional asserts
AssertingAtomicReader.AssertingFields
Wraps a Fields but with additional asserts
AssertingAtomicReader.AssertingNumericDocValues
Wraps a NumericDocValues but with additional asserts
AssertingAtomicReader.AssertingSortedDocValues
Wraps a SortedDocValues but with additional asserts
AssertingAtomicReader.AssertingSortedSetDocValues
Wraps a SortedSetDocValues but with additional asserts
AssertingAtomicReader.AssertingTerms
Wraps a Terms but with additional asserts
AssertingDirectoryReader
A DirectoryReader that wraps all its subreaders with AssertingAtomicReader.
AtomicReader
AtomicReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable. IndexReaders implemented by this subclass do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings.
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
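As a concrete illustration of atomic access, the following hedged sketch (not part of this reference; names follow the Lucene.NET 4.8 API, and "indexPath" and "body" are hypothetical placeholders) walks the atomic leaves of a composite reader and touches each leaf's term dictionary:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Store;

using Directory dir = FSDirectory.Open("indexPath");
using DirectoryReader reader = DirectoryReader.Open(dir);

foreach (AtomicReaderContext ctx in reader.Leaves)
{
    AtomicReader leaf = ctx.AtomicReader;  // atomic: no sub-readers
    Terms terms = leaf.GetTerms("body");   // field's term dictionary, or null
    if (terms != null)
    {
        TermsEnum te = terms.GetEnumerator();
        while (te.MoveNext())
        {
            // Postings from te are leaf-local; ctx.DocBase converts
            // leaf-local doc IDs to top-level doc IDs.
        }
    }
}
```

Note that doc IDs returned by a leaf are relative to that leaf; the `DocBase` offset on the context is what makes per-leaf iteration equivalent to a top-level scan.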
AtomicReaderContext
IndexReaderContext for AtomicReader instances.
BaseCompositeReader<R>
Base class for implementing CompositeReaders based on an array of sub-readers. The implementing class has to add code for correctly refcounting and closing the sub-readers.
User code will most likely use MultiReader to build a composite reader on a set of sub-readers (like several DirectoryReaders).
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
@lucene.internal
BaseCompressingDocValuesFormatTestCase
Extends BaseDocValuesFormatTestCase to add compression checks.
BaseDocValuesFormatTestCase
Abstract class to do basic tests for a docvalues format. NOTE: this test focuses on the docvalues impl, nothing else. The [stretch] goal is for this test to be so thorough in testing a new DocValuesFormat that if this test passes, then all Lucene/Solr tests should also pass. Ie, if there is some bug in a given DocValuesFormat that this test fails to catch then this test needs to be improved!
BaseIndexFileFormatTestCase
Common tests to all index formats.
BaseMergePolicyTestCase
Base test case for MergePolicy().
BasePostingsFormatTestCase
Abstract class to do basic tests for a postings format. NOTE: this test focuses on the postings (docs/freqs/positions/payloads/offsets) impl, not the terms dict. The [stretch] goal is for this test to be so thorough in testing a new PostingsFormat that if this test passes, then all Lucene/Solr tests should also pass. Ie, if there is some bug in a given PostingsFormat that this test fails to catch then this test needs to be improved!
BaseStoredFieldsFormatTestCase
Base class aiming at testing StoredFieldsFormat implementations.
BaseTermVectorsFormatTestCase
Base class aiming at testing TermVectorsFormat implementations.
BaseTermVectorsFormatTestCase.RandomDocument
BaseTermVectorsFormatTestCase.RandomDocumentFactory
BaseTermVectorsFormatTestCase.RandomTokenStream
BinaryDocValues
A per-document byte[]
BufferedUpdates
Holds buffered deletes and updates, by docID, term or query for a single segment. This is used to hold buffered pending deletes and updates against the to-be-flushed segment. Once the deletes and updates are pushed (on flush in Lucene.Net.Index.DocumentsWriter), they are converted to a FrozenDeletes instance.
NOTE: instances of this class are accessed either via a private instance on Lucene.Net.Index.DocumentsWriterPerThread, or via sync'd code by Lucene.Net.Index.DocumentsWriterDeleteQueue
ByteSliceReader
IndexInput that knows how to read the byte slices written by Posting and PostingVector. We read the bytes in each slice until we hit the end of that slice at which point we read the forwarding address of the next slice and then jump to it.
CheckAbort
Class for recording units of work when merging segments.
CheckIndex
Basic tool and API to check the health of an index and write a new segments file that removes reference to problematic segments.
As this tool checks every byte in the index, on a large index it can take quite a long time to run.
Please make a complete backup of your index before using this to fix your index!
@lucene.experimental
CheckIndex.Status
Returned from DoCheckIndex() detailing the health and status of the index.
@lucene.experimental
CheckIndex.Status.DocValuesStatus
Status from testing DocValues
CheckIndex.Status.FieldNormStatus
Status from testing field norms.
CheckIndex.Status.SegmentInfoStatus
Holds the status of each segment in the index. See SegmentInfos.
@lucene.experimental
CheckIndex.Status.StoredFieldStatus
Status from testing stored fields.
CheckIndex.Status.TermIndexStatus
Status from testing term index.
CheckIndex.Status.TermVectorStatus
Status from testing term vectors.
CompositeReader
Instances of this reader type can only be used to get stored fields from the underlying AtomicReaders, but it is not possible to directly retrieve postings. To do that, get the AtomicReaderContext for all sub-readers via Leaves. Alternatively, you can mimic an AtomicReader (with a serious slowdown), by wrapping composite readers with SlowCompositeReaderWrapper.
IndexReader instances for indexes on disk are usually constructed with a call to one of the static DirectoryReader.Open() methods, e.g. Open(Directory). DirectoryReader implements the CompositeReader interface; it is not possible to directly get postings from it.
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
CompositeReaderContext
IndexReaderContext for CompositeReader instances.
CompositeReaderContext.Builder
CompoundFileExtractor
Command-line tool for extracting sub-files out of a compound file.
ConcurrentMergeScheduler
A MergeScheduler that runs each merge using a separate thread.
Specify the max number of threads that may run at once, and the maximum number of simultaneous merges with SetMaxMergesAndThreads(Int32, Int32).
If the number of merges exceeds the max number of threads then the largest merges are paused until one of the smaller merges completes.
If more than MaxMergeCount merges are requested then this class will forcefully throttle the incoming threads by pausing until one or more merges complete.
ConcurrentMergeScheduler.MergeThread
Runs a merge thread, which may run one or more merges in sequence.
CorruptIndexException
This exception is thrown when Lucene detects an inconsistency in the index.
DirectoryReader
DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory.
DirectoryReader instances are usually constructed with a call to one of the static Open() methods, e.g. Open(Directory).
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
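The usual DirectoryReader lifecycle can be sketched as below (a hedged example, not part of this reference; "indexPath" is a placeholder). OpenIfChanged returns null when no new commit exists:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Store;

using Directory dir = FSDirectory.Open("indexPath");
DirectoryReader reader = DirectoryReader.Open(dir);
try
{
    // ... run searches against this point-in-time view ...

    DirectoryReader newReader = DirectoryReader.OpenIfChanged(reader);
    if (newReader != null)
    {
        reader.Dispose();   // old point-in-time view no longer needed
        reader = newReader; // sees commits made since the first Open
    }
}
finally
{
    reader.Dispose();
}
```

Reopening via OpenIfChanged is cheaper than a fresh Open because unchanged segments are shared between the old and new reader.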
DocHelper
DocsAndPositionsEnum
Also iterates through positions.
DocsEnum
Iterates through the documents and term freqs. NOTE: you must first call NextDoc() before using any of the per-doc methods.
DocTermOrds
This class enables fast access to multiple term ords for a specified field across all docIDs.
Like IFieldCache, it uninverts the index and holds a packed data structure in RAM to enable fast access. Unlike IFieldCache, it can handle multi-valued fields, and, it does not hold the term bytes in RAM. Rather, you must obtain a TermsEnum from the GetOrdTermsEnum(AtomicReader) method, and then seek-by-ord to get the term's bytes.
While normally term ords are type Int64, in this API they are Int32, as the internal representation here cannot address more than Int32.MaxValue unique terms.
Deleted documents are skipped during uninversion, and if you look them up you'll get 0 ords.
The returned per-document ords do not retain their original order in the document. Instead they are returned in sorted (by ord, ie term's BytesRef comparer) order. They are also de-dup'd (ie if doc has same term more than once in this field, you'll only get that ord back once).
This class tests whether the provided reader is able to retrieve terms by ord (ie, it's single segment, and it uses an ord-capable terms index). If not, this class will create its own term index internally, allowing to create a wrapped TermsEnum that can handle ord. The GetOrdTermsEnum(AtomicReader) method then provides this wrapped enum, if necessary.
The RAM consumption of this class can be high!
@lucene.experimental
DocValues
This class contains utility methods and constants for DocValues
DocValuesUpdate
An in-place update to a DocValues field.
DocValuesUpdate.BinaryDocValuesUpdate
An in-place update to a binary DocValues field
DocValuesUpdate.NumericDocValuesUpdate
An in-place update to a numeric DocValues field
FieldFilterAtomicReader
A FilterAtomicReader that exposes only a subset of fields from the underlying wrapped reader.
FieldInfo
Access to the Field Info file that describes document fields and whether or not they are indexed. Each segment has a separate Field Info file. Objects of this class are thread-safe for multiple readers, but only one thread can be adding documents at a time, with no other reader or writer threads accessing this object.
FieldInfos
Collection of FieldInfos (accessible by number or by name).
@lucene.experimental
FieldInvertState
This class tracks the number and position / offset parameters of terms being added to the index. The information collected in this class is also used to calculate the normalization factor for a field.
@lucene.experimental
Fields
Flex API for access to fields and terms
@lucene.experimental
FilterAtomicReader
A FilterAtomicReader contains another AtomicReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class FilterAtomicReader itself simply implements all abstract methods of IndexReader with versions that pass all requests to the contained index reader. Subclasses of FilterAtomicReader may further override some of these methods and may also provide additional methods and fields.
NOTE: If you override LiveDocs, you will likely need to override NumDocs as well and vice-versa.
NOTE: If this FilterAtomicReader does not change the content of the contained reader, you could consider overriding CoreCacheKey so that IFieldCache and CachingWrapperFilter share the same entries for this atomic reader and the wrapped one. CombinedCoreAndDeletesKey could be overridden as well if the LiveDocs are not changed either.
FilterAtomicReader.FilterDocsAndPositionsEnum
Base class for filtering DocsAndPositionsEnum implementations.
FilterAtomicReader.FilterDocsEnum
Base class for filtering DocsEnum implementations.
FilterAtomicReader.FilterFields
Base class for filtering Fields implementations.
FilterAtomicReader.FilterTerms
Base class for filtering Terms implementations.
NOTE: If the order of terms and documents is not changed, and if these terms are going to be intersected with automata, you could consider overriding Intersect(CompiledAutomaton, BytesRef) for better performance.
FilterAtomicReader.FilterTermsEnum
Base class for filtering TermsEnum implementations.
FilterDirectoryReader
A FilterDirectoryReader wraps another DirectoryReader, allowing implementations to transform or extend it.
Subclasses should implement DoWrapDirectoryReader(DirectoryReader) to return an instance of the subclass.
If the subclass wants to wrap the DirectoryReader's subreaders, it should also implement a FilterDirectoryReader.SubReaderWrapper subclass, and pass an instance to its base constructor.
FilterDirectoryReader.StandardReaderWrapper
A no-op FilterDirectoryReader.SubReaderWrapper that simply returns the parent DirectoryReader's original subreaders.
FilterDirectoryReader.SubReaderWrapper
Factory class passed to the FilterDirectoryReader constructor that allows subclasses to wrap the filtered DirectoryReader's subreaders. You can use this to, e.g., wrap the subreaders with specialized FilterAtomicReader implementations.
FilteredTermsEnum
Abstract class for enumerating a subset of all terms.
Term enumerations are always ordered by Comparer. Each term in the enumeration is greater than all that precede it.
Please note: consumers of this enumeration cannot call Seek(); it is forward only and throws NotSupportedException when a seeking method is called.
IndexCommit
Expert: represents a single commit into an index as seen by the IndexDeletionPolicy or IndexReader.
Changes to the content of an index are made visible only after the writer who made that change commits by writing a new segments file (segments_N). This point in time, when the action of writing a new segments file to the directory is completed, is an index commit.
Each index commit point has a unique segments file associated with it. The segments file associated with a later index commit point would have a larger N.
@lucene.experimental
IndexDeletionPolicy
Expert: policy for deletion of stale IndexCommits.
Implement this interface, and pass it to one of the IndexWriter or IndexReader constructors, to customize when older point-in-time commits (IndexCommit) are deleted from the index directory. The default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2).
One expected use case for this (and the reason why it was first created) is to work around problems with an index directory accessed via filesystems like NFS because NFS does not provide the "delete on last close" semantics that Lucene's "point in time" search normally relies on. By implementing a custom deletion policy, such as "a commit is only removed once it has been stale for more than X minutes", you can give your readers time to refresh to the new commit before IndexWriter removes the old commits. Note that doing so will increase the storage requirements of the index. See LUCENE-710 for details.
Implementers of sub-classes should make sure that Clone() returns an independent instance able to work with any other IndexWriter or Directory instance.
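The "a commit is only removed once it has been stale for more than X minutes" policy mentioned above could be sketched as follows. This is an illustrative assumption, not a class shipped with the library; the class name and bookkeeping are invented for the example, while the OnInit/OnCommit signatures follow the Lucene.NET 4.8 IndexDeletionPolicy base class:

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Index;

// Hypothetical policy: keep superseded commits around for maxAge so
// open readers have time to refresh before their commit disappears.
public class ExpirationTimeDeletionPolicy : IndexDeletionPolicy
{
    private readonly TimeSpan maxAge;
    private readonly Dictionary<long, DateTime> firstSeen = new Dictionary<long, DateTime>();

    public ExpirationTimeDeletionPolicy(TimeSpan maxAge) => this.maxAge = maxAge;

    public override void OnInit<T>(IList<T> commits) => OnCommit(commits);

    public override void OnCommit<T>(IList<T> commits)
    {
        // Never delete the most recent commit; older commits are
        // removed only after they have been superseded for maxAge.
        for (int i = 0; i < commits.Count - 1; i++)
        {
            IndexCommit commit = commits[i];
            if (!firstSeen.TryGetValue(commit.Generation, out DateTime seen))
                firstSeen[commit.Generation] = seen = DateTime.UtcNow;
            if (DateTime.UtcNow - seen > maxAge)
                commit.Delete();
        }
    }
}
```

Deletion is requested via IndexCommit.Delete(); the actual file removal happens later, under the writer's control.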
IndexFileNames
This class contains useful constants representing filenames and extensions used by lucene, as well as convenience methods for querying whether a file name matches an extension (MatchesExtension(String, String)), as well as generating file names from a segment name, generation and extension (FileNameFromGeneration(String, String, Int64), SegmentFileName(String, String, String)).
NOTE: extensions used by codecs are not listed here. You must interact with the Codec directly.
@lucene.internal
IndexFormatTooNewException
This exception is thrown when Lucene detects an index that is newer than this Lucene version.
IndexFormatTooOldException
This exception is thrown when Lucene detects an index that is too old for this Lucene version
IndexNotFoundException
Signals that no index was found in the Directory. Possibly because the directory is empty, however it can also indicate an index corruption.
IndexReader
IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.
There are two different types of IndexReaders:
- AtomicReader: These indexes do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings.
- CompositeReader: Instances (like DirectoryReader) of this reader can only be used to get stored fields from the underlying AtomicReaders, but it is not possible to directly retrieve postings. To do that, get the sub-readers via GetSequentialSubReaders(). Alternatively, you can mimic an AtomicReader (with a serious slowdown), by wrapping composite readers with SlowCompositeReaderWrapper.
IndexReader instances for indexes on disk are usually constructed with a call to one of the static DirectoryReader.Open() methods, e.g. Open(Directory). DirectoryReader inherits from the CompositeReader abstract class; it is not possible to directly get postings from it.
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
IndexReaderContext
A struct like class that represents a hierarchical relationship between IndexReader instances.
IndexSplitter
Command-line tool that enables listing segments in an index, copying specific segments to another index, and deleting segments from an index.
This tool does file-level copying of segments files. This means it's unable to split apart a single segment into multiple segments. For example if your index is a single segment, this tool won't help. Also, it does basic file-level copying (using simple Stream) so it will not work with non FSDirectory Directory impls.
@lucene.experimental
You can easily accidentally remove segments from your index so be careful!
IndexUpgrader
This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line:
java -cp lucene-core.jar Lucene.Net.Index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir
Alternatively this class can be instantiated and Upgrade() invoked. It uses UpgradeIndexMergePolicy and triggers the upgrade via an ForceMerge(Int32) request to IndexWriter.
This tool keeps only the last commit in an index; for this reason, if the incoming index has more than one commit, the tool refuses to run by default. Specify -delete-prior-commits to override this, allowing the tool to delete all but the last commit. From .NET code this can be enabled by passing true to IndexUpgrader(Directory, LuceneVersion, TextWriter, Boolean).
Warning: this tool may reorder documents if the index was partially upgraded before execution (e.g., documents were added). If your application relies on "monotonicity" of doc IDs (which means that the order in which the documents were added to the index is preserved), do a full ForceMerge instead. The MergePolicy set by IndexWriterConfig may also reorder documents.
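The programmatic path mentioned above can be sketched like this (a hedged example; "oldIndexPath" is a placeholder, and the constructor used is the IndexUpgrader(Directory, LuceneVersion, TextWriter, Boolean) overload named in the text):

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

using Directory dir = FSDirectory.Open("oldIndexPath");

// Equivalent to the command-line form: Console.Out plays the role of
// -verbose output, and true corresponds to -delete-prior-commits.
var upgrader = new IndexUpgrader(dir, LuceneVersion.LUCENE_48, Console.Out, true);
upgrader.Upgrade();
```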
IndexWriter
An IndexWriter creates and maintains an index.
IndexWriter.IndexReaderWarmer
If Open(IndexWriter, Boolean) has been called (ie, this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits. This is not required for near real-time search, but will reduce search latency on opening a new near real-time reader after a merge completes.
@lucene.experimental
NOTE: Warm(AtomicReader) is called before any deletes have been carried over to the merged segment.
IndexWriterConfig
Holds all the configuration that is used to create an IndexWriter. Once IndexWriter has been created with this object, changes to this object will not affect the IndexWriter instance. For that, use LiveIndexWriterConfig that is returned from Config.
LUCENENET NOTE: Unlike Lucene, we use property setters instead of setter methods. In C#, this allows you to initialize the IndexWriterConfig using the language features of C#, for example:
IndexWriterConfig conf = new IndexWriterConfig(analyzer)
{
    Codec = new Lucene46Codec(),
    OpenMode = OpenMode.CREATE
};
However, if you prefer to match the syntax of Lucene using chained setter methods, there are extension methods in the Lucene.Net.Support namespace. Example usage:
using Lucene.Net.Support;
..
IndexWriterConfig conf = new IndexWriterConfig(analyzer)
.SetCodec(new Lucene46Codec())
.SetOpenMode(OpenMode.CREATE);
@since 3.1
KeepOnlyLastCommitDeletionPolicy
This IndexDeletionPolicy implementation that keeps only the most recent commit and immediately removes all prior commits after a new commit is done. This is the default deletion policy.
LiveIndexWriterConfig
Holds all the configuration used by IndexWriter with a few setters for settings that can be changed on an IndexWriter instance "live".
@since 4.0
LogByteSizeMergePolicy
This is a LogMergePolicy that measures size of a segment as the total byte size of the segment's files.
LogDocMergePolicy
This is a LogMergePolicy that measures size of a segment as the number of documents (not taking deletions into account).
LogMergePolicy
This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using MergeFactor.
This class is abstract and requires a subclass to define the Size(SegmentCommitInfo) method which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
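Choosing between the two size metrics described above amounts to picking a subclass when configuring the writer. A hedged sketch (the numeric values are illustrative, not defaults):

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Util;

var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    // Segment size measured as total byte size of the segment's files:
    MergePolicy = new LogByteSizeMergePolicy
    {
        MergeFactor = 10,   // segments per level before a merge triggers
        MaxMergeMB = 512    // segments larger than this are never merged
    }
    // Or measure by document count instead:
    // MergePolicy = new LogDocMergePolicy { MergeFactor = 10 }
};
```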
MergePolicy
Expert: a MergePolicy determines the sequence of primitive merge operations.
Whenever the segments in an index have been altered by IndexWriter, either the addition of a newly flushed segment, addition of many segments from AddIndexes* calls, or a previous merge that may now need to cascade, IndexWriter invokes FindMerges(MergeTrigger, SegmentInfos) to give the MergePolicy a chance to pick merges that are now required. This method returns a MergePolicy.MergeSpecification instance describing the set of merges that should be done, or null if no merges are necessary. When ForceMerge(Int32) is called, it calls FindForcedMerges(SegmentInfos, Int32, IDictionary<SegmentCommitInfo, Nullable<Boolean>>) and the MergePolicy should then return the necessary merges.
Note that the policy can return more than one merge at a time. In this case, if the writer is using SerialMergeScheduler, the merges will be run sequentially but if it is using ConcurrentMergeScheduler they will be run concurrently.
The default MergePolicy is TieredMergePolicy.
@lucene.experimental
MergePolicy.DocMap
A map of doc IDs.
MergePolicy.MergeAbortedException
Thrown when a merge was explicitly aborted because Dispose(Boolean) was called with false. Normally this exception is privately caught and suppressed by IndexWriter.
MergePolicy.MergeException
Exception thrown if there are any problems while executing a merge.
MergePolicy.MergeSpecification
A MergePolicy.MergeSpecification instance provides the information necessary to perform multiple merges. It simply contains a list of MergePolicy.OneMerge instances.
MergePolicy.OneMerge
OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment. The merge spec includes the subset of segments to be merged as well as whether the new segment should use the compound file format.
MergeScheduler
Expert: IndexWriter uses an instance implementing this interface to execute the merges selected by a MergePolicy. The default MergeScheduler is ConcurrentMergeScheduler.
Implementers of sub-classes should make sure that Clone() returns an independent instance able to work with any IndexWriter instance.
@lucene.experimental
MergeState
Holds common state used during segment merging.
@lucene.experimental
MergeState.DocMap
Remaps docids around deletes during merge
MockIndexInput
IndexInput backed by a byte[] for testing.
MockRandomMergePolicy
MergePolicy that makes random decisions for testing.
MultiDocsAndPositionsEnum
Exposes flex API, merged from flex API of sub-segments.
@lucene.experimental
MultiDocsAndPositionsEnum.EnumWithSlice
Holds a DocsAndPositionsEnum along with the corresponding ReaderSlice.
MultiDocsEnum
Exposes DocsEnum, merged from DocsEnum API of sub-segments.
@lucene.experimental
MultiDocsEnum.EnumWithSlice
Holds a DocsEnum along with the corresponding ReaderSlice.
MultiDocValues
A wrapper for CompositeReader providing access to DocValues.
NOTE: for multi readers, you'll get better performance by gathering the sub readers using Context to get the atomic leaves and then operate per-AtomicReader, instead of using this class.
NOTE: this is very costly.
@lucene.experimental @lucene.internal
MultiDocValues.MultiSortedDocValues
Implements SortedDocValues over n subs, using a MultiDocValues.OrdinalMap
@lucene.internal
MultiDocValues.MultiSortedSetDocValues
Implements SortedSetDocValues over n subs, using a MultiDocValues.OrdinalMap
@lucene.internal
MultiDocValues.OrdinalMap
Maps per-segment ordinals to/from global ordinal space.
MultiFields
Exposes flex API, merged from flex API of sub-segments. This is useful when you're interacting with an IndexReader implementation that consists of sequential sub-readers (eg DirectoryReader or MultiReader).
NOTE: for composite readers, you'll get better performance by gathering the sub readers using Context to get the atomic leaves and then operate per-AtomicReader, instead of using this class.
@lucene.experimental
MultiIndexWriter
MultiPassIndexSplitter
This tool splits input index into multiple equal parts. The method employed here uses AddIndexes(IndexReader[]) where the input data comes from the input index with artificially applied deletes to the document id-s that fall outside the selected partition.
Note 1: Deletes are only applied to a buffered list of deleted docs and don't affect the source index - this tool works also with read-only indexes.
Note 2: the disadvantage of this tool is that source index needs to be read as many times as there are parts to be created, hence the name of this tool.
NOTE: this tool is unaware of documents added atomically via Lucene.Net.Index.IndexWriter.AddDocuments(System.Collections.Generic.IEnumerable{System.Collections.Generic.IEnumerable{Lucene.Net.Index.IIndexableField}},Lucene.Net.Analysis.Analyzer) or Lucene.Net.Index.IndexWriter.UpdateDocuments(Lucene.Net.Index.Term,System.Collections.Generic.IEnumerable{System.Collections.Generic.IEnumerable{Lucene.Net.Index.IIndexableField}},Lucene.Net.Analysis.Analyzer), which means it can easily break up such document groups.
MultiReader
A CompositeReader which reads multiple indexes, appending their content. It can be used to create a view on several sub-readers (like DirectoryReader) and execute searches on it.
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
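Composing several indexes into one read-only view, as described above, can be sketched as follows (a hedged example; the paths are placeholders, and this varargs constructor takes ownership of the sub-readers, so disposing the MultiReader disposes them too):

```csharp
using Lucene.Net.Index;
using Lucene.Net.Store;

DirectoryReader r1 = DirectoryReader.Open(FSDirectory.Open("index1"));
DirectoryReader r2 = DirectoryReader.Open(FSDirectory.Open("index2"));

// Appends r2's documents after r1's; doc IDs from r2 are offset by
// r1.MaxDoc in the combined view.
using var multi = new MultiReader(r1, r2);
int total = multi.NumDocs; // sum of both readers' live documents
```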
MultiTerms
Exposes flex API, merged from flex API of sub-segments.
@lucene.experimental
MultiTermsEnum
Exposes TermsEnum API, merged from TermsEnum API of sub-segments. This does a merge sort, by term text, of the sub-readers.
@lucene.experimental
MultiTermsEnum.TermsEnumIndex
MultiTermsEnum.TermsEnumWithSlice
NoDeletionPolicy
An IndexDeletionPolicy which keeps all index commits around, never deleting them. This class is a singleton and can be accessed by referencing INSTANCE.
NoMergePolicy
A MergePolicy which never returns merges to execute (hence its name). It is also a singleton and can be accessed through NO_COMPOUND_FILES if you want to indicate the index does not use compound files, or through COMPOUND_FILES otherwise. Use it if you want to prevent an IndexWriter from ever executing merges, without going through the hassle of tweaking a merge policy's settings to achieve that, such as changing its merge factor.
NoMergeScheduler
A MergeScheduler which never executes any merges. It is also a singleton and can be accessed through INSTANCE. Use it if you want to prevent an IndexWriter from ever executing merges, regardless of the MergePolicy used. Note that you can achieve the same thing by using NoMergePolicy, however with NoMergeScheduler you also ensure that no unnecessary code of any MergeScheduler implementation is ever executed. Hence it is recommended to use both if you want to disable merges from ever happening.
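The recommendation above, using both classes together to disable merges entirely, can be sketched as a writer configuration (a hedged example; the analyzer choice is arbitrary):

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Util;

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48,
        new StandardAnalyzer(LuceneVersion.LUCENE_48))
{
    // Since no merges will ever run, the COMPOUND_FILES vs
    // NO_COMPOUND_FILES choice only declares how the index is stored:
    MergePolicy = NoMergePolicy.COMPOUND_FILES,
    MergeScheduler = NoMergeScheduler.INSTANCE
};
```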
NumericDocValues
A per-document numeric value.
OrdTermState
An ordinal-based TermState
@lucene.experimental
ParallelAtomicReader
An AtomicReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Deletions are taken from the first reader. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.
This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.
Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior.
ParallelCompositeReader
A CompositeReader which reads multiple, parallel indexes. Each index added must have the same number of documents, and exactly the same hierarchical subreader structure, but typically each contains different fields. Deletions are taken from the first reader. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.
This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.
Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior. A good strategy to create suitable indexes with IndexWriter is to use LogDocMergePolicy, as this one does not reorder documents during merging (like TieredMergePolicy) and triggers merges by number of documents per segment. If you use different MergePolicys it might happen that the segment structure of your index is no longer predictable.
PersistentSnapshotDeletionPolicy
A SnapshotDeletionPolicy which adds a persistence layer so that snapshots can be maintained across the life of an application. The snapshots are persisted in a Directory and are committed as soon as Snapshot() or Release(IndexCommit) is called.
NOTE: Sharing PersistentSnapshotDeletionPolicys that write to the same directory across IndexWriters will corrupt snapshots. You should make sure every IndexWriter has its own PersistentSnapshotDeletionPolicy and that they all write to a different Directory. It is OK to use the same Directory that holds the index.
This class adds a Release(Int64) method to release commits from a previous snapshot's Generation.
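A minimal usage sketch, assuming `indexDir` and `snapshotDir` are Directory instances you have opened (they may be the same Directory, per the note above):

```csharp
var policy = new PersistentSnapshotDeletionPolicy(
    new KeepOnlyLastCommitDeletionPolicy(), snapshotDir);
var iwc = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    IndexDeletionPolicy = policy
};
using (var writer = new IndexWriter(indexDir, iwc))
{
    writer.Commit();
    IndexCommit snapshot = policy.Snapshot(); // persisted as soon as it is taken
    try
    {
        // Copy the files named by snapshot.FileNames to a backup location.
    }
    finally
    {
        policy.Release(snapshot);             // also persisted immediately
    }
}
```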
@lucene.experimental
PKIndexSplitter
Split an index based on a Filter.
RandomAccessOrds
Extension of SortedSetDocValues that supports random access to the ordinals of a document.
Operations via this API are independent of the iterator api (NextOrd()) and do not impact its state.
Codecs can optionally extend this API if they support constant-time access to ordinals for the document.
RandomCodec
Codec that assigns per-field random postings formats.
The same field/format assignment will happen regardless of order; a hash is computed up front that determines the mapping. This means fields can be put into things like HashSets and added to documents in different orders and the test will still be deterministic and reproducible.
RandomIndexWriter
Silly class that randomizes the indexing experience. E.g., it may swap in a different merge policy/scheduler; may commit periodically; may or may not forceMerge at the end; may flush by doc count instead of RAM; etc.
RandomIndexWriter.TestPointInfoStream
ReaderManager
Utility class to safely share DirectoryReader instances across multiple threads, while periodically reopening. This class ensures each reader is disposed only once all threads have finished using it.
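The acquire/release discipline looks roughly like this (a sketch; `writer` is an existing IndexWriter):

```csharp
var manager = new ReaderManager(writer, applyAllDeletes: true);

DirectoryReader reader = manager.Acquire();
try
{
    var searcher = new IndexSearcher(reader);
    // ... run searches against a stable point-in-time view ...
}
finally
{
    manager.Release(reader); // disposed once every thread has released it
}

// Call periodically (e.g. from a background thread) to pick up index changes.
manager.MaybeRefresh();
```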
@lucene.experimental
ReaderSlice
Subreader slice from a parent composite reader.
@lucene.internal
ReaderUtil
Common util methods for dealing with IndexReaders and IndexReaderContexts.
@lucene.internal
SegmentCommitInfo
Embeds a [read-only] SegmentInfo and adds per-commit fields.
@lucene.experimental
SegmentInfo
Information about a segment such as its name, directory, and files related to the segment.
@lucene.experimental
SegmentInfos
A collection of segmentInfo objects with methods for operating on those segments in relation to the file system.
The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one (when older segments_N files are present it's because they temporarily cannot be deleted, or a writer is in the process of committing, or a custom IndexDeletionPolicy is in use). This file lists each segment by name and has details about the codec and generation of deletes.
There is also a file segments.gen. This file contains the current generation (the _N in segments_N) of the index. This is used only as a fallback in case the current generation cannot be accurately determined by directory listing alone (as is the case for some NFS clients with time-based directory cache expiration). This file simply contains a WriteInt32(Int32) version header (FORMAT_SEGMENTS_GEN_CURRENT), followed by the generation recorded as WriteInt64(Int64), written twice.
Files:
- segments.gen: GenHeader, Generation, Generation, Footer
- segments_N: Header, Version, NameCounter, SegCount, <SegName, SegCodec, DelGen, DeletionCount, FieldInfosGen, UpdatesFiles>SegCount, CommitUserData, Footer
- Header --> WriteHeader(DataOutput, String, Int32)
- GenHeader, NameCounter, SegCount, DeletionCount --> WriteInt32(Int32)
- Generation, Version, DelGen, Checksum, FieldInfosGen --> WriteInt64(Int64)
- SegName, SegCodec --> WriteString(String)
- CommitUserData --> WriteStringStringMap(IDictionary<String, String>)
- UpdatesFiles --> WriteStringSet(ISet<String>)
- Footer --> WriteFooter(IndexOutput)
- Version counts how often the index has been changed by adding or deleting documents.
- NameCounter is used to generate names for new segment files.
- SegName is the name of the segment, and is used as the file name prefix for all of the files that compose the segment's index.
- DelGen is the generation count of the deletes file. If this is -1, there are no deletes. Anything above zero means there are deletes stored by LiveDocsFormat.
- DeletionCount records the number of deleted documents in this segment.
- SegCodec is the Name of the Codec that encoded this segment.
- CommitUserData stores an optional user-supplied opaque IDictionary<String, String> that was passed to SetCommitData(IDictionary<String, String>).
- FieldInfosGen is the generation count of the fieldInfos file. If this is -1, there are no updates to the fieldInfos in that segment. Anything above zero means there are updates to fieldInfos stored by FieldInfosFormat.
- UpdatesFiles stores the list of files that were updated in that segment.
@lucene.experimental
SegmentInfos.FindSegmentsFile
Utility class for executing code that needs to do something with the current segments file. This is necessary with lock-less commits because from the time you locate the current segments file name, until you actually open it, read its contents, or check modified time, etc., it could have been deleted due to a writer commit finishing.
SegmentReader
IndexReader implementation over a single segment.
Instances pointing to the same segment (but with different deletes, etc) may share the same core data.
@lucene.experimental
SegmentReadState
Holder class for common parameters used during read.
@lucene.experimental
SegmentWriteState
Holder class for common parameters used during write.
@lucene.experimental
SerialMergeScheduler
A MergeScheduler that simply does each merge sequentially, using the current thread.
SimpleMergedSegmentWarmer
A very simple merged segment warmer that just ensures data structures are initialized.
SingleTermsEnum
Subclass of FilteredTermsEnum for enumerating a single term.
For example, this can be used by MultiTermQuerys that need only visit one term, but want to preserve MultiTermQuery semantics such as MultiTermRewriteMethod.
SlowCompositeReaderWrapper
This class forces a composite reader (e.g. a MultiReader or DirectoryReader) to emulate an atomic reader. This requires implementing the postings APIs on the fly, using the static methods in MultiFields and MultiDocValues, by stepping through the sub-readers to merge fields/terms, appending docs, etc.
NOTE: This class almost always results in a performance hit. If this is important to your use case, you'll get better performance by gathering the sub readers using Context to get the atomic leaves and then operate per-AtomicReader, instead of using this class.
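If you do need the wrapper, usage is a one-liner; a sketch assuming an index in `dir` with a "body" field:

```csharp
using (DirectoryReader composite = DirectoryReader.Open(dir))
{
    AtomicReader atomic = SlowCompositeReaderWrapper.Wrap(composite);
    // Postings are now merged on the fly across all sub-readers.
    Terms terms = atomic.GetTerms("body");
}
```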
SnapshotDeletionPolicy
An IndexDeletionPolicy that wraps any other IndexDeletionPolicy and adds the ability to hold and later release snapshots of an index. While a snapshot is held, the IndexWriter will not remove any files associated with it even if the index is otherwise being actively, arbitrarily changed. Because we wrap another arbitrary IndexDeletionPolicy, this gives you the freedom to continue using whatever IndexDeletionPolicy you would normally want to use with your index.
This class maintains all snapshots in-memory, and so the information is not persisted and not protected against system failures. If persistence is important, you can use PersistentSnapshotDeletionPolicy.
@lucene.experimental
SortedDocValues
A per-document byte[] with presorted values.
Per-Document values in a SortedDocValues are deduplicated, dereferenced, and sorted into a dictionary of unique values. A pointer to the dictionary value (ordinal) can be retrieved for each document. Ordinals are dense and in increasing sorted order.
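The ordinal-based access pattern can be sketched as follows (assuming an AtomicReader `reader` and a doc-values field named "category"):

```csharp
SortedDocValues dv = reader.GetSortedDocValues("category");
int ord = dv.GetOrd(docId);    // dense ordinal into the sorted dictionary
if (ord >= 0)                  // -1 means the document has no value
{
    var value = new BytesRef();
    dv.LookupOrd(ord, value);  // resolve the ordinal back to its byte[]
}
```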
SortedSetDocValues
A per-document set of presorted byte[] values.
Per-Document values in a SortedSetDocValues are deduplicated, dereferenced, and sorted into a dictionary of unique values. A pointer to the dictionary value (ordinal) can be retrieved for each document. Ordinals are dense and in increasing sorted order.
StoredFieldVisitor
Expert: Provides a low-level means of accessing the stored field values in an index. See Document(Int32, StoredFieldVisitor).
NOTE: A StoredFieldVisitor implementation should not try to load or visit other stored documents in the same reader because the implementation of stored fields for most codecs is not reentrant and you will see strange exceptions as a result.
See DocumentStoredFieldVisitor, which is a StoredFieldVisitor that builds the Document containing all stored fields. This is used by Document(Int32).
@lucene.experimental
SurrogateDirectoryReader
DirectoryReader is an implementation of CompositeReader that can read indexes in a Directory.
DirectoryReader instances are usually constructed with a call to one of the static Open() methods, e.g. Open(Directory).
For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions.
NOTE: IndexReader instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexReader instance; use your own (non-Lucene) objects instead.
SurrogateIndexReader
SurrogateIndexWriter
TaskMergeScheduler
A MergeScheduler that runs each merge using a separate Task.
If more than MaxMergeCount merges are requested then this class will forcefully throttle the incoming threads by pausing until one or more merges complete.
LUCENENET specific
Term
A Term represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that the text occurred in.
Note that terms may represent more than just words from text fields; they may also represent things like dates, email addresses, URLs, etc.
TermContext
Maintains an IndexReader TermState view over IndexReader instances containing a single term. The TermContext doesn't track whether the given TermState objects are valid, nor whether the TermState instances refer to the same terms in the associated readers.
@lucene.experimental
Terms
Access to the terms in a specific field. See Fields.
@lucene.experimental
TermsEnum
Iterator to seek (SeekCeil(BytesRef), SeekExact(BytesRef)) or step through (Next()) terms to obtain frequency information (DocFreq), DocsEnum or DocsAndPositionsEnum for the current term (Docs(IBits, DocsEnum)).
Term enumerations are always ordered by Comparer. Each term in the enumeration is greater than the one before it.
The TermsEnum is unpositioned when you first obtain it; you must first successfully call Next() or one of the Seek methods.
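A positioning sketch, assuming a Terms instance `terms` for some field (accessor names follow the 4.8 API surface; exact names may vary slightly across releases):

```csharp
TermsEnum te = terms.GetEnumerator();          // initially unpositioned
if (te.SeekCeil(new BytesRef("lucene")) != TermsEnum.SeekStatus.END)
{
    int df = te.DocFreq;                       // stats for the current term
    DocsEnum docs = te.Docs(null, null);       // postings for the current term
    while (docs.NextDoc() != DocIdSetIterator.NO_MORE_DOCS)
    {
        // docs.DocID matches the current term
    }
}
```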
@lucene.experimental
TermState
Encapsulates all required internal state to position the associated TermsEnum without re-seeking.
@lucene.experimental
ThreadedIndexingAndSearchingTestCase
Utility class that spawns multiple indexing and searching threads.
TieredMergePolicy
Merges segments of approximately equal size, subject to an allowed number of segments per tier. This is similar to LogByteSizeMergePolicy, except this merge policy is able to merge non-adjacent segments, and separates how many segments are merged at once (MaxMergeAtOnce) from how many segments are allowed per tier (SegmentsPerTier). This merge policy also does not over-merge (i.e. cascade merges).
For normal merging, this policy first computes a "budget" of how many segments are allowed to be in the index. If the index is over-budget, then the policy sorts segments by decreasing size (pro-rating by percent deletes), and then finds the least-cost merge. Merge cost is measured by a combination of the "skew" of the merge (size of largest segment divided by smallest segment), total merge size and percent deletes reclaimed, so that merges with lower skew, smaller size and those reclaiming more deletes, are favored.
If a merge will produce a segment that's larger than MaxMergedSegmentMB, then the policy will merge fewer segments (down to 1 at once, if that one has deletions) to keep the segment size under budget.
NOTE: This policy freely merges non-adjacent segments; if this is a problem, use LogMergePolicy.
NOTE: This policy always merges by byte size of the segments, always pro-rates by percent deletes, and does not apply any maximum segment size during forceMerge (unlike LogByteSizeMergePolicy).
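Tuning is done through properties on the policy; a sketch with the commonly adjusted knobs (values here are illustrative, not recommendations; `iwc` is an existing IndexWriterConfig):

```csharp
var tmp = new TieredMergePolicy
{
    MaxMergeAtOnce = 10,          // how many segments one merge may combine
    SegmentsPerTier = 10.0,       // allowed segments per tier before merging
    MaxMergedSegmentMB = 5120.0   // avoid producing segments over ~5 GB
};
iwc.MergePolicy = tmp;
```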
@lucene.experimental
TieredMergePolicy.MergeScore
Holds score and explanation for a single candidate merge.
TrackingIndexWriter
Class that tracks changes to a delegated IndexWriter, used by ControlledRealTimeReopenThread<T> to ensure specific changes are visible. Create this class (passing your IndexWriter), and then pass this class to ControlledRealTimeReopenThread<T>. Be sure to make all changes via the TrackingIndexWriter, otherwise ControlledRealTimeReopenThread<T> won't know about the changes.
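The wiring described above can be sketched as (assuming an existing IndexWriter `writer`, a SearcherManager `searcherManager`, and a Document `doc`):

```csharp
var tracking = new TrackingIndexWriter(writer);
var reopenThread = new ControlledRealTimeReopenThread<IndexSearcher>(
    tracking, searcherManager,
    5.0,   // targetMaxStaleSec: reopen at least this often
    0.1);  // targetMinStaleSec: don't reopen more often than this
reopenThread.Start();

long gen = tracking.AddDocument(doc); // returns the change generation
reopenThread.WaitForGeneration(gen);  // block until searchers can see it
```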
@lucene.experimental
TwoPhaseCommitTool
A utility for executing 2-phase commit on several objects.
@lucene.experimental
TwoPhaseCommitTool.CommitFailException
Thrown by Execute(ITwoPhaseCommit[]) when an object fails to Commit().
TwoPhaseCommitTool.PrepareCommitFailException
Thrown by Execute(ITwoPhaseCommit[]) when an object fails to PrepareCommit().
UpgradeIndexMergePolicy
This MergePolicy is used for upgrading all existing segments of an index when calling ForceMerge(Int32). All other methods delegate to the base MergePolicy given to the constructor. This allows for an as-cheap-as-possible upgrade of an older index by only upgrading segments that were created by previous Lucene versions. ForceMerge no longer really merges; it is just used to "ForceMerge" older segment versions away.
In general one would use IndexUpgrader, but for a fully customizable upgrade, you can use this like any other MergePolicy and call ForceMerge(Int32):
IndexWriterConfig iwc = new IndexWriterConfig(LuceneVersion.LUCENE_XX, new KeywordAnalyzer());
iwc.MergePolicy = new UpgradeIndexMergePolicy(iwc.MergePolicy);
using (IndexWriter w = new IndexWriter(dir, iwc))
{
w.ForceMerge(1);
}
Warning: This merge policy may reorder documents if the index was partially upgraded before calling ForceMerge(Int32) (e.g., documents were added). If your application relies on "monotonicity" of doc IDs (which means that the order in which the documents were added to the index is preserved), do a ForceMerge(1) instead. Please note, the delegate MergePolicy may also reorder documents.
@lucene.experimental
Interfaces
IConcurrentMergeScheduler
IIndexableField
Represents a single field for indexing. IndexWriter consumes IEnumerable<IndexableField> as a document.
@lucene.experimental
IIndexableFieldType
Describes the properties of a field.
@lucene.experimental
IMergeScheduler
IndexReader.IReaderClosedListener
A custom listener that's invoked when the IndexReader is closed.
@lucene.experimental
IndexWriter.IEvent
Interface for internal atomic events. See Lucene.Net.Index.DocumentsWriter for details. Events are executed concurrently and no order is guaranteed. Each event should only rely on the serializability within its Process method. All actions that must happen before or after a certain action must be encoded inside the Process(IndexWriter, Boolean, Boolean) method.
ITwoPhaseCommit
An interface for implementations that support 2-phase commit. You can use TwoPhaseCommitTool to execute a 2-phase commit algorithm over several ITwoPhaseCommits.
@lucene.experimental
RandomIndexWriter.TestPoint
Simple interface that is executed for each TP
SegmentReader.ICoreDisposedListener
Called when the shared core for this SegmentReader is disposed.
This listener is called only once all SegmentReaders sharing the same core are disposed. At this point it is safe for apps to evict this reader from any caches keyed on CoreCacheKey. This is the same interface that IFieldCache uses, internally, to evict entries.
NOTE: This was CoreClosedListener in Lucene.
@lucene.experimental
Enums
BaseTermVectorsFormatTestCase.Options
A combination of term vectors options.
DocsAndPositionsFlags
DocsFlags
DocValuesFieldUpdatesType
DocValuesType
DocValues types. Note that DocValues is strongly typed, so a field cannot have different types across different documents.
FilteredTermsEnum.AcceptStatus
Return value indicating whether the term should be accepted or the iteration should END. The *_SEEK values denote that after handling the current term, the enum should call NextSeekTerm(BytesRef) and step forward.
IndexOptions
Controls how much information is stored in the postings lists.
@lucene.experimental
MergeTrigger
MergeTrigger is passed to FindMerges(MergeTrigger, SegmentInfos) to indicate the event that triggered the merge.
OpenMode
Specifies the open mode for IndexWriter.
StoredFieldVisitor.Status
Enumeration of possible return values for NeedsField(FieldInfo).
TermsEnum.SeekStatus
Represents returned result from SeekCeil(BytesRef).