Namespace Lucene.Net.Util
Classes
AlreadySetException
Thrown when Set(T) is called more than once.
ArrayUtil
Methods for manipulating arrays.
@lucene.internal
Attribute
Base class for Attributes that can be added to an AttributeSource.
Attributes are used to add data in a dynamic, yet type-safe way to a source of usually streamed objects, e.g. a TokenStream.
AttributeSource
An AttributeSource contains a list of different Attributes, and methods to add and get them. There can only be a single instance of an attribute in the same AttributeSource instance. This is ensured by passing in the actual type of the IAttribute to the AddAttribute<T>(), which then checks if an instance of that type is already present. If yes, it returns the instance, otherwise it creates a new instance and returns it.
AttributeSource.AttributeFactory
An AttributeSource.AttributeFactory creates instances of Attributes.
AttributeSource.State
This class holds the state of an AttributeSource.
BaseDocIdSetTestCase<T>
Base test class for DocIdSets.
Bits
Bits.MatchAllBits
Bits impl of the specified length with all bits set.
Bits.MatchNoBits
Bits impl of the specified length with no bits set.
BitUtil
A variety of high efficiency bit twiddling routines.
@lucene.internal
BroadWord
Methods and constants inspired by the article "Broadword Implementation of Rank/Select Queries" by Sebastiano Vigna, January 30, 2012:
- algorithm 1: BitCount(Int64), count of set bits in a long.
- algorithm 2: Select(Int64, Int32), selection of a set bit in a long.
- bytewise signed smaller <8 operator: SmallerUpTo7_8(Int64, Int64).
- shortwise signed smaller <16 operator: SmallerUpto15_16(Int64, Int64).
- some of the Lk and Hk constants that are used by the above: L8 L8_L, H8 H8_L, L9 L9_L, L16 L16_L and H16 H16_L.
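The bit-count routine (algorithm 1) can be illustrated with a minimal C# sketch of the classic broadword technique: count set bits inside 2-bit fields, then 4-bit fields, then bytes, and finally add all eight byte counts with a single multiplication by L8 (0x01 in every byte, the L_8 constant of Vigna's notation). This is an illustration, not the Lucene.Net source; the class name BroadwordSketch is hypothetical.

```csharp
public static class BroadwordSketch
{
    // L8: the lowest bit of every byte set (Vigna's L_8).
    public const ulong L8 = 0x0101010101010101UL;

    // Counts the set bits of a 64-bit value without looping over the bits.
    public static int BitCount(long value)
    {
        ulong x = unchecked((ulong)value);
        // Each 2-bit field now holds the number of set bits it had (0..2).
        x -= (x >> 1) & 0x5555555555555555UL;
        // Each 4-bit field now holds its count (0..4).
        x = (x & 0x3333333333333333UL) + ((x >> 2) & 0x3333333333333333UL);
        // Each byte now holds its count (0..8).
        x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FUL;
        // Multiplying by L8 accumulates the sum of all bytes in the top byte.
        return (int)(unchecked(x * L8) >> 56);
    }
}
```

Because no field can exceed its width at any step, the additions never carry into a neighboring field, which is what makes the word-parallel arithmetic valid.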
ByteBlockPool
ByteBlockPool.Allocator
Abstract class for allocating and freeing byte blocks.
ByteBlockPool.DirectAllocator
A simple ByteBlockPool.Allocator that never recycles.
ByteBlockPool.DirectTrackingAllocator
A simple ByteBlockPool.Allocator that never recycles, but tracks how much total RAM is in use.
BytesRef
Represents byte[], as a slice (offset + length) into an existing byte[]. The Bytes property should never be null; use EMPTY_BYTES if necessary.
Important note: Unless otherwise noted, Lucene uses this class to represent terms that are encoded as UTF8 bytes in the index. To convert them to a .NET string (which is UTF16), use Utf8ToString(). Using code like new String(bytes, offset, length) to do this is wrong, as it does not respect the correct character set and may return wrong results (depending on the platform's defaults)!
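To illustrate why the character set matters, the following sketch decodes a (bytes, offset, length) slice, the same triple a BytesRef holds, with an explicit UTF-8 decoder from System.Text instead of relying on any platform default. It uses no Lucene.Net types; the class and method names are hypothetical.

```csharp
using System.Text;

public static class Utf8SliceSketch
{
    // Decodes bytes[offset .. offset + length) as UTF-8, regardless of the
    // platform's default encoding.
    public static string DecodeSlice(byte[] bytes, int offset, int length)
        => Encoding.UTF8.GetString(bytes, offset, length);
}
```

A multi-byte character such as "é" occupies two bytes in the slice, so the slice length in bytes and the resulting string length in chars can differ.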
BytesRefArray
A simple append only random-access BytesRef array that stores full copies of the appended bytes in a ByteBlockPool.
Note: this class is not Thread-Safe!
@lucene.internal @lucene.experimental
BytesRefHash
BytesRefHash is a special purpose hash-map like data-structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>) storing the hashed bytes efficiently in continuous storage. The mapping to the id is encapsulated inside BytesRefHash and is guaranteed to be increased for each added BytesRef.
Note: The maximum capacity BytesRef instance passed to Add(BytesRef) must not be longer than BYTE_BLOCK_SIZE-2. The internal storage is limited to 2GB total byte storage.
@lucene.internal
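The id-assignment behavior described above can be pictured with a small sketch: every distinct byte sequence is copied once into one growing buffer and receives the next dense id, and a repeated sequence gets its existing id back. This is only an illustration under stated assumptions: it uses a Dictionary keyed by a Base64 copy of the bytes instead of BytesRefHash's own hashing, it returns the existing id for duplicates (the real Add(BytesRef) may signal duplicates differently), and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class ByteSeqIdMapSketch
{
    private readonly List<byte> pool = new List<byte>();   // all stored bytes, back to back
    private readonly List<int> starts = new List<int>();   // start of id i inside pool
    private readonly List<int> lengths = new List<int>();  // length of id i
    private readonly Dictionary<string, int> ids = new Dictionary<string, int>();

    public int Count => starts.Count;

    // Returns the id of bytes[offset .. offset + length), adding it if it is new.
    public int Add(byte[] bytes, int offset, int length)
    {
        string key = Convert.ToBase64String(bytes, offset, length); // illustration-only hash key
        if (ids.TryGetValue(key, out int id)) return id;
        id = starts.Count;                                      // ids are dense: 0, 1, 2, ...
        starts.Add(pool.Count);
        lengths.Add(length);
        for (int i = 0; i < length; i++) pool.Add(bytes[offset + i]);
        ids.Add(key, id);
        return id;
    }

    // Copies the bytes stored for an id back out of the pool.
    public byte[] Get(int id) => pool.GetRange(starts[id], lengths[id]).ToArray();
}
```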
BytesRefHash.BytesStartArray
Manages allocation of the per-term addresses.
BytesRefHash.DirectBytesStartArray
A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
BytesRefHash.MaxBytesLengthExceededException
Thrown if a BytesRef exceeds the BytesRefHash limit of BYTE_BLOCK_SIZE-2.
BytesRefIterator
LUCENENET specific class to make the syntax of creating an empty IBytesRefIterator the same as it was in Lucene. Example:
var iter = BytesRefIterator.Empty;
CharsRef
Represents char[], as a slice (offset + Length) into an existing char[]. The Chars property should never be null; use EMPTY_CHARS if necessary.
@lucene.internal
CollectionUtil
Methods for manipulating (sorting) collections. Sort methods work directly on the supplied lists and don't copy to/from arrays before/after. For medium size collections as used in the Lucene indexer that is much more efficient.
@lucene.internal
CommandLineUtil
Class containing some useful methods used by command line tools
Constants
Some useful constants.
Counter
Simple counter class
@lucene.internal @lucene.experimental
DisposableThreadLocal<T>
Java's builtin ThreadLocal has a serious flaw: it can take an arbitrarily long amount of time to dereference the things you had stored in it, even once the ThreadLocal instance itself is no longer referenced. This is because there is a single, master map stored for each thread, which all ThreadLocals share, and that master map only periodically purges "stale" entries.
While not technically a memory leak, because eventually the memory will be reclaimed, it can take a long time and you can easily hit an OutOfMemoryException, because from the GC's standpoint the stale entries are not reclaimable.
This class works around that by only enrolling WeakReference values into the ThreadLocal, and separately holding a hard reference to each stored value. When you call Dispose(), these hard references are cleared and then GC is freely able to reclaim the space used by objects stored in it.
You should not call Dispose() until all threads are done using the instance.
@lucene.internal
DocIdBitSet
Simple DocIdSet and DocIdSetIterator backed by a
DoubleBarrelLRUCache
LUCENENET specific class to nest the DoubleBarrelLRUCache.CloneableKey so it can be accessed without referencing the generic closing types of DoubleBarrelLRUCache<TKey, TValue>.
DoubleBarrelLRUCache.CloneableKey
Object providing clone(); the key class must subclass this.
DoubleBarrelLRUCache<TKey, TValue>
Simple concurrent LRU cache, using a "double barrel" approach where two ConcurrentHashMaps record entries.
At any given time, one hash is primary and the other is secondary. Get(TKey) first checks primary, and if that's a miss, checks secondary. If secondary has the entry, it's promoted to primary (NOTE: the key is cloned at this point). Once primary is full, the secondary is cleared and the two are swapped.
This is not as space efficient as other possible concurrent approaches (see LUCENE-2075): to achieve perfect LRU(N) it requires 2*N storage. But, this approach is relatively simple and seems in practice to not grow unbounded in size when under hideously high load.
@lucene.internal
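The double-barrel scheme can be sketched with two ConcurrentDictionary instances and a flag that says which one is primary. This is a hypothetical illustration, not DoubleBarrelLRUCache itself: the class and member names are assumptions, and it skips the key cloning that the real class performs on promotion.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public sealed class DoubleBarrelCacheSketch<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> cache1 = new ConcurrentDictionary<TKey, TValue>();
    private readonly ConcurrentDictionary<TKey, TValue> cache2 = new ConcurrentDictionary<TKey, TValue>();
    private readonly int maxSize;
    private volatile bool swapped;   // false: cache1 is primary; true: cache2 is primary
    private int countdown;           // puts left before primary counts as full

    public DoubleBarrelCacheSketch(int maxSize)
    {
        this.maxSize = maxSize;
        countdown = maxSize;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        bool s = swapped;
        var primary = s ? cache2 : cache1;
        var secondary = s ? cache1 : cache2;
        if (primary.TryGetValue(key, out value)) return true;
        if (secondary.TryGetValue(key, out value))
        {
            Put(key, value);         // promote to primary (the real class clones the key here)
            return true;
        }
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        bool s = swapped;
        var primary = s ? cache2 : cache1;
        var secondary = s ? cache1 : cache2;
        primary[key] = value;
        if (Interlocked.Decrement(ref countdown) == 0)
        {
            secondary.Clear();                            // 1. drop the older generation
            swapped = !s;                                 // 2. old primary becomes secondary
            Interlocked.Exchange(ref countdown, maxSize); // 3. restart the count
        }
    }
}
```

Only the thread whose decrement reaches exactly zero performs the swap, so no extra lock is needed; entries that were not touched during a whole generation fall out when their dictionary is cleared.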
English
Converts numbers to English strings for testing. @lucene.internal
ExcludeServiceAttribute
Base class for Attribute types that exclude services from Reflection scanning.
FailOnNonBulkMergesInfoStream
Hackidy-Häck-Hack to cause a test to fail on non-bulk merges
FailureMarker
A
FieldCacheSanityChecker
Provides methods for sanity checking that entries in the FieldCache are not wasteful or inconsistent.
Lucene 2.9 introduced numerous enhancements into how the FieldCache is used by the low levels of Lucene searching (for Sorting and ValueSourceQueries) to improve both the speed for Sorting, as well as reopening of IndexReaders. But these changes have shifted the usage of FieldCache from "top level" IndexReaders (frequently a MultiReader or DirectoryReader) down to the leaf level SegmentReaders. As a result, existing applications that directly access the FieldCache may find RAM usage increase significantly when upgrading to 2.9 or later. This class provides an API for these applications (or their unit tests) to check at run time if the FieldCache contains "insane" usages of the FieldCache.
@lucene.experimental
FieldCacheSanityChecker.Insanity
Simple container for a collection of related FieldCache.CacheEntry objects that in conjunction with each other represent some "insane" usage of the IFieldCache.
FieldCacheSanityChecker.InsanityType
An enumeration of the different types of "insane" behavior that may be detected in an IFieldCache.
FilterIterator<T>
An
FixedBitSet
BitSet of fixed length (numBits), backed by accessible (GetBits()) long[], accessed with an int index, implementing GetBits() and DocIdSet. If you need to manage more than 2.1B bits, use Int64BitSet.
@lucene.internal
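The word layout behind a fixed-length bit set backed by long[] can be sketched as follows: bit i lives in word i >> 6 at position i & 63, cardinality is a per-word population count, and the next set bit is found with a trailing-zero count. This is a hypothetical illustration (class and member names are assumptions, not FixedBitSet's API), and it uses System.Numerics.BitOperations, which requires .NET Core 3.0 or later.

```csharp
using System;
using System.Numerics;

public sealed class FixedBitsSketch
{
    private readonly long[] words;   // bit i lives in words[i >> 6], at position i & 63
    public int Length { get; }

    public FixedBitsSketch(int numBits)
    {
        if (numBits < 0) throw new ArgumentOutOfRangeException(nameof(numBits));
        Length = numBits;
        words = new long[(int)(((long)numBits + 63) >> 6)]; // widen first so int.MaxValue bits cannot overflow
    }

    // C# masks a long shift count to its low 6 bits, so 1L << index == 1L << (index & 63).
    public bool Get(int index) => (words[WordOf(index)] & (1L << index)) != 0;
    public void Set(int index) => words[WordOf(index)] |= 1L << index;
    public void Clear(int index) => words[WordOf(index)] &= ~(1L << index);

    public int Cardinality()
    {
        int count = 0;
        foreach (long w in words) count += BitOperations.PopCount(unchecked((ulong)w));
        return count;
    }

    // Index of the first set bit at or after index, or -1 if there is none.
    public int NextSetBit(int index)
    {
        if (index >= Length) return -1;
        int i = WordOf(index);
        ulong word = unchecked((ulong)words[i]) >> (index & 63); // unsigned shift drops bits below index
        if (word != 0) return index + BitOperations.TrailingZeroCount(word);
        while (++i < words.Length)
            if (words[i] != 0) return (i << 6) + BitOperations.TrailingZeroCount(unchecked((ulong)words[i]));
        return -1;
    }

    private int WordOf(int index)
    {
        if ((uint)index >= (uint)Length) throw new ArgumentOutOfRangeException(nameof(index));
        return index >> 6;
    }
}
```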
FixedBitSet.FixedBitSetIterator
A DocIdSetIterator which iterates over set bits in a FixedBitSet.
GrowableByteArrayDataOutput
A DataOutput that can be used to build a byte[].
@lucene.internal
IndexableBinaryStringTools
Provides support for converting byte sequences to
The
Although unset bits are used as padding in the final char, the original byte sequence could contain trailing bytes with no set bits (null bytes): padding is indistinguishable from valid information. To overcome this problem, a char is appended, indicating the number of encoded bytes in the final content char.
@lucene.experimental
InfoStream
Debugging API for Lucene classes such as IndexWriter and SegmentInfos.
NOTE: Enabling infostreams may cause performance degradation in some components.
@lucene.internal
InPlaceMergeSorter
Sorter implementation based on the merge-sort algorithm that merges in place (no extra memory will be allocated). Small arrays are sorted with insertion sort.
@lucene.internal
Int32BlockPool
A pool for int blocks.
NOTE: This was IntBlockPool in Lucene
@lucene.internal
Int32BlockPool.Allocator
Abstract class for allocating and freeing int blocks.
Int32BlockPool.DirectAllocator
A simple Int32BlockPool.Allocator that never recycles.
Int32BlockPool.SliceReader
An Int32BlockPool.SliceReader that can read int slices written by an Int32BlockPool.SliceWriter.
@lucene.internal
Int32BlockPool.SliceWriter
An Int32BlockPool.SliceWriter that allows writing multiple integer slices into a given Int32BlockPool.
@lucene.internal
Int32sRef
Represents int[], as a slice (offset + length) into an existing int[]. The Int32s member should never be null; use EMPTY_INT32S if necessary.
NOTE: This was IntsRef in Lucene
@lucene.internal
Int64BitSet
BitSet of fixed length (numBits), backed by accessible (GetBits()) long[], accessed with a long index.
NOTE: This was LongBitSet in Lucene
@lucene.internal
Int64sRef
Represents long[], as a slice (offset + length) into an existing long[]. The Int64s member should never be null; use EMPTY_INT64S if necessary.
NOTE: This was LongsRef in Lucene
@lucene.internal
Int64Values
Abstraction over an array of longs.
NOTE: This was LongValues in Lucene
@lucene.internal
IntroSorter
Sorter implementation based on a variant of the quicksort algorithm called introsort: when the recursion level exceeds the log of the length of the array to sort, it falls back to heapsort. This prevents quicksort from running into its worst-case quadratic runtime. Small arrays are sorted with insertion sort.
@lucene.internal
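The introsort strategy can be sketched on a plain int[]: quicksort with a median-of-three pivot, a depth budget that switches the current range to heapsort once exhausted, and insertion sort for short ranges. The depth budget of 2 * log2(n), the cutoff of 16 elements, and all names are assumptions for illustration; Lucene.Net's IntroSorter works through abstract compare/swap methods rather than on arrays. The optional maxDepth parameter exists only so the heapsort fallback can be exercised directly.

```csharp
public static class IntroSortSketch
{
    private const int InsertionThreshold = 16;   // assumed cutoff for "small" ranges

    public static void Sort(int[] a, int? maxDepth = null)
    {
        if (a.Length < 2) return;
        SortRange(a, 0, a.Length - 1, maxDepth ?? 2 * Log2(a.Length));
    }

    private static int Log2(int n) { int log = 0; while ((n >>= 1) != 0) log++; return log; }

    private static void SortRange(int[] a, int lo, int hi, int depth)   // hi is inclusive
    {
        while (hi - lo + 1 > InsertionThreshold)
        {
            if (depth-- == 0) { HeapSort(a, lo, hi); return; }   // too deep: avoid quadratic quicksort
            int p = Partition(a, lo, hi);
            // Recurse into the smaller side and loop on the larger one.
            if (p - lo < hi - p) { SortRange(a, lo, p - 1, depth); lo = p + 1; }
            else { SortRange(a, p + 1, hi, depth); hi = p - 1; }
        }
        InsertionSort(a, lo, hi);
    }

    private static int Partition(int[] a, int lo, int hi)
    {
        int mid = lo + ((hi - lo) >> 1);
        // Median-of-three: order a[lo] <= a[mid] <= a[hi], then use a[mid] as the pivot.
        if (a[mid] < a[lo]) Swap(a, mid, lo);
        if (a[hi] < a[lo]) Swap(a, hi, lo);
        if (a[hi] < a[mid]) Swap(a, hi, mid);
        Swap(a, mid, hi);                                  // move the pivot to the end (Lomuto)
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot) Swap(a, i, store++);
        Swap(a, store, hi);
        return store;
    }

    private static void HeapSort(int[] a, int lo, int hi)
    {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n);
        for (int end = n - 1; end > 0; end--)
        {
            Swap(a, lo, lo + end);                         // move the current max behind the heap
            SiftDown(a, lo, 0, end);
        }
    }

    private static void SiftDown(int[] a, int lo, int i, int n)  // max-heap over a[lo .. lo + n)
    {
        while (true)
        {
            int child = 2 * i + 1;
            if (child >= n) return;
            if (child + 1 < n && a[lo + child + 1] > a[lo + child]) child++;
            if (a[lo + i] >= a[lo + child]) return;
            Swap(a, lo + i, lo + child);
            i = child;
        }
    }

    private static void InsertionSort(int[] a, int lo, int hi)
    {
        for (int i = lo + 1; i <= hi; i++)
        {
            int v = a[i], j = i - 1;
            while (j >= lo && a[j] > v) { a[j + 1] = a[j]; j--; }
            a[j + 1] = v;
        }
    }

    private static void Swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}
```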
IOUtils
This class emulates the new Java 7 "Try-With-Resources" statement. Remove once Lucene is on Java 7.
@lucene.internal
LineFileDocs
Minimal port of benchmark's LineDocSource + DocMaker, so tests can enum docs from a line file created by benchmark's WriteLineDoc task
LuceneTestCase
LuceneTestCase.ConcurrentMergeSchedulerFactories
Contains a list of all the Func<IConcurrentMergeScheduler> to be tested. A delegate method allows them to be created on their target thread instead of the test thread and also ensures a separate instance is created in each case (which can affect the result of the test).
LUCENENET specific
LuceneTestCase.SuppressCodecsAttribute
Annotation for test classes that should avoid certain codec types (because they are expensive, for example).
LuceneTestCase.SuppressTempFileChecks
LuceneVersionExtensions
Extension methods to the LuceneVersion enumeration to provide version comparison and parsing functionality.
MapOfSets<TKey, TValue>
Helper class for keeping Lists of Objects associated with keys. WARNING: This class is not thread-safe.
@lucene.internal
MathUtil
Math static utility methods.
MergedIterator<T>
Provides a merged sorted view from several sorted iterators.
If built with removeDuplicates set to true and an element appears in multiple iterators, then it is deduplicated, that is, this iterator returns the sorted union of elements.
If built with removeDuplicates set to false, then all elements in all iterators are returned.
Caveats:
- The behavior is undefined if the iterators are not actually sorted.
- Null elements are unsupported.
- If removeDuplicates is set to true and a single iterator contains duplicates, then they will not be deduplicated.
- When elements are deduplicated, it is not defined which one is returned.
- If removeDuplicates is set to false, then the order in which duplicates are returned isn't defined.
@lucene.internal
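The merge semantics above, including the caveat that duplicates inside a single source survive deduplication, can be sketched as a lazy k-way merge over IEnumerable<T>. This is a hypothetical illustration, not MergedIterator<T>: it finds the smallest head with a linear scan where a real implementation would more likely use a priority queue, and the names are assumptions.

```csharp
using System.Collections.Generic;

public static class MergeSketch
{
    // Lazily merges already-sorted sequences into one sorted sequence.
    public static IEnumerable<T> Merge<T>(bool removeDuplicates, params IEnumerable<T>[] sources)
    {
        var comparer = Comparer<T>.Default;
        var heads = new List<IEnumerator<T>>();
        try
        {
            foreach (var source in sources)
            {
                var e = source.GetEnumerator();
                if (e.MoveNext()) heads.Add(e); else e.Dispose();   // keep non-empty sources only
            }
            while (heads.Count > 0)
            {
                // Linear scan for the smallest current element.
                T min = heads[0].Current;
                for (int i = 1; i < heads.Count; i++)
                    if (comparer.Compare(heads[i].Current, min) < 0) min = heads[i].Current;
                yield return min;

                // Advance one source positioned on min, or every such source when
                // deduplicating; each source moves at most one step per round, so
                // duplicates within a single source are kept.
                for (int i = 0; i < heads.Count; i++)
                {
                    if (comparer.Compare(heads[i].Current, min) != 0) continue;
                    if (!heads[i].MoveNext()) { heads[i].Dispose(); heads.RemoveAt(i); i--; }
                    if (!removeDuplicates) break;
                }
            }
        }
        finally
        {
            foreach (var h in heads) h.Dispose();
        }
    }
}
```

For example, merging {1, 3, 5} with {1, 2, 5, 5} yields 1, 2, 3, 5, 5 with removeDuplicates set to true, and 1, 1, 2, 3, 5, 5, 5 with it set to false.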
NamedServiceFactory<TService>
LUCENENET specific abstract class containing common functionality for named service factories.
NullInfoStream
Prints nothing. Just to make sure tests pass with and without enabled InfoStream without actually making noise. @lucene.experimental
NumericUtils
This is a helper class to generate prefix-encoded representations for numerical values and supplies converters to represent float/double values as sortable integers/longs.
To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
This class generates terms to achieve this: First the numerical integer values need to be converted to bytes. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits per char. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding.
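The encoding steps above can be sketched as follows. This is a hypothetical illustration, not the NumericUtils byte format: the class name, storing the raw shift in the first byte, and the exact grouping are assumptions made to show why the output sorts like the input. Flipping the sign bit makes signed order agree with unsigned order, and emitting fixed-length, big-endian 7-bit groups keeps byte-wise comparison equal to numeric comparison for a given shift.

```csharp
using System;

public static class PrefixCodedSketch
{
    // Encodes the value with its lowest `shift` bits removed: byte 0 holds the
    // shift, the rest holds the remaining bits as big-endian 7-bit groups.
    public static byte[] Encode(long value, int shift)
    {
        if (shift < 0 || shift > 63) throw new ArgumentOutOfRangeException(nameof(shift));
        ulong sortable = unchecked((ulong)value) ^ 0x8000000000000000UL; // signed order -> unsigned order
        sortable >>= shift;                       // drop the low-precision bits
        int nChars = (64 - shift + 6) / 7;        // 7 payload bits per byte
        var result = new byte[1 + nChars];
        result[0] = (byte)shift;
        for (int i = nChars; i >= 1; i--)
        {
            result[i] = (byte)(sortable & 0x7F);  // high bit stays 0, so every byte is ASCII
            sortable >>= 7;
        }
        return result;
    }

    // Unsigned lexicographic comparison, as a byte-wise term sort would use.
    public static int CompareBytes(byte[] a, byte[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
            if (a[i] != b[i]) return a[i].CompareTo(b[i]);
        return a.Length.CompareTo(b.Length);
    }
}
```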
To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: DoubleToSortableInt64(Double), SingleToSortableInt32(Single). You will have no precision loss by converting floating point numbers to integers and back (only that the integer form is not usable). Other data types like dates can easily be converted to longs or ints.
For easy usage, the trie algorithm is implemented for indexing inside NumericTokenStream that can index int, long, float, and double values.
This class can also be used, to generate lexicographically sortable (according to UTF8SortedAsUTF16Comparer) representations of numeric data types for other usages (e.g. sorting).
@lucene.internal @since 2.9, API changed non backwards-compliant in 4.0
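The floating-point conversion can be sketched with the usual bit trick for this kind of mapping: non-negative doubles already order correctly as raw IEEE 754 bits, while negative ones order in reverse, so flipping every bit except the sign bit of negative values restores the order. The transform is its own inverse, which is why no precision is lost. The sketch uses only System.BitConverter; the class and method names are hypothetical.

```csharp
using System;

public static class SortableDoubleSketch
{
    public static long ToSortable(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        // Negative doubles sort in reverse bit order; flipping all non-sign bits fixes that.
        if (bits < 0) bits ^= 0x7fffffffffffffffL;
        return bits;
    }

    // The sign bit is untouched, so the same test undoes the transform exactly.
    public static double FromSortable(long sortable)
    {
        if (sortable < 0) sortable ^= 0x7fffffffffffffffL;
        return BitConverter.Int64BitsToDouble(sortable);
    }
}
```

Note that -0.0 maps to a smaller value than 0.0, so the two zeros remain distinct after conversion.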
NumericUtils.Int32RangeBuilder
Callback for SplitInt32Range(NumericUtils.Int32RangeBuilder, Int32, Int32, Int32). You need to override only one of the methods.
NOTE: This was IntRangeBuilder in Lucene
@lucene.internal @since 2.9, API changed non backwards-compliant in 4.0
NumericUtils.Int64RangeBuilder
Callback for SplitInt64Range(NumericUtils.Int64RangeBuilder, Int32, Int64, Int64). You need to override only one of the methods.
NOTE: This was LongRangeBuilder in Lucene
@lucene.internal @since 2.9, API changed non backwards-compliant in 4.0
OfflineSorter
On-disk sorting of byte arrays. Each byte array (entry) is composed of the following fields:
- (two bytes) length of the following byte array,
- exactly the above count of bytes for the sequence to be sorted.
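The entry layout can be sketched with a writer and reader over any Stream. The big-endian byte order of the two-byte length, the signed 16-bit length limit, and the class and method names are assumptions for illustration, not the exact ByteSequencesWriter/ByteSequencesReader implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LengthPrefixedSketch
{
    // Writes one entry: a two-byte length (big-endian, assumed) followed by the bytes.
    public static void Write(Stream output, byte[] entry)
    {
        // Assumed limit: keep the length within a signed 16-bit value.
        if (entry.Length > short.MaxValue) throw new ArgumentException("Entry too long for a two-byte length.");
        output.WriteByte((byte)(entry.Length >> 8));
        output.WriteByte((byte)entry.Length);
        output.Write(entry, 0, entry.Length);
    }

    // Reads entries until the end of the stream.
    public static List<byte[]> ReadAll(Stream input)
    {
        var entries = new List<byte[]>();
        int hi;
        while ((hi = input.ReadByte()) != -1)
        {
            int lo = input.ReadByte();
            if (lo == -1) throw new EndOfStreamException("Truncated length prefix.");
            var entry = new byte[(hi << 8) | lo];
            int read = 0;
            while (read < entry.Length)            // Stream.Read may return fewer bytes than asked
            {
                int n = input.Read(entry, read, entry.Length - read);
                if (n == 0) throw new EndOfStreamException("Truncated entry.");
                read += n;
            }
            entries.Add(entry);
        }
        return entries;
    }
}
```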
OfflineSorter.BufferSize
A bit more descriptive unit for constructors.
OfflineSorter.ByteSequencesReader
Utility class to read length-prefixed byte[] entries from an input. Complementary to OfflineSorter.ByteSequencesWriter.
OfflineSorter.ByteSequencesWriter
Utility class to emit length-prefixed byte[] entries to an output stream for sorting. Complementary to OfflineSorter.ByteSequencesReader.
OfflineSorter.SortInfo
Sort info (debugging mostly).
OpenBitSet
An "open" BitSet implementation that allows direct access to the array of words storing the bits.
NOTE: This can be used in .NET any place where a java.util.BitSet is used in Java.
Unlike java.util.BitSet, the fact that bits are packed into an array of longs is part of the interface. This allows efficient implementation of other algorithms by someone other than the author. It also allows one to efficiently implement alternate serialization or interchange formats.
OpenBitSet is faster than java.util.BitSet in most operations and much faster at calculating cardinality of sets and results of set operations. It can also handle sets of larger cardinality (up to 64 * 2**32-1).
The goals of OpenBitSet are the fastest implementation possible, and maximum code reuse. Extra safety and encapsulation may always be built on top, but if that's built in, the cost can never be removed (and hence people re-implement their own version in order to get better performance).
Performance Results
Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M
BitSet size = 1,000,000
Results are java.util.BitSet time divided by OpenBitSet time.
| | Cardinality | IntersectionCount | Union | NextSetBit | Get | GetIterator |
|---|---|---|---|---|---|---|
| 50% full | 3.36 | 3.96 | 1.44 | 1.46 | 1.99 | 1.58 |
| 1% full | 3.31 | 3.90 | | 1.04 | | 0.99 |
Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M
BitSet size = 1,000,000
Results are java.util.BitSet time divided by OpenBitSet time.
| | Cardinality | IntersectionCount | Union | NextSetBit | Get | GetIterator |
|---|---|---|---|---|---|---|
| 50% full | 2.50 | 3.50 | 1.00 | 1.03 | 1.12 | 1.25 |
| 1% full | 2.51 | 3.49 | | 1.00 | | 1.02 |
OpenBitSetDISI
OpenBitSet with added methods to bulk-update the bits from a DocIdSetIterator. (DISI stands for DocIdSetIterator).
OpenBitSetIterator
An iterator to iterate over set bits in an OpenBitSet. This is faster than NextSetBit(Int64) for iterating over the complete set of bits, especially when the density of the bits set is high.
PagedBytes
Represents a logical byte[] as a series of pages. You can write-once into the logical byte[] (append only), using copy, and then retrieve slices (BytesRef) into it using fill.
@lucene.internal
PagedBytes.PagedBytesDataInput
PagedBytes.PagedBytesDataOutput
PagedBytes.Reader
Provides methods to read BytesRefs from a frozen PagedBytes.
Paths
The static accessor class for file paths used in testing.
PForDeltaDocIdSet
DocIdSet implementation based on pfor-delta encoding.
This implementation is inspired from LinkedIn's Kamikaze (http://data.linkedin.com/opensource/kamikaze) and Daniel Lemire's JavaFastPFOR (https://github.com/lemire/JavaFastPFOR).
Contrary to the original PFOR paper, exceptions are encoded with FOR instead of Simple16.
PForDeltaDocIdSet.Builder
A builder for PForDeltaDocIdSet.
PrintStreamInfoStream
LUCENENET specific stub to assist with migration to TextWriterInfoStream.
PriorityQueue<T>
A PriorityQueue<T> maintains a partial ordering of its elements such that the element with least priority can always be found in constant time. Put()'s and Pop()'s require log(size) time.
NOTE: This class will pre-allocate a full array of length maxSize+1 if instantiated via the PriorityQueue(Int32, Boolean) constructor with prepopulate set to true. That maximum size can grow as we insert elements over time.
@lucene.internal
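The complexity claims above follow from a binary heap stored in an array, which can be sketched as follows. This hypothetical sketch keeps a fixed capacity and a 1-based array of length maxSize + 1 (slot 0 unused); it takes a Comparison<T> instead of overriding a less-than method, and its names are assumptions, not PriorityQueue<T>'s API.

```csharp
using System;

public sealed class MinHeapSketch<T>
{
    private readonly T[] heap;                // 1-based: heap[1] is the least element
    private readonly Comparison<T> compare;
    public int Count { get; private set; }

    public MinHeapSketch(int maxSize, Comparison<T> compare)
    {
        heap = new T[maxSize + 1];            // slot 0 is unused, hence maxSize + 1
        this.compare = compare;
    }

    // Constant time: the least element always sits at the root.
    public T Top => Count > 0 ? heap[1] : throw new InvalidOperationException("Queue is empty.");

    // O(log size): append at the end, then sift up toward the root.
    public void Add(T element)
    {
        if (Count == heap.Length - 1) throw new InvalidOperationException("Queue is full.");
        int i = ++Count;
        heap[i] = element;
        while (i > 1 && compare(heap[i], heap[i >> 1]) < 0)
        {
            (heap[i], heap[i >> 1]) = (heap[i >> 1], heap[i]);
            i >>= 1;
        }
    }

    // O(log size): move the last element to the root, then sift down.
    public T Pop()
    {
        T result = Top;
        heap[1] = heap[Count];
        heap[Count--] = default(T);
        int i = 1;
        while (true)
        {
            int child = i << 1;
            if (child > Count) break;
            if (child < Count && compare(heap[child + 1], heap[child]) < 0) child++;
            if (compare(heap[child], heap[i]) >= 0) break;
            (heap[i], heap[child]) = (heap[child], heap[i]);
            i = child;
        }
        return result;
    }
}
```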
QueryBuilder
Creates queries from the Analyzer chain.
Example usage:
QueryBuilder builder = new QueryBuilder(analyzer);
Query a = builder.CreateBooleanQuery("body", "just a test");
Query b = builder.CreatePhraseQuery("body", "another test");
Query c = builder.CreateMinShouldMatchQuery("body", "another test", 0.5f);
This can also be used as a subclass for query parsers to make it easier to interact with the analysis chain. Factory methods such as NewTermQuery(Term) are provided so that the generated queries can be customized.
QuickPatchThreadsFilter
Last minute patches. TODO: remove when integrated in system filters in rr.
RamUsageEstimator
Estimates the size (memory representation) of .NET objects.
@lucene.internal
RecyclingByteBlockAllocator
A ByteBlockPool.Allocator implementation that recycles unused byte blocks in a buffer and reuses them in subsequent calls to GetByteBlock().
Note: this class is not thread-safe.
@lucene.internal
RecyclingInt32BlockAllocator
An Int32BlockPool.Allocator implementation that recycles unused int blocks in a buffer and reuses them in subsequent allocations.
Note: this class is not thread-safe.
NOTE: This was RecyclingIntBlockAllocator in Lucene
@lucene.internal
RefCount<T>
Manages reference counting for a given object. Extensions can override Release() to do custom logic when reference counting hits 0.
RollingBuffer
LUCENENET specific class to allow referencing static members of RollingBuffer<T> without referencing its generic closing type.
RollingBuffer<T>
Acts like forever growing T[], but internally uses a circular buffer to reuse instances of T.
@lucene.internal
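The circular-buffer idea can be sketched as below: callers address ever-increasing absolute positions, the buffer maps each live position to a slot, freed slots are reset and reused, and the array doubles only when every slot is live. The interface IResettableSketch, the sample type SlotSketch, the initial capacity of 8, and all member names are assumptions for illustration, not RollingBuffer<T>'s API.

```csharp
using System;

// Hypothetical stand-in for an "implement to reset an instance" contract.
public interface IResettableSketch
{
    void Reset();
}

// Sample element type, used only to demonstrate the buffer.
public sealed class SlotSketch : IResettableSketch
{
    public int Value;
    public void Reset() => Value = 0;
}

public sealed class RollingSketch<T> where T : class, IResettableSketch, new()
{
    private T[] buffer = Fill(new T[8], 0);
    private int nextWrite;   // buffer index that will hold position nextPos
    private int nextPos;     // first absolute position not handed out yet
    private int count;       // live positions are nextPos - count .. nextPos - 1

    private static T[] Fill(T[] array, int from)
    {
        for (int i = from; i < array.Length; i++) array[i] = new T();
        return array;
    }

    private int IndexOf(int pos) => (nextWrite - (nextPos - pos) + buffer.Length) % buffer.Length;

    // Returns the instance for an absolute position, extending the live window as needed.
    public T Get(int pos)
    {
        if (pos < nextPos - count) throw new ArgumentOutOfRangeException(nameof(pos), "Position was already freed.");
        while (pos >= nextPos)
        {
            if (count == buffer.Length) Grow();
            nextWrite = (nextWrite + 1) % buffer.Length;
            nextPos++;
            count++;
        }
        return buffer[IndexOf(pos)];
    }

    // Resets every live position before pos so its instance can be reused.
    public void FreeBefore(int pos)
    {
        int oldest = nextPos - count;
        if (pos < oldest || pos > nextPos) throw new ArgumentOutOfRangeException(nameof(pos));
        for (int p = oldest; p < pos; p++) buffer[IndexOf(p)].Reset();
        count -= pos - oldest;
    }

    private void Grow()
    {
        // Only called when full, so the oldest live slot is nextWrite; copy oldest-first.
        var bigger = new T[buffer.Length * 2];
        for (int k = 0; k < count; k++) bigger[k] = buffer[(nextWrite + k) % buffer.Length];
        buffer = Fill(bigger, count);
        nextWrite = count;
    }
}
```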
RunListenerPrintReproduceInfo
A suite listener printing a "reproduce string". This ensures test result events are always captured properly even if exceptions happen at initialization or at the suite/hooks level.
SentinelInt32Set
A native int hash-based set where one value is reserved to mean "EMPTY" internally.
To iterate over the integers held in this set, simply use code like this:
SentinelInt32Set set = ...;
foreach (int v in set.Keys)
{
    if (v == set.EmptyVal)
        continue;
    // use v...
}
NOTE: This was SentinelIntSet in Lucene
@lucene.internal
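The sentinel idea can be sketched as an open-addressing set over a power-of-two int[] in which one caller-chosen value marks unused slots; that is what makes the iteration pattern of skipping every slot equal to EmptyVal work. Linear probing, the 75% load-factor threshold, and the class name are assumptions for illustration, not SentinelInt32Set's actual implementation.

```csharp
using System;

public sealed class SentinelSetSketch
{
    public int[] Keys { get; private set; }   // power-of-two length; EmptyVal marks unused slots
    public int EmptyVal { get; }
    public int Count { get; private set; }

    public SentinelSetSketch(int emptyVal, int initialSize = 16)
    {
        EmptyVal = emptyVal;
        int size = 2;
        while (size < initialSize) size <<= 1;   // round up to a power of two
        Keys = NewTable(size);
    }

    public bool Contains(int key) => key != EmptyVal && Keys[Slot(Keys, key)] == key;

    // Returns true if the key was added, false if it was already present.
    public bool Put(int key)
    {
        if (key == EmptyVal) throw new ArgumentException("The sentinel value cannot be stored.", nameof(key));
        int slot = Slot(Keys, key);
        if (Keys[slot] == key) return false;
        if ((Count + 1) * 4 > Keys.Length * 3)   // assumed 75% load-factor threshold
        {
            Rehash();
            slot = Slot(Keys, key);
        }
        Keys[slot] = key;
        Count++;
        return true;
    }

    // Linear probing from key & mask until the key or an empty slot is found.
    private int Slot(int[] table, int key)
    {
        int mask = table.Length - 1;
        int s = key & mask;
        while (table[s] != EmptyVal && table[s] != key) s = (s + 1) & mask;
        return s;
    }

    private int[] NewTable(int size)
    {
        var table = new int[size];
        for (int i = 0; i < size; i++) table[i] = EmptyVal;
        return table;
    }

    private void Rehash()
    {
        int[] table = NewTable(Keys.Length * 2);
        foreach (int k in Keys)
            if (k != EmptyVal) table[Slot(table, k)] = k;
        Keys = table;
    }
}
```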
ServiceNameAttribute
LUCENENET specific abstract class for
SetOnce<T>
A convenient class which offers a semi-immutable object wrapper implementation which allows one to set the value of an object exactly once, and retrieve it many times. If Set(T) is called more than once, AlreadySetException is thrown and the operation will fail.
@lucene.experimental
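The set-exactly-once contract can be sketched with an atomic compare-and-swap on a flag, so that only the first caller wins even under concurrency. This is a hypothetical illustration: the class names are assumptions, AlreadySetSketchException merely stands in for AlreadySetException, and the reference-type constraint comes from using Volatile.Read/Write.

```csharp
using System;
using System.Threading;

// Hypothetical exception type standing in for AlreadySetException.
public sealed class AlreadySetSketchException : InvalidOperationException
{
    public AlreadySetSketchException() : base("The object cannot be set twice!") { }
}

public sealed class SetOnceSketch<T> where T : class
{
    private T value;
    private int isSet;   // 0 = not set yet, 1 = set; flipped atomically

    // Stores the value the first time; every later call throws.
    public void Set(T newValue)
    {
        if (Interlocked.CompareExchange(ref isSet, 1, 0) != 0)
            throw new AlreadySetSketchException();
        Volatile.Write(ref value, newValue);
    }

    // Returns the stored value, or null if Set has not been called yet.
    public T Get() => Volatile.Read(ref value);
}
```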
SloppyMath
Math functions that trade off accuracy for speed.
SmallSingle
Floating point numbers smaller than 32 bits.
NOTE: This was SmallFloat in Lucene
@lucene.internal
Sorter
Base class for sorting algorithms implementations.
@lucene.internal
SPIClassIterator<S>
Helper class for loading SPI classes from classpath (META-INF files).
This is a light impl of java.util.ServiceLoader but is guaranteed to be bug-free regarding classpath order and does not instantiate or initialize the classes found.
@lucene.internal
StackTraceHelper
StringHelper
Methods for manipulating strings.
@lucene.internal
TestRuleAssertionsRequired
Require assertions for Lucene/Solr packages.
TestRuleFieldCacheSanity
TestRuleIgnoreAfterMaxFailures
TestRuleIgnoreTestSuites
TestRuleMarkFailure
A rule for marking failed tests and suites.
TestRuleStoreClassName
Stores the suite name so you can retrieve it
from
TestSecurityManager
TestUtil
General utility methods for Lucene unit tests.
TextWriterInfoStream
InfoStream implementation over a TextWriter.
NOTE: This is analogous to PrintStreamInfoStream in Lucene.
@lucene.internal
ThrottledIndexOutput
Intentionally slow IndexOutput for testing.
TimeUnits
Time unit constants for use in annotations.
TimSorter
Sorter implementation based on the TimSort algorithm.
This implementation is especially good at sorting partially-sorted arrays and sorts small arrays with binary sort.
NOTE: There are a few differences with the original implementation:
- The extra amount of memory to perform merges is configurable. This allows small merges to be very fast while large merges will be performed in-place (slightly slower). You can make sure that the fast merge routine will always be used by having maxTempSlots equal to half of the length of the slice of data to sort.
- Only the fast merge routine can gallop (the one that doesn't run in-place) and it only gallops on the longest slice.
@lucene.internal
ToStringUtils
Helper methods to ease implementing ToString().
UnicodeUtil
Class to encode .NET's UTF16 char[] into UTF8 byte[] without always allocating a new byte[] as Encoding.UTF8.GetBytes(string) does.
@lucene.internal
VirtualMethod
A utility for keeping backwards compatibility on previously abstract methods (or similar replacements).
Before the replacement method can be made abstract, the old method must be kept deprecated. If somebody still overrides the deprecated method in a non-sealed class, you must keep track of this and maybe delegate to the old method in the subclass. The cost of reflection is minimized by the following usage of this class:
Define static readonly fields in the base class (BaseClass), where the old and new method are declared:
static readonly VirtualMethod newMethod = new VirtualMethod(typeof(BaseClass), "newName", parameters...);
static readonly VirtualMethod oldMethod = new VirtualMethod(typeof(BaseClass), "oldName", parameters...);
This enforces the singleton status of these objects, as the maintenance of the cache would otherwise be too costly.
If you try to create a second instance for the same method/baseClass combination, an exception is thrown.
To detect if e.g. the old method was overridden by a subclass further down the inheritance path to the current instance's class, use a non-static field (assigned in the constructor, because C# field initializers cannot reference this):
private readonly bool isDeprecatedMethodOverridden;
// in the constructor:
isDeprecatedMethodOverridden = oldMethod.GetImplementationDistance(this.GetType()) > newMethod.GetImplementationDistance(this.GetType());
// alternatively (more readable):
isDeprecatedMethodOverridden = VirtualMethod.CompareImplementationDistance(this.GetType(), oldMethod, newMethod) > 0;
GetImplementationDistance(Type) returns the distance of the subclass that overrides this method. The one with the larger distance should preferably be used. This way, more complicated method rename scenarios can also be handled (think of 2.9 TokenStream deprecations).
@lucene.internal
WAH8DocIdSet
DocIdSet implementation based on word-aligned hybrid encoding on words of 8 bits.
This implementation doesn't support random-access but has a fast DocIdSetIterator which can advance in logarithmic time thanks to an index.
The compression scheme is simplistic and should work well with sparse and very dense doc id sets while being only slightly larger than a FixedBitSet for incompressible sets (overhead<2% in the worst case) in spite of the index.
Format: The format is byte-aligned. An 8-bit word is either clean, meaning composed only of zeros or ones, or dirty, meaning that it contains between 1 and 7 bits set. The idea is to encode sequences of clean words using run-length encoding and to leave sequences of dirty words as-is.
| Token | Clean length+ | Dirty length+ | Dirty words |
|---|---|---|---|
| 1 byte | 0-n bytes | 0-n bytes | 0-n bytes |
- Token encodes whether clean means full of zeros or ones in the first bit, the number of clean words minus 2 on the next 3 bits and the number of dirty words on the last 4 bits. The higher-order bit is a continuation bit, meaning that the number is incomplete and needs additional bytes to be read.
- Clean length+: If clean length has its higher-order bit set, you need to read a vint (ReadVInt32()), shift it by 3 bits on the left side and add it to the 3 bits which have been read in the token.
- Dirty length+ works the same way as Clean length+ but on 4 bits and for the length of dirty words.
- Dirty words are the dirty words; there are Dirty length of them.
This format cannot encode sequences of fewer than 2 clean words and 0 dirty words. The reason is that if you find a single clean word, you should rather encode it as a dirty word. This takes the same space as starting a new sequence (since you need one byte for the token) but will be lighter to decode. There is however an exception for the first sequence. Since the first sequence may start directly with a dirty word, the clean length is encoded directly, without subtracting 2.
There is an additional restriction on the format: the sequence of dirty words is not allowed to contain two consecutive clean words. This restriction exists to make sure no space is wasted and to make sure iterators can read the next doc ID by reading at most 2 dirty words.
@lucene.experimental
WAH8DocIdSet.Builder
A builder for WAH8DocIdSets.
WAH8DocIdSet.WordBuilder
Word-based builder.
WeakIdentityMap<TKey, TValue>
Implements a combination of java.util.WeakHashMap and java.util.IdentityHashMap. Useful for caches that need to key off of a == comparison instead of a .Equals(object).
This class is not a general-purpose dictionary implementation.
This implementation was forked from Apache CXF but modified to not implement the IDictionary<TKey, TValue> interface and also to support null keys, but those are never weak!
The map supports two modes of operation:
- reapOnRead = true: This behaves identically to a java.util.WeakHashMap, where it also cleans up the reference queue on every read operation (Get(Object), ContainsKey(Object), Count, GetValueEnumerator()), freeing map entries of already GCed keys.
- reapOnRead = false: This mode does not call Reap() on every read operation. In this case, the reference queue is only cleaned up on write operations (like Put(TKey, TValue)). This is ideal for maps with few entries where the keys are unlikely to be garbage collected, but there are lots of Get(Object) operations. The code can still call Reap() to manually clean up the queue without doing a write operation.
@lucene.internal
Interfaces
IAccountable
An object whose RAM usage can be computed.
@lucene.internal
IAttribute
Base interface for attributes.
IAttributeReflector
This interface is used to reflect contents of AttributeSource or Attribute.
IBits
Interface for Bitset-like structures.
@lucene.experimental
IBytesRefIterator
A simple iterator interface for BytesRef iteration.
IMutableBits
Extension of IBits for live documents.
IServiceListable
LUCENENET specific contract that provides support for AvailableCodecs(),
AvailableDocValuesFormats(),
and AvailablePostingsFormats(). Implement this
interface in addition to ICodecFactory, IDocValuesFormatFactory,
or IPostingsFormatFactory to provide optional support for the above
methods when providing a custom implementation. If this interface is not supported by
the corresponding factory, a
RollingBuffer.IResettable
Implement to reset an instance
TestRuleIgnoreTestSuites.NestedTestSuite
Marker interface for nested suites that should be ignored if executed in stand-alone mode.
Enums
LuceneVersion
Used by certain classes to match version compatibility across releases of Lucene.
WARNING: When changing the version parameter that you supply to components in Lucene, do not simply change the version at search-time, but instead also adjust your indexing code to match, and re-index.