Namespace Lucene.Net.Util.Packed
Classes
AbstractAppendingInt64Buffer
Common functionality shared by AppendingDeltaPackedInt64Buffer and MonotonicAppendingInt64Buffer.
NOTE: This was AbstractAppendingLongBuffer in Lucene
AbstractAppendingInt64Buffer.Iterator
AbstractBlockPackedWriter
AbstractPagedMutable<T>
Base implementation for PagedMutable and PagedGrowableWriter.
@lucene.internal
AppendingDeltaPackedInt64Buffer
Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for the case where values are close to each other.
NOTE: This was AppendingDeltaPackedLongBuffer in Lucene
@lucene.internal
AppendingPackedInt64Buffer
Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for non-negative numbers with a uniform distribution over a fixed (limited) range.
NOTE: This was AppendingPackedLongBuffer in Lucene
@lucene.internal
BlockPackedReader
Provides random access to a stream written with BlockPackedWriter.
@lucene.internal
BlockPackedReaderIterator
Reader for sequences of
@lucene.internal
BlockPackedWriter
A writer for large sequences of longs.
The sequence is divided into fixed-size blocks and for each block, the difference between each value and the minimum value of the block is encoded using as few bits as possible. Memory usage of this class is proportional to the block size. Each block has an overhead between 1 and 10 bytes to store the minimum value and the number of bits per value of the block.
Format:
- <BLock>BlockCount
- BlockCount: ? ValueCount / BlockSize ?
- Block: <Header, (Ints)>
- Header: <Token, (MinValue)>
- Token: a byte (WriteByte(Byte)), first 7 bits are the
number of bits per value (
bitsPerValue
). If the 8th bit is 1, then MinValue (see next) is0
, otherwise MinValue and needs to be decoded - MinValue: a
zigzag-encoded
variable-length
(WriteVInt64(Int64)) whose value should be added to every int from the block to restore the original values - Ints: If the number of bits per value is
0
, then there is nothing to decode and all ints are equal to MinValue. Otherwise: BlockSize packed ints (PackedInt32s) encoded on exactlybitsPerValue
bits per value. They are the subtraction of the original values and MinValue
@lucene.internal
EliasFanoDecoder
A decoder for an EliasFanoEncoder.
@lucene.internal
EliasFanoDocIdSet
A DocIdSet in Elias-Fano encoding.
@lucene.internal
EliasFanoEncoder
Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.
The Elias-Fano encoding is a high bits / low bits representation of
a monotonically increasing sequence of numValues > 0
natural numbers x[i]
0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound
where upperBound > 0
is an upper bound on the last value.
The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.
The lower L
bits of each x[i]
are stored explicitly and contiguously
in the lower-bits array, with L
chosen as (Log()
base 2):
L = max(0, floor(log(upperBound/numValues)))
The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0
):
(x[i]/2L) - (x[i-1]/2L)
The unary code encodes a natural number n
by n
0 bits followed by a 1 bit:
0...01
.
In the upper bits the total the number of 1 bits is numValues
and the total number of 0 bits is:
floor(x[numValues-1]/2L) <= upperBound/(2max(0, floor(log(upperBound/numValues)))) <= 2numValues
The Elias-Fano encoding uses at most
2 + Ceil(Log(upperBound/numValues))
bits per encoded number. With upperBound
in these bounds (p
is an integer):
2p < x[numValues-1] <= upperBound <= 2*(p+1)
the number of bits per encoded number is minimized.
In this implementation the values in the sequence can be given as long
,
numValues = 0
and upperBound = 0
are allowed,
and each of the upper and lower bit arrays should fit in a long[]
.
An index of positions of zero's in the upper bits is also built.
this implementation is based on this article:
Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300 .
The articles originally describing the Elias-Fano representation are:
Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246â€"260, 1974.
Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.
@lucene.internal
GrowableWriter
Implements PackedInt32s.Mutable, but grows the bit count of the underlying packed ints on-demand.
Beware that this class will accept to set negative values but in order to do this, it will grow the number of bits per value to 64.
@lucene.internal
MonotonicAppendingInt64Buffer
Utility class to buffer signed longs in memory, which is optimized for the case where the sequence is monotonic, although it can encode any sequence of arbitrary longs. It only supports appending.
NOTE: This was MonotonicAppendingLongBuffer in Lucene.
@lucene.internal
MonotonicBlockPackedReader
Provides random access to a stream written with MonotonicBlockPackedWriter.
@lucene.internal
MonotonicBlockPackedWriter
A writer for large monotonically increasing sequences of positive
The sequence is divided into fixed-size blocks and for each block, values are modeled after a linear function f: x ? A × x + B. The block encodes deltas from the expected values computed from this function using as few bits as possible. Each block has an overhead between 6 and 14 bytes.
Format:
- <BLock>BlockCount
- BlockCount: ? ValueCount / BlockSize ?
- Block: <Header, (Ints)>
- Header: <B, A, BitsPerValue>
- B: the B from f: x ? A × x + B using a
variable-length
(WriteVInt64(Int64)) - A: the A from f: x ? A × x + B encoded using SingleToInt32Bits(Single) on 4 bytes (WriteVInt32(Int32))
- BitsPerValue: a variable-length
(WriteVInt32(Int32)) - Ints: if BitsPerValue is
0
, then there is nothing to read and all values perfectly match the result of the function. Otherwise, these are the zigzag-encoded packed (PackedInt32s) deltas from the expected value (computed from the function) using exaclty BitsPerValue bits per value
@lucene.internal
Packed64
Space optimized random access capable array of values with a fixed number of bits/value. Values are packed contiguously.
The implementation strives to perform af fast as possible under the constraint of contiguous bits, by avoiding expensive operations. This comes at the cost of code clarity.
Technical details: this implementation is a refinement of a non-branching version. The non-branching get and set methods meant that 2 or 4 atomics in the underlying array were always accessed, even for the cases where only 1 or 2 were needed. Even with caching, this had a detrimental effect on performance. Related to this issue, the old implementation used lookup tables for shifts and masks, which also proved to be a bit slower than calculating the shifts and masks on the fly. See https://issues.apache.org/jira/browse/LUCENE-4062 for details.
PackedDataInput
A DataInput wrapper to read unaligned, variable-length packed integers. This API is much slower than the PackedInt32s fixed-length API but can be convenient to save space.
@lucene.internal
PackedDataOutput
A DataOutput wrapper to write unaligned, variable-length packed integers.
@lucene.internal
PackedInt32s
Simplistic compression for array of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.
NOTE: This was PackedInts in Lucene.
@lucene.internal
PackedInt32s.Format
A format to write packed
@lucene.internal
PackedInt32s.FormatAndBits
Simple class that holds a format and a number of bits per value.
PackedInt32s.Header
Header identifying the structure of a packed integer array.
PackedInt32s.Mutable
A packed integer array that can be modified.
@lucene.internal
PackedInt32s.MutableImpl
PackedInt32s.NullReader
A PackedInt32s.Reader which has all its values equal to 0 (bitsPerValue = 0).
PackedInt32s.Reader
A read-only random access array of positive integers.
@lucene.internal
PackedInt32s.Writer
A write-once Writer.
@lucene.internal
PagedGrowableWriter
A PagedGrowableWriter. This class slices data into fixed-size blocks which have independent numbers of bits per value and grow on-demand.
You should use this class instead of the AbstractAppendingInt64Buffer related ones only when you need random write-access. Otherwise this class will likely be slower and less memory-efficient.
@lucene.internal
PagedMutable
A PagedMutable. This class slices data into fixed-size blocks which have the same number of bits per value. It can be a useful replacement for PackedInt32s.Mutable to store more than 2B values.
@lucene.internal
Interfaces
PackedInt32s.IDecoder
A decoder for packed integers.
PackedInt32s.IEncoder
An encoder for packed integers.
PackedInt32s.IReaderIterator
Run-once iterator interface, to decode previously saved PackedInt32s.