Class Builder<T>
Builds a minimal FST (maps an Int32sRef term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles).
NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
The parameterized type T is the output type. See the subclasses of Outputs<T>.
FSTs larger than 2.1 GB are possible as of Lucene 4.2. FSTs containing more than 2.1B nodes are also possible; however, they cannot be packed.
@lucene.experimental
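As a minimal sketch of the build-then-look-up flow described above, the following maps three pre-sorted ASCII terms to byte-sequence outputs. It assumes the types live in the usual Lucene.Net.Util and Lucene.Net.Util.Fst namespaces and that ByteSequenceOutputs.Singleton, Util.ToInt32sRef and Util.Get are available as in Lucene.NET 4.8; the class name FstBuildSketch and the sample data are purely illustrative.

```csharp
using System;
using Lucene.Net.Util;
using Lucene.Net.Util.Fst;
// Alias avoids any ambiguity between the Lucene.Net.Util namespace and the Fst.Util class.
using FstUtil = Lucene.Net.Util.Fst.Util;

public static class FstBuildSketch   // hypothetical name, for illustration only
{
    public static void Main()
    {
        // Terms must be supplied in sorted order. Plain ASCII keeps
        // UTF-8 byte order and ordinal string order identical.
        string[] terms = { "cat", "dog", "dogs" };
        string[] values = { "feline", "canine", "canines" };

        var outputs = ByteSequenceOutputs.Singleton;
        var builder = new Builder<BytesRef>(FST.INPUT_TYPE.BYTE1, outputs);

        // The input scratch objects can be reused across Add calls.
        var scratchBytes = new BytesRef();
        var scratchInts = new Int32sRef();
        for (int i = 0; i < terms.Length; i++)
        {
            scratchBytes.CopyChars(terms[i]);
            // A fresh BytesRef is passed as output on every call.
            builder.Add(FstUtil.ToInt32sRef(scratchBytes, scratchInts), new BytesRef(values[i]));
        }

        FST<BytesRef> fst = builder.Finish();

        // Look up a single term; Get yields null when the term is not accepted.
        BytesRef hit = FstUtil.Get(fst, new BytesRef("dog"));
        Console.WriteLine(hit == null ? "(no match)" : hit.Utf8ToString());
    }
}
```

ByteSequenceOutputs is used here only because its output type is a reference type; numeric outputs such as PositiveInt32Outputs follow the same pattern with a different type argument.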
Assembly: DistributedLucene.Net.dll
Syntax
public class Builder<T> : Builder
Type Parameters
Name | Description |
---|---|
T | The output type. |
Constructors
Name | Description |
---|---|
Builder(FST.INPUT_TYPE, Outputs<T>) | Instantiates an FST/FSA builder without any pruning. A shortcut to Builder(FST.INPUT_TYPE, Int32, Int32, Boolean, Boolean, Int32, Outputs<T>, Builder.FreezeTail<T>, Boolean, Single, Boolean, Int32) with pruning options turned off. |
Builder(FST.INPUT_TYPE, Int32, Int32, Boolean, Boolean, Int32, Outputs<T>, Builder.FreezeTail<T>, Boolean, Single, Boolean, Int32) | Instantiates an FST/FSA builder with all the possible tuning and construction tweaks. Read parameter documentation carefully. |
Properties
Name | Description |
---|---|
MappedStateCount | |
TermCount | |
TotStateCount | |
Methods
Name | Description |
---|---|
Add(Int32sRef, T) | It's OK to add the same input twice in a row with different outputs, as long as the Outputs<T> implementation provides the merge method. Note that the input is fully consumed by the time this method returns (so the caller is free to reuse it), but the output is not. So if your outputs are mutable (e.g. ByteSequenceOutputs or Int32SequenceOutputs), you cannot reuse them across calls. |
Finish() | Returns the final FST. NOTE: this returns null if the FST accepts nothing. |
GetFstSizeInBytes() | |
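The Add and Finish notes above translate into two habits for callers: reuse the input scratch objects but allocate a new output per call, and check the result of Finish() for null. A hedged sketch follows, assuming the same Lucene.NET 4.8 style API (ByteSequenceOutputs.Singleton, Util.ToInt32sRef); the helper name TryBuild and its signature are hypothetical and not part of the library.

```csharp
using System.Collections.Generic;
using Lucene.Net.Util;
using Lucene.Net.Util.Fst;
using FstUtil = Lucene.Net.Util.Fst.Util;

public static class FstContractSketch   // hypothetical helper class
{
    // Builds an FST from pairs that the caller has already sorted by key.
    // Returns false when nothing was accepted, in which case fst is null.
    public static bool TryBuild(
        IEnumerable<KeyValuePair<string, string>> sortedPairs,
        out FST<BytesRef> fst)
    {
        var builder = new Builder<BytesRef>(FST.INPUT_TYPE.BYTE1, ByteSequenceOutputs.Singleton);

        // Safe to reuse: Add has fully consumed the input once it returns.
        var scratchBytes = new BytesRef();
        var scratchInts = new Int32sRef();

        foreach (var pair in sortedPairs)
        {
            scratchBytes.CopyChars(pair.Key);
            // Not safe to reuse: BytesRef outputs are mutable and the builder
            // keeps a reference, so a new instance is created for each call.
            builder.Add(FstUtil.ToInt32sRef(scratchBytes, scratchInts), new BytesRef(pair.Value));
        }

        // Finish() returns null when no input was accepted.
        fst = builder.Finish();
        return fst != null;
    }
}
```

Returning a bool with an out parameter is just one way to surface the null case; throwing or returning the nullable FST directly would work equally well.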