Class TokenStreamToAutomaton
Consumes a TokenStream and creates an Automaton whose transition labels are UTF-8 bytes (or Unicode code points if UnicodeArcs is true) taken from the ITermToBytesRefAttribute. A POS_SEP transition is inserted between adjacent tokens, and a HOLE transition is inserted for each hole (missing position).
@lucene.experimental
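To make the role of POS_SEP and HOLE concrete, the following is a minimal, library-free sketch of the label sequence along the single accepted path for a purely linear token stream (no stacked or multi-position tokens). The class `LabelSketch` and the method `ToLabels` are hypothetical names introduced for illustration. The constant values and the arc ordering around a hole mirror Apache Lucene's implementation as I understand it, and should be treated as assumptions for this assembly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LabelSketch
{
    // Assumed values, mirroring the constants in Apache Lucene's TokenStreamToAutomaton.
    public const int POS_SEP = 0x001f;
    public const int HOLE = 0x001e;

    // Hypothetical helper: flattens a linear list of (term, position increment)
    // pairs into the labels of the one path such an automaton would accept.
    public static List<int> ToLabels(
        IReadOnlyList<(string Term, int PosInc)> tokens,
        bool preservePositionIncrements = true)
    {
        var labels = new List<int>();
        for (int i = 0; i < tokens.Count; i++)
        {
            var (term, posInc) = tokens[i];

            // Without position increments, gaps collapse to ordinary adjacency.
            if (!preservePositionIncrements && posInc > 1)
            {
                posInc = 1;
            }

            // Separator between this token and whatever precedes it.
            if (i > 0)
            {
                labels.Add(POS_SEP);
            }

            // Each missing position contributes a HOLE arc followed by a separator.
            for (int gap = 1; gap < posInc; gap++)
            {
                labels.Add(HOLE);
                labels.Add(POS_SEP);
            }

            // The term itself, one label per UTF-8 byte.
            foreach (byte b in Encoding.UTF8.GetBytes(term))
            {
                labels.Add(b);
            }
        }
        return labels;
    }
}
```

For example, the tokens `("a", 1)` and `("b", 2)` describe a stream where one position between "a" and "b" was removed (as a stop filter might do). With position increments preserved, the sketch yields the byte for "a", POS_SEP, HOLE, POS_SEP, then the byte for "b"; with them disabled, it yields the byte for "a", POS_SEP, and the byte for "b".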
Inheritance
Assembly: DistributedLucene.Net.dll
Syntax
public class TokenStreamToAutomaton : object
Constructors
Name | Description |
---|---|
TokenStreamToAutomaton() | Sole constructor. |
Fields
Name | Description |
---|---|
HOLE | We add this arc to represent a hole. |
POS_SEP | We add this transition between two adjacent tokens. |
Properties
Name | Description |
---|---|
PreservePositionIncrements | Whether to generate holes in the automaton for missing positions. |
UnicodeArcs | Whether to make transition labels Unicode code points instead of UTF-8 bytes. |
Methods
Name | Description |
---|---|
ChangeToken(BytesRef) | Subclass and override this method if you need to change the token (for example, to escape certain bytes) before it is turned into a graph. |
ToAutomaton(TokenStream) | Pulls the graph (including IPositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if UnicodeArcs is true) from each term. |
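
The members listed above might be used together roughly as follows. This is a sketch, not a verified example for this assembly: it assumes the namespaces and types of Apache Lucene.NET 4.8 (`Lucene.Net.Analysis`, `Lucene.Net.Analysis.Core.WhitespaceAnalyzer`, `Lucene.Net.Util.LuceneVersion`, `Lucene.Net.Util.Automaton.Automaton`), that both properties are settable, and that ToAutomaton resets and consumes the stream itself, as the upstream Lucene implementation does. The class name `ToAutomatonExample` and the field name `"field"` are placeholders.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;
using Lucene.Net.Util.Automaton;

public static class ToAutomatonExample
{
    // Hypothetical helper: analyzes the text and converts the token graph
    // into an automaton with UTF-8 byte labels.
    public static Automaton Build(string text)
    {
        // Assumption: a whitespace analyzer is sufficient for the illustration.
        using Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
        using TokenStream stream = analyzer.GetTokenStream("field", new StringReader(text));

        var converter = new TokenStreamToAutomaton
        {
            PreservePositionIncrements = true, // keep HOLE arcs for missing positions
            UnicodeArcs = false                // label arcs with UTF-8 bytes
        };

        // Assumption: ToAutomaton calls Reset() and End() on the stream,
        // so the caller only needs to dispose it afterwards.
        return converter.ToAutomaton(stream);
    }
}
```

Setting both properties explicitly keeps the example independent of whatever defaults this assembly uses.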