Class TokenStream
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.
This is an abstract class. Concrete subclasses are Tokenizer, a TokenStream whose input is a TextReader, and TokenFilter, a TokenStream whose input is another TokenStream.
The TokenStream API was introduced with Lucene 2.9. This API has moved from being Token-based to Lucene.Net.Util.IAttribute-based. While Token still exists in 2.9 as a convenience class, the preferred way to store the information of a token is to use Lucene.Net.Util.Attributes.
TokenStream now extends Lucene.Net.Util.AttributeSource, which provides access to all of the token Lucene.Net.Util.IAttributes for the TokenStream. Note that only one instance per Lucene.Net.Util.Attribute is created and reused for every token. This approach reduces object creation and allows local caching of references to the Lucene.Net.Util.Attributes. See IncrementToken() for further details.
The workflow of the new TokenStream API is as follows:

1. Instantiation of the TokenStream/TokenFilters which add/get attributes to/from the AttributeSource.
2. The consumer calls Reset().
3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
4. The consumer calls IncrementToken() until it returns false, consuming the attributes after each call.
5. The consumer calls End() so that any end-of-stream operations can be performed.
6. The consumer calls Dispose() to release any resources when finished using the TokenStream.

You can find some example code for the new API in the analysis package-level documentation.
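As an additional illustration of that workflow, here is a minimal consumer sketch (not taken from this page). It assumes a Lucene.NET 3.x-era API, where WhitespaceTokenizer is a concrete Tokenizer and the term and offset attributes are exposed as ITermAttribute and IOffsetAttribute; the exact member names and namespace casing differ in other versions (e.g. ICharTermAttribute in 4.x).

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes; // namespace casing varies by Lucene.NET version

class TokenStreamConsumerExample
{
    static void Main()
    {
        // WhitespaceTokenizer is a concrete Tokenizer (and therefore a TokenStream).
        TokenStream stream = new WhitespaceTokenizer(new StringReader("the quick brown fox"));

        // Retrieve attribute references once; the same instances are reused for every token.
        ITermAttribute termAtt = stream.AddAttribute<ITermAttribute>();
        IOffsetAttribute offsetAtt = stream.AddAttribute<IOffsetAttribute>();

        stream.Reset();                      // rewind to the beginning of the stream
        while (stream.IncrementToken())      // advance to the next token until exhausted
        {
            Console.WriteLine("{0} [{1}-{2}]", termAtt.Term, offsetAtt.StartOffset, offsetAtt.EndOffset);
        }
        stream.End();                        // perform end-of-stream operations (e.g. final offset)
        stream.Dispose();                    // release resources held by the stream
    }
}
```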
Sometimes it is desirable to capture the current state of a TokenStream, e.g. for buffering purposes (see CachingTokenFilter, TeeSinkTokenFilter). For this use case Lucene.Net.Util.AttributeSource.CaptureState and Lucene.Net.Util.AttributeSource.RestoreState(Lucene.Net.Util.AttributeSource.State) can be used.
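For instance, a filter that emits every token twice could capture the attribute state on the first emission and restore it on the second. The sketch below is hedged and not from this page: RepeatFilter is a hypothetical name, the protected input field of TokenFilter is named m_input in 4.x, and a real filter would also set the position increment of the repeated token to 0.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Util;

// Hypothetical filter that emits each input token twice by capturing and restoring state.
public sealed class RepeatFilter : TokenFilter
{
    private AttributeSource.State savedState; // snapshot of all attribute values of the last token

    public RepeatFilter(TokenStream input) : base(input) { }

    public override bool IncrementToken()
    {
        if (savedState != null)
        {
            RestoreState(savedState);  // replay the captured token
            savedState = null;
            return true;
        }
        if (!input.IncrementToken())
            return false;
        savedState = CaptureState();   // remember this token so it can be emitted again
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        savedState = null;
    }
}
```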
Inheritance
System.Object
AttributeSource
TokenStream
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.NetCore.dll
Syntax
public abstract class TokenStream : AttributeSource, IDisposable
Constructors
Name | Description |
---|---|
TokenStream() | A TokenStream using the default attribute factory. |
TokenStream(AttributeSource) | A TokenStream that uses the same attributes as the supplied one. |
TokenStream(AttributeSource.AttributeFactory) | A TokenStream using the supplied AttributeFactory for creating new Lucene.Net.Util.IAttribute instances. |
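To illustrate the TokenStream(AttributeSource) overload, the hedged sketch below passes another stream to the base constructor so that both streams operate on the same attribute instances, which is essentially what TokenFilter does. PassThroughTokenStream is a hypothetical class, and the Dispose(Boolean) override assumes the usual protected dispose pattern for the method listed under Methods below.

```csharp
using Lucene.Net.Analysis;

// Hypothetical wrapper built directly on the TokenStream(AttributeSource) constructor.
// Because the wrapped stream is passed to the base constructor, wrapper and input
// share the same IAttribute instances, so no values need to be copied between them.
public class PassThroughTokenStream : TokenStream
{
    private readonly TokenStream input;

    public PassThroughTokenStream(TokenStream input)
        : base(input)   // share the attribute source of the wrapped stream
    {
        this.input = input;
    }

    public override bool IncrementToken()
    {
        // The wrapped stream populates the shared attributes directly.
        return input.IncrementToken();
    }

    // A real wrapper would also delegate Reset() and End(); TokenFilter does this for you.

    protected override void Dispose(bool disposing)
    {
        if (disposing) input.Dispose();
    }
}
```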
Methods
Name | Description |
---|---|
Close() | Releases resources associated with this stream. |
Dispose() | |
Dispose(Boolean) | |
End() | This method is called by the consumer after the last token has been consumed, i.e. after IncrementToken() returned false. It can be used to perform any end-of-stream operations, such as setting the final offset of a stream. The final offset of a stream might differ from the offset of the last token, e.g. when one or more whitespace characters followed the last token and a WhitespaceTokenizer was used. |
IncrementToken() | Consumers (i.e., IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate Lucene.Net.Util.Attributes with the contents of the next token. The producer must make no assumptions about the attributes after the method has returned: the caller may arbitrarily change it. If the producer needs to preserve the state for subsequent calls, it can use Lucene.Net.Util.AttributeSource.CaptureState to create a copy of the current attribute state. This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to Lucene.Net.Util.AttributeSource.AddAttribute&lt;T&gt;() and Lucene.Net.Util.AttributeSource.GetAttribute&lt;T&gt;(), references to all Lucene.Net.Util.Attributes that this stream uses should be retrieved during instantiation. To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in IncrementToken(). |
Reset() | Resets this stream to the beginning. This is an optional operation, so subclasses may or may not implement this method. Reset() is not needed for the standard indexing process. However, if the tokens of a TokenStream are intended to be consumed more than once, it is necessary to implement Reset(). |
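Finally, here is a producer-side sketch of the IncrementToken() contract described above. The class is hypothetical, and attribute member names such as ITermAttribute and SetTermBuffer assume a 3.x-era Lucene.NET (they become ICharTermAttribute and SetEmpty().Append(...) in 4.x).

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes; // namespace casing varies by Lucene.NET version

// Hypothetical producer: a TokenStream that emits exactly one token.
public sealed class SingleTokenStream : TokenStream
{
    private readonly string value;
    private bool exhausted;

    // Attribute references are added and cached during instantiation, as recommended above.
    private readonly ITermAttribute termAtt;
    private readonly IOffsetAttribute offsetAtt;

    public SingleTokenStream(string value)
    {
        this.value = value;
        termAtt = AddAttribute<ITermAttribute>();
        offsetAtt = AddAttribute<IOffsetAttribute>();
    }

    public override bool IncrementToken()
    {
        if (exhausted)
            return false;                       // signal end of stream

        ClearAttributes();                      // reset all attributes to their defaults
        termAtt.SetTermBuffer(value);           // fill in the term text
        offsetAtt.SetOffset(0, value.Length);   // and the character offsets
        exhausted = true;
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        exhausted = false;                      // allow the stream to be consumed again
    }

    protected override void Dispose(bool disposing)
    {
        // nothing to release in this sketch
    }
}
```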