Class PatternAnalyzer

Efficient Lucene analyzer/tokenizer that preferably operates on a System.String rather than a System.IO.TextReader, that flexibly separates text into terms via a regular expression (System.Text.RegularExpressions.Regex, with behaviour similar to System.Text.RegularExpressions.Regex.Split(System.String)), and that combines the functionality of LetterTokenizer, LowerCaseTokenizer, WhitespaceTokenizer, and StopFilter into a single efficient multi-purpose class.

If you are unsure what exactly a regular expression should look like, consider prototyping by trying various expressions on some test texts via System.Text.RegularExpressions.Regex.Split(System.String). Once you are satisfied, pass that regex to PatternAnalyzer. Also see Regular Expression Tutorial.
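The prototyping step described above can be sketched with nothing but System.Text.RegularExpressions. The class name RegexPrototype, the helper SplitTerms, and the sample text are hypothetical and exist only for illustration. Empty entries are dropped here for readability, since Regex.Split emits them when the input starts or ends with a separator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPrototype
{
    // Splits text on the candidate pattern and drops empty entries, which
    // Regex.Split emits when the text starts or ends with a separator.
    public static string[] SplitTerms(string pattern, string text)
    {
        string[] parts = new Regex(pattern).Split(text);
        return Array.FindAll(parts, p => p.Length > 0);
    }

    public static void Main()
    {
        string sample = "red, green;blue";

        // Try several candidate patterns against the same sample text.
        foreach (string pattern in new[] { @"\s+", @"[,;]\s*" })
        {
            string[] terms = SplitTerms(pattern, sample);
            Console.WriteLine($"{pattern} => [{string.Join("|", terms)}]");
        }
        // \s+     => [red,|green;blue]
        // [,;]\s* => [red|green|blue]
    }
}
```

Once a pattern yields the terms you expect, the same pattern string can be compiled into the Regex that is handed to the PatternAnalyzer constructor.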

This class can be considerably faster than the "normal" Lucene tokenizers. It can also serve as a building block in a compound Lucene TokenFilter chain, as in this stemming example:

PatternAnalyzer pat = ...
TokenStream tokenStream = new SnowballFilter(
    pat.GetTokenStream("content", "James is running round in the woods"),
    "English");
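For readers who want to see the terms an analyzer actually produces, the following is a minimal sketch of constructing a PatternAnalyzer and draining its token stream. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced; the namespaces, the LuceneVersion.LUCENE_48 constant, and the use of null to mean "no stop words" are assumptions about that release, and PatternAnalyzerDemo and Tokenize are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class PatternAnalyzerDemo
{
    // Collects every term the analyzer emits for the given text.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using (TokenStream stream = analyzer.GetTokenStream("content", text))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                  // must be called before IncrementToken
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }

    public static void Main()
    {
#pragma warning disable CS0618 // PatternAnalyzer is marked [Obsolete]
        var analyzer = new PatternAnalyzer(
            LuceneVersion.LUCENE_48,             // assumed version constant
            PatternAnalyzer.WHITESPACE_PATTERN,  // split on runs of whitespace
            true,                                // lower-case each term
            null);                               // assumption: null disables stop-word filtering
#pragma warning restore CS0618

        foreach (string t in Tokenize(analyzer, "James is running round in the woods"))
        {
            Console.WriteLine(t);
        }
    }
}
```

The same Tokenize helper could be pointed at a wrapped stream such as the SnowballFilter chain above by inlining its loop, since it only relies on the TokenStream contract.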

Inheritance

System.Object

Analyzer

PatternAnalyzer

Inherited Members

Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>)

Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>, Lucene.Net.Analysis.ReuseStrategy)

Lucene.Net.Analysis.Analyzer.NewAnonymous(System.Func<System.String, System.IO.TextReader, Lucene.Net.Analysis.TokenStreamComponents>, System.Func<System.String, System.IO.TextReader, System.IO.TextReader>)

Lucene.Net.Analysis.Analyzer.GetTokenStream(System.String, System.IO.TextReader)

Analyzer.GetTokenStream(String, String)

Lucene.Net.Analysis.Analyzer.InitReader(System.String, System.IO.TextReader)

Analyzer.GetPositionIncrementGap(String)

Analyzer.GetOffsetGap(String)

Analyzer.Strategy

Analyzer.Dispose()

Lucene.Net.Analysis.Analyzer.GetObjectData(System.Runtime.Serialization.SerializationInfo, System.Runtime.Serialization.StreamingContext)

Analyzer.GLOBAL_REUSE_STRATEGY

Analyzer.PER_FIELD_REUSE_STRATEGY

System.Object.ToString()

System.Object.Equals(System.Object, System.Object)

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.GetType()

System.Object.MemberwiseClone()

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

[Obsolete("(4.0) use the pattern-based analysis in the analysis/pattern package instead.")]
[Serializable]
public sealed class PatternAnalyzer : Analyzer, IDisposable

Constructors

Name	Description
PatternAnalyzer(LuceneVersion, Regex, Boolean, CharArraySet)	Constructs a new instance with the given parameters.

Fields

Name	Description
DEFAULT_ANALYZER	A lower-casing word analyzer with English stop words (can be shared freely across threads without harm); global per class loader.
EXTENDED_ANALYZER	A lower-casing word analyzer with extended English stop words (can be shared freely across threads without harm); global per class loader. The stop words are borrowed from http://thomas.loc.gov/home/stopwords.html, see http://thomas.loc.gov/home/all.about.inquery.html
NON_WORD_PATTERN	`"\W+"`; divides text at runs of non-word characters (anything other than letters, digits, and underscores)
WHITESPACE_PATTERN	`"\s+"`; divides text at runs of whitespace (char.IsWhiteSpace(c))
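To make the difference between the two pattern strings concrete, here is a small sketch that applies both to the same text using plain Regex.Split. The NonWord and Whitespace fields are local stand-ins built from the pattern strings listed above, not the PatternAnalyzer fields themselves, and the class name and sample text are hypothetical. Note that in .NET `\W` treats digits and underscores as word characters, so "snake_case" and "v2" stay intact while the apostrophe splits "It's".

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Local stand-ins for the two documented pattern strings.
    public static readonly Regex NonWord = new Regex(@"\W+");
    public static readonly Regex Whitespace = new Regex(@"\s+");

    public static void Main()
    {
        string text = "It's snake_case, v2";

        // Splits at every run of non-word characters, including the apostrophe.
        Console.WriteLine(string.Join("|", NonWord.Split(text)));
        // prints: It|s|snake_case|v2

        // Splits only at whitespace, so punctuation stays attached to terms.
        Console.WriteLine(string.Join("|", Whitespace.Split(text)));
        // prints: It's|snake_case,|v2
    }
}
```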

Methods

Name	Description
CreateComponents(String, TextReader)	Creates a token stream that tokenizes all the text in the given reader. This implementation forwards to CreateComponents(String, TextReader, String) and is less efficient than calling that overload directly.
CreateComponents(String, TextReader, String)	Creates a token stream that tokenizes the given string into token terms (aka words).
Equals(Object)	Indicates whether some other object is "equal to" this one.
GetHashCode()	Returns a hash code value for the object.

Extension Methods

Number.IsNumber(Object)

SystemTypesHelpers.toString(Object)

SystemTypesHelpers.equals(Object, Object)