Class PersianAnalyzer
Analyzer for Persian.
This Analyzer uses PersianCharFilter, so text is tokenized around the zero-width non-joiner (U+200C) in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. No stemmer is applied; "stemming" is accomplished via stopwords.
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public sealed class PersianAnalyzer : StopwordAnalyzerBase, IDisposable
Constructors
Name | Description |
---|---|
PersianAnalyzer(LuceneVersion) | Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE. |
PersianAnalyzer(LuceneVersion, CharArraySet) | Builds an analyzer with the given stop words. |
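
The two constructors can be used roughly as follows. This is a minimal sketch: it assumes `LuceneVersion.LUCENE_48` as the compatibility version, and the two stop words passed to the second constructor are hypothetical values chosen only for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class PersianAnalyzerConstruction
{
    public static void Main()
    {
        // Assumption: LUCENE_48 is the compatibility version in use.
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        // Hypothetical custom stop words, for illustration only.
        var customStopWords = new CharArraySet(version, new[] { "و", "در" }, false);

        // First overload: uses the default stop set.
        using (var defaultAnalyzer = new PersianAnalyzer(version))
        // Second overload: uses the caller-supplied stop set.
        using (var customAnalyzer = new PersianAnalyzer(version, customStopWords))
        {
            Console.WriteLine(PersianAnalyzer.DefaultStopSet.Count);
            Console.WriteLine(customStopWords.Count);
        }
    }
}
```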
Fields
Name | Description |
---|---|
DEFAULT_STOPWORD_FILE | File containing default Persian stopwords. The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html. The stopword list is BSD-licensed. |
STOPWORDS_COMMENT | The comment character in the stopwords file. All lines prefixed with this character are ignored. |
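
To illustrate the comment convention, the sketch below loads a small in-memory stopword list with `WordlistLoader.GetWordSet`, passing `STOPWORDS_COMMENT` as the comment prefix. The file content is made up for the example, and whether the analyzer loads its own default file through exactly this call is an assumption, not something this page states.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StopwordFileExample
{
    public static void Main()
    {
        // Hypothetical stopword file content: one entry per line, plus one
        // line that starts with '#' (assumed to be the comment prefix).
        string fileContent = "# this line is a comment\nو\nدر\n";

        using (var reader = new StringReader(fileContent))
        {
            // Lines starting with the comment prefix are skipped by the loader.
            CharArraySet stopWords = WordlistLoader.GetWordSet(
                reader, PersianAnalyzer.STOPWORDS_COMMENT, LuceneVersion.LUCENE_48);

            Console.WriteLine(stopWords.Count);
            Console.WriteLine(stopWords.Contains("در"));
        }
    }
}
```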
Properties
Name | Description |
---|---|
DefaultStopSet | Returns an unmodifiable instance of the default stop-words set. |
Methods
Name | Description |
---|---|
CreateComponents(String, TextReader) | Creates TokenStreamComponents used to tokenize all the text in the provided System.IO.TextReader. |
InitReader(String, TextReader) | Wraps the System.IO.TextReader with PersianCharFilter. |
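
Application code normally does not call these two methods directly; they are invoked when a token stream is requested through `GetTokenStream` on the `Analyzer` base class. The sketch below consumes such a stream. The field name `"body"` and the sample text are assumptions made for illustration; the text contains a zero-width non-joiner, written as `\u200C`, to exercise the PersianCharFilter step.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Fa;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class PersianTokenizeExample
{
    public static void Main()
    {
        // Sample text with a zero-width non-joiner (\u200C) inside the first word.
        string text = "کتاب\u200Cها در خانه";

        using (Analyzer analyzer = new PersianAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream stream = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

            stream.Reset();                    // required before the first IncrementToken()
            while (stream.IncrementToken())
            {
                Console.WriteLine(term.ToString());
            }
            stream.End();                      // signals that the stream is fully consumed
        }
    }
}
```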