Class ArabicAnalyzer
Analyzer for Arabic.
This analyzer implements light stemming as described in:
Light Stemming for Arabic Information Retrieval
http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf
The analysis package contains three primary components:
- ArabicNormalizationFilter: Arabic orthographic normalization.
- ArabicStemFilter: Arabic light stemming.
- Arabic stop words file: a set of default Arabic stop words.
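To make the normalization and stemming stages concrete, here is a simplified, self-contained C# sketch of orthographic normalization followed by prefix and suffix stripping, loosely modeled on the light10 variant in the cited paper. The class `ArabicLightStemSketch` and its methods are hypothetical names, and the character mappings, affix lists and length guards are assumptions for illustration. This is not the Lucene.Net implementation. The real analyzer also tokenizes, lowercases and removes stop words.

```csharp
using System;
using System.Text;

// Hypothetical sketch: a simplified model of Arabic normalization and light
// stemming. It is NOT the Lucene.Net ArabicNormalizer/ArabicStemmer code.
public static class ArabicLightStemSketch
{
    // Characters deleted during normalization: tatweel and the harakat
    // (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
    private static readonly char[] Deleted =
    {
        '\u0640', '\u064B', '\u064C', '\u064D', '\u064E', '\u064F', '\u0650', '\u0651', '\u0652'
    };

    // Assumed prefix list, tried in order; at most one prefix is stripped.
    private static readonly string[] Prefixes =
    {
        "\u0627\u0644",       // al-
        "\u0648\u0627\u0644", // wal-
        "\u0628\u0627\u0644", // bal-
        "\u0643\u0627\u0644", // kal-
        "\u0641\u0627\u0644", // fal-
        "\u0644\u0644",       // ll-
        "\u0648"              // w-
    };

    // Assumed suffix list, checked in order; each match is stripped in turn.
    private static readonly string[] Suffixes =
    {
        "\u0647\u0627", "\u0627\u0646", "\u0627\u062A", "\u0648\u0646", "\u064A\u0646",
        "\u064A\u0647", "\u064A\u0629", "\u0647", "\u0629", "\u064A"
    };

    public static string Normalize(string word)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            if (Array.IndexOf(Deleted, c) >= 0)
                continue;                 // drop diacritics and tatweel
            switch (c)
            {
                case '\u0622':            // alef with madda above
                case '\u0623':            // alef with hamza above
                case '\u0625':            // alef with hamza below
                    sb.Append('\u0627');  // -> bare alef
                    break;
                case '\u0649':            // alef maksura
                    sb.Append('\u064A');  // -> yeh
                    break;
                case '\u0629':            // teh marbuta
                    sb.Append('\u0647');  // -> heh
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    public static string Stem(string word)
    {
        foreach (string p in Prefixes)
        {
            // Keep at least two letters, or three after the one-letter prefix w-.
            int minLeft = p.Length == 1 ? 3 : 2;
            if (word.Length - p.Length >= minLeft && word.StartsWith(p, StringComparison.Ordinal))
            {
                word = word.Substring(p.Length);
                break;
            }
        }
        foreach (string s in Suffixes)
        {
            // Every suffix removal must also leave at least two letters.
            if (word.Length - s.Length >= 2 && word.EndsWith(s, StringComparison.Ordinal))
                word = word.Substring(0, word.Length - s.Length);
        }
        return word;
    }
}
```

For example, `Stem(Normalize("الكتاب"))` strips the definite article and returns "كتاب", while a short word such as "وقت" is left intact because removing its leading waw would leave only two letters. Running normalization first means the stemmer only has to match one spelling of each affix.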
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public sealed class ArabicAnalyzer : StopwordAnalyzerBase, IDisposable
Constructors
Name | Description |
---|---|
ArabicAnalyzer(LuceneVersion) | Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE. |
ArabicAnalyzer(LuceneVersion, CharArraySet) | Builds an analyzer with the given stop words. |
ArabicAnalyzer(LuceneVersion, CharArraySet, CharArraySet) | Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided, this analyzer adds a SetKeywordMarkerFilter before ArabicStemFilter, so terms in that set are not stemmed. |
Fields
Name | Description |
---|---|
DEFAULT_STOPWORD_FILE | File containing the default Arabic stop words. The default list is from http://members.unine.ch/jacques.savoy/clef/index.html and is BSD-licensed. |
Properties
Name | Description |
---|---|
DefaultStopSet | Returns an unmodifiable instance of the default stop word set. |
Methods
Name | Description |
---|---|
CreateComponents(String, TextReader) | Creates TokenStreamComponents used to tokenize all the text in the provided System.IO.TextReader. |
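The constructors and CreateComponents are easiest to see in use. The sketch below assumes the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced, with CharArraySet in the Lucene.Net.Analysis.Util namespace as in the 4.8 beta releases. `ArabicAnalyzerDemo`, `Analyze` and `WithStemExclusions` are hypothetical helper names, and the field name "body" is arbitrary. Application code does not call the protected CreateComponents directly. Instead, GetTokenStream obtains the components it builds, so the loop below sees the output of the whole filter chain.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Ar;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical helpers for exercising ArabicAnalyzer.
public static class ArabicAnalyzerDemo
{
    // Runs text through an analyzer and collects the emitted terms.
    public static List<string> Analyze(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text)))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();                  // required before the first IncrementToken()
            while (ts.IncrementToken())
                terms.Add(termAtt.ToString());
            ts.End();                    // finish the stream before it is disposed
        }
        return terms;
    }

    // Uses the three-argument constructor: default stop words plus a
    // stem exclusion set, so the listed words bypass ArabicStemFilter.
    public static ArabicAnalyzer WithStemExclusions(params string[] words)
    {
        var exclusions = new CharArraySet(LuceneVersion.LUCENE_48, words, false);
        return new ArabicAnalyzer(LuceneVersion.LUCENE_48, ArabicAnalyzer.DefaultStopSet, exclusions);
    }
}
```

With these packages, analyzing "الكتاب" ("the book") with `new ArabicAnalyzer(LuceneVersion.LUCENE_48)` should yield the single term "كتاب", while `WithStemExclusions("الكتاب")` should keep "الكتاب" unchanged.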