Class ASCIIFoldingFilter
This class converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 128 code points (U+0000 through U+007F, the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
Characters from the following Unicode blocks are converted, but only those that have a reasonable ASCII alternative:
- C1 Controls and Latin-1 Supplement: http://www.unicode.org/charts/PDF/U0080.pdf
- Latin Extended-A: http://www.unicode.org/charts/PDF/U0100.pdf
- Latin Extended-B: http://www.unicode.org/charts/PDF/U0180.pdf
- Latin Extended Additional: http://www.unicode.org/charts/PDF/U1E00.pdf
- Latin Extended-C: http://www.unicode.org/charts/PDF/U2C60.pdf
- Latin Extended-D: http://www.unicode.org/charts/PDF/UA720.pdf
- IPA Extensions: http://www.unicode.org/charts/PDF/U0250.pdf
- Phonetic Extensions: http://www.unicode.org/charts/PDF/U1D00.pdf
- Phonetic Extensions Supplement: http://www.unicode.org/charts/PDF/U1D80.pdf
- General Punctuation: http://www.unicode.org/charts/PDF/U2000.pdf
- Superscripts and Subscripts: http://www.unicode.org/charts/PDF/U2070.pdf
- Enclosed Alphanumerics: http://www.unicode.org/charts/PDF/U2460.pdf
- Dingbats: http://www.unicode.org/charts/PDF/U2700.pdf
- Supplemental Punctuation: http://www.unicode.org/charts/PDF/U2E00.pdf
- Alphabetic Presentation Forms: http://www.unicode.org/charts/PDF/UFB00.pdf
- Halfwidth and Fullwidth Forms: http://www.unicode.org/charts/PDF/UFF00.pdf
See: http://en.wikipedia.org/wiki/Latin_characters_in_Unicode
For example, 'à' will be replaced by 'a'.
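As a minimal sketch of where the filter sits in an analysis chain, the C# below splits text on whitespace and folds each token. It assumes Lucene.Net 4.8 (the Lucene.Net and Lucene.Net.Analysis.Common packages), that ASCIIFoldingFilter lives in the Lucene.Net.Analysis.Miscellaneous namespace, and a WhitespaceTokenizer as the upstream tokenizer. The FoldingDemo class and its Fold method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper (assumes Lucene.Net 4.8): tokenizes on whitespace,
// then folds every token to ASCII with ASCIIFoldingFilter.
public static class FoldingDemo
{
    public static IList<string> Fold(string text)
    {
        var terms = new List<string>();
        Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));

        // Disposing the filter also disposes the wrapped tokenizer.
        using (TokenStream stream = new ASCIIFoldingFilter(tokenizer))
        {
            ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();                  // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(termAtt.ToString()); // the (possibly folded) term text
            }
            stream.End();
        }
        return terms;
    }
}
```

With this sketch, FoldingDemo.Fold("café naïve") should return the terms cafe and naive.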
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public sealed class ASCIIFoldingFilter : TokenFilter, IDisposable
Constructors
Name | Description |
---|---|
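ASCIIFoldingFilter(TokenStream) | Create a new ASCIIFoldingFilter that does not preserve the original tokens. |
ASCIIFoldingFilter(TokenStream, Boolean) | Create a new ASCIIFoldingFilter. If the Boolean argument is true, the original tokens are kept on the stream with a position increment of 0 relative to the folded tokens. |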
Properties
Name | Description |
---|---|
PreserveOriginal | Gets whether the filter also keeps the original (unfolded) tokens, with a position increment of 0 relative to the folded tokens. |
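The sketch below, under the same Lucene.Net 4.8 assumptions, passes true as the Boolean constructor argument and records each term together with its position increment, which makes the stacking of original tokens on folded tokens visible. PreserveOriginalDemo is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

// Hypothetical helper (assumes Lucene.Net 4.8): returns "term/positionIncrement"
// strings for every token emitted with preserveOriginal = true.
public static class PreserveOriginalDemo
{
    public static IList<string> Tokens(string text)
    {
        var result = new List<string>();
        Tokenizer tokenizer = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));

        // true => also emit the original token after its folded form.
        using (var filter = new ASCIIFoldingFilter(tokenizer, true))
        {
            var termAtt = filter.AddAttribute<ICharTermAttribute>();
            var posIncAtt = filter.AddAttribute<IPositionIncrementAttribute>();
            filter.Reset();
            while (filter.IncrementToken())
            {
                // An increment of 0 means "same position as the previous token".
                result.Add(termAtt.ToString() + "/" + posIncAtt.PositionIncrement);
            }
            filter.End();
        }
        return result;
    }
}
```

For the input "café plain", the folded term cafe is expected first, followed by the original café at position increment 0. Because plain contains only ASCII characters, it passes through once and is not duplicated.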
Methods
Name | Description |
---|---|
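FoldToASCII(Char[], Int32) | Converts the given number of characters from the input array to their ASCII equivalents, writing the result to the filter's internal output buffer. For example, accents are removed from accented characters. |
FoldToASCII(Char[], Int32, Char[], Int32, Int32) | Static variant: folds the given number of characters of the input array, starting at the input position, into the output array starting at the output position, and returns the output position just past the last character written. For example, accents are removed from accented characters. Internal API (@lucene.internal): it may change in incompatible ways in a future release. |
IncrementToken() | Advances the stream to the next token, folding any characters above ASCII in its term text. |
Reset() | Resets the filter, discarding any pending original token, so the stream can be consumed again. |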
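The static FoldToASCII overload can also be called directly on a character buffer. This is a minimal sketch assuming Lucene.Net 4.8; FoldBufferDemo is a hypothetical name, and because the overload is marked @lucene.internal, code like this may need adjusting in later releases. The output array is sized at four times the input length, because a single input character can fold to as many as four ASCII characters.

```csharp
using Lucene.Net.Analysis.Miscellaneous;

// Hypothetical helper (assumes Lucene.Net 4.8) around the static,
// @lucene.internal FoldToASCII(char[], int, char[], int, int) overload.
public static class FoldBufferDemo
{
    public static string Fold(string s)
    {
        char[] input = s.ToCharArray();

        // One input char can expand to up to 4 output chars
        // (for example, a parenthesized number such as "(10)").
        char[] output = new char[4 * input.Length];

        // Returns the position in output just past the last char written.
        int end = ASCIIFoldingFilter.FoldToASCII(input, 0, output, 0, input.Length);
        return new string(output, 0, end);
    }
}
```

For instance, folding "Æsir straße" this way is expected to give "AEsir strasse", since Æ folds to AE and ß folds to ss.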