Field SOUTH_EAST_ASIAN_TYPE
Characters in the class \p{Line_Break = Complex_Context} come from South East Asian scripts (Thai, Lao, Myanmar, Khmer, etc.). Sequences of these characters are kept together as a single token rather than broken up, because the logic required to break them at word boundaries is too complex for UAX#29.
See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public static readonly int SOUTH_EAST_ASIAN_TYPE
Field Value
| Type | Description |
|---|---|
| System.Int32 | |
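
The grouping behavior described for this field can be illustrated with a small standalone sketch. It does not use the Lucene.NET API. It assumes that the .NET regex named blocks `IsThai`, `IsLao`, `IsMyanmar`, and `IsKhmer` are an acceptable rough stand-in for \p{Line_Break = Complex_Context}; the real class is defined per character in UAX#14 and also covers other scripts. The names `ComplexContextSketch` and `Tokenize` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ComplexContextSketch
{
    // Assumption: whole Unicode blocks approximate Line_Break = Complex_Context.
    // The "sea" group captures an unbroken run of such characters;
    // the "word" group captures plain ASCII letters and digits.
    private static readonly Regex TokenPattern = new Regex(
        @"(?<sea>[\p{IsThai}\p{IsLao}\p{IsMyanmar}\p{IsKhmer}]+)|(?<word>[A-Za-z0-9]+)");

    // Returns each token with a flag saying whether it is a South East Asian run.
    public static List<(string Text, bool IsSouthEastAsian)> Tokenize(string input)
    {
        var tokens = new List<(string Text, bool IsSouthEastAsian)>();
        foreach (Match m in TokenPattern.Matches(input))
        {
            // The whole run becomes one token; no attempt is made to find
            // word boundaries inside it.
            tokens.Add((m.Value, m.Groups["sea"].Success));
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (var token in Tokenize("hello สวัสดีครับ world"))
        {
            Console.WriteLine($"{token.Text} (South East Asian run: {token.IsSouthEastAsian})");
        }
    }
}
```

In this sketch the Thai phrase comes back as one token flagged as a South East Asian run, while `hello` and `world` are returned as ordinary tokens. Splitting the Thai run into words would typically need a dictionary-based or statistical segmenter, which is outside the scope of UAX#29.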