Class HyphenationTree
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word.
This class was taken from the Apache FOP project (http://xmlgraphics.apache.org/fop/) and has been slightly modified.
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
[Serializable]
public class HyphenationTree : TernaryTree, IPatternConsumer
Constructors
Name | Description |
---|---|
HyphenationTree() |
Fields
Name | Description |
---|---|
m_classmap | This map stores the character classes |
m_stoplist | This map stores hyphenation exceptions |
m_vspace | value space: stores the interletter values |
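The interletter values kept in the value space are packed two to a byte, as the PackValues description in the Methods table states. The following is a minimal, self-contained sketch of that packing scheme, not the library's actual code: the class name `ValuePackingSketch` and its `Pack` and `Unpack` methods are hypothetical, and the sketch returns a standalone byte array instead of an offset into the tree's own storage.

```csharp
using System;
using System.Text;

// Illustrative sketch only; ValuePackingSketch is a hypothetical name,
// not part of Lucene.NET.
public static class ValuePackingSketch
{
    // Packs a digit string such as "0102" into bytes, two digits per byte.
    // Each digit is stored as (digit + 1), so a zero nibble or a zero byte
    // can act as the terminator.
    public static byte[] Pack(string values)
    {
        int n = values.Length;
        // Half the digit count (rounded up) plus one terminating zero byte.
        byte[] packed = new byte[(n + 1) / 2 + 1];
        for (int i = 0; i < n; i++)
        {
            int nibble = (values[i] - '0' + 1) & 0x0F;
            if ((i & 1) == 0)
            {
                packed[i / 2] = (byte)(nibble << 4); // even position: high nibble
            }
            else
            {
                packed[i / 2] |= (byte)nibble;       // odd position: low nibble
            }
        }
        return packed; // the last byte is still 0 and terminates the run
    }

    // Reverses Pack: reads nibbles until a zero nibble or zero byte is found.
    public static string Unpack(byte[] packed)
    {
        var sb = new StringBuilder();
        foreach (byte b in packed)
        {
            if (b == 0) break;                      // terminating byte
            sb.Append((char)('0' + (b >> 4) - 1));
            int low = b & 0x0F;
            if (low == 0) break;                    // odd digit count: empty low nibble
            sb.Append((char)('0' + low - 1));
        }
        return sb.ToString();
    }
}
```

With this sketch, `ValuePackingSketch.Pack("0102")` yields the bytes 0x12, 0x13, 0x00, and `Unpack` restores the original digit string. Adding 1 to every digit is what keeps a stored value of 0 distinguishable from the terminator.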
Methods
Name | Description |
---|---|
AddClass(String) | Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. A class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lowercase characters; in that case, a class for the letter 'a', for example, should be defined as "aA", with the first character being the normalization character. |
AddException(String, IList<Object>) | Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions. |
AddPattern(String, String) | Add a pattern to the tree. Mainly intended for use by the PatternParser class as a callback to add a pattern to the tree. |
FindPattern(String) | |
GetValues(Int32) | |
HStrCmp(Char[], Int32, Char[], Int32) | String comparison; returns 0 if the strings are equal or if t is a substring of s. |
Hyphenate(Char[], Int32, Int32, Int32, Int32) | Hyphenate word and return an array of hyphenation points. |
Hyphenate(String, Int32, Int32) | Hyphenate word and return a Hyphenation object. |
LoadPatterns(FileInfo) | Read hyphenation patterns from an XML file. |
LoadPatterns(FileInfo, Encoding) | Read hyphenation patterns from an XML file. |
LoadPatterns(Stream) | Read hyphenation patterns from an XML file. |
LoadPatterns(Stream, Encoding) | Read hyphenation patterns from an XML file. |
LoadPatterns(String) | Read hyphenation patterns from an XML file. |
LoadPatterns(String, Encoding) | Read hyphenation patterns from an XML file. |
LoadPatterns(XmlReader) | Read hyphenation patterns from a System.Xml.XmlReader. |
PackValues(String) | Packs the values by storing each one in 4 bits, two values to a byte. Values range from 0 to 9. Zero is used as the terminator, so 1 is added to each value before it is stored. |
SearchPatterns(Char[], Int32, Byte[]) | Search for all possible partial matches of the word starting at index and update the interletter values. Conceptually, this is equivalent to testing every stored pattern against the part of the word that begins at index and updating the interletter values for each pattern that matches there. It is done efficiently, however, because the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table. |
UnpackValues(Int32) |
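The brute-force search that the SearchPatterns description contrasts with the ternary tree can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class `NaiveSearchSketch`, the dictionary of patterns, and the rule of keeping the larger value at each position are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; NaiveSearchSketch is a hypothetical name,
// not part of Lucene.NET.
public static class NaiveSearchSketch
{
    // patterns maps the letters of a pattern to its interletter values
    // (assumed here to hold one more entry than the pattern has letters).
    // il receives the interletter values for the whole word.
    public static void SearchPatterns(
        char[] word, int index, byte[] il, IDictionary<string, byte[]> patterns)
    {
        string rest = new string(word, index, word.Length - index);

        // Test every single pattern: the cost the ternary tree avoids.
        foreach (KeyValuePair<string, byte[]> entry in patterns)
        {
            if (!rest.StartsWith(entry.Key, StringComparison.Ordinal))
            {
                continue;
            }

            byte[] values = entry.Value;
            for (int k = 0; k < values.Length && index + k < il.Length; k++)
            {
                // Assumption: the larger value wins at each position.
                if (values[k] > il[index + k])
                {
                    il[index + k] = values[k];
                }
            }
        }
    }
}
```

A caller would invoke this once per starting index of the word, so the total work grows with the word length multiplied by the number of patterns. The tree-based search replaces the inner loop over all patterns with a single walk down the tree for each starting index.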