Class UnicodeUtil

Class to encode .NET's UTF16 char[] into UTF8 byte[] without always allocating a new byte[] as of does.

@lucene.internal
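The description above is about encoding into a caller-owned buffer that is reallocated only when it is too small. The C# sketch below illustrates that idea under stated assumptions. It is not the Lucene.NET implementation. `Utf8Buffer` is a hypothetical stand-in for `BytesRef` and has no `Offset`. `Utf16ToUtf8Sketch.Encode` is also a hypothetical name. The buffer is sized for a worst case of 3 UTF-8 bytes per UTF-16 char; a surrogate pair is 2 chars and encodes to 4 bytes, so it stays within that bound. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```csharp
// Hypothetical stand-in for a reusable byte buffer (similar in spirit to BytesRef).
public sealed class Utf8Buffer
{
    public byte[] Bytes = new byte[16];
    public int Length;
}

public static class Utf16ToUtf8Sketch
{
    // Encodes source[offset .. offset + length) as UTF-8 into result,
    // reusing result.Bytes whenever it is already large enough.
    public static void Encode(char[] source, int offset, int length, Utf8Buffer result)
    {
        int end = offset + length;
        int maxLen = length * 3; // worst case: 3 bytes per UTF-16 char
        if (result.Bytes.Length < maxLen)
            result.Bytes = new byte[maxLen]; // allocate only when the buffer is too small
        byte[] outBytes = result.Bytes;
        int upto = 0;
        for (int i = offset; i < end; i++)
        {
            int code = source[i];
            if (code < 0x80)
            {
                outBytes[upto++] = (byte)code; // 1 byte: 0xxxxxxx
            }
            else if (code < 0x800)
            {
                outBytes[upto++] = (byte)(0xC0 | (code >> 6)); // 2 bytes: 110xxxxx 10xxxxxx
                outBytes[upto++] = (byte)(0x80 | (code & 0x3F));
            }
            else if (code < 0xD800 || code > 0xDFFF)
            {
                outBytes[upto++] = (byte)(0xE0 | (code >> 12)); // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                outBytes[upto++] = (byte)(0x80 | ((code >> 6) & 0x3F));
                outBytes[upto++] = (byte)(0x80 | (code & 0x3F));
            }
            else
            {
                // Surrogate: a high surrogate followed by a low surrogate forms one code point.
                if (code < 0xDC00 && i + 1 < end)
                {
                    int low = source[i + 1];
                    if (low >= 0xDC00 && low <= 0xDFFF)
                    {
                        int cp = ((code - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
                        i++; // consume the low surrogate
                        outBytes[upto++] = (byte)(0xF0 | (cp >> 18)); // 4 bytes
                        outBytes[upto++] = (byte)(0x80 | ((cp >> 12) & 0x3F));
                        outBytes[upto++] = (byte)(0x80 | ((cp >> 6) & 0x3F));
                        outBytes[upto++] = (byte)(0x80 | (cp & 0x3F));
                        continue;
                    }
                }
                // Unpaired surrogate: this sketch emits U+FFFD (EF BF BD).
                outBytes[upto++] = 0xEF;
                outBytes[upto++] = 0xBF;
                outBytes[upto++] = 0xBD;
            }
        }
        result.Length = upto;
    }
}
```

Successive calls on input that fits reuse the same array, and only `Length` changes between calls. Any bytes beyond `Length` left over from an earlier call should be ignored.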

Inheritance

System.Object

UnicodeUtil

Assembly: DistributedLucene.Net.dll

Syntax

public static class UnicodeUtil : object

Fields

Name	Description
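BIG_TERM	A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms (e.g. collation keys) one would normally encounter, and definitely bigger than any UTF-8 term. WARNING: this is not a valid UTF-8 term.
UNI_REPLACEMENT_CHAR	The Unicode replacement character, U+FFFD.
UNI_SUR_HIGH_END	Last code unit of the high-surrogate range (0xDBFF).
UNI_SUR_HIGH_START	First code unit of the high-surrogate range (0xD800).
UNI_SUR_LOW_END	Last code unit of the low-surrogate range (0xDFFF).
UNI_SUR_LOW_START	First code unit of the low-surrogate range (0xDC00).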
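The surrogate constants mark the UTF-16 surrogate ranges defined by Unicode. High surrogates run from 0xD800 to 0xDBFF, and low surrogates run from 0xDC00 to 0xDFFF. The minimal sketch below redeclares these values locally and does not reference Lucene.NET. It shows two common uses: combining a high/low pair into a supplementary code point, and detecting unpaired surrogates. `SurrogateSketch`, `ToCodePoint`, and `HasNoUnpairedSurrogates` are hypothetical names. The check is not claimed to match the behavior of the `ValidUTF16String` overloads.

```csharp
public static class SurrogateSketch
{
    // Local copies of the standard Unicode values (not the Lucene.NET fields).
    public const int UNI_SUR_HIGH_START = 0xD800;
    public const int UNI_SUR_HIGH_END = 0xDBFF;
    public const int UNI_SUR_LOW_START = 0xDC00;
    public const int UNI_SUR_LOW_END = 0xDFFF;
    public const int UNI_REPLACEMENT_CHAR = 0xFFFD;

    // Combines a high/low surrogate pair into a code point in U+10000..U+10FFFF.
    public static int ToCodePoint(char high, char low)
    {
        return ((high - UNI_SUR_HIGH_START) << 10) + (low - UNI_SUR_LOW_START) + 0x10000;
    }

    // Returns true if every high surrogate is immediately followed by a low surrogate
    // and no low surrogate appears on its own.
    public static bool HasNoUnpairedSurrogates(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            int ch = s[i];
            if (ch >= UNI_SUR_HIGH_START && ch <= UNI_SUR_HIGH_END)
            {
                if (i + 1 < s.Length && s[i + 1] >= UNI_SUR_LOW_START && s[i + 1] <= UNI_SUR_LOW_END)
                    i++; // skip the low half of a valid pair
                else
                    return false; // high surrogate without a following low surrogate
            }
            else if (ch >= UNI_SUR_LOW_START && ch <= UNI_SUR_LOW_END)
            {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }
}
```

The combining formula is the same arithmetic performed by `char.ConvertToUtf32(char, char)`, so you can use that method to cross-check the results.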

Methods

Name	Description
CodePointCount(BytesRef)	Returns the number of code points in this UTF-8 sequence. This method assumes valid UTF-8 input and does not perform full UTF-8 validation: it checks only the first byte of each code point, and for multi-byte sequences any bytes after the lead byte are skipped.
NewString(Int32[], Int32, Int32)	Covers the JDK 1.5 API. Creates a String from an array of `codePoints`.
ToCharArray(Int32[], Int32, Int32)	Generates a char array that represents the provided input code points. LUCENENET specific.
ToHexString(String)
UTF16toUTF8(ICharSequence, Int32, Int32, BytesRef)	Encode characters from this ICharSequence, starting at `offset` for `length` characters. After encoding, `result.Offset` will always be 0.
UTF16toUTF8(Char[], Int32, Int32, BytesRef)	Encode characters from a char[] `source`, starting at `offset` for `length` chars. After encoding, `result.Offset` will always be 0.
UTF16toUTF8(String, Int32, Int32, BytesRef)	Encode characters from this string, starting at `offset` for `length` characters. After encoding, `result.Offset` will always be 0. LUCENENET specific.
UTF8toUTF16(BytesRef, CharsRef)	Utility method for UTF8toUTF16(Byte[], Int32, Int32, CharsRef)
UTF8toUTF16(Byte[], Int32, Int32, CharsRef)	Interprets the given byte array as UTF-8 and converts it to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case, in which each byte becomes one UTF-16 code unit. NOTE: Full characters are read, even if this reads past the length passed, which can result in an exception if invalid UTF-8 is passed. Explicit checks for valid UTF-8 are not performed.
UTF8toUTF32(BytesRef, Int32sRef)	Converts UTF-8 bytes to UTF-32 code points. This method assumes valid UTF-8 input and does not perform full UTF-8 validation: it checks only the first byte of each code point, and for multi-byte sequences the bytes after the lead byte are not validated.
ValidUTF16String(ICharSequence)
ValidUTF16String(StringBuilder)
ValidUTF16String(Char[], Int32)
ValidUTF16String(String)
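`CodePointCount` and `UTF8toUTF32` use a "lead byte only" strategy. This works because the first byte of a well-formed UTF-8 sequence encodes the sequence length: 0xxxxxxx, 110xxxxx, 1110xxxx, or 11110xxx. The minimal sketch below is written under that assumption. `Utf8Sketch` and its methods are hypothetical. They operate on a plain `byte[]` rather than `BytesRef`/`Int32sRef`. Like the methods described above, they do not validate continuation bytes, so on malformed input the decoder may read past `length` or throw an `IndexOutOfRangeException`.

```csharp
using System;

public static class Utf8Sketch
{
    // Length of the UTF-8 sequence introduced by lead byte b (assumes valid input).
    private static int SequenceLength(int b)
    {
        if (b < 0x80) return 1;  // 0xxxxxxx
        if (b >= 0xF0) return 4; // 11110xxx
        if (b >= 0xE0) return 3; // 1110xxxx
        if (b >= 0xC0) return 2; // 110xxxxx
        throw new ArgumentException("unexpected continuation byte: 0x" + b.ToString("X2"));
    }

    // Counts code points by reading only lead bytes; continuation bytes are skipped unchecked.
    public static int CodePointCount(byte[] utf8, int offset, int length)
    {
        int count = 0;
        int pos = offset, end = offset + length;
        while (pos < end)
        {
            pos += SequenceLength(utf8[pos]);
            count++;
        }
        return count;
    }

    // Decodes UTF-8 to UTF-32 code points, trusting each lead byte for the sequence length.
    public static int[] ToUtf32(byte[] utf8, int offset, int length)
    {
        var result = new int[length]; // worst case: one code point per byte
        int upto = 0, pos = offset, end = offset + length;
        while (pos < end)
        {
            int b = utf8[pos];
            int n = SequenceLength(b);
            int cp = n == 1 ? b : b & (0xFF >> (n + 1)); // payload bits of the lead byte
            for (int k = 1; k < n; k++)
                cp = (cp << 6) | (utf8[pos + k] & 0x3F); // 6 payload bits per continuation byte
            result[upto++] = cp;
            pos += n;
        }
        Array.Resize(ref result, upto); // trim to the number of decoded code points
        return result;
    }
}
```

The decoder allocates `length` ints up front because, in the worst case, every byte is its own code point. It then trims the array to the number of code points actually decoded.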
