Index Data in Distributed Lucene
Distributed Lucene enables efficient searching by indexing documents; the indexes are created and stored in a directory. The first step, therefore, is to initialize the directory that stores the indexes. Unlike in Lucene.NET, you open this directory with NCacheDirectory by specifying the cache name and the index name.
Note
This feature is also available in NCache Professional.
Note
The default path for the index directory is C:\ProgramData\ncache\lucene-index\{CACHE_NAME}\data for Windows, and /user/share/ncache/lucene-index/{CACHE_NAME}/data for Linux.
Once the directory is initialized, open an IndexWriter by passing it the NCacheDirectory instance and an IndexWriterConfig. The IndexWriter adds and indexes documents through the AddDocument method, using the same mechanism as Lucene.NET.
Note
To distribute documents among the cache servers, NCache automatically adds a TextField named DocKey, with an auto-generated GUID as its value, to any document for which the user has not specified one.
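If you would rather control a document's key than rely on the auto-generated GUID, you can add the key field yourself. The sketch below assumes, based on the note, that NCache looks for a field named exactly DocKey; `ProductDocuments` and `BuildWithDocKey` are hypothetical names used only for illustration.

```csharp
using Lucene.Net.Documents;

public static class ProductDocuments
{
    // Hypothetical helper: builds a product document that carries its own DocKey,
    // so that NCache does not need to generate a GUID for it.
    // Assumption: the distribution field is named exactly "DocKey", as the note describes.
    public static Document BuildWithDocKey(string docKey, string productId, string productName)
    {
        var doc = new Document();

        // User-specified key, added as a TextField like the one NCache would add itself
        doc.Add(new TextField("DocKey", docKey, Field.Store.YES));

        // Regular searchable fields
        doc.Add(new TextField("ProductID", productId, Field.Store.YES));
        doc.Add(new TextField("ProductName", productName, Field.Store.NO));
        return doc;
    }
}
```

A key derived from your own data (for example, the product ID) is easier to trace than a random GUID; check the NCache API reference to confirm how the key value affects distribution.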
All write operations on the indexes are synchronous: control returns to the caller only after the operation has completed. This keeps the data in Distributed Lucene consistent.
Important
After any write operation on the writer, call Commit to save the changes to the directory. If Commit is not called, the write operations are not saved, and searches may not return the added documents.
Unlike Lucene.NET, NCache lets you open multiple writers on the same directory for parallel indexing. After documents are written, call IndexWriter.Commit to persist them and make them searchable. When you have finished indexing with an IndexWriter, call IndexWriter.Dispose to free the resources it holds.
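Because NCache allows several writers on one directory, document batches can be indexed concurrently. The following is a minimal sketch under stated assumptions: `ParallelIndexing` and `IndexInParallel` are hypothetical names, and the directory parameter is typed as the Lucene `Directory` base class, on the assumption that NCacheDirectory derives from it (IndexWriter's constructor takes a `Directory`). Each task opens its own IndexWriter, commits its batch, and disposes the writer, as described in the paragraph above.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Util;

public static class ParallelIndexing
{
    // Hypothetical helper: indexes each batch with its own IndexWriter.
    // Assumption: 'directory' is an NCacheDirectory passed as its Lucene Directory base type.
    // Several concurrent writers on one directory rely on NCache; with plain Lucene.NET,
    // a second writer would fail to obtain the directory's write lock.
    public static void IndexInParallel(Lucene.Net.Store.Directory directory,
                                       IEnumerable<IList<Document>> batches)
    {
        Parallel.ForEach(batches, batch =>
        {
            using (Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48))
            using (var writer = new IndexWriter(directory,
                       new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
            {
                foreach (Document doc in batch)
                {
                    writer.AddDocument(doc);
                }

                // Persist this writer's documents and make them searchable
                writer.Commit();
            } // The writer, then the analyzer, is disposed here, even if an exception is thrown
        });
    }
}
```

With an open NCacheDirectory, a call might look like `ParallelIndexing.IndexInParallel(ncacheDirectory, batches);`, where `batches` is your own partitioning of the documents. Giving each task its own writer keeps every Commit and Dispose scoped to a single batch.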
Tip
At any point, you can check whether write operations have completed through the NCache-provided Boolean property IndexWriter.OperationsCompleted.
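If you need to block until the writer reports completion, a small polling helper is one option. This is a sketch: `Polling.WaitUntil` is a hypothetical helper, and the timeout and poll interval are arbitrary values you would tune. It takes the condition as a delegate, so it does not depend on any NCache type.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Hypothetical helper: re-checks 'condition' every 'pollInterval' until it
    // returns true (result: true) or 'timeout' elapses first (result: false).
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

For example, `Polling.WaitUntil(() => indexWriter.OperationsCompleted, TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(100))` returns false if the writer has not reported completion within five seconds.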
Prerequisites
- To learn about the standard prerequisites required to work with all NCache client-side features, refer to Client-Side API Prerequisites.
- Make sure that you have created and started a Lucene cache through the NCache Management Center or Command Line Interface.
- Make sure that Client Notifications are enabled.
- Make sure that your application is not using any native Lucene DLL/Reference.
- For API details, refer to: NCacheDirectory, Analyzer, WhitespaceAnalyzer, IndexWriter, IndexWriterConfig, Commit, AddDocument, Document.
Indexing Data
The following example first opens an NCacheDirectory with the cache name LuceneCache and the index name ProductIndex. It then opens an IndexWriter on that directory, using a WhitespaceAnalyzer to analyze the data, adds the product documents, commits them, and finally disposes of the writer and the directory.
Warning
Index write operations are not allowed in case of a partial cluster.
// Lucene.Net namespaces used below; NCacheDirectory is provided by the NCache Distributed Lucene package
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Specify the Distributed Lucene cache name
string cache = "LuceneCache";
// Specify the name of the index to create
string indexName = "ProductIndex";

NCacheDirectory ncacheDirectory = null;
IndexWriter indexWriter = null;
try
{
    // Initializing
    // Open a directory on the cache for the given index name
    ncacheDirectory = NCacheDirectory.Open(cache, indexName);

    // Specify the analyzer used to analyze the data
    Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);

    // Create an IndexWriterConfig, which holds the configuration used to create the writer
    IndexWriterConfig config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);

    // Create an instance of the writer
    indexWriter = new IndexWriter(ncacheDirectory, config);

    // Indexing
    // Fetch the product information to be indexed
    Product[] products = FetchProductsFromDB();
    foreach (var prod in products)
    {
        // Create a document and add fields to it
        Document doc = new Document();
        doc.Add(new TextField("ProductID", prod.ProductID.ToString(), Field.Store.YES));
        doc.Add(new TextField("ProductName", prod.ProductName, Field.Store.NO));
        doc.Add(new TextField("Category", prod.Category, Field.Store.YES));
        doc.Add(new TextField("Description", prod.Description, Field.Store.YES));

        // Add the document to the index through the writer
        indexWriter.AddDocument(doc);
    }

    // Calling Commit on the writer saves all the write operations
    indexWriter.Commit();

    if (indexWriter.OperationsCompleted)
    {
        // All write operations are complete
        // Can proceed with other operations such as querying
    }
}
finally
{
    // Dispose the indexWriter after indexing, even if an exception was thrown
    indexWriter?.Dispose();
    // Dispose the ncacheDirectory
    ncacheDirectory?.Dispose();
}
Note
Call Dispose on every writer instance once you are done with it; otherwise, the writer stays in memory and causes memory leaks.
Additional Resources
NCache provides a sample application for Distributed Lucene on GitHub.
See Also
.NET: Lucene.Net.Index namespace.