Distributed Lucene Facets
In faceted search, you get the facet results, that consist of the subcategories for certain categories. NCache now supports Facets with the Distributed Lucene, which aids you in searching the desired documents efficiently and effectively.
The Distributed Lucene supports all of the facet types supported by Lucene. These facets are categorized into the following three types:
- Taxonomy-Based Facets: These facets are defined using a hierarchy of categories, known as taxonomy.
- Range-Based Facets: These facets are based on ranges into which a
Numeric
orDateTime
field falls. - Sorted Set Facets: These facets do not require a separate taxonomy index and compute the counts based on the sorted set document values fields instead.
These facet types are explained below with their respective code examples.
Prerequisites
- To learn about the standard prerequisites required to work with all NCache client-side features, please refer to the given page on Client-Side API Prerequisites.
- Make sure that you have created and started a Lucene cache through the NCache Management Center or Command Line Interface.
- Make sure that your application is not using any native Lucene DLL/Reference.
- For API details, refer to: NCacheDirectory, FacetsCollector, DirectoryTaxonomyWriter, TaxoWriter, DirectoryTaxonomyReader, IndexWriterConfig, LuceneVersion, StandardAnalyzer, IndexWriter, FacetsConfig, IndexSearcher, FastTaxonomyFacetCounts, FacetResult, GetTopChildren, GetSpecificValue, GetAllDims, Dispose, NumericDocValuesField, SortedSetDocValuesFacetCounts, DefaultSortedSetDocValuesReaderState.
Taxonomy Based Facets
The DirectoryTaxonomyWriter class allows you to create and open a taxonomy writer. Open the taxonomy directory first inside a taxonomy directory, and the DirectoryTaxonomyReader class allows you to open a taxonomy reader on a taxonomy writer.
Note
The Distributed Lucene supports all the taxonomy-based facets supported by the Lucene.
In the following example, facets are created with the FastTaxonomoyFacetCounts class. They are then configured, and added to the documents in the form of the facet fields. Later, the results are retrieved and verified with the help of the FacetResult class.
FacetsCollector facetCollector = null;
Facets facets = null;
DirectoryTaxonomyWriter taxoWriter = null;
Directory directory = null;
Directory taxoDir = null;
DirectoryTaxonomyReader taxoReader = null;
// Precondition: Lucene cache is already connected
string cache = "luceneCache";
string index = "luceneIndex";
string taxoIndex = "taxonomyIndex";
// Open NCache directory
directory = NCacheDirectory.Open(cache, index);
// Open taxonomy directory
taxoDir = NCacheDirectory.Open(cache, taxoIndex);
// Initialize taxonomy writer
taxoWriter = new DirectoryTaxonomyWriter(taxoDir, OpenMode.CREATE);
// Configure index writer and initialize it
var indexWriterConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, new StandardAnalyzer(LuceneVersion.LUCENE_48));
IndexWriter writer = new IndexWriter(directory, indexWriterConfig);
// Configure facets
FacetsConfig config = new FacetsConfig(cache);
config.SetHierarchical("Publish Date", true);
// Initialize document and add facet fields to it
Document document = new Document();
document.Add(new FacetField("Author", "Bob"));
document.Add(new FacetField("Publish Date", "2010", "10", "15"));
// Add document with facets config and taxonomy writer
writer.AddDocument(document, config, taxoWriter);
// Initialize index searcher
IndexSearcher indexSearcher = new IndexSearcher(writer.GetReader(true));
// Initialize taxonomy reader
taxoReader = new DirectoryTaxonomyReader(taxoWriter);
// Initialize facets collector
facetCollector = new FacetsCollector(false, cache);
// Search docs that match query and collect facets with facets collector
indexSearcher.Search(new MatchAllDocsQuery(), facetCollector);
// Initialize facets
facets = new FastTaxonomyFacetCounts(taxoReader, config, facetCollector);
// Retrieve & verify results:
FacetResult facetResult = facets.GetTopChildren(10, "Publish Date");
float Val = facets.GetSpecificValue("Publish Date");
IList<FacetResult> facetResults = facets.GetAllDims(10);
// Dispose all instances
facetCollector?.Dispose();
facets?.Dispose();
taxoWriter?.Dispose();
taxoReader?.Dispose();
taxoDir?.Dispose();
directory?.Dispose();
Range Based Facets
In the following example, the numeric facets are created using the Int64RangeFacetCounts class. These facets are then added to the document via the IndexWriter
. Documents are searched based on these facets, and the results are verified afterward.
Note
The Distributed Lucene supports all the range-based facets supported by the Lucene.
Directory directory = null;
IndexWriter writer = null;
IndexReader reader = null;
FacetsCollector facetCollector = null;
IndexSearcher indexSearcher = null;
Facets facets = null;
string cache = "luceneCache";
string index = "luceneIndex";
// Open NCache directory
directory = NCacheDirectory.Open(cache, index);
// Configure index writer and initialize it
var indexWriterConfig = new IndexWriterConfig(LuceneVersion.LUCENE_48, new StandardAnalyzer(LuceneVersion.LUCENE_48));
writer = new IndexWriter(directory, indexWriterConfig);
// Create a document
Document document = new Document();
// Create and add numeric fields to the document
NumericDocValuesField field = new NumericDocValuesField
("field", 0L);
document.Add(field);
for (long l = 0; l < 100; l++)
{
field.SetInt64Value(l);
writer.AddDocument(document);
}
// Also add Long.MAX_VALUE
field.SetInt64Value(long.MaxValue);
writer.AddDocument(document);
// Initialize index reader
reader = writer.GetReader(true);
// Initialize facets collector
facetCollector = new FacetsCollector(false, cache);
// Initialize index searcher
indexSearcher = new IndexSearcher(reader);
// Search docs that match query and collect facets with facets collector
indexSearcher.Search(new MatchAllDocsQuery(), facetCollector);
// Intialize facets
facets = new Int64RangeFacetCounts("field",
facetCollector, new Int64Range("less than 10", 0L, true, 10L, false),
new Int64Range("less than or equal to 10", 0L, true, 10L, true),
new Int64Range("over 90", 90L, false, 100L, false),
new Int64Range("90 or above", 90L, true, 100L, false),
new Int64Range("over 1000", 1000L,
false, long.MaxValue, true));
// Retrieve & verify results:
FacetResult facetResult = facets.GetTopChildren(10, "field");
finally
{
// Dispose all instances
facets?.Dispose();
indexSearcher?.Dispose();
facetCollector?.Dispose();
reader?.Dispose();
writer?.Dispose();
directory?.Dispose();
}
Sorted Set Facets
In the following example, the Sorted Set Facets are created with the SortedSetDocValuesFacetCounts
class. These facets are then added to the document, then searched, and later, the results are verified.
Directory directory = null;
IndexWriter writer = null;
IndexSearcher indexSearcher = null;
FacetsCollector facetCollector = null;
SortedSetDocValuesFacetCounts facets = null;
string cache = "luceneCache";
string index = "normalIndex";
// Open NCache directory
directory = NCacheDirectory.Open(cache, index);
// Initialize facets config
FacetsConfig config = new FacetsConfig(cache);
config.SetMultiValued("a", true);
// Initialize index writer
var indexWriterConfig = new IndexWriterConfig
(LuceneVersion.LUCENE_48, new StandardAnalyzer
(LuceneVersion.LUCENE_48));
writer = new IndexWriter(directory, indexWriterConfig);
// Create a document and add fields
Document document = new Document();
document.Add(new SortedSetDocValuesFacetField("a", "foo"));
document.Add(new SortedSetDocValuesFacetField("b", "foo"));
// Add sorted set facet field to document with facet config
writer.AddDocument(document, config);
// Commit writer
writer.Commit();
// Initialize index searcher
indexSearcher = new IndexSearcher(writer.GetReader(true));
// Per-top-reader state:
SortedSetDocValuesReaderState state = new
DefaultSortedSetDocValuesReaderState(indexSearcher.IndexReader);
// Initialize facets collector
facetCollector = new FacetsCollector(false, cache);
// Searcher docs that match query and collect facets with facet collector
indexSearcher.Search(new MatchAllDocsQuery(), facetCollector);
facets = new SortedSetDocValuesFacetCounts(state,
facetCollector);
// Retrieve & verify results:
FacetResult facetResult = facets.GetTopChildren(10, "a");
IList<FacetResult> facetResults = facets.GetAllDims(10);
finally
{
// Dipose all instances
facets?.Dispose();
facetCollector?.Dispose();
indexSearcher?.Dispose();
writer?.Dispose();
directory?.Dispose();
}
Warning
Not disposing of the Facets
, FacetCollector
, NCacheDirectory
, IndexReader
, IndexWriter
, and DirectoryTaxonomyReader
will lead to extra memory consumption on the server-side.
Additional Resources
NCache also provides a sample application for the Facets in the Distributed Lucene on GitHub
See Also
.NET: Lucene.Net.Facet namespace.