Aggregator (MapReduce) Components and Working
Note
This feature is only available in NCache Enterprise Edition.
The Aggregator is built on the MapReduce framework and processes distributed data records to return compiled, statistical results. As the name indicates, the Aggregator performs an operation over a range of values to produce a small number of results for analytical purposes. In other words, NCache’s built-in Aggregator is essentially a MapReduce task with a broader set of aggregation operations. Users can also implement their own Aggregator if required.
The Aggregator converts input data from multiple sources into meaningful key-value pairs and can perform a variety of mathematical operations, such as summing values, calculating averages, and finding minimum or maximum values, to return a single result.
How does the Aggregator Work?
The Aggregator has the following components:
ValueExtractor
This component extracts the meaningful attributes from the given object, similar to the Mapper in the MapReduce Framework.
Aggregator
The actual grouping and analytical operations take place here, as in the Combiner and Reducer of MapReduce. The following operations are supported in the built-in Aggregator of NCache, the BuiltinAggregator:
Operation | Description | Supported Data Types |
---|---|---|
AVG | Returns the average of the given data present in the cache. The data is returned cumulatively from all the nodes in the cluster. | Integer, Double, Float, BigInteger, Long, Short, Decimal |
SUM | Returns the sum of the value of the item in the data set. | - |
MIN | Returns the least value of the item in the data set. | Integer, Double, Float, BigInteger, Long, Short, Decimal, String, DateTime |
MAX | Returns the maximum value of the item in the data set. | Integer, Double, Float, BigInteger, Long, Short, Decimal, String, DateTime |
COUNT | Returns the total number of occurrences of the item in the data set. | - |
DISTINCT | Returns the unique occurrence of the item in the data set. | - |
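The operations in the table can be pictured with a small, language-neutral model. The sketch below is written in Python and does not use any NCache API: the `nodes` data and the `partial` and `merge` helpers are hypothetical names introduced only for illustration. It assumes each node first computes a partial result (the Combiner-like step) and that the partials are then merged cluster-wide (the Reducer-like step), which is why AVG is derived from the merged sum and count rather than from per-node averages.

```python
# Hypothetical data: numeric values held on three cache nodes.
# Assumes every node holds at least one value.
nodes = [[10, 20, 30], [20, 40], [50]]


def partial(values):
    """Per-node partial result (the Combiner-like step)."""
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "distinct": set(values),
    }


def merge(partials):
    """Cluster-wide result built from the per-node partials (the Reducer-like step)."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {
        "SUM": total,
        "COUNT": count,
        # The average comes from the merged sum and count,
        # not from averaging the per-node averages.
        "AVG": total / count,
        "MIN": min(p["min"] for p in partials),
        "MAX": max(p["max"] for p in partials),
        "DISTINCT": sorted(set().union(*(p["distinct"] for p in partials))),
    }


result = merge([partial(values) for values in nodes])
```

Keeping sum and count separate until the final merge is the design choice that makes the average correct when nodes hold different numbers of items.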
If the Aggregator’s MapReduce task fails because of an exception, an exception reporting the task failure is thrown.
If the result of an Aggregator execution is null, the built-in Aggregator returns its default value for that data type.
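The two behaviours described here, failure surfacing as an exception and a null result falling back to a type default, can be modelled roughly as follows. This is a conceptual Python sketch, not NCache code: `AggregatorTaskError`, `DEFAULTS`, and `run_aggregation` are hypothetical names, and the default values shown are placeholders, since the actual defaults NCache uses are not stated in this section.

```python
class AggregatorTaskError(Exception):
    """Hypothetical exception standing in for the task-failure exception."""


# Placeholder per-type defaults (assumption, not NCache's documented values).
DEFAULTS = {int: 0, float: 0.0}


def run_aggregation(values, operation, value_type):
    """Apply `operation` to `values`, modelling failure and null handling."""
    try:
        # An empty input is treated here as producing a null result.
        result = operation(values) if values else None
    except Exception as exc:
        # Any failure inside the task is reported as a task-failure exception.
        raise AggregatorTaskError("aggregation task failed") from exc
    if result is None:
        # A null result falls back to the default for the data type.
        return DEFAULTS[value_type]
    return result


largest = run_aggregation([3, 7, 5], max, int)
fallback = run_aggregation([], max, int)
```

Calling `run_aggregation([1, "a"], sum, int)` raises `AggregatorTaskError`, because adding an integer to a string fails inside the operation.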
Apart from these built-in operations, users can provide their own aggregations, such as Mean, Median, or Mode. These are logical statistical functions, and users can create as many variants of aggregation as they need.
Users can also supply their own data types, such as custom objects, because the MapReduce task takes values of type Object.
NCache also provides the built-in implementations the Aggregator needs to work with the types listed above. However, users who want to use the Aggregator with custom types and their own aggregation logic can do so by implementing two interfaces: IValueExtractor and IAggregator.
An implementation of the IValueExtractor interface must contain at least the Extract() method, which the internal framework uses to identify the type of an instance by returning the type of the object passed to it. The IAggregator interface, on the other hand, declares two methods to implement: Aggregate() and AggregateAll(). The job of the Value Extractor is to return filtered data to the Aggregator. The Aggregator then works on the given data set to produce more refined data.
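The division of labour between the two interfaces can be sketched as below. This is a conceptual Python model rather than the NCache interfaces themselves: the classes mirror the names IValueExtractor and IAggregator, but `Product`, `PriceExtractor`, `MeanAggregator`, and `run` are hypothetical. It also assumes that the first method aggregates the values held on one node and the second merges those intermediate results, which this section does not spell out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical custom object stored in the cache."""
    name: str
    unit_price: float


class ValueExtractor(ABC):
    """Models IValueExtractor: pulls the meaningful attribute from an object."""
    @abstractmethod
    def extract(self, value): ...


class Aggregator(ABC):
    """Models IAggregator with its two methods."""
    @abstractmethod
    def aggregate(self, values): ...

    @abstractmethod
    def aggregate_all(self, partials): ...


class PriceExtractor(ValueExtractor):
    def extract(self, value):
        return value.unit_price


class MeanAggregator(Aggregator):
    def aggregate(self, values):
        # Node-level step (assumed): keep sum and count so results can be merged.
        values = list(values)
        return (sum(values), len(values))

    def aggregate_all(self, partials):
        # Cluster-level step (assumed): merge the intermediate results.
        partials = list(partials)
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else None


def run(nodes, extractor, aggregator):
    """Extract on each node, aggregate locally, then merge all partials."""
    partials = [
        aggregator.aggregate(extractor.extract(item) for item in node)
        for node in nodes
    ]
    return aggregator.aggregate_all(partials)


nodes = [[Product("pen", 2.0), Product("ink", 4.0)], [Product("pad", 6.0)]]
mean_price = run(nodes, PriceExtractor(), MeanAggregator())
```

A mean works well in this two-step shape because sums and counts merge cleanly; an aggregation such as Median would need each node to pass along more of its data.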
See Also
Implement and Use Aggregator
MapReduce
WAN Replication across Multi Datacenters through Bridge
Deploy Providers