Aggregator (MapReduce) Components and Working [Deprecated]
The Aggregator is built on the MapReduce framework and processes distributed data records to return compiled, statistical results. As the name indicates, it performs an operation over a range of values to produce a small number of results for analytical purposes. In other words, NCache’s built-in Aggregator is essentially a MapReduce Task with a broader set of aggregation operations. Users can also implement their own Aggregator if required.
The Aggregator converts the input data from multiple sources into meaningful key-value pairs and can perform mathematical operations such as summing values, calculating averages, and finding minimum or maximum values to return a single result.
How Does the Aggregator Work?
The Aggregator has the following components:
ValueExtractor
This component extracts the meaningful attributes from the given object, similar to the Mapper in the MapReduce Framework.
Aggregator
The actual grouping and analytical operations take place here, as in the Combiner and Reducer of MapReduce. The built-in Aggregator of NCache, the BuiltinAggregator, supports the following operations:
Operation | Description | Supported Data Types |
---|---|---|
AVG | Returns the average of the given data present in the cache. The data is returned cumulatively from all the nodes in the cluster. | Integer, Double, Float, BigInteger, Long, Short, Decimal |
SUM | Returns the sum of the value of the item in the data set. | - |
MIN | Returns the least value of the item in the data set. | Integer, Double, Float, BigInteger, Long, Short, Decimal, String, DateTime |
MAX | Returns the maximum value of the item in the data set. | Integer, Double, Float, BigInteger, Long, Short, Decimal, String, DateTime |
COUNT | Returns the total number of occurrences of the item in the data set. | - |
DISTINCT | Returns the unique occurrence of the item in the data set. | - |
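To make the semantics of these operations concrete, the following minimal sketch applies each of them to a small in-memory list using only the Java standard library. It does not use the NCache API. The class name `AggregationOps` and the sample data are hypothetical, and the sketch ignores distribution across cluster nodes.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: plain-Java equivalents of the listed operations.
// This is NOT the NCache BuiltinAggregator; the names here are assumptions.
class AggregationOps {
    static double avg(List<Integer> values) {
        // Average of all values; 0.0 is returned for an empty list.
        return values.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    static long sum(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum();
    }

    static int min(List<Integer> values) {
        return Collections.min(values);
    }

    static int max(List<Integer> values) {
        return Collections.max(values);
    }

    static long count(List<Integer> values) {
        return values.size();
    }

    static Set<Integer> distinct(List<Integer> values) {
        // A TreeSet drops duplicates and keeps the result sorted.
        return new TreeSet<>(values);
    }

    public static void main(String[] args) {
        List<Integer> unitsSold = List.of(4, 8, 4, 15, 9);
        System.out.println("AVG      = " + avg(unitsSold));
        System.out.println("SUM      = " + sum(unitsSold));
        System.out.println("MIN      = " + min(unitsSold));
        System.out.println("MAX      = " + max(unitsSold));
        System.out.println("COUNT    = " + count(unitsSold));
        System.out.println("DISTINCT = " + distinct(unitsSold));
    }
}
```

Each operation reduces many values to a single result (or, for DISTINCT, to a smaller set), which is the behavior the table describes.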
If the Aggregator’s MapReduce Task fails because of an exception, an exception reporting the Task failure is thrown.
If the result returned after Aggregator execution is null, the default value of the built-in Aggregator for that data type is returned instead.
Apart from these built-in operations, users can also provide their own aggregations, such as Mean, Median, or Mode. These are logical statistical functions, and users can create as many variants of an aggregation as their needs require.
Users can also provide their own data types, such as custom objects, because the MapReduce Task accepts values of type Object.
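As an illustration of a user-defined aggregation, the following sketch computes a median in plain Java. It shows only the statistical logic that a custom aggregation might wrap. The class name `MedianAggregation` is hypothetical and nothing here is part of the NCache API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: the core logic of a custom "Median" aggregation.
class MedianAggregation {
    static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("median of an empty data set is undefined");
        }
        // Sort a copy so the caller's list is left untouched.
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        if (sorted.size() % 2 == 1) {
            return sorted.get(mid);                            // odd count: middle element
        }
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;  // even count: mean of the two middle elements
    }

    public static void main(String[] args) {
        System.out.println(median(List.of(3.0, 1.0, 2.0)));
        System.out.println(median(List.of(4.0, 1.0, 3.0, 2.0)));
    }
}
```

Unlike SUM or MAX, a median cannot generally be derived from per-node medians alone, so a distributed variant would likely need to collect the extracted values before the final computation.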
NCache also provides the built-in implementation that the Aggregator needs to work with the listed types in the package mentioned above. However, users who want to use the Aggregator with custom types and their own aggregation logic can do so by implementing two interfaces, IValueExtractor and IAggregator.
An implementation of the IValueExtractor interface contains at least the Extract() method, which the internal framework uses to identify the type of an instance by returning the type of the object passed to it. The IAggregator interface, on the other hand, declares two methods to be implemented: Aggregate() and AggregateAll(). The job of the Value Extractor is to return the filtered data to the Aggregator. The Aggregator then works on the given data set to produce more refined data.
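The division of work between the two interfaces can be sketched as follows. The `ValueExtractor` and `Aggregator` interfaces below are local, hypothetical stand-ins that only mirror the method names mentioned above. They are not the NCache types, and their exact signatures, the `Product` class, and the `run` driver are assumptions made for illustration. The sketch assumes that the first method reduces the values extracted on one node and the second combines the per-node results.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical stand-ins, not the NCache interfaces.
class CustomAggregatorSketch {
    interface ValueExtractor { Object extract(Object value); }
    interface Aggregator {
        Object aggregate(Object value);     // assumed: reduce the values from one node
        Object aggregateAll(Object value);  // assumed: combine the per-node results
    }

    // Hypothetical custom type stored in the cache.
    static final class Product {
        final String name;
        final double price;
        Product(String name, double price) { this.name = name; this.price = price; }
    }

    // Returns the price of Product instances and filters out everything else.
    static final class PriceExtractor implements ValueExtractor {
        public Object extract(Object value) {
            if (value instanceof Product) {
                return ((Product) value).price;
            }
            return null;
        }
    }

    // Finds the maximum price, first per node and then across nodes.
    static final class MaxAggregator implements Aggregator {
        public Object aggregate(Object value) { return maxOf((List<?>) value); }
        public Object aggregateAll(Object value) { return maxOf((List<?>) value); }

        private static Double maxOf(List<?> values) {
            Double best = null;
            for (Object v : values) {
                if (v instanceof Double) {  // also skips null partial results
                    double d = (Double) v;
                    if (best == null || d > best) best = d;
                }
            }
            return best;
        }
    }

    // Simulates the flow: extract on each node, aggregate locally, then combine.
    static Object run(List<List<Object>> nodes, ValueExtractor extractor, Aggregator aggregator) {
        List<Object> partials = new ArrayList<>();
        for (List<Object> node : nodes) {
            List<Object> extracted = new ArrayList<>();
            for (Object item : node) {
                Object v = extractor.extract(item);
                if (v != null) extracted.add(v);
            }
            partials.add(aggregator.aggregate(extracted));
        }
        return aggregator.aggregateAll(partials);
    }

    public static void main(String[] args) {
        List<List<Object>> nodes = List.of(
            List.<Object>of(new Product("pen", 1.5), "not a product"),
            List.<Object>of(new Product("desk", 120.0), new Product("lamp", 35.0)));
        System.out.println(run(nodes, new PriceExtractor(), new MaxAggregator()));
    }
}
```

Here both aggregation steps share the same logic because a maximum of maxima is still the overall maximum. For an operation such as an average, the per-node step would need to return a partial result (for example, a sum and a count) for the final step to combine.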
See Also
Implement and Use Aggregator
MapReduce
WAN Replication across Multi Datacenters through Bridge
Deploy Providers