Aggregator (MapReduce) Components and Working [Deprecated]

The Aggregator, built on the MapReduce framework, processes distributed data records to produce compiled and statistical results. It performs operations across a range of values, reducing a large dataset to a smaller, meaningful set of analytical results. In other words, NCache’s built-in Aggregator is a MapReduce Task with a broader set of aggregation operations. Users can also implement their own Aggregator if required.

The Aggregator converts input data from multiple sources into meaningful key-value pairs and can apply mathematical operations such as summing values, calculating averages, and finding minimum or maximum values to return a single result.

How Does the Aggregator Work?

The Aggregator has the following components:

ValueExtractor

This component extracts the meaningful attributes from the given object, similar to the Mapper in the MapReduce Framework.
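
The extraction step can be pictured with a minimal sketch. It is written in plain Python rather than against the NCache API, and the `Product` class, the `extract_unit_price` function, and the sample data are all hypothetical names introduced only for illustration. It assumes the extractor returns the attribute of interest and returns `None` for objects it does not recognize.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Product:
    # Hypothetical cached object; not an NCache type.
    name: str
    category: str
    unit_price: float


def extract_unit_price(value: Any) -> Optional[float]:
    """Return the attribute to aggregate, or None to skip the object."""
    if isinstance(value, Product):
        return value.unit_price
    return None


# Hypothetical cache contents, including one object of an unrelated type.
cached_values = [
    Product("Chai", "Beverages", 18.0),
    "not a product",
    Product("Tofu", "Produce", 23.25),
]

# Only the meaningful attribute of matching objects moves on to aggregation.
extracted = [p for p in map(extract_unit_price, cached_values) if p is not None]
```

Here `extracted` holds only the two unit prices, which is the kind of filtered input the aggregation step would then work on.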

Aggregator

The actual grouping and analytical operations take place here as in the Combiner and Reducer of MapReduce. The following operations are supported in the built-in Aggregator of NCache, the BuiltinAggregator:

Operation	Description	Supported Data Types
`AVG`	Returns the average of the given data present in the cache. The result is computed cumulatively across all the nodes in the cluster.	`Integer`, `Double`, `Float`, `BigInteger`, `Long`, `Short`, `Decimal`
`SUM`	Returns the sum of the values of the items in the data set.	-
`MIN`	Returns the minimum value of the items in the data set.	`Integer`, `Double`, `Float`, `BigInteger`, `Long`, `Short`, `Decimal`, `String`, `DateTime`
`MAX`	Returns the maximum value of the items in the data set.	`Integer`, `Double`, `Float`, `BigInteger`, `Long`, `Short`, `Decimal`, `String`, `DateTime`
`COUNT`	Returns the total number of occurrences of the items in the data set.	-
`DISTINCT`	Returns the unique occurrences of the items in the data set.	-
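
To make the operations in the table concrete, the following is a rough sketch in plain Python of how each result could be assembled from per-node partial results. It does not use the BuiltinAggregator; `node_values`, `partial`, and `combine` are hypothetical names, and the three-node layout is an assumption. It also shows why a cluster-wide `AVG` is best derived from merged sums and counts rather than by averaging per-node averages.

```python
from typing import Dict, List, Tuple

# Hypothetical values already extracted on each of three cache nodes.
node_values: List[List[int]] = [[4, 8, 8], [15], [16, 23, 42]]


def partial(values: List[int]) -> Tuple[int, int, int, int, set]:
    """Per-node partial result: (sum, count, min, max, distinct values)."""
    return sum(values), len(values), min(values), max(values), set(values)


def combine(partials: List[Tuple[int, int, int, int, set]]) -> Dict[str, object]:
    """Merge per-node partials into cluster-wide results."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return {
        "SUM": total,
        "COUNT": count,
        # Derived from the merged sum and count, not from node averages.
        "AVG": total / count,
        "MIN": min(p[2] for p in partials),
        "MAX": max(p[3] for p in partials),
        "DISTINCT": sorted(set().union(*(p[4] for p in partials))),
    }


results = combine([partial(v) for v in node_values])
```

With this sample data the merged sum is 116 over 7 values, so `AVG` is 116 / 7. Averaging the three node averages would give a different number, because the nodes hold different numbers of items.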

If the Aggregator’s MapReduce Task fails because of an exception, an exception reporting the Task failure is thrown.
If the result of the Aggregator execution is null, the default value of the built-in Aggregator for that data type is returned instead.
Apart from these built-in operations, users can provide their own aggregations such as Mean, Median, or Mode. These are logical statistical functions, and users can create as many variants of an aggregation as they need.
Users can also supply their own data types, such as custom objects, because the MapReduce Task accepts values of the type Object.
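
As an illustration of a user-defined aggregation and of the null fallback described above, here is a small sketch of a Median computed across nodes. It is plain Python, not NCache code: `node_step` and `final_step` are hypothetical names, and the default of `0.0` is an assumed default for a floating-point result rather than a documented NCache value.

```python
from statistics import median
from typing import List, Optional


def node_step(values: List[float]) -> List[float]:
    """Per-node step for a Median.

    A median cannot be merged from per-node medians, so each node
    forwards its extracted values unchanged.
    """
    return list(values)


def final_step(per_node: List[List[float]], default: float = 0.0) -> float:
    """Merge all forwarded values and compute the Median.

    When there is nothing to aggregate the result would be null, so the
    assumed default value for the data type is returned instead.
    """
    merged = [v for node in per_node for v in node]
    result: Optional[float] = median(merged) if merged else None
    return default if result is None else result


# Hypothetical data spread over two nodes, and an empty cluster.
median_value = final_step([node_step([1.0, 9.0]), node_step([5.0])])
empty_value = final_step([node_step([]), node_step([])])
```

The design point is that not every statistic can be reduced node by node. Sum and count merge cleanly, whereas a Median needs the combined values before the final computation.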

NCache also provides the built-in implementation that the Aggregator needs for the data types listed above. To use the Aggregator with custom types and a custom aggregation, users implement two interfaces: IValueExtractor and IAggregator.

An implementation of the IValueExtractor interface must contain at least the Extract() method, which the internal framework uses to identify the type of an instance; it does so by returning the type of the object passed to it. The IAggregator interface, on the other hand, declares two methods to be implemented: Aggregate() and AggregateAll(). The job of the Value Extractor is to return the filtered data to the Aggregator, which then works on that data set to produce more refined data.
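
The division of labor between the two interfaces can be sketched as follows. This is a Python analogue, not the NCache interfaces themselves: the abstract classes, `Order`, `AmountExtractor`, `RangeAggregator`, and `run` are hypothetical. The sketch also assumes that `aggregate` works on the values held by one node and that `aggregate_all` merges the per-node results, a split the text above does not spell out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Tuple


class ValueExtractor(ABC):
    @abstractmethod
    def extract(self, value: Any) -> Any:
        """Return the data to aggregate, or None to skip the value."""


class Aggregator(ABC):
    @abstractmethod
    def aggregate(self, values: Iterable[Any]) -> Any:
        """Assumed role: reduce the extracted values of one node."""

    @abstractmethod
    def aggregate_all(self, partials: Iterable[Any]) -> Any:
        """Assumed role: merge the per-node results into one result."""


@dataclass
class Order:
    # Hypothetical custom type stored in the cache.
    order_id: int
    amount: float


class AmountExtractor(ValueExtractor):
    def extract(self, value: Any) -> Optional[float]:
        return value.amount if isinstance(value, Order) else None


class RangeAggregator(Aggregator):
    """Custom aggregation: spread between largest and smallest amount."""

    def aggregate(self, values: Iterable[Any]) -> Optional[Tuple[float, float]]:
        vals = [v for v in values if v is not None]
        return (min(vals), max(vals)) if vals else None

    def aggregate_all(self, partials: Iterable[Any]) -> Optional[float]:
        parts = [p for p in partials if p is not None]
        if not parts:
            return None
        return max(hi for _, hi in parts) - min(lo for lo, _ in parts)


def run(nodes: List[List[Any]], extractor: ValueExtractor, aggregator: Aggregator) -> Any:
    # Extract and aggregate per node, then merge the partial results.
    partials = [aggregator.aggregate(extractor.extract(v) for v in node) for node in nodes]
    return aggregator.aggregate_all(partials)


nodes = [[Order(1, 20.0), Order(2, 75.5)], [Order(3, 5.0), "stray string"], []]
order_range = run(nodes, AmountExtractor(), RangeAggregator())
```

In this sketch the extractor filters out anything that is not an `Order`, each node reduces its amounts to a (minimum, maximum) pair, and the final merge turns those pairs into a single range.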
