Configuring MapReduce
You can configure MapReduce for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
Click on the cache name in Cache Explorer to open cache settings.
Go to the MapReduce tab.
Click Deploy Task Libraries.
A dialog box opens. Browse to the libraries that implement the MapReduce interfaces and click Open.
A notification appears once the assemblies have been deployed successfully.
You can change the maximum number of MapReduce tasks that execute simultaneously according to your requirements.
If you expect exceptions to be thrown during task execution, you can specify the number of exceptions to tolerate. Once that number is reached, the task fails and the failure is logged in the cache error log.
You can modify the chunk size, which is the number of elements processed by the Mapper or Combiner before the output is transmitted to the Combiner or Reducer.
You can modify the maximum number of tasks that can wait in the queue before they are processed.
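To make the chunk size and exception limit more concrete, here is a minimal, single-process Python sketch of a word count. It is only an illustration and does not use the NCache API: the function name run_word_count and the parameters chunk_size and max_exceptions are hypothetical, and the exact point at which the real cache fails a task is an assumption (here, when the count of exceptions exceeds the limit).

```python
from collections import defaultdict


def run_word_count(records, chunk_size=100, max_exceptions=10):
    """Toy model of a MapReduce word count (not the NCache implementation).

    chunk_size: mapper output pairs buffered before they are handed on
                to the reducer (assumed meaning of the chunk size setting).
    max_exceptions: exceptions tolerated before the task is marked failed
                    (assumed: the task fails once this limit is exceeded).
    """
    totals = defaultdict(int)  # reducer state: word -> count
    chunk = []                 # mapper output waiting to be sent
    chunks_sent = 0
    exceptions = 0

    def flush():
        # Hand the buffered pairs to the "reducer" and start a new chunk.
        nonlocal chunks_sent
        for word, count in chunk:
            totals[word] += count
        chunk.clear()
        chunks_sent += 1

    for record in records:
        try:
            words = record.split()  # raises AttributeError for non-strings
        except AttributeError:
            exceptions += 1
            if exceptions > max_exceptions:
                return {"status": "failed", "exceptions": exceptions}
            continue
        for word in words:
            chunk.append((word, 1))
            if len(chunk) >= chunk_size:
                flush()

    if chunk:
        flush()  # send the final, partially filled chunk

    return {
        "status": "completed",
        "totals": dict(totals),
        "chunks_sent": chunks_sent,
        "exceptions": exceptions,
    }
```

In this sketch, a smaller chunk size means more frequent hand-offs to the reducer, and a lower exception limit makes the task fail sooner when it meets bad input.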
Using Windows PowerShell
The Add-MapReduce cmdlet configures MapReduce tasks for processing and generating large data sets with a parallel, distributed algorithm on a clustered cache.
The following command configures MapReduce execution on demoLocalCache with default options.
Add-MapReduce -CacheName demoLocalCache
The following command configures MapReduce on demoLocalCache with a maximum of 10 tasks executing in parallel and a chunk size of 100 elements.
Add-MapReduce demoLocalCache -MaxTasks 10 -ChunkSize 100