Cache Startup Loader Properties and Overview
Note
This feature is only available in NCache Enterprise Edition.
NCache provides the Cache Startup Loader to enable preloading the cache with data as it starts up. This is useful in cases where certain data should be available to the application immediately after it begins executing.
For example, a video streaming site could have hundreds or thousands of videos at the start, with new videos added on the server later. For such an application, the cache can be preloaded with the existing videos on cache startup instead of adding the data to it manually.
The Cache Loader may fail to load data because of an unsuccessful connection with the data source or an issue that occurs during execution of the cache loader implementation. To diagnose such failures, check the server-side cache logs for Cache Loader errors/exceptions.
Cache Loader Properties
Loader Service
Previously, the cache loader ran in the same process as the cache. This overburdened the cache process, since the cache startup loader would execute long-running tasks and temporarily degrade performance for subsequent operations.
For OutProc topologies, NCache therefore provides a dedicated Loader Service to manage these tasks and load data from the data source into the cache on startup. In a clustered topology, each node has its own dedicated service, which is responsible for carrying out the loading task for its respective cache. For the InProc topology, the task is executed in the same process as the cache.
Distribution Hints
For clustered topologies, if loading the data on a single node would take a considerable amount of time, NCache provides the option to distribute the data load among the nodes of the cluster. The data is distributed based on “hints” provided by the user; note that NCache internally assigns the hints to the nodes. This ensures that no two nodes end up loading duplicate data into the cache, and that a large volume of data is loaded in less time. Each node has a loader service assigned to load the data according to its distribution hint.
Hints are distributed to the nodes using a round-robin algorithm, which means every node in the cluster is assigned a free hint from the hint list. As a node finishes loading the data for its hint, the next hint is assigned to it. If the number of distribution hints is greater than the number of nodes, NCache assigns one hint to each node; once all nodes have been assigned hints, each remaining hint is assigned to the next node that becomes available after finishing its load.
Let’s suppose the user wants to load specific data from the Northwind database into a 3-node clustered cache on startup. The Cache Loader’s performance is affected by the number of hints assigned:
- 5 hints
The user allocates 5 hints (Customers, Orders, Products, Employees, and Suppliers) to the loader. The coordinator node then assigns the distribution hints to the nodes in a round-robin manner – Customers to node1, Orders to node2, and Products to node3. Once every node has been assigned a hint, the coordinator node waits for the loader services to load the data into the nodes. As soon as a node is free, the coordinator assigns it the next hint, i.e., Employees, and eventually Suppliers to the next free node.
- 3 hints
The user assigns 3 hints (Customers, Products, and Orders) to the loader. Each node is then responsible for exactly one hint and loads the data according to it, ensuring equal distribution.
- 2 hints
Assigning 2 hints to a three-node cluster results in the third node remaining idle during the loading process. This is why the number of hints should preferably be equal to or greater than the number of nodes, so that maximum utilization is ensured.
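As an illustration of the assignment order described above, the following minimal sketch (plain C#, not NCache's internal code) distributes the five Northwind hints over three hypothetical nodes in round-robin fashion:

```csharp
using System;
using System.Collections.Generic;

public static class HintDistributionDemo
{
    public static void Main()
    {
        // Hypothetical hints and nodes, mirroring the Northwind example above.
        var hints = new Queue<string>(new[] { "Customers", "Orders", "Products", "Employees", "Suppliers" });
        var nodes = new[] { "node1", "node2", "node3" };

        // First pass: hand one hint to each node in round-robin order.
        var assignments = new Dictionary<string, List<string>>();
        foreach (var node in nodes)
            assignments[node] = new List<string> { hints.Dequeue() };

        // Remaining hints go to whichever node frees up first; here we simply
        // cycle through the nodes to illustrate the idea.
        int next = 0;
        while (hints.Count > 0)
        {
            assignments[nodes[next % nodes.Length]].Add(hints.Dequeue());
            next++;
        }

        foreach (var pair in assignments)
            Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
    }
}
```

Running this prints node1: Customers, Employees; node2: Orders, Suppliers; node3: Products. In a real cluster, the later hints go to whichever node actually finishes its load first rather than strictly cycling.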
The following properties of NCache Distribution Hints need to be kept in mind:
Null values are not accepted as hints.
Hints are case-insensitive and must be unique. For example, if a hint "Customers" exists, "customers" will be considered a duplicate value and an exception will be thrown.
Distribution Hints are only enabled for clustered topologies - OutProc and InProc. For local topology, all data is loaded through a single node.
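To make the uniqueness rule concrete, the following is a hedged sketch of case-insensitive validation; it only demonstrates the constraint and is not NCache code:

```csharp
using System;
using System.Collections.Generic;

public static class HintValidationDemo
{
    public static void Validate(IEnumerable<string> hints)
    {
        // Case-insensitive set: "Customers" and "customers" count as the same hint.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var hint in hints)
        {
            if (hint == null)
                throw new ArgumentException("Null values are not accepted as hints.");
            if (!seen.Add(hint))
                throw new ArgumentException($"Duplicate hint (case-insensitive): {hint}");
        }
    }

    public static void Main()
    {
        try
        {
            // "customers" duplicates "Customers" because hints are case-insensitive.
            Validate(new[] { "Customers", "Orders", "customers" });
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```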
Loading Mechanism
The user specifies the implementation for how and which objects are to be loaded from the master data source. Every individual datum is encapsulated into a ProviderCacheItem, which is added to an object of the LoaderResult class. The LoaderResult object is then given to the loader service, which adds/inserts the data into the cache.
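The sketch below outlines this flow. It is a hedged example only: the ICacheLoader method names (Init/LoadNext/Dispose), the LoaderResult members (Data, HasMoreData), the namespaces, and the FetchCustomers helper and connectionString parameter are assumptions made for illustration and may differ from the actual interface in your NCache version; see the Sample Implementation of ICacheLoader article linked below for the authoritative contract.

```csharp
using System.Collections.Generic;
using Alachisoft.NCache.Runtime.Caching;        // ProviderCacheItem (namespace assumed)
using Alachisoft.NCache.Runtime.CacheLoader;    // ICacheLoader, LoaderResult (namespace assumed)

// Hedged sketch: member names and signatures are assumptions inferred from the
// description above, not the verified NCache API.
public class NorthwindLoader : ICacheLoader
{
    private string _connectionString;

    public void Init(IDictionary<string, string> parameters, string cacheName)
    {
        // Called once by the loader service; read provider parameters here.
        _connectionString = parameters["connectionString"];   // hypothetical parameter
    }

    public LoaderResult LoadNext(object userContext)
    {
        var result = new LoaderResult();

        // Each datum is wrapped in a ProviderCacheItem before being handed
        // to the loader service through the LoaderResult object.
        foreach (var customer in FetchCustomers())             // hypothetical data-access helper
        {
            var cacheItem = new ProviderCacheItem(customer);
            result.Data.Add(new KeyValuePair<string, ProviderCacheItem>(
                "Customer:" + customer.Id, cacheItem));        // collection member name assumed
        }

        result.HasMoreData = false;   // nothing left to load after this call
        return result;
    }

    public void Dispose()
    {
        // Release the data source connection here.
    }

    // Hypothetical helper and model used only to keep the sketch self-contained.
    private IEnumerable<Customer> FetchCustomers() { yield break; }
    private class Customer { public int Id { get; set; } }
}
```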
Bulk Loading
The items are loaded from the data source and populated into the cache in bulk to optimize performance and reduce round trips between the data source and the cache. Each bulk is loaded as an object of the LoaderResult class, which also contains an object for adding data to the cache, flags indicating key dependency and whether more data remains, and the context for the bulk.
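Independently of NCache's actual classes, the batching idea can be sketched as follows: the loader pages through the source, hands each page over as one bulk, and carries a cursor forward in the context while a flag signals whether more data remains. The Batch type, BatchSize, and LoadNext below are illustrative stand-ins, not NCache APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-in for a bulk of items plus its control flags and context.
public sealed class Batch
{
    public List<KeyValuePair<string, object>> Items { get; } = new List<KeyValuePair<string, object>>();
    public bool HasMoreData { get; set; }
    public object Context { get; set; }
}

public static class BulkLoadingDemo
{
    const int BatchSize = 1000;

    // Loads one batch starting at the offset carried in the context.
    public static Batch LoadNext(IReadOnlyList<string> source, object context)
    {
        int offset = context is int i ? i : 0;
        var batch = new Batch();

        foreach (var key in source.Skip(offset).Take(BatchSize))
            batch.Items.Add(new KeyValuePair<string, object>(key, $"value-of-{key}"));

        batch.Context = offset + batch.Items.Count;                     // cursor for the next call
        batch.HasMoreData = offset + batch.Items.Count < source.Count;  // more bulks remain?
        return batch;
    }

    public static void Main()
    {
        var source = Enumerable.Range(0, 2500).Select(n => $"key{n}").ToList();
        object context = null;
        Batch batch;
        do
        {
            batch = LoadNext(source, context);
            context = batch.Context;
            Console.WriteLine($"Loaded {batch.Items.Count} items, more: {batch.HasMoreData}");
        } while (batch.HasMoreData);
    }
}
```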
Key Dependency Support
If any items have a key dependency, those items are loaded sequentially instead of in bulk to maintain data consistency. For example, if Key2 is dependent on Key1, the item with Key1 has to be added before Key2; if Key2 were added first, the key it depends on would not exist and the item would ultimately not be added.
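As a hedged illustration using the client API (the CacheItem Dependency property and KeyDependency class follow NCache's dependency model, though exact usage may vary by version), the dependent item can only be inserted after its parent key exists:

```csharp
using Alachisoft.NCache.Client;
using Alachisoft.NCache.Runtime.Dependencies;

public static class KeyDependencyOrderDemo
{
    public static void Load(ICache cache)
    {
        // The parent item must exist before any item that depends on it.
        cache.Insert("Key1", new CacheItem("parent value"));

        // Key2 depends on Key1; inserting it before Key1 would fail because
        // the dependency target would not exist yet.
        var dependent = new CacheItem("child value")
        {
            Dependency = new KeyDependency("Key1")
        };
        cache.Insert("Key2", dependent);
    }
}
```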
Cache Loader Retries
If an insert operation fails while the cache is being loaded, the failed operation can be retried before proceeding to the next one. By default, NCache does not retry a failed operation; however, the number of retries can be configured through NCache Web Manager.
Cache Loader Retry Interval
If retries for failed operations are enabled, the user can also specify the time interval, in seconds, to wait before trying again. The interval is zero by default and can be configured through NCache Web Manager.
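Conceptually, the retry behavior amounts to the loop sketched below. This is only an illustration of the retry-with-interval semantics; the RetryCount and RetryInterval values are hypothetical stand-ins for the settings configured in NCache Web Manager, not NCache's internal code.

```csharp
using System;
using System.Threading;

public static class RetryDemo
{
    // Hypothetical values standing in for the settings configured in NCache Web Manager.
    const int RetryCount = 3;
    static readonly TimeSpan RetryInterval = TimeSpan.FromSeconds(5);

    public static void InsertWithRetries(Action insert)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                insert();
                return;                       // success: move on to the next operation
            }
            catch (Exception) when (attempt < RetryCount)
            {
                Thread.Sleep(RetryInterval);  // wait the configured interval before retrying
            }
        }
    }

    public static void Main()
    {
        int calls = 0;
        // Hypothetical insert that fails twice before succeeding.
        InsertWithRetries(() =>
        {
            if (++calls < 3) throw new InvalidOperationException("insert failed");
            Console.WriteLine($"Insert succeeded on attempt {calls}.");
        });
    }
}
```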
See Also
Components of Cache Startup Loader
Sample Implementation of ICacheLoader on Single Node
Data Source Providers (Backing Source)
Custom Cache Dependencies