Cache Loader and Refresher Properties and Overview
NCache provides a Startup Cache Loader to pre-load the cache with essential data on startup. This mechanism is vital in scenarios where an application requires specific datasets immediately after it begins execution.
For example, imagine a video streaming site with hundreds of videos that need to be available as soon as a user accesses the site. Here, the cache can be pre-loaded with the existing videos on cache startup instead of adding the data manually.
Note
Sometimes, the Cache Loader might fail to load data successfully due to a connectivity issue with the primary data source or an error while executing the custom Cache Loader implementation. To identify such an error/exception, check the Execution Service logs.
Loading your cache with data on startup can be very useful. An empty cache at startup forces frequent data requests to the database (which is slow), and pre-loading avoids the resulting performance issues. The NCache Cache Startup Loader feature lets you pre-load your cache with data of your choice at the time of startup.
Despite its advantages, pre-loaded data can become stale. The data is loaded at cache startup, so any later change in the data source leaves the cached copy outdated. To prevent this, NCache provides another feature called Cache Refresher. The Cache Refresher is responsible for synchronizing the loaded data in the cache with the updated data in the data source.
Cache Loader and Refresher Properties
The NCache Cache Loader and Refresher are essential features that boost overall application performance, especially at cache startup. Their properties are explained below:
NCache Execution Service
In the earlier NCache versions, the cache and the Cache Loader used to run in the same process, which overburdened the cache process, particularly at loading time. This stress resulted in temporary degradation in the performance of the overall cache.
So, for OutProc topologies, NCache has a dedicated NCache Execution Service (previously known as Loader Service) to manage tasks and load data from the data source into caches on cache startup. This service fulfills different responsibilities in the .NET and Java editions. In the .NET Edition, it manages the Cache Loader and Refresher, whereas in the Java Edition it manages the Cache Loader, Refresher, Data Source Providers, and JMX counters publishing.
In a clustered topology, each node has a dedicated service responsible for loading data into its cache. However, within the InProc topology, the task still executes in the same process.
Datasets
For clustered topologies, if loading the data on a single node takes a considerable amount of time, NCache allows the data load to be distributed among the cluster nodes. The data is distributed based on user-provided datasets. Each node has an NCache Execution Service that loads the data for the datasets assigned to that node. A dataset is simply a way for you to group similar data so that it loads together.
Dataset Assignment to Cache Servers at Runtime
Note
NCache internally assigns the datasets to the nodes - ensuring no two nodes end up loading the same data in the cache. This assignment also allows a huge volume of data to load in less time.
The coordinator node distributes datasets among the cluster nodes in a round-robin fashion. Therefore, each of the servers is assigned a dataset from the list. As one of the nodes finishes loading data against its dataset, it receives the next dataset to load. Essentially, if the number of distribution datasets is greater than the number of nodes, NCache will assign one dataset to each node, and when no more nodes are available, it assigns the next dataset to the first available node (which has finished loading data).
Suppose the user wants to load specific data from the Northwind database into a clustered cache of 3 nodes on startup. The Cache Loader performance then depends on the number of datasets assigned. We discuss this behavior below:
5 datasets to load: The user allocates 5 datasets (Customers, Orders, Products, Employees, and Suppliers) to the Loader. The coordinator node then assigns the datasets to the nodes in a round-robin manner: Customers to node1, Orders to node2, and Products to node3. As soon as a node finishes loading its data, the coordinator assigns it the next dataset, so Employees and eventually Suppliers go to whichever nodes become available next.
3 datasets to load: The user assigns 3 datasets (Customers, Products, and Orders) to the Loader. Each node loads exactly one dataset, so the load is distributed equally.
2 datasets to load: The user assigns 2 datasets (Customers and Products) to the Loader. Since the cluster consists of three nodes, the third node will be idle during the loading process. That is why it is preferable for the number of datasets to be equal to or greater than the number of nodes, ensuring maximum utilization.
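The three scenarios above can be approximated with a small simulation. This is only a sketch of the "next dataset goes to the first available node" behavior described here, not NCache code: the function name, the node labels, and the per-dataset load durations are all hypothetical values chosen for illustration.

```python
import heapq


def assign_datasets(datasets, node_count, load_seconds):
    """Simulate handing each dataset to the first node that becomes free.

    datasets:     dataset names, in the order the coordinator hands them out.
    load_seconds: hypothetical load duration per dataset (made-up numbers).
    Returns (assignments, total_time); assignments maps node -> datasets loaded.
    """
    # Each heap entry is (time at which the node becomes free, node index).
    free_at = [(0, node) for node in range(1, node_count + 1)]
    heapq.heapify(free_at)
    assignments = {f"node{n}": [] for n in range(1, node_count + 1)}
    total_time = 0
    for name in datasets:
        available, node = heapq.heappop(free_at)  # earliest free node
        done = available + load_seconds[name]
        assignments[f"node{node}"].append(name)
        total_time = max(total_time, done)
        heapq.heappush(free_at, (done, node))
    return assignments, total_time


# Hypothetical durations in seconds; real load times depend on the data source.
durations = {"Customers": 30, "Orders": 50, "Products": 20,
             "Employees": 10, "Suppliers": 10}
northwind = ["Customers", "Orders", "Products", "Employees", "Suppliers"]

plan, total = assign_datasets(northwind, 3, durations)
print(plan, total)
```

With only two datasets and three nodes, the same function leaves the third node's list empty, which mirrors the idle-node case described above.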
Dataset Scheduling
Datasets need a schedule for refreshing. Therefore, NCache provides scheduling options that decide when the cached data of a dataset is updated. A check that runs at every Refresh Interval finds the datasets that are due and refreshes their data in the cache. The four different schedule options provided with Cache Refresher work as follows:
Daily Interval: The daily interval ensures a dataset refreshes at a set interval after the cache starts. The value of the interval is in minutes. For example, 20 minutes means the dataset refreshes every 20 minutes.
Daily Time: The daily time option ensures a dataset refreshes every day at a specific time provided by the user. Unlike daily interval refreshes, which are at most 60 minutes apart, daily time refreshes are usually 24 hours apart and do not depend on when the cache started. It is generally employed when datasets are not updated as frequently as the ones with daily intervals.
Weekly: The weekly option ensures a dataset refreshes on specific days every week at the time specified by the user. For example, if you want your loaded datasets to be refreshed every Monday, Thursday, and Saturday at precisely midnight, you need to set weekly dataset scheduling.
Monthly: The monthly option ensures a dataset refreshes on one or more specified weekdays in one or more specified weeks of every month. For example, you can specify the refreshing of the dataset so that the service refreshes it every Monday of the first and last week of the month.
Use the format week:days:hours:minutes to specify the scheduling expression.
- Weeks can be 1-4, with 1 being the first week of the month.
- Days can be 1-7, representing the days of the week. You can specify more than one day of the week by separating the days with commas.
- Hours and minutes give the time of day for the refresh.
- Users can select multiple weeks from a month for scheduling.
Let us take a few examples to understand how the scheduling expression works:
The expression 1,2:2:00:00 refreshes datasets on the 2nd day of the week in the first and second weeks of the month at midnight.
The expression 1:1,2,7:15:30 refreshes datasets on the first, second, and seventh days of the week (Monday, Tuesday, and Sunday) in the first week of the month at 3:30 p.m.
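To make the expression format concrete, the following sketch parses a week:days:hours:minutes string into its parts. It is an illustration only and not part of NCache: the MonthlySchedule type and the parse_monthly_schedule function are hypothetical names, and the 0-23 hour and 0-59 minute ranges are assumptions, since the text above only states the ranges for weeks and days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlySchedule:
    """Hypothetical container for a parsed week:days:hours:minutes expression."""
    weeks: tuple   # weeks of the month, 1-4
    days: tuple    # days of the week, 1-7 (1 = Monday in the examples above)
    hour: int      # assumed range 0-23
    minute: int    # assumed range 0-59


def parse_monthly_schedule(expression):
    parts = expression.strip().split(":")
    if len(parts) != 4:
        raise ValueError("expected the format week:days:hours:minutes")
    # Weeks and days may each be a comma-separated list of numbers.
    weeks = tuple(int(w) for w in parts[0].split(","))
    days = tuple(int(d) for d in parts[1].split(","))
    hour, minute = int(parts[2]), int(parts[3])
    if any(not 1 <= w <= 4 for w in weeks):
        raise ValueError("weeks must be between 1 and 4")
    if any(not 1 <= d <= 7 for d in days):
        raise ValueError("days must be between 1 and 7")
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("time of day is out of range")
    return MonthlySchedule(weeks, days, hour, minute)


print(parse_monthly_schedule("1,2:2:00:00"))
print(parse_monthly_schedule("1:1,2,7:15:30"))
```

The first call yields weeks 1 and 2, day 2, at 00:00. The second yields week 1, days 1, 2, and 7, at 15:30.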
Loading Mechanism
The user provides the implementation that determines which objects load from the master data source. Every individual datum is loaded as a complete CacheItem and added to the cache on cache startup.
Cache Loader Retries
If an operation fails while loading the cache, NCache can retry it before proceeding to the next operation. By default, NCache does not retry the failed operation. However, you can enable retries through the NCache Management Center.
Cache Loader Retry Interval
If the user opts to enable retries for failed operations, the user can also specify the time interval to wait (in seconds) before trying the failed operation again. The interval is 0 by default. However, it is user-configurable through the NCache Management Center.
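The interplay between the retry count and the retry interval can be sketched as follows. This is a generic retry loop written under assumptions, not the NCache implementation: load_with_retries, its parameters, and the way attempts are counted are hypothetical, and the defaults simply mirror the "no retries, 0-second interval" defaults described above.

```python
import time


def load_with_retries(operation, retries=0, retry_interval=0.0, sleep=time.sleep):
    """Run one load operation, retrying a failed attempt up to `retries` times.

    retry_interval is the wait in seconds before each retry.
    Returns (succeeded, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            operation()
            return True, attempts
        except Exception:
            if attempts > retries:
                # Retries exhausted (or disabled): move on to the next operation.
                return False, attempts
            sleep(retry_interval)


def always_fails():
    # Stand-in for a load operation whose data source is unreachable.
    raise ConnectionError("data source unavailable")


# With the defaults, a failed operation is attempted once and not retried.
print(load_with_retries(always_fails))
```

Passing the sleep function as a parameter is a design choice for this sketch only; it lets the wait be replaced with a stub when experimenting.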
Refresh Interval
To check which datasets need updating/refreshing, a thread runs periodically. The time between two runs is known as the Refresh Interval. By default, the Refresh Interval is 900 seconds. The minimum value for this interval is 1 second, and the maximum is 3600 seconds. Users can configure the Refresh Interval through the NCache Management Center.
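One way to picture the Refresh Interval is as the tick of a periodic checker: a dataset that becomes due between two ticks is picked up at the next tick. The sketch below models only that idea under stated assumptions. The function names and the "seconds since cache start" timeline are hypothetical, and the actual NCache thread may behave differently. Only the default (900) and the limits (1 to 3600) come from the text above.

```python
DEFAULT_REFRESH_INTERVAL = 900   # seconds, default stated above
MIN_REFRESH_INTERVAL = 1         # seconds, minimum stated above
MAX_REFRESH_INTERVAL = 3600      # seconds, maximum stated above


def validate_refresh_interval(seconds=DEFAULT_REFRESH_INTERVAL):
    """Reject values outside the documented 1-3600 second range."""
    if not MIN_REFRESH_INTERVAL <= seconds <= MAX_REFRESH_INTERVAL:
        raise ValueError("Refresh Interval must be between 1 and 3600 seconds")
    return seconds


def simulate_checks(schedule, interval=DEFAULT_REFRESH_INTERVAL, horizon=3600):
    """Return dataset -> time of the periodic check that first sees it as due.

    schedule maps a dataset name to the (hypothetical) number of seconds
    after cache start at which its refresh becomes due.
    """
    interval = validate_refresh_interval(interval)
    picked_up = {}
    for tick in range(interval, horizon + 1, interval):
        for name, due_at in schedule.items():
            if name not in picked_up and due_at <= tick:
                picked_up[name] = tick
    return picked_up


# Hypothetical due times: "orders" at 900 s, "products" at 1000 s.
print(simulate_checks({"orders": 900, "products": 1000}))
```

In this model, a shorter interval detects due datasets sooner at the cost of more frequent checks.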
On-Demand Dataset Refresh
The user also has the option to refresh their datasets manually through the Invoke-RefresherDataset cmdlet. Using the RefreshPreference option of this cmdlet, the user can refresh their datasets either immediately or within the next 24 hours. This option determines when the on-demand refresh will occur, based on whether or not the process will result in degradation. If not, use RefreshNow to refresh the dataset immediately. If it does, use RefreshOnNextTimeOfDay to refresh the dataset at the next scheduled time.
See Also
Components of Cache Startup Loader and Refresher
Data Source Providers (Backing Source)
Upgrade NCache Versions