Dynamic Clustering in In-Memory Distributed Cache

The NCache in-memory distributed cache cluster is self-healing, dynamic, and highly scalable. You can add or remove cache servers at runtime, without application downtime. The NCache cluster provides linear scalability in terms of handling application request processing and data. When your cache cluster reaches its peak limits you can add more servers to redistribute the requests and data loads.

If a cache server goes down, the cache cluster automatically detects the server failure and adjusts accordingly. Let's consider a case when a cache server goes down in the Partition-Replica Topology, the cache cluster automatically rearranges the partitions and the data. The remaining cache servers copy the leftover data of the server that went down from its backup server.

Every clustered cache has a dedicated TCP-based cluster. The applications talk to the cache cluster over TCP. So, if the application process goes down, it does not affect the cache cluster. Every cache cluster requires a separate TCP port at the time of configuring a clustered cache. The Partition-Replica Topology occupies one extra TCP port for the replica.

The Peer-to-Peer architecture allows every cache server to establish a TCP connection with every other cache server. This enables cache servers to either directly talk to each other whenever necessary or broadcast the requests at other times. The cache cluster, multi-threaded architecture allows executing operations in parallel, thus, reaping the benefits of today's powerful hardware.

You can configure the request timeout for the cache cluster at the time of configuring the cache cluster. The default request timeout is 60 seconds.

Note

This feature is also available in NCache Professional.

Note

Cluster port(s) assigned to the cache cluster should be available on all configured cache servers. Similarly, the firewall should be configured to allow communication between cache servers on given cluster ports. The default cluster port range starts from 7800.

Cluster Coordinator in an In-Memory Distributed Cache

In a distributed environment, the coordination of different tasks is inevitable. In the cache cluster, the coordinator server plays this vital role. Every cache cluster has one coordinator cache server.

The selection of the coordinator server in NCache is simple. The senior-most cache server assumes the role of the coordinator server. So, the cache server which has joined the cluster first becomes the coordinator server. If a coordinator cache server leaves the cache cluster gracefully or abruptly, the next senior-most cache server becomes the coordinator.

The coordinator cache server is responsible for the following:

Cluster Membership: The coordinator server is responsible for accepting membership requests of new cache servers. It also broadcasts the updated membership list to the rest of the servers. Similarly, when a server leaves the cluster, the coordinator server notifies the other servers about the change in membership.
Distribution Map Generation: The coordinator server generates the distribution map and manages the replicas whenever a server joins or leaves the cache cluster in the Partitioned and Partition-Replica topologies.
Request Ordering: Adding and updating of data in the Replicated Topology requires that the operations on all the cache servers are executed in the same order to achieve data consistency. These operations take sequence tokens from the coordinator server, and execution of the operations takes place according to these sequence tokens. This mechanism guarantees that the same updates against a given CacheItem are applied across the entire cluster.
Cache Data Loader and Refresher Management: Cache Startup Loader and Refresher tasks are executed on the coordinator server. When multiple distribution hints are specified for the Cache Startup Loader or Refresher, the coordinator assigns these hints to different cluster members for parallel loading and refreshing.
Data Invalidation and Eviction: Every cache server in Replicated and Mirror topologies contains the same set of data. Therefore, the coordinator server is responsible for data invalidation (Expirations and Dependencies) across the entire cluster. The coordinator server determines which items are to be expired or evicted from the cache, and it removes those items from the entire cluster through a cluster call.

How Servers Join and Leave the Cache Cluster in an In-Memory Distributed Cache

The following is the process by which server nodes join and leave the cache cluster:

Member Discovery and Server Joining

When you start the cache on the first cache server, it forms the cache cluster. A member discovery process runs on every cache server upon cache start. In the discovery process, the cache server tries to establish a TCP connection with the other cache members on the configured cluster port.

When the first cache server starts up, and it can not establish a connection with the other cache servers in the cache cluster, it concludes that no other cache server is up. This server then becomes the first member and the coordinator server of the cache cluster. As soon as the other servers start, they go through the same discovery process and establish connections with the running cache servers.

These cache servers seek information about the coordinator server of the cache cluster from already running cache servers. Once the discovery process has concluded and the coordinator server is determined, the newly joining cache servers send join requests to the coordinator.

The coordinator accepts the join requests, generates the distribution map (only in the Partitioned and Partition-Replica topologies), and broadcasts the new membership list to the entire cache cluster.

Graceful Leaving of Server

When you stop the cache on a cache server through the NCache Management Center or PowerShell tools, the leaving cache server informs the coordinator by sending a leave request. The coordinator immediately processes the leave request and generates distribution maps, and broadcasts the new membership list to the entire cluster. Hence, the cluster quickly adjusts according to the new distribution maps.

Abrupt Server Failure

There are cases when a cache server may go down abruptly without informing the coordinator i.e., a power failure. When a cache server goes down abruptly, the TCP connections break between the leaving server and other cache servers. They try to re-establish the TCP connection for a configurable number of retries with the leaving server. This mechanism helps recover from temporary connection failures and prevent the wrong death declaration of a server.

Once connection retries are over, the given cache server concludes the death of the abrupt leaving server. Upon the death conclusion, cache servers inform the coordinator. Although the coordinator may have already discovered the death of a cache server, the information provided by other cache servers results in quick membership adjustment in case the coordinator discovers the death of the server.

Whatever the case is, the coordinator generates the new distribution maps and broadcasts the new membership list across existing cache servers.

Heartbeat

Detection of a broken TCP connection depends on the traffic going through the connection. For example, If the cache cluster is busy and sending requests across different cache servers, then connection breakage is detected early on. However, if the cache cluster has very low activity, detection of TCP connection breakage may take more time.

To avoid this problem, the NCache cluster has a built-in heartbeat mechanism that can be enabled. When the cluster is in an idle state, heartbeat messages are periodically exchanged across servers. Thus, the heartbeat mechanism generates traffic between cache servers and helps early death detection of abruptly leaving servers.

Nagling

NCache cluster supports the parallel execution of multiple requests. The term nagling refers to combining multiple requests into a single one before sending them over the TCP connection. This increases the throughput of the cluster. You can configure nagling and its threshold through the service configuration file.

Cluster Membership Change Alerts

Change in cluster membership is an important activity. So, NCache provides multiple mechanisms to notify the cache administrators and applications about membership changes.

Cache logs: Cache logs are the first place to look for membership changes for cache administrators.
Event Viewer: Member join and leave notifications are logged into the event viewer on Windows.
Email Alerts: You can configure email alerts of servers joining and leaving events for a given cache. Whenever a membership change takes place, an email is generated and sent to the registered email recipients.
Application-level Notifications: Client applications can also register notifications callback about cluster membership changes. These callbacks are called whenever any change in cluster membership change occurs.

Dynamic Clustering in In-Memory Distributed Cache

Note

Note

Cluster Coordinator in an In-Memory Distributed Cache

How Servers Join and Leave the Cache Cluster in an In-Memory Distributed Cache

Member Discovery and Server Joining

Graceful Leaving of Server

Abrupt Server Failure

Heartbeat

Nagling

Cluster Membership Change Alerts

See Also

Contact Us