Eran Chinthaka: Deploying Cassandra Across Multiple Data Centers with Replication

Cassandra provides highly scalable key/value storage that can be used for many applications. When Cassandra is used in production, you might consider deploying it across multiple data centers for various reasons. For example, your architecture may be such that you update data in one data center and all the other data centers should hold a replica of the same data, and you are OK with eventual consistency.

In this blog post I will discuss how to deploy Cassandra across three data centers, making sure every data center contains a full copy of the complete data set (this is important because you then don't have to go across data centers to serve the traffic coming into a given data center).

I assume you have already downloaded and configured Cassandra on each of the boxes in your data centers. Since most of the steps here must be done on each node in every data center, I encourage you to use a tool like cluster-ssh (it lets you open connections to all the nodes and run commands in parallel).

Goals
Set up a Cassandra cluster across three data centers with four nodes in each data center. Every piece of data will be placed on three nodes (one in each data center). In other words, the replication factor is 3. Let's assume our nodes are named DC<data-center-name>N<node-id>. For example, DC2N3 is the third node in the second data center.

Steps
Note that all these steps, except Step 5, must be followed on EACH AND EVERY node of the cluster (the keyspace in Step 5 only needs to be created once, from any node). These steps were tested on Cassandra 0.8.7.

Step 1: Configure cassandra.yaml
Open up $CASSANDRA_HOME/conf/cassandra.yaml in your favorite text editor (did I hear emacs :D).

Change cluster_name to a suitable value instead of the boring 'Test Cluster'.
Set the initial_token. The current Cassandra implementation does a very poor job of distributing keys across the cluster. Go here and enter the total number of nodes you have across all data centers. For our example it is 12. Once the tokens are generated, carefully copy each value into a different node's cassandra.yaml file under initial_token.
Point data_file_directories, commitlog_directory and saved_caches_directory to proper locations and make sure those locations exist (otherwise create them).
Set the seeds. It is best to select one node from each data center and list it here. For example, DC1N1, DC2N2, DC3N3.
Assuming your node is properly configured to return the right address when Java calls InetAddress.getLocalHost(), leave listen_address and rpc_address blank. If you are not sure, type hostname on each node and use that value as the address.
Set endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch. We will provide a snitch file later (the snitch file lets Cassandra know the layout of our data centers).

That's pretty much all you have to do in cassandra.yaml (assuming you haven't touched any of the other default params).
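
If you would rather compute the initial_token values yourself, the following is a minimal sketch. It assumes the cluster uses RandomPartitioner, whose token space runs from 0 to 2**127, and it simply spaces the tokens evenly across all 12 nodes. The function name initial_tokens is made up for this example.

```python
# Sketch: evenly spaced initial_token values, assuming RandomPartitioner
# (token space 0 .. 2**127). initial_tokens is a hypothetical helper name.

def initial_tokens(node_count):
    """Return node_count evenly spaced tokens in [0, 2**127)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]


# One token per node; 12 is the total node count in our example.
for index, token in enumerate(initial_tokens(12), start=1):
    print(f"node {index}: initial_token: {token}")
```

Each printed value goes into exactly one node's cassandra.yaml. No two nodes may share a token.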

Step 2: Configure log4j-server.properties

Find log4j.appender.R.File and point it to a proper location. Make sure you remember this location, because this is the log you will be searching through when things go wrong.

Step 3: Configure Snitch File

Open cassandra-topology.properties in a text editor and let Cassandra know about your node and data center configuration. For our example, this is how it should look.

# Cassandra Node IP=Data Center:Rack
DC1N1=DC1:RAC1
DC1N2=DC1:RAC1
DC1N3=DC1:RAC1
DC1N4=DC1:RAC1

DC2N1=DC2:RAC1
DC2N2=DC2:RAC1
DC2N3=DC2:RAC1
DC2N4=DC2:RAC1

DC3N1=DC3:RAC1
DC3N2=DC3:RAC1
DC3N3=DC3:RAC1
DC3N4=DC3:RAC1

# default for unknown nodes
default=DC1:RAC1
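
Typing twelve entries by hand is error-prone, so here is a small sketch that generates the topology entries, with each node mapped to its own data center. It assumes the DC<data-center-name>N<node-id> naming from the Goals section and a single rack per data center. The node names stand in for the real IP addresses of your boxes, and topology_lines is a name invented for this example.

```python
# Sketch: generate cassandra-topology.properties entries for the example layout.
# Node names such as DC2N3 are placeholders for real node IP addresses.

def topology_lines(data_centers=3, nodes_per_dc=4, rack="RAC1"):
    """Return the lines of a topology file, one entry per node."""
    lines = ["# Cassandra Node IP=Data Center:Rack"]
    for dc in range(1, data_centers + 1):
        for node in range(1, nodes_per_dc + 1):
            # Every node is assigned to the data center it lives in.
            lines.append(f"DC{dc}N{node}=DC{dc}:{rack}")
    lines.append("# default for unknown nodes")
    lines.append(f"default=DC1:{rack}")
    return lines


print("\n".join(topology_lines()))
```

The data center names on the right-hand side (DC1, DC2, DC3) must match the names you use later when defining the replication options for the keyspace.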

Step 4: Start Your Cluster.

Go to $CASSANDRA_HOME and type ./bin/cassandra -f to bring up the node. Once you have done this on all the nodes, type ./bin/nodetool -h localhost ring to make sure all the nodes are up and running.

Step 5: Create Data Model with Replication

We are almost there. Now we need to tell Cassandra to use this configuration for our data model. The best way to do this is through cassandra-cli.

Go to $CASSANDRA_HOME/bin and type ./cassandra-cli.

Type connect localhost/9160; to connect to the cluster. Note the semi-colon at the end. If successful you will see Connected to: "<YOUR_CLUSTER_NAME>" on localhost/9160;

Now you need to create the keyspace with proper replication. Assuming your keyspace name is MyCompanyKS, type the following.

create keyspace MyCompanyKS with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1,DC2:1,DC3:1}];

Then follow the rest of the steps in the cassandra-cli wiki to create column families.
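
If your data center names or replica counts differ from this example, a small helper can build the create keyspace statement for you to paste into cassandra-cli. This is only a sketch that formats the string in the syntax shown in this post. The function name create_keyspace_statement is hypothetical, and it does not talk to Cassandra at all.

```python
# Sketch: build the cassandra-cli statement from a {data center: replicas} map.
# create_keyspace_statement is a hypothetical helper; it only formats text.

def create_keyspace_statement(keyspace, replicas_per_dc):
    """Return a create keyspace statement using NetworkTopologyStrategy."""
    options = ",".join(f"{dc}:{count}" for dc, count in replicas_per_dc.items())
    return (
        f"create keyspace {keyspace} "
        "with placement_strategy = "
        "'org.apache.cassandra.locator.NetworkTopologyStrategy' "
        f"and strategy_options = [{{{options}}}];"
    )


replicas = {"DC1": 1, "DC2": 1, "DC3": 1}
print(create_keyspace_statement("MyCompanyKS", replicas))
# The overall replication factor is the sum of the per-data-center counts.
print("replication factor:", sum(replicas.values()))
```

With one replica per data center the counts add up to the replication factor of 3 stated in the Goals section.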

That's it. Now you have an awesome Cassandra cluster spanning three data centers. Enjoy!!
