Channel: Planet Apache

Eran Chinthaka: Deploying Cassandra Across Multiple Data Centers with Replication

Cassandra provides highly scalable key/value storage that can be used for many applications. When Cassandra is used in production, one might consider deploying it across multiple data centers for various reasons. For example, your architecture may be such that you update data in one data center and all the other data centers should hold a replica of the same data, and you are OK with eventual consistency.

In this blog post I will discuss how one can deploy Cassandra across three data centers, making sure every data center contains a full copy of the complete data set. This is important because you then don't have to go across data centers to serve the traffic coming into a given data center.

I assume you have already downloaded and configured Cassandra on each of the boxes in your data centers. Since most of the steps here must be done on each node in every data center, I encourage you to use a tool like cluster-ssh (it lets you open connections to all the nodes and run commands in parallel).

Goals
Set up a Cassandra cluster across three data centers with four nodes in each data center. Every piece of data will be placed on three nodes (one in each data center). In other words, the replication factor is 3. Let's assume our nodes are named DC<data-center-name>N<node-id>. For example, DC2N3 is the third node in the second data center.

Steps
Note that all these steps, except Step 5, must be followed on EACH AND EVERY node of the cluster (the keyspace in Step 5 only needs to be created once, from any node). These steps were tested on Cassandra 0.8.7.

Step 1: Configure cassandra.yaml
Open up $CASSANDRA_HOME/conf/cassandra.yaml in your favorite text editor (did I hear emacs :D).
  1. Change cluster_name to a suitable value instead of the boring 'Test Cluster'.
  2. Set the initial_token. The current Cassandra implementation does a very poor job of distributing keys across the cluster. Go here and enter the total number of nodes you have across all data centers. For our example it is 12. Once the tokens are generated, carefully copy each value into a different node's cassandra.yaml file under initial_token.
  3. Point data_file_directories, commitlog_directory and saved_caches_directory to proper locations and make sure those locations exist (otherwise create them).
  4. Set the seeds. It is best to select one node from each data center and list it here. For example: DC1N1, DC2N2, DC3N3.
  5. Assuming your node is properly configured to return the right address when Java calls InetAddress.getLocalHost(), leave listen_address and rpc_address blank. If you are not sure, type hostname on each node and use that value as the address.
  6. Set endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch. We will provide a snitch file later (the snitch file lets Cassandra know the layout of our data centers).
That's pretty much all you have to do in cassandra.yaml (assuming you haven't touched any of the other default params).
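
The token generation in item 2 can also be reproduced locally. The following is a minimal sketch, assuming the cluster uses RandomPartitioner, whose tokens fall in the range 0 to 2**127; the function name initial_tokens is hypothetical and not part of Cassandra. It simply spaces the tokens evenly around the ring, one per node.

```python
# Sketch only: assumes RandomPartitioner, whose token space is [0, 2**127).
RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return node_count evenly spaced tokens, one for each node."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # 12 nodes in total: three data centers with four nodes each.
    for index, token in enumerate(initial_tokens(12), start=1):
        print(f"node {index}: initial_token: {token}")
```

Each printed value would go into a different node's cassandra.yaml; no two nodes should share a token.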

Step 2: Configure log4j-server.properties
Find log4j.appender.R.File and point it to a proper location. Make sure you remember this location, because this is the log you will be searching when things go wrong.

Step 3: Configure Snitch File
Open cassandra-topology.properties in a text editor and let Cassandra know about your node and data center configuration. For our example, this is how it should look (the node names stand in for each node's IP address).

# Cassandra Node IP=Data Center:Rack
DC1N1=DC1:RAC1
DC1N2=DC1:RAC1
DC1N3=DC1:RAC1
DC1N4=DC1:RAC1

DC2N1=DC2:RAC1
DC2N2=DC2:RAC1
DC2N3=DC2:RAC1
DC2N4=DC2:RAC1

DC3N1=DC3:RAC1
DC3N2=DC3:RAC1
DC3N3=DC3:RAC1
DC3N4=DC3:RAC1

# default for unknown nodes
default=DC1:RAC1
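
Because the file has to be identical on every node, it can be less error-prone to generate it than to type it by hand. Here is a minimal sketch that maps each node to its own data center, assuming the DC<data-center>N<node> naming from the Goals section and a single rack per data center; topology_lines is a hypothetical helper, and in a real deployment the keys would be node IP addresses rather than names.

```python
def topology_lines(dc_count=3, nodes_per_dc=4, rack="RAC1"):
    """Build cassandra-topology.properties lines for nodes named DC<d>N<n>."""
    lines = ["# Cassandra Node IP=Data Center:Rack"]
    for dc in range(1, dc_count + 1):
        for node in range(1, nodes_per_dc + 1):
            # Each node is assigned to the data center it actually lives in.
            lines.append(f"DC{dc}N{node}=DC{dc}:{rack}")
        lines.append("")  # blank line between data centers
    lines.append("# default for unknown nodes")
    lines.append(f"default=DC1:{rack}")
    return lines


if __name__ == "__main__":
    print("\n".join(topology_lines()))
```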

Step 4: Start Your Cluster.
Go to $CASSANDRA_HOME and type ./bin/cassandra -f to bring up the node. Once you have done this on all the nodes, type ./bin/nodetool -h localhost ring to make sure all the nodes are up and running.

Step 5: Create Data Model with Replication
We are almost there. Now we need to tell Cassandra to use this configuration for our data model. The best way to do this is through cassandra-cli.
Go to $CASSANDRA_HOME/bin and type ./cassandra-cli.

Type connect localhost/9160; to connect to the cluster. Note the semi-colon at the end. If successful you will see Connected to: "<YOUR_CLUSTER_NAME>" on localhost/9160;

Now you need to create the keyspace with proper replication. Assuming your keyspace name is MyCompanyKS, type the following. The strategy options place one replica in each data center, for a total replication factor of 3.

create keyspace MyCompanyKS with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:1,DC2:1,DC3:1}];

Then follow the rest of the steps in the cassandra-cli wiki to create column families.
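
If the number of data centers or replicas per data center differs from this example, the statement can be assembled from a small mapping. The sketch below only builds the cassandra-cli text shown above in the 0.8-era syntax; create_keyspace_statement is a hypothetical helper, and the data center names are assumed to match those in cassandra-topology.properties.

```python
def create_keyspace_statement(keyspace, replicas_per_dc):
    """Return a cassandra-cli 'create keyspace' statement using NetworkTopologyStrategy."""
    # Render the mapping as DC1:1,DC2:1,... in insertion order.
    options = ",".join(f"{dc}:{count}" for dc, count in replicas_per_dc.items())
    return (
        f"create keyspace {keyspace} "
        "with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' "
        f"and strategy_options = [{{{options}}}];"
    )


if __name__ == "__main__":
    replicas = {"DC1": 1, "DC2": 1, "DC3": 1}
    print(create_keyspace_statement("MyCompanyKS", replicas))
    print("total replication factor:", sum(replicas.values()))
```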

That's it. Now you have an awesome Cassandra cluster spanning three data centers. Enjoy!





