Cassandra Cluster Setup (3-Node)
Prerequisites: Complete 01_init_docker_swarm.md first
Time to Complete: 60-90 minutes
What You'll Build:
- 3 new DigitalOcean droplets (workers 2, 3, 4)
- 3-node Cassandra cluster using Docker Swarm
- Replication factor 3 for high availability
- Private network communication only
Table of Contents
- Overview
- Create Cassandra Worker Droplets
- Configure Workers and Join Swarm
- Deploy Cassandra Cluster
- Initialize Keyspaces
- Verify Cluster Health
- Cluster Management
- Troubleshooting
Overview
Architecture
Swarm Manager (existing):
└── mapleopentech-swarm-manager-1-prod (10.116.0.2)
    └── Controls cluster, no Cassandra
Existing Worker:
└── mapleopentech-swarm-worker-1-prod (10.116.0.3)
    └── Available for other services
Cassandra Cluster (NEW):
├── mapleopentech-swarm-worker-2-prod (10.116.0.4)
│   └── Cassandra Node 1
├── mapleopentech-swarm-worker-3-prod (10.116.0.5)
│   └── Cassandra Node 2
└── mapleopentech-swarm-worker-4-prod (10.116.0.6)
    └── Cassandra Node 3
Cassandra Configuration
- Version: Cassandra 5.0.4
- Cluster Name: mapleopentech-private-prod-cluster
- Replication Factor: 3 (each piece of data is stored on all 3 nodes)
- Data Center: datacenter1
- Heap Size: 512MB (reduced for 2GB RAM constraint)
- Communication: Private network only (secure)
⚠️ IMPORTANT - Memory Constraints: This configuration uses minimal 2GB RAM droplets with 512MB heap size. This is NOT recommended for production use. Expect:
- Limited performance (max ~1,000 writes/sec vs 10,000 with proper sizing)
- Potential stability issues under load
- Frequent garbage collection pauses
- Limited concurrent connection capacity
For production use, upgrade to 8GB RAM droplets with 2GB heap size.
Why 3 Nodes?
- High Availability: Cluster survives 1 node failure
- Replication Factor 3: Every piece of data stored on all 3 nodes
- Read Performance: Queries can hit any node
- Write Performance: Writes distributed across cluster
- Production Standard: Minimum for HA Cassandra
Create Cassandra Worker Droplets
Step 1: Create Worker 2 (Cassandra Node 1)
From DigitalOcean Dashboard:
- Go to https://cloud.digitalocean.com/
- Click Create → Droplets
Droplet Configuration:
| Setting | Value |
|---|---|
| Region | Toronto 1 (TOR1) - SAME as existing |
| Image | Ubuntu 24.04 LTS x64 |
| Droplet Type | Regular Intel |
| CPU Options | 1 vCPU, 2 GB RAM ($12/month) |
| Storage | 50 GB SSD |
| VPC | default-tor1 (auto-selected) |
| SSH Key | Select your key |
| Hostname | mapleopentech-swarm-worker-2-prod |
| Tags | production, cassandra, database |
Click Create Droplet and wait 60 seconds.
✅ Checkpoint - Save to .env:
# On your local machine:
SWARM_WORKER_2_HOSTNAME=mapleopentech-swarm-worker-2-prod
SWARM_WORKER_2_PUBLIC_IP=159.65.123.47 # Your public IP
SWARM_WORKER_2_PRIVATE_IP=10.116.0.4 # Your private IP
CASSANDRA_NODE_1_IP=10.116.0.4 # Same as private IP
Step 2: Create Worker 3 (Cassandra Node 2)
Repeat with these values:
| Setting | Value |
|---|---|
| Hostname | mapleopentech-swarm-worker-3-prod |
| All other settings | Same as Worker 2 |
✅ Checkpoint - Save to .env:
SWARM_WORKER_3_HOSTNAME=mapleopentech-swarm-worker-3-prod
SWARM_WORKER_3_PUBLIC_IP=159.65.123.48 # Your public IP
SWARM_WORKER_3_PRIVATE_IP=10.116.0.5 # Your private IP
CASSANDRA_NODE_2_IP=10.116.0.5 # Same as private IP
Step 3: Create Worker 4 (Cassandra Node 3)
Repeat with these values:
| Setting | Value |
|---|---|
| Hostname | mapleopentech-swarm-worker-4-prod |
| All other settings | Same as Worker 2 |
✅ Checkpoint - Save to .env:
SWARM_WORKER_4_HOSTNAME=mapleopentech-swarm-worker-4-prod
SWARM_WORKER_4_PUBLIC_IP=159.65.123.49 # Your public IP
SWARM_WORKER_4_PRIVATE_IP=10.116.0.6 # Your private IP
CASSANDRA_NODE_3_IP=10.116.0.6 # Same as private IP
Step 4: Verify All Droplets in Same VPC
- Go to Networking → VPC → Click default-tor1
- You should see 5 droplets total:
- mapleopentech-swarm-manager-1-prod (10.116.0.2)
- mapleopentech-swarm-worker-1-prod (10.116.0.3)
- mapleopentech-swarm-worker-2-prod (10.116.0.4)
- mapleopentech-swarm-worker-3-prod (10.116.0.5)
- mapleopentech-swarm-worker-4-prod (10.116.0.6)
Configure Workers and Join Swarm
Follow these steps for EACH of the 3 new workers (workers 2, 3, 4).
Worker 2 Setup
Step 1: Initial SSH as Root
# SSH to Worker 2
ssh root@159.65.123.47 # Replace with YOUR worker 2 public IP
# You should see: root@mapleopentech-swarm-worker-2-prod:~#
Step 2: System Updates and Create Admin User
# Update system
apt update && apt upgrade -y
# Install essentials
apt install -y curl wget apt-transport-https ca-certificates gnupg lsb-release
# Create dockeradmin user
adduser dockeradmin
# Use the SAME password as other nodes
# Add to sudo group
usermod -aG sudo dockeradmin
# Copy SSH keys
rsync --archive --chown=dockeradmin:dockeradmin ~/.ssh /home/dockeradmin
Step 3: Secure SSH Configuration
# Edit SSH config
vi /etc/ssh/sshd_config
# Update these lines:
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
LoginGraceTime 60
# Save and restart SSH
systemctl restart ssh
Step 4: Reconnect as dockeradmin
# Exit root session
exit
# SSH back as dockeradmin
ssh dockeradmin@159.65.123.47 # Replace with YOUR worker 2 public IP
# You should see: dockeradmin@mapleopentech-swarm-worker-2-prod:~$
Step 5: Install Docker
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Add dockeradmin to docker group
sudo usermod -aG docker dockeradmin
newgrp docker
# Verify
docker --version
# Enable Docker
sudo systemctl enable docker
sudo systemctl status docker
# Press 'q' to exit
Step 6: Configure Firewall
# Install UFW
sudo apt install ufw -y
# Allow SSH
sudo ufw allow 22/tcp
# Allow Docker Swarm ports (replace with YOUR VPC subnet from .env)
sudo ufw allow from 10.116.0.0/16 to any port 2377 proto tcp
sudo ufw allow from 10.116.0.0/16 to any port 7946
sudo ufw allow from 10.116.0.0/16 to any port 4789 proto udp
# Allow Cassandra ports (private network only)
# 7000: Inter-node communication
# 7001: Inter-node communication (TLS)
# 9042: CQL native transport (client connections)
sudo ufw allow from 10.116.0.0/16 to any port 7000 proto tcp
sudo ufw allow from 10.116.0.0/16 to any port 7001 proto tcp
sudo ufw allow from 10.116.0.0/16 to any port 9042 proto tcp
# Enable firewall
sudo ufw --force enable
# Check status
sudo ufw status verbose
Step 7: Join Docker Swarm
# Use the join command from Step 8 of 01_init_docker_swarm.md
# Replace with YOUR actual token and manager private IP:
docker swarm join --token SWMTKN-1-4abc123xyz789verylongtoken 10.116.0.2:2377
# Expected output:
# This node joined a swarm as a worker.
✅ Worker 2 complete! Repeat Steps 1-7 for Workers 3 and 4.
Worker 3 Setup
Repeat Steps 1-7 above, replacing:
- Public IP: Use Worker 3's public IP (159.65.123.48 example)
- Hostname: mapleopentech-swarm-worker-3-prod
Worker 4 Setup
Repeat Steps 1-7 above, replacing:
- Public IP: Use Worker 4's public IP (159.65.123.49 example)
- Hostname: mapleopentech-swarm-worker-4-prod
Deploy Cassandra Cluster
Step 1: Verify All Workers Joined
From your manager node:
# SSH to manager
ssh dockeradmin@159.65.123.45 # Your manager's public IP
# List all swarm nodes
docker node ls
# Expected output (5 nodes total):
# ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
# abc123... * mapleopentech-swarm-manager-1-prod Ready Active Leader
# def456... mapleopentech-swarm-worker-1-prod Ready Active
# ghi789... mapleopentech-swarm-worker-2-prod Ready Active
# jkl012... mapleopentech-swarm-worker-3-prod Ready Active
# mno345... mapleopentech-swarm-worker-4-prod Ready Active
Step 2: Label Cassandra Nodes
Apply labels so Cassandra services deploy to correct nodes:
# Label Worker 2 as Cassandra Node 1
docker node update --label-add cassandra=node1 mapleopentech-swarm-worker-2-prod
# Label Worker 3 as Cassandra Node 2
docker node update --label-add cassandra=node2 mapleopentech-swarm-worker-3-prod
# Label Worker 4 as Cassandra Node 3
docker node update --label-add cassandra=node3 mapleopentech-swarm-worker-4-prod
# Verify labels
docker node inspect mapleopentech-swarm-worker-2-prod --format '{{.Spec.Labels}}'
# Should show: map[cassandra:node1]
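To check all three labels in one pass, a small convenience loop (same inspect command as above) can be used:

```shell
# Verify the cassandra label on all three workers at once
for i in 2 3 4; do
  NODE="mapleopentech-swarm-worker-${i}-prod"
  echo -n "$NODE: "
  docker node inspect "$NODE" --format '{{.Spec.Labels}}'
done
```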
Step 3: Create Docker Stack File
On your manager, create the Cassandra stack:
# Create directory for stack files
mkdir -p ~/stacks
cd ~/stacks
# Create Cassandra stack file
vi cassandra-stack.yml
Copy and paste the following:
version: '3.8'

networks:
  mapleopentech-private-prod:
    external: true

volumes:
  cassandra-1-data:
  cassandra-2-data:
  cassandra-3-data:

services:
  cassandra-1:
    image: cassandra:5.0.4
    hostname: cassandra-1
    networks:
      - mapleopentech-private-prod
    environment:
      - CASSANDRA_CLUSTER_NAME=mapleopentech-private-prod-cluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_SEEDS=cassandra-1,cassandra-2,cassandra-3
      - MAX_HEAP_SIZE=512M
      - HEAP_NEWSIZE=128M
    volumes:
      - cassandra-1-data:/var/lib/cassandra
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.cassandra == node1
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
    healthcheck:
      test: ["CMD-SHELL", "cqlsh -e 'describe cluster' || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s

  cassandra-2:
    image: cassandra:5.0.4
    hostname: cassandra-2
    networks:
      - mapleopentech-private-prod
    environment:
      - CASSANDRA_CLUSTER_NAME=mapleopentech-private-prod-cluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_SEEDS=cassandra-1,cassandra-2,cassandra-3
      - MAX_HEAP_SIZE=512M
      - HEAP_NEWSIZE=128M
    volumes:
      - cassandra-2-data:/var/lib/cassandra
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.cassandra == node2
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
    healthcheck:
      test: ["CMD-SHELL", "cqlsh -e 'describe cluster' || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s

  cassandra-3:
    image: cassandra:5.0.4
    hostname: cassandra-3
    networks:
      - mapleopentech-private-prod
    environment:
      - CASSANDRA_CLUSTER_NAME=mapleopentech-private-prod-cluster
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_SEEDS=cassandra-1,cassandra-2,cassandra-3
      - MAX_HEAP_SIZE=512M
      - HEAP_NEWSIZE=128M
    volumes:
      - cassandra-3-data:/var/lib/cassandra
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.cassandra == node3
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
    healthcheck:
      test: ["CMD-SHELL", "cqlsh -e 'describe cluster' || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 120s
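Since docker stack deploy has no dry-run mode, it can help to confirm the file parses as valid YAML before deploying. A minimal sketch, assuming Python 3 with PyYAML is available on the manager:

```shell
# Parse the stack file; any indentation or syntax error raises an exception
python3 -c "import yaml; yaml.safe_load(open('cassandra-stack.yml')); print('YAML OK')"
```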
Step 4: Create Shared Overlay Network
Before deploying any services, create the shared mapleopentech-private-prod network that all services will use:
# Create the mapleopentech-private-prod overlay network
docker network create \
--driver overlay \
--attachable \
mapleopentech-private-prod
# Verify it was created
docker network ls | grep mapleopentech-private-prod
# Should show:
# abc123... mapleopentech-private-prod overlay swarm
What is this network for?
- Shared by all Maple services (Cassandra, Redis, Go backend, etc.)
- Enables private communication between services
- Services can reach each other by service name (e.g., redis, cassandra-1)
- No public internet exposure
Step 5: Create Deployment Script
Create the sequential deployment script to avoid race conditions:
# Create the deployment script
vi deploy-cassandra.sh
Copy and paste the following script:
#!/bin/bash
#
# Cassandra Cluster Sequential Deployment Script
# This script deploys Cassandra nodes sequentially to avoid race conditions
# during cluster formation.
#
set -e

STACK_NAME="cassandra"
STACK_FILE="cassandra-stack.yml"

echo "=== Cassandra Cluster Sequential Deployment ==="
echo ""

# Check if stack file exists
if [ ! -f "$STACK_FILE" ]; then
    echo "ERROR: $STACK_FILE not found in current directory"
    exit 1
fi

echo "Step 1: Deploying cassandra-1 (seed node)..."
docker stack deploy -c "$STACK_FILE" "$STACK_NAME"

# Scale down cassandra-2 and cassandra-3 temporarily
docker service scale "${STACK_NAME}_cassandra-2=0" > /dev/null 2>&1
docker service scale "${STACK_NAME}_cassandra-3=0" > /dev/null 2>&1

echo "Waiting for cassandra-1 to become healthy (this takes ~5-8 minutes)..."
echo "Checking every 30 seconds..."

# Wait for cassandra-1 to be running
COUNTER=0
MAX_WAIT=20  # 20 * 30 seconds = 10 minutes max
while [ $COUNTER -lt $MAX_WAIT ]; do
    REPLICAS=$(docker service ls --filter "name=${STACK_NAME}_cassandra-1" --format "{{.Replicas}}")
    if [ "$REPLICAS" = "1/1" ]; then
        echo "✓ cassandra-1 is running"
        # Give it extra time to fully initialize
        echo "Waiting additional 2 minutes for cassandra-1 to fully initialize..."
        sleep 120
        break
    fi
    echo "  cassandra-1 status: $REPLICAS (waiting...)"
    sleep 30
    COUNTER=$((COUNTER + 1))
done

if [ $COUNTER -eq $MAX_WAIT ]; then
    echo "ERROR: cassandra-1 failed to start within 10 minutes"
    echo "Check logs with: docker service logs ${STACK_NAME}_cassandra-1"
    exit 1
fi

echo ""
echo "Step 2: Starting cassandra-2..."
docker service scale "${STACK_NAME}_cassandra-2=1"
echo "Waiting for cassandra-2 to become healthy (this takes ~5-8 minutes)..."

COUNTER=0
while [ $COUNTER -lt $MAX_WAIT ]; do
    REPLICAS=$(docker service ls --filter "name=${STACK_NAME}_cassandra-2" --format "{{.Replicas}}")
    if [ "$REPLICAS" = "1/1" ]; then
        echo "✓ cassandra-2 is running"
        echo "Waiting additional 2 minutes for cassandra-2 to join cluster..."
        sleep 120
        break
    fi
    echo "  cassandra-2 status: $REPLICAS (waiting...)"
    sleep 30
    COUNTER=$((COUNTER + 1))
done

if [ $COUNTER -eq $MAX_WAIT ]; then
    echo "ERROR: cassandra-2 failed to start within 10 minutes"
    echo "Check logs with: docker service logs ${STACK_NAME}_cassandra-2"
    exit 1
fi

echo ""
echo "Step 3: Starting cassandra-3..."
docker service scale "${STACK_NAME}_cassandra-3=1"
echo "Waiting for cassandra-3 to become healthy (this takes ~5-8 minutes)..."

COUNTER=0
while [ $COUNTER -lt $MAX_WAIT ]; do
    REPLICAS=$(docker service ls --filter "name=${STACK_NAME}_cassandra-3" --format "{{.Replicas}}")
    if [ "$REPLICAS" = "1/1" ]; then
        echo "✓ cassandra-3 is running"
        echo "Waiting additional 2 minutes for cassandra-3 to join cluster..."
        sleep 120
        break
    fi
    echo "  cassandra-3 status: $REPLICAS (waiting...)"
    sleep 30
    COUNTER=$((COUNTER + 1))
done

if [ $COUNTER -eq $MAX_WAIT ]; then
    echo "ERROR: cassandra-3 failed to start within 10 minutes"
    echo "Check logs with: docker service logs ${STACK_NAME}_cassandra-3"
    exit 1
fi

echo ""
echo "=== Deployment Complete ==="
echo ""
echo "All 3 Cassandra nodes should now be running and forming a cluster."
echo ""
echo "Verify cluster status by SSH'ing to any worker node and running:"
echo "  docker exec -it \$(docker ps -q --filter \"name=cassandra\") nodetool status"
echo ""
echo "You should see 3 nodes with status 'UN' (Up Normal)."
echo ""
Make it executable:
chmod +x deploy-cassandra.sh
Step 6: Deploy Cassandra Cluster Sequentially
⚠️ CRITICAL - READ THIS BEFORE DEPLOYING ⚠️
DO NOT use docker stack deploy -c cassandra-stack.yml cassandra directly!
Why? This creates a race condition: all 3 nodes start simultaneously, try to connect to each other before they're ready, give up, and form separate single-node clusters instead of one 3-node cluster. This is a classic distributed systems problem.
What happens if you do? Each node will run independently. Running nodetool status on any node will show only 1 node instead of 3. The cluster will appear broken.
The fix? Use the sequential deployment script below, which starts nodes one at a time:
ALWAYS use the deployment script:
# Run the sequential deployment script
./deploy-cassandra.sh
What this script does:
- Deploys cassandra-1 first and waits for it to be fully healthy (~5-8 minutes)
- Starts cassandra-2 and waits for it to join the cluster (~5-8 minutes)
- Starts cassandra-3 and waits for it to join the cluster (~5-8 minutes)
- Total deployment time: 15-25 minutes
Expected output:
=== Cassandra Cluster Sequential Deployment ===
Step 1: Deploying cassandra-1 (seed node)...
Creating service cassandra_cassandra-1
Creating service cassandra_cassandra-2
Creating service cassandra_cassandra-3
cassandra_cassandra-2 scaled to 0
cassandra_cassandra-3 scaled to 0
Waiting for cassandra-1 to become healthy (this takes ~5-8 minutes)...
Checking every 30 seconds...
cassandra-1 status: 0/1 (waiting...)
cassandra-1 status: 1/1 (waiting...)
✓ cassandra-1 is running
Waiting additional 2 minutes for cassandra-1 to fully initialize...
Step 2: Starting cassandra-2...
cassandra_cassandra-2 scaled to 1
Waiting for cassandra-2 to become healthy (this takes ~5-8 minutes)...
cassandra-2 status: 0/1 (waiting...)
cassandra-2 status: 1/1 (waiting...)
✓ cassandra-2 is running
Waiting additional 2 minutes for cassandra-2 to join cluster...
Step 3: Starting cassandra-3...
cassandra_cassandra-3 scaled to 1
Waiting for cassandra-3 to become healthy (this takes ~5-8 minutes)...
cassandra-3 status: 0/1 (waiting...)
cassandra-3 status: 1/1 (waiting...)
✓ cassandra-3 is running
Waiting additional 2 minutes for cassandra-3 to join cluster...
=== Deployment Complete ===
All 3 Cassandra nodes should now be running and forming a cluster.
If the script fails, check the service logs:
docker service logs cassandra_cassandra-1
docker service logs cassandra_cassandra-2
docker service logs cassandra_cassandra-3
Initialize Keyspaces
Step 1: Connect to Cassandra Node 1
# Get the node where cassandra-1 is running
docker service ps cassandra_cassandra-1 --format "{{.Node}}"
# Output: mapleopentech-swarm-worker-2-prod
# SSH to that worker
ssh dockeradmin@10.116.0.4 # Private IP of worker 2
# Find container ID
CONTAINER_ID=$(docker ps --filter "name=cassandra_cassandra-1" --format "{{.ID}}")
# Open CQL shell
docker exec -it $CONTAINER_ID cqlsh
Step 2: Create Keyspaces
-- MaplePress Backend
CREATE KEYSPACE IF NOT EXISTS maplepress
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
}
AND DURABLE_WRITES = true;
-- MapleFile Backend
CREATE KEYSPACE IF NOT EXISTS maplefile
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
}
AND DURABLE_WRITES = true;
-- mapleopentech Backend
CREATE KEYSPACE IF NOT EXISTS mapleopentech
WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
}
AND DURABLE_WRITES = true;
-- Verify
DESCRIBE KEYSPACES;
-- Exit CQL shell
exit
Expected output should show your keyspaces:
maplepress maplefile mapleopentech system system_auth system_distributed system_schema system_traces system_views system_virtual_schema
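The same check can be run non-interactively from the worker hosting cassandra-1, which is handy for scripting. A sketch using the container lookup shown earlier:

```shell
# Confirm each application keyspace appears in DESCRIBE KEYSPACES output
CONTAINER_ID=$(docker ps --filter "name=cassandra_cassandra-1" --format "{{.ID}}")
for KS in maplepress maplefile mapleopentech; do
  if docker exec "$CONTAINER_ID" cqlsh -e "DESCRIBE KEYSPACES;" | grep -qw "$KS"; then
    echo "keyspace $KS: OK"
  else
    echo "keyspace $KS: MISSING"
  fi
done
```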
Verify Cluster Health
Step 1: Check Cluster Status
From inside cassandra-1 container:
# If not already in container:
CONTAINER_ID=$(docker ps --filter "name=cassandra_cassandra-1" --format "{{.ID}}")
docker exec -it $CONTAINER_ID bash
# Check cluster status
nodetool status
# Expected output:
# Datacenter: datacenter1
# =======================
# Status=Up/Down
# |/ State=Normal/Leaving/Joining/Moving
# -- Address Load Tokens Owns Host ID Rack
# UN 10.116.0.4 125 KiB 16 100.0% abc123... rack1
# UN 10.116.0.5 120 KiB 16 100.0% def456... rack1
# UN 10.116.0.6 118 KiB 16 100.0% ghi789... rack1
What to verify:
- ✅ All 3 nodes show UN (Up and Normal)
- ✅ Each node has an IP from your private network (10.116.0.x)
- ✅ Load is distributed
- ✅ Owns shows roughly 100% (data is replicated everywhere with RF=3)
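This verification can also be scripted for monitoring: in nodetool status output, healthy nodes begin their line with UN, so counting those lines is a simple proxy for cluster health. Run inside any Cassandra container:

```shell
# Healthy when all 3 nodes report UN (Up/Normal)
UP_NODES=$(nodetool status | grep -c '^UN')
if [ "$UP_NODES" -eq 3 ]; then
  echo "cluster healthy: 3/3 nodes UN"
else
  echo "WARNING: only $UP_NODES/3 nodes are UN"
fi
```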
Step 2: Test Write/Read
Still in cassandra-1 container:
# Open CQL shell
cqlsh
-- Note: cqlsh does not accept '#' comments; use '--' inside the CQL shell
-- Create test keyspace
CREATE KEYSPACE IF NOT EXISTS test
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};
USE test;
-- Create test table
CREATE TABLE IF NOT EXISTS users (
  user_id UUID PRIMARY KEY,
  username TEXT,
  email TEXT
);
-- Insert test data
INSERT INTO users (user_id, username, email)
VALUES (uuid(), 'testuser', 'test@example.com');
-- Read data
SELECT * FROM users;
-- Expected output:
--  user_id           | email            | username
-- -------------------+------------------+----------
--  abc123-def456-... | test@example.com | testuser
-- Exit cqlsh, then exit the container
exit
exit
Step 3: Verify Replication
Connect to Node 2 and verify data is there:
# SSH to worker 3 (Node 2)
ssh dockeradmin@10.116.0.5
# Find cassandra-2 container
CONTAINER_ID=$(docker ps --filter "name=cassandra_cassandra-2" --format "{{.ID}}")
# Connect and query
docker exec -it $CONTAINER_ID cqlsh -e "SELECT * FROM test.users;"
# Should see the same test data!
# This proves replication is working.
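You can also exercise quorum consistency explicitly. With RF=3, QUORUM requires floor(3/2)+1 = 2 of the 3 replicas to respond, so the read below should still succeed even with one node down. A sketch, reusing the cassandra-2 container lookup from above:

```shell
# QUORUM needs 2 of 3 replicas with RF=3; this read survives one node failure
CONTAINER_ID=$(docker ps --filter "name=cassandra_cassandra-2" --format "{{.ID}}")
docker exec -it "$CONTAINER_ID" cqlsh -e "CONSISTENCY QUORUM; SELECT * FROM test.users;"
```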
Step 4: Save Connection Details
✅ Final Checkpoint - Update .env:
# On your local machine, add:
CASSANDRA_CLUSTER_NAME=mapleopentech-private-prod-cluster
CASSANDRA_DC=datacenter1
CASSANDRA_REPLICATION_FACTOR=3
# Connection endpoints (any node can be used)
CASSANDRA_CONTACT_POINTS=10.116.0.4,10.116.0.5,10.116.0.6
CASSANDRA_CQL_PORT=9042
# For application connections (use private IPs)
CASSANDRA_NODE_1_IP=10.116.0.4
CASSANDRA_NODE_2_IP=10.116.0.5
CASSANDRA_NODE_3_IP=10.116.0.6
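With those values saved, a quick reachability check from any droplet on the VPC can be sketched as follows (assumes nc from netcat is installed and that your .env uses the variable names above):

```shell
# Test that each Cassandra node answers on the CQL port (9042)
source .env
for NODE in "$CASSANDRA_NODE_1_IP" "$CASSANDRA_NODE_2_IP" "$CASSANDRA_NODE_3_IP"; do
  if nc -z -w 3 "$NODE" "$CASSANDRA_CQL_PORT"; then
    echo "$NODE: reachable"
  else
    echo "$NODE: UNREACHABLE"
  fi
done
```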
Cluster Management
Restarting the Cassandra Cluster
To restart all Cassandra nodes:
# On manager node
docker service update --force cassandra_cassandra-1
docker service update --force cassandra_cassandra-2
docker service update --force cassandra_cassandra-3
# Wait 5-8 minutes for all nodes to restart
# Then verify cluster health (run this from any worker node, since no
# Cassandra containers run on the manager):
docker exec -it $(docker ps -q --filter "name=cassandra") nodetool status
To restart a single node:
# Restart just one service
docker service update --force cassandra_cassandra-1
# Wait for it to rejoin the cluster
# Check status from any worker
docker exec -it $(docker ps -q --filter "name=cassandra") nodetool status
Shutting Down the Cassandra Cluster
To stop the entire stack (keeps data):
# On manager node
docker stack rm cassandra
# Services will be removed but volumes persist
# Data is safe and can be restored later
To verify shutdown:
# On manager node - check that services are gone
docker stack ls
# cassandra should not appear
# Volumes are on worker nodes, not manager
# SSH to each worker to verify volumes still exist (data is safe):
# On worker-2:
ssh dockeradmin@<worker-2-ip>
docker volume ls | grep cassandra
# Should show: cassandra_cassandra-1-data
exit
# On worker-3:
ssh dockeradmin@<worker-3-ip>
docker volume ls | grep cassandra
# Should show: cassandra_cassandra-2-data
exit
# On worker-4:
ssh dockeradmin@<worker-4-ip>
docker volume ls | grep cassandra
# Should show: cassandra_cassandra-3-data
exit
To restart after shutdown:
# Use the deployment script again
cd ~/stacks
./deploy-cassandra.sh
# Your data will be intact
Removing All Cassandra Data (Fresh Start)
⚠️ WARNING: This PERMANENTLY deletes all data. Use only when starting from scratch.
IMPORTANT: Volumes are stored on the worker nodes, not the manager node. You must SSH to each worker to delete them.
# Step 1: Remove the stack (from manager node)
docker stack rm cassandra
# Step 2: Wait for services to stop (30-60 seconds)
watch docker service ls
# Press Ctrl+C when cassandra services are gone
# Step 3: SSH to EACH worker and remove volumes (THIS DELETES ALL DATA!)
# On worker-2 (cassandra-1 node)
ssh dockeradmin@<worker-2-ip>
docker volume ls | grep cassandra # Verify volume exists
docker volume rm cassandra_cassandra-1-data
exit
# On worker-3 (cassandra-2 node)
ssh dockeradmin@<worker-3-ip>
docker volume ls | grep cassandra # Verify volume exists
docker volume rm cassandra_cassandra-2-data
exit
# On worker-4 (cassandra-3 node)
ssh dockeradmin@<worker-4-ip>
docker volume ls | grep cassandra # Verify volume exists
docker volume rm cassandra_cassandra-3-data
exit
# Step 4: Deploy fresh cluster (from manager node)
cd ~/stacks
./deploy-cassandra.sh
# You now have a fresh cluster with no data
# You'll need to recreate keyspaces and tables
Why volumes are on worker nodes:
- Docker Swarm creates volumes on the nodes where containers actually run
- Manager node only orchestrates - it doesn't store data
- Each worker node has its own volume for the Cassandra container running on it
When to use this:
- Testing deployment from scratch
- Recovering from corrupted data
- Major version upgrades requiring fresh install
- Development/staging environments
When NOT to use this:
- Production environments (use backups and restore instead)
- When you just need to restart nodes
- When troubleshooting connectivity issues
Scaling Considerations
Can you scale to more than 3 nodes?
Yes, but you'll need to:
- Create additional worker droplets
- Update cassandra-stack.yml to add cassandra-4, cassandra-5, etc.
- Update the deployment script
- Run nodetool rebuild on new nodes
Recommended minimum: 3 nodes
Recommended maximum with 2GB RAM: 3-5 nodes
For production with proper 8GB RAM droplets, 5-7 nodes is common for large deployments.
Troubleshooting
Problem: Nodes Not Joining Cluster (Race Condition)
Symptom: Each node shows only itself when running nodetool status - no 3-node cluster formed.
Root Cause: If you deployed using docker stack deploy directly instead of the deployment script, all 3 nodes started simultaneously. They each tried to connect to the seed nodes before the others were ready, gave up, and formed separate single-node clusters.
Solution - Force Rolling Restart:
# On manager node, force update all services (triggers restart)
docker service update --force cassandra_cassandra-1
docker service update --force cassandra_cassandra-2
docker service update --force cassandra_cassandra-3
# Wait 5-8 minutes for each to restart and discover each other
# Then verify cluster from any worker:
docker exec -it $(docker ps -q --filter "name=cassandra") nodetool status
# You should now see all 3 nodes with UN status
Prevention: Always use the deploy-cassandra.sh script for initial deployment to avoid this race condition.
Problem: Nodes Not Joining Cluster (Other Causes)
Symptom: nodetool status shows only 1 node, or nodes show DN (Down)
Solutions:
- Check firewall allows Cassandra ports:
  # On each worker:
  sudo ufw status verbose | grep 7000
  sudo ufw status verbose | grep 9042
  # Should see rules allowing from 10.116.0.0/16 (your VPC subnet)
- Verify seeds configuration:
  # Check service environment
  docker service inspect cassandra_cassandra-1 --format '{{.Spec.TaskTemplate.ContainerSpec.Env}}'
  # Should see: CASSANDRA_SEEDS=cassandra-1,cassandra-2,cassandra-3
- Check inter-node connectivity:
  # From the cassandra-1 container (install tools first):
  apt-get update && apt-get install -y dnsutils netcat-openbsd
  # Test DNS resolution:
  nslookup cassandra-2
  nslookup cassandra-3
  # Test port connectivity:
  nc -zv cassandra-2 7000
  nc -zv cassandra-3 7000
  # Should all succeed
- Check service placement:
  # Verify services are on correct nodes
  docker service ps cassandra_cassandra-1
  docker service ps cassandra_cassandra-2
  docker service ps cassandra_cassandra-3
  # Each should be on its labeled node
Problem: Slow Startup
Symptom: Services stuck at 0/1 replicas for > 8 minutes
Solutions:
- Check logs for errors:
  docker service logs cassandra_cassandra-1 --tail 50
- Verify memory constraints:
  # With 2GB RAM, 512MB heap is configured
  # This is already minimal - slower startup is expected
  # Be patient and wait up to 10 minutes
- Check available memory on worker nodes:
  # SSH to a worker and check memory
  free -h
  # Should show at least 1.5GB available after OS overhead
- Check disk space:
  df -h
  # Should have plenty of free space
Problem: Can't Connect from Application
Symptom: Application can't reach Cassandra on port 9042
Solutions:
- Ensure the application is on the same overlay network:
  # In your application stack file:
  networks:
    mapleopentech-private-prod:
      external: true
- Test connectivity from the application container:
  # From app container:
  nc -zv cassandra-1 9042
  # Should connect
- Use service names in application config:
  # Use Docker Swarm service names (recommended):
  CASSANDRA_CONTACT_POINTS=cassandra-1,cassandra-2,cassandra-3
  # These resolve automatically on the overlay network
Problem: Node Shows UJ (Up, Joining)
Symptom: Node stuck in joining state
Solution:
# This is normal for first 5-10 minutes with reduced memory
# Wait longer and check again
# If stuck > 15 minutes, restart that service:
docker service update --force cassandra_cassandra-2
Problem: Out of Memory Errors
Symptom: Container keeps restarting, logs show "Out of memory" or "Cannot allocate memory"
Solution:
This means 2GB RAM is insufficient. You have two options:
- Upgrade droplets to 4GB RAM minimum (recommended):
  - Resize each worker droplet in DigitalOcean
  - Update the stack file to use MAX_HEAP_SIZE=1G and HEAP_NEWSIZE=256M
  - Redeploy with the sequential script: docker stack rm cassandra && ./deploy-cassandra.sh
- Further reduce heap (not recommended):
  # In cassandra-stack.yml, change to:
  - MAX_HEAP_SIZE=384M
  - HEAP_NEWSIZE=96M
  This will severely limit functionality and is not viable for any real workload.
Problem: Keyspace Already Exists Error
Symptom: AlreadyExists error when creating keyspaces
Solution:
This is normal if you've run the script before. The IF NOT EXISTS clause prevents actual errors. Your keyspaces are already created.
Installing Debugging Tools
When troubleshooting, you'll often need diagnostic tools inside the Cassandra containers. Here's how to install them:
Quick install of all useful debugging tools:
# SSH to any worker node, then run:
docker exec -it $(docker ps -q --filter "name=cassandra") bash -c "apt-get update && apt-get install -y dnsutils netcat-openbsd iputils-ping curl vim"
What this installs:
- dnsutils - DNS tools (nslookup, dig)
- netcat-openbsd - Network connectivity testing (nc)
- iputils-ping - Ping utility
- curl - HTTP testing
- vim - Text editor
Example debugging workflow:
# Get into a Cassandra container
docker exec -it $(docker ps -q --filter "name=cassandra") bash
# Install tools (only needed once per container)
apt-get update && apt-get install -y dnsutils netcat-openbsd
# Test DNS resolution
nslookup cassandra-1
nslookup cassandra-2
nslookup cassandra-3
# Test port connectivity
nc -zv cassandra-1 7000 # Gossip port
nc -zv cassandra-2 9042 # CQL port
nc -zv cassandra-3 7000 # Gossip port
# Check cluster status
nodetool status
# Exit container
exit
Note: These tools are NOT persistent. If a container restarts, you'll need to reinstall them. For permanent installation, you would need to create a custom Docker image.
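That last point can be sketched as a small derived image. The Dockerfile name and image tag below are hypothetical; you would swap the tag into the image: lines of cassandra-stack.yml. Note that in a multi-node Swarm, the custom image must be pushed to a registry the workers can pull from.

```shell
# Bake the debugging tools into a derived image (hypothetical tag name)
cat > Dockerfile.cassandra-debug <<'EOF'
FROM cassandra:5.0.4
RUN apt-get update && \
    apt-get install -y --no-install-recommends dnsutils netcat-openbsd iputils-ping curl vim && \
    rm -rf /var/lib/apt/lists/*
EOF
docker build -f Dockerfile.cassandra-debug -t cassandra-debug:5.0.4 .
# Then set "image: cassandra-debug:5.0.4" in cassandra-stack.yml and redeploy
```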
Next Steps
✅ You now have:
- 3-node Cassandra cluster with replication factor 3
- High availability (survives 1 node failure)
- Keyspaces ready for application data
- Swarm-managed containers with auto-restart
Next guides:
- Redis Setup - Cache layer for applications
- Application Deployment - Deploy backend services
- Monitoring - Set up cluster monitoring
Performance Notes
Hardware Sizing
Current setup (1 vCPU, 2GB RAM per node):
- NOT suitable for production - development/testing only
- Handles: ~500-1,000 writes/sec, ~5,000 reads/sec
- Storage: 50GB per node (150GB total raw, 50GB usable with RF=3)
- Expected issues: slow queries, GC pauses, limited connections
- Total cost: 3 nodes × $12 = $36/month
Recommended production setup (4 vCPU, 8GB RAM per node):
- Good for: Staging, small-to-medium production
- Handles: ~10,000 writes/sec, ~50,000 reads/sec
- Storage: 160GB per node (480GB total raw, 160GB usable with RF=3)
- Total cost: 3 nodes × $48 = $144/month
For larger production:
- Scale to 8 vCPU, 16GB RAM
- Add more workers (5-node, 7-node cluster)
- Use dedicated CPU droplets
Heap Size Tuning
Current: 512MB heap (with 2GB RAM total)
- Absolute minimum for Cassandra to run
- Expect frequent garbage collection
- Limited cache effectiveness
- Not recommended for production
Recommended configurations:
- 2GB RAM: 512MB heap (current - minimal)
- 4GB RAM: 1GB heap (small production)
- 8GB RAM: 2GB heap (recommended production)
- 16GB RAM: 4GB heap (high-traffic production)
Replication Factor
Current: RF=3 (recommended for production)
Options:
- RF=1: No redundancy, not recommended for production
- RF=2: Data survives the loss of 1 node, but QUORUM operations need both replicas, so availability gains are limited
- RF=3: Best for production, tolerates 1 failure safely
- RF=5: For mission-critical data (requires 5+ nodes)
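If you ever need to change a keyspace's replication factor after the fact, the usual sequence is an ALTER KEYSPACE followed by a repair so existing data is re-replicated to match the new RF. A sketch, using the maplepress keyspace as the example and the same container lookup the doc uses elsewhere:

```shell
# Change RF on an existing keyspace, then re-replicate existing data
CONTAINER_ID=$(docker ps -q --filter "name=cassandra")
docker exec "$CONTAINER_ID" cqlsh -e \
  "ALTER KEYSPACE maplepress WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};"
# Without a repair, data written before the change is not re-replicated
docker exec "$CONTAINER_ID" nodetool repair maplepress
```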
Upgrading to Production-Ready Configuration
If you started with 2GB RAM droplets and need to upgrade:
Step 1: Resize Droplets in DigitalOcean
- Go to each worker droplet (workers 2, 3, 4)
- Click Resize
- Select 8GB RAM / 4 vCPU plan
- Complete resize (droplets will reboot)
Step 2: Update Stack Configuration
SSH to manager and update the stack file:
ssh dockeradmin@<MANAGER_PUBLIC_IP>
cd ~/stacks
# Edit cassandra-stack.yml
vi cassandra-stack.yml
# Change these lines in ALL THREE services:
# FROM:
- MAX_HEAP_SIZE=512M
- HEAP_NEWSIZE=128M
# TO:
- MAX_HEAP_SIZE=2G
- HEAP_NEWSIZE=512M
Step 3: Redeploy
# Remove old stack
docker stack rm cassandra
# Wait for cleanup
sleep 30
# Deploy with new configuration
docker stack deploy -c cassandra-stack.yml cassandra
# Monitor startup
watch -n 2 'docker stack services cassandra'
Document Version: 1.1
Last Updated: November 3, 2025
Maintained By: Infrastructure Team
Changelog:
- v1.1 (Nov 3, 2025): Updated for 2GB RAM droplets with reduced heap (512MB) - NOT production ready
- v1.0 (Nov 3, 2025): Initial version with 8GB RAM droplets