monorepo/cloud/infrastructure/production/operations/HORIZONTAL_SCALING.md


Horizontal Scaling Operations Guide

Audience: DevOps Engineers, System Administrators
Last Updated: November 2025
Applies To: MapleFile Backend


Table of Contents

  1. Overview
  2. Understanding Scaling
  3. Prerequisites
  4. Scaling Up (Adding Replicas)
  5. Scaling Down (Removing Replicas)
  6. Adding Worker Nodes
  7. Monitoring Scaled Services
  8. Common Scenarios
  9. Troubleshooting
  10. Best Practices

Overview

What is Horizontal Scaling?

Horizontal scaling means adding more servers (replicas) to handle increased load, rather than making existing servers more powerful (vertical scaling).

Example:

  • Before: 1 server handling 100 requests/second
  • After: 3 servers each handling 33 requests/second
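The split above can be computed with integer division, giving any remainder to the last replica (which is why one server handles 34% in the diagram below). The helper name `share_per_replica` is illustrative, not part of any tool:

```shell
# share_per_replica TOTAL REPLICAS
# Prints each replica's share of TOTAL requests/second,
# giving the integer-division remainder to the last replica.
share_per_replica() {
  local total=$1 replicas=$2
  local base=$(( total / replicas ))
  local last=$(( total - base * (replicas - 1) ))
  local i
  for (( i = 1; i < replicas; i++ )); do
    printf '%d ' "$base"
  done
  printf '%d\n' "$last"
}

share_per_replica 100 3   # prints: 33 33 34
```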

Why Scale Horizontally?

  • Higher availability: If one server fails, others keep serving traffic
  • Better performance: Load distributed across multiple servers
  • Handle traffic spikes: Scale up during peak times, scale down during quiet times
  • Zero downtime deployments: Update servers one at a time

Current Architecture

Single-Server Setup (Current):

Worker-8: Backend (1 replica) + Cassandra + Redis
                    ↓
         100% of traffic

Multi-Server Setup (After Scaling):

Worker-8: Backend (replica 1) + Cassandra + Redis
Worker-10: Backend (replica 2)
Worker-11: Backend (replica 3)
          ↓           ↓           ↓
        33%         33%         34% of traffic

Understanding Scaling

Vertical vs Horizontal Scaling

Aspect      | Vertical Scaling                | Horizontal Scaling
------------|---------------------------------|------------------------------------
Method      | Bigger server                   | More servers
Cost        | Expensive (high-tier droplets)  | Cheaper (many small droplets)
Limit       | Hardware ceiling (max CPU/RAM)  | Effectively unlimited (add servers)
Downtime    | Required (resize server)        | Zero downtime
Complexity  | Simple                          | More complex (load balancing)
Failure     | Single point of failure         | High availability

Example:

  • Vertical: Upgrade from $12/mo (2 vCPU, 2GB RAM) to $48/mo (8 vCPU, 16GB RAM)
  • Horizontal: Add 3x $12/mo droplets = $36/mo total for 6 vCPU, 6GB RAM
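The arithmetic behind that comparison can be checked in a few lines. `fleet_summary` is a hypothetical helper using the droplet prices and sizes from the example above:

```shell
# fleet_summary COUNT PRICE VCPUS RAM_GB
# Prints total monthly cost, vCPUs, and RAM for COUNT identical droplets.
fleet_summary() {
  local count=$1 price=$2 vcpus=$3 ram=$4
  echo "\$$((count * price))/mo, $((count * vcpus)) vCPU, $((count * ram))GB RAM"
}

fleet_summary 3 12 2 2    # horizontal: three $12 droplets
fleet_summary 1 48 8 16   # vertical: one $48 droplet
```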

When to Scale

Scale up when:

  • CPU usage consistently above 70%
  • Memory usage consistently above 80%
  • Response times increasing
  • Error rates increasing
  • Traffic growing steadily

Scale down when:

  • CPU usage consistently below 30%
  • Memory usage consistently below 50%
  • Traffic decreased
  • Cost optimization needed
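The thresholds above can be condensed into a small decision helper. This is an illustrative sketch (the function name and the exact cutoffs of 70/30% CPU and 80/50% memory come straight from the lists above):

```shell
# scale_advice CPU_PERCENT MEM_PERCENT
# Scale up past 70% CPU or 80% memory; scale down under
# 30% CPU and 50% memory; otherwise hold.
scale_advice() {
  local cpu=$1 mem=$2
  if [ "$cpu" -gt 70 ] || [ "$mem" -gt 80 ]; then
    echo "scale-up"
  elif [ "$cpu" -lt 30 ] && [ "$mem" -lt 50 ]; then
    echo "scale-down"
  else
    echo "hold"
  fi
}

scale_advice 85 60   # scale-up
scale_advice 20 40   # scale-down
scale_advice 50 60   # hold
```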

How Docker Swarm Handles Scaling

Docker Swarm automatically:

  • Load balances traffic across all replicas
  • Health checks each replica
  • Restarts failed replicas
  • Distributes replicas across worker nodes
  • Updates replicas with zero downtime

Prerequisites

Before Scaling Up

Ensure your application supports horizontal scaling:

MapleFile Backend is Ready

MapleFile backend is designed for horizontal scaling:

  • Stateless: No local state (uses Cassandra/Redis for shared state)
  • Leader election: Scheduled tasks run only on one instance
  • Shared database: All replicas use same Cassandra cluster
  • Shared cache: All replicas use same Redis instance
  • Session storage: JWT tokens are stateless (no session store needed)

⚠️ Check Your Application

If you were scaling a different app, verify:

  • No local file storage (use S3 instead)
  • No in-memory sessions (use Redis instead)
  • No local caching (use Redis instead)
  • Database supports concurrent connections
  • No port conflicts (don't bind to host ports)

Scaling Up (Adding Replicas)

Method 1: Quick Scale (Same Worker)

Scale to multiple replicas on the same worker node.

Step 1: SSH to Manager

ssh dockeradmin@<MANAGER_IP>

Step 2: Scale the Service

# Scale MapleFile backend from 1 to 3 replicas
docker service scale maplefile_backend=3

# Or use update command
docker service update --replicas 3 maplefile_backend

Step 3: Monitor Scaling

# Watch replicas start
watch docker service ls

# Expected output:
# NAME                REPLICAS   IMAGE
# maplefile_backend   3/3        ...maplefile-backend:prod

3/3 means all 3 desired replicas are running (the REPLICAS column shows running/desired)
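You can parse that REPLICAS value in a script to check convergence. `replicas_healthy` is a hypothetical helper, not a Docker command:

```shell
# replicas_healthy "RUNNING/DESIRED"
# Exits 0 when all desired replicas are running, as reported in the
# REPLICAS column of `docker service ls`.
replicas_healthy() {
  local running=${1%/*} desired=${1#*/}
  [ "$running" -eq "$desired" ]
}

replicas_healthy "3/3" && echo converged
replicas_healthy "2/3" || echo degraded
```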

Step 4: Verify All Replicas Running

# Check where replicas are running
docker service ps maplefile_backend

# Output:
# NAME                  NODE       CURRENT STATE
# maplefile_backend.1   worker-8   Running 5 minutes ago
# maplefile_backend.2   worker-8   Running 30 seconds ago
# maplefile_backend.3   worker-8   Running 30 seconds ago

Step 5: Check Logs

# Check logs from all replicas
docker service logs maplefile_backend --tail 50

# Look for successful startup from each replica

Step 6: Test Load Balancing

# Make multiple requests - should be distributed across replicas
for i in {1..10}; do
  curl -s https://maplefile.ca/health
done

# Check logs to see different replicas handling requests
docker service logs maplefile_backend --tail 20

Method 2: Scale Across Multiple Workers

Scale replicas across different worker nodes for better availability.

Step 1: Add Worker Nodes (If Needed)

See Adding Worker Nodes section below.

Step 2: Label Worker Nodes

# Label worker-10 as backend node
docker node update --label-add maplefile-backend=true mapleopentech-swarm-worker-10-prod

# Label worker-11 as backend node
docker node update --label-add maplefile-backend=true mapleopentech-swarm-worker-11-prod

# Verify labels
docker node inspect mapleopentech-swarm-worker-10-prod --format '{{.Spec.Labels}}'

Step 3: Update Stack File

# Edit stack file
nano ~/stacks/maplefile-stack.yml

Change deployment configuration:

services:
  backend:
    deploy:
      replicas: 3  # Change from 1 to 3
      placement:
        constraints:
          # Remove single-node constraint
          - node.labels.maplefile-backend == true  # Now matches multiple workers
        preferences:
          # Spread replicas across different nodes
          - spread: node.hostname

Step 4: Redeploy Stack

cd ~/stacks
docker stack deploy -c maplefile-stack.yml maplefile

Step 5: Verify Distribution

# Check which nodes replicas are running on
docker service ps maplefile_backend --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}"

# Expected output (distributed across nodes):
# NAME                  NODE       CURRENT STATE
# maplefile_backend.1   worker-8   Running
# maplefile_backend.2   worker-10  Running
# maplefile_backend.3   worker-11  Running

Method 3: Auto-Scaling (Advanced)

Note: Docker Swarm doesn't have built-in auto-scaling. You would need to implement custom auto-scaling using:

  • Prometheus for metrics
  • Custom script to monitor CPU/memory
  • Script to scale service based on thresholds

Example auto-scale script:

#!/bin/bash
# auto-scale.sh - Example only, not production-ready

# Average CPU usage across all maplefile replicas
# (include the name in the format string so grep has something to match,
# and strip the trailing "%" before averaging)
CPU_AVG=$(docker stats --no-stream --format "{{.Name}} {{.CPUPerc}}" \
  | grep maplefile \
  | awk '{gsub(/%/, "", $2); sum += $2; count++} END {if (count) print sum / count; else print 0}')

CURRENT=$(docker service inspect maplefile_backend --format '{{.Spec.Mode.Replicated.Replicas}}')

# Scale up if CPU > 70%
if (( $(echo "$CPU_AVG > 70" | bc -l) )); then
  docker service scale maplefile_backend=$((CURRENT + 1))
  echo "Scaled up to $((CURRENT + 1)) replicas (CPU: ${CPU_AVG}%)"
# Scale down if CPU < 30% and more than 1 replica
elif (( $(echo "$CPU_AVG < 30" | bc -l) )) && [ "$CURRENT" -gt 1 ]; then
  docker service scale maplefile_backend=$((CURRENT - 1))
  echo "Scaled down to $((CURRENT - 1)) replicas (CPU: ${CPU_AVG}%)"
fi

Scaling Down (Removing Replicas)

When to Scale Down

Scale down to save costs when:

  • Traffic decreased
  • CPU/memory usage consistently low
  • Cost optimization needed
  • Testing showed fewer replicas handle load fine

Step 1: SSH to Manager

ssh dockeradmin@<MANAGER_IP>

Step 2: Scale Down Service

# Scale from 3 replicas to 1
docker service scale maplefile_backend=1

# Or update
docker service update --replicas 1 maplefile_backend

Step 3: Monitor Scaling Down

# Watch replicas stop
watch docker service ls

# Expected output:
# NAME                REPLICAS   IMAGE
# maplefile_backend   1/1        ...maplefile-backend:prod

Step 4: Verify Which Replica Was Kept

# Check which replica is still running
docker service ps maplefile_backend

# Output:
# NAME                  NODE       CURRENT STATE
# maplefile_backend.1   worker-8   Running 10 minutes ago
# maplefile_backend.2   worker-10  Shutdown 10 seconds ago
# maplefile_backend.3   worker-11  Shutdown 10 seconds ago

Step 5: Test Service Still Works

# Test endpoint
curl https://maplefile.ca/health

# Check logs
docker service logs maplefile_backend --tail 20

Adding Worker Nodes

When to Add Worker Nodes

Add worker nodes when:

  • Want to distribute backend across multiple servers
  • Current worker at capacity
  • Need better high availability
  • Planning for growth

Step 1: Create New DigitalOcean Droplet

From DigitalOcean dashboard or CLI:

# Create worker-10 droplet (Ubuntu 22.04, $12/mo)
doctl compute droplet create mapleopentech-swarm-worker-10-prod \
  --region nyc3 \
  --size s-2vcpu-2gb \
  --image ubuntu-22-04-x64 \
  --ssh-keys <your-ssh-key-id> \
  --tag-names production,swarm-worker,maplefile

# Get IP address
doctl compute droplet get mapleopentech-swarm-worker-10-prod --format PublicIPv4

Step 2: Install Docker on New Worker

# SSH to new worker
ssh root@<worker-10-ip>

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# Verify Docker installed
docker --version

Step 3: Join Worker to Swarm

On manager node:

# Get join token
ssh dockeradmin@<MANAGER_IP>
docker swarm join-token worker

# Output:
# docker swarm join --token SWMTKN-xxx <MANAGER_IP>:2377

On new worker:

# Join swarm (use token from above)
docker swarm join --token SWMTKN-xxx <MANAGER_IP>:2377

# Output:
# This node joined a swarm as a worker.

Step 4: Verify Worker Joined

On manager:

# List all nodes
docker node ls

# Output should include new worker:
# ID        HOSTNAME                             STATUS  AVAILABILITY  MANAGER STATUS
# xyz123    mapleopentech-swarm-manager-1-prod   Ready   Active        Leader
# abc456    mapleopentech-swarm-worker-8-prod    Ready   Active
# def789    mapleopentech-swarm-worker-10-prod   Ready   Active        ← New!

Step 5: Label New Worker

# Label worker-10 for backend workloads
docker node update --label-add maplefile-backend=true mapleopentech-swarm-worker-10-prod

# Verify label
docker node inspect mapleopentech-swarm-worker-10-prod --format '{{.Spec.Labels}}'

Step 6: Join to Private Network

Important: Workers must access Cassandra and Redis.

# Add worker-10 to maple-private-prod network
# This is done automatically when services start on the worker
# But verify connectivity:

# On worker-10, test Redis connectivity
# Note: standalone containers can only attach to an overlay network
# that was created with the --attachable flag
ssh root@<worker-10-ip>
docker run --rm --network maple-private-prod redis:7.0-alpine redis-cli -h redis ping

# Should output: PONG

Step 7: Scale Service to Use New Worker

# On manager
docker service update --replicas 2 maplefile_backend

# Check distribution
docker service ps maplefile_backend --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}"

# Should show replicas on both worker-8 and worker-10

Monitoring Scaled Services

Real-Time Monitoring

Watch service status:

# All services
watch docker service ls

# Specific service
watch 'docker service ps maplefile_backend --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}"'

Monitor resource usage:

# CPU and memory of all replicas
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep maplefile

# Repeated snapshots (live `docker stats` output doesn't pipe cleanly to grep)
watch 'docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep maplefile'

Check Load Distribution

See which replica handled request:

# Follow logs from all replicas
docker service logs -f maplefile_backend

# Filter for specific endpoint
docker service logs -f maplefile_backend | grep "/api/v1/users"

# You should see different replica IDs in logs

Prometheus Monitoring (If Configured)

Query metrics:

# Average CPU usage across all replicas
avg(rate(container_cpu_usage_seconds_total{service="maplefile_backend"}[5m]))

# Request rate per replica
sum(rate(http_requests_total{service="maplefile_backend"}[5m])) by (instance)

# P95 response time
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

Common Scenarios

Scenario 1: Handling Traffic Spike

Sudden traffic increase - need to scale quickly.

# SSH to manager
ssh dockeradmin@<MANAGER_IP>

# Check current load
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep maplefile

# Scale from 1 to 5 replicas immediately
docker service scale maplefile_backend=5

# Monitor scaling
watch docker service ls

# Wait for all replicas healthy (5/5)

# Verify load distributed
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep maplefile

# After traffic spike ends, scale back down
docker service scale maplefile_backend=2
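The "wait for all replicas healthy" step above can be scripted. The sketch below is illustrative: `wait_converged` and `fake_status` are hypothetical names, and the status command is passed in as an argument so the logic can be exercised without a live swarm (in production you might pass `docker service ls --filter name=maplefile_backend --format '{{.Replicas}}'`):

```shell
# wait_converged STATUS_CMD [MAX_TRIES]
# Polls STATUS_CMD (which must print "RUNNING/DESIRED") once per second
# until all replicas are running, or gives up after MAX_TRIES attempts.
wait_converged() {
  local cmd=$1 tries=${2:-30} status
  while [ "$tries" -gt 0 ]; do
    status=$($cmd)
    if [ "${status%/*}" = "${status#*/}" ]; then
      echo "converged ($status)"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "timed out (last: $status)"
  return 1
}

# Exercising the helper with a stub instead of a real docker command:
fake_status() { echo "5/5"; }
wait_converged fake_status 3   # prints: converged (5/5)
```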

Scenario 2: Planned Scaling for Event

You know a marketing campaign will increase traffic.

Day Before Event:

# Add worker nodes if needed (see Adding Worker Nodes section)

# Scale up gradually
docker service scale maplefile_backend=3

# Verify all healthy
docker service ps maplefile_backend

# Load test
# Run load tests to verify system handles expected traffic

During Event:

# Monitor continuously
watch docker service ls

# Scale up more if needed
docker service scale maplefile_backend=5

# Check logs for errors
docker service logs maplefile_backend --tail 100 | grep -i error

After Event:

# Scale back down gradually
docker service scale maplefile_backend=3

# Monitor for 1 hour

# Scale to normal
docker service scale maplefile_backend=1

Scenario 3: Zero-Downtime Deployment with Scaling

Deploy new version with zero downtime using scaled replicas.

# 1. Scale up to 3 replicas BEFORE deploying
docker service scale maplefile_backend=3

# Wait for all healthy
docker service ps maplefile_backend

# 2. Deploy new image
docker service update --image registry.digitalocean.com/ssp/maplefile-backend:prod maplefile_backend

# Docker Swarm will:
# - Update replica 1, wait for health check
# - Update replica 2, wait for health check
# - Update replica 3, wait for health check
# Always at least 2 replicas serving traffic

# 3. Monitor update
docker service ps maplefile_backend

# 4. After successful deployment, scale back down if desired
docker service scale maplefile_backend=1

Scenario 4: High Availability Setup

Run 3 replicas across 3 workers for maximum availability.

# Ensure 3 worker nodes labeled
docker node update --label-add maplefile-backend=true mapleopentech-swarm-worker-8-prod
docker node update --label-add maplefile-backend=true mapleopentech-swarm-worker-10-prod
docker node update --label-add maplefile-backend=true mapleopentech-swarm-worker-11-prod

# Update stack file for HA
nano ~/stacks/maplefile-stack.yml

Stack file HA configuration:

services:
  backend:
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.maplefile-backend == true
        preferences:
          # Spread across different nodes
          - spread: node.hostname
        max_replicas_per_node: 1  # Only 1 replica per node
      update_config:
        parallelism: 1  # Update 1 replica at a time
        delay: 10s
        failure_action: rollback
        monitor: 60s
        order: start-first  # Start new before stopping old

Deploy HA stack:

docker stack deploy -c maplefile-stack.yml maplefile

# Verify distribution
docker service ps maplefile_backend --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}"

# Should show 1 replica on each worker

Scenario 5: Cost Optimization

Running 3 replicas but only need 1 during off-peak hours.

Create scale-down script:

# Create script
cat > ~/stacks/scale-schedule.sh << 'EOF'
#!/bin/bash

HOUR=$(date +%H)

# Scale up during business hours (9 AM - 6 PM)
if [ $HOUR -ge 9 ] && [ $HOUR -lt 18 ]; then
  docker service scale maplefile_backend=3
  echo "$(date): Scaled to 3 replicas (business hours)"
# Scale down during off-peak (6 PM - 9 AM)
else
  docker service scale maplefile_backend=1
  echo "$(date): Scaled to 1 replica (off-peak)"
fi
EOF

chmod +x ~/stacks/scale-schedule.sh

Add to crontab:

# Run every hour
crontab -e

# Add (use the absolute path to the script, and a log file the cron user can write to):
0 * * * * /home/dockeradmin/stacks/scale-schedule.sh >> /home/dockeradmin/maplefile-scaling.log 2>&1
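The hour-window logic inside scale-schedule.sh can be pulled into a function and tested on its own. `replicas_for_hour` is a hypothetical name; the 9 AM-6 PM window matches the script above:

```shell
# replicas_for_hour HOUR
# Mirrors the cron schedule: 3 replicas during business hours
# (09:00-17:59), 1 replica otherwise.
replicas_for_hour() {
  local hour=$((10#$1))   # force base 10 so a leading zero ("09") parses correctly
  if [ "$hour" -ge 9 ] && [ "$hour" -lt 18 ]; then
    echo 3
  else
    echo 1
  fi
}

replicas_for_hour "$(date +%H)"   # what the schedule would pick right now
```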

Troubleshooting

Problem: Replica Won't Start

Symptom: Service shows 2/3 replicas (one missing)

Diagnosis:

# Check service tasks
docker service ps maplefile_backend --no-trunc

# Look for ERROR or FAILED states
# Common errors:
# - "no suitable node"
# - "resource constraints not met"
# - "starting container failed"

Solutions:

If "no suitable node":

# Check node availability
docker node ls

# Check placement constraints
docker service inspect maplefile_backend --format '{{.Spec.TaskTemplate.Placement}}'

# Fix: Add more worker nodes or adjust constraints

If "resource constraints":

# Check worker resources
docker node inspect <worker-name> --format '{{.Description.Resources}}'

# Fix: Add more memory/CPU or scale down other services

If "container failed to start":

# Check logs
docker service logs maplefile_backend --tail 100

# Fix: Resolve application error (database connection, etc.)

Problem: Uneven Load Distribution

Symptom: One replica handling more traffic than others

Diagnosis:

# Check CPU/memory per replica
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep maplefile

# Check request logs
docker service logs maplefile_backend | grep "HTTP request"

Causes:

  • External load balancer pinning connections
  • Long-lived connections (WebSockets)
  • Some replicas slower (different hardware)

Solution:

# Ensure using Docker Swarm's built-in load balancer (ingress network)
# Check service network mode
docker service inspect maplefile_backend --format '{{.Spec.EndpointSpec.Mode}}'

# Should be: vip (virtual IP for load balancing)

# If not, update service
docker service update --endpoint-mode vip maplefile_backend

Problem: Replica on Wrong Node

Symptom: Replica running on node without required labels

Diagnosis:

# Check where replicas are running
docker service ps maplefile_backend --format "table {{.Name}}\t{{.Node}}"

# Check node labels
docker node inspect <node-name> --format '{{.Spec.Labels}}'

Solution:

# Add label to node
docker node update --label-add maplefile-backend=true <node-name>

# Or force replica to move
docker service update --force maplefile_backend

Problem: Can't Scale Down

Symptom: docker service scale hangs or fails

Diagnosis:

# Check service update status
docker service inspect maplefile_backend --format '{{.UpdateStatus.State}}'

# Check for stuck tasks
docker service ps maplefile_backend --no-trunc

Solution:

# Cancel stuck update
docker service update --rollback maplefile_backend

# Force scale
docker service update --replicas 1 --force maplefile_backend

Problem: Leader Election Issues (Multiple Leaders)

Symptom: Scheduled tasks running multiple times

Diagnosis:

# Check logs for leader election messages
docker service logs maplefile_backend | grep -i "leader"

# Should see only one "Elected as leader"

Cause: Redis connection issues or split-brain

Solution:

# Restart all replicas to re-elect leader
docker service update --force maplefile_backend

# Verify single leader in logs
docker service logs maplefile_backend --tail 50 | grep -i "leader"

Best Practices

1. Start Small, Scale Gradually

# Don't go from 1 to 10 replicas immediately
# Scale gradually:
docker service scale maplefile_backend=2  # Test with 2
# Monitor for 30 minutes
docker service scale maplefile_backend=3  # Increase to 3
# Monitor for 30 minutes
docker service scale maplefile_backend=5  # Increase to 5

2. Always Scale Before Deploying

# Scale up for safer deployments
docker service scale maplefile_backend=3
docker service update --image ...new-image... maplefile_backend
# Can scale back down after deployment succeeds

3. Use Health Checks

Ensure stack file has health checks:

services:
  backend:
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 60s

4. Monitor Resource Usage

# Check BEFORE scaling
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# If CPU < 50%, probably don't need to scale yet
# If CPU > 70%, scale up
# If CPU > 90%, scale urgently

5. Document Scaling Decisions

Keep a scaling log:

## Scaling Log

### 2025-11-14 - Scaled to 3 Replicas
- **Reason:** Marketing campaign expected to 3x traffic
- **Duration:** 2025-11-14 to 2025-11-16
- **Command:** `docker service scale maplefile_backend=3`
- **Result:** Successfully handled 3x traffic, CPU avg 45%
- **Cost:** +$24/mo for 2 extra droplets

### 2025-11-16 - Scaled back to 1 Replica
- **Reason:** Campaign ended, traffic back to normal
- **Command:** `docker service scale maplefile_backend=1`
- **Result:** Single replica handling load fine, CPU avg 35%

6. Test Scaling in Non-Production First

If you have QA environment:

# Test scaling in QA
ssh qa-manager
docker service scale maplefile_backend_qa=3

# Verify works correctly
# - Load balancing
# - Leader election
# - Database connections
# - Performance

# Then apply to production
ssh dockeradmin@<MANAGER_IP>
docker service scale maplefile_backend=3

7. Plan for Database Connections

Each replica needs database connections:

# If you have 3 replicas with 2 connections each = 6 total connections
# Ensure Cassandra can handle this

# Check Cassandra connection limit (default: high)
# Check Redis connection limit (default: 10000)

# If scaling to 10+ replicas, verify database can handle connections
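The connection math above is worth scripting before a big scale-up. `connection_budget` is an illustrative helper (the 10000 Redis default comes from the comment above; verify your actual limits):

```shell
# connection_budget REPLICAS CONNS_PER_REPLICA LIMIT
# Prints the total connections the fleet will open and whether
# that fits within the database's connection LIMIT.
connection_budget() {
  local total=$(( $1 * $2 ))
  if [ "$total" -le "$3" ]; then
    echo "$total/$3 ok"
  else
    echo "$total/$3 over limit"
  fi
}

connection_budget 3 2 10000    # prints: 6/10000 ok
connection_budget 10 50 128    # prints: 500/128 over limit
```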

8. Consider Cost vs Performance

Calculate costs:

# Current: 1 replica on worker-8 ($12/mo)
# Total: $12/mo

# Scaled: 3 replicas across 3 workers ($12/mo each)
# Total: $36/mo (+$24/mo)

# Is the performance gain worth $24/mo?
# - If traffic justifies it: Yes
# - If just for redundancy: Maybe use 2 replicas instead

9. Use Placement Strategies

Spread across nodes for HA:

deploy:
  placement:
    preferences:
      - spread: node.hostname  # Spread across different nodes

Or pack onto fewer nodes for cost. Swarm has no built-in "pack" strategy, so to concentrate replicas, restrict placement to just the nodes you want used:

deploy:
  placement:
    constraints:
      - node.labels.maplefile-backend == true  # Label only the nodes to pack onto

10. Set Resource Limits

Prevent one replica from using all resources:

services:
  backend:
    deploy:
      resources:
        limits:
          memory: 1G     # Max 1GB per replica
          cpus: '0.5'    # Max 50% of 1 CPU
        reservations:
          memory: 512M   # Reserve 512MB
          cpus: '0.25'   # Reserve 25% of 1 CPU

Quick Reference

Essential Commands

# Scale service
docker service scale maplefile_backend=3
docker service update --replicas 3 maplefile_backend

# Check replicas
docker service ls | grep maplefile
docker service ps maplefile_backend

# Monitor resources
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep maplefile

# Check distribution
docker service ps maplefile_backend --format "table {{.Name}}\t{{.Node}}\t{{.CurrentState}}"

# Scale down
docker service scale maplefile_backend=1

# Force update (re-distribute replicas)
docker service update --force maplefile_backend

Scaling Decision Matrix

CPU Usage | Memory Usage | Action
----------|--------------|----------------------------
< 30%     | < 50%        | Scale down or keep current
30-70%    | 50-80%       | Keep current (optimal)
70-85%    | 80-90%       | Scale up soon (planned)
> 85%     | > 90%        | Scale up now (urgent)

Replica Count Guidelines

Traffic Level                  | Suggested Replicas | Cost
-------------------------------|--------------------|------------
Development                    | 1                  | $12/mo
Low (< 1000 req/day)           | 1                  | $12/mo
Medium (1000-10000 req/day)    | 2-3                | $24-36/mo
High (10000-100000 req/day)    | 5-10               | $60-120/mo
Very High (> 100000 req/day)   | 10+                | $120+/mo
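These guidelines reduce to a simple lookup. `suggested_replicas` is an illustrative mapping of the table's lower bounds, not a sizing formula:

```shell
# suggested_replicas REQUESTS_PER_DAY
# Returns the lower bound of the replica range from the guideline table.
suggested_replicas() {
  local req=$1
  if   [ "$req" -lt 1000 ];   then echo 1
  elif [ "$req" -lt 10000 ];  then echo 2
  elif [ "$req" -lt 100000 ]; then echo 5
  else echo 10
  fi
}

suggested_replicas 5000     # 2  (medium traffic)
suggested_replicas 250000   # 10 (very high traffic)
```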


Questions?

  • Check service status: docker service ls | grep maplefile
  • Check replica distribution: docker service ps maplefile_backend
  • Monitor resources: docker stats --no-stream | grep maplefile

Last Updated: November 2025