
Automation Scripts and Tools

Audience: DevOps Engineers, Automation Teams
Purpose: Automated scripts, monitoring configs, and CI/CD pipelines for production infrastructure
Prerequisites: Infrastructure deployed, basic scripting knowledge


Overview

This directory contains automation tools, scripts, and configurations to reduce manual operational overhead and ensure consistency across deployments.

What's automated:

  • Backup procedures (scheduled)
  • Deployment workflows (CI/CD)
  • Monitoring and alerting (Prometheus/Grafana configs)
  • Common maintenance tasks (scripts)
  • Infrastructure health checks

Directory Structure

automation/
├── README.md                    # This file
│
├── scripts/                     # Operational scripts
│   ├── backup-all.sh           # Master backup orchestrator
│   ├── backup-cassandra.sh     # Cassandra snapshot + upload
│   ├── backup-redis.sh         # Redis RDB/AOF backup
│   ├── backup-meilisearch.sh   # Meilisearch dump export
│   ├── deploy-backend.sh       # Backend deployment automation
│   ├── deploy-frontend.sh      # Frontend deployment automation
│   ├── health-check.sh         # Infrastructure health verification
│   ├── rotate-secrets.sh       # Secret rotation automation
│   └── cleanup-docker.sh       # Docker cleanup (images, containers)
│
├── monitoring/                  # Monitoring configurations
│   ├── prometheus.yml          # Prometheus scrape configs
│   ├── alertmanager.yml        # Alert routing and receivers
│   ├── alert-rules.yml         # Prometheus alert definitions
│   └── grafana-dashboards/     # JSON dashboard exports
│       ├── infrastructure.json
│       ├── maplepress.json
│       └── databases.json
│
└── ci-cd/                       # CI/CD pipeline examples
    ├── github-actions.yml      # GitHub Actions workflow
    ├── gitlab-ci.yml           # GitLab CI pipeline
    └── deployment-pipeline.md  # CI/CD setup guide

Scripts

Backup Scripts

All backup scripts are designed to be run via cron. They:

  • Create local snapshots/dumps
  • Compress and upload to DigitalOcean Spaces
  • Clean up old backups (retention policy)
  • Log to /var/log/
  • Exit with appropriate codes for monitoring

See ../operations/01_backup_recovery.md for complete script contents and setup instructions.
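The retention step in the list above can be sketched as follows. This is a minimal illustration, not the contents of the real backup scripts: the function name `prune_old_backups`, the `*.tar.gz` naming, and the 7-day window in the usage line are all assumptions.

```shell
# Hypothetical helper: delete local backup archives older than a retention window.
# Usage: prune_old_backups <backup-dir> <retention-days>
prune_old_backups() {
  local dir="$1" days="$2"
  [ -d "$dir" ] || { echo "Error: $dir is not a directory" >&2; return 1; }
  # -mtime +N matches files whose age, in whole days, is greater than N.
  # -maxdepth 1 keeps the deletion confined to the backup directory itself.
  find "$dir" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$days" -print -delete
}

# Example (hypothetical path and window):
# prune_old_backups /var/backups/cassandra 7
```

Printing each deleted path before removal leaves a trace in the cron log, which helps when auditing why an archive is missing.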

Installation:

# On manager node
ssh dockeradmin@<manager-ip>

# Copy scripts (once scripts are created in this directory)
sudo cp automation/scripts/backup-*.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/backup-*.sh

# Schedule via cron
sudo crontab -e
# 0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1

Deployment Scripts

deploy-backend.sh - Automated backend deployment

#!/bin/bash
# Purpose: Deploy a new backend version (the stack is removed and redeployed, so expect a brief outage)
# Usage: ./deploy-backend.sh [tag]
# Example: ./deploy-backend.sh prod

set -e

TAG=${1:-prod}
echo "=== Deploying Backend: Tag $TAG ==="

# Step 1: Build and push (from local dev machine)
echo "Building and pushing image..."
cd ~/go/src/codeberg.org/mapleopentech/monorepo/cloud/mapleopentech-backend
task deploy

# Step 2: Force pull on worker-6
echo "Forcing fresh pull on worker-6..."
ssh dockeradmin@<worker-6-ip> \
  "docker pull registry.digitalocean.com/ssp/maplepress_backend:$TAG"

# Step 3: Redeploy stack
echo "Redeploying stack..."
ssh dockeradmin@<manager-ip> << 'ENDSSH'
  set -e  # the local "set -e" does not apply remotely; abort here if any step fails
  cd ~/stacks
  docker stack rm maplepress
  sleep 10
  docker config rm maplepress_caddyfile 2>/dev/null || true
  docker stack deploy -c maplepress-stack.yml maplepress
ENDSSH

# Step 4: Verify deployment
echo "Verifying deployment..."
sleep 30
ssh dockeradmin@<manager-ip> << 'ENDSSH'
  docker service ps maplepress_backend | head -5
  docker service logs maplepress_backend --tail 20
ENDSSH

# Step 5: Health check
echo "Testing health endpoint..."
curl -f https://getmaplepress.ca/health || { echo "Health check failed!"; exit 1; }

echo "✅ Backend deployment complete!"
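The script above waits a fixed 30 seconds before verifying and then runs a single health check. One alternative is to poll until the service responds. The sketch below is a generic retry helper; the name `retry` and the attempt and delay values in the usage line are assumptions, not part of the original script.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt/$max failed; retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not executed here): poll the health endpoint for up to about a minute
# retry 12 5 curl -sf https://getmaplepress.ca/health
```

Polling returns as soon as the backend is healthy and tolerates slower starts, whereas a fixed sleep is either too long or too short.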

deploy-frontend.sh - Automated frontend deployment

#!/bin/bash
# Purpose: Deploy new frontend build
# Usage: ./deploy-frontend.sh

set -e

echo "=== Deploying Frontend ==="

# SSH to worker-7 and run deployment
ssh dockeradmin@<worker-7-ip> << 'ENDSSH'
  set -e  # the local "set -e" does not apply remotely; abort here if any step fails
  cd /var/www/monorepo

  echo "Pulling latest code..."
  git pull origin main

  cd web/maplepress-frontend

  echo "Configuring production environment..."
  cat > .env.production << 'EOF'
VITE_API_BASE_URL=https://getmaplepress.ca
NODE_ENV=production
EOF

  echo "Installing dependencies..."
  npm install

  echo "Building frontend..."
  npm run build

  echo "Verifying build..."
  if grep -q "getmaplepress.ca" dist/assets/*.js 2>/dev/null; then
    echo "✅ Production API URL confirmed"
  else
    echo "⚠️  Warning: Production URL not found in build"
  fi
ENDSSH

# Test frontend
echo "Testing frontend..."
curl -f https://getmaplepress.com || { echo "Frontend test failed!"; exit 1; }

echo "✅ Frontend deployment complete!"
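The build verification above only prints a warning when the production URL is missing, and its grep pattern treats the dots in the domain as regex wildcards. A stricter variant, sketched here under the assumption that a missing URL should fail the deployment, uses a fixed-string match and a non-zero return. The function name `verify_build_url` is hypothetical.

```shell
# Hypothetical stricter check: fail when no built JS bundle contains the expected URL.
# Usage: verify_build_url <assets-dir> <expected-url>
verify_build_url() {
  local assets_dir="$1" expected="$2"
  # -F matches the URL as a fixed string, so "." only matches a literal dot
  if grep -qF -- "$expected" "$assets_dir"/*.js 2>/dev/null; then
    echo "Production API URL confirmed: $expected"
  else
    echo "Error: $expected not found in $assets_dir/*.js" >&2
    return 1
  fi
}

# Example (inside the remote session, after "npm run build"):
# verify_build_url dist/assets "getmaplepress.ca"
```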

Health Check Script

health-check.sh - Comprehensive infrastructure health verification

#!/bin/bash
# Purpose: Check health of all infrastructure components
# Usage: ./health-check.sh
# Exit codes: 0=healthy, 1=warnings, 2=critical

WARNINGS=0
CRITICAL=0

echo "=== Infrastructure Health Check ==="
echo "Started: $(date)"
echo ""

# Check all services
echo "--- Docker Services ---"
SERVICES_DOWN=$(docker service ls | grep -v "1/1" | grep -v "REPLICAS" | wc -l)
if [ $SERVICES_DOWN -gt 0 ]; then
  echo "⚠️  WARNING: $SERVICES_DOWN services not at full capacity"
  docker service ls | grep -v "1/1" | grep -v "REPLICAS"
  WARNINGS=$((WARNINGS + 1))
else
  echo "✅ All services running (1/1)"
fi

# Check all nodes
echo ""
echo "--- Docker Nodes ---"
NODES_DOWN=$(docker node ls | grep -v "Ready" | grep -v "STATUS" | wc -l)
if [ $NODES_DOWN -gt 0 ]; then
  echo "🔴 CRITICAL: $NODES_DOWN nodes not ready!"
  docker node ls | grep -v "Ready" | grep -v "STATUS"
  CRITICAL=$((CRITICAL + 1))
else
  echo "✅ All nodes ready"
fi

# Check disk space
echo ""
echo "--- Disk Space ---"
for NODE in worker-1 worker-2 worker-3 worker-4 worker-5 worker-6 worker-7; do
  DISK_USAGE=$(ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 dockeradmin@$NODE "df -h / | tail -1 | awk '{print \$5}' | tr -d '%'")
  # An unreachable node yields an empty value, which would break the numeric tests below
  if [ -z "$DISK_USAGE" ]; then
    echo "🔴 CRITICAL: $NODE unreachable, disk usage unknown"
    CRITICAL=$((CRITICAL + 1))
  elif [ "$DISK_USAGE" -gt 85 ]; then
    echo "🔴 CRITICAL: $NODE disk usage: ${DISK_USAGE}%"
    CRITICAL=$((CRITICAL + 1))
  elif [ "$DISK_USAGE" -gt 75 ]; then
    echo "⚠️  WARNING: $NODE disk usage: ${DISK_USAGE}%"
    WARNINGS=$((WARNINGS + 1))
  else
    echo "✅ $NODE disk usage: ${DISK_USAGE}%"
  fi
done

# Check endpoints
echo ""
echo "--- HTTP Endpoints ---"
if curl -sf https://getmaplepress.ca/health > /dev/null; then
  echo "✅ Backend health check passed"
else
  echo "🔴 CRITICAL: Backend health check failed!"
  CRITICAL=$((CRITICAL + 1))
fi

if curl -sf https://getmaplepress.com > /dev/null; then
  echo "✅ Frontend accessible"
else
  echo "🔴 CRITICAL: Frontend not accessible!"
  CRITICAL=$((CRITICAL + 1))
fi

# Summary
echo ""
echo "=== Summary ==="
echo "Warnings: $WARNINGS"
echo "Critical: $CRITICAL"

if [ $CRITICAL -gt 0 ]; then
  echo "🔴 Status: CRITICAL"
  exit 2
elif [ $WARNINGS -gt 0 ]; then
  echo "⚠️  Status: WARNING"
  exit 1
else
  echo "✅ Status: HEALTHY"
  exit 0
fi
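The service check above filters on the literal string "1/1", which works while every service runs exactly one replica. If some services are scaled beyond one replica, a comparison of running against desired replicas is more general. The sketch below assumes input lines of the form "NAME REPLICAS"; the function name `count_degraded` is hypothetical.

```shell
# Count services whose running replica count differs from the desired count.
# Reads "NAME REPLICAS" lines on stdin, for example "maplepress_backend 1/1".
count_degraded() {
  awk '{
    split($2, r, "/")            # r[1] = running, r[2] = desired
    if (r[1] != r[2]) n++
  } END { print n + 0 }'         # n + 0 prints 0 when nothing was counted
}

# Example wiring (assumes the Docker CLI on a Swarm manager):
# SERVICES_DOWN=$(docker service ls --format '{{.Name}} {{.Replicas}}' | count_degraded)
```

With this approach a healthy service at 3/3 is not reported as degraded, while one at 2/3 is.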

Monitoring Configuration Files

Prometheus Configuration

Located at: monitoring/prometheus.yml

# See ../operations/02_monitoring_alerting.md for complete configuration
# This file should be copied to ~/stacks/monitoring-config/ on manager node

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alert-rules.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: 'A'
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

  - job_name: 'maplepress-backend'
    static_configs:
      - targets: ['maplepress-backend:8000']
    metrics_path: '/metrics'

Alert Rules

Located at: monitoring/alert-rules.yml

See ../operations/02_monitoring_alerting.md for complete alert rule configurations.

Grafana Dashboards

Dashboard exports (JSON format) should be stored in monitoring/grafana-dashboards/.

To import:

  1. Access Grafana via SSH tunnel: ssh -L 3000:localhost:3000 dockeradmin@<manager-ip>
  2. Open http://localhost:3000
  3. Dashboards → Import → Upload JSON file

Recommended dashboards:

  • Infrastructure Overview (node metrics, disk, CPU, memory)
  • MaplePress Application (HTTP metrics, errors, latency)
  • Database Metrics (Cassandra, Redis, Meilisearch)

CI/CD Pipelines

GitHub Actions Example

File: ci-cd/github-actions.yml

name: Deploy to Production

on:
  push:
    branches:
      - main
    paths:
      - 'cloud/mapleopentech-backend/**'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.21'

      - name: Run tests
        run: |
          cd cloud/mapleopentech-backend
          go test ./...

      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_TOKEN }}

      - name: Build and push Docker image
        run: |
          cd cloud/mapleopentech-backend
          doctl registry login
          docker build -t registry.digitalocean.com/ssp/maplepress_backend:prod .
          docker push registry.digitalocean.com/ssp/maplepress_backend:prod

      - name: Deploy to production
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.MANAGER_IP }}
          username: dockeradmin
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            # Force pull on worker-6
            ssh dockeradmin@${{ secrets.WORKER_6_IP }} \
              "docker pull registry.digitalocean.com/ssp/maplepress_backend:prod"

            # Redeploy stack
            cd ~/stacks
            docker stack rm maplepress
            sleep 10
            docker config rm maplepress_caddyfile || true
            docker stack deploy -c maplepress-stack.yml maplepress

            # Wait and verify
            sleep 30
            docker service ps maplepress_backend | head -5

      - name: Health check
        run: |
          curl -f https://getmaplepress.ca/health || exit 1

      - name: Notify deployment
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Backend deployment ${{ job.status }}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

GitLab CI Example

File: ci-cd/gitlab-ci.yml

stages:
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: registry.digitalocean.com/ssp/maplepress_backend
  DOCKER_TAG: prod

test:
  stage: test
  image: golang:1.21
  script:
    - cd cloud/mapleopentech-backend
    - go test ./...

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo "$DIGITALOCEAN_TOKEN" | docker login registry.digitalocean.com -u "$DIGITALOCEAN_TOKEN" --password-stdin
  script:
    - cd cloud/mapleopentech-backend
    - docker build -t $DOCKER_IMAGE:$DOCKER_TAG .
    - docker push $DOCKER_IMAGE:$DOCKER_TAG
  only:
    - main

deploy:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - ssh-keyscan -H "$MANAGER_IP" "$WORKER_6_IP" >> ~/.ssh/known_hosts
  script:
    # Force pull on worker-6
    - ssh dockeradmin@$WORKER_6_IP "docker pull $DOCKER_IMAGE:$DOCKER_TAG"

    # Redeploy stack
    - |
      ssh dockeradmin@$MANAGER_IP << 'EOF'
        cd ~/stacks
        docker stack rm maplepress
        sleep 10
        docker config rm maplepress_caddyfile || true
        docker stack deploy -c maplepress-stack.yml maplepress
      EOF

    # Verify deployment
    - sleep 30
    - ssh dockeradmin@$MANAGER_IP "docker service ps maplepress_backend | head -5"

    # Health check
    - apk add --no-cache curl
    - curl -f https://getmaplepress.ca/health
  only:
    - main
  environment:
    name: production
    url: https://getmaplepress.ca

Usage Examples

Running Scripts Manually

# Backup all services
ssh dockeradmin@<manager-ip>
sudo /usr/local/bin/backup-all.sh

# Health check
ssh dockeradmin@<manager-ip>
sudo /usr/local/bin/health-check.sh
echo "Exit code: $?"
# 0 = healthy, 1 = warnings, 2 = critical

# Deploy backend
cd ~/monorepo/cloud/infrastructure/production
./automation/scripts/deploy-backend.sh prod

# Deploy frontend
./automation/scripts/deploy-frontend.sh

Scheduling Scripts with Cron

# Edit crontab on manager
ssh dockeradmin@<manager-ip>
sudo crontab -e

# Add these lines:

# Backup all services daily at 2 AM
0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1

# Health check every hour
0 * * * * /usr/local/bin/health-check.sh >> /var/log/health-check.log 2>&1

# Docker cleanup weekly (Sunday 3 AM)
0 3 * * 0 /usr/local/bin/cleanup-docker.sh >> /var/log/docker-cleanup.log 2>&1

# Secret rotation monthly (1st of month, 4 AM)
0 4 1 * * /usr/local/bin/rotate-secrets.sh >> /var/log/secret-rotation.log 2>&1
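The cron entries above append script output to a log file but do not record when a run started or how it exited. A small wrapper can add that. This is a sketch only; the function name `run_logged` and the log line format are assumptions.

```shell
# Hypothetical cron wrapper: timestamp the start and end of a job and record its exit code.
# Usage: run_logged <log-file> <command> [args...]
run_logged() {
  local log="$1" rc
  shift
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] START $*" >> "$log"
  "$@" >> "$log" 2>&1
  rc=$?
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] END $* exit=$rc" >> "$log"
  # Pass the job's exit code through so callers and monitoring still see it
  return "$rc"
}

# Example:
# run_logged /var/log/health-check.log /usr/local/bin/health-check.sh
```

With the exit code written to the log, `grep 'exit=' /var/log/health-check.log | tail -5` shows the outcome of recent runs.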

Monitoring Script Execution

# View cron logs
sudo grep CRON /var/log/syslog | tail -20

# View specific script logs
tail -f /var/log/backup-all.log
tail -f /var/log/health-check.log

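# Check a script's exit code ($? only reflects the last command run in this shell,
# so run the script directly and read $? immediately afterwards)
sudo /usr/local/bin/backup-all.sh; echo "Backup exit code: $?"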

Best Practices

Script Development

  1. Use set -e: Exit on first error (scripts that tally failures, such as health-check.sh, check each command explicitly instead)
  2. Log everything: Redirect to /var/log/
  3. Use exit codes: 0=success, 1=warning, 2=critical
  4. Idempotent: Safe to run multiple times
  5. Document: Comments and usage instructions
  6. Test: Verify on staging before production
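To make the idempotency point concrete, here is a minimal sketch of a helper that can be run any number of times with the same result. The function name `ensure_line` and the file used in the usage line are assumptions for illustration.

```shell
# Idempotent helper: append a line to a file only if it is not already present.
# Usage: ensure_line <file> <line>
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  # -x: match the whole line, -F: fixed string, -q: no output
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (hypothetical file): running this twice still leaves a single entry
# ensure_line /etc/cron.d/maple-backup "0 2 * * * root /usr/local/bin/backup-all.sh"
```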

Secret Management

Never hardcode secrets in scripts!

# ❌ Bad
REDIS_PASSWORD="mysecret123"

# ✅ Good
REDIS_PASSWORD=$(docker exec redis cat /run/secrets/redis_password)

# ✅ Even better
REDIS_PASSWORD=$(cat /run/secrets/redis_password 2>/dev/null || echo "")
if [ -z "$REDIS_PASSWORD" ]; then
  echo "Error: Redis password not found"
  exit 1
fi
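The same pattern can be wrapped in a reusable function so every script handles a missing secret identically. This is a sketch: the function name `read_secret` and the `SECRETS_DIR` override (useful for testing outside a container) are assumptions; /run/secrets is where the examples above read secrets from.

```shell
# Read a secret file and fail loudly if it is missing or empty.
# SECRETS_DIR is a hypothetical override; it defaults to /run/secrets.
# Usage: read_secret <secret-name>
read_secret() {
  local name="$1" path="${SECRETS_DIR:-/run/secrets}/$1"
  # -s is true only if the file exists and has a size greater than zero
  if [ ! -s "$path" ]; then
    echo "Error: secret '$name' not found or empty at $path" >&2
    return 1
  fi
  cat "$path"
}

# Example:
# REDIS_PASSWORD=$(read_secret redis_password) || exit 1
```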

Error Handling

# Check command success
if ! docker service ls > /dev/null 2>&1; then
  echo "Error: Cannot connect to Docker"
  exit 2
fi

# Trap errors
trap 'echo "Script failed on line $LINENO"' ERR

# Verify prerequisites
for COMMAND in docker ssh s3cmd; do
  if ! command -v $COMMAND &> /dev/null; then
    echo "Error: $COMMAND not found"
    exit 1
  fi
done
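The prerequisite loop above stops at the first missing command. A variant that reports every missing command in one pass can save a few round trips when setting up a new node. The function name `require_commands` is hypothetical.

```shell
# Check that all required commands exist, reporting every missing one at once.
# Usage: require_commands <command> [command...]
require_commands() {
  local missing="" cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  if [ -n "$missing" ]; then
    echo "Error: missing required commands:$missing" >&2
    return 1
  fi
}

# Example:
# require_commands docker ssh s3cmd || exit 1
```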

Troubleshooting

Script Won't Execute

# Check permissions
ls -la /usr/local/bin/script.sh
# Should be: -rwxr-xr-x (executable)

# Fix permissions
sudo chmod +x /usr/local/bin/script.sh

# Check shebang
head -1 /usr/local/bin/script.sh
# Should be: #!/bin/bash

Cron Job Not Running

# Check cron service
sudo systemctl status cron

# Check cron logs
sudo grep CRON /var/log/syslog | tail -20

# Test cron environment: add this temporary entry via `sudo crontab -e`
* * * * * /usr/bin/env > /tmp/cron-env.txt
# Wait 1 minute, check /tmp/cron-env.txt, then remove the entry

SSH Issues in Scripts

# Add SSH keys to ssh-agent
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa

# Disable strict host checking (only for internal network)
ssh -o StrictHostKeyChecking=no user@host "command"

# Use SSH config
cat >> ~/.ssh/config << EOF
Host worker-*
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
EOF

Contributing

When adding new automation:

  1. Place scripts in automation/scripts/
  2. Document usage in header comments
  3. Follow naming convention: verb-noun.sh
  4. Test thoroughly on staging
  5. Update this README with script description
  6. Add to appropriate cron schedule if applicable
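The naming convention in step 3 could be checked automatically, for example in a pre-commit hook. The sketch below takes a loose reading of verb-noun.sh as lowercase words joined by hyphens; both the function name and the exact pattern are assumptions rather than a project rule.

```shell
# Check a file name against a loose reading of the verb-noun.sh convention:
# at least two lowercase words joined by hyphens, ending in .sh.
is_valid_script_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+(-[a-z0-9]+)+\.sh$'
}

# Example:
# is_valid_script_name "backup-redis.sh" && echo "name OK"
```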

Future Automation Ideas

Not yet implemented, but good candidates:

  • Automatic SSL certificate monitoring (separate from Caddy)
  • Database performance metrics collection
  • Automated capacity planning reports
  • Self-healing scripts (restart failed services)
  • Traffic spike detection and auto-scaling
  • Automated security vulnerability scanning
  • Log aggregation and analysis
  • Cost optimization recommendations

Last Updated: January 2025
Maintained By: Infrastructure Team

Note: Scripts in this directory are templates. Customize IP addresses, domains, and credentials for your specific environment before use.