# Automation Scripts and Tools

**Audience**: DevOps Engineers, Automation Teams

**Purpose**: Automated scripts, monitoring configs, and CI/CD pipelines for production infrastructure

**Prerequisites**: Infrastructure deployed, basic scripting knowledge

---

## Overview

This directory contains automation tools, scripts, and configurations to reduce manual operational overhead and ensure consistency across deployments.

**What's automated:**

- Backup procedures (scheduled)
- Deployment workflows (CI/CD)
- Monitoring and alerting (Prometheus/Grafana configs)
- Common maintenance tasks (scripts)
- Infrastructure health checks

---

## Directory Structure

```
automation/
├── README.md                      # This file
│
├── scripts/                       # Operational scripts
│   ├── backup-all.sh              # Master backup orchestrator
│   ├── backup-cassandra.sh        # Cassandra snapshot + upload
│   ├── backup-redis.sh            # Redis RDB/AOF backup
│   ├── backup-meilisearch.sh      # Meilisearch dump export
│   ├── deploy-backend.sh          # Backend deployment automation
│   ├── deploy-frontend.sh         # Frontend deployment automation
│   ├── health-check.sh            # Infrastructure health verification
│   ├── rotate-secrets.sh          # Secret rotation automation
│   └── cleanup-docker.sh          # Docker cleanup (images, containers)
│
├── monitoring/                    # Monitoring configurations
│   ├── prometheus.yml             # Prometheus scrape configs
│   ├── alertmanager.yml           # Alert routing and receivers
│   ├── alert-rules.yml            # Prometheus alert definitions
│   └── grafana-dashboards/        # JSON dashboard exports
│       ├── infrastructure.json
│       ├── maplepress.json
│       └── databases.json
│
└── ci-cd/                         # CI/CD pipeline examples
    ├── github-actions.yml         # GitHub Actions workflow
    ├── gitlab-ci.yml              # GitLab CI pipeline
    └── deployment-pipeline.md     # CI/CD setup guide
```

---

## Scripts

### Backup Scripts

All backup scripts are designed to be run via cron.
They:

- Create local snapshots/dumps
- Compress and upload to DigitalOcean Spaces
- Clean up old backups (retention policy)
- Log to `/var/log/`
- Exit with appropriate codes for monitoring

**See `../operations/01_backup_recovery.md` for complete script contents and setup instructions.**

**Installation:**

```bash
# On manager node
ssh dockeradmin@

# Copy scripts (once scripts are created in this directory)
sudo cp automation/scripts/backup-*.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/backup-*.sh

# Schedule via cron
sudo crontab -e
# 0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1
```

### Deployment Scripts

**`deploy-backend.sh`** - Automated backend deployment

```bash
#!/bin/bash
# Purpose: Deploy a new backend version
#          (the stack is removed and redeployed, so expect a brief outage)
# Usage: ./deploy-backend.sh [tag]
# Example: ./deploy-backend.sh prod

set -e

TAG=${1:-prod}
echo "=== Deploying Backend: Tag $TAG ==="

# Step 1: Build and push (from local dev machine)
echo "Building and pushing image..."
cd ~/go/src/codeberg.org/mapleopentech/monorepo/cloud/mapleopentech-backend
task deploy

# Step 2: Force pull on worker-6
echo "Forcing fresh pull on worker-6..."
ssh dockeradmin@ \
    "docker pull registry.digitalocean.com/ssp/maplepress_backend:$TAG"

# Step 3: Redeploy stack
echo "Redeploying stack..."
ssh dockeradmin@ << 'ENDSSH'
cd ~/stacks
docker stack rm maplepress
sleep 10
docker config rm maplepress_caddyfile 2>/dev/null || true
docker stack deploy -c maplepress-stack.yml maplepress
ENDSSH

# Step 4: Verify deployment
echo "Verifying deployment..."
sleep 30
ssh dockeradmin@ << 'ENDSSH'
docker service ps maplepress_backend | head -5
docker service logs maplepress_backend --tail 20
ENDSSH

# Step 5: Health check
echo "Testing health endpoint..."
curl -f https://getmaplepress.ca/health || { echo "Health check failed!"; exit 1; }

echo "✅ Backend deployment complete!"
```

**`deploy-frontend.sh`** - Automated frontend deployment

```bash
#!/bin/bash
# Purpose: Deploy new frontend build
# Usage: ./deploy-frontend.sh

set -e

echo "=== Deploying Frontend ==="

# SSH to worker-7 and run deployment
ssh dockeradmin@ << 'ENDSSH'
# Stop the remote session on the first failing command
# (the local `set -e` does not apply inside the SSH session)
set -e

cd /var/www/monorepo

echo "Pulling latest code..."
git pull origin main

cd web/maplepress-frontend

echo "Configuring production environment..."
cat > .env.production << 'EOF'
VITE_API_BASE_URL=https://getmaplepress.ca
NODE_ENV=production
EOF

echo "Installing dependencies..."
npm install

echo "Building frontend..."
npm run build

echo "Verifying build..."
if grep -q "getmaplepress.ca" dist/assets/*.js 2>/dev/null; then
    echo "✅ Production API URL confirmed"
else
    echo "⚠️ Warning: Production URL not found in build"
fi
ENDSSH

# Test frontend
echo "Testing frontend..."
curl -f https://getmaplepress.com || { echo "Frontend test failed!"; exit 1; }

echo "✅ Frontend deployment complete!"
```

### Health Check Script

**`health-check.sh`** - Comprehensive infrastructure health verification

```bash
#!/bin/bash
# Purpose: Check health of all infrastructure components
# Usage: ./health-check.sh
# Exit codes: 0=healthy, 1=warnings, 2=critical

WARNINGS=0
CRITICAL=0

echo "=== Infrastructure Health Check ==="
echo "Started: $(date)"
echo ""

# Check all services (assumes every service runs exactly one replica)
echo "--- Docker Services ---"
SERVICES_DOWN=$(docker service ls | grep -v "1/1" | grep -v "REPLICAS" | wc -l)
if [ "$SERVICES_DOWN" -gt 0 ]; then
    echo "⚠️ WARNING: $SERVICES_DOWN services not at full capacity"
    docker service ls | grep -v "1/1" | grep -v "REPLICAS"
    WARNINGS=$((WARNINGS + 1))
else
    echo "✅ All services running (1/1)"
fi

# Check all nodes
echo ""
echo "--- Docker Nodes ---"
NODES_DOWN=$(docker node ls | grep -v "Ready" | grep -v "STATUS" | wc -l)
if [ "$NODES_DOWN" -gt 0 ]; then
    echo "🔴 CRITICAL: $NODES_DOWN nodes not ready!"
    docker node ls | grep -v "Ready" | grep -v "STATUS"
    CRITICAL=$((CRITICAL + 1))
else
    echo "✅ All nodes ready"
fi

# Check disk space
echo ""
echo "--- Disk Space ---"
for NODE in worker-1 worker-2 worker-3 worker-4 worker-5 worker-6 worker-7; do
    DISK_USAGE=$(ssh -o StrictHostKeyChecking=no dockeradmin@$NODE "df -h / | tail -1 | awk '{print \$5}' | tr -d '%'")
    if [ -z "$DISK_USAGE" ]; then
        # An empty value means the SSH call failed; do not report the node as healthy
        echo "🔴 CRITICAL: $NODE unreachable (could not read disk usage)"
        CRITICAL=$((CRITICAL + 1))
    elif [ "$DISK_USAGE" -gt 85 ]; then
        echo "🔴 CRITICAL: $NODE disk usage: ${DISK_USAGE}%"
        CRITICAL=$((CRITICAL + 1))
    elif [ "$DISK_USAGE" -gt 75 ]; then
        echo "⚠️ WARNING: $NODE disk usage: ${DISK_USAGE}%"
        WARNINGS=$((WARNINGS + 1))
    else
        echo "✅ $NODE disk usage: ${DISK_USAGE}%"
    fi
done

# Check endpoints
echo ""
echo "--- HTTP Endpoints ---"
if curl -sf https://getmaplepress.ca/health > /dev/null; then
    echo "✅ Backend health check passed"
else
    echo "🔴 CRITICAL: Backend health check failed!"
    CRITICAL=$((CRITICAL + 1))
fi

if curl -sf https://getmaplepress.com > /dev/null; then
    echo "✅ Frontend accessible"
else
    echo "🔴 CRITICAL: Frontend not accessible!"
    CRITICAL=$((CRITICAL + 1))
fi

# Summary
echo ""
echo "=== Summary ==="
echo "Warnings: $WARNINGS"
echo "Critical: $CRITICAL"

if [ $CRITICAL -gt 0 ]; then
    echo "🔴 Status: CRITICAL"
    exit 2
elif [ $WARNINGS -gt 0 ]; then
    echo "⚠️ Status: WARNING"
    exit 1
else
    echo "✅ Status: HEALTHY"
    exit 0
fi
```

---

## Monitoring Configuration Files

### Prometheus Configuration

**Located at**: `monitoring/prometheus.yml`

```yaml
# See ../operations/02_monitoring_alerting.md for complete configuration
# This file should be copied to ~/stacks/monitoring-config/ on manager node

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alert-rules.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: 'A'
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

  - job_name: 'maplepress-backend'
    static_configs:
      - targets: ['maplepress-backend:8000']
    metrics_path: '/metrics'
```

### Alert Rules

**Located at**: `monitoring/alert-rules.yml`

See `../operations/02_monitoring_alerting.md` for complete alert rule configurations.

### Grafana Dashboards

**Dashboard exports** (JSON format) should be stored in `monitoring/grafana-dashboards/`.

**To import:**

1. Access Grafana via SSH tunnel: `ssh -L 3000:localhost:3000 dockeradmin@`
2. Open http://localhost:3000
3. Dashboards → Import → Upload JSON file

**Recommended dashboards:**

- Infrastructure Overview (node metrics, disk, CPU, memory)
- MaplePress Application (HTTP metrics, errors, latency)
- Database Metrics (Cassandra, Redis, Meilisearch)

---

## CI/CD Pipelines

### GitHub Actions Example

**File:** `ci-cd/github-actions.yml`

```yaml
name: Deploy to Production

on:
  push:
    branches:
      - main
    paths:
      - 'cloud/mapleopentech-backend/**'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.21'

      - name: Run tests
        run: |
          cd cloud/mapleopentech-backend
          go test ./...

      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_TOKEN }}

      - name: Build and push Docker image
        run: |
          cd cloud/mapleopentech-backend
          doctl registry login
          docker build -t registry.digitalocean.com/ssp/maplepress_backend:prod .
          docker push registry.digitalocean.com/ssp/maplepress_backend:prod

      - name: Deploy to production
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.MANAGER_IP }}
          username: dockeradmin
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            # Force pull on worker-6
            ssh dockeradmin@${{ secrets.WORKER_6_IP }} \
              "docker pull registry.digitalocean.com/ssp/maplepress_backend:prod"

            # Redeploy stack
            cd ~/stacks
            docker stack rm maplepress
            sleep 10
            docker config rm maplepress_caddyfile || true
            docker stack deploy -c maplepress-stack.yml maplepress

            # Wait and verify
            sleep 30
            docker service ps maplepress_backend | head -5

      - name: Health check
        run: |
          curl -f https://getmaplepress.ca/health || exit 1

      - name: Notify deployment
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Backend deployment ${{ job.status }}'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
```

### GitLab CI Example

**File:** `ci-cd/gitlab-ci.yml`

```yaml
stages:
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: registry.digitalocean.com/ssp/maplepress_backend
  DOCKER_TAG: prod

test:
  stage: test
  image: golang:1.21
  script:
    - cd cloud/mapleopentech-backend
    - go test ./...

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - docker login registry.digitalocean.com -u $DIGITALOCEAN_TOKEN -p $DIGITALOCEAN_TOKEN
  script:
    - cd cloud/mapleopentech-backend
    - docker build -t $DOCKER_IMAGE:$DOCKER_TAG .
    - docker push $DOCKER_IMAGE:$DOCKER_TAG
  only:
    - main

deploy:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    # Both hosts are contacted directly below, so both host keys must be known
    - ssh-keyscan -H $MANAGER_IP $WORKER_6_IP >> ~/.ssh/known_hosts
  script:
    # Force pull on worker-6
    - ssh dockeradmin@$WORKER_6_IP "docker pull $DOCKER_IMAGE:$DOCKER_TAG"

    # Redeploy stack
    - |
      ssh dockeradmin@$MANAGER_IP << 'EOF'
        cd ~/stacks
        docker stack rm maplepress
        sleep 10
        docker config rm maplepress_caddyfile || true
        docker stack deploy -c maplepress-stack.yml maplepress
      EOF

    # Verify deployment
    - sleep 30
    - ssh dockeradmin@$MANAGER_IP "docker service ps maplepress_backend | head -5"

    # Health check
    - apk add --no-cache curl
    - curl -f https://getmaplepress.ca/health
  only:
    - main
  environment:
    name: production
    url: https://getmaplepress.ca
```

---

## Usage Examples

### Running Scripts Manually

```bash
# Backup all services
ssh dockeradmin@
sudo /usr/local/bin/backup-all.sh

# Health check
ssh dockeradmin@
sudo /usr/local/bin/health-check.sh
echo "Exit code: $?"
# 0 = healthy, 1 = warnings, 2 = critical

# Deploy backend
cd ~/monorepo/cloud/infrastructure/production
./automation/scripts/deploy-backend.sh prod

# Deploy frontend
./automation/scripts/deploy-frontend.sh
```

### Scheduling Scripts with Cron

```bash
# Edit crontab on manager
ssh dockeradmin@
sudo crontab -e

# Add these lines:

# Backup all services daily at 2 AM
0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1

# Health check every hour
0 * * * * /usr/local/bin/health-check.sh >> /var/log/health-check.log 2>&1

# Docker cleanup weekly (Sunday 3 AM)
0 3 * * 0 /usr/local/bin/cleanup-docker.sh >> /var/log/docker-cleanup.log 2>&1

# Secret rotation monthly (1st of month, 4 AM)
0 4 1 * * /usr/local/bin/rotate-secrets.sh >> /var/log/secret-rotation.log 2>&1
```

### Monitoring Script Execution

```bash
# View cron logs
sudo grep CRON /var/log/syslog | tail -20

# View specific script logs
tail -f /var/log/backup-all.log
tail -f /var/log/health-check.log

# Check a script's exit code
# ($? only reflects the command that just ran in this shell, so run the
# script directly; it does not report the result of an earlier cron run)
sudo /usr/local/bin/backup-all.sh
echo "Last backup exit code: $?"
```

---

## Best Practices

### Script Development

1. **Always use `set -e`**: Exit on first error
2. **Log everything**: Redirect to `/var/log/`
3. **Use exit codes**: 0=success, 1=warning, 2=critical
4. **Idempotent**: Safe to run multiple times
5. **Document**: Comments and usage instructions
6. **Test**: Verify on staging before production

### Secret Management

**Never hardcode secrets in scripts!**

```bash
# ❌ Bad
REDIS_PASSWORD="mysecret123"

# ✅ Good
REDIS_PASSWORD=$(docker exec redis cat /run/secrets/redis_password)

# ✅ Even better
REDIS_PASSWORD=$(cat /run/secrets/redis_password 2>/dev/null || echo "")
if [ -z "$REDIS_PASSWORD" ]; then
    echo "Error: Redis password not found"
    exit 1
fi
```

### Error Handling

```bash
# Check command success
if ! docker service ls > /dev/null 2>&1; then
    echo "Error: Cannot connect to Docker"
    exit 2
fi

# Trap errors
trap 'echo "Script failed on line $LINENO"' ERR

# Verify prerequisites
for COMMAND in docker ssh s3cmd; do
    if ! command -v "$COMMAND" &> /dev/null; then
        echo "Error: $COMMAND not found"
        exit 1
    fi
done
```

---

## Troubleshooting

### Script Won't Execute

```bash
# Check permissions
ls -la /usr/local/bin/script.sh
# Should be: -rwxr-xr-x (executable)

# Fix permissions
sudo chmod +x /usr/local/bin/script.sh

# Check shebang
head -1 /usr/local/bin/script.sh
# Should be: #!/bin/bash
```

### Cron Job Not Running

```bash
# Check cron service
sudo systemctl status cron

# Check cron logs
sudo grep CRON /var/log/syslog | tail -20

# Test cron environment (add this line via `crontab -e`, it is not a shell command)
* * * * * /usr/bin/env > /tmp/cron-env.txt
# Wait 1 minute, then check /tmp/cron-env.txt
```

### SSH Issues in Scripts

```bash
# Add SSH keys to ssh-agent
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa

# Disable strict host checking (only for internal network)
ssh -o StrictHostKeyChecking=no user@host "command"

# Use SSH config
cat >> ~/.ssh/config << EOF
Host worker-*
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
EOF
```

---

## Contributing

**When adding new automation:**

1. Place scripts in `automation/scripts/`
2. Document usage in header comments
3. Follow naming convention: `verb-noun.sh`
4. Test thoroughly on staging
5. Update this README with script description
6. Add to appropriate cron schedule if applicable

---

## Future Automation Ideas

**Not yet implemented, but good candidates:**

- [ ] Automatic SSL certificate monitoring (separate from Caddy)
- [ ] Database performance metrics collection
- [ ] Automated capacity planning reports
- [ ] Self-healing scripts (restart failed services)
- [ ] Traffic spike detection and auto-scaling
- [ ] Automated security vulnerability scanning
- [ ] Log aggregation and analysis
- [ ] Cost optimization recommendations

---

**Last Updated**: January 2025

**Maintained By**: Infrastructure Team

**Note**: Scripts in this directory are templates. Customize IP addresses, domains, and credentials for your specific environment before use.
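
One possible way to keep environment-specific values such as host addresses out of the script bodies is to read them from exported environment variables and fail fast when one is missing. The sketch below is only an illustration under that assumption: `require_env` is a hypothetical helper, not an existing script in this directory, and the variable names `MANAGER_IP` and `WORKER_6_IP` are borrowed from the CI/CD examples.

```bash
#!/bin/bash
# Hypothetical helper (not part of the existing scripts): verify that each
# named variable is exported and non-empty before a script does any work.
require_env() {
    local name missing
    missing=0
    for name in "$@"; do
        # printenv prints nothing when the variable is absent from the environment
        if [ -z "$(printenv "$name")" ]; then
            echo "Error: $name is not set" >&2
            missing=1
        fi
    done
    # Returns 0 when everything is set, 1 when at least one name is missing
    return "$missing"
}
```

A script could then start with `require_env MANAGER_IP WORKER_6_IP || exit 1`, so a missing setting stops the run before any SSH or Docker command executes.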