Initial commit: Open sourcing all of the Maple Open Technologies code.

2025-12-02 14:33:08 -05:00 · 2025-12-02 14:33:08 -05:00 · 755d54a99d
commit 755d54a99d
2010 changed files with 448675 additions and 0 deletions
--- a/cloud/infrastructure/production/automation/README.md
+++ b/cloud/infrastructure/production/automation/README.md
@@ -0,0 +1,693 @@
+# Automation Scripts and Tools
+
+**Audience**: DevOps Engineers, Automation Teams
+**Purpose**: Automated scripts, monitoring configs, and CI/CD pipelines for production infrastructure
+**Prerequisites**: Infrastructure deployed, basic scripting knowledge
+
+---
+
+## Overview
+
+This directory contains automation tools, scripts, and configurations to reduce manual operational overhead and ensure consistency across deployments.
+
+**What's automated:**
+- Backup procedures (scheduled)
+- Deployment workflows (CI/CD)
+- Monitoring and alerting (Prometheus/Grafana configs)
+- Common maintenance tasks (scripts)
+- Infrastructure health checks
+
+---
+
+## Directory Structure
+
+```
+automation/
+├── README.md                    # This file
+│
+├── scripts/                     # Operational scripts
+│   ├── backup-all.sh           # Master backup orchestrator
+│   ├── backup-cassandra.sh     # Cassandra snapshot + upload
+│   ├── backup-redis.sh         # Redis RDB/AOF backup
+│   ├── backup-meilisearch.sh   # Meilisearch dump export
+│   ├── deploy-backend.sh       # Backend deployment automation
+│   ├── deploy-frontend.sh      # Frontend deployment automation
+│   ├── health-check.sh         # Infrastructure health verification
+│   ├── rotate-secrets.sh       # Secret rotation automation
+│   └── cleanup-docker.sh       # Docker cleanup (images, containers)
+│
+├── monitoring/                  # Monitoring configurations
+│   ├── prometheus.yml          # Prometheus scrape configs
+│   ├── alertmanager.yml        # Alert routing and receivers
+│   ├── alert-rules.yml         # Prometheus alert definitions
+│   └── grafana-dashboards/     # JSON dashboard exports
+│       ├── infrastructure.json
+│       ├── maplepress.json
+│       └── databases.json
+│
+└── ci-cd/                       # CI/CD pipeline examples
+    ├── github-actions.yml      # GitHub Actions workflow
+    ├── gitlab-ci.yml           # GitLab CI pipeline
+    └── deployment-pipeline.md  # CI/CD setup guide
+```
+
+---
+
+## Scripts
+
+### Backup Scripts
+
+All backup scripts are designed to be run via cron. They:
+- Create local snapshots/dumps
+- Compress and upload to DigitalOcean Spaces
+- Clean up old backups (retention policy)
+- Log to `/var/log/`
+- Exit with appropriate codes for monitoring
+
+**See `../operations/01_backup_recovery.md` for complete script contents and setup instructions.**
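+
+The retention step from the list above can be sketched as follows. This is a minimal illustration, not the contents of the real backup scripts: the function name `prune_old_backups`, the `*.tar.gz` naming pattern, and the 7-day default are all assumptions.
+
+```bash
+#!/bin/bash
+# Sketch: delete local backup archives older than a retention window.
+# prune_old_backups, the *.tar.gz pattern, and the 7-day default are
+# illustrative assumptions, not taken from the real backup scripts.
+
+prune_old_backups() {
+  local backup_dir="$1"
+  local retention_days="${2:-7}"
+
+  if [ ! -d "$backup_dir" ]; then
+    echo "Error: backup directory not found: $backup_dir" >&2
+    return 1
+  fi
+
+  # -mtime +N matches files whose age, rounded down to whole days, exceeds N
+  find "$backup_dir" -maxdepth 1 -type f -name '*.tar.gz' \
+    -mtime +"$retention_days" -print -delete
+}
+```
+
+A call such as `prune_old_backups /var/backups/cassandra 7` (the path is hypothetical) prints each file it removes, so the cron log records what was pruned.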
+
+**Installation:**
+
+```bash
+# On manager node
+ssh dockeradmin@<manager-ip>
+
+# Copy scripts (once scripts are created in this directory)
+sudo cp automation/scripts/backup-*.sh /usr/local/bin/
+sudo chmod +x /usr/local/bin/backup-*.sh
+
+# Schedule via cron
+sudo crontab -e
+# 0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1
+```
+
+### Deployment Scripts
+
+**`deploy-backend.sh`** - Automated backend deployment
+
+```bash
+#!/bin/bash
+# Purpose: Deploy a new backend version (the stack is removed and recreated, so expect brief downtime)
+# Usage: ./deploy-backend.sh [tag]
+# Example: ./deploy-backend.sh prod
+
+set -e
+
+TAG=${1:-prod}
+echo "=== Deploying Backend: Tag $TAG ==="
+
+# Step 1: Build and push (from local dev machine)
+echo "Building and pushing image..."
+cd ~/go/src/codeberg.org/mapleopentech/monorepo/cloud/mapleopentech-backend
+task deploy
+
+# Step 2: Force pull on worker-6
+echo "Forcing fresh pull on worker-6..."
+ssh dockeradmin@<worker-6-ip> \
+  "docker pull registry.digitalocean.com/ssp/maplepress_backend:$TAG"
+
+# Step 3: Redeploy stack
+echo "Redeploying stack..."
+ssh dockeradmin@<manager-ip> << 'ENDSSH'
+  cd ~/stacks
+  docker stack rm maplepress
+  sleep 10
+  docker config rm maplepress_caddyfile 2>/dev/null || true
+  docker stack deploy -c maplepress-stack.yml maplepress
+ENDSSH
+
+# Step 4: Verify deployment
+echo "Verifying deployment..."
+sleep 30
+ssh dockeradmin@<manager-ip> << 'ENDSSH'
+  docker service ps maplepress_backend | head -5
+  docker service logs maplepress_backend --tail 20
+ENDSSH
+
+# Step 5: Health check
+echo "Testing health endpoint..."
+curl -f https://getmaplepress.ca/health || { echo "Health check failed!"; exit 1; }
+
+echo "✅ Backend deployment complete!"
+```
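+
+The fixed `sleep 30` before verification can be too short on a slow pull and needlessly long on a fast one. One alternative, sketched here under assumptions, is to poll until the check passes. `wait_for` and its arguments are hypothetical helpers, not part of the existing scripts.
+
+```bash
+#!/bin/bash
+# Sketch: retry a command until it succeeds or the attempts run out.
+# wait_for and its arguments are hypothetical, for illustration only.
+
+wait_for() {
+  local max_attempts="$1"
+  local delay_seconds="$2"
+  shift 2
+
+  local attempt
+  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
+    if "$@"; then
+      echo "Succeeded on attempt $attempt"
+      return 0
+    fi
+    # Only wait when another attempt will follow
+    if [ "$attempt" -lt "$max_attempts" ]; then
+      echo "Attempt $attempt/$max_attempts failed; retrying in ${delay_seconds}s" >&2
+      sleep "$delay_seconds"
+    fi
+  done
+
+  echo "Giving up after $max_attempts attempts" >&2
+  return 1
+}
+```
+
+For example, `wait_for 12 5 curl -sf https://getmaplepress.ca/health` would allow roughly a minute for the backend to come up before the deployment is reported as failed.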
+
+**`deploy-frontend.sh`** - Automated frontend deployment
+
+```bash
+#!/bin/bash
+# Purpose: Deploy new frontend build
+# Usage: ./deploy-frontend.sh
+
+set -e
+
+echo "=== Deploying Frontend ==="
+
+# SSH to worker-7 and run deployment
+ssh dockeradmin@<worker-7-ip> << 'ENDSSH'
+  set -e  # abort the remote session on the first failed step
+  cd /var/www/monorepo
+
+  echo "Pulling latest code..."
+  git pull origin main
+
+  cd web/maplepress-frontend
+
+  echo "Configuring production environment..."
+  cat > .env.production << 'EOF'
+VITE_API_BASE_URL=https://getmaplepress.ca
+NODE_ENV=production
+EOF
+
+  echo "Installing dependencies..."
+  npm install
+
+  echo "Building frontend..."
+  npm run build
+
+  echo "Verifying build..."
+  if grep -q "getmaplepress.ca" dist/assets/*.js 2>/dev/null; then
+    echo "✅ Production API URL confirmed"
+  else
+    echo "⚠️  Warning: Production URL not found in build"
+  fi
+ENDSSH
+
+# Test frontend
+echo "Testing frontend..."
+curl -f https://getmaplepress.com || { echo "Frontend test failed!"; exit 1; }
+
+echo "✅ Frontend deployment complete!"
+```
+
+### Health Check Script
+
+**`health-check.sh`** - Comprehensive infrastructure health verification
+
+```bash
+#!/bin/bash
+# Purpose: Check health of all infrastructure components
+# Usage: ./health-check.sh
+# Exit codes: 0=healthy, 1=warnings, 2=critical
+
+WARNINGS=0
+CRITICAL=0
+
+echo "=== Infrastructure Health Check ==="
+echo "Started: $(date)"
+echo ""
+
+# Check all services
+echo "--- Docker Services ---"
+SERVICES_DOWN=$(docker service ls | grep -v "1/1" | grep -v "REPLICAS" | wc -l)
+if [ $SERVICES_DOWN -gt 0 ]; then
+  echo "⚠️  WARNING: $SERVICES_DOWN services not at full capacity"
+  docker service ls | grep -v "1/1" | grep -v "REPLICAS"
+  WARNINGS=$((WARNINGS + 1))
+else
+  echo "✅ All services running (1/1)"
+fi
+
+# Check all nodes
+echo ""
+echo "--- Docker Nodes ---"
+NODES_DOWN=$(docker node ls | grep -v "Ready" | grep -v "STATUS" | wc -l)
+if [ $NODES_DOWN -gt 0 ]; then
+  echo "🔴 CRITICAL: $NODES_DOWN nodes not ready!"
+  docker node ls | grep -v "Ready" | grep -v "STATUS"
+  CRITICAL=$((CRITICAL + 1))
+else
+  echo "✅ All nodes ready"
+fi
+
+# Check disk space
+echo ""
+echo "--- Disk Space ---"
+for NODE in worker-1 worker-2 worker-3 worker-4 worker-5 worker-6 worker-7; do
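+  DISK_USAGE=$(ssh -o StrictHostKeyChecking=no dockeradmin@$NODE "df -h / | tail -1 | awk '{print \$5}' | tr -d '%'")
+  # An empty or non-numeric value means the SSH call failed; report it instead of breaking the comparison
+  if ! [[ "$DISK_USAGE" =~ ^[0-9]+$ ]]; then
+    echo "🔴 CRITICAL: $NODE disk usage unavailable (SSH failed?)"
+    CRITICAL=$((CRITICAL + 1))
+  elif [ "$DISK_USAGE" -gt 85 ]; then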
+    echo "🔴 CRITICAL: $NODE disk usage: ${DISK_USAGE}%"
+    CRITICAL=$((CRITICAL + 1))
+  elif [ $DISK_USAGE -gt 75 ]; then
+    echo "⚠️  WARNING: $NODE disk usage: ${DISK_USAGE}%"
+    WARNINGS=$((WARNINGS + 1))
+  else
+    echo "✅ $NODE disk usage: ${DISK_USAGE}%"
+  fi
+done
+
+# Check endpoints
+echo ""
+echo "--- HTTP Endpoints ---"
+if curl -sf https://getmaplepress.ca/health > /dev/null; then
+  echo "✅ Backend health check passed"
+else
+  echo "🔴 CRITICAL: Backend health check failed!"
+  CRITICAL=$((CRITICAL + 1))
+fi
+
+if curl -sf https://getmaplepress.com > /dev/null; then
+  echo "✅ Frontend accessible"
+else
+  echo "🔴 CRITICAL: Frontend not accessible!"
+  CRITICAL=$((CRITICAL + 1))
+fi
+
+# Summary
+echo ""
+echo "=== Summary ==="
+echo "Warnings: $WARNINGS"
+echo "Critical: $CRITICAL"
+
+if [ $CRITICAL -gt 0 ]; then
+  echo "🔴 Status: CRITICAL"
+  exit 2
+elif [ $WARNINGS -gt 0 ]; then
+  echo "⚠️  Status: WARNING"
+  exit 1
+else
+  echo "✅ Status: HEALTHY"
+  exit 0
+fi
+```
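+
+The `grep -v "1/1"` check above flags any service that is intentionally scaled to more than one replica. A more general check compares running against desired replicas, as in this sketch. `count_degraded_services` is a hypothetical helper; it assumes input lines of the form `name running/desired`.
+
+```bash
+#!/bin/bash
+# Sketch: count services whose running replicas differ from the desired count.
+# Reads "name running/desired" lines on stdin, for example from:
+#   docker service ls --format '{{.Name}} {{.Replicas}}'
+# count_degraded_services is a hypothetical helper name.
+
+count_degraded_services() {
+  local name replicas _rest running desired
+  local degraded=0
+
+  while read -r name replicas _rest; do
+    [ -z "$name" ] && continue
+    running="${replicas%%/*}"   # text before the slash
+    desired="${replicas##*/}"   # text after the slash
+    if [ "$running" != "$desired" ]; then
+      echo "degraded: $name ($replicas)" >&2
+      degraded=$((degraded + 1))
+    fi
+  done
+
+  # The count goes to stdout; per-service details go to stderr
+  echo "$degraded"
+}
+```
+
+In the health check this could replace the `wc -l` pipeline: `SERVICES_DOWN=$(docker service ls --format '{{.Name}} {{.Replicas}}' | count_degraded_services)`.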
+
+---
+
+## Monitoring Configuration Files
+
+### Prometheus Configuration
+
+**Located at**: `monitoring/prometheus.yml`
+
+```yaml
+# See ../operations/02_monitoring_alerting.md for complete configuration
+# This file should be copied to ~/stacks/monitoring-config/ on manager node
+
+global:
+  scrape_interval: 15s
+  evaluation_interval: 15s
+
+alerting:
+  alertmanagers:
+    - static_configs:
+        - targets: ['alertmanager:9093']
+
+rule_files:
+  - /etc/prometheus/alert-rules.yml
+
+scrape_configs:
+  - job_name: 'prometheus'
+    static_configs:
+      - targets: ['localhost:9090']
+
+  - job_name: 'node-exporter'
+    dns_sd_configs:
+      - names: ['tasks.node-exporter']
+        type: 'A'
+        port: 9100
+
+  - job_name: 'cadvisor'
+    dns_sd_configs:
+      - names: ['tasks.cadvisor']
+        type: 'A'
+        port: 8080
+
+  - job_name: 'maplepress-backend'
+    static_configs:
+      - targets: ['maplepress-backend:8000']
+    metrics_path: '/metrics'
+```
+
+### Alert Rules
+
+**Located at**: `monitoring/alert-rules.yml`
+
+See `../operations/02_monitoring_alerting.md` for complete alert rule configurations.
+
+### Grafana Dashboards
+
+**Dashboard exports** (JSON format) should be stored in `monitoring/grafana-dashboards/`.
+
+**To import:**
+1. Access Grafana via SSH tunnel: `ssh -L 3000:localhost:3000 dockeradmin@<manager-ip>`
+2. Open http://localhost:3000
+3. Dashboards → Import → Upload JSON file
+
+**Recommended dashboards:**
+- Infrastructure Overview (node metrics, disk, CPU, memory)
+- MaplePress Application (HTTP metrics, errors, latency)
+- Database Metrics (Cassandra, Redis, Meilisearch)
+
+---
+
+## CI/CD Pipelines
+
+### GitHub Actions Example
+
+**File:** `ci-cd/github-actions.yml`
+
+```yaml
+name: Deploy to Production
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'cloud/mapleopentech-backend/**'
+
+jobs:
+  build-and-deploy:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v3
+
+      - name: Set up Go
+        uses: actions/setup-go@v4
+        with:
+          go-version: '1.21'
+
+      - name: Run tests
+        run: |
+          cd cloud/mapleopentech-backend
+          go test ./...
+
+      - name: Install doctl
+        uses: digitalocean/action-doctl@v2
+        with:
+          token: ${{ secrets.DIGITALOCEAN_TOKEN }}
+
+      - name: Build and push Docker image
+        run: |
+          cd cloud/mapleopentech-backend
+          doctl registry login
+          docker build -t registry.digitalocean.com/ssp/maplepress_backend:prod .
+          docker push registry.digitalocean.com/ssp/maplepress_backend:prod
+
+      - name: Deploy to production
+        uses: appleboy/ssh-action@master
+        with:
+          host: ${{ secrets.MANAGER_IP }}
+          username: dockeradmin
+          key: ${{ secrets.SSH_PRIVATE_KEY }}
+          script: |
+            # Force pull on worker-6
+            ssh dockeradmin@${{ secrets.WORKER_6_IP }} \
+              "docker pull registry.digitalocean.com/ssp/maplepress_backend:prod"
+
+            # Redeploy stack
+            cd ~/stacks
+            docker stack rm maplepress
+            sleep 10
+            docker config rm maplepress_caddyfile || true
+            docker stack deploy -c maplepress-stack.yml maplepress
+
+            # Wait and verify
+            sleep 30
+            docker service ps maplepress_backend | head -5
+
+      - name: Health check
+        run: |
+          curl -f https://getmaplepress.ca/health || exit 1
+
+      - name: Notify deployment
+        if: always()
+        uses: 8398a7/action-slack@v3
+        with:
+          status: ${{ job.status }}
+          text: 'Backend deployment ${{ job.status }}'
+          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
+```
+
+### GitLab CI Example
+
+**File:** `ci-cd/gitlab-ci.yml`
+
+```yaml
+stages:
+  - test
+  - build
+  - deploy
+
+variables:
+  DOCKER_IMAGE: registry.digitalocean.com/ssp/maplepress_backend
+  DOCKER_TAG: prod
+
+test:
+  stage: test
+  image: golang:1.21
+  script:
+    - cd cloud/mapleopentech-backend
+    - go test ./...
+
+build:
+  stage: build
+  image: docker:latest
+  services:
+    - docker:dind
+  before_script:
+    - docker login registry.digitalocean.com -u $DIGITALOCEAN_TOKEN -p $DIGITALOCEAN_TOKEN
+  script:
+    - cd cloud/mapleopentech-backend
+    - docker build -t $DOCKER_IMAGE:$DOCKER_TAG .
+    - docker push $DOCKER_IMAGE:$DOCKER_TAG
+  only:
+    - main
+
+deploy:
+  stage: deploy
+  image: alpine:latest
+  before_script:
+    - apk add --no-cache openssh-client
+    - eval $(ssh-agent -s)
+    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
+    - mkdir -p ~/.ssh
+    - chmod 700 ~/.ssh
+    - ssh-keyscan -H $MANAGER_IP $WORKER_6_IP >> ~/.ssh/known_hosts
+  script:
+    # Force pull on worker-6
+    - ssh dockeradmin@$WORKER_6_IP "docker pull $DOCKER_IMAGE:$DOCKER_TAG"
+
+    # Redeploy stack
+    - |
+      ssh dockeradmin@$MANAGER_IP << 'EOF'
+        cd ~/stacks
+        docker stack rm maplepress
+        sleep 10
+        docker config rm maplepress_caddyfile || true
+        docker stack deploy -c maplepress-stack.yml maplepress
+      EOF
+
+    # Verify deployment
+    - sleep 30
+    - ssh dockeradmin@$MANAGER_IP "docker service ps maplepress_backend | head -5"
+
+    # Health check
+    - apk add --no-cache curl
+    - curl -f https://getmaplepress.ca/health
+  only:
+    - main
+  environment:
+    name: production
+    url: https://getmaplepress.ca
+```
+
+---
+
+## Usage Examples
+
+### Running Scripts Manually
+
+```bash
+# Backup all services
+ssh dockeradmin@<manager-ip>
+sudo /usr/local/bin/backup-all.sh
+
+# Health check
+ssh dockeradmin@<manager-ip>
+sudo /usr/local/bin/health-check.sh
+echo "Exit code: $?"
+# 0 = healthy, 1 = warnings, 2 = critical
+
+# Deploy backend
+cd ~/monorepo/cloud/infrastructure/production
+./automation/scripts/deploy-backend.sh prod
+
+# Deploy frontend
+./automation/scripts/deploy-frontend.sh
+```
+
+### Scheduling Scripts with Cron
+
+```bash
+# Edit crontab on manager
+ssh dockeradmin@<manager-ip>
+sudo crontab -e
+
+# Add these lines:
+
+# Backup all services daily at 2 AM
+0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1
+
+# Health check every hour
+0 * * * * /usr/local/bin/health-check.sh >> /var/log/health-check.log 2>&1
+
+# Docker cleanup weekly (Sunday 3 AM)
+0 3 * * 0 /usr/local/bin/cleanup-docker.sh >> /var/log/docker-cleanup.log 2>&1
+
+# Secret rotation monthly (1st of month, 4 AM)
+0 4 1 * * /usr/local/bin/rotate-secrets.sh >> /var/log/secret-rotation.log 2>&1
+```
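+
+Scheduled jobs can overlap when a run takes longer than its interval, for example a slow backup that is still uploading when the next one starts. A lock prevents that. The sketch below uses `flock` from util-linux; `run_exclusive`, the lock path, and the exit code 75 are illustrative assumptions.
+
+```bash
+#!/bin/bash
+# Sketch: run a command only if no other instance holds the lock.
+# run_exclusive, the lock path, and exit code 75 are illustrative choices;
+# flock(1) ships with util-linux on most Linux distributions.
+
+run_exclusive() {
+  local lock_file="$1"
+  shift
+
+  (
+    # The lock file is open on descriptor 9; try to lock it without waiting
+    if ! flock -n 9; then
+      echo "Another run holds $lock_file; skipping" >&2
+      exit 75
+    fi
+    "$@"
+  ) 9>"$lock_file"
+}
+```
+
+The same effect is available directly in a crontab entry, since `flock` also accepts a command: `0 2 * * * flock -n /var/lock/backup-all.lock /usr/local/bin/backup-all.sh`.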
+
+### Monitoring Script Execution
+
+```bash
+# View cron logs
+sudo grep CRON /var/log/syslog | tail -20
+
+# View specific script logs
+tail -f /var/log/backup-all.log
+tail -f /var/log/health-check.log
+
+# Check a script's exit code (run it, then read $? immediately afterwards)
+sudo /usr/local/bin/backup-all.sh; echo "Last backup exit code: $?"
+```
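+
+Cron does not keep exit codes, so they are lost unless the job records them. One option is a small wrapper that appends the code to the job's log. `run_and_record` and the `exit=` line format are assumptions made for this sketch.
+
+```bash
+#!/bin/bash
+# Sketch: run a command, append its output and exit code to a log file.
+# run_and_record and the "exit=" line format are illustrative assumptions.
+
+run_and_record() {
+  local log_file="$1"
+  shift
+
+  local status=0
+  "$@" >>"$log_file" 2>&1 || status=$?
+
+  # One summary line per run, easy to grep later
+  echo "$(date '+%Y-%m-%d %H:%M:%S') exit=$status cmd=$*" >>"$log_file"
+  return "$status"
+}
+```
+
+Past results can then be listed with `grep 'exit=' /var/log/backup-all.log | tail -5`.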
+
+---
+
+## Best Practices
+
+### Script Development
+
+1. **Use `set -e`**: Exit on first error (omit it only where a script must keep going and tally failures, as `health-check.sh` does)
+2. **Log everything**: Redirect to `/var/log/`
+3. **Use exit codes**: 0=success, 1=warning, 2=critical
+4. **Idempotent**: Safe to run multiple times
+5. **Document**: Comments and usage instructions
+6. **Test**: Verify on staging before production
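+
+As an illustration of the idempotency point, the sketch below appends a line to a file only when it is not already present, so repeated runs leave the file unchanged. `ensure_line` is a hypothetical helper name.
+
+```bash
+#!/bin/bash
+# Sketch: append a line to a file only if it is not already there.
+# ensure_line is a hypothetical helper, shown to illustrate idempotency.
+
+ensure_line() {
+  local line="$1"
+  local file="$2"
+
+  touch "$file"
+  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
+  if ! grep -qxF -- "$line" "$file"; then
+    printf '%s\n' "$line" >>"$file"
+  fi
+}
+```
+
+Calling `ensure_line 'StrictHostKeyChecking no' ~/.ssh/config` twice adds the line once, whereas a plain `>>` would add it on every run.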
+
+### Secret Management
+
+**Never hardcode secrets in scripts!**
+
+```bash
+# ❌ Bad
+REDIS_PASSWORD="mysecret123"
+
+# ✅ Good
+REDIS_PASSWORD=$(docker exec redis cat /run/secrets/redis_password)
+
+# ✅ Even better
+REDIS_PASSWORD=$(cat /run/secrets/redis_password 2>/dev/null || echo "")
+if [ -z "$REDIS_PASSWORD" ]; then
+  echo "Error: Redis password not found"
+  exit 1
+fi
+```
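+
+When several scripts read secrets, the check above can be wrapped in one function. This is a sketch under assumptions: `read_secret` and the `SECRETS_DIR` override are hypothetical names, and `/run/secrets` is where Docker mounts secrets inside a container.
+
+```bash
+#!/bin/bash
+# Sketch: read a secret from a file and fail loudly if it is missing or empty.
+# read_secret and the SECRETS_DIR override are hypothetical names.
+
+read_secret() {
+  local name="$1"
+  local secrets_dir="${SECRETS_DIR:-/run/secrets}"
+  local path="$secrets_dir/$name"
+
+  if [ ! -r "$path" ]; then
+    echo "Error: secret '$name' not readable at $path" >&2
+    return 1
+  fi
+
+  local value
+  value=$(<"$path")   # $(<file) reads the file and strips trailing newlines
+  if [ -z "$value" ]; then
+    echo "Error: secret '$name' is empty" >&2
+    return 1
+  fi
+
+  printf '%s' "$value"
+}
+```
+
+A script would then use `REDIS_PASSWORD=$(read_secret redis_password) || exit 1`, which keeps the value out of the script and stops early when the secret is unavailable.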
+
+### Error Handling
+
+```bash
+# Check command success
+if ! docker service ls > /dev/null 2>&1; then
+  echo "Error: Cannot connect to Docker"
+  exit 2
+fi
+
+# Trap errors
+trap 'echo "Script failed on line $LINENO"' ERR
+
+# Verify prerequisites
+for COMMAND in docker ssh s3cmd; do
+  if ! command -v $COMMAND &> /dev/null; then
+    echo "Error: $COMMAND not found"
+    exit 1
+  fi
+done
+```
+
+---
+
+## Troubleshooting
+
+### Script Won't Execute
+
+```bash
+# Check permissions
+ls -la /usr/local/bin/script.sh
+# Should be: -rwxr-xr-x (executable)
+
+# Fix permissions
+sudo chmod +x /usr/local/bin/script.sh
+
+# Check shebang
+head -1 /usr/local/bin/script.sh
+# Should be: #!/bin/bash
+```
+
+### Cron Job Not Running
+
+```bash
+# Check cron service
+sudo systemctl status cron
+
+# Check cron logs
+sudo grep CRON /var/log/syslog | tail -20
+
+# Test cron environment: add this line temporarily via `sudo crontab -e`
+# * * * * * /usr/bin/env > /tmp/cron-env.txt
+# Wait 1 minute, check /tmp/cron-env.txt, then remove the entry
+```
+
+### SSH Issues in Scripts
+
+```bash
+# Add SSH keys to ssh-agent
+eval $(ssh-agent)
+ssh-add ~/.ssh/id_rsa
+
+# Disable strict host checking (only for internal network)
+ssh -o StrictHostKeyChecking=no user@host "command"
+
+# Use SSH config
+cat >> ~/.ssh/config << EOF
+Host worker-*
+  StrictHostKeyChecking no
+  UserKnownHostsFile=/dev/null
+EOF
+```
+
+---
+
+## Contributing
+
+**When adding new automation:**
+
+1. Place scripts in `automation/scripts/`
+2. Document usage in header comments
+3. Follow naming convention: `verb-noun.sh`
+4. Test thoroughly on staging
+5. Update this README with script description
+6. Add to appropriate cron schedule if applicable
+
+---
+
+## Future Automation Ideas
+
+**Not yet implemented, but good candidates:**
+
+- [ ] Automatic SSL certificate monitoring (separate from Caddy)
+- [ ] Database performance metrics collection
+- [ ] Automated capacity planning reports
+- [ ] Self-healing scripts (restart failed services)
+- [ ] Traffic spike detection and auto-scaling
+- [ ] Automated security vulnerability scanning
+- [ ] Log aggregation and analysis
+- [ ] Cost optimization recommendations
+
+---
+
+**Last Updated**: January 2025
+**Maintained By**: Infrastructure Team
+
+**Note**: Scripts in this directory are templates. Customize IP addresses, domains, and credentials for your specific environment before use.