Initial commit: Open sourcing all of the Maple Open Technologies code.

Bartlomiej Mika 2025-12-02 14:33:08 -05:00
commit 755d54a99d
2010 changed files with 448675 additions and 0 deletions

# Automation Scripts and Tools
**Audience**: DevOps Engineers, Automation Teams

**Purpose**: Automated scripts, monitoring configs, and CI/CD pipelines for production infrastructure

**Prerequisites**: Infrastructure deployed, basic scripting knowledge

---
## Overview
This directory contains automation tools, scripts, and configurations to reduce manual operational overhead and ensure consistency across deployments.

**What's automated:**
- Backup procedures (scheduled)
- Deployment workflows (CI/CD)
- Monitoring and alerting (Prometheus/Grafana configs)
- Common maintenance tasks (scripts)
- Infrastructure health checks
---
## Directory Structure
```
automation/
├── README.md                      # This file
├── scripts/                       # Operational scripts
│   ├── backup-all.sh              # Master backup orchestrator
│   ├── backup-cassandra.sh        # Cassandra snapshot + upload
│   ├── backup-redis.sh            # Redis RDB/AOF backup
│   ├── backup-meilisearch.sh      # Meilisearch dump export
│   ├── deploy-backend.sh          # Backend deployment automation
│   ├── deploy-frontend.sh         # Frontend deployment automation
│   ├── health-check.sh            # Infrastructure health verification
│   ├── rotate-secrets.sh          # Secret rotation automation
│   └── cleanup-docker.sh          # Docker cleanup (images, containers)
├── monitoring/                    # Monitoring configurations
│   ├── prometheus.yml             # Prometheus scrape configs
│   ├── alertmanager.yml           # Alert routing and receivers
│   ├── alert-rules.yml            # Prometheus alert definitions
│   └── grafana-dashboards/        # JSON dashboard exports
│       ├── infrastructure.json
│       ├── maplepress.json
│       └── databases.json
└── ci-cd/                         # CI/CD pipeline examples
    ├── github-actions.yml         # GitHub Actions workflow
    ├── gitlab-ci.yml              # GitLab CI pipeline
    └── deployment-pipeline.md     # CI/CD setup guide
```
---
## Scripts
### Backup Scripts
All backup scripts are designed to be run via cron. They:
- Create local snapshots/dumps
- Compress and upload to DigitalOcean Spaces
- Clean up old backups (retention policy)
- Log to `/var/log/`
- Exit with appropriate codes for monitoring
**See `../operations/01_backup_recovery.md` for complete script contents and setup instructions.**
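
The complete scripts live in the linked guide. As a rough illustration of the retention step only, the sketch below deletes local archives older than a given number of days. The function name `prune_old_backups`, the `*.tar.gz` pattern, and the directory in the usage line are assumptions for illustration, not taken from the real scripts.

```bash
#!/bin/bash
# Sketch only: prune_old_backups is a hypothetical helper, not part of the real backup scripts.
# Usage: prune_old_backups <backup-dir> <retention-days>
prune_old_backups() {
    local dir="$1" days="$2"
    # -mtime +N matches files last modified more than N full days ago;
    # -print lists each file as it is deleted so the cron log records it.
    find "$dir" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$days" -print -delete
}

# Example (hypothetical path): keep one week of local Cassandra archives
# prune_old_backups /var/backups/cassandra 7
```
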
**Installation:**
```bash
# On manager node
ssh dockeradmin@<manager-ip>
# Copy scripts (once scripts are created in this directory)
sudo cp automation/scripts/backup-*.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/backup-*.sh
# Schedule via cron
sudo crontab -e
# 0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1
```
### Deployment Scripts
**`deploy-backend.sh`** - Automated backend deployment
```bash
#!/bin/bash
# Purpose: Deploy new backend version (expect brief downtime: the stack is removed and redeployed)
# Usage: ./deploy-backend.sh [tag]
# Example: ./deploy-backend.sh prod
set -e
TAG=${1:-prod}
echo "=== Deploying Backend: Tag $TAG ==="
# Step 1: Build and push (from local dev machine)
echo "Building and pushing image..."
cd ~/go/src/codeberg.org/mapleopentech/monorepo/cloud/mapleopentech-backend
task deploy
# Step 2: Force pull on worker-6
echo "Forcing fresh pull on worker-6..."
ssh dockeradmin@<worker-6-ip> \
"docker pull registry.digitalocean.com/ssp/maplepress_backend:$TAG"
# Step 3: Redeploy stack
echo "Redeploying stack..."
ssh dockeradmin@<manager-ip> << 'ENDSSH'
cd ~/stacks
docker stack rm maplepress
sleep 10
docker config rm maplepress_caddyfile 2>/dev/null || true
docker stack deploy -c maplepress-stack.yml maplepress
ENDSSH
# Step 4: Verify deployment
echo "Verifying deployment..."
sleep 30
ssh dockeradmin@<manager-ip> << 'ENDSSH'
docker service ps maplepress_backend | head -5
docker service logs maplepress_backend --tail 20
ENDSSH
# Step 5: Health check
echo "Testing health endpoint..."
curl -f https://getmaplepress.ca/health || { echo "Health check failed!"; exit 1; }
echo "✅ Backend deployment complete!"
```
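
The fixed `sleep 30` before verification can be either too short or needlessly long. One alternative, sketched here on the assumption that polling the health endpoint is acceptable, is to retry the check until it passes. `retry` is a hypothetical helper, and the attempt count and delay in the usage line are illustrative.

```bash
#!/bin/bash
# Sketch only: retry is a hypothetical helper, not part of deploy-backend.sh.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        echo "Attempt $i/$attempts failed: $*" >&2
        # Do not sleep after the final attempt
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: poll the health endpoint, waiting 5 seconds between up to 12 attempts
# retry 12 5 curl -sf https://getmaplepress.ca/health > /dev/null
```
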
**`deploy-frontend.sh`** - Automated frontend deployment
```bash
#!/bin/bash
# Purpose: Deploy new frontend build
# Usage: ./deploy-frontend.sh
set -e
echo "=== Deploying Frontend ==="
# SSH to worker-7 and run deployment
ssh dockeradmin@<worker-7-ip> << 'ENDSSH'
# The local `set -e` does not apply inside the remote shell, so enable it here too
set -e
cd /var/www/monorepo
echo "Pulling latest code..."
git pull origin main
cd web/maplepress-frontend
echo "Configuring production environment..."
cat > .env.production << 'EOF'
VITE_API_BASE_URL=https://getmaplepress.ca
NODE_ENV=production
EOF
echo "Installing dependencies..."
npm install
echo "Building frontend..."
npm run build
echo "Verifying build..."
if grep -q "getmaplepress.ca" dist/assets/*.js 2>/dev/null; then
echo "✅ Production API URL confirmed"
else
echo "⚠️ Warning: Production URL not found in build"
fi
ENDSSH
# Test frontend
echo "Testing frontend..."
curl -f https://getmaplepress.com || { echo "Frontend test failed!"; exit 1; }
echo "✅ Frontend deployment complete!"
```
### Health Check Script
**`health-check.sh`** - Comprehensive infrastructure health verification
```bash
#!/bin/bash
# Purpose: Check health of all infrastructure components
# Usage: ./health-check.sh
# Exit codes: 0=healthy, 1=warnings, 2=critical
WARNINGS=0
CRITICAL=0
echo "=== Infrastructure Health Check ==="
echo "Started: $(date)"
echo ""
# Check all services
echo "--- Docker Services ---"
SERVICES_DOWN=$(docker service ls | grep -v "1/1" | grep -v "REPLICAS" | wc -l)
if [ $SERVICES_DOWN -gt 0 ]; then
echo "⚠️ WARNING: $SERVICES_DOWN services not at full capacity"
docker service ls | grep -v "1/1" | grep -v "REPLICAS"
WARNINGS=$((WARNINGS + 1))
else
echo "✅ All services running (1/1)"
fi
# Check all nodes
echo ""
echo "--- Docker Nodes ---"
NODES_DOWN=$(docker node ls | grep -v "Ready" | grep -v "STATUS" | wc -l)
if [ $NODES_DOWN -gt 0 ]; then
echo "🔴 CRITICAL: $NODES_DOWN nodes not ready!"
docker node ls | grep -v "Ready" | grep -v "STATUS"
CRITICAL=$((CRITICAL + 1))
else
echo "✅ All nodes ready"
fi
# Check disk space
echo ""
echo "--- Disk Space ---"
for NODE in worker-1 worker-2 worker-3 worker-4 worker-5 worker-6 worker-7; do
    DISK_USAGE=$(ssh -o StrictHostKeyChecking=no dockeradmin@$NODE "df -h / | tail -1 | awk '{print \$5}' | tr -d '%'")
    # If SSH fails, DISK_USAGE is empty; report that instead of letting the numeric tests error out
    if ! [[ "$DISK_USAGE" =~ ^[0-9]+$ ]]; then
        echo "🔴 CRITICAL: $NODE disk usage unavailable (SSH failed?)"
        CRITICAL=$((CRITICAL + 1))
    elif [ "$DISK_USAGE" -gt 85 ]; then
        echo "🔴 CRITICAL: $NODE disk usage: ${DISK_USAGE}%"
        CRITICAL=$((CRITICAL + 1))
    elif [ "$DISK_USAGE" -gt 75 ]; then
        echo "⚠️ WARNING: $NODE disk usage: ${DISK_USAGE}%"
        WARNINGS=$((WARNINGS + 1))
    else
        echo "✅ $NODE disk usage: ${DISK_USAGE}%"
    fi
done
# Check endpoints
echo ""
echo "--- HTTP Endpoints ---"
if curl -sf https://getmaplepress.ca/health > /dev/null; then
echo "✅ Backend health check passed"
else
echo "🔴 CRITICAL: Backend health check failed!"
CRITICAL=$((CRITICAL + 1))
fi
if curl -sf https://getmaplepress.com > /dev/null; then
echo "✅ Frontend accessible"
else
echo "🔴 CRITICAL: Frontend not accessible!"
CRITICAL=$((CRITICAL + 1))
fi
# Summary
echo ""
echo "=== Summary ==="
echo "Warnings: $WARNINGS"
echo "Critical: $CRITICAL"
if [ $CRITICAL -gt 0 ]; then
echo "🔴 Status: CRITICAL"
exit 2
elif [ $WARNINGS -gt 0 ]; then
echo "⚠️ Status: WARNING"
exit 1
else
echo "✅ Status: HEALTHY"
exit 0
fi
```
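
The `grep -v "1/1"` filter in the script above reports any service that runs more than one replica (for example `3/3`) as unhealthy. A sketch of a replica-aware check follows. It assumes the input format produced by `docker service ls --format '{{.Name}} {{.Replicas}}'`, and `degraded_services` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: degraded_services is a hypothetical helper.
# Reads lines of "<name> <running>/<desired>" on stdin and prints the names
# of services whose running count differs from the desired count.
degraded_services() {
    awk '{
        split($2, replicas, "/")
        if (replicas[1] != replicas[2]) print $1
    }'
}

# Example (on a Swarm manager):
# docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services
```
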
---
## Monitoring Configuration Files
### Prometheus Configuration
**Located at**: `monitoring/prometheus.yml`
```yaml
# See ../operations/02_monitoring_alerting.md for complete configuration
# This file should be copied to ~/stacks/monitoring-config/ on manager node
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - /etc/prometheus/alert-rules.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: 'A'
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

  - job_name: 'maplepress-backend'
    static_configs:
      - targets: ['maplepress-backend:8000']
    metrics_path: '/metrics'
```
### Alert Rules
**Located at**: `monitoring/alert-rules.yml`
See `../operations/02_monitoring_alerting.md` for complete alert rule configurations.
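
Before copying rule changes to the manager node, it can help to validate them locally. The sketch below wraps `promtool check rules`, which ships with Prometheus. The `validate_alert_rules` name and the default path are assumptions.

```bash
#!/bin/bash
# Sketch only: validate_alert_rules is a hypothetical helper.
# Usage: validate_alert_rules [rules-file]
validate_alert_rules() {
    local rules_file="${1:-monitoring/alert-rules.yml}"
    if ! command -v promtool > /dev/null 2>&1; then
        echo "Error: promtool not found (it is bundled with Prometheus)" >&2
        return 1
    fi
    # promtool exits non-zero when a rule file fails validation
    promtool check rules "$rules_file"
}
```
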
### Grafana Dashboards
**Dashboard exports** (JSON format) should be stored in `monitoring/grafana-dashboards/`.

**To import:**
1. Access Grafana via SSH tunnel: `ssh -L 3000:localhost:3000 dockeradmin@<manager-ip>`
2. Open http://localhost:3000
3. Dashboards → Import → Upload JSON file

**Recommended dashboards:**
- Infrastructure Overview (node metrics, disk, CPU, memory)
- MaplePress Application (HTTP metrics, errors, latency)
- Database Metrics (Cassandra, Redis, Meilisearch)
---
## CI/CD Pipelines
### GitHub Actions Example
**File:** `ci-cd/github-actions.yml`
```yaml
name: Deploy to Production

on:
  push:
    branches:
      - main
    paths:
      - 'cloud/mapleopentech-backend/**'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.21'

      - name: Run tests
        run: |
          cd cloud/mapleopentech-backend
          go test ./...

      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_TOKEN }}

      - name: Build and push Docker image
        run: |
          cd cloud/mapleopentech-backend
          doctl registry login
          docker build -t registry.digitalocean.com/ssp/maplepress_backend:prod .
          docker push registry.digitalocean.com/ssp/maplepress_backend:prod

      - name: Deploy to production
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.MANAGER_IP }}
          username: dockeradmin
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          script: |
            # Force pull on worker-6
            ssh dockeradmin@${{ secrets.WORKER_6_IP }} \
              "docker pull registry.digitalocean.com/ssp/maplepress_backend:prod"
            # Redeploy stack
            cd ~/stacks
            docker stack rm maplepress
            sleep 10
            docker config rm maplepress_caddyfile || true
            docker stack deploy -c maplepress-stack.yml maplepress
            # Wait and verify
            sleep 30
            docker service ps maplepress_backend | head -5

      - name: Health check
        run: |
          curl -f https://getmaplepress.ca/health || exit 1

      - name: Notify deployment
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Backend deployment ${{ job.status }}'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}
```
### GitLab CI Example
**File:** `ci-cd/gitlab-ci.yml`
```yaml
stages:
  - test
  - build
  - deploy

variables:
  DOCKER_IMAGE: registry.digitalocean.com/ssp/maplepress_backend
  DOCKER_TAG: prod

test:
  stage: test
  image: golang:1.21
  script:
    - cd cloud/mapleopentech-backend
    - go test ./...

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    # Pass the token on stdin so it does not appear in the process list
    - echo "$DIGITALOCEAN_TOKEN" | docker login registry.digitalocean.com -u "$DIGITALOCEAN_TOKEN" --password-stdin
  script:
    - cd cloud/mapleopentech-backend
    - docker build -t $DOCKER_IMAGE:$DOCKER_TAG .
    - docker push $DOCKER_IMAGE:$DOCKER_TAG
  only:
    - main

deploy:
  stage: deploy
  image: alpine:latest
  before_script:
    - apk add --no-cache openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    # Both hosts are contacted directly below, so both host keys must be known
    - ssh-keyscan -H $MANAGER_IP $WORKER_6_IP >> ~/.ssh/known_hosts
  script:
    # Force pull on worker-6
    - ssh dockeradmin@$WORKER_6_IP "docker pull $DOCKER_IMAGE:$DOCKER_TAG"
    # Redeploy stack
    - |
      ssh dockeradmin@$MANAGER_IP << 'EOF'
      cd ~/stacks
      docker stack rm maplepress
      sleep 10
      docker config rm maplepress_caddyfile || true
      docker stack deploy -c maplepress-stack.yml maplepress
      EOF
    # Verify deployment
    - sleep 30
    - ssh dockeradmin@$MANAGER_IP "docker service ps maplepress_backend | head -5"
    # Health check
    - apk add --no-cache curl
    - curl -f https://getmaplepress.ca/health
  only:
    - main
  environment:
    name: production
    url: https://getmaplepress.ca
```
---
## Usage Examples
### Running Scripts Manually
```bash
# Backup all services
ssh dockeradmin@<manager-ip>
sudo /usr/local/bin/backup-all.sh
# Health check
ssh dockeradmin@<manager-ip>
sudo /usr/local/bin/health-check.sh
echo "Exit code: $?"
# 0 = healthy, 1 = warnings, 2 = critical
# Deploy backend
cd ~/monorepo/cloud/infrastructure/production
./automation/scripts/deploy-backend.sh prod
# Deploy frontend
./automation/scripts/deploy-frontend.sh
```
### Scheduling Scripts with Cron
```bash
# Edit crontab on manager
ssh dockeradmin@<manager-ip>
sudo crontab -e
# Add these lines (each script must first be copied to /usr/local/bin/ and made executable):
# Backup all services daily at 2 AM
0 2 * * * /usr/local/bin/backup-all.sh >> /var/log/backup-all.log 2>&1
# Health check every hour
0 * * * * /usr/local/bin/health-check.sh >> /var/log/health-check.log 2>&1
# Docker cleanup weekly (Sunday 3 AM)
0 3 * * 0 /usr/local/bin/cleanup-docker.sh >> /var/log/docker-cleanup.log 2>&1
# Secret rotation monthly (1st of month, 4 AM)
0 4 1 * * /usr/local/bin/rotate-secrets.sh >> /var/log/secret-rotation.log 2>&1
```
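
If a job runs longer than its interval (for instance, an hourly health check that hangs on an unreachable node), cron will start a second copy alongside it. A minimal sketch for preventing overlap, assuming `flock` from util-linux is installed; `run_exclusive` and the lock path in the usage line are hypothetical.

```bash
#!/bin/bash
# Sketch only: run_exclusive is a hypothetical wrapper.
# Usage: run_exclusive <lock-file> <command> [args...]
run_exclusive() {
    local lock_file="$1"
    shift
    (
        # Take a non-blocking exclusive lock on file descriptor 9;
        # if another run already holds it, give up immediately.
        if ! flock -n 9; then
            echo "Another run holds $lock_file; skipping" >&2
            exit 75
        fi
        "$@"
    ) 9> "$lock_file"
}

# Example crontab entry using flock directly:
# 0 * * * * flock -n /var/lock/health-check.lock /usr/local/bin/health-check.sh >> /var/log/health-check.log 2>&1
```
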
### Monitoring Script Execution
```bash
# View cron logs
sudo grep CRON /var/log/syslog | tail -20
# View specific script logs
tail -f /var/log/backup-all.log
tail -f /var/log/health-check.log
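# Check a script's exit code ($? only reflects the most recent command, so run the script directly)
sudo /usr/local/bin/backup-all.sh; echo "Backup exit code: $?"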
```
---
## Best Practices
### Script Development
1. **Use `set -e`**: Exit on the first error (omit it only when a script must keep going to collect results, as `health-check.sh` does)
2. **Log everything**: Redirect to `/var/log/`
3. **Use exit codes**: 0=success, 1=warning, 2=critical
4. **Idempotent**: Safe to run multiple times
5. **Document**: Comments and usage instructions
6. **Test**: Verify on staging before production
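
The first three practices can be combined in a small skeleton. This is a sketch under assumptions: the `log` and `exit_code_for` helpers and the log path are hypothetical, and `-u` and `-o pipefail` go beyond the `set -e` recommended above.

```bash
#!/bin/bash
# Sketch only: a hypothetical skeleton, not one of the real scripts.
# -e exits on error, -u rejects unset variables, pipefail fails a pipeline
# when any stage in it fails.
set -euo pipefail

LOG_FILE="${LOG_FILE:-/var/log/example-task.log}"

# Append a timestamped message to the log file and echo it to stdout
log() {
    local level="$1"
    shift
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" | tee -a "$LOG_FILE"
}

# Map warning/critical counts to the 0/1/2 exit-code convention
exit_code_for() {
    local warnings="$1" critical="$2"
    if [ "$critical" -gt 0 ]; then
        echo 2
    elif [ "$warnings" -gt 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Example:
# log INFO "Starting task"
# exit "$(exit_code_for "$WARNINGS" "$CRITICAL")"
```
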
### Secret Management
**Never hardcode secrets in scripts!**
```bash
# ❌ Bad
REDIS_PASSWORD="mysecret123"
# ✅ Good
REDIS_PASSWORD=$(docker exec redis cat /run/secrets/redis_password)
# ✅ Even better
REDIS_PASSWORD=$(cat /run/secrets/redis_password 2>/dev/null || echo "")
if [ -z "$REDIS_PASSWORD" ]; then
echo "Error: Redis password not found"
exit 1
fi
```
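
When several scripts need secrets, the pattern above can be wrapped in one function so that the missing-secret check is not repeated. A sketch, assuming secrets are readable files under `/run/secrets`; `read_secret` and the `SECRETS_DIR` override are hypothetical.

```bash
#!/bin/bash
# Sketch only: read_secret is a hypothetical helper.
# SECRETS_DIR defaults to /run/secrets, where Docker mounts secrets inside a container.
read_secret() {
    local name="$1"
    local path="${SECRETS_DIR:-/run/secrets}/$name"
    if [ ! -r "$path" ]; then
        echo "Error: secret '$name' not found at $path" >&2
        return 1
    fi
    cat "$path"
}

# Example:
# REDIS_PASSWORD=$(read_secret redis_password) || exit 1
```
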
### Error Handling
```bash
# Check command success
if ! docker service ls > /dev/null 2>&1; then
echo "Error: Cannot connect to Docker"
exit 2
fi
# Trap errors
trap 'echo "Script failed on line $LINENO"' ERR
# Verify prerequisites
for COMMAND in docker ssh s3cmd; do
if ! command -v $COMMAND &> /dev/null; then
echo "Error: $COMMAND not found"
exit 1
fi
done
```
---
## Troubleshooting
### Script Won't Execute
```bash
# Check permissions
ls -la /usr/local/bin/script.sh
# Should be: -rwxr-xr-x (executable)
# Fix permissions
sudo chmod +x /usr/local/bin/script.sh
# Check shebang
head -1 /usr/local/bin/script.sh
# Should be: #!/bin/bash
```
### Cron Job Not Running
```bash
# Check cron service
sudo systemctl status cron
# Check cron logs
sudo grep CRON /var/log/syslog | tail -20
# Test cron environment: add this temporary entry via `sudo crontab -e`
# * * * * * /usr/bin/env > /tmp/cron-env.txt
# Wait 1 minute, then inspect the result and remove the entry
cat /tmp/cron-env.txt
```
### SSH Issues in Scripts
```bash
# Add SSH keys to ssh-agent
eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa
# Disable strict host checking (only for internal network)
ssh -o StrictHostKeyChecking=no user@host "command"
# Use SSH config
cat >> ~/.ssh/config << EOF
Host worker-*
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
EOF
```
---
## Contributing
**When adding new automation:**
1. Place scripts in `automation/scripts/`
2. Document usage in header comments
3. Follow naming convention: `verb-noun.sh`
4. Test thoroughly on staging
5. Update this README with script description
6. Add to appropriate cron schedule if applicable
---
## Future Automation Ideas
**Not yet implemented, but good candidates:**
- [ ] Automatic SSL certificate monitoring (separate from Caddy)
- [ ] Database performance metrics collection
- [ ] Automated capacity planning reports
- [ ] Self-healing scripts (restart failed services)
- [ ] Traffic spike detection and auto-scaling
- [ ] Automated security vulnerability scanning
- [ ] Log aggregation and analysis
- [ ] Cost optimization recommendations
---
**Last Updated**: January 2025

**Maintained By**: Infrastructure Team

**Note**: Scripts in this directory are templates. Customize IP addresses, domains, and credentials for your specific environment before use.