# Production Infrastructure Setup Guide

**Audience**: DevOps Engineers, Infrastructure Team, Junior Engineers
**Purpose**: Complete step-by-step deployment of Maple Open Technologies production infrastructure from scratch
**Time to Complete**: 6-8 hours (first-time deployment)
**Prerequisites**: DigitalOcean account, basic Linux knowledge, SSH access

---

## Overview

This directory contains comprehensive guides for deploying Maple Open Technologies production infrastructure on DigitalOcean from a **completely fresh start**. Follow these guides in sequential order to build a complete, production-ready infrastructure.

**What you'll build:**
- Docker Swarm cluster (7+ nodes)
- High-availability databases (Cassandra 3-node cluster)
- Caching layer (Redis)
- Search engine (Meilisearch)
- Backend API (Go application)
- Frontend (React SPA)
- Automatic HTTPS with SSL certificates
- Multi-application architecture (MaplePress, MapleFile)

**Infrastructure at completion:**
```
Internet (HTTPS)
├─ getmaplepress.ca  → Backend API (worker-6)
└─ getmaplepress.com → Frontend (worker-7)
          ↓
Backend Services (mapleopentech-public-prod + mapleopentech-private-prod)
          ↓
Databases (mapleopentech-private-prod only)
├─ Cassandra: 3-node cluster (workers 2,3,4) - RF=3, QUORUM
├─ Redis: Single instance (worker-1/manager)
└─ Meilisearch: Single instance (worker-5)
          ↓
Object Storage: DigitalOcean Spaces (S3-compatible)
```

---

## Setup Guides (In Order)

### Phase 0: Planning & Prerequisites (30 minutes)

**[00-getting-started.md](00-getting-started.md)** - Local workspace setup
- DigitalOcean account setup
- API token configuration
- SSH key generation
- `.env` file initialization
- Command-line tools verification

**[00-network-architecture.md](00-network-architecture.md)** - Network design
- Network segmentation strategy (`mapleopentech-private-prod` vs `mapleopentech-public-prod`)
- Security principles (defense in depth)
- Service communication patterns
- Firewall rules overview


**[00-multi-app-architecture.md](00-multi-app-architecture.md)** - Multi-app strategy
- Naming conventions for services, stacks, hostnames
- Shared infrastructure design (Cassandra/Redis/Meilisearch)
- Application isolation patterns
- Scaling to multiple apps (MaplePress, MapleFile)

**Prerequisites checklist:**
- [ ] DigitalOcean account with billing enabled
- [ ] DigitalOcean API token (read + write permissions)
- [ ] SSH key pair generated (`~/.ssh/id_rsa.pub`)
- [ ] Domain names registered (e.g., `getmaplepress.ca`, `getmaplepress.com`)
- [ ] Local machine: git, ssh, curl installed
- [ ] `.env` file created from `.env.template`

**Total time: 30 minutes**

---

### Phase 1: Infrastructure Foundation (3-4 hours)

**[01_init_docker_swarm.md](01_init_docker_swarm.md)** - Docker Swarm cluster
- Create 7+ DigitalOcean droplets (Ubuntu 24.04)
- Install Docker on all nodes
- Initialize Docker Swarm (1 manager, 6+ workers)
- Configure private networking (VPC)
- Set up firewall rules
- Verify cluster connectivity

**What you'll have:**
- Manager node (worker-1): Swarm orchestration
- Worker nodes (2-7+): Application/database hosts
- Private network: 10.116.0.0/16
- All nodes communicating securely

**Total time: 1-1.5 hours**

---

**[02_cassandra.md](02_cassandra.md)** - Cassandra database cluster
- Deploy 3-node Cassandra cluster (workers 2, 3, 4)
- Configure replication (RF=3, QUORUM consistency)
- Create keyspace and initial schema
- Verify cluster health (`nodetool status`)
- Performance tuning for production

**What you'll have:**
- Highly available database cluster
- Automatic failover (survives 1 node failure)
- QUORUM reads/writes for consistency
- Ready for application data

**Total time: 1-1.5 hours**

---

**[03_redis.md](03_redis.md)** - Redis cache server
- Deploy Redis on manager node (worker-1)
- Configure persistence (RDB + AOF)
- Set up password authentication
- Test connectivity from other services

**What you'll have:**
- High-performance caching layer
- Session storage
- Rate limiting storage
- Persistent cache (survives restarts)

**Total time: 30 minutes**

---

**[04_meilisearch.md](04_meilisearch.md)** - Search engine
- Deploy Meilisearch on worker-5
- Configure API key authentication
- Create initial indexes
- Test search functionality

**What you'll have:**
- Fast full-text search engine
- Typo-tolerant search
- Faceted filtering
- Ready for content indexing

**Total time: 30 minutes**

---

**[04.5_spaces.md](04.5_spaces.md)** - Object storage
- Create DigitalOcean Spaces bucket
- Configure access keys
- Set up CORS policies
- Create Docker secrets for Spaces credentials
- Test upload/download

**What you'll have:**
- S3-compatible object storage
- Secure credential management
- Ready for file uploads
- CDN-backed storage

**Total time: 30 minutes**

---

### Phase 2: Application Deployment (2-3 hours)

**[05_maplepress_backend.md](05_maplepress_backend.md)** - Backend API deployment (Part 1)
- Create worker-6 droplet
- Join worker-6 to Docker Swarm
- Configure DNS (point domain to worker-6)
- Authenticate with DigitalOcean Container Registry
- Create Docker secrets (JWT, encryption keys)
- Deploy backend service (Go application)
- Connect to databases (Cassandra, Redis, Meilisearch)
- Verify health checks

**What you'll have:**
- Backend API running on worker-6
- Connected to all databases
- Docker secrets configured
- Health checks passing
- Ready for reverse proxy

**Total time: 1-1.5 hours**

---

**[06_maplepress_caddy.md](06_maplepress_caddy.md)** - Backend reverse proxy (Part 2)
- Configure Caddy reverse proxy
- Set up automatic SSL/TLS (Let's Encrypt)
- Configure security headers
- Enable HTTP to HTTPS redirect
- Preserve CORS headers for frontend
- Test SSL certificate acquisition

**What you'll have:**
- Backend accessible at `https://getmaplepress.ca`
- Automatic SSL certificate management
- Zero-downtime certificate renewals
- Security headers configured
- CORS configured for frontend

**Total time: 30 minutes**

---
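
The "Enable HTTP to HTTPS redirect" step from `06_maplepress_caddy.md` can be spot-checked from any machine once the backend proxy is up. The following is a minimal sketch, not part of the guides themselves: `is_https_redirect` is a hypothetical helper name, and it assumes the proxy answers plain HTTP with a 301, 302, 307 or 308 status and a `Location` header that starts with `https://`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guides): reads HTTP response headers on
# stdin and succeeds only when the status line is a redirect (301/302/307/308)
# and the Location header points at an https:// URL.
is_https_redirect() {
  awk '
    NR == 1                    { if ($2 ~ /^30[1278]/)     redirect = 1 }
    tolower($1) == "location:" { if ($2 ~ /^https:\/\//)   secure = 1 }
    END                        { exit !(redirect && secure) }
  '
}

# Usage against the live backend domain (requires network access):
#   curl -sI http://getmaplepress.ca | is_https_redirect && echo "redirect OK"
```

The function only inspects headers passed to it, so it can be tried against saved `curl -sI` output before DNS has propagated.

---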

**[07_maplepress_frontend.md](07_maplepress_frontend.md)** - Frontend deployment
- Create worker-7 droplet
- Join worker-7 to Docker Swarm
- Install Node.js on worker-7
- Clone repository and build React app
- Configure production environment (API URL)
- Deploy Caddy for static file serving
- Configure SPA routing
- Set up automatic SSL for frontend domain

**What you'll have:**
- Frontend accessible at `https://getmaplepress.com`
- React app built with production API URL
- Automatic HTTPS
- SPA routing working
- Static asset caching
- Complete end-to-end application

**Total time: 1 hour**

---

### Phase 3: Optional Enhancements (1 hour)

**[99_extra.md](99_extra.md)** - Extra operations
- Domain changes (backend and/or frontend)
- Horizontal scaling (multiple backend replicas)
- SSL certificate management
- Load balancing verification

**Total time: As needed**

---

## Quick Start (Experienced Engineers)

**If you're familiar with Docker Swarm and don't need detailed explanations:**

```bash
# 1. Prerequisites (5 min)
cd cloud/infrastructure/production
cp .env.template .env
vi .env  # Add DIGITALOCEAN_TOKEN
source .env

# 2. Infrastructure (1 hour)
# Follow 01_init_docker_swarm.md - create 7 droplets, init swarm
# SSH to manager, run quick verification

# 3. Databases (1 hour)
# Deploy Cassandra (02), Redis (03), Meilisearch (04), Spaces (04.5)
# Verify all services: docker service ls

# 4. Applications (1 hour)
# Deploy backend (05), backend-caddy (06), frontend (07)
# Test: curl https://getmaplepress.ca/health
#       curl https://getmaplepress.com

# 5. Verify (15 min)
docker service ls  # All services 1/1
docker node ls     # All nodes Ready
# Test in browser: https://getmaplepress.com
```

**Total time for experienced: ~3 hours**

---

## Directory Structure

```
setup/
├── README.md                      # This file
│
├── 00-getting-started.md          # Prerequisites & workspace setup
├── 00-network-architecture.md     # Network design principles
├── 00-multi-app-architecture.md   # Multi-app naming & strategy
│
├── 01_init_docker_swarm.md        # Docker Swarm cluster
├── 02_cassandra.md                # Cassandra database cluster
├── 03_redis.md                    # Redis cache server
├── 04_meilisearch.md              # Meilisearch search engine
├── 04.5_spaces.md                 # DigitalOcean Spaces (object storage)
│
├── 05_maplepress_backend.md       # Backend API deployment
├── 06_maplepress_caddy.md         # Backend reverse proxy (Caddy + SSL)
├── 07_maplepress_frontend.md      # Frontend deployment (React + Caddy)
│
├── 99_extra.md                    # Domain changes, scaling, extras
│
└── templates/                     # Configuration templates
    ├── cassandra-stack.yml.template
    ├── redis-stack.yml.template
    ├── backend-stack.yml.template
    └── Caddyfile.template
```

---

## Infrastructure Specifications

### Hardware Requirements

| Component | Droplet Size | vCPUs | RAM | Disk | Monthly Cost |
|-----------|--------------|-------|-----|------|--------------|
| Manager (worker-1) + Redis | Basic | 2 | 2 GB | 50 GB | $18 |
| Cassandra Node 1 (worker-2) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Cassandra Node 2 (worker-3) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Cassandra Node 3 (worker-4) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Meilisearch (worker-5) | Basic | 2 | 2 GB | 50 GB | $18 |
| Backend (worker-6) | Basic | 2 | 2 GB | 50 GB | $18 |
| Frontend (worker-7) | Basic | 1 | 1 GB | 25 GB | $6 |
| **Total** | - | **13** | **19 GB** | **415 GB** | **~$204/mo** |

**Additional costs:**
- DigitalOcean Spaces: $5/mo (250 GB storage + 1 TB transfer)
- Bandwidth: Included (1 TB per droplet)
- Backups (optional): +20% of droplet cost

**Total estimated: ~$210-250/month**

### Software Versions

| Software | Version | Notes |
|----------|---------|-------|
| Ubuntu | 24.04 LTS | Base OS |
| Docker | 27.x+ | Container runtime |
| Docker Swarm | Built-in | Orchestration |
| Cassandra | 4.1.x | Database |
| Redis | 7.x-alpine | Cache |
| Meilisearch | v1.5+ | Search |
| Caddy | 2-alpine | Reverse proxy |
| Go | 1.21+ | Backend runtime |
| Node.js | 20 LTS | Frontend build |

---

## Key Concepts

### Docker Swarm Architecture

**Manager node (worker-1):**
- Orchestrates all services
- Schedules tasks to workers
- Maintains cluster state
- Runs Redis (colocated)

**Worker nodes (2-7+):**
- Execute service tasks (containers)
- Report health to manager
- Isolated workloads via labels

**Node labels:**
- `backend=true`: Backend deployment target (worker-6)
- `maplepress-frontend=true`: Frontend target (worker-7)

### Network Architecture

**`mapleopentech-private-prod` (overlay network):**
- All databases (Cassandra, Redis, Meilisearch)
- Backend services (access to databases)
- **No internet access** (security)
- Internal-only communication

**`mapleopentech-public-prod` (overlay network):**
- Caddy reverse proxies
- Backend services (receive HTTP requests)
- Ports 80/443 exposed to internet

**Backends join BOTH networks:**
- Receive requests from Caddy (public network)
- Access databases (private network)

### Multi-Application Pattern

**Shared infrastructure (workers 1-5):**
- Cassandra, Redis, Meilisearch serve ALL apps
- Cost-efficient (one shared infrastructure serves every app)

**Per-application deployment (workers 6+):**
- Each app gets dedicated workers
- Independent scaling and deployment
- Clear isolation

**Example: Adding MapleFile**
- Worker-8: `maplefile_backend` + `maplefile_backend-caddy`
- Worker-9: `maplefile-frontend_caddy`
- Uses same Cassandra/Redis/Meilisearch
- No changes to infrastructure

---

## Common Commands Reference

### Swarm Management

```bash
# List all nodes
docker node ls

# List all services
docker service ls

# View service logs
docker service logs -f maplepress_backend

# Scale service
docker service scale maplepress_backend=3

# Update service (rolling restart)
docker service update --force maplepress_backend

# Remove service
docker service rm maplepress_backend
```

### Stack Management

```bash
# Deploy stack
docker stack deploy -c stack.yml stack-name

# List stacks
docker stack ls

# View stack services
docker stack services maplepress

# Remove stack
docker stack rm maplepress
```

### Troubleshooting

```bash
# Check service status
docker service ps maplepress_backend

# View container logs (run on the node hosting the container)
docker logs <container-id>

# Inspect service
docker service inspect maplepress_backend

# Check network
docker network inspect mapleopentech-private-prod

# List configs
docker config ls

# List secrets
docker secret ls
```

---

## Deployment Checklist

**Use this checklist to track your progress:**

### Phase 0: Prerequisites
- [ ] DigitalOcean account created
- [ ] API token generated and saved
- [ ] SSH keys generated (`ssh-keygen`)
- [ ] SSH key added to DigitalOcean
- [ ] Domain names registered
- [ ] `.env` file created from template
- [ ] `.env` file has correct permissions (600)
- [ ] Git repository cloned locally

### Phase 1: Infrastructure
- [ ] 7 droplets created (workers 1-7)
- [ ] Docker Swarm initialized
- [ ] All workers joined swarm
- [ ] Private networking configured (VPC)
- [ ] Firewall rules configured on all nodes
- [ ] Cassandra 3-node cluster deployed
- [ ] Cassandra cluster healthy (`nodetool status`)
- [ ] Redis deployed on manager
- [ ] Redis authentication configured
- [ ] Meilisearch deployed on worker-5
- [ ] Meilisearch API key configured
- [ ] DigitalOcean Spaces bucket created
- [ ] Spaces access keys stored as Docker secrets

### Phase 2: Applications
- [ ] Worker-6 created and joined swarm
- [ ] Worker-6 labeled for backend
- [ ] DNS pointing backend domain to worker-6
- [ ] Backend Docker secrets created (JWT, IP encryption)
- [ ] Backend service deployed
- [ ] Backend health check passing
- [ ] Backend Caddy
deployed
- [ ] Backend SSL certificate obtained
- [ ] Backend accessible at `https://domain.ca`
- [ ] Worker-7 created and joined swarm
- [ ] Worker-7 labeled for frontend
- [ ] DNS pointing frontend domain to worker-7
- [ ] Node.js installed on worker-7
- [ ] Repository cloned on worker-7
- [ ] Frontend built with production API URL
- [ ] Frontend Caddy deployed
- [ ] Frontend SSL certificate obtained
- [ ] Frontend accessible at `https://domain.com`
- [ ] CORS working (frontend can call backend)

### Phase 3: Verification
- [ ] All services show 1/1 replicas (`docker service ls`)
- [ ] All nodes show Ready (`docker node ls`)
- [ ] Backend health endpoint returns 200
- [ ] Frontend loads in browser
- [ ] Frontend can call backend API (no CORS errors)
- [ ] SSL certificates valid (green padlock)
- [ ] HTTP redirects to HTTPS

### Next Steps
- [ ] Set up monitoring (see `../operations/02_monitoring_alerting.md`)
- [ ] Configure backups (see `../operations/01_backup_recovery.md`)
- [ ] Review incident runbooks (see `../operations/03_incident_response.md`)

---

## Troubleshooting Guide

### Problem: Docker Swarm Join Fails

**Symptoms:** Worker can't join swarm, connection refused

**Check:**
```bash
# On manager, verify swarm is initialized
docker info | grep "Swarm: active"

# Verify firewall allows swarm ports
sudo ufw status | grep -E "2377|7946|4789"

# Get new join token
docker swarm join-token worker
```

### Problem: Service Won't Start

**Symptoms:** Service stuck at 0/1 replicas

**Check:**
```bash
# View service events
docker service ps service-name --no-trunc

# Common issues:
# - Image not found: Authenticate with registry
# - Network not found: Create network first
# - Secret not found: Create secrets
# - No suitable node: Check node labels
```

### Problem: DNS Not Resolving

**Symptoms:** Domain doesn't resolve to correct IP

**Check:**
```bash
# Test DNS resolution
dig yourdomain.com +short

# Should return worker IP
# If not, wait 5-60 minutes for propagation
# Or check DNS provider settings
```

### Problem: SSL Certificate Not Obtained

**Symptoms:** HTTPS not working, certificate errors

**Check:**
```bash
# Verify DNS points to correct server
dig yourdomain.com +short

# Verify port 80 accessible (Let's Encrypt challenge)
curl http://yourdomain.com

# Check Caddy logs
docker service logs service-name --tail 100 | grep -i certificate

# Common issues:
# - DNS not pointing to server
# - Port 80 blocked by firewall
# - Rate limited (Let's Encrypt: 5 certificates per exact set of hostnames per week)
```

### Problem: Services Can't Communicate

**Symptoms:** Backend can't reach database

**Check:**
```bash
# Verify both services on same network
docker service inspect backend --format '{{.Spec.TaskTemplate.Networks}}'
docker service inspect database --format '{{.Spec.TaskTemplate.Networks}}'

# Test DNS resolution from container
docker exec <container-id> nslookup database-hostname

# Verify firewall allows internal traffic
sudo ufw status | grep 10.116.0.0/16
```

---

## Getting Help

### Documentation Resources

**Within this repository:**
- This directory (`setup/`): Initial deployment guides
- `../operations/`: Day-to-day operational procedures
- `../reference/`: Architecture diagrams, capacity planning
- `../automation/`: Scripts for common tasks

**External resources:**
- Docker Swarm: https://docs.docker.com/engine/swarm/
- Cassandra: https://cassandra.apache.org/doc/latest/
- DigitalOcean: https://docs.digitalocean.com/
- Caddy: https://caddyserver.com/docs/

### Common Questions

**Q: Can I use a different cloud provider (AWS, GCP, Azure)?**
A: Yes, but you'll need to adapt the networking and object storage sections. The Docker Swarm and application deployment sections remain the same.

**Q: Can I deploy with fewer nodes?**
A: Minimum viable: 3 nodes (1 manager + 2 workers). Run Cassandra in single-node mode (not recommended for production) and colocate services on the same workers.

**Q: How do I add a new application (e.g., MapleFile)?**
A: Follow `00-multi-app-architecture.md`.
Add 2 workers (backend + frontend), deploy new stacks. Reuse existing databases.

**Q: What if I only have one domain?**
A: Use subdomains: `api.yourdomain.com` (backend), `app.yourdomain.com` (frontend). Update DNS and Caddyfiles accordingly.

---

## Security Best Practices

**Implemented by these guides:**
- ✅ Firewall configured (UFW) on all nodes
- ✅ SSH key-based authentication (no passwords)
- ✅ Docker secrets for sensitive values
- ✅ Network segmentation (private vs public)
- ✅ Automatic HTTPS with Let's Encrypt
- ✅ Security headers configured in Caddy
- ✅ Database authentication (Redis password, Meilisearch API key)
- ✅ Private Docker registry authentication

**Additional recommendations:**
- Rotate secrets quarterly (see `../operations/07_security_operations.md`)
- Enable 2FA on DigitalOcean account
- Regular security updates (Ubuntu unattended-upgrades)
- Monitor for unauthorized access attempts
- Backup encryption (GPG for backup files)

---

## Maintenance Schedule

**After deployment, establish these routines:**

**Daily:**
- Check service health (`docker service ls`)
- Review monitoring dashboards
- Check backup completion logs

**Weekly:**
- Review security logs
- Check disk space across all nodes
- Verify SSL certificate expiry dates

**Monthly:**
- Apply security updates (`apt update && apt upgrade`)
- Review capacity and performance metrics
- Test backup restore procedures
- Rotate non-critical secrets

**Quarterly:**
- Full disaster recovery drill
- Review and update documentation
- Capacity planning review
- Security audit

---

## What's Next?

**After completing setup:**

1. **Configure Operations** (`../operations/`)
   - Set up monitoring and alerting
   - Configure automated backups
   - Review incident response runbooks

2. **Optimize Performance**
   - Tune database settings
   - Configure caching strategies
   - Load test your infrastructure

3. **Add Redundancy**
   - Scale critical services
   - Set up failover procedures
   - Implement health checks

4. **Automate**
   - CI/CD pipeline for deployments
   - Automated testing
   - Infrastructure as Code (Terraform)

---

**Last Updated**: January 2025
**Maintained By**: Infrastructure Team
**Review Frequency**: Quarterly

**Feedback**: Found an issue or have a suggestion? Open an issue on Codeberg or contact the infrastructure team.

---

## Success! 🎉

If you've completed all guides in this directory, you now have:

- ✅ Production-ready infrastructure on DigitalOcean
- ✅ High-availability database cluster (Cassandra RF=3)
- ✅ Caching and search infrastructure (Redis, Meilisearch)
- ✅ Secure backend API with automatic HTTPS
- ✅ React frontend with automatic SSL
- ✅ Multi-application architecture ready to scale
- ✅ Network segmentation for security
- ✅ Docker Swarm orchestration

**Welcome to production operations!** 🚀

Now head to `../operations/` to learn how to run and maintain your infrastructure.
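
Before moving on, the checklist item "All services show 1/1 replicas" can be scripted instead of read by eye. This is a minimal sketch under stated assumptions, not something the guides ship: `under_replicated` is a hypothetical helper name, and it expects one `name replicas` pair per line, which is what `docker service ls --format '{{.Name}} {{.Replicas}}'` prints on the manager node.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guides): reads "name running/desired"
# pairs on stdin, e.g. "maplepress_backend 1/1", and prints the name of every
# service whose running task count differs from its desired count.
under_replicated() {
  awk '{
    split($2, r, "/")            # r[1] = running tasks, r[2] = desired tasks
    if (r[1] != r[2]) print $1
  }'
}

# Usage on the manager node (requires Docker):
#   docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated
```

Empty output means every listed service is at its desired replica count; any name printed is a candidate for `docker service ps <service-name> --no-trunc`.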