Production Infrastructure Setup Guide

Audience: DevOps Engineers, Infrastructure Team, Junior Engineers
Purpose: Complete step-by-step deployment of Maple Open Technologies production infrastructure from scratch
Time to Complete: 6-8 hours (first-time deployment)
Prerequisites: DigitalOcean account, basic Linux knowledge, SSH access


Overview

This directory contains comprehensive guides for deploying Maple Open Technologies production infrastructure on DigitalOcean from a completely fresh start. Follow these guides in sequential order to build a complete, production-ready infrastructure.

What you'll build:

  • Docker Swarm cluster (7+ nodes)
  • High-availability databases (Cassandra 3-node cluster)
  • Caching layer (Redis)
  • Search engine (Meilisearch)
  • Backend API (Go application)
  • Frontend (React SPA)
  • Automatic HTTPS with SSL certificates
  • Multi-application architecture (MaplePress, MapleFile)

Infrastructure at completion:

Internet (HTTPS)
  ├─ getmaplepress.ca → Backend API (worker-6)
  └─ getmaplepress.com → Frontend (worker-7)
          ↓
  Backend Services (mapleopentech-public-prod + mapleopentech-private-prod)
          ↓
  Databases (mapleopentech-private-prod only)
  ├─ Cassandra: 3-node cluster (workers 2,3,4) - RF=3, QUORUM
  ├─ Redis: Single instance (worker-1/manager)
  └─ Meilisearch: Single instance (worker-5)
          ↓
  Object Storage: DigitalOcean Spaces (S3-compatible)

Setup Guides (In Order)

Phase 0: Planning & Prerequisites (30 minutes)

00-getting-started.md - Local workspace setup

  • DigitalOcean account setup
  • API token configuration
  • SSH key generation
  • .env file initialization
  • Command-line tools verification

00-network-architecture.md - Network design

  • Network segmentation strategy (mapleopentech-private-prod vs mapleopentech-public-prod)
  • Security principles (defense in depth)
  • Service communication patterns
  • Firewall rules overview

00-multi-app-architecture.md - Multi-app strategy

  • Naming conventions for services, stacks, hostnames
  • Shared infrastructure design (Cassandra/Redis/Meilisearch)
  • Application isolation patterns
  • Scaling to multiple apps (MaplePress, MapleFile)

Prerequisites checklist:

  • DigitalOcean account with billing enabled
  • DigitalOcean API token (read + write permissions)
  • SSH key pair generated (~/.ssh/id_rsa.pub)
  • Domain names registered (e.g., getmaplepress.ca, getmaplepress.com)
  • Local machine: git, ssh, curl installed
  • .env file created from .env.template

Total time: 30 minutes
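
The local-machine items in the checklist can be verified with a short script. This is a minimal sketch, not part of the guides: the function names `missing_tools` and `check_workspace` are hypothetical, and it assumes you run it from the directory that holds `.env`.

```shell
# Print each named tool that is NOT on PATH; prints nothing when all exist.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools from the checklist (git, ssh, curl) and the .env file.
check_workspace() {
  local missing
  missing="$(missing_tools git ssh curl)"
  if [ -n "$missing" ]; then
    printf 'Missing tools: %s\n' "$(printf '%s' "$missing" | tr '\n' ' ')"
    return 1
  fi
  [ -f .env ] || { echo "No .env file; copy it from .env.template"; return 1; }
  echo "Workspace looks ready"
}
```

Running `check_workspace` before Phase 1 catches a missing tool or a forgotten `.env` early, before any droplets are created.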


Phase 1: Infrastructure Foundation (3-4 hours)

01_init_docker_swarm.md - Docker Swarm cluster

  • Create 7+ DigitalOcean droplets (Ubuntu 24.04)
  • Install Docker on all nodes
  • Initialize Docker Swarm (1 manager, 6+ workers)
  • Configure private networking (VPC)
  • Set up firewall rules
  • Verify cluster connectivity

What you'll have:

  • Manager node (worker-1): Swarm orchestration
  • Worker nodes (2-7+): Application/database hosts
  • Private network: 10.116.0.0/16
  • All nodes communicating securely

Total time: 1-1.5 hours
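
As a rough illustration of the firewall step, the sketch below prints (it does not run) UFW rules that admit Docker Swarm traffic from the private network only. The ports are the standard Swarm ports: 2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, and 4789/udp for overlay traffic. The function name `swarm_ufw_rules` is hypothetical, and the actual rules in 01_init_docker_swarm.md may differ.

```shell
# Print UFW rules allowing Swarm traffic from the given private CIDR.
swarm_ufw_rules() {
  local cidr="$1"
  printf 'ufw allow from %s to any port 2377 proto tcp\n' "$cidr"  # cluster management
  printf 'ufw allow from %s to any port 7946 proto tcp\n' "$cidr"  # node communication
  printf 'ufw allow from %s to any port 7946 proto udp\n' "$cidr"  # node communication
  printf 'ufw allow from %s to any port 4789 proto udp\n' "$cidr"  # overlay network traffic
}

# Review the rules for the private network used in this guide.
swarm_ufw_rules 10.116.0.0/16
```

Printing first keeps the step reviewable; once the output looks right, it can be piped to a root shell on each node.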


02_cassandra.md - Cassandra database cluster

  • Deploy 3-node Cassandra cluster (workers 2, 3, 4)
  • Configure replication (RF=3, QUORUM consistency)
  • Create keyspace and initial schema
  • Verify cluster health (nodetool status)
  • Performance tuning for production

What you'll have:

  • Highly available database cluster
  • Automatic failover (survives 1 node failure)
  • QUORUM reads/writes for consistency
  • Ready for application data

Total time: 1-1.5 hours
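
The "survives 1 node failure" claim follows from the quorum arithmetic: QUORUM needs floor(RF / 2) + 1 replicas to respond, so with RF=3 two replicas are enough. A small sketch of that calculation (the function names are illustrative only):

```shell
# Replicas that must respond for a QUORUM read or write.
quorum() { echo $(( $1 / 2 + 1 )); }

# Replicas that can be down while QUORUM operations still succeed.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

echo "RF=3: quorum=$(quorum 3), tolerated failures=$(tolerated_failures 3)"
# prints: RF=3: quorum=2, tolerated failures=1
```

The same arithmetic shows why RF=2 gains nothing for availability under QUORUM: both replicas must respond, so no failure is tolerated.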


03_redis.md - Redis cache server

  • Deploy Redis on manager node (worker-1)
  • Configure persistence (RDB + AOF)
  • Set up password authentication
  • Test connectivity from other services

What you'll have:

  • High-performance caching layer
  • Session storage
  • Rate limiting storage
  • Persistent cache (survives restarts)

Total time: 30 minutes


04_meilisearch.md - Search engine

  • Deploy Meilisearch on worker-5
  • Configure API key authentication
  • Create initial indexes
  • Test search functionality

What you'll have:

  • Fast full-text search engine
  • Typo-tolerant search
  • Faceted filtering
  • Ready for content indexing

Total time: 30 minutes


04.5_spaces.md - Object storage

  • Create DigitalOcean Spaces bucket
  • Configure access keys
  • Set up CORS policies
  • Create Docker secrets for Spaces credentials
  • Test upload/download

What you'll have:

  • S3-compatible object storage
  • Secure credential management
  • Ready for file uploads
  • CDN-backed storage

Total time: 30 minutes


Phase 2: Application Deployment (2-3 hours)

05_maplepress_backend.md - Backend API deployment (Part 1)

  • Create worker-6 droplet
  • Join worker-6 to Docker Swarm
  • Configure DNS (point domain to worker-6)
  • Authenticate with DigitalOcean Container Registry
  • Create Docker secrets (JWT, encryption keys)
  • Deploy backend service (Go application)
  • Connect to databases (Cassandra, Redis, Meilisearch)
  • Verify health checks

What you'll have:

  • Backend API running on worker-6
  • Connected to all databases
  • Docker secrets configured
  • Health checks passing
  • Ready for reverse proxy

Total time: 1-1.5 hours


06_maplepress_caddy.md - Backend reverse proxy (Part 2)

  • Configure Caddy reverse proxy
  • Set up automatic SSL/TLS (Let's Encrypt)
  • Configure security headers
  • Enable HTTP to HTTPS redirect
  • Preserve CORS headers for frontend
  • Test SSL certificate acquisition

What you'll have:

  • Backend accessible at https://getmaplepress.ca
  • Automatic SSL certificate management
  • Zero-downtime certificate renewals
  • Security headers configured
  • CORS configured for frontend

Total time: 30 minutes


07_maplepress_frontend.md - Frontend deployment

  • Create worker-7 droplet
  • Join worker-7 to Docker Swarm
  • Install Node.js on worker-7
  • Clone repository and build React app
  • Configure production environment (API URL)
  • Deploy Caddy for static file serving
  • Configure SPA routing
  • Set up automatic SSL for frontend domain

What you'll have:

  • Frontend accessible at https://getmaplepress.com
  • React app built with production API URL
  • Automatic HTTPS
  • SPA routing working
  • Static asset caching
  • Complete end-to-end application

Total time: 1 hour


Phase 3: Optional Enhancements (1 hour)

99_extra.md - Extra operations

  • Domain changes (backend and/or frontend)
  • Horizontal scaling (multiple backend replicas)
  • SSL certificate management
  • Load balancing verification

Total time: As needed


Quick Start (Experienced Engineers)

If you're familiar with Docker Swarm and don't need detailed explanations:

# 1. Prerequisites (5 min)
cd cloud/infrastructure/production
cp .env.template .env
vi .env  # Add DIGITALOCEAN_TOKEN
source .env

# 2. Infrastructure (1 hour)
# Follow 01_init_docker_swarm.md - create 7 droplets, init swarm
# SSH to manager, run quick verification

# 3. Databases (1 hour)
# Deploy Cassandra (02), Redis (03), Meilisearch (04), Spaces (04.5)
# Verify all services: docker service ls

# 4. Applications (1 hour)
# Deploy backend (05), backend-caddy (06), frontend (07)
# Test: curl https://getmaplepress.ca/health
#       curl https://getmaplepress.com

# 5. Verify (15 min)
docker service ls  # All services 1/1
docker node ls     # All nodes Ready
# Test in browser: https://getmaplepress.com

Total time for experienced engineers: ~3 hours


Directory Structure

setup/
├── README.md                          # This file
│
├── 00-getting-started.md             # Prerequisites & workspace setup
├── 00-network-architecture.md        # Network design principles
├── 00-multi-app-architecture.md      # Multi-app naming & strategy
│
├── 01_init_docker_swarm.md           # Docker Swarm cluster
├── 02_cassandra.md                   # Cassandra database cluster
├── 03_redis.md                       # Redis cache server
├── 04_meilisearch.md                 # Meilisearch search engine
├── 04.5_spaces.md                    # DigitalOcean Spaces (object storage)
│
├── 05_maplepress_backend.md          # Backend API deployment
├── 06_maplepress_caddy.md            # Backend reverse proxy (Caddy + SSL)
├── 07_maplepress_frontend.md         # Frontend deployment (React + Caddy)
│
├── 99_extra.md                       # Domain changes, scaling, extras
│
└── templates/                        # Configuration templates
    ├── cassandra-stack.yml.template
    ├── redis-stack.yml.template
    ├── backend-stack.yml.template
    └── Caddyfile.template

Infrastructure Specifications

Hardware Requirements

| Component | Droplet Size | vCPUs | RAM | Disk | Monthly Cost |
|---|---|---|---|---|---|
| Manager (worker-1) + Redis | Basic | 2 | 2 GB | 50 GB | $18 |
| Cassandra Node 1 (worker-2) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Cassandra Node 2 (worker-3) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Cassandra Node 3 (worker-4) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Meilisearch (worker-5) | Basic | 2 | 2 GB | 50 GB | $18 |
| Backend (worker-6) | Basic | 2 | 2 GB | 50 GB | $18 |
| Frontend (worker-7) | Basic | 1 | 1 GB | 25 GB | $6 |
| Total | - | 13 | 19 GB | 415 GB | ~$204/mo |

Additional costs:

  • DigitalOcean Spaces: $5/mo (250 GB storage + 1 TB transfer)
  • Bandwidth: Included (1 TB per droplet)
  • Backups (optional): +20% of droplet cost

Total estimated: ~$210-250/month
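
The estimate range can be reproduced from the figures above: seven droplets at $204 in total, $5 for Spaces, and optional backups at 20% of droplet cost. The helper below is a sketch for redoing the sum when droplet sizes change; `estimate_monthly` is a hypothetical name, and the prices are the ones quoted in this section.

```shell
# Usage: estimate_monthly BACKUP_PERCENT SPACES_FEE DROPLET_PRICE...
# Sums droplet prices, adds the backup surcharge (percent of droplet cost)
# and the flat Spaces fee, and prints the monthly total in USD.
estimate_monthly() {
  local backup_pct="$1" spaces_fee="$2"
  shift 2
  printf '%s\n' "$@" | awk -v pct="$backup_pct" -v spaces="$spaces_fee" '
    { droplets += $1 }
    END { printf "%.2f\n", droplets + droplets * pct / 100 + spaces }'
}

estimate_monthly 0  5 18 48 48 48 18 18 6   # without backups: prints 209.00
estimate_monthly 20 5 18 48 48 48 18 18 6   # with backups:    prints 249.80
```

The two results correspond to the low and high ends of the ~$210-250/month range.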

Software Versions

| Software | Version | Notes |
|---|---|---|
| Ubuntu | 24.04 LTS | Base OS |
| Docker | 27.x+ | Container runtime |
| Docker Swarm | Built-in | Orchestration |
| Cassandra | 4.1.x | Database |
| Redis | 7.x-alpine | Cache |
| Meilisearch | v1.5+ | Search |
| Caddy | 2-alpine | Reverse proxy |
| Go | 1.21+ | Backend runtime |
| Node.js | 20 LTS | Frontend build |

Key Concepts

Docker Swarm Architecture

Manager node (worker-1):

  • Orchestrates all services
  • Schedules tasks to workers
  • Maintains cluster state
  • Runs Redis (colocated)

Worker nodes (2-7+):

  • Execute service tasks (containers)
  • Report health to manager
  • Isolated workloads via labels

Node labels:

  • backend=true: Backend deployment target (worker-6)
  • maplepress-frontend=true: Frontend target (worker-7)
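
Labels steer workloads in two steps: the node is labeled once, and each service then carries a placement constraint that matches the label. The sketch below only prints the relevant strings. The helper names are hypothetical, and it assumes the Swarm hostname of the node is literally `worker-6`, which may not match your droplet names.

```shell
# Print the command that attaches a label to a node (run it on the manager).
label_node_cmd() {
  printf 'docker node update --label-add %s=%s %s\n' "$1" "$2" "$3"
}

# Print the placement constraint a service would use to target that label.
label_constraint() {
  printf 'node.labels.%s==%s\n' "$1" "$2"
}

label_node_cmd backend true worker-6
label_constraint backend true
```

The constraint string is what goes after `--constraint` in `docker service create`, or under `deploy.placement.constraints` in a stack file.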

Network Architecture

mapleopentech-private-prod (overlay network):

  • All databases (Cassandra, Redis, Meilisearch)
  • Backend services (access to databases)
  • No internet access (security)
  • Internal-only communication

mapleopentech-public-prod (overlay network):

  • Caddy reverse proxies
  • Backend services (receive HTTP requests)
  • Ports 80/443 exposed to internet

Backends join BOTH networks:

  • Receive requests from Caddy (public network)
  • Access databases (private network)
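
One possible way to create the two overlay networks is sketched below. It is a dry run by default and only echoes the commands. The `--internal` flag (which restricts external access) and `--attachable` are assumptions about how the design above could be realized; 00-network-architecture.md remains the authority on the actual flags.

```shell
# Echo commands by default; set DRY_RUN=0 to execute them on a Swarm manager.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

create_overlay_networks() {
  # Assumption: --internal is used to keep the private network off the internet.
  run docker network create --driver overlay --attachable --internal mapleopentech-private-prod
  run docker network create --driver overlay --attachable mapleopentech-public-prod
}

create_overlay_networks
```

A backend service would then list both networks in its stack definition, while databases list only the private one.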

Multi-Application Pattern

Shared infrastructure (workers 1-5):

  • Cassandra, Redis, Meilisearch serve ALL apps
  • Cost-efficient (1 infrastructure for unlimited apps)

Per-application deployment (workers 6+):

  • Each app gets dedicated workers
  • Independent scaling and deployment
  • Clear isolation

Example: Adding MapleFile

  • Worker-8: maplefile_backend + maplefile_backend-caddy
  • Worker-9: maplefile-frontend_caddy
  • Uses same Cassandra/Redis/Meilisearch
  • No changes to infrastructure

Common Commands Reference

Swarm Management

# List all nodes
docker node ls

# List all services
docker service ls

# View service logs
docker service logs -f maplepress_backend

# Scale service
docker service scale maplepress_backend=3

# Update service (rolling restart)
docker service update --force maplepress_backend

# Remove service
docker service rm maplepress_backend

Stack Management

# Deploy stack
docker stack deploy -c stack.yml stack-name

# List stacks
docker stack ls

# View stack services
docker stack services maplepress

# Remove stack
docker stack rm maplepress

Troubleshooting

# Check service status
docker service ps maplepress_backend

# View container logs
docker logs <container-id>

# Inspect service
docker service inspect maplepress_backend

# Check network
docker network inspect mapleopentech-private-prod

# List configs
docker config ls

# List secrets
docker secret ls

Deployment Checklist

Use this checklist to track your progress:

Phase 0: Prerequisites

  • DigitalOcean account created
  • API token generated and saved
  • SSH keys generated (ssh-keygen)
  • SSH key added to DigitalOcean
  • Domain names registered
  • .env file created from template
  • .env file has correct permissions (600)
  • Git repository cloned locally

Phase 1: Infrastructure

  • 7 droplets created (workers 1-7)
  • Docker Swarm initialized
  • All workers joined swarm
  • Private networking configured (VPC)
  • Firewall rules configured on all nodes
  • Cassandra 3-node cluster deployed
  • Cassandra cluster healthy (nodetool status)
  • Redis deployed on manager
  • Redis authentication configured
  • Meilisearch deployed on worker-5
  • Meilisearch API key configured
  • DigitalOcean Spaces bucket created
  • Spaces access keys stored as Docker secrets

Phase 2: Applications

  • Worker-6 created and joined swarm
  • Worker-6 labeled for backend
  • DNS pointing backend domain to worker-6
  • Backend Docker secrets created (JWT, IP encryption)
  • Backend service deployed
  • Backend health check passing
  • Backend Caddy deployed
  • Backend SSL certificate obtained
  • Backend accessible at https://domain.ca
  • Worker-7 created and joined swarm
  • Worker-7 labeled for frontend
  • DNS pointing frontend domain to worker-7
  • Node.js installed on worker-7
  • Repository cloned on worker-7
  • Frontend built with production API URL
  • Frontend Caddy deployed
  • Frontend SSL certificate obtained
  • Frontend accessible at https://domain.com
  • CORS working (frontend can call backend)

Phase 3: Verification

  • All services show 1/1 replicas (docker service ls)
  • All nodes show Ready (docker node ls)
  • Backend health endpoint returns 200
  • Frontend loads in browser
  • Frontend can call backend API (no CORS errors)
  • SSL certificates valid (green padlock)
  • HTTP redirects to HTTPS
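
The replica check in this list can be scripted rather than read by eye. The filter below is a sketch: it reads `name running/desired` lines on stdin and prints only the services that are below their target. `degraded_services` is a hypothetical name.

```shell
# Read "NAME RUNNING/DESIRED" lines on stdin; print services below target.
# Exit status is non-zero when at least one service is degraded.
degraded_services() {
  awk '{
    split($2, r, "/")
    if (r[1] + 0 < r[2] + 0) { print $1 " " $2; bad = 1 }
  } END { exit bad + 0 }'
}

# Usage on a manager node:
#   docker service ls --format '{{.Name}} {{.Replicas}}' | degraded_services
```

No output and a zero exit status mean every service has all of its replicas running.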

Next Steps

  • Set up monitoring (see ../operations/02_monitoring_alerting.md)
  • Configure backups (see ../operations/01_backup_recovery.md)
  • Review incident runbooks (see ../operations/03_incident_response.md)

Troubleshooting Guide

Problem: Docker Swarm Join Fails

Symptoms: Worker can't join swarm, connection refused

Check:

# On manager, verify swarm is initialized
docker info | grep "Swarm: active"

# Verify firewall allows swarm ports
sudo ufw status | grep -E "2377|7946|4789"

# Get new join token
docker swarm join-token worker

Problem: Service Won't Start

Symptoms: Service stuck at 0/1 replicas

Check:

# View service events
docker service ps service-name --no-trunc

# Common issues:
# - Image not found: Authenticate with registry
# - Network not found: Create network first
# - Secret not found: Create secrets
# - No suitable node: Check node labels
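
The common issues above can be matched against the task error text. This is a sketch only: the substrings are assumptions about how Docker words these errors and are not exhaustive, and `diagnose_task_error` is a hypothetical helper.

```shell
# Map a task error message to the likely fix from the list of common issues.
diagnose_task_error() {
  case "$1" in
    *"No such image"*)     echo "Image not found: authenticate with the registry" ;;
    *network*"not found"*) echo "Network not found: create the network first" ;;
    *secret*"not found"*)  echo "Secret not found: create the secret" ;;
    *"no suitable node"*)  echo "No suitable node: check node labels" ;;
    *)                     echo "Unrecognized error: read the full task output" ;;
  esac
}

# Usage on a manager node:
#   docker service ps service-name --no-trunc --format '{{.Error}}' |
#     while read -r err; do [ -n "$err" ] && diagnose_task_error "$err"; done
```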

Problem: DNS Not Resolving

Symptoms: Domain doesn't resolve to correct IP

Check:

# Test DNS resolution
dig yourdomain.com +short

# Should return worker IP
# If not, wait 5-60 minutes for propagation
# Or check DNS provider settings
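
To turn the manual comparison into a pass/fail check, the resolver output can be matched against the expected worker IP. A minimal sketch follows; `resolves_to` is a hypothetical helper and 203.0.113.10 is a placeholder address, not a real worker IP.

```shell
# Succeed when the expected IP appears as a whole line in the answers on stdin.
resolves_to() {
  grep -Fxq -- "$1"
}

# Usage:
#   dig yourdomain.com +short | resolves_to 203.0.113.10 \
#     && echo "DNS OK" || echo "DNS not pointing at the worker yet"
```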

Problem: SSL Certificate Not Obtained

Symptoms: HTTPS not working, certificate errors

Check:

# Verify DNS points to correct server
dig yourdomain.com +short

# Verify port 80 accessible (Let's Encrypt challenge)
curl http://yourdomain.com

# Check Caddy logs
docker service logs service-name --tail 100 | grep -i certificate

# Common issues:
# - DNS not pointing to server
# - Port 80 blocked by firewall
# - Rate limited (Let's Encrypt: 5 certs per exact set of hostnames per week)
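
Once a certificate has been issued, its remaining lifetime can be checked from the `notAfter=` line that `openssl x509 -enddate` prints. The helper below is a sketch that relies on GNU `date -d`, so it targets Linux hosts such as the Ubuntu droplets; `days_until_expiry` is a hypothetical name.

```shell
# Print whole days until the given notAfter date (negative if already expired).
# Accepts the value with or without the leading "notAfter=".
days_until_expiry() {
  local not_after="${1#notAfter=}"
  local expiry now
  expiry="$(date -u -d "$not_after" +%s)" || return 1
  now="$(date -u +%s)"
  echo $(( (expiry - now) / 86400 ))
}

# Usage:
#   enddate="$(openssl s_client -connect yourdomain.com:443 \
#     -servername yourdomain.com </dev/null 2>/dev/null |
#     openssl x509 -noout -enddate)"
#   days_until_expiry "$enddate"
```

A value that keeps shrinking toward zero suggests automatic renewal is not working and the Caddy logs need a look.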

Problem: Services Can't Communicate

Symptoms: Backend can't reach database

Check:

# Verify both services on same network
docker service inspect backend --format '{{.Spec.TaskTemplate.Networks}}'
docker service inspect database --format '{{.Spec.TaskTemplate.Networks}}'

# Test DNS resolution from container
docker exec <container> nslookup database-hostname

# Verify firewall allows internal traffic
sudo ufw status | grep 10.116.0.0/16

Getting Help

Documentation Resources

Within this repository:

  • This directory (setup/): Initial deployment guides
  • ../operations/: Day-to-day operational procedures
  • ../reference/: Architecture diagrams, capacity planning
  • ../automation/: Scripts for common tasks

Common Questions

Q: Can I use a different cloud provider (AWS, GCP, Azure)?
A: Yes, but you'll need to adapt the networking and object storage sections. The Docker Swarm and application deployment sections remain the same.

Q: Can I deploy with fewer nodes?
A: Yes. The minimum viable setup is 3 nodes (1 manager + 2 workers): run Cassandra in single-node mode (not recommended for production) and colocate services on the same workers.

Q: How do I add a new application (e.g., MapleFile)?
A: Follow 00-multi-app-architecture.md. Add 2 workers (backend + frontend), deploy new stacks, and reuse the existing databases.

Q: What if I only have one domain?
A: Use subdomains: api.yourdomain.com (backend), app.yourdomain.com (frontend). Update DNS and Caddyfiles accordingly.


Security Best Practices

Implemented by these guides:

  • Firewall configured (UFW) on all nodes
  • SSH key-based authentication (no passwords)
  • Docker secrets for sensitive values
  • Network segmentation (private vs public)
  • Automatic HTTPS with Let's Encrypt
  • Security headers configured in Caddy
  • Database authentication (Redis password, Meilisearch API key)
  • Private Docker registry authentication

Additional recommendations:

  • Rotate secrets quarterly (see ../operations/07_security_operations.md)
  • Enable 2FA on DigitalOcean account
  • Regular security updates (Ubuntu unattended-upgrades)
  • Monitor for unauthorized access attempts
  • Backup encryption (GPG for backup files)

Maintenance Schedule

After deployment, establish these routines:

Daily:

  • Check service health (docker service ls)
  • Review monitoring dashboards
  • Check backup completion logs

Weekly:

  • Review security logs
  • Check disk space across all nodes
  • Verify SSL certificate expiry dates

Monthly:

  • Apply security updates (apt update && apt upgrade)
  • Review capacity and performance metrics
  • Test backup restore procedures
  • Rotate non-critical secrets

Quarterly:

  • Full disaster recovery drill
  • Review and update documentation
  • Capacity planning review
  • Security audit

What's Next?

After completing setup:

  1. Configure Operations (../operations/)

    • Set up monitoring and alerting
    • Configure automated backups
    • Review incident response runbooks
  2. Optimize Performance

    • Tune database settings
    • Configure caching strategies
    • Load test your infrastructure
  3. Add Redundancy

    • Scale critical services
    • Set up failover procedures
    • Implement health checks
  4. Automate

    • CI/CD pipeline for deployments
    • Automated testing
    • Infrastructure as Code (Terraform)

Last Updated: January 2025
Maintained By: Infrastructure Team
Review Frequency: Quarterly

Feedback: Found an issue or have a suggestion? Open an issue on Codeberg or contact the infrastructure team.


Success! 🎉

If you've completed all guides in this directory, you now have:

  • Production-ready infrastructure on DigitalOcean
  • High-availability database cluster (Cassandra RF=3)
  • Caching and search infrastructure (Redis, Meilisearch)
  • Secure backend API with automatic HTTPS
  • React frontend with automatic SSL
  • Multi-application architecture ready to scale
  • Network segmentation for security
  • Docker Swarm orchestration

Welcome to production operations! 🚀

Now head to ../operations/ to learn how to run and maintain your infrastructure.