Initial commit: Open sourcing all of the Maple Open Technologies code.

Bartlomiej Mika 2025-12-02 14:33:08 -05:00
commit 755d54a99d
2010 changed files with 448675 additions and 0 deletions


@ -0,0 +1,745 @@
# Production Infrastructure Setup Guide
**Audience**: DevOps Engineers, Infrastructure Team, Junior Engineers
**Purpose**: Complete step-by-step deployment of Maple Open Technologies production infrastructure from scratch
**Time to Complete**: 6-8 hours (first-time deployment)
**Prerequisites**: DigitalOcean account, basic Linux knowledge, SSH access
---
## Overview
This directory contains comprehensive guides for deploying Maple Open Technologies production infrastructure on DigitalOcean from a **completely fresh start**. Follow these guides in sequential order to build a complete, production-ready infrastructure.
**What you'll build:**
- Docker Swarm cluster (7+ nodes)
- High-availability databases (Cassandra 3-node cluster)
- Caching layer (Redis)
- Search engine (Meilisearch)
- Backend API (Go application)
- Frontend (React SPA)
- Automatic HTTPS with SSL certificates
- Multi-application architecture (MaplePress, MapleFile)
**Infrastructure at completion:**
```
Internet (HTTPS)
├─ getmaplepress.ca → Backend API (worker-6)
└─ getmaplepress.com → Frontend (worker-7)
Backend Services (maple-public-prod + maple-private-prod)
Databases (maple-private-prod only)
├─ Cassandra: 3-node cluster (workers 2,3,4) - RF=3, QUORUM
├─ Redis: Single instance (worker-1/manager)
└─ Meilisearch: Single instance (worker-5)
Object Storage: DigitalOcean Spaces (S3-compatible)
```
---
## Setup Guides (In Order)
### Phase 0: Planning & Prerequisites (30 minutes)
**[00-getting-started.md](00-getting-started.md)** - Local workspace setup
- DigitalOcean account setup
- API token configuration
- SSH key generation
- `.env` file initialization
- Command-line tools verification
**[00-network-architecture.md](00-network-architecture.md)** - Network design
- Network segmentation strategy (`maple-private-prod` vs `maple-public-prod`)
- Security principles (defense in depth)
- Service communication patterns
- Firewall rules overview
**[00-multi-app-architecture.md](00-multi-app-architecture.md)** - Multi-app strategy
- Naming conventions for services, stacks, hostnames
- Shared infrastructure design (Cassandra/Redis/Meilisearch)
- Application isolation patterns
- Scaling to multiple apps (MaplePress, MapleFile)
**Prerequisites checklist:**
- [ ] DigitalOcean account with billing enabled
- [ ] DigitalOcean API token (read + write permissions)
- [ ] SSH key pair generated (`~/.ssh/id_rsa.pub`)
- [ ] Domain names registered (e.g., `getmaplepress.ca`, `getmaplepress.com`)
- [ ] Local machine: git, ssh, curl installed
- [ ] `.env` file created from `.env.template`
**Total time: 30 minutes**
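The tool and `.env` items in this checklist can be verified from a shell. The following is a minimal sketch, not part of the original guides: `missing_tools` and `env_file_is_private` are hypothetical helper names, and the mode check assumes the 600 permissions listed in the deployment checklist.

```bash
#!/usr/bin/env bash
# Print each tool from the argument list that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Return 0 only if the file exists and has mode 600 (owner read/write only).
env_file_is_private() {
  local file="$1" mode
  [ -f "$file" ] || return 1
  # GNU stat uses -c, BSD/macOS stat uses -f; try both.
  mode=$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file")
  [ "$mode" = "600" ]
}
```

Typical usage would be `missing_tools git ssh curl` (no output means everything is installed) and `env_file_is_private .env || chmod 600 .env`.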
---
### Phase 1: Infrastructure Foundation (3-4 hours)
**[01_init_docker_swarm.md](01_init_docker_swarm.md)** - Docker Swarm cluster
- Create 7+ DigitalOcean droplets (Ubuntu 24.04)
- Install Docker on all nodes
- Initialize Docker Swarm (1 manager, 6+ workers)
- Configure private networking (VPC)
- Set up firewall rules
- Verify cluster connectivity
**What you'll have:**
- Manager node (worker-1): Swarm orchestration
- Worker nodes (2-7+): Application/database hosts
- Private network: 10.116.0.0/16
- All nodes communicating securely
**Total time: 1-1.5 hours**
---
**[02_cassandra.md](02_cassandra.md)** - Cassandra database cluster
- Deploy 3-node Cassandra cluster (workers 2, 3, 4)
- Configure replication (RF=3, QUORUM consistency)
- Create keyspace and initial schema
- Verify cluster health (`nodetool status`)
- Performance tuning for production
**What you'll have:**
- Highly available database cluster
- Automatic failover (survives 1 node failure)
- QUORUM reads/writes for consistency
- Ready for application data
**Total time: 1-1.5 hours**
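The failover claim follows from quorum arithmetic: a QUORUM operation needs floor(RF / 2) + 1 replicas to respond, so RF=3 needs 2 replicas and tolerates 1 unavailable node. A small sketch of that calculation (the function names are illustrative only):

```bash
#!/usr/bin/env bash
# Number of replicas that must respond for a QUORUM read or write.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of replica nodes that can be down while QUORUM still succeeds.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

echo "RF=3: quorum=$(quorum 3), tolerated failures=$(tolerated_failures 3)"
# prints: RF=3: quorum=2, tolerated failures=1
```

The same functions show why a larger cluster only tolerates more failures if the replication factor is raised as well: the tolerance depends on RF, not on node count.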
---
**[03_redis.md](03_redis.md)** - Redis cache server
- Deploy Redis on manager node (worker-1)
- Configure persistence (RDB + AOF)
- Set up password authentication
- Test connectivity from other services
**What you'll have:**
- High-performance caching layer
- Session storage
- Rate limiting storage
- Persistent cache (survives restarts)
**Total time: 30 minutes**
---
**[04_meilisearch.md](04_meilisearch.md)** - Search engine
- Deploy Meilisearch on worker-5
- Configure API key authentication
- Create initial indexes
- Test search functionality
**What you'll have:**
- Fast full-text search engine
- Typo-tolerant search
- Faceted filtering
- Ready for content indexing
**Total time: 30 minutes**
---
**[04.5_spaces.md](04.5_spaces.md)** - Object storage
- Create DigitalOcean Spaces bucket
- Configure access keys
- Set up CORS policies
- Create Docker secrets for Spaces credentials
- Test upload/download
**What you'll have:**
- S3-compatible object storage
- Secure credential management
- Ready for file uploads
- CDN-backed storage
**Total time: 30 minutes**
---
### Phase 2: Application Deployment (2-3 hours)
**[05_maplepress_backend.md](05_maplepress_backend.md)** - Backend API deployment (Part 1)
- Create worker-6 droplet
- Join worker-6 to Docker Swarm
- Configure DNS (point domain to worker-6)
- Authenticate with DigitalOcean Container Registry
- Create Docker secrets (JWT, encryption keys)
- Deploy backend service (Go application)
- Connect to databases (Cassandra, Redis, Meilisearch)
- Verify health checks
**What you'll have:**
- Backend API running on worker-6
- Connected to all databases
- Docker secrets configured
- Health checks passing
- Ready for reverse proxy
**Total time: 1-1.5 hours**
---
**[06_maplepress_caddy.md](06_maplepress_caddy.md)** - Backend reverse proxy (Part 2)
- Configure Caddy reverse proxy
- Set up automatic SSL/TLS (Let's Encrypt)
- Configure security headers
- Enable HTTP to HTTPS redirect
- Preserve CORS headers for frontend
- Test SSL certificate acquisition
**What you'll have:**
- Backend accessible at `https://getmaplepress.ca`
- Automatic SSL certificate management
- Zero-downtime certificate renewals
- Security headers configured
- CORS configured for frontend
**Total time: 30 minutes**
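For orientation, a Caddyfile covering these points can be quite short, because Caddy obtains certificates and redirects HTTP to HTTPS automatically when the site address is a hostname. The sketch below generates one from a shell function. It is an illustration, not the guide's actual `Caddyfile.template`: `render_caddyfile` is a hypothetical helper, the upstream address is an assumption, and the header values are common defaults rather than the project's settings.

```bash
#!/usr/bin/env bash
# Print a minimal Caddyfile for one site.
#   $1 = public domain (Caddy manages its certificate automatically)
#   $2 = upstream address of the backend (assumed; use your service name and port)
render_caddyfile() {
  local domain="$1" upstream="$2"
  cat <<EOF
$domain {
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
    }
    reverse_proxy $upstream
}
EOF
}
```

For example, `render_caddyfile getmaplepress.ca maplepress_backend:8000 > Caddyfile`, where port 8000 is a placeholder. `reverse_proxy` passes the backend's response headers through, so CORS headers set by the backend reach the browser unchanged.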
---
**[07_maplepress_frontend.md](07_maplepress_frontend.md)** - Frontend deployment
- Create worker-7 droplet
- Join worker-7 to Docker Swarm
- Install Node.js on worker-7
- Clone repository and build React app
- Configure production environment (API URL)
- Deploy Caddy for static file serving
- Configure SPA routing
- Set up automatic SSL for frontend domain
**What you'll have:**
- Frontend accessible at `https://getmaplepress.com`
- React app built with production API URL
- Automatic HTTPS
- SPA routing working
- Static asset caching
- Complete end-to-end application
**Total time: 1 hour**
---
### Phase 3: Optional Enhancements (1 hour)
**[99_extra.md](99_extra.md)** - Extra operations
- Domain changes (backend and/or frontend)
- Horizontal scaling (multiple backend replicas)
- SSL certificate management
- Load balancing verification
**Total time: As needed**
---
## Quick Start (Experienced Engineers)
**If you're familiar with Docker Swarm and don't need detailed explanations:**
```bash
# 1. Prerequisites (5 min)
cd cloud/infrastructure/production
cp .env.template .env
vi .env # Add DIGITALOCEAN_TOKEN
source .env
# 2. Infrastructure (1 hour)
# Follow 01_init_docker_swarm.md - create 7 droplets, init swarm
# SSH to manager, run quick verification
# 3. Databases (1 hour)
# Deploy Cassandra (02), Redis (03), Meilisearch (04), Spaces (04.5)
# Verify all services: docker service ls
# 4. Applications (1 hour)
# Deploy backend (05), backend-caddy (06), frontend (07)
# Test: curl https://getmaplepress.ca/health
# curl https://getmaplepress.com
# 5. Verify (15 min)
docker service ls # All services 1/1
docker node ls # All nodes Ready
# Test in browser: https://getmaplepress.com
```
**Total time for experienced: ~3 hours**
---
## Directory Structure
```
setup/
├── README.md # This file
├── 00-getting-started.md # Prerequisites & workspace setup
├── 00-network-architecture.md # Network design principles
├── 00-multi-app-architecture.md # Multi-app naming & strategy
├── 01_init_docker_swarm.md # Docker Swarm cluster
├── 02_cassandra.md # Cassandra database cluster
├── 03_redis.md # Redis cache server
├── 04_meilisearch.md # Meilisearch search engine
├── 04.5_spaces.md # DigitalOcean Spaces (object storage)
├── 05_maplepress_backend.md # Backend API deployment
├── 06_maplepress_caddy.md # Backend reverse proxy (Caddy + SSL)
├── 07_maplepress_frontend.md # Frontend deployment (React + Caddy)
├── 99_extra.md # Domain changes, scaling, extras
└── templates/ # Configuration templates
    ├── cassandra-stack.yml.template
    ├── redis-stack.yml.template
    ├── backend-stack.yml.template
    └── Caddyfile.template
```
---
## Infrastructure Specifications
### Hardware Requirements
| Component | Droplet Size | vCPUs | RAM | Disk | Monthly Cost |
|-----------|--------------|-------|-----|------|--------------|
| Manager (worker-1) + Redis | Basic | 2 | 2 GB | 50 GB | $18 |
| Cassandra Node 1 (worker-2) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Cassandra Node 2 (worker-3) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Cassandra Node 3 (worker-4) | General Purpose | 2 | 4 GB | 80 GB | $48 |
| Meilisearch (worker-5) | Basic | 2 | 2 GB | 50 GB | $18 |
| Backend (worker-6) | Basic | 2 | 2 GB | 50 GB | $18 |
| Frontend (worker-7) | Basic | 1 | 1 GB | 25 GB | $6 |
| **Total** | - | **13** | **19 GB** | **415 GB** | **~$204/mo** |
**Additional costs:**
- DigitalOcean Spaces: $5/mo (250 GB storage + 1 TB transfer)
- Bandwidth: Included (1 TB per droplet)
- Backups (optional): +20% of droplet cost
**Total estimated: ~$210-250/month**
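The estimate can be reproduced from the table: the lower bound is droplets plus Spaces, and the upper bound adds the optional 20% backup surcharge on droplet cost. A quick sketch of the arithmetic (`sum_prices` is an illustrative helper name):

```bash
#!/usr/bin/env bash
# Sum a list of whole-dollar monthly prices.
sum_prices() {
  local total=0 n
  for n in "$@"; do total=$(( total + n )); done
  echo "$total"
}

droplets=$(sum_prices 18 48 48 48 18 18 6)   # the seven droplets in the table
spaces=5                                     # DigitalOcean Spaces
without_backups=$(( droplets + spaces ))
# Backups add 20% of droplet cost; awk handles the fractional dollars.
with_backups=$(awk -v d="$droplets" -v s="$spaces" 'BEGIN { printf "%.2f", d * 1.20 + s }')

echo "droplets=$droplets without_backups=$without_backups with_backups=$with_backups"
# prints: droplets=204 without_backups=209 with_backups=249.80
```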
### Software Versions
| Software | Version | Notes |
|----------|---------|-------|
| Ubuntu | 24.04 LTS | Base OS |
| Docker | 27.x+ | Container runtime |
| Docker Swarm | Built-in | Orchestration |
| Cassandra | 4.1.x | Database |
| Redis | 7.x-alpine | Cache |
| Meilisearch | v1.5+ | Search |
| Caddy | 2-alpine | Reverse proxy |
| Go | 1.21+ | Backend runtime |
| Node.js | 20 LTS | Frontend build |
---
## Key Concepts
### Docker Swarm Architecture
**Manager node (worker-1):**
- Orchestrates all services
- Schedules tasks to workers
- Maintains cluster state
- Runs Redis (colocated)
**Worker nodes (2-7+):**
- Execute service tasks (containers)
- Report health to manager
- Isolated workloads via labels
**Node labels:**
- `backend=true`: Backend deployment target (worker-6)
- `maplepress-frontend=true`: Frontend target (worker-7)
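Labels are set with `docker node update --label-add` on a manager node and matched by a placement constraint such as `node.labels.backend == true` in the service definition. A minimal sketch, assuming the swarm hostnames match the `worker-N` names used in this guide (the wrapper function names are hypothetical):

```bash
#!/usr/bin/env bash
# Add a label (key=value) to a swarm node. Run on a manager node.
label_node() {
  local node="$1" label="$2"
  docker node update --label-add "$label" "$node"
}

# Show the labels currently set on a node, as JSON.
node_labels() {
  docker node inspect --format '{{ json .Spec.Labels }}' "$1"
}
```

For example, `label_node worker-6 backend=true` followed by `node_labels worker-6` to confirm the label was applied.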
### Network Architecture
**`maple-private-prod` (overlay network):**
- All databases (Cassandra, Redis, Meilisearch)
- Backend services (access to databases)
- **No internet access** (security)
- Internal-only communication
**`maple-public-prod` (overlay network):**
- Caddy reverse proxies
- Backend services (receive HTTP requests)
- Ports 80/443 exposed to internet
**Backends join BOTH networks:**
- Receive requests from Caddy (public network)
- Access databases (private network)
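Both overlay networks have to exist before any stack that references them is deployed. The sketch below creates a network only if it is missing. `ensure_overlay` is a hypothetical helper, and the `--attachable` flag is an assumption (it allows standalone containers to join for debugging); the detailed guides may use different options.

```bash
#!/usr/bin/env bash
# Create an overlay network only if it does not exist yet. Run on a manager node.
ensure_overlay() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create --driver overlay --attachable "$name"
  fi
}
```

Usage would be `ensure_overlay maple-private-prod` and `ensure_overlay maple-public-prod`. A backend service then lists both networks in its stack file, while databases list only `maple-private-prod`.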
### Multi-Application Pattern
**Shared infrastructure (workers 1-5):**
- Cassandra, Redis, Meilisearch serve ALL apps
- Cost-efficient (one shared set of databases serves every app)
**Per-application deployment (workers 6+):**
- Each app gets dedicated workers
- Independent scaling and deployment
- Clear isolation
**Example: Adding MapleFile**
- Worker-8: `maplefile_backend` + `maplefile_backend-caddy`
- Worker-9: `maplefile-frontend_caddy`
- Uses same Cassandra/Redis/Meilisearch
- No changes to infrastructure
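The service names in this example follow from how `docker stack deploy` names things: each service becomes `<stack>_<service>`. The sketch below reproduces the three names, assuming the stacks are called `maplefile` and `maplefile-frontend` (inferred from the names above, not confirmed by the guides):

```bash
#!/usr/bin/env bash
# docker stack deploy names each service "<stack>_<service>".
stack_service_name() {
  printf '%s_%s\n' "$1" "$2"
}

stack_service_name maplefile backend            # maplefile_backend
stack_service_name maplefile backend-caddy      # maplefile_backend-caddy
stack_service_name maplefile-frontend caddy     # maplefile-frontend_caddy
```

Keeping the application name in the stack name is what lets several apps share one swarm without service-name collisions.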
---
## Common Commands Reference
### Swarm Management
```bash
# List all nodes
docker node ls
# List all services
docker service ls
# View service logs
docker service logs -f maplepress_backend
# Scale service
docker service scale maplepress_backend=3
# Update service (rolling restart)
docker service update --force maplepress_backend
# Remove service
docker service rm maplepress_backend
```
### Stack Management
```bash
# Deploy stack
docker stack deploy -c stack.yml stack-name
# List stacks
docker stack ls
# View stack services
docker stack services maplepress
# Remove stack
docker stack rm maplepress
```
### Troubleshooting
```bash
# Check service status
docker service ps maplepress_backend
# View container logs
docker logs <container-id>
# Inspect service
docker service inspect maplepress_backend
# Check network
docker network inspect maple-private-prod
# List configs
docker config ls
# List secrets
docker secret ls
```
---
## Deployment Checklist
**Use this checklist to track your progress:**
### Phase 0: Prerequisites
- [ ] DigitalOcean account created
- [ ] API token generated and saved
- [ ] SSH keys generated (`ssh-keygen`)
- [ ] SSH key added to DigitalOcean
- [ ] Domain names registered
- [ ] `.env` file created from template
- [ ] `.env` file has correct permissions (600)
- [ ] Git repository cloned locally
### Phase 1: Infrastructure
- [ ] 7 droplets created (workers 1-7)
- [ ] Docker Swarm initialized
- [ ] All workers joined swarm
- [ ] Private networking configured (VPC)
- [ ] Firewall rules configured on all nodes
- [ ] Cassandra 3-node cluster deployed
- [ ] Cassandra cluster healthy (`nodetool status`)
- [ ] Redis deployed on manager
- [ ] Redis authentication configured
- [ ] Meilisearch deployed on worker-5
- [ ] Meilisearch API key configured
- [ ] DigitalOcean Spaces bucket created
- [ ] Spaces access keys stored as Docker secrets
### Phase 2: Applications
- [ ] Worker-6 created and joined swarm
- [ ] Worker-6 labeled for backend
- [ ] DNS pointing backend domain to worker-6
- [ ] Backend Docker secrets created (JWT, IP encryption)
- [ ] Backend service deployed
- [ ] Backend health check passing
- [ ] Backend Caddy deployed
- [ ] Backend SSL certificate obtained
- [ ] Backend accessible at `https://domain.ca`
- [ ] Worker-7 created and joined swarm
- [ ] Worker-7 labeled for frontend
- [ ] DNS pointing frontend domain to worker-7
- [ ] Node.js installed on worker-7
- [ ] Repository cloned on worker-7
- [ ] Frontend built with production API URL
- [ ] Frontend Caddy deployed
- [ ] Frontend SSL certificate obtained
- [ ] Frontend accessible at `https://domain.com`
- [ ] CORS working (frontend can call backend)
### Phase 3: Verification
- [ ] All services show 1/1 replicas (`docker service ls`)
- [ ] All nodes show Ready (`docker node ls`)
- [ ] Backend health endpoint returns 200
- [ ] Frontend loads in browser
- [ ] Frontend can call backend API (no CORS errors)
- [ ] SSL certificates valid (green padlock)
- [ ] HTTP redirects to HTTPS
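The first two verification items can be scripted rather than read by eye. A minimal sketch, to be run on the manager node (the function names are hypothetical): each function prints only the entries that need attention, so empty output means the check passed.

```bash
#!/usr/bin/env bash
# Print services whose running replica count differs from the desired count.
unhealthy_services() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2 }'
}

# Print hostnames of swarm nodes whose status is not "Ready".
not_ready_nodes() {
  docker node ls --format '{{.Hostname}} {{.Status}}' |
    awk '$2 != "Ready" { print $1 }'
}
```

Comparing running against desired replicas, instead of looking for the literal string `1/1`, keeps the check valid after a service is scaled to several replicas.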
### Next Steps
- [ ] Set up monitoring (see `../operations/02_monitoring_alerting.md`)
- [ ] Configure backups (see `../operations/01_backup_recovery.md`)
- [ ] Review incident runbooks (see `../operations/03_incident_response.md`)
---
## Troubleshooting Guide
### Problem: Docker Swarm Join Fails
**Symptoms:** Worker can't join swarm, connection refused
**Check:**
```bash
# On manager, verify swarm is initialized
docker info | grep "Swarm: active"
# Verify firewall allows swarm ports
sudo ufw status | grep -E "2377|7946|4789"
# Get new join token
docker swarm join-token worker
```
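If the firewall check shows missing rules, these are the ports Docker Swarm needs between nodes: 2377/tcp (cluster management), 7946/tcp and 7946/udp (node communication), and 4789/udp (overlay network traffic). The sketch below only prints the UFW commands for review. `swarm_ufw_rules` is a hypothetical helper, and restricting the rules to the VPC range is an assumption based on the private network used in this guide.

```bash
#!/usr/bin/env bash
# Print (do not run) the UFW rules that open the Swarm ports to one source range.
swarm_ufw_rules() {
  local cidr="$1"
  echo "ufw allow from $cidr to any port 2377 proto tcp"   # cluster management
  echo "ufw allow from $cidr to any port 7946 proto tcp"   # node communication
  echo "ufw allow from $cidr to any port 7946 proto udp"   # node communication
  echo "ufw allow from $cidr to any port 4789 proto udp"   # overlay (VXLAN) traffic
}
```

Run `swarm_ufw_rules 10.116.0.0/16` to review the rules, then execute each printed line with `sudo` on every node.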
### Problem: Service Won't Start
**Symptoms:** Service stuck at 0/1 replicas
**Check:**
```bash
# View service events
docker service ps service-name --no-trunc
# Common issues:
# - Image not found: Authenticate with registry
# - Network not found: Create network first
# - Secret not found: Create secrets
# - No suitable node: Check node labels
```
### Problem: DNS Not Resolving
**Symptoms:** Domain doesn't resolve to correct IP
**Check:**
```bash
# Test DNS resolution
dig yourdomain.com +short
# Should return worker IP
# If not, wait 5-60 minutes for propagation
# Or check DNS provider settings
```
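While waiting for propagation, the comparison can be automated. A minimal sketch, assuming `dig` is installed; `dns_points_to` is a hypothetical helper and the IP address in the usage line is a documentation placeholder:

```bash
#!/usr/bin/env bash
# Return 0 if one of the domain's A records equals the expected IP address.
dns_points_to() {
  local domain="$1" expected="$2"
  dig +short "$domain" A | grep -Fxq "$expected"
}
```

For example: `until dns_points_to getmaplepress.ca 203.0.113.10; do sleep 60; done` polls once a minute until the record matches the worker's public IP.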
### Problem: SSL Certificate Not Obtained
**Symptoms:** HTTPS not working, certificate errors
**Check:**
```bash
# Verify DNS points to correct server
dig yourdomain.com +short
# Verify port 80 accessible (Let's Encrypt challenge)
curl http://yourdomain.com
# Check Caddy logs
docker service logs service-name --tail 100 2>&1 | grep -i certificate
# Common issues:
# - DNS not pointing to server
# - Port 80 blocked by firewall
# - Rate limited (Let's Encrypt allows 5 duplicate certificates per exact set of hostnames per week)
```
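Once the certificate is issued, plain HTTP requests should be redirected to HTTPS. The sketch below checks this with curl's `%{redirect_url}` write-out variable; `redirects_to_https` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Return 0 if a plain-HTTP request to the domain is redirected to an https:// URL.
redirects_to_https() {
  local domain="$1" target
  # -s: silent, -o /dev/null: discard the body, -w: print only the redirect target.
  target=$(curl -s -o /dev/null -w '%{redirect_url}' "http://$domain/")
  case "$target" in
    https://*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `redirects_to_https getmaplepress.ca && echo "redirect OK"`. A failure here while port 80 is reachable usually means Caddy has not yet obtained a certificate for the domain.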
### Problem: Services Can't Communicate
**Symptoms:** Backend can't reach database
**Check:**
```bash
# Verify both services on same network
docker service inspect backend --format '{{.Spec.TaskTemplate.Networks}}'
docker service inspect database --format '{{.Spec.TaskTemplate.Networks}}'
# Test DNS resolution from container
docker exec <container> nslookup database-hostname
# Verify firewall allows internal traffic
sudo ufw status | grep -F '10.116.0.0/16'
```
---
## Getting Help
### Documentation Resources
**Within this repository:**
- This directory (`setup/`): Initial deployment guides
- `../operations/`: Day-to-day operational procedures
- `../reference/`: Architecture diagrams, capacity planning
- `../automation/`: Scripts for common tasks
**External resources:**
- Docker Swarm: https://docs.docker.com/engine/swarm/
- Cassandra: https://cassandra.apache.org/doc/latest/
- DigitalOcean: https://docs.digitalocean.com/
- Caddy: https://caddyserver.com/docs/
### Common Questions
**Q: Can I use a different cloud provider (AWS, GCP, Azure)?**
A: Yes, but you'll need to adapt networking and object storage sections. The Docker Swarm and application deployment sections remain the same.
**Q: Can I deploy with fewer nodes?**
A: The minimum viable setup is 3 nodes (1 manager + 2 workers): run Cassandra in single-node mode (not recommended for production) and colocate the remaining services on the same workers.
**Q: How do I add a new application (e.g., MapleFile)?**
A: Follow `00-multi-app-architecture.md`. Add 2 workers (backend + frontend), deploy new stacks. Reuse existing databases.
**Q: What if I only have one domain?**
A: Use subdomains: `api.yourdomain.com` (backend), `app.yourdomain.com` (frontend). Update DNS and Caddyfiles accordingly.
---
## Security Best Practices
**Implemented by these guides:**
- ✅ Firewall configured (UFW) on all nodes
- ✅ SSH key-based authentication (no passwords)
- ✅ Docker secrets for sensitive values
- ✅ Network segmentation (private vs public)
- ✅ Automatic HTTPS with Let's Encrypt
- ✅ Security headers configured in Caddy
- ✅ Database authentication (Redis password, Meilisearch API key)
- ✅ Private Docker registry authentication
**Additional recommendations:**
- Rotate secrets quarterly (see `../operations/07_security_operations.md`)
- Enable 2FA on DigitalOcean account
- Regular security updates (Ubuntu unattended-upgrades)
- Monitor for unauthorized access attempts
- Backup encryption (GPG for backup files)
---
## Maintenance Schedule
**After deployment, establish these routines:**
**Daily:**
- Check service health (`docker service ls`)
- Review monitoring dashboards
- Check backup completion logs
**Weekly:**
- Review security logs
- Check disk space across all nodes
- Verify SSL certificate expiry dates
**Monthly:**
- Apply security updates (`apt update && apt upgrade`)
- Review capacity and performance metrics
- Test backup restore procedures
- Rotate non-critical secrets
**Quarterly:**
- Full disaster recovery drill
- Review and update documentation
- Capacity planning review
- Security audit
---
## What's Next?
**After completing setup:**
1. **Configure Operations** (`../operations/`)
- Set up monitoring and alerting
- Configure automated backups
- Review incident response runbooks
2. **Optimize Performance**
- Tune database settings
- Configure caching strategies
- Load test your infrastructure
3. **Add Redundancy**
- Scale critical services
- Set up failover procedures
- Implement health checks
4. **Automate**
- CI/CD pipeline for deployments
- Automated testing
- Infrastructure as Code (Terraform)
---
**Last Updated**: January 2025
**Maintained By**: Infrastructure Team
**Review Frequency**: Quarterly
**Feedback**: Found an issue or have a suggestion? Open an issue on Codeberg or contact the infrastructure team.
---
## Success! 🎉
If you've completed all guides in this directory, you now have:
✅ Production-ready infrastructure on DigitalOcean
✅ High-availability database cluster (Cassandra RF=3)
✅ Caching and search infrastructure (Redis, Meilisearch)
✅ Secure backend API with automatic HTTPS
✅ React frontend with automatic SSL
✅ Multi-application architecture ready to scale
✅ Network segmentation for security
✅ Docker Swarm orchestration
**Welcome to production operations!** 🚀
Now head to `../operations/` to learn how to run and maintain your infrastructure.