# Leader Election Package

Distributed leader election for MapleFile backend instances using Redis.

## Overview

This package provides leader election for multiple backend instances running behind a load balancer. It ensures that only one instance acts as the "leader" at any given time, with automatic failover if the leader crashes.

## Features

- ✅ **Redis-based**: Fast, reliable leader election using Redis
- ✅ **Automatic Failover**: A new leader is elected automatically if the current leader crashes
- ✅ **Heartbeat Mechanism**: The leader keeps the lock by renewing it periodically
- ✅ **Callbacks**: Execute custom code when gaining or losing leadership
- ✅ **Graceful Shutdown**: Clean leadership handoff on shutdown
- ✅ **Thread-Safe**: Safe for concurrent use
- ✅ **Observable**: Query leader status and information

## How It Works

1. **Election**: Instances compete to acquire a Redis lock (key)
2. **Leadership**: The first instance to acquire the lock becomes the leader
3. **Heartbeat**: The leader renews the lock every `HeartbeatInterval` (default: 3s)
4. **Lock TTL**: The lock expires after `LockTTL` if it is not renewed (default: 10s)
5. **Failover**: If the leader crashes, the lock expires → followers, which retry every `RetryInterval` (default: 2s), compete for leadership
6. **Re-election**: A new leader is typically elected within `LockTTL` + `RetryInterval` (≈12s with the defaults) of the previous leader failing

## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Instance 1  │     │ Instance 2  │     │ Instance 3  │
│  (Leader)   │     │ (Follower)  │     │ (Follower)  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       │ Heartbeat         │ Try Acquire       │ Try Acquire
       │ (Renew Lock)      │ (Check Lock)      │ (Check Lock)
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
                      ┌────▼────┐
                      │  Redis  │
                      │  Lock   │
                      └─────────┘
```

## Usage

### Basic Setup

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"github.com/redis/go-redis/v9"
	"go.uber.org/zap"

	"codeberg.org/mapleopentech/monorepo/cloud/maplefile-backend/pkg/leaderelection"
)

func main() {
	// Create Redis client (you likely already have this)
	redisClient := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})

	// Create logger
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// Create leader election configuration
	config := leaderelection.DefaultConfig()

	// Create leader election instance
	election, err := leaderelection.NewRedisLeaderElection(config, redisClient, logger)
	if err != nil {
		panic(err)
	}
	// Graceful shutdown: release leadership when main returns
	defer election.Stop()

	// Cancel the context on Ctrl+C or SIGTERM
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Start leader election in a goroutine
	go func() {
		if err := election.Start(ctx); err != nil {
			logger.Error("Leader election failed", zap.Error(err))
		}
	}()

	// Check if this instance is the leader. Immediately after Start the first
	// election round may not have finished yet, so this can still be false.
	if election.IsLeader() {
		logger.Info("I am the leader! 👑")
	}

	// Block until shutdown is requested
	<-ctx.Done()
}
```

### With Callbacks

```go
// Register callback when becoming leader
election.OnBecomeLeader(func() {
	logger.Info("🎉 I became the leader!")

	// Start leader-only tasks
	go startBackgroundJobs()
	go startMetricsAggregation()
})

// Register callback when losing leadership
election.OnLoseLeadership(func() {
	logger.Info("😢 I lost leadership")

	// Stop leader-only tasks
	stopBackgroundJobs()
	stopMetricsAggregation()
})
```

### Integration with Application Startup

```go
// In your main.go or app startup
func (app *Application) Start() error {
	// Register callbacks before starting the election so the first transition
	// is not missed; they also fire if leadership changes later on.
	app.leaderElection.OnBecomeLeader(func() {
		app.logger.Info("This instance is the leader")
		// Start leader-only services
	})
	app.leaderElection.OnLoseLeadership(func() {
		app.logger.Info("This instance is a follower")
		// Stop leader-only services
	})

	// Start leader election
	go func() {
		if err := app.leaderElection.Start(app.ctx); err != nil {
			app.logger.Error("Leader election error", zap.Error(err))
		}
	}()

	// Start your HTTP server, etc.
	return app.httpServer.Start()
}
```

### Conditional Logic Based on Leadership

```go
// Only the leader executes certain tasks
func (s *Service) PerformTask() {
	if s.leaderElection.IsLeader() {
		// Only the leader does this expensive operation
		s.aggregateMetrics()
	}
}

// Get information about the current leader
func (s *Service) GetLeaderStatus() (*leaderelection.LeaderInfo, error) {
	info, err := s.leaderElection.GetLeaderInfo()
	if err != nil {
		return nil, err
	}

	fmt.Printf("Leader: %s (%s)\n", info.InstanceID, info.Hostname)
	fmt.Printf("Started: %s\n", info.StartedAt)
	fmt.Printf("Last Heartbeat: %s\n", info.LastHeartbeat)

	return info, nil
}
```

## Configuration

### Default Configuration

```go
config := leaderelection.DefaultConfig()
// Returns:
// {
//     RedisKeyName:      "maplefile:leader:lock",
//     RedisInfoKeyName:  "maplefile:leader:info",
//     LockTTL:           10 * time.Second,
//     HeartbeatInterval: 3 * time.Second,
//     RetryInterval:     2 * time.Second,
// }
```

### Custom Configuration

```go
config := &leaderelection.Config{
	RedisKeyName:      "my-app:leader",
	RedisInfoKeyName:  "my-app:leader:info",
	LockTTL:           30 * time.Second, // Lock expires after 30s without renewal
	HeartbeatInterval: 10 * time.Second, // Leader renews the lock every 10s
	RetryInterval:     5 * time.Second,  // Followers try to acquire the lock every 5s
	InstanceID:        "instance-1",     // Custom instance ID
	Hostname:          "server-01",      // Custom hostname
}
```

### Configuration in Application Config

Add to your `config/config.go`:

```go
type Config struct {
	// ... existing fields ...

	LeaderElection struct {
		LockTTL           time.Duration `env:"LEADER_ELECTION_LOCK_TTL" envDefault:"10s"`
		HeartbeatInterval time.Duration `env:"LEADER_ELECTION_HEARTBEAT_INTERVAL" envDefault:"3s"`
		RetryInterval     time.Duration `env:"LEADER_ELECTION_RETRY_INTERVAL" envDefault:"2s"`
		InstanceID        string        `env:"LEADER_ELECTION_INSTANCE_ID" envDefault:""`
		Hostname          string        `env:"LEADER_ELECTION_HOSTNAME" envDefault:""`
	}
}
```

## Use Cases

### 1. Background Job Processing

Only the leader runs scheduled jobs:

```go
election.OnBecomeLeader(func() {
	go func() {
		ticker := time.NewTicker(1 * time.Hour)
		defer ticker.Stop()

		for range ticker.C {
			// Exit once this instance is no longer the leader,
			// instead of leaking the goroutine
			if !election.IsLeader() {
				return
			}
			processScheduledJobs()
		}
	}()
})
```

### 2. Database Migrations

Only the leader runs migrations on startup:

```go
// IsLeader() stays false until this instance has acquired the lock, so run
// this check only after the election has had a chance to complete;
// otherwise every instance may skip the migrations.
if election.IsLeader() {
	logger.Info("Leader instance - running database migrations")
	if err := migrator.Up(); err != nil {
		return err
	}
} else {
	logger.Info("Follower instance - skipping migrations")
}
```

### 3. Cache Warming

Only the leader pre-loads caches:

```go
election.OnBecomeLeader(func() {
	logger.Info("Warming caches as leader")
	warmApplicationCache()
})
```

### 4. Metrics Aggregation

Only the leader aggregates and sends metrics:

```go
election.OnBecomeLeader(func() {
	go func() {
		ticker := time.NewTicker(1 * time.Minute)
		defer ticker.Stop()

		for range ticker.C {
			// Exit once this instance is no longer the leader
			if !election.IsLeader() {
				return
			}
			aggregateAndSendMetrics()
		}
	}()
})
```

### 5. Cleanup Tasks

Only the leader runs periodic cleanup:

```go
election.OnBecomeLeader(func() {
	go func() {
		ticker := time.NewTicker(24 * time.Hour)
		defer ticker.Stop()

		for range ticker.C {
			// Exit once this instance is no longer the leader
			if !election.IsLeader() {
				return
			}
			cleanupOldRecords()
			purgeExpiredSessions()
		}
	}()
})
```

## Monitoring

### Health Check Endpoint

```go
func (h *HealthHandler) LeaderElectionHealth(w http.ResponseWriter, r *http.Request) {
	info, err := h.leaderElection.GetLeaderInfo()
	if err != nil {
		http.Error(w, "Failed to get leader info", http.StatusInternalServerError)
		return
	}

	response := map[string]interface{}{
		"is_leader":   h.leaderElection.IsLeader(),
		"instance_id": h.leaderElection.GetInstanceID(),
		"leader_info": info,
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(response)
}
```

### Logging

The package logs important events:

- `🎉 Became the leader!` - When the instance becomes leader
- `Heartbeat sent` - When the leader renews the lock (DEBUG level)
- `Failed to send heartbeat, lost leadership` - When the leader loses the lock
- `Releasing leadership voluntarily` - On graceful shutdown

## Testing

### Local Testing with Multiple Instances

```bash
# Terminal 1
LEADER_ELECTION_INSTANCE_ID=instance-1 ./maplefile-backend

# Terminal 2
LEADER_ELECTION_INSTANCE_ID=instance-2 ./maplefile-backend

# Terminal 3
LEADER_ELECTION_INSTANCE_ID=instance-3 ./maplefile-backend
```

### Failover Testing

1. Start 3 instances
2. Check the logs: one instance will become the leader
3. Stop the leader instance. Ctrl+C exercises the graceful path (if your app calls `Stop()` on shutdown, the lock is released right away); `kill -9 <pid>` simulates a crash, where the lock has to expire first
4. Watch the logs: another instance becomes leader, within about `RetryInterval` after a graceful stop or `LockTTL` + `RetryInterval` after a crash

## Best Practices

1. **Always check leadership before expensive operations**

   ```go
   if election.IsLeader() {
       // expensive operation
   }
   ```

2. **Use callbacks for starting/stopping leader-only services**

   ```go
   election.OnBecomeLeader(startLeaderServices)
   election.OnLoseLeadership(stopLeaderServices)
   ```

3. **Set appropriate timeouts**
   - `LockTTL` should be at least 2-3x `HeartbeatInterval`, so a single delayed or failed heartbeat does not cost the leader its lock
   - Shorter TTL (with a proportionally shorter heartbeat) = faster failover but more Redis traffic
   - Longer TTL = slower failover but less Redis traffic

4. **Handle callback panics**
   - Callbacks run in goroutines, and panics in them are recovered
   - You should still handle errors inside callbacks rather than relying on that

5. **Always call Stop() on shutdown**

   ```go
   defer election.Stop()
   ```

## Troubleshooting

### Leader keeps changing

- Increase `LockTTL` (the network might be slow)
- Check Redis connectivity
- Check for clock skew between instances

### No leader elected

- Check that Redis is running and accessible
- Check Redis key permissions
- Check the logs for errors

### Leader doesn't release on shutdown

- Ensure `Stop()` is called
- Check for blocking operations preventing shutdown
- The lock still expires on its own once `LockTTL` elapses

## Performance

- **Election time**: < 100ms
- **Failover time**: up to `LockTTL` + `RetryInterval` after a crash (≈12s with the defaults)
- **Redis traffic**: one renewal round per `HeartbeatInterval` from the leader (default: ≈0.33/s), plus one acquire attempt per `RetryInterval` from each follower (default: 0.5/s)
- **Memory overhead**: Minimal (~1KB per instance)

## Thread Safety

All methods are thread-safe and can be called from multiple goroutines:

- `IsLeader()`
- `GetLeaderID()`
- `GetLeaderInfo()`
- `OnBecomeLeader()`
- `OnLoseLeadership()`
- `Stop()`