Leader Election Package
Distributed leader election for MapleFile backend instances using Redis.
Overview
This package provides leader election functionality for multiple backend instances running behind a load balancer. It ensures that only one instance acts as the "leader" at any given time, with automatic failover if the leader crashes.
Features
- ✅ Redis-based: Fast, reliable leader election using Redis
- ✅ Automatic Failover: New leader elected automatically if current leader crashes
- ✅ Heartbeat Mechanism: Leader maintains lock with periodic renewals
- ✅ Callbacks: Execute custom code when becoming/losing leadership
- ✅ Graceful Shutdown: Clean leadership handoff on shutdown
- ✅ Thread-Safe: Safe for concurrent use
- ✅ Observable: Query leader status and information
How It Works
- Election: Instances compete to acquire a Redis lock (key)
- Leadership: First instance to acquire the lock becomes the leader
- Heartbeat: The leader renews the lock every `HeartbeatInterval` (default: 3s)
- Lock TTL: The lock expires after `LockTTL` if not renewed (default: 10s)
- Failover: If the leader crashes, the lock expires and followers compete for leadership
- Re-election: New leader elected within seconds of previous leader failure
Architecture
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Instance 1  │     │ Instance 2  │     │ Instance 3  │
│  (Leader)   │     │ (Follower)  │     │ (Follower)  │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       │ Heartbeat         │ Try Acquire       │ Try Acquire
       │ (Renew Lock)      │ (Check Lock)      │ (Check Lock)
       │                   │                   │
       └───────────────────┴───────────────────┘
                           │
                      ┌────▼────┐
                      │  Redis  │
                      │  Lock   │
                      └─────────┘
```
Usage
Basic Setup
```go
import (
	"context"

	"github.com/redis/go-redis/v9"
	"go.uber.org/zap"

	"codeberg.org/mapleopentech/monorepo/cloud/maplefile-backend/pkg/leaderelection"
)

// Create Redis client (you likely already have this)
redisClient := redis.NewClient(&redis.Options{
	Addr: "localhost:6379",
})

// Create logger
logger, _ := zap.NewProduction()

// Create leader election configuration
config := leaderelection.DefaultConfig()

// Create leader election instance
election, err := leaderelection.NewRedisLeaderElection(config, redisClient, logger)
if err != nil {
	panic(err)
}

// Start leader election in a goroutine
ctx := context.Background()
go func() {
	if err := election.Start(ctx); err != nil {
		logger.Error("Leader election failed", zap.Error(err))
	}
}()

// Check if this instance is the leader
if election.IsLeader() {
	logger.Info("I am the leader! 👑")
}

// Graceful shutdown
defer election.Stop()
```
With Callbacks
// Register callback when becoming leader
election.OnBecomeLeader(func() {
logger.Info("🎉 I became the leader!")
// Start leader-only tasks
go startBackgroundJobs()
go startMetricsAggregation()
})
// Register callback when losing leadership
election.OnLoseLeadership(func() {
logger.Info("😢 I lost leadership")
// Stop leader-only tasks
stopBackgroundJobs()
stopMetricsAggregation()
})
Integration with Application Startup
// In your main.go or app startup
func (app *Application) Start() error {
// Start leader election
go func() {
if err := app.leaderElection.Start(app.ctx); err != nil {
app.logger.Error("Leader election error", zap.Error(err))
}
}()
// Wait a moment for election to complete
time.Sleep(1 * time.Second)
if app.leaderElection.IsLeader() {
app.logger.Info("This instance is the leader")
// Start leader-only services
} else {
app.logger.Info("This instance is a follower")
// Start follower-only services (if any)
}
// Start your HTTP server, etc.
return app.httpServer.Start()
}
Conditional Logic Based on Leadership
// Only leader executes certain tasks
func (s *Service) PerformTask() {
if s.leaderElection.IsLeader() {
// Only leader does this expensive operation
s.aggregateMetrics()
}
}
// Get information about the current leader
func (s *Service) GetLeaderStatus() (*leaderelection.LeaderInfo, error) {
info, err := s.leaderElection.GetLeaderInfo()
if err != nil {
return nil, err
}
fmt.Printf("Leader: %s (%s)\n", info.InstanceID, info.Hostname)
fmt.Printf("Started: %s\n", info.StartedAt)
fmt.Printf("Last Heartbeat: %s\n", info.LastHeartbeat)
return info, nil
}
Configuration
Default Configuration
```go
config := leaderelection.DefaultConfig()
// Returns:
// {
//   RedisKeyName:      "maplefile:leader:lock",
//   RedisInfoKeyName:  "maplefile:leader:info",
//   LockTTL:           10 * time.Second,
//   HeartbeatInterval: 3 * time.Second,
//   RetryInterval:     2 * time.Second,
// }
```
Custom Configuration
```go
config := &leaderelection.Config{
	RedisKeyName:      "my-app:leader",
	RedisInfoKeyName:  "my-app:leader:info",
	LockTTL:           30 * time.Second, // Lock expires after 30s
	HeartbeatInterval: 10 * time.Second, // Renew every 10s
	RetryInterval:     5 * time.Second,  // Check for leadership every 5s
	InstanceID:        "instance-1",     // Custom instance ID
	Hostname:          "server-01",      // Custom hostname
}
```
Configuration in Application Config
Add to your config/config.go:
```go
type Config struct {
	// ... existing fields ...

	LeaderElection struct {
		LockTTL           time.Duration `env:"LEADER_ELECTION_LOCK_TTL" envDefault:"10s"`
		HeartbeatInterval time.Duration `env:"LEADER_ELECTION_HEARTBEAT_INTERVAL" envDefault:"3s"`
		RetryInterval     time.Duration `env:"LEADER_ELECTION_RETRY_INTERVAL" envDefault:"2s"`
		InstanceID        string        `env:"LEADER_ELECTION_INSTANCE_ID" envDefault:""`
		Hostname          string        `env:"LEADER_ELECTION_HOSTNAME" envDefault:""`
	}
}
```
Use Cases
1. Background Job Processing
Only the leader runs scheduled jobs:
```go
election.OnBecomeLeader(func() {
	go func() {
		ticker := time.NewTicker(1 * time.Hour)
		defer ticker.Stop()
		for range ticker.C {
			if election.IsLeader() {
				processScheduledJobs()
			}
		}
	}()
})
```
2. Database Migrations
Only the leader runs migrations on startup. Note that `IsLeader()` may still be false on every instance immediately after `Start()` returns, so let the election settle first (or run migrations from an `OnBecomeLeader` callback):
```go
if election.IsLeader() {
	logger.Info("Leader instance - running database migrations")
	if err := migrator.Up(); err != nil {
		return err
	}
} else {
	logger.Info("Follower instance - skipping migrations")
}
```
3. Cache Warming
Only the leader pre-loads caches:
```go
election.OnBecomeLeader(func() {
	logger.Info("Warming caches as leader")
	warmApplicationCache()
})
```
4. Metrics Aggregation
Only the leader aggregates and sends metrics:
```go
election.OnBecomeLeader(func() {
	go func() {
		ticker := time.NewTicker(1 * time.Minute)
		defer ticker.Stop()
		for range ticker.C {
			if election.IsLeader() {
				aggregateAndSendMetrics()
			}
		}
	}()
})
```
5. Cleanup Tasks
Only the leader runs periodic cleanup:
```go
election.OnBecomeLeader(func() {
	go func() {
		ticker := time.NewTicker(24 * time.Hour)
		defer ticker.Stop()
		for range ticker.C {
			if election.IsLeader() {
				cleanupOldRecords()
				purgeExpiredSessions()
			}
		}
	}()
})
```
Monitoring
Health Check Endpoint
```go
func (h *HealthHandler) LeaderElectionHealth(w http.ResponseWriter, r *http.Request) {
	info, err := h.leaderElection.GetLeaderInfo()
	if err != nil {
		http.Error(w, "Failed to get leader info", http.StatusInternalServerError)
		return
	}

	response := map[string]interface{}{
		"is_leader":   h.leaderElection.IsLeader(),
		"instance_id": h.leaderElection.GetInstanceID(),
		"leader_info": info,
	}
	json.NewEncoder(w).Encode(response)
}
```
Logging
The package logs important events:
- `🎉 Became the leader!` - when the instance becomes leader
- `Heartbeat sent` - when the leader renews the lock (DEBUG level)
- `Failed to send heartbeat, lost leadership` - when the leader loses the lock
- `Releasing leadership voluntarily` - on graceful shutdown
Testing
Local Testing with Multiple Instances
# Terminal 1
LEADER_ELECTION_INSTANCE_ID=instance-1 ./maplefile-backend
# Terminal 2
LEADER_ELECTION_INSTANCE_ID=instance-2 ./maplefile-backend
# Terminal 3
LEADER_ELECTION_INSTANCE_ID=instance-3 ./maplefile-backend
Failover Testing
- Start 3 instances
- Check logs - one will become leader
- Kill the leader instance (Ctrl+C)
- Watch logs - another instance becomes leader within seconds
Best Practices
1. Always check leadership before expensive operations: `if election.IsLeader() { /* expensive operation */ }`
2. Use callbacks for starting/stopping leader-only services: `election.OnBecomeLeader(startLeaderServices)`, `election.OnLoseLeadership(stopLeaderServices)`
3. Set appropriate timeouts: `LockTTL` should be 2-3x `HeartbeatInterval`. A shorter TTL means faster failover but more Redis traffic; a longer TTL means slower failover but less Redis traffic.
4. Handle callback panics: callbacks run in goroutines and panics are caught, but you should still handle errors gracefully.
5. Always call `Stop()` on shutdown: `defer election.Stop()`
Troubleshooting
Leader keeps changing
- Increase `LockTTL` (the network might be slow)
- Check Redis connectivity
- Check for clock skew between instances
No leader elected
- Check Redis is running and accessible
- Check Redis key permissions
- Check logs for errors
Leader doesn't release on shutdown
- Ensure `Stop()` is called
- Check for blocking operations preventing shutdown
- The TTL will eventually expire the lock anyway
Performance
- Election time: < 100ms
- Failover time: < `LockTTL` (default: 10s)
- Redis operations per second: `1 / HeartbeatInterval` (default: ~0.33/s)
- Memory overhead: minimal (~1KB per instance)
Thread Safety
All methods are thread-safe and can be called from multiple goroutines:
- `IsLeader()`
- `GetLeaderID()`
- `GetLeaderInfo()`
- `OnBecomeLeader()`
- `OnLoseLeadership()`
- `Stop()`