# Token Manager Service

## Table of Contents

1. [Overview](#overview)
2. [Why Do We Need This?](#why-do-we-need-this)
3. [How It Works](#how-it-works)
4. [Architecture](#architecture)
5. [Configuration](#configuration)
6. [Lifecycle Management](#lifecycle-management)
7. [Error Handling](#error-handling)
8. [Testing](#testing)
9. [Troubleshooting](#troubleshooting)
10. [Examples](#examples)
11. [Summary for Junior Developers](#summary-for-junior-developers)
12. [Changelog](#changelog)

---

## Overview

The Token Manager is a background service that automatically refreshes authentication tokens before they expire. This keeps users logged in without interruption and prevents API requests from failing because of an expired token.

**Key Benefits:**

- ✅ Seamless user experience (no sudden logouts)
- ✅ No failed API requests due to expired tokens
- ✅ Automatic cleanup on app shutdown
- ✅ Graceful handling of refresh failures

---

## Why Do We Need This?

### The Problem

When you log into MapleFile, the backend gives you two tokens:

1. **Access Token** - Used for API requests (expires quickly, e.g., 1 hour)
2. **Refresh Token** - Used to get new access tokens (lasts longer, e.g., 30 days)

**Without Token Manager:**

```
User logs in → Gets tokens (expires in 1 hour)
User works for 61 minutes
User tries to upload file → ❌ 401 Unauthorized!
User gets logged out → 😞 Lost work, has to login again
```

**With Token Manager:**

```
User logs in → Gets tokens (expires in 1 hour)
Token Manager checks every 30 seconds
At 59 minutes → Token Manager refreshes tokens automatically
User works for hours → ✅ Everything just works!
```

### The Solution

The Token Manager runs in the background and:

1. **Checks** token expiration every 30 seconds
2. **Refreshes** tokens when < 1 minute remains
3. **Handles failures** gracefully (3 strikes = logout)
4. **Shuts down cleanly** when app closes

---

## How It Works

### High-Level Flow

```
┌─────────────────────────────────────────────────────────────┐
│                    Application Lifecycle                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
            ┌──────────────────────────────────────┐
            │   App Starts / User Logs In          │
            └──────────────────────────────────────┘
                               │
                               ▼
            ┌──────────────────────────────────────┐
            │   Token Manager Starts               │
            │   (background goroutine)             │
            └──────────────────────────────────────┘
                               │
                               ▼
            ┌──────────────────────────────────────┐
            │   Every 30 seconds:                  │
            │   1. Check session                   │
            │   2. Calculate time until expiry     │
            │   3. Refresh if < 1 minute           │
            └──────────────────────────────────────┘
                               │
                     ┌─────────┴─────────┐
                     ▼                   ▼
           ┌───────────────────┐ ┌──────────────────┐
           │  Refresh Success  │ │  Refresh Failed  │
           │  (reset counter)  │ │  (increment)     │
           └───────────────────┘ └──────────────────┘
                                         │
                                         ▼
                                 ┌──────────────────┐
                                 │  3 failures?     │
                                 └──────────────────┘
                                         │
                                  Yes    │  No
                              ┌──────────┴──────┐
                              ▼                 ▼
                     ┌─────────────────┐ ┌──────────┐
                     │  Force Logout   │ │ Continue │
                     └─────────────────┘ └──────────┘
                               │
                               ▼
            ┌──────────────────────────────────────┐
            │   App Shuts Down / User Logs Out     │
            └──────────────────────────────────────┘
                               │
                               ▼
            ┌──────────────────────────────────────┐
            │   Token Manager Stops Gracefully     │
            │   (goroutine cleanup)                │
            └──────────────────────────────────────┘
```

### Detailed Process

#### 1. **Starting the Token Manager**

When a user logs in OR when the app restarts with a valid session:

```go
// In CompleteLogin or Startup
tokenManager.Start()
```

This creates a background goroutine that runs until the manager is stopped.

#### 2. **Background Refresh Loop**

The goroutine runs this logic every 30 seconds:

```
1. Get current session from LevelDB
2. Check if session exists and is valid
3. Calculate: timeUntilExpiry = session.ExpiresAt - time.Now()
4. If timeUntilExpiry < 1 minute:
   a. Call API to refresh tokens
   b. API returns new access + refresh tokens
   c. Tokens automatically saved to session
5. If refresh fails:
   a. Increment failure counter
   b. If counter >= 3: Force logout
6. If refresh succeeds:
   a. Reset failure counter to 0
```

#### 3. **Stopping the Token Manager**

When user logs out OR app shuts down:

```go
// Create a timeout context (max 3 seconds)
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()

// Stop gracefully
tokenManager.Stop(ctx)
```

This signals the goroutine to stop and waits for confirmation.

---

## Architecture

### Component Structure

```
internal/service/tokenmanager/
├── config.go      # Configuration settings
├── manager.go     # Main token manager logic
├── provider.go    # Wire dependency injection
└── README.md      # This file
```

### Key Components

#### 1. **Manager Struct**

```go
type Manager struct {
    // Dependencies
    config      Config                  // Settings (intervals, thresholds)
    client      *client.Client          // API client for token refresh
    authService *auth.Service           // Auth service for logout
    getSession  *session.GetByIdUseCase // Get current session
    logger      *zap.Logger             // Structured logging

    // Lifecycle management
    ctx       context.Context    // Manager's context
    cancel    context.CancelFunc // Cancel function
    stopCh    chan struct{}      // Signal to stop
    stoppedCh chan struct{}      // Confirmation of stopped
    running   atomic.Bool        // Is manager running?

    // Refresh state
    mu                  sync.Mutex // Protects failure counter
    consecutiveFailures int        // Track failures
}
```

#### 2. **Config Struct**

```go
type Config struct {
    RefreshBeforeExpiry    time.Duration // How early to refresh (default: 1 min)
    CheckInterval          time.Duration // How often to check (default: 30 sec)
    MaxConsecutiveFailures int           // Failures before logout (default: 3)
}
```

### Goroutine Management

#### Why Use Goroutines?

A **goroutine** is a lightweight function that the Go runtime schedules to run concurrently with the rest of the program (similar in spirit to a separate thread, but much cheaper).
We need this because:

- Main app needs to respond to UI events
- Token checking can happen in the background
- No blocking of user actions

#### The Double-Channel Pattern

We use **two channels** for clean shutdown:

```go
stopCh    chan struct{} // We close this to signal "please stop"
stoppedCh chan struct{} // Goroutine closes this to say "I stopped"
```

**Why two channels?**

```go
// Without confirmation:
close(stopCh) // Signal stop
// Goroutine might still be running! ⚠️
// App shuts down → goroutine orphaned → potential crash

// With confirmation:
close(stopCh) // Signal stop
<-stoppedCh   // Wait for confirmation
// Now we KNOW goroutine is done ✅
```

#### Thread Safety

**Problem:** Multiple parts of the app might access the token manager at once.

**Solution:** Use synchronization primitives:

1. **`atomic.Bool` for running flag**

   ```go
   // Atomic operations are thread-safe (no mutex needed)
   if !tm.running.CompareAndSwap(false, true) {
       return // Already running, don't start again
   }
   ```

2. **`sync.Mutex` for failure counter**

   ```go
   // Lock before accessing shared data
   tm.mu.Lock()
   defer tm.mu.Unlock()
   tm.consecutiveFailures++
   ```

---

## Configuration

### Default Settings

```go
Config{
    RefreshBeforeExpiry:    1 * time.Minute,  // Refresh with 1 min remaining
    CheckInterval:          30 * time.Second, // Check every 30 seconds
    MaxConsecutiveFailures: 3,                // 3 failures = logout
}
```

### Why These Values?

| Setting | Value | Reasoning |
|---------|-------|-----------|
| **RefreshBeforeExpiry** | 1 minute | Conservative buffer. Even if one check fails, there is time for the next attempt |
| **CheckInterval** | 30 seconds | Frequent enough to catch the 1-minute window, not too aggressive on resources |
| **MaxConsecutiveFailures** | 3 failures | Balances between transient network issues and genuine auth problems |

### Customizing Configuration

To change settings, modify `provider.go`:

```go
func ProvideManager(...) *Manager {
    config := Config{
        RefreshBeforeExpiry:    2 * time.Minute, // More conservative
        CheckInterval:          1 * time.Minute, // Less frequent checks
        MaxConsecutiveFailures: 5,               // More tolerant
    }
    return New(config, client, authService, getSession, logger)
}
```

---

## Lifecycle Management

### 1. **Starting the Token Manager**

**Called from:**

- `Application.Startup()` - If valid session exists from previous run
- `Application.CompleteLogin()` - After successful login

**What happens:**

```go
func (m *Manager) Start() {
    // 1. Check if already running (thread-safe)
    if !m.running.CompareAndSwap(false, true) {
        return // Already running, do nothing
    }

    // 2. Create context for goroutine
    m.ctx, m.cancel = context.WithCancel(context.Background())

    // 3. Create channels for communication
    m.stopCh = make(chan struct{})
    m.stoppedCh = make(chan struct{})

    // 4. Reset failure counter (under the mutex that protects it)
    m.mu.Lock()
    m.consecutiveFailures = 0
    m.mu.Unlock()

    // 5. Launch background goroutine
    go m.refreshLoop()
}
```

**Why it's safe to call multiple times:** The `CompareAndSwap` operation ensures only ONE goroutine starts, even if `Start()` is called many times.

### 2. **Running the Refresh Loop**

**The goroutine does this forever (until stopped):**

```go
func (m *Manager) refreshLoop() {
    // Ensure we always mark as stopped when exiting.
    // Deferred calls run in reverse order: running is cleared first,
    // then stoppedCh is closed.
    defer close(m.stoppedCh)
    defer m.running.Store(false)

    // Create ticker (fires every CheckInterval, 30 seconds by default)
    ticker := time.NewTicker(m.config.CheckInterval)
    defer ticker.Stop()

    // Do initial check immediately
    m.checkAndRefresh()

    // Loop forever
    for {
        select {
        case <-m.stopCh:
            // Stop signal received
            return
        case <-m.ctx.Done():
            // Context cancelled
            return
        case <-ticker.C:
            // CheckInterval elapsed, check again
            m.checkAndRefresh()
        }
    }
}
```

**The `select` statement explained:** Think of `select` like a switch statement for channels. It waits for one of these events:

- `stopCh` closed → Time to stop
- `ctx.Done()` → Forced cancellation
- `ticker.C` → 30 seconds passed, do work

### 3. **Stopping the Token Manager**

**Called from:**

- `Application.Shutdown()` - App closing
- `Application.Logout()` - User logging out

**What happens:**

```go
func (m *Manager) Stop(ctx context.Context) error {
    // 1. Check if running
    if !m.running.Load() {
        return nil // Not running, nothing to do
    }

    // 2. Signal stop (close the channel)
    // NOTE: as written, two concurrent Stop() calls could both reach this
    // line, and closing a closed channel panics. Callers must not call
    // Stop() concurrently.
    close(m.stopCh)

    // 3. Wait for confirmation OR timeout
    select {
    case <-m.stoppedCh:
        // Goroutine confirmed it stopped
        return nil
    case <-ctx.Done():
        // Timeout! Force cancel
        m.cancel()

        // Give it 100ms more
        select {
        case <-m.stoppedCh:
            return nil
        case <-time.After(100 * time.Millisecond):
            return ctx.Err() // Failed to stop cleanly
        }
    }
}
```

**Why the timeout?** If the goroutine is stuck (e.g., in a long API call), we can't wait forever. The app needs to shut down!

---

## Error Handling

### 1. **Refresh Failures**

**Types of failures:**

| Failure Type | Cause | Handling |
|--------------|-------|----------|
| **Network Error** | No internet connection | Increment counter, retry next check |
| **401 Unauthorized** | Refresh token expired | Increment counter, likely force logout |
| **500 Server Error** | Backend issue | Increment counter, retry next check |
| **Timeout** | Slow network | Increment counter, retry next check |

**Failure tracking:**

```go
func (m *Manager) checkAndRefresh() error {
    m.mu.Lock()
    defer m.mu.Unlock()

    // ... check if refresh needed ...

    // Attempt refresh
    if err := m.client.RefreshToken(ctx); err != nil {
        m.consecutiveFailures++

        if m.consecutiveFailures >= m.config.MaxConsecutiveFailures {
            // Too many failures! Force logout
            return m.forceLogout()
        }
        return err
    }

    // Success! Reset counter
    m.consecutiveFailures = 0
    return nil
}
```

### 2. **Force Logout**

**When it happens:**

- `MaxConsecutiveFailures` consecutive refresh failures (3 by default)
- Session expired on startup

**What it does:**

```go
func (m *Manager) forceLogout() error {
    m.logger.Warn("Forcing logout due to token refresh issues")

    // Use background context (not manager's context which might be cancelled)
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    // Clear session from LevelDB
    if err := m.authService.Logout(ctx); err != nil {
        m.logger.Error("Failed to force logout", zap.Error(err))
        return err
    }

    // User will see login screen on next UI interaction
    return nil
}
```

**User experience:** When force logout happens, the user will see the login screen the next time they interact with the app. Their work is NOT lost (local files remain); they just need to log in again.

### 3. **Session Not Found**

**Scenario:** User manually deleted session file, or session expired.

**Handling:**

```go
// Get current session
sess, err := m.getSession.Execute()
if err != nil || sess == nil {
    // No session = user not logged in
    // This is normal, not an error
    return nil // Do nothing
}
```

---

## Testing

### Manual Testing

#### Test 1: Normal Refresh

1. Log in to the app
2. Watch logs for token manager start
3. Wait ~30 seconds
4. Check logs for "Token refresh not needed yet"
5. Verify `time_until_expiry` is decreasing

**Expected logs:**

```
INFO  Token manager starting
INFO  Token refresh loop started
DEBUG Token refresh not needed yet {"time_until_expiry": "59m30s"}
... wait 30 seconds ...
DEBUG Token refresh not needed yet {"time_until_expiry": "59m0s"}
```

#### Test 2: Automatic Refresh

1. Log in and get tokens with short expiry (if possible)
2. Wait until < 1 minute remaining
3. Watch logs for automatic refresh

**Expected logs:**

```
INFO  Token refresh needed {"time_until_expiry": "45s"}
INFO  Token refreshed successfully
DEBUG Token refresh not needed yet {"time_until_expiry": "59m30s"}
```

#### Test 3: Graceful Shutdown

1. Log in (token manager running)
2. Close the app (Cmd+Q on Mac, Alt+F4 on Windows)
3. Check logs for clean shutdown

**Expected logs:**

```
INFO  MapleFile desktop application shutting down
INFO  Token manager stopping...
INFO  Token refresh loop received stop signal
INFO  Token refresh loop exited
INFO  Token manager stopped gracefully
```

#### Test 4: Logout

1. Log in (token manager running)
2. Click logout button
3. Verify token manager stops

**Expected logs:**

```
INFO  Token manager stopping...
INFO  Token manager stopped gracefully
INFO  User logged out successfully
```

#### Test 5: Session Resume on Restart

1. Log in
2. Close app
3. Restart app
4. Check logs for session resume

**Expected logs:**

```
INFO  MapleFile desktop application started
INFO  Resuming valid session from previous run
INFO  Session restored to API client
INFO  Token manager starting
INFO  Token manager started for resumed session
```

### Unit Testing (TODO)

```go
// Example test structure (to be implemented)

func TestTokenManager_Start(t *testing.T) {
    // Test that Start() can be called multiple times safely
    // Test that goroutine actually starts
}

func TestTokenManager_Stop(t *testing.T) {
    // Test graceful shutdown
    // Test timeout handling
}

func TestTokenManager_RefreshLogic(t *testing.T) {
    // Test refresh when < 1 minute
    // Test no refresh when > 1 minute
}

func TestTokenManager_FailureHandling(t *testing.T) {
    // Test failure counter increment
    // Test force logout after 3 failures
    // Test counter reset on success
}
```

---

## Troubleshooting

### Problem: Token manager not starting

**Symptoms:**

- No "Token manager starting" log
- App works but might get logged out after token expires

**Possible causes:**

1. **No session on startup**

   ```
   Check logs for: "No session found on startup"
   Solution: This is normal if user hasn't logged in yet
   ```

2. **Session expired**

   ```
   Check logs for: "Session expired on startup"
   Solution: User needs to log in again
   ```

3. **Token manager already running**

   ```
   Check logs for: "Token manager already running"
   Solution: This is expected behavior (prevents duplicate goroutines)
   ```

### Problem: "Token manager stop timeout"

**Symptoms:**

- App takes long time to close
- Warning in logs: "Token manager stop timeout, forcing cancellation"

**Possible causes:**

1. **Refresh in progress during shutdown**

   ```
   Goroutine might be in the middle of API call
   Solution: Wait for current API call to timeout (max 30s)
   ```

2. **Network issue**

   ```
   API call hanging due to network problems
   Solution: Force cancellation (already handled automatically)
   ```

### Problem: Getting logged out unexpectedly

**Symptoms:**

- User sees login screen randomly
- Logs show "Forcing logout due to token refresh issues"

**Possible causes:**

1. **Network connectivity issues**

   ```
   Check logs for repeated: "Token refresh failed"
   Solution: Check internet connection, backend availability
   ```

2. **Backend API down**

   ```
   All refresh attempts failing
   Solution: Check backend service status
   ```

3. **Refresh token expired**

   ```
   Backend returns 401 on refresh
   Solution: User needs to log in again (this is expected)
   ```

### Problem: High CPU/memory usage

**Symptoms:**

- App using lots of resources
- Multiple token managers running

**Diagnosis:**

```bash
# Check goroutines (URL quoted so the shell does not interpret the "?")
curl 'http://localhost:34115/debug/pprof/goroutine?debug=1'

# Look for multiple "refreshLoop" goroutines
```

**Possible causes:**

1. **Token manager not stopping on logout**

   ```
   Check logs for missing: "Token manager stopped gracefully"
   Solution: Bug in stop logic (report issue)
   ```

2. **Multiple Start() calls**

   ```
   Should not happen (atomic bool prevents this)
   Solution: Report issue with reproduction steps
   ```

---

## Examples

### Example 1: Adding Custom Logging

Want to know exactly when refresh happens?

```go
// In tokenmanager/manager.go, modify checkAndRefresh():
func (m *Manager) checkAndRefresh() error {
    // ... existing code ...

    // Before refresh
    m.logger.Info("REFRESH STARTING",
        zap.Time("now", time.Now()),
        zap.Time("token_expires_at", sess.ExpiresAt))

    if err := m.client.RefreshToken(ctx); err != nil {
        // Log failure details
        m.logger.Error("REFRESH FAILED",
            zap.Error(err),
            zap.String("error_type", fmt.Sprintf("%T", err)))
        return err
    }

    // After refresh
    m.logger.Info("REFRESH COMPLETED",
        zap.Time("completion_time", time.Now()))

    return nil
}
```

### Example 2: Custom Failure Callback

Want to notify UI when logout happens?

```go
// Add callback to Manager struct:
type Manager struct {
    // ... existing fields ...
    onForceLogout func(reason string) // NEW
}

// In checkAndRefresh():
if m.consecutiveFailures >= m.config.MaxConsecutiveFailures {
    reason := fmt.Sprintf("%d consecutive refresh failures", m.consecutiveFailures)
    if m.onForceLogout != nil {
        m.onForceLogout(reason) // Notify callback
    }
    return m.forceLogout()
}

// In Application, set callback:
func (a *Application) Startup(ctx context.Context) {
    // ... existing code ...

    // Set callback to emit Wails event.
    // Note: onForceLogout is unexported, so this assignment only compiles
    // inside the tokenmanager package; from another package, add a setter.
    a.tokenManager.onForceLogout = func(reason string) {
        runtime.EventsEmit(a.ctx, "auth:logged-out", reason)
    }
}
```

### Example 3: Metrics Collection

Want to track refresh statistics?

```go
type RefreshMetrics struct {
    TotalRefreshes      int64
    SuccessfulRefreshes int64
    FailedRefreshes     int64
    LastRefreshTime     time.Time
}

// Add to Manager:
type Manager struct {
    // ... existing fields ...
    metrics   RefreshMetrics
    metricsMu sync.Mutex
}

// In checkAndRefresh():
if err := m.client.RefreshToken(ctx); err != nil {
    m.metricsMu.Lock()
    m.metrics.TotalRefreshes++
    m.metrics.FailedRefreshes++
    m.metricsMu.Unlock()
    return err
}

m.metricsMu.Lock()
m.metrics.TotalRefreshes++
m.metrics.SuccessfulRefreshes++
m.metrics.LastRefreshTime = time.Now()
m.metricsMu.Unlock()

// Export metrics via Wails:
func (a *Application) GetRefreshMetrics() map[string]interface{} {
    // Hold the lock while reading so the snapshot is race-free.
    // Note: these fields are unexported; from another package, expose an
    // accessor method on Manager instead of reading them directly.
    a.tokenManager.metricsMu.Lock()
    defer a.tokenManager.metricsMu.Unlock()

    return map[string]interface{}{
        "total":      a.tokenManager.metrics.TotalRefreshes,
        "successful": a.tokenManager.metrics.SuccessfulRefreshes,
        "failed":     a.tokenManager.metrics.FailedRefreshes,
    }
}
```

---

## Summary for Junior Developers

### Key Concepts to Remember

1. **Goroutines are lightweight background tasks**
   - They run concurrently with your main app
   - Need careful management (start/stop)

2. **Channels are for communication**
   - `close(stopCh)` = "Please stop"
   - `<-stoppedCh` = "I confirm I stopped"

3. **Mutexes prevent race conditions**
   - Lock before accessing shared data
   - Always defer unlock

4. **Atomic operations are thread-safe**
   - Use for simple flags
   - No mutex needed

5. **Context carries deadlines**
   - Respect timeouts
   - Use for cancellation

### What NOT to Do

❌ **Don't call Start() in a loop**

```go
// Bad!
for {
    tokenManager.Start() // No-op while already running, so this just burns CPU
}
```

❌ **Don't forget to Stop()**

```go
// Bad!
func Logout() {
    authService.Logout() // Token manager still running!
}
```

❌ **Don't block on Stop() without timeout**

```go
// Bad!
tokenManager.Stop(context.Background()) // Could hang forever!

// Good!
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
tokenManager.Stop(ctx)
```

### Learning Resources

- **Go Concurrency Patterns**: https://go.dev/blog/pipelines
- **Context Package**: https://go.dev/blog/context
- **Sync Package**: https://pkg.go.dev/sync

### Getting Help

If you're stuck:

1. Check the logs (they're very detailed)
2. Look at the troubleshooting section above
3. Ask senior developers for code review
4. File an issue with reproduction steps

---

## Changelog

### v1.0.0 (2025-11-21)

- Initial implementation
- Background refresh every 30 seconds
- Refresh when < 1 minute before expiry
- Graceful shutdown with timeout handling
- Automatic logout after 3 consecutive failures
- Session resume on app restart