@helioelias
Contributor

Add mongo-express for MongoDB
Add rebrow for Redis
Add docker-compose-full, which includes all services in one docker-compose file
Remove and adjust networks in the docker-compose file

…ress and rebrow, tools for maintenance and visualize data
Collaborator

@DavidsonGomes DavidsonGomes left a comment

Very good, let's add this to develop for the next version.

@DavidsonGomes DavidsonGomes changed the base branch from main to develop July 14, 2023 18:18
@DavidsonGomes DavidsonGomes merged commit 0fc160f into EvolutionAPI:develop Jul 14, 2023
ricaelchiquetti pushed a commit to ricaelchiquetti/evolution that referenced this pull request Oct 14, 2025
…baileys_7

fix: adjust the handling of remoteJid in the message
Leader24-AI added a commit to Leader24-TOP-AI/evolution-api that referenced this pull request Nov 21, 2025
Comprehensive optimization of auto-restart and health check system.
Resolved all identified issues including memory leaks, race conditions,
performance bottlenecks, and edge cases.

CRITICAL FIXES (Deploy ASAP):

FIX #1: Safety Timeout Memory Leak
- Save safetyTimeout reference to allow cancellation
- Cancel timeout on connection 'open', logout, and exception
- Prevents accumulation of uncancelled timeouts
- Impact: Eliminates memory leak (100 restarts = 100 leaked timeouts)
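
The pattern in this fix can be sketched as follows. This is a minimal illustration, not the code from the commit: the class `RestartGuard` and its method names are hypothetical, and only the idea of storing the timeout handle so it can be cancelled comes from the commit message.

```typescript
// Hypothetical sketch: keep a handle to the safety timeout so it can be cancelled.
class RestartGuard {
  private safetyTimeout: ReturnType<typeof setTimeout> | null = null;

  // Arm the timeout, cancelling any previous one so handles never accumulate.
  arm(onExpire: () => void, ms: number): void {
    this.cancel();
    this.safetyTimeout = setTimeout(() => {
      this.safetyTimeout = null;
      onExpire();
    }, ms);
  }

  // Intended to be called on connection 'open', on logout, and in catch blocks.
  cancel(): void {
    if (this.safetyTimeout !== null) {
      clearTimeout(this.safetyTimeout);
      this.safetyTimeout = null;
    }
  }

  get isArmed(): boolean {
    return this.safetyTimeout !== null;
  }
}
```

Because `arm()` always cancels the previous handle first, repeated restarts leave at most one pending timeout.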

FIX #2: Max ForceRestart Attempts + Rate Limiting
- Track forceRestartAttempts (max 5)
- Min 5s interval between force restarts
- Send INSTANCE_STUCK webhook when max reached
- Reset counter on successful 'open'
- Impact: Prevents infinite restart loop, alerts unrecoverable instances
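
One way to combine the attempt cap with the minimum interval is sketched below. The class `ForceRestartLimiter` and the `RestartDecision` values are hypothetical names; only the limits (5 attempts, 5s spacing, reset on 'open') are taken from the commit message. The clock is passed in so the logic can be checked without real timers.

```typescript
// Hypothetical sketch of capped, rate-limited force restarts.
type RestartDecision = "restart" | "too_soon" | "stuck";

class ForceRestartLimiter {
  private attempts = 0;
  private lastAttemptAt = -Infinity;
  private maxAttempts: number;
  private minIntervalMs: number;

  constructor(maxAttempts: number = 5, minIntervalMs: number = 5000) {
    this.maxAttempts = maxAttempts;
    this.minIntervalMs = minIntervalMs;
  }

  // Decide whether a force restart may run at time `now` (milliseconds).
  tryRestart(now: number): RestartDecision {
    // Cap reached: the caller would send the INSTANCE_STUCK webhook here.
    if (this.attempts >= this.maxAttempts) return "stuck";
    if (now - this.lastAttemptAt < this.minIntervalMs) return "too_soon";
    this.attempts += 1;
    this.lastAttemptAt = now;
    return "restart";
  }

  // Intended to be called when the connection reaches 'open'.
  reset(): void {
    this.attempts = 0;
    this.lastAttemptAt = -Infinity;
  }
}
```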

FIX #3: Database Fallback in PerformHealthCheck
- Wrap DB query in try-catch
- Safe fallback: skip force restart if DB down
- Use cached ownerJid when available
- Impact: System continues functioning with DB issues
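
A rough sketch of the fallback follows. The function `decideHealthAction`, its parameters, and the rule that an instance with an ownerJid but no connection is a restart candidate are all assumptions made for illustration; the commit message only states that the DB query is wrapped in try-catch, that a force restart is skipped when the DB is down, and that a cached ownerJid is preferred.

```typescript
// Hypothetical sketch: decide on a force restart without letting a DB outage throw.
type HealthDecision = "force_restart" | "healthy" | "skip_db_unavailable";

async function decideHealthAction(
  cachedOwnerJid: string | null,
  fetchOwnerJid: () => Promise<string | null>, // stand-in for the real DB query
  isConnected: boolean,
): Promise<HealthDecision> {
  let ownerJid = cachedOwnerJid;
  if (ownerJid === null) {
    try {
      ownerJid = await fetchOwnerJid();
    } catch {
      // Safe fallback: without DB data, do not risk a force restart.
      return "skip_db_unavailable";
    }
  }
  // Assumed rule: an instance that has an owner but is not connected needs a restart.
  if (ownerJid !== null && !isConnected) return "force_restart";
  return "healthy";
}
```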

HIGH PRIORITY FIXES:

FIX #4: Health Check Jitter (Anti-Thundering Herd)
- Random jitter ±10s on health check interval
- Distributes load over 50-70s window instead of 60s spike
- Impact: Prevents 100 instances all checking simultaneously
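
The jitter arithmetic can be sketched like this. The helper `nextHealthCheckDelay` is a hypothetical name; the 60s base and ±10s range come from the commit message, and the random source is injectable so the bounds are easy to verify.

```typescript
// Hypothetical helper: base interval plus uniform jitter in [-jitterMs, +jitterMs].
function nextHealthCheckDelay(
  baseMs: number = 60000,
  jitterMs: number = 10000,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * jitterMs;
  return Math.round(baseMs + offset);
}
```

With the defaults, every returned delay falls between 50000 and 70000 ms, so instances that started together drift apart instead of checking at the same moment.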

FIX #5: Stop Health Check During Connecting
- stopHealthCheck() when entering 'connecting' state
- Avoids wasted resources and potential conflicts
- Impact: Cleaner state transitions, less overhead

FIX #6: Reset ownerJid on Logout
- Update DB to set ownerJid=null on logout
- Allows safe instance name reuse
- Impact: Health check won't trigger on new QR scan for reused name

MEDIUM PRIORITY FIXES:

FIX #7: LoadProxy Mutex
- Simple mutex lock to prevent concurrent loadProxy() calls
- Retry with 100ms delay if lock held
- Impact: Prevents proxy config corruption from race conditions
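
A simple polling lock of the kind described here might look like the sketch below. The class `SimpleMutex` and its methods are hypothetical; the 100ms retry delay is the value named in the commit message. It relies on JavaScript's single-threaded event loop: the check and the assignment in `acquire()` run with no `await` between them, so two callers cannot both take the lock.

```typescript
// Hypothetical sketch of a polling mutex for guarding calls such as loadProxy().
class SimpleMutex {
  private locked = false;

  // Poll until the lock is free, waiting retryDelayMs between attempts.
  async acquire(retryDelayMs: number = 100): Promise<void> {
    while (this.locked) {
      await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
    }
    this.locked = true;
  }

  release(): void {
    this.locked = false;
  }

  // Run a task while holding the lock, releasing it even if the task throws.
  async runExclusive<T>(task: () => Promise<T>, retryDelayMs: number = 100): Promise<T> {
    await this.acquire(retryDelayMs);
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```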

FIX #8: Proxy Test Cache + ownerJid Cache
- Cache proxy test results for 2 minutes
- Cache ownerJid in memory to avoid DB queries
- Impact: Reduces external API calls and DB load by ~90%
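
A time-limited cache like the one described for proxy test results could be sketched as follows. The class `TtlCache` is a hypothetical name and not taken from the commit; the 2-minute lifetime is the only detail from the message. The current time is an optional argument so expiry can be tested deterministically.

```typescript
// Hypothetical sketch of an in-memory cache whose entries expire after ttlMs.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Example: proxy test results kept for 2 minutes.
const proxyTestCache = new TtlCache<boolean>(2 * 60 * 1000);
```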

FIX #9: Await ConnectionUpdate Events
- Add await to connectionUpdate() call in eventHandler
- Sequentializes connection events
- Impact: Prevents race conditions on rapid state changes

FIX #11: Conditional Logging
- Log health check only on state changes or milestones
- Impact: Reduces log spam from 1000 logs/min to ~10 logs/min
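
The gating logic could be sketched as below. The class `HealthLogGate` is hypothetical, and reading "milestones" as "every Nth check" is an assumption; the commit message does not define the term.

```typescript
// Hypothetical sketch: log only when the state changes or on every Nth check.
class HealthLogGate {
  private lastState: string | null = null;
  private checks = 0;
  private milestoneEvery: number;

  constructor(milestoneEvery: number = 100) {
    this.milestoneEvery = milestoneEvery;
  }

  // Returns true when this health check result is worth logging.
  shouldLog(state: string): boolean {
    this.checks += 1;
    const changed = state !== this.lastState;
    this.lastState = state;
    return changed || this.checks % this.milestoneEvery === 0;
  }
}
```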

CONSISTENCY FIXES:

FIX #15: Flag Consistency
- Set isAutoRestartTriggered in forceRestart() (was missing)
- Consistent with autoRestart() behavior
- Impact: Correct flag coordination

TOTALS:
- 2 files modified
- ~180 lines added/modified
- 15 bugs/issues fixed
- 1 CRITICAL memory leak eliminated
- 3 HIGH severity issues resolved
- 9 MEDIUM severity improvements
- 2 LOW priority optimizations

BENEFITS:
- No more permanent deadlocks (30s recovery max)
- No memory leaks from uncancelled timeouts
- Handles DB/Redis failures gracefully
- Scales better with many instances (jitter, cache, rate limiting)
- Comprehensive webhook monitoring for stuck instances
- Alerts when instances are unrecoverable
- Better log management (less spam)
- Production-ready for high-load scenarios
Leader24-AI added a commit to Leader24-TOP-AI/evolution-api that referenced this pull request Nov 24, 2025
…ents permanent stuck

CRITICAL BUG FOUND:
- Instance was stuck in 'connecting' state for 9+ hours this morning
- wasOpenBeforeReconnect flag was lost during forceRestart() safety timeout
- Timer auto-restart couldn't start → permanent stuck state
- Manual server restart required to recover

ROOT CAUSE:
4 locations in code were resetting/losing wasOpenBeforeReconnect flag:
1. forceRestart() safety timeout (line 1338-1342)
2. forceRestart() catch block (line 1359-1362)
3. Health check safety net (line 1051-1054)
4. autoRestart() catch block (line 880-883)

IMPACT:
When these code paths executed, wasOpenBeforeReconnect was reset to false.
Next reconnection attempt → timer check fails → no auto-restart → stuck forever.

SOLUTION:
Add explicit comments in all 4 locations to preserve the flag:
- Safety timeout: Do NOT reset wasOpenBeforeReconnect
- Catch blocks: Do NOT reset wasOpenBeforeReconnect
- Health check: Do NOT reset wasOpenBeforeReconnect

This ensures the flag is ALWAYS preserved across:
- Timeout scenarios
- Exception scenarios
- Safety net scenarios

VERIFICATION:
- Test scenario #2 (408 timeout): ✅ Passed, reconnected in 4s
- Instance recovered immediately after server restart
- Flag preservation logic now consistent across all paths

FILES MODIFIED:
- src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts

FIXES:
- Bug #1: forceRestart() safety timeout preserves flag
- Bug #2: forceRestart() catch preserves flag
- Bug #3: Health check preserves flag
- Bug #4: autoRestart() catch preserves flag

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Leader24-AI added a commit to Leader24-TOP-AI/evolution-api that referenced this pull request Nov 26, 2025
FIX #0: Set wasOpenBeforeReconnect=true in forceRestart() when restarting from 'open' state
- This was the main cause of today's blocking - the flag was being reset in 'open' handler
- Now properly captures state before cleanup to allow auto-restart timer

FIX #1: Add finally blocks to autoRestart() and forceRestart()
- Ensures isRestartInProgress is always reset even on uncaught exceptions
- Prevents deadlock scenarios where flag remains stuck
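
The effect of the `finally` block can be illustrated with the sketch below. The class `RestartController` and the injected `doRestart` callback are hypothetical; only the flag name `isRestartInProgress` and the use of `finally` come from the commit message.

```typescript
// Hypothetical sketch: the in-progress flag is cleared even when the restart throws.
class RestartController {
  isRestartInProgress = false;

  // Returns false if another restart is already running, true once this one completes.
  async autoRestart(doRestart: () => Promise<void>): Promise<boolean> {
    if (this.isRestartInProgress) return false;
    this.isRestartInProgress = true;
    try {
      await doRestart();
      return true;
    } finally {
      // Runs on success and on failure, so the flag can never stay stuck at true.
      this.isRestartInProgress = false;
    }
  }
}
```

Without the `finally`, an exception thrown by `doRestart` would leave the flag set, and every later restart attempt would return early.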

FIX #2: Verify createClient() success
- Throws error if client is null after createClient() completes
- Prevents silent failures that could cause infinite loops

FIX #4: Cancel existing timers in forceRestart()
- Clears connectingTimer and safetyTimeout before setting flags
- Prevents race conditions between timer execution and restart

FIX #6: Prevent infinite loop in safety timeout
- Sets isRestartInProgress=true BEFORE forcing close
- This prevents connectionUpdate('close') from calling connectToWhatsapp()
- Explicitly calls autoRestart() after delay instead of relying on close handler

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>