Copilot AI commented Dec 22, 2025

What is this PR about?

The daily Docker cleanup jobs spawn child processes that were never reaped, so they accumulated as zombie [docker] <defunct> processes. The shell wrapper function executes the Docker command but exits before its child processes have completed.

Modified dockerSafeExec() in /packages/server/src/utils/docker/utils.ts:

# Execute command and capture exit code
${exec}
EXIT_CODE=$?

# Wait for all background processes to complete to prevent zombie processes
wait

echo "Execution completed with exit code: $EXIT_CODE"
exit $EXIT_CODE

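The wait builtin blocks until the shell's background child processes have exited and reaps them before the script exits. The exit code is captured first because wait, called without arguments, would otherwise overwrite $?, so error propagation is preserved. The change affects all cleanup operations: containers, images, builders, and system prune.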
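
The capture-then-wait pattern can be sketched in isolation as follows. This is a minimal illustration, not the actual body of dockerSafeExec(): `run_and_reap` and `demo` are hypothetical names, and `"$@"` stands in for the `${exec}` placeholder that the TypeScript template interpolates.

```shell
#!/bin/sh
# Hypothetical helper illustrating the capture-then-wait pattern.
run_and_reap() {
  # Run the command passed as arguments and remember its exit status.
  "$@"
  status=$?

  # Block until every background job started by this shell has exited,
  # so none is left unreaped when the shell itself exits.
  wait

  return "$status"
}

# Example command (assumption): starts a background job, then fails with status 3.
demo() {
  sleep 1 &
  return 3
}

rc=0
run_and_reap demo || rc=$?
echo "exit code: $rc"   # prints: exit code: 3
```

Note that `wait` only covers children that the shell itself started. Processes spawned by a grandchild, or by the Node.js parent directly, are outside its reach and must be reaped by their own parent.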

Checklist

Before submitting this PR, please make sure that:

Screenshots (if applicable)

N/A - Internal process management fix

Original prompt

This section details the original issue you should resolve

<issue_title>Zombie processes from failed log cleanup</issue_title>
<issue_description>### To Reproduce

  1. Install Dokploy v0.26.0 on a fresh Ubuntu 22.04 server
  2. Deploy any application using Dokploy (in my case, multiple Node.js APIs)
  3. Configure Traefik routing for the applications
  4. Wait for the daily cleanup job to run (occurs at midnight, 12:00 AM)
  5. Check for zombie processes using: ps aux | grep 'Z' or ps aux | grep defunct
  6. Observe accumulating zombie processes ([docker] and [grep]) that never get cleaned up
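
The check in step 5 can be made more precise. `grep 'Z'` matches any line that contains a capital Z, including unrelated command names and the grep process itself, so the sketch below filters on the process state column instead. `filter_zombies` is a hypothetical helper name, and it assumes input in the `pid ppid stat comm` column order; the PIDs 13001 and 13002 in the sample are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose STAT column (3rd field) starts with "Z".
# Expects input in the column order pid, ppid, stat, comm.
filter_zombies() {
  awk '$3 ~ /^Z/ { print }'
}

# Sample input (illustrative data, not taken from a real system).
printf '%s\n' \
  '12792 12556 Ssl node' \
  '13001 12792 Z docker' \
  '13002 12792 Z grep' | filter_zombies

# On a live system (the trailing "=" suppresses each column header):
#   ps -eo pid=,ppid=,stat=,comm= | filter_zombies
```

Keeping the parent PID in the output makes it easy to confirm which process is failing to reap its children.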

Current vs. Expected behavior

Current behavior:

  • Dokploy's daily cleanup job fails with error: tail: cannot open '/etc/dokploy/traefik/dynamic/access.log' for reading: No such file or directory
  • The cleanup job attempts to execute: tail -n 1000 /etc/dokploy/traefik/dynamic/access.log > /etc/dokploy/traefik/dynamic/access.log.tmp && mv /etc/dokploy/traefik/dynamic/access.log.tmp /etc/dokploy/traefik/dynamic/access.log
  • When this command fails, child processes (tail, grep, docker) are spawned but not properly reaped by the parent Node.js process
  • These become zombie processes with status Z and remain indefinitely
  • Currently have 113 zombie processes accumulated since Dec 15
  • The zombies are all children of the Dokploy containerd-shim process (PID 12556)

Expected behavior:

  • The cleanup job should either:
    1. Check if the log file exists before attempting to truncate it, OR
    2. Create the log file if it doesn't exist, OR
    3. Gracefully handle the missing file without creating zombies
  • Child processes should be properly reaped even when commands fail
  • No zombie processes should accumulate over time
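
The first two options can be sketched as a guarded truncation. This is an illustration only, not Dokploy's actual cleanup code: `trim_log` is a hypothetical name, and creating an empty file when the log is missing is one possible policy among several.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last N lines of a log file.
# If the file is missing, create it empty instead of failing.
trim_log() {
  log_file=$1
  keep_lines=$2
  tmp_file="$log_file.tmp"

  if [ ! -f "$log_file" ]; then
    : > "$log_file"
    return 0
  fi

  # Write the tail to a temporary file, then replace the original.
  if tail -n "$keep_lines" "$log_file" > "$tmp_file" && mv "$tmp_file" "$log_file"; then
    return 0
  fi

  # Either step failed: do not leave a stray temporary file behind.
  rm -f "$tmp_file"
  return 1
}

# Usage with the path and line count from the failing command:
#   trim_log /etc/dokploy/traefik/dynamic/access.log 1000
```

In the original command the shell sets up the `>` redirection before `tail` runs, so the `.tmp` file is created even when `tail` fails. That is consistent with the zero-byte access.log.tmp in the directory listing. This sketch addresses only the failing command; reaping child processes is a separate concern.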

Docker logs showing the error:

Error during log cleanup: Error [ExecError]: Command execution failed: Command failed: tail -n 1000 /etc/dokploy/traefik/dynamic/access.log > /etc/dokploy/traefik/dynamic/access.log.tmp && mv /etc/dokploy/traefik/dynamic/access.log.tmp /etc/dokploy/traefik/dynamic/access.log
tail: cannot open '/etc/dokploy/traefik/dynamic/access.log' for reading: No such file or directory

Provide environment information

Operating System:
  OS: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
  Kernel: 5.15.0-157-generic
  Arch: x86_64

Dokploy version: v0.26.0

Docker version: 28.5.0, build 887030f (containerd v2.2.0)

VPS Provider: Hetzner

Applications/services deployed:
  - Multiple Node.js APIs (Express/Next.js)
  - Using Traefik for routing

Which area(s) are affected? (Select all that apply)

Traefik

Are you deploying the applications where Dokploy is installed or on a remote server?

Same server where Dokploy is installed

Additional context

Directory listing of /etc/dokploy/traefik/dynamic/:

-rw-r--r-- 1 root root    0 Dec 22 01:00 access.log.tmp
-rw------- 1 root root 3457 Dec  9 15:15 acme.json
drwxr-xr-x 5 root root 4096 Dec 12 15:24 certificates
-rw-r--r-- 1 root root  483 Dec 15 13:39 dokploy.yml
-rw-r--r-- 1 root root 1125 Dec 15 12:33 *redacted*-api-dxra6m.yml
-rw-r--r-- 1 root root 1147 Dec 18 14:12 *redacted*-api-xrzinz.yml
-rw-r--r-- 1 root root  112 Dec 18 13:40 middlewares.yml
-rw-r--r-- 1 root root 1221 Dec 12 15:25 *redacted*-api-a5zlvt.yml

Note: The access.log file is missing, but access.log.tmp exists (created at 01:00, when cleanup runs).

Process tree showing zombie accumulation:
All 113 zombies are children of the Dokploy container's containerd-shim:

root    12556  /usr/bin/containerd-shim-runc-v2 -namespace moby -id 845d81f9...
root    12792  node -r dotenv/config dist/server.mjs
  \_ [docker] <defunct>  (x56)
  \_ [grep] <defunct>    (x56)

Will you send a PR to fix it?

Maybe, need help</issue_description>

Comments on the Issue (you are @copilot in this section)


Co-authored-by: Siumauricio <47042324+Siumauricio@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix zombie processes from failed log cleanup" to "Fix zombie processes from Docker cleanup operations" on Dec 22, 2025
Copilot AI requested a review from Siumauricio December 22, 2025 16:59

Development

Successfully merging this pull request may close these issues.

Zombie processes from failed log cleanup
