When AI Agents Eat Your Server: Taming Rogue Processes
I was debugging a CI pipeline when Nginx stopped responding. htop showed the culprit: a Python script spinning at 99.8% CPU, and another process that had swallowed 6GB of RAM before the kernel gave up.
The script was an experimental AI agent stuck in a loop parsing bad JSON — the kind of agent failure I’ve dealt with before. The memory eater was a local LLM session where I forgot to cap the context window.
I didn’t kill them. The agent was 40 minutes deep into a pipeline, and I needed its output. Killing meant starting over. What I needed was a way to contain the problem without throwing away progress.
That night changed how I run every workload on my server. Here’s what I learned.
—
The Problem with AI Workloads
Running LLMs locally sounds great until a 7B model claims 8GB of RAM and asks for more. On a 16GB server hosting Nginx, PostgreSQL, and a CI runner (the same kind of bare-metal setup I’ve written about before), that’s a disaster waiting to happen.
Failure modes are unpredictable: tool call loops, unbounded parsers, misconfigured context. The fix isn’t better code — it’s better containment.
—
Start with Monitoring: Know What You're Eating
Before you can contain a process, you need to know which process is the problem and when it misbehaves. I use Zabbix for this: it graphs CPU, memory, disk, and network per process over time. It’s part of the monitoring and continuous improvement setup I run on all my servers. When Nginx slows down, I check Zabbix and see that a Python agent spiked to 95% CPU at 02:00. No guesswork.
Set up process-level monitoring on your critical hosts: grab the top 5 CPU and top 5 memory consumers, graph them over 24 hours. You’ll spot patterns — a cron job that leaks RAM, an LLM inference that doubles memory with each query. That data tells you which tool to apply and what ceiling to set.
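If you don’t have a monitoring stack yet, a quick snapshot of the top consumers is a reasonable starting point. The sketch below assumes the procps version of `ps` (standard on most Linux distributions); `top_consumers` is a hypothetical helper name, not part of Zabbix or any other tool. Note that `%cpu` from `ps` is averaged over the process lifetime, so it complements historical graphs rather than replacing them.

```bash
#!/usr/bin/env bash
# top_consumers: hypothetical helper that prints the N biggest CPU or
# memory consumers. Assumes procps `ps` (needs the --sort option).
top_consumers() {
  local key=$1 count=${2:-5}
  case "$key" in
    cpu) key='-%cpu' ;;   # leading "-" means descending order
    mem) key='-%mem' ;;
    *) echo "usage: top_consumers cpu|mem [count]" >&2; return 2 ;;
  esac
  # One header line plus the requested number of processes.
  ps -eo pid,user,comm,%cpu,%mem --sort="$key" | head -n "$((count + 1))"
}

top_consumers cpu 5
top_consumers mem 5
```

Run from cron with the output appended to a timestamped log, this gives a crude history until proper graphs are in place.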
Zabbix is just one option (Grafana + Prometheus works too), but having historical graphs is non-negotiable. You can’t fix what you can’t measure.
—
Four Tools That Keep My Server Alive
1. `systemd` CPUQuota + MemoryMax — Set Once, Forget Forever
If your AI service runs as a systemd unit, this is the single highest-impact thing you can do:
    sudo systemctl edit ollama

    [Service]
    CPUQuota=80%
    MemoryMax=8G
    MemoryHigh=6G
Ollama now gets at most 80% of one core. If it exceeds 8GB of RAM, the kernel’s OOM killer terminates it. MemoryHigh starts throttling at 6GB, before the hard limit is reached: graceful degradation instead of sudden death.
I use this for every AI-adjacent service: Ollama, vLLM, ComfyUI, agent runners. No service gets an unlimited sandbox.
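When the same limits go onto several units, typing them into an editor each time gets old. The sketch below generates the drop-in text instead; `limits_dropin` is a hypothetical helper name, and the drop-in file name `limits.conf` is an arbitrary choice. The directory `/etc/systemd/system/<unit>.service.d/` is where systemd reads drop-ins for a unit (it is also where `systemctl edit` writes its `override.conf`).

```bash
#!/usr/bin/env bash
# limits_dropin: hypothetical helper that prints a systemd drop-in
# with the three resource-control settings used above.
limits_dropin() {
  local cpu_quota=$1 mem_max=$2 mem_high=$3
  printf '[Service]\nCPUQuota=%s\nMemoryMax=%s\nMemoryHigh=%s\n' \
    "$cpu_quota" "$mem_max" "$mem_high"
}

# Preview what would be written for Ollama.
limits_dropin 80% 8G 6G

# To apply it (needs root), something along these lines:
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   limits_dropin 80% 8G 6G | sudo tee /etc/systemd/system/ollama.service.d/limits.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
# To confirm that systemd picked the values up:
#   systemctl show ollama -p CPUQuotaPerSecUSec -p MemoryMax -p MemoryHigh
```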
—
2. `cpulimit` — The Emergency Brake
systemd covers planned services. But what about the one-off experiment you launched in a hurry?
    sudo cpulimit --pid 13452 --limit 40 --background
It works by rapidly sending SIGSTOP/SIGCONT, which is crude but effective. Two gotchas: (1) Python scripts doing network I/O can time out while they are paused, so treat it as a temporary fix. (2) By default it only watches the PID you give it, not child processes: my “capped at 30%” agent once spawned four workers and consumed 120% CPU across cores.
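One way around the child-process gotcha is to walk the process tree and attach a limiter to each PID. This is a sketch under assumptions: `descendants` and `limit_tree` are hypothetical helper names, `pgrep` comes from procps, and the `CPULIMIT` variable exists only so the function can be dry-run with `echo`. It takes a snapshot, so workers spawned later are still missed, and every process gets its own cap (four workers at 30% can still add up to 120%).

```bash
#!/usr/bin/env bash
# descendants: print every descendant PID of $1 (children, grandchildren, ...).
descendants() {
  local child
  for child in $(pgrep -P "$1"); do
    echo "$child"
    descendants "$child"
  done
}

# limit_tree: attach one cpulimit instance to a PID and to each descendant
# that exists right now. Set CPULIMIT=echo to preview the commands.
limit_tree() {
  local root=$1 limit=$2 pid
  for pid in "$root" $(descendants "$root"); do
    ${CPULIMIT:-cpulimit} --pid "$pid" --limit "$limit" --background
  done
}

# Dry run against the current shell: prints the cpulimit arguments only.
CPULIMIT=echo limit_tree "$$" 30
```

For a real run, drop the `CPULIMIT=echo` prefix and execute as root, for example `limit_tree 13452 30`. Some cpulimit builds also ship their own option for following children; check `cpulimit --help` on your system before relying on it.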
—
3. `cgroups v2` — Kernel-Level Isolation
When I need true hard limits — the kind a process cannot escape — I use cgroups v2 directly:
    sudo mkdir /sys/fs/cgroup/ai-job-1
    echo "4294967296" | sudo tee /sys/fs/cgroup/ai-job-1/memory.max
    echo "500000 1000000" | sudo tee /sys/fs/cgroup/ai-job-1/cpu.max
    echo 13452 | sudo tee /sys/fs/cgroup/ai-job-1/cgroup.procs
I use this for ad-hoc workloads without systemd units: Python experiments, benchmarks, data pipelines. Manual steps are tedious, so I automated it with a shell wrapper that cleans up on exit.
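The wrapper itself isn’t shown here, so what follows is only a sketch of what such a wrapper might look like. The names `run_capped`, `cpu_max_line`, and `gib_to_bytes` are hypothetical. It assumes cgroup v2 is mounted at `/sys/fs/cgroup`, that the `cpu` and `memory` controllers are enabled for child groups (listed in `/sys/fs/cgroup/cgroup.subtree_control`), and that it runs as root.

```bash
#!/usr/bin/env bash
# cpu_max_line: turn "percent of one core" into a cpu.max line
# ("<quota> <period>" in microseconds), e.g. 50 -> "500000 1000000".
cpu_max_line() {
  local percent=$1 period=1000000
  echo "$(( percent * period / 100 )) $period"
}

# gib_to_bytes: memory.max accepts a byte count.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# run_capped NAME MEM_GIB CPU_PERCENT COMMAND [ARGS...]
# Creates a cgroup, runs the command inside it, removes the cgroup afterwards.
run_capped() {
  local name=$1 mem_gib=$2 cpu_percent=$3
  shift 3
  local cg="/sys/fs/cgroup/$name"
  mkdir "$cg" || return 1
  gib_to_bytes "$mem_gib" > "$cg/memory.max"
  cpu_max_line "$cpu_percent" > "$cg/cpu.max"
  # The subshell moves itself into the cgroup, then becomes the command,
  # so the command and everything it forks start out already capped.
  ( echo "$BASHPID" > "$cg/cgroup.procs" && exec "$@" )
  local status=$?
  # rmdir only succeeds once no processes are left in the group.
  rmdir "$cg"
  return "$status"
}

# Example (as root): 4 GiB and half a core for one experiment
#   run_capped ai-job-1 4 50 python3 experiment.py
```

Because children inherit the cgroup, this also closes the gap that a per-PID limiter leaves open: forked workers share the same ceiling instead of each getting their own.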
—
4. `ulimit` — User-Level Safety Net
This isn’t for individual services — it’s for the user account running experiments. My aiexp user gets these in /etc/security/limits.conf:
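    aiexp hard nproc 200
    aiexp hard as 4194304
    aiexp hard cpu 10

– nproc 200: Prevents agent scripts from fork-bombing the box
– as 4194304: 4GB address space cap (the value is in KB), so a leaking script dies before the server does
– cpu 10: 10 minutes of CPU time per process. limits.conf counts this value in minutes, not seconds, so 600 here would mean 10 hours. The kernel sends SIGXCPU at the soft limit and SIGKILL at the hard limit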
The CPU time limit is gold for stuck AI agents. A reasoning loop hits 10 minutes and dies naturally — no manual intervention.
Caveat: Changes only apply to new login sessions. I tested in the same shell three times, wondering why nothing had changed, before that lesson stuck.
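To see the CPU time limit in action without touching limits.conf or logging in again, the same kernel mechanism can be tried in a throwaway subshell. This is a small sketch: `cpu_limit_demo` is a hypothetical name, and the busy loop stands in for a stuck agent. Unlike limits.conf, the shell built-in `ulimit -t` takes seconds.

```bash
#!/usr/bin/env bash
# cpu_limit_demo: burn CPU inside a subshell that is allowed one CPU-second.
cpu_limit_demo() {
  (
    ulimit -t 1          # soft and hard CPU limit, this subshell only
    while :; do :; done  # stand-in for a reasoning loop that never ends
  )
}

# The kernel kills the subshell after about one second of CPU time.
# A status above 128 means the process died from a signal.
cpu_limit_demo || echo "loop was killed, exit status $?"
```

The limit counts CPU time, not wall-clock time, so a process that mostly sleeps or waits on the network can run far longer than the number suggests.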
—
Putting It All Together
My current setup for AI experimentation:
0. Zabbix monitors all hosts 24/7: when something feels slow, I check the graphs first before touching anything
1. systemd limits on all permanent services (Ollama, agent daemon, monitoring)
2. ulimit protects the experiment user account
3. Ad-hoc experiments launch via a cgroups wrapper
4. cpulimit is the panic button if something still goes sideways
This layered approach means no single failure takes the whole server down. An agent can loop, an LLM can leak, a script can fork — the rest keeps running.
—
The Lesson
Apply limits before the crash, not after. Adding CPUQuota and MemoryMax to a systemd unit takes 30 seconds. Fixing a crashed database during an LLM experiment takes hours.
The discipline of resource containment has made me a better engineer. When you can’t throw infinite RAM at a problem, you write tighter code and understand exactly what your software consumes.
That’s a lesson no cloud credits can teach you.
—
Ever had a runaway process take down something unexpected on your server? What was it, and how did you recover? Drop a comment — I’d love to hear your story.