I Stopped Self-Hosting AI: Why DeepSeek V4 Pro on Ollama Cloud Is My New Default
The most-said line in my group chats this week was three words: “I miss Fable.”
Not in a nostalgic way. In a “my entire workflow is broken” way.
Fable was the model I used for first-draft generation. Fast, cheap, good enough for 80 percent of the work. Then it vanished. No deprecation warning. No migration path. Just gone.
My first reaction was the same as a lot of people's right now: go local. Buy a GPU, run llama.cpp, never depend on a vendor again. I spent $1,400 on a used RTX 4090. I downloaded 150 GB of model weights. I learned to love the sound of my fans spinning at 80 percent.
For one month, self-hosting worked. Then the novelty wore off.
The 4090 draws 450W under load. My electricity bill went up $35. The 70B models I was running maxed out at 32K context, not enough for full codebase reviews. Batch processing hundreds of documents meant queuing jobs overnight. And when Opus 4.8 dropped with significantly better reasoning, I had no way to access it without going back to the cloud anyway.
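For anyone who wants to sanity-check the power math on their own rig, here is a rough sketch. Only the 450W figure comes from above; the 16 hours a day under load and the $0.16 per kWh rate are placeholder assumptions, so swap in your own numbers. The function name is made up for illustration.

```python
def monthly_energy_cost(watts: float, hours_per_day: float,
                        usd_per_kwh: float, days: int = 30) -> float:
    """Estimate the monthly cost of a device running at a constant power draw."""
    # Convert watts to kilowatts, then multiply by total hours to get kWh.
    kwh = watts / 1000 * hours_per_day * days
    return kwh * usd_per_kwh


# 450 W is the load figure quoted in the post.
# ASSUMPTIONS (not from the post): 16 hours/day under load, $0.16 per kWh.
cost = monthly_energy_cost(watts=450, hours_per_day=16, usd_per_kwh=0.16)
print(f"${cost:.2f} per month")
```

Under those assumed inputs the estimate comes out to about $34.56 a month, in the same ballpark as a $35 bump. A lighter duty cycle or a cheaper rate would pull it down quickly, and this ignores the rest of the machine's draw.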
I was renting infrastructure, not avoiding vendors. The landlord just changed from Anthropic to NVIDIA.
Then I tried DeepSeek V4 Pro on Ollama Cloud. The pricing made me reconsider everything.