We Tried Every Container Service. Then We Built Our Own in a Week.

Every user on camelAI gets a persistent computer. A real Linux environment with storage, dependencies, and project state that survives across sessions. When you come back tomorrow, your files are still there. Your databases are still running. Your half-finished app is exactly where you left it.
This is the core promise of our product, and for over 2 months, we couldn't make it work.
We tried Modal. We tried Cloudflare Containers. We tried Sprites. We looked at E2B, Daytona, Vercel Sandbox, and even OpenAI's container tool. None of them fit. So we built our own container service in a week, and it was simpler than any of us expected.
I'm Illiana, and I run camelAI with my co-founders Miguel (CTO) and Isabella (COO). This is the story of how we got here — every wrong turn included.
What We Needed
Our requirements sound simple on paper:
- Persistence. The entire filesystem needs to survive across sessions. Not "persist this one folder" — everything.
- Performance. Our agent installs heavy JavaScript dependencies, runs data analysis, deploys full-stack apps. The filesystem has to be fast and the compute has to keep up.
- Reliability. If a user's container disappears, they lose their work. Unacceptable.
- Programmable from JavaScript. Our backend runs on Cloudflare Workers. Everything is TypeScript. We can't drop into Python to manage containers.
That last requirement eliminated more options than you'd think.
Modal: Fast to Start, Hard to Stay
Modal was the first service we tried. The onboarding experience was smooth — you can get something running quickly. But Modal's interface is a Python library. Our entire backend is TypeScript running on Cloudflare Workers, and introducing Python as a dependency for a critical part of our infrastructure wasn't something we were willing to do.
Beyond the language mismatch, persistence was the real problem. Modal had multiple storage options, each with different tradeoffs, and the one that looked closest to what we needed was labeled beta. For a feature this central to our product — the thing that makes a user's computer feel like their computer — building on a beta storage layer felt like a risk we couldn't take.
We moved on.
Cloudflare Containers + JuiceFS: The Creative Hack
Since our entire stack already runs on Cloudflare, their container service was the natural next choice. Cloudflare Containers were new at the time, and they've improved since, but when we tried them they were designed primarily for ephemeral workloads. Persistence meant checkpointing specific folders at intervals — not the seamless, always-on filesystem we needed.
So we got creative. We'd read a blog post from Fly.io about Sprites that described a storage architecture modeled after JuiceFS — a distributed filesystem that splits file data into chunks on object storage. We figured: what if we ran JuiceFS on Cloudflare Containers, backed by an R2 bucket?
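If you're curious what that experiment looked like in practice, here's a rough sketch, not our exact setup: JuiceFS needs a metadata engine in some form (Redis is assumed here), and the bucket URL, keys, and volume name are all placeholders.

```typescript
// juicefs-r2.ts — sketch of the experiment, not our production config.
// Assumptions: a Redis instance as the JuiceFS metadata engine, and an R2
// bucket reached through its S3-compatible endpoint. Names and keys are
// placeholders.
import { execFileSync } from "node:child_process";

const META_URL = "redis://meta.internal:6379/1"; // JuiceFS metadata engine
const R2_BUCKET =
  "https://workspaces.<account-id>.r2.cloudflarestorage.com"; // placeholder

function run(cmd: string, args: string[]): void {
  execFileSync(cmd, args, { stdio: "inherit" });
}

// One-time: create the filesystem, pointing JuiceFS at R2 via the S3 API.
run("juicefs", [
  "format",
  "--storage", "s3",
  "--bucket", R2_BUCKET,
  "--access-key", "<r2-access-key>", // placeholder
  "--secret-key", "<r2-secret-key>", // placeholder
  META_URL,
  "workspaces",
]);

// On container boot: mount the volume so it behaves like a local directory.
run("juicefs", ["mount", "-d", META_URL, "/mnt/workspace"]);
```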
It worked. Sort of.
JuiceFS is painfully slow for high-volume file operations. When you're installing thousands of JavaScript packages — each one a separate file write — the overhead is brutal. Install times ballooned from low single-digit seconds to minutes.
We fought this. We switched to Yarn Plug'n'Play, which replaces the traditional node_modules tree with zipped package archives, dramatically reducing the number of individual file operations. It helped. But there were still too many places where filesystem performance mattered, and the CPUs on Cloudflare Containers at the time weren't fast enough for our compute-heavy workloads.
Then came the memory leaks. Containers would gradually slow to a halt and crash. Our JuiceFS setup had reliability issues — files sometimes wouldn't persist correctly. The whole thing was getting too complex, with too many failure modes, and we were stacking hacks on top of hacks.
We needed something designed for persistence from the ground up.
Sprites: So Close
We'd actually wanted to try Sprites by Fly.io earlier in the process — we'd been eyeing it since reading the blog post that inspired our JuiceFS attempt — but Sprites was labeled beta, and we'd just been burned building on beta infrastructure with Cloudflare Containers. Once that attempt fell apart, we were desperate enough to try anyway.
Sprites was impressive from the moment we got it running. The machines were fast — responsive within a second, with noticeably better hardware than anything we'd used before. After weeks of sluggish Cloudflare Containers, it felt like night and day.
But Sprites isn't a traditional container service. You can't just hand it a Docker image and spin up a fleet. You spin up a Sprite, install your dependencies inside it, and that single Sprite is now your provisioned machine; there's no image to stamp out copies from. To work around this, we pre-provisioned a pool of Sprites so new users would have one ready the moment they signed up.
Updating software on existing Sprites was worse. We had to iterate through every single Sprite, wake it up, run a migration command, and move to the next one. No image-based deploys. No atomic rollouts.
We dealt with all of that because the performance was genuinely great. And then the containers started vanishing.
You'd have a project you'd been working on for days. You'd log back in, and it would just be gone. Technically, the files weren't lost — the Sprite itself had become unreachable. Sometimes it would come back online much later. Sometimes it wouldn't, at least not within any timeframe we could wait for. In our app, it manifested the same way: your workspace was just gone.
We went to the Sprites forums, found other people reporting the same thing, and realized this wasn't a bug in our implementation — it was simply what beta means. The team at Fly.io was doing ambitious work, but the reliability wasn't there yet for a production use case like ours.
For a product whose entire value proposition is "your files are always there," this was the nail in the coffin.
The Breaking Point
At this point, we were three weeks past the date we'd planned to start onboarding beta users to a stable product. The beta users who were on the unstable build were mostly friends or a few customers we'd accidentally shared a link with. Morale was low.
Miguel, our CTO, describes this as some of the worst debugging days of his career. There was a sense of desperation — we'd tried everything, and nothing was reliable enough.
What happened next is something Miguel tried to do quietly. Without telling me, he went back to Modal. Their newer Volumes V2 storage looked closer to what we needed. He built a working implementation and was literally about to push it to staging for testing.
And then we had The Conversation.
I don't remember exactly how it started, but Miguel mentioned that the Sprites issues were actually service-level downtime, and that his fix was migrating us back to Modal. I did not take this well.
My exact reaction was something like: "SaaS products have never worked for us. Why are we just switching from one SaaS to another? Why aren't we just building this ourselves?"
Miguel was frustrated. Partly because he felt I didn't appreciate how much work goes into container infrastructure, and partly because he wasn't sure he was up for building it from scratch. A lot of mixed emotions.
Then I asked the question that changed everything: "If we had to pivot right now and become Modal — if we were a container service — what would the MVP look like?"
The Light
A few sentences into his explanation, Miguel stopped. It was doable. Obviously doable. He felt silly for not seeing it sooner.
The core insight was: we don't need to build Modal. We don't need multi-region orchestration or elastic scaling or a self-service API. We need containers that are persistent, fast, and reliable for our own product. That's a much smaller problem.
We also had a pile of cloud credits sitting mostly unused — we'd been spending almost entirely on LLM costs. We could provision a large server instance and not worry about optimizing for cost efficiency right away.
What We Actually Built
The stack is almost comically simple:
- A VM instance on a cloud provider (we use Azure because of credits, but any provider works)
- Docker + gVisor for container isolation. gVisor is Google's open-source container runtime (runsc) that puts a sandboxed, user-space kernel between each container and the host, so syscalls hit gVisor's application kernel instead of the host kernel. The main alternative is Firecracker (AWS's microVM engine, the one behind Lambda), but most cloud VMs can't run it because it requires nested virtualization. gVisor has a small performance overhead, but we haven't noticed it in practice.
- A large network-attached disk mounted to the VM, shared across all containers
- XFS with project quotas to give each container a fixed storage allocation on the shared disk. XFS's project quota feature lets you enforce storage limits on directory trees — each container gets up to 100 GB and can't exceed it.
That's it. Each container's home directory lives on the shared disk. Persistence isn't a feature we had to engineer — it's just how files on a disk work. The container starts, mounts its directory, and everything is exactly where the user left it.
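To make that concrete, here's a minimal sketch of how a provisioning step could look, assuming the shared disk is mounted at /data with the prjquota option and gVisor's runsc is already registered as a Docker runtime. The image name, paths, and resource limits are placeholders, not our production values.

```typescript
// provision.ts — sketch only. Assumes: /data is an XFS volume mounted with
// `-o prjquota`, and gVisor's runsc is registered as a Docker runtime
// (e.g. via /etc/docker/daemon.json). Image name and limits are placeholders.
import { execFileSync } from "node:child_process";

const DATA_MOUNT = "/data";             // shared network-attached XFS disk
const QUOTA = "100g";                   // per-container hard storage limit
const IMAGE = "workspace-image:latest"; // hypothetical base image

function run(cmd: string, args: string[]): string {
  return execFileSync(cmd, args, { encoding: "utf8" }).trim();
}

export function provisionWorkspace(userId: string, projectId: number): string {
  const home = `${DATA_MOUNT}/users/${userId}`;

  // 1. Create the user's home directory on the shared disk.
  run("mkdir", ["-p", home]);

  // 2. Tag the directory tree as an XFS project and cap it at 100 GB.
  run("xfs_quota", ["-x", "-c", `project -s -p ${home} ${projectId}`, DATA_MOUNT]);
  run("xfs_quota", ["-x", "-c", `limit -p bhard=${QUOTA} ${projectId}`, DATA_MOUNT]);

  // 3. Start the container under gVisor with that directory as the user's home.
  //    Persistence is nothing more than this bind mount: restart the container
  //    and everything is exactly where it was.
  return run("docker", [
    "run", "-d",
    "--name", `workspace-${userId}`,
    "--runtime=runsc",               // gVisor isolation
    "--memory", "4g", "--cpus", "2", // placeholder limits
    "-v", `${home}:/home/user`,
    IMAGE,
  ]);
}
```

Teardown is just as boring: remove the container and the directory stays on disk, which is the whole point.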
We built it in about a week. Every issue that had been dogging us — the crashes, the vanishing files, the slow filesystems, the memory leaks — was gone. All of them. Immediately.
What This Costs
We get asked about this, so here's the honest answer.
We're running a large instance because we have the credits and we need to support heavy concurrent workloads — our agent runs Claude Code, installs large dependency trees, does intensive data analysis, and deploys full-stack applications.
But if your use case is lighter, the math is forgiving. On a ~$200/month cloud instance, you could comfortably run 50–100 lightweight containers concurrently, or a couple dozen doing heavier work. That works out to a few dollars per lightweight container per month, and the per-container overhead is modest. gVisor, Docker, and XFS are all free and open source.
The Lesson
The lesson is one that experienced engineers know but founders sometimes forget: if you've tried three vendor solutions and none of them fit, you probably have a specific enough problem that you should just build it yourself.
We spent over 2 months evaluating and integrating container services. Each one taught us something — Modal showed us what good onboarding looks like, Cloudflare Containers pushed us to get creative with filesystem optimization, Sprites proved that fast hardware matters. But none of them were designed for our exact shape: always-on, fully persistent, fast, reliable, and programmable from a Cloudflare Workers backend.
The services we tried aren't bad. Modal, Cloudflare Containers, and Sprites are all solving real problems for their target users. E2B, Daytona, and Vercel Sandbox exist for good reasons too. But when you're buying a service, you're buying the lowest common denominator of everyone's requirements — or sometimes something designed for a different problem entirely. The overlap with what you actually need is often smaller than you hope.
Building it ourselves took one week and gave us exactly what we needed. No persistence hacks, no filesystem workarounds, no praying that a beta feature would stabilize. Just a VM, some containers, and a big disk.
Sometimes the simplest solution is the one you build yourself.