Self-hosting
Run useknockout on Modal
The whole stack — API code, model weights, infrastructure config — is MIT licensed. Deploy your own copy to Modal in four commands. You keep the GPU credits, we keep the open source.
1. Prerequisites
- Modal account (free tier is fine; $30/month free credits as of 2025)
- Python 3.10 or newer locally for the modal CLI
- pip install modal in any virtualenv
- A Stripe account if you want to monetize the deployment (optional)
2. Deploy
Modal handles GPU provisioning, autoscaling, and HTTPS endpoints. The repo ships with an app.py that defines the functions, mounts, and image. You don't need to touch it for a stock deploy.
# 1. Clone the repo
git clone https://github.com/useknockout/api
cd api
# 2. Authenticate Modal (one-time, opens browser)
modal token new
# 3. Deploy to your Modal account
modal deploy app.py
# 4. Confirm it's live
curl https://<your-username>--api.modal.run/health
The first deploy builds the image and downloads the BiRefNet weights, which takes about 3 minutes. Subsequent deploys take seconds because Modal caches the image layers.
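The health check in step 4 can also be scripted, which is handy right after a first deploy while the image is still building. The following is a minimal Python sketch, assuming the default URL pattern shown above and assuming that /health answers HTTP 200 once the app is up. The names health_url and wait_until_live are illustrative and not part of the repo.

```python
import time
import urllib.request


def health_url(username: str) -> str:
    """Build the default Modal health URL from the pattern in step 4."""
    return f"https://{username}--api.modal.run/health"


def wait_until_live(url: str, attempts: int = 10, delay_s: float = 15.0) -> bool:
    """Poll `url` until it answers HTTP 200, or give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            # Generous timeout: a cold container may take over a minute to answer.
            with urllib.request.urlopen(url, timeout=120) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers URLError, HTTPError, and timeouts: not reachable yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling wait_until_live(health_url("your-username")) after modal deploy returns True as soon as the endpoint responds, and False if it never does within the retry budget.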
3. Configure secrets (optional)
Stock app.py works with no secrets — it accepts unauthenticated requests. To gate access by token, point at your own Supabase tokens table, or report metered usage to Stripe, set these:
# Set via Modal dashboard or CLI. The deploy reads these at runtime.
modal secret create useknockout-secrets \
KNOCKOUT_ADMIN_TOKEN=<random 32 chars> \
SUPABASE_URL=https://<project>.supabase.co \
SUPABASE_SERVICE_ROLE_KEY=<your service role key> \
STRIPE_SECRET_KEY=sk_live_... \
STRIPE_METER_EVENT_NAME=images.processed
4. Cost math
| GPU | Cost / hr | Throughput | Cost / image |
|---|---|---|---|
| L4 | $0.80 | 5 img/sec | ~$0.000044 |
| A10G | $1.10 | 8 img/sec | ~$0.000038 |
| A100-40GB | $3.10 | 18 img/sec | ~$0.000048 |
L4 is the recommended starting point: lowest hourly rate, per-image cost close to the A10G's (the cheapest per image in the table above), generous Modal free tier, and Modal autoscales to zero when idle so you don't pay for empty containers. Cold start is 60–90 seconds while BiRefNet, Swin2SR, and GFPGAN weights load into VRAM. Production workloads should pin keep_warm=1 on the Modal function decorator (recent Modal releases name this parameter min_containers) to eliminate cold starts; the warm L4 container costs ~$0.80/hr but cuts latency back to 200ms.
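The per-image figures in the table are the hourly rate divided by the number of images processed in an hour. The sketch below reproduces that arithmetic in Python, assuming the GPU is fully utilized for the whole hour (real traffic is usually burstier, which raises the effective cost). The 30-day month used for the warm-container estimate is also an assumption, and the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# (hourly rate in USD, throughput in images/sec), copied from the table above
GPUS = {
    "L4": (0.80, 5),
    "A10G": (1.10, 8),
    "A100-40GB": (3.10, 18),
}


def cost_per_image(rate_per_hr: float, imgs_per_sec: float) -> float:
    """USD per image for a GPU that is busy the whole hour."""
    return rate_per_hr / (imgs_per_sec * SECONDS_PER_HOUR)


def warm_cost_per_month(rate_per_hr: float, days: int = 30) -> float:
    """USD to keep one container warm around the clock for `days` days."""
    return rate_per_hr * 24 * days


for name, (rate, throughput) in GPUS.items():
    print(f"{name}: ${cost_per_image(rate, throughput):.6f} per image")

print(f"One warm L4 for 30 days: ${warm_cost_per_month(0.80):.2f}")
```

Under these assumptions, a single always-warm L4 comes to roughly $576 over 30 days before any requests are served, which is the figure to weigh against the 60–90 second cold start.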
5. Custom domain (optional)
Modal hands you a default URL like https://<username>--api.modal.run. To use your own domain (e.g. api.yourcompany.com):
- Modal dashboard → Settings → Custom domains → Add api.yourcompany.com
- Modal returns a target hostname; add a CNAME at your DNS provider
- Wait ~5 minutes for cert provisioning
6. Updating
Pull the latest changes and redeploy:
git pull
modal deploy app.py
Modal does a rolling redeploy with zero downtime. Old containers drain gracefully.