Self-hosting

Run useknockout on Modal

The whole stack (API code, model weights, infrastructure config) is MIT-licensed. Deploy your own copy to Modal in four commands. You keep the GPU credits, we keep the open source.

Modal's free tier covers ~50,000 image-equivalents per month before billing kicks in. Plenty of runway to ship a side project or evaluate the platform.

1. Prerequisites

Modal account (free tier is fine; $30/month free credits as of 2025)
Python 3.10 or newer locally, for the modal CLI
The modal package (pip install modal in any virtualenv)
A Stripe account if you want to monetize the deployment (optional)

2. Deploy

Modal handles GPU provisioning, autoscaling, and HTTPS endpoints. The repo ships with an app.py that defines the functions, mounts, and image. You don't need to touch it for a stock deploy.

# 1. Clone the repo
git clone https://github.com/useknockout/api
cd api

# 2. Authenticate Modal (one-time, opens browser)
modal token new

# 3. Deploy to your Modal account
modal deploy app.py

# 4. Confirm it's live
curl https://<your-username>--api.modal.run/health

The first deploy builds the image and downloads the BiRefNet weights, which takes ~3 minutes. Subsequent deploys take seconds because Modal caches the image layers.
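Because the first build takes a few minutes, it can be handy to poll the endpoint instead of retrying curl by hand. The following is a minimal sketch, assuming /health answers with HTTP 200 once the app is ready (this page only shows the URL, not the response). The function names and retry defaults are illustrative and not part of the repo.

```python
import time
import urllib.request


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200.

    Assumption: the deployment signals readiness with a 200 on /health.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts all derive from OSError.
        return False


def wait_until_healthy(url, attempts=30, delay=10.0, probe=is_healthy, sleep=time.sleep):
    """Call `probe(url)` up to `attempts` times, sleeping `delay` seconds between tries.

    Returns True as soon as one probe succeeds, False if every attempt fails.
    `probe` and `sleep` are injectable so the loop can be exercised offline.
    """
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

To use it, call wait_until_healthy with your own endpoint, for example the URL from step 4. With the defaults it gives up after roughly five minutes, which leaves some slack over the ~3 minute first build.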

3. Configure secrets (optional)

The stock app.py works with no secrets and accepts unauthenticated requests. To gate access by token, point at your own Supabase tokens table, or report metered usage to Stripe, set these:

# Set via Modal dashboard or CLI. The deploy reads these at runtime.

modal secret create useknockout-secrets \
  KNOCKOUT_ADMIN_TOKEN=<random 32 chars> \
  SUPABASE_URL=https://<project>.supabase.co \
  SUPABASE_SERVICE_ROLE_KEY=<your service role key> \
  STRIPE_SECRET_KEY=sk_live_... \
  STRIPE_METER_EVENT_NAME=images.processed
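One way to produce the 32-character admin token and to check your values before running the command above is sketched below. The variable names come from the command above. The helper names (make_admin_token, missing_secrets) are hypothetical, and nothing here reflects how app.py itself reads the secrets.

```python
import secrets

# Variable names as listed in the `modal secret create` command above.
SECRET_NAMES = [
    "KNOCKOUT_ADMIN_TOKEN",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "STRIPE_SECRET_KEY",
    "STRIPE_METER_EVENT_NAME",
]


def make_admin_token(n_chars: int = 32) -> str:
    """Return a cryptographically random hex string of exactly n_chars characters."""
    if n_chars <= 0 or n_chars % 2:
        raise ValueError("n_chars must be a positive even number")
    # token_hex(n) yields 2*n hex characters.
    return secrets.token_hex(n_chars // 2)


def missing_secrets(values: dict) -> list:
    """Return the names from SECRET_NAMES that are absent or empty in `values`."""
    return [name for name in SECRET_NAMES if not values.get(name)]


if missing_secrets({"KNOCKOUT_ADMIN_TOKEN": make_admin_token()}):
    print("Some optional secrets are not set yet.")
```

Since every secret is optional, a non-empty result from missing_secrets is a reminder and not an error: set only the ones for the features you want.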

4. Cost math

GPU	Cost / hr	Throughput	Cost / image
L4	$0.80	5 img/sec	~$0.000044
A10G	$1.10	8 img/sec	~$0.000038
A100-40GB	$3.10	18 img/sec	~$0.000048

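L4 is the recommended starting point: it has the lowest hourly rate, it fits comfortably within Modal's free tier, and Modal autoscales to zero when idle so you don't pay for empty containers. The per-image figures above assume a fully utilized GPU; at sustained full throughput the A10G is slightly cheaper per image. Cold start is 60–90 seconds while BiRefNet, Swin2SR, and GFPGAN weights load into VRAM. Production workloads should pin keep_warm=1 on the Modal function decorator (recent Modal releases call this parameter min_containers) to eliminate cold starts; the warm L4 container costs ~$0.80/hr but brings latency back down to ~200 ms.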
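The arithmetic behind the table can be reproduced in a few lines. This sketch uses only the prices and throughputs listed above and assumes the GPU is busy for the whole hour. The 30-day month used for the warm-container estimate is also an assumption.

```python
# (price in USD per hour, throughput in images per second), copied from the table.
GPUS = {
    "L4": (0.80, 5),
    "A10G": (1.10, 8),
    "A100-40GB": (3.10, 18),
}


def cost_per_image(price_per_hour: float, images_per_second: float) -> float:
    """USD per image at full utilization: hourly price divided by images per hour."""
    return price_per_hour / (images_per_second * 3600)


def warm_container_monthly_cost(price_per_hour: float, days: int = 30) -> float:
    """USD per month for one container kept warm around the clock."""
    return price_per_hour * 24 * days


for name, (price, rate) in GPUS.items():
    print(f"{name}: ${cost_per_image(price, rate):.6f} per image")

print(f"Warm L4 for 30 days: ${warm_container_monthly_cost(0.80):.2f}")
```

One warm L4 for a full 30-day month comes to about $576, so keeping a container warm only pays off once cold-start latency actually hurts your traffic.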

5. Custom domain (optional)

Modal hands you a default URL like https://<username>--api.modal.run. To use your own domain (e.g. api.yourcompany.com):

Modal dashboard → Settings → Custom domains → Add api.yourcompany.com
Modal returns a target hostname; add a CNAME at your DNS provider
Wait ~5 minutes for cert provisioning

6. Updating

Pull the latest changes and redeploy:

git pull
modal deploy app.py

Modal does a rolling redeploy with zero downtime. Old containers drain gracefully.
