6.8 KiB
Web Endpoints
Quick Start
Create web endpoint with single decorator:
image = modal.Image.debian_slim().pip_install("fastapi[standard]")
@app.function(image=image)
@modal.fastapi_endpoint()
def hello():
return "Hello world!"
Development and Deployment
Development with modal serve
modal serve server.py
Creates ephemeral app with live-reloading. Changes to endpoints appear almost immediately.
Deployment with modal deploy
modal deploy server.py
Creates persistent endpoint with stable URL.
Simple Endpoints
Query Parameters
@app.function(image=image)
@modal.fastapi_endpoint()
def square(x: int):
return {"square": x**2}
Call with:
curl "https://workspace--app-square.modal.run?x=42"
POST Requests
@app.function(image=image)
@modal.fastapi_endpoint(method="POST")
def square(item: dict):
return {"square": item['x']**2}
Call with:
curl -X POST -H 'Content-Type: application/json' \
--data '{"x": 42}' \
https://workspace--app-square.modal.run
Pydantic Models
from pydantic import BaseModel
class Item(BaseModel):
name: str
qty: int = 42
@app.function()
@modal.fastapi_endpoint(method="POST")
def process(item: Item):
return {"processed": item.name, "quantity": item.qty}
ASGI Apps (FastAPI, Starlette, FastHTML)
Serve full ASGI applications:
image = modal.Image.debian_slim().pip_install("fastapi[standard]")
@app.function(image=image)
@modal.concurrent(max_inputs=100)
@modal.asgi_app()
def fastapi_app():
from fastapi import FastAPI
web_app = FastAPI()
@web_app.get("/")
async def root():
return {"message": "Hello"}
@web_app.post("/echo")
async def echo(request: Request):
body = await request.json()
return body
return web_app
WSGI Apps (Flask, Django)
Serve synchronous web frameworks:
image = modal.Image.debian_slim().pip_install("flask")
@app.function(image=image)
@modal.concurrent(max_inputs=100)
@modal.wsgi_app()
def flask_app():
from flask import Flask, request
web_app = Flask(__name__)
@web_app.post("/echo")
def echo():
return request.json
return web_app
Non-ASGI Web Servers
For frameworks with custom network binding:
@app.function()
@modal.concurrent(max_inputs=100)
@modal.web_server(8000)
def my_server():
import subprocess
# Must bind to 0.0.0.0, not 127.0.0.1
subprocess.Popen("python -m http.server -d / 8000", shell=True)
Streaming Responses
Use FastAPI's StreamingResponse:
import time
def event_generator():
for i in range(10):
yield f"data: event {i}\n\n".encode()
time.sleep(0.5)
@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def stream():
from fastapi.responses import StreamingResponse
return StreamingResponse(
event_generator(),
media_type="text/event-stream"
)
Streaming from Modal Functions
@app.function(gpu="any")
def process_gpu():
for i in range(10):
yield f"data: result {i}\n\n".encode()
time.sleep(1)
@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def hook():
from fastapi.responses import StreamingResponse
return StreamingResponse(
process_gpu.remote_gen(),
media_type="text/event-stream"
)
With .map()
@app.function()
def process_segment(i):
return f"segment {i}\n"
@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def stream_parallel():
from fastapi.responses import StreamingResponse
return StreamingResponse(
process_segment.map(range(10)),
media_type="text/plain"
)
WebSockets
Supported with @web_server, @asgi_app, and @wsgi_app. Maintains single function call per connection. Use with @modal.concurrent for multiple simultaneous connections.
Full WebSocket protocol (RFC 6455) supported. Messages up to 2 MiB each.
Authentication
Proxy Auth Tokens
First-class authentication via Modal:
@app.function()
@modal.fastapi_endpoint()
def protected():
return "authenticated!"
Protect with tokens in settings, pass in headers:
Modal-KeyModal-Secret
Bearer Token Authentication
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
auth_scheme = HTTPBearer()
@app.function(secrets=[modal.Secret.from_name("auth-token")])
@modal.fastapi_endpoint()
async def protected(token: HTTPAuthorizationCredentials = Depends(auth_scheme)):
import os
if token.credentials != os.environ["AUTH_TOKEN"]:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid token"
)
return "success!"
Client IP Address
from fastapi import Request
@app.function()
@modal.fastapi_endpoint()
def get_ip(request: Request):
return f"Your IP: {request.client.host}"
Web Endpoint URLs
Auto-Generated URLs
Format: https://<workspace>--<app>-<function>.modal.run
With environment suffix: https://<workspace>-<suffix>--<app>-<function>.modal.run
Custom Labels
@app.function()
@modal.fastapi_endpoint(label="api")
def handler():
...
# URL: https://workspace--api.modal.run
Programmatic URL Retrieval
@app.function()
@modal.fastapi_endpoint()
def my_endpoint():
url = my_endpoint.get_web_url()
return {"url": url}
# From deployed function
f = modal.Function.from_name("app-name", "my_endpoint")
url = f.get_web_url()
Custom Domains
Available on Team and Enterprise plans:
@app.function()
@modal.fastapi_endpoint(custom_domains=["api.example.com"])
def hello(message: str):
return {"message": f"hello {message}"}
Multiple domains:
@modal.fastapi_endpoint(custom_domains=["api.example.com", "api.example.net"])
Wildcard domains:
@modal.fastapi_endpoint(custom_domains=["*.example.com"])
TLS certificates automatically generated and renewed.
Performance
Cold Starts
First request may experience cold start (few seconds). Modal keeps containers alive for subsequent requests.
Scaling
- Autoscaling based on traffic
- Use
@modal.concurrentfor multiple requests per container - Beyond concurrency limit, additional containers spin up
- Requests queue when at max containers
Rate Limits
Default: 200 requests/second with 5-second burst multiplier
- Excess returns 429 status code
- Contact support to increase limits
Size Limits
- Request body: up to 4 GiB
- Response body: unlimited
- WebSocket messages: up to 2 MiB