Initial commit

2025-11-30 08:30:10 +08:00
commit f0bd18fb4e
824 changed files with 331919 additions and 0 deletions
--- a/skills/modal/references/scaling.md
+++ b/skills/modal/references/scaling.md
@@ -0,0 +1,230 @@
+# Scaling Out on Modal
+
+## Automatic Autoscaling
+
+Every Modal Function corresponds to an autoscaling pool of containers. Modal's autoscaler:
+- Spins up containers when no capacity available
+- Spins down containers when resources idle
+- Scales to zero by default when no inputs to process
+
+Autoscaling decisions are made quickly and frequently.
+
+## Parallel Execution with `.map()`
+
+Run function repeatedly with different inputs in parallel:
+
+```python
+@app.function()
+def evaluate_model(x):
+    return x ** 2
+
+@app.local_entrypoint()
+def main():
+    inputs = list(range(100))
+    # Runs 100 inputs in parallel across containers
+    for result in evaluate_model.map(inputs):
+        print(result)
+```
+
+### Multiple Arguments with `.starmap()`
+
+For functions with multiple arguments:
+
+```python
+@app.function()
+def add(a, b):
+    return a + b
+
+@app.local_entrypoint()
+def main():
+    results = list(add.starmap([(1, 2), (3, 4)]))
+    # [3, 7]
+```
+
+### Exception Handling
+
+```python
+@app.function()
+def may_fail(a):
+    if a == 2:
+        raise Exception("error")
+    return a ** 2
+
+@app.local_entrypoint()
+def main():
+    results = list(may_fail.map(
+        range(3),
+        return_exceptions=True,
+        wrap_returned_exceptions=False
+    ))
+    # [0, 1, Exception('error')]
+```
+
+## Autoscaling Configuration
+
+Configure autoscaler behavior with parameters:
+
+```python
+@app.function(
+    max_containers=100,      # Upper limit on containers
+    min_containers=2,        # Keep warm even when inactive
+    buffer_containers=5,     # Maintain buffer while active
+    scaledown_window=60,     # Max idle time before scaling down (seconds)
+)
+def my_function():
+    ...
+```
+
+Parameters:
+- **max_containers**: Upper limit on total containers
+- **min_containers**: Minimum kept warm even when inactive
+- **buffer_containers**: Buffer size while function active (additional inputs won't need to queue)
+- **scaledown_window**: Maximum idle duration before scale down (seconds)
+
+Trade-offs:
+- Larger warm pool/buffer → Higher cost, lower latency
+- Longer scaledown window → Less churn for infrequent requests
+
+## Dynamic Autoscaler Updates
+
+Update autoscaler settings without redeployment:
+
+```python
+f = modal.Function.from_name("my-app", "f")
+f.update_autoscaler(max_containers=100)
+```
+
+Settings revert to decorator configuration on next deploy, or are overridden by further updates:
+
+```python
+f.update_autoscaler(min_containers=2, max_containers=10)
+f.update_autoscaler(min_containers=4)  # max_containers=10 still in effect
+```
+
+### Time-Based Scaling
+
+Adjust warm pool based on time of day:
+
+```python
+@app.function()
+def inference_server():
+    ...
+
+@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
+def increase_warm_pool():
+    inference_server.update_autoscaler(min_containers=4)
+
+@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
+def decrease_warm_pool():
+    inference_server.update_autoscaler(min_containers=0)
+```
+
+### For Classes
+
+Update autoscaler for specific parameter instances:
+
+```python
+MyClass = modal.Cls.from_name("my-app", "MyClass")
+obj = MyClass(model_version="3.5")
+obj.update_autoscaler(buffer_containers=2)  # type: ignore
+```
+
+## Input Concurrency
+
+Process multiple inputs per container with `@modal.concurrent`:
+
+```python
+@app.function()
+@modal.concurrent(max_inputs=100)
+def my_function(input: str):
+    # Container can handle up to 100 concurrent inputs
+    ...
+```
+
+Ideal for I/O-bound workloads:
+- Database queries
+- External API requests
+- Remote Modal Function calls
+
+### Concurrency Mechanisms
+
+**Synchronous Functions**: Separate threads (must be thread-safe)
+
+```python
+@app.function()
+@modal.concurrent(max_inputs=10)
+def sync_function():
+    time.sleep(1)  # Must be thread-safe
+```
+
+**Async Functions**: Separate asyncio tasks (must not block event loop)
+
+```python
+@app.function()
+@modal.concurrent(max_inputs=10)
+async def async_function():
+    await asyncio.sleep(1)  # Must not block event loop
+```
+
+### Target vs Max Inputs
+
+```python
+@app.function()
+@modal.concurrent(
+    max_inputs=120,    # Hard limit
+    target_inputs=100  # Autoscaler target
+)
+def my_function(input: str):
+    # Allow 20% burst above target
+    ...
+```
+
+Autoscaler aims for `target_inputs`, but containers can burst to `max_inputs` during scale-up.
+
+## Scaling Limits
+
+Modal enforces limits per function:
+- 2,000 pending inputs (not yet assigned to containers)
+- 25,000 total inputs (running + pending)
+
+For `.spawn()` async jobs: up to 1 million pending inputs.
+
+Exceeding limits returns `Resource Exhausted` error - retry later.
+
+Each `.map()` invocation: max 1,000 concurrent inputs.
+
+## Async Usage
+
+Use async APIs for arbitrary parallel execution patterns:
+
+```python
+@app.function()
+async def async_task(x):
+    await asyncio.sleep(1)
+    return x * 2
+
+@app.local_entrypoint()
+async def main():
+    tasks = [async_task.remote.aio(i) for i in range(100)]
+    results = await asyncio.gather(*tasks)
+```
+
+## Common Gotchas
+
+**Incorrect**: Using Python's builtin map (runs sequentially)
+```python
+# DON'T DO THIS
+results = map(evaluate_model, inputs)
+```
+
+**Incorrect**: Calling function first
+```python
+# DON'T DO THIS
+results = evaluate_model(inputs).map()
+```
+
+**Correct**: Call .map() on Modal function object
+```python
+# DO THIS
+results = evaluate_model.map(inputs)
+```