# Claude Advanced Features

Advanced settings and parameter tuning for Claude models.

## Context Window and Output Limits

| Model | Context Window | Max Output Tokens | Notes |
|--------|-------------------|---------------|------|
| `claude-opus-4-1-20250805` | 200,000 | 32,000 | Highest performance |
| `claude-sonnet-4-5` | 1,000,000 | 64,000 | Latest version |
| `claude-sonnet-4-20250514` | 200,000 (1M beta) | 64,000 | 1M with beta header |
| `claude-haiku-4-5-20251001` | 200,000 | 64,000 | Fast version |

**Note**: To use 1M context with Sonnet 4, a beta header is required.

## Parameter Configuration

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-sonnet-4-5",
    temperature=0.7,          # Creativity (0.0-1.0)
    max_tokens=64000,         # Max output (Sonnet 4.5: 64K)
    top_p=0.9,               # Diversity
    top_k=40,                # Sampling
)

# Opus 4.1 (max output 32K)
llm_opus = ChatAnthropic(
    model="claude-opus-4-1-20250805",
    max_tokens=32000,
)
```

## Using 1M Context

### Sonnet 4.5 (Standard)

```python
llm = ChatAnthropic(
    model="claude-sonnet-4-5",
    max_tokens=64000
)

# Can process 1M tokens of context
long_document = "..." * 500000  # Long document
response = llm.invoke(f"Please analyze the following document:\n\n{long_document}")
```

### Sonnet 4 (Beta Header)

```python
# Enable 1M context with beta header
llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    max_tokens=64000,
    default_headers={
        "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"
    }
)
```

## Streaming

```python
llm = ChatAnthropic(
    model="claude-sonnet-4-5",
    streaming=True
)

for chunk in llm.stream("question"):
    print(chunk.content, end="", flush=True)
```

## Prompt Caching

Cache parts of long prompts for efficiency:

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-sonnet-4-5",
    max_tokens=4096
)

# System prompt for caching
system_prompt = """
You are a professional code reviewer.
Please review according to the following coding guidelines:
[long guidelines...]
"""

# Use cache
response = llm.invoke(
    [
        {"role": "system", "content": system_prompt, "cache_control": {"type": "ephemeral"}},
        {"role": "user", "content": "Please review this code"}
    ]
)
```

**Cache Benefits**:
- Cost reduction (90% off on cache hits)
- Latency reduction (faster processing on reuse)

## Vision (Image Processing)

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

llm = ChatAnthropic(model="claude-sonnet-4-5")

message = HumanMessage(
    content=[
        {"type": "text", "text": "What's in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/image.jpg"
            }
        }
    ]
)

response = llm.invoke([message])
```

## JSON Mode

When structured output is needed:

```python
llm = ChatAnthropic(
    model="claude-sonnet-4-5",
    model_kwargs={
        "response_format": {"type": "json_object"}
    }
)

response = llm.invoke("Return user information in JSON format")
```

## Token Usage Tracking

```python
from langchain.callbacks import get_openai_callback

llm = ChatAnthropic(model="claude-sonnet-4-5")

with get_openai_callback() as cb:
    response = llm.invoke("question")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
```

## Error Handling

```python
from anthropic import AnthropicError, RateLimitError

try:
    llm = ChatAnthropic(model="claude-sonnet-4-5")
    response = llm.invoke("question")
except RateLimitError:
    print("Rate limit reached")
except AnthropicError as e:
    print(f"Anthropic error: {e}")
```

## Rate Limit Handling

```python
from tenacity import retry, wait_exponential, stop_after_attempt
from anthropic import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    retry=lambda e: isinstance(e, RateLimitError)
)
def invoke_with_retry(llm, messages):
    return llm.invoke(messages)

llm = ChatAnthropic(model="claude-sonnet-4-5")
response = invoke_with_retry(llm, ["question"])
```

## Listing Models

```python
import anthropic
import os

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
models = client.models.list()

for model in models.data:
    print(f"{model.id} - {model.display_name}")
```

## Cost Optimization

### Cost Management by Model Selection

```python
# Low-cost version (simple tasks)
llm_cheap = ChatAnthropic(model="claude-haiku-4-5-20251001")

# Balanced version (general tasks)
llm_balanced = ChatAnthropic(model="claude-sonnet-4-5")

# High-performance version (complex tasks)
llm_powerful = ChatAnthropic(model="claude-opus-4-1-20250805")

# Select based on task
def get_llm_for_task(complexity):
    if complexity == "simple":
        return llm_cheap
    elif complexity == "medium":
        return llm_balanced
    else:
        return llm_powerful
```

### Cost Reduction with Prompt Caching

```python
# Cache long system prompt
system = {"role": "system", "content": long_guidelines, "cache_control": {"type": "ephemeral"}}

# Reuse cache across multiple calls (90% cost reduction)
for user_input in user_inputs:
    response = llm.invoke([system, {"role": "user", "content": user_input}])
```

## Leveraging Large Context

```python
llm = ChatAnthropic(model="claude-sonnet-4-5")

# Process large documents at once (1M token support)
documents = load_large_documents()  # Large document collection

response = llm.invoke(f"""
Please analyze the following multiple documents:

{documents}

Tell me the main themes and conclusions.
""")
```

## Reference Links

- [Claude API Documentation](https://docs.anthropic.com/)
- [Anthropic API Reference](https://docs.anthropic.com/en/api/)
- [Claude Models Overview](https://docs.anthropic.com/en/docs/about-claude/models/overview)
- [Prompt Caching Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)