
# OpenAI GPT-5 Advanced Features

Advanced settings and multimodal features for GPT-5 models.

## Parameter Settings

```python
from langchain_openai import ChatOpenAI

# Note: reasoning-focused GPT-5 variants may reject non-default sampling
# parameters; verify against the current API docs before relying on them.
llm = ChatOpenAI(
    model="gpt-5",
    temperature=0.7,          # Creativity (0.0-2.0)
    max_tokens=128000,        # Max output (GPT-5: 128K)
    top_p=0.9,                # Diversity
    frequency_penalty=0.0,    # Repetition penalty
    presence_penalty=0.0,     # Topic diversity
)

# GPT-5 Pro (larger max output)
llm_pro = ChatOpenAI(
    model="gpt-5-pro",
    max_tokens=272000,        # GPT-5 Pro: 272K
)
```

## Context Window and Output Limits

| Model | Context Window | Max Output Tokens |
|---|---|---|
| gpt-5 | 400,000 (API) | 128,000 |
| gpt-5-mini | 400,000 (API) | 128,000 |
| gpt-5-nano | 400,000 (API) | 128,000 |
| gpt-5-pro | 400,000 | 272,000 |
| gpt-5.1 | 128,000 (ChatGPT) / 400,000 (API) | 128,000 |
| gpt-5.1-codex | 400,000 | 128,000 |

Note: Context window is the combined length of input + output.
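
Since input and output share the window, it helps to estimate the prompt's token count before choosing an output budget. A minimal sketch using tiktoken's `o200k_base` encoding (an assumption; GPT-5's exact tokenizer may differ):

```python
import tiktoken

# Assumption: o200k_base approximates the GPT-5 tokenizer closely enough for budgeting
enc = tiktoken.get_encoding("o200k_base")

CONTEXT_WINDOW = 400_000  # gpt-5 via the API
MAX_OUTPUT = 128_000      # model's hard output cap

prompt = "Please analyze the following document: ..."
prompt_tokens = len(enc.encode(prompt))

# The output budget is whatever the context window leaves, capped by the model limit
output_budget = min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)
print(f"Prompt: {prompt_tokens} tokens, available output: {output_budget}")
```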

## Vision (Image Processing)

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(model="gpt-5")

message = HumanMessage(
    content=[
        {"type": "text", "text": "What is shown in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/image.jpg",
                "detail": "high"  # "low", "high", "auto"
            }
        }
    ]
)

response = llm.invoke([message])
```
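
A local file can be sent the same way as a base64 data URL, the standard pattern for the `image_url` content type (the file path below is a placeholder):

```python
import base64

from langchain_core.messages import HumanMessage

# Placeholder path; replace with your own image
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this photo."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
    ]
)

response = llm.invoke([message])
```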

## Tool Use (Function Calling)

```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the weather for a location."""
    return f"The weather in {location} is sunny"

@tool
def calculate(expression: str) -> float:
    """Evaluate a math expression."""
    # Warning: eval() on model-generated input is unsafe outside of demos
    return eval(expression)

llm = ChatOpenAI(model="gpt-5")
llm_with_tools = llm.bind_tools([get_weather, calculate])

response = llm_with_tools.invoke("Tell me the weather in Tokyo and 2+2")
print(response.tool_calls)
```
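
`tool_calls` only records the model's requests; your code still has to run the tools and return the results. A minimal round-trip sketch (the `tools_by_name` dispatch dict is our own convention, not a LangChain API):

```python
from langchain_core.messages import HumanMessage, ToolMessage

tools_by_name = {"get_weather": get_weather, "calculate": calculate}

messages = [HumanMessage("Tell me the weather in Tokyo and 2+2")]
ai_msg = llm_with_tools.invoke(messages)
messages.append(ai_msg)

# Run each requested tool and feed the result back as a ToolMessage
for call in ai_msg.tool_calls:
    result = tools_by_name[call["name"]].invoke(call["args"])
    messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))

final = llm_with_tools.invoke(messages)
print(final.content)
```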

### Parallel Tool Calling

```python
@tool
def get_stock_price(symbol: str) -> float:
    """Get a stock price."""
    return 150.25

@tool
def get_company_info(symbol: str) -> dict:
    """Get company information."""
    return {"name": "Apple Inc.", "industry": "Technology"}

llm = ChatOpenAI(model="gpt-5")
llm_with_tools = llm.bind_tools([get_stock_price, get_company_info])

# The model can request multiple tool calls in a single response
response = llm_with_tools.invoke("Tell me the stock price and company info for AAPL")
```
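
When the model decides both tools are needed, `tool_calls` holds one entry per request, which you can inspect or execute concurrently:

```python
for call in response.tool_calls:
    print(call["name"], call["args"])
# Expected shape (model-dependent):
#   get_stock_price {'symbol': 'AAPL'}
#   get_company_info {'symbol': 'AAPL'}
```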

## Streaming

```python
llm = ChatOpenAI(
    model="gpt-5",
    streaming=True
)

for chunk in llm.stream("Question"):
    print(chunk.content, end="", flush=True)
```
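
The async counterpart is `astream()`, useful inside servers and LangGraph nodes (a minimal sketch):

```python
import asyncio

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5")

async def main() -> None:
    async for chunk in llm.astream("Question"):
        print(chunk.content, end="", flush=True)

asyncio.run(main())
```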

## JSON Mode

```python
llm = ChatOpenAI(
    model="gpt-5",
    model_kwargs={"response_format": {"type": "json_object"}}
)

# JSON mode requires the word "JSON" to appear in the prompt
response = llm.invoke("Return user information in JSON format")
```
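
For typed results, langchain-openai also provides `with_structured_output()`, which binds a schema and parses the reply for you (the `User` model below is a hypothetical example):

```python
from pydantic import BaseModel

from langchain_openai import ChatOpenAI

# Hypothetical schema for illustration
class User(BaseModel):
    name: str
    age: int

llm = ChatOpenAI(model="gpt-5")
structured_llm = llm.with_structured_output(User)

user = structured_llm.invoke("Generate a plausible user profile")
print(user.name, user.age)
```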

## Using GPT-5.1 Adaptive Reasoning

### Instant Mode

Balances speed and accuracy:

```python
llm = ChatOpenAI(model="gpt-5.1-instant")

# Adaptively adjusts reasoning time to the request
response = llm.invoke("Solve this problem...")
```

### Thinking Mode

Deeper reasoning for complex problems:

```python
llm = ChatOpenAI(model="gpt-5.1-thinking")

# Trades longer thinking time for higher accuracy
response = llm.invoke("Complex math problem...")
```
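
For explicit control instead of adaptive behavior, langchain-openai exposes a `reasoning_effort` parameter on reasoning-capable models (supported values are model-dependent; verify against the current docs):

```python
from langchain_openai import ChatOpenAI

# Assumption: the model accepts reasoning_effort ("low" / "medium" / "high")
llm = ChatOpenAI(model="gpt-5.1", reasoning_effort="high")

response = llm.invoke("Complex math problem...")
```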

## Leveraging GPT-5 Pro

Extended reasoning for enterprise and research environments:

```python
llm = ChatOpenAI(
    model="gpt-5-pro",
    temperature=0.3,   # Precision-focused
    max_tokens=272000  # Large output possible
)

# More detailed and reliable responses
response = llm.invoke("Detailed analysis of...")
```

## Code-Specialized Models

```python
# Codex model specialized for coding tasks
llm = ChatOpenAI(model="gpt-5.1-codex")

response = llm.invoke("Implement quicksort in Python")

# Compact version (faster)
llm_mini = ChatOpenAI(model="gpt-5.1-codex-mini")
```

## Tracking Token Usage

```python
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5")

with get_openai_callback() as cb:
    response = llm.invoke("Question")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
```

## Azure OpenAI Service

GPT-5 is also available on Azure:

```python
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-azure-api-key",
    api_version="2024-12-01-preview",
    deployment_name="gpt-5",
    model="gpt-5"
)
```

### Environment Variables (Azure)

```bash
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-azure-api-key"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-5"
```
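
With the endpoint and key exported, the constructor can stay minimal. Note that the deployment-name variable above is read by your own code, not automatically by langchain-openai (an assumption worth verifying for your version):

```python
import os

from langchain_openai import AzureChatOpenAI

# azure_endpoint and api_key are picked up from the environment variables above
llm = AzureChatOpenAI(
    api_version="2024-12-01-preview",
    deployment_name=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)

response = llm.invoke("Question")
```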

## Error Handling

```python
from langchain_openai import ChatOpenAI
from openai import OpenAIError, RateLimitError

try:
    llm = ChatOpenAI(model="gpt-5")
    response = llm.invoke("Question")
except RateLimitError:
    print("Rate limit reached")
except OpenAIError as e:
    print(f"OpenAI error: {e}")
```

## Handling Rate Limits

```python
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

from langchain_openai import ChatOpenAI
from openai import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5),
    # retry= expects a tenacity predicate, not a bare lambda over the exception
    retry=retry_if_exception_type(RateLimitError),
)
def invoke_with_retry(llm, messages):
    return llm.invoke(messages)

llm = ChatOpenAI(model="gpt-5")
response = invoke_with_retry(llm, "Question")
```
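
LangChain's Runnable interface also has built-in retry support, so the tenacity wrapper can often be replaced with `with_retry()`:

```python
from langchain_openai import ChatOpenAI
from openai import RateLimitError

llm = ChatOpenAI(model="gpt-5")

# Exponential backoff with jitter, retrying only on rate-limit errors
retrying_llm = llm.with_retry(
    retry_if_exception_type=(RateLimitError,),
    stop_after_attempt=5,
)

response = retrying_llm.invoke("Question")
```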

## Leveraging Large Context

Utilizing GPT-5's 400K context window:

```python
llm = ChatOpenAI(model="gpt-5")

# Process a large document in a single call
long_document = "..." * 100000  # placeholder for a long document

response = llm.invoke(f"""
Please analyze the following document:

{long_document}

Provide a summary and key points.
""")
```

## Compaction Technology

GPT-5.1 introduces compaction, a technique that lets the model work effectively with much longer contexts:

```python
# Processing very long conversation histories or documents
llm = ChatOpenAI(model="gpt-5.1")

very_long_context = "..."  # placeholder for a very long conversation or document

# Compaction lets the model process the input efficiently
response = llm.invoke(very_long_context)
```