## 🚨 CRITICAL GUIDELINES

### Windows File Path Requirements

**MANDATORY: Always Use Backslashes on Windows for File Paths**

When using the Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).

**Examples:**

- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`

This applies to:

- The Edit tool `file_path` parameter
- The Write tool `file_path` parameter
- All file operations on Windows systems

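The rule above can also be automated: Python's `pathlib.PureWindowsPath` normalizes forward slashes into the backslash form Windows tools expect. A minimal sketch (the helper name `to_windows_path` is illustrative, not part of any tool API):

```python
from pathlib import PureWindowsPath

def to_windows_path(path: str) -> str:
    """Normalize a path to backslash form for Windows file tools."""
    return str(PureWindowsPath(path))

print(to_windows_path("D:/repos/project/file.tsx"))  # D:\repos\project\file.tsx
```

`PureWindowsPath` is purely lexical, so it is safe to call on paths that do not exist yet.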
### Documentation Guidelines

**NEVER create new documentation files unless explicitly requested by the user.**

- **Priority**: Update existing README.md files rather than creating new documentation
- **Repository cleanliness**: Keep the repository root clean - only README.md unless the user requests otherwise
- **Style**: Documentation should be concise, direct, and professional - avoid an AI-generated tone
- **User preference**: Only create additional .md files when the user specifically asks for documentation

---

# Azure OpenAI Service - 2025 Models and Features

Knowledge base for Azure OpenAI Service covering the latest 2025 models, including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration.

## Overview

Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.

## Latest Models (2025)

### GPT-5 Series (GA August 2025)

**Registration Required:**

- `gpt-5-pro`: Highest capability, complex reasoning
- `gpt-5`: Balanced performance and cost
- `gpt-5-codex`: Optimized for code generation

**No Registration Required:**

- `gpt-5-mini`: Faster, more affordable
- `gpt-5-nano`: Ultra-fast for simple tasks
- `gpt-5-chat`: Optimized for conversational use

### GPT-4.1 Series

- `gpt-4.1`: 1 million token context window
- `gpt-4.1-mini`: Efficient version with 1M context
- `gpt-4.1-nano`: Fastest variant

**Key Improvements:**

- 1,000,000-token context (vs. 128K in GPT-4 Turbo)
- Better instruction following
- Reduced hallucinations
- Improved multilingual support

### Reasoning Models

**o4-mini**: Lightweight reasoning model

- Faster inference
- Lower cost
- Suitable for structured reasoning tasks

**o3**: Advanced reasoning model

- Complex problem solving
- Mathematical reasoning
- Scientific analysis

**o1**: Original reasoning model

- General-purpose reasoning
- Step-by-step explanations

**o1-mini**: Efficient reasoning

- Balanced cost and performance

### Image Generation

**GPT-image-1 (2025-04-15)**

- DALL-E 3 successor
- Higher-quality images
- Better prompt understanding
- Improved safety filters

### Video Generation

**Sora (2025-05-02)**

- Text-to-video generation
- Realistic and imaginative scenes
- Up to 60 seconds of video
- Multiple camera angles and styles

### Audio Models

**gpt-4o-transcribe**: Speech-to-text powered by GPT-4o

- High-accuracy transcription
- Multiple languages
- Speaker diarization

**gpt-4o-mini-transcribe**: Faster, more affordable transcription

- Good accuracy
- Lower latency
- Cost-effective

## Deploying Azure OpenAI

### Create an Azure OpenAI Resource

```bash
# Create the OpenAI account with a system-assigned identity
az cognitiveservices account create \
  --name myopenai \
  --resource-group MyRG \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --custom-domain myopenai \
  --public-network-access Disabled \
  --assign-identity

# Get the endpoint and key
az cognitiveservices account show \
  --name myopenai \
  --resource-group MyRG \
  --query "properties.endpoint" \
  --output tsv

az cognitiveservices account keys list \
  --name myopenai \
  --resource-group MyRG \
  --query "key1" \
  --output tsv
```

### Deploy a GPT-5 Model

```bash
# Deploy gpt-5
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --model-name gpt-5 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100

# Deploy gpt-5-pro (requires registration)
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5-pro \
  --model-name gpt-5-pro \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50
```

### Deploy Reasoning Models

```bash
# Deploy the o3 reasoning model
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o3-reasoning \
  --model-name o3 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50

# Deploy o4-mini
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o4-mini \
  --model-name o4-mini \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

### Deploy GPT-4.1 with 1M Context

```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-4-1 \
  --model-name gpt-4.1 \
  --model-version latest \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100
```

### Deploy the Image Generation Model

```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name image-gen \
  --model-name gpt-image-1 \
  --model-version 2025-04-15 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10
```

### Deploy Sora Video Generation

```bash
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name sora \
  --model-name sora \
  --model-version 2025-05-02 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 5
```

## Using Azure OpenAI Models

### Python SDK (GPT-5)

```python
from openai import AzureOpenAI
import os

# Initialize the client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-02-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# GPT-5 chat completion
response = client.chat.completions.create(
    model="gpt-5",  # deployment name, not the base model name
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=1000,
    temperature=0.7,
    top_p=0.95
)

print(response.choices[0].message.content)
```

### Python SDK (o3 Reasoning Model)

```python
# o3 reasoning with step-by-step output
response = client.chat.completions.create(
    model="o3-reasoning",  # deployment name
    messages=[
        {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
        {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
    ],
    max_tokens=2000,
    temperature=0.2  # note: some reasoning deployments reject sampling parameters; omit this if the API returns an error
)

print(response.choices[0].message.content)
```

### Python SDK (GPT-4.1 with 1M Context)

```python
# Read a large document
with open("large_document.txt", "r", encoding="utf-8") as f:
    document = f.read()

# GPT-4.1 can handle up to 1M tokens of input
response = client.chat.completions.create(
    model="gpt-4-1",  # deployment name
    messages=[
        {"role": "system", "content": "You are a document analysis expert."},
        {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
    ],
    max_tokens=4000
)

print(response.choices[0].message.content)
```

### Image Generation (GPT-image-1)

```python
# Generate an image with the DALL-E 3 successor
response = client.images.generate(
    model="image-gen",  # deployment name
    prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
    size="1024x1024",
    quality="hd",
    n=1
)

# Note: depending on the model version, the response may contain base64 data
# (b64_json) instead of a URL
image_url = response.data[0].url
print(f"Generated image: {image_url}")
```

### Video Generation (Sora)

```python
# Generate a video with Sora (illustrative; the preview video API surface
# may differ, e.g. an asynchronous job-based endpoint)
response = client.videos.generate(
    model="sora",  # deployment name
    prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
    duration=10,  # seconds
    resolution="1080p",
    fps=30
)

video_url = response.data[0].url
print(f"Generated video: {video_url}")
```

### Audio Transcription

```python
# Transcribe an audio file
with open("meeting_recording.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # deployment name
        file=audio_file,
        language="en",
        response_format="verbose_json"
    )

print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")

# Per-segment timestamps
for segment in response.segments:
    print(f"[{segment.start}s - {segment.end}s] {segment.text}")
```

## Azure AI Foundry Integration

### Model Router (Automatic Model Selection)

```python
from azure.ai.foundry import ModelRouter  # preview SDK; import path may change

# Initialize the model router
router = ModelRouter(
    endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    credential=os.getenv("AZURE_OPENAI_API_KEY")
)

# Automatically select the optimal model for the prompt
response = router.complete(
    prompt="Analyze this complex scientific paper...",
    optimization_goals=["quality", "cost"],
    available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
)

print(f"Selected model: {response.model_used}")
print(f"Response: {response.content}")
print(f"Cost: ${response.cost}")
```

**Benefits:**

- Automatic model selection based on prompt complexity
- Balances quality against cost
- Can reduce costs by up to 40% while maintaining quality

### Agentic Retrieval (Azure AI Search Integration)

```python
import json

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Initialize the search client
search_client = SearchClient(
    endpoint=os.getenv("SEARCH_ENDPOINT"),
    index_name="documents",
    credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)

# Agentic retrieval with Azure OpenAI
messages = [
    {"role": "system", "content": "You have access to a document search system."},
    {"role": "user", "content": "What are the company's revenue projections for Q3?"}
]

response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    tools=[{
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search company documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }],
    tool_choice="auto"
)

# Process tool calls and feed the results back for a final answer
message = response.choices[0].message
if message.tool_calls:
    messages.append(message)
    for tool_call in message.tool_calls:
        if tool_call.function.name == "search_documents":
            query = json.loads(tool_call.function.arguments)["query"]
            # assumes the index has a "content" field
            results = [doc["content"] for doc in search_client.search(query)]
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(results)
            })
    final = client.chat.completions.create(model="gpt-5", messages=messages)
    print(final.choices[0].message.content)
```

**Improvements:**

- 40% better on complex, multi-part questions
- Automatic query decomposition
- Relevance ranking
- Citation generation

### Foundry Observability (Preview)

```python
from azure.ai.foundry import FoundryObservability  # preview SDK; names may change

# Enable observability
observability = FoundryObservability(
    workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
    enable_tracing=True,
    enable_metrics=True
)

# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )

    trace.log_tool_call("search_kb", {"query": "refund policy"})
    trace.log_reasoning_step("Retrieved refund policy document")
    trace.log_token_usage(response.usage.total_tokens)

# View in the Azure AI Foundry portal:
# - End-to-end trace logs
# - Reasoning steps and tool calls
# - Performance metrics
# - Cost analysis
```

## Capacity and Quota Management

### Check Quota

```bash
# List deployments with usage
az cognitiveservices account deployment list \
  --resource-group MyRG \
  --name myopenai \
  --output table

# Check usage metrics
az monitor metrics list \
  --resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --metric "TokenTransaction" \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-31T23:59:59Z \
  --interval PT1H \
  --aggregation Total
```

### Update Capacity

```bash
# Scale up deployment capacity
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 200

# Scale down during off-peak hours
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 50
```

### Request a Quota Increase

1. Navigate to Azure Portal → Azure OpenAI resource
2. Go to the "Quotas" blade
3. Select the model and region
4. Click "Request quota increase"
5. Provide a justification and target capacity

## Security and Networking

### Private Endpoint

```bash
# Create the private endpoint
az network private-endpoint create \
  --name openai-private-endpoint \
  --resource-group MyRG \
  --vnet-name MyVNet \
  --subnet PrivateEndpointSubnet \
  --private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --group-id account \
  --connection-name openai-connection

# Create the private DNS zone
az network private-dns zone create \
  --resource-group MyRG \
  --name privatelink.openai.azure.com

# Link it to the VNet
az network private-dns link vnet create \
  --resource-group MyRG \
  --zone-name privatelink.openai.azure.com \
  --name openai-dns-link \
  --virtual-network MyVNet \
  --registration-enabled false

# Create the DNS zone group
az network private-endpoint dns-zone-group create \
  --resource-group MyRG \
  --endpoint-name openai-private-endpoint \
  --name default \
  --private-dns-zone privatelink.openai.azure.com \
  --zone-name privatelink.openai.azure.com
```

### Managed Identity Access

```bash
# Enable the system-assigned identity
az cognitiveservices account identity assign \
  --name myopenai \
  --resource-group MyRG

# Grant a role to the managed identity
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)

az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG
```

### Content Filtering

```bash
# Configure content filtering
# (illustrative; content filter policies are typically managed through
# Azure AI Foundry rather than a raw property update)
az cognitiveservices account update \
  --name myopenai \
  --resource-group MyRG \
  --set properties.customContentFilter='{
    "hate": {"severity": "medium", "enabled": true},
    "violence": {"severity": "medium", "enabled": true},
    "sexual": {"severity": "medium", "enabled": true},
    "selfHarm": {"severity": "high", "enabled": true}
  }'
```

## Cost Optimization

### Model Selection Strategy

**Use GPT-5-mini or GPT-5-nano for:**

- Simple questions
- Classification tasks
- Content moderation
- Summarization

**Use GPT-5 or GPT-4.1 for:**

- Complex reasoning
- Long-form content generation
- Document analysis
- Code generation

**Use reasoning models (o3, o4-mini) for:**

- Mathematical problems
- Scientific analysis
- Step-by-step reasoning
- Logic puzzles

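The tiers above can be encoded as a small routing table so application code picks the cheapest adequate deployment. A sketch (the task categories and deployment names are assumptions matching the deployments created earlier in this document):

```python
# Map task categories to the cheapest deployment that handles them well
MODEL_BY_TASK = {
    "classification": "gpt-5-mini",
    "summarization": "gpt-5-mini",
    "moderation": "gpt-5-nano",
    "code_generation": "gpt-5",
    "document_analysis": "gpt-4-1",
    "math": "o3",
    "logic": "o4-mini",
}

def pick_deployment(task_type: str) -> str:
    """Return the deployment name for a task, defaulting to the balanced tier."""
    return MODEL_BY_TASK.get(task_type, "gpt-5")

print(pick_deployment("math"))     # o3
print(pick_deployment("unknown"))  # gpt-5
```

A static table like this is the simplest form of routing; the Model Router described earlier makes the same decision dynamically per prompt.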
### Implement Caching

```python
# Use a semantic cache to avoid paying for duplicate requests
from azure.ai.cache import SemanticCache  # illustrative; substitute your caching layer

cache = SemanticCache(
    similarity_threshold=0.95,
    ttl_seconds=3600
)

# Check the cache before making the API call
cached_response = cache.get(user_query)
if cached_response:
    return cached_response

response = client.chat.completions.create(
    model="gpt-5",
    messages=messages
)

cache.set(user_query, response)
```

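If a semantic-cache package like the one above isn't available, an exact-match TTL cache built on the standard library already captures much of the benefit for repeated identical queries. The class below is a sketch, not an Azure API:

```python
import time

class ExactMatchCache:
    """Minimal exact-match response cache with a per-entry TTL."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (expiry_timestamp, response)

    def get(self, query: str):
        entry = self._store.get(query)
        if entry is None:
            return None
        expiry, response = entry
        if time.monotonic() > expiry:
            del self._store[query]  # entry expired; drop it
            return None
        return response

    def set(self, query: str, response) -> None:
        self._store[query] = (time.monotonic() + self.ttl, response)

cache = ExactMatchCache(ttl_seconds=60)
cache.set("What is the refund policy?", "30-day returns")
print(cache.get("What is the refund policy?"))  # 30-day returns
```

Unlike a semantic cache, this only hits on byte-identical queries, so normalize whitespace and casing before lookup if you use it.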
### Token Management

```python
import tiktoken

# Count tokens before the API call
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode(prompt))

if tokens > 100000:
    print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

# Use a shorter max_tokens when appropriate
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    max_tokens=500  # limit output tokens
)
```

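When tiktoken isn't installed, a rough rule of thumb (about 4 characters per token for English text) is enough for pre-flight budget checks. The helpers below are a coarse approximation, not an exact count:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_budget(text: str, budget: int = 100_000) -> bool:
    """Cheap pre-flight check before an exact count or an expensive API call."""
    return estimate_tokens(text) <= budget

print(estimate_tokens("a" * 400))  # 100
```

Use the exact tokenizer for billing-sensitive decisions; the heuristic is only for quick guards.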
## Monitoring and Alerts

### Set Up Cost Alerts

```bash
# Create a budget alert
az consumption budget create \
  --budget-name openai-monthly-budget \
  --resource-group MyRG \
  --amount 1000 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31 \
  --notifications '{
    "actual_GreaterThan_80_Percent": {
      "enabled": true,
      "operator": "GreaterThan",
      "threshold": 80,
      "contactEmails": ["billing@example.com"]
    }
  }'
```

### Application Insights Integration

```python
import logging
import os
import time

from opencensus.ext.azure.log_exporter import AzureLogHandler

# Configure logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))

# Log API calls, measuring latency around the request
start = time.monotonic()
response = client.chat.completions.create(model="gpt-5", messages=messages)
latency_ms = (time.monotonic() - start) * 1000

logger.info("OpenAI API call", extra={
    "custom_dimensions": {
        "model": "gpt-5",
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response.usage.total_tokens),  # calculate_cost is user-defined
        "latency_ms": latency_ms
    }
})
```

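`calculate_cost` in the example above is user-defined; a sketch with placeholder per-token rates (the prices below are illustrative only, substitute current rates from the Azure pricing page):

```python
# Illustrative per-1K-token prices in USD; NOT published pricing
PRICE_PER_1K_TOKENS = {
    "gpt-5": 0.01,
    "gpt-5-mini": 0.002,
    "o3": 0.015,
}

def calculate_cost(total_tokens: int, model: str = "gpt-5") -> float:
    """Approximate request cost in USD from the total token count."""
    rate = PRICE_PER_1K_TOKENS.get(model, 0.01)
    return round(total_tokens / 1000 * rate, 6)

print(calculate_cost(2500))  # 0.025
```

A production version would price prompt and completion tokens separately, since their rates usually differ.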
## Best Practices

✓ **Use Model Router** for automatic cost optimization
✓ **Implement caching** to reduce duplicate requests
✓ **Monitor token usage** and set budgets
✓ **Use private endpoints** for production workloads
✓ **Enable managed identity** instead of API keys
✓ **Configure content filtering** for safety
✓ **Right-size capacity** based on actual demand
✓ **Use Foundry Observability** for monitoring
✓ **Implement retry logic** with exponential backoff
✓ **Choose appropriate models** for task complexity

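The retry recommendation can be sketched as a small decorator; the delay schedule and broad exception handling below are typical choices, not an official SDK feature (in practice you would catch the SDK's rate-limit and timeout errors specifically):

```python
import random
import time
from functools import wraps

def with_backoff(max_retries: int = 5, base_delay: float = 1.0):
    """Retry a function on transient errors with exponential backoff and jitter."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise  # out of retries; surface the error
                    # 1s, 2s, 4s, ... plus jitter to avoid thundering herds
                    time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
        return wrapper
    return decorator

# Usage: wrap the API call site
# @with_backoff(max_retries=5, base_delay=1.0)
# def ask(messages):
#     return client.chat.completions.create(model="gpt-5", messages=messages)
```

Honoring a `Retry-After` header, when the service returns one, is preferable to a fixed schedule.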
## References

- [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/)
- [What's New in Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/whats-new)
- [GPT-5 Announcement](https://azure.microsoft.com/en-us/blog/gpt-5-azure/)
- [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-foundry/)
- [Model Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)

Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!