## 🚨 CRITICAL GUIDELINES ### Windows File Path Requirements **MANDATORY: Always Use Backslashes on Windows for File Paths** When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`). **Examples:** - ❌ WRONG: `D:/repos/project/file.tsx` - ✅ CORRECT: `D:\repos\project\file.tsx` This applies to: - Edit tool file_path parameter - Write tool file_path parameter - All file operations on Windows systems ### Documentation Guidelines **NEVER create new documentation files unless explicitly requested by the user.** - **Priority**: Update existing README.md files rather than creating new documentation - **Repository cleanliness**: Keep repository root clean - only README.md unless user requests otherwise - **Style**: Documentation should be concise, direct, and professional - avoid AI-generated tone - **User preference**: Only create additional .md files when user specifically asks for documentation --- # Azure OpenAI Service - 2025 Models and Features Complete knowledge base for Azure OpenAI Service with latest 2025 models including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration. ## Overview Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability. ## Latest Models (2025) ### GPT-5 Series (GA August 2025) **Registration Required Models:** - `gpt-5-pro`: Highest capability, complex reasoning - `gpt-5`: Balanced performance and cost - `gpt-5-codex`: Optimized for code generation **No Registration Required:** - `gpt-5-mini`: Faster, more affordable - `gpt-5-nano`: Ultra-fast for simple tasks - `gpt-5-chat`: Optimized for conversational use ### GPT-4.1 Series - `gpt-4.1`: 1 million token context window - `gpt-4.1-mini`: Efficient version with 1M context - `gpt-4.1-nano`: Fastest variant **Key Improvements:** - 1,000,000 token context (vs 128K in GPT-4 Turbo) - Better instruction following - Reduced hallucinations - Improved multilingual support ### Reasoning Models **o4-mini**: Lightweight reasoning model - Faster inference - Lower cost - Suitable for structured reasoning tasks **o3**: Advanced reasoning model - Complex problem solving - Mathematical reasoning - Scientific analysis **o1**: Original reasoning model - General-purpose reasoning - Step-by-step explanations **o1-mini**: Efficient reasoning - Balanced cost and performance ### Image Generation **GPT-image-1 (2025-04-15)** - DALL-E 3 successor - Higher quality images - Better prompt understanding - Improved safety filters ### Video Generation **Sora (2025-05-02)** - Text-to-video generation - Realistic and imaginative scenes - Up to 60 seconds of video - Multiple camera angles and styles ### Audio Models **gpt-4o-transcribe**: Speech-to-text powered by GPT-4o - High accuracy transcription - Multiple languages - Speaker diarization **gpt-4o-mini-transcribe**: Faster, more affordable transcription - Good accuracy - Lower latency - Cost-effective ## Deploying Azure OpenAI ### Create Azure OpenAI Resource ```bash # Create OpenAI account az cognitiveservices account create \ --name myopenai \ --resource-group MyRG \ --kind OpenAI \ --sku S0 \ --location eastus \ --custom-domain myopenai \ --public-network-access Disabled \ --identity-type SystemAssigned # Get endpoint and key az cognitiveservices account show \ --name myopenai \ --resource-group MyRG \ --query "properties.endpoint" \ --output tsv az cognitiveservices account keys list \ --name myopenai \ --resource-group MyRG \ --query "key1" \ --output tsv ``` ### Deploy GPT-5 Model ```bash # Deploy gpt-5 az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name gpt-5 \ --model-name gpt-5 \ --model-version latest \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 100 \ --scale-type Standard # Deploy gpt-5-pro (requires registration) az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name gpt-5-pro \ --model-name gpt-5-pro \ --model-version latest \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 50 ``` ### Deploy Reasoning Models ```bash # Deploy o3 reasoning model az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name o3-reasoning \ --model-name o3 \ --model-version latest \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 50 # Deploy o4-mini az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name o4-mini \ --model-name o4-mini \ --model-version latest \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 100 ``` ### Deploy GPT-4.1 with 1M Context ```bash az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name gpt-4-1 \ --model-name gpt-4.1 \ --model-version latest \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 100 ``` ### Deploy Image Generation Model ```bash az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name image-gen \ --model-name gpt-image-1 \ --model-version 2025-04-15 \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 10 ``` ### Deploy Sora Video Generation ```bash az cognitiveservices account deployment create \ --resource-group MyRG \ --name myopenai \ --deployment-name sora \ --model-name sora \ --model-version 2025-05-02 \ --model-format OpenAI \ --sku-name Standard \ --sku-capacity 5 ``` ## Using Azure OpenAI Models ### Python SDK (GPT-5) ```python from openai import AzureOpenAI import os # Initialize client client = AzureOpenAI( api_key=os.getenv("AZURE_OPENAI_API_KEY"), api_version="2025-02-01-preview", azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT") ) # GPT-5 completion response = client.chat.completions.create( model="gpt-5", # deployment name messages=[ {"role": "system", "content": "You are a helpful AI assistant."}, {"role": "user", "content": "Explain quantum computing in simple terms."} ], max_tokens=1000, temperature=0.7, top_p=0.95 ) print(response.choices[0].message.content) ``` ### Python SDK (o3 Reasoning Model) ```python # o3 reasoning with chain-of-thought response = client.chat.completions.create( model="o3-reasoning", messages=[ {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."}, {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"} ], max_tokens=2000, temperature=0.2 # Lower temperature for reasoning tasks ) print(response.choices[0].message.content) ``` ### Python SDK (GPT-4.1 with 1M Context) ```python # Read a large document with open('large_document.txt', 'r') as f: document = f.read() # GPT-4.1 can handle up to 1M tokens response = client.chat.completions.create( model="gpt-4-1", messages=[ {"role": "system", "content": "You are a document analysis expert."}, {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"} ], max_tokens=4000 ) print(response.choices[0].message.content) ``` ### Image Generation (GPT-image-1) ```python # Generate image with DALL-E 3 successor response = client.images.generate( model="image-gen", prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K", size="1024x1024", quality="hd", n=1 ) image_url = response.data[0].url print(f"Generated image: {image_url}") ``` ### Video Generation (Sora) ```python # Generate video with Sora response = client.videos.generate( model="sora", prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore", duration=10, # seconds resolution="1080p", fps=30 ) video_url = response.data[0].url print(f"Generated video: {video_url}") ``` ### Audio Transcription ```python # Transcribe audio file audio_file = open("meeting_recording.mp3", "rb") response = client.audio.transcriptions.create( model="gpt-4o-transcribe", file=audio_file, language="en", response_format="verbose_json" ) print(f"Transcription: {response.text}") print(f"Duration: {response.duration}s") # Speaker diarization for segment in response.segments: print(f"[{segment.start}s - {segment.end}s] {segment.text}") ``` ## Azure AI Foundry Integration ### Model Router (Automatic Model Selection) ```python from azure.ai.foundry import ModelRouter # Initialize model router router = ModelRouter( endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"), credential=os.getenv("AZURE_OPENAI_API_KEY") ) # Automatically select optimal model response = router.complete( prompt="Analyze this complex scientific paper...", optimization_goals=["quality", "cost"], available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"] ) print(f"Selected model: {response.model_used}") print(f"Response: {response.content}") print(f"Cost: ${response.cost}") ``` **Benefits:** - Automatic model selection based on prompt complexity - Balance quality vs cost - Reduce costs by up to 40% while maintaining quality ### Agentic Retrieval (Azure AI Search Integration) ```python from azure.search.documents import SearchClient from azure.core.credentials import AzureKeyCredential # Initialize search client search_client = SearchClient( endpoint=os.getenv("SEARCH_ENDPOINT"), index_name="documents", credential=AzureKeyCredential(os.getenv("SEARCH_KEY")) ) # Agentic retrieval with Azure OpenAI response = client.chat.completions.create( model="gpt-5", messages=[ {"role": "system", "content": "You have access to a document search system."}, {"role": "user", "content": "What are the company's revenue projections for Q3?"} ], tools=[{ "type": "function", "function": { "name": "search_documents", "description": "Search company documents", "parameters": { "type": "object", "properties": { "query": {"type": "string", "description": "Search query"} }, "required": ["query"] } } }], tool_choice="auto" ) # Process tool calls if response.choices[0].message.tool_calls: for tool_call in response.choices[0].message.tool_calls: if tool_call.function.name == "search_documents": query = json.loads(tool_call.function.arguments)["query"] results = search_client.search(query) # Feed results back to model for final answer ``` **Improvements:** - 40% better on complex, multi-part questions - Automatic query decomposition - Relevance ranking - Citation generation ### Foundry Observability (Preview) ```python from azure.ai.foundry import FoundryObservability # Enable observability observability = FoundryObservability( workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"), enable_tracing=True, enable_metrics=True ) # Monitor agent execution with observability.trace_agent("customer_support_agent") as trace: response = client.chat.completions.create( model="gpt-5", messages=messages ) trace.log_tool_call("search_kb", {"query": "refund policy"}) trace.log_reasoning_step("Retrieved refund policy document") trace.log_token_usage(response.usage.total_tokens) # View in Azure AI Foundry portal: # - End-to-end trace logs # - Reasoning steps and tool calls # - Performance metrics # - Cost analysis ``` ## Capacity and Quota Management ### Check Quota ```bash # List deployments with usage az cognitiveservices account deployment list \ --resource-group MyRG \ --name myopenai \ --output table # Check usage metrics az monitor metrics list \ --resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \ --metric "TokenTransaction" \ --start-time 2025-01-01T00:00:00Z \ --end-time 2025-01-31T23:59:59Z \ --interval PT1H \ --aggregation Total ``` ### Update Capacity ```bash # Scale up deployment capacity az cognitiveservices account deployment update \ --resource-group MyRG \ --name myopenai \ --deployment-name gpt-5 \ --sku-capacity 200 # Scale down during off-peak az cognitiveservices account deployment update \ --resource-group MyRG \ --name myopenai \ --deployment-name gpt-5 \ --sku-capacity 50 ``` ### Request Quota Increase 1. Navigate to Azure Portal → Azure OpenAI resource 2. Go to "Quotas" blade 3. Select model and region 4. Click "Request quota increase" 5. Provide justification and target capacity ## Security and Networking ### Private Endpoint ```bash # Create private endpoint az network private-endpoint create \ --name openai-private-endpoint \ --resource-group MyRG \ --vnet-name MyVNet \ --subnet PrivateEndpointSubnet \ --private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \ --group-id account \ --connection-name openai-connection # Create private DNS zone az network private-dns zone create \ --resource-group MyRG \ --name privatelink.openai.azure.com # Link to VNet az network private-dns link vnet create \ --resource-group MyRG \ --zone-name privatelink.openai.azure.com \ --name openai-dns-link \ --virtual-network MyVNet \ --registration-enabled false # Create DNS zone group az network private-endpoint dns-zone-group create \ --resource-group MyRG \ --endpoint-name openai-private-endpoint \ --name default \ --private-dns-zone privatelink.openai.azure.com \ --zone-name privatelink.openai.azure.com ``` ### Managed Identity Access ```bash # Enable system-assigned identity az cognitiveservices account identity assign \ --name myopenai \ --resource-group MyRG # Grant role to managed identity PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv) az role assignment create \ --assignee $PRINCIPAL_ID \ --role "Cognitive Services OpenAI User" \ --scope /subscriptions//resourceGroups/MyRG ``` ### Content Filtering ```bash # Configure content filtering az cognitiveservices account update \ --name myopenai \ --resource-group MyRG \ --set properties.customContentFilter='{ "hate": {"severity": "medium", "enabled": true}, "violence": {"severity": "medium", "enabled": true}, "sexual": {"severity": "medium", "enabled": true}, "selfHarm": {"severity": "high", "enabled": true} }' ``` ## Cost Optimization ### Model Selection Strategy **Use GPT-5-mini or GPT-5-nano for:** - Simple questions - Classification tasks - Content moderation - Summarization **Use GPT-5 or GPT-4.1 for:** - Complex reasoning - Long-form content generation - Document analysis - Code generation **Use Reasoning Models (o3, o4-mini) for:** - Mathematical problems - Scientific analysis - Step-by-step reasoning - Logic puzzles ### Implement Caching ```python # Use semantic cache to reduce duplicate requests from azure.ai.cache import SemanticCache cache = SemanticCache( similarity_threshold=0.95, ttl_seconds=3600 ) # Check cache before API call cached_response = cache.get(user_query) if cached_response: return cached_response response = client.chat.completions.create( model="gpt-5", messages=messages ) cache.set(user_query, response) ``` ### Token Management ```python import tiktoken # Count tokens before API call encoding = tiktoken.get_encoding("cl100k_base") tokens = len(encoding.encode(prompt)) if tokens > 100000: print(f"Warning: Prompt has {tokens} tokens, this will be expensive!") # Use shorter max_tokens when appropriate response = client.chat.completions.create( model="gpt-5", messages=messages, max_tokens=500 # Limit output tokens ) ``` ## Monitoring and Alerts ### Set Up Cost Alerts ```bash # Create budget alert az consumption budget create \ --budget-name openai-monthly-budget \ --resource-group MyRG \ --amount 1000 \ --category Cost \ --time-grain Monthly \ --start-date 2025-01-01 \ --end-date 2025-12-31 \ --notifications '{ "actual_GreaterThan_80_Percent": { "enabled": true, "operator": "GreaterThan", "threshold": 80, "contactEmails": ["billing@example.com"] } }' ``` ### Application Insights Integration ```python from opencensus.ext.azure.log_exporter import AzureLogHandler import logging # Configure logging logger = logging.getLogger(__name__) logger.addHandler(AzureLogHandler( connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING") )) # Log API calls logger.info("OpenAI API call", extra={ "custom_dimensions": { "model": "gpt-5", "tokens": response.usage.total_tokens, "cost": calculate_cost(response.usage.total_tokens), "latency_ms": response.response_ms } }) ``` ## Best Practices ✓ **Use Model Router** for automatic cost optimization ✓ **Implement caching** to reduce duplicate requests ✓ **Monitor token usage** and set budgets ✓ **Use private endpoints** for production workloads ✓ **Enable managed identity** instead of API keys ✓ **Configure content filtering** for safety ✓ **Right-size capacity** based on actual demand ✓ **Use Foundry Observability** for monitoring ✓ **Implement retry logic** with exponential backoff ✓ **Choose appropriate models** for task complexity ## References - [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/) - [What's New in Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/whats-new) - [GPT-5 Announcement](https://azure.microsoft.com/en-us/blog/gpt-5-azure/) - [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-foundry/) - [Model Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!