Initial commit

2025-11-30 08:25:12 +08:00
commit 7a35a34caa
30 changed files with 8396 additions and 0 deletions
--- a/references/audio-guide.md
+++ b/references/audio-guide.md
@@ -0,0 +1,205 @@
+# Audio Guide (Whisper & TTS)
+
+**Last Updated**: 2025-10-25
+
+Complete guide to OpenAI's Audio API for transcription and text-to-speech.
+
+---
+
+## Whisper Transcription
+
+### Supported Formats
+- mp3, mp4, mpeg, mpga, m4a, wav, webm
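+
+A quick pre-upload check can catch unsupported files early. The sketch below is illustrative only: `hasSupportedExtension` is a hypothetical helper, and it inspects the file extension alone, not the actual encoding.
+
+```typescript
+import { extname } from 'node:path';
+
+// Formats listed in this guide; the API may accept others not covered here.
+const SUPPORTED_FORMATS = new Set(['mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'wav', 'webm']);
+
+// Hypothetical helper: true when the file extension appears in the list above.
+function hasSupportedExtension(filePath: string): boolean {
+  const ext = extname(filePath).slice(1).toLowerCase();
+  return SUPPORTED_FORMATS.has(ext);
+}
+```
+
+For example, `hasSupportedExtension('./interview.MP3')` returns `true`, while a `.txt` file or a file without an extension returns `false`.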
+
+### Best Practices
+
+✅ **Audio Quality**:
+- Use clear audio with minimal background noise
+- 16 kHz or higher sample rate recommended
+- Mono and stereo are both supported
+
+✅ **File Size**:
+- Max file size: 25 MB
+- For larger files: split into chunks or compress
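+
+One way to plan the split is to divide the recording into equal time ranges that each stay under the limit. This is a minimal sketch, assuming a roughly constant bitrate and treating 25 MB as 25 × 1024 × 1024 bytes; `planChunks` and the 10% safety margin are illustrative choices, and the actual cutting (for example with an audio tool) is not shown.
+
+```typescript
+// 25 MB limit from this guide (binary megabytes assumed).
+const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;
+
+interface ChunkRange {
+  startSec: number;
+  endSec: number;
+}
+
+// Hypothetical helper: returns equal time ranges sized so that each chunk
+// should fit under the limit, assuming file size scales with duration.
+function planChunks(fileBytes: number, durationSec: number, safety = 0.9): ChunkRange[] {
+  if (fileBytes <= 0 || durationSec <= 0) {
+    throw new Error('fileBytes and durationSec must be positive');
+  }
+  const budget = MAX_UPLOAD_BYTES * safety;
+  const chunkCount = Math.max(1, Math.ceil(fileBytes / budget));
+  const chunkSec = durationSec / chunkCount;
+  return Array.from({ length: chunkCount }, (_, i) => ({
+    startSec: i * chunkSec,
+    // Pin the last range to the exact duration to avoid rounding drift.
+    endSec: i === chunkCount - 1 ? durationSec : (i + 1) * chunkSec,
+  }));
+}
+```
+
+A file already under the budget yields a single range covering the whole recording, so the helper can be called unconditionally.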
+
+✅ **Languages**:
+- Whisper automatically detects language
+- Supports 50+ languages
+- Best results with English, Spanish, French, German, Chinese
+
+❌ **Limitations**:
+- May struggle with heavy accents
+- Background noise reduces accuracy
+- Very quiet audio may fail
+
+---
+
+## Text-to-Speech (TTS)
+
+### Model Selection
+
+| Model | Quality | Latency | Features | Best For |
+|-------|---------|---------|----------|----------|
+| tts-1 | Standard | Lowest | Basic TTS | Real-time streaming |
+| tts-1-hd | High | Medium | Better fidelity | Offline audio, podcasts |
+| gpt-4o-mini-tts | Best | Medium | Voice instructions, streaming | Maximum control |
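+
+The table can be encoded as a small lookup so the choice lives in one place. The use-case labels (`realtime`, `offline`, `styled`) are hypothetical names introduced for this sketch; only the model IDs come from the table.
+
+```typescript
+type TtsModel = 'tts-1' | 'tts-1-hd' | 'gpt-4o-mini-tts';
+type UseCase = 'realtime' | 'offline' | 'styled'; // illustrative labels
+
+// Mirrors the "Best For" column of the table above.
+const MODEL_FOR_USE_CASE: Record<UseCase, TtsModel> = {
+  realtime: 'tts-1',
+  offline: 'tts-1-hd',
+  styled: 'gpt-4o-mini-tts',
+};
+
+function pickTtsModel(useCase: UseCase): TtsModel {
+  return MODEL_FOR_USE_CASE[useCase];
+}
+```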
+
+### Voice Selection Guide
+
+| Voice | Character | Best For |
+|-------|-----------|----------|
+| alloy | Neutral, balanced | General use, professional |
+| ash | Clear, professional | Business, presentations |
+| ballad | Warm, storytelling | Narration, audiobooks |
+| coral | Soft, friendly | Customer service, greetings |
+| echo | Calm, measured | Meditation, calm content |
+| fable | Expressive, narrative | Stories, entertainment |
+| onyx | Deep, authoritative | News, serious content |
+| nova | Bright, energetic | Marketing, enthusiastic content |
+| sage | Wise, thoughtful | Educational, informative |
+| shimmer | Gentle, soothing | Relaxation, sleep content |
+| verse | Poetic, rhythmic | Poetry, artistic content |
+
+### Voice Instructions (gpt-4o-mini-tts only)
+
+```typescript
+// Professional tone
+const professional = {
+  model: 'gpt-4o-mini-tts',
+  voice: 'ash',
+  input: 'Welcome to our service',
+  instructions: 'Speak in a calm, professional, and friendly tone suitable for customer service.',
+};
+
+// Energetic marketing
+const marketing = {
+  model: 'gpt-4o-mini-tts',
+  voice: 'nova',
+  input: "Don't miss this sale!",
+  instructions: 'Use an enthusiastic, energetic tone perfect for marketing and advertisements.',
+};
+
+// Meditation guidance
+const meditation = {
+  model: 'gpt-4o-mini-tts',
+  voice: 'shimmer',
+  input: 'Take a deep breath',
+  instructions: 'Adopt a calm, soothing voice suitable for meditation and relaxation guidance.',
+};
+```
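+
+Because `instructions` applies only to `gpt-4o-mini-tts`, it can help to drop the field when another model is selected. The sketch below is one possible guard; `SpeechRequest` and `buildSpeechRequest` are hypothetical names, not part of the SDK.
+
+```typescript
+interface SpeechRequest {
+  model: 'tts-1' | 'tts-1-hd' | 'gpt-4o-mini-tts';
+  voice: string;
+  input: string;
+  instructions?: string;
+}
+
+// Hypothetical helper: keeps `instructions` only for gpt-4o-mini-tts.
+function buildSpeechRequest(req: SpeechRequest): SpeechRequest {
+  const { model, voice, input, instructions } = req;
+  return model === 'gpt-4o-mini-tts' && instructions !== undefined
+    ? { model, voice, input, instructions }
+    : { model, voice, input };
+}
+```
+
+With this guard, switching a request from `gpt-4o-mini-tts` to `tts-1` silently omits the tone instructions instead of sending a field the model does not use.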
+
+### Speed Control
+
+```typescript
+// Slow (0.5x)
+{ speed: 0.5 } // Good for: Learning, accessibility
+
+// Normal (1.0x)
+{ speed: 1.0 } // Default
+
+// Fast (1.5x)
+{ speed: 1.5 } // Good for: Previews, time-saving
+
+// Very fast (2.0x)
+{ speed: 2.0 } // Good for: Quick previews only
+```
+
+Range: 0.25 to 4.0
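+
+When the speed comes from user input, clamping it to the documented range avoids sending out-of-range values. A minimal sketch, with `clampSpeed` as a hypothetical helper:
+
+```typescript
+const MIN_SPEED = 0.25;
+const MAX_SPEED = 4.0;
+const DEFAULT_SPEED = 1.0;
+
+// Hypothetical helper: keeps speed inside the 0.25 to 4.0 range noted above.
+function clampSpeed(speed: number): number {
+  // Fall back to the default for NaN or infinite input.
+  if (!Number.isFinite(speed)) return DEFAULT_SPEED;
+  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, speed));
+}
+```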
+
+### Audio Format Selection
+
+| Format | Compression | Quality | Best For |
+|--------|-------------|---------|----------|
+| mp3 | Lossy | Good | Maximum compatibility |
+| opus | Lossy | Excellent | Web streaming, low bandwidth |
+| aac | Lossy | Good | iOS, Apple devices |
+| flac | Lossless | Best | Archiving, editing |
+| wav | Uncompressed | Best | Editing, processing |
+| pcm | Raw | Best | Low-level processing |
+
+---
+
+## Common Patterns
+
+### 1. Transcribe Interview
+
+```typescript
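+import fs from 'fs';
+import OpenAI from 'openai';
+
+// The client reads OPENAI_API_KEY from the environment by default
+const openai = new OpenAI();
+
+const transcription = await openai.audio.transcriptions.create({
+  file: fs.createReadStream('./interview.mp3'),
+  model: 'whisper-1',
+});
+
+// Save transcript
+fs.writeFileSync('./interview.txt', transcription.text);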
+```
+
+### 2. Generate Podcast Narration
+
+```typescript
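+import fs from 'fs';
+import OpenAI from 'openai';
+
+const openai = new OpenAI();
+
+const script = "Welcome to today's podcast...";
+
+const audio = await openai.audio.speech.create({
+  model: 'tts-1-hd',
+  voice: 'fable',
+  input: script,
+  response_format: 'mp3',
+});
+
+// The response body is binary audio; convert it to a Buffer before writing
+const buffer = Buffer.from(await audio.arrayBuffer());
+fs.writeFileSync('./podcast.mp3', buffer);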
+```
+
+### 3. Multi-Voice Conversation
+
+```typescript
+import OpenAI from 'openai';
+
+const openai = new OpenAI();
+
+// Speaker 1
+const speaker1 = await openai.audio.speech.create({
+  model: 'tts-1',
+  voice: 'onyx',
+  input: 'Hello, how are you?',
+});
+
+// Speaker 2
+const speaker2 = await openai.audio.speech.create({
+  model: 'tts-1',
+  voice: 'nova',
+  input: "I'm doing great, thanks!",
+});
+
+// Combine audio files (requires audio processing library)
+```
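+
+For the combining step, one option that avoids a full audio library is to request `response_format: 'pcm'` for each speaker and join the raw samples. The sketch below assumes the PCM output is headerless 24 kHz, 16-bit, mono, little-endian audio (an assumption to verify against the current API reference); `joinPcm` and `pcmToWav` are hypothetical helper names.
+
+```typescript
+import { Buffer } from 'node:buffer';
+
+const SAMPLE_RATE = 24000;   // assumed PCM sample rate
+const BYTES_PER_SAMPLE = 2;  // 16-bit mono
+
+// Concatenates raw PCM clips, inserting a short silence between speakers.
+function joinPcm(clips: Buffer[], gapMs = 300): Buffer {
+  const gapBytes = Math.round((SAMPLE_RATE * gapMs) / 1000) * BYTES_PER_SAMPLE;
+  const silence = Buffer.alloc(gapBytes); // zero-filled bytes are silence in signed PCM
+  const parts: Buffer[] = [];
+  clips.forEach((clip, i) => {
+    if (i > 0) parts.push(silence);
+    parts.push(clip);
+  });
+  return Buffer.concat(parts);
+}
+
+// Prepends a standard 44-byte WAV header so players can open the result.
+function pcmToWav(pcm: Buffer): Buffer {
+  const header = Buffer.alloc(44);
+  header.write('RIFF', 0, 'ascii');
+  header.writeUInt32LE(36 + pcm.length, 4);                  // remaining file size
+  header.write('WAVE', 8, 'ascii');
+  header.write('fmt ', 12, 'ascii');
+  header.writeUInt32LE(16, 16);                              // fmt chunk size
+  header.writeUInt16LE(1, 20);                               // format tag: PCM
+  header.writeUInt16LE(1, 22);                               // channels: mono
+  header.writeUInt32LE(SAMPLE_RATE, 24);
+  header.writeUInt32LE(SAMPLE_RATE * BYTES_PER_SAMPLE, 28);  // byte rate
+  header.writeUInt16LE(BYTES_PER_SAMPLE, 32);                // block align
+  header.writeUInt16LE(16, 34);                              // bits per sample
+  header.write('data', 36, 'ascii');
+  header.writeUInt32LE(pcm.length, 40);
+  return Buffer.concat([header, pcm]);
+}
+```
+
+Compressed formats such as mp3 generally need a real audio tool to join cleanly, which is why this sketch works on raw samples instead.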
+
+---
+
+## Cost Optimization
+
+1. **Use tts-1 for real-time** (cheaper, faster)
+2. **Use tts-1-hd for final production** (better quality)
+3. **Cache generated audio** (reuse the stored file when model, voice, and input repeat)
+4. **Choose appropriate format** (opus for web, mp3 for compatibility)
+5. **Batch transcriptions** with delays to avoid rate limits
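+
+For the caching tip, a stable key can be derived by hashing the request parameters. This is a sketch under stated assumptions: `speechCacheKey` and `SpeechParams` are hypothetical names, and the fallback values (`1.0` speed, `mp3` format) are assumed defaults used only to normalize the key.
+
+```typescript
+import { createHash } from 'node:crypto';
+
+interface SpeechParams {
+  model: string;
+  voice: string;
+  input: string;
+  speed?: number;
+  response_format?: string;
+  instructions?: string;
+}
+
+// Hypothetical helper: identical parameters always produce the same key.
+function speechCacheKey(p: SpeechParams): string {
+  // A fixed-order array avoids depending on object key order.
+  const normalized = JSON.stringify([
+    p.model,
+    p.voice,
+    p.input,
+    p.speed ?? 1.0,
+    p.response_format ?? 'mp3',
+    p.instructions ?? '',
+  ]);
+  return createHash('sha256').update(normalized).digest('hex');
+}
+```
+
+The key can serve as a file name (for example `<key>.mp3`), so a lookup on disk decides whether a new request is needed.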
+
+---
+
+## Common Issues
+
+### Transcription Accuracy
+- Improve audio quality
+- Reduce background noise
+- Ensure adequate volume levels
+- Use supported audio formats
+
+### TTS Naturalness
+- Test different voices
+- Use voice instructions (gpt-4o-mini-tts)
+- Adjust speed for better pacing
+- Add punctuation for natural pauses
+
+### File Size
+- Compress audio before transcribing
+- Choose lossy formats (mp3, opus) for TTS
+- Use appropriate bitrates
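+
+To judge whether a bitrate is appropriate, the approximate file size follows from bitrate × duration. The sketch below ignores container overhead and treats the 25 MB transcription limit as 25 × 1024 × 1024 bytes; both function names are hypothetical.
+
+```typescript
+const LIMIT_BYTES = 25 * 1024 * 1024; // transcription limit from this guide (binary MB assumed)
+
+// Approximate size of constant-bitrate audio, ignoring container overhead.
+function estimateBytes(bitrateKbps: number, durationSec: number): number {
+  return ((bitrateKbps * 1000) / 8) * durationSec;
+}
+
+// Longest whole-second duration that fits under the limit at a given bitrate.
+function maxDurationSec(bitrateKbps: number, limitBytes = LIMIT_BYTES): number {
+  return Math.floor(limitBytes / ((bitrateKbps * 1000) / 8));
+}
+
+console.log(estimateBytes(64, 3600)); // 28800000 (one hour at 64 kbps exceeds the limit)
+console.log(maxDurationSec(64));      // 3276 (about 54 minutes)
+console.log(maxDurationSec(32));      // 6553 (about 109 minutes)
+```
+
+Under these assumptions, halving the bitrate roughly doubles how much audio fits in a single upload.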
+
+---
+
+**See Also**: Official Audio Guide (https://platform.openai.com/docs/guides/speech-to-text)