Initial commit

2025-11-30 08:38:41 +08:00
commit 7d5b628e7d
13 changed files with 1610 additions and 0 deletions
--- a/skills/audio-transcript-cleanup/SKILL.md
+++ b/skills/audio-transcript-cleanup/SKILL.md
@@ -0,0 +1,54 @@
+---
+name: audio-transcription-cleanup
+description: Transform messy voice transcription text into well-formatted, human-readable documents while preserving original meaning
+---
+
+# Audio Transcription Cleanup
+
+Clean up raw audio transcriptions by removing filler words, fixing errors, and adding proper structure.
+
+## Usage
+
+Use the `audio_transcript_cleanup.py` script to process transcript files:
+
+```bash
+# Use default output location (~/tmp/cleaned_transcript.md - allows overwrite)
+python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt
+
+# Specify custom output location (cannot overwrite existing files)
+python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt --output /path/to/output.md
+```
+
+## What It Does
+
+The script automatically:
+- Removes verbal artifacts (um, uh, like, you know, 呃, 啊, 那个, etc.)
+- Fixes spelling and grammar errors
+- Adds semantic paragraph breaks and section headings
+- Converts spoken fragments into complete sentences
+- Preserves all original information (no summarization)
+- Auto-detects language and maintains natural expression
+
+## Options
+
+- `--transcript-file` (required) - Path to the transcript file to clean up
+- `--output` (optional) - Custom output path (default: `~/tmp/cleaned_transcript.md`)
+
+## Output Behavior
+
+- **Default location**: `~/tmp/cleaned_transcript.md` - Allows overwrite
+- **Custom location**: Cannot overwrite existing files (raises error if file exists)
+
+## Language Support
+
+Auto-detects and works with:
+- English
+- Chinese (Mandarin, Cantonese)
+- Mixed language content
+- Multi-speaker transcriptions
+
+## Requirements
+
+- Python 3.11+
+- Claude CLI must be installed and accessible
+- Transcript file must exist at specified path