1.7 KiB
1.7 KiB
name, description
| name | description |
|---|---|
| audio-transcription-cleanup | Transform messy voice transcription text into well-formatted, human-readable documents while preserving original meaning |
Audio Transcription Cleanup
Clean up raw audio transcriptions by removing filler words, fixing errors, and adding proper structure.
Usage
Use the audio_transcript_cleanup.py script to process transcript files:
# Use default output location (~/tmp/cleaned_transcript.md - allows overwrite)
python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt
# Specify custom output location (cannot overwrite existing files)
python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt --output /path/to/output.md
What It Does
The script automatically:
- Removes verbal artifacts (um, uh, like, you know, 呃, 啊, 那个, etc.)
- Fixes spelling and grammar errors
- Adds semantic paragraph breaks and section headings
- Converts spoken fragments into complete sentences
- Preserves all original information (no summarization)
- Auto-detects language and maintains natural expression
Options
--transcript-file(required) - Path to the transcript file to clean up--output(optional) - Custom output path (default:~/tmp/cleaned_transcript.md)
Output Behavior
- Default location:
~/tmp/cleaned_transcript.md- Allows overwrite - Custom location: Cannot overwrite existing files (raises error if file exists)
Language Support
Auto-detects and works with:
- English
- Chinese (Mandarin, Cantonese)
- Mixed language content
- Multi-speaker transcriptions
Requirements
- Python 3.11+
- Claude CLI must be installed and accessible
- Transcript file must exist at specified path