2025-11-30 08:38:41 +08:00

name: audio-transcription-cleanup
description: Transform messy voice transcription text into well-formatted, human-readable documents while preserving original meaning

Audio Transcription Cleanup

Clean up raw audio transcriptions by removing filler words, fixing errors, and adding proper structure.

Usage

Use the audio_transcript_cleanup.py script to process transcript files:

# Use default output location (~/tmp/cleaned_transcript.md - allows overwrite)
python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt

# Specify custom output location (cannot overwrite existing files)
python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt --output /path/to/output.md

What It Does

The script automatically:

  • Removes verbal artifacts (um, uh, like, you know, 呃, 啊, 那个, etc.)
  • Fixes spelling and grammar errors
  • Adds semantic paragraph breaks and section headings
  • Converts spoken fragments into complete sentences
  • Preserves all original information (no summarization)
  • Auto-detects language and maintains natural expression
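
To make the first item concrete, here is a deliberately simple rule-based sketch of filler removal. It is not how the script works internally (the Requirements section suggests the cleanup is delegated to the Claude CLI); the function name `strip_fillers` and the filler subset are assumptions chosen for illustration. Words such as "like", "you know", 啊, and 那个 are left out because they are often meaningful and cannot be removed safely with a pattern.

```python
import re

# Illustrative subset only: unambiguous English fillers, with an optional
# trailing comma and any following whitespace.
_EN_FILLERS = re.compile(r"\b(?:um+|uh+)\b,?\s*", re.IGNORECASE)

# Chinese hesitation marker 呃, with an optional trailing comma
# (full-width or ASCII) and any following whitespace.
_ZH_FILLERS = re.compile(r"呃[，,]?\s*")


def strip_fillers(text: str) -> str:
    """Remove a small set of filler tokens (hypothetical helper)."""
    text = _EN_FILLERS.sub("", text)
    text = _ZH_FILLERS.sub("", text)
    return text.strip()
```

A pattern like this cannot fix grammar, restore capitalization, or judge context, which is why the remaining items in the list need a language model rather than rules.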

Options

  • --transcript-file (required) - Path to the transcript file to clean up
  • --output (optional) - Custom output path (default: ~/tmp/cleaned_transcript.md)
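
The two options map naturally onto argparse. The following is a sketch of how such a parser could be declared, not the script's actual source; `build_parser` is a hypothetical name, and leaving `--output` as `None` when it is omitted is an assumption made so the caller can tell the default location from a custom one.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the two documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Clean up a raw audio transcript."
    )
    parser.add_argument(
        "--transcript-file",
        required=True,
        type=Path,
        help="Path to the transcript file to clean up",
    )
    parser.add_argument(
        "--output",
        type=Path,
        default=None,  # None means: use ~/tmp/cleaned_transcript.md
        help="Custom output path (default: ~/tmp/cleaned_transcript.md)",
    )
    return parser
```

argparse exposes `--transcript-file` as the attribute `transcript_file`, so a caller would read `args.transcript_file` and `args.output`.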

Output Behavior

  • Default location: ~/tmp/cleaned_transcript.md - Allows overwrite
  • Custom location: Cannot overwrite existing files (raises an error if the file already exists)

Language Support

Auto-detects and works with:

  • English
  • Chinese (Mandarin, Cantonese)
  • Mixed language content
  • Multi-speaker transcriptions

Requirements

  • Python 3.11+
  • Claude CLI must be installed and accessible
  • Transcript file must exist at the specified path
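
These requirements can be checked up front before any work is done. The sketch below is illustrative only: `check_requirements` is a hypothetical name, and the executable name `claude` is an assumption about how the Claude CLI appears on your PATH.

```python
import shutil
import sys
from pathlib import Path


def check_requirements(transcript_file: Path, cli_name: str = "claude") -> list[str]:
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems: list[str] = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is required")
    if shutil.which(cli_name) is None:
        # shutil.which returns None when the executable is not on PATH.
        problems.append(f"CLI not found on PATH: {cli_name}")
    if not transcript_file.is_file():
        problems.append(f"Transcript file not found: {transcript_file}")
    return problems
```

Collecting every problem before exiting lets the user fix all of them in one pass instead of rerunning after each failure.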