Initial commit
This commit is contained in:
54
skills/audio-transcript-cleanup/SKILL.md
Normal file
54
skills/audio-transcript-cleanup/SKILL.md
Normal file
@@ -0,0 +1,54 @@
|
||||
---
|
||||
name: audio-transcription-cleanup
|
||||
description: Transform messy voice transcription text into well-formatted, human-readable documents while preserving original meaning
|
||||
---
|
||||
|
||||
# Audio Transcription Cleanup
|
||||
|
||||
Clean up raw audio transcriptions by removing filler words, fixing errors, and adding proper structure.
|
||||
|
||||
## Usage
|
||||
|
||||
Use the `audio_transcript_cleanup.py` script to process transcript files:
|
||||
|
||||
```bash
|
||||
# Use default output location (~/tmp/cleaned_transcript.md - allows overwrite)
|
||||
python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt
|
||||
|
||||
# Specify custom output location (cannot overwrite existing files)
|
||||
python scripts/audio_transcript_cleanup.py --transcript-file /path/to/transcript.txt --output /path/to/output.md
|
||||
```
|
||||
|
||||
## What It Does
|
||||
|
||||
The script automatically:
|
||||
- Removes verbal artifacts (um, uh, like, you know, 呃, 啊, 那个, etc.)
|
||||
- Fixes spelling and grammar errors
|
||||
- Adds semantic paragraph breaks and section headings
|
||||
- Converts spoken fragments into complete sentences
|
||||
- Preserves all original information (no summarization)
|
||||
- Auto-detects language and maintains natural expression
|
||||
|
||||
## Options
|
||||
|
||||
- `--transcript-file` (required) - Path to the transcript file to clean up
|
||||
- `--output` (optional) - Custom output path (default: `~/tmp/cleaned_transcript.md`)
|
||||
|
||||
## Output Behavior
|
||||
|
||||
- **Default location**: `~/tmp/cleaned_transcript.md` - Allows overwrite
|
||||
- **Custom location**: Cannot overwrite existing files (raises error if file exists)
|
||||
|
||||
## Language Support
|
||||
|
||||
Auto-detects and works with:
|
||||
- English
|
||||
- Chinese (Mandarin, Cantonese)
|
||||
- Mixed language content
|
||||
- Multi-speaker transcriptions
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.11+
|
||||
- Claude CLI must be installed and accessible
|
||||
- Transcript file must exist at specified path
|
||||
@@ -0,0 +1,329 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Audio Transcription Cleanup Script
|
||||
|
||||
This module transforms messy voice transcription text into well-formatted, human-readable
|
||||
markdown documents using Claude CLI. It preserves all original meaning and content while
|
||||
improving readability through intelligent text cleanup and restructuring.
|
||||
|
||||
Key Features:
|
||||
- Remove verbal artifacts (um, uh, like, filler words in any language)
|
||||
- Fix spelling and grammar errors
|
||||
- Add semantic paragraph breaks and section headings
|
||||
- Support for single-speaker and multi-speaker content
|
||||
- Multi-language support (English, Chinese, and more)
|
||||
- Preserve all original information (no summarization)
|
||||
- Default output location with overwrite capability
|
||||
- Custom output locations with overwrite protection
|
||||
|
||||
Default Behavior:
|
||||
- Output: ~/tmp/cleaned_transcript.md (allows overwrite)
|
||||
- Language: Auto-detected from source
|
||||
- Format: Well-structured markdown with headings
|
||||
|
||||
Custom Output:
|
||||
- Cannot overwrite existing files (raises FileExistsError)
|
||||
- Creates parent directories automatically
|
||||
|
||||
Text Cleanup Operations:
|
||||
- Remove filler words: um, uh, like, you know, 呃, 啊, 那个, 就是说
|
||||
- Fix obvious noun errors and typos
|
||||
- Correct grammar mistakes
|
||||
- Convert spoken fragments to complete sentences
|
||||
- Add descriptive section headings
|
||||
- Organize content into semantic paragraphs
|
||||
|
||||
Example Usage:
|
||||
# Use default output location (allows overwrite)
|
||||
$ python audio_transcription_cleanup.py --transcript-file /path/to/transcript.txt
|
||||
|
||||
# Specify custom output location (cannot overwrite existing file)
|
||||
$ python audio_transcription_cleanup.py --transcript-file /path/to/transcript.txt --output ~/Documents/cleaned.md
|
||||
|
||||
Requirements:
|
||||
- Claude CLI must be installed and accessible
|
||||
- Transcript file must exist at specified path
|
||||
|
||||
Author: sanhe
|
||||
Plugin: youtube@sanhe-claude-code-plugins
|
||||
"""
|
||||
|
||||
import subprocess
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
|
||||
dir_home = Path.home()
|
||||
path_cleaned_transcript = dir_home / "tmp" / "cleaned_transcript.md"
|
||||
|
||||
prompt = """
|
||||
## Task
|
||||
Transform the messy voice transcription text provided below into a well-formatted, human-readable document while preserving ALL original meaning and content.
|
||||
|
||||
## Key Principles
|
||||
- **Preserve original meaning**: Do not summarize or omit information
|
||||
- **Fix transcription artifacts**: Remove filler words, false starts, and repetitions
|
||||
- **Improve readability**: Organize into proper paragraphs with clear structure
|
||||
- **Handle multi-speaker content**: Clearly attribute dialogue when multiple speakers are present
|
||||
- **Enhance document structure**: Add semantic paragraphing and descriptive headings
|
||||
|
||||
## Processing Instructions
|
||||
|
||||
### Step 1: Analyze the Input
|
||||
Examine the transcription to identify:
|
||||
- **Primary language**: Determine the dominant language of the transcription
|
||||
- **Number of speakers**: Single vs. multi-speaker content
|
||||
- **Main topics or themes**: Identify distinct topics for sectioning
|
||||
- **Transcription quality issues**: Filler words, repetitions, false starts, obvious errors
|
||||
|
||||
### Step 2: Text Cleanup Operations
|
||||
|
||||
1. **Remove verbal artifacts**:
|
||||
- Filler words: "um", "uh", "like", "you know", "呃", "啊", "那个", "就是说", "嗯", "然后"
|
||||
- False starts and self-corrections
|
||||
- Unnecessary repetitions
|
||||
|
||||
2. **Fix errors and improve accuracy**:
|
||||
- **Correct obvious noun errors**: Fix misrecognized names, places, technical terms
|
||||
- **Fix spelling mistakes**: Correct typos and transcription errors
|
||||
- **Fix grammar errors**: Subject-verb agreement, tense consistency, word order
|
||||
- **Clarify ambiguous terms**: Use context to determine correct words when transcription is unclear
|
||||
|
||||
3. **Organize content structure**:
|
||||
- **Semantic paragraphing**: Group related ideas into logical paragraphs based on meaning
|
||||
- **Add section headings**: Create descriptive headings that summarize each section's content
|
||||
- Convert spoken fragments into complete sentences
|
||||
- Create logical flow between sections
|
||||
|
||||
4. **For multi-speaker content**:
|
||||
- Use clear speaker labels (Speaker A, B, or actual names if identified)
|
||||
- Format as dialogue or meeting notes
|
||||
- Preserve conversational context
|
||||
|
||||
### Step 3: Language and Formatting
|
||||
|
||||
**Language Selection Rules** (in priority order):
|
||||
1. If user explicitly specifies output language → use that language
|
||||
2. Otherwise → use the primary language of the original transcription
|
||||
3. Preserve technical terms and proper nouns in their original language/form
|
||||
|
||||
**For single-speaker content**:
|
||||
```markdown
|
||||
# [Main Topic Title]
|
||||
|
||||
## [Section 1 Heading]
|
||||
|
||||
[Cleaned paragraph 1 content...]
|
||||
|
||||
[Cleaned paragraph 2 content...]
|
||||
|
||||
## [Section 2 Heading]
|
||||
|
||||
[Cleaned content...]
|
||||
```
|
||||
|
||||
**For multi-speaker content** - Option 1 (Dialogue format):
|
||||
```markdown
|
||||
# [Meeting/Discussion Title]
|
||||
|
||||
## [Topic 1]
|
||||
|
||||
**Speaker A:** [cleaned content]
|
||||
|
||||
**Speaker B:** [cleaned content]
|
||||
|
||||
## [Topic 2]
|
||||
|
||||
**Speaker A:** [cleaned content]
|
||||
```
|
||||
|
||||
**For multi-speaker content** - Option 2 (Meeting notes format):
|
||||
```markdown
|
||||
# [Meeting Title]
|
||||
|
||||
## [Discussion Topic 1]
|
||||
|
||||
Speaker A mentioned that [cleaned content]...
|
||||
|
||||
Speaker B responded by explaining [cleaned content]...
|
||||
|
||||
## [Discussion Topic 2]
|
||||
|
||||
The group discussed [cleaned content]...
|
||||
```
|
||||
|
||||
## Important Constraints
|
||||
- **NEVER summarize**: Include all information from original transcription
|
||||
- **NEVER add new substantive information**: Only reorganize and clarify existing content
|
||||
- **NEVER change the speaker's meaning or intent**
|
||||
- **DO add section headings**: These are structural additions that help readability
|
||||
- **Preserve technical terms and specific names** exactly as intended (fix transcription errors only)
|
||||
|
||||
## Heading Guidelines
|
||||
- Section headings should be **concise and descriptive** (3-8 words typically)
|
||||
- Headings should reflect the **actual content** of each section
|
||||
- Use appropriate heading levels (H1 for main title, H2 for major sections, H3 for subsections)
|
||||
- Headings are added to **improve navigation**, not to interpret or editorialize
|
||||
|
||||
## Language Support
|
||||
This skill works with transcriptions in any language:
|
||||
- **Chinese** (Mandarin, Cantonese, etc.)
|
||||
- **English**
|
||||
- **Mixed language** content (preserve code-switching naturally)
|
||||
- **Other languages** (apply language-appropriate grammar rules)
|
||||
|
||||
## OUTPUT INSTRUCTIONS
|
||||
|
||||
**CRITICAL**: Your response must contain ONLY the cleaned markdown document.
|
||||
|
||||
**DO NOT include**:
|
||||
- Any explanations before the document
|
||||
- Any explanations after the document
|
||||
- Any meta-commentary about the cleaning process
|
||||
- Any acknowledgments like "Here is the cleaned version"
|
||||
- Any markdown code fences (no ```markdown blocks)
|
||||
- Any introductory or concluding remarks
|
||||
|
||||
**START your response immediately with the H1 title** (# [Title]) and **END immediately after the last content paragraph**.
|
||||
|
||||
Your entire response = the cleaned document itself, nothing more.
|
||||
|
||||
## INPUT TRANSCRIPTION:
|
||||
""".strip()
|
||||
|
||||
|
||||
def cleanup_transcript(path_transcript: Path):
|
||||
"""
|
||||
Clean up audio transcription and save to file.
|
||||
|
||||
Args:
|
||||
path_transcript: Path to the raw transcript file
|
||||
"""
|
||||
# Read transcript content
|
||||
content = path_transcript.read_text(encoding="utf-8")
|
||||
|
||||
# Call Claude CLI to process the transcript
|
||||
args = [
|
||||
"claude",
|
||||
"--append-system-prompt",
|
||||
prompt,
|
||||
"--print",
|
||||
content,
|
||||
]
|
||||
|
||||
result = subprocess.run(
|
||||
args,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
raise RuntimeError(f"Claude CLI failed: {result.stderr}")
|
||||
|
||||
# Save the cleaned output
|
||||
cleaned_transcript = result.stdout
|
||||
return cleaned_transcript
|
||||
|
||||
|
||||
def main():
|
||||
"""
|
||||
Main CLI entry point for cleaning up audio transcriptions.
|
||||
|
||||
This script transforms messy voice transcription text into well-formatted,
|
||||
human-readable markdown documents while preserving the original meaning.
|
||||
|
||||
Example usage:
|
||||
python audio_transcription_cleanup.py --transcript-file "/path/to/transcript.txt"
|
||||
|
||||
What it does:
|
||||
- Removes filler words and verbal artifacts (um, uh, like, etc.)
|
||||
- Fixes obvious spelling and grammar errors
|
||||
- Adds semantic paragraph breaks and section headings
|
||||
- Preserves all original information (no summarization)
|
||||
- Maintains the speaker's meaning and intent
|
||||
|
||||
Requirements:
|
||||
- Transcript file must exist
|
||||
- Claude CLI must be installed and accessible
|
||||
"""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Clean up audio transcriptions into well-formatted markdown documents",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Use default output location (allows overwrite)
|
||||
%(prog)s --transcript-file "~/tmp/transcript.txt"
|
||||
|
||||
# Specify custom output location (cannot overwrite existing file)
|
||||
%(prog)s --transcript-file "~/tmp/transcript.txt" --output "~/Documents/cleaned.md"
|
||||
|
||||
Cleanup Operations:
|
||||
- Remove verbal artifacts: um, uh, like, you know, 呃, 啊, 那个
|
||||
- Fix spelling and grammar errors
|
||||
- Add semantic paragraphs and section headings
|
||||
- Convert spoken fragments into complete sentences
|
||||
- Preserve all original information
|
||||
|
||||
Output Behavior:
|
||||
- Default location (~/tmp/cleaned_transcript.md): Allows overwrite
|
||||
- Custom location: Cannot overwrite existing files (will raise error)
|
||||
|
||||
Note:
|
||||
This script uses Claude CLI to perform intelligent transcript cleanup.
|
||||
All original information is preserved - no summarization occurs.
|
||||
""",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--transcript-file",
|
||||
type=str,
|
||||
required=True,
|
||||
help=f"Path to the transcript file to clean up (required)",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
type=str,
|
||||
default=None,
|
||||
help=f"Output file path for cleaned transcript (default: {path_cleaned_transcript}). Note: Default location allows overwrite, custom locations cannot overwrite existing files.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Convert to Path objects and expand user home directory
|
||||
path_transcript = Path(args.transcript_file).expanduser()
|
||||
|
||||
# Determine output path
|
||||
if args.output:
|
||||
path_output = Path(args.output).expanduser()
|
||||
# For custom locations, check if file already exists
|
||||
if (path_output != path_cleaned_transcript) and path_output.exists():
|
||||
raise FileExistsError(
|
||||
f"Output file already exists at {path_output}. "
|
||||
f"Please choose a different location or remove the existing file. "
|
||||
f"(Default location {path_cleaned_transcript} allows overwrite)"
|
||||
)
|
||||
else:
|
||||
# Use default location (allows overwrite)
|
||||
path_output = path_cleaned_transcript
|
||||
|
||||
# Ensure output directory exists
|
||||
path_output.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Clean up the transcript
|
||||
cleaned_transcript = cleanup_transcript(
|
||||
path_transcript=path_transcript,
|
||||
)
|
||||
|
||||
# Save to output file
|
||||
path_output.write_text(cleaned_transcript, encoding="utf-8")
|
||||
|
||||
# Print success message with clickable file path
|
||||
absolute_path = path_output.resolve()
|
||||
print(f"✓ Transcript cleanup completed successfully!")
|
||||
print(f"✓ Cleaned transcript saved to: file://{absolute_path}")
|
||||
print(f"\nClick the link above to open the document.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
56
skills/enrich-citations/SKILL.md
Normal file
56
skills/enrich-citations/SKILL.md
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
name: enrich-citations
|
||||
description: Find and add authoritative source links for all facts, citations, and references in markdown documents
|
||||
---
|
||||
|
||||
# Enrich Citations
|
||||
|
||||
Enhance markdown documents by finding and adding authoritative source links for mentioned facts, tools, products, research, and references.
|
||||
|
||||
## Usage
|
||||
|
||||
Use the `enrich_citations.py` script to process markdown documents:
|
||||
|
||||
```bash
|
||||
# Use default output location (~/tmp/citation_enriched.md - allows overwrite)
|
||||
python scripts/enrich_citations.py --document-file /path/to/document.md
|
||||
|
||||
# Specify custom output location (cannot overwrite existing files)
|
||||
python scripts/enrich_citations.py --document-file /path/to/document.md --output /path/to/output.md
|
||||
```
|
||||
|
||||
## What It Does
|
||||
|
||||
The script automatically:
|
||||
- Identifies all references (tools, research, products, organizations, people, standards)
|
||||
- Performs web search to find authoritative sources
|
||||
- Adds markdown hyperlinks with proper spacing: `[Reference](URL)`
|
||||
- Verifies all URLs are valid and accessible
|
||||
- Preserves all original content (only adds hyperlinks, no text changes)
|
||||
- Prioritizes official sources and documentation
|
||||
|
||||
## Options
|
||||
|
||||
- `--document-file` (required) - Path to the markdown document to enrich
|
||||
- `--output` (optional) - Custom output path (default: `~/tmp/citation_enriched.md`)
|
||||
|
||||
## Output Behavior
|
||||
|
||||
- **Default location**: `~/tmp/citation_enriched.md` - Allows overwrite
|
||||
- **Custom location**: Cannot overwrite existing files (raises error if file exists)
|
||||
|
||||
## Reference Types Identified
|
||||
|
||||
- External sources and research papers
|
||||
- Tools, software, frameworks, libraries
|
||||
- Products and services
|
||||
- Organizations and institutions
|
||||
- Technical concepts and standards (RFC, W3C, APIs)
|
||||
- People and experts
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.11+
|
||||
- Claude CLI must be installed and accessible
|
||||
- Internet connection (for web searches)
|
||||
- Document file must exist at specified path
|
||||
323
skills/enrich-citations/scripts/enrich_citations.py
Normal file
323
skills/enrich-citations/scripts/enrich_citations.py
Normal file
@@ -0,0 +1,323 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Example usage::
|
||||
|
||||
# Use default output location (allows overwrite)
|
||||
python /Users/sanhehu/Documents/GitHub/sanhe-claude-code-plugins/plugins/social-media-network/youtube/skills/enrich-citations/scripts/enrich_citations.py --document-file /Users/sanhehu/tmp/cleaned_transcript.md
|
||||
|
||||
# Specify custom output location (cannot overwrite existing file)
|
||||
python /Users/sanhehu/Documents/GitHub/sanhe-claude-code-plugins/plugins/social-media-network/youtube/skills/enrich-citations/scripts/enrich_citations.py --document-file /Users/sanhehu/tmp/cleaned_transcript.md --output ~/Documents/enriched_document.md
|
||||
"""
|
||||
|
||||
import subprocess
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
|
||||
dir_home = Path.home()
|
||||
path_enriched_document = dir_home / "tmp" / "citation_enriched.md"
|
||||
|
||||
prompt = """
|
||||
## Task
|
||||
Enhance the markdown document provided below by finding and adding authoritative source links for all mentioned facts, citations, tools, products, research, and references.
|
||||
|
||||
## Key Principles
|
||||
- **Find authoritative sources**: Search for the original, most credible source for each reference
|
||||
- **Verify accuracy**: Ensure links point to the correct, current information
|
||||
- **Preserve content**: Only add hyperlinks, don't change the document's text or meaning
|
||||
- **Proper formatting**: Use markdown hyperlink syntax with spaces for proper rendering
|
||||
|
||||
## Processing Instructions
|
||||
|
||||
### Step 1: Identify References
|
||||
|
||||
Read through the entire markdown document and identify all mentions of:
|
||||
|
||||
1. **External sources and research**:
|
||||
- Academic papers, studies, reports
|
||||
- Books, articles, blog posts
|
||||
- Research findings or statistics
|
||||
|
||||
2. **Tools and software**:
|
||||
- Applications, frameworks, libraries
|
||||
- Programming languages, platforms
|
||||
- Development tools, services
|
||||
|
||||
3. **Products and services**:
|
||||
- Commercial products
|
||||
- SaaS platforms
|
||||
- Hardware or software products
|
||||
|
||||
4. **Organizations and institutions**:
|
||||
- Companies, universities
|
||||
- Standards bodies, foundations
|
||||
- Government agencies
|
||||
|
||||
5. **Technical concepts and standards**:
|
||||
- Specifications (RFC, W3C, etc.)
|
||||
- APIs, protocols
|
||||
- Industry standards
|
||||
|
||||
6. **People and experts**:
|
||||
- Authors, researchers
|
||||
- Industry leaders
|
||||
- Subject matter experts
|
||||
|
||||
### Step 2: Web Search for Sources
|
||||
|
||||
For **each identified reference**:
|
||||
|
||||
1. **Search for the authoritative source**:
|
||||
- Use web search to find the official or most credible source
|
||||
- Prioritize: official websites > documentation > reputable publications
|
||||
- For academic references: search for DOI, arXiv, official publication
|
||||
|
||||
2. **Verify the source**:
|
||||
- Ensure the URL is current and accessible
|
||||
- Check that content matches the reference
|
||||
- Confirm it's the most authoritative source available
|
||||
|
||||
3. **Select the best link**:
|
||||
- Official documentation or homepage for tools/products
|
||||
- Original publication for research/articles
|
||||
- Wikipedia for general concepts (if no better source exists)
|
||||
- GitHub repository for open source projects
|
||||
|
||||
### Step 3: Add Hyperlinks
|
||||
|
||||
Replace plain text references with markdown hyperlinks:
|
||||
|
||||
**Format:** `[Reference Text](URL)`
|
||||
|
||||
**Critical formatting requirements:**
|
||||
- **Add spaces around hyperlinks**: `text [link](url) text` (not `text[link](url)text`)
|
||||
- Use descriptive link text (the actual reference name, not "click here")
|
||||
- Maintain the original sentence structure
|
||||
- Preserve all other content unchanged
|
||||
|
||||
**Examples:**
|
||||
|
||||
Before:
|
||||
```
|
||||
According to the 2023 Stack Overflow Survey, JavaScript is the most popular language.
|
||||
```
|
||||
|
||||
After:
|
||||
```
|
||||
According to the [2023 Stack Overflow Survey](https://survey.stackoverflow.co/2023/) , JavaScript is the most popular language.
|
||||
```
|
||||
|
||||
Before (Chinese):
|
||||
```
|
||||
文章提到了GPT-4的能力提升
|
||||
```
|
||||
|
||||
After (Chinese):
|
||||
```
|
||||
文章提到了 [GPT-4](https://openai.com/gpt-4) 的能力提升
|
||||
```
|
||||
|
||||
### Step 4: Quality Assurance
|
||||
|
||||
Before finalizing:
|
||||
|
||||
1. **Verify all links**:
|
||||
- Test that URLs are valid and accessible
|
||||
- Check for broken links or redirects
|
||||
- Ensure HTTPS when available
|
||||
|
||||
2. **Check formatting**:
|
||||
- Confirm spaces around hyperlinks
|
||||
- Verify markdown syntax is correct
|
||||
- Ensure links render properly
|
||||
|
||||
3. **Review accuracy**:
|
||||
- Links point to correct resources
|
||||
- No duplicate or conflicting sources
|
||||
- All identified references are addressed
|
||||
|
||||
## Important Constraints
|
||||
|
||||
- **NEVER change the document content**: Only add hyperlinks to existing text
|
||||
- **NEVER add new information**: Don't insert citations not mentioned in the original
|
||||
- **NEVER remove existing content**: Preserve all original text
|
||||
- **NEVER summarize or paraphrase**: Keep exact wording
|
||||
- **DO add spaces around hyperlinks**: Essential for proper rendering
|
||||
- **DO verify all sources**: Ensure accuracy and accessibility
|
||||
|
||||
## Source Priority Guidelines
|
||||
|
||||
When multiple sources exist, prioritize in this order:
|
||||
|
||||
1. **Official sources**: Project homepages, official documentation
|
||||
2. **Primary sources**: Original research papers, first publications
|
||||
3. **Authoritative organizations**: Standards bodies, academic institutions
|
||||
4. **Reputable publications**: Well-known tech blogs, news sites
|
||||
5. **Community resources**: GitHub, Stack Overflow (for code/tools)
|
||||
6. **General references**: Wikipedia, encyclopedias (as last resort)
|
||||
|
||||
## Language Support
|
||||
|
||||
This skill works with documents in any language:
|
||||
- Respect the document's primary language
|
||||
- Use appropriate sources for the language/region
|
||||
- Maintain original text while adding hyperlinks
|
||||
- For multilingual documents, find sources matching each language section when appropriate
|
||||
|
||||
## OUTPUT INSTRUCTIONS
|
||||
|
||||
**CRITICAL**: Your response must contain ONLY the enhanced markdown document with citations added.
|
||||
|
||||
**DO NOT include**:
|
||||
- Any explanations before the document
|
||||
- Any explanations after the document
|
||||
- Any meta-commentary about the enrichment process
|
||||
- Any acknowledgments like "Here is the enriched version"
|
||||
- Any markdown code fences (no ```markdown blocks)
|
||||
- Any introductory or concluding remarks
|
||||
|
||||
**START your response immediately with the first line of the document** and **END immediately after the last line**.
|
||||
|
||||
Your entire response = the enriched document itself, nothing more.
|
||||
|
||||
## INPUT DOCUMENT:
|
||||
""".strip()
|
||||
|
||||
|
||||
def enrich_citations(path_document: Path):
|
||||
"""
|
||||
Enrich markdown document with authoritative citation links.
|
||||
|
||||
Args:
|
||||
path_document: Path to the markdown document file
|
||||
"""
|
||||
# Read document content
|
||||
content = path_document.read_text(encoding="utf-8")
|
||||
|
||||
# Call Claude CLI to process the document
|
||||
args = [
|
||||
"claude",
|
||||
"--append-system-prompt",
|
||||
prompt,
|
||||
"--print",
|
||||
content,
|
||||
]
|
||||
|
||||
result = subprocess.run(
|
||||
args,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
|
||||
if result.returncode != 0:
|
||||
raise RuntimeError(f"Claude CLI failed: {result.stderr}")
|
||||
|
||||
# Return the enriched document
|
||||
enriched_document = result.stdout
|
||||
return enriched_document
|
||||
|
||||
|
||||
def main():
|
||||
"""
|
||||
Main CLI entry point for enriching citations in markdown documents.
|
||||
|
||||
This script enhances markdown documents by finding and adding authoritative
|
||||
source links for all mentioned facts, citations, and references.
|
||||
|
||||
Example usage:
|
||||
python enrich_citations.py --document-file "/path/to/document.md"
|
||||
|
||||
What it does:
|
||||
- Identifies all references (tools, research, products, people, etc.)
|
||||
- Performs web search to find authoritative sources
|
||||
- Adds markdown hyperlinks with proper spacing
|
||||
- Verifies all URLs are valid and current
|
||||
- Preserves all original content (only adds links)
|
||||
|
||||
Requirements:
|
||||
- Document file must exist
|
||||
- Claude CLI must be installed and accessible
|
||||
- Internet connection required for web searches
|
||||
"""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Enrich markdown documents with authoritative citation links",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Use default output location (allows overwrite)
|
||||
%(prog)s --document-file "~/tmp/cleaned_transcript.md"
|
||||
|
||||
# Specify custom output location (cannot overwrite existing file)
|
||||
%(prog)s --document-file "~/tmp/cleaned_transcript.md" --output "~/Documents/enriched.md"
|
||||
|
||||
Citation Enrichment:
|
||||
- Identifies facts, tools, products, research, and references
|
||||
- Web search to find authoritative sources
|
||||
- Adds markdown hyperlinks: [Title](URL)
|
||||
- Verifies all links are valid and current
|
||||
- Maintains proper spacing around hyperlinks
|
||||
- Preserves original document content
|
||||
|
||||
Output Behavior:
|
||||
- Default location (~/tmp/citation_enriched.md): Allows overwrite
|
||||
- Custom location: Cannot overwrite existing files (will raise error)
|
||||
|
||||
Note:
|
||||
This script uses Claude CLI with web search to perform intelligent citation enrichment.
|
||||
All original content is preserved - only hyperlinks are added.
|
||||
""",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--document-file",
|
||||
type=str,
|
||||
required=True,
|
||||
help=f"Path to the markdown document to enrich (required)",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
type=str,
|
||||
default=None,
|
||||
help=f"Output file path for enriched document (default: {path_enriched_document}). Note: Default location allows overwrite, custom locations cannot overwrite existing files.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Convert to Path objects and expand user home directory
|
||||
path_document = Path(args.document_file).expanduser()
|
||||
|
||||
# Determine output path
|
||||
if args.output:
|
||||
path_output = Path(args.output).expanduser()
|
||||
# For custom locations, check if file already exists
|
||||
if (path_output != path_enriched_document) and path_output.exists():
|
||||
raise FileExistsError(
|
||||
f"Output file already exists at {path_output}. "
|
||||
f"Please choose a different location or remove the existing file. "
|
||||
f"(Default location {path_enriched_document} allows overwrite)"
|
||||
)
|
||||
else:
|
||||
# Use default location (allows overwrite)
|
||||
path_output = path_enriched_document
|
||||
|
||||
# Ensure output directory exists
|
||||
path_output.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Enrich citations in the document
|
||||
enriched_document = enrich_citations(
|
||||
path_document=path_document,
|
||||
)
|
||||
|
||||
# Save to output file
|
||||
path_output.write_text(enriched_document, encoding="utf-8")
|
||||
|
||||
# Print success message with clickable file path
|
||||
absolute_path = path_output.resolve()
|
||||
print(f"✓ Citation enrichment completed successfully!")
|
||||
print(f"✓ Enriched document saved to: file://{absolute_path}")
|
||||
print(f"\nClick the link above to open the document.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
45
skills/transcribe-audio-to-text/SKILL.md
Normal file
45
skills/transcribe-audio-to-text/SKILL.md
Normal file
@@ -0,0 +1,45 @@
|
||||
---
|
||||
name: transcribe-audio-to-text
|
||||
description: Transcribe audio files to text using audinota cli
|
||||
---
|
||||
|
||||
# Transcribe Audio to Text
|
||||
|
||||
Convert audio files to text using audinota transcription service.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
# Use default audio file (output from youtube-video-to-audio skill)
|
||||
python scripts/transcribe_audio.py
|
||||
|
||||
# Specify custom audio file
|
||||
python scripts/transcribe_audio.py --audio-file-path "/path/to/audio.mp3"
|
||||
```
|
||||
|
||||
## Options
|
||||
|
||||
- `--audio-file-path` - Path to audio file to transcribe (default: ~/tmp/download_audio_result.mp3)
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
|
||||
# Transcribe default audio file
|
||||
python scripts/transcribe_audio.py
|
||||
|
||||
# Transcribe custom audio file
|
||||
python scripts/transcribe_audio.py --audio-file-path "~/Music/podcast.mp3"
|
||||
|
||||
# Get help
|
||||
python scripts/transcribe_audio.py -h
|
||||
```
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.11+
|
||||
- audinota installed at `~/Documents/GitHub/audinota-project/.venv/bin/audinota`
|
||||
- Audio file must exist at specified path
|
||||
|
||||
## Integration
|
||||
|
||||
This skill works seamlessly with the `youtube-video-to-audio` skill. By default, it transcribes the audio file downloaded by that skill at `~/tmp/download_audio_result.mp3`.
|
||||
181
skills/transcribe-audio-to-text/scripts/transcribe_audio.py
Normal file
181
skills/transcribe-audio-to-text/scripts/transcribe_audio.py
Normal file
@@ -0,0 +1,181 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Audio Transcription Script
|
||||
|
||||
This module provides functionality to transcribe audio files to text using the audinota CLI.
|
||||
It supports both default and custom output locations, with built-in protection against
|
||||
accidentally overwriting existing files at custom locations.
|
||||
|
||||
Key Features:
|
||||
- Transcribe audio files using audinota
|
||||
- Default output location with overwrite capability
|
||||
- Custom output locations with overwrite protection
|
||||
- Automatic directory creation for custom paths
|
||||
- Integration with youtube-video-to-audio skill workflow
|
||||
|
||||
Default Behavior:
|
||||
- Input: ~/tmp/download_audio_result.mp3
|
||||
- Output: ~/tmp/download_audio_result.txt (allows overwrite)
|
||||
|
||||
Custom Output:
|
||||
- Cannot overwrite existing files (raises FileExistsError)
|
||||
- Creates parent directories automatically
|
||||
|
||||
Example Usage:
|
||||
# Use defaults (allows overwrite)
|
||||
$ python transcribe_audio.py
|
||||
|
||||
# Custom audio file
|
||||
$ python transcribe_audio.py --audio-file-path "/path/to/audio.mp3"
|
||||
|
||||
# Custom output (cannot overwrite existing)
|
||||
$ python transcribe_audio.py --audio-file-path "/path/to/audio.mp3" --output "~/Documents/transcript.txt"
|
||||
|
||||
Requirements:
|
||||
- audinota installed at ~/Documents/GitHub/audinota-project/.venv/bin/audinota
|
||||
- Audio file must exist at specified path
|
||||
|
||||
Author: sanhe
|
||||
Plugin: youtube@sanhe-claude-code-plugins
|
||||
"""
|
||||
|
||||
import subprocess
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
|
||||
dir_home = Path.home()
|
||||
default_audio_path = dir_home / "tmp" / "download_audio_result.mp3"
|
||||
default_transcript_path = dir_home / "tmp" / "download_audio_result.txt"
|
||||
|
||||
|
||||
def transcribe_audio(path_audio: Path, path_output: Path = None):
|
||||
"""
|
||||
Transcribe an audio file to text using audinota.
|
||||
|
||||
Args:
|
||||
path_audio: Path to the audio file to transcribe
|
||||
path_output: Path to save the transcript (optional, audinota will use default if not specified)
|
||||
|
||||
Raises:
|
||||
subprocess.CalledProcessError: If audinota transcription fails
|
||||
"""
|
||||
path_audinota = (
|
||||
dir_home
|
||||
/ "Documents"
|
||||
/ "GitHub"
|
||||
/ "audinota-project"
|
||||
/ ".venv"
|
||||
/ "bin"
|
||||
/ "audinota"
|
||||
)
|
||||
|
||||
if not path_audinota.exists():
|
||||
raise FileNotFoundError(f"audinota not found at {path_audinota}")
|
||||
|
||||
if not path_audio.exists():
|
||||
raise FileNotFoundError(f"Audio file not found at {path_audio}")
|
||||
|
||||
args = [
|
||||
str(path_audinota),
|
||||
"transcribe",
|
||||
"--input",
|
||||
str(path_audio),
|
||||
]
|
||||
|
||||
# Add output argument if specified
|
||||
if path_output:
|
||||
args.extend(["--output", str(path_output)])
|
||||
|
||||
print(f"Transcribing audio file: {path_audio}")
|
||||
subprocess.run(args, check=True)
|
||||
|
||||
output_location = path_output if path_output else "default location"
|
||||
print(f"✓ Transcription completed successfully")
|
||||
print(f"✓ Saved to: {output_location}")
|
||||
|
||||
|
||||
def main():
|
||||
"""
|
||||
Main CLI entry point for transcribing audio files to text.
|
||||
|
||||
This script uses audinota to transcribe audio files to text.
|
||||
By default, it transcribes the audio file downloaded by the youtube-video-to-audio skill.
|
||||
|
||||
Example usage:
|
||||
# Use default paths (allows overwrite)
|
||||
python transcribe_audio.py
|
||||
|
||||
# Specify custom audio file with default output
|
||||
python transcribe_audio.py --audio-file-path "/path/to/audio.mp3"
|
||||
|
||||
# Specify both custom audio and output (cannot overwrite existing file)
|
||||
python transcribe_audio.py --audio-file-path "/path/to/audio.mp3" --output "~/Documents/transcript.txt"
|
||||
|
||||
Requirements:
|
||||
- audinota must be installed at ~/Documents/GitHub/audinota-project/.venv/bin/audinota
|
||||
- Audio file must exist at the specified path
|
||||
|
||||
Note:
|
||||
- Default location (~/tmp/download_audio_result.txt) allows overwrite
|
||||
- Custom output locations cannot overwrite existing files
|
||||
"""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Transcribe audio files to text using audinota",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
%(prog)s
|
||||
%(prog)s --audio-file-path "/path/to/audio.mp3"
|
||||
%(prog)s --audio-file-path "~/Music/podcast.mp3" --output "~/Documents/transcript.txt"
|
||||
|
||||
Notes:
|
||||
- The default audio file path is the output of youtube-video-to-audio skill
|
||||
- Default output location (~/tmp/download_audio_result.txt) allows overwrite
|
||||
- Custom output locations cannot overwrite existing files
|
||||
- Requires audinota to be installed at ~/Documents/GitHub/audinota-project/.venv/bin/audinota
|
||||
""",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--audio-file-path",
|
||||
type=str,
|
||||
default=str(default_audio_path),
|
||||
help=f"Path to the audio file to transcribe (default: {default_audio_path})",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
type=str,
|
||||
default=None,
|
||||
help=f"Output file path for transcript (default: {default_transcript_path}). Note: Default location allows overwrite, custom locations cannot overwrite existing files.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Convert to Path object and expand user home directory
|
||||
audio_path = Path(args.audio_file_path).expanduser()
|
||||
|
||||
# Determine output path and check for existing files
|
||||
if args.output:
|
||||
path_output = Path(args.output).expanduser()
|
||||
# For custom locations, check if file already exists
|
||||
if (path_output != default_transcript_path) and (path_output.exists()):
|
||||
raise FileExistsError(
|
||||
f"Output file already exists at {path_output}. "
|
||||
f"Please choose a different location or remove the existing file. "
|
||||
f"(Default location {default_transcript_path} allows overwrite)"
|
||||
)
|
||||
# Ensure output directory exists
|
||||
path_output.parent.mkdir(parents=True, exist_ok=True)
|
||||
else:
|
||||
# Use default location (allows overwrite)
|
||||
path_output = default_transcript_path
|
||||
path_output.unlink(missing_ok=True)
|
||||
|
||||
# Transcribe the audio
|
||||
transcribe_audio(path_audio=audio_path, path_output=path_output)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
41
skills/youtube-video-to-audio/SKILL.md
Normal file
41
skills/youtube-video-to-audio/SKILL.md
Normal file
@@ -0,0 +1,41 @@
|
||||
---
|
||||
name: youtube-video-to-audio
|
||||
description: Download YouTube videos as audio files using yt-dlp
|
||||
---
|
||||
|
||||
# YouTube Video to Audio
|
||||
|
||||
Extract and download audio from YouTube videos.
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
python scripts/download_audio.py --video-url "https://www.youtube.com/watch?v=VIDEO_ID"
|
||||
```
|
||||
|
||||
## Options
|
||||
|
||||
- `--video-url` (required) - YouTube video URL to download
|
||||
- `--quality` - Quality selector (default: "bestaudio[abr<=64]/worstaudio")
|
||||
- `--audio-format` - Output format: mp3, aac, wav (default: mp3)
|
||||
- `--bitrate` - Audio bitrate: 64K, 128K, 192K, 320K (default: 64K)
|
||||
- `--output` - Custom output path (default: ~/tmp/download_audio_result.mp3)
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
|
||||
# Basic usage (default: 64K bitrate MP3 to ~/tmp/download_audio_result.mp3)
|
||||
python scripts/download_audio.py --video-url "https://www.youtube.com/watch?v=d6rZtgHcbWA"
|
||||
|
||||
# High quality with custom bitrate
|
||||
python scripts/download_audio.py --video-url "https://youtu.be/xyz" --quality "bestaudio" --bitrate "128K"
|
||||
|
||||
# Custom output location
|
||||
python scripts/download_audio.py --video-url "https://youtu.be/xyz" --output "/path/to/output.mp3"
|
||||
```
|
||||
|
||||
## Requirements
|
||||
|
||||
- Python 3.11+
|
||||
- ffmpeg installed at `~/ffmpeg`
|
||||
- yt-dlp (auto-downloaded on first run)
|
||||
290
skills/youtube-video-to-audio/scripts/download_audio.py
Normal file
290
skills/youtube-video-to-audio/scripts/download_audio.py
Normal file
@@ -0,0 +1,290 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
YouTube Audio Downloader
|
||||
|
||||
This module provides functionality to download audio from YouTube videos using yt-dlp.
|
||||
It supports cross-platform operation (Windows, macOS, Linux) and includes automatic
|
||||
yt-dlp binary management with built-in protection against overwriting existing files.
|
||||
|
||||
Key Features:
|
||||
- Download audio from YouTube videos as MP3 files
|
||||
- Automatic yt-dlp binary download and updates
|
||||
- Cross-platform support (Windows, macOS, Linux)
|
||||
- Customizable audio quality and bitrate
|
||||
- Default output location with overwrite capability
|
||||
- Custom output locations with overwrite protection
|
||||
- Automatic directory creation for custom paths
|
||||
|
||||
Default Behavior:
|
||||
- Output: ~/tmp/download_audio_result.mp3 (allows overwrite)
|
||||
- Quality: bestaudio[abr<=64]/worstaudio
|
||||
- Format: MP3
|
||||
- Bitrate: 64K
|
||||
|
||||
Custom Output:
|
||||
- Cannot overwrite existing files (raises FileExistsError)
|
||||
- Creates parent directories automatically
|
||||
|
||||
Example Usage:
|
||||
# Download with defaults (allows overwrite)
|
||||
$ python download_audio.py --video-url "https://www.youtube.com/watch?v=d6rZtgHcbWA"
|
||||
|
||||
# Custom quality and bitrate
|
||||
$ python download_audio.py --video-url "https://youtu.be/xyz" --quality "bestaudio" --bitrate "128K"
|
||||
|
||||
# Custom output location (cannot overwrite existing)
|
||||
$ python download_audio.py --video-url "https://youtu.be/xyz" --output "/path/to/audio.mp3"
|
||||
|
||||
Requirements:
|
||||
- ffmpeg installed at ~/ffmpeg
|
||||
- Internet connection for downloading yt-dlp and videos
|
||||
- yt-dlp will be automatically downloaded if not present
|
||||
|
||||
Platform Support:
|
||||
- Windows: Downloads yt-dlp.exe
|
||||
- macOS: Downloads yt-dlp_macos
|
||||
- Linux: Downloads yt-dlp_linux
|
||||
|
||||
Author: sanhe
|
||||
Plugin: youtube@sanhe-claude-code-plugins
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import json
|
||||
import subprocess
|
||||
import urllib.request
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
|
||||
# Detect OS
|
||||
IS_WINDOWS = False
|
||||
IS_MACOS = False
|
||||
IS_LINUX = False
|
||||
|
||||
_platform = sys.platform
|
||||
if _platform in ["win32", "cygwin"]:
|
||||
IS_WINDOWS = True
|
||||
elif _platform == "darwin":
|
||||
IS_MACOS = True
|
||||
elif _platform == "linux":
|
||||
IS_LINUX = True
|
||||
else:
|
||||
raise RuntimeError(f"Unsupported platform: {_platform}")
|
||||
|
||||
dir_home = Path.home()
|
||||
dir_tmp = dir_home / "tmp"
|
||||
try:
|
||||
dir_tmp.mkdir(exist_ok=True)
|
||||
except Exception: # pragma: no cover
|
||||
pass
|
||||
path_audio = dir_tmp / "download_audio_result.mp3"
|
||||
|
||||
if IS_WINDOWS:
|
||||
filename = "yt-dlp.exe"
|
||||
path_yt_dlp = dir_home / "yt-dlp.exe"
|
||||
elif IS_MACOS:
|
||||
filename = "yt-dlp_macos"
|
||||
path_yt_dlp = dir_home / "yt-dlp_macos"
|
||||
elif IS_LINUX:
|
||||
filename = "yt-dlp_linux"
|
||||
path_yt_dlp = dir_home / "yt-dlp_linux"
|
||||
else:
|
||||
raise RuntimeError("Unsupported operating system")
|
||||
|
||||
path_ffmpeg = dir_home / "ffmpeg"
|
||||
|
||||
|
||||
def get_latest_yt_dlp_release() -> str:
|
||||
"""Get the latest yt-dlp release version from GitHub API."""
|
||||
api_url = "https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest"
|
||||
|
||||
with urllib.request.urlopen(api_url) as response:
|
||||
data = json.loads(response.read().decode())
|
||||
tag_name = data["tag_name"]
|
||||
return tag_name
|
||||
|
||||
|
||||
def download_yt_dlp():
|
||||
"""Download the appropriate yt-dlp binary based on the OS."""
|
||||
# Get latest release version
|
||||
latest_version = get_latest_yt_dlp_release()
|
||||
print(f"Latest yt-dlp version: {latest_version}")
|
||||
|
||||
# Determine download URL and local path based on OS
|
||||
download_url = f"https://github.com/yt-dlp/yt-dlp/releases/download/{latest_version}/{filename}"
|
||||
|
||||
print(f"Downloading {filename} from {download_url}...")
|
||||
print(f"Saving to {path_yt_dlp}...")
|
||||
|
||||
# Download the file
|
||||
urllib.request.urlretrieve(download_url, path_yt_dlp)
|
||||
|
||||
# Set executable permissions for macOS and Linux
|
||||
if IS_MACOS or IS_LINUX:
|
||||
os.chmod(path_yt_dlp, 0o755)
|
||||
|
||||
print(f"Successfully downloaded yt-dlp to {path_yt_dlp}")
|
||||
|
||||
|
||||
def download_audio(
|
||||
video_url: str,
|
||||
quality: str = "bestaudio[abr<=64]/worstaudio",
|
||||
audio_format: str = "mp3",
|
||||
bitrate: str = "64K",
|
||||
output_path: str = None,
|
||||
):
|
||||
"""
|
||||
Download audio from a YouTube video using yt-dlp.
|
||||
|
||||
Args:
|
||||
video_url: YouTube video URL to download
|
||||
quality: yt-dlp quality selector (default: "bestaudio[abr<=64]/worstaudio")
|
||||
audio_format: Output audio format (default: "mp3")
|
||||
bitrate: Audio bitrate (default: "64K")
|
||||
output_path: Custom output path (default: ~/tmp/download_audio_result.mp3)
|
||||
|
||||
Returns:
|
||||
Path to the downloaded audio file
|
||||
"""
|
||||
output = output_path if output_path else str(path_audio)
|
||||
|
||||
args = [
|
||||
f"{path_yt_dlp}",
|
||||
"-f",
|
||||
quality,
|
||||
"--extract-audio",
|
||||
"--audio-format",
|
||||
audio_format,
|
||||
"--audio-quality",
|
||||
bitrate,
|
||||
"-o",
|
||||
output,
|
||||
"--restrict-filenames",
|
||||
"--ffmpeg-location",
|
||||
f"{str(path_ffmpeg)}",
|
||||
video_url,
|
||||
]
|
||||
result = subprocess.run(args, check=True, capture_output=True)
|
||||
return output
|
||||
|
||||
|
||||
def main():
|
||||
"""
|
||||
Main CLI entry point for downloading YouTube audio.
|
||||
|
||||
This script downloads audio from YouTube videos using yt-dlp and ffmpeg.
|
||||
It automatically downloads yt-dlp if not present in the home directory.
|
||||
|
||||
Example usage:
|
||||
# Use default output location (allows overwrite)
|
||||
python download_audio.py --video-url "https://www.youtube.com/watch?v=d6rZtgHcbWA"
|
||||
|
||||
# Specify custom quality and bitrate
|
||||
python download_audio.py --video-url "https://youtu.be/xyz" --quality "bestaudio" --bitrate "128K"
|
||||
|
||||
# Specify custom output location (cannot overwrite existing file)
|
||||
python download_audio.py --video-url "https://youtu.be/xyz" --output "/path/to/output.mp3"
|
||||
|
||||
Requirements:
|
||||
- ffmpeg must be installed at ~/ffmpeg
|
||||
- yt-dlp will be automatically downloaded to home directory if not present
|
||||
|
||||
Note:
|
||||
- Default location (~/tmp/download_audio_result.mp3) allows overwrite
|
||||
- Custom output locations cannot overwrite existing files
|
||||
"""
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Download audio from YouTube videos as MP3 files using yt-dlp",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
%(prog)s --video-url "https://www.youtube.com/watch?v=d6rZtgHcbWA"
|
||||
%(prog)s --video-url "https://youtu.be/xyz" --quality "bestaudio" --bitrate "128K"
|
||||
%(prog)s --video-url "https://youtu.be/xyz" --output "/custom/path/audio.mp3"
|
||||
|
||||
Quality options:
|
||||
bestaudio - Best quality audio available
|
||||
bestaudio[abr<=64] - Best audio with max 64kbps bitrate
|
||||
worstaudio - Lowest quality audio (smallest file)
|
||||
""",
|
||||
)
|
||||
|
||||
# Required arguments
|
||||
parser.add_argument(
|
||||
"--video-url",
|
||||
type=str,
|
||||
required=True,
|
||||
help="YouTube video URL to download (required)",
|
||||
)
|
||||
|
||||
# Optional arguments for yt-dlp customization
|
||||
parser.add_argument(
|
||||
"--quality",
|
||||
type=str,
|
||||
default="bestaudio[abr<=64]/worstaudio",
|
||||
help='Quality selector for yt-dlp (default: "bestaudio[abr<=64]/worstaudio")',
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--audio-format",
|
||||
type=str,
|
||||
default="mp3",
|
||||
help="Output audio format: mp3, aac, wav, etc. (default: mp3)",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--bitrate",
|
||||
type=str,
|
||||
default="64K",
|
||||
help="Audio bitrate: 64K, 128K, 192K, 320K, etc. (default: 64K)",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--output",
|
||||
type=str,
|
||||
default=None,
|
||||
help=f"Custom output file path (default: {path_audio}). Note: Default location allows overwrite, custom locations cannot overwrite existing files.",
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Ensure yt-dlp is downloaded
|
||||
if not path_yt_dlp.exists():
|
||||
print("yt-dlp not found, downloading...")
|
||||
download_yt_dlp()
|
||||
|
||||
# Determine output path and check for existing files
|
||||
if args.output:
|
||||
path_output = Path(args.output).expanduser()
|
||||
# For custom locations, check if file already exists
|
||||
if (path_output != path_audio) and path_output.exists():
|
||||
raise FileExistsError(
|
||||
f"Output file already exists at {path_output}. "
|
||||
f"Please choose a different location or remove the existing file. "
|
||||
f"(Default location {path_audio} allows overwrite)"
|
||||
)
|
||||
# Ensure output directory exists
|
||||
path_output.parent.mkdir(parents=True, exist_ok=True)
|
||||
output_path_str = str(path_output)
|
||||
else:
|
||||
# Use default location (allows overwrite)
|
||||
path_audio.unlink(missing_ok=True)
|
||||
output_path_str = None
|
||||
|
||||
# Download the audio
|
||||
output_file = download_audio(
|
||||
video_url=args.video_url,
|
||||
quality=args.quality,
|
||||
audio_format=args.audio_format,
|
||||
bitrate=args.bitrate,
|
||||
output_path=output_path_str,
|
||||
)
|
||||
|
||||
print(f"✓ Audio successfully downloaded from {args.video_url}")
|
||||
print(f"✓ Saved to: {output_file}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
Reference in New Issue
Block a user