Initial commit

Zhongwei Li
2025-11-30 08:32:01 +08:00
commit e7f6367823
8 changed files with 2219 additions and 0 deletions

.claude-plugin/plugin.json Normal file

@@ -0,0 +1,16 @@
{
"name": "content-productivity-skills",
"description": "Content creation and productivity skills including article extraction, YouTube transcripts, research writing, and workflow optimization",
"version": "0.0.0-2025.11.28",
"author": {
"name": "ando",
"email": "ando@kivilaid.ee"
},
"skills": [
"./skills/article-extractor",
"./skills/youtube-transcript",
"./skills/content-research-writer",
"./skills/tapestry",
"./skills/ship-learn-next"
]
}

README.md Normal file

@@ -0,0 +1,3 @@
# content-productivity-skills
Content creation and productivity skills including article extraction, YouTube transcripts, research writing, and workflow optimization

plugin.lock.json Normal file

@@ -0,0 +1,60 @@
{
"$schema": "internal://schemas/plugin.lock.v1.json",
"pluginId": "gh:kivilaid/plugin-marketplace:content-productivity-skills",
"normalized": {
"repo": null,
"ref": "refs/tags/v20251128.0",
"commit": "418c6fd2f3f4c168bd9ca05c4eb21ac9d745c19a",
"treeHash": "cc123af981a1872e87bcfdc541f42a084be20f6cbea91810f9c8a36fd750e727",
"generatedAt": "2025-11-28T10:19:35.618440Z",
"toolVersion": "publish_plugins.py@0.2.0"
},
"origin": {
"remote": "git@github.com:zhongweili/42plugin-data.git",
"branch": "master",
"commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390",
"repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data"
},
"manifest": {
"name": "content-productivity-skills",
"description": "Content creation and productivity skills including article extraction, YouTube transcripts, research writing, and workflow optimization"
},
"content": {
"files": [
{
"path": "README.md",
"sha256": "02511e8f7a8c99273ff63e5b7cfd64d21cf14c2baa655a40121e9e0800792712"
},
{
"path": ".claude-plugin/plugin.json",
"sha256": "9b426202ee41876473980b749d3740fa8fed26f52c6c809c388f1f363b29b9de"
},
{
"path": "skills/content-research-writer/SKILL.md",
"sha256": "27eb5c1a0a88a9c1c64f6fcd9c28782e6a6e70d5da43233c4e99ade946f74825"
},
{
"path": "skills/article-extractor/SKILL.md",
"sha256": "026280a34728b2ccab62eece13c5f576e4a75cd7f0e61c6d2d3fa1ff9f30c024"
},
{
"path": "skills/ship-learn-next/SKILL.md",
"sha256": "503b2265e8d97ab8c52fe49c9ac5819653da31577e991e15c470c9a05cbd07f2"
},
{
"path": "skills/tapestry/SKILL.md",
"sha256": "349de7fe19197593e30a7c1074820d2f822cd62e5dc87aaf0399529a97d479f0"
},
{
"path": "skills/youtube-transcript/SKILL.md",
"sha256": "42f1fdbf0de1e24151b7fbf8ab265416ae47ea67cb40b8e561a7bc7579cc7d66"
}
],
"dirSha256": "cc123af981a1872e87bcfdc541f42a084be20f6cbea91810f9c8a36fd750e727"
},
"security": {
"scannedAt": null,
"scannerVersion": null,
"flags": []
}
}

skills/article-extractor/SKILL.md Normal file

@@ -0,0 +1,371 @@
---
name: article-extractor
description: Extract clean article content from URLs (blog posts, articles, tutorials) and save as readable text. Use when user wants to download, extract, or save an article/blog post from a URL without ads, navigation, or clutter.
allowed-tools:
- Bash
- Write
---
# Article Extractor
This skill extracts the main content from web articles and blog posts, removing navigation, ads, newsletter signups, and other clutter, and saves the result as clean, readable text.
## When to Use This Skill
Activate when the user:
- Provides an article/blog URL and wants the text content
- Asks to "download this article"
- Wants to "extract the content from [URL]"
- Asks to "save this blog post as text"
- Needs clean article text without distractions
## How It Works
### Priority Order:
1. **Check if tools are installed** (reader or trafilatura)
2. **Download and extract article** using best available tool
3. **Clean up the content** (remove extra whitespace, format properly)
4. **Save to file** with article title as filename
5. **Confirm location** and show preview
## Installation Check
Check for article extraction tools in this order:
### Option 1: reader (Recommended - Mozilla's Readability)
```bash
command -v reader
```
If not installed:
```bash
npm install -g @mozilla/readability-cli
# or
npm install -g reader-cli
```
### Option 2: trafilatura (Python-based, very good)
```bash
command -v trafilatura
```
If not installed:
```bash
pip3 install trafilatura
```
### Option 3: Fallback (curl + simple parsing)
If no tools are available, fall back to basic curl + text extraction (less reliable, but it works without extra dependencies)
## Extraction Methods
### Method 1: Using reader (Best for most articles)
```bash
# Extract article
reader "URL" > article.txt
```
**Pros:**
- Based on Mozilla's Readability algorithm
- Excellent at removing clutter
- Preserves article structure
### Method 2: Using trafilatura (Best for blogs/news)
```bash
# Extract article
trafilatura --URL "URL" --output-format txt > article.txt
# Or with more options
trafilatura --URL "URL" --output-format txt --no-comments --no-tables > article.txt
```
**Pros:**
- Very accurate extraction
- Good with various site structures
- Handles multiple languages
**Options:**
- `--no-comments`: Skip comment sections
- `--no-tables`: Skip data tables
- `--precision`: Favor precision over recall
- `--recall`: Extract more content (may include some noise)
### Method 3: Fallback (curl + basic parsing)
```bash
# Download and extract basic content
curl -s "URL" | python3 -c "
from html.parser import HTMLParser
import sys
class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_content = False
        self.skip_depth = 0
        self.content = []
        self.skip_tags = {'script', 'style', 'nav', 'header', 'footer', 'aside'}
    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        elif tag in {'p', 'article', 'main', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}:
            self.in_content = True
    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1
    def handle_data(self, data):
        # Skip text inside script/style/nav etc., even after content has started
        if self.in_content and self.skip_depth == 0 and data.strip():
            self.content.append(data.strip())
    def get_content(self):
        return '\n\n'.join(self.content)
parser = ArticleExtractor()
parser.feed(sys.stdin.read())
print(parser.get_content())
" > article.txt
```
**Note:** This is less reliable but works without dependencies.
## Getting Article Title
Extract title for filename:
### Using reader:
```bash
# reader outputs markdown with title at top
TITLE=$(reader "URL" | head -n 1 | sed 's/^# //')
```
### Using trafilatura:
```bash
# Get metadata including title
TITLE=$(trafilatura --URL "URL" --json | python3 -c "import json, sys; print(json.load(sys.stdin)['title'])")
```
### Using curl (fallback):
```bash
TITLE=$(curl -s "URL" | grep -oP '<title>\K[^<]+' | sed 's/ - .*//' | sed 's/ | .*//')
```
## Filename Creation
Clean title for filesystem:
```bash
# Get title
TITLE="Article Title from Website"
# Clean for filesystem (remove special chars, limit length)
FILENAME=$(echo "$TITLE" | tr '/' '-' | tr ':' '-' | tr '?' '' | tr '"' '' | tr '<' '' | tr '>' '' | tr '|' '-' | cut -c 1-100 | sed 's/ *$//')
# Add extension
FILENAME="${FILENAME}.txt"
```
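For example, a hypothetical title like `Scaling Node.js: What's Next?` becomes `Scaling Node.js- What's Next.txt`, which is safe on common filesystems.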
## Complete Workflow
```bash
ARTICLE_URL="https://example.com/article"
# Check for tools
if command -v reader &> /dev/null; then
TOOL="reader"
echo "Using reader (Mozilla Readability)"
elif command -v trafilatura &> /dev/null; then
TOOL="trafilatura"
echo "Using trafilatura"
else
TOOL="fallback"
echo "Using fallback method (may be less accurate)"
fi
# Extract article
case $TOOL in
reader)
# Get content
reader "$ARTICLE_URL" > temp_article.txt
# Get title (first line after # in markdown)
TITLE=$(head -n 1 temp_article.txt | sed 's/^# //')
;;
trafilatura)
# Get title from metadata
METADATA=$(trafilatura --URL "$ARTICLE_URL" --json)
TITLE=$(echo "$METADATA" | python3 -c "import json, sys; print(json.load(sys.stdin).get('title', 'Article'))")
# Get clean content
trafilatura --URL "$ARTICLE_URL" --output-format txt --no-comments > temp_article.txt
;;
fallback)
# Get title
TITLE=$(curl -s "$ARTICLE_URL" | grep -oP '<title>\K[^<]+' | head -n 1)
TITLE=${TITLE%% - *} # Remove site name
TITLE=${TITLE%% | *} # Remove site name (alternate)
# Get content (basic extraction)
curl -s "$ARTICLE_URL" | python3 -c "
from html.parser import HTMLParser
import sys
class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_content = False
        self.skip_depth = 0
        self.content = []
        self.skip_tags = {'script', 'style', 'nav', 'header', 'footer', 'aside', 'form'}
    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        elif tag in {'p', 'article', 'main', 'h1', 'h2', 'h3'}:
            self.in_content = True
    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1
    def handle_data(self, data):
        # Ignore text inside skipped containers (scripts, nav, forms, ...)
        if self.in_content and self.skip_depth == 0 and data.strip():
            self.content.append(data.strip())
    def get_content(self):
        return '\n\n'.join(self.content)
parser = ArticleExtractor()
parser.feed(sys.stdin.read())
print(parser.get_content())
" > temp_article.txt
;;
esac
# Clean filename
FILENAME=$(echo "$TITLE" | tr '/' '-' | tr ':' '-' | tr '?' '' | tr '"' '' | tr '<>' '' | tr '|' '-' | cut -c 1-80 | sed 's/ *$//' | sed 's/^ *//')
FILENAME="${FILENAME}.txt"
# Move to final filename
mv temp_article.txt "$FILENAME"
# Show result
echo "✓ Extracted article: $TITLE"
echo "✓ Saved to: $FILENAME"
echo ""
echo "Preview (first 10 lines):"
head -n 10 "$FILENAME"
```
## Error Handling
### Common Issues
**1. Tool not installed**
- Try alternate tool (reader → trafilatura → fallback)
- Offer to install: "Install reader with: npm install -g reader-cli"
**2. Paywall or login required**
- Extraction tools may fail
- Inform user: "This article requires authentication. Cannot extract."
**3. Invalid URL**
- Check URL format
- Try with and without redirects
**4. No content extracted**
- Site may use heavy JavaScript
- Try fallback method
- Inform user if extraction fails
**5. Special characters in title**
- Clean title for filesystem
- Remove: `/`, `:`, `?`, `"`, `<`, `>`, `|`
- Replace with `-` or remove
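For issues 2 and 3, a quick pre-flight request can catch dead or access-restricted URLs before running any extractor. A minimal sketch, assuming `$URL` holds the target:
```bash
# Follow redirects and fetch only the HTTP status code (no body)
STATUS=$(curl -s -o /dev/null -w "%{http_code}" -L "$URL")
if [ "$STATUS" -ge 400 ] || [ "$STATUS" = "000" ]; then
    echo "Error: URL returned HTTP status $STATUS - cannot extract."
    exit 1
fi
```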
## Output Format
### Saved File Contains:
- Article title (if available)
- Author (if available from tool)
- Main article text
- Section headings
- No navigation, ads, or clutter
### What Gets Removed:
- Navigation menus
- Ads and promotional content
- Newsletter signup forms
- Related articles sidebars
- Comment sections (optional)
- Social media buttons
- Cookie notices
## Tips for Best Results
**1. Use reader for most articles**
- Best all-around tool
- Based on Firefox Reader View
- Works on most news sites and blogs
**2. Use trafilatura for:**
- Academic articles
- News sites
- Blogs with complex layouts
- Non-English content
**3. Fallback method limitations:**
- May include some noise
- Less accurate paragraph detection
- Better than nothing for simple sites
**4. Check extraction quality:**
- Always show preview to user
- Ask if it looks correct
- Offer to try different tool if needed
## Example Usage
**Simple extraction:**
```bash
# User: "Extract https://example.com/article"
reader "https://example.com/article" > temp.txt
TITLE=$(head -n 1 temp.txt | sed 's/^# //')
FILENAME="$(echo "$TITLE" | tr '/' '-').txt"
mv temp.txt "$FILENAME"
echo "✓ Saved to: $FILENAME"
```
**With error handling:**
```bash
if ! reader "$URL" > temp.txt 2>/dev/null; then
if command -v trafilatura &> /dev/null; then
trafilatura --URL "$URL" --output-format txt > temp.txt
else
echo "Error: Could not extract article. Install reader or trafilatura."
exit 1
fi
fi
```
## Best Practices
- ✅ Always show preview after extraction (first 10 lines)
- ✅ Verify extraction succeeded before saving
- ✅ Clean filename for filesystem compatibility
- ✅ Try fallback method if primary fails
- ✅ Inform user which tool was used
- ✅ Keep filename length reasonable (< 100 chars)
## After Extraction
Display to user:
1. "✓ Extracted: [Article Title]"
2. "✓ Saved to: [filename]"
3. Show preview (first 10-15 lines)
4. File size and location
Ask if needed:
- "Would you like me to also create a Ship-Learn-Next plan from this?" (if using ship-learn-next skill)
- "Should I extract another article?"

skills/content-research-writer/SKILL.md Normal file

@@ -0,0 +1,538 @@
---
name: content-research-writer
description: Assists in writing high-quality content by conducting research, adding citations, improving hooks, iterating on outlines, and providing real-time feedback on each section. Transforms your writing process from solo effort to collaborative partnership.
---
# Content Research Writer
This skill acts as your writing partner, helping you research, outline, draft, and refine content while maintaining your unique voice and style.
## When to Use This Skill
- Writing blog posts, articles, or newsletters
- Creating educational content or tutorials
- Drafting thought leadership pieces
- Researching and writing case studies
- Producing technical documentation with sources
- Writing with proper citations and references
- Improving hooks and introductions
- Getting section-by-section feedback while writing
## What This Skill Does
1. **Collaborative Outlining**: Helps you structure ideas into coherent outlines
2. **Research Assistance**: Finds relevant information and adds citations
3. **Hook Improvement**: Strengthens your opening to capture attention
4. **Section Feedback**: Reviews each section as you write
5. **Voice Preservation**: Maintains your writing style and tone
6. **Citation Management**: Adds and formats references properly
7. **Iterative Refinement**: Helps you improve through multiple drafts
## How to Use
### Setup Your Writing Environment
Create a dedicated folder for your article:
```
mkdir ~/writing/my-article-title
cd ~/writing/my-article-title
```
Create your draft file:
```
touch article-draft.md
```
Open Claude Code from this directory and start writing.
### Basic Workflow
1. **Start with an outline**:
```
Help me create an outline for an article about [topic]
```
2. **Research and add citations**:
```
Research [specific topic] and add citations to my outline
```
3. **Improve the hook**:
```
Here's my introduction. Help me make the hook more compelling.
```
4. **Get section feedback**:
```
I just finished the "Why This Matters" section. Review it and give feedback.
```
5. **Refine and polish**:
```
Review the full draft for flow, clarity, and consistency.
```
## Instructions
When a user requests writing assistance:
1. **Understand the Writing Project**
Ask clarifying questions:
- What's the topic and main argument?
- Who's the target audience?
- What's the desired length/format?
- What's your goal? (educate, persuade, entertain, explain)
- Any existing research or sources to include?
- What's your writing style? (formal, conversational, technical)
2. **Collaborative Outlining**
Help structure the content:
```markdown
# Article Outline: [Title]
## Hook
- [Opening line/story/statistic]
- [Why reader should care]
## Introduction
- Context and background
- Problem statement
- What this article covers
## Main Sections
### Section 1: [Title]
- Key point A
- Key point B
- Example/evidence
- [Research needed: specific topic]
### Section 2: [Title]
- Key point C
- Key point D
- Data/citation needed
### Section 3: [Title]
- Key point E
- Counter-arguments
- Resolution
## Conclusion
- Summary of main points
- Call to action
- Final thought
## Research To-Do
- [ ] Find data on [topic]
- [ ] Get examples of [concept]
- [ ] Source citation for [claim]
```
**Iterate on outline**:
- Adjust based on feedback
- Ensure logical flow
- Identify research gaps
- Mark sections for deep dives
3. **Conduct Research**
When user requests research on a topic:
- Search for relevant information
- Find credible sources
- Extract key facts, quotes, and data
- Add citations in requested format
Example output:
```markdown
## Research: AI Impact on Productivity
Key Findings:
1. **Productivity Gains**: Studies show 40% time savings for
content creation tasks [1]
2. **Adoption Rates**: 67% of knowledge workers use AI tools
weekly [2]
3. **Expert Quote**: "AI augments rather than replaces human
creativity" - Dr. Jane Smith, MIT [3]
Citations:
[1] McKinsey Global Institute. (2024). "The Economic Potential
of Generative AI"
[2] Stack Overflow Developer Survey (2024)
[3] Smith, J. (2024). MIT Technology Review interview
Added to outline under Section 2.
```
4. **Improve Hooks**
When user shares an introduction, analyze and strengthen:
**Current Hook Analysis**:
- What works: [positive elements]
- What could be stronger: [areas for improvement]
- Emotional impact: [current vs. potential]
**Suggested Alternatives**:
Option 1: [Bold statement]
> [Example]
*Why it works: [explanation]*
Option 2: [Personal story]
> [Example]
*Why it works: [explanation]*
Option 3: [Surprising data]
> [Example]
*Why it works: [explanation]*
**Questions to hook**:
- Does it create curiosity?
- Does it promise value?
- Is it specific enough?
- Does it match the audience?
5. **Provide Section-by-Section Feedback**
As user writes each section, review for:
```markdown
# Feedback: [Section Name]
## What Works Well ✓
- [Strength 1]
- [Strength 2]
- [Strength 3]
## Suggestions for Improvement
### Clarity
- [Specific issue] → [Suggested fix]
- [Complex sentence] → [Simpler alternative]
### Flow
- [Transition issue] → [Better connection]
- [Paragraph order] → [Suggested reordering]
### Evidence
- [Claim needing support] → [Add citation or example]
- [Generic statement] → [Make more specific]
### Style
- [Tone inconsistency] → [Match your voice better]
- [Word choice] → [Stronger alternative]
## Specific Line Edits
Original:
> [Exact quote from draft]
Suggested:
> [Improved version]
Why: [Explanation]
## Questions to Consider
- [Thought-provoking question 1]
- [Thought-provoking question 2]
Ready to move to next section!
```
6. **Preserve Writer's Voice**
Important principles:
- **Learn their style**: Read existing writing samples
- **Suggest, don't replace**: Offer options, not directives
- **Match tone**: Formal, casual, technical, friendly
- **Respect choices**: If they prefer their version, support it
- **Enhance, don't override**: Make their writing better, not different
Ask periodically:
- "Does this sound like you?"
- "Is this the right tone?"
- "Should I be more/less [formal/casual/technical]?"
7. **Citation Management**
Handle references based on user preference:
**Inline Citations**:
```markdown
Studies show 40% productivity improvement (McKinsey, 2024).
```
**Numbered References**:
```markdown
Studies show 40% productivity improvement [1].
[1] McKinsey Global Institute. (2024)...
```
**Footnote Style**:
```markdown
Studies show 40% productivity improvement^1
^1: McKinsey Global Institute. (2024)...
```
Maintain a running citations list:
```markdown
## References
1. Author. (Year). "Title". Publication.
2. Author. (Year). "Title". Publication.
...
```
8. **Final Review and Polish**
When draft is complete, provide comprehensive feedback:
```markdown
# Full Draft Review
## Overall Assessment
**Strengths**:
- [Major strength 1]
- [Major strength 2]
- [Major strength 3]
**Impact**: [Overall effectiveness assessment]
## Structure & Flow
- [Comments on organization]
- [Transition quality]
- [Pacing assessment]
## Content Quality
- [Argument strength]
- [Evidence sufficiency]
- [Example effectiveness]
## Technical Quality
- Grammar and mechanics: [assessment]
- Consistency: [assessment]
- Citations: [completeness check]
## Readability
- Clarity score: [evaluation]
- Sentence variety: [evaluation]
- Paragraph length: [evaluation]
## Final Polish Suggestions
1. **Introduction**: [Specific improvements]
2. **Body**: [Specific improvements]
3. **Conclusion**: [Specific improvements]
4. **Title**: [Options if needed]
## Pre-Publish Checklist
- [ ] All claims sourced
- [ ] Citations formatted
- [ ] Examples clear
- [ ] Transitions smooth
- [ ] Call to action present
- [ ] Proofread for typos
Ready to publish! 🚀
```
## Examples
### Example 1: Teresa Torres's Workflow
**User**: "I'm writing an article about continuous discovery. Help me create an outline."
**Process**:
1. Collaborates on outline structure
2. Identifies research needs
3. User starts writing introduction
4. Reviews and improves the hook
5. User writes each section
6. Provides feedback after each section
7. Conducts research and adds citations
8. Final review of complete draft
9. Polish and prep for publishing
**Result**: Well-researched, properly cited article written in Teresa's voice with strong structure and flow.
### Example 2: Research-Heavy Article
**User**: "I'm writing about AI's impact on product management. Help me find current data and examples."
**Output**:
```markdown
## Research Compiled
### Recent Data
- Gartner: 80% of PMs will use AI tools by 2025 [1]
- Survey: 45% report faster feature shipping [2]
- Case study: Airbnb's AI-assisted prioritization [3]
### Expert Quotes
- "AI amplifies PM judgment, not replaces it" - Marty Cagan
- [Additional quotes with citations]
### Real Examples
1. **Company A**: Used AI for user research synthesis
- Result: 60% time savings
- Source: [citation]
2. **Company B**: AI-powered roadmap analysis
- Result: Better stakeholder alignment
- Source: [citation]
All added to your outline with proper citations.
```
### Example 3: Hook Improvement
**User's Original Hook**:
> "Product management is changing because of AI. In this article, I'll discuss some ways AI affects product managers."
**Improved Options**:
**Option 1 (Data-driven)**:
> "Last month, I asked AI to analyze 500 customer interviews. It took 30 minutes instead of 3 weeks. Product management will never be the same."
**Option 2 (Question)**:
> "What if you could talk to every customer, read every review, and analyze every support ticket—all before your morning coffee?"
**Option 3 (Story)**:
> "Sarah spent two weeks building the wrong feature. Not because she didn't understand her users, but because she couldn't process the hundreds of interviews fast enough to spot the pattern."
### Example 4: Section Feedback
**User**: "Just finished my 'Common Mistakes' section. Thoughts?"
**Response**:
```markdown
# Feedback: Common Mistakes Section
## What Works Well ✓
- Strong examples (the Slack notification story)
- Clear structure (3 mistakes, clean layout)
- Practical advice for each mistake
## Suggestions
### Make It More Specific
Your second point says "Bad prioritization" but could be stronger:
Current:
> "Many teams prioritize badly by focusing on features instead of outcomes."
Suggested:
> "I've watched teams ship 14 features in a quarter yet move none of their key metrics. They prioritized activity over progress."
### Add Data
The third mistake would benefit from evidence:
> "[Add citation]: Studies show teams without regular user contact are 3x more likely to build unused features [needs source]"
### Flow Improvement
Consider reordering: Mistake 3 → Mistake 2 → Mistake 1
This builds from small to big impact.
Ready for the next section!
```
## Writing Workflows
### Blog Post Workflow
1. Outline together
2. Research key points
3. Write introduction → get feedback
4. Write body sections → feedback each
5. Write conclusion → final review
6. Polish and edit
### Newsletter Workflow
1. Discuss hook ideas
2. Quick outline (shorter format)
3. Draft in one session
4. Review for clarity and links
5. Quick polish
### Technical Tutorial Workflow
1. Outline steps
2. Write code examples
3. Add explanations
4. Test instructions
5. Add troubleshooting section
6. Final review for accuracy
### Thought Leadership Workflow
1. Brainstorm unique angle
2. Research existing perspectives
3. Develop your thesis
4. Write with strong POV
5. Add supporting evidence
6. Craft compelling conclusion
## Pro Tips
1. **Work in VS Code**: Better than web Claude for long-form writing
2. **One section at a time**: Get feedback incrementally
3. **Save research separately**: Keep a research.md file
4. **Version your drafts**: article-v1.md, article-v2.md, etc.
5. **Read aloud**: Use feedback to identify clunky sentences
6. **Set deadlines**: "I want to finish the draft today"
7. **Take breaks**: Write, get feedback, pause, revise
## File Organization
Recommended structure for writing projects:
```
~/writing/article-name/
├── outline.md # Your outline
├── research.md # All research and citations
├── draft-v1.md # First draft
├── draft-v2.md # Revised draft
├── final.md # Publication-ready
├── feedback.md # Collected feedback
└── sources/ # Reference materials
├── study1.pdf
└── article2.md
```
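To scaffold this layout in one go (the folder name here is a placeholder):
```
mkdir -p ~/writing/article-name/sources
cd ~/writing/article-name
touch outline.md research.md draft-v1.md draft-v2.md final.md feedback.md
```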
## Best Practices
### For Research
- Verify sources before citing
- Use recent data when possible
- Balance different perspectives
- Link to original sources
### For Feedback
- Be specific about what you want: "Is this too technical?"
- Share your concerns: "I'm worried this section drags"
- Ask questions: "Does this flow logically?"
- Request alternatives: "What's another way to explain this?"
### For Voice
- Share examples of your writing
- Specify tone preferences
- Point out good matches: "That sounds like me!"
- Flag mismatches: "Too formal for my style"
## Related Use Cases
- Creating social media posts from articles
- Adapting content for different audiences
- Writing email newsletters
- Drafting technical documentation
- Creating presentation content
- Writing case studies
- Developing course outlines

skills/ship-learn-next/SKILL.md Normal file

@@ -0,0 +1,328 @@
---
name: ship-learn-next
description: Transform learning content (like YouTube transcripts, articles, tutorials) into actionable implementation plans using the Ship-Learn-Next framework. Use when user wants to turn advice, lessons, or educational content into concrete action steps, reps, or a learning quest.
allowed-tools:
- Read
- Write
---
# Ship-Learn-Next Action Planner
This skill helps transform passive learning content into actionable **Ship-Learn-Next cycles** - turning advice and lessons into concrete, shippable iterations.
## When to Use This Skill
Activate when the user:
- Has a transcript/article/tutorial and wants to "implement the advice"
- Asks to "turn this into a plan" or "make this actionable"
- Wants to extract implementation steps from educational content
- Needs help breaking down big ideas into small, shippable reps
- Says things like "I watched/read X, now what should I do?"
## Core Framework: Ship-Learn-Next
Every learning quest follows three repeating phases:
1. **SHIP** - Create something real (code, content, product, demonstration)
2. **LEARN** - Honest reflection on what happened
3. **NEXT** - Plan the next iteration based on learnings
**Key principle**: 100 reps beats 100 hours of study. Learning = doing better, not knowing more.
## How This Skill Works
### Step 1: Read the Content
Read the file the user provides (transcript, article, notes):
```bash
# User provides path to file
FILE_PATH="/path/to/content.txt"
```
Use the Read tool to analyze the content.
### Step 2: Extract Core Lessons
Identify from the content:
- **Main advice/lessons**: What are the key takeaways?
- **Actionable principles**: What can actually be practiced?
- **Skills being taught**: What would someone learn by doing this?
- **Examples/case studies**: Real implementations mentioned
**Do NOT**:
- Summarize everything (focus on actionable parts)
- List theory without application
- Include "nice to know" vs "need to practice"
### Step 3: Define the Quest
Help the user frame their learning goal:
Ask:
1. "Based on this content, what do you want to achieve in 4-8 weeks?"
2. "What would success look like? (Be specific)"
3. "What's something concrete you could build/create/ship?"
**Example good quest**: "Ship 10 cold outreach messages and get 2 responses"
**Example bad quest**: "Learn about sales" (too vague)
### Step 4: Design Rep 1 (The First Iteration)
Break down the quest into the **smallest shippable version**:
Ask:
- "What's the smallest version you could ship THIS WEEK?"
- "What do you need to learn JUST to do that?" (not everything)
- "What would 'done' look like for rep 1?"
**Make it:**
- Concrete and specific
- Completable in 1-7 days
- Produces real evidence/artifact
- Small enough to not be intimidating
- Big enough to learn something meaningful
### Step 5: Create the Rep Plan
Structure each rep with:
```markdown
## Rep 1: [Specific Goal]
**Ship Goal**: [What you'll create/do]
**Success Criteria**: [How you'll know it's done]
**What You'll Learn**: [Specific skills/insights]
**Resources Needed**: [Minimal - just what's needed for THIS rep]
**Timeline**: [Specific deadline]
**Action Steps**:
1. [Concrete step 1]
2. [Concrete step 2]
3. [Concrete step 3]
...
**After Shipping - Reflection Questions**:
- What actually happened? (Be specific)
- What worked? What didn't?
- What surprised you?
- On a scale of 1-10, how did this rep go?
- What would you do differently next time?
```
### Step 6: Map Future Reps (2-5)
Based on the content, suggest a progression:
```markdown
## Rep 2: [Next level]
**Builds on**: What you learned in Rep 1
**New challenge**: One new thing to try/improve
**Expected difficulty**: [Easier/Same/Harder - and why]
## Rep 3: [Continue progression]
...
```
**Progression principles**:
- Each rep adds ONE new element
- Increase difficulty based on success
- Reference specific lessons from the content
- Keep reps shippable (not theoretical)
### Step 7: Connect to Content
For each rep, reference the source material:
- "This implements the [concept] from minute X"
- "You're practicing the [technique] mentioned in the video"
- "This tests the advice about [topic]"
**But**: Always emphasize DOING over studying. Point to resources only when needed for the specific rep.
## Conversation Style
**Direct but supportive**:
- No fluff, but encouraging
- "Ship it, then we'll improve it"
- "What's the smallest version you could do this week?"
**Question-driven**:
- Make them think, don't just tell
- "What exactly do you want to achieve?" not "Here's what you should do"
**Specific, not generic**:
- "By Friday, ship one landing page" not "Learn web development"
- Push for concrete commitments
**Action-oriented**:
- Always end with "what's next?"
- Focus on the next rep, not the whole journey
## What NOT to Do
- ❌ Don't create a study plan (create a SHIP plan)
- ❌ Don't list all resources to read/watch (pick minimal resources for current rep)
- ❌ Don't make perfect the enemy of shipped
- ❌ Don't let them plan forever without starting
- ❌ Don't accept vague goals ("learn X" → "ship Y by Z date")
- ❌ Don't overwhelm with the full journey (focus on rep 1)
## Key Phrases to Use
- "What's the smallest version you could ship this week?"
- "What do you need to learn JUST to do that?"
- "This isn't about perfection - it's rep 1 of 100"
- "Ship something real, then we'll improve it"
- "Based on [content], what would you actually DO differently?"
- "Learning = doing better, not knowing more"
## Example Output Structure
```markdown
# Your Ship-Learn-Next Quest: [Title]
## Quest Overview
**Goal**: [What they want to achieve in 4-8 weeks]
**Source**: [The content that inspired this]
**Core Lessons**: [3-5 key actionable takeaways from content]
---
## Rep 1: [Specific, Shippable Goal]
**Ship Goal**: [Concrete deliverable]
**Timeline**: [This week / By [date]]
**Success Criteria**:
- [ ] [Specific thing 1]
- [ ] [Specific thing 2]
- [ ] [Specific thing 3]
**What You'll Practice** (from the content):
- [Skill/concept 1 from source material]
- [Skill/concept 2 from source material]
**Action Steps**:
1. [Concrete step]
2. [Concrete step]
3. [Concrete step]
4. Ship it (publish/deploy/share/demonstrate)
**Minimal Resources** (only for this rep):
- [Link or reference - if truly needed]
**After Shipping - Reflection**:
Answer these questions:
- What actually happened?
- What worked? What didn't?
- What surprised you?
- Rate this rep: _/10
- What's one thing to try differently next time?
---
## Rep 2: [Next Iteration]
**Builds on**: Rep 1 + [what you learned]
**New element**: [One new challenge/skill]
**Ship goal**: [Next concrete deliverable]
[Similar structure...]
---
## Rep 3-5: Future Path
**Rep 3**: [Brief description]
**Rep 4**: [Brief description]
**Rep 5**: [Brief description]
*(Details will evolve based on what you learn in Reps 1-2)*
---
## Remember
- This is about DOING, not studying
- Aim for 100 reps over time (not perfection on rep 1)
- Each rep = Plan → Do → Reflect → Next
- You learn by shipping, not by consuming
**Ready to ship Rep 1?**
```
## Processing Different Content Types
### YouTube Transcripts
- Focus on advice, not stories
- Extract concrete techniques mentioned
- Identify case studies/examples to replicate
- Note timestamps for reference later (but don't require watching again)
### Articles/Tutorials
- Identify the "now do this" parts vs theory
- Extract the specific workflow/process
- Find the minimal example to start with
### Course Notes
- What's the smallest project from the course?
- Which modules are needed for rep 1? (ignore the rest for now)
- What can be practiced immediately?
## Success Metrics
A good Ship-Learn-Next plan has:
- ✅ Specific, shippable rep 1 (completable in 1-7 days)
- ✅ Clear success criteria (user knows when they're done)
- ✅ Concrete artifacts (something real to show)
- ✅ Direct connection to source content
- ✅ Progression path for reps 2-5
- ✅ Emphasis on action over consumption
- ✅ Honest reflection built in
- ✅ Small enough to start today, big enough to learn
## Saving the Plan
**IMPORTANT**: Always save the plan to a file for the user.
### Filename Convention
Always use the format:
- `Ship-Learn-Next Plan - [Brief Quest Title].md`
Examples:
- `Ship-Learn-Next Plan - Build in Proven Markets.md`
- `Ship-Learn-Next Plan - Learn React.md`
- `Ship-Learn-Next Plan - Cold Email Outreach.md`
**Quest title should be**:
- Brief (3-6 words)
- Descriptive of the main goal
- Based on the content's core lesson/theme
### What to Save
**Complete plan including**:
- Quest overview with goal and source
- All reps (1-5) with full details
- Action steps and reflection questions
- Timeline commitments
- Reference to source material
**Format**: Always save as Markdown (`.md`) for readability
## After Creating the Plan
**Display to user**:
1. Show them you've saved the plan: "✓ Saved to: [filename]"
2. Give a brief overview of the quest
3. Highlight Rep 1 (what's due this week)
**Then ask**:
1. "When will you ship Rep 1?"
2. "What's the one thing that might stop you? How will you handle it?"
3. "Come back after you ship and we'll reflect + plan Rep 2"
**Remember**: You're not creating a curriculum. You're helping them ship something real, learn from it, and ship the next thing.
Let's help them ship.

skills/tapestry/SKILL.md Normal file

@@ -0,0 +1,485 @@
---
name: tapestry
description: Unified content extraction and action planning. Use when user says "tapestry <URL>", "weave <URL>", "help me plan <URL>", "extract and plan <URL>", "make this actionable <URL>", or similar phrases indicating they want to extract content and create an action plan. Automatically detects content type (YouTube video, article, PDF) and processes accordingly.
allowed-tools:
- Bash
- Read
- Write
---
# Tapestry: Unified Content Extraction + Action Planning
This is the **master skill** that orchestrates the entire Tapestry workflow:
1. Detect content type from URL
2. Extract content using appropriate skill
3. Automatically create a Ship-Learn-Next action plan
## When to Use This Skill
Activate when the user:
- Says "tapestry [URL]"
- Says "weave [URL]"
- Says "help me plan [URL]"
- Says "extract and plan [URL]"
- Says "make this actionable [URL]"
- Says "turn [URL] into a plan"
- Provides a URL and asks to "learn and implement from this"
- Wants the full Tapestry workflow (extract → plan)
**Keywords to watch for**: tapestry, weave, plan, actionable, extract and plan, make a plan, turn into action
## How It Works
### Complete Workflow:
1. **Detect URL type** (YouTube, article, PDF)
2. **Extract content** using appropriate skill:
- YouTube → youtube-transcript skill
- Article → article-extractor skill
- PDF → download and extract text
3. **Create action plan** using ship-learn-next skill
4. **Save both** content file and plan file
5. **Present summary** to user
## URL Detection Logic
### YouTube Videos
**Patterns to detect:**
- `youtube.com/watch?v=`
- `youtu.be/`
- `youtube.com/shorts/`
- `m.youtube.com/watch?v=`
**Action:** Use youtube-transcript skill
### Web Articles/Blog Posts
**Patterns to detect:**
- `http://` or `https://`
- NOT YouTube, NOT PDF
- Common domains: medium.com, substack.com, dev.to, etc.
- Any HTML page
**Action:** Use article-extractor skill
### PDF Documents
**Patterns to detect:**
- URL ends with `.pdf`
- URL returns `Content-Type: application/pdf`
**Action:** Download and extract text
### Other Content
**Fallback:**
- Try article-extractor (works for most HTML)
- If fails, inform user of unsupported type
## Step-by-Step Workflow
### Step 1: Detect Content Type
```bash
URL="$1"
# Check for YouTube
if [[ "$URL" =~ youtube\.com/watch || "$URL" =~ youtu\.be/ || "$URL" =~ youtube\.com/shorts ]]; then
CONTENT_TYPE="youtube"
# Check for PDF
elif [[ "$URL" =~ \.pdf$ ]]; then
CONTENT_TYPE="pdf"
# Check if URL returns PDF
elif curl -sI "$URL" | grep -i "Content-Type: application/pdf" > /dev/null; then
CONTENT_TYPE="pdf"
# Default to article
else
CONTENT_TYPE="article"
fi
echo "📍 Detected: $CONTENT_TYPE"
```
### Step 2: Extract Content (by Type)
#### YouTube Video
```bash
# Use youtube-transcript skill workflow
echo "📺 Extracting YouTube transcript..."
# 1. Check for yt-dlp
if ! command -v yt-dlp &> /dev/null; then
    echo "Installing yt-dlp..."
    if command -v brew &> /dev/null; then
        brew install yt-dlp
    else
        pip3 install yt-dlp
    fi
fi
# 2. Get video title
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
# 3. Download transcript
yt-dlp --write-auto-sub --skip-download --sub-langs en --output "temp_transcript" "$URL"
# 4. Convert to clean text (deduplicate)
python3 -c "
import sys, re
seen = set()
vtt_file = 'temp_transcript.en.vtt'
try:
with open(vtt_file, 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
clean = re.sub('<[^>]*>', '', line)
clean = clean.replace('&amp;', '&').replace('&gt;', '>').replace('&lt;', '<')
if clean and clean not in seen:
print(clean)
seen.add(clean)
except FileNotFoundError:
print('Error: Could not find transcript file', file=sys.stderr)
sys.exit(1)
" > "${VIDEO_TITLE}.txt"
# 5. Cleanup
rm -f temp_transcript.en.vtt
CONTENT_FILE="${VIDEO_TITLE}.txt"
echo "✓ Saved transcript: $CONTENT_FILE"
```
#### Article/Blog Post
```bash
# Use article-extractor skill workflow
echo "📄 Extracting article content..."
# 1. Check for extraction tools
if command -v reader &> /dev/null; then
TOOL="reader"
elif command -v trafilatura &> /dev/null; then
TOOL="trafilatura"
else
TOOL="fallback"
fi
echo "Using: $TOOL"
# 2. Extract based on tool
case $TOOL in
reader)
reader "$URL" > temp_article.txt
ARTICLE_TITLE=$(head -n 1 temp_article.txt | sed 's/^# //')
;;
trafilatura)
METADATA=$(trafilatura --URL "$URL" --json)
ARTICLE_TITLE=$(echo "$METADATA" | python3 -c "import json, sys; print(json.load(sys.stdin).get('title', 'Article'))")
trafilatura --URL "$URL" --output-format txt --no-comments > temp_article.txt
;;
fallback)
ARTICLE_TITLE=$(curl -s "$URL" | grep -oP '<title>\K[^<]+' | head -n 1)
ARTICLE_TITLE=${ARTICLE_TITLE%% - *}
curl -s "$URL" | python3 -c "
from html.parser import HTMLParser
import sys
class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_content = False
        self.skip_depth = 0
        self.content = []
        self.skip_tags = {'script', 'style', 'nav', 'header', 'footer', 'aside', 'form'}
    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1
        elif tag in {'p', 'article', 'main'}:
            self.in_content = True
    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1
    def handle_data(self, data):
        # Ignore text inside skipped containers (scripts, nav, forms, ...)
        if self.in_content and self.skip_depth == 0 and data.strip():
            self.content.append(data.strip())
    def get_content(self):
        return '\n\n'.join(self.content)
parser = ArticleExtractor()
parser.feed(sys.stdin.read())
print(parser.get_content())
" > temp_article.txt
;;
esac
# 3. Clean filename
FILENAME=$(echo "$ARTICLE_TITLE" | tr '/' '-' | tr ':' '-' | tr '?' '' | tr '"' '' | cut -c 1-80 | sed 's/ *$//')
CONTENT_FILE="${FILENAME}.txt"
mv temp_article.txt "$CONTENT_FILE"
echo "✓ Saved article: $CONTENT_FILE"
```
#### PDF Document
```bash
# Download and extract PDF
echo "📑 Downloading PDF..."
# 1. Download PDF
PDF_FILENAME=$(basename "$URL")
curl -L -o "$PDF_FILENAME" "$URL"
# 2. Extract text using pdftotext (if available)
if command -v pdftotext &> /dev/null; then
pdftotext "$PDF_FILENAME" temp_pdf.txt
CONTENT_FILE="${PDF_FILENAME%.pdf}.txt"
mv temp_pdf.txt "$CONTENT_FILE"
echo "✓ Extracted text from PDF: $CONTENT_FILE"
# Optionally keep PDF
echo "Keep original PDF? (y/n)"
read -r KEEP_PDF
if [[ ! "$KEEP_PDF" =~ ^[Yy]$ ]]; then
rm "$PDF_FILENAME"
fi
else
# No pdftotext available
echo "⚠️ pdftotext not found. PDF downloaded but not extracted."
echo " Install with: brew install poppler"
CONTENT_FILE="$PDF_FILENAME"
fi
```
### Step 3: Create Ship-Learn-Next Action Plan
**IMPORTANT**: Always create an action plan after extracting content.
```bash
# Read the extracted content
CONTENT_FILE="[from previous step]"
# Invoke ship-learn-next skill logic:
# 1. Read the content file
# 2. Extract core actionable lessons
# 3. Create 5-rep progression plan
# 4. Save as: Ship-Learn-Next Plan - [Quest Title].md
# See ship-learn-next/SKILL.md for full details
```
**Key points for plan creation:**
- Extract actionable lessons (not just summaries)
- Define a specific 4-8 week quest
- Create Rep 1 (shippable this week)
- Design Reps 2-5 (progressive iterations)
- Save plan to markdown file
- Use format: `Ship-Learn-Next Plan - [Brief Quest Title].md`
### Step 4: Present Results
Show user:
```
✅ Tapestry Workflow Complete!
📥 Content Extracted:
✓ [Content type]: [Title]
✓ Saved to: [filename.txt]
✓ [X] words extracted
📋 Action Plan Created:
✓ Quest: [Quest title]
✓ Saved to: Ship-Learn-Next Plan - [Title].md
🎯 Your Quest: [One-line summary]
📍 Rep 1 (This Week): [Rep 1 goal]
When will you ship Rep 1?
```
## Complete Tapestry Workflow Script
```bash
#!/bin/bash
# Tapestry: Extract content + create action plan
# Usage: tapestry <URL>
URL="$1"
if [ -z "$URL" ]; then
echo "Usage: tapestry <URL>"
exit 1
fi
echo "🧵 Tapestry Workflow Starting..."
echo "URL: $URL"
echo ""
# Step 1: Detect content type
if [[ "$URL" =~ youtube\.com/watch || "$URL" =~ youtu\.be/ || "$URL" =~ youtube\.com/shorts ]]; then
CONTENT_TYPE="youtube"
elif [[ "$URL" =~ \.pdf$ ]] || curl -sI "$URL" | grep -iq "Content-Type: application/pdf"; then
CONTENT_TYPE="pdf"
else
CONTENT_TYPE="article"
fi
echo "📍 Detected: $CONTENT_TYPE"
echo ""
# Step 2: Extract content
case $CONTENT_TYPE in
youtube)
echo "📺 Extracting YouTube transcript..."
# [YouTube extraction code from above]
;;
article)
echo "📄 Extracting article..."
# [Article extraction code from above]
;;
pdf)
echo "📑 Downloading PDF..."
# [PDF extraction code from above]
;;
esac
echo ""
# Step 3: Create action plan
echo "🚀 Creating Ship-Learn-Next action plan..."
# [Plan creation using ship-learn-next skill]
echo ""
echo "✅ Tapestry Workflow Complete!"
echo ""
echo "📥 Content: $CONTENT_FILE"
echo "📋 Plan: Ship-Learn-Next Plan - [title].md"
echo ""
echo "🎯 Next: Review your action plan and ship Rep 1!"
```
## Error Handling
### Common Issues:
**1. Unsupported URL type**
- Try article extraction as fallback
- If fails: "Could not extract content from this URL type"
**2. No content extracted**
- Check if URL is accessible
- Try alternate extraction method
- Inform user: "Extraction failed. URL may require authentication."
**3. Tools not installed**
- Auto-install when possible (yt-dlp, reader, trafilatura)
- Provide install instructions if auto-install fails
- Use fallback methods when available
**4. Empty or invalid content**
- Verify file has content before creating plan
- Don't create plan if extraction failed
- Show preview to user before planning
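For issue 4, a small guard before the planning step prevents planning from a failed extraction - a sketch, assuming `$CONTENT_FILE` was set in Step 2:
```bash
# Refuse to create a plan from an empty or missing extraction
if [ ! -s "$CONTENT_FILE" ]; then
    echo "⚠️ Extraction produced no usable content: $CONTENT_FILE"
    echo "   The URL may require authentication or rely on heavy JavaScript."
    exit 1
fi
echo "✓ Extracted $(wc -w < "$CONTENT_FILE") words"
head -n 10 "$CONTENT_FILE"
```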
## Best Practices
- ✅ Always show what was detected ("📍 Detected: youtube")
- ✅ Display progress for each step
- ✅ Save both content file AND plan file
- ✅ Show preview of extracted content (first 10 lines)
- ✅ Create plan automatically (don't ask)
- ✅ Present clear summary at end
- ✅ Ask commitment question: "When will you ship Rep 1?"
## Usage Examples
### Example 1: YouTube Video (using "tapestry")
```
User: tapestry https://www.youtube.com/watch?v=dQw4w9WgXcQ
Claude:
🧵 Tapestry Workflow Starting...
📍 Detected: youtube
📺 Extracting YouTube transcript...
✓ Saved transcript: Never Gonna Give You Up.txt
🚀 Creating action plan...
✓ Quest: Master Video Production
✓ Saved plan: Ship-Learn-Next Plan - Master Video Production.md
✅ Complete! When will you ship Rep 1?
```
### Example 2: Article (using "weave")
```
User: weave https://example.com/how-to-build-saas
Claude:
🧵 Tapestry Workflow Starting...
📍 Detected: article
📄 Extracting article...
✓ Using reader (Mozilla Readability)
✓ Saved article: How to Build a SaaS.txt
🚀 Creating action plan...
✓ Quest: Build a SaaS MVP
✓ Saved plan: Ship-Learn-Next Plan - Build a SaaS MVP.md
✅ Complete! When will you ship Rep 1?
```
### Example 3: PDF (using "help me plan")
```
User: help me plan https://example.com/research-paper.pdf
Claude:
🧵 Tapestry Workflow Starting...
📍 Detected: pdf
📑 Downloading PDF...
✓ Downloaded: research-paper.pdf
✓ Extracted text: research-paper.txt
🚀 Creating action plan...
✓ Quest: Apply Research Findings
✓ Saved plan: Ship-Learn-Next Plan - Apply Research Findings.md
✅ Complete! When will you ship Rep 1?
```
## Dependencies
This skill orchestrates the other skills, so requires:
**For YouTube:**
- yt-dlp (auto-installed)
- Python 3 (for deduplication)
**For Articles:**
- reader (npm) OR trafilatura (pip)
- Falls back to basic curl if neither available
**For PDFs:**
- curl (built-in)
- pdftotext (optional - from poppler package)
- Install: `brew install poppler` (macOS)
- Install: `apt install poppler-utils` (Linux)
**For Planning:**
- No additional requirements (uses built-in tools)
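To see at a glance which of these are installed, a quick check (a sketch - each tool is only needed for its content type):
```bash
for tool in yt-dlp reader trafilatura pdftotext; do
    if command -v "$tool" > /dev/null 2>&1; then
        echo "✓ $tool installed"
    else
        echo "✗ $tool missing (see install notes above)"
    fi
done
```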
## Philosophy
**Tapestry weaves learning content into action.**
The unified workflow ensures you never just consume content - you always create an implementation plan. This transforms passive learning into active building.
Extract → Plan → Ship → Learn → Next.
That's the Tapestry way.

skills/youtube-transcript/SKILL.md Normal file

@@ -0,0 +1,418 @@
---
name: youtube-transcript
description: Download YouTube video transcripts when user provides a YouTube URL or asks to download/get/fetch a transcript from YouTube. Also use when user wants to transcribe or get captions/subtitles from a YouTube video.
allowed-tools:
- Bash
- Read
- Write
---
# YouTube Transcript Downloader
This skill helps download transcripts (subtitles/captions) from YouTube videos using yt-dlp.
## When to Use This Skill
Activate this skill when the user:
- Provides a YouTube URL and wants the transcript
- Asks to "download transcript from YouTube"
- Wants to "get captions" or "get subtitles" from a video
- Asks to "transcribe a YouTube video"
- Needs text content from a YouTube video
## How It Works
### Priority Order:
1. **Check if yt-dlp is installed** - install if needed
2. **List available subtitles** - see what's actually available
3. **Try manual subtitles first** (`--write-sub`) - highest quality
4. **Fallback to auto-generated** (`--write-auto-sub`) - usually available
5. **Last resort: Whisper transcription** - if no subtitles exist (requires user confirmation)
6. **Confirm the download** and show the user where the file is saved
7. **Optionally clean up** the VTT format if the user wants plain text
## Installation Check
**IMPORTANT**: Always check if yt-dlp is installed first:
```bash
which yt-dlp || command -v yt-dlp
```
### If Not Installed
Attempt automatic installation based on the system:
**macOS (Homebrew)**:
```bash
brew install yt-dlp
```
**Linux (apt/Debian/Ubuntu)**:
```bash
sudo apt update && sudo apt install -y yt-dlp
```
**Alternative (pip - works on all systems)**:
```bash
pip3 install yt-dlp
# or
python3 -m pip install yt-dlp
```
**If installation fails**: Inform the user they need to install yt-dlp manually and provide them with installation instructions from https://github.com/yt-dlp/yt-dlp#installation
## Check Available Subtitles
**ALWAYS do this first** before attempting to download:
```bash
yt-dlp --list-subs "YOUTUBE_URL"
```
This shows what subtitle types are available without downloading anything. Look for:
- Manual subtitles (better quality)
- Auto-generated subtitles (usually available)
- Available languages
## Download Strategy
### Option 1: Manual Subtitles (Preferred)
Try this first - highest quality, human-created:
```bash
yt-dlp --write-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
```
### Option 2: Auto-Generated Subtitles (Fallback)
If manual subtitles aren't available:
```bash
yt-dlp --write-auto-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
```
Both commands create a `.vtt` file (WebVTT subtitle format).
## Option 3: Whisper Transcription (Last Resort)
**ONLY use this if both manual and auto-generated subtitles are unavailable.**
### Step 1: Show File Size and Ask for Confirmation
```bash
# Get audio file size estimate
yt-dlp --print "%(filesize,filesize_approx)s" -f "bestaudio" "YOUTUBE_URL"
# Or get duration to estimate
yt-dlp --print "%(duration)s %(title)s" "YOUTUBE_URL"
```
**IMPORTANT**: Display the file size to the user and ask: "No subtitles are available. I can download the audio (approximately X MB) and transcribe it using Whisper. Would you like to proceed?"
**Wait for user confirmation before continuing.**
### Step 2: Check for Whisper Installation
```bash
command -v whisper
```
If not installed, ask user: "Whisper is not installed. Install it with `pip install openai-whisper` (requires ~1-3GB for models)? This is a one-time installation."
**Wait for user confirmation before installing.**
Install if approved:
```bash
pip3 install openai-whisper
```
### Step 3: Download Audio Only
```bash
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "YOUTUBE_URL"
```
### Step 4: Transcribe with Whisper
```bash
# Auto-detect language (recommended)
whisper audio_VIDEO_ID.mp3 --model base --output_format vtt
# Or specify language if known
whisper audio_VIDEO_ID.mp3 --model base --language en --output_format vtt
```
**Model Options** (stick to `base` for now):
- `tiny` - fastest, least accurate (~1GB)
- `base` - good balance (~1GB) ← **USE THIS**
- `small` - better accuracy (~2GB)
- `medium` - very good (~5GB)
- `large` - best accuracy (~10GB)
### Step 5: Cleanup
After transcription completes, ask user: "Transcription complete! Would you like me to delete the audio file to save space?"
If yes:
```bash
rm audio_VIDEO_ID.mp3
```
## Getting Video Information
### Extract Video Title (for filename)
```bash
yt-dlp --print "%(title)s" "YOUTUBE_URL"
```
Use this to create meaningful filenames based on the video title. Clean the title for filesystem compatibility:
- Replace `/` with `-`
- Replace special characters that might cause issues
- Consider using sanitized version: `$(yt-dlp --print "%(title)s" "URL" | tr '/' '-' | tr ':' '-')`
## Post-Processing
### Convert to Plain Text (Recommended)
YouTube's auto-generated VTT files contain **duplicate lines** because captions are shown progressively with overlapping timestamps. Always deduplicate when converting to plain text while preserving the original speaking order.
```bash
python3 -c "
import sys, re
seen = set()
with open('transcript.en.vtt', 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
clean = re.sub('<[^>]*>', '', line)
clean = clean.replace('&amp;', '&').replace('&gt;', '>').replace('&lt;', '<')
if clean and clean not in seen:
print(clean)
seen.add(clean)
" > transcript.txt
```
### Complete Post-Processing with Video Title
```bash
# Get video title
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "YOUTUBE_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
# Find the VTT file
VTT_FILE=$(ls *.vtt | head -n 1)
# Convert with deduplication
python3 -c "
import sys, re
seen = set()
with open('$VTT_FILE', 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
clean = re.sub('<[^>]*>', '', line)
clean = clean.replace('&amp;', '&').replace('&gt;', '>').replace('&lt;', '<')
if clean and clean not in seen:
print(clean)
seen.add(clean)
" > "${VIDEO_TITLE}.txt"
echo "✓ Saved to: ${VIDEO_TITLE}.txt"
# Clean up VTT file
rm "$VTT_FILE"
echo "✓ Cleaned up temporary VTT file"
```
## Output Formats
- **VTT format** (`.vtt`): Includes timestamps and formatting, good for video players
- **Plain text** (`.txt`): Just the text content, good for reading or analysis
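For reference, a raw VTT file looks roughly like this (abridged; auto-generated files also carry inline timing tags) - which is why the conversion script filters the header lines and anything containing `-->`, then deduplicates the overlapping caption text:
```
WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.500
so today we're going to look at

00:00:01.200 --> 00:00:04.000
so today we're going to look at
how yt-dlp downloads subtitles
```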
## Tips
- The filename will be `{output_name}.{language_code}.vtt` (e.g., `transcript.en.vtt`)
- Most YouTube videos have auto-generated English subtitles
- Some videos may have multiple language options
- If auto-subtitles aren't available, try `--write-sub` instead for manual subtitles
## Complete Workflow Example
```bash
VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ"
# Get video title for filename
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
OUTPUT_NAME="transcript_temp"
# ============================================
# STEP 1: Check if yt-dlp is installed
# ============================================
if ! command -v yt-dlp &> /dev/null; then
echo "yt-dlp not found, attempting to install..."
if command -v brew &> /dev/null; then
brew install yt-dlp
elif command -v apt &> /dev/null; then
sudo apt update && sudo apt install -y yt-dlp
else
pip3 install yt-dlp
fi
fi
# ============================================
# STEP 2: List available subtitles
# ============================================
echo "Checking available subtitles..."
yt-dlp --list-subs "$VIDEO_URL"
# ============================================
# STEP 3: Try manual subtitles first
# ============================================
echo "Attempting to download manual subtitles..."
if yt-dlp --write-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null; then
echo "✓ Manual subtitles downloaded successfully!"
ls -lh ${OUTPUT_NAME}.*
else
# ============================================
# STEP 4: Fallback to auto-generated
# ============================================
echo "Manual subtitles not available. Trying auto-generated..."
if yt-dlp --write-auto-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null; then
echo "✓ Auto-generated subtitles downloaded successfully!"
ls -lh ${OUTPUT_NAME}.*
else
# ============================================
# STEP 5: Last resort - Whisper transcription
# ============================================
echo "⚠ No subtitles available for this video."
# Get file size
FILE_SIZE=$(yt-dlp --print "%(filesize_approx)s" -f "bestaudio" "$VIDEO_URL")
DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL")
echo "Video: $TITLE"
echo "Duration: $((DURATION / 60)) minutes"
echo "Audio size: ~$((FILE_SIZE / 1024 / 1024)) MB"
echo ""
echo "Would you like to download and transcribe with Whisper? (y/n)"
read -r RESPONSE
if [[ "$RESPONSE" =~ ^[Yy]$ ]]; then
# Check for Whisper
if ! command -v whisper &> /dev/null; then
echo "Whisper not installed. Install now? (requires ~1-3GB) (y/n)"
read -r INSTALL_RESPONSE
if [[ "$INSTALL_RESPONSE" =~ ^[Yy]$ ]]; then
pip3 install openai-whisper
else
echo "Cannot proceed without Whisper. Exiting."
exit 1
fi
fi
# Download audio
echo "Downloading audio..."
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "$VIDEO_URL"
# Get the actual audio filename
AUDIO_FILE=$(ls audio_*.mp3 | head -n 1)
# Transcribe
echo "Transcribing with Whisper (this may take a few minutes)..."
whisper "$AUDIO_FILE" --model base --output_format vtt
# Cleanup
echo "Transcription complete! Delete audio file? (y/n)"
read -r CLEANUP_RESPONSE
if [[ "$CLEANUP_RESPONSE" =~ ^[Yy]$ ]]; then
rm "$AUDIO_FILE"
echo "Audio file deleted."
fi
ls -lh *.vtt
else
echo "Transcription cancelled."
exit 0
fi
fi
fi
# ============================================
# STEP 6: Convert to readable plain text with deduplication
# ============================================
VTT_FILE=$(ls ${OUTPUT_NAME}*.vtt 2>/dev/null || ls *.vtt | head -n 1)
if [ -f "$VTT_FILE" ]; then
echo "Converting to readable format and removing duplicates..."
python3 -c "
import sys, re
seen = set()
with open('$VTT_FILE', 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
clean = re.sub('<[^>]*>', '', line)
clean = clean.replace('&amp;', '&').replace('&gt;', '>').replace('&lt;', '<')
if clean and clean not in seen:
print(clean)
seen.add(clean)
" > "${VIDEO_TITLE}.txt"
echo "✓ Saved to: ${VIDEO_TITLE}.txt"
# Clean up temporary VTT file
rm "$VTT_FILE"
echo "✓ Cleaned up temporary VTT file"
else
echo "⚠ No VTT file found to convert"
fi
echo "✓ Complete!"
```
**Note**: This complete workflow handles all scenarios with proper error checking and user prompts at each decision point.
## Error Handling
### Common Issues and Solutions:
**1. yt-dlp not installed**
- Attempt automatic installation based on system (Homebrew/apt/pip)
- If installation fails, provide manual installation link
- Verify installation before proceeding
**2. No subtitles available**
- List available subtitles first to confirm
- Try both `--write-sub` and `--write-auto-sub`
- If both fail, offer Whisper transcription option
- Show file size and ask for user confirmation before downloading audio
**3. Invalid or private video**
- Check if URL is correct format: `https://www.youtube.com/watch?v=VIDEO_ID`
- Some videos may be private, age-restricted, or geo-blocked
- Inform user of the specific error from yt-dlp
**4. Whisper installation fails**
- May require system dependencies (ffmpeg, rust)
- Provide fallback: "Install manually with: `pip3 install openai-whisper`"
- Check available disk space (models require 1-10GB depending on size)
**5. Download interrupted or failed**
- Check internet connection
- Verify sufficient disk space
- Try again with `--no-check-certificate` if SSL issues occur
**6. Multiple subtitle languages**
- By default, yt-dlp downloads all available languages
- Can specify with `--sub-langs en` for English only
- List available with `--list-subs` first
### Best Practices:
- ✅ Always check what's available before attempting download (`--list-subs`)
- ✅ Verify success at each step before proceeding to next
- ✅ Ask user before large downloads (audio files, Whisper models)
- ✅ Clean up temporary files after processing
- ✅ Provide clear feedback about what's happening at each stage
- ✅ Handle errors gracefully with helpful messages