Initial commit
This commit is contained in:
371
skills/article-extractor/SKILL.md
Normal file
371
skills/article-extractor/SKILL.md
Normal file
@@ -0,0 +1,371 @@
|
||||
---
|
||||
name: article-extractor
|
||||
description: Extract clean article content from URLs (blog posts, articles, tutorials) and save as readable text. Use when user wants to download, extract, or save an article/blog post from a URL without ads, navigation, or clutter.
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Write
|
||||
---
|
||||
|
||||
# Article Extractor
|
||||
|
||||
This skill extracts the main content from web articles and blog posts, removing navigation, ads, newsletter signups, and other clutter. Saves clean, readable text.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Activate when the user:
|
||||
- Provides an article/blog URL and wants the text content
|
||||
- Asks to "download this article"
|
||||
- Wants to "extract the content from [URL]"
|
||||
- Asks to "save this blog post as text"
|
||||
- Needs clean article text without distractions
|
||||
|
||||
## How It Works
|
||||
|
||||
### Priority Order:
|
||||
1. **Check if tools are installed** (reader or trafilatura)
|
||||
2. **Download and extract article** using best available tool
|
||||
3. **Clean up the content** (remove extra whitespace, format properly)
|
||||
4. **Save to file** with article title as filename
|
||||
5. **Confirm location** and show preview
|
||||
|
||||
## Installation Check
|
||||
|
||||
Check for article extraction tools in this order:
|
||||
|
||||
### Option 1: reader (Recommended - Mozilla's Readability)
|
||||
|
||||
```bash
|
||||
command -v reader
|
||||
```
|
||||
|
||||
If not installed:
|
||||
```bash
|
||||
npm install -g @mozilla/readability-cli
|
||||
# or
|
||||
npm install -g reader-cli
|
||||
```
|
||||
|
||||
### Option 2: trafilatura (Python-based, very good)
|
||||
|
||||
```bash
|
||||
command -v trafilatura
|
||||
```
|
||||
|
||||
If not installed:
|
||||
```bash
|
||||
pip3 install trafilatura
|
||||
```
|
||||
|
||||
### Option 3: Fallback (curl + simple parsing)
|
||||
|
||||
If no tools available, use basic curl + text extraction (less reliable but works)
|
||||
|
||||
## Extraction Methods
|
||||
|
||||
### Method 1: Using reader (Best for most articles)
|
||||
|
||||
```bash
|
||||
# Extract article
|
||||
reader "URL" > article.txt
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Based on Mozilla's Readability algorithm
|
||||
- Excellent at removing clutter
|
||||
- Preserves article structure
|
||||
|
||||
### Method 2: Using trafilatura (Best for blogs/news)
|
||||
|
||||
```bash
|
||||
# Extract article
|
||||
trafilatura --URL "URL" --output-format txt > article.txt
|
||||
|
||||
# Or with more options
|
||||
trafilatura --URL "URL" --output-format txt --no-comments --no-tables > article.txt
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Very accurate extraction
|
||||
- Good with various site structures
|
||||
- Handles multiple languages
|
||||
|
||||
**Options:**
|
||||
- `--no-comments`: Skip comment sections
|
||||
- `--no-tables`: Skip data tables
|
||||
- `--precision`: Favor precision over recall
|
||||
- `--recall`: Extract more content (may include some noise)
|
||||
|
||||
### Method 3: Fallback (curl + basic parsing)
|
||||
|
||||
```bash
|
||||
# Download and extract basic content
|
||||
curl -s "URL" | python3 -c "
|
||||
from html.parser import HTMLParser
|
||||
import sys
|
||||
|
||||
class ArticleExtractor(HTMLParser):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.in_content = False
|
||||
self.content = []
|
||||
self.skip_tags = {'script', 'style', 'nav', 'header', 'footer', 'aside'}
|
||||
self.current_tag = None
|
||||
|
||||
def handle_starttag(self, tag, attrs):
|
||||
if tag not in self.skip_tags:
|
||||
if tag in {'p', 'article', 'main', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}:
|
||||
self.in_content = True
|
||||
self.current_tag = tag
|
||||
|
||||
def handle_data(self, data):
|
||||
if self.in_content and data.strip():
|
||||
self.content.append(data.strip())
|
||||
|
||||
def get_content(self):
|
||||
return '\n\n'.join(self.content)
|
||||
|
||||
parser = ArticleExtractor()
|
||||
parser.feed(sys.stdin.read())
|
||||
print(parser.get_content())
|
||||
" > article.txt
|
||||
```
|
||||
|
||||
**Note:** This is less reliable but works without dependencies.
|
||||
|
||||
## Getting Article Title
|
||||
|
||||
Extract title for filename:
|
||||
|
||||
### Using reader:
|
||||
```bash
|
||||
# reader outputs markdown with title at top
|
||||
TITLE=$(reader "URL" | head -n 1 | sed 's/^# //')
|
||||
```
|
||||
|
||||
### Using trafilatura:
|
||||
```bash
|
||||
# Get metadata including title
|
||||
TITLE=$(trafilatura --URL "URL" --json | python3 -c "import json, sys; print(json.load(sys.stdin)['title'])")
|
||||
```
|
||||
|
||||
### Using curl (fallback):
|
||||
```bash
|
||||
TITLE=$(curl -s "URL" | grep -oP '<title>\K[^<]+' | sed 's/ - .*//' | sed 's/ | .*//')
|
||||
```
|
||||
|
||||
## Filename Creation
|
||||
|
||||
Clean title for filesystem:
|
||||
|
||||
```bash
|
||||
# Get title
|
||||
TITLE="Article Title from Website"
|
||||
|
||||
# Clean for filesystem (remove special chars, limit length)
|
||||
FILENAME=$(echo "$TITLE" | tr '/' '-' | tr ':' '-' | tr '?' '' | tr '"' '' | tr '<' '' | tr '>' '' | tr '|' '-' | cut -c 1-100 | sed 's/ *$//')
|
||||
|
||||
# Add extension
|
||||
FILENAME="${FILENAME}.txt"
|
||||
```
|
||||
|
||||
## Complete Workflow
|
||||
|
||||
```bash
|
||||
ARTICLE_URL="https://example.com/article"
|
||||
|
||||
# Check for tools
|
||||
if command -v reader &> /dev/null; then
|
||||
TOOL="reader"
|
||||
echo "Using reader (Mozilla Readability)"
|
||||
elif command -v trafilatura &> /dev/null; then
|
||||
TOOL="trafilatura"
|
||||
echo "Using trafilatura"
|
||||
else
|
||||
TOOL="fallback"
|
||||
echo "Using fallback method (may be less accurate)"
|
||||
fi
|
||||
|
||||
# Extract article
|
||||
case $TOOL in
|
||||
reader)
|
||||
# Get content
|
||||
reader "$ARTICLE_URL" > temp_article.txt
|
||||
|
||||
# Get title (first line after # in markdown)
|
||||
TITLE=$(head -n 1 temp_article.txt | sed 's/^# //')
|
||||
;;
|
||||
|
||||
trafilatura)
|
||||
# Get title from metadata
|
||||
METADATA=$(trafilatura --URL "$ARTICLE_URL" --json)
|
||||
TITLE=$(echo "$METADATA" | python3 -c "import json, sys; print(json.load(sys.stdin).get('title', 'Article'))")
|
||||
|
||||
# Get clean content
|
||||
trafilatura --URL "$ARTICLE_URL" --output-format txt --no-comments > temp_article.txt
|
||||
;;
|
||||
|
||||
fallback)
|
||||
# Get title
|
||||
TITLE=$(curl -s "$ARTICLE_URL" | grep -oP '<title>\K[^<]+' | head -n 1)
|
||||
TITLE=${TITLE%% - *} # Remove site name
|
||||
TITLE=${TITLE%% | *} # Remove site name (alternate)
|
||||
|
||||
# Get content (basic extraction)
|
||||
curl -s "$ARTICLE_URL" | python3 -c "
|
||||
from html.parser import HTMLParser
|
||||
import sys
|
||||
|
||||
class ArticleExtractor(HTMLParser):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.in_content = False
|
||||
self.content = []
|
||||
self.skip_tags = {'script', 'style', 'nav', 'header', 'footer', 'aside', 'form'}
|
||||
|
||||
def handle_starttag(self, tag, attrs):
|
||||
if tag not in self.skip_tags:
|
||||
if tag in {'p', 'article', 'main'}:
|
||||
self.in_content = True
|
||||
if tag in {'h1', 'h2', 'h3'}:
|
||||
self.content.append('\n')
|
||||
|
||||
def handle_data(self, data):
|
||||
if self.in_content and data.strip():
|
||||
self.content.append(data.strip())
|
||||
|
||||
def get_content(self):
|
||||
return '\n\n'.join(self.content)
|
||||
|
||||
parser = ArticleExtractor()
|
||||
parser.feed(sys.stdin.read())
|
||||
print(parser.get_content())
|
||||
" > temp_article.txt
|
||||
;;
|
||||
esac
|
||||
|
||||
# Clean filename
|
||||
FILENAME=$(echo "$TITLE" | tr '/' '-' | tr ':' '-' | tr '?' '' | tr '"' '' | tr '<>' '' | tr '|' '-' | cut -c 1-80 | sed 's/ *$//' | sed 's/^ *//')
|
||||
FILENAME="${FILENAME}.txt"
|
||||
|
||||
# Move to final filename
|
||||
mv temp_article.txt "$FILENAME"
|
||||
|
||||
# Show result
|
||||
echo "✓ Extracted article: $TITLE"
|
||||
echo "✓ Saved to: $FILENAME"
|
||||
echo ""
|
||||
echo "Preview (first 10 lines):"
|
||||
head -n 10 "$FILENAME"
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Common Issues
|
||||
|
||||
**1. Tool not installed**
|
||||
- Try alternate tool (reader → trafilatura → fallback)
|
||||
- Offer to install: "Install reader with: npm install -g reader-cli"
|
||||
|
||||
**2. Paywall or login required**
|
||||
- Extraction tools may fail
|
||||
- Inform user: "This article requires authentication. Cannot extract."
|
||||
|
||||
**3. Invalid URL**
|
||||
- Check URL format
|
||||
- Try with and without redirects
|
||||
|
||||
**4. No content extracted**
|
||||
- Site may use heavy JavaScript
|
||||
- Try fallback method
|
||||
- Inform user if extraction fails
|
||||
|
||||
**5. Special characters in title**
|
||||
- Clean title for filesystem
|
||||
- Remove: `/`, `:`, `?`, `"`, `<`, `>`, `|`
|
||||
- Replace with `-` or remove
|
||||
|
||||
## Output Format
|
||||
|
||||
### Saved File Contains:
|
||||
- Article title (if available)
|
||||
- Author (if available from tool)
|
||||
- Main article text
|
||||
- Section headings
|
||||
- No navigation, ads, or clutter
|
||||
|
||||
### What Gets Removed:
|
||||
- Navigation menus
|
||||
- Ads and promotional content
|
||||
- Newsletter signup forms
|
||||
- Related articles sidebars
|
||||
- Comment sections (optional)
|
||||
- Social media buttons
|
||||
- Cookie notices
|
||||
|
||||
## Tips for Best Results
|
||||
|
||||
**1. Use reader for most articles**
|
||||
- Best all-around tool
|
||||
- Based on Firefox Reader View
|
||||
- Works on most news sites and blogs
|
||||
|
||||
**2. Use trafilatura for:**
|
||||
- Academic articles
|
||||
- News sites
|
||||
- Blogs with complex layouts
|
||||
- Non-English content
|
||||
|
||||
**3. Fallback method limitations:**
|
||||
- May include some noise
|
||||
- Less accurate paragraph detection
|
||||
- Better than nothing for simple sites
|
||||
|
||||
**4. Check extraction quality:**
|
||||
- Always show preview to user
|
||||
- Ask if it looks correct
|
||||
- Offer to try different tool if needed
|
||||
|
||||
## Example Usage
|
||||
|
||||
**Simple extraction:**
|
||||
```bash
|
||||
# User: "Extract https://example.com/article"
|
||||
reader "https://example.com/article" > temp.txt
|
||||
TITLE=$(head -n 1 temp.txt | sed 's/^# //')
|
||||
FILENAME="$(echo "$TITLE" | tr '/' '-').txt"
|
||||
mv temp.txt "$FILENAME"
|
||||
echo "✓ Saved to: $FILENAME"
|
||||
```
|
||||
|
||||
**With error handling:**
|
||||
```bash
|
||||
if ! reader "$URL" > temp.txt 2>/dev/null; then
|
||||
if command -v trafilatura &> /dev/null; then
|
||||
trafilatura --URL "$URL" --output-format txt > temp.txt
|
||||
else
|
||||
echo "Error: Could not extract article. Install reader or trafilatura."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
- ✅ Always show preview after extraction (first 10 lines)
|
||||
- ✅ Verify extraction succeeded before saving
|
||||
- ✅ Clean filename for filesystem compatibility
|
||||
- ✅ Try fallback method if primary fails
|
||||
- ✅ Inform user which tool was used
|
||||
- ✅ Keep filename length reasonable (< 100 chars)
|
||||
|
||||
## After Extraction
|
||||
|
||||
Display to user:
|
||||
1. "✓ Extracted: [Article Title]"
|
||||
2. "✓ Saved to: [filename]"
|
||||
3. Show preview (first 10-15 lines)
|
||||
4. File size and location
|
||||
|
||||
Ask if needed:
|
||||
- "Would you like me to also create a Ship-Learn-Next plan from this?" (if using ship-learn-next skill)
|
||||
- "Should I extract another article?"
|
||||
538
skills/content-research-writer/SKILL.md
Normal file
538
skills/content-research-writer/SKILL.md
Normal file
@@ -0,0 +1,538 @@
|
||||
---
|
||||
name: content-research-writer
|
||||
description: Assists in writing high-quality content by conducting research, adding citations, improving hooks, iterating on outlines, and providing real-time feedback on each section. Transforms your writing process from solo effort to collaborative partnership.
|
||||
---
|
||||
|
||||
# Content Research Writer
|
||||
|
||||
This skill acts as your writing partner, helping you research, outline, draft, and refine content while maintaining your unique voice and style.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
- Writing blog posts, articles, or newsletters
|
||||
- Creating educational content or tutorials
|
||||
- Drafting thought leadership pieces
|
||||
- Researching and writing case studies
|
||||
- Producing technical documentation with sources
|
||||
- Writing with proper citations and references
|
||||
- Improving hooks and introductions
|
||||
- Getting section-by-section feedback while writing
|
||||
|
||||
## What This Skill Does
|
||||
|
||||
1. **Collaborative Outlining**: Helps you structure ideas into coherent outlines
|
||||
2. **Research Assistance**: Finds relevant information and adds citations
|
||||
3. **Hook Improvement**: Strengthens your opening to capture attention
|
||||
4. **Section Feedback**: Reviews each section as you write
|
||||
5. **Voice Preservation**: Maintains your writing style and tone
|
||||
6. **Citation Management**: Adds and formats references properly
|
||||
7. **Iterative Refinement**: Helps you improve through multiple drafts
|
||||
|
||||
## How to Use
|
||||
|
||||
### Setup Your Writing Environment
|
||||
|
||||
Create a dedicated folder for your article:
|
||||
```
|
||||
mkdir ~/writing/my-article-title
|
||||
cd ~/writing/my-article-title
|
||||
```
|
||||
|
||||
Create your draft file:
|
||||
```
|
||||
touch article-draft.md
|
||||
```
|
||||
|
||||
Open Claude Code from this directory and start writing.
|
||||
|
||||
### Basic Workflow
|
||||
|
||||
1. **Start with an outline**:
|
||||
```
|
||||
Help me create an outline for an article about [topic]
|
||||
```
|
||||
|
||||
2. **Research and add citations**:
|
||||
```
|
||||
Research [specific topic] and add citations to my outline
|
||||
```
|
||||
|
||||
3. **Improve the hook**:
|
||||
```
|
||||
Here's my introduction. Help me make the hook more compelling.
|
||||
```
|
||||
|
||||
4. **Get section feedback**:
|
||||
```
|
||||
I just finished the "Why This Matters" section. Review it and give feedback.
|
||||
```
|
||||
|
||||
5. **Refine and polish**:
|
||||
```
|
||||
Review the full draft for flow, clarity, and consistency.
|
||||
```
|
||||
|
||||
## Instructions
|
||||
|
||||
When a user requests writing assistance:
|
||||
|
||||
1. **Understand the Writing Project**
|
||||
|
||||
Ask clarifying questions:
|
||||
- What's the topic and main argument?
|
||||
- Who's the target audience?
|
||||
- What's the desired length/format?
|
||||
- What's your goal? (educate, persuade, entertain, explain)
|
||||
- Any existing research or sources to include?
|
||||
- What's your writing style? (formal, conversational, technical)
|
||||
|
||||
2. **Collaborative Outlining**
|
||||
|
||||
Help structure the content:
|
||||
|
||||
```markdown
|
||||
# Article Outline: [Title]
|
||||
|
||||
## Hook
|
||||
- [Opening line/story/statistic]
|
||||
- [Why reader should care]
|
||||
|
||||
## Introduction
|
||||
- Context and background
|
||||
- Problem statement
|
||||
- What this article covers
|
||||
|
||||
## Main Sections
|
||||
|
||||
### Section 1: [Title]
|
||||
- Key point A
|
||||
- Key point B
|
||||
- Example/evidence
|
||||
- [Research needed: specific topic]
|
||||
|
||||
### Section 2: [Title]
|
||||
- Key point C
|
||||
- Key point D
|
||||
- Data/citation needed
|
||||
|
||||
### Section 3: [Title]
|
||||
- Key point E
|
||||
- Counter-arguments
|
||||
- Resolution
|
||||
|
||||
## Conclusion
|
||||
- Summary of main points
|
||||
- Call to action
|
||||
- Final thought
|
||||
|
||||
## Research To-Do
|
||||
- [ ] Find data on [topic]
|
||||
- [ ] Get examples of [concept]
|
||||
- [ ] Source citation for [claim]
|
||||
```
|
||||
|
||||
**Iterate on outline**:
|
||||
- Adjust based on feedback
|
||||
- Ensure logical flow
|
||||
- Identify research gaps
|
||||
- Mark sections for deep dives
|
||||
|
||||
3. **Conduct Research**
|
||||
|
||||
When user requests research on a topic:
|
||||
|
||||
- Search for relevant information
|
||||
- Find credible sources
|
||||
- Extract key facts, quotes, and data
|
||||
- Add citations in requested format
|
||||
|
||||
Example output:
|
||||
```markdown
|
||||
## Research: AI Impact on Productivity
|
||||
|
||||
Key Findings:
|
||||
|
||||
1. **Productivity Gains**: Studies show 40% time savings for
|
||||
content creation tasks [1]
|
||||
|
||||
2. **Adoption Rates**: 67% of knowledge workers use AI tools
|
||||
weekly [2]
|
||||
|
||||
3. **Expert Quote**: "AI augments rather than replaces human
|
||||
creativity" - Dr. Jane Smith, MIT [3]
|
||||
|
||||
Citations:
|
||||
[1] McKinsey Global Institute. (2024). "The Economic Potential
|
||||
of Generative AI"
|
||||
[2] Stack Overflow Developer Survey (2024)
|
||||
[3] Smith, J. (2024). MIT Technology Review interview
|
||||
|
||||
Added to outline under Section 2.
|
||||
```
|
||||
|
||||
4. **Improve Hooks**
|
||||
|
||||
When user shares an introduction, analyze and strengthen:
|
||||
|
||||
**Current Hook Analysis**:
|
||||
- What works: [positive elements]
|
||||
- What could be stronger: [areas for improvement]
|
||||
- Emotional impact: [current vs. potential]
|
||||
|
||||
**Suggested Alternatives**:
|
||||
|
||||
Option 1: [Bold statement]
|
||||
> [Example]
|
||||
*Why it works: [explanation]*
|
||||
|
||||
Option 2: [Personal story]
|
||||
> [Example]
|
||||
*Why it works: [explanation]*
|
||||
|
||||
Option 3: [Surprising data]
|
||||
> [Example]
|
||||
*Why it works: [explanation]*
|
||||
|
||||
**Questions to hook**:
|
||||
- Does it create curiosity?
|
||||
- Does it promise value?
|
||||
- Is it specific enough?
|
||||
- Does it match the audience?
|
||||
|
||||
5. **Provide Section-by-Section Feedback**
|
||||
|
||||
As user writes each section, review for:
|
||||
|
||||
```markdown
|
||||
# Feedback: [Section Name]
|
||||
|
||||
## What Works Well ✓
|
||||
- [Strength 1]
|
||||
- [Strength 2]
|
||||
- [Strength 3]
|
||||
|
||||
## Suggestions for Improvement
|
||||
|
||||
### Clarity
|
||||
- [Specific issue] → [Suggested fix]
|
||||
- [Complex sentence] → [Simpler alternative]
|
||||
|
||||
### Flow
|
||||
- [Transition issue] → [Better connection]
|
||||
- [Paragraph order] → [Suggested reordering]
|
||||
|
||||
### Evidence
|
||||
- [Claim needing support] → [Add citation or example]
|
||||
- [Generic statement] → [Make more specific]
|
||||
|
||||
### Style
|
||||
- [Tone inconsistency] → [Match your voice better]
|
||||
- [Word choice] → [Stronger alternative]
|
||||
|
||||
## Specific Line Edits
|
||||
|
||||
Original:
|
||||
> [Exact quote from draft]
|
||||
|
||||
Suggested:
|
||||
> [Improved version]
|
||||
|
||||
Why: [Explanation]
|
||||
|
||||
## Questions to Consider
|
||||
- [Thought-provoking question 1]
|
||||
- [Thought-provoking question 2]
|
||||
|
||||
Ready to move to next section!
|
||||
```
|
||||
|
||||
6. **Preserve Writer's Voice**
|
||||
|
||||
Important principles:
|
||||
|
||||
- **Learn their style**: Read existing writing samples
|
||||
- **Suggest, don't replace**: Offer options, not directives
|
||||
- **Match tone**: Formal, casual, technical, friendly
|
||||
- **Respect choices**: If they prefer their version, support it
|
||||
- **Enhance, don't override**: Make their writing better, not different
|
||||
|
||||
Ask periodically:
|
||||
- "Does this sound like you?"
|
||||
- "Is this the right tone?"
|
||||
- "Should I be more/less [formal/casual/technical]?"
|
||||
|
||||
7. **Citation Management**
|
||||
|
||||
Handle references based on user preference:
|
||||
|
||||
**Inline Citations**:
|
||||
```markdown
|
||||
Studies show 40% productivity improvement (McKinsey, 2024).
|
||||
```
|
||||
|
||||
**Numbered References**:
|
||||
```markdown
|
||||
Studies show 40% productivity improvement [1].
|
||||
|
||||
[1] McKinsey Global Institute. (2024)...
|
||||
```
|
||||
|
||||
**Footnote Style**:
|
||||
```markdown
|
||||
Studies show 40% productivity improvement^1
|
||||
|
||||
^1: McKinsey Global Institute. (2024)...
|
||||
```
|
||||
|
||||
Maintain a running citations list:
|
||||
```markdown
|
||||
## References
|
||||
|
||||
1. Author. (Year). "Title". Publication.
|
||||
2. Author. (Year). "Title". Publication.
|
||||
...
|
||||
```
|
||||
|
||||
8. **Final Review and Polish**
|
||||
|
||||
When draft is complete, provide comprehensive feedback:
|
||||
|
||||
```markdown
|
||||
# Full Draft Review
|
||||
|
||||
## Overall Assessment
|
||||
|
||||
**Strengths**:
|
||||
- [Major strength 1]
|
||||
- [Major strength 2]
|
||||
- [Major strength 3]
|
||||
|
||||
**Impact**: [Overall effectiveness assessment]
|
||||
|
||||
## Structure & Flow
|
||||
- [Comments on organization]
|
||||
- [Transition quality]
|
||||
- [Pacing assessment]
|
||||
|
||||
## Content Quality
|
||||
- [Argument strength]
|
||||
- [Evidence sufficiency]
|
||||
- [Example effectiveness]
|
||||
|
||||
## Technical Quality
|
||||
- Grammar and mechanics: [assessment]
|
||||
- Consistency: [assessment]
|
||||
- Citations: [completeness check]
|
||||
|
||||
## Readability
|
||||
- Clarity score: [evaluation]
|
||||
- Sentence variety: [evaluation]
|
||||
- Paragraph length: [evaluation]
|
||||
|
||||
## Final Polish Suggestions
|
||||
|
||||
1. **Introduction**: [Specific improvements]
|
||||
2. **Body**: [Specific improvements]
|
||||
3. **Conclusion**: [Specific improvements]
|
||||
4. **Title**: [Options if needed]
|
||||
|
||||
## Pre-Publish Checklist
|
||||
- [ ] All claims sourced
|
||||
- [ ] Citations formatted
|
||||
- [ ] Examples clear
|
||||
- [ ] Transitions smooth
|
||||
- [ ] Call to action present
|
||||
- [ ] Proofread for typos
|
||||
|
||||
Ready to publish! 🚀
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Example 1: Teresa Torres's Workflow
|
||||
|
||||
**User**: "I'm writing an article about continuous discovery. Help me create an outline."
|
||||
|
||||
**Process**:
|
||||
1. Collaborates on outline structure
|
||||
2. Identifies research needs
|
||||
3. User starts writing introduction
|
||||
4. Reviews and improves the hook
|
||||
5. User writes each section
|
||||
6. Provides feedback after each section
|
||||
7. Conducts research and adds citations
|
||||
8. Final review of complete draft
|
||||
9. Polish and prep for publishing
|
||||
|
||||
**Result**: Well-researched, properly cited article written in Teresa's voice with strong structure and flow.
|
||||
|
||||
### Example 2: Research-Heavy Article
|
||||
|
||||
**User**: "I'm writing about AI's impact on product management. Help me find current data and examples."
|
||||
|
||||
**Output**:
|
||||
```markdown
|
||||
## Research Compiled
|
||||
|
||||
### Recent Data
|
||||
- Gartner: 80% of PMs will use AI tools by 2025 [1]
|
||||
- Survey: 45% report faster feature shipping [2]
|
||||
- Case study: Airbnb's AI-assisted prioritization [3]
|
||||
|
||||
### Expert Quotes
|
||||
- "AI amplifies PM judgment, not replaces it" - Marty Cagan
|
||||
- [Additional quotes with citations]
|
||||
|
||||
### Real Examples
|
||||
1. **Company A**: Used AI for user research synthesis
|
||||
- Result: 60% time savings
|
||||
- Source: [citation]
|
||||
|
||||
2. **Company B**: AI-powered roadmap analysis
|
||||
- Result: Better stakeholder alignment
|
||||
- Source: [citation]
|
||||
|
||||
All added to your outline with proper citations.
|
||||
```
|
||||
|
||||
### Example 3: Hook Improvement
|
||||
|
||||
**User's Original Hook**:
|
||||
> "Product management is changing because of AI. In this article, I'll discuss some ways AI affects product managers."
|
||||
|
||||
**Improved Options**:
|
||||
|
||||
**Option 1 (Data-driven)**:
|
||||
> "Last month, I asked AI to analyze 500 customer interviews. It took 30 minutes instead of 3 weeks. Product management will never be the same."
|
||||
|
||||
**Option 2 (Question)**:
|
||||
> "What if you could talk to every customer, read every review, and analyze every support ticket—all before your morning coffee?"
|
||||
|
||||
**Option 3 (Story)**:
|
||||
> "Sarah spent two weeks building the wrong feature. Not because she didn't understand her users, but because she couldn't process the hundreds of interviews fast enough to spot the pattern."
|
||||
|
||||
### Example 4: Section Feedback
|
||||
|
||||
**User**: "Just finished my 'Common Mistakes' section. Thoughts?"
|
||||
|
||||
**Response**:
|
||||
```markdown
|
||||
# Feedback: Common Mistakes Section
|
||||
|
||||
## What Works Well ✓
|
||||
- Strong examples (the Slack notification story)
|
||||
- Clear structure (3 mistakes, clean layout)
|
||||
- Practical advice for each mistake
|
||||
|
||||
## Suggestions
|
||||
|
||||
### Make It More Specific
|
||||
Your second point says "Bad prioritization" but could be stronger:
|
||||
|
||||
Current:
|
||||
> "Many teams prioritize badly by focusing on features instead of outcomes."
|
||||
|
||||
Suggested:
|
||||
> "I've watched teams ship 14 features in a quarter yet move none of their key metrics. They prioritized activity over progress."
|
||||
|
||||
### Add Data
|
||||
The third mistake would benefit from evidence:
|
||||
> "[Add citation]: Studies show teams without regular user contact are 3x more likely to build unused features [needs source]"
|
||||
|
||||
### Flow Improvement
|
||||
Consider reordering: Mistake 3 → Mistake 2 → Mistake 1
|
||||
This builds from small to big impact.
|
||||
|
||||
Ready for the next section!
|
||||
```
|
||||
|
||||
## Writing Workflows
|
||||
|
||||
### Blog Post Workflow
|
||||
1. Outline together
|
||||
2. Research key points
|
||||
3. Write introduction → get feedback
|
||||
4. Write body sections → feedback each
|
||||
5. Write conclusion → final review
|
||||
6. Polish and edit
|
||||
|
||||
### Newsletter Workflow
|
||||
1. Discuss hook ideas
|
||||
2. Quick outline (shorter format)
|
||||
3. Draft in one session
|
||||
4. Review for clarity and links
|
||||
5. Quick polish
|
||||
|
||||
### Technical Tutorial Workflow
|
||||
1. Outline steps
|
||||
2. Write code examples
|
||||
3. Add explanations
|
||||
4. Test instructions
|
||||
5. Add troubleshooting section
|
||||
6. Final review for accuracy
|
||||
|
||||
### Thought Leadership Workflow
|
||||
1. Brainstorm unique angle
|
||||
2. Research existing perspectives
|
||||
3. Develop your thesis
|
||||
4. Write with strong POV
|
||||
5. Add supporting evidence
|
||||
6. Craft compelling conclusion
|
||||
|
||||
## Pro Tips
|
||||
|
||||
1. **Work in VS Code**: Better than web Claude for long-form writing
|
||||
2. **One section at a time**: Get feedback incrementally
|
||||
3. **Save research separately**: Keep a research.md file
|
||||
4. **Version your drafts**: article-v1.md, article-v2.md, etc.
|
||||
5. **Read aloud**: Use feedback to identify clunky sentences
|
||||
6. **Set deadlines**: "I want to finish the draft today"
|
||||
7. **Take breaks**: Write, get feedback, pause, revise
|
||||
|
||||
## File Organization
|
||||
|
||||
Recommended structure for writing projects:
|
||||
|
||||
```
|
||||
~/writing/article-name/
|
||||
├── outline.md # Your outline
|
||||
├── research.md # All research and citations
|
||||
├── draft-v1.md # First draft
|
||||
├── draft-v2.md # Revised draft
|
||||
├── final.md # Publication-ready
|
||||
├── feedback.md # Collected feedback
|
||||
└── sources/ # Reference materials
|
||||
├── study1.pdf
|
||||
└── article2.md
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### For Research
|
||||
- Verify sources before citing
|
||||
- Use recent data when possible
|
||||
- Balance different perspectives
|
||||
- Link to original sources
|
||||
|
||||
### For Feedback
|
||||
- Be specific about what you want: "Is this too technical?"
|
||||
- Share your concerns: "I'm worried this section drags"
|
||||
- Ask questions: "Does this flow logically?"
|
||||
- Request alternatives: "What's another way to explain this?"
|
||||
|
||||
### For Voice
|
||||
- Share examples of your writing
|
||||
- Specify tone preferences
|
||||
- Point out good matches: "That sounds like me!"
|
||||
- Flag mismatches: "Too formal for my style"
|
||||
|
||||
## Related Use Cases
|
||||
|
||||
- Creating social media posts from articles
|
||||
- Adapting content for different audiences
|
||||
- Writing email newsletters
|
||||
- Drafting technical documentation
|
||||
- Creating presentation content
|
||||
- Writing case studies
|
||||
- Developing course outlines
|
||||
|
||||
328
skills/ship-learn-next/SKILL.md
Normal file
328
skills/ship-learn-next/SKILL.md
Normal file
@@ -0,0 +1,328 @@
|
||||
---
|
||||
name: ship-learn-next
|
||||
description: Transform learning content (like YouTube transcripts, articles, tutorials) into actionable implementation plans using the Ship-Learn-Next framework. Use when user wants to turn advice, lessons, or educational content into concrete action steps, reps, or a learning quest.
|
||||
allowed-tools:
|
||||
- Read
|
||||
- Write
|
||||
---
|
||||
|
||||
# Ship-Learn-Next Action Planner
|
||||
|
||||
This skill helps transform passive learning content into actionable **Ship-Learn-Next cycles** - turning advice and lessons into concrete, shippable iterations.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Activate when the user:
|
||||
- Has a transcript/article/tutorial and wants to "implement the advice"
|
||||
- Asks to "turn this into a plan" or "make this actionable"
|
||||
- Wants to extract implementation steps from educational content
|
||||
- Needs help breaking down big ideas into small, shippable reps
|
||||
- Says things like "I watched/read X, now what should I do?"
|
||||
|
||||
## Core Framework: Ship-Learn-Next
|
||||
|
||||
Every learning quest follows three repeating phases:
|
||||
|
||||
1. **SHIP** - Create something real (code, content, product, demonstration)
|
||||
2. **LEARN** - Honest reflection on what happened
|
||||
3. **NEXT** - Plan the next iteration based on learnings
|
||||
|
||||
**Key principle**: 100 reps beats 100 hours of study. Learning = doing better, not knowing more.
|
||||
|
||||
## How This Skill Works
|
||||
|
||||
### Step 1: Read the Content
|
||||
|
||||
Read the file the user provides (transcript, article, notes):
|
||||
|
||||
```bash
|
||||
# User provides path to file
|
||||
FILE_PATH="/path/to/content.txt"
|
||||
```
|
||||
|
||||
Use the Read tool to analyze the content.
|
||||
|
||||
### Step 2: Extract Core Lessons
|
||||
|
||||
Identify from the content:
|
||||
- **Main advice/lessons**: What are the key takeaways?
|
||||
- **Actionable principles**: What can actually be practiced?
|
||||
- **Skills being taught**: What would someone learn by doing this?
|
||||
- **Examples/case studies**: Real implementations mentioned
|
||||
|
||||
**Do NOT**:
|
||||
- Summarize everything (focus on actionable parts)
|
||||
- List theory without application
|
||||
- Include "nice to know" vs "need to practice"
|
||||
|
||||
### Step 3: Define the Quest
|
||||
|
||||
Help the user frame their learning goal:
|
||||
|
||||
Ask:
|
||||
1. "Based on this content, what do you want to achieve in 4-8 weeks?"
|
||||
2. "What would success look like? (Be specific)"
|
||||
3. "What's something concrete you could build/create/ship?"
|
||||
|
||||
**Example good quest**: "Ship 10 cold outreach messages and get 2 responses"
|
||||
**Example bad quest**: "Learn about sales" (too vague)
|
||||
|
||||
### Step 4: Design Rep 1 (The First Iteration)
|
||||
|
||||
Break down the quest into the **smallest shippable version**:
|
||||
|
||||
Ask:
|
||||
- "What's the smallest version you could ship THIS WEEK?"
|
||||
- "What do you need to learn JUST to do that?" (not everything)
|
||||
- "What would 'done' look like for rep 1?"
|
||||
|
||||
**Make it:**
|
||||
- Concrete and specific
|
||||
- Completable in 1-7 days
|
||||
- Produces real evidence/artifact
|
||||
- Small enough to not be intimidating
|
||||
- Big enough to learn something meaningful
|
||||
|
||||
### Step 5: Create the Rep Plan
|
||||
|
||||
Structure each rep with:
|
||||
|
||||
```markdown
|
||||
## Rep 1: [Specific Goal]
|
||||
|
||||
**Ship Goal**: [What you'll create/do]
|
||||
**Success Criteria**: [How you'll know it's done]
|
||||
**What You'll Learn**: [Specific skills/insights]
|
||||
**Resources Needed**: [Minimal - just what's needed for THIS rep]
|
||||
**Timeline**: [Specific deadline]
|
||||
|
||||
**Action Steps**:
|
||||
1. [Concrete step 1]
|
||||
2. [Concrete step 2]
|
||||
3. [Concrete step 3]
|
||||
...
|
||||
|
||||
**After Shipping - Reflection Questions**:
|
||||
- What actually happened? (Be specific)
|
||||
- What worked? What didn't?
|
||||
- What surprised you?
|
||||
- On a scale of 1-10, how did this rep go?
|
||||
- What would you do differently next time?
|
||||
```
|
||||
|
||||
### Step 6: Map Future Reps (2-5)
|
||||
|
||||
Based on the content, suggest a progression:
|
||||
|
||||
```markdown
|
||||
## Rep 2: [Next level]
|
||||
**Builds on**: What you learned in Rep 1
|
||||
**New challenge**: One new thing to try/improve
|
||||
**Expected difficulty**: [Easier/Same/Harder - and why]
|
||||
|
||||
## Rep 3: [Continue progression]
|
||||
...
|
||||
```
|
||||
|
||||
**Progression principles**:
|
||||
- Each rep adds ONE new element
|
||||
- Increase difficulty based on success
|
||||
- Reference specific lessons from the content
|
||||
- Keep reps shippable (not theoretical)
|
||||
|
||||
### Step 7: Connect to Content
|
||||
|
||||
For each rep, reference the source material:
|
||||
|
||||
- "This implements the [concept] from minute X"
|
||||
- "You're practicing the [technique] mentioned in the video"
|
||||
- "This tests the advice about [topic]"
|
||||
|
||||
**But**: Always emphasize DOING over studying. Point to resources only when needed for the specific rep.
|
||||
|
||||
## Conversation Style
|
||||
|
||||
**Direct but supportive**:
|
||||
- No fluff, but encouraging
|
||||
- "Ship it, then we'll improve it"
|
||||
- "What's the smallest version you could do this week?"
|
||||
|
||||
**Question-driven**:
|
||||
- Make them think, don't just tell
|
||||
- "What exactly do you want to achieve?" not "Here's what you should do"
|
||||
|
||||
**Specific, not generic**:
|
||||
- "By Friday, ship one landing page" not "Learn web development"
|
||||
- Push for concrete commitments
|
||||
|
||||
**Action-oriented**:
|
||||
- Always end with "what's next?"
|
||||
- Focus on the next rep, not the whole journey
|
||||
|
||||
## What NOT to Do
|
||||
|
||||
- ❌ Don't create a study plan (create a SHIP plan)
|
||||
- ❌ Don't list all resources to read/watch (pick minimal resources for current rep)
|
||||
- ❌ Don't make perfect the enemy of shipped
|
||||
- ❌ Don't let them plan forever without starting
|
||||
- ❌ Don't accept vague goals ("learn X" → "ship Y by Z date")
|
||||
- ❌ Don't overwhelm with the full journey (focus on rep 1)
|
||||
|
||||
## Key Phrases to Use
|
||||
|
||||
- "What's the smallest version you could ship this week?"
|
||||
- "What do you need to learn JUST to do that?"
|
||||
- "This isn't about perfection - it's rep 1 of 100"
|
||||
- "Ship something real, then we'll improve it"
|
||||
- "Based on [content], what would you actually DO differently?"
|
||||
- "Learning = doing better, not knowing more"
|
||||
|
||||
## Example Output Structure
|
||||
|
||||
```markdown
|
||||
# Your Ship-Learn-Next Quest: [Title]
|
||||
|
||||
## Quest Overview
|
||||
**Goal**: [What they want to achieve in 4-8 weeks]
|
||||
**Source**: [The content that inspired this]
|
||||
**Core Lessons**: [3-5 key actionable takeaways from content]
|
||||
|
||||
---
|
||||
|
||||
## Rep 1: [Specific, Shippable Goal]
|
||||
|
||||
**Ship Goal**: [Concrete deliverable]
|
||||
**Timeline**: [This week / By [date]]
|
||||
**Success Criteria**:
|
||||
- [ ] [Specific thing 1]
|
||||
- [ ] [Specific thing 2]
|
||||
- [ ] [Specific thing 3]
|
||||
|
||||
**What You'll Practice** (from the content):
|
||||
- [Skill/concept 1 from source material]
|
||||
- [Skill/concept 2 from source material]
|
||||
|
||||
**Action Steps**:
|
||||
1. [Concrete step]
|
||||
2. [Concrete step]
|
||||
3. [Concrete step]
|
||||
4. Ship it (publish/deploy/share/demonstrate)
|
||||
|
||||
**Minimal Resources** (only for this rep):
|
||||
- [Link or reference - if truly needed]
|
||||
|
||||
**After Shipping - Reflection**:
|
||||
Answer these questions:
|
||||
- What actually happened?
|
||||
- What worked? What didn't?
|
||||
- What surprised you?
|
||||
- Rate this rep: _/10
|
||||
- What's one thing to try differently next time?
|
||||
|
||||
---
|
||||
|
||||
## Rep 2: [Next Iteration]
|
||||
|
||||
**Builds on**: Rep 1 + [what you learned]
|
||||
**New element**: [One new challenge/skill]
|
||||
**Ship goal**: [Next concrete deliverable]
|
||||
|
||||
[Similar structure...]
|
||||
|
||||
---
|
||||
|
||||
## Rep 3-5: Future Path
|
||||
|
||||
**Rep 3**: [Brief description]
|
||||
**Rep 4**: [Brief description]
|
||||
**Rep 5**: [Brief description]
|
||||
|
||||
*(Details will evolve based on what you learn in Reps 1-2)*
|
||||
|
||||
---
|
||||
|
||||
## Remember
|
||||
|
||||
- This is about DOING, not studying
|
||||
- Aim for 100 reps over time (not perfection on rep 1)
|
||||
- Each rep = Plan → Do → Reflect → Next
|
||||
- You learn by shipping, not by consuming
|
||||
|
||||
**Ready to ship Rep 1?**
|
||||
```
|
||||
|
||||
## Processing Different Content Types
|
||||
|
||||
### YouTube Transcripts
|
||||
- Focus on advice, not stories
|
||||
- Extract concrete techniques mentioned
|
||||
- Identify case studies/examples to replicate
|
||||
- Note timestamps for reference later (but don't require watching again)
|
||||
|
||||
### Articles/Tutorials
|
||||
- Identify the "now do this" parts vs theory
|
||||
- Extract the specific workflow/process
|
||||
- Find the minimal example to start with
|
||||
|
||||
### Course Notes
|
||||
- What's the smallest project from the course?
|
||||
- Which modules are needed for rep 1? (ignore the rest for now)
|
||||
- What can be practiced immediately?
|
||||
|
||||
## Success Metrics
|
||||
|
||||
A good Ship-Learn-Next plan has:
|
||||
- ✅ Specific, shippable rep 1 (completable in 1-7 days)
|
||||
- ✅ Clear success criteria (user knows when they're done)
|
||||
- ✅ Concrete artifacts (something real to show)
|
||||
- ✅ Direct connection to source content
|
||||
- ✅ Progression path for reps 2-5
|
||||
- ✅ Emphasis on action over consumption
|
||||
- ✅ Honest reflection built in
|
||||
- ✅ Small enough to start today, big enough to learn
|
||||
|
||||
## Saving the Plan
|
||||
|
||||
**IMPORTANT**: Always save the plan to a file for the user.
|
||||
|
||||
### Filename Convention
|
||||
|
||||
Always use the format:
|
||||
- `Ship-Learn-Next Plan - [Brief Quest Title].md`
|
||||
|
||||
Examples:
|
||||
- `Ship-Learn-Next Plan - Build in Proven Markets.md`
|
||||
- `Ship-Learn-Next Plan - Learn React.md`
|
||||
- `Ship-Learn-Next Plan - Cold Email Outreach.md`
|
||||
|
||||
**Quest title should be**:
|
||||
- Brief (3-6 words)
|
||||
- Descriptive of the main goal
|
||||
- Based on the content's core lesson/theme
|
||||
|
||||
### What to Save
|
||||
|
||||
**Complete plan including**:
|
||||
- Quest overview with goal and source
|
||||
- All reps (1-5) with full details
|
||||
- Action steps and reflection questions
|
||||
- Timeline commitments
|
||||
- Reference to source material
|
||||
|
||||
**Format**: Always save as Markdown (`.md`) for readability
|
||||
|
||||
## After Creating the Plan
|
||||
|
||||
**Display to user**:
|
||||
1. Show them you've saved the plan: "✓ Saved to: [filename]"
|
||||
2. Give a brief overview of the quest
|
||||
3. Highlight Rep 1 (what's due this week)
|
||||
|
||||
**Then ask**:
|
||||
1. "When will you ship Rep 1?"
|
||||
2. "What's the one thing that might stop you? How will you handle it?"
|
||||
3. "Come back after you ship and we'll reflect + plan Rep 2"
|
||||
|
||||
**Remember**: You're not creating a curriculum. You're helping them ship something real, learn from it, and ship the next thing.
|
||||
|
||||
Let's help them ship.
|
||||
485
skills/tapestry/SKILL.md
Normal file
485
skills/tapestry/SKILL.md
Normal file
@@ -0,0 +1,485 @@
|
||||
---
|
||||
name: tapestry
|
||||
description: Unified content extraction and action planning. Use when user says "tapestry <URL>", "weave <URL>", "help me plan <URL>", "extract and plan <URL>", "make this actionable <URL>", or similar phrases indicating they want to extract content and create an action plan. Automatically detects content type (YouTube video, article, PDF) and processes accordingly.
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
- Write
|
||||
---
|
||||
|
||||
# Tapestry: Unified Content Extraction + Action Planning
|
||||
|
||||
This is the **master skill** that orchestrates the entire Tapestry workflow:
|
||||
1. Detect content type from URL
|
||||
2. Extract content using appropriate skill
|
||||
3. Automatically create a Ship-Learn-Next action plan
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Activate when the user:
|
||||
- Says "tapestry [URL]"
|
||||
- Says "weave [URL]"
|
||||
- Says "help me plan [URL]"
|
||||
- Says "extract and plan [URL]"
|
||||
- Says "make this actionable [URL]"
|
||||
- Says "turn [URL] into a plan"
|
||||
- Provides a URL and asks to "learn and implement from this"
|
||||
- Wants the full Tapestry workflow (extract → plan)
|
||||
|
||||
**Keywords to watch for**: tapestry, weave, plan, actionable, extract and plan, make a plan, turn into action
|
||||
|
||||
## How It Works
|
||||
|
||||
### Complete Workflow:
|
||||
1. **Detect URL type** (YouTube, article, PDF)
|
||||
2. **Extract content** using appropriate skill:
|
||||
- YouTube → youtube-transcript skill
|
||||
- Article → article-extractor skill
|
||||
- PDF → download and extract text
|
||||
3. **Create action plan** using ship-learn-next skill
|
||||
4. **Save both** content file and plan file
|
||||
5. **Present summary** to user
|
||||
|
||||
## URL Detection Logic
|
||||
|
||||
### YouTube Videos
|
||||
|
||||
**Patterns to detect:**
|
||||
- `youtube.com/watch?v=`
|
||||
- `youtu.be/`
|
||||
- `youtube.com/shorts/`
|
||||
- `m.youtube.com/watch?v=`
|
||||
|
||||
**Action:** Use youtube-transcript skill
|
||||
|
||||
### Web Articles/Blog Posts
|
||||
|
||||
**Patterns to detect:**
|
||||
- `http://` or `https://`
|
||||
- NOT YouTube, NOT PDF
|
||||
- Common domains: medium.com, substack.com, dev.to, etc.
|
||||
- Any HTML page
|
||||
|
||||
**Action:** Use article-extractor skill
|
||||
|
||||
### PDF Documents
|
||||
|
||||
**Patterns to detect:**
|
||||
- URL ends with `.pdf`
|
||||
- URL returns `Content-Type: application/pdf`
|
||||
|
||||
**Action:** Download and extract text
|
||||
|
||||
### Other Content
|
||||
|
||||
**Fallback:**
|
||||
- Try article-extractor (works for most HTML)
|
||||
- If fails, inform user of unsupported type
|
||||
|
||||
## Step-by-Step Workflow
|
||||
|
||||
### Step 1: Detect Content Type
|
||||
|
||||
```bash
|
||||
URL="$1"
|
||||
|
||||
# Check for YouTube
|
||||
if [[ "$URL" =~ youtube\.com/watch || "$URL" =~ youtu\.be/ || "$URL" =~ youtube\.com/shorts ]]; then
|
||||
CONTENT_TYPE="youtube"
|
||||
|
||||
# Check for PDF
|
||||
elif [[ "$URL" =~ \.pdf$ ]]; then
|
||||
CONTENT_TYPE="pdf"
|
||||
|
||||
# Check if URL returns PDF
|
||||
elif curl -sI "$URL" | grep -i "Content-Type: application/pdf" > /dev/null; then
|
||||
CONTENT_TYPE="pdf"
|
||||
|
||||
# Default to article
|
||||
else
|
||||
CONTENT_TYPE="article"
|
||||
fi
|
||||
|
||||
echo "📍 Detected: $CONTENT_TYPE"
|
||||
```
|
||||
|
||||
### Step 2: Extract Content (by Type)
|
||||
|
||||
#### YouTube Video
|
||||
|
||||
```bash
|
||||
# Use youtube-transcript skill workflow
|
||||
echo "📺 Extracting YouTube transcript..."
|
||||
|
||||
# 1. Check for yt-dlp
|
||||
if ! command -v yt-dlp &> /dev/null; then
|
||||
echo "Installing yt-dlp..."
|
||||
brew install yt-dlp
|
||||
fi
|
||||
|
||||
# 2. Get video title
|
||||
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$URL" | tr '/' '_' | tr ':' '-' | tr '?' '' | tr '"' '')
|
||||
|
||||
# 3. Download transcript
|
||||
yt-dlp --write-auto-sub --skip-download --sub-langs en --output "temp_transcript" "$URL"
|
||||
|
||||
# 4. Convert to clean text (deduplicate)
|
||||
python3 -c "
|
||||
import sys, re
|
||||
seen = set()
|
||||
vtt_file = 'temp_transcript.en.vtt'
|
||||
try:
|
||||
with open(vtt_file, 'r') as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
|
||||
clean = re.sub('<[^>]*>', '', line)
|
||||
clean = clean.replace('&', '&').replace('>', '>').replace('<', '<')
|
||||
if clean and clean not in seen:
|
||||
print(clean)
|
||||
seen.add(clean)
|
||||
except FileNotFoundError:
|
||||
print('Error: Could not find transcript file', file=sys.stderr)
|
||||
sys.exit(1)
|
||||
" > "${VIDEO_TITLE}.txt"
|
||||
|
||||
# 5. Cleanup
|
||||
rm -f temp_transcript.en.vtt
|
||||
|
||||
CONTENT_FILE="${VIDEO_TITLE}.txt"
|
||||
echo "✓ Saved transcript: $CONTENT_FILE"
|
||||
```
|
||||
|
||||
#### Article/Blog Post
|
||||
|
||||
```bash
|
||||
# Use article-extractor skill workflow
|
||||
echo "📄 Extracting article content..."
|
||||
|
||||
# 1. Check for extraction tools
|
||||
if command -v reader &> /dev/null; then
|
||||
TOOL="reader"
|
||||
elif command -v trafilatura &> /dev/null; then
|
||||
TOOL="trafilatura"
|
||||
else
|
||||
TOOL="fallback"
|
||||
fi
|
||||
|
||||
echo "Using: $TOOL"
|
||||
|
||||
# 2. Extract based on tool
|
||||
case $TOOL in
|
||||
reader)
|
||||
reader "$URL" > temp_article.txt
|
||||
ARTICLE_TITLE=$(head -n 1 temp_article.txt | sed 's/^# //')
|
||||
;;
|
||||
|
||||
trafilatura)
|
||||
METADATA=$(trafilatura --URL "$URL" --json)
|
||||
ARTICLE_TITLE=$(echo "$METADATA" | python3 -c "import json, sys; print(json.load(sys.stdin).get('title', 'Article'))")
|
||||
trafilatura --URL "$URL" --output-format txt --no-comments > temp_article.txt
|
||||
;;
|
||||
|
||||
fallback)
|
||||
ARTICLE_TITLE=$(curl -s "$URL" | grep -oP '<title>\K[^<]+' | head -n 1)
|
||||
ARTICLE_TITLE=${ARTICLE_TITLE%% - *}
|
||||
curl -s "$URL" | python3 -c "
|
||||
from html.parser import HTMLParser
|
||||
import sys
|
||||
|
||||
class ArticleExtractor(HTMLParser):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.content = []
|
||||
self.skip_tags = {'script', 'style', 'nav', 'header', 'footer', 'aside', 'form'}
|
||||
self.in_content = False
|
||||
|
||||
def handle_starttag(self, tag, attrs):
|
||||
if tag not in self.skip_tags and tag in {'p', 'article', 'main'}:
|
||||
self.in_content = True
|
||||
|
||||
def handle_data(self, data):
|
||||
if self.in_content and data.strip():
|
||||
self.content.append(data.strip())
|
||||
|
||||
def get_content(self):
|
||||
return '\n\n'.join(self.content)
|
||||
|
||||
parser = ArticleExtractor()
|
||||
parser.feed(sys.stdin.read())
|
||||
print(parser.get_content())
|
||||
" > temp_article.txt
|
||||
;;
|
||||
esac
|
||||
|
||||
# 3. Clean filename
|
||||
FILENAME=$(echo "$ARTICLE_TITLE" | tr '/' '-' | tr ':' '-' | tr '?' '' | tr '"' '' | cut -c 1-80 | sed 's/ *$//')
|
||||
CONTENT_FILE="${FILENAME}.txt"
|
||||
mv temp_article.txt "$CONTENT_FILE"
|
||||
|
||||
echo "✓ Saved article: $CONTENT_FILE"
|
||||
```
|
||||
|
||||
#### PDF Document
|
||||
|
||||
```bash
|
||||
# Download and extract PDF
|
||||
echo "📑 Downloading PDF..."
|
||||
|
||||
# 1. Download PDF
|
||||
PDF_FILENAME=$(basename "$URL")
|
||||
curl -L -o "$PDF_FILENAME" "$URL"
|
||||
|
||||
# 2. Extract text using pdftotext (if available)
|
||||
if command -v pdftotext &> /dev/null; then
|
||||
pdftotext "$PDF_FILENAME" temp_pdf.txt
|
||||
CONTENT_FILE="${PDF_FILENAME%.pdf}.txt"
|
||||
mv temp_pdf.txt "$CONTENT_FILE"
|
||||
echo "✓ Extracted text from PDF: $CONTENT_FILE"
|
||||
|
||||
# Optionally keep PDF
|
||||
echo "Keep original PDF? (y/n)"
|
||||
read -r KEEP_PDF
|
||||
if [[ ! "$KEEP_PDF" =~ ^[Yy]$ ]]; then
|
||||
rm "$PDF_FILENAME"
|
||||
fi
|
||||
else
|
||||
# No pdftotext available
|
||||
echo "⚠️ pdftotext not found. PDF downloaded but not extracted."
|
||||
echo " Install with: brew install poppler"
|
||||
CONTENT_FILE="$PDF_FILENAME"
|
||||
fi
|
||||
```
|
||||
|
||||
### Step 3: Create Ship-Learn-Next Action Plan
|
||||
|
||||
**IMPORTANT**: Always create an action plan after extracting content.
|
||||
|
||||
```bash
|
||||
# Read the extracted content
|
||||
CONTENT_FILE="[from previous step]"
|
||||
|
||||
# Invoke ship-learn-next skill logic:
|
||||
# 1. Read the content file
|
||||
# 2. Extract core actionable lessons
|
||||
# 3. Create 5-rep progression plan
|
||||
# 4. Save as: Ship-Learn-Next Plan - [Quest Title].md
|
||||
|
||||
# See ship-learn-next/SKILL.md for full details
|
||||
```
|
||||
|
||||
**Key points for plan creation:**
|
||||
- Extract actionable lessons (not just summaries)
|
||||
- Define a specific 4-8 week quest
|
||||
- Create Rep 1 (shippable this week)
|
||||
- Design Reps 2-5 (progressive iterations)
|
||||
- Save plan to markdown file
|
||||
- Use format: `Ship-Learn-Next Plan - [Brief Quest Title].md`
|
||||
|
||||
### Step 4: Present Results
|
||||
|
||||
Show user:
|
||||
```
|
||||
✅ Tapestry Workflow Complete!
|
||||
|
||||
📥 Content Extracted:
|
||||
✓ [Content type]: [Title]
|
||||
✓ Saved to: [filename.txt]
|
||||
✓ [X] words extracted
|
||||
|
||||
📋 Action Plan Created:
|
||||
✓ Quest: [Quest title]
|
||||
✓ Saved to: Ship-Learn-Next Plan - [Title].md
|
||||
|
||||
🎯 Your Quest: [One-line summary]
|
||||
|
||||
📍 Rep 1 (This Week): [Rep 1 goal]
|
||||
|
||||
When will you ship Rep 1?
|
||||
```
|
||||
|
||||
## Complete Tapestry Workflow Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
|
||||
# Tapestry: Extract content + create action plan
|
||||
# Usage: tapestry <URL>
|
||||
|
||||
URL="$1"
|
||||
|
||||
if [ -z "$URL" ]; then
|
||||
echo "Usage: tapestry <URL>"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "🧵 Tapestry Workflow Starting..."
|
||||
echo "URL: $URL"
|
||||
echo ""
|
||||
|
||||
# Step 1: Detect content type
|
||||
if [[ "$URL" =~ youtube\.com/watch || "$URL" =~ youtu\.be/ || "$URL" =~ youtube\.com/shorts ]]; then
|
||||
CONTENT_TYPE="youtube"
|
||||
elif [[ "$URL" =~ \.pdf$ ]] || curl -sI "$URL" | grep -iq "Content-Type: application/pdf"; then
|
||||
CONTENT_TYPE="pdf"
|
||||
else
|
||||
CONTENT_TYPE="article"
|
||||
fi
|
||||
|
||||
echo "📍 Detected: $CONTENT_TYPE"
|
||||
echo ""
|
||||
|
||||
# Step 2: Extract content
|
||||
case $CONTENT_TYPE in
|
||||
youtube)
|
||||
echo "📺 Extracting YouTube transcript..."
|
||||
# [YouTube extraction code from above]
|
||||
;;
|
||||
|
||||
article)
|
||||
echo "📄 Extracting article..."
|
||||
# [Article extraction code from above]
|
||||
;;
|
||||
|
||||
pdf)
|
||||
echo "📑 Downloading PDF..."
|
||||
# [PDF extraction code from above]
|
||||
;;
|
||||
esac
|
||||
|
||||
echo ""
|
||||
|
||||
# Step 3: Create action plan
|
||||
echo "🚀 Creating Ship-Learn-Next action plan..."
|
||||
# [Plan creation using ship-learn-next skill]
|
||||
|
||||
echo ""
|
||||
echo "✅ Tapestry Workflow Complete!"
|
||||
echo ""
|
||||
echo "📥 Content: $CONTENT_FILE"
|
||||
echo "📋 Plan: Ship-Learn-Next Plan - [title].md"
|
||||
echo ""
|
||||
echo "🎯 Next: Review your action plan and ship Rep 1!"
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Common Issues:
|
||||
|
||||
**1. Unsupported URL type**
|
||||
- Try article extraction as fallback
|
||||
- If fails: "Could not extract content from this URL type"
|
||||
|
||||
**2. No content extracted**
|
||||
- Check if URL is accessible
|
||||
- Try alternate extraction method
|
||||
- Inform user: "Extraction failed. URL may require authentication."
|
||||
|
||||
**3. Tools not installed**
|
||||
- Auto-install when possible (yt-dlp, reader, trafilatura)
|
||||
- Provide install instructions if auto-install fails
|
||||
- Use fallback methods when available
|
||||
|
||||
**4. Empty or invalid content**
|
||||
- Verify file has content before creating plan
|
||||
- Don't create plan if extraction failed
|
||||
- Show preview to user before planning
|
||||
|
||||
## Best Practices
|
||||
|
||||
- ✅ Always show what was detected ("📍 Detected: youtube")
|
||||
- ✅ Display progress for each step
|
||||
- ✅ Save both content file AND plan file
|
||||
- ✅ Show preview of extracted content (first 10 lines)
|
||||
- ✅ Create plan automatically (don't ask)
|
||||
- ✅ Present clear summary at end
|
||||
- ✅ Ask commitment question: "When will you ship Rep 1?"
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Example 1: YouTube Video (using "tapestry")
|
||||
|
||||
```
|
||||
User: tapestry https://www.youtube.com/watch?v=dQw4w9WgXcQ
|
||||
|
||||
Claude:
|
||||
🧵 Tapestry Workflow Starting...
|
||||
📍 Detected: youtube
|
||||
📺 Extracting YouTube transcript...
|
||||
✓ Saved transcript: Never Gonna Give You Up.txt
|
||||
|
||||
🚀 Creating action plan...
|
||||
✓ Quest: Master Video Production
|
||||
✓ Saved plan: Ship-Learn-Next Plan - Master Video Production.md
|
||||
|
||||
✅ Complete! When will you ship Rep 1?
|
||||
```
|
||||
|
||||
### Example 2: Article (using "weave")
|
||||
|
||||
```
|
||||
User: weave https://example.com/how-to-build-saas
|
||||
|
||||
Claude:
|
||||
🧵 Tapestry Workflow Starting...
|
||||
📍 Detected: article
|
||||
📄 Extracting article...
|
||||
✓ Using reader (Mozilla Readability)
|
||||
✓ Saved article: How to Build a SaaS.txt
|
||||
|
||||
🚀 Creating action plan...
|
||||
✓ Quest: Build a SaaS MVP
|
||||
✓ Saved plan: Ship-Learn-Next Plan - Build a SaaS MVP.md
|
||||
|
||||
✅ Complete! When will you ship Rep 1?
|
||||
```
|
||||
|
||||
### Example 3: PDF (using "help me plan")
|
||||
|
||||
```
|
||||
User: help me plan https://example.com/research-paper.pdf
|
||||
|
||||
Claude:
|
||||
🧵 Tapestry Workflow Starting...
|
||||
📍 Detected: pdf
|
||||
📑 Downloading PDF...
|
||||
✓ Downloaded: research-paper.pdf
|
||||
✓ Extracted text: research-paper.txt
|
||||
|
||||
🚀 Creating action plan...
|
||||
✓ Quest: Apply Research Findings
|
||||
✓ Saved plan: Ship-Learn-Next Plan - Apply Research Findings.md
|
||||
|
||||
✅ Complete! When will you ship Rep 1?
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
|
||||
This skill orchestrates the other skills, so requires:
|
||||
|
||||
**For YouTube:**
|
||||
- yt-dlp (auto-installed)
|
||||
- Python 3 (for deduplication)
|
||||
|
||||
**For Articles:**
|
||||
- reader (npm) OR trafilatura (pip)
|
||||
- Falls back to basic curl if neither available
|
||||
|
||||
**For PDFs:**
|
||||
- curl (built-in)
|
||||
- pdftotext (optional - from poppler package)
|
||||
- Install: `brew install poppler` (macOS)
|
||||
- Install: `apt install poppler-utils` (Linux)
|
||||
|
||||
**For Planning:**
|
||||
- No additional requirements (uses built-in tools)
|
||||
|
||||
## Philosophy
|
||||
|
||||
**Tapestry weaves learning content into action.**
|
||||
|
||||
The unified workflow ensures you never just consume content - you always create an implementation plan. This transforms passive learning into active building.
|
||||
|
||||
Extract → Plan → Ship → Learn → Next.
|
||||
|
||||
That's the Tapestry way.
|
||||
418
skills/youtube-transcript/SKILL.md
Normal file
418
skills/youtube-transcript/SKILL.md
Normal file
@@ -0,0 +1,418 @@
|
||||
---
|
||||
name: youtube-transcript
|
||||
description: Download YouTube video transcripts when user provides a YouTube URL or asks to download/get/fetch a transcript from YouTube. Also use when user wants to transcribe or get captions/subtitles from a YouTube video.
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
- Write
|
||||
---
|
||||
|
||||
# YouTube Transcript Downloader
|
||||
|
||||
This skill helps download transcripts (subtitles/captions) from YouTube videos using yt-dlp.
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Activate this skill when the user:
|
||||
- Provides a YouTube URL and wants the transcript
|
||||
- Asks to "download transcript from YouTube"
|
||||
- Wants to "get captions" or "get subtitles" from a video
|
||||
- Asks to "transcribe a YouTube video"
|
||||
- Needs text content from a YouTube video
|
||||
|
||||
## How It Works
|
||||
|
||||
### Priority Order:
|
||||
1. **Check if yt-dlp is installed** - install if needed
|
||||
2. **List available subtitles** - see what's actually available
|
||||
3. **Try manual subtitles first** (`--write-sub`) - highest quality
|
||||
4. **Fallback to auto-generated** (`--write-auto-sub`) - usually available
|
||||
5. **Last resort: Whisper transcription** - if no subtitles exist (requires user confirmation)
|
||||
6. **Confirm the download** and show the user where the file is saved
|
||||
7. **Optionally clean up** the VTT format if the user wants plain text
|
||||
|
||||
## Installation Check
|
||||
|
||||
**IMPORTANT**: Always check if yt-dlp is installed first:
|
||||
|
||||
```bash
|
||||
which yt-dlp || command -v yt-dlp
|
||||
```
|
||||
|
||||
### If Not Installed
|
||||
|
||||
Attempt automatic installation based on the system:
|
||||
|
||||
**macOS (Homebrew)**:
|
||||
```bash
|
||||
brew install yt-dlp
|
||||
```
|
||||
|
||||
**Linux (apt/Debian/Ubuntu)**:
|
||||
```bash
|
||||
sudo apt update && sudo apt install -y yt-dlp
|
||||
```
|
||||
|
||||
**Alternative (pip - works on all systems)**:
|
||||
```bash
|
||||
pip3 install yt-dlp
|
||||
# or
|
||||
python3 -m pip install yt-dlp
|
||||
```
|
||||
|
||||
**If installation fails**: Inform the user they need to install yt-dlp manually and provide them with installation instructions from https://github.com/yt-dlp/yt-dlp#installation
|
||||
|
||||
## Check Available Subtitles
|
||||
|
||||
**ALWAYS do this first** before attempting to download:
|
||||
|
||||
```bash
|
||||
yt-dlp --list-subs "YOUTUBE_URL"
|
||||
```
|
||||
|
||||
This shows what subtitle types are available without downloading anything. Look for:
|
||||
- Manual subtitles (better quality)
|
||||
- Auto-generated subtitles (usually available)
|
||||
- Available languages
|
||||
|
||||
## Download Strategy
|
||||
|
||||
### Option 1: Manual Subtitles (Preferred)
|
||||
|
||||
Try this first - highest quality, human-created:
|
||||
|
||||
```bash
|
||||
yt-dlp --write-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
|
||||
```
|
||||
|
||||
### Option 2: Auto-Generated Subtitles (Fallback)
|
||||
|
||||
If manual subtitles aren't available:
|
||||
|
||||
```bash
|
||||
yt-dlp --write-auto-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
|
||||
```
|
||||
|
||||
Both commands create a `.vtt` file (WebVTT subtitle format).
|
||||
|
||||
## Option 3: Whisper Transcription (Last Resort)
|
||||
|
||||
**ONLY use this if both manual and auto-generated subtitles are unavailable.**
|
||||
|
||||
### Step 1: Show File Size and Ask for Confirmation
|
||||
|
||||
```bash
|
||||
# Get audio file size estimate
|
||||
yt-dlp --print "%(filesize,filesize_approx)s" -f "bestaudio" "YOUTUBE_URL"
|
||||
|
||||
# Or get duration to estimate
|
||||
yt-dlp --print "%(duration)s %(title)s" "YOUTUBE_URL"
|
||||
```
|
||||
|
||||
**IMPORTANT**: Display the file size to the user and ask: "No subtitles are available. I can download the audio (approximately X MB) and transcribe it using Whisper. Would you like to proceed?"
|
||||
|
||||
**Wait for user confirmation before continuing.**
|
||||
|
||||
### Step 2: Check for Whisper Installation
|
||||
|
||||
```bash
|
||||
command -v whisper
|
||||
```
|
||||
|
||||
If not installed, ask user: "Whisper is not installed. Install it with `pip install openai-whisper` (requires ~1-3GB for models)? This is a one-time installation."
|
||||
|
||||
**Wait for user confirmation before installing.**
|
||||
|
||||
Install if approved:
|
||||
```bash
|
||||
pip3 install openai-whisper
|
||||
```
|
||||
|
||||
### Step 3: Download Audio Only
|
||||
|
||||
```bash
|
||||
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "YOUTUBE_URL"
|
||||
```
|
||||
|
||||
### Step 4: Transcribe with Whisper
|
||||
|
||||
```bash
|
||||
# Auto-detect language (recommended)
|
||||
whisper audio_VIDEO_ID.mp3 --model base --output_format vtt
|
||||
|
||||
# Or specify language if known
|
||||
whisper audio_VIDEO_ID.mp3 --model base --language en --output_format vtt
|
||||
```
|
||||
|
||||
**Model Options** (stick to `base` for now):
|
||||
- `tiny` - fastest, least accurate (~1GB)
|
||||
- `base` - good balance (~1GB) ← **USE THIS**
|
||||
- `small` - better accuracy (~2GB)
|
||||
- `medium` - very good (~5GB)
|
||||
- `large` - best accuracy (~10GB)
|
||||
|
||||
### Step 5: Cleanup
|
||||
|
||||
After transcription completes, ask user: "Transcription complete! Would you like me to delete the audio file to save space?"
|
||||
|
||||
If yes:
|
||||
```bash
|
||||
rm audio_VIDEO_ID.mp3
|
||||
```
|
||||
|
||||
## Getting Video Information
|
||||
|
||||
### Extract Video Title (for filename)
|
||||
|
||||
```bash
|
||||
yt-dlp --print "%(title)s" "YOUTUBE_URL"
|
||||
```
|
||||
|
||||
Use this to create meaningful filenames based on the video title. Clean the title for filesystem compatibility:
|
||||
- Replace `/` with `-`
|
||||
- Replace special characters that might cause issues
|
||||
- Consider using sanitized version: `$(yt-dlp --print "%(title)s" "URL" | tr '/' '-' | tr ':' '-')`
|
||||
|
||||
## Post-Processing
|
||||
|
||||
### Convert to Plain Text (Recommended)
|
||||
|
||||
YouTube's auto-generated VTT files contain **duplicate lines** because captions are shown progressively with overlapping timestamps. Always deduplicate when converting to plain text while preserving the original speaking order.
|
||||
|
||||
```bash
|
||||
python3 -c "
|
||||
import sys, re
|
||||
seen = set()
|
||||
with open('transcript.en.vtt', 'r') as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
|
||||
clean = re.sub('<[^>]*>', '', line)
|
||||
clean = clean.replace('&', '&').replace('>', '>').replace('<', '<')
|
||||
if clean and clean not in seen:
|
||||
print(clean)
|
||||
seen.add(clean)
|
||||
" > transcript.txt
|
||||
```
|
||||
|
||||
### Complete Post-Processing with Video Title
|
||||
|
||||
```bash
|
||||
# Get video title
|
||||
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "YOUTUBE_URL" | tr '/' '_' | tr ':' '-' | tr '?' '' | tr '"' '')
|
||||
|
||||
# Find the VTT file
|
||||
VTT_FILE=$(ls *.vtt | head -n 1)
|
||||
|
||||
# Convert with deduplication
|
||||
python3 -c "
|
||||
import sys, re
|
||||
seen = set()
|
||||
with open('$VTT_FILE', 'r') as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
|
||||
clean = re.sub('<[^>]*>', '', line)
|
||||
clean = clean.replace('&', '&').replace('>', '>').replace('<', '<')
|
||||
if clean and clean not in seen:
|
||||
print(clean)
|
||||
seen.add(clean)
|
||||
" > "${VIDEO_TITLE}.txt"
|
||||
|
||||
echo "✓ Saved to: ${VIDEO_TITLE}.txt"
|
||||
|
||||
# Clean up VTT file
|
||||
rm "$VTT_FILE"
|
||||
echo "✓ Cleaned up temporary VTT file"
|
||||
```
|
||||
|
||||
## Output Formats
|
||||
|
||||
- **VTT format** (`.vtt`): Includes timestamps and formatting, good for video players
|
||||
- **Plain text** (`.txt`): Just the text content, good for reading or analysis
|
||||
|
||||
## Tips
|
||||
|
||||
- The filename will be `{output_name}.{language_code}.vtt` (e.g., `transcript.en.vtt`)
|
||||
- Most YouTube videos have auto-generated English subtitles
|
||||
- Some videos may have multiple language options
|
||||
- If auto-subtitles aren't available, try `--write-sub` instead for manual subtitles
|
||||
|
||||
## Complete Workflow Example
|
||||
|
||||
```bash
|
||||
VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ"
|
||||
|
||||
# Get video title for filename
|
||||
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/' '_' | tr ':' '-' | tr '?' '' | tr '"' '')
|
||||
OUTPUT_NAME="transcript_temp"
|
||||
|
||||
# ============================================
|
||||
# STEP 1: Check if yt-dlp is installed
|
||||
# ============================================
|
||||
if ! command -v yt-dlp &> /dev/null; then
|
||||
echo "yt-dlp not found, attempting to install..."
|
||||
if command -v brew &> /dev/null; then
|
||||
brew install yt-dlp
|
||||
elif command -v apt &> /dev/null; then
|
||||
sudo apt update && sudo apt install -y yt-dlp
|
||||
else
|
||||
pip3 install yt-dlp
|
||||
fi
|
||||
fi
|
||||
|
||||
# ============================================
|
||||
# STEP 2: List available subtitles
|
||||
# ============================================
|
||||
echo "Checking available subtitles..."
|
||||
yt-dlp --list-subs "$VIDEO_URL"
|
||||
|
||||
# ============================================
|
||||
# STEP 3: Try manual subtitles first
|
||||
# ============================================
|
||||
echo "Attempting to download manual subtitles..."
|
||||
if yt-dlp --write-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null; then
|
||||
echo "✓ Manual subtitles downloaded successfully!"
|
||||
ls -lh ${OUTPUT_NAME}.*
|
||||
else
|
||||
# ============================================
|
||||
# STEP 4: Fallback to auto-generated
|
||||
# ============================================
|
||||
echo "Manual subtitles not available. Trying auto-generated..."
|
||||
if yt-dlp --write-auto-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null; then
|
||||
echo "✓ Auto-generated subtitles downloaded successfully!"
|
||||
ls -lh ${OUTPUT_NAME}.*
|
||||
else
|
||||
# ============================================
|
||||
# STEP 5: Last resort - Whisper transcription
|
||||
# ============================================
|
||||
echo "⚠ No subtitles available for this video."
|
||||
|
||||
# Get file size
|
||||
FILE_SIZE=$(yt-dlp --print "%(filesize_approx)s" -f "bestaudio" "$VIDEO_URL")
|
||||
DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
|
||||
TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL")
|
||||
|
||||
echo "Video: $TITLE"
|
||||
echo "Duration: $((DURATION / 60)) minutes"
|
||||
echo "Audio size: ~$((FILE_SIZE / 1024 / 1024)) MB"
|
||||
echo ""
|
||||
echo "Would you like to download and transcribe with Whisper? (y/n)"
|
||||
read -r RESPONSE
|
||||
|
||||
if [[ "$RESPONSE" =~ ^[Yy]$ ]]; then
|
||||
# Check for Whisper
|
||||
if ! command -v whisper &> /dev/null; then
|
||||
echo "Whisper not installed. Install now? (requires ~1-3GB) (y/n)"
|
||||
read -r INSTALL_RESPONSE
|
||||
if [[ "$INSTALL_RESPONSE" =~ ^[Yy]$ ]]; then
|
||||
pip3 install openai-whisper
|
||||
else
|
||||
echo "Cannot proceed without Whisper. Exiting."
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# Download audio
|
||||
echo "Downloading audio..."
|
||||
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "$VIDEO_URL"
|
||||
|
||||
# Get the actual audio filename
|
||||
AUDIO_FILE=$(ls audio_*.mp3 | head -n 1)
|
||||
|
||||
# Transcribe
|
||||
echo "Transcribing with Whisper (this may take a few minutes)..."
|
||||
whisper "$AUDIO_FILE" --model base --output_format vtt
|
||||
|
||||
# Cleanup
|
||||
echo "Transcription complete! Delete audio file? (y/n)"
|
||||
read -r CLEANUP_RESPONSE
|
||||
if [[ "$CLEANUP_RESPONSE" =~ ^[Yy]$ ]]; then
|
||||
rm "$AUDIO_FILE"
|
||||
echo "Audio file deleted."
|
||||
fi
|
||||
|
||||
ls -lh *.vtt
|
||||
else
|
||||
echo "Transcription cancelled."
|
||||
exit 0
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# ============================================
|
||||
# STEP 6: Convert to readable plain text with deduplication
|
||||
# ============================================
|
||||
VTT_FILE=$(ls ${OUTPUT_NAME}*.vtt 2>/dev/null || ls *.vtt | head -n 1)
|
||||
if [ -f "$VTT_FILE" ]; then
|
||||
echo "Converting to readable format and removing duplicates..."
|
||||
python3 -c "
|
||||
import sys, re
|
||||
seen = set()
|
||||
with open('$VTT_FILE', 'r') as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('WEBVTT') and not line.startswith('Kind:') and not line.startswith('Language:') and '-->' not in line:
|
||||
clean = re.sub('<[^>]*>', '', line)
|
||||
clean = clean.replace('&', '&').replace('>', '>').replace('<', '<')
|
||||
if clean and clean not in seen:
|
||||
print(clean)
|
||||
seen.add(clean)
|
||||
" > "${VIDEO_TITLE}.txt"
|
||||
echo "✓ Saved to: ${VIDEO_TITLE}.txt"
|
||||
|
||||
# Clean up temporary VTT file
|
||||
rm "$VTT_FILE"
|
||||
echo "✓ Cleaned up temporary VTT file"
|
||||
else
|
||||
echo "⚠ No VTT file found to convert"
|
||||
fi
|
||||
|
||||
echo "✓ Complete!"
|
||||
```
|
||||
|
||||
**Note**: This complete workflow handles all scenarios with proper error checking and user prompts at each decision point.
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Common Issues and Solutions:
|
||||
|
||||
**1. yt-dlp not installed**
|
||||
- Attempt automatic installation based on system (Homebrew/apt/pip)
|
||||
- If installation fails, provide manual installation link
|
||||
- Verify installation before proceeding
|
||||
|
||||
**2. No subtitles available**
|
||||
- List available subtitles first to confirm
|
||||
- Try both `--write-sub` and `--write-auto-sub`
|
||||
- If both fail, offer Whisper transcription option
|
||||
- Show file size and ask for user confirmation before downloading audio
|
||||
|
||||
**3. Invalid or private video**
|
||||
- Check if URL is correct format: `https://www.youtube.com/watch?v=VIDEO_ID`
|
||||
- Some videos may be private, age-restricted, or geo-blocked
|
||||
- Inform user of the specific error from yt-dlp
|
||||
|
||||
**4. Whisper installation fails**
|
||||
- May require system dependencies (ffmpeg, rust)
|
||||
- Provide fallback: "Install manually with: `pip3 install openai-whisper`"
|
||||
- Check available disk space (models require 1-10GB depending on size)
|
||||
|
||||
**5. Download interrupted or failed**
|
||||
- Check internet connection
|
||||
- Verify sufficient disk space
|
||||
- Try again with `--no-check-certificate` if SSL issues occur
|
||||
|
||||
**6. Multiple subtitle languages**
|
||||
- By default, yt-dlp downloads all available languages
|
||||
- Can specify with `--sub-langs en` for English only
|
||||
- List available with `--list-subs` first
|
||||
|
||||
### Best Practices:
|
||||
|
||||
- ✅ Always check what's available before attempting download (`--list-subs`)
|
||||
- ✅ Verify success at each step before proceeding to next
|
||||
- ✅ Ask user before large downloads (audio files, Whisper models)
|
||||
- ✅ Clean up temporary files after processing
|
||||
- ✅ Provide clear feedback about what's happening at each stage
|
||||
- ✅ Handle errors gracefully with helpful messages
|
||||
Reference in New Issue
Block a user