Initial commit
98
skills/parse-video/SKILL.md
Normal file
@@ -0,0 +1,98 @@
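---
name: parse-video
description: Parse video metadata and download videos via MCP, driven by todolist.md. Reads the task list, calls the MCP service to download each video and its cover, saves the metadata JSON, and updates task completion status.
---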

# Parse Video - Video Parsing and Download

## Overview

Parse and download videos according to todolist.md. Calls the MCP service to fetch metadata and download the video and cover.

**Prerequisites**:
- The MCP `parse-video` service is configured and running
- The user has provided todolist.md

## Workflow

### 1. Read todolist.md

Read the pending tasks from todolist.md:

## 6VbNVltFQRX (http://xhslink.com/o/6VbNVltFQRX)

- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.json
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.mp4
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX_cover.jpg
...

Extract: VideoId, original URL, output paths.
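
The extraction step can be sketched in Python. This is a minimal sketch, not part of the skill: `VideoTask` and `parse_todolist` are hypothetical names, and the line shapes are assumed from the example above (`## <VideoId> (<url>)` headings followed by `- [ ]` / `- [x]` items).

```python
import re
from dataclasses import dataclass, field

# Assumed line shapes, taken from the todolist.md example:
#   "## <VideoId> (<url>)"  and  "- [ ] <path>" / "- [x] <path>"
HEADING_RE = re.compile(r'^##\s+(\S+)\s+\((\S+)\)\s*$')
ITEM_RE = re.compile(r'^-\s+\[([ xX])\]\s+(\S+)')


@dataclass
class VideoTask:
    video_id: str
    original_url: str
    # Maps each output path to True (checked) or False (pending).
    outputs: dict = field(default_factory=dict)


def parse_todolist(text: str) -> list:
    """Parse todolist.md text into a list of VideoTask objects."""
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        heading = HEADING_RE.match(line)
        if heading:
            tasks.append(VideoTask(heading.group(1), heading.group(2)))
            continue
        item = ITEM_RE.match(line)
        if item and tasks:
            # Attach the item to the most recent video section.
            tasks[-1].outputs[item.group(2)] = item.group(1).lower() == 'x'
    return tasks
```

Trailing comments on an item line (such as `# rip-video`) are ignored because only the first whitespace-free token after the checkbox is taken as the path.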

### 2. Call MCP to parse metadata

```python
# Call the MCP parse tool
metadata = mcp_parse_video(original_url)

# Enrich the metadata
metadata['video_id'] = video_id
metadata['original_url'] = original_url

# Save to the path listed in todolist.md
save_json(metadata, f"orgin/{video_id}/{video_id}.json")
```
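
The snippet above calls `save_json` without defining it. A minimal sketch of such a helper, assuming it only needs to create the target directory and write UTF-8 JSON (the actual helper used by the skill is not shown in this commit):

```python
import json
from pathlib import Path


def save_json(data: dict, path: str) -> None:
    """Write data as UTF-8 JSON, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII titles readable in the file.
    target.write_text(
        json.dumps(data, ensure_ascii=False, indent=2),
        encoding='utf-8',
    )
```

Creating the directory here means the metadata file can be written before any download has populated `orgin/{video_id}/`.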

### 3. Download the video and cover

```python
# Download via MCP
mcp_download_video(
    video_id=video_id,
    video_url=metadata['videoUrls'],
    output_dir=f"orgin/{video_id}/"
)
```

Files are saved as:
- `orgin/{VideoId}/{VideoId}.mp4`
- `orgin/{VideoId}/{VideoId}_cover.jpg` (if a cover is available)

### 4. Update todolist.md

Mark the tasks once the downloads finish:

```markdown
## 6VbNVltFQRX (http://xhslink.com/o/6VbNVltFQRX)
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.json
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.mp4
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX_cover.jpg
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.mp3  # handled by rip-video
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.srt  # handled by rip-video
```
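
Ticking a checkbox can be done with a single regular-expression substitution. The following is a sketch under the assumption that each item line has the form `- [ ] <path>`; `mark_done` is a hypothetical helper name, not something defined by the skill.

```python
import re


def mark_done(todolist_text: str, path: str) -> str:
    """Return todolist text with the checkbox for `path` switched to [x]."""
    # Match "- [ ] <path>" followed by whitespace or end of line, so that
    # "x.mp4" does not also match a longer path such as "x.mp4.part".
    pattern = re.compile(
        r'^(-\s+\[) (\]\s+' + re.escape(path) + r')(?=\s|$)',
        flags=re.MULTILINE,
    )
    return pattern.sub(r'\1x\2', todolist_text)
```

Items that are already checked do not match the pattern, so calling the function twice for the same path leaves the text unchanged.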

### 5. Output report

```
============================================================
Video download complete!
============================================================
Succeeded: {success}/{total}
Failed: {failed}/{total}
Output directory: {outputDir}/orgin/
Remaining parse quota: {leftTimes}

Next step: use rip-video to extract audio and subtitles
============================================================
```
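
## Error Handling

- **MCP service unavailable**: prompt the user to check the service status
- **Rate limited**: show the remaining quota, wait, then retry
- **Parse failure**: skip the video, record it in the report, and leave todolist.md unchanged
- **Download failure**: log the error and mark the item as failed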
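
The "wait, then retry" behaviour for rate limits could look like the sketch below. It is illustrative only: `RateLimitError` and `call_with_retry` are hypothetical names, and how the real MCP client reports a rate limit is not specified in this skill.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error type; the real MCP client may signal limits differently."""


def call_with_retry(func, *args, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(*args), waiting and retrying when RateLimitError is raised."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up; the caller records the failure in the report
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.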

## Integration

**Upstream**: plan-video generates todolist.md
**Downstream**: rip-video extracts audio and subtitles from the downloaded MP4
105
skills/plan-video/SKILL.md
Normal file
@@ -0,0 +1,105 @@
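---
name: plan-video
description: Video processing task planner. Extracts video URLs from user input, generates a unique VideoId for each, and creates a structured todolist.md that tracks the files to be generated. Supports 11+ platforms including Xiaohongshu, Douyin, TikTok, Bilibili, YouTube, and Kuaishou.
---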
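
# Plan Video - Video Task Planning

## Overview

Create a structured task list for batch video processing. Extract URLs → generate VideoIds → create todolist.md to track the files.

## When to Use

- The user provides video URLs that need batch processing
- The task structure needs to be planned before downloading or processing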

## Workflow

### 1. Extract video URLs

Extract every video link from the user input (message or file).

```python
from scripts.extract_video_id import extract_urls
urls = extract_urls(user_input_text)
```

**Supported platforms**: Xiaohongshu, Douyin, TikTok, Bilibili, YouTube, Kuaishou, and others

### 2. Extract the VideoId

Extract a unique identifier for each URL.

```python
from scripts.extract_video_id import extract_video_id
video_id = extract_video_id(url)
# 'http://xhslink.com/o/6VbNVltFQRX' → '6VbNVltFQRX'
```

### 3. Generate todolist.md

Create the task list in the output directory, listing every file to be generated.

**Format**:
```markdown
# Video Processing Tasks

## {VideoId} ({original short-link URL})

- [ ] orgin/{VideoId}/{VideoId}.json       # metadata, produced by skill `parse-video`
- [ ] orgin/{VideoId}/{VideoId}.mp4        # video, produced by skill `parse-video`
- [ ] orgin/{VideoId}/{VideoId}_cover.jpg  # cover, produced by skill `rip-video`
- [ ] orgin/{VideoId}/{VideoId}.mp3        # audio, produced by skill `rip-video`
- [ ] orgin/{VideoId}/{VideoId}.srt        # subtitles, produced by skill `rip-video`
```

**Example**:
```markdown
## 6VbNVltFQRX (http://xhslink.com/o/6VbNVltFQRX)

- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.json
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.mp4
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX_cover.jpg
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.mp3
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.srt
```
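
Generating this file from the extracted pairs is mostly string formatting. A minimal sketch, assuming the five file suffixes shown in the format above; `render_todolist` is a hypothetical name and is not part of `scripts/extract_video_id.py`.

```python
def render_todolist(videos):
    """Render todolist.md text from (video_id, url) pairs."""
    suffixes = ['.json', '.mp4', '_cover.jpg', '.mp3', '.srt']
    lines = ['# Video Processing Tasks', '']
    for video_id, url in videos:
        lines.append(f'## {video_id} ({url})')
        lines.append('')
        for suffix in suffixes:
            lines.append(f'- [ ] orgin/{video_id}/{video_id}{suffix}')
        lines.append('')  # blank line between video sections
    return '\n'.join(lines)
```

Each video gets its own `##` section, so later skills can locate a video by its heading and tick off individual files.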

### 4. Output report

```markdown
============================================================
Video processing plan created!
============================================================
Videos found: {total}
Output directory: {outputDir}/orgin/
Task list: {outputDir}/todolist.md

You can now use parse-video and rip-video to download and process the videos.
============================================================
```
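
## Error Handling

- **No URLs found**: tell the user that no valid video link was detected
- **VideoId extraction failed**: skip the URL and note it in the report
- **Output directory does not exist**: create the directory structure automatically
- **todolist.md already exists**: ask the user whether to overwrite or append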
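
The skill does not define what "append" means when todolist.md already exists. One plausible reading, sketched here as an assumption, is to add sections only for VideoIds that are not yet present; `new_videos_only` is a hypothetical helper name.

```python
import re


def new_videos_only(existing_text, videos):
    """Return the (video_id, url) pairs whose VideoId has no section yet."""
    # Section headings look like "## <VideoId> (<url>)".
    existing_ids = set(
        re.findall(r'^##\s+(\S+)', existing_text, flags=re.MULTILINE)
    )
    return [(vid, url) for vid, url in videos if vid not in existing_ids]
```

Filtering before appending avoids duplicate sections and keeps checkboxes that earlier runs have already ticked.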

## Resources

### scripts/extract_video_id.py

VideoId extraction utility.

**Functions**:
- `extract_video_id(url: str) -> str`: extract the VideoId from a single URL
- `extract_urls(text: str) -> list[str]`: extract all video URLs from a piece of text

## Integration

This skill lays the groundwork for the later steps:
1. **parse-video**: uses the VideoIds and URLs in todolist.md to parse metadata and download videos
2. **rip-video**: uses the MP4 paths in todolist.md to extract audio and subtitles

todolist.md is the central tracking document for the whole video processing pipeline.
140
skills/plan-video/scripts/extract_video_id.py
Normal file
@@ -0,0 +1,140 @@
#!/usr/bin/env python3
"""
Extract VideoId from various video platform URLs.

Supports: Xiaohongshu, Douyin, TikTok, Bilibili, YouTube, Kuaishou, etc.
"""

import re
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """
    Extract VideoId from video platform URL.

    Args:
        url: Video URL from supported platforms

    Returns:
        VideoId string

    Raises:
        ValueError: If URL format is not recognized
    """
    url = url.strip()

    # Xiaohongshu short link: http://xhslink.com/o/6VbNVltFQRX
    if 'xhslink.com' in url or 'xiaohongshu.com' in url:
        match = re.search(r'/o/([A-Za-z0-9]+)', url)
        if match:
            return match.group(1)

    # Douyin: https://v.douyin.com/xxx/ or https://www.douyin.com/video/xxx
    if 'douyin.com' in url:
        # Short link
        match = re.search(r'v\.douyin\.com/([A-Za-z0-9]+)', url)
        if match:
            return match.group(1)
        # Long link
        match = re.search(r'/video/(\d+)', url)
        if match:
            return match.group(1)

    # TikTok: https://www.tiktok.com/@user/video/1234567890
    if 'tiktok.com' in url:
        match = re.search(r'/video/(\d+)', url)
        if match:
            return match.group(1)
        # Short link: https://vm.tiktok.com/xxx/
        match = re.search(r'vm\.tiktok\.com/([A-Za-z0-9]+)', url)
        if match:
            return match.group(1)

    # Bilibili: https://www.bilibili.com/video/BVxxx or https://b23.tv/xxx
    if 'bilibili.com' in url or 'b23.tv' in url:
        # BV id
        match = re.search(r'/(BV[A-Za-z0-9]+)', url)
        if match:
            return match.group(1)
        # av id
        match = re.search(r'/av(\d+)', url)
        if match:
            return f"av{match.group(1)}"
        # Short link
        match = re.search(r'b23\.tv/([A-Za-z0-9]+)', url)
        if match:
            return match.group(1)

    # YouTube: https://www.youtube.com/watch?v=xxx or https://youtu.be/xxx
    if 'youtube.com' in url or 'youtu.be' in url:
        # youtu.be short link
        match = re.search(r'youtu\.be/([A-Za-z0-9_-]+)', url)
        if match:
            return match.group(1)
        # Standard link
        parsed = urlparse(url)
        if parsed.query:
            params = parse_qs(parsed.query)
            if 'v' in params:
                return params['v'][0]

    # Kuaishou: https://www.kuaishou.com/short-video/xxx
    if 'kuaishou.com' in url:
        # Accept letters as well as digits: with a digits-only pattern an
        # alphanumeric id would be truncated, or the fallback below would
        # return the literal path segment "short".
        match = re.search(r'/short-video/([A-Za-z0-9_-]+)', url)
        if match:
            return match.group(1)
        match = re.search(r'\.com/([A-Za-z0-9]+)', url)
        if match:
            return match.group(1)

    # If nothing matched, fall back to the last path segment of the URL
    parsed = urlparse(url)
    path_parts = [p for p in parsed.path.split('/') if p]
    if path_parts:
        # Return the last non-empty path segment
        return path_parts[-1]

    raise ValueError(f"Cannot extract VideoId from URL: {url}")


def extract_urls(text: str) -> list[str]:
    """
    Extract video URLs from text.

    Args:
        text: Text containing video URLs

    Returns:
        List of unique video URLs
    """
    # The host prefix is optional ([^\s/]*), so bare hosts such as
    # xhslink.com, youtu.be, and b23.tv match; requiring at least one
    # character before the platform name would miss them.
    pattern = r'https?://[^\s/]*(?:douyin|tiktok|youtube|youtu\.be|xiaohongshu|xhslink|bilibili|b23\.tv|kuaishou)[^\s]*'
    urls = re.findall(pattern, text)
    return list(dict.fromkeys(urls))  # de-duplicate while preserving order


if __name__ == '__main__':
    # Test cases
    test_urls = [
        'http://xhslink.com/o/6VbNVltFQRX',
        'http://xhslink.com/o/16X9vM9CTBo',
        'https://v.douyin.com/eFyQRjc/',
        'https://www.douyin.com/video/7123456789',
        'https://www.tiktok.com/@user/video/7123456789',
        'https://vm.tiktok.com/ZMeAbCdEf/',
        'https://www.bilibili.com/video/BV1xx411c7mD',
        'https://b23.tv/av12345678',
        'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
        'https://youtu.be/dQw4w9WgXcQ',
        'https://www.kuaishou.com/short-video/1234567890',
    ]

    print("VideoId extraction test:\n")
    for url in test_urls:
        try:
            video_id = extract_video_id(url)
            print(f"✅ {url}")
            print(f" → {video_id}\n")
        except ValueError as e:
            print(f"❌ {url}")
            print(f" → {e}\n")
85
skills/rip-video/SKILL.md
Normal file
@@ -0,0 +1,85 @@
---
name: rip-video
description: Extract audio and subtitles from MP4 videos via MCP, driven by todolist.md. Reads the task list, calls the MCP service to extract the cover, audio (mp3), and subtitles (srt), and updates task completion status.
---
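
# Rip Video - Video Resource Extraction

## Overview

Extract audio and subtitles from downloaded MP4 videos according to todolist.md. Calls the MCP service to generate the cover, MP3 audio, and SRT subtitles.

**Prerequisites**:
- The MCP `rip-video` service is configured and running (requires ffmpeg/ffprobe)
- The video entries in todolist.md are marked complete and the mp4 files actually exist on disk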

## Workflow

### 1. Read todolist.md

Read the MP4 files to process from todolist.md:

```markdown
## 6VbNVltFQRX (http://xhslink.com/o/6VbNVltFQRX)
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.json
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.mp4
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX_cover.jpg
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.mp3
- [ ] orgin/6VbNVltFQRX/6VbNVltFQRX.srt
```

Extract: MP4 path, VideoId, resources still to be extracted.

### 2. Check existing files

Check which resources still need to be extracted:
- Cover: `{VideoId}_cover.jpg` or `{VideoId}-cover.jpg`
- Audio: `{VideoId}.mp3`
- Subtitles: `{VideoId}.srt`

Skip files that already exist.
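
The existence check can be sketched with `pathlib`. This is an illustrative helper, not part of the skill: `pending_resources` is a hypothetical name, and the file names follow the list above.

```python
from pathlib import Path


def pending_resources(video_dir, video_id):
    """Return the resource kinds ('cover', 'audio', 'subtitle') still missing."""
    base = Path(video_dir)
    candidates = {
        # Either cover naming counts as "already present".
        'cover': [f'{video_id}_cover.jpg', f'{video_id}-cover.jpg'],
        'audio': [f'{video_id}.mp3'],
        'subtitle': [f'{video_id}.srt'],
    }
    return [
        kind for kind, names in candidates.items()
        if not any((base / name).is_file() for name in names)
    ]
```

An empty result means the video can be skipped entirely and counted under "skipped" in the report.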

### 3. Call MCP `rip_video` to extract resources

**Extraction settings** (configured on the MCP server):
- Cover: frame at 00:00:01, high quality
- Audio: 192kbps MP3
- Subtitles: SRT format (if the video has embedded subtitles)
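
The MCP service's internals are not shown in this commit. For orientation only, the settings above roughly correspond to ffmpeg invocations like the ones built below; this is an assumption about equivalent commands, not the service's actual implementation, and `build_ffmpeg_commands` is a hypothetical name.

```python
def build_ffmpeg_commands(mp4_path, out_dir, video_id):
    """Build ffmpeg argument lists for cover, audio, and subtitle extraction."""
    prefix = f'{out_dir.rstrip("/")}/{video_id}'
    return {
        # Grab one frame at the 1-second mark; -q:v 2 requests high JPEG quality.
        'cover': ['ffmpeg', '-y', '-ss', '00:00:01', '-i', mp4_path,
                  '-frames:v', '1', '-q:v', '2', f'{prefix}_cover.jpg'],
        # Drop the video stream (-vn) and encode the audio as 192 kbps MP3.
        'audio': ['ffmpeg', '-y', '-i', mp4_path, '-vn',
                  '-c:a', 'libmp3lame', '-b:a', '192k', f'{prefix}.mp3'],
        # Convert the first embedded subtitle stream to SRT.
        'subtitle': ['ffmpeg', '-y', '-i', mp4_path,
                     '-map', '0:s:0', f'{prefix}.srt'],
    }
```

Each list can be passed to `subprocess.run`. The subtitle command exits with an error when the file has no subtitle stream, which matches the "no embedded subtitles" case that this skill treats as normal rather than as a failure.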

### 4. Update todolist.md

Mark the tasks once extraction finishes:

```markdown
## 6VbNVltFQRX (http://xhslink.com/o/6VbNVltFQRX)
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.json
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.mp4
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX_cover.jpg
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.mp3
- [x] orgin/6VbNVltFQRX/6VbNVltFQRX.srt
```

### 5. Output report

```
============================================================
Video resource extraction complete!
============================================================
Videos processed: {total}
Succeeded: {success} | Skipped: {skipped} | Failed: {failed}

All tasks completed!
============================================================
```
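
## Error Handling

- **MCP service unavailable**: prompt the user to check the `rip-video` service status
- **MP4 file missing**: skip the video and record it in the report
- **No embedded subtitles**: this is normal; flag it in the report but do not count it as a failure
- **Extraction failure**: log the error and leave todolist.md unchanged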

## Integration

**Upstream**: parse-video downloads the MP4 files
**Output**: the complete set of video resources (video, cover, audio, subtitles)