Initial commit

Zhongwei Li
2025-11-29 17:52:13 +08:00
commit 4b20ee9596
10 changed files with 3079 additions and 0 deletions


@@ -0,0 +1,713 @@
---
name: llm-docs-optimizer
description: Optimize documentation for AI coding assistants and LLMs. Improves docs for Claude, Copilot, and other AI tools through c7score optimization, llms.txt generation, question-driven restructuring, and automated quality scoring. Use when asked to improve, optimize, or enhance documentation for AI assistants, LLMs, c7score, Context7, or when creating llms.txt files. Also use for documentation quality analysis, README optimization, or ensuring docs follow best practices for LLM retrieval systems.
version: 1.3.0
---
# LLM Docs Optimizer
This skill optimizes project documentation and README files for AI coding assistants and LLMs like Claude, GitHub Copilot, and others. It improves documentation quality through multiple approaches: c7score optimization (Context7's quality benchmark), llms.txt file generation for LLM navigation, question-driven content restructuring, and automated quality scoring across 5 key metrics.
**Version:** 1.3.0
## Understanding C7Score
C7score evaluates documentation using 5 metrics across two categories:
**LLM Analysis (85% of score):**
1. **Question-Snippet Comparison (80%)**: How well snippets answer common developer questions
2. **LLM Evaluation (5%)**: Relevancy, clarity, correctness, and uniqueness
**Text Analysis (15% of score):**
3. **Formatting (5%)**: Proper structure and language tags
4. **Project Metadata (5%)**: Absence of irrelevant content
5. **Initialization (5%)**: Snippets go beyond bare imports and installation commands
For detailed information on each metric, read `references/c7score_metrics.md`.
## Core Workflow
### Step 0: Ask About llms.txt Generation (C7Score Optimization Only)
**IMPORTANT**: When the user requests c7score documentation optimization, ALWAYS ask if they also want an llms.txt file:
Use the `AskUserQuestion` tool with this question:
```
Question: "Would you also like me to generate an llms.txt file for your project?"
Header: "llms.txt"
Options:
- "Yes, create both optimized docs and llms.txt"
Description: "Optimize documentation for c7score AND generate an llms.txt navigation file"
- "No, just optimize the documentation"
Description: "Only perform c7score optimization without llms.txt generation"
```
**If user chooses "Yes"**:
- Proceed with c7score optimization workflow (Steps 1-5)
- Then follow the llms.txt generation workflow
- Provide both optimized documentation AND llms.txt file
**If user chooses "No"**:
- Proceed with c7score optimization workflow only (Steps 1-5)
**Note**: If the user explicitly requests ONLY llms.txt generation (no c7score mention), skip this step and go directly to the llms.txt generation workflow.
### Step 1: Analyze Current Documentation
When given a project or documentation to optimize:
1. **Read the documentation files** (README.md, docs/*.md, etc.)
2. **Run the analysis script** (optional but recommended) to identify issues:
```bash
python scripts/analyze_docs.py <path-to-readme.md>
```
Note: The script requires Python 3.7+ and is optional. You can skip it if Python is unavailable.
3. **Review the analysis report** (if script was run) to understand current state:
- Count of code snippets with issues
- Breakdown by metric type
- Duplicate snippets
- Language distribution
### Step 2: Generate Developer Questions
Create a list of 15-20 questions that developers commonly ask about the project:
- Focus on "How do I..." questions
- Cover setup, configuration, basic usage, common operations
- Include authentication, error handling, advanced features
- Think about real-world use cases
**Example questions:**
- How do I install and set up [project]?
- How do I authenticate/configure [project]?
- How do I [main feature/operation]?
- How do I handle errors?
- How do I integrate with [common tools]?
### Step 3: Map Questions to Snippets
Evaluate which questions are well-answered by existing documentation:
- ✅ Questions with complete, working code examples
- ⚠️ Questions with partial or theoretical answers
- ❌ Questions with no answers
Prioritize filling gaps for unanswered questions.
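As a rough pre-screen before the manual review, the mapping can be approximated mechanically. This is a minimal sketch using a naive keyword-overlap heuristic; the helper names, threshold, and sample inputs are illustrative assumptions, and the actual c7score matching is LLM-based:
```python
import re

def keyword_overlap(question, snippet_text):
    """Fraction of a question's keywords that appear in the snippet text."""
    stop = {"how", "do", "i", "the", "a", "an", "to", "with", "in", "and", "my"}
    words = {w for w in re.findall(r"[a-z]+", question.lower()) if w not in stop}
    if not words:
        return 0.0
    text = snippet_text.lower()
    return sum(1 for w in words if w in text) / len(words)

def map_questions(questions, snippets, threshold=0.3):
    """Print ✅ / ⚠️ / ❌ for each question based on its best-matching snippet."""
    for question in questions:
        best = max((keyword_overlap(question, s) for s in snippets), default=0.0)
        status = "✅" if best >= 2 * threshold else "⚠️" if best >= threshold else "❌"
        print(f"{status} {question} (best overlap: {best:.0%})")

# Illustrative inputs only
questions = ["How do I install the library?", "How do I handle errors?"]
snippets = ["pip install library\nfrom library import Client\nclient = Client()"]
map_questions(questions, snippets)
```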
### Step 4: Optimize Documentation
Apply optimizations based on priority:
**Priority 1: Question Coverage (80% of score)**
- Add complete code examples for unanswered questions
- Transform API references into usage examples
- Ensure each major snippet answers at least one common question
- Make examples self-contained and runnable
**Priority 2: Remove Duplicates**
- Identify similar or identical snippets
- Consolidate into comprehensive examples
- Ensure each snippet provides unique value
**Priority 3: Fix Formatting**
- Use proper language tags (python, javascript, typescript, bash, etc.)
- Follow TITLE / DESCRIPTION / CODE structure
- Avoid very short (<3 lines) or very long (>100 lines) snippets
- Don't use descriptive strings as language tags
**Priority 4: Remove Metadata**
- Remove or minimize licensing snippets
- Remove directory structure listings
- Remove citations and BibTeX entries
- Keep only usage-relevant content
**Priority 5: Enhance Initialization Snippets**
- Combine import-only snippets with usage examples
- Add context to installation commands
- Always show what comes after setup
For detailed transformation patterns, read `references/optimization_patterns.md`.
### Step 5: Validate Optimizations
Before finalizing, verify each optimized snippet:
- ✅ Can run standalone (copy-paste works)
- ✅ Answers a specific developer question
- ✅ Provides unique information
- ✅ Uses proper format and language tag
- ✅ Focuses on practical usage
- ✅ Includes necessary imports/setup
- ✅ No licensing, citations, or directory trees
- ✅ Syntactically correct code
### Step 6: Evaluate C7Score Impact
After optimization, provide a c7score evaluation comparing the original and optimized documentation:
**Evaluation Process:**
1. **Analyze Original Documentation** against c7score metrics:
- Question-Snippet Matching (80%): How well do code examples answer developer questions?
- LLM Evaluation (10%): Clarity, correctness, unique information
- Formatting (5%): Proper markdown structure and language tags
- Metadata Removal (2.5%): Absence of licenses, citations, directory trees
- Initialization (2.5%): More than just imports/installation
2. **Analyze Optimized Documentation** using the same metrics
3. **Calculate Scores** (0-100 for each metric):
- For Question-Snippet Matching:
- 90-100: Excellent - Complete, practical answers with context
- 70-89: Good - Most questions answered with working examples
- 50-69: Fair - Partial answers, missing context
- 30-49: Poor - Vague or incomplete answers
- 0-29: Very Poor - Questions not addressed
- For LLM Evaluation:
- 90-100: Unique, clear, syntactically perfect
- 70-89: Mostly unique and clear, minor issues
- 50-69: Some duplicates or clarity issues
- 30-49: Significant duplicates or syntax errors
- 0-29: Major quality problems
- For Formatting:
- 100: All snippets properly formatted with language tags
- 80-99: Minor formatting issues
- 50-79: Multiple formatting problems
- 0-49: Significant formatting issues
- For Metadata Removal:
- 100: No project metadata
- 50-99: Some metadata present
- 0-49: Significant metadata content
- For Initialization:
- 100: All examples show usage beyond setup
- 50-99: Some initialization-only snippets
- 0-49: Many initialization-only snippets
4. **Present Results** in this format:
```markdown
## C7Score Evaluation
### Original Documentation Score: XX/100
**Metric Breakdown:**
- Question-Snippet Matching: XX/100 (weight: 80%)
- Analysis: [Brief explanation of score]
- LLM Evaluation: XX/100 (weight: 10%)
- Analysis: [Brief explanation]
- Formatting: XX/100 (weight: 5%)
- Analysis: [Brief explanation]
- Metadata Removal: XX/100 (weight: 2.5%)
- Analysis: [Brief explanation]
- Initialization: XX/100 (weight: 2.5%)
- Analysis: [Brief explanation]
**Weighted Average:** XX/100
---
### Optimized Documentation Score: XX/100
**Metric Breakdown:**
[Same format as above]
**Weighted Average:** XX/100
---
### Improvement Summary
**Overall Improvement:** +XX points (XX → XX)
**Key Improvements:**
- [Metric]: +XX points - [What specifically improved]
- [Metric]: +XX points - [What specifically improved]
**Impact Assessment:**
[Brief explanation of how optimizations improved the documentation quality]
```
5. **Scoring Guidelines:**
- Be objective and consistent
- Base scores on concrete evidence from the documentation
- Explain reasoning for each score
- Highlight specific improvements made
- Final score is weighted average: (Q×0.8) + (L×0.1) + (F×0.05) + (M×0.025) + (I×0.025)
**Note:** These are estimated scores based on c7score methodology. For official scores, users can submit to Context7's benchmark.
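For reference, a minimal sketch of the weighted-average arithmetic used in this step, assuming the five 0-100 metric scores have already been estimated as described above (the example values are illustrative):
```python
def weighted_c7score(question, llm_eval, formatting, metadata, initialization):
    """Weighted average of the five 0-100 metric scores (weights from this step)."""
    return (
        question * 0.80           # Question-Snippet Matching
        + llm_eval * 0.10         # LLM Evaluation
        + formatting * 0.05       # Formatting
        + metadata * 0.025        # Metadata Removal
        + initialization * 0.025  # Initialization
    )

# Illustrative before/after estimates
print(round(weighted_c7score(40, 50, 70, 30, 50)))   # lower-scoring original
print(round(weighted_c7score(90, 95, 95, 100, 95)))  # optimized version
```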
## Common Transformation Patterns
### Transform API Reference → Complete Example
**Before:**
````markdown
## authenticate(api_key)
Authenticates the client.
````
**After:**
````markdown
## Authentication
```python
from library import Client
client = Client(api_key="your_key")
client.authenticate()
# Now ready to make requests
result = client.get_data()
```
````
### Transform Import-Only → Quick Start
**Before:**
```python
from library import Client, Config
```
**After:**
```python
# Install: pip install library
from library import Client, Config
# Initialize and use
config = Config(api_key="key")
client = Client(config)
result = client.query("SELECT * FROM data")
```
### Transform Multiple Small → One Comprehensive
Combine related small snippets into one complete workflow example.
## README Structure for High Scores
Organize documentation to prioritize question-answering:
1. **Quick Start** (High Priority)
- Installation + immediate usage
- Complete, working first example
2. **Common Use Cases** (High Priority)
- Each major feature with full examples
- Real-world scenarios
3. **Configuration** (Medium Priority)
- Common configuration patterns with context
4. **Error Handling** (Medium Priority)
- Practical error handling examples
5. **API Reference** (Lower Priority)
- Include usage examples for each method
6. **Advanced Topics** (Lower Priority)
- Complex scenarios with complete code
## Tips for High Scores
1. **Think "How would a developer use this?"** - Lead with usage, not theory
2. **Make examples copy-paste ready** - Include all imports and setup
3. **Answer questions, don't just document APIs** - Show solutions, not just signatures
4. **One snippet, one lesson** - Avoid duplicate information
5. **Format consistently** - Use proper language tags and structure
6. **Remove noise** - No licensing, directory trees, or pure imports in main docs
7. **Test your examples** - Ensure code is syntactically correct and runnable
8. **Focus on the 80%** - Question-answering dominates the score
## Skill Capabilities
This skill provides two main capabilities:
1. **C7Score Documentation Optimization** - Improve documentation quality for AI-assisted coding
2. **llms.txt File Generation** - Create LLM-friendly navigation files for projects
## When to Use This Skill
### For C7Score Optimization:
- User asks to optimize documentation for c7score
- User wants to improve README or docs for Context7
- User requests documentation analysis or quality assessment
- User is creating new documentation for a library/framework
- User mentions improving documentation for AI coding assistants
- User wants to follow best practices for developer documentation
### For llms.txt Generation:
- User asks to create an llms.txt file
- User mentions llmstxt.org or llms.txt format
- User wants to make their project more accessible to LLMs
- User is setting up documentation navigation for AI tools
- User asks how to help LLMs understand their project structure
## Output Format
When optimizing documentation, provide:
1. **Analysis summary** - Key findings and issues
2. **Optimized documentation** - Complete, improved files
3. **Change summary** - What was improved and why
4. **Score impact estimate** - Expected improvement by metric
5. **Recommendations** - Further improvement suggestions
Save the optimized documentation files in the user's working directory or a designated output location. You can ask the user where they'd like the files saved if unclear.
## Examples
- For c7score optimization: See `examples/sample_readme.md` for before/after transformations
- For llms.txt generation: See `examples/sample_llmstxt.md` for different project types
---
# Creating llms.txt Files
## What is llms.txt?
**llms.txt** is a standardized markdown file format designed to provide LLM-friendly content summaries and documentation navigation. It helps language models and AI agents quickly understand project structure and find relevant documentation.
Key purposes:
- Provides brief background information and guidance
- Links to detailed markdown documentation
- Optimized for consumption by language models
- Helps LLMs navigate documentation efficiently
- Used at inference time when users request information
Official specification: https://llmstxt.org/
For complete format details, read `references/llmstxt_format.md`.
## llms.txt Generation Workflow
### Step 1: Analyze Project Structure
When asked to create an llms.txt file:
1. **Explore the project directory** to understand structure:
- Identify documentation files (README.md, docs/, CONTRIBUTING.md, etc.)
- Find example files or tutorials
- Locate API reference or configuration docs
- Check for guides, blog posts, or additional resources
2. **Identify project type**:
- Python library, CLI tool, web framework, Claude skill, etc.
- This determines the appropriate section structure
3. **Assess documentation organization**:
- Is documentation in a single README?
- Multiple files in a docs/ directory?
- Wiki, website, or external documentation?
### Step 2: Determine Project Category
Choose the appropriate template based on project type:
**Python Library / Package:**
- Documentation, API Reference, Examples, Development, Optional
**CLI Tool:**
- Getting Started, Commands, Configuration, Examples, Optional
**Web Framework:**
- Documentation, Guides, API Reference, Examples, Integrations, Optional
**Claude Skill:**
- Documentation, Reference Materials, Examples, Development, Optional
**General Project:**
- Documentation, Guides, Examples, Contributing, Optional
See `examples/sample_llmstxt.md` for complete examples of each type.
### Step 3: Create the Structure
Build the llms.txt file following this structure:
#### 1. H1 Title (Required)
```markdown
# Project Name
```
#### 2. Blockquote Summary (Highly Recommended)
```markdown
> Brief description of what the project does, its main purpose, and key value proposition.
> Should be 1-3 sentences that give LLMs essential context.
```
#### 3. Key Features/Principles (Optional but Helpful)
```markdown
Key features:
- Main feature or capability
- Another important aspect
- Third key point
Project follows these principles:
- Design principle 1
- Design principle 2
```
#### 4. Documentation Sections (Core Content)
Organize links into H2-headed sections:
```markdown
## Documentation
- [Link Title](https://full-url): Brief description of what this contains
- [Another Doc](https://full-url): What developers will find here
## API Reference
- [Core API](https://full-url): Main API documentation
- [Configuration](https://full-url): Configuration options
## Examples
- [Basic Usage](https://full-url): Simple getting-started examples
- [Advanced Patterns](https://full-url): Complex use cases
## Optional
- [Blog](https://full-url): Latest updates and tutorials
- [Community](https://full-url): Where to get help
```
### Step 4: Format Links Properly
Each link must follow this exact format:
```markdown
- [Descriptive Title](https://full-url): Optional helpful notes about the resource
```
**Requirements:**
- Use markdown bullet lists (`-`)
- Use markdown hyperlinks `[text](url)`
- Use **full URLs** with protocol (https://), not relative paths
- Add `:` followed by helpful description (optional but recommended)
- Prefer linking to `.md` files when possible
**Examples:**
✅ Good:
```markdown
- [Quick Start](https://github.com/user/repo/blob/main/docs/quickstart.md): Get running in 5 minutes
- [API Reference](https://github.com/user/repo/blob/main/docs/api.md): Complete function documentation
```
❌ Bad:
```markdown
- [Guide](../docs/guide.md): A guide
- Guide: docs/guide.md
- [Click here](guide)
```
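If you want to sanity-check link lines mechanically, here is a rough regex sketch of the format above (an illustrative approximation, not an official validator):
```python
import re

# - [Title](https://full-url): optional description
LINK_PATTERN = re.compile(r"^- \[[^\]]+\]\(https?://[^\s)]+\)(: .+)?$")

lines = [
    "- [Quick Start](https://github.com/user/repo/blob/main/docs/quickstart.md): Get running in 5 minutes",
    "- [Guide](../docs/guide.md): A guide",
    "- Guide: docs/guide.md",
]
for line in lines:
    print("✅" if LINK_PATTERN.match(line) else "❌", line)
```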
### Step 5: Organize Sections by Priority
Order sections from most to least important:
**High Priority (First):**
- Documentation / Getting Started
- Core API / Commands
- Examples
**Medium Priority (Middle):**
- Guides / Tutorials
- Configuration
- Development / Contributing
**Low Priority (Last - Optional Section):**
- Blog posts
- Community links
- Changelog
- Extended tutorials
- Background reading
The **"Optional" section** has special meaning: LLMs can skip this when shorter context is needed.
### Step 6: Handle Different Repository Structures
#### GitHub Repository
For GitHub repos, construct URLs like:
```markdown
https://github.com/username/repo/blob/main/path/to/file.md
```
#### Local Files Only
If no remote repository exists yet, use placeholder URLs:
```markdown
https://github.com/username/repo/blob/main/README.md
```
And note in your response that URLs need to be updated when the repo is published.
#### Documentation Website
If project has a docs website, prefer linking to markdown versions:
```markdown
- [Guide](https://docs.example.com/guide.md): Getting started guide
```
Or, if the site serves markdown versions of its HTML pages, append a `.md` suffix to the page URL:
```markdown
- [Guide](https://docs.example.com/guide.html.md): Getting started guide
```
### Step 7: Validate the File
Before finalizing, check:
- ✅ File named exactly `llms.txt` (lowercase)
- ✅ Has H1 title as first element
- ✅ Has blockquote summary (highly recommended)
- ✅ Uses only H1 and H2 headings (no H3, H4, etc. in descriptive content)
- ✅ All links use full URLs with protocol
- ✅ Links use proper markdown format `[text](url)`
- ✅ Sections logically organized (essential → optional)
- ✅ Descriptive notes added after colons where helpful
- ✅ Content is concise and clear
- ✅ No complex markdown (tables, images, code blocks in the llms.txt itself)
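A few of these structural checks can also be approximated in code. The sketch below covers only part of the checklist (the function name and sample input are illustrative assumptions, and it intentionally skips the per-link format check shown in Step 4):
```python
def check_llmstxt(text):
    """Return a list of structural problems found in an llms.txt candidate."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("First element must be an H1 title")
    if not any(line.startswith("> ") for line in lines[:5]):
        problems.append("No blockquote summary near the top (recommended)")
    if any(line.startswith("###") for line in lines):
        problems.append("Only H1 and H2 headings should be used")
    if any(line.startswith("|") for line in lines):
        problems.append("Avoid tables and other complex markdown")
    return problems

sample = "# Project\n> Short summary.\n## Documentation\n- [Guide](https://example.com/guide.md): Intro"
print(check_llmstxt(sample) or "No structural problems found")
```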
## Common Section Templates
### For Python Libraries
```markdown
# LibraryName
> Brief description of what the library does and its main use case.
## Documentation
- Getting started, installation, core concepts
## API Reference
- Module/class/function documentation
## Examples
- Usage examples, patterns, recipes
## Development
- Contributing, testing, development setup
## Optional
- Changelog, blog, community
```
### For CLI Tools
```markdown
# ToolName
> Brief description of what the tool does.
## Getting Started
- Installation, quickstart
## Commands
- Command reference and examples
## Configuration
- Config files, environment variables
## Examples
- Common workflows and patterns
## Optional
- Advanced usage, plugins, troubleshooting
```
### For Web Frameworks
```markdown
# FrameworkName
> Brief description and key features.
## Documentation
- Core concepts, routing, data fetching
## Guides
- Authentication, deployment, testing
## API Reference
- Configuration, CLI, components
## Examples
- Sample applications
## Integrations
- Third-party tools and services
## Optional
- Blog, showcase, community
```
### For Claude Skills
```markdown
# skill-name
> Brief description of what the skill does.
## Documentation
- README, SKILL.md, usage guide
## Reference Materials
- Specifications, patterns, formats
## Examples
- Usage examples, before/after
## Development
- Scripts, contributing guide
## Optional
- External resources, related tools
```
## Tips for High-Quality llms.txt Files
1. **Be Concise**: Use clear, brief language in descriptions
2. **Think Like a New User**: What would they want to find first?
3. **Descriptive Links**: Use meaningful link text, not "click here"
4. **Add Context**: Notes after colons help LLMs understand what each link contains
5. **Stable URLs**: Link to versioned or permanent documentation
6. **Progressive Detail**: Start with essentials, end with optional resources
7. **Test Comprehension**: Read it yourself - does it make sense quickly?
8. **Keep Updated**: Update as documentation structure evolves
## Output Format for llms.txt Generation
When generating an llms.txt file, provide:
1. **Analysis summary** - Project type, documentation structure, identified resources
2. **Generated llms.txt file** - Complete, properly formatted file
3. **File placement instructions** - Where to save it (repository root)
4. **URL update notes** - If using placeholder URLs that need updating
5. **Suggestions** - Additional documentation that could improve the file
Save the file as `llms.txt` in the project root directory.
## Integration with C7Score Optimization
llms.txt generation can be combined with c7score optimization:
1. **Optimize documentation first** - Improve README and docs for c7score
2. **Then generate llms.txt** - Create navigation file pointing to optimized docs
3. **Result**: High-quality documentation with LLM-friendly navigation
Or generate them independently based on user needs.
## Additional Resources
- Format specification: `references/llmstxt_format.md`
- Examples by project type: `examples/sample_llmstxt.md`
- Official specification: https://llmstxt.org/
- Real examples: https://llmstxt.site/


@@ -0,0 +1,405 @@
# Example: llms.txt Generation for Different Project Types
This document shows examples of llms.txt files generated for different types of projects, demonstrating how to structure the file based on project characteristics.
---
## Example 1: Python Library (Data Processing)
### Project Context
A Python library called "DataFlow" for stream data processing with multiple output formats.
### Generated llms.txt
```markdown
# DataFlow
> DataFlow is a Python library for processing data streams with real-time transformations
> and multiple output formats. It provides efficient stream processing with lazy evaluation
> and built-in error handling.
Key features:
- Fast stream processing with lazy evaluation
- Support for CSV, JSON, Parquet, and custom formats
- Built-in error handling and recovery
- Zero-dependency core library
- Extensible plugin system
## Documentation
- [Quick Start Guide](https://github.com/example/dataflow/blob/main/docs/quickstart.md): Get up and running in 5 minutes
- [Core Concepts](https://github.com/example/dataflow/blob/main/docs/concepts.md): Understanding streams, transformations, and processing
- [Configuration Guide](https://github.com/example/dataflow/blob/main/docs/configuration.md): All configuration options explained
## API Reference
- [Stream API](https://github.com/example/dataflow/blob/main/docs/api/stream.md): Stream creation and manipulation methods
- [Transformations](https://github.com/example/dataflow/blob/main/docs/api/transforms.md): Built-in transformation functions
- [Exports](https://github.com/example/dataflow/blob/main/docs/api/exports.md): Output format specifications
## Examples
- [Basic Usage](https://github.com/example/dataflow/blob/main/examples/basic.md): Simple stream processing examples
- [Common Patterns](https://github.com/example/dataflow/blob/main/examples/patterns.md): Filtering, mapping, and aggregation
- [Error Handling](https://github.com/example/dataflow/blob/main/examples/errors.md): Handling failures and recovery
- [Advanced Usage](https://github.com/example/dataflow/blob/main/examples/advanced.md): Parallel processing and custom plugins
## Development
- [Contributing Guide](https://github.com/example/dataflow/blob/main/CONTRIBUTING.md): How to contribute to DataFlow
- [Development Setup](https://github.com/example/dataflow/blob/main/docs/development.md): Setting up local development environment
- [Testing](https://github.com/example/dataflow/blob/main/docs/testing.md): Running and writing tests
## Optional
- [DataFlow Blog](https://dataflow.example.com/blog/): Latest updates and tutorials
- [Changelog](https://github.com/example/dataflow/blob/main/CHANGELOG.md): Version history and release notes
- [Performance Benchmarks](https://github.com/example/dataflow/blob/main/docs/performance.md): Benchmark results and optimization tips
```
### Why This Structure?
- **Blockquote**: Clearly explains what DataFlow is and its main value proposition
- **Key Features**: Bullet list highlights important capabilities
- **Documentation**: Essential guides for getting started and understanding core concepts
- **API Reference**: Organized by major components (Stream, Transformations, Exports)
- **Examples**: Progressive from basic to advanced, includes error handling
- **Development**: Resources for contributors
- **Optional**: Secondary resources like blog and benchmarks
---
## Example 2: CLI Tool (Developer Tool)
### Project Context
A command-line tool called "BuildKit" for managing build processes and deployment pipelines.
### Generated llms.txt
```markdown
# BuildKit
> BuildKit is a CLI tool for managing build processes, running tests, and deploying
> applications across multiple environments. It provides a unified interface for common
> development workflows.
BuildKit follows these principles:
- Convention over configuration
- Fast feedback loops
- Environment parity
- Reproducible builds
## Getting Started
- [Installation](https://buildkit.dev/docs/install.md): Installing BuildKit on macOS, Linux, and Windows
- [Quick Start](https://buildkit.dev/docs/quickstart.md): Your first BuildKit project in 5 minutes
- [Core Concepts](https://buildkit.dev/docs/concepts.md): Understanding tasks, pipelines, and environments
## Commands
- [build](https://buildkit.dev/docs/commands/build.md): Build your project with automatic dependency detection
- [test](https://buildkit.dev/docs/commands/test.md): Run tests with parallel execution
- [deploy](https://buildkit.dev/docs/commands/deploy.md): Deploy to staging or production
- [watch](https://buildkit.dev/docs/commands/watch.md): Watch for changes and rebuild automatically
- [All Commands](https://buildkit.dev/docs/commands/): Complete command reference
## Configuration
- [buildkit.yml](https://buildkit.dev/docs/config.md): Configuration file reference
- [Environment Variables](https://buildkit.dev/docs/env.md): Environment-specific configuration
- [Plugins](https://buildkit.dev/docs/plugins.md): Extending BuildKit with custom plugins
## Examples
- [Node.js Projects](https://buildkit.dev/examples/nodejs.md): Building and deploying Node.js apps
- [Python Projects](https://buildkit.dev/examples/python.md): Python application workflows
- [Monorepos](https://buildkit.dev/examples/monorepo.md): Managing multiple packages
- [CI/CD Integration](https://buildkit.dev/examples/ci.md): Using BuildKit in CI/CD pipelines
## Optional
- [BuildKit Blog](https://buildkit.dev/blog/): Tutorials and case studies
- [Plugin Directory](https://buildkit.dev/plugins/): Community plugins
- [Troubleshooting](https://buildkit.dev/docs/troubleshooting.md): Common issues and solutions
```
### Why This Structure?
- **Principles**: Shows design philosophy upfront
- **Getting Started**: Installation and quickstart are priority for CLI tools
- **Commands**: Individual command documentation (most important for CLI tools)
- **Configuration**: Clear section for config files and customization
- **Examples**: Language/framework-specific guides
- **Optional**: Community resources and troubleshooting
---
## Example 3: Web Framework
### Project Context
A web framework called "FastWeb" for building modern web applications.
### Generated llms.txt
```markdown
# FastWeb
> FastWeb is a modern web framework for building full-stack applications.
> It provides server-side rendering, API routes, and built-in database support with
> zero configuration required.
FastWeb features:
- File-based routing with automatic code splitting
- Server-side rendering (SSR) and static site generation (SSG)
- Built-in API routes and middleware
- Real-time capabilities with WebSockets
- TypeScript-first with excellent type inference
## Documentation
- [Getting Started](https://fastweb.dev/docs/getting-started.md): Create your first FastWeb app
- [Routing](https://fastweb.dev/docs/routing.md): File-based routing and dynamic routes
- [Data Fetching](https://fastweb.dev/docs/data.md): Loading data on server and client
- [Rendering](https://fastweb.dev/docs/rendering.md): SSR, SSG, and client-side rendering
- [API Routes](https://fastweb.dev/docs/api.md): Building REST and GraphQL APIs
## Guides
- [Authentication](https://fastweb.dev/guides/auth.md): User authentication and authorization
- [Database Integration](https://fastweb.dev/guides/database.md): Working with databases
- [Deployment](https://fastweb.dev/guides/deployment.md): Deploying to production
- [Testing](https://fastweb.dev/guides/testing.md): Unit and integration testing
- [Performance](https://fastweb.dev/guides/performance.md): Optimization best practices
## API Reference
- [Configuration](https://fastweb.dev/api/config.md): fastweb.config.js options
- [CLI](https://fastweb.dev/api/cli.md): Command-line interface reference
- [Components](https://fastweb.dev/api/components.md): Built-in components
- [Hooks](https://fastweb.dev/api/hooks.md): React-style hooks API
- [Utilities](https://fastweb.dev/api/utils.md): Helper functions and utilities
## Examples
- [Blog](https://fastweb.dev/examples/blog.md): Building a blog with markdown
- [E-commerce](https://fastweb.dev/examples/ecommerce.md): Product catalog and checkout
- [Dashboard](https://fastweb.dev/examples/dashboard.md): Admin dashboard with charts
- [Real-time Chat](https://fastweb.dev/examples/chat.md): WebSocket-based chat app
## Integrations
- [Databases](https://fastweb.dev/integrations/databases.md): PostgreSQL, MySQL, MongoDB
- [CSS Frameworks](https://fastweb.dev/integrations/css.md): Tailwind, Bootstrap, etc.
- [Analytics](https://fastweb.dev/integrations/analytics.md): Google Analytics, Plausible
- [CMS](https://fastweb.dev/integrations/cms.md): Headless CMS integrations
## Optional
- [FastWeb Blog](https://fastweb.dev/blog/): Tutorials and announcements
- [Showcase](https://fastweb.dev/showcase/): Sites built with FastWeb
- [Community](https://fastweb.dev/community/): Discord, GitHub discussions
- [Changelog](https://fastweb.dev/changelog/): Version history
```
### Why This Structure?
- **Framework Features**: Lists core capabilities upfront
- **Documentation**: Core framework concepts and features
- **Guides**: Task-oriented how-to guides (authentication, deployment, etc.)
- **API Reference**: Technical reference for configuration and APIs
- **Examples**: Complete application examples
- **Integrations**: Third-party tool integration guides
- **Optional**: Community and showcase resources
---
## Example 4: Claude Skill
### Project Context
A Claude skill for optimizing documentation (this project!).
### Generated llms.txt
```markdown
# c7score-optimizer
> A Claude skill that optimizes project documentation and README files to score highly
> on Context7's c7score benchmark, making docs more effective for AI-assisted coding tools.
> Also generates llms.txt files for projects.
The skill provides:
- Documentation analysis and quality assessment
- Question-driven content restructuring
- Code snippet enhancement with context
- llms.txt file generation
- Python analysis script for automated scanning
## Documentation
- [README](https://github.com/example/c7score-optimizer/blob/main/README.md): Overview, installation, and usage
- [Skill Definition](https://github.com/example/c7score-optimizer/blob/main/SKILL.md): Complete skill workflow and instructions
- [Changelog](https://github.com/example/c7score-optimizer/blob/main/CHANGELOG.md): Version history and updates
## Reference Materials
- [C7Score Metrics](https://github.com/example/c7score-optimizer/blob/main/references/c7score_metrics.md): Understanding the c7score benchmark
- [Optimization Patterns](https://github.com/example/c7score-optimizer/blob/main/references/optimization_patterns.md): 20+ transformation patterns
- [llms.txt Format](https://github.com/example/c7score-optimizer/blob/main/references/llmstxt_format.md): Complete llms.txt specification
## Examples
- [README Optimization](https://github.com/example/c7score-optimizer/blob/main/examples/sample_readme.md): Before/after documentation transformation
- [llms.txt Generation](https://github.com/example/c7score-optimizer/blob/main/examples/sample_llmstxt.md): Generated llms.txt examples
## Development
- [Analysis Script](https://github.com/example/c7score-optimizer/blob/main/scripts/analyze_docs.py): Python tool for documentation scanning
- [Contributing](https://github.com/example/c7score-optimizer/blob/main/CONTRIBUTING.md): How to contribute improvements
## Optional
- [Context7 c7score](https://www.context7.ai/c7score): Official c7score benchmark
- [llmstxt.org](https://llmstxt.org/): Official llms.txt specification
- [Claude Code Docs](https://docs.claude.com/claude-code): Claude Code documentation
```
### Why This Structure?
- **Skill Capabilities**: Clear explanation of what the skill does
- **Documentation**: Essential files (README, SKILL.md, CHANGELOG)
- **Reference Materials**: Detailed specifications and patterns
- **Examples**: Practical before/after demonstrations
- **Development**: Tools and contribution guides
- **Optional**: External resources and official documentation
---
## Key Patterns Across All Examples
### 1. Strong Opening
Every example has:
- Clear H1 with project name
- Informative blockquote explaining what it is
- Key features/principles in bullets
### 2. Logical Section Progression
Common pattern:
1. **Getting Started / Documentation** (high priority)
2. **API / Commands / Core Features** (high priority)
3. **Guides / Examples** (practical applications)
4. **Development / Contributing** (for contributors)
5. **Optional** (secondary resources)
### 3. Descriptive Links
All links include:
- Clear, action-oriented titles
- Helpful descriptions after colons
- Context about what each resource contains
### 4. Full URLs
All examples use complete URLs with protocol:
- ✅ `https://example.com/docs/guide.md`
- ❌ `/docs/guide.md`
- ❌ `../guide.md`
### 5. Markdown-First
Prefer linking to `.md` files:
- ✅ `docs/guide.md`
- ⚠️ `docs/guide.html` (acceptable if no .md available)
---
## Decision Tree: What Sections to Include?
### For Libraries/Packages
- **Must have**: Documentation, API Reference, Examples
- **Should have**: Getting Started, Development
- **Nice to have**: Guides, Integrations, Optional
### For CLI Tools
- **Must have**: Getting Started, Commands, Examples
- **Should have**: Configuration, Development
- **Nice to have**: Plugins, Troubleshooting, Optional
### For Frameworks
- **Must have**: Documentation, Guides, API Reference, Examples
- **Should have**: Integrations, Getting Started
- **Nice to have**: Showcase, Optional
### For Skills/Plugins
- **Must have**: Documentation, Reference Materials
- **Should have**: Examples, Development
- **Nice to have**: Optional (external resources)
---
## Common Customizations by Project Type
### Open Source Project
Add to Optional:
- Contributing guide
- Code of conduct
- Governance
- Roadmap
### Commercial Product
Add sections:
- Pricing/Plans
- Support
- Enterprise features
- Migration guides
### Educational Resource
Add sections:
- Tutorials
- Video courses
- Exercises
- Certification
### Research Project
Add sections:
- Papers
- Datasets
- Experiments
- Citations
---
## Anti-Patterns to Avoid
### ❌ Too Granular
```markdown
## Installation
- [macOS Install](url)
- [Linux Install](url)
- [Windows Install](url)
- [Docker Install](url)
```
Better: One "Installation" link covering all platforms
### ❌ No Descriptions
```markdown
- [Guide](url)
- [Docs](url)
- [API](url)
```
Better: Add helpful context after colons
### ❌ Outdated Links
```markdown
- [Guide](https://example.com/v1/guide.md)
```
Better: Link to latest version or version-agnostic URLs
### ❌ Relative URLs
```markdown
- [Guide](../docs/guide.md)
```
Better: Use full URLs with protocol
### ❌ Too Much Content
Don't paste entire documentation. Link to it.
---
Use these examples as templates when generating llms.txt files for different project types!


@@ -0,0 +1,352 @@
# Example: Before and After C7Score Optimization
This example demonstrates the transformation of a typical README into an optimized version that scores higher on the c7score benchmark.
---
## BEFORE Optimization (Low c7score)
````markdown
# DataFlow
![Build Status](https://img.shields.io/badge/build-passing-brightgreen)
![Version](https://img.shields.io/badge/version-2.3.1-blue)
Last updated: March 15, 2024
DataFlow is a Python library for processing data streams.
## Features
- Stream processing
- Data transformation
- Multiple output formats
- Error handling
## Installation
```bash
pip install dataflow
```
## Usage
```python
from dataflow import Stream
s = Stream()
```
## API
See the API documentation for details.
## License
MIT
````
### Issues with the BEFORE version:
- ❌ Metadata badges and timestamps (reduce c7score)
- ❌ Vague descriptions without concrete examples
- ❌ Code snippets lack context and imports
- ❌ No question-driven structure
- ❌ Missing error handling examples
- ❌ No progressive complexity
- ❌ Features listed without demonstrations
- ❌ Incomplete quickstart example
---
## AFTER Optimization (High c7score)
````markdown
# DataFlow
A Python library for processing data streams with real-time transformations and multiple output formats.
## Quick Start
### How do I install DataFlow?
```bash
pip install dataflow
```
For development installation:
```bash
git clone https://github.com/example/dataflow.git
cd dataflow
pip install -e ".[dev]"
```
### How do I process my first data stream?
```python
from dataflow import Stream, Processor
# Create a stream and add data
stream = Stream()
stream.add_data([1, 2, 3, 4, 5])
# Process the stream
processor = Processor()
result = processor.transform(stream, lambda x: x * 2)
print(result.to_list()) # Output: [2, 4, 6, 8, 10]
```
### How do I handle different data sources?
```python
from dataflow import Stream, FileSource, APISource
# Load from a file
file_stream = Stream.from_source(
FileSource('data.csv', format='csv')
)
# Load from an API
api_stream = Stream.from_source(
APISource('https://api.example.com/data', auth_token='your_token')
)
# Process both streams
for item in file_stream:
print(f"Processing: {item}")
```
## Common Use Cases
### How do I transform data in a stream?
```python
from dataflow import Stream, Transformer
stream = Stream.from_list([
{'name': 'Alice', 'age': 30},
{'name': 'Bob', 'age': 25}
])
# Apply transformations
transformed = stream.map(lambda x: {
**x,
'age_group': 'adult' if x['age'] >= 18 else 'minor'
})
for person in transformed:
print(person)
# Output: {'name': 'Alice', 'age': 30, 'age_group': 'adult'}
# {'name': 'Bob', 'age': 25, 'age_group': 'adult'}
```
### How do I filter data in a stream?
```python
from dataflow import Stream
stream = Stream.from_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Filter even numbers
evens = stream.filter(lambda x: x % 2 == 0)
print(evens.to_list()) # Output: [2, 4, 6, 8, 10]
```
### How do I export stream data to different formats?
```python
from dataflow import Stream
stream = Stream.from_list([
{'id': 1, 'value': 100},
{'id': 2, 'value': 200}
])
# Export to JSON
stream.to_json('output.json', indent=2)
# Export to CSV
stream.to_csv('output.csv', headers=True)
# Export to parquet (requires pyarrow)
stream.to_parquet('output.parquet', compression='snappy')
```
## Error Handling
### How do I handle errors during stream processing?
```python
from dataflow import Stream, ProcessingError
stream = Stream.from_list([1, 2, 'invalid', 4, 5])
try:
result = stream.map(lambda x: x * 2)
except ProcessingError as e:
print(f"Error processing stream: {e}")
print(f"Failed at item: {e.item}")
print(f"Position: {e.position}")
```
### How do I skip invalid items in a stream?
```python
from dataflow import Stream
stream = Stream.from_list([1, 2, 'invalid', 4, 5])
# Skip invalid items by returning None for anything that is not a number
def safe_transform(x):
    if not isinstance(x, (int, float)):
        return None
    return x * 2
result = stream.map(safe_transform).filter(lambda x: x is not None)
print(result.to_list()) # Output: [2, 4, 8, 10]
```
## Advanced Usage
### How do I chain multiple transformations?
```python
from dataflow import Stream
result = (
Stream.from_list([1, 2, 3, 4, 5])
.filter(lambda x: x > 2) # Keep values > 2
.map(lambda x: x * 2) # Double them
.map(lambda x: {'value': x}) # Convert to dict
.to_list()
)
print(result)
# Output: [{'value': 6}, {'value': 8}, {'value': 10}]
```
### How do I process streams in parallel?
```python
from dataflow import Stream, ParallelProcessor
stream = Stream.from_list(range(1000))
# Process with 4 worker threads
processor = ParallelProcessor(workers=4)
result = processor.map(stream, lambda x: expensive_operation(x))
for item in result:
print(item)
```
### How do I aggregate data from a stream?
```python
from dataflow import Stream
stream = Stream.from_list([1, 2, 3, 4, 5])
# Calculate sum
total = stream.reduce(lambda acc, x: acc + x, initial=0)
print(f"Sum: {total}") # Output: Sum: 15
# Calculate average
count = stream.count()
average = total / count
print(f"Average: {average}") # Output: Average: 3.0
```
## Performance
### How do I optimize stream processing performance?
```python
from dataflow import Stream, BufferedStream
# Use buffering for better performance
stream = BufferedStream.from_source(
source=large_data_source,
buffer_size=1000 # Process in chunks of 1000
)
# Enable lazy evaluation
result = stream.lazy().map(transform_fn).filter(filter_fn)
# Only evaluate when needed
final_data = result.to_list()
```
## Requirements
- Python 3.7 or higher
- Optional dependencies:
- `pyarrow` for Parquet support
- `pandas` for DataFrame integration
## Installation Options
Standard installation:
```bash
pip install dataflow
```
With optional dependencies:
```bash
pip install dataflow[parquet] # For Parquet support
pip install dataflow[pandas] # For pandas integration
pip install dataflow[all] # All optional features
```
## License
MIT License - see LICENSE file for details
````
### Improvements in the AFTER version:
- ✅ Removed metadata badges and timestamps
- ✅ Question-driven headers throughout
- ✅ Complete code examples with imports and context
- ✅ Progressive complexity (basic → advanced)
- ✅ Error handling examples
- ✅ Multiple use cases demonstrated
- ✅ Concrete outputs shown in comments
- ✅ Installation options clearly explained
- ✅ Common questions answered with working code
---
## C7Score Impact Estimate
### BEFORE Version Metrics:
- Question-Snippet Matching: ~40/100 (incomplete examples, poor alignment)
- LLM Evaluation: ~50/100 (vague descriptions)
- Formatting: ~70/100 (basic markdown, code blocks present)
- Metadata Removal: ~30/100 (badges and timestamps present)
- Initialization Examples: ~50/100 (incomplete quickstart)
**Estimated BEFORE c7score: ~45/100**
### AFTER Version Metrics:
- Question-Snippet Matching: ~90/100 (excellent Q&A alignment)
- LLM Evaluation: ~95/100 (comprehensive, clear)
- Formatting: ~95/100 (proper structure, complete blocks)
- Metadata Removal: ~100/100 (all noise removed)
- Initialization Examples: ~95/100 (complete, progressive)
**Estimated AFTER c7score: ~92/100**
---
## Key Transformation Patterns Used
1. **Question Headers**: "Installation" → "How do I install DataFlow?"
2. **Complete Examples**: Added imports, setup, and expected outputs
3. **Progressive Complexity**: Basic → Common → Advanced sections
4. **Error Scenarios**: Dedicated error handling examples
5. **Concrete Outputs**: Included actual output in code comments
6. **Noise Removal**: Stripped badges and timestamps
7. **Context Addition**: Every snippet is runnable as-is
8. **Multiple Paths**: Showed different ways to achieve goals
Use this example as a template for optimizing your own documentation!


@@ -0,0 +1,349 @@
# C7Score Metrics Reference
## Overview
c7score evaluates documentation quality for Context7 using 5 metrics divided into two groups:
- **LLM Analysis** (Metrics 1-2): AI-powered evaluation
- **Text Analysis** (Metrics 3-5): Rule-based checks
## Metric 1: Question-Snippet Comparison (LLM)
**What it measures:** How well code snippets answer common developer questions about the library.
**Scoring approach:**
- LLM generates 15 common questions developers might ask about the library
- Each snippet is evaluated on how well it answers these questions
- Higher scores for snippets that directly address practical usage questions
**Optimization strategies:**
- Include code examples that answer "how do I..." questions
- Provide working code snippets for common use cases
- Address setup, configuration, and basic operations
- Show real-world usage patterns, not just API signatures
- Include examples that demonstrate the library's main features
**What scores well:**
- "How do I initialize the client?" with full working example
- "How do I handle authentication?" with complete code
- "How do I make a basic query?" with error handling included
**What scores poorly:**
- Partial code that doesn't run standalone
- API reference without usage examples
- Theoretical explanations without practical code
## Metric 2: LLM Evaluation (LLM)
**What it measures:** Overall snippet quality including relevancy, clarity, and correctness.
**Scoring criteria:**
- **Relevancy**: Does the snippet provide useful information about the library?
- **Clarity**: Is the code and explanation easy to understand?
- **Correctness**: Is the code syntactically correct and using proper APIs?
- **Uniqueness**: Are snippets providing unique information or duplicating content?
**Optimization strategies:**
- Ensure each snippet provides distinct, valuable information
- Use clear variable names and structure
- Add brief explanatory comments where helpful
- Verify all code is syntactically correct
- Remove or consolidate duplicate snippets
- Test code examples to ensure they work
**What causes low scores:**
- High rate of duplicate snippets (>25% identical copies)
- Unclear or confusing code structure
- Syntax errors or incorrect API usage
- Snippets that don't add new information
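The duplicate-rate check in particular is easy to estimate mechanically. A minimal sketch, assuming exact matches after whitespace normalization count as copies (the 25% threshold mirrors the figure above; everything else is illustrative, not the official c7score implementation):
```python
import hashlib
import re

def duplicate_snippet_rate(markdown_text):
    """Fraction of fenced code blocks that exactly duplicate an earlier block."""
    blocks = re.findall(r"```[^\n]*\n(.*?)```", markdown_text, flags=re.DOTALL)
    seen, duplicates = set(), 0
    for block in blocks:
        digest = hashlib.sha256(" ".join(block.split()).encode()).hexdigest()
        if digest in seen:
            duplicates += 1
        seen.add(digest)
    return duplicates / len(blocks) if blocks else 0.0

with open("README.md", encoding="utf-8") as f:  # path is illustrative
    rate = duplicate_snippet_rate(f.read())
print(f"Duplicate snippet rate: {rate:.0%}")    # >25% is the red-flag level noted above
```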
## Metric 3: Formatting (Text Analysis)
**What it measures:** Whether snippets have the expected format and structure.
**Checks performed:**
- Are categories missing? (e.g., no title, description, or code)
- Are code snippets too short or too long?
- Are language tags actually descriptions? (e.g., "FORTE Build System Configuration")
- Are languages set to "none" or showing console output?
- Is the code just a list or argument descriptions?
**Optimization strategies:**
- Follow consistent snippet structure: TITLE / DESCRIPTION / CODE
- Use 40-dash delimiters between snippets (----------------------------------------)
- Set proper language tags (python, javascript, typescript, bash, etc.)
- Avoid very short snippets (<3 lines) unless absolutely necessary
- Avoid very long snippets (>100 lines) - break into focused examples
- Don't use lists in place of code
**Example good format:**
````
Getting Started with Authentication
----------------------------------------
Initialize the client with your API key and authenticate requests.
```python
from library import Client
client = Client(api_key="your_api_key")
client.authenticate()
```
````
**What to avoid:**
- Language tags like "CLI Arguments" or "Configuration File"
- Pretty-printed tables instead of code
- Numbered/bulleted lists masquerading as code
- Missing titles or descriptions
- Inconsistent formatting
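A rough sketch of how these formatting checks could be automated (the accepted tag set is an illustrative assumption, the 3-100 line bounds come from the guidance above, and this is not the official c7score check):
```python
import re

KNOWN_TAGS = {"python", "javascript", "typescript", "bash", "json", "yaml", "go", "rust"}

def formatting_issues(markdown_text):
    """Flag fences with missing or descriptive language tags and unusual lengths."""
    issues = []
    for match in re.finditer(r"```([^\n]*)\n(.*?)```", markdown_text, flags=re.DOTALL):
        tag, body = match.group(1).strip(), match.group(2)
        line_count = len(body.strip().splitlines())
        if not tag:
            issues.append("Code block has no language tag")
        elif tag.lower() not in KNOWN_TAGS:
            issues.append(f"Language tag looks like a description: {tag!r}")
        if line_count < 3:
            issues.append(f"Very short snippet ({line_count} line(s))")
        elif line_count > 100:
            issues.append(f"Very long snippet ({line_count} line(s))")
    return issues

print(formatting_issues("```FORTE Build System Configuration\nconfig = load()\n```"))
```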
## Metric 4: Project Metadata (Text Analysis)
**What it measures:** Presence of irrelevant project information that doesn't help developers use the library.
**Checks performed:**
- BibTeX citations (would have language tag "Bibtex")
- Licensing information
- Directory structure listings
- Project governance or administrative content
**Optimization strategies:**
- Remove or minimize licensing snippets
- Avoid directory tree representations
- Don't include citation information
- Focus on usage, not project management
- Keep administrative content out of code documentation
**What to remove or relocate:**
- LICENSE files or license text
- CONTRIBUTING.md guidelines
- Directory listings or project structure
- Academic citations (BibTeX, APA, etc.)
- Governance policies
**Exception:** Brief installation or setup instructions that mention directories are okay if needed for library usage.
## Metric 5: Initialization (Text Analysis)
**What it measures:** Snippets that are only imports or installations without meaningful content.
**Checks performed:**
- Snippets that are just import statements
- Snippets that are just installation commands (pip install, npm install)
- No additional context or usage examples
**Optimization strategies:**
- Combine imports with usage examples
- Show installation in context of setup process
- Always follow imports with actual usage code
- Make installation snippets include next steps
**Good approach:**
```python
# Installation and basic usage
# First install: pip install library-name
from library import Client
# Initialize and make your first request
client = Client()
result = client.get_data()
```
**Poor approach:**
```python
# Just imports
import library
from library import Client
```
```bash
# Just installation
pip install library-name
```
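Initialization-only snippets can be flagged with simple pattern matching; here is a minimal sketch (the patterns below are illustrative heuristics, not the official check):
```python
import re

SETUP_LINE = re.compile(
    r"^(import\s+\w|from\s+\w[\w.]*\s+import\s|pip install\s|npm install\s|#)"
)

def is_initialization_only(code):
    """True if the snippet contains only imports, install commands, or comments."""
    lines = [line.strip() for line in code.strip().splitlines() if line.strip()]
    return bool(lines) and all(SETUP_LINE.match(line) for line in lines)

print(is_initialization_only("import library\nfrom library import Client"))     # True
print(is_initialization_only("from library import Client\nclient = Client()"))  # False
```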
## Scoring Weights
Default c7score weights (can be customized):
- Question-Snippet Comparison: 0.8 (80%)
- LLM Evaluation: 0.05 (5%)
- Formatting: 0.05 (5%)
- Project Metadata: 0.05 (5%)
- Initialization: 0.05 (5%)
The question-answer metric dominates because Context7's primary goal is helping developers answer practical questions about library usage.
## Overall Best Practices
1. **Focus on answering questions**: Think "How would a developer actually use this?"
2. **Provide complete, working examples**: Not just fragments
3. **Ensure uniqueness**: Each snippet should teach something new
4. **Structure consistently**: TITLE / DESCRIPTION / CODE format
5. **Use proper language tags**: python, javascript, typescript, etc.
6. **Remove noise**: No licensing, directory trees, or pure imports
7. **Test your code**: All examples should be syntactically correct
8. **Keep it practical**: Real-world usage beats theoretical explanation
---
## Self-Evaluation Rubrics
When evaluating documentation quality using c7score methodology, use these detailed rubrics:
### 1. Question-Snippet Matching Rubric (80% weight)
**Score: 90-100 (Excellent)**
- All major developer questions have complete answers
- Code examples are self-contained and runnable
- Examples include imports, setup, and usage context
- Common use cases are clearly demonstrated
- Error handling is shown where relevant
- Examples progress from simple to advanced
**Score: 70-89 (Good)**
- Most questions are answered with working code
- Examples are mostly complete but may miss minor details
- Some context or imports may be implicit
- Common use cases covered
- Minor gaps in error handling
**Score: 50-69 (Fair)**
- Some questions answered, others partially addressed
- Examples require significant external knowledge
- Missing imports or setup context
- Limited use case coverage
- Error handling largely absent
**Score: 30-49 (Poor)**
- Few questions fully answered
- Examples are fragments without context
- Unclear how to actually use the code
- Major use cases not covered
- No error handling
**Score: 0-29 (Very Poor)**
- Questions not addressed in documentation
- No practical examples
- Only API signatures without usage
- Cannot determine how to use the library
### 2. LLM Evaluation Rubric (10% weight)
**Unique Information (30% of metric):**
- 100%: Every snippet provides unique value, no duplicates
- 75%: Minimal duplication, mostly unique content
- 50%: Some repeated information across snippets
- 25%: Significant duplication
- 0%: Many duplicate snippets
**Clarity (30% of metric):**
- 100%: Well-worded, professional, no errors
- 75%: Clear with minor grammar/wording issues
- 50%: Understandable but awkward phrasing
- 25%: Confusing or poorly worded
- 0%: Unclear, incomprehensible
**Correct Syntax (40% of metric):**
- 100%: All code syntactically perfect
- 75%: Minor syntax issues (missing semicolons, etc.)
- 50%: Some syntax errors but code is recognizable
- 25%: Multiple syntax errors
- 0%: Code is not valid
**Final LLM Evaluation Score** = (Unique×0.3) + (Clarity×0.3) + (Syntax×0.4)
### 3. Formatting Rubric (5% weight)
**Score: 100 (Perfect)**
- All snippets have proper language tags (python, javascript, etc.)
- Language tags are actual languages, not descriptions
- All code blocks use triple backticks with language
- Code blocks are properly closed
- No lists within CODE sections
- Minimum length requirements met (5+ words)
**Score: 80-99 (Minor Issues)**
- 1-2 snippets missing language tags
- One or two incorrectly formatted blocks
- Minor inconsistencies
**Score: 50-79 (Multiple Problems)**
- Several snippets missing language tags
- Some use descriptive strings instead of language names
- Inconsistent formatting
**Score: 0-49 (Significant Issues)**
- Many snippets improperly formatted
- Widespread use of wrong language tags
- Code not in proper blocks
### 4. Metadata Removal Rubric (2.5% weight)
**Score: 100 (Clean)**
- No license text in code examples
- No citation formats (BibTeX, RIS)
- No directory structure listings
- No project metadata
- Pure code and usage examples
**Score: 75-99 (Minimal Metadata)**
- One or two snippets with minor metadata
- Brief license mentions that don't dominate
**Score: 50-74 (Some Metadata)**
- Several snippets include project metadata
- Directory structures present
- Some citation content
**Score: 0-49 (Heavy Metadata)**
- Significant license/citation content
- Multiple directory listings
- Project metadata dominates
### 5. Initialization Rubric (2.5% weight)
**Score: 100 (Excellent)**
- All examples show usage beyond setup
- Installation combined with first usage
- Imports followed by practical examples
- No standalone import/install snippets
**Score: 75-99 (Mostly Good)**
- 1-2 snippets are setup-only
- Most examples show actual usage
**Score: 50-74 (Some Init-Only)**
- Several snippets are just imports/installation
- Mixed quality
**Score: 0-49 (Many Init-Only)**
- Many snippets are only imports
- Many snippets are only installation
- Lack of usage examples
### Scoring Best Practices
**When evaluating:**
1. **Read entire documentation** before scoring
2. **Count specific examples** (e.g., "7 out of 10 snippets...")
3. **Be consistent** between before/after evaluations
4. **Explain scores** with concrete evidence
5. **Use percentages** when quantifying (e.g., "80% of examples...")
6. **Identify improvements** specifically
7. **Calculate weighted average**: (Q×0.8) + (L×0.1) + (F×0.05) + (M×0.025) + (I×0.025)
**Example Calculation:**
- Question-Snippet: 85/100 × 0.8 = 68
- LLM Evaluation: 90/100 × 0.1 = 9
- Formatting: 100/100 × 0.05 = 5
- Metadata: 100/100 × 0.025 = 2.5
- Initialization: 95/100 × 0.025 = 2.375
- **Total: 86.875 ≈ 87/100**
### Common Scoring Mistakes to Avoid
- **Being too generous**: Score based on evidence, not potential
- **Ignoring weights**: Question-answer matters most (80%)
- **Vague explanations**: Say "5 of 8 examples lack imports", not "some issues"
- **Inconsistent standards**: Apply the same rubric to before/after
- **Forgetting context**: Consider project type and audience

**Be specific, objective, and consistent.**


@@ -0,0 +1,406 @@
# llms.txt Format Specification
This document provides a complete reference for creating llms.txt files according to the official specification at https://llmstxt.org/
## Overview
**llms.txt** is a standardized markdown file format designed to provide LLM-friendly content summaries and documentation. It solves a critical problem: context windows are too small to handle most websites in their entirety.
### Purpose
- Provides brief background information, guidance, and links to detailed markdown files
- Optimized for consumption by language models and AI agents
- Used at inference time when users explicitly request information
- Helps LLMs navigate documentation, understand projects, and access the right resources
- Enables chatbots with search functionality to retrieve relevant information efficiently
### Why Markdown?
The specification uses markdown rather than XML/JSON because "we expect many of these files to be read by language models and agents" while still being "readable using standard programmatic-based tools."
## File Structure
The format follows a specific structural hierarchy:
1. **H1 heading** (`# Title`) - **REQUIRED**
2. **Blockquote summary** (`> text`) - Optional but recommended
3. **Descriptive content** (paragraphs, bullet lists) - Optional
4. **H2-delimited sections** (`## Section Name`) with file lists - Optional
### Basic Template
```markdown
# Project Name
> Brief summary of what this project does and why it exists.
- Key principle or feature
- Another important concept
- Third key point
## Documentation
- [Main Guide](https://example.com/docs/guide.md): Getting started guide
- [API Reference](https://example.com/docs/api.md): Complete API documentation
## Examples
- [Basic Usage](https://example.com/examples/basic.md): Simple examples
- [Advanced Patterns](https://example.com/examples/advanced.md): Complex use cases
## Optional
- [Blog](https://example.com/blog/): Latest news and updates
- [Community](https://example.com/community/): Join discussions
```
## Required Elements
### H1 Title (Required)
The project or site name - this is the **ONLY mandatory element**.
```markdown
# Project Name
```
## Optional Elements
### Blockquote Summary (Recommended)
Brief project description with key information necessary for understanding the rest of the file.
```markdown
> Project Name is a Python library for data processing. It provides efficient
> stream transformations and supports multiple output formats.
```
### Descriptive Content (Optional)
Any markdown content **EXCEPT headings**. Use paragraphs, bullet lists, etc.
```markdown
Key features:
- Fast stream processing with lazy evaluation
- Built-in error handling and recovery
- Zero-dependency core library
- Extensible plugin system
Project Name follows these principles:
1. Simplicity over complexity
2. Performance by default
3. Developer experience first
```
**Important:** Do NOT use H2, H3, or other headings in descriptive content. Only H1 (title) and H2 (section headers) are allowed.
### File List Sections (Optional)
H2-headed sections containing links to resources.
```markdown
## Section Name
- [Link Title](https://full-url): Optional description or notes about the resource
- [Another Link](https://url): More details here
```
## Link Format Requirements
Each file list entry must follow this exact pattern:
```markdown
- [Link Title](https://full-url): Optional description
```
### Rules:
1. Use markdown bullet lists (`-`)
2. Include markdown hyperlinks `[name](url)`
3. Optionally add `:` followed by notes about the file
4. Links should point to markdown versions of documentation (preferably `.md` files)
5. Use full URLs, not relative paths
### Examples:
```markdown
## Documentation
- [Quick Start](https://docs.example.com/quickstart.md): Get up and running in 5 minutes
- [Configuration Guide](https://docs.example.com/config.md): All configuration options explained
- [API Reference](https://docs.example.com/api.md): Complete API documentation with examples
```
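To lint entries automatically, here is a minimal Python sketch; the regex is an illustrative approximation of the pattern above, not part of the official specification:

```python
import re

# One llms.txt file-list entry: "- [Title](https://full-url): optional note"
ENTRY_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>https?://[^\s)]+)\)(?::\s*(?P<note>.+))?$"
)

line = "- [Quick Start](https://docs.example.com/quickstart.md): Get up and running in 5 minutes"
match = ENTRY_RE.match(line)
if match:
    print(match.group("title"), "->", match.group("url"))
else:
    print("Entry does not follow the expected link format")
```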
## Special Sections
### Optional Section
The **"Optional"** section has special meaning: content here can be skipped when shorter context is needed.
```markdown
## Optional
- [Blog](https://example.com/blog/): Latest news about the project
- [Case Studies](https://example.com/cases/): Real-world usage examples
- [Video Tutorials](https://example.com/videos/): Visual learning resources
```
Use this section for:
- Secondary resources
- Community links
- Blog posts and news
- Extended tutorials
- Background reading
## Common Section Names
### Documentation-focused Projects
```markdown
## Documentation
- Core docs, guides, tutorials
## API Reference
- Function references, method documentation
## Examples
- Code samples, patterns, recipes
## Guides
- How-to guides, best practices
## Development
- Contributing, setup, testing
## Optional
- Blog, community, extended resources
```
### Tool/CLI Projects
```markdown
## Getting Started
- Installation, quickstart
## Commands
- CLI reference, usage examples
## Configuration
- Config files, options
## Examples
- Common workflows, patterns
## Optional
- Advanced usage, plugins
```
### Framework Projects
```markdown
## Core Concepts
- Architecture, principles
## Documentation
- Guides, tutorials
## API Reference
- Component APIs, hooks
## Examples
- Starter templates, patterns
## Plugins/Integrations
- Extensions, third-party tools
## Optional
- Blog, showcase, community
```
## File Placement
### Repository Location
Place at **`/llms.txt`** in the repository root, alongside `README.md`.
### Web Serving
For websites, serve at the root path `/llms.txt` (e.g., `https://example.com/llms.txt`).
### Companion Files
You can create expanded versions:
- `llms-ctx.txt` - Expanded content without URLs
- `llms-ctx-full.txt` - Expanded content with URLs
For referenced pages, create markdown versions:
- `page.html` → `page.html.md`
- Or use `index.html.md` for pages without filenames
## Best Practices
### Content Guidelines
1. **Be Concise**: Use clear, brief language
2. **Avoid Jargon**: Explain technical terms or link to explanations
3. **Information Hierarchy**: Most important content first
4. **Test with LLMs**: Verify that language models can understand your content
5. **Keep Updated**: Maintain accuracy as your project evolves
### Link Best Practices
1. **Descriptive Titles**: Use meaningful link text (not "click here")
2. **Helpful Notes**: Add context after colons to explain what each resource contains
3. **Stable URLs**: Link to permanent, versioned documentation
4. **Markdown Files**: Prefer `.md` files over HTML when possible
5. **Complete URLs**: Use full URLs with protocol (https://)
### Organizational Strategy
1. **Start with Essentials**: Put most important docs first
2. **Logical Grouping**: Group related resources under descriptive H2 headings
3. **Progressive Detail**: Basic → Intermediate → Advanced
4. **Optional Last**: Secondary resources go in the "Optional" section
5. **Consistent Format**: Use the same link format throughout
## Examples from the Wild
### Real-World Implementations
- **Astro**: https://docs.astro.build/llms.txt
- **FastHTML**: https://www.fastht.ml/docs/llms.txt
- **Shopify**: https://shopify.dev/llms.txt
- **Strapi**: https://docs.strapi.io/llms.txt
- **Modal**: https://modal.com/llms.txt
### Example: FastHTML Style
```markdown
# FastHTML
> FastHTML is a Python library for building web applications using pure Python.
FastHTML follows these principles:
- Write HTML in Python with no JavaScript required
- Use standard Python patterns and idioms
- Deploy anywhere Python runs
## Documentation
- [Tutorial](https://docs.fastht.ml/tutorial.md): Step-by-step introduction
- [Reference](https://docs.fastht.ml/reference.md): Complete API reference
- [Examples](https://docs.fastht.ml/examples.md): Common patterns and recipes
## Optional
- [FastHTML Blog](https://fastht.ml/blog/): Latest updates
```
### Example: Framework Style
```markdown
# Astro
> Astro is an all-in-one web framework for building fast, content-focused websites.
- Uses island architecture for better performance
- Server-first design with minimal client JavaScript
- Supports React, Vue, Svelte, and other UI frameworks
- Zero JavaScript by default
## Documentation Sets
- [Getting Started](https://docs.astro.build/getting-started.md): Installation and first project
- [Core Concepts](https://docs.astro.build/core-concepts.md): Islands, components, routing
- [Complete Docs](https://docs.astro.build/llms-full.txt): Full documentation set
## API Reference
- [Configuration](https://docs.astro.build/reference/configuration.md): astro.config.mjs options
- [CLI Commands](https://docs.astro.build/reference/cli.md): Command-line reference
- [Integrations API](https://docs.astro.build/reference/integrations.md): Building integrations
## Optional
- [Astro Blog](https://astro.build/blog/): Development news
- [Showcase](https://astro.build/showcase/): Sites built with Astro
```
## Allowed Markdown Elements
### Supported
- `#` H1 for title (required)
- `##` H2 for section headers
- `>` Blockquotes for summary
- `-` Bullet lists
- `[text](url)` Markdown links
- `:` Colon separator for notes after links
- Plain paragraphs
- Numbered lists (`1.`, `2.`, etc.)
### Not Used/Forbidden
- H3, H4, H5, H6 headings in descriptive content
- XML, JSON, or other structured formats
- Complex markdown tables
- Images (focus on text content)
- Code blocks (link to them instead)
## Tools and Integration
### CLI Tool
`llms_txt2ctx` - Command-line tool for processing and expanding llms.txt files
### Framework Plugins
- **VitePress**: https://github.com/okineadev/vitepress-plugin-llms
- **Docusaurus**: https://github.com/rachfop/docusaurus-plugin-llms
- **Drupal**: https://www.drupal.org/project/llm_support
- **PHP**: https://github.com/raphaelstolt/llms-txt-php
### Directories
- https://llmstxt.site/ - Directory of available llms.txt files
- https://directory.llmstxt.cloud/ - Community directory
## Common Mistakes to Avoid
1. **Using Relative URLs**: Always use full URLs with protocol
2. **Too Much Content**: Keep it concise, link to details
3. **Missing Descriptions**: Add helpful notes after link colons
4. **No Structure**: Use H2 sections to organize links
5. **Outdated Links**: Keep URLs current as docs evolve
6. **Complex Formatting**: Stick to simple markdown
7. **No Summary**: Always include a blockquote summary
8. **Wrong File Location**: Must be at repository root as `/llms.txt`
## Validation Checklist
Before publishing your llms.txt, confirm each item below (a small automated sketch follows the checklist):
- ✅ File is named exactly `llms.txt` (lowercase)
- ✅ File is at repository root
- ✅ Has H1 title as first element
- ✅ Has blockquote summary
- ✅ Uses only H1 and H2 headings
- ✅ All links use full URLs
- ✅ Links use proper markdown format `[text](url)`
- ✅ Descriptive notes added after colons where helpful
- ✅ Sections logically organized
- ✅ Essential content comes before optional
- ✅ Links point to markdown files when possible
- ✅ Content is concise and clear
- ✅ Tested with an LLM for comprehension
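A minimal Python sketch that automates a few of the structural items above; link reachability, content quality, and LLM comprehension still need manual review:

```python
import re
import sys
from pathlib import Path

def check_llms_txt(path="llms.txt"):
    """Return a list of structural problems found in an llms.txt file."""
    problems = []
    lines = [l for l in Path(path).read_text(encoding="utf-8").splitlines() if l.strip()]

    # H1 title must be the first element
    if not lines or not lines[0].startswith("# "):
        problems.append("Missing H1 title as the first element")

    # Only H1 and H2 headings are allowed
    for line in lines:
        if re.match(r"^#{3,}\s", line):
            problems.append(f"Disallowed heading level: {line[:40]}")

    # Blockquote summary is recommended
    if not any(line.startswith(">") for line in lines):
        problems.append("No blockquote summary found (recommended)")

    # Link entries should use full URLs with a protocol
    for line in lines:
        if line.startswith("- [") and "](" in line and "](http" not in line:
            problems.append(f"Relative or missing URL: {line[:60]}")

    return problems

if __name__ == "__main__":
    issues = check_llms_txt(sys.argv[1] if len(sys.argv) > 1 else "llms.txt")
    print("\n".join(issues) if issues else "llms.txt passes the structural checks")
```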
## Additional Resources
- **Official Site**: https://llmstxt.org/
- **GitHub**: https://github.com/answerdotai/llms-txt
- **Issues**: https://github.com/AnswerDotAI/llms-txt/issues/new
- **Community**: Discord channel (check official site for link)
## Version
This reference is based on the llms.txt specification as of November 2025. Check https://llmstxt.org/ for the latest updates.


@@ -0,0 +1,428 @@
# Documentation Optimization Patterns for C7Score
## Analysis Workflow
### Step 1: Audit Current Documentation
Review the existing documentation and categorize snippets:
1. **Question-answering snippets**: Count how many snippets directly answer developer questions
2. **API reference snippets**: Count pure API documentation without usage examples
3. **Installation/import-only snippets**: Count snippets that are just setup with no usage
4. **Metadata snippets**: Count licensing, directory structures, citations
5. **Duplicate snippets**: Identify repeated or very similar content
6. **Formatting issues**: Note inconsistent formats, wrong language tags, etc.
### Step 2: Generate Common Questions
Use an LLM to generate 15-20 common questions a developer would ask about the library:
**Example questions:**
- How do I install and set up [library]?
- How do I [main feature 1]?
- How do I [main feature 2]?
- How do I handle errors?
- How do I configure [common setting]?
- What are the authentication options?
- How do I integrate with [common use case]?
- What are the rate limits and how do I handle them?
- How do I use [advanced feature]?
- How do I test code using [library]?
### Step 3: Map Questions to Snippets
Create a mapping (a minimal sketch follows this list):
- Which questions are well-answered by existing snippets?
- Which questions have weak or missing answers?
- Which snippets don't answer any important questions?
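A minimal sketch of such a mapping; the questions and snippet titles below are illustrative placeholders that you would fill in from Steps 1 and 2:

```python
# Hypothetical coverage map: question -> titles of snippets that answer it.
coverage = {
    "How do I install and set up the library?": ["Quick Start: Installation and First Query"],
    "How do I authenticate?": ["Authenticating Your Client"],
    "How do I handle errors?": [],          # weak or missing answer
    "How do I run batch operations?": [],   # weak or missing answer
}

unanswered = [question for question, snippets in coverage.items() if not snippets]
print("Questions needing new or improved snippets:")
for question in unanswered:
    print(f"  - {question}")
```

Snippets that never appear in the map are candidates for removal or consolidation.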
### Step 4: Optimize High-Impact Areas
Focus optimization efforts based on c7score weights:
**Priority 1 (80% of score): Question-Snippet Matching**
- Add missing snippets for unanswered questions
- Enhance snippets that partially answer questions
- Ensure each snippet addresses at least one common question
**Priority 2 (5% each): Other Metrics**
- Remove duplicates
- Fix formatting inconsistencies
- Remove metadata snippets
- Combine import-only snippets with usage
## Snippet Transformation Patterns
### Pattern 1: API Reference → Usage Example
**Before:**
```
Client.authenticate(api_key: str) -> bool
Authenticates the client with the provided API key.
Parameters:
- api_key (str): Your API key
Returns:
- bool: True if authentication succeeded
```
**After:**
````
Authenticating Your Client
----------------------------------------
Authenticate your client using your API key from the dashboard.
```python
from library import Client
# Initialize with your API key
client = Client(api_key="your_api_key_here")
# Authenticate
if client.authenticate():
print("Successfully authenticated!")
else:
print("Authentication failed")
```
````
### Pattern 2: Import-Only → Complete Setup
**Before:**
```python
from library import Client, Query, Response
```
**After:**
````
Quick Start: Making Your First Request
----------------------------------------
Install the library, import the client, and make your first API call.
```python
# Install: pip install library-name
from library import Client
# Initialize and authenticate
client = Client(api_key="your_api_key")
# Make your first request
response = client.query("SELECT * FROM data LIMIT 10")
for row in response:
print(row)
```
````
### Pattern 3: Multiple Small → One Comprehensive
**Before (3 separate snippets):**
```python
client = Client()
```
```python
client.connect()
```
```python
client.query("SELECT * FROM table")
```
**After (1 comprehensive snippet):**
````
Complete Workflow: Connect and Query
----------------------------------------
Full example showing initialization, connection, and querying.
```python
from library import Client
# Initialize the client
client = Client(
api_key="your_api_key",
region="us-west-2"
)
# Establish connection
client.connect()
# Execute query
result = client.query("SELECT * FROM users WHERE active = true")
# Process results
for row in result:
print(f"User: {row['name']}, Email: {row['email']}")
# Close connection
client.close()
```
````
### Pattern 4: Remove Metadata Snippets
**Remove these entirely:**
```
Project Structure
----------------------------------------
myproject/
├── src/
│ ├── main.py
│ └── utils.py
├── tests/
└── README.md
```
```
License
----------------------------------------
MIT License
Copyright (c) 2024...
```
```
Citation
----------------------------------------
@article{library2024,
title={The Library Paper},
...
}
```
## README Optimization
### Structure Your README for High Scores
**1. Quick Start Section (High Priority)**
````markdown
## Quick Start
```python
# Install first: pip install your-library
# Import and use
from your_library import Client
client = Client(api_key="key")
result = client.do_something()
print(result)
```
````
**2. Common Use Cases (High Priority)**
For each major feature, provide:
- Clear section title answering "How do I...?"
- Brief description
- Complete, working code example
- Expected output or result
**3. API Reference (Lower Priority)**
Keep it, but ensure each API method has at least one usage example.
**4. Configuration Examples (Medium Priority)**
Show common configuration scenarios with full context.
**5. Error Handling (Medium Priority)**
Demonstrate proper error handling in realistic scenarios.
### What to Minimize or Remove
- **Installation only**: Always combine with first usage
- **Long lists**: Convert to example-driven content
- **Project governance**: Move to separate CONTRIBUTING.md
- **Licensing**: Link to LICENSE file, don't duplicate
- **Directory trees**: Remove unless essential for setup
- **Academic citations**: Remove from main docs
## Testing Documentation Quality
### Manual Quality Checks
Before finalizing, verify each snippet:
1. ✅ **Can run standalone**: Copy-paste the code and it works (with minimal setup)
2. ✅ **Answers a question**: Clearly addresses a "how do I..." query
3. ✅ **Unique information**: Doesn't duplicate other snippets
4. ✅ **Proper format**: Has title, description, and code with correct language tag
5. ✅ **Practical focus**: Shows real-world usage, not just theory
6. ✅ **Complete imports**: Includes all necessary imports
7. ✅ **No metadata**: No licensing, citations, or directory trees
8. ✅ **Correct syntax**: Code is valid and would actually run
### Question Coverage Matrix
Create a checklist:
- [ ] Installation and setup
- [ ] Basic initialization
- [ ] Authentication methods
- [ ] Primary use case 1
- [ ] Primary use case 2
- [ ] Configuration options
- [ ] Error handling
- [ ] Advanced features
- [ ] Integration examples
- [ ] Testing approaches
Each checkbox should map to at least one high-quality snippet.
## Iteration and Refinement
After creating optimized documentation (a sketch of this loop follows the steps):
1. Run c7score to get baseline metrics
2. Identify lowest-scoring metric
3. Apply targeted improvements for that metric
4. Re-run c7score
5. Repeat until reaching target score (typically 85+)
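A rough sketch of this loop that uses the bundled `scripts/analyze_docs.py` as a quick local proxy between full c7score runs; the issue-rate threshold is an arbitrary example, not a c7score value, and you may need to adjust the import path to where the script lives:

```python
# Quick local feedback loop; run the real c7score for authoritative numbers.
from analyze_docs import analyze_documentation  # from scripts/analyze_docs.py

TARGET_ISSUE_RATE = 0.10  # arbitrary example threshold

analysis = analyze_documentation("README.md")
issue_rate = analysis["snippets_with_issues"] / max(analysis["total_snippets"], 1)

if issue_rate > TARGET_ISSUE_RATE or analysis["duplicates"]:
    print("Keep iterating: fix the lowest-scoring metric first, then re-check.")
    for item in analysis["detailed_issues"][:5]:
        print(item["issues"])
else:
    print("Local checks look clean; run c7score for the authoritative score.")
```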
### Common Score Ranges
- **90-100**: Excellent, comprehensive, question-focused documentation
- **80-89**: Good documentation with some gaps or formatting issues
- **70-79**: Adequate but needs more complete examples or has duplicates
- **60-69**: Significant gaps in question coverage or many formatting issues
- **Below 60**: Major restructuring needed
## Example: Full Snippet Transformation
### Original (Low Score)
````markdown
## Installation
```bash
npm install my-library
```
## Usage
Import the library:
```javascript
const MyLibrary = require('my-library');
```
## API
### connect(options)
Connects to the service.
### query(sql)
Executes a query.
````
### Optimized (High Score)
````markdown
## Getting Started: Installation and First Query
```javascript
// Install the library
// npm install my-library
const MyLibrary = require('my-library');
// Connect to your database
const client = new MyLibrary({
host: 'your-host.example.com',
apiKey: 'your-api-key',
database: 'production'
});
await client.connect();
// Run your first query
const results = await client.query('SELECT * FROM users LIMIT 5');
console.log(results);
// Always close the connection
await client.close();
```
## Common Use Cases
### Authenticating with OAuth
```javascript
const MyLibrary = require('my-library');
// OAuth authentication flow
const client = new MyLibrary({
authMethod: 'oauth',
clientId: 'your-client-id',
clientSecret: 'your-client-secret'
});
// Get auth URL for user
const authUrl = client.getAuthUrl('http://localhost:3000/callback');
console.log(`Visit: ${authUrl}`);
// After user authorizes, exchange code for token
const tokens = await client.exchangeCode(authCode);
await client.connect();
```
### Handling Errors and Retries
```javascript
const MyLibrary = require('my-library');
const client = new MyLibrary({
host: 'your-host.example.com',
apiKey: 'your-api-key',
// Configure automatic retries
retries: 3,
retryDelay: 1000
});
try {
await client.connect();
const results = await client.query('SELECT * FROM users');
console.log(results);
} catch (error) {
if (error.code === 'TIMEOUT') {
console.error('Query timed out, try a smaller result set');
} else if (error.code === 'AUTH_ERROR') {
console.error('Authentication failed, check your API key');
} else {
console.error('Unexpected error:', error.message);
}
} finally {
await client.close();
}
```
### Advanced: Batch Operations
```javascript
const MyLibrary = require('my-library');
const client = new MyLibrary({
host: 'your-host.example.com',
apiKey: 'your-api-key'
});
await client.connect();
// Batch insert for better performance
const users = [
{ name: 'Alice', email: 'alice@example.com' },
{ name: 'Bob', email: 'bob@example.com' },
{ name: 'Charlie', email: 'charlie@example.com' }
];
const result = await client.batchInsert('users', users);
console.log(`Inserted ${result.rowCount} users`);
await client.close();
```
````
**Score Impact:**
- Question coverage: +40 points (answers 4 major questions)
- Removes import-only: +5 points
- Consistent formatting: +5 points
- Working examples: +20 points
- No duplicates: +10 points
- **Total improvement: ~80 point increase**


@@ -0,0 +1,343 @@
#!/usr/bin/env python3
"""
Documentation Analyzer for C7Score Optimization
Analyzes README and documentation files to identify:
- Snippets that are import-only or installation-only
- Potential formatting issues
- Metadata content (licensing, citations, directory structures)
- Duplicate or near-duplicate code blocks
- Missing question-answering examples
"""
import re
import sys
from pathlib import Path
from typing import List, Dict, Tuple
from collections import Counter
class CodeSnippet:
def __init__(self, language: str, code: str, context: str, line_num: int):
self.language = language
self.code = code.strip()
self.context = context # Text before the code block
self.line_num = line_num
self.issues = []
def __repr__(self):
return f"CodeSnippet(lang={self.language}, lines={len(self.code.splitlines())}, line={self.line_num})"
def extract_code_snippets(content: str) -> List[CodeSnippet]:
"""Extract all code blocks from markdown content."""
snippets = []
lines = content.split('\n')
i = 0
while i < len(lines):
line = lines[i]
# Match code block start
if line.strip().startswith('```'):
# Extract language
language = line.strip()[3:].strip() or 'unknown'
start_line = i
# Get context (previous non-empty lines up to 5)
context_lines = []
for j in range(max(0, i-5), i):
if lines[j].strip():
context_lines.append(lines[j].strip())
context = ' '.join(context_lines[-3:]) # Last 3 lines of context
# Collect code until end marker
i += 1
code_lines = []
while i < len(lines) and not lines[i].strip().startswith('```'):
code_lines.append(lines[i])
i += 1
code = '\n'.join(code_lines)
snippets.append(CodeSnippet(language, code, context, start_line + 1))
i += 1
return snippets
def analyze_snippet(snippet: CodeSnippet) -> List[str]:
"""Analyze a single code snippet for c7score issues."""
issues = []
code = snippet.code.strip()
lines = [l.strip() for l in code.split('\n') if l.strip()]
# Check 1: Import-only snippets
if lines:
import_patterns = [
r'^import\s+',
r'^from\s+\S+\s+import\s+',
r'^require\s*\(',
r'^const\s+\S+\s*=\s*require',
r'^using\s+',
]
import_count = sum(1 for line in lines if any(re.match(p, line) for p in import_patterns))
if import_count == len(lines) and len(lines) <= 5:
issues.append("⚠️ Import-only snippet (Metric 5: Initialization)")
# Check 2: Installation-only snippets
install_patterns = [
r'pip install',
r'npm install',
r'yarn add',
r'cargo install',
r'go get',
r'gem install',
]
if len(lines) <= 2 and any(any(pattern in line for pattern in install_patterns) for line in lines):
issues.append("⚠️ Installation-only snippet (Metric 5: Initialization)")
# Check 3: Snippet length
if len(lines) < 3:
issues.append("⚠️ Very short snippet (<3 lines) (Metric 3: Formatting)")
elif len(lines) > 100:
issues.append("⚠️ Very long snippet (>100 lines) (Metric 3: Formatting)")
# Check 4: Language tag issues
problematic_languages = [
'configuration', 'config', 'cli arguments', 'arguments',
'none', 'console', 'output', 'text', 'plaintext'
]
if snippet.language.lower() in problematic_languages:
issues.append(f"⚠️ Problematic language tag: '{snippet.language}' (Metric 3: Formatting)")
# Check 5: Looks like a list
if len(lines) > 3:
list_markers = sum(1 for line in lines if re.match(r'^\s*[-*\d.]+\s', line))
if list_markers / len(lines) > 0.5:
issues.append("⚠️ Appears to be a list, not code (Metric 3: Formatting)")
# Check 6: Directory structure
    if any(any(char in line for char in ['├', '└', '│', '─']) for line in lines):
issues.append("⚠️ Directory structure detected (Metric 4: Project Metadata)")
# Check 7: License or citation markers
license_markers = ['license', 'copyright', 'mit', 'apache', 'gpl', 'bsd']
citation_markers = ['@article', '@book', 'bibtex', 'doi:', 'citation']
code_lower = code.lower()
if any(marker in code_lower for marker in license_markers) and len(code) > 100:
issues.append("⚠️ License content detected (Metric 4: Project Metadata)")
if any(marker in code_lower for marker in citation_markers):
issues.append("⚠️ Citation content detected (Metric 4: Project Metadata)")
return issues
def find_duplicates(snippets: List[CodeSnippet]) -> List[Tuple[int, int]]:
"""Find duplicate or near-duplicate snippets."""
duplicates = []
for i, snippet1 in enumerate(snippets):
for j, snippet2 in enumerate(snippets[i+1:], start=i+1):
# Normalize for comparison
code1 = re.sub(r'\s+', ' ', snippet1.code.lower()).strip()
code2 = re.sub(r'\s+', ' ', snippet2.code.lower()).strip()
# Exact duplicate
if code1 == code2:
duplicates.append((i, j))
# Near duplicate (>80% similar)
elif len(code1) > 20 and len(code2) > 20:
# Simple similarity check
min_len = min(len(code1), len(code2))
max_len = max(len(code1), len(code2))
if min_len / max_len > 0.8:
# Check if one contains most of the other
if code1 in code2 or code2 in code1:
duplicates.append((i, j))
return duplicates
def generate_question_suggestions(content: str) -> List[str]:
"""Suggest questions that should be answered in the documentation."""
# Extract apparent project name
title_match = re.search(r'^#\s+(.+)$', content, re.MULTILINE)
project_name = title_match.group(1) if title_match else "this library"
questions = [
f"How do I install {project_name}?",
f"How do I get started with {project_name}?",
f"How do I initialize/configure {project_name}?",
f"How do I authenticate with {project_name}?",
f"What are the main features and how do I use them?",
f"How do I handle errors in {project_name}?",
f"How do I perform [common operation]?",
f"What are common configuration options?",
f"How do I integrate {project_name} with [common tools]?",
f"How do I test code using {project_name}?",
]
return questions
def analyze_documentation(file_path: str) -> Dict:
"""Analyze documentation file for c7score optimization opportunities."""
path = Path(file_path)
if not path.exists():
return {"error": f"File not found: {file_path}"}
content = path.read_text(encoding='utf-8')
snippets = extract_code_snippets(content)
# Analyze each snippet
snippet_issues = []
for snippet in snippets:
issues = analyze_snippet(snippet)
if issues:
snippet_issues.append({
'snippet': snippet,
'issues': issues
})
# Find duplicates
duplicates = find_duplicates(snippets)
# Calculate statistics
total_snippets = len(snippets)
snippets_with_issues = len(snippet_issues)
# Language distribution
language_dist = Counter(s.language for s in snippets)
# Issue type counts
issue_types = Counter()
for item in snippet_issues:
for issue in item['issues']:
# Extract metric number
if "Metric 3" in issue:
issue_types["Formatting (M3)"] += 1
elif "Metric 4" in issue:
issue_types["Metadata (M4)"] += 1
elif "Metric 5" in issue:
issue_types["Initialization (M5)"] += 1
return {
'file': file_path,
'total_snippets': total_snippets,
'snippets_with_issues': snippets_with_issues,
'issue_breakdown': dict(issue_types),
'duplicates': len(duplicates),
'language_distribution': dict(language_dist),
'detailed_issues': snippet_issues,
        'duplicate_pairs': duplicates,
        'snippets': snippets,
'question_suggestions': generate_question_suggestions(content),
}
def print_report(analysis: Dict):
"""Print a formatted analysis report."""
if 'error' in analysis:
print(f"{analysis['error']}")
return
print(f"\n{'='*70}")
print(f"C7Score Documentation Analysis: {analysis['file']}")
print(f"{'='*70}\n")
print(f"📊 Summary Statistics")
print(f"{''*70}")
print(f"Total code snippets: {analysis['total_snippets']}")
print(f"Snippets with issues: {analysis['snippets_with_issues']}")
print(f"Duplicate snippets: {analysis['duplicates']}")
if analysis['total_snippets'] > 0:
issue_rate = (analysis['snippets_with_issues'] / analysis['total_snippets']) * 100
print(f"Issue rate: {issue_rate:.1f}%")
print(f"\n📝 Language Distribution")
print(f"{''*70}")
for lang, count in sorted(analysis['language_distribution'].items(), key=lambda x: x[1], reverse=True):
print(f" {lang}: {count}")
if analysis['issue_breakdown']:
print(f"\n⚠️ Issue Breakdown by Metric")
print(f"{''*70}")
for issue_type, count in sorted(analysis['issue_breakdown'].items(), key=lambda x: x[1], reverse=True):
print(f" {issue_type}: {count}")
if analysis['detailed_issues']:
print(f"\n🔍 Detailed Issues (Showing first 10)")
print(f"{''*70}")
for i, item in enumerate(analysis['detailed_issues'][:10], 1):
snippet = item['snippet']
print(f"\n{i}. Line {snippet.line_num} [{snippet.language}] ({len(snippet.code.splitlines())} lines)")
for issue in item['issues']:
print(f" {issue}")
# Show first 2 lines of code
code_preview = '\n'.join(snippet.code.split('\n')[:2])
print(f" Preview: {code_preview[:80]}...")
if analysis['duplicate_pairs']:
print(f"\n🔄 Duplicate Snippets")
print(f"{''*70}")
        for i, (idx1, idx2) in enumerate(analysis['duplicate_pairs'][:5], 1):
            s1, s2 = analysis['snippets'][idx1], analysis['snippets'][idx2]
            print(f"{i}. Snippets at lines {s1.line_num} and {s2.line_num} are duplicates")
print(f"\n💡 Suggested Questions to Answer")
print(f"{''*70}")
for i, question in enumerate(analysis['question_suggestions'], 1):
print(f"{i}. {question}")
print(f"\n✅ Recommendations")
print(f"{''*70}")
recommendations = []
if analysis['issue_breakdown'].get('Initialization (M5)', 0) > 0:
recommendations.append(
"• Combine import-only and installation-only snippets with actual usage examples"
)
if analysis['issue_breakdown'].get('Formatting (M3)', 0) > 0:
recommendations.append(
"• Fix formatting issues: use proper language tags, avoid very short/long snippets"
)
if analysis['issue_breakdown'].get('Metadata (M4)', 0) > 0:
recommendations.append(
"• Remove or relocate metadata content (licensing, citations, directory trees)"
)
if analysis['duplicates'] > 0:
recommendations.append(
f"• Remove or consolidate {analysis['duplicates']} duplicate snippets (reduces LLM score)"
)
if analysis['total_snippets'] < 10:
recommendations.append(
"• Add more comprehensive code examples answering common developer questions"
)
if not recommendations:
recommendations.append("• Documentation looks good! Consider running actual c7score for detailed metrics")
for rec in recommendations:
print(rec)
print(f"\n{'='*70}\n")
if __name__ == "__main__":
if len(sys.argv) < 2:
print("Usage: python analyze_docs.py <path-to-readme-or-doc.md>")
sys.exit(1)
file_path = sys.argv[1]
analysis = analyze_documentation(file_path)
print_report(analysis)