# C7Score Metrics Reference

## Overview

c7score evaluates documentation quality for Context7 using 5 metrics divided into two groups:

- **LLM Analysis** (Metrics 1-2): AI-powered evaluation
- **Text Analysis** (Metrics 3-5): Rule-based checks
## Metric 1: Question-Snippet Comparison (LLM)

**What it measures:** How well code snippets answer common developer questions about the library.

**Scoring approach:**
- The LLM generates 15 common questions developers might ask about the library
- Each snippet is evaluated on how well it answers these questions
- Higher scores for snippets that directly address practical usage questions

**Optimization strategies:**
- Include code examples that answer "how do I..." questions
- Provide working code snippets for common use cases
- Address setup, configuration, and basic operations
- Show real-world usage patterns, not just API signatures
- Include examples that demonstrate the library's main features

**What scores well:**
- "How do I initialize the client?" with full working example
- "How do I handle authentication?" with complete code
- "How do I make a basic query?" with error handling included (see the sketch after this list)
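For illustration, a snippet that would score well against these questions bundles imports, initialization, a basic operation, and error handling into one runnable block. The `examplelib` package, its `Client`, and `ApiError` below are hypothetical placeholders, not a real API:

```python
# Hypothetical illustration: `examplelib`, Client, and ApiError are placeholders,
# not a real package. The point is the shape of a high-scoring snippet:
# imports, setup, a basic operation, and error handling, all in one runnable block.
from examplelib import ApiError, Client

client = Client(api_key="your_api_key")  # initialization shown explicitly
client.authenticate()

try:
    result = client.query("users", limit=10)  # a basic, practical operation
    print(result.rows)
except ApiError as err:
    print(f"Query failed: {err}")  # error handling included
```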
**What scores poorly:**
- Partial code that doesn't run standalone
- API reference without usage examples
- Theoretical explanations without practical code
## Metric 2: LLM Evaluation (LLM)

**What it measures:** Overall snippet quality, including relevancy, clarity, and correctness.

**Scoring criteria:**
- **Relevancy**: Does the snippet provide useful information about the library?
- **Clarity**: Is the code and explanation easy to understand?
- **Correctness**: Is the code syntactically correct and using proper APIs?
- **Uniqueness**: Are snippets providing unique information or duplicating content?

**Optimization strategies:**
- Ensure each snippet provides distinct, valuable information
- Use clear variable names and structure
- Add brief explanatory comments where helpful
- Verify all code is syntactically correct
- Remove or consolidate duplicate snippets (a quick self-check sketch follows this list)
- Test code examples to ensure they work
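Before submission, a rough exact-duplicate count can catch the most common uniqueness problem. This is only a local approximation, not c7score's implementation, and it assumes snippets are separated by the 40-dash delimiter described under Metric 3:

```python
# Rough duplicate-snippet self-check (an approximation, not c7score's code).
# Assumes snippets are separated by the 40-dash delimiter used in the snippet format.
from collections import Counter

def duplicate_rate(text: str) -> float:
    """Fraction of snippets that are exact copies of an earlier snippet."""
    delimiter = "-" * 40
    snippets = [s.strip() for s in text.split(delimiter) if s.strip()]
    if not snippets:
        return 0.0
    counts = Counter(snippets)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return duplicates / len(snippets)

# Hypothetical input path; aim to stay well under the 25% threshold noted below.
with open("context7.txt", encoding="utf-8") as f:
    print(f"Duplicate snippet rate: {duplicate_rate(f.read()):.0%}")
```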
**What causes low scores:**
- High rate of duplicate snippets (>25% identical copies)
- Unclear or confusing code structure
- Syntax errors or incorrect API usage
- Snippets that don't add new information
## Metric 3: Formatting (Text Analysis)

**What it measures:** Whether snippets have the expected format and structure.

**Checks performed** (approximated in the sketch after this list):
- Are categories missing? (e.g., no title, description, or code)
- Are code snippets too short or too long?
- Are language tags actually descriptions? (e.g., "FORTE Build System Configuration")
- Are languages set to "none" or showing console output?
- Is the code just a list or argument descriptions?
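The language-tag checks can be approximated locally. The sketch below is a rough self-check, not c7score's code, and assumes the documentation uses markdown triple-backtick fences:

```python
# Rough language-tag check (not c7score's implementation). Flags fenced code
# blocks whose info string is empty, "none", or a description rather than a language.
import re

KNOWN_LANGUAGES = {  # illustrative subset, extend as needed
    "python", "javascript", "typescript", "bash", "shell", "json", "yaml",
    "go", "rust", "java", "c", "cpp", "html", "css", "sql",
}

def check_language_tags(markdown: str) -> list[str]:
    """Return warnings for suspicious code-fence language tags."""
    fence_tags = re.findall(r"^```(.*)$", markdown, flags=re.MULTILINE)
    warnings = []
    # In well-formed markdown, fences alternate opener/closer; inspect openers only.
    for tag in fence_tags[::2]:
        tag = tag.strip()
        if not tag or tag.lower() == "none":
            warnings.append("code block with no usable language tag")
        elif tag.lower() not in KNOWN_LANGUAGES:
            warnings.append(f"descriptive or unknown language tag: {tag!r}")
    return warnings
```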
**Optimization strategies:**
- Follow consistent snippet structure: TITLE / DESCRIPTION / CODE
- Use 40-dash delimiters between snippets (----------------------------------------)
- Set proper language tags (python, javascript, typescript, bash, etc.)
- Avoid very short snippets (<3 lines) unless absolutely necessary
- Avoid very long snippets (>100 lines); break them into focused examples
- Don't use lists in place of code

**Example good format:**
````
Getting Started with Authentication
----------------------------------------
Initialize the client with your API key and authenticate requests.

```python
from library import Client

client = Client(api_key="your_api_key")
client.authenticate()
```
````
**What to avoid:**
- Language tags like "CLI Arguments" or "Configuration File"
- Pretty-printed tables instead of code
- Numbered/bulleted lists masquerading as code
- Missing titles or descriptions
- Inconsistent formatting
## Metric 4: Project Metadata (Text Analysis)

**What it measures:** Presence of irrelevant project information that doesn't help developers use the library.

**Checks performed** (see the sketch after this list):
- BibTeX citations (would have language tag "Bibtex")
- Licensing information
- Directory structure listings
- Project governance or administrative content
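A pattern scan can surface most of this content before submission. The patterns below are illustrative guesses, not c7score's actual detection rules:

```python
# Rough metadata scan (illustrative patterns, not c7score's detection rules).
import re

METADATA_PATTERNS = {
    "BibTeX citation": re.compile(r"@(article|inproceedings|misc|software)\s*\{", re.IGNORECASE),
    "license text": re.compile(r"\b(MIT License|Apache License|GNU General Public License)\b"),
    "directory tree": re.compile(r"^\s*(├──|└──)", re.MULTILINE),
}

def find_metadata(snippet: str) -> list[str]:
    """Return the kinds of project metadata detected in a snippet."""
    return [label for label, pattern in METADATA_PATTERNS.items() if pattern.search(snippet)]

# Example: flags the snippet as a citation rather than usage code.
print(find_metadata("@article{doe2024, title={Example}}"))  # ['BibTeX citation']
```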
**Optimization strategies:**
- Remove or minimize licensing snippets
- Avoid directory tree representations
- Don't include citation information
- Focus on usage, not project management
- Keep administrative content out of code documentation

**What to remove or relocate:**
- LICENSE files or license text
- CONTRIBUTING.md guidelines
- Directory listings or project structure
- Academic citations (BibTeX, APA, etc.)
- Governance policies

**Exception:** Brief installation or setup instructions that mention directories are okay if needed for library usage.
## Metric 5: Initialization (Text Analysis)

**What it measures:** Snippets that are only imports or installations without meaningful content.

**Checks performed** (a simple self-check sketch follows this list):
- Snippets that are just import statements
- Snippets that are just installation commands (pip install, npm install)
- No additional context or usage examples
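A local approximation of this check, assuming each snippet's code section is available as plain text (a sketch, not c7score's implementation):

```python
# Rough init-only check (a sketch, not c7score's implementation): true when every
# non-comment line of a snippet's code is just an import or an install command.
INSTALL_PREFIXES = ("pip install", "npm install", "yarn add", "poetry add")

def is_init_only(code: str) -> bool:
    lines = [line.strip() for line in code.splitlines()
             if line.strip() and not line.strip().startswith("#")]
    if not lines:
        return False
    return all(
        line.startswith(("import ", "from ")) or line.startswith(INSTALL_PREFIXES)
        for line in lines
    )

print(is_init_only("import library\nfrom library import Client"))   # True
print(is_init_only("from library import Client\nclient = Client()"))  # False
```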
**Optimization strategies:**
- Combine imports with usage examples
- Show installation in context of setup process
- Always follow imports with actual usage code
- Make installation snippets include next steps

**Good approach:**
```python
# Installation and basic usage
# First install: pip install library-name

from library import Client

# Initialize and make your first request
client = Client()
result = client.get_data()
```

**Poor approach:**
```python
# Just imports
import library
from library import Client
```

```bash
# Just installation
pip install library-name
```
## Scoring Weights

Default c7score weights (can be customized):
- Question-Snippet Comparison: 0.8 (80%)
- LLM Evaluation: 0.05 (5%)
- Formatting: 0.05 (5%)
- Project Metadata: 0.05 (5%)
- Initialization: 0.05 (5%)

The question-answer metric dominates because Context7's primary goal is helping developers answer practical questions about library usage.
## Overall Best Practices

1. **Focus on answering questions**: Think "How would a developer actually use this?"
2. **Provide complete, working examples**: Not just fragments
3. **Ensure uniqueness**: Each snippet should teach something new
4. **Structure consistently**: TITLE / DESCRIPTION / CODE format
5. **Use proper language tags**: python, javascript, typescript, etc.
6. **Remove noise**: No licensing, directory trees, or pure imports
7. **Test your code**: All examples should be syntactically correct
8. **Keep it practical**: Real-world usage beats theoretical explanation

---
## Self-Evaluation Rubrics

When evaluating documentation quality using c7score methodology, use these detailed rubrics:

### 1. Question-Snippet Matching Rubric (80% weight)

**Score: 90-100 (Excellent)**
- All major developer questions have complete answers
- Code examples are self-contained and runnable
- Examples include imports, setup, and usage context
- Common use cases are clearly demonstrated
- Error handling is shown where relevant
- Examples progress from simple to advanced

**Score: 70-89 (Good)**
- Most questions are answered with working code
- Examples are mostly complete but may miss minor details
- Some context or imports may be implicit
- Common use cases covered
- Minor gaps in error handling

**Score: 50-69 (Fair)**
- Some questions answered, others partially addressed
- Examples require significant external knowledge
- Missing imports or setup context
- Limited use case coverage
- Error handling largely absent

**Score: 30-49 (Poor)**
- Few questions fully answered
- Examples are fragments without context
- Unclear how to actually use the code
- Major use cases not covered
- No error handling

**Score: 0-29 (Very Poor)**
- Questions not addressed in documentation
- No practical examples
- Only API signatures without usage
- Cannot determine how to use the library
### 2. LLM Evaluation Rubric (10% weight)

**Unique Information (30% of metric):**
- 100%: Every snippet provides unique value, no duplicates
- 75%: Minimal duplication, mostly unique content
- 50%: Some repeated information across snippets
- 25%: Significant duplication
- 0%: Many duplicate snippets

**Clarity (30% of metric):**
- 100%: Well-worded, professional, no errors
- 75%: Clear with minor grammar/wording issues
- 50%: Understandable but awkward phrasing
- 25%: Confusing or poorly worded
- 0%: Unclear, incomprehensible

**Correct Syntax (40% of metric):**
- 100%: All code syntactically perfect
- 75%: Minor syntax issues (missing semicolons, etc.)
- 50%: Some syntax errors but code is recognizable
- 25%: Multiple syntax errors
- 0%: Code is not valid

**Final LLM Evaluation Score** = (Unique×0.3) + (Clarity×0.3) + (Syntax×0.4)
### 3. Formatting Rubric (5% weight)

**Score: 100 (Perfect)**
- All snippets have proper language tags (python, javascript, etc.)
- Language tags are actual languages, not descriptions
- All code blocks use triple backticks with language
- Code blocks are properly closed
- No lists within CODE sections
- Minimum length requirements met (5+ words)

**Score: 80-99 (Minor Issues)**
- 1-2 snippets missing language tags
- One or two incorrectly formatted blocks
- Minor inconsistencies

**Score: 50-79 (Multiple Problems)**
- Several snippets missing language tags
- Some use descriptive strings instead of language names
- Inconsistent formatting

**Score: 0-49 (Significant Issues)**
- Many snippets improperly formatted
- Widespread use of wrong language tags
- Code not in proper blocks
### 4. Metadata Removal Rubric (2.5% weight)

**Score: 100 (Clean)**
- No license text in code examples
- No citation formats (BibTeX, RIS)
- No directory structure listings
- No project metadata
- Pure code and usage examples

**Score: 75-99 (Minimal Metadata)**
- One or two snippets with minor metadata
- Brief license mentions that don't dominate

**Score: 50-74 (Some Metadata)**
- Several snippets include project metadata
- Directory structures present
- Some citation content

**Score: 0-49 (Heavy Metadata)**
- Significant license/citation content
- Multiple directory listings
- Project metadata dominates
### 5. Initialization Rubric (2.5% weight)

**Score: 100 (Excellent)**
- All examples show usage beyond setup
- Installation combined with first usage
- Imports followed by practical examples
- No standalone import/install snippets

**Score: 75-99 (Mostly Good)**
- 1-2 snippets are setup-only
- Most examples show actual usage

**Score: 50-74 (Some Init-Only)**
- Several snippets are just imports/installation
- Mixed quality

**Score: 0-49 (Many Init-Only)**
- Many snippets are only imports
- Many snippets are only installation
- Lack of usage examples
### Scoring Best Practices

**When evaluating:**
1. **Read entire documentation** before scoring
2. **Count specific examples** (e.g., "7 out of 10 snippets...")
3. **Be consistent** between before/after evaluations
4. **Explain scores** with concrete evidence
5. **Use percentages** when quantifying (e.g., "80% of examples...")
6. **Identify improvements** specifically
7. **Calculate weighted average**: (Q×0.8) + (L×0.1) + (F×0.05) + (M×0.025) + (I×0.025)
**Example Calculation** (reproduced in the sketch below):
- Question-Snippet: 85/100 × 0.8 = 68
- LLM Evaluation: 90/100 × 0.1 = 9
- Formatting: 100/100 × 0.05 = 5
- Metadata: 100/100 × 0.025 = 2.5
- Initialization: 95/100 × 0.025 = 2.375
- **Total: 86.875 ≈ 87/100**
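The same weighted average can be recomputed programmatically; this small snippet simply reproduces the arithmetic above using the rubric weights from this section:

```python
# Recompute the example weighted total using the rubric weights above.
weights = {"question": 0.80, "llm": 0.10, "formatting": 0.05,
           "metadata": 0.025, "initialization": 0.025}
scores = {"question": 85, "llm": 90, "formatting": 100,
          "metadata": 100, "initialization": 95}

total = sum(scores[name] * weight for name, weight in weights.items())
print(total)  # 86.875, reported as ~87/100
```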
### Common Scoring Mistakes to Avoid

❌ **Being too generous**: Score based on evidence, not potential
❌ **Ignoring weights**: Question-answer matters most (80%)
❌ **Vague explanations**: Say "5 of 8 examples lack imports", not "some issues"
❌ **Inconsistent standards**: Apply the same rubric to before/after
❌ **Forgetting context**: Consider project type and audience
✅ **Be specific, objective, and consistent**