# C7Score Metrics Reference
## Overview
c7score evaluates documentation quality for Context7 using 5 metrics divided into two groups:
- LLM Analysis (Metrics 1-2): AI-powered evaluation
- Text Analysis (Metrics 3-5): Rule-based checks
## Metric 1: Question-Snippet Comparison (LLM)
**What it measures:** How well code snippets answer common developer questions about the library.
**Scoring approach:**
- LLM generates 15 common questions developers might ask about the library
- Each snippet is evaluated on how well it answers these questions
- Higher scores for snippets that directly address practical usage questions
**Optimization strategies:**
- Include code examples that answer "how do I..." questions
- Provide working code snippets for common use cases
- Address setup, configuration, and basic operations
- Show real-world usage patterns, not just API signatures
- Include examples that demonstrate the library's main features
**What scores well:**
- "How do I initialize the client?" with full working example
- "How do I handle authentication?" with complete code
- "How do I make a basic query?" with error handling included
**What scores poorly:**
- Partial code that doesn't run standalone
- API reference without usage examples
- Theoretical explanations without practical code
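To make the patterns above concrete, here is a sketch of a single self-contained snippet of the kind that tends to score well on this metric. The library and its `Client`, `authenticate`, and `query` names are hypothetical placeholders, not a real API.

```python
# Hypothetical library used for illustration only; substitute your library's real API.
from examplelib import Client, QueryError

# "How do I initialize the client?"
client = Client(api_key="your_api_key")

# "How do I handle authentication?"
client.authenticate()

# "How do I make a basic query?" -- with error handling included
try:
    result = client.query("SELECT id, name FROM users LIMIT 10")
    print(result.rows)
except QueryError as err:
    print(f"Query failed: {err}")
```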
## Metric 2: LLM Evaluation (LLM)
**What it measures:** Overall snippet quality including relevancy, clarity, and correctness.
**Scoring criteria:**
- Relevancy: Does the snippet provide useful information about the library?
- Clarity: Is the code and explanation easy to understand?
- Correctness: Is the code syntactically correct and using proper APIs?
- Uniqueness: Are snippets providing unique information or duplicating content?
**Optimization strategies:**
- Ensure each snippet provides distinct, valuable information
- Use clear variable names and structure
- Add brief explanatory comments where helpful
- Verify all code is syntactically correct
- Remove or consolidate duplicate snippets
- Test code examples to ensure they work
**What causes low scores:**
- High rate of duplicate snippets (>25% identical copies)
- Unclear or confusing code structure
- Syntax errors or incorrect API usage
- Snippets that don't add new information
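As a rough illustration of how these criteria translate into an LLM evaluation, the sketch below assembles a rubric prompt for a single snippet. The `build_evaluation_prompt` helper and the prompt wording are assumptions for illustration, not c7score's actual prompts.

```python
def build_evaluation_prompt(snippet: str) -> str:
    """Build a rubric prompt covering the four criteria above (illustrative only)."""
    return (
        "Rate the following documentation snippet from 0 to 100 on each criterion:\n"
        "1. Relevancy: does it provide useful information about the library?\n"
        "2. Clarity: are the code and explanation easy to understand?\n"
        "3. Correctness: is the code syntactically correct and using proper APIs?\n"
        "4. Uniqueness: does it add information not covered by other snippets?\n\n"
        f"Snippet:\n{snippet}\n"
    )
```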
## Metric 3: Formatting (Text Analysis)
**What it measures:** Whether snippets have the expected format and structure.
**Checks performed:**
- Are categories missing? (e.g., no title, description, or code)
- Are code snippets too short or too long?
- Are language tags actually descriptions? (e.g., "FORTE Build System Configuration")
- Are languages set to "none" or showing console output?
- Is the code just a list or argument descriptions?
**Optimization strategies:**
- Follow consistent snippet structure: TITLE / DESCRIPTION / CODE
- Use 40-dash delimiters between snippets (----------------------------------------)
- Set proper language tags (python, javascript, typescript, bash, etc.)
- Avoid very short snippets (<3 lines) unless absolutely necessary
- Avoid very long snippets (>100 lines) - break into focused examples
- Don't use lists in place of code
**Example of a good format:**
Getting Started with Authentication
----------------------------------------
Initialize the client with your API key and authenticate requests.
```python
from library import Client

client = Client(api_key="your_api_key")
client.authenticate()
```

**What to avoid:**
- Language tags like "CLI Arguments" or "Configuration File"
- Pretty-printed tables instead of code
- Numbered/bulleted lists masquerading as code
- Missing titles or descriptions
- Inconsistent formatting
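As a rough sketch of how these rule-based checks might be implemented, the function below flags common formatting problems in a single snippet. The language list and thresholds mirror the guidance above; the exact rules c7score applies may differ.

```python
KNOWN_LANGUAGES = {"python", "javascript", "typescript", "bash", "go", "java", "ruby"}

def check_snippet_format(language_tag: str, code: str) -> list[str]:
    """Return formatting problems found in one snippet (illustrative heuristics only)."""
    problems = []
    lines = [line for line in code.splitlines() if line.strip()]
    if language_tag.lower() not in KNOWN_LANGUAGES:
        problems.append(f"language tag '{language_tag}' is not a recognized language")
    if len(lines) < 3:
        problems.append("code is very short (<3 lines)")
    if len(lines) > 100:
        problems.append("code is very long (>100 lines); split into focused examples")
    if lines and all(line.lstrip().startswith(("-", "*", "1.")) for line in lines):
        problems.append("CODE section looks like a list, not code")
    return problems
```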
## Metric 4: Project Metadata (Text Analysis)
**What it measures:** Presence of irrelevant project information that doesn't help developers use the library.
**Checks performed:**
- BibTeX citations (would have language tag "Bibtex")
- Licensing information
- Directory structure listings
- Project governance or administrative content
**Optimization strategies:**
- Remove or minimize licensing snippets
- Avoid directory tree representations
- Don't include citation information
- Focus on usage, not project management
- Keep administrative content out of code documentation
**What to remove or relocate:**
- LICENSE files or license text
- CONTRIBUTING.md guidelines
- Directory listings or project structure
- Academic citations (BibTeX, APA, etc.)
- Governance policies
**Exception:** Brief installation or setup instructions that mention directories are okay if needed for library usage.
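For illustration, a simple heuristic pass for this kind of content might look like the sketch below. The patterns are assumptions derived from the checks listed above, not c7score's actual rules.

```python
import re

# Illustrative patterns for content this metric penalizes.
METADATA_PATTERNS = {
    "bibtex_citation": re.compile(r"@(article|inproceedings|misc)\s*\{", re.IGNORECASE),
    "license_text": re.compile(r"\b(MIT License|Apache License|GNU General Public License)\b"),
    "directory_tree": re.compile(r"^\s*[├└│]", re.MULTILINE),
}

def find_project_metadata(snippet: str) -> list[str]:
    """Return the names of metadata patterns detected in a snippet."""
    return [name for name, pattern in METADATA_PATTERNS.items() if pattern.search(snippet)]
```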
## Metric 5: Initialization (Text Analysis)
**What it measures:** Snippets that are only imports or installations without meaningful content.
**Checks performed:**
- Snippets that are just import statements
- Snippets that are just installation commands (pip install, npm install)
- No additional context or usage examples
**Optimization strategies:**
- Combine imports with usage examples
- Show installation in context of setup process
- Always follow imports with actual usage code
- Make installation snippets include next steps
**Good approach:**
```python
# Installation and basic usage
# First install: pip install library-name
from library import Client

# Initialize and make your first request
client = Client()
result = client.get_data()
```

**Poor approach:**

```python
# Just imports
import library
from library import Client
```

```bash
# Just installation
pip install library-name
```
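A minimal sketch of how an initialization-only check could work is shown below, assuming each snippet's code is available as plain text; the exact heuristics c7score uses may differ.

```python
def is_initialization_only(code: str) -> bool:
    """Flag snippets containing nothing but imports or install commands (illustrative)."""
    lines = [
        line.strip()
        for line in code.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if not lines:
        return True
    init_prefixes = ("import ", "from ", "pip install", "npm install", "yarn add")
    return all(line.startswith(init_prefixes) for line in lines)
```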
## Scoring Weights
Default c7score weights (can be customized):
- Question-Snippet Comparison: 0.8 (80%)
- LLM Evaluation: 0.05 (5%)
- Formatting: 0.05 (5%)
- Project Metadata: 0.05 (5%)
- Initialization: 0.05 (5%)
The question-answer metric dominates because Context7's primary goal is helping developers answer practical questions about library usage.
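The overall score is the weighted sum of the per-metric scores. A minimal sketch, assuming each metric is scored on a 0-100 scale and using the default weights listed above:

```python
DEFAULT_WEIGHTS = {
    "question_snippet": 0.8,
    "llm_evaluation": 0.05,
    "formatting": 0.05,
    "project_metadata": 0.05,
    "initialization": 0.05,
}

def overall_score(metric_scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-metric scores, each on a 0-100 scale."""
    return sum(metric_scores[name] * weight for name, weight in weights.items())

# Strong question-snippet coverage dominates the overall result:
print(overall_score({
    "question_snippet": 80,
    "llm_evaluation": 90,
    "formatting": 100,
    "project_metadata": 100,
    "initialization": 100,
}))  # 80*0.8 + 90*0.05 + 100*0.05 + 100*0.05 + 100*0.05 = 83.5
```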
## Overall Best Practices
- Focus on answering questions: Think "How would a developer actually use this?"
- Provide complete, working examples: Not just fragments
- Ensure uniqueness: Each snippet should teach something new
- Structure consistently: TITLE / DESCRIPTION / CODE format
- Use proper language tags: python, javascript, typescript, etc.
- Remove noise: No licensing, directory trees, or pure imports
- Test your code: All examples should be syntactically correct
- Keep it practical: Real-world usage beats theoretical explanation
## Self-Evaluation Rubrics
When evaluating documentation quality using c7score methodology, use these detailed rubrics:
### 1. Question-Snippet Matching Rubric (80% weight)
**Score: 90-100 (Excellent)**
- All major developer questions have complete answers
- Code examples are self-contained and runnable
- Examples include imports, setup, and usage context
- Common use cases are clearly demonstrated
- Error handling is shown where relevant
- Examples progress from simple to advanced
**Score: 70-89 (Good)**
- Most questions are answered with working code
- Examples are mostly complete but may miss minor details
- Some context or imports may be implicit
- Common use cases covered
- Minor gaps in error handling
**Score: 50-69 (Fair)**
- Some questions answered, others partially addressed
- Examples require significant external knowledge
- Missing imports or setup context
- Limited use case coverage
- Error handling largely absent
**Score: 30-49 (Poor)**
- Few questions fully answered
- Examples are fragments without context
- Unclear how to actually use the code
- Major use cases not covered
- No error handling
**Score: 0-29 (Very Poor)**
- Questions not addressed in documentation
- No practical examples
- Only API signatures without usage
- Cannot determine how to use the library
### 2. LLM Evaluation Rubric (10% weight)
**Unique Information (30% of metric):**
- 100%: Every snippet provides unique value, no duplicates
- 75%: Minimal duplication, mostly unique content
- 50%: Some repeated information across snippets
- 25%: Significant duplication
- 0%: Many duplicate snippets
**Clarity (30% of metric):**
- 100%: Well-worded, professional, no errors
- 75%: Clear with minor grammar/wording issues
- 50%: Understandable but awkward phrasing
- 25%: Confusing or poorly worded
- 0%: Unclear, incomprehensible
**Correct Syntax (40% of metric):**
- 100%: All code syntactically perfect
- 75%: Minor syntax issues (missing semicolons, etc.)
- 50%: Some syntax errors but code is recognizable
- 25%: Multiple syntax errors
- 0%: Code is not valid
Final LLM Evaluation Score = (Unique×0.3) + (Clarity×0.3) + (Syntax×0.4)
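For example, a library with mostly unique snippets (75), clear wording (100), and minor syntax issues (75) would receive 75×0.3 + 100×0.3 + 75×0.4 = 82.5.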
### 3. Formatting Rubric (5% weight)
**Score: 100 (Perfect)**
- All snippets have proper language tags (python, javascript, etc.)
- Language tags are actual languages, not descriptions
- All code blocks use triple backticks with language
- Code blocks are properly closed
- No lists within CODE sections
- Minimum length requirements met (5+ words)
**Score: 80-99 (Minor Issues)**
- 1-2 snippets missing language tags
- One or two incorrectly formatted blocks
- Minor inconsistencies
**Score: 50-79 (Multiple Problems)**
- Several snippets missing language tags
- Some use descriptive strings instead of language names
- Inconsistent formatting
**Score: 0-49 (Significant Issues)**
- Many snippets improperly formatted
- Widespread use of wrong language tags
- Code not in proper blocks
### 4. Metadata Removal Rubric (2.5% weight)
**Score: 100 (Clean)**
- No license text in code examples
- No citation formats (BibTeX, RIS)
- No directory structure listings
- No project metadata
- Pure code and usage examples
**Score: 75-99 (Minimal Metadata)**
- One or two snippets with minor metadata
- Brief license mentions that don't dominate
**Score: 50-74 (Some Metadata)**
- Several snippets include project metadata
- Directory structures present
- Some citation content
**Score: 0-49 (Heavy Metadata)**
- Significant license/citation content
- Multiple directory listings
- Project metadata dominates
### 5. Initialization Rubric (2.5% weight)
**Score: 100 (Excellent)**
- All examples show usage beyond setup
- Installation combined with first usage
- Imports followed by practical examples
- No standalone import/install snippets
**Score: 75-99 (Mostly Good)**
- 1-2 snippets are setup-only
- Most examples show actual usage
**Score: 50-74 (Some Init-Only)**
- Several snippets are just imports/installation
- Mixed quality
**Score: 0-49 (Many Init-Only)**
- Many snippets are only imports
- Many snippets are only installation
- Lack of usage examples
## Scoring Best Practices
**When evaluating:**
- Read entire documentation before scoring
- Count specific examples (e.g., "7 out of 10 snippets...")
- Be consistent between before/after evaluations
- Explain scores with concrete evidence
- Use percentages when quantifying (e.g., "80% of examples...")
- Identify improvements specifically
- Calculate weighted average: (Q×0.8) + (L×0.1) + (F×0.05) + (M×0.025) + (I×0.025)
**Example Calculation:**
- Question-Snippet: 85/100 × 0.8 = 68
- LLM Evaluation: 90/100 × 0.1 = 9
- Formatting: 100/100 × 0.05 = 5
- Metadata: 100/100 × 0.025 = 2.5
- Initialization: 95/100 × 0.025 = 2.375
- Total: 86.875 ≈ 87/100
## Common Scoring Mistakes to Avoid
- ❌ **Being too generous:** Score based on evidence, not potential
- ❌ **Ignoring weights:** Question-answer matters most (80%)
- ❌ **Vague explanations:** Say "5 of 8 examples lack imports," not "some issues"
- ❌ **Inconsistent standards:** Apply the same rubric to before/after evaluations
- ❌ **Forgetting context:** Consider project type and audience
- ✅ Be specific, objective, and consistent