# C7Score Metrics Reference

## Overview

c7score evaluates documentation quality for Context7 using 5 metrics divided into two groups:

- **LLM Analysis** (Metrics 1-2): AI-powered evaluation
- **Text Analysis** (Metrics 3-5): Rule-based checks

## Metric 1: Question-Snippet Comparison (LLM)

**What it measures:** How well code snippets answer common developer questions about the library.

**Scoring approach:**
- LLM generates 15 common questions developers might ask about the library
- Each snippet is evaluated on how well it answers these questions
- Higher scores for snippets that directly address practical usage questions

**Optimization strategies:**
- Include code examples that answer "how do I..." questions
- Provide working code snippets for common use cases
- Address setup, configuration, and basic operations
- Show real-world usage patterns, not just API signatures
- Include examples that demonstrate the library's main features

**What scores well:**
- "How do I initialize the client?" with full working example
- "How do I handle authentication?" with complete code
- "How do I make a basic query?" with error handling included

**What scores poorly:**
- Partial code that doesn't run standalone
- API reference without usage examples
- Theoretical explanations without practical code

## Metric 2: LLM Evaluation (LLM)

**What it measures:** Overall snippet quality including relevancy, clarity, and correctness.

**Scoring criteria:**
- **Relevancy**: Does the snippet provide useful information about the library?
- **Clarity**: Is the code and explanation easy to understand?
- **Correctness**: Is the code syntactically correct and using proper APIs?
- **Uniqueness**: Are snippets providing unique information or duplicating content?

**Optimization strategies:**
- Ensure each snippet provides distinct, valuable information
- Use clear variable names and structure
- Add brief explanatory comments where helpful
- Verify all code is syntactically correct
- Remove or consolidate duplicate snippets
- Test code examples to ensure they work

**What causes low scores:**
- High rate of duplicate snippets (>25% identical copies)
- Unclear or confusing code structure
- Syntax errors or incorrect API usage
- Snippets that don't add new information
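Since a duplicate rate above roughly 25% is called out as a score killer, it can be worth estimating it locally before submitting. A minimal sketch, assuming snippets are separated by the 40-dash delimiter described under Metric 3 below; the parsing and the exact-match heuristic are illustrative, not c7score's own implementation:

```python
from collections import Counter

def duplicate_rate(snippets_text: str) -> float:
    """Rough share of snippets that are exact copies of an earlier one."""
    # Assumption: snippets are separated by the 40-dash delimiter line
    # described under Metric 3.
    snippets = [s.strip() for s in snippets_text.split("-" * 40) if s.strip()]
    if not snippets:
        return 0.0
    counts = Counter(snippets)
    extra_copies = sum(n - 1 for n in counts.values())
    return extra_copies / len(snippets)

# Example: three snippets, two of them identical -> rate of 1/3.
doc = ("-" * 40).join(["print('a')", "print('b')", "print('a')"])
print(f"{duplicate_rate(doc):.2f}")  # 0.33
```

Exact matching understates near-duplicates, but it catches the copy-paste repetition that the evaluation penalizes most.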
## Metric 3: Formatting (Text Analysis)

**What it measures:** Whether snippets have the expected format and structure.

**Checks performed:**
- Are categories missing? (e.g., no title, description, or code)
- Are code snippets too short or too long?
- Are language tags actually descriptions? (e.g., "FORTE Build System Configuration")
- Are languages set to "none" or showing console output?
- Is the code just a list or argument descriptions?

**Optimization strategies:**
- Follow consistent snippet structure: TITLE / DESCRIPTION / CODE
- Use 40-dash delimiters between snippets (----------------------------------------)
- Set proper language tags (python, javascript, typescript, bash, etc.)
- Avoid very short snippets (<3 lines) unless absolutely necessary
- Avoid very long snippets (>100 lines); break them into focused examples
- Don't use lists in place of code

**Example good format:**

````
Getting Started with Authentication
----------------------------------------
Initialize the client with your API key and authenticate requests.

```python
from library import Client

client = Client(api_key="your_api_key")
client.authenticate()
```
````

**What to avoid:**
- Language tags like "CLI Arguments" or "Configuration File"
- Pretty-printed tables instead of code
- Numbered/bulleted lists masquerading as code
- Missing titles or descriptions
- Inconsistent formatting

## Metric 4: Project Metadata (Text Analysis)

**What it measures:** Presence of irrelevant project information that doesn't help developers use the library.

**Checks performed:**
- BibTeX citations (would have language tag "Bibtex")
- Licensing information
- Directory structure listings
- Project governance or administrative content

**Optimization strategies:**
- Remove or minimize licensing snippets
- Avoid directory tree representations
- Don't include citation information
- Focus on usage, not project management
- Keep administrative content out of code documentation

**What to remove or relocate:**
- LICENSE files or license text
- CONTRIBUTING.md guidelines
- Directory listings or project structure
- Academic citations (BibTeX, APA, etc.)
- Governance policies

**Exception:** Brief installation or setup instructions that mention directories are okay if needed for library usage.

## Metric 5: Initialization (Text Analysis)

**What it measures:** Snippets that are only imports or installation commands without meaningful content.

**Checks performed:**
- Snippets that are just import statements
- Snippets that are just installation commands (pip install, npm install)
- No additional context or usage examples

**Optimization strategies:**
- Combine imports with usage examples
- Show installation in context of the setup process
- Always follow imports with actual usage code
- Make installation snippets include next steps

**Good approach:**

```python
# Installation and basic usage
# First install: pip install library-name
from library import Client

# Initialize and make your first request
client = Client()
result = client.get_data()
```

**Poor approach:**

```python
# Just imports
import library
from library import Client
```

```bash
# Just installation
pip install library-name
```

## Scoring Weights

Default c7score weights (can be customized):

- Question-Snippet Comparison: 0.8 (80%)
- LLM Evaluation: 0.05 (5%)
- Formatting: 0.05 (5%)
- Project Metadata: 0.05 (5%)
- Initialization: 0.05 (5%)

The question-answer metric dominates because Context7's primary goal is helping developers answer practical questions about library usage.

## Overall Best Practices

1. **Focus on answering questions**: Think "How would a developer actually use this?"
2. **Provide complete, working examples**: Not just fragments
3. **Ensure uniqueness**: Each snippet should teach something new
4. **Structure consistently**: TITLE / DESCRIPTION / CODE format
5. **Use proper language tags**: python, javascript, typescript, etc.
6. **Remove noise**: No licensing, directory trees, or pure imports
7. **Test your code**: All examples should be syntactically correct
8. **Keep it practical**: Real-world usage beats theoretical explanation
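Several of the rule-based checks behind Metrics 3-5 can be approximated locally before submitting. A minimal sketch, assuming code blocks have already been extracted with their fence tags; the tag list, length thresholds, and patterns come from this document's guidance, not from c7score itself:

```python
import re

# Tags this guide treats as real languages (illustrative, not exhaustive).
KNOWN_TAGS = {"python", "javascript", "typescript", "bash", "json", "yaml", "go", "rust"}

def check_code_block(language_tag: str, code: str) -> list[str]:
    """Return likely formatting/initialization complaints for one code block."""
    problems = []

    tag = language_tag.strip().lower()
    if not tag or tag == "none":
        problems.append("missing or 'none' language tag")
    elif tag not in KNOWN_TAGS:
        problems.append(f"descriptive language tag instead of a language: {tag!r}")

    lines = [line for line in code.splitlines() if line.strip()]
    if len(lines) < 3:
        problems.append("very short snippet (<3 lines)")
    if len(lines) > 100:
        problems.append("very long snippet (>100 lines)")

    # Metric 5 concern: a block that is nothing but imports or install commands.
    init_only = re.compile(r"^\s*(import\s|from\s.+\simport\s|pip install|npm install)")
    if lines and all(init_only.match(line) for line in lines):
        problems.append("imports/installation only, no usage shown")

    return problems

# Flags the descriptive tag, the short length, and the install-only content.
print(check_code_block("CLI Arguments", "pip install library-name"))
```

These are self-review heuristics only; the actual checks run inside c7score and may differ in detail.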
---

## Self-Evaluation Rubrics

When evaluating documentation quality using c7score methodology, use these detailed rubrics:

### 1. Question-Snippet Matching Rubric (80% weight)

**Score: 90-100 (Excellent)**
- All major developer questions have complete answers
- Code examples are self-contained and runnable
- Examples include imports, setup, and usage context
- Common use cases are clearly demonstrated
- Error handling is shown where relevant
- Examples progress from simple to advanced

**Score: 70-89 (Good)**
- Most questions are answered with working code
- Examples are mostly complete but may miss minor details
- Some context or imports may be implicit
- Common use cases covered
- Minor gaps in error handling

**Score: 50-69 (Fair)**
- Some questions answered, others partially addressed
- Examples require significant external knowledge
- Missing imports or setup context
- Limited use case coverage
- Error handling largely absent

**Score: 30-49 (Poor)**
- Few questions fully answered
- Examples are fragments without context
- Unclear how to actually use the code
- Major use cases not covered
- No error handling

**Score: 0-29 (Very Poor)**
- Questions not addressed in documentation
- No practical examples
- Only API signatures without usage
- Cannot determine how to use the library

### 2. LLM Evaluation Rubric (10% weight)

**Unique Information (30% of metric):**
- 100%: Every snippet provides unique value, no duplicates
- 75%: Minimal duplication, mostly unique content
- 50%: Some repeated information across snippets
- 25%: Significant duplication
- 0%: Many duplicate snippets

**Clarity (30% of metric):**
- 100%: Well-worded, professional, no errors
- 75%: Clear with minor grammar/wording issues
- 50%: Understandable but awkward phrasing
- 25%: Confusing or poorly worded
- 0%: Unclear, incomprehensible

**Correct Syntax (40% of metric):**
- 100%: All code syntactically perfect
- 75%: Minor syntax issues (missing semicolons, etc.)
- 50%: Some syntax errors but code is recognizable
- 25%: Multiple syntax errors
- 0%: Code is not valid

**Final LLM Evaluation Score** = (Unique×0.3) + (Clarity×0.3) + (Syntax×0.4)

### 3. Formatting Rubric (5% weight)

**Score: 100 (Perfect)**
- All snippets have proper language tags (python, javascript, etc.)
- Language tags are actual languages, not descriptions
- All code blocks use triple backticks with language
- Code blocks are properly closed
- No lists within CODE sections
- Minimum length requirements met (5+ words)

**Score: 80-99 (Minor Issues)**
- 1-2 snippets missing language tags
- One or two incorrectly formatted blocks
- Minor inconsistencies

**Score: 50-79 (Multiple Problems)**
- Several snippets missing language tags
- Some use descriptive strings instead of language names
- Inconsistent formatting

**Score: 0-49 (Significant Issues)**
- Many snippets improperly formatted
- Widespread use of wrong language tags
- Code not in proper blocks

### 4. Metadata Removal Rubric (2.5% weight)

**Score: 100 (Clean)**
- No license text in code examples
- No citation formats (BibTeX, RIS)
- No directory structure listings
- No project metadata
- Pure code and usage examples

**Score: 75-99 (Minimal Metadata)**
- One or two snippets with minor metadata
- Brief license mentions that don't dominate

**Score: 50-74 (Some Metadata)**
- Several snippets include project metadata
- Directory structures present
- Some citation content

**Score: 0-49 (Heavy Metadata)**
- Significant license/citation content
- Multiple directory listings
- Project metadata dominates
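The metadata categories above are easy to screen for with simple patterns. A minimal sketch; the regexes are illustrative heuristics, not c7score's actual detection rules:

```python
import re

# Illustrative patterns for the metadata categories listed above.
METADATA_PATTERNS = {
    "bibtex citation": re.compile(r"@(article|inproceedings|misc|book)\s*\{", re.I),
    "license text": re.compile(r"\b(MIT License|Apache License|GNU General Public License)\b"),
    "directory tree": re.compile(r"^\s*(├──|└──|│)", re.M),
}

def metadata_hits(snippet: str) -> list[str]:
    """Name the metadata categories a snippet appears to contain."""
    return [name for name, pattern in METADATA_PATTERNS.items() if pattern.search(snippet)]

print(metadata_hits("@article{smith2024, title={...}}"))  # ['bibtex citation']
```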
### 5. Initialization Rubric (2.5% weight)

**Score: 100 (Excellent)**
- All examples show usage beyond setup
- Installation combined with first usage
- Imports followed by practical examples
- No standalone import/install snippets

**Score: 75-99 (Mostly Good)**
- 1-2 snippets are setup-only
- Most examples show actual usage

**Score: 50-74 (Some Init-Only)**
- Several snippets are just imports/installation
- Mixed quality

**Score: 0-49 (Many Init-Only)**
- Many snippets are only imports
- Many snippets are only installation
- Lack of usage examples

### Scoring Best Practices

**When evaluating:**

1. **Read entire documentation** before scoring
2. **Count specific examples** (e.g., "7 out of 10 snippets...")
3. **Be consistent** between before/after evaluations
4. **Explain scores** with concrete evidence
5. **Use percentages** when quantifying (e.g., "80% of examples...")
6. **Identify improvements** specifically
7. **Calculate weighted average**: (Q×0.8) + (L×0.1) + (F×0.05) + (M×0.025) + (I×0.025)

**Example Calculation:**
- Question-Snippet: 85/100 × 0.8 = 68
- LLM Evaluation: 90/100 × 0.1 = 9
- Formatting: 100/100 × 0.05 = 5
- Metadata: 100/100 × 0.025 = 2.5
- Initialization: 95/100 × 0.025 = 2.375
- **Total: 86.875 ≈ 87/100**

### Common Scoring Mistakes to Avoid

❌ **Being too generous**: Score based on evidence, not potential
❌ **Ignoring weights**: Question-answer matters most (80%)
❌ **Vague explanations**: Say "5 of 8 examples lack imports" not "some issues"
❌ **Inconsistent standards**: Apply same rubric to before/after
❌ **Forgetting context**: Consider project type and audience

✅ **Be specific, objective, and consistent**
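The example calculation above is easy to reproduce, and the same few lines can be reused for your own scores. A minimal sketch using the rubric weights from this section (the metric names are labels for illustration, not c7score identifiers):

```python
# Rubric weights from the scoring best practices above.
WEIGHTS = {
    "question": 0.8,
    "llm": 0.1,
    "formatting": 0.05,
    "metadata": 0.025,
    "initialization": 0.025,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-metric scores (0-100) into the overall weighted score."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())

example = {"question": 85, "llm": 90, "formatting": 100, "metadata": 100, "initialization": 95}
print(f"{weighted_score(example):.3f}")   # 86.875
print(round(weighted_score(example)))     # 87
```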