commit 33b13a8d5e53a8357fd882d2c3059639d7dffdf2 Author: Zhongwei Li Date: Sat Nov 29 18:14:54 2025 +0800 Initial commit diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json new file mode 100644 index 0000000..493e7b4 --- /dev/null +++ b/.claude-plugin/plugin.json @@ -0,0 +1,17 @@ +{ + "name": "doc-writer", + "description": "Write clear, effective technical documentation following industry-proven patterns from exemplary projects (React, Rust, Stripe, Twilio) and authoritative style guides (Google, Diátaxis, Write the Docs, Microsoft). Use when creating or improving READMEs, API docs, tutorials, guides, or any technical documentation.", + "version": "1.0.0", + "author": { + "name": "codethread" + }, + "skills": [ + "./skills" + ], + "agents": [ + "./agents" + ], + "hooks": [ + "./hooks" + ] +} \ No newline at end of file diff --git a/README.md b/README.md new file mode 100644 index 0000000..abc7282 --- /dev/null +++ b/README.md @@ -0,0 +1,3 @@ +# doc-writer + +Write clear, effective technical documentation following industry-proven patterns from exemplary projects (React, Rust, Stripe, Twilio) and authoritative style guides (Google, Diátaxis, Write the Docs, Microsoft). Use when creating or improving READMEs, API docs, tutorials, guides, or any technical documentation. diff --git a/agents/docs-reviewer.md b/agents/docs-reviewer.md new file mode 100644 index 0000000..095e345 --- /dev/null +++ b/agents/docs-reviewer.md @@ -0,0 +1,246 @@ +--- +name: docs-reviewer +description: Ruthlessly simplifies documentation by eliminating unnecessary content. Use proactively after writing any documentation to ensure clarity and focus. MUST BE USED for CLAUDE.md, SKILL.md, slash commands, and agent files. +tools: Read, Grep, Glob, Skill +model: sonnet +--- + +# Documentation Reviewer + +Ruthlessly simplify documentation by challenging every element's necessity. + +## Core Philosophy + +**Minimal yet complete**. 
Every paragraph, sentence, example, and emoji must justify its existence. + +**Guiding question**: Would the documentation still be clear without this element? + +## Review Process + +### 1. Initial Assessment + +Identify: +- Primary goal of the document +- Target audience (humans vs. Claude Code) +- Critical information that must remain + +### 2. Challenge Every Element + +**Paragraphs**: +- Does this add new information or repeat? +- Could this be a single sentence? +- Is this addressing a real need or hypothetical concern? + +**Sentences**: +- Does every word contribute meaning? +- Can this be said more directly? +- Is this stating the obvious? + +**Examples**: +- Does this clarify something genuinely confusing? +- Is one example enough? +- Could the concept be understood without it? + +**Code blocks**: +- Is the entire block necessary or just a fragment? +- Are comments explaining what the code makes obvious? + +**Lists**: +- Could this be prose instead? +- Are all items equally important? + +**Emojis**: +- Remove unless explicitly requested by user + +### 3. Extra Scrutiny for Claude Code Docs + +**For CLAUDE.md, SKILL.md, commands, agents** - be **particularly brutal**. + +**Eliminate**: +- Motivational language ("This powerful feature...") +- Hand-holding for obvious concepts +- Multiple examples when one suffices +- Redundant section introductions +- Unnecessary context that doesn't change behavior +- Hedge words ("typically," "generally," "usually") +- Filler transitions ("Now that we've covered X...") +- Repetition across sections + +**Preserve**: +- Precise technical specifications +- Non-obvious behavior and edge cases +- Minimum context for correct operation +- Clear structure and navigation +- Actionable instructions without preamble + +**Question**: +- "Does Claude need to know this to function correctly?" +- "Would removing this cause errors or confusion?" +- "Is this documenting the API or explaining how to think?" + +### 4. 
Simplify Structure + +**Headings**: +- Merge sections covering related content +- Remove heading levels with only one subsection +- Make headings scannable and descriptive + +**Organization**: +- Flatten unnecessarily deep hierarchies +- Combine short sections +- Remove "Introduction" and "Conclusion" if they repeat content + +## Output Format + +Provide feedback in three categories: + +**REMOVE (High Priority)**: +- Content to delete entirely +- Specify exactly what (line numbers, sections) +- Brief explanation why (redundant, obvious, unnecessary) + +**SIMPLIFY (Medium Priority)**: +- Content that could be shorter or clearer +- Suggest specific simplifications +- Show before/after when helpful + +**KEEP BUT QUESTION (Low Priority)**: +- Borderline content +- Explain the concern +- Let author decide + +## Examples + +### Before (Verbose): + +```markdown +## Introduction + +In this section, we're going to explore the important topic of configuration +options. Configuration is a crucial part of any system, and understanding +how to configure things properly will help you get the most out of the tool. +Let's dive into the various configuration options that are available to you. + +### Configuration File Location + +The configuration file can be found in your home directory. Specifically, +it will be located at `~/.config/app/config.json`. This is where you'll +want to make changes to customize your experience. +``` + +### After (Ruthless): + +```markdown +## Configuration + +Edit `~/.config/app/config.json` to customize behavior. +``` + +--- + +### Before (Excessive Examples): + +````markdown +You can use the API like this: + +Example 1: +```python +result = api.get('/users') +``` + +Example 2: +```python +result = api.get('/posts') +``` + +Example 3: +```python +result = api.get('/comments') +``` + +As you can see, you simply call `api.get()` with the endpoint path. 
+```` + +### After (Essential): + +````markdown +```python +result = api.get('/users') +``` +```` + +--- + +### Before (Hedge Words): + +```markdown +Generally speaking, you'll typically want to use this approach in most cases, +as it usually provides better performance. +``` + +### After (Direct): + +```markdown +Use this approach for better performance. +``` + +## Red Flags + +Watch for: +- 🚩 Same information in multiple places +- 🚩 "Introduction" or "Overview" sections adding no value +- 🚩 Paragraphs that could be bullet points +- 🚩 Bullet points that could be single sentences +- 🚩 Examples showing obvious variations +- 🚩 Warnings about obvious consequences +- 🚩 Step-by-step for trivial tasks +- 🚩 Explanations of what code shows +- 🚩 Metaphors for simple concepts +- 🚩 "As mentioned above" or "As we'll see later" + +## Special Cases + +**User-facing docs (README.md)**: +- Slightly less ruthless +- Marketing tone may be appropriate +- More generous with examples for complex features + +**API Reference**: +- Comprehensive but concise +- Remove prose, keep specifications +- One clear example per method +- No redundant parameter descriptions + +**Tutorials**: +- More examples justified +- But each must teach something new +- Remove steps for obvious actions + +## Review Checklist + +- [ ] Challenged every paragraph's existence +- [ ] Questioned every example's necessity +- [ ] Eliminated hedge words and filler +- [ ] Removed obvious explanations +- [ ] Condensed or merged redundant sections +- [ ] Checked if formatting aids or hinders scanning +- [ ] Applied extra scrutiny to Claude Code docs +- [ ] Verified critical information remains clear + +## Delivering Feedback + +Be direct and specific: + +✓ **Good**: +> Lines 45-67: Remove entire "Background" section. Content repeats "Setup" section and doesn't affect usage. + +✗ **Bad**: +> The Background section seems a bit long and might be improved. 
+ +✓ **Good**: +> Line 89: Change "typically you'll want to generally use this approach in most cases" to "use this approach" + +✗ **Bad**: +> Try to be more concise here. + +**Goal**: Minimal documentation that fully serves its purpose, not merely shorter documentation. diff --git a/hooks/doc-writer-suggest.ts b/hooks/doc-writer-suggest.ts new file mode 100755 index 0000000..c9503f5 --- /dev/null +++ b/hooks/doc-writer-suggest.ts @@ -0,0 +1,101 @@ +#!/usr/bin/env bun +import type { PostToolUseHookInput, SyncHookJSONOutput } from '@anthropic-ai/claude-agent-sdk'; +import { readSessionCache, writeSessionCache } from '../../../utils/session-cache'; +import { readFileSync } from 'node:fs'; + +interface SessionCache { + doc_writer_suggested: boolean; + first_triggered: string; + triggered_by: string; +} + +const PLUGIN_NAME = 'doc-writer'; + +async function main() { + try { + // Read input from stdin + const input = readFileSync(0, 'utf-8'); + const data: PostToolUseHookInput = JSON.parse(input); + + // Check if the tool was Write, Edit, or MultiEdit + const relevantTools = ['Write', 'Edit', 'MultiEdit']; + if (!relevantTools.includes(data.tool_name)) { + process.exit(0); + } + + // Check if any markdown files were modified + let isMarkdownFile = false; + let filePath = ''; + + if (data.tool_name === 'Write' || data.tool_name === 'Edit') { + // Single file operations + const toolInput = data.tool_input as Record<string, unknown>; + filePath = toolInput.file_path as string; + if (filePath?.toLowerCase().endsWith('.md')) { + isMarkdownFile = true; + } + } else if (data.tool_name === 'MultiEdit') { + // MultiEdit might have multiple files + const toolInput = data.tool_input as Record<string, unknown>; + const edits = toolInput.edits as Array<{ file_path: string }>; + if (edits && Array.isArray(edits)) { + for (const edit of edits) { + if (edit.file_path?.toLowerCase().endsWith('.md')) { + isMarkdownFile = true; + filePath = edit.file_path; + break; + } + } + } + } + + // If a markdown file was modified, 
suggest the doc-writer skill + if (isMarkdownFile) { + // Check session cache - only suggest once per session + const session = readSessionCache(PLUGIN_NAME, data.cwd, data.session_id); + + // If already suggested this session, exit silently + if (session?.doc_writer_suggested) { + process.exit(0); + } + + let context = '\n'; + context += `Detected markdown file modification: ${filePath}\n\n`; + context += 'ESSENTIAL SKILL:\n'; + context += ' → doc-writer:writing-documentation\n\n'; + context += 'RECOMMENDED AGENT:\n'; + context += ' → doc-writer:docs-reviewer\n'; + context += ''; + + // Return JSON with hookSpecificOutput for PostToolUse + // Note: decision is undefined (no blocking), but additionalContext should still be provided + const output: SyncHookJSONOutput = { + hookSpecificOutput: { + hookEventName: 'PostToolUse', + additionalContext: context, + }, + }; + + console.log(JSON.stringify(output)); + + // Mark as suggested in session cache + const sessionCache: SessionCache = { + doc_writer_suggested: true, + first_triggered: new Date().toISOString(), + triggered_by: filePath, + }; + writeSessionCache(PLUGIN_NAME, data.cwd, data.session_id, sessionCache); + } + + // Exit 0 = success, additionalContext is added to context if provided + process.exit(0); + } catch (err) { + console.error('Error in doc-writer-suggest hook:', err); + process.exit(1); + } +} + +main().catch((err) => { + console.error('Uncaught error:', err); + process.exit(1); +}); diff --git a/hooks/hooks.json b/hooks/hooks.json new file mode 100644 index 0000000..6e32e82 --- /dev/null +++ b/hooks/hooks.json @@ -0,0 +1,17 @@ +{ + "description": "Auto-suggests doc-writer skill when markdown files are modified", + "hooks": { + "PostToolUse": [ + { + "matcher": "Write|Edit|MultiEdit", + "hooks": [ + { + "type": "command", + "command": "${CLAUDE_PLUGIN_ROOT}/hooks/doc-writer-suggest.ts", + "timeout": 30 + } + ] + } + ] + } +} diff --git a/plugin.lock.json b/plugin.lock.json new file mode 100644 index 
0000000..24c0984 --- /dev/null +++ b/plugin.lock.json @@ -0,0 +1,69 @@ +{ + "$schema": "internal://schemas/plugin.lock.v1.json", + "pluginId": "gh:codethread/claude-code-plugins:plugins/doc-writer", + "normalized": { + "repo": null, + "ref": "refs/tags/v20251128.0", + "commit": "a48847d03a02868034ac603ade4b4ed02c67eec4", + "treeHash": "7362b73192ce9452d8bd6a9b91118ccf51065dbb5cdb661757a9d8d1b1991e89", + "generatedAt": "2025-11-28T10:15:43.739247Z", + "toolVersion": "publish_plugins.py@0.2.0" + }, + "origin": { + "remote": "git@github.com:zhongweili/42plugin-data.git", + "branch": "master", + "commit": "aa1497ed0949fd50e99e70d6324a29c5b34f9390", + "repoRoot": "/Users/zhongweili/projects/openmind/42plugin-data" + }, + "manifest": { + "name": "doc-writer", + "description": "Write clear, effective technical documentation following industry-proven patterns from exemplary projects (React, Rust, Stripe, Twilio) and authoritative style guides (Google, Diátaxis, Write the Docs, Microsoft). Use when creating or improving READMEs, API docs, tutorials, guides, or any technical documentation.", + "version": "1.0.0" + }, + "content": { + "files": [ + { + "path": "README.md", + "sha256": "b6502da5fc049534e11113a996de80425d6c8367aefe08dec1293bb1a22bfa8d" + }, + { + "path": "agents/docs-reviewer.md", + "sha256": "da187349757f83fe01940c20389db2c672027abd95ed9dcab57bb8bc178cc3bc" + }, + { + "path": "hooks/hooks.json", + "sha256": "7b57fc5b35c35bedbfce8cbe33f39ad89387c821e8d23b78be3c99d4815a5054" + }, + { + "path": "hooks/doc-writer-suggest.ts", + "sha256": "49ce003fe40b471dcf1977dfaa3452d9bb9ab87d0c3fe5edd429104e363645cf" + }, + { + "path": ".claude-plugin/plugin.json", + "sha256": "fcc0eecc1fffaf48b1989045818b84ddd85742b4f4cbbaefb286b3ad5c506925" + }, + { + "path": "skills/writing-documentation/SKILL.md", + "sha256": "b8cba8305ddb632f1b6e2ebeec17ddb33f3973ed6683f4cabcee3e2e787622d3" + }, + { + "path": "skills/writing-documentation/references/best-practices.md", + "sha256": 
"5a389f83161923f97592b5eb4a9fa18428ee40903a04e441b1633a0b31755b85" + }, + { + "path": "skills/writing-documentation/references/llm-pitfalls.md", + "sha256": "1d9f96ca717bb62fcc8880eec0ef9f9a859c7b6a16eb1822c36a9655bbd157de" + }, + { + "path": "skills/writing-documentation/references/exemplary-projects.md", + "sha256": "c43a0f37ab3bc05b15b44e512bc192761f928376044769e3418afaaacce86123" + } + ], + "dirSha256": "7362b73192ce9452d8bd6a9b91118ccf51065dbb5cdb661757a9d8d1b1991e89" + }, + "security": { + "scannedAt": null, + "scannerVersion": null, + "flags": [] + } +} \ No newline at end of file diff --git a/skills/writing-documentation/SKILL.md b/skills/writing-documentation/SKILL.md new file mode 100644 index 0000000..fcc5a96 --- /dev/null +++ b/skills/writing-documentation/SKILL.md @@ -0,0 +1,173 @@ +--- +name: writing-documentation +description: Write clear, effective technical documentation following industry-proven patterns from exemplary projects and authoritative style guides, with built-in countermeasures for common LLM documentation issues +--- + +# Writing Documentation Skill + +## Documentation Types (Diátaxis Framework) + +- **Tutorial** - Learning-oriented, step-by-step +- **How-to** - Task-oriented, specific problems +- **Reference** - Technical specifications +- **Explanation** - Clarifies concepts + +Details in `references/best-practices.md`. + +## Writing for Claude Code + +**CRITICAL**: When writing documentation that Claude reads (SKILL.md, CLAUDE.md, commands, agents): + +### 1. Test Claude's Base Knowledge First + +Verify what Claude already knows: + +```bash +# Use haiku for cost-effective testing (gives same quality answers as sonnet) +claude --print --model haiku "Do NOT use any skills. How would you [perform task]?" +claude --print --model haiku "Do NOT use any skills. When should you [make decision]?" +``` + +### 2. 
Document ONLY Unique Patterns + +Include only what Claude wouldn't naturally do: + +- ✓ Opinionated architectural choices +- ✓ Counter-intuitive decisions +- ✓ Project-specific conventions +- ✓ Non-default patterns + +Remove redundant content: +- ✗ Standard library usage +- ✗ Common best practices +- ✗ Well-known patterns +- ✗ Basic language features + +### 3. Example: React Skill Reduction + +Testing revealed Claude knows TanStack Query/Zustand/RTL patterns but doesn't default to: +- "Test stores, not components" (counter-cultural) +- "NO useState for complex logic" (prescriptive) +- "Inline actions unless repeated 2+" (specific rule) + +Result: 328→125 lines (-62%) by documenting only unique opinions. + +## Verifying Technical Accuracy + +### API Verification Workflow + +When documenting unfamiliar APIs or libraries: + +**1. Launch researcher agent:** + +``` +Use Task tool to launch researcher agent to verify [API/library] documentation +``` + +Researcher agent uses Context7 MCP to fetch official API docs and verify method signatures. + +**2. Read the codebase:** + +For internal/project APIs: + +``` +Read relevant source files to verify method signatures exist +``` + +**3. State version requirements:** +- Specify versions when certain: `# Using pandas 2.0+ DataFrame.merge()` +- Add verification note when uncertain: `# Verify this API exists in your version` + +**4. Direct to official docs:** +Add link to authoritative source. + +### Security Verification + +**Required checks before documenting code:** + +1. **SQL**: Parameterized queries, never string concatenation +2. **YAML**: `yaml.safe_load()`, never `yaml.load()` +3. **Credentials**: Environment variables, never hard-coded +4. **Input**: Always validate before processing +5. **Errors**: Handle network/file operations + +Use researcher agent if uncertain about security best practices. + +## Code Example Requirements + +### Every Example Must Include + +1. **All imports and dependencies** +2. 
**Complete, copy-paste ready code** (no ellipsis or pseudo-code) +3. **Expected output** when relevant +4. **Error handling** for production use +5. **Context explaining "why"** + +Example: + +```python +# Process in batches of 1000 to avoid memory exhaustion. +# Testing: smaller (100) = 3x overhead, larger (10000) = OOM on 8GB systems. +BATCH_SIZE = 1000 + +for batch in chunks(items, BATCH_SIZE): + process_batch(batch) +``` + +### Production-Ready Requirements + +Include when relevant: +- Authentication/authorization +- Logging for debugging +- Rate limiting and retries +- Timeout handling +- Resource cleanup + +See `references/best-practices.md` for complete production-ready examples. + +## Using docs-reviewer Agent + +After writing documentation: + +``` +Use docs-reviewer agent to ruthlessly simplify +``` + +The agent challenges every element's necessity, asking "Would the documentation still be clear without this?" + +**Critical for**: +- CLAUDE.md, SKILL.md files +- Slash commands and agents +- Any Claude Code documentation + +Eliminates: +- Motivational language +- Redundant examples +- Unnecessary context that doesn't change behavior + +## LLM Self-Checklist + +Before publishing: + +**Verification:** +- [ ] Verified APIs exist (researcher agent, codebase read, or official docs) +- [ ] Code is complete and secure (all imports, parameterized queries, error handling) +- [ ] Examples are production-ready (auth, logging, retries, timeouts) + +**Content Quality:** +- [ ] Context and "why" explained, not just "what" +- [ ] Specific details, not generic (40ms not "significant", name technologies not "various aspects") +- [ ] Consistent terminology throughout +- [ ] Appropriate hedging (direct for facts, hedge only when uncertain) + +**Claude Code Docs:** +- [ ] Tested base knowledge with `claude --print` +- [ ] Documented only unique patterns +- [ ] Applied docs-reviewer agent for ruthless simplification + +## References + +Research foundation in `references/`: +- 
`exemplary-projects.md` - Analysis of well-documented projects +- `best-practices.md` - Authoritative style guide synthesis +- `llm-pitfalls.md` - LLM-specific quality issues diff --git a/skills/writing-documentation/references/best-practices.md b/skills/writing-documentation/references/best-practices.md new file mode 100644 index 0000000..a310c39 --- /dev/null +++ b/skills/writing-documentation/references/best-practices.md @@ -0,0 +1,428 @@ +# Documentation Best Practices & Guidelines + +Synthesized from authoritative sources: Google Developer Documentation Style Guide, Diátaxis Framework, Write the Docs community, and Microsoft Writing Style Guide. + +## Core Documentation Frameworks + +### Diátaxis Framework +**Created by:** Daniele Procida (Canonical) +**Adopted by:** Python, Ubuntu, OpenStack, Gatsby, Cloudflare, and hundreds of other projects + +**Four Interconnected Documentation Types:** + +1. **Tutorials** (Learning-focused) + - Teaches foundational concepts + - Takes learner by the hand + - Learning-oriented, not goal-oriented + - Example: "Getting started with Python" + +2. **How-To Guides** (Task-oriented) + - Solves specific problems + - Goal-oriented practical steps + - Assumes some knowledge + - Example: "How to configure authentication" + +3. **Technical Reference** (Information-oriented) + - Factual lookup documentation + - Describes the machinery + - Accurate and complete + - Example: "API Reference" + +4. **Explanation** (Understanding-oriented) + - Conceptual understanding + - Clarifies and enlightens + - Provides context + - Example: "Understanding async/await" + +**Key Insight:** Mixing these types causes confusion. Keep them separate but linked. + +--- + +## Voice and Tone (Universal Standards) + +### Person, Voice, and Tense + +| Element | Recommendation | Example | +|---------|----------------|---------| +| **Person** | Second person ("you") | "You can configure..." not "We can configure..." 
| +| **Voice** | Active voice | "The API returns JSON" not "JSON is returned" | +| **Tense** | Present tense | "The function processes..." not "will process..." | +| **Mood** | Imperative for instructions | "Click Submit" not "You should click Submit" | + +**Why This Matters:** +- **Present tense:** Computers have no past/future; everything happens now +- **Active voice:** Clarifies who performs actions; easier to understand +- **Second person:** Directly addresses readers; increases engagement +- **Imperative:** More concise and direct for instructions + +### Tone Characteristics + +**Recommended:** +- Conversational but professional +- Friendly without being frivolous +- Warm and relaxed (Microsoft: "ready to lend a hand") +- Technically accurate without unnecessary jargon + +**Avoid:** +- Overly casual language +- Humor that doesn't translate globally +- Marketing speak in technical docs +- Condescending or patronizing tone + +--- + +## Content Organization + +### Documentation Structure (Standard Pattern) + +**For README/Landing Pages:** +1. **Title** (only one H1 heading) +2. **Brief description** (what and why) +3. **Installation/Setup** +4. **Quick Start** (get running in minutes) +5. **Usage examples** +6. **Features** +7. **API Reference** (or link to separate docs) +8. **Contributing/License** + +**For API Documentation:** +1. **Overview** (what it does, why it exists) +2. **Authentication** +3. **Quick Start** (first successful request) +4. **Core Concepts** (key abstractions) +5. **API Reference** (endpoints, parameters, responses) +6. **Error Handling** +7. **Examples** (common use cases) + +**For Tutorials:** +1. **Prerequisites** (what reader needs to know) +2. **Learning objectives** (what they'll accomplish) +3. **Step-by-step instructions** +4. **Expected output** (validate success) +5. 
**Next steps** (where to go from here) + +### Heading Hierarchy + +```markdown +# Project Title (ONLY ONE H1) + +## Major Section (H2) + +### Subsection (H3) + +#### Sub-subsection (H4) +``` + +**Rules:** +- Only one H1 per page (the title) +- Don't skip heading levels (no H2 → H4) +- Use sentence case for headings +- Make headings descriptive and scannable + +--- + +## Writing Style Guidelines + +### Word Choice and Grammar + +**Use:** +- Serial (Oxford) commas in lists +- Standard American spelling (if applicable) +- Descriptive link text ("See the installation guide" not "click here") +- Consistent terminology throughout +- Short, clear sentences +- Concrete examples + +**Avoid:** +- Ambiguous pronouns (this, that, it without clear antecedent) +- Future tense for current features +- Passive voice when active is clearer +- Unnecessarily long words or phrases +- Jargon without definition + +### Formatting Standards + +**Code-related text:** +- Use `code font` for: file names, paths, variables, function names, commands +- Use **bold** for: UI elements users click, emphasis +- Use *italics* sparingly (mainly for introducing new terms) + +**Lists:** +- **Numbered lists:** For sequential steps +- **Bulleted lists:** For non-sequential items +- **Description lists:** For term-definition pairs + +--- + +## Code Examples and Documentation + +### Essential Elements for Code Examples + +1. **Context** - Explain what the code does and why +2. **Complete examples** - Code should be runnable (or clearly marked otherwise) +3. **Expected output** - Show what happens when code runs +4. **Comments** - Explain "why," not "what" +5. 
**Error handling** - Show how to handle failures + +### Code Comment Quality + +**Bad:** States the obvious +```javascript +i++; // increment i +const user = getUser(); // get the user +``` + +**Good:** Explains reasoning +```javascript +// API requires explicit null to avoid nested table joins +// We aggregate extra data in the next query +const userInfo = await externalUserApi(userId, null); +``` + +### API Documentation Standards + +For every method/function, document: + +**Parameters:** +- Name and description +- Data type +- Required vs. optional +- Default values +- Constraints (min/max, allowed values, format) + +**Returns:** +- Return type +- Description of return value +- Possible return states + +**Errors/Exceptions:** +- What errors can occur +- When they occur +- How to handle them + +**Example:** +```javascript +/** + * Calculates the sum of two numbers. + * + * @param {number} a - The first number (required) + * @param {number} b - The second number (required) + * @returns {number} The sum of a and b + * @throws {TypeError} If either parameter is not a number + * + * @example + * const result = sum(5, 3); + * // Returns: 8 + * + * @example + * sum(5, "3"); // Throws TypeError + */ +function sum(a, b) { + if (typeof a !== 'number' || typeof b !== 'number') { + throw new TypeError('Both parameters must be numbers'); + } + return a + b; +} +``` + +--- + +## Audience Awareness + +### Know Your Audience + +**Key Questions:** +- What is their skill level? +- What are they trying to accomplish? +- What do they already know? +- What context do they need? 
+ +**Adaptations:** +- **Beginners:** More context, fewer assumptions, step-by-step guidance +- **Intermediate:** Less hand-holding, focus on patterns and best practices +- **Advanced:** Concise reference, edge cases, performance considerations + +### State Prerequisites + +Always specify what readers should know before starting: + +**Good:** +```markdown +## Prerequisites + +Before starting this tutorial, you should have: +- Basic familiarity with JavaScript +- Node.js 18+ installed +- Understanding of REST APIs +``` + +**Also Good (if none):** +```markdown +## Prerequisites + +This guide assumes no prior knowledge. We'll cover everything from scratch. +``` + +### Serve Multiple Audiences + +**Techniques:** +1. **Progressive disclosure:** Basic info first, "advanced" sections clearly marked +2. **Difficulty indicators:** Label sections (beginner/intermediate/advanced) +3. **Multiple pathways:** Quick start for experienced users, detailed tutorial for beginners +4. **Collapsible sections:** Advanced details hidden until needed +5. **"If you're new to X" sidebars:** Extra context for those who need it + +--- + +## Accessibility and Inclusivity + +### Inclusive Language + +**Use:** +- Gender-neutral language +- "They/their" for singular when gender unknown +- Descriptive terms, not assumptions +- Globally understood examples + +**Avoid:** +- Gendered pronouns when unnecessary +- Cultural references that don't translate +- Idioms and colloquialisms +- Ableist language ("sanity check" → "validation check") + +### Visual Accessibility + +**Images:** +- Always include descriptive alt text +- Use high-resolution or vector images +- Ensure sufficient color contrast +- Don't rely solely on color to convey information + +**Structure:** +- Clear visual hierarchy +- Sufficient whitespace +- Readable font sizes +- Responsive design for mobile + +--- + +## Maintenance and Lifecycle + +### Documentation as Code + +**Principles:** +1. 
**Version control:** Documentation lives in git with code +2. **Review process:** Documentation PRs reviewed like code +3. **CI/CD integration:** Test documentation builds +4. **Automated testing:** Check links, code examples, formatting + +### Keeping Documentation Current + +**Critical Rules:** +- Update documentation in same PR as code changes +- Mark deprecated features clearly +- Date time-sensitive content +- Remove outdated content (don't just mark it) +- Regular audits for accuracy + +**Warning:** Outdated documentation is worse than no documentation. + +--- + +## Actionable Quality Checklist + +### Before Publishing Documentation + +- [ ] **Clarity:** Can a new user accomplish the task? +- [ ] **Accuracy:** Is all information current and correct? +- [ ] **Completeness:** Are all steps and prerequisites covered? +- [ ] **Examples:** Are there working code examples? +- [ ] **Error handling:** Are failure modes documented? +- [ ] **Navigation:** Can users find related topics easily? +- [ ] **Accessibility:** Alt text, clear headings, good contrast? +- [ ] **Consistency:** Terminology and style consistent throughout? +- [ ] **Tested:** Have you actually followed the steps? +- [ ] **Feedback:** Is there a way to report issues? 
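Some of the structural rules in this guide (only one H1 per page, no skipped heading levels) lend themselves to the automated testing mentioned under "Documentation as Code". The following is a minimal sketch of such a check, not part of this plugin: `checkHeadings` and `HeadingIssue` are hypothetical names, it only recognizes ATX-style (`#`) headings, and its fence detection is deliberately simple (it does not handle nested fences of different lengths).

```typescript
// Hypothetical helper (an assumption for illustration, not part of the plugin).
// Reports two heading problems in a markdown string:
//   1. more than one H1 on the page
//   2. a heading that jumps more than one level deeper than the previous heading
interface HeadingIssue {
  line: number; // 1-based line number in the input
  message: string;
}

function checkHeadings(markdown: string): HeadingIssue[] {
  const issues: HeadingIssue[] = [];
  let previousLevel = 0;
  let h1Count = 0;
  let inFence = false;

  markdown.split('\n').forEach((text, index) => {
    // Lines starting with ``` or ~~~ toggle "inside a code block" (simplified).
    if (/^\s*(```|~~~)/.test(text)) {
      inFence = !inFence;
      return;
    }
    if (inFence) {
      return; // '#' inside code blocks is usually a comment, not a heading
    }

    const match = /^(#{1,6})\s+\S/.exec(text);
    if (!match) {
      return;
    }
    const level = (match[1] ?? '').length;

    if (level === 1) {
      h1Count += 1;
      if (h1Count > 1) {
        issues.push({ line: index + 1, message: 'More than one H1 on the page' });
      }
    }
    if (previousLevel > 0 && level > previousLevel + 1) {
      issues.push({
        line: index + 1,
        message: `Heading level jumps from H${previousLevel} to H${level}`,
      });
    }
    previousLevel = level;
  });

  return issues;
}
```

A check like this could run in CI next to a link checker. Calling `checkHeadings` on a page that goes from an H2 straight to an H4 and later adds a second H1 would report one issue for each.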
+ +### Common Quality Issues + +**Structure:** +- Missing prerequisites +- Skipping heading levels +- Too many or too few headings +- Poor information architecture + +**Content:** +- Outdated examples +- Missing error documentation +- Unclear parameter descriptions +- No expected output shown + +**Style:** +- Passive voice overuse +- Inconsistent terminology +- Undefined jargon +- Walls of text without breaks + +--- + +## Documentation Maturity Model + +**Level 1 - Basic:** +- Exists but scattered +- May have outdated sections +- Grammar and formatting issues +- No clear organization + +**Level 2 - Structured:** +- Organized with clear headings +- Consistent formatting +- Basic separation of concerns +- Regular updates + +**Level 3 - Complete:** +- Tutorials, guides, reference, and explanations +- Multiple learning pathways +- Code examples tested +- Error documentation comprehensive + +**Level 4 - Maintained:** +- Updated with code changes +- Audience-aware content +- Accessible and inclusive +- Feedback mechanisms active + +**Level 5 - Excellent:** +- All above plus: +- Searchable and navigable +- Multiple formats (web, PDF, etc.) 
+- Interactive examples +- Analytics-driven improvements +- Community contributions welcomed + +--- + +## Quick Reference: Style Decisions + +| Aspect | Recommendation | +|--------|----------------| +| **Headings** | Sentence case | +| **Lists** | Serial comma | +| **Code** | Inline `backticks` | +| **UI elements** | **Bold** | +| **Links** | Descriptive text | +| **Person** | Second person ("you") | +| **Voice** | Active | +| **Tense** | Present | +| **Tone** | Conversational, professional | +| **Paragraphs** | Short (2-4 sentences) | +| **Examples** | Required for all APIs | + +--- + +## Sources and Further Reading + +- **Google Developer Documentation Style Guide:** https://developers.google.com/style +- **Diátaxis Framework:** https://diataxis.fr/ +- **Write the Docs:** https://www.writethedocs.org/guide/ +- **Microsoft Writing Style Guide:** https://learn.microsoft.com/en-us/style-guide/ diff --git a/skills/writing-documentation/references/exemplary-projects.md b/skills/writing-documentation/references/exemplary-projects.md new file mode 100644 index 0000000..741b370 --- /dev/null +++ b/skills/writing-documentation/references/exemplary-projects.md @@ -0,0 +1,222 @@ +# Exemplary Documentation Projects + +This reference document synthesizes research on projects widely recognized for exceptional documentation quality. + +## Framework Documentation + +### React (react.dev) +**Why Exceptional:** +- Prioritizes learning outcomes over comprehensive coverage +- Interactive sandboxes reduce friction +- Progressive complexity with clear learning paths +- Removed gatekeeping; accessible to newcomers + +**Key Patterns:** +- Learning path architecture (Get Started → Learn → Reference) +- Conversational tone, example-driven teaching +- "Try it out" sections encourage experimentation +- Progressive disclosure: simple examples first, edge cases later +- Each section includes "What's Next?" 
guidance + +**Notable Examples:** +- useState Hook Documentation: Opens with 3-line working example +- Tic-tac-toe Tutorial: Complete interactive app built step-by-step +- Dual presentation pathways: Quick answers + deep dives + +--- + +### Rust Documentation +**Why Exceptional:** +- Built-in documentation testing (`cargo test --doc`) +- Comprehensive philosophy: "if public, must be documented" +- Tooling integration standardizes quality across ecosystem +- Community RFC-based conventions ensure consistency + +**Key Patterns:** +- Standard structure: summary → explanation → examples → errors → panics +- All code examples compile and execute as tests +- Smart boilerplate hiding with `#` prefix +- Explicit documentation of failure modes + +**Notable Examples:** +- Vec Documentation: Multiple usage examples, guarantees section +- Result Documentation: Pattern matching, error propagation patterns +- Tested examples guarantee correctness + +--- + +### Django Documentation +**Why Exceptional:** +- Learn-by-doing philosophy with real projects +- Multiple learning pathways (tutorials, topic guides, reference, how-tos) +- Comprehensive coverage from intro to production +- Progressive revelation of complexity + +**Key Patterns:** +- Tutorials take you "by the hand" through concrete projects +- Show directory structure visually +- Platform-specific variants (Windows vs. Linux) +- Explain reasoning: "why does Django work this way?" 
+- Anticipate common problems + +**Notable Examples:** +- "Writing your first Django app" tutorial: 7-part narrative building on previous parts +- Multi-part tutorials create continuity and momentum +- Admin interface-first teaching builds early confidence + +--- + +### Vue.js Documentation +**Why Exceptional:** +- Flexibility recognition: presents multiple API approaches equally +- Multiple entry points for different learning styles +- Clear prerequisites stated upfront +- Progressive framework philosophy reflected in docs + +**Key Patterns:** +- Dual API presentation (Options vs. Composition) +- Multiple learning paths: "Try it" vs. "Read Guide" vs. "Examples" +- Visual SFC (Single File Component) structure diagrams +- Progressive disclosure: scales from simple to advanced +- "Try in Playground" links for hands-on learning + +**Notable Examples:** +- Reactivity Fundamentals: Side-by-side API comparison +- Component Basics: Progressive complexity (template → props → events) +- Prerequisites stated clearly: "assumes basic HTML, CSS, JS familiarity" + +--- + +## Developer Tools & Platforms + +### Stripe +**Why Exceptional:** +- Industry gold standard for API documentation +- "Don't overdo it" philosophy: elegant simplicity +- Balances comprehensiveness with navigability +- Assumes developer intelligence + +**Key Patterns:** +- Two-panel layout (explanation + code side-by-side) +- Clean aesthetic with whitespace +- Single-page format with anchor navigation +- Multi-language code samples inline +- Robust search functionality +- Seamless topic linking + +**Notable Examples:** +- Quickstart Guide: Multiple languages inline, progressive complexity +- Error Handling: Transforms errors into actionable guidance +- API Reference: Clear request/response examples + +--- + +### Twilio +**Why Exceptional:** +- Gold standard for intuitive structure +- Makes complex concepts accessible +- Supports multiple programming languages seamlessly +- Use case-driven organization + +**Key 
Patterns:** +- Two-panel layout with multi-language samples +- Beginner-friendly explainers ("What's a REST API, anyway?") +- Progressive learning paths for varying experience +- Practical tutorials with screenshots +- Language selector at top of every page +- Copy-paste ready code + +**Notable Examples:** +- Quickstart Guides: Simple actionable steps with screenshots +- Webhooks Documentation: Explains concept before technical details +- Customer story integration: Real examples demonstrate value + +--- + +### Slack +**Why Exceptional:** +- Balances simplicity with depth +- Plain language without dumbing down +- Difficulty indicators help self-assessment +- Accessible to junior devs, comprehensive for advanced + +**Key Patterns:** +- Difficulty labels (beginner/intermediate/advanced) +- Plain language and everyday vocabulary +- Clear "next steps" guidance +- Example-driven explanations +- Concept-first teaching ("why" before "how") + +**Notable Examples:** +- Getting Started Guide: Clear progression from basics +- Interactive App Tutorials: Real-world scenarios +- Events API: Complex concepts in accessible language + +--- + +### Vercel +**Why Exceptional:** +- Integrated examples repository +- Template-driven learning +- TypeScript-first documentation +- Multiple learning formats + +**Key Patterns:** +- Examples repository as documentation +- Template ecosystem for common use cases +- Progressive complexity (hello-world to sophisticated) +- Visual workflow diagrams +- Links to runnable GitHub repos +- Changelog-driven updates + +**Notable Examples:** +- Deployment Documentation: Links to working example repos +- Edge Functions Guide: TypeScript types + runnable examples +- Open-source template library + +--- + +## Universal Success Patterns + +### Structural Patterns +1. **Two-panel or multi-column layout** - Code alongside explanations +2. **Clear navigation** - Persistent sidebars or top navigation +3. 
**Getting Started sections** - Quick wins before comprehensive coverage +4. **Progressive disclosure** - Basic concepts first, advanced clearly marked +5. **Multiple learning pathways** - Tutorials, guides, reference, how-tos + +### Content Patterns +1. **Example-driven teaching** - Show before explaining +2. **Multi-language support** - Code in 3-5+ languages +3. **Real use cases** - Show what you can build +4. **Error documentation** - Comprehensive error guides +5. **Copy-paste readiness** - Code works immediately + +### Writing Style +1. **Conversational but professional** - Accessible without being casual +2. **Active voice** - "Create an API key" not "An API key can be created" +3. **Short sentences and paragraphs** - Digestible chunks +4. **Task-oriented structure** - Organized around what users want to accomplish +5. **Minimal jargon** - Define technical terms when necessary +6. **Examples before concepts** - Show first, explain second + +### Quality Markers +1. **Respect for user time** - Clear "why this matters" upfront +2. **Assumption of competency** - Don't over-explain basics +3. **Practical focus** - Real-world over theoretical +4. **Multiple learning styles** - Reading, code, visuals, examples +5. **Regular maintenance** - Changelogs show active updates +6. **Feedback mechanisms** - Ways to report issues + +## Common Anti-Patterns to Avoid + +1. **Abstract explanations before examples** - Show, then explain +2. **Alphabetical API organization** - Hard to discover patterns +3. **Outdated examples** - Damages credibility +4. **Unstated prerequisites** - Readers get lost +5. **Dense paragraphs** - Cognitive overload +6. **Separating "why" from "how"** - Context matters +7. **Jargon without definition** - Excludes learners +8. **Missing error documentation** - Users struggle to debug +9. **Inconsistent terminology** - Creates confusion +10. 
**No visual hierarchy** - Hard to scan diff --git a/skills/writing-documentation/references/llm-pitfalls.md b/skills/writing-documentation/references/llm-pitfalls.md new file mode 100644 index 0000000..cfe939f --- /dev/null +++ b/skills/writing-documentation/references/llm-pitfalls.md @@ -0,0 +1,774 @@ +# LLM Documentation Pitfalls: Problems and Countermeasures + +This reference documents systematic quality issues in LLM-generated technical documentation, based on 2024-2025 research across academic papers, industry studies, and developer communities. + +## Overview + +Research shows that even advanced models like GPT-4 and Claude-3 produce correct code only 65.2% and 56.7% of the time respectively. Over 40% of AI-generated code contains security vulnerabilities. Beyond code quality, LLM documentation suffers from stylistic tells, factual hallucinations, missing context, and formulaic patterns that reduce trust and usability. + +**Key insight**: These issues are systematic, not random. They stem from fundamental LLM characteristics: probabilistic generation, training data limitations, and lack of verification mechanisms. + +--- + +## CATEGORY 1: Accuracy and Factual Issues + +### 1.1 Hallucinations + +**Problem**: LLMs confidently generate false information—non-existent APIs, fabricated parameters, invented research papers. 
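A fabricated API of the kind described above can often be caught mechanically before publication. The following is a minimal sketch, assuming the library under review is installed locally; `api_exists` is a hypothetical helper written for this illustration, not part of any library. It resolves a dotted name by importing the longest module prefix and then walking the remaining attributes.

```python
import importlib


def api_exists(dotted_name: str) -> bool:
    """Return True if `dotted_name` resolves to a real module attribute.

    Hypothetical helper for illustration: it imports the longest
    importable module prefix, then follows the remaining attribute names.
    """
    parts = dotted_name.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            # Not a module at this prefix length; try a shorter prefix.
            continue
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


if __name__ == "__main__":
    for name in ("json.loads", "json.load_smart", "os.path.join"):
        print(name, api_exists(name))
```

A check like this only proves that a name exists in the installed version; it says nothing about whether the arguments or behavior shown in the documentation are right, so it complements testing rather than replacing it.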
+ +**Why it occurs**: +- Predict plausible text sequences, not verified facts +- 19.7% package hallucination rate (205,474+ unique fake packages) +- 50%+ failures from non-existent API invocations +- Training data contamination with outdated/incorrect code + +**Examples**: +```python +# HALLUCINATED - pandas.DataFrame.merge_smart() doesn't exist +df1.merge_smart(df2, on='id') + +# HALLUCINATED - requests.get_json() isn't a method +data = requests.get_json('https://api.example.com') +``` + +**Detection strategies**: +- Semantic entropy analysis (most robust method) +- Multi-sampling consistency checks +- Automated validation against package registries +- API stub verification + +**Countermeasures**: +- Use RAG (Retrieval-Augmented Generation) with current docs +- Verify all APIs against official documentation +- Test all code examples before publication +- Flag low-frequency APIs for extra review + +--- + +### 1.2 Outdated and Version-Specific Information + +**Problem**: Training data cutoffs cause documentation to reference deprecated methods and obsolete patterns. 
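One way to surface deprecated calls in an example is to run it with `DeprecationWarning` promoted to an error. The sketch below uses only the standard `warnings` module; `legacy_fetch` and `uses_deprecated_api` are hypothetical names invented for this illustration, and the approach only works for libraries that actually emit `DeprecationWarning`.

```python
import warnings


def legacy_fetch():
    """Stand-in for a deprecated library call (hypothetical)."""
    warnings.warn("legacy_fetch() is deprecated", DeprecationWarning, stacklevel=2)
    return "data"


def uses_deprecated_api(example) -> bool:
    """Run `example` with DeprecationWarning turned into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            example()
        except DeprecationWarning:
            return True
    return False


if __name__ == "__main__":
    print(uses_deprecated_api(legacy_fetch))            # deprecated call
    print(uses_deprecated_api(lambda: sorted([3, 1])))  # no deprecation
```

APIs that have already been removed will not warn at all; they fail with `AttributeError` or `ImportError`, which is why running the example remains necessary.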
+ +**Why it occurs**: +- Static training data (e.g., GPT-4o trained on DuckDB 0.10.2 while current is 1.3.2) +- Frequency bias: deprecated high-frequency APIs get generated preferentially +- 70-90% deprecation usage rate when context is outdated +- "3-month-old documentation can be completely outdated" + +**Examples**: +```python +# Deprecated in Python 3.2, removed in 3.9 +for child in root.getchildren(): # LLMs still suggest this + process(child) + +# Modern approach: +for child in root: + process(child) +``` + +**Countermeasures**: +- Always specify exact versions in documentation +- Use RAG to fetch current API documentation +- Run linters that flag deprecated usage +- Include version compatibility matrices +- Regular documentation audits for version drift + +--- + +### 1.3 Mixed Accurate/Inaccurate Information + +**Problem**: Most dangerous pattern—correct elements mask errors, making detection difficult even for experts. + +**Why it occurs**: +- LLMs blend information from multiple sources +- High-confidence patterns from different versions get mixed +- Temporal conflation (can't distinguish "2020 code" vs "2024 code") + +**Examples**: +```python +# Mixing Python 2 and Python 3 +from __future__ import print_function # Python 2 compatibility +response = urllib2.urlopen(url) # Python 2 (removed in Python 3) +data = response.read().decode('utf-8') # Python 3 style +``` + +**Countermeasures**: +- Test entire code examples, not just parts +- Cross-reference multiple authoritative sources +- Use version-pinned dependency files in context +- Automated compatibility checking + +--- + +## CATEGORY 2: Code Quality Issues + +### 2.1 Code That Doesn't Work + +**Problem**: 35-70% failure rate depending on model. Contains syntax errors, logic flaws, off-by-one errors. 
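Because these failures only show up when code runs, one lightweight safeguard is to keep each example as a doctest and execute it before publishing. The sketch below uses the standard `doctest` module; `median`, `buggy_average`, and `count_doctest_failures` are hypothetical names chosen for this illustration.

```python
import doctest


def median(values):
    """Return the median of a non-empty list of numbers.

    >>> median([3, 1, 2])
    2
    >>> median([4, 1, 3, 2])
    2.5
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def buggy_average(values):
    """Deliberately wrong: multiplies where it should divide.

    >>> buggy_average([2, 4])
    3.0
    """
    return sum(values) * len(values)


def count_doctest_failures(func) -> int:
    """Run the docstring examples of `func` and count the failing ones."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failures = 0
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        # Discard the failure report text; only the count matters here.
        failures += runner.run(test, out=lambda text: None).failed
    return failures


if __name__ == "__main__":
    print(count_doctest_failures(median))
    print(count_doctest_failures(buggy_average))
```

The same idea scales to a whole documentation set by running `python -m doctest` over the source files in continuous integration.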
+ +**Why it occurs**: +- Token-by-token prediction doesn't ensure correctness +- Training objective optimizes for pattern matching, not accuracy +- No execution verification during generation + +**Examples**: +```python +# Logic error: Missing duplicate detection +def find_unique(items): + result = [] + for item in items: + result.append(item) # Should check if already in result + return result + +# Math error: Wrong operation +average = sum(values) * len(values) # Should divide, not multiply + +# Off-by-one error +middle = arr[len(arr) // 2 + 1] # Overshoots the middle for every length; IndexError when len(arr) <= 2 +``` + +**Countermeasures**: +- Execute all examples in sandboxed environments +- Automated testing with comprehensive test suites +- Static analysis (linters, type checkers) +- Iterative refinement (LlmFix-style approaches show 9.5% improvement) + +--- + +### 2.2 Incomplete Code Examples + +**Problem**: Missing imports, setup steps, configuration, and contextual information needed for execution. + +**Why it occurs**: +- Context window limitations +- Training data fragmentation (snippets lack full context) +- Implicit knowledge assumption + +**Examples**: +```python +# Generated code +df = pd.read_csv('data.csv') + +# Missing: +# import pandas as pd +# pip install pandas +``` + +**Countermeasures**: +- Prompt for "complete, runnable code with all imports" +- Dependency checking against registries (npm, PyPI, etc.) +- Template-based generation with required structure +- "Copy-paste ready" standard + +--- + +### 2.3 Security Vulnerabilities + +**Problem**: 40%+ of AI-generated code contains security flaws. Java shows a 70%+ security failure rate.
+ +**Why it occurs**: +- Training on vulnerable code from open-source datasets +- Security not prioritized unless explicitly prompted +- Simpler insecure patterns appear more frequently + +**Examples**: +```python +# SQL Injection vulnerability +def get_user(username): + query = f"SELECT * FROM users WHERE username = '{username}'" + return db.execute(query) +# Attack: username = "admin' OR '1'='1" + +# Secure version: +def get_user(username): + query = "SELECT * FROM users WHERE username = ?" + return db.execute(query, (username,)) +``` + +```python +# Unsafe YAML loading +config = yaml.load(f) # Allows arbitrary code execution + +# Secure: +config = yaml.safe_load(f) +``` + +**Countermeasures**: +- Security-focused prompts ("secure code with input validation") +- Static security analysis (SAST tools: Snyk, Semgrep, CodeQL) +- Dependency vulnerability scanning +- Security review for critical code paths +- Default to secure patterns in templates + +--- + +### 2.4 Missing Error Handling and Edge Cases + +**Problem**: Implements "happy path" only. Systematically overlooks null values, boundary conditions, empty collections. 
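To make the contrast with happy-path code concrete, here is a minimal sketch of two small helpers that handle empty input and a zero denominator explicitly. The function names and the choice to return `None` or raise `ValueError` are assumptions made for this illustration; the right policy depends on the calling code.

```python
from typing import Optional, Sequence


def get_average(numbers: Optional[Sequence[float]]) -> Optional[float]:
    """Return the mean, or None when there is nothing to average."""
    if not numbers:  # covers both None and an empty sequence
        return None
    return sum(numbers) / len(numbers)


def get_percentage(value: float, total: float) -> float:
    """Return value as a percentage of total, rejecting a zero total."""
    if total == 0:
        raise ValueError("total must be non-zero")
    return (value / total) * 100


if __name__ == "__main__":
    print(get_average([]))
    print(get_average([2, 4]))
    print(get_percentage(1, 4))
```

Returning `None` pushes the decision to the caller, while raising makes the failure loud; documentation should state which behavior an example chose and why.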
+ +**Why it occurs**: +- Training data bias toward success cases +- Error handling adds verbosity +- Edge cases underrepresented in training data + +**Examples**: +```python +# Crashes on empty input +def get_average(numbers): + return sum(numbers) / len(numbers) # ZeroDivisionError when numbers = [] + +# Missing None checks +def get_user_name(user): + return user.profile.name.upper() # AttributeError if user, profile, or name is None + +# No boundary checking +def get_percentage(value, total): + return (value / total) * 100 # ZeroDivisionError if total = 0 +``` + +**Countermeasures**: +- Explicit prompts: "Handle null values, empty arrays, and boundary conditions" +- Test-driven approach with edge case test suites +- Defensive programming templates +- Automated edge case testing (null, empty, max values, Unicode) + +--- + +### 2.5 Inconsistent Code Style + +**Problem**: 24 types of style inconsistencies across 5 dimensions, affecting maintainability despite functional correctness. + +**Why it occurs**: +- Training data diversity (varied conventions) +- Context-dependent generation +- Systematic biases (avoiding blank lines, avoiding comprehensions) + +**Examples**: +```python +# Inconsistent spacing +def process1(items): + result=[] # No spaces + +def process2(items): + result = [] # With spaces + +# Inconsistent naming +userId = get_current_user() # camelCase +user_data = fetch_user_data(userId) # snake_case +UserSettings = load_settings() # PascalCase +``` + +**Countermeasures**: +- Specify style guides in prompts ("Follow PEP 8") +- Automated formatters (Black, Prettier, gofmt) +- Linter enforcement +- Few-shot examples with consistent style + +--- + +## CATEGORY 3: Writing Style Problems + +### 3.1 Verbal Tics and Repetitive Patterns + +**Problem**: Overuse of specific phrases makes AI authorship instantly recognizable.
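Recognizable phrases are easy to scan for automatically. The sketch below is a minimal, assumption-laden example: the phrase list is a small sample drawn from this section, `find_tells` is a hypothetical helper, and matching is whole-phrase only, so inflected forms such as "leverages" are not counted.

```python
import re

# Sample phrase list for illustration; extend it for real use.
TELLTALE_PHRASES = (
    "it's worth noting that",
    "delve into",
    "in the realm of",
    "leverage",
    "pivotal",
)


def find_tells(text: str) -> dict:
    """Count case-insensitive whole-phrase matches for each telltale phrase."""
    lowered = text.lower()
    hits = {}
    for phrase in TELLTALE_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, lowered))
        if count:
            hits[phrase] = count
    return hits


if __name__ == "__main__":
    sample = "It's worth noting that we should delve into the pivotal role of caching."
    print(find_tells(sample))
```

A scan like this flags candidates for a human editor; a flagged word is not automatically wrong, since "robust" or "leverage" can be the precise term in some technical contexts.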
+ +**Common tells**: +- "It's worth noting that" +- "Keep in mind" +- "Delve into" +- "In the realm of" +- "Tapestry" +- "Landscape" +- "Leverage" +- "Robust" +- "Pivotal" + +**Why it occurs**: These are statistical attractors, phrases that co-occur frequently in training data. + +**Examples**: +```markdown +<!-- LLM-style --> +It's worth noting that we should delve into the pivotal role of machine learning +in the realm of automation. That being said, it's important to note that this +cutting-edge technology leverages sophisticated algorithms. + +<!-- Revised --> +Machine learning plays a crucial role in automation. This advanced technology +uses sophisticated algorithms. +``` + +**Countermeasures**: +- Scan for telltale phrases and replace with plain language +- Use varied vocabulary +- Avoid formulaic transitions +- Edit for conciseness + +--- + +### 3.2 Over-Explanation and Verbosity + +**Problem**: Padding with unnecessary words, redundant explanations, obvious statements. + +**Why it occurs**: +- RLHF reward models favor longer outputs +- "Verbosity compensation" when uncertain +- No first-hand experience to draw on, so padding fills the space + +**Examples**: +```markdown +<!-- LLM-style --> +In today's fast-paced business environment, it's important to note that companies +need to engage in the implementation of optimization strategies in order to +streamline their operations in a way that improves efficiency. + +<!-- Revised --> +Companies should implement optimization strategies to streamline operations and +improve efficiency. +``` + +**Countermeasures**: +- Cut redundant exposition +- Remove generic opening padding +- Use concise constructions ("to" not "in order to") +- Delete "It goes without saying" + +--- + +### 3.3 Hedging and Uncertainty Language Overuse + +**Problem**: Excessive qualifying statements make content feel tentative and non-committal.
+ +**Common patterns**: +- "Generally speaking" +- "Typically" +- "Could potentially" +- "To some extent" +- "It could be argued that" + +**Why it occurs**: +- Risk avoidance (safe language) +- Training on academic content +- Probabilistic nature creates uncertainty + +**Examples**: +```markdown +<!-- LLM-style --> +It could be argued that machine learning tends to generally improve efficiency +in most automation scenarios, and typically provides somewhat better results. + +<!-- Revised --> +Machine learning improves efficiency in automation and provides better results. +``` + +**Countermeasures**: +- Make direct claims where appropriate +- Remove unnecessary qualifiers +- Use confident language for established facts + +--- + +### 3.4 Generic, Vague Descriptions + +**Problem**: Replaces specific facts with generic descriptions that could apply to anything, a kind of "regression to the mean." + +**Common patterns**: +- "Significant impact" +- "Crucial role" +- "Comprehensive approach" +- "Game-changing" +- "Innovative solutions" + +**Why it occurs**: Generic statements are statistically common; specific facts are rare. + +**Examples**: +```markdown +<!-- LLM-style --> +The comprehensive AI implementation provided significant improvements across +various aspects of the business, delivering transformative results. + +<!-- Revised --> +The AI implementation increased sales by 23%, reduced processing time from +4 hours to 30 minutes, and identified $2M in cost savings. +``` + +**Countermeasures**: +- Use concrete numbers, dates, names +- Provide specific examples +- Cite actual sources +- Replace "research shows" with named studies + +--- + +### 3.5 Lack of Authentic Voice + +**Problem**: Missing emotional depth, personal perspective, humor, and memorable communication.
+ +**Why it occurs**: +- No lived experience +- Training for neutrality +- Pattern matching without unique perspective + +**Missing elements**: +- Personal anecdotes +- Humor or wit +- Sensory details +- Emotional language +- Opinions +- Conversational asides + +**Examples**: +```markdown +<!-- LLM-style --> +The Amalfi Coast is a beautiful destination with scenic views and pleasant weather. + +<!-- Revised --> +The Amalfi Coast hits you with the scent of lemon groves before you even see +the cliffs. The air tastes like salt and possibility, and the stone steps are +worn smooth by centuries of footsteps. +``` + +**Countermeasures**: +- Add personal stories and experiences +- Include sensory details +- Use conversational tone +- Share opinions and perspectives +- Show vulnerability and humor + +--- + +### 3.6 Formulaic Structures + +**Problem**: Recognizable templates create a "fill-in-the-blank" feel. + +**Common patterns**: +- Rule of three everywhere +- "Not only...but also" overuse +- Participial phrase endings ("-ing" phrases) +- Gerund subjects ("Developing the new data...") +- Every section has exactly 3-5 bullets + +**Examples**: +```markdown +<!-- LLM-style --> +Not only does machine learning improve accuracy, but it also enhances speed. +Implementing this technology provides significant benefits, marking a pivotal +moment in automation history. Whether you're a startup, a mid-sized company, +or an enterprise... + +<!-- Revised --> +Machine learning improves both accuracy and speed. This technology transforms +automation. Companies of all sizes can benefit. +``` + +**Countermeasures**: +- Vary sentence structure +- Break the rule of three +- Use natural flow instead of templates +- Vary list lengths + +--- + +## CATEGORY 4: Content and Context Issues + +### 4.1 Missing "Why" and Context + +**Problem**: Documents *what* code does but not *why* decisions were made. + +**Why it occurs**: "Code lacks business or product logic that went into coding something a specific way."
+ +**What's missing**: +- Reasoning behind design decisions +- Business logic context +- Alternatives considered +- Project-specific constraints +- Abandoned approaches (leading to re-exploration) + +**Examples**: +```python +# What's documented (what) +def process_batch(items, size=1000): + for batch in chunks(items, size): + process(batch) + +# What's missing (why) +# Process in batches of 1000 to avoid memory exhaustion on large datasets. +# Batch size of 1000 determined through load testing - smaller batches +# increased overhead, larger caused OOM errors on production hardware. +# Alternative approaches tried: streaming (too slow), full load (OOM). +``` + +**Countermeasures**: +- Explain reasoning, not just functionality +- Document alternatives considered +- Include business/product context +- Explain constraints and trade-offs + +--- + +### 4.2 Inconsistent Terminology + +**Problem**: Using different terms for the same concept creates confusion. + +**Why it occurs**: LLMs must infer whether terms are synonyms or distinct concepts, making probabilistic guesses. + +**Examples**: +- "API key," "access token," "auth credential" used interchangeably +- "User," "account," "profile" without clear distinctions +- Mixing technical terms across sections + +**Impact**: "Inconsistency compounds over time, creating increasingly unreliable AI experience that erodes developer trust." + +**Countermeasures**: +- Create and enforce terminology glossary +- Use consistent naming throughout +- Define distinctions between similar terms +- Review for terminology drift + +--- + +### 4.3 Unrealistic or Toy Examples + +**Problem**: Oversimplified scenarios that don't reflect production usage. 
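One piece that oversimplified examples most often leave out is retrying transient failures. The sketch below shows retry with exponential backoff under stated assumptions: `call_with_retries` is a hypothetical helper, only `ConnectionError` is treated as transient, and the `sleep` parameter is injectable so the behavior can be tested without waiting.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so callers still see the real error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to simulate a transient outage.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(call_with_retries(flaky, base_delay=0.01))
```

A production version would usually add jitter, an overall timeout, and logging of each retry; they are omitted here to keep the sketch short.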
+ +**Why it occurs**: +- Training data bias toward tutorials +- Happy path optimization +- Simplification tendency + +**Common issues**: +- Hard-coded credentials +- Missing error handling, retries, timeouts +- Unrealistic scale (5 items instead of millions) +- No authentication or authorization +- Missing logging, monitoring, rate limiting + +**Examples**: +```javascript +// Toy example +async function getUserData(userId) { + const response = await fetch(`/api/users/${userId}`); + return response.json(); +} + +// Production needs: +// - Error handling (network failures, 404s, 500s) +// - Timeout configuration +// - Retry logic with exponential backoff +// - Logging/monitoring +// - Request cancellation +// - Rate limiting +// - Authentication +``` + +**Countermeasures**: +- Prompt for "production-ready code" +- Security checklists (authentication, validation) +- Real-world test scenarios +- Flag hard-coded credentials +- Require error handling + +--- + +### 4.4 Lack of Warnings and Caveats + +**Problem**: Presents information confidently without appropriate uncertainty markers or warnings. + +**Why it occurs**: LLMs are trained to sound authoritative. + +**Missing elements**: +- Security warnings +- Version compatibility notes +- Deprecation warnings +- Performance implications +- Prerequisites + +**Countermeasures**: +- Explicitly prompt for warnings +- Add security review step +- Include version compatibility matrices +- Document prerequisites clearly + +--- + +## CATEGORY 5: Documentation Structure Issues + +### 5.1 Over-Comprehensive Coverage + +**Problem**: Tries to cover every aspect equally, creating information overload. + +**Why it occurs**: "AI writing sounds comprehensive, balanced, covers every angle equally."
+ +**Impact**: +- Difficulty identifying what's important +- Tedious reading +- Key information buried + +**Countermeasures**: +- Prioritize common use cases +- Mark advanced sections clearly +- Use progressive disclosure +- Focus on "80% of what users need" + +--- + +### 5.2 Surface-Level Coverage + +**Problem**: Provides broad, high-level explanations that avoid depth. + +**Why it occurs**: +- Limited understanding +- Avoids taking strong stances +- Cannot grasp sophisticated algorithms + +**Countermeasures**: +- Prompt for specific depth level +- Request concrete examples +- Ask for trade-off analysis +- Include advanced sections for complex topics + +--- + +## LLM-Specific Quality Checklist + +Before publishing LLM-assisted documentation, verify: + +### Code Quality +- [ ] All code examples tested and working +- [ ] No hallucinated APIs or libraries +- [ ] All dependencies verified in registries +- [ ] Versions specified and compatible +- [ ] No deprecated methods used +- [ ] Security vulnerabilities scanned (SAST) +- [ ] Error handling for edge cases +- [ ] Input validation present +- [ ] No hard-coded credentials +- [ ] Style consistent throughout + +### Content Quality +- [ ] No telltale LLM phrases ("delve," "leverage," "realm") +- [ ] Specific examples with concrete details +- [ ] "Why" explained, not just "what" +- [ ] Terminology consistent throughout +- [ ] Authentic voice, not generic +- [ ] Production-ready examples +- [ ] Warnings and caveats included +- [ ] Context provided for decisions + +### Structural Quality +- [ ] Not over-comprehensive (focused on common needs) +- [ ] Not surface-level (sufficient depth) +- [ ] Progressive complexity +- [ ] Clear prerequisites +- [ ] Multiple learning pathways + +--- + +## Detection and Remediation Workflow + +### 1. 
Automated Detection +- Scan for telltale phrases +- Run code through linters and SAST tools +- Verify APIs against official documentation +- Check dependencies in registries +- Test code examples +- Validate version compatibility + +### 2. Manual Review +- Check for missing "why" explanations +- Verify terminology consistency +- Assess authenticity of voice +- Validate security practices +- Review edge case handling + +### 3. Remediation +- Replace LLM verbal tics with plain language +- Add context and reasoning +- Make examples production-ready +- Add error handling and security +- Include specific, concrete details +- Break formulaic templates +- Test all code thoroughly + +--- + +## Best Practices for LLM-Assisted Documentation + +### Generation Phase +1. **Use RAG** with current documentation +2. **Specify exact versions** in prompts +3. **Request security** and error handling explicitly +4. **Provide complete context** (dependencies, constraints) +5. **Ask for production-ready** code +6. **Request "why" explanations**, not just "what" + +### Editing Phase +1. **Remove telltale phrases** and verbal tics +2. **Add specificity**: Replace generic with concrete +3. **Verify all APIs** against official docs +4. **Test all code** in target environment +5. **Scan for security** vulnerabilities +6. **Add personal voice** and experience +7. **Include edge cases** and error handling +8. **Ensure consistency** in terminology and style + +### Review Phase +1. **Security review** for critical paths +2. **Expert review** for technical accuracy +3. **User testing** with actual developers +4. **Version compatibility** validation +5. **Accessibility check** (alt text, headings) + +--- + +## Key Takeaways + +### Most Critical Issues to Address +1. **Hallucinations** (19.7% package rate, 50%+ API failures) +2. **Security vulnerabilities** (40%+ prevalence) +3. **Missing edge case handling** (systematic) +4. **Outdated information** (training cutoffs) +5. 
**Telltale phrases** (instant recognition) +6. **Missing context** ("why" not documented) + +### Universal Principles +- **AI as assistant, not replacement** - Human review essential +- **Test everything** - Never publish untested code +- **Verify facts** - Check all APIs and versions +- **Add context** - Explain reasoning and trade-offs +- **Edit ruthlessly** - Remove artificial patterns +- **Prioritize security** - Scan and review + +### Success Formula +**RAG + Explicit Prompts + Automated Testing + Human Review + Iterative Editing = Quality Documentation** + +--- + +## Sources + +This research synthesizes findings from: + +**Academic Research**: +- arxiv:2411.01414 (Code generation mistakes) +- arxiv:2407.09726 (API hallucinations) +- arxiv:2409.20550 (Practical hallucinations) +- arxiv:2407.00456 (Style inconsistencies) +- arxiv:2304.10778 (Code quality evaluation) +- Nature publication on semantic entropy + +**Industry Studies**: +- Endor Labs (security vulnerabilities) +- Stanford study (AI coding assistants) +- Amazon Science (package hallucinations) + +**Technical Writing Resources**: +- Wikipedia (AI writing signs) +- Grammarly blog (AI phrases) +- Write the Docs community +- Technical Writer HQ +- Multiple developer blogs and forums + +**Statistics Referenced**: +- GPT-4: 65.2% correctness +- Claude-3: 56.7% correctness +- Security failures: 40%+ (70%+ Java) +- Package hallucinations: 19.7% +- API failures: 50%+ for low-frequency APIs