| name | description | allowed-tools | version |
|---|---|---|---|
| ai-check | Detect AI/LLM-generated text patterns in research writing. Use when: (1) Reviewing manuscript drafts before submission, (2) Pre-commit validation of documentation, (3) Quality assurance checks on research artifacts, (4) Ensuring natural academic writing style, (5) Tracking writing authenticity over time. Analyzes grammar perfection, sentence uniformity, paragraph structure, word frequency (AI-typical words like 'delve', 'leverage', 'robust'), punctuation patterns, and transition word overuse. | Read, Grep, Bash | 1.0.0 |
AI-Generated Text Detection Skill
Purpose
Detect patterns typical of LLM-generated text to ensure natural, human-authored academic writing. This skill helps maintain authenticity in research publications, dissertations, and documentation.
When to Use This Skill
Primary Use Cases
- Pre-Commit Validation - Automatically check manuscripts and documentation before git commits
- Manuscript Review - Validate academic writing before submission to journals or committees
- Quality Assurance - Part of systematic QA workflow for research artifacts
- On-Demand Analysis - Manual review of any text file for AI patterns
- Writing Evolution Tracking - Monitor writing style changes over time
Specific Scenarios
- Before submitting dissertation chapters to advisor
- Prior to journal article submission
- When reviewing team member contributions
- After making significant edits to documentation
- When suspicious patterns are noticed in writing
- During peer review or committee review preparation
Detection Methodology
1. Grammar Pattern Analysis
What We Check:
- Excessive Perfection: Zero typos, missing commas, or minor errors throughout
- Comma Placement: Perfect comma usage in all complex sentences
- Formal Register: Consistent formal tone with no informal elements
- Grammar Consistency: No variations in grammatical choices
Red Flags:
- Absolutely no grammatical errors in 10+ pages
- Every semicolon and colon used perfectly
- No sentence fragments or run-ons even in appropriate contexts
- Overly formal language even in methods sections
Human Writing Typically Has:
- Occasional minor typos or comma splices
- Some inconsistency in formality
- Natural variations in grammar choices
- Context-appropriate informality
2. Sentence Structure Uniformity
What We Check:
- Sentence Length Distribution: Variation in sentence lengths
- Structural Patterns: Repetitive sentence structures
- Complexity Variation: Mix of simple, compound, complex sentences
- Opening Patterns: How sentences begin
Red Flags:
- Most sentences 15-25 words (AI sweet spot)
- Repetitive subject-verb-object patterns
- Every paragraph starts with topic sentence
- Excessive use of transition words at sentence starts
- Predictable sentence complexity patterns
Human Writing Typically Has:
- Wide variation (5-40+ word sentences)
- Unpredictable sentence structures
- Occasional fragments for emphasis
- Natural flow without forced transitions
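To make the uniformity checks above concrete, here is a minimal Python sketch (not the skill's actual implementation) that splits text into sentences, summarizes the length distribution, and reports the share of sentences falling in the 15-25 word band:

```python
import re
from statistics import pstdev


def sentence_length_stats(text: str) -> dict:
    """Rough sentence-length statistics; splitting on ., !, ? is a simplification."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return {"sentences": 0}
    in_band = sum(1 for n in lengths if 15 <= n <= 25)
    return {
        "sentences": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "std_dev": round(pstdev(lengths), 1),
        "share_15_to_25_words": round(in_band / len(lengths), 2),
    }


print(sentence_length_stats("Short one. This sentence is quite a bit longer than the first one here."))
```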
3. Paragraph Structure Analysis
What We Check:
- Paragraph Length: Uniformity vs. natural variation
- Structural Pattern: Topic sentence + support + conclusion pattern
- Information Flow: Natural vs. algorithmic organization
- Paragraph Transitions: Connection between paragraphs
Red Flags:
- All paragraphs 4-6 sentences long
- Every paragraph follows same structure
- Mechanical transitions between paragraphs
- Perfectly balanced paragraph lengths
- No single-sentence paragraphs
Human Writing Typically Has:
- Paragraph length variation (1-10+ sentences)
- Structural diversity based on content
- Natural transitions
- Strategic use of short/long paragraphs for emphasis
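A similarly small sketch can check paragraph structure; splitting paragraphs on blank lines and sentences on terminal punctuation are simplifying assumptions about the input:

```python
import re


def paragraph_sentence_counts(text: str) -> list[int]:
    """Sentences per paragraph, assuming paragraphs are separated by blank lines."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [len([s for s in re.split(r"[.!?]+", p) if s.strip()]) for p in paragraphs]


def looks_mechanical(counts: list[int]) -> bool:
    """Red flag from the list above: every paragraph is 4-6 sentences long."""
    return bool(counts) and all(4 <= c <= 6 for c in counts)
```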
4. Word Frequency Analysis (AI-Typical Words)
High-Risk AI Words (overused by LLMs):
Verbs:
- "delve" (rarely used by humans)
- "leverage" (business jargon)
- "utilize" (instead of "use")
- "facilitate" (overly formal)
- "demonstrate" (overused)
- "implement" (in non-technical contexts)
- "enhance" (marketing language)
Adjectives:
- "robust" (technical overuse)
- "comprehensive" (vague intensifier)
- "innovative" (buzzword)
- "cutting-edge" (cliché)
- "significant" (statistical overuse)
- "substantial" (formal overuse)
- "considerable" (formal overuse)
- "crucial" (intensity overuse)
Transition Words (overused):
- "furthermore" (very formal)
- "moreover" (archaic feeling)
- "additionally" (redundant)
- "consequently" (overused)
- "subsequently" (temporal overuse)
- "nevertheless" (formal overuse)
- "nonetheless" (synonym overuse)
Phrases:
- "it is important to note that"
- "it should be emphasized that"
- "a comprehensive analysis of"
- "in the context of"
- "with respect to"
- "in terms of"
- "in order to" (instead of "to")
Detection Criteria:
- Count frequency per 1000 words
- Compare to human academic writing baselines
- Flag if 3+ high-risk words per 1000 words
- Weight by word rarity (delve = high weight)
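As an illustration of these criteria, the following sketch counts AI-typical words per 1000 words and applies the 3-per-1000 flag; the word list and weights are an abbreviated, illustrative subset of the lists above:

```python
import re

# Abbreviated, illustrative subset of the word lists above; higher weight = rarer in human text
AI_TYPICAL_WORDS = {
    "delve": 3.0, "leverage": 2.0, "utilize": 2.0,
    "robust": 1.0, "comprehensive": 1.0, "facilitate": 1.0,
    "furthermore": 1.0, "moreover": 1.0, "additionally": 1.0,
}


def ai_word_rate(text: str, threshold_per_1000: float = 3.0) -> dict:
    """Count AI-typical words per 1000 words and flag when the rate exceeds the threshold."""
    words = re.findall(r"[a-zA-Z'-]+", text.lower())
    hits = [w for w in words if w in AI_TYPICAL_WORDS]
    rate = 1000 * len(hits) / max(len(words), 1)
    return {
        "total_words": len(words),
        "ai_word_count": len(hits),
        "per_1000": round(rate, 1),
        "weighted_score": sum(AI_TYPICAL_WORDS[w] for w in hits),
        "flagged": rate >= threshold_per_1000,
    }
```

Note that this simple exact-match lookup misses inflected forms such as "delves"; a real check would also need stemming or lemmatization.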
5. Punctuation Patterns
What We Check:
- Semicolon Usage: Frequency and correctness
- Colon Usage: Perfect usage patterns
- Em-Dash Usage: Consistent stylistic choices
- Comma Patterns: Perfection vs. natural variation
- Ellipsis/Exclamation: Absence in informal contexts
Red Flags:
- Excessive semicolon use (2+ per paragraph)
- Perfect colon usage throughout
- Consistent em-dash formatting (—)
- No missing commas anywhere
- Zero informal punctuation
Human Writing Typically Has:
- Inconsistent punctuation choices
- Occasional missing/extra commas
- Variable dash formatting (- vs -- vs —)
- Some informal punctuation where appropriate
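A punctuation profile can be gathered with plain string counting; interpreting the counts against the red flags above is left to the scoring stage. A minimal sketch:

```python
def punctuation_profile(text: str) -> dict:
    """Raw counts of the punctuation marks the red flags above refer to."""
    return {
        "semicolons": text.count(";"),
        "colons": text.count(":"),
        "em_dashes": text.count("\u2014"),   # the em dash character
        "double_hyphens": text.count("--"),
        "spaced_hyphens": text.count(" - "),
        "exclamations": text.count("!"),
        "ellipses": text.count("..."),
    }
```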
Confidence Scoring System
Scoring Formula
Overall Confidence = Weighted average of:
- Grammar perfection: 20%
- Sentence uniformity: 25%
- Paragraph structure: 20%
- AI-typical words: 25%
- Punctuation patterns: 10%
Each metric is scored 0-100, then combined using these weights, as in the sketch below.
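Expressed as code, the weighted combination might look like the following sketch, with weights matching the percentages above and the detection.weights block in the configuration section:

```python
# Metric weights from the formula above (must sum to 1.0)
WEIGHTS = {
    "grammar_perfection": 0.20,
    "sentence_uniformity": 0.25,
    "paragraph_structure": 0.20,
    "ai_word_frequency": 0.25,
    "punctuation_patterns": 0.10,
}


def overall_confidence(scores: dict[str, float]) -> float:
    """Combine per-metric scores (each 0-100) into a single 0-100 confidence value."""
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


# Example usage with per-metric scores on a 0-100 scale
scores = {"grammar_perfection": 80, "sentence_uniformity": 70,
          "paragraph_structure": 60, "ai_word_frequency": 50,
          "punctuation_patterns": 40}
print(overall_confidence(scores))  # 62.0
```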
Confidence Levels
Low Confidence (0-30%): Likely Human Writing
Characteristics:
- Natural sentence length variation (5-40+ words)
- Occasional grammatical imperfections
- Authentic voice and natural flow
- Domain-specific terminology used naturally
- Structural variety in paragraphs
- Minimal AI-typical words (0-2 per 1000 words)
Action: ✅ Writing appears authentic, no changes needed
Medium Confidence (30-70%): Possible AI Assistance
Characteristics:
- Some uniformity in sentence structure
- Mix of AI-typical and natural patterns
- May be human-edited AI output
- Overly formal in places
- Some transition word overuse
- 3-5 AI-typical words per 1000 words
Action: ⚠️ Review flagged sections, apply suggestions selectively
Examples of Mixed Writing:
- AI-generated first draft with heavy human editing
- Human writing that mimics academic formality excessively
- Non-native English speakers using formal templates
- Multiple authors with different styles
High Confidence (70-100%): Likely AI-Generated
Characteristics:
- Excessive uniformity across all metrics
- Multiple AI-typical word clusters
- Perfect grammar and punctuation throughout
- Artificial transition patterns
- Mechanical paragraph structure
- 6+ AI-typical words per 1000 words
Action: 🚫 Significant revision needed, rewrite in authentic voice
Output Format
When running AI-check analysis, generate a comprehensive report:
1. Executive Summary
Overall Confidence Score: 65%
Status: MEDIUM - Possible AI assistance detected
Files Analyzed: 1
Total Words: 3,456
Recommendation: Review flagged sections
2. Metric Breakdown
Grammar Perfection: 85% (High - suspiciously few errors)
Sentence Uniformity: 72% (High - repetitive structures)
Paragraph Structure: 68% (Medium - some variation)
AI-Typical Words: 58% (Medium - 4.2 per 1000 words)
Punctuation Patterns: 45% (Low - natural variation)
3. Flagged Sections
Lines 45-67 (Confidence: 82%)
Pattern: Excessive transition words + uniform sentences
AI Words: "moreover", "furthermore", "leverage", "robust"
Lines 112-134 (Confidence: 76%)
Pattern: Perfect grammar + mechanical structure
AI Words: "delve", "comprehensive", "facilitate"
4. Specific Issues Detected
High-Risk AI Words Found:
• "delve" (2 occurrences) - RARELY used by humans
• "leverage" (3 occurrences) - Business jargon overuse
• "robust" (4 occurrences) - Technical overuse
• "furthermore" (6 occurrences) - Formal transition overuse
Sentence Uniformity Issues:
• 67% of sentences are 15-25 words (AI sweet spot)
• 82% of paragraphs start with transition words
• Low variation in sentence complexity
Paragraph Structure Issues:
• All paragraphs 4-6 sentences long
• Mechanical topic-sentence pattern throughout
5. Word Frequency Report
Top AI-Typical Words:
1. "furthermore" - 6x (baseline: 0.5x per 1000 words)
2. "robust" - 4x (baseline: 0.8x per 1000 words)
3. "leverage" - 3x (baseline: 0.3x per 1000 words)
4. "comprehensive" - 3x (baseline: 1.2x per 1000 words)
5. "delve" - 2x (baseline: 0.1x per 1000 words)
Comparison to Human Academic Writing:
Your text: 4.2 AI-typical words per 1000
Human baseline: 1.5 AI-typical words per 1000
Ratio: 2.8x higher than human baseline
Improvement Suggestions
For High Confidence (70-100%) Detections
Sentence Structure:
- ❌ "Furthermore, the results demonstrate a comprehensive analysis of the robust dataset."
- ✅ "The results show our analysis covered the full dataset."
Why Better: Simpler words, no transition word, more direct
Word Choice:
- ❌ "This study delves into the utilization of innovative methodologies."
- ✅ "We examine how researchers use new methods."
Why Better: Active voice, common words, clearer meaning
Paragraph Variation:
- ❌ All paragraphs 5 sentences, topic sentence + 3 support + conclusion
- ✅ Mix paragraph lengths: 2, 7, 4, 3, 6 sentences based on content needs
Why Better: Natural flow based on content, not formula
Specific Suggestion Categories
1. Vary Sentence Lengths
Current: 15-25 word sentences consistently
Suggestion: Mix short (5-10), medium (15-20), long (25-35) sentences
Example:
- Short: "The effect was significant."
- Medium: "We observed a 23% increase across all conditions."
- Long: "This finding aligns with previous work showing that..."
2. Replace AI-Typical Words
Replace → With
- "delve into" → "examine", "explore", "investigate"
- "leverage" → "use", "apply", "employ"
- "utilize" → "use"
- "robust" → "strong", "reliable", "thorough"
- "facilitate" → "enable", "help", "allow"
- "furthermore" → "also", "next", [or remove]
- "moreover" → "additionally", "also", [or use dash]
- "comprehensive" → "complete", "thorough", "full"
3. Add Natural Imperfections (Where Appropriate)
- Use contractions in appropriate contexts ("it's", "we'll")
- Include domain-specific jargon naturally
- Allow informal phrasing in methods/procedures
- Use occasional sentence fragments for emphasis
- Add personal observations or interpretations
- Include field-specific colloquialisms
4. Break Paragraph Uniformity
Current: All paragraphs follow topic-support-support-support-conclusion
Suggestion: Vary based on content
- Use single-sentence paragraphs for emphasis
- Combine related ideas into longer paragraphs
- Don't force every paragraph to have 5 sentences
- Let content determine structure, not formula
5. Remove Mechanical Transitions
❌ "Furthermore, the results show... Moreover, the analysis reveals..."
✅ "The results show... The analysis also reveals..." [simpler transitions]
✅ "The results show... Looking closer, the analysis..." [natural bridges]
Integration Points
1. Pre-Commit Hook Integration
Automatic checking before git commits
Configured in .claude/settings.json:
```json
"gitPreCommit": {
  "command": "python3 hooks/pre-commit-ai-check.py",
  "enabled": true
}
```
Behavior:
- Runs on staged .md, .tex, and .rst files
- Warns if confidence 30-70%
- Blocks commit if confidence >70%
- User can override with git commit --no-verify
Exit Codes:
- 0: Pass (confidence <30%)
- 1: Warning (confidence 30-70%, commit allowed)
- 2: Block (confidence >70%, commit blocked)
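The hook's control flow could look roughly like this sketch. Only the exit-code contract and the file extensions come from this document; the analyze_file helper is a placeholder for the real detector:

```python
#!/usr/bin/env python3
"""Sketch of hooks/pre-commit-ai-check.py following the exit-code contract above."""
import subprocess
import sys

CHECK_EXTENSIONS = (".md", ".tex", ".rst")


def staged_files() -> list[str]:
    """List staged files with extensions this hook cares about."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in out.splitlines() if f.endswith(CHECK_EXTENSIONS)]


def analyze_file(path: str) -> float:
    """Placeholder for the real detector; should return a confidence in 0.0-1.0."""
    return 0.0


def main() -> int:
    worst = max((analyze_file(p) for p in staged_files()), default=0.0)
    if worst >= 0.70:
        print(f"ai-check: confidence {worst:.0%} >= 70%, blocking commit")
        return 2
    if worst >= 0.30:
        print(f"ai-check: confidence {worst:.0%} in the 30-70% range, commit allowed with a warning")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```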
2. Quality Assurance Integration
Part of comprehensive QA workflow
Integrated into code/quality_assurance/qa_manager.py:
- Runs during manuscript phase QA validation
- Checks all deliverable documents
- Generates detailed QA report section
- Fails QA if confidence >40%
Configuration (.ai-check-config.yaml):
```yaml
qa_integration:
  enabled: true
  max_confidence_threshold: 0.40
  check_manuscripts: true
  check_documentation: true
  generate_detailed_reports: true
```
3. Manuscript Writer Agent Integration
Real-time feedback during writing
Agent checks writing incrementally:
- After drafting each section
- Before moving to next phase
- Applies suggestions automatically
- Re-checks until confidence <30%
Agent Workflow:
- Draft section
- Run ai-check skill
- Review detection results
- Apply improvement suggestions
- Re-check until authentic
- Proceed to next section
4. Standalone Skill Usage
Manual invocation by user or agents
User Invocation:
Please run ai-check on docs/manuscript/discussion.tex and provide detailed feedback.
Agent Invocation:
I'll use the ai-check skill to verify this text before proceeding.
CLI Tool:
```bash
python tools/ai_check.py path/to/file.md
python tools/ai_check.py --directory docs/
python tools/ai_check.py --format html --output report.html
```
Tracking System
Historical Tracking
Log all AI-check runs to database for evolution tracking:
Database Schema (PostgreSQL via research-database MCP):
```sql
CREATE TABLE ai_check_history (
    id SERIAL PRIMARY KEY,
    file_path TEXT NOT NULL,
    git_commit TEXT,
    timestamp TIMESTAMP DEFAULT NOW(),
    overall_confidence FLOAT,
    grammar_score FLOAT,
    sentence_score FLOAT,
    paragraph_score FLOAT,
    word_score FLOAT,
    punctuation_score FLOAT,
    ai_words_found JSONB,
    flagged_sections JSONB
);
```
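For illustration, a run could be logged to that table from Python as in the sketch below; it uses psycopg2 directly and leaves connection handling to the caller, whereas the actual workflow goes through the research-database MCP server:

```python
import json

import psycopg2  # assumes direct PostgreSQL access; the real workflow uses the MCP server


def log_ai_check(conn, file_path: str, scores: dict, ai_words: dict, flagged: list) -> None:
    """Insert one ai-check run into ai_check_history (schema above)."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO ai_check_history
                (file_path, overall_confidence, grammar_score, sentence_score,
                 paragraph_score, word_score, punctuation_score,
                 ai_words_found, flagged_sections)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
            """,
            (file_path, scores["overall"], scores["grammar"], scores["sentence"],
             scores["paragraph"], scores["word"], scores["punctuation"],
             json.dumps(ai_words), json.dumps(flagged)),
        )
```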
Trend Analysis
Track writing evolution:
File: docs/manuscript/discussion.tex
Version History:
2025-01-15: 78% confidence (HIGH - likely AI)
2025-01-18: 52% confidence (MEDIUM - revision 1)
2025-01-20: 34% confidence (LOW-MEDIUM - revision 2)
2025-01-22: 18% confidence (LOW - authentic writing)
Trend: ✅ Improving toward authentic writing
Use Cases:
- Monitor dissertation chapters over time
- Track improvements after applying suggestions
- Demonstrate writing authenticity to committee
- Identify sections needing more work
Configuration
Configuration File: .ai-check-config.yaml
```yaml
# AI-Check Skill Configuration

# Pre-Commit Hook Settings
pre_commit:
  enabled: true
  check_files: [".md", ".tex", ".rst", ".txt"]
  check_docstrings: true          # Check Python docstrings
  block_threshold: 0.70           # Block commit if >= 70%
  warn_threshold: 0.30            # Warn if >= 30%
  exclude_patterns:
    - "*/examples/*"
    - "*/tests/*"
    - "*/node_modules/*"
    - "*/.venv/*"

# Quality Assurance Integration
qa_integration:
  enabled: true
  max_confidence_threshold: 0.40  # Fail QA if >= 40%
  check_manuscripts: true
  check_documentation: true
  generate_detailed_reports: true
  track_history: true

# Detection Parameters
detection:
  # Weight each metric (must sum to 1.0)
  weights:
    grammar_perfection: 0.20
    sentence_uniformity: 0.25
    paragraph_structure: 0.20
    ai_word_frequency: 0.25
    punctuation_patterns: 0.10

  # AI-typical word lists
  ai_words:
    high_risk: ["delve", "leverage", "utilize"]
    medium_risk: ["robust", "comprehensive", "facilitate"]
    transitions: ["furthermore", "moreover", "additionally"]

  # Thresholds
  ai_words_per_1000_threshold: 3.0
  human_baseline_per_1000: 1.5

# Report Generation
reporting:
  default_format: "markdown"      # markdown, json, html
  include_suggestions: true
  include_word_frequency: true
  include_flagged_sections: true
  max_flagged_sections: 10

# Tracking
tracking:
  enabled: true
  database: "research-database-mcp"
  retention_days: 365
```
Per-Project Overrides
Create .ai-check.local.yaml for project-specific settings:
```yaml
# Project-specific overrides
pre_commit:
  block_threshold: 0.60   # More lenient for early drafts

detection:
  ai_words:
    high_risk: ["delve"]  # Only flag worst offenders
```
Examples
Example 1: High Confidence Detection
Input Text:
Furthermore, this comprehensive study delves into the robust
methodologies utilized to facilitate the implementation of innovative
approaches. Moreover, the analysis demonstrates significant findings
that leverage state-of-the-art techniques. Subsequently, the results
indicate substantial improvements across all metrics. Nevertheless,
additional research is crucial to fully comprehend the implications.
AI-Check Report:
Overall Confidence: 89% (HIGH - Likely AI-generated)
Issues Detected:
- 15 AI-typical words in 46 words of text (over 300 per 1000 words!)
- Every sentence starts with a transition word
- Little sentence-length variation (9-17 words per sentence)
- Perfect grammar, zero natural imperfections
- Mechanical paragraph structure
AI Words Found:
- furthermore, comprehensive, delves, robust
- utilized, facilitate, innovative, leverage
- demonstrates, significant, subsequently, substantial
- nevertheless, crucial, comprehend
Recommendation: Complete rewrite recommended
Suggested Revision:
We examined the methods used in this approach. The analysis shows
clear improvements across metrics. However, more research is needed
to understand the full implications.
(25 words, 12% confidence - much more natural)
Example 2: Medium Confidence Detection
Input Text:
The experimental design followed standard protocols established in
previous work (Smith et al., 2023). We collected data from 150
participants over six months. Statistical analysis used mixed-effects
models to account for repeated measures. The results showed a
significant main effect of condition (p < 0.001).
AI-Check Report:
Overall Confidence: 35% (MEDIUM - Possible minor AI assistance)
Issues Detected:
- Slightly uniform sentence length (11-15 words)
- One AI-typical word: "significant" (statistical context acceptable)
- Otherwise natural academic writing
Recommendation: Minor revisions optional, writing appears largely authentic
Example 3: Low Confidence (Human Writing)
Input Text:
OK so here's what we found. The effect was huge - way bigger than
expected. Participants in the experimental group scored 23% higher
on average. This wasn't just statistically significant; it was
practically meaningful.
We're still not sure why. Maybe it's the timing? Could be the
instructions were clearer. Need to run follow-ups.
AI-Check Report:
Overall Confidence: 8% (LOW - Clearly human writing)
Human Writing Indicators:
- Natural sentence variation (4-10 words)
- Informal elements ("OK so", "way bigger")
- Incomplete thoughts and questions
- Natural uncertainty expressions
- Zero AI-typical words
- Authentic voice throughout
Recommendation: Writing is authentic, no changes needed
Best Practices
For PhD Students
1. Run Before Advisor Meetings
   - Check chapters before sending to advisor
   - Ensure authenticity before committee review
   - Track improvements over time
2. Use During Drafting
   - Check each section after writing
   - Apply suggestions immediately
   - Develop natural writing habits
3. Pre-Submission Validation
   - Run on complete manuscripts before journal submission
   - Check supplementary materials
   - Verify all documentation
For Research Teams
1. Establish Team Standards
   - Set agreed-upon confidence thresholds
   - Define when to block vs. warn
   - Create team-specific word lists
2. Code Review Integration
   - Check documentation in pull requests
   - Validate README files and guides
   - Ensure authentic technical writing
3. Track Team Writing
   - Monitor trends across team members
   - Identify systematic issues
   - Share improvement strategies
For Journal Submission
1. Pre-Submission Checklist
   - Overall confidence <30%
   - No flagged high-risk sections
   - AI-typical words <2 per 1000 words
   - Natural sentence variation present
   - Authentic academic voice throughout
2. Demonstrating Authenticity
   - Include AI-check reports in submission materials
   - Show writing evolution over time
   - Document revision process
Limitations
What This Skill Cannot Do
1. Not 100% Accurate
   - LLMs are constantly improving
   - Patterns evolve over time
   - False positives possible (very formal human writing)
   - False negatives possible (heavily edited AI text)
2. Cannot Detect All AI Usage
   - Well-edited AI text may pass
   - Human writing in AI style may be flagged
   - Paraphrasing tools may evade detection
   - Future models may have different patterns
3. Domain Limitations
   - Tuned primarily for academic writing
   - May not work well for creative writing
   - Technical jargon may affect scores
   - Non-English text not supported
Use Alongside Human Judgment
This skill is a tool, not a replacement for human judgment:
- Use confidence scores as guidance, not absolute truth
- Consider context and field-specific norms
- Combine with plagiarism detection tools
- Maintain academic integrity standards
- Update word lists as AI patterns evolve
Support
Troubleshooting
Problem: False positive on authentic writing
Solution: Check whether the writing is simply very formal, consider field-specific norms, and adjust the thresholds in the config.

Problem: AI text passing with low confidence
Solution: Update the AI-typical word lists, check for heavily edited AI text, and report new patterns so the skill can be updated.

Problem: Pre-commit hook too slow
Solution: Reduce the checked file types, enable caching, or check only modified sections.

Problem: Disagreement with manual review
Solution: Generate a detailed report, review the flagged sections specifically, and consider the individual metrics rather than only the overall score.
Getting Help
- Documentation: See docs/skills/ai-check-reference.md
- Issues: https://github.com/astoreyai/ai_scientist/issues
- Updates: Check for skill updates regularly as AI patterns evolve
Last Updated: 2025-11-09 Version: 1.0.0 License: MIT