16 KiB
Citation Validation Guide
Comprehensive guide to validating citation accuracy, completeness, and formatting in BibTeX files.
Overview
Citation validation ensures:
- All citations are accurate and complete
- DOIs resolve correctly
- Required fields are present
- No duplicate entries
- Proper formatting and syntax
- Links are accessible
Validation should be performed:
- After extracting metadata
- Before manuscript submission
- After manual edits to BibTeX files
- Periodically for maintained bibliographies
Validation Categories
1. DOI Verification
Purpose: Ensure DOIs are valid and resolve correctly.
What to Check
DOI format:
Valid: 10.1038/s41586-021-03819-2
Valid: 10.1126/science.aam9317
Invalid: 10.1038/invalid
Invalid: doi:10.1038/... (should omit "doi:" prefix in BibTeX)
DOI resolution:
- DOI should resolve via https://doi.org/
- Should redirect to actual article
- Should not return 404 or error
Metadata consistency:
- CrossRef metadata should match BibTeX
- Author names should align
- Title should match
- Year should match
How to Validate
Manual check:
- Copy DOI from BibTeX
- Visit https://doi.org/10.1038/nature12345
- Verify it redirects to correct article
- Check metadata matches
Automated check (recommended):
python scripts/validate_citations.py references.bib --check-dois
Process:
- Extract all DOIs from BibTeX file
- Query doi.org resolver for each
- Query CrossRef API for metadata
- Compare metadata with BibTeX entry
- Report discrepancies
Common Issues
Broken DOIs:
- Typos in DOI
- Publisher changed DOI (rare)
- Article retracted
- Solution: Find correct DOI from publisher site
Mismatched metadata:
- BibTeX has old/incorrect information
- Solution: Re-extract metadata from CrossRef
Missing DOIs:
- Older articles may not have DOIs
- Acceptable for pre-2000 publications
- Add URL or PMID instead
2. Required Fields
Purpose: Ensure all necessary information is present.
Required by Entry Type
@article:
author % REQUIRED
title % REQUIRED
journal % REQUIRED
year % REQUIRED
volume % Highly recommended
pages % Highly recommended
doi % Highly recommended for modern papers
@book:
author OR editor % REQUIRED (at least one)
title % REQUIRED
publisher % REQUIRED
year % REQUIRED
isbn % Recommended
@inproceedings:
author % REQUIRED
title % REQUIRED
booktitle % REQUIRED (conference/proceedings name)
year % REQUIRED
pages % Recommended
@incollection (book chapter):
author % REQUIRED
title % REQUIRED (chapter title)
booktitle % REQUIRED (book title)
publisher % REQUIRED
year % REQUIRED
editor % Recommended
pages % Recommended
@phdthesis:
author % REQUIRED
title % REQUIRED
school % REQUIRED
year % REQUIRED
@misc (preprints, datasets, etc.):
author % REQUIRED
title % REQUIRED
year % REQUIRED
howpublished % Recommended (bioRxiv, Zenodo, etc.)
doi OR url % At least one required
Validation Script
python scripts/validate_citations.py references.bib --check-required-fields
Output:
Error: Entry 'Smith2024' missing required field 'journal'
Error: Entry 'Doe2023' missing required field 'year'
Warning: Entry 'Jones2022' missing recommended field 'volume'
3. Author Name Formatting
Purpose: Ensure consistent, correct author name formatting.
Proper Format
Recommended BibTeX format:
author = {Last1, First1 and Last2, First2 and Last3, First3}
Examples:
% Correct
author = {Smith, John}
author = {Smith, John A.}
author = {Smith, John Andrew}
author = {Smith, John and Doe, Jane}
author = {Smith, John and Doe, Jane and Johnson, Mary}
% For many authors
author = {Smith, John and Doe, Jane and others}
% Incorrect
author = {John Smith} % First Last format (not recommended)
author = {Smith, J.; Doe, J.} % Semicolon separator (wrong)
author = {Smith J, Doe J} % Missing commas
Special Cases
Suffixes (Jr., III, etc.):
author = {King, Jr., Martin Luther}
Multiple surnames (hyphenated):
author = {Smith-Jones, Mary}
Van, von, de, etc.:
author = {van der Waals, Johannes}
author = {de Broglie, Louis}
Organizations as authors:
author = {{World Health Organization}}
% Double braces treat as single author
Validation Checks
Automated validation:
python scripts/validate_citations.py references.bib --check-authors
Checks for:
- Proper separator (and, not &, ; , etc.)
- Comma placement
- Empty author fields
- Malformed names
4. Data Consistency
Purpose: Ensure all fields contain valid, reasonable values.
Year Validation
Valid years:
year = {2024} % Current/recent
year = {1953} % Watson & Crick DNA structure (historical)
year = {1665} % Hooke's Micrographia (very old)
Invalid years:
year = {24} % Two digits (ambiguous)
year = {202} % Typo
year = {2025} % Future (unless accepted/in press)
year = {0} % Obviously wrong
Check:
- Four digits
- Reasonable range (1600-current+1)
- Not all zeros
Volume/Number Validation
volume = {123} % Numeric
volume = {12} % Valid
number = {3} % Valid
number = {S1} % Supplement issue (valid)
Invalid:
volume = {Vol. 123} % Should be just number
number = {Issue 3} % Should be just number
Page Range Validation
Correct format:
pages = {123--145} % En-dash (two hyphens)
pages = {e0123456} % PLOS-style article ID
pages = {123} % Single page
Incorrect format:
pages = {123-145} % Single hyphen (use --)
pages = {pp. 123-145} % Remove "pp."
pages = {123–145} % Unicode en-dash (may cause issues)
URL Validation
Check:
- URLs are accessible (return 200 status)
- HTTPS when available
- No obvious typos
- Permanent links (not temporary)
Valid:
url = {https://www.nature.com/articles/nature12345}
url = {https://arxiv.org/abs/2103.14030}
Questionable:
url = {http://...} % HTTP instead of HTTPS
url = {file:///...} % Local file path
url = {bit.ly/...} % URL shortener (not permanent)
5. Duplicate Detection
Purpose: Find and remove duplicate entries.
Types of Duplicates
Exact duplicates (same DOI):
@article{Smith2024a,
doi = {10.1038/nature12345},
...
}
@article{Smith2024b,
doi = {10.1038/nature12345}, % Same DOI!
...
}
Near duplicates (similar title/authors):
@article{Smith2024,
title = {Machine Learning for Drug Discovery},
...
}
@article{Smith2024method,
title = {Machine learning for drug discovery}, % Same, different case
...
}
Preprint + Published:
@misc{Smith2023arxiv,
title = {AlphaFold Results},
howpublished = {arXiv},
...
}
@article{Smith2024,
title = {AlphaFold Results}, % Same paper, now published
journal = {Nature},
...
}
% Keep published version only
Detection Methods
By DOI (most reliable):
- Same DOI = exact duplicate
- Keep one, remove other
By title similarity:
- Normalize: lowercase, remove punctuation
- Calculate similarity (e.g., Levenshtein distance)
- Flag if >90% similar
By author-year-title:
- Same first author + year + similar title
- Likely duplicate
Automated detection:
python scripts/validate_citations.py references.bib --check-duplicates
Output:
Warning: Possible duplicate entries:
- Smith2024a (DOI: 10.1038/nature12345)
- Smith2024b (DOI: 10.1038/nature12345)
Recommendation: Keep one entry, remove the other.
6. Format and Syntax
Purpose: Ensure valid BibTeX syntax.
Common Syntax Errors
Missing commas:
@article{Smith2024,
author = {Smith, John} % Missing comma!
title = {Title}
}
% Should be:
author = {Smith, John}, % Comma after each field
Unbalanced braces:
title = {Title with {Protected} Text % Missing closing brace
% Should be:
title = {Title with {Protected} Text}
Missing closing brace for entry:
@article{Smith2024,
author = {Smith, John},
title = {Title}
% Missing closing brace!
% Should end with:
}
Invalid characters in keys:
@article{Smith&Doe2024, % & not allowed in key
...
}
% Use:
@article{SmithDoe2024,
...
}
BibTeX Syntax Rules
Entry structure:
@TYPE{citationkey,
field1 = {value1},
field2 = {value2},
...
fieldN = {valueN}
}
Citation keys:
- Alphanumeric and some punctuation (-, _, ., :)
- No spaces
- Case-sensitive
- Unique within file
Field values:
- Enclosed in {braces} or "quotes"
- Braces preferred for complex text
- Numbers can be unquoted:
year = 2024
Special characters:
{and}for grouping\for LaTeX commands- Protect capitalization:
{AlphaFold} - Accents:
{\"u},{\'e},{\aa}
Validation
python scripts/validate_citations.py references.bib --check-syntax
Checks:
- Valid BibTeX structure
- Balanced braces
- Proper commas
- Valid entry types
- Unique citation keys
Validation Workflow
Step 1: Basic Validation
Run comprehensive validation:
python scripts/validate_citations.py references.bib
Checks all:
- DOI resolution
- Required fields
- Author formatting
- Data consistency
- Duplicates
- Syntax
Step 2: Review Report
Examine validation report:
{
"total_entries": 150,
"valid_entries": 140,
"errors": [
{
"entry": "Smith2024",
"error": "missing_required_field",
"field": "journal",
"severity": "high"
},
{
"entry": "Doe2023",
"error": "invalid_doi",
"doi": "10.1038/broken",
"severity": "high"
}
],
"warnings": [
{
"entry": "Jones2022",
"warning": "missing_recommended_field",
"field": "volume",
"severity": "medium"
}
],
"duplicates": [
{
"entries": ["Smith2024a", "Smith2024b"],
"reason": "same_doi",
"doi": "10.1038/nature12345"
}
]
}
Step 3: Fix Issues
High-priority (errors):
- Add missing required fields
- Fix broken DOIs
- Remove duplicates
- Correct syntax errors
Medium-priority (warnings):
- Add recommended fields
- Improve author formatting
- Fix page ranges
Low-priority:
- Standardize formatting
- Add URLs for accessibility
Step 4: Auto-Fix
Use auto-fix for safe corrections:
python scripts/validate_citations.py references.bib \
--auto-fix \
--output fixed_references.bib
Auto-fix can:
- Fix page range format (- to --)
- Remove "pp." from pages
- Standardize author separators
- Fix common syntax errors
- Normalize field order
Auto-fix cannot:
- Add missing information
- Find correct DOIs
- Determine which duplicate to keep
- Fix semantic errors
Step 5: Manual Review
Review auto-fixed file:
# Check what changed
diff references.bib fixed_references.bib
# Review specific entries that had errors
grep -A 10 "Smith2024" fixed_references.bib
Step 6: Re-Validate
Validate after fixes:
python scripts/validate_citations.py fixed_references.bib --verbose
Should show:
✓ All DOIs valid
✓ All required fields present
✓ No duplicates found
✓ Syntax valid
✓ 150/150 entries valid
Validation Checklist
Use this checklist before final submission:
DOI Validation
- All DOIs resolve correctly
- Metadata matches between BibTeX and CrossRef
- No broken or invalid DOIs
Completeness
- All entries have required fields
- Modern papers (2000+) have DOIs
- Authors properly formatted
- Journals/conferences properly named
Consistency
- Years are 4-digit numbers
- Page ranges use -- not -
- Volume/number are numeric
- URLs are accessible
Duplicates
- No entries with same DOI
- No near-duplicate titles
- Preprints updated to published versions
Formatting
- Valid BibTeX syntax
- Balanced braces
- Proper commas
- Unique citation keys
Final Checks
- Bibliography compiles without errors
- All citations in text appear in bibliography
- All bibliography entries cited in text
- Citation style matches journal requirements
Best Practices
1. Validate Early and Often
# After extraction
python scripts/extract_metadata.py --doi ... --output refs.bib
python scripts/validate_citations.py refs.bib
# After manual edits
python scripts/validate_citations.py refs.bib
# Before submission
python scripts/validate_citations.py refs.bib --strict
2. Use Automated Tools
Don't validate manually - use scripts:
- Faster
- More comprehensive
- Catches errors humans miss
- Generates reports
3. Keep Backup
# Before auto-fix
cp references.bib references_backup.bib
# Run auto-fix
python scripts/validate_citations.py references.bib \
--auto-fix \
--output references_fixed.bib
# Review changes
diff references.bib references_fixed.bib
# If satisfied, replace
mv references_fixed.bib references.bib
4. Fix High-Priority First
Priority order:
- Syntax errors (prevent compilation)
- Missing required fields (incomplete citations)
- Broken DOIs (broken links)
- Duplicates (confusion, wasted space)
- Missing recommended fields
- Formatting inconsistencies
5. Document Exceptions
For entries that can't be fixed:
@article{Old1950,
author = {Smith, John},
title = {Title},
journal = {Obscure Journal},
year = {1950},
volume = {12},
pages = {34--56},
note = {DOI not available for publications before 2000}
}
6. Validate Against Journal Requirements
Different journals have different requirements:
- Citation style (numbered, author-year)
- Abbreviations (journal names)
- Maximum reference count
- Format (BibTeX, EndNote, manual)
Check journal author guidelines!
Common Validation Issues
Issue 1: Metadata Mismatch
Problem: BibTeX says 2023, CrossRef says 2024.
Cause:
- Online-first vs print publication
- Correction/update
- Extraction error
Solution:
- Check actual article
- Use more recent/accurate date
- Update BibTeX entry
- Re-validate
Issue 2: Special Characters
Problem: LaTeX compilation fails on special characters.
Cause:
- Accented characters (é, ü, ñ)
- Chemical formulas (H₂O)
- Math symbols (α, β, ±)
Solution:
% Use LaTeX commands
author = {M{\"u}ller, Hans} % Müller
title = {Study of H\textsubscript{2}O} % H₂O
% Or use UTF-8 with proper LaTeX packages
Issue 3: Incomplete Extraction
Problem: Extracted metadata missing fields.
Cause:
- Source doesn't provide all metadata
- Extraction error
- Incomplete record
Solution:
- Check original article
- Manually add missing fields
- Use alternative source (PubMed vs CrossRef)
Issue 4: Cannot Find Duplicate
Problem: Same paper appears twice, not detected.
Cause:
- Different DOIs (should be rare)
- Different titles (abbreviated, typo)
- Different citation keys
Solution:
- Manual search for author + year
- Check for similar titles
- Remove manually
Summary
Validation ensures citation quality:
✓ Accuracy: DOIs resolve, metadata correct
✓ Completeness: All required fields present
✓ Consistency: Proper formatting throughout
✓ No duplicates: Each paper cited once
✓ Valid syntax: BibTeX compiles without errors
Always validate before final submission!
Use automated tools:
python scripts/validate_citations.py references.bib
Follow workflow:
- Extract metadata
- Validate
- Fix errors
- Re-validate
- Submit