17 KiB
PubMed Search Guide
Comprehensive guide to searching PubMed for biomedical and life sciences literature, including MeSH terms, field tags, advanced search strategies, and E-utilities API usage.
Overview
PubMed is the premier database for biomedical literature:
- Coverage: 35+ million citations
- Scope: Biomedical and life sciences
- Sources: MEDLINE, life science journals, online books
- Authority: Maintained by National Library of Medicine (NLM) / NCBI
- Access: Free, no account required
- Updates: Daily with new citations
- Curation: High-quality metadata, MeSH indexing
Basic Search
Simple Keyword Search
PubMed automatically maps terms to MeSH and searches multiple fields:
diabetes
CRISPR gene editing
Alzheimer's disease treatment
cancer immunotherapy
Automatic Features:
- Automatic MeSH mapping
- Plural/singular variants
- Abbreviation expansion
- Spell checking
Exact Phrase Search
Use quotation marks for exact phrases:
"CRISPR-Cas9"
"systematic review"
"randomized controlled trial"
"machine learning"
MeSH (Medical Subject Headings)
What is MeSH?
MeSH is a controlled vocabulary thesaurus for indexing biomedical literature:
- Hierarchical structure: Organized in tree structures
- Consistent indexing: Same concept always tagged the same way
- Comprehensive: Covers diseases, drugs, anatomy, techniques, etc.
- Professional curation: NLM indexers assign MeSH terms
Finding MeSH Terms
MeSH Browser: https://meshb.nlm.nih.gov/search
Example:
Search: "heart attack"
MeSH term: "Myocardial Infarction"
In PubMed:
- Search with keyword
- Check "MeSH Terms" in left sidebar
- Select relevant MeSH terms
- Add to search
Using MeSH in Searches
Basic MeSH search:
"Diabetes Mellitus"[MeSH]
"CRISPR-Cas Systems"[MeSH]
"Alzheimer Disease"[MeSH]
"Neoplasms"[MeSH]
MeSH with subheadings:
"Diabetes Mellitus/drug therapy"[MeSH]
"Neoplasms/genetics"[MeSH]
"Heart Failure/prevention and control"[MeSH]
Common subheadings:
/drug therapy: Drug treatment/diagnosis: Diagnostic aspects/genetics: Genetic aspects/epidemiology: Occurrence and distribution/prevention and control: Prevention methods/etiology: Causes/surgery: Surgical treatment/metabolism: Metabolic aspects
MeSH Explosion
By default, MeSH searches include narrower terms (explosion):
"Neoplasms"[MeSH]
# Includes: Breast Neoplasms, Lung Neoplasms, etc.
Disable explosion (exact term only):
"Neoplasms"[MeSH:NoExp]
MeSH Major Topic
Search only where MeSH term is a major focus:
"Diabetes Mellitus"[MeSH Major Topic]
# Only papers where diabetes is main topic
Field Tags
Field tags specify which part of the record to search.
Common Field Tags
Title and Abstract:
cancer[Title] # In title only
treatment[Title/Abstract] # In title or abstract
"machine learning"[Title/Abstract]
Author:
"Smith J"[Author]
"Doudna JA"[Author]
"Collins FS"[Author]
Author - Full Name:
"Smith, John"[Full Author Name]
Journal:
"Nature"[Journal]
"Science"[Journal]
"New England Journal of Medicine"[Journal]
"Nat Commun"[Journal] # Abbreviated form
Publication Date:
2023[Publication Date]
2020:2024[Publication Date] # Date range
2023/01/01:2023/12/31[Publication Date]
Date Created:
2023[Date - Create] # When added to PubMed
Publication Type:
"Review"[Publication Type]
"Clinical Trial"[Publication Type]
"Meta-Analysis"[Publication Type]
"Randomized Controlled Trial"[Publication Type]
Language:
English[Language]
French[Language]
DOI:
10.1038/nature12345[DOI]
PMID (PubMed ID):
12345678[PMID]
Article ID:
PMC1234567[PMC] # PubMed Central ID
Less Common But Useful Tags
humans[MeSH Terms] # Only human studies
animals[MeSH Terms] # Only animal studies
"United States"[Place of Publication]
nih[Grant Number] # NIH-funded research
"Female"[Sex] # Female subjects
"Aged, 80 and over"[Age] # Elderly subjects
Boolean Operators
Combine search terms with Boolean logic.
AND
Both terms must be present (default behavior):
diabetes AND treatment
"CRISPR-Cas9" AND "gene editing"
cancer AND immunotherapy AND "clinical trial"[Publication Type]
OR
Either term must be present:
"heart attack" OR "myocardial infarction"
diabetes OR "diabetes mellitus"
CRISPR OR Cas9 OR "gene editing"
Use case: Synonyms and related terms
NOT
Exclude terms:
cancer NOT review
diabetes NOT animal
"machine learning" NOT "deep learning"
Caution: May exclude relevant papers that mention both terms.
Combining Operators
Use parentheses for complex logic:
(diabetes OR "diabetes mellitus") AND (treatment OR therapy)
("CRISPR" OR "gene editing") AND ("therapeutic" OR "therapy")
AND 2020:2024[Publication Date]
(cancer OR neoplasm) AND (immunotherapy OR "immune checkpoint inhibitor")
AND ("clinical trial"[Publication Type] OR "randomized controlled trial"[Publication Type])
Advanced Search Builder
Access: https://pubmed.ncbi.nlm.nih.gov/advanced/
Features:
- Visual query builder
- Add multiple query boxes
- Select field tags from dropdowns
- Combine with AND/OR/NOT
- Preview results
- Shows final query string
- Save queries
Workflow:
- Add search terms in separate boxes
- Select field tags
- Choose Boolean operators
- Preview results
- Refine as needed
- Copy final query string
- Use in scripts or save
Example built query:
#1: "Diabetes Mellitus, Type 2"[MeSH]
#2: "Metformin"[MeSH]
#3: "Clinical Trial"[Publication Type]
#4: 2020:2024[Publication Date]
#5: #1 AND #2 AND #3 AND #4
Filters and Limits
Article Types
"Review"[Publication Type]
"Systematic Review"[Publication Type]
"Meta-Analysis"[Publication Type]
"Clinical Trial"[Publication Type]
"Randomized Controlled Trial"[Publication Type]
"Case Reports"[Publication Type]
"Comparative Study"[Publication Type]
Species
humans[MeSH Terms]
mice[MeSH Terms]
rats[MeSH Terms]
Sex
"Female"[MeSH Terms]
"Male"[MeSH Terms]
Age Groups
"Infant"[MeSH Terms]
"Child"[MeSH Terms]
"Adolescent"[MeSH Terms]
"Adult"[MeSH Terms]
"Aged"[MeSH Terms]
"Aged, 80 and over"[MeSH Terms]
Text Availability
free full text[Filter] # Free full-text available
Journal Categories
"Journal Article"[Publication Type]
E-utilities API
NCBI provides programmatic access via E-utilities (Entrez Programming Utilities).
Overview
Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/
Main Tools:
- ESearch: Search and retrieve PMIDs
- EFetch: Retrieve full records
- ESummary: Retrieve document summaries
- ELink: Find related articles
- EInfo: Database statistics
No API key required, but recommended for:
- Higher rate limits (10/sec vs 3/sec)
- Better performance
- Identify your project
Get API key: https://www.ncbi.nlm.nih.gov/account/
ESearch - Search PubMed
Retrieve PMIDs for a query.
Endpoint: /esearch.fcgi
Parameters:
db: Database (pubmed)term: Search queryretmax: Maximum results (default 20, max 10000)retstart: Starting position (for pagination)sort: Sort order (relevance, pub_date, author)api_key: Your API key (optional but recommended)
Example URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?
db=pubmed&
term=diabetes+AND+treatment&
retmax=100&
retmode=json&
api_key=YOUR_API_KEY
Response:
{
"esearchresult": {
"count": "250000",
"retmax": "100",
"idlist": ["12345678", "12345679", ...]
}
}
EFetch - Retrieve Records
Get full metadata for PMIDs.
Endpoint: /efetch.fcgi
Parameters:
db: Database (pubmed)id: Comma-separated PMIDsretmode: Format (xml, json, text)rettype: Type (abstract, medline, full)api_key: Your API key
Example URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?
db=pubmed&
id=12345678,12345679&
retmode=xml&
api_key=YOUR_API_KEY
Response: XML with complete metadata including:
- Title
- Authors (with affiliations)
- Abstract
- Journal
- Publication date
- DOI
- PMID, PMCID
- MeSH terms
- Keywords
ESummary - Get Summaries
Lighter-weight alternative to EFetch.
Example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?
db=pubmed&
id=12345678&
retmode=json&
api_key=YOUR_API_KEY
Returns: Key metadata without full abstract and details.
ELink - Find Related Articles
Find related articles or links to other databases.
Example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?
dbfrom=pubmed&
db=pubmed&
id=12345678&
linkname=pubmed_pubmed_citedin
Link types:
pubmed_pubmed: Related articlespubmed_pubmed_citedin: Papers citing this articlepubmed_pmc: PMC full-text versionspubmed_protein: Related protein records
Rate Limiting
Without API key:
- 3 requests per second
- Block if exceeded
With API key:
- 10 requests per second
- Better for programmatic access
Best practice:
import time
time.sleep(0.34) # ~3 requests/second
# or
time.sleep(0.11) # ~10 requests/second with API key
API Key Usage
Get API key:
- Create NCBI account: https://www.ncbi.nlm.nih.gov/account/
- Settings → API Key Management
- Create new API key
- Copy key
Use in requests:
&api_key=YOUR_API_KEY_HERE
Store securely:
# In environment variable
export NCBI_API_KEY="your_key_here"
# In script
import os
api_key = os.getenv('NCBI_API_KEY')
Search Strategies
Comprehensive Systematic Search
For systematic reviews and meta-analyses:
# 1. Identify key concepts
Concept 1: Diabetes
Concept 2: Treatment
Concept 3: Outcomes
# 2. Find MeSH terms and synonyms
Concept 1: "Diabetes Mellitus"[MeSH] OR diabetes OR diabetic
Concept 2: "Drug Therapy"[MeSH] OR treatment OR therapy OR medication
Concept 3: "Treatment Outcome"[MeSH] OR outcome OR efficacy OR effectiveness
# 3. Combine with AND
("Diabetes Mellitus"[MeSH] OR diabetes OR diabetic)
AND ("Drug Therapy"[MeSH] OR treatment OR therapy OR medication)
AND ("Treatment Outcome"[MeSH] OR outcome OR efficacy OR effectiveness)
# 4. Add filters
AND 2015:2024[Publication Date]
AND ("Clinical Trial"[Publication Type] OR "Randomized Controlled Trial"[Publication Type])
AND English[Language]
AND humans[MeSH Terms]
Finding Clinical Trials
# Specific disease + clinical trials
"Alzheimer Disease"[MeSH]
AND ("Clinical Trial"[Publication Type]
OR "Randomized Controlled Trial"[Publication Type])
AND 2020:2024[Publication Date]
# Specific drug trials
"Metformin"[MeSH]
AND "Diabetes Mellitus, Type 2"[MeSH]
AND "Randomized Controlled Trial"[Publication Type]
Finding Reviews
# Systematic reviews on topic
"CRISPR-Cas Systems"[MeSH]
AND ("Systematic Review"[Publication Type] OR "Meta-Analysis"[Publication Type])
# Reviews in high-impact journals
cancer immunotherapy
AND "Review"[Publication Type]
AND ("Nature"[Journal] OR "Science"[Journal] OR "Cell"[Journal])
Finding Recent Papers
# Papers from last year
"machine learning"[Title/Abstract]
AND "drug discovery"[Title/Abstract]
AND 2024[Publication Date]
# Recent papers in specific journal
"CRISPR"[Title/Abstract]
AND "Nature"[Journal]
AND 2023:2024[Publication Date]
Author Tracking
# Specific author's recent work
"Doudna JA"[Author] AND 2020:2024[Publication Date]
# Author + topic
"Church GM"[Author] AND "synthetic biology"[Title/Abstract]
High-Quality Evidence
# Meta-analyses and systematic reviews
(diabetes OR "diabetes mellitus")
AND (treatment OR therapy)
AND ("Meta-Analysis"[Publication Type] OR "Systematic Review"[Publication Type])
# RCTs only
cancer immunotherapy
AND "Randomized Controlled Trial"[Publication Type]
AND 2020:2024[Publication Date]
Script Integration
search_pubmed.py Usage
Basic search:
python scripts/search_pubmed.py "diabetes treatment"
With MeSH terms:
python scripts/search_pubmed.py \
--query '"Diabetes Mellitus"[MeSH] AND "Drug Therapy"[MeSH]'
Date range filter:
python scripts/search_pubmed.py "CRISPR" \
--date-start 2020-01-01 \
--date-end 2024-12-31 \
--limit 200
Publication type filter:
python scripts/search_pubmed.py "cancer immunotherapy" \
--publication-types "Clinical Trial,Randomized Controlled Trial" \
--limit 100
Export to BibTeX:
python scripts/search_pubmed.py "Alzheimer's disease" \
--limit 100 \
--format bibtex \
--output alzheimers.bib
Complex query from file:
# Save complex query in query.txt
cat > query.txt << 'EOF'
("Diabetes Mellitus, Type 2"[MeSH] OR "diabetes"[Title/Abstract])
AND ("Metformin"[MeSH] OR "metformin"[Title/Abstract])
AND "Randomized Controlled Trial"[Publication Type]
AND 2015:2024[Publication Date]
AND English[Language]
EOF
# Run search
python scripts/search_pubmed.py --query-file query.txt --limit 500
Batch Searches
# Search multiple topics
TOPICS=("diabetes treatment" "cancer immunotherapy" "CRISPR gene editing")
for topic in "${TOPICS[@]}"; do
python scripts/search_pubmed.py "$topic" \
--limit 100 \
--output "${topic// /_}.json"
sleep 1
done
Extract Metadata
# Search returns PMIDs
python scripts/search_pubmed.py "topic" --output results.json
# Extract full metadata
python scripts/extract_metadata.py \
--input results.json \
--output references.bib
Tips and Best Practices
Search Construction
-
Start with MeSH terms:
- Use MeSH Browser to find correct terms
- More precise than keyword search
- Captures all papers on topic regardless of terminology
-
Include text word variants:
# Better coverage ("Diabetes Mellitus"[MeSH] OR diabetes OR diabetic) -
Use field tags appropriately:
[MeSH]for standardized concepts[Title/Abstract]for specific terms[Author]for known authors[Journal]for specific venues
-
Build incrementally:
# Step 1: Basic search diabetes # Step 2: Add specificity "Diabetes Mellitus, Type 2"[MeSH] # Step 3: Add treatment "Diabetes Mellitus, Type 2"[MeSH] AND "Metformin"[MeSH] # Step 4: Add study type "Diabetes Mellitus, Type 2"[MeSH] AND "Metformin"[MeSH] AND "Clinical Trial"[Publication Type] # Step 5: Add date range ... AND 2020:2024[Publication Date]
Optimizing Results
-
Too many results: Add filters
- Restrict publication type
- Narrow date range
- Add more specific MeSH terms
- Use Major Topic:
[MeSH Major Topic]
-
Too few results: Broaden search
- Remove restrictive filters
- Use OR for synonyms
- Expand date range
- Use MeSH explosion (default)
-
Irrelevant results: Refine terms
- Use more specific MeSH terms
- Add exclusions with NOT
- Use Title field instead of all fields
- Add MeSH subheadings
Quality Control
-
Document search strategy:
- Save exact query string
- Record search date
- Note number of results
- Save filters used
-
Export systematically:
- Use consistent file naming
- Export to JSON for flexibility
- Convert to BibTeX as needed
- Keep original search results
-
Validate retrieved citations:
python scripts/validate_citations.py pubmed_results.bib
Staying Current
-
Set up search alerts:
- PubMed → Save search
- Receive email updates
- Daily, weekly, or monthly
-
Track specific journals:
"Nature"[Journal] AND CRISPR[Title] -
Follow key authors:
"Church GM"[Author]
Common Issues and Solutions
Issue: MeSH Term Not Found
Solution:
- Check spelling
- Use MeSH Browser
- Try related terms
- Use text word search as fallback
Issue: Zero Results
Solution:
- Remove filters
- Check query syntax
- Use OR for broader search
- Try synonyms
Issue: Poor Quality Results
Solution:
- Add publication type filters
- Restrict to recent years
- Use MeSH Major Topic
- Filter by journal quality
Issue: Duplicates from Different Sources
Solution:
python scripts/format_bibtex.py results.bib \
--deduplicate \
--output clean.bib
Issue: API Rate Limiting
Solution:
- Get API key (increases limit to 10/sec)
- Add delays in scripts
- Process in batches
- Use off-peak hours
Summary
PubMed provides authoritative biomedical literature search:
✓ Curated content: MeSH indexing, quality control
✓ Precise search: Field tags, MeSH terms, filters
✓ Programmatic access: E-utilities API
✓ Free access: No subscription required
✓ Comprehensive: 35M+ citations, daily updates
Key strategies:
- Use MeSH terms for precise searching
- Combine with text words for comprehensive coverage
- Apply appropriate field tags
- Filter by publication type and date
- Use E-utilities API for automation
- Document search strategy for reproducibility
For broader coverage across disciplines, complement with Google Scholar.