Initial commit

skills/literature-review/SKILL.md (new file, 546 lines)

---
name: literature-review
description: Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.).
---

# Literature Review

## Overview

Conduct systematic, comprehensive literature reviews following rigorous academic methodology. Search multiple literature databases, synthesize findings thematically, verify all citations for accuracy, and generate professional output documents in markdown and PDF formats.

This skill integrates with multiple scientific skills for database access (gget, bioservices, datacommons-client) and provides specialized tools for citation verification, result aggregation, and document generation.

## When to Use This Skill

Use this skill when:
- Conducting a systematic literature review for research or publication
- Synthesizing current knowledge on a specific topic across multiple sources
- Performing meta-analyses or scoping reviews
- Writing the literature review section of a research paper or thesis
- Investigating the state of the art in a research domain
- Identifying research gaps and future directions
- Requiring verified citations and professional formatting

## Core Workflow

Literature reviews follow a structured, multi-phase workflow:

### Phase 1: Planning and Scoping

1. **Define Research Question**: Use the PICO framework (Population, Intervention, Comparison, Outcome) for clinical/biomedical reviews
- Example: "What is the efficacy of CRISPR-Cas9 (I) for treating sickle cell disease (P) compared to standard care (C) in improving patient outcomes (O)?"

2. **Establish Scope and Objectives**:
- Define clear, specific research questions
- Determine review type (narrative, systematic, scoping, meta-analysis)
- Set boundaries (time period, geographic scope, study types)

3. **Develop Search Strategy**:
- Identify 2-4 main concepts from research question
- List synonyms, abbreviations, and related terms for each concept
- Plan Boolean operators (AND, OR, NOT) to combine terms
- Select minimum 3 complementary databases

4. **Set Inclusion/Exclusion Criteria**:
- Date range (e.g., last 10 years: 2015-2024)
- Language (typically English, or specify multilingual)
- Publication types (peer-reviewed, preprints, reviews)
- Study designs (RCTs, observational, in vitro, etc.)
- Document all criteria clearly

### Phase 2: Systematic Literature Search

1. **Multi-Database Search**:

Select databases appropriate for the domain:

**Biomedical & Life Sciences:**
- Use `gget` skill: `gget search pubmed "search terms"` for PubMed/PMC
- Use `gget` skill: `gget search biorxiv "search terms"` for preprints
- Use `bioservices` skill for ChEMBL, KEGG, UniProt, etc.

**General Scientific Literature:**
- Search arXiv via direct API (preprints in physics, math, CS, q-bio)
- Search Semantic Scholar via API (200M+ papers, cross-disciplinary)
- Use Google Scholar for comprehensive coverage (manual or careful scraping)

**Specialized Databases:**
- Use `gget alphafold` for protein structures
- Use `gget cosmic` for cancer genomics
- Use `datacommons-client` for demographic/statistical data
- Use specialized databases as appropriate for the domain

2. **Document Search Parameters**:
````markdown
## Search Strategy

### Database: PubMed
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**:
```
("CRISPR"[Title] OR "Cas9"[Title])
AND ("sickle cell"[MeSH] OR "SCD"[Title/Abstract])
AND 2015:2024[Publication Date]
```
- **Results**: 247 articles
````

Repeat for each database searched.

3. **Export and Aggregate Results**:
- Export results in JSON format from each database
- Combine all results into a single file
- Use `scripts/search_databases.py` for post-processing:
```bash
python search_databases.py combined_results.json \
    --deduplicate \
    --format markdown \
    --output aggregated_results.md
```

### Phase 3: Screening and Selection

1. **Deduplication**:
```bash
python search_databases.py results.json --deduplicate --output unique_results.json
```
- Removes duplicates by DOI (primary) or title (fallback)
- Document number of duplicates removed

2. **Title Screening**:
- Review all titles against inclusion/exclusion criteria
- Exclude obviously irrelevant studies
- Document number excluded at this stage

3. **Abstract Screening**:
- Read abstracts of remaining studies
- Apply inclusion/exclusion criteria rigorously
- Document reasons for exclusion

4. **Full-Text Screening**:
- Obtain full texts of remaining studies
- Conduct detailed review against all criteria
- Document specific reasons for exclusion
- Record final number of included studies

5. **Create PRISMA Flow Diagram**:
```
Initial search: n = X
├─ After deduplication: n = Y
├─ After title screening: n = Z
├─ After abstract screening: n = A
└─ Included in review: n = B
```

### Phase 4: Data Extraction and Quality Assessment

1. **Extract Key Data** from each included study:
- Study metadata (authors, year, journal, DOI)
- Study design and methods
- Sample size and population characteristics
- Key findings and results
- Limitations noted by authors
- Funding sources and conflicts of interest

2. **Assess Study Quality**:
- **For RCTs**: Use Cochrane Risk of Bias tool
- **For observational studies**: Use Newcastle-Ottawa Scale
- **For systematic reviews**: Use AMSTAR 2
- Rate each study: High, Moderate, Low, or Very Low quality
- Consider excluding very low-quality studies

3. **Organize by Themes**:
- Identify 3-5 major themes across studies
- Group studies by theme (studies may appear in multiple themes)
- Note patterns, consensus, and controversies

### Phase 5: Synthesis and Analysis

1. **Create Review Document** from template:
```bash
cp assets/review_template.md my_literature_review.md
```

2. **Write Thematic Synthesis** (NOT study-by-study summaries):
- Organize Results section by themes or research questions
- Synthesize findings across multiple studies within each theme
- Compare and contrast different approaches and results
- Identify consensus areas and points of controversy
- Highlight the strongest evidence

Example structure:
```markdown
#### 3.4.1 Theme: CRISPR Delivery Methods

Multiple delivery approaches have been investigated for therapeutic
gene editing. Viral vectors (AAV) were used in 15 studies^1-15^ and
showed high transduction efficiency (65-85%) but raised immunogenicity
concerns^3,7,12^. In contrast, lipid nanoparticles demonstrated lower
efficiency (40-60%) but improved safety profiles^16-23^.
```

3. **Critical Analysis**:
- Evaluate methodological strengths and limitations across studies
- Assess quality and consistency of evidence
- Identify knowledge gaps and methodological gaps
- Note areas requiring future research

4. **Write Discussion**:
- Interpret findings in broader context
- Discuss clinical, practical, or research implications
- Acknowledge limitations of the review itself
- Compare with previous reviews if applicable
- Propose specific future research directions

### Phase 6: Citation Verification

**CRITICAL**: All citations must be verified for accuracy before final submission.

1. **Verify All DOIs**:
```bash
python scripts/verify_citations.py my_literature_review.md
```

This script:
- Extracts all DOIs from the document
- Verifies each DOI resolves correctly
- Retrieves metadata from CrossRef
- Generates verification report
- Outputs properly formatted citations
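
For illustration, the DOI check at the heart of this step can be reproduced against the public CrossRef REST API. The following is a minimal sketch, not the bundled script; the endpoint shown is CrossRef's standard works API:

```python
# Minimal sketch of a DOI check against the CrossRef REST API
# (illustrative only; scripts/verify_citations.py is the full implementation).
import requests

def check_doi(doi: str):
    """Return basic CrossRef metadata for a DOI, or None if it does not resolve."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    if resp.status_code != 200:
        return None
    msg = resp.json()["message"]
    return {
        "title": (msg.get("title") or [""])[0],
        "journal": (msg.get("container-title") or [""])[0],
        "year": msg.get("issued", {}).get("date-parts", [[None]])[0][0],
    }

# Usage: replace with a DOI from your reference list
# print(check_doi("10.xxxx/yyyy"))
```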

2. **Review Verification Report**:
- Check for any failed DOIs
- Verify author names, titles, and publication details match
- Correct any errors in the original document
- Re-run verification until all citations pass

3. **Format Citations Consistently**:
- Choose one citation style and use throughout (see `references/citation_styles.md`)
- Common styles: APA, Nature, Vancouver, Chicago, IEEE
- Use verification script output to format citations correctly
- Ensure in-text citations match reference list format

### Phase 7: Document Generation

1. **Generate PDF**:
```bash
python scripts/generate_pdf.py my_literature_review.md \
    --citation-style apa \
    --output my_review.pdf
```

Options:
- `--citation-style`: apa, nature, chicago, vancouver, ieee
- `--no-toc`: Disable table of contents
- `--no-numbers`: Disable section numbering
- `--check-deps`: Check if pandoc/xelatex are installed

2. **Review Final Output**:
- Check PDF formatting and layout
- Verify all sections are present
- Ensure citations render correctly
- Check that figures/tables appear properly
- Verify table of contents is accurate

3. **Quality Checklist**:
- [ ] All DOIs verified with verify_citations.py
- [ ] Citations formatted consistently
- [ ] PRISMA flow diagram included (for systematic reviews)
- [ ] Search methodology fully documented
- [ ] Inclusion/exclusion criteria clearly stated
- [ ] Results organized thematically (not study-by-study)
- [ ] Quality assessment completed
- [ ] Limitations acknowledged
- [ ] References complete and accurate
- [ ] PDF generates without errors

## Database-Specific Search Guidance

### PubMed / PubMed Central

Access via `gget` skill:
```bash
# Search PubMed
gget search pubmed "CRISPR gene editing" -l 100

# Search with filters
# Use PubMed Advanced Search Builder to construct complex queries
# Then execute via gget or direct Entrez API
```
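
For complex queries executed outside `gget`, the same search can be run directly against NCBI's public E-utilities endpoint. The snippet below is a minimal sketch (standard `esearch` parameters; the query string is illustrative):

```python
# Minimal sketch: query PubMed directly via NCBI E-utilities (esearch).
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": '("CRISPR"[Title] OR "Cas9"[Title]) AND "sickle cell"[Title/Abstract]',
    "datetype": "pdat",   # filter by publication date
    "mindate": "2015",
    "maxdate": "2024",
    "retmax": 100,        # number of PMIDs to return
    "retmode": "json",
}
resp = requests.get(ESEARCH, params=params, timeout=30)
resp.raise_for_status()
result = resp.json()["esearchresult"]
print("Total hits:", result["count"])
print("First PMIDs:", result["idlist"][:10])
```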

**Search tips**:
- Use MeSH terms: `"sickle cell disease"[MeSH]`
- Field tags: `[Title]`, `[Title/Abstract]`, `[Author]`
- Date filters: `2020:2024[Publication Date]`
- Boolean operators: AND, OR, NOT
- See MeSH browser: https://meshb.nlm.nih.gov/search

### bioRxiv / medRxiv

Access via `gget` skill:
```bash
gget search biorxiv "CRISPR sickle cell" -l 50
```

**Important considerations**:
- Preprints are not peer-reviewed
- Verify findings with caution
- Check if preprint has been published (CrossRef)
- Note preprint version and date

### arXiv

Access via direct API or WebFetch:
```python
# Example search categories:
# q-bio.QM (Quantitative Methods)
# q-bio.GN (Genomics)
# q-bio.MN (Molecular Networks)
# cs.LG (Machine Learning)
# stat.ML (Machine Learning Statistics)

# Search format: category AND terms
search_query = "cat:q-bio.QM AND ti:\"single cell sequencing\""

# Execute against the public arXiv export API (returns an Atom XML feed)
import requests

resp = requests.get(
    "http://export.arxiv.org/api/query",
    params={"search_query": search_query, "start": 0, "max_results": 25},
    timeout=30,
)
print(resp.text[:500])  # parse the Atom feed with feedparser or xml.etree
```

### Semantic Scholar

Access via direct API (requires API key, or use free tier):
- 200M+ papers across all fields
- Excellent for cross-disciplinary searches
- Provides citation graphs and paper recommendations
- Use for finding highly influential papers
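
A minimal sketch of a direct query against the Semantic Scholar Graph API (public endpoint and field names; the query and field selection are illustrative):

```python
# Minimal sketch: paper search via the Semantic Scholar Graph API.
import requests

url = "https://api.semanticscholar.org/graph/v1/paper/search"
params = {
    "query": "CRISPR sickle cell gene therapy",
    "fields": "title,year,externalIds,citationCount",
    "limit": 50,
}
headers = {}  # add {"x-api-key": "YOUR_KEY"} if you have one
resp = requests.get(url, params=params, headers=headers, timeout=30)
resp.raise_for_status()
for paper in resp.json().get("data", []):
    doi = (paper.get("externalIds") or {}).get("DOI", "no DOI")
    print(paper.get("year"), paper.get("citationCount"), paper["title"], doi)
```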

### Specialized Biomedical Databases

Use appropriate skills:
- **ChEMBL**: `bioservices` skill for chemical bioactivity
- **UniProt**: `gget` or `bioservices` skill for protein information
- **KEGG**: `bioservices` skill for pathways and genes
- **COSMIC**: `gget` skill for cancer mutations
- **AlphaFold**: `gget alphafold` for protein structures
- **PDB**: `gget` or direct API for experimental structures

### Citation Chaining

Expand search via citation networks:

1. **Forward citations** (papers citing key papers):
- Use Google Scholar "Cited by"
- Use Semantic Scholar or OpenAlex APIs
- Identifies newer research building on seminal work

2. **Backward citations** (references from key papers):
- Extract references from included papers
- Identify highly cited foundational work
- Find papers cited by multiple included studies

## Citation Style Guide

Detailed formatting guidelines are in `references/citation_styles.md`. Quick reference:

### APA (7th Edition)
- In-text: (Smith et al., 2023)
- Reference: Smith, J. D., Johnson, M. L., & Williams, K. R. (2023). Title. *Journal*, *22*(4), 301-318. https://doi.org/10.xxx/yyy

### Nature
- In-text: Superscript numbers^1,2^
- Reference: Smith, J. D., Johnson, M. L. & Williams, K. R. Title. *Nat. Rev. Drug Discov.* **22**, 301-318 (2023).

### Vancouver
- In-text: Superscript numbers^1,2^
- Reference: Smith JD, Johnson ML, Williams KR. Title. Nat Rev Drug Discov. 2023;22(4):301-18.

**Always verify citations** with verify_citations.py before finalizing.

## Best Practices

### Search Strategy
1. **Use multiple databases** (minimum 3): Ensures comprehensive coverage
2. **Include preprint servers**: Captures latest unpublished findings
3. **Document everything**: Search strings, dates, result counts for reproducibility
4. **Test and refine**: Run pilot searches, review results, adjust search terms

### Screening and Selection
1. **Use clear criteria**: Document inclusion/exclusion criteria before screening
2. **Screen systematically**: Title → Abstract → Full text
3. **Document exclusions**: Record reasons for excluding studies
4. **Consider dual screening**: For systematic reviews, have two reviewers screen independently

### Synthesis
1. **Organize thematically**: Group by themes, NOT by individual studies
2. **Synthesize across studies**: Compare, contrast, identify patterns
3. **Be critical**: Evaluate quality and consistency of evidence
4. **Identify gaps**: Note what's missing or understudied

### Quality and Reproducibility
1. **Assess study quality**: Use appropriate quality assessment tools
2. **Verify all citations**: Run verify_citations.py script
3. **Document methodology**: Provide enough detail for others to reproduce
4. **Follow guidelines**: Use PRISMA for systematic reviews

### Writing
1. **Be objective**: Present evidence fairly, acknowledge limitations
2. **Be systematic**: Follow structured template
3. **Be specific**: Include numbers, statistics, effect sizes where available
4. **Be clear**: Use clear headings, logical flow, thematic organization

## Common Pitfalls to Avoid

1. **Single database search**: Misses relevant papers; always search multiple databases
2. **No search documentation**: Makes review irreproducible; document all searches
3. **Study-by-study summary**: Lacks synthesis; organize thematically instead
4. **Unverified citations**: Leads to errors; always run verify_citations.py
5. **Too broad search**: Yields thousands of irrelevant results; refine with specific terms
6. **Too narrow search**: Misses relevant papers; include synonyms and related terms
7. **Ignoring preprints**: Misses latest findings; include bioRxiv, medRxiv, arXiv
8. **No quality assessment**: Treats all evidence equally; assess and report quality
9. **Publication bias**: Only positive results published; note potential bias
10. **Outdated search**: Field evolves rapidly; clearly state search date

## Example Workflow

Complete workflow for a biomedical literature review:

```bash
# 1. Create review document from template
cp assets/review_template.md crispr_sickle_cell_review.md

# 2. Search multiple databases using appropriate skills
# - Use gget skill for PubMed, bioRxiv
# - Use direct API access for arXiv, Semantic Scholar
# - Export results in JSON format

# 3. Aggregate and process results
python scripts/search_databases.py combined_results.json \
    --deduplicate \
    --rank citations \
    --year-start 2015 \
    --year-end 2024 \
    --format markdown \
    --output search_results.md \
    --summary

# 4. Screen results and extract data
# - Manually screen titles, abstracts, full texts
# - Extract key data into the review document
# - Organize by themes

# 5. Write the review following template structure
# - Introduction with clear objectives
# - Detailed methodology section
# - Results organized thematically
# - Critical discussion
# - Clear conclusions

# 6. Verify all citations
python scripts/verify_citations.py crispr_sickle_cell_review.md

# Review the citation report
cat crispr_sickle_cell_review_citation_report.json

# Fix any failed citations and re-verify
python scripts/verify_citations.py crispr_sickle_cell_review.md

# 7. Generate professional PDF
python scripts/generate_pdf.py crispr_sickle_cell_review.md \
    --citation-style nature \
    --output crispr_sickle_cell_review.pdf

# 8. Review final PDF and markdown outputs
```

## Integration with Other Skills

This skill works seamlessly with other scientific skills:

### Database Access Skills
- **gget**: PubMed, bioRxiv, COSMIC, AlphaFold, Ensembl, UniProt
- **bioservices**: ChEMBL, KEGG, Reactome, UniProt, PubChem
- **datacommons-client**: Demographics, economics, health statistics

### Analysis Skills
- **pydeseq2**: RNA-seq differential expression (for methods sections)
- **scanpy**: Single-cell analysis (for methods sections)
- **anndata**: Single-cell data (for methods sections)
- **biopython**: Sequence analysis (for background sections)

### Visualization Skills
- **matplotlib**: Generate figures and plots for review
- **seaborn**: Statistical visualizations

### Writing Skills
- **brand-guidelines**: Apply institutional branding to PDF
- **internal-comms**: Adapt review for different audiences

## Resources

### Bundled Resources

**Scripts:**
- `scripts/verify_citations.py`: Verify DOIs and generate formatted citations
- `scripts/generate_pdf.py`: Convert markdown to professional PDF
- `scripts/search_databases.py`: Process, deduplicate, and format search results

**References:**
- `references/citation_styles.md`: Detailed citation formatting guide (APA, Nature, Vancouver, Chicago, IEEE)
- `references/database_strategies.md`: Comprehensive database search strategies

**Assets:**
- `assets/review_template.md`: Complete literature review template with all sections

### External Resources

**Guidelines:**
- PRISMA (Systematic Reviews): http://www.prisma-statement.org/
- Cochrane Handbook: https://training.cochrane.org/handbook
- AMSTAR 2 (Review Quality): https://amstar.ca/

**Tools:**
- MeSH Browser: https://meshb.nlm.nih.gov/search
- PubMed Advanced Search: https://pubmed.ncbi.nlm.nih.gov/advanced/
- Boolean Search Guide: https://www.ncbi.nlm.nih.gov/books/NBK3827/

**Citation Styles:**
- APA Style: https://apastyle.apa.org/
- Nature Portfolio: https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards
- NLM/Vancouver: https://www.nlm.nih.gov/bsd/uniform_requirements.html

## Dependencies

### Required Python Packages
```bash
uv pip install requests  # For citation verification
```

### Required System Tools
```bash
# For PDF generation
brew install pandoc          # macOS
apt-get install pandoc       # Linux

# For LaTeX (PDF generation)
brew install --cask mactex     # macOS
apt-get install texlive-xetex  # Linux
```

Check dependencies:
```bash
python scripts/generate_pdf.py --check-deps
```

## Summary

This literature-review skill provides:

1. **Systematic methodology** following academic best practices
2. **Multi-database integration** via existing scientific skills
3. **Citation verification** ensuring accuracy and credibility
4. **Professional output** in markdown and PDF formats
5. **Comprehensive guidance** covering the entire review process
6. **Quality assurance** with verification and validation tools
7. **Reproducibility** through detailed documentation requirements

Conduct thorough, rigorous literature reviews that meet academic standards and provide comprehensive synthesis of current knowledge in any domain.

skills/literature-review/assets/review_template.md (new file, 412 lines)

# [Literature Review Title]

**Authors**: [Author Names and Affiliations]
**Date**: [Date]
**Review Type**: [Narrative / Systematic / Scoping / Meta-Analysis / Umbrella Review]
**Review Protocol**: [PROSPERO ID if registered, or state "Not registered"]
**PRISMA Compliance**: [Yes/No/Partial - specify which guidelines]

---

## Abstract

**Background**: [Context and rationale]
**Objectives**: [Primary and secondary objectives]
**Methods**: [Databases, dates, selection criteria, quality assessment]
**Results**: [n studies included; key findings by theme]
**Conclusions**: [Main conclusions and implications]
**Registration**: [PROSPERO ID or "Not registered"]
**Keywords**: [5-8 keywords]

---

## 1. Introduction

### 1.1 Background and Context

[Provide background information on the topic. Establish why this literature review is important and timely. Discuss the broader context and current state of knowledge.]

### 1.2 Scope and Objectives

[Clearly define the scope of the review and state the specific objectives. What questions will this review address?]

**Primary Research Questions:**
1. [Research question 1]
2. [Research question 2]
3. [Research question 3]

### 1.3 Significance

[Explain the significance of this review. Why is it important to synthesize this literature now? What gaps does it fill?]

---

## 2. Methodology

### 2.1 Protocol and Registration

**Protocol**: [PROSPERO ID / OSF link / Not registered]
**Deviations**: [Document any protocol deviations]
**PRISMA**: [Checklist in Appendix B]

### 2.2 Search Strategy

**Databases:** [PubMed, Scopus, Web of Science, bioRxiv, etc.]
**Supplementary:** [Citation chaining, grey literature, trial registries]

**Search String Example:**
```
("CRISPR"[Title/Abstract] OR "Cas9"[Title/Abstract]) AND
("disease"[MeSH Terms]) AND ("2015/01/01"[Date] : "2024/12/31"[Date])
```

**Dates:** [YYYY-MM-DD to YYYY-MM-DD] | **Executed:** [Date]
**Validation:** [Key papers used to test search strategy]

### 2.3 Tools and Software

**Screening:** [Rayyan, Covidence, ASReview]
**Analysis:** [VOSviewer, R, Python]
**Citation Management:** [Zotero, Mendeley, EndNote]
**AI Tools:** [Any AI-assisted tools used; document validation approach]

### 2.4 Inclusion and Exclusion Criteria

**Inclusion Criteria:**
- [Criterion 1: e.g., Published between 2015-2024]
- [Criterion 2: e.g., Peer-reviewed articles and preprints]
- [Criterion 3: e.g., English language]
- [Criterion 4: e.g., Human or animal studies]
- [Criterion 5: e.g., Original research or systematic reviews]

**Exclusion Criteria:**
- [Criterion 1: e.g., Case reports with n<5]
- [Criterion 2: e.g., Conference abstracts without full text]
- [Criterion 3: e.g., Editorials and commentaries]
- [Criterion 4: e.g., Duplicate publications]
- [Criterion 5: e.g., Retracted articles]
- [Criterion 6: e.g., Studies with unavailable full text after author contact]

### 2.5 Study Selection

**Reviewers:** [n independent reviewers] | **Conflict resolution:** [Method]
**Inter-rater reliability:** [Cohen's kappa = X]

**PRISMA Flow:**
```
Records identified: n=[X] → Deduplicated: n=[Y] →
Title/abstract screened: n=[Y] → Full-text assessed: n=[Z] → Included: n=[N]
```

**Exclusion reasons:** [List with counts]

### 2.6 Data Extraction

**Method:** [Standardized form (Appendix E); pilot-tested on n studies]
**Extractors:** [n independent] | **Verification:** [Double-checked]

**Items:** Study ID, design, population, interventions/exposures, outcomes, statistics, funding, COI, bias domains

**Missing data:** [Author contact protocol]

### 2.7 Quality Assessment

**Tool:** [Cochrane RoB 2.0 / ROBINS-I / Newcastle-Ottawa / AMSTAR 2 / JBI]
**Method:** [2 independent reviewers; third for conflicts]
**Rating:** [Low/Moderate/High risk of bias]
**Publication bias:** [Funnel plots, Egger's test - if meta-analysis]

### 2.8 Synthesis and Analysis

**Approach:** [Narrative / Meta-analysis / Both]
**Statistics** (if meta-analysis): Effect measures, heterogeneity (I², τ²), sensitivity analyses, subgroups
**Software:** [RevMan, R, Stata]
**Certainty:** [GRADE framework; factors: bias, inconsistency, indirectness, imprecision]

---

## 3. Results

### 3.1 Study Selection

**Summary:** [X records → Y deduplicated → Z full-text → N included (M in meta-analysis)]
**Study types:** [RCTs: n=X, Observational: n=Y, Reviews: n=Z]
**Years:** [Range; peak year]
**Geography:** [Countries represented]
**Source:** [Peer-reviewed: n=X, Preprints: n=Y]

### 3.2 Bibliometric Overview

[Optional: Trends, journal distribution, author networks, citations, keywords - if analyzed with VOSviewer or similar]

### 3.3 Study Characteristics

| Study | Year | Design | Sample Size | Key Methods | Main Findings | Quality |
|-------|------|--------|-------------|-------------|---------------|---------|
| First Author et al. | 2023 | [Type] | n=[X] | [Methods] | [Brief findings] | [Low/Mod/High RoB] |

**Quality:** Low RoB: n=X ([%]); Moderate: n=Y ([%]); High: n=Z ([%])

### 3.4 Thematic Synthesis

[Organize by themes, NOT study-by-study. Synthesize across studies to identify consensus, controversies, and gaps.]

#### 3.4.1 Theme 1: [Title]

**Findings:** [Synthesis of key findings from multiple studies]
**Supporting studies:** [X, Y, Z]
**Contradictory evidence:** [If any]
**Certainty:** [GRADE rating if applicable]

### 3.5 Methodological Approaches

**Common methods:** [Method 1 (n studies), Method 2 (n studies)]
**Emerging techniques:** [New approaches observed]
**Methodological quality:** [Overall assessment]

### 3.6 Meta-Analysis Results

[Include only if conducting meta-analysis]

**Effect estimates:** [Primary/secondary outcomes with 95% CI, p-values]
**Heterogeneity:** [I²=X%, τ²=Y, interpretation]
**Subgroups & sensitivity:** [Key findings from analyses]
**Publication bias:** [Funnel plot, Egger's p=X]
**Forest plots:** [Include for primary outcomes]

### 3.7 Knowledge Gaps

**Knowledge:** [Unanswered research questions]
**Methodological:** [Study design/measurement issues]
**Translational:** [Research-to-practice gaps]
**Populations:** [Underrepresented groups/contexts]

---

## 4. Discussion

### 4.1 Main Findings

[Synthesize key findings by research question]

**Principal findings:** [Top 3-5 takeaways]
**Consensus:** [Where studies agree]
**Controversy:** [Conflicting results]

### 4.2 Interpretation and Implications

**Context:** [How findings advance/challenge current understanding]
**Mechanisms:** [Potential explanations for observed patterns]

**Implications for:**
- **Practice:** [Actionable recommendations]
- **Policy:** [If relevant]
- **Research:** [Theoretical, methodological, priority directions]

### 4.3 Strengths and Limitations

**Strengths:** [Comprehensive search, rigorous methods, large evidence base, transparency]

**Limitations:**
- Search/selection: [Language bias, database coverage, grey literature, publication bias]
- Methodological: [Heterogeneity, study quality]
- Temporal: [Rapid evolution, search cutoff date]

**Impact:** [How limitations affect conclusions]

### 4.4 Comparison with Previous Reviews

[If relevant: How does this review update/differ from prior reviews?]

### 4.5 Future Research

**Priority questions:**
1. [Question] - Rationale, suggested approach, expected impact
2. [Question] - Rationale, suggested approach, expected impact
3. [Question] - Rationale, suggested approach, expected impact

**Recommendations:** [Methodological improvements, understudied populations, emerging technologies]

---

## 5. Conclusions

[Concise conclusions addressing research questions]

1. [Conclusion directly addressing primary research question]
2. [Key finding conclusion]
3. [Gap/future direction conclusion]

**Evidence certainty:** [High/Moderate/Low/Very Low]
**Translation readiness:** [Ready / Needs more research / Preliminary]

---

## 6. Declarations

### Author Contributions
[CRediT taxonomy: Author 1 - Conceptualization, Methodology, Writing; Author 2 - Analysis, Review; etc.]

### Funding
[Grant details with numbers] OR [No funding received]

### Conflicts of Interest
[Author-specific declarations] OR [None]

### Data Availability
**Protocol:** [PROSPERO/OSF ID or "Not registered"]
**Data/Code:** [Repository URL/DOI or "Available upon request"]
**Materials:** [Search strategies (Appendix A), PRISMA checklist (Appendix B), extraction form (Appendix E)]

### Acknowledgments
[Contributors not meeting authorship criteria, librarians, patient involvement]

---

## 7. References

[Use consistent style: APA / Nature / Vancouver]

**Format examples:**

APA: Author, A. A., & Author, B. B. (Year). Title. *Journal*, *volume*(issue), pages. https://doi.org/xx.xxxx

Nature: Author, A. A. & Author, B. B. Title. *J. Name* **volume**, pages (year).

Vancouver: Author AA, Author BB. Title. J Abbrev. Year;volume(issue):pages. doi:xx.xxxx

1. [First reference]
2. [Second reference]
3. [Continue...]

---

## 8. Appendices

### Appendix A: Search Strings

**PubMed** (Date: YYYY-MM-DD; Results: n)
```
[Complete search string with operators and MeSH terms]
```

[Repeat for each database: Scopus, Web of Science, bioRxiv, etc.]

### Appendix B: PRISMA Checklist

| Section | Item | Reported? | Page |
|---------|------|-----------|------|
| Title | Identify as systematic review | Yes/No | # |
| Abstract | Structured summary | Yes/No | # |
| Methods | Eligibility, sources, search, selection, data, quality | Yes/No | # |
| Results | Selection, characteristics, risk of bias, syntheses | Yes/No | # |
| Discussion | Interpretation, limitations, conclusions | Yes/No | # |
| Other | Registration, support, conflicts, availability | Yes/No | # |

### Appendix C: Excluded Studies

| Study | Year | Reason | Category |
|-------|------|--------|----------|
| Author et al. | Year | [Reason] | [Wrong population/outcome/design/etc.] |

**Summary:** Wrong population (n=X), Wrong outcome (n=Y), etc.

### Appendix D: Quality Assessment

**Tool:** [Cochrane RoB 2.0 / ROBINS-I / Newcastle-Ottawa / etc.]

| Study | Domain 1 | Domain 2 | Domain 3 | Overall |
|-------|----------|----------|----------|---------|
| Study 1 | Low | Low | Some concerns | Low |
| Study 2 | [Score] | [Score] | [Score] | [Overall] |

### Appendix E: Data Extraction Form

```
STUDY: Author______ Year______ DOI______
DESIGN: □RCT □Cohort □Case-Control □Cross-sectional □Other______
POPULATION: n=_____ Age_____ Setting_____
INTERVENTION/EXPOSURE: _____
OUTCOMES: Primary_____ Secondary_____
RESULTS: Effect size_____ 95%CI_____ p=_____
QUALITY: □Low □Moderate □High RoB
FUNDING/COI: _____
```

### Appendix F: Meta-Analysis Details

[Only if meta-analysis performed]

**Software:** [R 4.x.x with meta/metafor packages / RevMan / Stata]
**Model:** [Random-effects; justification]
**Code:** [Link to repository]
**Sensitivity analyses:** [Details]

### Appendix G: Author Contacts

| Study | Contact Date | Response | Data Received |
|-------|--------------|----------|---------------|
| Author et al. | YYYY-MM-DD | Yes/No | Yes/No/Partial |

---

## 9. Supplementary Materials

[If applicable]

**Tables:** S1 (Full study characteristics), S2 (Quality scores), S3 (Subgroups), S4 (Sensitivity)
**Figures:** S1 (PRISMA diagram), S2 (Risk of bias), S3 (Funnel plot), S4 (Forest plots), S5 (Networks)
**Data:** S1 (Extraction file), S2 (Search results), S3 (Analysis code), S4 (PRISMA checklist)
**Repository:** [OSF/GitHub/Zenodo URL with DOI]

---

## Review Metadata

**Registration:** [Registry] ID: [Number] (Date: YYYY-MM-DD)
**Search dates:** Initial: [Date]; Updated: [Date]
**Version:** [1.0] | **Last updated:** [Date]

**Quality checks:**
- [ ] Citations verified with verify_citations.py
- [ ] PRISMA checklist completed
- [ ] Search reproducible
- [ ] Independent data verification
- [ ] Code peer-reviewed
- [ ] All authors approved

---

## Usage Notes

**Review type adaptations:**
- Systematic Review: Use all sections
- Meta-Analysis: Include sections 3.6, Appendix F
- Narrative Review: May omit some methodology detail
- Scoping Review: Follow PRISMA-ScR, may omit quality assessment

**Key principles:**
1. Remove all [bracketed placeholders]
2. Follow PRISMA 2020 guidelines
3. Pre-register when feasible (PROSPERO/OSF)
4. Use thematic synthesis, not study-by-study
5. Be transparent and reproducible
6. Verify all DOIs before submission
7. Make data/code openly available

**Common pitfalls to avoid:**
- Don't list studies - synthesize them
- Don't cherry-pick results
- Don't ignore limitations
- Don't overstate conclusions
- Don't skip publication bias assessment

**Resources:**
- PRISMA 2020: http://prisma-statement.org/
- PROSPERO: https://www.crd.york.ac.uk/prospero/
- Cochrane Handbook: https://training.cochrane.org/handbook
- GRADE: https://www.gradeworkinggroup.org/

**DELETE THIS SECTION FROM YOUR FINAL REVIEW**

---

skills/literature-review/references/citation_styles.md (new file, 166 lines)

# Citation Styles Reference

This document provides detailed guidelines for formatting citations in various academic styles commonly used in literature reviews.

## APA Style (7th Edition)

### Journal Articles

**Format**: Author, A. A., Author, B. B., & Author, C. C. (Year). Title of article. *Title of Periodical*, *volume*(issue), page range. https://doi.org/xx.xxx/yyyy

**Example**: Smith, J. D., Johnson, M. L., & Williams, K. R. (2023). Machine learning approaches in drug discovery. *Nature Reviews Drug Discovery*, *22*(4), 301-318. https://doi.org/10.1038/nrd.2023.001

### Books

**Format**: Author, A. A. (Year). *Title of work: Capital letter also for subtitle*. Publisher Name. https://doi.org/xxxx

**Example**: Kumar, V., Abbas, A. K., & Aster, J. C. (2021). *Robbins and Cotran pathologic basis of disease* (10th ed.). Elsevier.

### Book Chapters

**Format**: Author, A. A., & Author, B. B. (Year). Title of chapter. In E. E. Editor & F. F. Editor (Eds.), *Title of book* (pp. xx-xx). Publisher.

**Example**: Brown, P. O., & Botstein, D. (2020). Exploring the new world of the genome with DNA microarrays. In M. B. Eisen & P. O. Brown (Eds.), *DNA microarrays: A molecular cloning manual* (pp. 1-45). Cold Spring Harbor Laboratory Press.

### Preprints

**Format**: Author, A. A., & Author, B. B. (Year). Title of preprint. *Repository Name*. https://doi.org/xxxx

**Example**: Zhang, Y., Chen, L., & Wang, H. (2024). Novel therapeutic targets in Alzheimer's disease. *bioRxiv*. https://doi.org/10.1101/2024.01.001

### Conference Papers

**Format**: Author, A. A. (Year, Month day-day). Title of paper. In E. E. Editor (Ed.), *Title of conference proceedings* (pp. xx-xx). Publisher. https://doi.org/xxxx

---

## Nature Style

### Journal Articles

**Format**: Author, A. A., Author, B. B. & Author, C. C. Title of article. *J. Name* **volume**, page range (year).

**Example**: Smith, J. D., Johnson, M. L. & Williams, K. R. Machine learning approaches in drug discovery. *Nat. Rev. Drug Discov.* **22**, 301-318 (2023).

### Books

**Format**: Author, A. A. & Author, B. B. *Book Title* (Publisher, Year).

**Example**: Kumar, V., Abbas, A. K. & Aster, J. C. *Robbins and Cotran Pathologic Basis of Disease* 10th edn (Elsevier, 2021).

### Multiple Authors

- 1-2 authors: List all
- 3+ authors: List first author followed by "et al."

**Example**: Zhang, Y. et al. Novel therapeutic targets in Alzheimer's disease. *bioRxiv* https://doi.org/10.1101/2024.01.001 (2024).

---

## Chicago Style (Author-Date)

### Journal Articles

**Format**: Author, First Name Middle Initial. Year. "Article Title." *Journal Title* volume, no. issue (Month): page range. https://doi.org/xxxx.

**Example**: Smith, John D., Mary L. Johnson, and Karen R. Williams. 2023. "Machine Learning Approaches in Drug Discovery." *Nature Reviews Drug Discovery* 22, no. 4 (April): 301-318. https://doi.org/10.1038/nrd.2023.001.

### Books

**Format**: Author, First Name Middle Initial. Year. *Book Title: Subtitle*. Edition. Place: Publisher.

**Example**: Kumar, Vinay, Abul K. Abbas, and Jon C. Aster. 2021. *Robbins and Cotran Pathologic Basis of Disease*. 10th ed. Philadelphia: Elsevier.

---

## Vancouver Style (Numbered)

### Journal Articles

**Format**: Author AA, Author BB, Author CC. Title of article. Abbreviated Journal Name. Year;volume(issue):page range.

**Example**: Smith JD, Johnson ML, Williams KR. Machine learning approaches in drug discovery. Nat Rev Drug Discov. 2023;22(4):301-18.

### Books

**Format**: Author AA, Author BB. Title of book. Edition. Place: Publisher; Year.

**Example**: Kumar V, Abbas AK, Aster JC. Robbins and Cotran pathologic basis of disease. 10th ed. Philadelphia: Elsevier; 2021.

### Citation in Text

Use superscript numbers in order of appearance: "Recent studies^1,2^ have shown..."

---

## IEEE Style

### Journal Articles

**Format**: [#] A. A. Author, B. B. Author, and C. C. Author, "Title of article," *Abbreviated Journal Name*, vol. x, no. x, pp. xxx-xxx, Month Year.

**Example**: [1] J. D. Smith, M. L. Johnson, and K. R. Williams, "Machine learning approaches in drug discovery," *Nat. Rev. Drug Discov.*, vol. 22, no. 4, pp. 301-318, Apr. 2023.

### Books

**Format**: [#] A. A. Author, *Title of Book*, xth ed. City, State: Publisher, Year.

**Example**: [2] V. Kumar, A. K. Abbas, and J. C. Aster, *Robbins and Cotran Pathologic Basis of Disease*, 10th ed. Philadelphia, PA: Elsevier, 2021.

---

## Common Abbreviations for Journal Names

- Nature: Nat.
- Science: Science
- Cell: Cell
- Nature Reviews Drug Discovery: Nat. Rev. Drug Discov.
- Journal of the American Chemical Society: J. Am. Chem. Soc.
- Proceedings of the National Academy of Sciences: Proc. Natl. Acad. Sci. U.S.A.
- PLOS ONE: PLoS ONE
- Bioinformatics: Bioinformatics
- Nucleic Acids Research: Nucleic Acids Res.

---

## DOI Best Practices

1. **Always verify DOIs**: Use the verify_citations.py script to check all DOIs
2. **Format as URLs**: https://doi.org/10.xxxx/yyyy (preferred over doi:10.xxxx/yyyy)
3. **No period after DOI**: DOI should be the last element without trailing punctuation
4. **Resolve redirects**: Check that DOIs resolve to the correct article

---

## In-Text Citation Guidelines

### APA Style
- (Smith et al., 2023)
- Smith et al. (2023) demonstrated...
- Multiple citations: (Brown, 2022; Smith et al., 2023; Zhang, 2024)

### Nature Style
- Superscript numbers: Recent studies^1,2^ have shown...
- Or: Recent studies (refs 1,2) have shown...

### Chicago Style
- (Smith, Johnson, and Williams 2023)
- Smith, Johnson, and Williams (2023) found...

---

## Reference List Organization

### By Citation Style
- **APA, Chicago**: Alphabetical by first author's last name
- **Nature, Vancouver, IEEE**: Numerical order of first appearance in text

### Hanging Indents
Most styles use hanging indents where the first line is flush left and subsequent lines are indented.

### Consistency
Maintain consistent formatting throughout:
- Capitalization (title case vs. sentence case)
- Journal name abbreviations
- DOI presentation
- Author name format

skills/literature-review/references/database_strategies.md (new file, 381 lines)

# Literature Database Search Strategies

This document provides comprehensive guidance for searching multiple literature databases systematically and effectively.

## Available Databases and Skills

### Biomedical & Life Sciences

#### PubMed / PubMed Central
- **Access**: Use `gget` skill or WebFetch tool
- **Coverage**: 35M+ citations in biomedical literature
- **Best for**: Clinical studies, biomedical research, genetics, molecular biology
- **Search tips**: Use MeSH terms, Boolean operators (AND, OR, NOT), field tags [Title], [Author]
- **Example**: `"CRISPR"[Title] AND "gene editing"[Title/Abstract] AND 2020:2024[Publication Date]`

#### bioRxiv / medRxiv
- **Access**: Use `gget` skill or direct API
- **Coverage**: Preprints in biology and medicine
- **Best for**: Latest unpublished research, cutting-edge findings
- **Note**: Not peer-reviewed; verify findings with caution
- **Search tips**: Search by category (bioinformatics, genomics, etc.)

### General Scientific Literature

#### arXiv
- **Access**: Direct API access
- **Coverage**: Preprints in physics, mathematics, computer science, quantitative biology
- **Best for**: Computational methods, bioinformatics algorithms, theoretical work
- **Categories**: q-bio (Quantitative Biology), cs.LG (Machine Learning), stat.ML (Statistics)
- **Search format**: `cat:q-bio.QM AND ti:"single cell"`

#### Semantic Scholar
- **Access**: Direct API (requires API key)
- **Coverage**: 200M+ papers across all fields
- **Best for**: Cross-disciplinary searches, citation graphs, paper recommendations
- **Features**: Influential citations, paper summaries, related papers
- **Rate limits**: 100 requests/5 minutes with API key

#### Google Scholar
- **Access**: Web scraping (use cautiously) or manual search
- **Coverage**: Comprehensive across all fields
- **Best for**: Finding highly cited papers, conference proceedings, theses
- **Limitations**: No official API, rate limiting
- **Export**: Use "Cite" feature for formatted citations

### Specialized Databases

#### ChEMBL / PubChem
- **Access**: Use `gget` skill or `bioservices` skill
- **Coverage**: Chemical compounds, bioactivity data, drug molecules
- **Best for**: Drug discovery, chemical biology, medicinal chemistry
- **ChEMBL**: 2M+ compounds, bioactivity data
- **PubChem**: 110M+ compounds, assay data

#### UniProt
- **Access**: Use `gget` skill or `bioservices` skill
- **Coverage**: Protein sequence and functional information
- **Best for**: Protein research, sequence analysis, functional annotations
- **Search by**: Protein name, gene name, organism, function

#### KEGG (Kyoto Encyclopedia of Genes and Genomes)
- **Access**: Use `bioservices` skill
- **Coverage**: Pathways, diseases, drugs, genes
- **Best for**: Pathway analysis, systems biology, metabolic research

#### COSMIC (Catalogue of Somatic Mutations in Cancer)
- **Access**: Use `gget` skill or direct download
- **Coverage**: Cancer genomics, somatic mutations
- **Best for**: Cancer research, mutation analysis

#### AlphaFold Database
- **Access**: Use `gget` skill with `alphafold` command
- **Coverage**: 200M+ protein structure predictions
- **Best for**: Structural biology, protein modeling

#### PDB (Protein Data Bank)
- **Access**: Use `gget` or direct API
- **Coverage**: Experimental 3D structures of proteins, nucleic acids
- **Best for**: Structural biology, drug design, molecular modeling

### Citation & Reference Management

#### OpenAlex
- **Access**: Direct API (free, no key required)
- **Coverage**: 250M+ works, comprehensive metadata
- **Best for**: Citation analysis, author disambiguation, institutional research
- **Features**: Open access, excellent for bibliometrics
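
A minimal sketch of a direct OpenAlex query (public endpoint and parameters; the search string and filter are illustrative):

```python
# Minimal sketch: search works via the OpenAlex API (no key required).
import requests

resp = requests.get(
    "https://api.openalex.org/works",
    params={
        "search": "CRISPR sickle cell disease",
        "filter": "from_publication_date:2015-01-01",
        "per-page": 50,
    },
    timeout=30,
)
resp.raise_for_status()
for work in resp.json()["results"]:
    print(work.get("publication_year"), work.get("doi"), work.get("title"))
```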

#### Dimensions
- **Access**: Free tier available
- **Coverage**: Publications, grants, patents, clinical trials
- **Best for**: Research impact, funding analysis, translational research

---

## Search Strategy Framework

### 1. Define Research Question (PICO Framework)

For clinical/biomedical reviews:
- **P**opulation: Who is the study about?
- **I**ntervention: What is being tested?
- **C**omparison: What is it compared to?
- **O**utcome: What are the results?

**Example**: "What is the efficacy of CRISPR-Cas9 gene therapy (I) for treating sickle cell disease (P) compared to standard care (C) in improving patient outcomes (O)?"

### 2. Develop Search Terms

#### Primary Concepts
Identify 2-4 main concepts from your research question.

**Example**:
- Concept 1: CRISPR, Cas9, gene editing
- Concept 2: sickle cell disease, SCD, hemoglobin disorders
- Concept 3: gene therapy, therapeutic editing

#### Synonyms & Related Terms
List alternative terms, abbreviations, and related concepts.

**Tool**: Use MeSH (Medical Subject Headings) browser for standardized terms

#### Boolean Operators
- **AND**: Narrows search (must include both terms)
- **OR**: Broadens search (includes either term)
- **NOT**: Excludes terms

**Example**: `(CRISPR OR Cas9 OR "gene editing") AND ("sickle cell" OR SCD) AND therapy`

#### Wildcards & Truncation
- `*` or `%`: Matches any characters
- `?`: Matches single character

**Example**: `genom*` matches genomic, genomics, genome

### 3. Set Inclusion/Exclusion Criteria

#### Inclusion Criteria
- **Date range**: e.g., 2015-2024 (last 10 years)
- **Language**: English (or specify multilingual)
- **Publication type**: Peer-reviewed articles, reviews, preprints
- **Study design**: RCTs, cohort studies, meta-analyses
- **Population**: Human, animal models, in vitro

#### Exclusion Criteria
- Case reports (n<5)
- Conference abstracts without full text
- Non-original research (editorials, commentaries)
- Duplicate publications
- Retracted articles

### 4. Database Selection Strategy

#### Multi-Database Approach
Search at least 3 complementary databases:

1. **Primary database**: PubMed (biomedical) or arXiv (computational)
2. **Preprint server**: bioRxiv/medRxiv or arXiv
3. **Comprehensive database**: Semantic Scholar or Google Scholar
4. **Specialized database**: ChEMBL, UniProt, or field-specific

#### Database-Specific Syntax

| Database | Field Tags | Example |
|----------|-----------|---------|
| PubMed | [Title], [Author], [MeSH] | "CRISPR"[Title] AND 2020:2024[DP] |
| arXiv | ti:, au:, cat: | ti:"machine learning" AND cat:q-bio.QM |
| Semantic Scholar | title:, author:, year: | title:"deep learning" year:2020-2024 |

---

## Search Execution Workflow

### Phase 1: Pilot Search
1. Run initial search with broad terms
2. Review first 50 results for relevance
3. Note common keywords and MeSH terms
4. Refine search strategy

### Phase 2: Comprehensive Search
1. Execute refined searches across all selected databases
2. Export results in standard format (RIS, BibTeX, JSON)
3. Document search strings and date for each database
4. Record number of results per database

### Phase 3: Deduplication
1. Import all results into a single file
2. Use `search_databases.py --deduplicate` to remove duplicates
3. Identify duplicates by DOI (primary) or title (fallback)
4. Keep the version with most complete metadata
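
The core DOI-first, title-fallback logic can be sketched as follows (a simplified illustration; `scripts/search_databases.py` implements the full version):

```python
# Simplified sketch of DOI-first, title-fallback deduplication
# (illustrative only; not the bundled script).
def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        doi = (rec.get("doi") or "").lower().strip()
        title = " ".join((rec.get("title") or "").lower().split())
        key = ("doi", doi) if doi else ("title", title)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```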
|
||||
|
||||
### Phase 4: Screening
|
||||
1. **Title screening**: Review titles, exclude obviously irrelevant
|
||||
2. **Abstract screening**: Read abstracts, apply inclusion/exclusion criteria
|
||||
3. **Full-text screening**: Obtain and review full texts
|
||||
4. Document reasons for exclusion at each stage
|
||||
|
||||
### Phase 5: Quality Assessment
|
||||
1. Assess study quality using appropriate tools:
|
||||
- **RCTs**: Cochrane Risk of Bias tool
|
||||
- **Observational**: Newcastle-Ottawa Scale
|
||||
- **Systematic reviews**: AMSTAR 2
|
||||
2. Grade quality of evidence (high, moderate, low, very low)
|
||||
3. Consider excluding very low-quality studies
|
||||
|
||||
---
|
||||
|
||||
## Search Documentation Template
|
||||
|
||||
### Required Documentation
|
||||
All searches must be documented for reproducibility:
|
||||
|
||||
```markdown
|
||||
## Search Strategy
|
||||
|
||||
### Database: PubMed
|
||||
- **Date searched**: 2024-10-25
|
||||
- **Date range**: 2015-01-01 to 2024-10-25
|
||||
- **Search string**:
|
||||
```
|
||||
("CRISPR"[Title] OR "Cas9"[Title] OR "gene editing"[Title/Abstract])
|
||||
AND ("sickle cell disease"[MeSH] OR "SCD"[Title/Abstract])
|
||||
AND ("gene therapy"[MeSH] OR "therapeutic editing"[Title/Abstract])
|
||||
AND 2015:2024[Publication Date]
|
||||
AND English[Language]
|
||||
```
|
||||
- **Results**: 247 articles
|
||||
- **After deduplication**: 189 articles
|
||||
|
||||
### Database: bioRxiv
|
||||
- **Date searched**: 2024-10-25
|
||||
- **Date range**: 2015-01-01 to 2024-10-25
|
||||
- **Search string**: "CRISPR" AND "sickle cell" (in title/abstract)
|
||||
- **Results**: 34 preprints
|
||||
- **After deduplication**: 28 preprints
|
||||
|
||||
### Total Unique Articles
|
||||
- **Combined results**: 217 unique articles
|
||||
- **After title screening**: 156 articles
|
||||
- **After abstract screening**: 89 articles
|
||||
- **After full-text screening**: 52 articles included in review
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Advanced Search Techniques
|
||||
|
||||
### Citation Chaining
|
||||
|
||||
#### Forward Citation Search
Find papers that cite a key paper:
- Use the Google Scholar "Cited by" feature
- Use the OpenAlex or Semantic Scholar APIs (a sketch follows this list)
- Identifies newer research building on seminal work
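
A minimal sketch of forward chaining through the Semantic Scholar Graph API is shown below; the DOI is a placeholder, the field list can be extended to whatever the screening step needs, and unauthenticated requests are rate-limited.

```python
import requests

doi = "10.1000/example-doi"  # placeholder: substitute a key paper from the review
url = f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}/citations"

response = requests.get(
    url,
    params={"fields": "title,year,externalIds", "limit": 100},
    timeout=10,
)
response.raise_for_status()

# Each entry wraps the citing paper's metadata
for entry in response.json().get("data", []):
    citing = entry.get("citingPaper", {})
    print(citing.get("year"), "-", citing.get("title"))
```
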
#### Backward Citation Search
Review references in key papers:
- Extract references from included papers
- Search for highly cited references
- Identifies foundational research

### Snowball Sampling
1. Start with 3-5 highly relevant papers
2. Extract all their references
3. Check which references are cited by multiple papers (see the sketch after this list)
4. Review those high-overlap references
5. Repeat for newly identified key papers
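
Step 3 is easy to automate once reference lists have been extracted (for example from CrossRef metadata). The sketch below uses placeholder DOIs and simply counts how many seed papers cite each reference.

```python
from collections import Counter

# Placeholder reference lists keyed by seed paper
seed_references = {
    "seed-paper-1": ["10.1000/ref-a", "10.1000/ref-b", "10.1000/ref-c"],
    "seed-paper-2": ["10.1000/ref-b", "10.1000/ref-d"],
    "seed-paper-3": ["10.1000/ref-b", "10.1000/ref-c"],
}

# Count each reference at most once per seed paper
overlap = Counter(doi for refs in seed_references.values() for doi in set(refs))
shared = [doi for doi, n in overlap.most_common() if n >= 2]
print("References cited by 2+ seed papers:", shared)
```
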
### Author Search
Follow prolific authors in the field:
- Search by author name across databases
- Check author profiles (ORCID, Google Scholar)
- Review recent publications and preprints

### Related Article Features
Many databases suggest related articles:
- PubMed "Similar articles" (a sketch of retrieving these programmatically follows)
- Semantic Scholar "Recommended papers"
- Use these to discover papers missed by keyword search
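
For PubMed, the "Similar articles" list can also be pulled through the E-utilities `elink` endpoint, as in the sketch below. The PMID is a placeholder, and the exact JSON layout should be checked against the current E-utilities documentation.

```python
import requests

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

params = {
    "dbfrom": "pubmed",
    "db": "pubmed",
    "id": "12345678",      # placeholder PMID of an included paper
    "cmd": "neighbor",
    "retmode": "json",
}

response = requests.get(ELINK_URL, params=params, timeout=10)
response.raise_for_status()

# Walk the linksets and pick out the "pubmed_pubmed" (similar articles) links
for linkset in response.json().get("linksets", []):
    for linksetdb in linkset.get("linksetdbs", []):
        if linksetdb.get("linkname") == "pubmed_pubmed":
            print("Top similar PMIDs:", linksetdb.get("links", [])[:10])
```
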
---

## Quality Control Checklist

### Before Searching
- [ ] Research question clearly defined
- [ ] PICO criteria established (if applicable)
- [ ] Search terms and synonyms listed
- [ ] Inclusion/exclusion criteria documented
- [ ] Target databases selected (minimum 3)
- [ ] Date range determined

### During Searching
- [ ] Search string tested and refined
- [ ] Results exported with complete metadata
- [ ] Search parameters documented
- [ ] Number of results recorded per database
- [ ] Search date recorded

### After Searching
- [ ] Duplicates removed
- [ ] Screening protocol followed
- [ ] Reasons for exclusion documented
- [ ] Quality assessment completed
- [ ] All citations verified with verify_citations.py
- [ ] Search methodology documented in review

---

## Common Pitfalls to Avoid

1. **Too narrow search**: Missing relevant papers
   - Solution: Include synonyms, related terms, broader concepts

2. **Too broad search**: Thousands of irrelevant results
   - Solution: Add specific concepts with AND, use field tags

3. **Single database**: Incomplete coverage
   - Solution: Search a minimum of 3 complementary databases

4. **Ignoring preprints**: Missing latest findings
   - Solution: Include bioRxiv, medRxiv, or arXiv

5. **No documentation**: Irreproducible search
   - Solution: Document every search string, date, and result count

6. **Manual deduplication**: Time-consuming and error-prone
   - Solution: Use the search_databases.py script

7. **Unverified citations**: Broken DOIs, incorrect metadata
   - Solution: Run verify_citations.py on the final reference list

8. **Publication bias**: Only including published positive results
   - Solution: Search trial registries, contact authors for unpublished data

---

## Example Multi-Database Search Workflow

```python
# Example workflow using available skills

# 1. Search PubMed via gget
search_term = "CRISPR AND sickle cell disease"
# Use gget search pubmed search_term

# 2. Search bioRxiv
# Use gget search biorxiv search_term

# 3. Search arXiv for computational papers
# Search arXiv with: cat:q-bio AND "CRISPR" AND "sickle cell"

# 4. Search Semantic Scholar via API
# Use the Semantic Scholar API with the search query

# 5. Aggregate and deduplicate results
# python search_databases.py combined_results.json --deduplicate --format markdown --output review_papers.md

# 6. Verify all citations
# python verify_citations.py review_papers.md

# 7. Generate final PDF
# python generate_pdf.py review_papers.md --citation-style nature
```

---

## Resources

### MeSH Browser
https://meshb.nlm.nih.gov/search

### Boolean Search Tutorial
https://www.ncbi.nlm.nih.gov/books/NBK3827/

### Citation Style Guides
See references/citation_styles.md in this skill

### PRISMA Guidelines
Preferred Reporting Items for Systematic Reviews and Meta-Analyses:
http://www.prisma-statement.org/
176
skills/literature-review/scripts/generate_pdf.py
Normal file
@@ -0,0 +1,176 @@
#!/usr/bin/env python3
"""
PDF Generation Script for Literature Reviews
Converts markdown files to professionally formatted PDFs with proper styling.
"""

import subprocess
import sys
import os
from pathlib import Path


def generate_pdf(
    markdown_file: str,
    output_pdf: str = None,
    citation_style: str = "apa",
    template: str = None,
    toc: bool = True,
    number_sections: bool = True
) -> bool:
    """
    Generate a PDF from a markdown file using pandoc.

    Args:
        markdown_file: Path to the markdown file
        output_pdf: Path for output PDF (defaults to same name as markdown)
        citation_style: Citation style (apa, nature, chicago, etc.)
        template: Path to custom LaTeX template
        toc: Include table of contents
        number_sections: Number the sections

    Returns:
        True if successful, False otherwise
    """

    # Verify markdown file exists
    if not os.path.exists(markdown_file):
        print(f"Error: Markdown file not found: {markdown_file}")
        return False

    # Set default output path
    if output_pdf is None:
        output_pdf = Path(markdown_file).with_suffix('.pdf')

    # Check if pandoc is installed
    try:
        subprocess.run(['pandoc', '--version'], capture_output=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("Error: pandoc is not installed.")
        print("Install with: brew install pandoc (macOS) or apt-get install pandoc (Linux)")
        return False

    # Build pandoc command
    cmd = [
        'pandoc',
        markdown_file,
        '-o', str(output_pdf),
        '--pdf-engine=xelatex',  # Better Unicode support
        '-V', 'geometry:margin=1in',
        '-V', 'fontsize=11pt',
        '-V', 'colorlinks=true',
        '-V', 'linkcolor=blue',
        '-V', 'urlcolor=blue',
        '-V', 'citecolor=blue',
    ]

    # Add table of contents
    if toc:
        cmd.extend(['--toc', '--toc-depth=3'])

    # Add section numbering
    if number_sections:
        cmd.append('--number-sections')

    # Add citation processing if bibliography exists
    bib_file = Path(markdown_file).with_suffix('.bib')
    if bib_file.exists():
        cmd.extend([
            '--citeproc',
            '--bibliography', str(bib_file),
            '--csl', f'{citation_style}.csl' if not citation_style.endswith('.csl') else citation_style
        ])

    # Add custom template if provided
    if template and os.path.exists(template):
        cmd.extend(['--template', template])

    # Execute pandoc
    try:
        print(f"Generating PDF: {output_pdf}")
        print(f"Command: {' '.join(cmd)}")
        subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"✓ PDF generated successfully: {output_pdf}")
        return True
    except subprocess.CalledProcessError as e:
        print("Error generating PDF:")
        print(f"STDOUT: {e.stdout}")
        print(f"STDERR: {e.stderr}")
        return False


def check_dependencies():
    """Check if required dependencies are installed."""
    dependencies = {
        'pandoc': 'pandoc --version',
        'xelatex': 'xelatex --version'
    }

    missing = []
    for name, cmd in dependencies.items():
        try:
            subprocess.run(cmd.split(), capture_output=True, check=True)
            print(f"✓ {name} is installed")
        except (subprocess.CalledProcessError, FileNotFoundError):
            print(f"✗ {name} is NOT installed")
            missing.append(name)

    if missing:
        print("\n" + "="*60)
        print("Missing dependencies:")
        for dep in missing:
            if dep == 'pandoc':
                print("  - pandoc: brew install pandoc (macOS) or apt-get install pandoc (Linux)")
            elif dep == 'xelatex':
                print("  - xelatex: brew install --cask mactex (macOS) or apt-get install texlive-xetex (Linux)")
        return False

    return True


def main():
    """Command-line interface."""
    if len(sys.argv) < 2:
        print("Usage: python generate_pdf.py <markdown_file> [output_pdf] [--citation-style STYLE]")
        print("\nOptions:")
        print("  --citation-style STYLE  Citation style (default: apa)")
        print("  --no-toc                Disable table of contents")
        print("  --no-numbers            Disable section numbering")
        print("  --check-deps            Check if dependencies are installed")
        sys.exit(1)

    # Check dependencies mode
    if '--check-deps' in sys.argv:
        check_dependencies()
        sys.exit(0)

    # Parse arguments
    markdown_file = sys.argv[1]
    output_pdf = sys.argv[2] if len(sys.argv) > 2 and not sys.argv[2].startswith('--') else None

    citation_style = 'apa'
    toc = True
    number_sections = True

    # Parse optional flags
    if '--citation-style' in sys.argv:
        idx = sys.argv.index('--citation-style')
        if idx + 1 < len(sys.argv):
            citation_style = sys.argv[idx + 1]

    if '--no-toc' in sys.argv:
        toc = False

    if '--no-numbers' in sys.argv:
        number_sections = False

    # Generate PDF
    success = generate_pdf(
        markdown_file,
        output_pdf,
        citation_style=citation_style,
        toc=toc,
        number_sections=number_sections
    )

    sys.exit(0 if success else 1)


if __name__ == "__main__":
    main()
303
skills/literature-review/scripts/search_databases.py
Normal file
@@ -0,0 +1,303 @@
#!/usr/bin/env python3
"""
Literature Database Search Script
Searches multiple literature databases and aggregates results.
"""

import json
import sys
from typing import Dict, List
from datetime import datetime


def format_search_results(results: List[Dict], output_format: str = 'json') -> str:
    """
    Format search results for output.

    Args:
        results: List of search results
        output_format: Format (json, markdown, or bibtex)

    Returns:
        Formatted string
    """
    if output_format == 'json':
        return json.dumps(results, indent=2)

    elif output_format == 'markdown':
        md = "# Literature Search Results\n\n"
        md += f"**Search Date**: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n"
        md += f"**Total Results**: {len(results)}\n\n"

        for i, result in enumerate(results, 1):
            md += f"## {i}. {result.get('title', 'Untitled')}\n\n"
            md += f"**Authors**: {result.get('authors', 'Unknown')}\n\n"
            md += f"**Year**: {result.get('year', 'N/A')}\n\n"
            md += f"**Source**: {result.get('source', 'Unknown')}\n\n"

            if result.get('abstract'):
                md += f"**Abstract**: {result['abstract']}\n\n"

            if result.get('doi'):
                md += f"**DOI**: [{result['doi']}](https://doi.org/{result['doi']})\n\n"

            if result.get('url'):
                md += f"**URL**: {result['url']}\n\n"

            if result.get('citations'):
                md += f"**Citations**: {result['citations']}\n\n"

            md += "---\n\n"

        return md

    elif output_format == 'bibtex':
        bibtex = ""
        for result in results:
            entry_type = result.get('type', 'article')
            cite_key = f"{result.get('first_author', 'unknown')}{result.get('year', '0000')}"

            bibtex += f"@{entry_type}{{{cite_key},\n"
            bibtex += f"  title = {{{result.get('title', '')}}},\n"
            bibtex += f"  author = {{{result.get('authors', '')}}},\n"
            bibtex += f"  year = {{{result.get('year', '')}}},\n"

            if result.get('journal'):
                bibtex += f"  journal = {{{result['journal']}}},\n"

            if result.get('volume'):
                bibtex += f"  volume = {{{result['volume']}}},\n"

            if result.get('pages'):
                bibtex += f"  pages = {{{result['pages']}}},\n"

            if result.get('doi'):
                bibtex += f"  doi = {{{result['doi']}}},\n"

            bibtex += "}\n\n"

        return bibtex

    else:
        raise ValueError(f"Unknown format: {output_format}")


def deduplicate_results(results: List[Dict]) -> List[Dict]:
    """
    Remove duplicate results based on DOI or title.

    Args:
        results: List of search results

    Returns:
        Deduplicated list
    """
    seen_dois = set()
    seen_titles = set()
    unique_results = []

    for result in results:
        doi = result.get('doi', '').lower().strip()
        title = result.get('title', '').lower().strip()

        # Check DOI first (more reliable)
        if doi and doi in seen_dois:
            continue

        # Check title as fallback
        if not doi and title in seen_titles:
            continue

        # Add to results
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)

        unique_results.append(result)

    return unique_results
def rank_results(results: List[Dict], criteria: str = 'citations') -> List[Dict]:
    """
    Rank results by specified criteria.

    Args:
        results: List of search results
        criteria: Ranking criteria (citations, year, relevance)

    Returns:
        Ranked list
    """
    if criteria == 'citations':
        return sorted(results, key=lambda x: x.get('citations', 0), reverse=True)
    elif criteria == 'year':
        # Cast to str so records with int and str years sort together without a TypeError
        return sorted(results, key=lambda x: str(x.get('year', '0')), reverse=True)
    elif criteria == 'relevance':
        return sorted(results, key=lambda x: x.get('relevance_score', 0), reverse=True)
    else:
        return results


def filter_by_year(results: List[Dict], start_year: int = None, end_year: int = None) -> List[Dict]:
    """
    Filter results by publication year range.

    Args:
        results: List of search results
        start_year: Minimum year (inclusive)
        end_year: Maximum year (inclusive)

    Returns:
        Filtered list
    """
    filtered = []

    for result in results:
        year_value = result.get('year')
        if year_value in (None, ''):
            # Keep records with no year rather than silently dropping them
            filtered.append(result)
            continue
        try:
            year = int(year_value)
            if start_year and year < start_year:
                continue
            if end_year and year > end_year:
                continue
            filtered.append(result)
        except (ValueError, TypeError):
            # Include if year parsing fails
            filtered.append(result)

    return filtered
def generate_search_summary(results: List[Dict]) -> Dict:
    """
    Generate summary statistics for search results.

    Args:
        results: List of search results

    Returns:
        Summary dictionary
    """
    summary = {
        'total_results': len(results),
        'sources': {},
        'year_distribution': {},
        'avg_citations': 0,
        'total_citations': 0
    }

    citations = []

    for result in results:
        # Count by source
        source = result.get('source', 'Unknown')
        summary['sources'][source] = summary['sources'].get(source, 0) + 1

        # Count by year
        year = result.get('year', 'Unknown')
        summary['year_distribution'][year] = summary['year_distribution'].get(year, 0) + 1

        # Collect citations
        if result.get('citations'):
            try:
                citations.append(int(result['citations']))
            except (ValueError, TypeError):
                pass

    if citations:
        summary['avg_citations'] = sum(citations) / len(citations)
        summary['total_citations'] = sum(citations)

    return summary


def main():
    """Command-line interface for search result processing."""
    if len(sys.argv) < 2:
        print("Usage: python search_databases.py <results.json> [options]")
        print("\nOptions:")
        print("  --format FORMAT    Output format (json, markdown, bibtex)")
        print("  --output FILE      Output file (default: stdout)")
        print("  --rank CRITERIA    Rank by (citations, year, relevance)")
        print("  --year-start YEAR  Filter by start year")
        print("  --year-end YEAR    Filter by end year")
        print("  --deduplicate      Remove duplicates")
        print("  --summary          Show summary statistics")
        sys.exit(1)

    # Load results
    results_file = sys.argv[1]
    try:
        with open(results_file, 'r', encoding='utf-8') as f:
            results = json.load(f)
    except Exception as e:
        print(f"Error loading results: {e}")
        sys.exit(1)

    # Parse options
    output_format = 'markdown'
    output_file = None
    rank_criteria = None
    year_start = None
    year_end = None
    do_dedup = False
    show_summary = False

    i = 2
    while i < len(sys.argv):
        arg = sys.argv[i]

        if arg == '--format' and i + 1 < len(sys.argv):
            output_format = sys.argv[i + 1]
            i += 2
        elif arg == '--output' and i + 1 < len(sys.argv):
            output_file = sys.argv[i + 1]
            i += 2
        elif arg == '--rank' and i + 1 < len(sys.argv):
            rank_criteria = sys.argv[i + 1]
            i += 2
        elif arg == '--year-start' and i + 1 < len(sys.argv):
            year_start = int(sys.argv[i + 1])
            i += 2
        elif arg == '--year-end' and i + 1 < len(sys.argv):
            year_end = int(sys.argv[i + 1])
            i += 2
        elif arg == '--deduplicate':
            do_dedup = True
            i += 1
        elif arg == '--summary':
            show_summary = True
            i += 1
        else:
            i += 1

    # Process results
    if do_dedup:
        results = deduplicate_results(results)
        print(f"After deduplication: {len(results)} results")

    if year_start or year_end:
        results = filter_by_year(results, year_start, year_end)
        print(f"After year filter: {len(results)} results")

    if rank_criteria:
        results = rank_results(results, rank_criteria)
        print(f"Ranked by: {rank_criteria}")

    # Show summary
    if show_summary:
        summary = generate_search_summary(results)
        print("\n" + "="*60)
        print("SEARCH SUMMARY")
        print("="*60)
        print(json.dumps(summary, indent=2))
        print()

    # Format output
    output = format_search_results(results, output_format)

    # Write output
    if output_file:
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write(output)
        print(f"✓ Results saved to: {output_file}")
    else:
        print(output)


if __name__ == "__main__":
    main()
222
skills/literature-review/scripts/verify_citations.py
Normal file
@@ -0,0 +1,222 @@
#!/usr/bin/env python3
"""
Citation Verification Script
Verifies DOIs, URLs, and citation metadata for accuracy.
"""

import re
import requests
import json
from typing import Dict, List, Tuple
from urllib.parse import urlparse
import time


class CitationVerifier:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'CitationVerifier/1.0 (Literature Review Tool)'
        })

    def extract_dois(self, text: str) -> List[str]:
        """Extract all DOIs from text."""
        doi_pattern = r'10\.\d{4,}/[^\s\]\)"]+'
        dois = re.findall(doi_pattern, text)
        # Trailing sentence punctuation is not part of the DOI; strip it, then
        # keep only the first occurrence of each DOI so it is verified once.
        cleaned = [doi.rstrip('.,;') for doi in dois]
        return list(dict.fromkeys(cleaned))
    def verify_doi(self, doi: str) -> Tuple[bool, Dict]:
        """
        Verify a DOI and retrieve metadata.
        Returns (is_valid, metadata)
        """
        try:
            url = f"https://doi.org/api/handles/{doi}"
            response = self.session.get(url, timeout=10)

            if response.status_code == 200:
                # DOI exists, now get metadata from CrossRef
                metadata = self._get_crossref_metadata(doi)
                return True, metadata
            else:
                return False, {}
        except Exception as e:
            return False, {"error": str(e)}

    def _get_crossref_metadata(self, doi: str) -> Dict:
        """Get metadata from CrossRef API."""
        try:
            url = f"https://api.crossref.org/works/{doi}"
            response = self.session.get(url, timeout=10)

            if response.status_code == 200:
                data = response.json()
                message = data.get('message', {})

                # Extract key metadata
                metadata = {
                    'title': message.get('title', [''])[0],
                    'authors': self._format_authors(message.get('author', [])),
                    'year': self._extract_year(message),
                    'journal': message.get('container-title', [''])[0],
                    'volume': message.get('volume', ''),
                    'pages': message.get('page', ''),
                    'doi': doi
                }
                return metadata
            return {}
        except Exception as e:
            return {"error": str(e)}

    def _format_authors(self, authors: List[Dict]) -> str:
        """Format author list."""
        if not authors:
            return ""

        formatted = []
        for author in authors[:3]:  # First 3 authors
            given = author.get('given', '')
            family = author.get('family', '')
            if family:
                formatted.append(f"{family}, {given[0]}." if given else family)

        if len(authors) > 3:
            formatted.append("et al.")

        return ", ".join(formatted)

    def _extract_year(self, message: Dict) -> str:
        """Extract publication year."""
        date_parts = message.get('published-print', {}).get('date-parts', [[]])
        if not date_parts or not date_parts[0]:
            date_parts = message.get('published-online', {}).get('date-parts', [[]])

        if date_parts and date_parts[0]:
            return str(date_parts[0][0])
        return ""

    def verify_url(self, url: str) -> Tuple[bool, int]:
        """
        Verify a URL is accessible.
        Returns (is_accessible, status_code)
        """
        try:
            response = self.session.head(url, timeout=10, allow_redirects=True)
            is_accessible = response.status_code < 400
            return is_accessible, response.status_code
        except Exception:
            return False, 0

    def verify_citations_in_file(self, filepath: str) -> Dict:
        """
        Verify all citations in a markdown file.
        Returns a report of verification results.
        """
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()

        dois = self.extract_dois(content)

        report = {
            'total_dois': len(dois),
            'verified': [],
            'failed': [],
            'metadata': {}
        }

        for doi in dois:
            print(f"Verifying DOI: {doi}")
            is_valid, metadata = self.verify_doi(doi)

            if is_valid:
                report['verified'].append(doi)
                report['metadata'][doi] = metadata
            else:
                report['failed'].append(doi)

            time.sleep(0.5)  # Rate limiting

        return report

    def format_citation_apa(self, metadata: Dict) -> str:
        """Format citation in APA style."""
        authors = metadata.get('authors', '')
        year = metadata.get('year', 'n.d.')
        title = metadata.get('title', '')
        journal = metadata.get('journal', '')
        volume = metadata.get('volume', '')
        pages = metadata.get('pages', '')
        doi = metadata.get('doi', '')

        citation = f"{authors} ({year}). {title}. "
        if journal:
            citation += f"*{journal}*"
        if volume:
            citation += f", *{volume}*"
        if pages:
            citation += f", {pages}"
        if doi:
            citation += f". https://doi.org/{doi}"

        return citation

    def format_citation_nature(self, metadata: Dict) -> str:
        """Format citation in Nature style."""
        authors = metadata.get('authors', '')
        title = metadata.get('title', '')
        journal = metadata.get('journal', '')
        volume = metadata.get('volume', '')
        pages = metadata.get('pages', '')
        year = metadata.get('year', '')

        citation = f"{authors} {title}. "
        if journal:
            citation += f"*{journal}* "
        if volume:
            citation += f"**{volume}**, "
        if pages:
            citation += f"{pages} "
        if year:
            citation += f"({year})"

        return citation


def main():
    """Example usage."""
    import sys

    if len(sys.argv) < 2:
        print("Usage: python verify_citations.py <markdown_file>")
        sys.exit(1)

    filepath = sys.argv[1]
    verifier = CitationVerifier()

    print(f"Verifying citations in: {filepath}")
    report = verifier.verify_citations_in_file(filepath)

    print("\n" + "="*60)
    print("CITATION VERIFICATION REPORT")
    print("="*60)
    print(f"\nTotal DOIs found: {report['total_dois']}")
    print(f"Verified: {len(report['verified'])}")
    print(f"Failed: {len(report['failed'])}")

    if report['failed']:
        print("\nFailed DOIs:")
        for doi in report['failed']:
            print(f"  - {doi}")

    if report['metadata']:
        print("\n\nVerified Citations (APA format):")
        for doi, metadata in report['metadata'].items():
            citation = verifier.format_citation_apa(metadata)
            print(f"\n{citation}")

    # Save detailed report
    output_file = filepath.replace('.md', '_citation_report.json')
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(report, f, indent=2)

    print(f"\n\nDetailed report saved to: {output_file}")


if __name__ == "__main__":
    main()