
Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00


bioRxiv API Reference

Overview

The bioRxiv API provides programmatic access to preprint metadata from the bioRxiv server. The API returns JSON-formatted data with comprehensive metadata about life sciences preprints.

Base URL

https://api.biorxiv.org

Rate Limiting

Be respectful of the API:

Add delays between requests (minimum 0.5 seconds recommended)
Use appropriate User-Agent headers
Cache results when possible
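The first two recommendations can be combined in a small client. This is a minimal sketch that uses only the standard-library `urllib`. The class name `PoliteClient`, the placeholder User-Agent string and the injectable `clock`/`sleep` hooks are illustrative assumptions, not part of the bioRxiv API. The 0.5-second default mirrors the recommendation above.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional


class PoliteClient:
    """Hypothetical helper that spaces out requests and sends a User-Agent."""

    def __init__(self,
                 user_agent: str = "example-biorxiv-script/0.1 (mailto:you@example.org)",  # placeholder
                 min_interval: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait_turn(self) -> None:
        # Sleep only as long as needed so that consecutive requests
        # start at least min_interval seconds apart.
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

    def build_request(self, url: str) -> urllib.request.Request:
        # Identify the client explicitly instead of using urllib's default agent.
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})

    def get_json(self, url: str, timeout: float = 30.0) -> Any:
        # Throttle, then fetch and decode a JSON response body.
        self.wait_turn()
        with urllib.request.urlopen(self.build_request(url), timeout=timeout) as resp:
            return json.load(resp)
```

Usage: `PoliteClient().get_json("https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31")`. The clock and sleep hooks exist only so that the throttling can be tested without real delays.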

API Endpoints

1. Details by Date Range

Retrieve preprints posted within a specific date range.

Endpoint:

GET /details/biorxiv/{start_date}/{end_date}
GET /details/biorxiv/{start_date}/{end_date}/{category}

Parameters:

start_date: Start date in YYYY-MM-DD format
end_date: End date in YYYY-MM-DD format
category (optional): Filter by subject category

Example:

GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/neuroscience
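A small URL builder helps catch malformed dates before any request is sent. This sketch follows the path layout documented in this section, including the optional trailing category segment. The function name `details_url` is hypothetical. Check the category-filter form against the official API documentation before relying on it.

```python
import datetime
from typing import Optional

BASE_URL = "https://api.biorxiv.org"


def details_url(start_date: str, end_date: str, category: Optional[str] = None) -> str:
    """Hypothetical builder for the date-range details endpoint described above."""
    # fromisoformat raises ValueError for strings that are not valid dates.
    start = datetime.date.fromisoformat(start_date)
    end = datetime.date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    url = f"{BASE_URL}/details/biorxiv/{start.isoformat()}/{end.isoformat()}"
    if category:
        # Optional category segment, as in the second example above.
        url += f"/{category}"
    return url
```

For example, `details_url("2024-01-01", "2024-01-31", "neuroscience")` reproduces the second example URL above.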

Response:

```json
{
  "messages": [
    {
      "status": "ok",
      "count": 150,
      "total": 150
    }
  ],
  "collection": [
    {
      "doi": "10.1101/2024.01.15.123456",
      "title": "Example Paper Title",
      "authors": "Smith J, Doe J, Johnson A",
      "author_corresponding": "Smith J",
      "author_corresponding_institution": "University Example",
      "date": "2024-01-15",
      "version": "1",
      "type": "new results",
      "license": "cc_by",
      "category": "neuroscience",
      "jatsxml": "https://www.biorxiv.org/content/...",
      "abstract": "This is the abstract...",
      "published": ""
    }
  ]
}
```

2. Details by DOI

Retrieve details for a specific preprint by DOI.

Endpoint:

GET /details/biorxiv/{doi}

Parameters:

doi: The DOI of the preprint (e.g., 10.1101/2024.01.15.123456)

Example:

GET https://api.biorxiv.org/details/biorxiv/10.1101/2024.01.15.123456
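Because a preprint can have several versions, a DOI lookup may return more than one record in `collection`. Assuming that is the case, this hypothetical helper picks the newest record. It compares versions numerically because the `version` field is a string.

```python
from typing import Any, Dict, List


def latest_version(collection: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return the record with the highest version number."""
    if not collection:
        raise LookupError("no records returned for this DOI")
    # Compare as integers so that "10" sorts after "2".
    return max(collection, key=lambda rec: int(rec["version"]))
```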

3. Publications by Interval

Retrieve recent publications from a time interval.

Endpoint:

GET /pubs/biorxiv/{interval}/{cursor}/{format}

Parameters:

interval: Number of days back to search (e.g., 1 for last 24 hours)
cursor: Pagination cursor (0 for first page, increment by 100 for subsequent pages)
format: Response format (json or xml)

Example:

GET https://api.biorxiv.org/pubs/biorxiv/1/0/json

Response includes pagination:

```json
{
  "messages": [
    {
      "status": "ok",
      "count": 100,
      "total": 250,
      "cursor": 100
    }
  ],
  "collection": [...]
}
```
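The cursor scheme above, which starts at 0 and advances by 100, can be wrapped in a generator that keeps requesting pages until `total` is reached. This is a sketch under two assumptions. The first is that each page holds at most 100 records. The second is that the `total` field in `messages` counts all matching records. The `fetch` callable is injected, so any HTTP client can supply it. `iter_pages` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator


def iter_pages(fetch: Callable[[int], Dict[str, Any]],
               page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Hypothetical paginator: yield every record across all cursor pages."""
    cursor = 0
    while True:
        payload = fetch(cursor)
        message = payload["messages"][0]
        if message.get("status") != "ok":
            break  # Stop on any non-ok status message.
        records = payload.get("collection", [])
        yield from records
        # "total" may arrive as a string, so convert it before comparing.
        total = int(message.get("total", 0))
        cursor += page_size
        if not records or cursor >= total:
            break
```

Usage: pass a function that requests `https://api.biorxiv.org/pubs/biorxiv/1/{cursor}/json` and returns the decoded JSON. The empty-page check guards against looping forever if `total` is missing or wrong.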

Valid Categories

bioRxiv organizes preprints into the following categories:

animal-behavior-and-cognition
biochemistry
bioengineering
bioinformatics
biophysics
cancer-biology
cell-biology
clinical-trials
developmental-biology
ecology
epidemiology
evolutionary-biology
genetics
genomics
immunology
microbiology
molecular-biology
neuroscience
paleontology
pathology
pharmacology-and-toxicology
physiology
plant-biology
scientific-communication-and-education
synthetic-biology
systems-biology
zoology
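Validating a category locally avoids wasting a request on a typo. This sketch hard-codes the list above. Accepting spaces or underscores in the input and mapping them to hyphens is a convenience assumption. The helper always returns the hyphenated form listed here. `normalize_category` is a hypothetical name.

```python
VALID_CATEGORIES = frozenset({
    "animal-behavior-and-cognition", "biochemistry", "bioengineering",
    "bioinformatics", "biophysics", "cancer-biology", "cell-biology",
    "clinical-trials", "developmental-biology", "ecology", "epidemiology",
    "evolutionary-biology", "genetics", "genomics", "immunology",
    "microbiology", "molecular-biology", "neuroscience", "paleontology",
    "pathology", "pharmacology-and-toxicology", "physiology", "plant-biology",
    "scientific-communication-and-education", "synthetic-biology",
    "systems-biology", "zoology",
})


def normalize_category(name: str) -> str:
    """Hypothetical helper: map user input onto one of the categories above."""
    # Lower-case and turn spaces/underscores into hyphens before checking.
    slug = name.strip().lower().replace("_", "-").replace(" ", "-")
    if slug not in VALID_CATEGORIES:
        raise ValueError(f"unknown bioRxiv category: {name!r}")
    return slug
```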

Paper Metadata Fields

Each paper in the collection array contains:

| Field | Description | Type |
|---|---|---|
| `doi` | Digital Object Identifier | string |
| `title` | Paper title | string |
| `authors` | Comma-separated author list | string |
| `author_corresponding` | Corresponding author name | string |
| `author_corresponding_institution` | Corresponding author's institution | string |
| `date` | Publication date (YYYY-MM-DD) | string |
| `version` | Version number | string |
| `type` | Type of submission (e.g., "new results") | string |
| `license` | License type (e.g., "cc_by") | string |
| `category` | Subject category | string |
| `jatsxml` | URL to JATS XML | string |
| `abstract` | Paper abstract | string |
| `published` | Journal publication info (if published) | string |
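The field table above maps directly onto a typed record. This is a minimal sketch that assumes every field stays a string, as the table lists. The `Preprint` class and its `from_record` helper are hypothetical names. Missing keys become empty strings and unknown keys are dropped, so small schema differences do not break parsing.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class Preprint:
    """Hypothetical typed view of one entry in the "collection" array."""
    doi: str
    title: str
    authors: str
    author_corresponding: str
    author_corresponding_institution: str
    date: str
    version: str
    type: str
    license: str
    category: str
    jatsxml: str
    abstract: str
    published: str

    @classmethod
    def from_record(cls, record: Dict[str, Any]) -> "Preprint":
        # Keep only the declared fields; absent or null values become "".
        return cls(**{f.name: str(record.get(f.name) or "") for f in fields(cls)})
```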

Downloading Full Papers

PDF Download

PDFs are served by the main bioRxiv website rather than by the API, using this URL pattern:

https://www.biorxiv.org/content/{doi}v{version}.full.pdf

Example:

https://www.biorxiv.org/content/10.1101/2024.01.15.123456v1.full.pdf

HTML Version

https://www.biorxiv.org/content/{doi}v{version}
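The PDF and HTML URL templates above can be filled in from a DOI and version number. This sketch builds both URLs and includes a streaming download helper. The function names and the placeholder User-Agent are hypothetical. When downloading several files, add a delay between calls, as the rate-limiting advice recommends.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union

CONTENT_BASE = "https://www.biorxiv.org/content"


def html_url(doi: str, version: Union[int, str]) -> str:
    # The version is appended directly to the DOI with a "v" prefix.
    return f"{CONTENT_BASE}/{doi}v{int(version)}"


def pdf_url(doi: str, version: Union[int, str]) -> str:
    return html_url(doi, version) + ".full.pdf"


def download_pdf(doi: str, version: Union[int, str], dest: Path,
                 user_agent: str = "example-biorxiv-script/0.1") -> Path:
    """Hypothetical helper: stream one PDF to disk."""
    req = urllib.request.Request(pdf_url(doi, version), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=60) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    return dest
```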

JATS XML

Full structured XML is available via the jatsxml field in the API response.
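Once the document behind the `jatsxml` URL has been downloaded, the standard library can extract basic fields from it. This sketch assumes the standard JATS element names (`article-meta`, `title-group/article-title`, `abstract`) with no XML namespace on those elements. `jats_summary` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def jats_summary(xml_text: str) -> Dict[str, str]:
    """Hypothetical helper: pull the title and abstract text from JATS XML."""
    root = ET.fromstring(xml_text)

    def text_of(path: str) -> str:
        node = root.find(path)
        # itertext() flattens nested markup such as <p> or <italic>.
        return "".join(node.itertext()).strip() if node is not None else ""

    return {
        "title": text_of(".//article-meta/title-group/article-title"),
        "abstract": text_of(".//article-meta/abstract"),
    }
```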

Common Search Patterns

Author Search

Get papers from date range
Filter by author name (case-insensitive substring match in authors field)
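The author filter described above works on records that have already been fetched. This is a minimal sketch with a hypothetical function name. It uses `casefold()` for the case-insensitive substring match.

```python
from typing import Any, Dict, Iterable, List


def filter_by_author(papers: Iterable[Dict[str, Any]], author_name: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep papers whose authors field contains the name."""
    needle = author_name.casefold()
    return [p for p in papers if needle in str(p.get("authors", "")).casefold()]
```

Substring matching is deliberately loose. "Smith" also matches "Smithson", so downstream checks may be needed when precision matters.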

Keyword Search

Get papers from date range (optionally filtered by category)
Search in title, abstract, or both fields
Filter papers containing keywords (case-insensitive)
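The keyword pattern can be sketched the same way. Whether a paper must match any keyword or all of them is not specified above. This hypothetical helper defaults to "any" and exposes `match_all` as an assumption-driven switch.

```python
from typing import Any, Dict, Iterable, List, Sequence


def filter_by_keywords(papers: Iterable[Dict[str, Any]],
                       keywords: Sequence[str],
                       search_fields: Sequence[str] = ("title", "abstract"),
                       match_all: bool = False) -> List[Dict[str, Any]]:
    """Hypothetical helper: case-insensitive keyword filter over chosen fields."""
    terms = [k.casefold() for k in keywords]

    def hit(paper: Dict[str, Any]) -> bool:
        # Search the selected fields together as one lower-cased string.
        text = " ".join(str(paper.get(f, "")) for f in search_fields).casefold()
        found = [t in text for t in terms]
        return all(found) if match_all else any(found)

    return [p for p in papers if hit(p)]
```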

Error Handling

Common HTTP status codes:

200: Success
404: Resource not found
500: Server error

Always check the messages array in the response:

```json
{
  "messages": [
    {
      "status": "ok",
      "count": 100
    }
  ]
}
```
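This sketch combines both checks. HTTP errors surface as `urllib.error.HTTPError`, and any status other than "ok" in the `messages` array is treated as a failure. The exception class and function names are hypothetical. Network-level failures such as DNS errors are deliberately left to propagate.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict


class BioRxivError(RuntimeError):
    """Hypothetical error type for failed API calls."""


def check_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    # A missing or empty messages array is treated as a failure too.
    messages = payload.get("messages") or [{}]
    status = messages[0].get("status", "missing")
    if status != "ok":
        raise BioRxivError(f"API reported status {status!r}")
    return payload


def fetch_checked(url: str, timeout: float = 30.0) -> Dict[str, Any]:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as exc:
        # exc.code holds the HTTP status, e.g. 404 or 500.
        raise BioRxivError(f"HTTP {exc.code} for {url}") from exc
    return check_payload(payload)
```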

Best Practices

Cache results: Store retrieved papers to avoid repeated API calls
Use appropriate date ranges: Smaller date ranges return faster
Filter by category: Reduces data transfer and processing time
Batch processing: When downloading multiple PDFs, add delays between requests
Error handling: Always check response status and handle errors gracefully
Version tracking: Papers can have multiple versions, so record the version number together with the DOI
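The caching advice can be as simple as storing each JSON response on disk, keyed by a hash of its URL. This is a minimal sketch with a hypothetical `cached_fetch` name and an injected `fetch` callable. It never expires entries, which suits fixed past date ranges better than "last N days" queries.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cached_fetch(url: str, fetch: Callable[[str], Any], cache_dir: Path) -> Any:
    """Hypothetical helper: return a cached JSON response or fetch and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name.
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(url)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```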

Python Usage Example

```python
from biorxiv_search import BioRxivSearcher

searcher = BioRxivSearcher(verbose=True)

# Search by keywords
papers = searcher.search_by_keywords(
    keywords=["CRISPR", "gene editing"],
    start_date="2024-01-01",
    end_date="2024-12-31",
    category="genomics"
)

# Search by author
papers = searcher.search_by_author(
    author_name="Smith",
    start_date="2023-01-01",
    end_date="2024-12-31"
)

# Get specific paper
paper = searcher.get_paper_details("10.1101/2024.01.15.123456")

# Download PDF
searcher.download_pdf("10.1101/2024.01.15.123456", "paper.pdf")
```

External Resources

bioRxiv homepage: https://www.biorxiv.org/
API documentation: https://api.biorxiv.org/
JATS XML specification: https://jats.nlm.nih.gov/
