bioRxiv API Reference
Overview
The bioRxiv API provides programmatic access to metadata for life sciences preprints hosted on the bioRxiv server. Responses are returned as JSON.
Base URL
https://api.biorxiv.org
Rate Limiting
Be respectful of the API:
- Add delays between requests (minimum 0.5 seconds recommended)
- Use appropriate User-Agent headers
- Cache results when possible
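The delay recommendation above can be enforced with a small throttle. This is a minimal sketch, not part of the API: the `Throttle` class and the `HEADERS` value are illustrative names, and the User-Agent string is a placeholder you would replace with your own.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Sleep just long enough to keep min_interval between calls."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical User-Agent; use your own tool name and contact address.
HEADERS = {"User-Agent": "my-biorxiv-client/0.1 (mailto:you@example.org)"}
```

Call `throttle.wait()` immediately before each request and pass `HEADERS` to whatever HTTP client you use, for example `urllib.request.Request(url, headers=HEADERS)`.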
API Endpoints
1. Details by Date Range
Retrieve preprints posted within a specific date range.
Endpoint:
GET /details/biorxiv/{start_date}/{end_date}
GET /details/biorxiv/{start_date}/{end_date}/{category}
Parameters:
- start_date: Start date in YYYY-MM-DD format
- end_date: End date in YYYY-MM-DD format
- category (optional): Filter by subject category
Example:
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/neuroscience
Response:
{
  "messages": [
    {
      "status": "ok",
      "count": 150,
      "total": 150
    }
  ],
  "collection": [
    {
      "doi": "10.1101/2024.01.15.123456",
      "title": "Example Paper Title",
      "authors": "Smith J, Doe J, Johnson A",
      "author_corresponding": "Smith J",
      "author_corresponding_institution": "University Example",
      "date": "2024-01-15",
      "version": "1",
      "type": "new results",
      "license": "cc_by",
      "category": "neuroscience",
      "jatsxml": "https://www.biorxiv.org/content/...",
      "abstract": "This is the abstract...",
      "published": ""
    }
  ]
}
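As an illustration of the request and response shapes in this section, the sketch below builds the date-range URL and pulls a few fields out of a decoded response. It assumes the endpoint form shown above; `details_url` and `summarize` are hypothetical helper names, not part of any library.

```python
from datetime import date

BASE_URL = "https://api.biorxiv.org"


def details_url(start, end, category=None):
    """Build the date-range URL in the form documented in this section."""
    # fromisoformat raises ValueError if the string is not a valid ISO date.
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError("start_date must not be after end_date")
    # Re-emit the dates so the URL always carries the YYYY-MM-DD form.
    url = f"{BASE_URL}/details/biorxiv/{start_d.isoformat()}/{end_d.isoformat()}"
    return f"{url}/{category}" if category else url


def summarize(payload):
    """Return (status, [(doi, version, title), ...]) from a decoded response."""
    message = payload["messages"][0]
    papers = [
        (p["doi"], p["version"], p["title"])
        for p in payload.get("collection", [])
    ]
    return message["status"], papers
```

Validating the dates locally avoids spending a request on a malformed range.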
2. Details by DOI
Retrieve details for a specific preprint by DOI.
Endpoint:
GET /details/biorxiv/{doi}
Parameters:
- doi: The DOI of the preprint (e.g., 10.1101/2024.01.15.123456)
Example:
GET https://api.biorxiv.org/details/biorxiv/10.1101/2024.01.15.123456
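DOIs are often copied as `doi:` strings or doi.org links, so it can help to normalize them before building the lookup URL. The following is a sketch under that assumption; `doi_details_url` is a hypothetical helper and the URL form is the one shown above.

```python
BASE_URL = "https://api.biorxiv.org"

# Common decorations people paste in front of a bare DOI.
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def doi_details_url(doi):
    """Strip common DOI prefixes and build the details-by-DOI URL."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with "10." and has a slash before the suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return f"{BASE_URL}/details/biorxiv/{doi}"
```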
3. Publications by Interval
Retrieve recent publications from a time interval.
Endpoint:
GET /pubs/biorxiv/{interval}/{cursor}/{format}
Parameters:
- interval: Number of days back to search (e.g., 1 for last 24 hours)
- cursor: Pagination cursor (0 for first page, increment by 100 for subsequent pages)
- format: Response format (json or xml)
Example:
GET https://api.biorxiv.org/pubs/biorxiv/1/0/json
Response includes pagination:
{
  "messages": [
    {
      "status": "ok",
      "count": 100,
      "total": 250,
      "cursor": 100
    }
  ],
  "collection": [...]
}
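The cursor scheme described above (start at 0, advance by 100 until `total` is reached) can be wrapped in a generator. This is a sketch assuming the response fields shown in this section; `iter_pubs` and `fetch_json` are illustrative names, and the fetch function is passed in so the loop can be tried without network access.

```python
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the JSON body (one possible fetch_page)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_pubs(fetch_page, interval=1, page_size=100):
    """Yield every record, advancing the cursor until `total` is reached.

    fetch_page(url) must return the decoded JSON payload as a dict.
    """
    cursor = 0
    while True:
        url = f"https://api.biorxiv.org/pubs/biorxiv/{interval}/{cursor}/json"
        payload = fetch_page(url)
        message = payload["messages"][0]
        records = payload.get("collection", [])
        yield from records
        cursor += page_size
        total = int(message.get("total", 0))
        # Also stop on an empty page so a wrong `total` cannot loop forever.
        if not records or cursor >= total:
            break
```

For live use, `for paper in iter_pubs(fetch_json): ...` walks all pages; add a delay inside the fetch function to respect the rate-limiting guidance.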
Valid Categories
bioRxiv organizes preprints into the following categories:
- animal-behavior-and-cognition
- biochemistry
- bioengineering
- bioinformatics
- biophysics
- cancer-biology
- cell-biology
- clinical-trials
- developmental-biology
- ecology
- epidemiology
- evolutionary-biology
- genetics
- genomics
- immunology
- microbiology
- molecular-biology
- neuroscience
- paleontology
- pathology
- pharmacology-and-toxicology
- physiology
- plant-biology
- scientific-communication-and-education
- synthetic-biology
- systems-biology
- zoology
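Checking a category against this list before sending a request catches typos early. The sketch below copies the category names from the list above; `normalize_category` is a hypothetical helper that also accepts spaces or underscores in place of hyphens.

```python
VALID_CATEGORIES = frozenset({
    "animal-behavior-and-cognition", "biochemistry", "bioengineering",
    "bioinformatics", "biophysics", "cancer-biology", "cell-biology",
    "clinical-trials", "developmental-biology", "ecology", "epidemiology",
    "evolutionary-biology", "genetics", "genomics", "immunology",
    "microbiology", "molecular-biology", "neuroscience", "paleontology",
    "pathology", "pharmacology-and-toxicology", "physiology",
    "plant-biology", "scientific-communication-and-education",
    "synthetic-biology", "systems-biology", "zoology",
})


def normalize_category(name):
    """Map e.g. 'Cell Biology' or 'cell_biology' to 'cell-biology'."""
    words = name.strip().lower().replace("_", " ").replace("-", " ").split()
    slug = "-".join(words)
    if slug not in VALID_CATEGORIES:
        raise ValueError(f"unknown bioRxiv category: {name!r}")
    return slug
```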
Paper Metadata Fields
Each paper in the collection array contains:
| Field | Description | Type |
|---|---|---|
| doi | Digital Object Identifier | string |
| title | Paper title | string |
| authors | Comma-separated author list | string |
| author_corresponding | Corresponding author name | string |
| author_corresponding_institution | Corresponding author's institution | string |
| date | Publication date (YYYY-MM-DD) | string |
| version | Version number | string |
| type | Type of submission (e.g., "new results") | string |
| license | License type (e.g., "cc_by") | string |
| category | Subject category | string |
| jatsxml | URL to JATS XML | string |
| abstract | Paper abstract | string |
| published | Journal publication info (if published) | string |
Downloading Full Papers
PDF Download
PDFs are downloaded directly from the bioRxiv website, not through the API:
https://www.biorxiv.org/content/{doi}v{version}.full.pdf
Example:
https://www.biorxiv.org/content/10.1101/2024.01.15.123456v1.full.pdf
HTML Version
https://www.biorxiv.org/content/{doi}v{version}
JATS XML
Full structured XML is available via the jatsxml field in the API response.
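The PDF and HTML URL patterns above can be generated from the `doi` and `version` fields of an API record. A minimal sketch, with `content_urls` as a hypothetical helper name:

```python
def content_urls(doi, version):
    """Return the HTML and PDF URLs for one version of a preprint."""
    # The API reports version as a string such as "1"; accept int or str.
    version = int(version)
    if version < 1:
        raise ValueError("version must be 1 or greater")
    base = f"https://www.biorxiv.org/content/{doi}v{version}"
    return {"html": base, "pdf": f"{base}.full.pdf"}
```

For example, `content_urls(paper["doi"], paper["version"])["pdf"]` gives the download link for a record from the collection array.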
Common Search Patterns
Author Search
- Get papers from date range
- Filter by author name (case-insensitive substring match in the authors field)
Keyword Search
- Get papers from date range (optionally filtered by category)
- Search in title, abstract, or both fields
- Filter papers containing keywords (case-insensitive)
Recent Papers by Category
- Use the /pubs/biorxiv/{interval}/0/json endpoint
- Filter by category if needed
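These search patterns all filter records on the client after they have been retrieved. The sketch below shows one way to write the filters, assuming each paper is a dict with the metadata fields listed earlier; the function names are illustrative.

```python
def by_author(papers, name):
    """Keep papers whose authors string contains name, ignoring case."""
    needle = name.casefold()
    return [p for p in papers if needle in p.get("authors", "").casefold()]


def by_keywords(papers, keywords, fields=("title", "abstract"), match_all=False):
    """Keep papers whose chosen fields contain any (or all) keywords."""
    needles = [k.casefold() for k in keywords]
    combine = all if match_all else any
    matches = []
    for paper in papers:
        text = " ".join(paper.get(f, "") for f in fields).casefold()
        if combine(n in text for n in needles):
            matches.append(paper)
    return matches


def by_category(papers, category):
    """Keep papers whose category equals the given one, ignoring case."""
    wanted = category.casefold()
    return [p for p in papers if p.get("category", "").casefold() == wanted]
```

Substring matching on `authors` is coarse: searching for "Smith" also matches "Smithson", so results for common names may need manual review.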
Error Handling
Common HTTP status codes:
- 200: Success
- 404: Resource not found
- 500: Server error
Always check the messages array in the response:
{
  "messages": [
    {
      "status": "ok",
      "count": 100
    }
  ]
}
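A check of the messages array might look like the following sketch. It assumes, as in the examples in this document, that a successful response carries `"status": "ok"` in the first message; `BioRxivError` and `check_messages` are hypothetical names.

```python
class BioRxivError(Exception):
    """Raised when a response does not report an "ok" status."""


def check_messages(payload):
    """Return the first message dict, or raise if the status is not ok."""
    messages = payload.get("messages") or []
    if not messages:
        raise BioRxivError("response has no messages entry")
    message = messages[0]
    status = str(message.get("status", "")).lower()
    if status != "ok":
        raise BioRxivError(f"API reported status {message.get('status')!r}")
    return message
```

Running this check before reading `collection` keeps a failed request from being mistaken for an empty result set.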
Best Practices
- Cache results: Store retrieved papers to avoid repeated API calls
- Use appropriate date ranges: Smaller date ranges return faster
- Filter by category: Reduces data transfer and processing time
- Batch processing: When downloading multiple PDFs, add delays between requests
- Error handling: Always check response status and handle errors gracefully
- Version tracking: Note that papers can have multiple versions
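Because a date-range query can return several versions of the same preprint, results are often reduced to the newest version per DOI. A minimal sketch, assuming each record has the `doi` and `version` fields described earlier; `latest_versions` is an illustrative name.

```python
def latest_versions(papers):
    """Keep only the highest-numbered version of each DOI."""
    latest = {}
    for paper in papers:
        doi = paper["doi"]
        # version is a string in the API response, so compare as integers
        # ("10" would otherwise sort before "2").
        if doi not in latest or int(paper["version"]) > int(latest[doi]["version"]):
            latest[doi] = paper
    return list(latest.values())
```

The result keeps DOIs in the order they were first seen, which preserves the ordering of the original response.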
Python Usage Example
from biorxiv_search import BioRxivSearcher

searcher = BioRxivSearcher(verbose=True)

# Search by keywords
papers = searcher.search_by_keywords(
    keywords=["CRISPR", "gene editing"],
    start_date="2024-01-01",
    end_date="2024-12-31",
    category="genomics",
)

# Search by author
papers = searcher.search_by_author(
    author_name="Smith",
    start_date="2023-01-01",
    end_date="2024-12-31",
)

# Get specific paper
paper = searcher.get_paper_details("10.1101/2024.01.15.123456")

# Download PDF
searcher.download_pdf("10.1101/2024.01.15.123456", "paper.pdf")
External Resources
- bioRxiv homepage: https://www.biorxiv.org/
- API documentation: https://api.biorxiv.org/
- JATS XML specification: https://jats.nlm.nih.gov/