Initial commit
skills/biorxiv-database/references/api_reference.md
@@ -0,0 +1,280 @@
# bioRxiv API Reference

## Overview

The bioRxiv API provides programmatic access to preprint metadata from the bioRxiv server. The API returns JSON-formatted data with comprehensive metadata about life sciences preprints.

## Base URL

```
https://api.biorxiv.org
```

## Rate Limiting

Be respectful of the API:
- Add delays between requests (minimum 0.5 seconds recommended)
- Use appropriate User-Agent headers
- Cache results when possible
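
The first two points can be sketched as a small throttle plus a request builder. This is a minimal illustration using only the standard library; the `Throttle` class, the `build_request` helper, and the `USER_AGENT` string are hypothetical names chosen for this example, not part of the bioRxiv API.

```python
import time
import urllib.request

MIN_DELAY_SECONDS = 0.5  # minimum spacing recommended in the list above
# Hypothetical identifying string; replace with your own tool name and contact.
USER_AGENT = "example-biorxiv-client/0.1 (contact: you@example.org)"


class Throttle:
    """Block until at least `min_delay` seconds have passed since the last call."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Return a urllib Request that carries an identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Call `throttle.wait()` immediately before each `urllib.request.urlopen(build_request(url))` so that consecutive requests are spaced at least 0.5 seconds apart.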

## API Endpoints

### 1. Details by Date Range

Retrieve preprints posted within a specific date range.

**Endpoint:**
```
GET /details/biorxiv/{start_date}/{end_date}
GET /details/biorxiv/{start_date}/{end_date}/{category}
```

**Parameters:**
- `start_date`: Start date in YYYY-MM-DD format
- `end_date`: End date in YYYY-MM-DD format
- `category` (optional): Filter by subject category

**Example:**
```
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/neuroscience
```

**Response:**
```json
{
  "messages": [
    {
      "status": "ok",
      "count": 150,
      "total": 150
    }
  ],
  "collection": [
    {
      "doi": "10.1101/2024.01.15.123456",
      "title": "Example Paper Title",
      "authors": "Smith J, Doe J, Johnson A",
      "author_corresponding": "Smith J",
      "author_corresponding_institution": "University Example",
      "date": "2024-01-15",
      "version": "1",
      "type": "new results",
      "license": "cc_by",
      "category": "neuroscience",
      "jatsxml": "https://www.biorxiv.org/content/...",
      "abstract": "This is the abstract...",
      "published": ""
    }
  ]
}
```
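
As a rough sketch of how a client might call this endpoint, the helper below builds the URL from validated dates and returns the `collection` array. It follows the path layout documented above; `details_url` and `fetch_collection` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request
from datetime import date

BASE_URL = "https://api.biorxiv.org"


def details_url(start_date, end_date, category=None):
    """Build a date-range URL using the path layout documented above."""
    start = date.fromisoformat(start_date)  # raises ValueError on malformed input
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    url = f"{BASE_URL}/details/biorxiv/{start.isoformat()}/{end.isoformat()}"
    if category:
        url += f"/{category}"
    return url


def fetch_collection(url, timeout=30):
    """Fetch a URL and return the `collection` array (empty list if absent)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("collection", [])
```

For example, `fetch_collection(details_url("2024-01-01", "2024-01-31", "neuroscience"))` would request the second example URL shown above.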

### 2. Details by DOI

Retrieve details for a specific preprint by DOI.

**Endpoint:**
```
GET /details/biorxiv/{doi}
```

**Parameters:**
- `doi`: The DOI of the preprint (e.g., `10.1101/2024.01.15.123456`)

**Example:**
```
GET https://api.biorxiv.org/details/biorxiv/10.1101/2024.01.15.123456
```
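
DOIs often arrive as resolver links or with a `doi:` prefix, so a client may want to normalize them before building the URL. The sketch below assumes the slash inside the DOI is kept as-is in the path, as in the example above; `normalize_doi` and `doi_details_url` are hypothetical helper names.

```python
BASE_URL = "https://api.biorxiv.org"
# Common ways a DOI is written in citations and links.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(doi):
    """Strip whitespace and a leading resolver prefix, returning the bare DOI."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def doi_details_url(doi):
    """Build the details-by-DOI URL using the path layout documented above."""
    return f"{BASE_URL}/details/biorxiv/{normalize_doi(doi)}"
```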

### 3. Publications by Interval

Retrieve recent publications from a time interval.

**Endpoint:**
```
GET /pubs/biorxiv/{interval}/{cursor}/{format}
```

**Parameters:**
- `interval`: Number of days back to search (e.g., `1` for last 24 hours)
- `cursor`: Pagination cursor (0 for first page, increment by 100 for subsequent pages)
- `format`: Response format (`json` or `xml`)

**Example:**
```
GET https://api.biorxiv.org/pubs/biorxiv/1/0/json
```

**Response includes pagination:**
```json
{
  "messages": [
    {
      "status": "ok",
      "count": 100,
      "total": 250,
      "cursor": 100
    }
  ],
  "collection": [...]
}
```
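
The cursor scheme described above (start at 0, advance by 100) can be walked with a small generator. This is a sketch under the assumption that `total` in the first `messages` entry is the overall result count; `iter_publications` and the injected `fetch_page` callable (which takes a path and returns the decoded JSON payload) are hypothetical names.

```python
def iter_publications(fetch_page, interval, page_size=100):
    """Yield every record for `interval`, advancing the cursor page by page.

    `fetch_page` is any callable that takes a request path and returns the
    decoded JSON payload as a dict, which keeps the HTTP layer swappable.
    """
    cursor = 0
    while True:
        payload = fetch_page(f"/pubs/biorxiv/{interval}/{cursor}/json")
        collection = payload.get("collection", [])
        yield from collection

        messages = payload.get("messages") or [{}]
        total = int(messages[0].get("total", 0))
        cursor += page_size
        # Stop on an empty page or once the cursor has passed the reported total.
        if not collection or cursor >= total:
            break
```

With the sample response above (`total` of 250), this would request cursors 0, 100, and 200 and then stop.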

## Valid Categories

bioRxiv organizes preprints into the following categories:

- `animal-behavior-and-cognition`
- `biochemistry`
- `bioengineering`
- `bioinformatics`
- `biophysics`
- `cancer-biology`
- `cell-biology`
- `clinical-trials`
- `developmental-biology`
- `ecology`
- `epidemiology`
- `evolutionary-biology`
- `genetics`
- `genomics`
- `immunology`
- `microbiology`
- `molecular-biology`
- `neuroscience`
- `paleontology`
- `pathology`
- `pharmacology-and-toxicology`
- `physiology`
- `plant-biology`
- `scientific-communication-and-education`
- `synthetic-biology`
- `systems-biology`
- `zoology`
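
User input such as "Cell Biology" can be mapped onto the hyphenated form used in this list before it is placed in a URL. The following is a minimal sketch; `to_category_slug` is a hypothetical helper, and `SOME_CATEGORIES` holds only a few entries from the list above for brevity.

```python
# A few entries from the list above; a real client would include all of them.
SOME_CATEGORIES = {"cell-biology", "neuroscience", "pharmacology-and-toxicology"}


def to_category_slug(name, valid_categories):
    """Lower-case `name`, join its words with hyphens, and validate the result."""
    words = name.strip().lower().replace("_", " ").split()
    slug = "-".join(words)
    if slug not in valid_categories:
        raise ValueError(f"unknown category: {name!r}")
    return slug
```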

## Paper Metadata Fields

Each paper in the `collection` array contains:

| Field | Description | Type |
|-------|-------------|------|
| `doi` | Digital Object Identifier | string |
| `title` | Paper title | string |
| `authors` | Comma-separated author list | string |
| `author_corresponding` | Corresponding author name | string |
| `author_corresponding_institution` | Corresponding author's institution | string |
| `date` | Publication date (YYYY-MM-DD) | string |
| `version` | Version number | string |
| `type` | Type of submission (e.g., "new results") | string |
| `license` | License type (e.g., "cc_by") | string |
| `category` | Subject category | string |
| `jatsxml` | URL to JATS XML | string |
| `abstract` | Paper abstract | string |
| `published` | Journal publication info (if published) | string |
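
Because every field arrives as a string, a client may find it convenient to convert a record into typed values once. The sketch below covers a subset of the fields in the table; the `Preprint` dataclass is a hypothetical name, and treating an empty `published` value as "not yet published" is an assumption based on the sample response in this document.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Preprint:
    doi: str
    title: str
    authors: List[str]
    posted: date
    version: int
    category: str
    published: Optional[str]  # None when the `published` field is empty

    @classmethod
    def from_record(cls, record):
        """Convert one `collection` entry (all strings) into typed fields."""
        names = record.get("authors", "").split(",")
        return cls(
            doi=record["doi"],
            title=record["title"],
            authors=[n.strip() for n in names if n.strip()],
            posted=date.fromisoformat(record["date"]),
            version=int(record["version"]),
            category=record.get("category", ""),
            published=record.get("published") or None,
        )
```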

## Downloading Full Papers

### PDF Download

PDFs can be downloaded directly (not through API):

```
https://www.biorxiv.org/content/{doi}v{version}.full.pdf
```

Example:
```
https://www.biorxiv.org/content/10.1101/2024.01.15.123456v1.full.pdf
```
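
The URL template above can be filled in from the `doi` and `version` fields of an API record. A minimal sketch, with `pdf_url` as a hypothetical helper name:

```python
CONTENT_BASE = "https://www.biorxiv.org/content"


def pdf_url(doi, version):
    """Fill in the PDF URL template above; `version` may be an int or a string."""
    version = int(version)  # the API returns the version as a string, e.g. "1"
    if version < 1:
        raise ValueError("version must be 1 or greater")
    return f"{CONTENT_BASE}/{doi}v{version}.full.pdf"
```

Passing `record["doi"]` and `record["version"]` from a `collection` entry gives the link for that specific version of the preprint.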

### HTML Version

```
https://www.biorxiv.org/content/{doi}v{version}
```

### JATS XML

Full structured XML is available via the `jatsxml` field in the API response.

## Common Search Patterns

### Author Search

1. Get papers from date range
2. Filter by author name (case-insensitive substring match in `authors` field)
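
Step 2 can be sketched as a client-side filter over records already fetched for the date range. `filter_by_author` is a hypothetical helper name.

```python
def filter_by_author(papers, author_name):
    """Keep papers whose `authors` string contains `author_name`, ignoring case."""
    needle = author_name.strip().lower()
    if not needle:
        return []
    return [p for p in papers if needle in p.get("authors", "").lower()]
```

Note that a substring match on "Smith" also matches names such as "Smithson", so results may need a manual check.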

### Keyword Search

1. Get papers from date range (optionally filtered by category)
2. Search in title, abstract, or both fields
3. Filter papers containing keywords (case-insensitive)
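
Steps 2 and 3 can be sketched as follows. The source does not say whether a paper must contain all keywords or any one of them; this example assumes "any". `filter_by_keywords` is a hypothetical helper name.

```python
def filter_by_keywords(papers, keywords, fields=("title", "abstract")):
    """Keep papers where any keyword occurs in any of `fields`, ignoring case."""
    needles = [k.lower() for k in keywords if k.strip()]
    matches = []
    for paper in papers:
        # Join the selected fields so each keyword is checked once per paper.
        haystack = " ".join(str(paper.get(f, "")) for f in fields).lower()
        if any(needle in haystack for needle in needles):
            matches.append(paper)
    return matches
```

Pass `fields=("title",)` or `fields=("abstract",)` to restrict the search to a single field.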

### Recent Papers by Category

1. Use `/pubs/biorxiv/{interval}/0/json` endpoint
2. Filter by category if needed

## Error Handling

Common HTTP status codes:
- `200`: Success
- `404`: Resource not found
- `500`: Server error

Always check the `messages` array in the response:
```json
{
  "messages": [
    {
      "status": "ok",
      "count": 100
    }
  ]
}
```
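
A check along these lines can run on every decoded payload before the `collection` is used. This is a sketch only: `BioRxivAPIError` and `check_payload` are hypothetical names, and treating any `status` other than `"ok"` as a failure is an assumption based on the sample above.

```python
class BioRxivAPIError(Exception):
    """Raised when a response payload does not report success."""


def check_payload(payload):
    """Return the first `messages` entry, or raise if the status is not "ok"."""
    messages = payload.get("messages") or []
    if not messages:
        raise BioRxivAPIError("response has no `messages` entry")
    status = messages[0].get("status")
    if status != "ok":
        raise BioRxivAPIError(f"API reported status {status!r}")
    return messages[0]
```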

## Best Practices

1. **Cache results**: Store retrieved papers to avoid repeated API calls
2. **Use appropriate date ranges**: Smaller date ranges return faster
3. **Filter by category**: Reduces data transfer and processing time
4. **Batch processing**: When downloading multiple PDFs, add delays between requests
5. **Error handling**: Always check response status and handle errors gracefully
6. **Version tracking**: Note that papers can have multiple versions
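
The caching advice in item 1 could be implemented as a simple on-disk store keyed by URL. The sketch below is one possible approach, not a prescribed one; `cached_fetch` is a hypothetical helper, and `fetch` is any callable that takes a URL and returns the decoded JSON payload. It never expires entries, so it suits date ranges that lie fully in the past.

```python
import hashlib
import json
from pathlib import Path


def cached_fetch(url, fetch, cache_dir):
    """Return the payload for `url`, calling `fetch(url)` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Hash the URL so it can be used safely as a file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"

    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    payload = fetch(url)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```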

## Python Usage Example

```python
from biorxiv_search import BioRxivSearcher

searcher = BioRxivSearcher(verbose=True)

# Search by keywords
papers = searcher.search_by_keywords(
    keywords=["CRISPR", "gene editing"],
    start_date="2024-01-01",
    end_date="2024-12-31",
    category="genomics"
)

# Search by author
papers = searcher.search_by_author(
    author_name="Smith",
    start_date="2023-01-01",
    end_date="2024-12-31"
)

# Get specific paper
paper = searcher.get_paper_details("10.1101/2024.01.15.123456")

# Download PDF
searcher.download_pdf("10.1101/2024.01.15.123456", "paper.pdf")
```

## External Resources

- bioRxiv homepage: https://www.biorxiv.org/
- API documentation: https://api.biorxiv.org/
- JATS XML specification: https://jats.nlm.nih.gov/