# bioRxiv API Reference

## Overview

The bioRxiv API provides programmatic access to metadata for life sciences preprints hosted on the bioRxiv server. Responses are returned as JSON; the `/pubs` endpoint also accepts `xml` as a response format.

## Base URL

```
https://api.biorxiv.org
```

## Rate Limiting

Be respectful of the API:
- Add delays between requests (minimum 0.5 seconds recommended)
- Send a descriptive `User-Agent` header that identifies your client
- Cache results when possible
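
The guidance above can be sketched as a small helper. This is a minimal illustration, not part of the API: `Throttle`, `polite_get`, and the User-Agent string are hypothetical names and values chosen for the example, and the 0.5 second default simply mirrors the recommendation in the list.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def polite_get(url, throttle, user_agent="my-biorxiv-client/0.1 (contact@example.org)"):
    """Fetch a URL after honoring the throttle; returns the raw response bytes."""
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Sharing one `Throttle` instance across all calls keeps the spacing consistent even when requests are issued from different parts of a script.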

## API Endpoints

### 1. Details by Date Range

Retrieve preprints posted within a specific date range.

**Endpoint:**
```
GET /details/biorxiv/{start_date}/{end_date}
GET /details/biorxiv/{start_date}/{end_date}/{category}
```

**Parameters:**
- `start_date`: Start date in YYYY-MM-DD format
- `end_date`: End date in YYYY-MM-DD format
- `category` (optional): Subject category to filter by (see Valid Categories below)

**Example:**
```
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31
GET https://api.biorxiv.org/details/biorxiv/2024-01-01/2024-01-31/neuroscience
```
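
A request URL for this endpoint could be assembled as follows. This is a sketch under the path layout documented above; `details_url` and `BASE_URL` are hypothetical names introduced for illustration.

```python
from datetime import date

BASE_URL = "https://api.biorxiv.org"


def details_url(start_date, end_date, category=None):
    """Build a /details/biorxiv URL for a date range, optionally with a category."""
    # Accept datetime.date objects or ISO strings and validate them as dates.
    start = date.fromisoformat(str(start_date))
    end = date.fromisoformat(str(end_date))
    if start > end:
        raise ValueError("start_date must not be after end_date")
    url = f"{BASE_URL}/details/biorxiv/{start.isoformat()}/{end.isoformat()}"
    if category:
        url += f"/{category}"
    return url
```

Validating the dates before sending the request catches malformed input locally instead of spending an API call on it.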

**Response:**
```json
{
  "messages": [
    {
      "status": "ok",
      "count": 150,
      "total": 150
    }
  ],
  "collection": [
    {
      "doi": "10.1101/2024.01.15.123456",
      "title": "Example Paper Title",
      "authors": "Smith J, Doe J, Johnson A",
      "author_corresponding": "Smith J",
      "author_corresponding_institution": "University Example",
      "date": "2024-01-15",
      "version": "1",
      "type": "new results",
      "license": "cc_by",
      "category": "neuroscience",
      "jatsxml": "https://www.biorxiv.org/content/...",
      "abstract": "This is the abstract...",
      "published": ""
    }
  ]
}
```

### 2. Details by DOI

Retrieve details for a specific preprint by DOI.

**Endpoint:**
```
GET /details/biorxiv/{doi}
```

**Parameters:**
- `doi`: The DOI of the preprint (e.g., `10.1101/2024.01.15.123456`)

**Example:**
```
GET https://api.biorxiv.org/details/biorxiv/10.1101/2024.01.15.123456
```
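
DOIs are often copied as `doi.org` links or with a `doi:` prefix. The sketch below normalizes such input before building the lookup URL; `doi_details_url` is a hypothetical helper, and the set of prefixes it strips is an assumption about common input forms rather than anything the API requires.

```python
def doi_details_url(doi):
    """Build the details URL for one preprint from a bare DOI or a doi.org link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # A DOI always has a prefix and a suffix separated by a slash.
    if "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://api.biorxiv.org/details/biorxiv/{doi}"
```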

### 3. Publications by Interval

Retrieve publications from the most recent N days, one page at a time.

**Endpoint:**
```
GET /pubs/biorxiv/{interval}/{cursor}/{format}
```

**Parameters:**
- `interval`: Number of days back to search (e.g., `1` for last 24 hours)
- `cursor`: Pagination cursor (0 for first page, increment by 100 for subsequent pages)
- `format`: Response format (`json` or `xml`)

**Example:**
```
GET https://api.biorxiv.org/pubs/biorxiv/1/0/json
```

**Response** (`count` is the number of records in this page, `total` the number of records matching the query):
```json
{
  "messages": [
    {
      "status": "ok",
      "count": 100,
      "total": 250,
      "cursor": 100
    }
  ],
  "collection": [...]
}
```
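
Walking through all pages might look like the following sketch. It assumes the cursor advances in steps of 100 and that `total` in the first `messages` entry is the overall result count, as described above; `fetch_json` and `iter_publications` are hypothetical names. The fetch function is passed in as a parameter so the loop can be exercised without network access.

```python
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_publications(interval_days, fetch=fetch_json, page_size=100):
    """Yield every record for the interval, advancing the cursor page by page."""
    cursor = 0
    while True:
        url = f"https://api.biorxiv.org/pubs/biorxiv/{interval_days}/{cursor}/json"
        page = fetch(url)
        records = page.get("collection", [])
        yield from records
        message = (page.get("messages") or [{}])[0]
        total = int(message.get("total", 0))
        cursor += page_size
        # Stop when the page is empty or the cursor has passed the reported total.
        if not records or cursor >= total:
            break
```

For real use, a delay between pages (see Rate Limiting) would be added inside the loop or inside the fetch function.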

## Valid Categories

bioRxiv organizes preprints into the following categories:

- `animal-behavior-and-cognition`
- `biochemistry`
- `bioengineering`
- `bioinformatics`
- `biophysics`
- `cancer-biology`
- `cell-biology`
- `clinical-trials`
- `developmental-biology`
- `ecology`
- `epidemiology`
- `evolutionary-biology`
- `genetics`
- `genomics`
- `immunology`
- `microbiology`
- `molecular-biology`
- `neuroscience`
- `paleontology`
- `pathology`
- `pharmacology-and-toxicology`
- `physiology`
- `plant-biology`
- `scientific-communication-and-education`
- `synthetic-biology`
- `systems-biology`
- `zoology`

## Paper Metadata Fields

Each paper in the `collection` array contains:

| Field | Description | Type |
|-------|-------------|------|
| `doi` | Digital Object Identifier | string |
| `title` | Paper title | string |
| `authors` | Comma-separated author list | string |
| `author_corresponding` | Corresponding author name | string |
| `author_corresponding_institution` | Corresponding author's institution | string |
| `date` | Date this preprint version was posted (YYYY-MM-DD) | string |
| `version` | Version number | string |
| `type` | Type of submission (e.g., "new results") | string |
| `license` | License type (e.g., "cc_by") | string |
| `category` | Subject category | string |
| `jatsxml` | URL to JATS XML | string |
| `abstract` | Paper abstract | string |
| `published` | Journal publication info (if published) | string |
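
Because every field arrives as a string, it can help to convert records into a typed structure early. The `Paper` dataclass below is a hypothetical container covering a subset of the fields; it assumes `authors` is comma-separated as the table states and treats a missing `version` as 1.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    doi: str
    title: str
    authors: list
    date: str
    version: int
    category: str
    published: str

    @classmethod
    def from_record(cls, record):
        """Build a Paper from one `collection` entry, tolerating missing keys."""
        raw_authors = record.get("authors", "")
        return cls(
            doi=record.get("doi", ""),
            title=record.get("title", "").strip(),
            # Split the author string into a list of trimmed names.
            authors=[a.strip() for a in raw_authors.split(",") if a.strip()],
            date=record.get("date", ""),
            version=int(record.get("version") or 1),
            category=record.get("category", ""),
            published=record.get("published", ""),
        )
```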

## Downloading Full Papers

### PDF Download

PDFs are not served by the API. Download them directly from the bioRxiv site using the DOI and version number:

```
https://www.biorxiv.org/content/{doi}v{version}.full.pdf
```

Example:
```
https://www.biorxiv.org/content/10.1101/2024.01.15.123456v1.full.pdf
```

### HTML Version

```
https://www.biorxiv.org/content/{doi}v{version}
```
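
Both URL patterns share the same base, so they can be derived together from the `doi` and `version` fields of an API record. `content_urls` is a hypothetical helper name.

```python
def content_urls(doi, version=1):
    """Return the HTML and PDF URLs for a given DOI and version."""
    base = f"https://www.biorxiv.org/content/{doi}v{version}"
    return {"html": base, "pdf": f"{base}.full.pdf"}
```

The `version` value can be passed as the string returned by the API or as an integer; both format the same way in the URL.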

### JATS XML

Full structured XML is available via the `jatsxml` field in the API response.

## Common Search Patterns

### Author Search

1. Get papers from date range
2. Filter by author name (case-insensitive substring match in `authors` field)
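
The filtering step can be sketched as below, operating on the raw dictionaries from the `collection` array. `filter_by_author` is a hypothetical name.

```python
def filter_by_author(papers, name):
    """Keep papers whose `authors` string contains `name`, ignoring case."""
    needle = name.casefold()
    return [p for p in papers if needle in p.get("authors", "").casefold()]
```

Since this is a substring match, searching for `Smith` also matches `Smithson`; stricter matching would need to split the author string into individual names first.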

### Keyword Search

1. Get papers from date range (optionally filtered by category)
2. Search in title, abstract, or both fields
3. Filter papers containing keywords (case-insensitive)
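
A minimal sketch of the keyword filter follows. The function name, the `fields` parameter, and the `match_all` switch are illustrative choices, not part of the API.

```python
def filter_by_keywords(papers, keywords, fields=("title", "abstract"), match_all=False):
    """Keep papers whose chosen fields contain the keywords (case-insensitive)."""
    needles = [k.casefold() for k in keywords]
    combine = all if match_all else any
    matches = []
    for paper in papers:
        # Join the selected fields into one searchable string.
        haystack = " ".join(str(paper.get(f, "")) for f in fields).casefold()
        if combine(n in haystack for n in needles):
            matches.append(paper)
    return matches
```

With `match_all=False` a paper qualifies if any keyword appears; setting it to `True` requires every keyword to appear somewhere in the selected fields.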

### Recent Papers by Category

1. Request `/pubs/biorxiv/{interval}/0/json`, then advance the cursor by 100 until all pages are retrieved
2. Filter the returned records by category if needed

## Error Handling

Common HTTP status codes:
- `200`: Success
- `404`: Resource not found
- `500`: Server error

Also check the `status` value in the `messages` array of the response body rather than relying on the HTTP status code alone:
```json
{
  "messages": [
    {
      "status": "ok",
      "count": 100
    }
  ]
}
```
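
One way to apply this check is to validate the decoded payload before touching `collection`. The sketch below assumes that any `status` other than `"ok"` should be treated as a failure; `BiorxivAPIError` and `check_response` are hypothetical names.

```python
class BiorxivAPIError(Exception):
    """Raised when a response body does not report an ok status."""


def check_response(payload):
    """Return the `collection` list if the first message reports status ok."""
    messages = payload.get("messages") or []
    if not messages:
        raise BiorxivAPIError("response has no messages array")
    status = str(messages[0].get("status", "")).lower()
    if status != "ok":
        raise BiorxivAPIError(f"API reported status {status!r}")
    return payload.get("collection", [])
```

Raising a dedicated exception lets callers separate API-level problems from network errors such as timeouts.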

## Best Practices

1. **Cache results**: Store retrieved papers to avoid repeated API calls
2. **Use appropriate date ranges**: Smaller date ranges return faster
3. **Filter by category**: Reduces data transfer and processing time
4. **Batch processing**: When downloading multiple PDFs, add delays between requests
5. **Error handling**: Always check response status and handle errors gracefully
6. **Version tracking**: A preprint can have multiple versions; use the `version` field to tell them apart and to build the correct download URL
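
For version tracking, a result set can be reduced to the newest version of each preprint. This sketch assumes the same DOI may appear once per version in a result set and that `version` parses as an integer; `latest_versions` is a hypothetical name.

```python
def latest_versions(papers):
    """Keep only the highest-numbered version of each DOI."""
    latest = {}
    for paper in papers:
        doi = paper["doi"]
        version = int(paper.get("version") or 1)
        current = latest.get(doi)
        if current is None or version > int(current.get("version") or 1):
            latest[doi] = paper
    return list(latest.values())
```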

## Python Usage Example

```python
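# BioRxivSearcher is assumed to come from a local biorxiv_search module
# (not defined in this reference); adjust the import to match your project.
from biorxiv_search import BioRxivSearcher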

searcher = BioRxivSearcher(verbose=True)

# Search by keywords
papers = searcher.search_by_keywords(
    keywords=["CRISPR", "gene editing"],
    start_date="2024-01-01",
    end_date="2024-12-31",
    category="genomics"
)

# Search by author
papers = searcher.search_by_author(
    author_name="Smith",
    start_date="2023-01-01",
    end_date="2024-12-31"
)

# Get specific paper
paper = searcher.get_paper_details("10.1101/2024.01.15.123456")

# Download PDF
searcher.download_pdf("10.1101/2024.01.15.123456", "paper.pdf")
```

## External Resources

- bioRxiv homepage: https://www.biorxiv.org/
- API documentation: https://api.biorxiv.org/
- JATS XML specification: https://jats.nlm.nih.gov/