# GWAS Catalog API Reference

Comprehensive reference for the GWAS Catalog REST APIs, including endpoint specifications, query parameters, response formats, and advanced usage patterns.

## Table of Contents

- [API Overview](#api-overview)
- [Authentication and Rate Limiting](#authentication-and-rate-limiting)
- [GWAS Catalog REST API](#gwas-catalog-rest-api)
- [Summary Statistics API](#summary-statistics-api)
- [Response Formats](#response-formats)
- [Error Handling](#error-handling)
- [Advanced Query Patterns](#advanced-query-patterns)
- [Integration Examples](#integration-examples)

## API Overview

The GWAS Catalog provides two complementary REST APIs:

1. **GWAS Catalog REST API**: Access to curated SNP-trait associations, studies, and metadata
2. **Summary Statistics API**: Access to full GWAS summary statistics (all tested variants)

Both APIs use RESTful design principles with JSON responses in HAL (Hypertext Application Language) format, which includes `_links` for resource navigation.

### Base URLs

```
GWAS Catalog API:         https://www.ebi.ac.uk/gwas/rest/api
Summary Statistics API:   https://www.ebi.ac.uk/gwas/summary-statistics/api
```

### Version Information

The GWAS Catalog REST API v2.0 was released in 2024, with significant improvements:
- New endpoints (publications, genes, genomic context, ancestries)
- Enhanced data exposure (cohorts, background traits, licenses)
- Improved query capabilities
- Better performance and documentation

The previous API version remains available until May 2026 for backward compatibility.

## Authentication and Rate Limiting

### Authentication

**No authentication required** - Both APIs are open access and do not require API keys or registration.

### Rate Limiting

While no explicit rate limits are documented, follow best practices:
- Implement delays between consecutive requests (e.g., 0.1-0.5 seconds)
- Use pagination for large result sets
- Cache responses locally
- Use bulk downloads (FTP) for genome-wide data
- Avoid hammering the API with rapid consecutive requests

**Example with rate limiting:**
```python
import requests
from time import sleep

def query_with_rate_limit(url, delay=0.1):
    response = requests.get(url)
    sleep(delay)
    return response.json()
```

## GWAS Catalog REST API

The main API provides access to curated GWAS associations, studies, variants, and traits.

### Core Endpoints

#### 1. Studies

**Get all studies:**
```
GET /studies
```

**Get specific study:**
```
GET /studies/{accessionId}
```

**Search studies:**
```
GET /studies/search/findByPublicationIdPubmedId?pubmedId={pmid}
GET /studies/search/findByDiseaseTrait?diseaseTrait={trait}
```

**Query Parameters:**
- `page`: Page number (0-indexed)
- `size`: Results per page (default: 20)
- `sort`: Sort field (e.g., `publicationDate,desc`)

**Example:**
```python
import requests

# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Content-Type": "application/json"})
study = response.json()

print(f"Title: {study.get('title')}")
print(f"PMID: {study.get('publicationInfo', {}).get('pubmedId')}")
print(f"Sample size: {study.get('initialSampleSize')}")
```

**Response Fields:**
- `accessionId`: Study identifier (GCST ID)
- `title`: Study title
- `publicationInfo`: Publication details including PMID
- `initialSampleSize`: Discovery cohort description
- `replicationSampleSize`: Replication cohort description
- `ancestries`: Population ancestry information
- `genotypingTechnologies`: Array or sequencing platforms
- `_links`: Links to related resources

#### 2. Associations

**Get all associations:**
```
GET /associations
```

**Get specific association:**
```
GET /associations/{associationId}
```

**Get associations for a trait:**
```
GET /efoTraits/{efoId}/associations
```

**Get associations for a variant:**
```
GET /singleNucleotidePolymorphisms/{rsId}/associations
```

**Query Parameters:**
- `projection`: Response projection (e.g., `associationBySnp`)
- `page`, `size`, `sort`: Pagination controls

**Example:**
```python
import requests

# Find all associations for type 2 diabetes
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}/associations"
params = {"size": 100, "page": 0}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
data = response.json()

associations = data.get('_embedded', {}).get('associations', [])
print(f"Found {len(associations)} associations")
```

**Response Fields:**
- `rsId`: Variant identifier
- `strongestAllele`: Risk or effect allele
- `pvalue`: Association p-value
- `pvalueText`: P-value as reported (may include inequality)
- `pvalueMantissa`: Mantissa of p-value
- `pvalueExponent`: Exponent of p-value
- `orPerCopyNum`: Odds ratio per allele copy
- `betaNum`: Effect size (quantitative traits)
- `betaUnit`: Unit of measurement
- `range`: Confidence interval
- `standardError`: Standard error
- `efoTrait`: Trait name
- `mappedLabel`: EFO standardized term
- `studyId`: Associated study accession

#### 3. Variants (Single Nucleotide Polymorphisms)

**Get variant details:**
```
GET /singleNucleotidePolymorphisms/{rsId}
```

**Search variants:**
```
GET /singleNucleotidePolymorphisms/search/findByRsId?rsId={rsId}
GET /singleNucleotidePolymorphisms/search/findByChromBpLocationRange?chrom={chr}&bpStart={start}&bpEnd={end}
GET /singleNucleotidePolymorphisms/search/findByGene?geneName={gene}
```

**Example:**
```python
import requests

# Get variant information
rs_id = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant = response.json()

print(f"rsID: {variant.get('rsId')}")
print(f"Location: chr{variant.get('locations', [{}])[0].get('chromosomeName')}:{variant.get('locations', [{}])[0].get('chromosomePosition')}")
```

**Response Fields:**
- `rsId`: rs number
- `merged`: Indicates if variant merged with another
- `functionalClass`: Variant consequence
- `locations`: Array of genomic locations
  - `chromosomeName`: Chromosome number
  - `chromosomePosition`: Base pair position
  - `region`: Genomic region information
- `genomicContexts`: Nearby genes
- `lastUpdateDate`: Last modification date

#### 4. Traits (EFO Terms)

**Get trait information:**
```
GET /efoTraits/{efoId}
```

**Search traits:**
```
GET /efoTraits/search/findByEfoUri?uri={efoUri}
GET /efoTraits/search/findByTraitIgnoreCase?trait={traitName}
```

**Example:**
```python
import requests

# Get trait details
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}"
response = requests.get(url, headers={"Content-Type": "application/json"})
trait = response.json()

print(f"Trait: {trait.get('trait')}")
print(f"EFO URI: {trait.get('uri')}")
```

#### 5. Publications

**Get publication information:**
```
GET /publications
GET /publications/{publicationId}
GET /publications/search/findByPubmedId?pubmedId={pmid}
```

#### 6. Genes

**Get gene information:**
```
GET /genes
GET /genes/{geneId}
GET /genes/search/findByGeneName?geneName={symbol}
```

### Pagination and Navigation

All list endpoints support pagination:

```python
import requests

def get_all_associations(trait_id):
    """Retrieve all associations for a trait with pagination"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    all_associations = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Content-Type": "application/json"})

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        all_associations.extend(associations)
        page += 1

    return all_associations
```

### HAL Links

Responses include `_links` for resource navigation:

```python
import requests

# Get study and follow links to associations
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795")
study = response.json()

# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)
associations = associations_response.json()
```

## Summary Statistics API

Access full GWAS summary statistics for studies that have deposited complete data.

### Base URL
```
https://www.ebi.ac.uk/gwas/summary-statistics/api
```

### Core Endpoints

#### 1. Studies

**Get all studies with summary statistics:**
```
GET /studies
```

**Get specific study:**
```
GET /studies/{gcstId}
```

#### 2. Traits

**Get trait information:**
```
GET /traits/{efoId}
```

**Get associations for a trait:**
```
GET /traits/{efoId}/associations
```

**Query Parameters:**
- `p_lower`: Lower p-value threshold
- `p_upper`: Upper p-value threshold
- `size`: Number of results
- `page`: Page number

**Example:**
```python
import requests

# Find highly significant associations for a trait
trait_id = "EFO_0001360"
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/traits/{trait_id}/associations"
params = {
    "p_upper": "0.000000001",  # p < 1e-9
    "size": 100
}
response = requests.get(url, params=params)
results = response.json()
```

#### 3. Chromosomes

**Get associations by chromosome:**
```
GET /chromosomes/{chromosome}/associations
```

**Query by genomic region:**
```
GET /chromosomes/{chromosome}/associations?start={start}&end={end}
```

**Example:**
```python
import requests

# Query variants in a specific region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/chromosomes/{chromosome}/associations"
params = {
    "start": start_pos,
    "end": end_pos,
    "size": 1000
}
response = requests.get(url, params=params)
variants = response.json()
```

#### 4. Variants

**Get specific variant across studies:**
```
GET /variants/{variantId}
```

**Search by variant ID:**
```
GET /variants/{variantId}/associations
```

### Response Fields

**Association Fields:**
- `variant_id`: Variant identifier
- `chromosome`: Chromosome number
- `base_pair_location`: Position (bp)
- `effect_allele`: Effect allele
- `other_allele`: Reference allele
- `effect_allele_frequency`: Allele frequency
- `beta`: Effect size
- `standard_error`: Standard error
- `p_value`: P-value
- `ci_lower`: Lower confidence interval
- `ci_upper`: Upper confidence interval
- `odds_ratio`: Odds ratio (case-control studies)
- `study_accession`: GCST ID

## Response Formats

### Content Type

All API requests should include the header:
```
Content-Type: application/json
```

### HAL Format

Responses follow the HAL (Hypertext Application Language) specification:

```json
{
  "_embedded": {
    "associations": [
      {
        "rsId": "rs7903146",
        "pvalue": 1.2e-30,
        "efoTrait": "type 2 diabetes",
        "_links": {
          "self": {
            "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/12345"
          }
        }
      }
    ]
  },
  "_links": {
    "self": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=0"
    },
    "next": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=1"
    }
  },
  "page": {
    "size": 20,
    "totalElements": 1523,
    "totalPages": 77,
    "number": 0
  }
}
```

### Page Metadata

Paginated responses include page information:
- `size`: Items per page
- `totalElements`: Total number of results
- `totalPages`: Total number of pages
- `number`: Current page number (0-indexed)

## Error Handling

### HTTP Status Codes

- `200 OK`: Successful request
- `400 Bad Request`: Invalid parameters
- `404 Not Found`: Resource not found
- `500 Internal Server Error`: Server error

### Error Response Format

```json
{
  "timestamp": "2025-10-19T12:00:00.000+00:00",
  "status": 404,
  "error": "Not Found",
  "message": "No association found with id: 12345",
  "path": "/gwas/rest/api/associations/12345"
}
```

### Error Handling Example

```python
import requests

def safe_api_request(url, params=None):
    """Make API request with error handling"""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
        print(f"Response: {response.text}")
        return None
    except requests.exceptions.ConnectionError:
        print("Connection error - check network")
        return None
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return None
```

## Advanced Query Patterns

### 1. Cross-referencing Variants and Traits

```python
import requests

def get_variant_pleiotropy(rs_id):
    """Get all traits associated with a variant"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/{rs_id}/associations"
    params = {"projection": "associationBySnp"}

    response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
    data = response.json()

    traits = {}
    for assoc in data.get('_embedded', {}).get('associations', []):
        trait = assoc.get('efoTrait')
        pvalue = assoc.get('pvalue')
        if trait:
            if trait not in traits or float(pvalue) < float(traits[trait]):
                traits[trait] = pvalue

    return traits

# Example usage
pleiotropy = get_variant_pleiotropy('rs7903146')
for trait, pval in sorted(pleiotropy.items(), key=lambda x: float(x[1])):
    print(f"{trait}: p={pval}")
```

### 2. Filtering by P-value Threshold

```python
import requests

def get_significant_associations(trait_id, p_threshold=5e-8):
    """Get genome-wide significant associations"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Content-Type": "application/json"})

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue and float(pvalue) <= p_threshold:
                results.append(assoc)

        page += 1

    return results
```

### 3. Combining Main and Summary Statistics APIs

```python
import requests

def get_complete_variant_data(rs_id):
    """Get variant data from both APIs"""
    main_url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"

    # Get basic variant info
    response = requests.get(main_url, headers={"Content-Type": "application/json"})
    variant_info = response.json()

    # Get associations
    assoc_url = f"{main_url}/associations"
    response = requests.get(assoc_url, headers={"Content-Type": "application/json"})
    associations = response.json()

    # Could also query summary statistics API for this variant
    # across all studies with summary data

    return {
        "variant": variant_info,
        "associations": associations
    }
```

### 4. Genomic Region Queries

```python
import requests

def query_region(chromosome, start, end, p_threshold=None):
    """Query variants in genomic region"""
    # From main API
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
    params = {
        "chrom": chromosome,
        "bpStart": start,
        "bpEnd": end,
        "size": 1000
    }

    response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
    variants = response.json()

    # Can also query summary statistics API
    sumstats_url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chromosome}/associations"
    sumstats_params = {"start": start, "end": end, "size": 1000}
    if p_threshold:
        sumstats_params["p_upper"] = str(p_threshold)

    sumstats_response = requests.get(sumstats_url, params=sumstats_params)
    sumstats = sumstats_response.json()

    return {
        "catalog_variants": variants,
        "summary_stats": sumstats
    }
```

## Integration Examples

### Complete Workflow: Disease Genetic Architecture

```python
import requests
import pandas as pd
from time import sleep

class GWASCatalogQuery:
    def __init__(self):
        self.base_url = "https://www.ebi.ac.uk/gwas/rest/api"
        self.headers = {"Content-Type": "application/json"}

    def get_trait_associations(self, trait_id, p_threshold=5e-8):
        """Get all associations for a trait"""
        url = f"{self.base_url}/efoTraits/{trait_id}/associations"
        results = []
        page = 0

        while True:
            params = {"page": page, "size": 100}
            response = requests.get(url, params=params, headers=self.headers)

            if response.status_code != 200:
                break

            data = response.json()
            associations = data.get('_embedded', {}).get('associations', [])

            if not associations:
                break

            for assoc in associations:
                pvalue = assoc.get('pvalue')
                if pvalue and float(pvalue) <= p_threshold:
                    results.append({
                        'rs_id': assoc.get('rsId'),
                        'pvalue': float(pvalue),
                        'risk_allele': assoc.get('strongestAllele'),
                        'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                        'study': assoc.get('studyId'),
                        'pubmed_id': assoc.get('pubmedId')
                    })

            page += 1
            sleep(0.1)

        return pd.DataFrame(results)

    def get_variant_details(self, rs_id):
        """Get detailed variant information"""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/{rs_id}"
        response = requests.get(url, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        return None

    def get_gene_associations(self, gene_name):
        """Get variants associated with a gene"""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/search/findByGene"
        params = {"geneName": gene_name}
        response = requests.get(url, params=params, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        return None

# Example usage
gwas = GWASCatalogQuery()

# Query type 2 diabetes associations
df = gwas.get_trait_associations('EFO_0001360')
print(f"Found {len(df)} genome-wide significant associations")
print(f"Unique variants: {df['rs_id'].nunique()}")

# Get top variants
top_variants = df.nsmallest(10, 'pvalue')
print("\nTop 10 variants:")
print(top_variants[['rs_id', 'pvalue', 'risk_allele']])

# Get details for top variant
if len(top_variants) > 0:
    top_rs = top_variants.iloc[0]['rs_id']
    variant_info = gwas.get_variant_details(top_rs)
    if variant_info:
        loc = variant_info.get('locations', [{}])[0]
        print(f"\n{top_rs} location: chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}")
```

### FTP Download Integration

```python
import requests
from pathlib import Path

def download_summary_statistics(gcst_id, output_dir="."):
    """Download summary statistics from FTP"""
    # FTP URL pattern
    ftp_base = "http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics"

    # Try harmonised file first
    harmonised_url = f"{ftp_base}/{gcst_id}/harmonised/{gcst_id}-harmonised.tsv.gz"

    output_path = Path(output_dir) / f"{gcst_id}.tsv.gz"

    try:
        response = requests.get(harmonised_url, stream=True)
        response.raise_for_status()

        with open(output_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"Downloaded {gcst_id} to {output_path}")
        return output_path

    except requests.exceptions.HTTPError:
        print(f"Harmonised file not found for {gcst_id}")
        return None

# Example usage
download_summary_statistics("GCST001234", output_dir="./sumstats")
```

## Additional Resources

- **Interactive API Documentation**: https://www.ebi.ac.uk/gwas/rest/docs/api
- **Summary Statistics API Docs**: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
- **Workshop Materials**: https://github.com/EBISPOT/GWAS_Catalog-workshop
- **Blog Post on API v2**: https://ebispot.github.io/gwas-blog/rest-api-v2-release/
- **R Package (gwasrapidd)**: https://cran.r-project.org/package=gwasrapidd