GWAS Catalog API Reference

Comprehensive reference for the GWAS Catalog REST APIs, including endpoint specifications, query parameters, response formats, and advanced usage patterns.

API Overview

The GWAS Catalog provides two complementary REST APIs:

  1. GWAS Catalog REST API: Access to curated SNP-trait associations, studies, and metadata
  2. Summary Statistics API: Access to full GWAS summary statistics (all tested variants)

Both APIs use RESTful design principles with JSON responses in HAL (Hypertext Application Language) format, which includes _links for resource navigation.

Base URLs

GWAS Catalog API:         https://www.ebi.ac.uk/gwas/rest/api
Summary Statistics API:   https://www.ebi.ac.uk/gwas/summary-statistics/api

Version Information

The GWAS Catalog REST API v2.0 was released in 2024, with significant improvements:

  • New endpoints (publications, genes, genomic context, ancestries)
  • Enhanced data exposure (cohorts, background traits, licenses)
  • Improved query capabilities
  • Better performance and documentation

The previous API version remains available until May 2026 for backward compatibility.

Authentication and Rate Limiting

Authentication

No authentication is required. Both APIs are open access and need no API keys or registration.

Rate Limiting

No explicit rate limits are documented, but clients should still be considerate:

  • Pause between consecutive requests (e.g., 0.1-0.5 seconds) rather than sending them back to back
  • Use pagination for large result sets
  • Cache responses locally
  • Use bulk downloads (FTP) for genome-wide data

Example with rate limiting:

import requests
from time import sleep

def query_with_rate_limit(url, delay=0.1):
    response = requests.get(url)
    sleep(delay)
    return response.json()
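
The delay and caching advice above can be combined in one small wrapper. The sketch below is illustrative only: ThrottledCache, its min_interval default, and the injected fetch callable are hypothetical names and are not part of the GWAS Catalog API. It enforces a minimum gap between network calls and serves repeated URLs from an in-memory dictionary.

```python
import time


class ThrottledCache:
    """Wrap a fetch function with an in-memory cache and a minimum request gap.

    Hypothetical helper for illustration; `fetch` is any callable that takes
    a URL and returns parsed JSON.
    """

    def __init__(self, fetch, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._cache = {}
        self._last_call = None  # time of the most recent network call

    def get(self, url):
        # Serve repeated URLs from the cache without touching the network
        if url in self._cache:
            return self._cache[url]
        # Wait only for the part of the interval that has not yet elapsed
        if self._last_call is not None:
            remaining = self._last_call + self.min_interval - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        result = self.fetch(url)
        self._last_call = self.clock()
        self._cache[url] = result
        return result
```

With the requests library used elsewhere in this reference, the wrapper could be created as ThrottledCache(lambda url: requests.get(url, timeout=30).json()). The cache is never invalidated, so this design suits short scripts rather than long-running services.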

GWAS Catalog REST API

The main API provides access to curated GWAS associations, studies, variants, and traits.

Core Endpoints

1. Studies

Get all studies:

GET /studies

Get specific study:

GET /studies/{accessionId}

Search studies:

GET /studies/search/findByPublicationIdPubmedId?pubmedId={pmid}
GET /studies/search/findByDiseaseTrait?diseaseTrait={trait}

Query Parameters:

  • page: Page number (0-indexed)
  • size: Results per page (default: 20)
  • sort: Sort field (e.g., publicationDate,desc)

Example:

import requests

# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Content-Type": "application/json"})
study = response.json()

print(f"Title: {study.get('title')}")
print(f"PMID: {study.get('publicationInfo', {}).get('pubmedId')}")
print(f"Sample size: {study.get('initialSampleSize')}")

Response Fields:

  • accessionId: Study identifier (GCST ID)
  • title: Study title
  • publicationInfo: Publication details including PMID
  • initialSampleSize: Discovery cohort description
  • replicationSampleSize: Replication cohort description
  • ancestries: Population ancestry information
  • genotypingTechnologies: Genotyping array or sequencing platforms used
  • _links: Links to related resources

2. Associations

Get all associations:

GET /associations

Get specific association:

GET /associations/{associationId}

Get associations for a trait:

GET /efoTraits/{efoId}/associations

Get associations for a variant:

GET /singleNucleotidePolymorphisms/{rsId}/associations

Query Parameters:

  • projection: Response projection (e.g., associationBySnp)
  • page, size, sort: Pagination controls

Example:

import requests

# Find all associations for type 2 diabetes
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}/associations"
params = {"size": 100, "page": 0}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
data = response.json()

associations = data.get('_embedded', {}).get('associations', [])
print(f"Found {len(associations)} associations")

Response Fields:

  • rsId: Variant identifier
  • strongestAllele: Risk or effect allele
  • pvalue: Association p-value
  • pvalueText: P-value as reported (may include inequality)
  • pvalueMantissa: Mantissa of p-value
  • pvalueExponent: Exponent of p-value
  • orPerCopyNum: Odds ratio per allele copy
  • betaNum: Effect size (quantitative traits)
  • betaUnit: Unit of measurement
  • range: Confidence interval
  • standardError: Standard error
  • efoTrait: Trait name
  • mappedLabel: EFO standardized term
  • studyId: Associated study accession
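
The separate pvalueMantissa and pvalueExponent fields are useful for very small p-values. A minimal sketch, assuming the two fields encode p = mantissa × 10^exponent: computing -log10(p) directly from them avoids the floating-point underflow that occurs when p itself is rebuilt. The helper name neg_log10_p and the record values are hypothetical.

```python
import math


def neg_log10_p(mantissa, exponent):
    """Return -log10(p) for p = mantissa * 10**exponent without computing p."""
    return -(math.log10(mantissa) + exponent)


# Hypothetical record: the values are made up to show the underflow case
assoc = {"pvalueMantissa": 2, "pvalueExponent": -350}

# Rebuilding p directly underflows: the smallest positive float is about 5e-324
p_direct = assoc["pvalueMantissa"] * 10.0 ** assoc["pvalueExponent"]

# Working in log space keeps the magnitude
score = neg_log10_p(assoc["pvalueMantissa"], assoc["pvalueExponent"])

print(p_direct)         # 0.0
print(round(score, 3))  # 349.699
```

The log-space value is also what a Manhattan plot needs, so it can be used for plotting without ever forming p.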

3. Variants (Single Nucleotide Polymorphisms)

Get variant details:

GET /singleNucleotidePolymorphisms/{rsId}

Search variants:

GET /singleNucleotidePolymorphisms/search/findByRsId?rsId={rsId}
GET /singleNucleotidePolymorphisms/search/findByChromBpLocationRange?chrom={chr}&bpStart={start}&bpEnd={end}
GET /singleNucleotidePolymorphisms/search/findByGene?geneName={gene}

Example:

import requests

# Get variant information
rs_id = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant = response.json()

# Guard against a missing or empty locations list before indexing it
locations = variant.get('locations') or [{}]
loc = locations[0]

print(f"rsID: {variant.get('rsId')}")
print(f"Location: chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}")

Response Fields:

  • rsId: rs number
  • merged: Indicates if variant merged with another
  • functionalClass: Variant consequence
  • locations: Array of genomic locations
    • chromosomeName: Chromosome number
    • chromosomePosition: Base pair position
    • region: Genomic region information
  • genomicContexts: Nearby genes
  • lastUpdateDate: Last modification date

4. Traits (EFO Terms)

Get trait information:

GET /efoTraits/{efoId}

Search traits:

GET /efoTraits/search/findByEfoUri?uri={efoUri}
GET /efoTraits/search/findByTraitIgnoreCase?trait={traitName}

Example:

import requests

# Get trait details
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}"
response = requests.get(url, headers={"Content-Type": "application/json"})
trait = response.json()

print(f"Trait: {trait.get('trait')}")
print(f"EFO URI: {trait.get('uri')}")

5. Publications

Get publication information:

GET /publications
GET /publications/{publicationId}
GET /publications/search/findByPubmedId?pubmedId={pmid}

6. Genes

Get gene information:

GET /genes
GET /genes/{geneId}
GET /genes/search/findByGeneName?geneName={symbol}
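
The search endpoints listed in this section share one URL shape: /{resource}/search/{finder}?{parameters}. As a minimal sketch, assuming that pattern holds for the publications and genes endpoints above, a single helper can build any of them. The name search_url is hypothetical, the PubMed ID is a placeholder, and TCF7L2 is used purely as an example gene symbol.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"


def search_url(resource, finder, **params):
    """Build /{resource}/search/{finder}?... with URL-encoded parameters."""
    return f"{BASE_URL}/{resource}/search/{finder}?{urlencode(params)}"


# Gene lookup by symbol
gene_url = search_url("genes", "findByGeneName", geneName="TCF7L2")

# Publication lookup by PubMed ID (placeholder value)
pub_url = search_url("publications", "findByPubmedId", pubmedId="12345678")

print(gene_url)
print(pub_url)
```

The resulting strings can be passed straight to requests.get, as in the other examples in this reference.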

Pagination and Navigation

All list endpoints support pagination:

import requests

def get_all_associations(trait_id):
    """Retrieve all associations for a trait with pagination"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    all_associations = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Content-Type": "application/json"})

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        all_associations.extend(associations)
        page += 1

    return all_associations

Responses include _links for resource navigation:

import requests

# Get study and follow links to associations
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795")
study = response.json()

# Follow link to associations
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url)
associations = associations_response.json()

Summary Statistics API

Access full GWAS summary statistics for studies that have deposited complete data.

Base URL

https://www.ebi.ac.uk/gwas/summary-statistics/api

Core Endpoints

1. Studies

Get all studies with summary statistics:

GET /studies

Get specific study:

GET /studies/{gcstId}

2. Traits

Get trait information:

GET /traits/{efoId}

Get associations for a trait:

GET /traits/{efoId}/associations

Query Parameters:

  • p_lower: Lower p-value threshold
  • p_upper: Upper p-value threshold
  • size: Number of results
  • page: Page number

Example:

import requests

# Find highly significant associations for a trait
trait_id = "EFO_0001360"
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/traits/{trait_id}/associations"
params = {
    "p_upper": "0.000000001",  # p < 1e-9
    "size": 100
}
response = requests.get(url, params=params)
results = response.json()

3. Chromosomes

Get associations by chromosome:

GET /chromosomes/{chromosome}/associations

Query by genomic region:

GET /chromosomes/{chromosome}/associations?start={start}&end={end}

Example:

import requests

# Query variants in a specific region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/chromosomes/{chromosome}/associations"
params = {
    "start": start_pos,
    "end": end_pos,
    "size": 1000
}
response = requests.get(url, params=params)
variants = response.json()

4. Variants

Get specific variant across studies:

GET /variants/{variantId}

Get associations for a variant:

GET /variants/{variantId}/associations

Response Fields

Association Fields:

  • variant_id: Variant identifier
  • chromosome: Chromosome number
  • base_pair_location: Position (bp)
  • effect_allele: Effect allele
  • other_allele: Non-effect allele (not necessarily the reference allele)
  • effect_allele_frequency: Allele frequency
  • beta: Effect size
  • standard_error: Standard error
  • p_value: P-value
  • ci_lower: Lower bound of the confidence interval
  • ci_upper: Upper bound of the confidence interval
  • odds_ratio: Odds ratio (case-control studies)
  • study_accession: GCST ID
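
The beta and standard_error fields can be combined into a z-score, which gives a quick sanity check on a record. The sketch below assumes a Wald-type test, so the derived p-value is only an approximate cross-check against the reported p_value; studies that used a different test will not match exactly. The function names and the record values are hypothetical.

```python
import math


def z_score(record):
    """Return beta / standard_error, or None if either value is unusable."""
    beta = record.get("beta")
    se = record.get("standard_error")
    if beta is None or se is None or float(se) == 0:
        return None
    return float(beta) / float(se)


def two_sided_p(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical record with made-up values, using the field names listed above
record = {"variant_id": "rs0000000", "beta": 0.1, "standard_error": 0.02}

z = z_score(record)
print(z, two_sided_p(z))
```

Values are passed through float() because JSON clients sometimes receive numeric fields as strings.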

Response Formats

Content Type

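Both APIs return JSON. Most examples in this reference send the header:

Content-Type: application/json

Strictly speaking, Content-Type describes a request body, which a GET request does not carry. The standard HTTP header for asking a server to respond with JSON is Accept: application/json.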

HAL Format

Responses follow the HAL (Hypertext Application Language) specification:

{
  "_embedded": {
    "associations": [
      {
        "rsId": "rs7903146",
        "pvalue": 1.2e-30,
        "efoTrait": "type 2 diabetes",
        "_links": {
          "self": {
            "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/12345"
          }
        }
      }
    ]
  },
  "_links": {
    "self": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=0"
    },
    "next": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=1"
    }
  },
  "page": {
    "size": 20,
    "totalElements": 1523,
    "totalPages": 77,
    "number": 0
  }
}

Page Metadata

Paginated responses include page information:

  • size: Items per page
  • totalElements: Total number of results
  • totalPages: Total number of pages
  • number: Current page number (0-indexed)
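
When a paginated response carries a next link like the one in the HAL example above, a client can follow it instead of counting pages. The sketch below assumes the last page simply omits next; iter_hal_items and its fetch argument are hypothetical names. Injecting fetch keeps the generator independent of any HTTP library.

```python
def iter_hal_items(first_url, fetch, collection):
    """Yield every item of a paginated HAL collection by following `next` links.

    `fetch` is any callable mapping a URL to parsed JSON; `collection` is the
    key under `_embedded` (for example "associations").
    """
    url = first_url
    visited = set()
    while url and url not in visited:
        visited.add(url)  # guard against a server that links back to itself
        page = fetch(url)
        yield from page.get("_embedded", {}).get(collection, [])
        url = page.get("_links", {}).get("next", {}).get("href")
```

With requests, fetch could be lambda url: requests.get(url, timeout=30).json(), and the generator can be consumed lazily so that a loop may stop early without downloading every page.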

Error Handling

HTTP Status Codes

  • 200 OK: Successful request
  • 400 Bad Request: Invalid parameters
  • 404 Not Found: Resource not found
  • 500 Internal Server Error: Server error

Error Response Format

{
  "timestamp": "2025-10-19T12:00:00.000+00:00",
  "status": 404,
  "error": "Not Found",
  "message": "No association found with id: 12345",
  "path": "/gwas/rest/api/associations/12345"
}

Error Handling Example

import requests

def safe_api_request(url, params=None):
    """Make API request with error handling"""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
        print(f"Response: {response.text}")
        return None
    # Timeout is caught first: ConnectTimeout subclasses both Timeout and ConnectionError
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    except requests.exceptions.ConnectionError:
        print("Connection error - check network")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return None
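
Server errors such as 500 are often transient, whereas 400 and 404 will fail again with the same request. A minimal sketch of retrying only the former with exponential backoff follows; get_with_retry, its defaults, and the injected send callable are hypothetical, and the assumption that 5xx responses are worth retrying is a general HTTP convention rather than documented GWAS Catalog behavior.

```python
import time


def get_with_retry(send, url, retries=3, backoff=0.5, sleep=time.sleep):
    """Call send(url), retrying on 5xx responses with exponential backoff.

    `send` is any callable that returns an object with a `status_code`
    attribute, such as a wrapper around requests.get.
    """
    for attempt in range(retries + 1):
        response = send(url)
        # Client errors (4xx) will not succeed on retry, so only retry 5xx
        if response.status_code < 500 or attempt == retries:
            return response
        sleep(backoff * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
```

A typical call would be get_with_retry(lambda u: requests.get(u, timeout=30), url). The final response is returned even when it is still an error, so the caller decides how to report it.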

Advanced Query Patterns

1. Cross-referencing Variants and Traits

import requests

def get_variant_pleiotropy(rs_id):
    """Get all traits associated with a variant"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/{rs_id}/associations"
    params = {"projection": "associationBySnp"}

    response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
    data = response.json()

    traits = {}
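    for assoc in data.get('_embedded', {}).get('associations', []):
        trait = assoc.get('efoTrait')
        pvalue = assoc.get('pvalue')
        # Skip records without a trait or p-value; float(None) would raise TypeError
        if not trait or pvalue is None:
            continue
        # Keep the smallest p-value seen for each trait
        if trait not in traits or float(pvalue) < float(traits[trait]):
            traits[trait] = pvalue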

    return traits

# Example usage
pleiotropy = get_variant_pleiotropy('rs7903146')
for trait, pval in sorted(pleiotropy.items(), key=lambda x: float(x[1])):
    print(f"{trait}: p={pval}")

2. Filtering by P-value Threshold

import requests

def get_significant_associations(trait_id, p_threshold=5e-8):
    """Get genome-wide significant associations"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Content-Type": "application/json"})

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            # Compare against None so a p-value reported as 0 is not discarded
            if pvalue is not None and float(pvalue) <= p_threshold:
                results.append(assoc)

        page += 1

    return results

3. Combining Main and Summary Statistics APIs

import requests

def get_complete_variant_data(rs_id):
    """Get variant data from both APIs"""
    main_url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"

    # Get basic variant info
    response = requests.get(main_url, headers={"Content-Type": "application/json"})
    variant_info = response.json()

    # Get associations
    assoc_url = f"{main_url}/associations"
    response = requests.get(assoc_url, headers={"Content-Type": "application/json"})
    associations = response.json()

    # Could also query summary statistics API for this variant
    # across all studies with summary data

    return {
        "variant": variant_info,
        "associations": associations
    }

4. Genomic Region Queries

import requests

def query_region(chromosome, start, end, p_threshold=None):
    """Query variants in genomic region"""
    # From main API
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
    params = {
        "chrom": chromosome,
        "bpStart": start,
        "bpEnd": end,
        "size": 1000
    }

    response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
    variants = response.json()

    # Can also query summary statistics API
    sumstats_url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chromosome}/associations"
    sumstats_params = {"start": start, "end": end, "size": 1000}
    if p_threshold:
        sumstats_params["p_upper"] = str(p_threshold)

    sumstats_response = requests.get(sumstats_url, params=sumstats_params)
    sumstats = sumstats_response.json()

    return {
        "catalog_variants": variants,
        "summary_stats": sumstats
    }

Integration Examples

Complete Workflow: Disease Genetic Architecture

import requests
import pandas as pd
from time import sleep

class GWASCatalogQuery:
    def __init__(self):
        self.base_url = "https://www.ebi.ac.uk/gwas/rest/api"
        self.headers = {"Content-Type": "application/json"}

    def get_trait_associations(self, trait_id, p_threshold=5e-8):
        """Get all associations for a trait"""
        url = f"{self.base_url}/efoTraits/{trait_id}/associations"
        results = []
        page = 0

        while True:
            params = {"page": page, "size": 100}
            response = requests.get(url, params=params, headers=self.headers)

            if response.status_code != 200:
                break

            data = response.json()
            associations = data.get('_embedded', {}).get('associations', [])

            if not associations:
                break

            for assoc in associations:
                pvalue = assoc.get('pvalue')
                # Compare against None so a p-value reported as 0 is not discarded
                if pvalue is not None and float(pvalue) <= p_threshold:
                    results.append({
                        'rs_id': assoc.get('rsId'),
                        'pvalue': float(pvalue),
                        'risk_allele': assoc.get('strongestAllele'),
                        'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                        'study': assoc.get('studyId'),
                        'pubmed_id': assoc.get('pubmedId')
                    })

            page += 1
            sleep(0.1)

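        # Name the columns explicitly so an empty result still has them
        columns = ['rs_id', 'pvalue', 'risk_allele', 'or_beta', 'study', 'pubmed_id']
        return pd.DataFrame(results, columns=columns)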

    def get_variant_details(self, rs_id):
        """Get detailed variant information"""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/{rs_id}"
        response = requests.get(url, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        return None

    def get_gene_associations(self, gene_name):
        """Get variants associated with a gene"""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/search/findByGene"
        params = {"geneName": gene_name}
        response = requests.get(url, params=params, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        return None

# Example usage
gwas = GWASCatalogQuery()

# Query type 2 diabetes associations
df = gwas.get_trait_associations('EFO_0001360')
print(f"Found {len(df)} genome-wide significant associations")
print(f"Unique variants: {df['rs_id'].nunique()}")

# Get top variants
top_variants = df.nsmallest(10, 'pvalue')
print("\nTop 10 variants:")
print(top_variants[['rs_id', 'pvalue', 'risk_allele']])

# Get details for top variant
if len(top_variants) > 0:
    top_rs = top_variants.iloc[0]['rs_id']
    variant_info = gwas.get_variant_details(top_rs)
    if variant_info:
        # Guard against a missing or empty locations list before indexing it
        loc = (variant_info.get('locations') or [{}])[0]
        print(f"\n{top_rs} location: chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}")

FTP Download Integration

import requests
from pathlib import Path

def download_summary_statistics(gcst_id, output_dir="."):
    """Download summary statistics from FTP"""
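    # The FTP site is also served over HTTP
    ftp_base = "http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics"

    # Assumed location of the harmonised file; confirm the layout against the FTP listing
    harmonised_url = f"{ftp_base}/{gcst_id}/harmonised/{gcst_id}-harmonised.tsv.gz"

    # Create the output directory if it does not exist yet
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    output_path = out_dir / f"{gcst_id}.tsv.gz"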

    try:
        response = requests.get(harmonised_url, stream=True)
        response.raise_for_status()

        with open(output_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"Downloaded {gcst_id} to {output_path}")
        return output_path

    except requests.exceptions.HTTPError:
        print(f"Harmonised file not found for {gcst_id}")
        return None

# Example usage
download_summary_statistics("GCST001234", output_dir="./sumstats")

Additional Resources