zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills


Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00


GWAS Catalog API Reference

Comprehensive reference for the GWAS Catalog REST APIs, including endpoint specifications, query parameters, response formats, and advanced usage patterns.

API Overview
Authentication and Rate Limiting
GWAS Catalog REST API
Summary Statistics API
Response Formats
Error Handling
Advanced Query Patterns
Integration Examples

API Overview

The GWAS Catalog provides two complementary REST APIs:

GWAS Catalog REST API: Access to curated SNP-trait associations, studies, and metadata
Summary Statistics API: Access to full GWAS summary statistics (all tested variants)

Both APIs use RESTful design principles with JSON responses in HAL (Hypertext Application Language) format, which includes _links for resource navigation.

Base URLs

GWAS Catalog API:         https://www.ebi.ac.uk/gwas/rest/api
Summary Statistics API:   https://www.ebi.ac.uk/gwas/summary-statistics/api

Version Information

The GWAS Catalog REST API v2.0 was released in 2024, with significant improvements:

New endpoints (publications, genes, genomic context, ancestries)
Enhanced data exposure (cohorts, background traits, licenses)
Improved query capabilities
Better performance and documentation

The previous API version remains available until May 2026 for backward compatibility.

Authentication and Rate Limiting

Authentication

No authentication is required. Both APIs are open access and do not require API keys or registration.

Rate Limiting

While no explicit rate limits are documented, follow best practices:

Implement delays between consecutive requests (e.g., 0.1-0.5 seconds)
Use pagination for large result sets
Cache responses locally
Use bulk downloads (FTP) for genome-wide data
Avoid hammering the API with rapid consecutive requests
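The practices above can be bundled into a reusable `requests.Session` that retries transient failures with exponential backoff. This is a minimal sketch, assuming `requests` with urllib3 1.26 or newer (needed for the `allowed_methods` argument); the helper name `make_session`, the retry count, the backoff factor, and the retried status codes are illustrative choices, not documented GWAS Catalog limits.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff=0.5):
    """Return a Session that retries transient GET failures with exponential backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # the pause between retries grows exponentially
        status_forcelist=[429, 500, 502, 503, 504],  # statuses treated as transient
        allowed_methods=["GET"],  # only retry idempotent reads
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({"Accept": "application/json"})
    return session
```

Use it like `requests`, e.g. `make_session().get(url, params=params, timeout=30)`; a single session also reuses connections across paginated calls.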

Example with rate limiting:

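import requests
from time import sleep

def query_with_rate_limit(url, delay=0.1, params=None):
    """GET a URL and return its JSON, pausing afterwards to space out requests"""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
        return response.json()
    finally:
        sleep(delay)  # pause even if the request failed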
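`query_with_rate_limit` always sleeps after a request, even when the next call comes much later. A throttle can instead sleep only when the previous call was too recent. This is a minimal sketch; the class name `Throttle` and the 0.2 s default interval are hypothetical choices, not values from the GWAS Catalog documentation.

```python
import time


class Throttle:
    """Client-side rate limiter enforcing a minimum interval between calls."""

    def __init__(self, min_interval=0.2):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self):
        """Sleep only as long as needed so calls are at least min_interval apart."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each `requests.get(...)`; time spent parsing a response then counts toward the interval instead of being added to it.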

GWAS Catalog REST API

The main API provides access to curated GWAS associations, studies, variants, and traits.

Core Endpoints

1. Studies

Get all studies:

GET /studies

Get specific study:

GET /studies/{accessionId}

Search studies:

GET /studies/search/findByPublicationIdPubmedId?pubmedId={pmid}
GET /studies/search/findByDiseaseTrait?diseaseTrait={trait}

Query Parameters:

page: Page number (0-indexed)
size: Results per page (default: 20)
sort: Sort field (e.g., publicationDate,desc)

Example:

import requests

# Get a specific study
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Content-Type": "application/json"})
study = response.json()

print(f"Title: {study.get('title')}")
print(f"PMID: {study.get('publicationInfo', {}).get('pubmedId')}")
print(f"Sample size: {study.get('initialSampleSize')}")
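The `page`, `size`, and `sort` parameters listed above are plain query-string values. The sketch below uses `requests.Request(...).prepare()` to show the URL that would be sent for one page of `/studies`, without making a network call. The helper name `studies_page_url` is hypothetical, and whether `publicationDate` is an accepted sort field for this endpoint is an assumption carried over from the parameter list.

```python
import requests

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"


def studies_page_url(page=0, size=20, sort="publicationDate,desc"):
    """Build (but do not send) the URL for one page of GET /studies."""
    params = {"page": page, "size": size}
    if sort:
        params["sort"] = sort  # Spring-style "field,direction"
    prepared = requests.Request("GET", f"{BASE_URL}/studies", params=params).prepare()
    return prepared.url
```

Passing the same `params` dict to `requests.get` sends exactly this URL; `requests` percent-encodes the comma in the `sort` value.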

Response Fields:

accessionId: Study identifier (GCST ID)
title: Study title
publicationInfo: Publication details including PMID
initialSampleSize: Discovery cohort description
replicationSampleSize: Replication cohort description
ancestries: Population ancestry information
genotypingTechnologies: Genotyping technologies used (e.g., genome-wide array or sequencing)
_links: Links to related resources
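When collecting many studies, it can help to flatten the fields listed above into one small record per study. This is a minimal sketch that assumes only the field names in the list above; the helper name `summarize_study` and the output keys are hypothetical, and missing fields simply become `None`.

```python
def summarize_study(study):
    """Flatten selected study fields into a small dict; absent keys map to None."""
    pub = study.get("publicationInfo") or {}  # tolerate a missing or null object
    return {
        "accession": study.get("accessionId"),
        "pubmed_id": pub.get("pubmedId"),
        "initial_sample": study.get("initialSampleSize"),
        "replication_sample": study.get("replicationSampleSize"),
        "n_ancestry_entries": len(study.get("ancestries") or []),
    }
```

A list of such dicts converts directly to a table, e.g. with `pandas.DataFrame(records)`.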

2. Associations

Get all associations:

GET /associations

Get specific association:

GET /associations/{associationId}

Get associations for a trait:

GET /efoTraits/{efoId}/associations

Get associations for a variant:

GET /singleNucleotidePolymorphisms/{rsId}/associations

Query Parameters:

projection: Response projection (e.g., associationBySnp)
page, size, sort: Pagination controls

Example:

import requests

# Fetch the first page (up to 100) of associations for type 2 diabetes
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}/associations"
params = {"size": 100, "page": 0}
response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
data = response.json()

associations = data.get('_embedded', {}).get('associations', [])
print(f"Found {len(associations)} associations on this page")

Response Fields:

rsId: Variant identifier
strongestAllele: Risk or effect allele
pvalue: Association p-value
pvalueText: P-value as reported (may include inequality)
pvalueMantissa: Mantissa of p-value
pvalueExponent: Exponent of p-value
orPerCopyNum: Odds ratio per allele copy
betaNum: Effect size (quantitative traits)
betaUnit: Unit of measurement
range: Confidence interval
standardError: Standard error
efoTrait: Trait name
mappedLabel: EFO standardized term
studyId: Associated study accession
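Because `pvalueMantissa` and `pvalueExponent` are reported separately, very small p-values can be handled in log space instead of as floats, which underflow to 0.0 below roughly 1e-308. A minimal sketch; the helper names are hypothetical, and 5e-8 is the conventional genome-wide significance threshold.

```python
import math


def neg_log10_p(mantissa, exponent):
    """Return -log10(p) for p = mantissa * 10**exponent without forming p itself."""
    if mantissa <= 0:
        raise ValueError("mantissa must be positive")
    return -(math.log10(mantissa) + exponent)


def is_genome_wide_significant(mantissa, exponent, threshold=5e-8):
    """Compare p against the threshold on the -log10 scale."""
    return neg_log10_p(mantissa, exponent) >= -math.log10(threshold)
```

For example, a mantissa of 1 with an exponent of -400 gives exactly 400, whereas `1 * 10.0 ** -400` would evaluate to 0.0.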

3. Variants (Single Nucleotide Polymorphisms)

Get variant details:

GET /singleNucleotidePolymorphisms/{rsId}

Search variants:

GET /singleNucleotidePolymorphisms/search/findByRsId?rsId={rsId}
GET /singleNucleotidePolymorphisms/search/findByChromBpLocationRange?chrom={chr}&bpStart={start}&bpEnd={end}
GET /singleNucleotidePolymorphisms/search/findByGene?geneName={gene}

Example:

import requests

# Get variant information
rs_id = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
response = requests.get(url, headers={"Content-Type": "application/json"})
variant = response.json()

print(f"rsID: {variant.get('rsId')}")
locations = variant.get('locations') or [{}]  # guard against a missing or empty list
loc = locations[0]
print(f"Location: chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}")

Response Fields:

rsId: rs number
merged: Indicates if variant merged with another
functionalClass: Variant consequence
locations: Array of genomic locations
- chromosomeName: Chromosome name (e.g., 10 or X)
- chromosomePosition: Base-pair position
- region: Genomic region information
genomicContexts: Nearby genes
lastUpdateDate: Last modification date

4. Traits (EFO Terms)

Get trait information:

GET /efoTraits/{efoId}

Search traits:

GET /efoTraits/search/findByEfoUri?uri={efoUri}
GET /efoTraits/search/findByTraitIgnoreCase?trait={traitName}

Example:

import requests

# Get trait details
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}"
response = requests.get(url, headers={"Content-Type": "application/json"})
trait = response.json()

print(f"Trait: {trait.get('trait')}")
print(f"EFO URI: {trait.get('uri')}")
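`findByEfoUri` takes a full ontology URI rather than a short ID. The sketch below converts between the two, assuming EFO-namespace terms use the `http://www.ebi.ac.uk/efo/` prefix; traits imported from other ontologies (for example MONDO) use different prefixes, which is why the reverse direction just takes the last path segment. The helper names are hypothetical.

```python
EFO_URI_PREFIX = "http://www.ebi.ac.uk/efo/"


def efo_id_to_uri(efo_id):
    """Build the URI for an EFO-namespace term, e.g. EFO_0001360."""
    return EFO_URI_PREFIX + efo_id


def uri_to_short_id(uri):
    """Return the last path segment of an ontology URI as its short ID."""
    return uri.rstrip("/").rsplit("/", 1)[-1]
```

Usage: `params = {"uri": efo_id_to_uri("EFO_0001360")}` for `/efoTraits/search/findByEfoUri`.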

5. Publications

Get publication information:

GET /publications
GET /publications/{publicationId}
GET /publications/search/findByPubmedId?pubmedId={pmid}

6. Genes

Get gene information:

GET /genes
GET /genes/{geneId}
GET /genes/search/findByGeneName?geneName={symbol}

Pagination and Navigation

All list endpoints support pagination:

import requests

def get_all_associations(trait_id):
    """Retrieve all associations for a trait with pagination"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    all_associations = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Content-Type": "application/json"})

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        all_associations.extend(associations)
        page += 1

    return all_associations
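The loop above stops only when a page comes back empty, so an endpoint that ignored the `page` parameter would repeat forever. An alternative is to follow the `_links.next` href that HAL responses carry (see the HAL Format example later in this reference). This is a minimal sketch; `iter_hal_items` is a hypothetical helper, and `fetch` is any callable that returns the decoded JSON for a URL.

```python
def iter_hal_items(first_url, key, fetch):
    """Yield embedded items from a paginated HAL collection by following _links.next.

    Stops when a page has no 'next' link, or when a link repeats.
    """
    url = first_url
    seen = set()
    while url and url not in seen:
        seen.add(url)
        data = fetch(url)
        yield from data.get("_embedded", {}).get(key, [])
        url = data.get("_links", {}).get("next", {}).get("href")
```

Usage: `fetch = lambda u: requests.get(u, timeout=30).json()`, then `list(iter_hal_items(url + "?size=100", "associations", fetch))`.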

HAL Links

Responses include _links for resource navigation:

import requests

# Get a study, then follow its HAL link to its associations
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795", timeout=30)
study = response.json()

# HAL hrefs can be URI templates (e.g. ending in "{?projection}"); drop the template part
associations_url = study['_links']['associations']['href'].split('{')[0]
associations_response = requests.get(associations_url, timeout=30)
associations = associations_response.json()

Summary Statistics API

Access full GWAS summary statistics for studies that have deposited complete data.

Base URL

https://www.ebi.ac.uk/gwas/summary-statistics/api

Core Endpoints

1. Studies

Get all studies with summary statistics:

GET /studies

Get specific study:

GET /studies/{gcstId}

2. Traits

Get trait information:

GET /traits/{efoId}

Get associations for a trait:

GET /traits/{efoId}/associations

Query Parameters:

p_lower: Lower p-value threshold
p_upper: Upper p-value threshold
size: Number of results
start: Offset of the first result to return (used for pagination)

Example:

import requests

# Find highly significant associations for a trait
trait_id = "EFO_0001360"
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/traits/{trait_id}/associations"
params = {
    "p_upper": "0.000000001",  # p < 1e-9
    "size": 100
}
response = requests.get(url, params=params)
results = response.json()
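Depending on the endpoint and API version, `_embedded.associations` in Summary Statistics API responses may come back as an object keyed by position ("0", "1", ...) rather than as an array; treat this as an assumption to check against a live response. The sketch below normalizes either shape to a list. The helper name is hypothetical.

```python
def embedded_associations(payload):
    """Return the embedded associations of a response as a list.

    Accepts a JSON array or a JSON object keyed by position; json.loads keeps
    key order, so object values come back in response order.
    """
    assocs = payload.get("_embedded", {}).get("associations", [])
    if isinstance(assocs, dict):
        return list(assocs.values())
    return list(assocs)
```

Usage: `for assoc in embedded_associations(results): print(assoc.get("p_value"))`.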

3. Chromosomes

Get associations by chromosome:

GET /chromosomes/{chromosome}/associations

Query by genomic region:

GET /chromosomes/{chromosome}/associations?bp_lower={start}&bp_upper={end}

Example:

import requests

# Query variants in a specific region
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/chromosomes/{chromosome}/associations"
params = {
    "bp_lower": start_pos,  # lower base-pair bound of the region
    "bp_upper": end_pos,    # upper base-pair bound of the region
    "size": 1000
}
response = requests.get(url, params=params)
variants = response.json()
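For wide regions, splitting the interval into fixed-size windows keeps each response small and makes failures cheaper to retry. This is a minimal sketch; the helper name `region_windows` and the 100 kb default are arbitrary, and it assumes inclusive bounds. Each `(lo, hi)` pair would be sent as the lower and upper base-pair bounds of one chromosome query.

```python
def region_windows(start, end, window=100_000):
    """Split the closed interval [start, end] into consecutive, non-overlapping windows."""
    if window <= 0 or end < start:
        raise ValueError("need window > 0 and end >= start")
    lo = start
    while lo <= end:
        hi = min(lo + window - 1, end)  # last window is clipped to `end`
        yield lo, hi
        lo = hi + 1
```

For the region above, `list(region_windows(114000000, 115000000))` yields 11 windows, the last covering a single base pair.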

4. Variants

Get specific variant across studies:

GET /variants/{variantId}

Get associations for a variant across studies:

GET /variants/{variantId}/associations

Response Fields

Association Fields:

variant_id: Variant identifier
chromosome: Chromosome number
base_pair_location: Position (bp)
effect_allele: Effect allele
other_allele: Non-effect (other) allele; not necessarily the reference allele
effect_allele_frequency: Frequency of the effect allele
beta: Effect size
standard_error: Standard error
p_value: P-value
ci_lower: Lower bound of the confidence interval
ci_upper: Upper bound of the confidence interval
odds_ratio: Odds ratio (case-control studies)
study_accession: GCST ID
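The `beta`, `standard_error`, and `p_value` fields are linked by the Wald test, z = beta / SE with two-sided p = 2(1 - Φ(|z|)), which is handy for sanity-checking records. A minimal sketch; the helper names are hypothetical, and the odds-ratio interval assumes `beta` is a log odds ratio, which should be confirmed for each case-control study.

```python
import math


def z_and_p(beta, standard_error):
    """Wald z statistic and two-sided p-value from an effect size and its SE."""
    z = beta / standard_error
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def odds_ratio_ci(beta, standard_error, z_crit=1.96):
    """Approximate 95% CI for an odds ratio when beta is a log odds ratio."""
    return (math.exp(beta - z_crit * standard_error),
            math.exp(beta + z_crit * standard_error))
```

For example, `z_and_p(0.1, 0.05)` gives z = 2 and p ≈ 0.0455; a large mismatch with the reported `p_value` can flag a scale or sign problem.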

Response Formats

Content Type

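The examples in this reference send the header:

Content-Type: application/json

Content-Type describes a request body, so it carries no meaning for GET requests; to request JSON explicitly, send Accept: application/json instead.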

HAL Format

Responses follow the HAL (Hypertext Application Language) specification:

{
  "_embedded": {
    "associations": [
      {
        "rsId": "rs7903146",
        "pvalue": 1.2e-30,
        "efoTrait": "type 2 diabetes",
        "_links": {
          "self": {
            "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/12345"
          }
        }
      }
    ]
  },
  "_links": {
    "self": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=0"
    },
    "next": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=1"
    }
  },
  "page": {
    "size": 20,
    "totalElements": 1523,
    "totalPages": 77,
    "number": 0
  }
}

Page Metadata

Paginated responses include page information:

size: Items per page
totalElements: Total number of results
totalPages: Total number of pages
number: Current page number (0-indexed)
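The page metadata tells a client up front how many requests remain. Using the example values above, 1523 results at 20 per page give 77 pages, numbered 0 to 76. This is a minimal sketch; the helper name is hypothetical, and it falls back to deriving the page count from `totalElements` and `size` if `totalPages` is absent.

```python
import math


def remaining_page_numbers(page_meta):
    """Page numbers still to fetch after the current one, from a HAL 'page' object."""
    total_pages = page_meta.get("totalPages")
    if total_pages is None:
        # Derive the page count when only totalElements and size are present
        total_pages = math.ceil(page_meta["totalElements"] / page_meta["size"])
    return list(range(page_meta["number"] + 1, total_pages))
```

Knowing the remaining page numbers in advance also makes it easy to show progress or to split the work across a throttled loop.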

Error Handling

HTTP Status Codes

200 OK: Successful request
400 Bad Request: Invalid parameters
404 Not Found: Resource not found
500 Internal Server Error: Server error

Error Response Format

{
  "timestamp": "2025-10-19T12:00:00.000+00:00",
  "status": 404,
  "error": "Not Found",
  "message": "No association found with id: 12345",
  "path": "/gwas/rest/api/associations/12345"
}

Error Handling Example

import requests

def safe_api_request(url, params=None):
    """Make API request with error handling"""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        print(f"HTTP Error: {e}")
        print(f"Response: {response.text}")
        return None
    except requests.exceptions.ConnectionError:
        print("Connection error - check network")
        return None
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return None

Advanced Query Patterns

1. Cross-referencing Variants and Traits

import requests

def get_variant_pleiotropy(rs_id):
    """Get all traits associated with a variant"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/{rs_id}/associations"
    params = {"projection": "associationBySnp"}

    response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
    data = response.json()

    traits = {}
    for assoc in data.get('_embedded', {}).get('associations', []):
        trait = assoc.get('efoTrait')
        pvalue = assoc.get('pvalue')
        if trait and pvalue is not None:
            # Keep the smallest p-value seen for each trait
            if trait not in traits or float(pvalue) < float(traits[trait]):
                traits[trait] = pvalue

    return traits

# Example usage
pleiotropy = get_variant_pleiotropy('rs7903146')
for trait, pval in sorted(pleiotropy.items(), key=lambda x: float(x[1])):
    print(f"{trait}: p={pval}")

2. Filtering by P-value Threshold

import requests

def get_significant_associations(trait_id, p_threshold=5e-8):
    """Get genome-wide significant associations"""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Content-Type": "application/json"})

        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])

        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue is not None and float(pvalue) <= p_threshold:
                results.append(assoc)

        page += 1

    return results
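A trait often has several associations for the same variant, reported by different studies. The sketch below reduces the list returned by `get_significant_associations` to the strongest association per variant. It assumes the `rsId` and `pvalue` field names from the Response Fields list; the helper name is hypothetical.

```python
def strongest_per_variant(associations):
    """Keep, for each rsId, the association with the smallest p-value.

    Records without an rsId or p-value are skipped.
    """
    best = {}
    for assoc in associations:
        rs_id, pvalue = assoc.get("rsId"), assoc.get("pvalue")
        if rs_id is None or pvalue is None:
            continue
        # float() accepts both numeric and string p-values
        if rs_id not in best or float(pvalue) < float(best[rs_id]["pvalue"]):
            best[rs_id] = assoc
    return best
```

Usage: `len(strongest_per_variant(get_significant_associations("EFO_0001360")))` counts distinct significant variants.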

3. Combining Main and Summary Statistics APIs

import requests

def get_complete_variant_data(rs_id):
    """Get variant data from both APIs"""
    main_url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"

    # Get basic variant info
    response = requests.get(main_url, headers={"Content-Type": "application/json"})
    variant_info = response.json()

    # Get associations
    assoc_url = f"{main_url}/associations"
    response = requests.get(assoc_url, headers={"Content-Type": "application/json"})
    associations = response.json()

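    # Look up the same variant in the Summary Statistics API; a non-200 status
    # is treated as "no summary statistics available for this variant"
    sumstats_url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/variants/{rs_id}/associations"
    response = requests.get(sumstats_url, timeout=30)
    summary_stats = response.json() if response.status_code == 200 else None

    return {
        "variant": variant_info,
        "associations": associations,
        "summary_stats": summary_stats
    }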

4. Genomic Region Queries

import requests

def query_region(chromosome, start, end, p_threshold=None):
    """Query variants in genomic region"""
    # From main API
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
    params = {
        "chrom": chromosome,
        "bpStart": start,
        "bpEnd": end,
        "size": 1000
    }

    response = requests.get(url, params=params, headers={"Content-Type": "application/json"})
    variants = response.json()

    # Can also query summary statistics API
    sumstats_url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chromosome}/associations"
    sumstats_params = {"bp_lower": start, "bp_upper": end, "size": 1000}
    if p_threshold is not None:
        sumstats_params["p_upper"] = str(p_threshold)

    sumstats_response = requests.get(sumstats_url, params=sumstats_params)
    sumstats = sumstats_response.json()

    return {
        "catalog_variants": variants,
        "summary_stats": sumstats
    }

Integration Examples

Complete Workflow: Disease Genetic Architecture

import requests
import pandas as pd
from time import sleep

class GWASCatalogQuery:
    def __init__(self):
        self.base_url = "https://www.ebi.ac.uk/gwas/rest/api"
        self.headers = {"Content-Type": "application/json"}

    def get_trait_associations(self, trait_id, p_threshold=5e-8):
        """Get all associations for a trait"""
        url = f"{self.base_url}/efoTraits/{trait_id}/associations"
        results = []
        page = 0

        while True:
            params = {"page": page, "size": 100}
            response = requests.get(url, params=params, headers=self.headers)

            if response.status_code != 200:
                break

            data = response.json()
            associations = data.get('_embedded', {}).get('associations', [])

            if not associations:
                break

            for assoc in associations:
                pvalue = assoc.get('pvalue')
                if pvalue is not None and float(pvalue) <= p_threshold:
                    results.append({
                        'rs_id': assoc.get('rsId'),
                        'pvalue': float(pvalue),
                        'risk_allele': assoc.get('strongestAllele'),
                        'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                        'study': assoc.get('studyId'),
                        'pubmed_id': assoc.get('pubmedId')
                    })

            page += 1
            sleep(0.1)

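        # Fix the columns so an empty result still has them (df['rs_id'] would otherwise fail)
        columns = ['rs_id', 'pvalue', 'risk_allele', 'or_beta', 'study', 'pubmed_id']
        return pd.DataFrame(results, columns=columns)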

    def get_variant_details(self, rs_id):
        """Get detailed variant information"""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/{rs_id}"
        response = requests.get(url, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        return None

    def get_gene_associations(self, gene_name):
        """Get variants associated with a gene"""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/search/findByGene"
        params = {"geneName": gene_name}
        response = requests.get(url, params=params, headers=self.headers)

        if response.status_code == 200:
            return response.json()
        return None

# Example usage
gwas = GWASCatalogQuery()

# Query type 2 diabetes associations
df = gwas.get_trait_associations('EFO_0001360')
print(f"Found {len(df)} genome-wide significant associations")
print(f"Unique variants: {df['rs_id'].nunique()}")

# Get top variants
top_variants = df.nsmallest(10, 'pvalue')
print("\nTop 10 variants:")
print(top_variants[['rs_id', 'pvalue', 'risk_allele']])

# Get details for top variant
if len(top_variants) > 0:
    top_rs = top_variants.iloc[0]['rs_id']
    variant_info = gwas.get_variant_details(top_rs)
    if variant_info:
        loc = (variant_info.get('locations') or [{}])[0]  # guard against an empty list
        print(f"\n{top_rs} location: chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}")

FTP Download Integration

import requests
from pathlib import Path

def download_summary_statistics(gcst_id, output_dir="."):
    """Download summary statistics from FTP"""
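    # Assumed URL pattern: the FTP site may group studies into range directories,
    # and harmonised file names vary between studies, so check a listing if this 404s
    ftp_base = "https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics"

    # Request the harmonised file
    harmonised_url = f"{ftp_base}/{gcst_id}/harmonised/{gcst_id}-harmonised.tsv.gz"

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # e.g. ./sumstats may not exist yet
    output_path = output_dir / f"{gcst_id}.tsv.gz"

    try:
        response = requests.get(harmonised_url, stream=True, timeout=60)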
        response.raise_for_status()

        with open(output_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"Downloaded {gcst_id} to {output_path}")
        return output_path

    except requests.exceptions.HTTPError:
        print(f"Harmonised file not found for {gcst_id}")
        return None

# Example usage
download_summary_statistics("GCST001234", output_dir="./sumstats")
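The public FTP site appears to group study directories into ranges of 1,000 accessions (for example `GCST001001-GCST002000/GCST001234/`), so the flat path above may not resolve. The sketch below computes that range directory under this assumption; verify the layout against a directory listing before relying on it. The helper name `ftp_range_dir` is hypothetical.

```python
def ftp_range_dir(gcst_id, block=1000):
    """Return the assumed range directory for a study, e.g. GCST001234 -> GCST001001-GCST002000."""
    digits = gcst_id[4:]
    if not gcst_id.startswith("GCST") or not digits.isdigit():
        raise ValueError(f"not a GCST accession: {gcst_id}")
    num = int(digits)
    lower = (num - 1) // block * block + 1  # ranges run 1-1000, 1001-2000, ...
    upper = lower + block - 1
    width = len(digits)  # keep the accession's zero padding
    return f"GCST{lower:0{width}d}-GCST{upper:0{width}d}"
```

Usage: `f"{ftp_base}/{ftp_range_dir(gcst_id)}/{gcst_id}/"` gives the study directory to list before choosing a file.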

Additional Resources

Interactive API Documentation: https://www.ebi.ac.uk/gwas/rest/docs/api
Summary Statistics API Docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
Workshop Materials: https://github.com/EBISPOT/GWAS_Catalog-workshop
Blog Post on API v2: https://ebispot.github.io/gwas-blog/rest-api-v2-release/
R Package (gwasrapidd): https://cran.r-project.org/package=gwasrapidd
