# GWAS Catalog API Reference

Comprehensive reference for the GWAS Catalog REST APIs, including endpoint specifications, query parameters, response formats, and advanced usage patterns.

## Table of Contents

- [API Overview](#api-overview)
- [Authentication and Rate Limiting](#authentication-and-rate-limiting)
- [GWAS Catalog REST API](#gwas-catalog-rest-api)
- [Summary Statistics API](#summary-statistics-api)
- [Response Formats](#response-formats)
- [Error Handling](#error-handling)
- [Advanced Query Patterns](#advanced-query-patterns)
- [Integration Examples](#integration-examples)
- [Additional Resources](#additional-resources)

## API Overview

The GWAS Catalog provides two complementary REST APIs:

1. **GWAS Catalog REST API**: access to curated SNP-trait associations, studies, and metadata
2. **Summary Statistics API**: access to full GWAS summary statistics (all tested variants)

Both APIs follow RESTful design principles and return JSON in HAL (Hypertext Application Language) format, which includes `_links` for navigating between resources.
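
Because every response shares this HAL layout, two small accessor functions cover most navigation. The sketch below assumes resources are listed under `_embedded.<collection name>` and links under `_links.<rel>.href`, as in the response examples in this reference; `embedded_items` and `link_href` are hypothetical helper names, and the sample data is made up.

```python
from typing import Optional


def embedded_items(doc: dict, key: str) -> list:
    """Return the resources listed under _embedded[key], or [] if absent."""
    return doc.get("_embedded", {}).get(key, [])


def link_href(doc: dict, rel: str) -> Optional[str]:
    """Return the href of a named link such as 'self' or 'next', or None."""
    link = doc.get("_links", {}).get(rel)
    return link.get("href") if link else None


# Illustrative response fragment; the values are made up
page = {
    "_embedded": {"associations": [{"rsId": "rs0000001"}]},
    "_links": {"self": {"href": "https://example.org/associations?page=0"}},
}
print(len(embedded_items(page, "associations")))  # 1
print(link_href(page, "next"))                    # None, so this is the last page
```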

### Base URLs

```
GWAS Catalog API: https://www.ebi.ac.uk/gwas/rest/api
Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api
```

### Version Information

The GWAS Catalog REST API v2.0 was released in 2024 with several improvements:

- New endpoints (publications, genes, genomic context, ancestries)
- Enhanced data exposure (cohorts, background traits, licenses)
- Improved query capabilities
- Better performance and documentation

The previous API version remains available until May 2026 for backward compatibility.

## Authentication and Rate Limiting

### Authentication

**No authentication required.** Both APIs are open access and do not require API keys or registration.

### Rate Limiting

No explicit rate limits are documented, but the following practices keep the load on the service reasonable:

- Add a short delay between consecutive requests (e.g., 0.1-0.5 seconds) instead of sending rapid bursts
- Use pagination for large result sets
- Cache responses locally
- Use the bulk FTP downloads for genome-wide data
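
**Example with rate limiting:**

```python
import requests
from time import sleep


def query_with_rate_limit(url, params=None, delay=0.1):
    """GET a URL, raise on HTTP errors, then pause before the next call."""
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    sleep(delay)  # space out consecutive requests
    return response.json()
```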
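
The "cache responses locally" advice can be as simple as storing each JSON response on disk under a hash of its URL. This is a minimal sketch, not part of either API: `cached_get_json` and the `.gwas_cache` directory are hypothetical names, entries never expire, and `fetch` can be any function that takes a URL and returns parsed JSON, such as `query_with_rate_limit` above. The demo uses a stand-in fetch function so it runs without network access.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cached_get_json(url: str, fetch: Callable[[str], dict],
                    cache_dir: str = ".gwas_cache") -> dict:
    """Return JSON for `url`, using a local file cache when possible.

    `fetch` performs the real request and is only called on a cache miss.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # The full URL, query string included, is hashed into a safe filename
    path = cache / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(url)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Offline demo: a stand-in fetch function that records each call
calls = []


def fake_fetch(url: str) -> dict:
    calls.append(url)
    return {"requested": url}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_get_json("https://example.org/studies?page=0", fake_fetch, tmp)
    second = cached_get_json("https://example.org/studies?page=0", fake_fetch, tmp)

print(len(calls))  # 1: the second call was served from the cache
```

In real use the call would look like `cached_get_json(url, fetch=query_with_rate_limit)`, so only cache misses reach the server and are rate-limited.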

## GWAS Catalog REST API

The main API provides access to curated GWAS associations, studies, variants, and traits.

### Core Endpoints

#### 1. Studies

**Get all studies:**

```
GET /studies
```

**Get specific study:**

```
GET /studies/{accessionId}
```

**Search studies:**

```
GET /studies/search/findByPublicationIdPubmedId?pubmedId={pmid}
GET /studies/search/findByDiseaseTrait?diseaseTrait={trait}
```

**Query Parameters:**

- `page`: Page number (0-indexed)
- `size`: Results per page (default: 20)
- `sort`: Sort field and direction (e.g., `publicationDate,desc`)

**Example:**

```python
import requests

# Get a specific study by its accession (GCST ID)
url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
study = response.json()

print(f"Title: {study.get('title')}")
print(f"PMID: {study.get('publicationInfo', {}).get('pubmedId')}")
print(f"Sample size: {study.get('initialSampleSize')}")
```

**Response Fields:**

- `accessionId`: Study identifier (GCST ID)
- `title`: Study title
- `publicationInfo`: Publication details, including the PMID
- `initialSampleSize`: Description of the discovery cohort
- `replicationSampleSize`: Description of the replication cohort
- `ancestries`: Population ancestry information
- `genotypingTechnologies`: Genotyping technologies used (arrays or sequencing platforms)
- `_links`: Links to related resources
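
When composing request URLs by hand, the `page`, `size`, and `sort` parameters listed above need URL encoding (the comma in a `sort` value is percent-encoded). A minimal standard-library sketch; `build_url` is a hypothetical helper, and `requests` performs the same encoding automatically when parameters are passed via `params=`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"


def build_url(path: str, **params) -> str:
    """Join a resource path to the base URL and append encoded query params.

    Parameters whose value is None are left out.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


print(build_url("studies", page=0, size=50, sort="publicationDate,desc"))
# https://www.ebi.ac.uk/gwas/rest/api/studies?page=0&size=50&sort=publicationDate%2Cdesc
```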

#### 2. Associations

**Get all associations:**

```
GET /associations
```

**Get specific association:**

```
GET /associations/{associationId}
```

**Get associations for a trait:**

```
GET /efoTraits/{efoId}/associations
```

**Get associations for a variant:**

```
GET /singleNucleotidePolymorphisms/{rsId}/associations
```

**Query Parameters:**

- `projection`: Named projection that controls which fields are returned (e.g., `associationBySnp`)
- `page`, `size`, `sort`: Pagination controls

**Example:**

```python
import requests

# Find associations for type 2 diabetes
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}/associations"
params = {"size": 100, "page": 0}
response = requests.get(url, params=params, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
data = response.json()

# Counts only the records in this response; see Pagination below for fetching all pages
associations = data.get('_embedded', {}).get('associations', [])
print(f"Retrieved {len(associations)} associations")
```

**Response Fields:**

- `rsId`: Variant identifier
- `strongestAllele`: Risk or effect allele
- `pvalue`: Association p-value
- `pvalueText`: P-value as reported (may include inequality)
- `pvalueMantissa`: Mantissa of p-value
- `pvalueExponent`: Exponent of p-value
- `orPerCopyNum`: Odds ratio per allele copy
- `betaNum`: Effect size (quantitative traits)
- `betaUnit`: Unit of measurement
- `range`: Confidence interval
- `standardError`: Standard error
- `efoTrait`: Trait name
- `mappedLabel`: EFO standardized term
- `studyId`: Associated study accession
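
Extremely small p-values can fall below the smallest representable double (about 1e-308) and become 0.0. Working from `pvalueMantissa` and `pvalueExponent` on the -log10 scale avoids this, assuming p = mantissa × 10^exponent as the field names suggest. `neg_log10_p` is a hypothetical helper:

```python
import math


def neg_log10_p(mantissa: float, exponent: int) -> float:
    """-log10(p) for p = mantissa * 10**exponent, computed without forming p.

    Uses log10(m * 10**e) = log10(m) + e, so no underflow can occur.
    """
    return -(math.log10(mantissa) + exponent)


print(round(neg_log10_p(5, -8), 3))    # 7.301, i.e. the 5e-8 threshold
print(round(neg_log10_p(2, -400), 3))  # 399.699, although float("2e-400") is 0.0
```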

#### 3. Variants (Single Nucleotide Polymorphisms)

**Get variant details:**

```
GET /singleNucleotidePolymorphisms/{rsId}
```

**Search variants:**

```
GET /singleNucleotidePolymorphisms/search/findByRsId?rsId={rsId}
GET /singleNucleotidePolymorphisms/search/findByChromBpLocationRange?chrom={chr}&bpStart={start}&bpEnd={end}
GET /singleNucleotidePolymorphisms/search/findByGene?geneName={gene}
```

**Example:**

```python
import requests

# Get variant information
rs_id = "rs7903146"
url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
variant = response.json()

# `locations` may be empty, so fall back to an empty dict instead of raising IndexError
location = (variant.get('locations') or [{}])[0]
print(f"rsID: {variant.get('rsId')}")
print(f"Location: chr{location.get('chromosomeName')}:{location.get('chromosomePosition')}")
```

**Response Fields:**

- `rsId`: rs number
- `merged`: Indicates whether the variant has been merged with another
- `functionalClass`: Variant consequence
- `locations`: Array of genomic locations, each with:
  - `chromosomeName`: Chromosome name
  - `chromosomePosition`: Base-pair position
  - `region`: Genomic region information
- `genomicContexts`: Nearby genes
- `lastUpdateDate`: Last modification date
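
Because `locations` is an array, a variant may have several mapped positions, or none if it is unmapped, so iterating over it is safer than indexing `[0]`. A small sketch; `format_locations` is a hypothetical helper and the records below are illustrative, not real Catalog data.

```python
def format_locations(variant: dict) -> list:
    """Return 'chr<name>:<position>' for every mapped location of a variant.

    A missing or empty `locations` array yields [] instead of an IndexError.
    """
    return [
        f"chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}"
        for loc in (variant.get("locations") or [])
    ]


# Illustrative records; the coordinates are made up
mapped = {"rsId": "rs0000001",
          "locations": [{"chromosomeName": "1", "chromosomePosition": 12345}]}
unmapped = {"rsId": "rs0000002", "locations": []}
print(format_locations(mapped))    # ['chr1:12345']
print(format_locations(unmapped))  # []
```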

#### 4. Traits (EFO Terms)

**Get trait information:**

```
GET /efoTraits/{efoId}
```

**Search traits:**

```
GET /efoTraits/search/findByEfoUri?uri={efoUri}
GET /efoTraits/search/findByTraitIgnoreCase?trait={traitName}
```

**Example:**

```python
import requests

# Get trait details
trait_id = "EFO_0001360"
url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{trait_id}"
response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
response.raise_for_status()
trait = response.json()

print(f"Trait: {trait.get('trait')}")
print(f"EFO URI: {trait.get('uri')}")
```
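
The `findByEfoUri` search takes a full term URI rather than a short ID. A sketch of converting between the two, under the assumption that EFO-native terms live in the `http://www.ebi.ac.uk/efo/` namespace and terms imported from other ontologies (such as MONDO or HP) use `http://purl.obolibrary.org/obo/`; when in doubt, check the `uri` field of a trait response. Both function names are hypothetical.

```python
EFO_NS = "http://www.ebi.ac.uk/efo/"
OBO_NS = "http://purl.obolibrary.org/obo/"


def short_form_to_uri(short_form: str) -> str:
    """Map an ID such as 'EFO_0001360' to an ontology term URI.

    Assumption: EFO_ terms use the EFO namespace; everything else
    (MONDO_, HP_, ...) uses the OBO PURL namespace.
    """
    ns = EFO_NS if short_form.startswith("EFO_") else OBO_NS
    return ns + short_form


def uri_to_short_form(uri: str) -> str:
    """Take the last path segment of a term URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


print(short_form_to_uri("EFO_0001360"))
# http://www.ebi.ac.uk/efo/EFO_0001360
print(uri_to_short_form("http://purl.obolibrary.org/obo/MONDO_0005148"))
# MONDO_0005148
```

The converted value can then be passed as `params={"uri": short_form_to_uri("EFO_0001360")}` to `GET /efoTraits/search/findByEfoUri`.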

#### 5. Publications

**Get publication information:**

```
GET /publications
GET /publications/{publicationId}
GET /publications/search/findByPubmedId?pubmedId={pmid}
```

#### 6. Genes

**Get gene information:**

```
GET /genes
GET /genes/{geneId}
GET /genes/search/findByGeneName?geneName={symbol}
```

### Pagination and Navigation

List endpoints support pagination through the `page` and `size` parameters. The loop below stops when a page is empty or when the response has no `next` link, which also guards against an endless loop if an endpoint returns its full result set regardless of the paging parameters:

```python
import requests
from time import sleep


def get_all_associations(trait_id):
    """Retrieve all associations for a trait, page by page."""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"
    all_associations = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Accept": "application/json"}, timeout=30)
        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])
        if not associations:
            break
        all_associations.extend(associations)

        # Stop when the server advertises no further page
        if 'next' not in data.get('_links', {}):
            break
        page += 1
        sleep(0.1)  # pause between requests

    return all_associations
```

### HAL Links

Responses include `_links` for resource navigation:

```python
import requests

HEADERS = {"Accept": "application/json"}

# Get a study, then follow its link to the study's associations
response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795",
                        headers=HEADERS, timeout=30)
response.raise_for_status()
study = response.json()

# Follow the 'associations' link instead of building the URL by hand
associations_url = study['_links']['associations']['href']
associations_response = requests.get(associations_url, headers=HEADERS, timeout=30)
associations_response.raise_for_status()
associations = associations_response.json()
```
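
Some HAL links may be *templated*: they carry `"templated": true` and an RFC 6570 expression such as `{?projection}` in the `href`, and requesting that `href` verbatim would send the braces to the server. The sketch below handles only simple query templates by dropping the expression and appending parameters; `expand_link` is a hypothetical helper, the example link is made up, and a full RFC 6570 implementation is the general solution.

```python
import re
from urllib.parse import urlencode

_TEMPLATE_EXPR = re.compile(r"\{[^}]*\}")


def expand_link(link: dict, **params) -> str:
    """Turn a HAL link object into a plain URL.

    Assumption: templated links only use simple query templates like
    '{?a,b}', so the expression is removed and `params` are appended.
    """
    href = link["href"]
    if link.get("templated"):
        href = _TEMPLATE_EXPR.sub("", href)
    if params:
        sep = "&" if "?" in href else "?"
        href = f"{href}{sep}{urlencode(params)}"
    return href


link = {"href": "https://example.org/api/studies/GCST000000{?projection}",
        "templated": True}
print(expand_link(link))
# https://example.org/api/studies/GCST000000
print(expand_link(link, projection="associationBySnp"))
# https://example.org/api/studies/GCST000000?projection=associationBySnp
```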

## Summary Statistics API

Access full GWAS summary statistics for studies that have deposited complete data.

### Base URL

```
https://www.ebi.ac.uk/gwas/summary-statistics/api
```

### Core Endpoints

#### 1. Studies

**Get all studies with summary statistics:**

```
GET /studies
```

**Get specific study:**

```
GET /studies/{gcstId}
```

#### 2. Traits

**Get trait information:**

```
GET /traits/{efoId}
```

**Get associations for a trait:**

```
GET /traits/{efoId}/associations
```

**Query Parameters:**

- `p_lower`: Lower bound on the p-value
- `p_upper`: Upper bound on the p-value
- `size`: Number of results per request
- `start`: Offset of the first result to return (used for paging)

**Example:**

```python
import requests

# Find highly significant associations for a trait
trait_id = "EFO_0001360"
base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/traits/{trait_id}/associations"
params = {
    "p_upper": "0.000000001",  # upper p-value bound of 1e-9
    "size": 100,
}
response = requests.get(url, params=params, timeout=60)
response.raise_for_status()
results = response.json()
```

#### 3. Chromosomes

**Get associations by chromosome:**

```
GET /chromosomes/{chromosome}/associations
```

**Query by genomic region:**

```
GET /chromosomes/{chromosome}/associations?bp_lower={start}&bp_upper={end}
```

**Example:**

```python
import requests

# Query variants in a specific region; bp_lower/bp_upper bound the base-pair positions
chromosome = "10"
start_pos = 114000000
end_pos = 115000000

base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
url = f"{base_url}/chromosomes/{chromosome}/associations"
params = {
    "bp_lower": start_pos,
    "bp_upper": end_pos,
    "size": 1000,
}
response = requests.get(url, params=params, timeout=60)
response.raise_for_status()
variants = response.json()
```
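
For wide regions, one way to keep each response small is to split the interval into fixed-size windows and query them one at a time, passing each pair as the endpoint's region bounds. A sketch assuming inclusive base-pair coordinates; `region_windows` is a hypothetical helper and the window width is an arbitrary choice.

```python
from typing import Iterator, Tuple


def region_windows(start: int, end: int, width: int) -> Iterator[Tuple[int, int]]:
    """Split [start, end] into consecutive, non-overlapping windows of at most `width` bp."""
    if width <= 0:
        raise ValueError("width must be positive")
    lo = start
    while lo <= end:
        hi = min(lo + width - 1, end)
        yield lo, hi
        lo = hi + 1


print(list(region_windows(1, 10, 4)))  # [(1, 4), (5, 8), (9, 10)]
print(len(list(region_windows(114_000_000, 115_000_000, 250_000))))  # 5
```

Because the bounds are inclusive, the 1 Mb region above ends with a one-base window at position 115,000,000; using half-open intervals instead would avoid that but must match how the endpoint interprets its bounds.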

#### 4. Variants

**Get specific variant across studies:**

```
GET /variants/{variantId}
```

**Get associations for a variant:**

```
GET /variants/{variantId}/associations
```

### Response Fields

**Association Fields:**

- `variant_id`: Variant identifier
- `chromosome`: Chromosome
- `base_pair_location`: Base-pair position
- `effect_allele`: Effect allele
- `other_allele`: Non-effect (other) allele
- `effect_allele_frequency`: Frequency of the effect allele
- `beta`: Effect size
- `standard_error`: Standard error of the effect size
- `p_value`: P-value
- `ci_lower`: Lower bound of the confidence interval
- `ci_upper`: Upper bound of the confidence interval
- `odds_ratio`: Odds ratio (case-control studies)
- `study_accession`: Study accession (GCST ID)
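
When a case-control study reports `beta` and `standard_error` but no `odds_ratio` or confidence bounds, and `beta` is on the log-odds scale (as with logistic regression), the odds ratio is exp(beta) and a normal-approximation 95% interval is exp(beta ± 1.96 × SE). The sketch below, with the hypothetical helper `beta_to_or`, assumes exactly that; it does not apply to quantitative traits, where `beta` is measured on the trait's own scale.

```python
import math


def beta_to_or(beta: float, se: float, z: float = 1.959964) -> tuple:
    """Convert a log-odds effect and its standard error to (OR, CI lower, CI upper).

    Assumes `beta` is a log odds ratio and a normal approximation;
    z = 1.959964 gives a two-sided 95% interval.
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


odds, lo, hi = beta_to_or(0.3, 0.05)
print(f"OR={odds:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```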

## Response Formats

### Content Type

Responses are JSON. To request JSON explicitly, send an `Accept` header (a `Content-Type` header describes a request body, which GET requests do not have):

```
Accept: application/json
```

### HAL Format

Responses follow the HAL (Hypertext Application Language) specification:

```json
{
  "_embedded": {
    "associations": [
      {
        "rsId": "rs7903146",
        "pvalue": 1.2e-30,
        "efoTrait": "type 2 diabetes",
        "_links": {
          "self": {
            "href": "https://www.ebi.ac.uk/gwas/rest/api/associations/12345"
          }
        }
      }
    ]
  },
  "_links": {
    "self": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=0"
    },
    "next": {
      "href": "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360/associations?page=1"
    }
  },
  "page": {
    "size": 20,
    "totalElements": 1523,
    "totalPages": 77,
    "number": 0
  }
}
```

### Page Metadata

Paginated responses include page information:

- `size`: Items per page
- `totalElements`: Total number of results
- `totalPages`: Total number of pages
- `number`: Current page number (0-indexed)
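
Page metadata is enough to plan a download before making it. With the numbers from the HAL example above (1,523 results at 20 per page), the two hypothetical helpers below show that 76 more requests remain after page 0, and that asking for 100 items per page would cut the total to 16 requests, assuming the server honours that page size.

```python
import math


def remaining_pages(page_meta: dict) -> int:
    """Pages still to fetch after the current one, from a HAL 'page' object."""
    return max(page_meta["totalPages"] - page_meta["number"] - 1, 0)


def pages_needed(total_elements: int, size: int) -> int:
    """Requests needed to retrieve `total_elements` at `size` items per page."""
    return math.ceil(total_elements / size)


meta = {"size": 20, "totalElements": 1523, "totalPages": 77, "number": 0}
print(remaining_pages(meta))    # 76
print(pages_needed(1523, 100))  # 16
```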

## Error Handling

### HTTP Status Codes

- `200 OK`: Successful request
- `400 Bad Request`: Invalid parameters
- `404 Not Found`: Resource not found
- `500 Internal Server Error`: Server error

### Error Response Format

```json
{
  "timestamp": "2025-10-19T12:00:00.000+00:00",
  "status": 404,
  "error": "Not Found",
  "message": "No association found with id: 12345",
  "path": "/gwas/rest/api/associations/12345"
}
```

### Error Handling Example

```python
import requests


def safe_api_request(url, params=None):
    """Make an API request, returning parsed JSON or None on failure."""
    try:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        # The error body (see the format above) usually explains the failure
        print(f"HTTP Error: {e}")
        print(f"Response: {e.response.text}")
        return None
    except requests.exceptions.ConnectionError:
        print("Connection error - check network")
        return None
    except requests.exceptions.Timeout:
        print("Request timed out")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return None
```
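
`safe_api_request` gives up on the first failure. For transient problems such as a 500 response or a timeout, retrying after a growing delay is a common pattern. A minimal sketch: `with_retries` and `TransientError` are hypothetical names, and the caller is assumed to raise `TransientError` for failures worth retrying (for example by mapping `requests` timeouts and 5xx statuses to it), while errors such as 404 should not be retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by the wrapped call for failures worth retrying."""


def with_retries(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 0.5) -> T:
    """Run `call`, retrying TransientError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error once `attempts` is exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Offline demo: a call that fails twice, then succeeds
outcomes = iter([TransientError("500"), TransientError("timeout"), {"status": "ok"}])


def flaky():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = with_retries(flaky, base_delay=0.01)
print(result)  # {'status': 'ok'}
```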

## Advanced Query Patterns

### 1. Cross-referencing Variants and Traits

```python
import requests


def get_variant_pleiotropy(rs_id):
    """Return {trait: smallest p-value} for every trait associated with a variant."""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/{rs_id}/associations"
    params = {"projection": "associationBySnp"}

    response = requests.get(url, params=params, headers={"Accept": "application/json"}, timeout=30)
    response.raise_for_status()
    data = response.json()

    traits = {}
    for assoc in data.get('_embedded', {}).get('associations', []):
        trait = assoc.get('efoTrait')
        pvalue = assoc.get('pvalue')
        if trait is None or pvalue is None:
            continue  # skip records that cannot be ranked
        # Keep the strongest (smallest) p-value seen for each trait
        if trait not in traits or float(pvalue) < float(traits[trait]):
            traits[trait] = pvalue

    return traits


# Example usage
pleiotropy = get_variant_pleiotropy('rs7903146')
for trait, pval in sorted(pleiotropy.items(), key=lambda x: float(x[1])):
    print(f"{trait}: p={pval}")
```

### 2. Filtering by P-value Threshold

```python
import requests


def get_significant_associations(trait_id, p_threshold=5e-8):
    """Get associations at or below a p-value threshold (default: genome-wide significance)."""
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/efoTraits/{trait_id}/associations"

    results = []
    page = 0

    while True:
        params = {"page": page, "size": 100}
        response = requests.get(url, params=params, headers={"Accept": "application/json"}, timeout=30)
        if response.status_code != 200:
            break

        data = response.json()
        associations = data.get('_embedded', {}).get('associations', [])
        if not associations:
            break

        for assoc in associations:
            pvalue = assoc.get('pvalue')
            if pvalue is not None and float(pvalue) <= p_threshold:
                results.append(assoc)

        if 'next' not in data.get('_links', {}):
            break  # no further pages
        page += 1

    return results
```
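
The default `p_threshold=5e-8` is the conventional genome-wide significance level. It corresponds to a Bonferroni correction that holds the family-wise error rate at 0.05 across roughly one million independent tests:

```math
\alpha_{\text{genome-wide}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```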

### 3. Combining Main and Summary Statistics APIs

```python
import requests

HEADERS = {"Accept": "application/json"}


def get_complete_variant_data(rs_id):
    """Get a variant's record and its associations from the main API."""
    main_url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"

    # Basic variant information
    response = requests.get(main_url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    variant_info = response.json()

    # Associations reported for the variant
    response = requests.get(f"{main_url}/associations", headers=HEADERS, timeout=30)
    response.raise_for_status()
    associations = response.json()

    # The Summary Statistics API could also be queried for this variant
    # across all studies with summary data

    return {
        "variant": variant_info,
        "associations": associations,
    }
```

### 4. Genomic Region Queries

```python
import requests

HEADERS = {"Accept": "application/json"}


def query_region(chromosome, start, end, p_threshold=None):
    """Query variants in a genomic region from both APIs."""
    # Catalog variants from the main API
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
    params = {"chrom": chromosome, "bpStart": start, "bpEnd": end, "size": 1000}
    response = requests.get(url, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    variants = response.json()

    # Tested variants in the same region from the Summary Statistics API
    sumstats_url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chromosome}/associations"
    sumstats_params = {"bp_lower": start, "bp_upper": end, "size": 1000}
    if p_threshold is not None:
        sumstats_params["p_upper"] = str(p_threshold)

    sumstats_response = requests.get(sumstats_url, params=sumstats_params, timeout=60)
    sumstats_response.raise_for_status()
    sumstats = sumstats_response.json()

    return {
        "catalog_variants": variants,
        "summary_stats": sumstats,
    }
```

## Integration Examples

### Complete Workflow: Disease Genetic Architecture

```python
import requests
import pandas as pd
from time import sleep


class GWASCatalogQuery:
    def __init__(self):
        self.base_url = "https://www.ebi.ac.uk/gwas/rest/api"
        self.headers = {"Accept": "application/json"}

    def get_trait_associations(self, trait_id, p_threshold=5e-8):
        """Get all associations for a trait at or below a p-value threshold."""
        url = f"{self.base_url}/efoTraits/{trait_id}/associations"
        columns = ['rs_id', 'pvalue', 'risk_allele', 'or_beta', 'study', 'pubmed_id']
        results = []
        page = 0

        while True:
            params = {"page": page, "size": 100}
            response = requests.get(url, params=params, headers=self.headers, timeout=30)
            if response.status_code != 200:
                break

            data = response.json()
            associations = data.get('_embedded', {}).get('associations', [])
            if not associations:
                break

            for assoc in associations:
                pvalue = assoc.get('pvalue')
                if pvalue is not None and float(pvalue) <= p_threshold:
                    results.append({
                        'rs_id': assoc.get('rsId'),
                        'pvalue': float(pvalue),
                        'risk_allele': assoc.get('strongestAllele'),
                        'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                        'study': assoc.get('studyId'),
                        'pubmed_id': assoc.get('pubmedId'),
                    })

            if 'next' not in data.get('_links', {}):
                break  # last page reached
            page += 1
            sleep(0.1)

        # Fixed columns and a float p-value column keep the DataFrame usable when nothing matched
        return pd.DataFrame(results, columns=columns).astype({'pvalue': 'float64'})

    def get_variant_details(self, rs_id):
        """Get detailed variant information."""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/{rs_id}"
        response = requests.get(url, headers=self.headers, timeout=30)
        if response.status_code == 200:
            return response.json()
        return None

    def get_gene_associations(self, gene_name):
        """Get variants mapped to a gene."""
        url = f"{self.base_url}/singleNucleotidePolymorphisms/search/findByGene"
        params = {"geneName": gene_name}
        response = requests.get(url, params=params, headers=self.headers, timeout=30)
        if response.status_code == 200:
            return response.json()
        return None


# Example usage
gwas = GWASCatalogQuery()

# Query type 2 diabetes associations
df = gwas.get_trait_associations('EFO_0001360')
print(f"Found {len(df)} genome-wide significant associations")
print(f"Unique variants: {df['rs_id'].nunique()}")

# Get top variants
top_variants = df.nsmallest(10, 'pvalue')
print("\nTop 10 variants:")
print(top_variants[['rs_id', 'pvalue', 'risk_allele']])

# Get details for the top variant
if len(top_variants) > 0:
    top_rs = top_variants.iloc[0]['rs_id']
    variant_info = gwas.get_variant_details(top_rs)
    if variant_info:
        loc = (variant_info.get('locations') or [{}])[0]
        print(f"\n{top_rs} location: chr{loc.get('chromosomeName')}:{loc.get('chromosomePosition')}")
```
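
The same variant is often reported by several studies, which is why `df['rs_id'].nunique()` can be smaller than `len(df)`. To summarise one row per variant, keep its strongest association. The sketch below does this in plain Python on rows shaped like those built in `get_trait_associations`; `strongest_per_variant` is a hypothetical helper and the rows are made up.

```python
from typing import Dict, Iterable


def strongest_per_variant(rows: Iterable[dict]) -> Dict[str, dict]:
    """Keep, for each rs_id, the row with the smallest p-value."""
    best: Dict[str, dict] = {}
    for row in rows:
        rs = row.get("rs_id")
        if rs is None or row.get("pvalue") is None:
            continue  # skip rows that cannot be ranked
        if rs not in best or row["pvalue"] < best[rs]["pvalue"]:
            best[rs] = row
    return best


# Made-up rows in the shape produced by get_trait_associations
rows = [
    {"rs_id": "rs0000001", "pvalue": 3e-9, "study": "GCST000001"},
    {"rs_id": "rs0000001", "pvalue": 1e-12, "study": "GCST000002"},
    {"rs_id": "rs0000002", "pvalue": 4e-8, "study": "GCST000001"},
]
best = strongest_per_variant(rows)
print({rs: r["study"] for rs, r in best.items()})
# {'rs0000001': 'GCST000002', 'rs0000002': 'GCST000001'}
```

With the pandas DataFrame from the workflow above, an equivalent one-liner is `df.loc[df.groupby('rs_id')['pvalue'].idxmin()]`.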

### FTP Download Integration

The path pattern below is a simplified sketch. On the FTP site, study folders may be grouped into accession-range directories and harmonised file names can differ between studies, so check the directory listing for a given study before relying on this pattern.

```python
import requests
from pathlib import Path


def download_summary_statistics(gcst_id, output_dir="."):
    """Download a harmonised summary statistics file from the FTP site over HTTPS."""
    ftp_base = "https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics"
    # Simplified URL pattern; adjust it to the study's actual directory layout
    harmonised_url = f"{ftp_base}/{gcst_id}/harmonised/{gcst_id}-harmonised.tsv.gz"

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    output_path = out_dir / f"{gcst_id}.tsv.gz"

    try:
        with requests.get(harmonised_url, stream=True, timeout=60) as response:
            response.raise_for_status()
            # Stream to disk in chunks so large files never sit fully in memory
            with open(output_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
    except requests.exceptions.HTTPError as e:
        print(f"Could not download harmonised file for {gcst_id}: {e}")
        return None

    print(f"Downloaded {gcst_id} to {output_path}")
    return output_path


# Example usage
download_summary_statistics("GCST001234", output_dir="./sumstats")
```
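
Summary statistics files can be large, so it helps to check the column layout of a downloaded file before loading all of it. The sketch below decompresses only the beginning of a gzipped TSV; `peek_tsv_gz` is a hypothetical helper, and the demo writes a tiny made-up file instead of using a real download.

```python
import csv
import gzip
import tempfile
from itertools import islice
from pathlib import Path


def peek_tsv_gz(path, n_rows: int = 5):
    """Return the header and first `n_rows` rows of a gzipped TSV.

    Only the start of the file is decompressed, so this stays cheap
    even for very large files.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


# Offline demo with a small made-up file
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.tsv.gz"
    with gzip.open(demo, "wt", newline="") as out:
        out.write("variant_id\tp_value\nrs0000001\t0.01\nrs0000002\t0.5\n")
    header, rows = peek_tsv_gz(demo, n_rows=1)

print(header)  # ['variant_id', 'p_value']
print(rows)    # [['rs0000001', '0.01']]
```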

## Additional Resources

- **Interactive API Documentation**: https://www.ebi.ac.uk/gwas/rest/docs/api
- **Summary Statistics API Docs**: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
- **Workshop Materials**: https://github.com/EBISPOT/GWAS_Catalog-workshop
- **Blog Post on API v2**: https://ebispot.github.io/gwas-blog/rest-api-v2-release/
- **R Package (gwasrapidd)**: https://cran.r-project.org/package=gwasrapidd