Initial commit
skills/opentargets-database/references/api_reference.md
# Open Targets Platform API Reference

## API Endpoint

```
https://api.platform.opentargets.org/api/v4/graphql
```

Interactive GraphQL playground with documentation:
```
https://api.platform.opentargets.org/api/v4/graphql/browser
```

## Access Methods

The Open Targets Platform provides multiple access methods:

1. **GraphQL API** - Best for single entity queries and flexible data retrieval
2. **Web Interface** - Interactive platform at https://platform.opentargets.org
3. **Data Downloads** - FTP at https://ftp.ebi.ac.uk/pub/databases/opentargets/platform/
4. **Google BigQuery** - For large-scale systematic queries

## Authentication

No authentication is required for the GraphQL API. All data is freely accessible.

## Rate Limits

For systematic queries involving multiple targets or diseases, use dataset downloads or BigQuery instead of repeated API calls. The API is optimized for single-entity and exploratory queries.

## GraphQL Query Structure

GraphQL queries consist of:
1. Query operation with optional variables
2. Field selection (request only needed fields)
3. Nested entity traversal

### Basic Python Example

```python
import json

import requests

# Define the query
query_string = """
  query target($ensemblId: String!){
    target(ensemblId: $ensemblId){
      id
      approvedSymbol
      biotype
      geneticConstraint {
        constraintType
        exp
        obs
        score
      }
    }
  }
"""

# Define variables
variables = {"ensemblId": "ENSG00000169083"}

# Make the request
base_url = "https://api.platform.opentargets.org/api/v4/graphql"
response = requests.post(
    base_url,
    json={"query": query_string, "variables": variables},
    timeout=30,
)
response.raise_for_status()  # surface HTTP-level failures early

# Decode the JSON body and pretty-print it
data = response.json()
print(json.dumps(data, indent=2))
```

## Available Query Endpoints

### /target
Retrieve gene annotations, tractability assessments, and disease associations.

**Common fields:**
- `id` - Ensembl gene ID
- `approvedSymbol` - HGNC gene symbol
- `approvedName` - Full gene name
- `biotype` - Gene type (protein_coding, etc.)
- `tractability` - Druggability assessment
- `safetyLiabilities` - Safety information
- `expressions` - Baseline expression data
- `knownDrugs` - Approved/clinical drugs
- `associatedDiseases` - Disease associations with evidence

### /disease
Retrieve disease/phenotype data, known drugs, and clinical information.

**Common fields:**
- `id` - EFO disease identifier
- `name` - Disease name
- `description` - Disease description
- `therapeuticAreas` - High-level disease categories
- `synonyms` - Alternative names
- `knownDrugs` - Drugs indicated for disease
- `associatedTargets` - Target associations with evidence

### /drug
Retrieve compound details, mechanisms of action, and pharmacovigilance data.

**Common fields:**
- `id` - ChEMBL identifier
- `name` - Drug name
- `drugType` - Small molecule, antibody, etc.
- `maximumClinicalTrialPhase` - Development stage
- `indications` - Disease indications
- `mechanismsOfAction` - Target mechanisms
- `adverseEvents` - Pharmacovigilance data

### /search
Search across all entities (targets, diseases, drugs).

**Parameters:**
- `queryString` - Search term
- `entityNames` - Filter by entity type(s)
- `page` - Pagination

### /associationDiseaseIndirect
Retrieve target-disease associations including indirect evidence from disease descendants in ontology.

**Key fields:**
- `rows` - Association records with scores
- `aggregations` - Aggregated statistics

## Example Queries

### Query 1: Get target information with disease associations

```python
query = """
  query targetInfo($ensemblId: String!) {
    target(ensemblId: $ensemblId) {
      approvedSymbol
      approvedName
      tractability {
        label
        modality
        value
      }
      associatedDiseases(page: {index: 0, size: 10}) {
        rows {
          disease {
            name
          }
          score
          datatypeScores {
            componentId
            score
          }
        }
      }
    }
  }
"""
variables = {"ensemblId": "ENSG00000157764"}
```

### Query 2: Search for diseases

```python
query = """
  query searchDiseases($queryString: String!) {
    search(queryString: $queryString, entityNames: ["disease"]) {
      hits {
        id
        entity
        name
        description
      }
    }
  }
"""
variables = {"queryString": "alzheimer"}
```

### Query 3: Get evidence for target-disease pair

```python
query = """
  query evidences($ensemblId: String!, $efoId: String!) {
    disease(efoId: $efoId) {
      evidences(ensemblIds: [$ensemblId], size: 100) {
        rows {
          datasourceId
          datatypeId
          score
          studyId
          literature
        }
      }
    }
  }
"""
variables = {"ensemblId": "ENSG00000157764", "efoId": "EFO_0000249"}
```

### Query 4: Get known drugs for a disease

```python
query = """
  query knownDrugs($efoId: String!) {
    disease(efoId: $efoId) {
      knownDrugs {
        uniqueDrugs
        rows {
          drug {
            name
            id
          }
          targets {
            approvedSymbol
          }
          phase
          status
        }
      }
    }
  }
"""
variables = {"efoId": "EFO_0000249"}
```

## Error Handling

The API can return HTTP status code 200 even when a query fails; in that case the failure is reported in an `errors` array in the response body. Check the response structure:

```python
# response_data is the decoded JSON body, e.g. response_data = response.json()
if 'errors' in response_data:
    print(f"GraphQL errors: {response_data['errors']}")
else:
    print(f"Data: {response_data['data']}")
```
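
The check above can be wrapped in a small helper so that callers never silently work with a failed response. This is a minimal sketch rather than part of any Open Targets client: `GraphQLResponseError` and `extract_data` are hypothetical names, and the only assumption about the payload is the standard GraphQL response shape (`data` and optional `errors` keys).

```python
from typing import Any, Dict


class GraphQLResponseError(RuntimeError):
    """Hypothetical exception for a response body that carries `errors`."""


def extract_data(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return payload["data"], raising if the server reported errors."""
    errors = payload.get("errors")
    if errors:
        # Error entries are normally objects with a "message" key;
        # fall back to the raw entry if they are not.
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise GraphQLResponseError(messages)
    data = payload.get("data")
    if data is None:
        raise GraphQLResponseError("response contained neither data nor errors")
    return data
```

Typical use would be `data = extract_data(response.json())` directly after the `requests.post` call.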

## Best Practices

1. **Request only needed fields** - Minimize data transfer and improve response time
2. **Use variables** - Make queries reusable and safer
3. **Handle pagination** - Most list fields support pagination with `page: {size: N, index: M}`
4. **Explore the schema** - Use the GraphQL browser to discover available fields
5. **Batch related queries** - Combine multiple entity fetches in a single query when possible
6. **Cache results** - Store frequently accessed data locally to reduce API calls
7. **Use BigQuery for bulk** - Switch to BigQuery/downloads for systematic analyses

## Data Licensing

All Open Targets Platform data is freely available. When using the data in research or commercial products, cite the latest publication:

Ochoa, D. et al. (2025) Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic Acids Research, 53(D1):D1467-D1477.
skills/opentargets-database/references/evidence_types.md
# Evidence Types and Data Sources

## Overview

Evidence represents any event or set of events that identifies a target as a potential causal gene or protein for a disease. Evidence is standardized and mapped to:
- **Ensembl gene IDs** for targets
- **EFO (Experimental Factor Ontology)** for diseases/phenotypes

Evidence is organized into **data types** (broader categories) and **data sources** (specific databases/studies).

## Evidence Data Types

### 1. Genetic Association

Evidence from human genetics linking genetic variants to disease phenotypes.

#### Data Sources:

**GWAS (Genome-Wide Association Studies)**
- Population-level common variant associations
- Filtered with Locus-to-Gene (L2G) scores >0.05
- Includes fine-mapping and colocalization data
- Sources: GWAS Catalog, FinnGen, UK Biobank, EBI GWAS

**Gene Burden Tests**
- Rare variant association analyses
- Aggregate effects of multiple rare variants in a gene
- Particularly relevant for Mendelian and rare diseases

**ClinVar Germline**
- Clinical variant interpretations
- Classifications: pathogenic, likely pathogenic, VUS, benign
- Expert-reviewed variant-disease associations

**Genomics England PanelApp**
- Expert gene-disease ratings
- Green (confirmed), amber (probable), red (no evidence)
- Focus on rare diseases and cancer

**Gene2Phenotype**
- Curated gene-disease relationships
- Allelic requirements and inheritance patterns
- Clinical validity assessments

**UniProt Literature & Variants**
- Literature-based gene-disease associations
- Expert-curated from scientific publications

**Orphanet**
- Rare disease gene associations
- Expert-reviewed and maintained

**ClinGen**
- Clinical genome resource classifications
- Gene-disease validity assertions

### 2. Somatic Mutations

Evidence from cancer genomics identifying driver genes and therapeutic targets.

#### Data Sources:

**Cancer Gene Census**
- Expert-curated cancer genes
- Tier classifications (1 = strong evidence, 2 = emerging)
- Mutation types and cancer types

**IntOGen**
- Computational driver gene predictions
- Aggregated from large cohort studies
- Statistical significance of mutations

**ClinVar Somatic**
- Somatic clinical variant interpretations
- Oncogenic/likely oncogenic classifications

**Cancer Biomarkers**
- FDA/EMA approved biomarkers
- Clinical trial biomarkers
- Prognostic and predictive markers

### 3. Known Drugs

Evidence from clinical precedence showing drugs targeting genes for disease indications.

#### Data Source:

**ChEMBL**
- Approved drugs (Phase 4)
- Clinical candidates (Phase 1-3)
- Withdrawn drugs
- Drug-target-indication triplets with mechanism of action

**Clinical Trial Information:**
- `phase`: Maximum clinical trial phase (1, 2, 3, 4)
- `status`: Active, terminated, completed, withdrawn
- `mechanismOfAction`: How drug affects target

### 4. Affected Pathways

Evidence linking genes to disease through pathway perturbations and functional screens.

#### Data Sources:

**CRISPR Screens**
- Genome-scale knockout screens
- Cancer dependency and essentiality data

**Project Score (Cancer Dependency Map)**
- CRISPR-Cas9 fitness screens across cancer cell lines
- Gene essentiality profiles

**SLAPenrich**
- Pathway enrichment analysis
- Somatic mutation pathway impacts

**PROGENy**
- Pathway activity inference
- Signaling pathway perturbations

**Reactome**
- Expert-curated pathway annotations
- Biological pathway representations

**Gene Signatures**
- Expression-based signatures
- Pathway activity patterns

### 5. RNA Expression

Evidence from differential gene expression in disease vs. control tissues.

#### Data Source:

**Expression Atlas**
- Differential expression data
- Baseline expression across tissues/conditions
- RNA-Seq and microarray studies
- Log2 fold-change and p-values

### 6. Animal Models

Evidence from in vivo studies showing phenotypes associated with gene perturbations.

#### Data Source:

**IMPC (International Mouse Phenotyping Consortium)**
- Systematic mouse knockout phenotypes
- Phenotype-disease mappings via ontologies
- Standardized phenotyping procedures

### 7. Literature

Evidence from text-mining of biomedical literature.

#### Data Source:

**Europe PMC**
- Co-occurrence of genes and diseases in abstracts
- Normalized citation counts
- Weighted by publication type and recency

## Evidence Scoring

Each evidence source has its own scoring methodology:

### Score Ranges
- Most scores normalized to 0-1 range
- Higher scores indicate stronger evidence
- Scores are NOT confidence levels but relative strength indicators

### Common Scoring Approaches:

**Categorical Classifications:**
- ClinVar: Pathogenic (1.0), Likely pathogenic (0.99), etc.
- Gene2Phenotype: Confirmed/probable ratings
- PanelApp: Green/amber/red classifications

**Statistical Measures:**
- GWAS: L2G scores incorporating multiple lines of evidence
- Gene Burden: Statistical significance of variant aggregation
- Expression: Adjusted p-values and fold-changes

**Clinical Precedence:**
- Known Drugs: Phase weights (Phase 4 = 1.0, Phase 3 = 0.8, etc.)
- Clinical status modifiers

**Computational Predictions:**
- IntOGen: Q-values from driver mutation analysis
- PROGENy/SLAPenrich: Pathway activity/enrichment scores

## Evidence Interpretation Guidelines

### Strengths by Data Type

**Genetic Association** - Strongest human genetic evidence
- Direct link between genetic variation and disease
- Mendelian diseases: high confidence
- GWAS: requires L2G to identify causal gene
- Consider ancestry and population-specific effects

**Somatic Mutations** - Direct evidence in cancer
- Strong for oncology indications
- Driver mutations indicate therapeutic potential
- Consider cancer type specificity

**Known Drugs** - Clinical validation
- Highest confidence: approved drugs (Phase 4)
- Consider mechanism relevance to new indication
- Phase 1-2: early evidence, higher risk

**Affected Pathways** - Mechanistic insights
- Supports biological plausibility
- May not predict clinical success
- Useful for hypothesis generation

**RNA Expression** - Observational evidence
- Correlation, not causation
- May reflect disease consequence vs. cause
- Useful for biomarker identification

**Animal Models** - Translational evidence
- Strong for understanding biology
- Variable translation to human disease
- Most useful when phenotype matches human disease

**Literature** - Exploratory signal
- Text-mining captures research focus
- May reflect publication bias
- Requires manual literature review for validation

### Important Considerations

1. **Multiple evidence types strengthen confidence** - Convergent evidence from different data types provides stronger support

2. **Under-studied diseases score lower** - Novel or rare diseases may have strong evidence but lower aggregate scores due to limited research

3. **Association scores are not probabilities** - Scores rank relative evidence strength, not success probability

4. **Context matters** - Evidence strength depends on:
   - Disease mechanism understanding
   - Target biology and druggability
   - Clinical precedence in related indications
   - Safety considerations

5. **Data source reliability varies** - Weight expert-curated sources (ClinGen, Gene2Phenotype) higher than computational predictions

## Using Evidence in Queries

### Filtering by Data Type

```python
query = """
  query evidenceByType($ensemblId: String!, $efoId: String!, $dataTypes: [String!]) {
    disease(efoId: $efoId) {
      evidences(ensemblIds: [$ensemblId], datatypes: $dataTypes) {
        rows {
          datasourceId
          score
        }
      }
    }
  }
"""
variables = {
    "ensemblId": "ENSG00000157764",
    "efoId": "EFO_0000249",
    "dataTypes": ["genetic_association", "somatic_mutation"]
}
```

### Accessing Data Type Scores

Data type scores aggregate all source scores within that type:

```python
query = """
  query associationScores($ensemblId: String!, $efoId: String!) {
    target(ensemblId: $ensemblId) {
      associatedDiseases(efoIds: [$efoId]) {
        rows {
          disease {
            name
          }
          score
          datatypeScores {
            componentId
            score
          }
        }
      }
    }
  }
"""
```
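
Once the `data` part of that response is decoded, the per-type scores can be flattened into a lookup table. The sketch below assumes the response mirrors the field selection in the query above (`rows`, `disease.name`, `datatypeScores` with `componentId` and `score`); `datatype_scores` is a hypothetical helper, and the numbers in `example_data` are invented for illustration, not real platform scores.

```python
from typing import Dict


def datatype_scores(response_data: Dict) -> Dict[str, Dict[str, float]]:
    """Map disease name -> {data type id: score} from the query result."""
    rows = response_data["target"]["associatedDiseases"]["rows"]
    return {
        row["disease"]["name"]: {
            entry["componentId"]: entry["score"]
            for entry in row["datatypeScores"]
        }
        for row in rows
    }


# Illustrative payload only; the scores are made up.
example_data = {
    "target": {
        "associatedDiseases": {
            "rows": [
                {
                    "disease": {"name": "example disease"},
                    "score": 0.5,
                    "datatypeScores": [
                        {"componentId": "genetic_association", "score": 0.4},
                        {"componentId": "known_drug", "score": 0.7},
                    ],
                }
            ]
        }
    }
}

scores = datatype_scores(example_data)
print(sorted(scores["example disease"]))
```

Counting how many data types appear for a disease gives a quick, rough view of how many independent lines of evidence converge on the association.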

## Evidence Quality Assessment

When evaluating evidence:

1. **Check multiple sources** - Single source may be unreliable
2. **Prioritize human genetic evidence** - Strongest disease relevance
3. **Consider clinical precedence** - Known drugs indicate druggability
4. **Assess mechanistic support** - Pathway evidence supports biology
5. **Review literature manually** - For critical decisions, read primary publications
6. **Validate in primary databases** - Cross-reference with ClinVar, ClinGen, etc.
skills/opentargets-database/references/target_annotations.md
# Target Annotations and Features

## Overview

Open Targets defines a target as "any naturally-occurring molecule that can be targeted by a medicinal product." Targets are primarily protein-coding genes identified by Ensembl gene IDs, but also include RNAs and pseudogenes from canonical chromosomes.

## Core Target Annotations

### 1. Tractability Assessment

Tractability evaluates the druggability potential of a target across different modalities.

#### Modalities Assessed:

**Small Molecule**
- Prediction of small molecule druggability
- Based on structural features, chemical precedence
- Buckets: Clinical precedence, Discovery precedence, Predicted tractable

**Antibody**
- Likelihood of antibody-based therapeutic success
- Cell surface/secreted protein location
- Precedence categories similar to small molecules

**PROTAC (Protein Degradation)**
- Assessment for targeted protein degradation
- E3 ligase compatibility
- Emerging modality category

**Other Modalities**
- Gene therapy, RNA-based therapeutics
- Oligonucleotide approaches

#### Tractability Levels:

1. **Clinical Precedence** - Target of approved/clinical drug with similar mechanism
2. **Discovery Precedence** - Target of tool compounds or compounds in preclinical development
3. **Predicted Tractable** - Computational predictions suggest druggability
4. **Unknown** - Insufficient data to assess

### 2. Safety Liabilities

Safety information aggregated from multiple sources to identify potential toxicity concerns.

#### Data Sources:

**ToxCast**
- High-throughput toxicology screening data
- In vitro assay results
- Toxicity pathway activation

**AOPWiki (Adverse Outcome Pathways)**
- Mechanistic pathways from molecular initiating event to adverse outcome
- Systems toxicology frameworks

**PharmGKB**
- Pharmacogenomic relationships
- Genetic variants affecting drug response and toxicity

**Published Literature**
- Expert-curated safety concerns from publications
- Clinical trial adverse events

#### Safety Flags:

- **Organ toxicity** - Liver, kidney, cardiac effects
- **Target safety liability** - Known on-target toxic effects
- **Off-target effects** - Unintended activity concerns
- **Clinical observations** - Adverse events from drugs targeting gene

### 3. Baseline Expression

Gene/protein expression across tissues and cell types from multiple sources.

#### Data Sources:

**Expression Atlas**
- RNA-Seq expression across tissues/conditions
- Normalized expression levels (TPM, FPKM)
- Differential expression studies

**GTEx (Genotype-Tissue Expression)**
- Comprehensive tissue expression from healthy donors
- Median TPM across 53 tissues
- Expression variation analysis

**Human Protein Atlas**
- Protein expression via immunohistochemistry
- Subcellular localization
- Tissue specificity classifications

#### Expression Metrics:

- **TPM (Transcripts Per Million)** - Normalized RNA abundance
- **Tissue specificity** - Enrichment in specific tissues
- **Protein level** - Correlation with RNA expression
- **Subcellular location** - Where protein is found in cell

### 4. Molecular Interactions

Protein-protein interactions, complex memberships, and molecular partnerships.

#### Interaction Types:

**Physical Interactions**
- Direct protein-protein binding
- Complex components
- Sources: IntAct, BioGRID, STRING

**Pathway Membership**
- Biological pathways from Reactome
- Functional relationships
- Upstream/downstream regulators

**Target Interactors**
- Direct interactors relevant to disease associations
- Context-specific interactions

### 5. Gene Essentiality

Dependency data indicating if gene is essential for cell survival.

#### Data Sources:

**Project Score**
- CRISPR-Cas9 fitness screens
- 300+ cancer cell lines
- Scaled essentiality scores (0-1)

**DepMap Portal**
- Large-scale cancer dependency data
- Genetic and pharmacological perturbations
- Common essential genes identification

#### Essentiality Metrics:

- **Score range**: 0 (non-essential) to 1 (essential)
- **Context**: Cell line specific vs. pan-essential
- **Therapeutic window**: Selectivity between disease and normal cells

### 6. Chemical Probes and Tool Compounds

High-quality small molecules for target validation.

#### Sources:

**Probes & Drugs Portal**
- Chemical probes with characterized selectivity
- Quality ratings and annotations
- Target engagement data

**Structural Genomics Consortium (SGC)**
- Target Enabling Packages (TEPs)
- Comprehensive target reagents
- Freely available to academia

**Probe Criteria:**
- Potency (typically IC50 < 100 nM)
- Selectivity (>30-fold vs. off-targets)
- Cell activity demonstrated
- Negative control available
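
The probe criteria above translate directly into a checklist. The following sketch is illustrative only: `ProbeCandidate` and its field names are hypothetical and do not correspond to any Open Targets schema, and the thresholds are the "typical" values quoted in the list rather than hard rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeCandidate:
    """Hypothetical record; field names are illustrative only."""
    name: str
    ic50_nm: float            # potency against the intended target, in nM
    fold_selectivity: float   # potency ratio versus the closest off-target
    active_in_cells: bool
    has_negative_control: bool


def unmet_probe_criteria(probe: ProbeCandidate) -> List[str]:
    """Return the criteria from the list above that the candidate fails."""
    unmet = []
    if not probe.ic50_nm < 100:
        unmet.append("potency")
    if not probe.fold_selectivity > 30:
        unmet.append("selectivity")
    if not probe.active_in_cells:
        unmet.append("cell activity")
    if not probe.has_negative_control:
        unmet.append("negative control")
    return unmet


# Made-up compound used purely to exercise the checklist.
candidate = ProbeCandidate("compound-X", ic50_nm=40.0, fold_selectivity=12.0,
                           active_in_cells=True, has_negative_control=False)
print(unmet_probe_criteria(candidate))
```

Returning the list of unmet criteria, rather than a single pass/fail flag, keeps the reason for rejection visible when triaging many compounds.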

### 7. Pharmacogenetics

Genetic variants affecting drug response for drugs targeting the gene.

#### Data Source: ClinPGx

**Information Included:**
- Variant-drug pairs
- Clinical annotations (dosing, efficacy, toxicity)
- Evidence level and sources
- PharmGKB cross-references

**Clinical Utility:**
- Dosing adjustments based on genotype
- Contraindications for specific variants
- Efficacy predictors

### 8. Genetic Constraint

Measures of negative selection against variants in the gene.

#### Data Source: gnomAD

**Metrics:**

**pLI (probability of Loss-of-function Intolerance)**
- Range: 0-1
- pLI > 0.9 indicates intolerant to LoF variants
- High pLI suggests essentiality

**LOEUF (Loss-of-function Observed/Expected Upper bound Fraction)**
- Lower values indicate greater constraint
- More interpretable than pLI across range

**Missense Constraint**
- Z-scores for missense depletion
- O/E ratios for missense variants

**Interpretation:**
- High constraint suggests important biological function
- May indicate safety concerns if inhibited
- Essential genes often show high constraint
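
The pLI threshold can be applied to the rows returned by the `geneticConstraint` field. This sketch rests on an assumption that should be checked against the schema: that the row with `constraintType` equal to `"lof"` carries pLI in its `score` field. `lof_intolerant` is a hypothetical helper, and the sample rows are invented.

```python
from typing import Dict, List, Optional


def lof_intolerant(constraints: List[Dict],
                   threshold: float = 0.9) -> Optional[bool]:
    """True/False from the LoF row's score, or None when no LoF row exists.

    Assumes (unverified) that the "lof" row's `score` holds pLI.
    """
    for row in constraints:
        if row.get("constraintType") == "lof" and row.get("score") is not None:
            return row["score"] > threshold
    return None


# Invented rows shaped like the geneticConstraint selection used in this document.
sample_constraints = [
    {"constraintType": "syn", "score": 0.1, "exp": 100.0, "obs": 98},
    {"constraintType": "lof", "score": 0.97, "exp": 40.0, "obs": 3},
]
print(lof_intolerant(sample_constraints))
```

Returning `None` when no LoF row is present keeps "no data" distinct from "tolerant", which matters when the result feeds a safety assessment.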

### 9. Comparative Genomics

Cross-species gene conservation and ortholog information.

#### Data Source: Ensembl Compara

**Ortholog Data:**
- Mouse, rat, zebrafish, other model organisms
- Orthology confidence (1:1, 1:many, many:many)
- Percent identity and similarity

**Utility:**
- Model organism studies transferability
- Functional conservation assessment
- Evolution and selective pressure

### 10. Cancer Annotations

Cancer-specific target features for oncology indications.

#### Data Sources:

**Cancer Gene Census**
- Role in cancer (oncogene, TSG, fusion)
- Tier classification (1 = established, 2 = emerging)
- Tumor types and mutation types

**Cancer Hallmarks**
- Functional roles in cancer biology
- Hallmarks: proliferation, apoptosis evasion, metastasis, etc.
- Links to specific cancer processes

**Oncology Clinical Trials**
- Drugs in development targeting gene for cancer
- Trial phases and indications

### 11. Mouse Phenotypes

Phenotypes from mouse knockout/mutation studies.

#### Data Source: MGI (Mouse Genome Informatics)

**Phenotype Data:**
- Knockout phenotypes
- Disease model associations
- Mammalian Phenotype Ontology (MP) terms

**Utility:**
- Predict on-target effects
- Safety liability identification
- Mechanism of action insights

### 12. Pathways

Biological pathway annotations placing target in functional context.

#### Data Source: Reactome

**Pathway Information:**
- Curated biological pathways
- Hierarchical organization
- Pathway diagrams with target position

**Applications:**
- Mechanism hypothesis generation
- Related target identification
- Systems biology analysis

## Using Target Annotations in Queries

### Query Template: Comprehensive Target Profile

```python
query = """
  query targetProfile($ensemblId: String!) {
    target(ensemblId: $ensemblId) {
      id
      approvedSymbol
      approvedName
      biotype

      # Tractability
      tractability {
        label
        modality
        value
      }

      # Safety
      safetyLiabilities {
        event
        effects {
          dosing
          organsAffected
        }
      }

      # Expression
      expressions {
        tissue {
          label
        }
        rna {
          value
          level
        }
        protein {
          level
        }
      }

      # Chemical probes
      chemicalProbes {
        id
        probeminer
        origin
      }

      # Known drugs
      knownDrugs {
        uniqueDrugs
        rows {
          drug {
            name
            maximumClinicalTrialPhase
          }
          phase
          status
        }
      }

      # Genetic constraint
      geneticConstraint {
        constraintType
        score
        exp
        obs
      }

      # Pathways
      pathways {
        pathway
        pathwayId
      }
    }
  }
"""

variables = {"ensemblId": "ENSG00000157764"}
```
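
The `tractability` part of such a profile is a flat list, so grouping it by modality makes it easier to read. The sketch below assumes each entry has the three fields selected in the query (`label`, `modality`, `value`) and that `value` is a boolean; `tractable_labels` is a hypothetical helper, and the modality codes and labels in the sample are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def tractable_labels(tractability: List[Dict]) -> Dict[str, List[str]]:
    """Group the labels whose `value` is true by modality."""
    grouped = defaultdict(list)
    for entry in tractability:
        if entry.get("value"):
            grouped[entry["modality"]].append(entry["label"])
    return dict(grouped)


# Illustrative entries shaped like the tractability selection above.
sample_tractability = [
    {"label": "Approved Drug", "modality": "SM", "value": True},
    {"label": "Structure with Ligand", "modality": "SM", "value": False},
    {"label": "UniProt loc high conf", "modality": "AB", "value": True},
]
print(tractable_labels(sample_tractability))
```

Modalities with no true entries are simply absent from the result, which corresponds to the "Unknown" tractability level described earlier.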

## Annotation Interpretation Guidelines

### For Target Prioritization:

1. **Druggability (Tractability):**
   - Clinical precedence >> Discovery precedence > Predicted
   - Consider modality relevant to therapeutic approach
   - Check for existing tool compounds

2. **Safety Assessment:**
   - Review organ toxicity signals
   - Check expression in critical tissues
   - Assess genetic constraint (high = safety concern if inhibited)
   - Evaluate clinical adverse events from drugs

3. **Disease Relevance:**
   - Combine with association scores
   - Check expression in disease-relevant tissues
   - Review pathway context

4. **Validation Readiness:**
   - Chemical probes available?
   - Model organism data supportive?
   - Known drugs provide mechanism insight?

5. **Clinical Path Considerations:**
   - Pharmacogenetic factors
   - Expression pattern (tissue-specific is better for selectivity)
   - Essentiality (non-essential better for safety)

### Red Flags:

- **High essentiality + ubiquitous expression** - Poor therapeutic window
- **Multiple safety liabilities** - Toxicity concerns
- **High genetic constraint (pLI > 0.9)** - Critical gene, inhibition may be harmful
- **No tractability precedence** - Higher risk, longer development
- **Conflicting evidence** - Requires deeper investigation

### Green Flags:

- **Clinical precedence + related indication** - De-risked mechanism
- **Tissue-specific expression** - Better selectivity
- **Chemical probes available** - Faster validation
- **Low essentiality + disease relevance** - Good therapeutic window
- **Multiple evidence types converge** - Higher confidence
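
Several of the red flags can be screened mechanically once a target's annotations have been summarised. The sketch below is one possible encoding, not an Open Targets feature: `TargetSummary` and its fields are hypothetical, and because the text gives no numeric definition of "high essentiality", the cutoff is a required argument that the caller must choose. Only the pLI threshold of 0.9 comes from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TargetSummary:
    """Hypothetical per-target summary assembled from earlier queries."""
    pli: float
    essentiality: float              # 0 (non-essential) to 1 (essential)
    ubiquitous_expression: bool
    safety_liability_count: int
    has_tractability_precedence: bool


def red_flags(t: TargetSummary, essentiality_cutoff: float) -> List[str]:
    """List the red flags that apply; the cutoff is caller-supplied."""
    flags = []
    if t.essentiality >= essentiality_cutoff and t.ubiquitous_expression:
        flags.append("High essentiality + ubiquitous expression")
    if t.safety_liability_count > 1:
        flags.append("Multiple safety liabilities")
    if t.pli > 0.9:
        flags.append("High genetic constraint (pLI > 0.9)")
    if not t.has_tractability_precedence:
        flags.append("No tractability precedence")
    return flags


# Invented values used only to exercise the function.
example_target = TargetSummary(pli=0.95, essentiality=0.2,
                               ubiquitous_expression=True,
                               safety_liability_count=3,
                               has_tractability_precedence=True)
print(red_flags(example_target, essentiality_cutoff=0.8))
```

"Conflicting evidence" is left out deliberately, since it calls for judgment across sources rather than a threshold.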