# Target Annotations and Features

## Overview

Open Targets defines a target as "any naturally-occurring molecule that can be targeted by a medicinal product." Targets are primarily protein-coding genes identified by Ensembl gene IDs, but also include RNAs and pseudogenes from canonical chromosomes.

## Core Target Annotations

### 1. Tractability Assessment

Tractability evaluates the druggability potential of a target across different modalities.

#### Modalities Assessed:

**Small Molecule**
- Prediction of small molecule druggability
- Based on structural features and chemical precedence
- Buckets: Clinical precedence, Discovery precedence, Predicted tractable

**Antibody**
- Likelihood of antibody-based therapeutic success
- Cell surface or secreted protein location
- Precedence categories similar to small molecules

**PROTAC (Protein Degradation)**
- Assessment for targeted protein degradation
- E3 ligase compatibility
- Emerging modality category

**Other Modalities**
- Gene therapy, RNA-based therapeutics
- Oligonucleotide approaches

#### Tractability Levels:

1. **Clinical Precedence** - Target of an approved drug or clinical candidate with a similar mechanism
2. **Discovery Precedence** - Target of tool compounds or compounds in preclinical development
3. **Predicted Tractable** - Computational predictions suggest druggability
4. **Unknown** - Insufficient data to assess

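The ordering of the tractability levels can be made explicit in code. The following is a minimal sketch that keeps the strongest level seen for each modality. The input shape (dicts with `modality` and `level` keys), the `LEVEL_RANK` table, and the function name `best_level_per_modality` are assumptions for illustration; they are not the Open Targets API response format.

```python
# Hypothetical ranking of the four tractability levels (lower = stronger evidence).
LEVEL_RANK = {
    "Clinical Precedence": 0,
    "Discovery Precedence": 1,
    "Predicted Tractable": 2,
    "Unknown": 3,
}


def best_level_per_modality(assessments):
    """Return the strongest tractability level seen for each modality."""
    best = {}
    for item in assessments:
        modality, level = item["modality"], item["level"]
        current = best.get(modality)
        # Keep the level with the lowest rank number, i.e. the strongest one.
        if current is None or LEVEL_RANK[level] < LEVEL_RANK[current]:
            best[modality] = level
    return best


# Illustrative input, not real data for any target.
example = [
    {"modality": "Small Molecule", "level": "Predicted Tractable"},
    {"modality": "Small Molecule", "level": "Clinical Precedence"},
    {"modality": "Antibody", "level": "Unknown"},
]
print(best_level_per_modality(example))
```

Ranking per modality rather than overall keeps the result usable when the therapeutic approach is already fixed, for example when only antibody tractability matters.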
### 2. Safety Liabilities

Safety information is aggregated from multiple sources to identify potential toxicity concerns.

#### Data Sources:

**ToxCast**
- High-throughput toxicology screening data
- In vitro assay results
- Toxicity pathway activation

**AOPWiki (Adverse Outcome Pathways)**
- Mechanistic pathways from molecular initiating event to adverse outcome
- Systems toxicology frameworks

**PharmGKB**
- Pharmacogenomic relationships
- Genetic variants affecting drug response and toxicity

**Published Literature**
- Expert-curated safety concerns from publications
- Clinical trial adverse events

#### Safety Flags:

- **Organ toxicity** - Liver, kidney, cardiac effects
- **Target safety liability** - Known on-target toxic effects
- **Off-target effects** - Unintended activity concerns
- **Clinical observations** - Adverse events from drugs targeting the gene

### 3. Baseline Expression

Gene and protein expression across tissues and cell types, drawn from multiple sources.

#### Data Sources:

**Expression Atlas**
- RNA-Seq expression across tissues/conditions
- Normalized expression levels (TPM, FPKM)
- Differential expression studies

**GTEx (Genotype-Tissue Expression)**
- Comprehensive tissue expression from healthy donors
- Median TPM per tissue (53 tissues in GTEx v7, 54 in v8)
- Expression variation analysis

**Human Protein Atlas**
- Protein expression via immunohistochemistry
- Subcellular localization
- Tissue specificity classifications

#### Expression Metrics:

- **TPM (Transcripts Per Million)** - Normalized RNA abundance
- **Tissue specificity** - Enrichment in specific tissues
- **Protein level** - Protein abundance, which can be compared with RNA expression
- **Subcellular location** - Where the protein is found in the cell

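For readers unfamiliar with TPM, the normalization can be sketched in a few lines: each read count is divided by the transcript length, and the resulting rates are scaled so that they sum to one million within a sample. The function name `tpm` and the toy numbers below are illustrative assumptions; the expression sources listed above provide precomputed values.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to TPM, given transcript lengths in kilobases."""
    # Length-normalize first, so long transcripts are not favored.
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    # Scale so that all values in the sample sum to one million.
    return [rate / total * 1_000_000 for rate in rates]


# Toy sample with three transcripts (made-up numbers).
values = tpm(counts=[100, 200, 300], lengths_kb=[1.0, 2.0, 3.0])
print(values)
```

Because every sample sums to the same total, TPM values can be compared across tissues more directly than raw counts.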
### 4. Molecular Interactions

Protein-protein interactions, complex memberships, and molecular partnerships.

#### Interaction Types:

**Physical Interactions**
- Direct protein-protein binding
- Complex components
- Sources: IntAct, BioGRID, STRING

**Pathway Membership**
- Biological pathways from Reactome
- Functional relationships
- Upstream/downstream regulators

**Target Interactors**
- Direct interactors relevant to disease associations
- Context-specific interactions

### 5. Gene Essentiality

Dependency data indicating whether a gene is essential for cell survival.

#### Data Sources:

**Project Score**
- CRISPR-Cas9 fitness screens
- 300+ cancer cell lines
- Scaled essentiality scores (0-1)

**DepMap Portal**
- Large-scale cancer dependency data
- Genetic and pharmacological perturbations
- Common essential genes identification

#### Essentiality Metrics:

- **Score range**: 0 (non-essential) to 1 (essential)
- **Context**: Cell line specific vs. pan-essential
- **Therapeutic window**: Selectivity between disease and normal cells

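The distinction between pan-essential and context-specific genes can be illustrated with a small helper that looks at per-cell-line scores on the 0 to 1 scale. This is only a sketch: the 0.5 score cutoff, the 90% fraction, and the function name `classify_essentiality` are assumptions chosen for the example, not thresholds defined by Project Score or DepMap.

```python
def classify_essentiality(scores_by_cell_line, cutoff=0.5, pan_fraction=0.9):
    """Label a gene from its per-cell-line essentiality scores (0 to 1)."""
    if not scores_by_cell_line:
        return "unknown"
    # Count the cell lines in which the gene scores at or above the cutoff.
    essential_lines = [
        name for name, score in scores_by_cell_line.items() if score >= cutoff
    ]
    fraction = len(essential_lines) / len(scores_by_cell_line)
    if fraction >= pan_fraction:
        return "pan-essential"
    if fraction > 0:
        return "context-specific"
    return "non-essential"


# Made-up scores for three hypothetical cell lines.
print(classify_essentiality({"line_a": 0.92, "line_b": 0.10, "line_c": 0.05}))
```

A context-specific result is the more interesting case for oncology, since dependency in a subset of cell lines hints at a usable therapeutic window.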
### 6. Chemical Probes and Tool Compounds

High-quality small molecules for target validation.

#### Sources:

**Probes & Drugs Portal**
- Chemical probes with characterized selectivity
- Quality ratings and annotations
- Target engagement data

**Structural Genomics Consortium (SGC)**
- Target Enabling Packages (TEPs)
- Comprehensive target reagents
- Freely available to academia

**Probe Criteria:**
- Potency (typically IC50 < 100 nM)
- Selectivity (>30-fold vs. off-targets)
- Cell activity demonstrated
- Negative control available

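The probe criteria above translate directly into a checklist. The sketch below reports which criteria a candidate compound fails; the `ProbeCandidate` class, its field names, and the example compound are hypothetical and do not correspond to any portal's data model.

```python
from dataclasses import dataclass


@dataclass
class ProbeCandidate:
    """Hypothetical record describing a candidate chemical probe."""
    name: str
    ic50_nm: float           # potency against the intended target
    selectivity_fold: float  # fold selectivity over the closest off-target
    cell_active: bool        # activity demonstrated in cells
    has_negative_control: bool


def failed_criteria(probe):
    """Return the names of the probe criteria that the candidate does not meet."""
    failures = []
    if not probe.ic50_nm < 100:
        failures.append("potency")
    if not probe.selectivity_fold > 30:
        failures.append("selectivity")
    if not probe.cell_active:
        failures.append("cell activity")
    if not probe.has_negative_control:
        failures.append("negative control")
    return failures


# Made-up compound: potent and cell-active, but not selective enough.
candidate = ProbeCandidate("compound-X", ic50_nm=40, selectivity_fold=12,
                           cell_active=True, has_negative_control=True)
print(failed_criteria(candidate))
```

Returning the list of failures rather than a single boolean makes it clear why a compound falls short, which is usually what matters when choosing between tool compounds.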
### 7. Pharmacogenetics

Genetic variants that affect the response to drugs targeting the gene.

#### Data Source: ClinPGx

**Information Included:**
- Variant-drug pairs
- Clinical annotations (dosing, efficacy, toxicity)
- Evidence level and sources
- PharmGKB cross-references

**Clinical Utility:**
- Dosing adjustments based on genotype
- Contraindications for specific variants
- Efficacy predictors

### 8. Genetic Constraint

Measures of negative selection against variants in the gene.

#### Data Source: gnomAD

**Metrics:**

**pLI (probability of Loss-of-function Intolerance)**
- Range: 0-1
- pLI > 0.9 indicates intolerance to loss-of-function (LoF) variants
- High pLI suggests essentiality

**LOEUF (Loss-of-function Observed/Expected Upper bound Fraction)**
- Lower values indicate greater constraint
- Continuous metric, so easier to interpret than pLI across its range

**Missense Constraint**
- Z-scores for missense depletion
- O/E ratios for missense variants

**Interpretation:**
- High constraint suggests important biological function
- May indicate safety concerns if inhibited
- Essential genes often show high constraint

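A small helper can make the pLI and LOEUF readings concrete. In this sketch the pLI cutoff of 0.9 comes from the text above, while the LOEUF cutoff of 0.35 is an assumption (a commonly used threshold for gnomAD v2 data) and should be adjusted for the release in use. The function name `is_lof_constrained` is hypothetical.

```python
def is_lof_constrained(pli=None, loeuf=None, pli_cutoff=0.9, loeuf_cutoff=0.35):
    """Return True if either metric suggests LoF constraint, None if no data."""
    if pli is None and loeuf is None:
        return None
    # High pLI and low LOEUF both point toward intolerance of LoF variants.
    by_pli = pli is not None and pli > pli_cutoff
    by_loeuf = loeuf is not None and loeuf < loeuf_cutoff
    return by_pli or by_loeuf


# Made-up values for illustration.
print(is_lof_constrained(pli=0.98, loeuf=0.21))
print(is_lof_constrained(pli=0.02, loeuf=0.95))
```

Returning `None` when both metrics are missing keeps "no data" distinct from "not constrained", which matters when many genes lack constraint estimates.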
### 9. Comparative Genomics

Cross-species gene conservation and ortholog information.

#### Data Source: Ensembl Compara

**Ortholog Data:**
- Mouse, rat, zebrafish, other model organisms
- Orthology type (1:1, 1:many, many:many)
- Percent identity and similarity

**Utility:**
- Transferability of model organism studies
- Functional conservation assessment
- Evolution and selective pressure

### 10. Cancer Annotations

Cancer-specific target features for oncology indications.

#### Data Sources:

**Cancer Gene Census**
- Role in cancer (oncogene, tumor suppressor gene (TSG), fusion)
- Tier classification (1 = established, 2 = emerging)
- Tumor types and mutation types

**Cancer Hallmarks**
- Functional roles in cancer biology
- Hallmarks: proliferation, apoptosis evasion, metastasis, etc.
- Links to specific cancer processes

**Oncology Clinical Trials**
- Drugs in development that target the gene in cancer
- Trial phases and indications

### 11. Mouse Phenotypes

Phenotypes from mouse knockout/mutation studies.

#### Data Source: MGI (Mouse Genome Informatics)

**Phenotype Data:**
- Knockout phenotypes
- Disease model associations
- Mammalian Phenotype Ontology (MP) terms

**Utility:**
- Predict on-target effects
- Safety liability identification
- Mechanism of action insights

### 12. Pathways

Biological pathway annotations that place the target in functional context.

#### Data Source: Reactome

**Pathway Information:**
- Curated biological pathways
- Hierarchical organization
- Pathway diagrams with target position

**Applications:**
- Mechanism hypothesis generation
- Related target identification
- Systems biology analysis

## Using Target Annotations in Queries

### Query Template: Comprehensive Target Profile

```python
query = """
  query targetProfile($ensemblId: String!) {
    target(ensemblId: $ensemblId) {
      id
      approvedSymbol
      approvedName
      biotype

      # Tractability
      tractability {
        label
        modality
        value
      }

      # Safety
      safetyLiabilities {
        event
        effects {
          dosing
          organsAffected
        }
      }

      # Expression
      expressions {
        tissue {
          label
        }
        rna {
          value
          level
        }
        protein {
          level
        }
      }

      # Chemical probes
      chemicalProbes {
        id
        probeminer
        origin
      }

      # Known drugs
      knownDrugs {
        uniqueDrugs
        rows {
          drug {
            name
            maximumClinicalTrialPhase
          }
          phase
          status
        }
      }

      # Genetic constraint
      geneticConstraint {
        constraintType
        score
        exp
        obs
      }

      # Pathways
      pathways {
        pathway
        pathwayId
      }
    }
  }
"""

# ENSG00000157764 is the Ensembl gene ID for BRAF
variables = {"ensemblId": "ENSG00000157764"}
```

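A query string and its variables still need to be sent to a GraphQL endpoint. The sketch below shows one way to do that with only the Python standard library, assuming the public Open Targets Platform endpoint at `https://api.platform.opentargets.org/api/v4/graphql`. The helper names `build_request` and `run_query` are hypothetical, and a short demo query is used so that the block stands alone. Field names in any query should be checked against the current schema, which changes between releases.

```python
import json
import urllib.request

# Assumed endpoint of the public Open Targets Platform GraphQL API.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"


def build_request(query, variables, url=API_URL):
    """Build a POST request carrying the GraphQL query and its variables."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables, url=API_URL, timeout=30):
    """Send the query and return the `data` part of the JSON response."""
    request = build_request(query, variables, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # GraphQL reports query problems in an `errors` list rather than via HTTP status.
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]


# Small demo query so the example is self-contained.
demo_query = """
  query demo($ensemblId: String!) {
    target(ensemblId: $ensemblId) {
      id
      approvedSymbol
    }
  }
"""
demo_request = build_request(demo_query, {"ensemblId": "ENSG00000157764"})
print(demo_request.get_method(), demo_request.full_url)
```

Calling `run_query(demo_query, {"ensemblId": "ENSG00000157764"})` performs the network request; the same function accepts the full profile query in place of the demo one.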
## Annotation Interpretation Guidelines

### For Target Prioritization:

1. **Druggability (Tractability):**
   - Clinical precedence >> Discovery precedence > Predicted
   - Consider modality relevant to therapeutic approach
   - Check for existing tool compounds

2. **Safety Assessment:**
   - Review organ toxicity signals
   - Check expression in critical tissues
   - Assess genetic constraint (high = safety concern if inhibited)
   - Evaluate clinical adverse events from drugs

3. **Disease Relevance:**
   - Combine with association scores
   - Check expression in disease-relevant tissues
   - Review pathway context

4. **Validation Readiness:**
   - Chemical probes available?
   - Model organism data supportive?
   - Known drugs provide mechanism insight?

5. **Clinical Path Considerations:**
   - Pharmacogenetic factors
   - Expression pattern (tissue-specific is better for selectivity)
   - Essentiality (non-essential better for safety)

### Red Flags:

- **High essentiality + ubiquitous expression** - Poor therapeutic window
- **Multiple safety liabilities** - Toxicity concerns
- **High genetic constraint (pLI > 0.9)** - Critical gene, inhibition may be harmful
- **No tractability precedence** - Higher risk, longer development
- **Conflicting evidence** - Requires deeper investigation

### Green Flags:

- **Clinical precedence + related indication** - De-risked mechanism
- **Tissue-specific expression** - Better selectivity
- **Chemical probes available** - Faster validation
- **Low essentiality + disease relevance** - Good therapeutic window
- **Multiple evidence types converge** - Higher confidence
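
Several of the red and green flags can be checked mechanically once a target's annotations are summarized. The following sketch assumes a hand-built `profile` dict; every key name, the 0.5 essentiality cutoff, and the reading of "multiple" as two or more are illustrative assumptions, and only the pLI > 0.9 threshold comes from the list above. Flags that need judgment, such as conflicting or converging evidence, are left out.

```python
def flag_target(profile, essential_cutoff=0.5):
    """Return red and green flags for a summarized target profile (hypothetical keys)."""
    red, green = [], []
    essential = profile.get("essentiality", 0.0) >= essential_cutoff
    level = profile.get("tractability_level", "Unknown")

    # Red flags
    if essential and profile.get("ubiquitous_expression", False):
        red.append("high essentiality + ubiquitous expression")
    if profile.get("safety_liability_count", 0) >= 2:
        red.append("multiple safety liabilities")
    if profile.get("pli", 0.0) > 0.9:
        red.append("high genetic constraint")
    if level in ("Predicted Tractable", "Unknown"):
        red.append("no tractability precedence")

    # Green flags
    if level == "Clinical Precedence" and profile.get("related_indication", False):
        green.append("clinical precedence + related indication")
    if profile.get("tissue_specific_expression", False):
        green.append("tissue-specific expression")
    if profile.get("has_chemical_probe", False):
        green.append("chemical probes available")
    if not essential and profile.get("disease_relevant", False):
        green.append("low essentiality + disease relevance")
    return {"red": red, "green": green}


# Made-up profile for illustration only.
example_profile = {
    "essentiality": 0.2,
    "pli": 0.95,
    "tractability_level": "Clinical Precedence",
    "related_indication": True,
    "has_chemical_probe": True,
    "disease_relevant": True,
}
print(flag_target(example_profile))
```

A target can carry both kinds of flags at once, as in the example, so the output is best read as a prompt for closer review rather than a verdict.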