Initial commit
skills/etetoolkit/references/api_reference.md
# ETE Toolkit API Reference

## Overview

ETE (Environment for Tree Exploration) is a Python toolkit for phylogenetic tree manipulation, analysis, and visualization. This reference covers the main classes and methods.

## Core Classes

### TreeNode (alias: Tree)

The fundamental class representing tree structures with hierarchical node organization.

**Constructor:**
```python
from ete3 import Tree
t = Tree(newick=None, format=0, dist=None, support=None, name=None)
```

**Parameters:**
- `newick`: Newick string or file path
- `format`: Newick format code (0-9, or 100). Common formats:
  - `0`: Flexible, with internal support values (default)
  - `1`: Flexible, with internal node names
  - `2`: All branch lengths, leaf names, and internal support values
  - `5`: All branch lengths and leaf names
  - `8`: All node names (leaf and internal), no branch lengths
  - `9`: Leaf names only
  - `100`: Topology only
- `dist`: Branch length to parent (default: 1.0)
- `support`: Bootstrap/confidence value (default: 1.0)
- `name`: Node identifier

### PhyloTree

Specialized class for phylogenetic analysis, extending TreeNode.

**Constructor:**
```python
from ete3 import PhyloTree
t = PhyloTree(newick=None, alignment=None, alg_format='fasta',
              sp_naming_function=None, format=0)
```

**Additional Parameters:**
- `alignment`: Path to alignment file or alignment string
- `alg_format`: 'fasta' or 'phylip'
- `sp_naming_function`: Custom function to extract species from node names

### ClusterTree

Class for hierarchical clustering analysis.

**Constructor:**
```python
from ete3 import ClusterTree
t = ClusterTree(newick, text_array=None)
```

**Parameters:**
- `text_array`: Tab-delimited matrix with column headers and row names

### NCBITaxa

Class for NCBI taxonomy database operations.

**Constructor:**
```python
from ete3 import NCBITaxa
ncbi = NCBITaxa(dbfile=None)
```

The first instantiation downloads the NCBI taxonomy database (~300MB) to `~/.etetoolkit/taxa.sqlite`.

## Node Properties

### Basic Attributes

| Property | Type | Description | Default |
|----------|------|-------------|---------|
| `name` | str | Node identifier | `""` (empty string) |
| `dist` | float | Branch length to parent | 1.0 |
| `support` | float | Bootstrap/confidence value | 1.0 |
| `up` | TreeNode | Parent node reference | None |
| `children` | list | Child nodes | [] |

### Custom Features

Add any custom data to nodes:
```python
node.add_feature("custom_name", value)
node.add_features(feature1=value1, feature2=value2)
```

Access features:
```python
value = node.custom_name
# or
value = getattr(node, "custom_name", default_value)
```

## Navigation & Traversal

### Basic Navigation

```python
# Check node type
node.is_leaf()        # Returns True if terminal node
node.is_root()        # Returns True if root node
len(node)             # Number of leaves under node

# Get relatives
parent = node.up
children = node.children
root = node.get_tree_root()
```

### Traversal Strategies

```python
# Three traversal strategies
for node in tree.traverse("preorder"):    # Root → Left → Right
    print(node.name)

for node in tree.traverse("postorder"):   # Left → Right → Root
    print(node.name)

for node in tree.traverse("levelorder"):  # Level by level
    print(node.name)

# Exclude root
for node in tree.iter_descendants("postorder"):
    print(node.name)
```

### Getting Nodes

```python
# Get all leaves
leaves = tree.get_leaves()
for leaf in tree:  # Shortcut iteration
    print(leaf.name)

# Get all descendants
descendants = tree.get_descendants()

# Get ancestors
ancestors = node.get_ancestors()

# Get specific nodes by attribute
nodes = tree.search_nodes(name="NodeA")
node = tree & "NodeA"  # Shortcut syntax

# Get leaves by name
leaves = tree.get_leaves_by_name("LeafA")

# Get common ancestor
ancestor = tree.get_common_ancestor("LeafA", "LeafB", "LeafC")

# Custom filtering
filtered = [n for n in tree.traverse() if n.dist > 0.5 and n.is_leaf()]
```

### Iterator Methods (Memory Efficient)

```python
# For large trees, use iterators
for match in tree.iter_search_nodes(name="X"):
    if some_condition:
        break  # Stop early

for leaf in tree.iter_leaves():
    process(leaf)

for descendant in node.iter_descendants():
    process(descendant)
```

## Tree Construction & Modification

### Creating Trees from Scratch

```python
# Empty tree
t = Tree()

# Add children
child1 = t.add_child(name="A", dist=1.0)
child2 = t.add_child(name="B", dist=2.0)

# Add siblings
sister = child1.add_sister(name="C", dist=1.5)

# Populate with random topology
t.populate(10)  # Creates 10 random leaves
t.populate(5, names_library=["A", "B", "C", "D", "E"])
```

### Removing & Deleting Nodes

```python
# Detach: removes the node together with its entire subtree
node.detach()
# or, equivalently, from the parent's side
parent.remove_child(node)

# Delete: removes only the node and reconnects its children to its parent
node.delete()
```

### Pruning

Keep only specified leaves:
```python
# Keep only these leaves, remove all others
tree.prune(["A", "B", "C"])

# Preserve original branch lengths
tree.prune(["A", "B", "C"], preserve_branch_length=True)
```

### Tree Concatenation

```python
# Attach one tree as child of another
t1 = Tree("(A,(B,C));")
t2 = Tree("((D,E),(F,G));")
A = t1 & "A"
A.add_child(t2)
```

### Tree Copying

```python
# Four copy methods
copy1 = tree.copy()                   # Default: cpickle (preserves types)
copy2 = tree.copy("newick")           # Fastest: basic topology
copy3 = tree.copy("newick-extended")  # Includes custom features as text
copy4 = tree.copy("deepcopy")         # Slowest: handles complex objects
```

## Tree Operations

### Rooting

```python
# Set outgroup (reroot tree)
outgroup_node = tree & "OutgroupLeaf"
tree.set_outgroup(outgroup_node)

# Midpoint rooting
midpoint = tree.get_midpoint_outgroup()
tree.set_outgroup(midpoint)

# Unroot tree
tree.unroot()
```

### Resolving Polytomies

```python
# Resolve multifurcations to bifurcations
tree.resolve_polytomy(recursive=False)  # Single node only
tree.resolve_polytomy(recursive=True)   # Entire tree
```

### Ladderize

```python
# Sort branches by partition size (smallest first by default)
tree.ladderize()
tree.ladderize(direction=1)  # Reversed order (largest first)
```

### Convert to Ultrametric

```python
# Make all leaves equidistant from root
tree.convert_to_ultrametric()
tree.convert_to_ultrametric(tree_length=100)  # Specific total length
```

## Distance & Comparison

### Distance Calculations

```python
# Branch length distance between nodes
dist = tree.get_distance("A", "B")
dist = nodeA.get_distance(nodeB)

# Topology-only distance (count nodes)
dist = tree.get_distance("A", "B", topology_only=True)

# Farthest node
farthest, distance = node.get_farthest_node()
farthest_leaf, distance = node.get_farthest_leaf()
```

### Monophyly Testing

```python
# Check if values form monophyletic group
is_mono, clade_type, breaking_leaves = tree.check_monophyly(
    values=["A", "B", "C"],
    target_attr="name"
)
# Returns: (bool, "monophyletic"|"paraphyletic"|"polyphyletic", leaves breaking monophyly)

# Get all monophyletic clades
monophyletic_nodes = tree.get_monophyletic(
    values=["A", "B", "C"],
    target_attr="name"
)
```

### Tree Comparison

```python
# Robinson-Foulds distance (ete3 returns a 7-element tuple)
(rf, max_rf, common_leaves, parts_t1, parts_t2,
 discarded_t1, discarded_t2) = t1.robinson_foulds(t2)
print(f"RF distance: {rf}/{max_rf}")

# Normalized RF distance
result = t1.compare(t2)
norm_rf = result["norm_rf"]  # 0.0 to 1.0
ref_edges = result["ref_edges_in_source"]
```

## Input/Output

### Reading Trees

```python
# From string
t = Tree("(A:1,(B:1,(C:1,D:1):0.5):0.5);")

# From file
t = Tree("tree.nw")

# With format
t = Tree("tree.nw", format=1)
```

### Writing Trees

```python
# To string
newick = tree.write()
newick = tree.write(format=1)
newick = tree.write(format=1, features=["support", "custom_feature"])

# To file
tree.write(outfile="output.nw")
tree.write(format=5, outfile="output.nw", features=["name", "dist"])

# Custom leaf function (for collapsing)
def is_leaf(node):
    return len(node) <= 3  # Treat small clades as leaves

newick = tree.write(is_leaf_fn=is_leaf)
```

### Tree Rendering

```python
# Show interactive GUI
tree.show()

# Render to file (PNG, PDF, SVG)
tree.render("tree.png")
tree.render("tree.pdf", w=200, units="mm")
tree.render("tree.svg", dpi=300)

# ASCII representation
print(tree)
print(tree.get_ascii(show_internal=True, compact=False))
```

## Performance Optimization

### Caching Content

For frequent access to node contents:
```python
# Cache all node contents
node2content = tree.get_cached_content()

# Fast lookup
for node in tree.traverse():
    leaves = node2content[node]
    print(f"Node has {len(leaves)} leaves")
```

### Precomputing Distances

```python
# For multiple distance queries
node2dist = {}
for node in tree.traverse():
    node2dist[node] = node.get_distance(tree)
```

## PhyloTree-Specific Methods

### Sequence Alignment

```python
# Link alignment
tree.link_to_alignment("alignment.fasta", alg_format="fasta")

# Access sequences
for leaf in tree:
    print(f"{leaf.name}: {leaf.sequence}")
```

### Species Naming

```python
# Default: the first 3 letters of the node name are used as the species code

# Custom function
def get_species(node_name):
    return node_name.split("_")[0]

tree.set_species_naming_function(get_species)

# Manual setting
for leaf in tree:
    leaf.species = get_species(leaf.name)
```

### Evolutionary Events

```python
# Detect duplication/speciation events
events = tree.get_descendant_evol_events()

for node in tree.traverse():
    if hasattr(node, "evoltype"):
        print(f"{node.name}: {node.evoltype}")  # "D" or "S"

# With species tree (tree reconciliation)
species_tree = PhyloTree("(human, (chimp, gorilla));")
recon_tree, events = tree.reconcile(species_tree)
```

### Gene Tree Operations

```python
# Get species trees from duplicated gene families
ntrees, ndups, species_trees = tree.get_speciation_trees()

# Split by duplication events
subtrees = tree.split_by_dups()

# Collapse lineage-specific expansions (returns a new tree by default)
collapsed = tree.collapse_lineage_specific_expansions()
```

## NCBITaxa Methods

### Database Operations

```python
from ete3 import NCBITaxa
ncbi = NCBITaxa()

# Update database
ncbi.update_taxonomy_database()
```

### Querying Taxonomy

```python
# Get taxid from name
taxid = ncbi.get_name_translator(["Homo sapiens"])
# Returns: {'Homo sapiens': [9606]}

# Get name from taxid
names = ncbi.get_taxid_translator([9606, 9598])
# Returns: {9606: 'Homo sapiens', 9598: 'Pan troglodytes'}

# Get rank
rank = ncbi.get_rank([9606])
# Returns: {9606: 'species'}

# Get lineage
lineage = ncbi.get_lineage(9606)
# Returns: [1, 131567, 2759, ..., 9606]

# Get descendants
descendants = ncbi.get_descendant_taxa("Primates")
descendants = ncbi.get_descendant_taxa("Primates", collapse_subspecies=True)
```

### Building Taxonomy Trees

```python
# Get minimal tree connecting taxa (node names are the taxids)
tree = ncbi.get_topology([9606, 9598, 9593])  # Human, chimp, gorilla

# Annotate tree with taxonomy, reading each taxid from the node name
tree.annotate_ncbi_taxa(taxid_attr="name")

# Access taxonomy info
for node in tree.traverse():
    print(f"{node.sci_name} ({node.taxid}) - Rank: {node.rank}")
```

## ClusterTree Methods

### Linking to Data

```python
# Link matrix to tree
tree.link_to_arraytable(matrix_string)

# Access profiles
for leaf in tree:
    print(leaf.profile)  # Numerical array
```

### Cluster Metrics

```python
# Get silhouette coefficient
silhouette = tree.get_silhouette()

# Get Dunn index for a set of descendant nodes treated as clusters
clusters = tree.children
dunn = tree.get_dunn(clusters)

# Inter/intra cluster distances
inter = node.intercluster_dist
intra = node.intracluster_dist

# Standard deviation
dev = node.deviation
```

### Distance Metrics

Supported metrics:
- `"euclidean"`: Euclidean distance
- `"pearson"`: Pearson correlation
- `"spearman"`: Spearman rank correlation

```python
tree.dist_to(node2, metric="pearson")
```

## Common Error Handling

```python
# Check if tree is empty
if tree.children:
    print("Tree has children")

# Check if node exists
nodes = tree.search_nodes(name="X")
if nodes:
    node = nodes[0]

# Safe feature access
value = getattr(node, "feature_name", default_value)

# Check format compatibility
try:
    tree.write(format=1)
except Exception as exc:  # avoid a bare except, which also traps KeyboardInterrupt
    print(f"Could not write tree in format 1: {exc}")
```

## Best Practices

1. **Use appropriate traversal**: Postorder for bottom-up, preorder for top-down
2. **Cache for repeated access**: Use `get_cached_content()` for frequent queries
3. **Use iterators for large trees**: Memory-efficient processing
4. **Preserve branch lengths**: Use `preserve_branch_length=True` when pruning
5. **Choose copy method wisely**: "newick" for speed, "cpickle" for full fidelity
6. **Validate monophyly**: Check returned clade type (monophyletic/paraphyletic/polyphyletic)
7. **Use PhyloTree for phylogenetics**: Specialized methods for evolutionary analysis
8. **Cache NCBI queries**: Store results to avoid repeated database access