zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills


Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00


ETE Toolkit API Reference

Overview

ETE (Environment for Tree Exploration) is a Python toolkit for phylogenetic tree manipulation, analysis, and visualization. This reference covers the main classes and methods.

Core Classes

TreeNode (alias: Tree)

The fundamental class representing tree structures with hierarchical node organization.

Constructor:

from ete3 import Tree
t = Tree(newick=None, format=0, dist=None, support=None, name=None)

Parameters:

newick: Newick string or file path
format: Newick format code (0-9, or 100). Common formats:
- 0: Flexible, with support values on internal nodes (default)
- 1: Flexible, with internal node names
- 2: All branch lengths, leaf names, and internal support values
- 5: All branch lengths and leaf names
- 8: All names (leaf and internal), no branch lengths
- 9: Leaf names only
- 100: Topology only
dist: Branch length to parent (default: 1.0)
support: Bootstrap/confidence value (default: 1.0)
name: Node identifier

PhyloTree

Specialized class for phylogenetic analysis, extending TreeNode.

Constructor:

from ete3 import PhyloTree
t = PhyloTree(newick=None, alignment=None, alg_format='fasta',
              sp_naming_function=None, format=0)

Additional Parameters:

alignment: Path to alignment file or alignment string
alg_format: 'fasta', 'phylip', or 'iphylip' (interleaved PHYLIP)
sp_naming_function: Custom function to extract species from node names

ClusterTree

Class for hierarchical clustering analysis.

Constructor:

from ete3 import ClusterTree
t = ClusterTree(newick, text_array=None)

Parameters:

text_array: Tab-delimited matrix with column headers and row names

NCBITaxa

Class for NCBI taxonomy database operations.

Constructor:

from ete3 import NCBITaxa
ncbi = NCBITaxa(dbfile=None)

The first instantiation downloads the NCBI taxonomy data and builds a local database (~300MB) at ~/.etetoolkit/taxa.sqlite. Later instantiations reuse that file.

Node Properties

Basic Attributes

Property	Type	Description	Default
`name`	str	Node identifier	"" (empty string; "NoName" in ETE 2)
`dist`	float	Branch length to parent	1.0
`support`	float	Bootstrap/confidence value	1.0
`up`	TreeNode	Parent node reference	None
`children`	list	Child nodes	[]

Custom Features

Add any custom data to nodes:

node.add_feature("custom_name", value)
node.add_features(feature1=value1, feature2=value2)

Access features:

value = node.custom_name
# or
value = getattr(node, "custom_name", default_value)

Navigation & Traversal

Basic Navigation

# Check node type
node.is_leaf()          # Returns True if terminal node
node.is_root()          # Returns True if root node
len(node)               # Number of leaves under node

# Get relatives
parent = node.up
children = node.children
root = node.get_tree_root()

Traversal Strategies

# Three traversal strategies
for node in tree.traverse("preorder"):    # Root → Left → Right
    print(node.name)

for node in tree.traverse("postorder"):   # Left → Right → Root
    print(node.name)

for node in tree.traverse("levelorder"):  # Level by level
    print(node.name)

# Exclude root
for node in tree.iter_descendants("postorder"):
    print(node.name)

Getting Nodes

# Get all leaves
leaves = tree.get_leaves()
for leaf in tree:  # Shortcut iteration
    print(leaf.name)

# Get all descendants
descendants = tree.get_descendants()

# Get ancestors
ancestors = node.get_ancestors()

# Get specific nodes by attribute
nodes = tree.search_nodes(name="NodeA")
node = tree & "NodeA"  # Shortcut syntax

# Get leaves by name
leaves = tree.get_leaves_by_name("LeafA")

# Get common ancestor
ancestor = tree.get_common_ancestor("LeafA", "LeafB", "LeafC")

# Custom filtering
filtered = [n for n in tree.traverse() if n.dist > 0.5 and n.is_leaf()]

Iterator Methods (Memory Efficient)

# For large trees, use iterators
for match in tree.iter_search_nodes(name="X"):
    if match.is_leaf():  # Any stopping condition
        break  # Stop early

for leaf in tree.iter_leaves():
    process(leaf)

for descendant in node.iter_descendants():
    process(descendant)

Tree Construction & Modification

Creating Trees from Scratch

# Empty tree
t = Tree()

# Add children
child1 = t.add_child(name="A", dist=1.0)
child2 = t.add_child(name="B", dist=2.0)

# Add siblings
sister = child1.add_sister(name="C", dist=1.5)

# Populate with random topology
t.populate(10)  # Creates 10 random leaves
t.populate(5, names_library=["A", "B", "C", "D", "E"])

Removing & Deleting Nodes

# Detach: removes entire subtree
node.detach()
# or
parent.remove_child(node)

# Delete: removes node, reconnects children to parent
# (remove_child() is not an equivalent: it drops the whole subtree)
node.delete()

Pruning

Keep only specified leaves:

# Keep only these leaves, remove all others
tree.prune(["A", "B", "C"])

# Preserve original branch lengths
tree.prune(["A", "B", "C"], preserve_branch_length=True)

Tree Concatenation

# Attach one tree as child of another
t1 = Tree("(A,(B,C));")
t2 = Tree("((D,E),(F,G));")
A = t1 & "A"
A.add_child(t2)

Tree Copying

# Four copy methods
copy1 = tree.copy()  # Default: cpickle (preserves types)
copy2 = tree.copy("newick")  # Fastest: basic topology
copy3 = tree.copy("newick-extended")  # Includes custom features as text
copy4 = tree.copy("deepcopy")  # Slowest: handles complex objects

Tree Operations

Rooting

# Set outgroup (reroot tree)
outgroup_node = tree & "OutgroupLeaf"
tree.set_outgroup(outgroup_node)

# Midpoint rooting
midpoint = tree.get_midpoint_outgroup()
tree.set_outgroup(midpoint)

# Unroot tree
tree.unroot()

Resolving Polytomies

# Resolve multifurcations to bifurcations
tree.resolve_polytomy(recursive=False)  # Single node only
tree.resolve_polytomy(recursive=True)   # Entire tree

Ladderize

# Sort branches by size
tree.ladderize()
tree.ladderize(direction=1)  # Ascending order

Convert to Ultrametric

# Make all leaves equidistant from root
tree.convert_to_ultrametric()
tree.convert_to_ultrametric(tree_length=100)  # Specific total length

Distance & Comparison

Distance Calculations

# Branch length distance between nodes
dist = tree.get_distance("A", "B")
dist = nodeA.get_distance(nodeB)

# Topology-only distance (count nodes)
dist = tree.get_distance("A", "B", topology_only=True)

# Farthest node
farthest, distance = node.get_farthest_node()
farthest_leaf, distance = node.get_farthest_leaf()

Monophyly Testing

# Check if values form monophyletic group
is_mono, clade_type, breaking_leaves = tree.check_monophyly(
    values=["A", "B", "C"],
    target_attr="name"
)
# Returns: (bool, "monophyletic"|"paraphyletic"|"polyphyletic", set of leaves breaking monophyly)

# Get all monophyletic clades
monophyletic_nodes = tree.get_monophyletic(
    values=["A", "B", "C"],
    target_attr="name"
)

Tree Comparison

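# Robinson-Foulds distance
# ete3 also returns the edges discarded from each tree, so take the first five items
result = t1.robinson_foulds(t2)
rf, max_rf, common_leaves, parts_t1, parts_t2 = result[:5]
print(f"RF distance: {rf}/{max_rf}")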

# Normalized RF distance
result = t1.compare(t2)
norm_rf = result["norm_rf"]  # 0.0 to 1.0
ref_edges = result["ref_edges_in_source"]

Input/Output

Reading Trees

# From string
t = Tree("(A:1,(B:1,(C:1,D:1):0.5):0.5);")

# From file
t = Tree("tree.nw")

# With format
t = Tree("tree.nw", format=1)

Writing Trees

# To string
newick = tree.write()
newick = tree.write(format=1)
newick = tree.write(format=1, features=["support", "custom_feature"])

# To file
tree.write(outfile="output.nw")
tree.write(format=5, outfile="output.nw", features=["name", "dist"])

# Custom leaf function (for collapsing)
def is_leaf(node):
    return len(node) <= 3  # Treat small clades as leaves

newick = tree.write(is_leaf_fn=is_leaf)

Tree Rendering

# Show interactive GUI
tree.show()

# Render to file (PNG, PDF, SVG)
tree.render("tree.png")
tree.render("tree.pdf", w=200, units="mm")
tree.render("tree.svg", dpi=300)

# ASCII representation
print(tree)
print(tree.get_ascii(show_internal=True, compact=False))

Performance Optimization

Caching Content

For frequent access to node contents:

# Cache all node contents
node2content = tree.get_cached_content()

# Fast lookup
for node in tree.traverse():
    leaves = node2content[node]
    print(f"Node has {len(leaves)} leaves")

Precomputing Distances

# For multiple distance queries
node2dist = {}
for node in tree.traverse():
    node2dist[node] = node.get_distance(tree)

PhyloTree-Specific Methods

Sequence Alignment

# Link alignment
tree.link_to_alignment("alignment.fasta", alg_format="fasta")

# Access sequences
for leaf in tree:
    print(f"{leaf.name}: {leaf.sequence}")

Species Naming

# Default: first 3 letters
# Custom function
def get_species(node_name):
    return node_name.split("_")[0]

tree.set_species_naming_function(get_species)

# Manual setting (reuses get_species defined above)
for leaf in tree:
    leaf.species = get_species(leaf.name)

Evolutionary Events

# Detect duplication/speciation events
events = tree.get_descendant_evol_events()

for node in tree.traverse():
    if hasattr(node, "evoltype"):
        print(f"{node.name}: {node.evoltype}")  # "D" or "S"

# With species tree (reconciliation)
# Species tree leaves must resolve to the same species codes as the gene tree
species_tree = PhyloTree("(human,(chimp,gorilla));")
recon_tree, events = tree.reconcile(species_tree)

Gene Tree Operations

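# Get species trees from duplicated gene families
# Returns (number of trees, number of duplications, iterator over trees)
n_trees, n_dups, species_trees = tree.get_speciation_trees()

# Split by duplication events
subtrees = tree.split_by_dups()

# Collapse lineage-specific expansions (returns a pruned copy by default)
collapsed = tree.collapse_lineage_specific_expansions()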

NCBITaxa Methods

Database Operations

from ete3 import NCBITaxa
ncbi = NCBITaxa()

# Update database
ncbi.update_taxonomy_database()

Querying Taxonomy

# Get taxid from name
taxid = ncbi.get_name_translator(["Homo sapiens"])
# Returns: {'Homo sapiens': [9606]}

# Get name from taxid
names = ncbi.get_taxid_translator([9606, 9598])
# Returns: {9606: 'Homo sapiens', 9598: 'Pan troglodytes'}

# Get rank
rank = ncbi.get_rank([9606])
# Returns: {9606: 'species'}

# Get lineage
lineage = ncbi.get_lineage(9606)
# Returns: [1, 131567, 2759, ..., 9606]

# Get descendants
descendants = ncbi.get_descendant_taxa("Primates")
descendants = ncbi.get_descendant_taxa("Primates", collapse_subspecies=True)

Building Taxonomy Trees

# Get minimal tree connecting taxa
tree = ncbi.get_topology([9606, 9598, 9593])  # Human, chimp, gorilla

# Annotate tree with taxonomy
tree.annotate_ncbi_taxa()

# Access taxonomy info
for node in tree.traverse():
    print(f"{node.sci_name} ({node.taxid}) - Rank: {node.rank}")

ClusterTree Methods

Linking to Data

# Link matrix to tree
tree.link_to_arraytable(matrix_string)

# Access profiles
for leaf in tree:
    print(leaf.profile)  # Numerical array

Cluster Metrics

# Get silhouette coefficient
silhouette = tree.get_silhouette()

# Get Dunn index for a set of descendant cluster nodes
dunn = tree.get_dunn(tree.children)

# Inter/intra cluster distances
inter = node.intercluster_dist
intra = node.intracluster_dist

# Standard deviation
dev = node.deviation

Distance Metrics

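Supported metrics (functions in ete3.clustering.clustvalidation):

euclidean_dist: Euclidean distance
pearson_dist: Pearson correlation distance
spearman_dist: Spearman rank correlation distance

from ete3.clustering import clustvalidation
tree.set_distance_function(clustvalidation.pearson_dist)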

Common Error Handling

# Check if tree is empty
if tree.children:
    print("Tree has children")

# Check if node exists
nodes = tree.search_nodes(name="X")
if nodes:
    node = nodes[0]

# Safe feature access
value = getattr(node, "feature_name", default_value)

# Check format compatibility when parsing
from ete3.parser.newick import NewickError
try:
    tree = Tree("tree.nw", format=1)
except NewickError as err:
    print(f"Could not parse tree with format=1: {err}")

Best Practices

Use appropriate traversal: Postorder for bottom-up, preorder for top-down
Cache for repeated access: Use get_cached_content() for frequent queries
Use iterators for large trees: Memory-efficient processing
Preserve branch lengths: Use preserve_branch_length=True when pruning
Choose copy method wisely: "newick" for speed, "cpickle" for full fidelity
Validate monophyly: Check returned clade type (monophyletic/paraphyletic/polyphyletic)
Use PhyloTree for phylogenetics: Specialized methods for evolutionary analysis
Cache NCBI queries: Store results to avoid repeated database access
