Recently I was asked by a researcher for a family-level phylogenetic tree of ferns. The Fern Tree of Life (FTOL) project that I maintain generates a maximally sampled global fern phylogeny, but it is at the species level. So how can we go from that to a family-level tree?
Basically, it involves the following steps:
1. Load a list of all species in the tree and the name of the family each belongs to.
2. Check that each family is monophyletic or monotypic (this must be true for the next step to work).
3. Select a single exemplar species for each family (this could be any species within the family, as long as the family is monophyletic).
4. Trim the tree to only the exemplar species (one per family).
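To make the monophyly requirement in the steps above concrete, here is a minimal sketch on a toy tree using is.monophyletic() from ape. The tree and the species names (sp1 to sp5) are made up for illustration and have nothing to do with FTOL.

```r
library(ape)

# Toy rooted tree with hypothetical tip names
toy_tree <- read.tree(text = "((sp1:1,sp2:1):2,((sp3:1,sp4:1):1,sp5:2):1);")

# A "family" whose members form a complete clade is monophyletic
fam_a_ok <- is.monophyletic(toy_tree, c("sp1", "sp2"))

# A "family" whose members are scattered across the tree is not:
# the smallest clade containing sp2 and sp3 also contains other species
fam_b_ok <- is.monophyletic(toy_tree, c("sp2", "sp3"))

fam_a_ok
fam_b_ok
```

In the second case, no single exemplar could stand in for the family, because sp2 and sp3 sit in different parts of the tree.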
A few packages used here deserve special mention. The MonoPhy package is great at doing exactly what the name would suggest: checking for monophyly. I am a huge fan of the assertr package for making proactive assertions about data. In this case, the code would fail (issue an error) if the assumption of monophyletic/monotypic families did not hold. Finally, the ftolr package by yours truly provides the most recent fern tree and associated taxonomic data.
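As a rough illustration of how an assertr check behaves, the sketch below runs assert() with in_set() on two small, made-up data frames. The family names and the values are hypothetical; only the Monophyly column name mirrors what is checked in this post.

```r
library(assertr)

# Hypothetical monophyly summaries (family names are made up)
all_ok <- data.frame(
  family = c("FamA", "FamB"),
  Monophyly = c("Yes", "Monotypic")
)
one_bad <- data.frame(
  family = c("FamA", "FamB"),
  Monophyly = c("Yes", "No")
)

# When every value is in the allowed set, assert() returns the data,
# so it can sit in the middle of a pipeline
checked <- assert(all_ok, in_set("Yes", "Monotypic"), Monophyly)

# When any value falls outside the set, assert() throws an error by default;
# tryCatch() is used here only so the example can keep running
failed <- tryCatch(
  {
    assert(one_bad, in_set("Yes", "Monotypic"), Monophyly)
    FALSE
  },
  error = function(e) TRUE
)
```

Because the check lives inside the pipeline, a violated assumption stops the analysis right where it happens instead of silently producing a misleading tree.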
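Of course, this approach should work for any tree, provided two requirements are met: the higher-level taxa are all monophyletic or monotypic, and the tree is ultrametric (so that every species in a family is the same distance from the root, and the choice of exemplar does not change the branch lengths).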
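To show that nothing here is specific to ferns, below is a minimal sketch of the same idea on a toy tree, using only ape. The tree, the taxonomy table, and the helper prune_to_family() are all hypothetical and exist only for this example. It checks both requirements and then confirms that two different exemplar choices give the same family-level tree.

```r
library(ape)

# Toy ultrametric tree with hypothetical tip names
toy_tree <- read.tree(text = "((sp1:1,sp2:1):2,((sp3:1,sp4:1):1,sp5:2):1);")

# Hypothetical taxonomy: FamA and FamB have two species each, FamC has one
toy_taxonomy <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  family = c("FamA", "FamA", "FamB", "FamB", "FamC")
)

# Requirement 1: the tree is ultrametric
tree_is_ultrametric <- is.ultrametric(toy_tree)

# Requirement 2: each family is monotypic or monophyletic
tips_by_family <- split(toy_taxonomy$species, toy_taxonomy$family)
family_ok <- vapply(
  tips_by_family,
  function(tips) length(tips) == 1 || is.monophyletic(toy_tree, tips),
  logical(1)
)

# Hypothetical helper: prune to the exemplars, then relabel with family names
prune_to_family <- function(tree, exemplars, taxonomy) {
  pruned <- keep.tip(tree, exemplars)
  pruned$tip.label <- taxonomy$family[match(pruned$tip.label, taxonomy$species)]
  pruned
}

# Two different exemplar choices
tree_first <- prune_to_family(toy_tree, c("sp1", "sp3", "sp5"), toy_taxonomy)
tree_last <- prune_to_family(toy_tree, c("sp2", "sp4", "sp5"), toy_taxonomy)

# Compare topology, branch lengths, and tip labels
same_tree <- isTRUE(all.equal(tree_first, tree_last))
```

If the toy tree were not ultrametric, the two pruned trees could share a topology but differ in their terminal branch lengths.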
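# Load packages
library(tidyverse)
library(ftolr)
library(ape)
library(MonoPhy)
library(assertr)

# Check FTOL version and cutoff date
ft_data_ver()

# Load the ultrametric tree and matching taxonomy.
# Assumed setup: `phy` and `taxonomy`, used below, are taken to come from
# ftolr's ft_tree() and ftol_taxonomy; the arguments shown are one plausible choice
phy <- ft_tree(branch_len = "ultra", rooted = TRUE, drop_og = TRUE)
taxonomy <- ftol_taxonomy %>%
  # Keep only species that are tips in the tree
  filter(species %in% phy$tip.label)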
# Analyze monophyly of each family
family_mono_test <- AssessMonophyly(
  phy,
  as.data.frame(taxonomy[, c("species", "family")])
)

# Check that all families are monophyletic or monotypic
family_mono_summary <- family_mono_test$family$result %>%
  rownames_to_column("family") %>%
  as_tibble() %>%
  assert(in_set("Yes", "Monotypic"), Monophyly)

# Inspect:
family_mono_summary
# Get one exemplar tip (species) per family
rep_tips <- taxonomy %>%
  group_by(family) %>%
  slice(1) %>%
  ungroup()

# Subset phylogeny to one tip per family
phy_family <- ape::keep.tip(phy, rep_tips$species)

# Relabel with family names
new_tips <- tibble(species = phy_family$tip.label) %>%
  left_join(rep_tips, by = "species") %>%
  pull(family)
phy_family$tip.label <- new_tips

# Visualize tree
plot(ladderize(phy_family), no.margin = TRUE)