Title: | Open Software for Teaching Evolutionary Biology at Multiple Scales Through Virtual Inquiries |
---|---|
Description: | "Evolutionary Virtual Education" - 'evolved' - provides multiple tools to help educators (especially at the graduate level or in advanced undergraduate level courses) apply inquiry-based learning in general evolution classes. In particular, the tools provided include functions that simulate evolutionary processes (e.g., genetic drift, natural selection within a single locus) or concepts (e.g. Hardy-Weinberg equilibrium, phylogenetic distribution of traits). More than only simulating, the package also provides tools for students to analyze (e.g., measuring, testing, visualizing) datasets with characteristics that are common to many fields related to evolutionary biology. Importantly, the package is heavily oriented towards providing tools for inquiry-based learning - where students follow scientific practices to actively construct knowledge. For additional details, see package's vignettes. |
Authors: | Matheus Januario [aut, cph, cre] , Jennifer Auler [aut] , Andressa Viol [aut], Daniel Rabosky [aut] |
Maintainer: | Matheus Januario <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-12-20 04:26:33 UTC |
Source: | https://github.com/mjanuario/evolved |
Many ammonoid (Cephalopoda, Mollusca) fossil occurrences from different moments of the geological past. Much information (i.e., extra columns) was removed from the original dataset to make it more compact, but it can be fully accessed by the data URL. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
ammonoidea_fossil
A data.frame with 58111 rows and 13 columns.
Organism phylum
Organism taxonomic class
Organism taxonomic order
Organism taxonomic family
Organism genus
Organism specific name
Earlier known geological period of occurrence
Later known geological period of occurrence
Occurrence's oldest time boundary in million years
Occurrence's newest time boundary in million years
Midpoint between max_ma and min_ma
Longitude of place where occurrence was found. Follows decimal degree format.
Latitude of place where occurrence was found. Follows decimal degree format.
The Paleobiology Database (downloaded on 2022-03-11).
Data URL: http://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=Ammonoidea&show=full,classext,genus,subgenus,acconly,ident,img,etbasis,strat,lith,env,timebins,timecompare,resgroup,ref,ent,entname,crmod
Birds species list
birds_spp
A vector containing the names of almost all (i.e., 9993) extant bird species, following Jetz et al. (2012) taxonomy. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Actual file downloaded from https://vertlife.org/data/
Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K., & Mooers, A. O. (2012). The global diversity of birds in space and time. Nature, 491(7424), 444-448.
calcFossilDivTT calculates fossil diversity through time using different methods.
calcFossilDivTT( data, tax.lvl = "species", method = "rangethrough", bin.reso = 1 )
data |
A |
tax.lvl |
A |
method |
A |
bin.reso |
A |
A data.frame containing the diversity (column div) of the chosen taxonomic level through time, with the calculation based on method. If method = "rangethrough", the time moments are the layer boundaries given in data. If method = "stdmethod", the time moments are evenly spaced bins with length equal to bin.reso, starting at the earliest bound in the dataset.
Matheus Januario, Jennifer Auler
Foote, M., Miller, A. I., Raup, D. M., & Stanley, S. M. (2007). Principles of paleontology. Macmillan.
# Loading data
data("dinos_fossil")

# Using function:
div1 <- calcFossilDivTT(dinos_fossil, method = "stdmethod")
div2 <- calcFossilDivTT(dinos_fossil, method = "stdmethod", bin.reso = 10)

# Comparing different bin sizes in the standard method
plot(x = div1$age, y = div1$div, type = "l", xlab = "Time (Mya)", ylab = "Richness", xlim = rev(range(div1$age)), col = "red")
lines(x = div2$age, y = div2$div, col = "blue")

# Comparing different methods:
div3 <- calcFossilDivTT(dinos_fossil, method = "rangethrough")
plot(x = div1$age, y = div1$div, type = "l", xlab = "Time (Mya)", ylab = "Richness", xlim = rev(range(div1$age)), col = "red")
lines(x = div3$age, y = div3$div, col = "blue")
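The idea behind range-through counting can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy table `occ`, its column names (`taxon`, `max_ma`, `min_ma`) and the helper `range_through` are assumptions made for illustration. Each taxon is treated as present from its oldest to its youngest occurrence, and richness is counted at a set of time points.

```r
# Toy occurrence table (hypothetical data; ages in million years ago)
occ <- data.frame(
  taxon  = c("a", "a", "b", "b", "c"),
  max_ma = c(100, 80, 95, 70, 85),
  min_ma = c(95, 75, 90, 65, 80)
)

# First (oldest) and last (youngest) appearance of each taxon
first_app <- tapply(occ$max_ma, occ$taxon, max)
last_app  <- tapply(occ$min_ma, occ$taxon, min)

# A taxon counts at time t if t lies within its observed range
range_through <- function(t) sum(first_app >= t & last_app <= t)

# Evaluate richness every 5 million years, from oldest to youngest
ages <- seq(100, 65, by = -5)
div <- data.frame(age = ages, div = vapply(ages, range_through, integer(1)))
div
```

Counting within fixed-width bins instead would only require replacing the single time point `t` with a bin interval in the comparison.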
checkAndFixUltrametric finds and corrects small numerical errors that might appear in ultrametric trees that were created through simulations. This function should never be used as a formal statistical method to make a tree ultrametric, as it was designed just to correct small rounding errors.
checkAndFixUltrametric(phy)
phy |
A |
A checked and fixed phylo object.
Daniel Rabosky, Matheus Januario, Jennifer Auler
Paradis, E. (2012). Analysis of Phylogenetics and Evolution with R (Vol. 2). New York: Springer.
Popescu, A. A., Huber, K. T., & Paradis, E. (2012). ape 3.0: New tools for distance-based phylogenetics and evolutionary analysis in R. Bioinformatics, 28(11), 1536-1537.
S <- 1
E <- 0
set.seed(1)
phy <- simulateTree(pars = c(S, E), max.taxa = 6, max.t = 5)
phy$edge.length[1] <- phy$edge.length[1] + 0.1
ape::is.ultrametric(phy)
phy <- checkAndFixUltrametric(phy)
ape::is.ultrametric(phy)
countSeqDiffs counts the number of differences between two protein sequences within the same "ProteinSeq" object.
countSeqDiffs(x, taxon1, taxon2)
x |
A "ProteinSeq" object containing proteins from |
taxon1 |
A character giving the common name of the first species that
will be compared. Must be a name present in |
taxon2 |
A character giving the common name of the second species that
will be compared. Must be a name present in |
An integer giving the number of protein differences between taxon1 and taxon2.
Matheus Januario, Dan Rabosky, Jennifer Auler
countSeqDiffs(cytOxidase, "human", "chimpanzee")
countSeqDiffs(cytOxidase, "human", "cnidaria")
countSeqDiffs(cytOxidase, "chimpanzee", "cnidaria")
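As a rough illustration of what a pairwise count involves, the following base R sketch compares two aligned sequences position by position. The toy vector `seqs` and the helper `count_diffs` are hypothetical and are not part of the package; in particular, counting a gap against a residue as a difference is an assumption, and the package may treat gaps differently.

```r
# Two short, already aligned toy sequences (hypothetical, not from cytOxidase)
seqs <- c(sp1 = "MFINRW-LST", sp2 = "MFADRWLLST")

count_diffs <- function(x, taxon1, taxon2) {
  # Split each sequence into single characters
  a <- strsplit(x[[taxon1]], "")[[1]]
  b <- strsplit(x[[taxon2]], "")[[1]]
  # Aligned sequences must have the same length
  stopifnot(length(a) == length(b))
  # Count positions where the two sequences disagree
  sum(a != b)
}

count_diffs(seqs, "sp1", "sp2")
```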
cytOxidase is a set of homologous protein sequences from the cytochrome oxidase subunit 1 gene. This mitochondrial gene, often known as CO1 (“see-oh-one”), plays a key role in cellular respiration. CO1 contains approximately 513 amino acids and has been used by previous studies for reconstructing phylogenetic trees and estimating divergence times in Metazoa by assuming a molecular clock. Its 5' partition is used, for instance, by the ‘Barcoding of Life’ initiative. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
cytOxidase
An object of class "ProteinSeq" with 17 entries, each representing a different animal species.
Organism's popular name or a taxonomic classification
Amino acid sequence of length 513
The object of class "ProteinSeq" is structured as a named vector of 17 different animal species, with each individual component being a sequence of 513 amino acids.
Amino acid sequences were originally downloaded from GenBank and later curated and aligned by Daniel L. Rabosky.
Data on the body size of many cetacean species and species-specific speciation rates. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
data_whales
A data.frame with 75 rows and 4 columns.
Whale species
Log of body mass (grams)
Species-specific speciation rate
Suggested color to be used for the tip's clade
Species follow taxonomy from Steeman et al (2009). Species-specific speciation rates from Rabosky 2014 & Rabosky et al, 2014. Mass data from PanTHERIA (Jones et al, 2009).
Compilation of many primary sources (see details).
Jones, K. E., Bielby, J., Cardillo, M., Fritz, S. A., O'Dell, J., Orme, C. D. L., ... & Purvis, A. (2009). PanTHERIA: a species‐level database of life history, ecology, and geography of extant and recently extinct mammals: Ecological Archives E090‐184. Ecology, 90(9), 2648-2648.
Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS one, 9(2), e89543.
Rabosky, D. L., Grundler, M., Anderson, C., Title, P., Shi, J. J., Brown, J. W., ... & Larson, J. G. (2014). BAMM tools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5(7), 701-707.
Steeman, M. E., Hebsgaard, M. B., Fordyce, R. E., Ho, S. Y., Rabosky, D. L., Nielsen, R., ... & Willerslev, E. (2009). Radiation of extant cetaceans driven by restructuring of the oceans. Systematic biology, 58(6), 573-585.
Many dinosaur (including avian species) fossil occurrences from different moments of the geological past. Much information (i.e., extra columns) was removed from the original dataset to make it more compact, but it can be fully accessed by the data URL. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
dinos_fossil
A data.frame containing 15527 rows and 13 columns.
Organism phylum
Organism taxonomic class
Organism taxonomic order
Organism taxonomic family
Organism genus
Organism specific name
Earlier known geological period of occurrence
Later known geological period of occurrence
Occurrence's oldest time boundary in million years
Occurrence's newest time boundary in million years
Midpoint between max_ma and min_ma
Longitude of place where occurrence was found. Follows decimal degree format.
Latitude of place where occurrence was found. Follows decimal degree format.
The Paleobiology Database (downloaded on 2022-03-11).
Data URL: http://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=Dinosauria&show=full,classext,genus,subgenus,acconly,ident,img,etbasis,strat,lith,env,timebins,timecompare,resgroup,ref,ent,entname,crmod
estimateSpeciation estimates the speciation rate assuming a constant-rate, pure-birth model.
estimateSpeciation(phy)
phy |
A |
A numeric with the estimated speciation rate.
Daniel Rabosky, Matheus Januario, Jennifer Auler
Yule G.U. 1925. A mathematical theory of evolution, based on the conclusions of Dr. JC Willis, FRS. Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character. 213:21–87.
S <- 1
E <- 0
set.seed(1)
phy <- simulateTree(pars = c(S, E), max.taxa = 6, max.t = 5)
estimateSpeciation(phy)
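For intuition, one common maximum-likelihood estimator under a constant-rate pure-birth (Yule) model, conditioned on the crown age, is the number of speciation events after the crown split (n - 2 for a tree with n tips) divided by the summed branch lengths. The base R sketch below applies it to a hand-written set of branch lengths. Whether estimateSpeciation uses exactly this estimator is an assumption, and `yule_rate`, `edge_length` and `n_tips` are hypothetical names.

```r
# Branch lengths of a hypothetical ultrametric tree with 4 tips:
# ((A:1,B:1):2,(C:2,D:2):1), i.e. six edges, root-to-tip distance 3
edge_length <- c(2, 1, 1, 1, 2, 2)
n_tips <- 4

# Pure-birth rate estimate: events after the crown split / total branch length
yule_rate <- function(edge_length, n_tips) {
  (n_tips - 2) / sum(edge_length)
}

lambda_hat <- yule_rate(edge_length, n_tips)
lambda_hat
```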
fitCRBD fits a constant-rate birth-death process to a phylogeny in the format of the ape package's phylo object. Optimization is based on likelihood functions made with diversitree. This function is basically a wrapper for diversitree's make.bd function.
fitCRBD(phy, n.opt = 5, l.min = 0.001, l.max = 5, max.bad = 200)
phy |
A |
n.opt |
Number of optimizations that will be tried by function. |
l.min |
Lower bound for optimization. Default value is |
l.max |
Upper bound for optimization. Default value is |
max.bad |
Maximum number of unsuccessful optimization attempts. Default
value is |
A numeric with the best estimates of speciation S and extinction E rates.
Daniel Rabosky, Matheus Januario, Jennifer Auler
Paradis, E. (2012). Analysis of Phylogenetics and Evolution with R (Vol. 2). New York: Springer.
Popescu, A. A., Huber, K. T., & Paradis, E. (2012). ape 3.0: New tools for distance-based phylogenetics and evolutionary analysis in R. Bioinformatics, 28(11), 1536-1537.
FitzJohn, R. G. (2010). Analysing diversification with diversitree. R Package. ver, 9-2.
FitzJohn, R. G. (2012). Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution, 3(6), 1084-1092.
See the help pages of diversitree::make.bd and stats::optim.
S <- 0.1
E <- 0.1
set.seed(1)
phy <- simulateTree(pars = c(S, E), max.taxa = 30, max.t = 8)
fitCRBD(phy)
lttPlot plots the lineages through time (LTT) of a phylo object. It also adds a reference line connecting the edges of the graph.
lttPlot( phy, lwd = 1, col = "red", plot = TRUE, rel.time = FALSE, add = FALSE, knitr = FALSE )
phy |
A |
lwd |
Line width. |
col |
Line color. |
plot |
A |
rel.time |
A |
add |
A |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
Plots the number of living lineages at each point in time, and adds a red line as a reference for the expectation under pure birth. If plot = FALSE, returns a list with the richness at each point in time and phy's crown age.
Daniel Rabosky, Matheus Januario, Jennifer Auler
Paradis, E. (2012). Analysis of Phylogenetics and Evolution with R (Vol. 2). New York: Springer.
S <- 1
E <- 0
set.seed(1)
phy <- simulateTree(pars = c(S, E), max.taxa = 20, max.t = 5)
lttPlot(phy, knitr = TRUE)
lttPlot(phy, plot = FALSE, knitr = TRUE)
Many mammal fossil occurrences from different moments of the geological past. Much information (i.e., extra columns) was removed from the original dataset to make it more compact, but it can be fully accessed by the data URL. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
mammals_fossil
A data.frame containing 69463 rows and 13 columns.
Organism phylum
Organism taxonomic class
Organism taxonomic order
Organism taxonomic family
Organism genus
Organism specific name
Earlier known geological period of occurrence
Later known geological period of occurrence
Occurrence's oldest time boundary in million years
Occurrence's newest time boundary in million years
Midpoint between max_ma and min_ma
Longitude of place where occurrence was found. Follows decimal degree format.
Latitude of place where occurrence was found. Follows decimal degree format.
The Paleobiology Database (downloaded on 2022-03-11).
Data URL: http://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=Mammalia&show=full,classext,genus,subgenus,acconly,ident,img,etbasis,strat,lith,env,timebins,timecompare,resgroup,ref,ent,entname,crmod
Mammals species list
mammals_spp
A vector containing the names of almost all (i.e., 4099) extant mammal species, following Upham et al. (2019) taxonomy. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Actual file downloaded from https://vertlife.org/data/
Upham, N. S., Esselstyn, J. A., & Jetz, W. (2019). Inferring the mammal tree: species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLoS biology, 17(12), e3000494.
NatSelSim simulates natural selection in a bi-allelic gene through n.gen generations.
NatSelSim( w11 = 1, w12 = 1, w22 = 0.9, p0 = 0.5, n.gen = 10, plot.type = "animateall", print.data = FALSE, knitr = FALSE )
w11 |
Number giving the fitness of genotype A1A1. Values will be normalized if any genotype fitness exceeds one. |
w12 |
Number giving the fitness of genotype A1A2. Values will be normalized if any genotype fitness exceeds one. |
w22 |
Number giving the fitness of genotype A2A2. Values will be normalized if any genotype fitness exceeds one. |
p0 |
Initial (time = 0) allelic frequency of A1.
A2's initial allelic frequency is |
n.gen |
Number of generations that will be simulated. |
plot.type |
String indicating if plot should be animated. The default, "animateall", animates all possible panels. Other options are "static" (no animation), "animate1", "animate3", or "animate4". Users can animate each panel individually (using |
print.data |
Logical indicating whether all
simulation results should be returned as a |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
If any fitness value (i.e., w11, w12, w22) is larger than one, fitness is interpreted as absolute fitness and values are re-normalized.
If print.data = TRUE, it returns a data.frame containing the number of individuals for each genotype through time. The plots produced by the function show (1) allele frequency change through time, (2) the adaptive landscape (which remains static during the whole simulation, so it cannot be animated), (3) the time series of mean population fitness, and (4) the time series of genotypic population frequencies.
Matheus Januario, Jennifer Auler, Dan Rabosky
Fisher, R. A. (1930). The Fundamental Theorem of Natural Selection. In: The genetical theory of natural selection. The Clarendon Press
Plutynski, A. (2006). What was Fisher’s fundamental theorem of natural selection and what was it for?. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 37(1), 59-82.
# Using the default values (w11 = 1, w12 = 1, w22 = 0.9, p0 = 0.5, n.gen = 10)
NatSelSim()

# Continuing a simulation for extra time:
# Run the first simulation
sim1 <- NatSelSim(w11 = .4, w12 = .5, w22 = .4, p0 = 0.35, n.gen = 5, plot.type = "static", print.data = TRUE, knitr = TRUE)

# Then take the allelic frequency from the first sim:
new_p0 <- (sim1$AA[nrow(sim1)] + sim1$Aa[nrow(sim1)] * 1/2)

# and use it as p0 for a second one:
NatSelSim(w11 = .4, w12 = .5, w22 = .4, p0 = new_p0, n.gen = 5, plot.type = "static", knitr = TRUE)
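The allele frequency trajectory that such a simulation displays can be sketched with the textbook one-locus selection recursion under random mating. The base R code below is a sketch under that assumption rather than the package's code, and `select_one_locus` is a hypothetical helper. At each generation it computes mean population fitness and then the frequency of A1 after selection.

```r
# Deterministic selection at one bi-allelic locus under random mating
select_one_locus <- function(w11, w12, w22, p0, n.gen) {
  p <- numeric(n.gen + 1)
  p[1] <- p0
  for (g in seq_len(n.gen)) {
    q <- 1 - p[g]
    # Mean population fitness at Hardy-Weinberg genotype frequencies
    w_bar <- p[g]^2 * w11 + 2 * p[g] * q * w12 + q^2 * w22
    # Frequency of allele A1 in the next generation
    p[g + 1] <- p[g] * (p[g] * w11 + q * w12) / w_bar
  }
  p
}

# Same parameter values as the NatSelSim defaults listed above
p_t <- select_one_locus(w11 = 1, w12 = 1, w22 = 0.9, p0 = 0.5, n.gen = 10)
round(p_t, 3)
```

With A2A2 as the only disadvantaged genotype, the frequency of A1 rises every generation but more slowly as A2 becomes rare, because A2 is then found mostly in heterozygotes.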
OneGenHWSim creates n.sim simulations of one generation of genotypes under Hardy-Weinberg equilibrium for a bi-allelic locus.
OneGenHWSim(n.ind = 50, p = 0.5, n.sim = 100)
n.ind |
Integer indicating the census size of the simulated populations. If decimals are inserted, they will be rounded. |
p |
Numeric value between zero and one that indicates A1's allele frequency. A2's allele frequency is assumed to be |
n.sim |
Number of simulations to be made. If decimals are inserted, they will be rounded. |
A data.frame containing the number of individuals for each genotype.
Matheus Januario, Dan Rabosky, Jennifer Auler
Hardy, G. H. (1908). Mendelian proportions in a mixed population. Science, 28, 49–50.
Weinberg, W. (1908). Uber den Nachweis der Vererbung beim Menschen. Jahreshefte des Vereins fur vaterlandische Naturkunde in Wurttemberg, Stuttgart 64:369–382. [On the demonstration of inheritance in humans]. Translation by R. A. Jameson printed in D. L. Jameson (Ed.), (1977). Benchmark papers in genetics, Volume 8: Evolutionary genetics (pp. 115–125). Stroudsburg, PA: Dowden, Hutchinson & Ross.
Mayo, O. (2008). A century of Hardy–Weinberg equilibrium. Twin Research and Human Genetics, 11(3), 249-256.
# Using the default values (n.ind = 50, p = 0.5, n.sim = 100):
OneGenHWSim()

# Simulating with an already fixed allele:
OneGenHWSim(n.ind = 50, p = 1)

# Testing if the simulation works:
A1freq <- .789 # any value could work
n.simul <- 100
simulations <- OneGenHWSim(n.ind = n.simul, n.sim = n.simul, p = A1freq)

# expected:
c(A1freq^2, 2*A1freq*(1-A1freq), (1-A1freq)^2)

# simulated:
apply(X = simulations, MARGIN = 2, FUN = function(x){mean(x)/n.simul})
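One way to picture a single generation under Hardy-Weinberg proportions is as a multinomial draw of n.ind individuals over the three genotypes. The base R sketch below does this with `stats::rmultinom`. Treating the sampling as multinomial is an assumption about how such a simulation could work, not a description of OneGenHWSim's internals, and the object names are hypothetical.

```r
set.seed(1)
n.ind <- 50   # individuals per simulated population
p <- 0.5      # frequency of allele A1
n.sim <- 100  # number of simulated populations

# Expected genotype proportions under Hardy-Weinberg equilibrium
hw_probs <- c(A1A1 = p^2, A1A2 = 2 * p * (1 - p), A2A2 = (1 - p)^2)

# rmultinom returns one column per draw; transpose to one row per simulation
sims <- as.data.frame(t(rmultinom(n.sim, size = n.ind, prob = hw_probs)))
names(sims) <- names(hw_probs)

head(sims)
# Average simulated genotype proportions, to compare with hw_probs
colMeans(sims) / n.ind
```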
Plot NatSelSim output
plotNatSel( gen.HW = gen.HW, p.t = p.t, w.t = w.t, t = t, W.gntp = c(w11, w12, w22), plot.type = "animateall", knitr = FALSE )
gen.HW |
Dataframe with A1A1, A1A2 and A2A2 genotypic frequencies in each generation (nrows = NGen) |
p.t |
Allelic frequency through time |
w.t |
Mean population fitness through time |
t |
time |
W.gntp |
Initial genotypic fitness |
plot.type |
String indicating if plot should be animated. The default, "animateall", animates all possible panels. Other options are "static", "animate1", "animate3", or "animate4". |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
Plot of NatSelSim's output (see NatSelSim's help page for details).
plotPaintedWhales
plots the phylogeny from Steeman et al. (2009), coloring the dolphins (Delphinidae), porpoises (Phocoenidae), the Mysticetes, the baleen whales (Balaenopteridae), and the beaked whales (Ziphiidae).
plotPaintedWhales( show.legend = TRUE, direction = "rightwards", knitr = FALSE, ... )
show.legend |
Logical indicating if clade legend should be shown. |
direction |
Phylogeny plotting direction. Should be set to "rightwards" |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
... |
other arguments to be passed to |
The whale phylogeny, with branches colored by major whale taxonomic group.
Matheus Januario, Jennifer Auler
Steeman, M. E., Hebsgaard, M. B., Fordyce, R. E., Ho, S. Y., Rabosky, D. L., Nielsen, R., ... & Willerslev, E. (2009). Radiation of extant cetaceans driven by restructuring of the oceans. Systematic biology, 58(6), 573-585.
help page from phytools::plotSimmap
plotPaintedWhales(knitr = TRUE)
plotProteinSeq draws the sequences of proteins within the same "ProteinSeq" object. In this format, more similar sequences will have similar banding patterns.
plotProteinSeq(x, taxon.to.plot, knitr = FALSE)
x |
A "ProteinSeq" object containing proteins from |
taxon.to.plot |
A character vector providing the common name of the species that will be plotted. Must be a name present in |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
A drawing of the protein sequence(s) provided. Colors refer to specific amino acids ("R", "W", "I", "F", "S", "T", "N", "H", "K", "D", "G", "L", "Y", "V", "M", "A", "E", "P", "Q", "C"), gaps/spaces in the sequence ("-"), an ambiguous amino acid ("B", often representing either asparagine ("N") or aspartic acid ("D")), or another marker for an ambiguous amino acid ("X").
Matheus Januario, Jennifer Auler
data(cytOxidase)
plotProteinSeq(cytOxidase, c("human", "chimpanzee", "cnidaria"), knitr = TRUE)
plotRawFossilOccs calculates and plots the early and late boundaries associated with each taxon in a dataset.
plotRawFossilOccs( data, tax.lvl = NULL, sort = TRUE, use.midpoint = TRUE, return.ranges = FALSE, knitr = FALSE )
data |
A |
tax.lvl |
A |
sort |
|
use.midpoint |
|
return.ranges |
|
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
Plots a pile of the max-min temporal ranges of the chosen tax.lvl. This usually will be stratigraphic ranges for occurrences (so there is no attempt to estimate "true" ranges), and if tax.lvl = NULL (the default), occurrences are drawn as ranges of stratigraphic resolution (= the fossil dating imprecision). If return.ranges = TRUE, it returns a data.frame containing the diversity (column div) of the chosen taxonomic level, through time.
Matheus Januario, Jennifer Auler
data("dinos_fossil")
oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plotRawFossilOccs(dinos_fossil, tax.lvl = "species", knitr = TRUE)
plotRawFossilOccs(dinos_fossil, tax.lvl = "genus", knitr = TRUE)
par(oldpar)
Plot WFDriftSim output
plotWFDrift(p.through.time, plot.type = plot, knitr = FALSE)
p.through.time |
Matrix with n.gen columns and n.sim lines |
plot.type |
String. Options are "static" or "animate" |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
A static or animated plot of populations under genetic drift through time
store_p <- WFDriftSim(Ne = 5, n.gen = 10, p0 = .2, n.sim = 5, plot = "none", print.data = TRUE)
plotWFDrift(store_p, "static")
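To show what kind of matrix such a plot consumes, the following base R sketch generates allele frequencies under a basic Wright-Fisher model, where each generation is a binomial sample of gene copies from the previous one. It is a sketch and not the package's WFDriftSim: the helper `wf_drift` is hypothetical, and both the diploid assumption (2 * Ne gene copies) and the layout (one row per simulation, one column per generation including the starting frequency) are assumptions.

```r
set.seed(1)

wf_drift <- function(Ne, n.gen, p0, n.sim) {
  # Rows are simulations; columns are generations (first column holds p0)
  p <- matrix(NA_real_, nrow = n.sim, ncol = n.gen + 1)
  p[, 1] <- p0
  for (g in seq_len(n.gen)) {
    # Sample 2 * Ne gene copies given the previous allele frequency
    p[, g + 1] <- rbinom(n.sim, size = 2 * Ne, prob = p[, g]) / (2 * Ne)
  }
  p
}

p_mat <- wf_drift(Ne = 5, n.gen = 10, p0 = 0.2, n.sim = 5)
round(p_mat, 2)
```

Once a frequency reaches 0 or 1 it stays there, since the binomial draw can no longer produce the lost allele.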
ProteinSeq class
The ProteinSeq class is an input for the functions countSeqDiffs and is.ProteinSeq. It consists of a named character vector in which each entry holds the amino acid sequence (the protein components coded by a gene) of one aligned protein sequence. The names typically correspond to the species (scientific or common name) from which each sequence came. The characters within the vector must correspond to valid amino acid symbols (i.e., capitalized letters or deletion "_" symbols). In particular, the following symbols relate to amino acids: "A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y". Importantly, the symbol "_" means an indel (insertion or deletion), and the symbols "X", "B", "Z", "J" should be considered as ambiguous site readings.
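The requirements described above (character type, names, equal lengths, valid symbols) can be checked on a plain named character vector with a few lines of base R. The helper `looks_like_protein_seq` below is hypothetical and is not the package's is.ProteinSeq. It only mirrors the stated rules, and accepting both "_" and "-" as gap symbols is an assumption.

```r
# Hypothetical validity check mirroring the rules described above
looks_like_protein_seq <- function(x) {
  is.character(x) &&
    !is.null(names(x)) &&                 # every sequence is named
    length(unique(nchar(x))) == 1 &&      # aligned: all the same length
    all(grepl("^[A-Z_-]+$", x))           # capital letters or gap symbols only
}

# Toy aligned sequences (hypothetical data)
toy <- c(human = "MFAD_W", chimp = "MFADRW", coral = "MFINRW")
looks_like_protein_seq(toy)

# Sequences of unequal length fail the check
looks_like_protein_seq(c(a = "MFA", b = "MF"))
```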
is.ProteinSeq(x)

## S3 method for class 'ProteinSeq'
print(x, ...)

## S3 method for class 'ProteinSeq'
summary(object, ...)

## S3 method for class 'ProteinSeq'
head(x, n = 20, ...)

## S3 method for class 'ProteinSeq'
tail(x, n = 20, ...)
x |
an object of the class |
... |
arguments to be passed to or from other methods. |
object |
an object of the class |
n |
number of aminoacids to be shown |
is.ProteinSeq
A ProteinSeq must be a list containing multiple vectors made of characters (usually letters that code for amino acids, deletions, etc.). All of these must have the same length, and their relative positions should match (i.e., the object must contain aligned amino acid sequences).
A logical indicating if object x is of class ProteinSeq.
print.ProteinSeq
Prints a brief summary of a ProteinSeq object, containing the number of sequences and the length of the alignment. See more details of the format in ??ProteinSeq.
Same as print.ProteinSeq.
Shows the first n elements of a ProteinSeq object.
Daniel Rabosky, Matheus Januario, Jennifer Auler
simulateBirthDeathRich
calculates the number of species at a certain
point in time, following a birth-death process.
simulateBirthDeathRich(t, S = NULL, E = NULL, K = NULL, R = NULL)
t |
Point in time at which richness will be simulated. |
S |
A numeric representing the per-capita speciation rate (in number
of events per lineage per million years). Must be larger than |
E |
A numeric representing the per-capita extinction rate (in number
of events per lineage per million years). Must be smaller than |
K |
A numeric representing the extinction fraction (i.e.,
|
R |
A numeric representing the per-capita Net Diversification
rate (i.e., |
The function only accepts as inputs S and E, or K and R.
The number of simulated species (i.e., the richness).
Matheus Januario, Daniel Rabosky, Jennifer Auler
Raup, D. M. (1985). Mathematical models of cladogenesis. Paleobiology, 11(1), 42-52.
# running a single simulation:
SS <- 0.40
EE <- 0.09
tt <- 10 # in Mya
simulateBirthDeathRich(t = tt, S = SS, E = EE)

# running many simulations and graphing results:
nSim <- 1000
res <- vector()
for(i in 1:nSim){
  res <- c(res, simulateBirthDeathRich(t = tt, S = SS, E = EE))
}
plot(table(res)/length(res), xlab="Richness", ylab="Probability")
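The spread of richness values seen in such repeated runs can be reproduced with a small event-by-event simulation in base R. The sketch below is not the package's algorithm: `sim_bd_richness` is a hypothetical helper, and starting from a single lineage (`n0 = 1`) is an assumption. Waiting times between events are exponential with rate n * (S + E), and each event is a speciation with probability S / (S + E).

```r
set.seed(1)

sim_bd_richness <- function(t, S, E, n0 = 1) {
  n <- n0
  time <- 0
  while (n > 0) {
    # Waiting time to the next event across all n lineages
    time <- time + rexp(1, rate = n * (S + E))
    if (time > t) break
    # Speciation adds a lineage; extinction removes one
    if (runif(1) < S / (S + E)) n <- n + 1 else n <- n - 1
  }
  n
}

rich <- replicate(2000, sim_bd_richness(t = 10, S = 0.40, E = 0.09))

mean(rich)                # simulated mean richness
exp((0.40 - 0.09) * 10)   # expected richness for one starting lineage
```

The simulated mean should sit close to the expectation exp((S - E) * t), while individual runs vary widely and some end at zero because the clade went extinct.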
simulateTree uses a birth-death process to simulate a phylogenetic tree, following the format of the ape package's phylo object. The function is basically a wrapper for diversitree's tree.bd function.
simulateTree( pars, max.taxa = Inf, max.t, min.taxa = 2, include.extinct = FALSE )
pars |
|
max.taxa |
Maximum number of taxa to include in the tree. If
|
max.t |
Maximum length to evolve the phylogeny over. If equal to
|
min.taxa |
Minimum number of taxa to include in the tree. |
include.extinct |
A |
see help page from diversitree::tree.bd
A phylo object.
Daniel Rabosky, Matheus Januario, Jennifer Auler
Paradis, E. (2012). Analysis of Phylogenetics and Evolution with R (Vol. 2). New York: Springer.
Popescu, A. A., Huber, K. T., & Paradis, E. (2012). ape 3.0: New tools for distance-based phylogenetics and evolutionary analysis in R. Bioinformatics, 28(11), 1536-1537.
FitzJohn, R. G. (2010). Analysing diversification with diversitree. R Package. ver, 9-2.
FitzJohn, R. G. (2012). Diversitree: comparative phylogenetic analyses of diversification in R. Methods in Ecology and Evolution, 3(6), 1084-1092.
S <- 1 E <- 0 set.seed(1) phy <- simulateTree(pars = c(S, E), max.taxa = 6, max.t=Inf) ape::plot.phylo(phy) ape::axisPhylo() # alternatively, we can stop the simulation using time: set.seed(42) phy2 <- simulateTree(pars = c(S, E), max.t=7) ape::plot.phylo(phy2) ape::axisPhylo()
Values of clade diversity for many clades of organisms (note that some clades are nested within other clades in the dataset). This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
timeseries_fossil
A data.frame with 598 rows and 6 columns.
Time series clade
Primary source of the Time series
Stem age of clade
Geological relative time (in Million years ago relative to present)
Geological time in million years since clade stem age
Number of species at given geological time
Legend:
anth = Anthozoa (Cnidaria);
art = Articulata (Crinoidea, Echinodermata);
biv = Bivalvia (Mollusca);
bryo = Bryozoa (Lophotrochozoa, Ectoprocta);
ceph = Cephalopoda (Mollusca);
chon = Chondrichthyes (Chordata);
crin = Crinoidea (Echinodermata);
dinosauria = Dinosauria (Chordata);
ech = Echinoidea (Echinodermata);
foram = Foraminifera (Retaria);
gast = Gastropoda (Mollusca);
graptoloids = Graptolites (Graptolithina);
ling = Lingulata (Brachiopoda);
ostr = Ostracoda (Crustacea, Arthropoda);
tril = Trilobita (Arthropoda).
Data originally compiled from many primary sources. Organized, curated by, and downloaded from, Rabosky & Benson (2021).
Rabosky, D. L., & Benson, R. B. (2021). Ecological and biogeographic drivers of biodiversity cannot be resolved using clade age-richness data. Nature Communications, 12(1), 2945.
Many trilobite fossil occurrences from different moments of the geological past. Much information (i.e., extra columns) was removed from the original dataset to make it more compact, but it can be fully accessed by the data URL. This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
trilob_fossil
A data.frame with 24965 rows and 13 columns.
Organism phylum
Organism taxonomic class
Organism taxonomic order
Organism taxonomic family
Organism genus
Organism specific name
Earlier known geological period of occurrence
Later known geological period of occurrence
Occurrence's oldest time boundary in million years
Occurrence's newest time boundary in million years
Midpoint between max_ma and min_ma
Longitude of place where occurrence was found. Follows decimal degree format.
Latitude of place where occurrence was found. Follows decimal degree format.
The Paleobiology Database (downloaded on 2022-03-11).
Data URL: http://paleobiodb.org/data1.2/occs/list.csv?datainfo&rowcount&base_name=Trilobita&show=full,classext,genus,subgenus,acconly,ident,img,etbasis,strat,lith,env,timebins,timecompare,resgroup,ref,ent,entname,crmod
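The midpoint column described above is the mean of the two age bounds. A minimal sketch with made-up values follows. Only max_ma and min_ma are names taken from the column descriptions; the toy data frame and the midpoint_sketch column name are hypothetical, and the numbers are not real occurrences.

```r
# Toy data frame with invented ages (in million years); not real occurrences
toy_occ <- data.frame(
  max_ma = c(521.0, 485.4, 372.2),
  min_ma = c(514.0, 477.7, 358.9)
)

# midpoint between the oldest and the newest time boundary of each occurrence
toy_occ$midpoint_sketch <- (toy_occ$max_ma + toy_occ$min_ma) / 2
toy_occ
```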
WFDriftSim simulates genetic drift of diploid Wright–Fisher populations with a given effective population size through a certain number of generations.
WFDriftSim( Ne, n.gen, p0 = 0.5, n.sim = 1, plot.type = "animate", print.data = FALSE, knitr = FALSE )
Ne |
Number giving the effective population size of the population |
n.gen |
Number of generations to be simulated. |
p0 |
Initial frequency of a given allele. As the simulated organism is
diploid, the other allele's frequency will be |
n.sim |
Number of simulations to be made. If decimals are inserted,
they will be rounded. Default value is |
plot.type |
Character indicating if simulations should be plotted as colored
lines. Each color represents a different population. If
|
print.data |
Logical indicating whether all simulation results should be
returned as a |
knitr |
Logical indicating if plot is intended to show up in RMarkdown files made by the |
The effective population size (Ne) is strongly connected with the rate of genetic drift (for details, see Waples, 2022).
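The link between Ne and the rate of drift can be illustrated with a minimal base-R sketch of the Wright-Fisher model, in which each generation's allele count is a binomial draw of 2 * Ne gene copies. This follows the textbook formulation and is not necessarily how WFDriftSim is implemented internally; wf_sketch is a hypothetical helper.

```r
# Hypothetical helper (not part of the package): allele-frequency
# trajectory of one diploid Wright-Fisher population.
wf_sketch <- function(Ne, n.gen, p0 = 0.5) {
  p <- numeric(n.gen + 1)
  p[1] <- p0
  for (g in seq_len(n.gen)) {
    # sample 2 * Ne gene copies from the previous generation's frequency
    p[g + 1] <- rbinom(1, size = 2 * Ne, prob = p[g]) / (2 * Ne)
  }
  p
}

set.seed(42)
# final allele frequencies after 20 generations in small vs. large populations
small <- replicate(200, wf_sketch(Ne = 5, n.gen = 20)[21])
large <- replicate(200, wf_sketch(Ne = 500, n.gen = 20)[21])
var(small) > var(large)  # is the spread of outcomes wider when Ne is small?
```

In this sketch, smaller populations should show a wider spread of final frequencies, and a population that starts at p0 = 1 stays fixed because every binomial draw returns all 2 * Ne copies.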
If plot.type = "static" or "animate", plots the time series of all simulations, with each colored line referring to a different simulation. Note that if many simulations (generally more than 20) are run, colors may be recycled, so different simulations can share the same color. If print.data = TRUE, returns a data.frame with the simulation results.
Matheus Januario, Dan Rabosky, Jennifer Auler
Fisher RA (1922) On the dominance ratio. Proc. R. Soc. Edinb 42:321–341
Kimura M (1955) Solution of a process of random genetic drift with a continuous model. PNAS–USA 41(3):144–150
Tran, T. D., Hofrichter, J., & Jost, J. (2013). An introduction to the mathematical structure of the Wright–Fisher model of population genetics. Theory in Biosciences, 132(2), 73-82. [good for the historical review, math can be challenging]
Waples, R. S. (2022). What is Ne, anyway? Journal of Heredity.
Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
# Default values:
WFDriftSim(Ne = 5, n.gen = 10, knitr = TRUE)

# A population which has already fixed one of the alleles:
WFDriftSim(Ne = 5, n.gen = 10, p0 = 1, knitr = TRUE)

# Many populations:
WFDriftSim(Ne = 5, n.gen = 10, p0 = 0.2, n.sim = 10, knitr = TRUE)

######## continuing a previous simulation:
n.gen_1stsim <- 10 # number of gens in the 1st sim
sim1 <- WFDriftSim(Ne = 5, n.gen = n.gen_1stsim, p0 = .2, n.sim = 10,
                   plot.type = "none", print.data = TRUE, knitr = TRUE)
n.gen_2ndsim <- 7 # number of gens in the 2nd sim
# now, note how we assigned p0:
sim2 <- WFDriftSim(Ne = 5, n.gen = n.gen_2ndsim, p0 = sim1[, ncol(sim1)],
                   plot.type = "static", n.sim = 10, print.data = TRUE,
                   knitr = TRUE)

# if we want to merge both simulations, then we have to:
# remove first column of 2nd sim (because it repeats
# the last column of the 1st sim)
sim2 <- sim2[, -1]
# re-name 2nd sim columns:
colnames(sim2) <- paste0("gen", (n.gen_1stsim + 1):(n.gen_1stsim + n.gen_2ndsim))
# finally, merging both rounds of simulations:
all_sims <- cbind(sim1, sim2)
head(all_sims)
An ultrametric phylogenetic tree of the living cetaceans. Phylogeny generated by Steeman et al (2009). This dataset is part of the package and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
whale_phylo
An ultrametric phylo object with 87 tips.
Original phylogeny generated by Steeman et al. (2009). File obtained from Rabosky et al. (2014).
Rabosky, D. L., Grundler, M., Anderson, C., Title, P., Shi, J. J., Brown, J. W., ... & Larson, J. G. (2014). BAMM tools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5(7), 701-707.
Steeman, M. E., Hebsgaard, M. B., Fordyce, R. E., Ho, S. Y., Rabosky, D. L., Nielsen, R., ... & Willerslev, E. (2009). Radiation of extant cetaceans driven by restructuring of the oceans. Systematic biology, 58(6), 573-585.