2.3. From parsimony-based to semiparametric approaches
Event-based methods are based on explicit process models and therefore represented a considerable advance over cladistic methods in terms of statistical power. However, they still have several limitations. The cost of events, for example, cannot be estimated from the data but must be fixed a priori using ad hoc criteria such as the phylogenetic conservation of ancestral ranges. As explained above, dispersal and extinction cause the descendants to occupy a different range from the ancestor, so under this criterion these two processes are penalized and therefore underestimated (“minimized”) in event-based reconstructions (Sanmartín 2012). Another constraint imposed by the use of parsimony is that biogeographic reconstructions are inferred on cladograms with no branch lengths or time information. This makes it difficult to discriminate between shared distribution history and biogeographic “pseudocongruence”, that is, shared patterns that are spatially but not temporally congruent (Page 1993; Donoghue and Moore 2003), and between competing dispersal and vicariance explanations: in vicariance, the appearance of the barrier within the ancestral range is the trigger for allopatric speciation, while in dispersal, the barrier predates the change in geographic range (Chapter 1). Finally, parsimony-based inference ignores two sources of error associated with reconstructing trait evolution: “phylogenetic uncertainty”, where ancestral areas are reconstructed over a single topology, the most parsimonious tree, assuming there is no uncertainty in phylogenetic relationships, and “reconstruction uncertainty”, where only the minimum-cost (most parsimonious) reconstructions are evaluated, even though alternative reconstructions could be almost as likely, or even more likely if additional information is considered (Ronquist 2004).
Incorporating phylogenetic uncertainty in EBMs is relatively straightforward: the analysis is run over a distribution of trees that reflects the level of clade support in the phylogeny; this distribution can be obtained from bootstrap pseudoreplication or from a Bayesian posterior probability distribution. Non-bifurcating nodes (polytomies) and nodes with low clade support can then be associated with low support for ancestral ranges. In Nylander et al.’s (2008) Bayes-DIVA method, DIVA parsimony-based reconstructions are averaged over a distribution of trees representing the posterior probability obtained from a Bayesian phylogenetic analysis. Figure 2.4 gives an example. Node “X” is a highly supported clade including three species distributed in areas C, B and A. Phylogenetic relationships in the rest of the tree, including the identity of the sister-group of X, are uncertain and vary over the Bayesian sample of trees. For example, the other descendant of the stem or parent node of X (“Y” in Figure 2.4) is “D” in tree 1, “E” in tree 2 and “F(ED)” in tree 3. Each of these tree topologies has a different posterior probability (PP) value. Because in Bayesian inference (BI) the frequency with which a tree is sampled in the analysis is proportional to its posterior probability (Ronquist 2004), the nodal ancestral area reconstructions in Bayes-DIVA can be interpreted as “marginal probabilities” (i.e. the different wedges in the pie chart in Figure 2.4). In other words, averaging DIVA reconstructions over a Bayesian sample of trees gives us estimates of ancestral ranges at nodes that are marginalized over the variation in the remainder of the tree topology. Notice that DIVA does not integrate branch length information, so the only parameter that is marginalized is the tree topology; in this sense, Bayes-DIVA can be considered an empirical Bayesian method (Nylander et al. 2008).
It is also a semiparametric model since it contains a parametric component (Bayesian phylogenetic inference) and a nonparametric component (parsimony-based biogeographic inference). Another important distinction, illustrated by tree 4, concerns Pagel et al.’s (2004) definition of a “floating node”. Trees containing different definitions (bipartitions) of node X (tree 4, PP = 0.10) are excluded from the marginal estimations for that node in Bayes-DIVA: that is, the wedges in the pie chart sum to 0.90 (Figure 2.4). In other words, Bayes-DIVA uses a node-by-node approach in accounting for phylogenetic uncertainty: only those trees containing the node of interest (X) are used in the estimation of ancestral-range marginal probabilities.
Figure 2.4. Accounting for phylogenetic uncertainty: the Bayes-DIVA approach. Trees 1–3 represent different phylogenetic relationships among nine species distributed in areas A–I. These trees were obtained by Bayesian inference, so each of them is associated with a posterior probability (PP) value, that is, the frequency with which the tree appears in the Bayesian posterior distribution. In trees 1–3, the species present in areas A–C form a well-supported clade, node “X”; however, the remainder of the tree topology, including the identity of the sister-group and the definition of the “parent node Y”, is uncertain. In Bayes-DIVA, a DIVA analysis is run on each of these alternative trees. Because each tree is “weighted” by its posterior probability in the Bayesian sample, the end result is a set of marginal probabilities of ancestral ranges, that is, DIVA reconstructions are marginalized or integrated over the uncertainty in the tree topology; this is represented in the central pie chart, where range A receives the highest marginal probability, followed by B and AB. Notice that only those trees containing node X are included in the computation of marginal probabilities. Tree 4, in which C, B and A do not form a clade, is therefore excluded. For a color version of this figure, see www.iste.co.uk/guilbert/biogeography.zip
Nylander et al. (2008) demonstrated that accounting for phylogenetic uncertainty may also reduce biogeographic uncertainty: that is, for a given node, some ancestral ranges that were equally optimal in DIVA will be associated with higher marginal probabilities in the Bayes-DIVA analysis; in Figure 2.4, the ancestral range for node X that receives the highest marginal probability is A.
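The averaging step in Bayes-DIVA can be illustrated with a minimal Python sketch. It is not the code of Nylander et al. (2008): the posterior probabilities of trees 1–3 and the sets of optimal ranges per tree are hypothetical values chosen only so that tree 4 has PP = 0.10 and A ranks above B and AB, as in Figure 2.4. The sketch assumes that equally parsimonious ranges within a tree share that tree's weight equally, and that trees lacking node X are skipped.

```python
from collections import defaultdict

# Hypothetical posterior sample. "pp" is the posterior probability of a tree;
# "ranges_at_X" lists the equally parsimonious DIVA ranges at node X, or None
# when the tree does not contain node X. All values are illustrative only.
trees = [
    {"name": "tree 1", "pp": 0.50, "ranges_at_X": ["A"]},
    {"name": "tree 2", "pp": 0.25, "ranges_at_X": ["A", "B"]},
    {"name": "tree 3", "pp": 0.15, "ranges_at_X": ["B", "AB"]},
    {"name": "tree 4", "pp": 0.10, "ranges_at_X": None},  # node X absent
]


def marginal_range_probabilities(trees):
    """Average per-tree DIVA optimizations at one node over a tree sample."""
    marginals = defaultdict(float)
    for tree in trees:
        ranges = tree["ranges_at_X"]
        if not ranges:
            # Node-by-node approach: trees without the node contribute nothing.
            continue
        # Assumption: the N optimal ranges of a tree each get weight pp / N.
        weight = tree["pp"] / len(ranges)
        for r in ranges:
            marginals[r] += weight
    return dict(marginals)


marginals = marginal_range_probabilities(trees)
for r, p in sorted(marginals.items(), key=lambda kv: -kv[1]):
    print(f"{r}: {p:.3f}")
print(f"total: {sum(marginals.values()):.2f}")
```

With these made-up inputs, A receives 0.625, B 0.200 and AB 0.075; the total is 0.90 rather than 1 because the tree lacking node X is left out.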
Harris and Xiang (2009) subsequently developed S-DIVA, implemented in the software RASP (Yu et al. 2010), to incorporate “reconstruction uncertainty” into a Bayes-DIVA analysis. In a DIVA reconstruction, there could be several equally parsimonious ancestral ranges optimized at a given node (e.g. A, B or AB in Figure 2.4). In addition, for each of these equally parsimonious ancestral ranges, there could be multiple different pathways (combinations of dispersal and cladogenetic events) by which that range (e.g. A) is optimized along the tree. In Nylander et al.’s (2008) Bayes-DIVA approach, biogeographic uncertainty was defined in terms of the first option: equally parsimonious ancestral ranges were assigned an equal weight (1/N), where N is the number of most parsimonious (MP) alternative ranges for a given node. In Harris and Xiang’s S-DIVA approach, biogeographic uncertainty is defined as the frequency with which a particular ancestral range appears within the pool of most parsimonious biogeographic pathways: F(r) = i/Rt, where i is the number of times a range (r) occurs in the total number of MP scenarios (Rt) over the tree.
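The difference between the two weighting schemes can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the RASP implementation: the list of most parsimonious pathways is hypothetical, and the function names are invented for this example.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical pool of most parsimonious (MP) pathways for one tree; each entry
# records the range that a pathway assigns to the node of interest.
mp_pathways_at_node = ["A", "A", "A", "B", "AB"]


def equal_weights(pathway_ranges):
    """1/N weighting: each distinct MP range at the node gets the same weight."""
    distinct = sorted(set(pathway_ranges))
    return {r: Fraction(1, len(distinct)) for r in distinct}


def pathway_frequencies(pathway_ranges):
    """F(r) = i / Rt: share of MP pathways in which range r occurs at the node."""
    counts = Counter(pathway_ranges)
    total = len(pathway_ranges)  # Rt, the total number of MP scenarios
    return {r: Fraction(i, total) for r, i in counts.items()}


print("1/N weights:", equal_weights(mp_pathways_at_node))
print("F(r):       ", pathway_frequencies(mp_pathways_at_node))
```

In this toy pool, the three ranges each receive 1/3 under equal weighting, whereas the frequency-based scheme gives A a weight of 3/5 and B and AB 1/5 each, because A is reached by more pathways.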
Nylander et al.’s interpretation of ancestral range uncertainty as marginal probabilities is consistent with Bayesian probabilistic inference – nodal ancestral ranges are marginalized over a parameter (tree topology), whose values are sampled according to a probability distribution (Huelsenbeck et al. 2000). However, the probabilistic interpretation of F(r) in S-DIVA is less clear. This has to do with the distinction between joint and marginal reconstructions. In a likelihood context, a joint reconstruction is the single best reconstruction across all nodes in a tree, whereas a marginal reconstruction is the single best reconstruction for each node considered alone, after integrating over all possible reconstructions at the remaining nodes. MP pathways in DIVA are estimated jointly over the tree, whereas ancestral ranges are estimated node-by-node.
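A toy example may help to show why joint and marginal reconstructions need not agree. The Python sketch below uses a hypothetical probability table over the states of two nodes, X and Y; the numbers are invented for illustration and do not come from any DIVA or likelihood analysis.

```python
# Hypothetical joint probabilities for the ranges at two nodes (X, Y).
# Keys are (state at X, state at Y); values sum to 1. Illustrative only.
joint = {
    ("A", "A"): 0.40,
    ("B", "A"): 0.30,
    ("B", "B"): 0.30,
    ("A", "B"): 0.00,
}

# Joint reconstruction: the single best assignment across all nodes at once.
best_joint = max(joint, key=joint.get)


def marginal(joint, node_index):
    """Sum the joint probabilities over the states of all other nodes."""
    out = {}
    for states, p in joint.items():
        state = states[node_index]
        out[state] = out.get(state, 0.0) + p
    return out


# Marginal reconstruction: the best state for node X considered alone.
marginal_X = marginal(joint, 0)
best_marginal_X = max(marginal_X, key=marginal_X.get)

print("best joint assignment (X, Y):", best_joint)
print("marginal distribution at X:  ", marginal_X)
print("best marginal state at X:    ", best_marginal_X)
```

Here the best joint assignment places A at node X, yet B is the better-supported state at X once the alternatives at Y are summed over, which is why frequencies counted over jointly estimated pathways are not straightforward to read as node-by-node probabilities.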