Phylogeny and Reconstructing Phylogenetic Trees
Reconstruction Algorithms
The problem
Suppose all we know is how far apart the speces are as measured by the number
of differences in their characters, that is, the entries in the
mutation matrix. How can we reconstruct the phylogenetic tree? Of course, we might
not be able to, since the mutations are random and need not reflect the
actual distances between species, so a better question is: how can we
reconstruct the most likely phylogenetic tree? This question could be made
precise using statistical theory, but there is no guarantee that the problem
is tractable. We can come up with some promising algorithms that should
give trees that aren't too far away.
A solution: the "minimum" reconstruction method
It seems reasonable that the two species that share the greatest number
of characters are the most closely related. That is, the smallest entry
in the mutation matrix indicates which two species diverged most recently.
Also, the next smallest entry should indicate which two species diverged
just before that. And so forth. That's the idea of the algorithm, but it
needs a little clarification. Suppose species 1 and 2 are closest with 4
differences in their characters, and species 1 and 3 are next closest
with 6 differences. Then since we conclude that species 1 and 2 diverged most recently,
it won't be species 1 and 3 that diverged just before that, rather it will
be the ancestral species of 1 and 2 that diverged from species 3 just before
that.
Here you see a distance matrix (which is actually a
mutation matrix based on a hidden phylogenetic tree) and the reconstructed
phylogenetic tree build from this minimum reconstruction method.
You should be able to see how that tree can be derived from the matrix.
If you like, you can ask for a different matrix based on different
mutations. Note how radically the reconstructed tree varies with
each set of mutations. If there are more characters, then
you'll see that the reconstructed matrix depends less on which actual
mutations occur.
Two other solutions: the "average" and "maximum" methods
These two algorithms are very similar to the minimum reconstruction method.
They start out exactly the same by joining the two species that share
the most characters. To explain these methods,
suppose that species 1 and 2 are closest. Name their ancestral species
as species 6. With the minimum method, we effectively determined
that the distance between species 6 and any other species such as species 3
was the minimum of the distance from 1 to 3 and the distance from 2 to 3.
With the maximum method, instead take the distance from species 6 to species
3 to be the maximum of these two distances. And, of course, for the
average method, take the average of those two distances.
It's surprizing how similar the trees resulting from these three algorithms
are. You can see for yourself by
selecting a different algorithm.
References
The minimum algorithm is a version of the single linkage clustering
algorithm also known as the nearest neighbor technique. The maximum
reconstruction method is also known as the complete linkage clustering
algorithm and the nearest neighbor technique. Further references on these
algorithms:
- K. Florek, J. Lukaszewicz, J. Perkul, H. Steinhaus, and S. Zubrzycki. Sur la liason et la
division des points d'un ensemble fini. Colloquium Math. 2 (1951), 282-285.
- G. N. Lance and W. T. Williams. A general theory of classificatory sorting strategies I.
Hierarchical systems. Computer J.
9 (1967), 373-380.
- S. C. Johnson. Hierarchical clustering aschemes. Psychometrika 32 (1967), 241-254.
- R. R. Sokal and C. D. Michener. A statistical method for evaluating systematic relationships.
Univ. Kansas Sci. Bull. 38 (1958), 1409-1438.
- P. H. A. Sneath and R. R. Sokal. Numerical Taxonomy, the Principles and Practice of
Numerical Classification. W. H. Freeman, San Francisco, 1973.
to mutations.
to the cover page.
to the next page putting everything together.
David E. Joyce
Department of Mathematics and Computer Science
Clark University
January, 1996