Phylogeny and Reconstructing Phylogenetic Trees

Mutations




Distances between species

What is a good measure of how close two species are? We can use the phylogenetic tree to define a distance between species. Any two species have a unique most recent common ancestor in the tree. Two species are closely related if that common ancestor is recent and distantly related if the common ancestor is remote. So, as a measure of the distance between two species, add together the times from each species back to their common ancestor. For example, if the ancestor lived 2 time units in the past, then the distance between the two species is 2 + 2 = 4. (This measure can be used to determine the distance between any two points in the tree, too: the distance is the length, in time, of the unique path between the points.)
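A minimal Python sketch of this distance, assuming a small hypothetical tree stored as parent links plus each node's age (how many time units in the past it existed; 0 for extant species). The node names and ages are invented for illustration and are not taken from any figure.

```python
# Hypothetical tree: each node maps to its parent; "root" has no parent.
PARENT = {
    "A": "n1", "n3": "n1",
    "B": "n3", "C": "n3",
    "D": "n2", "E": "n2",
    "n1": "root", "n2": "root",
}
# Hypothetical ages: time units in the past (extant species are 0).
AGE = {
    "A": 0, "B": 0, "C": 0, "D": 0, "E": 0,
    "n3": 2, "n1": 4, "n2": 3, "root": 5,
}


def ancestors(node):
    """Return [node, parent, grandparent, ..., root], most recent first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def common_ancestor(x, y):
    """Most recent common ancestor of x and y (a node counts as its own ancestor)."""
    x_line = set(ancestors(x))
    for node in ancestors(y):  # walk upward from y; the first hit is the most recent
        if node in x_line:
            return node
    raise ValueError("nodes are not in the same tree")


def tree_distance(x, y):
    """Length in time of the unique path between x and y."""
    a = common_ancestor(x, y)
    return (AGE[a] - AGE[x]) + (AGE[a] - AGE[y])


print(tree_distance("B", "C"))  # 4: their ancestor n3 lived 2 units ago
print(tree_distance("A", "D"))  # 10: their ancestor is the root, 5 units ago
```

The same function handles internal points: the distance from n3 to E is (5 - 2) + (5 - 0) = 8, since their path runs through the root.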

Mutations as an approximation of distance

Suppose now that we only know about the extant species, and we don't know how they evolved. How can we reconstruct the phylogenetic tree? It would help to know the distance between every pair of extant species. In fact, if we knew the exact distances, then it would be easy to reconstruct the tree. Unfortunately, we don't know the distances, either.

But there are ways to approximate the distances. If two species have diverged from a common ancestor, then each will have evolved in its own way, so their characters will differ. We can construct another measure of the distance between species by summing the differences in their characters. We have a lot of choices here. Which characters should we choose? How do we measure the difference in a single character? Different choices give us different sets of distances between species, and these different sets of distances can lead us to reconstruct different phylogenetic trees.
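As a sketch of "summing the differences in their characters", the Python below counts the characters on which two species differ, with optional per-character weights standing in for one of the choices mentioned above. The species names, character values, and weights are hypothetical.

```python
from itertools import combinations


def character_distance(u, v, weights=None):
    """Sum of per-character differences: each differing character adds its weight (default 1)."""
    if len(u) != len(v):
        raise ValueError("species must be scored on the same characters")
    if weights is None:
        weights = [1] * len(u)
    return sum(w for a, b, w in zip(u, v, weights) if a != b)


# Hypothetical data: one letter per character, in the same order for every species.
species = {"P": "ACGTA", "Q": "ACCTG", "R": "TCCTG"}

for x, y in combinations(species, 2):
    plain = character_distance(species[x], species[y])
    # Hypothetical choice: count the first character three times as heavily.
    weighted = character_distance(species[x], species[y], [3, 1, 1, 1, 1])
    print(x, y, plain, weighted)
```

With equal weights, Q and R are closest (distance 1, versus 2 for P and Q). Weighting the first character three times as heavily makes P and Q closest instead (2, versus 3 for Q and R), a small instance of different choices leading to different trees.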

There are other problems with this approach. Not all species evolve at the same rate. Some species have been stable for millions of years; others evolve very fast. If a species depends on a characteristic for its continued survival, that characteristic will tend not to change, because natural selection eliminates any mutations that alter it. Call such characters essential. Most visible characters are essential for the species. This means that if we choose essential characters, any differences should count as very significant. There are, however, some difficulties with considering essential characters. If one species evolves by changing an essential characteristic, whatever ecological forces supported that change may also act on other species, and that could lead to parallel evolution. Thus, differences or similarities in essential characters need not reflect large or small distances in the phylogenetic tree.

Irrelevant mutations

We could, on the other hand, consider irrelevant mutations. These are mutations in characters that don't matter to the species' survival. Since selection does not weed them out, irrelevant characters should change at a fairly uniform rate across species. Unfortunately, if a characteristic really doesn't matter, it is likely to be very difficult to perceive its value in a species.

For an example of an irrelevant mutation, consider the genetic code. There are 64 (4 cubed) different codons but only 20 amino acids (plus a stop signal), so many amino acids are coded by more than one codon: several by four codons, and a few by six. For amino acids coded by four or more codons, typically the third nucleotide can take any of the four possible values. In other words, a mutation in this third nucleotide is irrelevant: the DNA can mutate at this site and the resulting protein doesn't change.
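The Python below checks this for a few codon families. It uses only a small, hand-entered subset of the standard genetic code (DNA codons), enough for the example; the function name is hypothetical. Aspartate and glutamate are included to show that the third position is not always free.

```python
# Partial table of the standard genetic code (DNA codons), enough for this example.
CODON_TABLE = {}
for prefix, amino in [("GC", "Ala"), ("GG", "Gly"), ("GT", "Val"),
                      ("CC", "Pro"), ("AC", "Thr")]:
    for base in "TCAG":  # all four third-position bases give the same amino acid
        CODON_TABLE[prefix + base] = amino
# A family where the third position matters: GA[TC] -> Asp, GA[AG] -> Glu.
CODON_TABLE.update({"GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu"})


def third_position_irrelevant(codon):
    """True if every substitution at the third nucleotide leaves the amino acid unchanged."""
    amino = CODON_TABLE[codon]
    return all(CODON_TABLE[codon[:2] + b] == amino for b in "TCAG")


print(third_position_irrelevant("GCA"))  # True: GCT, GCC, GCA, GCG all code alanine
print(third_position_irrelevant("GAT"))  # False: GAA and GAG code glutamate instead
```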

Here is a phylogenetic tree with five extant species alongside a matrix. This 5 by 5 matrix results from mutations of 40 irrelevant characteristics, each with 4 alternative values (like those described in the previous paragraph). The mutation rate is uniform: 100 mutations per 1000 time units, that is, 0.1 mutations per time unit. The (i,j)th entry in the matrix indicates how many of the 40 characters differ between species i and species j. If two species are close in the tree, there hasn't been much time for mutations to occur, so the entry in this matrix should be small. If two species are quite distant (as, for example, when their common ancestor is at the top of the tree), the entry in the matrix should be large.
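The Python below sketches a simulation of this kind of experiment. It assumes a hypothetical five-species tree (the shape and times of the original figure are not reproduced), and it assumes the rate of 0.1 applies to each character independently. Each mutation replaces a character's value with one of its other alternatives, chosen at random.

```python
import random

# Hypothetical ultrametric tree: each internal node lists (child, branch length in time units).
TREE = {
    "root": [("n1", 1.0), ("n2", 2.0)],
    "n1": [("A", 4.0), ("n3", 2.0)],
    "n3": [("B", 2.0), ("C", 2.0)],
    "n2": [("D", 3.0), ("E", 3.0)],
}
LEAVES = ["A", "B", "C", "D", "E"]


def evolve(state, length, rate, n_values, rng):
    """Mutate each character along a branch of the given length.

    Assumption: mutations arrive as a Poisson process with `rate` per character
    per time unit, so waiting times between them are exponential.
    """
    new_state = list(state)
    for i in range(len(new_state)):
        t = rng.expovariate(rate)
        while t < length:
            others = [v for v in range(n_values) if v != new_state[i]]
            new_state[i] = rng.choice(others)
            t += rng.expovariate(rate)
    return new_state


def simulate(n_chars=40, n_values=4, rate=0.1, seed=None):
    """Return the matrix of how many characters differ between each pair of leaves."""
    rng = random.Random(seed)
    states = {"root": [rng.randrange(n_values) for _ in range(n_chars)]}
    stack = ["root"]
    while stack:  # push states down the tree from the root
        node = stack.pop()
        for child, length in TREE.get(node, []):
            states[child] = evolve(states[node], length, rate, n_values, rng)
            stack.append(child)
    return [[sum(a != b for a, b in zip(states[x], states[y])) for y in LEAVES]
            for x in LEAVES]


for row in simulate(seed=1):
    print(" ".join(f"{d:2d}" for d in row))
```

Running it with a different seed gives a fresh set of mutations with the same parameters. Because mutations are random, pairs with a recent common ancestor (B and C here) will usually, though not always, show fewer differences than pairs whose common ancestor is the root.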





David E. Joyce

Department of Mathematics and Computer Science
Clark University

January, 1996