Phylogeny_101 - Part 2: Building phylogenetic trees by hand


This section is adapted from the Activities Exchange of the National Health Museum.

To study the evolutionary relationships between organisms alive today, various methods can be employed to estimate when those organisms may have diverged from a common ancestor. In the past, much of this work was done by observing anatomy and physiology and by comparing fossil records.

More recently, techniques have been developed in molecular biology for performing such comparisons. They are collectively referred to as 'Molecular Phylogeny' because they all compare organisms at the molecular level, for example by comparing DNA, RNA, or protein sequences.

Generally speaking, there are three major methods of constructing phylogenetic trees in molecular phylogeny:
a. Distance-matrix based methods, e.g., least squares and neighbor joining.
b. Maximum Parsimony
c. Maximum Likelihood

All three methods usually start from a multiple sequence alignment. Distance-matrix based methods reduce the alignment to a table of pairwise distances and build the tree from that table, whereas maximum parsimony and maximum likelihood evaluate candidate trees directly on the aligned columns (characters) rather than on pairwise distances.

In this exercise, we will use the distance-matrix based method. The general idea is to calculate a measure of the distance between each pair of species, tabulate those distances in a square matrix, then find a tree that predicts the observed set of distances as closely as possible.
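"Predicts the observed set of distances as closely as possible" can be made precise with a score. A common choice is the least-squares criterion: add up the squared gap between each observed distance and the distance the tree predicts, and prefer the tree with the smaller total. The Python sketch below uses the unweighted form. The three taxa A, B, C and every number in it are hypothetical and serve only as an illustration.

```python
def least_squares_error(observed, predicted):
    """Unweighted sum of squared differences between observed and tree-predicted distances."""
    return sum((observed[pair] - predicted[pair]) ** 2 for pair in observed)


# Hypothetical observed pairwise distances for three taxa (illustration only)
observed = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 7}

# Hypothetical distances predicted by two candidate trees:
# tree_ab places A next to B, tree_ac places A next to C.
tree_ab = {("A", "B"): 2, ("A", "C"): 6.5, ("B", "C"): 6.5}
tree_ac = {("A", "B"): 6.5, ("A", "C"): 2, ("B", "C"): 6.5}

if __name__ == "__main__":
    for name, predicted in [("tree_ab", tree_ab), ("tree_ac", tree_ac)]:
        print(name, least_squares_error(observed, predicted))
```

Here tree_ab scores 0.5 and tree_ac scores 36.5, so tree_ab fits these made-up distances better. Real least-squares programs also optimize the branch lengths for each topology. This sketch only compares fixed predictions.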

We can think of the distance between each pair of species as the total branch length separating that pair on the tree. The organisms that show the greatest number of sequence differences are considered to have diverged from a common ancestor (and followed separate evolutionary paths) the longest time ago (e.g., Nos. 1 and 6 in the diagram below). If two organisms have few sequence differences between them but a large and approximately equal number of differences from some third organism, they are likely to be closely related and to sit on "twigs" of one branch that is far removed from that third species (e.g., Nos. 3, 4, and 1).


Figure 1: Phylogenetic tree
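The idea that the distance between two species is "the branch length separating them" can be written as a small calculation: walk from each leaf up to their lowest common ancestor and add the branch lengths along the way. The Python sketch below does this on a hypothetical three-leaf tree. The node names and branch lengths are invented for illustration and are not taken from Figure 1.

```python
# Hypothetical rooted tree: child -> (parent, length of the branch to that parent).
# A and B sit on short twigs of the same branch; C is on a separate, longer branch.
TREE = {
    "A": ("x", 1.0),
    "B": ("x", 1.0),
    "x": ("root", 2.0),
    "C": ("root", 3.0),
    "root": (None, 0.0),
}


def distances_to_ancestors(tree, node):
    """Map node and each of its ancestors to the summed branch length from node."""
    out, total = {}, 0.0
    while node is not None:
        out[node] = total
        parent, length = tree[node]
        total += length
        node = parent
    return out


def tree_distance(tree, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    up_a = distances_to_ancestors(tree, a)
    up_b = distances_to_ancestors(tree, b)
    # Dicts keep insertion order (a first, then its ancestors upward),
    # so the first shared key is the lowest common ancestor.
    lca = next(n for n in up_a if n in up_b)
    return up_a[lca] + up_b[lca]


if __name__ == "__main__":
    for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
        print(a, b, tree_distance(TREE, a, b))
```

A and B come out 2.0 apart, and each is 6.0 from C. This is the pattern described above: two close relatives that are roughly equally far from a third species.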

There are several algorithms available in the Biology Workbench that perform tree reconstruction automatically. In this segment of the unit, we will do manually what those algorithms do automatically.

Step 1: Collect sequences.
To construct a phylogenetic tree, we need to compare sequences that come from the same region (the same gene or protein) shared by ALL the organisms being compared. Figure 2 shows the FASTA records of the amino acid sequences that we are going to compare.

>seq_org1
VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF
>seq_org2
VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF
>seq_org3
ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF
>seq_org4
ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF
>seq_org5
DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF


Figure 2: FASTA records of the amino acid sequences to be used for tree construction
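If you want to work with these records in a program rather than by hand, the first task is to read the FASTA text. In FASTA, each record starts with a '>' header line and continues with one or more sequence lines. The minimal Python parser below assumes that the record ID is the first word after '>'. The variable and function names are illustrative.

```python
# Figure 2 as plain FASTA text
FIGURE_2 = """\
>seq_org1
VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF
>seq_org2
VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF
>seq_org3
ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF
>seq_org4
ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF
>seq_org5
DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF
"""


def parse_fasta(text):
    """Return {record_id: sequence} for FASTA-formatted text."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumption: the record ID is the first word after '>'
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line)  # a sequence may wrap over several lines
    return {rid: "".join(parts) for rid, parts in records.items()}


if __name__ == "__main__":
    for rid, seq in parse_fasta(FIGURE_2).items():
        print(rid, len(seq))
```

All five records are 33 residues long. For real projects, Biopython's `Bio.SeqIO.parse(handle, "fasta")` handles FASTA and many other formats.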

Step 2: Align all sequences.
To estimate when those organisms may have diverged from a common ancestor, we need to measure how different the sequences are from each other. How do we do that? One way is to align them: that is, to match up the positions of residues that have not changed by placing them in the same column, as illustrated in the figure below.

Seq_org1          VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF
Seq_org2          VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF
Seq_org3          ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF
Seq_org4          ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF
Seq_org5          DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF

Figure 3: Multiple sequence alignment. Completely conserved residues are shown in blue
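Because these five sequences have the same length and need no gaps, you can find the completely conserved columns (the ones shown in blue) by checking whether every sequence has the same residue at each position. The Python sketch below reports those positions and prints a '*' marker line under the alignment, similar in spirit to the consensus line Clustal prints. The function names are illustrative.

```python
# Alignment from Figure 3 (already aligned; no gaps)
ALIGNMENT = {
    "Seq_org1": "VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF",
    "Seq_org2": "VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF",
    "Seq_org3": "ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF",
    "Seq_org4": "ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF",
    "Seq_org5": "DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF",
}


def conserved_columns(aligned):
    """1-based positions where every sequence has the same residue."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) == 1]


def consensus_line(aligned):
    """'*' under fully conserved columns, a space elsewhere."""
    length = len(next(iter(aligned.values())))
    conserved = set(conserved_columns(aligned))
    return "".join("*" if i + 1 in conserved else " " for i in range(length))


if __name__ == "__main__":
    width = max(len(name) for name in ALIGNMENT) + 2
    for name, seq in ALIGNMENT.items():
        print(name.ljust(width) + seq)
    print(" " * width + consensus_line(ALIGNMENT))
```

For this alignment, 15 of the 33 columns are completely conserved.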

Step 3: Calculating evolutionary distance.
Using the alignment from the previous step, count the number of differences between every pair of sequences. To make the actual tree construction a little easier, also sort the pairs from the smallest difference to the largest.


Figure 4. Results of counting the differences in the alignment by comparing pairs of sequences.

Figure 5. Same as in Figure 4, but sorted in ascending order.
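The counting in this step is a position-by-position comparison of two aligned sequences: every column where the residues differ adds one. The Python sketch below builds the pair table (as in Figure 4) and then sorts it in ascending order (as in Figure 5). The function names are illustrative, and the code assumes gap-free alignments of equal length, as here.

```python
from itertools import combinations

# Aligned sequences from Figure 3 (no gaps; every sequence is 33 residues long)
SEQS = {
    "seq_org1": "VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF",
    "seq_org2": "VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF",
    "seq_org3": "ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF",
    "seq_org4": "ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF",
    "seq_org5": "DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF",
}


def count_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def difference_table(seqs):
    """Differences for every unordered pair of sequences (cf. Figure 4)."""
    return {(p, q): count_differences(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}


def sorted_by_difference(table):
    """Pairs ordered from smallest to largest difference (cf. Figure 5).
    sorted() is stable, so tied pairs keep their original order."""
    return sorted(table.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for (p, q), n in sorted_by_difference(difference_table(SEQS)):
        print(f"{p} - {q}: {n}")
```

Dividing each count by the alignment length (33) turns it into the proportion of differing sites, which makes results comparable across alignments of different lengths.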

Step 4: Build the tree.
We need to build a binary tree with those five sequences: a tree in which each node (denoted as a circle in the figures below) can have up to two branches (denoted as lines in the figures below) and the sequences go on the leaves (tips) of the tree.

Hint: we have five sequences, so the final tree must have five leaves. There are many configurations (i.e., topologies) that can represent the sequences; the figures below depict three different topologies of binary trees with five leaves. Choose the topology that best fits your data from the previous step and fill in the blanks with the sequence numbers on the appropriate branches of the tree. Remember: a small difference means close proximity in the tree, whereas a large difference means a large distance in the tree.
Figure 6. Three possible tree topologies to choose from.
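To see why "many configurations" is no exaggeration, the Python sketch below counts them using standard results. There are (2n-3)!! rooted binary trees with n labeled leaves and (2n-5)!! unrooted ones (for n >= 3). The number of unlabeled rooted shapes follows the Wedderburn-Etherington numbers. For n = 5 these give 105, 15, and 3. The last value would match the three shapes in Figure 6, assuming the figure shows unlabeled rooted shapes. The function names are illustrative.

```python
from functools import lru_cache


def rooted_labeled_trees(n):
    """Rooted binary trees with n labeled leaves: (2n-3)!! = 1 * 3 * ... * (2n-3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


def unrooted_labeled_trees(n):
    """Unrooted binary trees with n >= 3 labeled leaves: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


@lru_cache(maxsize=None)
def tree_shapes(n):
    """Unlabeled rooted binary tree shapes with n leaves (Wedderburn-Etherington numbers)."""
    if n == 1:
        return 1
    total = 0
    # Split the leaves between the root's two subtrees: i on one side, n - i on the other
    for i in range(1, (n - 1) // 2 + 1):
        total += tree_shapes(i) * tree_shapes(n - i)
    if n % 2 == 0:
        # Both subtrees have n/2 leaves: count unordered pairs of shapes
        half = tree_shapes(n // 2)
        total += half * (half + 1) // 2
    return total


if __name__ == "__main__":
    n = 5
    print(rooted_labeled_trees(n), unrooted_labeled_trees(n), tree_shapes(n))
```

The labeled counts grow very quickly: for 10 sequences there are already more than 34 million rooted trees. This is why automated programs rely on clustering shortcuts or heuristic searches instead of checking every tree.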

1. Start with TWO sequences and add the rest of the sequences one at a time.
2. Each new sequence becomes a leaf of the tree (meaning that nothing further can be attached at that point).
3. Choose each placement carefully, taking into account the information in the chart above.
4. Sequences 3 and 4 are among the closest pairs; therefore, they should stem from the same branch of the tree.
5. Sequences 1 and 2 are also closer to each other than to any other sequence. They should form a separate branch of the tree.
6. The two pairs [1-2, 3-4] are, on the whole, closer to each other than to Sequence 5. Thus, Sequence 5 seems to be the outlier.
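The reasoning in these steps is a manual version of what distance methods automate. As a sketch, here is neighbor joining (one of the distance-matrix methods listed at the start of this section) in Python. It recomputes the pairwise differences from the aligned sequences, joins the pair that minimizes the standard Q criterion, replaces that pair with a new internal node, and repeats. This simplified version returns only the unrooted topology as a Newick string, breaks ties by iteration order, and is not necessarily what the Biology Workbench programs do internally.

```python
from itertools import combinations

# Aligned sequences from Figure 3, keyed by sequence number
SEQS = {
    "1": "VNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF",
    "2": "VNFKLLSHCLLVTLACHLPTEFTPAVHASLDKF",
    "3": "ENFKLLTNVLVCVLAHHFGRFFTPPVHAAYQKF",
    "4": "ENFKLLTNVLVCVLAVHFGKFFTPPVHAAYQKF",
    "5": "DNFKLLSEMIIQVLASHHPPCFTPDVHGMMVKF",
}


def difference_matrix(seqs):
    """Count of differing aligned positions for every pair, keyed by frozenset({a, b})."""
    return {
        frozenset((a, b)): sum(x != y for x, y in zip(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }


def neighbor_joining(dist):
    """Return the unrooted neighbor-joining topology as a Newick string (no branch lengths)."""
    d = dict(dist)
    nodes = sorted({label for pair in d for label in pair})
    while len(nodes) > 3:
        n = len(nodes)
        # r[i]: total distance from node i to every other remaining node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j];
        # min() keeps the first pair in iteration order when values tie
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        u = f"({i},{j})"
        # Distance from the new internal node u to each remaining node k
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # The last three nodes all attach to one central node
    return "(" + ",".join(nodes) + ");"


if __name__ == "__main__":
    print(neighbor_joining(difference_matrix(SEQS)))
```

On these sequences the sketch prints `(5,(3,4),(1,2));`: pairs 1-2 and 3-4 each form their own branch and Sequence 5 hangs on its own. Rooting this unrooted tree on Sequence 5, treated as the outlier, places the 1-2 and 3-4 pairs together on one side of the root and Sequence 5 on the other.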

Answer:

The exercise you have just completed follows the same logic as the work that Carl Woese (http://en.wikipedia.org/wiki/Carl_Woese) and George Fox published in 1977. By comparing ribosomal RNA, they divided all of life into three primary kingdoms: the archaea, the bacteria, and the eukaryotes. Their original paper is: Woese C, Fox G (1977). "Phylogenetic structure of the prokaryotic domain: the primary kingdoms." Proc Natl Acad Sci U S A 74(11): 5088-90.
