Whether evolution proceeds by tiny steps or big leaps, by examining the fossil record we can trace the evolution of man and animals back to the emergence of the first animals in the Cambrian explosion five hundred and fifty million years ago. Rocks earlier than the Cambrian explosion have very few fossils – which are nearly all microbes. Unfortunately, microbial fossils are not very distinctive. They give few clues about the evolutionary changes that led to the emergence of animals. To go deeper into the history of life we need to dig into DNA, rather than rocks.
THE GENE CLOCK
The word for milk is lait in French, latte in Italian, leche in Spanish and leite in Portuguese, but milk in English, milch in German and mjölk in Swedish. French, Italian, Spanish and Portuguese are all Italic languages, whereas English, German and Swedish are Germanic languages. Other European language groups include Celtic, Hellenic and Slavic. In 1786, Sir William Jones, an English judge serving in India, first noticed similarities between the ancient Indian language, Sanskrit, and various European languages. For instance, the word for king is rex in Latin, ri in Irish and raja in Sanskrit. The same root turns up in the English word ruler. Sir William considered that these similarities could not have arisen by chance but must reflect a common linguistic inheritance. The English scholar Thomas Young later coined the term Indo-European to describe this family of related languages.
These modern languages are thought to be derived from an ancestral proto-Indo-European language spoken by a Bronze Age or Neolithic people. The original Indo-Europeans would have spoken a common proto-Indo-European but, as the people gradually dispersed, their languages diverged to develop into the modern family of languages. Philologists (those who study language development) compare similar words in each language to derive a plausible ancestral word. For instance, a single word for milk, approximating to lakte, is thought to have been used by people who spoke proto-Italic, the ancestral language of modern French, Italian, Spanish and Portuguese. The patterns of divergence in each language group can then be estimated by counting the number of sound shifts required to change the putative ancestral word into each of its modern forms. Languages linked by few sound shifts, such as Spanish and Portuguese, are considered to have diverged relatively recently. Languages linked through more sound shifts, such as German and Portuguese, must have separated much further back. In this way, a family tree of languages can be suggested. By dating language divergence to a historical event (for instance, the settling of England by an Anglo-Saxon-speaking people in about AD 550 that led to the separate development of English), philologists can provide a very rough calibration of the rate of divergence of languages.
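The idea of counting changes between words can be tried out directly. The sketch below is only a rough illustration: it counts single-letter edits between the written words for milk quoted above (the Levenshtein distance), which is a crude stand-in for the sound shifts that philologists actually count. The function name edit_distance and the table WORDS are made up for this example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance: the fewest single-letter insertions,
    deletions or substitutions needed to turn word a into word b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# Spellings as quoted in the text. Letters are an assumption standing in
# for sounds; real sound shifts are not the same thing as spelling changes.
WORDS = {
    "French": "lait",
    "Italian": "latte",
    "Spanish": "leche",
    "Portuguese": "leite",
    "English": "milk",
    "German": "milch",
    "Swedish": "mjölk",
}

if __name__ == "__main__":
    for (lang_a, word_a), (lang_b, word_b) in combinations(WORDS.items(), 2):
        print(f"{lang_a:>10} {word_a:<6} {lang_b:>10} {word_b:<6} "
              f"{edit_distance(word_a, word_b)}")
```

On these spellings, Spanish leche and Portuguese leite differ by two letter edits, whereas German milch and Portuguese leite differ by five, in line with the more recent divergence of Spanish and Portuguese.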
People’s common inheritance can also be traced in their genes. Consider a short gene segment from four different individuals that reads: ATTGC in Harry, AATCA in Jim, GATGC in Betty and ACTGC in Bertha. A plausible sequence that might have belonged to their last common ancestor (the proto-Harry-Jim-Betty-Bertha) would be AATGC, since only one or two base changes are needed to change the ancestor sequence into any of its modern descendants. Harry, Jim, Betty and Bertha could be said to belong to a gene family. If the DNA of another individual, Ted, was sequenced as AATTT, we could then conclude that Ted was more distantly related to Harry, Jim, Betty and Bertha, since at least two sequence changes are required to connect his sequence to any of the others. The last common ancestor of all five individuals, the proto-Harry-Jim-Betty-Bertha-Ted, must have lived earlier than proto-Harry-Jim-Betty-Bertha. Just as with words, DNA sequences can be used to draw up family trees, only this time the tree reflects genetic rather than cultural inheritance.
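The counting in this example is easy to check for oneself. The sketch below uses the five-letter sequences exactly as printed above. It guesses an ancestor by taking the most common base at each position, which is one simple assumption among many (real tree-building methods are far more elaborate), and then counts the positions at which two sequences differ. The names base_changes, consensus, FAMILY and TED are invented for the illustration.

```python
from collections import Counter

# Sequences as printed in the text; the names are just labels.
FAMILY = {
    "Harry": "ATTGC",
    "Jim": "AATCA",
    "Betty": "GATGC",
    "Bertha": "ACTGC",
}
TED = "AATTT"


def base_changes(a, b):
    """Number of positions at which two equal-length sequences differ
    (the Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def consensus(sequences):
    """Most common base at each position: a simple guess at an ancestor.
    Assumes equal-length sequences; ties are not handled specially."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


ancestor = consensus(FAMILY.values())
changes = {name: base_changes(ancestor, seq) for name, seq in FAMILY.items()}
ted_changes = {name: base_changes(TED, seq) for name, seq in FAMILY.items()}

if __name__ == "__main__":
    print("guessed ancestor:", ancestor)
    print("changes from ancestor:", changes)
    print("Ted to ancestor:", base_changes(TED, ancestor))
    print("Ted to each family member:", ted_changes)
```

Run on the sequences as printed, the most common base at each position gives AATGC, the ancestral sequence proposed in the text. Harry, Betty and Bertha each sit one change from it and Jim two; Ted is two changes from the ancestor and two or three from each of the others.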
Carl Woese of the University of Illinois was the first to make extensive use of gene sequences to examine the early evolution of living creatures. He used the gene sequences encoding one of the sub-units of the ribosomes (the protein-making machines in cells) as a gene clock to construct a universal genetic tree. The tree divides all life into three domains. The first contains the eukaryotes (whose DNA is enclosed within a nucleus), which include the unicellular protozoa (such as amoeba) and multicellular plants, fungi and animals – and us. The other two domains both consist entirely of prokaryotic organisms (the name means ‘before the nucleus’; their DNA is not enclosed within a nucleus). The eubacterial (true bacteria) domain contains most of the bacteria we are familiar with, such as E. coli. The third domain is that of a newly recognized group of prokaryotes, called the Archaea.
Many aspects of the tree agreed more or less with evolutionary thinking. The ribosomal RNA (rRNA) sequences of the multicellular animal groups (vertebrates, worms, sponges, arthropods etc.) diverged at roughly the time of the Cambrian explosion. The rRNA of plant chloroplasts (which have their own DNA, including rRNA genes) was found to be similar to bacterial rRNA, tying in with a theory championed by Lynn Margulis in the late 1960s, that these organelles were descended from symbiotic bacteria. The separate deep branching of eukaryotic (animal and plant) genes did come as something of a surprise. Until then it was generally assumed that eukaryotes had branched off from some bacterial ancestor billions of years ago; but that would have left us closely related to one of the bacterial groups. There was no evidence for this in the tree. Eukaryotes appeared as ancient as bacteria. This feature of the universal tree remains a puzzle.
The recognition of the Archaea as a distinct domain of life also came as a major surprise to biologists. We have already met some of the Archaea in Chapter Two. The extreme thermophilic bacteria thriving in the undersea vents, the halophiles living in briny waters and the methane-producing bacteria are all Archaea. They have markedly different enzymes, fats and cell structure to eubacteria and eukaryotes. Scientists had until then considered them as bacterial oddities, but Woese’s analysis placed them as a separate form of life that, at a molecular level, is quite as different from, say, E. coli as we are. Scientists have sequenced all 1,664,976 DNA bases that make up the genome of an archaeon called Methanococcus jannaschii, fished out of a ‘white smoker’ hydrothermal vent chimney two thousand six hundred metres deep. Many of its genes are similar to those found in eukaryotes, suggesting that our nuclear genome may be the descendant of an ancient archaeon.
HOW DID GENES EVOLVE?
By and large, gene sequence data supports the neoDarwinian notion that gene evolution has involved a series of gradual modifications of existing genes through mutations. Nevertheless, problem areas remain. The first, already mentioned, is how to account for apparent big jumps. A related problem, apparent in the DNA record, is the relationship between the major protein families. Examination of genes from diverse organisms has established that all modern proteins fall into about a thousand distinct protein families. Although evolution within protein families, such as the globin (the protein in haemoglobin) gene family, can generally be traced through a number of antecedent proteins present in living creatures, finding the links between the protein families is much more difficult. Animal globins bear some relation to oxygen storage proteins found in bacteria, but there is little or no identifiable relationship between these globin-related proteins and any of the nine hundred and ninety-nine or so other protein families. The same is true for all the other protein families – there is much evidence for Darwinian evolution within the family, but no obvious close relative from which the family could have evolved. Each protein family is like a separate galaxy (of related proteins) in a vast outer space of protein sequences. New protein families must have arisen from existing proteins by some kind of mutational process, but how their sequences traversed this vast empty sequence space, devoid of Darwinian intermediates, is a mystery. It seems molecular evolution often proceeds through a series of small steps but that sometimes it takes big leaps – rather like the punctuated evolution envisaged by Gould and Eldredge. Big leaps are big problems for neoDarwinian evolution because the chances of a big jump landing anywhere useful are generally thought exceedingly small.
As Richard Dawkins states, ‘However many ways there are of being alive, it is certain that there are vastly more ways of