Bioinformatics
The release of the human genome sequence, followed by whole-genome sequences for other species, has transformed biology into an informational science. Researchers from many disciplines, including biology, chemistry, mathematics, computer science, and physics, work together to develop high-throughput tools for analyzing these massive data sets. Bioinformatics, the management and analysis of biological data using computational techniques, brings these disciplines together.
We are using bioinformatics in three ways: 1) to create a database of candidate genes implicated in craniofacial development and evolution; 2) to extract and align orthologous candidate sequences from the human, chimp and macaque whole genomes; and 3) to conduct comparative genomics analyses on orthologous candidate sequences to examine molecular evolution among these species.
Candidate Gene Acquisition and Database
Through an extensive literature search and queries of public databases such as Online Mendelian Inheritance In Man (OMIM), we have assembled a list of candidate genes implicated in craniofacial development and evolution. This list of genes, along with their genomic locations, RefSeq accession numbers, phenotypic implications, and primary references, is stored in a database for easy access by investigators.
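As an illustration of how such a candidate gene table might be organized, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are hypothetical and are not taken from the project's actual database. The example record uses the Hand2 location and OMIM number given later on this page, with a placeholder in place of a real RefSeq accession.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE candidate_gene (
    symbol            TEXT PRIMARY KEY,  -- gene symbol
    location          TEXT,              -- genomic location (e.g. cytogenetic band)
    refseq_accession  TEXT,              -- RefSeq accession number
    phenotype         TEXT,              -- phenotypic implication
    primary_reference TEXT               -- primary literature or OMIM reference
);
"""

def build_database(records):
    """Create an in-memory database and load 5-field gene records into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO candidate_gene VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn

def genes_at(conn, location_prefix):
    """Return sorted symbols of genes whose location starts with the prefix."""
    rows = conn.execute(
        "SELECT symbol FROM candidate_gene WHERE location LIKE ? ORDER BY symbol",
        (location_prefix + "%",),
    )
    return [row[0] for row in rows]

# Example record. The accession string is a placeholder, not a real identifier.
example = [
    ("Hand2", "4q33", "NM_PLACEHOLDER", "candidate for craniofacial development", "OMIM #602407"),
]
conn = build_database(example)
print(genes_at(conn, "4q"))
```

Keeping one row per gene with the accession and reference as plain text columns is only one possible layout; a real database would likely split references and phenotypes into separate linked tables.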
Orthologous Sequences
Human, chimp and macaque orthologs for many of the candidate genes in our database have been extracted from the most current available releases of whole-genome data (hg18 [Homo sapiens], panTro2 [Pan troglodytes], and rheMac2 [Macaca mulatta]) and aligned.
Comparative Genomics
We are examining the molecular evolution of both protein-coding and non-coding regions of the DNA sequences of these candidate gene orthologs. Changes in coding regions are important because they may affect the gene's expression or its product. Less intuitively, changes in non-coding regions are also important because these regions may contain regulatory elements that affect expression of nearby genes. Understanding how these sequences have changed since the common ancestor of humans, chimps and macaques will provide knowledge of specific genotype-phenotype relationships and will allow us to generate informed hypotheses about how genetic changes may have affected morphology over the course of primate evolution. These hypotheses can then be brought to bear upon the fossil record.
Preliminary Results
Multipoint analysis in SOLAR (Almasy and Blangero, 1998) was used to test 35 craniofacial linear measures (see Morphometrics) for genetic linkage to quantitative trait loci (QTL), with sex and age included as covariates. High LOD scores were found on chromosome 4 for the facial measurements mxt-mda and fzj-pmm (Figure 2) (see Genetics).
Figure 2. Lateral view of baboon skull with distances mxt-mda and fzj-pmm drawn in green. Both of these distances had high LOD scores (with mxt-mda showing a significant LOD score) and were mapped to a similar region on chromosome 4. For a description and placement of these landmarks see biological landmarks.
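For readers unfamiliar with LOD scores, the sketch below shows the standard definition: the base-10 logarithm of the ratio between the likelihood of the data under linkage and the likelihood under no linkage. It is a generic illustration and not SOLAR's own code; the function name and the input values are hypothetical.

```python
import math

def lod_score(loglik_linkage, loglik_null):
    """LOD score from natural-log likelihoods of the linkage and null models.

    LOD = log10(L_linkage / L_null) = (lnL_linkage - lnL_null) / ln(10)
    """
    return (loglik_linkage - loglik_null) / math.log(10)

# Hypothetical log-likelihoods, chosen so the likelihood ratio is 1000:1.
lnl_null = -106.0
lnl_linkage = lnl_null + math.log(1000)
print(round(lod_score(lnl_linkage, lnl_null), 6))
```

A LOD score of 3 therefore corresponds to the data being 1000 times more likely under linkage than under the null model.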
RefSeq genes in this region of chromosome 4 were examined to identify plausible candidates for craniofacial development. Hand2, an important transcription factor mapped to human 4q33 (OMIM #602407), was identified in the region. Circumstantial supporting evidence is that a 4q33-ter deletion, called HCA1, causes facial developmental anomalies in human families (OMIM #607258). Furthermore, there is a QTL affecting facial length variation in the homologous region in the LG/J by SM/J intercross mice (Ehrich et al., 2003). Experimental embryology has shown that Hand2 is expressed in the maxillary process of the first branchial arch when the facial processes are extending before they unite at the midline (Figure 3). This expression domain is in the location measured by the mxt-mda and fzj-pmm distances.
Figure 3. A. Schematic of human facial development at 6, 7 and 10 weeks. Colored regions are derived from the first branchial arch with the red portions representing the maxillary prominences and blue portions representing the mandibular prominences (Modified from J.A. Stoffer, 2003 http://www.indiana.edu/~anat550/hnanim/face/face.html). B. Illustration of the maxillary (red) and mandibular (blue) components of the baboon skull. Note that both mxt-mda and fzj-pmm (Figure 2) distances are found within the maxillary portion of the skull and are therefore within the region influenced by Hand2.
The Hand2 amino acid sequence is 100% identical in human, chimp, macaque, and even mouse, and human-mouse coding regions have 94.5% DNA sequence identity. This conservation again suggests the functional importance of the gene and, importantly, that its effects on variation are mediated via expression rather than protein structure. A novel augmentation of the standard McDonald-Kreitman test for selection (McDonald and Kreitman, 1991), the MKAR test (Lawson et al., submitted), was applied to the non-coding untranslated, intronic, and flanking regions of Hand2. The test uses an internal genomic control as a proxy for neutrally evolving sequence (Hardison et al., 2003). Human polymorphism data were taken from dbSNP126 (Sherry et al., 2001), and divergence was reckoned over both human-chimp and human-macaque time scales (ca. 6 and 23 Mya, respectively) (Stewart and Disotell, 1998) (Figure 4).
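Sequence identity figures such as those quoted above are computed from aligned sequences. The following is a minimal sketch of one common convention, in which alignment columns containing a gap are excluded from the comparison. The page does not state which convention was used for the Hand2 figures, so this is an assumption, and the toy sequences are invented for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (one of several
    conventions; assumed here for illustration).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned sequences (not real Hand2 data): 9 ungapped columns, 8 matches.
print(percent_identity("ATGGCC-TAA", "ATGGCTATAA"))
```

The same function works for aligned amino acid sequences, since it only compares characters column by column.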
Figure 4. MKAR test results for Hand2 regulatory region relative to chimp (top) and macaque (bottom) genome.
The ratio of polymorphism to divergence, rpd, in these non-coding intervals can be compared with the rpd for neutral sites to find regions that deviate significantly from neutrality (Akashi, 1995). MKAR test results for Hand2 are shown in Figure 4 relative to the chimp (top) and macaque (bottom) genomes. The Hand2 enhancer region shows marginally significant effects (p=0.055). Polymorphic and divergent sites in the stringently defined candidate cis-regulatory modules (CRMs) overlapping these non-coding intervals yield many significant (p<=0.05) sequence elements for further study. In humans (based on the HapMap sample) there is essentially no linkage disequilibrium and very little SNP variation in the Hand2 gene itself, but the candidate regulatory sequence shows substantial polymorphism as well as local linkage disequilibrium, suggesting that an aspect of Hand2 expression may have been a target of selection. As we identify other LOD score candidates, we will run similar analyses on both coding and non-coding sequence.
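The comparison of rpd between a test interval and neutral sites can be illustrated with a generic McDonald-Kreitman-style 2x2 contingency test. The sketch below is not the MKAR test itself, whose details are not given here; it assumes Fisher's exact test as the significance test, and all site counts in the example are hypothetical.

```python
from math import comb

def rpd(polymorphic, divergent):
    """Ratio of polymorphic sites to divergent (fixed-difference) sites."""
    return polymorphic / divergent

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: test interval, neutral control. Columns: polymorphic, divergent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: (polymorphic, divergent) for test and neutral sites.
test_p, test_d = 12, 30
neutral_p, neutral_d = 40, 60
print("rpd test:", rpd(test_p, test_d), "rpd neutral:", rpd(neutral_p, neutral_d))
print("p-value:", fisher_exact_two_sided(test_p, test_d, neutral_p, neutral_d))
```

An rpd in the test interval that differs significantly from the neutral rpd indicates a departure from neutrality; the direction of the difference suggests whether an excess of divergence or of polymorphism is responsible.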
References
Akashi, H., Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA. Genetics, 1995. 139(2): p. 1067-76.
Almasy, L. and J. Blangero, Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet, 1998. 62(5): p. 1198-211.
Ehrich, T.H., et al., Pleiotropic effects on mandibular morphology I. Developmental morphological integration and differential dominance. Journal of Experimental Zoology (Molecular and Developmental Evolution), 2003. 296B: p. 58-79.
Hardison, R.C., et al., Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res, 2003. 13(1): p. 13-26.
Lawson, H.A., et al., Recent natural selection in human noncoding sequences. Science, submitted.
McDonald, J.H. and M. Kreitman, Adaptive protein evolution at the Adh locus in Drosophila. Nature, 1991. 351(6328): p. 652-4.
Sherry, S.T., et al., dbSNP: the NCBI database of genetic variation. Nucleic Acids Res, 2001. 29(1): p. 308-11.
Stewart, C.B. and T.R. Disotell, Primate evolution – in and out of Africa. Curr Biol, 1998. 8(16): p. R582-8.