Description
This thesis studies phylogenetic approaches and database systems and their uses in bioinformatics. It focuses on three main topics. (A) Molecular phylogenetic analysis as an effective tool for investigating evolutionary relationships, evolutionary rates, and adaptation in two important groups: the mangrove family Rhizophoraceae and the severe acute respiratory syndrome (SARS) coronavirus. (B) A statistical model and computer simulation approach for testing hybridization hypotheses based on incongruent gene trees. (C) A new data model and comparison method for interacting classifications and phylogenetic trees in a taxonomic database.<br /> In Chapter 1, I outlined recent advances in biodiversity science, in bioinformatics, and in the study of molecular evolution and phylogenetics. Biodiversity informatics is a newly formed cross-disciplinary field that joins biodiversity science and bioinformatics. To meet its major needs, phylogenetic approaches, data models, and taxonomic database systems must be applied within this field.<br /> In Chapter 2, I investigated the phylogenetic relationships and evolutionary rate heterogeneity of the family Rhizophoraceae. The analysis used sequences of the chloroplast genes <i>mat</i>K and <i>rbc</i>L and of the ITS regions of nuclear ribosomal DNA. Phylogenetic trees were constructed using the maximum parsimony (MP), neighbor-joining (NJ), and maximum likelihood (ML) methods. Partition-homogeneity tests indicated that the data sets were homogeneous. The combined analysis showed that the four mangrove genera formed a monophyletic group and that the terrestrial genus <i>Pellacalyx</i> was the basal clade. Relative-rate tests were used to analyze evolutionary rate heterogeneity of the plastid <i>mat</i>K and <i>rbc</i>L genes among species of the Rhizophoraceae.
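The abstract does not say which relative-rate test was used. As an illustration only, the Python sketch below implements Tajima's (1993) 1D relative-rate test. This simple nonparametric test compares two ingroup sequences against an outgroup. If both lineages evolved at the same rate, changes unique to lineage 1 (m1) and changes unique to lineage 2 (m2) should occur about equally often, so (m1 - m2)^2 / (m1 + m2) is referred to a chi-square distribution with 1 degree of freedom. The function name and the gap handling are assumptions. Unlike the analyses described here, the sketch pools all sites and does not separate synonymous from non-synonymous changes.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's (1993) 1D relative-rate test (hypothetical helper, not the thesis code).

    seq1, seq2: aligned ingroup sequences; outgroup: aligned outgroup sequence.
    Returns (m1, m2, chi2, p_value), where m1 (m2) counts sites at which only
    seq1 (seq2) differs while the other two sequences agree.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if "-" in (a, b, c) or a == b:
            continue  # assumption: skip gapped sites; also skip sites where the ingroups agree
        if b == c:
            m1 += 1  # change unique to the seq1 lineage
        elif a == c:
            m2 += 1  # change unique to the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites: no evidence against equal rates
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


print(tajima_relative_rate("TTTTAAAAAA", "AAAAAAAAAA", "AAAAAAAAAA"))
# m1 = 4, m2 = 0, chi2 = 4.0, p ≈ 0.0455
```

In this toy alignment, four changes occur only on the first lineage. The test would therefore judge that lineage faster at the 5% level. Real analyses such as those in this chapter would apply the test separately to synonymous and non-synonymous sites.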
The tests detected a number of significant rate differences at synonymous and non-synonymous sites in the two genes. First, in the <i>mat</i>K sequences, two significant contrasts show that the mangrove genus <i>Bruguiera</i> has slower substitution rates than the terrestrial genus <i>Carallia</i> at both synonymous and non-synonymous sites. Mantel tests showed that the synonymous and non-synonymous relative-rate matrices are correlated for the <i>mat</i>K gene. This suggests that selective constraint at non-synonymous sites is fairly constant among evolutionary lineages at the <i>mat</i>K locus. Second, there are 13 significant contrasts at non-synonymous sites in the <i>rbc</i>L sequences. Of these, six indicate that the mangrove genera have faster non-synonymous substitution rates than the related terrestrial groups. However, the terrestrial genus <i>Carallia</i> still shows a faster non-synonymous rate than the mangrove genus <i>Kandelia</i>. The <i>rbc</i>L non-synonymous sites also show rate heterogeneity among the terrestrial groups, regardless of their geographical distributions. Mantel tests show that the <i>rbc</i>L rates at synonymous and non-synonymous sites are uncorrelated. In mangroves and their terrestrial relatives, therefore, non-synonymous and synonymous substitution rates are uncoupled. This pattern suggests that selection is probably an important influence on the rate variation.<br /> In Chapter 3, I detected adaptive evolution in the genome of the SARS coronavirus (SARS-CoV). First, I analyzed 61 SARS-CoV genomic sequences from the early, middle, and late phases of the SARS epidemic, together with two viral sequences from palm civets.
The neutral mutation rate of the viral genome was constant, but the amino acid substitution rate of the coding sequences slowed over the course of the epidemic. For the S gene, I compared the palm civet sequences with each human SARS-CoV sequence. In every comparison, the ratio of the nonsynonymous to synonymous substitution rates (K<sub>A</sub>/K<sub>S</sub>) was greater than 1, indicating an overall positive selection pressure. However, pairwise analysis of K<sub>A</sub>/K<sub>S</sub> for the genotypes in each epidemic group showed a significant decline across the phases. The average ratio for the early phase was significantly larger than that for the middle phase, which in turn was significantly larger than that for the late phase. The late-phase ratio was in fact significantly less than 1. These data indicate that the S gene responded most strongly to positive selection pressure at first, followed by purifying selection and eventual stabilization. Second, I tested the hypothesis that environmental selective pressure during the interspecific transmission of SARS-CoV favored radical amino acid replacements in the spike protein. I investigated 108 complete sequences of the SARS-CoV S gene and reconstructed the most recent common ancestor (MRCA) sequences of the S gene. I then tested the spike protein for adaptive evolution. The results showed simultaneous amino acid replacements at three sites, i.e., 360, 665, and 701.
These sites accounted for the excess of observed radical substitutions over the number expected under selective neutrality. This suggests that they may have played important roles in the adaptive evolution of the spike protein.<br /> In Chapter 4, I characterized distinctions between hybridization and other biological processes that cause topological incongruence between gene trees, including lineage sorting, paralogy, and lateral gene transfer. Consider two incongruent gene trees with three taxa, A, B, and C, where B is the sister group of A on gene tree 1 but the sister group of C on gene tree 2. We used a theoretical model based on the molecular clock to compare the divergence times between taxa A and C estimated from the two genes. These times are nearly equal under hybridization (B being a hybrid) or lateral gene transfer, but differ significantly under lineage sorting or paralogy. We then developed a bootstrap test of these alternative hypotheses and extended both the model and the test to incongruent gene trees with many taxa. Computer simulations supported the validity of the theoretical model and the bootstrap test when each gene evolved at a constant rate. The simulations also suggested that the model remains valid as long as rate heterogeneity occurs proportionally in the same taxa for both genes.<br /> Finally, in Chapter 5, I described an information-theoretic view, <i>i.e.,</i> the taxon-view. This view can be applied to biological classification to capture taxonomic concepts as data entities and to develop a system for managing these concepts and the lineage relationships among them. I also outlined a new data model and methodology for comparing interacting classifications.
Based on this data model and the comparison and query methods, I built a prototype taxonomic database system called HICLAS (Hierarchical CLAssification System). It queries classification data and compares interacting classifications and phylogenetic trees.
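The abstract does not describe how HICLAS compares classifications. As a sketch of the taxon-view idea only, the Python below illustrates one common premise: a taxon name alone does not fix a taxonomic concept. Each taxon is represented by its circumscription, the set of terminal taxa it contains. Taxa with the same name in two classifications are then labelled `congruent`, `includes`, `included_in`, `overlaps`, or `disjoint`. These five outcomes mirror the RCC-5 relations often used in taxonomic-concept alignment. The child-to-parent dictionary encoding, the function names, and the toy taxa are all hypothetical and are not the HICLAS data model.

```python
from collections import defaultdict

# Hypothetical sketch (not the HICLAS algorithm): a taxon concept is identified
# by its circumscription, i.e. the set of terminal taxa it contains.


def circumscriptions(parent_of):
    """Map every taxon in one classification to its set of terminal taxa.

    parent_of: dict mapping child taxon -> parent taxon; roots appear only as values.
    """
    children = defaultdict(set)
    for child, parent in parent_of.items():
        children[parent].add(child)
    taxa = set(parent_of) | set(children)
    memo = {}

    def leaves(taxon):
        if taxon not in memo:
            kids = children.get(taxon)  # .get() does not insert into the defaultdict
            if not kids:
                memo[taxon] = frozenset([taxon])  # terminal taxon
            else:
                memo[taxon] = frozenset().union(*(leaves(k) for k in kids))
        return memo[taxon]

    return {taxon: leaves(taxon) for taxon in taxa}


def relation(a, b):
    """Set relation between two circumscriptions (RCC-5 style)."""
    if a == b:
        return "congruent"
    if a > b:
        return "includes"
    if a < b:
        return "included_in"
    if a & b:
        return "overlaps"
    return "disjoint"


def compare_classifications(c1, c2):
    """Compare same-named taxa in two classifications by circumscription."""
    s1, s2 = circumscriptions(c1), circumscriptions(c2)
    shared = sorted(s1.keys() & s2.keys())
    return {name: relation(s1[name], s2[name]) for name in shared}


# Toy classifications: terminal taxon "b" moves from X to Y.
c1 = {"a": "X", "b": "X", "c": "Y", "X": "Root", "Y": "Root"}
c2 = {"a": "X", "b": "Y", "c": "Y", "X": "Root", "Y": "Root"}
print(compare_classifications(c1, c2))
# -> {'Root': 'congruent', 'X': 'includes', 'Y': 'included_in', 'a': 'congruent', 'b': 'congruent', 'c': 'congruent'}
```

Here the name X is shared by both classifications but denotes a broader concept in the first one. A purely name-based comparison would miss this distinction.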