@misc{oai:ir.soken.ac.jp:00001221, author = {YANG, Zhong and ヤン, ツォン and YANG, Zhong}, month = {2016-02-17}, note = {This thesis is a study of phylogenetic approaches and database systems as well as
their uses in bioinformatics. It focuses on three main topics: (A) molecular
phylogenetic analysis as an effective tool to investigate the evolutionary relationships,
rates, and adaptation of two important groups: the mangrove family
Rhizophoraceae and the severe acute respiratory syndrome (SARS) coronavirus; (B)
a statistical model and computer simulation approach for testing hybridization
hypotheses based on incongruent gene trees; and (C) a new data model and comparison
method for interacting classifications and phylogenetic trees in a taxonomic database.
  In Chapter 1, I outlined the advances in today's biodiversity science and
bioinformatics, and in the studies of molecular evolution and phylogenetics. To meet
the major needs of a newly formed cross-disciplinary field between biodiversity science and
bioinformatics, i.e., biodiversity informatics, applications of phylogenetic approaches
and data models as well as taxonomic database systems in this field are needed.
  In Chapter 2, I investigated the phylogenetic relationships and evolutionary rate
heterogeneity of the family Rhizophoraceae based on the sequences of chloroplast
genes matK and rbcL, and ITS regions of nuclear ribosomal DNA. Phylogenetic trees
were constructed using the maximum parsimony (MP), neighbor-joining (NJ) and
maximum likelihood (ML) methods. The partition-homogeneity tests indicated that the
data sets were homogeneous, and the combined analysis showed that four mangrove
genera formed a monophyletic group and the terrestrial genus Pellacalyx was shown to
be the basal clade. Evolutionary rate heterogeneity for the plastid matK and rbcL genes
in different species of the Rhizophoraceae was analyzed by means of the relative-rate
tests. A number of significant rate differences at synonymous and non-synonymous
sites were detected in the two genes. First, two significant contrasts show that the mangrove
genus Bruguiera has relatively slower substitution rates than the terrestrial genus
Carallia at both synonymous and non-synonymous sites in the matK sequences. The
Mantel tests showed that the synonymous and non-synonymous relative-rate matrices
are correlated at the matK gene, suggesting that selective constraint at non-synonymous
sites is fairly constant among evolutionary lineages of the matK locus. Second, there
are 13 significant contrasts at non-synonymous sites in the rbcL sequences. Among
them, six indicate that the mangrove genera have relatively faster non-synonymous
substitution rates than the related terrestrial groups. However, the terrestrial genus
Carallia still shows a relatively faster non-synonymous rate than the mangrove genus
Kandelia. Moreover, the rbcL non-synonymous sites also exhibit rate heterogeneity
among the terrestrial groups, regardless of their geographical distributions. The Mantel
tests show that the rbcL rates at synonymous and non-synonymous sites are
uncorrelated. The molecular evolutionary pattern of mangroves and their terrestrial
relatives in which non-synonymous and synonymous substitution rates are uncoupled
suggests that selection is probably an important influence on the rate variation.
   In Chapter 3, I detected adaptive evolution in the SARS coronavirus (SARS-CoV)
genome. First, 61 SARS-CoV genomic sequences derived from
the early, middle, and late phases of the SARS epidemic were analyzed together with
two viral sequences from palm civets. The neutral mutation rate of the viral genome
was constant but the amino acid substitution rate of the coding sequences slowed
during the course of the epidemic. Between the sequences of the palm civets and each
of the human SARS-CoV sequences, the ratios of the rates of nonsynonymous to
synonymous changes (KA/KS) for the S gene sequences were always greater than 1,
indicating an overall positive selection pressure. However, pairwise analysis of the KA/KS
for the genotypes in each epidemic group showed that the average KA/KS for the
early phase was significantly larger than that for the middle phase, which in turn was
significantly larger than the ratio for the late phase, which in fact was significantly less
than 1. These data indicated that the S gene showed the strongest initial responses to
positive selection pressures, followed by subsequent purifying selection and eventual
stabilization. Second, I further tested the hypothesis that radical amino acid
replacements in the spike protein were favored by environmental selective pressure during
the process of SARS-CoV interspecific transmission. I investigated 108 complete
sequences of the SARS-CoV S gene, reconstructed the most recent common
ancestor (MRCA) sequences of the S gene, and detected adaptive evolution in the
spike protein. The results showed simultaneous amino acid replacements at three
sites, i.e., 360, 665 and 701. These sites led to an excess of the observed number of radical
substitutions over the corresponding expectation under the assumption of selective
neutrality, indicating the potentially important roles they played in the adaptive
evolution of the spike protein.
  In Chapter 4, I characterized certain distinctions between hybridization and other
biological processes, including lineage sorting, paralogy, and lateral gene transfer, that
are responsible for topological incongruence between gene trees. Consider two
incongruent gene trees with three taxa, A, B, and C, where B is a sister group of A on
gene tree 1 but a sister group of C on gene tree 2. With a theoretical model based on the
molecular clock, we demonstrated that the time of divergence of each gene between taxa A
and C is nearly equal in the case of hybridization (B is a hybrid) or lateral gene transfer,
but differs significantly in the case of lineage sorting or paralogy. After developing a
bootstrap test to test these alternative hypotheses, we extended the model and test to
account for incongruent gene trees with numerous taxa. Computer simulation studies
supported the validity of the theoretical model and bootstrap test when each gene
evolved at a constant rate. The computer simulation also suggested that the model
remained valid as long as the rate heterogeneity was occurring proportionally in the
same taxa for both genes.
  Finally, in Chapter 5, I described an information-theoretic view, i.e., taxon-view,
which can be applied to biological classification to capture taxonomic concepts as data
entities and to develop a system for managing these concepts and the lineage
relationships among them. A new data model and methodology for comparing
interacting classifications were outlined. On the basis of the data model and
comparison and query methods, a prototype taxonomic database system called
HICLAS (Hierarchical CLAssification System) was built to query classification data
and to compare interacting classifications and phylogenetic trees. 総研大乙第162号}, title = {Bioinformatics for the study of biodiversity}, year = {} }