Statistical analysis of viral sequences : bridging sampling design, molecular phylogenetics and population genetics
https://ir.soken.ac.jp/records/1204
Name / File  License  Action

Abstract, Screening Result (315.4 kB)

Item type  Thesis or Dissertation(1)

Publication date  2010-02-22
Title
Title  Statistical analysis of viral sequences : bridging sampling design, molecular phylogenetics and population genetics
Language  en
Language
Language  eng
Resource type
Resource type identifier  http://purl.org/coar/resource_type/c_46ec
Resource type  thesis
Author name
徐, 泰健

Furigana
セオ, タエクン

Author
SEO, TaeKun

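Degree-granting institution
Degree-granting institution name  総合研究大学院大学 (The Graduate University for Advanced Studies, SOKENDAI)
Degree name
Degree name  博士（学術） (Doctor of Philosophy)
Degree certificate number
Description type  Other
Description  総研大甲第623号
Graduate school
Value  先導科学研究科 (School of Advanced Sciences)
Department
Value  21 生命体科学専攻 (Department of Biosystems Science)
Date of degree conferral
Date of degree conferral  2002-03-22
Academic year of degree conferral
Value  2001
Abstract
Description type  Other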
Description  The high pace of viral sequence change means that variation in the times at which sequences are sampled can have a profound effect both on the ability to detect trends over time in evolutionary rates and on the power to reject the molecular clock hypothesis. Trends in viral evolutionary rates are of particular interest because their detection may allow connections to be established between a patient's treatment or condition and the process of evolution.

Variation in sequence isolation times also affects the uncertainty associated with estimates of divergence times and evolutionary rates. Isolation times can be chosen deliberately to increase the power of hypothesis tests and to reduce the uncertainty of evolutionary parameter estimates, but this fact has received little previous attention. I provide approximations for the power to reject the molecular clock hypothesis when the alternative is that rates change linearly over time and when the alternative is that rates differ randomly among branches.

When the evolutionary rate changes linearly, it can be written as r(t) = a(t − t₁) + r, where t is the current time, t₁ is the time of origin, and a is the increase or decrease in rate per unit time. For a given a, the power to reject the null hypothesis (H₀: a = 0) can be calculated using the fact that the likelihood ratio statistic 2Δlog L, twice the logarithm of the ratio of the likelihoods evaluated at the maximum likelihood estimators (m.l.e.'s) under H₁ and H₀, tends to a noncentral χ² distribution under the alternative hypothesis (H₁: a ≠ 0). In the original notation, a single circumflex (^) and a double circumflex denote the m.l.e.'s under H₁ and H₀, respectively.

When the rates differ randomly among branches, the gamma distribution can be used as a model of rate variation. If the number of substitutions on each branch is further assumed to follow a Poisson distribution, the number of substitutions follows a negative binomial distribution.
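The gamma-Poisson argument in the preceding paragraph can be checked numerically. The sketch below is an illustration only, not code from the thesis: the function names, the parameterization by gamma shape and mean number of substitutions on a branch, and the example values are all assumptions. It compares the closed-form negative binomial probability with a direct numerical integration of the Poisson probability over a gamma-distributed rate.

```python
import math


def negative_binomial_pmf(x, shape, mean):
    """P(X = x) for X | lam ~ Poisson(lam), lam ~ Gamma(shape, scale=mean/shape).

    `shape` and `mean` are hypothetical parameter names: the gamma shape and the
    expected number of substitutions on the branch.
    """
    p = shape / (shape + mean)  # negative binomial "success" probability
    log_pmf = (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
               + shape * math.log(p) + x * math.log1p(-p))
    return math.exp(log_pmf)


def gamma_poisson_pmf(x, shape, mean, steps=20000):
    """Same probability by midpoint-rule integration over the gamma-distributed rate.

    Assumes shape >= 1 so that the integrand is bounded near zero.
    """
    scale = mean / shape
    upper = 40.0 * max(mean, scale)  # truncation point; the tail beyond is negligible
    width = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        log_gamma_density = ((shape - 1) * math.log(lam) - lam / scale
                             - math.lgamma(shape) - shape * math.log(scale))
        log_poisson = x * math.log(lam) - lam - math.lgamma(x + 1)
        total += math.exp(log_gamma_density + log_poisson) * width
    return total


if __name__ == "__main__":
    for x in range(6):
        print(x, negative_binomial_pmf(x, 2.0, 3.0), gamma_poisson_pmf(x, 2.0, 3.0))
```

Under this parameterization the mixture has mean `mean` and variance `mean + mean**2 / shape`, so a smaller shape corresponds to stronger rate variation among branches.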
The power to reject the null hypothesis (H₀: the evolutionary rate does not vary) can be calculated using the noncentral χ² distribution.

When the evolutionary rate is constant, the standard deviations of the estimated evolutionary rates and divergence times can be approximated using the Fisher information matrix. I illustrate how these approximations can be exploited to determine which viral sample should be sequenced when samples representing different dates are available.

Using pseudo-maximum likelihood approaches to phylogenetic inference and coalescent theory, I develop a computationally tractable method of estimating effective population size from serially sampled viral data. The method adopts a two-stage estimation procedure: the vector of internal node times is estimated from sequence data, and these estimated node times then serve as the basis for inferring effective population size. Because the main interest is effective population size and not the internal node times, the node times are nuisance parameters in my analysis, and the number of these nuisance parameters increases with the number of sequences.

The variance of the maximum likelihood estimator of effective population size is approximated as

(Numerical formula was abbreviated.)

where n is the number of sequences.

I show that the variance of the maximum likelihood estimator of effective population size depends on the serial sampling design only because internal node times on a coalescent genealogy can be better estimated with some designs than with others. Given the internal node times and the number of sequences sampled, the variance of the maximum likelihood estimator is independent of the serial sampling design.

I estimate the effective size of the HIV-1 population within nine hosts.
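The second stage of the two-stage procedure, inferring effective population size from node times, can be illustrated with the textbook constant-size coalescent. The sketch below is not the thesis's pseudo-maximum likelihood method: it assumes the node times are known exactly, that time is measured backward in generations from the most recent sample, and that each pair of lineages coalesces at rate 1/N per generation. The function name and input layout are hypothetical. Under those assumptions the log-likelihood is −S/N − m log N, where S sums (number of lineage pairs × interval length) over all intervals and m is the number of coalescent events, so the maximum likelihood estimate is S/m.

```python
from math import comb


def estimate_population_size(samples, coalescent_times):
    """ML estimate of N under a constant-size coalescent with serial samples.

    samples: list of (time, count) pairs, the time (backward, in generations)
        at which `count` sequences were sampled.
    coalescent_times: list of internal node times on the same scale.
    Both argument names are assumptions made for this illustration.
    """
    events = [(t, count) for t, count in samples]
    events += [(t, -1) for t in coalescent_times]
    # At tied times, add sampled lineages before removing coalesced ones.
    events.sort(key=lambda event: (event[0], -event[1]))

    lineages = 0
    previous_time = events[0][0]
    weighted_time = 0.0
    for time, change in events:
        # comb(k, 2) is 0 for k < 2, so empty intervals contribute nothing.
        weighted_time += comb(lineages, 2) * (time - previous_time)
        if change < 0 and lineages < 2:
            raise ValueError("coalescent event with fewer than two lineages")
        lineages += change
        previous_time = time
    return weighted_time / len(coalescent_times)


if __name__ == "__main__":
    # Two sequences sampled at time 0 and one sampled 20 generations earlier.
    print(estimate_population_size([(0, 2), (20, 1)], [30, 50]))
```

In this simplified setting the sampling times enter the estimate only through how many lineages are present in each interval, which is in the spirit of the statement that, given the node times and the number of sequences, the sampling design does not affect the variance.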
If I assume that the mutation rate is 2.5 × 10⁻⁵ substitutions per generation and is the same in all patients, estimated generation lengths vary from 0.73 to 2.43 days per generation, and the mean (1.47 days) is similar to the generation lengths estimated by other researchers. If I assume that the generation length is 1.47 days and is the same in all patients, mutation rate estimates vary from 1.52 × 10⁻⁵ to 5.02 × 10⁻⁵. The results indicate that effective viral population size and evolutionary rate per year are negatively correlated among HIV-1 patients.

(Figure 1(a), (b) were abbreviated.)

Figure 1: A negative correlation between the evolutionary rate per year and the effective population size, (a) assuming a generation length of 1.47 days and (b) assuming a mutation rate of 2.5 × 10⁻⁵ substitutions per generation.
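The two ranges quoted in the abstract appear to be linked by simple arithmetic. The sketch below assumes, as an interpretation that the abstract does not state, that each patient has a fixed substitution rate per day equal to the mutation rate per generation divided by the generation length in days. The constants come from the abstract; the function names are hypothetical.

```python
REFERENCE_MUTATION_RATE = 2.5e-5   # substitutions per generation (from the abstract)
MEAN_GENERATION_DAYS = 1.47        # mean generation length in days (from the abstract)


def rate_per_day(mutation_rate, generation_days):
    """Substitution rate per day implied by a per-generation rate (assumed relation)."""
    return mutation_rate / generation_days


def implied_mutation_rate(generation_days_at_reference_rate):
    """Mutation rate per generation if the generation length is fixed at the mean.

    Takes the generation length estimated under the reference mutation rate,
    converts it to a per-day rate, then rescales to a 1.47-day generation.
    """
    daily = rate_per_day(REFERENCE_MUTATION_RATE, generation_days_at_reference_rate)
    return daily * MEAN_GENERATION_DAYS


if __name__ == "__main__":
    for days in (0.73, 2.43):
        print(days, implied_mutation_rate(days))
```

Under this assumption the shortest and longest generation lengths (0.73 and 2.43 days) map to mutation rates of roughly 5.0 × 10⁻⁵ and 1.5 × 10⁻⁵, close to the reported 5.02 × 10⁻⁵ and 1.52 × 10⁻⁵; the small differences are plausibly due to rounding in the quoted values.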
Holdings
Value  Yes