Item type 
Thesis or Dissertation(1) 
Publication date 
2010-02-22 
Title 


Title 
Statistical analysis of viral sequences : bridging sampling design, molecular phylogenetics and population genetics 
Title 


Language 
en 

Language 


Language 
eng 
Resource type 


Resource type identifier 
http://purl.org/coar/resource_type/c_46ec 

Resource type 
thesis 
Author name 
徐, 泰健

Reading (furigana) 
セオ, タエクン

Author 
SEO, TaeKun

Degree-granting institution 



Name of degree-granting institution 
総合研究大学院大学 (The Graduate University for Advanced Studies) 
Degree name 


Degree name 
Doctor of Philosophy (博士（学術）) 
Degree number 


Description type 
Other 

Description 
総研大甲第623号 (SOKENDAI, Kō No. 623) 
Graduate school 


Value 
School of Advanced Sciences (先導科学研究科) 
Department 


Value 
21 Department of Biosystems Science (生命体科学専攻) 
Date of degree conferral 


Date of degree conferral 
2002-03-22 
Academic year of degree conferral 



2001 
Abstract 


Description type 
Other 

Description 
The high pace of viral sequence change means that variation in the times at which sequences are sampled can have a profound effect both on the ability to detect trends over time in evolutionary rates and on the power to reject the molecular clock hypothesis. Trends in viral evolutionary rates are of particular interest because their detection may allow connections to be established between a patient's treatment or condition and the process of evolution.
Variation in sequence isolation times also affects the uncertainty associated with estimates of divergence times and evolutionary rates. Isolation times can be chosen deliberately to increase the power of hypothesis tests and to reduce the uncertainty of evolutionary parameter estimates, but this fact has received little previous attention. I provide approximations for the power to reject the molecular clock hypothesis when the alternative is that rates change linearly over time and when the alternative is that rates differ randomly among branches.
When the evolutionary rate changes linearly, it can be written as $r(t) = a(t - t_1) + r$, where $t$ is the current time, $t_1$ is the time of origin and $a$ is the increase or decrease in rate per unit time. For a given $a$, the power to reject the null hypothesis ($H_0$: $a = 0$) can be calculated using the fact that the statistic $2\Delta\log L = 2\log(\hat{L}/\hat{\hat{L}})$ tends to a noncentral $\chi^2$ distribution under the alternative hypothesis ($H_1$: $a \neq 0$), where the single circumflex and the double circumflex denote evaluation at the maximum likelihood estimators (m.l.e.'s) under $H_1$ and $H_0$, respectively.
When the rates differ randomly among branches, the gamma distribution can serve as a model of rate variation. If we further assume that the number of substitutions on each branch follows a Poisson distribution, the number of substitutions follows a negative binomial distribution. 
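The power calculation for the linear-trend test ($H_0$: $a = 0$) can be illustrated with a minimal sketch. It assumes the test has one degree of freedom and that the noncentrality parameter is already known; the function name `lrt_power_1df` and the noncentrality values are hypothetical, and the thesis's own approximation for the noncentrality parameter is not reproduced here. The sketch uses the fact that a noncentral $\chi^2$ variable with one degree of freedom equals $(Z + \sqrt{\lambda})^2$ for a standard normal $Z$.

```python
from statistics import NormalDist


def lrt_power_1df(noncentrality: float, alpha: float = 0.05) -> float:
    """Approximate power of a one-degree-of-freedom likelihood ratio test.

    Under H1 the statistic is treated as noncentral chi-square with 1 d.f.
    The noncentrality parameter is an input here (a hypothetical value);
    deriving it from a sampling design is outside this sketch.
    """
    if noncentrality < 0:
        raise ValueError("noncentrality must be non-negative")
    z = NormalDist()
    # Square root of the central chi-square(1) critical value at level alpha.
    root_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = noncentrality ** 0.5
    # P[(Z + shift)^2 > crit] = P[Z > root_crit - shift] + P[Z < -root_crit - shift]
    return (1.0 - z.cdf(root_crit - shift)) + z.cdf(-root_crit - shift)


if __name__ == "__main__":
    for nc in (0.0, 2.0, 5.0, 10.0):
        print(f"noncentrality={nc:5.1f}  power={lrt_power_1df(nc):.3f}")
```

With a noncentrality of zero the power reduces to the significance level, which is a quick sanity check on the calculation.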
The power to reject the null hypothesis ($H_0$: the evolutionary rate does not vary) can be calculated using the noncentral $\chi^2$ distribution.
When the evolutionary rate is constant, the standard deviations of estimated evolutionary rates and divergence times can be approximated using the Fisher information matrix. I illustrate how these approximations can be exploited to determine which viral sample should be sequenced when samples representing different dates are available.
Using pseudo-maximum likelihood approaches to phylogenetic inference and coalescent theory, I develop a computationally tractable method of estimating effective population size from serially sampled viral data. The method adopts a two-stage estimation procedure: the vector of internal node times is first estimated from the sequence data, and these estimated node times then serve as the basis for inferring effective population size. Because the main interest is effective population size and not the internal node times, the node times are nuisance parameters in my analysis, and their number increases with the number of sequences.
The variance of the maximum likelihood estimator of effective population size is approximated as
(Numerical formula omitted.)
where $n$ is the number of sequences.
I show that the variance of the maximum likelihood estimator of effective population size depends on the serial sampling design only because internal node times on a coalescent genealogy can be better estimated with some designs than with others. Given the internal node times and the number of sequences sampled, the variance of the maximum likelihood estimator is independent of the serial sampling design.
I estimate the effective size of the HIV-1 population within nine hosts. 
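The second stage of the two-stage procedure described above, inferring effective population size from node times that are treated as known, can be sketched as follows. This is an illustration under stated assumptions, not the thesis's estimator: it assumes a constant-size haploid coalescent in which k lineages coalesce at rate k(k-1)/(2N) per generation, with all times measured backward in generations from the most recent sample. The function name `estimate_ne` and the example times are hypothetical.

```python
from typing import List, Tuple


def estimate_ne(samples: List[Tuple[float, int]], node_times: List[float]) -> float:
    """Maximum likelihood estimate of a constant effective size N.

    samples:    (time, count) pairs giving when and how many sequences
                were sampled, backward from the most recent sample.
    node_times: times of the internal (coalescent) nodes, same scale.

    Assumed log-likelihood: -S/N - m*log(N) + const, where S sums
    k(k-1)/2 * duration over intervals and m is the number of
    coalescent events, so the maximum is at N = S / m.
    """
    if not node_times:
        raise ValueError("need at least one internal node")
    events = [(t, n) for t, n in samples] + [(t, -1) for t in node_times]
    # At tied times, add newly sampled lineages before any coalescence.
    events.sort(key=lambda e: (e[0], -e[1]))
    lineages = 0
    prev_time = events[0][0]
    pair_time = 0.0
    for time, change in events:
        # 'lineages' lineages existed throughout [prev_time, time].
        pair_time += lineages * (lineages - 1) / 2.0 * (time - prev_time)
        lineages += change
        if lineages < 1:
            raise ValueError("genealogy is inconsistent with the sampling times")
        prev_time = time
    if lineages != 1:
        raise ValueError("node times do not reduce the sample to one lineage")
    return pair_time / len(node_times)


if __name__ == "__main__":
    # Hypothetical design: 3 sequences at time 0 and 2 more at time 10.
    print(estimate_ne([(0.0, 3), (10.0, 2)], [4.0, 12.0, 20.0, 35.0]))
```

Under these assumptions the estimate depends on the sampling design only through the node times and lineage counts, which matches the abstract's statement that the design matters because it affects how well node times can be estimated.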
If I assume that the mutation rate is $2.5 \times 10^{-5}$ substitutions per generation and is the same in all patients, estimated generation lengths vary from 0.73 to 2.43 days per generation, and the mean (1.47 days) is similar to the generation lengths estimated by other researchers. If I assume that the generation length is 1.47 days and is the same in all patients, mutation rate estimates vary from $1.52 \times 10^{-5}$ to $5.02 \times 10^{-5}$ substitutions per generation. The results indicate that effective viral population size and evolutionary rate per year are negatively correlated among HIV-1 patients.
(Figures 1(a) and 1(b) omitted.)
Figure 1: A negative correlation between the evolutionary rate per year and the effective population size, (a) assuming a generation length of 1.47 days and (b) assuming a mutation rate of $2.5 \times 10^{-5}$ substitutions per generation. 
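The two sets of estimates quoted above appear to be linked by simple arithmetic. The sketch below assumes that what is fixed for each patient is the substitution rate per day, so that the mutation rate per generation divided by the generation length in days is constant. That reading is an assumption made for illustration, and the function name `implied_mutation_rate` is hypothetical; only the four input figures come from the abstract.

```python
MU_ASSUMED = 2.5e-5   # substitutions per generation (figure from the abstract)
GEN_ASSUMED = 1.47    # days per generation (figure from the abstract)


def implied_mutation_rate(generation_length_days: float) -> float:
    """Convert a generation length estimated under a fixed mutation rate
    into a mutation rate under a fixed generation length.

    Assumption: the per-day rate mu / g is the same in both scenarios.
    """
    rate_per_day = MU_ASSUMED / generation_length_days
    return rate_per_day * GEN_ASSUMED


if __name__ == "__main__":
    # Extreme generation lengths reported in the abstract.
    for g in (0.73, 2.43):
        print(f"generation length {g} days -> mu = {implied_mutation_rate(g):.3e}")
```

The two extremes map to values close to the reported range of $1.52 \times 10^{-5}$ to $5.02 \times 10^{-5}$; small differences are expected because the inputs are rounded.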
Holdings 


Value 
Available 