@misc{oai:ir.soken.ac.jp:00001204, author = {徐, 泰健 and セオ, タエクン and SEO, Tae-Kun}, month = {2016-02-17}, note = {The high pace of viral sequence change means that variation in the times at which sequences are sampled can have a profound effect both on the ability to detect trends over time in evolutionary rates and on the power to reject the molecular clock hypothesis. Trends in viral evolutionary rates are of particular interest because their detection may allow connections to be established between a patient's treatment or condition and the process of evolution.
Variation in sequence isolation times also affects the uncertainty of estimated divergence times and evolutionary rates. Isolation times can be deliberately chosen to increase the power of hypothesis tests and to reduce the uncertainty of evolutionary parameter estimates, but this possibility has received little attention. I provide approximations for the power to reject the molecular clock hypothesis under two alternatives: that rates change linearly over time, and that rates differ randomly among branches.
  When the evolutionary rate changes linearly, it can be written as $r(t) = a(t - t_1) + r$, where $t$ is time, $t_1$ is the time of origin, $r$ is the rate at $t_1$, and $a$ is the amount of increase or decrease per unit time. For a given $a$, the power to reject the null hypothesis ($H_0: a = 0$) can be calculated using the fact that the statistic $2\Delta\log L = 2\log\left[L(\hat{\theta})/L(\hat{\hat{\theta}})\right]$, with $\theta$ denoting the model parameters, tends to a non-central $\chi^2$ distribution with one degree of freedom under the alternative hypothesis ($H_1: a \neq 0$). Here the single circumflex ($\hat{\theta}$) and double circumflex ($\hat{\hat{\theta}}$) denote maximum likelihood estimators (m.l.e.'s) under $H_1$ and $H_0$, respectively.
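The power calculation described above can be sketched in a few lines. This is a minimal Python sketch that relies only on the one-degree-of-freedom non-central $\chi^2$ result. With one degree of freedom the statistic behaves like $(Z + \sqrt{\lambda})^2$ with $Z$ standard normal, so the power follows directly from the normal distribution. The helper `wald_noncentrality` is hypothetical. It approximates the non-centrality as $\lambda \approx a^2/\mathrm{Var}(\hat{a})$, where $\mathrm{Var}(\hat{a})$ would come from the Fisher information of a particular sampling design. It is an illustrative stand-in, not the approximation derived in the thesis.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def lrt_power_1df(noncentrality, alpha=0.05):
    """Approximate power of a 1-df likelihood-ratio test.

    Assumes 2*DeltaLogL follows a non-central chi-square distribution with one
    degree of freedom and the given non-centrality under H1. The statistic then
    has the law of (Z + sqrt(noncentrality))**2 with Z ~ N(0, 1).
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # sqrt of the chi2(1) critical value
    shift = noncentrality ** 0.5
    # P(|Z + shift| > z_crit) = P(Z > z_crit - shift) + P(Z < -z_crit - shift)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-z_crit - shift)


def wald_noncentrality(slope, var_slope_hat):
    """Hypothetical helper: non-centrality approximated as a**2 / Var(a_hat)."""
    return slope * slope / var_slope_hat
```

For example, at $\alpha = 0.05$ the power is about 0.80 when $a^2/\mathrm{Var}(\hat{a}) \approx 7.85$. It equals $\alpha$ itself when $a = 0$.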
  When rates differ randomly among branches, a gamma distribution can be used to model the rate variation. If we further assume that the number of substitutions on each branch follows a Poisson distribution given its rate, the number of substitutions follows a negative binomial distribution. The power to reject the null hypothesis ($H_0$: the evolutionary rate does not vary among branches) can then be calculated using a non-central $\chi^2$ distribution.
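The gamma-Poisson argument above can be checked numerically. This minimal sketch parameterises the gamma distribution by its shape $k$ and by a mean equal to the expected number of substitutions $\mu$ on a branch; the function names are hypothetical. Under this parameterisation the mixture probability is $\Pr(X = x) = \frac{\Gamma(x + k)}{\Gamma(k)\,x!}\left(\frac{k}{k + \mu}\right)^{k}\left(\frac{\mu}{k + \mu}\right)^{x}$, with variance $\mu + \mu^2/k$. A small $k$ therefore means strong rate variation, and $k \to \infty$ recovers the Poisson model of $H_0$.

```python
import math


def neg_binomial_pmf(x, mean, shape):
    """P(X = x) for a gamma-Poisson mixture.

    Assumption: X | rate ~ Poisson(rate) and rate ~ Gamma(shape, scale=mean/shape),
    so E[X] = mean and Var[X] = mean + mean**2 / shape.
    """
    log_p = (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
             + shape * (math.log(shape) - math.log(shape + mean))
             + x * (math.log(mean) - math.log(shape + mean)))
    return math.exp(log_p)


def poisson_pmf(x, mean):
    """Limit of neg_binomial_pmf as shape grows without bound (no rate variation)."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))
```

Working in log space with `math.lgamma` keeps the gamma-function ratio stable even for large counts or very large shape values.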
  When the evolutionary rate is constant, the standard deviations of the estimated evolutionary rate and divergence times can be approximated using the Fisher information matrix. I illustrate how these approximations can be used to decide which viral sample should be sequenced when samples from different dates are available.
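The sketch below shows how Fisher information links sampling dates to estimation uncertainty. It uses a deliberately simplified, hypothetical model rather than the thesis's own. In this model, tip $i$ of a star phylogeny, sampled at time $t_i$, carries a Poisson number of substitutions with mean $r(t_i - T)$, where $r$ is the clock rate and $T$ is the root time. Inverting the $2 \times 2$ information matrix gives approximate standard deviations. The function `best_next_sample`, also hypothetical, picks the candidate date that most reduces the standard deviation of $\hat{r}$. When every sample shares one date the matrix is singular. This is the toy-model version of the point that variation in isolation times is what makes the rate estimable.

```python
import math


def clock_param_sds(sample_times, rate, root_time):
    """Approximate SDs of the m.l.e.'s of (rate, root_time) under a strict clock.

    Hypothetical model for illustration: a star tree in which tip i, sampled at
    sample_times[i], carries d_i ~ Poisson(rate * s_i) substitutions with
    s_i = sample_times[i] - root_time. Its expected Fisher information is
        I_rr = sum(s_i) / rate,  I_rT = -n,  I_TT = rate * sum(1 / s_i).
    """
    spans = [t - root_time for t in sample_times]
    if min(spans) <= 0:
        raise ValueError("every sample must postdate the root")
    i_rr = sum(spans) / rate
    i_tt = rate * sum(1.0 / s for s in spans)
    i_rt = -float(len(spans))
    det = i_rr * i_tt - i_rt * i_rt
    if det <= 1e-12 * i_rr * i_tt:
        # All samples share one date: rate and root time cannot be separated.
        raise ValueError("sampling times give a singular information matrix")
    # Diagonal of the inverse 2x2 matrix gives the asymptotic variances.
    return math.sqrt(i_tt / det), math.sqrt(i_rr / det)


def best_next_sample(existing_times, candidate_times, rate, root_time):
    """Pick the candidate date that minimises the approximate SD of the rate."""
    best, best_sd = None, math.inf
    for cand in candidate_times:
        try:
            sd_rate, _ = clock_param_sds(list(existing_times) + [cand], rate, root_time)
        except ValueError:
            continue  # this design leaves the parameters unidentifiable
        if sd_rate < best_sd:
            best, best_sd = cand, sd_rate
    return best
```

Take two samples already dated at time 10, with the root at 0 and a rate of 1. Given candidates at 10, 20 and 40, the sketch selects 40, the date that most widens the sampling interval.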
  Using pseudo-maximum likelihood approaches to phylogenetic inference together with coalescent theory, I develop a computationally tractable method for estimating effective population size from serially sampled viral data. The method uses a two-stage estimation procedure: the vector of internal node times is first estimated from the sequence data, and these estimated node times then serve as the basis for inferring the effective population size. Because the main interest is the effective population size rather than the internal node times, the node times are nuisance parameters in my analysis, and the number of these nuisance parameters grows with the number of sequences.
  The variance of the maximum likelihood estimator of effective population size is approximated as

(Numerical formula was abbreviated.)

where n is the number of sequences.
  I show that the variance of the maximum likelihood estimator of effective population size depends on the serial sampling design only because internal node times on a coalescent genealogy can be better estimated with some designs than with others. Given the internal node times and the number of sequences sampled, the variance of the maximum likelihood estimator is independent of the serial sampling design.
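The second stage of the procedure above can be sketched with the standard coalescent likelihood for a genealogy with known node times and tips sampled at different dates. This minimal sketch treats the first-stage node times as exact, so it ignores their estimation error, which is where the sampling design enters. The event-list input format is hypothetical. With $k$ lineages present over an interval of length $\Delta$, the log-likelihood is $-(n - 1)\log N - \sum \binom{k}{2}\Delta / N$. This is maximised at $\hat{N} = \sum \binom{k}{2}\Delta/(n - 1)$, with observed-information standard error $\hat{N}/\sqrt{n - 1}$. The sketch is not a reconstruction of the omitted variance formula.

```python
def coalescent_ne_mle(events):
    """M.l.e. of effective population size from given node times (stage 2 only).

    events: (time, kind) pairs, with time measured backwards from the most recent
    sample and kind either "sample" or "coalescence". Node times are treated as
    known; their time unit carries through to the estimate (e.g. generations).
    Returns (estimate, standard error from the observed information).
    """
    lineages, exposure, n_coal, n_samples = 0, 0.0, 0, 0
    previous = None
    for time, kind in sorted(events):
        if previous is not None:
            # Pairwise exposure: C(k, 2) times the length of the interval.
            exposure += lineages * (lineages - 1) / 2.0 * (time - previous)
        if kind == "sample":
            lineages += 1
            n_samples += 1
        elif kind == "coalescence":
            lineages -= 1
            n_coal += 1
        else:
            raise ValueError("unknown event kind")
        previous = time
    if n_coal != n_samples - 1 or lineages != 1:
        raise ValueError("events do not describe a single genealogy")
    estimate = exposure / n_coal  # maximises -(n-1) log N - exposure / N
    return estimate, estimate / n_coal ** 0.5
```

In this known-genealogy setting, the standard error depends on the data only through $\hat{N}$ and $n$, whatever the sampling dates were.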
  I estimate the effective size of the HIV-1 population within each of nine hosts. If I assume that the mutation rate is $2.5 \times 10^{-5}$ substitutions per generation and is the same in all patients, estimated generation lengths range from 0.73 to 2.43 days, and their mean (1.47 days) is similar to the generation lengths estimated by other researchers. If I instead assume that the generation length is 1.47 days and is the same in all patients, mutation rate estimates range from $1.52 \times 10^{-5}$ to $5.02 \times 10^{-5}$ substitutions per generation. The results indicate that effective viral population size and evolutionary rate per year are negatively correlated among HIV-1 patients.
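The two sets of HIV-1 figures above are linked by simple arithmetic. Assume the per-day evolutionary rate $r$ estimated from the serial samples satisfies $r = \mu/g$, where $\mu$ is the mutation rate per generation and $g$ is the generation length in days. Fixing $\mu$ then gives $g = \mu/r$, and fixing $g$ gives $\mu = rg$. Consider the patient with $g = 0.73$ days under $\mu = 2.5 \times 10^{-5}$. That patient has $r \approx 3.42 \times 10^{-5}$ per day, which under $g = 1.47$ days gives $\mu \approx 5.03 \times 10^{-5}$, matching the reported upper value up to rounding. The sketch below (function names hypothetical) checks this.

```python
DAYS_PER_YEAR = 365.25  # assumption: Julian year used for the per-year conversion


def generation_length_days(mutation_rate_per_generation, rate_per_day):
    """Generation length g implied by r = mu / g for a per-day rate r."""
    return mutation_rate_per_generation / rate_per_day


def mutation_rate_per_gen(rate_per_day, generation_days):
    """Mutation rate mu per generation implied by mu = r * g."""
    return rate_per_day * generation_days


def rate_per_year(rate_per_day):
    """Convert a per-day evolutionary rate to a per-year rate."""
    return rate_per_day * DAYS_PER_YEAR
```

The same per-day rate underlies both scenarios, so either reported range can be recovered from the other.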

(Figure 1(a), (b) were abbreviated.)

Figure 1: A negative correlation between the evolutionary rate per year and the effective population size. (a) Assuming a generation length of 1.47 days; (b) assuming a mutation rate of $2.5 \times 10^{-5}$ substitutions per generation. 総研大甲第623号 (SOKENDAI doctoral degree no. 623)}, title = {Statistical analysis of viral sequences: bridging sampling design, molecular phylogenetics and population genetics}, year = {} }