gfit                  package:happy                  R Documentation

_F_i_t _a _G_a_u_s_s_i_a_n _M_i_x_t_u_r_e _M_o_d_e_l _t_o _a_n _o_b_j_e_c_t _r_e_t_u_r_n_e_d _b_y _h_a_p_p_y()

_D_e_s_c_r_i_p_t_i_o_n:

     gfit() fits a QTL model to a happy() object. The model is a
     mixture of Gaussians, each with a different mean, and corresponds
     loosely to the "full" model in hfit(). The difference is that
     hfit() fits the observed phenotype values to the expected
     phenotypes under a full model, whereas gfit() uses maximum
     likelihood to fit the observed phenotype values to a mixture of
     Gaussians, each with a different mean but common variance. The
     other functions in this suite are not usually called directly.

     The statistical model fitted is as follows.    Consider first the
     case of fitting a QTL at a locus, L. Let *y*  be the vector of
     trait values. Let *X(L)* be the design matrix for fitting a QTL at
     the locus L. Let * t(L)* be the vector of parameters to be
     estimated at the locus. For an additive QTL, the paramters are the
     strain effect sizes; for a full interaction model there is a
     paramter for every possible strain combination. Then the one-QTL
     model is


                           *E(y) = X(L).t*


     There are S(S-1)/2 parameters to be estimated in a full model
     allowing for interactions between the alleles within the locus, so
     the i,j'th element of the design matrix *X* is related to the
     strain probabilities thus:


                          *X(Lij) = F(iLst)*

     , where

                 j(s,t) = min(s + S(t-1), t + S(s-1))


     In the function hfit(), the observed phenotypes are regressed
     directly on the expected trait values. This is not an optimal
     procedure becuase the data are really a mixture:


   y_i tilde sum_{st} F_{iLst} f( ( y_i - beta_{Lst} )/2sigma_L^2)


     where f(x) is a standard Gaussian density. The *beta_L* is a
     vector of mean trait values for the strain combinations. The
     parameters *beta_L* , sigma_L are estimated by maximum likelihood,
     and the test for the presence of a QTL at locus L is equivalent to
     the test that all the beta_{st}=mu, when the model collapses to a
     single Gaussian distribution. 

     The model-fitting is implemented in the function gfit() by an
     iterative process, rather like a simplified version of EM. Is is
     slower than hfit(), and generally gives similar results as far as
     overall QTL detection is concered,m but gives more accurate
     parameter estimates. The log-likelihood for the data is


 L = sum_{i} log ( sum_j p_{ij} frac{exp(-frac{(y_i-beta_j)^2}{2sigma^2})}{sqrt{2pi sigma^2}})



 = sum_i log ( sum_j p_{ij} exp(-frac{(y_i-beta_j)^2}{2sigma^2})) - frac{N log(2pisigma^2)}{2}


     Differentiating wrt to the parameters gives


 frac{partial L}{partial sigma^2} = sum_i frac{sum_j p_{ij} (y_i-beta_j)^2 exp(-frac{(y_i-beta_j)^2}{2sigma^2}) }{ 2sigma^4 sum_j p_{ij} exp(-frac{(y_i-beta_j)^2 }{2sigma^2})}  - frac{N}{2sigma^2}



 frac{partial L}{partial beta_j } = - sum_i frac{ p_{ij}   frac{(y_i-beta_j) }{sigma^2} exp( -frac{(y_i-beta_j)^2}{2sigma^2})}{ sum_j e_{ij}}


 = frac{1}{sigma^2} (  - sum_i frac{y_i e_{ij} }{sum_j e_{ij}} + beta_j frac{sum_i e_{ij} }{sum_j e_{ij}} )


     write 

 w_{ij} = frac{p_{ij} exp(-frac{(y_i-beta_j)^2}{2sigma^2}) }{ sum_j frac{p_{ij} exp(-frac{(y_i-beta_j)^2}{2sigma^2})}}


     then the mle satisfies


 hat{sigma^2} = frac{1}{N} sum_i frac{sum_j p_{ij}(y_i-beta_j)^2 exp(-frac{(y_i-beta_j)^2}{2sigma^2})}  {sum_j p_{ij}exp(-frac{(y_i-beta_j)^2}{2sigma^2})}



 hat{sigma^2} = frac{1}{N} sum_i sum_j hat{w}_{ij}(y_i-hat{beta}_j)^2



    hat{beta_j} = frac{sum_i  hat{e}_{ij} y_i}{sum_i hat{e}_{ij}}


     and the log-likelihood is


 hat{L} =  sum_i(log sum_j hat{e}_{ij}) - frac{N log(2pihat{sigma}^2)}{2}

_U_s_a_g_e:

     gfit( h,eps=1.0e-4, shuffle=FALSE, method="optim" )
     gaussian.loop( d, maxit=100, eps=1.0e-3, df=NULL ) 
     gaussian.null( n, sigma2 ) 
     gaussian.init( d ) 
     gaussian.iterate( d, params ) 
     gaussian.fn( p, d=NULL )
     gaussian.gr( p, d=NULL )

_A_r_g_u_m_e_n_t_s:

       h: an object returned by a previous call to happy()

 shuffle: boolean indicating whether the shuffle the phenotypes to
          perform a permutation test

  method: The optimisation algorithm. Default is to use R's "optim"
          function, which uses derivative information. All other values
          of this argument will use an EM type iteration.

       d: a list comprising two elements d, probs

   maxit: the maximum number of iterations in the ML fitting

     eps: the terminatation accuracy in the model fitting : the log
          likelihood must change by less than eps in successive
          iterations

      df: the degress of freedom to use. If NULL then this is computed
          as the rank of the data

       n: the number of observations with non-missing phenotypes

  sigma2: the variance of the errors in the data

  params: a list with two components, beta = the group means and sigma
          = the standard deviation

       p: vector of paramters. For internal use only

_V_a_l_u_e:

     gfit() returns a matrix with columns "marker", "LogL", "Null",
     "Chi", "df", "Pval", "LogPval". Each row of the column describes
     the fit of the model for thecorresponding  marker interval.

     gaussian.loop() fits the model to a single marker and returns a
     list with the same elements as in hfit()

     gaussian.iterate() performs a single iteration of the fitting
     process and returns a list with the updated LogL, beta, sigma,
     dbeta and dsigma

     gaussian.init() intialises the parameters under the Null model, ie
     to the case where the means are all identical and the variance is
     the overal variance.

     gaussian.null() returns the log-likelihood under the Null model

     gaussian.fn() and gaussian.gr() are the function and gradient
     required by the optim function.

_A_u_t_h_o_r(_s):

     Richard Mott

_S_e_e _A_l_s_o:

     happy{}, hprob{}

_E_x_a_m_p_l_e_s:

     ## An example session:
     # initialise happy
     ## Not run: h <- happy('HS.data','HS.alleles')
     # fit all the markers
     ## Not run: f <- gfit(h)

