cache                 package:happy                 R Documentation

_S_a_v_e _H_A_P_P_Y _d_e_s_i_g_n _m_a_t_r_i_c_e_s _a_n_d _g_e_n_o_t_y_p_e_s _t_o _d_i_s_k _f_o_r _r_a_p_i_d
_r_e_l_o_a_d_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     'save.genome()' persists the happy design matrices or genotypes
     from a series of happy objects to disk as a collection of R
     delayed data packages (as implemented in the package 'g.data').
     'load.genome()' "reloads" the data, although the matrices are not
     actually loaded into memory until they are used. 'load.markers()'
     loads a specific set of design matrices or genotypes, as defined
     by their marker names. These functions are useful when access to
     a random selection of loci across the genome is required and
     loading many entire HAPPY objects into memory would take too much
     space. 'save.happy()' saves a single happy object as a delayed
     data package. 'the.chromosomes()' is a convenience function that
     generates a character vector of chromosome names.

_U_s_a_g_e:

     save.genome( gdir, sdir, prefix, chrs=NULL, file.format="ped",
                  mapfile=NULL, ancestryfile=NULL, generations=50,
                  phase="unknown", haploid=FALSE )
     genome <- load.genome( sdir, use.X=TRUE, chr=the.chromosomes(use.X=use.X) )
     marker.list <- load.markers( genome, markers )
     save.happy( h, pkg, dir, model="additive" )

_A_r_g_u_m_e_n_t_s:

    gdir: Path to the directory containing the genotype input files
          (.alleles and either .data or .ped) required to instantiate
          happy objects. This directory will typically contain a pair
          of files for each chromosome of the genome of interest.

    sdir: Path to the directory where the data will be saved by
          'save.genome()', and read back by 'load.genome()'.

  prefix: Text fragment used to define the file names sought by
          'save.genome()'.  An attempt is made to find files in 'gdir'
          named like 'chrN.prefix.*' where N is the chromosome number
          (1...20, X, Y), as defined in 'chrs'.

    chrs: List of chromosome numbers to be processed.

   use.X: Logical value determining whether 'load.genome()' uses
          X-chromosome data.

file.format: Defines the input genotype file format, either "ped" (Ped
          file format) or "happy" (HAPPY .data file format).

 mapfile: Name of a text file containing the physical (base pair) map
          for the genome. It contains three columns named "marker",
          "chromosome" and "bp". Every marker in the .alleles files
          should be listed in the file.

generations: The number of generations since the HS was founded (see
          happy()).

  genome: An object returned by 'load.genome()'.

 markers: A vector of marker names. These names will be searched for in
          the 'genome' object, and if found, their corresponding data
          retrieved.

 haploid: A logical value indicating whether the genomes should be
          interpreted as haploid, i.e. homozygous at every locus. This
          option is used for the analysis of both truly haploid genomes
          and recombinant inbred lines, where all genotypes should be
          homozygotes. Note that the format of the genotype file (the
          .data file) is unchanged, but only the first allele of each
          genotype is used in the analysis. The default value is
          FALSE, i.e. the genomes are assumed to be diploid and
          heterozygous.

ancestryfile: An optional file name that is used to provide
          subject-specific ancestry information. More Soon...

   phase: If phase="unknown" then the phase of the genotypes is unknown
          and no attempt is made to infer it. If phase="estimate" then
          it is estimated using parental genotype data when available.
          If phase="known" then the phase of the input genotypes is
          assumed to be correct, i.e. the first and second alleles in
          each genotype for an individual are on the first and second
          chromosomes respectively. Where phase is known this setting
          should increase power, but it will cause erroneous output if
          it is set when the data are unphased. If phase="estimate"
          then file.format="ped" is assumed automatically, because the
          input data file must be in ped-file format in order to
          specify parental information.

       h: A HAPPY object

     pkg: The name of the R delayed data package to be created

     dir: Name of directory to create a delayed data package for a
          single happy object

   model: One of "additive", "full", "genotype".

_V_a_l_u_e:

     'save.genome()' returns NULL.

     'load.genome()' returns a list object which contains information
     about the delayed data packages loaded, and how the markers are
     distributed between the packages. The list comprises two
     components, named "genome" and "subjects". The former is a data
     table with columns "marker", "chromosome", "map" and "ddp", which
     acts as a genome-wide lookup table for each marker. The latter
     lists the subject names corresponding to the rows in the design
     matrices or genotypes. NOTE: The software assumes that all the
     chromosome-specific files used in 'save.genome()' are consistent,
     i.e. that the same subjects occur in the same order in each
     chromosome, and that each marker is present only once across the
     genome.

     'load.markers()' returns a list of data (either matrices or
     genotype vectors), each datum being named according to the
     relevant marker.

     'the.chromosomes()' returns a character vector of chromosome
     names, like 'c( "chr1", "chr2", ..., "chrX", "chrY" )'.
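
     The list returned by 'load.genome()' can be pictured with a small
     base-R mock-up. This is only a sketch: the marker names, map
     positions, package names and subject names are invented, a plain
     data frame stands in for the lookup table, and the lookup shown
     is an assumption about how a marker name might be resolved to its
     delayed data package, not the code used by 'load.markers()'.

```r
# Mock-up of the two components described above; all values are invented.
genome <- list(
  genome = data.frame(
    marker     = c("m1", "m2", "m3"),
    chromosome = c("chr1", "chr1", "chrX"),
    map        = c(0.0, 1.5, 0.0),
    ddp        = c("chr1", "chr1", "chrX"),  # hypothetical package names
    stringsAsFactors = FALSE
  ),
  subjects = c("s1", "s2")
)

# Resolve requested marker names to lookup-table rows.
# nomatch = 0 drops names that are not present in the table.
wanted <- c("m3", "m1", "not.a.marker")
hits <- genome$genome[match(wanted, genome$genome$marker, nomatch = 0), ]
print(hits[, c("marker", "chromosome", "ddp")])
```

     Because each marker is assumed to occur only once across the
     genome, a single 'match()' against the "marker" column is enough
     to identify the package that holds its data.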

_A_u_t_h_o_r(_s):

     Richard Mott

_S_e_e _A_l_s_o:

     'happy()'. Note that the function 'happy.save()' differs from
     'save.happy()' in that it saves a single happy object for
     reloading with 'load()'; it does not use delayed data loading.
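
Examples:

     A minimal sketch of the save/reload cycle, following the call
     forms given under Usage. The directory names and the prefix are
     hypothetical placeholders, and the guard makes the block a no-op
     when the package or the input directory is missing. Whether
     'save.genome()' creates 'sdir' itself is not stated here, so the
     sketch creates it beforehand as a precaution.

```r
# Hypothetical locations and prefix; replace with real values before use.
gdir   <- "genotypes"  # assumed to hold the per-chromosome input files
sdir   <- "saved"      # destination for the delayed data packages
prefix <- "myprefix"   # hypothetical file-name fragment

if (requireNamespace("happy", quietly = TRUE) && dir.exists(gdir)) {
  library(happy)
  dir.create(sdir, showWarnings = FALSE)  # precaution, see lead-in

  # Persist the data, leaving 'chrs' and the other options at their defaults.
  save.genome(gdir, sdir, prefix, file.format = "ped", generations = 50)

  # Build the genome-wide lookup table.
  genome <- load.genome(sdir, use.X = TRUE)

  # Retrieve the data for the first five markers in the lookup table.
  some.markers <- head(genome$genome$marker, 5)
  marker.list  <- load.markers(genome, some.markers)
  str(marker.list, max.level = 1)
}
```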

