rmcclosk edited this page Sep 24, 2014 · 15 revisions

How to use ABSOLUTE

RunAbsolute is the main function. Its first parameter is the name of the segments file, which is either a tab-delimited table in the format described below or an .RData file produced by Hapseg. With copy_num_type="total" you must supply the tab-delimited table; with copy_num_type="allelic" you need the Hapseg .RData file.
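
The rule above (table means "total", Hapseg .RData means "allelic") can be captured in a small helper. This is a hypothetical sketch, not part of ABSOLUTE, and it assumes that every file ending in .RData is a Hapseg output while anything else is a tab-delimited segments table.

```r
# Hypothetical helper (not part of ABSOLUTE): choose copy_num_type from the
# input file's extension. Assumes .RData/.rdata files are Hapseg output and
# everything else is a tab-delimited segments table.
infer_copy_num_type <- function(seg_path) {
  if (grepl("\\.rdata$", seg_path, ignore.case = TRUE)) {
    "allelic"  # Hapseg .RData file
  } else {
    "total"    # tab-delimited segments file
  }
}

infer_copy_num_type("mix250K_seg_out.txt")  # "total"
infer_copy_num_type("hapseg_out.RData")     # "allelic"
```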

Segments file format

The segments file used as input to RunAbsolute must be a tab-delimited file with a header row naming the following columns.

  1. "Chromosome": the chromosome that the segment is on (e.g. 1)
  2. "Start": start of the segment (e.g. 742429)
  3. "End": end of the segment (e.g. 6398273)
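  4. "Num_Probes": the number of probes (markers) in the segment
  5. "Segment_Mean": the mean copy-number value of the segment; in the common .seg format produced by CBS segmentation this is usually a log2 copy ratio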
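
A minimal sketch of writing such a file from R. The Chromosome, Start and End values of the first row are the examples above; the second row, Num_Probes and Segment_Mean are made-up illustrative numbers. Integer columns are used so that large coordinates are not written in scientific notation (e.g. 9e+06).

```r
# Illustrative values only: Num_Probes, Segment_Mean and the second row
# are made up.
segs <- data.frame(
  Chromosome   = c(1L, 1L),
  Start        = c(742429L, 6398274L),
  End          = c(6398273L, 9000000L),
  Num_Probes   = c(320L, 150L),
  Segment_Mean = c(0.05, -0.42)
)

seg_path <- tempfile(fileext = ".seg.txt")
# Tab-delimited with a header row, no quoting and no row-name column.
write.table(segs, seg_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the header matches the required column names.
check <- read.delim(seg_path)
stopifnot(identical(names(check),
                    c("Chromosome", "Start", "End", "Num_Probes", "Segment_Mean")))
```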

Parameters

The following parameters are required by RunAbsolute. Most of the descriptions and example values are taken from the official documentation.

  1. min.ploidy: minimum ploidy value to consider (e.g. 0.95).
  2. max.ploidy: maximum ploidy value to consider (e.g. 10).
  3. max.sigma.h: "Maximum value of excess sample level variance (Eq. 6)", where Eq. 6 refers to this paper (e.g. 0.02).
  4. sigma.p: "Provisional value of excess sample level variance used for mode search" (e.g. 0).
  5. platform: one of "SNP_250K_STY", "SNP_6.0", or "Illumina_WES".
  6. copy_num_type: one of "total" or "allelic" (use "total" if you have a seg file, "allelic" if you have a Hapseg .RData file).
  7. results.dir: the directory in which to write the results.
  8. primary.disease: a string describing the disease being studied (e.g. "cancer"). It appears to have no effect on the results.
  9. sample.name: name of the sample (e.g. "foo").
  10. max.as.seg.count: maximum number of allelic segments. Samples with a higher segment count are flagged as failed (e.g. 1500).
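
Before a long run it can help to sanity-check the arguments. The sketch below is a hypothetical helper, not part of ABSOLUTE: it only checks that every parameter listed above is present, that min.ploidy is below max.ploidy, and that platform and copy_num_type take one of the values listed on this page.

```r
# Hypothetical pre-flight check (not part of ABSOLUTE). The allowed values
# are the ones listed on this page.
check_absolute_params <- function(params) {
  required <- c("min.ploidy", "max.ploidy", "max.sigma.h", "sigma.p",
                "platform", "copy_num_type", "results.dir",
                "primary.disease", "sample.name", "max.as.seg.count")
  missing <- setdiff(required, names(params))
  if (length(missing) > 0) {
    stop("missing parameters: ", paste(missing, collapse = ", "))
  }
  if (params$min.ploidy >= params$max.ploidy) {
    stop("min.ploidy must be smaller than max.ploidy")
  }
  if (!params$platform %in% c("SNP_250K_STY", "SNP_6.0", "Illumina_WES")) {
    stop("unknown platform: ", params$platform)
  }
  if (!params$copy_num_type %in% c("total", "allelic")) {
    stop("copy_num_type must be \"total\" or \"allelic\"")
  }
  invisible(TRUE)
}

params <- list(min.ploidy = 0.95, max.ploidy = 10, max.sigma.h = 0.02,
               sigma.p = 0, platform = "Illumina_WES",
               copy_num_type = "total", results.dir = "test",
               primary.disease = "cancer", sample.name = "foo",
               max.as.seg.count = 1500)
check_absolute_params(params)
```

If the check passes, the same list should be usable as do.call(RunAbsolute, c(list("mix250K_seg_out.txt"), params)), with the segments file passed as the first positional argument.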

Tips

  • If you are supplying a seg file, you must also set copy_num_type="total" and set max.as.seg.count to a sufficiently large value (though it does not seem to need to exceed the total number of segments in the file).
  • Make sure max.as.seg.count is large enough. If RunAbsolute finishes very quickly, load the output .RData file and check seg.dat[["mode.res"]][["mode.flag"]]. A value of "OVERSEG" means there were more segments than max.as.seg.count, so increase that parameter and rerun until the run succeeds.
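
Both tips can be scripted. The two functions below are hypothetical helpers, not part of ABSOLUTE: one reads mode.flag from the output .RData file (loading it into a separate environment so seg.dat does not clobber anything in your workspace), the other counts the rows of a seg file as a starting value for max.as.seg.count, based on the observation above that it does not seem to need to be larger than that.

```r
# Hypothetical helper: return mode.flag from the .RData written by
# RunAbsolute. load() targets a fresh environment to keep the workspace clean.
absolute_mode_flag <- function(rdata_path) {
  env <- new.env()
  load(rdata_path, envir = env)
  env$seg.dat[["mode.res"]][["mode.flag"]]
}

# Hypothetical helper: number of segments (rows) in a tab-delimited seg file.
count_segments <- function(seg_path) {
  nrow(read.delim(seg_path))
}

# Usage with the example on this page:
# max_segs <- count_segments("mix250K_seg_out.txt")
# flag <- absolute_mode_flag(file.path("test", "foo.ABSOLUTE.RData"))
# if (identical(flag, "OVERSEG")) message("increase max.as.seg.count and rerun")
```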

Example

library(ABSOLUTE)

RunAbsolute("mix250K_seg_out.txt",
            min.ploidy=0.95,
            max.ploidy=10,
            max.sigma.h=0.02,
            sigma.p=0,
            platform="Illumina_WES",
            copy_num_type="total",
            results.dir="test",
            primary.disease="cancer",
            sample.name="foo",
            max.as.seg.count=1500)

Output of RunAbsolute

Calling RunAbsolute produces a file results.dir/sample.name.ABSOLUTE.RData (in the example above, test/foo.ABSOLUTE.RData). Loading this file in R gives you an object called seg.dat that contains the output. This object is a list with the following elements.

  1. segtab: a data.frame with the following columns.
    • Chromosome
    • Start.bp
    • End.bp
    • n_probes
    • length
    • copy_num
    • seg_sigma
    • W
  2. error_model:
  3. primary.disease:
  4. group:
  5. platform:
  6. sample.name:
  7. array.name:
  8. obs.scna:
  9. mode.res: mode-search results; includes mode.flag (see Tips above).
  10. version:
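
A sketch of loading the result from R. load_absolute_result is a hypothetical helper, not part of ABSOLUTE: it builds the results.dir/sample.name.ABSOLUTE.RData path described above, loads it into a separate environment, and warns if segtab lacks any of the columns listed on this page.

```r
# Hypothetical helper: locate, load and lightly check an ABSOLUTE result.
load_absolute_result <- function(results.dir, sample.name) {
  path <- file.path(results.dir, paste0(sample.name, ".ABSOLUTE.RData"))
  env <- new.env()
  load(path, envir = env)
  seg.dat <- env$seg.dat
  # Columns of segtab as listed on this page.
  expected <- c("Chromosome", "Start.bp", "End.bp", "n_probes", "length",
                "copy_num", "seg_sigma", "W")
  absent <- setdiff(expected, colnames(seg.dat$segtab))
  if (length(absent) > 0) {
    warning("segtab lacks columns: ", paste(absent, collapse = ", "))
  }
  seg.dat
}

# Usage with the example call on this page:
# seg.dat <- load_absolute_result("test", "foo")
# head(seg.dat$segtab[, c("Chromosome", "Start.bp", "End.bp", "copy_num")])
```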