rmcclosk edited this page Dec 16, 2014 · 15 revisions

Below are some incomplete notes I made while learning to use ABSOLUTE. You probably don't need to pay attention to them.

RunAbsolute is the main function. Its first parameter is the name of the segments file, which is either a tab-delimited text file with a specific format (see below) or an .Rdata file produced by Hapseg. With copy_num_type="total", you must supply the tab-delimited segments file; with copy_num_type="allelic", you need the Hapseg .Rdata file.

Segments file format

The segments file used as input to RunAbsolute must be a tab-delimited file with the following columns.

  1. "Chromosome": the chromosome that the segment is on (eg. 1)
  2. "Start": start of the segment (eg. 742429)
  3. "End": end of the segment (eg. 6398273)
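  4. "Num_Probes": ??? Presumably the number of probes (markers) that fall in the segment.
  5. "Segment_Mean": ??? Presumably the mean copy ratio of the segment; seg files usually store this on a log2 scale, but I have not confirmed what ABSOLUTE expects.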
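A minimal sketch of writing and re-reading a file with this layout in base R. The segment values are invented for illustration only, and the file name toy_seg.txt is an assumption; nothing here calls ABSOLUTE itself.

```r
# Toy segments; every value here is made up for illustration.
seg <- data.frame(
  Chromosome   = c(1L, 1L, 2L),
  Start        = c(742429L, 6398274L, 10001L),
  End          = c(6398273L, 9000321L, 5000123L),
  Num_Probes   = c(120L, 45L, 300L),
  Segment_Mean = c(0.02, -0.45, 0.31)
)

# Write a tab-delimited file with a header row and no row names.
seg_file <- file.path(tempdir(), "toy_seg.txt")
write.table(seg, seg_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and confirm the header has the five columns listed above.
expected_cols <- c("Chromosome", "Start", "End", "Num_Probes", "Segment_Mean")
seg_check <- read.delim(seg_file, stringsAsFactors = FALSE)
stopifnot(identical(colnames(seg_check), expected_cols))
stopifnot(all(seg_check$Start <= seg_check$End))
```

Checking the header before calling RunAbsolute is cheap and catches the most likely formatting mistake (a missing or misnamed column).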

Parameters

Some parameters are required by RunAbsolute. Most of the descriptions and example values are taken from the official documentation.

  1. min.ploidy: minimum ploidy value to consider (eg. 0.95)
  2. max.ploidy: maximum ploidy value to consider (eg. 10)
  3. max.sigma.h: ??? "Maximum value of excess sample level variance (Eq. 6)" (eg. 0.02). Eq. 6 refers to this paper.
  4. sigma.p: ??? "Provisional value of excess sample level variance used for mode search" (eg. 0)
  5. platform: one of "SNP_250K_STY", "SNP_6.0", or "Illumina_WES".
  6. copy_num_type: one of "total" or "allelic" (use "total" if you have a seg file, "allelic" if you have a Hapseg .Rdata file).
  7. results.dir: where to put results
  8. primary.disease: a string describing the disease being studied (eg. "cancer"). Seems to do nothing.
  9. sample.name: name of the sample (eg. "foo").
  10. max.as.seg.count: maximum number of allelic segments. Samples with a higher segment count will be flagged as 'failed'. (eg. 1500).
  11. max.non.clonal: maximum genome fraction that may be modeled as non-clonal (subclonal SCNA). Solutions implying greater values will be discarded. (eg. 0)
  12. max.neg.genome: maximum genome fraction that may be modeled as non-clonal with copy-ratio below that of clonal homozygous deletion. Solutions implying greater values will be discarded. (eg. 0)
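Since ABSOLUTE does not always complain about missing parameters up front, it can help to check the arguments before starting a run. The sketch below is one way to do that; check_absolute_args and run_args are hypothetical names of my own, not part of ABSOLUTE, and the checks only cover what is described in the list above.

```r
# Hypothetical helper (not part of ABSOLUTE): basic sanity checks on the arguments.
check_absolute_args <- function(run_args) {
  required <- c("min.ploidy", "max.ploidy", "max.sigma.h", "sigma.p", "platform",
                "copy_num_type", "results.dir", "primary.disease", "sample.name",
                "max.as.seg.count", "max.non.clonal", "max.neg.genome")
  missing_args <- setdiff(required, names(run_args))
  if (length(missing_args) > 0) {
    stop("missing parameters: ", paste(missing_args, collapse = ", "))
  }
  stopifnot(
    run_args[["platform"]] %in% c("SNP_250K_STY", "SNP_6.0", "Illumina_WES"),
    run_args[["copy_num_type"]] %in% c("total", "allelic"),
    run_args[["min.ploidy"]] < run_args[["max.ploidy"]]
  )
  invisible(run_args)
}

# Example values taken from the parameter list above.
run_args <- list(
  min.ploidy = 0.95, max.ploidy = 10, max.sigma.h = 0.02, sigma.p = 0,
  platform = "Illumina_WES", copy_num_type = "total", results.dir = "test",
  primary.disease = "cancer", sample.name = "foo",
  max.as.seg.count = 1500, max.non.clonal = 0, max.neg.genome = 0
)
check_absolute_args(run_args)

# With ABSOLUTE loaded, the run itself would then be something like:
# do.call(RunAbsolute, c(list("mix250K_seg_out.txt"), run_args))
```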

Tidbits

  • If you are supplying a seg file, you must also set copy_num_type="total".
  • Make sure max.as.seg.count is large enough (not sure how to quantify that at this point). If RunAbsolute finishes very quickly, load the RData and check seg.dat[["mode.res"]][["mode.flag"]]. If it is "OVERSEG", it means there were more segments than max.as.seg.count, so try increasing that parameter until it runs.
  • max.non.clonal and max.neg.genome are required parameters, even though ABSOLUTE does not complain (initially) if you do not pass them in.
  • I have no idea why you would allow max.neg.genome > 0.
  • The parameters max.as.seg.count, max.non.clonal, and max.neg.genome seem to only act as flags to discard results, and not have any effect on the statistical analysis. If you are capable of filtering the results yourself, it might be worth setting these parameters to very permissive values, eg. max.non.clonal=100 and max.as.seg.count=10E10.

Example

# Load the package that provides RunAbsolute.
library(ABSOLUTE)

# Run on a tab-delimited seg file, so copy_num_type must be "total".
RunAbsolute("mix250K_seg_out.txt",
        min.ploidy=0.95,
        max.ploidy=10,
        max.sigma.h=0.02,
        sigma.p=0,
        platform="Illumina_WES",
        copy_num_type="total",
        results.dir="test",
        primary.disease="cancer",
        sample.name="foo",
        max.as.seg.count=1500,
        max.non.clonal=0,
        max.neg.genome=0)

Error conditions

If RunAbsolute completed very quickly but you are unable to create a review object, it's likely that an error condition was encountered. Load the RData file created by RunAbsolute, and check the value of seg.dat[["mode.res"]][["mode.flag"]].

  • OVERSEG: this means there were too many segments. Increase max.as.seg.count.
  • E_CR_SCALE: the expected copy ratio (which is presumably what CR abbreviates) was too far from 1, either <0.75 or >1.25. Check that your seg file is not empty.

Warnings

A large number of warnings are created when running RunAbsolute. Most of them are harmless.

Warning in if (!is.na(res)) { :
  the condition has length > 1 and only the first element will be used

This is not a problem. The function that returns res normally returns a list, but returns NA under some conditions, so is.na(res) has length > 1 whenever the call succeeded. The same warning occurs for mode.tab, for the same reason.
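A small illustration of where this warning comes from, using stand-in values rather than ABSOLUTE's real objects. Note that in R 4.2.0 and later, a condition of length > 1 in if() is an error instead of a warning, so the same code may stop outright on a newer R. The helper is_failed is a hypothetical alternative check, not something ABSOLUTE provides.

```r
# Stand-ins: a successful result is a list, a failed one is a bare NA.
res_ok <- list(mode.tab = 1:3, flag = NA)
res_failed <- NA

# is.na() on a list returns one value per element, hence the length > 1 condition.
length(is.na(res_ok))      # 2
length(is.na(res_failed))  # 1

# Hypothetical length-safe check: TRUE only for a single atomic NA.
is_failed <- function(res) is.atomic(res) && length(res) == 1 && is.na(res)

is_failed(res_ok)      # FALSE
is_failed(res_failed)  # TRUE
```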

Warning in nlm(f = comb_1d_ll, p = d_grid[i], Q = Q, obs = obs, lambda_qz_res = lambda_qz_res,  :
  NA/Inf replaced by maximum positive value

Not sure what's going on here yet.

Output of RunAbsolute

Calling RunAbsolute will produce a file results.dir/sample.name.ABSOLUTE.RData (so in the above example, it would be test/foo.ABSOLUTE.RData). If you load this file within R, you will have access to an object called seg.dat which contains the output. This object is a list with the following elements.

  1. segtab: a data.frame with the following columns.
    • Chromosome
    • Start.bp
    • End.bp
    • n_probes
    • length
    • copy_num
    • seg_sigma
    • W
  2. error_model:
  3. primary.disease:
  4. group:
  5. platform:
  6. sample.name:
  7. array.name:
  8. obs.scna:
  9. mode.res:
  10. version:
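A rough sketch of inspecting the segtab element once seg.dat is loaded. summarise_segtab is a hypothetical helper of my own, and the demonstration uses a stand-in object with only three of the segtab columns listed above; the numbers are invented.

```r
# Hypothetical helper: summarise the segtab element of a loaded seg.dat object.
summarise_segtab <- function(seg.dat) {
  segtab <- seg.dat[["segtab"]]
  data.frame(
    n_segments    = nrow(segtab),
    n_chromosomes = length(unique(segtab$Chromosome)),
    total_bp      = sum(as.numeric(segtab$End.bp) - as.numeric(segtab$Start.bp))
  )
}

# Stand-in object with invented values; real output has more columns and elements.
demo_seg.dat <- list(
  segtab = data.frame(
    Chromosome = c(1, 2),
    Start.bp   = c(1000, 5000),
    End.bp     = c(2000, 9000)
  )
)
segtab_summary <- summarise_segtab(demo_seg.dat)

# On real output, something like:
# load("test/foo.ABSOLUTE.RData")
# names(seg.dat)
# summarise_segtab(seg.dat)
```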

Summarizing results

The function CreateReviewObject is used to create a summary of ABSOLUTE's output. Its parameters are self-explanatory (see the official documentation). This produces a file called [sample name].PP-calls_tab.txt.