ABSOLUTE
The below are some incomplete notes I made when learning to use ABSOLUTE. Probably you don't need to pay attention to them.
RunAbsolute is the main function. Its first parameter is the name of the segments file, which is either a data table with a specific format (see below), or a .Rdata file produced by Hapseg. When using copy_num_type="total", you must supply the data table (seg file); when using copy_num_type="allelic", you need the Hapseg .Rdata file.
The segments file used as input to RunAbsolute must be a tab-delimited file with the following columns.
- "Chromosome": the chromosome that the segment is on (eg. 1)
- "Start": start of the segment (eg. 742429)
- "End": end of the segment (eg. 6398273)
- "Num_Probes": ???
- "Segment_Mean": ???
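As a rough sketch of the layout described above, the following base R snippet builds a one-row segments table and writes it as a tab-delimited file. The Chromosome, Start and End values are the examples from the list; the Num_Probes and Segment_Mean values are made-up placeholders, and the temporary file path is only for illustration.

```r
# Minimal sketch: build a segments table with the five required columns
# and write it out as a tab-delimited file.
seg <- data.frame(
  Chromosome   = 1L,
  Start        = 742429L,
  End          = 6398273L,
  Num_Probes   = 100L,  # placeholder value, not from real data
  Segment_Mean = 0.05,  # placeholder value, not from real data
  stringsAsFactors = FALSE
)

seg_path <- tempfile(fileext = ".txt")  # illustrative path
write.table(seg, file = seg_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the header matches the expected columns.
seg_check <- read.delim(seg_path, stringsAsFactors = FALSE)
required <- c("Chromosome", "Start", "End", "Num_Probes", "Segment_Mean")
stopifnot(all(required %in% names(seg_check)))
```

The resulting path (seg_path here) is what would be passed as the first argument to RunAbsolute.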
Some parameters are required by RunAbsolute. Most of the descriptions and example values are taken from the official documentation.
- min.ploidy: minimum ploidy value to consider (eg. 0.95)
- max.ploidy: maximum ploidy value to consider (eg. 10)
- max.sigma.h: ??? "Maximum value of excess sample level variance (Eq. 6)". Eq. 6 refers to this paper (eg. 0.02).
- sigma.p: ??? "Provisional value of excess sample level variance used for mode search" (eg. 0)
- platform: one of "SNP_250K_STY", "SNP_6.0", or "Illumina_WES".
- copy_num_type: one of "total" or "allelic" (use "total" if you have a seg file, "allelic" if you have a Hapseg .Rdata file).
- results.dir: where to put results
- primary.disease: a string describing the disease being studied (eg. "cancer"). Seems to do nothing.
- sample.name: name of the sample (eg. "foo").
- max.as.seg.count: maximum number of allelic segments. Samples with a higher segment count will be flagged as 'failed'. (eg. 1500).
- max.non.clonal: maximum genome fraction that may be modeled as non-clonal (subclonal SCNA). Solutions implying greater values will be discarded. (eg. 0)
- max.neg.genome: maximum genome fraction that may be modeled as non-clonal with copy-ratio below that of clonal homozygous deletion. Solutions implying greater values will be discarded. (eg. 0)
- If you are supplying a seg file, you must also set copy_num_type="total".
- Make sure max.as.seg.count is large enough (not sure how to quantify that at this point). If RunAbsolute finishes very quickly, load the RData and check seg.dat[["mode.res"]][["mode.flag"]]. If it is "OVERSEG", it means there were more segments than max.as.seg.count, so try increasing that parameter until it runs.
- max.non.clonal and max.neg.genome are required parameters, even though ABSOLUTE does not complain (initially) if you do not pass them in.
- I have no idea why you would allow max.neg.genome > 0.
- The parameters max.as.seg.count, max.non.clonal, and max.neg.genome seem to only act as flags to discard results, and not have any effect on the statistical analysis. If you are capable of filtering the results yourself, it might be worth setting these parameters to very permissive values, eg. max.non.clonal=100 and max.as.seg.count=10E10.
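The "increase max.as.seg.count until it runs" advice could be automated along these lines. This is only a sketch: run_once is a hypothetical function you would write yourself (it should call RunAbsolute with the given max.as.seg.count and return the resulting mode.flag), and the doubling strategy and the mock below are my own assumptions, not anything ABSOLUTE provides.

```r
# Sketch: double max.as.seg.count until the run is no longer flagged "OVERSEG".
# `run_once` is a hypothetical user-supplied function that runs ABSOLUTE with
# the given max.as.seg.count and returns the resulting mode.flag.
find_seg_count <- function(run_once, start = 1500, max_tries = 10) {
  count <- start
  for (i in seq_len(max_tries)) {
    flag <- run_once(count)
    if (!identical(flag, "OVERSEG")) {
      return(list(max.as.seg.count = count, mode.flag = flag))
    }
    count <- count * 2
  }
  stop("still flagged OVERSEG after ", max_tries, " attempts")
}

# Stand-in for a real run: pretends the sample has 5000 segments.
# NA is only a placeholder for "anything other than OVERSEG".
mock_run <- function(count) {
  if (count < 5000) "OVERSEG" else NA_character_
}

result <- find_seg_count(mock_run)
print(result$max.as.seg.count)
```

Since these parameters seem to act only as filters, simply starting with a very large value is probably easier; the loop is mainly useful if you want to know roughly how many segments a sample has.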
library(ABSOLUTE)  # RunAbsolute comes from the ABSOLUTE package

RunAbsolute("mix250K_seg_out.txt",
            min.ploidy=0.95,
            max.ploidy=10,
            max.sigma.h=0.02,
            sigma.p=0,
            platform="Illumina_WES",
            copy_num_type="total",
            results.dir="test",
            primary.disease="cancer",
            sample.name="foo",
            max.as.seg.count=1500,
            max.non.clonal=0,
            max.neg.genome=0)
If RunAbsolute completed very quickly but you are unable to create a review object, it's likely that an error condition was encountered. Load the RData file created by RunAbsolute, and check the value of seg.dat[["mode.res"]][["mode.flag"]].
- OVERSEG: this means there were too many segments. Increase max.as.seg.count.
- E_CR_SCALE: the expected copy number was too far away from 1, either <0.75 or >1.25. Check that your seg file is not empty.
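Checking the flag can be wrapped in a small helper. The sketch below loads the RData file into its own environment (so it does not overwrite anything in your workspace) and returns the flag. The helper name read_mode_flag is made up, and the file it reads here is a mock built in a temporary location, not real ABSOLUTE output.

```r
# Sketch: read seg.dat[["mode.res"]][["mode.flag"]] from a saved RData file.
read_mode_flag <- function(rdata_path) {
  env <- new.env()
  load(rdata_path, envir = env)
  if (!exists("seg.dat", envir = env, inherits = FALSE)) {
    stop("no object called seg.dat in ", rdata_path)
  }
  get("seg.dat", envir = env)[["mode.res"]][["mode.flag"]]
}

# Mock file standing in for results.dir/sample.name.ABSOLUTE.RData.
mock_path <- tempfile(fileext = ".RData")
local({
  seg.dat <- list(mode.res = list(mode.flag = "OVERSEG"))
  save(seg.dat, file = mock_path)
})

flag <- read_mode_flag(mock_path)
print(flag)
```

With the earlier example call, the path to pass would be test/foo.ABSOLUTE.RData.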
A large number of warnings are created when running RunAbsolute. Most of them are harmless.
Warning in if (!is.na(res)) { :
the condition has length > 1 and only the first element will be used
This is not a problem. The function which returns res, which is generally a list, returns NA on some conditions. The same warning happens for mode.tab, for the same reason.
Warning in nlm(f = comb_1d_ll, p = d_grid[i], Q = Q, obs = obs, lambda_qz_res = lambda_qz_res, :
NA/Inf replaced by maximum positive value
Not sure what's going on here yet.
Calling RunAbsolute will produce a file results.dir/sample.name.ABSOLUTE.RData (so in the above example, it would be test/foo.ABSOLUTE.RData). If you load this file within R, you will have access to an object called seg.dat which contains the output. This object is a list with the following elements.
- segtab: a data.frame with the following columns.
  - Chromosome
  - Start.bp
  - End.bp
  - n_probes
  - length
  - copy_num
  - seg_sigma
  - W
- error_model:
- primary.disease:
- group:
- platform:
- sample.name:
- array.name:
- obs.scna:
- mode.res:
- version:
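To get a quick overview of what is in seg.dat, something like the following may help. It is a sketch only: describe_seg_dat is a made-up helper name, and the object it is applied to here is a small mock with placeholder values, standing in for the seg.dat you would get by loading the real RData file.

```r
# Sketch: summarise the top-level elements of seg.dat and its segtab table.
describe_seg_dat <- function(seg.dat) {
  list(
    elements       = names(seg.dat),
    segtab_columns = names(seg.dat[["segtab"]]),
    n_segments     = nrow(seg.dat[["segtab"]])
  )
}

# Mock object with placeholder values (not real ABSOLUTE output).
mock_seg_dat <- list(
  segtab = data.frame(
    Chromosome = c(1L, 1L),
    Start.bp   = c(742429L, 6398274L),
    End.bp     = c(6398273L, 9000000L),
    n_probes   = c(100L, 50L),
    copy_num   = c(1.0, 1.2)
  ),
  platform    = "Illumina_WES",
  sample.name = "foo"
)

info <- describe_seg_dat(mock_seg_dat)
print(info$elements)
print(info$segtab_columns)
```

On real output, head(seg.dat[["segtab"]]) is a reasonable next step for looking at the segments themselves.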
The function CreateReviewObject is used to create a summary of ABSOLUTE's output. Its parameters are self-explanatory (see the official documentation). This produces a file called [sample name].PP-calls_tab.txt.