Modification for trigenic nature protocol
1. Added supporting trigenic scripts (inside trigenic/)
2. Added test code (test/trigenic_scoring_job.m) and test data (trigenic_scoring_test/)
Changes to be committed:
modified:   Column_Key.md
	modified:   add_SGAPATH.m
	modified:   test/Readme.md
	new file:   test/trigenic_scoring_job.m
	new file:   test/trigenic_scoring_test/raw_triple_test_large.txt.gz
	new file:   test/trigenic_scoring_test/raw_triple_test_small.txt
	new file:   test/trigenic_scoring_test/scored_triple_test_small_digenic.txt.log
	new file:   test/trigenic_scoring_test/scored_triple_test_small_digenic.txt.orf
	new file:   test/trigenic_scoring_test/scored_triple_test_small_digenic.txt.txt
	new file:   test/trigenic_scoring_test/scored_triple_test_small_trigenic.txt
	new file:   test/trigenic_scoring_test/scored_triple_test_small_trigenic.txt_CHROM.pcl
mahfuz05062 committed Apr 1, 2020
1 parent 94d6f96 commit e130d37
Showing 20 changed files with 693,448 additions and 1 deletion.
21 changes: 21 additions & 0 deletions Column_Key.md
@@ -83,3 +83,24 @@ Final format for the 2010 paper
12. Double mutant fitness
13. Double mutant fitness standard deviation

Assignment file for score_trigenic_interactions.m
-------------------------------------------------
### 8 col
1. StrainID1: strain ID of the first single mutant.
2. ORF1: ORF of the first single mutant.
3. StrainID2: strain ID of the second single mutant.
4. ORF2: ORF of the second single mutant.
5. DMstrainID: strain ID for the double mutant query (ORFs are ORF1 and ORF2).
6. SM1strainID: strain ID for the single mutant (one of the ORFs will be YDL227C/HO).
7. SM2strainID: strain ID for the other single mutant.
8. Annotation: which class this interaction should be assigned to.
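
The 8-column layout above can be read into named fields with a short sketch like the following. It assumes the assignment file is comma-separated with no header row, which this section does not state; the two rows written to the temporary file are placeholder values, not real strain IDs or annotations.

```matlab
% Write a tiny placeholder assignment file so the sketch is self-contained.
% All values below are hypothetical.
tmpfile = [tempname '.csv'];
fid = fopen(tmpfile, 'w');
fprintf(fid, 's1,ORFA,s2,ORFB,dm1,sm1,sm2,classA\n');
fprintf(fid, 's3,ORFC,s4,ORFD,dm2,sm3,sm4,classB\n');
fclose(fid);

% Read the 8 text columns (assumption: comma-delimited, no header row).
fid = fopen(tmpfile, 'r');
cols = textscan(fid, '%s %s %s %s %s %s %s %s', 'Delimiter', ',');
fclose(fid);
delete(tmpfile);

% Attach the column names documented above.
names = {'StrainID1', 'ORF1', 'StrainID2', 'ORF2', ...
         'DMstrainID', 'SM1strainID', 'SM2strainID', 'Annotation'};
assignment = cell2struct(cols, names, 2);

% Example lookup: the double mutant strain ID and class of the second row.
fprintf('%s -> %s\n', assignment.DMstrainID{2}, assignment.Annotation{2});
```

Each field of `assignment` is a column cell array, so row `i` of the file is spread across `assignment.<field>{i}`.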

Output of print_trigenic.m
--------------------------
### 6 col
1. Query Strain ID
2. Array Strain ID
3. Adjusted genetic interaction score (epsilon or tau)
4. P-value
5. Double/triple mutant fitness
6. Double/triple mutant fitness standard deviation
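
As a rough illustration of working with this 6-column output, the sketch below reads such a file and keeps the negative interactions that pass a significance cutoff. It assumes the file is tab-delimited with no header row; the two data rows and the cutoffs (score below -0.08, p-value below 0.05) are illustrative choices, not values taken from this repository.

```matlab
% Write a tiny placeholder output file so the sketch is self-contained.
% All values below are hypothetical.
tmpfile = [tempname '.txt'];
fid = fopen(tmpfile, 'w');
fprintf(fid, 'q1\ta1\t-0.12\t0.001\t0.80\t0.05\n');
fprintf(fid, 'q1\ta2\t0.03\t0.40\t0.95\t0.04\n');
fclose(fid);

% Columns: query ID, array ID, score, p-value, fitness, fitness std.
% Assumption: tab-delimited, no header row.
fid = fopen(tmpfile, 'r');
c = textscan(fid, '%s %s %f %f %f %f', 'Delimiter', '\t');
fclose(fid);
delete(tmpfile);

score = c{3};
pval  = c{4};

% Illustrative cutoffs for a negative interaction.
keep = score < -0.08 & pval < 0.05;
hits = [c{1}(keep), c{2}(keep)];   % one row per retained query/array pair
disp(hits);
```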
3 changes: 2 additions & 1 deletion add_SGAPATH.m
@@ -9,5 +9,6 @@ function add_SGAPATH()
addpath([base_dir '/util']);
addpath([base_dir '/postprocess']);
addpath([base_dir '/linkage_est']);
-addpath([base_dir '/test']);
+addpath([base_dir '/trigenic']);
+addpath([base_dir '/test']);
end
15 changes: 15 additions & 0 deletions test/Readme.md
@@ -1,3 +1,6 @@
SGA score
===============

Data Description
================

@@ -53,4 +56,16 @@ test_20.sh diffs the orf file to check for strain census changes

*You may also want to examine a diff of the log files, but as file paths are subject to change
without affecting scores, this is tricky to do automatically.*


Tau-SGA (trigenic) Score
========================

Data Description
================
The data follow the same format as the SGA score input. For the trigenic-specific test, we provide two input files:

1. raw_triple_test_small.txt
2. raw_triple_test_large.txt.gz

Both files reside in the directory 'trigenic_scoring_test'. That directory also contains the outputs of each step for the small dataset (the SGA scoring step, the Tau-SGA scoring step, and the clustergrams), all generated with the script 'trigenic_scoring_job.m'. The outputs for the large file can be generated with the same script by using the large file as the input instead.
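
Switching the test script to the gzipped archive listed in this commit (`raw_triple_test_large.txt.gz`) might look like the sketch below. It assumes the current directory is the repository root; the output file name is a hypothetical choice that keeps the substring 'digenic', because 'trigenic_scoring_job.m' derives the Tau output name by replacing 'digenic' with 'trigenic'.

```matlab
% Path of the archive as listed in this commit, relative to the repo root.
gzfile = 'test/trigenic_scoring_test/raw_triple_test_large.txt.gz';

if exist(gzfile, 'file')
    % gunzip writes the decompressed file next to the archive.
    gunzip(gzfile);
    inputfile  = 'test/trigenic_scoring_test/raw_triple_test_large.txt';
    % Hypothetical output name; it must contain 'digenic' for the
    % strrep call in trigenic_scoring_job.m to produce a distinct name.
    outputfile = 'test/trigenic_scoring_test/scored_triple_test_large_digenic.txt';
else
    warning('Archive not found: %s (run from the repository root).', gzfile);
end
```

The remaining parameters in 'trigenic_scoring_job.m' can stay as they are for the small dataset.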
57 changes: 57 additions & 0 deletions test/trigenic_scoring_job.m
@@ -0,0 +1,57 @@
%% Step 1: Parameter Setup

% A. Mandatory parameters: locations of the required files
inputfile = 'test/trigenic_scoring_test/raw_triple_test_small.txt'; % for the large dataset, gunzip raw_triple_test_large.txt.gz first and point here
outputfile= 'test/trigenic_scoring_test/scored_triple_test_small_digenic.txt';

smfitnessfile = 'refdata/smf_DmQueryStd_MiniArray_180208.tsv';
linkagefile = 'refdata/linkage_estimate_curated_160426.txt';
coord_file = 'refdata/chrom_coordinates_150617.tab';
removearraylist = 'refdata/bad_strains_160303.csv';

% B. Overridable parameters
border_strain_orf = 'YOR202W_dma1';
wild_type = 'URA3control+YDL227C_y13096';
skip_perl_step = false;

% C. Optional parameters
% skip_linkage_detection = false;
% skip_linkage_mask = false;
% skip_wt_remove = false;
random_seed = 42; % Reproducibility


%% Step 2: Add the SGA repository into path

% Assuming the current directory contains base_dir (relative path); if not,
% use the absolute path of the base_dir here
base_dir = 'SGA_Public-master';
addpath(base_dir);

% Add necessary subdirectories
add_SGAPATH();


%% Step 3: Generate SGA scores
cd(get_SGAROOT); % Make sure you are inside the base_dir
compute_sgascore


%% Step 4-7: Generate Tau-SGA scores

% Step 4: Load from standard sga output file (12 columns)
sga = load_sga_epsilon_from_scorefile([outputfile '.txt'], [outputfile '.orf']);

% Step 5: Provide an assignment file (inside SGA_Public-master/refdata)
assignments = 'assignment_file_170328.csv';

% Step 6: Calculate Tau scores (trigenic scores)
sga_triple = score_trigenic_interactions(sga, assignments);

% Step 7: Print/write the Tau scores
tau_output_file = strrep(outputfile, 'digenic', 'trigenic');
print_trigenic(sga, sga, tau_output_file);


%% Step 8: Clustering
generate_fg_clustergram(sga_triple, tau_output_file);
Binary file not shown.
669,696 changes: 669,696 additions & 0 deletions test/trigenic_scoring_test/raw_triple_test_small.txt

Large diffs are not rendered by default.

145 changes: 145 additions & 0 deletions test/trigenic_scoring_test/scored_triple_test_small_digenic.txt.log
@@ -0,0 +1,145 @@
Working dir is /home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master
Using DEFAULT: skip_linkage_detection = false
Using DEFAULT: skip_linkage_mask = false
Using DEFAULT: skip_wt_remove = false
Using DEFAULT: remove_HO_globally = false
Using DEFAULT: skip_batch_correction = false
Using DEFAULT: disable_arrayvar_pval = false
Using DEFAULT: disable_jackknife = false
Random Seed is 42

Load raw SGA data with batch script: /home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/IO/load_raw_sga_data_withbatch

beginning perl preprocessing
Data loaded.
using border strain YOR202W_dma1
border strain array matches 132544 colonies (19%); expected (19%)


Strain Summary:
type sn dma tsq damp tsa trip unann total
query 0 0 0 0 0 10 0 10
array 0 1033 0 0 200 0 0 1233

number of unique array plates found: 4
5232 colonies ignored from "bad arrays"

Constructing plateid->ind map...
| |
|**************************************************|
Constructing query->ind map...
Constructing array->ind map...

Linkage filter script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/filter_linkage_colonies
Linkage Defs: refdata/linkage_estimate_curated_160426.txt
Chrom Defs: refdata/chrom_coordinates_150617.tab

Array coordinates mapped.
array orfs matched: 1233
array orfs not found: 0
Query coordinates mapped.
query orfs matched: 0
query orfs not found: 10
All Queries are DM, ...removing HO/URA3 globally...
Mapping query-specific linkage...
| |
|**************************************************|
Query Linkage Process Report
strain based linkages: 0
orf based linkages: 4
window based linkages: 6
linkage failures: 0
Mapping array-specific linkage...
| |
|**************************************************|
Array Linkage Process Report
strain based linkages: 11
orf based linkages: 0
window based linkages: 0
linkage failures: 0
26372 colonies identified as linkage

Plate normalization script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/apply_plate_normalization

Plate normalization...
| |
|**************************************************|
Calculating colony residuals...
| |
|**************************************************|

Spatial correction script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/apply_spatial_normalization

Spatial normalization...
| |
|**************************************************|

Row/column correction script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/apply_rowcol_normalization

Row/column correction...
| |
|**************************************************|

Competition correction script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/apply_competition_correction

Get colony neighbor indices script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/get_colony_neighbor_indices_list

Mapping neighboring colonies...
| |
|**************************************************|

Competition correction...
| |
|**************************************************|

Plate normalization script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/apply_plate_normalization

Plate normalization...
| |
|**************************************************|

Jackknife variance correction script:
/home/mahfuz/Desktop/SGA/Nat_Protocols_Paper_Release/SGA_Public-master/corrections/apply_jackknife_correction

Running the hold-one-out filter...
| |
|**************************************************|
Finished applying filters...
Getting arrayplate means...
| |
|**************************************************|
Preparing for batch normalization...
| |
|**************************************************|
Batch normalization...
| |
|**************************************************|
Calculating array WT variance...
| |
|**************************************************|
Computing average for double mutants...
| |
|*********************************************|
Pooling across arrayplates for each query...
| |
|************************* -- query_arrplate_vars space reallocation --
*************************|
Fitness file report:
Exact match : 1237
Partial Match : 0
Not Found : 6
NaN in file : 55
Model fitting...
| |
|**************************************************|
Printing output file...
| |
|**************************************************|
total time elapsed: 0.06 hours