Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: gene essentiality workflow #675

Merged
merged 24 commits into from
Oct 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
07329bc
feat: gene essentiality workflow with pr comment
mihai-sysbio Jul 7, 2023
c3db9c5
Merge branch 'develop' into feat/gene-essentiality-workflow
mihai-sysbio Oct 6, 2023
1bba24d
chore: expand workflow targets
mihai-sysbio Oct 6, 2023
aff8211
fix: add to matlab path
mihai-sysbio Oct 6, 2023
cd98819
fix: style of macaw summary output
edkerk Oct 18, 2024
c83e572
Merge branch 'feat/macaw_output' into feat/gene-essentiality-workflow
edkerk Oct 18, 2024
7f8c8ae
feat: post and store gene essentiality results
edkerk Oct 19, 2024
497bd7b
fix: model path and allow 2 workflow comments
edkerk Oct 19, 2024
5409a5d
chore: add macaw test result
edkerk Oct 19, 2024
3ac5563
fix: use recent RAVEN and gurobi
edkerk Oct 19, 2024
61f9b12
fix: avoid Penr statistic calculation
edkerk Oct 19, 2024
ff716cb
fix: missing ) in gene-essentiality.yml
edkerk Oct 19, 2024
88fbdfc
chore: add gene essentiality test result
edkerk Oct 19, 2024
0ef23be
fix: gene-essentiality.yml output format
edkerk Oct 20, 2024
91ed960
fix: location of data/testResults/README.md
edkerk Oct 20, 2024
79c258a
chore: add macaw test result
edkerk Oct 20, 2024
ac5c47c
fix: test workflow including branch update
edkerk Oct 21, 2024
6371196
fix: correct git-auto-commit-action@v5
edkerk Oct 21, 2024
e5342bd
chore: add gene essentiality test result
edkerk Oct 21, 2024
e56fbe8
fix: run all Hart et al. 2015 tests
edkerk Oct 21, 2024
ca406a6
fix: missing )
edkerk Oct 21, 2024
a5409dc
chore: add gene essentiality test result
edkerk Oct 21, 2024
b847199
fix: only git stash pop if stash exist
edkerk Oct 21, 2024
43d1d4d
fix gene-essentiality ' not "
edkerk Oct 23, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .github/workflows/commentGeneEssential.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the gene essentiality test:

<pre>
{TEST_RESULTS}
</pre>

> _Note: In the case of multiple test runs, this post will be edited._
9 changes: 9 additions & 0 deletions .github/workflows/commentMacaw.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the [MACAW](https://github.com/Devlin-Moyer/macaw) test:

<pre>
{TEST_RESULTS}
</pre>

This and a more detailed output from MACAW are also committed to `data/macawResults/`.

> _Note: In the case of multiple test runs, this post will be edited._
4 changes: 1 addition & 3 deletions .github/workflows/commentsFromTests.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,7 @@
This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the macaw test:
This PR has been [automatically tested with GH Actions](https://github.com/SysBioChalmers/Human-GEM/actions/runs/{GH_ACTION_RUN}). Here is the output of the gene essentiality test:

<pre>
{TEST_RESULTS}
</pre>

A more detailed output from this test run is also committed to `data/macawResults/macaw_results.csv`.

> _Note: In the case of multiple test runs, this post will be edited._
80 changes: 80 additions & 0 deletions .github/workflows/gene-essentiality.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
name: Check gene essentiality with Hart 2015

on:
pull_request:
branches:
- "main"
mihai-sysbio marked this conversation as resolved.
Show resolved Hide resolved
- "develop"

jobs:
check-metabolictasks:
runs-on: self-hosted

steps:
- name: Checkout
uses: actions/checkout@v4

- name: Fetch RAVEN
uses: actions/checkout@v4
with:
repository: "SysBioChalmers/RAVEN"
path: "RAVEN"

- name: Run gene essentiality
id: essentiality
run: >
TEST_RESULTS=$(/usr/local/bin/matlab -batch
"warning('off', 'MATLAB:rmpath:DirNotFound');
rmpath(genpath('/home/m/ecModels-dependencies/RAVEN'));
rmpath(genpath('/home/m/actions-runner'));
addpath(genpath('.'));
setRavenSolver('gurobi');
ihuman = readYAMLmodel('model/Human-GEM.yml');
taskStruct = parseTaskList('data/metabolicTasks/metabolicTasks_Essential.txt');
[~, eGenes] = evalc('estimateEssentialGenes(ihuman, ''Hart2015_RNAseq.txt'', taskStruct);');
output = transpose(evaluateHart2015Essentiality(eGenes));
fid = fopen('data/testResults/gene-essential.csv','w');
fprintf(fid,[repmat('%s,',1,9) '%s\n'],output{:,1});
fprintf(fid,['%s,%d,%d,%d,%d' repmat(',%.4g',1,5) '\n'],output{:,2:end});
fclose(fid);
disp(cell2table(transpose(output(:,2:end)),'VariableNames',output(:,1)));") &&
echo "$TEST_RESULTS" &&
PARSED_RESULTS="${TEST_RESULTS//$'\n'/'<br>'}" &&
PARSED_RESULTS="${PARSED_RESULTS//$'\r'/'<br>'}" &&
echo "results=$PARSED_RESULTS" >> $GITHUB_OUTPUT

- name: Mention PR# in README.md
env:
PR_NUMBER: ${{ github.event.number }}
run: sed -i -e "s/[[:digit:]]\{3,4\}\*\* (gene /$PR_NUMBER\*\* (gene /" data/testResults/README.md

- name: Update local branch before committing changes
env:
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
run: |
git stash
git fetch
git checkout $BRANCH_NAME
if git stash list | grep -q 'stash@{'; then
git stash pop
fi

- name: Auto-commit results
uses: stefanzweifel/git-auto-commit-action@v5
with:
commit_user_name: memote-bot
commit_message: "chore: add gene essentiality test result"
file_pattern: data/testResults/*
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.number }}

- name: Post comment
uses: NejcZdovc/comment-pr@v2
with:
file: "commentGeneEssential.md"
identifier: "GITHUB_COMMENT_GENE"
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
TEST_RESULTS: ${{steps.essentiality.outputs.results}}
GH_ACTION_RUN: ${{github.run_id}}
21 changes: 17 additions & 4 deletions .github/workflows/macaw-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,26 +31,39 @@ jobs:
PARSED_RESULTS="${PARSED_RESULTS//$'\r'/'<br>'}"
echo $PARSED_RESULTS
echo "results=$PARSED_RESULTS" >> $GITHUB_OUTPUT
printf "$TEST_RESULTS" > data/testResults/macaw_summary.md

- name: Mention PR# in README.md
env:
PR_NUMBER: ${{ github.event.number }}
run: sed -i -e "s/#[[:digit:]]\{3,4\}/#$PR_NUMBER/g" data/macawResults/README.md

run: sed -i -e "s/[[:digit:]]\{3,4\}\*\* (MACAW)/$PR_NUMBER\*\* (MACAW)/" data/testResults/README.md

- name: Update local branch before committing changes
env:
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
run: |
git stash
git fetch
git checkout $BRANCH_NAME
if git stash list | grep -q 'stash@{'; then
git stash pop
fi

- name: Auto-commit results
uses: stefanzweifel/git-auto-commit-action@v4
with:
commit_user_name: memote-bot
commit_message: "chore: add macaw test result"
file_pattern: data/macawResults/*
file_pattern: data/testResults/*
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.number }}

- name: Post comment
uses: NejcZdovc/comment-pr@v2
with:
file: "commentsFromTests.md"
file: "commentMacaw.md"
identifier: "GITHUB_COMMENT_MACAW"
env:
GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
TEST_RESULTS: ${{steps.macaw-run.outputs.results}}
Expand Down
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
Expand Up @@ -31,16 +31,16 @@
% Table S2 from the Hart2015 datasets:
% In this study, essential genes are, by definition, a subset of fitness genes
x = readtable('Hart2015_TableS2.xlsx');

% remove duplicated rows (MARCH1 and MARCH2 genes)
ind1 = find(ismember(x.Gene,'MARCH1'),1,'last');
ind2 = find(ismember(x.Gene,'MARCH2'),1,'last');
x([ind1;ind2],:) = [];

% extract gene list and cell types
genes = x.Gene;
celltypes = regexprep(upper(x.Properties.VariableNames(3:7)'),'BF_','');

% for each cell type, determine the "fitness" genes, defined as those with
% a 5% FDRs were chosen as thresholds (the values are extracted from the
% supporting information of the Hart 2015 paper. Genes observed in 3 or more
Expand All @@ -63,7 +63,7 @@
% also get the genes that were essential in all 5 cell lines ("all")
celltypes(end+1) = {'all'};
fitness_mat(:,end+1) = all(fitness_mat == 1,2);

% put Hart2015 essentiality data into data structure for comparison
expdata = {};
expdata.genes = genes;
Expand Down Expand Up @@ -94,30 +94,31 @@
% calculate true and false positives and negatives
[TP,TN,FP,FN,Penr] = deal(nan(numel(tissues),1)); % initialize variables
for i = 1:numel(tissues)

[~,tissue_ind] = ismember(tissues(i), expdata.tissues);
if tissue_ind == 0
continue % a few tissues are missing from the DepMap dataset
end

modelGenes = eGenes.refModel.genes;
if strcmpi(tissues{i},'all')
modelEssential = modelGenes(sum(modelPred,2) == size(modelPred,2));
else
modelEssential = modelGenes(modelPred(:,i));
end
modelNonEssential = setdiff(modelGenes,modelEssential);

expGenes = expdata.genes(~isnan(expdata.essential(:,tissue_ind)));
expEssential = expdata.genes(expdata.essential(:,tissue_ind) == 1);
expNonEssential = setdiff(expGenes,expEssential);

TP(i) = sum(ismember(modelEssential, expEssential)); % true positives
TN(i) = sum(ismember(modelNonEssential, expNonEssential)); % true negatives
FP(i) = sum(ismember(modelEssential, expNonEssential)); % false positives
FN(i) = sum(ismember(modelNonEssential, expEssential)); % false negatives
FN(i) = sum(ismember(modelNonEssential, expEssential)); % false negatives

Penr(i) = EnrichmentTest(intersect(modelGenes,expGenes), intersect(modelEssential,expGenes), intersect(expEssential,modelGenes));
% Requires Statistics and Machine Learning Toolbox
%Penr(i) = EnrichmentTest(intersect(modelGenes,expGenes), intersect(modelEssential,expGenes), intersect(expEssential,modelGenes));
end

% calculate some metrics
Expand All @@ -126,13 +127,15 @@
accuracy = (TP + TN)./(TP + TN + FP + FN);
F1 = 2*TP./(2*TP + FP + FN);
MCC = ((TP.*TN) - (FP.*FN))./sqrt((TP+FP).*(TP+FN).*(TN+FP).*(TN+FN)); % Matthews correlation coefficient
PenrAdj = adjust_pvalues(Penr,'Benjamini');
%PenrAdj = adjust_pvalues(Penr,'Benjamini');

% get results for cell types
results = [{'cellLine','TP','TN','FP','FN','accuracy','sensitivity','specificity','F1','MCC','Penr','logPenr','PenrAdj','logPenrAdj'};
[tissues, num2cell([TP, TN, FP, FN, accuracy, sensitivity, specificity, F1, MCC, Penr, -log10(Penr), PenrAdj, -log10(PenrAdj)])]];

results = [{'cellLine','TP','TN','FP','FN','accuracy','sensitivity','specificity','F1','MCC'};
[tissues, num2cell([TP, TN, FP, FN, accuracy, sensitivity, specificity, F1, MCC])]];

% Including Penr results
%results = [{'cellLine','TP','TN','FP','FN','accuracy','sensitivity','specificity','F1','MCC','Penr','logPenr','PenrAdj','logPenrAdj'};
% [tissues, num2cell([TP, TN, FP, FN, accuracy, sensitivity, specificity, F1, MCC, Penr, -log10(Penr), PenrAdj, -log10(PenrAdj)])]];
end

function [penr,pdep] = EnrichmentTest(pop,sample,successes)
Expand Down
2 changes: 1 addition & 1 deletion code/test/macawTests.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,4 @@
(dead_end_results, dead_end_edges) = dead_end_test(model)
(duplicate_results, duplicate_edges) = duplicate_test(model)
output = dead_end_results.merge(duplicate_results)
output.to_csv('data/macawResults/macaw_results.csv', index = False)
output.to_csv('data/testResults/macaw_results.csv', index = False)
17 changes: 11 additions & 6 deletions data/macawResults/README.md → data/testResults/README.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,23 @@
# MACAW results
# Test results

The file here contains results from the [MACAW](https://github.com/Devlin-Moyer/macaw) `dead_end_test` and `duplicate_test` tests.
The file here contains results from the [MACAW](https://github.com/Devlin-Moyer/macaw) `dead_end_test` and `duplicate_test` tests, and from cell-line specific gene essentiality prediction based on the [Hart _et al._ (2015)](https://doi.org/10.1016/j.cell.2015.11.015) dataset.

The test results shown here were obtained by the GitHub Actions run in **PR #829**, and will be updated by any subsequent PR. Summary results are shown as a comment in the corresponding PR.
The test results shown here were obtained by the GitHub Actions run in **PR #675** (MACAW) and **PR #675** (gene essentiality), and will be updated by any subsequent PR. Summary results are shown as a comment in the corresponding PR.

### `dead_end_test`
### MACAW: `dead_end_test`
Looks for metabolites in Human-GEM that can only be produced by all reactions they participate in or only consumed, then identifies all reactions that are prevented from sustaining steady-state fluxes because of each of these dead-end metabolites. The simplest case of a dead-end metabolite is one that only participates in a single reaction. Also flags all reversible reactions that can only carry fluxes in a single direction because one of their metabolites can either only be consumed or only be produced by all other reactions it participates in.

### `duplicate_test`
### MACAW: `duplicate_test`
Identifies sets of reactions that may be duplicates of each other because they:

- Involve exactly the same metabolites with exactly the same stoichiometric coefficients (but potentially different associated genes).
- Involve exactly the same metabolites, but go in different directions and/or some are reversible and some are not.
- Involve exactly the same metabolites, but with different stoichiometric coefficients.
- Represent the oxidation and/or reduction of the same metabolite, but use different electron acceptors/donors from the given list of pairs of oxidized and reduced forms of various electron carriers (e.g. NAD(H), NADP(H), FAD(H2), ubiquinone/ubiquinol, cytochromes).

It is possible for a single reaction to fit in multiple of the above categories. There are sometimes cases where sets of reactions that fall into one of the above categories are completely legitimate representations of real biochemistry (e.g. separate irreversible reactions for importing vs exporting the same metabolite because two different transporters encoded by different genes are each responsible for transporting that metabolite in only one direction, enzymes that can use NAD(H) or NADP(H) interchangeably to catalyze the same redox reaction), but reactions that meet these criteria are generally worth close examination to ensure that they should actually all exist as separate reactions.
It is possible for a single reaction to fit in multiple of the above categories. There are sometimes cases where sets of reactions that fall into one of the above categories are completely legitimate representations of real biochemistry (e.g. separate irreversible reactions for importing vs exporting the same metabolite because two different transporters encoded by different genes are each responsible for transporting that metabolite in only one direction, enzymes that can use NAD(H) or NADP(H) interchangeably to catalyze the same redox reaction), but reactions that meet these criteria are generally worth close examination to ensure that they should actually all exist as separate reactions.

### Cell-line specific gene essentiality
Evaluate gene essentiality predictions in 5 cell-line specific GEMs with experimental fitness data gathered from the [Hart _et al._ (2015)](https://doi.org/10.1016/j.cell.2015.11.015).

Cell-line specific GEMs are constructed with tINIT2 for DLD1, GBM, HCT116, HeLa and RPE1 cell lines. Then, the `metabolicTasks_Essential.txt` list of tasks is used to identify essential genes in each of these models. The predicted gene essentiality is compared to results from a high-throughput CRISPR-Cas9 screen for identifying genes that affect fitness. Only the summary statistics of this comparison are kept.
7 changes: 7 additions & 0 deletions data/testResults/gene-essential.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
cellLine,TP,TN,FP,FN,accuracy,sensitivity,specificity,F1,MCC
DLD1,36,2185,59,279,0.8679,0.1143,0.9737,0.1756,0.1529
GBM,34,2165,61,298,0.8597,0.1024,0.9726,0.1593,0.1333
HCT116,46,2207,53,309,0.8616,0.1296,0.9765,0.2026,0.1905
HELA,30,2263,69,254,0.8765,0.1056,0.9704,0.1567,0.124
RPE1,14,2204,81,259,0.8671,0.05128,0.9646,0.07609,0.02585
all,7,2408,92,109,0.9232,0.06034,0.9632,0.06512,0.0254
Loading
Loading