Analysis of held-out cancer type classification results #21

Merged Sep 21, 2020 (14 commits)
22 changes: 19 additions & 3 deletions 04_plot_results.ipynb → 04_plot_stratified_results.ipynb
@@ -465,8 +465,8 @@
],
"source": [
"vogelstein_results_df = au.compare_results(vogelstein_df, metric='aupr', correction=True,\n",
" correction_method='fdr_bh', correction_alpha=0.001,\n",
" verbose=True)\n",
" correction_method='fdr_bh', correction_alpha=0.001,\n",
Reviewer

Some overall comments for this notebook. I couldn't leave them as line comments since these don't appear to be new changes, so apologies that this might be a little out of scope; feel free to address them in a separate PR if you'd like.

  1. I tend to find it helpful to have a description of the experiment I am performing at the top of the notebook just to orient myself.

  2. Just wanted to clarify the term stratified here. So you're saying that your training set includes, say, 10 samples of cancer type A, 10 of cancer type B, and 10 of cancer type C (30 samples total, each type making up 1/3 of the set), and your test set contains maybe 9 total samples with 3 of cancer type A, 3 of cancer type B, and 3 of cancer type C. So the proportions are the same in the test and training sets?

  3. Trying to understand your dataframes. For the first 3 rows of top50_df, you have AUROC and AUPR values that tell you how well gene info from the training set (including mutation burden, etc.) predicts the mutation status of TP53 (binary, I assume) on the training/test/validation sets, in the case where the labels used to train the model were shuffled. I assume that means you have the same training dataset with multiple sets of labels, one for the mutation status of each of gene X, Y, Z. So you'd train a model on its ability to predict mutation of gene X, then train a model on its ability to predict mutation of gene Y, and so on. So you have multiple models here?

Contributor Author

These are all good questions! See answers below:

  1. I tend to find it helpful to have a description of the experiment I am performing at the top of the notebook just to orient myself.

Good idea! I'll add this to the top here.

  2. Just wanted to clarify the term stratified here. So you're saying that your training set includes, say, 10 samples of cancer type A, 10 of cancer type B, and 10 of cancer type C (30 samples total, each type making up 1/3 of the set), and your test set contains maybe 9 total samples with 3 of cancer type A, 3 of cancer type B, and 3 of cancer type C. So the proportions are the same in the test and training sets?

Yep, exactly (not always exactly the same proportions between train and test sets, but within +/- 1 sample, I think; this is implemented in StratifiedKFold from scikit-learn).
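For example (a toy illustration, not the actual pipeline code):

```python
# Toy illustration of stratification: StratifiedKFold keeps the cancer type
# proportions roughly equal between the train and test splits.
import numpy as np
from sklearn.model_selection import StratifiedKFold

cancer_type = np.array(['A'] * 10 + ['B'] * 10 + ['C'] * 10)  # toy labels
X = np.random.randn(30, 5)                                    # toy features

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for train_ix, test_ix in skf.split(X, cancer_type):
    # each test fold gets ~3 samples per cancer type (within +/- 1)
    print(np.unique(cancer_type[test_ix], return_counts=True))
```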

  3. Trying to understand your dataframes. For the first 3 rows of top50_df, you have AUROC and AUPR values that tell you how well gene info from the training set (including mutation burden, etc.) predicts the mutation status of TP53 (binary, I assume) on the training/test/validation sets, in the case where the labels used to train the model were shuffled. I assume that means you have the same training dataset with multiple sets of labels, one for the mutation status of each of gene X, Y, Z. So you'd train a model on its ability to predict mutation of gene X, then train a model on its ability to predict mutation of gene Y, and so on. So you have multiple models here?

Right - so each row in top50_df and vogelstein_df is one model, trained on the (binary, mutated or not mutated) mutation status of one gene on one cross-validation fold, either on the true labels or the shuffled labels.

We have mutation information for (almost) every gene in the genome - "top_50" and "vogelstein" are two different ways to select the genes to train models on. If we just train models on every gene in the genome, our statistical power to detect true relationships between mutation and gene expression won't be very good (and also it will take forever), so we want to start with sets of known cancer genes to improve power. In each case, we train one model for each gene/true-shuffled/cross-validation fold combination.
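Here's a simplified sketch of that setup (illustrative names only, not the real pipeline code):

```python
# One model per gene x (true/shuffled labels) x CV fold combination.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold

def run_experiments(X, labels_by_gene, genes, n_splits=4, seed=42):
    """labels_by_gene maps gene name -> binary mutated/not-mutated labels."""
    rng = np.random.default_rng(seed)
    results = []  # one row per gene / signal-or-shuffled / fold
    for gene in genes:
        y = labels_by_gene[gene]
        for signal in ('signal', 'shuffled'):
            # shuffled labels serve as the negative control
            y_used = rng.permutation(y) if signal == 'shuffled' else y
            skf = StratifiedKFold(n_splits=n_splits, shuffle=True,
                                  random_state=seed)
            for fold, (tr, te) in enumerate(skf.split(X, y_used)):
                model = LogisticRegression(max_iter=1000).fit(X[tr], y_used[tr])
                aupr = average_precision_score(
                    y_used[te], model.predict_proba(X[te])[:, 1])
                results.append({'gene': gene, 'signal': signal,
                                'fold': fold, 'aupr': aupr})
    return results
```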

Let me know if that doesn't answer your question.

Reviewer

This makes sense. Thank you!

As a follow-up: so you're including those cancer genes instead of all genes. Do you expect most of those genes to have a mutated status = 1 (I guess this'll probably depend on the cancer type)? Would it make sense to include genes that are not mutated in cancers as a control?

Contributor Author

Do you expect most of those genes to have a mutated status = 1 (I guess this'll probably depend on the cancer type)?

Yeah, we definitely expect many of the cancer-related genes to be frequently mutated in at least a few cancer types (and we are filtering out gene/cancer type combos where less than 5% of samples are mutated, like in my last exploratory data analysis notebook).
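Roughly what that filter looks like (hypothetical column names; the real filtering lives in the earlier EDA notebook):

```python
import pandas as pd

def filter_gene_cancer_combos(mutation_df, min_mutated_prop=0.05):
    """Keep gene/cancer type combos where >= 5% of samples are mutated.

    mutation_df: one row per (sample, gene) pair, with columns
    ['gene', 'cancer_type', 'mutated'] where 'mutated' is 0/1.
    """
    freqs = (mutation_df
             .groupby(['gene', 'cancer_type'])['mutated']
             .mean()
             .reset_index(name='mutated_prop'))
    # return only the gene/cancer type combos that pass the threshold
    return freqs.loc[freqs['mutated_prop'] >= min_mutated_prop,
                     ['gene', 'cancer_type']]
```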

Would it make sense to include genes that are not mutated in cancers as a control?

Yes, this is definitely what we're trying to do, but it's hard to choose control genes well. Ideally we'd include some genes that aren't drivers, but this isn't really documented anywhere (and absence of evidence for a gene being a driver in some cancer type isn't the same as evidence of absence; some drivers are just rarely mutated or haven't been studied in depth).

In the past we've used TTN as an example of a gene that isn't thought to be a driver of any cancer type, but remember that lots of genes are mutated in cancer, even those that aren't actually driving the cancer to form. TTN is a large gene that is frequently mutated as a passenger (just by chance) in many cancers, so its mutation status correlates with mutation burden (and thus with cancer type). So (we think) it turns out that TTN mutation status can actually be predicted to some degree from gene expression, because gene expression -> cancer type -> mutation burden -> probability of TTN being mutated.

I guess we could pick smaller genes to reduce the chances of passenger mutations, but I'd have to think about the best way to do this.

" verbose=True)\n",
"vogelstein_results_df.sort_values(by='p_value').head(n=10)"
]
},
@@ -665,7 +665,12 @@
"source": [
"The plot above is similar to a volcano plot used in differential expression analysis. The x-axis shows the difference between AUPR in the signal (true labels) case and in the negative control (shuffled labels) case, and the y-axis shows the negative log of the t-test p-value, after FDR adjustment.\n",
"\n",
"Orange points are significant at a cutoff of $\\alpha = 0.001$ after FDR correction."
"Orange points are significant at a cutoff of $\\alpha = 0.001$ after FDR correction.\n",
"\n",
"Our interpretation of these results:\n",
"\n",
"* For the top 50 analysis, we mostly reproduced the results from BioBombe which also used this gene set (some of the less significant hits weren't found in BioBombe, but we should have better statistical power here so it makes sense that we see more results)\n",
"* For the Vogelstein analysis, it was surprising/interesting that we saw lots more significant hits than we did for the top 50 analysis! On some level it's not shocking (if a gene is mutated frequently that doesn't necessarily make it a driver, and conversely drivers aren't always frequently mutated across all samples) but seeing visual confirmation of this was neat."
]
},
{
@@ -736,6 +741,17 @@
"source": [
"We have usually used TTN as our negative control (not understood to be a cancer driver, but is a large gene that is frequently mutated as a passenger). So it's a bit weird that it has a fairly low p-value here (would be significant at $\\alpha = 0.05$). We'll have to think about why this is."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# save significance testing results\n",
Reviewer

So is the takeaway from these plots that the genes found to be DE in cancer vs. normal (y-axis) were also found to be the most mutated (x-axis), which we'd expect because these mutations will likely change the expression of these genes?

Contributor Author (@jjc2718), Sep 21, 2020

It might help if I explain the results_dfs (with the statistical testing results), then maybe that will make it a bit clearer what the plots are showing.

The question I wanted to answer in this notebook is: for which genes can we train a model to predict mutation from gene expression that outperforms the negative control? So for each gene, we ran 8 total cross-validation replicates (4 folds x 2 random seeds), for the experimental case and the control (shuffled) case.

That then gives us 2 distributions of results (in this case we're using AUPR), and we can compare these using a t-test. The plot is a bit like a volcano plot from a DE analysis: on the x-axis it shows the AUPR difference between the true labels and the shuffled labels (positive = better model performance for true labels), and on the y-axis it shows the p-value for the t-test comparing the two distributions. So points in the upper right (better performance for true labels, and low p-values) are the genes we're interested in, showing that we can build effective classifiers on this dataset for these genes.
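A rough sketch of that comparison (not the actual au.compare_results implementation, just the idea behind the plot coordinates):

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def compare_signal_vs_shuffled(results_df, alpha=0.001):
    """results_df columns assumed: ['gene', 'signal', 'aupr'],
    with 8 rows per gene/signal combination (4 folds x 2 seeds)."""
    rows = []
    for gene, gene_df in results_df.groupby('gene'):
        true_aupr = gene_df.loc[gene_df['signal'] == 'signal', 'aupr']
        shuf_aupr = gene_df.loc[gene_df['signal'] == 'shuffled', 'aupr']
        _, p = ttest_ind(true_aupr, shuf_aupr)  # compare the two distributions
        rows.append({'gene': gene,
                     'delta_aupr': true_aupr.mean() - shuf_aupr.mean(),  # x-axis
                     'p_value': p})
    out = pd.DataFrame(rows)
    # Benjamini-Hochberg FDR correction, as with correction_method='fdr_bh'
    _, out['corr_pval'], _, _ = multipletests(out['p_value'], alpha=alpha,
                                              method='fdr_bh')
    out['nlog10_p'] = -np.log10(out['corr_pval'])  # y-axis
    return out
```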

Does that make more sense? It's not quite a standard use of a volcano plot, but you can interpret it in a similar way.

Contributor Author

Oh, and as far as the takeaway goes: in this notebook, there were two for me:

  • For the top 50 analysis, we mostly reproduced the results from BioBombe which also used this gene set (some of the less significant hits weren't found in BioBombe, but we should have better statistical power here so it makes sense that we see more results)
  • For the Vogelstein analysis, it was surprising/interesting that we saw lots more significant hits than we did for the top 50 analysis! On some level it's not shocking (if a gene is mutated frequently that doesn't necessarily make it a driver, and conversely drivers aren't always frequently mutated across all samples) but seeing visual confirmation of this was neat.

(I'll add this interpretation to the notebook - I wrote this out in some slides I discussed with Casey, but forgot to add it here)

"top50_results_df.to_csv(os.path.join(cfg.results_dir, 'top50_stratified_pvals.tsv'), index=False, sep='\\t')\n",
"vogelstein_results_df.to_csv(os.path.join(cfg.results_dir, 'vogelstein_stratified_pvals.tsv'), index=False, sep='\\t')"
]
}
],
"metadata": {