add summary and shap ref

florian-huber · May 31, 2024 · 2d0bf53
1 parent 31b1dee
Showing 1 changed file with 50 additions and 4 deletions.
diff --git a/notebooks/live_coding_09e_machine_learning_ensembles.ipynb b/notebooks/live_coding_09e_machine_learning_ensembles.ipynb
@@ -20,7 +20,15 @@
    "cell_type": "code",
    "execution_count": 5,
    "id": "eeaf5a13-2aad-4384-9ba9-d002573338ff",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": [
+     "hide-input"
+    ]
+   },
    "outputs": [],
    "source": [
     "import os\n",
@@ -34,7 +42,15 @@
    "cell_type": "code",
    "execution_count": 6,
    "id": "b1b39858-2bf0-46ec-8ad7-b77a07c7458a",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": [
+     "hide-input"
+    ]
+   },
    "outputs": [
     {
      "data": {
@@ -459,7 +475,13 @@
    "cell_type": "code",
    "execution_count": 37,
    "id": "4d53915b-f5db-478c-92d6-469748cc4260",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -481,11 +503,25 @@
     "print(f'Accuracy of AdaBoost Classifier: {accuracy:.2f}')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "2e2d25f8-0cc1-41c8-a5a3-f055656e8e24",
+   "metadata": {},
+   "source": [
+    "The accuracy looks promising, but we should also check the confusion matrix."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 38,
    "id": "9dda356c-4847-468a-a47c-9fbeab1f8240",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -512,6 +548,16 @@
     "plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "74e3bc6f-7e4f-483b-8d6e-aed3c3c27c91",
+   "metadata": {},
+   "source": [
+    "In many cases, ensemble models outperform individual models in terms of robustness and prediction quality. However, they come with two downsides. Firstly, they require training and managing dozens or hundreds of models instead of just one. For moderately sized datasets, this is usually a price people are very willing to pay.\n",
+    "\n",
+    "Secondly, ensemble models are often harder to interpret. An individual decision tree is, in principle, fully human-readable, whereas a random forest of hundreds of trees is not. Techniques such as SHAP {cite}`lundberg_shap2017` {cite}`lundberg2020local2global` help us interpret the predictions of such ensemble models. Feel free to explore these tools yourself (e.g. [SHAP](https://github.com/shap/shap))."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "cc5d6e04-252b-4cd6-b3fd-e164d6c35282",
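The added summary cell points to SHAP for interpreting ensemble predictions. SHAP builds on Shapley values, and the sketch below computes those values exactly for a tiny toy ensemble. It does not use the shap library and is not how that library computes them. It rests on two simplifying assumptions. First, a feature "absent" from a coalition is replaced by its value in a single baseline point, whereas SHAP implementations typically use a background dataset. Second, the ensemble is a hypothetical average of two hand-written trees (`tree_a`, `tree_b`).

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x.

    Simplifying assumption: features absent from a coalition
    take their value from a single baseline point.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, all others from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Enumerate every coalition S that does not contain feature i.
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy ensemble: the average of two hand-written trees.
def tree_a(z):
    return 1.0 if z[0] > 0.5 else 0.0


def tree_b(z):
    return 1.0 if z[1] > 0.5 and z[2] > 0.5 else 0.0


def ensemble(z):
    return 0.5 * (tree_a(z) + tree_b(z))


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(ensemble, x, baseline)
print([round(v, 3) for v in phi])                             # [0.5, 0.25, 0.25]
print(round(sum(phi), 3), ensemble(x) - ensemble(baseline))  # 1.0 1.0
```

Feature 0 receives the full contribution of `tree_a`. Features 1 and 2 split the contribution of `tree_b` equally because that tree needs both of them. The attributions sum to `f(x) - f(baseline)`. Exact enumeration visits every subset of the other features, which grows exponentially with the number of features. This is why the shap library relies on approximations or model-specific algorithms, such as its tree explainer.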