Commit

add summary and shap ref
florian-huber committed May 31, 2024
1 parent 31b1dee commit 2d0bf53
Showing 1 changed file with 50 additions and 4 deletions.
54 changes: 50 additions & 4 deletions notebooks/live_coding_09e_machine_learning_ensembles.ipynb
@@ -20,7 +20,15 @@
"cell_type": "code",
"execution_count": 5,
"id": "eeaf5a13-2aad-4384-9ba9-d002573338ff",
- "metadata": {},
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": [
+ "hide-input"
+ ]
+ },
"outputs": [],
"source": [
"import os\n",
@@ -34,7 +42,15 @@
"cell_type": "code",
"execution_count": 6,
"id": "b1b39858-2bf0-46ec-8ad7-b77a07c7458a",
- "metadata": {},
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": [
+ "hide-input"
+ ]
+ },
"outputs": [
{
"data": {
@@ -459,7 +475,13 @@
"cell_type": "code",
"execution_count": 37,
"id": "4d53915b-f5db-478c-92d6-469748cc4260",
- "metadata": {},
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
"outputs": [
{
"name": "stdout",
@@ -481,11 +503,25 @@
"print(f'Accuracy of AdaBoost Classifier: {accuracy:.2f}')"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "2e2d25f8-0cc1-41c8-a5a3-f055656e8e24",
+ "metadata": {},
+ "source": [
+ "The accuracy looks promising, but let's also check the confusion matrix."
+ ]
+ },
{
"cell_type": "code",
"execution_count": 38,
"id": "9dda356c-4847-468a-a47c-9fbeab1f8240",
- "metadata": {},
+ "metadata": {
+ "editable": true,
+ "slideshow": {
+ "slide_type": ""
+ },
+ "tags": []
+ },
"outputs": [
{
"data": {
@@ -512,6 +548,16 @@
"plt.show()"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "74e3bc6f-7e4f-483b-8d6e-aed3c3c27c91",
+ "metadata": {},
+ "source": [
+ "In many cases, ensemble models outperform individual models in both robustness and prediction quality. However, they come with two downsides. First, they require training and handling dozens or hundreds of models instead of only one. For moderately sized datasets, this is usually a price people are very willing to pay.\n",
+ "\n",
+ "Second, ensemble models are often harder to interpret. An individual decision tree is, in principle, fully human-readable, but a random forest of hundreds of trees is not. There are techniques that help us interpret the predictions of such ensemble models, such as SHAP {cite}`lundberg_shap2017` {cite}`lundberg2020local2global`. Feel free to explore those tools yourself (e.g. [SHAP](https://github.com/shap/shap))."
+ ]
+ },
{
"cell_type": "markdown",
"id": "cc5d6e04-252b-4cd6-b3fd-e164d6c35282",
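
Note on the confusion-matrix cell added in this commit: the new markdown text recommends checking the confusion matrix in addition to the accuracy. The following is a minimal, library-free sketch of why that matters. It is not the notebook's own code; the helper `confusion_counts` and the toy labels are hypothetical and chosen only to make the effect visible on an imbalanced dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


# Hypothetical toy labels: 90 negatives and 10 positives (imbalanced).
y_true = [0] * 90 + [1] * 10
# A stand-in "model" that always predicts the majority class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # fraction of actual positives that were found

print(f"Accuracy: {accuracy:.2f}")
print(f"TP={tp} FP={fp} FN={fn} TN={tn}, recall={recall:.2f}")
```

In this toy case the accuracy is high although not a single positive sample is detected, which only the individual confusion-matrix counts reveal.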
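
Note on the summary cell added in this commit: the claim that ensembles often outperform individual models can be illustrated with a small calculation. The sketch below assumes that all ensemble members are equally accurate and that their errors are fully independent. This is an idealization (members of a real random forest or AdaBoost ensemble are correlated, so the actual gain is smaller), and the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models, p_single):
    """Probability that a strict majority of n_models is correct.

    Assumes independent errors and identical accuracy p_single per model,
    and an odd n_models so that ties cannot occur.
    """
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_single**k * (1 - p_single) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


# Each individual model is assumed to be right 60% of the time.
for n in (1, 3, 11, 101):
    print(f"{n:>3} models -> majority vote accuracy {majority_vote_accuracy(n, 0.6):.3f}")
```

Under these assumptions the majority vote becomes more reliable as models are added, which is the intuition behind the robustness gain. The cost is exactly the first downside named in the summary: every additional model has to be trained and stored.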
