Chapter 6 production polish (#86)

* starting work on ch5+6; categorical type change; remove commented out R code
* value counts, class name remap, replace in ch5
* remove warnings
* polished ch5+6 up to euclidean dist
* minor bugfix
* minor bugfix
* fixed worksheets link at end of chp
* fix minor section heading wording in Ch1
* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)
* initial fit and predict polished; model spec -> model object
* polishing preprocessing
* balancing polished
* pipelines
* learning objs
* mute warnings in ch5
* warn mute code; fixed links at end
* restore cls2 to main branch
* remove caption hack; minor fix to learning objs
* Remove caption hack
* initial improved seed explanation
* random seed section polish done
* polished ch6 up to tuning
* initial cross val example done
* in python -> in scikit
* working on cross-val
* polished ch6 up to predictor selection
* commented out predictor selection
* done ch6 except final under/overfit plot
* warnings filter in ch6; remove seed hack cell
* remove reference to random state in train/test split
* minor typesetting .method() vs method
* put setup.md back in to fix broken links
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* values -> to_numpy in randomness section
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* Update source/classification2.md
  Co-authored-by: Joel Ostblom <[email protected]>
* remove code for area plot at the end of ch6

Co-authored-by: Joel Ostblom <[email protected]>
UBC-DSCI · Jan 18, 2023 · b6a0f4b
1 parent 220f8d9
Showing 4 changed files with 415 additions and 601 deletions.
diff --git a/source/_toc.yml b/source/_toc.yml
@@ -9,7 +9,7 @@ parts:
     - file: acknowledgements-python.md
     - file: authors.md
     - file: editors.md
-    #- file: setup.md
+    - file: setup.md
 - caption: Chapters
   numbered: 3
   chapters:

diff --git a/source/classification1.md b/source/classification1.md
@@ -942,7 +942,6 @@ we will discuss how to choose $K$ in the next chapter.
 > which weigh each neighbor's vote differently, can be found on 
 > [the `scikit-learn` website](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html?highlight=kneighborsclassifier#sklearn.neighbors.KNeighborsClassifier).
 
-
 ```{code-cell} ipython3
 knn = KNeighborsClassifier(n_neighbors=5)
 knn
@@ -1048,7 +1047,6 @@ unscaled_cancer['Class'] = unscaled_cancer['Class'].replace({
    'B' : 'Benign'
 }).astype('category')
 unscaled_cancer
-unscaled_cancer
 ```
 
 Looking at the unscaled and uncentered data above, you can see that the differences
@@ -1146,7 +1144,7 @@ is to *drop* the remaining columns. This default behavior works well with the re
 in the {ref}`08:puttingittogetherworkflow` section), but for visualizing the result of preprocessing it can be useful to keep the other columns
 in our original data frame, such as the `Class` variable here.
 To keep other columns, we need to set the `remainder` argument to `'passthrough'` in the `make_column_transformer` function.
- Furthermore, you can see that the new column names---{glue:}`scaled-cancer-column-0`
+Furthermore, you can see that the new column names---{glue:}`scaled-cancer-column-0`
 and {glue:}`scaled-cancer-column-1`---include the name
 of the preprocessing step separated by underscores. This default behavior is useful in `sklearn` because we sometimes want to apply
 multiple different preprocessing steps to the same columns; but again, for visualization it can be useful to preserve
@@ -1742,7 +1740,6 @@ unscaled_cancer['Class'] = unscaled_cancer['Class'].replace({
 }).astype('category')
 unscaled_cancer
 
-
 # create the KNN model
 knn = KNeighborsClassifier(n_neighbors=7)