Chapter 6 production polish (#86)
* starting work on ch5+6; categorical type change; remove commented out R code

* value counts, class name remap, replace in ch5

* remove warnings

* polished ch5+6 up to euclidean dist

* minor bugfix

* minor bugfix

* fixed worksheets link at end of chp

* fix minor section heading wording in Ch1

* added nsmallest + note; better chaining for dist comps; removed comments; fixed colors (not working yet)

* initial fit and predict polished; model spec -> model object

* polishing preprocessing

* balancing polished

* pipelines

* learning objs

* mute warnings in ch5

* warn mute code; fixed links at end

* restore cls2 to main branch

* remove caption hack; minor fix to learning objs

* Remove caption hack

* initial improved seed explanation

* random seed section polish done

* polished ch6 up to tuning

* initial cross val example done

* in python -> in scikit

* working on cross-val

* polished ch6 up to predictor selection

* commented out predictor selection

* done ch6 except final under/overfit plot

* warnings filter in ch6; remove seed hack cell

* remove reference to random state in train/test split

* minor typesetting .method() vs method

* put setup.md back in to fix broken links

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* values -> to_numpy in randomness section

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* Update source/classification2.md

Co-authored-by: Joel Ostblom <[email protected]>

* remove code for area plot at the end of ch6

Co-authored-by: Joel Ostblom <[email protected]>
trevorcampbell and joelostblom authored Jan 18, 2023
1 parent 220f8d9 commit b6a0f4b
Showing 4 changed files with 415 additions and 601 deletions.
2 changes: 1 addition & 1 deletion source/_toc.yml
@@ -9,7 +9,7 @@ parts:
- file: acknowledgements-python.md
- file: authors.md
- file: editors.md
-#- file: setup.md
+- file: setup.md
- caption: Chapters
numbered: 3
chapters:
5 changes: 1 addition & 4 deletions source/classification1.md
@@ -942,7 +942,6 @@ we will discuss how to choose $K$ in the next chapter.
> which weigh each neighbor's vote differently, can be found on
> [the `scikit-learn` website](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html?highlight=kneighborsclassifier#sklearn.neighbors.KNeighborsClassifier).

```{code-cell} ipython3
knn = KNeighborsClassifier(n_neighbors=5)
knn
@@ -1048,7 +1047,6 @@ unscaled_cancer['Class'] = unscaled_cancer['Class'].replace({
'B' : 'Benign'
}).astype('category')
unscaled_cancer
unscaled_cancer
```

Looking at the unscaled and uncentered data above, you can see that the differences
@@ -1146,7 +1144,7 @@ is to *drop* the remaining columns. This default behavior works well with the re
in the {ref}`08:puttingittogetherworkflow` section), but for visualizing the result of preprocessing it can be useful to keep the other columns
in our original data frame, such as the `Class` variable here.
To keep other columns, we need to set the `remainder` argument to `'passthrough'` in the `make_column_transformer` function.
-Furthermore, you can see that the new column names---{glue:}`scaled-cancer-column-0`
+Furthermore, you can see that the new column names---{glue:}`scaled-cancer-column-0`
and {glue:}`scaled-cancer-column-1`---include the name
of the preprocessing step separated by underscores. This default behavior is useful in `sklearn` because we sometimes want to apply
multiple different preprocessing steps to the same columns; but again, for visualization it can be useful to preserve
@@ -1742,7 +1740,6 @@ unscaled_cancer['Class'] = unscaled_cancer['Class'].replace({
}).astype('category')
unscaled_cancer
# create the KNN model
knn = KNeighborsClassifier(n_neighbors=7)
