Decision Trees are powerful algorithms capable of fitting complex datasets. A decision tree makes predictions based on a series of if/else statements, splitting a node into two or more sub-nodes.
Along with this versatility, decision trees are also prone to overfitting. One reason this algorithm often overfits is its depth: it tends to memorize all the patterns in the training data but struggles to perform well on unseen data (the validation or test set).
To overcome the overfitting problem, we can reduce the complexity of the model by limiting its depth.
A decision tree with a depth of one is called a decision stump; it has only a single split from the root.
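A minimal sketch of this effect (using a synthetic dataset from `make_classification`, not the course data) comparing an unrestricted tree, a shallow tree, and a depth-1 stump on train vs. validation AUC:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic binary classification data (assumed for illustration only)
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=1)

for depth in [None, 3, 1]:  # None = grow until leaves are pure; 1 = decision stump
    dt = DecisionTreeClassifier(max_depth=depth, random_state=1)
    dt.fit(X_train, y_train)

    auc_train = roc_auc_score(y_train, dt.predict_proba(X_train)[:, 1])
    auc_val = roc_auc_score(y_val, dt.predict_proba(X_val)[:, 1])
    print(f"max_depth={depth}: train AUC={auc_train:.3f}, val AUC={auc_val:.3f}")
```

The deep tree typically shows a large gap between train and validation AUC (overfitting), while the depth-limited trees trade some training performance for better generalization.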
Classes, functions, and methods:
- `DecisionTreeClassifier`: classification model from the `sklearn.tree` module.
- `max_depth`: hyperparameter to control the depth of the decision tree.
- `export_text`: function from `sklearn.tree` to display a text report showing the rules of a decision tree.
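A small self-contained sketch (using scikit-learn's built-in iris dataset rather than the course data) showing how `export_text` prints the learned if/else rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree so the printed rules stay short
dt = DecisionTreeClassifier(max_depth=2, random_state=1)
dt.fit(iris.data, iris.target)

# Text report with one line per split/leaf of the fitted tree
print(export_text(dt, feature_names=list(iris.feature_names)))
```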
Note: we have already covered `DictVectorizer` in session 3 and `roc_auc_score` in session 4.
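For reference, a hedged sketch showing how they fit together with a decision tree; the toy records below are hypothetical and only illustrate the flow from `DictVectorizer` through `DecisionTreeClassifier` to `roc_auc_score`:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical records mixing categorical and numerical features (not the course dataset)
train_dicts = [
    {"contract": "month-to-month", "tenure": 1},
    {"contract": "two_year", "tenure": 40},
    {"contract": "month-to-month", "tenure": 5},
    {"contract": "one_year", "tenure": 24},
]
y_train = [1, 0, 1, 0]

dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(train_dicts)   # one-hot encodes the categorical features

dt = DecisionTreeClassifier(max_depth=1)  # a decision stump
dt.fit(X_train, y_train)

y_pred = dt.predict_proba(X_train)[:, 1]
print(roc_auc_score(y_train, y_pred))     # AUC on the (tiny) training data
```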
Add notes from the video (PRs are welcome)
The notes are written by the community. If you see an error here, please create a PR with a fix.