Decision Trees
tuanavu committed Mar 20, 2016
1 parent f427b7a commit 9a78d74
Showing 2 changed files with 678 additions and 0 deletions.
@@ -25,6 +25,16 @@
"<!--TEASER_END-->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Answer**\n",
"\n",
"- We pick feature x3 because it has the lowest classification error.\n",
" - Splitting on x3 misclassifies only one row (row 3, where x3 = 1 but y = -1), fewer than splitting on either of the other features."
]
},
{
"cell_type": "markdown",
"metadata": {},
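Note on the hunk above: the claim that x3 has the lowest classification error can be checked by hand. The sketch below is a minimal illustration, assuming the answer refers to the same four rows as the SFrame defined later in this notebook; the helper name `split_errors` is hypothetical and not part of GraphLab Create. It counts the mistakes made by a one-level split that predicts the majority label in each branch.

```python
from collections import Counter

# Toy dataset copied from the SFrame in this notebook (values as plain ints).
rows = [
    {"x1": 1, "x2": 1, "x3": 1, "y": 1},
    {"x1": 0, "x2": 1, "x3": 0, "y": -1},
    {"x1": 1, "x2": 0, "x3": 1, "y": -1},
    {"x1": 0, "x2": 0, "x3": 1, "y": 1},
]


def split_errors(rows, feature):
    """Count mistakes of a one-level split that predicts the
    majority label within each branch of `feature`."""
    mistakes = 0
    for value in {r[feature] for r in rows}:
        labels = [r["y"] for r in rows if r[feature] == value]
        majority_count = Counter(labels).most_common(1)[0][1]
        mistakes += len(labels) - majority_count
    return mistakes


errors = {f: split_errors(rows, f) for f in ("x1", "x2", "x3")}
best = min(errors, key=errors.get)
print(errors, best)
```

Dividing each count by the number of rows (4) gives the classification error for that split.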
@@ -43,6 +53,315 @@
"<!--TEASER_END-->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Answer**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n",
" <tr>\n",
" <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">x1</th>\n",
" <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">x2</th>\n",
" <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">x3</th>\n",
" <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">y</th>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">-1</td>\n",
" </tr>\n",
" <tr>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n",
" </tr>\n",
"</table>\n",
"[4 rows x 4 columns]<br/>\n",
"</div>"
],
"text/plain": [
"Columns:\n",
"\tx1\tint\n",
"\tx2\tstr\n",
"\tx3\tstr\n",
"\ty\tstr\n",
"\n",
"Rows: 4\n",
"\n",
"Data:\n",
"+----+----+----+----+\n",
"| x1 | x2 | x3 | y |\n",
"+----+----+----+----+\n",
"| 1 | 1 | 1 | 1 |\n",
"| 0 | 1 | 0 | -1 |\n",
"| 1 | 0 | 1 | -1 |\n",
"| 0 | 0 | 1 | 1 |\n",
"+----+----+----+----+\n",
"[4 rows x 4 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import graphlab\n",
"graphlab.canvas.set_target('ipynb')\n",
"\n",
"x = graphlab.SFrame({'x1':[1,0,1,0],'x2':['1','1','0','0'],'x3':['1','0','1','1'],'y':['1','-1','-1','1']})\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<pre>WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set.</pre>"
],
"text/plain": [
"WARNING: The number of feature dimensions in this problem is very large in comparison with the number of examples. Unless an appropriate regularization value is set, this model may not provide accurate predictions for a validation/test set."
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>Decision tree classifier:</pre>"
],
"text/plain": [
"Decision tree classifier:"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>--------------------------------------------------------</pre>"
],
"text/plain": [
"--------------------------------------------------------"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>Number of examples : 4</pre>"
],
"text/plain": [
"Number of examples : 4"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>Number of classes : 2</pre>"
],
"text/plain": [
"Number of classes : 2"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>Number of feature columns : 3</pre>"
],
"text/plain": [
"Number of feature columns : 3"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>Number of unpacked features : 3</pre>"
],
"text/plain": [
"Number of unpacked features : 3"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>+-----------+--------------+-------------------+-------------------+</pre>"
],
"text/plain": [
"+-----------+--------------+-------------------+-------------------+"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>| Iteration | Elapsed Time | Training-accuracy | Training-log_loss |</pre>"
],
"text/plain": [
"| Iteration | Elapsed Time | Training-accuracy | Training-log_loss |"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>+-----------+--------------+-------------------+-------------------+</pre>"
],
"text/plain": [
"+-----------+--------------+-------------------+-------------------+"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>| 1 | 0.000000 | 1.000000 | 0.634946 |</pre>"
],
"text/plain": [
"| 1 | 0.000000 | 1.000000 | 0.634946 |"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre>+-----------+--------------+-------------------+-------------------+</pre>"
],
"text/plain": [
"+-----------+--------------+-------------------+-------------------+"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"features = ['x1','x2','x3']\n",
"target = 'y' \n",
"\n",
"decision_tree_model = graphlab.decision_tree_classifier.create(x, validation_set=None,\n",
" target = target, features = features)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- The best feature to split on first is x3.\n",
"- In the tree below, the root node splits on x3 = 1, and the tree has a depth of 3."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"application/javascript": [
"$(\"head\").append($(\"<link/>\").attr({\n",
" rel: \"stylesheet\",\n",
" type: \"text/css\",\n",
" href: \"//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.1.0/css/font-awesome.min.css\"\n",
"}));\n",
"$(\"head\").append($(\"<link/>\").attr({\n",
" rel: \"stylesheet\",\n",
" type: \"text/css\",\n",
" href: \"//dato.com/files/canvas/1.8.4/css/canvas.css\"\n",
"}));\n",
"\n",
" (function(){\n",
"\n",
" var e = null;\n",
" if (typeof element == 'undefined') {\n",
" var scripts = document.getElementsByTagName('script');\n",
" var thisScriptTag = scripts[scripts.length-1];\n",
" var parentDiv = thisScriptTag.parentNode;\n",
" e = document.createElement('div');\n",
" parentDiv.appendChild(e);\n",
" } else {\n",
" e = element[0];\n",
" }\n",
"\n",
" if (typeof requirejs !== 'undefined') {\n",
" // disable load timeout; ipython_app.js is large and can take a while to load.\n",
" requirejs.config({waitSeconds: 0});\n",
" }\n",
"\n",
" require(['//dato.com/files/canvas/1.8.4/js/ipython_app.js'], function(IPythonApp){\n",
" var app = new IPythonApp();\n",
" app.attachView('sgraph','View', {\"edges_labels\": [\"yes\", \"no\", \"yes\", \"no\", \"no\", \"yes\"], \"selected_variable\": {\"name\": [\"<SGraph>\"], \"view_file\": \"sgraph\", \"view_component\": \"View\", \"view_params\": {\"elabel_hover\": false, \"vertex_positions\": null, \"h_offset\": 0.0, \"node_size\": 300, \"ecolor\": [0.37, 0.33, 0.33], \"elabel\": \"value\", \"arrows\": true, \"ewidth\": 1, \"vlabel\": \"__repr__\", \"highlight_color\": [0.69, 0.0, 0.498], \"vcolor\": [0.522, 0.741, 0.0], \"vlabel_hover\": false, \"highlight\": {\"0\": [0.69, 0.0, 0.48], \"1\": [0.039, 0.55, 0.77], \"3\": [1.0, 0.33, 0.0], \"5\": [0.039, 0.55, 0.77], \"6\": [1.0, 0.33, 0.0]}, \"v_offset\": 0.03}, \"view_components\": [\"View\"], \"type\": \"SGraph\", \"descriptives_links\": {\"edges\": \"edges\", \"vertices\": \"vertices\"}, \"descriptives\": {\"edges\": 6, \"vertices\": 7}}, \"positions\": null, \"error_type\": 0, \"vertices\": [5, 0, 2, 6, 3, 1, 4], \"vertices_labels\": [\"-0.12\", \"x3=1\", \"x1<1.0\", \"0.12\", \"0.12\", \"-0.12\", \"x2=1\"], \"edges\": [[0, 2], [0, 1], [2, 3], [2, 4], [4, 5], [4, 6]], \"ipython\": true, \"error_msg\": \"\"}, e);\n",
" });\n",
" })();\n",
" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"decision_tree_model.show(view=\"Tree\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
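Note on the hunk above: the depth of 3 can be read off the SGraph payload embedded in the output of `decision_tree_model.show(view="Tree")`. The sketch below copies the `vertices`, `vertices_labels`, and `edges` arrays from that payload and assumes that labels are aligned with vertex ids by position (an assumption, though it is consistent with every internal node carrying a split label and every leaf carrying a numeric score). The `depth` helper is hypothetical.

```python
# Vertex ids, labels, and edges copied from the SGraph payload in the cell output.
vertices = [5, 0, 2, 6, 3, 1, 4]
labels = ["-0.12", "x3=1", "x1<1.0", "0.12", "0.12", "-0.12", "x2=1"]
edges = [[0, 2], [0, 1], [2, 3], [2, 4], [4, 5], [4, 6]]

# Assumption: labels[i] belongs to vertices[i].
label_of = dict(zip(vertices, labels))

# Build a parent -> children adjacency map.
children = {}
for parent, child in edges:
    children.setdefault(parent, []).append(child)

# The root is the only vertex that never appears as a child.
child_ids = {child for _, child in edges}
root = next(v for v in vertices if v not in child_ids)


def depth(node):
    """Number of split nodes on the longest root-to-leaf path."""
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(depth(k) for k in kids)


print(label_of[root], depth(root))
```

Under that alignment the root is the `x3=1` node, and the longest path passes through three splits (`x3=1`, `x1<1.0`, `x2=1`).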
@@ -61,6 +380,26 @@
"<!--TEASER_END-->"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0\n"
]
}
],
"source": [
"# Accuracy on the training data\n",
"print(decision_tree_model.evaluate(x)['accuracy'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
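Note on the hunk above: the reported accuracy of 1.0 can be reproduced by walking the rendered tree by hand. The sketch below hard-codes the splits shown in the tree view and rests on two assumptions that the notebook does not state: that the "yes"/"no" edge labels are aligned with the edge list by position, and that a positive leaf score (0.12) means class +1 while a negative one (-0.12) means class -1. The `predict` function is hypothetical, not a GraphLab Create API.

```python
# Rows as (x1, x2, x3, y), copied from the SFrame in this notebook.
rows = [
    (1, 1, 1, 1),
    (0, 1, 0, -1),
    (1, 0, 1, -1),
    (0, 0, 1, 1),
]


def predict(x1, x2, x3):
    """Follow the splits of the rendered tree (assumed reading, see note)."""
    if x3 == 1:
        if x1 < 1.0:
            return 1                       # leaf score 0.12
        return 1 if x2 == 1 else -1        # leaf scores 0.12 / -0.12
    return -1                              # leaf score -0.12


correct = sum(predict(x1, x2, x3) == y for x1, x2, x3, y in rows)
accuracy = correct / float(len(rows))
print(accuracy)
```

All four training rows land in a leaf whose sign matches their label, which agrees with the value printed by `evaluate`. Because the model is scored on the same rows it was trained on, this is training accuracy only.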