Commit

Deploying to gh-pages from @ 241c50c 🚀
mitjapotocin committed Oct 26, 2023
1 parent e027d5e commit 12a2696
Showing 1,016 changed files with 16,526 additions and 16,499 deletions.
78 changes: 45 additions & 33 deletions 404.html

Large diffs are not rendered by default.

78 changes: 45 additions & 33 deletions 404/index.html

Large diffs are not rendered by default.

File renamed without changes.
1 change: 1 addition & 0 deletions _next/data/2TRr12l_7QnMCbhAgxpBy/home/orange-users.json
@@ -0,0 +1 @@
{"pageProps":{"title":"Orange users","weight":40,"mdxSource":{"compiledSource":"/*@jsxRuntime automatic @jsxImportSource react*/\nconst {Fragment: _Fragment, jsx: _jsx, jsxs: _jsxs} = arguments[0];\nconst {useMDXComponents: _provideComponents} = arguments[0];\nfunction _createMdxContent(props) {\n const _components = Object.assign({\n h2: \"h2\",\n p: \"p\"\n }, _provideComponents(), props.components), {Figure} = _components;\n if (!Figure) _missingMdxReference(\"Figure\", true);\n return _jsxs(_Fragment, {\n children: [_jsx(_components.h2, {\n children: \"Education in Data Science\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"Orange is the perfect tool for hands-on training. Teachers enjoy the clear program design and the visual explorations of data and models. Students benefit from the flexibility of the tool and the power to invent new combinations of data mining methods. The educational strength of Orange comes from the combination of visual programming and interactive visualizations. We have also designed some educational widgets that have been explicitly created to support teaching.\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"Here are a few example workflows that we have used recently in data mining training (yes, we do not only develop Orange, we teach with it as well).\"\n }), \"\\n\", _jsx(_components.h2, {\n children: \"Linear Regression\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"Wouldn't it be great if we could just paint data and with each new data point observe how linear regression fits the line? 
In Orange, there's a widget for data painting and a polynomial regression widget (from the educational add-on) to display the fitted model.\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/linear-regression.thumb.png\",\n src: \"Paint the data, fit the model, check out the model coefficients.\",\n width: \"555\",\n height: \"450\",\n src: \"/home/[slug]/orange_users/__optimized-images__/linear-regression.thumb.png\"\n }), \"\\n\", _jsx(_components.h2, {\n children: \"Overfitting\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"Not everything is a line. We can use linear regression on augmented data input with added columns for powers of input features. This is called polynomial regression. It is guaranteed to surprise students. Using a linear model you can now discover non-linear functions. But you can also heavily overfit the training data. For say two, three ... or ten input data points, what is the degree of polynomial expansion for the linear model to perfectly fit the data? When linear models overfit, model coefficients become very high. It is so easy to play with this in Orange: add some data here, raise or lower the degree of polynomial there...\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/overfitting-poly.thumb.png\",\n src: \"Polynomial expansion of input data can lead to interesting data fits, but also to overfitting.\",\n width: \"555\",\n height: \"498\",\n src: \"/home/[slug]/orange_users/__optimized-images__/overfitting-poly.thumb.png\"\n }), \"\\n\", _jsx(_components.h2, {\n children: \"Regularization\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"If overfitting leads to the explosion of the values of the coefficients, it could easily be prevented by requesting the optimization to keep these low. That is exactly the idea behind regularization. In the workflow below, Polynomial Regression was given a regularized model. No more overfitting! 
It is also great to explore how the strength of regularization smooths the resulting model and reduces the values of coefficients.\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/regularization.thumb.png\",\n src: \"Regularization smooths the model and reduces the value of model coefficients.\",\n width: \"555\",\n height: \"529\",\n src: \"/home/[slug]/orange_users/__optimized-images__/regularization.thumb.png\"\n }), \"\\n\", _jsx(_components.h2, {\n children: \"Always Evaluate Models on the Test Data\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"The following workflow is a bit more complex. We split the painted data into a training and a test set. This time, as we use standard Orange widgets for learning, we need to declare that y is a class attribute. We do this in the Select Columns widget. We then evaluate the linear regression model with polynomial expansion on the training data and on the separate test data.\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/test-on-test-workflow.thumb.png\",\n src: \"A workflow to experiment with model scoring on train and test data sets.\",\n width: \"555\",\n height: \"543\",\n src: \"/home/[slug]/orange_users/__optimized-images__/test-on-test-workflow.thumb.png\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"Overfitted models have small errors on training data, but large errors on test data. Regularization helps to escape this. With it, the error on the test data is lower, while the error on the training data increases. Huh! The error on the training data set is thus not a good indicator of the predictive power of the model. 
While this looks simple, everything in machine learning is about how to design models that will work well on the data they have not seen in training.\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/test-scores.thumb.png\",\n src: \"Model scoring on training and test data sets.\",\n width: \"555\",\n height: \"171\",\n src: \"/home/[slug]/orange_users/__optimized-images__/test-scores.thumb.png\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"When teaching, the workflow presented here needs quite some thought and time. It should come after we explain linear regression, polynomial expansion, overfitting and regularization. But it gives so much freedom for students to explore: consider the interplay of the complexity of the (painted) data set, the degree of polynomial expansion, and the effects of regularization. Plus, it provides us (teachers and trainers) the opportunity to talk about regression scoring and nicely leads to the introduction of cross-validation. Oh, the richness and art of data mining...\"\n }), \"\\n\", _jsx(_components.h2, {\n children: \"Experimenting with k-Means Clustering\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"After the intro on k-means clustering algorithms (there is a widget from the educational add-on to support this), a great exercise for students is to check when the algorithm works and where it fails. Painting the data helps again! Say, for the smiley data set, k-means guided by silhouette scoring finds four clusters instead of three. Orange widgets can be set to automatic commit (Send Automatically) so that every time we change the input data, the signals propagate through the workflow for the users to immediately see the consequences of the changes. 
Can you paint a smiley data set where even clustering with k=3 would fail?\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/play-with-k-means.thumb.png\",\n src: \"Clustering, silhouette scoring and visualization of results.\",\n width: \"555\",\n height: \"462\",\n src: \"/home/[slug]/orange_users/__optimized-images__/play-with-k-means.thumb.png\"\n }), \"\\n\", _jsx(_components.h2, {\n children: \"Scoring of Clustering Models\"\n }), \"\\n\", _jsx(_components.p, {\n children: \"We did mention the clustering silhouette, right? It is the easiest approach to score the clustering. Silhouettes are estimated on data instances, and the silhouette of a clustering is the mean across data instance silhouettes. A high silhouette means that a data instance is surrounded by instances from the same cluster, while a low silhouette score indicates that a data instance is close to another cluster. Orange has a widget that can plot the silhouette scores. And because Orange is all about interactive visualization, you can select silhouettes and check where their data instances are. Like in the workflow below, where we showcase that low silhouettes are assigned to borderline data instances. Silhouette Plot is great when explaining pros and cons of different clustering methods (yes, it works with any method, not just k-means).\"\n }), \"\\n\", _jsx(Figure, {\n src: \"/home/[slug]/orange_users/silhouette.thumb.png\",\n src: \"Like most visualizations in Orange, Silhouette Plot is interactive as well.\",\n width: \"555\",\n height: \"322\",\n src: \"/home/[slug]/orange_users/__optimized-images__/silhouette.thumb.png\"\n })]\n });\n}\nfunction MDXContent(props = {}) {\n const {wrapper: MDXLayout} = Object.assign({}, _provideComponents(), props.components);\n return MDXLayout ? 
_jsx(MDXLayout, Object.assign({}, props, {\n children: _jsx(_createMdxContent, props)\n })) : _createMdxContent(props);\n}\nreturn {\n default: MDXContent\n};\nfunction _missingMdxReference(id, component) {\n throw new Error(\"Expected \" + (component ? \"component\" : \"object\") + \" `\" + id + \"` to be defined: you likely forgot to import, pass, or provide it.\");\n}\n","frontmatter":{},"scope":{}}},"__N_SSG":true}
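
The page text in this hunk describes polynomial expansion, overfitting, and regularization as experiments done in Orange widgets. As a rough illustration outside Orange, the same effect can be sketched with NumPy alone. Everything here is an assumption made for the example, not taken from the commit: the helper names (`design_matrix`, `fit_polynomial`, `rmse`), the synthetic data, and the use of an L2 (ridge) penalty as the form of regularization.

```python
import numpy as np


def design_matrix(x, degree):
    """Polynomial expansion: columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_polynomial(x, y, degree, alpha=0.0):
    """Least-squares fit with an L2 (ridge) penalty of strength alpha.

    The penalty covers every coefficient except the intercept;
    alpha=0 gives ordinary linear regression on the expanded features.
    """
    X = design_matrix(x, degree)
    penalty = np.sqrt(alpha) * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    # Stacking the penalty rows under X turns ridge into a plain
    # least-squares problem: minimize ||Xw - y||^2 + alpha * ||w[1:]||^2.
    A = np.vstack([X, penalty])
    b = np.concatenate([y, np.zeros(degree + 1)])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef


def rmse(x, y, coef):
    """Root mean squared error of the polynomial model on (x, y)."""
    pred = design_matrix(x, len(coef) - 1) @ coef
    return float(np.sqrt(np.mean((pred - y) ** 2)))


# Synthetic stand-in for "painted" data: a noisy line (hypothetical values).
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 6)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(-0.9, 0.9, 50)
y_test = 2.0 * x_test + rng.normal(scale=0.3, size=x_test.size)

# With n points, degree n - 1 is enough to pass through every point.
degree = x_train.size - 1
plain = fit_polynomial(x_train, y_train, degree, alpha=0.0)
ridge = fit_polynomial(x_train, y_train, degree, alpha=1.0)

for name, coef in [("no regularization", plain), ("ridge, alpha=1", ridge)]:
    print(name)
    print("  train RMSE:", rmse(x_train, y_train, coef))
    print("  test RMSE: ", rmse(x_test, y_test, coef))
    print("  coefficient norm:", float(np.linalg.norm(coef[1:])))
```

The unregularized fit reproduces the six training points almost exactly, while the penalized fit gives up some training accuracy in exchange for smaller coefficients. Whether the test error also drops depends on the data and on `alpha`, so the script prints both scores rather than assuming the outcome.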
File renamed without changes.