Updated Seeing The Forest For The Trees and 2770 other files
[email protected] authored and Siteleaf committed Nov 10, 2023
1 parent 8b69f69 commit 8310dd2
Showing 2,771 changed files with 1,465 additions and 587 deletions.
437 changes: 0 additions & 437 deletions LICENCE

This file was deleted.

84 changes: 84 additions & 0 deletions _drafts/seeing-the-forest-for-the-trees.markdown
@@ -0,0 +1,84 @@
---
title: Seeing the Forest for the Trees
date: 2023-11-10 09:28:00 Z
summary: An introduction to the random forest machine learning model.
author: jstrong
---

I recently embarked on a journey into the world of machine learning by following the [fast.ai](https://course.fast.ai/) course taught by Jeremy Howard as part of an internal study group. I have learnt a great deal about the inner workings of neural networks and how deep learning can produce seemingly magical results. However, one of the most interesting discoveries for me was learning about a completely different type of model: the Random Forest.

## Starting from the Roots

In order to explain what a random forest is, it would be beneficial to highlight what it is made up of: Decision Trees. Decision trees are simple structures which go through a dataset and pose yes or no questions about its content. The data is split according to the answers to these questions and further questions are asked to split up the data into ever smaller subsets.

<div align="center">
<img src="/uploads/decision_tree.svg" width="400" height="400" title="Basic Decision Tree Diagram" alt="Basic Decision Tree Diagram"/>
</div>

From asking these binary questions, the decision tree allows us to get an idea of which features split the dataset most effectively. Most often, the goal is to predict a target feature of the dataset based on the rest. An example is predicting passenger survival on the Titanic. We can ask whether the passenger was male or female, whether they were in first class, or which port they embarked from. The effectiveness of each split may be measured by how well each side fits the target variable. If we split by gender, how many men and how many women survived, and therefore how accurate would a prediction based on gender alone be? Further questions aim to increase this accuracy to give a better model.

<div align="center">
<img src="/uploads/example_tree.svg" width="400" height="400" title="Titanic Example Decision Tree Diagram" alt="Titanic Example Decision Tree Diagram"/>
</div>
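
To make the idea of measuring a split concrete, here is a minimal sketch in plain Python. The eight passenger records are made up for illustration (they are not the real Titanic data), and `split_accuracy` is a hypothetical helper name: it scores a yes/no question by letting each side of the split predict its most common outcome.

```python
from collections import Counter

# Made-up passenger records for illustration only (not the real Titanic data).
passengers = [
    {"sex": "female", "first_class": True,  "survived": 1},
    {"sex": "female", "first_class": False, "survived": 1},
    {"sex": "female", "first_class": False, "survived": 0},
    {"sex": "male",   "first_class": False, "survived": 1},
    {"sex": "male",   "first_class": True,  "survived": 0},
    {"sex": "male",   "first_class": False, "survived": 0},
    {"sex": "male",   "first_class": False, "survived": 0},
    {"sex": "female", "first_class": True,  "survived": 1},
]


def split_accuracy(rows, question):
    """Accuracy when each side of a yes/no split predicts its majority outcome."""
    yes = [r["survived"] for r in rows if question(r)]
    no = [r["survived"] for r in rows if not question(r)]
    correct = 0
    for side in (yes, no):
        if side:
            # The most common outcome on this side is that side's prediction.
            correct += Counter(side).most_common(1)[0][1]
    return correct / len(rows)


by_sex = split_accuracy(passengers, lambda r: r["sex"] == "female")
by_class = split_accuracy(passengers, lambda r: r["first_class"])
print(by_sex, by_class)  # 0.75 0.625
```

On this toy data the gender question separates survivors better than the first-class question, so a tree would ask it first. Real implementations usually score splits with measures such as Gini impurity rather than raw accuracy, but the principle is the same.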

There are several parameters to consider when constructing a decision tree in order to get the best results:

* **Minimum Sample Size**: The smallest number of data points allowed in a tree node, below which no further splits can be made.

* **Maximum Depth**: The limit on the number of layers of the tree.

* **Maximum Features**: The most features to consider when splitting the data - sometimes considering all the features is not favourable.

* **Maximum Leaf Nodes**: The cap on the number of leaf (end) nodes, which limits how many times the tree can split off into different directions.

The above parameters can be chosen through trial and error or through a more systematic method such as a grid search with cross-validation. Here, many combinations of the parameters are used to build decision trees, each tree is scored on portions of the data held back from training, and the best parameters are those which optimise a particular metric such as accuracy.
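
As a rough sketch of how cross-validation can pick a parameter, the plain-Python example below tunes only the maximum depth of a tiny one-feature regression tree. Everything here is an assumption made for illustration: the synthetic step-shaped data, the helper names (`build_tree`, `predict`, `cross_val_mse`) and the use of mean squared error as the metric. A library would do the same job far more efficiently.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, max_depth):
    """Fit a tiny 1-D regression tree; points is a list of (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(xs) < 2:
        return mean(y for _, y in points)  # leaf: predict the average target
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        error = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or error < best[0]:
            best = (error, threshold, left, right)
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, max_depth - 1),
            build_tree(right, max_depth - 1))


def predict(tree, x):
    # Internal nodes are tuples; leaves are plain numbers.
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def cross_val_mse(points, max_depth, k=4):
    """Average held-out mean squared error over k folds."""
    folds = [points[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [p for j, fold in enumerate(folds) if j != i for p in fold]
        tree = build_tree(training, max_depth)
        scores.append(mean((predict(tree, x) - y) ** 2 for x, y in held_out))
    return mean(scores)


# Synthetic data: a noisy step at x = 0.5 (an assumption for this sketch).
rng = random.Random(0)
points = [(i / 40, (1.0 if i >= 20 else 0.0) + rng.gauss(0, 0.1))
          for i in range(40)]
rng.shuffle(points)

# The "grid" here is a single parameter; real searches combine several.
scores = {depth: cross_val_mse(points, depth) for depth in (0, 1, 2, 4)}
best_depth = min(scores, key=scores.get)
print(scores, best_depth)
```

A depth of zero (always predicting the overall average) scores far worse than any tree that is allowed to split, which is exactly the kind of comparison the search automates.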

Programming a decision tree on your own may be an enlightening task, but there are plenty of libraries out there should you want to get started more quickly. One such library is *scikit-learn* for Python, which provides a *DecisionTreeClassifier* object that can be fitted to a dataset and then used to make predictions.

Whilst a decision tree is useful and can itself be an accurate model, combining trees brings about even better results.

## Branching out

One example of using decision trees together is the random forest model. It is an ensemble model, so called because it is constructed from a number of smaller models, in this case decision trees.

One may construct a random forest as follows:

1. Construct a decision tree on a random subset of the data and a random subset of the features of the data (i.e. a sample of the rows *and* columns of a tabular dataset)

2. Repeat many times

3. Calculate the average of all the predictions of the decision trees
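
The three steps above can be sketched in plain Python as follows. This is a toy version under stated assumptions rather than how any library implements it: the dataset is synthetic (the target is 1 when the first two of three random features sum to more than 1), rows are sampled with replacement, and the helper names (`build_tree`, `build_forest`, `predict_forest`) are my own.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, features, depth):
    """rows: list of (feature_list, target). Returns nested tuples or a leaf value."""
    best = None
    if depth > 0:
        for f in features:
            # Dropping the largest value keeps the right-hand side non-empty.
            for threshold in sorted({x[f] for x, _ in rows})[:-1]:
                left = [r for r in rows if r[0][f] <= threshold]
                right = [r for r in rows if r[0][f] > threshold]
                error = sse([t for _, t in left]) + sse([t for _, t in right])
                if best is None or error < best[0]:
                    best = (error, f, threshold, left, right)
    if best is None:
        return mean([t for _, t in rows])  # leaf: share of positive targets
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree(left, features, depth - 1),
            build_tree(right, features, depth - 1))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree


def build_forest(rows, n_trees, n_features, depth, rng):
    all_features = range(len(rows[0][0]))
    forest = []
    for _ in range(n_trees):                              # step 2: repeat
        sample = rng.choices(rows, k=len(rows))           # step 1: random rows
        features = rng.sample(all_features, n_features)   # step 1: random columns
        forest.append(build_tree(sample, features, depth))
    return forest


def predict_forest(forest, x):
    # Step 3: average the predictions of all the trees.
    return mean([predict_tree(tree, x) for tree in forest])


rng = random.Random(42)
rows = []
for _ in range(200):
    x = [rng.random(), rng.random(), rng.random()]
    rows.append((x, 1 if x[0] + x[1] > 1 else 0))
train, test = rows[:150], rows[150:]

forest = build_forest(train, n_trees=25, n_features=2, depth=3, rng=rng)
hits = sum((predict_forest(forest, x) > 0.5) == bool(t) for x, t in test)
accuracy = hits / len(test)
print(accuracy)
```

Each tree sees only two of the three columns and a resampled set of rows, so no single tree is especially good, yet their averaged vote is what gets used as the prediction.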

The final average will be the prediction of the random forest. It is generally much more accurate than lone decision trees. But why is this?

<div align="center">
<img src="/uploads/tree_to_forest.svg" width="400" height="400" title="Decision Trees to Random Forest Prediction Diagram" alt="Decision Trees to Random Forest Prediction Diagram"/>
</div>

The steps above constitute the process of 'bagging' (short for bootstrap aggregating). Bagging utilises the fact that each tree is trained on a different, random sample of the data. Because of this, each tree's errors are largely unrelated to the others', that is to say they are close to uncorrelated. In theory, this implies that the errors tend to cancel out, so their average approaches zero as more trees are added! Practically, this means we can produce a more accurate model by combining many less accurate models - an amazing ability.
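
A quick simulation illustrates why averaging helps. It is an idealised sketch, not a real forest: each "tree" is simply the true value plus an independent, zero-mean random error, which is the assumption the argument rests on. The numbers (`truth`, the noise level, 100 trees) are arbitrary choices for the demonstration.

```python
import random

rng = random.Random(1)
truth = 10.0


def tree_prediction():
    """Stand-in for one tree: the truth plus an independent, zero-mean error."""
    return truth + rng.gauss(0, 2.0)


def forest_prediction(n_trees=100):
    """Average of many independent 'tree' predictions."""
    return sum(tree_prediction() for _ in range(n_trees)) / n_trees


trials = 500
single_error = sum(abs(tree_prediction() - truth) for _ in range(trials)) / trials
forest_error = sum(abs(forest_prediction() - truth) for _ in range(trials)) / trials
print(single_error, forest_error)
```

The averaged prediction lands much closer to the truth than a typical single prediction. In a real forest the trees share training data, so their errors are only partly independent and the improvement is smaller than in this idealised case.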

The main advantage random forests have over decision trees is that they are more accurate and less prone to overfitting. Another benefit is that by looking at the effect of features across all the trees used together in a forest, one can determine feature importances and get a better idea of the significance of each facet of a dataset. However, not every aspect of random forests is green and verdant; they do come with disadvantages:

* Decision trees and random forests are poor at extrapolating outside the input data due to their inherent reliance on averages to make predictions

* The ensemble nature of random forests makes them harder to interpret than decision trees, where splits can be followed through and understood readily
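
The extrapolation problem is easy to demonstrate. The sketch below, with hypothetical helper names and a deliberately simple dataset (`y = 2x` for `x` from 0 to 10), fits a small one-feature regression tree and then asks it about a value well outside the training range.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, depth):
    """Tiny 1-D regression tree; leaves hold the average of their targets."""
    xs = sorted({x for x, _ in points})
    if depth == 0 or len(xs) < 2:
        return mean([y for _, y in points])
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        error = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or error < best[0]:
            best = (error, threshold, left, right)
    _, threshold, left, right = best
    return (threshold, build_tree(left, depth - 1), build_tree(right, depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


points = [(x, 2 * x) for x in range(11)]   # training data: y = 2x on 0..10
tree = build_tree(points, depth=3)

at_edge = predict(tree, 10)    # largest x seen in training
outside = predict(tree, 20)    # far outside the training range; true value is 40
print(at_edge, outside)
```

The prediction for `x = 20` is identical to the one for `x = 10`, because both fall into the same right-most leaf and a leaf can only return an average of targets it has already seen. Averaging many such trees in a forest does not change this.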

<div align="center">
<img src="/uploads/feature_importance.svg" width="400" height="400" title="Titanic Dataset Feature Importance Bar Chart" alt="Titanic Dataset Feature Importance Bar Chart"/>
</div>

The random forest is a brilliant machine-learning model which is very effective at what it does, but it is not applicable to or suitable for every situation.

## Sprouting Anew

Random forests have been around a while - [Leo Breiman](https://en.wikipedia.org/wiki/Leo_Breiman) formalised the definition and in fact coined the term in an eponymous paper in 2001! Other machine learning models have long since overtaken random forests in popularity - and for good reason. Two more modern models are neural networks/deep learning and gradient boosters.

Neural networks benefit from vast amounts of data, which we continue to acquire in ever greater quantities, whereas performance gains from random forests begin to stagnate after a certain level. Neural networks are also generally more flexible and versatile than random forests. They can be used on all types of data, including images and language data, whereas random forests focus on tabular data. Another point to consider is that random forests, given that they consist of trees, are rules-based models and not all problems can be generalised or thought of in terms of trees. As a result, a function-based model such as a neural network performs much better in such cases.

Gradient boosters are often used instead of random forests for machine learning applications concerning tabular data as they are generally more accurate. They are also based on decision trees, but instead of using bagging they work on a principle of error correction: each new tree is trained on the errors of the trees before it, improving the final model. Popular examples of gradient boosting models include [CatBoost](https://catboost.ai/) and [XGBoost](https://www.nvidia.com/en-us/glossary/data-science/xgboost/).
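
The error-correction idea can be sketched with one-split trees ("stumps") in plain Python. This is a bare-bones least-squares boosting loop written for illustration; it is not how CatBoost or XGBoost are implemented, and the data (`y = x²`), the learning rate and the helper names are all assumptions of the sketch.

```python
def mean(values):
    return sum(values) / len(values)


def fit_stump(xs, residuals):
    """Best single split on one feature; returns (threshold, left_value, right_value)."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        error = (sum((r - lv) ** 2 for r in left)
                 + sum((r - rv) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, lv, rv)
    return best[1:]


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Start from the average, then repeatedly fit a stump to the current errors."""
    predictions = [mean(ys)] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, lv, rv = fit_stump(xs, residuals)
        # Nudge every prediction towards correcting its own error.
        predictions = [p + learning_rate * (lv if x <= threshold else rv)
                       for x, p in zip(xs, predictions)]
    return predictions


def mse(ys, predictions):
    return mean([(y - p) ** 2 for y, p in zip(ys, predictions)])


xs = [i / 10 for i in range(30)]
ys = [x * x for x in xs]

baseline = mse(ys, [mean(ys)] * len(ys))
after_1 = mse(ys, boost(xs, ys, n_rounds=1))
after_20 = mse(ys, boost(xs, ys, n_rounds=20))
print(baseline, after_1, after_20)
```

Unlike bagging, where trees are built independently and averaged, each stump here depends on everything that came before it, and the training error shrinks round by round.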

Despite its shortcomings compared to other more widely used models, the random forest remains popular - especially for its use in studying machine learning. It is built up from the easy-to-understand decision tree, and the reasons for its increased accuracy are intuitive, which lends it to use in AI education material. Aside from its pedagogical uses, the random forest model has been used in applications from fraud detection at banks, to medical imaging, to body part movement detection in the [Kinect](https://en.wikipedia.org/wiki/Kinect).

## Planting Seeds

I hope that I have provided an adequate primer on decision trees, random forests, and how they work. There is much more to learn, and I would recommend the [book](https://course.fast.ai/Resources/book.html) which complements the fast.ai course to fill in the gaps (chapter 8 in particular for random forests). Another avenue to look into is entering [Kaggle](https://www.kaggle.com/) competitions which will allow you to practise machine learning skills with real datasets and to cement knowledge through doing rather than simply reading or watching.
@@ -1,18 +1,24 @@
---
title: "Testing with Intent: a Path to Embedded Accessibility"
title: 'Testing with Intent: a Path to Embedded Accessibility'
date: 2023-11-06 09:45:00 Z
categories:
- Tech
layout: default_post
tags:
- Testing
- Testing Library
- Automation Testing
- Testing with Intent
- Accessibility
- Embedded Accessibility
summary: "In this post, I explore an approach to testing called Testing with Intent. I look what the approach is—testing from the perspective of a user intending to do something—and the positive impacts it can have on both testing and accessibility. I've written this for a broad audience, so I've steered clear of technical details included. Instead, you should come away with an understanding of why this topic is important and how you can benefit from adopting the approach."
summary: In this post, I explore an approach to testing called Testing with Intent.
I look what the approach is—testing from the perspective of a user intending to
do something—and the positive impacts it can have on both testing and accessibility.
I've written this for a broad audience, so I've steered clear of technical details
included. Instead, you should come away with an understanding of why this topic
is important and how you can benefit from adopting the approach.
author: sgladstone
image: "/uploads/Testing%20with%20intent%20-%20a%20path%20to%20embedded%20accessibility_.png"
layout: default_post
---

*Embedded Accessibility* is a vision of building accessible products by default. We can consider accessibility embedded when it no longer needs to be prioritised because it is already at the core of the delivery process.
12 changes: 9 additions & 3 deletions _posts/2023-11-06-testing-with-intent-a-technical-view.md
@@ -1,19 +1,25 @@
---
title: "Testing with Intent: a Technical View"
title: 'Testing with Intent: a Technical View'
date: 2023-11-06 09:55:00 Z
categories:
- Tech
layout: default_post
tags:
- Testing
- Testing Library
- Automation Testing
- Testing with Intent
- Accessibility
- Embedded Accessibility
summary: "In my previous post, I introduced and approach to testing called Testing with Intent. Essentially, the approach focuses on testing from the perspective of a user intending to do something. Adopting this approach brings you benefits in both your test suites and your products accessibility. That post discussed why the topic is important and how you can benefit if you adopt it. Now, it’s time to look at the technical side of how this actually works in practice. "
layout: default_post
summary: 'In my previous post, I introduced and approach to testing called Testing
with Intent. Essentially, the approach focuses on testing from the perspective of
a user intending to do something. Adopting this approach brings you benefits in
both your test suites and your products accessibility. That post discussed why the
topic is important and how you can benefit if you adopt it. Now, it’s time to look
at the technical side of how this actually works in practice. '
author: sgladstone
---

In [my first post]({{ site.github.url }}/2023/11/06/testing-with-intent-a-path-to-embedded-accessibility.html), I set out why I think we should be *Testing with Intent*. I set out that, if we focus our tests on the intentions of users, we can improve our test suites and start to tackle accessibility. To keep the content accessible to everyone, I chose to not include anything technical. Now, in this post, I’m going to look at the same subject but through a technical lens.

The essence of this whole approach to testing can be boiled down to one simple golden rule: “Wherever possible, use `queryByRole`". We’ll take a look at what we mean by this rule, and start to unpack its consequences.
5 changes: 4 additions & 1 deletion _posts/2023-11-07-understand-your-data-requirements.markdown
@@ -1,13 +1,16 @@
---
title: Understand your data requirements
date: 2023-11-07 00:00:00 Z
categories:
- Data Engineering
tags:
- Data Warehouse
- CDP
- Big Data
- Data Strategy
summary: This blog discusses the different data requirements that exist in a typical organisation and provides some suggestions over how to classify them and match them to technologies
summary: This blog discusses the different data requirements that exist in a typical
organisation and provides some suggestions over how to classify them and match them
to technologies
author: dhope
---

@@ -8,9 +8,9 @@ tags:
- sustainable software
- Sustainability
- Tech
summary: 'Part of the Conscientious Computing series this blog talks about the emerging
summary: Part of the Conscientious Computing series this blog talks about the emerging
ecosystem of organisations that are promoting sustainability within software development,
cloud computing, infrastructure, and digital services.'
cloud computing, infrastructure, and digital services.
author: ocronk
image: "/uploads/greensoftware-ecosystem-024a11.png"
contributors: jhowlett
Binary file modified _uploads/7 things tn.jpg
Binary file added _uploads/7+things+tn.jpg
Binary file modified _uploads/Accessability tooling.png
Binary file added _uploads/Accessability+tooling.png
Binary file modified _uploads/Accessibility considerations blog-cdbb23.png
Binary file modified _uploads/Accessibility considerations blog.png
Binary file added _uploads/Accessibility+considerations+blog.png
Binary file modified _uploads/An introduction for TanStack.png
Binary file added _uploads/An+introduction+for+TanStack.png
Binary file modified _uploads/BeyondTheHype - blue - episode 1 -social.png
Binary file modified _uploads/BeyondTheHype - pink and blue - episode 6 - social.png
Binary file modified _uploads/BeyondTheHype - pink and pink - episode 2 - social.png
Binary file modified _uploads/Celebrating failure quote.png
Binary file added _uploads/Celebrating+failure+quote.png
Binary file modified _uploads/Conscientious computing.png
Binary file added _uploads/Conscientious+computing.png
Binary file modified _uploads/Could PS solve the OSS Sustainability Challenges.png
Binary file modified _uploads/Dorosmall.png
Binary file modified _uploads/Dynamically Skipping Tests within Jest.png
Binary file modified _uploads/Elevating software thumbnial.png
Binary file modified _uploads/Enhancing Jest Snapshot.png
Binary file modified _uploads/Environmental Impact blog.png
Binary file modified _uploads/FCDO-6x4.jpg
Binary file modified _uploads/Frank-Hubin---The-Product-Owner-Role.jpg
Binary file modified _uploads/GenAI-Arch-Image.jpg
Binary file modified _uploads/Glen Ocsko.png
Binary file modified _uploads/Group photo.png
Binary file modified _uploads/How AI can improve processes.png
Binary file modified _uploads/How I Reduced My App's Network Usage by 95%_.png
Binary file modified _uploads/IMG_6644.png
Binary file modified _uploads/LLM Thumbnail.png
Binary file modified _uploads/Mental Models Thumbnail.png
Binary file modified _uploads/MicrosoftTeams-image (1).png
Binary file modified _uploads/MicrosoftTeams-image (10).png
Binary file modified _uploads/MicrosoftTeams-image (5).png
Binary file modified _uploads/MicrosoftTeams-image (6)-26ac64.png
Binary file modified _uploads/MicrosoftTeams-image (6).png
Binary file modified _uploads/MicrosoftTeams-image (7)-8341bc.png
Binary file modified _uploads/MicrosoftTeams-image (7).png
Binary file modified _uploads/MicrosoftTeams-image (8).png
Binary file modified _uploads/MicrosoftTeams-image (9).png
Binary file modified _uploads/Police-vehicles-in-Pride-livery-905px.jpg
Binary file modified _uploads/Pride-flag-social.jpg
Binary file modified _uploads/Rules-help-you-go-faster-social-post.jpg
Binary file modified _uploads/Scottbot tn.png
Binary file modified _uploads/Scottlogic---Josh-Warren.jpg
Binary file modified _uploads/Scottlogic---Social-media-cards---Robat-Williams.jpg
Binary file modified _uploads/Scottlogic---Social-media-cards-copy_Design-4.jpg
Binary file modified _uploads/Scottlogic---Social-media-cards-copy_Design-5.jpg
Binary file modified _uploads/Scottlogic---Social-media-cards_Ged-Smith.jpg
Binary file modified _uploads/Screenshot 2022-11-25 141537.png
Binary file modified _uploads/Screenshot 2022-11-25 142845.png
Binary file modified _uploads/Social-Card-accessibility-post.jpg
Binary file modified _uploads/SustainabilityVennDiagramBranded.png
Binary file modified _uploads/The power of a well written user story-396f37.png
Binary file modified _uploads/The power of a well written user story.png
Binary file modified _uploads/Tools for measuring cloud.png
Binary file modified _uploads/Top-Ten-Tips---Dave-Ogle.jpg
Binary file modified _uploads/WAVE Dashboard-2b0986.png
Binary file modified _uploads/WAVE Dashboard.png
Binary file modified _uploads/WAVEDashboard.png
Binary file modified _uploads/WAVEIcons.png
Binary file modified _uploads/WAVEStructureTab.png
Binary file modified _uploads/WaveErrorIcons.png
Binary file modified _uploads/Web tech landscape - simplified.png
Binary file modified _uploads/What are WebTokens_.png
Binary file modified _uploads/You only have to come out once- tn.png
Binary file modified _uploads/aXeDashboard.png
Binary file modified _uploads/aXeErrorTypes.png
Binary file modified _uploads/acccessibility_temporary_permanent.png
Binary file modified _uploads/accessibility-world-map.jpg
Binary file modified _uploads/allyship.jpeg
Binary file modified _uploads/applause-button.png
Binary file modified _uploads/being honest about embodied carbon from sd.png
Binary file modified _uploads/blockchain-architecture.png
Binary file modified _uploads/cory.png
Binary file modified _uploads/david neal.png