Brought to you by Galvanize. Learn more about the way we teach at galvanize.com.
- WIFI: g|Events | Password: islearningcommunity
- A web browser to see what we're working on as others see it (we recommend Google Chrome: [chrome.google.com](http://chrome.google.com))
- We will be using Google Colab for this workshop, so make a Google account if you don't already have one.
- Open this GitHub repo to follow along
A super friendly introduction to Machine Learning. No previous experience expected, but knowing some Python will help!
You can't learn EVERYTHING in ~2 hours, especially when it comes to Machine Learning! But you can learn enough to get excited and comfortable to keep working and learning on your own!
- This course is for absolute beginners
- Ask Questions!
- Answer Questions!
- Help others when you can
- It's OK to get stuck; just ask for help!
- Feel free to move ahead
- Be patient and nice
We're not going to focus on the math behind the models. We're going to focus more on when and how to use a model. If you would like to dig into the math and learn more about each model, I encourage you to do so!
Hello I'm Keenan Olsen. I'm a Technology Evangelist here at Galvanize!
I originally got into Machine Learning by solving a manufacturing problem at my last job with computer vision, and I think it's one of the coolest fields!
Note: I'm not a Galvanize Instructor
- Twitter: @KeenanOlsen
- LinkedIn: Keenan Olsen
- Email: [email protected]
Reach out to me if interested in:
- breaking into the tech industry
- learning resources
- meetup recommendations
- learning more about Galvanize
- giving me suggestions for events!
- being friends
Give a quick Intro!
- What's your name?
- What's your background?
- Why are you interested in Machine Learning?
- Modern web browser
- Google account
Looking at this data, how do we know that regression will be a good choice? Why not classification?
Iris Flower Dataset KNN Workbook
Looking at this data, how do we know that classification will be a good choice? Why not regression?
>>> Iris K-Nearest Neighbors <<<
To put it very simply, Machine Learning can usually be thought of as using a statistical model, built from a dataset, to solve a problem.
Instead of explicitly programming an algorithm to do a specific task, we let it "learn" from data to find patterns and make inferences.
We'll see examples of this soon!
More and more companies that make decisions with data are using machine learning. Here are just a few examples that you've probably experienced as a customer.
- Product Recommendations
- Amazon GO Computer Vision
- Alexa
- Delivery Robots
- Show & Movie Recommendations
- Gmail Spam Filtering
- Google Assistant
- YouTube Content Filtering & Recommendations
- Self Driving Cars
- Siri
- App Store Recommendations
- Face Tagging Detection
These companies use Machine Learning in many other ways!
We talked about some examples above from big companies we probably all know. Here are several more types of applications where machine learning has become popular.
- Cancer Detection
- X-Ray diagnostic
- Smart Doorbells
- Smart Lights
- Security
- NVIDIA’s Hyperrealistic Face Generator
- video game Character or level generation
- art generation
- Crop monitoring & planning
- Sourcing and Shipping Automation
- Quality Assurance
- Design
- Credit cards
- Product listings
You can see how all of these applications revolve around finding patterns in data!
Supervised Learning uses a dataset that is labeled. In this context, imagine having a list of features and a label (group) that those features belong to.
Here we have features (sepal length (cm), etc.) and a label (flower species).
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
5.7 | 2.9 | 4.2 | 1.3 | versicolor |
7.7 | 3.0 | 6.1 | 2.3 | virginica |
We could use a full dataset with data like the above to predict the flower species given only the petal and sepal measurements.
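As a quick sketch, scikit-learn ships the Iris dataset, so you can look at the features and labels yourself. This assumes scikit-learn is installed; the variable names are just illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # features: one row of 4 measurements per flower
y = iris.target  # labels: 0, 1, 2 encode the three species

print(iris.feature_names)  # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(X[0], iris.target_names[y[0]])  # the first flower and its species label
print(X.shape)             # (150, 4): 150 flowers, 4 features each
```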
Another good example of supervised learning is an email spam filter.
Say we have a bunch of emails in our dataset and they all have a label of either `spam` or `not_spam`. We could then train a supervised learning model to look at all of those emails and pick up patterns that show up in the spam emails. There are probably certain words or formatting that repeat themselves. If you've ever looked in your email spam folder, you can probably pick out some of those things yourself!
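Here is a toy sketch of that idea. The emails below are made up, and the choice of word counts (`CountVectorizer`) plus a Naive Bayes classifier (`MultinomialNB`) is an assumption for illustration; a real spam filter would need far more labeled data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy emails and their labels
emails = [
    "win a free prize now",
    "claim your free money today",
    "cheap pills free shipping",
    "meeting moved to tuesday",
    "can you review my code",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "spam", "not_spam", "not_spam", "not_spam"]

# Turn each email into word counts, then learn which words show up in each class
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

predictions = model.predict(["free prize money", "team meeting tomorrow"])
print(predictions)  # ['spam' 'not_spam']
```

The model never gets a hand-written rule like "free means spam"; it picks up that pattern from the labeled examples.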
There are 2 main types of supervised learning: Classification and Regression.
Classification tries to assign the correct label to a new piece of data not containing a label. Both examples above are good examples of classification problems.
A spam filter would look at an email and decide whether it should be labeled `spam` or `not_spam`.
We could be given a new flower measurement and want to label it with the correct species: `setosa`, `versicolor`, or `virginica`.
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 |
These are exactly the measurements of the setosa row in the table above, so a well-trained model should label this flower setosa.
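A minimal sketch of classifying that new flower with scikit-learn's `KNeighborsClassifier`. The choice of k-NN and `n_neighbors=5` are assumptions; any classifier trained on the Iris data would be used the same way with `fit` and `predict`.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Train on every labeled flower in the dataset
model = KNeighborsClassifier(n_neighbors=5)
model.fit(iris.data, iris.target)

# The new, unlabeled measurement: sepal length, sepal width, petal length, petal width
new_flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(new_flower)
print(iris.target_names[prediction[0]])  # setosa
```

Because this measurement matches a setosa flower in the training data, its nearest neighbors are all setosa.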
Instead of predicting a label like classification, Regression predicts a value.
This example has the features `crime rate`, `Zoning`, `rooms`, and `square footage`, and the value `price`.
crime rate | Zoning | rooms | square footage | price |
---|---|---|---|---|
.5 | 3.5 | 5 | 1400 | 100000 |
.2 | 2 | 3 | 3000 | 50000 |
.3 | 4 | 7 | 1800 | 150000 |
Unlike the classification example, where we tried to predict which group the features belonged to, here we want to predict what value the features would have. This could be any number in a continuous range!
Given a list of new features from a house like below, we would then want to find out how much that house is worth by predicting a number value.
crime rate | Zoning | rooms | square footage |
---|---|---|---|
.7 | 4 | 2 | 1000 |
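A sketch of the same idea with scikit-learn's `LinearRegression`, using the three toy rows from the tables above. Three houses are far too few for a real model (and linear regression is just one possible choice); the point is that the output is a number, not a label.

```python
from sklearn.linear_model import LinearRegression

# Toy training rows from the table: crime rate, zoning, rooms, square footage
X_train = [
    [0.5, 3.5, 5, 1400],
    [0.2, 2.0, 3, 3000],
    [0.3, 4.0, 7, 1800],
]
y_train = [100000, 50000, 150000]  # price of each house

model = LinearRegression()
model.fit(X_train, y_train)

# Predict a price for the new house
new_house = [[0.7, 4, 2, 1000]]
predicted_price = model.predict(new_house)
print(predicted_price)  # a single number, not a class label
```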
Some other examples of values to predict:
- Stock price
- Age
This workshop is going to focus on supervised Machine Learning, but we'll talk briefly about some of the other types!
Unsupervised Learning uses a dataset that is not labeled and gains insight into its patterns.
A common way of using unsupervised learning is clustering.
This picture shows an example of visualizing the Iris Dataset we talked about before. We can see that there are features that relate to each species. If we didn't have those labels we could use unsupervised learning to create clusters separating the groups out that would probably look pretty similar to this. We could then add a label to those clusters.
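A minimal clustering sketch using scikit-learn's `KMeans` on the Iris measurements with the species labels thrown away. Asking for 3 clusters is an assumption (we happen to know there are 3 species); the cluster numbers it returns are arbitrary, and naming them is up to you afterwards.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Use only the measurements; the species labels are deliberately ignored
X = load_iris().data

# Ask for 3 clusters; n_init and random_state are set for repeatable results
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(cluster_ids[:10])         # cluster assignment of the first 10 flowers
print(kmeans.cluster_centers_)  # the "average flower" of each cluster
```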
An example to think about is if you have a large dataset of customers. Maybe you would like to segment them out to cluster similar customers.
Uses a mixed dataset of labeled and unlabeled data to train the model, combining supervised and unsupervised machine learning.
Semi-supervised machine learning can be important to look into if you don't have enough labeled data to create a good model. Labeling or acquiring labeled data can be extremely expensive and time-consuming, so developing a model that can use both types of data is super intriguing!
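One way to try this is scikit-learn's `LabelSpreading`, which spreads a few known labels to similar unlabeled points. In this sketch we pretend only every 5th Iris flower is labeled; that split, and the k-nearest-neighbors kernel settings, are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

iris = load_iris()
X, y_true = iris.data, iris.target

# Pretend we could only afford to label every 5th flower; -1 marks "unlabeled"
y_partial = np.full_like(y_true, -1)
y_partial[::5] = y_true[::5]

# Spread the few known labels to nearby unlabeled points
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

inferred = model.transduction_  # a label for every flower, including the unlabeled ones
print("labeled:", (y_partial != -1).sum(), "of", len(y_partial))
print("agreement with true labels:", (inferred == y_true).mean())
```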
Imagine trying to label every piece of information you get from a self driving car! You have a constant video feed, Lidar, and other sensors.
Reinforcement Learning is often used in situations where an algorithm can take an action in an environment and receive a reward based on making a good decision.
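To make the action/environment/reward loop concrete, here is a minimal tabular Q-learning sketch. Everything about it is hypothetical: a made-up corridor of 5 squares where the agent gets a reward only for reaching the last square, and hand-picked learning settings. It only uses NumPy.

```python
import numpy as np

# Hypothetical environment: a corridor of 5 squares; the agent starts at square 0
# and gets a reward of 1 only when it reaches the goal at square 4
N_STATES = 5
ACTIONS = [-1, +1]  # index 0 = move left, index 1 = move right
GOAL = N_STATES - 1

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, len(ACTIONS)))  # estimated future reward per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

for episode in range(200):
    state = 0
    while state != GOAL:
        if rng.random() < epsilon:
            # Explore: try a random action
            action = int(rng.integers(len(ACTIONS)))
        else:
            # Exploit: pick the best-known action, breaking ties randomly
            best = np.flatnonzero(Q[state] == Q[state].max())
            action = int(rng.choice(best))
        next_state = min(max(state + ACTIONS[action], 0), GOAL)
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

# The learned policy: the best action in each non-goal square
policy = ["left" if np.argmax(Q[s]) == 0 else "right" for s in range(GOAL)]
print(policy)
```

Nobody tells the agent to move right; it learns that from the rewards it receives.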
You see a lot of examples of this type of machine learning being used to make computers excellent gamers!
A couple examples:
Deep Learning is a subset of Machine Learning.
It uses layers of artificial neural networks and learns from data by adjusting the weights of the neurons.
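A small sketch of a neural network learning its weights from data. This uses scikit-learn's `MLPClassifier` with a single hidden layer of 10 neurons (an assumed, very shallow setup) rather than a deep learning library like TensorFlow, but the idea of adjusting neuron weights during training is the same.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# One hidden layer of 10 neurons; scaling the inputs helps training converge
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)
model.fit(iris.data, iris.target)

mlp = model[-1]
# The learned weights: 4 inputs -> 10 hidden neurons -> 3 outputs (one per species)
print([w.shape for w in mlp.coefs_])
print("training accuracy:", model.score(iris.data, iris.target))
```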
A Neural Network Playground - TensorFlow is a great place to start tinkering around and learning more about Artificial Neural Networks!
Deep Learning is killing it at recognizing and generating complicated patterns.
- Computer vision (CV)
  - Self Driving cars
  - Amazon Go
- Natural Language Processing (NLP)
  - Alexa
  - Siri
- Generative Adversarial Networks (GANs)
  - NVIDIA’s Hyperrealistic Face Generator
  - video game Character or level generation
  - art generation
For this class we're going to stay focused on Supervised Machine learning. It's a great place to start!
But out of all these what would you like to see a class on next?
Here are some of the common models. Having an idea of what these do and which applications they should be used for is important! I will only briefly go over them, so please read more about them!
Typically used for regression
Generally regression problems predict a value on a continuous spectrum
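A minimal linear regression sketch with scikit-learn. The data is hypothetical: points that roughly follow y = 2x + 1, so you can check that the model recovers that line and then predicts a continuous value for a new input.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data that follows y = 2x + 1 with a little noise
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float).reshape(-1, 1)  # one feature, as a column
y = 2 * x.ravel() + 1 + rng.normal(0, 0.1, size=20)

model = LinearRegression().fit(x, y)
print(model.coef_, model.intercept_)  # close to [2.] and 1.
print(model.predict([[25.0]]))        # a value on a continuous scale, about 51
```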
Typically used for classification
NOT used for regression problems! It has "regression" in the name due to the statistics behind the model.
Used to predict binary outputs (yes, no | true, false | Pass, fail)
It outputs a probability and predicts the positive class when that probability is above a certain threshold, typically 0.5.
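A sketch of that threshold with scikit-learn's `LogisticRegression`. The "hours studied vs. pass/fail" data is hypothetical; applying a 0.5 cutoff to `predict_proba` ourselves gives the same answers as `predict` for this two-class problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> failed (0) or passed (1)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

# predict_proba gives the probability of each class; column 1 is "passed"
new_students = np.array([[2.5], [6.5]])
probabilities = model.predict_proba(new_students)[:, 1]
predictions = (probabilities > 0.5).astype(int)  # apply the 0.5 threshold ourselves

print(probabilities)
print(predictions, model.predict(new_students))  # the two should match
```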
Typically used for classification
k-NN finds the k nearest data points and makes an educated guess based on the classifications of those nearest data points.
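To show what k-NN does under the hood, here is a from-scratch sketch with NumPy. The function name `knn_predict` and the two-group training data are hypothetical, and it assumes Euclidean distance with a simple majority vote (libraries like scikit-learn offer more options).

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Label x_new by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical 2-feature training data with two groups
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # A
print(knn_predict(X_train, y_train, np.array([4.9, 5.0])))  # B
```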
Typically used for classification
It may be an oversimplification, but a decision tree can be thought of as a bunch of if statements.
You've probably seen flowcharts before, with different paths to take depending on the data.
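You can see those "if statements" directly by training a small tree with scikit-learn and printing its rules with `export_text`. Limiting the depth to 2 is an assumption to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the learned "if statements" short enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned rules as nested if/else-style conditions
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```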
There are of course more than these 4 models, a few more popular ones you should look into are Support Vector Machines, Random Forests, and Naive Bayes.
There can be a lot of factors to consider, like the size of the data, labels, accuracy, scalability, etc. A lot of these are out of the scope of this workshop.
But when you're first starting out, it's important to think about your desired outcome (the output of the model).
- Is it a number? It's probably a regression problem.
- Is it a class / label? It's probably a classification problem.
- Are you separating unlabeled data into groups? It’s probably a clustering problem.
https://scikit-learn.org/stable/tutorial/machine_learning_map/
We can only scratch the surface of Machine Learning tonight in this workshop, so this is by no means everything you need to know, but it should help you get started!
Training your model on your dataset. You'll see the terms fit and train used interchangeably.
Relies too heavily on the relationships in the training data, so it fails to work correctly on new data.
Fails to learn the relationships in the training data, so it can't make good predictions on new data either.
Validate that your machine learning model works well on data that it was not trained on.
We trained the model, but we need to validate that it's working as expected. A common way is to split the dataset into training and testing sets (we'll do this soon in Python).
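A minimal train/test split sketch with scikit-learn's `train_test_split`. The 25% test size, `random_state`, and the k-NN model are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold back 25% of the flowers; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Accuracy on the held-out data estimates how well the model handles new flowers
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```

Comparing the two numbers is also a quick overfitting check: a model that scores much better on the training data than on the test data is probably memorizing rather than learning.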
Again, these are just some of them; there are soooooo many...
Pandas is often used to explore, clean, and visualize your data.
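For a taste of exploring data with pandas, this sketch loads the Iris data into a `DataFrame` and averages each measurement per species. The column name `species` is our own choice.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Put the measurements in a table with readable column names
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]

print(df.head())                     # first 5 rows
print(df.groupby("species").mean())  # average measurement per species
```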
NumPy is often used for multi-dimensional array manipulation.
matplotlib is often used to visualize your data in a chart-like format.
scikit-learn, a.k.a. sklearn, is a powerful open-source machine learning library.
Library from Google for Machine Learning. Popular in Deep Learning.
Library from Facebook for Machine Learning. Popular in Deep Learning.
A higher-level wrapper that can be used with TensorFlow to make writing deep learning projects easier.
An open-source library to get started with NLP.
An open-source library to get started with computer vision and image manipulation.
Note: if you're thinking of exploring data science with Python locally on your computer, look into using Anaconda to manage your Python and data libraries. I'd go crazy without it!
Did you learn something new?
Do you feel more comfortable with the ideas of Machine Learning?
Do you have an awesome idea you want to try using machine learning on? What is it?
Best way to learn is solving a problem you're excited about!
Use an "ugly" dataset. Understanding how to make a good dataset is important.
scikit-learn has more built-in datasets. Use them and apply what you learned today!
- Hack Reactor Software Engineer Prep - FREE | study at your own pace
- Galvanize Data Science Prep Course - FREE | study at your own pace
We create a technology ecosystem for learners, entrepreneurs, startups and established companies to meet the needs of the rapidly changing digital world.
- Education
- Co-Working
- Events
- Enterprise
Transform your career with our 13-week immersive programs
- Software Engineer - 6/3/19 - 10/11/19
- Data Science - 8/19/19 - 11/15/19
Learn while working with our evening part-time classes
- Python Fundamentals - 6/4/19 - 7/11/19
- Data Analytics - 6/3/19 - 8/21/19
Please feel free to reach out to me with any questions! Let me know what you're planning to do next and how I can help!