Welcome to Introduction to Python for Data Analysis! Over the course of these five lessons, we will guide you through the basics of Python and how to use it for data analysis. We start out by teaching you just enough Python fundamentals to get you familiar with syntax, basic data structures, and control flow. Then we take you through using two of the most popular data analytics libraries, pandas and matplotlib, to learn the basics of data cleaning, processing, and visualization using Python. In our final lesson, we combine everything you have learned so far into a mini-project where we guide you through performing Exploratory Data Analysis (EDA) using Python, pandas, matplotlib, and SQL.
Click on any lesson to learn more about it and access those resources:
- Lesson 1 - Python Fundamentals
- Lesson 2 - Basic Control Flow & Algorithms
- Lesson 3 - Elementary Exploratory Data Analysis with pandas
- Lesson 4 - Data Visualization with Python & matplotlib
- Lesson 5 - A Guided Exploratory Data Analysis using Python and SQL
There are no prerequisites for this course. We structured this course with the beginner in mind, so if you have never written code then this is for you! Of course, if you already have a coding background, then this material may be more of a refresher.
All you need to use the materials in this course is a valid Google account and an internet connection. That's it. Links to all the course material you'll need are in the Course Outline below. You do not need to set anything else up.
If you already have your own coding environment set up, you can simply download the course files from this repository and run them in your environment. However, since each environment can look different across machines and operating systems, we will not be providing direct support for issues related to your environment (e.g., "Why can't Anaconda find this library?").
You may set up your own coding environment for this course if you wish. However, if you have not had prior experience with this or with coding, we recommend completing the course using the cloud resources we set up first before setting up your own environment. There is quite a bit of management going on behind the scenes with setting up environments for these course materials to be executed smoothly, which you will have to manage on your own in a local environment.
As for coding environments, we recommend using either Anaconda or Miniconda. They are essentially the same, but Miniconda comes with less preinstalled software out of the box, so it is "lighter" to use than Anaconda, which has more preinstalled software and will take up more space on your system.
This presentation shows you how to install Anaconda on your local machine.
This course is meant to give you a hands-on introduction to Python and how it can be used in the data analytics process. After completing it, you will have a rudimentary understanding of Python syntax and basic programming concepts, as well as how to apply those concepts to exploratory data analysis using popular libraries. Our goal is simply to introduce you to these concepts - not to provide a rigorous study that would prepare you for technical interviews (at least not with this material alone).
What this course does prepare you for:
- Further study with Python or other programming languages
What this course does not prepare you for:
- Python technical interviews
- Technical roles such as data scientist, data engineer, or software developer
After completing this course, we strongly encourage you to pursue further studies in programming with Python (or any other programming language you like) - especially if you would like to get into a more technical role. Again, this course can be considered foundational, but it is not enough by itself to prepare you for more technical roles.
Each lesson is outlined below. For each one, we use Jupyter Notebooks to run our Python code. To access the Jupyter Notebook for a given lesson, simply click on the button for that lesson. We also include a link to the accompanying presentation for each lesson.
Presentation: Click here to view the presentation for this lesson
To start things off, we introduce you to fundamental syntax and data structures in Python. After completing this lesson, you will know:
- What objects are in Python and their three fundamental properties
- How variables work and how to use them in our code
- How to perform basic arithmetic and logical operations
- How the basic container data structures (strings, lists, tuples, and dictionaries) work
- The basics of slicing and indexing
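As a taste of what the topics above look like in practice, here is a minimal sketch. It is not taken from the lesson notebook: the variable names and values are made up for illustration, and we assume the "three fundamental properties" refer to identity, type, and value, as described in the Python data model.

```python
# A variable is a name bound to an object.
course = "Python for Data Analysis"                # string
lessons = [1, 2, 3, 4, 5]                          # list (mutable)
libraries = ("pandas", "matplotlib")               # tuple (immutable)
lesson_titles = {1: "Python Fundamentals"}         # dictionary (key -> value)

# Every object has an identity, a type, and a value (assumed to be the
# three properties the lesson covers).
print(id(lessons), type(lessons), lessons)

# Basic arithmetic and logical operations.
total_lessons = len(lessons)
average = sum(lessons) / total_lessons
is_short_course = total_lessons < 10 and "pandas" in libraries

# Indexing picks one element; slicing picks a range (the end index is excluded).
first_word = course[:6]       # characters 0 through 5
last_lesson = lessons[-1]     # negative indexes count from the end
middle = lessons[1:4]         # elements at positions 1, 2, and 3

print(first_word, last_lesson, middle, average, is_short_course)
```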
Presentation: Click here to view the presentation for this lesson
Following up on our previous lesson, we introduce you to basic control flow concepts (if/else statements & loops) and functions. After completing this lesson, you will know:
- How we can control the flow of our program's execution using basic control flow concepts
- How to execute different code based on conditions that we specify using if/else statements
- How to use both for and while loops to repeatedly execute code in an efficient manner
- How functions work in Python and how to create your own custom functions
- How to use basic control flow and functions to create algorithms that solve problems
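The ideas in this list can be sketched in a few lines. This is only an illustration under our own assumptions: the function names, the threshold, and the salary figures are hypothetical and do not come from the lesson notebook.

```python
def classify_salary(salary, threshold=100_000):
    """Return a label chosen by an if/else condition."""
    if salary >= threshold:
        return "high"
    else:
        return "standard"


def count_halvings(n):
    """Count how many times n must be halved before it drops below 10."""
    steps = 0
    while n >= 10:          # a while loop repeats as long as the condition holds
        n = n / 2
        steps += 1
    return steps


salaries = [85_000, 120_000, 99_999, 150_000]   # hypothetical values

# A for loop visits each item in a sequence exactly once.
labels = []
for salary in salaries:
    labels.append(classify_salary(salary))

print(labels)
print(count_halvings(100))
```

Wrapping the condition and the loop in functions keeps each piece small enough to test on its own, which is the usual first step toward building a larger algorithm.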
Presentation: Click here to view the presentation for this lesson
After learning just enough basic Python, we introduce you to pandas - one of the most popular (and ubiquitous) Python libraries for working with data. We'll show you the basic methods you should know in order to explore and understand your data. Since a lot of the operations we perform here have parallels in SQL, we will show the connections between the two concepts where applicable. After completing this lesson, you will know:
- What DataFrames are and how to use them to work with your data
- How to read data stored in CSV files and manipulate it using DataFrames
- Basic data manipulation using pandas, including selecting rows and columns, filtering, joining your data with other data sets, renaming columns, and dealing with null values
- How to use built-in pandas features to view metadata and gain a better understanding of your data
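To make the operations listed above concrete, here is a minimal sketch with pandas. The CSV text, column names, and office table are hypothetical stand-ins (the lesson uses its own files), and the SQL parallels in the comments are rough analogies rather than exact equivalents.

```python
import io

import pandas as pd

# Hypothetical CSV content; in the lesson you would pass a file path instead.
csv_text = """employee_id,job_title,salary
1,Data Analyst,70000
2,Data Engineer,
3,Data Scientist,120000
"""
df = pd.read_csv(io.StringIO(csv_text))

titles = df["job_title"]                  # select a column (SQL: SELECT job_title)
first_row = df.iloc[0]                    # select a row by position
high_paid = df[df["salary"] > 100000]     # filter rows (SQL: WHERE salary > 100000)

# Rename a column, then replace null values with 0.
df = df.rename(columns={"salary": "salary_usd"})
df["salary_usd"] = df["salary_usd"].fillna(0)

# Join with another (hypothetical) data set (SQL: LEFT JOIN ... ON employee_id).
offices = pd.DataFrame(
    {"employee_id": [1, 2, 3], "office": ["NYC", "Remote", "Austin"]}
)
joined = df.merge(offices, on="employee_id", how="left")

# View metadata and summary statistics.
df.info()
print(joined.describe())
```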
Presentation: Click here to view the presentation for this lesson
Part of working with data is creating visualizations to help us better understand it and communicate our findings. In this lesson, we introduce you to matplotlib - a popular library for creating visualizations with Python. After completing this lesson, you will know:
- How to create basic graphs with matplotlib, such as histograms, line charts, bar charts, and pie charts
- How to customize your charts using parameters offered by matplotlib
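The four chart types named above can be sketched on a single figure. All numbers and labels below are made up for illustration, and the `Agg` backend line is only there so the sketch runs without a display; in a Jupyter Notebook you would normally leave it out.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical data (salaries in thousands of USD).
salaries = [62, 70, 75, 88, 90, 95, 110, 120, 150]
years = [2020, 2021, 2022, 2023]
avg_salary = [80, 85, 92, 98]
roles = ["Analyst", "Engineer", "Scientist"]
headcount = [5, 3, 2]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Parameters such as bins, color, marker, and linestyle customize each chart.
axes[0, 0].hist(salaries, bins=5, color="steelblue", edgecolor="black")
axes[0, 0].set_title("Histogram")

axes[0, 1].plot(years, avg_salary, marker="o", linestyle="--")
axes[0, 1].set_title("Line chart")
axes[0, 1].set_xlabel("Year")

axes[1, 0].bar(roles, headcount, color="seagreen")
axes[1, 0].set_title("Bar chart")

axes[1, 1].pie(headcount, labels=roles, autopct="%1.0f%%")
axes[1, 1].set_title("Pie chart")

fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a script you would call `plt.show()` or `fig.savefig(...)` to see the result.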
Presentation: Click here to view the presentation for this lesson
In this final lesson, we combine all of the concepts you have learned in the previous four lessons and guide you through an Exploratory Data Analysis using the Data Science Salaries data set.
Here is the scenario:
You are a data analyst working for a company that just received this data set, and your management team wants you to explore it. They have some questions about the data that they would like you to answer. Additionally, they want you to share any insights you find while exploring the data.
This is a comprehensive mini-project that will test your understanding of basic data analysis, Python, pandas, matplotlib, and SQL (optional) concepts. After completing this lesson, you will know:
- How to perform a basic Exploratory Data Analysis using Python and SQL
- How to structure and communicate your findings in a professional manner
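As a rough idea of how Python and SQL can work together in an analysis like this, here is a minimal sketch. It assumes an in-memory SQLite database via the standard-library `sqlite3` module, which may not be the SQL tooling the lesson uses. The table name, column names, and rows are hypothetical and are not taken from the Data Science Salaries data set.

```python
import sqlite3

# Hypothetical rows: (job_title, work_year, salary_usd).
rows = [
    ("Data Analyst", 2022, 70000),
    ("Data Analyst", 2023, 80000),
    ("Data Scientist", 2023, 120000),
]

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
conn.execute(
    "CREATE TABLE salaries (job_title TEXT, work_year INTEGER, salary_usd INTEGER)"
)
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", rows)

# A typical exploratory question: what is the average salary per job title?
query = """
SELECT job_title, COUNT(*) AS n, AVG(salary_usd) AS avg_salary
FROM salaries
GROUP BY job_title
ORDER BY avg_salary DESC
"""
summary = conn.execute(query).fetchall()
conn.close()

# Report the findings in a readable form.
for job_title, n, avg_salary in summary:
    print(f"{job_title}: {n} record(s), average salary {avg_salary:,.0f} USD")
```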