Relational databases are the backbone of data science, and the language that we use to communicate with them is called SQL. The SQL test is a common component of the hiring process for data-adjacent jobs within industry, government and the education sector. SQL is a useful tool that some argue spawned the field of data science. Before Big Data was a thing, Knowledge Discovery in Databases (KDD) used simple SQL queries to investigate and understand the nature of the large amounts of data that were being collected by governments and companies. The humble SQL test now torments the budding data scientist as a rite of passage in the job search process.
In this unit you will learn the basic ideas behind relational databases and SQL. You will set up a database in Amazon Web Services and then connect to it through RStudio. You will then load data into your database and run SQL queries on that data.
- Be able to discuss an overview of relational databases and the purpose of SQL
- Be able to spin up an AWS instance and load a SQL database into it
- Be able to connect to the database through RStudio/R using the DBI package
- Be able to run basic SQL commands in RStudio using the RMySQL package
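To make the last objective concrete, here is a minimal sketch of the kind of basic SQL commands this unit covers. It is written in Python with the built-in `sqlite3` module only so that it runs without a database server; the project itself uses R and MySQL, and SQL dialects differ in small ways. The `students` table, its columns, and its rows are made up for illustration.

```python
import sqlite3


def run_demo():
    """Create a tiny in-memory table and run a few basic SQL queries on it."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved
    cur = conn.cursor()

    # CREATE TABLE defines the columns and their types (hypothetical schema).
    cur.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, score REAL)"
    )

    # INSERT adds rows; the ? placeholders keep values separate from the SQL text.
    cur.executemany(
        "INSERT INTO students (id, name, score) VALUES (?, ?, ?)",
        [(1, "Ada", 91.0), (2, "Grace", 84.5), (3, "Alan", 77.0)],
    )

    # SELECT ... WHERE filters rows; ORDER BY sorts the result.
    cur.execute("SELECT name FROM students WHERE score > 80 ORDER BY score DESC")
    high_scorers = [row[0] for row in cur.fetchall()]

    # Aggregate functions such as COUNT and AVG summarize a column.
    cur.execute("SELECT COUNT(*), AVG(score) FROM students")
    count, average = cur.fetchone()

    conn.close()
    return high_scorers, count, average


if __name__ == "__main__":
    print(run_demo())
```

The same `CREATE TABLE`, `INSERT`, and `SELECT` statements are the building blocks you will send to your MySQL database from RStudio.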
Overview of Amazon Web Services
Please fork and clone this repository to your computer. If you are unfamiliar with this process you must sign up for office hours.
Then you will need to create an account with AWS. This will require a credit card, although we will only be using free services. If you already have an Amazon account you can use that account.
Please create the account through the regional website for your location.
Once you have created an account, follow the directions below. These steps are also shown in the video above.
- Log into your AWS Management Console
- Locate `RDS` under the `Databases` heading
- Within Amazon RDS click `Create database`
- Under `Choose a database creation method` click `Standard Create`
- Under `Engine options` choose `MySQL`
- Under `Templates` choose `Free tier`
- Under `Settings` name your `DB instance identifier` as `database-1`
- Under `Credential settings` create a username and password combination and write it down (you will need it later)
- Under `Connectivity` expand `Additional connectivity configuration` to show additional menu items and make sure that `Publicly accessible` is checked `Yes`
- Expand the `Additional configuration` menu
- Under `Initial database name` write `oudb`
- Uncheck `Automatic backups`
- Click `Create database`
- Once the database is created, take a screenshot and add it to your repository
- Under `Security Groups` click `Inbound` and then `Edit`
- Add the rule `MYSQL/Aurora` on `Port 3306` with the `Connection` of `MyIP`
- Open the sql-project.Rmd file in RStudio and follow the directions.
- Once you have completed the project, please commit your changes and submit a pull request to the main branch of the repository. Please be sure to include your Rmd file and your screenshot of the AWS console page. The due date for this project is January 27 by 5:00pm EDT. Don't forget to delete your AWS database so you don't get charged any money!
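If the connection from RStudio fails, one common cause is the inbound rule on port 3306 from the steps above. Here is a minimal sketch of a reachability check, written in Python only because it needs nothing beyond the standard library. `can_reach` is a hypothetical helper name, and the host you pass should be the endpoint shown for your instance in the RDS console. It only tests whether the TCP port accepts connections; it does not check your username or password.

```python
import socket


def can_reach(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host names that do not resolve.
        return False
```

Calling `can_reach` with your endpoint should return `True` once the security group allows your IP address. A `False` result suggests checking the inbound rule and the `Publicly accessible` setting before debugging your R code.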
After you submit, please complete the knowledge check quiz located here