- Amazon Redshift
- What is a Data Warehouse?
- Introduction to Redshift
- Redshift Use Case
- Redshift Columnar Storage
- Redshift Configurations
- Fully managed, petabyte-scale data warehouse.
- Analyze massive amounts of data by running complex SQL queries
- Columnar store database
- What is a Data Warehouse?
- A transaction represents a unit of work performed within a database management system
- e.g. reads and writes
Database | Data warehouse |
---|---|
Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP) |
A database is built to store current transactions and enable fast access to specific transactions for ongoing business processes | A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all the data |
Adding Items to your Shopping List | Generating Reports |
Single source | Multiple sources |
Short transactions (small and simple queries) with an emphasis on writes | Long transactions (long and complex queries) with an emphasis on reads |
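
The contrast in the table can be sketched in a few lines. This is only an illustration using Python's built-in `sqlite3` as a stand-in, not Redshift itself; the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# OLTP-style work: short transactions, each writing a single row.
for customer, amount in [("alice", 20.0), ("bob", 35.0), ("alice", 45.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount)
        )

# OLTP-style read: fast access to one specific transaction.
single_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style read: one query that aggregates across all the data (a report).
report = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

The first half touches one row at a time with an emphasis on writes; the final query reads every row to produce a summary, which is the workload a data warehouse is built for.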
- Amazon Redshift is the AWS-managed, petabyte-scale solution for data warehousing
- Pricing starts at just $0.25 per hour with no upfront costs or commitments.
- Scale up to petabytes for $1,000 per terabyte per year
- Redshift is priced at less than 1/10 the cost of most similar services
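
As a rough worked example of what the quoted prices imply, the arithmetic below assumes a node that runs nonstop for a 30-day month and treats 1 PB as 1,000 TB. Both are simplifying assumptions, and the rates are simply the figures quoted above.

```python
HOURLY_RATE_USD = 0.25       # entry price quoted above
PER_TB_PER_YEAR_USD = 1000   # at-scale price quoted above

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30          # assumption: 30-day month, node never paused


def monthly_cost(hourly_rate):
    """Cost of running one node continuously for a month."""
    return hourly_rate * HOURS_PER_DAY * DAYS_PER_MONTH


def yearly_storage_cost(terabytes):
    """Yearly cost at the quoted per-terabyte rate."""
    return terabytes * PER_TB_PER_YEAR_USD


entry_monthly = monthly_cost(HOURLY_RATE_USD)  # 0.25 * 24 * 30
one_petabyte = yearly_storage_cost(1000)       # assumption: 1 PB = 1000 TB
```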
- Redshift is used for Business Intelligence
- Redshift uses OLAP (Online Analytical Processing)
- Redshift is a columnar storage database
- Columnar Storage for database tables is an important factor in optimizing analytic query performance because it drastically reduces the overall disk I/O requirements and reduces the amount of data you need to load from disk
- We want to continuously COPY data from EMR, S3, and DynamoDB to power a custom Business Intelligence tool
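
A minimal sketch of what those COPY commands might look like, built as plain strings in Python. The helper `build_copy_statement`, the table name `sales`, the bucket, the DynamoDB table, the EMR cluster ID, and the IAM role ARN are all hypothetical placeholders; nothing here is sent to a cluster.

```python
def build_copy_statement(table, source_uri, iam_role_arn, options=""):
    """Return a Redshift COPY statement as a string (nothing is executed here)."""
    statement = f"COPY {table} FROM '{source_uri}' IAM_ROLE '{iam_role_arn}'"
    if options:
        statement += f" {options}"
    return statement + ";"


# Hypothetical role ARN; a real one must grant Redshift access to the source.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

# One statement per source named above (all identifiers are made up).
s3_copy = build_copy_statement(
    "sales", "s3://example-bucket/sales/", ROLE, "FORMAT AS CSV"
)
dynamo_copy = build_copy_statement(
    "sales", "dynamodb://ExampleSales", ROLE, "READRATIO 50"
)
emr_copy = build_copy_statement(
    "sales", "emr://j-EXAMPLECLUSTER/output/part-*", ROLE
)
```

The helper interpolates its arguments directly, so it is only suitable for trusted, hard-coded values like the ones shown.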
- Using a third-party library we can connect and query Redshift for data.
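
The notes do not say which library is meant. As one possibility, the sketch below assumes `psycopg2`, a PostgreSQL driver that can talk to Redshift; the endpoint, database name, and credentials are hypothetical, and the query line is commented out because it needs a reachable cluster.

```python
def connection_params(host, dbname, user, password, port=5439):
    """Collect keyword arguments for psycopg2.connect (5439 is Redshift's default port)."""
    return {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "password": password,
    }


def run_query(params, sql):
    """Open a connection, run one query, and return all rows."""
    import psycopg2  # third-party driver (an assumption); imported only when used

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


params = connection_params(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
    dbname="dev",
    user="awsuser",
    password="change-me",
)
# rows = run_query(params, "SELECT COUNT(*) FROM sales;")  # requires a live cluster
```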
- Columnar storage stores data together as columns instead of rows
- OLAP applications look at multiple records at the same time. You save memory because you fetch just the columns of data you need instead of whole rows
- Since data is stored by column, all the data in a column is of the same data type, allowing for easy compression
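
The three points above can be illustrated with plain Python lists. This is a toy model, not how Redshift lays out data on disk; the sample rows are made up, and run-length encoding is used here only as one simple example of a compression scheme that benefits from same-typed, repetitive column data.

```python
# Row-oriented layout: each tuple is one record (sample data is made up).
rows = [
    ("alice", "US", 20),
    ("bob", "US", 35),
    ("carol", "US", 45),
    ("dave", "DE", 10),
]

# Column-oriented layout: one list per column.
names, countries, amounts = (list(col) for col in zip(*rows))

# An analytic query reads only the column it needs, not whole rows.
total_amount = sum(amounts)


def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


# A column holds values of one type, often with repeats, so it compresses well.
encoded_countries = run_length_encode(countries)
```

Summing `amounts` never touches the names or countries, and the four-entry `countries` column shrinks to two `[value, count]` pairs.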
- Single Node
- Nodes come in sizes of 160 GB. You can launch a single node to get started with Redshift
- Multi-Node
- You can launch a cluster of nodes in multi-node mode
- Leader Node
- Manages client connections and receiving