Skip to content

Week3 Readme

Andrej edited this page Nov 12, 2019 · 20 revisions

Reading Material Databases Week 3

Agenda

  1. Normalization
    • normal forms
  2. Transactions
  3. SQL injection
  4. NoSQL
  5. Non-relational vs. relational

1. Normalization

When developing the schema of a relational database, one of the most important aspects to be taken into account is to ensure that the duplication is minimized. This is done for 2 purposes:

  • Reducing the amount of storage needed to store the data.
  • Avoiding unnecessary data conflicts that may creep in because of multiple copies of the same data getting stored. Normalization in DBMS

Database Normalization is a technique that helps in designing the schema of the database in an optimal manner so as to ensure the above points. The core idea of database normalization is to divide the tables into smaller subtables and store pointers to data rather than replicating it.

There are various database “Normal” forms. Each normal form (NF) has an importance which helps in optimizing the database to save storage and to reduce redundancies. These normal forms build incrementally. e.g. a database is in 3NF if it is already in 2NF and satisfied the rules for 3rd normal form. Read for more details.

1st normal form (1NF) (4 rules)

  • Rule 1 : Single valued attributes (each column should have atomic value, no multiple values)
  • Rule 2 : Attribute domain should not change
  • Rule 3 : Unique names for attributes / columns
  • Rule 4 : Order does not matter

2nd normal form (2NF)

No partial dependency. (i.e. no field should depend on part of the primary key) Example

Score table (student_ID, subject_ID, score, teacher)
Subject table (subject_ID, subject Name)

3rd normal form (3NF)

No transitive dependency (i.e. no field should depend on non-key attributes).

Boyce-Codd normal form (3.5 NF)

for any dependency A → B, A should be a super key.

To increase your understanding, study the following materials:

2. Transactions

  • A transaction is a set of commands that you want to treat as "one command." It has to either happen in full or not at all.

  • A classical example is transferring money from one bank account to another. To do that you have first to withdraw the amount from the source account, and then deposit it to the destination account. The operation has to succeed in full. If you stop halfway, the money will be lost, and that is Very Bad.

  • To start transaction:
mysql> start;
OR
mysql> begin transaction;
  • To commit, use commit; and to abort, use rollback;
  • Note that autocommit variable should be set to false for rollback to work.
mysql> set autocommit = 0;

3. SQL injection

Some SQL clients accept input from user to fabricate the queries. A malicious user can tweak the input so as to acquire more information from the database or to destroy the database (literally!). Demo program sql-injection.js is in the Week3 folder.

Consider the following query SELECT name, salary FROM employees where id = X.

Injection to get more information

If X is `101 OR 1=1`, then the query returns all records because 1=1 is always true
SELECT name, salary FROM employees where id = 101 OR 1=1;

Injection to destroy the database

If X is `101; DROP database mydb`, then the query will delete the entire database
SELECT name, salary FROM employees where id = 101; DROP database mydb;

mysqljs prevents the second injection by not allowing multiple SQL statements to be executed at once.

4. NoSQL

IGOR - still to be determined what technology in particular

5. Non-relational vs. relational

IGOR - still to be determined what technology in particular

OLD THINGS, SHOULD BE DELETED IF NOT USED

Understanding the asynchronous nature of database queries

Jim (@remarcmij) wrote these excellent demo programs for better understanding. Do check them out.

Complicated values to store in MySQL

- Storing prices (floating point errors)
- Storing dates (datetime vs. timestamp)
- datetime : fixed value (joining date of employee): has a calendar date and a wall clock time
- timestamp : unix timestamp, seconds elapsed from 1 Jan 1970 00:00 in UTC (takes timezone into consideration)

Reference Material

Clone this wiki locally