achintyashivam11/hsql

hsql

An SQL engine on top of Hadoop

There are 4 files:

  1. lexer.py : this tokenizes (lexes) the SQL queries
  2. parser.py : this does the parsing of the SQL queries
  3. main.py : this is the file which runs the interpreter (like python!), where you can run SQL queries
  4. implementation.py : this file contains the implementations of the SQL queries and some helper functions (must be completed)

The SQL Queries Currently Supported :

  1. select
  2. use
  3. drop
  4. load
  5. create database
  6. schema (not there in standard SQL. Added to view schema of databases and tables)
  7. current database (again not there in standard SQL. Added to know the currently selected database)
  8. exit() or quit() (to quit the interpreter)
  9. list database (to list all available databases)

What are the requirements ?

hadoop, python3, ply

How to install ply ?

pip3 install ply

How to run the interpreter ?

python3 main.py

What's Currently Working ?

  1. use
  2. create database
  3. load (partially)
  4. drop
  5. schema
  6. current database

What must be done ?

  1. Make it work on hadoop (create and delete files/folders in hadoop). It currently works on the local file system, not on hadoop. May have to change remove() in implementation.py. This can be done at the end, I guess.
  2. Implement load completely
  3. Implement select
  4. Implement aggregate functions MAX, COUNT, SUM
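
For the first item, a minimal sketch of how a delete helper could switch between the local file system and hadoop. The real signature of remove() in implementation.py is not shown here, so the on_hadoop flag and the hdfs_remove_command helper are hypothetical names used only for illustration. The hadoop branch assumes the hdfs command is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def hdfs_remove_command(path):
    """Build the HDFS shell command that deletes a file or folder recursively."""
    return ["hdfs", "dfs", "-rm", "-r", "-f", str(path)]


def remove(path, on_hadoop=False):
    """Delete a database or table path (hypothetical signature).

    on_hadoop=False: delete from the local file system (current behaviour).
    on_hadoop=True:  delete from HDFS through the hdfs shell.
    """
    if on_hadoop:
        subprocess.run(hdfs_remove_command(path), check=True)
        return
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)   # folder: a database or a table
    elif target.exists():
        target.unlink()         # single file: e.g. a schema file
```

Keeping the command construction in its own function makes it possible to check the hadoop branch without a running cluster.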

Note : The mapper/reducer may have to be written in separate files and called via a system call from the wrapper functions select, load, MAX, COUNT and SUM, using the hadoop streaming API.
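
A rough sketch of what the note describes, for the aggregate functions only. Nothing here is taken from implementation.py: the function names, the single "agg" key, and the assumption that the CSV has no header row and that the chosen column is numeric are all made up for illustration. In a real job the mapper and reducer logic would live in separate scripts that read sys.stdin and print to stdout, and those scripts would also have to be made available to the cluster nodes.

```python
def map_column(lines, column_index, sep=","):
    """Mapper logic: emit 'key<TAB>value' for one column of each CSV row."""
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Skip short rows and empty cells instead of failing.
        if column_index < len(fields) and fields[column_index] != "":
            yield "agg\t" + fields[column_index]


def reduce_aggregate(lines, func):
    """Reducer logic: fold all mapper values into MAX, COUNT or SUM."""
    values = [float(line.rstrip("\n").split("\t", 1)[1])
              for line in lines if line.strip()]
    if func == "COUNT":
        return len(values)
    if func == "SUM":
        return sum(values)
    if func == "MAX":
        return max(values) if values else None
    raise ValueError("unsupported aggregate: " + func)


def streaming_command(streaming_jar, input_path, output_path, mapper, reducer):
    """Build (but do not run) a hadoop streaming call for the wrapper.

    streaming_jar is the path to the hadoop streaming jar, which depends
    on the local hadoop installation (assumption: supplied by the caller).
    """
    return ["hadoop", "jar", streaming_jar,
            "-input", input_path, "-output", output_path,
            "-mapper", mapper, "-reducer", reducer]
```

A wrapper such as MAX could then pass the result of streaming_command to subprocess.run and read the result back from the output folder.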

The Directory Organization

DATABASE_ROOT/
    database_name.schema
    dblist.db
    database_name/
        table_name/
            csv_file
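
As an illustration of the layout above, the paths could be computed with a few small helpers. The function names are hypothetical and are not taken from implementation.py. The root folder is passed in by the caller.

```python
from pathlib import Path


def db_list_file(root):
    """Path of dblist.db, the single file listing all databases."""
    return Path(root) / "dblist.db"


def schema_file(root, database):
    """Path of the schema file of one database (database_name.schema)."""
    return Path(root) / (database + ".schema")


def table_dir(root, database, table):
    """Path of the folder that holds the CSV data of one table."""
    return Path(root) / database / table
```

For example, table_dir("DATABASE_ROOT", "shop", "orders") points at DATABASE_ROOT/shop/orders, where "shop" and "orders" are made-up names.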

Note

  1. dblist.db is the file which contains the list of all the databases (there is only 1 such file)
  2. There is one schema file per database
  3. There is one directory for each database

I have commented as many important lines as possible. If you have any doubts, call me.
