- lexer.py : does the lexing (tokenizing) of the SQL queries
- parser.py : does the parsing of the SQL queries
- main.py : this is the file which runs the interpreter (like python!), in which you can run SQL queries
- implementation.py : this file contains the implementations of the SQL queries and some functions (must be completed)
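
To illustrate what the lexing step produces, here is a minimal sketch of a tokenizer. It uses only the standard `re` module rather than ply, so it is not the code in lexer.py; the token names, the keyword set and the `tokenize` function are assumptions made for this example.

```python
import re

# Hypothetical keyword set; the real lexer.py (built on ply) may differ.
KEYWORDS = {"SELECT", "FROM", "WHERE", "USE", "DROP", "LOAD", "CREATE", "DATABASE"}

# Hypothetical token patterns, tried in this order at each position.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SYMBOL", r"[*,();=<>]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(query):
    """Split a query string into (token_type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character {query[pos]!r} at position {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "NAME" and text.upper() in KEYWORDS:
            kind = text.upper()  # keywords are matched case-insensitively
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    print(tokenize("select name from users"))
```

The parser would then consume this list of tokens instead of the raw query string.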
- select
- use
- drop
- load
- create database
- schema (not in standard SQL; added to view the schema of databases and tables)
- current database (also not in standard SQL; added to show the currently selected database)
- exit() or quit() (to quit the interpreter)
- list database (to list all available databases)
hadoop, python3, ply
pip3 install ply
python3 main.py
- use
- create database
- load (partially)
- drop
- schema
- current database
- Make it work on hadoop: create and delete files/folders in hadoop. It currently works on the local file system only, not on hadoop. May have to change remove() in implementation.py. Can be done at the end.
- Implement load completely
- Implement select
- Implement aggregate functions MAX, COUNT, SUM
Note : May have to write the mappers/reducers in separate files and call them via a system call from the wrapper functions for select, load, MAX, COUNT and SUM, using the hadoop streaming API
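
A rough sketch of how the note above could look for MAX, assuming the table is a CSV file and the aggregate runs over one numeric column. The function names, the single `max` key and the script names are hypothetical; the path of the streaming jar depends on the hadoop installation and is left as a parameter.

```python
import csv


def max_mapper(lines, column_index):
    """Emit one 'max<TAB>value' record per CSV row with a numeric value."""
    for row in csv.reader(lines):
        if len(row) <= column_index:
            continue  # skip short or empty rows
        try:
            value = float(row[column_index])
        except ValueError:
            continue  # skip headers and other non-numeric cells
        yield f"max\t{value}"


def max_reducer(lines):
    """Return the largest value among 'key<TAB>value' records, or None."""
    best = None
    for line in lines:
        _key, _sep, value = line.rstrip("\n").partition("\t")
        number = float(value)
        if best is None or number > best:
            best = number
    return best


def build_streaming_command(streaming_jar, input_path, output_path, mapper, reducer):
    """Build the argument list for a hadoop streaming job (does not run it)."""
    return [
        "hadoop", "jar", streaming_jar,
        "-files", f"{mapper},{reducer}",   # ship both scripts with the job
        "-input", input_path,
        "-output", output_path,
        "-mapper", f"python3 {mapper}",
        "-reducer", f"python3 {reducer}",
    ]


if __name__ == "__main__":
    rows = ["id,price\n", "1,10.5\n", "2,42\n", "3,7\n"]
    print(max_reducer(max_mapper(rows, 1)))
```

In real mapper and reducer scripts the `lines` argument would be `sys.stdin` and the results would be printed to standard output. A wrapper function could pass the list returned by `build_streaming_command` to `subprocess.run`.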
DATABASE_ROOT/
    database_name.schema
    dblist.db
    database_name/
        table_name/
            csv_file
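
As a rough illustration of this layout on the local file system, the sketch below creates the entries for a new database. The function names are hypothetical, and the format of dblist.db (one database name per line) and the empty schema file are assumptions made for the example, not the formats used in implementation.py.

```python
from pathlib import Path


def create_database(root, name):
    """Register a database in dblist.db and create its schema file and directory."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    dblist = root / "dblist.db"
    # Assumed format: one database name per line.
    existing = dblist.read_text().split() if dblist.exists() else []
    if name in existing:
        raise ValueError(f"database {name!r} already exists")

    with dblist.open("a") as handle:
        handle.write(name + "\n")

    (root / f"{name}.schema").touch()  # one schema file per database
    (root / name).mkdir()              # one directory per database
    return root / name


def table_dir(root, database, table):
    """Return the directory that holds the CSV file of a table."""
    return Path(root) / database / table


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(create_database(tmp, "shop"))
        print(table_dir(tmp, "shop", "orders"))
```

Moving to hadoop would mean replacing these local file operations with their HDFS equivalents.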
Note
- dblist.db is the file which contains the list of all the databases (there is only one such file)
- There is one schema file per database
- There is one directory for each database

I have commented as many important lines as possible. If you have any doubts, call me.