A Flask app designed for the exploration (fast retrieval and cleaning) of public data. Adapting this code into your own data cleaning and synthesizing tool should be doable with a basic knowledge of Python web development, specifically Flask.
As of now, the database is populated via a public data API that draws from UN and World Bank data sources and extracts, standardizes, and stores public data. These procs are stored in Vasco/ETL. I am hard at work making these scripts manageable and usable, so that any user with basic Python knowledge can retool this site to accept their own data.
Check out the demo on Heroku here
A special thanks to Blazing DB for the inspiration to start this.
Email me if you are interested in hacking on this or need more info at [email protected]. Below are some initial tips if you are interested in getting started:
- If you don't go here
-
To start, clone this repo and create a virtual environment. Activate it and run the usual
pip install -r requirements.txt
-
Note: for the lxml library to install properly, you will need the development versions of Python, libxml2, libxslt, and zlib.
** For Debian-based OSes like Ubuntu: sudo apt-get install python-dev libxml2-dev libxslt1-dev zlib1g-dev
** More info here: http://stackoverflow.com/questions/5178416/pip-install-lxml-error
- Create a Postgres database, note its name, username, and password, and add an environment variable DATABASE_URL equal to postgresql://user:password@localhost:5432/DBNAME.
** To do this, use the set command on Windows or export in bash. Add the command to the beginning of the activate script in your virtual environment; the virtual environment should not be pushed to GitHub with the source code, so the credentials stay local.
** For example, the last line of the "C:\...path...\venv_Vasco\Scripts\activate.bat" script in my virtual environment reads
set "DATABASE_URL=postgresql://user:password@localhost:5432/DBNAME"
Good short tutorial on this here
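One detail the connection string format above glosses over: a username or password containing characters such as "@", ":" or "/" has to be percent-encoded, or the URL will not parse the way you expect. Below is a minimal sketch of assembling the value for DATABASE_URL. The helper name build_database_url and its defaults are hypothetical and not part of this repo.

```python
from urllib.parse import quote


def build_database_url(user, password, dbname, host="localhost", port=5432):
    """Assemble a postgresql:// URL (hypothetical helper, not part of this repo)."""
    # safe="" makes quote() escape "@", ":" and "/" as well, so special
    # characters in the credentials cannot be mistaken for URL delimiters.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    # Paste the printed value into the set/export line of your activate script.
    print(build_database_url("user", "p@ss/word", "DBNAME"))
```

With a plain password the result is exactly the form shown above, postgresql://user:password@localhost:5432/DBNAME.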
- Start the app! python manage.py runserver
-
The ETL scripts use the database name set in the connection string to alert the user which database is being worked on. The connection is created from the DATABASE_URL environment variable set above, so each machine stores its own connection, whether to your local database or to the production one, and the code picks it up wherever it is running.
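As a rough illustration of the idea above, here is a sketch that reads DATABASE_URL and reports which database it points at, without echoing the password. The function name describe_target_database and the message wording are assumptions for illustration; the actual scripts in Vasco/ETL may do this differently.

```python
import os
from urllib.parse import urlsplit


def describe_target_database(environ=os.environ):
    """Return a short notice naming the database DATABASE_URL points at.

    Hypothetical helper for illustration; not taken from the repo.
    """
    url = environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; add it to your activate script")
    parts = urlsplit(url)
    # The database name is the URL path without its leading slash.
    dbname = parts.path.lstrip("/")
    # Only the name, host, and port are reported, never the credentials.
    return f"Working on database '{dbname}' at {parts.hostname}:{parts.port}"


if __name__ == "__main__":
    print(describe_target_database())
```

Passing a plain dict instead of os.environ makes the function easy to try without touching your real environment.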
-
To start populating the data, run the scripts stored in Vasco de Data\Vasco\ETL from a command line, starting with create_and_test_db.py.
-
See a full schema diagram at Vasco de Data\archive\diagrams\db schema