quick and dirty dives into various datasets
- download and install anaconda (it comes with its own python)
- test on Windows: press windows key, type
jupyter
and enter - if this does not open in browser, fix it
- if desired (saves 1-2 clicks), configure notebook directory
- download and install regular edition
- [optional] purchase: buy 1 year, receive the licence certificate email, click on activate software. This leads to their site, where you download the activation key (2300 characters ending with
==
). In Pycharm, Help > Register > activation code (should color green). - to configure interpreter for a new project: pick pure python. Then 1) new env using venv, or 2) existing interpreter (should show none) > click
...
> system interpreter >C:\Users\YOURNAME\Anaconda3\python.exe
.
- use
virtualenv
- default
%autoreload 2
in notebooks https://stackoverflow.com/a/5399339 (seems to not always work on Windows) - save dependencies
pip freeze > requirements.txt
- make modules in python, unit test them, and import them in notebooks
- high-level pickling: Joblib caches return values based on function name and parameters passed.
from sklearn.externals.joblib import Memory
memory = Memory(cachedir='/tmp', verbose=0)
@memory.cache
def computation(p1, p2):
...
- outlier removal 1-liner
- SettingWithCopyWarning warning from pandas: explanation and solutions