This is strictly a prototype connector for extracting information from Jupyter notebooks and other code files. The code is absolutely not hardened enough to be a generalized connector: the cataloging process may fail for very large notebooks or for edge cases that have not been tested.
Feel free to improve the code!
Extensions Covered (a filtering sketch follows this list):
- ipynb
- py
- r
- sql
- txt
- md
- c
- cpp
- xml
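As a rough illustration of the list above, file selection boils down to an extension check. This is a hypothetical sketch, not code from the connector; `SUPPORTED_EXTENSIONS` and `is_supported` are made-up names:

```python
# Hypothetical helper: decide whether the connector should catalog a file.
from pathlib import Path

# Mirrors the extension list above.
SUPPORTED_EXTENSIONS = {
    ".ipynb", ".py", ".r", ".sql", ".txt", ".md", ".c", ".cpp", ".xml",
}

def is_supported(path: str) -> bool:
    """True if the file's extension is one the connector handles."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```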
Requirements:
- Python 3
- The following Python libraries (install with `pip install pandas tqdm requests`):
  - pandas
  - tqdm
  - requests
All settings are to be placed in `config.py` (a complete example follows this list):
- Generate your own GitHub API token at: https://github.com/settings/tokens
- Be sure to give the token the correct permissions; you may have to experiment to find the right scopes. If repositories look empty, the grants are not correct (a quick token check is sketched after this list).
- Copy your key into `config.py` as the value for `APIKEY`
- Generate an Alation API refresh token
- Copy your key into `config.py` as the value for `API_REFRESH_TOKEN` (a sketch of exchanging it for an access token also follows this list)
- Add the user ID of the owner of the Alation API refresh token as the value for `API_USER_ID`
- Add the Alation URL for your instance (without the trailing '/') as the value for `ALATION_HOST`
- Create a virtual filesystem in Alation and grab its ID. For example, if the URL for your newly created virtual filesystem is http://ms-sandbox.alationbd.com/filesystem/1/ then the ID is `1`.
- Add the ID as the value for `DSID`
- If using a .cer file for Alation API calls, copy the file to the same directory, set `USING_CER_FILE` to "Y", and provide the .cer file name in the `CERTIFICATE` variable
- Run `python connectorProto.py`
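Putting the steps above together, `config.py` ends up looking roughly like this. All values below are placeholders; only the variable names come from the steps above:

```python
# config.py -- example values only; substitute your own.

# GitHub personal access token (https://github.com/settings/tokens)
APIKEY = "your_github_token_here"

# Alation API refresh token and the user ID of its owner
API_REFRESH_TOKEN = "your_alation_refresh_token_here"
API_USER_ID = 42  # placeholder

# Alation instance URL, without the trailing '/'
ALATION_HOST = "http://ms-sandbox.alationbd.com"

# ID of the virtual filesystem created in Alation
DSID = 1

# Set to "Y" and name the .cer file if API calls need a certificate
USING_CER_FILE = "N"
CERTIFICATE = "your_cert.cer"
```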
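If repositories come back empty, it helps to test the GitHub token on its own. Here is a minimal check, assuming `APIKEY` is already set in `config.py`; the snippet is illustrative and not part of the connector, though the `/user` endpoint and the `Authorization: token ...` header are standard GitHub REST API usage:

```python
# Sanity-check the GitHub token before running the connector.
import requests

from config import APIKEY

resp = requests.get(
    "https://api.github.com/user",
    headers={"Authorization": f"token {APIKEY}"},
)
resp.raise_for_status()  # a 401 here means the token is wrong or lacks scopes
print("Authenticated as:", resp.json()["login"])
```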
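Similarly, the refresh token, user ID, and host can be exercised together by requesting an API access token. The `createAPIAccessToken` endpoint below comes from Alation's public API documentation; treat the path and the response field name as assumptions to verify against your Alation version:

```python
# Exchange the Alation refresh token for a short-lived API access token.
import requests

from config import ALATION_HOST, API_REFRESH_TOKEN, API_USER_ID

resp = requests.post(
    f"{ALATION_HOST}/integration/api/v1/createAPIAccessToken/",
    json={"refresh_token": API_REFRESH_TOKEN, "user_id": API_USER_ID},
)
resp.raise_for_status()
print("Access token:", resp.json()["api_access_token"][:8] + "...")
```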
Known limitations:
- Only the master branch is currently extracted
- GitHub rate limits are not being considered (a possible mitigation is sketched below)
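Since rate limiting is an obvious area for improvement, note that GitHub reports the remaining quota via the `/rate_limit` endpoint and the `X-RateLimit-*` response headers. A minimal pre-flight check might look like this (a sketch, not connector code):

```python
# Wait out the GitHub rate limit before starting a large extraction.
import time

import requests

from config import APIKEY

resp = requests.get(
    "https://api.github.com/rate_limit",
    headers={"Authorization": f"token {APIKEY}"},
)
remaining = int(resp.headers["X-RateLimit-Remaining"])
reset_at = int(resp.headers["X-RateLimit-Reset"])  # Unix timestamp
if remaining == 0:
    time.sleep(max(0.0, reset_at - time.time()))
```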