exploratory/Clustering analysis and community detection.ipynb: the final notebook, which combines all intermediary work.
requirements.txt: all packages that need to be installed to run the downloader, parser and notebooks
src/
form_13f_downloader.py: downloads the 13F forms from the sec.gov website
form_13f_parser: parses both the XML and the tabular-formatted 13F files and saves their contents in data/all_submission_files.xlsx
cusip_to_ticker_converter.py: uses api.openfigi.com and marketwatch.com to fetch, for each CUSIP in the list, the corresponding ticker symbol and extra information. The ticker symbols are stored in data/all_submission_files2.xlsx; all metadata is stored in data/stock_info.json
exploratory/Read_the_extra_data.ipynb: reads data/stock_info.json, parses each company description to extract the year of foundation, and saves the result in data/investee_info.xlsx.
test/: unit tests for the 13F form parser
cleanup_notebook.sh: script that removes all output from the notebooks; run it before committing changes to the git repository.
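As an illustration of what removing all output from a notebook involves, here is a minimal Python sketch. It is not the contents of cleanup_notebook.sh, which may well call a tool such as nbconvert instead. It assumes the notebooks use the standard nbformat 4 JSON layout; the function names strip_outputs and strip_file are hypothetical.

```python
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts of all code cells (modifies in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: Path) -> None:
    """Rewrite one .ipynb file without its outputs (hypothetical helper)."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    strip_outputs(notebook)
    text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```

Under these assumptions, calling `strip_file(nb)` for each `nb` in `Path("exploratory").glob("*.ipynb")` would clean every notebook in that directory.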
exploratory/
exploratory/exploratory_data_analysis.ipynb: explores the 13F forms that we collected.
exploratory/networkX_community_detection_yearOfFoundation.ipynb, exploratory/networkX_community_detection_sector.ipynb: implement community detection using networkX_, grouping the investees by year of foundation and by industry/sector respectively.
clustering.py: supporting code for the kmeans clustering notebook.
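For readers unfamiliar with k-means, the following is a small, dependency-free sketch of the standard Lloyd iteration. It is not taken from clustering.py, whose contents are not described here and which may rely on a library instead. The function name kmeans and the choice to pass the initial centroids explicitly are assumptions made for illustration.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point gets the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

Passing the starting centroids in keeps the sketch deterministic; real implementations usually pick them randomly or with a seeding scheme such as k-means++.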