Software Engineering Trends on Docker Hub

An end-to-end framework that helps companies predict software engineering trends and helps developers learn more about a Docker image.

Our goal is to provide different companies with a dynamic dataset through which meaningful inferences can be made.

Our aim is to gather data from Docker Hub and analyse the trends. Docker Hub is a cloud-based repository in which Docker users and partners create, test, store and distribute container images.

This project was developed as part of coursework for Data-X at Berkeley.

Link to supporting presentation

Requirements

We use Conda to manage the environment and packages.

We use the following packages (among many others):

  • Python 3.6 or above
  • Pandas
  • Matplotlib
  • Plotly
  • Seaborn
  • boto3

To fetch new .json files from the AWS S3 bucket (this uses the AWS CLI):

cd data/
aws s3 sync s3://docker-recent recent-data
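
The same download could also be scripted with boto3, which is already in the package list. The following is a minimal sketch, not code from this repository: the function name `sync_bucket` is hypothetical, and it skips a file only when a local copy of the same size exists, which is a simplification of what `aws s3 sync` does (the CLI also compares timestamps).

```python
import os


def sync_bucket(bucket="docker-recent", dest="recent-data", client=None):
    """Download objects that are missing locally; return the keys downloaded.

    Hypothetical helper for illustration. `client` can be any object that
    offers the boto3 S3 client methods used below.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be passed instead

        client = boto3.client("s3")

    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent when a page (or the bucket) has no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip folder placeholder objects
            target = os.path.join(dest, key)
            # Assumption: an existing file of identical size is up to date.
            if os.path.exists(target) and os.path.getsize(target) == obj["Size"]:
                continue
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            client.download_file(bucket, key, target)
            downloaded.append(key)
    return downloaded
```

Calling `sync_bucket()` from inside `data/` would mirror the command above, provided AWS credentials with read access to the bucket are configured.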

Installation

1. Download this Repository

Start by downloading or cloning this repository.

git clone https://github.com/cshubhamrao/docker-hub-data.git
cd docker-hub-data

2. Create and Activate Environment

Create the conda environment from the environment.yml file:

conda env create -f environment.yml

Then activate the environment:

conda activate docker-hub
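
Once the environment is active, a short Python check can confirm that the packages listed under Requirements are importable. This is only a sketch: the helper names `missing_packages` and `python_ok` are hypothetical and are not part of this repository.

```python
import importlib.util
import sys

# Import names of the packages listed under Requirements.
REQUIRED = ["pandas", "matplotlib", "plotly", "seaborn", "boto3"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 6)):
    """Return True when the interpreter meets the minimum Python version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

If any package is reported missing, recreating the environment from environment.yml is the likely fix.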

3. Run Jupyter Lab

jupyter lab

Contents

  1. Data - This folder contains all the data files and folders that are generated or stored for later use. It is also where the plots generated by analytics.ipynb and other scripts are saved.
  2. Misc - Contains all the miscellaneous scripts that are required for this project.
  3. Scripts - This is the main folder. It contains the scripts we used to scrape the data, clean it, select the subset needed for analysis, and finally analyse it and draw inferences.

Team Members

System Architecture:

Architecture