For more background on this project, see solution.pdf and proposal.pdf.
- Main repository on GitHub: https://github.com/diegoami/DA_ML_Capstone
- Companion project: https://github.com/diegoami/DA_split_youtube_frames_s3.git
- Data: https://da-youtube-ml.s3.eu-central-1.amazonaws.com/
wget -nc https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_data_5.zip
unzip -qq -n wendy_cnn_frames_data_5.zip -d <YOUR SM_CHANNEL_TRAIN>
wget -nc https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_E69.zip
unzip -qq -n wendy_cnn_frames_E69.zip -d <YOUR TEST DIR>
## Optional: repeat the two steps above for episodes E71, E72, E73, E74, E75, E76, E77
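The optional step above can be scripted. A minimal sketch, assuming the per-episode archives follow the same naming pattern as E69; the loop prints the commands as a dry run, so remove the `echo`s to actually download and extract (replace `<YOUR TEST DIR>` with your own test directory):

```shell
# Dry run: print the download/extract commands for the remaining test episodes.
BASE_URL=https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames
for ep in E71 E72 E73 E74 E75 E76 E77; do
  echo wget -nc "${BASE_URL}/wendy_cnn_frames_${ep}.zip"
  echo unzip -qq -n "wendy_cnn_frames_${ep}.zip" -d "<YOUR TEST DIR>"
done
```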
Set SM_CHANNEL_TRAIN to the directory where you put the downloaded files, and SM_MODEL_DIR to the directory where you want to keep the generated models:
export SM_HOSTS=[]
export SM_CHANNEL_TRAIN=<YOUR-DATA-DIR>
export SM_MODEL_DIR=<YOUR-MODEL-DIR>
export SM_CURRENT_HOST=
Set up a Python environment using the libraries listed in requirements-freezed.txt.
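A minimal sketch of that setup, assuming Python 3 with the venv module and that requirements-freezed.txt sits in the repository root (the install is guarded so the sketch is safe to run from any directory):

```shell
# Create and activate an isolated environment, then install the pinned deps.
python3 -m venv .venv
. .venv/bin/activate
if [ -f requirements-freezed.txt ]; then
  pip install -r requirements-freezed.txt
fi
```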
Make sure that you pushd letsplay_classifier
before executing scripts locally, and set PYTHONPATH to the current directory.
pushd letsplay_classifier
PYTHONPATH=$(pwd) python pca/pca_sklearn.py --data-dir=/media/diego/QData/youtube_ml/wendy-cnn-5/frames/all/
popd
The following script trains a model that categorizes the frames in the training data directory:
pushd letsplay_classifier
PYTHONPATH=$(pwd) python train.py --epochs=5 --img-width=320 --img-height=180 --layer-cfg=B --batch-size=16
popd
This script verifies the model and prints a classification report and a confusion matrix over the whole dataset:
pushd letsplay_classifier
PYTHONPATH=$(pwd) python verify_model.py
popd
You can visualize the model and see how the principal components of the dataset are spread:
pushd letsplay_classifier
PYTHONPATH=$(pwd) python pca/pca_vgg.py
popd
To test how well the model splits videos into scenes, use the following command on a video whose frames were not used during training. Here, --data-dir is the directory of the images you want to classify.
pushd letsplay_classifier
PYTHONPATH=$(pwd) python interval/predict_intervals_walkdir.py --data-dir=<TEST-DIR>
popd
To deploy the model on SageMaker, check the Jupyter notebooks, especially CNN_Third_iteration.ipynb, and follow the instructions in them.