Docker image will have python 3.8.5. The scorer.py
is tested with the same python version. You will need to install Docker and docker-compose to your working machine.
- Run
docker-compose -f docker-compose.train.yaml up
. This should produce asubmission/model.joblib
file with trained model and a tar archive with your model ingenerated_submissions
folder. First time docker-compose could run for some time to build the docker image. Next time it will be much quicker. - Run
docker-compose -f docker-compose.test.yaml up
. This will produce a .csv file with your predictions ingenerated_submissions
folder.
Both steps should run without any errors. Test and train data samples required to run this could be found in tests/
folder.
- Review
docker-compose.train.yaml
anddocker-compose.test.yaml
and make sure you understand how your folders with submission and data are mapped inside the container. If your train/val data or submission files are located in different directories, make changes in docker-compose file. To speed up initial debug, you could remove memory & cpu resource constraints. - Run following:
docker-compose -f docker-compose.train.yaml up
. You can of course train your model without any docker container (read on that below), but using docker ensures that the environment in which you train it remains the same as in the Yandex.Contest. This prevents some errors, arising from the library version differences. - Run following:
docker-compose -f docker-compose.test.yaml up
. During the execution you can track the CPU and Memory resources used withdocker stats
. If you exceed memory/cpu resources or time limit here, there is a good chance that your submission will fail to run when submitted to Yandex.Contest. Note that different CPU could run your solution for different amount of time, thus running time on your machine and Yandex.Contest will probably differ. Otherwise, if this executes succesfully (namely, the submission file is saved ingenerated_submissions/
folder), then it should work correctly when you send this archive to the Yandex.Contest. But still, because the test_data (which is not available to you directly, but only be provided for your submission in Yandex.Contest) could be different from the train_data, you could encounter other errors there. The obvious one would happen if you somehow expect the target column to be available in the test data. You could check the test data example intests/test_data_sample
folder - but the only difference in data schema are the missing target columns in thefunnel.csv
. - Now you can use generated submission file to check if metric is calculated correctly. You can first split data to train and val parts and calculate the exact score on your validation part. Using the scorer.py script, for example:
python scorer.py tests/y_true.csv tests/y_pred.csv
You have at least two options here:
- You could build and run docker container, and develop your model inside it (for example, see VSCode tutorial to do that https://code.visualstudio.com/docs/remote/containers). You could start a container using:
docker-compose -f docker-compose.train.yaml run app bash
- You could create new virtual env and install requirements.txt there. For this purpose you could any tool you like to manage virtual envs - for example: virtualenv, conda, pipenv, poetry.
-
Note that you need to edit both
train_model.py
andgenerate_submission.py
if you are making changes in one of them. For example, if you use LogisticRegression instead of SimpleModel while training, you need to do the same while testing. Thus, it is convenient to keep all model and data processing logic inSimpleModel.py
. -
Note that a submission archive name is displayed in "Compilation log" (Лог компиляции). You could use that to identify which exact archive was sent in the particular submission. The other way around is to download your submission.
- Running the solution on Windows we encountered a bug when sometimes submission folder caches inside the docker image (which is not supposed to happen). In that case try to add
--build
flag to your docker-compose command.