Example solution for IDAO 2021 Finals
Docker image will have python 3.8.5. The scorer.py
is tested with the same python version. You will need to install Docker and docker-compose to your working machine.
docker-compose -f docker-compose.train.yaml up
. This should produce a submission/model.joblib
file with trained model and a tar archive with your model in generated_submissions
folder. First time docker-compose could run for some time to build the docker image. Next time it will be much quicker.docker-compose -f docker-compose.test.yaml up
. This will produce a .csv file with your predictions in generated_submissions
folder.Both steps should run without any errors. Test and train data samples required to run this could be found in tests/
folder.
docker-compose.train.yaml
and docker-compose.test.yaml
and make sure you understand how your folders with submission and data are mapped inside the container. If your train/val data or submission files are located in different directories, make changes in docker-compose file. To speed up initial debug, you could remove memory & cpu resource constraints.docker-compose -f docker-compose.train.yaml up
. You can of course train your model without any docker container (read on that below), but using docker ensures that the environment in which you train it remains the same as in the Yandex.Contest. This prevents some errors, arising from the library version differences.docker-compose -f docker-compose.test.yaml up
. During the execution you can track the CPU and Memory resources used with docker stats
. If you exceed memory/cpu resources or time limit here, there is a good chance that your submission will fail to run when submitted to Yandex.Contest. Note that different CPU could run your solution for different amount of time, thus running time on your machine and Yandex.Contest will probably differ.generated_submissions/
folder), then it should work correctly when you send this archive to the Yandex.Contest. But still, because the test_data (which is not available to you directly, but only be provided for your submission in Yandex.Contest) could be different from the train_data, you could encounter other errors there. The obvious one would happen if you somehow expect the target column to be available in the test data. You could check the test data example in tests/test_data_sample
folder - but the only difference in data schema are the missing target columns in the funnel.csv
.
python scorer.py tests/y_true.csv tests/y_pred.csv
You have at least two options here:
docker-compose -f docker-compose.train.yaml run app bash
Note that you need to edit both train_model.py
and generate_submission.py
if you are making changes in one of them. For example, if you use LogisticRegression instead of SimpleModel while training, you need to do the same while testing. Thus, it is convenient to keep all model and data processing logic in SimpleModel.py
.
Note that a submission archive name is displayed in “Compilation log” (Лог компиляции). You could use that to identify which exact archive was sent in the particular submission. The other way around is to download your submission.
--build
flag to your docker-compose command.