This repository has been archived by the owner on Feb 20, 2024. It is now read-only.
Random UUID file name causes potential "No module named xxx" error when loading parameters #159
Labels
bug
I encountered a "No module named xxx" error when load_parameters of my model is called while launching an inference job. Here is the error trace:
2019-07-10 02:21:07,256 rafiki.utils.service INFO Starting worker "75be99ec25a6" for service of ID "614d740e-9791-4c64-aafe-dc17cf7e7866"...
2019-07-10 02:21:07,511 rafiki.worker.inference INFO Starting inference worker for service of id 614d740e-9791-4c64-aafe-dc17cf7e7866...
2019-07-10 02:21:07,519 rafiki.cache.cache INFO add_worker_of_inference_job:INFERENCE_WORKERS_b6592484-deb4-4df2-bce3-ffc82d9a125a=614d740e-9791-4c64-aafe-dc17cf7e7866
2019-07-10 02:21:09,131 rafiki.utils.service ERROR Error while running worker:
2019-07-10 02:21:09,131 rafiki.utils.service ERROR Traceback (most recent call last):
File "/root/rafiki/utils/service.py", line 31, in run_worker
start_worker(service_id, service_type, container_id)
File "scripts/start_worker.py", line 24, in start_worker
worker.start()
File "/root/rafiki/worker/inference.py", line 41, in start
self._model = self._load_model(trial_id)
File "/root/rafiki/worker/inference.py", line 91, in _load_model
model_inst.load_parameters(parameters)
File "/root/e4568ce2-9d44-47b8-ac7f-1e8143168140.py", line 235, in load_parameters
ModuleNotFoundError: No module named '797342b4-9d38-432f-91f6-727eac25db71'
After debugging, I believe this is a bug in how Rafiki interacts with pickle. It is triggered by pickling instances of self-defined classes (classes defined in the model source code).
Pickle requires the pickled object's class to be importable during pickle.loads(), using the same import path that was recorded during pickle.dumps(). However, each time a train trial or inference job is launched, the model source code file is given a random UUID as its name. As a result, the import path differs between dumping and loading.
This bug has not surfaced so far because the models currently in Rafiki only pickle instances of imported classes or Python "primitives", whose import paths are consistent.
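The mismatch can be reproduced outside Rafiki with a minimal sketch. Everything here is an assumption made for illustration: the Helper class, the model.py file, and the import_as_random_module function are hypothetical, and loading the file under a fresh UUID module name only approximates what Rafiki does per job.

```python
import importlib.util
import pickle
import sys
import tempfile
import uuid
from pathlib import Path

# Hypothetical model source containing a self-defined class.
MODEL_SOURCE = """
class Helper:
    def __init__(self, value):
        self.value = value
"""


def import_as_random_module(path):
    """Import a source file under a fresh UUID module name (assumed per-job behaviour)."""
    name = str(uuid.uuid4())
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # pickle.dumps() must be able to look the class up
    spec.loader.exec_module(module)
    return name, module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.py"
    path.write_text(MODEL_SOURCE)

    # "Train" side: the pickle records the UUID module name as the class's import path.
    train_name, train_module = import_as_random_module(path)
    payload = pickle.dumps(train_module.Helper(42))

    # Simulate a new process: the old module name is no longer registered.
    del sys.modules[train_name]

    # "Inference" side: the same source is imported under a different UUID name.
    infer_name, infer_module = import_as_random_module(path)
    try:
        pickle.loads(payload)
        error = None
    except ModuleNotFoundError as exc:
        error = exc  # names the module used at dump time, not the current one

    print(error)
```

The printed message names the UUID used at dump time, which matches the shape of the error in the trace above.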
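For contrast, here is a sketch of the case that works, under the same assumptions as any toy reproduction: the Model class, its dump_parameters method, and the import_as_random_module function are hypothetical stand-ins, not Rafiki's actual code. Because only an imported class (OrderedDict) and primitives are pickled, the recorded import path does not depend on the random module name.

```python
import importlib.util
import tempfile
import uuid
from pathlib import Path

# Hypothetical model source that pickles only imported classes and primitives.
MODEL_SOURCE = '''
import pickle
from collections import OrderedDict

class Model:
    def __init__(self):
        self.weights = OrderedDict()

    def dump_parameters(self):
        # The pickle references "collections.OrderedDict", never this module.
        return pickle.dumps(self.weights)

    def load_parameters(self, payload):
        self.weights = pickle.loads(payload)
'''


def import_as_random_module(path):
    """Import a source file under a fresh UUID module name (assumed per-job behaviour)."""
    name = str(uuid.uuid4())
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.py"
    path.write_text(MODEL_SOURCE)

    # "Train" side: dump parameters from a model loaded under one UUID name.
    trained = import_as_random_module(path).Model()
    trained.weights["layer1"] = [0.1, 0.2]
    payload = trained.dump_parameters()

    # "Inference" side: load them into a model imported under another UUID name.
    restored = import_as_random_module(path).Model()
    restored.load_parameters(payload)
```

Loading succeeds here even though the two Model classes live in differently named modules, since the pickled payload never refers to either of them.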
Potential fix for this bug could be:
Thank you!