ModuleNotFoundError: No module named 'xfuse' #70

Open
NicolaasVanRenne opened this issue Jul 6, 2023 · 12 comments

Comments

@NicolaasVanRenne

NicolaasVanRenne commented Jul 6, 2023

I am trying to run xfuse on the supercomputer system; I'm not an expert but I do manage to run other programs.

When I run xfuse
xfuse convert st --counts section1.tsv --image section1.jpg --transformation-matrix section1-alignment.txt --scale 0.15 --save-path section1

I get the following:

Traceback (most recent call last):
  File "/mydir/xfuse/bin/xfuse", line 5, in <module>
    from xfuse.main import cli
ModuleNotFoundError: No module named 'xfuse'

So the problem here seems to be that it thinks __main__ is located in mydir/xfuse/bin/xfuse, while in reality it is located in mydir/xfuse/.
The export path is set to the bin directory; otherwise it doesn't find xfuse. But now it seems like the other files cannot be found.

What am I doing wrong here???

Kind regards,
Nicolaas

@ludvb
Owner

ludvb commented Jul 7, 2023

Hi Nicolaas,

The error message seems to indicate that xfuse wasn't found in your python environment. It could be that something failed during the installation and that xfuse didn't get installed properly or that there is some kind of environment conflict. Can you try running the installation command again and post the output here? If it completes without errors, what is the output of python -c 'import xfuse'?
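A quick way to see what the interpreter can and cannot find is sketched below. It only uses the standard library; the helper name `locate` is made up for this illustration and is not part of xfuse. Running it with the same `python` that the `xfuse` script uses should show whether the install directory is on the search path at all.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would load from, or None if it is not found.

    (Namespace packages also report None here, since they have no single file.)
    """
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Which interpreter is running, and which directories it searches for imports.
    print("interpreter:", sys.executable)
    for entry in sys.path:
        print("search path:", entry)
    # None here corresponds to the ModuleNotFoundError in the traceback.
    print("xfuse found at:", locate("xfuse"))
```

If the directory that contains the `xfuse` package is missing from the printed search path, the import will fail no matter how the installation itself went.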

@NicolaasVanRenne
Author

To install I run
pip install --target=$VSC_DATA/XFuse git+https://github.com/ludvb/xfuse@master

pip install --target=$VSC_DATA/XFuse git+https://github.com/ludvb/xfuse@master
Collecting git+https://github.com/ludvb/xfuse@master
Cloning https://github.com/ludvb/xfuse (to revision master) to /tmp/pip-req-build-0qzs29rz
Running command git clone -q https://github.com/ludvb/xfuse /tmp/pip-req-build-0qzs29rz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Collecting matplotlib<4.0.0,>=3.3.2
Downloading matplotlib-3.7.2-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (9.2 MB)
|████████████████████████████████| 9.2 MB 5.5 MB/s
Collecting tomlkit<0.8.0,>=0.7.0
Downloading tomlkit-0.7.2-py2.py3-none-any.whl (32 kB)
Collecting click<8.0.0,>=7.1.2
Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
|████████████████████████████████| 82 kB 1.0 MB/s
Collecting torchvision<0.10.0,>=0.9.1
Downloading torchvision-0.9.1-cp38-cp38-manylinux1_x86_64.whl (17.4 MB)
|████████████████████████████████| 17.4 MB 58.5 MB/s
Collecting tabulate<0.9.0,>=0.8.7
Downloading tabulate-0.8.10-py3-none-any.whl (29 kB)
Collecting tqdm<5.0.0,>=4.51.0
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
|████████████████████████████████| 77 kB 5.9 MB/s
Collecting tifffile<2021.0.0,>=2020.10.1
Downloading tifffile-2020.12.8-py3-none-any.whl (157 kB)
|████████████████████████████████| 157 kB 60.2 MB/s
Collecting scikit-learn<0.25.0,>=0.24.2
Downloading scikit_learn-0.24.2-cp38-cp38-manylinux2010_x86_64.whl (24.9 MB)
|████████████████████████████████| 24.9 MB 57.6 MB/s
Collecting pandas<2.0.0,>=1.1.4
Downloading pandas-1.5.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
|████████████████████████████████| 12.2 MB 57.3 MB/s
Collecting numpy<2.0.0,>=1.19.4
Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
|████████████████████████████████| 17.3 MB 61.2 MB/s
Collecting torch<2.0.0,>=1.8.1
Downloading torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
|████████████████████████████████| 887.4 MB 3.4 kB/s
Collecting Pillow<10.0.0,>=9.0.1
Downloading Pillow-9.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
|████████████████████████████████| 3.3 MB 62.7 MB/s
Collecting opencv-python<5.0.0,>=4.4.0
Downloading opencv_python-4.8.0.74-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.7 MB)
|████████████████████████████████| 61.7 MB 592 kB/s
Collecting pyro-ppl<1.6.0,>=1.5.0
Downloading pyro_ppl-1.5.2-py3-none-any.whl (607 kB)
|████████████████████████████████| 607 kB 59.6 MB/s
Collecting h5py<4.0.0,>=3.0.0
Downloading h5py-3.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
|████████████████████████████████| 4.8 MB 65.7 MB/s
Collecting scipy<2.0.0,>=1.5.4
Downloading scipy-1.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
|████████████████████████████████| 34.5 MB 61.0 MB/s
Collecting tensorboard<3.0.0,>=2.5.0
Downloading tensorboard-2.13.0-py3-none-any.whl (5.6 MB)
|████████████████████████████████| 5.6 MB 62.1 MB/s
Collecting imageio<3.0.0,>=2.9.0
Downloading imageio-2.31.1-py3-none-any.whl (313 kB)
|████████████████████████████████| 313 kB 62.4 MB/s
Collecting pyparsing<3.1,>=2.3.1
Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
|████████████████████████████████| 98 kB 8.9 MB/s
Collecting packaging>=20.0
Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.4.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
|████████████████████████████████| 1.2 MB 60.6 MB/s
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting importlib-resources>=3.2.0; python_version < "3.10"
Using cached importlib_resources-5.12.0-py3-none-any.whl (36 kB)
Collecting contourpy>=1.0.1
Downloading contourpy-1.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (300 kB)
|████████████████████████████████| 300 kB 59.8 MB/s
Collecting fonttools>=4.22.0
Downloading fonttools-4.40.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)
|████████████████████████████████| 4.4 MB 61.7 MB/s
Collecting python-dateutil>=2.7
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
|████████████████████████████████| 247 kB 986 kB/s
Collecting threadpoolctl>=2.0.0
Downloading threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting joblib>=0.11
Downloading joblib-1.3.1-py3-none-any.whl (301 kB)
|████████████████████████████████| 301 kB 63.9 MB/s
Collecting pytz>=2020.1
Downloading pytz-2023.3-py2.py3-none-any.whl (502 kB)
|████████████████████████████████| 502 kB 66.0 MB/s
Collecting nvidia-cuda-runtime-cu11==11.7.99; platform_system == "Linux"
Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
|████████████████████████████████| 849 kB 65.4 MB/s
Collecting nvidia-cublas-cu11==11.10.3.66; platform_system == "Linux"
Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
|████████████████████████████████| 317.1 MB 60 kB/s
Collecting nvidia-cuda-nvrtc-cu11==11.7.99; platform_system == "Linux"
Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
|████████████████████████████████| 21.0 MB 64.1 MB/s
Collecting typing-extensions
Downloading typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Collecting nvidia-cudnn-cu11==8.5.0.96; platform_system == "Linux"
Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
|████████████████████████████████| 557.1 MB 20 kB/s
Collecting pyro-api>=0.1.1
Downloading pyro_api-0.1.2-py3-none-any.whl (11 kB)
Collecting opt-einsum>=2.3.2
Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
|████████████████████████████████| 65 kB 3.8 MB/s
Collecting markdown>=2.6.8
Downloading Markdown-3.4.3-py3-none-any.whl (93 kB)
|████████████████████████████████| 93 kB 2.5 MB/s
Collecting absl-py>=0.4
Downloading absl_py-1.4.0-py3-none-any.whl (126 kB)
|████████████████████████████████| 126 kB 62.1 MB/s
Collecting werkzeug>=1.0.1
Downloading Werkzeug-2.3.6-py3-none-any.whl (242 kB)
|████████████████████████████████| 242 kB 54.4 MB/s
Collecting requests<3,>=2.21.0
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting setuptools>=41.0.0
Downloading setuptools-68.0.0-py3-none-any.whl (804 kB)
|████████████████████████████████| 804 kB 57.9 MB/s
Collecting google-auth<3,>=1.6.3
Downloading google_auth-2.21.0-py2.py3-none-any.whl (182 kB)
|████████████████████████████████| 182 kB 60.8 MB/s
Collecting tensorboard-data-server<0.8.0,>=0.7.0
Downloading tensorboard_data_server-0.7.1-py3-none-manylinux2014_x86_64.whl (6.6 MB)
|████████████████████████████████| 6.6 MB 58.9 MB/s
Collecting protobuf>=3.19.6
Downloading protobuf-4.23.4-cp37-abi3-manylinux2014_x86_64.whl (304 kB)
|████████████████████████████████| 304 kB 60.0 MB/s
Collecting google-auth-oauthlib<1.1,>=0.5
Downloading google_auth_oauthlib-1.0.0-py2.py3-none-any.whl (18 kB)
Collecting grpcio>=1.48.2
Downloading grpcio-1.56.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
|████████████████████████████████| 5.2 MB 66.0 MB/s
Collecting wheel>=0.26
Downloading wheel-0.40.0-py3-none-any.whl (64 kB)
|████████████████████████████████| 64 kB 2.6 MB/s
Collecting zipp>=3.1.0; python_version < "3.10"
Using cached zipp-3.15.0-py3-none-any.whl (6.8 kB)
Collecting six>=1.5
Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting importlib-metadata>=4.4; python_version < "3.10"
Using cached importlib_metadata-6.7.0-py3-none-any.whl (22 kB)
Collecting MarkupSafe>=2.1.1
Downloading MarkupSafe-2.1.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting urllib3<3,>=1.21.1
Downloading urllib3-2.0.3-py3-none-any.whl (123 kB)
|████████████████████████████████| 123 kB 57.3 MB/s
Collecting charset-normalizer<4,>=2
Using cached charset_normalizer-3.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195 kB)
Collecting idna<4,>=2.5
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting rsa<5,>=3.1.4
Downloading rsa-4.9-py3-none-any.whl (34 kB)
Collecting pyasn1-modules>=0.2.1
Downloading pyasn1_modules-0.3.0-py2.py3-none-any.whl (181 kB)
|████████████████████████████████| 181 kB 69.4 MB/s
Collecting cachetools<6.0,>=2.0.0
Downloading cachetools-5.3.1-py3-none-any.whl (9.3 kB)
Collecting requests-oauthlib>=0.7.0
Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting pyasn1>=0.1.3
Downloading pyasn1-0.5.0-py2.py3-none-any.whl (83 kB)
|████████████████████████████████| 83 kB 2.3 MB/s
Collecting oauthlib>=3.0.0
Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
|████████████████████████████████| 151 kB 60.8 MB/s
Building wheels for collected packages: xfuse
Building wheel for xfuse (PEP 517) ... done
Created wheel for xfuse: filename=xfuse-0.2.1-py3-none-any.whl size=87088 sha256=e1ba9840134da93427525f650357622087858c8c8cb5e69420d100f185a0d6c4
Stored in directory: /tmp/pip-ephem-wheel-cache-8rbbt0hq/wheels/20/58/f1/d7191214548dcce67218fdd3a25cc01da0c61d126ac43a9755
Successfully built xfuse
ERROR: torchvision 0.9.1 has requirement torch==1.8.1, but you'll have torch 1.13.1 which is incompatible.
ERROR: google-auth 2.21.0 has requirement urllib3<2.0, but you'll have urllib3 2.0.3 which is incompatible.
Installing collected packages: pyparsing, packaging, kiwisolver, numpy, cycler, Pillow, zipp, importlib-resources, contourpy, fonttools, six, python-dateutil, matplotlib, tomlkit, click, wheel, setuptools, nvidia-cuda-runtime-cu11, nvidia-cublas-cu11, nvidia-cuda-nvrtc-cu11, typing-extensions, nvidia-cudnn-cu11, torch, torchvision, tabulate, tqdm, tifffile, threadpoolctl, joblib, scipy, scikit-learn, pytz, pandas, opencv-python, pyro-api, opt-einsum, pyro-ppl, h5py, importlib-metadata, markdown, absl-py, MarkupSafe, werkzeug, certifi, urllib3, charset-normalizer, idna, requests, pyasn1, rsa, pyasn1-modules, cachetools, google-auth, tensorboard-data-server, protobuf, oauthlib, requests-oauthlib, google-auth-oauthlib, grpcio, tensorboard, imageio, xfuse
Successfully installed MarkupSafe-2.1.3 Pillow-9.5.0 absl-py-1.4.0 cachetools-5.3.1 certifi-2023.5.7 charset-normalizer-3.1.0 click-7.1.2 contourpy-1.1.0 cycler-0.11.0 fonttools-4.40.0 google-auth-2.21.0 google-auth-oauthlib-1.0.0 grpcio-1.56.0 h5py-3.9.0 idna-3.4 imageio-2.31.1 importlib-metadata-6.7.0 importlib-resources-5.12.0 joblib-1.3.1 kiwisolver-1.4.4 markdown-3.4.3 matplotlib-3.7.2 numpy-1.24.4 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 oauthlib-3.2.2 opencv-python-4.8.0.74 opt-einsum-3.3.0 packaging-23.1 pandas-1.5.3 protobuf-4.23.4 pyasn1-0.5.0 pyasn1-modules-0.3.0 pyparsing-3.0.9 pyro-api-0.1.2 pyro-ppl-1.5.2 python-dateutil-2.8.2 pytz-2023.3 requests-2.31.0 requests-oauthlib-1.3.1 rsa-4.9 scikit-learn-0.24.2 scipy-1.10.1 setuptools-68.0.0 six-1.16.0 tabulate-0.8.10 tensorboard-2.13.0 tensorboard-data-server-0.7.1 threadpoolctl-3.1.0 tifffile-2020.12.8 tomlkit-0.7.2 torch-1.13.1 torchvision-0.9.1 tqdm-4.65.0 typing-extensions-4.7.1 urllib3-2.0.3 werkzeug-2.3.6 wheel-0.40.0 xfuse-0.2.1 zipp-3.15.0
WARNING: You are using pip version 20.0.2; however, version 23.1.2 is available.
You should consider upgrading via the '/apps/antwerpen/broadwell/centos8/Python/3.8.3-intel-2020a/bin/python3.8 -m pip install --upgrade pip' command.

Errors thrown are in bold.
When I do as asked and run python -c 'import xfuse', I get:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'xfuse'

I really think it's a problem with me not knowing how to handle GitHub installations on the supercomputer. I also asked the supercomputer support team for help. I will keep you updated from my side, but if you know what the problem is, please let me know.

Nicolaas

@ludvb
Owner

ludvb commented Jul 7, 2023

Thanks for the update Nicolaas, do let us know what you hear from the support team.

I haven't used the --target option before with pip, but I'm wondering if you maybe need to also add the install path to your PYTHONPATH environment variable? This is very much a guess, but can you try running something like PYTHONPATH+=:$VSC_DATA/XFuse python -c 'import xfuse' and see if you still get the same error?

@NicolaasVanRenne
Author

I ran PYTHONPATH+=:$VSC_DATA/XFuse python -c 'import xfuse'

and got this answer:

/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
return self._float_to_str(self.smallest_subnormal)
/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
return self._float_to_str(self.smallest_subnormal)
/data/antwerpen/208/vsc20830/XFuse/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).
from pandas.core.computation.check import NUMEXPR_INSTALLED

when running xfuse convert st --counts section1.tsv --image section1.jpg --transformation-matrix section1-alignment.txt --scale 0.15 --save-path section1

It throws the same error:

Traceback (most recent call last):
  File "/data/antwerpen/208/vsc20830/XFuse/bin/xfuse", line 5, in <module>
    from xfuse.main import cli
ModuleNotFoundError: No module named 'xfuse'

@ludvb
Owner

ludvb commented Jul 7, 2023

Great! I think those warnings are fine, at least it seems the import is working with the new PYTHONPATH. This modifies the environment for the prefixed command only AFAIK. Can you try prefixing the xfuse convert by PYTHONPATH+=:$VSC_DATA/XFuse too (or run export PYTHONPATH+=:$VSC_DATA/XFuse before)?
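As a rough Python analogy for the difference between prefixing one command and exporting: passing an environment to a single child process affects only that child, while changing the current process's environment affects every child started afterwards. The variable name `DEMO_INSTALL_DIR` and the path `/some/dir` are placeholders for this sketch, not anything xfuse reads.

```python
import os
import subprocess
import sys

VAR = "DEMO_INSTALL_DIR"  # hypothetical variable name, used only in this demo
CHILD = [sys.executable, "-c", f"import os; print(os.environ.get('{VAR}', 'unset'))"]


def child_sees(env=None):
    """Start a child interpreter and report the value of VAR that it sees."""
    done = subprocess.run(CHILD, env=env, capture_output=True, text=True, check=True)
    return done.stdout.strip()


os.environ.pop(VAR, None)  # start from a clean state

# Like `VAR=/some/dir command`: only this one child gets the variable.
one_off = child_sees({**os.environ, VAR: "/some/dir"})
# The next child does not see it.
afterwards = child_sees()

# Like `export VAR=/some/dir`: every later child inherits it.
os.environ[VAR] = "/some/dir"
exported = child_sees()

print(one_off, afterwards, exported)
```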

@NicolaasVanRenne
Author

NicolaasVanRenne commented Jul 7, 2023

I run

export PYTHONPATH+=:$VSC_DATA/XFuse
xfuse convert st --counts section1.tsv --image section1.jpg --transformation-matrix section1-alignment.txt --scale 0.15 --save-path section1

It says:

-bash: xfuse: command not found

I don't understand what you mean by prefixing, but I assumed you meant this: PYTHONPATH+=:$VSC_DATA/XFuse/xfuse convert st --counts section1.tsv --image section1.jpg --transformation-matrix section1-alignment.txt --scale 0.15 --save-path section1

Then it says:

-bash: convert: command not found

So it finds xfuse but not convert? I don't understand any of this. When I was a kid we had MS-DOS and that thing just worked fine :/

R is a bit tricky in the beginning, but then it's just child's play.

Give me these Linux systems and I'm caught in a never-ending shitstorm of errors... This is so frustrating... I just don't understand the underlying data structure

@ludvb
Owner

ludvb commented Jul 7, 2023

Hah, yes I feel you, package and dependency management in Linux can indeed be frustrating at times :)

That first error suggests that your shell can no longer find the xfuse executable, maybe you are running the command from a different directory? Alternatively you can always write out the full path: /data/antwerpen/208/vsc20830/XFuse/bin/xfuse convert st ...

That second command looks good but I think you are missing a space between the environment modifier and the command: PYTHONPATH+=:$VSC_DATA/XFuse/ xfuse convert st .... Without the space, xfuse is a part of the modifier and the shell tries to invoke the command convert instead.

My understanding here is that Python is reading the environment variable PYTHONPATH in order to figure out where to look for package modules. So when using a non-default install path (with pip install --target ...), we need to append that to the PYTHONPATH so that Python will know where to look.
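To make that concrete, here is a small self-contained sketch. It writes a throwaway module into a temporary directory (standing in for a `pip install --target` directory) and checks whether a fresh interpreter can import it with and without that directory on PYTHONPATH. The module name `target_demo_module` and the helper `can_import` are invented for the example.

```python
import os
import subprocess
import sys
import tempfile


def can_import(module_name, extra_path=None):
    """Return True if a fresh interpreter can import module_name."""
    env = dict(os.environ)
    if extra_path is not None:
        # Prepend the directory, keeping whatever PYTHONPATH was already set.
        parts = [extra_path, env.get("PYTHONPATH")]
        env["PYTHONPATH"] = os.pathsep.join(p for p in parts if p)
    done = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,
        stderr=subprocess.DEVNULL,
    )
    return done.returncode == 0


with tempfile.TemporaryDirectory() as target:
    # Stand-in for a package installed into a non-default directory.
    with open(os.path.join(target, "target_demo_module.py"), "w") as handle:
        handle.write("VALUE = 1\n")

    without_path = can_import("target_demo_module")
    with_path = can_import("target_demo_module", extra_path=target)

print("importable without PYTHONPATH:", without_path)
print("importable with PYTHONPATH:", with_path)
```

The same reasoning would apply to the `bin/xfuse` script: it is an ordinary Python program, so its `import xfuse` line can only succeed if the directory holding the package is on the search path when the script starts.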

@NicolaasVanRenne
Author

Calling /data/antwerpen/208/vsc20830/XFuse/bin/xfuse convert --counts section1.tsv --image section1.jpg --transformation-matrix section1-alignment.txt --scale 0.15 --save-path section1 actually worked!!!!

well... maybe not 100% but we are getting there.

Now it throws the following error:

/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
return self._float_to_str(self.smallest_subnormal)
/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
setattr(self, word, getattr(machar, word).flat[0])
/data/antwerpen/208/vsc20830/XFuse/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
return self._float_to_str(self.smallest_subnormal)
/data/antwerpen/208/vsc20830/XFuse/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).
from pandas.core.computation.check import NUMEXPR_INSTALLED
Usage: xfuse convert [OPTIONS] COMMAND [ARGS]...
Try 'xfuse convert --help' for help.

Error: no such option: --counts

What does it mean by 'no such option: --counts'? Am I doing something wrong in the syntax?

@ludvb
Owner

ludvb commented Jul 7, 2023

Great, looks promising :) xfuse convert has different subcommands for the type of data you are converting (st or visium). So I think we will be fine if you just add st now to that command (i.e., xfuse convert st --counts ...)
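The nesting can be illustrated with a toy command-line parser. This is a sketch using Python's standard `argparse`, not xfuse's actual CLI (which, judging by the usage message and the pip output above, appears to be built on click); the program name `tool` and the helper names are made up. The point is only that an option such as `--counts` belongs to the innermost subcommand, so the parser rejects it when that subcommand is left out.

```python
import argparse


def build_parser():
    """Toy parser with the shape `tool convert {st,visium} --counts FILE`."""
    parser = argparse.ArgumentParser(prog="tool")
    commands = parser.add_subparsers(dest="command", required=True)
    convert = commands.add_parser("convert")
    kinds = convert.add_subparsers(dest="kind", required=True)
    for kind in ("st", "visium"):
        sub = kinds.add_parser(kind)
        # --counts is defined on the data-type subcommand, not on `convert`.
        sub.add_argument("--counts", required=True)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or the exit code if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code


ok = try_parse(["convert", "st", "--counts", "section1.tsv"])
bad = try_parse(["convert", "--counts", "section1.tsv"])  # data type missing
```

Here `ok` is a namespace with `kind` set to `st`, while `bad` is the usage-error exit code, which is the toy equivalent of the "no such option" message.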

@NicolaasVanRenne
Author

omfg its running

*crying emoji

@NicolaasVanRenne
Author

Right. Now the next challenge; getting the GPU operational ;)

Thanks for the help thus far! Much appreciated!!!

Quick question: if you stop and re-run the analysis, you seem to start a new epoch. Does that affect the quality in any way?

And if I want to run the analysis in a shorter time-frame (quick and dirty, just to see some output fast instead of running for multiple days), would reducing the epoch number from 100k to 10k do the trick? Or would it be better to reduce the scale to .01 during pre-processing?

@ludvb
Owner

ludvb commented Jul 8, 2023

> Right. Now the next challenge; getting the GPU operational ;)

I'd guess this depends on the HPC environment (if you are using a workload manager etc.). For the Python installation, just make sure that you have a torch version with cuda support (visible as "+cuXX" in the version string):

$ python -c 'import torch; print(torch.__version__)'
1.13.1+cu117

You can check if the GPU is available like so:

$ python -c 'import torch; print(torch.cuda.is_available())'
True
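The two checks can be combined into one small script, sketched below. The helper names `cuda_tag` and `describe_torch` are invented here. One caveat, stated as an assumption rather than a certainty for any given build: some torch wheels bundle CUDA without carrying a "+cuXX" suffix in the version string, so a missing suffix alone does not rule out GPU support; `torch.version.cuda` and `torch.cuda.is_available()` are the more direct signals.

```python
import re


def cuda_tag(version):
    """Return the 'cuXX' suffix of a torch version string, or None if absent."""
    match = re.search(r"\+(cu\d+)$", version)
    return match.group(1) if match else None


def describe_torch():
    """Summarise the installed torch build, if torch can be imported at all."""
    try:
        import torch
    except (ImportError, OSError):
        return "torch could not be imported in this environment"
    return (
        f"torch {torch.__version__}, "
        f"built against CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(cuda_tag("1.13.1+cu117"))  # suffix present
    print(cuda_tag("1.13.1"))        # no suffix
    print(describe_torch())
```

On a cluster, this is best run inside a job that has actually been allocated a GPU, since login nodes often have none.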

> Quick question: if you stop and re-run the analysis, you seem to start a new epoch. Does that affect the quality in any way?

Do you mean starting the run anew or when restoring a previous run (https://github.com/ludvb/xfuse#stopping-and-resuming-a-run)? In general, I wouldn't be concerned so much about starting a new epoch, since the data is shuffled/sampled randomly. More important will be the number of steps taken by the optimizer.

> And if I want to run the analysis in a shorter time-frame (quick and dirty, just to see some output fast instead of running for multiple days), would reducing the epoch number from 100k to 10k do the trick? Or would it be better to reduce the scale to .01 during pre-processing?

This is a good question. I haven't really found a good way to speed up the analysis frankly. Reducing the scale would indeed be a possibility but could impact the results quite negatively (you will still get some outputs to experiment with, though). What I would suggest is to monitor the training progress using tensorboard (https://github.com/ludvb/xfuse#tracking-the-training-progress), which will give you a good indication of when the model is starting to learn interesting structure.
