horovodrun command reports an error and cannot run the examples #26

Open
FyuNaru opened this issue Jul 25, 2023 · 0 comments

I installed the environment exactly as described in INSTALLING.md. When I ran the test command from TRAINING.md, the following error occurred:


horovodrun -np 2 -H 192.168.31.6:2 --verbose python examples/torch/pytorch_mnist.py


output:

Filtering local host names.
Remote host found:
All hosts are local, finding the interfaces with address 127.0.0.1
Local interface found lo
mpirun --allow-run-as-root --tag-output -np 2 -H 192.168.31.6:2 -bind-to none -map-by slot -mca btl_tcp_if_include lo -x NCCL_SOCKET_IFNAME=lo -x ADDR2LINE -x AR -x AS -x BROWSER -x CC -x CFLAGS -x CMAKE_PREFIX_PATH -x COLORTERM -x CONDA_BACKUP_ADDR2LINE -x CONDA_BACKUP_AR -x CONDA_BACKUP_AS -x CONDA_BACKUP_CC -x CONDA_BACKUP_CFLAGS -x CONDA_BACKUP_CMAKE_PREFIX_PATH -x CONDA_BACKUP_CONDA_BUILD_SYSROOT -x CONDA_BACKUP_CPP -x CONDA_BACKUP_CPPFLAGS -x CONDA_BACKUP_CXX -x CONDA_BACKUP_CXXFILT -x CONDA_BACKUP_CXXFLAGS -x CONDA_BACKUP_DEBUG_CFLAGS -x CONDA_BACKUP_DEBUG_CPPFLAGS -x CONDA_BACKUP_DEBUG_CXXFLAGS -x CONDA_BACKUP_ELFEDIT -x CONDA_BACKUP_GCC -x CONDA_BACKUP_GCC_AR -x CONDA_BACKUP_GCC_NM -x CONDA_BACKUP_GCC_RANLIB -x CONDA_BACKUP_GPROF -x CONDA_BACKUP_GXX -x CONDA_BACKUP_HOST -x CONDA_BACKUP_LD -x CONDA_BACKUP_LDFLAGS -x CONDA_BACKUP_LD_GOLD -x CONDA_BACKUP_NM -x CONDA_BACKUP_OBJCOPY -x CONDA_BACKUP_OBJDUMP -x CONDA_BACKUP_RANLIB -x CONDA_BACKUP_READELF -x CONDA_BACKUP_SIZE -x CONDA_BACKUP_STRINGS -x CONDA_BACKUP_STRIP -x CONDA_BACKUP__CONDA_PYTHON_SYSCONFIGDATA_NAME -x CONDA_BUILD_SYSROOT -x CONDA_CUPY_CUDA_PATH -x CONDA_DEFAULT_ENV -x CONDA_EXE -x CONDA_PREFIX -x CONDA_PREFIX_1 -x CONDA_PREFIX_2 -x CONDA_PREFIX_3 -x CONDA_PREFIX_4 -x CONDA_PREFIX_5 -x CONDA_PREFIX_6 -x CONDA_PREFIX_7 -x CONDA_PROMPT_MODIFIER -x CONDA_PYTHON_EXE -x CONDA_SHLVL -x CPP -x CPPFLAGS -x CUDA_PATH -x CXX -x CXXFILT -x CXXFLAGS -x DBUS_SESSION_BUS_ADDRESS -x DEBUG_CFLAGS -x DEBUG_CPPFLAGS -x DEBUG_CXXFLAGS -x ELFEDIT -x GCC -x GCC_AR -x GCC_NM -x GCC_RANLIB -x GIT_ASKPASS -x GPROF -x GXX -x HOME -x HOROVOD_CCL_BGT_AFFINITY -x HOROVOD_GLOO_TIMEOUT_SECONDS -x HOROVOD_NUM_NCCL_STREAMS -x HOROVOD_STALL_CHECK_TIME_SECONDS -x HOROVOD_STALL_SHUTDOWN_TIME_SECONDS -x HOST -x LANG -x LANGUAGE -x LD -x LDFLAGS -x LD_GOLD -x LESSCLOSE -x LESSOPEN -x LOGNAME -x LS_COLORS -x MOTD_SHOWN -x NCCL_SOCKET_IFNAME -x NM -x OBJCOPY -x OBJDUMP -x PATH -x PWD -x RANLIB -x READELF -x SHELL -x SHLVL -x SIZE -x 
SSH_CLIENT -x SSH_CONNECTION -x STRINGS -x STRIP -x TERM -x TERM_PROGRAM -x TERM_PROGRAM_VERSION -x USER -x VSCODE_GIT_ASKPASS_EXTRA_ARGS -x VSCODE_GIT_ASKPASS_MAIN -x VSCODE_GIT_ASKPASS_NODE -x VSCODE_GIT_IPC_HANDLE -x VSCODE_IPC_HOOK_CLI -x XDG_DATA_DIRS -x XDG_RUNTIME_DIR -x XDG_SESSION_CLASS -x XDG_SESSION_ID -x XDG_SESSION_TYPE -x _ -x _CE_CONDA -x _CE_M -x _CONDA_PYTHON_SYSCONFIGDATA_NAME python examples/torch/pytorch_mnist.py
[mpiexec@gpu-server-1] match_arg (lib/utils/args.c:166): unrecognized argument allow-run-as-root
[mpiexec@gpu-server-1] HYDU_parse_array (lib/utils/args.c:181): argument matching returned error
[mpiexec@gpu-server-1] parse_args (mpiexec/get_parameters.c:315): error parsing input array
[mpiexec@gpu-server-1] HYD_uii_mpx_get_parameters (mpiexec/get_parameters.c:47): unable to parse user arguments
[mpiexec@gpu-server-1] main (mpiexec/mpiexec.c:49): error parsing parameters


I could not find a solution to this error online. I suspect that my MPI version is too new: mpirun --version reports 4.1.1. I don't know how to fix this. I tried several things, including switching to an older server with a completely different configuration, but the same error still occurred.
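
One check that may help narrow this down: the `[mpiexec@gpu-server-1] ... HYDU_parse_array` lines in the output look like they come from the Hydra launcher used by MPICH and its derivatives, while `--allow-run-as-root` is an Open MPI option. So the cause may be which MPI implementation `mpirun` resolves to, rather than how new it is. The sketch below classifies the text printed by `mpirun --version`. The function names `classify_mpi` and `detect_local_mpi` are hypothetical, and the marker strings it looks for are assumptions about typical version banners, not a guaranteed format.

```python
import subprocess


def classify_mpi(version_text: str) -> str:
    """Guess the MPI implementation from `mpirun --version` output.

    The marker strings are assumptions about typical banners.
    """
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "openmpi"
    if "intel(r) mpi" in text:
        return "intelmpi"
    if "hydra" in text or "mpich" in text:
        return "mpich"
    return "unknown"


def detect_local_mpi(launcher: str = "mpirun") -> str:
    """Run `<launcher> --version` and classify whatever it prints."""
    try:
        result = subprocess.run(
            [launcher, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return "not found"
    # Some launchers print the banner on stderr, so inspect both streams.
    return classify_mpi(result.stdout + result.stderr)


if __name__ == "__main__":
    print(detect_local_mpi())
```

If this reports `mpich` inside the conda environment, the `mpirun` on `PATH` is probably not the Open MPI launcher that the generated command line expects.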

I would appreciate any help. Thank you.
