Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError: cannot find context for 'fork' & cannot pickle '_io.TextIOWrapper' object #329

Open
Harry1035 opened this issue Mar 20, 2024 · 2 comments

Comments

@Harry1035
Copy link

In the windows environment, the following problems occur when extracting the WIKI corpus:

INFO: Starting page extraction from zhwiki-20240301-pages-articles-multistream.xml.bz2.
Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 643, in <module>
    main()
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 639, in main
    process_dump(input_file, args.templates, output_path, file_size,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 417, in process_dump
    Process = get_context("fork").Process
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 243, in get_context
    return super().get_context(method)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'

dit: wikiextractor\WikiExtractor.py line 417

Process = get_context("fork").Process
->
Process = get_context("spawn").Process

A new problem arises:

INFO: Starting page extraction from zhwiki-20240301-pages-articles-multistream.xml.bz2.
Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\anaconda3\envs\myenv\Scripts\wikiextractor.exe\__main__.py", line 7, in <module>
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 639, in main
    process_dump(input_file, args.templates, output_path, file_size,
  File "D:\ProgramData\anaconda3\envs\myenv\lib\site-packages\wikiextractor\WikiExtractor.py", line 425, in process_dump
    reduce.start()
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "D:\ProgramData\anaconda3\envs\myenv\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
@sghyan16
Copy link

"ValueError: cannot find context for 'fork' "
I got the same problem

@dengmengjie
Copy link

dengmengjie commented Mar 25, 2024

Thanks for your solution about switching to "spawn" mode! I am having the same issue on windows. I tried several ways but none of them work. I guess the problem lies in reduce = Process(target=reduce_process, args=(output_queue, output)), where the file output is a non-pickable parameter.
See here.
You could try running this code in ubuntu or other linux system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants