Error running parallel #1

Open
shabnamkadir opened this issue Jul 11, 2015 · 9 comments

@shabnamkadir
Contributor

Traceback (most recent call last):
  File "parallel_global_script.py", line 294, in <module>
    supercluster_results = lbv.map(lambda channel: supercluster_info['kk_sub'][channel].cluster_mask_starts(),full_adjacency.keys())
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 55, in sync_results
    ret = f(self, *args, **kwargs)
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 40, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 1123, in map
    return pf.map(*sequences)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 271, in map
    ret = self(*sequences)
  File "<string>", line 2, in __call__
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 78, in sync_view_results
    return f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 254, in __call__
    return r.get()
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 104, in get
    raise self._exception
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 139, in wait
    results = error.collect_exceptions(results, self._fname)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 233, in collect_exceptions
    raise e
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 231, in collect_exceptions
    raise CompositeError(msg, elist)
IPython.parallel.error.CompositeError: one or more exceptions from call to method: <lambda>
[0:apply]: NameError: name 'supercluster_info' is not defined
@shabnamkadir shabnamkadir self-assigned this Jul 11, 2015
@rossant
Member

rossant commented Jul 11, 2015

You need to import your functions on all nodes first.
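
The `NameError` above is consistent with how remote engines resolve names: the function's code is shipped to the engine, but a global such as `supercluster_info` is looked up in the engine's own namespace, not the client's. The sketch below imitates that with the standard library only. `run_on_fake_engine` and both namespaces are hypothetical stand-ins, not part of IPython.parallel.

```python
import builtins
import types


def run_on_fake_engine(func, engine_namespace, *args):
    """Rebuild func against engine_namespace (hypothetical helper).

    This mimics an engine resolving global names in its own namespace
    instead of the namespace of the script that defined the function.
    """
    remote = types.FunctionType(
        func.__code__, engine_namespace, func.__name__, func.__defaults__
    )
    return remote(*args)


# Client side: the data and the task both live in the script's globals.
supercluster_info = {"kk_sub": {0: "kk0", 1: "kk1"}}
task = lambda channel: supercluster_info["kk_sub"][channel]

# A fresh engine knows nothing about supercluster_info.
empty_engine = {"__builtins__": builtins}
try:
    run_on_fake_engine(task, empty_engine, 0)
    outcome = "ok"
except NameError as exc:
    outcome = str(exc)

# After the name has been sent to the engine, the same task succeeds.
pushed_engine = {"__builtins__": builtins, "supercluster_info": supercluster_info}
result = run_on_fake_engine(task, pushed_engine, 1)

print(outcome)
print(result)
```

On a real cluster the corresponding step would be to push the name to every engine before calling `map`, for example with the dictionary-style assignment `c[:]['supercluster_info'] = supercluster_info` on a view of all engines.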

@shabnamkadir
Contributor Author

Appears to now run, but gives nonsensical results:
'''
INFO klustakwik: Number of spikes in data set: 4001
INFO klustakwik: Number of unique masks in data set: 2575
INFO klustakwik.initial_parameters: full_step_every = 1
INFO klustakwik.initial_parameters: penalty_k = 0.0
INFO klustakwik.initial_parameters: fast_split = False
INFO klustakwik.initial_parameters: split_every = 40
INFO klustakwik.initial_parameters: use_noise_cluster = True
INFO klustakwik.initial_parameters: subset_break_fraction = 0.01
INFO klustakwik.initial_parameters: mua_point = 2
INFO klustakwik.initial_parameters: max_split_iterations = None
INFO klustakwik.initial_parameters: max_possible_clusters = 1000
INFO klustakwik.initial_parameters: max_iterations = 1000
INFO klustakwik.initial_parameters: break_fraction = 0.0
INFO klustakwik.initial_parameters: prior_point = 1
INFO klustakwik.initial_parameters: num_changed_threshold = 0.05
INFO klustakwik.initial_parameters: always_split_bimodal = False
INFO klustakwik.initial_parameters: use_mua_cluster = True
INFO klustakwik.initial_parameters: split_first = 20
INFO klustakwik.initial_parameters: points_for_cluster_mask = 100
INFO klustakwik.initial_parameters: max_quick_step_candidates_fraction = 0.4
INFO klustakwik.initial_parameters: penalty_k_log_n = 1.0
INFO klustakwik.initial_parameters: max_quick_step_candidates = 100000000
INFO klustakwik.initial_parameters: noise_point = 1
INFO klustakwik.initial_parameters: num_starting_clusters = 500
INFO klustakwik.initial_parameters: consider_cluster_deletion = True
INFO klustakwik.initial_parameters: dist_thresh = 9.21034037198
INFO klustakwik.initial_parameters: use_noise_cluster = True
INFO klustakwik.initial_parameters: use_mua_cluster = True
Time taken for parallel clustering 156.44 s
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
'''

@shabnamkadir
Contributor Author

I'm not sure why it is returning None objects instead of KK objects.
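
One possible explanation, stated as an assumption because the source of `cluster_mask_starts` is not shown in this thread: if the method updates the KK object in place and returns nothing, `map` collects `None` for every task, and the in-place changes are made to the engine's copy rather than the client's object. The sketch below simulates that with `copy.deepcopy`. `FakeKK`, `remote_call` and `run_and_return` are hypothetical names, not klustakwik or IPython.parallel API.

```python
import copy


class FakeKK:
    """Hypothetical stand-in for a KK object; not the klustakwik class."""

    def __init__(self, n_spikes):
        self.n_spikes = n_spikes
        self.clusters = None

    def cluster_mask_starts(self):
        # Assumed behaviour: mutate in place and return None implicitly.
        self.clusters = [0] * self.n_spikes


def remote_call(func, obj):
    """Mimic a remote call: the engine works on a copy of obj,
    and only the return value travels back to the client."""
    engine_copy = copy.deepcopy(obj)
    return copy.deepcopy(func(engine_copy))


def run_and_return(kk):
    # Return the result explicitly so it is sent back to the client.
    kk.cluster_mask_starts()
    return kk.clusters


kk = FakeKK(3)
none_result = remote_call(lambda k: k.cluster_mask_starts(), kk)
good_result = remote_call(run_and_return, kk)

print(none_result)   # return value of the in-place method
print(good_result)   # explicit return value
print(kk.clusters)   # the client's object was never modified
```

If this is what is happening, wrapping the call in a function that returns the object (or the clustering) would be the fix, which is what the later `run_subset_KK` wrapper appears to be for.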

@shabnamkadir
Contributor Author

'''
Traceback (most recent call last):
  File "parallel_global_script.py", line 304, in <module>
    supercluster_results = lbv.map(lambda channel: run_subset_KK(supercluster_info['kk_sub'][channel]),full_adjacency.keys())
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 55, in sync_results
    ret = f(self, *args, **kwargs)
  File "<string>", line 2, in map
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 40, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 1123, in map
    return pf.map(*sequences)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 271, in map
    ret = self(*sequences)
  File "<string>", line 2, in __call__
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 78, in sync_view_results
    return f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/remotefunction.py", line 254, in __call__
    return r.get()
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 104, in get
    raise self._exception
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/asyncresult.py", line 139, in wait
    results = error.collect_exceptions(results, self._fname)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 233, in collect_exceptions
    raise e
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/error.py", line 231, in collect_exceptions
    raise CompositeError(msg, elist)
IPython.parallel.error.CompositeError: one or more exceptions from call to method: <lambda>
[5:apply]: NameError: name 'run_subset_KK' is not defined
[1:apply]: NameError: name 'run_subset_KK' is not defined
[7:apply]: NameError: name 'run_subset_KK' is not defined
[3:apply]: NameError: name 'run_subset_KK' is not defined
.... 116 more exceptions ...

'''

@shabnamkadir
Contributor Author

importing run_subset_KK from parallel_global on engine(s)
[0:apply]: 
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<string> in <module>()
/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py in remote_import(name, fromlist, level)
    439             import sys
    440             user_ns = globals()
--> 441             mod = __import__(name, fromlist=fromlist, level=level)
    442             if fromlist:
    443                 for key in fromlist:
ImportError: No module named 'parallel_global'

[1:apply]: 
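
A likely cause of this `ImportError`, assuming `parallel_global.py` sits next to the script, is that the engines were started from a different directory, so the module's directory is not on their `sys.path`. The sketch below reproduces the effect in a single process with the standard library. The module name `fake_parallel_global_demo` and its contents are made up for the illustration.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module into a directory that is not on sys.path.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "fake_parallel_global_demo.py")
with open(module_path, "w") as fh:
    fh.write("def run_subset_KK(kk):\n    return kk\n")

# Like an engine started elsewhere, the interpreter cannot find the module yet.
try:
    importlib.import_module("fake_parallel_global_demo")
    first_attempt = "imported"
except ImportError:
    first_attempt = "ImportError"

# Once the directory is on sys.path the same import succeeds.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
module = importlib.import_module("fake_parallel_global_demo")

print(first_attempt)
print(module.run_subset_KK(5))
```

On a cluster the same idea would mean running the `sys.path` change on every engine (or starting the engines from the script's directory) before importing `run_subset_KK` there.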

@shabnamkadir
Contributor Author

This bug is now a Heisenbug. It sometimes parallelises fine.

About to parallelize
Time taken for parallel clustering 151.85 s

@shabnamkadir
Contributor Author

It always fails the first time it is launched, but if you keep the same engines running,
don't restart them, and run the script again, it works! The second time, the clustering runs fine...

@shabnamkadir
Contributor Author

When changing the number of points without restarting the engines (yes, I know):

About to parallelize
Time taken for parallel clustering 587.06 s
Traceback (most recent call last):
  File "parallel_global_script_40000.py", line 180, in <module>
    superclusters[supercluster_info['sub_spikes'][channel],i] = supercluster_results[i]+1
ValueError: shape mismatch: value array of shape (304,) could not be broadcast to indexing result of shape (2898,)
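
The shape mismatch, like the "works on the second run" behaviour, would be consistent with engines keeping their namespace between script runs, so that a run can silently use names left over from an earlier one. This is a guess rather than a diagnosis. The sketch below shows the mechanism with a hypothetical `FakeEngine` class (not an IPython.parallel class), reusing the sizes 304 and 2898 from the error message purely for illustration.

```python
class FakeEngine:
    """Hypothetical stand-in for a long-running engine."""

    def __init__(self):
        self.namespace = {}  # survives between script runs until reset

    def push(self, **names):
        self.namespace.update(names)

    def reset(self):
        self.namespace.clear()

    def labels_for(self, channel):
        # One label per spike the engine currently knows about for this channel.
        spikes = self.namespace["supercluster_info"]["sub_spikes"][channel]
        return [0] * len(spikes)


engine = FakeEngine()

# First script run: 304 spikes on channel 0.
engine.push(supercluster_info={"sub_spikes": {0: list(range(304))}})

# Second run: the client now has 2898 spikes, but the engine was neither
# restarted nor sent the new data, so it still answers from the old run.
client_info = {"sub_spikes": {0: list(range(2898))}}
stale = engine.labels_for(0)
stale_matches = len(stale) == len(client_info["sub_spikes"][0])

# Resetting the namespace and pushing again brings both sides back in step.
engine.reset()
engine.push(supercluster_info=client_info)
fresh = engine.labels_for(0)
fresh_matches = len(fresh) == len(client_info["sub_spikes"][0])

print(len(stale), stale_matches)
print(len(fresh), fresh_matches)
```

If stale state is the cause, clearing the engines' namespaces (or restarting them) and pushing the data again at the start of every run should make the behaviour reproducible.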

@shabnamkadir
Contributor Author

Possibly related:
Traceback (most recent call last):
  File "nickground_global_script_1280000.py", line 185, in <module>
    c[:]['supercluster_info'] = supercluster_info
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 806, in __setitem__
    self.update({key:value})
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 687, in update
    return self.push(ns, block=self.block, track=self.track)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 708, in push
    return self._really_apply(util._push, kwargs=ns, block=block, track=track, targets=targets)
  File "<string>", line 2, in _really_apply
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 55, in sync_results
    ret = f(self, *args, **kwargs)
  File "<string>", line 2, in _really_apply
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 40, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/view.py", line 562, in _really_apply
    ident=ident)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/parallel/client/client.py", line 1280, in send_apply_request
    metadata=metadata, track=track)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/IPython/kernel/zmq/session.py", line 660, in send
    tracker = stream.send_multipart(to_send, copy=False, track=True)
  File "/home/skadir/.conda/envs/globalphy/lib/python3.4/site-packages/zmq/sugar/socket.py", line 331, in send_multipart
    return self.send(msg_parts[-1], flags, copy=copy, track=track)
  File "zmq/backend/cython/socket.pyx", line 619, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6169)
  File "zmq/backend/cython/socket.pyx", line 674, in zmq.backend.cython.socket.Socket.send (zmq/backend/cython/socket.c:6034)
  File "zmq/backend/cython/socket.pyx", line 169, in zmq.backend.cython.socket._send_frame (zmq/backend/cython/socket.c:2118)
  File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:6920)
zmq.error.Again: Resource temporarily unavailable
Bad address (stream_engine.cpp:788)
Aborted (core dumped)
