Test that the container running on the UChicago AF JupyterHub is able to control worker nodes through Dask #4
Just pasting here some of the next steps we should be targeting.

For tracking purposes, @mvigl can you also comment here what the data sets are that are needed for https://gitlab.cern.ch/gstark/pycolumnarprototype (not sure what branch is being used for dev at the moment) to run the notebook demos, and where those are located on the MWT2 at the moment?
So right away we run into some permissions issues with Git in the container (will have to think of the best way to sort that out):

```console
(venv) [bash][feickert AnalysisBase-24.2.26]:workarea > . /release_setup.sh
(venv) [bash][feickert AnalysisBase-24.2.26]:workarea > mkdir container_testing
(venv) [bash][feickert AnalysisBase-24.2.26]:workarea > cd container_testing/
(venv) [bash][feickert AnalysisBase-24.2.26]:container_testing > git clone --recursive ssh://[email protected]:7999/gstark/pycolumnarprototype.git
Cloning into 'pycolumnarprototype'...
Warning: Permanently added the ECDSA host key for IP address '[188.185.35.37]:7999' to the list of known hosts.
Permission denied (publickey,keyboard-interactive).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
(venv) [bash][feickert AnalysisBase-24.2.26]:container_testing > git clone -v --recursive https://gitlab.cern.ch/gstark/pycolumnarprototype.git
Cloning into 'pycolumnarprototype'...
# This hangs forever
```

But if we avoid that for the moment, do the clone from a login node on the UChicago AF,

```console
$ ssh [email protected]
[16:19] login02.af.uchicago.edu:~ $ cd workarea/container_testing/
[16:20] login02.af.uchicago.edu:~/workarea/container_testing $ git clone --branch py_el_tool_test --recursive ssh://[email protected]:7999/gstark/pycolumnarprototype.git
```

and then flip back to JupyterLab, we're able to build as expected:

```console
(venv) [bash][feickert AnalysisBase-24.2.26]:pycolumnarprototype > cmake -S src -B build
(venv) [bash][feickert AnalysisBase-24.2.26]:pycolumnarprototype > cmake --build build --clean-first
-- Setting ATLAS specific build flags
-- checker_gccplugins library not found
-- Using the LCG modules without setting up a release
-- $<BUILD_INTERFACE:/usr/AnalysisBaseExternals/24.2.26/InstallArea/x86_64-centos7-gcc11-opt/lib/libpython3.9.so>;$<INSTALL_INTERFACE:/usr/AnalysisBaseExternals/24.2.26/InstallArea/x86_64-centos7-gcc11-opt/lib/libpython3.9.so>
-- Configuring ATLAS project with name "PyColumnarPrototypeDemo" and version "1.0.0"
...
[100%] Linking CXX shared library ../x86_64-centos7-gcc11-opt/lib/libColumnarPrototypeDict.so
Detaching debug info of libColumnarPrototypeDict.so into libColumnarPrototypeDict.so.dbg
[100%] Built target ColumnarPrototypeDict
[100%] Built target Package_ColumnarPrototype
```

and the built module imports and runs:

```console
(venv) [bash][feickert AnalysisBase-24.2.26]:pycolumnarprototype > PYTHONPATH="$(dirname $(find . -type f -iname "PyColumnarPrototype*.so")):${PYTHONPATH}" python3 -c 'import PyColumnarPrototype; print(f"{PyColumnarPrototype.column_maker()=}")'
PyColumnarPrototype.column_maker()=array([1543080.6, 7524391.5], dtype=float32) ✅
```

I'll count that as "good enough" until we can figure out how to manage Git permissions here, and I think the SSL team can give us plenty of pointers given that they know how to do this for Coffea-casa. 👍
The dev branch is https://gitlab.cern.ch/gstark/pycolumnarprototype/-/tree/py_el_tool_test?ref_type=heads, the PHYSLITE data in

As for the datasets we will move to, I'll paste here Vangelis' comment from another thread:
This is now done via https://gitlab.cern.ch/gstark/pycolumnarprototype/-/issues/1#note_7305996. I'll let @mvigl do a double check on this, but I'm calling this complete as it ran from top to bottom with a "restart and run all" in the container on the UChicago AF. ✅
I have already made sure that the data can be found at MWT2_LOCALGROUPDISK, and the MC partly at MWT2 and all of it at BNL.
Okay, so @ivukotic has been kind enough to agree to work on setting up a Kubernetes cluster for us this week, where he'll do some tests and see if he can get the notebook to scale out across all the data that he's transferred. So this might also address

We can then take these results to ATLAS and further motivate

👍
Recursive cloning without authentication has now been fixed. Example:

```console
(venv) [bash][atlas AnalysisBase-24.2.26]:workdir > git clone --branch py_el_tool_test --recurse-submodules https://gitlab.cern.ch/gstark/pycolumnarprototype.git
Cloning into 'pycolumnarprototype'...
remote: Enumerating objects: 515, done.
remote: Counting objects: 100% (152/152), done.
remote: Compressing objects: 100% (146/146), done.
remote: Total 515 (delta 67), reused 16 (delta 6), pack-reused 363
Receiving objects: 100% (515/515), 2.26 MiB | 1.67 MiB/s, done.
Resolving deltas: 100% (283/283), done.
Submodule 'src/columnarprototype' (https://gitlab.cern.ch/krumnack/columnarprototype.git) registered for path 'src/ColumnarPrototype'
Submodule 'src/nanobind' (https://github.com/wjakob/nanobind) registered for path 'src/nanobind'
Cloning into 'src/ColumnarPrototype'...
remote: Enumerating objects: 370, done.
remote: Counting objects: 100% (308/308), done.
remote: Compressing objects: 100% (109/109), done.
remote: Total 370 (delta 199), reused 307 (delta 199), pack-reused 62
Receiving objects: 100% (370/370), 104.55 KiB | 0 bytes/s, done.
Resolving deltas: 100% (229/229), done.
Submodule path 'src/ColumnarPrototype': checked out '1e1537fc9669fe7425e74313200dd8bd3e4e3c64'
Cloning into 'src/nanobind'...
remote: Enumerating objects: 5502, done.
remote: Counting objects: 100% (1855/1855), done.
remote: Compressing objects: 100% (268/268), done.
remote: Total 5502 (delta 1646), reused 1681 (delta 1568), pack-reused 3647
Receiving objects: 100% (5502/5502), 2.00 MiB | 0 bytes/s, done.
Resolving deltas: 100% (3981/3981), done.
Submodule path 'src/nanobind': checked out 'c7bd406ef758c933eaf4b2d03d6d81b54bd9ad03'
Submodule 'ext/robin_map' (https://github.com/Tessil/robin-map) registered for path 'ext/robin_map'
Cloning into 'ext/robin_map'...
remote: Enumerating objects: 1098, done.
remote: Counting objects: 100% (152/152), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 1098 (delta 105), reused 115 (delta 82), pack-reused 946
Receiving objects: 100% (1098/1098), 875.43 KiB | 0 bytes/s, done.
Resolving deltas: 100% (752/752), done.
Submodule path 'src/nanobind/ext/robin_map': checked out '68ff7325b3898fca267a103bad5c509e8861144d'
(venv) [bash][atlas AnalysisBase-24.2.26]:workdir >
```

cf. https://gitlab.cern.ch/gstark/pycolumnarprototype/-/issues/2#note_7306579 for more details. Using this I've added the following import guard, which falls back to the built version at import time:

```python
# Ensure importable
try:
    import PyColumnarPrototype
except ModuleNotFoundError as err:
    import sys
    from pathlib import Path

    # position 1 of sys.path will be after cwd and before the activated virtual environment's site-packages
    sys.path.insert(1, str(next(Path().cwd().glob("**/PyColumnarPrototype*.so")).parent))

    import PyColumnarPrototype
```
@mvigl can you please let me know how to change the Z->ee notebook in order to run across more than 1 MC and 1 data file?
I don't have a clear answer yet, sorry; I need to look at it in more detail tomorrow. Changing the 'Get data' section to something like the sketch below could be a starting point.
Ideally we don't want to specify the variables that are accessed (which is a longer list), but for some reason it doesn't work if I don't.
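A minimal sketch of what such a 'Get data' section could look like, assuming uproot.iterate over multiple PHYSLITE files as used later in this thread (the file names and branch list here are placeholders, not the actual samples):

```python
import uproot

tree_name = "CollectionTree"
# placeholder file lists; substitute the real PHYSLITE data and MC files
data_files = ["DAOD_PHYSLITE.data_1.pool.root.1", "DAOD_PHYSLITE.data_2.pool.root.1"]
mc_files = ["DAOD_PHYSLITE.mc_1.pool.root.1", "DAOD_PHYSLITE.mc_2.pool.root.1"]

# the branch list that currently has to be spelled out explicitly (see below)
branches = [
    "AnalysisElectronsAuxDyn.pt",
    "AnalysisElectronsAuxDyn.eta",
    "AnalysisElectronsAuxDyn.phi",
    "AnalysisElectronsAuxDyn.charge",
]

# iterate over all files of each kind in chunks
data_iter = uproot.iterate({f: tree_name for f in data_files}, expressions=branches)
mc_iter = uproot.iterate({f: tree_name for f in mc_files}, expressions=branches)

for chunk in data_iter:
    ...  # process one chunk of data events
```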
It is not realistic to have 50 TB locally. I need a way to give it a lot of files accessible via XRootD...
I think that should be doable according to the documentation: https://uproot.readthedocs.io/en/latest/uproot._dask.dask.html
@alexander-held points out that in place of local files you can use a list of XRootD-accessible URIs, as is done in the IRIS-HEP Analysis Grand Challenge notebooks. Probably the most relevant bit is the
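A minimal sketch of that approach, assuming the uproot.dask API from the documentation linked above (the URIs below are placeholders):

```python
import uproot

# XRootD-accessible URIs in place of local paths (placeholder file names)
file_uris = [
    "root://xcache.af.uchicago.edu:1094//placeholder/DAOD_PHYSLITE.example_1.pool.root.1",
    "root://xcache.af.uchicago.edu:1094//placeholder/DAOD_PHYSLITE.example_2.pool.root.1",
]

# uproot.dask builds a delayed array spanning all files; nothing is read
# from remote storage until .compute() is called
tree = uproot.dask({uri: "CollectionTree" for uri in file_uris})
electron_pt = tree["AnalysisElectronsAuxDyn.pt"]
print(electron_pt.compute())
```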
The issue of being able to open and read files over
@mvigl @ivukotic The variables are required to be specified, as not all of the tree branches in the xAOD can be converted to arrays using Awkward (cf. scikit-hep/uproot5#1040 (comment)). So this means that you'd need something like the following modified example that @ivukotic gave (also from scikit-hep/uproot5#1040 (comment)):

```python
import uproot

xc = "root://xcache.af.uchicago.edu:1094//"
fname_data = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
)
fname_dat1 = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/6c/67/DAOD_PHYSLITE.34858087._000002.pool.root.1"
)
tree_name = "CollectionTree"
branches = [
    "AnalysisElectronsAuxDyn.pt",
    "AnalysisElectronsAuxDyn.eta",
    "AnalysisElectronsAuxDyn.phi",
    "AnalysisElectronsAuxDyn.charge",
]
tree_data = uproot.iterate(
    {fname_data: tree_name, fname_dat1: tree_name},
    expressions=branches,
    step_size=1000,  # step_size here just as a throwaway example
)
print(next(tree_data))  # example to show the iterator works
```

which you would then generalize further with different
I think (hope) that we would not need to say ahead of time what we need: in principle that information is in the task graph (determined by what we ultimately want to obtain), and ideally uproot should only try to access what is strictly required. Is the issue that we cannot read a PHYSLITE file at all because uproot somehow trips over some branches, even if they technically don't need to be read?
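As an aside, the column-pruning idea can be checked from the graph itself; a sketch assuming dask-awkward's necessary_columns utility (API details vary by version, so treat this as illustrative, and the file name is a placeholder):

```python
import dask_awkward as dak
import uproot

# placeholder file name; substitute a real PHYSLITE file
tree = uproot.dask({"DAOD_PHYSLITE.example.pool.root.1": "CollectionTree"})

# build a small delayed computation; in principle only pt is required
delayed_pt = tree["AnalysisElectronsAuxDyn.pt"]

# report which input columns the task graph actually needs
print(dak.necessary_columns(delayed_pt))
```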
It appears the answer is yes: it reads the whole tree. Revising @ivukotic's original example and taking some of the points I demonstrated in scikit-hep/uproot5#1040 (comment), we can get

```python
# test.py
import uproot

xc = "root://xcache.af.uchicago.edu:1094//"
fname_data = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
)
fname_dat1 = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/6c/67/DAOD_PHYSLITE.34858087._000002.pool.root.1"
)
tree_data = uproot.iterate({fname_data: "CollectionTree", fname_dat1: "CollectionTree"})
next(tree_data)  # trigger error
```

and by poking at things interactively with

```console
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python -i test.py
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 2478, in _awkward_check
interpretation.awkward_form(self.file)
File "/venv/lib/python3.9/site-packages/uproot/interpretation/objects.py", line 111, in awkward_form
return self._model.awkward_form(self._branch.file, context)
File "/venv/lib/python3.9/site-packages/uproot/model.py", line 684, in awkward_form
raise uproot.interpretation.objects.CannotBeAwkward(
uproot.interpretation.objects.CannotBeAwkward: xAOD::MissingETAssociationMap_v1
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/analysis/test.py", line 14, in <module>
next(tree_data) # trigger error
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 191, in iterate
for item in hasbranches.iterate(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 1076, in iterate
_ranges_or_baskets_to_arrays(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3041, in _ranges_or_baskets_to_arrays
branchid_to_branch[cache_key]._awkward_check(interpretation)
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 2480, in _awkward_check
raise ValueError(
ValueError: cannot produce Awkward Arrays for interpretation AsObjects(Unknown_xAOD_3a3a_MissingETAssociationMap_5f_v1) because
xAOD::MissingETAssociationMap_v1
instead, try library="np" rather than library="ak" or globally set uproot.default_library
in file root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1
in object /CollectionTree;1:METAssoc_AnalysisMET
```

we can see that just trying to get the Awkward Array representation of the offending branch reproduces the error:

```
>>> file = uproot.open(fname_data)
>>> tree = file["CollectionTree"]
>>> branch_names = tree.keys()
>>> "METAssoc_AnalysisMET" in branch_names
True
>>> tree["METAssoc_AnalysisMET"]
<TBranchElement 'METAssoc_AnalysisMET' at 0x7ff948c1af40>
>>> tree["METAssoc_AnalysisMET"].array()
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 2478, in _awkward_check
interpretation.awkward_form(self.file)
File "/venv/lib/python3.9/site-packages/uproot/interpretation/objects.py", line 111, in awkward_form
return self._model.awkward_form(self._branch.file, context)
File "/venv/lib/python3.9/site-packages/uproot/model.py", line 684, in awkward_form
raise uproot.interpretation.objects.CannotBeAwkward(
uproot.interpretation.objects.CannotBeAwkward: xAOD::MissingETAssociationMap_v1
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 1811, in array
_ranges_or_baskets_to_arrays(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3041, in _ranges_or_baskets_to_arrays
branchid_to_branch[cache_key]._awkward_check(interpretation)
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 2480, in _awkward_check
raise ValueError(
ValueError: cannot produce Awkward Arrays for interpretation AsObjects(Unknown_xAOD_3a3a_MissingETAssociationMap_5f_v1) because
xAOD::MissingETAssociationMap_v1
instead, try library="np" rather than library="ak" or globally set uproot.default_library
in file root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1
in object /CollectionTree;1:METAssoc_AnalysisMET
>>>
```

So I think we need to go and look at this and then talk with the Awkward team about what can be done here, if anything.
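As a quick check of the error message's own suggestion, the offending branch can at least be requested as NumPy object arrays instead of Awkward Arrays; a sketch reusing fname_data from above (whether this deserializes cleanly for the unknown xAOD model is a separate question):

```python
import uproot

# the ValueError above suggests library="np" for branches that
# cannot be represented as Awkward Arrays
file = uproot.open(fname_data)
branch = file["CollectionTree"]["METAssoc_AnalysisMET"]
objects = branch.array(library="np")  # NumPy object array of uproot models
print(type(objects), len(objects))
```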
...even with some helpful tips from @lgray:

```python
import warnings

import uproot


def _remove_not_interpretable(branch):
    if isinstance(
        branch.interpretation, uproot.interpretation.identify.uproot.AsGrouped
    ):
        for name, interpretation in branch.interpretation.subbranches.items():
            if isinstance(
                interpretation, uproot.interpretation.identify.UnknownInterpretation
            ):
                warnings.warn(
                    f"Skipping {branch.name} as it is not interpretable by Uproot"
                )
                return False
    if isinstance(
        branch.interpretation, uproot.interpretation.identify.UnknownInterpretation
    ):
        warnings.warn(f"Skipping {branch.name} as it is not interpretable by Uproot")
        return False
    try:
        _ = branch.interpretation.awkward_form(None)
    except uproot.interpretation.objects.CannotBeAwkward:
        warnings.warn(
            f"Skipping {branch.name} as it cannot be represented as an Awkward array"
        )
        return False
    else:
        return True


xc = "root://xcache.af.uchicago.edu:1094//"
fname_data = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
)
fname_dat1 = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/6c/67/DAOD_PHYSLITE.34858087._000002.pool.root.1"
)
tree_data = uproot.iterate(
    {fname_data: "CollectionTree", fname_dat1: "CollectionTree"},
    filter_branch=_remove_not_interpretable,
)
next(tree_data)  # trigger error
```

we still hit errors:

```
...
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/uproot/interpretation/objects.py", line 742, in basket_array
output = data.view(dtype).reshape((-1, *shape))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/analysis/test.py", line 49, in <module>
next(tree_data) # trigger error
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 191, in iterate
for item in hasbranches.iterate(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 1076, in iterate
_ranges_or_baskets_to_arrays(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3139, in _ranges_or_baskets_to_arrays
uproot.source.futures.delayed_raise(*obj)
File "/venv/lib/python3.9/site-packages/uproot/source/futures.py", line 38, in delayed_raise
raise exception_value.with_traceback(traceback)
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3081, in basket_to_array
basket_arrays[basket.basket_num] = interpretation.basket_array(
File "/venv/lib/python3.9/site-packages/uproot/interpretation/jagged.py", line 196, in basket_array
content = self._content.basket_array(
File "/venv/lib/python3.9/site-packages/uproot/interpretation/objects.py", line 745, in basket_array
raise ValueError(
ValueError: basket 0 in tree/branch /CollectionTree;1:METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink has the wrong number of bytes (25086) for interpretation AsStridedObjects(Model_ElementLink_3c_DataVector_3c_xAOD_3a3a_Jet_5f_v1_3e3e__v1)
in file root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1
```
Trying out uproot.dask

```python
import uproot

tree = uproot.dask(
    {"DAOD_PHYSLITE.34857549._000351.pool.root.1": "CollectionTree"},
    filter_branch=_remove_not_interpretable,
)
delayed_arr = tree["AnalysisElectronsAuxDyn.pt"]
print(delayed_arr.compute())
```

Full file with XRootD URI:

```python
import warnings

import uproot


def _remove_not_interpretable(branch):
    if isinstance(
        branch.interpretation, uproot.interpretation.identify.uproot.AsGrouped
    ):
        for name, interpretation in branch.interpretation.subbranches.items():
            if isinstance(
                interpretation, uproot.interpretation.identify.UnknownInterpretation
            ):
                warnings.warn(
                    f"Skipping {branch.name} as it is not interpretable by Uproot"
                )
                return False
    if isinstance(
        branch.interpretation, uproot.interpretation.identify.UnknownInterpretation
    ):
        warnings.warn(f"Skipping {branch.name} as it is not interpretable by Uproot")
        return False
    try:
        _ = branch.interpretation.awkward_form(None)
    except uproot.interpretation.objects.CannotBeAwkward:
        warnings.warn(
            f"Skipping {branch.name} as it cannot be represented as an Awkward array"
        )
        return False
    else:
        return True


xc = "root://xcache.af.uchicago.edu:1094//"
file_uri = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
)
tree = uproot.dask(
    {file_uri: "CollectionTree"}, filter_branch=_remove_not_interpretable
)
delayed_arr = tree["AnalysisElectronsAuxDyn.pt"]
print(delayed_arr.compute())
```

Running:

```console
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python test.py
...
[[], [], [], [], [], [], ..., [], [], [7.25e+03], [3.03e+04], [1.49e+04], []]
```
@alexander-held Hm. But if we use coffea's NanoEventsFactory,

```python
# example.py
import uproot
from coffea.nanoevents import NanoEventsFactory, PHYSLITESchema

xc = "root://xcache.af.uchicago.edu:1094//"
file_uri = (
    xc
    + "root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1"
)
factory = NanoEventsFactory.from_root(
    {file_uri: "CollectionTree"}, schemaclass=PHYSLITESchema, permit_dask=True
)
events = factory.events()
events.compute()  # ValueError
```

we get:

```console
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis > python example.py
...
/venv/lib/python3.9/site-packages/coffea/nanoevents/factory.py:51: UserWarning: Skipping EventInfoAuxDyn.hardScatterVertexLink as it is not interpretable by Uproot
warnings.warn(f"Skipping {branch.name} as it is not interpretable by Uproot")
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/uproot/interpretation/objects.py", line 742, in basket_array
output = data.view(dtype).reshape((-1, *shape))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/analysis/example.py", line 15, in <module>
events.compute() # ValueError
File "/venv/lib/python3.9/site-packages/dask/base.py", line 342, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/venv/lib/python3.9/site-packages/dask/base.py", line 628, in compute
results = schedule(dsk, keys, **kwargs)
File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 1101, in __call__
return self.read_tree(ttree, start, stop)
File "/venv/lib/python3.9/site-packages/uproot/_dask.py", line 919, in read_tree
container[buffer_key] = mapping[buffer_key]
File "/venv/lib/python3.9/site-packages/coffea/nanoevents/factory.py", line 121, in __getitem__
return self._mapping[self._func(index)]
File "/venv/lib/python3.9/site-packages/coffea/nanoevents/mapping/base.py", line 98, in __getitem__
self.extract_column(
File "/venv/lib/python3.9/site-packages/coffea/nanoevents/mapping/uproot.py", line 161, in extract_column
return columnhandle.array(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 1811, in array
_ranges_or_baskets_to_arrays(
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3139, in _ranges_or_baskets_to_arrays
uproot.source.futures.delayed_raise(*obj)
File "/venv/lib/python3.9/site-packages/uproot/source/futures.py", line 38, in delayed_raise
raise exception_value.with_traceback(traceback)
File "/venv/lib/python3.9/site-packages/uproot/behaviors/TBranch.py", line 3081, in basket_to_array
basket_arrays[basket.basket_num] = interpretation.basket_array(
File "/venv/lib/python3.9/site-packages/uproot/interpretation/jagged.py", line 196, in basket_array
content = self._content.basket_array(
File "/venv/lib/python3.9/site-packages/uproot/interpretation/objects.py", line 745, in basket_array
raise ValueError(
ValueError: basket 3 in tree/branch /CollectionTree;1:METAssoc_AnalysisMETAux./METAssoc_AnalysisMETAux.jetLink has the wrong number of bytes (12196) for interpretation AsStridedObjects(Model_ElementLink_3c_DataVector_3c_xAOD_3a3a_Jet_5f_v1_3e3e__v1)
in file root://xcache.af.uchicago.edu:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlaslocalgroupdisk/rucio/data18_13TeV/df/a4/DAOD_PHYSLITE.34858087._000001.pool.root.1
(venv) [bash][atlas AnalysisBase-24.2.26]:analysis >
```
I think a lot of these errors are related to broken reading of ElementLink branches in AwkwardForth (scikit-hep/uproot5#951). With the PHYSLITE schema it can also happen that it wants to read an ElementLink branch even if you request something different, in case an ElementLink branch appears first in the list and is therefore used to get the offsets from (so you may see errors even when requesting e.g. only
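While that's being sorted out, one possible mitigation could be to exclude the Link branches up front; a sketch with a hypothetical name-based filter (a heuristic for illustration, not a vetted uproot/coffea recipe, and the file name is a placeholder):

```python
import uproot


def drop_element_links(branch):
    # heuristic: skip branches whose names suggest ElementLink payloads
    return "Link" not in branch.name


# placeholder file name; substitute a real PHYSLITE file
tree = uproot.dask(
    {"DAOD_PHYSLITE.example.pool.root.1": "CollectionTree"},
    filter_branch=drop_element_links,
)
```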
@nikoladze Can you either share this with us or help contribute this to Coffea? This is blocking for us, and so @alexander-held and I would both like to try to follow up on getting this working (sooner rather than later).
If we just want to scale Dask + PHYSLITE (and not do much physics with it) I think we can use
Here is a version that just uses uproot and dask-awkward directly:

```python
import warnings

import coffea.nanoevents
import dask_awkward as dak
import hist.dask
import uproot
import vector

warnings.filterwarnings("ignore")
vector.register_awkward()

delayed_hist = hist.dask.Hist.new.Reg(120, 0, 120, label="mass [GeV]").Weight()

tree = uproot.dask(
    [{"DAOD_PHYSLITE.34857549._000351.pool.root.1": "CollectionTree"}],
    step_size=20_000,
    filter_branch=coffea.nanoevents.factory._remove_not_interpretable,
)

# build electron object
el_p4 = dak.zip(
    {
        "pt": tree["AnalysisElectronsAuxDyn.pt"],
        "eta": tree["AnalysisElectronsAuxDyn.eta"],
        "phi": tree["AnalysisElectronsAuxDyn.phi"],
        "mass": tree["AnalysisElectronsAuxDyn.m"],
    },
    with_name="Momentum4D",
)

# select 2-electron events
evt_filter = dak.num(el_p4) == 2
el_p4 = el_p4[evt_filter]

# fill histogram with di-electron system invariant mass and plot
delayed_hist.fill(dak.sum(el_p4, axis=-1).mass / 1_000)
delayed_hist.compute().plot()
```

This should scale fine to multiple files (
The coffea version of that would be

```python
import warnings

import awkward as ak
import hist.dask
from coffea.nanoevents import NanoEventsFactory, PHYSLITESchema

warnings.filterwarnings("ignore")

delayed_hist = hist.dask.Hist.new.Reg(120, 0, 120, label="mass [GeV]").Weight()


def filter_name(name):
    return name in [
        "AnalysisElectronsAuxDyn.pt",
        "AnalysisElectronsAuxDyn.eta",
        "AnalysisElectronsAuxDyn.phi",
        "AnalysisElectronsAuxDyn.m",
    ]


events = factory = NanoEventsFactory.from_root(
    {"DAOD_PHYSLITE.34857549._000351.pool.root.1": "CollectionTree"},
    schemaclass=PHYSLITESchema,
    permit_dask=True,
    uproot_options=dict(filter_name=filter_name),
).events()

el_p4 = events.Electrons

# select 2-electron events
evt_filter = ak.num(el_p4) == 2
el_p4 = el_p4[evt_filter]

# fill histogram with di-electron system invariant mass and plot
delayed_hist.fill((el_p4[:, 0] + el_p4[:, 1]).mass / 1_000)
delayed_hist.compute().plot()
```

by using
That works too. Both versions run in a bit more than 4 minutes over 50 files. That's very slow given the amount of data read...
Is this 4 minutes for a single thread or parallelized across many cores? I could imagine that there is a non-negligible constant overhead of
@ivukotic @alexander-held @nikoladze to circle back to this, @ivukotic what are you running exactly? Can you share the code snippet so that we can look at it? This is 4 minutes for 50 files with a local Dask client?
If you're executing exactly the code given by @nikoladze above then it will be using the threaded scheduler (the default). Uproot is very GIL-heavy and not great with the threaded executor. You should spawn a distributed.Client instance for local scale testing.

That's 100% why it's so slow if you're using more than one core.
Modified version:

```python
import warnings

import awkward as ak
import hist.dask
from coffea.nanoevents import NanoEventsFactory, PHYSLITESchema
from distributed import Client

warnings.filterwarnings("ignore")

delayed_hist = hist.dask.Hist.new.Reg(120, 0, 120, label="mass [GeV]").Weight()


def filter_name(name):
    return name in [
        "AnalysisElectronsAuxDyn.pt",
        "AnalysisElectronsAuxDyn.eta",
        "AnalysisElectronsAuxDyn.phi",
        "AnalysisElectronsAuxDyn.m",
    ]


if __name__ == "__main__":
    client = Client()  # or do with Client() as client:

    events = factory = NanoEventsFactory.from_root(
        {"DAOD_PHYSLITE.34857549._000351.pool.root.1": "CollectionTree"},
        schemaclass=PHYSLITESchema,
        permit_dask=True,
        uproot_options=dict(filter_name=filter_name),
    ).events()

    el_p4 = events.Electrons

    # select 2-electron events
    evt_filter = ak.num(el_p4) == 2
    el_p4 = el_p4[evt_filter]

    # fill histogram with di-electron system invariant mass and plot
    delayed_hist.fill((el_p4[:, 0] + el_p4[:, 1]).mass / 1_000)
    delayed_hist.compute().plot()
```
Now we have a Kubernetes Dask cluster that can scale to 100 cores. My test was not using it at all. Will test your code now.
That's happening on the client side so it has nothing to do with k8s. Does it work with a local client?
Sorry for the bad formatting; it should be fine now. Code using a LocalCluster:

```python
import warnings

import awkward as ak
import hist.dask
from coffea.nanoevents import NanoEventsFactory, PHYSLITESchema
from dask.distributed import Client

client = Client()

tree_name = "CollectionTree"
xc = "root://xcache.af.uchicago.edu:1094//"


def get_data_dict(n=10):
    # data18_13TeV:data18_13TeV.00348885.physics_Main.deriv.DAOD_PHYSLITE.r13286_p4910_p5855_tid34857549_00
    r = {}
    with open("data.txt", "r") as mc:
        ls = mc.readlines()
        print(len(ls))
        for i in range(0, min(n, len(ls))):
            r[xc + ls[i].strip()] = tree_name
    return r


warnings.filterwarnings("ignore")

delayed_hist = hist.dask.Hist.new.Reg(120, 0, 120, label="mass [GeV]").Weight()


def filter_name(name):
    return name in [
        "AnalysisElectronsAuxDyn.pt",
        "AnalysisElectronsAuxDyn.eta",
        "AnalysisElectronsAuxDyn.phi",
        "AnalysisElectronsAuxDyn.m",
    ]


infile = get_data_dict(1)
print(infile)

events = factory = NanoEventsFactory.from_root(
    infile,
    schemaclass=PHYSLITESchema,
    permit_dask=True,
    uproot_options=dict(filter_name=filter_name),
).events()

el_p4 = events.Electrons

# select 2-electron events
evt_filter = ak.num(el_p4) == 2
el_p4 = el_p4[evt_filter]

# fill histogram with di-electron system invariant mass and plot
delayed_hist.fill((el_p4[:, 0] + el_p4[:, 1]).mass / 1_000)
delayed_hist.compute().plot()
```

gives:

```
332
{'root://xcache.af.uchicago.edu:1094//root://dcgftp.usatlas.bnl.gov:1094//pnfs/usatlas.bnl.gov/LOCALGROUPDISK/rucio/data18_13TeV/04/9a/DAOD_PHYSLITE.34857549._000001.pool.root.1': 'CollectionTree'}
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[1], line 46
37 print(infile)
39 events = factory = NanoEventsFactory.from_root(
40 infile,
41 schemaclass=PHYSLITESchema,
42 permit_dask=True,
43 uproot_options=dict(filter_name=filter_name),
44 ).events()
---> 46 el_p4 = events.Electrons
48 # select 2-electron events
49 evt_filter = ak.num(el_p4) == 2
File /venv/lib/python3.9/site-packages/dask_awkward/lib/core.py:1309, in Array.__getattr__(self, attr)
1306 elif self._maybe_behavior_property(attr):
1307 return self._call_behavior_property(attr)
-> 1309 raise AttributeError(f"{attr} not in fields.")
1310 try:
1311 # at this point attr is either a field or we'll have to
1312 # raise an exception.
1313 return self.__getitem__(attr)
AttributeError: Electrons not in fields.
```
@usatlas why are you deleting my posts?
@ivukotic you appear to be using a very old version of dask if it is not complaining at you about being in a

To say anything more I'd have to have access to the file you're using; the tests we run in coffea that check this collection specifically do pass on the test sample that we have. It may be prudent to upgrade to the (very) recent release of coffea 2023 (it will be on PyPI in ~30 minutes), just so all the versions of dependencies are lined up well.
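A quick way to confirm which versions the running kernel is actually picking up (the package set here is just a guess at the relevant ones):

```python
# print the versions of the libraries discussed in this thread
import coffea
import dask
import dask_awkward
import uproot

print(f"coffea {coffea.__version__}")
print(f"dask {dask.__version__}")
print(f"dask-awkward {dask_awkward.__version__}")
print(f"uproot {uproot.__version__}")
```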
Why are you applying your own filter for branches? Is there something missing in the coffea version that removes only uninterpretable branches that fails for your PHYSLITE files? If so, please submit an issue and provide relevant testing data (even a 40-event file, which I'm sure is shareable, is enough); we would like to continue to make sure our code works for a variety of experiments. You should not need to downselect so heavily since reading is delayed.
@lgray I deleted part of the thread with badly formatted messages. This was executed in JupyterLab and the version is:
@ivukotic I know this may sound like a strange request, but please do not delete my posts in the future. I find it extremely distasteful (even if I understand why you did it in this case).
Hi @lgray and thanks for jumping in to help! I'm not sure what @ivukotic is after here, but I can give you some feedback on a few things I tried with the snippets @alexander-held and @nikoladze posted above and the latest (released) version of

So the snippet from @alexander-held (without
We do that because some of the PHYSLITE branches have broken reading in AwkwardForth (scikit-hep/uproot5#951). Also, this option should help load less data from disk to memory, essentially only the columns needed, which are user-defined here by the

Another weird behaviour I find is that

```python
# works
events = factory = NanoEventsFactory.from_root(
    {file_path: "CollectionTree"},
    schemaclass=PHYSLITESchema,
    delayed=True,
    uproot_options=dict(filter_name=filter_name),
).events()

# fails
events = factory = NanoEventsFactory.from_root(
    {file_path: "CollectionTree"},
    schemaclass=PHYSLITESchema,
    delayed=False,
    uproot_options=dict(filter_name=filter_name),
).events()

# works
events = factory = NanoEventsFactory.from_root(
    {file_path: "CollectionTree"},
    schemaclass=PHYSLITESchema,
    delayed=True,
    uproot_options=dict(filter_name=filter_name),
).events().compute()
```

Any input here is much appreciated and I can of course follow up opening a
I believe this fails because
The current workaround would be to replace
Thank you @nikoladze, this works for me! I didn't know about this option as it's not documented. I will keep an eye on the issue you opened.
Okay, so while I am still confused about a few things (adaptive auto-scaling, how to connect to cluster dashboards that I create myself, why we still need the

What helped was when @fengpinghu pointed out to me
👍 The following Python should work for anyone who has access to the UChicago Jupyter Lab cluster and uses the

```python
import time
import warnings

import awkward as ak
import hist.dask
from coffea.nanoevents import NanoEventsFactory, PHYSLITESchema
from dask_kubernetes.operator import KubeCluster, make_cluster_spec
from distributed import Client

warnings.filterwarnings("ignore")

# Setup a KubeCluster
spec = make_cluster_spec(
    name="analysis-base",
    image="hub.opensciencegrid.org/usatlas/analysis-dask-base:latest",
)
cluster = KubeCluster(custom_cluster_spec=spec)
# This doesn't seem to work as expected and scale up as work starts
cluster.adapt(minimum=1, maximum=50)
print(f"Dashboard: {cluster.dashboard_link}")  # Dashboard link won't open (404s)

client = Client(cluster)


# Without filter_name then delayed_hist.compute() will error with
# AttributeError: 'NoneType' object has no attribute 'reset_active_node'
def filter_name(name):
    return name in (
        "AnalysisElectronsAuxDyn.pt",
        "AnalysisElectronsAuxDyn.eta",
        "AnalysisElectronsAuxDyn.phi",
        "AnalysisElectronsAuxDyn.m",
    )


def get_data_dict(
    n=10,
    read_file="mc.txt",
    tree_name="CollectionTree",
    xc="root://xcache.af.uchicago.edu:1094//",
):
    r = {}
    with open(read_file, "r") as readfile:
        ls = readfile.readlines()
        _range_max = min(n, len(ls))
        print(f"Processing {_range_max} out of {len(ls)} files")
        for i in range(0, _range_max):
            r[xc + ls[i].strip()] = tree_name
    return r


file_uris = get_data_dict(100)

events = NanoEventsFactory.from_root(
    file_uris,
    schemaclass=PHYSLITESchema,
    uproot_options=dict(filter_name=filter_name),
    delayed=True,
).events()

# Lay out the event selection logic
el_p4 = events.Electrons

# select 2-electron events
evt_filter = ak.num(el_p4) == 2
el_p4 = el_p4[evt_filter]

# Now scale across the KubeCluster to multiple workers
cluster.scale(50)
# ensure cluster has finished scaling
time.sleep(30)
print(cluster)

# fill histogram with di-electron system invariant mass and plot
delayed_hist = hist.dask.Hist.new.Reg(120, 0, 120, label="mass [GeV]").Weight()
delayed_hist.fill((el_p4[:, 0] + el_p4[:, 1]).mass / 1_000)

# This takes about:
# 24 seconds for 50 files and 10 workers
# 13 seconds for 50 files and 25 workers
# 11 seconds for 50 files and 50 workers
# 12 seconds for 50 files and 100 workers
_start = time.time()
result_hist = delayed_hist.compute()
_stop = time.time()
print(
    f"Cluster with {cluster.n_workers} workers finished in {_stop-_start:.2f} seconds."
)

delayed_hist.visualize()

artists = result_hist.plot()
fig = artists[0][0].get_figure()
ax = fig.get_axes()[0]
ax.set_ylabel("Count")
fig.savefig("mass.png")

client.close()
cluster.close()
```

This further addresses points on @lukasheinrich's outline comment #4 (comment). So I think we can move forward with taking @mvigl's notebook and then trying it in the distributed manner (cf. https://gitlab.cern.ch/gstark/pycolumnarprototype/-/issues/3 for the start of this).
Hi all, I can run the code above on the UChicago Jupyter Lab cluster, but then if one tries to access

one gets the following error after

which is the same as seen at https://gitlab.cern.ch/gstark/pycolumnarprototype/-/issues/3 when running the tools, and the whole reason why they fail. So I think this is what we should try to figure out.
We want to be able to run a workflow that runs a distributed analysis with dask.distributed. This means that a run inside of a Jupyter Notebook in the interactive JupyterLab session should be able to send jobs to the worker nodes on the AF.
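A minimal sketch of that goal with dask.distributed (the scheduler address is a placeholder; the real endpoint depends on how the AF deploys its scheduler):

```python
from distributed import Client

# placeholder address for the AF Dask scheduler
client = Client("tcp://dask-scheduler.example.af.uchicago.edu:8786")


def task(x):
    # trivial work, just to confirm execution happens on a worker node
    return x**2


# map the task across the cluster and gather the results back in the notebook
futures = client.map(task, range(10))
print(client.gather(futures))  # [0, 1, 4, ..., 81]
client.close()
```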