Basic file I/O with Uproot -> Writing TTrees -> different branch types #46
Comments
Thanks for opening your first issue here 💖! If you have any questions, feel free to mention one of the conveners, previous contributors, or attend our weekly meeting (see https://hepsoftwarefoundation.org/workinggroups/training.html). Also, sometimes issues go unnoticed, so don't hesitate to @mention some of us, if we do not come back to you within a few days.
Actually, If the branch type ( About Again, explaining all of that in an introductory tutorial would probably be too much information. I don't have an idea of what is the best pedagogy for these two things, array-type conversion and file closing, whether it's better to explicitly convert array types and explicitly close files, encouraging the assumption that it is necessary to do these things, or add a comment about the fact that they're not necessary, potentially burying the presentation in too many details, or what.
That's a real problem because it creates "incompatible" trees. What concerns the
But then you only "open" it, and you do not show how to "close" it in the end (when working interactively). Of course, if you are sure that
Uproot does convert the data to what was requested in the uproot.WritableDirectory.mktree declaration:

>>> import numpy as np
>>> import uproot
>>> outfile = uproot.recreate("/tmp/whatever.root")
>>> tree2 = outfile.mktree("tree2", {"x": np.int32, "y": np.float32})
>>> tree2.extend({"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)})
>>> tree2.show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
x                    | int32_t                  | AsDtype('>i4')
y                    | float                    | AsDtype('>f4')

So the That could be worth a mention, but an introductory tutorial can't get into all of the details without overwhelming first-timers. Perhaps a more efficient way to give new users a good working knowledge is to refer to the

>>> outfile["tree1"] = {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
>>> outfile["tree1"].show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
x                    | int64_t                  | AsDtype('>i8')
y                    | double                   | AsDtype('>f8')

method as "quick and dirty" and the This can also apply to opening and closing files using the For some file I/O libraries, this can leave output files in an invalid state, but one of Uproot's deliberate features is that all file-state changing happens in the writing functions (e.g. Therefore, for Uproot, the only danger from not closing files is the possibility of running out of open file handles. That applies equally to file handles for reading and file handles for writing.
If the Python process ends due to Ctrl-C, a Python exception, or a segfault, then the file that is being written to will be in a valid state if and only if that Ctrl-C, Python exception, or segfault happened between file-writing statements. If it happened in the middle of a file-writing statement, then there's no guarantee, though we do try to make updates in a way that will keep it valid for as long as possible. For instance, if we're adding a new histogram or TTree to the file, we'll write the data for the histogram or TTree first, then declare that block valid in ROOT's TFreeSegments so that ROOT recognizes it as good data, and then add it to the TDirectory. If writing is interrupted partway through this process, you won't see the histogram or TTree in the TDirectory, try to read it, and get garbage; you'll either not see it in the TDirectory or it will be fully formed. But while this is as cautious as possible, some of these operations, like updating a TDirectory, are not atomic and it's still possible to be interrupted in the middle and get an invalid file. For instance, to add a TKey to a TDirectory, hundreds of bytes must be written, and the file-writing could be interrupted in the middle of this step. So the only categorical statement I can make is that the file is in a valid state if the process is interrupted between file-writing statements. If, for instance, (The argument against frequent calls to
I wasn't clear again, sorry.
BTW, of course both methods of creating branches should be shown in the tutorial. However, I really think the tutorial also needs a dedicated paragraph that explicitly warns people that branches created with "automatically deduced" types may end up with different ROOT/C++ types when the same Python macro is run on different machines (which is not the case if one explicitly uses NumPy types).
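A minimal sketch of the difference between a deduced and an explicitly pinned array type, using NumPy only. It assumes the machine-to-machine variation comes from NumPy's default integer type, which can differ by platform and NumPy version; that mechanism is an assumption here, not something confirmed in this thread. The variable names are hypothetical.

```python
import numpy as np

# Deduced dtype: whatever integer type NumPy picks by default on this machine.
x_deduced = np.random.randint(0, 10, 1000)

# Pinned dtype: requested explicitly, so it is the same on every machine.
x_pinned = np.random.randint(0, 10, 1000, dtype=np.int32)

# Casting an existing array pins the type as well.
x_cast = x_deduced.astype(np.int32)

print("deduced:", x_deduced.dtype)
print("pinned: ", x_pinned.dtype)
print("cast:   ", x_cast.dtype)
```

Passing `x_pinned` or `x_cast` in the dictionary assignment should then give the same branch type regardless of where the macro runs, since the type is no longer deduced from a platform default.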
In the "Basic file I/O with Uproot" chapter, in the "Writing TTrees" section, you show two ways to create/write trees. However, their "x" branches will be different.
This creates an "x/L" branch:
output1["tree1"] = {"x": np.random.randint(0, 10, 1000000), "y": np.random.normal(0, 1, 1000000)}
This creates an "x/I" branch:
output1.mktree("tree2", {"x": np.int32, "y": np.float64})
Maybe it would be a good idea to use:
output1.mktree("tree2", {"x": np.int64, "y": np.float64})
BTW, you do not tell people to execute output1.close() at the end.