Replies: 1 comment
Here's the implementation of […]. It doesn't do much: it just iterates over the files, reads the requested data, and calls […]. If you want to avoid opening the same files multiple times, you'd have to forsake the convenience of […]. It doesn't make sense to try to cover a lot of use-cases in […].
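For illustration, opening each file once and reading several trees from it could look roughly like the sketch below. It is only a sketch under assumptions: `read_trees_once` and its `open_file` and `concatenate` parameters are hypothetical names made up for this example (they are not part of uproot), every file is assumed to contain every requested tree, and the per-file chunks are assumed to fit in memory.

```python
from collections import defaultdict


def read_trees_once(files, branches_by_tree, open_file=None, concatenate=None):
    """Open each file once and read several trees from it (hypothetical helper).

    files            -- iterable of file paths, without a ":tree" suffix
    branches_by_tree -- dict mapping tree name -> list of branch names to read
    open_file        -- callable that opens one file; defaults to uproot.open
    concatenate      -- callable that joins a list of chunks; defaults to ak.concatenate
    """
    if open_file is None:
        import uproot  # imported lazily so the helper can be exercised without ROOT files
        open_file = uproot.open
    if concatenate is None:
        import awkward as ak
        concatenate = ak.concatenate

    chunks = defaultdict(list)
    for path in files:
        # One open per file, no matter how many trees are requested.
        with open_file(path) as f:
            for tree_name, branches in branches_by_tree.items():
                chunks[tree_name].append(f[tree_name].arrays(branches))

    # One array per tree, in file order, instead of a single union-typed array.
    return {name: concatenate(parts) for name, parts in chunks.items()}
```

With the names from the question, a call might be `data = read_trees_once(files, {"tree1": tree1_branches, "tree2": tree2_branches})`, after which `data["tree1"]` and `data["tree2"]` are separate arrays.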
Hi experts,
Let's say I am trying to access multiple trees from the same set of files, which can fit into memory; call it `files`. The documentation suggests that for reading sets of files, `uproot.concatenate` is the way to go. There are multiple ways I can think of to do this:

1. Build two lists, `files_tree1` and `files_tree2`, where `files_treeX = [f+":treeX" for f in files]`, and pass each list to `uproot.concatenate` in a separate call. I think this is inefficient, since the same files will be opened twice. Maybe caching would help, but it feels like it can be done better.
2. Build a combined file list, `files_all = files_tree1 + files_tree2`, and a combined branch list, `all_branches = tree1_branches + tree2_branches`, and pass these to one uproot call, hoping uproot magic will know what to do. Surprisingly (at least to me), this did not crash. Uproot just produced an `awkward` array which is a `union` of 2 awkward arrays, `tree1_branches` (N entries) and `tree2_branches` (M entries), with `ak.type(data)` giving a union type, so calling `data[-1]` gives the last entry from `tree2` and `data[0]` gives the first entry of `tree1`. I guess this can be expected behaviour from the `global_index` that `uproot.concatenate()` seems to keep track of (or maybe I'm completely off)? Anyway, I think we still open each file twice, which is non-ideal.
3. Loop over the files and call `uproot.open` on each of them, then access the keys from the structure we get back. This way each file is opened once.

My questions are:

- What does `uproot.concatenate` do in the background that makes it more performant (if that's even true) than `uproot.open` inside a loop over files? From a quick skim of the source code, `concatenate` loops over the files one by one, opening them as `ReadOnlyFile` and then grabbing the data, but I am probably missing something subtle in the steps.