feat!: remove protocol for form transformation #382

agoose77 · 2023-10-08T11:00:27Z

This should be done in uproot, I think.

codecov-commenter · 2023-10-08T11:02:13Z

Codecov Report

Merging #382 (05b3ce0) into main (16b5189) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #382      +/-   ##
==========================================
- Coverage   93.23%   93.21%   -0.02%     
==========================================
  Files          24       24              
  Lines        3043     3038       -5     
==========================================
- Hits         2837     2832       -5     
  Misses        206      206

Files	Coverage Δ
src/dask_awkward/__init__.py	`100.00% <ø> (ø)`
src/dask_awkward/lib/__init__.py	`100.00% <ø> (ø)`
src/dask_awkward/lib/io/io.py	`96.00% <100.00%> (-0.08%)`	⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

douglasdavis · 2023-10-09T15:44:54Z

Makes sense to me, since this was @lgray's I'll ask him to give it a green light as well

lgray · 2023-10-09T15:47:50Z

I think it's OK - but we need to find a place for it to live in the long run that's not coffea since it's not generally coffea specific (HEP, however, appears to be the only place where we need it presently).

Having a factorization point between serialized data and user interface is pretty reasonable IMO, but you need to do some pretty advanced data transformations to require it.

agoose77 · 2023-10-09T17:27:04Z

@lgray my view is that right now only coffea uses this. I don't see it as being a dask-awkward feature unless we extend the various ingest interfaces to support it. So right now I think if coffea want to generalise to other packages, perhaps we make a new awkward-form-remapping package that provides these interfaces / tools?

lgray · 2023-10-10T13:56:29Z

One of the first things I'm going to do one we get CoffeaTeam/coffea#900 is to turn the parquet nanoevents back on and start testing it routinely, which will require a patch to dask-awkward.

This also paves the way for future backends being easily ingested.

I'm certainly happy with another package if that seems like a prudent long term choice, but given the first thing I'm going to do is make it work for parquet, I'm pushing a bit to make sure we're not being overly strict about this.

nikoladze · 2023-10-10T15:50:49Z

Maybe the wrong place for this discussion, but i wonder when we create parquet files why we would even write them so "flat" like NanoAOD that we need to remap the form and not just write them in the structure we want it to have in the first place? Isn't it a nice feature of arrow/parquet that it seamlessly maps to awkward array?

jpivarski · 2023-10-10T15:57:09Z

After Arrow version 2 or so, it has become very good at writing NanoEvents-type structures (e.g. with lists of particle-records, rather than separate jagged arrays for each particle attribute). Due to the way that Parquet stores data, NanoEvents-form and NanoAOD-form would use the same disk space, but if it's easier to just write and read Parquet without changing the form, that would be a natural thing to do.

agoose77 · 2023-10-10T19:59:43Z

@lgray given how easy it is to publish new versions and packages, I am feeling that we don't need to move this to dask-awkward at this time.

I wonder whether uproot.dask should in future not handle form remapping. It seems to me that coffea and/or some other HEP package should handle form remappng, and e.g. implement uproot.dask for this case.

As such, are you content if we merge this PR, leave things in uproot for now, and subsequently handle this on the downstream side?

lgray · 2023-10-10T20:20:09Z

Yes - been happy with proceeding from the start - this particular branch of conversation is not a show stopper.

agoose77 · 2023-10-11T09:00:11Z

The test failures are due to Arrow being unavailable for Python 3.12.

As the other tests pass, and we can see that error, I'm going to merge this one and get it off the queue.

feat!: remove protocol

395879b

agoose77 requested a review from douglasdavis October 8, 2023 11:01

agoose77 mentioned this pull request Oct 10, 2023

fix: declare pyarrow optional for 3.12 #385

Merged

agoose77 force-pushed the agoose77/feat!-remove-form-transformation branch from a2762a1 to 395879b Compare October 10, 2023 20:40

agoose77 merged commit 8c9ff67 into main Oct 11, 2023
38 of 45 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: remove protocol for form transformation #382

feat!: remove protocol for form transformation #382

agoose77 commented Oct 8, 2023 •

edited

Loading

codecov-commenter commented Oct 8, 2023 •

edited

Loading

douglasdavis commented Oct 9, 2023 •

edited

Loading

lgray commented Oct 9, 2023 •

edited

Loading

agoose77 commented Oct 9, 2023

lgray commented Oct 10, 2023

nikoladze commented Oct 10, 2023

jpivarski commented Oct 10, 2023

agoose77 commented Oct 10, 2023

lgray commented Oct 10, 2023

agoose77 commented Oct 11, 2023

feat!: remove protocol for form transformation #382

feat!: remove protocol for form transformation #382

Conversation

agoose77 commented Oct 8, 2023 • edited Loading

codecov-commenter commented Oct 8, 2023 • edited Loading

Codecov Report

douglasdavis commented Oct 9, 2023 • edited Loading

lgray commented Oct 9, 2023 • edited Loading

agoose77 commented Oct 9, 2023

lgray commented Oct 10, 2023

nikoladze commented Oct 10, 2023

jpivarski commented Oct 10, 2023

agoose77 commented Oct 10, 2023

lgray commented Oct 10, 2023

agoose77 commented Oct 11, 2023

agoose77 commented Oct 8, 2023 •

edited

Loading

codecov-commenter commented Oct 8, 2023 •

edited

Loading

douglasdavis commented Oct 9, 2023 •

edited

Loading

lgray commented Oct 9, 2023 •

edited

Loading