A NB draft showing an example of LSDB pipeline #18
Line #22. nested_ddf
Are you finding that this is actually a Dask-Nested NestedFrame object? On my end it seems to be stuck as a Dask DataFrame. Interestingly, from_dask_dataframe works when loading the data directly using dask. Investigating...
import dask.dataframe as dd
test_dd = dd.read_parquet(f"{catalogs_dir}/ztf_object", columns=["ra", "dec", "ps1_objid"])
type(NestedFrame.from_dask_dataframe(test_dd))  # NestedFrame
I think this might be an issue with LSDB using legacy Dask DataFrames, whereas Dask itself natively uses dask-expr DataFrames.
Yep.
from dask_expr import from_legacy_dataframe
object_ndf = NestedFrame.from_dask_dataframe(from_legacy_dataframe(lsdb_object._ddf))
type(object_ndf) #nested_dask.core.NestedFrame
Line #4. %pip install aiohttp lsdb
Just a note: the change I'm suggesting depends on dask-expr, but it's already a requirement of the package, so it doesn't need to be added to this install line.
See below comment on from_legacy_dataframe.
Do we need to actually do the LSDB join for this use case? Would it be better to just do the join at the NestedFrame level:
from dask_expr import from_legacy_dataframe
nf_object = NestedFrame.from_dask_dataframe(from_legacy_dataframe(lsdb_object._ddf))
nf_source = NestedFrame.from_dask_dataframe(from_legacy_dataframe(lsdb_source._ddf))
nf_joined = nf_object.add_nested(nf_source, "ztf_source", how="left")
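To make the intent of the NestedFrame-level join concrete, here is a plain-Python sketch of what a left nesting join does conceptually: every object row is kept, and the matching source rows are packed into a single nested column. The toy data, the `mjd` and `mag` columns, and the helper `nest_left` are hypothetical, and representing "no sources" as an empty list is an assumption of this sketch. It illustrates the semantics only, not how nested-dask implements `add_nested`.

```python
from collections import defaultdict


def nest_left(objects, sources, name):
    """Keep every object row; pack matching source rows under `name`."""
    grouped = defaultdict(list)
    for index, row in sources:
        grouped[index].append(row)
    # Assumption of this sketch: objects without sources get an empty list.
    return {
        index: {**row, name: grouped.get(index, [])}
        for index, row in objects.items()
    }


# Hypothetical toy tables keyed by a shared object index.
objects = {
    1: {"ra": 10.0, "dec": -5.0},
    2: {"ra": 11.0, "dec": -6.0},
}
sources = [
    (1, {"mjd": 59000.1, "mag": 18.2}),
    (1, {"mjd": 59001.1, "mag": 18.4}),
]

joined = nest_left(objects, sources, "ztf_source")
print(len(joined[1]["ztf_source"]), len(joined[2]["ztf_source"]))
```

Object 2 has no sources but still appears in the result, which is the behavior one would expect from `how="left"`.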
Were you planning on adding a bit of analysis to this? Even just a query and a function run would be helpful.
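As an illustration of the kind of analysis meant here, the sketch below filters nested rows and then runs a function once per object, in plain Python. The toy data, the `mag` column, the 20.0 cutoff, and the helpers `query_nested` and `reduce_nested` are hypothetical stand-ins; the notebook itself would presumably use the NestedFrame API rather than these functions.

```python
from statistics import mean


def query_nested(frame, name, predicate):
    """Return a copy of `frame` keeping only nested rows that pass `predicate`."""
    return {
        index: {**row, name: [r for r in row[name] if predicate(r)]}
        for index, row in frame.items()
    }


def reduce_nested(frame, name, column, func):
    """Apply `func` to one nested column per object; None if no rows remain."""
    out = {}
    for index, row in frame.items():
        values = [r[column] for r in row[name]]
        out[index] = func(values) if values else None
    return out


# Hypothetical toy frame: one nested "ztf_source" list per object.
frame = {
    1: {"ra": 10.0, "ztf_source": [{"mag": 18.0}, {"mag": 19.0}, {"mag": 21.0}]},
    2: {"ra": 11.0, "ztf_source": [{"mag": 22.0}]},
}

# "Query": keep detections brighter than the cutoff.
bright = query_nested(frame, "ztf_source", lambda r: r["mag"] < 20.0)
# "Function run": mean magnitude per object over the remaining rows.
mean_mag = reduce_nested(bright, "ztf_source", "mag", mean)
print(mean_mag)
```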
Thank you for getting the ball rolling here. I have a few structural requests/questions.