A NB draft showing an example of LSDB pipeline #18

Merged
3 commits merged into from
May 30, 2024

Conversation

hombit
Collaborator

@hombit hombit commented May 24, 2024

Change Description

  • My PR includes a link to the issue that I am addressing

Solution Description

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Project-Specific Pull Request Checklists

Bug Fix Checklist

  • My fix includes a new test that breaks as a result of the bug (if possible)
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist

Build/CI Change Checklist

  • If required or optional dependencies have changed (including version numbers), I have updated the README to reflect this
  • If this is a new CI setup, I have added the associated badge to the README

Other Change Checklist

  • Any new or updated docstrings use the NumPy docstring format.
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover any changes
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

@hombit hombit requested a review from dougbrn May 24, 2024 13:00

| Before [7ffa702] | After [c3788e8] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 146M | 148M | 1.01 | benchmarks.NestedFrameAddNested.peakmem_run |
| 147M | 149M | 1.01 | benchmarks.NestedFrameQuery.peakmem_run |
| 489±4ms | 492±1ms | 1.01 | benchmarks.NestedFrameQuery.time_run |
| 237±0.6ms | 237±2ms | 1 | benchmarks.NestedFrameAddNested.time_run |
| 147M | 147M | 1 | benchmarks.NestedFrameReduce.peakmem_run |
| 378±1ms | 377±2ms | 1 | benchmarks.NestedFrameReduce.time_run |
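
The Ratio column appears to be the After value divided by the Before value, rounded to two decimals. A quick sanity check under that assumption, using only the central values from the benchmark table (the ± uncertainties and units are dropped):

```python
# Central values copied from the benchmark table: name -> (before, after).
rows = {
    "NestedFrameAddNested.peakmem_run": (146, 148),
    "NestedFrameQuery.peakmem_run": (147, 149),
    "NestedFrameQuery.time_run": (489, 492),
    "NestedFrameAddNested.time_run": (237, 237),
    "NestedFrameReduce.peakmem_run": (147, 147),
    "NestedFrameReduce.time_run": (378, 377),
}


def ratio(before, after):
    """After / Before, rounded to two decimals (assumed definition of Ratio)."""
    return round(after / before, 2)


ratios = {name: ratio(before, after) for name, (before, after) in rows.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Every ratio stays within about 1% of 1, which is consistent with the change having no measurable effect on these benchmarks.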


@@ -0,0 +1,236 @@
{
Collaborator

@dougbrn dougbrn May 24, 2024

Line #22.    nested_ddf

Are you finding that this is actually a Dask-Nested NestedFrame object? On my end it seems to be stuck as a Dask DataFrame. Interestingly, from_dask_dataframe works when the data is loaded directly with Dask. Investigating...

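import dask.dataframe as dd
from nested_dask import NestedFrame

# catalogs_dir is assumed to be defined earlier in the notebook
test_dd = dd.read_parquet(
    f"{catalogs_dir}/ztf_object",
    columns=["ra", "dec", "ps1_objid"],
)

type(NestedFrame.from_dask_dataframe(test_dd))  # NestedFrame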

Collaborator

I think this might be caused by LSDB using legacy Dask DataFrames, whereas Dask now natively uses dask-expr DataFrames.

Collaborator

Yep.

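from dask_expr import from_legacy_dataframe
from nested_dask import NestedFrame

# LSDB keeps a legacy Dask DataFrame in `_ddf`; convert it to dask-expr first
object_ndf = NestedFrame.from_dask_dataframe(from_legacy_dataframe(lsdb_object._ddf))
type(object_ndf)  # nested_dask.core.NestedFrame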

@@ -0,0 +1,236 @@
{
Collaborator

@dougbrn dougbrn May 24, 2024

Line #4.    %pip install aiohttp lsdb

Just a note: my suggestions make this workflow depend on dask-expr, but it is already a requirement of the package, so it doesn't need to be added to this install line.


@@ -0,0 +1,236 @@
{
Collaborator

@dougbrn dougbrn May 24, 2024

See below comment on from_legacy_dataframe.

Do we need to actually do the LSDB join for this use case? Would it be better to just do the join at the NestedFrame level:

from dask_expr import from_legacy_dataframe
from nested_dask import NestedFrame

# Convert both LSDB catalogs to NestedFrames, then nest the sources under the objects
nf_object = NestedFrame.from_dask_dataframe(from_legacy_dataframe(lsdb_object._ddf))
nf_source = NestedFrame.from_dask_dataframe(from_legacy_dataframe(lsdb_source._ddf))

nf_joined = nf_object.add_nested(nf_source, "ztf_source", how="left")
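
For readers unfamiliar with nesting, the idea behind a left join at the NestedFrame level can be illustrated in plain Python. This is only a conceptual sketch: the helper `add_nested_left` and the toy data are hypothetical, it is not how nested-dask implements `add_nested`, and the real library may represent objects without matches as nulls rather than empty lists.

```python
from collections import defaultdict


def add_nested_left(objects, sources, name):
    """Attach to each object the list of source rows that share its index."""
    by_index = defaultdict(list)
    for idx, row in sources:
        by_index[idx].append(row)
    # Left join: only object indexes are kept, so sources without a matching
    # object are dropped and objects without sources get an empty list.
    return {
        idx: {**cols, name: by_index.get(idx, [])}
        for idx, cols in objects.items()
    }


# Hypothetical toy data: index -> object columns, and (index, source row) pairs.
objects = {1: {"ra": 10.0, "dec": -5.0}, 2: {"ra": 11.0, "dec": -4.0}}
sources = [
    (1, {"mjd": 58000.0, "mag": 18.2}),
    (1, {"mjd": 58001.0, "mag": 18.4}),
    (3, {"mjd": 58002.0, "mag": 17.9}),  # no object with index 3
]

joined = add_nested_left(objects, sources, "ztf_source")
print(joined[1]["ztf_source"])
print(joined[2]["ztf_source"])
```

The point of the sketch is that each object row ends up carrying its own sources in a single nested column, so no separate catalog-level join result has to be kept around.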

@@ -0,0 +1,236 @@
{
Collaborator

@dougbrn dougbrn May 24, 2024

Were you planning to add a bit of analysis to this? Even just a query and a function run would be helpful.


Collaborator

@dougbrn dougbrn left a comment

Thank you for getting the ball rolling here. I have a few structural requests and questions.

@hombit hombit closed this pull request by merging all changes into main in dac5188 May 30, 2024
@hombit hombit deleted the lsdb-docs branch May 30, 2024 17:12
@hombit hombit restored the lsdb-docs branch May 30, 2024 17:13