suggest adding more details to building new notebooks in contributing guide #193

Open
MathewBiddle opened this issue May 7, 2024 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

@MathewBiddle
Contributor

Bullet 4 in https://github.com/ioos/ioos_code_lab/blob/main/CONTRIBUTING.md#building-new-notebooks should include:

  • High level guidance on creating a jupyter notebook. (Is there a reference we can drop in here?)
  • details on what to do with large data
  • if needed, details on HPC resources
  • guidance on the content of markdown cells. Plain language description of what the code is doing.
MathewBiddle added the documentation label on May 7, 2024
@ocefpaf
Member

ocefpaf commented May 8, 2024

> High level guidance on creating a jupyter notebook. (Is there a reference we can drop in here?)

Maybe https://docs.jupyter.org/en/latest/start/index.html ?

> details on what to do with large data

This is a broad question and the answer may vary from case to case. Here are a few things I can think of:

Scenario 1
Large data that is not served in a way that lets someone slice it on demand, where the notebook will use only a smaller, manageable portion of it.

Answer: Do the slicing offline and provide that step as an example.
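
A minimal sketch of what such an offline slicing step could look like, assuming the large file is a CSV with `latitude` and `longitude` columns. The function name, column names, and bounding-box approach are illustrative assumptions, not something this thread prescribes; gridded formats would need a different tool.

```python
import csv


def subset_csv(src, dst, lat_range, lon_range,
               lat_col="latitude", lon_col="longitude"):
    """Copy only the rows inside a lat/lon bounding box from src to dst.

    Rows are streamed one at a time, so the full file never sits in memory.
    Column names are assumptions; adjust them to the real dataset.
    Returns the number of rows written.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
            if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                writer.writerow(row)
                kept += 1
    return kept
```

The notebook would then load only the small output file, and the script itself could be shipped alongside it as the documented slicing step.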

Scenario 2
Large data that will be used in its entirety and the data is served somewhere.

Answer: We will skip that notebook in the CI runs but, if the notebook runs on someone's machine, we can publish it with a caveat that it will take some time/resources to run locally.
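
One way the skipping could be expressed, as a hedged sketch only: a skip list that a CI script consults before executing notebooks. The `SKIP_IN_CI` entries, the `notebooks_for_ci` helper, and the directory layout are all hypothetical and do not describe how this repository's CI is actually configured.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical skip list: notebooks too heavy for CI, relative to the repo root.
SKIP_IN_CI = [
    "notebooks/heavy_model_run.ipynb",
    "notebooks/full_archive_*.ipynb",
]


def notebooks_for_ci(root, skip_patterns=SKIP_IN_CI):
    """Return the notebook paths under root that CI should execute."""
    root = Path(root)
    selected = []
    for nb in sorted(root.rglob("*.ipynb")):
        # Ignore Jupyter's autosave copies.
        if ".ipynb_checkpoints" in nb.parts:
            continue
        rel = nb.relative_to(root).as_posix()
        if any(fnmatch(rel, pattern) for pattern in skip_patterns):
            continue
        selected.append(rel)
    return selected
```

Keeping the list in one place makes it easy to see which published notebooks are not exercised by CI and therefore need the "runs locally, takes time" caveat.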

Scenario 3
Large data that is served somewhere we can slice it, where the notebook will use only a small portion.

Answer: As long as the CI run time can handle the request, we can publish as-is. If not, use the answer from Scenario 2.

Scenario 4
Large-ish data (<2 GB), not served anywhere.

Answer: We can attach it as an asset to a GitHub release and fetch the data from there.
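
A rough sketch of the fetching side, assuming the data file has been attached to a release. The helper names, the caching behaviour, and the checksum check are illustrative assumptions; the tag and filename in any real call would have to match an actual release.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def release_asset_url(owner, repo, tag, filename):
    """Build the download URL for a file attached to a GitHub release."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{filename}"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never fully loaded in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url, dest, sha256=None):
    """Download url to dest unless it is already cached, then verify its hash."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    if sha256 is not None:
        actual = sha256_of(dest)
        if actual != sha256:
            raise ValueError(f"checksum mismatch for {dest}: got {actual}")
    return dest
```

A notebook could call `fetch(release_asset_url(...), "data/<filename>", sha256=...)` in its first cell so reruns reuse the cached copy. An existing package such as pooch covers the same download-and-verify pattern and may be preferable to a hand-rolled helper.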

> if needed, details on HPC resources

Not sure what scenarios you are thinking of here but, as long as someone can run the notebook, this falls under the same large-data Scenario 2: we can publish it but we won't run it on our CI.

> guidance on the content of markdown cells. Plain language description of what the code is doing.

This is a tough one and probably the most important one IMO. Jupyter has a Narrative section in its docs that could help, but it is just a placeholder at the moment.

@MathewBiddle
Contributor Author

> Maybe https://docs.jupyter.org/en/latest/start/index.html ?

That was my initial thought. But that really only introduces the concept of a Jupyter notebook. It doesn't describe what the content should be.

@ocefpaf
Member

ocefpaf commented May 9, 2024

> It doesn't describe what the content should be.

Yeah. Which leads us to the "Narrative" part that is incomplete. I guess we should write our own.
