Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API functionality #176

Merged
merged 10 commits into from
Oct 23, 2023

Conversation

harryjmoss
Copy link

As part of work between UCL and Durham, we have produced a prototype FastAPI app for sending numpy arrays derived from HDF5 files, as well as other SWIFT objects.

Here is the API

@kyleaoman kyleaoman changed the base branch from master to remote-api-dev October 23, 2023 10:39
@kyleaoman
Copy link
Member

kyleaoman commented Oct 23, 2023

@JBorrow the time allocated to the team at UCL to work on this has now ended so they're handing what they have (which is now fairly mature) over to us. I think that what we should do from here is:

  1. Merge this PR into a feature branch to wrap up some loose ends before merging to master.
  2. In that feature branch move the API code into a separate module (or submodule?) of swiftsimio. I think it makes sense to distribute it in the swiftsimio package.
  3. After that there's a little bit of documentation work to do in the swiftsimio narrative docs. At a minimum we should point to the remote API docs as an available feature, and probably add a new page with some cookbook examples. This would be about the earliest to consider merging into master?
  4. Then there's a bit of work to do to set up a persistent server on cosma (on dataweb, I think). The RSEs have left some suggestions of plausible ways this could work.
  5. A little bit of development needed to decide how we want to offer available simulation data through the server. Loosely the server would maintain a dictionary mapping labels to hdf5 files on disc, something like {"flamingos_NxxxxLxxxx_physicslabel_snapnum": "/cosma8/.../snap.hdf5"}, and then a user should be able to set up a RemoteSWIFTDataset with a label plus their credentials.
  6. Set up a webpage (on/linked from the VirgoDB pages?) pointing to the relevant documentation and tabulating the data available from the server by label. Maybe we can set everything up with a test dataset and then roll out an initial simulation dataset that we can announce more widely. Perhaps Flamingos makes a good candidate?

If you agree with step 1, at least, then please go ahead and approve the merge to a branch and we can work from there :)

@kyleaoman
Copy link
Member

@MatthieuSchaller you likely want to be aware of this PR, maybe you have input on the proposed workflow from here.

@kyleaoman kyleaoman added the enhancement New feature or request label Oct 23, 2023
@kyleaoman
Copy link
Member

Actually there's no reason not to merge this into a development branch, and I need to make a few commits to fix bugs due to the new SWIFTUnits._handle attribute - and don't have push permission on the fork. @JBorrow detailed review can wait for the eventual PR to master, but perhaps worth having a browse of the branch in the meantime.

@kyleaoman kyleaoman merged commit b84712b into SWIFTSIM:remote-api-dev Oct 23, 2023
5 checks passed
Copy link
Member

@JBorrow JBorrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments; in particular I would like to see these moved outside of this file.

Is it possible to test these functions? I would like to see that.

Comment on lines +26 to +31
"cloudpickle>=2.2.1",
"numpy",
"h5py",
"unyt>=2.9.0",
"numba>=0.50.0",
"requests>=2.31.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird indentation

Comment on lines +36 to +40
class NumpyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.ndarray):
return obj.tolist()
return json.JSONEncoder.default(self, obj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it would be great to have all the remote stuff in a separate file (e.g. remote.py)

Comment on lines +247 to +261
class RemoteSWIFTUnits:
def __init__(self, unit_dict=None):
excluded_fields = ["filename"]
if unit_dict is not None:
for key, value in unit_dict.items():
if key not in excluded_fields and not isinstance(
value, unyt.unyt_quantity
):
if isinstance(unit_dict[key], dict):
for nested_key, nested_value in unit_dict[key].items():
setattr(self, nested_key, nested_value)
else:
setattr(self, key, unyt.unyt_quantity.from_string(value))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What on earth does this do? Why does it exist? Needs to be documented.

Comment on lines +1229 to +1231
Args:
json_array (str): Numpy array as JSON
data_type (str): Data type of elements in the original array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is inconsistent with the rest of the codebase.


def generate_getter_remote(
server_address: str,
session: requests.Session,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Session is undocumented

Comment on lines +1996 to +1998
+ SWIFTUnits,
+ SWIFTMetadata,
+ SWIFTParticleDataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be for the remote versions of these.

@kyleaoman
Copy link
Member

Thanks @JBorrow. It's on my list to make a draft PR for this branch with a to-do list before it can be merged. Getting testing set up alone is going to be a big job. Ideally a lot of existing tests should be refactored to use fixtures so that we can test both the stand-alone and "client" SWIFTDatasets. Always hesitant to start making significant changes to tests, though, so will want to brainstorm that a bit before investing any effort. Figuring out how to host a server for the tests is also a
bit of a tricky one - probably possible but not similar to anything I've tried before. Lots of other little stuff to do, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants