Hierarchical labelling of Dataset variables #4665
Comments
@Robileo - thanks for the interesting question. Quick question: would it work to reshape your data into an N-D array, i.e.
@jhamman thanks for your answer. In the following situation, making an N-D array would work:
The array
In that case, finding common dimensions between all the variables is much less intuitive. But labelling the variables with tuples is quite easy:
Pandas supports this hierarchical labelling pretty well. But when I run several simulations, my data are multi-dimensional and xarray becomes very interesting. If xarray had this feature, it would be the ideal tool!
Closing this issue to keep discussion consolidated in #4118
Context and problem
In my everyday work I use pandas to store and process system simulation results. One feature of the pandas DataFrame that is missing from the xarray Dataset prevents me from switching to xarray.
I often have to store hierarchical data such as "rotational speed of engine A of system B". In a DataFrame, I label this variable with a tuple ('system_B', 'engine_A', 'speed'). This results in a hierarchical labelling of the DataFrame variables: df.system_B.engine_A.speed gives a Series containing the values of the speed, and df.system_B gives a DataFrame containing all the variables of system B.
Here is an example:
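A minimal sketch of such a DataFrame, assuming made-up variable names and values (`torque`, `engine_C`, `system_A`, the `time` index and all numbers are illustrative only). Passing a dict with tuple keys to the DataFrame constructor gives a three-level column MultiIndex:

```python
import pandas as pd

# Illustrative data only: each tuple key becomes one column of a
# three-level column MultiIndex (system, engine, quantity).
df = pd.DataFrame(
    {
        ("system_A", "engine_A", "speed"): [800.0, 1200.0, 1600.0],
        ("system_B", "engine_A", "speed"): [1000.0, 1500.0, 2000.0],
        ("system_B", "engine_A", "torque"): [50.0, 60.0, 70.0],
        ("system_B", "engine_C", "speed"): [900.0, 1400.0, 1900.0],
    },
    index=pd.Index([0.0, 0.1, 0.2], name="time"),
)

# Attribute access walks down the levels one at a time.
speed = df.system_B.engine_A.speed  # a Series
system_b = df.system_B              # a DataFrame with the two remaining levels
```

Bracket indexing, `df["system_B"]["engine_A"]["speed"]`, selects the same data and is the safer choice when a label collides with an existing DataFrame attribute.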
I can make a Dataset from this DataFrame but accessing the variables is more difficult (in v0.16.2):
I cannot benefit from IPython code completion, and I cannot easily access all the variables of system_B.
Solution
A solution could be to copy the behaviour of Pandas' DataFrame:
When ds.system_B is accessed and no variable named system_B is found, xarray returns a Dataset containing all the variables labelled with a tuple beginning with 'system_B'. As a result, ds.system_B gives a Dataset and ds.system_B.engine_A.speed gives a DataArray.
reorder_level of Pandas' DataFrame.
Additional context
Hierarchical data have already been discussed in previous issues, including #1092 and #4118. The questions related to this topic are complex: should levels share coordinates with each other? How should slicing, concatenation, etc. be handled?
The feature I propose does not fulfil all these needs, but it allows hierarchical ordering of variables without changing the internal structure of Dataset. This feature has already been proposed in this comment of #1092 but does not seem to have been implemented since then.
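To illustrate the lookup rule described under Solution, here is a rough sketch in plain Python. `HierarchicalView` is a hypothetical helper, not part of xarray, and a plain dict with tuple keys stands in for a Dataset whose variables are labelled with tuples. Wiring this to a real Dataset is not shown.

```python
class HierarchicalView:
    """Hypothetical helper (not an xarray API): attribute access over tuple labels."""

    def __init__(self, variables, prefix=()):
        # Assumption: every key of `variables` is a tuple of strings.
        self._variables = variables
        self._prefix = tuple(prefix)

    def _children(self):
        # Names available one level below the current prefix.
        n = len(self._prefix)
        return {
            key[n]
            for key in self._variables
            if len(key) > n and key[:n] == self._prefix
        }

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        path = self._prefix + (name,)
        if path in self._variables:
            return self._variables[path]  # full label: return the variable
        if any(key[: len(path)] == path for key in self._variables):
            return HierarchicalView(self._variables, path)  # partial label: sub-view
        raise AttributeError(name)

    def __dir__(self):
        # Expose child names so that tab completion can list them.
        return sorted(set(super().__dir__()) | self._children())


# Illustrative data only.
variables = {
    ("system_A", "engine_A", "speed"): [800.0, 1200.0],
    ("system_B", "engine_A", "speed"): [1000.0, 1500.0],
    ("system_B", "engine_A", "torque"): [50.0, 60.0],
}
ds = HierarchicalView(variables)
system_b = ds.system_B              # a sub-view, like the proposed Dataset
speed = ds.system_B.engine_A.speed  # the leaf variable
```

Since the wrapper only reads the tuple labels, the underlying container is left unchanged, which is the point of the proposal. Overriding `__dir__` is what would let IPython offer `system_B`, then `engine_A`, as completions.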