A key to understanding the behavior of PintArrays within Pandas DataFrames is that PintArrays are ExtensionArrays, and Pandas supports only 1-dimensional ExtensionArrays. In a 2-dimensional DataFrame, PintArrays are only ever columns. If you ask for a single row (via .loc or .iloc or some such), you will get back a Series that may have a PintArray as its values (for example, when the entire DataFrame is homogeneous in its units). But that Series is just a view constructed by Pint-pandas for your convenience. Rows in the DataFrame itself are not PintArrays.
also for
Pandas and PintArrays
Pandas makes it easy to put data into rows and columns, and most novice users do not need to understand any details beyond the fact that a DataFrame has both rows and columns, whereas a Series is a 1-dimensional array (which could be a row of data or a column of data). But when using advanced features of Pandas like ExtensionArrays, it is helpful to understand a few additional details.
When working with basic numerical data, Pandas uses Numpy data structures which are well-suited to vectorization and other performance optimizations. Pandas ExtensionArrays provide almost the full range of Pandas functionality when operating on user-defined 1-dimensional arrays. A PintArray is an ExtensionArray that is filled with Pint Quantities (and which can be optimized for performance).
A 1-dimensional Pandas Series can use a PintArray to hold its values. Columns in a 2-dimensional Pandas DataFrame can contain PintArrays, with all the efficiency the ExtensionArray APIs provide, but rows are a special case. If all elements of the row have the same units, the row will be returned as a Series backed by a PintArray with those units. But if the units are heterogeneous, the row will be returned as a Series consisting of discrete Quantities (or raw data if the column values don't have units). All Quantity data within such a Series will follow Pint rules of unit conversion and will give error messages when units are not compatible, but some error messages may lose information as Pandas tries to align two incompatible Quantities to non-unitized magnitude values. To get the greatest benefit from Pint-pandas (and Pandas in general), build your columns from data with homogeneous units and let your rows contain the heterogeneous data when necessary.
When examining DataFrames that contain units, if you see units within your DataFrame or Series, it means the Pandas object is not using PintArrays but it is still using Pint Quantities. If you see only magnitudes when you print your DataFrame, and you see pint[units] as the dtype of the column or Series, it means the Pandas object is using PintArrays.
API page for all series and dataframe accessors. Would need to set up docstrings.
in common issues, add code to illustrate the following:
Quantity objects within Pandas DataFrames (or Series) will behave like Quantities, meaning that they are subject to unit conversion rules and will raise errors when incompatible units are mixed. But these loose Quantities don't offer the elegance or performance optimizations that come from using PintArrays. And they may give strange error messages as Pandas tries to convert incompatible units to dimensionless magnitudes (which is often prohibited by Pint) rather than naming the incompatibility between the two Quantities in question.
in common issues, expand on Creating DataFrames from Series
Now that it's merged it should be easier to make a PR and view the changed pages yourself @MichaelTiemannOSC
I think you need to make an account on readthedocs to get it to build for you