
Optional validation step (and documentation improvements) #107

Merged: 28 commits, Jan 30, 2023
Changes from 17 commits

Commits
4916abd  add 3.11 classifier (fsoubelet, Jan 26, 2023)
e64095a  fix language in doc config (fsoubelet, Jan 26, 2023)
10953b9  bump version (fsoubelet, Jan 26, 2023)
45f3072  normalize subheader line (fsoubelet, Jan 26, 2023)
b299e4c  normalize subheader line (fsoubelet, Jan 26, 2023)
de4220e  options to skip validation (fsoubelet, Jan 26, 2023)
62e65be  conf and examples to functions (fsoubelet, Jan 26, 2023)
f2a6d99  add new dependencies for doc goodies (fsoubelet, Jan 26, 2023)
ec4645f  examples (fsoubelet, Jan 26, 2023)
6979bcd  add missing type hint, fix admonition (fsoubelet, Jan 26, 2023)
dc3c433  hints and returns (fsoubelet, Jan 26, 2023)
abf685b  tests for new reader argument (fsoubelet, Jan 26, 2023)
87ebc1b  tests for new writer argument (fsoubelet, Jan 26, 2023)
2ada3c8  we are only 3.7+ so this should go away, to confirm with Josch on the… (fsoubelet, Jan 26, 2023)
8644051  hint admonition for the methodology (fsoubelet, Jan 27, 2023)
65a76dd  rephrasing (fsoubelet, Jan 27, 2023)
93eac60  update changelog (fsoubelet, Jan 27, 2023)
3094cec  remove old commented out test (fsoubelet, Jan 30, 2023)
2e0e26a  remove and update this too (fsoubelet, Jan 30, 2023)
5a6e0e6  tests for space in column name (fsoubelet, Jan 30, 2023)
78cbb00  validation off by default when reading (fsoubelet, Jan 30, 2023)
d5115b7  another no validation test in writer (fsoubelet, Jan 30, 2023)
a889533  validate in new name at import (fsoubelet, Jan 30, 2023)
08dfb4e  adapt warning text in reader (fsoubelet, Jan 30, 2023)
d96520a  rename validate_after_reading to validate (fsoubelet, Jan 30, 2023)
8e1489d  rename validate_before_writing to validate (fsoubelet, Jan 30, 2023)
0cb54d1  adapt warning text in writer (fsoubelet, Jan 30, 2023)
5925a3e  named admonition in the docs (fsoubelet, Jan 30, 2023)
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# TFS-Pandas Changelog

## Version 3.3.0

- Added:
- Users can now skip DataFrame validation after reading from file / before writing to file. Validation remains on by default, but can be turned off with a boolean argument.

- Changed:
- The documentation has been expanded and improved, notably with the addition of example code snippets.

## Version 3.2.1

- Changed:
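A usage sketch of the option described in the Version 3.3.0 entry above, as the API stands at the reviewed state of this PR (the file names here are made up; later commits on this branch, d96520a and 8e1489d, rename both keyword arguments to validate):

import tfs

# Validation is on by default; pass False to skip the checks.
df = tfs.read_tfs("twiss.tfs", index="NAME", validate_after_reading=False)
tfs.write_tfs("out.tfs", df, validate_before_writing=False)
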
62 changes: 46 additions & 16 deletions doc/conf.py
@@ -11,14 +11,9 @@
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import os
import pathlib
import sys

# ignore numpy warnings, see:
# https://stackoverflow.com/questions/40845304/runtimewarning-numpy-dtype-size-changed-may-indicate-binary-incompatibility
import warnings

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
@@ -55,16 +50,36 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.doctest",
"sphinx.ext.todo",
"sphinx.ext.coverage",
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinx.ext.githubpages",
"sphinx.ext.napoleon",
"sphinx.ext.autodoc", # Include documentation from docstrings
"sphinx.ext.coverage", # Collect doc coverage stats
"sphinx.ext.doctest", # Test snippets in the documentation
"sphinx.ext.githubpages", # Publish HTML docs in GitHub Pages
"sphinx.ext.intersphinx", # Link to other projects’ documentation
"sphinx.ext.mathjax", # Render math via JavaScript
"sphinx.ext.napoleon", # Support for NumPy and Google style docstrings
"sphinx.ext.todo", # Support for todo items
"sphinx.ext.viewcode", # Add links to highlighted source code
"sphinx_copybutton", # Add a "copy" button to code blocks
"sphinx-prompt", # prompt symbols will not be copy-pastable
"sphinx_codeautolink", # Automatically link example code to documentation source
]

# Config for autosectionlabel extension
autosectionlabel_prefix_document = True
autosectionlabel_maxdepth = 2

# Config for the napoleon extension
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = True
napoleon_use_admonition_for_examples = True
napoleon_use_admonition_for_notes = True
napoleon_use_admonition_for_references = True
napoleon_preprocess_types = True
napoleon_attr_annotations = True

# Configuration for sphinx.ext.todo
todo_include_todos = True

# Add any paths that contain templates here, relative to this directory.
# templates_path = ['_templates']

@@ -101,7 +116,7 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
language = "en"

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -111,8 +126,9 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
# The reST default role (used for this markup: `text`) to use for all
# documents.
default_role = "obj"

# -- Options for HTML output ----------------------------------------------

@@ -215,3 +231,17 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
"Miscellaneous",
),
]

# -- Intersphinx Configuration ----------------------------------------------

# Example configuration for intersphinx: refer to the Python standard library.
# Use in refs e.g.:
# :ref:`comparison manual <python:comparisons>`
intersphinx_mapping = {
"python": ("https://docs.python.org/3/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
"matplotlib": ("https://matplotlib.org/stable/", None),
"scipy": ("https://docs.scipy.org/doc/scipy/", None),
"cpymad": ("https://hibtc.github.io/cpymad/", None),
}
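As a sketch of what this configuration enables (the function below is hypothetical; Google-style docstring per the napoleon_numpy_docstring = False setting above): with the mapping in place, the pandas type reference in the docstring cross-links to the pandas documentation when the docs are built.

def normalize(df):
    """Return a normalized copy of the input.

    Args:
        df (pandas.DataFrame): input data; the type name resolves against
            the pandas entry of the intersphinx mapping at build time.

    Returns:
        pandas.DataFrame: the normalized copy.
    """
    return (df - df.mean()) / df.std()
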
3 changes: 1 addition & 2 deletions doc/modules/index.rst
@@ -1,5 +1,5 @@
TFS-Pandas Modules
**************************
==================

.. automodule:: tfs.collection
:members:
@@ -31,4 +31,3 @@ TFS-Pandas Modules

.. automodule:: tfs.writer
:members:

3 changes: 2 additions & 1 deletion setup.py
@@ -37,7 +37,7 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
EXTRA_DEPENDENCIES = {
"test": ["pytest>=5.2", "pytest-cov>=2.9", "cpymad>=1.8.1"],
"hdf5": ["h5py>=2.9.0", "tables>=3.6.0"],
"doc": ["sphinx", "sphinx_rtd_theme"],
"doc": ["sphinx", "sphinx_rtd_theme", "sphinx_copybutton", "sphinx-prompt", "sphinx_codeautolink"],
}
EXTRA_DEPENDENCIES.update({"all": [elem for list_ in EXTRA_DEPENDENCIES.values() for elem in list_]})
EXTRA_DEPENDENCIES["test"] += EXTRA_DEPENDENCIES["hdf5"]
@@ -66,6 +66,7 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Topic :: Scientific/Engineering",
"Topic :: Software Development :: Libraries :: Python Modules",
"Typing :: Typed",
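Usage note: given these extras definitions, the documentation toolchain installs via pip install tfs-pandas[doc], and tfs-pandas[all] pulls in every optional group, since the all key above is built by flattening EXTRA_DEPENDENCIES (package name per __title__ in tfs/__init__.py).
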
15 changes: 14 additions & 1 deletion tests/test_reader.py
@@ -6,8 +6,8 @@

import tfs
from tfs import read_tfs, write_tfs
from tfs.errors import TfsFormatError
from tfs.constants import HEADER
from tfs.errors import TfsFormatError

CURRENT_DIR = pathlib.Path(__file__).parent

@@ -35,6 +35,19 @@ def test_tfs_read_str_input(self, _tfs_file_str: str):
assert len(str(test_file)) > 0
assert isinstance(test_file.index[0], str)

def test_tfs_read_no_validation(self, _tfs_file_pathlib: pathlib.Path):
test_file = read_tfs(_tfs_file_pathlib, index="NAME", validate_after_reading=False)
assert len(test_file.headers) > 0
assert len(test_file.columns) > 0
assert len(test_file.index) > 0
assert len(str(test_file)) > 0
assert isinstance(test_file.index[0], str)

def test_tfs_read_no_validation_doesnt_warn(self, caplog):
nan_tfs_path = pathlib.Path(__file__).parent / "inputs" / "has_nans.tfs"
_ = read_tfs(nan_tfs_path, index="NAME", validate_after_reading=False)
assert "contains non-physical values at Index:" not in caplog.text

def tfs_indx_pathlib_input(self, _tfs_file_pathlib: pathlib.Path):
test_file = read_tfs(_tfs_file_pathlib)
assert test_file.indx["BPMYB.5L2.B1"] == test_file.set_index("NAME")["BPMYB.5L2.B1"]
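The no-validation tests above assert on captured log output; a minimal, self-contained sketch of that pytest pattern follows (the logger and message here are stand-ins, not the actual internals of the tfs reader):

import logging

def maybe_warn(validate: bool):
    # Stand-in for the reader's validation step, which logs a warning
    # when it encounters non-physical values such as NaNs.
    if validate:
        logging.getLogger("tfs_sketch").warning("contains non-physical values at Index: 3")

def test_skipping_validation_emits_no_warning(caplog):
    # The caplog fixture records everything logged during the test body.
    maybe_warn(validate=False)
    assert "contains non-physical values at Index:" not in caplog.text
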
42 changes: 28 additions & 14 deletions tests/test_writer.py
@@ -9,7 +9,8 @@
import pytest
from cpymad.madx import Madx
from pandas._testing import assert_dict_equal
from pandas.testing import assert_frame_equal, assert_index_equal, assert_series_equal
from pandas.testing import (assert_frame_equal, assert_index_equal,
assert_series_equal)

import tfs
from tfs import TfsDataFrame, read_tfs, write_tfs
@@ -100,6 +101,15 @@ def test_tfs_write_read(self, _tfs_dataframe, tmp_path):
assert_frame_equal(_tfs_dataframe, new, check_exact=False) # float precision can be an issue
assert_dict_equal(_tfs_dataframe.headers, new.headers, compare_keys=True)

def test_tfs_write_read_no_validate(self, _tfs_dataframe, tmp_path):
write_location = tmp_path / "test.tfs"
write_tfs(write_location, _tfs_dataframe, validate_before_writing=False)
assert write_location.is_file()

new = read_tfs(write_location, validate_after_reading=False)
assert_frame_equal(_tfs_dataframe, new, check_exact=False) # float precision can be an issue
assert_dict_equal(_tfs_dataframe.headers, new.headers, compare_keys=True)

def test_tfs_write_read_no_headers(self, _dataframe_empty_headers: TfsDataFrame, tmp_path):
write_location = tmp_path / "test.tfs"
write_tfs(write_location, _dataframe_empty_headers)
@@ -140,6 +150,10 @@ def test_tfs_write_read_autoindex(self, _tfs_dataframe, tmp_path):
assert_index_equal(df.index, df_read.index, check_exact=False)
assert_dict_equal(_tfs_dataframe.headers, df_read.headers, compare_keys=True)

def test_no_warning_on_non_unique_columns_if_no_validate(self, tmp_path, caplog):
df = TfsDataFrame(columns=["A", "B", "A"])
write_tfs(tmp_path / "temporary.tfs", df, validate_before_writing=False)
assert "Non-unique column names found" not in caplog.text

class TestFailures:
def test_raising_on_non_unique_columns(self, caplog):
@@ -231,19 +245,19 @@ def test_header_line_raises_on_non_strings(self):


class TestWarnings:
@pytest.mark.skipif(
sys.version_info >= (3, 7),
reason="Our workers on 3.7+ install pandas >= 1.3.0 which has fixed the .convert_dtypes() bug "
"we try...except in _autoset_pandas_types and test here",
)
def test_empty_df_warns_on_types_inference(self, caplog):
empty_df = pandas.DataFrame()
converted_df = tfs.writer._autoset_pandas_types(empty_df)
assert_frame_equal(converted_df, empty_df)

for record in caplog.records:
assert record.levelname == "WARNING"
assert "An empty dataframe was provided, no types were inferred" in caplog.text
# @pytest.mark.skipif(
# sys.version_info >= (3, 7),
# reason="Our workers on 3.7+ install pandas >= 1.3.0 which has fixed the .convert_dtypes() bug "
# "we try...except in _autoset_pandas_types and test here",
# )
# def test_empty_df_warns_on_types_inference(self, caplog):
# empty_df = pandas.DataFrame()
# converted_df = tfs.writer._autoset_pandas_types(empty_df)
# assert_frame_equal(converted_df, empty_df)

# for record in caplog.records:
# assert record.levelname == "WARNING"
# assert "An empty dataframe was provided, no types were inferred" in caplog.text

def test_warning_on_non_unique_columns(self, tmp_path, caplog):
df = TfsDataFrame(columns=["A", "B", "A"])
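For context on the block commented out above: its skipif reason states that pandas >= 1.3.0 fixed the .convert_dtypes() bug on empty dataframes that _autoset_pandas_types guards against, so on the project's 3.7+ workers the test could never run. A quick illustration of the now-benign call (the behavior claim is taken from that reason string, not verified here):

import pandas as pd

# On pandas >= 1.3.0 this returns an empty frame without raising,
# which left the guarded warning path without a way to trigger it.
converted = pd.DataFrame().convert_dtypes()
assert converted.empty
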
4 changes: 2 additions & 2 deletions tfs/__init__.py
@@ -3,14 +3,14 @@
"""
from tfs.errors import TfsFormatError
from tfs.frame import TfsDataFrame, concat
from tfs.hdf import read_hdf, write_hdf
from tfs.reader import read_tfs
from tfs.writer import write_tfs
from tfs.hdf import read_hdf, write_hdf

__title__ = "tfs-pandas"
__description__ = "Read and write tfs files."
__url__ = "https://github.com/pylhc/tfs"
__version__ = "3.2.1"
__version__ = "3.3.0"
__author__ = "pylhc"
__author_email__ = "[email protected]"
__license__ = "MIT"
54 changes: 29 additions & 25 deletions tfs/collection.py
@@ -1,6 +1,6 @@
"""
Collection
----------------------
----------

Advanced **TFS** files reading and writing functionality.
"""
@@ -48,47 +48,51 @@ class TfsCollection(metaclass=_MetaTfsCollection):

Classes inheriting from this abstract class will be able to define **TFS** files
as readable or writable, and read or write them just as attribute access or
assignments. All attributes will be read and written as ``TfsDataFrame`` objects.
assignments. All attributes will be read and written as `~tfs.TfsDataFrame` objects.

Example:
If **./example** is a directory that contains two **TFS** files **beta_phase_x.tfs**
and **beta_phase_y.tfs** with `BETX` and `BETY` columns respectively:

.. sourcecode:: python
.. code-block:: python

class ExampleCollection(TfsCollection)
# All TFS attributes must be marked with the Tfs(...) class, and generated attribute
# names will be appended with _x / _y depending on files found in "./example"
>>> # All TFS attributes must be marked with the Tfs(...) class,
... # and generated attribute names will be appended with _x / _y
... # depending on files found in "./example"
... class ExampleCollection(TfsCollection):
... beta = Tfs("beta_phase_{}.tfs") # A TFS attribute
... other_value = 7 # A traditional attribute.

beta = Tfs("beta_phase_{}.tfs") # A TFS attribute
other_value = 7 # A traditional attribute.
... def get_filename(template: str, plane: str) -> str:
... return template.format(plane)

def get_filename(template: str, plane: str) -> str:
return template.format(plane)
>>> example = ExampleCollection("./example")

example = ExampleCollection("./example")
>>> # Get the BETX / BETY column from "beta_phase_x.tfs":
>>> beta_x_column = example.beta_x.BETX # / example.beta_x.BETY

# Get the BETX / BETY column from "beta_phase_x.tfs":
beta_x_column = example.beta_x.BETX # / example.beta_x.BETY
>>> # Get the BETY column from "beta_phase_y.tfs":
>>> beta_y_column = example.beta_y.BETY

# Get the BETY column from "beta_phase_y.tfs":
beta_y_column = example.beta_y.BETY
>>> # The planes can also be accessed as items (both examples below work):
>>> beta_y_column = example.beta["y"].BETY
>>> beta_y_column = example.beta["Y"].BETY

# The planes can also be accessed as items (both examples below work):
beta_y_column = example.beta["y"].BETY
beta_y_column = example.beta["Y"].BETY
>>> # This will write an empty DataFrame to "beta_phase_y.tfs":
>>> example.allow_write = True
>>> example.beta["y"] = DataFrame()

# This will write an empty DataFrame to "beta_phase_y.tfs":
example.allow_write = True
example.beta["y"] = DataFrame()

If the file to be loaded is not defined for two planes then the attribute can be declared
and accessed as:

.. code-block:: python

If the file to be loaded is not defined for two planes then the attribute can be declared as:
``coupling = Tfs("getcouple.tfs", two_planes=False)`` and then accessed as
``f1001w_column = example.coupling.F1001W``.
>>> coupling = Tfs("getcouple.tfs", two_planes=False) # declaration
>>> f1001w_column = example.coupling.F1001W # access

No file will be loaded until the corresponding attribute is accessed and the loaded
``TfsDataFrame`` will be buffered, thus the user should expect an ``IOError`` if the requested
`~tfs.TfsDataFrame` will be buffered, thus the user should expect an ``IOError`` if the requested
file is not in the provided directory (only the first time, but is better to always take it
into account!).

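Since the interleaved old/new docstring lines above are hard to follow, here is the added doctest example reassembled in one piece (indentation reconstructed from the >>> lines shown; the merged docstring may differ in minor details):

>>> # All TFS attributes must be marked with the Tfs(...) class,
... # and generated attribute names will be appended with _x / _y
... # depending on files found in "./example"
... class ExampleCollection(TfsCollection):
...     beta = Tfs("beta_phase_{}.tfs")  # A TFS attribute
...     other_value = 7  # A traditional attribute.
...
...     def get_filename(template: str, plane: str) -> str:
...         return template.format(plane)
>>> example = ExampleCollection("./example")
>>> # Get the BETX / BETY column from "beta_phase_x.tfs":
>>> beta_x_column = example.beta_x.BETX  # / example.beta_x.BETY
>>> # The planes can also be accessed as items (both work):
>>> beta_y_column = example.beta["y"].BETY
>>> beta_y_column = example.beta["Y"].BETY
>>> # This will write an empty DataFrame to "beta_phase_y.tfs":
>>> example.allow_write = True
>>> example.beta["y"] = DataFrame()
>>> # Single-plane files are declared and accessed as:
>>> coupling = Tfs("getcouple.tfs", two_planes=False)
>>> f1001w_column = example.coupling.F1001W
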
2 changes: 1 addition & 1 deletion tfs/constants.py
@@ -1,6 +1,6 @@
"""
Constants
-------------------
---------

General constants used throughout ``tfs-pandas``, relating to the standard of **TFS** files.
"""
2 changes: 1 addition & 1 deletion tfs/errors.py
@@ -1,6 +1,6 @@
"""
Errors
-------------------
------

Errors that can be raised during the handling of **TFS** files.
"""
6 changes: 3 additions & 3 deletions tfs/frame.py
@@ -1,6 +1,6 @@
"""
Frame
-------------------
-----

Contains the class definition of a ``TfsDataFrame``, inherited from the ``pandas`` ``DataFrame``, as well
as a utility function to validate the correctness of a ``TfsDataFrame``.
@@ -9,7 +9,7 @@
from collections import OrderedDict
from contextlib import suppress
from functools import partial, reduce
from typing import Sequence, Union
from typing import Sequence, Set, Union

import numpy as np
import pandas as pd
@@ -260,7 +260,7 @@ def concat(
axes. Data manipulation is done by the ``pandas.concat`` function. Resulting headers are either
merged according to the provided **how_headers** method or as given via **new_headers**.

..warning::
.. warning::
Please note that when using this function on many ``TfsDataFrames``, leaving the contents of the
final headers dictionary to the automatic merger can become unpredictable. In this case it is
recommended to provide the **new_headers** argument to ensure the final result, or leave both
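Finally, a hedged sketch of the header-merging behavior this warning describes (argument names per the docstring above; the frames and header contents are invented):

import tfs

a = tfs.TfsDataFrame({"X": [1.0]}, headers={"TITLE": "a", "SEED": 1})
b = tfs.TfsDataFrame({"X": [2.0]}, headers={"TITLE": "b", "SEED": 2})

# Providing new_headers pins the result instead of leaving the final
# headers dictionary to the automatic merge of the individual headers.
merged = tfs.concat([a, b], new_headers={"TITLE": "merged"})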