Merge pull request #201 from khaeru/enh/read-csv
Read SDMX-CSV 2.0.0
khaeru authored Oct 23, 2024
2 parents 70cf79c + f70790d commit 830692b
Showing 17 changed files with 492 additions and 21 deletions.
1 change: 1 addition & 0 deletions doc/api/model-v21-list.rst
@@ -8,6 +8,7 @@
:obj:`~.v21.ContentConstraint`
:obj:`~.v21.DataKey`
:obj:`~.v21.DataKeySet`
:obj:`~.v21.DataSet`
:obj:`~.v21.DataSetTarget`
:obj:`~.v21.DataStructureDefinition`
:obj:`~.v21.DataflowDefinition`
1 change: 1 addition & 0 deletions doc/api/model-v30-list.rst
@@ -8,6 +8,7 @@
:obj:`~.v30.DataConstraint`
:obj:`~.v30.DataKey`
:obj:`~.v30.DataKeySet`
:obj:`~.v30.DataSet`
:obj:`~.v30.DataStructureDefinition`
:obj:`~.v30.Dataflow`
:obj:`~.v30.DataflowRelationship`
31 changes: 27 additions & 4 deletions doc/api/reader.rst
@@ -43,15 +43,38 @@ SDMX-JSON
:members:
:undoc-members:


.. currentmodule:: sdmx.reader.csv

SDMX-CSV
=========

.. currentmodule:: sdmx.reader.csv
:mod:`sdmx.reader.csv` supports SDMX-CSV 2.0.0, corresponding to SDMX 3.0.0.
See :ref:`sdmx-csv` for differences between versions of the SDMX-CSV file format.

.. autoclass:: sdmx.reader.csv.Reader
:members:
:undoc-members:
Implementation details:

- :meth:`.Reader.inspect_header` inspects the header line in the CSV input and constructs a set of :class:`~.csv.Handler` instances, one for each field appearing in the particular file.
Some of these handlers do not actually process the contents of the field, but silently discard it; for example, when ``labels="name"``, the name fields are not processed.
- :meth:`.Reader.handle_row` is applied to every record in the CSV input.
Each Handler is applied to its respective field.
Every :meth:`.handle_row` call constructs a single :class:`~.v30.Observation`.
- :meth:`.Reader.read_message` assembles the resulting observations into one or more :class:`DataSets <.common.BaseDataSet>`.
SDMX-CSV 2.0.0 allows a mix of codes such as "I" (:attr:`.ActionType.information`) and "D" (:attr:`.ActionType.delete`) in the "ACTION" field for each observation in the same file, whereas the SDMX IM specifies that :attr:`~.BaseDataSet.action` is an attribute of an entire DataSet.
:class:`~.csv.Reader` groups all observations into one or more DataSet instances, according to their respective "ACTION" field values.

Currently :mod:`.reader.csv` has the following limitations:

- :meth:`.Reader.read_message` generates SDMX 3.0.0 (:mod:`.model.v30`) artefacts such as :class:`.v30.DataSet`, since these correspond to the supported SDMX-CSV 2.0.0 format.
Generating SDMX 2.1 artefacts such as :class:`.v21.DataSet` is not currently supported.
- Currently only a single :class:`.v30.Dataflow` or :class:`.v30.DataStructureDefinition` can be supplied to :meth:`.Reader.read_message`.
The SDMX-CSV 2.0.0 format supports mixing data flows and data structures in the same message.
Such messages can be read with :mod:`sdmx`, but the resulting data sets will only correspond to the given data flow.

.. automodule:: sdmx.reader.csv
:members:
:undoc-members:
:show-inheritance:
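
The grouping of observations by "ACTION" value described above can be illustrated with a minimal, standard-library sketch. It does not use :mod:`sdmx` and is not the implementation of :class:`~.csv.Reader`; the function name ``group_by_action`` and the sample column names other than ``STRUCTURE`` and ``ACTION`` are assumptions chosen for illustration only.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: the column names after ACTION and all values are
# hypothetical, not taken from a real data flow.
SAMPLE = """STRUCTURE,STRUCTURE_ID,ACTION,FREQ,TIME_PERIOD,OBS_VALUE
dataflow,AGENCY:DF(1.0),I,A,2020,1.0
dataflow,AGENCY:DF(1.0),D,A,2021,
dataflow,AGENCY:DF(1.0),I,A,2022,3.0
"""


def group_by_action(text):
    """Group CSV records by the value of their ACTION field.

    Each group stands in for one data set that shares a single action.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Remove the per-record action; it becomes the key of the group
        action = row.pop("ACTION")
        groups[action].append(row)
    return dict(groups)


groups = group_by_action(SAMPLE)
for action, rows in groups.items():
    print(action, len(rows))
```

Each key of ``groups`` plays the role of the action of one data set, and each list holds the records that would become observations in that data set.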

Reader API
==========
23 changes: 17 additions & 6 deletions doc/implementation.rst
@@ -300,21 +300,32 @@ The SDMX-JSON *format* is versioned differently from the overall SDMX *standard*
SDMX-CSV
--------

Reference: https://github.com/sdmx-twg/sdmx-csv
Reference: https://github.com/sdmx-twg/sdmx-csv; see in particular the file `sdmx-csv-field-guide.md <https://github.com/sdmx-twg/sdmx-csv/blob/v2.0.0/data-message/docs/sdmx-csv-field-guide.md>`_.

Based on Comma-Separated Value (CSV).
The SDMX-CSV *format* is versioned differently from the overall SDMX *standard*:

- `SDMX-CSV 1.0 <https://github.com/sdmx-twg/sdmx-csv/tree/v1.0>`__ corresponds to SDMX 2.1.
It supports only data, not structures or metadata.
- SDMX-CSV 2.0 corresponds to SDMX 3.0.
SDMX-CSV 1.0 files are recognizable by the header ``DATAFLOW`` in the first column of the first row.

.. versionadded:: 2.9.0
.. versionadded:: 2.9.0

Support for SDMX-CSV 1.0.
Support for *writing* SDMX-CSV 1.0.
See :mod:`.writer.csv`.

:mod:`sdmx` does not currently support *writing* SDMX-CSV.
See :issue:`34`.
:mod:`sdmx` does not currently support *reading* SDMX-CSV 1.0.

- `SDMX-CSV 2.0.0 <https://github.com/sdmx-twg/sdmx-csv/tree/v2.0.0>`_ corresponds to SDMX 3.0.0.
The format differs from and is not backwards compatible with SDMX-CSV 1.0.
SDMX-CSV 2.0.0 files are recognizable by the header ``STRUCTURE`` in the first column of the first row.

.. versionadded:: 2.19.0

Initial support for *reading* SDMX-CSV 2.0.0.
See :mod:`.reader.csv`.

:mod:`sdmx` does not currently support *writing* SDMX-CSV 2.0.0.
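
The header-based distinction between the two formats can be sketched as follows. This is a minimal illustration using only the standard library; the function name ``guess_sdmx_csv_version`` is hypothetical and is not part of :mod:`sdmx`.

```python
import csv
import io


def guess_sdmx_csv_version(text):
    """Return "1.0", "2.0.0", or None based on the first header field."""
    # Read only the first row; an empty input yields an empty header
    header = next(csv.reader(io.StringIO(text)), [])
    first = header[0] if header else ""
    # DATAFLOW marks SDMX-CSV 1.0; STRUCTURE marks SDMX-CSV 2.0.0
    return {"DATAFLOW": "1.0", "STRUCTURE": "2.0.0"}.get(first)


print(guess_sdmx_csv_version("DATAFLOW,FREQ,TIME_PERIOD,OBS_VALUE\n"))
print(guess_sdmx_csv_version("STRUCTURE,STRUCTURE_ID,ACTION\n"))
```

Any other first field, or empty input, gives ``None``, so a caller can fall back to another reader.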

.. _sdmx-rest:
.. _web-service:
2 changes: 2 additions & 0 deletions doc/whatsnew.rst
@@ -6,6 +6,8 @@ What's new?
Next release
============

- :mod:`.reader.csv` supports reading :ref:`SDMX-CSV 2.0.0 <sdmx-csv>` (corresponding to SDMX 3.0.0) (:pull:`201`, :issue:`34`).
See the implementation notes for information about the differences between the SDMX-CSV 1.0 and 2.0.0 formats and their support in :mod:`sdmx`.
- Bug fix for writing :class:`.VersionableArtefact` to SDMX-ML 2.1: :class:`KeyError` was raised if :attr:`.VersionableArtefact.version` was an instance of :class:`.Version` (:pull:`198`).
- Bug fix for reading data from structure-specific SDMX-ML: :class:`.XMLParseError` / :class:`NotImplementedError` was raised if reading 2 messages in sequence with different XML namespaces defined (:pull:`200`, thanks :gh-user:`mephinet` for :issue:`199`).

1 change: 1 addition & 0 deletions pyproject.toml
@@ -95,6 +95,7 @@ select = ["C9", "E", "F", "I", "W"]
ignore = ["E501", "W191"]
# Exceptions:
# - .client._handle_get_kwargs: 12
# - .reader.csv.Reader.inspect_header: 12
# - .reader.xml.v21._component_end: 12
# - .testing.generate_endpoint_tests: 11
# - .writer.pandas._maybe_convert_datetime: 23
24 changes: 22 additions & 2 deletions sdmx/model/common.py
@@ -463,7 +463,27 @@ def __contains__(self, name): ...

# §3.4: Data Types


#: Per the standard…
#:
#: ..
#:
#: …used to specify the action that a receiving system should take when processing
#: the content that is the object of the action:
#:
#: Append
#: Data or metadata is an incremental update for an existing data/metadata set or
#: the provision of new data or documentation (attribute values) formerly absent.
#: If any of the supplied data or metadata is already present, it will not replace
#: that data or metadata.
#: Replace
#: Data/metadata is to be replaced and may also include additional data/metadata
#: to be appended.
#: Delete
#: Data/Metadata is to be deleted.
#: Information
#: Data and metadata are for information purposes.
#:
#: — SDMX 3.0.0 Section 2 §3.4.2.1
ActionType = Enum("ActionType", "delete replace append information")

ConstraintRoleType = Enum("ConstraintRoleType", "allowable actual")
Expand Down Expand Up @@ -1990,7 +2010,7 @@ def compare(self, other, strict=True):
class BaseDataSet(AnnotableArtefact):
"""Common features of SDMX 2.1 and 3.0 DataSet."""

#:
#: Action to be performed
action: Optional[ActionType] = None
#:
valid_from: Optional[str] = None
1 change: 1 addition & 0 deletions sdmx/model/v21.py
@@ -54,6 +54,7 @@
"DataStructureDefinition",
"DataflowDefinition",
"Observation",
"DataSet",
"StructureSpecificDataSet",
"GenericDataSet",
"GenericTimeSeriesDataSet",
1 change: 1 addition & 0 deletions sdmx/model/v30.py
@@ -43,6 +43,7 @@
"DataStructureDefinition",
"Dataflow",
"Observation",
"DataSet",
"StructureSpecificDataSet",
"MetadataAttributeDescriptor",
"IdentifiableObjectSelection",
4 changes: 2 additions & 2 deletions sdmx/reader/__init__.py
@@ -1,9 +1,9 @@
from pathlib import Path

from . import json, xml
from . import csv, json, xml

#: Reader classes
READERS = [json.Reader, xml.Reader]
READERS = [csv.Reader, json.Reader, xml.Reader]


def _readers():