ArcanaFramework · tclose · Aug 29, 2024 · Aug 27, 2024 · Aug 27, 2024 · Aug 28, 2024
diff --git a/README.rst b/README.rst
@@ -14,6 +14,8 @@ FileFormats
    :target: https://arcanaframework.github.io/fileformats/
    :alt: Documentation Status
 
+.. image:: ./docs/source/_static/images/logo_small.png
+    :alt: Logo
 
 *Fileformats* provides a library of file-format types implemented as Python classes.
 The file-format types were designed to be used in type validation and data movement
@@ -110,19 +112,31 @@ There are 2 main functions that can be used for format identification
 ``from_mime``
 ~~~~~~~~~~~~~
 
-As the name suggests, this function is used to return the FileFormats class corresponding to a given `MIME <https://www.iana.org/assignments/media-types/media-types.xhtml>`__ string. All non-vendor official MIME-types are supported. Non-official types can be loaded using the `application/x-name-of-type`
-form as long as the name of the type is unique amongst all installed format types. To avoid name clashes between different extension types, the "MIME-like" string can be used instead, where informal registries corresponding to the fileformats extension namespace are used instead, e.g. `medimage/nifti-gz` or `datascience/hdf5`.
+As the name suggests, this function returns the FileFormats class corresponding
+to a given `MIME <https://www.iana.org/assignments/media-types/media-types.xhtml>`__ string.
+All non-vendor official MIME-types are supported. Non-official types can be loaded using
+the ``application/x-name-of-type`` form as long as the name of the type is unique amongst
+all installed format types. To avoid name clashes between different extension types, a
+"MIME-like" string can be used instead, in which informal registries corresponding to the
+fileformats extension namespaces are used, e.g. ``medimage/nifti-gz`` or ``datascience/hdf5``.
 
 ``find_matching``
 ~~~~~~~~~~~~~~~~~
 
-Given a set of file-system paths, by default, ``find_matching`` will iterate through all installed fileformats classes and return all that validate successfully (formats without any specific constraints are excluded by default). The potential candidate classes can be restricted by using the `candidates` keyword argument.
+Given a set of file-system paths, ``find_matching`` iterates through all installed
+fileformats classes and returns all that validate successfully (formats without
+any specific constraints are excluded by default). The potential candidate classes can be
+restricted by using the ``candidates`` keyword argument.
 
 
 Format Conversion
 -----------------
 
-While not implemented in the main File-formats itself, file-formats provides hooks for other packages to implement extra behaviour such as format conversion. The `fileformats-extras <https://github.com/ArcanaFramework/fileformats-extras>`__ implements a number of converters between standard file-format types, e.g. archive types to/from generic file/directories, which if installed can be called using the `convert()` method.
+Although format conversion is not implemented in the main *Fileformats* package itself,
+the package provides hooks for other packages to implement such extra behaviour.
+The `fileformats-extras <https://github.com/ArcanaFramework/fileformats-extras>`__
+package implements a number of converters between standard file-format types, e.g. archive
+types to/from generic files/directories, which, if installed, can be called using the ``convert()`` method.
 
 .. code-block:: python
 

diff --git a/docs/logo_dev/logo.webp b/docs/logo_dev/logo.webp
diff --git a/docs/logo_dev/snake-around-folder.webp b/docs/logo_dev/snake-around-folder.webp
diff --git a/docs/logo_dev/snake-transparent.psd b/docs/logo_dev/snake-transparent.psd
diff --git a/docs/logo_dev/snake-transparent.webp b/docs/logo_dev/snake-transparent.webp
diff --git a/docs/logo_dev/snake-trimmed-mod.psd b/docs/logo_dev/snake-trimmed-mod.psd
diff --git a/docs/logo_dev/snake-trimmed.psd b/docs/logo_dev/snake-trimmed.psd
diff --git a/docs/source/_static/images/logo.png b/docs/source/_static/images/logo.png
diff --git a/docs/source/_static/images/logo_small.png b/docs/source/_static/images/logo_small.png
diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -0,0 +1,67 @@
+Public API
+==========
+
+Functions
+~~~~~~~~~
+
+.. autofunction:: fileformats.core.to_mime
+
+.. autofunction:: fileformats.core.from_mime
+
+.. autofunction:: fileformats.core.find_matching
+
+.. autofunction:: fileformats.core.from_paths
+
+
+Core
+~~~~
+
+.. autoclass:: fileformats.core.FileSet
+    :members: mime_type, mime_like, from_mime, strext, unconstrained, possible_exts, metadata, select_metadata, select_by_ext, matching_exts, convert, get_converter, register_converter, all_formats, standard_formats, hash, hash_files, mock, sample, decomposed_fspaths, from_paths, copy, move
+
+.. autoclass:: fileformats.core.Field
+    :members: mime_like, from_mime, to_primitive, from_primitive
+
+
+Generic
+~~~~~~~
+
+.. autoclass:: fileformats.generic.FsObject
+
+.. autoclass:: fileformats.generic.File
+
+.. autoclass:: fileformats.generic.Directory
+
+.. autoclass:: fileformats.generic.DirectoryOf
+
+.. autoclass:: fileformats.generic.SetOf
+
+
+Field
+~~~~~
+
+.. autoclass:: fileformats.field.Text
+
+.. autoclass:: fileformats.field.Integer
+
+.. autoclass:: fileformats.field.Decimal
+
+.. autoclass:: fileformats.field.Boolean
+
+.. autoclass:: fileformats.field.Array
+
+
+Mixins
+~~~~~~
+
+.. autoclass:: fileformats.core.mixin.WithMagicNumber
+
+.. autoclass:: fileformats.core.mixin.WithMagicVersion
+
+.. autoclass:: fileformats.core.mixin.WithAdjacentFiles
+
+.. autoclass:: fileformats.core.mixin.WithSeparateHeader
+
+.. autoclass:: fileformats.core.mixin.WithSideCars
+
+.. autoclass:: fileformats.core.mixin.WithClassifiers
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -160,7 +160,7 @@
 
 # The name of an image file (relative to this directory) to place at the top
 # of the sidebar.
-# html_logo = "_static/images/logo_small.png"
+html_logo = "_static/images/logo_small.png"
 
 # The name of an image file (within the static path) to use as favicon of the
 # docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32

diff --git a/docs/source/detection.rst b/docs/source/detection.rst
@@ -0,0 +1,106 @@
+
+Detection
+=========
+
+*FileFormats* has been designed to detect whether a set of files matches a given
+format specification. This can be used either to validate file types in workflows
+or to identify the format in which user input files have been provided.
+
+Validation
+----------
+
+In the basic case, *FileFormats* can be used for checking the format of files and
+directories against known types. Typically, there are two layers of checks. The first
+is performed on the file-system paths alone (e.g. the file extension):
+
+.. code-block:: python
+
+    from fileformats.image import Jpeg
+
+    jpeg_file = Jpeg("/path/to/image.jpg")  # PASSES
+    jpeg_file = Jpeg("/path/to/image.png")  # FAILS!
+
+
+The second layer of checks typically requires reading the file and peeking at its
+contents for magic numbers and the like:
+
+.. code-block:: python
+
+    fspath = "/path/to/fake-image.jpg"
+
+    with open(fspath, "w") as f:
+        f.write("this is not a valid JPEG file")
+
+    jpeg_file = Jpeg(fspath)  # FAILS!
+
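+As a rough illustration of what such a content-level check involves, the sketch below
+reads the first bytes of a file and compares them with the JPEG start-of-image marker.
+It uses only the standard library and a hypothetical ``has_magic_number`` helper; it is
+not the *FileFormats* implementation.
+
+.. code-block:: python
+
+    import tempfile
+    from pathlib import Path
+
+    JPEG_MAGIC = b"\xff\xd8\xff"  # bytes that open every JPEG file
+
+    def has_magic_number(fspath: Path, magic: bytes, offset: int = 0) -> bool:
+        """Return True if the bytes found at ``offset`` equal ``magic``."""
+        with open(fspath, "rb") as f:
+            f.seek(offset)
+            return f.read(len(magic)) == magic
+
+    with tempfile.TemporaryDirectory() as tmp:
+        fake = Path(tmp) / "fake-image.jpg"
+        fake.write_text("this is not a valid JPEG file")
+        print(has_magic_number(fake, JPEG_MAGIC))  # False: the file is plain text
+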
+
+Directories are classified by the contents of the files within them, via the
+``content_types`` class attribute, e.g.
+
+.. code-block:: python
+
+    from fileformats.core.mixin import WithMagicNumber
+    from fileformats.generic import File, Directory
+
+    class Dicom(WithMagicNumber, File):
+        magic_number = b"DICM"
+        magic_number_offset = 128
+
+    class DicomDir(Directory):
+        content_types = (Dicom,)
+
+Note that only one file within the directory needs to match the specified content type
+for the directory to be considered a match; additional files are ignored. For example,
+the ``DicomDir`` type would be considered valid on the following directory structure
+despite the presence of the ``.DS_Store`` directory and the ``catalog.xml`` file.
+
+.. code-block::
+
+    dicom-directory
+    ├── .DS_Store
+    │   ├── deleted-file1.txt
+    │   ├── deleted-file2.txt
+    │   └── ...
+    ├── 1.dcm
+    ├── 2.dcm
+    ├── 3.dcm
+    ├── ...
+    ├── 1024.dcm
+    └── catalog.xml
+
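+The matching rule can be pictured as an "at least one file matches" test. The sketch
+below is a standalone approximation using only the standard library; the ``is_dicom``
+and ``looks_like_dicom_dir`` helpers are hypothetical and are not part of *FileFormats*.
+
+.. code-block:: python
+
+    from pathlib import Path
+
+    def is_dicom(fspath: Path) -> bool:
+        """Check for the b"DICM" magic number at byte offset 128."""
+        with open(fspath, "rb") as f:
+            f.seek(128)
+            return f.read(4) == b"DICM"
+
+    def looks_like_dicom_dir(dirpath: Path) -> bool:
+        """True if at least one file directly inside ``dirpath`` is a DICOM file."""
+        # Sub-directories and non-matching files are simply skipped
+        return any(is_dicom(p) for p in dirpath.iterdir() if p.is_file())
+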
+In addition to statically defining ``Directory`` formats such as the ``DicomDir`` example
+above, dynamic directory types can be created on the fly by subscripting ``DirectoryOf``
+with the content types,
+e.g.
+
+.. code-block:: python
+
+    from fileformats.generic import DirectoryOf
+    from fileformats.image import Png
+    from fileformats.text import Csv
+
+    def my_task(image_dir: DirectoryOf[Png]) -> Csv:
+        ...  # task implementation
+
+.. _Pydra: https://pydra.readthedocs.io
+.. _Fastr: https://gitlab.com/radiology/infrastructure/fastr
+
+
+Identification
+--------------
+
+The ``find_matching`` function can be used to list the formats that match a given file:
+
+.. code-block::
+
+    >>> from fileformats.core import find_matching
+    >>> find_matching("/path/to/word.doc")
+    [<class 'fileformats.application.Msword'>]
+
+.. warning::
+   The installation of extension packages may cause detection code to break if one of
+   the newly added formats also matches the file and your code doesn't handle this case.
+   If you are only interested in formats covered in the main fileformats package then
+   you should use the ``standard_only`` flag.
+
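+For example, assuming ``standard_only`` and ``candidates`` are accepted as keyword
+arguments by ``find_matching`` (check the API reference for the exact signature), the
+search can be narrowed like so:
+
+.. code-block:: python
+
+    from fileformats.core import find_matching
+    from fileformats.application import Msword
+
+    # Only consider formats defined in the main fileformats package
+    matches = find_matching("/path/to/word.doc", standard_only=True)
+
+    # Or restrict the search to an explicit list of candidate classes
+    matches = find_matching("/path/to/word.doc", candidates=[Msword])
+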
+Alter
diff --git a/docs/source/developer.rst b/docs/source/developer.rst
@@ -260,8 +260,8 @@ files and another one for little endian files. Therefore we can't just use the
 ``fileformats.core.mark.check``.
 
 
-Converters
-----------
+Implementing converters
+-----------------------
 
 Converters between two equivalent formats are defined using Pydra_ dataflow engine
 `tasks <https://pydra.readthedocs.io/en/latest/components.html>`_. There are two types
@@ -409,7 +409,7 @@ a warning if the import fails, when get_converter is called on a format in that
 namespace.
 
 
-.. note::
+.. warning::
     If the converters aren't imported successfully, then you will receive a
     ``FormatConversionError`` error saying there are no converters between FormatA and
     FormatB.

diff --git a/docs/source/extras.rst b/docs/source/extras.rst
@@ -0,0 +1,93 @@
+
+Read, write and convert
+=======================
+
+In addition to the basic features of validation and path handling, it is possible to
+implement methods to interact with the data of file format objects via "extras hooks".
+Such features are added to selected format classes on a needs basis (pull requests
+welcome 😊, see :ref:`Developer Guide`), so are by no means comprehensive, and
+are provided "as-is".
+
+Since these features typically rely on a range of external libraries, they are kept in
+separate *extras* packages (e.g.
+`fileformats-extras <https://pypi.org/project/fileformats-extras/>`__,
+`fileformats-medimage-extras <https://pypi.org/project/fileformats-medimage-extras/>`__),
+which need to be installed separately.
+
+
+Metadata
+--------
+
+If an extras overload has been registered for the ``read_metadata`` method,
+then metadata associated with the fileset can be accessed via the ``metadata`` property,
+e.g.
+
+.. code-block:: python
+
+    >>> dicom.metadata["SeriesDescription"]
+    "localizer"
+
+Formats that use the ``WithSeparateHeader`` or ``WithSideCars`` mixin classes will attempt
+to read the side car, if a metadata reader is implemented for it (e.g. JSON), and merge
+that with any header information read from the primary file.
+
+
+Reading and writing
+-------------------
+
+Several classes in the base fileformats package implement ``load`` and ``save`` methods.
+An advantage of implementing them in the format class is that objects instantiated from
+them can then be duck-typed in calling functions/methods. For example, both ``Yaml`` and
+``Json`` formats (both inherit from the ``DataSerialization`` type) implement the
+``load`` method, which returns a dictionary:
+
+.. code-block:: python
+
+    from fileformats.application import DataSerialization  # i.e. JSON or YAML
+
+    def read_serialisation(serialized: DataSerialization) -> dict:
+        return serialized.load()
+
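+For instance, assuming ``Json`` and ``Yaml`` are importable from
+``fileformats.application`` and that the (placeholder) paths below point to existing
+files, the ``read_serialisation`` function above could be called with either format:
+
+.. code-block:: python
+
+    from fileformats.application import Json, Yaml
+
+    # Both calls go through the same function because each class provides ``load``
+    config = read_serialisation(Json("/path/to/config.json"))
+    settings = read_serialisation(Yaml("/path/to/settings.yaml"))
+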
+
+Converters
+----------
+
+Several conversion methods are available between equivalent file-formats in the standard
+classes. For example, archive types such as ``Zip`` can be converted to and from generic
+files/directories using the ``convert`` classmethod of the target format:
+
+.. code-block:: python
+
+    from fileformats.application import Zip
+    from fileformats.generic import Directory
+
+    # Example round trip from directory to zip file
+    zip_file = Zip.convert(Directory("/path/to/a/directory"))
+    extracted = Directory.convert(zip_file)
+
+The converters are implemented in the Pydra_ dataflow framework, and can be linked into
+wider Pydra_ workflows by accessing the underlying converter task with the ``get_converter``
+classmethod:
+
+.. code-block:: python
+
+    import pydra
+    from pydra.tasks.mypackage import MyTask
+    from fileformats.image import Gif, Png
+
+    wf = pydra.Workflow(name="a_workflow", input_spec=["in_gif"])
+    wf.add(
+        Png.get_converter(Gif, name="gif2png", in_file=wf.lzin.in_gif)
+    )
+    wf.add(
+        MyTask(
+            name="my_task",
+            in_file=wf.gif2png.lzout.out_file,
+        )
+    )
+    ...
+
+
+
+.. _Pydra: https://pydra.readthedocs.io
+.. _Analyze: https://en.wikipedia.org/wiki/Analyze_(imaging_software)