Merge pull request #15 from HenningTimm/line_numbers
Line numbers
HenningTimm authored Mar 18, 2024
2 parents 88ec2a3 + f9faca5 commit b6d508f
Showing 17 changed files with 331 additions and 36 deletions.
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,39 @@
# Changelog

## Version 0.6.0 (2024-03-18)

- Added a CHANGELOG file.
- Switched YAML input to use the ruamel.yaml round trip parser.
  This parser is also "safe", i.e. it does not allow code execution.
- Added line numbers and columns to error messages where possible.
  For YAML files, these are taken from the ruamel.yaml round trip parser.
  For TSV files, the column names within the specified block are reported.
- Added `--strict` flag to the `convert` sub-command that prevents converting files that produced warnings.
- The rules `keys_valid` and `keys_unique` now correctly report their name in output.
- Added integration tests checking tsv files.
- Improved user feedback for parameters dealing with rule names, e.g. `--skip`.


## Version 0.5.0 (2024-01-10)

- Improved documentation and user feedback.
- Added option to check multiple files. You can now specify multiple input file paths for the check sub-command and those file paths can be glob patterns.
- Added suggestions for fixes to some lints.

Please note that the convert sub-command currently only works with one input file.


## Version 0.4.0 (2023-12-21)

- Added option to skip lints or reduce their severity to warning.
- Added first suggestions for fixes, e.g. minor typo detection with string matching.
- Added RULES.md as human-readable documentation of the lints in use.


## Version 0.3.0 (2023-12-05)

This release adds:

- Lints to detect trailing spaces
- Internal preparations to support multiple files
- Minor documentation improvements
3 changes: 2 additions & 1 deletion README.md
@@ -1,9 +1,10 @@
# yml2block

The yml2block script converts a YAML description of a dataverse-compliant
The yml2block script converts a YAML description of a Dataverse-compliant
metadata schema into a Dataverse metadata block TSV file.
Additionally, it can lint both YAML and TSV metadata block files for common errors.

For a list of releases and the features added in each, please refer to the [changelog](CHANGELOG.md).

## Requirements and Installation

4 changes: 2 additions & 2 deletions pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "yml2block"
version = "0.5.0"
version = "0.6.0"
description = "Converts yaml files describing dataverse metadata blocks into tsv files understood by dataverse."
license = "MIT"
authors = [
@@ -13,7 +13,7 @@ authors = [
repository = "https://github.com/HenningTimm/yml2block"
readme = "README.md"
classifiers = [
"Development Status :: 2 - Pre-Alpha",
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: MIT License",
28 changes: 24 additions & 4 deletions tests/integration_tests.py
@@ -22,7 +22,15 @@ def test_duplicate_names_detected():
"""This test ensures that duplicate names are detected."""
runner = CliRunner()
result = runner.invoke(
yml2block.__main__.main, ["check", "tests/invalid/duplicate_name.yml"]
yml2block.__main__.main,
["check", "tests/invalid/duplicate_datasetfield_name.yml"],
)
assert result.exit_code == 1, result.output

runner = CliRunner()
result = runner.invoke(
yml2block.__main__.main,
["check", "tests/invalid/duplicate_datasetfield_name.tsv"],
)
assert result.exit_code == 1, result.output

@@ -37,11 +45,17 @@ def test_duplicate_top_level_key_detected():
assert result.exit_code == 1, result.output


def test_typo_in_key_detected():
"""This test ensures that typos in keys are detected."""
def test_typo_in_keyword_detected():
"""This test ensures that typos in top-level keywords are detected."""
runner = CliRunner()
result = runner.invoke(
yml2block.__main__.main, ["check", "tests/invalid/typo_in_keyword.yml"]
)
assert result.exit_code == 1, result.output

runner = CliRunner()
result = runner.invoke(
yml2block.__main__.main, ["check", "tests/invalid/typo_in_key.yml"]
yml2block.__main__.main, ["check", "tests/invalid/typo_in_keyword.tsv"]
)
assert result.exit_code == 1, result.output

@@ -54,6 +68,12 @@ def test_trailing_whitespace_detected():
)
assert result.exit_code == 1, result.output

runner = CliRunner()
result = runner.invoke(
yml2block.__main__.main, ["check", "tests/invalid/whitespace_in_key.tsv"]
)
assert result.exit_code == 1, result.output


def test_wrong_extensions_fail():
"""Ensure that files that do not end in tsv, csv, yml or yaml fail."""
10 changes: 10 additions & 0 deletions tests/invalid/duplicate_datasetfield_name.tsv
@@ -0,0 +1,10 @@
#metadataBlock name dataverseAlias displayName
ValidExample Valid
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
Duplicate Duplicate key 1 This field describes duplicate 1. textbox TRUE FALSE FALSE FALSE TRUE TRUE ValidExample
Duplicate Duplicate key 2 This field describes duplicate 2. textbox TRUE FALSE FALSE FALSE TRUE TRUE ValidExample
Answer Answer text TRUE TRUE TRUE TRUE TRUE TRUE ValidExample
#controlledVocabulary DatasetField Value identifier displayOrder
AnswerYes Yes answer_positive
AnswerNo No answer_negative
AnswerMaybeSo Maybe answer_unclear
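
The fixture above repeats the name `Duplicate` inside the `#datasetField` block. The following is a minimal sketch of how such a duplicate could be found and reported with line numbers. It is not the check that yml2block implements: the function name `find_duplicate_names` is hypothetical, and the sketch assumes the usual Dataverse TSV layout in which header rows start with a `#` keyword and data rows leave the first cell empty.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_names(tsv_text, block="#datasetField"):
    """Return {name: [line numbers]} for names used more than once in `block`.

    Hypothetical helper for illustration; not part of yml2block.
    """
    seen = defaultdict(list)
    in_block = False
    name_index = None
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue
        if row[0].startswith("#"):
            # A header row opens a new block; remember where "name" lives.
            in_block = row[0] == block
            name_index = row.index("name") if in_block and "name" in row else None
            continue
        if in_block and name_index is not None and len(row) > name_index:
            seen[row[name_index]].append(line_number)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


example = (
    "#metadataBlock\tname\tdataverseAlias\tdisplayName\n"
    "\tValidExample\t\tValid\n"
    "#datasetField\tname\ttitle\n"
    "\tDuplicate\tDuplicate key 1\n"
    "\tDuplicate\tDuplicate key 2\n"
    "\tAnswer\tAnswer\n"
)
print(find_duplicate_names(example))  # {'Duplicate': [4, 5]}
```

Collecting line numbers per name, rather than stopping at the first repeat, lets an error message point at every offending row.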
File renamed without changes.
9 changes: 9 additions & 0 deletions tests/invalid/typo_in_keyword.tsv
@@ -0,0 +1,9 @@
#metadataBlock name dataverseAlias displayName
InvalidExample Invalid
#datasetFields name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
Description Description This field describes. textbox TRUE FALSE FALSE FALSE TRUE TRUE InvalidExample
Answer Answer text TRUE TRUE TRUE TRUE TRUE TRUE InvalidExample
#controlledVocabulary DatasetField Value identifier displayOrder
AnswerYes Yes answer_positive
AnswerNo No answer_negative
AnswerMaybeSo Maybe answer_unclear
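
In this fixture the block keyword is misspelled as `#datasetFields`. The changelog mentions typo detection with string matching; one possible way to do this with the standard library is sketched below. Whether yml2block uses `difflib` is an assumption, and `suggest_keyword`, `KNOWN_KEYWORDS`, and the cutoff of 0.8 are hypothetical choices made for this illustration.

```python
import difflib

# The three block keywords that appear in the fixtures of this commit.
KNOWN_KEYWORDS = ["metadataBlock", "datasetField", "controlledVocabulary"]


def suggest_keyword(keyword, known=KNOWN_KEYWORDS, cutoff=0.8):
    """Return the closest known keyword, or None if nothing is similar enough.

    Hypothetical helper for illustration; not part of yml2block.
    """
    if keyword in known:
        return keyword
    matches = difflib.get_close_matches(keyword, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_keyword("datasetFields"))  # datasetField
print(suggest_keyword("somethingElse"))  # None
```

A fairly high cutoff keeps the suggestion conservative: a near miss yields a fix suggestion, while an unrelated keyword yields none.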
File renamed without changes.
9 changes: 9 additions & 0 deletions tests/invalid/whitespace_in_key.tsv
@@ -0,0 +1,9 @@
#metadataBlock name dataverseAlias displayName
InvalidExample Invalid
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id
Description Description This field describes. textbox TRUE FALSE FALSE FALSE TRUE TRUE InvalidExample
Answer Answer text TRUE TRUE TRUE TRUE TRUE TRUE InvalidExample
#controlledVocabulary DatasetField Value identifier displayOrder
AnswerYes Yes answer_positive
AnswerNo No answer_negative
AnswerMaybeSo Maybe answer_unclear
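
The changelog states that for TSV files the column names within the affected block are reported. A rough sketch of what that could look like for a whitespace lint follows. The function `find_padded_cells` and its return shape are assumptions for illustration and do not reflect the actual yml2block rule.

```python
def find_padded_cells(tsv_text):
    """Yield (line number, block, column name, cell) for cells with stray whitespace.

    Hypothetical helper for illustration; not part of yml2block.
    """
    header = []
    for line_number, line in enumerate(tsv_text.splitlines(), start=1):
        cells = line.split("\t")
        if cells[0].startswith("#"):
            # Header rows define the column names for the rows that follow.
            header = cells
        for index, cell in enumerate(cells):
            if cell != cell.strip():
                if index < len(header):
                    column = header[index].strip()
                else:
                    column = f"column {index}"
                yield line_number, header[0] if header else "", column, cell


example = (
    "#datasetField\tname\ttitle \n"
    "\tDescription\tDescription\n"
    "\tAnswer \tAnswer\n"
)
for hit in find_padded_cells(example):
    print(hit)
```

Reporting the column name next to the line number saves the user from counting tab stops in a wide TSV row.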
1 change: 1 addition & 0 deletions yml2block/__init__.py
@@ -5,3 +5,4 @@
from . import tsv_input
from . import yaml_input
from . import suggestions
from . import datatypes
41 changes: 39 additions & 2 deletions yml2block/__main__.py
@@ -57,6 +57,23 @@ def __iter__(self):
        for filename, violations in self.violations.items():
            yield (filename, violations, min(violations, key=lambda x: x.level).level)

    def safe_conversion_possible(self, file_path, strict=False):
        """Check if the file can be safely converted to tsv."""
        try:
            max_severity = min(self.violations[file_path], key=lambda x: x.level).level
            if max_severity == Level.ERROR:
                return False
            elif max_severity == Level.WARNING:
                if strict:
                    return False
                else:
                    return True
            else:
                return True
        except KeyError:
            print(f"The file {file_path} is not present in this list of files.")
            raise


def guess_input_type(input_path):
"""Guess the input type from the file name."""
@@ -150,6 +167,12 @@ def check(file_paths, warn, skip, warn_ec, verbose):

# Unpack all file paths as glob patterns
file_paths = [path for fp in file_paths for path in glob.glob(fp)]

# Return early with an error, if no files are found
if not file_paths:
print("No files found at path. No check was performed.")
sys.exit(1)

if verbose:
print(f"Checking the following files: {file_paths}\n")

@@ -186,11 +209,16 @@ def check(file_paths, warn, skip, warn_ec, verbose):
@click.option(
"--warn-ec", default=0, help="Error code used for lint warnings. Default: 0"
)
@click.option(
"--strict",
is_flag=True,
help="Fail the conversion if warnings are present. Default: false",
)
@click.option("--verbose", "-v", count=True, help="Print performed checks to stdout.")
@click.option(
"--outfile", "-o", nargs=1, help="Path to where the output file will be written."
)
def convert(file_path, warn, skip, warn_ec, verbose, outfile):
def convert(file_path, warn, skip, warn_ec, strict, verbose, outfile):
"""Convert a YML metadata block into a TSV metadata block.
Reads in the provided Dataverse Metadata Block in YML format and converts it into
@@ -207,6 +235,11 @@ def convert(file_path, warn, skip, warn_ec, verbose, outfile):
path, _ext = os.path.splitext(file_path)
outfile = f"{path}.tsv"

# Return early with an error, if no files are found
if not file_path:
print("No file found at path. Nothing to convert.")
sys.exit(1)

if verbose:
print(f"Checking input file: {file_path}\n\n")

@@ -233,8 +266,12 @@ def convert(file_path, warn, skip, warn_ec, verbose, outfile):

lint_violations.extend_for(file_path, file_lint_violations)

if input_type == "yaml" and file_path not in lint_violations:
if input_type == "yaml" and lint_violations.safe_conversion_possible(
file_path, strict
):
output.write_metadata_block(data, outfile, longest_row, verbose)
else:
print("Errors detected. No TSV file was written.")

return_violations(lint_violations, warn_ec, verbose)
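
The `--strict` decision introduced in this file can be summarised in a small standalone sketch. It assumes, as the use of `min` in `safe_conversion_possible` suggests, that lower `Level` values are more severe. The enum values below are hypothetical, and unlike the method in the diff this sketch takes a plain list of levels and treats an empty list as safe to convert.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumption: lower values are more severe, so min() finds the worst one.
    ERROR = 0
    WARNING = 1


def safe_conversion_possible(levels, strict=False):
    """Decide whether a file with the given violation levels may be converted.

    Simplified stand-in for the method in the diff, for illustration only.
    """
    if not levels:
        return True  # nothing was reported for this file
    worst = min(levels)
    if worst == Level.ERROR:
        return False
    if worst == Level.WARNING:
        return not strict
    return True


print(safe_conversion_possible([Level.WARNING]))               # True
print(safe_conversion_possible([Level.WARNING], strict=True))  # False
print(safe_conversion_possible([Level.WARNING, Level.ERROR]))  # False
```

Handling the empty case explicitly avoids calling `min` on an empty sequence, which raises `ValueError`.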

76 changes: 76 additions & 0 deletions yml2block/datatypes.py
@@ -0,0 +1,76 @@
"""This module contains custom datatypes used to create the internal YAML-like
structure used for metadata block lints. Most importantly, these types track
line and column numbers of input data to enable more helpful user feedback.
"""

# from ruamel.yaml.scalarstring import LiteralScalarString, FoldedScalarString, DoubleQuotedScalarString, SingleQuotedScalarString, PlainScalarString
# from ruamel.yaml.scalarint import ScalarInt
# from ruamel.yaml.scalarbool import ScalarBoolean

# TODO: Implement a custom constructor for ruamel.yaml
# to get line and column numbers for leaves in the YAML file
# https://stackoverflow.com/a/45717104


class MDBlockList(list):
    __slots__ = ("line", "column")

    def __init__(self, iterable=None, line=None, column=None):
        """Create a new MDBlock list that behaves like a normal list
        but tracks line and column number in object slots.
        """
        self.line = line
        self.column = column

        # Delegate initialization to the list constructor
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)

    @classmethod
    def from_ruamel(cls, ruamel_list):
        """Create an MDBlock list from an existing list read by the
        ruamel.yaml round trip parser.
        """
        return cls(ruamel_list, line=ruamel_list.lc.line, column=ruamel_list.lc.col)


class MDBlockDict(dict):
    __slots__ = ("line", "column")

    def __init__(self, mapping=None, /, line=None, column=None, **kwargs):
        """Create a new MDBlock dictionary that behaves like a normal dictionary
        but tracks line and column number in object slots.
        """
        self.line = line
        self.column = column

        # Work on a copy so that the caller's mapping is not mutated
        mapping = dict() if mapping is None else dict(mapping)

        # Remain compatible with Python's regular dict constructor by
        # allowing kwargs to extend the passed mapping
        mapping.update(kwargs)

        super().__init__(mapping)

    @classmethod
    def from_ruamel(cls, ruamel_dict):
        """Create an MDBlock dict from an existing dict read by the
        ruamel.yaml round trip parser.
        """
        return cls(ruamel_dict, line=ruamel_dict.lc.line, column=ruamel_dict.lc.col)


class MDBlockNode:
    __slots__ = ("line", "column", "value")

    def __init__(self, value, line=None, column=None):
        self.line = line
        self.column = column
        self.value = value

    def __repr__(self):
        return f"({self.line}, {self.column}) {self.value}"
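
The pattern used by these classes, a built-in container subclass that carries its source position in slots, can be tried without ruamel.yaml installed. The sketch below is an assumption-laden stand-in: `PositionedList` mirrors the idea of `MDBlockList`, `from_parsed` plays the role of `from_ruamel`, and `FakeParsedList` only imitates the `.lc.line` and `.lc.col` attributes that ruamel.yaml's round trip containers expose.

```python
from types import SimpleNamespace


class PositionedList(list):
    """Hypothetical stand-in mirroring MDBlockList: a list that knows its origin."""

    __slots__ = ("line", "column")

    def __init__(self, iterable=(), line=None, column=None):
        super().__init__(iterable)
        self.line = line
        self.column = column

    @classmethod
    def from_parsed(cls, parsed):
        # Copy the position from the parser's container onto the new list.
        return cls(parsed, line=parsed.lc.line, column=parsed.lc.col)


class FakeParsedList(list):
    """Test double imitating a list returned by a round trip parser."""

    lc = SimpleNamespace(line=4, col=2)


fields = PositionedList.from_parsed(FakeParsedList(["name", "title"]))
print(fields, fields.line, fields.column)  # ['name', 'title'] 4 2
```

Because the result is still a real `list`, existing lint code that iterates or indexes keeps working, while error messages can additionally read `line` and `column`.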
3 changes: 2 additions & 1 deletion yml2block/output.py
@@ -17,10 +17,11 @@ def write_metadata_block(yml_metadata, output_path, longest_line, verbose):
for block_line in content:
new_line = [""]

for key, value in block_line.items():
for key, entry in block_line.items():
if key not in block_headers:
block_headers.append(key)

value = entry.value
# TODO: Consider screening for True, False, None
# before and replace them.
if value is True: