This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Draft: Calculate max/min for DECIMAL columns #108

Open
wants to merge 1 commit into
base: master


paultiplady
Contributor

Problem

Targets must know the width of columns to represent them
correctly. For example, BigQuery distinguishes DECIMAL and
BIGDECIMAL types, and a SQL target such as target-mysql or
target-postgres would need to reconstruct the actual
precision/scale values to create a valid schema.

Proposed changes

To encode the width of a DECIMAL column in JSON Schema, use Python's
decimal module to calculate the maximum/minimum values of a decimal
with the specified precision.
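
As a rough illustration of the calculation, the sketch below derives both bounds from a column's precision and scale. The function name `decimal_bounds` and its arguments are hypothetical and not taken from this PR, and it assumes 0 <= scale <= precision.

```python
from decimal import Context, Decimal, localcontext
from typing import Tuple


def decimal_bounds(precision: int, scale: int) -> Tuple[Decimal, Decimal]:
    """Return (minimum, maximum) for a DECIMAL(precision, scale) column.

    Hypothetical helper for illustration only.
    """
    if precision <= 0 or not 0 <= scale <= precision:
        raise ValueError("expected precision > 0 and 0 <= scale <= precision")

    before_dp = precision - scale  # digits to the left of the decimal point

    with localcontext(Context(prec=precision)):
        # next_minus() returns the largest value below 10**before_dp that is
        # representable with `precision` significant digits.
        maximum = Decimal("1E+" + str(before_dp)).next_minus()

    return -maximum, maximum


print(decimal_bounds(10, 5))
```

For DECIMAL(10, 5) this gives bounds of -99999.99999 and 99999.99999. Converting the result to `float` can lose digits for wide columns: for DECIMAL(38, 9) the maximum converts to `1e29`, so a target may not be able to recover the exact precision from a float bound alone.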

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Since this changes the output schema, I think it's probably a breaking change. Some targets might not be able to handle a modification to the schema.

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@paultiplady changed the title from "Calculate max/min for DECIMAL columns" to "Draft: Calculate max/min for DECIMAL columns" on Apr 22, 2022
@paultiplady
Contributor Author

This isn't complete, but I wanted to share this WIP and get your thoughts on whether it's an approach you'd be happy to upstream.

with localcontext(Context(Emax=before_dp, prec=column.numeric_precision)):
    largest_decimal = Decimal('1.0E+' + str(before_dp)).next_minus()

result.maximum = float(largest_decimal)
Contributor Author


I think this should probably be configurable: the old behavior stays the default, and the max/min calculation is only enabled if you opt in via a schema_includes_decimal_max_min config option.

Without a config flag, it'll be backwards-incompatible and will break existing target-schemas.
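
A minimal sketch of how the opt-in could look, assuming the tap's config is available as a plain dict. The function `decimal_schema` and the base schema shape are placeholders for illustration, not the tap's actual code; only the `schema_includes_decimal_max_min` key comes from the comment above.

```python
from decimal import Context, Decimal, localcontext


def decimal_schema(precision: int, scale: int, config: dict) -> dict:
    """Build a JSON Schema fragment for a DECIMAL column (hypothetical helper)."""
    # Placeholder for whatever schema the tap already emits today.
    schema = {"type": ["null", "number"]}

    # Default is False, so existing target schemas are unchanged unless
    # the user opts in.
    if config.get("schema_includes_decimal_max_min", False):
        before_dp = precision - scale
        with localcontext(Context(prec=precision)):
            largest = Decimal("1E+" + str(before_dp)).next_minus()
        schema["maximum"] = float(largest)
        schema["minimum"] = -float(largest)

    return schema


print(decimal_schema(10, 5, {}))
print(decimal_schema(10, 5, {"schema_includes_decimal_max_min": True}))
```

With the flag absent or false, the emitted schema is the same as before, so only users who opt in see the new `maximum`/`minimum` keys.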
