This repository has been archived by the owner on May 30, 2022. It is now read-only.

CHANGELOG.md

Changelog

0.2.4

  • BUG FIX: multipleOf validation
    • FIX LINK
    • Due to floating point errors in Python and JSONSchema, multipleOf validation has been failing.
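
The failure mode described above can be reproduced with plain floats. The following is a minimal sketch, not the actual fix: `naive_is_multiple_of` and `is_multiple_of` are hypothetical helper names, and converting through `Decimal(str(...))` is one common workaround assumed here for illustration.

```python
from decimal import Decimal


def naive_is_multiple_of(value, multiple_of):
    """Float-based check; subject to binary floating point rounding."""
    return value % multiple_of == 0


def is_multiple_of(value, multiple_of):
    """Decimal-based check (hypothetical helper, not target-postgres' code).

    Going through str() uses the short decimal repr of each number, so
    0.3 and 0.1 are compared as written rather than as binary approximations.
    """
    return Decimal(str(value)) % Decimal(str(multiple_of)) == 0


print(naive_is_multiple_of(0.3, 0.1))  # False, although 0.3 is 3 * 0.1 on paper
print(is_multiple_of(0.3, 0.1))        # True
```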

0.2.3

  • FEATURES:
    • JSONSchema: anyOf Support
      • Streamed JSONSchemas which include anyOf combinations should now be fully supported
      • This allows for full support of Stitch/Singer's DateTime string fallbacks.
    • JSONSchema: allOf Support
      • Streamed JSONSchemas which include allOf combinations should now be fully supported
      • Columns are persisted as normal.
      • This is perceived to be most useful for merging objects, and putting in place things like maxLength etc.
  • BUG FIX: Buffer Flushing at frequent intervals/with small batches
    • FIX LINK
    • Buffer size calculations relied upon some "sophisticated" logic for determining the "size" in memory of a Python object
    • The method used by Singer libraries is to simply use the size of the streamed JSON blob
    • Performance Improvement seen due to batches now being far larger and interactions with the remote being far fewer.
  • BUG FIX: NULLABLE not being implied when field is missing from streamed JSONSchema
    • FIX LINK
    • If a field was persisted in remote, but then left out of a subsequent streamed JSONSchema, we would fail
    • In this instance, the field is implied to be NULL, but additionally, if values are present for it in the streamed data, we should persist it.
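
The buffer-size fix described in this release can be sketched as follows. This is an illustrative sketch only: `RecordBuffer`, `max_bytes`, and `add` are hypothetical names, not target-postgres' actual classes. It sizes each record by the length of its streamed JSON line instead of inspecting the Python object in memory.

```python
import json


class RecordBuffer:
    """Hypothetical buffer that flushes once the buffered JSON reaches max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.records = []
        self.size = 0

    def add(self, raw_line):
        """Buffer one streamed line; return the batch to flush, or None."""
        self.records.append(json.loads(raw_line))
        # Size is the byte length of the streamed JSON text itself.
        self.size += len(raw_line.encode("utf-8"))
        if self.size < self.max_bytes:
            return None
        batch, self.records, self.size = self.records, [], 0
        return batch


buffer = RecordBuffer(max_bytes=40)
first = buffer.add('{"id": 1, "name": "a"}')   # 22 bytes, below the limit
second = buffer.add('{"id": 2, "name": "b"}')  # 44 bytes total, triggers a flush
```

Measuring the raw line keeps the size calculation cheap and predictable, which is what allows batches to grow to the configured limit.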

0.2.2

  • FEATURES:
    • Performance improvement for upserting data
      • Saw long running queries for some SELECT COUNT(1)... queries
        • Resulting in full table scans
      • These queries are only being used for is_table_empty, therefore we can use a more efficient SELECT EXISTS(...) query which only needs a single row to be fetched
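
The difference between the two queries can be sketched as below. To keep the example self-contained it runs against an in-memory SQLite database from Python's standard library rather than PostgreSQL; the table name `orders` and the helper bodies are assumptions for illustration, not target-postgres' actual code.

```python
import sqlite3


def is_table_empty_count(conn, table):
    """Older approach: counts every row, which can force a full table scan."""
    # The table name is interpolated directly, so it must come from trusted code.
    (count,) = conn.execute('SELECT COUNT(1) FROM "{}"'.format(table)).fetchone()
    return count == 0


def is_table_empty_exists(conn, table):
    """Newer approach: EXISTS can stop as soon as a single row is found."""
    (exists,) = conn.execute(
        'SELECT EXISTS (SELECT 1 FROM "{}")'.format(table)
    ).fetchone()
    return not exists


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "orders" (id INTEGER)')
empty_before = is_table_empty_exists(conn, "orders")
conn.execute('INSERT INTO "orders" VALUES (1)')
empty_after = is_table_empty_exists(conn, "orders")
```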

0.2.1

  • FEATURES:
    • Performance improvement for upserting data
      • For large or even reasonably sized tables, trying to upsert the data was prohibitively slow
      • To mitigate this, we now add indexes to speed up upserts
      • This change can be opted out of via the add_upsert_indexes config option
      • NOTE: This only affects installations post 0.2.1, and will not upgrade/migrate existing installations
    • Support for latest PostgreSQL 12.0
      • PostgreSQL recently released 12.0, and we now have testing around it and can confirm that target-postgres should function correctly for it!
  • BUG FIX: STATE messages being sent at the wrong time
    • FIX LINK
    • STATE messages were being output incorrectly for feeds which had many streams outputting at varying rates
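
As a sketch of opting out of the upsert indexes introduced in this release, the option can be set in the JSON config passed to the target. Only `add_upsert_indexes` is taken from the entry above; the `build_config` helper and the assumption that the option is enabled by default (implied by "opted out") are illustrative.

```python
import json


def build_config(base_config, add_upsert_indexes=True):
    """Return a copy of base_config with the add_upsert_indexes option set."""
    config = dict(base_config)
    config["add_upsert_indexes"] = add_upsert_indexes
    return config


# The empty dict stands in for your existing connection settings.
config = build_config({}, add_upsert_indexes=False)
config_text = json.dumps(config, indent=2)
print(config_text)
```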

0.2.0

  • NOTE: The minor version bump is not expected to have much effect on folks. This was done to signal the output change from the below bug fix. It is our impression that not many are using this feature yet anyway. Since this was not a patch change, we decided to make this a minor instead of a major change to raise less concern. Thank you for your patience!
  • FEATURES:
  • BUG FIX: No STATE Message Wrapper necessary
    • FIX LINK
    • STATE messages are formatted as {"value": ...}
    • target-postgres emitted the full message
    • The official singer-target-template doesn't write out that value "wrapper", and just writes the JSON blob contained in it
    • This fix makes target-postgres do the same
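
The behaviour after this fix can be sketched as follows. `emit_state` is a hypothetical helper name and the bookmark payload is made-up sample data; the sketch only shows writing the blob under `value` instead of the whole STATE message.

```python
import json


def emit_state(raw_line):
    """Return the text a target should write for a STATE message line."""
    message = json.loads(raw_line)
    if message.get("type") != "STATE":
        return None
    # Write only the blob under "value", not the surrounding message.
    return json.dumps(message["value"])


line = '{"type": "STATE", "value": {"bookmarks": {"users": 42}}}'
print(emit_state(line))  # {"bookmarks": {"users": 42}}
```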

0.1.11

  • BUG FIX: canonicalize_identifier Not called on all identifiers persisted to remote
    • FIX LINK
    • Presently, on column splits/name collisions, we add a suffix to an identifier
    • Previously, we did not canonicalize these suffixes
    • While this was not an issue for any targets currently in production, it was an issue for some up and coming targets.
    • This fix simply makes sure to call canonicalize_identifier before persisting an identifier to remote

0.1.10

  • FEATURES:
    • Root Table Name Canonicalization
      • The stream name is used for the value of the root table name in Postgres
      • Stream names are controlled exclusively by the tap and do not have to meet many standards
      • Previously, only stream names which were lowercase, alphanumeric, etc. were supported
      • Now, the target can canonicalize the root table name, allowing for the input stream name to be whatever the tap provides.
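
To illustrate what canonicalizing a root table name can look like, here is a minimal sketch. `canonicalize_table_name` and its exact rules (lowercasing, replacing other characters with underscores, prefixing names that do not start with a letter or underscore) are assumptions for illustration and may differ from what target-postgres actually does; the 63-character cap reflects PostgreSQL's default identifier length limit.

```python
import re


def canonicalize_table_name(stream_name, max_length=63):
    """Hypothetical canonicalization; the real rules may differ."""
    # Lowercase, then replace anything outside a-z, 0-9, and _ with an underscore.
    name = re.sub(r"[^a-z0-9_]", "_", stream_name.lower())
    # Ensure the name starts with a letter or underscore.
    if not re.match(r"[a-z_]", name):
        name = "_" + name
    return name[:max_length]


print(canonicalize_table_name("My Stream-Name"))  # my_stream_name
print(canonicalize_table_name("2019 Orders"))     # _2019_orders
```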

0.1.9

  • Singer-Python: bumped to latest 5.6.1
  • Psycopg2: bumped to latest 2.8.2
  • FEATURES:
  • BUG FIX: ACTIVATE_VERSION Messages did not flush buffer
    • FIX LINK
    • Previously, when we issued an activate version record, we did not flush the buffer after writing the batch. This resulted in more records being written to remote than needed.
    • This results in no functionality change, and should not alleviate any known bugs.
    • This should be purely performance related.

0.1.8

  • Singer-Python: bumped to latest
  • Minor housekeeping:
    • Updated container versions to latest
    • Updated README to reflect new versions of PostgreSQL Server

0.1.7

  • BUG FIX: A bug was identified for de-nesting.
    • ISSUE LINK
    • FAILING TESTS LINK
    • FIX LINK
    • Subtables with subtables did not serialize column names correctly
      • The column names ended up having the table names (paths) prepended on them
      • Due to the denested table schema and denested records being different no information showed up in remote.
      • This bug was ultimately tracked down to the core denesting logic.
    • This will fix failing uploads which had nullable columns in subtables but no data was seen populating those columns.
      • The broken schema columns will still remain
    • Failing schemas which had non-null columns in subtables will still be broken
      • To fix will require dropping the associated tables, potentially resetting the entire db/schema

0.1.6

  • BUG FIX: A bug was identified for path to column serialization.
    • LINK
    • Nullable properties which had multiple JSONSchema types
      • i.e., something like [null, string, integer ...]
      • failed to find an appropriate column in remote to persist None values to.
    • Found by usage of the Hubspot Tap

0.1.5

0.1.4

  • BUG FIX: A bug was identified in 0.1.3 with stream key_properties and canonicalization.
    • LINK
    • Discovered and fixed by @mirelagrigoras
    • If the key_properties for a stream changed due to canonicalization, the stream would fail to persist due to:
      • the persist_csv_rows key_properties values would remain un-canonicalized and therefore cause issues once serialized into a SQL statement
      • the pre-checks for tables would break because no values could be pulled from the schema with un-canonicalized fields pulled out of the key_properties
    • NOTE: the key_properties metadata is saved with raw field names.

0.1.3

  • SCHEMA_VERSION: 1
    • LINK
    • Initialized a new field, schema_version, in remote table schemas
    • A migration in PostgresTarget handles updating this
  • BUG FIX: A bug was identified in 0.1.2 with column type splitting.
    • LINK
    • A schema with a field of type string is persisted to remote
      • Later, the same field is of type date-time
        • The values for this field will not be placed under a new column, but rather under the original string column
    • A schema with a field of type date-time is persisted to remote
      • Later, the same field is of type string
        • The original date-time column will be made nullable
        • The values for this field will fail to persist
  • FEATURES:
    • Added the logging_level config option which uses standard Python Logger Levels to configure more details about what Target-Postgres is doing
      • Query level logging and timing
      • Table schema changes logging and timing
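
As a sketch of how a `logging_level` value maps onto standard Python logger levels: `configure_logging`, the logger name, and the `INFO` fallback are assumptions for illustration, not target-postgres' actual code.

```python
import logging


def configure_logging(config, logger_name="target_postgres_example"):
    """Apply config['logging_level'] to a logger (hypothetical helper)."""
    logger = logging.getLogger(logger_name)
    # setLevel accepts standard level names such as "DEBUG" or "INFO"
    # and raises ValueError for unknown names.
    logger.setLevel(config.get("logging_level", "INFO"))
    return logger


logger = configure_logging({"logging_level": "DEBUG"})
logger.debug("query timing and schema changes would be logged at this level")
```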