PARSynthesizer is not learning rounding scheme for numerical columns #2274

npatki · 2024-10-31T23:46:33Z

Environment Details

SDV version: 1.17.1

Error Description

First observed in #2241: If I have a numerical, sequential column with a particular rounding scheme, I would expect every SDV synthesizer to learn that scheme and produce synthetic data that is rounded the same way. PARSynthesizer does not do this.

Steps to reproduce

In the example below, the numerical column col_A is always rounded to 2 decimal places. Observe that the synthetic data does not follow that scheme.

import pandas as pd
import numpy as np

from sdv.metadata import Metadata
from sdv.sequential import PARSynthesizer

data = pd.DataFrame(data={
    'id': ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c'],
    'col_A': [5000.23, 4500.23, 4300.45, 2300.11, 3212.31, np.nan, 3456.34, 7890.12, 8201.00, 9810.12]
})

metadata = Metadata.load_from_dict({
    'tables': {
        'table': {
            'sequence_key': 'id',
            'columns': {
                'id': { 'sdtype': 'id' },
                'col_A': { 'sdtype': 'numerical'}
            }
        },
    }
})

synthesizer = PARSynthesizer(metadata, epochs=1)
synthesizer.fit(data)
synthesizer.sample(num_sequences=2)
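
To make the mismatch easier to see, the number of decimal places in a column can be counted directly. The helper below is only a sketch and not part of SDV: `max_decimal_places` is a hypothetical name, and it assumes that the shortest round-trip string of each float reflects the intended rounding. It skips NaN and infinite values.

```python
import math
from decimal import Decimal


def max_decimal_places(values):
    """Return the largest number of decimal digits among finite values.

    Hypothetical helper for illustration; it is not an SDV function.
    """
    places = 0
    for value in values:
        value = float(value)
        if not math.isfinite(value):
            # Skip missing (NaN) and infinite entries.
            continue
        # repr() gives the shortest string that round-trips the float,
        # so the Decimal exponent tells us how many decimals it carries.
        exponent = Decimal(repr(value)).as_tuple().exponent
        places = max(places, -exponent)
    return places


# The col_A values from the reproduction above.
real_values = [5000.23, 4500.23, 4300.45, 2300.11, 3212.31,
               float('nan'), 3456.34, 7890.12, 8201.00, 9810.12]
print(max_decimal_places(real_values))
```

Passing `data['col_A']` and the sampled `col_A` column to the same helper allows a side-by-side comparison: the real column reports 2, and a larger number for the synthetic column would indicate that the rounding scheme was not learned.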

Additional Context

Other synthesizers, such as GaussianCopulaSynthesizer, do learn the rounding scheme and produce synthetic data that is formatted correctly.

from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)
synthesizer.sample(num_rows=5)

npatki added the bug and data:sequential labels Oct 31, 2024

npatki mentioned this issue Nov 1, 2024

PARSynthesizer samples uniformly distributed time series data #2241

Closed

frances-h mentioned this issue Nov 12, 2024

PARSynthesizer is not learning rounding scheme for numerical columns #2289

Merged

frances-h closed this as completed in #2289 Nov 13, 2024

frances-h added this to the 1.17.2 milestone Nov 13, 2024

amontanez24 assigned frances-h Nov 15, 2024
