Fix dt regression in empty() #898

martindurant · 2023-10-26T15:42:59Z

Fixes #897

martindurant · 2023-10-26T15:43:37Z

@jrbourbeau, I'll merge this when it passes, and that should be enough to make dask CI happy.

jrbourbeau · 2023-10-26T15:46:29Z

Thanks for fixing so quickly @martindurant!

Will there be a release out with this patch soon? We use releases in most CI builds (one build uses main for fastparquet). If not, I'll just add some skip logic.
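For reference, skip logic of that kind could be as small as a version comparison. This is only a rough sketch: `FIRST_FIXED` is a placeholder and not an actual fastparquet release number, and `parse_release` and `lacks_fix` are hypothetical helper names, not part of dask or fastparquet.

```python
import re

# Placeholder only: substitute whichever release actually ships the patch.
FIRST_FIXED = "9999.0.0"


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component (e.g. a git hash suffix).
            break
        parts.append(int(match.group()))
    return tuple(parts)


def lacks_fix(installed: str, first_fixed: str = FIRST_FIXED) -> bool:
    """True if the installed version predates the release carrying the fix."""
    return parse_release(installed) < parse_release(first_fixed)


# In a test module this could feed a marker, for example:
#   pytest.mark.skipif(lacks_fix(fastparquet.__version__), reason="needs patched fastparquet")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "2023.8.0" sorting after "2023.10.0".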

martindurant · 2023-10-26T15:47:40Z

> Will there be a release out with this patch soon

Yes, since the windows-py3.12 wheel failed to build in the last round anyway.

martindurant · 2023-10-26T17:32:21Z

@jrbourbeau, would you mind running your main-branch CI somewhere to see if the failures go away?

jrbourbeau · 2023-10-26T17:46:57Z

Locally I'm getting the same error:

____________________________________________________________________________________________________________________________________ test_timestamp96 _____________________________________________________________________________________________________________________________________

tmpdir = local('/private/var/folders/h0/_w6tz8jd3b9bk6w7d_xpg9t40000gn/T/pytest-of-james/pytest-21/test_timestamp960')

    @FASTPARQUET_MARK
    def test_timestamp96(tmpdir):
        fn = str(tmpdir)
        df = pd.DataFrame({"a": [pd.to_datetime("now", utc=True)]})
        ddf = dd.from_pandas(df, 1)
        ddf.to_parquet(fn, engine="fastparquet", write_index=False, times="int96")
        pf = fastparquet.ParquetFile(fn)
        assert pf._schema[1].type == fastparquet.parquet_thrift.Type.INT96
>       out = dd.read_parquet(fn, engine="fastparquet", index=False).compute()

dask/dataframe/io/tests/test_parquet.py:1883:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/base.py:342: in compute
    (result,) = compute(self, traverse=False, **kwargs)
dask/base.py:628: in compute
    results = schedule(dsk, keys, **kwargs)
dask/dataframe/io/parquet/core.py:96: in __call__
    return read_parquet_part(
dask/dataframe/io/parquet/core.py:654: in read_parquet_part
    dfs = [
dask/dataframe/io/parquet/core.py:655: in <listcomp>
    func(
dask/dataframe/io/parquet/fastparquet.py:1075: in read_partition
    return cls.pf_to_pandas(
dask/dataframe/io/parquet/fastparquet.py:1115: in pf_to_pandas
    df, views = pf.pre_allocate(size, columns, categories, index)
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/api.py:797: in pre_allocate
    df, arrs = _pre_allocate(size, columns, categories, index, cats,
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/api.py:1051: in _pre_allocate
    df, views = dataframe.empty(dtypes, size, cols=cols, index_names=index,
../../../mambaforge/envs/dask-py310/lib/python3.10/site-packages/fastparquet/dataframe.py:202: in empty
    values = type(bvalues)._from_sequence(values, copy=False, dtype=bvalues.dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

pandas/_libs/tslibs/tzconversion.pyx:187: ValueError

Note that the line changed in this PR looks similar to, but not exactly the same as, the line where the error is raised. Maybe both lines need the same sort of update.

martindurant · 2023-10-26T17:49:11Z

What's your pandas version?

jrbourbeau · 2023-10-26T17:49:46Z

In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '1.5.3'

martindurant · 2023-10-26T17:52:45Z

OK, then I think all the pandas versions I have locally and in the tests are too new... Hold on.

Fix dt regression in empty()

9ed4443

martindurant merged commit 89acf38 into dask:main Oct 26, 2023
21 checks passed

martindurant deleted the dt_again branch October 26, 2023 17:31
