Using text generator resulting in error #299

Open
anoopnarang opened this issue Sep 5, 2024 · 1 comment

anoopnarang commented Sep 5, 2024

Expected Behavior

Should work without error

Current Behavior

Getting the following error:

  File "./dependencies.zip/dbldatagen/text_generators.py", line 881, in pandasGenerateText
    results = self.generateText(rows, rows.size)
  File "./dependencies.zip/dbldatagen/text_generators.py", line 768, in generateText
    para_stats = np.clip(para_stats_raw, self._minValues, self._maxValues, out=stats_array)
  File "/usr/local/lib64/python3.9/site-packages/numpy/_core/fromnumeric.py", line 2247, in clip
    return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
  File "/usr/local/lib64/python3.9/site-packages/numpy/_core/fromnumeric.py", line 66, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/usr/local/lib64/python3.9/site-packages/numpy/_core/fromnumeric.py", line 46, in _wrapit
    result = getattr(arr, method)(*args, **kwds)
  File "/usr/local/lib64/python3.9/site-packages/numpy/_core/_methods.py", line 108, in _clip
    return um.clip(a, min, max, out=out, **kwargs)
numpy._core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'clip' output from dtype('float64') to dtype('uint8') with casting rule 'same_kind'
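
The last frame of the traceback can be reproduced outside dbldatagen. The following is a minimal sketch, not dbldatagen's code: the array values and the helper name `clip_to_uint8` are made up for illustration. It assumes the failure comes from clipping float64 values into a uint8 `out` array, as the error message suggests. The `numpy/_core` paths in the traceback point to NumPy 2.x, where, as far as I understand the release notes, `np.clip` no longer falls back to unsafe casting. Passing `casting="same_kind"` explicitly shows the same failure on older versions too.

```python
import numpy as np


def clip_to_uint8(raw, lo, hi):
    """Clip float values to [lo, hi] and return them as uint8 (hypothetical helper)."""
    # Clip in float64 first, then convert explicitly; astype truncates toward zero.
    return np.clip(raw, lo, hi).astype(np.uint8)


raw = np.array([0.4, 2.7, 9.1])              # float64 input values (made up)
out = np.empty(raw.shape, dtype=np.uint8)    # integer output buffer

try:
    # float64 -> uint8 is not a "same_kind" cast, so this raises.
    np.clip(raw, 1, 4, out=out, casting="same_kind")
    same_kind_failed = False
except TypeError as exc:
    same_kind_failed = True
    print(type(exc).__name__, exc)

# Opting in to the lossy cast, or converting after clipping, both succeed.
np.clip(raw, 1, 4, out=out, casting="unsafe")
print(out.tolist())                          # [1, 2, 4]
print(clip_to_uint8(raw, 1, 4).tolist())     # [1, 2, 4]
```

Either variant silently truncates fractional values, so rounding before the conversion may be preferable depending on what the statistics are used for.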

Steps to Reproduce (for bugs)

Install dbldatagen using pip install dbldatagen

Generate a custom dataset with a text generator column

 .withColumn("essay", text=dg.ILText(paragraphs=(1, 4), sentences=(2, 6)), random=True)

Context

Trying to create a regular dataset with a text column throws this error. Other column types work fine.
I think AWS EMR Serverless uses a newer version of NumPy by default, which is not compatible with dbldatagen.
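
One way to check this hypothesis is to log the NumPy version inside the job. A small sketch follows. `numpy_major` is a hypothetical helper, and treating major version 2 as the threshold is an assumption based on the `numpy/_core` paths in the traceback, not something confirmed by the dbldatagen maintainers.

```python
import numpy as np


def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '2.0.1'."""
    return int(version.split(".")[0])


major = numpy_major(np.__version__)
print(f"NumPy {np.__version__} (major version {major})")
if major >= 2:
    # Assumption: the clip casting error only appears with NumPy 2.x.
    print("NumPy 2.x detected; pinning numpy<2 in the environment may be worth trying.")
```

If the job reports a 2.x version, installing an older NumPy alongside dbldatagen is a possible workaround to try until the library itself is updated.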

Your Environment

  • dbldatagen version used: 0.4.0
  • Databricks Runtime version: AWS EMR Serverless
  • Cloud environment used: AWS

ronanstokes-db (Contributor) commented Sep 19, 2024

Is this on a Databricks runtime environment? If so, the versions of NumPy and pandas used are determined by the Databricks runtime.

Which version of the Databricks runtime was being used?
