Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The data_quality__eligibility_death_flag check causes a comparison error when run on PostgreSQL #553

Open
BuzzCutNorman opened this issue Jul 25, 2024 · 4 comments · Fixed by #640
Labels
bug Something isn't working community Label for issues created by community members

Comments

@BuzzCutNorman
Copy link
Contributor


Describe the bug - Required
The data_quality__eligibility_death_flag check causes a comparison error when run on PostgreSQL. PostgreSQL will only allow boolean to be compared with data types of boolean and str. Here is a link that explains in more detail.

https://www.postgresql.org/docs/current/datatype-boolean.html

I am also assuming that in the input layer eligibility definition that boolean mean the boolean type for the datawarehouse platform. Looking at the query for the death flag check I could have assumed incorrectly, and I should be converting the boolean columns to int of only 1 or 0.


Environment - Required

  • Tuva project package version (e.g. 0.6.0): 0.10.2
  • dbt version (e.g. 1.7): 1.8.4
  • dbt type (e.g. dbt cloud or dbt CLI): dbt CLI
  • Data warehouse (e.g. Snowflake): PostgreSQL

To Reproduce
Steps to reproduce the behavior:
I executed dbt run with claims_enabled: true set in the dbt_project.yml:


Expected behavior
The data quality checks would run without encountering any PostgreSQL operator errors..


Screenshots
If applicable, add screenshots to help explain your problem.


Additional context

�[0m13:45:20.781423 [debug] [Thread-4 (]: Began running node model.the_tuva_project.data_quality__eligibility_death_flag
�[0m13:45:20.781423 [info ] [Thread-4 (]: 55 of 638 START sql table model data_quality.eligibility_death_flag ............ [RUN]
�[0m13:45:20.781423 [debug] [Thread-4 (]: Re-using an available connection from the pool (formerly model.the_tuva_project.data_quality__eligibility_death_date, now model.the_tuva_project.data_quality__eligibility_death_flag)
�[0m13:45:20.781423 [debug] [Thread-4 (]: Began compiling node model.the_tuva_project.data_quality__eligibility_death_flag
�[0m13:45:20.781423 [debug] [Thread-4 (]: Writing injected SQL for node "model.the_tuva_project.data_quality__eligibility_death_flag"
�[0m13:45:20.781423 [debug] [Thread-4 (]: Began executing node model.the_tuva_project.data_quality__eligibility_death_flag
�[0m13:45:20.781423 [debug] [Thread-4 (]: Writing runtime sql for node "model.the_tuva_project.data_quality__eligibility_death_flag"
�[0m13:45:20.797048 [debug] [Thread-4 (]: Using postgres connection "model.the_tuva_project.data_quality__eligibility_death_flag"
�[0m13:45:20.797048 [debug] [Thread-4 (]: On model.the_tuva_project.data_quality__eligibility_death_flag: BEGIN
�[0m13:45:20.797048 [debug] [Thread-4 (]: Opening a new connection, currently in state closed
�[0m13:45:20.828294 [debug] [Thread-4 (]: SQL status: BEGIN in 0.037 seconds
�[0m13:45:20.828294 [debug] [Thread-4 (]: Using postgres connection "model.the_tuva_project.data_quality__eligibility_death_flag"
�[0m13:45:20.828294 [debug] [Thread-4 (]: On model.the_tuva_project.data_quality__eligibility_death_flag: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "meltano", "target_name": "prod", "node_id": "model.the_tuva_project.data_quality__eligibility_death_flag"} */

  
    

  create  table "datawarehouse"."data_quality"."eligibility_death_flag__dbt_tmp"
  
  
    as
  
  (
    

SELECT DISTINCT
    M.Data_SOURCE
    ,coalesce(cast(M.ENROLLMENT_START_DATE as TEXT),cast('1900-01-01' as TEXT)) AS SOURCE_DATE
    ,'ELIGIBILITY' AS TABLE_NAME
    ,'Member ID' AS DRILL_DOWN_KEY
    ,coalesce(M.Member_ID, 'NULL') as drill_down_value
    ,'ELIGIBILITY' AS CLAIM_TYPE
    ,'DEATH_FLAG' AS FIELD_NAME
    ,CASE
        WHEN M.DEATH_FLAG in (1,0) THEN 'valid'
        WHEN M.DEATH_FLAG is null then 'null'
        ELSE 'invalid'
        END AS BUCKET_NAME
    ,cast(null as TEXT) as INVALID_REASON
    ,CAST(Death_Flag as TEXT) AS FIELD_VALUE
    , '2024-07-25 19:29:16.725294+00:00' as tuva_last_run
FROM "datawarehouse"."input_layer"."eligibility" M
  );
  
�[0m13:45:20.859549 [debug] [Thread-4 (]: Postgres adapter: Postgres error: operator does not exist: boolean = integer
LINE 23:         WHEN M.DEATH_FLAG in (1,0) THEN 'valid'
                                   ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

�[0m13:45:20.859549 [debug] [Thread-4 (]: On model.the_tuva_project.data_quality__eligibility_death_flag: ROLLBACK
�[0m13:45:20.859549 [debug] [Thread-4 (]: On model.the_tuva_project.data_quality__eligibility_death_flag: Close
�[0m13:45:20.953290 [debug] [Thread-4 (]: Database Error in model data_quality__eligibility_death_flag (models\data_quality\intermediate\atomic_checks\claims\eligibility\data_quality__eligibility_death_flag.sql)
  operator does not exist: boolean = integer
  LINE 23:         WHEN M.DEATH_FLAG in (1,0) THEN 'valid'
                                     ^
  HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
  compiled Code at C:\EL_Projects\tuva-project/.meltano/transformers/dbt/target\run\the_tuva_project\models\data_quality\intermediate\atomic_checks\claims\eligibility\data_quality__eligibility_death_flag.sql
�[0m13:45:20.953290 [error] [Thread-4 (]: 55 of 638 ERROR creating sql table model data_quality.eligibility_death_flag ... [�[31mERROR�[0m in 0.17s]
�[0m13:45:20.953290 [debug] [Thread-4 (]: Finished running node model.the_tuva_project.data_quality__eligibility_death_flag
@BuzzCutNorman BuzzCutNorman added the bug Something isn't working label Jul 25, 2024
@aneiderhiser
Copy link
Contributor

@BuzzCutNorman do you know if this is still an issue?

@BuzzCutNorman
Copy link
Contributor Author

Just checked 0.12.2 without a work around and got the same error:

20:30:38  Finished running 1 table model, 1 project hook in 0 hours 0 minutes and 3.27 seconds (3.27s).
20:30:39
20:30:39  Completed with 1 error and 0 warnings:
20:30:39
20:30:39    Database Error in model data_quality__eligibility_death_flag (models\data_quality\dqi\intermediate\atomic_checks\claims\eligibility\data_quality__eligibility_death_flag.sql)
  operator does not exist: boolean = integer
  LINE 23:         when m.death_flag in (1,0) then 'valid'
                                     ^
  HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

@aneiderhiser aneiderhiser added the community Label for issues created by community members label Nov 9, 2024
@github-project-automation github-project-automation bot moved this from Ready for Review to Ready to Release in The Tuva Project Backlog Nov 22, 2024
@sfangman
Copy link

Hi @sarah-tuva and team 👋

I think this issue may be rearing its head again for Redshift users as Redshift does not allow for casting from boolean to text/varchar

ex:

select cast(true as text);

returns

ERROR: cannot cast type boolean to character varying

Any other thoughts for how we can address this aside from having our boolean columns be varchar with 1/0 values?

@sarah-tuva
Copy link
Member

Thanks for letting us know that this is still a problem. I am reopening the issue.

I figured out why this was not caught in CI/CD. Our test dataset has death_flag stored as an integer rather than a boolean.

Potential fix for this is to cast the flag as an integer first and then string.

SELECT DISTINCT
    m.data_source
    ,coalesce(cast(m.enrollment_start_date as TEXT),cast('1900-01-01' as TEXT)) as source_date
    ,'ELIGIBILITY' AS table_name
    ,'Member ID' AS drill_down_key
    ,coalesce(m.member_id, 'NULL') as drill_down_value
    ,'ELIGIBILITY' AS claim_type
    ,'DEATH_FLAG' AS field_name
    ,case
        when cast(cast(m.death_flag as integer) as TEXT) in ('1','0') then 'valid'
        when m.death_flag is null then 'null'
        else 'invalid'
        end as bucket_name
    ,cast(null as TEXT) as invalid_reason
    ,cast(cast(death_flag as integer) as TEXT) as field_value
    , '2024-12-18 19:13:01.274056+00:00' as tuva_last_run
from "dev_sarah"."input_layer"."eligibility" m;

@sarah-tuva sarah-tuva reopened this Dec 18, 2024
@sarah-tuva sarah-tuva moved this from Ready to Release to Ready to Pull in The Tuva Project Backlog Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working community Label for issues created by community members
Projects
Status: Ready to Pull
Development

Successfully merging a pull request may close this issue.

4 participants