Describe the bug
When a JOIN is performed in the gold layer while the Spark listener is enabled, the data lineage becomes 1:*.
It's a little complex to explain, but here goes.
Imagine the following scenario:
There is a script in your gold layer like this:
from pyspark.sql.functions import expr
from spark.camadas import silverTempViews, writeGold, writeOracleDW
from spark.session import session
source = "db-name" # database_name
silverTempViews(source, [
    'clientes', 'wifi_business', 'clientes_novo',  # table_names
])
query = """
SELECT
    CLIENTES.ID_CLIENTE AS ID_CLIENTE_ID,
    CLIENTES.CPF_CNPJ AS CPF_CNPJ_CPF,
    NOV.ID_BASE AS IDBASE_NOVO,
    CLIENTES.NOME AS NOME_CLIENTE_NOME,
    NEGOCIOS.ID_COD_PLANOPK AS COD_PK,
    NEGOCIOS.ID_COD_SOLICITACAO AS ID_COD_SOL_WIFI,
    CASE
        WHEN CLIENTES.IS_PESSOA_FISICA = 1 THEN 'PF'
        WHEN CLIENTES.ID_RAMO_ATIVIDADE = 4 THEN 'GOVERNAMENTAL'
        WHEN CLIENTES.IS_PESSOA_FISICA = 0 THEN 'PJ'
        ELSE ''
    END AS CLASSIFICACAO_CLIENTE_CLASS,
    CASE
        WHEN CLIENTES.SITUACAO = '0' THEN 'ATIVO'
        WHEN CLIENTES.SITUACAO = '1' THEN 'ARQUIVADO'
        WHEN CLIENTES.SITUACAO = '2' THEN 'BLOQUEADO'
        WHEN CLIENTES.SITUACAO = '3' THEN 'PENDENTE'
        WHEN CLIENTES.SITUACAO = '4' THEN 'MANUTENÇÃO'
        WHEN CLIENTES.SITUACAO = '5' THEN 'BLOQUADO (PROTESTADO)'
        WHEN CLIENTES.SITUACAO = '6' THEN 'INVIABILIDADE'
        WHEN CLIENTES.SITUACAO = '7' THEN 'PROSPECTO'
        ELSE CLIENTES.SITUACAO
    END AS SITUACAO_CLIENTE_SIT
FROM CLIENTES
LEFT JOIN WIFI_BUSINESS NEGOCIOS
    ON NEGOCIOS.ID_COD_CLIENTE = CLIENTES.ID_CLIENTE
LEFT JOIN CLIENTES_NOVO NOV
    ON CLIENTES.ID_CLIENTE = NOV.ID_CLIENTE
"""
__session = session.get()
df = __session.sql(query)
indicador = "gold_teste_datahub"
writeOracleDW(df, indicador)
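The script above relies on project-specific helpers (silverTempViews, writeOracleDW), so it cannot be run outside our environment. As a rough stand-in, the sketch below runs a trimmed version of the same SELECT with both LEFT JOINs on made-up rows. It assumes SQLite instead of Spark and does not exercise the Spark listener at all; it only shows which source column each output column is built from.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the silver-layer temp views.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CLIENTES (ID_CLIENTE INTEGER, CPF_CNPJ TEXT, NOME TEXT,
                       IS_PESSOA_FISICA INTEGER, ID_RAMO_ATIVIDADE INTEGER,
                       SITUACAO TEXT);
CREATE TABLE WIFI_BUSINESS (ID_COD_CLIENTE INTEGER, ID_COD_PLANOPK INTEGER,
                            ID_COD_SOLICITACAO INTEGER);
CREATE TABLE CLIENTES_NOVO (ID_CLIENTE INTEGER, ID_BASE INTEGER);
-- Made-up rows, for illustration only.
INSERT INTO CLIENTES VALUES (1, '000', 'ANA', 1, 2, '0');
INSERT INTO CLIENTES VALUES (2, '111', 'ORG', 0, 4, '2');
INSERT INTO WIFI_BUSINESS VALUES (1, 10, 100);
INSERT INTO CLIENTES_NOVO VALUES (1, 7);
""")

# Trimmed version of the gold-layer query: same joins, fewer columns.
query = """
SELECT
    CLIENTES.ID_CLIENTE AS ID_CLIENTE_ID,
    NOV.ID_BASE AS IDBASE_NOVO,
    NEGOCIOS.ID_COD_PLANOPK AS COD_PK,
    CASE
        WHEN CLIENTES.IS_PESSOA_FISICA = 1 THEN 'PF'
        WHEN CLIENTES.ID_RAMO_ATIVIDADE = 4 THEN 'GOVERNAMENTAL'
        WHEN CLIENTES.IS_PESSOA_FISICA = 0 THEN 'PJ'
        ELSE ''
    END AS CLASSIFICACAO_CLIENTE_CLASS
FROM CLIENTES
LEFT JOIN WIFI_BUSINESS NEGOCIOS
    ON NEGOCIOS.ID_COD_CLIENTE = CLIENTES.ID_CLIENTE
LEFT JOIN CLIENTES_NOVO NOV
    ON CLIENTES.ID_CLIENTE = NOV.ID_CLIENTE
ORDER BY CLIENTES.ID_CLIENTE
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ID_CLIENTE appears in both ON clauses, but the only output column that selects it is ID_CLIENTE_ID; IDBASE_NOVO and COD_PK read from the joined tables.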
Note:
This SQL corresponds to the image shown for this issue.
Let's focus on the actual problem:
In this step we perform left joins using the keys ID_CLIENTE, ID_COD_CLIENTE and ID_CLIENTE: WIFI_BUSINESS is joined on NEGOCIOS.ID_COD_CLIENTE = CLIENTES.ID_CLIENTE, and CLIENTES_NOVO on CLIENTES.ID_CLIENTE = NOV.ID_CLIENTE.
After the join we can reference the fields we need from each table and apply the transformation, for example CLIENTES.ID_CLIENTE AS ID_CLIENTE_ID.
So we expect that, in the data lineage, the field ID_CLIENTE (from the table clientes in the silver layer) becomes ID_CLIENTE_ID (gold layer).
But observe the following image:
The expectation is ONLY ONE LINE from ID_CLIENTE to ID_CLIENTE_ID,
like the one from IS_PESSOA_FISICA to CLASSIFICACAO_CLIENTE_CLASS.
FIG 2:
Is there a way to fix this?
For reference, this is the Spark config:
Jar version:
DataHub CLI version:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A data lineage through the JOINs that is 1:1, like FIG 2, is expected.
Joins should not make the data lineage ambiguous or 1:*.
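To state the expectation more precisely, here is a minimal sketch that writes down the upstream fields of each gold column, read by hand from the SELECT list, and flags any lineage edge outside that mapping. The helper names (unexpected_edges, fan_out) and the observed edge list are hypothetical and made up for illustration; they are not taken from the DataHub output or the screenshot.

```python
from collections import defaultdict

# Upstream fields per gold column, read by hand from the SELECT list.
EXPECTED = {
    "ID_CLIENTE_ID": {"clientes.ID_CLIENTE"},
    "CPF_CNPJ_CPF": {"clientes.CPF_CNPJ"},
    "IDBASE_NOVO": {"clientes_novo.ID_BASE"},
    "NOME_CLIENTE_NOME": {"clientes.NOME"},
    "COD_PK": {"wifi_business.ID_COD_PLANOPK"},
    "ID_COD_SOL_WIFI": {"wifi_business.ID_COD_SOLICITACAO"},
    "CLASSIFICACAO_CLIENTE_CLASS": {
        "clientes.IS_PESSOA_FISICA",
        "clientes.ID_RAMO_ATIVIDADE",
    },
    "SITUACAO_CLIENTE_SIT": {"clientes.SITUACAO"},
}


def unexpected_edges(observed_edges, expected=EXPECTED):
    """Return the observed (upstream, downstream) edges missing from expected."""
    return sorted(
        (up, down)
        for up, down in observed_edges
        if up not in expected.get(down, set())
    )


def fan_out(edges):
    """Map each upstream field to the sorted list of columns it points to."""
    targets = defaultdict(set)
    for up, down in edges:
        targets[up].add(down)
    return {up: sorted(downs) for up, downs in targets.items()}


# Hypothetical 1:* lineage: the join key leaks into columns that never select it.
observed = [
    ("clientes.ID_CLIENTE", "ID_CLIENTE_ID"),
    ("clientes.ID_CLIENTE", "IDBASE_NOVO"),
    ("clientes.ID_CLIENTE", "COD_PK"),
    ("clientes.IS_PESSOA_FISICA", "CLASSIFICACAO_CLIENTE_CLASS"),
]

print(unexpected_edges(observed))
print(fan_out(observed)["clientes.ID_CLIENTE"])
```

With the expected lineage, unexpected_edges would return an empty list and clientes.ID_CLIENTE would point only to ID_CLIENTE_ID.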
Screenshots
Desktop (please complete the following information):
NAME="Oracle Linux Server"
VERSION="9.4"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Oracle Linux Server 9.4"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:9:4:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 9"
ORACLE_BUGZILLA_PRODUCT_VERSION=9.4
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=9.4
Additional context
Full data lineage is expected through the joins, with 1:1 mappings.