Added hive support #1313

Closed
wants to merge 19 commits into from

aklochkova
Contributor

Fixes #1168

aklochkova requested a review from wivern August 26, 2019 15:11
ymolodkov and others added 2 commits August 27, 2019 13:16
dependency itself was updated in the common library; here I just fixed the code.
Contributor

wivern left a comment


Failed cohort generation on Hive:

java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [CREATE TEMPORARY TABLE results.xehkngqvfinal_cohort
 AS SELECT
person_id, min(start_date) as start_date, end_date

FROM
cteEnds
group by person_id, end_date

]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 7:20 Invalid table alias or column reference 'end_date': (possible column names are: person_id, start_date, era_end_date)
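
The error lists the columns `cteEnds` actually exposes on Hive (`person_id`, `start_date`, `era_end_date`), so both the select list and the `GROUP BY` refer to an `end_date` that does not exist there. A minimal sketch of one possible fix, assuming the generated SQL may reference `era_end_date` directly and alias it back to `end_date`; the inline rows and the table name `final_cohort_example` are made up for illustration:

```sql
-- Made-up rows; like the real cteEnds on Hive, this CTE exposes era_end_date, not end_date.
CREATE TEMPORARY TABLE final_cohort_example AS
WITH cteEnds AS (
  SELECT 1 AS person_id, CAST('2019-01-01' AS DATE) AS start_date, CAST('2019-03-01' AS DATE) AS era_end_date
  UNION ALL
  SELECT 1 AS person_id, CAST('2019-02-01' AS DATE) AS start_date, CAST('2019-03-01' AS DATE) AS era_end_date
  UNION ALL
  SELECT 2 AS person_id, CAST('2019-05-01' AS DATE) AS start_date, CAST('2019-06-01' AS DATE) AS era_end_date
)
SELECT
  person_id,
  MIN(start_date) AS start_date,
  era_end_date AS end_date      -- reference the column that exists, expose it as end_date
FROM cteEnds
GROUP BY person_id, era_end_date; -- group on the real column name, not the output alias
```

Renaming the column where `cteEnds` is defined would be an equally valid fix; which one fits better depends on how the cohort SQL template is shared across dialects.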

I'll share the cohort design in a separate comment.

@wivern
Contributor

wivern commented Aug 28, 2019

{
  "ConceptSets": [
    {
      "id": 0,
      "name": "clopidogrel",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 1322184,
              "CONCEPT_NAME": "clopidogrel",
              "STANDARD_CONCEPT": "S",
              "STANDARD_CONCEPT_CAPTION": "Standard",
              "INVALID_REASON": "V",
              "INVALID_REASON_CAPTION": "Valid",
              "CONCEPT_CODE": "32968",
              "DOMAIN_ID": "Drug",
              "VOCABULARY_ID": "RxNorm",
              "CONCEPT_CLASS_ID": "Ingredient"
            },
            "includeDescendants": true
          }
        ]
      }
    }
  ],
  "PrimaryCriteria": {
    "CriteriaList": [
      {
        "DrugExposure": {
          "CodesetId": 0
        }
      }
    ],
    "ObservationWindow": {
      "PriorDays": 0,
      "PostDays": 0
    },
    "PrimaryCriteriaLimit": {
      "Type": "First"
    }
  },
  "QualifiedLimit": {
    "Type": "First"
  },
  "ExpressionLimit": {
    "Type": "First"
  },
  "InclusionRules": [],
  "CensoringCriteria": [],
  "CollapseSettings": {
    "CollapseType": "ERA",
    "EraPad": 0
  },
  "CensorWindow": {},
  "cdmVersionRange": ">=5.0.0"
}

@wivern
Contributor

wivern commented Aug 28, 2019

One more exception, this time when opening the "Condition Eras" Heracles report:

org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [select  concept_hierarchy.concept_id,
	CONCAT(
		coalesce(concept_hierarchy.level4_concept_name,'NA'), '||',
		coalesce(concept_hierarchy.level3_concept_name,'NA'), '||',
		coalesce(concept_hierarchy.level2_concept_name,'NA'), '||',
		coalesce(concept_hierarchy.level1_concept_name,'NA'), '||',
		coalesce(concept_hierarchy.concept_name,'NA')
	) concept_path,
	hr1.count_value as num_persons, 
	ROUND(CAST(1.0*hr1.count_value / denom.count_value AS DOUBLE),5) as percent_persons,
	ROUND(CAST(hr2.avg_value AS DOUBLE),5) as length_of_era
from (
    select stratum_1, count_value
    from results.heracles_results 
    where analysis_id = 1000 and cohort_definition_id = ?
		GROUP BY stratum_1, count_value
) hr1
inner join (
    select stratum_1, avg_value 
    from results.heracles_results_dist 
    where analysis_id = 1007 and cohort_definition_id = ?
		GROUP BY stratum_1, avg_value
) hr2 on hr1.stratum_1 = hr2.stratum_1
INNER JOIN results.concept_hierarchy concept_hierarchy
  ON CAST(CASE WHEN isNumeric(hr1.stratum_1) = 1 THEN hr1.stratum_1 ELSE null END AS INT) = concept_hierarchy.concept_id
    AND concept_hierarchy.treemap='Condition'
CROSS JOIN (
    select count_value from results.heracles_results where analysis_id = 1 and cohort_definition_id = ?
) denom
order by hr1.count_value desc]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 25:20 Invalid function 'isNumeric'
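
Hive has no built-in `isNumeric` function. A minimal sketch of a Hive-side replacement, assuming `stratum_1` holds either a plain non-negative integer concept ID or non-numeric text: test the string with `RLIKE` before casting. The sample values in the CTE are made up:

```sql
-- Made-up values standing in for results.heracles_results.stratum_1.
WITH hr1 AS (
  SELECT '201826' AS stratum_1
  UNION ALL SELECT 'NA' AS stratum_1
  UNION ALL SELECT '' AS stratum_1
)
SELECT
  stratum_1,
  -- Stands in for: CASE WHEN isNumeric(stratum_1) = 1 THEN stratum_1 ELSE null END
  CASE WHEN stratum_1 RLIKE '^[0-9]+$' THEN CAST(stratum_1 AS INT) ELSE NULL END AS concept_id
FROM hr1;
```

Unlike SQL Server's `ISNUMERIC`, this pattern rejects signs, decimals and exponents, which is only acceptable if the stratum values compared to `concept_id` are always integer IDs.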

@wivern
Contributor

wivern commented Aug 28, 2019

Cohort Characterization failed:

org.ohdsi.webapi.exception.AtlasException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.springframework.jdbc.BadSqlGrammarException: StatementCallback; bad SQL grammar [insert into results.cc_results (
     type,
     fa_type,
     covariate_id,
     covariate_name,
     analysis_id,
     analysis_name,
     concept_id,
     count_value,
     avg_value,
     cohort_definition_id,
     strata_id,
     strata_name,
     cc_generation_id)
select CAST('PREVALENCE' AS VARCHAR(255)) as type,
        CAST('CUSTOM_FE' AS VARCHAR(255)) as fa_type,
        CAST(covariate_id AS BIGINT) as covariate_id,
        CAST(covariate_name AS VARCHAR(1000)) as covariate_name,
        CAST(1036 AS INTEGER) as analysis_id,
        CAST('Location by State during cohort period' AS VARCHAR(1000)) as analysis_name,
        CAST(concept_id AS INTEGER) as concept_id,
        sum_value       as count_value,
        average_value   as stat_value,
        CAST(57 AS BIGINT) as cohort_definition_id,
        CAST(0 AS BIGINT) as strata_id,
        CAST('' AS VARCHAR(1000)) as strata_name,
        CAST(45518 AS BIGINT) as cc_generation_id
from (SELECT
  CAST(C.concept_id AS BIGINT) * 1000 + 930 AS covariate_id,
  C.concept_name                            AS covariate_name,
  C.concept_id                              AS concept_id,
  COUNT(*)                                  AS sum_value,
  COUNT(*) * 1.0 / stat.total_cnt * 1.0     AS average_value
FROM (SELECT *
      FROM results.temp_cohort_frajflhn
      WHERE cohort_definition_id = 57) cohort
  JOIN omop_orc.location_history LH
    ON LH.start_date < cohort.cohort_end_date
       AND COALESCE(LH.end_date, CAST('20991231' AS TIMESTAMP)) > cohort.cohort_start_date
       AND LH.domain_id = 'PERSON'
       AND LH.entity_id = cohort.subject_id
  JOIN omop_orc.location L ON L.location_id = LH.location_id
  JOIN omop_orc.concept_ancestor CA ON CA.descendant_concept_id = L.region_concept_id
  JOIN omop_orc.concept C ON C.concept_id = CA.ancestor_concept_id AND C.concept_class_id = '4th level'
  CROSS JOIN (
               SELECT COUNT(*) total_cnt
               FROM results.temp_cohort_frajflhn
               WHERE cohort_definition_id = 57
             ) stat
GROUP BY C.concept_id, C.concept_name, stat.total_cnt) subquery]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 37:7 Table not found 

@wivern
Contributor

wivern commented Aug 29, 2019

One more issue with Profiles:

PreparedStatementCallback; bad SQL grammar [select 'drug' as "domain", drug_concept_id concept_id, concept_name, drug_exposure_start_date start_date, drug_exposure_end_date end_date
from omop_orc.drug_exposure d
join omop_orc.concept c on d.drug_concept_id = c.concept_id
where person_id = ?

union all

select 'drugera' as "domain", drug_concept_id concept_id, concept_name, drug_era_start_date start_date, drug_era_end_date end_date 
from omop_orc.drug_era 
join omop_orc.concept c on c.concept_id = drug_era.drug_concept_id
where person_id = ?  

union all 

select 'condition' as "domain", condition_concept_id concept_id, concept_name, condition_start_date start_date, condition_end_date end_date
from omop_orc.condition_occurrence co
join omop_orc.concept c on co.condition_concept_id = c.concept_id
where person_id = ?

union all

select 'conditionera' as "domain", condition_concept_id concept_id, concept_name, condition_era_start_date start_date, condition_era_end_date end_date 
from omop_orc.condition_era
join omop_orc.concept c on c.concept_id = condition_era.condition_concept_id
where person_id = ?  

union  all

select 'observation' as "domain", observation_concept_id concept_id, concept_name, observation_date start_date, observation_date end_date 
from omop_orc.observation
join omop_orc.concept c on c.concept_id = observation.observation_concept_id
where person_id = ?  

union all

select 'visit' as "domain", visit_concept_id concept_id, concept_name, visit_start_date start_date, visit_end_date end_date 
from omop_orc.visit_occurrence
join omop_orc.concept c on c.concept_id = visit_occurrence.visit_concept_id
where person_id = ? 

union all

select 'death' as "domain", death_type_concept_id concept_id, concept_name, death_date start_date, death_date end_date
from omop_orc.death d
join omop_orc.concept c on d.death_type_concept_id = c.concept_id
where person_id = ?

union  all

select 'measurement' as "domain", measurement_concept_id concept_id, concept_name, measurement_date start_date, measurement_date end_date
from omop_orc.measurement m
join omop_orc.concept c on m.measurement_concept_id = c.concept_id
where person_id = ?

union  all

select 'device' as "domain", device_concept_id concept_id, concept_name, device_exposure_start_date start_date, device_exposure_end_date end_date 
from omop_orc.device_exposure de
join omop_orc.concept c on de.device_concept_id = c.concept_id
where person_id = ?

union  all

select 'procedure' as "domain", procedure_concept_id concept_id, concept_name, procedure_date start_date, procedure_date end_date 
from omop_orc.procedure_occurrence po
join omop_orc.concept c on po.procedure_concept_id = c.concept_id
where person_id = ?

union all

select 'specimen' as "domain", specimen_concept_id concept_id, concept_name, specimen_date start_date, specimen_date end_date 
from omop_orc.specimen s
join omop_orc.concept c on s.specimen_concept_id = c.concept_id
where person_id = ?]; nested exception is org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:17 cannot recognize input near 'as' '"domain"' ',' in selection target
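
In HiveQL, double quotes delimit string literals rather than identifiers, so `as "domain"` is not a valid alias; quoted identifiers use backticks instead. A minimal sketch of the Hive-friendly form, with made-up literal rows in place of the real `omop_orc` tables:

```sql
-- Backticks quote the alias; "domain" in double quotes would be parsed as a string literal.
SELECT 'drug' AS `domain`, 1322184 AS concept_id
UNION ALL
SELECT 'drugera' AS `domain`, 1322184 AS concept_id;
```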

count_value bigint,
last_update_time timestamp
)
PARTITIONED BY(cohort_definition_id int)
Contributor


Let's use HINT DISTRIBUTE_ON_KEY and HINT SORT_ON_KEY and get rid of the separate SQL file for Hive.
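
For reference, a sketch of how a `DISTRIBUTE_ON_KEY` hint is written in OHDSI SQL: a `--HINT` comment line placed directly above the `CREATE TABLE`, which SqlRender reads when translating. The table name `example_results` is hypothetical and the column list is modeled on the DDL excerpt above; how, or whether, a given SqlRender version maps this hint onto Hive partitioning or bucketing is an assumption not confirmed here, and a `SORT_ON_KEY` hint would sit alongside it in the same comment-line style.

```sql
-- Hypothetical results table; @results_schema is a SqlRender template parameter.
--HINT DISTRIBUTE_ON_KEY(cohort_definition_id)
CREATE TABLE @results_schema.example_results (
  cohort_definition_id int,
  count_value bigint,
  last_update_time timestamp
);
```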

@anthonysena
Collaborator

@olga-ganina please assign this review to a member of the team. Thanks!

aklochkova closed this Mar 11, 2020
@aklochkova
Contributor Author

This was merged in #1418.

chrisknoll deleted the issue-1168-hive-support branch May 18, 2021 15:35
Development

Successfully merging this pull request may close these issues.

Add support for sources using Hive LLAP
6 participants