Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move migration logic from SQL stored procedure to python #313

Open
renato2099 opened this issue Oct 9, 2023 · 1 comment
Open

Move migration logic from SQL stored procedure to python #313

renato2099 opened this issue Oct 9, 2023 · 1 comment

Comments

@renato2099
Copy link
Collaborator

Right now, after we ingest the e-tuff file we write it down to disk and then let postgres ingest and process it.
Since we have all of the data already in memory as a dataframe, we should process it there and only write the results to postgres.

@renato2099
Copy link
Collaborator Author

renato2099 commented Oct 15, 2023

hi @lewismc , these are the high-level steps we currently do in the SQL stored procedure

1. populate `data_time_series` using data from `proc_observations` where observation_types.variable_name IN ('datetime', 'depth', 'temperature', 'light', 'internal temperature')

2. populate `data_position`  using data from `proc_observations` where observation_types.variable_name IN ('longitude')
3. update `data_position` with latitudes
4. update `data_position` with lon_error and lat_errors

5. populate `data_histogram_bin_info` where b.variable_name LIKE 'HistDepthBinMax%'
6. populate `data_histogram_bin_data` where b.variable_name LIKE 'TimeAt%'

7. populate `data_profile` where b.variable_name LIKE 'PdtDepth%'
8. update `data_profile` where b.variable_name LIKE 'PdtTempMin%'
9. update `data_profile` where b.variable_name LIKE 'PdtTempMax%'

 -- SQL update statements to link measurement date time with position date time
10. update `data_time_series`.`position_date_time` using data from `date_time`
11. update `data_histogram_bin_data`.`position_date_time` using data from `date_time`
12. update `data_histogram_bin_data`.`data_profile` using data from `date_time`

another advantage of moving this to python is that we could more easily parallelize processing of the file while stop moving data across different tables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant