Command to migrate MariaDB database #61

Merged
merged 54 commits into from
Sep 4, 2024

amyfromandi (Collaborator) commented Jul 17, 2024

This branch contains migration code that enables a complete MariaDB-to-PostgreSQL database migration, as detailed in #60. The script in this pull request should carry out several key data migration steps:

  • Migrate the macrostrat MariaDB database to PostgreSQL, retaining all database tables, rows, etc.
  • Merge the resulting database (macrostrat_two, currently) into Macrostrat's PostgreSQL database/macrostrat schema, overwriting tables if needed
  • Feature/move fixtures to migrations #73: Run PostgreSQL migrations to ensure all constraints/sequences/foreign keys are defined
  • Add foreign keys from maps to macrostrat #80: Regenerate foreign keys from other schemas (e.g., maps) as needed. This has been moved to future work because of complexities identified by @mwestphall.
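The first step (loading MariaDB into PostgreSQL) is done with pgloader in this branch. A minimal sketch of a thin Python wrapper is below; pgloader accepts a source and a target connection URL on its command line and reads MariaDB through its MySQL-compatible `mysql://` scheme. The function names, the optional Docker invocation, the `dimitri/pgloader` image name and the host-networking flag are assumptions for illustration, not what this PR implements, and the merge and migration steps are not shown.

```python
import shlex
import subprocess

# Assumption: the public pgloader image; the real command may use another image or a local binary.
PGLOADER_IMAGE = "dimitri/pgloader"


def pgloader_command(mariadb_url: str, pg_url: str, use_docker: bool = False) -> list[str]:
    """Build a hypothetical argv for a one-shot MariaDB -> PostgreSQL load."""
    # pgloader reads MariaDB through its MySQL-compatible URL scheme
    if not mariadb_url.startswith("mysql://"):
        raise ValueError("source must be a mysql:// URL (MariaDB is read as MySQL)")
    if not pg_url.startswith(("postgresql://", "pgsql://")):
        raise ValueError("target must be a postgresql:// or pgsql:// URL")
    cmd = ["pgloader", mariadb_url, pg_url]
    if use_docker:
        # On Linux, --network host lets the container reach databases bound to the host's localhost
        cmd = ["docker", "run", "--rm", "--network", "host", PGLOADER_IMAGE, *cmd]
    return cmd


def run_pgloader(mariadb_url: str, pg_url: str, use_docker: bool = False) -> None:
    """Run the load, raising CalledProcessError if pgloader exits non-zero."""
    cmd = pgloader_command(mariadb_url, pg_url, use_docker)
    # Note: this echo includes any password embedded in the URLs
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Returning the argv as a list (rather than a shell string) keeps credentials out of shell quoting and makes the command easy to test without touching a database.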

A successful run of this script will fully integrate all data from MariaDB, allowing the Macrostrat API to operate off a single PostgreSQL database (UW-Macrostrat/macrostrat-api#229).

Architecturally, this should be implemented within the macrostrat command-line application:

  • Changes to MariaDB migration to integrate with Macrostrat CLI #74:
    • Implement as a macrostrat subcommand, e.g., macrostrat migrate-mariadb (we can figure out the "right" name later)
    • Use database credentials, etc. from macrostrat.toml rather than replicating them elsewhere
    • Use the macrostrat.database module for running SQL commands
    • As much as possible, we should use Docker for command-line processing
  • Once this is all validated, we can remove the macrostrat v1 schlep scripts that were the initial version of this process

Latest update:

  • restored maria-migrate branch
  • added .gitignore file
  • refactored compare_data_counts() to compare any two databases passed in as parameters.
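A comparison of this kind can be sketched as two pieces: one that counts rows per table through any DB-API 2.0 cursor, and one that diffs two count maps. Both function names are hypothetical and this is not the PR's compare_data_counts(); it assumes table names are unqualified identifiers visible on the connection's search path.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def count_rows(cursor, tables: Iterable[str]) -> dict[str, int]:
    """Count rows per table through any DB-API 2.0 cursor.

    Identifiers can't be bound as query parameters, so each name is quoted
    with double quotes (PostgreSQL/SQL-standard style; MariaDB would need
    backticks unless ANSI_QUOTES is enabled). Names are treated as single,
    unqualified identifiers.
    """
    counts: dict[str, int] = {}
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        cursor.execute(f"SELECT COUNT(*) FROM {quoted}")
        counts[table] = cursor.fetchone()[0]
    return counts


def diff_row_counts(
    source: Mapping[str, int], target: Mapping[str, int]
) -> dict[str, tuple[int | None, int | None]]:
    """Return {table: (source_count, target_count)} for tables whose counts differ.

    A table present on only one side is reported with None for the other.
    """
    mismatches: dict[str, tuple[int | None, int | None]] = {}
    for table in sorted(source.keys() | target.keys()):
        a, b = source.get(table), target.get(table)
        if a != b:
            mismatches[table] = (a, b)
    return mismatches
```

Feeding counts from the macrostrat_two and macrostrat PostgreSQL databases into `diff_row_counts` would list only the tables whose counts still disagree.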

Next steps:

  • Resolve all discrepancies between the PostgreSQL macrostrat_two database and the development/production macrostrat PostgreSQL database.

amyfromandi requested a review from davenquinn July 17, 2024 20:54

davenquinn (Member) commented

For the data validation summary produced by these scripts (see #60), can we print row counts, etc. as tables to improve readability (maybe using the rich Python library)?
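A rough sketch of the suggested rich output, assuming per-table row counts have already been collected into two dicts. `row_count_table` and the column labels are illustrative, not code from this PR; `Table`, `add_column`, `add_row` and `Console.print` are rich's standard table API.

```python
from __future__ import annotations

from collections.abc import Mapping

from rich.console import Console
from rich.table import Table


def row_count_table(
    source: Mapping[str, int],
    target: Mapping[str, int],
    source_name: str = "MariaDB",
    target_name: str = "PostgreSQL",
) -> Table:
    """Render per-table row counts from two databases side by side."""
    table = Table(title="Row count comparison")
    table.add_column("Table")
    table.add_column(source_name, justify="right")
    table.add_column(target_name, justify="right")
    table.add_column("Match", justify="center")
    # Union of table names, so tables missing on either side still get a row
    for name in sorted(source.keys() | target.keys()):
        a, b = source.get(name), target.get(name)
        table.add_row(
            name,
            "n/a" if a is None else f"{a:,}",
            "n/a" if b is None else f"{b:,}",
            "yes" if a == b else "[bold red]NO[/bold red]",
        )
    return table


if __name__ == "__main__":
    # Made-up example counts; real ones would come from querying each database
    Console().print(row_count_table({"cols": 5, "units": 2}, {"cols": 4, "units": 2}))
```

Keeping table construction separate from printing lets the same `Table` be shown in the terminal or captured (e.g., with `Console(record=True)`) for a saved report.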

davenquinn changed the title from "restoring maria-migrate branch" to "Command to migrate MariaDB database" Jul 17, 2024
amyfromandi and others added 23 commits July 22, 2024 13:46
* main: (158 commits)
  Got rid of unnecessary spacing changes
  Centralized GRANTs in a single post-update hook
  New subsystem and hook function that updates permissions after all schema updates
  Removed some old commands that may not be useful anymore
  Updated JetBrains files
  Update and include new tables
  Update and include new tables
  Started upgrading macrostrat migration scripts
  Updated Macrostrat subsystem definitions
  Moved 'column-builder' views into new subsystem architecture
  Started re-working v2-transition code into various places
  Move sql alteration functions to database-upgrades directory
  Remove dependencies on local packages in pyproject.toml
  Update README.md
  Storage scheme migration
  Updated storage migrations
  Updated maps migrations
  Create some explicit migrations
  Remove unused function
  Improved organization of paleogeography sub-app
  ...
davenquinn and others added 22 commits July 29, 2024 12:34
… mode parameters for pgloader command to work
Changes to MariaDB migration to integrate with Macrostrat CLI
amyfromandi linked an issue Aug 13, 2024 that may be closed by this pull request
…and kept the schlep scripts for the tables that are modified in the pre/post migration scripts.
amyfromandi (Collaborator, Author) commented Aug 30, 2024

Removed the v1 schlep scripts and moved all index files to schlep-index.
Kept the schlep scripts for tables that are modified in the pre/post migration scripts, in case they are needed for future reference (i.e., the process scripts). See: remaining schlep scripts

These changes are referenced from #81

amyfromandi merged commit 240d0f6 into main Sep 4, 2024
davenquinn deleted the maria-migrate branch September 7, 2024 08:35
Development

Successfully merging this pull request may close these issues.

MariaDB to PostgreSQL data variance tracker
2 participants