Skip to content

Latest commit

 

History

History
60 lines (49 loc) · 2.1 KB

README.md

File metadata and controls

60 lines (49 loc) · 2.1 KB

PG_dedump

An utility script to extract data from Postgres (.sql) dumps.

Why

We have dumps from different postgres/postgis servers, different versions of postgres are involved and it's not simple to just restore them given the necessary data(/relationships/ownerships) integrity that a full restore requires.

Moreover, in this case the dumps contain just read-only data, so the only requirement is to have access to those tables.

How

The tool iterates through a sql file, that contains statements generated by pg_dump. When a CREATE TABLE command is found the statement is converted in the DuckDB dialect and a table with the same data schema is created. The script then looks for the COPY statement,

  1. reads each line after that as TSV
  2. perform some transformation to match the table schema
  3. convert this block of rows to an arrow table
  4. efficiently copies the arrow table into duckdb
  5. export the table as parquet file

Install

pip install .

Usage

usage: pg_dedump [-h] [-v] [-r] [-c CHUNKS] [-t TOTAL] [-d DB] [-p PREFIX] [-o OUTPUT] [--output-type OUTPUT_TYPE] [FILE ...]

extract tables from postgres dumps

positional arguments:
  FILE                  files to read, if empty, stdin is used

options:
  -h, --help            show this help message and exit
  -v, --verbose         Print debug output
  -r, --drop-db         Delete the database if present
  -c CHUNKS, --chunks CHUNKS
                        Chunk insert size - default: 10000
  -t TOTAL, --total TOTAL
                        Total lines - enables the progress bar
  -d DB, --db DB        Name of the dump database - by default uses dump.ddb
  -p PREFIX, --prefix PREFIX
                        Prefix to add to each table exported
  -o OUTPUT, --output OUTPUT
                        Output path
  --output-type OUTPUT_TYPE
                        Format of the tables output - default: parquet

Example

pg_dump < dump.sql

Generate a supported dump

To create an sql dump from a binary dump use:

pg_restore -f output.sql binary_dump_file

It's advised to extract just the tables you need using option -t.