Create primary key for new tables #176

Open. Wants to merge 4 commits into base: main.

hbruch (Collaborator) commented on Oct 23, 2024

This PR

  • will create primary keys for newly created tables (GeoDataFrame.to_postgis(...index=True) does not create them)
  • will avoid recreating existing tables. The (geo)pandas to_sql/to_postgis replace action drops the table and recreates it (see to_postgis). It would be helpful if replace implemented truncate/insert, and if a not-yet-existing recreate option dropped and recreated the table.
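To illustrate the two paths described above, here is a minimal sketch. It is not the code from this PR: the helper names `quote_ident` and `plan_table_write`, and the example schema, table, and column names, are hypothetical. It only plans the SQL around the write: an existing table is truncated and appended to, while a new table is created by (geo)pandas and then given a primary key.

```python
from typing import List, Sequence, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_table_write(
    schema: str, table: str, pk_columns: Sequence[str], table_exists: bool
) -> Tuple[List[str], str, List[str]]:
    """Return (SQL to run before the write, if_exists mode, SQL to run after)."""
    qualified = f"{quote_ident(schema)}.{quote_ident(table)}"
    if table_exists:
        # Keep the table and its primary key: empty it, then append the rows.
        return [f"TRUNCATE TABLE {qualified}"], "append", []
    # New table: let (geo)pandas create it, then add the key it does not create.
    cols = ", ".join(quote_ident(c) for c in pk_columns)
    return [], "replace", [f"ALTER TABLE {qualified} ADD PRIMARY KEY ({cols})"]


# Hypothetical names, for illustration only.
new_plan = plan_table_write("public", "stations", ["id"], table_exists=False)
old_plan = plan_table_write("public", "stations", ["id"], table_exists=True)
```

The returned mode would be passed as `if_exists` to `to_sql`/`to_postgis`, with the statements executed before and after that call.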

hbruch requested a review from derhuerst on October 23, 2024 10:56

derhuerst (Member) left a comment

The code's overall goals aren't immediately clear to an outsider like me, which is why I have some questions and objections.

schema=schema,
if_exists='replace',
)
# table was just created, create primary key (to_postgis doesn't create these,
derhuerst (Member) commented:

We're calling to_sql() above, not to_postgis(). Can you expand the comment to explain what's going on?

hbruch (Collaborator, Author) replied:

Added a comment explaining that GeoDataFrame's write methods are not named in a database-agnostic way, unlike pandas.DataFrame's to_sql.

pipeline/resources/postgis_geopandas_io_manager.py (outdated)
derhuerst (Member) left a comment

  • Does con.connection.cursor() start a new transaction? What if it fails? What if we're already within a transaction? PostgreSQL doesn't allow nested transactions.

I don't know whether each with closing(con.connection.cursor()) as c expression opens a new connection to the DB. If it does, the statements won't run in the same transaction, because AFAIK transactions are always connection-bound.

Assuming that it works as-implemented, I approve it.
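
On the cursor question: in the Python DB-API, a cursor is created from an already open connection, so calling cursor() does not by itself open a new one. Below is a minimal sketch using the standard library's sqlite3 as a stand-in. That is an assumption: the PR targets PostgreSQL, and its driver's behaviour should be checked separately. The table name `t` is made up. Two cursors from the same connection share one transaction, so a single rollback undoes the work of both.

```python
import sqlite3
from contextlib import closing

# One in-memory connection; every cursor below is created from it.
conn = sqlite3.connect(":memory:")

with closing(conn.cursor()) as c:
    c.execute("CREATE TABLE t (id INTEGER)")
conn.commit()

# Each block opens a new cursor, not a new connection.
with closing(conn.cursor()) as c1:
    c1.execute("INSERT INTO t VALUES (1)")

with closing(conn.cursor()) as c2:
    c2.execute("INSERT INTO t VALUES (2)")
    c2.execute("SELECT count(*) FROM t")
    seen_before_rollback = c2.fetchone()[0]

# Rolling back the connection discards the rows written through both cursors.
conn.rollback()

with closing(conn.cursor()) as c3:
    c3.execute("SELECT count(*) FROM t")
    seen_after_rollback = c3.fetchone()[0]

conn.close()
```

Here the second cursor sees the row inserted through the first, and after the rollback the table is empty again.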
