
Add support for psycopg2 driver. #214

Open
jackwotherspoon opened this issue Dec 10, 2021 · 22 comments
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jackwotherspoon

Support PostgreSQL psycopg2 driver with connector.

@jackwotherspoon jackwotherspoon added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Dec 10, 2021
@jackwotherspoon jackwotherspoon self-assigned this Dec 10, 2021

gunar commented Dec 16, 2021

Hey @jackwotherspoon, I've hit this exact problem this week. Do you have an estimate? Any rough ideas would be greatly appreciated. Thanks in advance.


jackwotherspoon commented Dec 16, 2021

@gunar This has been prioritized and will be worked on in early 2022.

In the meantime, you can connect to Cloud SQL using Python and psycopg2 with the Cloud SQL Auth Proxy through Unix sockets or TCP. For an explanation and example see here: https://cloud.google.com/sql/docs/postgres/connect-admin-proxy#expandable-2

For the Unix sockets sample just swap out part of the pg8000 code for this code snippet:

pool = sqlalchemy.create_engine(
    sqlalchemy.engine.url.URL.create(
        drivername="postgresql+psycopg2",
        username=db_user,  # e.g. "my-database-user"
        password=db_pass,  # e.g. "my-database-password"
        database=db_name,  # e.g. "my-database-name"
        query={
            "host": "{}/{}".format(
                db_socket_dir,  # e.g. "/cloudsql"
                instance_connection_name)  # i.e "<PROJECT-NAME>:<INSTANCE-REGION>:<INSTANCE-NAME>"
        }
    ),
    **db_config
)

Hope this helps for the time being! Thanks for the feedback!


gunar commented Dec 16, 2021

@jackwotherspoon Thanks man. I'm not inclined to go with the Cloud SQL Auth Proxy as that's one more thing for the user to remember to execute.

I guess I could make the python script itself spawn the process for the proxy—has anyone attempted that?

Also, do I need sqlalchemy in order to connect to the proxy or can I connect directly with psycopg2?
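Something like the spawn-it-yourself idea could be sketched as below. This is only a rough sketch with a hypothetical `launch_proxy` helper; it assumes the v2 `cloud-sql-proxy` binary and its `--unix-socket` flag, and the proxy still needs credentials available in the environment:

```python
import atexit
import subprocess
import time

def launch_proxy(binary, instance_connection_name, socket_dir="/cloudsql"):
    """Spawn the Cloud SQL Auth Proxy as a child process.

    Assumes the v2 cloud-sql-proxy CLI, which takes the instance
    connection name as a positional argument.
    """
    proc = subprocess.Popen(
        [binary, f"--unix-socket={socket_dir}", instance_connection_name],
        stdout=subprocess.DEVNULL,
    )
    # tear the proxy down when the script exits
    atexit.register(proc.terminate)
    # crude wait for the proxy to create its unix socket
    time.sleep(2)
    return proc

# e.g. launch_proxy("./cloud-sql-proxy", "my-project:my-region:my-instance")
```

Once the proxy is up, the application can connect to the socket with psycopg2 as usual, with or without SQLAlchemy.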

@jackwotherspoon jackwotherspoon added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Feb 14, 2022

kemics commented Jun 22, 2022

Hi @jackwotherspoon , thank you for all your work on cloud sql connectors

Do you have any estimates for psycopg2 support? It would be really nice to have

By the way, the issue has P2 priority but it has the biggest number of thumbs-up 😉

@jackwotherspoon

Hi @kemics, thanks for the kind words and showing your enthusiasm towards the Cloud SQL Python connector.

I currently can't provide a specific timeline for psycopg2 support because there are a lot of factors at play here, but I can provide some details for you.

In order to support psycopg2 connections, the connector needs the ability to pass in a pre-existing SSL connection and to skip a couple of Postgres database-level protocol steps. This is the hard part, and it is required because Cloud SQL connections go to a proxy server rather than directly to the database itself.

I have been in talks with the psycopg2 team about making some changes in psycopg2 to support this type of connection; the work also requires additional upstream changes to libpq.

Overall, there are a lot of moving parts but I will continue to keep the ball rolling on this and make sure that I update progress here on the issue for more transparency.
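To give a concrete picture of what "pass in a pre-existing SSL connection" means: pg8000, which the connector already supports, can be handed a ready-made TLS socket. A rough stdlib sketch of that capability follows; the file paths and port are placeholders, and in the real connector the CA certificate and ephemeral client certificate come from the Cloud SQL Admin API:

```python
import socket
import ssl

def make_client_context(ca_cert=None, client_cert=None, client_key=None):
    """Build a mutual-TLS client context like the one the connector uses.

    All file paths are placeholders; the connector generates ephemeral
    client certificates for you at runtime.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_cert:
        ctx.load_verify_locations(ca_cert)
    if client_cert and client_key:
        ctx.load_cert_chain(client_cert, client_key)
    return ctx

def open_ssl_socket(ctx, host, port=3307):
    """Open a TLS socket to the Cloud SQL server-side proxy (port 3307)."""
    raw = socket.create_connection((host, port))
    return ctx.wrap_socket(raw, server_hostname=host)
```

pg8000 can use such a socket directly; psycopg2 cannot, because libpq performs the TLS negotiation (including the preceding SSLRequest round trip) itself, which is why upstream changes are needed.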

Thanks again for the gentle nudge here and for showing interest in the Python connector, much appreciated.

Have a great day :)


kemics commented Jul 21, 2022

Thanks @jackwotherspoon , great explanation 👍

I'm a fan of what you are doing here and have successfully tried this repo with sqlalchemy and also https://github.com/GoogleCloudPlatform/cloud-sql-go-connector . Works like a charm and a huge improvement in terms of security


gunar commented Aug 3, 2022

@jackwotherspoon

With regards to:

For the Unix sockets sample just swap out part of the pg8000 code for this code snippet:

Is there any way to use my already logged in gcloud IAM authentication and not have to pass in username and password as part of the code? (I'm trying to get away from .env files).


jackwotherspoon commented Aug 3, 2022

@gunar

Is there any way to use my already logged in gcloud IAM authentication and not have to pass in username and password as part of the code? (I'm trying to get away from .env files).

This can be done in a fairly straightforward way for service account authentication using Manual IAM Database Authentication. You just need to add your service account as an IAM database user on your Cloud SQL instance and grant it the Cloud SQL Instance User IAM role alongside the usual Cloud SQL Client role.

Once this is done, the snippet below should work in any environment that runs as the service account (e.g. Cloud Run, Cloud Functions, App Engine) to connect to Cloud SQL via Public IP (Unix sockets are recommended for Public IP; see the comment below for a Private IP TCP connection snippet). See the comments below for running locally.

import google.auth
from google.auth.transport.requests import Request
import sqlalchemy
import psycopg2

# uncomment below two lines to set application credentials to service account locally
# import os
# os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/key.json"

# get application default credentials
creds, _ = google.auth.default()

# refresh credentials if not valid
if not creds.valid:
  request = Request()
  creds.refresh(request)

# IAM service account database user, postgres removes suffix
sa_iam_user = (creds.service_account_email).removesuffix(
  ".gserviceaccount.com"
)

# create SQLAlchemy connection pool
pool = sqlalchemy.create_engine(
  sqlalchemy.engine.url.URL.create(
    drivername="postgresql+psycopg2",
    username=sa_iam_user, # login to database as service account user
    password=str(creds.token), # manual IAM database authn
    database=db_name,  # e.g. "my-database-name"
    query={
      "host": "{}/{}".format(
          "/cloudsql",  # e.g. "/cloudsql"
          instance_connection_name)  # i.e "<PROJECT-NAME>:<INSTANCE-REGION>:<INSTANCE-NAME>"
    }
  ),
)

# ... use connection pool


cw-optima-ovo commented Jan 19, 2023

I've been struggling to get psycopg2 working with either of the solutions posted here, really appreciate the examples @jackwotherspoon - wondering if you or someone else could point me in the right direction...
I need to connect to a Cloud SQL instance running Postgres with a gen2 cloud function using an IAM service account. Tried the solution posted on 16/12/21 as well as the solution posted 3/8/22 but still not getting there.

The cloud function is connected to the VPC with a serverless VPC connector, and the cloud sql instance has a private and public IP. I've also tried the below with the private IP instead of the 'project-name:instance-region:instance-name' for the unix socket.

I'm getting the error
psycopg2.OperationalError: connection to server on socket "/cloudsql/project-name:instance-region:instance-name/.s.PGSQL.5432" failed: No such file or directory

This is my code (obfuscated of course)

    # get application default credentials
    creds, _ = google.auth.default()

    # refresh credentials if not valid
    if not creds.valid:
        request = Request()
        creds.refresh(request)

    # IAM service account database user, postgres removes suffix
    sa_iam_user = '[email protected]'

    # create SQLAlchemy connection pool
    engine = sqlalchemy.create_engine(
    sqlalchemy.engine.url.URL.create(
        drivername="postgresql+psycopg2",
        username=sa_iam_user, # login to database as service account user
        password=str(creds.token), # manual IAM database authn
        database='database-name',  # e.g. "my-database-name"
        query={
        "host": "{}/{}".format(
            "/cloudsql",  # e.g. "/cloudsql"
            'project-name:instance-region:instance-name')  # i.e "<PROJECT-NAME>:<INSTANCE-REGION>:<INSTANCE-NAME>"
        }
    ),
    )

What am I doing wrong? I feel like I'm very close but obviously missing something.

@jackwotherspoon

@cw-optima-ovo Let me see if I can help you out! Thanks for raising your issue and sorry for the slow reply.

I should have made one thing clear in my previous comment with the code sample: the Unix sockets approach is for a Public IP connection.

The recommended approach for Private IP connections such as yours is through a TCP socket, which I will showcase here (code sample was tested on gen2 Cloud Functions with a VPC Connector attached):

import google.auth
from google.auth.transport.requests import Request
import sqlalchemy
import psycopg2

# get application default credentials
creds, _ = google.auth.default()

# refresh credentials if not valid
if not creds.valid:
  request = Request()
  creds.refresh(request)

# IAM service account database user, postgres removes suffix
sa_iam_user = (creds.service_account_email).removesuffix(
  ".gserviceaccount.com"
)

# create SQLAlchemy connection pool
pool = sqlalchemy.create_engine(
    # Equivalent URL:
    # postgresql+psycopg2://<db_user>:<db_pass>@<db_host>:<db_port>/<db_name>
    sqlalchemy.engine.url.URL.create(
        drivername="postgresql+psycopg2",
        username=sa_iam_user,
        password=str(creds.token),
        host="YOUR_PRIVATE_IP_ADDRESS",
        port=5432,
        database="YOUR_DATABASE",
    ),
)
# ... use connection pool

Let me know if you have luck with this sample over the unix sockets version 😄 Make sure the service account that the Cloud Function uses is added as a proper Cloud SQL IAM user to the instance and has the proper roles.

@cw-optima-ovo

Let me know if you have luck with this sample over the unix sockets version 😄

Thanks @jackwotherspoon, that does indeed work! However, I ended up going down the asyncpg route with the Cloud SQL Python Connector in the end, partly because I didn't want a fixed IP in my variables, as we're deploying the project through CI/CD. While it's solvable to fetch the IP in CI/CD, it's a headache we didn't want to introduce given we already had the project:region:instance available. The performance of asyncpg and psycopg2 is about the same in our context, so no harm done.

Hoping to see psycopg2 supported with the Cloud SQL Connector in the future though; this would have been my original preference!


enocom commented Mar 21, 2023

Related: psycopg/psycopg2#1421.


wwuck commented Jul 13, 2023

Can I add a request for psycopg3 here? Or should that go in a separate issue?

@jackwotherspoon

Can I add a request for psycopg3 here? Or should that go in a separate issue?

@wwuck psycopg3 is already being tracked in a separate issue #219, although both will most likely be supported at the same time once we can get up-stream changes to libpq made.


jackwotherspoon commented Aug 1, 2023

FYI to everyone: it is possible to connect with automatic IAM authentication without the need for the Python Connector:

You'll need to ensure a few things:

  • The token has only sql.login scope (i.e. https://www.googleapis.com/auth/sqlservice.login)
  • The token isn't transmitted over an unencrypted channel. (Private IP recommended)

We're working on making this path easier for folks, but for now I'll share the mechanics for visibility.

Assuming you're using SQLAlchemy, you can do this:

import sqlalchemy
from sqlalchemy import event

import google.auth
from google.auth.transport.requests import Request

# initialize ADC creds
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/sqlservice.login"]
)

# Cloud SQL Instance IP address
instance_ip = "<INSTANCE_IP>"

# create SQLAlchemy connection pool
# use Cloud SQL Instance IP + native port (5432)
# for best security use client certificates + server cert for SSL
engine = sqlalchemy.create_engine(
    f"postgresql+psycopg2://[email protected]:empty-pass@{instance_ip}:5432/dbname",
    connect_args={"sslmode": "require"},
)

# set do_connect event listener
# This gets called before a connection is created and allows you to
# refresh the OAuth2 token here as needed
@event.listens_for(engine, 'do_connect')
def auto_iam_authn(dialect, conn_rec, cargs, cparams):  
    # refresh credentials if expired
    if not creds.valid:
        request = Request()
        creds.refresh(request)
    
    # use OAuth2 token as password
    cparams["password"] = str(creds.token)

# interact with Cloud SQL database using connection pool
with engine.connect() as conn:
    # query database
    time = conn.execute(sqlalchemy.text("SELECT NOW()")).fetchone()
    print(time[0])


bapi24 commented Aug 15, 2023

@jackwotherspoon is there a way connecting to the instance with IAM without using sqlalchemy but using just psycopg2?

# example
with psycopg2.connect(url) as con:
    with con.cursor() as cur:
        cur.execute("SELECT * FROM test.table")
        rows = cur.fetchall()
        print(rows)


jackwotherspoon commented Aug 15, 2023

@bapi24

@jackwotherspoon is there a way connecting to the instance with IAM without using sqlalchemy but using just psycopg2?

Yes, you can still use the same concept as above. You can set the password to that of the IAM Principal's OAuth2 token. You will just have to manage refreshing the credentials yourself within your application to make sure the credentials object has not expired.

However, we normally recommend taking advantage of connection pooling in production environments, so we strongly recommend the above SQLAlchemy example for most cases.

import psycopg2

import google.auth
from google.auth.transport.requests import Request

# initialize ADC creds
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/sqlservice.login"]
)

# refresh credentials if expired (manage this code in your application)
if not creds.valid:
    request = Request()
    creds.refresh(request)


# Cloud SQL Instance IP address
instance_ip = '<INSTANCE_IP>'

# interact with Cloud SQL database using psycopg2 connection
with psycopg2.connect(f"dbname=mydb [email protected] password={str(creds.token)} host={instance_ip} sslmode=require") as con:
    with con.cursor() as cur:
        cur.execute("SELECT * FROM test.table")
        rows = cur.fetchall()
        print(rows)


bapi24 commented Aug 15, 2023

@jackwotherspoon that worked, thank you!
A few questions:

@jackwotherspoon

@bapi24 Glad it worked! Let me answer your questions 😄

The documentation below says to use cloudsql.instances.login IAM permission to a custom role. If we attach that to a service-account, would service-account be able to talk to the DB same way?

The above code can be used for both an IAM User or IAM service account! So yes, you can add your service account as an IAM database user and have it login and access your DB 😄

Regarding the way we are getting the token here, is this a workaround for doing gcloud auth application-default login?

No it is not a workaround as the two perform slightly different functions (that work together). The way we are getting the token in code is fetching the Application Default Credentials from the environment and adding the required scopes for the OAuth2 token to be able to login to the DB. The gcloud auth application-default login command is setting the Application Default Credentials for your local environment. This is the key distinction, one gets the ADC credentials and one sets the ADC credentials.

The above code will work locally and in an application hosted in the Cloud (Cloud Run, App Engine etc.) since the ADC credentials in the Cloud are set automatically for you to the service account used with the application (no need to run a gcloud auth command).

would psycopg be supported in cloudsql-python-connector anytime soon?

It is most likely that psycopg will not be supported anytime soon in the Cloud SQL Python Connector (if ever) due to the vast overhead required to patch the psycopg library as well as the libpq library. This work coupled with the fact that the above samples (SQLAlchemy and direct psycopg2) already work very nicely without the need for the Python Connector makes this a low priority effort.


bapi24 commented Aug 15, 2023

Got it, thank you very much for explaining it in detail 🙌

@AndreasBergmeier6176

FYI to everyone: it is possible to connect with automatic IAM authentication without the need for the Python Connector:

Thanks for providing these examples, although it would be really good if they were in the official documentation.

One thing that we still cannot make work is the psycopg2 -> Cloud SQL Auth Proxy -> Postgres connection in GKE using automatic IAM. I was under the impression that you would not need the login token there - am I wrong?


enocom commented Oct 13, 2023

Correct -- the Proxy will handle that for you. If you'd like to open an issue in the Proxy repo, I'd be happy to help you out.

The approach described above assumes you bypass the Connector and the Proxy entirely.
