Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for HTTP/HTTPS protocol for ClickHouse connection #938

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

avelanarius
Copy link
Member

@avelanarius avelanarius commented Oct 31, 2024

This PR adds support for the HTTP (port 8123)/HTTPS (port 8443) protocol to connect to ClickHouse (in addition to the already supported clickhouse:// (port 9000) protocol).

This change is motivated by the fact that the HTTPS endpoint is the first/default endpoint shown on ClickHouse Cloud and by the fact that we saw a user try to use the HTTP endpoint.

Additionally this PR improves the "ClickHouse Connection Doctor" code and the validation of user-provided connection URLs. Related to that, RemoteLogDrainUrl and QuesmaInternalTelemetryUrl now can't be provided in the config - they were overwritten by hardcoded constants either way.

Before this change, Quesma supported only the "clickhouse://" protocol
to connect to ClickHouse. However, ClickHouse also supports the HTTP
protocol (8123) and HTTPS protocol (8443). This PR extends the
ClickHouse connection logic to switch to using HTTP(S) protocol if the
user put that endpoint instead.

This change is motivated by the fact that the HTTPS endpoint is the
first/default endpoint shown on ClickHouse Cloud and by the fact that
we saw a user try to use the HTTP endpoint.
@avelanarius
Copy link
Member Author

@mieciu I vaguely remember you mentioning something about http:// endpoint in Hydrolix, but from all I've seen Hydrolix also uses the clickhouse:// endpoint - just making sure that this change is also compatible with Hydrolix (I only tested ClickHouse OSS and ClickHouse Cloud).

@avelanarius avelanarius force-pushed the clickhouse-connection-user-friendly branch from e97a153 to 17ef2cd Compare October 31, 2024 16:13
@avelanarius avelanarius force-pushed the clickhouse-connection-user-friendly branch from 20df302 to ba73ad3 Compare October 31, 2024 16:41
@mieciu
Copy link
Member

mieciu commented Nov 4, 2024

@mieciu I vaguely remember you mentioning something about http:// endpoint in Hydrolix, but from all I've seen Hydrolix also uses the clickhouse:// endpoint - just making sure that this change is also compatible with Hydrolix (I only tested ClickHouse OSS and ClickHouse Cloud).

So far we didn't really care about the protocol prefix - once connection has been established with host we relied on driver defaults, so that'd be something to check. I'm not sure if querying over http protocol is supported there.

@avelanarius avelanarius marked this pull request as ready for review November 4, 2024 09:57
@avelanarius avelanarius requested a review from a team as a code owner November 4, 2024 09:57
avelanarius added a commit to avelanarius/quesma that referenced this pull request Nov 4, 2024
This PR improves the validation of connection URLs in configuration,
giving clearer error messages to the user and the expected format of the
connection string.

Related to that, RemoteLogDrainUrl and QuesmaInternalTelemetryUrl now
can't be provided in the config - they were overwritten by
hardcoded constants either way.

Additionally, ClickHouse connection doctor is improved:
- If connection succeeded it suggests making sure that username/password
  is correct
- It now skips TLS trial connections if the user disabled TLS in the
  config (and informs of that fact)
- It tries default ports in more cases

This PR is extracted from larger PR QuesmaOrg#938.
@avelanarius
Copy link
Member Author

Since there are some quirks with HTTP(S) in Hydrolix and this PR might not be mergable as-is, I extracted only the validation improvements to this (mergable) PR #946

avelanarius added a commit to avelanarius/quesma that referenced this pull request Nov 4, 2024
This PR improves the validation of connection URLs in configuration,
giving clearer error messages to the user and the expected format of the
connection string.

Related to that, RemoteLogDrainUrl and QuesmaInternalTelemetryUrl now
can't be provided in the config - they were overwritten by
hardcoded constants either way.

Additionally, ClickHouse connection doctor is improved:
- If connection succeeded it suggests making sure that username/password
  is correct
- It now skips TLS trial connections if the user disabled TLS in the
  config (and informs of that fact)
- It tries default ports in more cases

This PR is extracted from larger PR QuesmaOrg#938.
@avelanarius
Copy link
Member Author

And the quirks are:

  • Hydrolix native protocol endpoint uses the https:// scheme: https://docs.hydrolix.io/docs/query-some-data

    Connect to Hydrolix
    HTTPS API: https://{myhostname}}.hydrolix.live/query
    Clickhouse Native: https://{myhostname}}.hydrolix.live:9440

  • Hydrolix HTTPS API has a different API than ClickHouse HTTP(S) API: curl --request POST \ --url https://hostname.hydrolix.live/query \ --header 'accept: text/json' \ --header 'content-type: text/plain' \ --data 'SELECT count() FROM sample.taxi_trips' vs curl 'http://localhost:8123/?query=SELECT%201'

@avelanarius avelanarius changed the title Make ClickHouse connection configuration more user-friendly Add support for HTTP/HTTPS protocol for ClickHouse connection Nov 4, 2024
@avelanarius avelanarius marked this pull request as draft November 4, 2024 11:00
@mieciu
Copy link
Member

mieciu commented Nov 4, 2024

Yea just to leave trace after our investigation:

This discrepancy is related to differences in how queries over http are handled in both DBs.

Relevant sources: ClickHouse / Hydrolix.

github-merge-queue bot pushed a commit that referenced this pull request Nov 4, 2024
This PR improves the validation of connection URLs in configuration,
giving clearer error messages to the user and the expected format of the
connection string.

Related to that, RemoteLogDrainUrl and QuesmaInternalTelemetryUrl now
can't be provided in the config - they were overwritten by hardcoded
constants either way.

Additionally, ClickHouse connection doctor is improved:
- If connection succeeded it suggests making sure that username/password
is correct
- It now skips TLS trial connections if the user disabled TLS in the
config (and informs of that fact)
- It tries default ports in more cases

This PR is extracted from larger PR #938.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants