Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17568: SolrCloud shouldn't proxy/route a core specific request #2885

Merged
merged 8 commits into from
Dec 2, 2024

Conversation

dsmiley
Copy link
Contributor

@dsmiley dsmiley commented Nov 26, 2024

Because cores are a node level concept, not a cluster one.
This also removed a collection loop.
@github-actions github-actions bot added documentation Improvements or additions to documentation cat:cli labels Nov 27, 2024
# Conflicts:
#	solr/CHANGES.txt
@@ -36,7 +36,8 @@ Improvements

Optimizations
---------------------
(No changes)
* SOLR-17568: The CLI bin/solr export tool now contacts the appropriate nodes directly for data instead of proxying through one.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epugh you might be interested in this aspect as it touches the CLI. Or perhaps not since you are working on CLI infrastructure and not the business of what they do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix.. I had no idea that it could be so inefficient. In the future, export might leverage a streaming expression, i wonder if Streaming Expressions is currently smart enough for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shrug; I found the issue above by seeing a test failure. There's no test failure relating to what you say.

Copy link
Contributor

@gbellaton gbellaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to understand why the previous code with even bothering to go through all the collections from the ClusterState.
The new code looks good to me.

Copy link
Contributor

@epugh epugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, seems like a gnarly pattern to spot and fix? One nit about the major changes text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some nice simplification!

=== SolrCloud request routing

HTTP requests to SolrCloud that are for a specific core must be delivered to the node with that core.
Past SolrCloud versions would scan all cores everywhere to find a node with the core, and if found then proxy the request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like a slightly dangling sentence... I now understand the "past", now what is happening? Or was this just a a inefficient bug pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now what is happening

Will HTTP 404 if Solr can't resolve the request. I should add that. I don't find the last sentence to be dangling but feel free to offer a rewording suggestion.


try (SolrClient client = CLIUtils.getSolrClient(baseurl, credentials)) {
// reference the replica's node URL, not the baseUrl in scope, which could be anywhere
try (SolrClient client = CLIUtils.getSolrClient(replica.getBaseUrl(), credentials)) {
Copy link
Contributor

@janhoy janhoy Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a CLI tool, and the CLI tool is potentially being used from a computer outside the cluster itself, I wonder if a side-effect of this might be that it won't work through reverse proxy / ingress with a different external DNS name than what Solr has on the inside.

If I interpret this code correctly, replica.getBaseUrl() is data fetched from ZK clusterstate, while baseUrl was the user-supplied --solr-url. Thus it could be that export tool would currently work through a proxy / ingress due to the auto proxying, but may not work with this change.

A related and similar issue is that the Admin UI does not work well behind a reverse proxy, if you try to click on links for other Cores it will all break.

If we truly want to support clients through reverse proxies, I believe the correct fix is for the Solr server to support Forwarded HTTP header (and its X-Forwarded-* counterparts) and rewrite URLs served in REST responses to adhere to the external address space, not the internal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that this change would break for such a scenario. Thankfully the ExportTool is using the Solr/HTTP ClusterStateProvider instead of ZK; thus the URLs may be rewritable by such a proxy thing since it'd be at the HTTP level. But Solr the client would need to be configured to use XML instead of javabin, but I think it's javabin by default. I could imagine adding special support for this for CLUSTERSTATE or COLSTATUS

Despite that issue, I still believe in this change. I don't think it makes sense for SolrCloud to go through such guessing efforts to resolve collection/core ambiguity. Ideally we'd improve on that so that we wouldn't have ambiguity. I could imagine a /solr/collectionName!shardName/select syntax or similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s a corner case for sure and not worth modifying until there is a user request. Just wanted to highlight the issue and raise awareness as to whether we want bin/solr to work through reverse proxy.

@dsmiley
Copy link
Contributor Author

dsmiley commented Nov 27, 2024

Thanks for feedback everyone. I'll merge in a day or two. Reminder: this is 10.0 only.

shuffledSlices = new ArrayList<>(slices);
Collections.shuffle(shuffledSlices, Utils.RANDOM);
}
Iterator<Slice> shuffledSlices = new RandomIterator<>(Utils.RANDOM, slices);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why we need to shuffle if we look for a specific core by its name. But I see that the byCoreName parameter is always false (byCoreName is now always false).
So the code wants to find any active replica of any shard in a specific collection, and then return a url with replica.getBaseUrl() + "/" + origCoreName. Do I understand correctly? Is it the intended behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking closer. I see there's some dead code to remove -- byCoreName and don't pass origCorename in either. I'll do another update to clarify.

@dsmiley dsmiley merged commit 3e35891 into apache:main Dec 2, 2024
4 checks passed
@dsmiley dsmiley deleted the solr-17568 branch December 2, 2024 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants