-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17568: SolrCloud shouldn't proxy/route a core specific request #2885
Conversation
Because cores are a node level concept, not a cluster one. This also removed a collection loop.
# Conflicts: # solr/CHANGES.txt
@@ -36,7 +36,8 @@ Improvements | |||
|
|||
Optimizations | |||
--------------------- | |||
(No changes) | |||
* SOLR-17568: The CLI bin/solr export tool now contacts the appropriate nodes directly for data instead of proxying through one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@epugh you might be interested in this aspect as it touches the CLI. Or perhaps not since you are working on CLI infrastructure and not the business of what they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix.. I had no idea that it could be so inefficient. In the future, export might leverage a streaming expression, i wonder if Streaming Expressions is currently smart enough for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shrug; I found the issue above by seeing a test failure. There's no test failure relating to what you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fail to understand why the previous code with even bothering to go through all the collections from the ClusterState.
The new code looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, seems like a gnarly pattern to spot and fix? One nit about the major changes text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is some nice simplification!
=== SolrCloud request routing | ||
|
||
HTTP requests to SolrCloud that are for a specific core must be delivered to the node with that core. | ||
Past SolrCloud versions would scan all cores everywhere to find a node with the core, and if found then proxy the request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like a slightly dangling sentence... I now understand the "past", now what is happening? Or was this just a a inefficient bug pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now what is happening
Will HTTP 404 if Solr can't resolve the request. I should add that. I don't find the last sentence to be dangling but feel free to offer a rewording suggestion.
|
||
try (SolrClient client = CLIUtils.getSolrClient(baseurl, credentials)) { | ||
// reference the replica's node URL, not the baseUrl in scope, which could be anywhere | ||
try (SolrClient client = CLIUtils.getSolrClient(replica.getBaseUrl(), credentials)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a CLI tool, and the CLI tool is potentially being used from a computer outside the cluster itself, I wonder if a side-effect of this might be that it won't work through reverse proxy / ingress with a different external DNS name than what Solr has on the inside.
If I interpret this code correctly, replica.getBaseUrl()
is data fetched from ZK clusterstate, while baseUrl
was the user-supplied --solr-url
. Thus it could be that export tool would currently work through a proxy / ingress due to the auto proxying, but may not work with this change.
A related and similar issue is that the Admin UI does not work well behind a reverse proxy, if you try to click on links for other Cores it will all break.
If we truly want to support clients through reverse proxies, I believe the correct fix is for the Solr server to support Forwarded HTTP header (and its X-Forwarded-*
counterparts) and rewrite URLs served in REST responses to adhere to the external address space, not the internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that this change would break for such a scenario. Thankfully the ExportTool is using the Solr/HTTP ClusterStateProvider instead of ZK; thus the URLs may be rewritable by such a proxy thing since it'd be at the HTTP level. But Solr the client would need to be configured to use XML instead of javabin, but I think it's javabin by default. I could imagine adding special support for this for CLUSTERSTATE or COLSTATUS
Despite that issue, I still believe in this change. I don't think it makes sense for SolrCloud to go through such guessing efforts to resolve collection/core ambiguity. Ideally we'd improve on that so that we wouldn't have ambiguity. I could imagine a /solr/collectionName!shardName/select syntax or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s a corner case for sure and not worth modifying until there is a user request. Just wanted to highlight the issue and raise awareness as to whether we want bin/solr to work through reverse proxy.
Thanks for feedback everyone. I'll merge in a day or two. Reminder: this is 10.0 only. |
shuffledSlices = new ArrayList<>(slices); | ||
Collections.shuffle(shuffledSlices, Utils.RANDOM); | ||
} | ||
Iterator<Slice> shuffledSlices = new RandomIterator<>(Utils.RANDOM, slices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why we need to shuffle if we look for a specific core by its name. But I see that the byCoreName parameter is always false (byCoreName is now always false).
So the code wants to find any active replica of any shard in a specific collection, and then return a url with replica.getBaseUrl() + "/" + origCoreName. Do I understand correctly? Is it the intended behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking closer. I see there's some dead code to remove -- byCoreName and don't pass origCorename in either. I'll do another update to clarify.
https://issues.apache.org/jira/browse/SOLR-17568