-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17568: SolrCloud shouldn't proxy/route a core specific request #2885
Changes from 6 commits
dea80ed
eaf76ba
dc4c37c
a357f4a
1a6be13
f0201b6
b1cf02b
8969ae0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -660,8 +660,8 @@ class CoreHandler { | |
} | ||
|
||
boolean exportDocsFromCore() throws IOException, SolrServerException { | ||
|
||
try (SolrClient client = CLIUtils.getSolrClient(baseurl, credentials)) { | ||
// reference the replica's node URL, not the baseUrl in scope, which could be anywhere | ||
try (SolrClient client = CLIUtils.getSolrClient(replica.getBaseUrl(), credentials)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this is a CLI tool, and the CLI tool is potentially being used from a computer outside the cluster itself, I wonder if a side-effect of this might be that it won't work through reverse proxy / ingress with a different external DNS name than what Solr has on the inside. If I interpret this code correctly, A related and similar issue is that the Admin UI does not work well behind a reverse proxy, if you try to click on links for other Cores it will all break. If we truly want to support clients through reverse proxies, I believe the correct fix is for the Solr server to support Forwarded HTTP header (and its There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's true that this change would break for such a scenario. Thankfully the ExportTool is using the Solr/HTTP ClusterStateProvider instead of ZK; thus the URLs may be rewritable by such a proxy thing since it'd be at the HTTP level. But Solr the client would need to be configured to use XML instead of javabin, but I think it's javabin by default. I could imagine adding special support for this for CLUSTERSTATE or COLSTATUS Despite that issue, I still believe in this change. I don't think it makes sense for SolrCloud to go through such guessing efforts to resolve collection/core ambiguity. Ideally we'd improve on that so that we wouldn't have ambiguity. I could imagine a /solr/collectionName!shardName/select syntax or similar. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It’s a corner case for sure and not worth modifying until there is a user request. Just wanted to highlight the issue and raise awareness as to whether we want bin/solr to work through reverse proxy. |
||
expectedDocs = getDocCount(replica.getCoreName(), client, query); | ||
QueryRequest request; | ||
ModifiableSolrParams params = new ModifiableSolrParams(); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is some nice simplification! |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -40,7 +40,6 @@ | |
import java.lang.invoke.MethodHandles; | ||
import java.nio.charset.StandardCharsets; | ||
import java.util.ArrayList; | ||
import java.util.Arrays; | ||
import java.util.Collection; | ||
import java.util.Collections; | ||
import java.util.Enumeration; | ||
|
@@ -1090,39 +1089,18 @@ private SolrCore checkProps(ZkNodeProps zkProps) { | |
return core; | ||
} | ||
|
||
private List<Slice> getSlicesForAllCollections(ClusterState clusterState, boolean activeSlices) { | ||
// looks across *all* collections | ||
if (activeSlices) { | ||
return clusterState | ||
.collectionStream() | ||
.flatMap(coll -> Arrays.stream(coll.getActiveSlicesArr())) | ||
.toList(); | ||
} else { | ||
return clusterState.collectionStream().flatMap(coll -> coll.getSlices().stream()).toList(); | ||
} | ||
} | ||
|
||
protected String getRemoteCoreUrl(String collectionName, String origCorename) | ||
throws SolrException { | ||
ClusterState clusterState = cores.getZkController().getClusterState(); | ||
final DocCollection docCollection = clusterState.getCollectionOrNull(collectionName); | ||
Slice[] slices = (docCollection != null) ? docCollection.getActiveSlicesArr() : null; | ||
List<Slice> activeSlices; | ||
if (docCollection == null) { | ||
return null; | ||
} | ||
Collection<Slice> activeSlices = docCollection.getActiveSlices(); | ||
boolean byCoreName = false; | ||
|
||
int totalReplicas = 0; | ||
|
||
if (slices == null) { | ||
byCoreName = true; | ||
// all collections! | ||
activeSlices = getSlicesForAllCollections(clusterState, true); | ||
if (activeSlices.isEmpty()) { | ||
activeSlices = getSlicesForAllCollections(clusterState, false); | ||
} | ||
} else { | ||
activeSlices = List.of(slices); | ||
} | ||
|
||
for (Slice s : activeSlices) { | ||
totalReplicas += s.getReplicas().size(); | ||
} | ||
|
@@ -1145,40 +1123,32 @@ protected String getRemoteCoreUrl(String collectionName, String origCorename) | |
"No active replicas found for collection: " + collectionName); | ||
} | ||
|
||
String coreUrl = | ||
getCoreUrl(collectionName, origCorename, clusterState, activeSlices, byCoreName, true); | ||
String coreUrl = getCoreUrl(origCorename, clusterState, activeSlices, byCoreName, true); | ||
|
||
if (coreUrl == null) { | ||
coreUrl = | ||
getCoreUrl(collectionName, origCorename, clusterState, activeSlices, byCoreName, false); | ||
coreUrl = getCoreUrl(origCorename, clusterState, activeSlices, byCoreName, false); | ||
} | ||
|
||
return coreUrl; | ||
} | ||
|
||
private String getCoreUrl( | ||
String collectionName, | ||
String origCorename, | ||
ClusterState clusterState, | ||
List<Slice> slices, | ||
Collection<Slice> slices, | ||
boolean byCoreName, | ||
boolean activeReplicas) { | ||
String coreUrl; | ||
Set<String> liveNodes = clusterState.getLiveNodes(); | ||
|
||
List<Slice> shuffledSlices; | ||
if (slices.size() < 2) { | ||
shuffledSlices = slices; | ||
} else { | ||
shuffledSlices = new ArrayList<>(slices); | ||
Collections.shuffle(shuffledSlices, Utils.RANDOM); | ||
} | ||
Iterator<Slice> shuffledSlices = new RandomIterator<>(Utils.RANDOM, slices); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was wondering why we need to shuffle if we look for a specific core by its name. But I see that the byCoreName parameter is always false (byCoreName is now always false). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for looking closer. I see there's some dead code to remove -- byCoreName and don't pass origCorename in either. I'll do another update to clarify. |
||
while (shuffledSlices.hasNext()) { | ||
Slice slice = shuffledSlices.next(); | ||
|
||
for (Slice slice : shuffledSlices) { | ||
List<Replica> randomizedReplicas = new ArrayList<>(slice.getReplicas()); | ||
Collections.shuffle(randomizedReplicas, Utils.RANDOM); | ||
Iterator<Replica> shuffledReplicas = new RandomIterator<>(Utils.RANDOM, slice.getReplicas()); | ||
while (shuffledReplicas.hasNext()) { | ||
Replica replica = shuffledReplicas.next(); | ||
|
||
for (Replica replica : randomizedReplicas) { | ||
if (!activeReplicas | ||
|| (liveNodes.contains(replica.getNodeName()) | ||
&& replica.getState() == Replica.State.ACTIVE)) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,6 +60,11 @@ Users who previously relied on collection-specific URLs to avoid including the c | |
|
||
The service installer now installs a `systemd` startup script instead of an `init.d` startup script. It is up to the user to uninstall any existing `init.d` script when upgrading. | ||
|
||
=== SolrCloud request routing | ||
|
||
HTTP requests to SolrCloud that are for a specific core must be delivered to the node with that core. | ||
Past SolrCloud versions would scan all cores everywhere to find a node with the core, and if found then proxy the request. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Feels like a slightly dangling sentence... I now understand the "past", now what is happening? Or was this just a a inefficient bug pattern? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Will HTTP 404 if Solr can't resolve the request. I should add that. I don't find the last sentence to be dangling but feel free to offer a rewording suggestion. |
||
|
||
=== Deprecation removals | ||
|
||
* The `jaegertracer-configurator` module, which was deprecated in 9.2, is removed. Users should migrate to the `opentelemetry` module. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@epugh you might be interested in this aspect as it touches the CLI. Or perhaps not since you are working on CLI infrastructure and not the business of what they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix.. I had no idea that it could be so inefficient. In the future, export might leverage a streaming expression, i wonder if Streaming Expressions is currently smart enough for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shrug; I found the issue above by seeing a test failure. There's no test failure relating to what you say.