SOLR-17568: SolrCloud shouldn't proxy/route a core specific request #2885

Merged: 8 commits merged on Dec 2, 2024. Changes from 6 commits.
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion solr/CHANGES.txt
@@ -36,7 +36,8 @@ Improvements

Optimizations
---------------------
-(No changes)
+* SOLR-17568: The CLI bin/solr export tool now contacts the appropriate nodes directly for data instead of proxying through one.
Contributor Author:
@epugh you might be interested in this aspect as it touches the CLI. Or perhaps not, since you are working on CLI infrastructure rather than the business of what the tools do.

Contributor:
Nice fix... I had no idea that it could be so inefficient. In the future, export might leverage a streaming expression; I wonder if Streaming Expressions is currently smart enough for this?

Contributor Author:
Shrug; I found the issue above by seeing a test failure. There's no test failure relating to what you say.

+  (David Smiley)

Bug Fixes
---------------------
@@ -130,6 +131,8 @@ Other Changes

* SOLR-16903: Update CLI tools to use java.nio.file.Path instead of java.io.File (Andrey Bozhko)

+* SOLR-17568: SolrCloud no longer reroutes/proxies a core request to another node if not found locally. (David Smiley)

================== 9.8.0 ==================
New Features
---------------------
4 changes: 2 additions & 2 deletions solr/core/src/java/org/apache/solr/cli/ExportTool.java
@@ -660,8 +660,8 @@ class CoreHandler {
}

  boolean exportDocsFromCore() throws IOException, SolrServerException {
-
-    try (SolrClient client = CLIUtils.getSolrClient(baseurl, credentials)) {
+    // reference the replica's node URL, not the baseUrl in scope, which could be anywhere
+    try (SolrClient client = CLIUtils.getSolrClient(replica.getBaseUrl(), credentials)) {
Contributor @janhoy (Nov 27, 2024):

Since this is a CLI tool, and the CLI tool is potentially being used from a computer outside the cluster itself, I wonder if a side effect of this might be that it won't work through a reverse proxy / ingress with a different external DNS name than what Solr has on the inside.

If I interpret this code correctly, replica.getBaseUrl() is data fetched from the ZK cluster state, while baseUrl was the user-supplied --solr-url. Thus the export tool may currently work through a proxy / ingress thanks to the auto-proxying, but may not with this change.

A related and similar issue is that the Admin UI does not work well behind a reverse proxy; if you try to click on links for other cores it will all break.

If we truly want to support clients through reverse proxies, I believe the correct fix is for the Solr server to support the Forwarded HTTP header (and its X-Forwarded-* counterparts) and rewrite URLs served in REST responses to adhere to the external address space, not the internal one.
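To make that suggestion concrete, here is a minimal sketch of such a rewrite, assuming a hypothetical helper fed the raw X-Forwarded-Proto / X-Forwarded-Host header values (nothing below is existing Solr code):

```java
/**
 * Hypothetical sketch, not Solr code: derive the externally visible base URL
 * from X-Forwarded-* headers, falling back to the internal URL when absent.
 */
public final class ForwardedUrlRewriter {

  private ForwardedUrlRewriter() {}

  /**
   * @param forwardedProto value of the X-Forwarded-Proto header, or null if absent
   * @param forwardedHost value of the X-Forwarded-Host header, or null if absent
   * @param internalBaseUrl the URL Solr knows itself by, e.g. "http://node1:8983/solr"
   */
  public static String externalBaseUrl(
      String forwardedProto, String forwardedHost, String internalBaseUrl) {
    if (forwardedHost == null) {
      return internalBaseUrl; // no proxy in front; the internal URL is usable as-is
    }
    String proto = (forwardedProto != null) ? forwardedProto : "http";
    // keep the path portion of the internal URL (e.g. "/solr"); swap scheme and authority
    String path = internalBaseUrl.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*", "");
    return proto + "://" + forwardedHost + path;
  }
}
```

A real implementation would also need to honor the standardized Forwarded header (RFC 7239) and only trust these headers when they come from known proxies.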

Contributor Author:
It's true that this change would break that scenario. Thankfully the ExportTool uses the Solr/HTTP ClusterStateProvider instead of ZK; thus the URLs may be rewritable by such a proxy, since it'd be at the HTTP level. But the Solr client would need to be configured to use XML instead of javabin, and I think javabin is the default. I could imagine adding special support for this to CLUSTERSTATE or COLSTATUS.

Despite that issue, I still believe in this change. I don't think it makes sense for SolrCloud to go through such guessing efforts to resolve collection/core ambiguity. Ideally we'd improve on that so that we wouldn't have ambiguity. I could imagine a /solr/collectionName!shardName/select syntax or similar.
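Purely to make that hypothetical syntax concrete (it is an idea floated here, not anything Solr implements), the ambiguity would disappear because the path self-describes its parts:

```java
public class ShardPathSyntaxSketch {
  public static void main(String[] args) {
    // Hypothetical "collection!shard" addressing, e.g. from a path like
    // /solr/techproducts!shard1/select; the names here are made up.
    String target = "techproducts!shard1";
    int bang = target.indexOf('!');
    String collection = (bang < 0) ? target : target.substring(0, bang);
    String shard = (bang < 0) ? null : target.substring(bang + 1);
    System.out.println(collection + " / " + shard); // prints: techproducts / shard1
  }
}
```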

Contributor:
It’s a corner case for sure and not worth addressing until there is a user request. Just wanted to highlight the issue and raise awareness as to whether we want bin/solr to work through a reverse proxy.

expectedDocs = getDocCount(replica.getCoreName(), client, query);
QueryRequest request;
ModifiableSolrParams params = new ModifiableSolrParams();
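For context on the fix above, a hedged sketch of why replica.getBaseUrl() is the right target: SolrJ's cluster-state types expose, per replica, the URL of the node actually hosting the core (the helper class here is illustrative, not ExportTool code):

```java
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

final class CoreLocations {
  // Sketch: print the node URL hosting each core of a collection, so a
  // client can contact that node directly instead of relying on proxying.
  static void print(ClusterState clusterState, String collectionName) {
    DocCollection coll = clusterState.getCollection(collectionName);
    for (Slice slice : coll.getActiveSlices()) {
      for (Replica replica : slice.getReplicas()) {
        // e.g. "http://node1:8983/solr hosts gettingstarted_shard1_replica_n1"
        System.out.println(replica.getBaseUrl() + " hosts " + replica.getCoreName());
      }
    }
  }
}
```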
56 changes: 13 additions & 43 deletions solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java
Contributor:
This is some nice simplification!

@@ -40,7 +40,6 @@
import java.lang.invoke.MethodHandles;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
-import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Enumeration;
@@ -1090,39 +1089,18 @@ private SolrCore checkProps(ZkNodeProps zkProps) {
return core;
}

-  private List<Slice> getSlicesForAllCollections(ClusterState clusterState, boolean activeSlices) {
-    // looks across *all* collections
-    if (activeSlices) {
-      return clusterState
-          .collectionStream()
-          .flatMap(coll -> Arrays.stream(coll.getActiveSlicesArr()))
-          .toList();
-    } else {
-      return clusterState.collectionStream().flatMap(coll -> coll.getSlices().stream()).toList();
-    }
-  }
-
  protected String getRemoteCoreUrl(String collectionName, String origCorename)
      throws SolrException {
    ClusterState clusterState = cores.getZkController().getClusterState();
    final DocCollection docCollection = clusterState.getCollectionOrNull(collectionName);
-    Slice[] slices = (docCollection != null) ? docCollection.getActiveSlicesArr() : null;
-    List<Slice> activeSlices;
+    if (docCollection == null) {
+      return null;
+    }
+    Collection<Slice> activeSlices = docCollection.getActiveSlices();
    boolean byCoreName = false;

    int totalReplicas = 0;

-    if (slices == null) {
-      byCoreName = true;
-      // all collections!
-      activeSlices = getSlicesForAllCollections(clusterState, true);
-      if (activeSlices.isEmpty()) {
-        activeSlices = getSlicesForAllCollections(clusterState, false);
-      }
-    } else {
-      activeSlices = List.of(slices);
-    }
-
    for (Slice s : activeSlices) {
      totalReplicas += s.getReplicas().size();
    }
@@ -1145,40 +1123,32 @@ protected String getRemoteCoreUrl(String collectionName, String origCorename)
"No active replicas found for collection: " + collectionName);
}

-    String coreUrl =
-        getCoreUrl(collectionName, origCorename, clusterState, activeSlices, byCoreName, true);
+    String coreUrl = getCoreUrl(origCorename, clusterState, activeSlices, byCoreName, true);

    if (coreUrl == null) {
-      coreUrl =
-          getCoreUrl(collectionName, origCorename, clusterState, activeSlices, byCoreName, false);
+      coreUrl = getCoreUrl(origCorename, clusterState, activeSlices, byCoreName, false);
    }

    return coreUrl;
  }

  private String getCoreUrl(
-      String collectionName,
      String origCorename,
      ClusterState clusterState,
-      List<Slice> slices,
+      Collection<Slice> slices,
      boolean byCoreName,
      boolean activeReplicas) {
    String coreUrl;
    Set<String> liveNodes = clusterState.getLiveNodes();

-    List<Slice> shuffledSlices;
-    if (slices.size() < 2) {
-      shuffledSlices = slices;
-    } else {
-      shuffledSlices = new ArrayList<>(slices);
-      Collections.shuffle(shuffledSlices, Utils.RANDOM);
-    }
+    Iterator<Slice> shuffledSlices = new RandomIterator<>(Utils.RANDOM, slices);
Contributor:
I was wondering why we need to shuffle if we look for a specific core by its name. But I see that the byCoreName parameter is now always false.
So the code wants to find any active replica of any shard in a specific collection, and then return a URL of replica.getBaseUrl() + "/" + origCoreName. Do I understand correctly? Is that the intended behavior?

Contributor Author:
Thanks for looking closer. I see there's some dead code to remove: byCoreName, and origCorename needn't be passed in either. I'll do another update to clarify.

-    for (Slice slice : shuffledSlices) {
+    while (shuffledSlices.hasNext()) {
+      Slice slice = shuffledSlices.next();

-      List<Replica> randomizedReplicas = new ArrayList<>(slice.getReplicas());
-      Collections.shuffle(randomizedReplicas, Utils.RANDOM);
-      for (Replica replica : randomizedReplicas) {
+      Iterator<Replica> shuffledReplicas = new RandomIterator<>(Utils.RANDOM, slice.getReplicas());
+      while (shuffledReplicas.hasNext()) {
+        Replica replica = shuffledReplicas.next();

        if (!activeReplicas
            || (liveNodes.contains(replica.getNodeName())
                && replica.getState() == Replica.State.ACTIVE)) {

@@ -60,6 +60,11 @@ Users who previously relied on collection-specific URLs to avoid including the c

The service installer now installs a `systemd` startup script instead of an `init.d` startup script. It is up to the user to uninstall any existing `init.d` script when upgrading.

+=== SolrCloud request routing
+
+HTTP requests to SolrCloud that are for a specific core must be delivered to the node with that core.
+Past SolrCloud versions would scan all cores everywhere to find a node with the core, and if found then proxy the request.
Contributor:
Feels like a slightly dangling sentence... I now understand the "past"; now what is happening? Or was this just an inefficient bug pattern?

Contributor Author:
> now what is happening

Solr will respond with HTTP 404 if it can't resolve the request; I should add that. I don't find the last sentence to be dangling, but feel free to offer a rewording suggestion.
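To make the new contract concrete: a core-level request must go to the node hosting that core, or it fails. A hedged SolrJ sketch (the URL and core name are assumptions; Http2SolrClient and query() are standard SolrJ):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DirectCoreRequest {
  public static void main(String[] args) throws Exception {
    // With this change, only the node hosting the core answers a core-level
    // request; any other node returns 404 instead of proxying it onward.
    String hostingNodeUrl = "http://node1:8983/solr"; // assumed node base URL
    String coreName = "techproducts_shard1_replica_n1"; // assumed core name
    try (Http2SolrClient client = new Http2SolrClient.Builder(hostingNodeUrl).build()) {
      QueryResponse rsp = client.query(coreName, new SolrQuery("*:*"));
      System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
  }
}
```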


=== Deprecation removals

* The `jaegertracer-configurator` module, which was deprecated in 9.2, is removed. Users should migrate to the `opentelemetry` module.

@@ -204,25 +204,13 @@ private void addAndQueryDocs(final String baseUrl, int docs) throws Exception {

SolrQuery query = new SolrQuery("*:*");

-    String collectionUrl = baseUrl + "/onenodecollection" + "core";
-    try (SolrClient client = getHttpSolrClient(baseUrl, "onenodecollectioncore")) {
-
-      // it might take a moment for the proxy node to see us in their cloud state
-      waitForNon403or404or503(client, collectionUrl);
-
+    try (SolrClient client = getHttpSolrClient(baseUrl, "onenodecollection")) {
      // add a doc
-      SolrInputDocument doc = new SolrInputDocument();
-      doc.addField("id", docs);
-      client.add(doc);
+      client.add(sdoc("id", docs));
      client.commit();

      QueryResponse results = client.query(query);
      assertEquals(docs - 1, results.getResults().getNumFound());
-    }
-
-    try (SolrClient client = getHttpSolrClient(baseUrl, "onenodecollection")) {
-      QueryResponse results = client.query(query);
-      assertEquals(docs - 1, results.getResults().getNumFound());

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", docs + 1);