Reindex subset of vertices #4726
base: master
Signed-off-by: ntisseyre <[email protected]>
Thank you @ntisseyre !
Looks great! I have just two small comments.
public static final ConfigOption<Integer> KEYS_SIZE = new ConfigOption<>(STORAGE_NS,"keys-size",
        "The maximum amount of keys/partitions to retrieve from distributed storage system by JanusGraph in a single request.",
        ConfigOption.Type.MASKABLE, 100);
It seems this config option was not added to the configuration reference, which is why CI is failing.
Could you please execute mvn --quiet clean install -DskipTests=true -pl janusgraph-doc -am
and amend your commit? This will automatically regenerate the configuration reference documentation.
import java.util.List;
import java.util.function.Function;

public class CQLSubsetIterator<TItem> implements Iterator<TItem> {
(nitpick)
Codacy suggests: "Generics names should be a one letter long and upper case."
I usually use all upper case, but not always one letter.
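To make the naming suggestion concrete, here is a minimal sketch of a subset iterator that uses single-letter, upper-case type parameters (K for keys, T for items). It is not the PR's CQLSubsetIterator: the class name BatchedSubsetIterator, the batchSize parameter (standing in for a limit like keys-size), and the fetchBatch callback are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch, NOT the PR's CQLSubsetIterator.
// Walks a known list of keys in chunks of at most batchSize and
// lazily fetches the items for each chunk through fetchBatch.
class BatchedSubsetIterator<K, T> implements Iterator<T> {
    private final List<K> keys;
    private final int batchSize;
    private final Function<List<K>, Iterator<T>> fetchBatch;
    private int nextKeyIndex = 0;       // first key not yet requested
    private Iterator<T> current = null; // items of the batch being consumed

    BatchedSubsetIterator(List<K> keys, int batchSize,
                          Function<List<K>, Iterator<T>> fetchBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
        this.fetchBatch = fetchBatch;
    }

    @Override
    public boolean hasNext() {
        // Request further batches until one yields an item or the keys run out.
        while ((current == null || !current.hasNext()) && nextKeyIndex < keys.size()) {
            int end = Math.min(nextKeyIndex + batchSize, keys.size());
            current = fetchBatch.apply(keys.subList(nextKeyIndex, end));
            nextKeyIndex = end;
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With five keys and a batch size of two, this sketch would issue three fetch calls (two, two, and one key) and skip over any batch that returns no items.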
Summary
This PR introduces a significant optimization to the reindexing process in JanusGraph by allowing a subset of vertices to be reindexed instead of scanning the entire storage. This enhancement provides substantial performance improvements, primarily when the specific subset of vertices for indexing is already known.
NOTE
This feature is currently supported only for CQL storage. Other storage backends still need to be implemented (KeyColumnValueStore.java).
Motivation
Previously, reindexing required scanning all vertices in storage, which could be highly resource-intensive and time-consuming, particularly in large datasets.
This update enables users to focus on a targeted subset of vertices, reducing the time and computational load for reindexing. This is especially beneficial in environments where only specific vertices are relevant to a given index or data update.
Changes
API in JanusGraphManagement
Benefits
Enhanced Flexibility: This feature allows users to update specific sections of the graph more easily without impacting the entire dataset.
Backward Compatibility
This feature is backward compatible and does not impact existing functionality. Users not specifying a subset will still experience the previous behavior of scanning the entire storage.
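The fallback described above could be sketched as follows. This is only an illustration of the behavior, not code from the PR: ReindexScope and keysToVisit are hypothetical names, and treating a null subset as "not specified" is an assumption.

```java
import java.util.List;

// Hypothetical helper, not part of JanusGraph: chooses what a reindex job visits.
final class ReindexScope {
    private ReindexScope() {}

    // Assumption: a null subset means the caller did not specify one,
    // so the job falls back to the previous behavior of a full scan.
    static <K> Iterable<K> keysToVisit(Iterable<K> fullScan, List<K> subset) {
        return subset == null ? fullScan : subset;
    }
}
```

Callers that never pass a subset take the full-scan branch, which is what keeps existing reindex invocations unchanged.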