
parse serviceName in VerticaReplicator #826

Closed
wants to merge 73 commits into from


@fenic-fawkes (Collaborator) commented Jun 3, 2024

Allow users to provide a service name to use for the source and target databases. If unspecified, we use the first service assigned to the database that has a cluster IP; if supplied, we validate that it is assigned to the appropriate main or sandbox cluster.
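The selection rule above can be sketched in Go. This is a minimal illustration under stated assumptions, not the operator's code: the `Service` struct, its `Sandbox` field, and `pickService` are hypothetical simplifications of the Kubernetes Service objects the operator inspects. Headless services (cluster IP `None`) are skipped when choosing a default.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a hypothetical, simplified view of a Kubernetes Service.
type Service struct {
	Name      string
	ClusterIP string // "" or "None" means no usable cluster IP (e.g. headless)
	Sandbox   string // hypothetical field: "" means the main cluster
}

// pickService returns the service to use for one side of the replication.
// With no requested name, it picks the first service in the given cluster
// that has a cluster IP. With a requested name, it verifies that the service
// belongs to the expected main/sandbox cluster.
func pickService(svcs []Service, requested, sandbox string) (string, error) {
	if requested == "" {
		for _, s := range svcs {
			if s.Sandbox == sandbox && s.ClusterIP != "" && s.ClusterIP != "None" {
				return s.Name, nil
			}
		}
		return "", errors.New("no service with a cluster IP found")
	}
	for _, s := range svcs {
		if s.Name == requested {
			if s.Sandbox != sandbox {
				return "", fmt.Errorf("service %q is not assigned to cluster %q", requested, sandbox)
			}
			return s.Name, nil
		}
	}
	return "", fmt.Errorf("service %q not found", requested)
}

func main() {
	services := []Service{
		{Name: "vdb-headless", ClusterIP: "None"},
		{Name: "vdb-sc1", ClusterIP: "10.0.0.5"},
		{Name: "vdb-sb1", ClusterIP: "10.0.0.9", Sandbox: "sb1"},
	}
	name, err := pickService(services, "", "")
	fmt.Println(name, err) // vdb-sc1 <nil>
}
```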

spilchen and others added 30 commits March 21, 2024 14:37
This pulls in the latest vclusterops library changes. Many of these
changes relate to improvements to the vcluster CLI.
This pulls in the new signature for VStartDatabase. It now returns an
additional value, which we can safely ignore for the operator's usage.
During a replicated upgrade, we must categorize the subclusters into one
of two replica groups. Initially, all subclusters participating in the
upgrade are placed in replica group A. New subclusters created during the
process are assigned to the second group, replica group B. This change
handles the assignment of subclusters at the beginning of the upgrade. To
support this, new status fields are added to the VerticaDB so that this
state is tracked across reconcile iterations.
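A minimal sketch of the initial assignment, assuming a simplified status struct; `ReplicaGroups` and `initReplicaGroups` are hypothetical names, not the VerticaDB API. The key property is idempotence: because the assignment lives in status, a later reconcile iteration must not redo it.

```go
package main

import "fmt"

// ReplicaGroups is a hypothetical stand-in for the VerticaDB status fields
// that track replica group membership across reconcile iterations.
type ReplicaGroups struct {
	GroupA []string // subclusters present when the upgrade started
	GroupB []string // subclusters created during the upgrade
}

// initReplicaGroups assigns every participating subcluster to replica group A
// at the start of the upgrade. If the status already records an assignment
// (from an earlier reconcile iteration), it is returned unchanged.
func initReplicaGroups(status ReplicaGroups, subclusters []string) ReplicaGroups {
	if len(status.GroupA) > 0 || len(status.GroupB) > 0 {
		return status
	}
	status.GroupA = append([]string(nil), subclusters...)
	return status
}

func main() {
	groups := initReplicaGroups(ReplicaGroups{}, []string{"pri1", "sec1"})
	fmt.Println(groups.GroupA, groups.GroupB) // [pri1 sec1] []
	// A later iteration does not reassign, even if more subclusters exist.
	again := initReplicaGroups(groups, []string{"pri1", "sec1", "new"})
	fmt.Println(again.GroupA) // [pri1 sec1]
}
```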
This adds a new controller for sandboxes. It watches ConfigMaps whose
labels indicate that they contain sandbox state. Such a ConfigMap also
contains the name of the VerticaDB whose subclusters are involved in
sandbox/unsandbox. For now, the controller simply reads the VerticaDB
named in the ConfigMap and does nothing else.
Some changes are also made to the VerticaDB API to add new fields
related to sandboxes.
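The ConfigMap filtering can be sketched as below. The label key, the data key, and the `ConfigMap` struct are hypothetical placeholders; the operator defines its own label names and uses the real Kubernetes types.

```go
package main

import "fmt"

// Hypothetical label and data keys; the real operator defines its own.
const (
	sandboxLabelKey = "example.com/sandbox-state"
	vdbNameKey      = "verticaDBName"
)

// ConfigMap is a minimal stand-in for a Kubernetes ConfigMap.
type ConfigMap struct {
	Labels map[string]string
	Data   map[string]string
}

// sandboxVDBName reports whether cm carries sandbox state and, if so, returns
// the name of the VerticaDB whose subclusters are being sandboxed.
func sandboxVDBName(cm ConfigMap) (string, bool) {
	if cm.Labels[sandboxLabelKey] != "true" {
		return "", false // not a sandbox ConfigMap; the controller ignores it
	}
	name, ok := cm.Data[vdbNameKey]
	return name, ok && name != ""
}

func main() {
	cm := ConfigMap{
		Labels: map[string]string{sandboxLabelKey: "true"},
		Data:   map[string]string{vdbNameKey: "vertdb"},
	}
	fmt.Println(sandboxVDBName(cm)) // vertdb true
}
```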
This PR removes labels from the `VerticaReplicator` sample config.
The intention is to improve the user experience on OpenShift when
generating this CR from the web UI, since OpenShift uses this sample to
prepopulate fields in the web UI.
This adds a few webhook rules for replicated upgrade. It's probably not
a complete list, but we can add more later as we find other rules to
check. This commit adds the following:
- Ensures the subclusters involved in the upgrade are stable: they
cannot be removed and their sizes cannot change.
- Allows new subclusters to be added to one of the replica groups, but
they must be secondary subclusters. No new primaries.
- Ensures that subclusters listed in
`.status.updateState.replicaGroups` aren't repeated.
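The three rules can be sketched as a single validation function. This is a hypothetical simplification: `Subcluster` and `validateReplicatedUpgrade` are illustrative names, and for brevity the sketch treats every subcluster in the old spec as "involved in the upgrade", which may be broader than the operator's actual rule.

```go
package main

import "fmt"

// Subcluster is a simplified, hypothetical view of a VerticaDB subcluster spec.
type Subcluster struct {
	Name    string
	Primary bool
	Size    int
}

// validateReplicatedUpgrade returns one message per violated rule:
// existing subclusters are not removed or resized, new subclusters must be
// secondaries, and no subcluster appears twice across the replica groups.
func validateReplicatedUpgrade(old, updated []Subcluster, replicaGroups [][]string) []string {
	var errs []string
	newByName := map[string]Subcluster{}
	for _, sc := range updated {
		newByName[sc.Name] = sc
	}
	oldNames := map[string]bool{}
	for _, sc := range old {
		oldNames[sc.Name] = true
		cur, ok := newByName[sc.Name]
		switch {
		case !ok:
			errs = append(errs, fmt.Sprintf("subcluster %q cannot be removed during upgrade", sc.Name))
		case cur.Size != sc.Size:
			errs = append(errs, fmt.Sprintf("subcluster %q cannot change size during upgrade", sc.Name))
		}
	}
	for _, sc := range updated {
		if !oldNames[sc.Name] && sc.Primary {
			errs = append(errs, fmt.Sprintf("new subcluster %q must be secondary", sc.Name))
		}
	}
	seen := map[string]bool{}
	for _, group := range replicaGroups {
		for _, name := range group {
			if seen[name] {
				errs = append(errs, fmt.Sprintf("subcluster %q repeated in replica groups", name))
			}
			seen[name] = true
		}
	}
	return errs
}

func main() {
	old := []Subcluster{{Name: "pri", Primary: true, Size: 3}}
	updated := []Subcluster{{Name: "pri", Primary: true, Size: 3}, {Name: "extra", Primary: true, Size: 1}}
	for _, e := range validateReplicatedUpgrade(old, updated, [][]string{{"pri"}, {"pri"}}) {
		fmt.Println(e)
	}
}
```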
We now collect the sandbox name in podfacts, and the vsql query has been
updated to return the extra state. We built an interface to get this
information because we intend to use a REST endpoint in the future but
still need vsql support for older releases.
This adds a new leg, e2e-leg-9, to the CI. It will be used to test
changes made in 24.3.0. A notable difference is that this leg uses a
Vertica license so that we can test scaling past 3 nodes, which is
required to test the new replicated upgrade work.
#764)

The `vrep` reconciler enforces a minimum version of 24.3.0 for both the
source and target databases.

---------

Co-authored-by: Roy Paulin <[email protected]>
Co-authored-by: spilchen <[email protected]>
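A minimum-version gate like the one in the `vrep` reconciler can be sketched as a component-wise comparison. The helpers `parseVersion` and `atLeast`, and the accepted version-string format (`v24.3.0`, optionally with a `-N` suffix), are assumptions for this sketch; the operator has its own version utilities.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns strings such as "v24.3.0" or "24.3.0-1" into
// [major, minor, patch]. The accepted format is an assumption.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	v, _, _ = strings.Cut(v, "-") // drop any hotfix/build suffix
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad version component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is >= minV, comparing major, minor,
// then patch.
func atLeast(v string, minV [3]int) (bool, error) {
	got, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if got[i] != minV[i] {
			return got[i] > minV[i], nil
		}
	}
	return true, nil
}

func main() {
	minVersion := [3]int{24, 3, 0}
	for _, v := range []string{"v24.2.0", "v24.3.0", "v24.3.0-1", "v25.1.0"} {
		ok, err := atLeast(v, minVersion)
		fmt.Println(v, ok, err)
	}
}
```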
This PR adds the following webhook rules for sandboxes in the CRD:
- cannot scale (up or down) any subcluster that is in a sandbox
- cannot remove a subcluster that is sandboxed; it must be unsandboxed
first
- cannot have multiple sandboxes with the same name
- cannot have a sandbox image that differs from the main cluster's image
before the sandbox has been set up
- cannot be used on versions older than 24.3.0
- cannot be used before the database has been initialized
- cannot have duplicate subclusters defined in a sandbox
- cannot have a subcluster defined in multiple sandboxes
- cannot have a nonexistent subcluster defined in a sandbox

More sandbox rules can be added later as needed.
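A few of the rules listed above (unique sandbox names, no subcluster listed twice within or across sandboxes, no nonexistent subclusters) can be sketched as follows. `Sandbox` and `validateSandboxes` are hypothetical names, and the remaining rules (scaling, image, version, initialization) are not covered by this sketch.

```go
package main

import "fmt"

// Sandbox is a hypothetical, simplified sandbox entry from the VerticaDB spec.
type Sandbox struct {
	Name        string
	Subclusters []string
}

// validateSandboxes checks a subset of the sandbox webhook rules and returns
// one message per violation.
func validateSandboxes(sandboxes []Sandbox, existing map[string]bool) []string {
	var errs []string
	names := map[string]bool{}
	owner := map[string]string{} // subcluster -> sandbox that first listed it
	for _, sb := range sandboxes {
		if names[sb.Name] {
			errs = append(errs, fmt.Sprintf("duplicate sandbox name %q", sb.Name))
		}
		names[sb.Name] = true
		for _, sc := range sb.Subclusters {
			if !existing[sc] {
				errs = append(errs, fmt.Sprintf("sandbox %q references nonexistent subcluster %q", sb.Name, sc))
				continue
			}
			// Catches duplicates within one sandbox and across sandboxes.
			if prev, ok := owner[sc]; ok {
				errs = append(errs, fmt.Sprintf("subcluster %q listed in sandbox %q and %q", sc, prev, sb.Name))
				continue
			}
			owner[sc] = sb.Name
		}
	}
	return errs
}

func main() {
	existing := map[string]bool{"sc1": true, "sc2": true}
	errs := validateSandboxes([]Sandbox{
		{Name: "sb1", Subclusters: []string{"sc1", "sc1"}},
		{Name: "sb1", Subclusters: []string{"sc3"}},
	}, existing)
	for _, e := range errs {
		fmt.Println(e)
	}
}
```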
We want to use the restart reconciler in both the VerticaDB controller
and the sandbox controller, so this makes some changes to the restart
reconciler to achieve that. The sandbox name (an empty string for the
VerticaDB controller) is passed to the reconciler, which uses it to
target only the pods belonging to that sandbox (or only the pods
belonging to the main cluster if the sandbox name is empty).
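The pod-targeting rule can be sketched as a simple filter. `PodFact` here is a hypothetical, trimmed-down struct, and `podsForCluster` is an illustrative helper rather than the reconciler's real code; the point is that the empty sandbox name naturally selects main-cluster pods.

```go
package main

import "fmt"

// PodFact is a hypothetical, trimmed-down version of the per-pod state the
// operator collects; only the fields needed here are included.
type PodFact struct {
	Name    string
	Sandbox string // "" when the pod belongs to the main cluster
}

// podsForCluster returns the pods a restart reconciler should act on. An empty
// sandbox name selects main-cluster pods (the VerticaDB controller case); a
// non-empty name selects only pods in that sandbox.
func podsForCluster(pods []PodFact, sandbox string) []PodFact {
	var out []PodFact
	for _, p := range pods {
		if p.Sandbox == sandbox {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []PodFact{{Name: "vdb-main-0"}, {Name: "vdb-sb-0", Sandbox: "sb1"}, {Name: "vdb-main-1"}}
	fmt.Println(len(podsForCluster(pods, "")), len(podsForCluster(pods, "sb1"))) // 2 1
}
```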
This change adds stubs to the replicated upgrade reconciler, providing
the functions needed to perform the upgrade. Due to dependencies, such
as sandboxing and replication, comprehensive testing is currently
limited to unit tests. While this logic might evolve during integration,
it provides a framework for how the upgrade should function.

Additionally, I identified new requirements for state management. Two
new annotations have been introduced to the VerticaDB:
- vertica.com/replicated-upgrade-sandbox: to record the name of the
sandbox created for the upgrade
- vertica.com/replicated-upgrade-replicator-name: to track the name of
the VerticaReplicator CR utilized for replicating data to the sandbox.
This PR adds a new webhook rule for sandboxes in the CRD: a sandbox
cannot contain a primary subcluster.
This adds a new fetcher in podfacts that fetches node details from the
running database inside the pod. The new fetcher calls the vclusterOps
API, which sends an HTTP request to the database. It is enabled when the
vclusterOps annotation is set and the Vertica server version is v24.3.0
or newer. The new fetcher should be faster and more reliable than the
old fetcher, which executes vsql within the pod.
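The fetcher selection can be sketched as a small decision function. The fetcher identifiers, `versionAtLeast`, and `chooseFetcher` are hypothetical names; only the enabling conditions (vclusterOps annotation set, server at v24.3.0 or newer) come from the description above.

```go
package main

import "fmt"

// Hypothetical identifiers for the two ways of collecting node details.
const (
	fetcherVSQL        = "vsql"
	fetcherVClusterOps = "vclusterops"
)

// versionAtLeast compares [major, minor, patch] triples component-wise.
func versionAtLeast(v, minV [3]int) bool {
	for i := 0; i < 3; i++ {
		if v[i] != minV[i] {
			return v[i] > minV[i]
		}
	}
	return true
}

// chooseFetcher selects the vclusterOps-based fetcher only when the
// vclusterOps annotation is set and the server is v24.3.0 or newer;
// otherwise it falls back to running vsql inside the pod.
func chooseFetcher(vclusterOpsAnnotation bool, serverVersion [3]int) string {
	if vclusterOpsAnnotation && versionAtLeast(serverVersion, [3]int{24, 3, 0}) {
		return fetcherVClusterOps
	}
	return fetcherVSQL
}

func main() {
	fmt.Println(chooseFetcher(true, [3]int{24, 3, 0}))  // vclusterops
	fmt.Println(chooseFetcher(true, [3]int{24, 2, 0}))  // vsql
	fmt.Println(chooseFetcher(false, [3]int{24, 4, 0})) // vsql
}
```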
In the replicated upgrade process, we must pause client connections to
subclusters in replica group A and redirect them to subclusters in
replica group B. This is the initial stage of that change. Pause/redirect
semantics are not yet supported, because they require server
modifications that have not been implemented. Therefore, we perform an
old-style drain in place of the pause, and update the service labels so
that they point to subclusters in replica group B for the redirect.

---------

Co-authored-by: Roy Paulin <[email protected]>
This change modifies the upgrade reconcilers, both offline and online,
so that they can function in either the main cluster or a sandbox. Our
immediate plan is to use the offline upgrade reconciler within the
sandbox controller, although this will be done in a follow-on task.
This pull request modifies the Vertica replicator controller to invoke
the replication command synchronously. Additionally, it introduces new
status conditions within VerticaReplicator to monitor the controller's
various states.

---------

Co-authored-by: spilchen <[email protected]>
If we had selected an upgrade method but ended up falling back to a
different method because something was incompatible, we now log an
event. For instance, if we requested something other than offline
upgrade but end up in the offline upgrade code path, we log an event
message like this:
```
 Normal   IncompatibleUpgradeRequested  7m20s                  verticadb-operator  Requested upgrade is incompatible with the Vertica deployment. Falling back to offline upgrade.
```
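The fallback decision behind that event can be sketched as below. The `UpgradeMethod` type, the compatibility callback, and `resolveUpgradeMethod` are hypothetical; only the event message text is taken from the sample event above.

```go
package main

import "fmt"

// UpgradeMethod and its constants are a hypothetical simplification.
type UpgradeMethod string

const (
	OfflineUpgrade    UpgradeMethod = "Offline"
	OnlineUpgrade     UpgradeMethod = "Online"
	ReplicatedUpgrade UpgradeMethod = "Replicated"
)

// resolveUpgradeMethod returns the method to run. When the requested method is
// not compatible with the deployment, it falls back to offline upgrade and
// returns a non-empty event message describing the fallback.
func resolveUpgradeMethod(requested UpgradeMethod, compatible func(UpgradeMethod) bool) (UpgradeMethod, string) {
	if requested == OfflineUpgrade || compatible(requested) {
		return requested, ""
	}
	return OfflineUpgrade, "Requested upgrade is incompatible with the Vertica deployment. Falling back to offline upgrade."
}

func main() {
	onlyOnline := func(m UpgradeMethod) bool { return m == OnlineUpgrade }
	method, event := resolveUpgradeMethod(ReplicatedUpgrade, onlyOnline)
	fmt.Println(method)
	if event != "" {
		// The operator would record this as an IncompatibleUpgradeRequested event.
		fmt.Println("event:", event)
	}
}
```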
This PR adds a sandbox reconciler to the VerticaDB controller. The
reconciler adds subclusters to sandboxes in the database: a change to the
sandbox fields in the CRD triggers the sandbox reconciler, which then
calls vclusterOps to add the subclusters to sandboxes.
This adds a status message to the `.status.upgradeStatus` field in
the VerticaDB as the operator goes through the various stages of
replicated upgrade. It reuses the design of the other upgrade methods,
in that the status message only advances forward. This means that if we
need to reconcile another iteration, we won't report the first status
message again.
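The forward-only behavior can be sketched with an ordered list of stage messages. The messages and `advanceStatus` are hypothetical; the operator's actual wording and stages may differ. Because the stored message is compared by position, a re-run reconcile iteration that reaches an earlier stage leaves the status untouched.

```go
package main

import "fmt"

// Hypothetical stage messages; the operator's actual wording may differ.
var replicatedUpgradeStages = []string{
	"Starting replicated upgrade",
	"Creating sandbox",
	"Replicating data to sandbox",
	"Promoting sandbox",
	"Replicated upgrade complete",
}

// advanceStatus returns the status message to store for the given stage. If
// the stored message is already at or past that stage (or the stage is out of
// range), the current message is returned unchanged.
func advanceStatus(current string, stage int) string {
	curIdx := -1
	for i, msg := range replicatedUpgradeStages {
		if msg == current {
			curIdx = i
			break
		}
	}
	if stage <= curIdx || stage >= len(replicatedUpgradeStages) {
		return current
	}
	return replicatedUpgradeStages[stage]
}

func main() {
	status := advanceStatus("", 0)    // first reconcile iteration
	status = advanceStatus(status, 2) // progress to replication
	status = advanceStatus(status, 0) // a re-run iteration reaches stage 0 again
	fmt.Println(status)               // Replicating data to sandbox
}
```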
Automated changes by pre-release.yml GitHub action

Signed-off-by: GitHub <[email protected]>
Signed-off-by: Matt Spilchen <[email protected]>
Co-authored-by: spilchen <[email protected]>
@jizhuoyu (Collaborator) commented:

> _No description provided._

Could we update the PR description, please?

@fenic-fawkes (Collaborator, Author) commented:

This needs one small vcluster change that's under review, but then it should be good to merge.

chinhtranvan and others added 2 commits June 13, 2024 21:20
This PR updates the Vertica image to `20240606` to avoid replication
errors from the server side. This is a temporary fix; we will revert
this change after the issue is fixed on the server side.
This PR builds a reconciler so that during the upgrade we can simply
promote the sandbox after all of the data has been replicated to it.

---------

Co-authored-by: cchen-vertica <[email protected]>
Review threads on pkg/controllers/vrep/replication_reconciler.go (5) and pkg/vadmin/replication_start_vc.go (1), all outdated and resolved.
@cchen-vertica (Collaborator) commented:

I've synced vclusterOps to the GitHub repository. You can update your go.mod to refer to the latest commit in the vcluster repo.

This PR uses vsql to close all active user sessions before database
replication starts in the replicated upgrade, which can speed up the
upgrade. This is a temporary change; we will improve it by pausing all
user sessions soon.
@roypaulin (Collaborator) left a review:

Very close. Sorry if what I say is often confusing. Let me know if I misunderstood what you are trying to do.

```go
opts.SourceTLSConfig = s.SourceTLSConfig
opts.IsEon = v.VDB.IsEON()

opts.IPv6 = net.IsIPv6(s.SourceIP)
```
Collaborator comment on this snippet:

I guess this should not be a problem. @cchen-vertica?

Further review threads on pkg/controllers/vrep/replication_reconciler.go (4), all outdated and resolved.
…pgrade (#829)

This PR adds the last two steps of the replicated upgrade:
1. Remove the old main cluster. After we promote the sandbox, the old
main cluster is no longer needed, so we remove it.
2. Rename the subclusters in the new main cluster. After we promote the
sandbox, we rename all subclusters in the sandbox to match the ones in
the original main cluster.
chinhtranvan and others added 4 commits June 21, 2024 18:28
)

This PR replaces underscores with hyphens in service names in the
operator.
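A likely reason for this change is that Kubernetes Service names must be valid DNS-1035 labels, which do not allow underscores. The sketch below shows only the underscore-to-hyphen replacement; `sanitizeServiceName` is a hypothetical helper, and it does not enforce the other DNS-1035 rules (lowercase, length limit, leading letter).

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeServiceName replaces every underscore with a hyphen so the result
// can be used as a Kubernetes Service name. Other DNS-1035 constraints are
// deliberately not handled in this sketch.
func sanitizeServiceName(name string) string {
	return strings.ReplaceAll(name, "_", "-")
}

func main() {
	fmt.Println(sanitizeServiceName("vertdb-sub_cluster_1")) // vertdb-sub-cluster-1
}
```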
With the server-side change, the issue in #832 has been fixed, so we can
revert to the latest Vertica image to resolve the replication issue.
The latest vclusterOps improves the promote_sandbox API to clean
communal storage after sandbox promotion. This PR simply updates the
vclusterOps library so that communal storage is cleaned in the online upgrade.
After promoting the sandbox to main, we need to delete the sandbox
ConfigMap to avoid conflicts, since the sandbox no longer exists.
@roypaulin (Collaborator) commented:

Almost done! Resolve the conflicts and run the tests again.

@CLAassistant commented Jul 25, 2024:

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 6 committers have signed the CLA.

✅ jizhuoyu
✅ chinhtranvan
✅ cchen-vertica
✅ roypaulin
✅ fenic-fawkes
❌ spilchen


spilchen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

Base automatically changed from vnext to main July 26, 2024 18:57