Add script to migrate existing build results to Pulp #3509

Merged
merged 1 commit into from
Jan 6, 2025
158 changes: 158 additions & 0 deletions backend/run/copr-change-storage
@@ -0,0 +1,158 @@
#! /usr/bin/python3

"""
Migrate existing build results for a given project and all of its CoprDirs
from one storage (Copr backend) to another (Pulp).
"""

import os
import sys
import argparse
import logging
from copr_common.log import setup_script_logger
from copr_backend.helpers import BackendConfigReader
from copr_backend.storage import PulpStorage


STORAGES = ["backend", "pulp"]

log = logging.getLogger(__name__)
setup_script_logger(log, "/var/log/copr-backend/change-storage.log")


def get_arg_parser():
    """
    CLI argument parser
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--src",
        required=True,
        choices=STORAGES,
        help="The source storage",
    )
    parser.add_argument(
        "--dst",
        required=True,
        choices=STORAGES,
        help="The destination storage",
    )
    parser.add_argument(
        "--project",
        required=True,
        help="Full name of the project that is to be migrated",
    )
    parser.add_argument(
        "--delete",
        action="store_true",
        default=False,
        help="After migrating the data, remove it from the old storage",
    )
    return parser


def is_valid_build_directory(name):
    """
    See the `copr-backend-resultdir-cleaner`. We may want to share the code
    between them.
    """
    if name in ["repodata", "devel"]:
        return False

    if name.startswith("repodata.old") or name.startswith(".repodata."):
        return False

    if name in ["tmp", "cache", "appdata"]:
        return False

    parts = name.split("-")
    if len(parts) <= 1:
        return False

    number = parts[0]
    if len(number) != 8 or any(not c.isdigit() for c in number):
        return False

    return True


def main():
    """
    The main function
    """
    parser = get_arg_parser()
    args = parser.parse_args()

    if args.src == args.dst:
        log.info("The source and destination storage is the same, nothing to do.")
        return

    if args.src == "pulp":
        log.error("Migration from pulp to somewhere else is not supported")
        sys.exit(1)

    if args.delete:
        log.error("Data removal is not supported yet")
        sys.exit(1)

    config_file = "/etc/copr/copr-be.conf"
    config = BackendConfigReader(config_file).read()
    owner, project = args.project.split("/")
    ownerdir = os.path.join(config.destdir, owner)

    for subproject in os.listdir(ownerdir):
        if not (subproject == project or subproject.startswith(project + ":")):
            continue

        coprdir = os.path.join(ownerdir, subproject)
        for chroot in os.listdir(coprdir):
            if chroot == "srpm-builds":
                continue

            chrootdir = os.path.join(coprdir, chroot)
            if not os.path.isdir(chrootdir):
                continue

            appstream = None
            devel = None
            storage = PulpStorage(
                owner, subproject, appstream, devel, config, log)

            for builddir in os.listdir(chrootdir):
                resultdir = os.path.join(chrootdir, builddir)
                if not os.path.isdir(resultdir):
Member

I'm not sure this check is enough... maybe upload_build_results is clever enough to handle issues? But see how this checking is done for the resultdir cleaner crawler.

Member Author

You are right, that would cause problems. Updated.

                    continue

                if not is_valid_build_directory(builddir):
                    log.info("Skipping: %s", resultdir)
                    continue

                # TODO Fault-tolerance and data consistency
                # Errors when creating things in Pulp will likely happen
                # (networking issues, unforeseen Pulp validation, etc). We
                # should figure out how to ensure that all RPMs were
                # successfully uploaded, and if not, we know about it.
                #
                # We also need to make sure that no builds, actions, or cron
                # are currently writing into the results directory. Otherwise
                # we can end up with inconsistent data in Pulp.

                full_name = "{0}/{1}".format(owner, subproject)
                result = storage.init_project(full_name, chroot)
                if not result:
                    log.error("Failed to initialize project: %s", resultdir)
                    break

                # We cannot check return code here
                storage.upload_build_results(chroot, resultdir, None)

                result = storage.publish_repository(chroot)
                if not result:
                    log.error("Failed to publish a repository: %s", resultdir)
                    break

                log.info("OK: %s", resultdir)
Member

I suppose we cannot make this in a transactional manner (if an error happens, roll back). But would it be possible to first analyze the situation and gather the tasks that need to be done, fail if some problem happens, and only if no problems happen, start the processing?

Also, I'm curious whether we need a project lock (for building and other modifications).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But would it be possible to first analyze the situation and gather the tasks that need to be done, fail if some problem happens, and only if no problems happen - start the processing?

Sooo, I am not really sure how helpful this would be. Gathering tasks beforehand would probably avoid issues like the script trying to access a directory it doesn't have permissions for and then failing, or something like that. But I suppose the majority of failures that can/will happen are going to be due to networking issues or something else when actually uploading things to Pulp. And having a pre-calculated list of tasks wouldn't IMHO help with that.

I would probably just remember, or maybe pre-calculate, the number of RPM files we are uploading, and after everything is done, query Pulp to find out if we have the same number. Or maybe compare names of RPMs if we wanted to be more precise. If it doesn't match, we can either re-try several times, or just log it and manually review all failures.
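
To make that idea concrete, here is a minimal sketch (an editor's illustration, not part of this PR): the local count comes from the resultdirs being migrated, while the Pulp-side count is a hypothetical helper, count_rpms_in_pulp(), since the exact query depends on the Pulp client that the script ends up using.

# Rough sketch only. count_rpms_in_pulp() is hypothetical; a real check would
# use whatever Pulp API/client the migration relies on.
import glob
import logging
import os

log = logging.getLogger(__name__)


def count_local_rpms(chrootdir, builddirs):
    """Count the RPM files in the build directories we are about to upload."""
    total = 0
    for builddir in builddirs:
        resultdir = os.path.join(chrootdir, builddir)
        total += len(glob.glob(os.path.join(resultdir, "*.rpm")))
    return total


def verify_migrated_chroot(chrootdir, builddirs, repository):
    """Compare the pre-calculated local count with what Pulp reports."""
    expected = count_local_rpms(chrootdir, builddirs)
    uploaded = count_rpms_in_pulp(repository)  # hypothetical Pulp-side query
    if expected != uploaded:
        log.error("RPM count mismatch for %s: local %s, Pulp %s",
                  chrootdir, expected, uploaded)
        return False
    return True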

> Also, I'm curious whether we need a project lock (for building and other modifications).

Dumping a lockfile in this script would be easy, but changing our build-related code, action code, cron jobs, etc. to respect the lock sounds like a bigger problem.

If such a locking feature would be generally useful, then sure. But if its only purpose is the Pulp migration, I hope we can figure out something easier.

For initial migrations of test users, I think we would be fine with "please don't submit new builds until the migration is finished". And the mass migration of everything will be done in batches, so maybe we can just put an ugly hack into our build/action scheduler to temporarily hide all jobs that fall into the currently migrated batch.
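
And a minimal sketch of the "dump a lockfile in this script" idea (again an editor's illustration, with a made-up lock path): it would stop two migration runs from racing, but as noted above it does nothing until builds, actions, and cron jobs also honour the lock.

import fcntl
from contextlib import contextmanager


@contextmanager
def migration_lock(owner, project):
    # Hypothetical lock location; only this script checks it, so it merely
    # prevents two copies of the migration from running concurrently.
    path = "/var/lock/copr-backend-migration-{0}-{1}.lock".format(owner, project)
    with open(path, "w") as lockfile:
        fcntl.flock(lockfile, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lockfile, fcntl.LOCK_UN)


# Usage inside main(), per migrated project:
#     with migration_lock(owner, project):
#         ...  # walk the CoprDirs and upload to Pulp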



if __name__ == "__main__":
    main()
11 changes: 8 additions & 3 deletions frontend/coprs_frontend/commands/change_storage.py
@@ -6,8 +6,8 @@
 configure the storage type for a given project and while doing so, it makes sure
 DNF repositories for the project are created.
 
-Existing builds are not migrated from one storage to another. This may be an
-useful feature but it is not implemented yet.
+To migrate existing build results for a given project and all of its CoprDirs,
+run also `copr-change-storage` script on backend.
 """
 
 import sys
@@ -44,4 +44,9 @@ def change_storage(fullname, storage):
     db.session.commit()
     print("Configured storage for {0} to {1}".format(copr.full_name, storage))
     print("Submitted action to create repositories: {0}".format(action.id))
-    print("Existing builds not migrated (not implemented yet).")
+    print("To migrate existing build results for this project and all of its "
+          "CoprDirs, run also this command on backend:")
+
+    cmd = "sudo -u copr copr-change-storage --src backend --dst pulp "
+    cmd += "--project {0}".format(fullname)
+    print(" {0}".format(cmd))