
[New Process Improvement Need]: Continuous Delivery #69

Open
5 tasks
riverma opened this issue Sep 8, 2022 · 8 comments
Assignees
Labels
low complexity: Ticket is relatively easy & straightforward to resolve
most requested: Highly requested by community members
software lifecycle: Process improvements involving developing, testing, integrating, deploying software

Comments

riverma (Collaborator) commented Sep 8, 2022

Checked for duplicates

Yes - I've already checked

Category

Software Lifecycle - the creation, change, and release of software

Describe the need

We have a need for best practices related to continuous delivery. Specifically, this means:

  • Automatically publishing built artifacts to package managers / repositories. This includes:
    • Agreeing upon a set of package repositories based on the type of artifact: For example, PyPI for Python packages, DockerHub for Docker images, Maven for Java, etc.
    • Agreeing upon a naming scheme to describe package namespaces within the repositories. For example, should we name artifacts gov.nasa.jpl.<proj>.<package> etc.?
    • Agreeing upon tiers of software packages to deliver, e.g. snapshots vs. versioned releases
    • Automation so GitHub-based repositories push built packages to the chosen registries upon code commits (see the workflow sketch after this list)
  • A guide that walks through the above
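
As a concrete illustration of the automation point above, here is a minimal GitHub Actions sketch that builds and publishes a Python package to PyPI when a release is published. The workflow file name, trigger, and PYPI_API_TOKEN secret name are assumptions for illustration, not settled project conventions:

```yaml
# Hypothetical workflow: .github/workflows/publish-pypi.yml
name: Publish to PyPI

on:
  release:
    types: [published]   # run when a versioned release is published

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - name: Build distribution
        run: |
          python -m pip install --upgrade build
          python -m build
      - name: Upload to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}  # assumed project-level PyPI token secret
```

Analogous workflows could target Maven Central, npm, or DockerHub.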

Tasks

+1'd by @mike-gangl @mcduffie @LucaCinquini @kgrimes2

riverma self-assigned this Sep 8, 2022
riverma added the most requested label Sep 8, 2022
riverma added the low complexity label Sep 29, 2022
riverma (Collaborator, Author) commented Oct 6, 2022

@jpl-jengelke recommends incorporating this into the current CI guide

riverma added the software lifecycle label and removed the enhancement label Nov 1, 2022
riverma (Collaborator, Author) commented Dec 1, 2022

riverma (Collaborator, Author) commented Jan 31, 2023

Recommendations

Packages (Python)

  • Suggested Repository: PyPI
  • Suggested Naming convention
    • nasa-[project org]-[module name] [semantic version ID]
  • Known benefits
    • Free
  • Known constraints

Packages (Java)

  • Suggested Repository: Maven Central
  • Suggested Naming convention
    • gov.nasa.[project org].[module name]
  • Known benefits
    • Free
    • No specific constraints to package size or volume
  • Known constraints
    • N/A
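
To illustrate how the Maven Central path could be automated, a hedged workflow fragment using actions/setup-java's built-in server configuration might look like the following; the ossrh server id and the OSSRH_* secret names are assumptions tied to a particular pom.xml setup:

```yaml
# Hypothetical workflow: .github/workflows/publish-maven.yml
# Assumes pom.xml declares an "ossrh" server in <distributionManagement>.
name: Publish to Maven Central

on:
  release:
    types: [published]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
          server-id: ossrh                 # must match the <id> in pom.xml
          server-username: MAVEN_USERNAME  # env var names, resolved below
          server-password: MAVEN_PASSWORD
      - name: Deploy
        run: mvn --batch-mode deploy
        env:
          MAVEN_USERNAME: ${{ secrets.OSSRH_USERNAME }}  # hypothetical secrets
          MAVEN_PASSWORD: ${{ secrets.OSSRH_TOKEN }}
```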

Packages (NodeJS)

  • Suggested Repository: npm registry
  • Suggested Naming convention
    • @nasa-[project org]/[module name]
  • Known benefits
    • Free
    • No specific constraints to package size or volume
  • Known constraints
    • Current NASA-branded packages vary in terms of account ownership and naming convention, potentially causing confusion

Packages (Miscellaneous)

  • Repository:
    • Store in your Cloud-based VCS of choice (e.g. GitHub Releases, GitLab Package Registry)
  • Suggested Naming convention
  • Known benefits
    • Free
    • Unlimited number of packages
  • Known constraints
    • Typically < 2GB individual package limit
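
For the GitHub Releases route, a sketch using the gh CLI (preinstalled on GitHub-hosted runners) could attach built artifacts to a published release; the make dist step and the file glob are placeholders:

```yaml
# Hypothetical workflow: .github/workflows/attach-artifacts.yml
name: Attach artifacts to GitHub Release

on:
  release:
    types: [published]

permissions:
  contents: write   # needed for gh release upload

jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build artifact
        run: make dist   # placeholder build step; assume it produces dist/*.tar.gz
      - name: Upload to the release
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh release upload "${{ github.event.release.tag_name }}" dist/*.tar.gz
```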

Infrastructure Deployments (Terraform)

  • Repository:
  • Suggested Naming convention
    • terraform-nasa-[project org]-modules/[module-name]
  • Known benefits
    • Free
  • Known constraints
    • No officially sponsored NASA namespace currently exists, potentially causing confusion
    • 1000 document limit per account
    • 500KB max file size per document

Test Data (Small < 2GB)

  • Repository:
    • Create a new repository in your Cloud-based VCS of choice (e.g. GitHub Releases, GitLab Package Registry)
  • Suggested Naming convention
    • [project org]-[project module]-test-dataset
  • Known benefits
    • Free
  • Known constraints
    • Typically < 2GB dataset limit

Test Data (Large: 2GB - 100GB)

  • Repository:
  • Suggested Naming convention
    • N/A
  • Known benefits
    • Scalable storage
    • Authentication to rate-limit bandwidth usage
  • Known constraints
    • Non-free

Test Data (Large > 100GB)

Containers (Archival / Public)

  • Repository:
    • Store in your Cloud-based VCS of choice (e.g. GitHub Packages, GitLab Package Registry)
  • Suggested Naming convention
    • nasa-[project org]-[project module]:[tag]
  • Known benefits
    • Free
    • No size or bandwidth limits known for public repositories
  • Known constraints
    • Usage limitations on private repositories
    • High-latency for on-demand, runtime applications

Containers (Runtime / Private)

  • Repository:
    • Amazon Elastic Container Registry (ECR)
  • Suggested Naming convention
    • nasa-[project org]-[project module]:[tag]
  • Known benefits
    • Private repositories
    • Low-latency pulls for runtime usage, especially in Amazon Web Services (AWS)
  • Known constraints
    • Subject to pricing
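
A minimal sketch of the ECR case, assuming OIDC-federated AWS credentials and the naming convention above (the role ARN secret, region, and image name are illustrative only):

```yaml
# Hypothetical workflow: .github/workflows/push-ecr.yml
name: Push container image to Amazon ECR

on:
  push:
    tags: ['v*']   # publish on version tags

permissions:
  id-token: write   # allows OIDC federation to AWS
  contents: read

jobs:
  push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}  # hypothetical secret
          aws-region: us-west-2                        # illustrative region
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - name: Build and push
        run: |
          IMAGE="${{ steps.ecr.outputs.registry }}/nasa-myorg-mymodule:${GITHUB_REF_NAME}"
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
```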

drewm-jpl commented Feb 3, 2023

Hi @riverma,

Regarding repositories for test data, it might be worth looking at the data repository guidance provided by Scientific Data - Nature (https://www.nature.com/sdata/policies/repositories).

In particular, their list of recommended generalist data repositories may be pertinent:
[Image: Scientific Data's list of recommended generalist data repositories]

galenatjpl commented

@riverma it looks like you have done a great job defining the repositories and formats that I would expect here. I'm mostly familiar with Maven Central and PyPI from building things in the past.

I think one thing to consider (which may be tangential to this ticket) is how and when we push artifacts to these places. We have sort of thought about some notional methodologies related to this (see the blue part of this diagram).

My thoughts about test data are that:

  1. We will hopefully be centralizing on a single representative "golden dataset" that exercises the capabilities we care to test.
  2. As such, we should probably just store that dataset in S3 and be done with it. We aren't going to be storing gobs and gobs of data; we just need that representative "starter" data. Any data produced as a result of SPS runs can be transitory, and deleted relatively quickly after verification. In other words, we aren't an actual mission, and won't have the Life of Mission data requirements and associated costs. If we need to store several gigabytes of data on S3, it's not going to break the bank.

That being said, I haven't taken a look at the repositories @drewm-jpl mentioned. I do know that we are all familiar with AWS/S3 though..
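
To make the S3 idea concrete, a minimal CI sketch for pulling such a golden dataset down before tests might look like this; the bucket name, region, and role secret are illustrative only:

```yaml
# Hypothetical workflow: fetch the "golden dataset" from S3 before running tests
name: Fetch golden dataset

on: [workflow_dispatch]

permissions:
  id-token: write   # OIDC federation to AWS

jobs:
  fetch:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}  # hypothetical secret
          aws-region: us-west-2                        # illustrative
      - name: Sync dataset locally
        # Bucket name follows the suggested [project org]-[project module]-test-dataset
        # convention and is illustrative only.
        run: aws s3 sync s3://myorg-mymodule-test-dataset ./test-data
```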

galenatjpl commented

Also, you might want to take a quick look at AWS CodeArtifact, though it may not be the best fit for a fully open-source build process. Or maybe it would work? Other public services like Maven Central and PyPI might be better, but I'm pointing out CodeArtifact in case it wasn't looked at as part of this evaluation.

riverma (Collaborator, Author) commented May 24, 2023

ingyhere added a commit to ingyhere/slim-starterkit-python that referenced this issue Mar 15, 2024
…n with structure of other similar configs; Minor code changes in response to static analysis…
riverma removed their assignment Aug 12, 2024
riverma changed the title from "[New Process Improvement Need]: Artifact packaging hosting and dependency management" to "[New Process Improvement Need]: Continuous Delivery" Aug 21, 2024
yunks128 self-assigned this Sep 5, 2024
yunks128 (Contributor) commented Oct 1, 2024

Next steps:

  1. Finalize Repository and Naming Conventions

    • Several repository and naming conventions are outlined (e.g., PyPI, DockerHub, Maven).
    • Confirm and document the agreed-upon repository choices, especially for large datasets. Need to decide whether AWS CodeArtifact should be integrated or rejected.
  2. Automation for CI/CD Pipeline

    • The primary need is to automate the build and push processes to the chosen repositories. GitHub Actions/Workflows is the suggested tool.
    • Create and share reusable workflow templates for common repositories like PyPI, Maven Central, and DockerHub (see the reusable-workflow sketch after this list).
  3. Handling Large Test Data

    • You may centralize smaller test datasets (<2GB) in cloud-based version control systems (VCS) like GitHub Releases, but larger datasets need a different solution, possibly S3 or a DAAC for Earth data.
    • Define a clear storage and retrieval strategy for both small and large test data, including cost considerations for S3.
    • Set up scripts or workflows to handle dataset uploading, retention, and versioning.
  4. Guide Creation

    • A SLIM best practices guide is needed to compile all the aforementioned strategies, repository decisions, automation workflows, and versioning schemes.
    • Draft the continuous delivery guide, starting with the repository, automation, and versioning steps. This guide can be integrated into the repository's docs folder.
  5. Continuous Testing Integration

    • Since the continuous delivery process also involves ensuring that packages are tested before release, integrating a continuous testing strategy will strengthen the CD process.
    • Build upon the recent Continuous Testing Guide ([New Best Practice Guide]: Continuous Testing Guide and Checklist #110) by linking the testing pipeline to the delivery pipeline, ensuring tests are automatically executed before artifacts are published.
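
To make item 2 concrete, here is a hedged sketch of a reusable workflow plus a one-job caller; the shared repository path, secret name, and @main ref are hypothetical:

```yaml
# Hypothetical reusable workflow, e.g. .github/workflows/publish-pypi.yml
# in a shared "workflow templates" repository.
name: Reusable PyPI publish

on:
  workflow_call:
    secrets:
      PYPI_API_TOKEN:
        required: true

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: python -m pip install --upgrade build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}

# A consuming repository would then call it with a single job:
#
#   jobs:
#     release:
#       uses: my-org/shared-workflows/.github/workflows/publish-pypi.yml@main
#       secrets:
#         PYPI_API_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
```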
