-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PreTest] Technical proposal for the checksum check of the three-party dependencies #431
base: main
Are you sure you want to change the base?
[PreTest] Technical proposal for the checksum check of the three-party dependencies #431
Conversation
All contributors have signed the CLA ✍️ ✅ |
5052f6c
to
8a7d63e
Compare
I have read the CLA Document and I hereby sign the CLA |
recheck |
Hi @zong-zhe @Peefy, |
Pull Request Test Coverage Report for Build 11678614661Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
cc @zong-zhe |
## For the existing kpm dependencies, we need a solution that does not break compatibility for starting this feature | ||
We can have two methods to ensure backward compatibility: | ||
|
||
1. Checksum Registry/Database: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall direction of this section is in the right direction 👍, and in the next stage, you may need to provide more details to design the CheckSum Database
for kpm.
At first, you can design what information to store in the Database and what it should look like in kpm and KCL packages. Refer to languages like cargo, go, etc. to see how they store this information and what information should be stored.
We can take some motivation from npm, where we create a `registry` that contains metadata for all the packages and dependencies known to KPM. This is similar to the Go checksum database, which has `checksum database` for all the modules known to Go. | ||
If we are not able to find the checksum entry in the lockfile, we can retrieve it from the registry or `checksum database`. We can also add a flag, similar to Go, allowing users to skip checking the registry or checksum database and download the package even if it might be corrupted. | ||
|
||
2. FLAG_NO_SUM_CHECK: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this section, it is important to clarify when checksums verification are requested in kpm, and what impact they may have on the current kpm process.
Combined with the current kcl run
, kcl mod update
, kcl mod push
, kcl mod pull
, sorts out their current work flow and the effective position of checksums verification.
You can use a simple illustration as the example to show the workflow for kpm:
kcl run -> step 1 -> step 2 -> .......
And checksum verification will add as :
verify sum here
|
kcl run -> step 1 -> step 2 -> ......
|
require sum here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @zong-zhe, do we need checksum verification in kcl run
? I navigated through the workflow of kcl run
but could not find any place where it downloads or fetches modules from Git or OCI registry as part of its operation. Instead, it only fetches information from the local storage path of external packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @NishantBansal2003 😄
kpm downloads OCI and GIT related content via Downloader: https://github.com/kcl-lang/kpm/blob/main/pkg/downloader/downloader.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the clarification. I apologize if my previous message was unclear. What I meant to convey is that when we trigger the kcl run
command in the CLI, the functions that are executed do not seem to involve downloading content from Git or an OCI registry. Instead, they appear to only access information from the local storage path of external packages.
Additionally, I wanted to clarify whether the workflow example you provided was just an illustrative example or if we actually need to explore the possibility of incorporating checksum verification during the kcl run command. Should I investigate how checksum verification might be integrated into the kcl run process, or was that meant as a general example?
Could you please confirm if I’m understanding both aspects correctly?
Hey @zong-zhe, |
Signed-off-by: Nishant Bansal <[email protected]>
Signed-off-by: Nishant Bansal <[email protected]>
b015467
to
8d68dbd
Compare
Hey @zong-zhe, |
Hey @zong-zhe, please take a look when you have a moment. I'm happy to incorporate any changes if needed. |
We can take some motivation from npm, where we create a `registry` that contains metadata for all the packages and dependencies known to KPM. This is similar to the Go checksum database, which has `checksum database` for all the modules known to Go. | ||
If we are not able to find the checksum entry in the lockfile, we can retrieve it from the registry or `checksum database`. We can also add a flag, similar to Go, allowing users to skip checking the registry or checksum database and download the package even if it might be corrupted. | ||
|
||
2. FLAG_NO_SUM_CHECK: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @NishantBansal2003 😄
kpm downloads OCI and GIT related content via Downloader: https://github.com/kcl-lang/kpm/blob/main/pkg/downloader/downloader.go
(Instead of using merkle tree for checksum database)We can take motivation from Cargo - where we store `Kpm module records` in Git Repository and each commit to this repository is signed with GPG keys. When we fetch metadata of the module we will also fetch its signed commit, after that we can verify the GPG signature of the commit, This verification ensures that the commits were made by an authorized entity and that the commit data has not been altered. | ||
|
||
|
||
### Work flow of kpm process and the effective position of checksum verification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to design a separate module called Checker
that can be used to check whether a KCL dependency is valid. This includes checking whether the dependency's name is appropriate, whether the version number matches the semantic version, and the checksum rather than simply making changes to an existing process. Independent modules can help us better test and manage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will try to design this. I will keep you updated about the progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does KCL impose any naming conventions on dependencies? If not, how can we determine whether a dependency name is appropriate?
cc: @zong-zhe
Signed-off-by: Nishant Bansal <[email protected]>
@zong-zhe, I’ve added the design for the checker module. PTAL.. |
pkg "kcl-lang.io/kpm/pkg/package" | ||
) | ||
|
||
type Version struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @NishantBansal2003 😄
My suggestion here is that a simple composition pattern in OOP might be better, no particularly complex structure is required, just a Checker
abstract interface that provides a uniform function to check whether a dependency is valid, and then provides IdentChecker
, VersionChecker
and SumChecker
check a KCL dependent name, version, and sum. Finally, These three Checkers
are combined into one DepChecker
, which is also an implementation struct of Checker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adopting the composition pattern with a Checker
interface would be better. It enhances clarity, making the code easier to maintain and extend. I’ll proceed with this approach. Thanks for the suggestion!
} | ||
|
||
// parseVersion parses a semantic version string and returns its major, minor, and patch components. | ||
func parseVersion(v string) (p Version, ok bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The content here does not need to be manually implemented, we can complete through the tripartite library, the specific can refer to here:
if the version is not semver, NewVersion
will return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we can find a library that simplifies our implementation, that would be great. I’ll look into this and refer to the tripartite library as suggested. Thank you for pointing it out!
} | ||
|
||
// HashLocalCache calculates the hash of all files in the specified local cache directory. | ||
func HashLocalCache(localCachePath string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has been defined before:
Line 32 in bea4575
func HashDir(dir string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took inspiration from Go's checksum generation (hence the "h1:" format). Though the differences are subtle, here are the trade-offs and when each method is preferable:
- HashLocalCache: Hashes both file content and name. Use when file names matter.
- HashDir: Can ignore certain files (e.g., .git). Use when excluding non-essential files is necessary.
- HashLocalCache: Slower due to sorting and name hashing. Use for higher accuracy with filenames and ordering.
- HashDir: Faster and simpler. Use when performance is critical and file names/order don’t matter.
I think we can discuss how to move forward from here.
Hey @zong-zhe, made some modifications based on reviews, PTAL |
Could you share some of the goals we have while implementing checksum verification, so I can explore possible solutions? |
|
||
// Checker is the interface for all dependency checkers. | ||
type Checker interface { | ||
Check(d pkg.Dependency, localCachePath string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to use the KclPkg
structure as a parameter, which is the interface for checking whether the package's properties are valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that I should refactor the Checker
interface to use KclPkg
instead of pkg.Dependency
? This would allow each checker to access all the relevant information about the package and its dependencies in a single structure. Specifically:
- Using
KclPkg
to check the dependency name , version, and checksum for each dependency. - Ensuring that the
SumChecker
respects theNoSumCheck
flag inKclPkg
.
Would this approach align with the design goals, or do you recommend any adjustments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Using
KclPkg
to check the dependency name , version, and checksum for each dependency.
Ensuring that the SumChecker respects the NoSumCheck flag in KclPkg.
Yes, you can try to sync these into the doc PR and provide more design.
if err != nil { | ||
return fmt.Errorf("failed to calculate checksum: %w", err) | ||
} | ||
if d.Sum != gotSum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it compared with the sum value calculated by the local file ? I remember it was to be compared with the sum stored remotely or sum local cache. The purpose of checking the checksum is to prevent security risks caused by malicious replacement of remote third-party dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback. Yes, the method calculates the checksum of the local file or cache.
The current implementation ensures that the local file matches the expected checksum stored in the dependency object (d.Sum
). This object should ideally be populated with the checksum value retrieved from a trusted remote source or sum local cache to ensure we are validating the file against a reliable source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that d.Sum
may be compromised, and that we should include the actual design for retrieving the checksum from the mod file or remotely? Previously, I thought that if this was the first time a package was downloaded, we would get the checksum from the remote source. If it was not the first time, we would check the checksum from the mod lock file, which we have already populated in d.Sum
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @NishantBansal2003 😄
I don't understand what you want to express. My understanding is that the main function of this checksum is to determine whether the sum in the lock file is the same as the sum on the remote end, ensuring the security of the package. If you think there might be a different reason or a different usage, you might need to provide things like rust or go that they do with checksum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the previous approach. Earlier, I thought that d.Sum
in KPM stores the trusted checksum for a dependency. However, after carefully reviewing the KPM code, I now understand that d.Sum
retrieves the checksum from the OCI registry or calculates it using HashDir
(which is to be stored in the lock file). This checksum needs to be compared with a remote trusted source to ensure the integrity of the dependency. Am I correct now?
Lines 1226 to 1236 in bea4575
dep.FromKclPkg(dpkg) | |
dep.Sum, err = c.AcquireDepSum(*dep) | |
if err != nil { | |
return nil, err | |
} | |
if dep.Sum == "" { | |
dep.Sum, err = utils.HashDir(localPath) | |
if err != nil { | |
return nil, err | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, now I think I have some clarity. I will update the design doc based on the reviews.
Hi @NishantBansal2003 😄 It is mainly divided into three parts:
|
Signed-off-by: Nishant Bansal <[email protected]>
b42cf76
to
045840b
Compare
Hi @zong-zhe, |
Hey @zong-zhe, please take a look when you have a moment. |
Hi @NishantBansal2003 😄 You can first propose a PR to implement |
Hey @zong-zhe, since the implementation of getTrustedSum is incomplete, should I still submit the current code as a PR, considering that any modifications can be made in the future before merging? |
PTAL @ #470, I think we can proceed with further discussion. |
Signed-off-by: Nishant Bansal <[email protected]>
Hey @zong-zhe, I have added the design for integrating the checker into kpm's workflow, PTAL. Also, I need some help with an issue I encountered while testing the changes locally: https://cloud-native.slack.com/archives/C05TC96NWN8/p1727073915893189?thread_ts=1727069902.233579&cid=C05TC96NWN8 |
I understood the difference from - Lines 209 to 267 in b4d396a
But I'm not sure how to handle the following case: Currently, my implementation raises an error if dep.Source.Oci is nil. I want to ensure that we handle both scenarios correctly during integration. Since dep.Source.Registry.Oci gets filled when downloading using the OCI reference , but not dep.Source.Oci , should I adjust the integration code to also populate dep.Source.Oci when dep.Source.Registry.Oci is present? Or do you have a better approach to address this?
|
I also want to know some package example, like
So, should I adjust the existing third-party dependency to include the checksum in its manifest? If so, how can I do that? |
@@ -0,0 +1,231 @@ | |||
# Integrating Checker into the Existing KPM Workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @NishantBansal2003 😄
There are some detailed issues in your proposal that we need to discuss before proceeding with the subsequent implementation. You need to supplement some details in your research plan.
For example:
- In functions that trigger re-downloads, will the verification process of third-party libraries be triggered, such as go build?
- If the local third-party libraries are modified, will the verification process of the third-party libraries be triggered? What are the verification results? For instance, Golang allows modifications to third-party libraries during development. After I modify the third-party libraries locally, how does the checksum work in Golang?
Additionally, please research how the above tasks are handled in npm and helm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some modifications based on the reviews. Please take a look and let me know if I am heading in the right direction. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @NishantBansal2003 😄
Currently, this plan is heading in the right direction, but it requires a few adjustments, specifically a clear division of tasks. We need to list each command in KPM that requires the integration of the checksum feature as sub-tasks.
Next, we need to prepare for the integration. We require a seamless upgrade process that does not render all existing third-party libraries unusable due to the addition of the checksum.
First, we need to know the specifics of the sums for all official third-party libraries. You may need to use some of the API methods currently provided by KPM to write a simple Go tool to tally which third-party libraries have sum fields and which do not. You can add a GitHub Action in this repository https://github.com/kcl-lang/modules to call this Go tool to generate a report. You can directly submit the PR of this Go tool to this repository https://github.com/kcl-lang/modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to list each command in KPM that requires the integration of the checksum feature as sub-tasks.
Already added the kcl commands that might need checksum integration and provided a rough approach through the code. If anything is missing or incorrect, let me know so I can fix it and propose an overall design later.
Signed-off-by: Nishant Bansal <[email protected]>
Signed-off-by: Nishant Bansal <[email protected]>
Hey @zong-zhe, |
Currently, I am considering overwriting the package artifact and updating the manifest with the given version using |
Hi 😄 |
Hey @zong-zhe, could you review this code design whenever you have time? Here is the link: |
Signed-off-by: Nishant Bansal <[email protected]>
1. Does this PR affect any open issues?(Y/N) and add issue references :
2. What is the scope of this PR (e.g. component or file name):
/docs/proposal
3. Provide a description of the PR(e.g. more details, effects, motivations or doc link):
4. Are there any breaking changes?(Y/N) and describe the breaking changes(e.g. more details, motivations or doc link):
5. Are there test cases for these changes?(Y/N) select and add more details, references or doc links: