My major concern on the subject is legal, rather than anything else, which means I'm not an expert on what the answer is or should be. And it hinges on an underlying question: if we merge a PR with code that we believed was correctly licensed, but wasn't (whether that's because it was taken from another codebase with a different license, or generated by AI, or whatever), what is our responsibility/vulnerability in that case? What are we obligated to do? If we are just obligated to remove or replace the code, I don't think the legal issues here are necessarily that fraught; if it turns out that AI-generated code is legally uncopyrightable, we may have to revert or reimplement some patches, but it's unlikely anything more dramatic will happen. If we have other responsibilities or vulnerabilities, it might be more important to make sure such code doesn't enter the codebase in the first place.

I think another valid concern is code quality; because LLMs are fundamentally incapable of understanding things, and instead simply spit out statistically likely symbols, the code they generate may not be correct. I'm not concerned about the cases where the code is obviously terrible; those will fail tests, or not compile, or be caught by normal scrutiny. The case where the code looks right but is subtly wrong is more interesting. Those cases are of course already possible with human-authored code, but it seems likely to me that LLM-generated code will have different failure modes than human-generated code. It might be useful to know the provenance of the code we're reviewing, in case we need to apply different checks to the two.

My conclusion, from those two points, is that I would be in favor of requiring that code that was largely LLM-generated be labelled as such in the PR. That doesn't mean it will get rejected, but it may be beneficial for us to know in the future whether we have LLM-generated code in the codebase. And it may be helpful for reviewers to look at the code with the lens of "this may be weird in ways I'm not used to".
Since the topic was raised in #15264, I wanted to share my thoughts on using generative AI to help make contributions to the project.
I don’t think it’s especially relevant whether contributors get help reading or writing code from a friend, an AI assistant, etc. I don’t think it’s critical to disclose this either, although the context may be helpful in some cases. We expect that contributors have the legal ability to license their submission under the applicable open source license (e.g. CDDL or GPL).
More importantly, we would like submitters to have the time and understanding to participate in the code review process, and ideally to be around to help with problems that may arise with their code post-integration. The reality is that this standard is not always met (e.g. “drive-by” PRs). Even folks trying to meet these standards may be called away by other life commitments, or may not have the experience to thoroughly understand how their changes integrate with tricky existing code. It’s up to us as reviewers to apply some judgement about whether to take on the collective responsibility of maintaining the change.
We should review PRs on their merits. Submitters should expect to help reviewers understand their changes, and to make modifications to their PR as requested by subject-matter experts.