My major concern on the subject is legal, rather than anything else, which means I'm not an expert on what the answer is or should be. And it hinges on an underlying question: if we merge a PR with code that we believed was correctly licensed, but wasn't (whether that's because it was taken from another codebase with a different license, or generated by AI, or whatever), what is our responsibility/vulnerability in that case? What are we obligated to do? If we are just obligated to remove or replace the code, I don't think the legal issues here are necessarily that fraught; if it turns out that AI-generated code is legally uncopyrightable, we may have to revert or reimplement some patches, but it's unlikely anything more dramatic will happen. If we have other responsibilities or vulnerabilities, it might be more important to make sure such code doesn't enter the codebase in the first place.

I think another valid concern is code quality; because LLMs are fundamentally incapable of understanding things, and instead simply spit out statistically likely symbols, the code they generate may not be correct. I'm not concerned about the cases where the code is obviously terrible; those will fail tests, or not compile, or be caught by normal scrutiny. The case where the code looks right but is subtly wrong is more interesting. Those cases are of course already possible with human-authored code, but it seems likely to me that LLM-generated code will have different failure modes than human-generated code. It might be useful to know the provenance of the code we're reviewing, in case we need to apply different checks to the two.

My conclusion, from those two points, is that I would be in favor of requiring that code that was largely LLM-generated be labelled as such in the PR. That doesn't mean it will get rejected, but it may be beneficial for us to know in the future whether we have LLM-generated code in the codebase. And it may be helpful for reviewers to look at the code with the lens of "this may be weird in ways I'm not used to".
Since the topic was raised in #15264, I wanted to share my thoughts on using generative AI to help make contributions to the project.
I don’t think it’s especially relevant whether contributors get help reading or writing code from a friend, an AI assistant, etc. I don’t think it’s critical to disclose this either, although the context may be helpful in some cases. We expect that contributors have the legal ability to license their submission under the applicable open source license (e.g. CDDL or GPL).
More importantly, we would like submitters to have the time and understanding to participate in the code review process, and ideally to be around to help with problems that may arise with their code post-integration. The reality is that this standard is not always met (e.g. “drive-by” PRs). Even folks trying to meet these standards may be called away by other life commitments, or may not have the experience to thoroughly understand how their changes integrate with tricky existing code. It’s up to us as reviewers to apply some judgement about whether to take on the collective responsibility of maintaining the change.
We should review PRs on their merits. Submitters should expect to help reviewers understand their changes, and to make modifications to their PR as requested by subject-matter experts.