Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs_clone_range should return a more descriptive return value #15148

Merged
merged 1 commit into from
Aug 8, 2023

Conversation

oromenahar
Copy link
Contributor

Return the more descriptive ENOTTY instead of EXDEV when the parameters don't match the requirements of the clone function. EINVAL would also be possible because some of the parameters are invalid but I decided to go with ENOTTY, which is little bit more generic than EINVAL.

Motivation and Context

zfs_clone_range returned a lot EXDEV for erros wich are not really a cross device clone request. This should be change to match the return expectations of the function when calling it. Results in better and more stable code.

Description

Described in Motivation and Context. When I have time I will try to add some test, but my time is limited, so sorry for the missing tests.

How Has This Been Tested?

Checked if it still compiles and made a few runs on the shell.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the OpenZFS code style requirements.
  • I have updated the documentation accordingly. (explained it in some comments.)
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@robn
Copy link
Member

robn commented Aug 2, 2023

Heh, I started looking at this a couple of nights ago but got into the weeds a bit, as I will now describe.

On the one hand, I would like the errors returned zfs_clone_range to make sense for it in isolation. If the calling function needs to convert them to satisfy some other API requirement, then that's its job. In the future zfs_clone_range will likely be used as the enabling tech for other features unrelated to the file cloning syscalls, so its error codes should reflect what's happening.

On the other hand, copy_file_range needs a clear signal about when it should fallback vs returning the error, and I wasn't sure if simple error codes were unambiguous enough to signal that.

(This was the point where I decided that I'd need to play around, which I haven't done yet).

So I lean softly towards the first option; make zfs_clone_range do its own thing, and the caller will have to figure it out from there. If it turns out that we can detect when to fallback cleanly, great! If we can't, then maybe we do something different (an extra output flag, a different set of error codes that map to the real ones but also mean "try something else", etc).

So for this PR, I would say the first three cases that are checking that the incoming ranges are sensibly aligned should return EINVAL, as this is matches the traditional meaning for that error code (caller asked for something that makes no sense) and also directly lines up with what FICLONERANGE returns returns, so no conversion required for most callers.

The fourth case, where the source block hasn't been written yet, is trickier. Probably I would just leave it as EAGAIN; EBUSY might work too. This is one of the tricky-to-adapt ones I mentioned before, and really zfs_clone_range shouldn't be doing any of this. ENOTTY definitely isn't the right error (which for ioctl is roughly "not implemented"), but neither is EXDEV. So copy_file_range will have to grow a check for fallback for that case, but that's ok for the first one.

So my version of this would be something like (totally untested):

diff --git module/os/freebsd/zfs/zfs_vnops_os.c module/os/freebsd/zfs/zfs_vnops_os.c
index 45cf6fdfc..89cd8b4fb 100644
--- module/os/freebsd/zfs/zfs_vnops_os.c
+++ module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6290,7 +6290,7 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 
 	error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
 	    ap->a_outoffp, &len, ap->a_outcred);
-	if (error == EXDEV || error == EOPNOTSUPP)
+	if (error == EXDEV || error == EOPNOTSUPP || error == EINVAL)
 		goto bad_locked_fallback;
 	*ap->a_lenp = (size_t)len;
 out_locked:
diff --git module/os/linux/zfs/zpl_file_range.c module/os/linux/zfs/zpl_file_range.c
index 72384b638..e9185c1ca 100644
--- module/os/linux/zfs/zpl_file_range.c
+++ module/os/linux/zfs/zpl_file_range.c
@@ -103,7 +103,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
 	 * Since Linux 5.3 the filesystem driver is responsible for executing
 	 * an appropriate fallback, and a generic fallback function is provided.
 	 */
-	if (ret == -EOPNOTSUPP || ret == -EXDEV)
+	if (ret == -EOPNOTSUPP || ret == -EXDEV || ret == -EINVAL)
 		ret = generic_copy_file_range(src_file, src_off, dst_file,
 		    dst_off, len, flags);
 #else
@@ -111,7 +111,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
 	 * Before Linux 5.3 the filesystem has to return -EOPNOTSUPP to signal
 	 * to the kernel that it should fallback to a content copy.
 	 */
-	if (ret == -EXDEV)
+	if (ret == -EXDEV || ret == -EINVAL)
 		ret = -EOPNOTSUPP;
 #endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE */
 
diff --git module/zfs/zfs_vnops.c module/zfs/zfs_vnops.c
index 54ea43363..0a77c19ef 100644
--- module/zfs/zfs_vnops.c
+++ module/zfs/zfs_vnops.c
@@ -1171,7 +1171,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
 	 * We cannot clone into files with different block size.
 	 */
 	if (inblksz != outzp->z_blksz && outzp->z_size > inblksz) {
-		error = SET_ERROR(EXDEV);
+		error = SET_ERROR(EINVAL);
 		goto unlock;
 	}
 
@@ -1179,7 +1179,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
 	 * Offsets and len must be at block boundries.
 	 */
 	if ((inoff % inblksz) != 0 || (outoff % inblksz) != 0) {
-		error = SET_ERROR(EXDEV);
+		error = SET_ERROR(EINVAL);
 		goto unlock;
 	}
 	/*
@@ -1187,7 +1187,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
 	 */
 	if ((len % inblksz) != 0 &&
 	    (len < inzp->z_size - inoff || len < outzp->z_size - outoff)) {
-		error = SET_ERROR(EXDEV);
+		error = SET_ERROR(EINVAL);
 		goto unlock;
 	}
 
@@ -1240,17 +1240,9 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
 		nbps = maxblocks;
 		error = dmu_read_l0_bps(inos, inzp->z_id, inoff, size, bps,
 		    &nbps);
-		if (error != 0) {
-			/*
-			 * If we are tyring to clone a block that was created
-			 * in the current transaction group. Return an error,
-			 * so the caller can fallback to just copying the data.
-			 */
-			if (error == EAGAIN) {
-				error = SET_ERROR(EXDEV);
-			}
+		if (error != 0)
 			break;
-		}
+
 		/*
 		 * Encrypted data is fine as long as it comes from the same
 		 * dataset.

I'm not super committed to this though. Definitely keen to hear what you (and others) think!

@robn
Copy link
Member

robn commented Aug 2, 2023

On the other hand, copy_file_range needs a clear signal about when it should fallback vs returning the error, and I wasn't sure if simple error codes were unambiguous enough to signal that.

An alternate approach might be to always use the fallback except in cases where we know the call should fail. copy_file_range is "I don't care how this happens", so maybe we should let the fallback give it a try. If it can't perform, it'll return its own error code anyway and if it can, great, the application got what it asked for.

@oromenahar
Copy link
Contributor Author

oromenahar commented Aug 2, 2023

hey
first:

So for this PR, I would say the first three cases

I was convinced a few minutes ago, while waiting for some tea water, this will be changed. I converted it in the os/linux/zpl_clone code so just return EINVAL

I'm not sure about the fourth case. I'm thinking about this a few days now as well. I'm going to reconsider it tomorrow.

An alternate approach might be to always use the fallback except in cases where we know the call should fail. copy_file_range is "I don't care how this happens", so maybe we should let the fallback give it a try. If it can't perform, it'll return its own error code anyway and if it can, great, the application got what it asked for.

I don't think this is a good idea, because other people expecting some cases. For example when I try to reflink a file across different dataset I expect an error and maybe don't want or expect your described behavior. Bad example with the reflink I know, but I think you will get the point. ZFS is different in many cases, like blockdevices and datasets on the same pool. It does a lot of tasks, like raid, encryption, integrity and much more. From a linux perspective it's not handling a small task and this as good as possible. Maybe you are right and we can go with your approach. My personal opinion would be when I request a reflink please do the reflink if possible. I don't care about cross device reflink errors or whatever. The data will be saved on the pool and if we make it clear in the documentation that will be fine. I'm unsure and would wait for some other people and there opinion.

@oromenahar oromenahar changed the title zfs_clone_range should return a descriptive ENOTTY zfs_clone_range should return a more descriptive return value Aug 2, 2023
@robn
Copy link
Member

robn commented Aug 2, 2023

My personal opinion would be when I request a reflink please do the reflink if possible.

To be clear, I was only saying "always fallback" for copy_file_range. FICLONERANGE should always fail if it can't clone.

@oromenahar
Copy link
Contributor Author

oromenahar commented Aug 3, 2023

My personal opinion would be when I request a reflink please do the reflink if possible.

To be clear, I was only saying "always fallback" for copy_file_range. FICLONERANGE should always fail if it can't clone.

Yes I agree. Thanks for your ideas.

module/os/linux/zfs/zpl_file_range.c Outdated Show resolved Hide resolved
@oromenahar
Copy link
Contributor Author

I changed the first three cases to EINVAL, how can I reference you @robn you provided the patch wich I used as a base.

The fourth case, where the source block hasn't been written yet, is trickier. Probably I would just leave it as EAGAIN; EBUSY might work too. This is one of the tricky-to-adapt ones I mentioned before, and really zfs_clone_range shouldn't be doing any of this.

what about ENOSYS? The function is not yet implemented and the caller want's to use. I would go with this, what do you think about it?

I added a new case, which I use ENOSYS as well.

@oromenahar
Copy link
Contributor Author

oromenahar commented Aug 3, 2023

An alternate approach might be to always use the fallback except in cases where we know the call should fail.

I checked some other filesystems how they handle this in a recent kernel version (6.3) and them doing something like this (copy from nfs4file.c ceph handles the same way):

	if (ret == -EOPNOTSUPP || ret == -EXDEV)
		ret = generic_copy_file_range(file_in, pos_in, file_out,
					      pos_out, count, flags);
	return ret;

I added an always fallback, that's not the best case. For example when the filesystem is read only or we don't have permission, then we don't need a generic copy. This case should fail. Do you know any more cases?

If we try to exclude the cases where we should fail and return the error code the if-statement can get very long if (ret != ... && ret != ... && ret != ...). Also the other cases around. And if anybody changes something in zfs_clone_range it must be changed in copy_file_range as well. I don't like this kind of code writing, people will not reconize it or forgot it. For that reason it might be a good Idea to make the unecessary function calls and let generic copy figure it out again? What do you and other people think about it?

@robn
Copy link
Member

robn commented Aug 4, 2023

what about ENOSYS? The function is not yet implemented and the caller want's to use. I would go with this, what do you think about it?

No, because the function is implemented. That particular situation is that the block was created in the same txg as the clone attempt, so the block is not on disk yet. So the existing EAGAIN ("Resource temporary unavailable") is totally appropriate there.

@robn
Copy link
Member

robn commented Aug 4, 2023

An alternate approach might be to always use the fallback except in cases where we know the call should fail.

I checked some other filesystems how they handle this in a recent kernel version (6.3) and them doing something like this (copy from nfs4file.c ceph handles the same way):

	if (ret == -EOPNOTSUPP || ret == -EXDEV)
		ret = generic_copy_file_range(file_in, pos_in, file_out,
					      pos_out, count, flags);
	return ret;

I added an always fallback, that's not the best case. For example when the filesystem is read only or we don't have permission, then we don't need a generic copy. This case should fail. Do you know any more cases?

If we try to exclude the cases where we should fail and return the error code the if-statement can get very long if (ret != ... && ret != ... && ret != ...). Also the other cases around. And if anybody changes something in zfs_clone_range it must be changed in copy_file_range as well. I don't like this kind of code writing, people will not reconize it or forgot it. For that reason it might be a good Idea to make the unecessary function calls and let generic copy figure it out again? What do you and other people think about it?

Yeah, all this is why it wasn't an obvious choice for me!

At this point I'm leaning towards explicitly listing the fallback conditions (ie how we already have it). I think once we have it settled they won't change much, and it gives us better control if Linux decides to change something in the future - I wouldn't want a future Linux to start treating one of our error codes differently and something weird happen. Better that it breaks and we hear about it. And, if it does break, the user still has a normal copy or an explicit clone available to them.

I did think of keeping a list (macro) of fallback conditions, but its not really any better and more importantly, FreeBSD will have to handle this too and the conditions where it is safe to fallback might be different to Linux, so there's nothing much gained really.

@oromenahar
Copy link
Contributor Author

oromenahar commented Aug 4, 2023

So the existing EAGAIN ("Resource temporary unavailable") is totally appropriate there.

Okay sounds plausible.

I did think of keeping a list (macro) of fallback conditions, but its not really any better and more importantly, FreeBSD will have to handle this too and the conditions where it is safe to fallback might be different to Linux, so there's nothing much gained really.

I agree I don't think it make sense at this point, maybe later, when other tools using zfs_clone_range and we need to change it on several files.

What do you think about: https://github.com/openzfs/zfs/pull/15148/files#diff-9b39a50c6603b448db8226a521d86c3f009780b37e70c40cd03c79bd2047d1e0R1268 ? This could be treated as invalid arguments as well. Like you wanna clone data from encrypted dataset to another, than the destination is the invalid argument. I try to think what other people and/or me would expect from an error code. I'm curious what you think about this return value part!

At this point I'm leaning towards explicitly listing the fallback conditions (ie how we already have it). [...] better control if Linux decides to change something [...] Better that it breaks and we hear about it.

That are strong arguments. I would say too we should use the way how we have already done it in the past. I'm thinking about refactoring it and adding a second function which is called is_clone_range_allowed(...) and or is_clone_or_copy_range_allowed(...) or something similar. But I don't want to investigate it right now and don't want to invest the time to rewrite this part. Other parts are more important maybe somebody else will do it.

I think we agree in the most parts now. Will wait for an opinion about https://github.com/openzfs/zfs/pull/15148/files#diff-9b39a50c6603b448db8226a521d86c3f009780b37e70c40cd03c79bd2047d1e0R1268 and than push a commit. Would be nice to have a stable interface which don't change much later in the 2.2-release.

@robn
Copy link
Member

robn commented Aug 4, 2023

This could be treated as invalid arguments as well. Like you wanna clone data from encrypted dataset to another, than the destination is the invalid argument. I try to think what other people and/or me would expect from an error code. I'm curious what you think about this return value part!

I think EXDEV is the right one here - what the caller asked for is sensible, but we can't do it because going cross-"device" (filesystem, dataset, etc) is complicated.

It's not EINVAL. That's almost always the caller asking for something that makes no sense. In this case they asked for something quite sensible, but ZFS can't do it because of its own internal reasons (one day we might be able to do cross-dataset cloning for encrypted blocks (eg #14705) and then we'll be able to give the caller what they asked for).

Would be nice to have a stable interface which don't change much later in the 2.2-release.

Agreed! You're doing great shaking all this stuff out, thanks so much!

Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block wich should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Signed-off-by: Kay Pedersen <[email protected]>
@oromenahar
Copy link
Contributor Author

@robn think it's finished. If you agree @behlendorf can merge it to master.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having read through all the discussion I think the modified return values are pretty reasonable. Thanks for working through this! @robn if you're okay with the updated PR I'll go ahead and merge it.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 7, 2023
@robn
Copy link
Member

robn commented Aug 8, 2023

@oromenahar, top work! @behlendorf yep, good to go!

(sorry both, was afk most of the weekend and forgot this 🏖️)

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Aug 8, 2023
@behlendorf behlendorf merged commit 019dea0 into openzfs:master Aug 8, 2023
19 checks passed
@oromenahar oromenahar deleted the fix/brtspeakingreturnvalue branch August 8, 2023 18:45
usaleem-ix pushed a commit to truenas/zfs that referenced this pull request Aug 17, 2023
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15148
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 25, 2023
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15148
behlendorf pushed a commit that referenced this pull request Aug 25, 2023
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes #15148
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
Return the more descriptive error codes instead of `EXDEV` when
the parameters don't match the requirements of the clone function.
Updated the comments in `brt.c` accordingly.
The first three errors are just invalid parameters, which zfs can
not handle.
The fourth error indicates that the block which should be cloned
is created and cloned or modified in the same transaction
group (`txg`).

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Kay Pedersen <[email protected]>
Closes openzfs#15148
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants