zfs_clone_range should return a more descriptive return value #15148
Heh, I started looking at this a couple of nights ago but got into the weeds a bit, as I will now describe.
On the one hand, I would like the errors returned
On the other hand, (This was the point where I decided that I'd need to play around, which I haven't done yet).
So I lean softly towards the first option; make
So for this PR, I would say the first three cases that are checking that the incoming ranges are sensibly aligned should return
The fourth case, where the source block hasn't been written yet, is trickier. Probably I would just leave it as
So my version of this would be something like (totally untested):
diff --git module/os/freebsd/zfs/zfs_vnops_os.c module/os/freebsd/zfs/zfs_vnops_os.c
index 45cf6fdfc..89cd8b4fb 100644
--- module/os/freebsd/zfs/zfs_vnops_os.c
+++ module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6290,7 +6290,7 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
error = zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp),
ap->a_outoffp, &len, ap->a_outcred);
- if (error == EXDEV || error == EOPNOTSUPP)
+ if (error == EXDEV || error == EOPNOTSUPP || error == EINVAL)
goto bad_locked_fallback;
*ap->a_lenp = (size_t)len;
out_locked:
diff --git module/os/linux/zfs/zpl_file_range.c module/os/linux/zfs/zpl_file_range.c
index 72384b638..e9185c1ca 100644
--- module/os/linux/zfs/zpl_file_range.c
+++ module/os/linux/zfs/zpl_file_range.c
@@ -103,7 +103,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
* Since Linux 5.3 the filesystem driver is responsible for executing
* an appropriate fallback, and a generic fallback function is provided.
*/
- if (ret == -EOPNOTSUPP || ret == -EXDEV)
+ if (ret == -EOPNOTSUPP || ret == -EXDEV || ret == -EINVAL)
ret = generic_copy_file_range(src_file, src_off, dst_file,
dst_off, len, flags);
#else
@@ -111,7 +111,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
* Before Linux 5.3 the filesystem has to return -EOPNOTSUPP to signal
* to the kernel that it should fallback to a content copy.
*/
- if (ret == -EXDEV)
+ if (ret == -EXDEV || ret == -EINVAL)
ret = -EOPNOTSUPP;
#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE */
diff --git module/zfs/zfs_vnops.c module/zfs/zfs_vnops.c
index 54ea43363..0a77c19ef 100644
--- module/zfs/zfs_vnops.c
+++ module/zfs/zfs_vnops.c
@@ -1171,7 +1171,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
* We cannot clone into files with different block size.
*/
if (inblksz != outzp->z_blksz && outzp->z_size > inblksz) {
- error = SET_ERROR(EXDEV);
+ error = SET_ERROR(EINVAL);
goto unlock;
}
@@ -1179,7 +1179,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
* Offsets and len must be at block boundries.
*/
if ((inoff % inblksz) != 0 || (outoff % inblksz) != 0) {
- error = SET_ERROR(EXDEV);
+ error = SET_ERROR(EINVAL);
goto unlock;
}
/*
@@ -1187,7 +1187,7 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
*/
if ((len % inblksz) != 0 &&
(len < inzp->z_size - inoff || len < outzp->z_size - outoff)) {
- error = SET_ERROR(EXDEV);
+ error = SET_ERROR(EINVAL);
goto unlock;
}
@@ -1240,17 +1240,9 @@ zfs_clone_range(znode_t *inzp, uint64_t *inoffp, znode_t *outzp,
nbps = maxblocks;
error = dmu_read_l0_bps(inos, inzp->z_id, inoff, size, bps,
&nbps);
- if (error != 0) {
- /*
- * If we are tyring to clone a block that was created
- * in the current transaction group. Return an error,
- * so the caller can fallback to just copying the data.
- */
- if (error == EAGAIN) {
- error = SET_ERROR(EXDEV);
- }
+ if (error != 0)
break;
- }
+
/*
* Encrypted data is fine as long as it comes from the same
* dataset.
I'm not super committed to this though. Definitely keen to hear what you (and others) think!
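The three alignment checks touched by the diff above can be read as one standalone predicate. What follows is a minimal sketch under stated assumptions, not the ZFS code: `struct file_info` and `check_clone_range` are hypothetical names, the struct only stands in for the `z_size` and `z_blksz` fields the diff reads, and the offsets are assumed to have been range-checked already by code not shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two znode fields the checks read. */
struct file_info {
	uint64_t size;	/* file size in bytes (z_size in the diff) */
	uint64_t blksz;	/* block size in bytes (z_blksz in the diff) */
};

/*
 * Returns 0 if the range looks cloneable, EINVAL otherwise.
 * Mirrors the three conditions from the proposed diff; assumes the
 * caller has already verified that inoff and outoff are sane.
 */
static int
check_clone_range(const struct file_info *in, uint64_t inoff,
    const struct file_info *out, uint64_t outoff, uint64_t len)
{
	uint64_t inblksz = in->blksz;

	assert(inblksz != 0);	/* avoid dividing by zero below */

	/* 1. Cannot clone into a file with a different block size. */
	if (inblksz != out->blksz && out->size > inblksz)
		return (EINVAL);

	/* 2. Both offsets must sit on a block boundary. */
	if ((inoff % inblksz) != 0 || (outoff % inblksz) != 0)
		return (EINVAL);

	/* 3. A partial block is only acceptable at the end of both files. */
	if ((len % inblksz) != 0 &&
	    (len < in->size - inoff || len < out->size - outoff))
		return (EINVAL);

	return (0);
}
```

With 4096-byte blocks, a fully aligned request passes, an offset of 512 is rejected, and a short tail is accepted only when it reaches the end of both files.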
An alternate approach might be to always use the fallback except in cases where we know the call should fail.
hey
I was convinced a few minutes ago, while waiting for some tea water, that this will be changed. I converted it in the os/linux/zpl_clone code so just return I'm not sure about the fourth case. I've been thinking about this for a few days now as well. I'm going to reconsider it tomorrow.
I don't think this is a good idea, because other people expect certain behavior in some cases. For example, when I try to reflink a file across different datasets I expect an error, and maybe don't want or expect the behavior you describe. It's a bad example with the reflink, I know, but I think you get the point. ZFS is different in many cases, like block devices and datasets on the same pool. It does a lot of tasks, like RAID, encryption, integrity and much more. From a Linux perspective it isn't handling one small task as well as possible. Maybe you are right and we can go with your approach. My personal opinion would be: when I request a reflink, please do the reflink if possible. I don't care about cross-device reflink errors or whatever. The data will be saved on the pool, and if we make it clear in the documentation, that will be fine. I'm unsure and would wait for some other people and their opinions.
To be clear, I was only saying "always fallback" for
Yes I agree. Thanks for your ideas.
I changed the first three cases to
what about I added a new case, which I use
I checked how some other filesystems handle this in a recent kernel version (6.3), and they do something like this (copy from
if (ret == -EOPNOTSUPP || ret == -EXDEV)
	ret = generic_copy_file_range(file_in, pos_in, file_out,
	    pos_out, count, flags);
return ret;
I added an always-fallback, but that's not the best case. For example, when the filesystem is read-only or we don't have permission, we don't need a generic copy. This case should fail. Do you know any more cases? If we try to exclude the cases where we should fail and return the error code the
No, because the function is implemented. That particular situation is that the block was created in the same txg as the clone attempt, so the block is not on disk yet. So the existing
Yeah, all this is why it wasn't an obvious choice for me! At this point I'm leaning towards explicitly listing the fallback conditions (i.e. how we already have it). I think once we have it settled they won't change much, and it gives us better control if Linux decides to change something in the future. I wouldn't want a future Linux to start treating one of our error codes differently and have something weird happen. Better that it breaks and we hear about it. And, if it does break, the user still has a normal copy or an explicit clone available to them. I did think of keeping a list (macro) of fallback conditions, but it's not really any better and, more importantly, FreeBSD will have to handle this too, and the conditions where it is safe to fall back might be different to Linux, so there's not much gained really.
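The "explicitly list the fallback conditions" approach can be sketched as a small predicate. This is only an illustration: the function names are hypothetical, and the set of error codes is the one from the untested diff earlier in this thread, which is not necessarily what was finally merged. Anything not listed (a read-only filesystem, a permission failure, and so on) is passed back to the caller rather than hidden behind a content copy.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: should a failed clone be retried as a plain
 * content copy? Uses Linux-style negative errno values. The listed
 * codes follow the proposed diff and are an assumption, not a
 * statement about the merged code.
 */
static bool
clone_error_allows_fallback(int ret)
{
	switch (ret) {
	case -EOPNOTSUPP:	/* cloning not supported here */
	case -EXDEV:		/* genuinely cross-device request */
	case -EINVAL:		/* range not aligned for cloning */
		return (true);
	default:
		/* Success, or a real failure the caller must see. */
		return (false);
	}
}

/*
 * Hypothetical helper for the pre-5.3 branch of the diff, where the
 * filesystem signals "please copy instead" by returning -EOPNOTSUPP.
 */
static int
clone_error_for_old_kernel(int ret)
{
	if (ret == -EXDEV || ret == -EINVAL)
		return (-EOPNOTSUPP);
	return (ret);
}
```

Keeping the list explicit means an unexpected error code surfaces as a failure instead of silently turning into a copy, which is the trade-off argued for above.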
Okay sounds plausible.
I agree, I don't think it makes sense at this point; maybe later, when other tools using What do you think about: https://github.com/openzfs/zfs/pull/15148/files#diff-9b39a50c6603b448db8226a521d86c3f009780b37e70c40cd03c79bd2047d1e0R1268 ? This could be treated as an invalid argument as well. For instance, if you want to clone data from one encrypted dataset to another, then the destination is the invalid argument. I'm trying to think about what other people and I would expect from an error code. I'm curious what you think about this return value part!
Those are strong arguments. I would also say we should do it the way we have done it in the past. I'm thinking about refactoring it and adding a second function which is called I think we agree on most parts now. I will wait for an opinion about https://github.com/openzfs/zfs/pull/15148/files#diff-9b39a50c6603b448db8226a521d86c3f009780b37e70c40cd03c79bd2047d1e0R1268 and then push a commit. It would be nice to have a stable interface which doesn't change much later in the 2.2 release.
I think It's not
Agreed! You're doing great shaking all this stuff out, thanks so much!
Return the more descriptive error codes instead of `EXDEV` when the parameters don't match the requirements of the clone function. Updated the comments in `brt.c` accordingly. The first three errors are just invalid parameters, which zfs can not handle. The fourth error indicates that the block which should be cloned is created and cloned or modified in the same transaction group (`txg`). Signed-off-by: Kay Pedersen <[email protected]>
@robn I think it's finished. If you agree, @behlendorf can merge it to master.
Having read through all the discussion I think the modified return values are pretty reasonable. Thanks for working through this! @robn if you're okay with the updated PR I'll go ahead and merge it.
@oromenahar, top work! @behlendorf yep, good to go! (sorry both, was afk most of the weekend and forgot this 🏖️)
Return the more descriptive error codes instead of `EXDEV` when the parameters don't match the requirements of the clone function. Updated the comments in `brt.c` accordingly. The first three errors are just invalid parameters, which zfs can not handle. The fourth error indicates that the block which should be cloned is created and cloned or modified in the same transaction group (`txg`). Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Kay Pedersen <[email protected]> Closes openzfs#15148
Return the more descriptive `ENOTTY` instead of `EXDEV` when the parameters don't match the requirements of the clone function. `EINVAL` would also be possible because some of the parameters are invalid, but I decided to go with `ENOTTY`, which is a little bit more generic than `EINVAL`.
Motivation and Context
`zfs_clone_range` returned `EXDEV` for a lot of errors which are not really cross-device clone requests. This should be changed to match the return expectations of the function when calling it. It results in better and more stable code.
Description
Described in Motivation and Context. When I have time I will try to add some tests, but my time is limited, so sorry for the missing tests.
How Has This Been Tested?
Checked that it still compiles and made a few runs on the shell.