Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-90370: Avoid temporary varargs tuple creation in argument passing #30312

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

colorfulappl
Copy link
Contributor

@colorfulappl colorfulappl commented Dec 31, 2021

When "Augument Clinic generated code" are parsing arguments, the args are packed to a tuple before passing to callee. This may be unnecessary.

Pass a raw pointer which points to on-stack varargs, and a varargssize integer to indicate how many varargs are passed, can save the time of tuple creation/destruction and value copy.

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@colorfulappl

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments here.

I wonder if it would be simpler to restrict the scope of such a change to only signatures like def f(*args, kw1=<default1>, kw2=<default2>, ...), like print()/min()/max() have.

@@ -3341,16 +3342,16 @@ test_vararg_and_posonly(PyObject *module, PyObject *const *args, Py_ssize_t narg
for (Py_ssize_t i = 0; i < nargs - 1; ++i) {
PyTuple_SET_ITEM(__clinic_args, i, args[1 + i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems suspicious -- isn't __clinic_args a pointer-to-PyObject-pointer, not a pointer-to-tuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault. The tuple doesn't need to be created.

if (!args) {
goto exit;
}
a = args[0];
__clinic_args = args[1];
return_value = test_vararg_impl(module, a, __clinic_args);
__clinic_args = (PyObject *const *)args[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid such unsafe casts. Maybe passing a PyObject *** into _PyArg_UnpackKeywordsWithVarargFast instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copy and cast are unnecessary.
Maybe __clinic_args = args + 1; is enough, note the args refers to the 2nd argument of function test_vararg, assuming it's not assigned in line 3389.

Python/getargs.c Outdated
buf[vararg] = (PyObject *)&args[vararg];

/* copy required positional args */
for (i = 0; i < vararg; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible to avoid copying at all by directly passing some sub-buffer of the *args buffer into the ..._impl function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. None of the positional args or variable-length args need to be copied. I will let _PyArg_UnpackKeywordsWithVarargFast parse default and keyword args only.

@isidentical isidentical self-requested a review January 21, 2022 14:04
@isidentical
Copy link
Member

Thanks for the PR and benchmarks @colorfulappl, I am current blocked a bit but plan to review this soon!

@colorfulappl
Copy link
Contributor Author

Thanks for the PR and benchmarks @colorfulappl, I am current blocked a bit but plan to review this soon!

Thank you @isidentical !
With @sweeneyde 's comment I have found that my patch is actually buggy.
I am writing a new patch but blocked by some affairs last a few days.
You can review after I push a new commit and ignore this version. Sorry :)

@isidentical
Copy link
Member

Sure, then just ping me whenever the new patch ready and I'll try to take a look at it then!

@colorfulappl
Copy link
Contributor Author

Sorry for such a long delay.

When generate parsing code for function with varargs, this PR:

  1. Introduce a new function _PyArg_UnpackKeywordsWithVarargKwonly.
    It's a copy of _PyArg_UnpackKeywordsWithVararg but copying of positional args and packing of varargs are deleted.
    _PyArg_UnpackKeywordsWithVarargKwonly returns an array pointer of parsed keyword args only.
  2. Positional args are not copied to argsbuf. argsbuf only saves parsed keyword args.
  3. Pass varargsize and vararg array pointer to XXXX_impl, this avoid pack/unpack of varargs.

I'd really appreciate it if you could take a look and give me some advice @isidentical .

@colorfulappl
Copy link
Contributor Author

Benchmark Result

Benchmark:
https://bugs.python.org/file50533

Environment:
OS: macOS 12.3
CPU: Apple M1 Pro
Compiler: Apple clang version 13.1.6
Build args: --enable-optimizations --with-lto

+-----------------------------------------------------+---------+-----------------------+
| Benchmark                                           | base    | opt                   |
+=====================================================+=========+=======================+
| print(a)                                            | 133 ns  | 126 ns: 1.06x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, b, c)                                      | 337 ns  | 326 ns: 1.04x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, b, c, *v)                                  | 679 ns  | 661 ns: 1.03x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, sep='', file=stdout)                       | 134 ns  | 125 ns: 1.07x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, sep='', flush=True, file=stdout)           | 571 ns  | 563 ns: 1.01x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(*v, sep='', flush=True, file=stdout)          | 833 ns  | 819 ns: 1.02x faster  |
+-----------------------------------------------------+---------+-----------------------+
| print(a, b, c, *v, sep='', flush=True, file=stdout) | 1.18 us | 1.16 us: 1.02x faster |
+-----------------------------------------------------+---------+-----------------------+
| Geometric mean                                      | (ref)   | 1.03x faster          |
+-----------------------------------------------------+---------+-----------------------+

@colorfulappl
Copy link
Contributor Author

I found there are some bugs when argument clinic processes varargs and made a fix in #32092.
Maybe this patch should wait until these bugs are fixed?

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Feb 4, 2023

Just a heads-up, @colorfulappl: we prefer keeping all of the PR history since it makes reviewing easier, so please do not force-push. (Also documented in the devguide.) If you need to sync with main, you can do a git fetch --all && git merge --no-ff upstream/main.

@arhadthedev

This comment was marked as outdated.

@colorfulappl
Copy link
Contributor Author

we prefer keeping all of the PR history since it makes reviewing easier, so please do not force-push.

Thanks for the advice; I will pay attention to it.

I added some test cases for the class method __new__ (without actually creating the objects, only packing and returning the arguments). And I found that when the __new__ or __init__ function accepts a vararg only, the argument clinic generated code does not check keyword arguments.

e.g.

# Signature of this function:
#
# @classmethod
# _testclinic.TestClassVarargOnly.__new__(cls, *args)

test_class_new_vararg_only(kw=1)

# Here we expect an error like "TypeError: TestClassVarargOnly.__new__() got an unexpected keyword argument 'kw'",
# but nothing raised.

We may need to fix it in another issue.

@colorfulappl

This comment was marked as outdated.

@arhadthedev

This comment was marked as outdated.

max_pos
), indent=4))
p.converter.parser_name,
p.name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect here p.converter.name instead.

I'm trying your patch for converting some vararg functions in the math module (see the old conversion attempt), and it seems that without this change autogenerated code is broken, because it references a Python parameter name instead of C (i.e., coordinates, instead of args):

static PyObject *
math_hypot_impl(PyObject *module, Py_ssize_t varargssize,
                PyObject *const *args);

static PyObject *
math_hypot(PyObject *module, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *return_value = NULL;
    Py_ssize_t varargssize = nargs;
    PyObject *const *__clinic_args;

    if (!_PyArg_CheckPositional("hypot", nargs, 0, PY_SSIZE_T_MAX)) {
        goto exit;
    }
    __clinic_args = coordinates + 0;
    return_value = math_hypot_impl(module, varargssize, __clinic_args);

exit:
    return return_value;
}

See docs. BTW, I think there should be a test for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding "args" will work too, but I'm not sure it's a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice.
When the function signature is METH_FASTCALL, the pass-in parameter is always named args, so replacing p.name with args works.
I made a fix and added two tests. It looks correct now.

def test_vararg_with_default_with_other_name(self):
with self.assertRaises(TypeError):
ac_tester.vararg_with_default_with_other_name()
self.assertEqual(ac_tester.vararg_with_default_with_other_name(1, b=False), (1, (), False))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add an exception test with _b.

@skirpichev
Copy link
Member

Not sure, if this is a good idea, but could you consider using nargs instead of varargsize for *_impl() functions? This will reduce needs for changing the body of converted function (take math_gcd as an example to convert from scratch).

@colorfulappl
Copy link
Contributor Author

When "Augument Clinic generated code" are parsing arguments, the args are packed to a tuple before passing to callee. This may be unnecessary.

This patch intends to reduce the overhead on passing vararg from Argument Clinic generated code to user-implemented *.impl() function.

Before this patch, vararg arguments are packed to a tuple, *_impl() function fetches these arguments by access the tuple items. In such a case, the number of vararg is implied in the tuple object.
After this patch, a pointer which points to vararg array is passed to *_impl() function directly, so the varargssize is necessary to tell the user how many vararg arguments is received.

but could you consider using nargs instead of varargsize for *_impl() functions?

@skirpichev Do you mean pass both nargs and max_pos to *_impl() function and let the user to calculate the number of vararg arguments? This may be confusing for users, and violate the purpose of Argument Clinic (Argument Clinic How-To - Abstract):

Argument Clinic is a preprocessor for CPython C files. Its purpose is to automate all the boilerplate involved with writing argument parsing code for “builtins”.

@skirpichev
Copy link
Member

Indeed, my suggestion only valid for only-vararg case (as gcd, lcm and so on). Not sure if this case worth a special treatment...

@erlend-aasland erlend-aasland changed the title bpo-46212: Avoid temporary varargs tuple creation in argument passing gh-90370: Avoid temporary varargs tuple creation in argument passing Feb 27, 2023
@skirpichev
Copy link
Member

@colorfulappl, could you fix merge conflicts?

@skirpichev skirpichev added the stale Stale PR or inactive for long period of time. label Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review performance Performance or resource usage stale Stale PR or inactive for long period of time. topic-argument-clinic
Projects
None yet
Development

Successfully merging this pull request may close these issues.