Skip to content

Commit

Permalink
pythongh-111089: Revert PyUnicode_AsUTF8() changes (python#111833)
Browse files Browse the repository at this point in the history
* Revert "pythongh-111089: Use PyUnicode_AsUTF8() in Argument Clinic (python#111585)"

This reverts commit d9b606b.

* Revert "pythongh-111089: Use PyUnicode_AsUTF8() in getargs.c (python#111620)"

This reverts commit cde1071.

* Revert "pythongh-111089: PyUnicode_AsUTF8() now raises on embedded NUL (python#111091)"

This reverts commit d731579.

* Revert "pythongh-111089: Add PyUnicode_AsUTF8() to the limited C API (python#111121)"

This reverts commit d8f32be.

* Revert "pythongh-111089: Use PyUnicode_AsUTF8() in sqlite3 (python#111122)"

This reverts commit 37e4e20.
  • Loading branch information
vstinner authored Nov 7, 2023
1 parent ea970fb commit 11e8348
Show file tree
Hide file tree
Showing 50 changed files with 952 additions and 244 deletions.
8 changes: 0 additions & 8 deletions Doc/c-api/unicode.rst
Original file line number Diff line number Diff line change
Expand Up @@ -992,19 +992,11 @@ These are the UTF-8 codec APIs:
As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
Raise an exception if the *unicode* string contains embedded null
characters. To accept embedded null characters and truncate on purpose
at the first null byte, ``PyUnicode_AsUTF8AndSize(unicode, NULL)`` can be
used instead.
.. versionadded:: 3.3
.. versionchanged:: 3.7
The return type is now ``const char *`` rather of ``char *``.
.. versionchanged:: 3.13
Raise an exception if the string contains embedded null characters.
UTF-32 Codecs
"""""""""""""
Expand Down
1 change: 0 additions & 1 deletion Doc/data/stable_abi.dat

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 0 additions & 9 deletions Doc/whatsnew/3.13.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1149,9 +1149,6 @@ New Features
:c:func:`PyErr_WriteUnraisable`, but allow to customize the warning mesage.
(Contributed by Serhiy Storchaka in :gh:`108082`.)

* Add :c:func:`PyUnicode_AsUTF8` function to the limited C API.
(Contributed by Victor Stinner in :gh:`111089`.)


Porting to Python 3.13
----------------------
Expand Down Expand Up @@ -1222,12 +1219,6 @@ Porting to Python 3.13
Note that ``Py_TRASHCAN_BEGIN`` has a second argument which
should be the deallocation function it is in.

* The :c:func:`PyUnicode_AsUTF8` function now raises an exception if the string
contains embedded null characters. To accept embedded null characters and
truncate on purpose at the first null byte,
``PyUnicode_AsUTF8AndSize(unicode, NULL)`` can be used instead.
(Contributed by Victor Stinner in :gh:`111089`.)

* On Windows, ``Python.h`` no longer includes the ``<stddef.h>`` standard
header file. If needed, it should now be included explicitly. For example, it
provides ``offsetof()`` function, and ``size_t`` and ``ptrdiff_t`` types.
Expand Down
16 changes: 16 additions & 0 deletions Include/cpython/unicodeobject.h
Original file line number Diff line number Diff line change
Expand Up @@ -440,6 +440,22 @@ PyAPI_FUNC(PyObject*) PyUnicode_FromKindAndData(
const void *buffer,
Py_ssize_t size);

/* --- Manage the default encoding ---------------------------------------- */

/* Returns a pointer to the default encoding (UTF-8) of the
Unicode object unicode.
Like PyUnicode_AsUTF8AndSize(), this also caches the UTF-8 representation
in the unicodeobject.
_PyUnicode_AsString is a #define for PyUnicode_AsUTF8 to
support the previous internal function with the same behaviour.
Use of this API is DEPRECATED since no size information can be
extracted from the returned data.
*/

PyAPI_FUNC(const char *) PyUnicode_AsUTF8(PyObject *unicode);

/* === Characters Type APIs =============================================== */

Expand Down
30 changes: 11 additions & 19 deletions Include/unicodeobject.h
Original file line number Diff line number Diff line change
Expand Up @@ -443,25 +443,17 @@ PyAPI_FUNC(PyObject*) PyUnicode_AsUTF8String(
PyObject *unicode /* Unicode object */
);

// Returns a pointer to the UTF-8 encoding of the Unicode object unicode.
//
// Raise an exception if the string contains embedded null characters.
// Use PyUnicode_AsUTF8AndSize() to accept embedded null characters.
//
// This function caches the UTF-8 encoded string in the Unicode object
// and subsequent calls will return the same string. The memory is released
// when the Unicode object is deallocated.
PyAPI_FUNC(const char *) PyUnicode_AsUTF8(PyObject *unicode);

// Returns a pointer to the UTF-8 encoding of the
// Unicode object unicode and the size of the encoded representation
// in bytes stored in `*size` (if size is not NULL).
//
// On error, `*size` is set to 0 (if size is not NULL).
//
// This function caches the UTF-8 encoded string in the Unicode object
// and subsequent calls will return the same string. The memory is released
// when the Unicode object is deallocated.
/* Returns a pointer to the default encoding (UTF-8) of the
Unicode object unicode and the size of the encoded representation
in bytes stored in *size.
In case of an error, no *size is set.
This function caches the UTF-8 encoded string in the unicodeobject
and subsequent calls will return the same string. The memory is released
when the unicodeobject is deallocated.
*/

#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030A0000
PyAPI_FUNC(const char *) PyUnicode_AsUTF8AndSize(
PyObject *unicode,
Expand Down
5 changes: 1 addition & 4 deletions Lib/test/test_capi/test_unicode.py
Original file line number Diff line number Diff line change
Expand Up @@ -914,10 +914,7 @@ def test_asutf8(self):
self.assertEqual(unicode_asutf8('abc', 4), b'abc\0')
self.assertEqual(unicode_asutf8('абв', 7), b'\xd0\xb0\xd0\xb1\xd0\xb2\0')
self.assertEqual(unicode_asutf8('\U0001f600', 5), b'\xf0\x9f\x98\x80\0')

# disallow embedded null characters
self.assertRaises(ValueError, unicode_asutf8, 'abc\0', 0)
self.assertRaises(ValueError, unicode_asutf8, 'abc\0def', 0)
self.assertEqual(unicode_asutf8('abc\0def', 8), b'abc\0def\0')

self.assertRaises(UnicodeEncodeError, unicode_asutf8, '\ud8ff', 0)
self.assertRaises(TypeError, unicode_asutf8, b'abc', 0)
Expand Down
1 change: 0 additions & 1 deletion Lib/test/test_stable_abi_ctypes.py

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

This file was deleted.

This file was deleted.

2 changes: 0 additions & 2 deletions Misc/stable_abi.toml
Original file line number Diff line number Diff line change
Expand Up @@ -2478,8 +2478,6 @@
added = '3.13'
[function.PySys_AuditTuple]
added = '3.13'
[function.PyUnicode_AsUTF8]
added = '3.13'
[function._Py_SetRefcnt]
added = '3.13'
abi_only = true
30 changes: 25 additions & 5 deletions Modules/_io/clinic/_iomodule.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions Modules/_io/clinic/fileio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

23 changes: 19 additions & 4 deletions Modules/_io/clinic/textio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions Modules/_io/clinic/winconsoleio.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions Modules/_multiprocessing/clinic/multiprocessing.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 7 additions & 2 deletions Modules/_multiprocessing/clinic/semaphore.c.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions Modules/_multiprocessing/posixshmem.c
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ _posixshmem_shm_open_impl(PyObject *module, PyObject *path, int flags,
{
int fd;
int async_err = 0;
const char *name = PyUnicode_AsUTF8(path);
const char *name = PyUnicode_AsUTF8AndSize(path, NULL);
if (name == NULL) {
return -1;
}
Expand Down Expand Up @@ -87,7 +87,7 @@ _posixshmem_shm_unlink_impl(PyObject *module, PyObject *path)
{
int rv;
int async_err = 0;
const char *name = PyUnicode_AsUTF8(path);
const char *name = PyUnicode_AsUTF8AndSize(path, NULL);
if (name == NULL) {
return NULL;
}
Expand Down
Loading

0 comments on commit 11e8348

Please sign in to comment.