Segmentation fault in _ctypes when _type_ can't be converted to UTF-8 #137037

@3xt3r

Bug report

Bug description:

Summary

The initial goal was to confirm whether a segmentation fault could occur in the following code path from _ctypes.c:

if (dict != NULL && dict->proto != NULL) {
    if (PyUnicode_Check(dict->proto)
        && (strchr("sPzUZXO", PyUnicode_AsUTF8(dict->proto)[0]))) {
        return 1;
    }
}

The hypothesis: if dict->proto is a malformed PyUnicodeObject (e.g. one that bypassed PyUnicode_READY()), then PyUnicode_AsUTF8() may return NULL or point to invalid memory. The result is indexed with [0] before strchr() runs, so the crash can occur at that dereference, inside strchr(), or in later Unicode processing.

This behavior was confirmed by crafting an invalid Unicode object in C and assigning it to _type_ in a ctypes.POINTER subclass.


PoC

badproto.c

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>

static PyObject* crash_on_utf8(PyObject *self, PyObject *args) {
    PyObject *u = PyUnicode_New(5, 127);

    if (!u) {
        PyErr_SetString(PyExc_RuntimeError, "PyUnicode_New failed");
        return NULL;
    }

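    /* Deliberately corrupt the object: clear the "ready" bit so the string
       looks un-finalized (this field only exists before CPython 3.12). */
    ((PyASCIIObject *)u)->state.ready = 0;
    /* No NULL check on purpose: the PoC exists to trigger the crash. */
    const char *utf8 = PyUnicode_AsUTF8(u);
    char c = utf8[0];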

    return PyLong_FromLong((long)c);
}

static PyMethodDef Methods[] = {
    {"crash_on_utf8", crash_on_utf8, METH_NOARGS, "Force PyUnicode_AsUTF8 to segfault."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef mod = {
    PyModuleDef_HEAD_INIT,
    "badproto",
    NULL,
    -1,
    Methods
};

PyMODINIT_FUNC PyInit_badproto(void) {
    return PyModule_Create(&mod);
}

test.py

import badproto
badproto.crash_on_utf8()

Analysis

  • The crash path begins in cast_check_pointertype() in _ctypes.c.
  • That code assumes _type_ is a valid Unicode object and that PyUnicode_AsUTF8() is safe to call.
  • However, if the object was created via C and is in an invalid state (e.g., ready = 0), this assumption may be broken.
  • Consequences:
    • PyUnicode_AsUTF8() may return NULL, in which case the [0] index dereferences a null pointer before strchr(...) is even called.
    • Or it may cause a deep crash elsewhere (e.g. find_maxchar_surrogates()).

GDB Trace

Program terminated with signal SIGSEGV, Segmentation fault.
#0  find_maxchar_surrogates (begin=0x0, end=0x70, ...) at Objects/unicodeobject.c:1790

Expected behavior

CPython should defensively reject _type_ values that are not fully initialized Unicode objects, or at least guard against NULL from PyUnicode_AsUTF8().

Adding an explicit PyUnicode_READY() call in this path might be appropriate.
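
As a rough illustration of the NULL guard suggested above, here is a minimal sketch written as a plain C helper. The function name proto_code_is_pointer_like is hypothetical and not part of CPython; its utf8 parameter stands in for whatever PyUnicode_AsUTF8(dict->proto) returned. The empty-string check is an extra assumption on my part: strchr() also matches the terminating NUL of the set, so an empty _type_ would otherwise be accepted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not CPython code. `utf8` models the pointer
 * returned by PyUnicode_AsUTF8(dict->proto), which may be NULL. */
static int
proto_code_is_pointer_like(const char *utf8)
{
    if (utf8 == NULL) {
        /* Conversion failed: never index the pointer or pass it on. */
        return 0;
    }
    if (utf8[0] == '\0') {
        /* strchr() would match the set's terminating NUL, so reject "". */
        return 0;
    }
    /* Same format-code set as the original check in _ctypes.c. */
    return strchr("sPzUZXO", utf8[0]) != NULL;
}
```

In the real code path, a NULL return from PyUnicode_AsUTF8() also leaves an exception set, so the caller would need to either propagate or clear it; this sketch does not attempt to show that part.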


Notes

This is not reachable from pure Python: it depends on constructing an invalid PyUnicodeObject in C. However, it reveals a potentially unsafe assumption in _ctypes that may be worth hardening.

core.dmp

CPython versions tested on:

3.9

Operating systems tested on:

Linux

Labels

  • extension-modules: C modules in the Modules dir
  • pending: The issue will be closed if no feedback is provided
  • topic-ctypes
  • type-crash: A hard crash of the interpreter, possibly with a core dump
