I'm a beginner in Python and I want to study the implementation of Python's built-in functions such as abs(), but in the file __builtin__.py I saw this:
Does anybody know how it works?
The built-in functions are implemented in the same language as the interpreter, so the source code is different depending on the Python implementation you are using (Jython, CPython, PyPy, etc). You are probably using CPython, so the abs() function is implemented in C. You can look at the real source code of this function here.
static PyObject *
builtin_abs(PyObject *module, PyObject *x)
{
    return PyNumber_Absolute(x);
}
The source code for PyNumber_Absolute (which is, arguably, more interesting) can be found here:
PyObject *
PyNumber_Absolute(PyObject *o)
{
    PyNumberMethods *m;
    if (o == NULL)
        return null_error();
    m = o->ob_type->tp_as_number;
    if (m && m->nb_absolute)
        return m->nb_absolute(o);
    return type_error("bad operand type for abs(): '%.200s'", o);
}
As you can see, the actual implementation of abs() calls nb_absolute(), which differs between object types. The one for float looks like this:
static PyObject *
float_abs(PyFloatObject *v)
{
    return PyFloat_FromDouble(fabs(v->ob_fval));
}
So, effectively, CPython is just using the C math library in this case. The same holds for other implementations of Python: Jython, for example, uses the functions from the Java math library.
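The dispatch through nb_absolute can also be observed from Python itself. The following is a minimal sketch, assuming CPython, where a class that defines `__abs__` fills that slot; `Celsius` and `NoAbs` are hypothetical names invented for this illustration.

```python
class Celsius:
    """Hypothetical example type; not part of CPython."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __abs__(self):
        # abs() looks this method up on the type, not on the instance.
        # In CPython it is reached through the type's nb_absolute slot.
        return Celsius(abs(self.degrees))


class NoAbs:
    """Hypothetical type without __abs__, so abs() has nothing to call."""


magnitude = abs(Celsius(-3.5)).degrees

try:
    abs(NoAbs())
except TypeError as exc:
    # This is the type_error() branch at the end of PyNumber_Absolute.
    error_message = str(exc)
```

Here `magnitude` ends up as 3.5, and `error_message` carries the same "bad operand type for abs()" text that appears in the C source.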
Viewing the source code of CPython on GitHub, I saw the method here:
https://github.com/python/cpython/blob/main/Python/bltinmodule.c
And more specifically:
static PyObject *
builtin_sorted(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    PyObject *newlist, *v, *seq, *callable;
    /* Keyword arguments are passed through list.sort() which will check
       them. */
    if (!_PyArg_UnpackStack(args, nargs, "sorted", 1, 1, &seq))
        return NULL;
    newlist = PySequence_List(seq);
    if (newlist == NULL)
        return NULL;
    callable = _PyObject_GetAttrId(newlist, &PyId_sort);
    if (callable == NULL) {
        Py_DECREF(newlist);
        return NULL;
    }
    assert(nargs >= 1);
    v = _PyObject_FastCallKeywords(callable, args + 1, nargs - 1, kwnames);
    Py_DECREF(callable);
    if (v == NULL) {
        Py_DECREF(newlist);
        return NULL;
    }
    Py_DECREF(v);
    return newlist;
}
I am not a C master, but I don't see an implementation of any of the known sorting algorithms, let alone the special sort that Python uses (I think it's called Timsort? Correct me if I'm wrong).
I would highly appreciate if you could help me "digest" this code and understand it, because as of right now I've got:
PyObject *newlist, *v, *seq, *callable;
This seems to create a new list, even though lists are mutable, no? Then why create a new one?
It also creates some other pointers, and I'm not sure why...
Then we unpack the rest of the arguments, as the comment suggests, and if they don't match what is expected (for the function 'sorted', for example) we break out...
I am pretty sure I am reading this all completely wrong, so I stopped here...
Thanks for the help in advance, and sorry for the multiple questions, but this block of code is blowing my mind and learning to read it would help me a lot!
The actual sorting is done by list.sort. sorted simply creates a new list from whatever iterable argument it is given, sorts that list in-place, then returns it. A pure Python implementation of sorted might look like
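def sorted(itr, *, key=None, reverse=False):
    newlist = list(itr)                      # copy the iterable into a new list
    newlist.sort(key=key, reverse=reverse)   # list.sort does the real work, in place
    return newlist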
Most of the C code is just boilerplate for working with the underlying C data structures, detecting and propagating errors, and doing memory management.
The actual sorting algorithm is spread throughout Objects/listobject.c; start here. If you are really interested in what the algorithm is, rather than how it is implemented in C, you may want to start with https://github.com/python/cpython/blob/main/Objects/listsort.txt instead.
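The division of labour described above can be checked from Python without reading any C. This is a small sketch using only the built-in `sorted` and `list.sort`; the variable names are made up for the illustration.

```python
words = ("pear", "Apple", "fig")   # a tuple: any iterable works, not just lists

result = sorted(words, key=str.lower, reverse=True)

# The same steps done by hand, mirroring the C function:
# copy into a new list, then sort that list in place.
by_hand = list(words)
by_hand.sort(key=str.lower, reverse=True)

# Keyword arguments are forwarded to list.sort(), which is what rejects
# unknown ones.
try:
    sorted(words, bogus=1)
except TypeError:
    rejected = True
else:
    rejected = False
```

The input tuple is left untouched, which is why a new list is created even though lists themselves are mutable: the argument may not be a list at all, and the caller's object should not be modified.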
The list sort implementation isn't in that function. builtin_sorted is a wrapper that looks up the new list's sort method through the PyId_sort identifier:
callable = _PyObject_GetAttrId(newlist, &PyId_sort);
object.h contains a macro using token pasting to define the PyId_xxx objects
#define _Py_IDENTIFIER(varname) _Py_static_string(PyId_##varname, #varname)
... and I stopped digging after that. There could be more macro magic involved in order to enforce coherent naming throughout the whole Python codebase.
The implementation is located here:
https://github.com/python/cpython/blob/main/Objects/listobject.c
More precisely around line 2240
static PyObject *
list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse)
/*[clinic end generated code: output=57b9f9c5e23fbe42 input=cb56cd179a713060]*/
{
Comments read:
/* An adaptive, stable, natural mergesort.  See listsort.txt.
 * Returns Py_None on success, NULL on error.  Even in case of error, the
 * list will be some permutation of its input state (nothing is lost or
 * duplicated).
 */
Now it takes some effort to understand the details of the algorithm but it's there.
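Two properties named in that comment are visible from Python: the sort is stable, and list.sort returns None on success. A short sketch, with made-up sample data:

```python
records = [("bob", 2), ("alice", 1), ("carol", 2), ("dave", 1)]

# Sort by the number only. A stable sort keeps records with equal keys
# in their original relative order.
return_value = records.sort(key=lambda record: record[1])

names_in_order = [name for name, _ in records]
```

After the call, `names_in_order` is `["alice", "dave", "bob", "carol"]`: alice stays ahead of dave and bob ahead of carol, exactly as in the input, and `return_value` is None.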
...or any Python object that exists in an importable library. I have found PyDateTime_* functions in the documentation for creating objects from the datetime module, but I can't find anything to do with the python decimal module. Is this possible?
I'm looking for a Boost.Python way if there is one, but the native APIs will suffice if not.
In Boost.Python that would be something like
bp::object decimal = bp::import("decimal").attr("Decimal");
bp::object decimal_obj = decimal(1, 4);
Should be straightforward enough. Although untested, something like the following should work:
PyObject * decimal_mod = PyImport_ImportModule("decimal");
assert(decimal_mod);
PyObject * decimal_ctor = PyObject_GetAttrString(decimal_mod, "Decimal");
assert(decimal_ctor);
PyObject * four = PyObject_CallFunction(decimal_ctor, "i", 4);
assert(four);
Do keep in mind that all three PyObject * references here should be decreffed (using Py_DECREF()) once you are done with them. Also, I use assert() here for pedagogical purposes. Actual code should have real error handling.
Also, I use the raw Python/C API here. I've never used boost-python, so I don't know what differences exist, if any.
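For comparison, the three C API calls above correspond one-to-one to ordinary Python operations. This is only a sketch of that correspondence, reusing the variable names from the C snippet; it is not a substitute for the C code.

```python
import importlib

# PyImport_ImportModule("decimal")
decimal_mod = importlib.import_module("decimal")

# PyObject_GetAttrString(decimal_mod, "Decimal")
decimal_ctor = getattr(decimal_mod, "Decimal")

# PyObject_CallFunction(decimal_ctor, "i", 4)
four = decimal_ctor(4)
```

Any importable object can be reached the same way: import the module, fetch the attribute by name, and call it.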
An example is python's file.__exit__ (i.e. if it does anything in addition to close). Is this documented anywhere? I tried Googling but didn't find good results.
Python's built-in functions and types are written in C (in the reference implementation, CPython). You can read its source code, if you want. For the __exit__ method you're asking about, in Python 3, I think you are looking for the file Modules/_io/iobase.c:
static PyObject *
iobase_exit(PyObject *self, PyObject *args)
{
    return PyObject_CallMethodObjArgs(self, _PyIO_str_close, NULL);
}
It looks like it doesn't do anything but call close.
The equivalent bit of code for Python 2 is in a different file, since it is still using its own IO classes (rather than the IO module, which is also available as a backport from Python 3). Look in Objects/fileobject.c.
static PyObject *
file_exit(PyObject *f, PyObject *args)
{
    PyObject *ret = PyObject_CallMethod(f, "close", NULL);
    if (!ret)
        /* If error occurred, pass through */
        return NULL;
    Py_DECREF(ret);
    /* We cannot return the result of close since a true
     * value will be interpreted as "yes, swallow the
     * exception if one was raised inside the with block". */
    Py_RETURN_NONE;
}
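The extra lines here are not a test for None: they check for a NULL (error) result from close, discard a successful result, and return None explicitly. As the comment says, returning a true value would tell the with statement to swallow the exception. You can still see that it doesn't do anything other than call close (and ignore its return value).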
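The Python 3 behaviour can be confirmed without reading the C source. This sketch uses io.BytesIO as a stand-in for a real file, on the assumption that it inherits `__exit__` from the same IOBase implementation shown above.

```python
import io

buf = io.BytesIO(b"data")
# Call __exit__ directly with the arguments a with block passes
# when no exception occurred.
exit_result = buf.__exit__(None, None, None)
closed_after_exit = buf.closed

# __exit__ returns a false value, so an exception raised inside the
# with block is not swallowed.
caught = False
try:
    with io.BytesIO(b"data") as inner:
        raise ValueError("boom")
except ValueError:
    caught = True
```

After this runs, `exit_result` is None, `closed_after_exit` is True, and the ValueError has propagated out of the with block.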
I'm writing an alternative to functools.partial that accumulates arguments until there are enough of them to make the call.
I use the C API, and I have a tp_call implementation which, when called, returns either a modified version of self or an arbitrary PyObject*.
At first I followed the Defining New Types guide, and then realized that I can't return two different types (PyObject * and MyObject *) from a tp_call implementation.
Then I tried dropping the struct with the MyObject * definition and using PyObject_SetAttrString in tp_init instead, just as we would in Python. But in that case I got an AttributeError, because you can't set arbitrary attributes on instances of such an object.
What I need here is to make my tp_call implementation polymorphic, so that it can return either a MyObject, which is a subclass of PyObject, or a plain PyObject.
What is the sane way to do that?
UPDATE #0
That's the code:
static PyObject *Curry_call(Curry *self, PyObject *args,
                            PyObject *kwargs) {
    PyObject * old_args = self->args;
    self->args = PySequence_Concat(self->args, args);
    Py_DECREF(old_args);
    if (self->kwargs == NULL && kwargs != NULL) {
        self->kwargs = kwargs;
        Py_INCREF(self->kwargs);
    } else if (self->kwargs != NULL && kwargs != NULL) {
        PyDict_Merge(self->kwargs, kwargs, 1);
    }
    if ((PyObject_Size(self->args) +
         (self->kwargs != NULL ? PyObject_Size(self->kwargs) : 0)) >=
        self->num_args) {
        return PyObject_Call(self->fn, self->args, self->kwargs);
    } else {
        return (PyObject *)self;
    }
}
UPDATE #1
I initially abandoned this implementation because I got a segfault on subsequent calls of the partial object. I thought it happened because of problems with casting Curry * to PyObject *. But now I have fixed the segfault by adding Py_INCREF(self); before return (PyObject *)self;. This seems very strange to me. Under the C API ownership rules, should I really INCREF self if I return it?
If you've defined your MyObject type correctly, you should be able to simply cast your MyObject * to a PyObject * and return that. The first member of a MyObject is a PyObject, and C lets you cast a pointer to a struct to a pointer to the struct's first member and vice versa. I believe the feature exists specifically to allow things like this.
I don't really know your whole code, but as long as MyObject is a PyObject (compatible, i.e. has the same "header" fields, make sure you have a length field), CPython is designed to just take your MyObject as a PyObject; simply cast the pointer to PyObject before returning it.
As you can see here, that is one of the things that is convenient when using C++: You can actually have subclasses with type safety, and you don't have to worry about someone just copying over half of your subclass' instance, for example.
EDIT: because it was asked "isn't this unsafe": yes, it is. But it's only as unsafe as type handling in user code gets; CPython lets you do this because it stores and checks the PyTypeObject *ob_type member of the contained PyObject struct. That's about as safe as, for example, C++'s runtime type checking, but it's implemented by the Python developers as opposed to the GCC/clang/MSVC/icc/... developers.
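To make the intended behaviour concrete, here is a pure-Python sketch of the same idea, assuming the semantics of the C code in the question; the class is a hypothetical counterpart, and `add3` is a made-up test function. In Python, returning either self or an arbitrary result from `__call__` needs no special handling, which is the behaviour the cast to PyObject * reproduces in C.

```python
class Curry:
    """Hypothetical pure-Python counterpart of the Curry type in the question."""

    def __init__(self, fn, num_args):
        self.fn = fn
        self.num_args = num_args
        self.args = ()
        self.kwargs = {}

    def __call__(self, *args, **kwargs):
        # Accumulate positional and keyword arguments across calls.
        self.args += args
        self.kwargs.update(kwargs)
        if len(self.args) + len(self.kwargs) >= self.num_args:
            return self.fn(*self.args, **self.kwargs)
        # Not enough arguments yet: hand back the same object.
        # In C, the caller receives a new reference from tp_call, so the
        # matching line needs Py_INCREF(self) before returning self.
        return self


def add3(a, b, c):
    return a + b + c


step = Curry(add3, 3)
same = step(1)        # too few arguments: returns the Curry object itself
total = step(2)(3)    # third argument arrives: add3(1, 2, 3) is called
```

This also bears on the reference-counting question in UPDATE #1: a function that returns PyObject * hands ownership of one reference to its caller, so returning self without Py_INCREF gives away a reference the function never owned, and the object is freed too early.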
If I'm understanding correctly,
PyMODINIT_FUNC in Python 2.X has been replaced by PyModule_Create in Python 3.X
Both return PyObject*, however, in Python 3.X, the module's initialization function MUST return the PyObject* to the module - i.e.
PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}
whereas in Python 2.X, this is not necessary - i.e.
PyMODINIT_FUNC
initspam(void)
{
    (void) Py_InitModule("spam", SpamMethods);
}
So, my sanity checking questions are:
Is my understanding correct?
Why was this change made?
Right now, I'm only experimenting with very simple cases of C-extensions of Python. Perhaps if I were doing more, the answer to this would be obvious, or maybe if I were trying to embed Python into something else....
Yes, your understanding is correct. You must return the new module object from the initialization function, whose return type is PyMODINIT_FUNC. (PyMODINIT_FUNC declares the function to return void in Python 2, and to return PyObject* in Python 3.)
I can only speculate as to the motivations of the people who made the change, but I believe it was so that errors in importing the module could be more easily identified (you can return NULL from the module-init function in Python 3 if something went wrong).