It seems that the Python C API is not consistent about the const correctness of character arrays. For example, PyImport_ImportFrozenModule accepts a char*, whereas PyImport_ImportModule accepts a const char*.
The implication of all this is that in my C++ application that I am writing with an embedded Python interpreter, I sometimes have to cast the string literal that I pass to a Python API call as just a char* (as opposed to const char*), and sometimes I don't. For example:
PyObject *os = PyImport_ImportModule("os"); // Works without the const_cast
PyObject *cwd = PyObject_CallMethod(os, const_cast<char*>("getcwd"), NULL); // Accepts char*, not const char*
If I don't do the const_cast<char*> (or (char*)) on the string literal, I get a compiler warning about casting string literals to char*.
Here are my questions:
Is there an advantage/reason to having some of the functions not take a const char* (and/or why would the Python API not be consistent in this)? My understanding is that if the function can take a string literal, it cannot change the char*, so the const modifier would just be reinforcing this. I also believe that the const distinction is not as important in C (for which the API was written) as it is in C++ (correct me if I am wrong... my strength is Python, not C/C++). Is the lack of "const correctness" in the Python API because it's simply not as important in C? (There is an old thread on the Python mailing list from 2000 asking the same question, but it didn't seem to go anywhere, and it is implied that the reason might be that some compilers did not support const. Since many functions now take const char*, this no longer seems to apply.)
Because my understanding of C++ is limited, I am unsure if I am going about casting string literals properly. The way I see it, I can do one of the following (I am currently doing the first):
// Method 1) Use const_cast<char*>
PyImport_ImportFrozenModule(const_cast<char*>("mymodule"));
// Method 2) Use (char*)
PyImport_ImportFrozenModule((char*) "mymodule");
// Method 3) Use char array
char mod[] = "mymodule";
PyImport_ImportFrozenModule(mod);
Which is the best method to use?
Update:
It looks like the Python3 branch is slowly trying to fix the const correctness issue. For example, the PyImport_ImportFrozenModule function I use as an example above now takes a const char* in Python 3.4, but there are still functions that take only a char*, such as PyLong_FromString.
Based on some mailing list conversations from python-dev, it looks like the initial API simply wasn't created with const correctness in mind, probably because Guido didn't think about it. Dating all the way back to 2002, someone asked if there was any desire to address that by adding const-correctness, complaining that it's a pain to always have to do this:
somefunc(const char* modulename, const char* key)
{
    ... PyImport_ImportModule(const_cast<char*>(modulename)) ...
}
Guido van Rossum (the creator of Python) replied:
I've never tried to enforce const-correctness before, but I've heard enough horror stories about this. The problem is that it breaks 3rd party extensions left and right, and fixing those isn't always easy. In general, whenever you add a const somewhere, it ends up propagating to some other API, which then also requires a const, which propagates to yet another API needing a const, ad infinitum.
There was a bit more discussion, but without Guido's support the idea died.
Fast forward nine years, and the topic came up again. This time someone was simply wondering why some functions were const-correct, while others weren't. One of the Python core developers replied with this:
We have been adding const to many places over the years. I think the specific case was just missed (i.e. nobody cared about adding const there).
It seems that when it could be done without breaking backwards compatibility, const-correctness has been added in many places in the C API (and, in the case of Python 3, in places where it would break backwards compatibility with Python 2), but there was never a real global effort to fix it everywhere. So the situation is better in Python 3, but the entire API is likely not const-correct even now.
I don't think the Python community has any preferred way to handle casting in calls that are not const-correct (there's no mention of it in the official C-API style guide), probably because there aren't a ton of people out there interfacing with the C API from C++ code. I would say the preferred way of doing it from a pure C++ best-practices perspective would be the first choice, though. (I'm by no means a C++ expert, so take that with a grain of salt.)
Is there an advantage/reason to having some of the functions not take a const char*?
No. Looks like an oversight in the library's design or, like you say, legacy issues. They could at least have made it consistent, though!
My understanding is that if the function can take a string literal, it cannot change the char* so the const modifier would just be reinforcing this.
Exactly. Their documentation should also specify that the function argument (or, rather, the argument's pointee) shall not be modified during the function call; alas it currently does not say this.
I also believe that the const distinction is not as important in C (for which the API was written) as it is in C++.
Well, not really, at least as far as I know.
The way I see it, I can do one of the following (I am currently doing the first)
(good)
Which is the best method to use?
Well, the const_cast will at least make sure that you are only modifying the const-ness, so if you had to choose, I'd go with that. But, really, I wouldn't be too bothered about this.
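One practical argument for Method 1: const_cast<char*> can only change const-ness, so the compiler still checks everything else about the type, whereas a C-style cast silently combines a reinterpret_cast with the const_cast and keeps compiling even if the underlying type changes. A minimal sketch (the function here is made up for illustration):

#include <cstdio>

void takes_mutable(char *s) { std::printf("%s\n", s); }

int main() {
    const char *narrow = "mymodule";
    const wchar_t *wide = L"mymodule";         // suppose the string's type changes one day

    takes_mutable(const_cast<char*>(narrow));  // OK: removes const and nothing else
    // takes_mutable(const_cast<char*>(wide)); // compile error: const_cast cannot
                                               // convert wchar_t* to char*
    takes_mutable((char *) wide);              // still compiles, but silently
                                               // reinterprets the bytes: a bug
}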
Related
I would like to use a C/C++ library, provided as a .dll file, from a Python script to control a piece of I/O equipment called ClipX by HBM (in case anyone needs help with this in the future).
The manufacturer gives an example C implementation, and an example C++ implementation. In the C example, the Connect() function returns some pointer, which is used in subsequent read/write functions. In the C++ example, a ClipX class is used to establish the connection, and read/write functions are methods in that class. I've simplified the code for the purposes of this question.
Basically, I want to connect() to the device, and at some later point read() from it. From what I've read, it seems like Cython would be a good way to wrap connect() and read() as separate functions, and import them as a module into Python. My questions are:
For the C implementation, would I be able to pass the MHandle pointer back to Python after connecting, for later use (i.e. calling the read function)? Would the pointer even have any meaning when used later in a different function call?
For the C++ implementation, could the dev object be passed to the Python code, to be later passed back for a Read()? Can you do that with arbitrary objects?
I am a mechanical engineer, sorry if this is gibberish or wildly uninformed. Any guidance is very much appreciated.
C Code:
/*From .h file*/
----------------------------------------------------
struct sClipX {
    void *obj;
};
typedef struct sClipX * MHandle;
ClipX_API MHandle __stdcall Connect(const char *);
----------------------------------------------------
/*End .h file*/
int main()
{
    const char *IP = "172.21.104.76";
    MHandle m = Connect(IP);
    Read(m, 0x4428);
}
C++ Code:
int main() {
    ClipX dev = ClipX();
    dev.Connect("172.21.104.76");
    dev.Read(0x4428);
}
C++ functions are callable from C if you declare them as extern "C". This is related to name mangling.
The Python interpreter can be extended with C functions. Read carefully the Extending and Embedding the Python Interpreter chapter.
Be careful about C++ exceptions. You don't want them to cross the Python interpreter code. So any extern "C" C++ function called from Python should handle and catch exceptions raised by internal routines.
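To make the extern "C" and exception points concrete for the ClipX case, here is a minimal sketch of a C-callable wrapper around the vendor's C++ class. The wrapper names are invented, the header name is assumed, and Read()'s return type is a guess; treat this as a pattern, not the vendor's actual API:

#include "ClipX.h"   // assumed vendor header declaring the ClipX class

extern "C" {

// Opaque handle: Python (via ctypes or Cython) only ever sees a void*.
void *clipx_connect(const char *ip) {
    try {
        ClipX *dev = new ClipX();
        dev->Connect(ip);
        return dev;
    } catch (...) {
        return nullptr;   // never let a C++ exception cross into Python
    }
}

// The real Read() signature may differ; double is an assumption here.
double clipx_read(void *handle, unsigned int address) {
    try {
        return static_cast<ClipX *>(handle)->Read(address);
    } catch (...) {
        return 0.0;       // or report failure through an out-parameter
    }
}

void clipx_disconnect(void *handle) {
    delete static_cast<ClipX *>(handle);
}

}   // extern "C"

This also bears on the two questions above: the opaque pointer Python holds between calls stays meaningful for as long as the underlying object is alive, in both the C and the C++ variants.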
Lastly, be careful about memory management and garbage collection. P. Wilson's old paper, Uniprocessor Garbage Collection Techniques, is relevant, at least for terminology and insights. Or read the GC handbook. Python uses a reference-counting scheme and handles weak references specially. Be careful about circular references.
Be aware, of course, of the GIL in Python. Roughly speaking, you cannot have several threads doing Python things without precautions.
Serialization of device-related data would also be a concern, but you probably don't need it.
Most importantly, document your code well.
Tools like doxygen could help (perhaps with LaTeX or DocBook).
Of course, use a good enough version control system. I recommend git. Also use a good build automation tool.
My suggestion is to publish your C++ code as open source, e.g. on github or gitlab. You then could get useful code reviews and feedback.
If your hardware + software system is safety-critical, consider static program analysis techniques e.g. with Frama-C or Clang static analyzer or with your own GCC plugin. In a few months (end of 2020), you might try Bismon (read also this draft report).
I am definitely biased, but I do recommend trying some Linux distribution (e.g. Ubuntu or Debian) as your cross-development platform. Be aware that a lot of devices (including the Raspberry Pi) are running some embedded Linux system, so the learning effort makes sense. Then read Advanced Linux Programming.
http://www.swig.org/papers/PyTutorial98/PyTutorial98.pdf
The quote below comes from the above link. I know that it is an old publication, so it is possible that the information is outdated.
I would like to ask:
"Seems to work fine with C++ if you aren't being too clever"
What does it mean to be "too clever"?
Is there a known situation/case where I should be very careful when programming C++ modules and extending Python using the SWIG tool?
This PDF appears to be a copy of slides from a presentation given by David Beazley at the 7th International Python Conference. My guess is there was a joke or verbal explanation of what he meant by that phrase.
Seems to work fine with C++ if you aren't being too clever
Here is a link to his website if you want to get in touch with him and ask him directly. His twitter account is dabeaz, which may (or may not) be a better way of contacting him.
The slide is strange and misleading. SWIG does not transform pass-by-value into pass-by-reference at all. Let me try to clarify by an example:
Let's say that as in the example you have the C++ function
double dot_product(Vector a, Vector b);
Now in plain C++ (no SWIG, no wrapping) you may use this function as in the following examples:
1.
Vector a = Vector(1,0);
Vector b = Vector(0,1);
double zero = dot_product(a, b);
2.
Vector *a = new Vector(1,0);
Vector *b = new Vector(0,1);
double zero = dot_product(*a, *b);
In both cases, the function is in fact called in exactly the same way using call-by-value.
SWIG wraps all objects into a structure that contains a pointer to the object, so under the hood SWIG passes pointers around for everything, and therefore uses a syntax as in the second example. But there is no conversion / transformation of call semantics going on whatsoever.
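To see why, here is a plain C++ analogue (with a stand-in Vector type, no SWIG involved) of what the generated layer effectively does: it holds pointers on the outside, but the wrapped call is still an ordinary by-value call:

#include <iostream>

struct Vector {
    double x, y;
    Vector(double x, double y) : x(x), y(y) {}
};

double dot_product(Vector a, Vector b) {   // pass-by-value, as in the question
    return a.x * b.x + a.y * b.y;
}

// What a SWIG-style wrapper layer effectively does: it stores pointers...
double wrapped_dot_product(Vector *a, Vector *b) {
    return dot_product(*a, *b);            // ...but the call is still by value
}

int main() {
    Vector *a = new Vector(1, 0);
    Vector *b = new Vector(0, 1);
    std::cout << wrapped_dot_product(a, b) << "\n";   // prints 0
    delete a;
    delete b;
}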
To answer your questions:
"Seems to work fine with C++ if you aren't being too clever" What does it mean, to be too clever?
I have no idea. As stated in another answer, likely a joke.
Is there a known situation/case where I should be very careful when programming C++ modules and extending Python using the SWIG tool?
This is a very broad question, and there certainly are pitfalls, especially related to memory management. However, this particular "transformation" is not an issue.
For reference, here is the relevant entry in the SWIG manual. Note that it is worded differently: The function is transformed to accept pointers. Nothing is said about "call semantics" (since this is a non-issue).
Since it's my first time learning systems programming, I'm having a hard time wrapping my head around the rules. Now, I got confused about memory leaks. Let's consider an example. Say, Rust is throwing a pointer (to a string) which Python is gonna catch.
In Rust, (I'm just sending the pointer of the CString)
use std::ffi::CString;
use std::os::raw::c_char;

#[no_mangle]
pub extern fn do_something() -> *const c_char {
    // some_string: placeholder for whatever string is being returned
    CString::new(some_string).unwrap().as_ptr()
}
In Python, (I'm dereferencing the pointer)
import ctypes

def call_rust():
    lib = ctypes.cdll.LoadLibrary(rustLib)  # rustLib: path to the compiled Rust library
    lib.do_something.restype = ctypes.c_void_p
    c_pointer = lib.do_something()
    some_string = ctypes.c_char_p(c_pointer).value
Now, my question is about freeing the memory. I thought it should be freed in Python, but then ownership pops in. Because, as_ptr seems to take an immutable reference. So, I got confused about whether I should free the memory in Rust or Python (or both?). If it's gonna be Rust, then how should I go about freeing it when the control flow has landed back into Python?
Your Rust function do_something constructs a temporary CString, takes a pointer into it, and then drops the CString. The *const c_char is invalid from the instant you return it. If you're on nightly, you probably want CString#into_ptr instead of CString#as_ptr, as the former consumes the CString without deallocating the memory. On stable, you can mem::forget the CString. Then you can worry about who is supposed to free it.
Freeing from Python will be tricky or impossible, since Rust may use a different allocator. The best approach would be to expose a Rust function that takes a c_char pointer, constructs a CString for that pointer (rather than copying the data into a new allocation), and drops it. Unfortunately the middle part (creating the CString) seems impossible on stable for now: CString::from_ptr is unstable.
A workaround would be to pass (a pointer to) the entire CString to Python and provide an accessor function to get the char pointer from it. You simply need to box the CString and transmute the box to a raw pointer. Then you can have another function that transmutes the pointer back to a box and lets it drop.
It's fairly straightforward (if tedious) to unit test Python extension modules written in C, including the error cases for many of the Python/C APIs such as PyArg_ParseTuple. For example, the idiomatic way to start a C function which implements a Python function or method looks like:
if (!PyArg_ParseTuple(args, "someformat:function_name")) {
    return NULL;
}
The success case of this can be unit tested by calling the function with the correct number and type of arguments. The failure cases can also be tested by calling the function with first the wrong number of arguments and then the right number of arguments but passing values of the wrong type. This results in full branch test coverage of the C code.
However, it's not clear how to exercise the negative paths for other Python/C APIs. An idiomatic way to begin module initialization in a C extension looks like:
if (PyType_Ready(&Some_Extension_Structure) < 0) {
    return 0;
}
How can PyType_Ready be made to fail? Similarly, the C function for allocating a new instance of an extension type frequently uses an API like PyObject_New:
self = PyObject_New(Some_Structure, &Some_Extension_Structure);
if (self == NULL) {
    return NULL;
}
How can one unit test this negative case (particularly considering PyObject_New is likely used many, many times over the course of the execution of a single unit test method)?
It seems possible to build a general solution, relying on dynamic linker tricks such as LD_PRELOAD to provide fakes of these C APIs which can be directed to fail in the right ways at the right times. The cost of building a system like that seems a bit out of reach, though. Has someone else done it already and made the result available?
Are there Python/C-specific tricks that could make this testing easier?
Should I be thinking along some other lines entirely?
This is a clear case for test doubles (for example, mocking). Since the Python C API doesn't offer any facilities for faking an out of memory condition, you'd have to do it yourself.
Create your own layer that provides PyType_Ready and PyObject_New. Have them pass through to the C API functions, unless some control, probably an environment variable, instructs them not to. They can cause any mayhem you desire, and test your code's reaction.
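Here is a minimal sketch of such a layer. The environment variable name and the wrapper names are invented for illustration; note that PyObject_New is a macro, so the wrapper calls _PyObject_New, which is what that macro expands to:

#include <Python.h>
#include <stdlib.h>
#include <string.h>

static int should_fail(const char *api_name) {
    const char *target = getenv("FAKE_CAPI_FAIL");   /* invented variable name */
    return target != NULL && strcmp(target, api_name) == 0;
}

/* The extension calls these wrappers instead of the real APIs. */
int Testable_PyType_Ready(PyTypeObject *type) {
    if (should_fail("PyType_Ready"))
        return -1;                    /* simulate initialization failure */
    return PyType_Ready(type);
}

PyObject *Testable_PyObject_New(PyTypeObject *type) {
    if (should_fail("PyObject_New"))
        return PyErr_NoMemory();      /* simulate an allocation failure */
    return _PyObject_New(type);       /* what the PyObject_New macro expands to */
}

A test can then set FAKE_CAPI_FAIL=PyType_Ready before importing the module (or FAKE_CAPI_FAIL=PyObject_New before instantiating the type) and assert that the failure is handled cleanly.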
I've discovered a function in the Python C API named PyEval_CallFunction which seems to be useful. It allows you to invoke a Python callable by saying something like:
PyEval_CallFunction(obj, "OOO", a, b, c);
However, I can't find any official documentation on this function. A google search brings up various unofficial tutorials which discuss this function, but:
The function isn't documented in the official Python docs, so I don't know if it's even something that is supposed to be part of the public API.
Searching the web turns up inconsistent usage policies. Some tutorials indicate the format string needs parentheses around the type list, like "(OiiO)", whereas other times I see it used without the parentheses. When I actually try the function in a real program, it seems to require the parentheses; otherwise it segfaults.
I'd like to use this function because it's convenient. Does anyone know anything about this, or know why it isn't documented? Is it part of the public API?
I couldn't find many references to it either, but the tutorial you linked to mentions this:
The string format and the following arguments are as for Py_BuildValue (XXX so i really should have described that by now!). A call such as
PyEval_CallFunction(obj, "iii", a, b, c);
is equivalent to
PyEval_CallObject(obj, Py_BuildValue("iii", a, b, c));
I suppose PyEval_CallFunction is not public API, as its value seems rather limited. There is not much of a difference between these two. But then again, I'm not really involved in Python extensions, so this is just my view on this.
PyEval_CallObject itself is just a macro around PyEval_CallObjectWithKeywords.
#define PyEval_CallObject(func,arg) \
PyEval_CallObjectWithKeywords(func, arg, (PyObject *)NULL)
On the matter of "What is public API?" here is a recent message from Martin v. Löwis:
Just to stress and support Georg's explanation: the API is not defined through the documentation, but instead primarily through the header files. All functions declared as PyAPI_FUNC and not starting with _Py are public API. There used to be a lot of undocumented API (up to 1.4, there was no API documentation at all, only the extension module tutorial); these days, more and more API gets documented.
http://mail.python.org/pipermail/python-dev/2011-February/107973.html
The reason it isn't documented is because you should be using PyObject_CallFunction instead.
The PyEval_* function family are the raw internal calls for the interpreter evaluation loop. The corresponding documented PyObject_* calls include all the additional interpreter state integrity checks, argument validation and stack protection.
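For completeness, a minimal sketch of using the documented replacement; note that PyObject_CallFunction takes the bare format characters, with no parentheses around the type list:

#include <Python.h>

/* Call a Python callable with three object arguments. */
static PyObject *call_with_three(PyObject *callable,
                                 PyObject *a, PyObject *b, PyObject *c)
{
    PyObject *result = PyObject_CallFunction(callable, "OOO", a, b, c);
    if (result == NULL) {
        return NULL;   /* a Python exception is set; let it propagate */
    }
    return result;     /* new reference; the caller owns it */
}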