Python to C/C++ const char question

I am extending Python with some C++ code.
One of the functions I'm using has the following signature:
int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
char *format, char **kwlist, ...);
(link: http://docs.python.org/release/1.5.2p2/ext/parseTupleAndKeywords.html)
The parameter of interest is kwlist. In the link above, examples on how to use this function are given. In the examples, kwlist looks like:
static char *kwlist[] = {"voltage", "state", "action", "type", NULL};
When I compile this using g++, I get the warning:
warning: deprecated conversion from string constant to ‘char*’
So, I can change the static char* to a static const char*. Unfortunately, I can't change the Python code. So with this change, I get a different compilation error (can't convert char** to const char**). Based on what I've read here, I can turn on compiler flags to ignore the warning or I can cast each of the constant strings in the definition of kwlist to char *. Currently, I'm doing the latter. What are other solutions?
Sorry if this question has been asked before. I'm new.

Does PyArg_ParseTupleAndKeywords() expect to modify the data you are passing in? Normally, in idiomatic C++, a const <something> * points to an object that the callee will only read from, whereas <something> * points to an object that the callee can write to.
If PyArg_ParseTupleAndKeywords() expects to be able to write to the char * you are passing in, you've got an entirely different problem over and above what you mention in your question.
Assuming that PyArg_ParseTupleAndKeywords does not want to modify its parameters, the idiomatically correct way of dealing with this problem would be to declare kwlist as const char *kwlist[] and use const_cast to remove its const-ness when calling PyArg_ParseTupleAndKeywords() which would make it look like this:
PyArg_ParseTupleAndKeywords(..., ..., ..., const_cast<char **>(kwlist), ...);

There is an accepted answer from seven years ago, but I'd like to add an alternative solution, since this topic seems to be still relevant.
If you don't like the const_cast solution, you can instead create a writable copy of each string and build the array from those copies.
char s_voltage[] = "voltage";
char s_state[] = "state";
char s_action[] = "action";
char s_type[] = "type";
char *kwlist[] = {s_voltage, s_state, s_action, s_type, NULL};
The char name[] = ".." form copies the string literal into a writable array.

Related

What are all the types available in Cython?

During a Cython meetup a speaker pointed to other data types such as cython.ssize_t. The type ssize_t is briefly mentioned in this Wikipedia article however it is not well explained. Similarly Cython documentation mentions types in terms of how types are automatically converted.
What are all the data types available in Cython and what are their specifications?

You have access to most of the C types.
Here are the equivalents of the Python types (if I have not missed any), taken from the O'Reilly Cython book:
Python bool:
  bint (a C int that Cython converts to and from Python bool)
Python int and long:
  [unsigned] char
  [unsigned] short
  [unsigned] int
  [unsigned] long
  [unsigned] long long
Python float:
  float
  double
  long double
Python complex:
  float complex
  double complex
Python bytes / str / unicode:
  char *
  std::string
For size_t and Py_ssize_t, keep in mind that these are aliases (typedefs).
Py_ssize_t is defined in Python.h, which Cython includes implicitly. It is a signed type that can hold the size (in bytes) of the largest object the Python interpreter ever creates.
size_t is a standard C89 type, defined in <stddef.h>.
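The relationship between the two size types can be checked from plain Python. This is a small sketch using the standard ctypes module rather than Cython itself, and it assumes CPython, where sys.maxsize is the largest value a Py_ssize_t can hold:

```python
import ctypes
import sys

# size_t is unsigned; ssize_t (which Py_ssize_t matches on CPython)
# is the signed type of the same width.
size_t_bytes = ctypes.sizeof(ctypes.c_size_t)
ssize_t_bytes = ctypes.sizeof(ctypes.c_ssize_t)

# Largest value representable in a signed integer of that width.
max_ssize = 2 ** (8 * ssize_t_bytes - 1) - 1

print("size_t bytes:", size_t_bytes)
print("ssize_t bytes:", ssize_t_bytes)
print("matches sys.maxsize:", max_ssize == sys.maxsize)
```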

Cython: when should I define a string as char*, str, or bytes?

When defining a variable type that will hold a string in Cython + Python 3, I can use (at least):
cdef char* mystring = "foo"
cdef str mystring = "foo"
cdef bytes mystring = "foo"
The documentation page on strings is unclear on this -- it mostly gives examples using char* and bytes, and frankly I'm having a lot of difficulty understanding it.
In my case the strings will be coming from a Python3 program and are assumed to be unicode. They will be used as dict keys and function arguments, but I will do no further manipulation on them. Needless to say I am trying to maximize speed.
This question suggests that under Python2.7 and without Unicode, typing as str makes string manipulation code run SLOWER than with no typing at all. (But that's not necessarily relevant here since I won't be doing much string manipulation.)
What are the advantages and disadvantages of each of these options?

If there is no further processing done on a particular type, it would be best and fastest to not type them at all, which means they are treated as a general purpose PyObject *.
The str type is a special case: it is the byte string (bytes) on Python 2 and the Unicode string on Python 3.
So code that types a string as str and handles it as unicode will break on Python 2, where str means bytes.
Strings only need to be typed if they are to be converted to C char* or C++ std::string. There, you would use str to handle py2/py3 compatibility, along with helper functions to convert to/from bytes and unicode in order to be able to convert to either char* or std::string.
Typing of strings is for interoperability with C/C++, not for speed as such. Cython will auto-convert, without copying, a bytes string to a char* for example when it sees something like cdef char* c_string = b_string[:b_len] where b_string is a bytes type.
OTOH, if strings are typed without that type being used, Cython will do a conversion from object to bytes/unicode when it does not need to which leads to overhead.
This can be seen in the generated C code as calls to __Pyx_PyObject_AsString, __Pyx_PyUnicode_FromString et al.
This is also true in general - the rule of thumb is if a specific type is not needed for further processing/conversion, best not to type it at all. Everything in python is an object so typing will convert from the general purpose PyObject* to something more specific.
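The conversion helpers mentioned above might look roughly like this. It is a minimal sketch in plain Python, which Cython also accepts; the names to_bytes and to_unicode and the UTF-8 default are assumptions made for illustration, not part of Cython or any library:

```python
def to_bytes(text, encoding="utf-8"):
    """Return a bytes object for either a bytes or a unicode input."""
    if isinstance(text, bytes):
        return text  # already bytes, nothing to do
    return text.encode(encoding)


def to_unicode(data, encoding="utf-8"):
    """Return a unicode string for either a bytes or a unicode input."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data  # already unicode


print(to_bytes("foo"))
print(to_unicode(b"foo"))
```

Only the value returned by to_bytes would then be handed to code that needs a char* or std::string; everything else can stay untyped.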

Some quick testing revealed that for this particular case, only the str declaration worked -- all other options produced errors. Since the string is generated elsewhere in Python3, evidently the str type declaration is needed.
Whether it is faster not to make any declaration at all remains an open question.

Converting Python ProtoBuf to C++ ProtoBuf using SerializeToString() and ParseFromString() functions

Hi, I have a simple example based on addressbook.proto that I am serializing with the protobuf SerializeToString() function in Python. Here's the code.
import address_pb2
person = address_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe#example.com"
phone = person.phones.add()
phone.number = "555-4321"
phone.type = address_pb2.Person.HOME
print(person.SerializeToString())
Here address_pb2 is the module I generated with the protobuf compiler. Note that the example is copied from the protobuf tutorials. It prints the following bytes.
b'\n\x08John Doe\x10\xd2\t\x1a\x10jdoe#example.com"\x0c\n\x08555-4321\x10\x01'
Now I want to import this string into c++ protobuf. For this I wrote the following code.
#include <iostream>
#include <fstream>
#include <string>
#include "address.pb.h"
using namespace std;
int main(int argc, char* argv[]) {
    GOOGLE_PROTOBUF_VERIFY_VERSION;
    tutorial::AddressBook address_book;
    string data = "\n\x08""John Doe\x10""\xd2""\t\x1a""\x10""jdoe#example.com\"\x0c""\n\x08""555-4321\x10""\x01""";
    if(address_book.ParseFromString(data)){
        cout<<"working"<< endl;
    }
    else{
        cout<<"not working" << endl;
    }
    // Optional: Delete all global objects allocated by libprotobuf.
    google::protobuf::ShutdownProtobufLibrary();
    return 0;
}
Here I am simply trying to load the serialized bytes using the ParseFromString() function, but this doesn't work, and I've been stuck on it for a long time.
I tried changing the binary a bit to suit the c++ version but still no idea if I am on the right path or not.
How can I achieve this ? Does anybody have a clue ?

In Python, you are serializing a Person object. In C++, you are trying to parse an AddressBook object. You need to use the same type on both ends.
(Note that protobuf does NOT guarantee that it will detect these errors. Sometimes when you parse a message as the wrong type, the parse will appear to succeed, but the content will be garbage.)
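To see why the type matters, it can help to look at what the serialized bytes actually contain. The following is a rough pure-Python sketch of a wire-format reader written for this illustration only: read_varint and top_level_fields are hypothetical helpers, not part of the protobuf library, and they handle only the two wire types that occur in this message. The bytes carry field numbers and wire types but no message type name, so the receiver has to know that it is parsing a Person:

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def top_level_fields(buf):
    """Return (field_number, wire_type, value) for each top-level field.

    Only wire type 0 (varint) and 2 (length-delimited) are handled.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((number, wire_type, value))
    return fields


# The bytes printed by the Python script in the question.
data = b'\n\x08John Doe\x10\xd2\t\x1a\x10jdoe#example.com"\x0c\n\x08555-4321\x10\x01'
fields = top_level_fields(data)
for field in fields:
    print(field)
```

Field 1 here is a length-delimited string (the name) and field 2 is a varint (the id), which fits the Person layout in the tutorial's addressbook.proto rather than an AddressBook.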
There's another issue with your code that happens not to be a problem in this specific case, but wouldn't work in general:
string data = "\n\x08""John Doe\x10""\xd2""\t\x1a""\x10""jdoe#example.com\"\x0c""\n\x08""555-4321\x10""\x01""";
This line won't work if the data contains any NUL bytes, i.e. '\x00'. The std::string constructor that takes a const char* treats the first NUL byte as the end of the string, so everything after it is dropped. To avoid this problem you need to specify the length of the data, like:
string data("\n\x08""John Doe\x10""\xd2""\t\x1a""\x10""jdoe#example.com\"\x0c""\n\x08""555-4321\x10""\x01""", 45);
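The same truncation can be reproduced from Python with ctypes, which follows the C convention when a char buffer is read without a length. A small sketch (the payload bytes are made up for illustration):

```python
import ctypes

payload = b"ab\x00cd"  # five bytes with an embedded NUL

# Writable C buffer holding the payload (plus a trailing NUL).
buf = ctypes.create_string_buffer(payload)

# Reading it as a NUL-terminated C string stops at the first NUL byte.
as_c_string = buf.value

# Reading with an explicit length returns every byte.
with_length = ctypes.string_at(ctypes.addressof(buf), len(payload))

print(as_c_string)
print(with_length)
```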

what happens when you compare two strings in python

When comparing strings in python e.g.
if "Hello" == "Hello":
    # execute certain code
I am curious about what code actually compares the strings. If I were to compare these in C, I would compare each character and break when one character doesn't match. I'm wondering exactly how Python compares two strings like this, i.e. when it breaks, and whether there is any difference between this comparison and the method described above other than the number of lines of code.

I'm going to assume you are using CPython here, the standard Python.org implementation. Under the hood, the Python string type is implemented in C, so yes, testing if two strings are equal is done exactly like you'd do it in C.
What it does is use the memcmp() function to test if the two str objects contain the same data, see the unicode_compare_eq function defined in unicodeobject.c:
static int
unicode_compare_eq(PyObject *str1, PyObject *str2)
{
    int kind;
    void *data1, *data2;
    Py_ssize_t len;
    int cmp;

    len = PyUnicode_GET_LENGTH(str1);
    if (PyUnicode_GET_LENGTH(str2) != len)
        return 0;
    kind = PyUnicode_KIND(str1);
    if (PyUnicode_KIND(str2) != kind)
        return 0;
    data1 = PyUnicode_DATA(str1);
    data2 = PyUnicode_DATA(str2);

    cmp = memcmp(data1, data2, len * kind);
    return (cmp == 0);
}
This function is only called if str1 and str2 are not the same object (that's an easy and cheap thing to test). It first checks if the two objects are the same length and store the same kind of data (string objects use a flexible storage implementation to save memory; different storage means the strings can't be equal).
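The sequence of checks described above can be mirrored in plain Python. This is only an illustrative sketch, not how CPython is implemented: storage_kind and str_equal_sketch are hypothetical names, the kind is approximated from the highest code point in the string, and a character-by-character loop stands in for memcmp():

```python
def storage_kind(s):
    """Approximate bytes per character CPython would use: 1, 2 or 4."""
    highest = max(map(ord, s), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def str_equal_sketch(a, b):
    if a is b:                               # same object: trivially equal
        return True
    if len(a) != len(b):                     # different lengths: not equal
        return False
    if storage_kind(a) != storage_kind(b):   # different storage: not equal
        return False
    # Stand-in for memcmp() over the raw character data.
    return all(x == y for x, y in zip(a, b))


print(str_equal_sketch("Hello", "Hello"))
print(str_equal_sketch("Hello", "Hellp"))
```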
There are other Python implementations, like Jython or IronPython, which may use different techniques, but it basically will come down to much the same thing.

Python c-api and unicode strings

I need to convert between Python objects and C strings of various encodings. Going from a C string to a unicode object was fairly simple using PyUnicode_Decode; however, I'm not sure how to go the other way.
//char* can be a wchar_t or any other element size, just make sure it is correctly terminated for its encoding
Unicode(const char *str, size_t bytes, const char *encoding="utf-16", const char *errors="strict")
    :Object(PyUnicode_Decode(str, bytes, encoding, errors))
{
    //check for any python exceptions
    ExceptionCheck();
}
I want to create another function that takes the Python Unicode string and puts it in a buffer using a given encoding, e.g.:
//fills buffer with a null terminated string in encoding
void AsCString(char *buffer, size_t bufferBytes,
    const char *encoding="utf-16", const char *errors="strict")
{
    ...
}
I suspect it has something to do with PyUnicode_AsEncodedString; however, that returns a PyObject, so I'm not sure how to put that into my buffer...
Note: both methods above are members of a c++ Unicode class that wraps the python api
I'm using Python 3.0

Regarding PyUnicode_AsEncodedString returning a PyObject: in Python 3 that object is a bytes object, so you just need to use PyBytes_Size and PyBytes_AsString (PyString_Size and PyString_AsString on Python 2) to get the length and a pointer to the object's internal buffer, then memcpy the data to your own buffer.
If you're looking for a way to go directly from a PyUnicode object into your own char buffer, I don't think that you can do that.
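The encode, size-check and copy sequence can be tried out at the Python level before writing the C++ version. This is a sketch under assumptions: copy_encoded is a hypothetical helper, a bytearray stands in for the caller's char buffer, and the terminator is passed explicitly because its width depends on the encoding:

```python
def copy_encoded(text, buffer, encoding="utf-8", errors="strict",
                 terminator=b"\x00"):
    """Encode text and copy it, plus a terminator, into a preallocated bytearray.

    Returns the number of encoded bytes written, excluding the terminator.
    """
    # Python-level counterpart of calling PyUnicode_AsEncodedString.
    encoded = text.encode(encoding, errors)
    needed = len(encoded) + len(terminator)
    if needed > len(buffer):
        raise ValueError("buffer too small: need %d bytes" % needed)
    # Counterpart of the memcpy into the caller's buffer.
    buffer[:needed] = encoded + terminator
    return len(encoded)


buf = bytearray(16)
written = copy_encoded("h\u00e9llo", buf)
print(written, bytes(buf[:written + 1]))
```

For a two-byte encoding such as UTF-16 the terminator would be b"\x00\x00", and note that Python's "utf-16" codec prepends a byte order mark when encoding.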
