Description of problem
I have to migrate some code to Python 3. The compilation terminated with success. But I have a problem on the runtime:
static PyObject* Parser_read(PyObject * const self, PyObject * unused0, PyObject * unused1) {
//Retrieve bytes from the underlying data stream.
//In this case, an iterator
PyObject * const i = PyIter_Next(self->readIterator);
//If the iterator returns NULL, then no more data is available.
if(i == NULL)
{
Py_RETURN_NONE;
}
//Treat the returned object as just bytes
PyObject * const bytes = PyObject_Bytes(i);
Py_DECREF(i);
if( not bytes )
{
//fprintf(stderr, "try to read %s\n", PyObject_Str(bytes));
PyErr_SetString(PyExc_ValueError, "iterable must return bytes like objects");
return NULL;
}
....
}
In my python code, I have something like that:
for data in Parser(open("file.txt")):
...
The code works well on Python 2. But on Python 3, I got:
ValueError: iterable must return bytes like objects
Update
The solution of #casevh works well in all test cases except one: when I wrap the stream:
def wrapper(stream):
for data in stream:
for i in data:
yield i
for data in Parser(wrapper(open("file.txt", "rb"))):
...
and I got:
ValueError: iterable must return bytes like objects
One option is to open the file in binary mode:
open("file.txt", "rb")
That should create an iterator that returns a sequence of bytes.
Python 3 strings are assumed to be Unicode and without proper encoding/decoding, they shouldn't be interpreted as a sequence of bytes. If you are reading plain ASCII text, and not a binary data stream, you could also convert from Unicode to ASCII. See PyUnicode_AsASCIIString() and related functions.
As noted by #casevh, in Python you need to decide whether your data is binary or text. The fact that you are iterating lines makes me think that the latter is the case.
def wrapper(stream):
for data in stream:
for i in data:
yield i
works in Python 2, because iterating a str will yield 1-character strings; in Python 3, iterating over a bytes object will yield individual bytes that are integers in range 0 - 255. You can get the the code work identically in Python 2 and 3 (and identically to the Python 2 behaviour of the code above) by using range and slicing 1 byte/character at a time:
def wrapper(stream):
for data in stream:
for i in range(len(data)):
yield data[i:i + 1]
P.S. You also have a mistake in your C extension code: Parser_read takes 3 arguments, 2 of which are named unused_x. Only a method annotated with METH_KEYWORDS takes 3 arguments (PyCFunctionWithKeywords); all others, including METH_NOARGS must be functions taking 2 arguments (PyCFunction).
Related
I am translating a source of NodeJS to Python. However there is a function readUInt32BE that I do not quite understand how it works
Original Code
const buf = Buffer.from("vgEAAwAAAA1kZXYubG9yaW90LmlvzXTUl6ESlOrvJST-gsL_xQ==", 'base64');
const appId = parseInt(buf.slice(0, 4).toString('hex'), 16);
const serverIdLength = buf.slice(4, 8).readUInt32BE(0);
Here is what I have tried so far in Python
encodeToken = base64.b64decode("vgEAAwAAAA1kZXYubG9yaW90LmlvzXTUl6ESlOrvJST-gsL_xQ==")
appId = encodeToken[:4]
appId = appId.hex()
serverIdLength = ......
If possible, can you write a function that works the same as readUInt32BE(0) and explain it for me ? Thanks
I'm assuming from the name that the function interpreters an arbitrary sequence of 4 bytes as an unsigned 32-bit (big endian) integer.
The corresponding Python function would be struct.unpack with an appropriate format string.
import struct
appId = encodeToken[:4]
serverIdLength = struct.unpack(">I", appId)[0]
# ">" means "big-endian"
# "I" means 4-byte unsigned integer
No need to to get the a hex representation of the bytes first. unpack always returns a tuple, even if only one value is created by the format string, so you need to take the first element of that tuple as the final value.
I have a C function
int * myfunc()
{
int * ret = (int *) malloc(sizeof(int)*5);
...
return ret;
}
in python I can call it as
ret = lib.myfunc()
but I can't seem to figure out how to actually use ret in the python code (i.e. cast it to an int array of length 5.
I see lots of documentation (and questions here) about how to pass a python array into a C function, but not how one deals with an array returned from a C function.
the only thing I've figured out so far (which sort of works, but seems ugly)
buf = ffi.buffer(ret,ffi.sizeof("int")*5)
int_array = ffi.from_buffer("int *", buf)
is that what I'm supposed to do? or is there a better way?
In C, the type int * and the type int[5] are equivalent at runtime. That means that although you get a int * from calling the function, you can directly use it as if it were a int[5]. All the same operations work with the exception of len(). For example:
ret = lib.myfunc()
my_python_list = [ret[i] for i in range(5)]
lib.free(ret) # see below
As the function called malloc(), you probably need to call free() yourself, otherwise you have a memory leak. Declare it by adding the line void free(void *); to the cdef() earlier.
There are some more advanced functions whose usage is optional here: you could cast the type from int * to int[5] with p = ffi.cast("int[5]", p); or instead you could convert the C array into a Python list with an equivalent but slightly faster call my_python_list = ffi.unpack(ret, 5).
I have created the following class using pybind11:
py::class_<Raster>(m, "Raster")
.def(py::init<double*, std::size_t, std::size_t, std::size_t, double, double, double>());
However I have no idea how I would call this constructor in Python.. I see that Python expects a float in the place of the double*, but I cannot seem to call it.
I have tried, ctypes.data_as(ctypes.POINTER(ctypes.c_double)) but this does not work...
Edit:
I have distilled the answer from #Sergei answer.
py::class_<Raster>(m, "Raster", py::buffer_protocol())
.def("__init__", [](Raster& raster, py::array_t<double> buffer, double spacingX, double spacingY, double spacingZ) {
py::buffer_info info = buffer.request();
new (&raster) Raster3D(static_cast<double*>(info.ptr), info.shape[0], info.shape[1], info.shape[2], spacingX, spacingY, spacingZ);
})
Pybind does automatic conversions. When you bind f(double *) the argument is assumed to be a pointer to a singe value, not a pointer to the array begin, because it would be quite unnatural to expect such input from python side. So pybind will convert argument using this logic.
If you need to pass raw array to c++ use py::buffer like here:
py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
.def("__init__", [](Matrix &m, py::buffer b) {
typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;
/* Request a buffer descriptor from Python */
py::buffer_info info = b.request();
/* Some sanity checks ... */
if (info.format != py::format_descriptor<Scalar>::format())
throw std::runtime_error("Incompatible format: expected a double array!");
if (info.ndim != 2)
throw std::runtime_error("Incompatible buffer dimension!");
auto strides = Strides(
info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));
auto map = Eigen::Map<Matrix, 0, Strides>(
static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);
new (&m) Matrix(map);
});
To make it work you need to pass a type which follows python buffer protocol.
From the tweet here:
import sys
x = 'ñ'
print(sys.getsizeof(x))
int(x) #throws an error
print(sys.getsizeof(x))
We get 74, then 77 bytes for the two getsizeof calls.
It looks like we are adding 3 bytes to the object, from the failed int call.
Some more examples from twitter (you may need to restart python to reset the size back to 74):
x = 'ñ'
y = 'ñ'
int(x)
print(sys.getsizeof(y))
77!
print(sys.getsizeof('ñ'))
int('ñ')
print(sys.getsizeof('ñ'))
74, then 77.
The code that converts strings to ints in CPython 3.6 requests a UTF-8 form of the string to work with:
buffer = PyUnicode_AsUTF8AndSize(asciidig, &buflen);
and the string creates the UTF-8 representation the first time it's requested and caches it on the string object:
if (PyUnicode_UTF8(unicode) == NULL) {
assert(!PyUnicode_IS_COMPACT_ASCII(unicode));
bytes = _PyUnicode_AsUTF8String(unicode, NULL);
if (bytes == NULL)
return NULL;
_PyUnicode_UTF8(unicode) = PyObject_MALLOC(PyBytes_GET_SIZE(bytes) + 1);
if (_PyUnicode_UTF8(unicode) == NULL) {
PyErr_NoMemory();
Py_DECREF(bytes);
return NULL;
}
_PyUnicode_UTF8_LENGTH(unicode) = PyBytes_GET_SIZE(bytes);
memcpy(_PyUnicode_UTF8(unicode),
PyBytes_AS_STRING(bytes),
_PyUnicode_UTF8_LENGTH(unicode) + 1);
Py_DECREF(bytes);
}
The extra 3 bytes are for the UTF-8 representation.
You might be wondering why the size doesn't change when the string is something like '40' or 'plain ascii text'. That's because if the string is in "compact ascii" representation, Python doesn't create a separate UTF-8 representation. It returns the ASCII representation directly, which is already valid UTF-8:
#define PyUnicode_UTF8(op) \
(assert(_PyUnicode_CHECK(op)), \
assert(PyUnicode_IS_READY(op)), \
PyUnicode_IS_COMPACT_ASCII(op) ? \
((char*)((PyASCIIObject*)(op) + 1)) : \
_PyUnicode_UTF8(op))
You also might wonder why the size doesn't change for something like '1'. That's U+FF11 FULLWIDTH DIGIT ONE, which int treats as equivalent to '1'. That's because one of the earlier steps in the string-to-int process is
asciidig = _PyUnicode_TransformDecimalAndSpaceToASCII(u);
which converts all whitespace characters to ' ' and converts all Unicode decimal digits to the corresponding ASCII digits. This conversion returns the original string if it doesn't end up changing anything, but when it does make changes, it creates a new string, and the new string is the one that gets a UTF-8 representation created.
As for the cases where calling int on one string looks like it affects another, those are actually the same string object. There are many conditions under which Python will reuse strings, all just as firmly in Weird Implementation Detail Land as everything we've discussed so far. For 'ñ', the reuse happens because this is a single-character string in the Latin-1 range ('\x00'-'\xff'), and the implementation stores and reuses those.
I am currently translating a rospy IMU-driver to roscpp and have difficulites figuring out what this piece of code does and how I can translate it.
def ReqConfiguration(self):
"""Ask for the current configuration of the MT device.
Assume the device is in Config state."""
try:
masterID, period, skipfactor, _, _, _, date, time, num, deviceID,\
length, mode, settings =\
struct.unpack('!IHHHHI8s8s32x32xHIHHI8x', config)
except struct.error:
raise MTException("could not parse configuration.")
conf = {'output-mode': mode,
'output-settings': settings,
'length': length,
'period': period,
'skipfactor': skipfactor,
'Master device ID': masterID,
'date': date,
'time': time,
'number of devices': num,
'device ID': deviceID}
return conf
I have to admit that I never ever worked with neither ros nor python before.
This is no 1:1 code from the source, I removed the lines I think I know what they do, but especially the try-block is what I don't understand. I would really appreciate help, because I am under great preasure of time.
If someone is curious(context reasons): The files I have to translate are mtdevice.py , mtnode.py and mtdef.py and can be found googleing for the filesnames + the keyword ROS IMU Driver
Thanks a lot in advance.
This piece of code unpacks the fields of a C structure, namely masterID, period, skipfactor, _, _, _, date, time, num, deviceID, length, mode, settings, stores those in a Python dictionary and returns that dictionary as call result. The underscores are placeholders for the parts of the struct that aren't used.
See also: https://docs.python.org/2/library/struct.html, e.g. for a description of the format string ('!IHHHHI8s8s32x32xHIHHI8x') that tells the unpack function what the struct looks like.
The syntax a, b, c, d = f () means that the function returns a thing called a tuple in Python. By assigning a tuple to multiple variables, it's split into its fields.
Example:
t = (1, 2, 3, 4)
a, b, c, d = t
# At this point a == 1, b == 2, c == 3, d == 4
To replace this piece of code by C++ should not be too hard, since C++ has structs much like C. So the simplest C++ implementation of requestConfiguration would be to just return that struct. If you want to stay closer to the Python functionality, your function could put the fields of the struct into a C++ STL map and return that. The format string + the docs that the link points to, tell you what data types are in your struct and where.
Note that it's the second parameter of unpack that holds your data, the first parameter just contains information on the layout (format) of the second parameter, as explained in the link. The second parameter looks to Python as if it's a string, but it's actually a C struct. The first parameter tells Python where to find what in that struct.
So if you read the docs on format strings, you can find out the layout of your second parameter (C struct). But maybe you don't need to. It depends on the caller of your function. It may just expect the plain C struct.
From your added comments I understand that there's more code in your function than you show. The fields of the structs are assigned to attributes of a class.
If you know the field names of your C struct (config) then you can assign them directly to the attributes of your C++ class.
// Pointer 'this' isn't needed but inserted for clarity
this->mode = config.mode;
this->settings = config.settings;
this->length = config.length;
I've assumed that the field names of the config struct are indeed mode, settings, length etc. but you'd have to verify that. Probably the layout of this struct is declared in some C header file (or in the docs).
To do the same thing with C++, you'd declare a struct with the various parameters:
struct DeviceRecord {
uint32_t masterId;
uint16_t period, skipfactor, _a, _b;
uint32_t _c;
char date[8];
char time[8];
char padding[64];
uint16_t num;
uint32_t deviceID;
uint16_t length, mode;
uint32_t settings;
char padding[8];
};
(It's possible this struct is already declared somewhere; it might also use "unsigned int" instead of "uint32_t" and "unsigned short" instead of "uint16_t", and _a, _b, _c would probably have real names.)
Once you have your struct, the question is how to get the data. That depends on where the data is. If it's in a file, you'd do something like this:
DeviceRecord rec; // An instance of the struct, whatever it's called
std::ifstream fin("yourfile.txt", std::ios::binary);
fin.read(reinterpret_cast<char*>(&rec), sizeof(rec));
// Now you can access rec.masterID etc
On the other hand, if it's somewhere in memory (ie, you have a char* or void* to it), then you just need to cast it:
void* data_source = get_data(...); // You'd get this from somewhere
DeviceRecord* rec_ptr = reinterpret_cast<DeviceRecord*>(stat_source);
// Now you can access rec_ptr->masterID etc
If you have a std::vector, you can easily get such a pointer:
std::vector<uint8_t> data_source = get_data(...); // As above
DeviceRecord* rec_ptr = reinterpret_cast<DeviceRecord*>(data_source.data());
// Now you can access rec_ptr->masterID etc, provided data_source remains in scope. You should probably also avoid modifying data_source.
There's one more issue here. The data you've received is in big-endian, but unless you have a PowerPC or other unusual processor, you're probably on a little-endian machine. So you need to do a little byte-swapping before you access the data. You can use the following function to do this.
template<typename Int>
Int swap_int(Int n) {
if(sizeof(Int) == 2) {
union {char c[2]; Int i;} swapper;
swapper.i = n;
std::swap(swapper.c[0], swapper.c[1]);
n = swapper.i;
} else if(sizeof(Int) == 4) {
union {char c[4]; Int i;} swapper;
swapper.i = n;
std::swap(swapper.c[0], swapper.c[3]);
std::swap(swapper.c[1], swapper.c[2]);
n = swapper.i;
}
return n;
}
These return the swapped value rather than changing it in-place, so now you'd access your data with something like swap_int(rec->num). NB: The above byte-swapping code is untested; I'll try compiling it a bit later and fix it if necessary.
Without more information, I can't give you a definitive way of doing this, but perhaps this will be enough to help you work it out on your own.