data gets corrupted between c and python - python
I am trying to use Cython and ctypes to call a c library function using Python.
But the data bytes get corrupted somehow. Could someone please help to locate the issue?
testCRC.c:
#include <stdio.h>
unsigned char GetCalculatedCrc(const unsigned char* stream){
    printf("Stream is %x %x %x %x %x %x %x\n", stream[0], stream[1], stream[2], stream[3], stream[4], stream[5], stream[6]);
    unsigned char dummy = 0;
    return dummy;
}
wrapped.pyx:
# Exposes a c function to python
def c_GetCalculatedCrc(const unsigned char* stream):
    return GetCalculatedCrc(stream)
test.py:
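import ctypes
# assumes the extension built from wrapped.pyx is importable as "wrapped"
from wrapped import c_GetCalculatedCrc
x_ba = (ctypes.c_ubyte * 7)(*[0xD3, 0xFF, 0xF7, 0x7F, 0x00, 0x00, 0x41])
x_ca = (ctypes.c_char * len(x_ba)).from_buffer(x_ba)
y = c_GetCalculatedCrc(x_ca.value)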
output:
Stream is d3 ff f7 7f 0 0 5f   # expected: 0xD3,0xFF,0xF7,0x7F,0x00,0x00,0x41
Solution:
1. I had to update Cython to 0.29 to get the fix for the bug that prevented typed memoryviews from being used with read-only buffers.
2. Passing x_ca.raw worked. Passing x_ca.value instead raised an 'out of bound access' error.
After the suggestions from @ead and @DavidW:
.pyx:
def c_GetCalculatedCrc(const unsigned char[:] stream):
    # Exposes a c function to python
    print("received %s\n" % stream[6])
    return GetCalculatedCrc(&stream[0])
test.py:
x_ba=(ctypes.c_ubyte *8)(*[0x47,0xD3,0xFF,0xF7,0x7F,0x00,0x00,0x41])
x_ca=(ctypes.c_char * len(x_ba)).from_buffer(x_ba)
y=c_GetCalculatedCrc(x_ca.raw)
output:
Stream is 47 d3 ff f7 7f 0 0 41
As pointed out by @DavidW, the problem is your usage of x_ca.value: every time x_ca.value is accessed, a new bytes object is created (see the documentation) and the memory is copied:
x_ca.value is x_ca.value
#False -> every time a new object is created
However, when the memory is copied, the \0 character is treated as the end of the string (as is typical for C strings), as can be seen in the source code:
static PyObject *
CharArray_get_value(CDataObject *self, void *Py_UNUSED(ignored))
{
    Py_ssize_t i;
    char *ptr = self->b_ptr;
    for (i = 0; i < self->b_size; ++i)
        if (*ptr++ == '\0')
            break;
    return PyBytes_FromStringAndSize(self->b_ptr, i);
}
Thus the result of x_ca.value is a bytes object of length 4, which doesn't share memory with x_ca. Accessing stream[6] then reads past the end of that object, which is undefined behavior: anything could happen, including a crash.
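The truncation can be reproduced in plain Python, without Cython. The following is a minimal sketch, assuming CPython's ctypes; the names value_bytes and raw_bytes are introduced here purely for illustration.

```python
import ctypes

data = [0xD3, 0xFF, 0xF7, 0x7F, 0x00, 0x00, 0x41]
x_ba = (ctypes.c_ubyte * 7)(*data)
x_ca = (ctypes.c_char * len(x_ba)).from_buffer(x_ba)

# .value stops at the first \0 byte, so only the leading four bytes survive
value_bytes = x_ca.value
# .raw returns the whole buffer, embedded \0 bytes included
raw_bytes = x_ca.raw

print(len(value_bytes), len(raw_bytes))
```

Comparing the two lengths before handing a buffer to C is a cheap way to notice this kind of truncation early.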
So what can be done?
Normally, you just cannot have a pointer-argument in a def-function, but char * is an exception - a bytes object can be automatically converted to char *, which however doesn't happen via buffer protocol but via PyBytes_AsStringAndSize.
This is why you cannot pass x_ca to c_GetCalculatedCrc as it is: x_ca implements the buffer protocol, but it is not a bytes object, and thus PyBytes_AsStringAndSize cannot be used on it.
An alternative is to use a typed memory view, which utilizes the buffer protocol, i.e.
%%cython
def c_GetCalculatedCrc(const unsigned char[:] stream):
    print(stream[6])
and now passing x_ca directly, with original length/content:
c_GetCalculatedCrc(x_ca)
# 65 as expected
Another alternative would be to pass x_ca.raw to a function expecting const unsigned char * as its argument, as has been pointed out by @DavidW in the comments. x_ca.raw is also a new bytes object (a copy), but unlike x_ca.value it contains the full buffer, including embedded \0 bytes. However, I would prefer typed memory views: they are safer than raw pointers, and you would not run into surprising undefined behavior.
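Which of these objects actually shares memory with the ctypes array can be checked from plain Python. This is a small sketch, assuming CPython's ctypes and the built-in memoryview as a stand-in for Cython's typed memory view; the names view, snapshot, last_via_view and last_via_snapshot are made up for the example.

```python
import ctypes

x_ba = (ctypes.c_ubyte * 7)(*[0xD3, 0xFF, 0xF7, 0x7F, 0x00, 0x00, 0x41])
x_ca = (ctypes.c_char * len(x_ba)).from_buffer(x_ba)

view = memoryview(x_ca)  # goes through the buffer protocol, no copy is made
snapshot = x_ca.raw      # a bytes object holding a copy taken right now

x_ba[6] = 0x42           # change the underlying memory afterwards

last_via_view = view.tobytes()[6]  # sees the new value
last_via_snapshot = snapshot[6]    # still holds the old value
print(hex(last_via_view), hex(last_via_snapshot))
```

The view follows later changes to the array, while the bytes object obtained from .raw stays as it was at the time of the call.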
Related
Incorrect CRC calculation in protocol. One is implemented using zlib and the other one is calculated in function
I am implementing a protocol in an STM32F412 board. It's almost done, I just need to do a CRC check for the received data. I tried using the internal CRC module for calculating the CRC but I could not match the result to any online CRC algorithm online, so I decided to do a simple implementation of the Ethernet CRC. static const uint32_t crc32_tab[] = { 0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL, 0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L, 0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL, 0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 
0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L, 0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L, 0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L, 0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL }; uint32_t calc_crc_calculate(uint8_t *pData, uint32_t uLen) { uint32_t val = 0xFFFFFFFFU; int i; for(i = 0; i < uLen; i++) { val = crc32_tab[(val ^ pData[i]) & 0xFF] ^ ((val >> 8) & 0x00FFFFFF); } return val^0xFFFFFFFF; } I calculated the crc of 0x6F and compared the result to the online calculators and it 
apparently matches. When I try to test the protocol with my Python code, I'm just unable to match the CRCs. In Python I'm using the following code:
d = 0x6f
crc = zlib.crc32(bytes(d))&0xFFFFFFFF
I'm now unable to tell which is right. Apparently my algorithm is OK because it matches the online calculator. But those online calculators do not always seem to be reliable, and I doubt that Python's zlib implementation is wrong. At worst, I may be using it wrong.
Actually, you can compute the Ethernet CRC32 with the built-in CRC module of the STM32. It took me quite a while to make it match up as well. This code should match up for sizes divisible by 4 (I also used Python's zlib on the other end):
#include <assert.h>
#include <stddef.h>
#include "stm32l4xx_hal.h"

uint32_t CRC32_Compute(const uint32_t *data, size_t sizeIn32BitWords) {
    CRC_HandleTypeDef hcrc = {
        .Instance = CRC,
        .Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE,
        .Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE,
        .Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_WORD,
        .Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_ENABLE,
        .InputDataFormat = CRC_INPUTDATA_FORMAT_WORDS,
    };
    HAL_StatusTypeDef status = HAL_CRC_Init(&hcrc);
    assert(status == HAL_OK);
    uint32_t checksum = HAL_CRC_Calculate(&hcrc, data, sizeIn32BitWords);
    // zlib's CRC32 applies a final inversion, so invert the hardware result
    uint32_t checksumInverted = ~checksum;
    return checksumInverted;
}
The challenge with sizes not divisible by 4 is to get the "inversion/reversal" (changing the bit order) right. There is an example of how the hardware handles this in the "RM0394 Reference manual STM32L43xxx STM32L44xxx STM32L45xxx STM32L46xxx advanced ARM®-based 32-bit MCUs Rev 3" on page 333. The essence is that reversal reverses the bit order. For CRC32 this reversal must happen on the word level, i.e. over 32 bits.
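To make "reversal over 32 bits" concrete, here is a small Python sketch. It only illustrates the bit-order operation itself and is not a model of the STM32 CRC peripheral; the helper names reverse_bits, reverse_word and reverse_each_byte are hypothetical.

```python
def reverse_bits(value, width):
    """Return value with its lowest `width` bits in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_word(word):
    """Reverse the bit order across the whole 32-bit word."""
    return reverse_bits(word & 0xFFFFFFFF, 32)


def reverse_each_byte(word):
    """Reverse the bits inside each byte, keeping the byte positions."""
    out = 0
    for shift in (0, 8, 16, 24):
        out |= reverse_bits((word >> shift) & 0xFF, 8) << shift
    return out


print(hex(reverse_word(0x6F)), hex(reverse_each_byte(0x6F)))
```

For the single byte 0x6F the two variants give different results, which is why the choice of reversal granularity matters when the input length is not a multiple of 4.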
OK. It certainly was a bug on my part, but it was in my Python code. I suddenly realized that I was effectively doing bytes(0x6F), which just creates a bytes object of 111 zero bytes. What I actually needed to do was:
import struct
import zlib
d = struct.pack('B', 0x6F)
crc = zlib.crc32(d) & 0xFFFFFFFF
This question could have been avoided had I just done a little bit of rubber duck debugging. Hopefully this will help someone else.
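The difference between the two inputs is easy to see side by side. A minimal sketch, assuming Python 3; the variable names are chosen for the example only.

```python
import struct
import zlib

wrong_input = bytes(0x6F)             # an int argument means "this many zero bytes"
right_input = struct.pack('B', 0x6F)  # exactly one byte with the value 0x6F

crc_wrong = zlib.crc32(wrong_input) & 0xFFFFFFFF
crc_right = zlib.crc32(right_input) & 0xFFFFFFFF

print(len(wrong_input), len(right_input))
print(hex(crc_wrong), hex(crc_right))
```

bytes([0x6F]) would build the same one-byte input as struct.pack('B', 0x6F).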
SWIG, C, Python - Ignoring NULL terminators when passing a char * to python
Put quickly: I want to send a full char * from a C module (built with SWIG) to a Python callback function. I can already do most of this, but my char array is just binary data and therefore contains 0s, which are seen as a NULL terminator in the C->Python conversion. Python only receives the bytes up until the first 0 rather than the full array.
In the SWIG documentation it specifically says that treating char * as binary data is possible with a typemap, but it doesn't say how. I've been playing with cstring.i (%cstring_output_withsize etc.) but I am relatively new to SWIG and am out of my depth.
How do I tell SWIG to ignore the NULL terminator and use a size value instead when passing to Python?
If you need more detail:
For a little background, I'm writing a very simple network packet format/transfer protocol for an 802.15.4 RF network. There are multiple architectures, languages and transceivers on the network, so I'm writing my library to have a unified, settable callback function, allowing a small wrapper to be written on each platform that passes a buffer of bytes to the transceiver using that host language.
Below are the key parts of the code.
//defined in Packets.h
void (*send_callback_ptr)(char * payload, uint8_t payload_length, uint16_t destination, uint16_t sequence_no);

//a function to set the callback
void send_callback_set(void (*f)(char * payload_buffer, uint8_t payload_length, uint16_t destination, uint16_t sequence_no))
{
    send_callback_ptr = f;
}

uint8_t send(uint16_t target, enum PACKET_TYPE type, void * packet_object) {
    char * payload_buffer = malloc(128); //max buffer size is 128
    uint8_t payload_length = 12;
    pack(target, type, packet_object, payload_buffer);
    uint16_t destination = target;
    uint16_t sequence_no = GLOBAL.hash++;
    send_callback_ptr(payload_buffer, payload_length, destination, sequence_no);
};
At the top is the callback and the callback-setting function.
'send' takes a target, a packet type and a packet object (void * so Python can pass it a pointer to a PyObject). It then packs all the data into char * payload_buffer using 'pack()' (pack just uses a set of rules to pack a packet object into a byte array ready to send).
def py_callback(buffer, buffer_length, destination, seq_no):
    print buffer_length
    print "type:", type(buffer)
    print "length:", len(buffer)
    for i in range(len(buffer)):
        print i, '{:02X}'.format(ord(buffer[i]))

MyModule.send_callback_set(py_callback)
That is what the Python side looks like; it receives the payload, payload_length etc. from the call to send_callback_ptr at the end of the C function send().
In one example, the Python callback function only receives 3 bytes when it should get 13, because the 4th byte in the array is a 0.
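The same symptom can be reproduced from Python with ctypes, which may help when checking what the callback should receive. This is only an analogy under stated assumptions: it uses ctypes rather than SWIG, the 13-byte payload is invented for the example, and it says nothing about which SWIG typemap to use.

```python
import ctypes

# Hypothetical 13-byte payload whose 4th byte is 0, as in the question
payload = bytes([0x01, 0x02, 0x03, 0x00, 0x05, 0x06, 0x07,
                 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D])
buf = ctypes.create_string_buffer(payload, len(payload))

# Converted as a NUL-terminated C string: stops at the first 0 byte
as_c_string = ctypes.cast(buf, ctypes.c_char_p).value
# Converted with an explicit length: every byte is kept
with_length = ctypes.string_at(buf, len(payload))

print(len(as_c_string), len(with_length))
```

The point carried over to the SWIG case is that the conversion needs the length passed alongside the pointer, since the data itself cannot mark where it ends.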
Cython print() outputs before C printf(), even when placed afterwards
I'm trying to pick up Cython.
import counter

cdef public void increment():
    counter.increment()

cdef public int get():
    return counter.get()

cdef public void say(int times):
    counter.say(times)
This is the "glue code" I'm using to call functions from counter.py, a pure Python source code file. It's laid out like this:
count = 0

def increment():
    global count
    count += 1

def get():
    global count
    return count

def say(times):
    global count
    print(str(count) * times)
I have successfully compiled and run this program. The functions work fine. However, a very strange thing occurred when I tested this program:
int main(int argc, char *argv[]) {
    Py_Initialize();
    // The following two lines add the current working directory
    // to the environment variable `PYTHONPATH`. This allows us
    // to import Python modules in this directory.
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("sys.path.append(\".\")");
    PyInit_glue();
    // Tests
    for (int i = 0; i < 10; i++) {
        increment();
    }
    int x = get();
    printf("Incremented %d times\n", x);
    printf("The binary representation of the number 42 is");
    say(3);
    Py_Finalize();
    return 0;
}
I would expect the program to produce this output:
Incremented 10 times
The binary representation of the number 42 is 101010
However, it prints this:
Incremented 10 times
101010
The binary representation of the number 42 is
But if I change the line
printf("The binary representation of the number 42 is");
to
printf("The binary representation of the number 42 is\n");
then the output is corrected.
This seems strange to me. I understand that if I want to print the output of a Python function, I might just as well return it to C, store it in a variable, and use C's printf() rather than the native Python print(). But I would be very interested to hear the reason this is happening. After all, the printf() statement is reached before the say() statement (I double checked this in gdb just to make sure). Thanks for reading.
Arduino to Raspberry crc32 check
I'm trying to send messages through the serial USB interface of my Arduino (C++) to a Raspberry Pi (Python).
On the Arduino side I define a struct which I then copy into a char[]. The last part of the struct contains a checksum that I want to calculate using CRC32. I copy the struct, minus the last 4 bytes (to strip the checksum field), into a temporary char array. The checksum is then calculated using the temporary array and the result is added to the struct. The struct is then copied into byteMsg, which gets sent over the serial connection.
On the Raspberry Pi end I do the reverse: I receive the byte string and calculate the checksum over the message minus 4 bytes. Then I unpack the byte string and compare the received and calculated checksums, but unfortunately this fails.
For debugging I compared the CRC32 results in both Python and on the Arduino for the string "Hello World" and they generated the same checksum, so it doesn't seem to be a problem with the library. The Raspberry Pi is also able to decode the rest of the message just fine, so the unpacking of the data into variables seems to be OK as well.
Any help would be much appreciated.
The Python code:
def unpackMessage(self, message):
    """ Processes a received byte string from the arduino """
    # Unpack the received message into struct
    (messageID, acknowledgeID, module, commandType, data, recvChecksum) = struct.unpack('<LLBBLL', message)
    # Calculate the checksum of the recv message minus the last 4
    # bytes that contain the sent checksum
    calcChecksum = crc32(message[:-4])
    if recvChecksum == calcChecksum:
        print "Checksum checks out"
The Arduino crc32 library, taken from http://excamera.com/sphinx/article-crc.html
crc32.h:
#include <avr/pgmspace.h>

static PROGMEM prog_uint32_t crc_table[16] = {
    0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
    0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
    0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
    0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c
};

unsigned long crc_update(unsigned long crc, byte data)
{
    byte tbl_idx;
    tbl_idx = crc ^ (data >> (0 * 4));
    crc = pgm_read_dword_near(crc_table + (tbl_idx & 0x0f)) ^ (crc >> 4);
    tbl_idx = crc ^ (data >> (1 * 4));
    crc = pgm_read_dword_near(crc_table + (tbl_idx & 0x0f)) ^ (crc >> 4);
    return crc;
}

unsigned long crc_string(char *s)
{
    unsigned long crc = ~0L;
    while (*s)
        crc = crc_update(crc, *s++);
    crc = ~crc;
    return crc;
}
Main Arduino sketch:
struct message_t {
    unsigned long messageID;
    unsigned long acknowledgeID;
    byte module;
    byte commandType;
    unsigned long data;
    unsigned long checksum;
};

void sendMessage(message_t &msg)
{
    // Set the messageID
    msg.messageID = 10;
    msg.checksum = 0;
    // Copy the message minus the checksum into a char*
    // Then perform the checksum on the message and copy
    // the full msg into byteMsg
    char byteMsgForCrc32[sizeof(msg)-4];
    memcpy(byteMsgForCrc32, &msg, sizeof(msg)-4);
    msg.checksum = crc_string(byteMsgForCrc32);
    char byteMsg[sizeof(msg)];
    memcpy(byteMsg, &msg, sizeof(msg));
    Serial.write(byteMsg, sizeof(byteMsg));
}

void loop() {
    message_t msg;
    msg.module = 0x31;
    msg.commandType = 0x64;
    msg.acknowledgeID = 0;
    msg.data = 10;
    sendMessage(msg);
}
Kind Regards,
Thiezn
You are making the classic struct-to-network/serial/(insert communication layer here) mistake. Structs have hidden padding in order to align the members onto suitable memory boundaries. This is not guaranteed to be the same across different computers, let alone different CPUs/microcontrollers.
Take this struct as an example:
struct Byte_Int {
    int x;
    char y;
    int z;
};
Now on a basic 32-bit x86 CPU you have a 4-byte memory boundary, meaning that variables are aligned to either 4 bytes, 2 bytes or not at all, according to the type of variable. The example would look like this in memory: int x on bytes 0,1,2,3, char y on byte 4, int z on bytes 8,9,10,11. Why not use the three bytes on the second line? Because then the memory controller would have to do two fetches to get a single number! A controller can only read one line at a time. So, structs (and all other kinds of data) have hidden padding in order to get variables aligned in memory. The example struct would have a sizeof() of 12, and not 9!
Now, how does that relate to your problem? You are memcpy()ing a struct directly into a buffer, including the padding. The computer on the other end doesn't know about this padding and misinterprets the data. What you need is a serialization function that takes the members of your struct and pastes them into a buffer one at a time. That way you lose the padding and end up with something like this: [0,1,2,3: int x][4: char y][5,6,7,8: int z], all as one lengthy byte array/string which can be safely sent using Serial(). Of course, on the other end you would have to parse this string into intelligible data. Python's unpack() does this for you as long as you give it the right format string.
Lastly, an int on an Arduino is 16 bits long, whereas on a PC it is generally 4 bytes. So assemble your unpack format string with care.
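Python's struct module can show the difference between a padded native layout and a packed wire layout. A minimal sketch, assuming the example struct (int, char, int) from above; the native size is platform dependent, so no particular value is claimed for it here.

```python
import struct

# '@' uses the platform's native sizes and alignment (padding may be inserted)
native_size = struct.calcsize('@ici')
# '<' uses standard sizes with no padding: 4 + 1 + 4 bytes
packed_size = struct.calcsize('<ici')
# The format string from the question: 4 + 4 + 1 + 1 + 4 + 4 bytes
message_size = struct.calcsize('<LLBBLL')

# Serialize the members one at a time into a padding-free buffer, then parse it
wire = struct.pack('<ici', 1, b'A', 2)
x, y, z = struct.unpack('<ici', wire)

print(native_size, packed_size, message_size, len(wire))
```

If sizeof(message_t) printed on the sending side differs from message_size, padding or differing type widths are involved and the members need to be serialized individually.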
The char array I was passing to the crc_string function contained '\0' characters. crc_string was iterating through the array until it found a '\0', which is wrong in this case, since I was using the char array as a stream of bytes to be sent over a serial connection.
I've changed the crc_string function to take the array size as an argument and iterate through the array using that value. This solved the issue.
Here's the new function:
unsigned long crc_string(char *s, size_t arraySize)
{
    unsigned long crc = ~0L;
    for (size_t i = 0; i < arraySize; i++) {
        crc = crc_update(crc, s[i]);
    }
    crc = ~crc;
    return crc;
}
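The effect of the fix can be checked on the Python side with a port of the nibble-table CRC. This is a sketch under assumptions: it is a Python translation of crc_update from the question (not code that runs on the Arduino), crc_bytes and crc_c_string are hypothetical names, and the sample message is packed with the field values from the sketch above.

```python
import struct
import zlib

CRC_TABLE = [
    0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
    0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
    0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
    0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c,
]


def crc_update(crc, byte):
    """Process one byte, low nibble first, as the Arduino crc_update does."""
    crc = CRC_TABLE[(crc ^ byte) & 0x0F] ^ (crc >> 4)
    crc = CRC_TABLE[(crc ^ (byte >> 4)) & 0x0F] ^ (crc >> 4)
    return crc


def crc_bytes(data):
    """Length-based CRC: every byte is processed, zeros included."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc_update(crc, byte)
    return crc ^ 0xFFFFFFFF


def crc_c_string(data):
    """Mimics the original while (*s) loop: stops at the first zero byte."""
    end = data.find(b'\x00')
    return crc_bytes(data if end < 0 else data[:end])


# messageID=10, acknowledgeID=0, module=0x31, commandType=0x64, data=10
message = struct.pack('<LLBBL', 10, 0, 0x31, 0x64, 10)

print(hex(crc_bytes(message)), hex(zlib.crc32(message) & 0xFFFFFFFF))
print(hex(crc_c_string(message)))
```

Because the second byte of this message is already zero, the NUL-terminated variant only ever sees the first byte, while the length-based variant agrees with zlib.crc32 over the whole message.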
Python/SWIG: GC Object already tracked when trying to use a C function to dereference a pointer, from SWIG
I have an issue where I'm dealing with WORDs (2-byte unsigned integers). Here are the commands I usually run:
import mySimLib
mySimLib.init()
# where 200 is the number of characters I want in the string.
# strInit returns a malloc'd pointer
strPtr = mySimLib.strInit( 200 )
# where wordInit returns a malloc'd pointer to a WORD.
wPtr = mySimLib.wordInit ()
# 4 is the number of bytes required to store data
mySimLib.Write ("Title", "Data", 4)
# Search finds the record with same title, copies the data into strPtr
# up to the number of bytes in the record - as long as the number of
# bytes in the strPtr is greater.
mySimLib.Search ("Title", strPtr, 200, wPtr)
# Since I cannot use python to dereference word pointers, I call a C
# function to print it out.
mySimLib.printWord (wPtr)
At this point, my program crashes. It throws an exception (reading violation) or some "GC Object already tracked" error. The thing is, I have a string print function that never fails when I have it print. When I try to get the word to print, I do get errors.
This is my word pointer initializing function:
unsigned int * wordInit () {
    unsigned int * d = malloc ( sizeof ( unsigned int ) );
    *d = 0;
    return d;
}
This is my printing function:
void wordPrint (unsigned int * d){
    printf ("\nWptr: %d",*d);
}
I've no idea what I'm doing wrong here, but these crashes are very erratic and annoying.