I'm trying to send a struct over UART (from an ESP32) to be processed by Python by using this guide.
// we send this to the host, to be processed by python script
struct package {
uint8_t modifier;
uint8_t keyboard_keys[6];
};
// instantiate struct
package to_send = {};
// send the contents of keyboard_keys and keyboard_modifier_keys
// https://folk.uio.no/jeanra/Microelectronics/TransmitStructArduinoPython.html
void usb_keyboard_send(void)
{
to_send.modifier = keyboard_modifier_keys;
for(uint8_t i = 0; i < 6; i++) {
to_send.keyboard_keys[i] = keyboard_keys[i];
}
printf("S");
printf((uint8_t *)&to_send, sizeof(to_send));
printf("E");
}
However, I get the error: invalid conversion from 'uint8_t* {aka unsigned char*}' to 'const char*' [-fpermissive]
I'm pretty new to C++, and I've tried all sorts of casting, but I just can't get it to work. Could someone offer guidance please?
Setting aside that it's generally a bad idea to mix ASCII and raw binary, your code is almost right.
You have 2 major errors:
// instantiate struct
package to_send = {};
should be (strictly required only when compiling as C; in C++ the struct keyword is optional here):
// instantiate struct
struct package to_send = {};
Also, printf() treats its first argument as a format string, so it can't be used to write raw bytes. To write the struct directly (not as formatted text) to stdout, use fwrite(), i.e.
printf("S");
fwrite(&to_send, 1, sizeof(to_send), stdout);
printf("E");
As an aside, after fixing these 2 errors you may be surprised to find that your struct isn't the number of bytes you expect. The compiler may insert padding so that fields sit on word-sized boundaries (32 bits on ESP32), which makes memory accesses faster. sizeof() returns the correct size, taking into account whatever padding is added, but your Python code may not expect that. (In this particular struct every member is a uint8_t, so no padding is needed; it becomes an issue as soon as you add wider fields.) To avoid this, you probably want to use a compiler hint, e.g. __attribute__((__packed__)). See here for a general guide to structure packing.
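On the Python side, the S and E markers can be used to locate each packet. The sketch below is one possible approach, assuming the 7-byte layout above (1 modifier byte plus 6 key bytes, no padding) and that you append whatever pyserial's read() returns to a bytes buffer; parse_packages is a hypothetical helper, not something from the linked guide. Because the payload is raw binary, an "S" byte can also occur inside it, so a frame is only accepted if an "E" sits at the expected position. That reduces misaligned reads but cannot rule them out, which is one reason mixing ASCII markers with binary is fragile.

```python
import struct

# Assumed layout of `struct package`: 1 modifier byte + 6 key bytes.
# All members are uint8_t, so there is no padding and no byte-order issue.
PACKAGE = struct.Struct("<B6B")   # 7 bytes
START, END = b"S", b"E"


def parse_packages(buf):
    """Extract every complete S<7 bytes>E frame from `buf`.

    Returns (packages, remainder): a list of (modifier, keys) tuples and the
    unconsumed tail of the buffer, to be prepended to the next serial read.
    """
    packages = []
    frame_len = len(START) + PACKAGE.size + len(END)   # 9 bytes
    i = 0
    while True:
        i = buf.find(START, i)
        if i < 0:
            return packages, b""          # no frame start left: drop the noise
        if len(buf) - i < frame_len:
            return packages, buf[i:]      # incomplete frame: keep it for later
        if buf[i + frame_len - 1:i + frame_len] != END:
            i += 1                        # this 'S' was a payload byte: resync
            continue
        fields = PACKAGE.unpack_from(buf, i + 1)
        packages.append((fields[0], list(fields[1:])))
        i += frame_len
```

In a receive loop you would keep something like buf += ser.read(ser.in_waiting or 1) and then call packages, buf = parse_packages(buf) after each read.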
I'm trying to work with binary robotics data, but I am very stuck; I don't even understand how to work with it. I want to create a DataFrame from it and use pandas to get some statistics.
When I open the file I get this:
struct Time
{
long long time;
unsigned short millitm;
short timezone;
short dstflag;
};
struct wxp1
{
float x;
float y;
};
struct wxp2
struct position
{
wxp1 position; // based on Basic Full(679x382)
wxp1 position2; // based on Basic Full(679x382)
wxp1 position3; // based on Basic Full(679x382)
wxp1 estimatedposition;
wxp1 estimatedposition2;
wxp1 estimatedposition3;
float score;
};
Followed by binary
00\x00\x00\x00\x00\x00\x00\x00\x etc
I'm not familiar with this at all; I tried to open it with the struct module and other methods I found on Stack Overflow, without success.
from numpy import fromfile, dtype
from pandas import DataFrame
records = fromfile('/content/my_file.blog')
df=DataFrame(records)
But I don't get anything meaningful out of it...
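Since the header describes C structs, one approach is to decode the binary part with the struct module (or a matching NumPy structured dtype) instead of fromfile() with its default float64 dtype. A minimal sketch, under assumptions you need to check against your file: the binary data starts at a known byte offset after the text header, it consists of back-to-back position records, floats are 4-byte little-endian, and there is no padding (13 floats, 52 bytes per record). If each record also contains a Time struct, its fields (long long, unsigned short, short, short) would have to be added to the format, possibly with padding. read_positions is a hypothetical helper.

```python
import struct

# Assumed record layout: one `struct position` = 6 wxp1 points (x, y floats)
# followed by a float score -> 13 little-endian float32 values, 52 bytes.
POINTS = ["position", "position2", "position3",
          "estimatedposition", "estimatedposition2", "estimatedposition3"]
RECORD = struct.Struct("<13f")


def read_positions(raw, offset=0):
    """Decode consecutive `position` records from `raw`, starting at `offset`.

    `offset` is where the binary data begins (i.e. after the text header);
    trailing bytes that do not fill a whole record are ignored.
    """
    usable = (len(raw) - offset) // RECORD.size * RECORD.size
    rows = []
    for values in RECORD.iter_unpack(raw[offset:offset + usable]):
        row = {}
        for k, name in enumerate(POINTS):
            row[name + "_x"] = values[2 * k]
            row[name + "_y"] = values[2 * k + 1]
        row["score"] = values[12]
        rows.append(row)
    return rows
```

The returned list of dicts can be passed straight to pandas, e.g. df = pandas.DataFrame(read_positions(raw, offset=header_len)), after which df.describe() gives basic statistics. How to determine header_len depends on how the text header is terminated in your file.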
I am using pyelftools to read an ELF file. How can I get the offset or address of a member in a struct? For example, say I have the following structs in C.
typedef struct
{
int valA;
} TsA;
typedef struct
{
int valB;
} TsB;
typedef struct
{
int valC;
TsB b;
} TsC;
typedef struct
{
TsA a;
TsC c;
} TsStruct;
TsStruct myStruct;
How can I get the address of myStruct.c.b.valB? I found a similar question here but did not find a good answer.
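Find the DIE for the structure, the one with tag DW_TAG_structure_type and DW_AT_name equal to the structure name. With typedef struct { ... } TsStruct; as in your example, the structure itself is anonymous, so look instead for the DW_TAG_typedef named TsStruct and follow its DW_AT_type to the DW_TAG_structure_type.
Enumerate the DW_TAG_member child DIEs under it. While there, look at DW_AT_data_member_location: it's the offset of the corresponding structure member. For a nested member such as c.b.valB, follow each member's DW_AT_type to its structure type and add up the offsets along the path; the address of the member is then the address of myStruct (from the symbol table, or from the DW_AT_location of its DW_TAG_variable DIE) plus that sum.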
It might help if you take a look at the DIE structure visually first. DWARF Explorer might help (disclaimer: I wrote it).
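The steps above can be scripted with pyelftools. This is a sketch rather than a complete tool: member_offset and find_member_offset are hypothetical helpers, member offsets are assumed to be stored as plain constants (the usual case with DWARF 3 and later), and the type is looked up by name across all compilation units, returning the first match. member_offset only relies on the DIE attributes and methods it uses (tag, attributes, iter_children, get_DIE_from_attribute), so it can be tried on any DIE tree.

```python
# find_member_offset requires pyelftools (pip install pyelftools).

QUALIFIERS = ("DW_TAG_typedef", "DW_TAG_const_type", "DW_TAG_volatile_type")


def _resolve(die):
    # Follow typedefs and const/volatile qualifiers down to the real type DIE.
    while die.tag in QUALIFIERS:
        die = die.get_DIE_from_attribute("DW_AT_type")
    return die


def _name(die):
    attr = die.attributes.get("DW_AT_name")
    return attr.value.decode() if attr is not None else None


def member_offset(type_die, path):
    """Byte offset of a dotted member path such as "c.b.valB" inside type_die."""
    die, offset = _resolve(type_die), 0
    for name in path.split("."):
        for child in die.iter_children():
            if child.tag == "DW_TAG_member" and _name(child) == name:
                loc = child.attributes.get("DW_AT_data_member_location")
                value = 0 if loc is None else loc.value   # absent: treated as 0 here
                if not isinstance(value, int):
                    # DWARF 2 location expressions are not handled in this sketch.
                    raise NotImplementedError(f"location expression for {name}")
                offset += value
                die = _resolve(child.get_DIE_from_attribute("DW_AT_type"))
                break
        else:
            raise KeyError(name)
    return offset


def find_member_offset(elf_path, type_name, path):
    """Find type_name (typedef or struct tag) in the DWARF info and resolve path."""
    from elftools.elf.elffile import ELFFile
    with open(elf_path, "rb") as f:
        dwarf = ELFFile(f).get_dwarf_info()
        for cu in dwarf.iter_CUs():
            for die in cu.iter_DIEs():
                if (die.tag in ("DW_TAG_typedef", "DW_TAG_structure_type")
                        and _name(die) == type_name):
                    return member_offset(die, path)
    raise KeyError(type_name)
```

With the example types and a 4-byte int, find_member_offset("firmware.elf", "TsStruct", "c.b.valB") would be expected to return 8 (the file name is a placeholder). Adding that to the address of the myStruct symbol, e.g. the st_value of its entry in the .symtab section (recent pyelftools versions provide SymbolTableSection.get_symbol_by_name()), gives the address of myStruct.c.b.valB.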
I am developing an IoT application that requires me to handle many small unstructured messages (meaning that their fields can change over time: some can appear and others can disappear). These messages typically have between 2 and 15 fields, whose values belong to basic data types (ints/longs, strings, booleans). These messages fit very well within the JSON data format (or msgpack).
It is critical that the messages get processed in their order of arrival (i.e., they need to be processed by a single thread; there is no way to parallelize this part). I have my own logic for handling these messages in real time (the throughput is relatively small, a few hundred thousand messages per second at most), but there is an increasing need for the engine to be able to simulate/replay previous periods by replaying a history of messages. Though it wasn't initially written for that purpose, my event processing engine (written in Go) could very well handle dozens (maybe low hundreds) of millions of messages per second if I were able to feed it historical data at a sufficient speed.
This is exactly the problem. I have been storing a large number (hundreds of billions) of these messages over a long period of time (several years), for now in delimited msgpack format (https://github.com/msgpack/msgpack-python#streaming-unpacking). In this setting and others (see below), I was able to benchmark peak parsing speeds of ~2M messages/second (on a 2019 MacBook Pro, parsing only), which is far from saturating disk IO.
Even leaving IO aside, running the following:
import json
message = {
'meta1': "measurement",
'location': "NYC",
'time': "20200101",
'value1': 1.0,
'value2': 2.0,
'value3': 3.0,
'value4': 4.0
}
json_message = json.dumps(message)
%%timeit
json.loads(json_message)
gives me a parsing time of 3 microseconds/message, that is slightly above 300k messages/second. Comparing with ujson, rapidjson and orjson instead of the standard library's json module, I was able to get peak speeds of 1 microsecond/message (with ujson), that is about 1M messages/second.
Msgpack is slightly better:
import msgpack
message = {
'meta1': "measurement",
'location': "NYC",
'time': "20200101",
'value1': 1.0,
'value2': 2.0,
'value3': 3.0,
'value4': 4.0
}
msgpack_message = msgpack.packb(message)
%%timeit
msgpack.unpackb(msgpack_message)
Gives me a processing time of ~750ns/message (about 100ns/field), that is about 1.3M messages/second. I initially thought that C++ could be much faster. Here's an example using nlohmann/json, though this is not directly comparable with msgpack:
#include <iostream>
#include "json.hpp"
using json = nlohmann::json;
const std::string message = "{\"value\": \"hello\"}";
int main() {
auto jsonMessage = json::parse(message);
for(size_t i=0; i<1000000; ++i) {
jsonMessage = json::parse(message);
}
std::cout << jsonMessage["value"] << std::endl; // To avoid having the compiler optimize the loop away.
};
Compiling with clang 11.0.3 (-std=c++17 -O3), this runs in ~1.4s on the same MacBook, that is to say a parsing speed of ~700k messages/second with even smaller messages than the Python example. I know that nlohmann/json can be quite slow, and I was able to get parsing speeds of about 2M messages/second using simdjson's DOM API.
This is still far too slow for my use case. I am open to all suggestions to improve message parsing speed with potential applications in Python, C++, Java (or whatever JVM language) or Go.
Notes:
I do not necessarily care about the size of the messages on disk (consider it a plus if the storage method you suggest is memory-efficient).
All I need is a key-value model for basic data types - I do not need nested dictionaries or lists.
Converting the existing data is not an issue at all. I am simply looking for something read-optimized.
I do not necessarily need to parse the entire thing into a struct or a custom object, only to access some of the fields when I need it (I typically need a small fraction of the fields of each message) - it is fine if this comes with a penalty, as long as the penalty does not destroy the whole application's throughput.
I am open to custom/slightly unsafe solutions.
Any format I choose needs to be naturally delimited, in the sense that the messages will be written serially to a file (I am currently using one file per day, which is sufficient for my use case). I've had issues in the past with improperly delimited messages (see writeDelimitedTo in the Java Protobuf API: lose a single byte and the entire file is ruined).
Things I have already explored:
JSON: experimented with rapidjson, simdjson, nlohmann/json, etc.
Flat files with delimited msgpack (see this API: https://github.com/msgpack/msgpack-python#streaming-unpacking): what I am currently using to store the messages.
Protocol Buffers: slightly faster, but does not really fit with the unstructured nature of the data.
Thanks!!
I assume that messages only contain a few named attributes of basic types (defined at runtime) and that these basic types are, for example, strings, integers and floating-point numbers.
For the implementation to be fast, it is better to:
avoid text parsing (slow because sequential and full of conditionals);
avoid checking if messages are ill-formed (not needed here as they should all be well-formed);
avoid allocations as much as possible;
work on message chunks.
Thus, we first need to design a simple and fast binary message protocol:
A binary message contains the number of its attributes (encoded on 1 byte) followed by the list of attributes. Each attribute contains a string prefixed by its size (encoded on 1 byte) followed by the type of the attribute (the index of the type in the std::variant, encoded on 1 byte) as well as the attribute value (a size-prefixed string, a 64-bit integer or a 64-bit floating-point number).
Each encoded message is a stream of bytes that can fit in a large buffer (allocated once and reused for multiple incoming messages).
Here is code to decode a message from a raw binary buffer:
#include <climits>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <unordered_map>
#include <variant>

// Define the possible types here
using AttrType = std::variant<std::string_view, int64_t, double>;

// Decode the `msgData` buffer and write the decoded message into `result`.
// Assume the message is not ill-formed!
// msgData must not be freed or modified while the resulting map is being used.
void decode(const char* msgData, std::unordered_map<std::string_view, AttrType>& result)
{
    static_assert(CHAR_BIT == 8);

    // Counts and lengths are single bytes: read them as unsigned, since plain char
    // may be signed and values above 127 would otherwise turn into huge size_t values
    auto readByte = [msgData](size_t pos) -> size_t {
        return static_cast<unsigned char>(msgData[pos]);
    };

    const size_t attrCount = readByte(0);
    size_t cur = 1;
    result.clear();

    for(size_t i=0 ; i<attrCount ; ++i)
    {
        const size_t keyLen = readByte(cur);
        std::string_view key(msgData+cur+1, keyLen);
        cur += 1 + keyLen;

        const size_t attrType = readByte(cur);
        cur++;

        // A switch could be better if there are more types
        if(attrType == 0) // std::string_view
        {
            const size_t valueLen = readByte(cur);
            std::string_view value(msgData+cur+1, valueLen);
            cur += 1 + valueLen;
            result[key] = AttrType(value);
        }
        else if(attrType == 1) // Native-endian 64-bit integer
        {
            int64_t value;
            // Required to not break the strict aliasing rule
            std::memcpy(&value, msgData+cur, sizeof(int64_t));
            cur += sizeof(int64_t);
            result[key] = AttrType(value);
        }
        else // IEEE-754 double
        {
            double value;
            // Required to not break the strict aliasing rule
            std::memcpy(&value, msgData+cur, sizeof(double));
            cur += sizeof(double);
            result[key] = AttrType(value);
        }
    }
}
You probably need to write the encoding function too (based on the same idea).
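As one possible encoder, here is a Python sketch of the same format, which could for instance be used to convert the existing msgpack history once. Assumptions of this sketch: type tags 0/1/2 follow the order of AttrType's alternatives; counts and lengths are single bytes, so at most 255 attributes and at most 255-byte keys and strings (bytes() raises ValueError otherwise); integers and doubles use struct's native byte order ('='), so the files must be decoded on a machine with the same endianness. encode is a hypothetical name, and since bool is a subclass of int in Python, booleans end up encoded as int64.

```python
import struct

# Type tags: index of the type in std::variant<std::string_view, int64_t, double>
TYPE_STR, TYPE_INT, TYPE_FLOAT = 0, 1, 2


def encode(message):
    """Encode a flat dict {str: str | int | float} into the binary format."""
    out = bytearray([len(message)])                  # attribute count (1 byte)
    for key, value in message.items():
        k = key.encode("utf-8")
        out += bytes([len(k)]) + k                   # size-prefixed key
        if isinstance(value, str):
            v = value.encode("utf-8")
            out += bytes([TYPE_STR, len(v)]) + v     # size-prefixed string
        elif isinstance(value, int):                 # note: bools land here too
            out += bytes([TYPE_INT]) + struct.pack("=q", value)
        elif isinstance(value, float):
            out += bytes([TYPE_FLOAT]) + struct.pack("=d", value)
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    return bytes(out)
```

For example, encode({"value": "hello"}) produces exactly the bytes of the message literal used in the C++ example below.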
Here is an example of usage (based on your json-related code):
#include <iostream>

const char* message = "\x01\x05value\x00\x05hello";

void bench()
{
    std::unordered_map<std::string_view, AttrType> decodedMsg;
    decodedMsg.reserve(16);
    decode(message, decodedMsg);

    for(size_t i=0; i<1000*1000; ++i)
    {
        decode(message, decodedMsg);
    }

    std::visit([](const auto& v) { std::cout << "Result: " << v << std::endl; }, decodedMsg["value"]);
}
On my machine (with an Intel i7-9700KF processor) and based on your benchmark, I get 2.7M messages/s with the code using the nlohmann json library and 35.4M messages/s with the new code.
Note that this code can be made much faster. Indeed, most of the time is spent on hashing and allocations. You can mitigate the problem by using a faster map implementation (e.g. boost::container::flat_map or ska::bytell_hash_map) and/or by using a custom allocator. An alternative is to build your own carefully tuned hash-map implementation. Another alternative is to use a vector of key-value pairs and use a linear search to perform lookups (this should be fast because your messages should not have many attributes and because you said that you need only a small fraction of the attributes per message).
However, the larger the messages, the slower the decoding. Thus, you may need to leverage parallelism to decode message chunks faster.
With all of that, it is possible to reach more than 100M messages/s.
I need to send float data from Python to an Arduino and get the same value back. I decided to start by sending float data from the Arduino. The data is sent as 4 successive bytes. I'm trying to figure out how to collect these successive bytes and convert them to the proper format on the Python (system) end.
Arduino code:
void USART_transmitdouble(double* d)
{
union Sharedblock
{
char part[4];
double data;
} my_block;
my_block.data = *d;
for(int i=0;i<4;++i)
{
USART_send(my_block.part[i]);
}
}
int main()
{
USART_init();
double dble=5.5;
while(1)
{
USART_transmitdouble(&dble);
}
return 0;
}
Python code (system end):
import serial
import struct

my_ser = serial.Serial('/dev/tty.usbmodemfa131', 19200)
while True:
    #a = raw_input('enter a value:')
    #my_ser.write(a)
    data = my_ser.read(4)
    f_data, = struct.unpack('<f', data)
    print(f_data)
    #time.sleep(0.5)
Using the struct module as shown above, I can print float values.
50% of the time, the data is printed correctly. However, if I mess with time.sleep() or stop the transmission and restart it, incorrect values are printed. I think the wrong set of 4 bytes is being unpacked in this case. Any idea what we can do here?
Any other ideas other than using struct module to send and receive float data to and from Arduino?
Well, the short answer is that there's some interaction going on between software and hardware. I'm not sure how you're stopping the transmission. I suspect that whatever you're doing actually stops the byte being sent mid-byte, and therefore a new byte gets injected when you start back up. As for time.sleep(), some hardware buffer could be overflowing, so you lose bytes, which causes an alignment offset. Once you start grabbing a few bytes from one float and a few bytes from another, you'll start getting the wrong answer.
One thing I've noticed is that you do not have any alignment mechanism. This is often hard to do with a UART because all you can send are bytes. One way would be to send a handshake back and forth: the computer says restart, the hardware restarts the connection (stops sending data, clears whatever buffers it has, etc.) and sends some magic value like 0xDEADBEEF. Then the computer can find this 0xDEADBEEF and know where the next message is going to start. You'll still need to be aware of whatever buffers exist in the hardware/OS and take precautions not to overflow them. There are a number of flow control methods, ranging from XON/XOFF to actual hardware flow control.
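The receiving half of that handshake could look like the following sketch. It assumes the device sends the bytes DE AD BE EF in that order (a uint32 written from a little-endian MCU would arrive as EF BE AD DE) and then plain little-endian float32 values; read_floats_after_sync and SYNC are hypothetical names. Since the magic bytes could also occur inside float data, the search should only be done right after requesting a restart, as described above.

```python
import struct

SYNC = bytes.fromhex("DEADBEEF")   # assumed magic word, in wire order


def read_floats_after_sync(stream, count):
    """Skip bytes until SYNC has been seen, then read `count` float32 values.

    `stream` is anything with a read(n) method, e.g. a pyserial Serial object.
    Returns fewer than `count` values if the stream runs out of data.
    """
    window = b""
    while window != SYNC:
        b = stream.read(1)
        if not b:
            raise EOFError("sync word not found")
        window = (window + b)[-len(SYNC):]   # keep only the last 4 bytes
    values = []
    for _ in range(count):
        chunk = stream.read(4)
        if len(chunk) < 4:
            break
        values.append(struct.unpack("<f", chunk)[0])
    return values
```

With pyserial and a read timeout set, read(1) returns an empty bytes object on timeout, which ends the search with EOFError instead of blocking forever.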
Because this question ranks highly on search engines I have put together a working solution.
WARNING: Unless you need full floating-point precision, convert the value to a string and send that (using sprintf or dtostrf, or Serial.print(value, NumberOfDecimalPlaces); see the documentation). This is because the following solution a) won't work between machines of different endianness and b) some of the bytes may be misinterpreted as control characters.
Solution: Get the pointer for the floating point number and then pass it as a byte array to Serial.write().
e.g.
/*
Code to test send_float function
Generates random numbers and sends them over serial
*/
void send_float (float arg)
{
// get access to the float as a byte-array:
byte * data = (byte *) &arg;
// write the data to the serial
Serial.write (data, sizeof (arg));
Serial.println();
}
void setup(){
randomSeed(analogRead(0)); //Generate random number seed from unconnected pin
Serial.begin(9600); //Begin Serial
}
void loop()
{
int v1 = random(300); //Generate two random ints
int v2 = random(300);
float test = ((float) v1)/((float) v2); // Then generate a random float
Serial.print("m"); // Print test variable as string
Serial.print(test,11);
Serial.println();
//print test variable as float
Serial.print("d"); send_float(test);
Serial.flush();
//delay(1000);
}
Then, to receive this in Python, I used your solution and added code to compare the two outputs for verification purposes.
# Module to compare the two numbers and identify any error between sending via float and ASCII
import serial
import struct

ser = serial.Serial('/dev/ttyUSB0', 9600)  # Change this to your port ('/dev/ttyUSB0' is for Linux; use 'COM7' or similar on Windows)
while True:
    if ser.in_waiting > 2:
        command = ser.read(1)  # read the first byte
        if command == b'm':
            vS = ser.readline()      # the value printed as text
            ser.read(1)              # skip the 'd' marker
            data = ser.read(4)       # the raw 4-byte float
            ser.readline()           # consume the trailing "\r\n"
            vF, = struct.unpack('<f', data)
            vSf = float(vS.decode().strip())
            diff = abs(vF - vSf)
            if diff < 1e-11:
                diff = 0
            print("Str:", vSf, " Fl: ", vF, " Dif:", diff)
References:
Sending a floating point number from python to arduino and
How to send float over serial
I don't know Python, but what is wrong with the Arduino sending the number like this?
value= 1.234;
Serial.println(value);
For the Arduino to receive a float:
#include <stdio.h>
#include <stdlib.h>

void loop() {
  char data[10], *end;
  char indata = 0;  // initialise before it is tested in the loop condition
  int i = 0;
  float value;

  // Read until a carriage return (13) or until the buffer is full
  while ((indata != 13) && (i < 10)) {
    if (Serial.available() > 0) {
      indata = Serial.read();
      data[i] = indata;
      i++;
    }
  }
  i -= 1;
  data[i] = 0; // replace carriage return with 0
  value = strtof(data, &end);
}
Note this code is untested although very similar to code I have used in the past.
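For the other direction (Python sending a float to this Arduino loop), the text has to end with a carriage return and fit in the 10-byte buffer, carriage return included. A sketch under those assumptions; float_to_line is a hypothetical helper, and it simply reduces the number of decimals until the text fits.

```python
def float_to_line(value, max_chars=9):
    """Format `value` as ASCII text terminated by '\\r' for the reader above.

    The reader stores at most 10 bytes including the carriage return, so the
    text itself must fit in `max_chars` (9) characters.
    """
    for decimals in range(6, -1, -1):
        text = f"{value:.{decimals}f}"
        if len(text) <= max_chars:
            return (text + "\r").encode("ascii")
    raise ValueError(f"{value!r} does not fit in {max_chars} characters")
```

For example, ser.write(float_to_line(5.5)) would send b"5.500000\r", which strtof() then parses back into 5.5.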
I'm working on an Arduino project, and I am interfacing it with a Python script due to memory limitations. On the Python side I have a 2-dimensional matrix containing the respective x, y values of coordinates, and this list contains 26,000 coordinate pairs. To clarify the data structure: pathlist[0][0] would return the X value of the first coordinate in my list. Performing different operations on this list in Python poses no problems. Where I am running into trouble, however, is sending these values to the Arduino over serial in a way that is useful.
Due to the nature of serial communication (at least I think this is the case), I must send each integer as a string, one digit at a time. So a number like 345 would be sent as 3 individual characters: 3, 4, then 5.
What I am struggling with is finding a way to rebuild those integers on the Arduino.
Whenever I send a value over, it's receiving the data and outputting it like so:
//Python is sending over the number '25'
2ÿÿ52
//Python is sending the number 431.
4ÿÿ321ÿÿÿ2
The Arduino code is:
String str;
int ds = 4;
void setup() {
Serial.begin(9600);
}
void loop(){
if (Serial.available()>0) {
for (int i=0; i<4; i=i+1) {
char d= Serial.read();
str.concat(d);
}
char t[str.length()+1];
str.toCharArray(t, (sizeof(t)));
int intdata = atoi(t);
Serial.print(intdata);
}
}
And the Python code looks like this:
import serial
s = serial.Serial(port='/dev/tty.usbmodemfd131', baudrate=9600)
s.write(str(25))
I'm almost certain that the problem isn't stemming from the output method (Serial.print), seeing as when I declare another int, it formats fine on output, so I am assuming the problem lies in how the intdata variable is constructed.
One thing of note that may help diagnose this problem is that if I change Serial.print(intdata) to Serial.print(intdata+5) my result is 2ÿÿ57, where I would expect 30 (25+5). This 7 is present regardless of the input. For instance I could write 271 to the serial and my result would look as follows:
//For input 271.
2ÿÿ771ÿÿÿ7
It appears to me that Arduino is chunking the values into pairs of two and appending the length to the end. I can't understand why that would happen though.
It also seems to me that the ÿ are being added in the for loop. Meaning that they are added because nothing is being sent at that current moment. But even fixing that by adding yet another if(Serial.available()>0) conditional, the result is still not treated like an integer.
Also, would using Pickle be appropriate here?
What am I doing wrong?
You should wait a bit for the serial data to arrive.
The Arduino code should be:
if (Serial.available()){
delay(100); // Wait for all data.
while (Serial.available()) {
char d = Serial.read();
str.concat(d);
}
}
Also you have to clear your string before re-using it.
[Edit]
I forgot to mention: ÿ is character 255, i.e. the -1 that Serial.read() returns when there is nothing to read, stored in a char. It means Serial.read() is telling you it couldn't read anything.
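To see where the ÿ comes from: Serial.read() returns an int, -1 when no byte is available, and storing it in a char keeps only the low 8 bits, 0xFF. A terminal that displays bytes as Latin-1 (or Windows-1252) shows 0xFF as ÿ. The same conversion in Python, as a small illustration (as_received_char is a hypothetical name):

```python
def as_received_char(ret):
    """Mimic storing Serial.read()'s int result in a char and printing it."""
    low_byte = ret & 0xFF                        # -1 becomes 0xFF (255)
    return bytes([low_byte]).decode("latin-1")   # 0xFF displays as 'ÿ'
```

So every loop iteration that calls Serial.read() before the next digit has arrived adds one ÿ to the string.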
I would change the communication so python sends newlines between numbers, so you're not as dependent on the timing:
s.write((str(25) + '\n').encode())
and then on the receiving side:
void loop(){
while (Serial.available() > 0) {
char d = Serial.read();
if (d == '\n') {
char t[str.length()+1];
str.toCharArray(t, (sizeof(t)));
int intdata = atoi(t);
Serial.print(intdata);
str = String();
}
else {
str.concat(d);
}
}
}
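On the Python side, the whole coordinate list can be turned into newline-terminated lines for the loop above. A sketch, assuming X and Y are sent as separate lines so that each '\n' completes exactly one atoi() call, and that every coordinate fits in an Arduino int (-32768 to 32767 on 16-bit AVR boards such as the Uno); encode_path is a hypothetical helper.

```python
INT16_MIN, INT16_MAX = -32768, 32767   # Arduino `int` on 16-bit AVR boards


def encode_path(pathlist):
    """Turn [[x, y], ...] into b"x\\ny\\n..." with one integer per line."""
    out = []
    for x, y in pathlist:
        for v in (int(x), int(y)):
            if not INT16_MIN <= v <= INT16_MAX:
                raise ValueError(f"{v} does not fit in a 16-bit Arduino int")
            out.append(f"{v}\n")
    return "".join(out).encode("ascii")
```

For 26,000 pairs, keep in mind that the Arduino's serial receive buffer is small (typically 64 bytes on AVR boards) and bytes are lost if the sketch does not read them fast enough, so you may need some form of flow control, for example having the Arduino acknowledge each line before s.write() sends the next.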