Extract data from hex CAN payload - python

Essentially, I have a .blf file containing a bunch of CAN frames in hex.
In CAN, one frame has several message fields.
I'd like to grab the Data field.
For example: 1a01 2122 25f4 a187 ea80 2891 a223 4542
is a CAN frame. Somewhere in that frame is the Data field, which I can convert to decimal.
How do I go about recognizing which hex codes contain the message?
Thanks in advance

The BLF format contains not only the data of the CAN frame, but also other information such as the ID of the application that created the BLF file, timestamps, the arbitration ID, etc.
Additionally, the data may be compressed.
As you have python in the tags, I'd suggest you take a look at python-can.
This library has support for reading and writing BLF files.
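As a rough illustration of that suggestion, the sketch below reads a BLF log with python-can's BLFReader and yields each frame's timestamp, arbitration ID and data bytes. The path example.blf and the helper names are made up for this example, and treating the whole payload as one big-endian integer is only an assumption; real signals usually occupy specific bits of the payload.

```python
def payload_to_int(data, byteorder="big"):
    """Interpret raw payload bytes as one unsigned integer.

    This is an assumption for illustration; real CAN signals are usually
    packed into specific bits of the payload.
    """
    return int.from_bytes(data, byteorder)


def iter_payloads(path):
    """Yield (timestamp, arbitration_id, data) for every frame in a BLF file."""
    import can  # third-party package: pip install python-can

    with can.BLFReader(path) as reader:
        for msg in reader:
            yield msg.timestamp, msg.arbitration_id, bytes(msg.data)


# Usage with a placeholder path (not a file from the question):
# for timestamp, arb_id, data in iter_payloads("example.blf"):
#     print(hex(arb_id), data.hex(), payload_to_int(data))

# Converting two payload bytes from the question's example to decimal:
print(payload_to_int(bytes.fromhex("1a01")))  # prints 6657
```

The reader takes care of the container details (headers, timestamps, compression), so the code only ever sees the Data field as bytes.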

Related

Extracting data from pcap files

I am trying to extract NetFlow records from a .pcap file, but the data comes out in a non-readable format, as in the attached picture below.
I am unsure how to convert this into a readable format.
I essentially want to get the payload information from the packet capture.
I have tried using Python's scapy library, but I still cannot convert it to human-readable text.

how can I extract Chinese text from PDF using simple ‘with open’?

I need to extract PDF text using Python, but pdfminer and others are too big to use. When I used a simple "with open xxx as xxx" approach, I ran into a problem: the content was not extracted properly. The text looks like bytes because it starts with b'. My code and the result screenshot:
with open(r"C:\Users\admin\Desktop\aaa.pdf","rb") as file:
    aa=file.readlines()
for a in aa:
    print(a)
Output Screenshot:
To generate an answer from the comments...
when using simple "with open xxx as xxx" method, I met a problem , the content part didn't extract appropriately
The reason is that PDF is not a plain text format but instead a binary format whose contents may be compressed and/or encrypted. For example the object you posted a screenshot of,
4 0 obj
<</Filter/FlateDecode/Length 210>>
stream
...
endstream
endobj
contains FLATE compressed data between stream and endstream (which is indicated by the Filter value FlateDecode).
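To illustrate just this one point, here is a minimal sketch that builds a tiny object of the same shape and inflates its stream with zlib. The helper name inflate_streams, the regular expression and the sample content are assumptions made for the example; a real PDF parser has to honour /Length, nested dictionaries, other filters and encryption.

```python
import re
import zlib

# Naive pattern: a dictionary, the "stream" keyword, the body, then "endstream".
STREAM_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield the decompressed body of every FlateDecode stream found."""
    for match in STREAM_RE.finditer(pdf_bytes):
        dictionary, body = match.groups()
        if b"/FlateDecode" in dictionary:
            yield zlib.decompress(body)


# Build a made-up object shaped like the one in the question.
content = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = zlib.compress(content)
pdf_object = (
    b"4 0 obj\n<</Filter/FlateDecode/Length %d>>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

for stream in inflate_streams(pdf_object):
    print(stream)
```

Even after this step the result is a sequence of drawing instructions, not plain text.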
But even if it were not compressed or encrypted, you might still not recognize any of the displayed text, because each PDF font object can use its own, completely custom encoding. Furthermore, glyphs you see grouped in a text line do not need to be drawn by the same drawing instruction in the PDF; you may have to arrange all the strings in drawing instructions by coordinate to find the text of a text line.
(For some more details and backgrounds read this answer which focuses on the related topic of replacement of text in a PDF.)
Thus, when you say
pdfminer and others are too big to use
please consider that they are so big for a reason: you need that much code for adequate text extraction. This is particularly true for Chinese text; for simple PDFs with English text there are some shortcuts that work in benign circumstances, but for PDFs with CJK text you should not expect such shortcuts.
If you want to try nonetheless and implement text extraction yourself, grab a copy of ISO 32000-1 or ISO 32000-2 (Google for pdf32000 for a free copy of the former) and study that PDF specification. Based on that information you can learn, step by step, to parse those binary strings into PDF objects, find the content streams therein, parse the instructions in those content streams, retrieve the text pieces drawn by those instructions, and arrange those pieces correctly into a whole text.
Don't expect your solution to be much smaller than pdfminer etc...

adding a cover page to a csv/excel file in python

I am trying to add a cover page to a CSV file in Python that would display general information such as the date and name. My program currently exports MySQL data to a CSV file in Python. When I open the CSV, it opens as an Excel file. I am trying to add a cover page to this Excel file, which is in CSV format. Could you give me some ideas as to how I could go about doing this?
You can't add a cover page to a CSV file.
CSV is short for "Comma-separated values". It is defined to just be values separated by commas and nothing else. Wikipedia states that:
RFC 4180 proposes a specification for the CSV format, and this is the definition commonly used. However, in popular usage "CSV" is not a single, well-defined format. As a result, in practice the term "CSV" might refer to any file that:
is plain text using a character set such as ASCII, various Unicode character sets (e.g. UTF-8), EBCDIC, or Shift JIS,
consists of records (typically one record per line),
with the records divided into fields separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),
where every record has the same sequence of fields.
This assertion is important for any application which wants to read the file. How would an application deal with weird unexpected data in some proprietary format?
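To make that concrete, here is a small sketch using Python's csv module. The cover lines and column names are invented for the example; it only shows that a reader sees cover text as oddly shaped records rather than as a cover page.

```python
import csv
import io

# Made-up file content: two "cover" lines followed by ordinary CSV rows.
text = (
    "Report for 2020-01-01\n"
    "Prepared by: Alice\n"
    "id,name,amount\n"
    "1,widget,9.99\n"
    "2,gadget,4.50\n"
)

rows = list(csv.reader(io.StringIO(text)))
widths = [len(row) for row in rows]

# The cover lines parse as one-field records, breaking the rule that
# every record has the same sequence of fields.
print(widths)  # prints [1, 1, 3, 3, 3]
```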
You can, however, invent your own proprietary format, which only you know how to read. This could include data for a cover (as an image, PDF, LaTeX or something else) and the data for your CSV. But this would be quite an undertaking, and there are a million ways to approach this problem. How to implement such a thing is beyond the scope of Stack Overflow. Try breaking down your question.

How do I tell python what my data structure (that is in binary) looks like so I can plot it?

I have a data set that looks like this.
b'\xa3\x95\x80\x80YFMT\x00BBnNZ\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Type,Length,Name,Format,Columns\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa3\x95\x80\x81\x17PARMNf\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Name,Value\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa3\x95\x80\x82-GPS\x00BIHBcLLeeEefI\x00\x00\x00Status,TimeMS,Week,NSats,HDop,Lat,Lng,RelAlt,Alt,Spd,GCrs,VZ,T\x00\x00\xa3\x95\x80\x83\x1fIMU\x00Iffffff\x00\x00\x00\x00\x00\x00\x00\x00\x00TimeMS,GyrX,GyrY,G
I have been reading around to try to find out how to write Python code that will let me parse this data so that I can plot some of the columns against each other (mostly against time).
Some things I found that may help in doing this:
There is code that will let me convert this data into a CSV file. I know how to use it to convert to CSV and plot from there, but as a learning experience I want to be able to do this without converting to CSV. I tried reading that code, but I am clueless since I am very new to Python. Here is the link to the code:
https://github.com/PX4/Firmware/blob/master/Tools/sdlog2/sdlog2_dump.py
Also, someone posted this, saying it might be the log format, but again I couldn't understand or run any code on that page.
http://dev.px4.io/advanced-ulog-file-format.html
A good starting point for parsing binary data is the struct module https://docs.python.org/3/library/struct.html and its unpack function. That's what the CSV dump routine you linked to is doing as well. If you walk through the process method, it's doing the following:
Read a chunk of binary data
Figure out if it has a valid header
Check the message type - if it's a FORMAT message parse that. If it's a description message, parse that.
Dump out a CSV row
You could modify this code to essentially replace the __printCSVRow method with something that captures the data into a pandas DataFrame (or other handy data structure) so that when the main routine is all done you can grab all the data from the DataFrame and plot it.
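As a rough sketch of the first three steps, the snippet below uses struct to decode a single FMT message. The layout (a 0xA3 0x95 header, a message ID of 0x80, then type, length, a 4-byte name, a 16-byte format string and a 64-byte column list) is an assumption read off the sample bytes in the question, not a verified description of the log format, and parse_fmt is a hypothetical helper name.

```python
import struct

HEADER = b"\xa3\x95"   # assumed two-byte message header
FMT_MSG_ID = 0x80      # assumed ID of a FORMAT message
# Assumed FMT body: type, length, name[4], format[16], columns[64]
FMT_BODY = struct.Struct("<BB4s16s64s")


def _text(raw):
    """Strip NUL padding from a fixed-width field and decode it."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def parse_fmt(buf, offset=0):
    """Decode one FMT message starting at offset, or raise ValueError."""
    if buf[offset:offset + 2] != HEADER or buf[offset + 2] != FMT_MSG_ID:
        raise ValueError("not a FMT message")
    msg_type, length, name, fmt, columns = FMT_BODY.unpack_from(buf, offset + 3)
    return {
        "type": msg_type,
        "length": length,
        "name": _text(name),
        "format": _text(fmt),
        "columns": _text(columns).split(","),
    }


# First message from the question's data, rebuilt with explicit padding.
sample = (
    b"\xa3\x95\x80" + bytes([0x80, 89]) + b"FMT\x00"
    + b"BBnNZ".ljust(16, b"\x00")
    + b"Type,Length,Name,Format,Columns".ljust(64, b"\x00")
)
print(parse_fmt(sample))
```

Each decoded FMT record describes how to unpack one other message type, so collecting them in a dict keyed by type would be a natural next step before filling a DataFrame.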

Converting raw binary data into an image file?

I'm trying to read a field from an Active Directory entry which contains raw jpeg binary data. I'd like to read that data and convert it to an image file for use in my django-based application. I cannot for the life of me figure out how to handle this data in a nice way. Any ideas?
Edit:
To anyone who might come across this in the future: there's a function in Python's os module:
os.tmpfile()
it creates a file and destroys it once the file descriptor is closed. Very useful for this situation.
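Note that os.tmpfile() exists only in Python 2. In Python 3, tempfile.TemporaryFile offers similar behaviour. A minimal sketch, assuming the JPEG bytes have already been read from the directory entry (the bytes and the helper name below are made up):

```python
import tempfile


def roundtrip_through_tempfile(raw_bytes):
    """Write bytes to an anonymous temporary file and read them back.

    The file is removed automatically when the with-block closes it.
    """
    with tempfile.TemporaryFile() as tmp:
        tmp.write(raw_bytes)
        tmp.seek(0)  # rewind so a consumer can read from the start
        return tmp.read()


# Placeholder bytes standing in for real JPEG data.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"
print(roundtrip_through_tempfile(fake_jpeg) == fake_jpeg)  # prints True
```

In practice you would hand the open file object to whatever expects a file instead of reading it back.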
Here is somebody who was having the same problem -- check out the latest post at the bottom.
http://groups.google.com/group/django-users/browse_thread/thread/4214db6699863ded/5d816b02daca3186
Looks like passing raw data to SimpleUploadedFile is what you are looking for.
request._raw_post_data
The raw HTTP POST data as a byte string. This is useful for processing data in different formats than of conventional HTML forms: binary images, XML payload etc.
http://docs.djangoproject.com/en/dev/ref/request-response/#httprequest-objects
I know this isn't part of the question, but this looks pretty awesome! "HttpRequest.read() file-like interface"
http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.read
