How to read in binary data after ascii header in Python

I have some imaging data that's stored in a file that contains an ascii text header, ending with a null character, followed by the binary data. The ascii headers vary in length, and I'm wondering what's the best way to open the file, read the header and find the null character, and then load the binary data (in Python).
Thanks for the help,
James

Probably ought to start with something like this.
with open('some file', 'rb') as infile:
    a_byte = infile.read(1)
    while a_byte and ord(a_byte) != 0:
        a_byte = infile.read(1)
    # At this point, what's left is the binary data.
    binary_data = infile.read()
The Python version matters a lot here, because it determines what read returns. In Python 2, read returns a str. In Python 3, a file opened in 'rb' mode returns bytes, and indexing a bytes object yields integers. ord(a_byte) works on the one-byte result in both cases.

Does something like this work?
with open('some_file', 'rb') as f:
    # Split on a bytes separator: in Python 3, f.read() returns bytes here.
    binary_data = f.read().split(b'\0', 1)[1]
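If the file is large, or the header text is wanted as well as the payload, a chunked search for the terminator may be a better fit. The following is a minimal Python 3 sketch, not a definitive implementation. The function name read_header_and_payload, the chunk size, and the sample bytes are all made up for illustration, and the header is assumed to be pure ASCII.

```python
import io


def read_header_and_payload(stream, chunk_size=4096):
    """Return (header_text, payload_bytes) from a binary stream.

    The header is everything before the first null byte; the payload is
    everything after it.
    """
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("no null terminator found in the stream")
        pos = chunk.find(b"\0")
        if pos != -1:
            chunks.append(chunk[:pos])
            rest = chunk[pos + 1:]  # payload bytes already read with the header
            break
        chunks.append(chunk)
    header = b"".join(chunks).decode("ascii")
    payload = rest + stream.read()
    return header, payload


# An in-memory stand-in for open('some_file', 'rb').
sample = io.BytesIO(b"width=2 height=1\0\x01\x02\xff")
header, payload = read_header_and_payload(sample)
print(header)
print(payload)
```

The same function should work on a real file object opened in 'rb' mode, since it only relies on read.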

Other people have already answered your direct question, but I thought I'd add this.
When working with binary data, I often find it useful to subclass file and add various convenience methods for reading/writing packed binary data.
It's overkill for simple things, but if you find yourself parsing lots of binary file formats, it's worth the extra effort to avoid repeating yourself.
If nothing else, hopefully it serves as a useful example of how to use struct. On a side note, this is pulled from older code and is very much Python 2.x. Python 3.x handles this (particularly strings vs. bytes) significantly differently.
import struct
import array

class BinaryFile(file):
    """
    Automatically packs or unpacks binary data according to a format
    when reading or writing.
    """
    def __init__(self, *args, **kwargs):
        """Initialization is the same as a normal file object."""
        # Do not pass self explicitly; super() already binds it.
        super(BinaryFile, self).__init__(*args, **kwargs)

    def read_binary(self, fmt):
        """
        Read and unpack a binary value from the file based
        on string fmt (see the struct module for details).
        This will strip any trailing null characters if a string format is
        specified.
        """
        size = struct.calcsize(fmt)
        data = self.read(size)
        # Reading beyond the end of the file just returns ''
        if len(data) != size:
            raise EOFError('End of file reached')
        data = struct.unpack(fmt, data)
        # Strip trailing zeros in strings. Build a new tuple, because
        # rebinding a loop variable would leave data unchanged.
        data = tuple(item.rstrip('\x00') if isinstance(item, str) else item
                     for item in data)
        # Unpack the tuple if it only has one value
        if len(data) == 1:
            data = data[0]
        return data

    def write_binary(self, fmt, dat):
        """Pack and write data to the file according to string fmt."""
        # Try expanding input arguments (struct.pack won't take a tuple)
        try:
            dat = struct.pack(fmt, *dat)
        except (TypeError, struct.error):
            # If it's not a sequence (TypeError), or if it's a
            # string (struct.error), don't expand.
            dat = struct.pack(fmt, dat)
        self.write(dat)

    def read_header(self, header):
        """
        Reads a defined structure "header" consisting of a sequence of (name,
        format) strings from the file. Returns a dict with keys of the given
        names and values unpacked according to the given format for each item
        in "header".
        """
        header_values = {}
        for key, fmt in header:
            header_values[key] = self.read_binary(fmt)
        return header_values

    def read_nullstring(self):
        """
        Reads a null-terminated string from the file. This is not implemented
        in an efficient manner for long strings!
        """
        output_string = ''
        char = self.read(1)
        while char != '\x00':
            output_string += char
            char = self.read(1)
            # An empty read means end of file
            if len(char) == 0:
                break
        return output_string

    def read_array(self, type, number):
        """
        Read data from the file and return an array.array of the given
        "type" with "number" elements
        """
        size = struct.calcsize(type)
        data = self.read(size * number)
        return array.array(type, data)
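For Python 3, where file is no longer a builtin and binary reads return bytes, the same ideas can be sketched as plain functions over any binary stream. This is a rough sketch under those assumptions, not a port of the class above. The names read_struct and read_nullstring and the sample layout ("<HI4s") are invented for illustration.

```python
import io
import struct


def read_struct(stream, fmt):
    """Read struct.calcsize(fmt) bytes from stream and unpack them."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("End of file reached")
    values = struct.unpack(fmt, data)
    # Strip trailing null padding from bytes fields ("s" formats).
    values = tuple(v.rstrip(b"\x00") if isinstance(v, bytes) else v
                   for v in values)
    return values[0] if len(values) == 1 else values


def read_nullstring(stream):
    """Read bytes up to (not including) the next null byte or end of file."""
    out = bytearray()
    while True:
        char = stream.read(1)
        if not char or char == b"\x00":
            break
        out += char
    return bytes(out)


# A made-up layout: a null-terminated name, then a little-endian
# unsigned short, unsigned int and a 4-byte padded string.
stream = io.BytesIO(b"hdr\x00" + struct.pack("<HI4s", 7, 1000, b"ab"))
name = read_nullstring(stream)
fields = read_struct(stream, "<HI4s")
print(name, fields)
```

Using an explicit byte-order prefix such as "<" also turns off native alignment padding, which usually matches how file formats are specified.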

Related

How to decode an Opaque data which has obtained by pysnmp?

I'm going to read data from an SNMP device by its OID via pysnmp library. However, I'm dealing with an error from Opaque type:
from pysnmp import hlapi


def construct_object_types(list_of_oids):
    object_types = []
    for oid in list_of_oids:
        object_types.append(hlapi.ObjectType(hlapi.ObjectIdentity(oid)))
    return object_types


def get(target, oids, credentials, port=161, engine=hlapi.SnmpEngine(),
        context=hlapi.ContextData()):
    handler = hlapi.getCmd(
        engine,
        credentials,
        hlapi.UdpTransportTarget((target, port)),
        context,
        *construct_object_types(oids)
    )
    return fetch(handler, 1)[0]


def cast(value):
    try:
        return int(value)
    except (ValueError, TypeError):
        try:
            return float(value)
        except (ValueError, TypeError):
            try:
                return str(value)
            except (ValueError, TypeError) as exc:
                print(exc)
    return value


def fetch(handler, count):
    result = []
    for i in range(count):
        (error_indication, error_status,
         error_index, var_binds) = next(handler)
        if not error_indication and not error_status:
            items = {}
            print(var_binds)
            for var_bind in var_binds:
                items[str(var_bind[0])] = cast(var_bind[1])
            result.append(items)
        else:
            raise RuntimeError(f'SNMP error: {error_indication}')
    return result


print(get("192.168.100.112", [".1.3.6.1.4.1.9839.1.2.532.0",
                              '.1.3.6.1.4.1.9839.1.2.513.0'],
          hlapi.CommunityData('public')))
Out:
[ObjectType(ObjectIdentity(<ObjectName value object, tagSet <TagSet object, tags 0:0:6>, payload [1.3.6.1.4.1.9839.1.2.532.0]>), <Opaque value object, tagSet <TagSet object, tags 64:0:4>, encoding iso-8859-1, payload [0x9f780441ccb646]>), ObjectType(ObjectIdentity(<ObjectName value object, tagSet <TagSet object, tags 0:0:6>, payload [1.3.6.1.4.1.9839.1.2.513.0]>), <Integer value object, tagSet <TagSet object, tags 0:0:2>, subtypeSpec <ConstraintsIntersection object, consts <ValueRangeConstraint object, consts -2147483648, 2147483647>>, payload [10]>)]
{'1.3.6.1.4.1.9839.1.2.532.0': '\x9fx\x04AÌ¶F', '1.3.6.1.4.1.9839.1.2.513.0': 10}
The first OID (.1.3.6.1.4.1.9839.1.2.532.0) returns an Opaque value (\x9fx\x04AÌ¶F), and I don't know how to convert it to a float value. I should add that it represents a temperature of 25.5°C.
In other words, how are the following three values related to each other?
25.5
encoding iso-8859-1, payload [0x9f780441ccb646]
'\x9fx\x04AÌ¶F'

Your three values are the same data in different forms:
The value 0x9f780441ccb646 can be split into two floats, one of which is 25.589001; the other part is something else.
The middle one is the representation of the pysnmp object (__repr__) without a known interpretation (the MIB is probably missing).
The string '\x9fx\x04AÌ¶F' is the same value converted to a byte representation (with iso-8859-1 encoding).
So the data is there; it just needs to be extracted from the SNMP packet. The proper way would be to give the corresponding MIB entry to pysnmp.
Alternatively (answering your second question), the bytes can be decoded manually with Python's struct module.
import struct
data = 0x9f780441ccb646  # this is what you got from pysnmp
# "q" is 8 bytes on every platform; "l" can be only 4 bytes, which is too small
thebytes = struct.pack("q", data)
print(thebytes.decode('latin1'))
print(thebytes)
print(struct.unpack("ff", thebytes))
gives
F¶ÌAx
b'F\xb6\xccA\x04x\x9f\x00'
(25.589000701904297, 1.4644897383138518e-38)
Instead of unpacking to two floats, the MIB will tell you how the other data should be interpreted, so instead of unpack("ff",… you might want something else. Check out the available format specifiers; for example, "fhh" would give (25.589000701904297, 30724, 159).
EDIT:
TL;DR:
data = '\0\x9fx\x04AÌ¶F'
print("temperature: %f°C" % struct.unpack('>ff', data.encode('latin1'))[1])
temperature: 25.589001°C
To elaborate on the string representation: the bytes you see, 'AÌ¶F', are in the reverse order of the ones in my print statement, 'F¶ÌA', because of the different endianness. The byte order is already corrected in the int-converted data 0x9f780441ccb646 that appears in your output and that I used in the conversion example. If you want to start from the encoded string, you first need to convert it back to the correct memory representation:
data = '\0\x9fx\x04AÌ¶F' # (initial '\0' is for filling the 8-bytes in correct alignment)
thebytes = data.encode('latin1')
But that's only half of the trick, because the endianness is still wrong. Fortunately, struct has flags to correct for that. You can unpack in both byte orders and choose the right one:
print("unpacked little-endian: ", struct.unpack("<ff", thebytes))
print("unpacked big-endian: ", struct.unpack(">ff", thebytes))
unknown, temperature = struct.unpack(">ff", thebytes)
print("temperature: %f°C" % temperature)
giving
unpacked little-endian: (2.9225269119838333e-36, 23398.126953125)
unpacked big-endian: (1.4644897383138518e-38, 25.589000701904297)
temperature: 25.589001°C
The correct endianness of the Opaque packet is either part of the SNMP standard (then network byte order, '!', is probably the correct one), or it should be given in the MIB together with the correct field types, which need to be given as format specifiers. If your packets are always 7 bytes long, you might try a combination that adds up to 7 bytes instead of 8 (ff = 4+4); then you can also omit the \0 padding byte.

Based on TerhorstD's answer, with some changes, and knowing that the Opaque frame consists of 7 bytes of which the first 3 are constant (\x9fx\x04, i.e. 159, 120, 4 in decimal), I wrote the following code snippet to deal with the problem:
...
handler = get("192.168.100.112", [".1.3.6.1.4.1.9839.1.2.532.0",
                                  '.1.3.6.1.4.1.9839.1.2.513.0'],
              hlapi.CommunityData('public'))
for key, value in handler.items():
    try:
        if (len(value) == 7 and value[0].encode('latin1')[0] == 159
                and value[1].encode('latin1')[0] == 120
                and value[2].encode('latin1')[0] == 4):
            data = value[3:]
            print(struct.unpack('>f', data.encode('latin1'))[0])
        else:
            print(value)
    except (AttributeError, TypeError):
        # Non-string values (e.g. int) have no len() or encode()
        print(value)
Out:
25.589001
10
[NOTE]:
The float inside the Opaque value is big-endian here (> in struct).
[UPDATE]:
A neater version:
for key, value in handler.items():
    try:
        unpacked = struct.unpack('>BBBf', value.encode('latin1'))
        # Check whether the data is an Opaque float or not.
        if unpacked[:3] == (159, 120, 4):
            print(unpacked[-1])
        else:
            print(value)
    except (AttributeError, struct.error):
        # Not a string (no encode), or not exactly 7 bytes long
        print(value)
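The check above can also be done on raw bytes, which avoids the round trip through a latin1 string. Below is a minimal sketch, assuming the 7-byte layout described in this answer (a constant 3-byte prefix followed by a big-endian 4-byte float). The helper name decode_opaque_float is hypothetical and is not part of pysnmp.

```python
import struct

# The constant prefix observed in the answer above: 159, 120, 4.
OPAQUE_FLOAT_PREFIX = b"\x9f\x78\x04"


def decode_opaque_float(raw):
    """Return the float carried in a 7-byte Opaque payload, else None."""
    if len(raw) == 7 and raw[:3] == OPAQUE_FLOAT_PREFIX:
        return struct.unpack(">f", raw[3:])[0]
    return None


# The payload from the question, written as hex.
payload = bytes.fromhex("9f780441ccb646")
temperature = decode_opaque_float(payload)
print("temperature: %f°C" % temperature)
```

With pysnmp, the raw bytes of a value can usually be obtained with bytes(value) rather than str(value), but that depends on the object type, so treat it as something to verify against your pysnmp version.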

Storing and retrieving a single byte i2c address with strings

I am trying to store an i2c address, 0x3c, to a string to be stored in a text file that is later read. When reading the text file however, I cannot read the data from the string in the correct way, such that
value = string_read(text_file)
print(value == 0x3c)
would return true. How can I read a single byte stored in a string:
'0x3c'
into value so that the above code would return true?

See: https://stackoverflow.com/a/209550/9606335. Specifically, in your example, if you know your string is only "0x3c", then you can convert it to a numerical value by value = int("0x3c", 0). Now your expression should behave as you expect:
>>> print(int("0x3c", 0) == 0x3c)
True
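A round trip through text could look like the following sketch. The helper names address_to_text and text_to_address are made up for illustration; only hex() and int(text, 0) come from Python itself.

```python
def address_to_text(address):
    """Format an integer address as a hex string such as '0x3c'."""
    return hex(address)


def text_to_address(text):
    """Parse '0x3c', '60', '0b111100', ... back into an integer.

    Base 0 tells int() to infer the base from the prefix.
    """
    return int(text.strip(), 0)


stored = address_to_text(0x3c)          # what would be written to the file
value = text_to_address(stored + "\n")  # what would be read back, newline included
print(value == 0x3c)
```

Stripping whitespace before parsing keeps a trailing newline from the text file out of the way.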

Convert JSON to .ics (Python)

I am trying to convert a JSON file to an iCalendar file. My supervisor suggested using two functions convertTo(data) (which converts a JSON to a String) and convertFrom(data) (which converts a String to a JSON; I am not sure of the purpose of this function).
My current approach uses a lot of refactoring and multiple functions.
# returns a String
def __convert(data):
    convStr = __convertTo(data)
    convStr = __fields(convStr)
    return convStr

# convert JSON to a String
def __convertTo(data):
    str = "" + data
    return str

# takes string arg (prev converted from JSON) to split it into useful info
def __fields(data):
    #########
    iCalStr = __iCalTemplate(title, dtStart, UID, remType, email)
    return iCalStr

#
def __iCalTemplate(title, dtStart, UID, remType, email):
    icsTempStr = "BEGIN:VEVENT\nDTSTART:" + dtStart + "\nUID:" + UID + "\nDESCRIPTION:" + desc + "\nSUMMARY:" + title
    if remType is not None:
        icsTempStr += "\nBEGIN:VALARM\nACTION:" + remType + "DESCRIPTION:This is an event reminder"
        if remType is email:
            icsTempStr += "\nSUMMARY:Alarm notification\nATTENDEE:mailto:" + email
        icsTempStr += "\nEND:VALARM"
    return icsTempStr
Any hints or suggestions would be very helpful. I am fully aware that this code needs a LOT of work.

This isn't intended to be a complete answer, but as a longer tip.
There's a Python idiom that will be very helpful to you in building strings, especially potentially large ones. It's probably easier to see an example than explain:
>>> template = 'a value: {a}; b value: {b}'
>>> data = {'a': 'Spam', 'b': 'Eggs'}
>>> template.format(**data)
'a value: Spam; b value: Eggs'
This idiom has a number of advantages over string concatenation and could eliminate the need for a function altogether if you write the template correctly. Optional inserts could, for example, be given values of ''. Once you format your iCal template correctly, it's just a matter of retrieving the right data points from JSON... and if you name your template insert points the same as what you have in JSON, you might even be able to do that conversion in one step. With a bit of planning, your final answer could be something as simple as:
import json
template = 'full iCal template with {insert_point} spec goes here'
data = json.JSONDecoder().decode(your_json_data)
ical = template.format(**data)
To do a quick (and slightly different) interpreter example:
>>> import json
>>> decoder = json.JSONDecoder()
>>> json_example = '{"item_one" : "Spam", "item_two" : "Eggs"}'
>>> template = 'Item 1: {item_one}\nItem 2: {item_two}'
>>> print(template.format(**decoder.decode(json_example)))
Item 1: Spam
Item 2: Eggs
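Applied to an iCalendar event, the template idea might look like the sketch below. The JSON keys (uid, start, title) and the PRODID value are hypothetical, and the template is deliberately incomplete: a calendar that follows RFC 5545 needs more properties (DTSTAMP, for example) and escaping of special characters in text values.

```python
import json

# iCalendar content lines are separated by CRLF.
ICAL_TEMPLATE = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//example//json-to-ics//EN",
    "BEGIN:VEVENT",
    "UID:{uid}",
    "DTSTART:{start}",
    "SUMMARY:{title}",
    "END:VEVENT",
    "END:VCALENDAR",
    "",
])


def json_to_ical(json_text):
    """Fill the template from a JSON object whose keys match the placeholders."""
    data = json.loads(json_text)
    return ICAL_TEMPLATE.format(**data)


example = ('{"uid": "1234@example.com", '
           '"start": "20240101T090000Z", "title": "Kickoff"}')
ical = json_to_ical(example)
print(ical)
```

Extra keys in the JSON object are ignored by format, while a missing key raises KeyError, which is a reasonable early warning for incomplete input.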

I ended up using a completely different, more efficient approach to accomplish this. In summary, my method traverses through a JSON, extracting each value from each field and manually places it in the appropriate place in an iCalendar template. It returns a string. Something like this...
def convert(self, json):
    template = 'BEGIN:VEVENT\n'
    template += 'DTSTART:%s\n' % json['event-start']
    ...
    return template

How to use struct.pack when the data and the size to pack is undefined in advance

I need to dynamically generate a binary file from CSV file.
Example:
CSV file:
#size, #data
1 , 0xAB
2 , 1234 (0x04D2)
5 , "ascii" (0x6173636969)
1 , "\x23" (0x23)
Expected binary file:
'\xAB\x04\xD2\x61\x73\x63\x69\x69\x23'
The data can be a string, an unsigned integer, or a hexadecimal value.
In my program I proceed as follows:
I read the size/data pairs from the CSV file
I use the eval function to get the data value
I use the struct.pack function to generate the output data
The problem is how to use struct.pack to process either a string or a numeric value.
I tried this:
checking isinstance(value, basestring) to handle strings
but I don't know how to handle an unsigned value defined in hexadecimal, nor how to specify the format type for a special size (e.g. 5 bytes)
I am thinking about putting any value into a hexadecimal string ...
What is the simplest way to handle this (string/unsigned value to a binary output of defined size)?

If you encounter a string, you just need to use encode to get a byte string from it. If you encounter a value, just try to convert it to an int in base 10 or 16 and then use struct.pack:
import struct

formats = {
    1: "B",
    2: "H",
    4: "I",
    8: "Q"
}

def handle_value(size, value):
    try:
        value = int(value)
    except ValueError:
        try:
            value = int(value, 16)
        except ValueError:
            pass  # not a number: treat it as a quoted string below
    if isinstance(value, str):
        # Take size characters after the opening quote (slice, not tuple index)
        start = value.find('"') + 1
        value = value[start:start + size]
        value = value.encode("ascii")  # or whatever encoding you want
    else:
        value = struct.pack(">" + formats[size], value)
    return value
Then to read the whole file:
output = bytes()
for line in csv_file:  # csv_file is the already opened CSV file
    if not line.strip() or line.startswith("#"):
        continue  # skip blank lines and the "#size, #data" header
    size, value = line.split(",", 1)
    size = int(size.strip())
    value = value.strip()
    output += handle_value(size, value)
Edit: I didn't notice you get the size from the CSV file, so you can infer the format you want from this size if the value is an int.
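The formats table above only covers sizes 1, 2, 4 and 8. For odd sizes such as 5 bytes, Python 3's int.to_bytes is one way around struct's fixed widths. The sketch below swaps in that technique; it assumes the values have already been parsed into int or str objects, and the name pack_field is made up for illustration.

```python
def pack_field(size, value):
    """Return value as exactly size bytes (big-endian for integers)."""
    if isinstance(value, str):
        data = value.encode("ascii")
        if len(data) != size:
            raise ValueError("string %r is not %d bytes long" % (value, size))
        return data
    # int.to_bytes accepts any width and raises OverflowError if the
    # value does not fit.
    return value.to_bytes(size, byteorder="big")


# The rows from the question, already parsed into Python values.
fields = [(1, 0xAB), (2, 1234), (5, "ascii"), (1, "\x23")]
output = b"".join(pack_field(size, value) for size, value in fields)
print(output)
```

For this input the result matches the expected binary file from the question, '\xAB\x04\xD2\x61\x73\x63\x69\x69\x23'.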

convert number to string

I have a string s='ABCDEFJHI', and I slice it like this: ['ABC','DEF','JHI'].
I have a function encode (some calculation) which converts each sliced string into a number.
For example, encode('ABC') gives 50, encode('DEF') gives 33 and encode('JHI') gives 10, so
['ABC','DEF','JHI'] gives [50,33,10].
I want to do the reverse, so that decode(50) gives 'ABC'.
My idea is that when I encode a sub-string, I create a lookup table and store the sub-string with its number, like ('ABC': 50) (and the same for all the sub-strings); later, in decode, I just look up the sub-string by its number.
How can I do this in Python?

If it's reversible, I suggest storing it in the reverse format (50: 'ABC'). Also, consider the situation where the given code has never been encoded before.
encode_history = {}

def encode(text):
    """some calculations which lead to the code"""
    # ... your calculations, which produce code ...
    encode_history[code] = text
    return code

def decode(code):
    """function to convert a code to string"""
    if code in encode_history:
        return encode_history[code]
    else:
        return None

In your encode function:
def encode(the_string):
    # do whatever encoding you're doing
    return (the_number, the_string)
and wherever you're using it, do:
d = dict()
for value in ["ABC", "DEF", "JHI"]:
    encoded, decoded = encode(value)
    d[encoded] = decoded
Define a function also like:
def decode(lookup_table, value):
    return lookup_table[value]
and use it like:
encoded_values = list()
for value in ["ABC", "DEF", "JHI"]:
    encoded, decoded = encode(value)
    d[encoded] = decoded
    encoded_values.append(encoded)
for value in encoded_values:
    print("{} | {}".format(value, decode(d, value)))
[OUT]
50 | ABC
33 | DEF
10 | JHI
That said -- why are you doing this, how are you doing this, and why aren't you using some sort of real encryption for it? If it's not two-way encryption, you should almost certainly NOT be storing the data anywhere, and if it IS two-way encryption, why not just decrypt it using the opposite algorithm you used to encrypt? Just keep that in mind.....
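Both answers leave the actual calculation open, so here is a runnable sketch of the lookup-table idea with a stand-in encoder. The class name LookupCodec and the function toy_encode are invented for illustration; toy_encode is not the asker's calculation and will not reproduce the numbers 50, 33 and 10.

```python
class LookupCodec:
    """Remember every encoded string so that its code can be reversed."""

    def __init__(self, encode_func):
        self._encode_func = encode_func
        self._history = {}  # code -> original string

    def encode(self, text):
        code = self._encode_func(text)
        previous = self._history.setdefault(code, text)
        if previous != text:
            # Two different strings produced the same code, so the
            # mapping can no longer be reversed unambiguously.
            raise ValueError("code %r already maps to %r" % (code, previous))
        return code

    def decode(self, code):
        # None signals a code that was never produced by encode().
        return self._history.get(code)


def toy_encode(text):
    """Stand-in calculation: sum of character codes modulo 101."""
    return sum(ord(ch) for ch in text) % 101


codec = LookupCodec(toy_encode)
codes = [codec.encode(part) for part in ["ABC", "DEF", "JHI"]]
decoded = [codec.decode(code) for code in codes]
print(codes, decoded)
```

The collision check matters because a lookup table only makes decode well defined when no two strings share a code.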
