I've created a buffer of words represented in little endian (assuming each word is 2 bytes):
A000B000FF0A
I've separated the buffer into 3 words (2 bytes each):
A000
B000
FF0A
and after that converted to big endian representation:
00A0
00B0
0AFF
Is there a way to convert the whole buffer to big endian at once, instead of splitting it into words?
Code:
def endian(num):
    # Format the 16-bit value as 4 hex digits, reverse its bytes, and parse it back
    hex_str = '{:04X}'.format(num)
    swapped = bytearray.fromhex(hex_str)
    swapped.reverse()  # reverse() works in place and returns None
    return int(swapped.hex(), 16)

buffer = 'A000B000FF0A'
for i in range(0, len(buffer), 4):
    value = endian(int(buffer[i:i + 4], 16))
Using the struct or array module is probably the easiest way to do this.
The hex string needs to be converted to bytes first.
Here is an example of how it could be done:
from array import array
import struct
hex_str = 'A000B000FF0A'
raw_data = bytes.fromhex(hex_str)
print("orig string: ", hex_str.casefold())
# With array lib
arr = array('h')
arr.frombytes(raw_data)
# arr = array('h', [160, 176, 2815])
arr.byteswap()
array_str = arr.tobytes().hex()
print("Swap using array: ", array_str)
# With struct lib
arr2 = [x[0] for x in struct.iter_unpack('<h', raw_data)]
# arr2 = [160, 176, 2815]
struct_str = struct.pack(f'>{len(arr2) * "h"}', *arr2).hex()
print("Swap using struct:", struct_str)
This prints:
orig string: a000b000ff0a
Swap using array: 00a000b00aff
Swap using struct: 00a000b00aff
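As a further sketch of converting the whole buffer in one pass, the bytes of each 16-bit word can also be swapped with extended slice assignment, without struct or array. The helper name swap16_hex is hypothetical and not part of the answer above; it assumes the input is a hex string holding a whole number of 2-byte words.

```python
def swap16_hex(hex_str):
    """Swap the two bytes of every 16-bit word in a hex string."""
    raw = bytes.fromhex(hex_str)
    if len(raw) % 2:
        raise ValueError("expected a whole number of 2-byte words")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each word moves to the front
    swapped[1::2] = raw[0::2]  # first byte of each word moves to the back
    return swapped.hex()

print(swap16_hex('A000B000FF0A'))  # 00a000b00aff
```

This only rearranges bytes, so it does not depend on the byte order of the machine running it.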
You can use the struct module to interpret your bytes as big or little endian. Then you can use the hex() method of the resulting bytes object to get a readable string representation.
Docs for struct.
import struct
# little endian
a = struct.pack("<HHH",0xA000,0xB000,0xFF0A)
# big endian
b = struct.pack(">HHH",0xA000,0xB000,0xFF0A)
print(a)
print(b)
# convert back to string
print( a.hex() )
print( b.hex() )
Which gives:
b'\x00\xa0\x00\xb0\n\xff'
b'\xa0\x00\xb0\x00\xff\n'
00a000b00aff
a000b000ff0a
I saved a data set (200 double values) from Keil, and it turned out to be a .hex file in Intel HEX format. I installed the IntelHex package in Python and loaded the file, but I do not know how to interpret the contents. For example, this post
suggests using a dict, but that does not work for a hex file containing double data. My code:
from intelhex import IntelHex
ih = IntelHex() # create empty object
ih.loadhex('output.hex')
ihdict = ih.todict()
datastr = ""
startAddress = 536871952
while ihdict.get(startAddress) != None:
    datastr += str("%0.2X" %ihdict.get(startAddress))
    startAddress += 1
the file output.hex:
:020000042000DA
:0802A8003FB7809F5BC03F409F
:1002B000DFB56EEF5AB73F407F717CBF38BE3F401D
:1002C000DFD369EFE9B43F407F717CBF38BE3F4068
:1002D0003F895E9F44AF3F401F706A0F38B53F4073
:1002E0009F20584F10AC3F405F5F72AF2FB93F4027
:1002F000DFB56EEF5AB73F40DF5B7DEFADBE3F40ED
:10030000BFA364DF51B23F40DF62676FB1B33F40CC
:100310001F9E8C0F4FC63F405F0C6B2F86B53F4032
:10032000BF7542DF3AA13F403F4D689F26B43F4032
:100330009F2742CF13A13F40DF2D5BEF96AD3F409B
:100340009F915ACF48AD3F40DF874CEF43A63F40D7
:100350007FD2573FE9AB3F40FD721E7F398F3F4050
:10036000FF892FFFC4973F409D5311CFA9883F407D
:100370001F706A0F38B53F407F78663F3CB33F40FF
:100380001DFD148F7E8A3F401F954F8FCAA73F40A7
:100390005FC04D2FE0A63F401F0D3C8F069E3F40A3
:1003A0007F4A443F25A23F40DFE13DEFF09E3F40C2
:1003B0003F185C1F0CAE3F403F79379FBC9B3F40CE
:1003C000FF2F3EFF179F3F40DFBC586F5EAC3F40A2
:1003D000FD36287F1B943F403F3D419F9EA03F40FC
:1003E000FFFA317FFD983F409FF2354FF99A3F4029
:1003F0007D0511BF82883F40DF703B6FB89D3F4055
:10040000FF1143FF88A13F40DD60146F308A3F40F9
:100410001F49328F24993F407D230CBF11863F40F6
:100420009DBD29CFDE943F40BFED2EDF76973F4044
:10043000DDBA056FDD823F407D58183F2C8C3F4070
:100440007F3333BF99993F40DD9C0A6F4E853F4013
:100450007F3333BF99993F403DBC171FDE8B3F4030
:10046000BDF4185F7A8C3F403D16091F8B843F40D6
:100470003DE1FC9E707E3F40DD0D0DEF86863F40E6
:100480003F1F469F0FA33F403DFFF79EFF7B3F402E
:10049000DD42196FA18C3F40BDC6F65E637B3F40D5
:1004A0009D5AFB4EAD7D3F40FD4BE6FE25733F4020
:1004B0001D12D30E89693F403D8EF51EC77A3F401D
:1004C0003DBC171FDE8B3F409D7FE0CE3F703F401D
:1004D000BD6C055FB6823F40DD4903EFA4813F401C
:1004E000DD14F76E8A7B3F40DDF6FB6EFB7D3F40FF
:1004F000BD20E85E10743F40DD6EE86E37743F400B
:100500001DB1F78ED87B3F407DFCD33EFE693F4056
:100510009D4C274FA6933F407DD7EEBE6B773F4063
:10052000DDAADE6E556F3F401D55B38EAA593F4080
:100530009DF7CCCE7B663F40DDCFC3EEE7613F4009
:10054000BDF2C55EF9623F40BD7AD95EBD6C3F40E9
:100550005D90D82E486C3F40BD7AD95EBD6C3F405F
:100560003D67BD9EB35E3F403D0DCC9E06663F405D
:10057000BD88AD5EC4563F40BDA6A85E53543F4003
:10058000BDD4CA5E6A653F40FD95B0FE4A583F4003
:100590003DA3B39ED1593F405DF89D2EFC4E3F4098
:1005A000FD69E1FEB4703F40FD59BAFE2C5D3F404D
:1005B000BD63C8DE31643F407DB0B63E585B3F400E
:1005C0001DCD9F8EE64F3F405DEAC92EF5643F404A
:1005D000FD4993FEA4493F405D8E852EC7423F40B2
:1005E0007D4D88BE26443F401D3EA20E1F513F4018
:1005F0003D938C9E49463F40FD7E9F7EBF4F3F40CE
:10060000DDB1C8EE58643F40BD7F70DE3F383F40EB
:100610005DBCA72EDE533F409D4197CEA04B3F408F
:100620003D0B799E853C3F403D0B799E853C3F408C
:10063000BD7F70DE3F383F407D6B83BEB5413F409C
:10064000FD32827E19413F409D2A864E15433F4030
:10065000FDEFA1FEF7503F401DC4620E62313F40E6
:100660003D476F9EA3373F401D98930ECC493F40B6
:100670001D53608E29303F403DF4671EFA333F40E2
:100680003D048F1E82473F407D726D3EB9363F402C
:10069000FDF68B7EFB453F40FD5767FEAB333F4089
:1006A0005D7774AE3B3A3F40BD2C695E96343F4067
:1006B0009DF579CEFA3C3F40DD4C476EA6233F4086
:1006C0001D1E540E0F2A3F407DE36FBEF1373F40A1
:1006D000FD7C4C7E3E263F40DD8153EEC0293F40ED
:1006E0009D034ECE01273F409D1375CE893A3F4072
:1006F000DDC4336EE2193F407D444B3EA2253F40AE
:100700005DD84F2EEC273F407D0F3FBE871F3F40F7
:100710007DF143BEF8213F40BDE04B5EF0253F40F8
:100720005D8548AE42243F405DD84F2EEC273F40C8
:100730007D5B5CBE2D2E3F40FD40567E202B3F4012
:10074000BD514EDE28273F40FD2945FE94223F4003
:100750001DD9208E6C103F409DE552CE72293F403E
:100760003D91399EC81C3F407D16293E8B143F4069
:100770007D6930BE34183F409D9935CECC1A3F403C
:100780005DFD34AE7E1A3F409DB730CE5B183F40D2
:100790003DAF349E571A3F405D8C322E46193F4084
:1007A0003D81129E40093F405DC8282E64143F40A1
:1007B0007D34243E1A123F405DC13EAE601F3F4073
:1007C0003D2E0B1E97053F405D6E372EB71B3F40F9
:1007D0003D09269E04133F403DCD2F9EE6173F4026
:1007E000BD6F49DEB7243F403D5C2D1EAE163F4035
:1007F0001DE00A0E70053F403DBD089E5E043F406F
:100800009D890ECE44073F401D86190EC30C3F4004
:10081000BDD70EDE6B073F401DBB258EDD123F406E
:100820001DBB258EDD123F407D06023E03013F4089
:10083000BDD0245E68123F40FDA81B7ED40D3F4012
:100840005D14462E0A233F40FD12347E091A3F40B4
:100850009D180C4E0C063F401D681E0E340F3F4085
:100860001D510D8EA8063F403DD4191EEA0C3F4095
:100870001B94ED0DCAF63E405D40152EA00A3F4088
:100880009D64294EB2143F407DF82D3EFC163F403A
:100890001DD9208E6C103F405DE6232EF3113F40A2
:1008A0007DA526BE52133F40BDB913DEDC093F4093
:1008B0005D2904AE14023F40BBE5E2DD72F13E402B
:1008C0009D180C4E0C063F40BD84075EC2033F409E
:1008D0005D221A2E110D3F40FDC6167E630B3F4070
:0108E0009D7A
:00000001FF
Assuming the data represents a list of 64-bit floating point numbers that you want to decode, the process is to collect the appropriate number of octets and decode them as a double.
Reusing the structure you presented:
from intelhex import IntelHex
import struct
ih = IntelHex()
ih.loadhex('output.hex')
ihdict = ih.todict()
# Read all the data into a long list of int octets
data = []
startAddress = 536871952
while ihdict.get(startAddress) is not None:
    data.append(ihdict.get(startAddress))
    startAddress += 1
# slice the list into 8-byte bytearrays
bin_arr = [bytearray(data[n:n+8]) for n in range(0, len(data), 8)]
# unpack each bytearray as a double
# Filter for 8 byte arrays because len(data) is not divisible by 8.
# Is the data properly aligned?
doubles_list = [struct.unpack('d', b) for b in bin_arr if len(b) == 8]
print(doubles_list)
It may be worth mentioning that the 'd' format without a prefix uses the machine's native byte order, which is little endian on x86 and most ARM targets. To make the byte order explicit, prefix the format with < (little endian) or > (big endian), for example '<d'. More information is available in the struct.unpack docs.
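The decoding step on its own can be sketched without the IntelHex package. This assumes the octets are little-endian IEEE 754 doubles, which the sample data suggests but which is an assumption about the Keil output; decode_doubles is a hypothetical helper name. The sample bytes are the data field of the first data record in output.hex.

```python
import struct

def decode_doubles(data, byteorder='<'):
    """Decode consecutive 8-byte doubles, ignoring a trailing partial value."""
    data = bytes(data)  # also accepts a list of int octets
    usable = len(data) - (len(data) % 8)
    return [value for (value,) in struct.iter_unpack(byteorder + 'd', data[:usable])]

# Data field of the record ":0802A8003FB7809F5BC03F409F"
sample = bytes.fromhex('3FB7809F5BC03F40')
print(decode_doubles(sample))
```

Passing '>' as byteorder decodes the same octets as big endian, which is a quick way to check which ordering yields plausible values.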
I am trying to read bytes from an image and get all the 16-bit int values from it.
After parsing the image header, I get to the pixel values. The value I get is incorrect when the pair of bytes looks like b"\xd4\x00". In this case it should be 54272, not 3392.
These are parts of the code:
I use a generator to get the bytes:
import itertools

def osddef_generator(in_file):
    with open(in_file, mode='rb') as f:
        dat = f.read()
    for byte in dat:
        yield byte

def take_slice(in_generator, size):
    return ''.join(str(chr(i)) for i in itertools.islice(in_generator, size))

def take_single_pixel(in_generator):
    pix = itertools.islice(in_generator, 2)
    hex_list = [hex(i) for i in pix]
    hex_str = "".join(hex_list)[2:].replace("0x", '')
    intval = int(hex_str, 16)
    print("hex_list: ", hex_list)
    print("hex_str: ", hex_str)
    print("intval: ", intval)
After I get the header correctly using the take_slice method, I get to the part with the pixel values, where I use the take_single_pixel method.
Here, I get the bad results.
This is what I get:
hex_list: ['0xd4', '0x0']
hex_str: d40
intval: 3392
But the actual sequence of bytes that should be interpreted is \xd4\x00, which equals 54272, so I expect hex_list = ['0xd4', '0x00'] and hex_str = d400.
Something goes wrong whenever the second byte of a pair is \x00.
Any ideas? Thanks!
There are much better ways of converting bytes to integers:
int.from_bytes() takes bytes input, and a byte order argument:
>>> int.from_bytes(b"\xd4\x00", 'big')
54272
>>> int.from_bytes(b"\xd4\x00", 'little')
212
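Applied to the generator from the question, this might look like the sketch below. The root cause there is that hex(0) gives '0x0', so the leading zero of the second byte is lost when the strings are joined; int.from_bytes() avoids the string round trip entirely. This version of take_single_pixel is a hypothetical rewrite and assumes big-endian pixel values, as the expected result 54272 implies.

```python
import itertools

def take_single_pixel(byte_iter):
    """Consume two bytes from an iterator of ints and return them as one 16-bit value."""
    pair = bytes(itertools.islice(byte_iter, 2))
    if len(pair) < 2:
        raise ValueError("not enough bytes left for a pixel")
    return int.from_bytes(pair, 'big')

stream = iter(b'\xd4\x00\x00\x01')
print(take_single_pixel(stream))  # 54272
print(take_single_pixel(stream))  # 1
```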
The struct.unpack() function lets you convert a whole series of bytes to integers following a pattern:
>>> import struct
>>> struct.unpack('!4H', b'\xd4\x00\xd4\x00\xd4\x00\xd4\x00')
(54272, 54272, 54272, 54272)
The array module lets you read binary data representing homogeneous integer data into a memory structure efficiently:
>>> array.array('H', fileobject.read())
However, array can't be told what byte order to use. You'd have to determine the current architecture byte order and call arr.byteswap() to reverse order if the machine order doesn't match the file order.
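The byte-order check described above can be sketched as follows. The function name read_uint16 and its file_byteorder parameter are hypothetical; the sketch assumes that array's 'H' type code is 2 bytes wide, which holds on common platforms and is checked at run time.

```python
import sys
from array import array

def read_uint16(raw, file_byteorder):
    """Load 16-bit unsigned values stored in the given byte order ('little' or 'big')."""
    values = array('H')
    if values.itemsize != 2:
        raise RuntimeError("array('H') is not 2 bytes wide on this platform")
    values.frombytes(raw)
    # array always uses the machine's byte order, so swap only on a mismatch
    if sys.byteorder != file_byteorder:
        values.byteswap()
    return values

print(read_uint16(b'\xd4\x00\x00\x01', 'big'))  # array('H', [54272, 1])
```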
When reading image data, it is almost always preferable to use the struct module to do the parsing. You generally then use file.read() calls with specific sizes; if the header consists of 10 bytes, use:
headerinfo = struct.unpack('<expected header pattern for 10 bytes>', f.read(10))
and go from there. For examples, look at the Pillow / PIL image plugins source code; here is how the Blizzard Mipmap image format header is read:
def _read_blp_header(self):
    self._blp_compression, = struct.unpack("<i", self.fp.read(4))
    self._blp_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_depth, = struct.unpack("<b", self.fp.read(1))
    self._blp_alpha_encoding, = struct.unpack("<b", self.fp.read(1))
    self._blp_mips, = struct.unpack("<b", self.fp.read(1))
    self._size = struct.unpack("<II", self.fp.read(8))
    if self.magic == b"BLP1":
        # Only present for BLP1
        self._blp_encoding, = struct.unpack("<i", self.fp.read(4))
        self._blp_subtype, = struct.unpack("<i", self.fp.read(4))
    self._blp_offsets = struct.unpack("<16I", self.fp.read(16 * 4))
    self._blp_lengths = struct.unpack("<16I", self.fp.read(16 * 4))
Because struct.unpack() always returns tuples, you can assign individual elements in a tuple to name1, name2, ... names on the left-hand side, including single_name, = assignments to extract a single result.
The separate set of read calls above could also be compressed into fewer calls:
comp, enc, adepth, aenc, mips, *size = struct.unpack("<i4b2I", self.fp.read(16))
if self.magic == b"BLP1":
    # Only present for BLP1
    enc, subtype = struct.unpack("<2i", self.fp.read(8))
followed by specific attribute assignments.
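To make the "read a fixed-size header, then unpack it" pattern concrete, here is a minimal sketch using a made-up 10-byte header. The layout (a 4-byte magic followed by three little-endian 16-bit fields) and the names HEADER and read_header are purely hypothetical and do not describe any real image format.

```python
import io
import struct

# Hypothetical layout: 4-byte magic, then width, height and bit depth as uint16
HEADER = struct.Struct('<4sHHH')

def read_header(fp):
    """Read and unpack the fixed-size header from a binary file object."""
    raw = fp.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError('truncated header')
    magic, width, height, depth = HEADER.unpack(raw)
    return magic, width, height, depth

# An in-memory stand-in for a real file
fake_file = io.BytesIO(struct.pack('<4sHHH', b'IMG0', 640, 480, 16) + b'\xd4\x00')
print(HEADER.size)             # 10
print(read_header(fake_file))  # (b'IMG0', 640, 480, 16)
```

Precompiling the format with struct.Struct keeps the size and the layout in one place, so the read length cannot drift away from the format string.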
I have a long bytearray
barray=b'\x00\xfe\x4b\x00...
What would be the best way to convert it to a list of 2-byte integers?
You can use the struct package for that:
from struct import unpack
tuple_of_shorts = unpack('h'*(len(barray)//2),barray)
This will produce signed shorts. For unsigned ones, use 'H' instead:
tuple_of_shorts = unpack('H'*(len(barray)//2),barray)
On a little-endian machine, this produces the following for your sample input:
>>> unpack('h'*(len(barray)//2),barray)
(-512, 75)
>>> unpack('H'*(len(barray)//2),barray)
(65024, 75)
In case you want to work with big endian, or little endian, you can put a > (big endian) or < (little endian) in the specifications. For instance:
# Big endian
tuple_of_shorts = unpack('>'+'H'*(len(barray)//2),barray) # unsigned
tuple_of_shorts = unpack('>'+'h'*(len(barray)//2),barray) # signed
# Little endian
tuple_of_shorts = unpack('<'+'H'*(len(barray)//2),barray) # unsigned
tuple_of_shorts = unpack('<'+'h'*(len(barray)//2),barray) # signed
Generating:
>>> unpack('>'+'H'*(len(barray)//2),barray) # big endian, unsigned
(254, 19200)
>>> unpack('>'+'h'*(len(barray)//2),barray) # big endian, signed
(254, 19200)
>>> unpack('<'+'H'*(len(barray)//2),barray) # little endian, unsigned
(65024, 75)
>>> unpack('<'+'h'*(len(barray)//2),barray) # little endian, signed
(-512, 75)
Using the struct module:
import struct
count = len(barray) // 2  # integer division; / would produce a float in Python 3
integers = struct.unpack('H'*count, barray)
Depending on the endianness you may want to prepend a < or > for the unpacking format. And depending on signed/unsigned, it's h, or H.
Note, using the Python struct library to convert your array also allows you to specify a repeat count for each item in the format specifier. So 4H for example would be the same as using HHHH.
Using this approach avoids the need to create potentially massive format strings:
import struct
barray = b'\x00\xfe\x4b\x00\x4b\x00'
integers = struct.unpack('{}H'.format(len(barray) // 2), barray)
print(integers)
Giving you:
(65024, 75, 75)
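If a list is wanted and the byte order and signedness should be explicit, a format string can be avoided altogether with int.from_bytes(). This is only a sketch; to_int16_list and its parameters are hypothetical names, and it will be slower than struct or array for very large inputs.

```python
def to_int16_list(data, byteorder='little', signed=False):
    """Convert a bytes-like object into a list of 2-byte integers."""
    if len(data) % 2:
        raise ValueError("data length must be a multiple of 2")
    return [int.from_bytes(data[i:i + 2], byteorder, signed=signed)
            for i in range(0, len(data), 2)]

barray = b'\x00\xfe\x4b\x00'
print(to_int16_list(barray))                    # [65024, 75]
print(to_int16_list(barray, signed=True))       # [-512, 75]
print(to_int16_list(barray, byteorder='big'))   # [254, 19200]
```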
If memory efficiency is a concern, you may consider using an array.array:
>>> barr = b'\x00\xfe\x4b\x00'
>>> import array
>>> short_array = array.array('h', barr)
>>> short_array
array('h', [-512, 75])
This is like a space-efficient primitive array, with an OO-wrapper, so it supports sequence-type methods you would have on a list, like .append, .pop, and slicing!
>>> short_array[:1]
array('h', [-512])
>>> short_array[::-1]
array('h', [75, -512])
Also, recovering your bytes object becomes trivial:
>>> short_array
array('h', [-512, 75])
>>> short_array.tobytes()
b'\x00\xfeK\x00'
Note, if you want the opposite endianness from the native byte-order, use the in-place byteswap method:
>>> short_array.byteswap()
>>> short_array
array('h', [254, 19200])
I have a large string, more than 256 bits long, and I need to byte swap it in 32-bit units. The string is in hexadecimal. When I looked at the numpy and array modules I couldn't find the right syntax for the conversion. Could someone please help me?
An example (though the real data is much longer; I could use pack, but then I would have to convert the little endian to decimal and then to big endian first, which seems like a waste):
Input:12345678abcdeafa
Output:78563412faeacdab
Convert the string to bytes, unpack big-endian 32-bit and pack little-endian 32-bit (or vice versa) and convert back to a string:
#!python3
import binascii
import struct
Input = b'12345678abcdeafa'
Output = b'78563412faeacdab'
def convert(s):
    s = binascii.unhexlify(s)
    a,b = struct.unpack('>LL',s)
    s = struct.pack('<LL',a,b)
    return binascii.hexlify(s)
print(convert(Input),Output)
Output:
b'78563412faeacdab' b'78563412faeacdab'
Generalized for any hex string whose length is a multiple of 8 (a whole number of 32-bit words):
import binascii
import struct
Input = b'12345678abcdeafa'
Output = b'78563412faeacdab'
def convert(s):
    # 8 hex digits make up one 32-bit word
    if len(s) % 8 != 0:
        raise ValueError('string length not multiple of 8')
    s = binascii.unhexlify(s)
    f = '{}L'.format(len(s)//4)
    dw = struct.unpack('>'+f,s)
    s = struct.pack('<'+f,*dw)
    return binascii.hexlify(s)
print(convert(Input),Output)
If they really are strings, just do string operations on them?
>>> input = "12345678abcdeafa"
>>> input[7::-1]+input[:7:-1]
'87654321afaedcba'
My take:
slice the string in N digit chunks
reverse each chunk
concatenate everything
Example:
>>> source = '12345678abcdeafa87654321afaedcba'
>>> # small helper to slice the input in 8 digit chunks
>>> chunks = lambda iterable, sz: [iterable[i:i+sz]
...                                for i in range(0, len(iterable), sz)]
>>> swap = lambda source, sz: ''.join([chunk[::-1]
...                                    for chunk in chunks(source, sz)])
Output asked in the original question:
>>> swap(source, 8)
'87654321afaedcba12345678abcdeafa'
It is easy to adapt in order to match the required output after icktoofay edit:
>>> swap(swap(source, 8), 2)
'78563412faeacdab21436587badcaeaf'
A proper implementation probably should check if len(source) % 8 == 0.
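The slice-and-reverse idea can also be applied to bytes rather than hex digits, which gives the byte-swapped output directly and includes the length check mentioned above. This is a sketch only; swap32_hex is a hypothetical name, and it assumes the input is a valid hex string.

```python
def swap32_hex(hex_str):
    """Reverse the byte order of every 32-bit word in a hex string."""
    raw = bytes.fromhex(hex_str)
    if len(raw) % 4:
        raise ValueError("expected a whole number of 32-bit words")
    # Reverse each 4-byte chunk, then join the chunks in their original order
    return b''.join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4)).hex()

print(swap32_hex('12345678abcdeafa'))  # 78563412faeacdab
```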
In Java, I can encode a BigInteger as:
java.math.BigInteger bi = new java.math.BigInteger("65537L");
String encoded = Base64.encodeBytes(bi.toByteArray(), Base64.ENCODE|Base64.DONT_GUNZIP);
// result: 65537L encodes as "AQAB" in Base64
byte[] decoded = Base64.decode(encoded, Base64.DECODE|Base64.DONT_GUNZIP);
java.math.BigInteger back = new java.math.BigInteger(decoded);
In C#:
System.Numerics.BigInteger bi = new System.Numerics.BigInteger("65537L");
string encoded = Convert.ToBase64(bi);
byte[] decoded = Convert.FromBase64String(encoded);
System.Numerics.BigInteger back = new System.Numerics.BigInteger(decoded);
How can I encode long integers in Python as Base64-encoded strings? What I've tried so far produces results different from implementations in other languages (so far I've tried in Java and C#), particularly it produces longer-length Base64-encoded strings.
import struct
encoded = struct.pack('I', (1<<16)+1).encode('base64')[:-1]
# produces a longer string, 'AQABAA==' instead of the expected 'AQAB'
When using this Python code to produce a Base64-encoded string, the resulting decoded integer in Java (for example) produces instead 16777472 in place of the expected 65537. Firstly, what am I missing?
Secondly, I have to figure out by hand what is the length format to use in struct.pack; and if I'm trying to encode a long number (greater than (1<<64)-1) the 'Q' format specification is too short to hold the representation. Does that mean that I have to do the representation by hand, or is there an undocumented format specifier for the struct.pack function? (I'm not compelled to use struct, but at first glance it seemed to do what I needed.)
Check out this page on converting integer to base64.
import base64
import struct
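def encode(n):
    # Pack as a little-endian 64-bit value and drop the trailing zero bytes
    data = struct.pack('<Q', n).rstrip(b'\x00')
    if len(data) == 0:
        data = b'\x00'
    # In Python 3, struct and base64 operate on bytes, so strip b'=' rather than '='
    s = base64.urlsafe_b64encode(data).rstrip(b'=')
    return s

def decode(s):
    # Add the padding back before decoding
    data = base64.urlsafe_b64decode(s + b'==')
    n = struct.unpack('<Q', data + b'\x00' * (8 - len(data)))
    return n[0]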
The struct module:
… performs conversions between Python values and C structs represented as Python strings.
Because C doesn't have infinite-length integers, there's no functionality for packing them.
But it's very easy to write yourself. For example:
def pack_bigint(i):
    b = bytearray()
    while i:
        b.append(i & 0xFF)
        i >>= 8
    return b
Or:
def pack_bigint(i):
    bl = (i.bit_length() + 7) // 8
    fmt = '<{}B'.format(bl)
    # ...
And so on.
And of course you'll want an unpack function, like jbatista's from the comments:
def unpack_bigint(b):
    b = bytearray(b) # in case you're passing in a bytes/str
    return sum((1 << (bi*8)) * bb for (bi, bb) in enumerate(b))
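In Python 3 the built-in int.to_bytes() and int.from_bytes() already handle arbitrary-size integers, so the hand-written loops can be replaced by something like the sketch below. The function names are hypothetical, and it covers non-negative integers only. Note that Java's BigInteger.toByteArray() produces a two's-complement encoding, so for values whose top bit is set it emits an extra leading zero byte that this sketch does not add.

```python
import base64

def int_to_b64(n):
    """Encode a non-negative integer as big-endian bytes, then as Base64 text."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    length = max(1, (n.bit_length() + 7) // 8)  # at least one byte, even for 0
    return base64.b64encode(n.to_bytes(length, 'big')).decode('ascii')

def b64_to_int(s):
    """Decode Base64 text back into a non-negative integer."""
    return int.from_bytes(base64.b64decode(s), 'big')

print(int_to_b64(65537))   # AQAB
print(b64_to_int('AQAB'))  # 65537
```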
This is a bit late, but I figured I'd throw my hat in the ring:
import base64
import struct

def inttob64(n):
    """
    Given an integer returns the base64 encoded version of it (no trailing ==)
    """
    limit = 0xffffffff  # mask selecting the low 32 bits
    parts = []
    while n:
        parts.insert(0, n & limit)
        n >>= 32
    data = struct.pack('>' + 'L'*len(parts), *parts)
    s = base64.urlsafe_b64encode(data).rstrip(b'=')
    return s

def b64toint(s):
    """
    Given a string with a base64 encoded value, return the integer representation
    of it
    """
    data = base64.urlsafe_b64decode(s + b'==')
    n = 0
    while data:
        n <<= 32
        (toor,) = struct.unpack('>L', data[:4])
        n |= toor & 0xffffffff
        data = data[4:]
    return n
These functions turn an arbitrary-sized long number to/from a big-endian base64 representation.
Here is something that may help. Instead of using struct.pack() I am building a string of bytes to encode and then calling the BASE64 encode on that. I didn't write the decode, but clearly the decode can recover an identical string of bytes and a loop could recover the original value. I don't know if you need fixed-size integers (like always 128-bit) and I don't know if you need Big Endian so I left the decoder for you.
Also, encode64() and decode64() are from @msc's answer, but modified to work.
import base64
import struct
def encode64(n):
    data = struct.pack('<Q', n).rstrip(b'\x00')
    if len(data) == 0:
        data = b'\x00'
    s = base64.urlsafe_b64encode(data).rstrip(b'=')
    return s

def decode64(s):
    data = base64.urlsafe_b64decode(s + b'==')
    n = struct.unpack('<Q', data + b'\x00' * (8 - len(data)))
    return n[0]

def encode(n, big_endian=False):
    lst = []
    while True:
        n, lsb = divmod(n, 0x100)
        lst.append(lsb)  # collect byte values, least significant first
        if not n:
            break
    if big_endian:
        # I have not tested Big Endian mode, and it may need to have
        # some initial zero bytes prepended; like, if the integer is
        # supposed to be a 128-bit integer, and you encode a 1, you
        # would need this to have 15 leading zero bytes.
        initial_zero_bytes = b'\x00' * 2
        data = initial_zero_bytes + bytes(reversed(lst))
    else:
        data = bytes(lst)
    s = base64.urlsafe_b64encode(data).rstrip(b'=')
    return s

print(encode(1234567890098765432112345678900987654321))