How to interpret (read) signed 24bits data from 32bits - python

I have a raw file containing signed 24-bit data packed into 32-bit words.
Example:
00 4D 4A FF
00 FF FF FF
I would like to read this data and get signed integers in the range [-2^23, 2^23-1].
For now I have written:
int32_1 = file1.read(4)
val1 = (unpack('=l', int32_1)[0] & 0xFFFFFF00) >> 8
but how do I take two's complement into account, so that 00FFFFFF is interpreted as -1?

Your code makes things more complicated than they need to be. You should, however, specify the byte order explicitly in the unpack format string.
from binascii import hexlify
from struct import unpack
data = ('\x00\x03\x02\x01', '\x00\x4D\x4A\xFF', '\x00\xFF\xFF\xFF')
for b in data:
    i = unpack('<l', b)[0] >> 8
    print hexlify(b), i
output
00030201 66051
004d4aff -46515
00ffffff -1
FWIW, here's a version that works in Python 3 or Python 2; the output is slightly different in Python 3, since normal strings in Python 3 are Unicode; byte strings are "special".
from __future__ import print_function
from binascii import hexlify
from struct import unpack
data = (b'\x00\x03\x02\x01', b'\x00\x4D\x4A\xFF', b'\x00\xFF\xFF\xFF')
for b in data:
    i = unpack('<l', b)[0] >> 8
    print(hexlify(b), i)
Python 3 output
b'00030201' 66051
b'004d4aff' -46515
b'00ffffff' -1
And here's a version that only runs on Python 3:
from binascii import hexlify
data = (b'\x00\x03\x02\x01', b'\x00\x4D\x4A\xFF', b'\x00\xFF\xFF\xFF')
for b in data:
    i = int.from_bytes(b[1:], 'little', signed=True)
    print(hexlify(b), i)
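For a whole file rather than a few hand-written samples, the same shift can be applied to every 32-bit word in one pass. The following is a minimal sketch, assuming the layout from the question (little-endian words with the 24-bit sample in the upper three bytes); the helper name decode_24_in_32 is made up for illustration.

```python
from struct import iter_unpack

def decode_24_in_32(raw):
    """Decode little-endian 32-bit words that carry a signed 24-bit
    sample in their upper three bytes (hypothetical helper)."""
    if len(raw) % 4:
        raise ValueError('buffer length must be a multiple of 4')
    # '<i' reads each word as a signed 32-bit integer; the arithmetic
    # right shift then drops the padding byte and keeps the sign.
    return [word >> 8 for (word,) in iter_unpack('<i', raw)]

samples = decode_24_in_32(b'\x00\x03\x02\x01' b'\x00\x4D\x4A\xFF' b'\x00\xFF\xFF\xFF')
print(samples)  # [66051, -46515, -1]
```

In practice raw would come from open(path, 'rb').read(), or from fixed-size chunks of the file if it is too large to load at once.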

You can shift the value 8 bits to the left, interpret the result as a signed 32-bit integer (using the ctypes library), and divide by 256:
>>> import ctypes
>>> i = 0x00ffffff
>>> i
16777215
>>> i<<8
4294967040
>>> ctypes.c_int32(i<<8).value
-256
>>> ctypes.c_int32(i<<8).value//256
-1
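The sign extension can also be done with plain integer arithmetic, without ctypes. This is a minimal sketch, assuming the 24-bit value has already been extracted as an unsigned integer; sign_extend is an illustrative name, not a library function.

```python
def sign_extend(value, bits=24):
    """Interpret the low `bits` bits of `value` as two's complement."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    # Flipping the sign bit and then subtracting it maps values with
    # the sign bit set into the negative range and leaves the rest alone.
    return (value ^ sign_bit) - sign_bit

print(sign_extend(0xFFFFFF))  # -1
print(sign_extend(0xFF4A4D))  # -46515
print(sign_extend(0x010203))  # 66051
```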

Related

Little to big endian buffer at once python [duplicate]

I've created a buffer of words represented in little endian (assuming each word is 2 bytes):
A000B000FF0A
I've split the buffer into 3 words (2 bytes each):
A000
B000
FF0A
and after that converted to big endian representation:
00A0
00B0
0AFF
Is there a way to convert the whole buffer to big endian at once, instead of splitting it into words?
Code:
buffer = 'A000B000FF0A'

def endian(num):
    hex_str = '{:04X}'.format(num)
    swapped = bytearray.fromhex(hex_str)
    swapped.reverse()  # reverse() works in place and returns None
    return int(''.join(format(x, '02x') for x in swapped), 16)

for i in range(0, len(buffer), 4):
    value = endian(int(buffer[i:i + 4], 16))
Using the struct or array module is probably the easiest way to do this.
The hex string needs to be converted to bytes first.
Here is an example of how it could be done:
from array import array
import struct
hex_str = 'A000B000FF0A'
raw_data = bytes.fromhex(hex_str)
print("orig string: ", hex_str.casefold())
# With array lib
arr = array('h')
arr.frombytes(raw_data)
# arr = array('h', [160, 176, 2815])
arr.byteswap()
array_str = arr.tobytes().hex()
print(f"Swap using array: ", array_str)
# With struct lib
arr2 = [x[0] for x in struct.iter_unpack('<h', raw_data)]
# arr2 = [160, 176, 2815]
struct_str = struct.pack(f'>{len(arr2) * "h"}', *arr2).hex()
print("Swap using struct:", struct_str)
Gives transcript:
orig string: a000b000ff0a
Swap using array: 00a000b00aff
Swap using struct: 00a000b00aff
You can use the struct module to interpret your bytes as big or little endian. Then you can use the hex() method of the resulting bytes object to get a readable string representation.
See the struct documentation for the available format characters.
import struct
# little endian
a = struct.pack("<HHH",0xA000,0xB000,0xFF0A)
# big endian
b = struct.pack(">HHH",0xA000,0xB000,0xFF0A)
print(a)
print(b)
# convert back to string
print( a.hex() )
print( b.hex() )
Which gives:
b'\x00\xa0\x00\xb0\n\xff'
b'\xa0\x00\xb0\x00\xff\n'
00a000b00aff
a000b000ff0a
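If the goal is only to swap every pair of bytes in the buffer, extended slice assignment does it in one step with no per-word loop. This is a sketch, assuming the buffer holds a whole number of 2-byte words; swap16 is a hypothetical helper name.

```python
def swap16(raw):
    """Return a copy of `raw` with the two bytes of each 16-bit word swapped."""
    if len(raw) % 2:
        raise ValueError('buffer length must be even')
    out = bytearray(len(raw))
    out[0::2] = raw[1::2]   # odd-indexed bytes move to even positions
    out[1::2] = raw[0::2]   # even-indexed bytes move to odd positions
    return bytes(out)

print(swap16(bytes.fromhex('A000B000FF0A')).hex())  # 00a000b00aff
```

Unlike array.byteswap, this does not depend on the item size or byte order of the machine it runs on.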

Base 64 decode from raw binary

I am trying to decode base64 from raw binary:
As input, I have 4 6-bit values
010000 001010 000000 011001
which I convert to decimal, giving
16 10 0 25
and finally decode using the base 64 table, giving
Q K A Z
This is verified to be the correct result.
I would like to use Python's base64 module to automate this, but using
import base64
base64.b64decode( bytearray([16,10,0,25]) )
returns an empty string.
What is the proper way to use this library with the given inputs?
[16, 10, 0, 25] isn't a base64 string, really; I don't think base64 has any functions for converting numeric representations of the base64 alphabet to their alphabetic representations. It's not difficult to roll your own, though:
def to_characters(numeric_arr):
    target = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' + b'abcdefghijklmnopqrstuvwxyz' + b'0123456789' + b'+/'
    return bytes(target[n] for n in numeric_arr)
Then:
>>> to_characters(bytearray([16, 10, 0, 25]))
b'QKAZ'
>>> to_characters([16, 10, 0, 25]) # <- or just this
b'QKAZ'
You can now pass this bytes object to base64.b64decode:
>>> base64.b64decode(b'QKAZ')
b'@\xa0\x19'
(Note that you had a syntax issue in your example use of bytearray - don't do bytearray[...]; do bytearray([...]). Python doesn't use C-like int array[size] syntax.)
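Since the input is already a list of 6-bit values, the bytes can also be assembled directly, without going through the base64 alphabet. This is a minimal sketch, assuming the number of values is a multiple of 4 (so no padding is involved); sextets_to_bytes is an illustrative name.

```python
import base64

def sextets_to_bytes(values):
    """Pack 6-bit values (0..63) into bytes, four values per three bytes."""
    if len(values) % 4:
        raise ValueError('expected a multiple of 4 six-bit values')
    out = bytearray()
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]
        group = (a << 18) | (b << 12) | (c << 6) | d   # 24 bits in total
        out += group.to_bytes(3, 'big')
    return bytes(out)

decoded = sextets_to_bytes([16, 10, 0, 25])
print(decoded)                               # b'@\xa0\x19'
print(decoded == base64.b64decode(b'QKAZ'))  # True
```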

How to byteswap 32bit integers inside a string in python?

I have a large string, more than 256 bits long, and I need to byte swap it in 32-bit words. The string is in hexadecimal. When I looked at the numpy and array modules I couldn't find the right syntax for the conversion. Could someone please help me?
An example (though the real data is much longer; I could use pack, but then I would first have to convert the little endian to decimal and then to big endian, which seems like a waste):
Input:12345678abcdeafa
Output:78563412faeacdab
Convert the string to bytes, unpack big-endian 32-bit and pack little-endian 32-bit (or vice versa) and convert back to a string:
#!python3
import binascii
import struct
Input = b'12345678abcdeafa'
Output = b'78563412faeacdab'
def convert(s):
    s = binascii.unhexlify(s)
    a,b = struct.unpack('>LL',s)
    s = struct.pack('<LL',a,b)
    return binascii.hexlify(s)
print(convert(Input),Output)
Output:
b'78563412faeacdab' b'78563412faeacdab'
Generalized for any hex string whose length is a multiple of 8 (a whole number of 32-bit words):
import binascii
import struct
Input = b'12345678abcdeafa'
Output = b'78563412faeacdab'
def convert(s):
    # 8 hex digits make one 32-bit word
    if len(s) % 8 != 0:
        raise ValueError('string length not multiple of 8')
    s = binascii.unhexlify(s)
    f = '{}L'.format(len(s)//4)
    dw = struct.unpack('>'+f,s)
    s = struct.pack('<'+f,*dw)
    return binascii.hexlify(s)
print(convert(Input),Output)
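In Python 3 the same conversion can be written without struct by reversing each 4-byte slice. This is a sketch rather than a drop-in replacement for the code above; it assumes the input is a str of hex digits, and swap32_hex is a made-up name.

```python
def swap32_hex(hex_str):
    """Reverse the byte order of every 32-bit word in a hex string."""
    raw = bytes.fromhex(hex_str)
    if len(raw) % 4:
        raise ValueError('expected a whole number of 32-bit words')
    # Reverse each 4-byte slice and glue the pieces back together.
    swapped = b''.join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))
    return swapped.hex()

print(swap32_hex('12345678abcdeafa'))  # 78563412faeacdab
```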
If they really are strings, just do string operations on them?
>>> input = "12345678abcdeafa"
>>> input[7::-1]+input[:7:-1]
'87654321afaedcba'
My take:
slice the string in N digit chunks
reverse each chunk
concatenate everything
Example:
>>> source = '12345678abcdeafa87654321afaedcba'
>>> # small helper to slice the input in 8 digit chunks
>>> chunks = lambda iterable, sz: [iterable[i:i+sz]
...                                for i in range(0, len(iterable), sz)]
>>> swap = lambda source, sz: ''.join([chunk[::-1]
...                                    for chunk in chunks(source, sz)])
Output asked in the original question:
>>> swap(source, 8)
'87654321afaedcba12345678abcdeafa'
It is easy to adapt this to match the output required after icktoofay's edit:
>>> swap(swap(source, 8), 2)
'78563412faeacdab21436587badcaeaf'
A proper implementation probably should check if len(source) % 8 == 0.

reorder byte order in hex string (python)

I want to build a small formatter in Python that gives me back the numeric
values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonably fast, able to
format more than 100 lines/sec (each line about 100 chars).
The code below shows where I am currently blocked.
'data_string_in_orig' shows the given input format. It has to be
byte swapped within each 16-bit word, which turns 'data_string_in_orig' into
'data_string_in_swapped'. In the end I need the structure
access as shown. The expected result is in the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print data_string_in_orig
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print 'Unpacked Values:', unpacked_data
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print unpack('<Id', swapped)
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
Edit: I just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2):
    return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz,-gsz,-gsz)]) for m in [d[i:i+wsz] for i in range(0,len(d),wsz)]])
The input params are:
d : the input hex string
wsz: the word-size in nibbles (e.g for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles which stay together (e.g for reordering bytes gsz=2, for reordering 16-bit words gsz = 4)
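A few calls show how the two parameters interact. The function is repeated here (reflowed, with generator expressions in place of the inner lists) only so that the snippet runs on its own; the inputs are taken from the questions above.

```python
def swap_order(d, wsz=4, gsz=2):
    """Reverse the order of gsz-nibble groups inside each wsz-nibble word."""
    return "".join(
        "".join(m[i:i + gsz] for i in range(wsz - gsz, -gsz, -gsz))
        for m in (d[i:i + wsz] for i in range(0, len(d), wsz))
    )

# Bytes within 16-bit words (the default).
print(swap_order('b62e000052e366667a66408d'))   # 2eb60000e3526666667a8d40
# Bytes within 32-bit words.
print(swap_order('12345678abcdeafa', wsz=8))    # 78563412faeacdab
# 16-bit halves within 32-bit words.
print(swap_order('12345678', wsz=8, gsz=4))     # 56781234
```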
import array, binascii
from tkinter import filedialog
infile = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
    infile_read = infile_.read()
# 'I' is a 4-byte unsigned int on common platforms ('l' can be 8 bytes); check y.itemsize
y = array.array('I', infile_read)
y.byteswap()
swapped = binascii.hexlify(y)
This is a 32-bit unsigned word swap, which I achieved with code very similar to unutbu's answer, only a little easier to understand. Strictly speaking, binascii is not needed for the swap itself; only array.byteswap is.

how to write integer number in particular no of bytes in python ( file writing)

Assume I have to store a few integers such as 1024, 512, 10240 or 900000 in a file, with the condition that each one must take exactly 4 bytes (no fewer and no more). When I write them with the file's write method they are stored as "1024", "512" or "10240", that is, as ASCII text, but I want to store their binary value directly.
Any help would be much appreciated.
Use the struct module:
>>> import struct
>>> struct.pack("i",1024)
'\x00\x04\x00\x00'
>>> struct.pack("i",10240)
'\x00(\x00\x00'
>>> struct.pack("i",900000)
'\xa0\xbb\r\x00'
In Python 3, you can use the to_bytes method of int. The parentheses around 1024 are needed only because 1024. would be parsed as the start of a float literal, so 1024.to_bytes(...) is a syntax error.
>>> (1024).to_bytes(4, "big")
b'\x00\x00\x04\x00'
>>> (1024).to_bytes(4, "little")
b'\x00\x04\x00\x00'
The struct module will do it:
>>> import struct
>>> f = open('binary.bin','wb')
>>> f.write(struct.pack("l",1024))
>>> f.close()
vinko@parrot:~$ xxd -b binary.bin
0000000: 00000000 00000100 00000000 00000000 ....
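The plain "i" and "l" formats use the machine's native size and byte order, so the file layout can differ between platforms. Below is a hedged Python 3 sketch that pins the layout with an explicit "<" prefix and reads the values back; the temporary file location and the variable names are assumptions for illustration.

```python
import os
import struct
import tempfile

numbers = [1024, 512, 10240, 900000]

# '<' fixes the byte order (little-endian) and selects standard sizes,
# so 'i' is exactly 4 bytes on every platform.
record = struct.Struct('<i')

path = os.path.join(tempfile.mkdtemp(), 'numbers.bin')  # throwaway location
with open(path, 'wb') as f:
    for n in numbers:
        f.write(record.pack(n))

with open(path, 'rb') as f:
    data = f.read()

restored = [value for (value,) in record.iter_unpack(data)]
print(len(data))   # 16
print(restored)    # [1024, 512, 10240, 900000]
```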
