Big Binary Code into File in Python - python

I have been working on a program in which I need to take a large binary value (held as a string of 0s and 1s) and write it to a file. I have tried for days to make this work. Here is the code I wrote to pack the large binary string.
binaryRecieved="11001010101....(Shortened)"
f=open(fileName,'wb')
m=long(binaryRecieved,2)
struct.pack('i',m)
f.write(struct.pack('i',m))
f.close()
quit()
I am left with the error
struct.pack('i',x)
struct.error: integer out of range for 'i' format code
My integer is out of range, so I was wondering if there is a different way of going about with this.
Thanks

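Convert your bit string to a byte string: see for example the question Converting bits to bytes in Python. Then write the bytes to the file. Note that struct.pack('c', ...) packs only a single byte at a time, so for a whole byte string either use a format such as '%ds' % len(bytestring) or write the bytes object directly.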
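A minimal sketch of that approach, assuming Python 3 and a bit string whose length is a multiple of 8. The helper name bits_to_bytes, the short sample string, and the temporary output file are made up for illustration and are not from the question.

```python
import os
import tempfile

def bits_to_bytes(bits):
    """Convert a string of '0'/'1' characters to bytes, 8 bits per byte."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

binary_received = "1100101000001111"  # stand-in for the long bit string
data = bits_to_bytes(binary_received)

# Write the raw bytes; 'wb' opens the file in binary mode.
path = os.path.join(tempfile.mkdtemp(), "out.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(data)

print(data.hex())
```

Because the result is already a bytes object, no struct call is needed for this step.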

For encoding m in big-endian order (like "ten" being written as "10" in normal decimal use) use:
def as_big_endian_bytes(i):
    out=bytearray()
    while i:
        out.append(i&0xff)
        i=i>>8
    out.reverse()
    return out
For encoding m in little-endian order (like "ten" being written as "01" in normal decimal use) use:
def as_little_endian_bytes(i):
    out=bytearray()
    while i:
        out.append(i&0xff)
        i=i>>8
    return out
Both functions work on numbers, as you do in your question, so the returned bytearray may be shorter than expected (because for numbers leading zeroes do not matter).
For an exact representation of a binary-digit string (which is only possible if its length is divisible by 8) you would have to do:
def as_bytes(s):
    assert len(s)%8==0
    out=bytearray()
    # step through the string 8 characters at a time, including the last group
    for i in range(0,len(s),8):
        out.append(int(s[i:i+8],2))
    return out
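In Python 3 the same conversions are also available through the built-in int.to_bytes, which takes the output length explicitly and therefore keeps leading zero bytes. A short sketch, where bit_string_to_bytes is an illustrative name rather than anything from the answer:

```python
def bit_string_to_bytes(s, byteorder="big"):
    """Pack a '0'/'1' string into bytes, preserving leading zero bytes."""
    if len(s) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    if not s:
        return b""
    return int(s, 2).to_bytes(len(s) // 8, byteorder)

s = "0000000011001010"
print(bit_string_to_bytes(s))            # big-endian byte order
print(bit_string_to_bytes(s, "little"))  # little-endian byte order
```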

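In struct.pack you have used 'i', which represents a 4-byte signed integer and is therefore limited to a fixed range. As your code states, you have a long value, which does not fit. A wider integer code such as 'q' (8 bytes) only raises the limit, and 'd' packs the value as a floating-point double, which does not preserve the bits of a large integer exactly. For an arbitrarily long bit string you will need to convert it to bytes and write those instead of packing a single number.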
See Python struct for more information.
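To make the range limits concrete, here is a small sketch assuming Python 3. The 40-bit test value is arbitrary and chosen only because it exceeds what 'i' can hold.

```python
import struct

big = int("1" * 40, 2)  # a 40-bit value, too large for 'i'

try:
    struct.pack("i", big)
except struct.error as exc:
    print("'i' failed:", exc)

# 'q' is a signed 64-bit integer: enough here, but still a fixed limit.
packed_q = struct.pack("<q", big)
print(len(packed_q))

# 'd' stores a floating-point approximation, so large integers
# do not always survive a round trip exactly.
n = 2**53 + 1
restored = int(struct.unpack("<d", struct.pack("<d", float(n)))[0])
print(n == restored)
```

The last line prints False, which is why a float format is not a safe container for a long bit string.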

Related

How to convert from a decimal to hex in python without using hex()?

I am new here, so please excuse any mistakes I may have made:)
I have been trying to send hex numbers over a virtual serial port pair using Python3 before I can test it on an actual device. However, the only ways to work with hex numbers I have found so far are:
a) Use them as a regular string
num_hex = input()
But this does not allow me to work on the numbers, as num_hex is a string.
b) Convert them using int(,16)
ip_hex = input()
num_ip_hex = int(ip_hex, 16)
print(ip_hex, num_ip_hex, hex(num_ip_hex))
When used here, num_ip_hex just stores the number, which prints in base 10. For example, the output of the print statement with input 'a' is
input[]: a
output[]: a 10 0xa
c) Use hex() and then use them
ip = input(">> ")
ip=int(ip, 16)
ip=hex(ip)
Again, this also gives a string.
I need a way to receive hex numbers and to be able to work with them further in that exact same way, not as strings or decimals. Is this possible?
EDIT: In short, some form of hex that I can work with, for example to add, subtract, shift left, etc.
I think the closest you can get is storing data as bytes. Bytes actually have a built in method in python .hex() so you can always see a hex representation of it.
my_bytes = b'some words'
my_bytes.hex() #'736f6d6520776f726473'
If you are sending the data raw as bytes, you could then do a direct comparison without worrying about hex at all. However, if you want to still send the hex as a string, you will need binascii.unhexlify()
import binascii
binascii.unhexlify('736f6d6520776f726473') # b'some words'
Though it is generally preferred to send bytes, as then you do not have to worry about encoding and other issues.
Hope this helps!
Edit: Wanted to add dealing directly with the code you provided, it would look something like:
comparable_bytes = b'verify_me'
comparable_hex = '7665726966795f6d65'
ip_hex = binascii.unhexlify(input('>> ')) # Input the Hex numbers
assert ip_hex == comparable_bytes
assert ip_hex.hex() == comparable_hex
Edit 2: Multiple hex character input
# Remove whitespace, allow for entry with or without spaces
ip = input().strip().replace(' ', '')
bytestring = binascii.unhexlify(ip)
Then you can directly send bytestring.
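If the goal is arithmetic such as adding, subtracting, or shifting, a plain int already supports all of it: hexadecimal is only a way of displaying the number. A sketch, with a hard-coded string standing in for input() and a 2-byte output width chosen purely for illustration:

```python
user_text = "1f"             # stand-in for input(">> ")
value = int(user_text, 16)   # parse hex text into an int

value = (value + 1) << 4     # ordinary integer arithmetic

print(hex(value))            # back to hex text for display
payload = value.to_bytes(2, "big")  # bytes suitable for a serial write
print(payload.hex())
```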

Alternative ways for binary conversion in python

I often need to convert status code to bit representation in order to determine what error/status are active on analyzers using plain-text or binary communication protocol.
I use Python to poll data and to parse it. Sometimes I get confused because there are so many ways to solve a problem. Today I had to convert a string where each character is a hexadecimal digit to its binary representation. That is, each hexadecimal character must be converted into 4 bits, with the MSB on the left. Note: I need a char-by-char conversion, with leading zeros.
I managed to build the following function, which does the trick in a quasi one-liner fashion.
import math

def convertStatus(s, base=16):
    # number of bits per digit, e.g. 4 for base 16
    n = int(math.log2(base))
    b = "".join(["{{:0>{}b}}".format(n).format(int(x, base)) for x in s])
    return b
E.g., this converts the following input:
0123456789abcdef
into:
0000000100100011010001010110011110001001101010111100110111101111
Which was my goal.
Now, I am wondering what other elegant solutions I could have used to reach my goal. I would also like to better understand the advantages and drawbacks of each solution. The function signature can be changed, but usually it is a string for input and output. Let's get imaginative...
This is simple in two steps.
Converting a string to an int is almost trivial: use int(aString, base=...)
The first parameter can be a string,
and with base, almost every option is possible.
Converting a number to a string is easy with format() and the format specification mini-language.
So converting hex strings to binary can be done as
def h2b(x):
    val = int(x, base=16)
    return format(val, 'b')
Here the two steps are explicit. It may be better to do it in one line, or even inline.
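Note that format(val, 'b') drops leading zeros, while the question asks for four bits per hex digit. One way to keep them, sketched here under the illustrative name h2b_padded, is to pad the output to four times the input length:

```python
def h2b_padded(x):
    """Convert a hex string to binary text, 4 bits per hex digit."""
    val = int(x, base=16)
    width = 4 * len(x)
    return format(val, "0{}b".format(width))

print(h2b_padded("0123456789abcdef"))
```

This converts the whole string as one number instead of digit by digit, which gives the same result for hexadecimal input but would not generalize to bases that are not powers of two.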

Represent string as an integer in python

I would like to be able to represent any string as a unique integer (means every integer in the world could mean only one string, and a certain string would result constantly in the same integer).
The obvious point is, that's how the computer works, representing the string 'Hello' (for example) as a number for each character, specifically a byte (assuming ASCII encoding).
But... I would like to perform arithmetic calculations over that number (Encode it as a number using RSA).
The reason this is getting messy is because assuming I have a bit larger string 'I am an average length string' I have more characters (29 in this case), and an integer with 29 bytes could come up HUGE, maybe too much for the computer to handle (when coming up with bigger strings...?).
Basically, my question is: how could I do this? I don't want to use any module for RSA; it's a task I would like to implement myself.
Here's how to turn a string into a single number. As you suspected, the number will get very large, but Python can handle integers of any arbitrary size. The usual way of working with encryption is to do individual bytes all at once, but I'm assuming this is only for a learning experience. This assumes a byte string; if you have a Unicode string you can encode to UTF-8 first.
num = 0
for ch in my_string:
    # parentheses are required: + binds tighter than <<
    num = (num << 8) + ord(ch)
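In Python 3 the same idea can be sketched with the built-in int.from_bytes and int.to_bytes, which also give the way back from the integer to the string. The sample text and variable names are illustrative only.

```python
my_string = "Hello"
data = my_string.encode("utf-8")

num = int.from_bytes(data, "big")       # string -> integer
print(num)

# Integer -> string: recover the byte length from the bit length.
length = (num.bit_length() + 7) // 8
restored = num.to_bytes(length, "big").decode("utf-8")
print(restored)
```

One caveat of this scheme is that leading NUL bytes in the input are not recoverable, since they do not change the integer's value.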

Writing binary data to a file in Python

I am trying to write data (text, floating point data) to a file in binary, which is to be read by another program later. The problem is that this program (in Fort95) is incredibly particular; each byte has to be in exactly the right place in order for the file to be read correctly. I've tried using Bytes objects and .encode() to write, but haven't had much luck (I can tell from the file size that it is writing extra bytes of data). Some code I've tried:
mgcnmbr='42'
bts=bytes(mgcnmbr)
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(bts)
test_file.close()
I've also tried:
mgcnmbr='42'
bts=mgcnmbr.encode('utf_32_le')
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(bts)
test_file.close()
To clarify, what I need is the integer value 42, written as a 4-byte binary. Next, I would write the numbers 1 and 0 in 4-byte binary. At that point, I should have exactly 12 bytes. Each is a 4-byte signed integer, written in binary. I'm pretty new to Python, and can't seem to get it to work out. Any suggestions? I need complete control over how many bytes each integer (and later, each 4-byte floating point value) takes.
Thanks
You need the struct module.
import struct
fout = open('test.dat', 'wb')
fout.write(struct.pack('>i', 42))
fout.write(struct.pack('>f', 2.71828182846))
fout.close()
The first argument in struct.pack is the format string.
The first character in the format string dictates the byte order or endianness of the data (Is the most significant or least significant byte stored first - big-endian or little-endian). Endianness varies from system to system. If ">" doesn't work try "<".
The second character in the format string is the data type. Unsurprisingly the "i" stands for integer and the "f" stands for float. The number of bytes is determined by the type. Shorts or "h's" for example are two bytes long. There are also codes for unsigned types. "H" corresponds to an unsigned short for instance.
The second argument in struct.pack is of course the value to be packed into the bytes object.
Here's the part where I tell you that I lied about a couple of things. First I said that the number of bytes is determined by the type. This is only partially true. The size of a given type is technically platform dependent as the C/C++ standard (which the struct module is based on) merely specifies minimum sizes. This leads me to the second lie. The first character in the format string also encodes whether the standard (minimum) number of bytes or the native (platform dependent) number of bytes is to be used. (Both ">" and "<" guarantee that the standard, minimum number of bytes is used which is in fact four in the case of an integer "i" or float "f".) It additionally encodes the alignment of the data.
The documentation on the struct module has tables for the format string parameters.
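The difference between standard and native sizes can be checked with struct.calcsize. A small sketch; the format strings are chosen for illustration:

```python
import struct

print(struct.calcsize(">i"))   # standard size of an int
print(struct.calcsize("<f"))   # standard size of a float
print(struct.calcsize(">if"))  # an int followed by a float
print(struct.calcsize("<bi"))  # standard mode: no padding between fields
print(struct.calcsize("@bi"))  # native mode: may include alignment padding
```

With ">" or "<" the first four results are fixed by the struct module, while the last one depends on the platform.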
You can also pack multiple primitives into a single bytes object and realize the same result.
import struct
fout = open('test.dat', 'wb')
fout.write(struct.pack('>if', 42, 2.71828182846))
fout.close()
And you can of course parse binary data with struct.unpack.
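A round trip through struct.unpack might look like the sketch below. The second part packs the three integers 42, 1 and 0 from the question into 12 bytes, assuming little-endian order, which the question does not actually specify.

```python
import struct

packed = struct.pack(">if", 42, 2.71828182846)
number, value = struct.unpack(">if", packed)
print(len(packed), number, value)  # value is rounded to 32-bit precision

# Three 4-byte signed integers in one call (byte order is an assumption).
header = struct.pack("<3i", 42, 1, 0)
print(len(header), struct.unpack("<3i", header))
```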
Assuming that you want it in little-endian, you could do something like this to write 42 in a four byte binary.
test_file=open('PATH_HERE/test_file.dat','ab')
test_file.write(b'\x2A\0\0\0')
test_file.close()
2A is 42 in hexadecimal, and the bytes b'\x2A\0\0\0' make the first byte equal to 42, followed by three zero bytes. This code writes the bytes 42, 0, 0, 0.
Your code writes the bytes that represent the character '4' in UTF-32 followed by the bytes that represent the character '2' in UTF-32. This means it writes the bytes 52, 0, 0, 0, 50, 0, 0, 0, because each character is four bytes when encoded in UTF-32.
Also having a hex editor for debugging could be useful for you, then you could see the bytes that your program is outputting and not just the size.
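Without a hex editor, the two byte sequences can also be compared directly in Python. A quick sketch:

```python
import struct

as_text = "42".encode("utf_32_le")  # what the question's code produces
as_int = struct.pack("<i", 42)      # the integer 42 as 4 little-endian bytes

print(list(as_text))
print(list(as_int))
```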
In my question Write binary string in binary file Python 3.4, I did it like this:
file.write(bytes(chr(int(mgcnmbr)), 'iso8859-1'))

How is the conversion in the python struct module done?

I need to unpack information in python from a C Structure,
doing it by the following code:
struct.unpack_from('>I', file.read(4))[0]
and afterwards, writing changed values back:
new_value = struct.pack('>I', 008200)
file.write(new_value)
a few examples:
008200 returns a SyntaxError: invalid token.
000010 is written into: 8
000017 is written into: 15
000017 returns a syntaxerror.
I have no idea what kind of conversion that is.
Any kind of help would be great.
This is invalid Python code and is not related to the struct module. In Python 2, integer literals starting with a zero are octal (base 8), so Python tries to read 008200 as octal, but '8' is not a valid octal digit. (Python 3 rejects leading zeros in decimal literals altogether and writes octal as 0o17.) Assuming you wanted decimal, use 8200. If you wanted hex, use 0x8200.
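A short sketch of the alternatives, assuming Python 3. If the value arrives as text with leading zeros, int() parses it as decimal without complaint:

```python
import struct

print(int("008200"))   # decimal text with leading zeros parses fine
print(0o17)            # explicit octal literal, equal to 15
print(struct.pack(">I", 8200).hex())
```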
