Python - write long string of bits inside a binary file

I have a string of ~75 000 bits (a very long string).
I would like to create a binary file that is represented by this sequence of bits.
I wrote the following code:
byte_array = bytearray(global_bits_str.encode())
with open('file1.bin', 'wb') as f:
    f.write(byte_array)
But when I check file1.bin, I can see that it is 75 000 bytes long instead of 75 000 bits. I guess each '0'/'1' character was encoded as ASCII (1 byte per bit) in the file.
Any suggestions?

You can use the int builtin to convert your binary string into a sequence of integers, then pass that to bytearray.
Example for one byte:
>>> int('10101010', 2)
170
>>> bytearray([170])
bytearray(b'\xaa')
Splitting the string:
chunks = [bit_string[n:n+8] for n in range(0, len(bit_string), 8)]
You'll have to special-case the last chunk, since it may not be a full byte. str.ljust handles this: it left-justifies the chunk and pads it on the right with zeros.
Putting it together:
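def to_bytes(bits, size=8, pad='0'):
    # Split the bit string into chunks of `size` characters.
    chunks = [bits[n:n+size] for n in range(0, len(bits), size)]
    if pad and chunks:
        # Right-pad a short final chunk to a full byte.
        chunks[-1] = chunks[-1].ljust(size, pad)
    return bytearray([int(c, 2) for c in chunks])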
# Usage:
byte_array = to_bytes(global_bits_str)
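For reference, here is a self-contained write-and-read-back round trip. It is a sketch under a few assumptions: the bit string is packed in one go with int.to_bytes (a swapped-in alternative to the chunked loop above), the last byte is zero-padded on the right as in to_bytes, and the original bit count is known when reading back. The file name bits_demo.bin and the helper names are hypothetical.

```python
import os
import tempfile

def pack_bits(bits: str) -> bytes:
    # Right-pad to a whole number of bytes, like the ljust step above.
    padded = bits.ljust((len(bits) + 7) // 8 * 8, "0")
    if not padded:
        return b""
    return int(padded, 2).to_bytes(len(padded) // 8, "big")

def unpack_bits(data: bytes, nbits: int) -> str:
    # Expand every byte to 8 binary digits, then drop the padding bits.
    return "".join(f"{b:08b}" for b in data)[:nbits]

bits = "1011001110001"  # 13 bits -> 2 bytes on disk
path = os.path.join(tempfile.gettempdir(), "bits_demo.bin")
with open(path, "wb") as f:
    f.write(pack_bits(bits))
with open(path, "rb") as f:
    restored = unpack_bits(f.read(), len(bits))
print(os.path.getsize(path), restored == bits)  # 2 True
```

For the question's ~75 000 bits this writes 9 375 bytes.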

Related

Python string with byte array to byte array

ciphertext = base64.b64decode(xxxxxx)  # output is b'148,240,50,66,81,26,240,2,101,31'
bytearray(ciphertext)  # output is bytearray(b'148,240,50,66,81,26,240,2,101,31')
What I am looking for is the output of bytearray([148,240,50,66,81,26,240,2,101,31]).
Full code:
import base64
ciphertext = base64.b64decode("MTQ4LDI0MCw1MCw2Niw4MSwyNiwyNDAsMiwxMDEsMzEsMjM3LDEwMSw4OCwxODQsMTQsMTM1LDEzMCw0Miw0NywxODksMTkyLDE1MSw0OCwyMjQsMTU1LDQxLDM5LDE0MywyMDksMTA0LDE5NywyMywxMDUsMjMsMTYzLDUzLDQsMTQ0LDE2MSwxNDgsMjMwLDI1NCwxMzQsMjEzLDE3NCwyNDcsMTkxLDUyLDY0LDE2LDYzLDk0LDE1NCwxMzMsMzksMTMzLDIyNCwxODcsMTE0LDE1OCwyMzksMzUsMTUxLDM4LDE3NSwxNTIsOTksMTAyLDIxNCwyNTEsMTk0LDIxMywxNzMsMTc0LDcyLDIyNSwyMDIsMTcyLDE1NCw4OCwxMzksMTE1LDIzNywyMzYsMTIxLDAsMjE0LDIxNiwxOTYsNDAsMzgsMjA0LDgzLDEzNiwxNjAsMTczLDY5LDcsMzgsMjI1LDExOCw0OSw0OCw3MCwxNjYsMTIxLDI0NSwxOTEsMTgzLDEyMiwxOTksMTg3LDgsNDMsNDUsOTMsMTI0LDIxNSwxNjEsNzAsMjU0LDI2LDE4OCwxMywyMjYsMTMxLDMsNCw0MywxOTgsMjEyLDEwMywxMTcsMjE1LDEyNywyNDMsMzksNzIsNzYsMTE0LDUwLDE5Niw1NSwxMjEsODYsMjUxLDUzLDI0MiwzMCwxMDksNDcsMjEwLDI1MywxNjMsOTAsOTgsMTQsNjAsMTE1LDc1LDE0OSwyMTAsMTc1LDI2LDEyNCwyMjgsMjQ3LDIwLDIwMyw5NiwyMTAsMjYsODEsNjUsMTg4LDEyMSwxMjgsOTEsMTA3LDE2OCwxMywyMDcsMTc1LDE3MCwyNTUsMjM2LDE0OSwxMDksNTksMjQsMTcyLDExLDU4LDEzLDAsMTUyLDExNiwxMTAsMTExLDIyLDIzMSwzLDIzNyw0Miw4MSw3Nyw2MywyMjMsMTAzLDEwOSw1NiwxNTgsNDMsMjA2LDIwMiwzOCwxNDgsMTM3LDE4OSwyMTQsMjE2LDkwLDE4LDIyNCwyNTQsMzcsMTA5LDE4LDg0LDIyMiwyMDksMjUsNTMsMjE5LDE2OSwyMTEsNTAsMTgyLDQwLDExMiwyMDksMzEsNTIsMjEsNTMsOTgsMTIyLDI1NCwxMDgsMzksMzgsMTM0LDE1MCwxMzksMTk0LDMw=")
Replace:
bytearray(ciphertext)
with:
bytearray(map(int, ciphertext.split(b',')))
# Or if you prefer genexprs:
bytearray(int(x) for x in ciphertext.split(b','))
The original call just copies the raw bytes (the ASCII text of the numbers, commas included) into an equivalent bytearray; the replacement splits them on commas and parses each component as an int.
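A runnable version of the fix, assuming a short made-up payload in place of the long base64 string from the question (the payload is built with b64encode here only so the example is self-contained):

```python
import base64

# Hypothetical sample: the text "148,240,50,66", base64-encoded.
encoded = base64.b64encode(b"148,240,50,66")

ciphertext = base64.b64decode(encoded)  # b'148,240,50,66' (ASCII text)
as_text = bytearray(ciphertext)         # still the ASCII characters
as_numbers = bytearray(map(int, ciphertext.split(b",")))

print(len(as_text), len(as_numbers))  # 13 4
print(list(as_numbers))               # [148, 240, 50, 66]
```

Note that bytearray raises ValueError if any parsed number falls outside 0-255.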

Loss of size of numbers when processing in hex format

I ran into a problem while processing hex numbers.
When I run str(hex(...)) on numbers read from a file, the leading zeros after 0x disappear.
Input:
0x0000000000000000000000000000000000000000000000000000000000000001f01f80f12f7cf16638f7a8074d46fe2f421a73432b1441a01ed3dd883c68acad
0x00000000000000000000000000000000000000000000000000000000000000029a799033fc54073346f870c15c9836f6b2e9eccdb85f09d29a8ddc90dc3a8ef1
0x00000000000000000000000000000000000000000000000000000000000000033e561483073e429ec25c09c99de2a81d5a34a539ad2dbb688af6b6f5f24936a4
Output:
0x1f01f80f12f7cf16638f7a8074d46fe2f421a73432b1441a01ed3dd883c68acad
0x29a799033fc54073346f870c15c9836f6b2e9eccdb85f09d29a8ddc90dc3a8ef1
0x33e561483073e429ec25c09c99de2a81d5a34a539ad2dbb688af6b6f5f24936a4
Code:
with open("data.txt", "r") as file:
    for line in file:
        L = int(line, 0)
        R = str(hex(L))
        print(R)
What needs to be fixed in the code? I need all numbers to have the same width, with no leading zeros lost.
Use string formatting. In the format spec #0130x:
- # means put 0x on the front for hex numbers.
- 0130 means the field is 130 characters wide (including the 0x); the leading zero means pad with zeros.
- x means hexadecimal (lowercase a-f).
line = '0x0000000000000000000000000000000000000000000000000000000000000001f01f80f12f7cf16638f7a8074d46fe2f421a73432b1441a01ed3dd883c68acad'
print(line) # as read from file
integer = int(line, 0)
formatted = f'{integer:#0130x}'
print(formatted)
print(formatted == line) # check that original and re-formatted are the same
Output:
0x0000000000000000000000000000000000000000000000000000000000000001f01f80f12f7cf16638f7a8074d46fe2f421a73432b1441a01ed3dd883c68acad
0x0000000000000000000000000000000000000000000000000000000000000001f01f80f12f7cf16638f7a8074d46fe2f421a73432b1441a01ed3dd883c68acad
True
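Rather than hard-coding 130, the width can be computed from the number of hex digits you expect. A minimal sketch, assuming every value should have 128 hex digits after the 0x as in the sample data; normalize_hex is a hypothetical helper name, and in the question's loop you would call print(normalize_hex(line)) for each line:

```python
def normalize_hex(line: str, digits: int = 128) -> str:
    """Re-format a hex string to a fixed number of digits after '0x'."""
    value = int(line, 0)  # int() ignores the trailing newline read from a file
    # '#' adds '0x', '0' pads with zeros, and the width includes the 2 prefix chars.
    return f"{value:#0{digits + 2}x}"

print(normalize_hex("0x1f\n", digits=8))  # 0x0000001f
```

Values wider than the requested width are not truncated; they simply come out longer.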

Best way to output a binary sequence in Python

I have a binary sequence, for example 10010111010101. I need to write this sequence to a file and read it back later, but I want it to be as compact as possible. What is the easiest way to do this?
I have tried taking every 8 bits (one byte) of the sequence together and writing out the byte value, then splitting it back bit by bit when I read it later. Is there an easier way, or a module that does this already?
If you need a textual encoding for binary data, the usual choices are base64 or Ascii85. Ascii85 is slightly more compact: 5 characters per 4 bytes, versus 4 characters per 3 bytes for base64.
ASCII85
import base64
import sys
# Length of the data in bytes (32 bytes holds a binary string of up to 256 digits)
# Keep it as low as possible to save space
length = 32
binary_string = input('Enter binary string : ')
integer = int(binary_string, 2)  # parse the digits safely instead of using eval
data = integer.to_bytes(length, sys.byteorder, signed=False)
print(base64.a85encode(data).decode('utf-8'))
Base64
import base64
import sys
# Length of the data in bytes (32 bytes holds a binary string of up to 256 digits)
# Keep it as low as possible to save space
length = 32
binary_string = input('Enter binary string : ')
integer = int(binary_string, 2)  # parse the digits safely instead of using eval
data = integer.to_bytes(length, sys.byteorder, signed=False)
print(base64.b64encode(data).decode('utf-8'))
WARNING: sys.byteorder is the host's native byte order (little-endian on most machines), so a file written on one platform may decode differently on another. Passing an explicit 'big' or 'little' avoids this.
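Both snippets also lose information: after padding to a fixed 32 bytes, the decoded integer no longer tells you how many leading zeros the original string had, and inputs longer than 256 bits raise OverflowError. A minimal sketch that avoids both, assuming you store the original bit count alongside the encoded text (how you store it is up to you). The function names are hypothetical, and it uses an explicit big-endian order instead of sys.byteorder.

```python
import base64

def encode_bits(bits: str) -> str:
    """Pack a '0'/'1' string into as few bytes as possible, then Ascii85-encode."""
    if not bits:
        return ""
    nbytes = (len(bits) + 7) // 8
    data = int(bits, 2).to_bytes(nbytes, "big")  # explicit, platform-independent order
    return base64.a85encode(data).decode("ascii")

def decode_bits(text: str, nbits: int) -> str:
    """Reverse encode_bits; nbits is the original length, stored separately."""
    if nbits == 0:
        return ""
    value = int.from_bytes(base64.a85decode(text), "big")
    return format(value, f"0{nbits}b")  # pad back to the original width

bits = "0010010111010101"
text = encode_bits(bits)
print(decode_bits(text, len(bits)) == bits)  # True
```

If the output only needs to be a file rather than text, writing the packed bytes directly is more compact still.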

How to append a list of Hex to one Hex number

I have a list of hex byte strings like this:
['0xe1', '0xd7', '0x7', '0x0']
(as read from a binary file)
I want to reverse the list and concatenate the items to create one hex number:
['0x07D7E1']
How do I format the list to this format?
Concatenate your hex numbers into one string:
'0x' + ''.join([format(int(c, 16), '02X') for c in reversed(inputlist)])
This does include the 00 byte explicitly in the output:
>>> inputlist = ['0xe1', '0xd7', '0x7', '0x0']
>>> '0x' + ''.join([format(int(c, 16), '02X') for c in reversed(inputlist)])
'0x0007D7E1'
However, I'd look into reading your binary file format more directly, for example using struct to unpack bytes from the file into proper integers in the right byte order:
>>> import struct
>>> data = bytes(int(c, 16) for c in inputlist)
>>> value = struct.unpack('<I', data)[0]
>>> print(hex(value))
0x7d7e1
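On Python 3, int.from_bytes can do the same job without struct, and it is not limited to 4 bytes. A small sketch, assuming the list holds the bytes in file order with the least significant byte first (little-endian), as in the question:

```python
inputlist = ['0xe1', '0xd7', '0x7', '0x0']

raw = bytes(int(c, 16) for c in inputlist)  # b'\xe1\xd7\x07\x00'
value = int.from_bytes(raw, "little")       # least significant byte first

print(hex(value))                                 # 0x7d7e1
print("0x" + format(value, f"0{2 * len(raw)}X"))  # 0x0007D7E1
```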

Converting from hex to binary without losing leading 0's in Python

I have a hex value in a string like
h = '00112233aabbccddee'
I know I can convert this to binary with:
h = bin(int(h, 16))[2:]
However, this loses the leading 0's. Is there any way to do this conversion without losing the 0's? Or is the best way just to count the number of leading 0's before the conversion and add them back afterwards?
I don't think there is a way to keep those leading zeros by default.
Each hex digit translates to 4 binary digits, so the length of the new string should be exactly 4 times the size of the original.
h_size = len(h) * 4
Then, you can use .zfill to fill in zeros to the size you want:
h = bin(int(h, 16))[2:].zfill(h_size)
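The same fixed width can be passed straight to format(), which skips the separate bin(), slice and zfill steps. A minimal sketch, where hex_to_bin is a hypothetical helper name:

```python
def hex_to_bin(h: str) -> str:
    # Each hex digit is exactly 4 bits, so pad the result to len(h) * 4 digits.
    return format(int(h, 16), f"0{len(h) * 4}b")

h = '00112233aabbccddee'
print(hex_to_bin(h)[:16])  # 0000000000010001
```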
This is actually quite easy in Python, since it doesn't have any limit on the size of integers. Simply prepend a '1' to the hex string, and strip the corresponding '1' from the output.
>>> h = '00112233aabbccddee'
>>> bin(int(h, 16))[2:] # old way
'1000100100010001100111010101010111011110011001101110111101110'
>>> bin(int('1'+h, 16))[3:] # new way
'000000000001000100100010001100111010101010111011110011001101110111101110'
Basically the same idea, but padding each hex digit to 4 binary digits:
''.join(bin(int(c, 16))[2:].zfill(4) for c in h)
A Python newbie such as myself would proceed like this:
datastring = 'HexInFormOfString'
Prepend 'ffff' so that any leading zeros survive when Python converts the string to an integer:
datastrPadded = 'ffff' + datastring
Convert the padded value to binary:
databin = bin(int(datastrPadded, 16))
Remove the 2 characters ('0b') that Python adds to denote binary, plus the 16 padding bits:
databinCrop = databin[18:]
This converts a hex string into a byte string (one character per pair of hex digits). Since you want the length to depend on the original, this may be what you want.
data = ""
while len(h) > 0:
    data = data + chr(int(h[0:2], 16))
    h = h[2:]
print(data)
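On Python 3, the built-in bytes.fromhex does this pairwise conversion directly and returns a bytes object rather than a str of characters. A short sketch, assuming h has an even number of hex digits (fromhex raises ValueError otherwise):

```python
h = '00112233aabbccddee'
data = bytes.fromhex(h)  # one byte per pair of hex digits
print(len(data))         # 9
print(list(data[:4]))    # [0, 17, 34, 51]
print(data.hex() == h)   # True
```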
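I needed an integer as input and pure hex/bin strings as output, without the '0b' and '0x' prefixes, so my general solution is this:
NO_OF_BITS = 8  # assumed default width; the original answer does not give its value
def pure_bin(data, no_of_bits=NO_OF_BITS):
    # Adding 2**no_of_bits sets a leading 1 bit so zeros are kept; [3:] strips '0b1'.
    data = data + 2**(no_of_bits)
    return bin(data)[3:]
def pure_hex(data, no_of_bits=NO_OF_BITS):
    # Round the bit count up to a multiple of 4 (one hex digit per 4 bits).
    if (no_of_bits % 4) != 0:
        no_of_bits = 4 * (no_of_bits // 4) + 4
    data = data + 2**(no_of_bits)
    return hex(data)[3:]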
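For comparison, format() with a computed width gives the same prefix-free strings without the add-a-power-of-two trick. A sketch, assuming data is non-negative and fits in no_of_bits bits (the same assumption pure_bin and pure_hex make); the _fmt names are hypothetical:

```python
def pure_bin_fmt(data: int, no_of_bits: int) -> str:
    # Zero-pad to exactly no_of_bits binary digits, no '0b' prefix.
    return format(data, f"0{no_of_bits}b")

def pure_hex_fmt(data: int, no_of_bits: int) -> str:
    # Round the bit count up to whole hex digits (4 bits each), no '0x' prefix.
    return format(data, f"0{(no_of_bits + 3) // 4}x")

print(pure_bin_fmt(5, 8), pure_hex_fmt(5, 9))  # 00000101 005
```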
hexa = '91278c4bfb3cbb95ffddc668d995bfe0'
binary = bin(int(hexa, 16))[2:]
print(binary)
hexa_dec = hex(int(binary, 2))[2:]
print(hexa_dec)
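This round trip only preserves the input because hexa starts with '9', which has no leading zero bits; an input such as '0f...' would come back shorter. A sketch of a width-preserving round trip, assuming the binary string's length is always a multiple of 4 (true when it came from hex); the helper names are hypothetical:

```python
def hex_to_bin(h: str) -> str:
    return format(int(h, 16), f"0{len(h) * 4}b")  # 4 bits per hex digit

def bin_to_hex(b: str) -> str:
    return format(int(b, 2), f"0{len(b) // 4}x")  # 1 hex digit per 4 bits

hexa = '0f278c4bfb3cbb95ffddc668d995bfe0'
binary = hex_to_bin(hexa)
print(bin_to_hex(binary) == hexa)  # True
```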
