Best way to output a binary sequence in Python

I have a binary sequence, for example: 10010111010101. I need to write this sequence to a file and read it back later, and I want the file to be as compact as possible. What is the easiest way to do this?
I have tried grouping every 8 bits (one byte) of the sequence and writing out each byte value; when I read the file later, I split each byte back into bits. Is there an easier way, or a module that does this out of the box?

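The best textual encodings for binary data are base64 and Ascii85. Ascii85 is slightly denser: it uses 5 characters per 4 bytes, while base64 uses 4 characters per 3 bytes.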
ASCII85
import base64
binary_string = input('Enter binary string : ')
# Parse the digits as a base-2 integer (safer than eval)
integer = int(binary_string, 2)
# Smallest number of bytes that holds every bit (8 bits per byte, rounded up)
length = (len(binary_string) + 7) // 8
# Fixed 'big' byte order instead of sys.byteorder, so the bytes decode the same on any machine
data = integer.to_bytes(length, 'big', signed=False)
print(base64.a85encode(data).decode('ascii'))
Base64
import base64
binary_string = input('Enter binary string : ')
# Parse the digits as a base-2 integer (safer than eval)
integer = int(binary_string, 2)
# Smallest number of bytes that holds every bit (8 bits per byte, rounded up)
length = (len(binary_string) + 7) // 8
# Fixed 'big' byte order instead of sys.byteorder, so the bytes decode the same on any machine
data = integer.to_bytes(length, 'big', signed=False)
print(base64.b64encode(data).decode('ascii'))
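WARNING: sys.byteorder is the native byte order of whatever machine runs the script. On most current hardware it is little-endian. Bytes written on one machine can therefore decode to a different number on another. To avoid this, pass a fixed order such as 'big' to both int.to_bytes() and int.from_bytes().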
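Below is a possible round trip for the snippets above. It is a sketch under two assumptions:

- The data is packed with a fixed 'big' byte order.
- The original number of bits is known when decoding. Here it is passed as the hypothetical nbits argument, because converting the string to an integer discards leading zeros.

The helper names are illustrative, not from any library.

```python
import base64

def bits_to_a85(bits):
    """Pack a string of '0'/'1' characters into bytes and Ascii85-encode them."""
    nbytes = (len(bits) + 7) // 8  # 8 bits per byte, rounded up
    data = int(bits, 2).to_bytes(nbytes, 'big')
    return base64.a85encode(data).decode('ascii')

def a85_to_bits(text, nbits):
    """Reverse bits_to_a85; nbits restores any leading zeros lost by int()."""
    data = base64.a85decode(text.encode('ascii'))
    return format(int.from_bytes(data, 'big'), '0{}b'.format(nbits))

bits = '10010111010101'
encoded = bits_to_a85(bits)
print(encoded)
print(a85_to_bits(encoded, len(bits)) == bits)  # True
```

Swap a85encode/a85decode for b64encode/b64decode to get the base64 variant. Both textual forms are larger than the raw bytes: Ascii85 by about 25% and base64 by about 33%.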

Related

Python string with byte array to byte array

ciphertext = base64.b64decode(xxxxxx)  # output is b'148,240,50,66,81,26,240,2,101,31'
bytearray(ciphertext)  # output is bytearray(b'148,240,50,66,81,26,240,2,101,31')
What I am looking for is the output of bytearray([148,240,50,66,81,26,240,2,101,31]).
Full code:
ciphertext = base64.b64decode("MTQ4LDI0MCw1MCw2Niw4MSwyNiwyNDAsMiwxMDEsMzEsMjM3LDEwMSw4OCwxODQsMTQsMTM1LDEzMCw0Miw0NywxODksMTkyLDE1MSw0OCwyMjQsMTU1LDQxLDM5LDE0MywyMDksMTA0LDE5NywyMywxMDUsMjMsMTYzLDUzLDQsMTQ0LDE2MSwxNDgsMjMwLDI1NCwxMzQsMjEzLDE3NCwyNDcsMTkxLDUyLDY0LDE2LDYzLDk0LDE1NCwxMzMsMzksMTMzLDIyNCwxODcsMTE0LDE1OCwyMzksMzUsMTUxLDM4LDE3NSwxNTIsOTksMTAyLDIxNCwyNTEsMTk0LDIxMywxNzMsMTc0LDcyLDIyNSwyMDIsMTcyLDE1NCw4OCwxMzksMTE1LDIzNywyMzYsMTIxLDAsMjE0LDIxNiwxOTYsNDAsMzgsMjA0LDgzLDEzNiwxNjAsMTczLDY5LDcsMzgsMjI1LDExOCw0OSw0OCw3MCwxNjYsMTIxLDI0NSwxOTEsMTgzLDEyMiwxOTksMTg3LDgsNDMsNDUsOTMsMTI0LDIxNSwxNjEsNzAsMjU0LDI2LDE4OCwxMywyMjYsMTMxLDMsNCw0MywxOTgsMjEyLDEwMywxMTcsMjE1LDEyNywyNDMsMzksNzIsNzYsMTE0LDUwLDE5Niw1NSwxMjEsODYsMjUxLDUzLDI0MiwzMCwxMDksNDcsMjEwLDI1MywxNjMsOTAsOTgsMTQsNjAsMTE1LDc1LDE0OSwyMTAsMTc1LDI2LDEyNCwyMjgsMjQ3LDIwLDIwMyw5NiwyMTAsMjYsODEsNjUsMTg4LDEyMSwxMjgsOTEsMTA3LDE2OCwxMywyMDcsMTc1LDE3MCwyNTUsMjM2LDE0OSwxMDksNTksMjQsMTcyLDExLDU4LDEzLDAsMTUyLDExNiwxMTAsMTExLDIyLDIzMSwzLDIzNyw0Miw4MSw3Nyw2MywyMjMsMTAzLDEwOSw1NiwxNTgsNDMsMjA2LDIwMiwzOCwxNDgsMTM3LDE4OSwyMTQsMjE2LDkwLDE4LDIyNCwyNTQsMzcsMTA5LDE4LDg0LDIyMiwyMDksMjUsNTMsMjE5LDE2OSwyMTEsNTAsMTgyLDQwLDExMiwyMDksMzEsNTIsMjEsNTMsOTgsMTIyLDI1NCwxMDgsMzksMzgsMTM0LDE1MCwxMzksMTk0LDMw=")
Replace:
bytearray(ciphertext)
with:
bytearray(map(int, ciphertext.split(b',')))
# Or if you prefer genexprs:
bytearray(int(x) for x in ciphertext.split(b','))
Your original bytearray(ciphertext) just copies the raw bytes into an equivalent bytearray. Those bytes are the ASCII text b'148,240,...'. The replacement instead splits the bytes on commas and parses each piece as an int.
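Here is a self-contained check of the fix. It uses a shortened, hypothetical payload that the script base64-encodes itself, rather than the full string from the question:

```python
import base64

# Hypothetical payload: comma-separated decimal text like the question's, but shorter
encoded = base64.b64encode(b'148,240,50,66')
ciphertext = base64.b64decode(encoded)  # b'148,240,50,66' (ASCII digits and commas)

# Split on commas and parse each piece; int() accepts ASCII bytes such as b'148'
raw = bytearray(map(int, ciphertext.split(b',')))
print(list(raw))  # [148, 240, 50, 66]
```

Each value must lie in the range 0 to 255, otherwise bytearray raises ValueError.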

How do I decode the hex from a specific part of a BSC transaction receipt, using web3 py?

I am writing a Python script using the web3 package.
The process:
I have a transaction and read its transaction receipt:
txn_receipt = w3.eth.getTransactionReceipt('0x8ddd5ab8f53df7365a2feb8ee249ca2d317edcdcb6f40faae728a3cb946b4eb1')
Just for this example, I read one specific entry of the logs. This returns a hex string.
x = txn_receipt['logs'][4]['data']
PROBLEM:
How do I decode this hex string? On BscScan you can see the decoded values I am expecting at block 453.
Expected values:
amount0In: 2369737542851785768252
amount1In: 0
amount0Out: 0
amount1Out: 82650726831815053455
See here:
https://bscscan.com/tx/0x8ddd5ab8f53df7365a2feb8ee249ca2d317edcdcb6f40faae728a3cb946b4eb1#eventlog
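Assuming you don't need the key names, you can do this with plain Python (no need for the web3 library).
The data field of a BSC log holds the event's non-indexed parameters, ABI-encoded as consecutive 32-byte words and written in hex. For unsigned integer values like these, each 64-character word is just the base-16 representation of one number. In your example: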
0x00000000000000000000000000000000000000000000008076b6fbd0ebb5bd3c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000047b025e26b62ed08f
Strip the 0x prefix, split the rest into 64-character chunks, and convert each chunk with Python's int() function using base 16:
hexdata = '0x00000000000000000000000000000000000000000000008076b6fbd0ebb5bd3c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000047b025e26b62ed08f'
# Trim '0x' from the beginning of the string
hexdata_trimmed = hexdata[2:]
# Split the trimmed string every 64 characters (one 32-byte ABI word each)
n = 64
data_split = [hexdata_trimmed[i:i+n] for i in range(0, len(hexdata_trimmed), n)]
# Convert each hex word to its decimal value
data = [int(word, 16) for word in data_split]
print(data)
# returns [2369737542851785768252, 0, 0, 82650726831815053455]
Sources:
https://www.binaryhexconverter.com/hex-to-decimal-converter
https://www.w3schools.com/python/ref_func_int.asp
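The same steps can be wrapped in a reusable helper. In this sketch, the helper:

- accepts the data with or without the 0x prefix,
- rejects input that is not a whole number of 64-character words,
- pairs the results with the field names from the question.

The function name and the sample values are hypothetical. The sample is built with format(value, '064x'), so the expected result is known in advance.

```python
def decode_uint256_words(hexdata):
    """Split ABI-encoded log data into 32-byte words and parse each as an unsigned int."""
    body = hexdata[2:] if hexdata.startswith('0x') else hexdata
    if len(body) % 64:
        raise ValueError('data is not a whole number of 32-byte words')
    return [int(body[i:i + 64], 16) for i in range(0, len(body), 64)]

# Field order taken from the expected values in the question
names = ['amount0In', 'amount1In', 'amount0Out', 'amount1Out']

# Hypothetical sample: four words built from known integers
sample = '0x' + ''.join(format(v, '064x') for v in [5, 0, 0, 255])
decoded = dict(zip(names, decode_uint256_words(sample)))
print(decoded)  # {'amount0In': 5, 'amount1In': 0, 'amount0Out': 0, 'amount1Out': 255}
```

With the receipt from the question, the call would be decode_uint256_words(x), assuming x is a str. Depending on your web3 version, it may instead be a bytes-like object that needs converting to hex text first.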

Python - write long string of bits inside a binary file

I have a string of ~75,000 bits (a very long string).
I would like to create a binary file that is represented by this sequence of bits.
I wrote the following code:
byte_array = bytearray(global_bits_str.encode())
with open('file1.bin', 'wb') as f:
    f.write(byte_array)
But when I check file1.bin, I can see that it is 75,000 bytes instead of 75,000 bits. I guess it has been encoded as ASCII (1 byte per bit) in the file.
Any suggestions?
You can use the int builtin to convert your binary string into a sequence of integers, then pass that to bytearray.
Example for one byte:
>>> int('10101010', 2)
170
>>> bytearray([170])
bytearray(b'\xaa')
Splitting the string:
chunks = [bit_string[n:n+8] for n in range(0, len(bit_string), 8)]
You'll have to special-case the last chunk, since it may not be a full byte. str.ljust handles this by padding it with zeros on the right.
Putting it together:
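def to_bytes(bits, size=8, pad='0'):
    # Split the bit string into chunks of `size` characters
    chunks = [bits[n:n+size] for n in range(0, len(bits), size)]
    # Pad the (possibly short) last chunk with zeros on the right
    if pad and chunks:
        chunks[-1] = chunks[-1].ljust(size, pad)
    # Parse each chunk as a base-2 integer and collect the byte values
    return bytearray([int(c, 2) for c in chunks])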
# Usage:
byte_array = to_bytes(global_bits_str)
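The following sketch finishes the round trip: it writes the packed bytes to file1.bin and reads them back. It assumes the original bit count is known when reading (for example, stored elsewhere), because the zero padding in the last byte is indistinguishable from real data. The helper names are hypothetical. The file lives in a temporary directory, so the example cleans up after itself.

```python
import os
import tempfile

def bits_to_bytes(bits):
    """Pack a '0'/'1' string into bytes, zero-padding the last byte on the right."""
    padded = bits.ljust((len(bits) + 7) // 8 * 8, '0')
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

def bytes_to_bits(data, nbits):
    """Unpack bytes into a '0'/'1' string and drop the padding bits at the end."""
    return ''.join(format(b, '08b') for b in data)[:nbits]

bits = '1011001110'  # 10 bits -> 2 bytes on disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file1.bin')
    with open(path, 'wb') as f:
        f.write(bits_to_bytes(bits))
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        restored = bytes_to_bits(f.read(), len(bits))

print(size, restored == bits)  # 2 True
```

For the 75,000-bit string in the question, the file would be 75,000 / 8 = 9,375 bytes.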

ascii txt file to binary bin file

I have a txt file with a stream of hex data, and I would like to convert it to binary format in order to save disk space.
This is my simple script to test the decoding and the binary storage:
hexstr = "12ab"
of = open('outputfile.bin','wb')
for i in hexstr:
    # this is how I convert an ASCII char to a 7-bit representation
    x = '{0:07b}'.format(ord(i))
    of.write(x)
of.close()
I expect outputfile.bin to have a size of 28 bits; instead, the result is 28 bytes.
I guess the problem is that x is a string and not a bit sequence.
What should I do?
Thanks in advance
First of all, you will not get a file size that is not a multiple of 8 bits on any popular platform.
Second, you really need to brush up on what "binary" actually means. You are confusing two different concepts: representing a number in the binary number system, and writing out data in a non-human-readable form.
Actually, you are confusing two even more fundamental concepts: data and the representation of data. "12ab" is one representation of four bytes in memory, as is "\x31\x32\x61\x62".
Your problem is that your loop writes 28 bytes of data in total (7 characters for each input character). Those bytes can be represented either as "0110001011001011000011100010" or as "\x30\x31\x31\x30\x30...\x30\x30\x31\x30".
Maybe this will help you:
>>> hexstr = "12ab"
>>> len(hexstr)
4
>>> ['"%s": %x' % (c, ord(c)) for c in hexstr]
['"1": 31', '"2": 32', '"a": 61', '"b": 62']
>>> i = 42
>>> hex(i)
'0x2a'
>>> x = '{0:07b}'.format(i)
>>> x
'0101010'
>>> [hex(ord(c)) for c in x]
['0x30', '0x31', '0x30', '0x31', '0x30', '0x31', '0x30']
>>> hex(ord('0')), hex(ord('1'))
('0x30', '0x31')
>>> import binascii
>>> [hex(b) for b in bytearray(binascii.unhexlify(hexstr))]
['0x12', '0xab']
That said, the binascii module has a function you can use:
import binascii
data = binascii.unhexlify(hexstr)
with open('outputfile.bin', 'wb') as f:
    f.write(data)
This packs two hex digits into each 8-bit byte, so "12ab" takes 2 bytes (16 bits). That is already less than the 28 bits the 7-bit-per-character scheme would need, and packing characters into 7-bit units is rarely worth the effort for compression anyway.
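Applied to the original goal of converting a text file of hex digits, a sketch could look like the one below. It assumes the file contains only hex digits plus whitespace, with an even number of digits; otherwise binascii.unhexlify raises binascii.Error. The file names and helper names are hypothetical.

```python
import binascii
import os
import tempfile

def hex_text_to_bin(src, dst):
    """Read a text file of hex digits and write the decoded raw bytes to dst."""
    with open(src, 'r') as f:
        hexstr = ''.join(f.read().split())  # drop newlines and spaces
    with open(dst, 'wb') as f:
        f.write(binascii.unhexlify(hexstr))

def bin_to_hex_text(path):
    """Read the binary file back and return it as a lowercase hex string."""
    with open(path, 'rb') as f:
        return binascii.hexlify(f.read()).decode('ascii')

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.txt')  # hypothetical file names
    dst = os.path.join(tmp, 'outputfile.bin')
    with open(src, 'wb') as f:
        f.write(b'12ab\n34cd\n')  # 10 bytes of hex text
    hex_text_to_bin(src, dst)
    text_size, bin_size = os.path.getsize(src), os.path.getsize(dst)
    restored = bin_to_hex_text(dst)

print(text_size, bin_size, restored)  # 10 4 12ab34cd
```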
Is this what you want? "12ab" should be written as \x01\x02\x0a\x0b, right?
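import struct
hexstr = "12ab"
# 'wb': struct.pack returns bytes, which must be written to a binary-mode file
with open('outputfile.bin', 'wb') as of:
    for i in hexstr:
        # one byte per hex digit: '1' -> b'\x01', 'a' -> b'\x0a'
        of.write(struct.pack('B', int(i, 16)))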
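The two answers interpret the hex digits differently, and that changes the file size. Here is a quick comparison, using bytes(...) as a stand-in for the struct.pack loop:

```python
import binascii

hexstr = '12ab'
# One byte per hex digit, like the struct.pack loop
per_digit = bytes(int(c, 16) for c in hexstr)
# Two hex digits per byte, like binascii.unhexlify
per_pair = binascii.unhexlify(hexstr)
print(per_digit, len(per_digit))  # b'\x01\x02\n\x0b' 4
print(per_pair, len(per_pair))    # b'\x12\xab' 2
```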

Fixed-digit base64 encode and decode in Python

I'm trying to encode and decode a base64 string. It works fine normally, but if I try to restrict the hash to 6 digits, I get an error on decoding:
from base64 import b64encode
from base64 import b64decode
s="something"
base 64 encode/decode:
# Encode:
hash = b64encode(s)
# Decode:
dehash = b64decode(hash)
print dehash
(works)
6-digit base 64 encode/decode:
# Encode:
hash = b64encode(s)[:6]
# Decode:
dehash = b64decode(hash)
print dehash
TypeError: Incorrect padding
What am I doing wrong?
UPDATE:
Based on Mark's answer, I added padding to the 6-digit hash to make its length divisible by 4:
hash += "=="
But now the decoded result is "some".
UPDATE 2
Wow, that was stupid...
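Base64 by definition requires padding on the input if it does not decode into a whole number of bytes on the output. Every 4 base64 characters get turned into 3 bytes. Your input length does not divide evenly by 4, hence the error. Padding fixes the error but cannot restore what the slice removed. Six characters carry only 36 bits, so at most 4 whole bytes ("some") can be recovered, and the rest of the string is gone.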
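Here is a small Python 3 sketch of the length rule. The hypothetical pad_b64 helper appends the '=' characters that make the length a multiple of 4. The truncated case shows why padding cannot bring back the sliced-off data.

```python
import base64

def pad_b64(s):
    """Append '=' until len(s) is a multiple of 4, as b64decode expects."""
    return s + '=' * (-len(s) % 4)

full = base64.b64encode(b'something').decode('ascii')
print(full)                              # c29tZXRoaW5n (12 chars for 9 bytes)
print(base64.b64decode(full))            # b'something'

short = full[:6]                         # 6 chars = 36 bits
print(base64.b64decode(pad_b64(short)))  # b'some' (4 whole bytes; the rest is lost)
```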
Wikipedia has a good description of the specifics of Base64.
