I use Python 2.7 on Windows.
I want to convert the string 'A123456' to the bytes b'\x0A\x12\x34\x56' and then concatenate it with other bytes (b'\xBB') to get b'\xbb\x0A\x12\x34\x56'.
That is, I want to obtain b'\xbb\x0A\x12\x34\x56' from the string 'A123456' and b'\xBB'.
This isn't too hard to do with binascii.unhexlify; the only problem is that you need to zero-pad your string when it doesn't have an even number of nibbles (unhexlify won't accept a length-7 string).
So first off, it's probably best to make a quick utility function that does that, because doing it efficiently isn't super obvious, and you want a self-documenting name:
def zeropad_even(s):
    # Adding one, then stripping low bit leaves even values unchanged, and rounds
    # up odd values to next even value
    return s.zfill(len(s) + 1 & ~1)
Now all you have to do is use that to fix up your string before unhexlifying it:
>>> from binascii import unhexlify
>>> unhexlify(zeropad_even('A123456'))
'\n\x124V'
>>> _ == b'\x0A\x12\x34\x56'
True
I included that last test just to show that you got the expected result. The repr of str tries to use printable ASCII or short escapes where available, so only the \x12 actually ends up in the repr: \x0A becomes \n, \x34 is 4, and \x56 is V, but those are all equivalent ways to spell the same bytes.
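The question also asks for the result to be joined with b'\xBB', which the session above stops short of. A minimal sketch of that last step, assuming the same zeropad_even helper (the names prefix, payload and combined are only illustrative):

```python
from binascii import unhexlify

def zeropad_even(s):
    # Round the length up to the next even number and left-pad with '0'
    return s.zfill(len(s) + 1 & ~1)

prefix = b'\xBB'
payload = unhexlify(zeropad_even('A123456'))

# Byte strings concatenate with +, just like text strings
combined = prefix + payload
print(combined == b'\xbb\x0A\x12\x34\x56')  # True
```

The comparison is used instead of printing the value because the repr differs between Python 2 and Python 3, while the bytes are the same.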
I'm using the following code to pack an integer into an unsigned short as follows,
raw_data = 40
# Pack into little endian
data_packed = struct.pack('<H', raw_data)
Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.
def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res
print(get_bin_str(data_packed))
Unfortunately, it throws the following error,
result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'
How do I properly decode the little-endian bytes to binary data?
Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.
>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex() # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)
unpack returns a tuple, so index it to get the single value
>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40
To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and the 0b needs to be removed if not desired.
>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)
Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object. To see what's in it, just print it:
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'
What's that? Well, the character (, which is decimal 40 in the ascii table, followed by a null byte. If you had used a number that does not map to a printable ascii character, you'd see something less surprising:
>>> struct.pack("<H", 11)
b'\x0b\x00'
Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: From low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
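To make the byte order visible, it helps to pack a value whose two bytes differ. This is a small sketch using an arbitrary value, 0x1234, chosen only for illustration:

```python
import struct

value = 0x1234

# '<H' is little-endian unsigned short, '>H' is big-endian unsigned short
little = struct.pack('<H', value)
big = struct.pack('>H', value)

print(little.hex())  # 3412 (least significant byte 0x34 comes first)
print(big.hex())     # 1234 (most significant byte 0x12 comes first)
```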
Anyway, you can also look at the bytes directly:
>>> print(data_packed[0])
40
Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:
>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'
The two high bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
What's wrong with your unpacking code?
Just for fun let's see what your sequence of transformations in get_bin_str was doing.
>>> binascii.hexlify(data_packed)
b'2800'
Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:
>>> b'2800'.decode("utf-16-le")
'㠲〰'
In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.
>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40
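If the goal is still a get_bin_str function, one possible rewrite skips both the hex step and the text decoding. This is only a sketch: it uses int.from_bytes (Python 3) rather than struct.unpack, and the width parameter is an assumption added for illustration.

```python
import struct

def get_bin_str(data, width=16):
    # Interpret the raw bytes as a single little-endian unsigned integer
    value = int.from_bytes(data, byteorder='little')
    # Zero-pad the binary digits to the requested width, without a 0b prefix
    return format(value, '0{}b'.format(width))

data_packed = struct.pack('<H', 40)
print(get_bin_str(data_packed))  # 0000000000101000
```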
-------------------------- add new-----------------------------
Let me fill more info here:
The actual situation is that I have this LONG STRING in environment A, and I need to copy it to environment B.
UNFORTUNATELY, environments A and B are not connected (no mutual access), so I'm looking for a way to encode and decode the string in a shorter form; otherwise, for more files, I have to type the string by hand, which is slow and not reproducible.
Any suggestions or tool recommendations?
Many thanks!
I'm facing a weird problem: encoding a SUPER LONG binary string into a simpler form, such as a few digits.
Say there is a long string consisting of only 1s and 0s, e.g. "110...011", with a length of 1,000 to 100,000 digits or even more, and I would like to encode this STRING as something that has fewer digits/chars. Then I need to convert it back to the original STRING.
Currently I am trying the hex / int methods in Python to 'compress' this string and 'decompress' it back to its original form.
An example would be:
1. input string: '110011110110011'
'''
def Bi_to_Hex_Int(input_str, method):
    # 2 to 16
    if method == 'hex':
        string = str(input_str)
        input_two = string
        result = hex(int(input_two, 2))
    # 2 to 10
    if method == 'int':
        string = str(input_str)
        input_two = string
        result = int(input_two, 2)
    print("input_bi length", len(str(input_two)), "\n output hex length", len(str(result)), '\n method: {}'.format(method))
    return result

res_16 = Bi_to_Hex_Int(gene, 'hex')
# == '0x67b3'
res_10 = Bi_to_Hex_Int(gene, 'int')
# == 26547
'''
Then I can reverse it back:
'''
def HexInt_to_bi(input_str, method):
    if method == 'hex':
        back_two = bin(int(input_str, 16))
        back_two = back_two[2:]
    if method == 'int':
        back_two = bin(int(input_str))
        back_two = back_two[2:]
    print("input_hex length", len(str(input_str)), "\n output bi length", len(str(back_two)))
    return back_two

hexback_two = HexInt_to_bi(res_16, 'hex')
intback_two = HexInt_to_bi(res_10, 'int')
'''
BUT this has a problem: I tried a string of around 500 digits, 101010...0001 (500 digits), and the best 'compressed' result is around 127 characters using hex.
So is there a better way to 'compress' the string further, to fewer digits?
**Say, compress a 5,000-digit string consisting of 1s and 0s to something like 50 or 100 digits/chars (or even fewer)?**
To keep it simple: 1 hex character encodes 4 binary characters (2 ^ 4 = 16). The compression ratio you want is about 50 to 100 times. For 50 times, you need 50 binary characters to be compressed into 1 character, which means you require 2 ^ 50 different characters to encode every combination. That is quite a lot.
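The same arithmetic can be turned around to ask how many output characters an arbitrary bit string needs for a given alphabet size. A rough sketch, where min_chars is a hypothetical helper name and the bound only applies to input with no exploitable pattern; the float logarithm is exact for the power-of-two sizes used here:

```python
import math

def min_chars(num_bits, alphabet_size):
    # Each output character can carry at most log2(alphabet_size) bits
    bits_per_char = math.log2(alphabet_size)
    return math.ceil(num_bits / bits_per_char)

print(min_chars(5000, 16))  # 1250 characters in hex
print(min_chars(5000, 64))  # 834 characters in a 64-symbol alphabet
```

Getting 5,000 arbitrary bits down to 100 characters would need 50 bits per character, which is where the 2 ^ 50 figure comes from.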
If you accept a lower ratio, you may try base64. Its compression ratio is 6 to 1.
Otherwise you have to come up with some complex algorithm like splitting your string into blocks, looking for similar amongst them, encoding them with different symbols, building a map of those symbols, etc.
It is probably easier to compress your string with an archiver, then return a base64 representation of the result.
If the task allows, you may store the whole strings somewhere and give them short unique names, so instead of compressing and decompressing you store and retrieve strings by name.
This probably doesn't produce the absolutely shortest string you can get, but it's trivially easy using the facilities built into Python. There is no need to convert the characters into a binary format first; zlib compression handles an input with only 2 different characters well on its own.
Encoding:
import zlib
import base64
result = base64.b64encode(zlib.compress(input_str.encode()))
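Only the encoding direction is shown above. A sketch of the full round trip follows, with encode_bits and decode_bits as hypothetical helper names and a repetitive test string chosen purely for illustration; how much shorter the result gets depends on how much structure the input has.

```python
import base64
import zlib

def encode_bits(input_str):
    # Compress the text, then make the bytes safe to copy and paste
    compressed = zlib.compress(input_str.encode('ascii'))
    return base64.b64encode(compressed).decode('ascii')

def decode_bits(encoded):
    # Reverse the two steps in the opposite order
    compressed = base64.b64decode(encoded)
    return zlib.decompress(compressed).decode('ascii')

original = '110011110110011' * 100
encoded = encode_bits(original)
print(decode_bits(encoded) == original)  # True
print(len(original), len(encoded))
```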
If the counts of 0s and 1s are significantly different, then you can use enumerative coding to get the shortest representation.
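As a rough illustration of the idea, one simple form of enumerative coding stores the length n, the number of ones k, and the string's lexicographic rank among all n-digit strings with exactly k ones. The function names are hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb

def enum_encode(bits):
    n, k = len(bits), bits.count('1')
    rank, ones_left = 0, k
    for i, bit in enumerate(bits):
        if bit == '1':
            # Count the strings that share this prefix but have '0' here
            rank += comb(n - i - 1, ones_left)
            ones_left -= 1
    return n, k, rank

def enum_decode(n, k, rank):
    out, ones_left = [], k
    for i in range(n):
        # Number of valid strings that would have '0' at this position
        zero_count = comb(n - i - 1, ones_left)
        if rank >= zero_count:
            out.append('1')
            rank -= zero_count
            ones_left -= 1
        else:
            out.append('0')
    return ''.join(out)

print(enum_encode('0110'))    # (4, 2, 2)
print(enum_decode(4, 2, 2))   # 0110
```

The rank is always below comb(n, k), which is far smaller than 2 ** n when one digit dominates, so it needs fewer bits to write down than the original string.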
If the string consists only of 0 and 1 digits, then you can pack eight digits into one byte. You will also need to keep track of how many digits there are past the last multiple of eight, since the last byte may be representing fewer than eight digits.
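A sketch of that packing scheme, assuming the original length is stored alongside the bytes so the padding can be dropped again (pack_bits and unpack_bits are hypothetical names):

```python
def pack_bits(bit_str):
    # Pad on the right to a multiple of eight and remember the true length
    n = len(bit_str)
    padded = bit_str + '0' * (-n % 8)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return n, data

def unpack_bits(n, data):
    # Expand every byte to eight digits, then cut off the padding
    bits = ''.join(format(byte, '08b') for byte in data)
    return bits[:n]

n, data = pack_bits('110011110110011')
print(n, data.hex())         # 15 cf66
print(unpack_bits(n, data))  # 110011110110011
```

Unlike converting through int(), this keeps leading zeros in the bit string, because the length is carried separately.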
The output value does not include the leading 0s. Can someone help me fix the problem?
def bitwiseOR(P, Q):
    return bin(P | Q)
bitwiseOR(0b01010111, 0b00111000)
OUTPUT: '0b1111111'
The leading zeroes are just for representation, so you can use the Format Specification Mini-Language to display them as you wish:
Format string:
#          Includes the 0b prefix
0{length}  Pads with leading zeroes so the total length is length
def bitwiseOR(P, Q, length=10):
    return format(P | Q, f'#0{length}b')
x = bitwiseOR(0b01010111, 0b00111000)
# 0b01111111
print(x)
Leading zeros are a property of the string you produce, not the number. So, for example, if you're looking for a way to make the following two calls produce different results, that's not possible [1]:
bitwiseOR(0b01010111, 0b00111000)
bitwiseOR( 0b1010111, 0b111000)
However, if you can provide the number of digits separately, then you can do this using the format() function. It accepts a second argument which lets you customize how the number is printed out using the format spec. Based on that spec, you can print a number padded with zeros to a given width like this:
>>> format(127, '#010b')
'0b01111111'
Here the code consists of four pieces:
# means apply the 0b prefix at the beginning
0 means pad with leading zeros
10 means the total length of the resulting string should be at least 10 characters
b means to print the number in binary
You can tweak the format code to produce your desired string length, or even take the length from a variable.
[1] Well... technically there is a way to make Python re-read its own source code and possibly produce different results that way, but that's not useful in any real program; it's only useful if you want to learn something about how the Python interpreter works.
x = 64
var_in_bin_format = bin(64)
print(var_in_bin_format)
#Output
#0b1000000
#Desired Output -- > should always be in 8 bit format
#0b01000000
def call_another_api(var_in_bin_format):
    pass
In Python, I need to call an API that expects its parameter to always be in 8-bit format, regardless of the value of the decimal number.
I am not that good at bit manipulation, so I am wondering if there is something I can do here.
How can I do this? I cannot use the format() function as it will convert the value into a string representation and the API that I am calling will alert me that it is not in the correct format.
Even though you say that you can't use format() because it returns a string, I'm going to post this because that's also what bin() does. bin(x) is equivalent to format(x, '#b'). I'd guess that you haven't added the '#', which means you won't have '0b' leading the value.
The Python 3 documentation for bin() actually gives a pretty strong hint about how you might do this, using format instead.
If you know that the value passed will not be negative, you can use the format string '#010b':
format(x, '#010b')
Breaking this down:
'b' means that the number will be a string binary representation.
'10' means that the entire string will be 10 characters long, two for '0b' and 8 for the value.
'0' makes it pad with '0' instead of ' '.
'#' will add the '0b' prefix, as done by bin().
Note that this assumes that the number is an integer in the range [0, 255]. Integers outside this range will generate valid representations, but will not match the format expected, and may have a leading '-'. Objects of type float can not be converted with the 'b' format code. I assume that these are not problems, given what your intended output is, but it might be a good idea to add an explicit check to throw a ValueError if the value is less than 0 or greater than 255.
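The explicit check suggested above could look like the following sketch; to_8bit_binary is a hypothetical helper name, and the wording of the error message is an arbitrary choice.

```python
def to_8bit_binary(x):
    # Reject anything that does not fit in eight unsigned bits
    if not 0 <= x <= 255:
        raise ValueError('value out of range for 8 bits: {!r}'.format(x))
    # '#010b' gives the 0b prefix plus eight zero-padded binary digits
    return format(x, '#010b')

print(to_8bit_binary(64))  # 0b01000000
```

A float inside the range still fails, because format() itself raises ValueError for the 'b' code on floats.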
If you're in Python 3.6+, you could also use f-strings:
f'{x:#010b}'
It is not possible to convert all decimal numbers to 8 bits. You can only represent numbers from 0 to 255 in 8 bits.
Say I have 2 hex values that were taken in from keyboard input.
E.g. val1 = 0x000000 and val2 = 0xFF0000
and I do,
print(hex(val1))
print(hex(val2))
I get the right output for val2 but val1 is just 0x0, how do I get it to print the whole value?
By whole value, I mean, if the inputted value is 0x000000, I want to output the value as 0x000000 and not 0x0.
Use the format() built-in function to show the hex representation to the expected level of precision:
>>> format(53, '08x')
'00000035'
>>> format(1234567, '08x')
'0012d687'
The format code 08x means "eight lowercase hex digits padded with leading zeros".
You can pad the hex string to the specified number of characters by using the ljust method.
Whenever the string is shorter than the specified width, it appends the specified filler character on the right to extend it.
In your example, hex(0x0).ljust(8, '0') == "0x000000".
Strings that are already long enough are preserved, so 0xFF0000 will still work.
print(hex(0x000000).ljust(8, '0')) # Prints 0x000000
print(hex(0xFF0000).ljust(8, '0')) # Prints 0xff0000 (hex() returns lowercase digits)
A couple of important things to note that have bitten me in the past:
Make sure your width includes the length of the leading "0x"
This is because the ljust method operates on raw text and doesn't realize it's a hex string
If you give a width value shorter than you need, the strings won't be filled up enough and will have different lengths.
In other words, len(hex(0x0).ljust(4, '0')) != len(hex(0xFF0000).ljust(4, '0')), because you need a width of 8 characters to fit both cases
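One more caveat is worth checking before relying on this: ljust adds the filler on the right, so for a short non-zero value the appended zeros change the number being shown. The sketch below compares it with format() and the '#08x' spec, which pads on the left after the 0x prefix; the three test values are chosen only for illustration.

```python
for value in (0x0, 0x1, 0xFF0000):
    right_padded = hex(value).ljust(8, '0')  # zeros appended after the digits
    left_padded = format(value, '#08x')      # zeros inserted after the 0x prefix
    print(right_padded, left_padded)

# 0x000000 0x000000
# 0x100000 0x000001
# 0xff0000 0xff0000
```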
You say the user typed this input in on the keyboard. That means you already started with the strings you want, but you parsed integers out of the input and threw away the strings.
Don't do that. Keep the strings. Parse integers too if you need to do math, but keep the strings. Then you can just do
print(original_input)
without having to go through a reconstruction process and guess at how many leading zeros were originally input.
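A small sketch of that approach, where read_hex is a hypothetical helper and a literal string stands in for the keyboard input:

```python
def read_hex(text):
    # Keep the text exactly as typed and parse a number from it as well
    text = text.strip()
    return text, int(text, 16)

original_input, value = read_hex('0x000000')
print(original_input)  # 0x000000
print(value)           # 0
```

The integer is available for arithmetic, and the original text is available for display with its leading zeros intact.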