I am using the struct.pack method, which takes a variable number of arguments. I want to convert a string to bytes. If the string is short (e.g. 'name'), I can do it like this:
bytes = struct.pack('4c','n','a','m','e')
But what should I do when my string is 80 characters long?
I have tried the format string 's' instead of '80c' for struct.pack, but the result is not the same as that of the call above.
Use "80s", not just "s": without a count, "s" means a string of a single byte. The input is a single string rather than a series of characters, i.e.
bytes = struct.pack('4s','name')
Note that if you specify a length greater than that of the input, the output will be null-padded.
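As a rough Python 3 sketch of the padding behaviour (assuming the text is ASCII and is encoded to bytes first, since Python 3's struct.pack only accepts bytes for 's'; the variable names are illustrative):

```python
import struct

data = "name".encode("ascii")  # Python 3: 's' needs bytes, not str

exact = struct.pack("4s", data)      # length matches the input
padded = struct.pack("8s", data)     # longer than the input: null-padded
truncated = struct.pack("2s", data)  # shorter than the input: truncated

print(exact, padded, truncated)
```

Note that a count smaller than the input silently truncates, so the count should normally be derived from len(data).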
That doesn't make much sense. Strings are already bytes in Python 2.x, so you could just do:
my_string = 'I am some big string'
my_bytes = my_string
In Python 3, strings are Unicode objects by default. To get bytes, you have to encode the string.
my_bytes = my_string.encode('utf-8')
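A small illustration for Python 3, assuming UTF-8 is the encoding you want; the non-ASCII sample text is made up for the example. Note that the byte count can exceed the character count:

```python
my_string = "I am some big string"
my_bytes = my_string.encode("utf-8")

accented = "café"  # hypothetical non-ASCII sample
accented_bytes = accented.encode("utf-8")

print(type(my_bytes), len(my_string), len(my_bytes))
print(len(accented), len(accented_bytes))  # 'é' takes two bytes in UTF-8
print(accented_bytes.decode("utf-8") == accented)
```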
If you really want to use struct.pack, you can use the * syntax described in the tutorial:
my_bytes = struct.pack('20c', *my_string)
or
my_bytes = struct.pack('20s', my_string)
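In Python 3 the two calls need slightly different inputs, because iterating over bytes yields integers rather than one-byte strings. A hedged sketch (the generator that rebuilds one-byte objects is my own addition, not part of the answer above):

```python
import struct

my_string = "I am some big string"
data = my_string.encode("utf-8")
n = len(data)  # 20 for this sample

# 'c' wants one bytes object of length 1 per argument
as_chars = struct.pack(f"{n}c", *(bytes([b]) for b in data))

# 's' takes the whole bytes object at once
as_string = struct.pack(f"{n}s", data)

print(as_chars == as_string == data)
```

Both calls simply reproduce the encoded bytes, which is why the plain encode() call is usually enough.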
I'm trying to remove certain characters from a bytearray (specifically, certain control characters that are messing up my formatting).
I listed the individual translations manually and it worked, but can I express this as a single translate call?
In the string variant, the input can be a dictionary table. But when I tried that here, I got an error saying that the parameter must be a bytearray object.
translation_table_0A = bytes.maketrans(b"\x0A", b"\x00")
translation_table_0B = bytes.maketrans(b"\x0B", b"\x00")
translation_table_0C = bytes.maketrans(b"\x0C", b"\x00")
translation_table_0D = bytes.maketrans(b"\x0D", b"\x00")
translation_table_04 = bytes.maketrans(b"\x04", b"\x00")
test_bytes = bytearray(b"\x75\x66\x73\x62\x0D\x73\x62\x0B\x00\x74\xF1\x74\x73\x62\x61\x76\x00\x0C\x76\x02\x04\x01\x62\x68\x72\x74\x00\x00\x00\x0A\x01\x00")
out_list = test_bytes.translate(translation_table_0A) # remove \x0A
out_list = out_list.translate(translation_table_0B) # remove \x0B
out_list = out_list.translate(translation_table_0C) # remove \x0C
out_list = out_list.translate(translation_table_0D) # remove \x0D
out_list = out_list.translate(translation_table_04) # remove \x04
print(f"Output coded: {out_list}")
print(f"Output decoded: {out_list.decode('mac-roman')}")
I would think it would work like this:
translate_dict = {b"\x0A" : b"\x00", b"\x0B" : b"\x00", b"\x0C" : b"\x00", b"\x0D" : b"\x00", b"\x04" : b"\x00", }
out_list = test_bytes.translate(translate_dict) # remove Control Chars
But it doesn't. Does anyone know how to get this working?
Unfortunately the documentation is lacking in details:
bytes
bytes maketrans()
bytes methods
bytes translate()
With the maketrans method a table can be generated, but 'from' and 'to' must be bytes-like objects, so tuples, lists, or dictionaries won't work.
Note: I'm not interested in regex solutions or other libraries; I'm looking specifically for a solution based on translate.
If you want a bytes translation table, you get a 256-byte mapping that is indexed by the source byte value and returns the byte stored at that index. You don't have to set up five different translation tables to translate five byte values; you can do it like this:
>>> translation_table = bytes.maketrans(b"\x0A\x0B\x0C\x0D\x04", b"\x00\x00\x00\x00\x00")
That will let you change the unwanted byte values to \x00 like this:
>>> test_bytes=bytearray(b"\x75\x66\x73\x62\x0D\x73\x62\x0B\x00\x74\xF1\x74\x73\x62\x61\x76\x00\x0C\x76\x02\x04\x01\x62\x68\x72\x74\x00\x00\x00\x0A\x01\x00")
>>> test_bytes.translate(translation_table)
bytearray(b'ufsb\x00sb\x00\x00t\xf1tsbav\x00\x00v\x02\x00\x01bhrt\x00\x00\x00\x00\x01\x00')
which does not look exactly like test_bytes with five byte values changed, because the default representation of a printable character in a bytestring is the printable character and not the hex escape. You can see this if you ask for test_bytes back:
>>> test_bytes
bytearray(b'ufsb\rsb\x0b\x00t\xf1tsbav\x00\x0cv\x02\x04\x01bhrt\x00\x00\x00\n\x01\x00')
Here sequences such as tsbav and bhrt appear as printable characters and not as hex escapes. But it is only the representation that differs.
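One way to check that only the representation differs is to compare the two spellings directly, or to print the bytes with hex(), which never substitutes printable characters:

```python
escaped = b"\x74\x73\x62\x61\x76"  # written with hex escapes
printable = b"tsbav"               # same bytes, written as printable characters

print(escaped == printable)
print(escaped)        # repr falls back to the printable form
print(escaped.hex())  # hex digits only
```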
If you are working with bytes, you can't use a dictionary as a translation table. In Python 3, where strings are Unicode, a 256-byte mapping table won't work, because there are 1,114,112 possible code points that the table might need to translate. So for strings, translate() uses a dict instead. A dict lookup is efficient, but it can't match a 256-byte character map. So bytes.maketrans() makes a 256-byte character map, but str.maketrans() makes a dict, and each translate() method expects the corresponding kind of translation table.
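To make the difference concrete, here is a sketch comparing the two kinds of table. The helper table_from_dict is hypothetical, written only to show how a mapping of byte values could be turned into the 256-byte table that translate() expects for bytes. The delete argument shown at the end is an alternative if the goal is to drop the bytes rather than replace them.

```python
def table_from_dict(mapping):
    """Hypothetical helper: build a bytes translation table from {int: int}."""
    src = bytes(mapping.keys())
    dst = bytes(mapping.values())
    return bytes.maketrans(src, dst)

byte_table = table_from_dict({0x0A: 0x00, 0x0B: 0x00, 0x0C: 0x00, 0x0D: 0x00, 0x04: 0x00})
str_table = str.maketrans({"\n": "\x00"})

print(type(byte_table), len(byte_table))  # a bytes object of length 256
print(type(str_table))                    # a dict keyed by code point

sample = bytearray(b"ab\x0Dcd\x0Aef\x04")
replaced = sample.translate(byte_table)
removed = sample.translate(None, b"\x0A\x0B\x0C\x0D\x04")  # delete instead of replace
print(replaced, removed)
```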
I have the following problem in Python.
I have the value 0x402d4a in hex and would like to convert it to bytes, so I use .to_bytes(3, 'little'), which gives me b'J-@' if I print it. I am aware that this is just a representation of the bytes, but I later need to turn it into a string for the output. Using str() gives me b'J-@', but I need \x4a\x2d\x40. How can I convert the bytes object to a string so that I get the raw binary data as a string?
My code is as follows:
addr = '0x402d4a'  # kept as a string so that int(addr, 16) can parse it
addr = int(addr,16)
addr = str(addr.to_bytes(3,'little'))
print(addr)
and my expected output is
\x4a\x2d\x40
Thanks in advance
There is no direct way to get \x4a\x2d and so forth from a string, or from bytes for that matter.
What you should do:
1. Convert the int to bytes (you've done this, good).
2. Loop over the bytes, using an f-string to format each one as a hexadecimal value with the "\\x" prefix.
3. join() them.
Steps 2 and 3 can be folded nicely into one generator expression, e.g.:
rslt = "".join(
f"\\x{b:02x}" for b in value_as_bytes
)
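Put together with the conversion step, a complete sketch might look like this (the names value and value_as_bytes are illustrative):

```python
value = 0x402d4a
value_as_bytes = value.to_bytes(3, "little")  # the bytes 4a 2d 40

# Format every byte as a two-digit hex number behind a literal backslash-x
rslt = "".join(f"\\x{b:02x}" for b in value_as_bytes)
print(rslt)
```

The result is an ordinary 12-character string of backslashes, letters and digits, not three raw bytes.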
Given random bytes (i.e. not only numbers or characters!), I need to convert them to a string and then back to the initial bytes without losing information. This seems like a basic task, but I ran into the following problems:
Assuming:
rnd_bytes = b'w\x12\x96\xb8'
len(rnd_bytes)
prints: 4
Now, converting it to a string. Note: I need to set backslashreplace, as otherwise it raises a 'UnicodeDecodeError', or would lose information if I set the error handler to another value.
my_str = rnd_bytes.decode('utf-8' , 'backslashreplace')
Now, I have the string.
I want to convert it back to exactly the original bytes (size 4!):
According to the Python resources and this answer, there are different possibilities:
conv_bytes = bytes(my_str, 'utf-8')
conv_bytes = my_str.encode('utf-8')
But len(conv_bytes) returns 10.
I tried to analyse the outcome:
>>> repr(rnd_bytes)
"b'w\\x12\\x96\\xb8'"
>>> repr(my_str)
"'w\\x12\\\\x96\\\\xb8'"
>>> repr(conv_bytes)
"b'w\\x12\\\\x96\\\\xb8'"
It would make sense to replace '\\\\'. my_str.replace('\\\\','\\') doesn't change anything, probably because four backslashes in a literal represent only two. So my_str.replace('\\','\') would find the '\\\\', but leads to
SyntaxError: EOL while scanning string literal
due to the last argument '\'. This has been discussed here, where the following suggestion came up:
>>> my_str2=my_str.encode('utf_8').decode('unicode_escape')
>>> repr(my_str2)
"'w\\x12\\x96¸'"
This replaces the '\\\\' but seems to add or change some other characters:
>>> conv_bytes2 = my_str2.encode('utf8')
>>> len(conv_bytes2)
6
>>> repr(conv_bytes2)
"b'w\\x12\\xc2\\x96\\xc2\\xb8'"
There must be a proper way to convert (complex) bytes to a string and back. How can I achieve that?
Note: some of this code was found on the Internet.
You could try to convert it to hex format. Then it is easy to convert it back to byte format.
Sample code to convert bytes to string:
hex_str = rnd_bytes.hex()
Here is what 'hex_str' looks like:
'771296b8'
And code for converting it back to bytes:
new_rnd_bytes = bytes.fromhex(hex_str)
The result is:
b'w\x12\x96\xb8'
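A quick round-trip check of the hex approach, run over every possible byte value rather than just the four-byte sample, as a sanity test that no information is lost:

```python
rnd_bytes = b"w\x12\x96\xb8"
hex_str = rnd_bytes.hex()
restored = bytes.fromhex(hex_str)
print(hex_str, restored == rnd_bytes)

# Every byte value from 0 to 255 survives the same round trip
all_bytes = bytes(range(256))
print(bytes.fromhex(all_bytes.hex()) == all_bytes)
```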
For processing you can use:
readable_str = ''.join(chr(int(hex_str[i:i+2], 16)) for i in range(0, len(hex_str), 2))
But never try to encode the readable string. Here is what the readable string looks like:
'w\x12\x96¸'
After processing the readable string, convert it back to hex format before converting it back to a bytes string. Each character must become exactly two hex digits, so values below 0x10 need zero-padding:
hex_str = ''.join(f'{ord(c):02x}' for c in readable_str)
"Now, converting it to a string. Note: I need to set backslashreplace, as otherwise it raises a 'UnicodeDecodeError', or would lose information if I set the error handler to another value."
The UTF-8 encoding cannot interpret every possible sequence of bytes as a string. Using backslashreplace gives you a string that preserves the information for bytes that couldn't be converted:
>>> rnd_bytes = b'w\x12\x96\xb8'
>>> rnd_bytes.decode('utf-8', 'backslashreplace')
'w\x12\\x96\\xb8'
but that representation is not very useful for converting back.
Instead, use an encoding that does interpret every possible sequence of bytes as a string. The most straightforward of these is ISO-8859-1, which simply maps each byte one at a time to the first 256 Unicode code points respectively.
>>> rnd_bytes.decode('iso-8859-1')
'w\x12\x96¸'
>>> rnd_bytes.decode('iso-8859-1').encode('iso-8859-1') == rnd_bytes
True
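The same check can be run over all 256 byte values. The second half shows the 'surrogateescape' error handler, which is not mentioned above but is another standard way to round-trip arbitrary bytes while still decoding valid UTF-8 normally; treat it as an alternative sketch rather than part of the answer.

```python
all_bytes = bytes(range(256))

# ISO-8859-1 (latin-1): each byte maps to the code point with the same number
as_text = all_bytes.decode("iso-8859-1")
print(len(as_text), as_text.encode("iso-8859-1") == all_bytes)

# Alternative: UTF-8 with surrogateescape on both decode and encode
rnd_bytes = b"w\x12\x96\xb8"
escaped = rnd_bytes.decode("utf-8", "surrogateescape")
print(len(escaped), escaped.encode("utf-8", "surrogateescape") == rnd_bytes)
```

With surrogateescape the undecodable bytes become lone surrogates, so the resulting string cannot be printed or encoded without the same error handler.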
I have a string like "Some characters \x00\x80\x34 and then some other characters". How can I convert the regular characters to their hex equivalent, while converting \x00 to the actual 00 hex value?
binascii.hexlify() considers '\', 'x', '0', '0' as actual characters.
Later edit:
The string itself is produced by another function. When I print it, it actually prints "\x00".
As I understand it, you are trying to convert only the characters that are not hex values to hex. It would help if you gave a sample input string that you are trying to convert to hex.
You can also convert to hex values using just the built-in encode and decode methods. That should take care of what you are trying to do. The following three lines are what I ran in a terminal on my machine (Python 2), and they gave the output you are expecting:
aStr = "Some characters \x00\x80\x34 and then some other characters"
aStr.encode("hex")
aStr.encode("hex").decode("hex")
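The 'hex' codec on str only exists in Python 2. In Python 3 a comparable round trip can be sketched with the codecs module, assuming the text is first encoded to bytes (latin-1 here, so that each character below U+0100 becomes exactly one byte):

```python
import codecs

a_str = "Some characters \x00\x80\x34 and then some other characters"
data = a_str.encode("latin-1")

hexed = codecs.encode(data, "hex")      # bytes of ASCII hex digits
restored = codecs.decode(hexed, "hex")  # back to the original bytes

print(hexed)
print(restored.decode("latin-1") == a_str)
```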
It's unclear what you're asking, since binascii.hexlify should work:
>>> import binascii
>>> s = "\x00\x80\x34"
>>> binascii.hexlify(s)
'008034'
>>> s = "foobar \x00\x80\x34 foobar"
>>> binascii.hexlify(s)
'666f6f6261722000803420666f6f626172'
foobar = 666f6f626172, space = 20
↳ https://docs.python.org/3/library/binascii.html
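For Python 3, where hexlify needs bytes, a sketch might look like the following. The second part addresses the case from the later edit, where the string contains a literal backslash, 'x', '0', '0'. Decoding with 'unicode_escape' is my assumption about what the asker needs, not something confirmed above.

```python
import binascii

s = b"foobar \x00\x80\x34 foobar"
print(binascii.hexlify(s))

# A string holding the escape sequences as literal text (12 characters)
literal = r"\x00\x80\x34"

# Interpret the escapes, then map each resulting character back to one byte
real = literal.encode("ascii").decode("unicode_escape").encode("latin-1")
print(binascii.hexlify(real))
```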
I implemented a simple file seek and read in Python:
>>> f = open("<filepath>", "rb")
>>> f.seek(0x20) #offset 0x20
>>> byte = f.read(4) #4 byte space
I ended up with
>>> byte
'\xe0\x00\x00\x00'
which is the expected result, but I need to use it as a hex value without escapes for further calculations.
How can I convert such a string into an unescaped hex value? (In the above example '\xe0\x00\x00\x00' should transform into 'e0000000' or '0xe0000000'.)
Use encode('hex'):
>>> byte.encode('hex')
'e0000000'
# convert it to int
>>> int(byte.encode('hex'), 16)
3758096384
You could use byte.encode('hex') to get the hex value.
You could use the struct module to unpack the bytes into a number and then format that the way you wish.
import struct
print '{:08x}'.format(struct.unpack('>I', byte)[0])
Output:
e0000000
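In Python 3, where f.read() in binary mode returns bytes, the same results can be obtained without the 'hex' codec. A sketch using the sample value from the question in place of an actual file read:

```python
import struct

byte = b"\xe0\x00\x00\x00"  # stands in for f.read(4)

hex_text = byte.hex()                      # hex digits without escapes
as_int = int.from_bytes(byte, "big")       # big-endian integer value
via_struct = struct.unpack(">I", byte)[0]  # same value through struct

print(hex_text, hex(as_int), as_int == via_struct)
```

If the file stores the value in little-endian order, "little" and '<I' would be the matching choices; which one applies depends on the file format.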