How do I write a string of bytes to a file, in byte mode, using python?
I have:
['0x28', '0x0', '0x0', '0x0']
How do I write 0x28, 0x0, 0x0, 0x0 to a file? I don't know how to transform this string to a valid byte and write it.
Map to a bytearray() or bytes() object, then write that to the file:
with open(outputfilename, 'wb') as output:
    output.write(bytearray(int(i, 16) for i in yoursequence))
Another option is to use the binascii.unhexlify() function to turn your hex strings into a bytes value:
from binascii import unhexlify
with open(outputfilename, 'wb') as output:
    output.write(unhexlify(''.join(format(i[2:], '>02s') for i in yoursequence)))
Here we have to chop off the 0x part first, then reformat the value to pad it with zeros and join the whole into one string.
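On Python 3, bytes.fromhex() is another way to do the same conversion. This is only a minimal sketch, assuming every item is a 0x-prefixed hex string for a single byte; the helper name hex_strings_to_bytes is made up for illustration:

```python
def hex_strings_to_bytes(items):
    """Convert ['0x28', '0x0', ...] into a bytes object (hypothetical helper)."""
    # Drop the '0x' prefix and left-pad each value to two hex digits.
    digits = ''.join(item[2:].zfill(2) for item in items)
    return bytes.fromhex(digits)


data = hex_strings_to_bytes(['0x28', '0x0', '0x0', '0x0'])
print(data)  # b'(\x00\x00\x00'
```

The result can be passed straight to output.write() on a file opened in 'wb' mode.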
In Python 3.X, bytes() will turn an integer sequence into a bytes sequence:
>>> bytes([1,65,2,255])
b'\x01A\x02\xff'
A generator expression can be used to convert your sequence into integers (note that int(x,0) converts a string to an integer according to its prefix; 0x selects hex):
>>> list(int(x,0) for x in ['0x28','0x0','0x0','0x0'])
[40, 0, 0, 0]
Combining them:
>>> bytes(int(x,0) for x in ['0x28','0x0','0x0','0x0'])
b'(\x00\x00\x00'
And writing them out:
>>> L = ['0x28','0x0','0x0','0x0']
>>> with open('out.dat','wb') as f:
...     f.write(bytes(int(x,0) for x in L))
...
4
Related
ciphertext = base64.b64decode(xxxxxx)  # output is b'148,240,50,66,81,26,240,2,101,31'
bytearray(ciphertext)  # output is bytearray(b'148,240,50,66,81,26,240,2,101,31')
What I am looking for is the output of bytearray([148,240,50,66,81,26,240,2,101,31])
Full code:
import base64
ciphertext = base64.b64decode("MTQ4LDI0MCw1MCw2Niw4MSwyNiwyNDAsMiwxMDEsMzEsMjM3LDEwMSw4OCwxODQsMTQsMTM1LDEzMCw0Miw0NywxODksMTkyLDE1MSw0OCwyMjQsMTU1LDQxLDM5LDE0MywyMDksMTA0LDE5NywyMywxMDUsMjMsMTYzLDUzLDQsMTQ0LDE2MSwxNDgsMjMwLDI1NCwxMzQsMjEzLDE3NCwyNDcsMTkxLDUyLDY0LDE2LDYzLDk0LDE1NCwxMzMsMzksMTMzLDIyNCwxODcsMTE0LDE1OCwyMzksMzUsMTUxLDM4LDE3NSwxNTIsOTksMTAyLDIxNCwyNTEsMTk0LDIxMywxNzMsMTc0LDcyLDIyNSwyMDIsMTcyLDE1NCw4OCwxMzksMTE1LDIzNywyMzYsMTIxLDAsMjE0LDIxNiwxOTYsNDAsMzgsMjA0LDgzLDEzNiwxNjAsMTczLDY5LDcsMzgsMjI1LDExOCw0OSw0OCw3MCwxNjYsMTIxLDI0NSwxOTEsMTgzLDEyMiwxOTksMTg3LDgsNDMsNDUsOTMsMTI0LDIxNSwxNjEsNzAsMjU0LDI2LDE4OCwxMywyMjYsMTMxLDMsNCw0MywxOTgsMjEyLDEwMywxMTcsMjE1LDEyNywyNDMsMzksNzIsNzYsMTE0LDUwLDE5Niw1NSwxMjEsODYsMjUxLDUzLDI0MiwzMCwxMDksNDcsMjEwLDI1MywxNjMsOTAsOTgsMTQsNjAsMTE1LDc1LDE0OSwyMTAsMTc1LDI2LDEyNCwyMjgsMjQ3LDIwLDIwMyw5NiwyMTAsMjYsODEsNjUsMTg4LDEyMSwxMjgsOTEsMTA3LDE2OCwxMywyMDcsMTc1LDE3MCwyNTUsMjM2LDE0OSwxMDksNTksMjQsMTcyLDExLDU4LDEzLDAsMTUyLDExNiwxMTAsMTExLDIyLDIzMSwzLDIzNyw0Miw4MSw3Nyw2MywyMjMsMTAzLDEwOSw1NiwxNTgsNDMsMjA2LDIwMiwzOCwxNDgsMTM3LDE4OSwyMTQsMjE2LDkwLDE4LDIyNCwyNTQsMzcsMTA5LDE4LDg0LDIyMiwyMDksMjUsNTMsMjE5LDE2OSwyMTEsNTAsMTgyLDQwLDExMiwyMDksMzEsNTIsMjEsNTMsOTgsMTIyLDI1NCwxMDgsMzksMzgsMTM0LDE1MCwxMzksMTk0LDMw=")
Replace:
bytearray(ciphertext)
with:
bytearray(map(int, ciphertext.split(b',')))
# Or if you prefer genexprs:
bytearray(int(x) for x in ciphertext.split(b','))
The former is just converting the raw bytes to an equivalent bytearray, the latter splits it up by commas and parses the components as ints.
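To make the difference concrete, here is a small sketch. The short payload is a made-up stand-in for the long string in the question, not the real data:

```python
import base64

# Hypothetical short payload standing in for the long string above.
encoded = base64.b64encode(b'148,240,50,66')

ciphertext = base64.b64decode(encoded)           # b'148,240,50,66'
as_text = bytearray(ciphertext)                  # still the ASCII characters
as_numbers = bytearray(int(x) for x in ciphertext.split(b','))

print(len(as_text))      # 13 bytes: digits and commas
print(list(as_numbers))  # [148, 240, 50, 66]
```

Note that bytearray() raises ValueError if any parsed number falls outside 0..255.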
I have a string which includes encoded bytes inside it:
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into
str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
Here str2 is a bytes object which I can decode easily using
str2.decode('utf-8')
to get the final result:
'Output file 문항분석.xlsx Created'
You could use ast.literal_eval:
>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>
>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
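One caveat, as far as I can tell: literal_eval only helps when the string contains literal backslash escapes (as it would if it came from str(some_bytes) or a log file). A minimal sketch with a made-up sample value:

```python
from ast import literal_eval

# Raw string: the backslashes are real characters, as they would be in
# text produced by str(some_bytes) or read from a log file.
text = r"b'caf\xc3\xa9'"

value = literal_eval(text)     # parses the text as a bytes literal
print(type(value).__name__)    # bytes
print(value.decode('utf-8'))   # café
```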
Based on the SyntaxError mentioned in your comments, you may be hitting a testing issue when you try to print, because stdout is set to ascii in your console (you may also find that your console does not support some of the characters you are trying to print). You can try something like the following to set sys.stdout to utf-8 and see what your console will print. It uses a string slice and encode to get the bytes, rather than the ast.literal_eval approach that has already been suggested:
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
b = s[2:-1].encode('latin-1').decode('utf-8')  # latin-1 turns each character back into the byte with the same value
print(b)
A simple way is to assume that all the characters of the initial strings are in the [0,256) range and map to the same Unicode value, which means that it is a Latin1 encoded string.
The conversion is then trivial:
str1[2:-1].encode('Latin1').decode('utf8')
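Spelled out step by step, this looks roughly like the following sketch. It assumes the string arrived exactly as written in the question, with each \xNN escape already turned into a single character:

```python
# Non-raw literal: each \xNN escape is already a single character
# with code point NN (assumption: that is how the string arrived).
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

inner = str1[2:-1]                 # strip the leading b' and trailing '
raw = inner.encode('latin-1')      # code point N -> byte N, for N < 256
print(raw.decode('utf-8'))         # Output file 문항분석.xlsx Created
```

If any character in the string has a code point of 256 or more, encode('latin-1') raises UnicodeEncodeError, which is a useful signal that the assumption does not hold.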
I finally found an answer: I use a function that casts a string to bytes without encoding it. Given the string
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I take only the actual encoded text inside it
str1[2:-1]
and pass it to a function that converts the string to bytes without encoding its values:
import struct
def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num < 256:  # fits in one byte
            outlist.append(struct.pack('B', num))
        elif num < 65536:  # fits in two bytes
            outlist.append(struct.pack('>H', num))
        else:
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>bH', b, H))
    return b''.join(outlist)
Calling the function converts the string to bytes, which can then be decoded:
rawbytes(str1[2:-1]).decode('utf-8')
This gives the correct output:
'Output file 문항분석.xlsx Created'
Suppose I read a long bytes object from somewhere, knowing it is UTF-8 encoded. The read may not fully consume the available content, so the last character in the stream may be incomplete. Calling bytes.decode() on this object may then raise a decode error, even though only the last few bytes are the problem. Is there a function that works in this case, returning the longest decoded string and the remaining bytes?
UTF-8 encodes a character into at most 4 bytes, so retrying the decode with up to 3 trailing bytes removed should work, but the vast majority of that computation would be wasted, and I don't really like this solution.
To give a simple but concrete example:
>>> b0 = b'\xc3\x84\xc3\x96\xc3'
>>> b1 = b'\x9c\xc3\x84\xc3\x96\xc3\x9c'
>>> (b0 + b1).decode()
'ÄÖÜÄÖÜ'
(b0 + b1).decode() is fine, but b0.decode() will raise. The solution should be able to decode b0 for as much as possible and return the bytes that cannot be decoded.
You are describing the basic usage of io.TextIOWrapper: a buffered text stream over a binary stream.
>>> import io
>>> txt = 'before\N{PILE OF POO}after'
>>> b = io.BytesIO(txt.encode('utf-8'))
>>> t = io.TextIOWrapper(b)
>>> t.read(5)
'befor'
>>> t.read(1)
'e'
>>> t.read(1)
'💩'
>>> t.read(1)
'a'
Contrast with reading a bytes stream directly, where it would be possible to read halfway through an encoded pile of poo:
>>> b.seek(0)
0
>>> b.read(5)
b'befor'
>>> b.read(1)
b'e'
>>> b.read(1)
b'\xf0'
>>> b.read(1)
b'\x9f'
>>> b.read(1)
b'\x92'
>>> b.read(1)
b'\xa9'
>>> b.read(1)
b'a'
Specify encoding="utf-8" if you want to be explicit. The default encoding, i.e. locale.getpreferredencoding(False), would usually be utf-8 anyway.
As I mentioned in the comments under @wim's answer, I think you could use the codecs.iterdecode() incremental decoder to do this. Since it's a generator function, there's no need to manually save and restore its state between iterative calls to it.
Here's how it might be used to handle a situation like the one you described:
import codecs
from random import randint
def reader(sequence):
    """ Yield random length chunks of sequence until exhausted. """
    plural = lambda word, n, ending='s': (word+ending) if n > 1 else word
    i = 0
    while i < len(sequence):
        size = randint(1, 4)
        chunk = sequence[i: i+size]
        hexrepr = '0x' + ''.join('%02X' % b for b in chunk)
        print('read {} {}: {}'.format(size, plural('byte', len(chunk)), hexrepr))
        yield chunk
        i += size
bytes_obj = b'\xc3\x84\xc3\x96\xc3\x9c\xc3\x84\xc3\x96\xc3\x9c' # 'ÄÖÜÄÖÜ'
for decoded in codecs.iterdecode(reader(bytes_obj), 'utf-8'):
    print(decoded)
Sample output:
read 3 bytes: 0xC384C3
Ä
read 1 byte: 0x96
Ö
read 1 byte: 0xC3
read 3 bytes: 0x9CC384
ÜÄ
read 2 bytes: 0xC396
Ö
read 4 bytes: 0xC39C
Ü
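If you want the leftover bytes back explicitly, as the question asks, one possible sketch uses codecs.getincrementaldecoder(). It relies on getstate() returning the still-undecoded input as the first item of its tuple, which is how the codecs documentation describes it; the helper name decode_partial is made up for illustration:

```python
import codecs


def decode_partial(data, encoding='utf-8'):
    """Return (decoded_text, leftover_bytes) for possibly truncated input."""
    decoder = codecs.getincrementaldecoder(encoding)()
    text = decoder.decode(data)        # final=False: an incomplete tail is buffered
    leftover, _flags = decoder.getstate()
    return text, leftover


text, rest = decode_partial(b'\xc3\x84\xc3\x96\xc3')
print(text, rest)   # ÄÖ b'\xc3'
```

The leftover bytes can be prepended to the next chunk before calling the helper again. Keeping one decoder object alive across reads would avoid even that step.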
I have a special packet in string format. It has a 32-byte header, and the body contains one or more entries, each consisting of 90 bytes.
I want to process this string using Python. Can I read it the way I would read from a socket: read the first 32 bytes as the header, take them off the string, and then continue by reading the 90 bytes of the first entry?
Something like:
str.read(32) # => "x01x02..."
str.read(90) # => "x02x05..."
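You can use StringIO to read a string like a file (the example below is Python 2; in Python 3 the equivalents are io.StringIO for text and io.BytesIO for bytes):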
>>> import StringIO
>>> s = 'Hello, World!'
>>> sio = StringIO.StringIO(s)
>>> sio.read(6)
'Hello,'
>>> sio.read()
' World!'
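For the packet in the question, the same idea in Python 3 might look like this sketch. The packet contents are invented for illustration; only the 32 and 90 byte sizes come from the question:

```python
import io

HEADER_SIZE = 32   # sizes taken from the question
ENTRY_SIZE = 90

# Hypothetical packet: a header followed by two entries.
packet = bytes(HEADER_SIZE) + b'\x01' * ENTRY_SIZE + b'\x02' * ENTRY_SIZE

stream = io.BytesIO(packet)
header = stream.read(HEADER_SIZE)
entries = []
while True:
    entry = stream.read(ENTRY_SIZE)
    if not entry:          # an empty read means the stream is exhausted
        break
    entries.append(entry)

print(len(header), len(entries))   # 32 2
```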
I would also suggest you take a look at the struct module for help with parsing binary data
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
You define the format of the data using format strings, so 'hhl' in the above example is short (2 bytes), short (2 bytes), long (4 bytes in the output shown; native sizes and alignment vary by platform). It also supports specifying endianness (byte order) in the format string.
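As a quick illustration of the byte-order prefixes (a sketch, not tied to any particular packet format):

```python
import struct

# '>' selects big-endian, '<' little-endian; both use standard sizes
# (h = 2 bytes, l = 4 bytes) with no alignment padding.
big = struct.pack('>hhl', 1, 2, 3)
little = struct.pack('<hhl', 1, 2, 3)

print(big)      # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(little)   # b'\x01\x00\x02\x00\x03\x00\x00\x00'
print(struct.unpack('<hhl', little))   # (1, 2, 3)
```

Using an explicit prefix makes the result independent of the machine the code runs on.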
For example if your header format was uint, 4 byte str, uint, uint, ushort, ulong:
>>> import struct
>>> data = ''.join(chr(i) for i in range(128)) * 10
>>> hdr_fmt = 'I4sIIHL'
>>> struct.calcsize(hdr_fmt)
32
>>> struct.unpack_from(hdr_fmt, data, 0)
(50462976, '\x04\x05\x06\x07', 185207048, 252579084, 4368, 2242261671028070680)
To split the packet into a 32 byte header and body:
header = packet[:32]
body = packet[32:]
To further split the body into one or more entries:
entries = [body[i:i+90] for i in range(0, len(body), 90)]
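Wrapped up as a function with a basic length check, this could look like the sketch below. The helper name split_packet and the choice to raise ValueError are my own assumptions, not part of any library:

```python
def split_packet(packet, header_size=32, entry_size=90):
    """Return (header, entries); hypothetical helper, sizes from the question."""
    header, body = packet[:header_size], packet[header_size:]
    if len(header) < header_size or len(body) % entry_size:
        raise ValueError('packet length does not match the expected layout')
    entries = [body[i:i + entry_size] for i in range(0, len(body), entry_size)]
    return header, entries


header, entries = split_packet(bytes(32 + 3 * 90))
print(len(header), [len(e) for e in entries])   # 32 [90, 90, 90]
```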
In Python 2.x you can simply do:
header = s[:32]
body = s[32:32+90]
In Python 3.x all strings are Unicode, so I would first convert to a bytearray (a str needs an explicit encoding; latin-1 maps each code point below 256 to one byte):
s = bytearray(s, 'latin-1')
header = s[:32]
body = s[32:32+90]
I have a list of hex byte strings like this:
['0xe1', '0xd7', '0x7', '0x0']
(as read from a binary file)
I want to reverse the list and join its items together to create one hex number:
['0x07D7E1']
How do I format the list to this format?
Concatenate your hex numbers into one string:
'0x' + ''.join([format(int(c, 16), '02X') for c in reversed(inputlist)])
This does include the 00 byte explicitly in the output:
>>> inputlist = ['0xe1', '0xd7', '0x7', '0x0']
>>> '0x' + ''.join([format(int(c, 16), '02X') for c in reversed(inputlist)])
'0x0007D7E1'
However, I'd look into reading your binary file format better; using struct for example to unpack bytes directly from the file into proper integers in the right byte order:
>>> import struct
>>> data = bytes(int(c, 16) for c in inputlist)
>>> value = struct.unpack('<I', data)[0]
>>> print(hex(value))
0x7d7e1
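On Python 3, int.from_bytes() gives the same number without struct, and format() can restore the zero padding if you need it. A short sketch using the list from the question:

```python
inputlist = ['0xe1', '0xd7', '0x7', '0x0']

raw = bytes(int(c, 16) for c in inputlist)   # b'\xe1\xd7\x07\x00'
value = int.from_bytes(raw, 'little')        # least significant byte first

print(hex(value))              # 0x7d7e1
print(format(value, '#010x'))  # 0x0007d7e1
```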