I'm learning Python.
While looking at the struct module, a question came up:
Is it possible to convert a "string" to "bin" without giving the length?
For example, this is the case with a fixed length in chars:
F = open("data.bin", "wb")
import struct
data = struct.pack("24s",b"This is an unknown string")
print(data)
F.write(data)
F.close()
I'm trying to do the same but with an unknown length.
Thanks a lot!
If you already have the string, use len to determine its length, i.e.
data = struct.pack("{0}s".format(len(unknown_string)), unknown_string)
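If the string also has to be read back later, the reader needs some way to learn its length. One common approach, sketched here under the assumption that a 4-byte little-endian length prefix is acceptable for your file format, is to store the length in front of the bytes. The names pack_string and unpack_string are hypothetical helpers, not part of struct.

```python
import struct

def pack_string(payload: bytes) -> bytes:
    """Prefix payload with its length as a 4-byte little-endian unsigned int."""
    return struct.pack("<I{0}s".format(len(payload)), len(payload), payload)

def unpack_string(blob: bytes) -> bytes:
    """Read the length prefix, then exactly that many bytes."""
    (length,) = struct.unpack_from("<I", blob, 0)
    (payload,) = struct.unpack_from("{0}s".format(length), blob, 4)
    return payload

packed = pack_string(b"This is an unknown string")
print(packed)
print(unpack_string(packed))
```

The "<" prefix turns off native alignment, so the prefix is always exactly 4 bytes and the payload follows it directly.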
The bytes type is a binary data type: it simply stores a sequence of 8-bit values. Note that the code with struct.pack ends up creating a bytes object:
>>> import struct
>>> data = struct.pack("24s",b"This is an unknown string")
>>> type(data)
<class 'bytes'>
>>> len(data)
24
The length of this is 24 as per your format specifier. If you just want to place the bytes-string directly into the file without doing any length checking you don't even need to use the struct module, you can just write it directly to the file:
with open("data.bin", "wb") as F:  # the with block closes the file for you
    F.write(b"This will work")
If however you wanted to keep the 24 bytes length you could keep using struct.pack:
>>> data = struct.pack("24s",b"This is an unknown st")
>>> len(data)
24
>>> print(data)
b'This is an unknown st\x00\x00\x00'
>>> data = struct.pack("24s",b"This is an unknown string abcdef")
>>> print(data)
b'This is an unknown strin'
If the bytes object is shorter than the field, struct.pack pads the remainder with null bytes (\x00); if it is longer, struct.pack truncates it.
If you don't mind getting the missing space padded out with zeros you can just pass in the bytes object directly to struct.pack and it will handle it.
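When such a fixed-width field is read back, the padding is still part of the value. A minimal sketch of removing it, assuming the original text never ends with a null byte of its own:

```python
import struct

field = struct.pack("24s", b"This is an unknown st")   # padded to 24 bytes
(raw,) = struct.unpack("24s", field)                    # still 24 bytes long
text = raw.rstrip(b"\x00")                              # drop the trailing null padding
print(len(raw), text)
```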
Thanks to both...
My new code:
F = open("data.bin", "wb")
strs = b"This is an unknown string"
import struct
data = struct.pack("{0}s".format(len(strs)),strs)
print(data)
F.write(data)
F.close()
I have a string which includes encoded bytes inside it:
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I want to decode it, but I can't since it has become a string. Therefore I want to ask whether there is any way I can convert it into
str2 = b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
Here str2 is a bytes object which I can decode easily using
str2.decode('utf-8')
to get the final result:
'Output file 문항분석.xlsx Created'
You could use ast.literal_eval:
>>> print(str1)
b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'
>>> type(str1)
<class 'str'>
>>> from ast import literal_eval
>>> literal_eval(str1).decode('utf-8')
'Output file 문항분석.xlsx Created'
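This works when the string holds the literal backslash escapes (a backslash followed by x, e, b and so on), which is what the print output above shows. A small self-contained sketch, using a raw string to build such a value; the variable names are only for illustration:

```python
from ast import literal_eval

# Raw string: the backslashes are real characters, as in the printed output above.
text = r"b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

as_bytes = literal_eval(text)        # parses the bytes literal without using eval()
decoded = as_bytes.decode("utf-8")
print(type(as_bytes).__name__, decoded)
```

If the string instead contains the already-interpreted characters (code points such as U+00EB), it is no longer a valid bytes literal, because Python only allows ASCII characters inside one, and a different approach is needed.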
Based on the SyntaxError mentioned in your comments, you may be running into a testing issue when printing, because stdout is set to ASCII in your console (your console may also not support some of the characters you are trying to print). You can try something like the following to set sys.stdout to UTF-8 and see what your console prints. It uses a string slice and encode to get the bytes, rather than the ast.literal_eval approach that has already been suggested:
import codecs
import sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.buffer)
s = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
b = s[2:-1].encode('latin-1').decode('utf-8')  # latin-1 turns each character back into the byte with the same value
print(b)
A simple way is to assume that every character of the initial string has a code point in the [0, 256) range and stands for the byte with the same value, which is exactly what the Latin-1 encoding does.
The conversion is then trivial:
str1[2:-1].encode('Latin1').decode('utf8')
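Spelled out step by step, assuming str1 was created from a normal (non-raw) literal so that each \xNN escape became a single character; the intermediate names inner and raw are only for illustration:

```python
# A str whose code points are all below 256.
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"

inner = str1[2:-1]                    # strip the leading b' and the trailing '
raw = inner.encode("latin-1")         # code point N becomes byte N
decoded = raw.decode("utf-8")         # now interpret those bytes as UTF-8
print(decoded)
```

If any character had a code point of 256 or above, the encode step would raise UnicodeEncodeError, which is a useful check that the assumption holds.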
Finally I found an answer that uses a function to convert a string to bytes without encoding. Given the string
str1 = "b'Output file \xeb\xac\xb8\xed\x95\xad\xeb\xb6\x84\xec\x84\x9d.xlsx Created'"
I take only the actual encoded text inside it
str1[2:-1]
and pass it to a function that converts the string to bytes without encoding its values
import struct
def rawbytes(s):
    """Convert a string to raw bytes without encoding"""
    outlist = []
    for cp in s:
        num = ord(cp)
        if num <= 255:
            # fits in a single unsigned byte
            outlist.append(struct.pack('B', num))
        elif num <= 65535:
            # fits in two bytes, big-endian
            outlist.append(struct.pack('>H', num))
        else:
            # three bytes: the high byte first, then the low 16 bits
            b = (num & 0xFF0000) >> 16
            H = num & 0xFFFF
            outlist.append(struct.pack('>bH', b, H))
    return b''.join(outlist)
Calling the function converts the text to bytes, which can then be decoded:
rawbytes(str1[2:-1]).decode('utf-8')
This gives the correct output
'Output file 문항분석.xlsx Created'
I am trying to write an encode routine but get an error on one line.
Please look at my whole function for context; I don't think it is too long.
I am trying to add the header to the file data:
#Add the header to the file data
headerdata = struct.pack("4s"+\
"I"+\
str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum, header.size, header.fformat)
filebytes = headerdata + data
I get this error:
str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum, header.size, header.fformat)
struct.error: argument for 's' must be a bytes object
I tried changing this line by adding 'b':
str(Header.MAX_FORMAT_LENGTH)+b"s",header.magicnum, header.size, header.fformat)
That gives another error:
str(Header.MAX_FORMAT_LENGTH)+b's',header.magicnum, header.size, header.fformat)
TypeError: must be str, not bytes
The whole function:
def encode(image, data, filename, encryption=False, password=""):
    im = Image.open(image)
    px = im.load()
    #Create a header
    header = Header()
    header.size = len(data)
    header.fformat = "" if (len(filename.split(os.extsep))<2)\
                     else filename.split(os.extsep)[1]
    #Add the header to the file data
    headerdata = struct.pack("4s"+\
                             "I"+\
                             str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum, header.size, header.fformat)
    filebytes = headerdata + data
    #Optional encryption step
    if encrypt:
        if password:
            filebytes = encrypt(filebytes, password,\
                                padding=im.width*im.height - len(filebytes))
        else:
            print ("Password is empty, encryption skipped")
    #Ensure the image is large enough to hide the data
    if len(filebytes) > im.width*im.height:
        print ("Image too small to encode the file. \
You can store 1 byte per pixel.")
        exit()
    for i in range(len(filebytes)):
        coords = (i%im.width, i/im.width)
        byte = ord(filebytes[i])
        px[coords[0], coords[1]] = encode_in_pixel(byte, px[coords[0],\
                                                   coords[1]])
    im.save("output.png", "PNG")
Your original code was correct, except that the type of header.magicnum was unexpected. Your code snippet should read
#Add the header to the file data
headerdata = struct.pack("4s"+\
"I"+\
str(Header.MAX_FORMAT_LENGTH)+"s","{:04d}".format(header.magicnum).encode('UTF-8'), header.size, header.fformat)
filebytes = headerdata + data
or some other suitable format code and encoding that turns header.magicnum into your expected result.
Code
Since you said they are all strings, here you go:
headerdata = struct.pack("4s"+\
"I"+\
str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum.encode(), int(header.size), header.fformat.encode())
This should work for the formats and types you want
Explanation
According to the struct module documentation, specifically section 7.1.2.2, we can find the types needed as arguments for the following format characters:
-----------------------------------------
|Formatting Character | Type (in python)|
-----------------------------------------
|s                    | bytes           |
-----------------------------------------
|I                    | integer         |
-----------------------------------------
and since the data you want to format is of type str, we need to convert it.
Let's start with converting a str to an integer, since it's the simplest.
>>> x = '123'
>>> type(x)
str
>>> y = int(x)
>>> type(y)
int
Easy, all we need to do is call int() on our string.
Next up is turning a string into bytes. We use the string's encode() method to do this:
>>> x = '123'
>>> type(x)
str
>>> y = x.encode()
>>> type(y)
bytes
>>> print(y)
b'123'
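Putting both conversions together, here is a self-contained sketch of the header packing. The values are assumptions for illustration only: the question does not show MAX_FORMAT_LENGTH or the magic number, so 8 and "HDR1" are made up.

```python
import struct

MAX_FORMAT_LENGTH = 8          # assumed value; the question does not show it
magicnum = "HDR1"              # assumed 4-character magic string
size = "1024"                  # size held as a str, as in the answer above
fformat = "png"

fmt = "4sI{0}s".format(MAX_FORMAT_LENGTH)
headerdata = struct.pack(fmt, magicnum.encode(), int(size), fformat.encode())
print(len(headerdata), headerdata)

# Unpacking reverses the step; the format field comes back null-padded.
magic_b, size_i, fformat_b = struct.unpack(fmt, headerdata)
print(magic_b.decode(), size_i, fformat_b.rstrip(b"\x00").decode())
```

If header.size is already an int, as it is when it comes from len(data), the int() call is harmless but not needed.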
I'm dealing with a character separated hex file, where each field has a particular start code. I've opened the file as 'rb', but I was wondering, after I get the index of the startcode using .find, how do I read a certain number of bytes from this position?
This is how I am loading the file and what I am attempting to do
with open(someFile, 'rb') as fileData:
startIndex = fileData.find('(G')
data = fileData[startIndex:7]
where 7 is the number of bytes I want to read from the index returned by the find function. I am using python 2.7.3
You can get the position of a substring in a bytestring under python2.7 like this:
>>> with open('student.txt', 'rb') as f:
... data = f.read()
...
>>> data # holds the French word for student: élève
'\xc3\xa9l\xc3\xa8ve\n'
>>> len(data) # this shows we are dealing with bytes here, because "élève\n" would be 6 characters long, had it been properly decoded!
8
>>> len(data.decode('utf-8'))
6
>>> data.find('\xa8') # continue with the bytestring...
4
>>> bytes_to_read = 3
>>> data[4:4+bytes_to_read]
'\xa8ve'
You can also search for the special characters directly. For compatibility with Python 3, it's better to prefix the literal with b to indicate that these are bytes (in Python 2.x it works without the prefix too). Note that Python 3 only accepts ASCII characters inside a bytes literal, so there you would write b'\xc3\xa8' or 'è'.encode('utf-8') instead of b'è':
>>> data.find(b'è') # in python2.x this works too (unfortunately, because it has led to a lot of confusion): data.find('è')
3
>>> bytes_to_read = 3
>>> pos = data.find(b'è')
>>> data[pos:pos+bytes_to_read] # when you use the syntax 'n:m', it will read bytes in a bytestring
'\xc3\xa8v'
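Applied to the question, the same idea looks roughly like this. It is a sketch written for Python 3 (under 2.7 the slicing logic is identical), and the sample file contents are made up so that the example runs on its own. Note that the slice end has to be start + 7, not 7, and that find belongs to the bytes you read, not to the file object.

```python
import os
import tempfile

START_CODE = b"(G"      # start code from the question
FIELD_LENGTH = 7        # number of bytes to take, counted from the start code

# Create a small sample file so the sketch is self-contained (contents are made up).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01(A1234(G56789\xff(Z00")
    path = tmp.name

with open(path, "rb") as file_data:
    content = file_data.read()            # file objects have no find(); bytes do

start = content.find(START_CODE)          # -1 if the start code is absent
field = content[start:start + FIELD_LENGTH] if start != -1 else b""
print(start, field)

os.remove(path)
```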
I have a special packet in string format, which has a 32-byte header, and the body contains one or more entries, each consisting of 90 bytes.
I want to process this string using Python. Can I read it the way I would read from a socket: read the first 32 bytes of header, take them off the string, and then continue by reading the 90 bytes of the first entry?
Something like:
str.read(32) # => "x01x02..."
str.read(90) # => "x02x05..."
You can use StringIO to read a string like a file
>>> import StringIO
>>> s = 'Hello, World!'
>>> sio = StringIO.StringIO(s)
>>> sio.read(6)
'Hello,'
>>> sio.read()
' World!'
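In Python 3 the equivalent for binary data is io.BytesIO. A minimal sketch of reading the header and then the entries; the packet contents are made up, and only the 32 and 90 byte sizes come from the question:

```python
import io

HEADER_SIZE = 32
ENTRY_SIZE = 90

# Made-up packet: a 32-byte header followed by two 90-byte entries.
packet = bytes(HEADER_SIZE) + b"A" * ENTRY_SIZE + b"B" * ENTRY_SIZE

stream = io.BytesIO(packet)
header = stream.read(HEADER_SIZE)

entries = []
while True:
    entry = stream.read(ENTRY_SIZE)
    if len(entry) < ENTRY_SIZE:      # empty or truncated read: stop
        break
    entries.append(entry)

print(len(header), len(entries))
```

Each read advances the position in the stream, so there is no need to cut the consumed bytes off the original data yourself.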
I would also suggest you take a look at the struct module for help with parsing binary data
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
You define the format of the data using format strings, so 'hhl' in the above example is short (2 bytes), short (2 bytes), long (4 bytes in this output). It also supports specifying endianness (byte order) in the format string.
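As a quick illustration of the byte-order prefixes, written for Python 3 where pack returns bytes: ">" selects big-endian and "<" little-endian, and both use the standard sizes without alignment padding.

```python
import struct

big = struct.pack(">hhl", 1, 2, 3)      # big-endian, standard sizes, no padding
little = struct.pack("<hhl", 1, 2, 3)   # little-endian, standard sizes, no padding
print(big)
print(little)
print(struct.unpack(">hhl", big))
```

Without a prefix, struct uses the machine's native byte order, sizes and alignment, so the same format string can give different results on different platforms.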
For example if your header format was uint, 4 byte str, uint, uint, ushort, ulong:
>>> import struct
>>> data = ''.join(chr(i) for i in range(128)) * 10
>>> hdr_fmt = 'I4sIIHL'
>>> struct.calcsize(hdr_fmt)
32
>>> struct.unpack_from(hdr_fmt, data, 0)
(50462976, '\x04\x05\x06\x07', 185207048, 252579084, 4368, 2242261671028070680)
To split the packet into a 32 byte header and body:
header = packet[:32]
body = packet[32:]
To further split the body into one or more entries:
entries = [body[i:i+90] for i in range(0, len(body), 90)]
In Python 2.x you could simply do:
header = s[:32]
body = s[32:32+90]
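In Python 3.x all strings are Unicode, so I would convert to a bytearray first (a str needs an explicit encoding; latin-1 maps code points 0-255 to the same byte values):
s = bytearray(s, 'latin-1')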
header = s[:32]
body = s[32:32+90]
I have a file encoded in a strange pattern. For example,
Char (1 byte) | Integer (4 bytes) | Double (8 bytes) | etc...
So far I have written the code below, but I have not been able to figure out why it still shows garbage on the screen. Any help will be greatly appreciated.
BRK_File = 'commands.BRK'
input = open(BRK_File, "rb")
rev = input.read(1)
filesize = input.read(4)
highpoint = input.read(8)
which = input.read(1)
print 'Revision: ', rev
print 'File size: ', filesize
print 'High point: ', highpoint
print 'Which: ', which
while True:
    opcode = input.read(1)
    print 'Opcode: ', opcode
    if opcode == 120:
        break
    # elif ...: other opcodes
read() returns the raw bytes as a string, which you need to decode to get the actual values (the char, the integer, the double). You could use the struct module to do the decoding.
Something along the following lines should do the trick:
import struct
...
fmt = 'cid' # char, int, double
data = input.read(struct.calcsize(fmt))
rev, filesize, highpoint = struct.unpack(fmt, data)
You may have to deal with endianness issues, but struct makes that pretty easy.
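One detail worth checking: without a prefix, struct uses native alignment, so 'cid' typically occupies 16 bytes rather than 13 because padding is inserted after the char. If the fields are stored back to back, as the layout in the question suggests, a prefix such as "<" disables the padding. The sketch below is written for Python 3; little-endian is an assumption about the file, and the sample values are made up.

```python
import io
import struct

FMT = "<cid"    # little-endian is an assumption; use ">" if the file is big-endian

# Made-up sample: revision b"A", file size 1234, high point 56.5, packed without padding.
sample = io.BytesIO(struct.pack(FMT, b"A", 1234, 56.5))

record = sample.read(struct.calcsize(FMT))          # 1 + 4 + 8 = 13 bytes
rev, filesize, highpoint = struct.unpack(FMT, record)
print(rev, filesize, highpoint)
```

If the numbers that come out look implausible, trying the other byte order is usually the first thing to test.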
It would be helpful to show the contents of the file, as well as the "garbage" that it's outputting.
input.read() returns a string, so you have to convert what you're reading to the type that you want. I suggest looking into the struct module.