Encoding error in Python


I'm trying to write an encoding step, but one line raises an error.
Please look at my whole function (included below) for context; it isn't very long.
I'm trying to add the header to the file data:
#Add the header to the file data
headerdata = struct.pack("4s"+\
"I"+\
str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum, header.size, header.fformat)
filebytes = headerdata + data
This raises an error:
str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum, header.size, header.fformat)
struct.error: argument for 's' must be a bytes object
I tried changing this line by adding a b prefix:
str(Header.MAX_FORMAT_LENGTH)+b"s",header.magicnum, header.size, header.fformat)
which raises a different error:
str(Header.MAX_FORMAT_LENGTH)+b's',header.magicnum, header.size, header.fformat)
TypeError: must be str, not bytes
The whole function:
def encode(image, data, filename, encryption=False, password=""):
    im = Image.open(image)
    px = im.load()
    #Create a header
    header = Header()
    header.size = len(data)
    header.fformat = "" if (len(filename.split(os.extsep))<2)\
                     else filename.split(os.extsep)[1]
    #Add the header to the file data
    headerdata = struct.pack("4s"+\
                             "I"+\
                             str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum, header.size, header.fformat)
    filebytes = headerdata + data
    #Optional encryption step
    if encrypt:
        if password:
            filebytes = encrypt(filebytes, password,\
                                padding=im.width*im.height - len(filebytes))
        else:
            print ("Password is empty, encryption skipped")
    #Ensure the image is large enough to hide the data
    if len(filebytes) > im.width*im.height:
        print ("Image too small to encode the file. \
You can store 1 byte per pixel.")
        exit()
    for i in range(len(filebytes)):
        coords = (i%im.width, i/im.width)
        byte = ord(filebytes[i])
        px[coords[0], coords[1]] = encode_in_pixel(byte, px[coords[0],\
                                                   coords[1]])
    im.save("output.png", "PNG")

Your original code was correct, except that the type of header.magicnum was unexpected. Your code snippet should read
#Add the header to the file data
headerdata = struct.pack("4s"+\
"I"+\
str(Header.MAX_FORMAT_LENGTH)+"s","{:04d}".format(header.magicnum).encode('UTF-8'), header.size, header.fformat)
filebytes = headerdata + data
or some other suitable format code and encoding that turns header.magicnum into your expected result.

Code
since you said they are all strings, here you go
headerdata = struct.pack("4s"+\
"I"+\
str(Header.MAX_FORMAT_LENGTH)+"s",header.magicnum.encode(), int(header.size), header.fformat.encode())
This should work for the formats and types you want
Explanation
According to the struct documentation, and specifically section 7.1.2.2, we can find the types needed as arguments for the following format characters:
-----------------------------------------
|Formatting Character | Type (in python)|
-----------------------------------------
|s                    | bytes           |
-----------------------------------------
|I                    | integer         |
-----------------------------------------
and since the data you want to format is of type str, we need to convert it.
Let's start with turning a str into an integer, since it's the simplest.
>>> x = '123'
>>> type(x)
str
>>> y = int(x)
>>> type(y)
int
Easy, all we need to do is call int() on our string.
Next up is turning a string into bytes. We use the string's encode() method to do this:
>>> x = '123'
>>> type(x)
str
>>> y = x.encode()
>>> type(y)
bytes
>>> print(y)
b'123'
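Putting both conversions together, here is a minimal sketch of the header packing under Python 3. The `Header` class below is a hypothetical stand-in (the question does not show it), and `MAX_FORMAT_LENGTH = 8` and the `"hide"` magic number are values assumed only for illustration:

```python
import struct


class Header:
    """Hypothetical stand-in for the Header class, which the question does not show."""
    MAX_FORMAT_LENGTH = 8   # assumed value
    magicnum = "hide"       # assumed 4-character str
    size = 0
    fformat = ""


# "4s" and "<n>s" take bytes, "I" takes an int.
HEADER_FMT = "4sI{}s".format(Header.MAX_FORMAT_LENGTH)


def pack_header(header):
    return struct.pack(
        HEADER_FMT,
        header.magicnum.encode("ascii"),   # str -> bytes
        int(header.size),                  # make sure this is an int
        header.fformat.encode("ascii"),    # str -> bytes
    )


header = Header()
header.size = 5
header.fformat = "txt"
headerdata = pack_header(header)
filebytes = headerdata + b"hello"

# Unpacking gives the values back; the format field is padded with NUL bytes.
magic, size, fformat = struct.unpack(HEADER_FMT, headerdata)
print(magic, size, fformat.rstrip(b"\x00"))
```

Note that `data` must also be a bytes object for `headerdata + data` to work.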

Related

TypeError: cannot use a string pattern on a bytes-like object python3

I have updated my project to Python 3.7 and Django 3.0
Here is code of models.py
def get_fields(self):
    fields = []
    html_text = self.html_file.read()
    self.html_file.seek(0)
    # for now just find singleline, multiline, img editable
    # may put repeater in there later (!!)
    for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
        # m is ('<img editable="true" label="Image" class="w300" width="300" border="0">', 'img editable')
        # or similar
        # first is full tag, second is tag type
        # append as a list
        # MUST also save value in here
        data = {'tag':m[0], 'type':m[1], 'label':'', 'value':None}
        title_list = re.findall("label\s*=\s*\"([^\"]*)", m[0])
        if(len(title_list) == 1):
            data['label'] = title_list[0]
        # store the data
        fields.append(data)
    return fields
Here is my error traceback
File "/home/harika/krishna test/dev-1.8/mcam/server/mcam/emails/models.py", line 91, in get_fields
for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
File "/usr/lib/python3.7/re.py", line 225, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
How can I solve my issue?

The thing is that in Python 3, read() on a file opened in binary mode returns bytes (i.e. the "raw" representation) and not a string. You can convert between bytes and strings if you specify an encoding, i.e. how characters are converted to bytes:
>>> '☺'.encode('utf8')
b'\xe2\x98\xba'
>>> '☺'.encode('utf16')
b'\xff\xfe:&'
The b before the string literal signifies that the value is not a string but bytes. You can also write bytes literals directly with that prefix:
>>> bytes_x = b'x'
>>> string_x = 'x'
>>> bytes_x == string_x
False
>>> bytes_x.decode('ascii') == string_x
True
>>> bytes_x == string_x.encode('ascii')
True
Note that you can only use basic (ASCII) characters in a literal with the b prefix:
>>> b'☺'
File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
So to fix your problem you can either convert the input to a string with the appropriate encoding:
html_text = self.html_file.read().decode('utf-8') # or 'ascii' or something else
Or, probably the better option, use bytes patterns in the findall calls instead of strings (the matches, and therefore m[0] and m[1], will then be bytes as well):
for m in re.findall(b"(<(singleline|multiline|img editable)[^>]*>)", html_text):
    ...
    title_list = re.findall(b"label\s*=\s*\"([^\"]*)", m[0])
(note the b in front of each "string")
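As a rough, self-contained illustration of the two options, the sketch below runs the same pattern against a small made-up HTML fragment (the `raw` sample and the constant names are assumptions for the example, not taken from the question's project):

```python
import re

TAG_RE_STR = r"(<(singleline|multiline|img editable)[^>]*>)"
TAG_RE_BYTES = TAG_RE_STR.encode("ascii")

# Made-up sample of what a binary-mode read() might return.
raw = b'<p><img editable="true" label="Image" width="300"></p>'

# Option 1: decode first, then use ordinary str patterns.
text = raw.decode("utf-8")
str_matches = re.findall(TAG_RE_STR, text)

# Option 2: keep the bytes and use bytes patterns; the matches are bytes too.
bytes_matches = re.findall(TAG_RE_BYTES, raw)
labels = re.findall(rb'label\s*=\s*"([^"]*)', bytes_matches[0][0])

print(str_matches[0][1])
print(bytes_matches[0][1])
print(labels[0].decode("utf-8"))
```

With option 2, anything you later compare against or store as text has to be decoded explicitly, as the last line does.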

Read Null terminated string in python

I'm trying to read a null-terminated string, but I'm having issues when unpacking a char and putting it together with a string.
This is the code:
def readString(f):
    str = ''
    while True:
        char = readChar(f)
        str = str.join(char)
        if (hex(ord(char))) == '0x0':
            break
    return str
def readChar(f):
    char = unpack('c',f.read(1))[0]
    return char
Now this is giving me this error:
TypeError: sequence item 0: expected str instance, int found
I'm also trying the following:
char = unpack('c',f.read(1)).decode("ascii")
But it throws me:
AttributeError: 'tuple' object has no attribute 'decode'
I don't even know how to read the chars and add them to the string. Is there a proper way to do this?

Here's a version that (ab)uses the lesser-known "sentinel" argument of iter():
with open('file.txt', 'rb') as f:
    val = ''.join(iter(lambda: f.read(1).decode('ascii'), '\x00'))
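One caveat with the one-liner: at end of file `f.read(1)` returns an empty value that never equals the sentinel, so a file without a terminator would loop forever. A hedged sketch of the same `iter(callable, sentinel)` idea with an end-of-file guard follows; the function name `read_cstring` and the in-memory sample are made up for the example:

```python
import io


def read_cstring(f, encoding="ascii"):
    """Read from binary file f up to (and consuming) the next NUL byte."""
    chunks = []
    # iter(callable, sentinel) keeps calling f.read(1) until it returns b"\x00".
    for byte in iter(lambda: f.read(1), b"\x00"):
        if byte == b"":          # end of file reached before a NUL byte
            break
        chunks.append(byte)
    return b"".join(chunks).decode(encoding)


# In-memory stand-in for a real file opened with 'rb'.
f = io.BytesIO(b"sword\x00Sword_Wea_Dummy\x00tail")
first = read_cstring(f)
second = read_cstring(f)
third = read_cstring(f)   # no terminator: stops at end of file
print(first, second, third)
```

Decoding once at the end, rather than per byte, also avoids breaking multi-byte encodings.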

How about:
myString = myNullTerminatedString.split("\x00")[0]
For example:
myNullTerminatedString = "hello world\x00\x00\x00\x00\x00\x00"
myString = myNullTerminatedString.split("\x00")[0]
print(myString) # "hello world"
This works by splitting the string on the null character. Since the string should terminate at the first null character, we simply grab the first item in the list after splitting. split will return a list of one item if the delimiter doesn't exist, so it still works even if there's no null terminator at all.
It also will work with byte strings:
myByteString = b'hello world\x00'
myStr = myByteString.split(b'\x00')[0].decode('ascii') # "hello world" as normal string
If you're reading from a file, you can read in larger chunks (estimate how much you'll need to read to reach the null terminator). This is a lot faster than reading byte-by-byte. For example:
resultingStr = b''
extraBytes = 0
while True:
    buf = f.read(512)
    if len(buf)==0: break
    resultingStr += buf
    if (b"\x00" in resultingStr):
        nullIndex = resultingStr.index(b"\x00")
        extraBytes = len(resultingStr) - nullIndex  # bytes read from the null byte onward
        resultingStr = resultingStr[:nullIndex]
        break
# now "resultingStr" contains the string (as bytes)
f.seek(0 - extraBytes,1) # seek backwards by the number of extra bytes, now the pointer will be on the null byte in the file
# or f.seek(1 - extraBytes,1) to skip the null byte in the file

(edit version 2, added extra way at the end)
Maybe there are libraries out there that can help with this, but as I don't know of any, let's attack the problem with what we know.
In Python 2, bytes and str are basically the same thing. That changed in Python 3, where str is what Python 2 calls unicode and bytes is its own separate type. So in Python 2 you don't need to define a readChar, as no extra work is required, and I don't think you need unpack for this particular case. With that in mind, let's define the new readString:
def readString(myfile):
    chars = []
    while True:
        c = myfile.read(1)
        if c == chr(0):
            return "".join(chars)
        chars.append(c)
Just like your code, I read one character at a time, but I save them in a list instead. The reason is that strings are immutable, so doing str+=char results in unnecessary copies. When I find the null character, I return the joined string. chr is the inverse of ord: it gives you the character for a given ASCII value. This excludes the null character; if it's needed, just move the appending...
Now let's test it with your sample file.
For instance, let's try to read "Sword_Wea_Dummy" from it:
with open("sword.blendscn","rb") as archi:
    #lets simulate that some prior processing was made by
    #moving the pointer of the file
    archi.seek(6)
    string=readString(archi)
    print "string repr:", repr(string)
    print "string:", string
    print ""
    #and the rest of the file is there waiting to be processed
    print "rest of the file: ", repr(archi.read())
and this is the output:
string repr: 'Sword_Wea_Dummy'
string: Sword_Wea_Dummy
rest of the file: '\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
other tests
>>> with open("sword.blendscn","rb") as archi:
        print readString(archi)
        print readString(archi)
        print readString(archi)
sword
Sword_Wea_Dummy
ÍÌÌ=p=Š4:¦6¿JÆ=
>>> with open("sword.blendscn","rb") as archi:
        print repr(readString(archi))
        print repr(readString(archi))
        print repr(readString(archi))
'sword'
'Sword_Wea_Dummy'
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6='
>>>
Now that I think about it, you mention that the data portion is of fixed size. If that is true for all files and they all have the following structure
[unknown size data][known size data]
then that is a pattern we can exploit: we only need to know the size of the file to get both parts cleanly, as follows
import os
def getDataPair(filename,knowSize):
    size = os.path.getsize(filename)
    with open(filename, "rb") as archi:
        unknown = archi.read(size-knowSize)
        know = archi.read()
    return unknown, know
and by knowing the size of the data portion (which I got by playing with the prior example), its use is simple
>>> string_data, data = getDataPair("sword.blendscn", 80)
>>> string_data
'sword\x00Sword_Wea_Dummy\x00'
>>> data
'\xcd\xcc\xcc=p=\x8a4:\xa66\xbfJ\x15\xc6=\x00\x00\x00\x00\xeaQ8?\x9e\x8d\x874$-i\xb3\x00\x00\x00\x00\x9b\xc6\xaa2K\x15\xc6=;\xa66?\x00\x00\x00\x00\xb8\x88\xbf#\x0e\xf3\xb1#ITuB\x00\x00\x80?\xcd\xcc\xcc=\x00\x00\x00\x00\xcd\xccL>'
>>> string_data.split(chr(0))
['sword', 'Sword_Wea_Dummy', '']
>>>
Now, to get each string, a simple split will suffice, and you can pass the rest of the file, contained in data, to the appropriate function to be processed.
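The session above is Python 2. As a hedged sketch of the same idea under Python 3, where both parts come back as bytes, the example below builds a small throwaway file instead of using the real sword.blendscn; the 4-byte data section and the name `get_data_pair` are assumptions made for the illustration:

```python
import os
import tempfile


def get_data_pair(filename, known_size):
    """Split a file into (leading part of unknown size, trailing part of known size)."""
    size = os.path.getsize(filename)
    with open(filename, "rb") as f:
        unknown = f.read(size - known_size)
        known = f.read()
    return unknown, known


# Throwaway file: two NUL-terminated names followed by 4 bytes of "data".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sword\x00Sword_Wea_Dummy\x00" + b"\x01\x02\x03\x04")
    path = tmp.name

string_data, data = get_data_pair(path, 4)
os.remove(path)

# The pieces are bytes, so split on b"\x00" and decode each name.
names = [part.decode("ascii") for part in string_data.split(b"\x00") if part]
print(names, data)
```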

Doing file I/O one character at a time is horribly slow.
Instead use readline0, now on pypi: https://pypi.org/project/readline0/ . Or something like it.
In 3.x, there's a "newline" argument to open, but it doesn't appear to be as flexible as readline0.

Here is my implementation:
import struct
def read_null_str(f):
    r_str = ""
    while 1:
        back_offset = f.tell()
        try:
            r_char = struct.unpack("c", f.read(1))[0].decode("utf8")
        except:
            f.seek(back_offset)
            temp_char = struct.unpack("<H", f.read(2))[0]
            r_char = chr(temp_char)
        if ord(r_char) == 0:
            return r_str
        else:
            r_str += r_char

Python struct string to bin

I'm learning Python.
While looking at the struct module, I ran into a question:
Is it possible to convert a "string" to "bin" without giving the length?
For example, with an explicit length:
F = open("data.bin", "wb")
import struct
data = struct.pack("24s",b"This is an unknown string")
print(data)
F.write(data)
F.close()
I'm trying to do the same, but with an unknown length.
Thanks a lot!

If you have the string, use len to determine its length,
i.e.
data = struct.pack("{0}s".format(len(unknown_string)), unknown_string)
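If the file has to be read back later, the reader also needs to know that length. One common approach, shown here only as a sketch and not something the question asks for, is to write the length in front of the bytes. The helper names `pack_string` and `unpack_string` and the little-endian 4-byte prefix are assumptions chosen for the example:

```python
import struct


def pack_string(payload):
    """Pack bytes of any length as a 4-byte length followed by the bytes themselves."""
    return struct.pack("<I{}s".format(len(payload)), len(payload), payload)


def unpack_string(blob):
    """Reverse pack_string: read the length first, then exactly that many bytes."""
    (length,) = struct.unpack_from("<I", blob, 0)
    (payload,) = struct.unpack_from("<{}s".format(length), blob, 4)
    return payload


packed = pack_string(b"This is an unknown string")
restored = unpack_string(packed)
print(len(packed), restored)
```

The "<" prefix selects standard sizes without alignment padding, so the length always occupies exactly 4 bytes.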

The bytes type is a binary data type: it simply stores a sequence of 8-bit values. Note that the code with struct.pack ends up creating a bytes object:
>>> import struct
>>> data = struct.pack("24s",b"This is an unknown string")
>>> type(data)
<class 'bytes'>
>>> len(data)
24
The length of this is 24 as per your format specifier. If you just want to place the bytes-string directly into the file without doing any length checking you don't even need to use the struct module, you can just write it directly to the file:
F = open("data.bin", "wb")
F.write(b"This will work")
If however you wanted to keep the 24 bytes length you could keep using struct.pack:
>>> data = struct.pack("24s",b"This is an unknown st")
>>> len(data)
24
>>> print(data)
b'This is an unknown st\x00\x00\x00'
>>> data = struct.pack("24s",b"This is an unknown string abcdef")
>>> print(data)
b'This is an unknown strin'
If the bytes object you supply is too short, struct.pack pads the remainder with zero bytes; if it is too long, struct.pack truncates it.
If you don't mind the missing space being padded out with zeros, you can just pass the bytes object directly to struct.pack and it will handle it.

Thanks to both...
My new code:
F = open("data.bin", "wb")
strs = b"This is an unkown string"
import struct
data = struct.pack("{0}s".format(len(strs)),strs)
print(data)
F.write(data)
F.close()

Python: converting hex values, stored as string, to hex data

(Answer found. Close the topic)
I'm trying to convert hex values, stored as a string, into hex data.
I have:
data_input = 'AB688FB2509AA9D85C239B5DE16DD557D6477DEC23AF86F2AABD6D3B3E278FF9'
I need:
data_output = '\xAB\x68\x8F\xB2\x50\x9A\xA9\xD8\x5C\x23\x9B\x5D\xE1\x6D\xD5\x57\xD6\x47\x7D\xEC\x23\xAF\x86\xF2\xAA\xBD\x6D\x3B\x3E\x27\x8F\xF9'
I tried data_input.decode('hex') and binascii.unhexlify(data_input), but both return:
"\xabh\x8f\xb2P\x9a\xa9\xd8\\#\x9b]\xe1m\xd5W\xd6G}\xec#\xaf\x86\xf2\xaa\xbdm;>'\x8f\xf9"
What should I write to get all bytes in '\xFF' form?
Update:
I need the '\xFF' representation in order to write this data to a file (opened with 'wb') as:
«hЏІPљ©Ш\#›]бmХWЦG}м#Ї†тЄЅm;>'Џщ
update2
Sorry for bothering. The answer was under my nose all along:
data_output = data_input.decode('hex')
write_file(filename, data_output) #just opens a file with 'wb' and writes the data to it
gives exactly the result I need.

I like chopping strings into fixed-width chunks using re.findall
print '\\x' + '\\x'.join(re.findall('.{2}', data_input))
If you want to actually convert the string into a list of ints, you can do that like this:
data = [int(x, 16) for x in re.findall('.{2}', data_input)]
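The snippets in this thread are Python 2 (str.decode('hex') does not exist in Python 3). A hedged Python 3 sketch of the same conversions follows; the input is a shortened sample rather than the full value from the question:

```python
import re

data_input = "AB688FB2509AA9D8"   # shortened sample, not the full value from the question

# bytes.fromhex does the conversion that str.decode('hex') did in Python 2.
data_output = bytes.fromhex(data_input)

# The same bytes as a list of ints, matching the re.findall approach.
as_ints = [int(pair, 16) for pair in re.findall("..", data_input)]

# A purely textual \xAB\x68... rendering, if that is needed for display.
escaped = "".join("\\x{:02X}".format(b) for b in data_output)

print(data_output)
print(as_ints)
print(escaped)
```

The repr of `data_output` shows printable bytes as characters (for example `h` instead of `\x68`), but the stored bytes are identical, which is why writing it to a file opened with 'wb' gives the expected result.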

It's an inefficient solution, but there's always:
flag = True
data_output = ''
for char in data_input:
    if flag:
        buffer = char
        flag = False
    else:
        data_output = data_output + '\\x' + buffer + char
        flag = True
EDIT HOPEFULLY THE LAST: Who knew I could mess up in so many different ways on that simple a loop? Should actually run now...

>>> int('0x10AFCC', 16)
1093580
>>> hex(1093580)
'0x10afcc'
So prepend your string with '0x' then do the above

File is not decoded properly

I have a file encoded in a strange pattern. For example,
Char (1 byte) | Integer (4 bytes) | Double (8 bytes) | etc...
So far, I have written the code below, but I have not been able to figure out why it still shows garbage on the screen. Any help will be greatly appreciated.
BRK_File = 'commands.BRK'
input = open(BRK_File, "rb")
rev = input.read(1)
filesize = input.read(4)
highpoint = input.read(8)
which = input.read(1)
print 'Revision: ', rev
print 'File size: ', filesize
print 'High point: ', highpoint
print 'Which: ', which
while True:
    opcode = input.read(1)
    print 'Opcode: ', opcode
    if opcode == 120:
        break
    elif
        #other opcodes

read() returns the raw bytes as a string, which you need to decode to get the actual values. You could use the struct module to do the decoding.
Something along the following lines should do the trick:
import struct
...
fmt = '=cid' # char, int, double; '=' turns off the padding that native alignment would insert
data = input.read(struct.calcsize(fmt))
rev, filesize, highpoint = struct.unpack(fmt, data)
You may have to deal with endianness issues, but struct makes that pretty easy.
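As a rough sketch of how the byte-order prefix and the packed layout fit together, the example below builds a fake header in memory and reads it back. The little-endian "<" prefix, the sample values, and the names `HEADER_FMT` and `sample` are assumptions for illustration; the real file's byte order and remaining fields are not known from the question:

```python
import io
import struct

# "<" selects little-endian with no padding; the byte order is an assumption.
HEADER_FMT = "<cid"                       # char, int, double
HEADER_SIZE = struct.calcsize(HEADER_FMT)

# Fake file contents built for the example: header, then the opcode byte 120.
sample = struct.pack(HEADER_FMT, b"A", 1024, 2.5) + bytes([120])
stream = io.BytesIO(sample)

rev, filesize, highpoint = struct.unpack(HEADER_FMT, stream.read(HEADER_SIZE))
(opcode,) = struct.unpack("B", stream.read(1))   # unsigned byte as an int

print(rev, filesize, highpoint, opcode)
```

Reading the opcode with "B" yields an integer, so a comparison such as `opcode == 120` behaves as intended instead of comparing a one-character string with a number.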

It would be helpful to show the contents of the file, as well as the "garbage" that it's outputting.
input.read() returns a string, so you have to convert what you're reading to the type that you want. I suggest looking into the struct module.
