Unicode Decode Error while reading text from image - python

I used the code from "Reading text from image" to read text from an image file.
The code is as follows:
from PIL import Image
from pytesseract import image_to_string
image = Image.open("image.jpg",'r')
myText = image_to_string(Image.open(open('maxresdefault.jpg')),config='-psm 10')
myText = image_to_string(Image.open(open('maxresdefault.jpg')))
print(myText)
Error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 278: character maps to <undefined>
I tried the fix suggested in the following question: UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
Then I got this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

According to the Image documentation (help(Image.open)), image files must be opened in binary mode:
open('maxresdefault.jpg', 'rb')
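To see why the mode matters, here is a minimal sketch that uses only the standard library. It assumes a stand-in file containing the first bytes of a typical JPEG (the name `sample.jpg`, the constant `JPEG_HEADER`, and the helper `read_modes` are made up for illustration). Reading it in text mode tries to decode the bytes and fails on the leading 0xff, while binary mode returns the bytes untouched.

```python
import os
import tempfile

# First bytes of a typical JPEG file; a stand-in for a real image (assumption).
JPEG_HEADER = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"


def read_modes(path):
    """Return (text_mode_error, binary_data) for the file at `path`."""
    text_error = None
    try:
        # Text mode: the bytes are decoded with the given codec while reading.
        with open(path, "r", encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_error = exc
    # Binary mode: the bytes are returned exactly as stored on disk.
    with open(path, "rb") as f:
        data = f.read()
    return text_error, data


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jpg")
    with open(sample_path, "wb") as f:
        f.write(JPEG_HEADER)
    error, raw = read_modes(sample_path)

print(error)    # the decode error raised in text mode
print(raw[:2])  # the first two raw bytes read in binary mode
```

On Windows the default text codec is often cp1252 rather than utf-8, which would explain why the first error in the question mentions 'charmap' instead.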

Load the image in binary mode.
The following change solved the problem for me. Note that PIL.Image.open() only accepts mode "r", so the binary mode goes on the built-in open() call:
import PIL.Image
pil_image = PIL.Image.open(open(image_path, "rb"))
Hope it helps!

Related

unicode decode error while importing Medical Data on pandas

I tried importing medical data and ran into this Unicode error. Here is my code:
import glob
import os
import pandas as pd

output_path = r"C:/Users/muham/Desktop/AI projects/cancer doc classification"
my_file = glob.glob(os.path.join(output_path, '*.csv'))
for files in my_file:
    data = pd.read_csv(files)
    print(data)
My error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 3314: invalid start byte
Try a different encoding; the default is utf-8. For example:
import pandas
pandas.read_csv(path, encoding="cp1252")
or latin1, etc.
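As a rough sketch of the "try other encodings" idea, the helper below (a hypothetical `decode_with_fallback`, not part of pandas) walks a list of candidate encodings and returns the first one that decodes the raw bytes. The candidate list and the sample bytes are assumptions chosen for illustration; the encoding name it reports could then be passed as the `encoding` argument shown above.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first encoding that decodes `raw`."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next one
    raise ValueError("none of the candidate encodings could decode the data")


# Made-up sample: 0x93 and 0x94 are curly quotes in cp1252 but invalid in UTF-8.
sample = b"\x93quoted\x94 text"
text, used = decode_with_fallback(sample)
print(used, text)
```

Keep latin-1 last: it maps every byte to some character, so it never fails, but it can silently produce the wrong characters.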

Tabula-py windows- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position N: invalid start byte

I'm trying to read a PDF file like this:
di = read_pdf('test.pdf', output_format= 'json', encoding='utf-8', guess = False)
It works fine on Linux, but on Windows I get the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position N: invalid start byte". Please help me understand the issue.

UnicodeDecodeError: 'utf8' codec can't decode byte 0x8e in position 1

This is my first post here, so excuse me if I miss anything.
I have some data in a CSV file and am trying to import it into my prod environment, but I am getting a UnicodeDecodeError. The CSV file contains some French words.
Code:
open_csv = csv.DictReader(open('filename','rb'))
for i in open_csv:
    x = find(where={})  # mongodb query
    x.something = i.get(row_header)
    x.save()
I am getting "UnicodeDecodeError: 'utf8' codec can't decode byte 0x8e in position 1" while saving the data.
I would suggest trying the following code:
import codecs
import csv
# Without an encoding argument, codecs.open() behaves like the built-in open().
open_csv = csv.DictReader(codecs.open('filename','rb'))
for i in open_csv:
    x = find(where={})
    x.something = i.get(row_header)
    x.save()
I work in Python 3.x, but this should work in 2.x too if that is what you are using.
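In Python 3 the csv module expects text rather than bytes, so one option is to decode the file with an explicit encoding before parsing it. The sketch below is only an illustration: the sample data, the helper name `read_rows`, and the choice of cp1252 are assumptions, and the real file in the question may use a different encoding.

```python
import csv
import io

# Hypothetical sample: a small CSV with French words, encoded as cp1252 (assumption).
raw = "name,city\nRen\u00e9,Orl\u00e9ans\n".encode("cp1252")


def read_rows(binary_stream, encoding):
    """Decode a binary stream with an explicit encoding and parse it as CSV."""
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.DictReader(text_stream))


rows = read_rows(io.BytesIO(raw), "cp1252")
print(rows[0]["name"], rows[0]["city"])
```

With a file on disk, the equivalent would be open('filename', encoding=..., newline='') passed straight to csv.DictReader.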

Write unicode gif to file in python

I have a GIF file (or any image format) in unicode form:
>>> data
u'GIF89a,\x000\x00\ufffd\ufffd\x00\x00\x00\x00\ufffd\ufffd\ufff...
I want to write this to file:
>>> f = open('file.gif', 'wb')
>>> f.write(data)
But I get an error:
UnicodeEncodeError at /image
'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
How do I do this?
Try this:
utf8data = data.encode('UTF-8')
open('file.gif', 'wb').write(utf8data)
You must encode the Unicode string to bytes explicitly:
f.write(data.encode('utf-8'))
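One caveat worth checking: the \ufffd characters in the question's data are Unicode replacement characters, which suggests the image bytes may already have been decoded lossily before reaching this point. The sketch below uses made-up header-like bytes (not a real GIF) to show that such a round trip does not restore the original data, so encoding alone may not yield a valid image.

```python
# Made-up bytes that start like a GIF header; not a complete image (assumption).
gif_bytes = b"GIF89a,\x000\x00\xf7\xff\x00\x00\x00\x00"

# Decoding binary data as text with errors="replace" swaps every
# undecodable byte for U+FFFD, discarding the original value.
as_text = gif_bytes.decode("utf-8", errors="replace")

# Encoding the text again writes the UTF-8 form of U+FFFD, not the lost bytes.
round_trip = as_text.encode("utf-8")

print("\ufffd" in as_text)      # whether replacement characters were introduced
print(round_trip == gif_bytes)  # whether the original bytes survived
```

If that is what happened, the more reliable fix is to keep the image as bytes from the point where it is received and write those bytes with mode 'wb'.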

urllib2 opener providing wrong charset

When I open the URL and read it, I can't recognize the content. But when I check the content header, it says the content is encoded as UTF-8. So I tried to convert it to Unicode using unicode(), and it complained: UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128).
.encode("utf-8") produces
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)
.decode("utf-8") produces
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte.
I have tried everything I can come up with (I'm not that good at encodings).
I would be happy if I could get this to work. Thanks.
This is a common mistake. The server is sending a gzipped stream.
You should unpack it first:
import gzip
import StringIO

response = opener.open(self.__url, data)
if response.info().get('Content-Encoding') == 'gzip':
    # Wrap the compressed bytes in a file-like object so GzipFile can read them.
    buf = StringIO.StringIO(response.read())
    gzip_f = gzip.GzipFile(fileobj=buf)
    content = gzip_f.read()
else:
    content = response.read()
The header is probably wrong. Check out chardet.
EDIT: Thinking more about it -- my money is on the contents being gzipped. I believe some of Python's various URL-opening modules/classes/etc will ungzip, while others won't.
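The byte 0x8b at position 1 in the error messages fits the gzip explanation, since every gzip stream starts with the two bytes 0x1f 0x8b. Here is a small Python 3 sketch of that check (the thread itself uses Python 2's urllib2); the helper name `decode_body` and the sample payload are made up for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def decode_body(body, charset="utf-8"):
    """Decompress `body` if it looks like gzip, then decode it as text."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(charset)


# Made-up payloads standing in for a response body.
compressed = gzip.compress("h\u00e9llo".encode("utf-8"))
print(decode_body(compressed))
print(decode_body(b"plain"))
```

Checking the Content-Encoding header, as in the answer above, is the more conventional test; sniffing the magic bytes is only a fallback for when the header is missing or unreliable.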
