Encode/decode documents to base64 dynamically - python

How do I encode pdf and word files in a folder to base64 and decode them and save into the same folder?
The pdf and word files are generated dynamically through a web service.
I would like to use python to do so.
I used this, but it gives an error:
base64.encode(open("hello.pdf"), open("hello1.b64", "w"))
Traceback (most recent call last):
File "sample.py", line 7, in <module>
base64.encode(open("hello.pdf"), open("hello1.b64", "w"))
File "C:\Python34\lib\base64.py", line 496, in encode
s = input.read(MAXBINSIZE)
File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1340: character maps to <undefined>

Use the base64 module, which is included in the standard library; its documentation is here. The traceback shows the real problem: both files are opened in text mode, so Python tries to decode the PDF's raw bytes with the Windows default codec (cp1252) and fails on byte 0x9d. Open the input as "rb" and the output as "wb" and the error goes away.

Related

unable to debug error produced while reading csv in python

I am trying to read a csv file. The code I've written below gives an error (shown after the code block). Not sure what I am missing or doing wrong.
import csv

file = open('AlfaRomeo.csv')
csvreader = csv.reader(file)
for j in csvreader:
    print(j)
Traceback (most recent call last):
File "C:\Users\Pratik\PycharmProjects\AkraScraper\Transform_Directory\Developer_Sandbox.py", line 39, in <module>
for j in csvreader:
File "C:\Users\Pratik\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 402: character maps to <undefined>
The error means there is a byte in your input file that fails to decode under the default codec (cp1252 on Windows). Its value is 0x8d (141 decimal), and it sits at offset 402 in the file. I suggest loading the file in a text editor and searching forward until you find it. So you know what you're looking for: it's in the Extended ASCII section of https://www.asciitable.com/.

Python read text

I am simply trying to read a text file that has 4000+ lines of nouns, all in a single column, and I’m getting an error:
Traceback (most recent call last):
File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/nouns.py", line 4, in <module>
for i in nouns_file:
File "/var/containers/Bundle/Application/107074CD-03B1-4FB3-809A-CBD44D6CF245/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/encodings/ascii.py", line 27, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2241: ordinal not in range(128)
With code:
with open("nounlist.txt", "r") as nouns_file:
    for i in nouns_file:
        print(i)
I’m not sure what’s causing this. I would think that it would just output all of the nouns from my nounlist.txt file.

json.load() function give strange 'UnicodeDecodeError: 'ascii' codec can't decode' error

I'm trying to read a JSON file I have saved in a text file, using Python's json.load() function. I will later parse the JSON to obtain a specific value.
I keep getting this error message. When I google it, there are no results.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 85298: ordinal not in range(128)
Here is the full error message:
Traceback (most recent call last):
File ".../FirstDegreeKanyeScript.py", line 10, in <module>
data=json.load(data_file)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 265, in load
return loads(fp.read(),
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 85298: ordinal not in range(128)
Here is my code:
import json
from pprint import pprint
with open("/Users/.../KanyeAllSongs.txt") as data_file:
    data = json.load(data_file)
pprint(data)
I've tried adding data.decode('utf-8') under the json.load, but I still get the same error.
Any ideas what could be the issue?
Specify the encoding in the open call.
# encoding is a keyword argument
with open("/Users/.../KanyeAllSongs.txt", encoding='utf-8') as data_file:
    data = json.load(data_file)

Python ignores encoding argument in favor of cp1252

I have a lengthy json file that contains utf-8 characters (and is encoded in utf-8). I want to read it in python using the built-in json module.
My code looks like this:
dat = json.load(open("data.json"), "utf-8")
I understand the "utf-8" argument should be unnecessary, as it is assumed as the default. However, I get this error:
Traceback (most recent call last):
File "winratio.py", line 9, in <module>
dat = json.load(open("data.json"), "utf-8")
File "C:\Python33\lib\json\__init__.py", line 271, in load
return loads(fp.read(),
File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 28519: character maps to <undefined>
My question is: Why does python seem to ignore my encoding specification and try to load the file in cp1252?
Python isn't ignoring your encoding: the failure happens inside fp.read(), i.e. in the file object created by open(), and open() without an explicit encoding falls back to the Windows locale default, cp1252. In Python 3, json.load() has no positional encoding parameter; decoding is the file object's job. Try this:
import codecs
dat = json.load(codecs.open("data.json", "r", "utf-8"))
Some tips about writing modes with the codecs library are described here: Write to UTF-8 file in Python

UnicodeDecodeError reading string in CSV

I'm having a problem reading some chars in python.
I have a csv file in UTF-8 format, and I can read it, but when the script reads:
Preußen Münster-Kaiserslautern II
I get this error:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 515, in __call__
handler.get(*groups)
File "/Users/fermin/project/gae/cuotastats/controllers/controllers.py", line 50, in get
f.name = unicode( row[1])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
I tried to use Unicode functions and to convert the string to Unicode, but I haven't found a solution. I also tried sys.setdefaultencoding('utf8'), but that doesn't work either.
Try the unicode_csv_reader() generator described in the csv module docs.