Want the code to read ' instead of ’ - python

I am trying to convert a csv file to a json file. The code runs fine until it reaches the statement:
json.dump(DictName, out_file)
I get the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 15: invalid start byte
Would someone please be able to help?
TIA.

I found a solution. While parsing, I converted each string to unicode with the (Python 2) unicode() function:
unicode(stringname, errors='replace')
and it replaced every byte that could not be decoded with the replacement character.
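Byte 0x92 is the right single quotation mark (’) in Windows-1252, which suggests the CSV was saved in that encoding rather than UTF-8. A minimal Python 3 sketch, assuming cp1252 really is the source encoding; the sample bytes and the helper name to_straight_quotes are made up for illustration:

```python
# Assumption: the raw CSV bytes are Windows-1252 (cp1252), where 0x92 is the curly apostrophe.
raw = b"It\x92s a test"

def to_straight_quotes(text):
    """Replace curly single and double quotes with their ASCII equivalents."""
    table = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})
    return text.translate(table)

decoded = raw.decode("cp1252")         # curly apostrophe preserved as U+2019
cleaned = to_straight_quotes(decoded)  # curly apostrophe mapped to '
print(cleaned)
```

Unlike errors='replace', this keeps the apostrophe as a readable character instead of turning it into U+FFFD.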

Related

Trying to import a csv file, containing non-ascii characters, to a dataframe

When trying to import a csv file into a pandas dataframe I get a UnicodeEncodeError because some of the characters in the csv can't be encoded by ascii. The csv is originally encoded in utf-8.
My code:
df1 = pd.read_csv(r'‪F:\data\Housing.csv')
UnicodeEncodeError: 'ascii' codec can't encode character '\u202a' in position 0: ordinal not in range(128)
Now I have tried some suggestions posted on stackoverflow to resolve this issue, but alas nothing has worked as of yet.
For instance, I saved the csv file as ascii encoded and tried using the open command hoping I could work my way to a dataframe from there:
open('‪F:\data\Housing.csv', mode='r', encoding='ascii', errors='replace')
However, whether I use 'replace' or 'ignore', the error remains. I have also tried the original encoding='utf-8', with the same result:
UnicodeEncodeError: 'ascii' codec can't encode character '\u202a' in position 0: ordinal not in range(128)
I also tried using codecs.open, but the same result persists.
Perhaps someone here knows how one can solve this issue? Preferably I would replace the characters causing errors with a ? sign.
Thanks in advance!
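The character named in the error, '\u202a', is LEFT-TO-RIGHT EMBEDDING, an invisible formatting character sitting directly after the opening quote of the path. That suggests the problem is in the pasted path string rather than in the CSV contents. A sketch that strips such characters, with a hypothetical helper name strip_format_chars:

```python
import unicodedata

def strip_format_chars(path):
    """Drop invisible Unicode format characters (category Cf), such as U+202A."""
    return "".join(ch for ch in path if unicodedata.category(ch) != "Cf")

pasted = "\u202aF:\\data\\Housing.csv"  # what the pasted literal actually contains
clean = strip_format_chars(pasted)
print(repr(clean))
```

Deleting the path literal and retyping it by hand should have the same effect.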

Python Encoding Error when writing to file

I want to write some strings to a file. They are not in English; they are in Azeri. Even when I encode them as utf-8 I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)
The piece of code that writes to the file is the following:
t_w = text_list[y].encode('utf-8')
new_file.write(t_w.decode('utf-8'))
new_file.write('\n')
EDIT
Even if I change the code to:
t_w = text_list[y].encode('ascii',errors='ignore')
new_file.write(t_w)
new_file.write('\n')
I get the following error:
TypeError: write() argument must be str, not bytes
From what I can tell, t_w.decode(...) attempts to convert your characters to ASCII, which doesn't encode some Azeri characters. There is no need to decode the string because you want to write it to the file as UTF-8, so omit the .decode(...) part:
new_file.write(t_w)
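The 'ascii' codec in the traceback may also mean that the file object itself was opened with an ASCII default encoding. One alternative sketch, assuming Python 3: pass encoding='utf-8' to open() and write str values directly, since writing bytes to a text-mode file raises the TypeError quoted in the question. The output path and the sample words are illustrative only:

```python
import os
import tempfile

text_list = ["Azərbaycan", "gözəl", "şəhər"]  # sample Azeri words (illustrative)

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical output file

# Text mode with an explicit encoding: write() takes str, open() handles the encoding.
with open(path, "w", encoding="utf-8") as new_file:
    for line in text_list:
        new_file.write(line)
        new_file.write("\n")

# Read the file back to confirm the round trip.
with open(path, encoding="utf-8") as check:
    written = check.read().splitlines()
print(written)
```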

Python 3 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

I'm implementing this notebook on Windows with Python 3.5.3 and got the following error on the load_vectors() call. I've tried different solutions posted but none worked.
<ipython-input-86-dd4c123b0494> in load_vectors(loc)
1 def load_vectors(loc):
2 return (load_array(loc+'.dat'),
----> 3 pickle.load(open(loc+'_words.pkl','rb')),
4 pickle.load(open(loc+'_idx.pkl','rb')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
I solved this issue by copying and pasting the entire csv file into text and reading it with:
with open(self.path + "/review_collection.txt", "r", encoding="utf-8") as f:
    read = f.read().splitlines()
    for row in read:
        print(row)
You should probably pass an encoding, as in pickle.load(f, encoding='latin1'), but please make sure all the characters in your file actually follow that encoding.
By default, pickle tries to decode the file with 'ASCII', which fails. You can instead tell it explicitly which encoding to use; see the pickle documentation.
If latin1 doesn't solve it, try encoding='bytes' and then decode all the keys and values afterwards.
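To illustrate the encoding argument: a pickle written by Python 2 stores str objects as raw bytes, and Python 3 decodes them as ASCII unless told otherwise. A small sketch, using a hand-written protocol-0 payload as a stand-in for a Python 2 pickle (an assumption, since the real _words.pkl file is not available here):

```python
import pickle

# Stand-in for a Python 2 pickle of the byte string 'caf\xe9' (protocol 0).
py2_payload = b"S'caf\\xe9'\n."

try:
    pickle.loads(py2_payload)  # default encoding='ASCII'
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

as_text = pickle.loads(py2_payload, encoding="latin1")   # decoded to str
as_bytes = pickle.loads(py2_payload, encoding="bytes")   # left as bytes
print(default_ok, as_text, as_bytes)
```

latin1 never fails, but it produces mojibake if the bytes were really UTF-8. In that case encoding='bytes' followed by an explicit .decode('utf-8') keeps the decoding step under your control.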
I got the same error as well. I realized that I had copied and pasted text from a file that had left and right double-quotes (curly quotes). Once I changed them to standard straight double-quotes (") the issue was fixed!
See this link for the difference between the quotes: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

JSON encoding/decoding issues in Python

I'm trying to read in a response from a REST API, parse it as JSON and write the properties to a CSV file.
It appears some of the characters are in an unknown encoding and can't be converted to strings when they're written out to the CSV file:
'ascii' codec can't encode character u'\xf6' in position 15: ordinal not in range(128)
So, what I've tried to do is follow the answer by "agf" on this question:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
I added a call to unicode(content).encode("utf-8") when my script reads the contents of the response:
obj = json.loads(unicode(content).encode("utf-8"))
Now I see an exceptions.UnicodeDecodeError on this line.
Is Python attempting to decode "content" before encoding it as utf-8? I don't quite understand what's going on. There is no way to determine the encoding of the response since the API I'm calling doesn't set a Content-Type header.
Not sure how to handle this. Please advise.
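On whether Python decodes before encoding: in Python 2, calling unicode(content) on a byte string first decodes it with the default ASCII codec, which would explain the UnicodeDecodeError. A Python 3 sketch of the explicit route, assuming the API actually returns UTF-8 (with no Content-Type header this is a guess that needs verifying) and using a made-up response body:

```python
import csv
import io
import json

content = b'{"name": "Bj\xc3\xb6rn"}'  # made-up response body, UTF-8 bytes

# Decode the bytes explicitly, then parse the resulting text as JSON.
obj = json.loads(content.decode("utf-8"))

# Write through a UTF-8 text wrapper so non-ASCII values are encoded on output.
buffer = io.BytesIO()
with io.TextIOWrapper(buffer, encoding="utf-8", newline="") as out:
    csv.writer(out).writerow([obj["name"]])
    out.flush()
    csv_bytes = buffer.getvalue()
print(csv_bytes)
```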

UnicodeDecodeError in Python with codecs module

I have a text file which comprises unicode strings "aBiyukÙwa", "varcasÙva" etc. When I try to decode them in the python interpreter using the following code, it works fine and decodes to u'aBiyuk\xd9wa':
"aBiyukÙwa".decode("utf-8")
But when I read it from a file in a python program using the codecs module in the following code it throws a UnicodeDecodeError.
file = codecs.open('/home/abehl/TokenOutput.wx', 'r', 'utf-8')
for row in file:
Following is the error message:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd9 in position 8: invalid continuation byte
Any ideas what is causing this strange behavior?
Your file is not encoded in UTF-8. Find out what it is encoded in, and then use that.
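One way to find out is to try a short list of candidate encodings in order. Byte 0xd9 is Ù in Latin-1, so that is a plausible candidate here, though only whoever produced the file can confirm it. A sketch with a hypothetical helper decode_with_candidates:

```python
def decode_with_candidates(data, candidates=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

raw = b"aBiyuk\xd9wa"  # sample bytes containing the 0xd9 from the error message
text, used = decode_with_candidates(raw)
print(text, used)
```

Latin-1 accepts every byte sequence, so it has to come last in the list; a successful decode with it shows only that the bytes are readable, not that the guess is right.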
