Python script to convert from UTF-8 to ASCII [duplicate] - python

I'm trying to write a Python script that converts UTF-8 files to ASCII files:
#!/usr/bin/env python
# *-* coding: iso-8859-1 *-*
import sys
import os
filePath = "test.lrc"
fichier = open(filePath, "rb")
contentOfFile = fichier.read()
fichier.close()
fichierTemp = open("tempASCII", "w")
fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
fichierTemp.close()
When I run this script I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)
I thought the 'ignore' argument to the encode method would make it skip such errors, but apparently it does not.
I'm open to other ways to convert.

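# Python 2: decode the UTF-8 bytes to unicode first, then encode to ASCII.
data = "UTF-8 DATA"
udata = data.decode("utf-8")
asciidata = udata.encode("ascii", "ignore")  # non-ASCII characters are dropped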

import codecs
...
fichier = codecs.open(filePath, "r", encoding="utf-8")
...
fichierTemp = codecs.open("tempASCII", "w", encoding="ascii", errors="ignore")
fichierTemp.write(contentOfFile)
...

UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
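The decode-then-encode idea from the answers above can be sketched for Python 3 as follows. The helper name `utf8_bytes_to_ascii` and the optional NFKD normalization step are illustrative assumptions, not part of the original answers; normalization is just one way to keep the base letter of an accented character instead of losing it entirely.

```python
import unicodedata


def utf8_bytes_to_ascii(raw, strip_accents=False):
    """Decode UTF-8 bytes and re-encode them as ASCII, dropping what does not fit.

    Hypothetical helper for illustration only.
    """
    text = raw.decode("utf-8")
    if strip_accents:
        # NFKD splits an accented letter into a base letter plus a combining
        # mark; the mark is then dropped by errors="ignore".
        text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", errors="ignore")


sample = "café".encode("utf-8")
plain = utf8_bytes_to_ascii(sample)
folded = utf8_bytes_to_ascii(sample, strip_accents=True)
```

Either way the conversion is lossy, as the last answer points out: any character without an ASCII counterpart disappears.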

Related

Convert multiple CSV files into UTF-8 encoding

I need to convert multiple CSV files (with different encodings) into UTF-8.
Here is my code:
#find encoding and if not in UTF-8 convert it
import os
import sys
import glob
import chardet
import codecs
myFiles = glob.glob('/mypath/*.csv')
csv_encoding = []
for file in myFiles:
    with open(file, 'rb') as opened_file:
        bytes_file=opened_file.read()
        result=chardet.detect(bytes_file)
        my_encoding=result['encoding']
        csv_encoding.append(my_encoding)
print(csv_encoding)
for file in myFiles:
    if csv_encoding in ['utf-8', 'ascii']:
        print(file + ' in utf-8 encoding')
    else:
        with codecs.open(file, 'r') as file_for_conversion:
            read_file_for_conversion = file_for_conversion.read()
        with codecs.open(file, 'w', 'utf-8') as converted_file:
            converted_file.write(read_file_for_conversion)
        print(file +' converted to utf-8')
When I try to run this code I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 5057: invalid continuation byte
Can someone help me? Thanks!!!
You need to zip the lists myFiles and csv_encoding to get their values aligned:
for file, encoding in zip(myFiles, csv_encoding):
    ...
And you need to specify that value in the open() call:
    ...
    with codecs.open(file, 'r', encoding=encoding) as file_for_conversion:
Note: in Python 3 there's no need to use the codecs module for opening files.
Just use the built-in open function and specify the encoding with the encoding parameter.
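A minimal Python 3 sketch of that advice, using only the built-in open(). The function name `convert_to_utf8` is hypothetical, and the source encoding is passed in as a parameter on the assumption that it was detected beforehand (for example with chardet, as in the question). The demo uses Latin-1 purely as an example; the byte 0xf3 from the error message is "ó" in that encoding, but the real files may use something else.

```python
import os
import tempfile


def convert_to_utf8(path, source_encoding):
    """Rewrite the file at `path` as UTF-8, decoding it with `source_encoding`."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


def demo():
    """Create a small Latin-1 CSV, convert it, and return the resulting bytes."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write("name;city\nJosé;Córdoba\n".encode("latin-1"))
        convert_to_utf8(path, "latin-1")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


converted = demo()
```

Reading the whole file before reopening it for writing keeps the sketch short; for large files, writing to a temporary file and renaming it would be safer.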

How can I solve Python Unicode Encoding Error? [duplicate]

I was trying to load a .json file. The file is an English word dictionary that contains many non-ASCII characters such as \u266f. Passing encoding="utf8" does not fix the error. I then replaced all of the Unicode escapes with UTF-8, but it still shows the same error.
My code:
import json
data = json.load(open("data.json", encoding="utf8"))
print(data)
Result:
Traceback (most recent call last):
  File "E:\Python\Dictionary App\test.py", line 4, in <module>
    print(data)
  File "C:\Users\ahnaf\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u266f' in position 657370: character maps to <undefined>
[Finished in 0.32s]
The json file: data.json
Try to include:
# -*- coding: utf-8 -*-
or
# coding: utf-8
at the top of your Python file.
Example:
# coding: utf-8
import json
with open("data.json") as f:
    data = json.loads(f.read())
print(data)
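Note that the traceback in the question is raised by print(), inside cp1252.py, so the failure happens when the text is encoded for the console rather than when the file is read. A coding comment only declares the encoding of the .py source file itself. One possible workaround, sketched here with a hypothetical helper `safe_for_console`, is to replace characters the console codec cannot represent with backslash escapes before printing:

```python
def safe_for_console(text, console_encoding):
    """Return `text` with characters missing from the console codec escaped.

    Hypothetical helper: unencodable characters become backslash escapes
    instead of raising UnicodeEncodeError.
    """
    encoded = text.encode(console_encoding, errors="backslashreplace")
    return encoded.decode(console_encoding)


escaped = safe_for_console("C\u266f major", "cp1252")
unchanged = safe_for_console("plain text", "cp1252")
```

In the script this would be used as print(safe_for_console(str(data), sys.stdout.encoding or "utf-8")). On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") is another option, assuming the terminal can actually display UTF-8.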

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0627' in position 0: ordinal not in range(128)

I got the error in the title when I tried to save the data to a CSV file, and I don't know how to fix it.
# -*- coding: utf-8 -*-
keys = sorted(self.Details.keys()) #### 1st sort the values of dictionary list
with open("test.csv", "wb") as outfile:
    writer = csv.writer(outfile, delimiter = "\t")
    writer.writerow(keys)
    writer.writerows(zip(*[self.Details[key] for key in keys]))
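In Python 2 the csv module implicitly encodes unicode values as ASCII when writing, but your data has characters outside ASCII. If you are using Python 3, open the file in text mode with an explicit encoding (binary mode does not accept an encoding argument):
with open("test.csv", "w", encoding='utf-8', newline='') as outfile:
If you are using Python 2, you can try the unicodecsv package: https://pypi.org/project/unicodecsv/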
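For Python 3, the whole write step might look like the sketch below. The function name `write_columns` and the sample data are made up for illustration; `details` stands in for self.Details from the question and is assumed to map column names to equal-length lists.

```python
import csv
import os
import tempfile


def write_columns(path, details):
    """Write a dict of equal-length column lists as a tab-separated UTF-8 file."""
    keys = sorted(details)
    # Text mode with an explicit encoding; newline="" is what the csv docs
    # recommend so the writer controls line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="\t")
        writer.writerow(keys)
        writer.writerows(zip(*[details[key] for key in keys]))


def demo():
    """Write sample data containing Arabic text and read it back."""
    details = {"name": ["\u0627\u062d\u0645\u062f", "Bob"], "age": [30, 25]}
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_columns(path, details)
        with open(path, encoding="utf-8", newline="") as infile:
            return list(csv.reader(infile, delimiter="\t"))
    finally:
        os.remove(path)


rows = demo()
```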

UnicodeDecodeError when import json file

I want to open a JSON file in Python, but I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 64864: ordinal not in range(128)
my code is quite simple:
# -*- coding: utf-8 -*-
import json
with open('birdw3l2.json') as data_file:
    data = json.load(data_file)
print(data)
Can someone help me? Thanks!
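Try the following code. It assumes Python 3 and a UTF-8 encoded file, and decodes the file when opening it (the object returned by json.load has no decode method):
import json
with open('birdw3l2.json', encoding='utf-8') as data_file:
    data = json.load(data_file)
print(data)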
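You should specify the encoding when you load your JSON file. In Python 2, json.load accepts an encoding argument:
data = json.load(data_file, encoding='utf-8')
In Python 3 that argument has no effect (it was removed in 3.9), so pass encoding to open() instead.
The right value depends on how your file is encoded.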
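Another way to sidestep the locale-dependent default encoding in Python 3.6 or later is to open the file in binary mode and let json.load decode the bytes itself (it accepts UTF-8, UTF-16 or UTF-32 input). A small sketch, with a hypothetical helper name and made-up sample data; the byte 0xe2 from the error message is, for instance, the first byte of a UTF-8 encoded curly apostrophe:

```python
import json
import os
import tempfile


def load_json_binary(path):
    """Load JSON from a file opened in binary mode (Python 3.6+)."""
    # No text decoding happens in open(); json.load detects the encoding
    # (UTF-8, UTF-16 or UTF-32) from the raw bytes.
    with open(path, "rb") as data_file:
        return json.load(data_file)


def demo():
    """Write a UTF-8 JSON file containing a non-ASCII character and load it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write('{"note": "bird\u2019s nest"}'.encode("utf-8"))
        return load_json_binary(path)
    finally:
        os.remove(path)


loaded = demo()
```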

UnicodeEncodeError -- utf8 and unicode() not working

I have a synopsis as follows:
synopsis = 'Eine Geschichte, wie im normalen Leben... Der als äußerst vorsichtig geltende Risikoanalytiker Ruben verlässt seine Frau,...'
I am trying to write this to a file, but keep running into:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 705: ordinal not in range(128)
Here is what I've tried:
synopsis = unicode(synopsis)
new_file.write('%s' % synopsis)
synopsis = synopsis.encode('utf-8')
new_file.write('%s' % synopsis)
In addition, I have # -*- coding: utf-8 -*- specified at the top of my file.
Why is this occurring and how can I fix it?
How are you opening new_file?
import codecs
new_file = codecs.open('out', mode='w', encoding='utf-8')
This should allow you to write Unicode strings to the file, which will be encoded as UTF-8.
(In Python 2, sys.getdefaultencoding() is 'ascii' unless otherwise set, and that codec is used to implicitly encode Unicode strings written to a file opened with the plain built-in open().)
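In Python 3 the same effect is available from the built-in open() without the codecs module. A minimal sketch, with a hypothetical helper name and a shortened version of the synopsis from the question:

```python
import os
import tempfile


def write_text_utf8(path, text):
    """Write a str to `path`, encoding it as UTF-8."""
    with open(path, "w", encoding="utf-8") as new_file:
        new_file.write(text)


def demo():
    """Write German text with umlauts and return the bytes stored on disk."""
    synopsis = "Der als äußerst vorsichtig geltende Risikoanalytiker Ruben"
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_text_utf8(path, synopsis)
        with open(path, "rb") as f:
            return synopsis, f.read()
    finally:
        os.remove(path)


original, stored = demo()
```

Stating the encoding explicitly means the result does not depend on the platform's default encoding.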
