UnicodeDecodeError while trying to change points into commas in Excel - python

I'm currently trying to run the following code:
import csv
with open("C:\\Users\\User\\Downloads\\stu21617.rw2.xlsx", 'r', encoding='utf-8') as infile:
    with open("C:\\Users\\User\\Documents\\jens\\komma.xlsx", 'w', encoding='utf-8') as outfile:
        tabel = []
        writer = csv.writer(outfile, delimiter = ';')
        for rij in csv.reader(infile, delimiter = ';'):
            writer.writerow(rij.replace('.', ','))
But I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 15-16: invalid continuation byte
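One likely cause, offered as a guess rather than a diagnosis: an .xlsx workbook is a ZIP archive rather than delimited text, so decoding it as UTF-8 fails before csv ever sees a row. The sketch below assumes the sheet has first been exported from Excel as a semicolon-delimited CSV. The helper name points_to_commas and the sample data are made up for illustration. It also replaces per cell, because each row from csv.reader is a list and has no replace method.

```python
import csv
import io


def points_to_commas(infile, outfile, delimiter=";"):
    """Copy delimited rows, replacing '.' with ',' in every cell."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # Each row is a list of strings, so replace inside each cell.
        writer.writerow([cell.replace(".", ",") for cell in row])


# In-memory stand-ins for the exported CSV and the output file (hypothetical data).
source = io.StringIO("naam;score\njens;7.5\n")
result = io.StringIO()
points_to_commas(source, result)
print(result.getvalue())
```

With real files, the same function could be given two handles opened with newline='' and the encoding the export actually uses.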

Related

How to read csv file using python that have multi line data in one field [duplicate]

I've read every post I can find, but my situation seems unique. I'm totally new to Python so this could be basic. I'm getting the following error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 70: character maps to <undefined>
When I run the code:
import csv
input_file = 'input.csv'
output_file = 'output.csv'
cols_to_remove = [4, 6, 8, 9, 10, 11,13, 14, 19, 20, 21, 22, 23, 24]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0 # Current amount of rows processed
with open(input_file, "r") as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='') as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count), end='')
            for col_index in cols_to_remove:
                del row[col_index]
            writer.writerow(row)
What am I doing wrong?
In Python 3, the csv module processes the file as unicode strings, so the input file first has to be decoded. You can use the exact encoding if you know it, or just use Latin1: it maps every byte to the unicode character with the same code point, so decoding and then encoding leaves the byte values unchanged. Your code could become:
...
with open(input_file, "r", encoding='Latin1') as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='', encoding='Latin1') as result:
        ...
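The round-trip claim can be checked directly. This is a small illustration, not part of the original answer: it uses the byte 0x8d from the error message, which Python's cp1252 codec leaves undefined (assuming cp1252 is the 'charmap' codec behind the error, as is common on Windows).

```python
# Every possible byte value survives decode + encode under Latin-1.
all_bytes = bytes(range(256))
round_tripped = all_bytes.decode("latin-1").encode("latin-1")
print(round_tripped == all_bytes)

# The byte from the error message has no mapping in cp1252 ...
try:
    b"\x8d".decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
print(cp1252_failed)

# ... but Latin-1 maps it to the code point with the same number.
as_latin1 = b"\x8d".decode("latin-1")
print(hex(ord(as_latin1)))
```

Note that this only preserves the bytes. If the file is really in another encoding, non-ASCII characters will look wrong while they are held as text, even though the output file ends up byte-for-byte consistent with the input.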
Add encoding="utf8" when opening the files. Try this instead:
with open(input_file, "r", encoding="utf8") as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='', encoding="utf8") as result:
        ...
Try pandas:
import pandas
input_file = pandas.read_csv('input.csv')
output_file = pandas.read_csv('output.csv')
Try saving the file again as CSV UTF-8
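If the file is re-saved with Excel's "CSV UTF-8" option, it will, as far as I know, start with a byte order mark. Assuming that is the case, the sketch below shows why encoding="utf-8-sig" may be a better fit than plain "utf-8" when reading it back. The sample data is invented.

```python
import csv
import io

# Simulate a file saved with a leading UTF-8 byte order mark (assumed Excel behaviour).
raw = "name,city\nZoë,Kraków\n".encode("utf-8-sig")

# Plain utf-8 keeps the BOM as an invisible character in the first header.
header_plain = next(csv.reader(io.StringIO(raw.decode("utf-8"))))

# utf-8-sig strips the BOM if it is present.
header_sig = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(repr(header_plain[0]))
print(repr(header_sig[0]))
```

The same codec name can be passed to open(), for example open(input_file, "r", newline='', encoding="utf-8-sig").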

ASCII Encode/Decode Error

I have CSV files encoded as UTF-8. I need to convert them to an Excel workbook with the same encoding, but I have been unable to do so. I have tried many things without success. Here is the code snippet.
Note: I am using the xlsxwriter package.
def csv_to_excel(input_file_path, output_file_path):
    file_path = input_file_path
    excel_file_path = output_file_path
    wb = Workbook(excel_file_path.encode('utf-8', 'ignore'), {'encoding': 'utf-8'})
    sheet1 = wb.add_worksheet(("anyname1").encode('utf-8','ignore'))
    sheet2 = wb.add_worksheet(("anyname2").encode('utf-8','ignore'))
    for filename in glob.glob(file_path):
        (f_path, f_name) = os.path.split(filename)
        w_tab = str(f_name.split('_')[2]).split('.')[0]
        if (w_tab=="anyname1"):
            w_sheet = sheet1
        elif (w_tab=="anyname2"):
            w_sheet = sheet2
        spamReader = csv.reader(open(filename, "rb"), delimiter=',',quotechar='"')
        row_count = 0
        for row in spamReader:
            for col in range(len(row)):
                w_sheet.write(row_count,col,row[col])
            row_count +=1
    try:
        os.remove(excel_file_path)
    except:
        pass
    wb.close()
    print "Converted CSVs to Excel File"
Errors:
Case 1: When I open the UTF-8 encoded CSV file as follows:
spamReader = csv.reader(io.open(filename, "r", encoding = 'utf-8'), delimiter=',',quotechar='"')
I get this error while iterating over the spamReader object:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 92: ordinal not in range(128)
Case 2: When I open the same CSV file in binary mode, as in the code snippet above, I am unable to save it as a UTF-8 encoded Excel file. Calling wb.close() raises:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 12: ordinal not in range(128)
I have just started learning Python, so this may not be a big issue, but please help me with it.
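Both messages appear to point at the same character. As a small check (written for Python 3, whereas the snippet above is Python 2), U+2013 is the en dash, and its UTF-8 encoding starts with the byte 0xe2 named in Case 2. The sample row is invented for illustration.

```python
en_dash = "\u2013"
encoded = en_dash.encode("utf-8")
print(encoded)

# ASCII can neither encode the character (Case 1) ...
try:
    en_dash.encode("ascii")
    ascii_encode_ok = True
except UnicodeEncodeError:
    ascii_encode_ok = False

# ... nor decode its UTF-8 bytes (Case 2).
try:
    encoded.decode("ascii")
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

# Cells read as raw bytes can be turned into text by decoding them explicitly.
raw_row = [b"2010\xe2\x80\x932015", b"plain"]
text_row = [cell.decode("utf-8") for cell in raw_row]
print(text_row)
```

The general idea, under the assumption that the CSV really is UTF-8, is to decode each cell to text once and hand only text to the spreadsheet writer, rather than mixing byte strings and unicode strings.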

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2247: character maps to <undefined>

When I run my code (Python 3) I keep getting this error:
Traceback (most recent call last):
File "country.py", line 16, in <module>
for row in csv_reader:
File "C:\Users\benny\Anaconda3\lib\csv.py", line 112, in __next__
row = next(self.reader)
File "C:\Users\benny\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2247: character maps to <undefined>
I have tried these solutions but none work.
The code only prints one line if I fix the encoding problem by adding encoding='UTF-8'. If I leave the encoding problem in place, it prints almost 700 rows before it throws an error. Either way, it still won't work.
import csv
import country_converter as coco
with open('Interpol.csv', 'r') as csv_file, open('Interpol_Extra.csv', 'w', newline='') as new_file:
    csv_reader = csv.DictReader(csv_file)
    fieldnames = ['Case Happened - UN Region', 'Case Happened - Continent',
                  'Recovered - UN Region', 'Recovered - Continent'] + csv_reader.fieldnames
    csv_writer = csv.DictWriter(new_file, fieldnames)
    csv_writer.writeheader()
    for row in csv_reader:
        case_country_name = row['Case happened - Country']
        recovered_country_name = row['Recovered - Country']
        if case_country_name:
            row['Case Happened - UN Region'] = coco.convert(names=case_country_name, to='UNregion')
            row['Case Happened - Continent'] = coco.convert(names=case_country_name, to='Continent')
        if recovered_country_name:
            row['Recovered - UN Region'] = coco.convert(names=recovered_country_name, to='UNregion')
            row['Recovered - Continent'] = coco.convert(names=recovered_country_name, to='Continent')
        csv_writer.writerow(row)
This is the code I used which finally worked.
As suggested by Arun in the comments, if you're having a similar problem you should read all the answers on this question. It has the most succinct and helpful info on Stack Exchange for this problem.
Then re-check your code to make sure it is valid. In my case, correcting some wrong indentation is what finally fixed it.
import csv
import country_converter as coco
with open('Interpol.csv', 'r', encoding="utf-8") as csv_file, open('Interpol_Extra.csv', 'w', newline='', encoding="utf-8") as new_file:
    csv_reader = csv.DictReader(csv_file)
    fieldnames = ['Case Happened - UN Region', 'Case Happened - Continent',
                  'Recovered - UN Region', 'Recovered - Continent'] + csv_reader.fieldnames
    csv_writer = csv.DictWriter(new_file, fieldnames)
    csv_writer.writeheader()
    for row in csv_reader:
        case_country_name = row['Case happened - Country']
        recovered_country_name = row['Recovered - Country']
        if case_country_name:
            row['Case Happened - UN Region'] = coco.convert(names=case_country_name, to='UNregion')
            row['Case Happened - Continent'] = coco.convert(names=case_country_name, to='Continent')
        if recovered_country_name:
            row['Recovered - UN Region'] = coco.convert(names=recovered_country_name, to='UNregion')
            row['Recovered - Continent'] = coco.convert(names=recovered_country_name, to='Continent')
        csv_writer.writerow(row)
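To see why the explicit encoding matters, here is a self-contained sketch. It assumes the source file is UTF-8 and contains a right double quotation mark (U+201D), whose UTF-8 bytes end in 0x9d. That is one plausible source of the byte in the traceback, not a confirmed one. The file name and column are invented.

```python
import csv
import os
import tempfile

# Write a small UTF-8 CSV containing curly quotes (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title"])
    writer.writerow(["The \u201cLost\u201d Painting"])

# Reading it as cp1252 (a common Windows default) hits the undefined byte 0x9d.
try:
    with open(path, "r", newline="", encoding="cp1252") as f:
        list(csv.DictReader(f))
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# Reading it with the encoding it was written in works.
with open(path, "r", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(cp1252_ok, rows[0]["Title"])
```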

Write numpy.ndarray with Russian characters to file

I'm trying to write a numpy.ndarray to a file.
I use
unique1 = np.unique(df['search_term'])
unique1 = unique1.tolist()
and next try
1)
edf = pd.DataFrame()
edf['term'] = unique1
writer = pd.ExcelWriter(r'term.xlsx', engine='xlsxwriter')
edf.to_excel(writer)
writer.close()
and 2)
thefile = codecs.open('domain.txt', 'w', encoding='utf-8')
for item in unique1:
    thefile.write("%s\n" % item)
But both raise UnicodeDecodeError: 'utf8' codec can't decode byte 0xd7 in position 9: invalid continuation byte
The second example should work if you encode the strings as utf8.
The following works in Python2 with a utf8 encoded file:
# _*_ coding: utf-8
import pandas as pd
edf = pd.DataFrame()
edf['term'] = ['foo', 'bar', u'русском']
writer = pd.ExcelWriter(r'term.xlsx', engine='xlsxwriter')
edf.to_excel(writer)
writer.save()
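For readers on Python 3, the second attempt from the question can be sketched as below. It assumes the terms are already text (str) rather than undecoded bytes. The temporary path and sample terms are illustrative only.

```python
import os
import tempfile

terms = ["foo", "bar", "русском"]

# Write one term per line, letting open() handle the UTF-8 encoding.
path = os.path.join(tempfile.mkdtemp(), "domain.txt")
with open(path, "w", encoding="utf-8") as thefile:
    for item in terms:
        thefile.write("%s\n" % item)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as thefile:
    read_back = thefile.read().splitlines()

print(read_back == terms)
```

If the values come out of the DataFrame as bytes in some other encoding, they would need to be decoded with that encoding first. Which one applies depends on the data and cannot be told from the error message alone.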