codecs issue when writing to csv python3


import csv

output_file = open(OUTPUT_FILENAME, 'w', newline='')  # create new file
dict_writer = csv.DictWriter(output_file, keys)
dict_writer.writeheader()
# some logic
for row in items:
    # di is a dictionary
    dict_writer.writerow(di)
Hello, I am new to Python. I wrote this script on Linux (CentOS), where it runs fine.
When I ran it on Windows, I got this error:
Traceback (most recent call last):
  File "C:\Users\user157\Desktop\test.py", line 180, in <module>
    dict_writer.writerow(di)
  File "C:\Users\user157\AppData\Local\Programs\Python\Python38-32\lib\csv.py", line 154, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\Users\user157\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1256.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe3' in position 33: character maps to <undefined>
I tried to solve it by running the following before writing the dictionary, but I still get the same error:
for k, v in di.items():
    try:
        di[k] = v.encode().decode('utf-8')
    except:
        pass
I have Python 3.7.5 on CentOS and 3.8.2 on Windows.

You need to check what encoding your input file uses on Windows and pass encoding= in the open() call for it. On Windows the default encoding is not 'utf8', so a file opened without the correct encoding will cause encoding issues. Open it with the right encoding, like below:
open(input_file_name,encoding='iso-8859-1')
Or, better, convert your input file to 'utf8' encoding, so that the script can be used without modification on Windows and on Linux.
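
The traceback in the question is raised inside dict_writer.writerow while encoding with cp1256, so the same encoding argument also seems relevant to the output file. A minimal sketch, assuming UTF-8 output is acceptable; the file name, field names and row data below are hypothetical stand-ins for the asker's OUTPUT_FILENAME, keys and items:

```python
import csv
import os
import tempfile

# Hypothetical stand-ins for OUTPUT_FILENAME, keys and items in the question.
output_filename = os.path.join(tempfile.mkdtemp(), "out.csv")
keys = ["name", "city"]
items = [{"name": "Jo\xe3o", "city": "S\xe3o Paulo"}]  # contains '\xe3'

# An explicit encoding makes the result independent of the platform default.
with open(output_filename, "w", newline="", encoding="utf-8") as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    for di in items:
        dict_writer.writerow(di)

# Read the file back with the same encoding to check the round trip.
with open(output_filename, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
```

If the CSV is meant to be opened in Excel on Windows, encoding="utf-8-sig" is a common alternative, since it writes a byte order mark at the start of the file.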

Related

Python 3.11 - Remove digit code runs and works but results in error

I wrote some simple code to remove digits from the hosts file.
with open(r'C:\Windows\System32\drivers\etc\hosts', 'r') as infile, open(r'C:\XXXX\XXX\X.txt', 'w') as outfile:
    for i in infile:
        if not i[0].isdigit():
            print("Line Skipped")
        else:
            if int(i[0]) == 0:
                B = i[8:]
                outfile.write(B)
            elif int(i[0]) == 1:
                B = i[10:]
                outfile.write(B)
The script runs fine for a while, then stops with an error.
I've examined the hosts file at the point where it broke, and the data is not at fault.
The error returned is:
Traceback (most recent call last):
  File "<pyshell#142>", line 2, in <module>
    for i in infile:
  File "C:\XXX\Python\Python311\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4244: character maps to <undefined>
I understand it's pointing to a library. Is it to do with memory? Should I be running the script differently?
I'm running this code from inside the Python IDLE shell.

This is an encoding issue related to the open() function.
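
The traceback shows open() decoding the file with cp1252, in which byte 0x8d has no mapping. A minimal sketch of passing an explicit encoding; the temporary file and its contents are hypothetical, and UTF-8 is only an assumption about the real hosts file, so errors="replace" is used to keep going on any bytes that do not decode:

```python
import os
import tempfile

# Hypothetical sample file: the bytes C3 8D are one character in UTF-8.
path = os.path.join(tempfile.mkdtemp(), "hosts_sample.txt")
with open(path, "wb") as f:
    f.write(b"0.0.0.0 example.invalid # \xc3\x8d\n")

# cp1252 has no character for byte 0x8d, which is the error in the question.
try:
    b"\x8d".decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Name the encoding explicitly; errors="replace" substitutes U+FFFD
# for undecodable bytes instead of raising.
with open(path, "r", encoding="utf-8", errors="replace") as infile:
    lines = [line[8:] for line in infile if line.startswith("0")]
```

With errors="replace" the loop never raises on bad bytes, at the cost of silently replacing them, so it suits a quick clean-up more than a faithful copy.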

Can't install l18n, UnicodeDecodeError: 'cp950' codec can't decode byte

The error occurs when I run:
pip install l18n
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\X\AppData\Local\Temp\pip-install-8urtlamu\l18n\setup.py", line 99, in <module>
    long_description=open(os.path.join('README.rst')).read(),
UnicodeDecodeError: 'cp950' codec can't decode byte 0xc3 in position 2135: illegal multibyte sequence
Tried but didn't work:
chcp 65001
Alternative console: cmder
Config:
Windows 7
Python 3.6.4
Pip 10.0.1
Thanks!

It is manifestly a bug inside l18n: in setup.py, the long_description parameter is built by reading the README.rst file (this is a classic way to do that).
The traceback says: 'cp950' codec can't decode byte 0xc3 in position 2135. This is a classic error when UTF-8 encoded text that contains non-ASCII characters is decoded with another codec.
The source code is stored in Bitbucket:
long_description=open(os.path.join('README.rst')).read(),
The behavior of the open function changed in Python 3: in text mode it decodes the file, using the platform default encoding unless you pass one. You must set the file encoding, which is utf-8 here.
A portable way to solve that is to define a function:
import io

def read(path):
    with io.open(path, mode='r', encoding='utf-8') as f:
        return f.read()
And to use it like this:
long_description=read('README.rst')
There is an issue about that.

Finding the encoding opening csv file in python

I have problems understanding how to detect the proper encoding of a csv file.
I created a small CSV file as a sample for testing by cutting and pasting some rows from one of the original files I want to process, and saved it from my local Excel as CSV.
My program can handle this or similar files without problem, but when I try to open a file sent to me from another computer, the program exits with an error.
The section of the code that opens the file:
with open(file_path,'r') as f:
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)
    reader = csv.DictReader(f, fieldnames=['RUT', 'Nombre', 'Telefono'], dialect=dialect)
    for row in reader:
        numeros.append(row['Telefono'])
The error:
Traceback (most recent call last):
  File "C:/Users/.PyCharmEdu3.5/config/scratches/scratch.py", line 22, in <module>
    for row in reader:
  File "C:\Program Files\Python35\lib\csv.py", line 110, in __next__
    row = next(self.reader)
  File "C:\Program Files\Python35\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6392: character maps to <undefined>
Process finished with exit code 1
My locale.getpreferredencoding() is 'cp1252'
I made a couple of attempts to guess the encoding:
with open(file_path,'r', encoding='cp1252') as f:
It works with my locally generated CSV, but not with the ones I'm sent.
with open(file_path,'r', encoding='utf-8') as f:
This doesn't work with any file, and it produces a different error:
Traceback (most recent call last):
  File "C:/Users/.PyCharmEdu3.5/config/scratches/scratch.py", line 19, in <module>
    dialect = csv.Sniffer().sniff(f.read(1024))
  File "C:\Program Files\Python35\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1670: invalid continuation byte
Process finished with exit code 1
I also tried adding newline='' to the open() call, but it doesn't make a difference.
Following an answer on Stack Overflow, I opened the files in Notepad and checked the encoding under 'Save as'; both my local files and the ones I receive by email show 'ANSI' as the encoding.
Do I need to figure out the encoding myself, or can Python do that for me? Is there something wrong in my code?
I'm using Python 3.5, and the files were most likely created on computers with a Spanish OS.
Update: I have been doing some more testing. Almost all CSV files open without problems and the program runs correctly, but 2 files cause an error when I try to open them. In Excel or Notepad these files look normal. I suspect that the files were created or saved on a computer with an uncommon OS or language.
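
open() does not detect a file's encoding on its own; it uses the one it is given, or the platform default. One possible approach, sketched here under assumptions, is to read the raw bytes and try a short list of candidate encodings in order. The helper name decode_with_fallback, the candidate list, the ';' delimiter and the sample bytes are all hypothetical, not taken from the question:

```python
import csv
import io

def decode_with_fallback(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")

# Hypothetical sample bytes: 0xf1 is a letter in cp1252 and latin-1 but is
# not valid UTF-8 here, and 0x9d has no mapping in cp1252.
raw = b"1;Mu\xf1oz \x9d;555\r\n"
text, used = decode_with_fallback(raw)

# Feed the decoded text to the csv module through an in-memory file.
reader = csv.DictReader(io.StringIO(text, newline=""),
                        fieldnames=["RUT", "Nombre", "Telefono"], delimiter=";")
numeros = [row["Telefono"] for row in reader]
```

Note that latin-1 maps every byte to some character, so it never fails; a successful decode with it shows only that the file could be read, not that the characters are the intended ones.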

How to avoid automatic ASCII encoding on Python 3?

I'm working on an encryption program in Python 3 right now, but I am having some problems with ASCII encoding. For example, if I want to write Ϩ (which is chr(1000)) into a text file from Python, and I do:
a_file = open('chr_ord.txt', 'w')
a_file.write(chr(1000))
a_file.close()
I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
  File "C:/Comp_Sci/Coding/printRAW.py", line 3, in <module>
    a_file.write(chr(1000))
  File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03e8' in position 0: character maps to <undefined>
And if I try:
a_file = open('chr_ord.txt', 'w')
a_file.write(ascii(chr(1000)))
a_file.close()
Python doesn't crash, but the text file contains '\u03e8' instead of the desired Ϩ.
Is there any way I can get around this?

The Python 3 way is to use the encoding parameter when opening the file. For example, to encode the file as UTF-8:
a_file = open('chr_ord.txt', 'w', encoding='utf-8')
The default is your system ANSI code page, which doesn't contain the Ϩ character.
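
A minimal sketch of the point above, writing to a temporary path rather than the asker's file (the path is a hypothetical stand-in). It checks that cp1252, the code page named in the traceback, cannot encode the character, then writes and reads it back as UTF-8:

```python
import locale
import os
import tempfile

# What open() typically falls back to when no encoding is given; the value
# is platform dependent, so it is only reported here, not relied on.
default_encoding = locale.getpreferredencoding(False)

# cp1252, the code page in the traceback, has no mapping for U+03E8.
try:
    chr(1000).encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False

path = os.path.join(tempfile.mkdtemp(), "chr_ord.txt")
with open(path, "w", encoding="utf-8") as a_file:
    a_file.write(chr(1000))

# Inspect the bytes on disk, then read the text back with the same encoding.
with open(path, "rb") as f:
    raw = f.read()
with open(path, encoding="utf-8") as f:
    text = f.read()
```

The file has to be read back with encoding='utf-8' as well; otherwise the reader falls back to the platform default again.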

Python: File encoding errors

For a few days I've been struggling with an annoying file encoding problem in my little Python program.
I work a lot with MediaWiki; recently I have been converting documents from .doc to Wikisource.
A document in Microsoft Word format is opened in LibreOffice and then exported to a .txt file in Wikisource format. My program searches for an [[Image:]] tag and replaces it with the name of an image taken from a list, and that mechanism works really well (big thanks for the help, brjaga!).
When I tested it on .txt files I created myself, everything worked just fine, but when I give it a .txt file with Wikisource, the whole thing is not so funny anymore :D
I got this message from Python:
Traceback (most recent call last):
  File "C:\Python33\final.py", line 15, in <module>
    s = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
  File "C:\Python33\lib\encodings\cp1250.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7389: character maps to <undefined>
And this is my Python code:
li = [
"[[Image:124_BPP_PL_PL_Page_03_Image_0001.jpg]]",
"[[Image:124_BPP_PL_PL_Page_03_Image_0002.jpg]]",
"[[Image:124_BPP_PL_PL_Page_03_Image_0003.jpg]]",
"[[Image:124_BPP_PL_PL_Page_03_Image_0004.jpg]]",
"[[Image:124_BPP_PL_PL_Page_03_Image_0005.jpg]]",
"[[Image:124_BPP_PL_PL_Page_03_Image_0006.jpg]]",
"[[Image:124_BPP_PL_PL_Page_03_Image_0007.jpg]]",
"[[Image:124_BPP_PL_PL_Page_05_Image_0001.jpg]]",
"[[Image:124_BPP_PL_PL_Page_05_Image_0002.jpg]]"
]
with open ("C:\\124_BPP_PL_PL.txt") as myfile:
    s = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
dest = open('C:\\124_BPP_PL_PL_processed.txt', 'w')
for item in li:
    s = s.replace("[[Image:]]", item, 1)
dest.write(s)
dest.close()
OK, so I did some research and found that this is a problem with encoding. So I installed Notepad++, changed the encoding of my Wikisource .txt file to UTF-8, and saved it. Then I changed my code:
with open ("C:\\124_BPP_PL_PL.txt", encoding="utf8") as myfile:
    s = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
But I got this new error message:
Traceback (most recent call last):
  File "C:\Python33\final.py", line 22, in <module>
    dest.write(s)
  File "C:\Python33\lib\encodings\cp1250.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>
And I'm really stuck on this one. I thought that if I changed the encoding manually in Notepad++ and then told Python which encoding I had set, everything would be fine.
Please help. Thank you in advance.

When Python 3 opens a text file, it uses the default encoding for your system to decode the file in order to give you full Unicode text (the str type is fully Unicode aware). It does the same when writing out such Unicode text values.
You already solved the input side; you specified an encoding when reading. Do the same when writing: specify a codec that can handle Unicode, including the zero-width no-break space at codepoint U+FEFF (at position 0 this is most likely the byte order mark that Notepad++ saved at the start of the file). UTF-8 is usually a good default choice:
dest = open('C:\\124_BPP_PL_PL_processed.txt', 'w', encoding='utf8')
You can use the with statement when writing too and save yourself the .close() call:
for item in li:
    s = s.replace("[[Image:]]", item, 1)
with open('C:\\124_BPP_PL_PL_processed.txt', 'w', encoding='utf8') as dest:
    dest.write(s)
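
The '\ufeff' at position 0 in the question looks like a byte order mark at the start of the input file. If it is not wanted in the output, one option is to read the input with the 'utf-8-sig' codec, which drops a leading mark if there is one. A minimal sketch on a hypothetical sample file, not the asker's document:

```python
import os
import tempfile

# Hypothetical sample: a UTF-8 file that starts with a byte order mark.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf[[Image:]] text\n")

# Plain utf-8 keeps the mark as the first character of the text.
with open(path, encoding="utf-8") as f:
    with_bom = f.read()

# utf-8-sig strips a leading mark if present and otherwise acts like utf-8.
with open(path, encoding="utf-8-sig") as f:
    without_bom = f.read()
```

Writing with encoding='utf8' as shown above works either way; the difference is only whether the mark is carried over into the processed file.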
