Python read text

Python read text - python

I am simply trying to read a text file that has 4000+ lines of nouns all single column and I’m getting an error:
Traceback (most recent call last):
File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/nouns.py", line 4, in <module>
for i in nouns_file:
File "/var/containers/Bundle/Application/107074CD-03B1-4FB3-809A-CBD44D6CF245/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/encodings/ascii.py", line 27, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2241: ordinal not in range(128)
With code:
with open("nounlist.txt", "r") as nouns_file:
for i in nouns_file:
print(i)
I’m not sure what’s causing this. I would think that it would just output all of the nouns from my nounlist.txt file.

Related

unable to debug error produced while reading csv in python

I am trying to csv file. Code I've written below gives error(available after code block). Not sure what I am missing or doing wrong.
import csv
file = open('AlfaRomeo.csv')
csvreader = csv.reader(file)
for j in csvreader:
print(j)
Traceback (most recent call last):
File "C:\Users\Pratik\PycharmProjects\AkraScraper\Transform_Directory\Developer_Sandbox.py", line 39, in
for j in csvreader:
File "C:\Users\Pratik\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 402: character maps to

The error is that you have a character in your input file which fails the Unicode decode test. It's value is 0x8d (141 decimal), and it's the 402nd byte in the file. I suggest loading the file in a text editor and search forward until you find it. So you know what you're looking for, it's in the Extended ASCII code section of https://www.asciitable.com/.

LookupError: unknown encoding: utf8r

when I try the code:
f = open("xronia.txt", "r")
for x in f:
print(x)
I always take this Error:Traceback (most recent call last):
File "C:\Users\Desktop\PYTHON\Προγραμματισμός Σταύρος\disekta.py",
line 2, in
lines=fo.readlines() File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1253.py",
line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0xff in position
0: character maps to
I have tried to use encoding='utf8' but it didn't work. The file is an excel file formatted as .txt(as I read in a site). I am new to this world, so any help is acceptable..

Python - cannot decode html (urllib)

I'm trying to write html from webpage to file, but I have problem with decode characters:
import urllib.request
response = urllib.request.urlopen("https://www.google.com")
charset = response.info().get_content_charset()
print(response.read().decode(charset))
Last line causes error:
Traceback (most recent call last):
File "script.py", line 7, in <module>
print(response.read().decode(charset))
UnicodeEncodeError: 'ascii' codec can't encode character '\u015b' in
position 6079: ordinal not in range(128)
response.info().get_content_charset() returns iso-8859-2, but if i check content of response without decoding (print(resposne.read())) there is "utf-8" encoding as html metatag. If i use "utf-8" in decode function there is also similar problem:
Traceback (most recent call last):
File "script.py", line 7, in <module>
print(response.read().decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position
6111: invalid start byte
What's going on?

You can ignore invalid characters using
response.read().decode("utf-8", 'ignore')
Instead of ignore there are other options, e.g. replace
https://www.tutorialspoint.com/python/string_encode.htm
https://docs.python.org/3/howto/unicode.html#the-string-type
(There is also str.encode(encoding='UTF-8',errors='strict') for strings.)

Unicode Decode Error while trying to extract data from text file

I am writing a code to extract specific lines from a text file into an output file, and here is what my code looks like:
with open('output.txt', 'w') as outfile:
with open('testfile.txt') as infile:
for line in infile.readlines:
if 'Emotion' in line or 'PANS.RESP' in line:
outfile.write(line)
outfile.write('-----------------------------------------------------')
I keep getting this error:
Traceback (most recent call last): File
"/Users/Sam/Desktop/readstuff.py", line 3, in
for line in infile: File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py",
line 26, in decode
return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position
154: ordinal not in range(128)
What should I do?

Python fails with parsing file using re

I have a file that is mostly ascii file, but there appear some non-ascii characters sometimes. I want to parse this files and extract the lines that are marked in a certain way. Previously I used sed for this, but now I need to do the same in python. (Of course I still can use os.system, but I'm hoping for something more convenient).
I'm doing following.
p = re.compile(".*STATWAH ([0-9]*):([0-9]*):([0-9 ]*):([0-9 ]*) STATWAH.*")
f = open("capture_8_8_8__1_2_3.log", encoding="ascii")
fl = filter(lambda line: p.match(line), f)
len(list(fl))
And in the last line I get following error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 2227: ordinal not in range(128)
If I remove encoding parameter from the second line, i. e. use default encoding which is utf-8, the error is following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.4/codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 2227: invalid start byte
Could you help me please what can I do here, except calling sed from python?
UPD.
Thanks to #Wooble I found the answer.
The correct code looks following:
p = re.compile(rb".*STATWAH ([0-9]*):([0-9]*):([0-9 ]*):([0-9 ]*) STATWAH.*")
f = open("capture_8_8_8__1_2_3.log", "rb")
fl = filter(lambda line: p.match(line), f)
len(list(fl))
I opened file in binary mode and also compile regex from binary string representation.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python read text - python

Related

unable to debug error produced while reading csv in python

LookupError: unknown encoding: utf8r

Python - cannot decode html (urllib)

Unicode Decode Error while trying to extract data from text file

Python fails with parsing file using re

Categories

Resources