JSON load doesn't work for me whenever I run my code,
In this part, I create the code to read the file
import json
filename = 'eq_1_day_m1.json'
with open(filename) as f:
all_eq_data = json.load(f)
readable_file = 'readable_eq_data.json'
with open(readable_file, 'w') as c:
json.dump(all_eq_data, c, indent=4
Then It gives me so many errors talking about charmap. I think this is because of the maximum capacity. Can I do something about this?
C:\Users\PC\AppData\Local\Microsoft\WindowsApps\python.exe "C:/Users/PC/PycharmProjects/Learning/Learning Matplotlib/eq_explore_data.py"
Traceback (most recent call last):
File "C:\Users\PC\PycharmProjects\Learning\Learning Matplotlib\eq_explore_data.py", line 5, in <module>
all_eq_data = json.load(f)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.496.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.496.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 10292: character maps to <undefined>
Process finished with exit code 1
I have my json file: 'eq_1_day_m1.json' if you are wondering. It's too big for StackOverflow to handle so I didn't add it to the question.
Related
So while trying to mess around with python i tried making a program which would get me the content from pastebin url's and then save each ones content into a file of their own. I got an error
This is the code :-
import requests
file = open("file.txt", "r", encoding="utf-8").readlines()
for line in file:
link = line.rstrip("\n")
n_link = link.replace("https://pastebin.com/", "https://pastebin.com/raw/")
pastebin = n_link.replace("https://pastebin.com/raw/", "")
r = requests.get(n_link, timeout=3)
x = open(f"{pastebin}.txt", "a+")
x.write(r.text)
x.close
I get the following error :-
Traceback (most recent call last):
File "C:\Users\Lenovo\Desktop\Py\Misc. Scripts\ai.py", line 9, in <module>
x.write(r.text)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2694' in position 9721: character maps to <undefined>
Can somebody help?
You’re doing good at the start by reading in the input file as UTF-8. The only thing you’re missing is to do the same thing with your output file:
x = open(f"{pastebin}.txt", "a+", encoding="utf-8")
I've got problem with reading text files. When I start program and add file, it throws an error:
Traceback (most recent call last):
File "c:/Users/Marcin/Desktop/python/graf_menu.py", line 38, in <module>
main_func()
File "c:/Users/Marcin/Desktop/python/graf_menu.py", line 32, in main_func
read_file()
File "c:/Users/Marcin/Desktop/python/graf_menu.py", line 15, in read_file
for i in f.read():
File "C:\Users\Marcin\AppData\Local\Programs\Python\Python38-32\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 19: invalid start byte
In my code there is a line with "encoding="UTF-8". How to solve the problem. The code below:
files = input("File name: ")
try:
with open(files,"r",encoding="UTF-8") as f:
for i in f.read():
print(i,end='')
except FileNotFoundError:
print("FileNotFoundError")
There is nothing wrong with the program itself. You are getting this error because you are trying to read a file which is not encoded as UTF-8 as UTF-8-encoded. You have to either convert the contents of the file to UTF-8 or specify a different encoding (the one that the file actually uses) in the call to open.
This file is not encoded as UTF-8 try to use encoded="iso-8859-1"
when I try the code:
f = open("xronia.txt", "r")
for x in f:
print(x)
I always take this Error:Traceback (most recent call last):
File "C:\Users\Desktop\PYTHON\Προγραμματισμός Σταύρος\disekta.py",
line 2, in
lines=fo.readlines() File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1253.py",
line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0xff in position
0: character maps to
I have tried to use encoding='utf8' but it didn't work. The file is an excel file formatted as .txt(as I read in a site). I am new to this world, so any help is acceptable..
I've created python modules from the JSON grammar on github / antlr4 with
antlr4 -Dlanguage=Python3 JSON.g4
I've written a main program "JSON2.py" following this guide: https://github.com/antlr/antlr4/blob/master/doc/python-target.md
and downloaded the example1.json also from github.
python3 ./JSON2.py example1.json # works perfectly, but
python3 ./JSON2.py bookmarks-2017-05-24.json # the bookmarks contain German Umlauts like "ü"
...
File "/home/xyz/lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
The offending line in JSON2.py is
input = FileStream(argv[1])
I've searched stackoverflow and tried this instead of using the above FileStream:
fp = codecs.open(argv[1], 'rb', 'utf-8')
try:
input = fp.read()
finally:
fp.close()
lexer = JSONLexer(input)
stream = CommonTokenStream(lexer)
parser = JSONParser(stream)
tree = parser.json() # This is line 39, mentioned in the error message
Execution of this program ends with an error message, even if the input file doesn't contain Umlauts:
python3 ./JSON2.py example1.json
Traceback (most recent call last):
File "./JSON2.py", line 46, in <module>
main(sys.argv)
File "./JSON2.py", line 39, in main
tree = parser.json()
File "/home/x/Entwicklung/antlr/links/JSONParser.py", line 108, in json
self.enterRule(localctx, 0, self.RULE_json)
File "/home/xyz/lib/python3.5/site-packages/antlr4/Parser.py", line 358, in enterRule
self._ctx.start = self._input.LT(1)
File "/home/xyz/lib/python3.5/site-packages/antlr4/CommonTokenStream.py", line 61, in LT
self.lazyInit()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 186, in lazyInit
self.setup()
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 189, in setup
self.sync(0)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 111, in sync
fetched = self.fetch(n)
File "/home/xyz/lib/python3.5/site-packages/antlr4/BufferedTokenStream.py", line 123, in fetch
t = self.tokenSource.nextToken()
File "/home/xyz/lib/python3.5/site-packages/antlr4/Lexer.py", line 111, in nextToken
tokenStartMarker = self._input.mark()
AttributeError: 'str' object has no attribute 'mark'
This parses correctly:
javac *.java
grun JSON json -gui bookmarks-2017-05-24.json
So the grammar itself is not the problem.
So finally the question: How should I process the input file in python, so that lexer and parser can digest it?
Thanks in advance.
Make sure your input file is actually encoded as UTF-8. Many problems with character recognition by the lexer are caused by using other encodings. I just took a testbed application, added ëto the list of available characters for an IDENTIFIER and it works again. UTF-8 is the key -- and make sure your grammar also allows these characters where you want to accept them.
I solved it by passing the encoding info:
input = FileStream(sys.argv[1], encoding = 'utf8')
If without the encoding info, I will have the same issue as yours.
Traceback (most recent call last):
File "test.py", line 20, in <module>
main()
File "test.py", line 9, in main
input = FileStream(sys.argv[1])
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding, errors))
File ".../lib/python3.5/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
Where my input data is
[今明]天(台南|高雄)的?天氣如何
I have a strange problem. A friend of mine sent me a text file. If I copy/paste the text and then paste it into my text editor and save it, the following code works. If I choose the option to save the file directly from the browser, the following code breaks. What's going on? Is it the browser's fault for saving invalid characters?
This is an example line.
When I save it, the line says
What�s going on?
When I copy/paste it, the line says
What’s going on?
This is the code:
import codecs
def do_stuff(filename):
with codecs.open(filename, encoding='utf-8') as f:
def process_line(line):
return line.strip()
lines = f.readlines()
for line in lines:
line = process_line(line)
print line
do_stuff('stuff.txt')
This is the traceback I get:
Traceback (most recent call last):
File "test-encoding.py", line 13, in <module>
do_stuff('stuff.txt')
File "test-encoding.py", line 8, in do_stuff
lines = f.readlines()
File "/home/somebody/.python/lib64/python2.7/codecs.py", line 679, in readlines
return self.reader.readlines(sizehint)
File "/home/somebody/.python/lib64/python2.7/codecs.py", line 588, in readlines
data = self.read()
File "/home/somebody/.python/lib64/python2.7/codecs.py", line 477, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 4: invalid start byte
What can I do in such cases?
How can I distribute the script if I don't know what encoding the user who runs it will use?
Fixed:
codecs.open(filename, encoding='utf-8', errors='ignore') as f:
The "file-oriented" part of the browser works with raw bytes, not characters. The specific encoding used by the page should be specified either in the HTTP headers or in the HTML itself. You must use this encoding instead of assuming that you have UTF-8 data.