Python unable to load a JSON file with UTF-8 encoding

With the following python code:
filePath = urllib2.urlopen('xx.json')
fileJSON = json.loads(filePath.read().decode('utf-8'))
Where the xx.json looks like:
{
"tags": [{
"id": "123",
"name": "Airport",
"name_en": "Airport",
"name_cn": "机场",
"display": false
}]
}
I see the following exception:
fileJSON = json.loads(filePath.read().decode('utf-8'))
File "/usr/lib64/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The code worked before the Chinese characters were added to the JSON file, which is also when I added .decode('utf-8') after read().
I am not sure what needs to be done?

$ wget https://s3.amazonaws.com/wherego-sims/tags.json
$ file tags.json
tags.json: UTF-8 Unicode (with BOM) text, with CRLF line terminators
This file begins with a byte order mark (EF BB BF), which is illegal in JSON (JSON Specification and usage of BOM/charset-encoding). You must first decode this using 'utf-8-sig' in Python to get a valid JSON unicode string.
json.loads(filePath.read().decode('utf-8-sig'))
For what it's worth, Python 3 (which you should be using) will give a specific error in this case and guide you in handling this malformed file:
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
Namely, specify that you wish to discard the BOM if it exists (again, a BOM is not conventional in UTF-8, particularly in JSON, which is expected to be encoded in UTF-8 anyway, so the BOM is worse than useless):
>>> import json
>>> json.load(open('tags.json', encoding='utf-8-sig'))
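The difference between the two codecs can be checked without the original file. The following is a minimal sketch in Python 3; the sample bytes are made up for illustration (a UTF-8 BOM followed by a small object) and are not the contents of tags.json.

```python
import json

# Hypothetical sample: UTF-8 BOM (EF BB BF) followed by a small JSON object.
raw = b'\xef\xbb\xbf{"name_cn": "\xe6\x9c\xba\xe5\x9c\xba", "display": false}'

# Decoding as plain 'utf-8' keeps the BOM as U+FEFF, which json rejects.
try:
    json.loads(raw.decode('utf-8'))
    plain_utf8_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    plain_utf8_ok = False

# Decoding as 'utf-8-sig' drops a leading BOM if one is present.
tags = json.loads(raw.decode('utf-8-sig'))
print(plain_utf8_ok, tags["display"])
```

Because 'utf-8-sig' only removes a BOM when one is present, the same call should also be safe for files saved without a BOM.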

Related

Error while loading a specific json file in Python [duplicate]

I have some json files created by powershell using the ConvertTo-Json command. The content of the json file looks like
{
"Key1": "Value1",
"Key2": "Value2"
}
I ran the python interpreter to see if I could read the file but I get this weird output
>>> f=open('test.json', 'r')
>>> f.read()
'ÿ\xfe{\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x001\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x001\x00"\x00,\x00\n\x00\n\x00 \x00 \x00 \x00 \x00"\x00K\x00e\x00y\x002\x00"\x00:\x00 \x00 \x00"\x00V\x00a\x00l\x00u\x00e\x002\x00"\x00\n\x00\n\x00}\x00\n\x00\n\x00'
For some reason all the characters are escaped byte characters and there's the weird ÿ at the beginning (PowerShell error?).
The weird thing is this:
>>> f=open('test.json', 'r')
>>> str=f.read()
>>> type(str)
<class 'str'>
>>> json.loads(str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Rutvik_Choudhary\AppData\Local\Programs\Python\Python35-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
So the input is a string, but the json module can't parse it (json.load(f) returns the same error). What is causing this error? Is it a Python thing, a PowerShell thing, or a JSON thing?
As pointed out by jwodder, PowerShell has encoded your JSON using UTF-16LE. To load this data correctly, you need to open the file using the correct encoding, e.g.:
import json
with open("test.json", "r", encoding="utf16") as f:
    json_string = f.read()
my_dict = json.loads(json_string)
You don't need to tell Python which variant of UTF-16 is being used. This is the purpose of the first two bytes of the text file. It's called a Byte Order Mark (BOM). It lets a program know if UTF-16LE or UTF-16BE has been used to encode the text file.
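As a rough check of the BOM behaviour described above, here is a small self-contained sketch. The helper name load_utf16_json, the temporary file, and its contents are assumptions made for illustration; the file imitates PowerShell output by writing the FF FE mark followed by UTF-16LE text.

```python
import json
import os
import tempfile

def load_utf16_json(path):
    """Open a UTF-16 file; the codec reads the BOM to pick LE or BE."""
    with open(path, "r", encoding="utf-16") as f:
        return json.load(f)

# Hypothetical stand-in for a PowerShell-written file: BOM FF FE + UTF-16LE text.
payload = '{"Key1": "Value1", "Key2": "Value2"}'
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
    tmp.write(b"\xff\xfe" + payload.encode("utf-16-le"))
    tmp_path = tmp.name

try:
    result = load_utf16_json(tmp_path)
finally:
    os.remove(tmp_path)

print(result)
```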
It seems that you have a BOM at the start of your file. You can verify it in a hex editor or with a good text editor (Notepad++ shows if BOM is present).
If you want to load text files with a Unicode BOM, like yours, you are better off using codecs.open with an explicit encoding instead of a plain open, as open with its default encoding does not interpret the BOM.
Or you can have a look at tendo.unicode - a small library that I wrote that can improve life for people that are not used to Unicode texts.

Parse json file downloaded from Azure data lake

I downloaded a file from Azure Data Lake, which is in the following format:
{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}
I just want to read and parse this in python.
def read_ods_file():
    file_path = 'temp.json'
    data = []
    with open(file_path) as f:
        for line in f:
            data.append(json.loads(line))
This gave me the exception:
data.append(json.loads(line))
File "C:\python3.6\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\python3.6\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\python3.6\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Printing the lines shows these added characters at the start. What are these added characters?
{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}
Microsoft uses all kinds of weird characters. You could try using string.printable to keep only normal ASCII characters, as described in:
How can I remove non-ASCII characters but leave periods and spaces using Python?
The f variable you set with
with open(file_path) as f:
is a Python file object (of type _io.TextIOWrapper).
If you want to read each line as a JSON object, you should try something like:
with open(file_path) as f:
    # read the file contents into a string
    # strip off leading and trailing whitespace
    # split the string into a list of lines
    for line in f.read().strip().splitlines():
        data.append(json.loads(line))
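If the extra characters at the start turn out to be a UTF-8 byte order mark (an assumption, since the question does not show them), opening the file with the 'utf-8-sig' encoding may be enough. Note that strip() does not remove a BOM. The sketch below uses a hypothetical helper, read_json_lines, and a made-up sample file:

```python
import json
import os
import tempfile

def read_json_lines(path, encoding="utf-8-sig"):
    """Parse one JSON object per non-empty line; 'utf-8-sig' skips a leading BOM."""
    records = []
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                records.append(json.loads(line))
    return records

# Hypothetical sample file: BOM + two records with CRLF line endings.
sample = (b'\xef\xbb\xbf{"PartitionKey": "2020-10-05", "value": "Resolved"}\r\n'
          b'{"PartitionKey": "2020-10-06", "value": "Resolved"}\r\n')
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    data = read_json_lines(tmp_path)
finally:
    os.remove(tmp_path)

print(len(data))
```

Unlike filtering through string.printable, this keeps any legitimate non-ASCII text inside the records.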

Python JSON double quote errors

I am having JSON syntax issues when using the following code: https://github.com/clarkbk/streeteasy-analysis
Using this JSON in buildings.json
{
"buildings": [
{
"name": "Henry Hall",
"addr": "https://streeteasy.com/nyc/property_activity/past_transactions_component/799324?all_activity=true&show_rentals=true&style=xls",
"id": 799324,
}
]
}
I am getting the following error:
2019-05-25 16:04:26,641 - INFO - Starting...
Traceback (most recent call last):
File "run.py", line 27, in <module>
data = json.load(f)
File "/usr/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 9 column 5 (char 220)
root#LAPTOP-4QGC19OR:/home/HN/streeteasy-analysis#
I have researched for a few hours now on how to fix this but can't come up with a fix. I am not that familiar with JSON in general, and I don't know where I am not double quoting properly. I appreciate any help with this.
The line number gives a good hint
you want:
"id": 799324
}
(note no comma after the last element)
JSON is not Python's ast.literal_eval: a comma after the last element makes parsing fail, because the parser then expects another property. That is what "Expecting property name enclosed in double quotes" means, although the message could be clearer given how common this error is.
If you have data like this, you can use ast.literal_eval on it instead; it will work without modifications (unless the data contains the JSON literals true, false, or null).
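The contrast between the two parsers can be seen in a short sketch. The text below is a shortened, hypothetical version of buildings.json that keeps the trailing comma:

```python
import ast
import json

# Shortened sample with the same trailing comma after the last property.
text = '{"buildings": [{"name": "Henry Hall", "id": 799324,}]}'

# The json module rejects the trailing comma.
try:
    json.loads(text)
    json_error = None
except json.JSONDecodeError as exc:
    json_error = exc.msg

# ast.literal_eval reads the same text as a Python literal,
# where a trailing comma is allowed.
data = ast.literal_eval(text)
print(json_error is not None, data["buildings"][0]["id"])
```

The exact wording of the json error varies between Python versions, so the sketch only records that an error occurred. Removing the comma from the file remains the cleaner fix.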

Passing json text as command line argument

I am trying to pass the following JSON text into my python code.
{"platform": "android", "version": "6.0.1"}
My code is as follows.
import sys
import json
data = json.loads(sys.argv[1])
print(str(data))
When running the following on Windows 10 PowerShell,
python jsonTest.py '{"platform": "android", "version": "6.0.1"}'
I get the following:
Traceback (most recent call last):
File "jsonTest.py", line 3, in <module>
data = json.loads(sys.argv[1])
File "C:\Users\Rishabh Bhatnagar\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Users\Rishabh Bhatnagar\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Rishabh Bhatnagar\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
As far as I know, I take my code, and pass the JSON text properly. I can't figure out what I'm doing wrong. I know the JSON text is valid (checked with https://jsonlint.com/). Thanks.
So I figured it out.
sys.argv[1]
By the time it reached the line above, my JSON text had lost its double quotes, changing from
{"platform": "android", "version": "6.0.1"}
into
{platform: android, version: 6.0.1}
My workaround is to run it as follows.
Python jsonTest.py '{\"platform\": \"android\", \"version\": \"6.0.1\"}'
I will try to find a better way, but for today, I'm done.
import sys
import json
data = json.loads(sys.argv[1].replace("'", '"'))
print(str(data))
This seems to work for me on Python 3.6 when calling it with python jsonTest.py "{'platform': 'android', 'version': '6.0.1'}"
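Whichever quoting is used on the command line, it can help to report exactly what the script received, so that stripped quotes are visible immediately. Here is a minimal sketch, assuming a hypothetical helper named parse_json_arg. Note also that the replace("'", '"') workaround breaks as soon as a value itself contains an apostrophe.

```python
import json

def parse_json_arg(argv):
    """Parse argv[1] as JSON; on failure, show exactly what the shell passed in."""
    if len(argv) < 2:
        raise ValueError("expected one JSON argument")
    raw = argv[1]
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("not valid JSON, received %r: %s" % (raw, exc))

# Argument arrives intact: parsing succeeds.
ok = parse_json_arg(["jsonTest.py", '{"platform": "android", "version": "6.0.1"}'])

# Argument arrives with its quotes stripped: the error shows the received text.
try:
    parse_json_arg(["jsonTest.py", "{platform: android, version: 6.0.1}"])
    message = None
except ValueError as err:
    message = str(err)

print(ok["platform"])
```

In the script itself this would be called as data = parse_json_arg(sys.argv) after importing sys.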

Unable to parse json array

I am just learning Python and cannot solve one issue.
The input JSON text looks like:
[1123771,10,7699,4357,'UMF Selfoss','Haukar Hafnarfjordur','2015,5,25,19,15,00','2015,5,25,20,16,37',-1,0,1,0,1,0,0,2,2,'8','7',,'True',0.25,'',25,'',2.75]
When I try to use the Python json module to parse it, I get an error.
Here is the code:
js = json.loads("[1123771,10,7699,4357,'UMF Selfoss','Haukar Hafnarfjordur','2015,5,25,19,15,00','2015,5,25,20,16,37',-1,0,1,0,1,0,0,2,2,'8','7',,'True',0.25,'',25,'',2.75]")
The error is:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Python27\lib\json\__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "C:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
This JSON text is successfully parsed by other frameworks such as Json.NET (C#).
So the question is: what am I doing wrong?
Your JSON needs to be valid in order to be parsed.
Use this tool to check it:
http://jsonlint.com/
JSON only works with double quotes.
Also, two consecutive commas make your JSON invalid.
It's the wrong JSON format. Check it using an online validator.
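For illustration, here is a shortened version of the array rewritten as strict JSON. The double quotes follow the answers above; filling the empty slot between the two commas with null is an assumption about what the missing value should be.

```python
import json

# Shortened sample: double-quoted strings, and null in place of the empty slot.
fixed = ('[1123771, 10, "UMF Selfoss", "2015,5,25,19,15,00", '
         '-1, "8", "7", null, "True", 0.25, "", 2.75]')

row = json.loads(fixed)
print(len(row), row[2], row[7])
```

The null value comes back as None in Python, and the empty string "" stays an empty string.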
