Parse json file downloaded from Azure data lake

Parse json file downloaded from Azure data lake - python

I download a file from azure data lake which is in the following format:
{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}
I just want to read and parse this in python.
def read_ods_file():
file_path = 'temp.json'
data = []
with open(file_path) as f:
for line in f:
data.append(json.loads(line))
This gave me the exception:
data.append(json.loads(line))
File "C:\python3.6\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\python3.6\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\python3.6\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Printing the lines show these added characters at the start. What are these added characters?
ï»¿{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}

Microsoft uses all kinds of weird characters. You could try to use string.printable to only get normal ASCII characters like this:
How can I remove non-ASCII characters but leave periods and spaces using Python?

the f variable you set with
with open(file_path) as f:
is a python file object (of type _io.TextIOWrapper).
If you want to read each line as a json object, you should try something like:
with open(file_path) as f:
# read the file contents into a string
# strip off trailing whitespace
# split string into list of strings on \n character
for line in f.read().strip().splitlines():
data.append(json.loads(line))

Related

Load string into dictionary when some values contain single quotes

I have a string that I need to load into a dictionary. I'm trying to use json.loads() to do this, but it is failing because I need to replace single quotes with double quotes because by default the strings use single quotes to wrap property names and values, although in some cases the value is wrapped in double quotes because it contains an single quote.
Here is a reproduceable example using Python3.8
>>> import json
>>> example = "{'msg': \"I'm a string\"}"
>>> json.loads(example.replace("'", '"'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.8/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 12 (char 11)
Is there a better way to load a string into a dictionary? The json module seems to only work with double quotes and as that is not possible here (I have no control over the formatting of the strings), hence why I'm having to use the unreliable replace.
My desired result is to have a dictionary like this.
{'msg': "I'm a string"}

Use ast.literal_eval()Docs:
>>> import ast
>>> ast.literal_eval(example)
{'msg': "I'm a string"}

Problem with loading a json file for geo_data in python

I am currently trying to use folium library in python to create webmaps. I have a file world.json which contains geo_data. I have provided a link to the file at the end of this post. I tried the following code:
data = [json.loads(line) for line in open('world.json', 'r')]
and received the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
File "C:\Users\name\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "C:\Users\name\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\name\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I load this file?
What I want to achieve is essentially obtain the population data and create a Choropleth and overlay it on my webmap.
Edit: Forgot the link:
https://1drv.ms/u/s!Army95vqcKXpaooVAZU_g-VCAVw?e=vwTknq
Edit: Previous link to skydrive stopped working due to "high traffic". Below is link to dropbox, hopefully this works:
https://www.dropbox.com/s/gmm8db0g03rc7cv/world.json?dl=0

Good news/bad news:
It turns out that this file was encoded in a locale that we are not accustomed to, and json/ascii cannot make sense of some of the character encoding. I tried this, and it seems to be working for me -- with a major caveat:
with open("world.json", "r") as fh:
contents = fh.read()
asciiContents = contents.encode("ascii", errors="ignore")
data = json.loads(asciiContents)
The major caveat is that only 3 countries come through with no encoding errors:
>>> len(data["features"])
3
Maybe there another source for this data that is closer to a native english locale, or maybe someone else can provide wisdom in encoding foreign data in a more friendly way...

The open command will return a file handle, not string lines. I would do:
with open('world.json', 'r') as fh:
data = json.load(fh)
data will then be your contents converted to python (list or dictionary, etc)

I have a problem with json variable setting

i have this code here:
import json
with open("pass_file.txt", "r") as file:
password = json.loads(file.read())
it calls this error:
Traceback (most recent call last):
File "testdoc.py", line 9, in <module>
print(json.loads(file.read()))
File "C:\Program Files\Python37\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python37\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python37\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I dont know why this is happening because i have the same code on another file just with different variable name and file name and it works file. I did notice another similar question about a similar error but it didnt answer my question.
Thanks in advance :)

What is the content of you pass_file.txt ? The python code use json.loads so it expect JSON formated content in the pass_file.txt
For example for a string, the content of this file will be "hello world"
If you don't put quotes, the JSON parsing process will fail.

python unable to load a json file with utf-8 encoding

With the following python code:
filePath = urllib2.urlopen('xx.json')
fileJSON = json.loads(filePath.read().decode('utf-8'))
Where the xx.json looks like:
{
"tags": [{
"id": "123",
"name": "Airport",
"name_en": "Airport",
"name_cn": "机场",
"display": false
}]
}
I see the following exception:
fileJSON = json.loads(filePath.read().decode('utf-8'))
File "/usr/lib64/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The code works before the Chinese characters are added to the json file, when I also added the .decode('utf-8') behind the read() as well.
I am not sure what needs to be done?

$ wget https://s3.amazonaws.com/wherego-sims/tags.json
$ file tags.json
tags.json: UTF-8 Unicode (with BOM) text, with CRLF line terminators
This file begins with a byte order mark (EF BB BF), which is illegal in JSON (JSON Specification and usage of BOM/charset-encoding). You must first decode this using 'utf-8-sig' in Python to get a valid JSON unicode string.
json.loads(filePath.read().decode('utf-8-sig'))
For what it's worth, Python 3 (which you should be using) will give a specific error in this case and guide you in handling this malformed file:
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)
Namely, by specifying that you wish to discard the BOM if it exists (again, it's not conventional to use this in UTF-8, particularly with JSON which is always encoded in UTF-8 so it is worse than useless):
>>> import json
>>> json.load(open('tags.json', encoding='utf-8-sig'))

Parsing through a JSON file with Python 2.x

I'm currently trying to parse through a text file containing a number of Facebook chat fragments. The fragments are stored as below:-
{"t":"msg","c":"p_100002239013747","s":14,"ms":[{"msg":{"text":"2what is the best restauran
t in hong kong? ","time":1303115825598,"clientTime":1303115824391,"msgID":"1862585188"},"from":10000
2239013747,"to":635527479,"from_name":"David Robinson","from_first_name":"David","from_gender":1,"to_name":"Jason Yeung","to_first_name":"Jason","to_gender":2,"type":"msg"}]}
I've tried a number of ways to parse / open the JSON file but to no avail. Here is what I've tried thusfar:-
import json
data = []
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string:
for line in json_string:
data.append(json.loads(line))
error:
Traceback (most recent call last):
File "C:/Users/Amy/Desktop/facebookparser.py", line 6, in <module>
data.append(json.loads(line))
File "C:\Program Files\Python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 91 (char 91)
and also:
import json
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_file:
data = json.load(json_file)
... but I get exactly the same error as above.
Any suggestions? I've searched previous posts on here and tried the alternative solutions but to no avail. I'm aware I need to treat it as a dictionary file with for example, 'time' being a key and '1303115825598' being the respective time value but if I can't even process the json file into memory, there's no way I can parse it.
Where am I going wrong? Thanks

Your data contains newlines where JSON would not allow these. You'll have to stitch the lines back together again:
data = []
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string:
partial = ''
for line in json_string:
partial += line.rstrip('\n')
try:
data.append(json.loads(partial))
partial = ''
except ValueError:
continue # Not yet a complete JSON value
The code collects lines into partial, but minus the newline, and tries to decode the JSON. If that succeeds, partial is set to the empty string again to process the next entry. If it fails, we loop to the next line to append, until there is a complete JSON value to decode.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parse json file downloaded from Azure data lake - python

Microsoft uses all kinds of weird characters. You could try to use string.printable to only get normal ASCII characters like this: How can I remove non-ASCII characters but leave periods and spaces using Python?

Related

Load string into dictionary when some values contain single quotes

Problem with loading a json file for geo_data in python

I have a problem with json variable setting

python unable to load a json file with utf-8 encoding

Parsing through a JSON file with Python 2.x

Categories

Resources