Parsing through a JSON file with Python 2.x

I'm currently trying to parse a text file containing a number of Facebook chat fragments. The fragments are stored as below:
{"t":"msg","c":"p_100002239013747","s":14,"ms":[{"msg":{"text":"2what is the best restauran
t in hong kong? ","time":1303115825598,"clientTime":1303115824391,"msgID":"1862585188"},"from":10000
2239013747,"to":635527479,"from_name":"David Robinson","from_first_name":"David","from_gender":1,"to_name":"Jason Yeung","to_first_name":"Jason","to_gender":2,"type":"msg"}]}
I've tried a number of ways to parse / open the JSON file but to no avail. Here is what I've tried thus far:
import json
data = []
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string:
    for line in json_string:
        data.append(json.loads(line))
error:
Traceback (most recent call last):
File "C:/Users/Amy/Desktop/facebookparser.py", line 6, in <module>
data.append(json.loads(line))
File "C:\Program Files\Python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "C:\Program Files\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 91 (char 91)
and also:
import json
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_file:
    data = json.load(json_file)
... but I get exactly the same error as above.
Any suggestions? I've searched previous posts on here and tried the alternative solutions, but to no avail. I'm aware I need to treat it as a dictionary with, for example, 'time' being a key and '1303115825598' being the respective time value, but if I can't even load the JSON file into memory, there's no way I can parse it.
Where am I going wrong? Thanks

Your data contains newlines where JSON does not allow them (for example, in the middle of a string value). You'll have to stitch the lines back together again:
import json
data = []
with open("C:\\Users\\Me\\Desktop\\facebookchat.txt", 'r') as json_string:
    partial = ''
    for line in json_string:
        partial += line.rstrip('\n')
        try:
            data.append(json.loads(partial))
            partial = ''
        except ValueError:
            continue  # Not yet a complete JSON value
The code collects lines into partial, but minus the newline, and tries to decode the JSON. If that succeeds, partial is set to the empty string again to process the next entry. If it fails, we loop to the next line to append, until there is a complete JSON value to decode.
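To see the stitching behave on something small, here is a minimal sketch that runs the same loop over an in-memory sample instead of the real file. The function name load_wrapped_json and the sample records are made up for illustration; only the join-and-retry idea comes from the answer above.

```python
import io
import json

def load_wrapped_json(lines):
    """Join physical lines until they form a complete JSON value."""
    records = []
    partial = ''
    for line in lines:
        partial += line.rstrip('\r\n')
        try:
            records.append(json.loads(partial))
        except ValueError:
            continue  # still incomplete; append the next line and retry
        partial = ''
    return records

# Hypothetical sample: two records, the first one wrapped across three lines
# (once inside a string and once inside a number, as in the question).
sample = io.StringIO(
    '{"t":"msg","text":"best restauran\n'
    't in hong kong?","from":10000\n'
    '2239013747}\n'
    '{"t":"msg","text":"ok","from":1}\n'
)
records = load_wrapped_json(sample)
print(len(records), records[0]["text"])
```

One caveat with this approach: it assumes every record is a JSON object or array, so that a partial record can never be valid JSON on its own.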

Related

Parse json file downloaded from Azure data lake

I downloaded a file from Azure Data Lake, which is in the following format:
{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}
I just want to read and parse this in Python.
def read_ods_file():
    file_path = 'temp.json'
    data = []
    with open(file_path) as f:
        for line in f:
            data.append(json.loads(line))
This gave me the exception:
data.append(json.loads(line))
File "C:\python3.6\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\python3.6\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\python3.6\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Printing the lines shows these added characters at the start. What are these added characters?
{"PartitionKey":"2020-10-05","value":"Resolved"...}
{"PartitionKey":"2020-10-06","value":"Resolved"...}
Microsoft uses all kinds of weird characters. You could try using string.printable to keep only normal ASCII characters, as described in this question:
How can I remove non-ASCII characters but leave periods and spaces using Python?
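A rough sketch of the string.printable idea, assuming the stray characters are something like a byte-order mark at the start of the line (the post does not show them, so the sample line here is invented). The helper name strip_non_printable is hypothetical.

```python
import json
import string

def strip_non_printable(text):
    """Keep only characters listed in string.printable (plain ASCII)."""
    allowed = set(string.printable)
    return "".join(ch for ch in text if ch in allowed)

# Hypothetical input: a line with a stray U+FEFF character in front of the JSON.
line = "\ufeff" + '{"PartitionKey":"2020-10-05","value":"Resolved"}\n'
record = json.loads(strip_non_printable(line))
print(record["PartitionKey"])
```

Be aware that this filter also throws away any legitimate non-ASCII text inside the values, so it is only safe when the data is known to be plain ASCII.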
The f variable you set with
with open(file_path) as f:
is a Python file object (of type _io.TextIOWrapper).
If you want to read each line as a JSON object, you should try something like:
with open(file_path) as f:
    # read the file contents into a string
    # strip off leading and trailing whitespace
    # split string into list of strings on \n character
    for line in f.read().strip().splitlines():
        data.append(json.loads(line))
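If the added characters turn out to be a UTF-8 byte-order mark (that is a guess, since they are not visible in the post), another option is to let Python drop it while decoding by opening the file with the utf-8-sig codec. A minimal sketch, with a throwaway demo file standing in for the real download:

```python
import json
import os
import tempfile

def read_json_lines(path):
    """Parse one JSON value per non-blank line, ignoring a leading UTF-8 BOM."""
    records = []
    # 'utf-8-sig' decodes UTF-8 and drops a byte-order mark at the start, if any.
    with open(path, encoding='utf-8-sig') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

# Hypothetical demo file that starts with a UTF-8 byte-order mark.
with tempfile.NamedTemporaryFile('wb', suffix='.json', delete=False) as tmp:
    tmp.write(b'\xef\xbb\xbf{"PartitionKey":"2020-10-05","value":"Resolved"}\n')
    tmp.write(b'{"PartitionKey":"2020-10-06","value":"Resolved"}\n')
    demo_path = tmp.name

records = read_json_lines(demo_path)
os.remove(demo_path)
print(len(records))
```

Unlike filtering on string.printable, this keeps any non-ASCII characters in the data intact.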

Problem with loading a json file for geo_data in python

I am currently trying to use the folium library in Python to create web maps. I have a file world.json which contains geo_data. I have provided a link to the file at the end of this post. I tried the following code:
data = [json.loads(line) for line in open('world.json', 'r')]
and received the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
File "C:\Users\name\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "C:\Users\name\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\name\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I load this file?
What I want to achieve is essentially to obtain the population data, create a choropleth, and overlay it on my web map.
Edit: Forgot the link:
https://1drv.ms/u/s!Army95vqcKXpaooVAZU_g-VCAVw?e=vwTknq
Edit: Previous link to skydrive stopped working due to "high traffic". Below is link to dropbox, hopefully this works:
https://www.dropbox.com/s/gmm8db0g03rc7cv/world.json?dl=0
Good news/bad news:
It turns out that this file was encoded in a locale that we are not accustomed to, and json/ascii cannot make sense of some of the character encoding. I tried this, and it seems to be working for me -- with a major caveat:
with open("world.json", "r") as fh:
    contents = fh.read()
asciiContents = contents.encode("ascii", errors="ignore")
data = json.loads(asciiContents)
The major caveat is that only 3 countries come through with no encoding errors:
>>> len(data["features"])
3
Maybe there is another source for this data that is closer to a native English locale, or maybe someone else can provide wisdom on handling foreign-encoded data in a friendlier way...
The open command will return a file handle, not string lines. I would do:
with open('world.json', 'r') as fh:
    data = json.load(fh)
data will then be your contents converted to Python objects (list, dictionary, etc.).
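As a small illustration of why json.load on the whole file is the safer call for a document that spans several lines, here is a sketch using an in-memory stand-in for world.json. The structure and the property names NAME and POP2005 are assumptions made up for the example, not taken from the actual file.

```python
import io
import json

# Hypothetical stand-in for world.json: one JSON document spread over several lines.
pretty = io.StringIO(
    '{\n'
    '  "type": "FeatureCollection",\n'
    '  "features": [\n'
    '    {"properties": {"NAME": "A", "POP2005": 1}},\n'
    '    {"properties": {"NAME": "B", "POP2005": 2}}\n'
    '  ]\n'
    '}\n'
)

data = json.load(pretty)  # parses the whole document in one go
names = [feature["properties"]["NAME"] for feature in data["features"]]

# Decoding one physical line of the same document fails,
# because '{' on its own is not a complete JSON value.
try:
    json.loads('{\n')
    line_by_line_ok = True
except ValueError:
    line_by_line_ok = False

print(names, line_by_line_ok)
```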

What is the issue with Python while processing my JSON file?

I have tried to remove the first key and value from a JSON file using Python. While running the program below, I came across an error:
import json
with open('testing') as json_data:
    data = json.load(json_data)
    for element in data:
        del element['url']
Error:
Traceback (most recent call last):
File "p.py", line 3, in <module>
data = json.load(json_data)
File "/usr/lib/python3.5/json/__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 180)
The file input is something like this:
{"url":"example.com","original_url":"http://example.com","text":"blah...blah"...}
{"url":"example1.com","original_url":"http://example1.com","text":"blah...blah"...}
.
.
.
.
{"url":"exampleN.com","original_url":"http://exampleN.com","text":"blah...blah"...}
I don't know why this problem is occurring.
You have to read the file line by line, since it consists of separate lines of JSON data rather than one valid JSON structure.
Here's my line-by-line proposal:
import json
data = []
with open('testing') as f:
    for json_data in f:
        element = json.loads(json_data)  # load from current line as string
        del element['url']
        data.append(element)
In that case, valid JSON would be:
[{"url":"example.com","original_url":"http://example.com","text":"blah...blah"...},
{"url":"example1.com","original_url":"http://example1.com","text":"blah...blah"...}]
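If the goal is to end up with a single valid JSON document like the array above, the parsed lines can be dumped back out as one list. A minimal sketch on invented sample lines (the real file's other keys are elided in the question, so only two keys are used here):

```python
import io
import json

# Hypothetical sample in the same one-object-per-line layout as the question.
lines = io.StringIO(
    '{"url":"example.com","original_url":"http://example.com"}\n'
    '{"url":"example1.com","original_url":"http://example1.com"}\n'
    '\n'
)

# Parse each non-blank line separately.
elements = [json.loads(line) for line in lines if line.strip()]
for element in elements:
    element.pop('url', None)  # like del, but no KeyError if the key is missing

as_array = json.dumps(elements)  # one valid JSON array holding every element
print(as_array)
```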
As per my comment, the input file is not valid JSON.
The answer to "multiple json dictionaries python" tells you how to successfully read such a file, which consists of a concatenation of valid JSON entities rather than a JSON list of such entities.
The alternative, if and only if you can rely on the line structure of the file, is to read line by line and decode each line separately.
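For the concatenated case, one way to do it with the standard library is json.JSONDecoder.raw_decode, which parses one value and reports where it stopped. This is only a sketch of that idea, not necessarily the code from the linked answer, and the generator name iter_json_values is made up.

```python
import json

def iter_json_values(text):
    """Yield each JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Hypothetical input: values separated by newlines or spaces, with no enclosing list.
sample = '{"url":"a"}\n{"url":"b"} {"url":"c"}\n'
values = list(iter_json_values(sample))
print(values)
```

This does not depend on each value sitting on its own line, which is the difference from the line-by-line approach.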
json_data is the file object, not its content, so first call read() on it to get the data and pass the resulting string to json.loads. Second, write the full file name if you are reading a JSON file; your file should be testing.json. Third, specify the file opening mode. You can use this code:
import json
with open('testing.json', 'r') as json_data:
    data = json.loads(json_data.read())
    for element in data:
        del element['url']

Not able to import json from commandline for Python

I am currently trying to import JSON input that is passed to Python through a command-line argument, and I want to save the different values from the JSON to a list. I am having issues with my code and have attached both the code and the error I get below. Any help is much appreciated.
import sys
import json
def lookup1():
    jsonData = json.loads(sys.argv[1])
    print jsonData
    jsonList = [jsonData['proxy'], jsonData['OS']]
    print jsonList
lookup1()
The error is given below:
$ python dynamicMapper.py '{'proxy':1,'OS':2}'
Traceback (most recent call last):
File "dynamicMapper.py", line 9, in <module>
lookup1()
File "dynamicMapper.py", line 4, in lookup1
jsonData = json.loads(sys.argv[1])
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
The command-line argument that I give is python dynamicMapper.py '{'proxy':1,'OS':2}'
I am not able to find out what is causing this error and if my approach is right.
The script is working fine, you just need to call it the right way:
python dynamicMapper.py '{"proxy":1,"OS":2}'
{u'OS': 2, u'proxy': 1}
[1, 2]
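In JSON, strings are quoted with double quotes instead of single quotes. You also need to quote the whole string passed to the script so that the shell treats it as a single argument. With the original invocation, the shell consumes the inner single quotes as well, so the script receives {proxy:1,OS:2}, which is why the decoder complains about a missing property name at char 1.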
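The difference between the two invocations can be checked without a shell by feeding json.loads the strings the script would actually receive. A small sketch; the helper parse_json_arg is hypothetical and simply returns None on invalid input:

```python
import json

def parse_json_arg(arg):
    """Return the decoded value, or None if arg is not valid JSON."""
    try:
        return json.loads(arg)
    except ValueError:
        return None

# What sys.argv[1] contains once a POSIX shell has processed the quoting:
good = parse_json_arg('{"proxy":1,"OS":2}')       # from '{"proxy":1,"OS":2}'
bad = parse_json_arg('{proxy:1,OS:2}')            # from '{'proxy':1,'OS':2}'
also_bad = parse_json_arg("{'proxy':1,'OS':2}")   # single-quoted keys are not JSON either

print(good, bad, also_bad)
```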

Trouble parsing JSON object in Python

I am trying to parse some text files containing JSON objects in Python using the json.load() method. It's working for one set of them, but for this one it will not:
{
"mapinfolist":{
"mapinfo":[
{"sku":"00028-0059","price":"38.35","percent":"50","basepercent":"50","exact":0,"match":0,"roundup":0}
,{"sku":"77826-7230","price":"4.18","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
,{"sku":"77827-1310","price":"2.36","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
,{"sku":"77827-2020","price":"2.36","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
,{"sku":"77827-3360","price":"2.36","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
,{"sku":"77827-4060","price":"2.36","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
,{"sku":"77827-4510","price":"2.36","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
,{"sku":"77827-7230","price":"2.36","percent":"60","basepercent":"60","exact":1,"match":0,"roundup":0}
],
"count":2
}
}
It is in a file called 'map.txt' - I open it using open('map.txt') and then call json.load(). When I run my test program (test.py), the following error trace is generated:
Traceback (most recent call last):
File "test.py", line 28, in <module>
main()
File "test.py", line 23, in main
map_list = json.load(f1)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
The JSON object is valid - when I put it into https://www.jsoneditoronline.org/ it is parsed and displayed correctly, so I am having trouble identifying what could be stopping it from working when I try to do it in Python. Any advice would be much appreciated. Thanks!
EDIT: Here's my code.
import json
def main():
    with open('map.txt') as f1:
        map_list = json.load(f1)
Trying map_list = json.loads(f1.read()) also does not work and gives me an almost identical error trace.
EDIT - RESOLVED:
I just copied and pasted FROM map.txt into a new TextEdit file map2.txt and used the new file instead, and it works now. I copied directly from the old file and made no changes - the only difference is that it is a different file. I can't make heads or tails of why that would be - any ideas? I would like to understand what may have happened so I can avoid the problem in the future.
Does the following solution work for you?
import json
f = open("map.txt")
map = json.loads(f.read())
Python Docs
Maybe try reading the whole file into a string and then use json.loads:
def yourfunc():
    file = open('map.txt')
    json_string = file.read()
    map = json.loads(json_string)
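On the "why did copying into a new file fix it" part: one plausible cause, which the post does not confirm, is that the original map.txt was saved with a byte-order mark or in a UTF-16 encoding, and the new TextEdit file was not. A rough sketch for checking the first bytes of a file and decoding accordingly; the helper names sniff_bom and decode_with_bom are made up:

```python
import codecs
import json

# Byte-order marks from the codecs module. UTF-32 is checked first because its
# little-endian BOM starts with the same two bytes as the UTF-16 one.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def sniff_bom(raw):
    """Return a codec name suggested by a leading BOM, or None if there is none."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return codec
    return None

def decode_with_bom(raw, default='utf-8'):
    """Decode bytes using the BOM-suggested codec, falling back to default."""
    return raw.decode(sniff_bom(raw) or default)

# Hypothetical bytes standing in for the contents of a file opened in 'rb' mode.
raw = codecs.BOM_UTF8 + b'{"count":2}'
parsed = json.loads(decode_with_bom(raw))
print(sniff_bom(raw), parsed)
```

To try it on the real file, read it in binary mode (open('map.txt', 'rb')) and pass the bytes to sniff_bom; a result of None would rule this explanation out.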
