Python Parse JSON array

Python Parse JSON array - python

I'm trying to put together a small python script that can parse out array's out of a large data set. I'm looking to pull a few key:values from each object so that I can play with them later on in the script. Here's my code:
# Load up JSON Function
import json
# Open our JSON file and load it into python
input_file = open ('stores-small.json')
json_array = json.load(input_file)
# Create a variable that will take JSON and put it into a python dictionary
store_details = [
["name"],
["city"]
]
# Learn how to loop better =/
for stores in [item["store_details"] for item in json_array]
Here's the sample JSON Data:
[
{
"id": 1000,
"type": "BigBox",
"name": "Mall of America",
"address": "340 W Market",
"address2": "",
"city": "Bloomington",
"state": "MN",
"zip": "55425",
"location": {
"lat": 44.85466,
"lon": -93.24565
},
"hours": "Mon: 10-9:30; Tue: 10-9:30; Wed: 10-9:30; Thurs: 10-9:30; Fri: 10-9:30; Sat: 10-9:30; Sun: 11-7",
"services": [
"Geek Squad Services",
"Best Buy Mobile",
"Best Buy For Business"
]
},
{
"id": 1002,
"type": "BigBox",
"name": "Tempe Marketplace",
"address": "1900 E Rio Salado Pkwy",
"address2": "",
"city": "Tempe",
"state": "AZ",
"zip": "85281",
"location": {
"lat": 33.430729,
"lon": -111.89966
},
"hours": "Mon: 10-9; Tue: 10-9; Wed: 10-9; Thurs: 10-9; Fri: 10-10; Sat: 10-10; Sun: 10-8",
"services": [
"Windows Store",
"Geek Squad Services",
"Best Buy Mobile",
"Best Buy For Business"
]}
]

In your for loop statement, Each item in json_array is a dictionary and the dictionary does not have a key store_details. So I modified the program a little bit
import json
input_file = open ('stores-small.json')
json_array = json.load(input_file)
store_list = []
for item in json_array:
store_details = {"name":None, "city":None}
store_details['name'] = item['name']
store_details['city'] = item['city']
store_list.append(store_details)
print(store_list)

If you arrived at this question simply looking for a way to read a json file into memory, then use the built-in json module.
with open(file_path, 'r') as f:
data = json.load(f)
If you have a json string in memory that needs to be parsed, use json.loads() instead:
data = json.loads(my_json_string)
Either way, now data is converted into a Python data structure (list/dictionary) that may be (deeply) nested and you'll need Python methods to manipulate it.
If you arrived here looking for ways to get values under several keys as in the OP, then the question is about looping over a Python data structure. For a not-so-deeply-nested data structure, the most readable (and possibly the fastest) way is a list / dict comprehension. For example, for the requirement in the OP, a list comprehension does the job.
store_list = [{'name': item['name'], 'city': item['city']} for item in json_array]
# [{'name': 'Mall of America', 'city': 'Bloomington'}, {'name': 'Tempe Marketplace', 'city': 'Tempe'}]
Other types of common data manipulation:
For a nested list where each sub-list is a list of items in the json_array.
store_list = [[item['name'], item['city']] for item in json_array]
# [['Mall of America', 'Bloomington'], ['Tempe Marketplace', 'Tempe']]
For a dictionary of lists where each key-value pair is a category-values in the json_array.
store_data = {'name': [], 'city': []}
for item in json_array:
store_data['name'].append(item['name'])
store_data['city'].append(item['city'])
# {'name': ['Mall of America', 'Tempe Marketplace'], 'city': ['Bloomington', 'Tempe']}
For a "transposed" nested list where each sub-list is a "category" in json_array.
store_list = list(store_data.values())
# [['Mall of America', 'Tempe Marketplace'], ['Bloomington', 'Tempe']]

Related

Remove duplicate dictionary from a list on the basis of key value by priority

Suppose I have the following type of list of dictionaries:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"},
{"Name": "Zeke", "Type": "Reader"}
]
I want to remove duplicates of "Name" on the basis of "Type" by the following priority (Admin > Writer > Reader), so the end result should be:
iterlist = [
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Writer"}
]
I found a similar question but it removes duplicates for one explicit type of key-value: Link
Can someone please guide me on how to move forward with this?

This is a modified form of the solution suggested by #azro, their solution and the other solution does not take into account the priority you mentioned, you can get over that using the following code. Have a priority dict.
iterlist = [
{"Name": "Mike", "Type": "Writer"},
{"Name": "Mike", "Type": "Reader"},
{"Name": "Mike", "Type": "Admin"},
{"Name": "Zeke", "Type": "Reader"},
{"Name": "Zeke", "Type": "Writer"}
]
# this is used to get the priority
priorites = {i:idx for idx, i in enumerate(['Admin', 'Writer', 'Reader'])}
sort_key = lambda x:(x['Name'], priorites[x['Type']])
groupby_key = lambda x:x['Name']
result = [next(i[1]) for i in groupby(sorted(iterlist, key=sort_key), key=groupby_key)]
print(result)
Output
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]

You can also use pandas in the following way:
transform the list of dictionary to data frame:
import pandas as pd
df = pd.DataFrame(iterlist)
create a mapping dict:
m = {'Admin': 3, 'Writer': 2, 'Reader': 1}
create a priority column using replace:
df['pri'] = df['Type'].replace(m)
sort_values by pri and groupby by Name and get the first element only:
df = df.sort_values('pri', ascending=False).groupby('Name').first().reset_index()
drop the pri column and return to dictionary using to_dict:
df.drop('pri', axis='columns').to_dict(orient='records')
This will give you the following:
[{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}]

Here is solution you can try out,
unique = {}
for v in iterlist:
# check if key exists, if not update to `unique` dict
if not unique.get(v['Name']):
unique[v['Name']] = v
print(unique.values())
dict_values([{'Name': 'Mike', 'Type': 'Admin'}, {'Name': 'Zeke', 'Type': 'Writer'}])

Why does updating a dictionary remove the rest of the dictionaries from my nested array?

I have a json file with players structured as so
[
{
"Player_Name": "Rory McIlroy",
"Tournament": [
{
"Name": "Arnold Palmer Invitational presented by Mastercard",
"Points": "68.10",
"Salary": "12200.00"
},
{
"Name": "World Golf Championships-Mexico Championship",
"Points": "103.30",
"Salary": "12200.00"
},
{
"Name": "The Genesis Invitational",
"Points": "88.60",
"Salary": "12200.00"
},
{
"Name": "Farmers Insurance Open",
"Points": "107.30",
"Salary": "12200.00"
},
{
"Name": "World Golf Championships-HSBC Champions",
"Points": "138.70",
"Salary": "12400.00"
},
{
"Name": "The ZOZO Championship",
"Points": "103.40",
"Salary": "12300.00"
}
]
}]
When I run this code
import json
import numpy as np
import pandas as pd
from itertools import groupby
# using json open the player objects file and set it equal to data
with open('Active_PGA_Player_Objects.json') as json_file:
data = json.load(json_file)
with open('Players_DK.json') as json_file:
Players_DK = json.load(json_file)
results = []
for k,g in groupby(sorted(data, key=lambda x:x['Player_Name']), lambda x:x['Player_Name']):
results.append({'Player_Name':k, 'Tournament':[i['Tournament'][0] for i in g]})
for obj in results:
for x in Players_DK:
if obj['Player_Name'] == x['Name']:
obj['Average'] = x['AvgPointsPerGame']
i = 0
points_results = []
while i < len(results):
j = 0
while j < len(results[i]['Tournament']):
difference = (int(float(results[i]['Tournament'][j]['Points'])) - (results[i]['Average']))
points_results.append(round(difference,2))
j += 1
i += 1
with open('PGA_Player_Objects_w_Average.json', 'w') as my_file:
json.dump(results, my_file)
my list comes back like this
[{
"Player_Name": "Rory McIlroy",
"Tournament": [
{
"Name": "Arnold Palmer Invitational presented by Mastercard",
"Points": "68.10",
"Salary": "12200.00"
}
],
"Average": 96.19
}]
Can someone explain to me why when I update the specific dictionary it deletes all but the first value from the nested Tournament list? My goal here is to add each players average to their corresponding dictionary so that I can take each average and subtract it from each score. When I try to do this though I'm only able to perform it on the one value left in the list.

Just for what it's worth, I'd go back and really think about what each line is really doing. You're also making things harder on yourself by calling variables obj or x. Calculating the average can be done like:
for player in data: # data is poorly named, try players or players_data
player['Average'] = sum(float(tourny['Points']) for tourny in player['Tournament']) / len(player['Tournament'])
for tourny in player['Tournament']:
tourny['Difference'] = float(tourny['Points']) - float(player['Average'])
leaving you with:
{'Player_Name': 'Rory McIlroy',
'Tournament': [{
'Name': 'Arnold Palmer Invitational presented by Mastercard',
'Points': '68.10',
'Salary': '12200.00',
'Difference': -33.46666666666667},
{
'Name': 'World Golf Championships-Mexico Championship',
'Points': '103.30',
'Salary': '12200.00',
'Difference': 1.7333333333333343}, # .....etc
'Average': 101.566666666666666
}
When you use names in your code that describe what they're representing, a huge number of optimizations become immediately obvious. Give it a go!

Splitting a string in json using python

I have a simple Json file
input.json
[
{
"title": "Person",
"type": "object",
"required": "firstName",
"min_max": "200/600"
},
{
"title": "Person1",
"type": "object2",
"required": "firstName1",
"min_max": "230/630"
},
{
"title": "Person2",
"type": "object2",
"required": "firstName2",
"min_max": "201/601"
},
{
"title": "Person3",
"type": "object3",
"required": "firstName3",
"min_max": "2000/6000"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "null"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "1024 / 256"
},
{
"title": "Person4",
"type": "object4",
"required": "firstName4",
"min_max": "0"
}
]
I am trying to create a new json file with new data. I would like to split "min_max" into two different fields ie., min and max. Below is the code written in python.
import json
input=open('input.json', 'r')
output=open('test.json', 'w')
json_decode=json.load(input)
result = []
for item in json_decode:
my_dict={}
my_dict['title']=item.get('title')
my_dict['min']=item.get('min_max')
my_dict['max']=item.get('min_max')
result.append(my_dict)
data=json.dumps(result, output)
output.write(data)
output.close()
How do I split the string into two different values. Also, is there any possibility of printing the json output in order.

Your JSON file seems to be written wrong (the example one). It is not a list. It is just a single associated array (or dictionary, in Python). Additionally, you don't seem to be using json.dumps properly. It only takes 1 argument. I also figured it would be easier to just create the dictionary inline. And you don't seem to be splitting the min_max properly.
Here's the correct input:
[{
"title": "Person",
"type": "object",
"required": "firstName",
"min_max": "20/60"
}]
Here's your new code:
import json
with open('input.json', 'r') as inp, open('test.json', 'w') as outp:
json_decode=json.load(inp)
result = []
for temp in json_decode:
minMax = temp["min_max"].split("/")
result.append({
"title":temp["title"],
"min":minMax[0],
"max":minMax[1]
})
data=json.dumps(result)
outp.write(data)

Table + Python == Pandas
import pandas as pd
# Read old json to a dataframe
df = pd.read_json("input.json")
# Create two new columns based on min_max
# Removes empty spaces with strip()
# Returns [None,None] if length of split is not equal to 2
df['min'], df['max'] = (zip(*df['min_max'].apply
(lambda x: [i.strip() for i in x.split("/")]
if len(x.split("/"))== 2 else [None,None])))
# 'delete' (drop) min_max column
df.drop('min_max', axis=1, inplace=True)
# output to json again
df.to_json("test.json",orient='records')
Result:
[{'max': '600',
'min': '200',
'required': 'firstName',
'title': 'Person',
'type': 'object'},
{'max': '630',
'min': '230',
'required': 'firstName1',
'title': 'Person1',
'type': 'object2'},
{'max': '601',
'min': '201',
'required': 'firstName2',
'title': 'Person2',
'type': 'object2'},
{'max': '6000',
'min': '2000',
'required': 'firstName3',
'title': 'Person3',
'type': 'object3'},
{'max': None,
'min': None,
...

You can do something like this:
import json
nl=[]
for di in json.loads(js):
min_,sep,max_=map(lambda s: s.strip(), di['min_max'].partition('/'))
if sep=='/':
del di['min_max']
di['min']=min_
di['max']=max_
nl.append(di)
print json.dumps(nl)
This keeps the "min_max" values that cannot be separated into two values unchanged.

Counting Items in Python from a JSON file

I'm trying to search a data file, for example Yelp.json. It has businesses in it in LA, Boston, DC.
I wrote this:
# Python 2
# read json
with open('updated_data.json') as facts_data:
data = json.load(facts_data)
# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
locality.append(data["payload"]["locality"])
if data["payload"]["locality"] not in unique_locality:
print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
But I get an answer of "Portsmouth 1", which means it is not providing all the cities and might not even be provided all the counts. My goal for this section is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses." Each payload is a grouping of info about a single business and "locality" is just the city. So I want it to find how many unique cities there are and then how many businesses in each city. So one payload could be Starbucks in la, another payload could be Starbucks in dc, another could be Chipotle in la.
Example of JSON file (JSONlite.com says its valid):
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
"latitude": "56.945972",
"locality": "Stonehaven",
"_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "The Lodge, Dunottar",
"email": "dunnottarcastle#btconnect.com",
"existence_ml": 0.5694238217658721,
"domain_aggregate": "",
"name": "Dunnottar Castle",
"search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
"admin_region": "Scotland",
"existence": 1,
"category_labels": [
["Landmarks", "Buildings and Structures"]
],
"post_town": "Stonehaven",
"region": "Kincardineshire",
"review_count": "719",
"geocode_level": "within_50m",
"tel": "01569 762173",
"placerank": 65,
"longitude": "-2.197123",
"placerank_ml": 37.27916073464469,
"fax": "01330 860325",
"category_ids_text_search": "",
"website": "http://www.dunnottarcastle.co.uk",
"status": "1",
"geocode_confidence": "20",
"postcode": "AB39 2TL",
"category_ids": [108],
"country": "gb",
"_geocode_quality": "4",
"uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
"latitude": "56.237480",
"locality": "Inveraray",
"_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "Cherry Park",
"email": "enquiries#inveraray-castle.com",
"longitude": "-5.073578",
"domain_aggregate": "",
"name": "Inveraray Castle",
"admin_region": "Scotland",
"search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
"existence": 1,
"category_labels": [
["Social", "Food and Dining", "Restaurants"]
],
"region": "Argyll",
"review_count": "532",
"geocode_level": "within_50m",
"tel": "01499 302203",
"placerank": 67,
"post_town": "Inveraray",
"placerank_ml": 41.19978087352266,
"fax": "01499 302421",
"category_ids_text_search": "",
"website": "http://www.inveraray-castle.com",
"status": "1",
"geocode_confidence": "20",
"postcode": "PA32 8XE",
"category_ids": [347],
"country": "gb",
"_geocode_quality": "4",
"existence_ml": 0.7914881102847783,
"uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},

If your "json" semantics is something like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it is a valid json string, but after you json.loads the string, it will be evaluated as
{"payload":{ CONTENT_LAST }}
And that is why you end up with one city and one business count.
You can verify this behaviour on this online json parser http://json.parser.online.fr/ by checking JS eval field.
In this case, one way to preprocess your json string is to get rid of the dummy "payload" key and wrap the content dictionary directly in a list. You will have a json string in the following format.
{[{CONTENT_A}, {CONTENT_B} ..., {CONTENT_LAST} ]}
Assume your json string is now a list of payload dictionary, and you have json.loads(json_str) to data.
As you iterate through json payload, build a lookup table along the way.
This will handle duplicated city for you automatically since business in the same city will be hashed to the same list.
city_business_map = {}
for payload in data:
city = payload['locality']
business = payload['name']
if city not in city_business_map:
city_business_map[city] = []
city_business_map[city].append(business)
Then later on, you can easily present the solution by
for city, business_list in city_business_map.items():
print city, len(business_list)
If you want to count the unique business in each city, initialize the value to set instead of list.
If this is an overkill, instead of initialize to list or set, just associate a counter with each key.

Append element to json dict (geojson) using Python

I would like to add style to my geojson through Python. The current features currently do not have any style elements. I want to append style and then fill. However, when I do, nothing is added to the file. It is the same as before
import json
with open('test.json') as f:
data = json.load(f)
for feature in data['features']:
feature.append("style")
feature["style"].append({"fill":color})
Sample GeoJson
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "STATEFP": "17", "COUNTYFP": "019", "TRACTCE": "005401", "BLKGRPCE": "2", "GEOID": "170190054012", "NAMELSAD": "Block Group 2", "MTFCC": "G5030", "FUNCSTAT": "S", "ALAND": 574246.000000, "AWATER": 4116.000000, "INTPTLAT": "+40.1238204", "INTPTLON": "-088.2038105", "GISJOIN": "G17001900054012", "STUSPS": "IL", "SHAPE_AREA": 578361.706954, "SHAPE_LEN": 3489.996273, "census_block_income_YEAR": "2009-2013", "census_block_income_STATE": "Illinois", "census_block_income_STATEA": 17, "census_block_income_COUNTY": "Champaign County"}}]}
I'm trying to get the end results to be:
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "STATEFP": "17", "COUNTYFP": "019", "TRACTCE": "005401", "BLKGRPCE": "2", "GEOID": "170190054012", "NAMELSAD": "Block Group 2", "MTFCC": "G5030", "FUNCSTAT": "S", "ALAND": 574246.000000, "AWATER": 4116.000000, "INTPTLAT": "+40.1238204", "INTPTLON": "-088.2038105", "GISJOIN": "G17001900054012", "STUSPS": "IL", "SHAPE_AREA": 578361.706954, "SHAPE_LEN": 3489.996273, "census_block_income_YEAR": "2009-2013", "census_block_income_STATE": "Illinois", "census_block_income_STATEA": 17, "census_block_income_COUNTY": "Champaign County"},"style"{fill:"red"}}]}

When you type
for feature in data['features']:
every feature will be an item of the list that is data['features']. Each item there is a dictionary, so you are calling the wrong method (append is a method of lists).
You could write
for feature in data['features']:
feature.update({"style": {"fill": "red"}})
Finally, if you want the file from which you got the initial json structure to be altered, make sure to write the now updated data structure back to a file:
with open('output2.json', 'w') as f:
json.dump(data, f)

You are working with list of dictionaries here, dictionary hasn't method append, you can create new key like here:
for feature in data['features']:
feature["style"] = {"fill":color}
Seems that you need rewrite file with JSON:
with open('test.json', 'w') as f:
json.dump(data, f)

There is no append method in a dictionary. One should use update.
import pprint as pp
for feature in data['features']:
feature.update({'style':{'fill': 'red'}})
pp.pprint(data)
Output:
{'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'},
'type': 'name'},
'features': [{'properties': {'ALAND': 574246.0,
'AWATER': 4116.0,
'BLKGRPCE': '2',
'COUNTYFP': '019',
'FUNCSTAT': 'S',
'GEOID': '170190054012',
'GISJOIN': 'G17001900054012',
'INTPTLAT': '+40.1238204',
'INTPTLON': '-088.2038105',
'MTFCC': 'G5030',
'NAMELSAD': 'Block Group 2',
'SHAPE_AREA': 578361.706954,
'SHAPE_LEN': 3489.996273,
'STATEFP': '17',
'STUSPS': 'IL',
'TRACTCE': '005401',
'census_block_income_COUNTY': 'Champaign County',
'census_block_income_STATE': 'Illinois',
'census_block_income_STATEA': 17,
'census_block_income_YEAR': '2009-2013'},
'style': {'fill': 'red'},
'type': 'Feature'}],
'type': 'FeatureCollection'}

You never write your changes back to the file. Add the following to the end of your code:
with open('test.json','w') as f:
json.dump(data, f)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Parse JSON array - python

Related

Remove duplicate dictionary from a list on the basis of key value by priority

Why does updating a dictionary remove the rest of the dictionaries from my nested array?

Splitting a string in json using python

Counting Items in Python from a JSON file

Append element to json dict (geojson) using Python

Categories

Resources