I have one json file. I opened it with pd.read_json, but when parsing it to a geodataframe only some fields are considered and some are not. When I open it in QGIS, for instance, I can see multiple columns that I cannot get into the geodataframe.
So my file is called PT:
PT = pd.read_json('PT.json')
PT
type features
0 FeatureCollection {'id': 'osm-w96717521', 'type': 'Feature', 'pr...
1 FeatureCollection {'id': 'osm-w96850552', 'type': 'Feature', 'pr...
2 FeatureCollection {'id': 'osm-r1394361', 'type': 'Feature', 'pro...
and different lines of PT have different fields.
So for instance, for:
PT['features'][0]
{'id': 'osm-w96717521',
'type': 'Feature',
'properties': {'height': 24,
'heightSrc': 'manual',
'levels': 8,
'date': 201804},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.151539, 38.725054],
[-9.15148, 38.724906],
[-9.151281, 38.724918],
[-9.151254, 38.724867],
[-9.151142, 38.724699],
[-9.150984, 38.724783],
[-9.151081, 38.724918],
[-9.151152, 38.725076],
[-9.151539, 38.725054]]]}}
and for:
PT['features'][100000]
{'id': 'osm-w556092901',
'type': 'Feature',
'properties': {'date': 201801, 'orient': 95, 'height': 3, 'heightSrc': 'ai'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.402381, 38.742663],
[-9.402342, 38.74261],
[-9.402215, 38.742667],
[-9.402281, 38.742706],
[-9.402381, 38.742663]]]}}
it also has the field 'orient'.
When I convert the features dict into columns of a df, some columns work:
df["coordinates"] = PT["features"].apply(lambda row: row["geometry"]["coordinates"])
df
but I cannot handle the fields that do not appear on every line. So for 'levels' or 'orient':
df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In [46], line 1
----> 1 df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"])
(...)
KeyError: 'levels'
How can I get all the columns contained in the features, even if some of their values have to be null?
You can use a conditional expression that returns NaN when the key does not exist:
import numpy as np
df["floors"] = nta["features"].apply(lambda row: row["properties"]["levels"] if 'levels' in list(row["properties"].keys()) else np.nan)
df["coordinates"] = nta["features"].apply(lambda row: row["geometry"]["coordinates"] if 'coordinates' in list(row["geometry"].keys()) else np.nan)
I have a json file built like this:
{"type":"FeatureCollection","features":[
{"type":"Feature","id":"010020000A0225","geometry":{"type":"Polygon","coordinates":[[[5.430767,46.0214267],[5.4310805,46.0220116],[5.4311205,46.0220864],[5.4312362,46.0223019],[5.4308994,46.0224141],[5.43087,46.0224242],[5.430774,46.0222401],[5.4304506,46.0223202],[5.4302885,46.021982],[5.4300391,46.0216054],[5.4299637,46.0216342],[5.4300862,46.0218401],[5.4299565,46.021902],[5.4298847,46.0218195],[5.4298545,46.0217829],[5.4297689,46.0216672],[5.4297523,46.0216506],[5.4297379,46.0216389],[5.4296432,46.0215854],[5.429517,46.0214509],[5.4294188,46.0213458],[5.4293757,46.0213128],[5.4291918,46.0211768],[5.4291488,46.0211448],[5.4291083,46.0211214],[5.429024,46.0210828],[5.4292965,46.0208202],[5.4294241,46.0208894],[5.4295183,46.0209623],[5.4295455,46.0209865],[5.429613,46.0210554],[5.4296428,46.0210813],[5.4298751,46.0212862],[5.429988,46.0213782],[5.430014,46.0213973],[5.4300746,46.0214318],[5.430124,46.0214542],[5.4302569,46.0215069],[5.4303111,46.0215192],[5.4303632,46.0215166],[5.4306127,46.0214642],[5.430767,46.0214267]]]},"properties":{"id":"010020000A0225","commune":"01002","prefixe":"000","section":"A","numero":"225","contenance":9440,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
{"type":"Feature","id":"010020000A0346","geometry":{"type":"Polygon","coordinates":[[[5.4241952,46.0255535],[5.4233594,46.0262031],[5.4232624,46.0262774],[5.4226259,46.0267733],[5.4227608,46.0268718],[5.4227712,46.0268789],[5.4226123,46.0269855],[5.422565,46.0270182],[5.4223546,46.027145],[5.4222957,46.0271794],[5.4221794,46.0272376],[5.4221383,46.0272585],[5.4221028,46.027152],[5.4220695,46.0270523],[5.4220378,46.026962],[5.4220467,46.0269265],[5.4220524,46.0268709],[5.4220563,46.0268474],[5.4222945,46.0268985],[5.4224161,46.0267746],[5.4224581,46.0267904],[5.4226286,46.02666],[5.4226811,46.02662],[5.4227313,46.0265803],[5.4227813,46.0265406],[5.4228535,46.0264868],[5.4229063,46.0264482],[5.4229741,46.0264001],[5.4234903,46.0260331],[5.4235492,46.0259893],[5.4235787,46.0259663],[5.423645,46.0259126],[5.4237552,46.0258198],[5.4237839,46.0257951],[5.4238321,46.0257547],[5.4239258,46.0256723],[5.4239632,46.0256394],[5.4241164,46.0255075],[5.4241952,46.0255535]]]},"properties":{"id":"010020000A0346","commune":"01002","prefixe":"000","section":"A","numero":"346","contenance":2800,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
I would like to get the properties and geometry for each feature, but I think I am looping over my json file badly. Here is my code:
data = pd.read_json(json_file_path)
for key, v in data.items():
    print(f"{key['features']['geometry']} : {v}",
          f"{key['features']['properties']} : {v}")
The values you are interested in are located in a list that is itself a value of your main dictionary.
If you want to be able to process these values with pandas, it would be better to build your dataframe directly from them:
import json
import pandas as pd
data = json.loads("""{"type":"FeatureCollection","features":[
{"type":"Feature","id":"010020000A0225","geometry":{"type":"Polygon","coordinates":[[[5.430767,46.0214267],[5.4310805,46.0220116],[5.4311205,46.0220864],[5.4312362,46.0223019],[5.4308994,46.0224141],[5.43087,46.0224242],[5.430774,46.0222401],[5.4304506,46.0223202],[5.4302885,46.021982],[5.4300391,46.0216054],[5.4299637,46.0216342],[5.4300862,46.0218401],[5.4299565,46.021902],[5.4298847,46.0218195],[5.4298545,46.0217829],[5.4297689,46.0216672],[5.4297523,46.0216506],[5.4297379,46.0216389],[5.4296432,46.0215854],[5.429517,46.0214509],[5.4294188,46.0213458],[5.4293757,46.0213128],[5.4291918,46.0211768],[5.4291488,46.0211448],[5.4291083,46.0211214],[5.429024,46.0210828],[5.4292965,46.0208202],[5.4294241,46.0208894],[5.4295183,46.0209623],[5.4295455,46.0209865],[5.429613,46.0210554],[5.4296428,46.0210813],[5.4298751,46.0212862],[5.429988,46.0213782],[5.430014,46.0213973],[5.4300746,46.0214318],[5.430124,46.0214542],[5.4302569,46.0215069],[5.4303111,46.0215192],[5.4303632,46.0215166],[5.4306127,46.0214642],[5.430767,46.0214267]]]},"properties":{"id":"010020000A0225","commune":"01002","prefixe":"000","section":"A","numero":"225","contenance":9440,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
{"type":"Feature","id":"010020000A0346","geometry":{"type":"Polygon","coordinates":[[[5.4241952,46.0255535],[5.4233594,46.0262031],[5.4232624,46.0262774],[5.4226259,46.0267733],[5.4227608,46.0268718],[5.4227712,46.0268789],[5.4226123,46.0269855],[5.422565,46.0270182],[5.4223546,46.027145],[5.4222957,46.0271794],[5.4221794,46.0272376],[5.4221383,46.0272585],[5.4221028,46.027152],[5.4220695,46.0270523],[5.4220378,46.026962],[5.4220467,46.0269265],[5.4220524,46.0268709],[5.4220563,46.0268474],[5.4222945,46.0268985],[5.4224161,46.0267746],[5.4224581,46.0267904],[5.4226286,46.02666],[5.4226811,46.02662],[5.4227313,46.0265803],[5.4227813,46.0265406],[5.4228535,46.0264868],[5.4229063,46.0264482],[5.4229741,46.0264001],[5.4234903,46.0260331],[5.4235492,46.0259893],[5.4235787,46.0259663],[5.423645,46.0259126],[5.4237552,46.0258198],[5.4237839,46.0257951],[5.4238321,46.0257547],[5.4239258,46.0256723],[5.4239632,46.0256394],[5.4241164,46.0255075],[5.4241952,46.0255535]]]},"properties":{"id":"010020000A0346","commune":"01002","prefixe":"000","section":"A","numero":"346","contenance":2800,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}}
]
}
""")
df = pd.DataFrame(data['features'])
print(df)
It'll give you the following DataFrame:
type id geometry properties
0 Feature 010020000A0225 {'type': 'Polygon', 'coordinates': [[[5.430767... {'id': '010020000A0225', 'commune': '01002', '...
1 Feature 010020000A0346 {'type': 'Polygon', 'coordinates': [[[5.424195... {'id': '010020000A0346', 'commune': '01002', '...
From there you can easily access the geometry and properties columns.
Furthermore, if you want geometric and other properties in their own columns, you can use json_normalize:
df = pd.json_normalize(data['features'])
print(df)
Output:
type id geometry.type geometry.coordinates ... properties.contenance properties.arpente properties.created properties.updated
0 Feature 010020000A0225 Polygon [[[5.430767, 46.0214267], [5.4310805, 46.02201... ... 9440 False 2005-06-03 2018-09-25
1 Feature 010020000A0346 Polygon [[[5.4241952, 46.0255535], [5.4233594, 46.0262... ... 2800 False 2005-06-03 2018-09-25
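In the actual script you would read the file from disk instead of a string literal; a minimal sketch, reusing json_file_path from the question:
import json
import pandas as pd

# parse the GeoJSON file, then flatten the features list
with open(json_file_path) as f:
    data = json.load(f)

df = pd.json_normalize(data['features'])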
I am working with Amazon Rekognition to do some image analysis.
With a simple Python script, I get - at every iteration - a response of this type:
(example for the image of a cat)
{'Labels':
[{'Name': 'Pet', 'Confidence': 96.146484375, 'Instances': [],
'Parents': [{'Name': 'Animal'}]}, {'Name': 'Mammal', 'Confidence': 96.146484375,
'Instances': [], 'Parents': [{'Name': 'Animal'}]},
{'Name': 'Cat', 'Confidence': 96.146484375.....
I get all the attributes I need in a list that looks like this:
[Pet, Mammal, Cat, Animal, Manx, Abyssinian, Furniture, Kitten, Couch]
Now, I would like to create a dataframe where the elements in the list above appear as columns and the rows take values 0 or 1.
I created a dictionary to which I add the elements of the list, so I get {'Cat': 1}; then I go to add it to the dataframe and I get the following error:
TypeError: Index(...) must be called with a collection of some kind, 'Cat' was passed.
Not only that, but I don't even seem to be able to add the information from different images to the same dataframe. For example, if I only insert the data into the dataframe (as rows, not columns), I get a series of n rows containing the n elements (identified by Amazon Rekognition) of only the last image, i.e. I start from an empty dataframe at each iteration.
The result I would like to get is something like:
Image Human Animal Flowers etc...
Pic1 1 0 0
Pic2 0 0 1
Pic3 1 1 0
For reference, this is the code I am using now (I should add that I am working on a software called KNIME, but this is just Python):
from pandas import DataFrame
import pandas as pd
import boto3
fileName=flow_variables['Path_Arr[1]'] #This is just to tell Amazon the name of the image
bucket= 'mybucket'
client=boto3.client('rekognition', region_name = 'us-east-2')
response = client.detect_labels(Image={'S3Object':
{'Bucket':bucket,'Name':fileName}})
data = [str(response)] # This is what I inserted in the first cell of this question
d = {}
for key, value in response.items():
    for el in value:
        if isinstance(el, dict):
            for k, v in el.items():
                if k == "Name":
                    d[v] = 1
print(d)
df = pd.DataFrame(d, ignore_index=True)
print(df)
output_table = df
I am definitely getting it all wrong both in the for loop and when adding things to my dataframe, but nothing really seems to work!
Sorry for the super long question, hope it was clear! Any ideas?
I do not know if this answers your question completely, because I do not know what your data can look like, but I think it is a good step that should help you. I added the same data multiple times, but the approach should be clear.
import pandas as pd
response = {'Labels': [{'Name': 'Pet', 'Confidence': 96.146484375, 'Instances': [], 'Parents': [{'Name': 'Animal'}]},
{'Name': 'Cat', 'Confidence': 96.146484375, 'Instances': [{'BoundingBox':
{'Width': 0.6686800122261047,
'Height': 0.9005332589149475,
'Left': 0.27255237102508545,
'Top': 0.03728689253330231},
'Confidence': 96.146484375}],
'Parents': [{'Name': 'Pet'}]
}]}
def handle_new_data(response_data: dict, image_name: str) -> pd.DataFrame:
    d = {"Image": image_name}
    for key, value in response_data.items():
        for el in value:
            if isinstance(el, dict):
                for k, v in el.items():
                    if k == "Name":
                        d[v] = 1
    # one row per image
    return pd.DataFrame([d])

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df_all = pd.concat([
    handle_new_data(response, "image1"),
    handle_new_data(response, "image2"),
    handle_new_data(response, "image3"),
    handle_new_data(response, "image4"),
], ignore_index=True)
print(df_all)
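One detail worth noting: labels that were not detected for a given image end up as NaN rather than 0. To match the 0/1 matrix from the question, fill them afterwards:
# labels absent from an image are NaN after concatenation; make them 0
df_all = df_all.fillna(0)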
I have data like this and I want to write it into a dataframe so that I can convert it directly into a csv file.
Data =
[ {'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us', etc}},
  {'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage', etc}}, ......
  {'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant', etc}} ]
This is how I am able to access its values:
for item in list_of_dict_responses:
    print(item['event'])
    for key, value in item.items():
        if type(value) is dict:
            for k, v in value.items():
                print(k, v)
I want it in a dataframe where event is a column with the value User Clicked, and properties is another column with sub-columns of user_id and page_visited, and then the respective values of each sub-column.
Flatten the nested dictionaries and then just use the DataFrame constructor to create a data frame.
data = [
{'event': 'User Clicked', 'properties': {'user_id': '123', 'page_visited': 'contact_us'}},
{'event': 'User Clicked', 'properties': {'user_id': '456', 'page_visited': 'homepage'}},
{'event': 'User Clicked', 'properties': {'user_id': '789', 'page_visited': 'restaurant'}}
]
The flattened dictionary may be constructed in several ways. Here's one method using a generator; it is generic and will work with arbitrarily deep nested dictionaries (or at least until it hits the maximum recursion depth):
def flatten(kv, prefix=[]):
    for k, v in kv.items():
        if isinstance(v, dict):
            yield from flatten(v, prefix + [str(k)])
        else:
            if prefix:
                yield '_'.join(prefix + [str(k)]), v
            else:
                yield str(k), v
Then, using a comprehension to flatten all the records in data, construct the data frame:
pd.DataFrame({k:v for k, v in flatten(kv)} for kv in data)
#Out
event properties_page_visited properties_user_id
0 User Clicked contact_us 123
1 User Clicked homepage 456
2 User Clicked restaurant 789
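For this fixed two-level structure, pandas can also do the flattening itself; a sketch using the standard json_normalize, with sep='_' so the column names match the ones above:
import pandas as pd

# nested keys are joined with '_' instead of the default '.'
df = pd.json_normalize(data, sep='_')
print(df)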
You have 2 options: either use a MultiIndex for columns, or add a prefix for data in properties. The former, in my opinion, is not appropriate here, since you don't have a "true" hierarchical columnar structure. The second level, for example, would be empty for event.
Implementing the second idea, you can restructure your list of dictionaries before feeding to pd.DataFrame. The syntax {**d1, **d2} is used to combine two dictionaries.
data_transformed = [{**{'event': d['event']},
                     **{f'properties_{k}': v for k, v in d['properties'].items()}}
                    for d in Data]
res = pd.DataFrame(data_transformed)
print(res)
event properties_page_visited properties_user_id
0 User Clicked contact_us 123
1 User Clicked homepage 456
2 User Clicked restaurant 789
This also aids writing to and reading from CSV files, where a MultiIndex can be ambiguous.
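Since the end goal is a csv file, writing it out is then a one-liner (the filename here is just an example):
# index=False keeps the 0, 1, 2 row labels out of the file
res.to_csv('events.csv', index=False)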