I have one JSON file. I opened it with pd.read_json, but when parsing to a GeoDataFrame only some fields are considered; some are not. When I open the file in QGIS, for instance, I see multiple columns that I cannot get into the GeoDataFrame.
So my file is called PT:
PT = pd.read_json('PT.json')
PT
type features
0 FeatureCollection {'id': 'osm-w96717521', 'type': 'Feature', 'pr...
1 FeatureCollection {'id': 'osm-w96850552', 'type': 'Feature', 'pr...
2 FeatureCollection {'id': 'osm-r1394361', 'type': 'Feature', 'pro...
Different lines of PT have different fields. So, for instance, for:
PT['features'][0]
{'id': 'osm-w96717521',
'type': 'Feature',
'properties': {'height': 24,
'heightSrc': 'manual',
'levels': 8,
'date': 201804},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.151539, 38.725054],
[-9.15148, 38.724906],
[-9.151281, 38.724918],
[-9.151254, 38.724867],
[-9.151142, 38.724699],
[-9.150984, 38.724783],
[-9.151081, 38.724918],
[-9.151152, 38.725076],
[-9.151539, 38.725054]]]}}
and for:
PT['features'][100000]
{'id': 'osm-w556092901',
'type': 'Feature',
'properties': {'date': 201801, 'orient': 95, 'height': 3, 'heightSrc': 'ai'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.402381, 38.742663],
[-9.402342, 38.74261],
[-9.402215, 38.742667],
[-9.402281, 38.742706],
[-9.402381, 38.742663]]]}}
it also has the field 'orient'.
When I convert the features dicts into columns of a df, some columns work:
df["coordinates"] = PT["features"].apply(lambda row: row["geometry"]["coordinates"])
df
but I cannot handle the keys that do not appear in every line. So for 'levels' or 'orient':
df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In [46], line 1
----> 1 df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"])
(...)
KeyError: 'levels'
How can I get all the columns contained in features, even if for some lines the values should be null?
You can use an if/else construct that returns NaN when the key does not exist:
import numpy as np
df["floors"] = nta["features"].apply(lambda row: row["properties"]["levels"] if 'levels' in list(row["properties"].keys()) else np.nan)
df["coordinates"] = nta["features"].apply(lambda row: row["geometry"]["coordinates"] if 'coordinates' in list(row["geometry"].keys()) else np.nan)
I know "create pandas dataframe from nested dict" has a lot of entries here, but I haven't found an answer that applies to my problem.
I have a dict like this:
{'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}....,
and after reading it with pandas it looks like this:
df = pd.DataFrame.from_dict(my_dict,orient='index')
print(df)
id 1
creator_user_id {'id': 12170254, 'name': 'Nicolas', 'email': '...
user_id {'id': 12264469, 'name': 'Daniela Giraldo G', ...
person_id {'active_flag': True, 'name': 'Cristina Cardoz...
org_id {'name': 'Cristina Cardozo', 'people_count': 1...
stage_id 2
title Cristina Cardozo
I would like to create a one-row dataframe where, for example, the nested creator_user_id column is expanded into several columns that I can then name: creator_user_id_id, creator_user_id_name, etc.
thank you for your time!
Given you want one row, just use json_normalize():
pd.json_normalize({'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}})
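By default json_normalize joins nested keys with a dot (creator_user_id.id, creator_user_id.name, ...); since the names you want use underscores, pass sep='_'. A small sketch on a trimmed version of the dict:
import pandas as pd

d = {'id': 1,
     'creator_user_id': {'id': 12170254,
                         'name': 'Nicolas',
                         'active_flag': True}}

# sep controls how nested keys are joined into column names
df = pd.json_normalize(d, sep='_')
print(df.columns.tolist())
# ['id', 'creator_user_id_id', 'creator_user_id_name', 'creator_user_id_active_flag']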
My assignment requires me to write code that can find the elements of one JSON object inside another JSON object.
Here is an example of the data:
data_to_find = {'location': {'state': 'WA'}, 'active': True}
data_to_look_in = {'id': 3, 'last': 'Black', 'first': 'Jim', 'location': {'city': 'Spokane', 'state': 'WA', 'postalCode': '99207'}, 'active': True}
print(data_to_find)
print(data_to_look_in)
output:
{'location': {'state': 'WA'}, 'active': True}
{'id': 3, 'last': 'Black', 'first': 'Jim', 'location': {'city': 'Spokane', 'state': 'WA', 'postalCode': '99207'}, 'active': True}
I have to find the data in data_to_find inside data_to_look_in, and I am having a lot of trouble doing this. Here is what I have tried and what it returns:
for line in data_to_find:
    tmp1 = data_to_find[line]
    tmp2 = data_to_look_in[line]
    print(tmp1)
    print(tmp2)
    if tmp1 in tmp2:
        print("found")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-46-cf3b325f53e8> in <module>()
4 print(tmp1)
5 print(tmp2)
----> 6 if tmp1 in tmp2:
7 print("found")
TypeError: unhashable type: 'dict'
I've tried other ways that seem to return the same error so I am looking for help on how to best approach and solve this issue. Any tips would be appreciated. Thank you.
EDIT:
I am trying to write this so it can be done generically not just for these specific data values.
Perhaps use Pandas:
import pandas as pd
data_to_find = {'location': {'state': 'WA'}, 'active': True}
data_to_look_in = {'id': 3, 'last': 'Black', 'first': 'Jim', 'location': {'city': 'Spokane', 'state': 'WA', 'postalCode': '99207'}, 'active': True}
df1 = pd.DataFrame.from_dict(data_to_find)
print(df1)
'''
location active
state WA True
'''
df2 = pd.DataFrame.from_dict(data_to_look_in)
print(df2)
'''
id last first location active
city 3 Black Jim Spokane True
postalCode 3 Black Jim 99207 True
state 3 Black Jim WA True
'''
search_results = pd.merge(df1, df2, how='inner', on=['location', 'active'])
print(search_results)
'''
location active id last first
0 WA True 3 Black Jim
'''
If the length of search_results is greater than zero, you have one or more matches:
print(len(search_results))
'''
1
'''
This example merges on location and active columns. To solve this with a generic data_to_find object, where you don't know column names ahead of time, you might use on=data_to_find.keys(), using the top-level keys in that object. Those keys will need to be in the data_to_look_in object for the merge to work correctly.
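A minimal generic sketch of that idea (found_in is a hypothetical helper name), assuming every top-level key of data_to_find is also present in data_to_look_in:
import pandas as pd

def found_in(data_to_find, data_to_look_in):
    # One frame per dict, then an inner merge on the keys being searched for;
    # any surviving row means the values matched
    df1 = pd.DataFrame.from_dict(data_to_find)
    df2 = pd.DataFrame.from_dict(data_to_look_in)
    matches = pd.merge(df1, df2, how='inner', on=list(data_to_find.keys()))
    return len(matches) > 0

print(found_in({'location': {'state': 'WA'}, 'active': True},
               {'id': 3, 'last': 'Black', 'first': 'Jim',
                'location': {'city': 'Spokane', 'state': 'WA', 'postalCode': '99207'},
                'active': True}))
# True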
I have a Pandas DataFrame with several columns, one of which is a dictionary containing coordinates in a list. This is what the entry looks like:
{'type': 'Point', 'coordinates': [-120.12345, 50.23456]}
I would like to extract this data and create 2 new columns in the original DataFrame, one for the latitude and one for longitude.
ID    Latitude      Longitude
1     -120.12345    50.23456
I have not been able to find a simple solution at this point and would be grateful for any guidance.
You can access the dictionary get method through the .str accessor:
import pandas as pd

test = pd.DataFrame(
    {
        "ID": [1, 2],
        "point": [
            {'type': 'Point', 'coordinates': [-120.12345, 50.23456]},
            {'type': 'Point', 'coordinates': [-10.12345, 50.23456]}],
    },
)
pd.concat([
    test["ID"],
    pd.DataFrame(
        test['point'].str.get('coordinates').to_list(),
        columns=['Latitude', 'Longitude']
    )
], axis=1)
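As a side note on why .str works here at all: Series.str.get is documented to extract an element not just from strings but also from lists, tuples, and dicts, so .str.get('coordinates') performs a per-element dict lookup even though the column holds dicts rather than text.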
You can use str to fetch the required structure:
df = pd.DataFrame({'Col' : [{'type': 'Point', 'coordinates': [-120.12345, 50.23456]}]})
df['Latitude'] = df.Col.str['coordinates'].str[0]
df['Longitude'] = df.Col.str['coordinates'].str[1]
Output:
Col Latitude Longitude
0 {'type': 'Point', 'coordinates': [-120.12345, ... -120.12345 50.23456
I'm working with an API, trying to pull data out of it. The challenge I'm having is that the majority of the columns are straightforward and not nested, with the exception of a CustomFields column, which holds all the various custom fields in a list per record.
Using json_normalize is there a way to target a nested column to flatten it? I'm trying to fetch and use all the data available from the API but one nested column in particular is causing a headache.
The JSON data when retrieved from the API looks like the following. This is just for one customer profile,
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
Here are the results when I run the above code (output screenshot not included). Ideally, I would like each custom field to end up as its own column per profile (desired-result screenshot not included).
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. You have square brackets in the Key values of your JSON; that's why you see those [ ] in the output:
d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='sum')
print(df)
Output:
Key [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress Name
an_email#gmail.com Al Smith HJGO 9051 2020-02-19 34566 Active
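Since the question asks how record_path and meta work: record_path points json_normalize at the nested list whose elements become the rows, while meta names the top-level fields to repeat alongside each of those rows. A minimal sketch, also flattening the pivoted result back to ordinary columns and stripping the brackets (both optional cosmetic steps):
import pandas as pd

d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith',
      'Date': '2020-05-26 14:58:00', 'State': 'Active',
      'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'},
                       {'Key': '[status]', 'Value': 'Active'}]}]

# record_path: the nested list to expand (one row per custom field)
# meta: top-level fields repeated on every expanded row
rows = pd.json_normalize(d, record_path='CustomFields',
                         meta=['EmailAddress', 'Name', 'Date', 'State'])

wide = rows.pivot_table(columns='Key', values='Value',
                        index=['EmailAddress', 'Name'], aggfunc='sum')

# Back to a flat frame, with the [ ] stripped from the column names
wide = wide.reset_index()
wide.columns = [str(c).strip('[]') for c in wide.columns]
print(wide)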
I have a json file that gives the polygons of the neighborhoods of Chicago. Here is a small sample of the form.
{'type': 'Feature',
'properties': {'PRI_NEIGH': 'Printers Row',
'SEC_NEIGH': 'PRINTERS ROW',
'SHAPE_AREA': 2162137.97139,
'SHAPE_LEN': 6864.247156},
'geometry': {'type': 'Polygon',
'coordinates': [[[-87.62760697485339, 41.87437097785366],
[-87.6275952566332, 41.873861712441126],
[-87.62756611032259, 41.873091933433905],
[-87.62755513014902, 41.872801941012725],
[-87.62754038267386, 41.87230261598636],
[-87.62752573582432, 41.8718067089444],
[-87.62751740010017, 41.87152447340544],
[-87.62749380061304, 41.87053328991345],
[-87.62748640976544, 41.87022285721281],
[-87.62747968351987, 41.86986997314866],
[-87.62746758964467, 41.86923545315858],
[-87.62746178584428, 41.868930955522266]
I want to create a dataframe where I have each 'SEC_NEIGH', linked to the coordinates such that
df['SEC_NEIGH'] = 'coordinates'
I have tried using a for loop to loop through the dictionaries, but when I do so, the dataframe comes out showing only an '_'.
df = {}
for item in data:
    if 'features' in item:
        if 'properties' in item:
            nn = item.get("properties").get("PRI_NEIGH")
        if 'geometry' in item:
            coords = item.get('geometry').get('coordinates')
            df[nn] = coords
df_n = pd.DataFrame(df)
I was expecting something where each column would be a separate neighborhood, with only one value, that being the list of coordinates. Instead, my dataframe outputs as a single underscore('_'). Is there something wrong with my for loop?
Try this. Iterating over your top-level dict gives you its keys (strings), not the nested feature dicts; put the features in a list and iterate over that:
import pandas as pd

data = [
    {'type': 'Feature',
     'properties': {'PRI_NEIGH': 'Printers Row',
                    'SEC_NEIGH': 'PRINTERS ROW',
                    'SHAPE_AREA': 2162137.97139,
                    'SHAPE_LEN': 6864.247156},
     'geometry': {'type': 'Polygon',
                  'coordinates': [[-87.62760697485339, 41.87437097785366],
                                  [-87.6275952566332, 41.873861712441126],
                                  [-87.62756611032259, 41.873091933433905],
                                  [-87.62755513014902, 41.872801941012725],
                                  [-87.62754038267386, 41.87230261598636],
                                  [-87.62752573582432, 41.8718067089444],
                                  [-87.62751740010017, 41.87152447340544],
                                  [-87.62749380061304, 41.87053328991345],
                                  [-87.62748640976544, 41.87022285721281],
                                  [-87.62747968351987, 41.86986997314866],
                                  [-87.62746758964467, 41.86923545315858],
                                  [-87.62746178584428, 41.868930955522266]]}}
]
df = {}
for item in data:
    if item["type"] == 'Feature':
        if 'properties' in item.keys():
            nn = item.get("properties").get("PRI_NEIGH")
        if 'geometry' in item:
            coords = item.get('geometry').get('coordinates')
            df[nn] = coords
df_n = pd.DataFrame(df)
print(df_n)
Output:
Printers Row
0 [-87.62760697485339, 41.87437097785366]
1 [-87.6275952566332, 41.873861712441126]
2 [-87.62756611032259, 41.873091933433905]
3 [-87.62755513014902, 41.872801941012725]
4 [-87.62754038267386, 41.87230261598636]
5 [-87.62752573582432, 41.8718067089444]
6 [-87.62751740010017, 41.87152447340544]
7 [-87.62749380061304, 41.87053328991345]
8 [-87.62748640976544, 41.87022285721281]
9 [-87.62747968351987, 41.86986997314866]
10 [-87.62746758964467, 41.86923545315858]
11 [-87.62746178584428, 41.868930955522266]
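For what it's worth, a shorter equivalent of the loop above, assuming data is the list of feature dicts:
import pandas as pd

# One column per neighborhood, one coordinate pair per row
df_n = pd.DataFrame({item['properties']['PRI_NEIGH']: item['geometry']['coordinates']
                     for item in data
                     if item.get('type') == 'Feature'})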