Extracting dictionary values from dataframe and creating a new Pandas column - python

I have a Pandas DataFrame with several columns, one of which is a dictionary containing coordinates in a list. This is what the entry looks like:
{'type': 'Point', 'coordinates': [-120.12345, 50.23456]}
I would like to extract this data and create 2 new columns in the original DataFrame, one for the latitude and one for longitude.
ID    Latitude      Longitude
1     -120.12345    50.23456
I have not been able to find a simple solution at this point and would be grateful for any guidance.

You can access the dictionary's get method through the .str accessor:
import pandas as pd

test = pd.DataFrame(
    {
        "ID": [1, 2],
        "point": [
            {'type': 'Point', 'coordinates': [-120.12345, 50.23456]},
            {'type': 'Point', 'coordinates': [-10.12345, 50.23456]},
        ],
    }
)
pd.concat(
    [
        test["ID"],
        pd.DataFrame(
            test['point'].str.get('coordinates').to_list(),
            columns=['Latitude', 'Longitude'],
        ),
    ],
    axis=1,
)

You can use the .str accessor to fetch the required structure:
import pandas as pd

df = pd.DataFrame({'Col': [{'type': 'Point', 'coordinates': [-120.12345, 50.23456]}]})
df['Latitude'] = df.Col.str['coordinates'].str[0]
df['Longitude'] = df.Col.str['coordinates'].str[1]
OUTPUT:
Col Latitude Longitude
0 {'type': 'Point', 'coordinates': [-120.12345, ... -120.12345 50.23456
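One detail worth noting: GeoJSON point coordinates are ordered [longitude, latitude] (per RFC 7946), so the column labels above may be swapped. A minimal sketch of the same expansion with that ordering, assuming a 'point' column like the one in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1],
    "point": [{'type': 'Point', 'coordinates': [-120.12345, 50.23456]}],
})

# Expand the two-element coordinate list into two columns in one step.
# GeoJSON stores coordinates as [longitude, latitude].
df[["Longitude", "Latitude"]] = pd.DataFrame(
    df["point"].str["coordinates"].to_list(), index=df.index
)
```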

Related

json to dataframe in python: some fields don't convert to dataframe

I have one json file. I opened it with pd.read_json, and when parsing it to a geodataframe only some fields are considered; some are not. When I open it in QGIS, for instance, there are multiple columns that I cannot convert to a geodataframe.
So my file is called PT:
PT = pd.read_json('PT.json')
PT
type features
0 FeatureCollection {'id': 'osm-w96717521', 'type': 'Feature', 'pr...
1 FeatureCollection {'id': 'osm-w96850552', 'type': 'Feature', 'pr...
2 FeatureCollection {'id': 'osm-r1394361', 'type': 'Feature', 'pro...
and different PT lines have different fields. So, for instance, for:
PT['features'][0]
{'id': 'osm-w96717521',
'type': 'Feature',
'properties': {'height': 24,
'heightSrc': 'manual',
'levels': 8,
'date': 201804},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.151539, 38.725054],
[-9.15148, 38.724906],
[-9.151281, 38.724918],
[-9.151254, 38.724867],
[-9.151142, 38.724699],
[-9.150984, 38.724783],
[-9.151081, 38.724918],
[-9.151152, 38.725076],
[-9.151539, 38.725054]]]}}
and for:
PT['features'][100000]
{'id': 'osm-w556092901',
'type': 'Feature',
'properties': {'date': 201801, 'orient': 95, 'height': 3, 'heightSrc': 'ai'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.402381, 38.742663],
[-9.402342, 38.74261],
[-9.402215, 38.742667],
[-9.402281, 38.742706],
[-9.402381, 38.742663]]]}}
it also has the field 'orient'.
When I convert the features dict into columns of a df, some columns work:
df["coordinates"] = nta["features"].apply(lambda row: row["geometry"]["coordinates"])
df
but keys that do not appear on every line fail. So for 'levels' or 'orient':
df["floors"] = nta["features"].apply(lambda row: row["properties"]["levels"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In [46], line 1
----> 1 df["floors"] = nta["features"].apply(lambda row: row["properties"]["levels"])
(...)
KeyError: 'levels'
How can I get all columns contained in feature even if for some values they should be null?
You can use an if/else expression that returns NaN when the key does not exist:
import numpy as np
df["floors"] = nta["features"].apply(lambda row: row["properties"]["levels"] if 'levels' in row["properties"] else np.nan)
df["coordinates"] = nta["features"].apply(lambda row: row["geometry"]["coordinates"] if 'coordinates' in row["geometry"] else np.nan)
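As an alternative sketch (not from the original answer): pd.json_normalize flattens every nested key across all features and fills NaN wherever a key is absent, which avoids writing per-key if/else logic. The toy features below are illustrative assumptions:

```python
import pandas as pd

# Two toy features: the first has 'levels', the second has 'orient'.
features = [
    {"properties": {"height": 24, "levels": 8},
     "geometry": {"type": "Polygon", "coordinates": [[[-9.15, 38.72]]]}},
    {"properties": {"height": 3, "orient": 95},
     "geometry": {"type": "Polygon", "coordinates": [[[-9.40, 38.74]]]}},
]

# Every nested key becomes a dotted column; missing keys become NaN.
df = pd.json_normalize(features)
```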

loop through json file python to get specific key and values

I have a json file built like this:
{"type":"FeatureCollection","features":[
{"type":"Feature","id":"010020000A0225","geometry":{"type":"Polygon","coordinates":[[[5.430767,46.0214267],[5.4310805,46.0220116],[5.4311205,46.0220864],[5.4312362,46.0223019],[5.4308994,46.0224141],[5.43087,46.0224242],[5.430774,46.0222401],[5.4304506,46.0223202],[5.4302885,46.021982],[5.4300391,46.0216054],[5.4299637,46.0216342],[5.4300862,46.0218401],[5.4299565,46.021902],[5.4298847,46.0218195],[5.4298545,46.0217829],[5.4297689,46.0216672],[5.4297523,46.0216506],[5.4297379,46.0216389],[5.4296432,46.0215854],[5.429517,46.0214509],[5.4294188,46.0213458],[5.4293757,46.0213128],[5.4291918,46.0211768],[5.4291488,46.0211448],[5.4291083,46.0211214],[5.429024,46.0210828],[5.4292965,46.0208202],[5.4294241,46.0208894],[5.4295183,46.0209623],[5.4295455,46.0209865],[5.429613,46.0210554],[5.4296428,46.0210813],[5.4298751,46.0212862],[5.429988,46.0213782],[5.430014,46.0213973],[5.4300746,46.0214318],[5.430124,46.0214542],[5.4302569,46.0215069],[5.4303111,46.0215192],[5.4303632,46.0215166],[5.4306127,46.0214642],[5.430767,46.0214267]]]},"properties":{"id":"010020000A0225","commune":"01002","prefixe":"000","section":"A","numero":"225","contenance":9440,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
{"type":"Feature","id":"010020000A0346","geometry":{"type":"Polygon","coordinates":[[[5.4241952,46.0255535],[5.4233594,46.0262031],[5.4232624,46.0262774],[5.4226259,46.0267733],[5.4227608,46.0268718],[5.4227712,46.0268789],[5.4226123,46.0269855],[5.422565,46.0270182],[5.4223546,46.027145],[5.4222957,46.0271794],[5.4221794,46.0272376],[5.4221383,46.0272585],[5.4221028,46.027152],[5.4220695,46.0270523],[5.4220378,46.026962],[5.4220467,46.0269265],[5.4220524,46.0268709],[5.4220563,46.0268474],[5.4222945,46.0268985],[5.4224161,46.0267746],[5.4224581,46.0267904],[5.4226286,46.02666],[5.4226811,46.02662],[5.4227313,46.0265803],[5.4227813,46.0265406],[5.4228535,46.0264868],[5.4229063,46.0264482],[5.4229741,46.0264001],[5.4234903,46.0260331],[5.4235492,46.0259893],[5.4235787,46.0259663],[5.423645,46.0259126],[5.4237552,46.0258198],[5.4237839,46.0257951],[5.4238321,46.0257547],[5.4239258,46.0256723],[5.4239632,46.0256394],[5.4241164,46.0255075],[5.4241952,46.0255535]]]},"properties":{"id":"010020000A0346","commune":"01002","prefixe":"000","section":"A","numero":"346","contenance":2800,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
I would like to get, for each feature, the properties and geometry, but I think I loop badly over my json file. Here is my code:
data = pd.read_json(json_file_path)
for key, v in data.items():
    print(f"{key['features']['geometry']} : {v}",
          f"{key['features']['properties']} : {v}")
The values you are interested in are located in a list that is itself a value of your main dictionary.
If you want to be able to process these values with pandas, it would be better to build your dataframe directly from them:
import json
import pandas as pd
data = json.loads("""{"type":"FeatureCollection","features":[
{"type":"Feature","id":"010020000A0225","geometry":{"type":"Polygon","coordinates":[[[5.430767,46.0214267],[5.4310805,46.0220116],[5.4311205,46.0220864],[5.4312362,46.0223019],[5.4308994,46.0224141],[5.43087,46.0224242],[5.430774,46.0222401],[5.4304506,46.0223202],[5.4302885,46.021982],[5.4300391,46.0216054],[5.4299637,46.0216342],[5.4300862,46.0218401],[5.4299565,46.021902],[5.4298847,46.0218195],[5.4298545,46.0217829],[5.4297689,46.0216672],[5.4297523,46.0216506],[5.4297379,46.0216389],[5.4296432,46.0215854],[5.429517,46.0214509],[5.4294188,46.0213458],[5.4293757,46.0213128],[5.4291918,46.0211768],[5.4291488,46.0211448],[5.4291083,46.0211214],[5.429024,46.0210828],[5.4292965,46.0208202],[5.4294241,46.0208894],[5.4295183,46.0209623],[5.4295455,46.0209865],[5.429613,46.0210554],[5.4296428,46.0210813],[5.4298751,46.0212862],[5.429988,46.0213782],[5.430014,46.0213973],[5.4300746,46.0214318],[5.430124,46.0214542],[5.4302569,46.0215069],[5.4303111,46.0215192],[5.4303632,46.0215166],[5.4306127,46.0214642],[5.430767,46.0214267]]]},"properties":{"id":"010020000A0225","commune":"01002","prefixe":"000","section":"A","numero":"225","contenance":9440,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
{"type":"Feature","id":"010020000A0346","geometry":{"type":"Polygon","coordinates":[[[5.4241952,46.0255535],[5.4233594,46.0262031],[5.4232624,46.0262774],[5.4226259,46.0267733],[5.4227608,46.0268718],[5.4227712,46.0268789],[5.4226123,46.0269855],[5.422565,46.0270182],[5.4223546,46.027145],[5.4222957,46.0271794],[5.4221794,46.0272376],[5.4221383,46.0272585],[5.4221028,46.027152],[5.4220695,46.0270523],[5.4220378,46.026962],[5.4220467,46.0269265],[5.4220524,46.0268709],[5.4220563,46.0268474],[5.4222945,46.0268985],[5.4224161,46.0267746],[5.4224581,46.0267904],[5.4226286,46.02666],[5.4226811,46.02662],[5.4227313,46.0265803],[5.4227813,46.0265406],[5.4228535,46.0264868],[5.4229063,46.0264482],[5.4229741,46.0264001],[5.4234903,46.0260331],[5.4235492,46.0259893],[5.4235787,46.0259663],[5.423645,46.0259126],[5.4237552,46.0258198],[5.4237839,46.0257951],[5.4238321,46.0257547],[5.4239258,46.0256723],[5.4239632,46.0256394],[5.4241164,46.0255075],[5.4241952,46.0255535]]]},"properties":{"id":"010020000A0346","commune":"01002","prefixe":"000","section":"A","numero":"346","contenance":2800,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}}
]
}
""")
df = pd.DataFrame(data['features'])
print(df)
It'll give you the following DataFrame:
type id geometry properties
0 Feature 010020000A0225 {'type': 'Polygon', 'coordinates': [[[5.430767... {'id': '010020000A0225', 'commune': '01002', '...
1 Feature 010020000A0346 {'type': 'Polygon', 'coordinates': [[[5.424195... {'id': '010020000A0346', 'commune': '01002', '...
From there you can easily access the geometry and properties columns.
Furthermore, if you want geometric and other properties in their own columns, you can use json_normalize:
df = pd.json_normalize(data['features'])
print(df)
Output:
type id geometry.type geometry.coordinates ... properties.contenance properties.arpente properties.created properties.updated
0 Feature 010020000A0225 Polygon [[[5.430767, 46.0214267], [5.4310805, 46.02201... ... 9440 False 2005-06-03 2018-09-25
1 Feature 010020000A0346 Polygon [[[5.4241952, 46.0255535], [5.4233594, 46.0262... ... 2800 False 2005-06-03 2018-09-25

Add dict as value to dataframe

I want to add a dict to a dataframe and the appended dict has dicts or list as value.
Example:
abc = {'id': 'niceId',
'category': {'sport':'tennis',
'land': 'USA'
},
'date': '2022-04-12T23:33:21+02:00'
}
Now, I want to add this dict to a dataframe. I tried this, but it failed:
df = pd.DataFrame(abc, columns = abc.keys())
Output:
ValueError: All arrays must be of the same length
I'm thankful for your help.
Your question is not very clear about the expected output. But assuming you want a dataframe whose columns are id, category, date and numbers (added here just to show the list case), where each cell in the category column holds a dictionary and each cell in the numbers column holds a list, you can use the from_dict method with a transpose:
abc = {'id': 'niceId',
'category': {'sport':'tennis',
'land': 'USA'
},
'date': '2022-04-12T23:33:21+02:00',
'numbers': [1,2,3,4,5]
}
df = pd.DataFrame.from_dict(abc, orient="index").T
gives you a dataframe as:
   id      category                            date                       numbers
0  niceId  {'sport': 'tennis', 'land': 'USA'}  2022-04-12T23:33:21+02:00  [1, 2, 3, 4, 5]
So let's say you want to add another item to this dataframe:
efg = {'id': 'notniceId',
'category': {'sport':'swimming',
'land': 'UK'
},
'date': '2021-04-12T23:33:21+02:00',
'numbers': [4,5]
}
df2 = pd.DataFrame.from_dict(efg, orient="index").T
pd.concat([df, df2], ignore_index=True)
gives you a dataframe as:
   id         category                             date                       numbers
0  niceId     {'sport': 'tennis', 'land': 'USA'}   2022-04-12T23:33:21+02:00  [1, 2, 3, 4, 5]
1  notniceId  {'sport': 'swimming', 'land': 'UK'}  2021-04-12T23:33:21+02:00  [4, 5]
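An alternative sketch worth knowing: wrapping the dict in a one-element list tells pandas to treat it as a single row, which keeps nested dicts and lists intact as cell values without the transpose:

```python
import pandas as pd

abc = {'id': 'niceId',
       'category': {'sport': 'tennis', 'land': 'USA'},
       'date': '2022-04-12T23:33:21+02:00',
       'numbers': [1, 2, 3, 4, 5]}

# A list of dicts -> one row per dict; nested values stay as objects.
df = pd.DataFrame([abc])
```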

Create pandas MultiIndex Dataframe from json

I am receiving the following json from a webservice:
{
"headers":[
{
"seriesId":"18805",
"Name":"Name1",
"assetId":"4"
},
{
"seriesId":"18801",
"Name":"Name2",
"assetId":"209"
}
],
"values":[
{
"Date":"01-Jan-2021",
"18805":"127.93",
"18801":"75.85"
}
]
}
Is there a way to create a MultiIndex dataframe from this data? I would like Date to be the row index and the rest to be column indexes.
The values key is a straightforward data frame, and the columns can be rebuilt from the headers key:
js = {'headers': [{'seriesId': '18805', 'Name': 'Name1', 'assetId': '4'},
{'seriesId': '18801', 'Name': 'Name2', 'assetId': '209'}],
'values': [{'Date': '01-Jan-2021', '18805': '127.93', '18801': '75.85'}]}
# get values into dataframe
df = pd.DataFrame(js["values"]).set_index("Date")
# get headers for use in rebuilding column names
dfc = pd.DataFrame(js["headers"])
# rebuild columns
df.columns = pd.MultiIndex.from_tuples(dfc.apply(tuple, axis=1), names=dfc.columns)
print(df)
seriesId 18805 18801
Name Name1 Name2
assetId 4 209
Date
01-Jan-2021 127.93 75.85
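Once the columns are a MultiIndex, a single series can be selected by its top level directly, or by any named level with .xs. A quick usage sketch, repeating the construction from the answer:

```python
import pandas as pd

js = {'headers': [{'seriesId': '18805', 'Name': 'Name1', 'assetId': '4'},
                  {'seriesId': '18801', 'Name': 'Name2', 'assetId': '209'}],
      'values': [{'Date': '01-Jan-2021', '18805': '127.93', '18801': '75.85'}]}

df = pd.DataFrame(js["values"]).set_index("Date")
dfc = pd.DataFrame(js["headers"])
df.columns = pd.MultiIndex.from_tuples(dfc.apply(tuple, axis=1), names=dfc.columns)

# Select by the top level (seriesId)...
by_id = df["18805"]
# ...or by a lower named level with .xs:
by_name = df.xs("Name2", axis=1, level="Name")
```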

How to loop through nested dictionaries in a JSON

I have a json file that gives the polygons of the neighborhoods of Chicago. Here is a small sample of the form.
{'type': 'Feature',
'properties': {'PRI_NEIGH': 'Printers Row',
'SEC_NEIGH': 'PRINTERS ROW',
'SHAPE_AREA': 2162137.97139,
'SHAPE_LEN': 6864.247156},
'geometry': {'type': 'Polygon',
'coordinates': [[[-87.62760697485339, 41.87437097785366],
[-87.6275952566332, 41.873861712441126],
[-87.62756611032259, 41.873091933433905],
[-87.62755513014902, 41.872801941012725],
[-87.62754038267386, 41.87230261598636],
[-87.62752573582432, 41.8718067089444],
[-87.62751740010017, 41.87152447340544],
[-87.62749380061304, 41.87053328991345],
[-87.62748640976544, 41.87022285721281],
[-87.62747968351987, 41.86986997314866],
[-87.62746758964467, 41.86923545315858],
[-87.62746178584428, 41.868930955522266]
I want to create a dataframe where I have each 'SEC_NEIGH', linked to the coordinates such that
df['SEC_NEIGH'] = 'coordinates'
I have tried using a for loop to loop through the dictionaries but when I do so, the dataframe comes out with only showing an '_'
df = {}
for item in data:
if 'features' in item:
if 'properties' in item:
nn = item.get("properties").get("PRI_NEIGH")
if 'geometry' in item:
coords = item.get('geometry').get('coordinates')
df[nn] = coords
df_n=pd.DataFrame(df)
I was expecting something where each column would be a separate neighborhood, with only one value, that being the list of coordinates. Instead, my dataframe outputs as a single underscore('_'). Is there something wrong with my for loop?
Try this:
import pandas as pd

data = [
    {'type': 'Feature',
     'properties': {'PRI_NEIGH': 'Printers Row',
                    'SEC_NEIGH': 'PRINTERS ROW',
                    'SHAPE_AREA': 2162137.97139,
                    'SHAPE_LEN': 6864.247156},
     'geometry': {'type': 'Polygon',
                  'coordinates': [[-87.62760697485339, 41.87437097785366],
                                  [-87.6275952566332, 41.873861712441126],
                                  [-87.62756611032259, 41.873091933433905],
                                  [-87.62755513014902, 41.872801941012725],
                                  [-87.62754038267386, 41.87230261598636],
                                  [-87.62752573582432, 41.8718067089444],
                                  [-87.62751740010017, 41.87152447340544],
                                  [-87.62749380061304, 41.87053328991345],
                                  [-87.62748640976544, 41.87022285721281],
                                  [-87.62747968351987, 41.86986997314866],
                                  [-87.62746758964467, 41.86923545315858],
                                  [-87.62746178584428, 41.868930955522266]]}}
]

df = {}
for item in data:
    if item["type"] == 'Feature':
        if 'properties' in item.keys():
            nn = item.get("properties").get("PRI_NEIGH")
        if 'geometry' in item:
            coords = item.get('geometry').get('coordinates')
            df[nn] = coords
df_n = pd.DataFrame(df)
print(df_n)
output:
Printers Row
0 [-87.62760697485339, 41.87437097785366]
1 [-87.6275952566332, 41.873861712441126]
2 [-87.62756611032259, 41.873091933433905]
3 [-87.62755513014902, 41.872801941012725]
4 [-87.62754038267386, 41.87230261598636]
5 [-87.62752573582432, 41.8718067089444]
6 [-87.62751740010017, 41.87152447340544]
7 [-87.62749380061304, 41.87053328991345]
8 [-87.62748640976544, 41.87022285721281]
9 [-87.62747968351987, 41.86986997314866]
10 [-87.62746758964467, 41.86923545315858]
11 [-87.62746178584428, 41.868930955522266]
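If, as the question expected, each neighborhood should instead be a single row with its whole coordinate list in one cell, a list-of-tuples construction is a simple sketch (toy data, trimmed from the sample above):

```python
import pandas as pd

data = [{'type': 'Feature',
         'properties': {'PRI_NEIGH': 'Printers Row'},
         'geometry': {'type': 'Polygon',
                      'coordinates': [[-87.62760697485339, 41.87437097785366],
                                      [-87.6275952566332, 41.873861712441126]]}}]

# One row per feature; the full coordinate list stays in a single cell.
df_n = pd.DataFrame(
    [(f['properties']['PRI_NEIGH'], f['geometry']['coordinates']) for f in data],
    columns=['PRI_NEIGH', 'coordinates'],
)
```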
