I have one JSON file. I opened it with pd.read_json, but when parsing it to a GeoDataFrame only some fields are picked up and some are not. When I open the same file in QGIS, for instance, I see multiple columns that I cannot get into the GeoDataFrame.
So my file is called PT:
PT = pd.read_json('PT.json')
PT
                type                                           features
0  FeatureCollection  {'id': 'osm-w96717521', 'type': 'Feature', 'pr...
1  FeatureCollection  {'id': 'osm-w96850552', 'type': 'Feature', 'pr...
2  FeatureCollection  {'id': 'osm-r1394361', 'type': 'Feature', 'pro...
Different rows of PT have different fields. For instance, for:
PT['features'][0]
{'id': 'osm-w96717521',
'type': 'Feature',
'properties': {'height': 24,
'heightSrc': 'manual',
'levels': 8,
'date': 201804},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.151539, 38.725054],
[-9.15148, 38.724906],
[-9.151281, 38.724918],
[-9.151254, 38.724867],
[-9.151142, 38.724699],
[-9.150984, 38.724783],
[-9.151081, 38.724918],
[-9.151152, 38.725076],
[-9.151539, 38.725054]]]}}
and for:
PT['features'][100000]
{'id': 'osm-w556092901',
'type': 'Feature',
'properties': {'date': 201801, 'orient': 95, 'height': 3, 'heightSrc': 'ai'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-9.402381, 38.742663],
[-9.402342, 38.74261],
[-9.402215, 38.742667],
[-9.402281, 38.742706],
[-9.402381, 38.742663]]]}}
it also has the field 'orient'.
When I convert the features dicts into columns of a DataFrame, some columns work fine:
df["coordinates"] = PT["features"].apply(lambda row: row["geometry"]["coordinates"])
df
but for keys that do not appear in every feature, such as 'levels' or 'orient', this fails:
df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In [46], line 1
----> 1 df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"])
(...)
KeyError: 'levels'
How can I get all the columns contained in the features, even if the value should be null for some rows?
You can use a conditional expression that returns NaN when the key does not exist.
import numpy as np

df["floors"] = PT["features"].apply(lambda row: row["properties"]["levels"] if 'levels' in row["properties"] else np.nan)
df["coordinates"] = PT["features"].apply(lambda row: row["geometry"]["coordinates"] if 'coordinates' in row["geometry"] else np.nan)
I have a json file built like this:
{"type":"FeatureCollection","features":[
{"type":"Feature","id":"010020000A0225","geometry":{"type":"Polygon","coordinates":[[[5.430767,46.0214267],[5.4310805,46.0220116],[5.4311205,46.0220864],[5.4312362,46.0223019],[5.4308994,46.0224141],[5.43087,46.0224242],[5.430774,46.0222401],[5.4304506,46.0223202],[5.4302885,46.021982],[5.4300391,46.0216054],[5.4299637,46.0216342],[5.4300862,46.0218401],[5.4299565,46.021902],[5.4298847,46.0218195],[5.4298545,46.0217829],[5.4297689,46.0216672],[5.4297523,46.0216506],[5.4297379,46.0216389],[5.4296432,46.0215854],[5.429517,46.0214509],[5.4294188,46.0213458],[5.4293757,46.0213128],[5.4291918,46.0211768],[5.4291488,46.0211448],[5.4291083,46.0211214],[5.429024,46.0210828],[5.4292965,46.0208202],[5.4294241,46.0208894],[5.4295183,46.0209623],[5.4295455,46.0209865],[5.429613,46.0210554],[5.4296428,46.0210813],[5.4298751,46.0212862],[5.429988,46.0213782],[5.430014,46.0213973],[5.4300746,46.0214318],[5.430124,46.0214542],[5.4302569,46.0215069],[5.4303111,46.0215192],[5.4303632,46.0215166],[5.4306127,46.0214642],[5.430767,46.0214267]]]},"properties":{"id":"010020000A0225","commune":"01002","prefixe":"000","section":"A","numero":"225","contenance":9440,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
{"type":"Feature","id":"010020000A0346","geometry":{"type":"Polygon","coordinates":[[[5.4241952,46.0255535],[5.4233594,46.0262031],[5.4232624,46.0262774],[5.4226259,46.0267733],[5.4227608,46.0268718],[5.4227712,46.0268789],[5.4226123,46.0269855],[5.422565,46.0270182],[5.4223546,46.027145],[5.4222957,46.0271794],[5.4221794,46.0272376],[5.4221383,46.0272585],[5.4221028,46.027152],[5.4220695,46.0270523],[5.4220378,46.026962],[5.4220467,46.0269265],[5.4220524,46.0268709],[5.4220563,46.0268474],[5.4222945,46.0268985],[5.4224161,46.0267746],[5.4224581,46.0267904],[5.4226286,46.02666],[5.4226811,46.02662],[5.4227313,46.0265803],[5.4227813,46.0265406],[5.4228535,46.0264868],[5.4229063,46.0264482],[5.4229741,46.0264001],[5.4234903,46.0260331],[5.4235492,46.0259893],[5.4235787,46.0259663],[5.423645,46.0259126],[5.4237552,46.0258198],[5.4237839,46.0257951],[5.4238321,46.0257547],[5.4239258,46.0256723],[5.4239632,46.0256394],[5.4241164,46.0255075],[5.4241952,46.0255535]]]},"properties":{"id":"010020000A0346","commune":"01002","prefixe":"000","section":"A","numero":"346","contenance":2800,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
I would like to get the properties and geometry of each feature, but I think I am looping over my JSON file incorrectly. Here is my code:
data = pd.read_json(json_file_path)

for key, v in data.items():
    print(f"{key['features']['geometry']} : {v}",
          f"{key['features']['properties']} : {v}")
The values you are interested in are located in a list that is itself a value of your main dictionary.
If you want to be able to process these values with pandas, it would be better to build your dataframe directly from them:
import json
import pandas as pd
data = json.loads("""{"type":"FeatureCollection","features":[
{"type":"Feature","id":"010020000A0225","geometry":{"type":"Polygon","coordinates":[[[5.430767,46.0214267],[5.4310805,46.0220116],[5.4311205,46.0220864],[5.4312362,46.0223019],[5.4308994,46.0224141],[5.43087,46.0224242],[5.430774,46.0222401],[5.4304506,46.0223202],[5.4302885,46.021982],[5.4300391,46.0216054],[5.4299637,46.0216342],[5.4300862,46.0218401],[5.4299565,46.021902],[5.4298847,46.0218195],[5.4298545,46.0217829],[5.4297689,46.0216672],[5.4297523,46.0216506],[5.4297379,46.0216389],[5.4296432,46.0215854],[5.429517,46.0214509],[5.4294188,46.0213458],[5.4293757,46.0213128],[5.4291918,46.0211768],[5.4291488,46.0211448],[5.4291083,46.0211214],[5.429024,46.0210828],[5.4292965,46.0208202],[5.4294241,46.0208894],[5.4295183,46.0209623],[5.4295455,46.0209865],[5.429613,46.0210554],[5.4296428,46.0210813],[5.4298751,46.0212862],[5.429988,46.0213782],[5.430014,46.0213973],[5.4300746,46.0214318],[5.430124,46.0214542],[5.4302569,46.0215069],[5.4303111,46.0215192],[5.4303632,46.0215166],[5.4306127,46.0214642],[5.430767,46.0214267]]]},"properties":{"id":"010020000A0225","commune":"01002","prefixe":"000","section":"A","numero":"225","contenance":9440,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}},
{"type":"Feature","id":"010020000A0346","geometry":{"type":"Polygon","coordinates":[[[5.4241952,46.0255535],[5.4233594,46.0262031],[5.4232624,46.0262774],[5.4226259,46.0267733],[5.4227608,46.0268718],[5.4227712,46.0268789],[5.4226123,46.0269855],[5.422565,46.0270182],[5.4223546,46.027145],[5.4222957,46.0271794],[5.4221794,46.0272376],[5.4221383,46.0272585],[5.4221028,46.027152],[5.4220695,46.0270523],[5.4220378,46.026962],[5.4220467,46.0269265],[5.4220524,46.0268709],[5.4220563,46.0268474],[5.4222945,46.0268985],[5.4224161,46.0267746],[5.4224581,46.0267904],[5.4226286,46.02666],[5.4226811,46.02662],[5.4227313,46.0265803],[5.4227813,46.0265406],[5.4228535,46.0264868],[5.4229063,46.0264482],[5.4229741,46.0264001],[5.4234903,46.0260331],[5.4235492,46.0259893],[5.4235787,46.0259663],[5.423645,46.0259126],[5.4237552,46.0258198],[5.4237839,46.0257951],[5.4238321,46.0257547],[5.4239258,46.0256723],[5.4239632,46.0256394],[5.4241164,46.0255075],[5.4241952,46.0255535]]]},"properties":{"id":"010020000A0346","commune":"01002","prefixe":"000","section":"A","numero":"346","contenance":2800,"arpente":false,"created":"2005-06-03","updated":"2018-09-25"}}
]
}
""")
df = pd.DataFrame(data['features'])
print(df)
It'll give you the following DataFrame:
type id geometry properties
0 Feature 010020000A0225 {'type': 'Polygon', 'coordinates': [[[5.430767... {'id': '010020000A0225', 'commune': '01002', '...
1 Feature 010020000A0346 {'type': 'Polygon', 'coordinates': [[[5.424195... {'id': '010020000A0346', 'commune': '01002', '...
From there you can easily access the geometry and properties columns.
Furthermore, if you want the geometry fields and the other properties each in their own column, you can use json_normalize:
df = pd.json_normalize(data['features'])
print(df)
Output:
type id geometry.type geometry.coordinates ... properties.contenance properties.arpente properties.created properties.updated
0 Feature 010020000A0225 Polygon [[[5.430767, 46.0214267], [5.4310805, 46.02201... ... 9440 False 2005-06-03 2018-09-25
1 Feature 010020000A0346 Polygon [[[5.4241952, 46.0255535], [5.4233594, 46.0262... ... 2800 False 2005-06-03 2018-09-25
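In practice you would load the file from disk rather than pasting the JSON string; a minimal sketch, assuming json_file_path from the question points at the full file:
import json
import pandas as pd

# Read the whole GeoJSON document, then normalize its 'features' list.
with open(json_file_path) as f:
    data = json.load(f)

df = pd.json_normalize(data["features"])
print(df.head())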
I have a json file that gives the polygons of the neighborhoods of Chicago. Here is a small sample of the form.
{'type': 'Feature',
'properties': {'PRI_NEIGH': 'Printers Row',
'SEC_NEIGH': 'PRINTERS ROW',
'SHAPE_AREA': 2162137.97139,
'SHAPE_LEN': 6864.247156},
'geometry': {'type': 'Polygon',
'coordinates': [[[-87.62760697485339, 41.87437097785366],
[-87.6275952566332, 41.873861712441126],
[-87.62756611032259, 41.873091933433905],
[-87.62755513014902, 41.872801941012725],
[-87.62754038267386, 41.87230261598636],
[-87.62752573582432, 41.8718067089444],
[-87.62751740010017, 41.87152447340544],
[-87.62749380061304, 41.87053328991345],
[-87.62748640976544, 41.87022285721281],
[-87.62747968351987, 41.86986997314866],
[-87.62746758964467, 41.86923545315858],
[-87.62746178584428, 41.868930955522266]
I want to create a dataframe where each 'SEC_NEIGH' is linked to its coordinates, such that
df['SEC_NEIGH'] = 'coordinates'
I have tried using a for loop to loop through the dictionaries, but when I do, the dataframe comes out showing only an '_'.
df = {}
for item in data:
    if 'features' in item:
        if 'properties' in item:
            nn = item.get("properties").get("PRI_NEIGH")
        if 'geometry' in item:
            coords = item.get('geometry').get('coordinates')
            df[nn] = coords
df_n = pd.DataFrame(df)
I was expecting each column to be a separate neighborhood with a single value, that value being the list of coordinates. Instead, my dataframe comes out as a single underscore ('_'). Is there something wrong with my for loop?
Try this:
import pandas as pd
data=[
{'type': 'Feature',
'properties': {'PRI_NEIGH': 'Printers Row',
'SEC_NEIGH': 'PRINTERS ROW',
'SHAPE_AREA': 2162137.97139,
'SHAPE_LEN': 6864.247156},
'geometry': {'type': 'Polygon',
'coordinates': [[-87.62760697485339, 41.87437097785366],
[-87.6275952566332, 41.873861712441126],
[-87.62756611032259, 41.873091933433905],
[-87.62755513014902, 41.872801941012725],
[-87.62754038267386, 41.87230261598636],
[-87.62752573582432, 41.8718067089444],
[-87.62751740010017, 41.87152447340544],
[-87.62749380061304, 41.87053328991345],
[-87.62748640976544, 41.87022285721281],
[-87.62747968351987, 41.86986997314866],
[-87.62746758964467, 41.86923545315858],
[-87.62746178584428, 41.868930955522266]]
}}
]
df = {}
for item in data:
    if item["type"] == 'Feature':
        if 'properties' in item.keys():
            nn = item.get("properties").get("PRI_NEIGH")
        if 'geometry' in item:
            coords = item.get('geometry').get('coordinates')
            df[nn] = coords
df_n = pd.DataFrame(df)
print(df_n)
Output:
Printers Row
0 [-87.62760697485339, 41.87437097785366]
1 [-87.6275952566332, 41.873861712441126]
2 [-87.62756611032259, 41.873091933433905]
3 [-87.62755513014902, 41.872801941012725]
4 [-87.62754038267386, 41.87230261598636]
5 [-87.62752573582432, 41.8718067089444]
6 [-87.62751740010017, 41.87152447340544]
7 [-87.62749380061304, 41.87053328991345]
8 [-87.62748640976544, 41.87022285721281]
9 [-87.62747968351987, 41.86986997314866]
10 [-87.62746758964467, 41.86923545315858]
11 [-87.62746178584428, 41.868930955522266]
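If you instead want one row per neighborhood, with the whole coordinate list kept as a single value next to 'SEC_NEIGH' as described in the question, here is a minimal sketch over the same list of feature dicts (the records name is only illustrative):
import pandas as pd

# One record per feature: the neighborhood name and its raw coordinate list.
records = [
    {
        "SEC_NEIGH": item["properties"]["SEC_NEIGH"],
        "coordinates": item["geometry"]["coordinates"],
    }
    for item in data
    if item.get("type") == "Feature"
]

# Each row holds one neighborhood and its full list of coordinates in a single cell.
df_n = pd.DataFrame(records)
print(df_n)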