I am pulling information from a database, and the location information comes back as GeoJSON. What I want to do is use np.where to apply a transformation to each row value of a pandas DataFrame column. Can this be done instead of using apply?
The gist is this: if a value in location is null, it should stay null; otherwise, convert the value to a Shapely geometry object. Here is the np.where I am using:
import pandas as pd
import numpy as np
from shapely.geometry import shape

class Geometry():
    def __init__(self, data):
        self._j = shape(data)

dummy_data = [
    {'_key': '42741', '_id': 'points/42741', '_rev': '_a6hqTXS-BI',
     'location': {'coordinates': [-73.9942131, 40.7152034], 'type': 'Point'},
     'name': 'Lao Di Fang Noodle House'},
    {'_key': '51156', '_id': 'points/51156', '_rev': '_a6hqTb6-_8',
     'location': {'coordinates': [-73.9939157, 40.7154794], 'type': 'Point'},
     'name': '68 Deli Canal Store'},
    {'_key': '42994', '_id': 'points/42994', '_rev': '_a6hqTXe--g',
     'location': {'coordinates': [-73.9952133, 40.716076], 'type': 'Point'},
     'name': "Popeye'S Chicken & Biscuits"},
    {'_key': '45769', '_id': 'points/45769', '_rev': '_a6hqTZC-_W',
     'location': {'coordinates': [-73.9937748, 40.715124], 'type': 'Point'},
     'name': '27 Sheng Wang Noodle Shop'},
]
df = pd.DataFrame(dummy_data)
df.location = np.where(df.location.isnull(), np.nan, Geometry(df.location.values))
I know this does not work, but I want to take advantage of the speed of numpy for these transformation processes.
Any advice/suggestions are welcome!
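One point worth noting: np.where evaluates both branches eagerly over the whole array, so the conversion would run on the nulls too. A minimal sketch of a mask-based alternative, assuming location holds GeoJSON-like dicts as in dummy_data above:

import pandas as pd
from shapely.geometry import shape

# Convert only the non-null rows; nulls are left untouched.
mask = df.location.notnull()
df.loc[mask, 'location'] = df.loc[mask, 'location'].map(shape)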
I want to display a shape over a map of Canada.
The idea is two shapes in different years.
But the time slider at the end says:
"Time Not Available"
I searched the community but haven't found a similar problem.
Here you can find my file, and here is my code:
import folium
from folium.plugins import TimestampedGeoJson
import json
import pandas as pd

with open('outputfile.json') as f:
    poly = json.load(f)

features = [
    {
        'type': 'Feature',
        'geometry': {
            'type': 'MultiPolygon',
            'coordinates': pol['coordinates'],
        },
        'properties': {
            'ABBREVNAME': pol['ABBREVNAME'],
            'time': pol['date'],
        }
    } for pol in poly
]

mapa = folium.Map(
    location=[56.130, -106.35],
    tiles='openstreetmap',
    zoom_start=3
)
TimestampedGeoJson({'type': 'FeatureCollection', 'features': features}).add_to(mapa)
mapa
Thanks!!!
I had the same "Time Not Available" problem and solved it by following the example in the documentation and another post.
Basically, some key points from the docs:
1. The property is not 'time' but 'times', and it must be the same length as the list of coordinates.
2. Watch out for the time format: it only accepts ISO 8601 strings or epoch milliseconds.
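For instance, both of these represent the same instant (a quick illustration; the millisecond value matches the one used in the example below):

import datetime

iso_time = '2021-10-01T00:00:00'  # ISO 8601 string
# The same instant expressed as epoch milliseconds (UTC)
ms_epoch = datetime.datetime(2021, 10, 1, tzinfo=datetime.timezone.utc).timestamp() * 1000
print(ms_epoch)  # 1633046400000.0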
Here is a code example from a store-location map I was working on. I haven't tried Polygon yet, but I hope it helps:
m = folium.Map([-23.579782, -46.687754], zoom_start=6, tiles="cartodbpositron")
TimestampedGeoJson({
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'geometry': {
                'type': 'LineString',
                'coordinates': [[-46.687754, -23.579782]],
            },
            'properties': {
                'icon': 'marker',
                'iconstyle': {
                    'iconSize': [20, 20],
                    'iconUrl': 'https://img.icons8.com/ios-filled/50/000000/online-store.png'
                },
                'id': 'house',
                'popup': 1,
                'times': [1633046400000.0]
            }
        }, {
            'type': 'Feature',
            'geometry': {
                'type': 'LineString',
                'coordinates': [[-46.887754, -23.579782]],
            },
            'properties': {
                'icon': 'marker',
                'iconstyle': {
                    'iconSize': [20, 20],
                    'iconUrl': 'https://img.icons8.com/ios-filled/50/000000/online-store.png'
                },
                'id': 'house',
                'popup': 1,
                'times': [1635046400000.0]
            }
        }
    ]
}).add_to(m)
folium_static(m)  # folium_static comes from the streamlit_folium package
m.save('test.html')
I am using Geocodio to derive coordinates for a list of 2272 addresses from my DataFrame. When I try to flatten the results using json_normalize, I get coordinates, but my DataFrame has 4800+ rows instead of the correct 2272, one per address.
import json
from pandas.io.json import json_normalize
addys = json_normalize(locations, record_path=['results'])
addys = addys[['location']]
The resulting JSON output is as follows (I only want the lat/lng coordinates for each address, under each entry's 'results' -> 'location' section):
[{'input': {'address_components': {'number': '3704',
     'predirectional': 'N',
     'street': 'Western',
     ...
     'zip': '73118',
     'country': 'US'},
    'formatted_address': '3704 N Western, Oklahoma City, OK 73118'},
  'results': [{'address_components': {'number': '3704',
       'predirectional': 'N',
       'street': 'Western',
       ...
       'zip': '73118',
       'country': 'US'},
      'formatted_address': '3704 N Western Ave, Oklahoma City, OK 73118',
      'location': {'lat': 35.507996, 'lng': -97.52952},
      ...
      'source': 'Oklahoma'},
     {'address_components': {'number': '3704',
       'predirectional': 'N',
       'street': 'Western',
       ...
       'country': 'US'},
      'formatted_address': '3704 N Western Ave, Oklahoma City, OK 73118',
      'location': {'lat': 35.508013, 'lng': -97.529453},
      ...
      'source': 'Acog Counties'}]},
 {'input': {'address_components': {'number': '1503',
     'street': 'Winding Ridge',
The idea is to walk down your dictionary, like this:
import json
data = json.loads('{"a":{"b":{"location": {"lat": 35.507996, "lng": -97.52952}}}}')
data = data["a"]["b"]["location"]
print(data)
You can also read your nested JSON as a pandas DataFrame. Call your data d (that's the JSON you gave in your question) and read it with:
import pandas as pd

df = pd.json_normalize(d)
df.columns = df.columns.map(lambda x: x.split(".")[-1])  # keep only the last part of the dotted names
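Note that record_path=['results'] emits one row per geocoding result, and Geocodio can return several candidate matches per address (the sample above has two), which is why 2272 inputs became 4800+ rows. A hedged sketch that keeps just the first result per address, assuming the response list is named locations as in the question:

import pandas as pd

# One row per input address: take the first result's location, or None if the list is empty.
coords = pd.DataFrame(
    [entry['results'][0]['location'] if entry['results'] else {'lat': None, 'lng': None}
     for entry in locations]
)
print(coords.shape)  # (2272, 2) -> columns 'lat' and 'lng'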
I have the following as part of a function that extracts some info from a JSON response (you can find a portion of it at the bottom), and it is working fine. But...
venues_list = []
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, search_categoryId, radius, LIMIT)
results = requests.get(url).json()["response"]['venues']
venues_list.append([(
    v['name'],
    v['location']['lat'],
    v['location']['lng'],
    v['categories'][0]['name'],
    v['id']) for v in results])
To the mix of fields I'm retrieving from the JSON, I want to add:
v['location']['postalCode']
But it is not working. I get KeyError: 'postalCode'.
If I do:
v['location']['distance']
v['location']['country']
v['location']['formattedAddress']
It works. I get no error.
These don't work either:
v['location']['cc']
v['location']['city']
I get the same kind of error: KeyError: 'cc' and KeyError: 'city'.
Is there something I'm missing? Can you help me understand why it behaves this way?
[{'id': '4ad4c061f964a52099f720e3',
  'name': 'Live Organic Food Bar',
  'location': {'address': '264 Dupont Street',
   'lat': 43.67505287052667,
   'lng': -79.40671518307245,
   'labeledLatLngs': [{'label': 'display',
     'lat': 43.67505287052667,
     'lng': -79.40671518307245}],
   'distance': 273,
   'postalCode': 'M5R 1V7',
   'cc': 'CA',
   'city': 'Toronto',
   'state': 'ON',
   'country': 'Canada',
   'formattedAddress': ['264 Dupont Street', 'Toronto ON M5R 1V7', 'Canada']},
  'categories': [{'id': '4bf58dd8d48988d1d3941735',
    'name': 'Vegetarian / Vegan Restaurant',
    'pluralName': 'Vegetarian / Vegan Restaurants',
    'shortName': 'Vegetarian / Vegan',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vegetarian_',
     'suffix': '.png'},
    'primary': True}],
  'referralId': 'v-1591407049',
  'hasPerk': False},
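A likely explanation (an assumption, since the sample venue above does contain all of these keys): the Foursquare API omits location fields it has no data for, so some other venue in results lacks postalCode, cc, or city, and plain ['postalCode'] indexing raises KeyError on the first such venue. A defensive sketch using dict.get, which returns None for missing keys:

venues_list.append([(
    v['name'],
    v['location']['lat'],
    v['location']['lng'],
    v['location'].get('postalCode'),  # None when the venue has no postal code
    v['location'].get('city'),        # likewise for city
    v['categories'][0]['name'],
    v['id']) for v in results])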
I have a CSV with 500+ rows where one column, "_source", is stored as JSON, and I want to extract that into a pandas DataFrame with each key as its own column. The data is a roughly 1 MB JSON file of online social media data (Facebook, Twitter, web-crawled, etc.) with approximately 528 rows of posts/tweets/text, each containing many dictionaries nested inside dictionaries. I need to turn every key-value pair, including those in the nested dictionaries, into its own DataFrame column. I am attaching a few steps from my Jupyter notebook below to give a more complete picture.
Thank you so much this will be a huge help!!!
I have tried changing it to a DataFrame by doing this:
source = pd.DataFrame.from_dict(source, orient='columns')
It returns something like this. I thought it might unpack the dictionary, but it did not:
source.head()
#                                              _source
# 0  {'sub_organization_id': 'default', 'uid': 'aba...
# 1  {'sub_organization_id': 'default', 'uid': 'ab0...
# 2  {'sub_organization_id': 'default', 'uid': 'ac0...
Below is the shape:
source.shape  # (528, 1)
Below is what an actual "_source" row looks like stretched out. There are many dictionaries and key:value pairs, and each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
 'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
 'project_veid': 'default',
 'campaign_id': 'default',
 'organization_id': 'default',
 'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
      'source': 'Explicit',
      'version': '1.1',
      'type': 'crawl'},
     'results': [{'rule_type': 'hashtag',
       'rule_tag': 'Far',
       'description': None,
       'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
       'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
       'value': '#Far',
       'organization_id': None,
       'sub_organization_id': None,
       'appid': 'ray',
       'project_id': 'CDE2F42-5B87-C594-C900E578C',
       'rule_id': '1838',
       'node_id': None,
       'metadata': {'campaign_title': 'AF',
        'project_title': 'AF '}}]}],
  'render': [{'attribs': {'website': 'github.com/res',
      'version': '1.0',
      'type': 'Page Render'},
     'results': [{'render_status': 'success',
       'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
       'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
       'url': 'https://discooprdapp.com/',
       'load_time': 32}]}]},
 'norm_attribs': {'website': 'github.com/res',
  'version': '1.1',
  'type': 'crawl'},
 'project_id': 'default',
 'system_timestamp': '2019-02-22T19:04:53.569623',
 'doc': {'appid': 'subtter',
  'links': [],
  'response_url': 'https://discooprdapp.com',
  'url': 'https://discooprdapp.com/',
  'status_code': 200,
  'status_msg': 'OK',
  'encoding': 'utf-8',
  'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
  'timestamp': '2019-02-22T19:04:53.963',
  'crawlid': '7fd95-785-4dd259-fcc-8752f'},
 'type': 'crawl',
 'norm': {'body': '\n',
  'domain': 'discordapp.com',
  'author': 'crawl',
  'url': 'https://discooprdapp.com',
  'timestamp': '2019-02-22T19:04:53.961283+00:00',
  'id': '7fc5-685-4dd9-cc-8762f'}}
Before you post, please make sure the code works for the data attached. Thanks!
I tried the code below, but it did not work; there was a syntax error that I could not figure out:
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))

    pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
                                          ^
SyntaxError: invalid syntax
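For the record, the dot before the bracket is what Python rejects: bracket indexing takes a quoted column name with no preceding dot. A hedged correction, assuming the column is literally named _source and holds JSON strings:

import json
import pandas as pd

# Column access is source_data['_source'] (or source_data._source), never source_data.[_source].
flat = pd.json_normalize(source_data['_source'].apply(json.loads))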
Whoever can help me with this will be a saint!
I had to do something like that a while back. Basically, I used a function that completely flattens the JSON to identify the keys that will be turned into columns, then iterated through the flattened JSON to reconstruct each row and append it to a "results" DataFrame. With the data you provided, it creates a 52-column row, and looking through it, every key gets its own column. Anything nested, for example 'meta': {'rule_matcher': [{'atribs': {'website': ...}}]}, should then have a column name like meta.rule_matcher.atribs.website, where the '.' denotes the nested keys.
data_source = {'sub_organization_id': 'default',
 'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
 'project_veid': 'default',
 'campaign_id': 'default',
 'organization_id': 'default',
 'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
      'source': 'Explicit',
      'version': '1.1',
      'type': 'crawl'},
     'results': [{'rule_type': 'hashtag',
       'rule_tag': 'Far',
       'description': None,
       'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
       'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
       'value': '#Far',
       'organization_id': None,
       'sub_organization_id': None,
       'appid': 'ray',
       'project_id': 'CDE2F42-5B87-C594-C900E578C',
       'rule_id': '1838',
       'node_id': None,
       'metadata': {'campaign_title': 'AF',
        'project_title': 'AF '}}]}],
  'render': [{'attribs': {'website': 'github.com/res',
      'version': '1.0',
      'type': 'Page Render'},
     'results': [{'render_status': 'success',
       'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
       'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
       'url': 'https://discooprdapp.com/',
       'load_time': 32}]}]},
 'norm_attribs': {'website': 'github.com/res',
  'version': '1.1',
  'type': 'crawl'},
 'project_id': 'default',
 'system_timestamp': '2019-02-22T19:04:53.569623',
 'doc': {'appid': 'subtter',
  'links': [],
  'response_url': 'https://discooprdapp.com',
  'url': 'https://discooprdapp.com/',
  'status_code': 200,
  'status_msg': 'OK',
  'encoding': 'utf-8',
  'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
  'timestamp': '2019-02-22T19:04:53.963',
  'crawlid': '7fd95-785-4dd259-fcc-8752f'},
 'type': 'crawl',
 'norm': {'body': '\n',
  'domain': 'discordapp.com',
  'author': 'crawl',
  'url': 'https://discooprdapp.com',
  'timestamp': '2019-02-22T19:04:53.961283+00:00',
  'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
    """Flatten nested dicts/lists into a single flat dict with compound keys."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')  # list items get a numeric index in the key
                i += 1
        else:
            out[name[:-1]] = x  # leaf value: drop the trailing '_' from the key

    flatten(y)
    return out

flat = flatten_json(data_source)
import pandas as pd
import re

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
    try:
        # Keys that came from a list carry a numeric index, e.g. 'meta_rule_matcher_0_...'.
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except IndexError:
        # No index means a top-level scalar; handle those after the loop.
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = re.sub(r'\_\d+\_', '.', column)  # deeper list indices become '.' separators
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]
Output:
print(results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
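As an aside (a sketch, not part of the answer above): pandas' own json_normalize flattens nested dictionaries in one call, producing dot-joined column names; lists such as rule_matcher are left as object values and still need record_path or explode to expand:

import pandas as pd

# One-row DataFrame; nested dicts become columns like 'doc.attrs.uid'.
flat_df = pd.json_normalize(data_source, sep='.')
print(flat_df.shape)
print([c for c in flat_df.columns if c.startswith('doc.')])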
I have a dictionary that is really a GeoJSON FeatureCollection:
points = {
    'crs': {'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}, 'type': 'name'},
    'features': [
        {'geometry': {
            'coordinates': [[[-3.693162104185235, 40.40734504903418],
                             [-3.69320229317164, 40.40719570724241],
                             [-3.693227952841606, 40.40698546120488],
                             [-3.693677594635894, 40.40712700492216]]],
            'type': 'Polygon'},
         'properties': {
             'name': 'place1',
             'temp': 28},
         'type': 'Feature'
         },
        {'geometry': {
            'coordinates': [[[-3.703886381691941, 40.405197271972035],
                             [-3.702972834622821, 40.40506272989243],
                             [-3.702552994966045, 40.40506798079752],
                             [-3.700985024825222, 40.405500820623814]]],
            'type': 'Polygon'},
         'properties': {
             'name': 'place2',
             'temp': 27},
         'type': 'Feature'
         },
        {'geometry': {
            'coordinates': [[[-3.703886381691941, 40.405197271972035],
                             [-3.702972834622821, 40.40506272989243],
                             [-3.702552994966045, 40.40506798079752],
                             [-3.700985024825222, 40.405500820623814]]],
            'type': 'Polygon'},
         'properties': {
             'name': 'place',
             'temp': 25},
         'type': 'Feature'
         }
    ],
    'type': 'FeatureCollection'
}
I would like to filter it to keep only the places above a given temperature, for example more than 25 degrees Celsius.
I have managed to do it this way:
dict(crs=points["crs"],
     features=[i for i in points["features"] if i["properties"]["temp"] > 25],
     type=points["type"])
But I wondered if there was any way to do it more directly, with dictionary comprehension.
Thank you very much.
I'm very late. A dict comprehension won't help you, since you have only three keys. But if you meet the following conditions: 1. you don't need a copy of features (e.g. your dict is read-only); 2. you don't need index access to features; then you may use a generator expression instead of a list comprehension:
dict(crs=points["crs"],
     features=(i for i in points["features"] if i["properties"]["temp"] > 25),
     type=points["type"])
The generator is created in constant time, while the list comprehension is created in O(n). Furthermore, if you create a lot of those dicts, you have only one copy of the features in memory.
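One caveat worth illustrating (a small usage sketch of the trade-off): a generator is single-pass, so the first full iteration exhausts it:

filtered = dict(crs=points["crs"],
                features=(i for i in points["features"] if i["properties"]["temp"] > 25),
                type=points["type"])

names = [f["properties"]["name"] for f in filtered["features"]]
print(names)                       # ['place1', 'place2']
print(list(filtered["features"]))  # [] -- the generator is already exhausted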