Extract Location Coordinates from JSON results - python

I am using geocodio to derive coordinates from a list of 2272 addresses from my dataframe. When I try to flatten the results using json_normalize, I get coordinates but my dataframe is 4800+ rows instead of the correct 2272 for each address.
import json
from pandas.io.json import json_normalize
addys = json_normalize(locations, record_path=['results'])
addys = addys[['location']]
The resulting JSON output is as follows: (I only want the lat/lon coordinates for each address under the 'results' -- 'location' section for each entry)
[{'input': {'address_components': {'number': '3704',
'predirectional': 'N',
'street': 'Western',
...
'zip': '73118',
'country': 'US'},
'formatted_address': '3704 N Western, Oklahoma City, OK 73118'},
'results': [{'address_components': {'number': '3704',
'predirectional': 'N',
'street': 'Western',
...
'zip': '73118',
'country': 'US'},
'formatted_address': '3704 N Western Ave, Oklahoma City, OK 73118',
'location': {'lat': 35.507996, 'lng': -97.52952},
...
'source': 'Oklahoma'},
{'address_components': {'number': '3704',
'predirectional': 'N',
'street': 'Western',
...
'country': 'US'},
'formatted_address': '3704 N Western Ave, Oklahoma City, OK 73118',
'location': {'lat': 35.508013, 'lng': -97.529453},
...
'source': 'Acog Counties'}]},
{'input': {'address_components': {'number': '1503',
'street': 'Winding Ridge',

The idea is to go down your dictionary. Like this:
import json
data = json.loads('{"a":{"b":{"location": {"lat": 35.507996, "lng": -97.52952}}}}')
data = data["a"]["b"]["location"]
print(data)

You can also read your nested JSON into a pandas DataFrame. Call your data d (the JSON you gave in your question) and read it with
import pandas as pd
df = pd.json_normalize(d)
# keep only the last part of each dotted column name
df.columns = df.columns.map(lambda x: x.split(".")[-1])
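The extra rows probably come from record_path=['results'], which emits one row per candidate match; the sample in the question shows two candidates for the first address. A minimal sketch that keeps one row per address follows, assuming the first candidate is the one you want (the helper name first_locations and the trimmed sample data are made up for illustration):

```python
def first_locations(locations):
    """Return one {'lat', 'lng'} row per geocoded address.

    Assumes each entry looks like the sample in the question:
    {'input': {...}, 'results': [{'location': {'lat': ..., 'lng': ...}}, ...]}
    """
    rows = []
    for entry in locations:
        results = entry.get("results") or []
        if results:
            # Take only the first candidate for this address.
            loc = results[0].get("location", {})
            rows.append({"lat": loc.get("lat"), "lng": loc.get("lng")})
        else:
            # Keep a placeholder so the row count still matches the input.
            rows.append({"lat": None, "lng": None})
    return rows


# Trimmed, partly hypothetical sample in the shape shown in the question.
sample = [
    {"input": {"formatted_address": "3704 N Western, Oklahoma City, OK 73118"},
     "results": [
         {"location": {"lat": 35.507996, "lng": -97.52952}, "source": "Oklahoma"},
         {"location": {"lat": 35.508013, "lng": -97.529453}, "source": "Acog Counties"},
     ]},
    {"input": {"formatted_address": "unmatched address"}, "results": []},
]
rows = first_locations(sample)
print(rows)
```

Passing the returned list to pd.DataFrame(rows) should then give exactly one row per input address, in the original order.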

Related

For loop in a dictionary inside a dictionary for dataframe construction in herepy (PlacesAPI)

I am using Python and I am trying to access the result of function PlacesAPI where I can see supermarkets around me and create a dataframe with few parts of each dictionary inside the main dictionary using a for loop, however I am getting the same information for different rows.
Can you please help me to put each different parts of dictionary in a different row?
Here is my code and the result (now reproducible):
from herepy import PlacesApi
import pandas as pd
def dataframe():
    a = {'items': [{'title': 'Marcos Francisco dos Santos Padaria e Mercearia',
'id': 'here:pds:place:07675crc-dfd72cbf57bd45cc9277ed8530ffd61b',
'ontologyId': 'here:cm:ontology:supermarket',
'resultType': 'place',
'address': {'label': 'Marcos Francisco dos Santos Padaria e Mercearia, Rua Oswero Carmo Vilaça, 33, Petrópolis - RJ, 25635-101, Brazil',
'countryCode': 'BRA',
'countryName': 'Brazil',
'stateCode': 'RJ',
'state': 'Rio de Janeiro',
'city': 'Petrópolis',
'district': 'Petrópolis',
'street': 'Rua Oswero Carmo Vilaça',
'postalCode': '25635-101',
'houseNumber': '33'},
'position': {'lat': -22.5315, 'lng': -43.16904},
'access': [{'lat': -22.5314, 'lng': -43.16914}],
'distance': 134,
'categories': [{'id': '600-6300-0066', 'name': 'Grocery', 'primary': True},
{'id': '600-6300-0244', 'name': 'Bakery & Baked Goods Store'}],
'contacts': [{'phone': [{'value': '+552422312493'}]}]},
{'title': 'Mr. Frango',
'id': 'here:pds:place:076jx7ps-7c214f50052f0c23c9e5422ebde7d3cd',
'ontologyId': 'here:cm:ontology:supermarket',
'resultType': 'place',
'address': {'label': 'Mr. Frango, Rua Teresa, Petrópolis - RJ, 25635-530, Brazil',
'countryCode': 'BRA',
'countryName': 'Brazil',
'stateCode': 'RJ',
'state': 'Rio de Janeiro',
'city': 'Petrópolis',
'district': 'Petrópolis',
'street': 'Rua Teresa',
'postalCode': '25635-530'},
'position': {'lat': -22.52924, 'lng': -43.17222},
'access': [{'lat': -22.52925, 'lng': -43.1722}],
'distance': 545,
'categories': [{'id': '600-6300-0066', 'name': 'Grocery', 'primary': True},
{'id': '600-6000-0061', 'name': 'Convenience Store'}],
'references': [{'supplier': {'id': 'core'}, 'id': '1159487213'}],
'contacts': [{'phone': [{'value': '+552422201010'},
{'value': '+552422315720', 'categories': [{'id': '600-6000-0061'}]}]}]},
{'title': 'Mercadinho Flor de Petrópolis',
'id': 'here:pds:place:07675crc-6b03dfbac65a45c0bfc52ab9a3f04556',
'ontologyId': 'here:cm:ontology:supermarket',
'resultType': 'place',
'address': {'label': 'Mercadinho Flor de Petrópolis, Rua Teresa, 2060, Petrópolis - RJ, 25635-530, Brazil',
'countryCode': 'BRA',
'countryName': 'Brazil',
'stateCode': 'RJ',
'state': 'Rio de Janeiro',
'city': 'Petrópolis',
'district': 'Petrópolis',
'street': 'Rua Teresa',
'postalCode': '25635-530',
'houseNumber': '2060'},
'position': {'lat': -22.52895, 'lng': -43.17233},
'access': [{'lat': -22.52895, 'lng': -43.17219}],
'distance': 574,
'categories': [{'id': '600-6300-0066',
'name': 'Grocery',
'primary': True}]}]}
    value = []
    address = []
    latlong = []
    teste = pd.DataFrame(columns = ['nome','endereco','rua','numero',
                                    'cidade','estado','cep','lat','long','raio'])
    teste['nome'] = []
    teste['endereco'] = []
    teste['rua'] = []
    teste['numero'] =[]
    teste['cidade'] = []
    teste['estado'] = []
    teste['cep'] = []
    teste['lat'] = []
    teste['long'] = []
    teste['raio'] = []
    g = pd.DataFrame.from_dict(a.values())
    h =[]
    for i in range(3):
        v = g[i].values[0]
        h = v.items()
        for k, l in h:
            value.append(l)
        for c, d in value[4].items():
            address.append(d)
        for la, lo in value[5].items():
            latlong.append(lo)
        novo_concorrente = {'nome': value[0], 'endereco':address[0],
                            'rua':address[7], 'numero':address[9],
                            'cidade':address[5], 'estado':address[3],
                            'cep':address[8],'lat':latlong[0],
                            'long':latlong[1],'raio':value[7]}
        teste = teste.append(novo_concorrente, ignore_index=True)
    return teste
You should focus on your for loop. I suggest building one dictionary for each row you want in the final DataFrame and appending those dictionaries to a list.
For example:
rows = []
for item in a["items"]:
    row = {
        "Latitude": item["position"]["lat"],
        "Postal code": item["address"]["postalCode"],
    }
    rows.append(row)
result = pd.DataFrame(rows)
result
Hope it helps as a starting point.
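Extending that idea to the columns from the question, here is a rough sketch. The helper name place_to_row is made up, and .get() is used because not every item carries every key (the second place in the sample has no houseNumber). The column mapping follows what the question's positional indexing seems to pick out, so treat it as an assumption:

```python
def place_to_row(item):
    """Flatten one place dictionary into a single row, tolerating missing keys."""
    address = item.get("address", {})
    position = item.get("position", {})
    return {
        "nome": item.get("title"),
        "endereco": address.get("label"),
        "rua": address.get("street"),
        "numero": address.get("houseNumber"),  # None when the place has no number
        "cidade": address.get("city"),
        "estado": address.get("stateCode"),
        "cep": address.get("postalCode"),
        "lat": position.get("lat"),
        "long": position.get("lng"),
        "raio": item.get("distance"),
    }


# Two items trimmed from the response shown in the question.
a = {"items": [
    {"title": "Marcos Francisco dos Santos Padaria e Mercearia",
     "address": {"stateCode": "RJ", "city": "Petrópolis",
                 "street": "Rua Oswero Carmo Vilaça",
                 "postalCode": "25635-101", "houseNumber": "33"},
     "position": {"lat": -22.5315, "lng": -43.16904},
     "distance": 134},
    {"title": "Mr. Frango",
     "address": {"stateCode": "RJ", "city": "Petrópolis",
                 "street": "Rua Teresa", "postalCode": "25635-530"},
     "position": {"lat": -22.52924, "lng": -43.17222},
     "distance": 545},
]}
rows = [place_to_row(item) for item in a["items"]]
print(rows)
```

Each place now gets its own dictionary, so pd.DataFrame(rows) should produce one distinct row per place instead of repeating the first one.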

np.where to update row values

I am pulling information from a database and the location information comes back as GeoJSON. What I want to do is use np.where to apply a transformation on the pandas' dataframe to each row value. Can this be done instead of using apply?
Here is the np.where I am using:
The gist is this: If a value in location is null ensure it stays NULL, else convert the value to the Shapely Geometry object.
import pandas as pd
import numpy as np
from shapely.geometry import shape
class Geometry():
    def __init__(self, data):
        self._j = shape(data)
dummy_data = [{'_key': '42741', '_id': 'points/42741', '_rev': '_a6hqTXS-BI', 'location': {'coordinates': [-73.9942131, 40.7152034], 'type': 'Point'}, 'name': 'Lao Di Fang Noodle House'}, {'_key': '51156', '_id': 'points/51156', '_rev': '_a6hqTb6-_8', 'location': {'coordinates': [-73.9939157, 40.7154794], 'type': 'Point'}, 'name': '68 Deli Canal Store'}, {'_key': '42994', '_id': 'points/42994', '_rev': '_a6hqTXe--g', 'location': {'coordinates': [-73.9952133, 40.716076], 'type': 'Point'}, 'name': "Popeye'S Chicken & Biscuits"}, {'_key': '45769', '_id': 'points/45769', '_rev': '_a6hqTZC-_W', 'location': {'coordinates': [-73.9937748, 40.715124], 'type': 'Point'}, 'name': '27 Sheng Wang Noodle Shop'}]
df = pd.DataFrame(dummy_data)
df.location = np.where(df.location.isnull(), np.nan, Geometry(df.location.values))
I know this does not work, but I want to take advantage of the speed of numpy for these transformation processes.
Any advice/suggestions are welcome!
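One point worth keeping in mind: np.where only selects between two arrays that have already been computed, so it cannot skip the nulls or call a constructor once per element, and shapely's shape() takes a single GeoJSON-like mapping at a time. A minimal sketch of a per-element conversion that leaves nulls alone is below. The helper names are invented, and a stand-in converter replaces shapely so the example runs on its own:

```python
import math


def is_null(value):
    """True for None or a float NaN, the usual null markers in a pandas column."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def convert_non_null(values, convert):
    """Apply `convert` to each non-null value and leave nulls as None."""
    return [None if is_null(v) else convert(v) for v in values]


# Stand-in converter (an assumption for this sketch) so it runs without shapely;
# in the question's setting you would pass shapely.geometry.shape instead.
def to_xy(geojson):
    return tuple(geojson["coordinates"])


locations = [
    {"coordinates": [-73.9942131, 40.7152034], "type": "Point"},
    None,
    float("nan"),
]
converted = convert_non_null(locations, to_xy)
print(converted)
```

With the DataFrame from the question this would be used as df["location"] = convert_non_null(df["location"], shape). It is still a Python-level loop, so it is unlikely to be much faster than apply.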

Not able to get specific object from a json response using python

I have the following as part of a function to extract some info from a JSON response (a portion of it is at the bottom), and it is working fine. But...
venues_list=[]
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, search_categoryId, radius, LIMIT)
results = requests.get(url).json()["response"]['venues']
venues_list.append([(
v['name'],
v['location']['lat'],
v['location']['lng'],
v['categories'][0]['name'],
v['id']) for v in results])
To the mix of fields I'm retrieving from the JSON response, I want to add:
v['location']['postalCode']
But it is not working. I get a KeyError: 'postalCode'
If I do:
v['location']['distance']
v['location']['country']
v['location']['formattedAddress']
It works. I get no error.
These don't work either:
v['location']['cc']
v['location']['city']
I get the same KeyError: 'cc' , KeyError: 'city'
Is there something I'm missing? Can you help me understand why it behaves this way?
[{'id': '4ad4c061f964a52099f720e3',
'name': 'Live Organic Food Bar',
'location': {'address': '264 Dupont Street',
'lat': 43.67505287052667,
'lng': -79.40671518307245,
'labeledLatLngs': [{'label': 'display',
'lat': 43.67505287052667,
'lng': -79.40671518307245}],
'distance': 273,
'postalCode': 'M5R 1V7',
'cc': 'CA',
'city': 'Toronto',
'state': 'ON',
'country': 'Canada',
'formattedAddress': ['264 Dupont Street', 'Toronto ON M5R 1V7', 'Canada']},
'categories': [{'id': '4bf58dd8d48988d1d3941735',
'name': 'Vegetarian / Vegan Restaurant',
'pluralName': 'Vegetarian / Vegan Restaurants',
'shortName': 'Vegetarian / Vegan',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vegetarian_',
'suffix': '.png'},
'primary': True}],
'referralId': 'v-1591407049',
'hasPerk': False},
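A likely explanation, assuming some venues in the response simply omit optional fields such as postalCode, cc, or city: the list comprehension raises KeyError at the first venue that lacks the key, even though the venue shown above has it. A minimal sketch using dict.get() is below; the second venue in the sample is hypothetical and exists only to show the missing-key case:

```python
def venue_row(v):
    """Build one tuple per venue, using None for any missing optional field."""
    location = v.get("location", {})
    categories = v.get("categories") or [{}]
    return (
        v.get("name"),
        location.get("lat"),
        location.get("lng"),
        categories[0].get("name"),
        v.get("id"),
        location.get("postalCode"),  # None when the venue has no postal code
    )


results = [
    {"id": "4ad4c061f964a52099f720e3", "name": "Live Organic Food Bar",
     "location": {"lat": 43.67505287052667, "lng": -79.40671518307245,
                  "postalCode": "M5R 1V7"},
     "categories": [{"name": "Vegetarian / Vegan Restaurant"}]},
    # Hypothetical venue without postalCode or categories.
    {"id": "hypothetical-id", "name": "Hypothetical venue",
     "location": {"lat": 43.0, "lng": -79.0},
     "categories": []},
]
venues = [venue_row(v) for v in results]
print(venues)
```

To confirm the cause on real data, you could count how many venues satisfy 'postalCode' in v['location'].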

Find nearest location coordinates in Land using python

A geocoding API returns no location information for coordinates in the ocean or sea. For those records, I would like to find the nearest coordinates that have valid location information (that is, the closest land coordinates).
Below is the code for fetching location information by passing coordinates:
import requests
request_url = "https://api.mapbox.com/geocoding/v5/mapbox.places/{0}%2C{1}.json?access_token={2}&types=country&limit=1".format(lng,lat,key)
response = requests.get(request_url)
output = response.json()
I have no idea how to find the nearest location. I'm also new to Python.
Sample output:
{'type': 'FeatureCollection',
'query': [32.12, 54.21],
'features': [{'id': 'country.10008046970720960',
'type': 'Feature',
'place_type': ['country'],
'relevance': 1,
'properties': {'short_code': 'ru', 'wikidata': 'Q159'},
'text': 'Russia',
'place_name': 'Russia',
'bbox': [19.608673, 41.185353, 179.9, 81.961618],
'center': [37.61667, 55.75],
'geometry': {'type': 'Point', 'coordinates': [37.61667, 55.75]}}],
'attribution': 'NOTICE: © 2020 Mapbox and its suppliers. All rights reserved. Use of this data is subject to the Mapbox Terms of Service (https://www.mapbox.com/about/maps/). This response and the information it contains may not be retained. POI(s) provided by Foursquare.'}
Output when the coordinates are ocean:
{'type': 'FeatureCollection',
'query': [0, 0],
'features': [],
'attribution': 'NOTICE: © 2020 Mapbox and its suppliers. All rights reserved. Use of this data is subject to the Mapbox Terms of Service (https://www.mapbox.com/about/maps/). This response and the information it contains may not be retained. POI(s) provided by Foursquare.'}
Using Haversine Formula to find the nearest point (eg. city) based on latitude and longitude.
def dist_between_two_lat_lon(*args):
    from math import asin, cos, radians, sin, sqrt
    lat1, lat2, long1, long2 = map(radians, args)
    dist_lats = abs(lat2 - lat1)
    dist_longs = abs(long2 - long1)
    a = sin(dist_lats/2)**2 + cos(lat1) * cos(lat2) * sin(dist_longs/2)**2
    c = asin(sqrt(a)) * 2
    radius_earth = 6378 # the "Earth radius" R varies from 6356.752 km at the poles to 6378.137 km at the equator.
    return c * radius_earth
def find_closest_lat_lon(data, v):
    try:
        return min(data, key=lambda p: dist_between_two_lat_lon(v['lat'],p['lat'],v['lon'],p['lon']))
    except TypeError:
        print('Not a list or not a number.')
# city = {'lat_key': value, 'lon_key': value} # type:dict()
new_york = {'lat': 40.712776, 'lon': -74.005974}
washington = {'lat': 47.751076, 'lon': -120.740135}
san_francisco = {'lat': 37.774929, 'lon': -122.419418}
city_list = [new_york, washington, san_francisco]
city_to_find = {'lat': 29.760427, 'lon': -95.369804} # Houston
print(find_closest_lat_lon(city_list, city_to_find))
Which yields:
{'lat': 40.712776, 'lon': -74.005974} # Corresponds to New York
Let's suppose you got four JSON responses from Mapbox and saved them in a list:
json_answers = []
json_answers.append({'type': 'FeatureCollection',
'query': [32.12, 54.21],
'features': [{'id': 'country.10008046970720960',
'type': 'Feature',
'place_type': ['country'],
'relevance': 1,
'properties': {'short_code': 'ru', 'wikidata': 'Q159'},
'text': 'Russia',
'place_name': 'Russia',
'bbox': [19.608673, 41.185353, 179.9, 81.961618],
'center': [37.61667, 55.75],
'geometry': {'type': 'Point', 'coordinates': [37.61667, 55.75]}}],
'attribution': 'NOTICE: ...'})
# I changed only the 'coordinates' value for this example
json_answers.append({'type': 'FeatureCollection',
'query': [32.12, 54.21],
'features': [{'id': 'country.10008046970720960',
'type': 'Feature',
'place_type': ['country'],
'relevance': 1,
'properties': {'short_code': 'ru', 'wikidata': 'Q159'},
'text': 'Russia',
'place_name': 'Russia',
'bbox': [19.608673, 41.185353, 179.9, 81.961618],
'center': [37.61667, 55.75],
'geometry': {'type': 'Point', 'coordinates': [38.21667, 56.15]}}],
'attribution': 'NOTICE: ...'})
# I changed only the 'coordinates' value for this example
json_answers.append({'type': 'FeatureCollection',
'query': [32.12, 54.21],
'features': [{'id': 'country.10008046970720960',
'type': 'Feature',
'place_type': ['country'],
'relevance': 1,
'properties': {'short_code': 'ru', 'wikidata': 'Q159'},
'text': 'Russia',
'place_name': 'Russia',
'bbox': [19.608673, 41.185353, 179.9, 81.961618],
'center': [37.61667, 55.75],
'geometry': {'type': 'Point', 'coordinates': [33.21667, 51.15]}}],
'attribution': 'NOTICE: ...'})
# The last answer is "null"
json_answers.append({'type': 'FeatureCollection',
'query': [0, 0],
'features': [],
'attribution': 'NOTICE: ...'})
coord_list = []
for answer in json_answers:
    if answer['features']: # check if ['features'] is not empty
        # Mapbox follows GeoJSON, so coordinates are ordered [lon, lat]
        print(f"Coordinates in [lon, lat]: {answer['features'][0]['geometry']['coordinates']}")
        lon = answer['features'][0]['geometry']['coordinates'][0]
        lat = answer['features'][0]['geometry']['coordinates'][1]
        temp_dict = {'lat': lat, 'lon': lon}
        coord_list.append(temp_dict)
print(f"coord_list = {coord_list}")
point_to_find = {'lat': 55.05, 'lon': 37.41667} # a point south of Moscow
print(f"point_to_find = {point_to_find}")
print(f"find_closest_lat_lon = {find_closest_lat_lon(coord_list, point_to_find)}")
Which yields:
{'lat': 40.712776, 'lon': -74.005974}
Coordinates in [lon, lat]: [37.61667, 55.75]
Coordinates in [lon, lat]: [38.21667, 56.15]
Coordinates in [lon, lat]: [33.21667, 51.15]
coord_list = [{'lat': 55.75, 'lon': 37.61667}, {'lat': 56.15, 'lon': 38.21667}, {'lat': 51.15, 'lon': 33.21667}]
point_to_find = {'lat': 55.05, 'lon': 37.41667}
find_closest_lat_lon = {'lat': 55.75, 'lon': 37.61667}
Use the reverse_geocode library in Python to get the nearest city along with its country.
Example:
import reverse_geocode
coordinates = (-37.81, 144.96), (31.76, 35.21)
reverse_geocode.search(coordinates)
Result:
[{'city': 'Melbourne', 'code': 'AU', 'country': 'Australia'},
{'city': 'Jerusalem', 'code': 'IL', 'country': 'Israel'}]
Here is an unoptimized solution.
What's going on under the hood of the function:
1) Run a GeoPy reverse look-up on a point.
2) If the point is found, return its country name.
3) If the point is not found, search for the nearest point of land in the world_geometry variable.
4) Perform a reverse lookup on that closest point.
5) Return that point's country name (if it exists) or the locality name (if no country name).
import geopandas as gp
from geopy.geocoders import Nominatim
from shapely.ops import nearest_points
def country_lookup(query, geocoder, land_geometry):
    try:
        loc = geocoder.reverse((query.y, query.x))
        return loc.raw['address']['country']
    except (KeyError, AttributeError):
        # no result (loc is None) or no country key: fall back to the nearest land point
        _, p2 = nearest_points(query, land_geometry)
        loc = geocoder.reverse((p2.y, p2.x)).raw['address']
        if 'country' in loc.keys():
            return loc['country']
        else:
            return loc['locality']
# get world (or any land) geometry, instantiate geolocator service
# note: gp.datasets was deprecated in geopandas 0.14 and removed in 1.0;
# on newer versions, read a downloaded Natural Earth file with gp.read_file instead
world = gp.read_file(gp.datasets.get_path('naturalearth_lowres'))
world_geometry = world.geometry.unary_union
geolocator = Nominatim(user_agent="GIW")
# Create a column of country names from points in a GDF's geometry.
gdf['country'] = gdf.geometry.apply(country_lookup, args=(geolocator, world_geometry))
The accuracy of the results depends on the accuracy of the land geometry you provide. For example, geopandas's world geometry is pretty good. I was able to find names for all countries except for some of the smallest of the islands in the Bahamas. Those that it could not find were labelled "Bermuda Triangle" by the function, which is good enough for me.
Another package to try is reverse_geocoder, which returns the nearest city, state, and country. It seems to work better than the reverse_geocode package.
import reverse_geocoder as rg
coordinates = (29,-84.1),(37,-125) #Both located in the ocean
rg.search(coordinates)
Output:
[OrderedDict([('lat', '29.67106'),
('lon', '-83.38764'),
('name', 'Steinhatchee'),
('admin1', 'Florida'),
('admin2', 'Taylor County'),
('cc', 'US')]),
OrderedDict([('lat', '38.71519'),
('lon', '-123.45445'),
('name', 'Sea Ranch'),
('admin1', 'California'),
('admin2', 'Sonoma County'),
('cc', 'US')])]

Carving out details from unstructured data through Python throwing exception--> IndexError: list index out of range

I have a csv file called "data" and in the 1st column (col. name = Address) there are two addresses:
1.United Kingdom, London, Burlington Gardens, 3
2.United States, Menlo Park, Sand Hill Road, 3000
I am trying the following code to write country, postal_code, city, and street_and_no to a CSV file using the Google Geocoding API (the full API key is omitted for security reasons).
import requests
import json
import csv
from tqdm import *
def addresses_from_csv(path=None, column=None):
    addresses = []
    with open(path, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            addresses.append(row[column])
    return addresses
# Get addresses from CSV
addresses = addresses_from_csv(path='C:/Users/kumarso/Documents/BioquellSales/Data.csv', column=0)
# Set Google Maps API key
api_key = 'AIzaSyCm5u6gF2QCccsn'
# Initialize array for transformed addresses
transformed = []
transformed.append(['Country', 'Post code', 'City', 'Street & No'])
for query in tqdm(addresses):
    # API call, storing information as JSON
    url = 'https://maps.googleapis.com/maps/api/geocode/json?address=' + query + '&lang=en&key=' + api_key
    r = requests.get(url)
    data = r.json()
    #print(data)
    # clear all values to avoid appending values from previous iterations a second time
    number = street = country = postal_code = city = ''
    # looping over address components in JSON
    for component in data['results'][0]['address_components']:
        if 'street_number' in component['types']:
            number = component['long_name']
        elif 'route' in component['types']:
            street = component['long_name']
        elif 'country' in component['types']:
            country = component['long_name']
        elif 'postal_code' in component['types']:
            postal_code = component['long_name']
        elif 'locality' in component['types']:
            city = component['long_name']
        elif 'postal_town' in component['types']:
            city = component['long_name']
        else:
            continue
    street_and_no = street + ' ' + number
    transformed.append([country, postal_code, city, street_and_no])
with open('transformed_addresses.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for row in transformed:
        writer.writerow(row)
print('done')
I am getting the following error:
File "c:/ExcelP/Practice.py", line 39, in
for component in data['results'][0]['address_components']:
IndexError: list index out of range
Any help will be appreciated.
Addition: the printed result before looping over the address components:
{'results': [{'address_components': [{'long_name': 'Munich', 'short_name': 'Munich', 'types': ['locality', 'political']}, {'long_name': 'Upper Bavaria', 'short_name': 'Upper Bavaria', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Bavaria', 'short_name': 'BY', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'Germany', 'short_name': 'DE', 'types': ['country', 'political']}], 'formatted_address': 'Munich, Germany', 'geometry': {'bounds': {'northeast': {'lat': 48.2482197, 'lng': 11.7228755}, 'southwest': {'lat': 48.0616018, 'lng': 11.360796}}, 'location': {'lat': 48.1351253, 'lng': 11.5819805}, 'location_type': 'APPROXIMATE', 'viewport': {'northeast': {'lat': 48.2482197, 'lng': 11.7228755}, 'southwest': {'lat': 48.0616018, 'lng': 11.360796}}}, 'place_id': 'ChIJ2V-Mo_l1nkcRfZixfUq4DAE', 'types': ['locality', 'political']}], 'status': 'OK'}
The issue is resolved. Thanks to Massifox for the hint. The following actions must be taken:
1) Make sure the data in the CSV does not contain non-ASCII characters. Print the data to the console to check that the feed from the CSV is correct.
2) Make sure the API key is active.
3) Use a full path for the output CSV file. For example, in my question I wrote 'transformed_addresses.csv', but this should be a full path.
Hope that helps!
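For anyone hitting the same IndexError: it is raised when 'results' comes back as an empty list, which can happen for an unmatched address or a rejected key. A minimal sketch that guards against this and URL-encodes the query is below. The helper names build_url and parse_components are made up for illustration, and the sample responses are trimmed or hypothetical:

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_url(address, api_key):
    """URL-encode the query so spaces, commas, and non-ASCII text survive."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_components(data):
    """Return [country, postal_code, city, street_and_no], or None if no match."""
    results = data.get("results") or []
    if not results:
        # Nothing to index into (e.g. status ZERO_RESULTS or REQUEST_DENIED).
        return None
    number = street = country = postal_code = city = ""
    for component in results[0]["address_components"]:
        types = component["types"]
        if "street_number" in types:
            number = component["long_name"]
        elif "route" in types:
            street = component["long_name"]
        elif "country" in types:
            country = component["long_name"]
        elif "postal_code" in types:
            postal_code = component["long_name"]
        elif "locality" in types or "postal_town" in types:
            city = component["long_name"]
    return [country, postal_code, city, (street + " " + number).strip()]


# Trimmed version of the Munich response printed in the question.
munich = {"results": [{"address_components": [
    {"long_name": "Munich", "short_name": "Munich", "types": ["locality", "political"]},
    {"long_name": "Germany", "short_name": "DE", "types": ["country", "political"]},
]}], "status": "OK"}
empty = {"results": [], "status": "ZERO_RESULTS"}  # hypothetical no-match response

url = build_url("United Kingdom, London", "KEY")
print(parse_components(munich))
print(parse_components(empty))
```

In the loop from the question, a None return could be written out as a row of empty strings so the output CSV stays aligned with the input.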
