Remove the coordinates in the location column - python

I would like to have the location name and the coordinates in their own respective columns.
This is my df["Boroughs"]
0 Beaches-East York
1 Davenport
2 Eglinton-Lawrence
3 Etobicoke Centre
4 Etobicoke North
5 Humber River-Black Creek
6 Parkdale-High Park
7 Scarborough-Agincourt
8 Scarborough-Rouge Park
9 Toronto-Danforth
10 Willowdale
11 York Centre
Name: Boroughs, dtype: object
My steps:
dfB = []
city = 'Toronto, Canada'
boroughs = df["Boroughs"]
for borough in boroughs:
    try:
        address = borough + ', ' + city
        geolocator = Nominatim(user_agent="foursquare_agent")
        location = geolocator.geocode(address)
        lat = location.latitude
        lng = location.longitude
        dfB.append([location, lat, lng])
    except Exception as e:
        print(address, lat, lng)
dfB = pd.DataFrame(dfB, columns=["location", "lat", "lng"])
dfB
How can I remove the extra info and the coordinates from the location column?
The Output:
location lat lng
0 (Beaches—East York, Old Toronto, Toronto, Golden Horseshoe, Ontario, Canada, (43.6814698, -79.3060214)) 43.681470 -79.306021
1 (Davenport, Old Toronto, Toronto, Golden Horseshoe, Ontario, Canada, (43.6715614, -79.4482927)) 43.671561 -79.448293
2 (Eglinton—Lawrence, North York, Toronto, Golden Horseshoe, Ontario, Canada, (43.7192647, -79.429765)) 43.719265 -79.429765
3 (Etobicoke Centre, Etobicoke, Toronto, Ontario, Canada, (43.6798327, -79.5389927)) 43.679833 -79.538993
4 (Etobicoke North, Etobicoke, Toronto, Ontario, Canada, (43.7410925, -79.5892249)) 43.741093 -79.589225
5 (Humber River—Black Creek, North York, Toronto, Ontario, Canada, (43.7337368, -79.5382285)) 43.733737 -79.538229
6 (Parkdale—High Park, Old Toronto, Toronto, Golden Horseshoe, Ontario, Canada, (43.6499649, -79.473014)) 43.649965 -79.473014
7 (Scarborough—Agincourt, Scarborough, Toronto, Ontario, Canada, (43.797221, -79.3083901035784)) 43.797221 -79.308390
8 (Scarborough—Rouge Park, Scarborough, Toronto, Ontario, Canada, (43.80292335, -79.175434369733)) 43.802923 -79.175434
9 (Toronto—Danforth, Old Toronto, Toronto, Golden Horseshoe, Ontario, Canada, (43.6789439, -79.3448597)) 43.678944 -79.344860
10 (Willowdale, North York, Toronto, Ontario, Canada, (43.7753558, -79.4166859823926)) 43.775356 -79.416686
11 (York Centre, North York, Toronto, Golden Horseshoe, Ontario, Canada, (43.750241, -79.463352)) 43.750241 -79.463352
Expected Output:
location lat lng
0 Beaches—East York 43.681470 -79.306021
1 Davenport 43.671561 -79.448293
2 Eglinton—Lawrence 43.719265 -79.429765
3 Etobicoke Centre 43.679833 -79.538993
...

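Try this, assuming each row of the location column holds a tuple-like value whose first element is the address string:
dfB["location"] = dfB["location"].apply(lambda x: x[0])
If that first element is the full comma-separated address, as the output above suggests, keep only the part before the first comma:
dfB["location"] = dfB["location"].apply(lambda x: x[0].split(",")[0])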

You are getting all values in one column because you are appending a list to a list, while you want your final output to be a dataframe.
Try:
dfB = pd.DataFrame()
city = 'Toronto, Canada'
boroughs = df["Boroughs"]
geolocator = Nominatim(user_agent="foursquare_agent")
for borough in boroughs:
    address = borough + ', ' + city
    try:
        location = geolocator.geocode(address)
        lat = location.latitude
        lng = location.longitude
        # DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
        row = pd.DataFrame([[location, lat, lng]])
        dfB = pd.concat([dfB, row], ignore_index=True)
    except Exception as e:
        # lat and lng may be undefined here, so report the address and the error
        print(address, e)
dfB.columns = ["location", "lat", "lng"]
dfB
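
As a rough sketch of another option, the name can be trimmed before it ever reaches the dataframe. This assumes the geocoder result exposes the address as a comma-separated string (geopy's location.address), and it uses a few address strings and coordinates copied from the question's output in place of live geocoding calls:

```python
import pandas as pd

# Address strings and coordinates copied from the question's output. In the
# real loop they would come from location.address, location.latitude and
# location.longitude (assumption: address is a comma-separated string).
addresses = [
    "Beaches—East York, Old Toronto, Toronto, Golden Horseshoe, Ontario, Canada",
    "Davenport, Old Toronto, Toronto, Golden Horseshoe, Ontario, Canada",
    "Etobicoke Centre, Etobicoke, Toronto, Ontario, Canada",
]
coords = [
    (43.6814698, -79.3060214),
    (43.6715614, -79.4482927),
    (43.6798327, -79.5389927),
]

rows = []
for address, (lat, lng) in zip(addresses, coords):
    # Keep only the text before the first comma, i.e. the borough name.
    name = address.split(",")[0].strip()
    rows.append([name, lat, lng])

dfB = pd.DataFrame(rows, columns=["location", "lat", "lng"])
print(dfB)
```

Storing a plain string rather than the whole result object means the location column needs no clean-up afterwards.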

Related

Assign values from a dictionary to a new column based on condition

This is my data frame:
City        sales
San Diego   500
Texas       400
Nebraska    300
Macau       200
Rome        100
London      50
Manchester  70
I want to add the country at the end which will look like this
City        sales  Country
San Diego   500    US
Texas       400    US
Nebraska    300    US
Macau       200    Hong Kong
Rome        100    Italy
London      50     England
Manchester  70     England
The countries are stored in the dictionary below:
country={'US':['San Diego','Texas','Nebraska'], 'Hong Kong':'Macau', 'England':['London','Manchester'],'Italy':'Rome'}
It's a little complicated because you have lists and strings as the values and strings are technically iterable, so distinguishing is more annoying. But here's a function that can flatten your dict:
def flatten_dict(d):
    nd = {}
    for k, v in d.items():
        # Check if it's a list, if so then iterate through
        if hasattr(v, '__iter__') and not isinstance(v, str):
            for item in v:
                nd[item] = k
        else:
            nd[v] = k
    return nd

d = flatten_dict(country)
#{'San Diego': 'US',
# 'Texas': 'US',
# 'Nebraska': 'US',
# 'Macau': 'Hong Kong',
# 'London': 'England',
# 'Manchester': 'England',
# 'Rome': 'Italy'}
df['Country'] = df['City'].map(d)
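
For reference, here is a compact, self-contained variant of the same idea using a dict comprehension. It is only a sketch: the extra city "Lisbon" is a hypothetical row, added to show that a city missing from the dictionary ends up as NaN after map.

```python
import pandas as pd

country = {
    'US': ['San Diego', 'Texas', 'Nebraska'],
    'Hong Kong': 'Macau',
    'England': ['London', 'Manchester'],
    'Italy': 'Rome',
}

# "Lisbon" is a hypothetical extra row that is not in the dictionary.
df = pd.DataFrame({
    'City': ['San Diego', 'Texas', 'Nebraska', 'Macau', 'Rome',
             'London', 'Manchester', 'Lisbon'],
    'sales': [500, 400, 300, 200, 100, 50, 70, 10],
})

# Invert the mapping: wrap bare strings in a list so every value is iterable
# in the same way, then emit one city -> country pair per city.
city_to_country = {
    city: name
    for name, cities in country.items()
    for city in ([cities] if isinstance(cities, str) else cities)
}

df['Country'] = df['City'].map(city_to_country)
print(df)
```

If unmapped cities should get a placeholder instead of NaN, chaining .fillna('Unknown') after map is one option.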
You can also implement this using geopy, which looks each city up through a geocoding service.
You can install geopy with pip install geopy
Here is the documentation: https://pypi.org/project/geopy/
# import libraries
from geopy.geocoders import Nominatim
# you need to mention a name for the app
geolocator = Nominatim(user_agent="some_random_app_name")
# get country name
df['Country'] = df['City'].apply(lambda x : geolocator.geocode(x).address.split(', ')[-1])
print(df)
City sales Country
0 San Diego 500 United States
1 Texas 400 United States
2 Nebraska 300 United States
3 Macau 200 中国
4 Rome 100 Italia
5 London 50 United Kingdom
6 Manchester 70 United Kingdom
# to get country name in english
df['Country'] = df['City'].apply(lambda x : geolocator.reverse(geolocator.geocode(x).point, language='en').address.split(', ')[-1])
print(df)
City sales Country
0 San Diego 500 United States
1 Texas 400 United States
2 Nebraska 300 United States
3 Macau 200 China
4 Rome 100 Italy
5 London 50 United Kingdom
6 Manchester 70 United Kingdom

Python 'DataFrame' object is not callable - error in a for loop

Good morning, my df (df_part3) is below:
Postal Code Borough Neighbourhood Latitude Longitude
0 M5A Downtown Toronto Regent Park, Harbourfront 43.654260 -79.360636
1 M7A Downtown Toronto Queen's Park, Ontario Provincial Government 43.662301 -79.389494
2 M5B Downtown Toronto Garden District, Ryerson 43.657162 -79.378937
3 M5C Downtown Toronto St. James Town 43.651494 -79.375418
4 M4E East Toronto The Beaches 43.676357 -79.293031
... ... ... ... ... ...
34 M5W Downtown Toronto Stn A PO Boxes 43.646435 -79.374846
35 M4X Downtown Toronto St. James Town, Cabbagetown 43.667967 -79.367675
36 M5X Downtown Toronto First Canadian Place, Underground city 43.648429 -79.382280
37 M4Y Downtown Toronto Church and Wellesley 43.665860 -79.383160
38 M7Y East Toronto Business reply mail Processing Centre, South C... 43.662744 -79.321558
And my code is here:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(df_part3['Latitude'], df_part3['Longitude'], df_part3['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
map_toronto
But when I run it I get:
TypeError: 'DataFrame' object is not callable
----> 5 for lat, lng, label in zip(df_part3['Latitude'], df_part3['Longitude'], df_part3['Neighbourhood']):
Does anyone know how to help me?
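
One possible cause, which the question itself does not confirm: the only call on the failing line is zip(...), so this error would appear if the name zip had been rebound to a DataFrame somewhere earlier in the notebook. The sketch below reproduces that situation with a hypothetical earlier assignment and shows how restoring the built-in makes the loop work again:

```python
import builtins
import pandas as pd

df_part3 = pd.DataFrame({
    "Latitude": [43.654260, 43.662301],
    "Longitude": [-79.360636, -79.389494],
    "Neighbourhood": ["Regent Park, Harbourfront",
                      "Queen's Park, Ontario Provincial Government"],
})

# Hypothetical earlier cell: the built-in name `zip` gets rebound to a DataFrame.
zip = pd.DataFrame({"code": ["M5A", "M7A"]})

message = None
try:
    for lat, lng, label in zip(df_part3["Latitude"],
                               df_part3["Longitude"],
                               df_part3["Neighbourhood"]):
        pass
except TypeError as exc:
    # Calling a DataFrame as if it were a function raises the TypeError.
    message = str(exc)
    print(message)

# Rebind the name to the real built-in; the same loop now works.
zip = builtins.zip
rows = list(zip(df_part3["Latitude"],
                df_part3["Longitude"],
                df_part3["Neighbourhood"]))
print(rows)
```

In a notebook, restarting the kernel and avoiding zip as a variable name (for example for a postal-code table) has the same effect.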

Split column in DataFrame based on item in list

I have the following table and would like to split each row into three columns: state, postcode and city. State and postcode are easy, but I'm unable to extract the city. I thought about splitting each string after the street synonyms and before the state, but I seem to be getting the loop wrong as it will only use the last item in my list.
Input data:
Address Text
0 11 North Warren Circle Lisbon Falls ME 04252
1 227 Cony Street Augusta ME 04330
2 70 Buckner Drive Battle Creek MI
3 718 Perry Street Big Rapids MI
4 14857 Martinsville Road Van Buren MI
5 823 Woodlawn Ave Dallas TX 75208
6 2525 Washington Avenue Waco TX 76710
7 123 South Main St Dallas TX 75201
The output I'm trying to achieve (for all rows, but I only wrote out the first two to save time)
City State Postcode
0 Lisbon Falls ME 04252
1 Augusta ME 04330
My code:
# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand = True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand = True)
# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]
# This is where I got stuck
df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))
df
Here's a way to do that:
import pandas as pd
# data
df = pd.DataFrame(
['11 North Warren Circle Lisbon Falls ME 04252',
'227 Cony Street Augusta ME 04330',
'70 Buckner Drive Battle Creek MI',
'718 Perry Street Big Rapids MI',
'14857 Martinsville Road Van Buren MI',
'823 Woodlawn Ave Dallas TX 75208',
'2525 Washington Avenue Waco TX 76710',
'123 South Main St Dallas TX 75201'],
columns=['Address Text'])
# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand=True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand=True)
# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]
def find_city(address, state, street_synonyms):
    for syn in street_synonyms:
        if syn in address:
            # remove street
            city = address.split(syn)[-1]
            # remove State and postcode
            city = city.split(state)[0]
            return city

df['City'] = df.apply(lambda x: find_city(x['Address Text'], x['State'], street_synonyms), axis=1)
print(df[['City', 'State', 'Zip']])
"""
City State Zip
0 Lisbon Falls ME 04252
1 Augusta ME 04330
2 Battle Creek MI NaN
3 Big Rapids MI NaN
4 Van Buren MI 14857
5 Dallas TX 75208
6 nue Waco TX 76710
7 Dallas TX 75201
"""

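The output above still has two problems: row 4 picks up the house number 14857 as the postcode, and row 6 yields "nue Waco" because "Ave" matches inside "Avenue". A rough alternative, assuming every address ends with a two-letter state and an optional five-digit postcode, is a single regular expression with word boundaries and named groups:

```python
import pandas as pd

df = pd.DataFrame(
    ['11 North Warren Circle Lisbon Falls ME 04252',
     '227 Cony Street Augusta ME 04330',
     '70 Buckner Drive Battle Creek MI',
     '718 Perry Street Big Rapids MI',
     '14857 Martinsville Road Van Buren MI',
     '823 Woodlawn Ave Dallas TX 75208',
     '2525 Washington Avenue Waco TX 76710',
     '123 South Main St Dallas TX 75201'],
    columns=['Address Text'])

street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

# \b keeps "Ave" from matching inside "Avenue"; anchoring at the end of the
# string ($) keeps the leading house number from being taken as the postcode.
pattern = (
    r"\b(?:" + "|".join(street_synonyms) + r")\b\s+"
    r"(?P<City>.+?)\s+(?P<State>[A-Z]{2})(?:\s+(?P<Zip>\d{5}))?$"
)

# Named groups become the column names of the extracted frame.
result = df["Address Text"].str.extract(pattern)
print(result)
```

This is only a sketch for addresses shaped like the sample; a city name that itself contains one of the street words would need extra handling.
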
Group a dataframe by a column and concatenate strings in another

I know this should be easy but it's driving me mad...
I am trying to turn a dataframe into a grouped dataframe.
df outputs:
Postcode Borough Neighbourhood
0 M3A North York Parkwoods
1 M4A North York Victoria Village
2 M5A Downtown Toronto Harbourfront
3 M5A Downtown Toronto Regent Park
4 M6A North York Lawrence Heights
5 M6A North York Lawrence Manor
6 M7A Queen's Park Not assigned
7 M9A Etobicoke Islington Avenue
8 M1B Scarborough Rouge
9 M1B Scarborough Malvern
10 M3B North York Don Mills North
...
I want to make a grouped dataframe where the Neighbourhood is grouped by Postcode and all neighborhoods then become a concatenated string of Neighbourhoods as grouped by Postcode...
something like:
Postcode Borough Neighbourhood
0 M3A North York Parkwoods
1 M4A North York Victoria Village
2 M5A Downtown Toronto Harbourfront, Regent Park
...
I am trying to use:
df.groupby(['Postcode'])['Neighbourhood'].apply(lambda strs: ', '.join(strs))
But this does not return a new dataframe .. it outputs the same original dataframe when I use df after running.
if I use:
df = df.groupby(['Postcode'])['Neighbourhood'].apply(lambda strs: ', '.join(strs))
it turns df into an object?
Use this code
new_df = df.groupby(['Postcode', 'Borough']).agg({'Neighbourhood':lambda x:', '.join(x)}).reset_index()
reset_index() will take your group by columns out of the index and return it as a column to the dataframe and create a new integer index.
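
To make the difference concrete, here is a small sketch on the first few rows from the question. Selecting a single column after groupby gives a Series indexed by Postcode (which is why df seemed to turn into "an object"), while grouping on both key columns and calling reset_index() gives back a regular dataframe:

```python
import pandas as pd

df = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M5A", "M5A", "M6A", "M6A"],
    "Borough": ["North York", "North York", "Downtown Toronto",
                "Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Harbourfront",
                      "Regent Park", "Lawrence Heights", "Lawrence Manor"],
})

# One selected column -> the result is a Series with Postcode as its index.
as_series = df.groupby("Postcode")["Neighbourhood"].apply(", ".join)

# Group on both key columns, join the strings, then move the keys back
# into ordinary columns with a fresh integer index.
new_df = (
    df.groupby(["Postcode", "Borough"])["Neighbourhood"]
      .agg(", ".join)
      .reset_index()
)
print(new_df)
```

Neither call modifies df in place, so the result has to be assigned to a name to be kept.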

how to preserve original indexes in the new dataframe

def answer_eight():
    templist = list()
    for county, region, p15, p14, ste, cty in zip(census_df.CTYNAME,
                                                  census_df.REGION,
                                                  census_df.POPESTIMATE2015,
                                                  census_df.POPESTIMATE2014,
                                                  census_df.STNAME,
                                                  census_df.CTYNAME):
        # print(county)
        if region == 1 or region == 2:
            if county.startswith('Washington'):
                if p15 > p14:
                    templist.append((ste, cty))
    labels = ['STNAME', 'CTYNAME']
    df = pd.DataFrame.from_records(templist, columns=labels)
    return df
STNAME CTYNAME
0 Iowa Washington County
1 Minnesota Washington County
2 Pennsylvania Washington County
3 Rhode Island Washington County
4 Wisconsin Washington County
All of these CTYNAME rows have different indexes in the original census_df. How could I carry those indexes over to the new DF so the answer looks like:
STNAME CTYNAME
12 Iowa Washington County
222 Minnesota Washington County
400 Pennsylvania Washington County
2900 Rhode Island Washington County
2999 Wisconsin Washington County
I'd include the index with the other things you are zipping:
def answer_eight():
    templist = list()
    index = list()
    zipped = zip(
        census_df.CTYNAME,
        census_df.REGION,
        census_df.POPESTIMATE2015,
        census_df.POPESTIMATE2014,
        census_df.STNAME,
        census_df.CTYNAME,
        census_df.index
    )
    for county, region, p15, p14, ste, cty, idx in zipped:
        # print(county)
        if region == 1 or region == 2:
            if county.startswith('Washington'):
                if p15 > p14:
                    templist.append((ste, cty))
                    index.append(idx)
    labels = ['STNAME', 'CTYNAME']
    df = pd.DataFrame(templist, index, labels)
    return df.rename_axis(census_df.index.name)
Before you start filtering, you can assign the original index to a column with:
census_df['original index'] = census_df.index
Then just treat it like one of the other columns you're selecting from.
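
Another option, sketched here on a small made-up stand-in for census_df (the rows and population numbers are hypothetical), is to filter with a boolean mask instead of looping. Row selection with .loc keeps the original index labels automatically:

```python
import pandas as pd

# Hypothetical stand-in for census_df; only the column names match the question.
census_df = pd.DataFrame(
    {
        "STNAME": ["Iowa", "Minnesota", "Texas", "Oregon", "Ohio"],
        "CTYNAME": ["Washington County", "Washington County", "Travis County",
                    "Washington County", "Washington County"],
        "REGION": [2, 2, 3, 4, 2],
        "POPESTIMATE2014": [100, 200, 300, 400, 500],
        "POPESTIMATE2015": [110, 210, 310, 410, 490],
    },
    index=[12, 222, 400, 2900, 2999],
)

# The same three conditions as the nested ifs, combined element-wise.
mask = (
    census_df["REGION"].isin([1, 2])
    & census_df["CTYNAME"].str.startswith("Washington")
    & (census_df["POPESTIMATE2015"] > census_df["POPESTIMATE2014"])
)

# .loc selects the matching rows and the two columns; the index is untouched.
result = census_df.loc[mask, ["STNAME", "CTYNAME"]]
print(result)
```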
