I am computing new columns in a DataFrame using a regular expression with named capturing groups, as follows:
(df["Address Column"]
.str.extract("(?P<Address>.*\d+[\w+?|\s]\s?\w+\s+\w+),?\s(?P<Suburb>.*$)")
.apply(lambda x: x.str.title()))
However, I get a KeyError when I access the new column "Suburb":
KeyError: "['Suburb'] not in index"
Sample data:
**Address column**
4a Mcarthurs Road, Altona north
1 Neal court, Altona North
4 Vermilion Drive, Greenvale
Lot 307 Bonds Lane, Greenvale
430 Blackshaws rd, Altona North
159 Bonds lane, Greenvale
Desired output:
Address             Suburb
4a Mcarthurs Road   Altona North
1 Neal court        Altona North
4 Vermilion Drive   Greenvale
Lot 307 Bonds Lane  Greenvale
430 Blackshaws rd   Altona North
159 Bonds lane      Greenvale
Not sure why I am getting this!
Any help on this will be highly appreciated.
Thank you in advance for the support!
I think your problem is that you never assign the result of the regex extraction back to the original df: str.extract returns a new DataFrame and leaves df unchanged.
The following works for me:
r = r"(?P<Address>.*\d+[\w+?|\s]\s?\w+\s+\w+),?\s(?P<Suburb>.*$)"
ret = df["Address Column"].str.extract(r).apply(lambda x: x.str.title())
df = pd.concat([df, ret], axis=1)
df["Suburb"]
For completeness, this is how I initialized df.
import pandas as pd
s = pd.Series(["4a Mcarthurs Road, Altona north",
"1 Neal court, Altona North",
"4 Vermilion Drive, Greenvale",
"Lot 307 Bonds Lane, Greenvale",
"430 Blackshaws rd, Altona North",
"159 Bonds lane, Greenvale"])
df = pd.DataFrame({"Address Column": s})
The above code adds the new columns Address and Suburb to df:
Address Column                   Address             Suburb
4a Mcarthurs Road, Altona north  4A Mcarthurs Road   Altona North
1 Neal court, Altona North       1 Neal Court        Altona North
4 Vermilion Drive, Greenvale     4 Vermilion Drive   Greenvale
Lot 307 Bonds Lane, Greenvale    Lot 307 Bonds Lane  Greenvale
430 Blackshaws rd, Altona North  430 Blackshaws Rd   Altona North
159 Bonds lane, Greenvale        159 Bonds Lane      Greenvale
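As a variation on the same idea, here is a minimal sketch that uses df.join instead of pd.concat. The regex and sample data are the ones from this thread; the names pattern and extracted are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Address Column": [
    "4a Mcarthurs Road, Altona north",
    "1 Neal court, Altona North",
    "4 Vermilion Drive, Greenvale",
    "Lot 307 Bonds Lane, Greenvale",
    "430 Blackshaws rd, Altona North",
    "159 Bonds lane, Greenvale",
]})

pattern = r"(?P<Address>.*\d+[\w+?|\s]\s?\w+\s+\w+),?\s(?P<Suburb>.*$)"

# str.extract returns a new DataFrame with one column per named group
extracted = df["Address Column"].str.extract(pattern).apply(lambda col: col.str.title())

# df is still unchanged here, which is why df["Suburb"] raised a KeyError
suburb_missing = "Suburb" not in df.columns

# join aligns on the index and adds the new columns next to the original one
df = df.join(extracted)
print(df["Suburb"].tolist())
```

Either approach works; join just saves spelling out axis=1.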
I have a ranking of cities across the world in a variable called rank_2000 that looks like this:
Seoul
Tokyo
Paris
New_York_Greater
Shizuoka
Chicago
Minneapolis
Boston
Austin
Munich
Salt_Lake
Greater_Sydney
Houston
Dallas
London
San_Francisco_Greater
Berlin
Seattle
Toronto
Stockholm
Atlanta
Indianapolis
Fukuoka
San_Diego
Phoenix
Frankfurt_am_Main
Stuttgart
Grenoble
Albany
Singapore
Washington_Greater
Helsinki
Nuremberg
Detroit_Greater
TelAviv
Zurich
Hamburg
Pittsburgh
Philadelphia_Greater
Taipei
Los_Angeles_Greater
Miami_Greater
MannheimLudwigshafen
Brussels
Milan
Montreal
Dublin
Sacramento
Ottawa
Vancouver
Malmo
Karlsruhe
Columbus
Dusseldorf
Shenzen
Copenhagen
Milwaukee
Marseille
Greater_Melbourne
Toulouse
Beijing
Dresden
Manchester
Lyon
Vienna
Shanghai
Guangzhou
San_Antonio
Utrecht
New_Delhi
Basel
Oslo
Rome
Barcelona
Madrid
Geneva
Hong_Kong
Valencia
Edinburgh
Amsterdam
Taichung
The_Hague
Bucharest
Muenster
Greater_Adelaide
Chengdu
Greater_Brisbane
Budapest
Manila
Bologna
Quebec
Dubai
Monterrey
Wellington
Shenyang
Tunis
Johannesburg
Auckland
Hangzhou
Athens
Wuhan
Bangalore
Chennai
Istanbul
Cape_Town
Lima
Xian
Bangkok
Penang
Luxembourg
Buenos_Aires
Warsaw
Greater_Perth
Kuala_Lumpur
Santiago
Lisbon
Dalian
Zhengzhou
Prague
Changsha
Chongqing
Ankara
Fuzhou
Jinan
Xiamen
Sao_Paulo
Kunming
Jakarta
Cairo
Curitiba
Riyadh
Rio_de_Janeiro
Mexico_City
Hefei
Almaty
Beirut
Belgrade
Belo_Horizonte
Bogota_DC
Bratislava
Dhaka
Durban
Hanoi
Ho_Chi_Minh_City
Kampala
Karachi
Kuwait_City
Manama
Montevideo
Panama_City
Quito
San_Juan
What I would like to do is draw a map of the world where those cities are colored according to their position in the ranking above. I am open to other representations (such as bubbles whose size increases with a city's position in the ranking or, if necessary, showing only a sample of cities taken from the top, the middle and the bottom of the ranking).
Thank you,
Federico
Your question has two parts: finding the location of each city, and then drawing the cities on the map. Assuming you have the latitude and longitude of each city, here's how you'd tackle the latter part.
I like Folium (https://pypi.org/project/folium/) for drawing maps. Here's an example of how you might draw a circle for each city, with its position in the list used to determine the size of that circle.
import folium

# Each entry holds a city name and its [latitude, longitude]
cities = [
    {'name': 'Seoul', 'coords': [37.5639715, 126.9040468]},
    {'name': 'Tokyo', 'coords': [35.5090627, 139.2094007]},
    {'name': 'Paris', 'coords': [48.8588787, 2.2035149]},
    {'name': 'New York', 'coords': [40.6976637, -74.1197631]},
    # etc. etc.
]

# Center and zoom chosen so that the whole world is visible
m = folium.Map(location=[20, 0], zoom_start=2)

for counter, city in enumerate(cities):
    # The circle grows with the city's position in the list
    circle_size = 5 + counter
    folium.CircleMarker(
        location=city['coords'],
        radius=circle_size,
        popup=city['name'],
        color="crimson",
        fill=True,
        fill_color="crimson",
    ).add_to(m)

# Write the interactive map to an HTML file
m.save('map.html')
You may need to adjust the circle_size calculation a little to work with the number of cities you want to include.
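One possible way to keep the circles within a sensible range, sketched under my own assumptions: the helper name rank_to_radius, the 3 to 20 pixel range, and the choice to give the top-ranked city the largest circle are all illustrative, not something folium prescribes.

```python
def rank_to_radius(rank, total, min_radius=3.0, max_radius=20.0):
    """Map a 0-based rank to a radius between min_radius and max_radius.

    Rank 0 (the top of the list) gets max_radius, the last rank gets min_radius.
    """
    if total < 2:
        # With a single city there is nothing to scale against
        return max_radius
    fraction = rank / (total - 1)  # 0.0 for the first city, 1.0 for the last
    return max_radius - fraction * (max_radius - min_radius)


# Radii for the first, middle and last of 153 ranked cities
sample_radii = [rank_to_radius(r, 153) for r in (0, 76, 152)]
print(sample_radii)
```

In the loop above you would then pass radius=rank_to_radius(counter, len(cities)) instead of 5 + counter.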
Good morning. My DataFrame (df_part3) is below:
Postal Code Borough Neighbourhood Latitude Longitude
0 M5A Downtown Toronto Regent Park, Harbourfront 43.654260 -79.360636
1 M7A Downtown Toronto Queen's Park, Ontario Provincial Government 43.662301 -79.389494
2 M5B Downtown Toronto Garden District, Ryerson 43.657162 -79.378937
3 M5C Downtown Toronto St. James Town 43.651494 -79.375418
4 M4E East Toronto The Beaches 43.676357 -79.293031
... ... ... ... ... ...
34 M5W Downtown Toronto Stn A PO Boxes 43.646435 -79.374846
35 M4X Downtown Toronto St. James Town, Cabbagetown 43.667967 -79.367675
36 M5X Downtown Toronto First Canadian Place, Underground city 43.648429 -79.382280
37 M4Y Downtown Toronto Church and Wellesley 43.665860 -79.383160
38 M7Y East Toronto Business reply mail Processing Centre, South C... 43.662744 -79.321558
And my code is here:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_part3['Latitude'], df_part3['Longitude'], df_part3['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto
But when I run it, I get:
TypeError: 'DataFrame' object is not callable
----> 5 for lat, lng, label in zip(df_part3['Latitude'], df_part3['Longitude'], df_part3['Neighbourhood']):
Does anyone know how to help me?
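The only call on the line the traceback points at is zip(...), so one likely cause (an assumption, since the earlier cells are not shown) is that the name zip was assigned a DataFrame somewhere earlier in the notebook, which hides the built-in. A minimal sketch that reproduces the message with three rows from the table above; broken_loop is a hypothetical helper used only to show the shadowing:

```python
import builtins
import pandas as pd

df_part3 = pd.DataFrame({
    "Neighbourhood": ["Regent Park, Harbourfront", "Garden District, Ryerson", "St. James Town"],
    "Latitude": [43.654260, 43.657162, 43.651494],
    "Longitude": [-79.360636, -79.378937, -79.375418],
})


def broken_loop(frame):
    # Hypothetical earlier mistake: the name `zip` now refers to a DataFrame
    zip = pd.DataFrame()
    return list(zip(frame["Latitude"], frame["Longitude"], frame["Neighbourhood"]))


try:
    broken_loop(df_part3)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# builtins.zip always refers to the real built-in, whatever `zip` was rebound to
rows = list(builtins.zip(df_part3["Latitude"], df_part3["Longitude"], df_part3["Neighbourhood"]))
print(rows[0])
```

If that is what happened, running del zip in the notebook (or restarting the kernel) and avoiding zip as a variable name should make the original loop work again.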
I have the following table and would like to split each row into three columns: state, postcode and city. State and postcode are easy, but I'm unable to extract the city. I thought about splitting each string after the street synonyms and before the state, but I seem to be getting the loop wrong as it will only use the last item in my list.
Input data:
Address Text
0 11 North Warren Circle Lisbon Falls ME 04252
1 227 Cony Street Augusta ME 04330
2 70 Buckner Drive Battle Creek MI
3 718 Perry Street Big Rapids MI
4 14857 Martinsville Road Van Buren MI
5 823 Woodlawn Ave Dallas TX 75208
6 2525 Washington Avenue Waco TX 76710
7 123 South Main St Dallas TX 75201
The output I'm trying to achieve (for all rows, but I only wrote out the first two to save time)
City State Postcode
0 Lisbon Falls ME 04252
1 Augusta ME 04330
My code:
# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand = True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand = True)
# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]
# This is where I got stuck
df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))
df
Here's a way to do that:
import pandas as pd
# data
df = pd.DataFrame(
['11 North Warren Circle Lisbon Falls ME 04252',
'227 Cony Street Augusta ME 04330',
'70 Buckner Drive Battle Creek MI',
'718 Perry Street Big Rapids MI',
'14857 Martinsville Road Van Buren MI',
'823 Woodlawn Ave Dallas TX 75208',
'2525 Washington Avenue Waco TX 76710',
'123 South Main St Dallas TX 75201'],
columns=['Address Text'])
# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand=True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand=True)
# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]
def find_city(address, state, street_synonyms):
    for syn in street_synonyms:
        if syn in address:
            # remove street
            city = address.split(syn)[-1]
            # remove State and postcode
            city = city.split(state)[0]
            return city
df['City'] = df.apply(lambda x: find_city(x['Address Text'], x['State'], street_synonyms), axis=1)
print(df[['City', 'State', 'Zip']])
"""
City State Zip
0 Lisbon Falls ME 04252
1 Augusta ME 04330
2 Battle Creek MI NaN
3 Big Rapids MI NaN
4 Van Buren MI 14857
5 Dallas TX 75208
6 nue Waco TX 76710
7 Dallas TX 75201
"""
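Note that in the output above, row 6 comes out as "nue Waco" (because "Ave" matches inside "Avenue") and row 4 picks up the house number 14857 as the Zip. A possible alternative, sketched here as one option rather than the definitive fix, is a single regex anchored at the end of the string. The pattern and the names alternation and parts are my own; it assumes every address ends with a two-letter state and an optional five-digit postcode.

```python
import re
import pandas as pd

df = pd.DataFrame({"Address Text": [
    "11 North Warren Circle Lisbon Falls ME 04252",
    "227 Cony Street Augusta ME 04330",
    "70 Buckner Drive Battle Creek MI",
    "718 Perry Street Big Rapids MI",
    "14857 Martinsville Road Van Buren MI",
    "823 Woodlawn Ave Dallas TX 75208",
    "2525 Washington Avenue Waco TX 76710",
    "123 South Main St Dallas TX 75201",
]})

street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

# Longest synonyms first, and \b on both sides so "Ave" cannot match inside "Avenue"
alternation = "|".join(sorted(map(re.escape, street_synonyms), key=len, reverse=True))

# street word, then city (lazy), then state, then an optional postcode at the very end
pattern = (
    rf"\b(?:{alternation})\b\s+(?P<City>.+?)\s+"
    rf"(?P<State>[A-Z]{{2}})(?:\s+(?P<Zip>\d{{5}}))?$"
)

parts = df["Address Text"].str.extract(pattern)
print(parts)
```

Because the postcode is only looked for after the state, a five-digit house number at the start of the address is no longer mistaken for it.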
I am having trouble merging data from multiple sheets within the same Excel file.
2008: Data
UNI DEP ADDRESBR
6 24065037 225 Franklin Street
17 416952 100 North Gay Street
361391 3756 1717 South College
blank 81651 215 South 6th Street
2009 : Data
UNI DEP-2009 ADDRESBR
6 20624948 225 Franklin Street
17 471803 100 North Gay Street
361391 3891 1717 South College
180886 100277 215 South 6th Street
493224 1683 2315 Bentcreek Road
The goal is to combine the values from all sheets into the first sheet, appending each year's DEP as a new column. The issue I am having is the blank UNI value in the first sheet, and matching the address and UNI across the sheets.
Final result should look like this.
UNI DEPSUMBR ADDRESBR DEP-2009 DEP-n
6 20624948 225 Franklin Street 20624948
17 471803 100 North Gay Street 471803
361391 3891 1717 South College 3891
180886 100277 215 South 6th Street ...
493224 1683 2315 Bentcreek Road ...
Can anyone help with how I would do this in Python? The goal is to have a final dataset with the DEP for each year appended as a column. The trouble I am having is matching the DEP value for each year with its respective UNI.
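One way this could be approached with pandas, as a rough sketch only. It assumes the address is a reliable key (since one UNI is blank in 2008), renames DEP to DEP-2008 for clarity, and builds the two sheets in memory from the sample rows above; with a real workbook, pd.read_excel(path, sheet_name=None) would return a dict with one DataFrame per sheet.

```python
import pandas as pd

sheet_2008 = pd.DataFrame({
    "UNI": [6, 17, 361391, None],  # the last UNI is blank in the 2008 sheet
    "DEP": [24065037, 416952, 3756, 81651],
    "ADDRESBR": ["225 Franklin Street", "100 North Gay Street",
                 "1717 South College", "215 South 6th Street"],
})
sheet_2009 = pd.DataFrame({
    "UNI": [6, 17, 361391, 180886, 493224],
    "DEP-2009": [20624948, 471803, 3891, 100277, 1683],
    "ADDRESBR": ["225 Franklin Street", "100 North Gay Street",
                 "1717 South College", "215 South 6th Street",
                 "2315 Bentcreek Road"],
})

# An outer merge on the address keeps branches that appear in only one year
merged = sheet_2008.rename(columns={"DEP": "DEP-2008"}).merge(
    sheet_2009, on="ADDRESBR", how="outer", suffixes=("_2008", "_2009")
)

# Prefer the 2009 UNI and fall back to the 2008 one where 2009 has none
merged["UNI"] = merged["UNI_2009"].combine_first(merged["UNI_2008"])
merged = merged[["UNI", "ADDRESBR", "DEP-2008", "DEP-2009"]]
print(merged)
```

Further years could be merged in the same way, one sheet at a time. Branches missing from a given year end up with NaN in that year's column.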
I have the DataFrame nbr2 below:
Postal_Code Borough Neighborhood
0 M1B Scarborough Rouge, Malvern
1 M4C East York Woodbine Heights
2 M4E East Toronto The Beaches
3 M4L East Toronto The Beaches West, India Bazaar
4 M4M East Toronto Studio District
5 M4N Central Toronto Lawrence Park
When I apply the code below to filter out rows:
neighbor = nbr2.drop(nbr2[nbr2['Borough'].str.contains("Toronto")==False].index, axis=0, inplace=True)
the DataFrame is displayed split across several lines, like below:
Postal_Code Borough \
37 M4E East Toronto
41 M4K East Toronto
42 M4L East Toronto
43 M4M East Toronto
Neighborhood
37 The Beaches
41 The Danforth West\n, Riverdale
42 The Beaches West\n, India Bazaar
43 Studio District\n
The code below also results in a similar structure:
# define the dataframe columns
column_names = ['Postal_Code','Borough', 'Neighborhood']
# instantiate the dataframe
neighbor = pd.DataFrame(columns=column_names)
neighbor = nbr2.drop(nbr2[nbr2['Borough'].str.contains("Toronto")==False].index, axis=0, inplace=True)
Use this option so that pandas does not wrap wide DataFrames across several lines when printing:
pd.set_option('display.expand_frame_repr', False)
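A minimal sketch putting this together, using the sample rows from the question. Separately from the display issue, note that drop(..., inplace=True) returns None, so neighbor in the question ends up as None; filtering with a boolean mask, as assumed here, is one way to keep the result in a new variable.

```python
import pandas as pd

nbr2 = pd.DataFrame({
    "Postal_Code": ["M1B", "M4C", "M4E", "M4L", "M4M", "M4N"],
    "Borough": ["Scarborough", "East York", "East Toronto",
                "East Toronto", "East Toronto", "Central Toronto"],
    "Neighborhood": ["Rouge, Malvern", "Woodbine Heights", "The Beaches",
                     "The Beaches West, India Bazaar", "Studio District",
                     "Lawrence Park"],
})

# Print each row on a single line instead of wrapping the columns
pd.set_option("display.expand_frame_repr", False)

# Keep only boroughs whose name contains "Toronto"; nbr2 itself is not modified
neighbor = nbr2[nbr2["Borough"].str.contains("Toronto")].reset_index(drop=True)
print(neighbor)
```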