replace row data in pandas based on another dataframe - python

I have a sample dataframe:
name
0 Newyork
1 Los Angeles
2 Ohio
3 Washington DC
4 Kentucky
I also have a second dataframe:
name ratio
0 Newyork 1:2
1 Kentucky 3:7
2 Florida 1:5
3 SF 2:9
How can I replace the values of the name column in df2 with 'Not Available' if the name is present in df1?
Desired result:
name ratio
0 Not Available 1:2
1 Not Available 3:7
2 Florida 1:5
3 SF 2:9

Use numpy.where:
import numpy as np

df2['name'] = np.where(df2['name'].isin(df1['name']), 'Not Available', df2['name'])
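A minimal runnable sketch of this answer, using the sample frames from the question (the variable names df1 and df2 are assumed from the post):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'name': ['Newyork', 'Los Angeles', 'Ohio', 'Washington DC', 'Kentucky']})
df2 = pd.DataFrame({'name': ['Newyork', 'Kentucky', 'Florida', 'SF'],
                    'ratio': ['1:2', '3:7', '1:5', '2:9']})

# Replace names in df2 that also appear in df1 with 'Not Available'
df2['name'] = np.where(df2['name'].isin(df1['name']), 'Not Available', df2['name'])
print(df2)
```

An equivalent alternative is `df2.loc[df2['name'].isin(df1['name']), 'name'] = 'Not Available'`, which assigns into the column directly instead of building a new array.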


How can I fill some data of the cell of the new column that is in accord with a substring of the original data using pandas?

There are 2 dataframes, and they have similar data.
A dataframe
Index Business Address
1 Oils Moskva, Russia
2 Foods Tokyo, Japan
3 IT California, USA
... etc.
B dataframe
Index Country Country Calling Codes
1 USA +1
2 Egypt +20
3 Russia +7
4 Korea +82
5 Japan +81
... etc.
I want to add a column named 'Country Calling Codes' to the A dataframe as well.
To fill it, each row's 'Address' in A should be compared against the 'Country' column in B: if the string in 'A.Address' contains the string in 'B.Country', then that row's 'B.Country Calling Codes' value should be inserted into 'A.Country Calling Codes'.
Result is:
Index Business Address Country Calling Codes
1 Oils Moskva, Russia +7
2 Foods Tokyo, Japan +81
3 IT California, USA +1
I don't know how to deal with this because I don't have much experience using pandas. I would be very grateful for any help.
Use Series.str.extract to pull the possible country strings out of the Address column, then map them to codes with Series.map:
d = B.drop_duplicates('Country').set_index('Country')['Country Calling Codes']
s = A['Address'].str.extract(f'({"|".join(d.index)})', expand=False)
A['Country Calling Codes'] = s.map(d)
print(A)
Index Business Address Country Calling Codes
0 1 Oils Moskva, Russia +7
1 2 Foods Tokyo, Japan +81
2 3 IT California, USA +1
Detail:
print(A['Address'].str.extract(f'({"|".join(d.index)})', expand=False))
0 Russia
1 Japan
2 USA
Name: Address, dtype: object
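For reference, the extract-and-map approach as one self-contained sketch, rebuilt from the sample data in the question (frame and column names as given in the post). Note the alternation pattern assumes the country names contain no regex metacharacters; otherwise they would need re.escape:

```python
import pandas as pd

A = pd.DataFrame({'Business': ['Oils', 'Foods', 'IT'],
                  'Address': ['Moskva, Russia', 'Tokyo, Japan', 'California, USA']})
B = pd.DataFrame({'Country': ['USA', 'Egypt', 'Russia', 'Korea', 'Japan'],
                  'Country Calling Codes': ['+1', '+20', '+7', '+82', '+81']})

# Mapping Series: country name -> calling code
d = B.drop_duplicates('Country').set_index('Country')['Country Calling Codes']

# Extract whichever known country name appears in each address,
# then look the match up in the mapping
s = A['Address'].str.extract(f'({"|".join(d.index)})', expand=False)
A['Country Calling Codes'] = s.map(d)
print(A)
```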

pandas groupby replace based on condition

I have a dataset structures as below:
index country city Data
0 AU Sydney 23
1 AU Sydney 45
2 AU Unknown 2
3 CA Toronto 56
4 CA Toronto 2
5 CA Ottawa 1
6 CA Unknown 2
I want to replace 'Unknown' in the city column with the mode of the occurences of cities per country. The result would be:
...
2 AU Sydney 2
...
6 CA Toronto 2
I can get the city modes with:
city_modes = df.groupby('country')['city'].apply(lambda x: x.mode().iloc[0])
And I can replace values with:
df['column']=df.column.replace('Unknown', 'something')
But I can't work out how to combine these to only replace unknowns for each country based on the mode of occurrence of cities.
Any ideas?
Use transform to get a Series the same size as the original DataFrame, then set the new values with numpy.where:
city_modes = df.groupby('country')['city'].transform(lambda x: x.mode().iloc[0])
df['city'] = np.where(df['city'] == 'Unknown', city_modes, df['city'])
Or:
df.loc[df['city'] == 'Unknown', 'city'] = city_modes
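Put together as a runnable sketch with the sample data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'country': ['AU', 'AU', 'AU', 'CA', 'CA', 'CA', 'CA'],
                   'city': ['Sydney', 'Sydney', 'Unknown', 'Toronto', 'Toronto', 'Ottawa', 'Unknown'],
                   'Data': [23, 45, 2, 56, 2, 1, 2]})

# Per-row Series holding each country's most frequent city
city_modes = df.groupby('country')['city'].transform(lambda x: x.mode().iloc[0])

# Replace 'Unknown' with the country's mode; leave other rows untouched
df['city'] = np.where(df['city'] == 'Unknown', city_modes, df['city'])
print(df)
```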

pandas fill missing country values based on city if it exists

I'm trying to fill in missing country names in my dataframe based on rows where both the city and the country exist. For example, in the dataframe below I want to replace the NaN for city Bangalore with country India, since that city appears elsewhere with its country filled in.
df1=
City Country
0 Bangalore India
1 Delhi India
2 London UK
3 California USA
4 Dubai UAE
5 Abu Dhabi UAE
6 Bangalore NaN
I am new to this so any help would be appreciated :).
You can create a series mapping after dropping nulls and duplicates.
Then use fillna with pd.Series.map:
g = df.dropna(subset=['Country']).drop_duplicates('City').set_index('City')['Country']
df['Country'] = df['Country'].fillna(df['City'].map(g))
print(df)
City Country
0 Bangalore India
1 Delhi India
2 London UK
3 California USA
4 Dubai UAE
5 Abu Dhabi UAE
6 Bangalore India
This solution will also work if NaN occurs first within a group.
I believe
df1['Country'] = df1.groupby('City')['Country'].ffill()
should resolve your issue by forward filling missing values within each group. Note this only fills a NaN when a non-null country appears earlier in the same group.
One of the ways could be -
non_null_cities = df1.dropna().drop_duplicates(['City']).rename(columns={'Country':'C'})
df1 = df1.merge(non_null_cities, on='City', how='left')
df1.loc[df1['Country'].isnull(), 'Country'] = df1['C']
del df1['C']
Hope this will be helpful!
Here is one nasty way to do it.
First use forward fill and then backward fill (for the case where the NaN occurs first in a group):
df = df.groupby('City')[['City','Country']].ffill().groupby('City')[['City','Country']].bfill()
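For comparison, the mapping-based answer above as one self-contained sketch with the question's sample data (the frame is called df here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'City': ['Bangalore', 'Delhi', 'London', 'California', 'Dubai', 'Abu Dhabi', 'Bangalore'],
                   'Country': ['India', 'India', 'UK', 'USA', 'UAE', 'UAE', np.nan]})

# City -> Country mapping built from rows where Country is known
g = df.dropna(subset=['Country']).drop_duplicates('City').set_index('City')['Country']

# Fill missing countries by looking each city up in the mapping
df['Country'] = df['Country'].fillna(df['City'].map(g))
print(df)
```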

Left join tables (1:n) using Pandas, keeping number of rows the same as left table

How do I left join tables with a 1:n relationship while keeping the number of rows the same as the left table, concatenating any duplicate data with a character/string like ';'?
Example: Country Table
CountryID Country Area
1 UK 1029
2 Russia 8374
Cities Table
CountryID City
1 London
1 Manchester
2 Moscow
2 Ufa
I want:
CountryID Country Area Cities
1 UK 1029 London;Manchester
2 Russia 8374 Moscow;Ufa
I know how to perform a normal left join
country.merge(city, how='left', on='CountryID')
which gives me four rows instead of two:
Area Country CountryID City
1029 UK 1 London
1029 UK 1 Manchester
8374 Russia 2 Moscow
8374 Russia 2 Ufa
Use map with a Series created by groupby + join to add the new column to df1; this performs well:
df1['Cities'] = df1['CountryID'].map(df2.groupby('CountryID')['City'].apply(';'.join))
print(df1)
CountryID Country Area Cities
0 1 UK 1029 London;Manchester
1 2 Russia 8374 Moscow;Ufa
Detail:
print(df2.groupby('CountryID')['City'].apply(';'.join))
CountryID
1 London;Manchester
2 Moscow;Ufa
Name: City, dtype: object
Another solution with join:
df = df1.join(df2.groupby('CountryID')['City'].apply(';'.join), on='CountryID')
print(df)
CountryID Country Area City
0 1 UK 1029 London;Manchester
1 2 Russia 8374 Moscow;Ufa
This will give you the desired result:
df1.merge(df2, on='CountryID').groupby(['CountryID', 'Country', 'Area'])['City'].agg(';'.join).reset_index()
# CountryID Country Area City
#0 1 UK 1029 London;Manchester
#1 2 Russia 8374 Moscow;Ufa
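The map-based solution as a self-contained sketch with the question's sample tables (frame names df1 and df2 as in the answers):

```python
import pandas as pd

df1 = pd.DataFrame({'CountryID': [1, 2], 'Country': ['UK', 'Russia'], 'Area': [1029, 8374]})
df2 = pd.DataFrame({'CountryID': [1, 1, 2, 2], 'City': ['London', 'Manchester', 'Moscow', 'Ufa']})

# One ';'-joined string of cities per CountryID, mapped onto df1
# so the row count of df1 is preserved
df1['Cities'] = df1['CountryID'].map(df2.groupby('CountryID')['City'].apply(';'.join))
print(df1)
```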

Iterating through two pandas dataframes and appending data from one dataframe to the other

I have two pandas data-frames that look like this:
data_frame_1:
index un_id city
1 abc new york
2 def atlanta
3 gei toronto
4 lmn tampa
data_frame_2:
index name un_id
1 frank gei
2 john lmn
3 lisa abc
4 jessica def
I need to match names to cities via the un_id column, either in a new dataframe or an existing one. I am having trouble figuring out how to iterate through one un_id column, find the matching un_id in the other dataframe, and then append the needed information back to the original dataframe.
Use pandas merge:
In[14]:df2.merge(df1,on='un_id')
Out[14]:
name un_id city
0 frank gei toronto
1 john lmn tampa
2 lisa abc new york
3 jessica def atlanta
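As a runnable sketch with the question's data (frame names df1 and df2 as in the answer); merge aligns the rows on un_id, so no manual iteration is needed:

```python
import pandas as pd

df1 = pd.DataFrame({'un_id': ['abc', 'def', 'gei', 'lmn'],
                    'city': ['new york', 'atlanta', 'toronto', 'tampa']})
df2 = pd.DataFrame({'name': ['frank', 'john', 'lisa', 'jessica'],
                    'un_id': ['gei', 'lmn', 'abc', 'def']})

# Inner join on the shared un_id key; order follows df2's keys
merged = df2.merge(df1, on='un_id')
print(merged)
```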
