I have two dataframes:
df = pd.DataFrame({'America':["Ohio","Utah","New York"],
'Italy':["Rome","Milan","Venice"],
'Germany':["Berlin","Munich","Jena"]})
df2 = pd.DataFrame({'Cities':["Rome", "New York", "Munich"],
'Country':["na","na","na"]})
I want to iterate over the df2 "Cities" column, find each city in df, and write the country of that city (the df column name) into the df2 "Country" column.
Use melt, then map with a dictionary:
df1 = df.melt()
print (df1)
variable value
0 America Ohio
1 America Utah
2 America New York
3 Italy Rome
4 Italy Milan
5 Italy Venice
6 Germany Berlin
7 Germany Munich
8 Germany Jena
df2['Country'] = df2['Cities'].map(dict(zip(df1['value'], df1['variable'])))
#alternative, thanks @Sandeep Kadapa
#df2['Country'] = df2['Cities'].map(df1.set_index('value')['variable'])
print (df2)
Cities Country
0 Rome Italy
1 New York America
2 Munich Germany
After melting and renaming the first dataframe:
df1 = df.melt().rename(columns={'variable': 'Country', 'value': 'Cities'})
the solution is a simple merge:
df2 = df2[['Cities']].merge(df1, on='Cities')
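One thing worth hedging: if df2 might contain cities that do not appear in df, the default inner merge would silently drop those rows; a left merge keeps them as NaN. A minimal runnable sketch (the city 'Atlantis' is an invented example of an unmatched row):

```python
import pandas as pd

df = pd.DataFrame({'America': ['Ohio', 'Utah', 'New York'],
                   'Italy': ['Rome', 'Milan', 'Venice'],
                   'Germany': ['Berlin', 'Munich', 'Jena']})
df2 = pd.DataFrame({'Cities': ['Rome', 'New York', 'Munich', 'Atlantis']})

# Melt to long form, then left-merge so unmatched cities survive as NaN
df1 = df.melt().rename(columns={'variable': 'Country', 'value': 'Cities'})
out = df2.merge(df1, on='Cities', how='left')
```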
Given this dataframe named df:
Number City Country
one Milan Italy
two Paris France
three London UK
four Berlin Germany
five Milan Italy
six Oxford UK
I would like to create a new column called 'Classification' based on this condition:
if df['Country'] == "Italy" and df['City'] == "Milan", result = "zero"; else result = df['Number']
The result I want to achieve is this:
Number City Country Classification
one Milan Italy zero
two Paris France two
three London UK three
four Berlin Germany four
five Milan Italy zero
six Oxford UK six
I tried to use this code:
condition = [(df['Country'] == "Italy") & (df['City'] == 'Milan'),]
values = ['zero']
df['Classification'] = np.select(condition, values)
the result of which is this dataframe:
Number City Country Classification
one Milan Italy zero
two Paris France 0
three London UK 0
four Berlin Germany 0
five Milan Italy zero
six Oxford UK 0
Now I try to replace the '0' values in the 'Classification' column with the values of the 'Number' column:
df['Classification'].replace(0, df['Number'])
but the result I get is an error:
ValueError: Series.replace cannot use dict-value and non-None to_replace
I would be very grateful for any suggestion on how to fix this
What you want is np.where:
df['Classification'] = np.where((df['Country'] == "Italy") & (df['City'] == 'Milan'), 'zero', df['Number'])
print(df)
Number City Country Classification
0 one Milan Italy zero
1 two Paris France two
2 three London UK three
3 four Berlin Germany four
4 five Milan Italy zero
5 six Oxford UK six
If you want to use np.select, you need to specify the default argument:
condition = [(df['Country'] == "Italy") & (df['City'] == 'Milan'),]
values = ['zero']
df['Classification'] = np.select(condition, values, default=df['Number'])
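Putting the np.where version together as a self-contained sketch, with the sample dataframe reconstructed from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Number': ['one', 'two', 'three', 'four', 'five', 'six'],
    'City': ['Milan', 'Paris', 'London', 'Berlin', 'Milan', 'Oxford'],
    'Country': ['Italy', 'France', 'UK', 'Germany', 'Italy', 'UK'],
})

# Where the condition holds, write 'zero'; everywhere else, copy 'Number'
df['Classification'] = np.where(
    (df['Country'] == 'Italy') & (df['City'] == 'Milan'),
    'zero', df['Number'])
```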
I have a df with US citizens' states, and I would like to use it as a lookup for world citizens.
df1=
[Sam, New York;
Nick, California;
Sarah, Texas]
df2 =
[Sam;
Phillip;
Will;
Sam]
I would like to either df2.replace() with the states or create df3 where my output is:
[New York;
NaN;
NaN;
New York]
I have tried mapping with set_index and dict(zip()) but have had no luck so far.
Thank you.
How about this method:
import pandas as pd
df1 = pd.DataFrame([['Sam','New York'],['Nick','California'],['Sarah','Texas']],
                   columns = ['name','state'])
display(df1)
df2 = pd.DataFrame(['Sam','Phillip','Will','Sam'],
                   columns = ['name'])
display(df2)
df2.merge(right=df1,left_on='name',right_on='name',how='left')
resulting in
name state
0 Sam New York
1 Nick California
2 Sarah Texas
name
0 Sam
1 Phillip
2 Will
3 Sam
name state
0 Sam New York
1 Phillip NaN
2 Will NaN
3 Sam New York
You can then filter for just the state column in the merged dataframe.
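For example, selecting the state column directly after the merge gives the desired df3-style output, with NaN for names that have no lookup entry:

```python
import pandas as pd

df1 = pd.DataFrame([['Sam', 'New York'], ['Nick', 'California'], ['Sarah', 'Texas']],
                   columns=['name', 'state'])
df2 = pd.DataFrame(['Sam', 'Phillip', 'Will', 'Sam'], columns=['name'])

# Left merge keeps every row of df2; unknown names get NaN in 'state'
states = df2.merge(df1, on='name', how='left')['state']
```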
There are two dataframes, and they have similar data.
A dataframe
Index Business Address
1 Oils Moskva, Russia
2 Foods Tokyo, Japan
3 IT California, USA
... etc.
B dataframe
Index Country Country Calling Codes
1 USA +1
2 Egypt +20
3 Russia +7
4 Korea +82
5 Japan +81
... etc.
I want to add a column named 'Country Calling Codes' to dataframe A as well.
The 'Country' column of B should be compared against A's 'Address' column: if the string in A.Address contains the string in B.Country, the matching value of B's 'Country Calling Codes' should be inserted into A's 'Country Calling Codes' for that row.
Result is:
Index Business Address Country Calling Codes
1 Oils Moskva, Russia +7
2 Foods Tokyo, Japan +81
3 IT California, USA +1
I don't know how to approach this because I don't have much experience with pandas. I would be very grateful for any help.
Use Series.str.extract to pull the possible country strings out of the Address column, then Series.map with a Series lookup:
d = B.drop_duplicates('Country').set_index('Country')['Country Calling Codes']
s = A['Address'].str.extract(f'({"|".join(d.keys())})', expand=False)
A['Country Calling Codes'] = s.map(d)
print (A)
Index Business Address Country Calling Codes
0 1 Oils Moskva, Russia +7
1 2 Foods Tokyo, Japan +81
2 3 IT California, USA +1
Detail:
print (A['Address'].str.extract(f'({"|".join(d.keys())})', expand=False))
0 Russia
1 Japan
2 USA
Name: Address, dtype: object
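One assumption worth hedging: the pattern above joins the country names into a regex alternation as-is, so a country name containing a regex metacharacter could misbehave. Wrapping each name in re.escape guards against that. A self-contained sketch with the sample data from the question:

```python
import re
import pandas as pd

A = pd.DataFrame({'Business': ['Oils', 'Foods', 'IT'],
                  'Address': ['Moskva, Russia', 'Tokyo, Japan', 'California, USA']})
B = pd.DataFrame({'Country': ['USA', 'Egypt', 'Russia', 'Korea', 'Japan'],
                  'Country Calling Codes': ['+1', '+20', '+7', '+82', '+81']})

# Country -> calling-code lookup Series
d = B.drop_duplicates('Country').set_index('Country')['Country Calling Codes']

# re.escape protects against regex metacharacters in country names
pattern = '|'.join(re.escape(c) for c in d.keys())
s = A['Address'].str.extract(f'({pattern})', expand=False)
A['Country Calling Codes'] = s.map(d)
```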
I'm trying to fill in null country names in my dataframe based on city/country pairs that exist elsewhere in it. For example, in the dataframe below I want to replace the NaN for the city Bangalore with the country India, since that city already appears with a country.
df1=
City Country
0 Bangalore India
1 Delhi India
2 London UK
3 California USA
4 Dubai UAE
5 Abu Dhabi UAE
6 Bangalore NaN
I am new to this so any help would be appreciated :).
You can create a series mapping after dropping nulls and duplicates.
Then use fillna with pd.Series.map:
g = df.dropna(subset=['Country']).drop_duplicates('City').set_index('City')['Country']
df['Country'] = df['Country'].fillna(df['City'].map(g))
print(df)
City Country
0 Bangalore India
1 Delhi India
2 London UK
3 California USA
4 Dubai UAE
5 Abu Dhabi UAE
6 Bangalore India
This solution will also work if NaN occurs first within a group.
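For reference, the whole approach as a self-contained, runnable sketch (sample data reconstructed from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'City': ['Bangalore', 'Delhi', 'London', 'California',
             'Dubai', 'Abu Dhabi', 'Bangalore'],
    'Country': ['India', 'India', 'UK', 'USA', 'UAE', 'UAE', np.nan],
})

# Build a City -> Country lookup from the non-null rows, then fill the gaps
g = df.dropna(subset=['Country']).drop_duplicates('City').set_index('City')['Country']
df['Country'] = df['Country'].fillna(df['City'].map(g))
```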
I believe
df1['Country'] = df1.groupby('City')['Country'].ffill()
should resolve your issue by forward filling missing values within each city group. Note that this only fills a NaN when a non-null value appears earlier in the same group.
One of the ways could be -
non_null_cities = df1.dropna().drop_duplicates(['City']).rename(columns={'Country':'C'})
df1 = df1.merge(non_null_cities, on='City', how='left')
df1.loc[df1['Country'].isnull(), 'Country'] = df1['C']
del df1['C']
Hope this will be helpful!
Here is one nasty way to do it: first forward fill, then backward fill (for the case where the NaN occurs first within a group):
df = (df.groupby('City')[['City','Country']].ffill()
        .groupby('City')[['City','Country']].bfill())
After I tried to sort my Pandas dataframe by the country column with:
times_data2.reindex_axis(sorted(times_data2['country']), axis=1)
My dataframe became something like:
Argetina Argentina .... United States of America ...
NaN NaN .... NaN ....
If you want to set the index of the dataframe to sorted countries:
df = pd.DataFrame({'country': ['Brazil', 'USA', 'Argentina'], 'val': [1, 2, 3]})
>>> df
country val
0 Brazil 1
1 USA 2
2 Argentina 3
>>> df.set_index('country').sort_index()
val
country
Argentina 3
Brazil 1
USA 2
You may want to transpose these results:
>>> df.set_index('country').sort_index().T
country Argentina Brazil USA
val 3 1 2
If you want to sort by a column, use .sort_values():
times_data2.sort_values(by='country')
Then use .set_index('country') if necessary.
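A minimal runnable sketch of this approach, using the same invented sample data as above:

```python
import pandas as pd

times_data2 = pd.DataFrame({'country': ['Brazil', 'USA', 'Argentina'],
                            'val': [1, 2, 3]})

# Sort rows by the country column; reset_index gives a clean 0..n index
result = times_data2.sort_values(by='country').reset_index(drop=True)
```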