I have the following data frame:
Name Age City Gender Country
0 Jane 23 NaN F London
1 Melissa 45 NaN F France
2 John 35 NaN M Toronto
I want to move values between columns based on a condition:
if Country is equal to Toronto or London, that value should move to City and Country should become NaN.
I would like to have this output:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
How can I do this?
I would use .loc to select the rows where Country is London or Toronto and set the City column to those values. A second .loc statement then replaces London and Toronto with NaN in the Country column:
df.loc[df['Country'].isin(['London', 'Toronto']), 'City'] = df['Country']
df.loc[df['Country'].isin(['London', 'Toronto']), 'Country'] = np.nan
output:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
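For anyone who wants to run this end to end, here is a minimal self-contained sketch of the .loc approach. The frame is rebuilt from the sample in the question, and the helper name move_cities is hypothetical; it only wraps the two steps so the original frame is left untouched.

```python
import numpy as np
import pandas as pd


def move_cities(df, cities):
    """Move city names found in 'Country' into 'City' (hypothetical helper)."""
    out = df.copy()
    # Compute the boolean mask once, before either column is modified.
    is_city = out['Country'].isin(cities)
    out.loc[is_city, 'City'] = out.loc[is_city, 'Country']
    out.loc[is_city, 'Country'] = np.nan
    return out


df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John'],
    'Age': [23, 45, 35],
    # object dtype so that strings can be written into the empty column
    'City': pd.Series([np.nan, np.nan, np.nan], dtype='object'),
    'Gender': ['F', 'F', 'M'],
    'Country': ['London', 'France', 'Toronto'],
})

result = move_cities(df, ['London', 'Toronto'])
print(result)
```

Computing the mask up front matters: if Country were cleared first, there would be nothing left to copy into City.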
You could use np.where:
cities = ['London', 'Toronto']
df['City'] = np.where(
df['Country'].isin(cities),
df['Country'],
df['City']
)
df['Country'] = np.where(
df['Country'].isin(cities),
np.nan,
df['Country']
)
Results:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
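You could also use Series.mask:
cond = df['Country'].isin(['London', 'Toronto'])
# Assign the result back instead of calling mask(..., inplace=True) on a
# column selection, which may act on a temporary copy and leave df unchanged.
df['City'] = df['City'].mask(cond, df['Country'])
df['Country'] = df['Country'].mask(cond, np.nan)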
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
Related
I am trying to add a pandas Series as a new row to a pandas DataFrame. However, the Series always appears to be added with each of its index labels as a separate row.
How can we append it as a single row?
import pandas as pd
df = pd.DataFrame([
('Tom', 'male', 10),
('Jane', 'female', 7),
('Peter', 'male', 9),
], columns=['name', 'gender', 'age'])
df.set_index(['name'], inplace=True)
print(df)
gender age
name
Tom male 10
Jane female 7
Peter male 9
s = pd.Series(('Jon', 'male', 12), index=['name', 'gender', 'age'])
print(s)
name Jon
gender male
age 12
dtype: object
Expected Result
gender age
name
Tom male 10
Jane female 7
Peter male 9
Jon male 12
Attempt #1
df2 = df.append(pd.DataFrame(s))
print(df2)
0 age gender
Tom NaN 10.0 male
Jane NaN 7.0 female
Peter NaN 9.0 male
name Jon NaN NaN
gender male NaN NaN
age 12 NaN NaN
Attempt #2
df2 = pd.concat([df, s], axis=0)
print(df2)
0 age gender
Tom NaN 10.0 male
Jane NaN 7.0 female
Peter NaN 9.0 male
name Jon NaN NaN
gender male NaN NaN
age 12 NaN NaN
Attempt #3
df2 = pd.concat([df, pd.DataFrame(s)], axis=0)
print(df2)
0 age gender
Tom NaN 10.0 male
Jane NaN 7.0 female
Peter NaN 9.0 male
name Jon NaN NaN
gender male NaN NaN
age 12 NaN NaN
This "works", but you may want to reconsider how you are building your dataframes in the first place. If you append data, do it all at once instead of row by row.
>>> pd.concat([df, s.to_frame().T.set_index('name')])
gender age
name
Tom male 10
Jane female 7
Peter male 9
Jon male 12
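To illustrate the "all at once" advice, here is a minimal sketch that collects the incoming records in a plain list and calls pd.concat a single time. The second record ('Ann') and the names new_records and new_rows are made up for illustration; they are not part of the question.

```python
import pandas as pd

df = pd.DataFrame([
    ('Tom', 'male', 10),
    ('Jane', 'female', 7),
    ('Peter', 'male', 9),
], columns=['name', 'gender', 'age']).set_index('name')

# Hypothetical incoming records; 'Ann' does not appear in the question.
new_records = [
    {'name': 'Jon', 'gender': 'male', 'age': 12},
    {'name': 'Ann', 'gender': 'female', 'age': 8},
]

# Build one frame from all the records, then concatenate once.
new_rows = pd.DataFrame(new_records).set_index('name')
df2 = pd.concat([df, new_rows])
print(df2)
```

Each call to pd.concat copies the data, so one call over a list of records avoids repeating that copy for every row.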
As a column of a dataframe, a Series is generally all the same data type (e.g. age). In this case, your series represents a single row of data for a given record, e.g. a row in a database with potentially mixed types. You may want to consider your series as a dataframe row instead.
row = pd.DataFrame({'gender': 'male', 'age': 12},
index=pd.Index(['Jon'], name='name'))
>>> pd.concat([df, row])
gender age
name
Tom male 10
Jane female 7
Peter male 9
Jon male 12
I have 2 dataframes, df1 and df2.
df1 contains information about some interactions between people.
df1
Name1 Name2
0 Jack John
1 Sarah Jack
2 Sarah Eva
3 Eva Tom
4 Eva John
df2 contains the status of people in general, including some of the people in df1.
df2
Name Y
0 Jack 0
1 John 1
2 Sarah 0
3 Tom 1
4 Laura 0
I would like to keep df2 only for the people who appear in df1 (so Laura disappears), and to include the people from df1 who are missing from df2 with NaN (i.e. Eva), such as:
df2
Name Y
0 Jack 0
1 John 1
2 Sarah 0
3 Tom 1
4 Eva NaN
Create a DataFrame from the unique values of df1 and map the Y values from df2 onto it:
df = pd.DataFrame(np.unique(df1.values),columns=['Name'])
df['Y'] = df.Name.map(df2.set_index('Name')['Y'])
print(df)
Name Y
0 Eva NaN
1 Jack 0.0
2 John 1.0
3 Sarah 0.0
4 Tom 1.0
Note: the original order is not preserved, because np.unique returns the names sorted.
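If the order matters, one possible variation (a sketch, not part of the answer above) is to flatten df1 row by row and de-duplicate with pd.unique, which keeps names in order of first appearance instead of sorting them. The sample frames are rebuilt from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name1': ['Jack', 'Sarah', 'Sarah', 'Eva', 'Eva'],
    'Name2': ['John', 'Jack', 'Eva', 'Tom', 'John'],
})
df2 = pd.DataFrame({
    'Name': ['Jack', 'John', 'Sarah', 'Tom', 'Laura'],
    'Y': [0, 1, 0, 1, 0],
})

# Flatten row by row, then drop duplicates in order of first appearance.
names = pd.unique(df1[['Name1', 'Name2']].to_numpy().ravel())

df = pd.DataFrame({'Name': names})
# Names missing from df2 (here Eva) get NaN; Laura never enters the result.
df['Y'] = df['Name'].map(df2.set_index('Name')['Y'])
print(df)
```

Because Eva has no match, Y is upcast to float, as in the output above.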
You can build an array of the unique names in df1 and use isin to set Y to NaN for every name in df2 that does not appear in df1:
names = np.unique(df1[['Name1', 'Name2']].values.ravel())
df2.loc[~df2['Name'].isin(names), 'Y'] = np.nan
Name Y
0 Jack 0.0
1 John 1.0
2 Sarah 0.0
3 Tom 1.0
4 Laura NaN
I have two dataframes df1 and df2
df1
Name1 Name2
0 John Jack
1 Eva Tom
2 Eva Sara
3 Carl Sam
4 Sam Erin
df2
Name Money
0 John 40
1 Eva 20
2 Jack 10
3 Tom 80
4 Sara 34
5 Carl 77
6 Erin 12
I would like to merge the two dataframes and get:
df1
Name1 Name2 Money1 Money2
0 John Jack 40 10
1 Eva Tom 20 80
2 Eva Sara 20 34
3 Carl Sam 77 NaN
4 Sam Erin NaN 12
This is what I am doing, but I don't think it is the best solution:
df1 = pd.merge(df1, df2, left_on='Name1', right_on='Name', how='left').drop(columns='Name')
df1.columns = ['Name1', 'Name2', 'Money1']
df1 = pd.merge(df1, df2, left_on='Name2', right_on='Name', how='left').drop(columns='Name')
df1.columns = ['Name1', 'Name2', 'Money1', 'Money2']
Using map with apply
df1[['Money1','Money2']]=df1.apply(lambda x : x.map(df2.set_index('Name').Money))
df1
Out[293]:
Name1 Name2 Money1 Money2
0 John Jack 40.0 10.0
1 Eva Tom 20.0 80.0
2 Eva Sara 20.0 34.0
3 Carl Sam 77.0 NaN
4 Sam Erin NaN 12.0
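The same idea can also be written without apply by mapping each name column separately, which leaves any other columns in df1 untouched. A minimal sketch on the question's sample data; the variable name lookup is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name1': ['John', 'Eva', 'Eva', 'Carl', 'Sam'],
    'Name2': ['Jack', 'Tom', 'Sara', 'Sam', 'Erin'],
})
df2 = pd.DataFrame({
    'Name': ['John', 'Eva', 'Jack', 'Tom', 'Sara', 'Carl', 'Erin'],
    'Money': [40, 20, 10, 80, 34, 77, 12],
})

# Series indexed by name, used as a lookup table.
lookup = df2.set_index('Name')['Money']

# Names absent from df2 (here Sam) map to NaN.
df1['Money1'] = df1['Name1'].map(lookup)
df1['Money2'] = df1['Name2'].map(lookup)
print(df1)
```

This relies on the names in df2 being unique; with duplicate names the lookup would be ambiguous and map would raise an error.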
You can use index matching without needing apply (df below is the question's df1):
assign
df = df.set_index('Name1').assign(Money_1=df2.set_index('Name').Money).reset_index().set_index('Name2').assign(Money_2=df2.set_index('Name').Money).reset_index()
This is technically a one-liner, but a long one. The other option is to write the steps out explicitly:
loc
df = df.set_index('Name1')
df.loc[:, 'Money_1'] = df2.set_index('Name').Money
df = df.reset_index().set_index('Name2')
df.loc[:, 'Money_2'] = df2.set_index('Name').Money
df.reset_index()
Both outputs
Name1 Name2 Money_1 Money_2
0 John Jack 40.0 10.0
1 Eva Tom 20.0 80.0
2 Eva Sara 20.0 34.0
3 Carl Sam 77.0 NaN
4 Sam Erin NaN 12.0
I have a dataframe df like the following:
df
name city
0 John New York
1 Carl New York
2 Carl Paris
3 Eva Paris
4 Eva Paris
5 Carl Paris
I want to know the total number of people in the different cities
df2
city number
0 New York 2
1 Paris 3
or the number of people with the same name in the cities
df2
name city number
0 John New York 1
1 Eva Paris 2
2 Carl Paris 2
3 Eva New York 0
I believe you need GroupBy.size:
df1 = df.groupby(['city']).size().reset_index(name='number')
print (df1)
city number
0 New York 2
1 Paris 4
df2 = df.groupby(['name','city']).size().reset_index(name='number')
print (df2)
name city number
0 Carl New York 1
1 Carl Paris 2
2 Eva Paris 2
3 John New York 1
If you need all combinations, one solution is to add unstack and stack:
df3 = df.groupby(['name','city']).size().unstack(fill_value=0).stack().reset_index(name='number')
print (df3)
name city number
0 Carl New York 1
1 Carl Paris 2
2 Eva New York 0
3 Eva Paris 2
4 John New York 1
5 John Paris 0
Or reindex with MultiIndex.from_product:
df2 = df.groupby(['name','city']).size()
mux = pd.MultiIndex.from_product(df2.index.levels, names=df2.index.names)
df2 = df2.reindex(mux, fill_value=0).reset_index(name='number')
print (df2)
name city number
0 Carl New York 1
1 Carl Paris 2
2 Eva New York 0
3 Eva Paris 2
4 John New York 1
5 John Paris 0
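If a wide table is acceptable instead of the long format above, pd.crosstab is another way to get every name/city combination with zeros filled in. This is a different technique from the two shown above, sketched here on the question's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Carl', 'Carl', 'Eva', 'Eva', 'Carl'],
    'city': ['New York', 'New York', 'Paris', 'Paris', 'Paris', 'Paris'],
})

# One row per name, one column per city; combinations that never occur are 0.
counts = pd.crosstab(df['name'], df['city'])
print(counts)
```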
To count the number of people in each city:
groups = df.groupby('city').count().reset_index()
To count the number of people with the same name in each city, group by both columns (size is needed here because no column is left for count to work on):
groups = df.groupby(['name', 'city']).size().reset_index(name='number')
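Both counts can be checked on the question's sample data. A minimal sketch using size(), which counts rows per group; the variable names per_city and per_name_city are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Carl', 'Carl', 'Eva', 'Eva', 'Carl'],
    'city': ['New York', 'New York', 'Paris', 'Paris', 'Paris', 'Paris'],
})

# Rows per city.
per_city = df.groupby('city').size().reset_index(name='number')

# Rows per (name, city) pair; pairs that never occur are not listed.
per_name_city = df.groupby(['name', 'city']).size().reset_index(name='number')

print(per_city)
print(per_name_city)
```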
I have a dataframe containing about 300 000 rows with a structure like this:
name Jack
gender M
year 1993
country USA
city Odessa
name John
gender M
year 1992
name Sam
country Canada
city Toronto
Is it possible to make the dataframe look like this using pandas?
name gender year country city
Jack M 1993 USA Odessa
John M 1992
Sam Canada Toronto
A row with "name" is always present, but the others may be absent. I tried to use iterrows, with no success.
In [17]:
g = np.cumsum(df.iloc[: , 0] == 'name')
In [15]:
df.groupby(g).apply(lambda x : pd.DataFrame(x.set_index([0]).T , columns=['name' , 'gender' , 'year' , 'country' , 'city']) )
Out[15]:
name gender year country city
0
1 1 Jack M 1993 USA Odessa
2 1 John M 1992 NaN NaN
3 1 Sam NaN NaN Canada Toronto
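The same cumulative-sum grouping can be combined with pivot, which avoids the extra index level in the output above. This is a sketch under assumptions: the column names key and value are hypothetical (the answer above uses positional columns 0 and 1), and no key may repeat within one record, otherwise pivot raises an error.

```python
import pandas as pd

# Assumed layout: one key/value pair per row.
# The column names 'key' and 'value' are hypothetical.
df = pd.DataFrame({
    'key': ['name', 'gender', 'year', 'country', 'city',
            'name', 'gender', 'year',
            'name', 'country', 'city'],
    'value': ['Jack', 'M', '1993', 'USA', 'Odessa',
              'John', 'M', '1992',
              'Sam', 'Canada', 'Toronto'],
})

# Every 'name' row starts a new record, so a running count of those rows
# labels each row with the record it belongs to.
record = (df['key'] == 'name').cumsum()

wide = (
    df.assign(record=record)
      .pivot(index='record', columns='key', values='value')
      .reindex(columns=['name', 'gender', 'year', 'country', 'city'])
)
print(wide)
```

Fields that are absent for a record come out as NaN, and the final reindex only fixes the column order.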