I have a dataframe df like the following:
df name city
0 John New York
1 Carl New York
2 Carl Paris
3 Eva Paris
4 Eva Paris
5 Carl Paris
I want to know the total number of people in each city:
df2 city number
0 New York 2
1 Paris 3
or the number of people with the same name in each city:
df2 name city number
0 John New York 1
1 Eva Paris 2
2 Carl Paris 2
3 Eva New York 0
I believe you need GroupBy.size:
df1 = df.groupby(['city']).size().reset_index(name='number')
print (df1)
city number
0 New York 2
1 Paris 4
df2 = df.groupby(['name','city']).size().reset_index(name='number')
print (df2)
name city number
0 Carl New York 1
1 Carl Paris 2
2 Eva Paris 2
3 John New York 1
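If you are on pandas 1.1 or later, value_counts is another way to get the same numbers. This is a swapped-in alternative to GroupBy.size, not part of the answer above. The minimal sketch below rebuilds the sample frame from the question. Note that the results are sorted by count rather than by key:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['John', 'Carl', 'Carl', 'Eva', 'Eva', 'Carl'],
    'city': ['New York', 'New York', 'Paris', 'Paris', 'Paris', 'Paris'],
})

# Rows per city, most frequent first
per_city = df['city'].value_counts()

# Rows per (name, city) pair; DataFrame.value_counts requires pandas >= 1.1
per_pair = df.value_counts(['name', 'city'])

# Same shape as df1 above: a 'city' column and a 'number' column
df1 = per_city.rename_axis('city').reset_index(name='number')
print(df1)
```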
If you need all combinations, one solution is to add unstack and stack:
df3 = df.groupby(['name','city']).size().unstack(fill_value=0).stack().reset_index(name='number')
print (df3)
name city number
0 Carl New York 1
1 Carl Paris 2
2 Eva New York 0
3 Eva Paris 2
4 John New York 1
5 John Paris 0
Or use reindex with MultiIndex.from_product:
df2 = df.groupby(['name','city']).size()
mux = pd.MultiIndex.from_product(df2.index.levels, names=df2.index.names)
df2 = df2.reindex(mux, fill_value=0).reset_index(name='number')
print (df2)
name city number
0 Carl New York 1
1 Carl Paris 2
2 Eva New York 0
3 Eva Paris 2
4 John New York 1
5 John Paris 0
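pd.crosstab is a third option. Like the value_counts sketch, this is a swapped-in technique, not one from the answers above. It builds a name-by-city table that has a 0 for every missing pair, and stack then turns that table back into long form. The sketch below runs on the question's sample data, and the result should match df3 above:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Carl', 'Carl', 'Eva', 'Eva', 'Carl'],
    'city': ['New York', 'New York', 'Paris', 'Paris', 'Paris', 'Paris'],
})

# Wide table: one row per name, one column per city, 0 where a pair never occurs
table = pd.crosstab(df['name'], df['city'])

# Back to long form: one row per (name, city) combination
df3 = table.stack().reset_index(name='number')
print(df3)
```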
To count the number of people with different names in the same city:
groups = df.groupby('city').count().reset_index()
To count the number of people with the same name in different cities:
groups = df.groupby('name').count().reset_index()
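Keep in mind that count() counts non-null values, which means it counts rows. Two rows for the same person are therefore counted twice. If "different names" should mean distinct names, nunique is the relevant aggregation. The small sketch below uses the question's sample data to contrast the two. The variable names are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Carl', 'Carl', 'Eva', 'Eva', 'Carl'],
    'city': ['New York', 'New York', 'Paris', 'Paris', 'Paris', 'Paris'],
})

# Rows per city (repeated names are counted again)
rows_per_city = df.groupby('city')['name'].count()

# Distinct names per city
names_per_city = df.groupby('city')['name'].nunique()

# Rows per name, across all cities
rows_per_name = df.groupby('name')['city'].count()
```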
I want to add a suffix to the first N columns only, but I can't figure out how.
This is how to add a suffix to all columns:
import pandas as pd
df = pd.DataFrame( {"name" : ["John","Alex","Kate","Martin"], "surname" : ["Smith","Morgan","King","Cole"],
"job": ["Engineer","Dentist","Coach","Teacher"],"Age":[25,20,25,30],
"Id": [1,2,3,4]})
df.add_suffix("_x")
And this is the result:
name_x surname_x job_x Age_x Id_x
0 John Smith Engineer 25 1
1 Alex Morgan Dentist 20 2
2 Kate King Coach 25 3
3 Martin Cole Teacher 30 4
But I want to add the suffix only to the first N columns, say the first 3. The desired output is:
name_x surname_x job_x Age Id
0 John Smith Engineer 25 1
1 Alex Morgan Dentist 20 2
2 Kate King Coach 25 3
3 Martin Cole Teacher 30 4
Work with the indices and take slices to modify a subset of them:
df.columns = (df.columns[:3]+'_x').union(df.columns[3:], sort=False)
print(df)
name_x surname_x job_x Age Id
0 John Smith Engineer 25 1
1 Alex Morgan Dentist 20 2
2 Kate King Coach 25 3
3 Martin Cole Teacher 30 4
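Another option reuses add_suffix itself. You apply it to a column slice and then glue the pieces back together with pd.concat. The minimal sketch below assumes that the column labels are unique:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Alex", "Kate", "Martin"],
                   "surname": ["Smith", "Morgan", "King", "Cole"],
                   "job": ["Engineer", "Dentist", "Coach", "Teacher"],
                   "Age": [25, 20, 25, 30],
                   "Id": [1, 2, 3, 4]})

n = 3
# Suffix only the first n columns, leave the rest untouched, then rejoin side by side
out = pd.concat([df.iloc[:, :n].add_suffix('_x'), df.iloc[:, n:]], axis=1)
print(out)
```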
This should work:
N = 3
cols = list(df.columns[:N])
new_cols = [c + '_x' for c in cols]
df = df.rename(columns=dict(zip(cols, new_cols)))
Set the column labels using a list comprehension:
n = 3
df.columns = [f'{c}_x' if i < n else c for i, c in enumerate(df.columns)]
This results in:
name_x surname_x job_x Age Id
0 John Smith Engineer 25 1
1 Alex Morgan Dentist 20 2
2 Kate King Coach 25 3
3 Martin Cole Teacher 30 4
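If you need this in several places, you can wrap the list comprehension in a small function. add_suffix_first_n is a hypothetical helper name, not a pandas method. It uses set_axis, so it returns a new frame rather than modifying df in place:

```python
import pandas as pd

def add_suffix_first_n(frame: pd.DataFrame, suffix: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: return a new frame with suffix appended to its first n column labels."""
    labels = [f'{c}{suffix}' if i < n else c for i, c in enumerate(frame.columns)]
    return frame.set_axis(labels, axis=1)

df = pd.DataFrame({"name": ["John", "Alex"], "surname": ["Smith", "Morgan"],
                   "job": ["Engineer", "Dentist"], "Age": [25, 20], "Id": [1, 2]})
renamed = add_suffix_first_n(df, '_x', 3)
print(renamed)
```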
I have the following data frame:
Name Age City Gender Country
0 Jane 23 NaN F London
1 Melissa 45 NaN F France
2 John 35 NaN M Toronto
I want to swap values between columns based on a condition:
if Country is equal to Toronto or London, move the value into City and leave Country empty.
I would like to have this output:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
How can I do this?
I would use .loc to select the rows where Country is London or Toronto and copy those values into the City column. Then I would use another .loc statement to replace London and Toronto with NaN in the Country column:
import numpy as np
mask = df['Country'].isin(['London', 'Toronto'])
df.loc[mask, 'City'] = df['Country']
df.loc[mask, 'Country'] = np.nan
Output:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
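City is empty in exactly the rows being changed, so you can also swap the two columns in a single .loc assignment. The sketch below runs on the sample data. You need .to_numpy() so that pandas does not realign the right-hand side by column name. City is created with object dtype here so it can hold the strings; that dtype is an assumption about the real frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John'],
    'Age': [23, 45, 35],
    'City': pd.Series([np.nan, np.nan, np.nan], dtype=object),  # assumed object dtype
    'Gender': ['F', 'F', 'M'],
    'Country': ['London', 'France', 'Toronto'],
})

mask = df['Country'].isin(['London', 'Toronto'])
# Swap City and Country in the matching rows; to_numpy() skips label alignment
df.loc[mask, ['City', 'Country']] = df.loc[mask, ['Country', 'City']].to_numpy()
print(df)
```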
You could use np.where:
import numpy as np
cities = ['London', 'Toronto']
df['City'] = np.where(
df['Country'].isin(cities),
df['Country'],
df['City']
)
df['Country'] = np.where(
df['Country'].isin(cities),
np.nan,
df['Country']
)
Results:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
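Or use Series.mask. Assign the result back instead of calling mask(..., inplace=True) on df['City']. That call is chained assignment, and it does not update df under pandas' Copy-on-Write mode:
import numpy as np
cond = df['Country'].isin(['London', 'Toronto'])
df['City'] = df['City'].mask(cond, df['Country'])
df['Country'] = df['Country'].mask(cond, np.nan)
Output: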
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
I have a dataframe df:
Name  Place   Price
Bob   NY      15
Jack  London  27
John  Paris   5
Bill  Sydney  3
Bob   NY      39
Jack  London  9
Bob   NY      2
Dave  NY      7
I need to assign an incremental value (from 1 to N) to each row, counting separately for every combination of Name and Place (Price can be different).
df_out:
Name  Place   Price  Value
Bob   NY      15     1
Jack  London  27     1
John  Paris   5      1
Bill  Sydney  3      1
Bob   NY      39     2
Jack  London  9      2
Bob   NY      2      3
Dave  NY      7      1
I could do this by sorting the dataframe (on Name and Place) and then iteratively checking if they match between two consecutive rows. Is there a smarter/faster pandas way to do this?
You can use a cumulative count grouped on Name and Place, and add 1 because the count starts from 0:
df['Value'] = df.groupby(['Name','Place']).cumcount().add(1)
prints:
Name Place Price Value
0 Bob NY 15 1
1 Jack London 27 1
2 John Paris 5 1
3 Bill Sydney 3 1
4 Bob NY 39 2
5 Jack London 9 2
6 Bob NY 2 3
7 Dave NY 7 1
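To see what cumcount computes, here is a plain-Python loop that gives the same numbers. It is for illustration only; the groupby version is the one to use. The loop keeps a running count per (Name, Place) key, which is why no sorting is needed:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Name': ['Bob', 'Jack', 'John', 'Bill', 'Bob', 'Jack', 'Bob', 'Dave'],
    'Place': ['NY', 'London', 'Paris', 'Sydney', 'NY', 'London', 'NY', 'NY'],
    'Price': [15, 27, 5, 3, 39, 9, 2, 7],
})

# Vectorised: 0-based position of each row within its (Name, Place) group, plus 1
df['Value'] = df.groupby(['Name', 'Place']).cumcount().add(1)

# Plain-Python equivalent: increment a counter per key in row order
seen = defaultdict(int)
manual = []
for key in zip(df['Name'], df['Place']):
    seen[key] += 1
    manual.append(seen[key])
```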
I'm working with pandas in Python. I'm trying to return the name of a column when a cell of a row matches with another cell in the same row. This is an example:
City 1 City 2 City 3 City 4 1 2 3 4 5
0 New York London Paris Tokyo London Tokyo New York Paris X
1 Paris New York London Tokyo Tokyo London New York Paris X
I'm looking for the next match of the value in column City 1 (New York). In the first row, this match is in the column named '3'. So in column 5, where the X is, I would like '3' to appear, because 'New York' is in column '3'. The same goes for the next row: I would look for the match with City 1 (Paris), and column 5 should then show '4', because Paris is in that column.
I hope I've explained this properly. Do you have any suggestions on how to do it? Thanks in advance!
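This assumes you are looking for the next occurrence of the City 1 value. Note that index[-1] returns the last matching column label, which is the next occurrence only when the value appears once outside City 1:
df[5] = df.apply(lambda x: x[x == x['City 1']].index[-1], axis=1)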
City 1 City 2 City 3 City 4 1 2 3 4 5
0 New York London Paris Tokyo London Tokyo New York Paris 3
1 Paris New York London Tokyo Tokyo London New York Paris 4
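If the value can appear more than once in the numbered columns, index[-1] returns the last match rather than the next one. The sketch below searches only the numbered columns and returns the first hit. It assumes those columns are labelled with the strings '1' to '4' and that the result goes into column '5'; both are assumptions about the question's frame:

```python
import pandas as pd

df = pd.DataFrame(
    [['New York', 'London', 'Paris', 'Tokyo', 'London', 'Tokyo', 'New York', 'Paris', 'X'],
     ['Paris', 'New York', 'London', 'Tokyo', 'Tokyo', 'London', 'New York', 'Paris', 'X']],
    columns=['City 1', 'City 2', 'City 3', 'City 4', '1', '2', '3', '4', '5'],
)

search_cols = ['1', '2', '3', '4']  # assumed: the numbered columns to search

def first_match(row):
    candidates = row[search_cols]
    # Labels of the numbered columns holding the same value as 'City 1'
    hits = candidates.index[(candidates == row['City 1']).to_numpy()]
    return hits[0] if len(hits) else None

df['5'] = df.apply(first_match, axis=1)
print(df)
```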
This question already has answers here:
Concatenate a list of pandas dataframes together
I have a dataframe df1:
id Name City type
1 Anna Paris AB
2 Marc Rome D
3 erika madrid AC
and a dataframe df2
id Name City type
1 Anna Paris B
and a dataframe df3
id Name City type
1 Anna Paris C
I want to append df2 and df3 to df1. This is my expected output:
id Name City type
1 Anna Paris AB
2 Marc Rome D
3 erika madrid AC
1 Anna Paris B
1 Anna Paris C
I tried:
df1 = df1.append(df2)
df1 = df1.append(df3)
but the resulting dataframe keeps only the last of the rows with the same id and drops the others:
id Name City type
2 Marc Rome D
3 erika madrid AC
1 Anna Paris C
I also tried concat:
df1= pd.concat([df1,df2,df3], join='inner')
You don't need the join='inner' parameter in pd.concat(). join only decides which columns are kept (inner keeps their intersection), and the default keeps every row of every frame. I expect this to work:
output = pd.concat([df1,df2,df3])
Using this example code:
import pandas as pd
df1 = pd.DataFrame({'Name':['Anna','Marc','erika'],
'City':['Paris','Rome','madrid'],
'Type':['AB','D','AC']})
df2 = pd.DataFrame({'Name':['Anna'],
'City':['Paris'],
'Type':['B']})
df3 = pd.DataFrame({'Name':['Anna'],
'City':['Paris'],
'Type':['C']})
pd.concat([df1,df2,df3])
It outputs:
Name City Type
0 Anna Paris AB
1 Marc Rome D
2 erika madrid AC
0 Anna Paris B
0 Anna Paris C
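For reference, DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is the way to do this now. The sketch below uses the question's columns, including id. It shows three things: all five rows survive, ignore_index=True renumbers the index, and join='inner' makes no difference when every frame has the same columns:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'Name': ['Anna', 'Marc', 'erika'],
                    'City': ['Paris', 'Rome', 'madrid'], 'type': ['AB', 'D', 'AC']})
df2 = pd.DataFrame({'id': [1], 'Name': ['Anna'], 'City': ['Paris'], 'type': ['B']})
df3 = pd.DataFrame({'id': [1], 'Name': ['Anna'], 'City': ['Paris'], 'type': ['C']})

# Stack the frames row-wise and renumber the index 0..n-1
out = pd.concat([df1, df2, df3], ignore_index=True)

# With identical columns, join='inner' keeps the same rows (it only intersects columns)
inner = pd.concat([df1, df2, df3], join='inner', ignore_index=True)
print(out)
```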