I am trying to add a pandas Series as a new row to a pandas DataFrame. However, the Series always gets added with each of its index labels as a separate row.
How can we append it as a single row?
import pandas as pd
df = pd.DataFrame([
('Tom', 'male', 10),
('Jane', 'female', 7),
('Peter', 'male', 9),
], columns=['name', 'gender', 'age'])
df.set_index(['name'], inplace=True)
print(df)
gender age
name
Tom male 10
Jane female 7
Peter male 9
s = pd.Series(('Jon', 'male', 12), index=['name', 'gender', 'age'])
print(s)
name Jon
gender male
age 12
dtype: object
Expected Result
gender age
name
Tom male 10
Jane female 7
Peter male 9
Jon male 12
Attempt #1
df2 = df.append(pd.DataFrame(s))
print(df2)
0 age gender
Tom NaN 10.0 male
Jane NaN 7.0 female
Peter NaN 9.0 male
name Jon NaN NaN
gender male NaN NaN
age 12 NaN NaN
Attempt #2
df2 = pd.concat([df, s], axis=0)
print(df2)
0 age gender
Tom NaN 10.0 male
Jane NaN 7.0 female
Peter NaN 9.0 male
name Jon NaN NaN
gender male NaN NaN
age 12 NaN NaN
Attempt #3
df2 = pd.concat([df, pd.DataFrame(s)], axis=0)
print(df2)
0 age gender
Tom NaN 10.0 male
Jane NaN 7.0 female
Peter NaN 9.0 male
name Jon NaN NaN
gender male NaN NaN
age 12 NaN NaN
This "works", but you may want to reconsider how you are building your DataFrames in the first place. If you need to add many rows, collect them and concatenate once instead of appending row by row, since every concatenation copies the existing data.
>>> pd.concat([df, s.to_frame().T.set_index('name')])
gender age
name
Tom male 10
Jane female 7
Peter male 9
Jon male 12
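One way to follow the "concatenate once" advice is to collect each record as a Series in a list and let pd.DataFrame build the rows: a list of Series becomes one row per Series, with each Series index used as the column labels. This is a minimal sketch; the second record (Ann) is hypothetical and only there to show several rows going in with a single concat. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so Attempt #1 no longer runs on current versions.

```python
import pandas as pd

df = pd.DataFrame(
    [("Tom", "male", 10), ("Jane", "female", 7), ("Peter", "male", 9)],
    columns=["name", "gender", "age"],
).set_index("name")

# Each Series holds one record; its index supplies the column labels.
new_rows = [
    pd.Series(("Jon", "male", 12), index=["name", "gender", "age"]),
    pd.Series(("Ann", "female", 8), index=["name", "gender", "age"]),  # hypothetical extra record
]

# pd.DataFrame(list_of_series) treats every Series as a row,
# so all new records are added with one concat call.
result = pd.concat([df, pd.DataFrame(new_rows).set_index("name")])
print(result)
```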
A Series usually represents a column of a DataFrame and holds values of a single data type (e.g. age). Here, your Series represents a single record, like a row in a database, with potentially mixed types. You may want to build it as a one-row DataFrame instead:
row = pd.DataFrame({'gender': 'male', 'age': 12},
index=pd.Index(['Jon'], name='name'))
>>> pd.concat([df, row])
gender age
name
Tom male 10
Jane female 7
Peter male 9
Jon male 12
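For a single row there is also setting with enlargement: assigning to a new label with .loc appends a row, and a Series value is aligned on its index so each value lands in the matching column. A minimal sketch reusing the question's df and s; it modifies df in place and tends to be slow if used in a loop, so the one-concat approach is still preferable for many rows.

```python
import pandas as pd

df = pd.DataFrame(
    [("Tom", "male", 10), ("Jane", "female", 7), ("Peter", "male", 9)],
    columns=["name", "gender", "age"],
).set_index("name")
s = pd.Series(("Jon", "male", 12), index=["name", "gender", "age"])

# 'name' becomes the row label; the remaining entries are aligned
# to the 'gender' and 'age' columns by their index labels.
df.loc[s["name"]] = s.drop("name")
print(df)
```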
How to set all columns in the last row to NaN in a pandas DataFrame
I have a DataFrame df which contains the following:
Name Age Company Occupation
Gerald 30 Greenways Doctor
Tippi 25 Pathsect Engineer
Herbi 26 Neways Engineer
The end result should contain the following:
Name Age Company Occupation
Gerald 30 Greenways Doctor
Tippi 25 Pathsect Engineer
NaN NaN NaN NaN
I'm new to pandas.
Select the last row by position with iloc and set it to NaN:
import numpy as np
df.iloc[-1] = np.nan
print(df)
Name Age Company Occupation
0 Gerald 30.0 Greenways Doctor
1 Tippi 25.0 Pathsect Engineer
2 NaN NaN NaN NaN
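The output above shows Age as 30.0 and 25.0 rather than 30 and 25. A self-contained version of the answer, with the sample data rebuilt by hand, makes the reason visible: NaN is a float, so the integer Age column is upcast to float64 to hold it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Gerald", "Tippi", "Herbi"],
    "Age": [30, 25, 26],
    "Company": ["Greenways", "Pathsect", "Neways"],
    "Occupation": ["Doctor", "Engineer", "Engineer"],
})

# iloc is positional, so -1 is the last row whatever its index label is.
df.iloc[-1] = np.nan

# Age now holds a float NaN, so it is shown as float64 (30.0, 25.0, NaN).
print(df.dtypes)
print(df)
```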
I have the following DataFrame:
Name Age City Gender Country
0 Jane 23 NaN F London
1 Melissa 45 NaN F France
2 John 35 NaN M Toronto
I want to move values between columns based on a condition:
if Country is Toronto or London, move that value into City and set Country to NaN.
I would like to have this output:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
How can I do this?
I would use .loc to select the rows where Country is London or Toronto, copy those values into the City column, and then use a second .loc statement to replace London and Toronto with NaN in the Country column:
import numpy as np
df.loc[df['Country'].isin(['London', 'Toronto']), 'City'] = df['Country']
df.loc[df['Country'].isin(['London', 'Toronto']), 'Country'] = np.nan
output:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
You could use np.where:
import numpy as np
cities = ['London', 'Toronto']
df['City'] = np.where(
df['Country'].isin(cities),
df['Country'],
df['City']
)
df['Country'] = np.where(
df['Country'].isin(cities),
np.nan,
df['Country']
)
Results:
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
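Or use Series.mask, assigning the result back rather than calling it with inplace=True on a selected column (that is chained assignment and may not modify df in newer pandas versions):
cond = df['Country'].isin(['London', 'Toronto'])
df['City'] = df['City'].mask(cond, df['Country'])
df['Country'] = df['Country'].mask(cond, np.nan)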
Name Age City Gender Country
0 Jane 23 London F NaN
1 Melissa 45 NaN F France
2 John 35 Toronto M NaN
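Since the question talks about switching values, another option is to swap the two columns on the matching rows in one .loc assignment. A minimal sketch with the sample data rebuilt by hand; casting City to object first is an assumption about how the real column is typed. Unlike the answers above, this is a true swap: if a matching row already had a City, that value would move into Country instead of becoming NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jane", "Melissa", "John"],
    "Age": [23, 45, 35],
    "City": [np.nan, np.nan, np.nan],
    "Gender": ["F", "F", "M"],
    "Country": ["London", "France", "Toronto"],
})
# An all-NaN column is float64; make it object so it can hold city names.
df["City"] = df["City"].astype(object)

cities = ["London", "Toronto"]
mask = df["Country"].isin(cities)

# .to_numpy() drops the column labels; without it pandas would align
# "Country" back onto "Country" and the assignment would change nothing.
df.loc[mask, ["City", "Country"]] = df.loc[mask, ["Country", "City"]].to_numpy()
print(df)
```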
I have a CSV file with some messy data.
I have the following DataFrame in pandas:
Name Age Sex Salary Status
John 32 Nan NaN NaN
Nan Male 4000 Single NaN
May 20 Female 5000 Married
teresa 45
Desired output:
Name Age Sex Salary Status
0 John 32 Male 4000 Single
1 May 20 Female 5000 Married
2 teresa 45
Does anyone know how to do this with pandas?
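You can use a bit of NumPy magic to drop the NaNs and reshape the underlying array. This assumes the number of remaining values is an exact multiple of the number of columns; otherwise reshape raises a ValueError: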
a = df.replace({'Nan': float('nan')}).values.flatten()
pd.DataFrame(a[~pd.isna(a)].reshape(-1, len(df.columns)),
columns=df.columns)
output:
Name Age Sex Salary Status
0 John 32 Male 4000 Single
1 May 20 Female 5000 Married
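With the question's full sample, the incomplete teresa record leaves 12 values for 5 columns, so the reshape above would raise. A minimal sketch that pads the flattened values with NaN up to a multiple of the column count, assuming the blank cells in the question are missing values and Nan is a literal string:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["John", 32, "Nan", np.nan, np.nan],
     ["Nan", "Male", 4000, "Single", np.nan],
     ["May", 20, "Female", 5000, "Married"],
     ["teresa", 45, np.nan, np.nan, np.nan]],
    columns=["Name", "Age", "Sex", "Salary", "Status"],
)

# Turn the literal 'Nan' strings into real missing values, then flatten row by row.
values = df.replace({"Nan": np.nan}).to_numpy().ravel()
values = values[~pd.isna(values)]

# Pad with NaN so the length is a multiple of the column count;
# this lets the incomplete final record survive the reshape.
ncols = len(df.columns)
pad = (-len(values)) % ncols
values = np.concatenate([values, np.full(pad, np.nan, dtype=object)])

out = pd.DataFrame(values.reshape(-1, ncols), columns=df.columns)
print(out)
```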
Try groupby:
>>> df.groupby(df['Name'].notna().cumsum()).apply(lambda g: g.apply(lambda col: next(iter(col.dropna()), np.nan))).reset_index(drop=True)
Name Age Sex Salary Status
0 John 32 4000 Single NaN
1 May 20 Female 5000 Married
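The nested lambda picks the first non-null value of each column within each group, which is what GroupBy.first() already does. A minimal sketch using it, with the question's data rebuilt by hand (assuming the blanks are missing values). Values that sit in the wrong column in the raw data are not realigned: John's record gets 4000 under Sex and loses Male, matching the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["John", 32, "Nan", np.nan, np.nan],
     ["Nan", "Male", 4000, "Single", np.nan],
     ["May", 20, "Female", 5000, "Married"],
     ["teresa", 45, np.nan, np.nan, np.nan]],
    columns=["Name", "Age", "Sex", "Salary", "Status"],
)
# Treat the literal 'Nan' strings as missing so notna() sees them as gaps.
df = df.replace({"Nan": np.nan})

# Every non-null Name starts a new record.
record = df["Name"].notna().cumsum().rename("record")

# first() returns the first non-null value of each column per group.
out = df.groupby(record).first().reset_index(drop=True)
print(out)
```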
I have two DataFrames, df1 and df2.
df1 contains information about interactions between people.
df1
Name1 Name2
0 Jack John
1 Sarah Jack
2 Sarah Eva
3 Eva Tom
4 Eva John
df2 contains the status of people in general, including some of the people in df1.
df2
Name Y
0 Jack 0
1 John 1
2 Sarah 0
3 Tom 1
4 Laura 0
I would like df2 to contain only the people who appear in df1 (so Laura disappears), with NaN for people who are in df1 but not in df2 (e.g. Eva):
df2
Name Y
0 Jack 0
1 John 1
2 Sarah 0
3 Tom 1
4 Eva NaN
Create a DataFrame from the unique names in df1 and map the Y values from df2 onto it:
import numpy as np
df = pd.DataFrame(np.unique(df1.values), columns=['Name'])
df['Y'] = df.Name.map(df2.set_index('Name')['Y'])
print(df)
Name Y
0 Eva NaN
1 Jack 0.0
2 John 1.0
3 Sarah 0.0
4 Tom 1.0
Note: the original order is not preserved, because np.unique returns the names sorted.
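If order matters, pd.unique keeps names in order of first appearance (unlike np.unique), and a left merge preserves the order of its left frame. A minimal sketch with the sample frames rebuilt by hand; the result follows df1's order of appearance (Eva before Tom), not df2's.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name1": ["Jack", "Sarah", "Sarah", "Eva", "Eva"],
    "Name2": ["John", "Jack", "Eva", "Tom", "John"],
})
df2 = pd.DataFrame({
    "Name": ["Jack", "John", "Sarah", "Tom", "Laura"],
    "Y": [0, 1, 0, 1, 0],
})

# Unique names in order of first appearance, reading df1 row by row.
names = pd.unique(df1[["Name1", "Name2"]].to_numpy().ravel())

# A left merge keeps every name from df1 (Eva gets NaN)
# and drops names that only appear in df2 (Laura).
out = pd.DataFrame({"Name": names}).merge(df2, on="Name", how="left")
print(out)
```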
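You can collect the unique names in df1 and use isin. Note that this sets Y to NaN for people who are not in df1 (Laura) rather than dropping them, and it does not add Eva:
import numpy as np
names = np.unique(df1[['Name1', 'Name2']].values.ravel())
df2.loc[~df2['Name'].isin(names), 'Y'] = np.nan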
Name Y
0 Jack 0.0
1 John 1.0
2 Sarah 0.0
3 Tom 1.0
4 Laura NaN
I have two DataFrames, df1 and df2:
df1
Name1 Name2
0 John Jack
1 Eva Tom
2 Eva Sara
3 Carl Sam
4 Sam Erin
df2
Name Money
0 John 40
1 Eva 20
2 Jack 10
3 Tom 80
4 Sara 34
5 Carl 77
6 Erin 12
I would like to merge the two dataframes and get:
df1
Name1 Name2 Money1 Money2
0 John Jack 40 10
1 Eva Tom 20 80
2 Eva Sara 20 34
3 Carl Sam 77 NaN
4 Sam Erin NaN 12
This is what I am doing, but I don't think it is the best solution:
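# left_on names df1's key column, right_on names df2's;
# how='left' keeps rows such as Sam whose name is missing from df2
df1 = pd.merge(df1, df2, left_on='Name1', right_on='Name', how='left').drop(columns='Name')
df1.columns = ['Name1', 'Name2', 'Money1']
df1 = pd.merge(df1, df2, left_on='Name2', right_on='Name', how='left').drop(columns='Name')
df1.columns = ['Name1', 'Name2', 'Money1', 'Money2']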
Using map with apply:
df1[['Money1', 'Money2']] = df1.apply(lambda col: col.map(df2.set_index('Name').Money))
print(df1)
Name1 Name2 Money1 Money2
0 John Jack 40.0 10.0
1 Eva Tom 20.0 80.0
2 Eva Sara 20.0 34.0
3 Carl Sam 77.0 NaN
4 Sam Erin NaN 12.0
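The apply above runs map on each name column. The same lookup can be written out per column, building the Name to Money Series only once. A minimal sketch with the sample data rebuilt by hand:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name1": ["John", "Eva", "Eva", "Carl", "Sam"],
    "Name2": ["Jack", "Tom", "Sara", "Sam", "Erin"],
})
df2 = pd.DataFrame({
    "Name": ["John", "Eva", "Jack", "Tom", "Sara", "Carl", "Erin"],
    "Money": [40, 20, 10, 80, 34, 77, 12],
})

# Build the Name -> Money lookup once.
money = df2.set_index("Name")["Money"]

# Series.map looks up each name; names missing from df2 (Sam) become NaN.
df1["Money1"] = df1["Name1"].map(money)
df1["Money2"] = df1["Name2"].map(money)
print(df1)
```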
You can use index alignment without needing apply:
assign
df1 = df1.set_index('Name1').assign(Money_1=df2.set_index('Name').Money).reset_index().set_index('Name2').assign(Money_2=df2.set_index('Name').Money).reset_index()[['Name1', 'Name2', 'Money_1', 'Money_2']]
This is a one-liner, but a long one. The other option is to write the steps out explicitly:
loc
df1 = df1.set_index('Name1')
df1.loc[:, 'Money_1'] = df2.set_index('Name').Money
df1 = df1.reset_index().set_index('Name2')
df1.loc[:, 'Money_2'] = df2.set_index('Name').Money
df1 = df1.reset_index()[['Name1', 'Name2', 'Money_1', 'Money_2']]
Both produce:
Name1 Name2 Money_1 Money_2
0 John Jack 40.0 10.0
1 Eva Tom 20.0 80.0
2 Eva Sara 20.0 34.0
3 Carl Sam 77.0 NaN
4 Sam Erin NaN 12.0
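The merge approach from the question can also be written without renaming columns afterwards: rename df2's columns to match each key before merging, and use how='left' so Sam's rows are kept with NaN. A minimal sketch with the sample data rebuilt by hand:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name1": ["John", "Eva", "Eva", "Carl", "Sam"],
    "Name2": ["Jack", "Tom", "Sara", "Sam", "Erin"],
})
df2 = pd.DataFrame({
    "Name": ["John", "Eva", "Jack", "Tom", "Sara", "Carl", "Erin"],
    "Money": [40, 20, 10, 80, 34, 77, 12],
})

# One renamed copy of df2 per key column, so merge can join on a shared name.
left = df2.rename(columns={"Name": "Name1", "Money": "Money1"})
right = df2.rename(columns={"Name": "Name2", "Money": "Money2"})

# Left merges keep df1's row order and every df1 row.
out = df1.merge(left, on="Name1", how="left").merge(right, on="Name2", how="left")
print(out)
```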