Let's say I have 3 pandas DataFrames:
DF1
Words          Score
The Man        2
The Girl       4
DF2
Words2         Score2
The Boy        6
The Mother     7
DF3
Words3         Score3
The Son        3
The Daughter   4
Right now, I have them concatenated together so that it becomes 6 columns in one DF. That's all well and good, but I was wondering: is there a pandas function to stack them vertically into TWO columns and change the headers?
So to make something like this?
Family Members   Score
The Man          2
The Girl         4
The Boy          6
The Mother       7
The Son          3
The Daughter     4
Everything I'm reading here (http://pandas.pydata.org/pandas-docs/stable/merging.html) seems to only cover "horizontal" methods of joining DataFrames!
As long as you rename the columns so that they're the same in each dataframe, pd.concat() should work fine:
# I read in your data as df1, df2 and df3 using:
# df1 = pd.read_clipboard(sep='\s\s+')
# Example dataframe:
Out[8]:
Words Score
0 The Man 2
1 The Girl 4
all_dfs = [df1, df2, df3]
# Give all df's common column names
for df in all_dfs:
    df.columns = ['Family_Members', 'Score']
pd.concat(all_dfs).reset_index(drop=True)
Out[16]:
Family_Members Score
0 The Man 2
1 The Girl 4
2 The Boy 6
3 The Mother 7
4 The Son 3
5 The Daughter 4
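If you prefer not to mutate the original dataframes, an equivalent one-liner (just a sketch, assuming the same df1, df2 and df3) renames the columns on the fly with DataFrame.set_axis:

# Rename each frame's columns without modifying the originals, then stack them
pd.concat(
    [d.set_axis(['Family_Members', 'Score'], axis=1) for d in (df1, df2, df3)],
    ignore_index=True
)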
In Python, I have a df that looks like this
Name    ID
Anna    1
Polly   1
Sarah   2
Max     3
Kate    3
Ally    3
Steve   3
And a df that looks like this
Name     ID
Dan      1
Hallie   2
Cam      2
Lacy     2
Ryan     3
Colt     4
Tia      4
How can I merge the df’s so that the ID column looks like this
Name     ID
Anna     1
Polly    1
Sarah    2
Max      3
Kate     3
Ally     3
Steve    3
Dan      4
Hallie   5
Cam      5
Lacy     5
Ryan     6
Colt     7
Tia      7
This is just a minimal reproducible example; my actual data set has thousands of values. I'm basically merging data frames and want the IDs in numerical order (continuing from the previous data frame) instead of restarting from one each time. I know that I could reset the index if ID were a unique identifier, but in this case more than one person can have the same ID. So how can I account for that?
From the example you provided, you can obtain the final dataframe by adding the maximum ID of the first df to the IDs of the second df and then concatenating them. To explain this better:

Name   df2_ID   final_ID
Dan    1        4

This value in final_df is obtained as 1 + (the max ID from df1, i.e. 3), and the same shift applies to every entry of the second dataframe.
Code:
import pandas as pd

df = pd.DataFrame({'Name':['Anna','Polly','Sarah','Max','Kate','Ally','Steve'],'ID':[1,1,2,3,3,3,3]})
df1 = pd.DataFrame({'Name':['Dan','Hallie','Cam','Lacy','Ryan','Colt','Tia'],'ID':[1,2,2,2,3,4,4]})

# Shift the second frame's IDs by the largest ID of the first frame
max_df = df['ID'].max()
df1['ID'] = df1['ID'].apply(lambda x: x + max_df)

final_df = pd.concat([df, df1])
print(final_df)
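As a side note, the same idea can be written a bit more vectorized (just a sketch, reusing df and df1 from above): add the offset to the whole column at once and let concat rebuild the index.

# Offset all IDs of the second frame in one step and build a fresh index
df1['ID'] = df1['ID'] + df['ID'].max()
final_df = pd.concat([df, df1], ignore_index=True)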
I want to copy the information (quantity) from one dataframe's column to the other dataframe's Quantity column, but do so by matching on the SKU column.
So for example the dataframes look like:
Dataframe 1:
SKU   Quantity   Title
A     3          Scissors
B     4          Cable
C     5          Goat
D     6          Cheese
Dataframe 2:
SKU   Quantity   Title
A     1          Blue Scissors
B     2          Red Cables
C     1          Fat Goat
D     2          Smelly Cheese
So I would like to get Dataframe 1's quantities and place them into Dataframe 2, but matching the SKUs (A, B, C, D etc) even though some other columns (such as Title) might have different information.
You can set SKU as the index on both dataframes so they align on it, copy the Quantity column across using that aligned index, and then reset the index to restore SKU as a regular column.
df1a = df1.set_index('SKU')
df2a = df2.set_index('SKU')
df2a['Quantity'] = df1a['Quantity']
df2 = df2a.reset_index()
Result:
print(df2)
SKU Quantity Title
0 A 3 Blue Scissors
1 B 4 Red Cables
2 C 5 Fat Goat
3 D 6 Smelly Cheese
You could try this (note that np.where needs both a value for the True case and one for the False case):

import numpy as np

df2['Quantity'] = np.where(df1['SKU'] == df2['SKU'], df1['Quantity'], df2['Quantity'])
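The np.where approach assumes both dataframes list the SKUs in the same row order. If that is not guaranteed, a lookup-based sketch (assuming SKU is unique in Dataframe 1) avoids relying on row alignment:

# Look up each SKU's quantity in df1, regardless of row order
df2['Quantity'] = df2['SKU'].map(df1.set_index('SKU')['Quantity'])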
I have 3 datasets, all with the same shape: CustomerNumber, Name, Status. A customer can appear in 1, 2 or all 3. Each dataset is a list of Gold/Silver/Bronze customers.
Example data:
Dataframe 1:
100,James,Gold
Dataframe 2:
100,James,Silver
101,Paul,Silver
Dataframe 3:
100,James,Bronze
101,Paul,Bronze
102,Fred,Bronze
Expected output/aggregated list:
100,James,Gold
101,Paul,Silver
102,Fred,Bronze
So for a customer that is captured in more than one list, I want to keep the best status (e.g. Gold if they appear in all 3).
Have been playing with join and merge and just can’t get it right.
Use concat, then convert the status column to an ordered categorical so the statuses get priorities when sorting by multiple columns, and finally remove duplicates with DataFrame.drop_duplicates:
print (df1)
print (df2)
print (df3)
a b c
0 100 James Gold
a b c
0 100 James Silver
1 101 Paul Silver
a b c
0 101 Paul Bronze
1 102 Fred Bronze
df = pd.concat([df1, df2, df3], ignore_index=True)
df['c'] = pd.Categorical(df['c'], ordered=True, categories=['Gold','Silver','Bronze'])
df = df.sort_values(['a','b','c']).drop_duplicates(['a','b'])
print (df)
a b c
0 100 James Gold
2 101 Paul Silver
4 102 Fred Bronze
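Just as an alternative sketch (same a, b, c column names as above), you could also map each status to a numeric rank and keep the best-ranked row per customer:

# Rank statuses, then pick the row with the best (lowest) rank per (a, b) pair
rank = {'Gold': 0, 'Silver': 1, 'Bronze': 2}
df = pd.concat([df1, df2, df3], ignore_index=True)
df = df.loc[df['c'].map(rank).groupby([df['a'], df['b']]).idxmin()].sort_values('a')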
I have the following kind of data frame.
Id   Name   Exam    Result   Exam        Result
1    Bob    Maths   10       Physics     9
2    Mar    ML      8        Chemistry   10
What I would like is to remove the duplicate columns and move their values into additional rows, something like below:
Id   Name   Exam        Result
1    Bob    Maths       10
1    Bob    Physics     9
2    Mar    ML          8
2    Mar    Chemistry   10
Is there any way to do this in Python?
Any help is appreciated!
First move the non-duplicated columns into the index with DataFrame.set_index. Then build a MultiIndex in the columns by numbering the duplicated names with GroupBy.cumcount, which works on a Series, hence Index.to_series. Finally reshape with DataFrame.stack and use DataFrame.reset_index to drop the helper level and turn the remaining MultiIndex back into columns:
df = df.set_index(['Id','Name'])
s = df.columns.to_series()
df.columns = [s, s.groupby(s).cumcount()]
df = df.stack().reset_index(level=2, drop=True).reset_index()
print (df)
Id Name Exam Result
0 1 Bob Maths 10
1 1 Bob Physics 9
2 2 Mar ML 8
3 2 Mar Chemistry 10
This is an alternative using pandas melt:
#flip table into long format
(df.melt(['Id','Name'])
#sort by Id so that result follows immediately after Exam
.sort_values('Id')
#create new column on rows that have result in the variable column
.assign(Result=lambda x: x.loc[x['variable']=="Result",'value'])
.bfill()
#get rid of rows that contain 'result' in variable column
.query('variable != "Result"')
.drop(['variable'],axis=1)
.rename(columns={'value':'Exam'})
)
Id Name Exam Result
0 1 Bob Maths 10
4 1 Bob Physics 9
1 2 Mar ML 8
5 2 Mar Chemistry 10
Alternatively, just for fun:
df = df.set_index(['Id','Name'])
#get boolean of duplicated columns
dupes = df.columns.duplicated()
#concatenate first columns and their duplicates
pd.concat([df.loc[:,~dupes],
df.loc[:,dupes]
]).sort_index()
I have two DataFrames: one with the columns "Name", "Year" and "Type", and the other with different parameters. There are 4 different types, and each one has its own specific parameters. Now I need to merge them together.
My approach is to use an if-statement to find out the "type". For example, in row two of df3 I have type 'a'. The parameters for type 'a' are in row 3 of df4. I tried to connect them with the following code:
s1 = df3.loc[[2]]
s2 = df4.loc[[3]]
result = pd.concat([s1, s2], axis=1)
My problem now is that the parameters end up in a separate row and are not added to row 2. Is there a way to merge them together into one row? Thanks for your answers!
If df3 has a Type column and df4 has a type column, then the two DataFrames can be merged with
pd.merge(df3, df4, left_on='Type', right_on='type')
This is by default an inner join.
In [13]: df3
Out[13]:
Name Year Type
1 A 2012 boat
2 B 2013 car
3 C 2011 truck
4 D 2013 boat
In [14]: df4
Out[14]:
type Parameter1 Parameter2 Parameter3
0 boat 2 8 7
1 car 1 9 3
2 truck 5 4 2
In [15]: pd.merge(df3, df4, left_on='Type', right_on='type')
Out[15]:
Name Year Type type Parameter1 Parameter2 Parameter3
0 A 2012 boat boat 2 8 7
1 D 2013 boat boat 2 8 7
2 B 2013 car car 1 9 3
3 C 2011 truck truck 5 4 2
Note that if the column names matched exactly, then
pd.merge(df3, df4)
would merge on column names shared in common by default.
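For example, a small sketch with the frames above: renaming df4's key column first avoids ending up with both Type and type columns in the result.

# Rename the key so both frames share the column name, then merge on it
pd.merge(df3, df4.rename(columns={'type': 'Type'}), on='Type')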