How to merge two DataFrames of different lengths in pandas - python

I have 2 DataFrames.
The first df has 3 columns and 10 rows; the 3rd column is the output column.
The second df has 3 columns and 1000 rows.
Wherever the first 2 columns of the first df match the 2 columns of the second df, the 3rd column from the first df has to be appended to the second df.
Both DataFrames are shown below.
df1
,A,B,output
1,abc,CCE,out1
2,def,CCE,out2
3,ghi,CCE,out3
4,hij,CCE,out4
5,klm,,out5
df2
,A,B
1,abc,CCE
2,def,CCE
3,lmn,CCE
4,opq,CCE
5,abc,CCE
6,klm,
df2_expected
1,abc,CCE,out1
2,def,CCE,out2
3,lmn,CCE,
4,opq,CCE,
5,abc,CCE,out1
6,klm,,out5
This example uses 3 columns. In reality the first df has n columns and df2 has n-1 columns, meaning the output column is not present in df2.

Please try this. A left merge keeps every row of df2 and fills in output wherever both A and B match a row of df1:
import pandas as pd
data1 = {'nu': [1,2,3,4,5], 'A': ['abc','def','ghi','hij','klm'], 'B': ['CCE','CCE','CCE','CCE','CCE'], 'output': ['out1','out2','out3','out4','out5']}
data2 = {'nu': [1,2,3,4,5], 'A': ['abc','def','lmn','opq','abc'], 'B': ['CCE','CCE','CCE','CCE','CCE']}
df1 = pd.DataFrame(data1, columns=['A','B','output'], index=data1['nu'])
df2 = pd.DataFrame(data2, columns=['A','B'], index=data2['nu'])
# Left join on both key columns; rows without a match get NaN, replaced here with ''
df2.merge(df1, on=['A','B'], how='left').fillna('')
A B output
0 abc CCE out1
1 def CCE out2
2 lmn CCE
3 opq CCE
4 abc CCE out1
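Since the question mentions n columns in df1 and n-1 in df2, the same left merge can be written without hard-coding the key names. This is a minimal sketch, assuming every column of df2 is a key column, that the blank B value is an empty string, and that duplicate key rows in df1 should be collapsed to the first one (otherwise the merge would add rows to df2). The names key_cols and lookup are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the blank B value is assumed to be ''.
df1 = pd.DataFrame({
    'A': ['abc', 'def', 'ghi', 'hij', 'klm'],
    'B': ['CCE', 'CCE', 'CCE', 'CCE', ''],
    'output': ['out1', 'out2', 'out3', 'out4', 'out5'],
})
df2 = pd.DataFrame({
    'A': ['abc', 'def', 'lmn', 'opq', 'abc', 'klm'],
    'B': ['CCE', 'CCE', 'CCE', 'CCE', 'CCE', ''],
})

# Assumption: every column of df2 is a key column (the n-1 shared columns).
key_cols = list(df2.columns)

# Keep one row per key combination so the left merge cannot multiply rows of df2.
lookup = df1.drop_duplicates(subset=key_cols)

# Left merge: all rows of df2 survive, output is NaN where there is no match.
result = df2.merge(lookup, on=key_cols, how='left')
print(result.fillna(''))
```

Because the merge is a left join against a de-duplicated lookup, result has exactly as many rows as df2, whatever the lengths of the two frames.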

Related

Stick the columns into one column while keeping the ids

I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df ['id'] = [1,2,3]
df ['c1'] = [1,5,1]
df ['c2'] = [-1,6,5]
df
I want to stick the values of all columns for each id together and put them in one column. For example, for id=1 I want to stick its values (1 and -1) in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my problem, since I also want to keep the ids.
Note 2: I already tried stack and reset_index, and it did not help.
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
.droplevel(1).reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
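For comparison, melt can also keep the ids if id is passed as id_vars. This is only a sketch of an alternative, not the answer's method; it assumes the default melt column name 'variable' and uses a stable sort to restore the per-id ordering shown above.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'c1': [1, 5, 1], 'c2': [-1, 6, 5]})

out = (
    df.melt(id_vars='id', value_name='c')   # one row per (id, original column)
      .sort_values('id', kind='mergesort')  # mergesort is stable: c1 stays before c2
      .drop(columns='variable')             # drop the original column names
      .reset_index(drop=True)
)
print(out)
```

With 100 columns this works unchanged, because melt treats every column other than id as a value column.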

Using the items of a df as the header of a different dataframe

I have 2 dataframes
df1=
0 2
1 _A1-Site_0_norm _A1-Site_1_norm
and df2=
0 2
2 0.500000 0.012903
3 0.010870 0.013793
4 0.011494 0.016260
I want to use df1 as a header of df2, so that df1 is either the header of the columns or the first row.
1 _A1-Site_0_norm _A1-Site_1_norm
2 0.500000 0.012903
3 0.010870 0.013793
4 0.011494 0.016260
I have many columns, so it is not practical to do
df2.columns=["_A1-Site_0_norm", "_A1-Site_1_norm"]
I thought of making a list of all the items present in df1 and then assigning that list to df2.columns, but I am having problems converting the elements in row 1 of df1 into items of a list.
I am not married to that approach; any alternative is welcome.
Many thanks
If I understood your question correctly, then this example should work for you:
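import pandas as pd
d = {'A': [1], 'B': [2], 'C': [3]}
df = pd.DataFrame(data=d)
d2 = {'1': ['D'], '2': ['E'], '3': ['F']}
df2 = pd.DataFrame(data=d2)
# Take the first row of df2 as a flat list of labels; df2.values.tolist() would
# give a nested list, which pandas turns into a MultiIndex
df.columns = df2.iloc[0].tolist()  # this is what you need to implement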
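Applied to the frames from the question, both requested layouts (df1 as column header, or df1 as first row) might look like the sketch below. The frames are rebuilt by hand from the printed output, so the integer labels 0 and 2 for the columns are an assumption, and the names renamed and with_first_row are made up for illustration.

```python
import pandas as pd

# Frames rebuilt from the question's printout (column labels 0 and 2 assumed).
df1 = pd.DataFrame([['_A1-Site_0_norm', '_A1-Site_1_norm']],
                   index=[1], columns=[0, 2])
df2 = pd.DataFrame([[0.500000, 0.012903],
                    [0.010870, 0.013793],
                    [0.011494, 0.016260]],
                   index=[2, 3, 4], columns=[0, 2])

# Option 1: use the single row of df1 as the column header of df2.
renamed = df2.copy()
renamed.columns = df1.iloc[0].tolist()

# Option 2: keep the original labels and put df1 on top as the first row.
# The columns then hold mixed strings and floats (object dtype).
with_first_row = pd.concat([df1, df2])

print(renamed)
print(with_first_row)
```

Option 1 keeps the numeric dtype of the data, which is usually preferable if the values are going to be used in calculations.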

Replace rows in Dataframe using index from another Dataframe

I have two dataframes with identical structures df and df_a. df_a is a subset of df that I need to reintegrate into df. Essentially, df_a has various rows (with varying indices) from df that have been manipulated.
Below is an example of the indices of df and df_a. Both have the same column structure, so all the columns are the same; only the rows and the index of the rows differ.
>> df
index .. other_columns ..
0
1
2
3
. .
9999
10000
10001
[10001 rows x 20 columns]
>> df_a
index .. other_columns ..
5
12
105
712
. .
9824
9901
9997
[782 rows x 20 columns]
So, I want to overwrite only the rows in df that have the indices of df_a with the corresponding rows in df_a. I checked out "Replace rows in a Pandas df with rows from another df" and "replace rows in a pandas data frame", but neither of those explains how to use the indices of another dataframe to replace the values in the rows.
Something along the lines of:
df.loc[df_a.index, :] = df_a[:]
I don't know if this is what you meant; you would need to be more specific. But if the first data frame was modified into a new data frame with different indexes, you can use this code to reset the indexes:
import pandas as pd
df_a = pd.DataFrame({'a':[1,2,3,4],'b':[5,4,2,7]}, index=[2,55,62,74])
df_a.reset_index(inplace=True, drop=True)
print(df_a)
PRINTS:
a b
0 1 5
1 2 4
2 3 2
3 4 7
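The assignment proposed in the question can be tried on a small stand-in frame. This is a minimal sketch, assuming every index label of df_a exists in df and both frames have the same columns; the toy data is invented for illustration.

```python
import pandas as pd

# Toy stand-in for the large frame.
df = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [10, 11, 12, 13, 14]})

# Stand-in for the manipulated subset: rows 1 and 3, multiplied by 100.
df_a = df.loc[[1, 3]] * 100

# Overwrite only the rows of df whose index labels appear in df_a.
# pandas aligns df_a on both index and columns during the assignment.
df.loc[df_a.index, :] = df_a

print(df)
```

Rows whose labels are not in df_a.index are left untouched, so no index reset is needed for this use case.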

pandas reset index after performing groupby and retain selective columns

I want to take a pandas dataframe, do a count of unique elements by a column and retain 2 of the columns. But I get a multi-index dataframe after groupby which I am unable to (1) flatten (2) select only relevant columns. Here is my code:
import pandas as pd
df = pd.DataFrame({
'ID':[1,2,3,4,5,1],
'Ticker':['AA','BB','CC','DD','CC','BB'],
'Amount':[10,20,30,40,50,60],
'Date_1':['1/12/2018','1/14/2018','1/12/2018','1/14/2018','2/1/2018','1/12/2018'],
'Random_data':['ax','','nan','','by','cz'],
'Count':[23,1,4,56,34,53]
})
df2 = df.groupby(['Ticker']).agg(['nunique'])
df2.reset_index()
print(df2)
df2 still comes out with two levels of index. And has all the columns: Amount, Count, Date_1, ID, Random_data.
How do I reduce it to one level of index?
And retain only ID and Random_data columns?
Try this instead:
1) Select only the relevant columns (['ID', 'Random_data'])
2) Don't pass a list to .agg, just the string 'nunique'. The list is what causes the multi-index behaviour.
df2 = df.groupby(['Ticker'])[['ID', 'Random_data']].agg('nunique')
df2.reset_index()
Ticker ID Random_data
0 AA 1 1
1 BB 2 2
2 CC 2 2
3 DD 1 1
Use DataFrameGroupBy.nunique and select the columns with a list after groupby:
df2 = df.groupby('Ticker')[['Date_1','Count','ID']].nunique().reset_index()
print(df2)
Ticker Date_1 Count ID
0 AA 1 1 1
1 BB 2 2 2
2 CC 2 2 2
3 DD 1 1 1
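The difference between passing a list and passing a string to .agg can be checked directly. This sketch reuses three columns from the question's data; the variable names multi, flat and flattened are made up for illustration, and the droplevel step is one possible way to flatten a result that already has two column levels.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 1],
    'Ticker': ['AA', 'BB', 'CC', 'DD', 'CC', 'BB'],
    'Random_data': ['ax', '', 'nan', '', 'by', 'cz'],
})

# A list of functions produces two column levels: (column, function name).
multi = df.groupby('Ticker')[['ID', 'Random_data']].agg(['nunique'])

# A single function name keeps one column level.
flat = df.groupby('Ticker')[['ID', 'Random_data']].agg('nunique').reset_index()

# An existing two-level result can be flattened by dropping the function level.
flattened = multi.droplevel(1, axis=1).reset_index()

print(multi.columns.tolist())
print(flat)
```

Note that reset_index returns a new DataFrame, so its result has to be assigned (or inplace=True used) for the change to persist.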

join two pandas dataframe using a specific column

I am new to pandas and I am trying to join two dataframes based on the equality of one specific column. For example, suppose that I have the following:
df1
A B C
1 2 3
2 2 2
df2
A B C
5 6 7
2 8 9
Both dataframes have the same columns and the value of only one column (say A) might be equal. What I want as output is this:
df3
A B C B C
2 8 9 2 2
The values for column 'A' are unique in both dataframes.
Thanks
pd.concat([df1.set_index('A'),df2.set_index('A')], axis=1, join='inner')
If you wish to maintain column A as a non-index, then:
pd.concat([df1.set_index('A'),df2.set_index('A')], axis=1, join='inner').reset_index()
Alternatively, you could just do:
df3 = df1.merge(df2, on='A', how='inner', suffixes=('_1', '_2'))
The suffixes then let you keep track of each value's origin.
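Both approaches can be run on the frames from the question. This is a small sketch; the names df3_merge and df3_concat are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [2, 2], 'C': [3, 2]})
df2 = pd.DataFrame({'A': [5, 2], 'B': [6, 8], 'C': [7, 9]})

# Inner merge on A: only A values present in both frames survive,
# and the suffixes mark which frame each B/C column came from.
df3_merge = df1.merge(df2, on='A', how='inner', suffixes=('_1', '_2'))

# concat on the A index gives the same row, but with duplicate column names.
df3_concat = pd.concat(
    [df1.set_index('A'), df2.set_index('A')], axis=1, join='inner'
).reset_index()

print(df3_merge)
print(df3_concat)
```

The merge version is usually easier to work with afterwards, because duplicate column names such as the two B columns in the concat result are awkward to select.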
