How can I merge two pandas dataframes with different lengths like these:
df1 = Index block_id Ut_rec_0
0 0 7
1 1 10
2 2 2
3 3 0
4 4 10
5 5 3
6 6 6
7 7 9
df2 = Index block_id Ut_rec_1
0 0 3
2 2 5
3 3 5
5 5 9
7 7 4
result = Index block_id Ut_rec_0 Ut_rec_1
0 0 7 3
1 1 10 NaN
2 2 2 5
3 3 0 5
4 4 10 NaN
5 5 3 9
6 6 6 NaN
7 7 9 4
I already tried something like this, but it did not work:
df_result = pd.concat([df1, df2], join_axes=[df1['block_id']])
I also tried:
df_result = pd.concat([df1, df2], axis=1)
But the result was:
Index block_id Ut_rec_0 Index block_id Ut_rec_1
0 0 7 0.0 0.0 3.0
1 1 10 1.0 2.0 5.0
2 2 2 2.0 3.0 5.0
3 3 0 3.0 5.0 9.0
4 4 10 4.0 7.0 4.0
5 5 3 NaN NaN NaN
6 6 6 NaN NaN NaN
7 7 9 NaN NaN NaN
pandas.DataFrame.join can "join" dataframes based on overlap in column data (or index). Something like this will likely work for you:
df1.join(df2.set_index('block_id'), on='block_id')
As @Wen said, the best option would be concat with axis=1, as in the code below:
pd.concat([df1, df2],axis=1)
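Note that concat with axis=1 aligns rows on the index, so it only gives the desired result when block_id is the index of both frames. A minimal sketch, assuming the frames look like those in the question with default integer indexes (the name `aligned` is just for illustration):

```python
import pandas as pd

# Frames rebuilt from the question. df2 has a default 0..4 index, which is
# why a plain concat pairs rows by position rather than by block_id.
df1 = pd.DataFrame({"block_id": [0, 1, 2, 3, 4, 5, 6, 7],
                    "Ut_rec_0": [7, 10, 2, 0, 10, 3, 6, 9]})
df2 = pd.DataFrame({"block_id": [0, 2, 3, 5, 7],
                    "Ut_rec_1": [3, 5, 5, 9, 4]})

# Move block_id into the index so concat aligns on it, then restore the column.
aligned = pd.concat(
    [df1.set_index("block_id"), df2.set_index("block_id")], axis=1
).reset_index()
print(aligned)
```

Blocks that are missing from df2 (1, 4 and 6) end up with NaN in Ut_rec_1.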
You need pd.merge with an outer join:
pd.merge(df1,df2,on=['Index','block_id'],how='outer')
#[out]
#Index block_id Ut_rec_0 Ut_rec_1
#0 0 7 3.0
#1 1 10 NaN
#2 2 2 5.0
#3 3 0 5.0
#4 4 10 NaN
#5 5 3 9.0
#6 6 6 NaN
#7 7 9 4.0
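The call above assumes 'Index' is a real column. If it is only the row index shown when printing, merging on block_id alone should be enough. A small sketch under that assumption, which also checks that the join-based suggestion gives the same values:

```python
import pandas as pd

# Frames rebuilt from the question, assuming 'Index' is just the row index.
df1 = pd.DataFrame({"block_id": [0, 1, 2, 3, 4, 5, 6, 7],
                    "Ut_rec_0": [7, 10, 2, 0, 10, 3, 6, 9]})
df2 = pd.DataFrame({"block_id": [0, 2, 3, 5, 7],
                    "Ut_rec_1": [3, 5, 5, 9, 4]})

# Outer merge keeps every block_id found in either frame.
result = pd.merge(df1, df2, on="block_id", how="outer")

# Equivalent here: left join of df1 against df2 keyed by block_id.
joined = df1.join(df2.set_index("block_id"), on="block_id")

print(result)
```

Because every block_id of df2 also occurs in df1, how='left' would return the same rows as how='outer' for this data.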
I am trying to shift data in a Pandas dataframe in the following manner, from this:
time value
1 1
2 2
3 3
4 4
5 5
1 6
2 7
3 8
4 9
5 10
To this:
time value
1
2
3 1
4 2
5 3
1
2
3 6
4 7
5 8
In short, I want to move the data 3 rows down each time a new cycle for a time block begins.
I have not been able to find a solution for this; my English is limited, and I do not know how to describe the problem without an example.
Edit:
Both solutions work. Thank you.
IIUC, you can shift per group:
df['value_shift'] = df.groupby(df['time'].eq(1).cumsum())['value'].shift(2)
output:
time value value_shift
0 1 1 NaN
1 2 2 NaN
2 3 3 1.0
3 4 4 2.0
4 5 5 3.0
5 1 6 NaN
6 2 7 NaN
7 3 8 6.0
8 4 9 7.0
9 5 10 8.0
Try with groupby:
df["value"] = df.groupby(df["time"].diff().lt(0).cumsum())["value"].shift(2)
>>> df
time value
0 1 NaN
1 2 NaN
2 3 1.0
3 4 2.0
4 5 3.0
5 1 NaN
6 2 NaN
7 3 6.0
8 4 7.0
9 5 8.0
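To see why the groupby matters, here is a small self-contained sketch (the column name `naive` and the variable `cycle` are just for illustration) comparing a plain shift with the per-cycle shift:

```python
import pandas as pd

df = pd.DataFrame({"time": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   "value": range(1, 11)})

# A new cycle starts wherever time drops relative to the previous row.
cycle = df["time"].diff().lt(0).cumsum()

# A plain shift leaks the last values of one cycle into the next one.
df["naive"] = df["value"].shift(2)

# Shifting within each cycle leaves NaN at the start of every cycle instead.
df["value_shift"] = df.groupby(cycle)["value"].shift(2)

print(df)
```

In rows 5 and 6 the `naive` column holds 4 and 5 from the previous cycle, while `value_shift` is NaN there.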
I have started learning Python, and I wanted to ask if there is a faster alternative to the nested loop below:
for y in range(total_rows2):
    for x in range(total_rows1):
        if df2.iloc[y, 0] == df1.iloc[x, 0]:
            df1.iloc[x, 1] = df1.iloc[x, 1] + df2.iloc[y, 17]
Basically, I have found the number of rows (total_rows1 and total_rows2) of two dataframes (df1 and df2). The first column of both dataframes (index=0) holds the IDs.
If the IDs of the two dataframes match, I want to add the value of column 18 of df2 (index 17, column name='Profit') to the second column of df1 (index=1, column name='Profit'). An ID may appear twice in df2, in which case the sum should appear in df1 (see ID 0108 below). So the 'Profit' column of df1 will be the sum of Profit per ID.
df2:
ID Profit
0 0104 0
1 0106 0
2 0107 0
3 0108 0
df1:
ID Loss Profit
0 0104 100 230
1 0106 200 150
2 0108 150 120
3 0107 120 230
4 0109 100 400
5 0108 150 400
So I want df2 to look as follows:
ID Profit
0 0104 230
1 0106 150
2 0107 230
3 0108 520
Thanks!
I think just merging the two dfs on that first column and then doing the addition would be fine.
frames:
>>> df1
ID B C D
0 e 3 8 1
1 d 5 1 1
2 g 6 5 1
3 e 8 8 7
4 j 9 3 6
5 i 4 0 5
6 g 0 4 1
7 a 3 7 2
8 e 0 6 9
9 b 2 9 6
>>> df2
ID col_17
0 j 9
1 c 3
2 d 6
3 g 4
4 h 4
5 g 5
6 e 1
7 d 8
8 b 0
9 i 6
Merge:
>>> df3 = df1.merge(df2,how='left',on='ID')
>>> df3
ID B C D col_17
0 e 3 8 1 1.0
1 d 5 1 1 6.0
2 d 5 1 1 8.0
3 g 6 5 1 4.0
4 g 6 5 1 5.0
5 e 8 8 7 1.0
6 j 9 3 6 9.0
7 i 4 0 5 6.0
8 g 0 4 1 4.0
9 g 0 4 1 5.0
10 a 3 7 2 NaN
11 e 0 6 9 1.0
12 b 2 9 6 0.0
Add:
>>> df3['B']=np.where(df3['col_17'].notna(),df3['B']+df3['col_17'],df3['B'])
>>> df3
ID B C D col_17
0 e 4.0 8 1 1.0
1 d 11.0 1 1 6.0
2 d 13.0 1 1 8.0
3 g 10.0 5 1 4.0
4 g 11.0 5 1 5.0
5 e 9.0 8 7 1.0
6 j 18.0 3 6 9.0
7 i 10.0 0 5 6.0
8 g 4.0 4 1 4.0
9 g 5.0 4 1 5.0
10 a 3.0 7 2 NaN
11 e 1.0 6 9 1.0
12 b 2.0 9 6 0.0
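Note that the merge above repeats a row for every matching ID rather than producing one total per ID. If one total per ID is what you want, a groupby sum followed by a lookup may be closer. A sketch using the sample data from the question; the names `totals` and `records` are illustrative stand-ins for the frame being updated and the frame being summed:

```python
import pandas as pd

# Sample data from the question. IDs are kept as strings to preserve
# their leading zeros.
totals = pd.DataFrame({"ID": ["0104", "0106", "0107", "0108"],
                       "Profit": [0, 0, 0, 0]})
records = pd.DataFrame({"ID": ["0104", "0106", "0108", "0107", "0109", "0108"],
                        "Loss": [100, 200, 150, 120, 100, 150],
                        "Profit": [230, 150, 120, 230, 400, 400]})

# Sum Profit once per ID, which collapses repeated IDs such as 0108.
profit_per_id = records.groupby("ID")["Profit"].sum()

# Look up each ID's total; IDs with no records contribute 0.
looked_up = totals["ID"].map(profit_per_id).fillna(0).astype(int)
totals["Profit"] = totals["Profit"] + looked_up

print(totals)
```

This replaces the nested loop with one grouped sum and one vectorised lookup, so the cost no longer grows with the product of the two row counts.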
I have a data frame (sample, not real):
df =
A B C D E F
0 3 4 NaN NaN NaN NaN
1 9 8 NaN NaN NaN NaN
2 5 9 4 7 NaN NaN
3 5 7 6 3 NaN NaN
4 2 6 4 3 NaN NaN
Now I want to fill the NaN values in each row with the preceding pair of values (that is, repeat the last existing pair to the left across the rest of the row), and apply this to the whole dataset.
There are a lot of answers about filling along columns, but in this case I need to fill along rows.
There are also answers about filling NaN based on another column, but in my case there are more than 2000 columns; this is only sample data.
Desired output is:
df =
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
IIUC, a quick solution without reshaping the data:
df.iloc[:, ::2] = df.iloc[:, ::2].ffill(axis=1)
df.iloc[:, 1::2] = df.iloc[:, 1::2].ffill(axis=1)
df
Output:
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
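The same idea can be written for groups of any width. A minimal sketch, assuming the columns are laid out as consecutive groups of equal size; `ffill_in_groups` is a hypothetical helper name, not a pandas function, and the frame is built as float so the filled values fit without a dtype change:

```python
import numpy as np
import pandas as pd

def ffill_in_groups(frame, width=2):
    """Forward-fill along rows, separately for each position within a group."""
    out = frame.copy()
    for offset in range(width):
        # Columns offset, offset + width, ... hold the same group member.
        out.iloc[:, offset::width] = frame.iloc[:, offset::width].ffill(axis=1)
    return out

nan = np.nan
df = pd.DataFrame(
    [[3, 4, nan, nan, nan, nan],
     [9, 8, nan, nan, nan, nan],
     [5, 9, 4, 7, nan, nan],
     [5, 7, 6, 3, nan, nan],
     [2, 6, 4, 3, nan, nan]],
    columns=list("ABCDEF"), dtype=float,
)

filled = ffill_in_groups(df, width=2)
print(filled)
```

With width=2 this does the same two slice assignments as above, on a copy, so the original frame is left untouched.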
The idea is to reshape the DataFrame so that missing values can be forward- and back-filled: build a two-level column index from the integer division and modulo by 2 of the column positions, then stack:
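import numpy as np

c = df.columns
a = np.arange(len(df.columns))
# two-level columns: (pair number, position within the pair)
df.columns = [a // 2, a % 2]
# if some values may still be missing after filling, remove .astype(int)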
df1 = df.stack().ffill(axis=1).bfill(axis=1).unstack().astype(int)
df1.columns = c
print (df1)
A B C D E F
0 3 4 3 4 3 4
1 9 8 9 8 9 8
2 5 9 4 7 4 7
3 5 7 6 3 6 3
4 2 6 4 3 4 3
Detail:
print (df.stack())
0 1 2
0 0 3 NaN NaN
1 4 NaN NaN
1 0 9 NaN NaN
1 8 NaN NaN
2 0 5 4.0 NaN
1 9 7.0 NaN
3 0 5 6.0 NaN
1 7 3.0 NaN
4 0 2 4.0 NaN
1 6 3.0 NaN
Probably a similar question has been asked before, but I could not find one that solves my problem. Maybe I am not using the proper search words!
I have two pandas Dataframes as below:
import pandas as pd
import numpy as np
df1 is as below:
a = np.array([1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3])
b = np.array([1,1,2,2,3,3,1,1,2,2,3,3,1,1,2,2,3,3])
df1 = pd.DataFrame({'a':a, 'b':b})
print(df1)
a b
0 1 1
1 1 1
2 1 2
3 1 2
4 1 3
5 1 3
6 2 1
7 2 1
8 2 2
9 2 2
10 2 3
11 2 3
12 3 1
13 3 1
14 3 2
15 3 2
16 3 3
17 3 3
df2 is as below:
a2 = np.array([1,1,1,2,2,2,3,3,3])
b2 = np.array([1,2,3,1,2,3,1,2,3])
c = np.array([4,8,3,np.nan, 2, 5,6, np.nan, 1])
df2 = pd.DataFrame({'a':a2, 'b':b2, 'c': c})
a b c
0 1 1 4.0
1 1 2 8.0
2 1 3 3.0
3 2 1 NaN
4 2 2 2.0
5 2 3 5.0
6 3 1 6.0
7 3 2 NaN
8 3 3 1.0
Now I want to map column c from df2 onto df1, matching on both columns a and b. df1 should then look like this:
a b c
0 1 1 4
1 1 1 4
2 1 2 8
3 1 2 8
4 1 3 3
5 1 3 3
6 2 1 NaN
7 2 1 NaN
8 2 2 2.0
9 2 2 2.0
10 2 3 5.0
11 2 3 5.0
12 3 1 6.0
13 3 1 6.0
14 3 2 NaN
15 3 2 NaN
16 3 3 1.0
17 3 3 1.0
How can I achieve this in a simple and intuitive way using pandas?
Quite simple using merge:
df1.merge(df2)
a b c
0 1 1 4.0
1 1 1 4.0
2 1 2 8.0
3 1 2 8.0
4 1 3 3.0
5 1 3 3.0
6 2 1 NaN
7 2 1 NaN
8 2 2 2.0
9 2 2 2.0
10 2 3 5.0
11 2 3 5.0
12 3 1 6.0
13 3 1 6.0
14 3 2 NaN
15 3 2 NaN
16 3 3 1.0
17 3 3 1.0
If you have more columns and you want to specifically only merge on a and b, use:
df1.merge(df2, on=['a','b'])
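merge defaults to how='inner', which works here because every (a, b) pair of df1 exists in df2. If that is not guaranteed, a left merge keeps all df1 rows. A sketch with the frames from the question, rebuilt with np.repeat and np.tile for brevity; the name `mapped` is illustrative:

```python
import numpy as np
import pandas as pd

# Same values as the question's df1 and df2.
df1 = pd.DataFrame({"a": np.repeat([1, 2, 3], 6),
                    "b": np.tile(np.repeat([1, 2, 3], 2), 3)})
df2 = pd.DataFrame({"a": np.repeat([1, 2, 3], 3),
                    "b": np.tile([1, 2, 3], 3),
                    "c": [4, 8, 3, np.nan, 2, 5, 6, np.nan, 1]})

# how="left" keeps every df1 row even if its (a, b) pair is missing in df2.
# validate="many_to_one" raises MergeError if df2 has duplicate (a, b)
# pairs, which would otherwise silently multiply df1 rows.
mapped = df1.merge(df2, on=["a", "b"], how="left", validate="many_to_one")
print(mapped)
```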
I am trying to join (merge) two dataframes based on values in each column.
For instance, I want to merge on the values in columns A and B.
So, having df1
A B C D L
0 4 3 1 5 1
1 5 7 0 3 2
2 3 2 1 6 4
And df2
A B E F L
0 4 3 4 5 1
1 5 7 3 3 2
2 3 8 5 5 5
I want to get a df3 with this structure
A B C D E F L
0 4 3 1 5 4 5 1
1 5 7 0 3 3 3 2
2 3 2 1 6 NaN NaN 4
3 3 8 NaN NaN 5 5 5
Can you please help me? I've tried both the merge and join methods but haven't succeeded.
UPDATE: (for updated DFs and new desired DF)
In [286]: merged = pd.merge(df1, df2, on=['A','B'], how='outer', suffixes=('','_y'))
In [287]: merged['L'] = merged['L'].fillna(merged.pop('L_y'))
In [288]: merged
Out[288]:
A B C D L E F
0 4 3 1.0 5.0 1.0 4.0 5.0
1 5 7 0.0 3.0 2.0 3.0 3.0
2 3 2 1.0 6.0 4.0 NaN NaN
3 3 8 NaN NaN 5.0 5.0 5.0
Data:
In [284]: df1
Out[284]:
A B C D L
0 4 3 1 5 1
1 5 7 0 3 2
2 3 2 1 6 4
In [285]: df2
Out[285]:
A B E F L
0 4 3 4 5 1
1 5 7 3 3 2
2 3 8 5 5 5
OLD answer:
You can use the pd.merge(..., how='outer') method:
In [193]: pd.merge(a,b, on=['A','B'], how='outer')
Out[193]:
A B C D E F
0 4 3 1.0 5.0 4.0 5.0
1 5 7 0.0 3.0 3.0 3.0
2 3 2 1.0 6.0 NaN NaN
3 3 8 NaN NaN 5.0 5.0
Data:
In [194]: a
Out[194]:
A B C D
0 4 3 1 5
1 5 7 0 3
2 3 2 1 6
In [195]: b
Out[195]:
A B E F
0 4 3 4 5
1 5 7 3 3
2 3 8 5 5
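For reference, a self-contained sketch of the updated case, where both frames carry an L column. It uses combine_first to coalesce the two L columns and indicator=True to show where each row came from; the name `by_key` is illustrative. The row order of an outer merge can differ between pandas versions, so sort explicitly if the order matters.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 5, 3], "B": [3, 7, 2],
                    "C": [1, 0, 1], "D": [5, 3, 6], "L": [1, 2, 4]})
df2 = pd.DataFrame({"A": [4, 5, 3], "B": [3, 7, 8],
                    "E": [4, 3, 5], "F": [5, 3, 5], "L": [1, 2, 5]})

# Outer merge on A and B; df2's L column arrives as L_y because of the suffixes.
merged = pd.merge(df1, df2, on=["A", "B"], how="outer",
                  suffixes=("", "_y"), indicator=True)

# Take L from df1 where present, otherwise fall back to df2's value.
merged["L"] = merged["L"].combine_first(merged.pop("L_y"))

# Index by the merge keys to inspect rows regardless of their order.
by_key = merged.set_index(["A", "B"])
print(by_key)
```

The `_merge` column reads 'both', 'left_only' or 'right_only' for each row and can be dropped once it has been checked.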