Subtract two timestamps from two different columns that are in consecutive rows - python

I have a pandas DataFrame like this:
a b
0 1 3
1 7 8
2 11 3
3 9 1
I want to subtract column a in the previous row from column b in the current row. For example, b[1] = 8 and a[0] = 1, so b[1] - a[0] = 7, which goes in c[1]:
a b c
0 1 3 -
1 7 8 7
2 11 3 -4
3 9 1 -10
How can I do it? Thanks for your time and help :)

Use Series.sub with the values of column a shifted down one row by Series.shift:
df['c'] = df['b'].sub(df['a'].shift())
print (df)
a b c
0 1 3 NaN
1 7 8 7.0
2 11 3 -4.0
3 9 1 -10.0
To keep integers, convert to the nullable Int64 dtype, which stores the missing value as <NA>:
df['c'] = df['b'].sub(df['a'].shift(1)).astype('Int64')
print (df)
a b c
0 1 3 <NA>
1 7 8 7
2 11 3 -4
3 9 1 -10
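A self-contained sketch of the approach above, for anyone who wants to run it directly. The frame is rebuilt from the question's table, and the variable name prev_a is only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({"a": [1, 7, 11, 9], "b": [3, 8, 3, 1]})

# shift() moves column a down one row, so row i sees a[i-1].
# The first row has no predecessor and becomes missing.
prev_a = df["a"].shift()

# The nullable Int64 dtype keeps whole numbers while allowing a missing value.
df["c"] = df["b"].sub(prev_a).astype("Int64")
print(df)
```

The same shift-then-subtract pattern applies to datetime columns, where the result is a timedelta column and the Int64 conversion is not needed.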

Related

Can You Preserve Column Order When Pandas Dataframe.Combine Or DataFrame.Combine_First?

If you have 2 dataframes, represented as:
A F Y
0 1 2 3
1 4 5 6
And
B C T
0 7 8 9
1 10 11 12
When combining it becomes:
A B C F T Y
0 1 7 8 2 9 3
1 4 10 11 5 12 6
I would like it to become:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
How do I combine 1 data frame with another but keep the original column order?
In [1294]: new_df = df.join(df1)
In [1295]: new_df
Out[1295]:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
Or you can use pd.merge on the index (not as clean as join, though). Recent pandas versions do not accept on= together with left_index/right_index, so no temporary key column is used:
In [1297]: pd.merge(df, df1, left_index=True, right_index=True)
Out[1297]:
A F Y B C T
0 1 2 3 7 8 9
1 4 5 6 10 11 12
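As a runnable sketch of the options above, assuming the two frames share the same index as in the question. The names joined, side_by_side, wanted and combined are illustrative, and the last part shows one possible way to restore the order if combine_first itself has to be used:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4], "F": [2, 5], "Y": [3, 6]})
df1 = pd.DataFrame({"B": [7, 10], "C": [8, 11], "T": [9, 12]})

# join aligns on the index and appends df1's columns after df's.
joined = df.join(df1)

# pd.concat along axis=1 gives the same layout for these frames.
side_by_side = pd.concat([df, df1], axis=1)

# combine_first sorts the column labels, so reselect them in the wanted order.
wanted = list(df.columns) + [c for c in df1.columns if c not in df.columns]
combined = df.combine_first(df1)[wanted]
print(combined)
```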

Pandas Use a column consists of column names to populate the values dynamically into another column

I would like to obtain the 'Value' column below, from the original df:
A B C Column_To_Use
0 2 3 4 A
1 5 6 7 C
2 8 0 9 B
A B C Column_To_Use Value
0 2 3 4 A 2
1 5 6 7 C 7
2 8 0 9 B 0
Use DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0, so this works only on older versions):
df['Value'] = df.lookup(df.index, df['Column_To_Use'])
print (df)
A B C Column_To_Use Value
0 2 3 4 A 2
1 5 6 7 C 7
2 8 0 9 B 0
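On pandas 2.0 and later, where DataFrame.lookup is no longer available, a similar result can be obtained with pd.factorize and NumPy indexing. This is a sketch of that pattern on the question's data; the names codes, labels and values are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2, 5, 8],
    "B": [3, 6, 0],
    "C": [4, 7, 9],
    "Column_To_Use": ["A", "C", "B"],
})

# factorize returns one integer code per row plus the unique labels.
codes, labels = pd.factorize(df["Column_To_Use"])

# Reorder the columns to match the labels, then pick one cell per row.
values = df.reindex(columns=labels).to_numpy()
df["Value"] = values[np.arange(len(df)), codes]
print(df)
```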

Subtract values from maximum value within groups

I'm trying to take a df and create a new column that's based on the difference between each Value in a group and that group's max:
Group Value
A 4
A 6
A 10
B 5
B 8
B 11
The result should be a new column, "from_max":
from_max
6
4
0
6
3
0
I tried this but got a ValueError:
df['from_max'] = df.groupby(['Group']).apply(lambda x: x['Value'].max() - x['Value'])
Thanks in Advance
Option 1
vectorised groupby + transform
df['from_max'] = df.groupby('Group').Value.transform('max') - df.Value
df
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
Option 2
index aligned subtraction
df['from_max'] = (df.groupby('Group').Value.max() - df.set_index('Group').Value).values
df
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
I think you need GroupBy.transform, which returns a Series the same size as the original DataFrame:
df['from_max'] = df.groupby(['Group'])['Value'].transform(lambda x: x.max() - x)
Or:
df['from_max'] = df.groupby(['Group'])['Value'].transform(max) - df['Value']
An alternative is Series.map with the aggregated max per group:
df['from_max'] = df['Group'].map(df.groupby(['Group'])['Value'].max()) - df['Value']
print (df)
Group Value from_max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
Using reindex
df['From_Max'] = df.groupby('Group').Value.max().reindex(df.Group).values - df.Value.values
df
Out[579]:
Group Value From_Max
0 A 4 6
1 A 6 4
2 A 10 0
3 B 5 6
4 B 8 3
5 B 11 0
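For a quick check, here is a self-contained sketch of the transform approach on the question's data. The intermediate name group_max is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": list("AAABBB"),
    "Value": [4, 6, 10, 5, 8, 11],
})

# transform('max') broadcasts each group's maximum back to every row,
# so the result has the same index as df and can be subtracted directly.
group_max = df.groupby("Group")["Value"].transform("max")
df["from_max"] = group_max - df["Value"]
print(df)
```

Because the transformed Series shares the original index, no reindexing or .values conversion is needed before the assignment.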

sort dataframe by position in group then by that group

consider the dataframe df
df = pd.DataFrame(dict(
    A=list('aaaaabbbbccc'),
    B=range(12)
))
print(df)
A B
0 a 0
1 a 1
2 a 2
3 a 3
4 a 4
5 b 5
6 b 6
7 b 7
8 b 8
9 c 9
10 c 10
11 c 11
I want to sort the dataframe such that, if I grouped by column 'A', I'd pull the first position from each group, then cycle back and get the second position from each group if any remain, and so on.
I'd expect the results to look like this:
A B
0 a 0
5 b 5
9 c 9
1 a 1
6 b 6
10 c 10
2 a 2
7 b 7
11 c 11
3 a 3
8 b 8
4 a 4
You can first use cumcount to number the rows within each group, then sort_values, and finally reindex by the index of the resulting Series cum:
cum = df.groupby('A')['B'].cumcount().sort_values()
print (cum)
0 0
5 0
9 0
1 1
6 1
10 1
2 2
7 2
11 2
3 3
8 3
4 4
dtype: int64
print (df.reindex(cum.index))
A B
0 a 0
5 b 5
9 c 9
1 a 1
6 b 6
10 c 10
2 a 2
7 b 7
11 c 11
3 a 3
8 b 8
4 a 4
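A runnable sketch of the cumcount approach follows. One assumption worth flagging: it passes kind='mergesort' to sort_values so that rows with the same position keep their original relative order; the default sort algorithm does not guarantee this on larger inputs. The names pos, order and result are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": list("aaaaabbbbccc"), "B": range(12)})

# Position of each row within its group: 0 for the first row, 1 for the second, ...
pos = df.groupby("A").cumcount()

# A stable sort keeps ties in their original order (a, then b, then c).
order = pos.sort_values(kind="mergesort").index

result = df.loc[order]
print(result)
```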
Here's a NumPy approach -
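import numpy as np

def approach1(g, v):
    # Inputs : 1D arrays of groupby and value columns (g must already be grouped)
    id_arr2 = np.ones(v.size,dtype=int)
    # Start index of every group after the first one
    sf = np.flatnonzero(g[1:] != g[:-1])+1
    # Reset the running count at each group start
    id_arr2[sf[0]] = -sf[0]+1
    id_arr2[sf[1:]] = sf[:-1] - sf[1:]+1
    # Stable argsort of the within-group positions
    return id_arr2.cumsum().argsort(kind='mergesort')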
Sample run -
In [246]: df
Out[246]:
A B
0 a 0
1 a 1
2 a 2
3 a 3
4 a 4
5 b 5
6 b 6
7 b 7
8 b 8
9 c 9
10 c 10
11 c 11
In [247]: df.iloc[approach1(df.A.values, df.B.values)]
Out[247]:
A B
0 a 0
5 b 5
9 c 9
1 a 1
6 b 6
10 c 10
2 a 2
7 b 7
11 c 11
3 a 3
8 b 8
4 a 4
Or using df.reindex from @jezrael's post:
df.reindex(approach1(df.A.values, df.B.values))

pandas compare and select the smallest number from another dataframe

I have two dataframes.
df1
Out[162]:
a b c
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
11 11 11 11
df2
Out[194]:
A B
0 a 3
1 b 4
2 c 5
I wish to create a third column in df2 that maps df2['A'] to a column of df1 and finds the smallest number in that column that is greater than the number in df2['B']. For example, for df2['C'].ix[0], it should go to df1['a'] and search for the smallest number that's greater than df2['B'].ix[0], which should be 4.
I had something like df2['C'] = df2['A'].map( df1[df1 > df2['B']].min() ). But this doesn't work, as it doesn't use df2['B'] to search for the corresponding rows. Thanks.
Use apply for row-wise methods:
In [54]:
# create our data
import pandas as pd
df1 = pd.DataFrame({'a':list(range(12)), 'b':list(range(12)), 'c':list(range(12))})
df1
Out[54]:
a b c
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
11 11 11 11
[12 rows x 3 columns]
In [68]:
# create our 2nd dataframe, note I have deliberately used alternate values for column 'B'
df2 = pd.DataFrame({'A':list('abc'), 'B':[3,5,7]})
df2
Out[68]:
A B
0 a 3
1 b 5
2 c 7
[3 rows x 2 columns]
In [69]:
# apply row-wise function, must use axis=1 for row-wise
# .ix was removed in pandas 1.0, so use .loc with the boolean mask
df2['C'] = df2.apply(lambda row: df1[row['A']].loc[df1[row.A] > row.B].min(), axis=1)
df2
Out[69]:
A B C
0 a 3 4
1 b 5 6
2 c 7 8
[3 rows x 3 columns]
There is some example usage in the pandas docs
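The row-wise lookup can also be written with a named function, which may be easier to read than the lambda. This is a sketch on the answer's data; smallest_above is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df1 = pd.DataFrame({"a": range(12), "b": range(12), "c": range(12)})
df2 = pd.DataFrame({"A": list("abc"), "B": [3, 5, 7]})

def smallest_above(row):
    # Pick the df1 column named in row['A'], keep the values strictly
    # greater than row['B'], and return the smallest of them.
    col = df1[row["A"]]
    return col[col > row["B"]].min()

# axis=1 passes each row of df2 to the function.
df2["C"] = df2.apply(smallest_above, axis=1)
print(df2)
```

If no value in the chosen column exceeds the threshold, min() on the empty selection returns NaN, so the new column may contain missing values.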
