How do I subtract the even row values from the odd row values? - python

I have the dataframe below.
I want to subtract each even row value from the odd row value that follows it,
and make a new dataframe.
How can I do it?
import pandas as pd
import numpy as np
raw_data = {'Time': [281.54385, 436.55295, 441.74910, 528.36445,
974.48405, 980.67895, 986.65435, 1026.02485]}
data = pd.DataFrame(raw_data)
data
dataframe
Time
0 281.54385
1 436.55295
2 441.74910
3 528.36445
4 974.48405
5 980.67895
6 986.65435
7 1026.02485
Wanted result
ON_TIME
0 155.00910
1 86.61535
2 6.19490
3 39.37050

You can use NumPy indexing:
res = pd.DataFrame(data.values[1::2] - data.values[::2], columns=['Time'])
print(res)
Time
0 155.00910
1 86.61535
2 6.19490
3 39.37050
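If you prefer to stay in pandas, a roughly equivalent sketch using iloc (the ON_TIME column name is only taken from the wanted result above):
# pandas-only sketch: pair up odd-indexed and even-indexed rows, then subtract
odd = data['Time'].iloc[1::2].reset_index(drop=True)   # rows 1, 3, 5, 7
even = data['Time'].iloc[::2].reset_index(drop=True)   # rows 0, 2, 4, 6
res = (odd - even).to_frame('ON_TIME')
print(res)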

You can use shift for the subtraction, and then pick every 2nd element, starting with the 2nd element (index = 1):
(data.Time - data.Time.shift())[1::2].rename('On Time').reset_index(drop=True)
outputs:
0 155.00910
1 86.61535
2 6.19490
3 39.37050
Name: On Time, dtype: float64
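Equivalently, since diff() is shorthand for subtracting the shifted series, a minimal sketch:
# diff() subtracts the previous row, so every 2nd value is the odd-minus-even difference
data.Time.diff()[1::2].rename('On Time').reset_index(drop=True)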

Related

Finding out if values in dataframe increases in tens place

I'm trying to figure out if the value in my dataframe is increasing in the tens/hundreds place. For example, I created a dataframe with a few values, duplicated the values, and shifted them, so now I'm able to compare them. But how do I write code to find out whether the tens place is increasing, or whether the value is only increasing by a little, for example 0.02 points?
import pandas as pd
import numpy as np
data = {'value':['9','10','19','22','31']}
df = pd.DataFrame(data)
df['value_copy'] = df['value'].shift(1)
df['Increase'] = np.where(df['value']<df['value_copy'],1,0)
The output in this case should be:
[nan,1,0,1,1]
IIUC, divide by 10, get the floor, then compare the successive values (diff(1)) to see if the difference is exactly 1:
np.floor(df['value'].astype(float).div(10)).diff(1).eq(1).astype(int)
If you want a jump to at least the next tens (or more) use ge (≥):
np.floor(df['value'].astype(float).div(10)).diff(1).ge(1).astype(int)
output:
0 0
1 1
2 0
3 1
4 1
Name: value, dtype: int64
NB. if you insist on the NaN:
s = np.floor(df['value'].astype(float).div(10)).diff(1)
s.eq(1).astype(int).mask(s.isna())
output:
0 NaN
1 1.0
2 0.0
3 1.0
4 1.0
Name: value, dtype: float64
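To put the result back on the original DataFrame as the Increase column from the question, a small sketch (assuming the same df as above):
# tens digit of each value; diff() shows how much the tens place moved between rows
tens = np.floor(df['value'].astype(float).div(10)).diff(1)
# flag a jump of at least one tens place, keeping NaN for the first row
df['Increase'] = tens.ge(1).astype(int).mask(tens.isna())
print(df)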

create a new data frame from existing data frame based on condition

I have a data frame df
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1],
[0,0,1,0,0,1]]))
df
Now, from data frame df I would like to create a new data frame based on a condition.
Condition: if a column contains three or more '1's, then the new data frame's value for that column is '1', otherwise '0'.
expected output of new data frame
1 0 1 0 0 1
You can also get it without apply. You could sum over the rows (axis=0) and create a boolean with gt(2):
res = df.sum(axis=0).gt(2).astype(int)
print(res)
0 1
1 0
2 1
3 0
4 0
5 1
dtype: int32
As David pointed out, the result of the above is a Series. If you require a DataFrame, you can chain to_frame() at the end of it.
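A minimal sketch of that chaining (the trailing .T is only there to match the one-row layout of the expected output):
# to_frame() turns the Series into a one-column DataFrame; .T flips it into one row
res_df = df.sum(axis=0).gt(2).astype(int).to_frame().T
print(res_df)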
You could do the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1],
[0,0,1,0,0,1]]))
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))
In [6]: df_res
Out[6]:
0
0 1
1 0
2 1
3 0
4 0
5 1
Instead of np.sum(c) you can also do c.sum()
And if you want it transposed just do the following instead:
df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T

changing index of 1 row in pandas

I have the below df, built from a pivot of a larger df. In this table 'week' is the index (dtype = object) and I need to show week 53 as the first row instead of the last.
Can someone advise, please? I tried reindex and custom sorting but can't find the way.
Thanks!
here is the table
Since you can't insert the row and push others back directly, a clever trick you can use is to create a new order:
# adds a new column, "new" with the original order
df['new'] = range(1, len(df) + 1)
# sets value that has index 53 with 0 on the new column
# note that this comparison requires you to match index type
# so if weeks are object, you should compare df.index == '53'
df.loc[df.index == 53, 'new'] = 0
# sorts values by the new column and drops it
df = df.sort_values("new").drop('new', axis=1)
Before:
numbers
weeks
1 181519.23
2 18507.58
3 11342.63
4 6064.06
53 4597.90
After:
numbers
weeks
53 4597.90
1 181519.23
2 18507.58
3 11342.63
4 6064.06
One way of doing this would be:
import pandas as pd
df = pd.DataFrame(range(10))
new_df = df.loc[[df.index[-1]]+list(df.index[:-1])].reset_index(drop=True)
output:
0
9 9
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Alternate method, assuming the week numbers are in a 'Year week' column rather than the index:
new_df = pd.concat([df[df["Year week"]==53], df[~(df["Year week"]==53)]])
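Since the question mentions trying reindex, here is a sketch of that route as well (assuming the week numbers are in the index, as in the Before table above):
# build the desired order explicitly: week 53 first, then the rest in their current order
# (if the index dtype is object, compare against '53' instead of 53)
order = [53] + [w for w in df.index if w != 53]
df = df.reindex(order)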

python panda new column with order of values

I would like to make a new column with the order of the numbers in a list. I get 3,1,0,4,2,5 (the indices of the lowest numbers) but I would like to have a new column with 2,1,4,0,3,5 (so if I look at a row I get the list value and the position this number holds in the sorted list). What am I doing wrong?
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df.sort_values(by='list').index
print(df)
What you're looking for is the rank:
import pandas as pd
df = pd.DataFrame({'list': [4,3,6,1,5,9]})
df['order'] = df['list'].rank().sub(1).astype(int)
Result:
list order
0 4 2
1 3 1
2 6 4
3 1 0
4 5 3
5 9 5
You can use the method parameter to control how to resolve ties.
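For example, a small sketch of two of the tie-handling options, using the same df as above (the new column names are just illustrative):
# 'first' breaks ties by order of appearance; 'dense' gives tied values the same rank with no gaps
df['order_first'] = df['list'].rank(method='first').sub(1).astype(int)
df['order_dense'] = df['list'].rank(method='dense').sub(1).astype(int)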

Pandas Idxmax on a date-value DataFrame

Given this DataFrame:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Date':['20/03/17 10:30:34','20/03/17 10:31:24','20/03/17 10:34:34'],
'Value':[4,7,5]})
df['Date'] = pd.to_datetime(df.Date)
df
Out[53]:
Date Value
0 2017-03-20 10:30:34 4
1 2017-03-20 10:31:24 7
2 2017-03-20 10:34:34 5
I am trying to extract the max value and its index. I can get the max value with df.Value.max(), but when I use df.idxmax() to get the index of the value I get a TypeError:
TypeError: float() argument must be a string or a number
Is there any other way to get the index of the max value of a DataFrame? (Or any way to correct this one?)
Because it should be:
df.Value.idxmax()
It then returns 1.
If you only care about the Value column, you can use:
df.Value.idxmax()
>>> 1
However, it is strange that it fails on both columns with df.idxmax() as the following works, too:
df.Date.idxmax()
>>> 2
df.idxmax() also works for some other dummy data:
import numpy as np  # numpy was not imported in the question's snippet

dummy = pd.DataFrame(np.random.random(size=(5,2)))
print(dummy)
0 1
0 0.944017 0.365198
1 0.541003 0.447632
2 0.583375 0.081192
3 0.492935 0.570310
4 0.832320 0.542983
print(dummy.idxmax())
0 0
1 3
dtype: int64
You have to specify from which column you want to get the index of the maximum value.
To get the idx of the maximum value use:
df.Value.idxmax()
If you want to get the idx of the maximum Date use:
df.Date.idxmax()
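If you want both the maximum value and its label in one step, a small sketch with the same df:
# idxmax() returns the index label of the maximum; loc fetches the matching value
max_idx = df['Value'].idxmax()
print(max_idx, df.loc[max_idx, 'Value'])   # -> 1 7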
