I have a dataframe with multiple columns and rows
For every column, I need each row's value to equal 0.5 times this row's value plus 0.5 times the previous row's value.
I currently have a loop set up which works, but I feel there is a better way without using a loop. Does anyone have any thoughts?
# my dataframe is df_input
df_output = df_input.copy()
for i in range(1, df_input.shape[0]):
    try:
        df_output.iloc[[i]] = (df_input.iloc[[i - 1]] * 0.5).values + (df_input.iloc[[i]] * 0.5).values
    except:
        pass
Do you mean something like this?
First, create some test data:
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 20, [5, 3]), columns=['A', 'B', 'C'])
A B C
0 6 19 14
1 10 7 6
2 18 10 10
3 3 7 2
4 1 11 5
Your requested function:
(df*.5).rolling(2).sum()
A B C
0 NaN NaN NaN
1 8.0 13.0 10.0
2 14.0 8.5 8.0
3 10.5 8.5 6.0
4 2.0 9.0 3.5
EDIT:
for an unbalanced mean you can define an auxiliary function:
def weighted_mean(arr):
    return sum(arr * [.25, .75])
df.rolling(2).apply(weighted_mean, raw=True)
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
EDIT2:
...and if the weights should be settable at runtime:
def weighted_mean(arr, weights=[.5, .5]):
    return sum(arr * weights / sum(weights))
With no additional argument, it defaults to the balanced mean:
df.rolling(2).apply(weighted_mean, raw=True)
A B C
0 NaN NaN NaN
1 8.0 13.0 10.0
2 14.0 8.5 8.0
3 10.5 8.5 6.0
4 2.0 9.0 3.5
An unbalanced mean:
df.rolling(2).apply(weighted_mean, raw=True, args=[[.25, .75]])
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
The division by sum(weights) means the weights are not restricted to fractions of one; any ratio works:
df.rolling(2).apply(weighted_mean, raw=True, args=[[1, 3]])
A B C
0 NaN NaN NaN
1 9.00 10.00 8.00
2 16.00 9.25 9.00
3 6.75 7.75 4.00
4 1.50 10.00 4.25
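Alternatively, the weights can be bound with functools.partial instead of args (a sketch, reusing the weighted_mean definition above; it gives the same output as the args version):
from functools import partial
df.rolling(2).apply(partial(weighted_mean, weights=[1, 3]), raw=True)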
df.rolling(window=2, min_periods=1).apply(lambda x: x[0]*0.5 + x[1]*0.5 if len(x) > 1 else x[0])
This will do the same operation for all columns.
Explanation: the rolling window is applied to each column separately. Within a column, x holds the window values [this_col[i-1], this_col[i]], so doing custom arithmetic on them is straightforward; the first window contains a single value, hence the len(x) > 1 guard.
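For completeness, the same result can be obtained without rolling at all, via shift (a sketch, assuming the plain 0.5/0.5 weighting from the question; the first row has no predecessor, so it is restored from the input, matching the loop above):
df_output = (0.5 * df_input.shift() + 0.5 * df_input).fillna(df_input)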
Something like below?
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 1)), columns=['a'])
df["cumsum_a"] = 0.5*df["a"].cumsum() + 0.5*df["a"]
I have a DataFrame that looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
                   'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8]})
a b
0 1.0 4.0
1 2.0 2.0
2 NaN 3.0
3 1.0 NaN
4 NaN NaN
5 NaN 1.0
6 4.0 5.0
7 2.0 NaN
8 3.0 5.0
9 NaN 8.0
I want to dynamically replace the NaN values. I have tried (df.ffill() + df.bfill()) / 2, but that does not yield the desired output, as it computes the fill values for the whole column at once rather than sequentially. I have tried interpolate, but it doesn't work well for non-linear data.
I have seen this answer but did not fully understand it and not sure if it would work.
Update on the computation of the values
I want every NaN value to be the mean of the previous and next non-NaN values. In case there is more than one NaN value in sequence, I want to replace them one at a time, recomputing the mean each time. E.g., given 1, np.nan, np.nan, 4, the first NaN becomes the mean of 1 and 4 (2.5), giving 1, 2.5, np.nan, 4; the second NaN then becomes the mean of 2.5 and 4, giving 1, 2.5, 3.25, 4.
The desired output is
a b
0 1.00 4.0
1 2.00 2.0
2 1.50 3.0
3 1.00 2.0
4 2.50 1.5
5 3.25 1.0
6 4.00 5.0
7 2.00 5.0
8 3.00 5.0
9 1.50 8.0
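To make the sequential rule concrete, here is a minimal sketch of it on a plain list (sequential_fill is a hypothetical helper, not part of any answer below; it assumes the sequence starts with a non-NaN value):
import math

def sequential_fill(values):
    # replace each NaN, left to right, with the mean of its (already
    # filled) left neighbour and the next non-NaN value to the right
    out = list(values)
    for i, v in enumerate(out):
        if math.isnan(v):
            nxt = next((x for x in out[i + 1:] if not math.isnan(x)), out[i - 1])
            out[i] = (out[i - 1] + nxt) / 2
    return out

sequential_fill([1, float('nan'), float('nan'), 4])
# [1, 2.5, 3.25, 4]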
Inspired by the #ye olde noobe answer (thanks to him!):
I've optimized it to make it ≃ 100x faster (times comparison below):
def custom_fillna(s: pd.Series):
    for i in range(len(s)):
        if pd.isna(s[i]):
            last_valid_number = (s[s[:i].last_valid_index()] if s[:i].last_valid_index() is not None else 0)
            next_valid_number = (s[s[i:].first_valid_index()] if s[i:].first_valid_index() is not None else 0)
            s[i] = (last_valid_number + next_valid_number) / 2
custom_fillna(df['a'])
df
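To fill every column rather than just 'a', one can loop over the columns (a sketch; custom_fillna mutates the Series it is given and returns nothing, as in the single-column call above):
for col in df.columns:
    custom_fillna(df[col])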
Times comparison: (timing figures omitted)
Maybe not the most optimized, but it works (note: from your example, I assume that if there is no valid value before or after a NaN, like the last row on column a, 0 is used as a replacement):
import numpy as np
import pandas as pd

def fill_dynamically(s: pd.Series):
    for i in range(len(s)):
        s[i] = (
            (0 if s[i:].first_valid_index() is None else s[i:][s[i:].first_valid_index()]) +
            (0 if s[:i+1].last_valid_index() is None else s[:i+1][s[:i+1].last_valid_index()])
        ) / 2
Use like this for the full dataframe:
df = pd.DataFrame({'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
                   'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8]})
df.apply(fill_dynamically)
df after applying:
a b
0 1.00 4.0
1 2.00 2.0
2 1.50 3.0
3 1.00 2.0
4 2.50 1.5
5 3.25 1.0
6 4.00 5.0
7 2.00 5.0
8 3.00 5.0
9 1.50 8.0
In case you would have other columns and don't want to apply that on the whole dataframe, you can of course use it on a single column, like that:
df = pd.DataFrame({'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
                   'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8]})
fill_dynamically(df['a'])
In this case, df looks like that:
a b
0 1.00 4.0
1 2.00 2.0
2 1.50 3.0
3 1.00 NaN
4 2.50 NaN
5 3.25 1.0
6 4.00 5.0
7 2.00 NaN
8 3.00 5.0
9 1.50 8.0
Below is a portion of my dataframe, which has many missing values.
A B
S a b c d e a b c d e
date
2020-10-15 1.0 2.0 NaN NaN NaN 10.0 11.0 NaN NaN NaN
2020-10-16 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-10-17 NaN NaN NaN 4.0 NaN NaN NaN NaN 13.0 NaN
2020-10-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-10-19 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-10-20 4.0 6.0 4.0 1.0 9.0 10.0 2.0 13.0 4.0 13.0
I would like to replace the NaNs in each column using a specific backward-fill condition.
For example, in column (A,a) missing values appear for the dates 16th, 17th, 18th and 19th. The next value is 4, against the 20th. I want this value (the next non-missing value in the column) to be distributed among all these dates, including the 20th, at a progressively increasing rate of 10%. That is, column (A,a) gets values of approximately .655, .720, .793, .872 and .960 for the dates 16th, 17th, 18th, 19th and 20th. This shall be the approach for all missing values across all columns.
I tried using bfill() function but unable to fathom how to incorporate the required formula as an option.
I have checked the link Pandas: filling missing values in time series forward using a formula and a few other links on stackoverflow. This is somewhat similar, but in my case the number of NaNs in a given column is variable and spans multiple rows. Compare column (A,a) with column (A,d) or column (B,d). Given this, I am finding it difficult to adapt the solution to my problem.
Appreciate any inputs.
Here is a completely vectorized way to do this. It is very efficient and fast: 130 ms on a 1000 x 1000 matrix. This is a good opportunity to expose some interesting techniques using numpy.
First, let's dig a bit into the requirements, specifically what exactly the value for each cell needs to be.
The example given is [nan, nan, nan, nan, 4.0] --> [.66, .72, .79, .87, .96], which is explained to be a "progressively increasing value of 10%" (in such a way that the total is the "value to spread": 4.0).
This is a geometric series with rate r = 1 + 0.1: [r^1, r^2, r^3, ...] and then normalized to sum to 1. For example:
import numpy as np
import pandas as pd

r = 1.1
a = 4.0
n = 5
q = np.cumprod(np.repeat(r, n))
a * q / q.sum()
# array([0.65518992, 0.72070892, 0.79277981, 0.87205779, 0.95926357])
We'd like to do a direct calculation (to avoid calling Python functions and explicit loops, which would be much slower), so we need to express that normalizing factor q.sum() in closed form. It is the well-established geometric series sum: for q = [r^1, ..., r^n], q.sum() = r * (r^n - 1) / (r - 1), so each normalized term is r^i * (r - 1) / (r^n - 1) with i counted from 0.
To generalize, we need 3 quantities to calculate the value of each cell:
a: value to distribute
i: index within the run (0 .. n-1)
n: run length
then, the value is v = a * r**i * (r - 1) / (r**n - 1).
To illustrate with the first column in the OP's example, where the input is: [1, nan, nan, nan, nan, 4], we would like:
a = [1, 4, 4, 4, 4, 4]
i = [0, 0, 1, 2, 3, 4]
n = [1, 5, 5, 5, 5, 5]
then, the value v would be (rounded at 2 decimals): [1. , 0.66, 0.72, 0.79, 0.87, 0.96].
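As a quick sanity check of that formula, reusing the example numbers above (a sketch):
r = 1.1
a = np.array([1., 4., 4., 4., 4., 4.])
i = np.array([0, 0, 1, 2, 3, 4])
n = np.array([1, 5, 5, 5, 5, 5])
a * r**i * (r - 1) / (r**n - 1)
# array([1. , 0.66, 0.72, 0.79, 0.87, 0.96])  (rounded to 2 decimals)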
Now comes the part where we go about getting these three quantities as numpy arrays.
a is the easiest and is simply df.bfill().values. But for i and n, we do have to do a little bit of work, starting by assigning the values to a numpy array:
z = df.values
nrows, ncols = z.shape
For i, we start with the cumulative count of NaNs, with reset when values are not NaN. This is strongly inspired by this SO answer for "Cumulative counts in NumPy without iteration". But we do it for a 2D array, and we also want to add a first row of 0, and discard the last row to satisfy exactly our needs:
def rcount(z):
    na = np.isnan(z)
    without_reset = na.cumsum(axis=0)
    reset_at = ~na
    overcount = np.maximum.accumulate(without_reset * reset_at)
    result = without_reset - overcount
    return result
i = np.vstack((np.zeros(ncols, dtype=bool), rcount(z)))[:-1]
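For instance, on the single example column (a sketch, to see the shift-by-one at work):
z1 = np.array([[1.0], [np.nan], [np.nan], [np.nan], [np.nan], [4.0]])
rcount(z1).ravel()
# array([0, 1, 2, 3, 4, 0]): cumulative NaN count, reset at each value
np.vstack((np.zeros(1, dtype=bool), rcount(z1)))[:-1].ravel()
# array([0, 0, 1, 2, 3, 4]): prepending a zero row and dropping the last
# row shifts everything down by one, matching the i column above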
For n, we need to do some dancing on our own, using first principles of numpy (the steps are broken down on a small example right after the code):
runlen = np.diff(np.hstack((-1, np.flatnonzero(~np.isnan(np.vstack((z, np.ones(ncols))).T)))))
n = np.reshape(np.repeat(runlen, runlen), (nrows + 1, ncols), order='F')[:-1]
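To see what those two lines do, here is the same computation on the single example column (a sketch):
z1 = np.array([[1.0], [np.nan], [np.nan], [np.nan], [np.nan], [4.0]])
# append a sentinel row of ones so the trailing run is closed, transpose so
# columns are scanned one after another, and locate the non-NaN entries:
idx = np.flatnonzero(~np.isnan(np.vstack((z1, np.ones(1))).T))  # [0, 5, 6]
runlen = np.diff(np.hstack((-1, idx)))                          # [1, 5, 1]
# repeat each run length by itself, reshape column-major, drop the sentinel:
np.reshape(np.repeat(runlen, runlen), (7, 1), order='F')[:-1].ravel()
# array([1, 5, 5, 5, 5, 5]): matching the n column above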
So, putting it all together:
def spread_bfill(df, r=1.1):
    z = df.values
    nrows, ncols = z.shape
    a = df.bfill().values
    i = np.vstack((np.zeros(ncols, dtype=bool), rcount(z)))[:-1]
    runlen = np.diff(np.hstack((-1, np.flatnonzero(~np.isnan(np.vstack((z, np.ones(ncols))).T)))))
    n = np.reshape(np.repeat(runlen, runlen), (nrows + 1, ncols), order='F')[:-1]
    v = a * r**i * (r - 1) / (r**n - 1)
    return pd.DataFrame(v, columns=df.columns, index=df.index)
On your example data, we get:
>>> spread_bfill(df).round(2) # round(2) for printing purposes
A B
a b c d e a b c d e
S
2020-10-15 1.00 2.00 0.52 1.21 1.17 10.00 11.00 1.68 3.93 1.68
2020-10-16 0.66 0.98 0.57 1.33 1.28 1.64 0.33 1.85 4.32 1.85
2020-10-17 0.72 1.08 0.63 1.46 1.41 1.80 0.36 2.04 4.75 2.04
2020-10-18 0.79 1.19 0.69 0.30 1.55 1.98 0.40 2.24 1.21 2.24
2020-10-19 0.87 1.31 0.76 0.33 1.71 2.18 0.44 2.47 1.33 2.47
2020-10-20 0.96 1.44 0.83 0.37 1.88 2.40 0.48 2.71 1.46 2.71
For inspection, let's look at each of the 3 quantities in that example:
>>> a
[[ 1 2 4 4 9 10 11 13 13 13]
[ 4 6 4 4 9 10 2 13 13 13]
[ 4 6 4 4 9 10 2 13 13 13]
[ 4 6 4 1 9 10 2 13 4 13]
[ 4 6 4 1 9 10 2 13 4 13]
[ 4 6 4 1 9 10 2 13 4 13]]
>>> i
[[0 0 0 0 0 0 0 0 0 0]
[0 0 1 1 1 0 0 1 1 1]
[1 1 2 2 2 1 1 2 2 2]
[2 2 3 0 3 2 2 3 0 3]
[3 3 4 1 4 3 3 4 1 4]
[4 4 5 2 5 4 4 5 2 5]]
>>> n
[[1 1 6 3 6 1 1 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]]
And here is a final example, to illustrate what happens if a column ends with one or several NaNs (they remain NaN):
np.random.seed(10)
a = np.random.randint(0, 10, (6, 6)).astype(float)
a *= np.random.choice([1.0, np.nan], a.shape, p=[.3, .7])
df = pd.DataFrame(a)
>>> df
0 1 2 3 4 5
0 NaN NaN NaN NaN NaN 0.0
1 NaN NaN 9.0 NaN 8.0 NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN 8.0 4.0 NaN NaN NaN
4 NaN NaN NaN 6.0 9.0 NaN
5 NaN NaN 2.0 NaN 7.0 8.0
Then:
>>> spread_bfill(df).round(2) # round(2) for printing
0 1 2 3 4 5
0 NaN 1.72 4.29 0.98 3.81 0.00
1 NaN 1.90 4.71 1.08 4.19 1.31
2 NaN 2.09 1.90 1.19 2.72 1.44
3 NaN 2.29 2.10 1.31 2.99 1.59
4 NaN NaN 0.95 1.44 3.29 1.74
5 NaN NaN 1.05 NaN 7.00 1.92
Speed
a = np.random.randint(0, 10, (1000, 1000)).astype(float)
a *= np.random.choice([1.0, np.nan], a.shape, p=[.3, .7])
df = pd.DataFrame(a)
%timeit spread_bfill(df)
# 130 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Initial data:
>>> df
A B
a b c d e a b c d e
date
2020-10-15 1.0 2.0 NaN NaN NaN 10.0 11.0 NaN NaN NaN
2020-10-16 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-10-17 NaN NaN NaN 4.0 NaN NaN NaN NaN 13.0 NaN
2020-10-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-10-19 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-10-20 4.0 6.0 4.0 1.0 9.0 10.0 2.0 13.0 4.0 13.0
Define your geometric sequence:
def geomseq(seq):
q = 1.1
n = len(seq)
S = seq.max()
Uo = S * (1-q) / (1-q**n)
Un = [Uo * q**i for i in range(0, n)]
return Un
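A quick check on one run, the group [NaN, NaN, NaN, NaN, 4.0] from column (A,a) (a sketch):
import numpy as np
import pandas as pd

geomseq(pd.Series([np.nan, np.nan, np.nan, np.nan, 4.0]))
# [0.655, 0.721, 0.793, 0.872, 0.959]  (rounded)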
TL;DR
>>> df.unstack().groupby(df.unstack().sort_index(ascending=False).notna().cumsum().sort_index()).transform(geomseq).unstack(level=[0, 1])
A B
a b c d e a b c d e
date
2020-10-15 1.000000 2.000000 0.518430 1.208459 1.166466 10.000000 11.000000 1.684896 3.927492 1.684896
2020-10-16 0.655190 0.982785 0.570272 1.329305 1.283113 1.637975 0.327595 1.853386 4.320242 1.853386
2020-10-17 0.720709 1.081063 0.627300 1.462236 1.411424 1.801772 0.360354 2.038724 4.752266 2.038724
2020-10-18 0.792780 1.189170 0.690030 0.302115 1.552567 1.981950 0.396390 2.242597 1.208459 2.242597
2020-10-19 0.872058 1.308087 0.759033 0.332326 1.707823 2.180144 0.436029 2.466856 1.329305 2.466856
2020-10-20 0.959264 1.438895 0.834936 0.365559 1.878606 2.398159 0.479632 2.713542 1.462236 2.713542
Details
Convert your dataframe to series:
>>> sr = df.unstack()
>>> sr.head(10)
date
A a 2020-10-15 1.0
2020-10-16 NaN # <= group X (final value: .655)
2020-10-17 NaN # <= group X (final value: .720)
2020-10-18 NaN # <= group X (final value: .793)
2020-10-19 NaN # <= group X (final value: .872)
2020-10-20 4.0 # <= group X (final value: .960)
b 2020-10-15 2.0
2020-10-16 NaN
2020-10-17 NaN
2020-10-18 NaN
dtype: float64
Now you can build groups:
>>> groups = sr.sort_index(ascending=False).notna().cumsum().sort_index()
>>> groups.head(10)
date
A a 2020-10-15 16
2020-10-16 15 # <= group X15
2020-10-17 15 # <= group X15
2020-10-18 15 # <= group X15
2020-10-19 15 # <= group X15
2020-10-20 15 # <= group X15
b 2020-10-15 14
2020-10-16 13
2020-10-17 13
2020-10-18 13
dtype: int64
Apply your geometric progression:
>>> sr = sr.groupby(groups).transform(geomseq)
>>> sr.head(10)
date
A a 2020-10-15 1.000000
2020-10-16 0.655190 # <= group X15
2020-10-17 0.720709 # <= group X15
2020-10-18 0.792780 # <= group X15
2020-10-19 0.872058 # <= group X15
2020-10-20 0.959264 # <= group X15
b 2020-10-15 2.000000
2020-10-16 0.982785
2020-10-17 1.081063
2020-10-18 1.189170
dtype: float64
And finally, reshape series according to your initial dataframe:
>>> df = sr.unstack(level=[0, 1])
>>> df
A B
a b c d e a b c d e
date
2020-10-15 1.000000 2.000000 0.518430 1.208459 1.166466 10.000000 11.000000 1.684896 3.927492 1.684896
2020-10-16 0.655190 0.982785 0.570272 1.329305 1.283113 1.637975 0.327595 1.853386 4.320242 1.853386
2020-10-17 0.720709 1.081063 0.627300 1.462236 1.411424 1.801772 0.360354 2.038724 4.752266 2.038724
2020-10-18 0.792780 1.189170 0.690030 0.302115 1.552567 1.981950 0.396390 2.242597 1.208459 2.242597
2020-10-19 0.872058 1.308087 0.759033 0.332326 1.707823 2.180144 0.436029 2.466856 1.329305 2.466856
2020-10-20 0.959264 1.438895 0.834936 0.365559 1.878606 2.398159 0.479632 2.713542 1.462236 2.713542
I have this table:
df = pd.DataFrame({'A': [0, 1.5, 2.1, 2.9, 4], 'B': [1.5, 2.05, 3, 4, 5]})
Here I have two problems, a gap and an overlap, which I would like to detect automatically using Python pandas. The desired output would look something like this:
df = pd.DataFrame({'A': [0, 1.5, 2.1, 2.9, 4], 'Validate': [1.5, 2.05, 3, 4, 5], 'test': [np.nan, np.nan, 'gap', 'over', np.nan]})
Thanks!
IIUC:
s = df.B.shift() - df.A  # previous row's end (B) minus current row's start (A)
df['Validate'] = np.select((s > 0, s < 0), ('over', 'gap'), default=np.nan)
Output:
A B Validate
0 0.0 1.50 nan
1 1.5 2.05 nan
2 2.1 3.00 gap
3 2.9 4.00 over
4 4.0 5.00 nan
We can use sign from numpy:
df['Validate'] = np.sign(df.B.shift().sub(df.A)).map({1: 'over', -1: 'gap'})
df
Out[150]:
A B Validate
0 0.0 1.50 NaN
1 1.5 2.05 NaN
2 2.1 3.00 gap
3 2.9 4.00 over
4 4.0 5.00 NaN
When I try to use fillna to replace the NaNs in a column with its mean, the column changes from float64 to object, showing:
bound method Series.mean of 0 NaN\n1
Here is the code:
mean = df['texture_mean'].mean
df['texture_mean'] = df['texture_mean'].fillna(mean)
You cannot use mean = df['texture_mean'].mean: without the parentheses this assigns the method itself rather than its result, and that is where the problem lies. The following code will work -
df=pd.DataFrame({'texture_mean':[2,4,None,6,1,None],'A':[1,2,3,4,5,None]}) # Example
df
A texture_mean
0 1.0 2.0
1 2.0 4.0
2 3.0 NaN
3 4.0 6.0
4 5.0 1.0
5 NaN NaN
df['texture_mean']=df['texture_mean'].fillna(df['texture_mean'].mean())
df
A texture_mean
0 1.0 2.00
1 2.0 4.00
2 3.0 3.25
3 4.0 6.00
4 5.0 1.00
5 NaN 3.25
In case you want to replace all the NaNs with the respective means of that column in all columns, then just do this -
df=df.fillna(df.mean())
df
A texture_mean
0 1.0 2.00
1 2.0 4.00
2 3.0 3.25
3 4.0 6.00
4 5.0 1.00
5 3.0 3.25
Let me know if this is what you want.
I am sure there must be a very simple solution to this problem, but I am failing to find it (and browsing through previously asked questions, I either didn't find the answer I wanted or didn't understand it).
I have a dataframe similar to this (just much bigger, with many more rows and columns):
x val1 val2 val3
0 0.0 10.0 NaN NaN
1 0.5 10.5 NaN NaN
2 1.0 11.0 NaN NaN
3 1.5 11.5 NaN 11.60
4 2.0 12.0 NaN 12.08
5 2.5 12.5 12.2 12.56
6 3.0 13.0 19.8 13.04
7 3.5 13.5 13.3 13.52
8 4.0 14.0 19.8 14.00
9 4.5 14.5 14.4 14.48
10 5.0 15.0 19.8 14.96
11 5.5 15.5 15.5 15.44
12 6.0 16.0 19.8 15.92
13 6.5 16.5 16.6 16.40
14 7.0 17.0 19.8 18.00
15 7.5 17.5 17.7 NaN
16 8.0 18.0 19.8 NaN
17 8.5 18.5 18.8 NaN
18 9.0 19.0 19.8 NaN
19 9.5 19.5 19.9 NaN
20 10.0 20.0 19.8 NaN
In the next step, I need to compute the derivative dVal/dx for each of the value columns (in reality I have more than 3 columns, so I need a robust solution in a loop; I can't select the rows manually each time). But because of the NaN values in some of the columns, I am facing the problem that x and val are not of the same dimension. I feel the way to overcome this would be to select only those x intervals for which val is notnull, but I am not able to do that. I am probably making some very stupid mistakes (I am not a programmer and I am very untalented, so please be patient with me:) ).
Here is the code so far (now that I think of it, I may have introduced some mistakes just by leaving some old pieces of code because I've been messing with it for a while, trying different things):
import pandas as pd
import numpy as np
df = pd.read_csv('H:/DocumentsRedir/pokus/dataframe.csv', delimiter=',')
vals = list(df.columns.values)[1:]
for i in vals:
    V = np.asarray(pd.notnull(df[i]))
    mask = pd.notnull(df[i])
    X = np.asarray(df.loc[mask]['x'])
    derivative = np.diff(V) / np.diff(X)
But I am getting this error:
ValueError: operands could not be broadcast together with shapes (20,) (15,)
So, apparently, it did not select only the notnull values...
Is there an obvious mistake that I am making or a different approach that I should adopt? Thanks!
(And another less important question: is np.diff the right function to use here, or would it be better to calculate the differences manually? I'm not finding the numpy documentation very helpful.)
To calculate dVal/dX:
dVal = df.iloc[:, 1:].diff() # `x` is in column 0.
dX = df['x'].diff()
>>> dVal.apply(lambda series: series / dX)
val1 val2 val3
0 NaN NaN NaN
1 1 NaN NaN
2 1 NaN NaN
3 1 NaN NaN
4 1 NaN 0.96
5 1 NaN 0.96
6 1 15.2 0.96
7 1 -13.0 0.96
8 1 13.0 0.96
9 1 -10.8 0.96
10 1 10.8 0.96
11 1 -8.6 0.96
12 1 8.6 0.96
13 1 -6.4 0.96
14 1 6.4 3.20
15 1 -4.2 NaN
16 1 4.2 NaN
17 1 -2.0 NaN
18 1 2.0 NaN
19 1 0.2 NaN
20 1 -0.2 NaN
We difference all columns (except the first one), and then apply a lambda function to each column which divides it by the differences of column x.
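The same can be written without apply, using div with axis=0 to align on the row index (a sketch):
# difference every value column, then divide each one by the
# differenced x column, aligning on the row index
deriv = df.iloc[:, 1:].diff().div(df['x'].diff(), axis=0)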