How to change a NaN value to a particular number in a DataFrame?


DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
OP 1 2.33 1.711 1.218 1.046 1.150 1.025 1.046 1.092 nan -
OP 2 3.043 1.691 1.362 1.174 1.067 1.048 1.051 1.059
OP 3 4.054 1.717 1.238 1.132 1.068 1.056 1.045
OP 4 3.014 1.748 1.327 1.103 1.093 1.116
OP 5 2.798 1.862 1.241 1.242 1.148
OP 6 3.973 1.589 1.553 1.161
OP 7 3.372 1.552 1.458
OP 8 3.359 1.871
OP 9 3.494
OP 10
This is the dataframe DF1.
for ele in DF1:
    x = ele+2.0
    print(x)
This gives the output:
DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
OP 1 4.33 3.711 3.218 3.046 3.150 3.025 3.046 3.092 nan -
OP 2 5.043 3.691 3.362 3.174 3.067 3.048 3.051 3.059
OP 3 6.054 3.717 3.238 3.132 3.068 3.056 3.045
OP 4 5.014 3.748 3.327 3.103 3.093 3.116
OP 5 4.798 3.862 3.241 3.242 3.148
OP 6 5.973 3.589 3.553 3.161
OP 7 5.372 3.552 3.458
OP 8 5.359 3.871
OP 9 5.494
OP 10
But I need output like this:
DP 1 DP 2 DP 3 DP 4 DP 5 DP 6 DP 7 DP 8 DP 9 DP 10
OP 1 4.33 3.711 3.218 3.046 3.150 3.025 3.046 3.092 2.0 -
OP 2 5.043 3.691 3.362 3.174 3.067 3.048 3.051 3.059
OP 3 6.054 3.717 3.238 3.132 3.068 3.056 3.045
OP 4 5.014 3.748 3.327 3.103 3.093 3.116
OP 5 4.798 3.862 3.241 3.242 3.148
OP 6 5.973 3.589 3.553 3.161
OP 7 5.372 3.552 3.458
OP 8 5.359 3.871
OP 9 5.494
OP 10
That is, when I add a number to NaN, the result should be that number.

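Does this help?
import numpy as np
for col in DF1.columns:
    for ind, val in DF1[col].items():
        # Replace NaN with 2.0; otherwise add 2.0 to the value
        if np.isnan(val):
            DF1.loc[ind, col] = 2.0
        else:
            DF1.loc[ind, col] = val + 2.0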

As you want:
import pandas as pd
import numpy as np
data = [[1,10],[2,12],[3,13],[4,10],[5,12],[np.nan,13]]
df = pd.DataFrame(data,columns=['a','b'],dtype=float)
for element in df['a']:
    # Treat NaN as 0, so NaN + 2.0 gives 2.0
    if np.isnan(element):
        x = 2.0
    else:
        x = element + 2.0
    print(x)
Easy way (fill NaN with 0, then add):
df.fillna(0) + 2.0

One way is to simply redefine addition so that x+nan evaluates to x, but that's rather dangerous. A safer option is to define a custom function:
import numpy as np
def nan_sum(a,b):
    # NaN is truthy, so test for it explicitly rather than with "not a"
    if np.isnan(a):
        return b
    if np.isnan(b):
        return a
    return a+b
Then you can apply it to the dataframe: DF1.applymap(lambda x: nan_sum(x,2.0))

You can use the np.nan_to_num() function, which by default replaces NaN with 0.0.
import numpy as np
df.applymap(lambda x: np.nan_to_num(x)+2)
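The element-wise answers above can also be written as a single vectorized call. This is a minimal sketch, assuming every NaN in the frame should count as 0 before the addition; the small frame `df` is a hypothetical stand-in for DF1, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for DF1 with a few missing cells
df = pd.DataFrame(
    {"DP 1": [2.33, 3.043, np.nan], "DP 2": [1.711, np.nan, np.nan]},
    index=["OP 1", "OP 2", "OP 3"],
)

# fill_value=0 replaces NaN with 0 before adding, so NaN + 2.0 gives 2.0
result = df.add(2.0, fill_value=0)
print(result)
```

Note that this fills every missing cell, including the empty lower triangle, so mask those cells again afterwards if they should stay empty.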

Related

Apply custom rolling function to pandas dataframe with datetime index

I have a pandas dataframe on which I wish to apply my own custom rolling function as follows:
def testms(x, field):
    mu = np.sum(x[field])
    si = np.sum(x[field])/len(x[field])
    x['mu'] = mu
    x['si'] = si
    return x
df2 = pd.concat([pd.DataFrame({'A':[1,1,1,1,1,2,2,2,2,2]}),
                 pd.DataFrame({'B':random_dates(pd.to_datetime('2015-01-01'),
                                                pd.to_datetime('2018-01-01'), 10)}),
                 pd.DataFrame({'C':np.random.rand(10)})],axis=1)
df2
A B C
0 1 2016-08-25 01:09:42.953011200 0.791725
1 1 2017-02-23 13:30:20.296310399 0.528895
2 1 2016-10-23 05:33:14.994806400 0.568045
3 1 2016-08-20 17:41:03.991027200 0.925597
4 1 2016-04-09 17:59:00.805200000 0.071036
5 2 2016-12-09 13:06:00.751737600 0.087129
6 2 2016-04-25 00:47:45.953232000 0.020218
7 2 2017-09-05 06:35:58.432531200 0.832620
8 2 2017-11-23 03:18:47.370528000 0.778157
9 2 2016-02-25 15:14:53.907532800 0.870012
tester = lambda x: testms(x, 'C')
df2.set_index('B').groupby('A')['C'].rolling('90D', min_periods=1).apply(tester).reset_index()
However when I apply the above code, I get the following error:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Rolling.apply works differently from GroupBy.apply: it processes each column separately and can return only a scalar per window, not multiple columns.
So your solution needs two functions. You cannot specify the column inside the function; instead, the column to process is selected after groupby:
def testms1(x):
    mu = np.sum(x)
    return mu
def testms2(x):
    #same as mean
    #si = np.sum(x)/len(x)
    si = np.mean(x)
    return si
tester1 = lambda x: testms1(x)
tester2 = lambda x: testms2(x)
r = df2.set_index('B').groupby('A')['C'].rolling('90D', min_periods=1)
s1 = r.apply(tester1, raw=False).rename('mu')
s2 = r.apply(tester2, raw=False).rename('si')
df = pd.concat([s1, s2], axis=1).reset_index()
print (df)
A B mu si
0 1 2016-08-25 01:09:42.953011200 0.791725 0.791725
1 1 2017-02-23 13:30:20.296310399 0.528895 0.528895
2 1 2016-10-23 05:33:14.994806400 1.096940 0.548470
3 1 2016-08-20 17:41:03.991027200 2.022537 0.674179
4 1 2016-04-09 17:59:00.805200000 2.093573 0.523393
5 2 2016-12-09 13:06:00.751737600 0.087129 0.087129
6 2 2016-04-25 00:47:45.953232000 0.107347 0.053673
7 2 2017-09-05 06:35:58.432531200 0.832620 0.832620
8 2 2017-11-23 03:18:47.370528000 1.610777 0.805389
9 2 2016-02-25 15:14:53.907532800 2.480789 0.826930
Alternative solution with Rolling.aggregate:
r = df2.set_index('B').groupby('A')['C'].rolling('90D', min_periods=1)
df1 = r.agg(['sum','mean']).rename(columns={'sum':'mu', 'mean':'si'}).reset_index()
print (df1)
A B mu si
0 1 2016-08-25 01:09:42.953011200 0.791725 0.791725
1 1 2017-02-23 13:30:20.296310399 0.528895 0.528895
2 1 2016-10-23 05:33:14.994806400 1.096940 0.548470
3 1 2016-08-20 17:41:03.991027200 2.022537 0.674179
4 1 2016-04-09 17:59:00.805200000 2.093573 0.523393
5 2 2016-12-09 13:06:00.751737600 0.087129 0.087129
6 2 2016-04-25 00:47:45.953232000 0.107347 0.053673
7 2 2017-09-05 06:35:58.432531200 0.832620 0.832620
8 2 2017-11-23 03:18:47.370528000 1.610777 0.805389
9 2 2016-02-25 15:14:53.907532800 2.480789 0.826930
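Since random_dates is not defined in the question, here is a self-contained sketch of the same aggregate approach on a tiny frame with fixed dates. The data is hypothetical, and the rows are sorted by A and B first on the assumption that recent pandas versions require a monotonic datetime index within each group for time-based windows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups with fixed dates instead of random_dates()
df2 = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": pd.to_datetime(["2016-01-01", "2016-02-01", "2016-06-01",
                         "2016-01-15", "2016-03-01", "2016-09-01"]),
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

out = (
    df2.sort_values(["A", "B"])        # sort so each group is monotonic in B
       .set_index("B")
       .groupby("A")["C"]
       .rolling("90D", min_periods=1)  # 90-day window ending at each row
       .agg(["sum", "mean"])
       .rename(columns={"sum": "mu", "mean": "si"})
       .reset_index()
)
print(out)
```

Each row's mu and si cover only the rows of the same group whose dates fall within the 90 days up to and including that row.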

How to create a triangular moving average in Python using a for loop

I use Python pandas to calculate the following formula:
(https://i.stack.imgur.com/XIKBz.png)
I do it in Python like this:
EURUSD['SMA2'] = EURUSD['Close'].rolling(2).mean()
EURUSD['TMA2'] = (EURUSD['Close'] + EURUSD['SMA2']) / 2
The problem is that the code gets long when I calculate TMA 100, so I need a for loop to change the TMA period easily.
Thanks in advance.
Edited:
I found this code, but it raises an error:
values = []
for i in range(1,201):
    values.append(eurusd['Close']).rolling(window=i).mean()
values.mean()

TMA is an average of averages.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
print(df)
# df['mean0']=df.mean(0)
df['mean1']=df.mean(1)
print(df)
df['TMA'] = df['mean1'].rolling(window=10,center=False).mean()
print(df)
Or you can easily print it.
print(df["mean1"].mean())
Here is how it looks:
0 1 2 3 4
0 0.643560 0.412046 0.072525 0.618968 0.080146
1 0.018226 0.222212 0.077592 0.125714 0.595707
2 0.652139 0.907341 0.581802 0.021503 0.849562
3 0.129509 0.315618 0.711265 0.812318 0.757575
4 0.881567 0.455848 0.470282 0.367477 0.326812
5 0.102455 0.156075 0.272582 0.719158 0.266293
6 0.412049 0.527936 0.054381 0.587994 0.442144
7 0.063904 0.635857 0.244050 0.002459 0.423960
8 0.446264 0.116646 0.990394 0.678823 0.027085
9 0.951547 0.947705 0.080846 0.848772 0.699036
0 1 2 3 4 mean1
0 0.643560 0.412046 0.072525 0.618968 0.080146 0.365449
1 0.018226 0.222212 0.077592 0.125714 0.595707 0.207890
2 0.652139 0.907341 0.581802 0.021503 0.849562 0.602470
3 0.129509 0.315618 0.711265 0.812318 0.757575 0.545257
4 0.881567 0.455848 0.470282 0.367477 0.326812 0.500397
5 0.102455 0.156075 0.272582 0.719158 0.266293 0.303313
6 0.412049 0.527936 0.054381 0.587994 0.442144 0.404901
7 0.063904 0.635857 0.244050 0.002459 0.423960 0.274046
8 0.446264 0.116646 0.990394 0.678823 0.027085 0.451842
9 0.951547 0.947705 0.080846 0.848772 0.699036 0.705581
0 1 2 3 4 mean1 TMA
0 0.643560 0.412046 0.072525 0.618968 0.080146 0.365449 NaN
1 0.018226 0.222212 0.077592 0.125714 0.595707 0.207890 NaN
2 0.652139 0.907341 0.581802 0.021503 0.849562 0.602470 NaN
3 0.129509 0.315618 0.711265 0.812318 0.757575 0.545257 NaN
4 0.881567 0.455848 0.470282 0.367477 0.326812 0.500397 NaN
5 0.102455 0.156075 0.272582 0.719158 0.266293 0.303313 NaN
6 0.412049 0.527936 0.054381 0.587994 0.442144 0.404901 NaN
7 0.063904 0.635857 0.244050 0.002459 0.423960 0.274046 NaN
8 0.446264 0.116646 0.990394 0.678823 0.027085 0.451842 NaN
9 0.951547 0.947705 0.080846 0.848772 0.699036 0.705581 0.436115
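To make the period easy to change, the "average of averages" idea can be wrapped in a function. This is only a sketch, assuming the common definition of a triangular moving average as a simple moving average applied twice with the same window; the formula in the question's image is not reproduced here, so it may differ. The helper name `tma` and the sample prices are hypothetical.

```python
import pandas as pd

def tma(series, period):
    """Sketch of a triangular moving average: an SMA of an SMA (assumed definition)."""
    sma = series.rolling(window=period).mean()
    return sma.rolling(window=period).mean()

# Hypothetical closing prices standing in for EURUSD['Close']
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="Close")

# Loop over several periods and collect one TMA column per period
result = pd.DataFrame({"Close": close})
for period in (2, 3):
    result[f"TMA{period}"] = tma(close, period)
print(result)
```

The first 2 * (period - 1) rows of each TMA column are NaN because both rolling windows need a full set of values.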

Transform dataframe value to range value in Python 3

I have a dataframe with the values:
3.05
35.97
49.11
48.80
48.02
10.61
25.69
6.02
55.36
0.42
47.87
2.26
54.43
8.85
8.75
14.29
41.29
35.69
44.27
1.08
I want to map each value to the range it falls in and give it a new value.
From the df we know the min value is 0.42 and the max value is 55.36.
I want to divide the range from min to max into 4 groups:
0.42 - 14.15 transform to 1
14.16 - 27.88 transform to 2
27.89 - 41.61 transform to 3
41.62 - 55.36 transform to 4
So the result I expect is
1
3
4
4
4
1
2
1
4
1
4
1
4
1
1
2
3
3
4
1

This is normally called binning; in pandas the function is called cut. Sample code is below:
import pandas as pd
# Create the numbers, in a column called "nums"
data = {'nums': [3.05, 35.97, 49.11, 48.80, 48.02, 10.61, 25.69, 6.02, 55.36, 0.42, 47.87, 2.26, 54.43, 8.85, 8.75, 14.29, 41.29, 35.69, 44.27, 1.08]}
# Create the labels for the bins
bin_labels = [1,2,3,4]
# Create the dataframe object from the data
# (DataFrame.from_items was removed in pandas 1.0, so use the constructor)
df = pd.DataFrame(data)
# Define the edges of the bins
bins = [0.41, 14.16, 27.89, 41.62, 55.37]
# Create the "bins" column with the cut function, using the bins and labels
df['bins'] = pd.cut(df['nums'], bins=bins, labels=bin_labels)
This creates a dataframe which has the following structure:
print(df)
nums bins
0 3.05 1
1 35.97 3
2 49.11 4
3 48.80 4
4 48.02 4
5 10.61 1
6 25.69 2
7 6.02 1
8 55.36 4
9 0.42 1
10 47.87 4
11 2.26 1
12 54.43 4
13 8.85 1
14 8.75 1
15 14.29 2
16 41.29 3
17 35.69 3
18 44.27 4
19 1.08 1
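If the bin edges should come from the data rather than being typed in, pd.cut also accepts an integer number of bins. A short sketch, assuming equal-width bins between the minimum and maximum are what is wanted, which matches the four ranges in the question to within rounding:

```python
import pandas as pd

nums = pd.Series([3.05, 35.97, 49.11, 48.80, 48.02, 10.61, 25.69, 6.02,
                  55.36, 0.42, 47.87, 2.26, 54.43, 8.85, 8.75, 14.29,
                  41.29, 35.69, 44.27, 1.08], name="nums")

# bins=4 splits the range from min to max into 4 equal-width intervals
groups = pd.cut(nums, bins=4, labels=[1, 2, 3, 4]).astype(int)
print(groups.tolist())
```

With an integer bins argument, pandas widens the range slightly so that the minimum value still lands in the first bin.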

You could construct a function like the following to have full control over the process:
def transform(l):
    l2 = []
    for i in l:
        if 0.42 <= i <= 14.15:
            l2.append(1)
        elif i <= 27.88:
            l2.append(2)
        elif i <= 41.61:
            l2.append(3)
        elif i <= 55.36:
            l2.append(4)
    return l2
df['nums'] = transform(df['nums'])

Finding smallest value in python

I am looking for some help with creating a code for the following in python
I have made an attempt at an answer but I am not quite sure how to finish it. Here is what I have so far
import numpy as np
import math
from numpy import cos
x=10**(-p)
funct = (1-math.cos(x))/x
So I have defined the function that I am trying to calculate; I believe I did that correctly with
funct = (1-math.cos(x))/x
I have set x with
x=10**(-p)
But how do I find the smallest value of p for which the result has no correct significant digit at x = 10**-p in standard double precision?
Do I need to somehow use
print(min(funct))
Looking for some help with this, thanks!
Edit: new code
import numpy as np
import math
for p in range(10):
    x=10.0**-p;
    result = (1-np.cos(x))/x
    print (p)
    print (result)
    Test = 2*np.sin(x/2)**2/x
    print (p)
    print(Test)
gives the results:
0
0.459697694132
0
0.459697694132
1
0.0499583472197
1
0.0499583472197
2
0.00499995833347
2
0.00499995833347
3
0.000499999958326
3
0.000499999958333
4
4.99999996961e-05
4
4.99999999583e-05
5
5.0000004137e-06
5
4.99999999996e-06
6
5.00044450291e-07
6
5e-07
7
4.99600361081e-08
7
5e-08
8
0.0
8
5e-09
9
0.0
9
5e-10

With the loop
for p in range(15): x=10.0**-p; print(p, x, (1-np.cos(x))/x, 2*np.sin(x/2)**2/x)
I get the values for the expression and a theoretically equivalent expression
p x (1-cos(x))/x 2*sin²(x/2)/x
0 1.0 0.459697694132 0.459697694132
1 0.1 0.0499583472197 0.0499583472197
2 0.01 0.00499995833347 0.00499995833347
3 0.001 0.000499999958326 0.000499999958333
4 0.0001 4.99999996961e-05 4.99999999583e-05
5 1e-05 5.0000004137e-06 4.99999999996e-06
6 1e-06 5.00044450291e-07 5e-07
7 1e-07 4.99600361081e-08 5e-08
8 1e-08 0.0 5e-09
9 1e-09 0.0 5e-10
10 1e-10 0.0 5e-11
11 1e-11 0.0 5e-12
12 1e-12 0.0 5e-13
13 1e-13 0.0 5e-14
14 1e-14 0.0 5e-15
but I have no idea how to interpret the task to give a valid answer. Could be p=5 or could be p=8.
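One way to make the comparison concrete is to treat the sin-based form as the reference and compute the relative error of the direct formula for each p. This is a sketch under an assumed reading of the task: "no correct significant digit" is taken to mean a relative error of at least 0.5, which is an interpretation, not something the question states. The names `naive`, `stable` and `first_bad` are illustrative.

```python
import math

def naive(x):
    # Direct formula: loses digits to cancellation when cos(x) is close to 1
    return (1 - math.cos(x)) / x

def stable(x):
    # Algebraically equivalent form that avoids the subtraction
    return 2 * math.sin(x / 2) ** 2 / x

first_bad = None
for p in range(15):
    x = 10.0 ** -p
    rel_err = abs(naive(x) - stable(x)) / stable(x)
    print(p, rel_err)
    # Assumed criterion: relative error >= 0.5 means no correct digit
    if first_bad is None and rel_err >= 0.5:
        first_bad = p

print("first p with no correct significant digit:", first_bad)
```

Under this criterion the answer is the first p at which the direct formula collapses to 0.0, rather than the point where the first few digits start to drift.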

Multiplying data within columns python

I've been working on this all morning and for the life of me cannot figure it out. I'm sure this is very basic, but I've become so frustrated my mind is being clouded. I'm attempting to calculate the total return of a portfolio of securities at each date (monthly).
The formula is (1 + r1) * (1 + r2) * ... * (1 + rt) - 1
Here is what I'm working with:
Adj_Returns = Adj_Close/Adj_Close.shift(1)-1
Adj_Returns['Risk Parity Portfolio'] = (Adj_Returns.loc['2003-01-31':]*Weights.shift(1)).sum(axis = 1)
Adj_Returns
SPY IYR LQD Risk Parity Portfolio
Date
2002-12-31 NaN NaN NaN 0.000000
2003-01-31 -0.019802 -0.014723 0.000774 -0.006840
2003-02-28 -0.013479 0.019342 0.015533 0.011701
2003-03-31 -0.001885 0.010015 0.001564 0.003556
2003-04-30 0.088985 0.045647 0.020696 0.036997
For example, with 2002-12-31 being base 100 for risk parity, I want 2003-01-31 to be 99.316 (100 * (1-0.006840)), 2003-02-28 to be 100.478 (99.316 * (1+ 0.011701)) so on and so forth.
Thanks!!

You want to use pd.DataFrame.cumprod
df.add(1).cumprod().sub(1).sum(1)
Consider the dataframe of returns df
np.random.seed([3,1415])
df = pd.DataFrame(np.random.normal(.025, .03, (10, 5)), columns=list('ABCDE'))
df
A B C D E
0 -0.038892 -0.013054 -0.034115 -0.042772 0.014521
1 0.024191 0.034487 0.035463 0.046461 0.048123
2 0.006754 0.035572 0.014424 0.012524 -0.002347
3 0.020724 0.047405 -0.020125 0.043341 0.037007
4 -0.003783 0.069827 0.014605 -0.019147 0.056897
5 0.056890 0.042756 0.033886 0.001758 0.049944
6 0.069609 0.032687 -0.001997 0.036253 0.009415
7 0.026503 0.053499 -0.006013 0.053447 0.047013
8 0.062084 0.029664 -0.015238 0.029886 0.062748
9 0.048341 0.065248 -0.024081 0.019139 0.028955
We can see the cumulative return or total return is
df.add(1).cumprod().sub(1)
A B C D E
0 -0.038892 -0.013054 -0.034115 -0.042772 0.014521
1 -0.015641 0.020983 0.000139 0.001702 0.063343
2 -0.008993 0.057301 0.014565 0.014247 0.060847
3 0.011544 0.107423 -0.005853 0.058206 0.100105
4 0.007717 0.184750 0.008666 0.037944 0.162699
5 0.065046 0.235405 0.042847 0.039769 0.220768
6 0.139183 0.275786 0.040764 0.077464 0.232261
7 0.169375 0.344039 0.034505 0.135051 0.290194
8 0.241974 0.383909 0.018742 0.168973 0.371151
9 0.302013 0.474207 -0.005791 0.191346 0.410852
Plot it
df.add(1).cumprod().sub(1).plot()
Add sum of returns to new column
df.assign(Portfolio=df.add(1).cumprod().sub(1).sum(1))
A B C D E Portfolio
0 -0.038892 -0.013054 -0.034115 -0.042772 0.014521 -0.114311
1 0.024191 0.034487 0.035463 0.046461 0.048123 0.070526
2 0.006754 0.035572 0.014424 0.012524 -0.002347 0.137967
3 0.020724 0.047405 -0.020125 0.043341 0.037007 0.271425
4 -0.003783 0.069827 0.014605 -0.019147 0.056897 0.401777
5 0.056890 0.042756 0.033886 0.001758 0.049944 0.603835
6 0.069609 0.032687 -0.001997 0.036253 0.009415 0.765459
7 0.026503 0.053499 -0.006013 0.053447 0.047013 0.973165
8 0.062084 0.029664 -0.015238 0.029886 0.062748 1.184749
9 0.048341 0.065248 -0.024081 0.019139 0.028955 1.372626
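For the base-100 series the question asks for, the same cumprod idea can be applied to the portfolio return column alone. A minimal sketch using the five portfolio returns shown in the question; the variable names `returns`, `index_level` and `total_return` are illustrative.

```python
import pandas as pd

# Portfolio returns copied from the question's 'Risk Parity Portfolio' column
returns = pd.Series(
    [0.0, -0.006840, 0.011701, 0.003556, 0.036997],
    index=pd.to_datetime(["2002-12-31", "2003-01-31", "2003-02-28",
                          "2003-03-31", "2003-04-30"]),
    name="Risk Parity Portfolio",
)

# (1 + r1) * (1 + r2) * ... * (1 + rt), scaled so the first date is 100
index_level = 100 * returns.add(1).cumprod()

# Cumulative total return: the same product minus 1
total_return = returns.add(1).cumprod().sub(1)

print(index_level.round(3))
```

The second and third values reproduce the 99.316 and 100.478 worked out by hand in the question.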
