Issue appending data using pandas - Python

Problem: I want to build logic that takes attendance data (Employee Id, In Time) and returns a data frame with employee id, in time, attendance date, and the slot in which the employee entered. (For example, if the In Time is 9:30:00 on date 14-10-2019, a value of 1 should be inserted for that date in the 9:30 slot column.)
An example is given below.
I have tried many times to build the logic for this problem but failed.
I have a dataset that looks like this.
I want an output like this, so that whatever time the employee enters, a value is inserted into that time-slot column only:
This is my code, but it only repeats the last loop.
temp = []
for date in nf['DaiGong']:
    for en in nf['EnNo']:
        for i in nf['DateTime']:
            col = ['EnNo', 'Date', 'InTime', '9:30-10:30', '10:30-11:00', '11:00-11:30', '11:30-12:30', '12:30-13:00', '13:00-13:30']
            ndf = pd.DataFrame(columns=col)
            if i < '10:30:00' and i > '09:30:00':
                temp.append(1)
                ndf['9:30-10:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '11:00:00' and i > '10:30:00':
                temp.append(1)
                ndf['10:30-11:00'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '11:30:00' and i > '11:00:00':
                temp.append(1)
                ndf['11:00-11:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '12:30:00' and i > '11:30:00':
                temp.append(1)
                ndf['11:30-12:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '13:00:00' and i > '12:30:00':
                temp.append(1)
                ndf['12:30-13:00'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
            elif i < '13:30:00' and i > '13:00:00':
                temp.append(1)
                ndf['13:00-13:30'] = temp
                ndf['InTime'] = i
                ndf['Date'] = date
                ndf['EnNo'] = en
This is the output of my code.

IIUC,
df = pd.DataFrame({'EnNo': [2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'DaiGong': ['2019-10-12', '2019-10-13', '2019-10-14', '2019-10-15', '2019-10-16', '2019-10-12', '2019-10-13', '2019-10-14', '2019-10-15'],
                   'DateTime': ['09:53:56', '10:53:56', '09:23:56', '11:53:56', '11:23:56', '10:33:56', '12:53:56', '12:23:56', '09:53:56']})
df
DaiGong DateTime EnNo
0 2019-10-12 09:53:56 2
1 2019-10-13 10:53:56 2
2 2019-10-14 09:23:56 2
3 2019-10-15 11:53:56 2
4 2019-10-16 11:23:56 2
5 2019-10-12 10:33:56 3
6 2019-10-13 12:53:56 3
7 2019-10-14 12:23:56 3
8 2019-10-15 09:53:56 3
import datetime
df['DateTime'] = pd.to_datetime(df['DateTime']).dt.time  # converting to datetime
def time_range(row):  # I only wrote two conditions - add more
    i = row['DateTime']
    if i < datetime.time(10, 30, 0) and i > datetime.time(9, 30, 0):
        return '9:30-10:30'
    elif i < datetime.time(11, 0, 0) and i > datetime.time(10, 30, 0):
        return '10:30-11:00'
    else:
        return 'greater than 11:00'
df['time range'] = df.apply(time_range, axis=1)
df1 = pd.concat([df[['EnNo', 'DaiGong', 'DateTime']], pd.get_dummies(df['time range'])], axis=1)
df1
EnNo DaiGong DateTime 10:30-11:00 9:30-10:30 greater than 11:00
0 2 2019-10-12 09:53:56 0 1 0
1 2 2019-10-13 10:53:56 1 0 0
2 2 2019-10-14 09:23:56 0 0 1
3 2 2019-10-15 11:53:56 0 0 1
4 2 2019-10-16 11:23:56 0 0 1
5 3 2019-10-12 10:33:56 1 0 0
6 3 2019-10-13 12:53:56 0 0 1
7 3 2019-10-14 12:23:56 0 0 1
8 3 2019-10-15 09:53:56 0 1 0
To get the sum of counts by employee:
df1.groupby(['EnNo'], as_index=False).sum()
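If you would rather not hand-write every condition, pd.cut can assign the slot labels in one step. A minimal sketch, assuming the same df as above (bins and labels chosen to match the slots in your question; times outside 09:30-13:30 fall outside the bins and become NaN, which get_dummies then ignores):
bins = pd.to_timedelta(['09:30:00', '10:30:00', '11:00:00', '11:30:00',
                        '12:30:00', '13:00:00', '13:30:00'])
labels = ['9:30-10:30', '10:30-11:00', '11:00-11:30',
          '11:30-12:30', '12:30-13:00', '13:00-13:30']
# df['DateTime'] holds datetime.time objects here, so go through str -> timedelta
df['time range'] = pd.cut(pd.to_timedelta(df['DateTime'].astype(str)),
                          bins=bins, labels=labels)
df1 = pd.concat([df[['EnNo', 'DaiGong', 'DateTime']],
                 pd.get_dummies(df['time range'])], axis=1)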
Let me know if you have any questions

My test data:
df:
EnNo DaiGong DateTime
2 2019-10-12 09:53:56
2 2019-10-13 09:42:00
2 2019-10-14 12:00:01
1 2019-11-01 11:12:00
1 2019-11-02 10:13:45
Create helper data:
tdr = pd.timedelta_range("09:00:00", "12:30:00", freq="30T")
s = pd.Series(len(tdr) * ["-"])
s[0] = 1
cls = [t.rsplit(":", maxsplit=1)[0] for t in tdr.astype(str)]
cols = [t1 + "-" + t2 for (t1, t2) in zip(cls, cls[1:])]
cols.append(cls[-1] + "-")
tdr:
TimedeltaIndex(['09:00:00', '09:30:00', '10:00:00', '10:30:00', '11:00:00', '11:30:00', '12:00:00', '12:30:00'], dtype='timedelta64[ns]', freq='30T')
cols:
['09:00-09:30', '09:30-10:00', '10:00-10:30', '10:30-11:00', '11:00-11:30', '11:30-12:00', '12:00-12:30', '12:30-']
s:
0 1
1 -
2 -
3 -
4 -
5 -
6 -
7 -
dtype: object
Use 'apply' and 'searchsorted' to get time slots:
df2 = df.DateTime.apply(lambda t:
                        s.shift(tdr.searchsorted(t) - 1, fill_value="-"))
df2.columns = cols
df2:
09:00-09:30 09:30-10:00 10:00-10:30 10:30-11:00 11:00-11:30 11:30-12:00 12:00-12:30 12:30-
0 - 1 - - - - - -
1 - 1 - - - - - -
2 - - - - - - 1 -
3 - - - - 1 - - -
4 - - 1 - - - - -
Finally, concatenate the two data frames:
df_rslt = pd.concat([df, df2], axis=1)
df_rslt:
EnNo DaiGong DateTime 09:00-09:30 09:30-10:00 10:00-10:30 10:30-11:00 11:00-11:30 11:30-12:00 12:00-12:30 12:30-
0 2 2019-10-12 09:53:56 - 1 - - - - - -
1 2 2019-10-13 09:42:00 - 1 - - - - - -
2 2 2019-10-14 12:00:01 - - - - - - 1 -
3 1 2019-11-01 11:12:00 - - - - 1 - - -
4 1 2019-11-02 10:13:45 - - 1 - - - - -
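To see why searchsorted lands on the right slot, a small check (assuming the tdr above):
import pandas as pd

tdr = pd.timedelta_range("09:00:00", "12:30:00", freq="30T")
# '09:53:56' would be inserted at position 2 (between '09:30:00' and '10:00:00'),
# so s is shifted by 2 - 1 = 1 and the marker 1 moves from the first slot
# ('09:00-09:30') to the second ('09:30-10:00').
print(tdr.searchsorted(pd.Timedelta("09:53:56")))  # 2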

Related

if statement in for loop with pandas dataframes

I am making a Dollar Cost Average code where I want to choose between 2 equations. I made an Excel spreadsheet that I'm trying to port over to Python. I've gotten pretty far except for the last step, which has had me searching for a solution for 3 weeks now. The errors happen when I loop through the df with a for loop. I would like to check a column with an if statement: if it's true, apply one equation; if false, apply the other. I can get the for loop to work and the if statements to work, but not combined. See all the commented-out code for what's been tried. I have tried np.where instead of the if statements, as well as .loc, lambda, and a list comprehension. Nothing is working, please help. FYI, the code in question refers to the ['trend bal'] column. ***See the end for the corrected code.
What the df looks like:
Index timestamp Open High Low ... rate account bal invested ST_10_1.0 if trend
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... 1.000000 $10,000.00 10000 1 0
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... 0.015242 $10,252.42 10100 1 0
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... 0.005779 $10,411.67 10200 1 0
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... -0.016944 $10,335.25 10300 1 0
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... -0.005739 $10,375.93 10400 1 0
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... 0.005073 $10,528.57 10500 1 0
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... -0.022094 $10,395.95 10600 1 0
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... 0.007872 $10,577.79 10700 1 0
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... 0.018225 $10,870.57 10800 1 0
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... 0.016445 $11,149.33 10900 1 0
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... 0.013307 $11,397.70 11000 1 0
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... 0.020009 $11,725.75 11100 1 0
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... -0.003125 $11,789.11 11200 1 0
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... 0.003227 $11,927.15 11300 1 0
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... -0.021997 $11,764.79 11400 1 0
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... -0.012230 $11,720.92 11500 -1 100
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... 0.038249 $12,269.23 11600 -1 100
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... -0.019393 $12,131.29 11700 1 0
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... 0.022757 $12,507.36 11800 1 0
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... 0.008547 $12,714.25 11900 1 0
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... -0.018705 $12,576.44 12000 1 0
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... -0.003032 $12,638.31 12100 1 0
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... -0.056813 $12,020.29 12200 1 0
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... 0.007710 $12,212.97 12300 -1 100
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... 0.015497 $12,502.23 12400 -1 100
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... -0.018196 $12,374.75 12500 1 0
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... -0.015790 $12,279.35 12600 1 0
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... 0.008227 $12,480.38 12700 1 0
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... -0.012722 $12,421.61 12800 1 0
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... -0.028774 $12,164.19 12900 -1 100
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... 0.061558 $13,012.99 13000 -1 100
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... 0.017911 $13,346.07 13100 1 0
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... 0.000616 $13,454.30 13200 1 0
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... -0.012666 $13,383.88 13300 1 0
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... -0.021320 $13,198.53 13400 1 0
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... -0.027503 $12,935.53 13500 -1 100
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... -0.032738 $12,612.05 13600 -1 100
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... -0.002079 $12,685.83 13700 -1 100
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... -0.024119 $12,479.86 13800 -1 100
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... -0.030451 $12,199.84 13900 -1 100
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... 0.065844 $13,103.12 14000 -1 100
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... -0.011952 $13,046.51 14100 1 0
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... -0.050548 $12,487.03 14200 1 0
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... -0.057941 $11,863.52 14300 -1 100
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... 0.064465 $12,728.31 14400 -1 100
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... -0.022090 $12,547.14 14500 -1 100
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... 0.019358 $12,890.03 14600 -1 100
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... -0.009289 $12,870.29 14700 -1 100
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... 0.025489 $13,298.35 14800 -1 100
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... 0.042573 $13,964.51 14900 1 0
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... 0.003607 $14,114.88 15000 1 0
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... 0.032558 $14,674.44 15100 1 0
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... 0.000839 $14,786.75 15200 1 0
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... -0.012900 $14,696.00 15300 1 0
What it should look like:
Index timestamp Open High Low ... account bal invested ST_10_1.0 if trend trend bal
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... $10,000.00 10000 1 0 $10,000.00
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... $10,252.42 10100 1 0 $10,252.42
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... $10,411.67 10200 1 0 $10,411.67
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... $10,335.25 10300 1 0 $10,335.25
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... $10,375.93 10400 1 0 $10,375.93
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... $10,528.57 10500 1 0 $10,528.57
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... $10,395.95 10600 1 0 $10,395.95
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... $10,577.79 10700 1 0 $10,577.79
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... $10,870.57 10800 1 0 $10,870.57
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... $11,149.33 10900 1 0 $11,149.33
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... $11,397.70 11000 1 0 $11,397.70
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... $11,725.75 11100 1 0 $11,725.75
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... $11,789.11 11200 1 0 $11,789.11
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... $11,927.15 11300 1 0 $11,927.15
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... $11,764.79 11400 1 0 $11,764.79
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... $11,720.92 11500 -1 100 $11,720.92
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... $12,269.23 11600 -1 100 $11,820.92
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... $12,131.29 11700 1 0 $11,920.92
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... $12,507.36 11800 1 0 $12,292.19
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... $12,714.25 11900 1 0 $12,497.25
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... $12,576.44 12000 1 0 $12,363.49
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... $12,638.31 12100 1 0 $12,426.01
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... $12,020.29 12200 1 0 $11,820.05
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... $12,212.97 12300 -1 100 $12,011.19
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... $12,502.23 12400 -1 100 $12,111.19
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... $12,374.75 12500 1 0 $12,211.19
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... $12,279.35 12600 1 0 $12,118.38
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... $12,480.38 12700 1 0 $12,318.08
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... $12,421.61 12800 1 0 $12,261.37
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... $12,164.19 12900 -1 100 $12,008.56
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... $13,012.99 13000 -1 100 $12,108.56
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... $13,346.07 13100 1 0 $12,208.56
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... $13,454.30 13200 1 0 $12,316.09
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... $13,383.88 13300 1 0 $12,260.08
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... $13,198.53 13400 1 0 $12,098.70
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... $12,935.53 13500 -1 100 $11,865.95
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... $12,612.05 13600 -1 100 $11,965.95
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... $12,685.83 13700 -1 100 $12,065.95
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... $12,479.86 13800 -1 100 $12,165.95
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... $12,199.84 13900 -1 100 $12,265.95
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... $13,103.12 14000 -1 100 $12,365.95
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... $13,046.51 14100 1 0 $12,465.95
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... $12,487.03 14200 1 0 $11,935.81
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... $11,863.52 14300 -1 100 $11,344.24
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... $12,728.31 14400 -1 100 $11,444.24
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... $12,547.14 14500 -1 100 $11,544.24
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... $12,890.03 14600 -1 100 $11,644.24
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... $12,870.29 14700 -1 100 $11,744.24
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... $13,298.35 14800 -1 100 $11,844.24
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... $13,964.51 14900 1 0 $11,944.24
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... $14,114.88 15000 1 0 $12,087.33
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... $14,674.44 15100 1 0 $12,580.87
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... $14,786.75 15200 1 0 $12,691.42
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... $14,696.00 15300 1 0 $12,627.70
Python Code:
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
import ccxt
from matplotlib import pyplot as plt
import config
import schedule
import pandas as pd
import pandas_ta as ta
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import numpy as np
from datetime import datetime
import time
import yfinance as yf
ticker = yf.Ticker('^GSPC')
df = ticker.history(period="1y", interval="1wk")
df.reset_index(inplace=True)
df.rename(columns = {'Date':'timestamp'}, inplace = True)
#df.drop(columns ={'Open', 'High', 'Low', 'Volume'}, inplace=True, axis=1)
df.drop(columns ={'Dividends', 'Stock Splits'}, inplace=True, axis=1)
# df['Close'].ffill(axis = 0, inplace = True)
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
    df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'if trend'] = 0
df.loc[df[f'ST_{ATR}_{Mult}'] == -1, 'if trend'] = weekly
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
#     np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
#     if df[f'ST_{ATR}_{Mult}'] == 1:
#         df.loc[i, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
#     else:
#         df.loc[i, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
# for i in range(1, len(df)):
#     df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
#     df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
#df.to_csv('GSPC.csv',index=False,mode='a')
# plt.plot(df['timestamp'], df['account bal'])
# plt.plot(df['timestamp'], df['invested'])
# plt.plot(df['timestamp'], df['close'])
# plt.show()
print(df)
What some of the errors look like:
np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
File "<__array_function__ internals>", line 180, in where
ValueError: operands could not be broadcast together with shapes (36,) () (54,)
Another error:
line 1535, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
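The first error comes from handing np.where three arguments of incompatible shapes (a 36-element condition, a scalar, and a 54-element Series, per the message above). The second happens whenever a whole Series is used where Python expects a single True/False, e.g. `if df[f'ST_{ATR}_{Mult}'] == 1:`. A minimal illustration of the latter:
import pandas as pd

s = pd.Series([1, -1, 1])
try:
    if s == 1:             # comparing the whole Series is ambiguous
        pass
except ValueError as err:
    print(err)             # The truth value of a Series is ambiguous...
print(s.iloc[0] == 1)      # inside a row loop, compare one scalar instead: True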
No error but not the correct amounts:
df['trend bal'] = 0
for i in range(1, len(df)):
    df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
    df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
See screenshot of the Excel formula:
excel spreadsheet
*** Made correct calculations thanks to Ingwersen_erik:
import pandas as pd
import pandas_ta as ta
import numpy as np
pd.set_option('display.max_rows', None)
df = pd.read_csv('etcusd.csv')
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
    df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
MDD = ((df['account bal']-df['account bal'].max()) / df['account bal'].max()).min()
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df.index == 0, "trend bal"] = invest
for index, row in df.iloc[1:].iterrows():
    row['trend bal'] = np.where(
        df.loc[index - 1, f'ST_{ATR}_{Mult}'] == 1,
        (df.loc[index - 1, 'trend bal'] * (1 + row['rate'])) + weekly,
        df.loc[index - 1, 'trend bal'] + weekly,
    )
    df.loc[df.index == index, 'trend bal'] = row['trend bal']
print(df)
Does this solve your problem?
import time
import ccxt
import warnings
import pandas as pd
import pandas_ta as ta
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
warnings.filterwarnings("ignore")
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
invest = 10_000
weekly = 100
fee = 0.15 / 100
fees = 1 - fee
ATR = 10
Mult = 1.0
ticker = yf.Ticker("^GSPC")
df = (
    ticker.history(period="1y", interval="1wk")
    .reset_index()
    .rename(columns={"Date": "timestamp"})
    .drop(columns={"Dividends", "Stock Splits"}, errors="ignore")
)
df.loc[df.index == 0, "rate"] = 1
df.loc[df.index > 0, "rate"] = (df["Close"] / df["Close"].shift(1)) - 1
df.loc[df.index == 0, "account bal"] = invest
for i in range(1, len(df)):
    df.loc[i, "account bal"] = (
        df.loc[i - 1, "account bal"] * (1 + df.loc[i, "rate"])
    ) + weekly
df["invested"] = (df.index * weekly) + invest
# Super-trend
ST = ta.supertrend(df["High"], df["Low"], df["Close"], ATR, Mult)
df[f"ST_{ATR}_{Mult}"] = ST[f"SUPERTd_{ATR}_{Mult}"]
df[f"ST_{ATR}_{Mult}"] = df[f"ST_{ATR}_{Mult}"].shift(1).fillna(1)
df.loc[df[f"ST_{ATR}_{Mult}"] == 1, "if trend"] = 0
df.loc[df[f"ST_{ATR}_{Mult}"] == -1, "if trend"] = weekly
df.loc[df.index == 0, "trend bal"] = invest
# === Potential correction to the np.where ==============================
for index, row in df.iloc[1:].iterrows():
    row["trend bal"] = np.where(
        row[f"ST_{ATR}_{Mult}"] == 1,
        (df.loc[index - 1, "trend bal"] * (1 + row["rate"])) + weekly,
        df.loc[index - 1, "trend bal"] + row["if trend"],
    )
    # NOTE: The original "otherwise" clause from `np.where` had the
    # following value: `df.loc[index - index, "trend bal"] + ...`
    # I assumed you meant `index - 1`, instead of `index - index`,
    # therefore the above code uses `index - 1`. If you really meant
    # `index - index`, please change the code accordingly.
    df.loc[df.index == index, "trend bal"] = row["trend bal"]
df
Result:
timestamp   Open     High     Low      Close    Volume       rate         account bal  invested  ST_10_1.0  if trend  trend bal
2021-08-16  4382.44  4444.35  4367.73  4441.67  5988610000   1            10000        10000     1          0         10000
2021-08-23  4450.29  4513.33  4450.29  4509.37  14124930000  0.0152421    10252.4      10100     1          0         10252.4
2021-08-30  4513.76  4545.85  4513.76  4535.43  14256180000  0.00577909   10411.7      10200     1          0         10411.7
2021-09-06  4535.38  4535.38  4457.66  4458.58  11793790000  -0.0169444   10335.3      10300     1          0         10335.3
2021-09-13  4474.81  4492.99  4427.76  4432.99  17763120000  -0.00573946  10375.9      10400     1          0         10375.9
2021-09-20  4402.95  4465.4   4305.91  4455.48  15697030000  0.00507327   10528.6      10500     1          0         10528.6
2021-09-27  4442.12  4457.3   4288.52  4357.04  15555390000  -0.0220941   10396        10600     1          0         10396
2021-10-04  4348.84  4429.97  4278.94  4391.34  14795520000  0.00787227   10577.8      10700     1          0         10577.8
2021-10-11  4385.44  4475.82  4329.92  4471.37  13758090000  0.0182246    10870.6      10800     1          0         10870.6
2021-10-18  4463.72  4559.67  4447.47  4544.9   13966070000  0.0164446    11149.3      10900     1          0         11149.3
2021-10-25  4553.69  4608.08  4537.36  4605.38  16206040000  0.0133072    11397.7      11000     1          0         11397.7
2021-11-01  4610.62  4718.5   4595.06  4697.53  16397220000  0.0200092    11725.8      11100     1          0         11725.8
2021-11-08  4701.48  4714.92  4630.86  4682.85  15646510000  -0.00312498  11789.1      11200     1          0         11789.1
2021-11-15  4689.3   4717.75  4672.78  4697.96  15279660000  0.00322664   11927.2      11300     1          0         11927.2
2021-11-22  4712     4743.83  4585.43  4594.62  11775840000  -0.0219967   11764.8      11400     1          0         11764.8
2021-11-29  4628.75  4672.95  4495.12  4538.43  20242840000  -0.0122295   11720.9      11500     -1         100       11864.8
2021-12-06  4548.37  4713.57  4540.51  4712.02  15411530000  0.0382489    12269.2      11600     -1         100       11964.8
2021-12-13  4710.3   4731.99  4600.22  4620.64  19184960000  -0.0193929   12131.3      11700     1          0         11832.8
2021-12-20  4587.9   4740.74  4531.1   4725.79  10594350000  0.0227566    12507.4      11800     1          0         12202
2021-12-27  4733.99  4808.93  4733.99  4766.18  11687720000  0.00854675   12714.3      11900     1          0         12406.3
2022-01-03  4778.14  4818.62  4662.74  4677.03  16800900000  -0.0187048   12576.4      12000     1          0         12274.3
2022-01-10  4655.34  4748.83  4582.24  4662.85  17126800000  -0.00303177  12638.3      12100     1          0         12337.1
2022-01-17  4632.24  4632.24  4395.34  4397.94  14131200000  -0.0568129   12020.3      12200     1          0         11736.1
2022-01-24  4356.32  4453.23  4222.62  4431.85  21218590000  0.00771046   12213        12300     -1         100       11836.1
2022-01-31  4431.79  4595.31  4414.02  4500.53  18846100000  0.0154968    12502.2      12400     -1         100       11936.1
2022-02-07  4505.75  4590.03  4401.41  4418.64  19119200000  -0.0181956   12374.7      12500     1          0         11819
2022-02-14  4412.61  4489.55  4327.22  4348.87  17775970000  -0.0157899   12279.4      12600     1          0         11732.3
2022-02-21  4332.74  4385.34  4114.65  4384.65  16834460000  0.00822737   12480.4      12700     1          0         11928.9
2022-02-28  4354.17  4416.78  4279.54  4328.87  22302830000  -0.0127216   12421.6      12800     1          0         11877.1
2022-03-07  4327.01  4327.01  4157.87  4204.31  23849630000  -0.0287743   12164.2      12900     -1         100       11977.1
2022-03-14  4202.75  4465.4   4161.72  4463.12  24946690000  0.0615583    13013        13000     -1         100       12077.1
2022-03-21  4462.4   4546.03  4424.3   4543.06  19089240000  0.0179112    13346.1      13100     1          0         12393.4
2022-03-28  4541.09  4637.3   4507.57  4545.86  19212230000  0.000616282  13454.3      13200     1          0         12501.1
2022-04-04  4547.97  4593.45  4450.04  4488.28  19383860000  -0.0126665   13383.9      13300     1          0         12442.7
2022-04-11  4462.64  4471     4381.34  4392.59  13812410000  -0.02132     13198.5      13400     1          0         12277.4
2022-04-18  4385.63  4512.94  4267.62  4271.78  18149540000  -0.0275032   12935.5      13500     -1         100       12377.4
2022-04-25  4255.34  4308.45  4124.28  4131.93  19610750000  -0.032738    12612        13600     -1         100       12477.4
2022-05-02  4130.61  4307.66  4062.51  4123.34  21039720000  -0.00207901  12685.8      13700     -1         100       12577.4
2022-05-09  4081.27  4081.27  3858.87  4023.89  23166570000  -0.0241188   12479.9      13800     -1         100       12677.4
2022-05-16  4013.02  4090.72  3810.32  3901.36  20590520000  -0.0304506   12199.8      13900     -1         100       12777.4
2022-05-23  3919.42  4158.49  3875.13  4158.24  19139100000  0.0658437    13103.1      14000     -1         100       12877.4
2022-05-30  4151.09  4177.51  4073.85  4108.54  16049940000  -0.0119522   13046.5      14100     1          0         12823.5
2022-06-06  4134.72  4168.78  3900.16  3900.86  17547150000  -0.0505484   12487        14200     1          0         12275.3
2022-06-13  3838.15  3838.15  3636.87  3674.84  24639140000  -0.0579411   11863.5      14300     -1         100       12375.3
2022-06-20  3715.31  3913.65  3715.31  3911.74  19287840000  0.0644654    12728.3      14400     -1         100       12475.3
2022-06-27  3920.76  3945.86  3738.67  3825.33  17735450000  -0.0220899   12547.1      14500     -1         100       12575.3
2022-07-04  3792.61  3918.5   3742.06  3899.38  14223350000  0.0193578    12890        14600     -1         100       12675.3
2022-07-11  3880.94  3880.94  3721.56  3863.16  16313500000  -0.00928865  12870.3      14700     -1         100       12775.3
2022-07-18  3883.79  4012.44  3818.63  3961.63  16859220000  0.0254895    13298.4      14800     -1         100       12875.3
2022-07-25  3965.72  4140.15  3910.74  4130.29  17356830000  0.0425734    13964.5      14900     1          0         13523.5
2022-08-01  4112.38  4167.66  4079.81  4145.19  18072230000  0.00360747   14114.9      15000     1          0         13672.3
2022-08-08  4155.93  4280.47  4112.09  4280.15  18117740000  0.0325582    14674.4      15100     1          0         14217.4
2022-08-15  4269.37  4325.28  4218.7   4228.48  16255850000  -0.012072    14597.3      15200     1          0         14145.8
2022-08-19  4266.31  4266.31  4218.7   4228.48  2045645000   0            14697.3      15300     1          0         14245.8
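Since iterrows yields one row at a time, the values inside the loop are scalars and np.where is not strictly needed; a plain if/else sketch equivalent to the loop above (same df, ATR, Mult, and weekly assumptions):
for i in range(1, len(df)):
    prev = df.loc[i - 1, "trend bal"]
    if df.loc[i, f"ST_{ATR}_{Mult}"] == 1:
        # in an uptrend, compound the previous balance and add the deposit
        df.loc[i, "trend bal"] = prev * (1 + df.loc[i, "rate"]) + weekly
    else:
        # in a downtrend, only the deposit ('if trend') is added
        df.loc[i, "trend bal"] = prev + df.loc[i, "if trend"]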

Group By - Total Hours and Hours by Category in Python / Pandas

I need to calculate Total Hours and Hours by Status per Week using Python / Pandas GROUP BY.
Id Week Status Hours
1 01/10/2022 - 01/16/2022 On 5
2 01/10/2022 - 01/16/2022 Off 2
3 01/17/2022 - 01/23/2022 Off 6
4 01/17/2022 - 01/23/2022 On 1
5 01/17/2022 - 01/23/2022 On 5
6 01/03/2022 - 01/09/2022 On 10
7 01/10/2022 - 01/16/2022 Off 9
8 01/03/2022 - 01/09/2022 On 3
9 01/24/2022 - 01/30/2022 Off 4
10 01/24/2022 - 01/30/2022 On 7
test_data = {'Id': [1,2,3,4,5,6,7,8,9,10],
'Week': ['01/10/2022 - 01/16/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/03/2022 - 01/09/2022', '01/24/2022 - 01/30/2022', '01/24/2022 - 01/30/2022'],
'Status': ['On', 'Off', 'Off', 'On', 'On', 'On', 'Off', 'On', 'Off', 'On'],
'Hours': [5,2,6,1,5,10,9,3,4,7]}
test_df = pd.DataFrame(data=test_data)
I can get Total Hours for each Week:
test_df.groupby(by=['Week'], as_index=False).agg({"Hours": "sum"})
But I don't know how to also break Hours out by Status, so that there are 2 additional columns (On Status Hours and Off Status Hours).
If I add the Status column to the groupby, it creates extra rows instead (I understand why):
test_df.groupby(by=['Week', 'Status'], as_index=False).agg({"Hours": "sum"})
Output I want:
Week                     Total Hours  On Status Hours  Off Status Hours
01/03/2022 - 01/09/2022  13           13               0
01/10/2022 - 01/16/2022  16           5                11
01/17/2022 - 01/23/2022  12           6                6
01/24/2022 - 01/30/2022  11           7                4
You can use:
(test_df
 .groupby(['Week', 'Status'])['Hours']
 .sum()
 .unstack(1, fill_value=0)
 .add_suffix(' Status Hours')
 .assign(**{'Total Hours': lambda d: d.sum(1)})
)
Output:
Status Off Status Hours On Status Hours Total Hours
Week
01/03/2022 - 01/09/2022 0 13 13
01/10/2022 - 01/16/2022 11 5 16
01/17/2022 - 01/23/2022 6 6 12
01/24/2022 - 01/30/2022 4 7 11
You can use pd.pivot_table to get your result:
x = pd.pivot_table(
    test_df,
    index="Week",
    columns="Status",
    values="Hours",
    aggfunc="sum",
    fill_value=0,
).add_suffix(" Status Hours")
x["Total Hours"] = x.sum(axis=1)
print(x)
Prints:
Status Off Status Hours On Status Hours Total Hours
Week
01/03/2022 - 01/09/2022 0 13 13
01/10/2022 - 01/16/2022 11 5 16
01/17/2022 - 01/23/2022 6 6 12
01/24/2022 - 01/30/2022 4 7 11
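pd.crosstab is another option and can compute the total column for you via margins. A sketch on the same test_df (margins adds a total row as well, which is dropped here):
x = pd.crosstab(
    test_df["Week"], test_df["Status"],
    values=test_df["Hours"], aggfunc="sum",
    margins=True, margins_name="Total Hours",
).fillna(0).drop(index="Total Hours")   # keep the total column, drop the total row
x.columns = [c if c == "Total Hours" else f"{c} Status Hours" for c in x.columns]
print(x)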

How to convert epoch time to GMT + 7 time in pandas dataframe?

I have a pandas dataframe that has a column created_time which is in epoch format. I wanted to use a filter condition as shown below.
Dataframe sample
created_time updated_time sys_time last_action_time account_id \
0 1624473000000 1624459148023 1624459148023 0 812
1 1624471920000 1624448094358 1624448094358 0 812
2 1624469400000 1624455267579 1624455267579 0 812
3 1624466580000 1624466620020 1624466590321 0 812
4 1624466529000 1624466610222 1624466540086 0 812
5 1624466501000 1624466610270 1624466510212 0 812
6 1624466461000 1624466620149 1624466469825 0 812
7 1624466443000 1624466446558 1624466446558 0 812
8 1624466435000 1624466460213 1624466460213 0 812
daily_data_df = data_df[(data_df['created_time'] >= start_date_int) & (data_df['created_time'] < end_date_int)]
where:
start_date_int and end_date_int are in the GMT+7 timezone
created_time is in epoch (millisecond) format
Please help me with the conversion.
First: strip the last 3 digits from the "created_time" column; the values are 13 digits because they include milliseconds, while a Unix epoch in seconds is only 9-10 digits (alternatively, keep the digits and pass unit='ms' in the next step):
df['created_time'] = df['created_time'].astype(str).apply(lambda x: x[:-3])
Second: convert from Unix epoch to datetime:
df['created_time'] = pd.to_datetime(df['created_time'], unit = 's')
Third: filter the date range (sample range below):
start_date_int = '2021-06-23 18:12:00'
end_date_int = '2021-06-23 18:30:00'
df_filterd = df[(df['created_time'] >= start_date_int) &
                (df['created_time'] < end_date_int)]
*Alternative filter method:
df_filterd = df[df['created_time'].between(start_date_int , end_date_int)]
You can convert the epoch time to GMT+7 using pd.to_datetime() and dt.tz_convert(), as follows:
data_df['created_GMT+7'] = pd.to_datetime(data_df['created_time'], unit='ms', utc=True).dt.tz_convert('Etc/GMT+7')
Result:
print(data_df['created_GMT+7'])
0 2021-06-23 11:30:00-07:00
1 2021-06-23 11:12:00-07:00
2 2021-06-23 10:30:00-07:00
3 2021-06-23 09:43:00-07:00
4 2021-06-23 09:42:09-07:00
5 2021-06-23 09:41:41-07:00
6 2021-06-23 09:41:01-07:00
7 2021-06-23 09:40:43-07:00
8 2021-06-23 09:40:35-07:00
Name: created_GMT+7, dtype: datetime64[ns, Etc/GMT+7]
Then, filter the rows as follows:
start_date_int = 1624466460500
end_date_int = 1624469402000
mask = data_df['created_GMT+7'].between(pd.Timestamp(start_date_int, unit='ms', tz='Etc/GMT+7'), pd.Timestamp(end_date_int, unit='ms', tz='Etc/GMT+7'))
daily_data_df = data_df.loc[mask]
Or,
start_date_int = 1624466460500
end_date_int = 1624469402000
mask = ((data_df['created_GMT+7'] - pd.Timestamp("1970-01-01", tz='Etc/GMT+7')) // pd.Timedelta('1ms')).between(start_date_int, end_date_int)
daily_data_df = data_df.loc[mask]
Result:
(using the sample start_date_int and end_date_int above)
print(daily_data_df)
created_time updated_time sys_time last_action_time account_id created_GMT+7
2 1624469400000 1624455267579 1624455267579 0 812 2021-06-23 10:30:00-07:00
3 1624466580000 1624466620020 1624466590321 0 812 2021-06-23 09:43:00-07:00
4 1624466529000 1624466610222 1624466540086 0 812 2021-06-23 09:42:09-07:00
5 1624466501000 1624466610270 1624466510212 0 812 2021-06-23 09:41:41-07:00
6 1624466461000 1624466620149 1624466469825 0 812 2021-06-23 09:41:01-07:00
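One caveat worth flagging: in the POSIX-style Etc area the sign is inverted, so 'Etc/GMT+7' actually means UTC-7 (as the -07:00 offsets above show). If the goal is a true GMT+7 result, a named zone avoids the confusion; a sketch, assuming the same data_df:
data_df['created_GMT+7'] = pd.to_datetime(
    data_df['created_time'], unit='ms', utc=True
).dt.tz_convert('Asia/Bangkok')  # UTC+7; or 'Etc/GMT-7' for a fixed +07:00 offset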

pandas: rapidly calculating sum of column with certain values

I have a pandas dataframe and I need to calculate the sum of a column of values that fall within a certain window. So for instance, if I have a window of 500, and my initial value is 1000, I want to sum all values that are between 499 and 999, and also between 1001 and 1501.
This is easier to explain with some data:
chrom pos end AFR EUR pi
0 1 10177 10177 0.4909 0.4056 0.495988
1 1 10352 10352 0.4788 0.4264 0.496369
2 1 10617 10617 0.9894 0.9940 0.017083
3 1 11008 11008 0.1346 0.0885 0.203142
4 1 11012 11012 0.1346 0.0885 0.203142
5 1 13110 13110 0.0053 0.0567 0.053532
6 1 13116 13116 0.0295 0.1869 0.176091
7 1 13118 13118 0.0295 0.1869 0.176091
8 1 13273 13273 0.0204 0.1471 0.139066
9 1 13550 13550 0.0008 0.0080 0.007795
10 1 14464 14464 0.0144 0.1859 0.161422
11 1 14599 14599 0.1210 0.1610 0.238427
12 1 14604 14604 0.1210 0.1610 0.238427
13 1 14930 14930 0.4811 0.5209 0.500209
14 1 14933 14933 0.0015 0.0507 0.044505
15 1 15211 15211 0.5371 0.7316 0.470848
16 1 15585 15585 0.0008 0.0020 0.002635
17 1 15644 15644 0.0008 0.0080 0.007795
18 1 15777 15777 0.0159 0.0149 0.030470
19 1 15820 15820 0.4849 0.2714 0.477153
20 1 15903 15903 0.0431 0.4652 0.349452
21 1 16071 16071 0.0091 0.0010 0.011142
22 1 16142 16142 0.0053 0.0020 0.007721
23 1 16949 16949 0.0227 0.0159 0.038759
24 1 18643 18643 0.0023 0.0080 0.009485
25 1 18849 18849 0.8411 0.9911 0.170532
26 2 30923 30923 0.6687 0.9364 0.338400
27 2 20286 46286 0.0053 0.0010 0.006863
28 2 21698 46698 0.0015 0.0010 0.002566
29 2 42159 47159 0.0083 0.0696 0.067187
So I need to subset based on the first two columns. For example, if my window = 500, my chrom = 1 and my pos = 15500, I will need to subset my df to include only those rows that have chrom = 1 and 15000 < pos < 16000.
I would then like to sum the AFR column of this subset of data.
Here is the function I have made:
#vdf is my main dataframe,
#polyChrom is the chromosome to subset by,
#polyPos is the position to subset by.
#Distance is how far the window should be from the polyPos.
#windowSize is the size of the window itself
#E.g. if distance=20000 and windowSize= 500, we are looking at a window
#that is (polyPos-20000)-500 to (polyPos-20000) and a window that is
#(polyPos+20000) to (polyPos+20000)+500.
def mafWindow(vdf, polyChrom, polyPos, distance, windowSize):
    # If start position becomes less than 0, set it to 0
    if polyPos - distance < 0:
        start1 = 0
        end1 = windowSize
    else:
        start1 = polyPos - distance
        end1 = start1 + windowSize
    end2 = polyPos + distance
    start2 = end2 - windowSize
    # subset df
    df = vdf.loc[(vdf['chrom'] == polyChrom) & ((vdf['pos'] <= end1) & (vdf['pos'] >= start1)) |
                 ((vdf['pos'] <= end2) & (vdf['pos'] >= start2))].copy()
    return df.AFR.sum()
This whole method works on subsetting the dataframe and is very slow when my dataframe contains ~55k rows. Is there a quicker and more efficient way of doing this?
The trick is to drop down to numpy arrays. Pandas indexing and slicing is slow.
import pandas as pd
df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1],
                   [2, 10180, 0.3], [1, 10180, 0.4]], columns=['chrom', 'pos', 'AFR'])
chrom = df['chrom'].values
pos = df['pos'].values
afr = df['AFR'].values
def filter_sum(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    return sum(k for i, j, k in zip(chrom_arr, pos_arr, afr_arr)
               if pos_start < j < pos_end and i == chrom_val)
filter_sum(chrom, pos, afr, 1, 10150, 10200)
# 1.1
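For ~55k rows, a boolean-mask version that stays fully vectorized in numpy may well be faster than the Python-level zip above; a sketch using the same arrays:
import numpy as np

def filter_sum_vec(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end):
    # build one boolean mask and sum the selected AFR values at C speed
    mask = (chrom_arr == chrom_val) & (pos_arr > pos_start) & (pos_arr < pos_end)
    return afr_arr[mask].sum()

print(filter_sum_vec(chrom, pos, afr, 1, 10150, 10200))  # 1.1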

iterating through a list error

I have a dataframe, delf:
Date inp name
0 2017-08-07 2.3.6 ABC
1 2017-08-07 2.3.6 ABC
2 2017-08-08 2.3.6 TAC
3 2017-08-22 2.5.9 TTT
4 2017-09-23 0.8.0 TAC
5 2017-10-09 2.3.6 ABC
6 2017-10-09 2.3.6 TAC
7 2017-10-09 2.3.6 TAC
8 2017-10-23 0.8.0 TAC
9 2017-11-08 6.2.6 ABC
then another dataframe, trex:
2.3.6ABC 2.3.6TAC 2.5.9TTT
August 2 1 0
September 0 0 0
October 1 2 0
November 0 0 1
another dataframe, dher:
2.3.6ABC 2.3.6TAC
August 2 1
September 0 0
October 1 2
November 0 0
I want to get the number of distinct values in column 'inp' of DELF, which in this case is 4, and the number of columns of TREX and DHER, which are 3 and 2 here. How can I store the number of columns of the two dataframes, TREX and DHER, and then compute each as a percentage of the distinct 'inp' count? It should look like this:
noOfColumn pct
TREX 3 3/4=75
DHER 2 2/4=50
I tried using this code:
df_list = [TREX, DHER]
idx, v = [], []
for i, df in enumerate(df_list, 1):
    idx.append('{}'.format())
    v.append(len(df.columns))
then,
df = pd.DataFrame(v, index=idx, columns=['noOfColumn'])
df['pct'] = df['noOfColumn'] / DELF.inp.nunique()
This is not giving me the right output.
You can assign a name to your df:
TREX.name = 'TREX'
DHER.name = 'DHER'
df_list = [TREX, DHER]
idx, v = [], []
for i, df in enumerate(df_list, 1):
    idx.append(df.name)
    v.append(len(df.columns))
df = pd.DataFrame(v, index=idx, columns=['noOfColumn'])
df['pct'] = df['noOfColumn'] / DELF.inp.nunique()
df
Out[55]:
noOfColumn pct
TREX 3 0.75
DHER 2 0.50
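A variant without the unused enumerate counter, zipping the names in directly (a sketch under the same TREX, DHER, and DELF assumptions, with pandas imported as pd):
names = ['TREX', 'DHER']
df_list = [TREX, DHER]
summary = pd.DataFrame({'noOfColumn': [len(d.columns) for d in df_list]},
                       index=names)
summary['pct'] = summary['noOfColumn'] / DELF.inp.nunique()
print(summary)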
