How to add padding zeros to a datetime object in Python?

How to add padding zeros to a datetime object in Python? - python

I have a file which looks like this:
London XXX Europe 2020 9 7 0 0 0 2 2020 9 7 0 11 35 2 57
Tanger XXX Africa 2020 9 7 0 29 54 2 2020 9 7 23 57 16 2 29
Doha XXX Asia 2020 9 7 0 57 23 2 2020 9 7 23 58 48 2 11
I'am trying to combine index 3,4,5,6,7,8 into a datetimeobject with Year, Month, Day, Hour, Minute, Second. I try to do the same with end_time. However, the zeros in my file seem to produce some weird output.
This is my code:
path = r'c:\data\EK\Desktop\Python Microsoft Visual Studio\Extra\test_datetime.txt'
with open(path, 'r') as input_file:
reader = csv.reader(input_file, delimiter='\t')
for row in reader:
start_time = (row[3] + row[4] + row[5] + row[6] + row[7] + row[8])
end_time = (row[10] + row[11] + row[12] + row[13] + row[14] + row[15])
start_time = datetime.datetime.strptime(start_time, "%Y%m%d%H%M%S")
end_time = datetime.datetime.strptime(end_time, "%Y%m%d%H%M%S")
print(start_time)
print(end_time)
This is my current output:
2020-09-07 00:00:00
2020-09-07 01:13:05
2020-09-07 02:09:54
2020-09-07 23:57:16
2020-09-07 05:07:23
2020-09-07 23:58:48
This is my expected output:
2020-09-07 00:00:00
2020-09-07 00:11:35
2020-09-07 00:29:54
2020-09-07 23:57:16
2020-09-07 00:57:23
2020-09-07 23:58:48

The problem is that when you concatenate the fields like row[3] + row[4] + row[5] + row[6] + row[7] + row[8] all the single-digit fields have no leading zeroes, so they aren't parsed properly with strptime().
You could use a string formatting function to add leading zeroes, but there's no reason to use strptime() in the first place. Just call datetime.datetime() to create an object directly from the values.
start_time = datetime.datetime(*map(int, row[3:9]))
end_time = datetime.datetime(*map(int, row[10:16]))

Related

if statement in for loop with pandas dataframes

I am making a Dollar Cost Average code where I want to choose between 2 equations. I made an excel spreadsheet that I'm trying to portover to python. I've gotten pretty far except for the last step. The last step has had me searching for a solution for 3 weeks now. The errors happen when I try a for loop in a df when looping through. I would like to check a column with an if the statement. If is true then do an equation if false do another equation. I can get the for loop to work and I can the if statements to work, but not combined. See all commented out code for whats been tried. I have tried np.where instead of the if statements as well. I have tried .loc. I have tried lamda. I have tried list comp. Nothing is working please help. FYI the code referring is ['trend bal'] column. ***see end with correct code.
What the df looks like:
Index timestamp Open High Low ... rate account bal invested ST_10_1.0 if trend
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... 1.000000 $10,000.00 10000 1 0
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... 0.015242 $10,252.42 10100 1 0
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... 0.005779 $10,411.67 10200 1 0
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... -0.016944 $10,335.25 10300 1 0
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... -0.005739 $10,375.93 10400 1 0
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... 0.005073 $10,528.57 10500 1 0
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... -0.022094 $10,395.95 10600 1 0
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... 0.007872 $10,577.79 10700 1 0
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... 0.018225 $10,870.57 10800 1 0
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... 0.016445 $11,149.33 10900 1 0
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... 0.013307 $11,397.70 11000 1 0
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... 0.020009 $11,725.75 11100 1 0
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... -0.003125 $11,789.11 11200 1 0
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... 0.003227 $11,927.15 11300 1 0
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... -0.021997 $11,764.79 11400 1 0
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... -0.012230 $11,720.92 11500 -1 100
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... 0.038249 $12,269.23 11600 -1 100
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... -0.019393 $12,131.29 11700 1 0
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... 0.022757 $12,507.36 11800 1 0
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... 0.008547 $12,714.25 11900 1 0
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... -0.018705 $12,576.44 12000 1 0
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... -0.003032 $12,638.31 12100 1 0
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... -0.056813 $12,020.29 12200 1 0
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... 0.007710 $12,212.97 12300 -1 100
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... 0.015497 $12,502.23 12400 -1 100
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... -0.018196 $12,374.75 12500 1 0
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... -0.015790 $12,279.35 12600 1 0
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... 0.008227 $12,480.38 12700 1 0
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... -0.012722 $12,421.61 12800 1 0
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... -0.028774 $12,164.19 12900 -1 100
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... 0.061558 $13,012.99 13000 -1 100
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... 0.017911 $13,346.07 13100 1 0
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... 0.000616 $13,454.30 13200 1 0
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... -0.012666 $13,383.88 13300 1 0
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... -0.021320 $13,198.53 13400 1 0
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... -0.027503 $12,935.53 13500 -1 100
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... -0.032738 $12,612.05 13600 -1 100
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... -0.002079 $12,685.83 13700 -1 100
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... -0.024119 $12,479.86 13800 -1 100
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... -0.030451 $12,199.84 13900 -1 100
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... 0.065844 $13,103.12 14000 -1 100
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... -0.011952 $13,046.51 14100 1 0
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... -0.050548 $12,487.03 14200 1 0
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... -0.057941 $11,863.52 14300 -1 100
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... 0.064465 $12,728.31 14400 -1 100
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... -0.022090 $12,547.14 14500 -1 100
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... 0.019358 $12,890.03 14600 -1 100
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... -0.009289 $12,870.29 14700 -1 100
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... 0.025489 $13,298.35 14800 -1 100
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... 0.042573 $13,964.51 14900 1 0
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... 0.003607 $14,114.88 15000 1 0
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... 0.032558 $14,674.44 15100 1 0
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... 0.000839 $14,786.75 15200 1 0
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... -0.012900 $14,696.00 15300 1 0
What it should look like:
Index timestamp Open High Low ... account bal invested ST_10_1.0 if trend trend bal
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... $10,000.00 10000 1 0 $10,000.00
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... $10,252.42 10100 1 0 $10,252.42
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... $10,411.67 10200 1 0 $10,411.67
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... $10,335.25 10300 1 0 $10,335.25
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... $10,375.93 10400 1 0 $10,375.93
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... $10,528.57 10500 1 0 $10,528.57
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... $10,395.95 10600 1 0 $10,395.95
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... $10,577.79 10700 1 0 $10,577.79
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... $10,870.57 10800 1 0 $10,870.57
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... $11,149.33 10900 1 0 $11,149.33
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... $11,397.70 11000 1 0 $11,397.70
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... $11,725.75 11100 1 0 $11,725.75
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... $11,789.11 11200 1 0 $11,789.11
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... $11,927.15 11300 1 0 $11,927.15
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... $11,764.79 11400 1 0 $11,764.79
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... $11,720.92 11500 -1 100 $11,720.92
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... $12,269.23 11600 -1 100 $11,820.92
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... $12,131.29 11700 1 0 $11,920.92
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... $12,507.36 11800 1 0 $12,292.19
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... $12,714.25 11900 1 0 $12,497.25
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... $12,576.44 12000 1 0 $12,363.49
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... $12,638.31 12100 1 0 $12,426.01
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... $12,020.29 12200 1 0 $11,820.05
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... $12,212.97 12300 -1 100 $12,011.19
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... $12,502.23 12400 -1 100 $12,111.19
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... $12,374.75 12500 1 0 $12,211.19
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... $12,279.35 12600 1 0 $12,118.38
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... $12,480.38 12700 1 0 $12,318.08
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... $12,421.61 12800 1 0 $12,261.37
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... $12,164.19 12900 -1 100 $12,008.56
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... $13,012.99 13000 -1 100 $12,108.56
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... $13,346.07 13100 1 0 $12,208.56
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... $13,454.30 13200 1 0 $12,316.09
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... $13,383.88 13300 1 0 $12,260.08
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... $13,198.53 13400 1 0 $12,098.70
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... $12,935.53 13500 -1 100 $11,865.95
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... $12,612.05 13600 -1 100 $11,965.95
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... $12,685.83 13700 -1 100 $12,065.95
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... $12,479.86 13800 -1 100 $12,165.95
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... $12,199.84 13900 -1 100 $12,265.95
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... $13,103.12 14000 -1 100 $12,365.95
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... $13,046.51 14100 1 0 $12,465.95
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... $12,487.03 14200 1 0 $11,935.81
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... $11,863.52 14300 -1 100 $11,344.24
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... $12,728.31 14400 -1 100 $11,444.24
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... $12,547.14 14500 -1 100 $11,544.24
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... $12,890.03 14600 -1 100 $11,644.24
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... $12,870.29 14700 -1 100 $11,744.24
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... $13,298.35 14800 -1 100 $11,844.24
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... $13,964.51 14900 1 0 $11,944.24
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... $14,114.88 15000 1 0 $12,087.33
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... $14,674.44 15100 1 0 $12,580.87
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... $14,786.75 15200 1 0 $12,691.42
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... $14,696.00 15300 1 0 $12,627.70
Python Code:
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
import ccxt
from matplotlib import pyplot as plt
import config
import schedule
import pandas as pd
import pandas_ta as ta
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import numpy as np
from datetime import datetime
import time
import yfinance as yf
ticker = yf.Ticker('^GSPC')
df = ticker.history(period="1y", interval="1wk")
df.reset_index(inplace=True)
df.rename(columns = {'Date':'timestamp'}, inplace = True)
#df.drop(columns ={'Open', 'High', 'Low', 'Volume'}, inplace=True, axis=1)
df.drop(columns ={'Dividends', 'Stock Splits'}, inplace=True, axis=1)
# df['Close'].ffill(axis = 0, inplace = True)
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'if trend'] = 0
df.loc[df[f'ST_{ATR}_{Mult}'] == -1, 'if trend'] = weekly
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
# np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
# if df[f'ST_{ATR}_{Mult}'] == 1:
# df.loc[i, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
# else:
# df.loc[i, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
# for i in range(1, len(df)):
# df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
# df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
#df.to_csv('GSPC.csv',index=False,mode='a')
# plt.plot(df['timestamp'], df['account bal'])
# plt.plot(df['timestamp'], df['invested'])
# plt.plot(df['timestamp'], df['close'])
# plt.show()
print(df)
What some errors looks like:
np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
File "<__array_function__ internals>", line 180, in where
ValueError: operands could not be broadcast together with shapes (36,) () (54,)
Another error:
line 1535, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
No error but not the correct amounts:
df['trend bal'] = 0
for i in range(1, len(df)):
df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
See photo of screenshot of excel formula:
excel spreadsheet
*** Made correct calculations thanks to Ingwersen_erik:
from re import X
import pandas as pd
import pandas_ta as ta
import numpy as np
pd.set_option('display.max_rows', None)
df = pd.read_csv('etcusd.csv')
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
MDD = ((df['account bal']-df['account bal'].max()) / df['account bal'].max()).min()
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df.index == 0, "trend bal"] = invest
for index, row in df.iloc[1:].iterrows():
row['trend bal'] = np.where(
df.loc[index - 1, f'ST_{ATR}_{Mult}'] == 1,
(df.loc[index - 1, 'trend bal'] * (1 + row['rate'])) + weekly,
df.loc[index - 1, 'trend bal'] + weekly,
)
df.loc[df.index == index, 'trend bal'] = row['trend bal']
print(df)

Does this solve your problem?
import time
import ccxt
import warnings
import pandas as pd
import pandas_ta as ta
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
warnings.filterwarnings("ignore")
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
invest = 10_000
weekly = 100
fee = 0.15 / 100
fees = 1 - fee
ATR = 10
Mult = 1.0
ticker = yf.Ticker("^GSPC")
df = (
ticker.history(period="1y", interval="1wk")
.reset_index()
.rename(columns={"Date": "timestamp"})
.drop(columns={"Dividends", "Stock Splits"}, errors="ignore")
)
df.loc[df.index == 0, "rate"] = 1
df.loc[df.index > 0, "rate"] = (df["Close"] / df["Close"].shift(1)) - 1
df.loc[df.index == 0, "account bal"] = invest
df.loc[df.index == 0, "account bal"] = invest
for i in range(1, len(df)):
df.loc[i, "account bal"] = (
df.loc[i - 1, "account bal"] * (1 + df.loc[i, "rate"])
) + weekly
df["invested"] = (df.index * weekly) + invest
# Super-trend
ST = ta.supertrend(df["High"], df["Low"], df["Close"], ATR, Mult)
df[f"ST_{ATR}_{Mult}"] = ST[f"SUPERTd_{ATR}_{Mult}"]
df[f"ST_{ATR}_{Mult}"] = df[f"ST_{ATR}_{Mult}"].shift(1).fillna(1)
df.loc[df[f"ST_{ATR}_{Mult}"] == 1, "if trend"] = 0
df.loc[df[f"ST_{ATR}_{Mult}"] == -1, "if trend"] = weekly
df.loc[df.index == 0, "trend bal"] = invest
# === Potential correction to the np.where ==============================
for index, row in df.iloc[1:].iterrows():
row["trend bal"] = np.where(
row[f"ST_{ATR}_{Mult}"] == 1,
(df.loc[index - 1, "trend bal"] * (1 + row["rate"])) + weekly,
df.loc[index - 1, "trend bal"] + row["if trend"],
)
# NOTE: The original "otherwise" clause from `np.where` had the
# following value: `df.loc[index - index, "trend bal"] + ...`
# I assumed you meant `index -1`, instead of `index - index`,
# therefore the above code uses `index -1`. If you really meant
# `index - index`, please change the code accordingly.
df.loc[df.index == index, "trend bal"] = row["trend bal"]
df
Result:
timestamp
Open
High
Low
Close
Volume
rate
account bal
invested
ST_10_1.0
if trend
trend bal
2021-08-16
4382.44
4444.35
4367.73
4441.67
5988610000
1
10000
10000
1
0
10000
2021-08-23
4450.29
4513.33
4450.29
4509.37
14124930000
0.0152421
10252.4
10100
1
0
10252.4
2021-08-30
4513.76
4545.85
4513.76
4535.43
14256180000
0.00577909
10411.7
10200
1
0
10411.7
2021-09-06
4535.38
4535.38
4457.66
4458.58
11793790000
-0.0169444
10335.3
10300
1
0
10335.3
2021-09-13
4474.81
4492.99
4427.76
4432.99
17763120000
-0.00573946
10375.9
10400
1
0
10375.9
2021-09-20
4402.95
4465.4
4305.91
4455.48
15697030000
0.00507327
10528.6
10500
1
0
10528.6
2021-09-27
4442.12
4457.3
4288.52
4357.04
15555390000
-0.0220941
10396
10600
1
0
10396
2021-10-04
4348.84
4429.97
4278.94
4391.34
14795520000
0.00787227
10577.8
10700
1
0
10577.8
2021-10-11
4385.44
4475.82
4329.92
4471.37
13758090000
0.0182246
10870.6
10800
1
0
10870.6
2021-10-18
4463.72
4559.67
4447.47
4544.9
13966070000
0.0164446
11149.3
10900
1
0
11149.3
2021-10-25
4553.69
4608.08
4537.36
4605.38
16206040000
0.0133072
11397.7
11000
1
0
11397.7
2021-11-01
4610.62
4718.5
4595.06
4697.53
16397220000
0.0200092
11725.8
11100
1
0
11725.8
2021-11-08
4701.48
4714.92
4630.86
4682.85
15646510000
-0.00312498
11789.1
11200
1
0
11789.1
2021-11-15
4689.3
4717.75
4672.78
4697.96
15279660000
0.00322664
11927.2
11300
1
0
11927.2
2021-11-22
4712
4743.83
4585.43
4594.62
11775840000
-0.0219967
11764.8
11400
1
0
11764.8
2021-11-29
4628.75
4672.95
4495.12
4538.43
20242840000
-0.0122295
11720.9
11500
-1
100
11864.8
2021-12-06
4548.37
4713.57
4540.51
4712.02
15411530000
0.0382489
12269.2
11600
-1
100
11964.8
2021-12-13
4710.3
4731.99
4600.22
4620.64
19184960000
-0.0193929
12131.3
11700
1
0
11832.8
2021-12-20
4587.9
4740.74
4531.1
4725.79
10594350000
0.0227566
12507.4
11800
1
0
12202
2021-12-27
4733.99
4808.93
4733.99
4766.18
11687720000
0.00854675
12714.3
11900
1
0
12406.3
2022-01-03
4778.14
4818.62
4662.74
4677.03
16800900000
-0.0187048
12576.4
12000
1
0
12274.3
2022-01-10
4655.34
4748.83
4582.24
4662.85
17126800000
-0.00303177
12638.3
12100
1
0
12337.1
2022-01-17
4632.24
4632.24
4395.34
4397.94
14131200000
-0.0568129
12020.3
12200
1
0
11736.1
2022-01-24
4356.32
4453.23
4222.62
4431.85
21218590000
0.00771046
12213
12300
-1
100
11836.1
2022-01-31
4431.79
4595.31
4414.02
4500.53
18846100000
0.0154968
12502.2
12400
-1
100
11936.1
2022-02-07
4505.75
4590.03
4401.41
4418.64
19119200000
-0.0181956
12374.7
12500
1
0
11819
2022-02-14
4412.61
4489.55
4327.22
4348.87
17775970000
-0.0157899
12279.4
12600
1
0
11732.3
2022-02-21
4332.74
4385.34
4114.65
4384.65
16834460000
0.00822737
12480.4
12700
1
0
11928.9
2022-02-28
4354.17
4416.78
4279.54
4328.87
22302830000
-0.0127216
12421.6
12800
1
0
11877.1
2022-03-07
4327.01
4327.01
4157.87
4204.31
23849630000
-0.0287743
12164.2
12900
-1
100
11977.1
2022-03-14
4202.75
4465.4
4161.72
4463.12
24946690000
0.0615583
13013
13000
-1
100
12077.1
2022-03-21
4462.4
4546.03
4424.3
4543.06
19089240000
0.0179112
13346.1
13100
1
0
12393.4
2022-03-28
4541.09
4637.3
4507.57
4545.86
19212230000
0.000616282
13454.3
13200
1
0
12501.1
2022-04-04
4547.97
4593.45
4450.04
4488.28
19383860000
-0.0126665
13383.9
13300
1
0
12442.7
2022-04-11
4462.64
4471
4381.34
4392.59
13812410000
-0.02132
13198.5
13400
1
0
12277.4
2022-04-18
4385.63
4512.94
4267.62
4271.78
18149540000
-0.0275032
12935.5
13500
-1
100
12377.4
2022-04-25
4255.34
4308.45
4124.28
4131.93
19610750000
-0.032738
12612
13600
-1
100
12477.4
2022-05-02
4130.61
4307.66
4062.51
4123.34
21039720000
-0.00207901
12685.8
13700
-1
100
12577.4
2022-05-09
4081.27
4081.27
3858.87
4023.89
23166570000
-0.0241188
12479.9
13800
-1
100
12677.4
2022-05-16
4013.02
4090.72
3810.32
3901.36
20590520000
-0.0304506
12199.8
13900
-1
100
12777.4
2022-05-23
3919.42
4158.49
3875.13
4158.24
19139100000
0.0658437
13103.1
14000
-1
100
12877.4
2022-05-30
4151.09
4177.51
4073.85
4108.54
16049940000
-0.0119522
13046.5
14100
1
0
12823.5
2022-06-06
4134.72
4168.78
3900.16
3900.86
17547150000
-0.0505484
12487
14200
1
0
12275.3
2022-06-13
3838.15
3838.15
3636.87
3674.84
24639140000
-0.0579411
11863.5
14300
-1
100
12375.3
2022-06-20
3715.31
3913.65
3715.31
3911.74
19287840000
0.0644654
12728.3
14400
-1
100
12475.3
2022-06-27
3920.76
3945.86
3738.67
3825.33
17735450000
-0.0220899
12547.1
14500
-1
100
12575.3
2022-07-04
3792.61
3918.5
3742.06
3899.38
14223350000
0.0193578
12890
14600
-1
100
12675.3
2022-07-11
3880.94
3880.94
3721.56
3863.16
16313500000
-0.00928865
12870.3
14700
-1
100
12775.3
2022-07-18
3883.79
4012.44
3818.63
3961.63
16859220000
0.0254895
13298.4
14800
-1
100
12875.3
2022-07-25
3965.72
4140.15
3910.74
4130.29
17356830000
0.0425734
13964.5
14900
1
0
13523.5
2022-08-01
4112.38
4167.66
4079.81
4145.19
18072230000
0.00360747
14114.9
15000
1
0
13672.3
2022-08-08
4155.93
4280.47
4112.09
4280.15
18117740000
0.0325582
14674.4
15100
1
0
14217.4
2022-08-15
4269.37
4325.28
4218.7
4228.48
16255850000
-0.012072
14597.3
15200
1
0
14145.8
2022-08-19
4266.31
4266.31
4218.7
4228.48
2045645000
0
14697.3
15300
1
0
14245.8

Adding Spotify Data from zenodo in a DataFrame

I want to add all the data from charts.zip from https://doi.org/10.5281/zenodo.4778562 in a single DataFrame. The data consist of a file per year that contains multiple CSVs. I made the following code:
header = 0
dfs = []
for file in glob.glob('Charts/*/201?/*.csv'):
region = file.split('/')[1]
dates = re.findall('\d{4}-\d{2}-\d{2}', file.split('/')[-1])
weekly_chart = pd.read_csv(file, header=header, sep='\t')
weekly_chart['week_start'] = datetime.strptime(dates[0], '%Y-%m-%d')
weekly_chart['week_end'] = datetime.strptime(dates[1], '%Y-%m-%d')
weekly_chart['region'] = region
dfs.append(weekly_chart)
all_charts = pd.concat(dfs)
But, when I run it, python returns:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12886/3473678833.py in <module>
9 weekly_chart['region'] = region
10 dfs.append(weekly_chart)
---> 11 all_charts = pd.concat(dfs)
~/Downloads/enter/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper
~/Downloads/enter/lib/python3.9/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
344 ValueError: Indexes have overlapping values: ['a']
345 """
--> 346 op = _Concatenator(
347 objs,
348 axis=axis,
~/Downloads/enter/lib/python3.9/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
401
402 if len(objs) == 0:
--> 403 raise ValueError("No objects to concatenate")
404
405 if keys is None:
ValueError: No objects to concatenate
How can I fix it?

I think the glob.glob might just be over complicating things... This works perfectly for me.
# Gives you a list of EVERY file in the Charts directory
# and sub directories that is a CSV
file_list = []
for path, subdirs, files in os.walk("Charts"):
file_list.extend([os.path.join(path, x) for x in files if x.endswith('.csv')])
dfs = []
for file in file_list:
region = file.split('/')[1]
dates = re.findall('\d{4}-\d{2}-\d{2}', file.split('/')[-1])
df = pd.read_csv(file, sep='\t')
df['week_start'] = dates[0]
df['week_end'] = dates[1]
df['region'] = region
dfs.append(df)
all_charts = pd.concat(dfs, ignore_index=True)
print(all_charts)
Output:
position song_id song_name artist streams ... peak_position position_status week_start week_end region
0 1 7wGoVu4Dady5GV0Sv4UIsx rockstar Post Malone 17532665 ... 1 0 2017-10-20 2017-10-27 us
1 2 75ZvA4QfFiZvzhj2xkaWAh I Fall Apart Post Malone 8350785 ... 2 0 2017-10-20 2017-10-27 us
2 3 2fQrGHiQOvpL9UgPvtYy6G Bank Account 21 Savage 7589124 ... 3 1 2017-10-20 2017-10-27 us
3 4 43ZyHQITOjhciSUUNPVRHc Gucci Gang Lil Pump 7584237 ... 4 1 2017-10-20 2017-10-27 us
4 5 5tz69p7tJuGPeMGwNTxYuV 1-800-273-8255 Logic 7527770 ... 1 -2 2017-10-20 2017-10-27 us
... ... ... ... ... ... ... ... ... ... ... ...
273595 196 6kex4EBAj0WHXDKZMEJaaF Swalla (feat. Nicki Minaj & Ty Dolla $ign) Jason Derulo 3747830 ... 8 -5 2018-03-02 2018-03-09 global
273596 197 0CokSRCu5hZgPxcZBaEzVE Glorious (feat. Skylar Grey) Macklemore 3725286 ... 14 -8 2018-03-02 2018-03-09 global
273597 198 7oK9VyNzrYvRFo7nQEYkWN Mr. Brightside The Killers 3717326 ... 148 -3 2018-03-02 2018-03-09 global
273598 199 7EUfNvyCVxQV3oN5ScA2Lb Next To Me Imagine Dragons 3681739 ... 122 -77 2018-03-02 2018-03-09 global
273599 200 6u0EAxf1OJTLS7CvInuNd7 Vai malandra (feat. Tropkillaz & DJ Yuri Martins) Anitta 3676542 ... 30 -23 2018-03-02 2018-03-09 global
If you really want the dates to be dates, you can run this on those two columns at the end.
all_charts['week_start'] = pd.to_datetime(all_charts['week_start'])
Personally, I'd also do the following:
all_charts['week_start'] = pd.to_datetime(all_charts['week_start'])
all_charts['week_end'] = pd.to_datetime(all_charts['week_end'])
all_charts['region'] = all_charts['region'].astype('category')
all_charts['artist'] = all_charts['artist'].astype('category')
all_charts['song_name'] = all_charts['song_name'].astype('category')
all_charts['song_id'] = all_charts['song_id'].astype('category')
all_charts.set_index(['region', 'week_start', 'week_end', 'position'], inplace=True)
all_charts.position_status = pd.to_numeric(all_charts.position_status, errors='coerce')
print(df.head(10))
Giving:
song_id song_name artist streams last_week_position weeks_on_chart peak_position position_status
region week_start week_end position
us 2017-10-20 2017-10-27 1 7wGoVu4Dady5GV0Sv4UIsx rockstar Post Malone 17532665 1.0 3 1 0.0
2 75ZvA4QfFiZvzhj2xkaWAh I Fall Apart Post Malone 8350785 2.0 6 2 0.0
3 2fQrGHiQOvpL9UgPvtYy6G Bank Account 21 Savage 7589124 4.0 5 3 1.0
4 43ZyHQITOjhciSUUNPVRHc Gucci Gang Lil Pump 7584237 5.0 3 4 1.0
5 5tz69p7tJuGPeMGwNTxYuV 1-800-273-8255 Logic 7527770 3.0 26 1 -2.0
6 5Gd19NupVe5X8bAqxf9Iaz Gorgeous Taylor Swift 6940802 NaN 1 6 NaN
7 0ofbQMrRDsUaVKq2mGLEAb Havana Camila Cabello 6623184 10.0 12 7 3.0
8 2771LMNxwf62FTAdpJMQfM Bodak Yellow Cardi B 6472727 6.0 14 3 -2.0
9 5Z3GHaZ6ec9bsiI5BenrbY Young Dumb & Broke Khalid 5982108 9.0 29 6 0.0
10 7GX5flRQZVHRAGd6B4TmDO XO Tour Llif3 Lil Uzi Vert 5822583 8.0 9 2 -2.0

How to speed up this python script with multiprocessing

I have a script that get data from a dataframe, use those data to make a request to a website, using fuzzywuzzy module find the exact href and then runs a function to scrape odds. I would speed up this script with the multiprocessing module, it is possible?
Date HomeTeam AwayTeam
0 Monday 6 December 2021 20:00 Everton Arsenal
1 Monday 6 December 2021 17:30 Empoli Udinese
2 Monday 6 December 2021 19:45 Cagliari Torino
3 Monday 6 December 2021 20:00 Getafe Athletic Bilbao
4 Monday 6 December 2021 15:00 Real Zaragoza Eibar
5 Monday 6 December 2021 17:15 Cartagena Tenerife
6 Monday 6 December 2021 20:00 Girona Leganes
7 Monday 6 December 2021 19:45 Niort Toulouse
8 Monday 6 December 2021 19:00 Jong Ajax FC Emmen
9 Monday 6 December 2021 19:00 Jong AZ Excelsior
Script
df = pd.read_excel(path)
dates = df.Date
hometeams = df.HomeTeam
awayteams = df.AwayTeam
matches_odds = list()
for i,(a,b,c) in enumerate(zip(dates, hometeams, awayteams)):
try:
r = requests.get(f'https://www.betexplorer.com/results/soccer/?year={a.split(" ")[3]}&month={monthToNum(a.split(" ")[2])}&day={a.split(" ")[1]}')
except requests.exceptions.ConnectionError:
sleep(10)
r = requests.get(f'https://www.betexplorer.com/results/soccer/?year={a.split(" ")[3]}&month={monthToNum(a.split(" ")[2])}&day={a.split(" ")[1]}')
soup = BeautifulSoup(r.text, 'html.parser')
f = soup.find_all('td', class_="table-main__tt")
for tag in f:
match = fuzz.ratio(f'{b} - {c}', tag.find('a').text)
hour = a.split(" ")[4]
if hour.split(':')[0] == '23':
act_hour = '00' + ':' + hour.split(':')[1]
else:
act_hour = str(int(hour.split(':')[0]) + 1) + ':' + hour.split(':')[1]
if match > 70 and act_hour == tag.find('span').text:
href_id = tag.find('a')['href']
table = get_odds(href_id)
matches_odds.append(table)
print(i, ' of ', len(dates))
PS: The monthToNum function just replace the month name to his number

First, you make a function of your loop body with inputs i, a, b and c. Then, you create a multiprocessing.Pool and submit this function with the proper arguments (i, a, b, c) to the pool.
import multiprocessing
df = pd.read_excel(path)
dates = df.Date
hometeams = df.HomeTeam
awayteams = df.AwayTeam
matches_odds = list()
def fetch(data):
i, (a, b, c) = data
try:
r = requests.get(f'https://www.betexplorer.com/results/soccer/?year={a.split(" ")[3]}&month={monthToNum(a.split(" ")[2])}&day={a.split(" ")[1]}')
except requests.exceptions.ConnectionError:
sleep(10)
r = requests.get(f'https://www.betexplorer.com/results/soccer/?year={a.split(" ")[3]}&month={monthToNum(a.split(" ")[2])}&day={a.split(" ")[1]}')
soup = BeautifulSoup(r.text, 'html.parser')
f = soup.find_all('td', class_="table-main__tt")
for tag in f:
match = fuzz.ratio(f'{b} - {c}', tag.find('a').text)
hour = a.split(" ")[4]
if hour.split(':')[0] == '23':
act_hour = '00' + ':' + hour.split(':')[1]
else:
act_hour = str(int(hour.split(':')[0]) + 1) + ':' + hour.split(':')[1]
if match > 70 and act_hour == tag.find('span').text:
href_id = tag.find('a')['href']
table = get_odds(href_id)
matches_odds.append(table)
print(i, ' of ', len(dates))
if __name__ == '__main__':
num_processes = 20
with multiprocessing.Pool(num_processes) as pool:
pool.map(fetch, enumerate(zip(dates, hometeams, awayteams)))
Besides, multiprocessing is not the only way to improve the speed. Asynchronous programming can be used as well and is probably better for this scenario, although multiprocessing does the job, too - just want to mention that.
If carefully read the Python multiprocessing documentation, then it'll be obvious.

Merging computed file contents and display previous computed data in output

I am working to 2 files, oldFile.txt and newFile.txt and compute some changes between them. The newFile.txt is updated constantly and any updates will be written to oldFile.txt
I am trying to improve the snippet below by saving previous computed values and add it to a finalOutput.txt. Any idea will be very helpful to accomplish the needed output. Thank you in advance.
import pandas as pd
from time import sleep
def read_file(fn):
data = {}
with open(fn, 'r') as f:
for lines in f:
line = lines.rstrip()
pname, cnt, cat = line.split(maxsplit=2)
data.update({pname: {'pname': pname, 'cnt': int(cnt), 'cat': cat}})
return data
def process_data(oldfn, newfn):
old = read_file(oldfn)
new = read_file(newfn)
u_data = {}
for ko, vo in old.items():
if ko in new:
n = new[ko]
old_cnt = vo['cnt']
new_cnt = n['cnt']
u_cnt = old_cnt + new_cnt
tmp_old_cnt = 1 if old_cnt == 0 else old_cnt
cnt_change = 100 * (new_cnt - tmp_old_cnt) / tmp_old_cnt
u_data.update({ko: {'pname': n['pname'], 'cnt': new_cnt, 'cat': n['cat'],
'curr_change%': round(cnt_change, 0)}})
for kn, vn in new.items():
if kn not in old:
old_cnt = 1
new_cnt = vn['cnt']
cnt_change = 0
vn.update({'cnt_change': round(cnt_change, 0)})
u_data.update({kn: vn})
pd.options.display.float_format = "{:,.0f}".format
mydata = []
for _, v in u_data.items():
mydata.append(v)
df = pd.DataFrame(mydata)
df = df.sort_values(by=['cnt'], ascending=False)
# Save to text file.
with open('finalOutput.txt', 'w') as w:
w.write(df.to_string(header=None, index=False))
# Overwrite oldFile.txt
with open('oldFile.txt', 'w') as w:
w.write(df.to_string(header=None, index=False))
# Print in console.
df.insert(0, '#', range(1, 1 + len(df)))
print(df.to_string(index=False,header=True))
while True:
oldfn = './oldFile.txt'
newfn = './newFile.txt'
process_data(oldfn, newfn)
sleep(60)
oldFile.txt
e6c76e4810a464bc 1 Hello(HLL)
65b66cc4e81ac81d 2 CryptoCars (CCAR)
c42d0c924df124ce 3 GoldNugget (NGT)
ee70ad06df3d2657 4 BabySwap (BABY)
e5b7ebc589ea9ed8 8 Heroes&E... (HE)
7e7e9d75f5da2377 3 Robox (RBOX)
newfile.txt #-- content during 1st reading
e6c76e4810a464bc 34 Hello(HLL)
65b66cc4e81ac81d 43 CryptoCars (CCAR)
c42d0c924df124ce 95 GoldNugget (NGT)
ee70ad06df3d2657 15 BabySwap (BABY)
e5b7ebc589ea9ed8 37 Heroes&E... (HE)
7e7e9d75f5da2377 23 Robox (RBOX)
755507d18913a944 49 CharliesFactory
newfile.txt #-- content during 2nd reading
924dfc924df1242d 35 AeroDie (ADie)
e6c76e4810a464bc 34 Hello(HLL)
65b66cc4e81ac81d 73 CryptoCars (CCAR)
c42d0c924df124ce 15 GoldNugget (NGT)
ee70ad06df3d2657 5 BabySwap (BABY)
e5b7ebc589ea9ed8 12 Heroes&E... (HE)
7e7e9d75f5da2377 19 Robox (RBOX)
755507d18913a944 169 CharliesFactory
newfile.txt # content during 3rd reading
924dfc924df1242d 45 AeroDie (ADie)
e6c76e4810a464bc 2 Hello(HLL)
65b66cc4e81ac81d 4 CryptoCars (CCAR)
c42d0c924df124ce 7 GoldNugget (NGT)
ee70ad06df3d2657 5 BabySwap (BABY)
e5b7ebc589ea9ed8 3 Heroes&E... (HE)
7e7e9d75f5da2377 6 Robox (RBOX)
755507d18913a944 9 CharliesFactory
oldFile.txt #-- Current output that needs improvement
# pname cnt cat curr_change%
1 924dfc924df1242d 35 AeroDie (ADie) 29
2 755507d18913a944 9 CharliesFactory -95
3 c42d0c924df124ce 7 GoldNugget (NGT) -53
4 7e7e9d75f5da2377 6 Robox (RBOX) -68
5 ee70ad06df3d2657 5 BabySwap (BABY) 0
6 65b66cc4e81ac81d 4 CryptoCars (CCAR) -95
7 e5b7ebc589ea9ed8 3 Heroes&E... (HE) -75
8 e6c76e4810a464bc 2 Hello(HLL) -94
finalOutput.txt #-- Needed Improved Output with additional columns r1, r2 and so on depending on how many update readings
# curr_change% is the latest 3rd reading
# r2% is based on the 2nd reading
# r1% is based on the 1st reading
# pname cnt cat curr_change% r2% r1%
1 924dfc924df1242d 35 AeroDie (ADie) 29 0 0
2 755507d18913a944 9 CharliesFactory -95 245 0
3 c42d0c924df124ce 7 GoldNugget (NGT) -53 -84 3,067
4 7e7e9d75f5da2377 6 Robox (RBOX) -68 -17 667
5 ee70ad06df3d2657 5 BabySwap (BABY) 0 -67 275
6 65b66cc4e81ac81d 4 CryptoCars (CCAR) -95 70 2,050
7 e5b7ebc589ea9ed8 3 Heroes&E... (HE) -75 -68 362
8 e6c76e4810a464bc 2 Hello(HLL) -94 0 3,300

Updated for feedback, I made adjustments so that it would handle data that was fed to it live. Whenever new data is loaded, load the file name into process_new_file() function, and it will update the 'finalOutput.txt'.
For simplicity, I named the different files file1, file2, file3, and file4.
I'm doing most of the operations using the pandas Dataframe. I think working with Pandas DataFrames will make the task a lot easier for you.
Overall, I created one function to read the file and return a properly formatted DataFrame. I created a second function that compares the old and the new file and does the calculation you were looking for. I merge together the results of these calculations. Finally, I merge all of these calculations with the last file's data to get the output you're looking for.
import pandas as pd
global global_old_df
global results_df
global count
global_old_df = None
results_df = pd.DataFrame()
count = 0
def read_file(file_name):
rows = []
with open(file_name) as f:
for line in f:
rows.append(line.split(" ", 2))
df = pd.DataFrame(rows, columns=['pname', 'cnt', 'cat'])
df['cat'] = df['cat'].str.strip()
df['cnt'] = df['cnt'].astype(float)
return df
def compare_dfs(df_old, df_new, count):
df_ = df_old.merge(df_new, on=['pname', 'cat'], how='outer')
df_['r%s' % count] = (df_['cnt_y'] / df_['cnt_x'] - 1) * 100
df_ = df_[['pname', 'r%s' % count]]
df_ = df_.set_index('pname')
return df_
def process_new_file(file):
global global_old_df
global results_df
global count
df_new = read_file(file)
if global_old_df is None:
global_old_df = df_new
return
else:
count += 1
r_df = compare_dfs(global_old_df, df_new, count)
results_df = pd.concat([r_df, results_df], axis=1)
global_old_df = df_new
output_df = df_new.merge(results_df, left_on='pname', right_index=True)
output_df.to_csv('finalOutput.txt')
pd.options.display.float_format = "{:,.1f}".format
print(output_df.to_string())
files = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
for file in files:
process_new_file(file)
This gives the output:
pname cnt cat r3 r2 r1
0 924dfc924df1242d 45.0 AeroDie (ADie) 28.6 NaN NaN
1 e6c76e4810a464bc 2.0 Hello(HLL) -94.1 0.0 3,300.0
2 65b66cc4e81ac81d 4.0 CryptoCars (CCAR) -94.5 69.8 2,050.0
3 c42d0c924df124ce 7.0 GoldNugget (NGT) -53.3 -84.2 3,066.7
4 ee70ad06df3d2657 5.0 BabySwap (BABY) 0.0 -66.7 275.0
5 e5b7ebc589ea9ed8 3.0 Heroes&E... (HE) -75.0 -67.6 362.5
6 7e7e9d75f5da2377 6.0 Robox (RBOX) -68.4 -17.4 666.7
7 755507d18913a944 9.0 CharliesFactory -94.7 244.9 NaN
So, to run it live, you'd just replace that last section with:
while True:
newfn = './newFile.txt'
process_new_file(newfn)
sleep(60)

How to convert epoch time to GMT + 7 time in pandas dataframe?

I have a pandas dataframe that has a column created_date which is in epoch format. I wanted to use a filter condition as shown below.
Dataframe sample
created_time updated_time sys_time last_action_time account_id \
0 1624473000000 1624459148023 1624459148023 0 812
1 1624471920000 1624448094358 1624448094358 0 812
2 1624469400000 1624455267579 1624455267579 0 812
3 1624466580000 1624466620020 1624466590321 0 812
4 1624466529000 1624466610222 1624466540086 0 812
5 1624466501000 1624466610270 1624466510212 0 812
6 1624466461000 1624466620149 1624466469825 0 812
7 1624466443000 1624466446558 1624466446558 0 812
8 1624466435000 1624466460213 1624466460213 0 812
daily_data_df = [(data_df['created_time'] >= start_date_int) & (data_df['created_time'] < end_date_int)
Where,
start_date_int & end_date_int is GMT+7 timezone
created_time is epoch format
Please help me with the conversion.

First : Strip last 3 digits from column "created_time", it seems that epoch lengh is only 9-10 and you have 13 :
df['created_time'] = df['created_time'].astype(str).apply(lambda x: x[:-3])
Second : Convert from Unix epoch to datetime :
df['created_time'] = pd.to_datetime(df['created_time'], unit = 's')
Third : Filter date range (below sample date range) :
start_date_int = '2021-06-23 18:12:00'
end_date_int = '2021-06-23 18:30:00'
df_filterd = df[(df['created_time'] >= start_date_int) &\
(df['created_time'] < end_date_int)]
*Alternative method for filter :
df_filterd = df[df['created_time'].between(start_date_int , end_date_int)]

You can convert the epoch time to GMT+7 using pd.to_datetime() and dt.tz_convert(), as follows:
data_df['created_GMT+7'] = pd.to_datetime(data_df['created_time'], unit='ms', utc=True).dt.tz_convert('Etc/GMT+7')
Result:
print(data_df['created_GMT+7'])
0 2021-06-23 11:30:00-07:00
1 2021-06-23 11:12:00-07:00
2 2021-06-23 10:30:00-07:00
3 2021-06-23 09:43:00-07:00
4 2021-06-23 09:42:09-07:00
5 2021-06-23 09:41:41-07:00
6 2021-06-23 09:41:01-07:00
7 2021-06-23 09:40:43-07:00
8 2021-06-23 09:40:35-07:00
Name: created_GMT+7, dtype: datetime64[ns, Etc/GMT+7]
Then, filter the rows as follows:
start_date_int = 1624466460500
end_date_int = 1624469402000
mask = data_df['created_GMT+7'].between(pd.Timestamp(start_date_int, unit='ms', tz='Etc/GMT+7'), pd.Timestamp(end_date_int, unit='ms', tz='Etc/GMT+7'))
daily_data_df = data_df.loc[mask]
Or,
start_date_int = 1624466460500
end_date_int = 1624469402000
mask = ((data_df['created_GMT+7'] - pd.Timestamp("1970-01-01", tz='Etc/GMT+7')) // pd.Timedelta('1ms')).between(start_date_int, end_date_int)
daily_data_df = data_df.loc[mask]
Result:
(using the sample start_date_int and end_date_int above)
print(daily_data_df)
created_time updated_time sys_time last_action_time account_id created_GMT+7
2 1624469400000 1624455267579 1624455267579 0 812 2021-06-23 10:30:00-07:00
3 1624466580000 1624466620020 1624466590321 0 812 2021-06-23 09:43:00-07:00
4 1624466529000 1624466610222 1624466540086 0 812 2021-06-23 09:42:09-07:00
5 1624466501000 1624466610270 1624466510212 0 812 2021-06-23 09:41:41-07:00
6 1624466461000 1624466620149 1624466469825 0 812 2021-06-23 09:41:01-07:00

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to add padding zeros to a datetime object in Python? - python

Related

if statement in for loop with pandas dataframes

Adding Spotify Data from zenodo in a DataFrame

How to speed up this python script with multiprocessing

Merging computed file contents and display previous computed data in output

How to convert epoch time to GMT + 7 time in pandas dataframe?

Categories

Resources