I am writing a dollar-cost-averaging script in which I want to choose between two equations. I built an Excel spreadsheet that I'm trying to port over to Python. I've gotten pretty far except for the last step, which has had me searching for a solution for 3 weeks now. The errors happen when I loop through the df with a for loop: I would like to check a column with an if statement, and if it is true apply one equation, otherwise apply another. I can get the for loop to work and I can get the if statements to work, but not combined. See all the commented-out code for what's been tried. I have tried np.where instead of the if statements as well. I have tried .loc. I have tried lambda. I have tried list comprehensions. Nothing is working, please help. FYI, the code in question is the ['trend bal'] column. ***See the end for the correct code.
What the df looks like:
Index timestamp Open High Low ... rate account bal invested ST_10_1.0 if trend
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... 1.000000 $10,000.00 10000 1 0
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... 0.015242 $10,252.42 10100 1 0
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... 0.005779 $10,411.67 10200 1 0
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... -0.016944 $10,335.25 10300 1 0
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... -0.005739 $10,375.93 10400 1 0
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... 0.005073 $10,528.57 10500 1 0
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... -0.022094 $10,395.95 10600 1 0
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... 0.007872 $10,577.79 10700 1 0
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... 0.018225 $10,870.57 10800 1 0
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... 0.016445 $11,149.33 10900 1 0
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... 0.013307 $11,397.70 11000 1 0
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... 0.020009 $11,725.75 11100 1 0
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... -0.003125 $11,789.11 11200 1 0
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... 0.003227 $11,927.15 11300 1 0
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... -0.021997 $11,764.79 11400 1 0
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... -0.012230 $11,720.92 11500 -1 100
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... 0.038249 $12,269.23 11600 -1 100
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... -0.019393 $12,131.29 11700 1 0
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... 0.022757 $12,507.36 11800 1 0
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... 0.008547 $12,714.25 11900 1 0
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... -0.018705 $12,576.44 12000 1 0
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... -0.003032 $12,638.31 12100 1 0
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... -0.056813 $12,020.29 12200 1 0
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... 0.007710 $12,212.97 12300 -1 100
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... 0.015497 $12,502.23 12400 -1 100
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... -0.018196 $12,374.75 12500 1 0
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... -0.015790 $12,279.35 12600 1 0
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... 0.008227 $12,480.38 12700 1 0
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... -0.012722 $12,421.61 12800 1 0
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... -0.028774 $12,164.19 12900 -1 100
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... 0.061558 $13,012.99 13000 -1 100
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... 0.017911 $13,346.07 13100 1 0
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... 0.000616 $13,454.30 13200 1 0
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... -0.012666 $13,383.88 13300 1 0
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... -0.021320 $13,198.53 13400 1 0
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... -0.027503 $12,935.53 13500 -1 100
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... -0.032738 $12,612.05 13600 -1 100
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... -0.002079 $12,685.83 13700 -1 100
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... -0.024119 $12,479.86 13800 -1 100
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... -0.030451 $12,199.84 13900 -1 100
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... 0.065844 $13,103.12 14000 -1 100
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... -0.011952 $13,046.51 14100 1 0
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... -0.050548 $12,487.03 14200 1 0
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... -0.057941 $11,863.52 14300 -1 100
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... 0.064465 $12,728.31 14400 -1 100
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... -0.022090 $12,547.14 14500 -1 100
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... 0.019358 $12,890.03 14600 -1 100
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... -0.009289 $12,870.29 14700 -1 100
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... 0.025489 $13,298.35 14800 -1 100
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... 0.042573 $13,964.51 14900 1 0
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... 0.003607 $14,114.88 15000 1 0
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... 0.032558 $14,674.44 15100 1 0
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... 0.000839 $14,786.75 15200 1 0
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... -0.012900 $14,696.00 15300 1 0
What it should look like:
Index timestamp Open High Low ... account bal invested ST_10_1.0 if trend trend bal
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... $10,000.00 10000 1 0 $10,000.00
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... $10,252.42 10100 1 0 $10,252.42
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... $10,411.67 10200 1 0 $10,411.67
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... $10,335.25 10300 1 0 $10,335.25
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... $10,375.93 10400 1 0 $10,375.93
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... $10,528.57 10500 1 0 $10,528.57
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... $10,395.95 10600 1 0 $10,395.95
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... $10,577.79 10700 1 0 $10,577.79
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... $10,870.57 10800 1 0 $10,870.57
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... $11,149.33 10900 1 0 $11,149.33
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... $11,397.70 11000 1 0 $11,397.70
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... $11,725.75 11100 1 0 $11,725.75
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... $11,789.11 11200 1 0 $11,789.11
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... $11,927.15 11300 1 0 $11,927.15
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... $11,764.79 11400 1 0 $11,764.79
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... $11,720.92 11500 -1 100 $11,720.92
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... $12,269.23 11600 -1 100 $11,820.92
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... $12,131.29 11700 1 0 $11,920.92
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... $12,507.36 11800 1 0 $12,292.19
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... $12,714.25 11900 1 0 $12,497.25
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... $12,576.44 12000 1 0 $12,363.49
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... $12,638.31 12100 1 0 $12,426.01
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... $12,020.29 12200 1 0 $11,820.05
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... $12,212.97 12300 -1 100 $12,011.19
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... $12,502.23 12400 -1 100 $12,111.19
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... $12,374.75 12500 1 0 $12,211.19
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... $12,279.35 12600 1 0 $12,118.38
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... $12,480.38 12700 1 0 $12,318.08
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... $12,421.61 12800 1 0 $12,261.37
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... $12,164.19 12900 -1 100 $12,008.56
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... $13,012.99 13000 -1 100 $12,108.56
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... $13,346.07 13100 1 0 $12,208.56
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... $13,454.30 13200 1 0 $12,316.09
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... $13,383.88 13300 1 0 $12,260.08
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... $13,198.53 13400 1 0 $12,098.70
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... $12,935.53 13500 -1 100 $11,865.95
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... $12,612.05 13600 -1 100 $11,965.95
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... $12,685.83 13700 -1 100 $12,065.95
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... $12,479.86 13800 -1 100 $12,165.95
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... $12,199.84 13900 -1 100 $12,265.95
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... $13,103.12 14000 -1 100 $12,365.95
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... $13,046.51 14100 1 0 $12,465.95
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... $12,487.03 14200 1 0 $11,935.81
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... $11,863.52 14300 -1 100 $11,344.24
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... $12,728.31 14400 -1 100 $11,444.24
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... $12,547.14 14500 -1 100 $11,544.24
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... $12,890.03 14600 -1 100 $11,644.24
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... $12,870.29 14700 -1 100 $11,744.24
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... $13,298.35 14800 -1 100 $11,844.24
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... $13,964.51 14900 1 0 $11,944.24
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... $14,114.88 15000 1 0 $12,087.33
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... $14,674.44 15100 1 0 $12,580.87
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... $14,786.75 15200 1 0 $12,691.42
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... $14,696.00 15300 1 0 $12,627.70
Python Code:
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
import ccxt
from matplotlib import pyplot as plt
import config
import schedule
import pandas as pd
import pandas_ta as ta
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import numpy as np
from datetime import datetime
import time
import yfinance as yf
ticker = yf.Ticker('^GSPC')
df = ticker.history(period="1y", interval="1wk")
df.reset_index(inplace=True)
df.rename(columns = {'Date':'timestamp'}, inplace = True)
#df.drop(columns ={'Open', 'High', 'Low', 'Volume'}, inplace=True, axis=1)
df.drop(columns ={'Dividends', 'Stock Splits'}, inplace=True, axis=1)
# df['Close'].ffill(axis = 0, inplace = True)
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
    df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'if trend'] = 0
df.loc[df[f'ST_{ATR}_{Mult}'] == -1, 'if trend'] = weekly
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
# np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
# if df[f'ST_{ATR}_{Mult}'] == 1:
# df.loc[i, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
# else:
# df.loc[i, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
# for i in range(1, len(df)):
# df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
# df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
#df.to_csv('GSPC.csv',index=False,mode='a')
# plt.plot(df['timestamp'], df['account bal'])
# plt.plot(df['timestamp'], df['invested'])
# plt.plot(df['timestamp'], df['close'])
# plt.show()
print(df)
What some of the errors look like:
np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
File "<__array_function__ internals>", line 180, in where
ValueError: operands could not be broadcast together with shapes (36,) () (54,)
Another error:
line 1535, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
No error but not the correct amounts:
df['trend bal'] = 0
for i in range(1, len(df)):
    df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
    df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
See the screenshot of the Excel formula: [image: excel spreadsheet]
*** Made correct calculations thanks to Ingwersen_erik:
from re import X
import pandas as pd
import pandas_ta as ta
import numpy as np
pd.set_option('display.max_rows', None)
df = pd.read_csv('etcusd.csv')
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
    df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
MDD = ((df['account bal']-df['account bal'].max()) / df['account bal'].max()).min()
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df.index == 0, "trend bal"] = invest
for index, row in df.iloc[1:].iterrows():
    row['trend bal'] = np.where(
        df.loc[index - 1, f'ST_{ATR}_{Mult}'] == 1,
        (df.loc[index - 1, 'trend bal'] * (1 + row['rate'])) + weekly,
        df.loc[index - 1, 'trend bal'] + weekly,
    )
    df.loc[df.index == index, 'trend bal'] = row['trend bal']
print(df)
Does this solve your problem?
import time
import ccxt
import warnings
import pandas as pd
import pandas_ta as ta
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
warnings.filterwarnings("ignore")
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
invest = 10_000
weekly = 100
fee = 0.15 / 100
fees = 1 - fee
ATR = 10
Mult = 1.0
ticker = yf.Ticker("^GSPC")
df = (
ticker.history(period="1y", interval="1wk")
.reset_index()
.rename(columns={"Date": "timestamp"})
.drop(columns={"Dividends", "Stock Splits"}, errors="ignore")
)
df.loc[df.index == 0, "rate"] = 1
df.loc[df.index > 0, "rate"] = (df["Close"] / df["Close"].shift(1)) - 1
df.loc[df.index == 0, "account bal"] = invest
for i in range(1, len(df)):
    df.loc[i, "account bal"] = (
        df.loc[i - 1, "account bal"] * (1 + df.loc[i, "rate"])
    ) + weekly
df["invested"] = (df.index * weekly) + invest
# Super-trend
ST = ta.supertrend(df["High"], df["Low"], df["Close"], ATR, Mult)
df[f"ST_{ATR}_{Mult}"] = ST[f"SUPERTd_{ATR}_{Mult}"]
df[f"ST_{ATR}_{Mult}"] = df[f"ST_{ATR}_{Mult}"].shift(1).fillna(1)
df.loc[df[f"ST_{ATR}_{Mult}"] == 1, "if trend"] = 0
df.loc[df[f"ST_{ATR}_{Mult}"] == -1, "if trend"] = weekly
df.loc[df.index == 0, "trend bal"] = invest
# === Potential correction to the np.where ==============================
for index, row in df.iloc[1:].iterrows():
    row["trend bal"] = np.where(
        row[f"ST_{ATR}_{Mult}"] == 1,
        (df.loc[index - 1, "trend bal"] * (1 + row["rate"])) + weekly,
        df.loc[index - 1, "trend bal"] + row["if trend"],
    )
    # NOTE: The original "otherwise" clause from `np.where` had the
    #       following value: `df.loc[index - index, "trend bal"] + ...`
    #       I assumed you meant `index - 1`, instead of `index - index`,
    #       therefore the above code uses `index - 1`. If you really meant
    #       `index - index`, please change the code accordingly.
    df.loc[df.index == index, "trend bal"] = row["trend bal"]
df
Result:
timestamp  Open  High  Low  Close  Volume  rate  account bal  invested  ST_10_1.0  if trend  trend bal
2021-08-16  4382.44  4444.35  4367.73  4441.67  5988610000  1  10000  10000  1  0  10000
2021-08-23  4450.29  4513.33  4450.29  4509.37  14124930000  0.0152421  10252.4  10100  1  0  10252.4
2021-08-30  4513.76  4545.85  4513.76  4535.43  14256180000  0.00577909  10411.7  10200  1  0  10411.7
2021-09-06  4535.38  4535.38  4457.66  4458.58  11793790000  -0.0169444  10335.3  10300  1  0  10335.3
2021-09-13  4474.81  4492.99  4427.76  4432.99  17763120000  -0.00573946  10375.9  10400  1  0  10375.9
2021-09-20  4402.95  4465.4  4305.91  4455.48  15697030000  0.00507327  10528.6  10500  1  0  10528.6
2021-09-27  4442.12  4457.3  4288.52  4357.04  15555390000  -0.0220941  10396  10600  1  0  10396
2021-10-04  4348.84  4429.97  4278.94  4391.34  14795520000  0.00787227  10577.8  10700  1  0  10577.8
2021-10-11  4385.44  4475.82  4329.92  4471.37  13758090000  0.0182246  10870.6  10800  1  0  10870.6
2021-10-18  4463.72  4559.67  4447.47  4544.9  13966070000  0.0164446  11149.3  10900  1  0  11149.3
2021-10-25  4553.69  4608.08  4537.36  4605.38  16206040000  0.0133072  11397.7  11000  1  0  11397.7
2021-11-01  4610.62  4718.5  4595.06  4697.53  16397220000  0.0200092  11725.8  11100  1  0  11725.8
2021-11-08  4701.48  4714.92  4630.86  4682.85  15646510000  -0.00312498  11789.1  11200  1  0  11789.1
2021-11-15  4689.3  4717.75  4672.78  4697.96  15279660000  0.00322664  11927.2  11300  1  0  11927.2
2021-11-22  4712  4743.83  4585.43  4594.62  11775840000  -0.0219967  11764.8  11400  1  0  11764.8
2021-11-29  4628.75  4672.95  4495.12  4538.43  20242840000  -0.0122295  11720.9  11500  -1  100  11864.8
2021-12-06  4548.37  4713.57  4540.51  4712.02  15411530000  0.0382489  12269.2  11600  -1  100  11964.8
2021-12-13  4710.3  4731.99  4600.22  4620.64  19184960000  -0.0193929  12131.3  11700  1  0  11832.8
2021-12-20  4587.9  4740.74  4531.1  4725.79  10594350000  0.0227566  12507.4  11800  1  0  12202
2021-12-27  4733.99  4808.93  4733.99  4766.18  11687720000  0.00854675  12714.3  11900  1  0  12406.3
2022-01-03  4778.14  4818.62  4662.74  4677.03  16800900000  -0.0187048  12576.4  12000  1  0  12274.3
2022-01-10  4655.34  4748.83  4582.24  4662.85  17126800000  -0.00303177  12638.3  12100  1  0  12337.1
2022-01-17  4632.24  4632.24  4395.34  4397.94  14131200000  -0.0568129  12020.3  12200  1  0  11736.1
2022-01-24  4356.32  4453.23  4222.62  4431.85  21218590000  0.00771046  12213  12300  -1  100  11836.1
2022-01-31  4431.79  4595.31  4414.02  4500.53  18846100000  0.0154968  12502.2  12400  -1  100  11936.1
2022-02-07  4505.75  4590.03  4401.41  4418.64  19119200000  -0.0181956  12374.7  12500  1  0  11819
2022-02-14  4412.61  4489.55  4327.22  4348.87  17775970000  -0.0157899  12279.4  12600  1  0  11732.3
2022-02-21  4332.74  4385.34  4114.65  4384.65  16834460000  0.00822737  12480.4  12700  1  0  11928.9
2022-02-28  4354.17  4416.78  4279.54  4328.87  22302830000  -0.0127216  12421.6  12800  1  0  11877.1
2022-03-07  4327.01  4327.01  4157.87  4204.31  23849630000  -0.0287743  12164.2  12900  -1  100  11977.1
2022-03-14  4202.75  4465.4  4161.72  4463.12  24946690000  0.0615583  13013  13000  -1  100  12077.1
2022-03-21  4462.4  4546.03  4424.3  4543.06  19089240000  0.0179112  13346.1  13100  1  0  12393.4
2022-03-28  4541.09  4637.3  4507.57  4545.86  19212230000  0.000616282  13454.3  13200  1  0  12501.1
2022-04-04  4547.97  4593.45  4450.04  4488.28  19383860000  -0.0126665  13383.9  13300  1  0  12442.7
2022-04-11  4462.64  4471  4381.34  4392.59  13812410000  -0.02132  13198.5  13400  1  0  12277.4
2022-04-18  4385.63  4512.94  4267.62  4271.78  18149540000  -0.0275032  12935.5  13500  -1  100  12377.4
2022-04-25  4255.34  4308.45  4124.28  4131.93  19610750000  -0.032738  12612  13600  -1  100  12477.4
2022-05-02  4130.61  4307.66  4062.51  4123.34  21039720000  -0.00207901  12685.8  13700  -1  100  12577.4
2022-05-09  4081.27  4081.27  3858.87  4023.89  23166570000  -0.0241188  12479.9  13800  -1  100  12677.4
2022-05-16  4013.02  4090.72  3810.32  3901.36  20590520000  -0.0304506  12199.8  13900  -1  100  12777.4
2022-05-23  3919.42  4158.49  3875.13  4158.24  19139100000  0.0658437  13103.1  14000  -1  100  12877.4
2022-05-30  4151.09  4177.51  4073.85  4108.54  16049940000  -0.0119522  13046.5  14100  1  0  12823.5
2022-06-06  4134.72  4168.78  3900.16  3900.86  17547150000  -0.0505484  12487  14200  1  0  12275.3
2022-06-13  3838.15  3838.15  3636.87  3674.84  24639140000  -0.0579411  11863.5  14300  -1  100  12375.3
2022-06-20  3715.31  3913.65  3715.31  3911.74  19287840000  0.0644654  12728.3  14400  -1  100  12475.3
2022-06-27  3920.76  3945.86  3738.67  3825.33  17735450000  -0.0220899  12547.1  14500  -1  100  12575.3
2022-07-04  3792.61  3918.5  3742.06  3899.38  14223350000  0.0193578  12890  14600  -1  100  12675.3
2022-07-11  3880.94  3880.94  3721.56  3863.16  16313500000  -0.00928865  12870.3  14700  -1  100  12775.3
2022-07-18  3883.79  4012.44  3818.63  3961.63  16859220000  0.0254895  13298.4  14800  -1  100  12875.3
2022-07-25  3965.72  4140.15  3910.74  4130.29  17356830000  0.0425734  13964.5  14900  1  0  13523.5
2022-08-01  4112.38  4167.66  4079.81  4145.19  18072230000  0.00360747  14114.9  15000  1  0  13672.3
2022-08-08  4155.93  4280.47  4112.09  4280.15  18117740000  0.0325582  14674.4  15100  1  0  14217.4
2022-08-15  4269.37  4325.28  4218.7  4228.48  16255850000  -0.012072  14597.3  15200  1  0  14145.8
2022-08-19  4266.31  4266.31  4218.7  4228.48  2045645000  0  14697.3  15300  1  0  14245.8
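For reference, the same running-balance recurrence can be written with a plain `if` on scalar values, which sidesteps both the broadcasting error and the "truth value of a Series is ambiguous" error (an `if` needs a single True/False, not a whole column). This is only a minimal sketch: the `rate` and `trend` values are made up rather than taken from the S&P data, the column name `trend` is a stand-in for `ST_10_1.0`, and it assumes, as in the asker's final version, that the previous week's signal decides which equation applies.

```python
import pandas as pd

weekly = 100
invest = 10_000.0

# Synthetic illustration data (assumption): weekly return and trend signal.
df = pd.DataFrame({
    "rate": [1.0, 0.02, -0.01, 0.03, 0.01],
    "trend": [1, 1, -1, -1, 1],
})

# Each balance depends on the previous one, so build the column row by row
# in a plain list and assign it to the frame once at the end.
balances = [invest]
for i in range(1, len(df)):
    prev = balances[-1]
    if df["trend"].iloc[i - 1] == 1:  # scalar comparison, safe inside `if`
        balances.append(prev * (1 + df["rate"].iloc[i]) + weekly)
    else:
        balances.append(prev + weekly)  # out of the market: only add cash

df["trend bal"] = balances
print(df)
```

Building the list first and assigning once also avoids writing into the frame on every iteration, which tends to be the slow part of the `df.loc[i, ...] = ...` pattern.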
So I have this df
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP
SUP1 P1 STR1 50 5 18
SUP1 P1 STR2 6 7 18
SUP1 P1 STR3 74 4 18
SUP2 P4 STR1 35 3 500
SUP2 P4 STR2 5 4 500
SUP2 P4 STR3 54 7 500
It's always grouped by Supplier and product ID. The TO_SHIP column is unique for the group. So for example, I have 18 products for that SUP1 with P1 to send. Then I add new columns:
Calculate Wk_bal = (BALANCE / AVG_SALES)
Rank Wk_bal per supplierid-productid group
Lowest Wk_bal for the group : SEND_PKGS = +1
Then Calculate Wk_bal again but add pkg sent = ((BALANCE+SEND_PKGS) / AVG_SALES)
So this loops until all TO_SHIP has been distributed to the stores who need the most
To visualize a run:
First output (calculate wk_bal, then send 1 pkg to the lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 18 10 0
SUP1 P1 STR2 6 4 18 1.5 1
SUP1 P1 STR3 8 4 18 2 0
SUP2 P4 STR1 35 3 500 11.67 0
SUP2 P4 STR2 5 4 500 1.25 1
SUP2 P4 STR3 54 7 500 7.71 0
Second output (calculate updated wk_bal, send one pkg to lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 17 10 0
SUP1 P1 STR2 8 4 17 1.75 2
SUP1 P1 STR3 8 4 17 2 0
SUP2 P4 STR1 35 3 499 11.67 0
SUP2 P4 STR2 7 4 499 1.5 2
SUP2 P4 STR3 54 7 499 7.71 0
And so on: as long as there is any TO_SHIP left, calculate, rank, and give one pkg. The reason for this process is that I want to make sure the store with the lowest wk_balance gets the package first (and there are a lot of other reasons why).
I initially built this on SQL, but with the complexity I moved to python. Unfortunately my python isn't very good in coming up with loops with several conditions esp on pandas df. So far I've tried (and failed):
df['Wk_Bal'] = 0
df['TO_SHIP'] = 0
for i in df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']:
    if i > 0:
        df['Wk_Bal'] = df['BALANCE'] / df['AVG_SALES']
        df['TO_SHIP'] = df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']-1
        df['SEND_PKGS'] = + 1
        df['BALANCE'] = + 1
    else:
        df['TO_SHIP'] = 0
How do I do this better?
Hopefully I've understood all of your requirements. Here is your original data:
import pandas as pd
df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
                   'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
                   'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
                   'BALANCE': [50, 6, 74, 35, 5, 54],
                   'AVG_SALES': [5, 4, 4, 3, 4, 7],
                   'TO_SHIP': [18, 18, 18, 500, 500, 500]})
Here is my approach:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])['Wk_bal'].idxmin()
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
I keep updating df until the TO_SHIP column is all zero. On each pass I increment SEND_PKGS for the row with the lowest Wk_bal in each group, then update Wk_bal and decrement any non-zero TO_SHIP values.
I end up with:
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP SEND_PKGS Wk_bal
0 SUP1 P1 STR1 50 5 0 0 10.000000
1 SUP1 P1 STR2 6 4 0 18 6.000000
2 SUP1 P1 STR3 74 4 0 0 18.500000
3 SUP2 P4 STR1 35 3 0 92 42.333333
4 SUP2 P4 STR2 5 4 0 165 42.500000
5 SUP2 P4 STR3 54 7 0 243 42.428571
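If the while loop gets slow on large TO_SHIP values, one possible alternative is a per-group min-heap, so each package costs one pop and one push instead of a full groupby. This is a sketch of a different technique (Python's `heapq`), not the approach above; `distribute` is a hypothetical helper name, and ties go to the earlier row in the group.

```python
import heapq
import pandas as pd

def distribute(group):
    """Give out the group's TO_SHIP packages one at a time, always to the
    store with the lowest weeks-of-balance (hypothetical helper)."""
    to_ship = int(group["TO_SHIP"].iloc[0])
    balance = group["BALANCE"].tolist()
    avg = group["AVG_SALES"].tolist()
    sent = [0] * len(group)
    # Heap entries are (current Wk_bal, position within the group).
    heap = [(b / a, pos) for pos, (b, a) in enumerate(zip(balance, avg))]
    heapq.heapify(heap)
    for _ in range(to_ship):
        _, pos = heapq.heappop(heap)  # store with the lowest Wk_bal
        sent[pos] += 1
        heapq.heappush(heap, ((balance[pos] + sent[pos]) / avg[pos], pos))
    return sent

df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
                   'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
                   'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
                   'BALANCE': [50, 6, 74, 35, 5, 54],
                   'AVG_SALES': [5, 4, 4, 3, 4, 7],
                   'TO_SHIP': [18, 18, 18, 500, 500, 500]})

df['SEND_PKGS'] = 0
for _, group in df.groupby(["SUPPLIER", "PRODUCTID"]):
    df.loc[group.index, 'SEND_PKGS'] = distribute(group)
df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
print(df)
```

A quick sanity check for either version is that SEND_PKGS sums to the original TO_SHIP within each supplier/product group.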
Edit: In the case of multiple Wk_bal minimums, we can choose based on the minimum AVG_SALES:
def find_min(x):
    num_mins = x["Wk_bal"].loc[x["Wk_bal"] == x["Wk_bal"].min()].shape[0]
    if num_mins == 1:
        return(x["Wk_bal"].idxmin())
    else:
        min_df = x.loc[x["Wk_bal"] == x["Wk_bal"].min()]
        return(min_df["AVG_SALES"].idxmin())
Then, more or less as before:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
    lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])[['Wk_bal', 'AVG_SALES']].apply(find_min)
    df.loc[lowest_idx, 'SEND_PKGS'] += 1
    df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
    df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
I have two dataframes and I want to create a third one. I am trying to write code that does the following:
If A_pd["from"] and A_pd["To"] are both within the range of B_pd["from"] and B_pd["To"], then add A_pd["from"], A_pd["To"], and B_pd["Value"] to the C_pd dataframe.
If A_pd["from"] is within the range of B_pd["from"] and B_pd["To"], and A_pd["To"] is within the range of B_pd["from"] and B_pd["To"] of the next row, then I want to split the range A_pd["from"] to A_pd["To"] into two ranges, (A_pd["from"], B_pd["To"]) and (B_pd["To"], A_pd["To"]), each with the corresponding B_pd["Value"].
I created the following code:
import pandas as pd
A_pd = {'from':[0,20,80,180,250],
        'To':[20, 50,120,210,300]}
A_pd=pd.DataFrame(A_pd)
B_pd = {'from':[0,20,100,200],
        'To':[20, 100,200,300],
        'Value':[20, 17,15,12]}
B_pd=pd.DataFrame(B_pd)
for i in range(len(A_pd)):
    numberOfIntrupt=0
    for j in range(len(B_pd)):
        if A_pd["from"].values[i] >= B_pd["from"].values[j] and A_pd["from"].values[i] > B_pd["To"].values[j]:
            numberOfIntrupt+=1
cols = ['C_from', 'C_To', 'C_value']
C_dp=pd.DataFrame(columns=cols, index=range(len(A_pd)+numberOfIntrupt))
for i in range(len(A_pd)):
    for j in range(len(B_pd)):
        a=A_pd ["from"].values[i]
        b=A_pd["To"].values[i]
        c_eval=B_pd["Value"].values[j]
        range_s=B_pd["from"].values[j]
        range_f=B_pd["To"].values[j]
        if a >= range_s and a <= range_f and b >= range_s and b <= range_f :
            C_dp['C_from'].loc[i]=a
            C_dp['C_To'].loc[i]=b
            C_dp['C_value'].loc[i]=c_eval
        elif a >= range_s and b > range_f:
            C_dp['C_from'].loc[i]=a
            C_dp['C_To'].loc[i]=range_f
            C_dp['C_value'].loc[i]=c_eval
            C_dp['C_from'].loc[i+1]=range_f
            C_dp['C_To'].loc[i+1]=b
            C_dp['C_value'].loc[i+1]=B_pd["Value"].values[j+1]
print(C_dp)
The current result is C_dp:
C_from C_To C_value
0 0 20 20
1 20 50 17
2 80 100 17
3 180 200 15
4 250 300 12
5 200 300 12
6 NaN NaN NaN
7 NaN NaN NaN
the expected should be :
C_from C_To C_value
0 0 20 20
1 20 50 17
2 80 100 17
3 100 120 15
4 180 200 15
5 200 210 12
6 250 300 12
Thank you a lot for the support
I'm sure there is a better way to do this without loops, but this will help your logic flow.
import pandas as pd
A_pd = {'from': [0, 20, 80, 180, 250],
        'To': [20, 50, 120, 210, 300]}
A_pd = pd.DataFrame(A_pd)
B_pd = {'from': [0, 20, 100, 200],
        'To': [20, 100, 200, 300],
        'Value': [20, 17, 15, 12]}
B_pd = pd.DataFrame(B_pd)
cols = ['C_from', 'C_To', 'C_value']
# Collect rows in a list and build the frame once at the end;
# DataFrame.append was removed in pandas 2.0.
rows = []
spillover = False
for i in range(len(A_pd)):
    for j in range(len(B_pd)):
        a_from = A_pd["from"].values[i]
        a_to = A_pd["To"].values[i]
        b_from = B_pd["from"].values[j]
        b_to = B_pd["To"].values[j]
        b_value = B_pd['Value'].values[j]
        if a_from >= b_to:
            # a_from outside b range
            continue  # next b
        elif a_from >= b_from:
            # a_from within b range
            if a_to <= b_to:
                rows.append({"C_from": a_from, "C_To": a_to, "C_value": b_value})
                break  # next a
            else:
                # a runs past this b range: emit the first piece, keep going
                rows.append({"C_from": a_from, "C_To": b_to, "C_value": b_value})
                spillover = True
                continue
        if spillover:
            # remainder of an a range that started in an earlier b range
            if a_to <= b_to:
                rows.append({"C_from": b_from, "C_To": a_to, "C_value": b_value})
                spillover = False
                break
            else:
                rows.append({"C_from": b_from, "C_To": b_to, "C_value": b_value})
                spillover = True
                continue
C_dp = pd.DataFrame(rows, columns=cols)
print(C_dp)
Output
C_from C_To C_value
0 0 20 20
1 20 50 17
2 80 100 17
3 100 120 15
4 180 200 15
5 200 210 12
6 250 300 12
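As for doing it without loops: one possible sketch is to pair every A range with every B range via a cross join and keep only the pairs that actually overlap. This assumes pandas 1.2 or newer (for `how="cross"`) and frames small enough that len(A) * len(B) rows fit in memory; the name `pairs` is just illustrative.

```python
import pandas as pd

A_pd = pd.DataFrame({'from': [0, 20, 80, 180, 250],
                     'To': [20, 50, 120, 210, 300]})
B_pd = pd.DataFrame({'from': [0, 20, 100, 200],
                     'To': [20, 100, 200, 300],
                     'Value': [20, 17, 15, 12]})

# Every (A row, B row) combination; shared column names get suffixes.
pairs = A_pd.merge(B_pd, how="cross", suffixes=("_a", "_b"))

# The overlap of two ranges starts at the larger start and ends at the
# smaller end; it is non-empty only when start < end.
pairs["C_from"] = pairs[["from_a", "from_b"]].max(axis=1)
pairs["C_To"] = pairs[["To_a", "To_b"]].min(axis=1)

C_dp = (
    pairs.loc[pairs["C_from"] < pairs["C_To"], ["C_from", "C_To", "Value"]]
    .rename(columns={"Value": "C_value"})
    .reset_index(drop=True)
)
print(C_dp)
```

On the sample data this gives the same seven rows as the loop version, and it also covers an A range that spans more than two B ranges.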
I have a dataframe of people with Age as a column. I would like to match this age to a group, i.e. Baby=0-2 years old, Child=3-12 years old, Young=13-18 years old, Young Adult=19-30 years old, Adult=31-50 years old, Senior Adult=51-65 years old.
I created the lists that define these year groups, e.g. Adult=list(range(31,51)) etc.
How do I match the name of the list 'Adult' to the dataframe by creating a new column?
Small input: the dataframe is made up of three columns: df['Name'], df['Country'], df['Age'].
Name Country Age
Anthony France 15
Albert Belgium 54
.
.
.
Zahra Tunisia 14
So I need to match the age column with lists that I already have. The output should look like:
Name Country Age Group
Anthony France 15 Young
Albert Belgium 54 Senior Adult
.
.
.
Zahra Tunisia 14 Young
Thanks!
IIUC I would go with np.select:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Age': [3, 20, 40]})
condlist = [df.Age.between(0,2),
df.Age.between(3,12),
df.Age.between(13,18),
df.Age.between(19,30),
df.Age.between(31,50),
df.Age.between(51,65)]
choicelist = ['Baby', 'Child', 'Young',
'Young Adult', 'Adult', 'Senior Adult']
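# default must be a string here: recent NumPy versions reject the numeric
# default (0) when the choices are strings
df['Group'] = np.select(condlist, choicelist, default='Unknown')
Output:
Age Group
0 3 Child
1 20 Young Adult
2 40 Adult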
Here's a way to do that using pd.cut:
df = pd.DataFrame({"person_id": range(25), "age": np.random.randint(0, 100, 25)})
print(df.head(10))
==>
person_id age
0 0 30
1 1 42
2 2 78
3 3 2
4 4 44
5 5 43
6 6 92
7 7 3
8 8 13
9 9 76
df["group"] = pd.cut(df.age, [0, 18, 50, 100], labels=["child", "adult", "senior"])
print(df.head(10))
==>
person_id age group
0 0 30 adult
1 1 42 adult
2 2 78 senior
3 3 2 child
4 4 44 adult
5 5 43 adult
6 6 92 senior
7 7 3 child
8 8 13 child
9 9 76 senior
Per your question, if you have a few lists (like the ones below) and would like to use them for 'binning', you can do:
# for example, these are the lists
Adult = list(range(18,50))
Child = list(range(0, 18))
Senior = list(range(50, 100))
# Creating bins out of the lists.
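# Each list's minimum is the left edge of its bin; the last edge is one past
# the overall maximum so that the bins can be left-closed.
bins = [min(l) for l in [Child, Adult, Senior]]
bins.append(max([max(l) for l in [Child, Adult, Senior]]) + 1)
labels = ["Child", "Adult", "Senior"]
# using the bins (right=False so that e.g. 18 falls in Adult, as in the lists):
df["group"] = pd.cut(df.age, bins, labels=labels, right=False)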
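Applied to the six groups from the question, the same idea might look like the sketch below. The bin edges are my reading of the stated ranges (Baby 0-2, Child 3-12, and so on), written as left-closed intervals; ages of 66 and above come out as NaN because the question does not define a group for them.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anthony", "Albert", "Zahra"],
                   "Country": ["France", "Belgium", "Tunisia"],
                   "Age": [15, 54, 14]})

# Left-closed bins: [0, 3), [3, 13), [13, 19), [19, 31), [31, 51), [51, 66)
bins = [0, 3, 13, 19, 31, 51, 66]
labels = ["Baby", "Child", "Young", "Young Adult", "Adult", "Senior Adult"]

df["Group"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)
print(df)
```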
To make things clearer for beginners, you can define a function that returns the age group of each person, then use DataFrame.apply() with axis=1 to call that function on every row and create the 'Group' column:
import pandas as pd
def age(row):
    a = row['Age']
    if 0 <= a <= 2:
        return 'Baby'
    elif 2 < a <= 12:
        return 'Child'
    elif 12 < a <= 18:
        return 'Young'
    elif 18 < a <= 30:
        return 'Young Adult'
    elif 30 < a <= 50:
        return 'Adult'
    elif 50 < a <= 65:
        return 'Senior Adult'
    return None  # age outside every defined group
df = pd.DataFrame({'Name':['Anthony','Albert','Zahra'],
                   'Country':['France','Belgium','Tunisia'],
                   'Age':[15,54,14]})
df['Group'] = df.apply(age, axis=1)
print(df)
Output:
Name Country Age Group
0 Anthony France 15 Young
1 Albert Belgium 54 Senior Adult
2 Zahra Tunisia 14 Young
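Since the question mentions that the lists already exist, another option is to flip them into an age-to-name lookup and use Series.map. A minimal sketch, assuming the lists are gathered in a dict (the names `groups` and `age_to_group` are made up here) and that ages are whole numbers:

```python
import pandas as pd

# The existing lists, keyed by the group name they represent.
groups = {
    "Baby": list(range(0, 3)),
    "Child": list(range(3, 13)),
    "Young": list(range(13, 19)),
    "Young Adult": list(range(19, 31)),
    "Adult": list(range(31, 51)),
    "Senior Adult": list(range(51, 66)),
}

# Invert to {age: group name} so each age can be looked up directly.
age_to_group = {age: name for name, ages in groups.items() for age in ages}

people = pd.DataFrame({"Name": ["Anthony", "Albert", "Zahra"],
                       "Country": ["France", "Belgium", "Tunisia"],
                       "Age": [15, 54, 14]})
people["Group"] = people["Age"].map(age_to_group)  # unmatched ages become NaN
print(people)
```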