For loop that adds and deducts from pandas columns - python

So I have this df
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP
SUP1 P1 STR1 50 5 18
SUP1 P1 STR2 6 7 18
SUP1 P1 STR3 74 4 18
SUP2 P4 STR1 35 3 500
SUP2 P4 STR2 5 4 500
SUP2 P4 STR3 54 7 500
It's always grouped by Supplier and product ID. The TO_SHIP column is unique for the group. So for example, I have 18 products for that SUP1 with P1 to send. Then I add new columns:
Calculate Wk_bal = (BALANCE / AVG_SALES)
Rank Wk_bal per supplierid-productid group
Lowest Wk_bal for the group : SEND_PKGS = +1
Then Calculate Wk_bal again but add pkg sent = ((BALANCE+SEND_PKGS) / AVG_SALES)
So this loops until all TO_SHIP has been distributed to the stores who need the most
To visualize a run:
First output (calculate wk_bal, then send 1 pkg to the lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 18 10 0
SUP1 P1 STR2 6 4 18 1.5 1
SUP1 P1 STR3 8 4 18 2 0
SUP2 P4 STR1 35 3 500 11.67 0
SUP2 P4 STR2 5 4 500 1.25 1
SUP2 P4 STR3 54 7 500 7.71 0
Second output (calculate updated wk_bal, send one pkg to lowest):
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP Wk_Bal SEND_PKGS
SUP1 P1 STR1 50 5 17 10 0
SUP1 P1 STR2 8 4 17 1.75 2
SUP1 P1 STR3 8 4 17 2 0
SUP2 P4 STR1 35 3 499 11.67 0
SUP2 P4 STR2 7 4 499 1.5 2
SUP2 P4 STR3 54 7 499 7.71 0
And so on...so until there is to_ship left, calculate-rank-give one pkg. The reason for this process is I want to make sure that the store with the lowest wk_balance get the package first. (and there's a lot of other reasons why)
I initially built this on SQL, but with the complexity I moved to python. Unfortunately my python isn't very good in coming up with loops with several conditions esp on pandas df. So far I've tried (and failed):
df['Wk_Bal'] = 0
df['TO_SHIP'] = 0
for i in df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']:
if i > 0:
df['Wk_Bal'] = df['BALANCE'] / df['AVG_SALES']
df['TO_SHIP'] = df.groupby(["SUPPLIER", "PRODUCTID"])['TO_SHIP']-1
df['SEND_PKGS'] = + 1
df['BALANCE'] = + 1
else:
df['TO_SHIP'] = 0
How do I do this better?

Hopefully I've understood all of your requirements. Here is your original data:
df = pd.DataFrame({'SUPPLIER': ['SUP1', 'SUP1', 'SUP1', 'SUP2', 'SUP2', 'SUP2'],
'PRODUCTID': ['P1', 'P1', 'P1', 'P4', 'P4', 'P4'],
'STOREID': ['STR1', 'STR2', 'STR3', 'STR1', 'STR2', 'STR3'],
'BALANCE': [50, 6, 74, 35, 5, 54],
'AVG_SALES': [5, 4, 4, 3, 4, 7],
'TO_SHIP': [18, 18, 18, 500, 500, 500]})
Here is my approach:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])['Wk_bal'].idxmin()
df.loc[lowest_idx, 'SEND_PKGS'] += 1
df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1
I continue updating df until the TO_SHIP column is all zero. Then I increment SEND_PKGS which correspond to the lowest Wk_bal of each group. Then update Wk_bal and decrement any non-zero TO_SHIP columns.
I end up with:
SUPPLIER PRODUCTID STOREID BALANCE AVG_SALES TO_SHIP SEND_PKGS Wk_bal
0 SUP1 P1 STR1 50 5 0 0 10.000000
1 SUP1 P1 STR2 6 4 0 18 6.000000
2 SUP1 P1 STR3 74 4 0 0 18.500000
3 SUP2 P4 STR1 35 3 0 92 42.333333
4 SUP2 P4 STR2 5 4 0 165 42.500000
5 SUP2 P4 STR3 54 7 0 243 42.428571
Edit: In the case of multiple Wk_bal minimums, we can choose based on the minimum AVG_SALES:
def find_min(x):
num_mins = x["Wk_bal"].loc[x["Wk_bal"] == x["Wk_bal"].min()].shape[0]
if num_mins == 1:
return(x["Wk_bal"].idxmin())
else:
min_df = x.loc[x["Wk_bal"] == x["Wk_bal"].min()]
return(min_df["AVG_SALES"].idxmin())
Then, more or less as before:
df['SEND_PKGS'] = 0
df['Wk_bal'] = df['BALANCE'] / df['AVG_SALES']
while (df['TO_SHIP'] != 0).any():
lowest_idx = df[df['TO_SHIP'] > 0].groupby(["SUPPLIER", "PRODUCTID"])[['Wk_bal', 'AVG_SALES']].apply(find_min)
df.loc[lowest_idx, 'SEND_PKGS'] += 1
df['Wk_bal'] = (df['BALANCE'] + df['SEND_PKGS']) / df['AVG_SALES']
df.loc[df['TO_SHIP'] > 0, 'TO_SHIP'] -= 1

Related

if statement in for loop with pandas dataframes

I am making a Dollar Cost Average code where I want to choose between 2 equations. I made an excel spreadsheet that I'm trying to portover to python. I've gotten pretty far except for the last step. The last step has had me searching for a solution for 3 weeks now. The errors happen when I try a for loop in a df when looping through. I would like to check a column with an if the statement. If is true then do an equation if false do another equation. I can get the for loop to work and I can the if statements to work, but not combined. See all commented out code for whats been tried. I have tried np.where instead of the if statements as well. I have tried .loc. I have tried lamda. I have tried list comp. Nothing is working please help. FYI the code referring is ['trend bal'] column. ***see end with correct code.
What the df looks like:
Index timestamp Open High Low ... rate account bal invested ST_10_1.0 if trend
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... 1.000000 $10,000.00 10000 1 0
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... 0.015242 $10,252.42 10100 1 0
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... 0.005779 $10,411.67 10200 1 0
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... -0.016944 $10,335.25 10300 1 0
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... -0.005739 $10,375.93 10400 1 0
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... 0.005073 $10,528.57 10500 1 0
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... -0.022094 $10,395.95 10600 1 0
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... 0.007872 $10,577.79 10700 1 0
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... 0.018225 $10,870.57 10800 1 0
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... 0.016445 $11,149.33 10900 1 0
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... 0.013307 $11,397.70 11000 1 0
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... 0.020009 $11,725.75 11100 1 0
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... -0.003125 $11,789.11 11200 1 0
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... 0.003227 $11,927.15 11300 1 0
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... -0.021997 $11,764.79 11400 1 0
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... -0.012230 $11,720.92 11500 -1 100
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... 0.038249 $12,269.23 11600 -1 100
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... -0.019393 $12,131.29 11700 1 0
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... 0.022757 $12,507.36 11800 1 0
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... 0.008547 $12,714.25 11900 1 0
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... -0.018705 $12,576.44 12000 1 0
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... -0.003032 $12,638.31 12100 1 0
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... -0.056813 $12,020.29 12200 1 0
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... 0.007710 $12,212.97 12300 -1 100
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... 0.015497 $12,502.23 12400 -1 100
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... -0.018196 $12,374.75 12500 1 0
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... -0.015790 $12,279.35 12600 1 0
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... 0.008227 $12,480.38 12700 1 0
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... -0.012722 $12,421.61 12800 1 0
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... -0.028774 $12,164.19 12900 -1 100
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... 0.061558 $13,012.99 13000 -1 100
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... 0.017911 $13,346.07 13100 1 0
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... 0.000616 $13,454.30 13200 1 0
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... -0.012666 $13,383.88 13300 1 0
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... -0.021320 $13,198.53 13400 1 0
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... -0.027503 $12,935.53 13500 -1 100
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... -0.032738 $12,612.05 13600 -1 100
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... -0.002079 $12,685.83 13700 -1 100
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... -0.024119 $12,479.86 13800 -1 100
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... -0.030451 $12,199.84 13900 -1 100
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... 0.065844 $13,103.12 14000 -1 100
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... -0.011952 $13,046.51 14100 1 0
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... -0.050548 $12,487.03 14200 1 0
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... -0.057941 $11,863.52 14300 -1 100
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... 0.064465 $12,728.31 14400 -1 100
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... -0.022090 $12,547.14 14500 -1 100
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... 0.019358 $12,890.03 14600 -1 100
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... -0.009289 $12,870.29 14700 -1 100
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... 0.025489 $13,298.35 14800 -1 100
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... 0.042573 $13,964.51 14900 1 0
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... 0.003607 $14,114.88 15000 1 0
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... 0.032558 $14,674.44 15100 1 0
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... 0.000839 $14,786.75 15200 1 0
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... -0.012900 $14,696.00 15300 1 0
What it should look like:
Index timestamp Open High Low ... account bal invested ST_10_1.0 if trend trend bal
0 0 8/16/2021 4382.439941 4444.350098 4367.729980 ... $10,000.00 10000 1 0 $10,000.00
1 1 8/23/2021 4450.290039 4513.330078 4450.290039 ... $10,252.42 10100 1 0 $10,252.42
2 2 8/30/2021 4513.759766 4545.850098 4513.759766 ... $10,411.67 10200 1 0 $10,411.67
3 3 9/6/2021 4535.379883 4535.379883 4457.660156 ... $10,335.25 10300 1 0 $10,335.25
4 4 9/13/2021 4474.810059 4492.990234 4427.759766 ... $10,375.93 10400 1 0 $10,375.93
5 5 9/20/2021 4402.950195 4465.399902 4305.910156 ... $10,528.57 10500 1 0 $10,528.57
6 6 9/27/2021 4442.120117 4457.299805 4288.520020 ... $10,395.95 10600 1 0 $10,395.95
7 7 10/4/2021 4348.839844 4429.970215 4278.939941 ... $10,577.79 10700 1 0 $10,577.79
8 8 10/11/2021 4385.439941 4475.819824 4329.919922 ... $10,870.57 10800 1 0 $10,870.57
9 9 10/18/2021 4463.720215 4559.669922 4447.470215 ... $11,149.33 10900 1 0 $11,149.33
10 10 10/25/2021 4553.689941 4608.080078 4537.359863 ... $11,397.70 11000 1 0 $11,397.70
11 11 11/1/2021 4610.620117 4718.500000 4595.060059 ... $11,725.75 11100 1 0 $11,725.75
12 12 11/8/2021 4701.479980 4714.919922 4630.859863 ... $11,789.11 11200 1 0 $11,789.11
13 13 11/15/2021 4689.299805 4717.750000 4672.779785 ... $11,927.15 11300 1 0 $11,927.15
14 14 11/22/2021 4712.000000 4743.830078 4585.430176 ... $11,764.79 11400 1 0 $11,764.79
15 15 11/29/2021 4628.750000 4672.950195 4495.120117 ... $11,720.92 11500 -1 100 $11,720.92
16 16 12/6/2021 4548.370117 4713.569824 4540.509766 ... $12,269.23 11600 -1 100 $11,820.92
17 17 12/13/2021 4710.299805 4731.990234 4600.220215 ... $12,131.29 11700 1 0 $11,920.92
18 18 12/20/2021 4587.899902 4740.740234 4531.100098 ... $12,507.36 11800 1 0 $12,292.19
19 19 12/27/2021 4733.990234 4808.930176 4733.990234 ... $12,714.25 11900 1 0 $12,497.25
20 20 1/3/2022 4778.140137 4818.620117 4662.740234 ... $12,576.44 12000 1 0 $12,363.49
21 21 1/10/2022 4655.339844 4748.830078 4582.240234 ... $12,638.31 12100 1 0 $12,426.01
22 22 1/17/2022 4632.240234 4632.240234 4395.339844 ... $12,020.29 12200 1 0 $11,820.05
23 23 1/24/2022 4356.319824 4453.229980 4222.620117 ... $12,212.97 12300 -1 100 $12,011.19
24 24 1/31/2022 4431.790039 4595.310059 4414.020020 ... $12,502.23 12400 -1 100 $12,111.19
25 25 2/7/2022 4505.750000 4590.029785 4401.410156 ... $12,374.75 12500 1 0 $12,211.19
26 26 2/14/2022 4412.609863 4489.549805 4327.220215 ... $12,279.35 12600 1 0 $12,118.38
27 27 2/21/2022 4332.740234 4385.339844 4114.649902 ... $12,480.38 12700 1 0 $12,318.08
28 28 2/28/2022 4354.169922 4416.779785 4279.540039 ... $12,421.61 12800 1 0 $12,261.37
29 29 3/7/2022 4327.009766 4327.009766 4157.870117 ... $12,164.19 12900 -1 100 $12,008.56
30 30 3/14/2022 4202.750000 4465.399902 4161.720215 ... $13,012.99 13000 -1 100 $12,108.56
31 31 3/21/2022 4462.399902 4546.029785 4424.299805 ... $13,346.07 13100 1 0 $12,208.56
32 32 3/28/2022 4541.089844 4637.299805 4507.569824 ... $13,454.30 13200 1 0 $12,316.09
33 33 4/4/2022 4547.970215 4593.450195 4450.040039 ... $13,383.88 13300 1 0 $12,260.08
34 34 4/11/2022 4462.640137 4471.000000 4381.339844 ... $13,198.53 13400 1 0 $12,098.70
35 35 4/18/2022 4385.629883 4512.939941 4267.620117 ... $12,935.53 13500 -1 100 $11,865.95
36 36 4/25/2022 4255.339844 4308.450195 4124.279785 ... $12,612.05 13600 -1 100 $11,965.95
37 37 5/2/2022 4130.609863 4307.660156 4062.510010 ... $12,685.83 13700 -1 100 $12,065.95
38 38 5/9/2022 4081.270020 4081.270020 3858.870117 ... $12,479.86 13800 -1 100 $12,165.95
39 39 5/16/2022 4013.020020 4090.719971 3810.320068 ... $12,199.84 13900 -1 100 $12,265.95
40 40 5/23/2022 3919.419922 4158.490234 3875.129883 ... $13,103.12 14000 -1 100 $12,365.95
41 41 5/30/2022 4151.089844 4177.509766 4073.850098 ... $13,046.51 14100 1 0 $12,465.95
42 42 6/6/2022 4134.720215 4168.779785 3900.159912 ... $12,487.03 14200 1 0 $11,935.81
43 43 6/13/2022 3838.149902 3838.149902 3636.870117 ... $11,863.52 14300 -1 100 $11,344.24
44 44 6/20/2022 3715.310059 3913.649902 3715.310059 ... $12,728.31 14400 -1 100 $11,444.24
45 45 6/27/2022 3920.760010 3945.860107 3738.669922 ... $12,547.14 14500 -1 100 $11,544.24
46 46 7/4/2022 3792.610107 3918.500000 3742.060059 ... $12,890.03 14600 -1 100 $11,644.24
47 47 7/11/2022 3880.939941 3880.939941 3721.560059 ... $12,870.29 14700 -1 100 $11,744.24
48 48 7/18/2022 3883.790039 4012.439941 3818.629883 ... $13,298.35 14800 -1 100 $11,844.24
49 49 7/25/2022 3965.719971 4140.149902 3910.739990 ... $13,964.51 14900 1 0 $11,944.24
50 50 8/1/2022 4112.379883 4167.660156 4079.810059 ... $14,114.88 15000 1 0 $12,087.33
51 51 8/8/2022 4155.930176 4280.470215 4112.089844 ... $14,674.44 15100 1 0 $12,580.87
52 52 8/15/2022 4269.370117 4325.279785 4253.080078 ... $14,786.75 15200 1 0 $12,691.42
53 53 8/19/2022 4266.310059 4266.310059 4218.700195 ... $14,696.00 15300 1 0 $12,627.70
Python Code:
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
import ccxt
from matplotlib import pyplot as plt
import config
import schedule
import pandas as pd
import pandas_ta as ta
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import numpy as np
from datetime import datetime
import time
import yfinance as yf
ticker = yf.Ticker('^GSPC')
df = ticker.history(period="1y", interval="1wk")
df.reset_index(inplace=True)
df.rename(columns = {'Date':'timestamp'}, inplace = True)
#df.drop(columns ={'Open', 'High', 'Low', 'Volume'}, inplace=True, axis=1)
df.drop(columns ={'Dividends', 'Stock Splits'}, inplace=True, axis=1)
# df['Close'].ffill(axis = 0, inplace = True)
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'if trend'] = 0
df.loc[df[f'ST_{ATR}_{Mult}'] == -1, 'if trend'] = weekly
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
# np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
# df.loc[df.index == 0, 'trend bal'] = invest
# for i in range(1, len(df)):
# if df[f'ST_{ATR}_{Mult}'] == 1:
# df.loc[i, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
# else:
# df.loc[i, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
# for i in range(1, len(df)):
# df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
# df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
#df.to_csv('GSPC.csv',index=False,mode='a')
# plt.plot(df['timestamp'], df['account bal'])
# plt.plot(df['timestamp'], df['invested'])
# plt.plot(df['timestamp'], df['close'])
# plt.show()
print(df)
What some errors looks like:
np.where(df.loc[df[f'ST_{ATR}_{Mult}'] == 1, 'trend bal'], (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly, df.loc[i-i, 'trend bal'] + df['if trend'])
File "<__array_function__ internals>", line 180, in where
ValueError: operands could not be broadcast together with shapes (36,) () (54,)
Another error:
line 1535, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
No error but not the correct amounts:
df['trend bal'] = 0
for i in range(1, len(df)):
df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == 1, 'trend bal'] = (df.loc[i-1, 'trend bal'] * (1 + df.loc[i, 'rate'])) + weekly
df.loc[df[f'ST_{ATR}_{Mult}'].shift(1) == -1, 'trend bal'] = df.loc[i-i, 'trend bal'] + df['if trend']
See photo of screenshot of excel formula:
excel spreadsheet
*** Made correct calculations thanks to Ingwersen_erik:
from re import X
import pandas as pd
import pandas_ta as ta
import numpy as np
pd.set_option('display.max_rows', None)
df = pd.read_csv('etcusd.csv')
invest = 10000
weekly = 100
fee = .15/100
fees = 1-fee
df.loc[df.index == 0, 'rate'] = 1
df.loc[df.index > 0, 'rate'] = (df['Close'] / df['Close'].shift(1))-1
df.loc[df.index == 0, 'account bal'] = invest
for i in range(1, len(df)):
df.loc[i, 'account bal'] = (df.loc[i-1, 'account bal'] * (1 + df.loc[i, 'rate'])) + weekly
df['invested'] = (df.index*weekly)+invest
MDD = ((df['account bal']-df['account bal'].max()) / df['account bal'].max()).min()
#Supertrend
ATR = 10
Mult = 1.0
ST = ta.supertrend(df['High'], df['Low'], df['Close'], ATR, Mult)
df[f'ST_{ATR}_{Mult}'] = ST[f'SUPERTd_{ATR}_{Mult}']
df[f'ST_{ATR}_{Mult}'] = df[f'ST_{ATR}_{Mult}'].shift(1).fillna(1)
df.loc[df.index == 0, "trend bal"] = invest
for index, row in df.iloc[1:].iterrows():
row['trend bal'] = np.where(
df.loc[index - 1, f'ST_{ATR}_{Mult}'] == 1,
(df.loc[index - 1, 'trend bal'] * (1 + row['rate'])) + weekly,
df.loc[index - 1, 'trend bal'] + weekly,
)
df.loc[df.index == index, 'trend bal'] = row['trend bal']
print(df)
Does this solve your problem?
import time
import ccxt
import warnings
import pandas as pd
import pandas_ta as ta
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from ctypes.wintypes import VARIANT_BOOL
from xml.dom.expatbuilder import FilterVisibilityController
warnings.filterwarnings("ignore")
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
invest = 10_000
weekly = 100
fee = 0.15 / 100
fees = 1 - fee
ATR = 10
Mult = 1.0
ticker = yf.Ticker("^GSPC")
df = (
ticker.history(period="1y", interval="1wk")
.reset_index()
.rename(columns={"Date": "timestamp"})
.drop(columns={"Dividends", "Stock Splits"}, errors="ignore")
)
df.loc[df.index == 0, "rate"] = 1
df.loc[df.index > 0, "rate"] = (df["Close"] / df["Close"].shift(1)) - 1
df.loc[df.index == 0, "account bal"] = invest
df.loc[df.index == 0, "account bal"] = invest
for i in range(1, len(df)):
df.loc[i, "account bal"] = (
df.loc[i - 1, "account bal"] * (1 + df.loc[i, "rate"])
) + weekly
df["invested"] = (df.index * weekly) + invest
# Super-trend
ST = ta.supertrend(df["High"], df["Low"], df["Close"], ATR, Mult)
df[f"ST_{ATR}_{Mult}"] = ST[f"SUPERTd_{ATR}_{Mult}"]
df[f"ST_{ATR}_{Mult}"] = df[f"ST_{ATR}_{Mult}"].shift(1).fillna(1)
df.loc[df[f"ST_{ATR}_{Mult}"] == 1, "if trend"] = 0
df.loc[df[f"ST_{ATR}_{Mult}"] == -1, "if trend"] = weekly
df.loc[df.index == 0, "trend bal"] = invest
# === Potential correction to the np.where ==============================
for index, row in df.iloc[1:].iterrows():
row["trend bal"] = np.where(
row[f"ST_{ATR}_{Mult}"] == 1,
(df.loc[index - 1, "trend bal"] * (1 + row["rate"])) + weekly,
df.loc[index - 1, "trend bal"] + row["if trend"],
)
# NOTE: The original "otherwise" clause from `np.where` had the
# following value: `df.loc[index - index, "trend bal"] + ...`
# I assumed you meant `index -1`, instead of `index - index`,
# therefore the above code uses `index -1`. If you really meant
# `index - index`, please change the code accordingly.
df.loc[df.index == index, "trend bal"] = row["trend bal"]
df
Result:
timestamp
Open
High
Low
Close
Volume
rate
account bal
invested
ST_10_1.0
if trend
trend bal
2021-08-16
4382.44
4444.35
4367.73
4441.67
5988610000
1
10000
10000
1
0
10000
2021-08-23
4450.29
4513.33
4450.29
4509.37
14124930000
0.0152421
10252.4
10100
1
0
10252.4
2021-08-30
4513.76
4545.85
4513.76
4535.43
14256180000
0.00577909
10411.7
10200
1
0
10411.7
2021-09-06
4535.38
4535.38
4457.66
4458.58
11793790000
-0.0169444
10335.3
10300
1
0
10335.3
2021-09-13
4474.81
4492.99
4427.76
4432.99
17763120000
-0.00573946
10375.9
10400
1
0
10375.9
2021-09-20
4402.95
4465.4
4305.91
4455.48
15697030000
0.00507327
10528.6
10500
1
0
10528.6
2021-09-27
4442.12
4457.3
4288.52
4357.04
15555390000
-0.0220941
10396
10600
1
0
10396
2021-10-04
4348.84
4429.97
4278.94
4391.34
14795520000
0.00787227
10577.8
10700
1
0
10577.8
2021-10-11
4385.44
4475.82
4329.92
4471.37
13758090000
0.0182246
10870.6
10800
1
0
10870.6
2021-10-18
4463.72
4559.67
4447.47
4544.9
13966070000
0.0164446
11149.3
10900
1
0
11149.3
2021-10-25
4553.69
4608.08
4537.36
4605.38
16206040000
0.0133072
11397.7
11000
1
0
11397.7
2021-11-01
4610.62
4718.5
4595.06
4697.53
16397220000
0.0200092
11725.8
11100
1
0
11725.8
2021-11-08
4701.48
4714.92
4630.86
4682.85
15646510000
-0.00312498
11789.1
11200
1
0
11789.1
2021-11-15
4689.3
4717.75
4672.78
4697.96
15279660000
0.00322664
11927.2
11300
1
0
11927.2
2021-11-22
4712
4743.83
4585.43
4594.62
11775840000
-0.0219967
11764.8
11400
1
0
11764.8
2021-11-29
4628.75
4672.95
4495.12
4538.43
20242840000
-0.0122295
11720.9
11500
-1
100
11864.8
2021-12-06
4548.37
4713.57
4540.51
4712.02
15411530000
0.0382489
12269.2
11600
-1
100
11964.8
2021-12-13
4710.3
4731.99
4600.22
4620.64
19184960000
-0.0193929
12131.3
11700
1
0
11832.8
2021-12-20
4587.9
4740.74
4531.1
4725.79
10594350000
0.0227566
12507.4
11800
1
0
12202
2021-12-27
4733.99
4808.93
4733.99
4766.18
11687720000
0.00854675
12714.3
11900
1
0
12406.3
2022-01-03
4778.14
4818.62
4662.74
4677.03
16800900000
-0.0187048
12576.4
12000
1
0
12274.3
2022-01-10
4655.34
4748.83
4582.24
4662.85
17126800000
-0.00303177
12638.3
12100
1
0
12337.1
2022-01-17
4632.24
4632.24
4395.34
4397.94
14131200000
-0.0568129
12020.3
12200
1
0
11736.1
2022-01-24
4356.32
4453.23
4222.62
4431.85
21218590000
0.00771046
12213
12300
-1
100
11836.1
2022-01-31
4431.79
4595.31
4414.02
4500.53
18846100000
0.0154968
12502.2
12400
-1
100
11936.1
2022-02-07
4505.75
4590.03
4401.41
4418.64
19119200000
-0.0181956
12374.7
12500
1
0
11819
2022-02-14
4412.61
4489.55
4327.22
4348.87
17775970000
-0.0157899
12279.4
12600
1
0
11732.3
2022-02-21
4332.74
4385.34
4114.65
4384.65
16834460000
0.00822737
12480.4
12700
1
0
11928.9
2022-02-28
4354.17
4416.78
4279.54
4328.87
22302830000
-0.0127216
12421.6
12800
1
0
11877.1
2022-03-07
4327.01
4327.01
4157.87
4204.31
23849630000
-0.0287743
12164.2
12900
-1
100
11977.1
2022-03-14
4202.75
4465.4
4161.72
4463.12
24946690000
0.0615583
13013
13000
-1
100
12077.1
2022-03-21
4462.4
4546.03
4424.3
4543.06
19089240000
0.0179112
13346.1
13100
1
0
12393.4
2022-03-28
4541.09
4637.3
4507.57
4545.86
19212230000
0.000616282
13454.3
13200
1
0
12501.1
2022-04-04
4547.97
4593.45
4450.04
4488.28
19383860000
-0.0126665
13383.9
13300
1
0
12442.7
2022-04-11
4462.64
4471
4381.34
4392.59
13812410000
-0.02132
13198.5
13400
1
0
12277.4
2022-04-18
4385.63
4512.94
4267.62
4271.78
18149540000
-0.0275032
12935.5
13500
-1
100
12377.4
2022-04-25
4255.34
4308.45
4124.28
4131.93
19610750000
-0.032738
12612
13600
-1
100
12477.4
2022-05-02
4130.61
4307.66
4062.51
4123.34
21039720000
-0.00207901
12685.8
13700
-1
100
12577.4
2022-05-09
4081.27
4081.27
3858.87
4023.89
23166570000
-0.0241188
12479.9
13800
-1
100
12677.4
2022-05-16
4013.02
4090.72
3810.32
3901.36
20590520000
-0.0304506
12199.8
13900
-1
100
12777.4
2022-05-23
3919.42
4158.49
3875.13
4158.24
19139100000
0.0658437
13103.1
14000
-1
100
12877.4
2022-05-30
4151.09
4177.51
4073.85
4108.54
16049940000
-0.0119522
13046.5
14100
1
0
12823.5
2022-06-06
4134.72
4168.78
3900.16
3900.86
17547150000
-0.0505484
12487
14200
1
0
12275.3
2022-06-13
3838.15
3838.15
3636.87
3674.84
24639140000
-0.0579411
11863.5
14300
-1
100
12375.3
2022-06-20
3715.31
3913.65
3715.31
3911.74
19287840000
0.0644654
12728.3
14400
-1
100
12475.3
2022-06-27
3920.76
3945.86
3738.67
3825.33
17735450000
-0.0220899
12547.1
14500
-1
100
12575.3
2022-07-04
3792.61
3918.5
3742.06
3899.38
14223350000
0.0193578
12890
14600
-1
100
12675.3
2022-07-11
3880.94
3880.94
3721.56
3863.16
16313500000
-0.00928865
12870.3
14700
-1
100
12775.3
2022-07-18
3883.79
4012.44
3818.63
3961.63
16859220000
0.0254895
13298.4
14800
-1
100
12875.3
2022-07-25
3965.72
4140.15
3910.74
4130.29
17356830000
0.0425734
13964.5
14900
1
0
13523.5
2022-08-01
4112.38
4167.66
4079.81
4145.19
18072230000
0.00360747
14114.9
15000
1
0
13672.3
2022-08-08
4155.93
4280.47
4112.09
4280.15
18117740000
0.0325582
14674.4
15100
1
0
14217.4
2022-08-15
4269.37
4325.28
4218.7
4228.48
16255850000
-0.012072
14597.3
15200
1
0
14145.8
2022-08-19
4266.31
4266.31
4218.7
4228.48
2045645000
0
14697.3
15300
1
0
14245.8

creating a dataframe and based on 2 dataframe sets that have different lengths

I have 2 dataframe sets , I want to create a third one. I am trying to to write a code that to do the following :
if A_pd["from"] and A_pd["To"] is within the range of B_pd["from"]and B_pd["To"] then add to the C_pd dateframe A_pd["from"] and A_pd["To"] and B_pd["Value"].
if the A_pd["from"] is within the range of B_pd["from"]and B_pd["To"] and A_pd["To"] within the range of B_pd["from"]and B_pd["To"] of teh next row , then i want to split the range A_pd["from"] and A_pd["To"] to 2 ranges (A_pd["from"] and B_pd["To"]) and ( B_pd["To"] and A_pd["To"] ) and the corresponded B_pd["Value"].
I created the following code:
import pandas as pd
A_pd = {'from':[0,20,80,180,250],
'To':[20, 50,120,210,300]}
A_pd=pd.DataFrame(A_pd)
B_pd = {'from':[0,20,100,200],
'To':[20, 100,200,300],
'Value':[20, 17,15,12]}
B_pd=pd.DataFrame(B_pd)
for i in range(len(A_pd)):
numberOfIntrupt=0
for j in range(len(B_pd)):
if A_pd["from"].values[i] >= B_pd["from"].values[j] and A_pd["from"].values[i] > B_pd["To"].values[j]:
numberOfIntrupt+=1
cols = ['C_from', 'C_To', 'C_value']
C_dp=pd.DataFrame(columns=cols, index=range(len(A_pd)+numberOfIntrupt))
for i in range(len(A_pd)):
for j in range(len(B_pd)):
a=A_pd ["from"].values[i]
b=A_pd["To"].values[i]
c_eval=B_pd["Value"].values[j]
range_s=B_pd["from"].values[j]
range_f=B_pd["To"].values[j]
if a >= range_s and a <= range_f and b >= range_s and b <= range_f :
C_dp['C_from'].loc[i]=a
C_dp['C_To'].loc[i]=b
C_dp['C_value'].loc[i]=c_eval
elif a >= range_s and b > range_f:
C_dp['C_from'].loc[i]=a
C_dp['C_To'].loc[i]=range_f
C_dp['C_value'].loc[i]=c_eval
C_dp['C_from'].loc[i+1]=range_f
C_dp['C_To'].loc[i+1]=b
C_dp['C_value'].loc[i+1]=B_pd["Value"].values[j+1]
print(C_dp)
The current result is C_dp:
C_from C_To C_value
0 0 20 20
1 20 50 17
2 80 100 17
3 180 200 15
4 250 300 12
5 200 300 12
6 NaN NaN NaN
7 NaN NaN NaN
the expected should be :
C_from C_To C_value
0 0 20 20
1 20 50 17
2 80 100 17
3 100 120 15
4 180 200 15
5 200 210 12
6 250 300 12
Thank you a lot for the support
I'm sure there is a better way to do this without loops, but this will help your logic flow.
import pandas as pd
A_pd = {'from':[0, 20, 80, 180, 250],
'To':[20, 50, 120, 210, 300]}
A_pd=pd.DataFrame(A_pd)
B_pd = {'from':[0, 20, 100, 200],
'To':[20, 100,200, 300],
'Value':[20, 17, 15, 12]}
B_pd=pd.DataFrame(B_pd)
cols = ['C_from', 'C_To', 'C_value']
C_dp=pd.DataFrame(columns=cols)
spillover = False
for i in range(len(A_pd)):
for j in range(len(B_pd)):
a_from = A_pd["from"].values[i]
a_to = A_pd["To"].values[i]
b_from = B_pd["from"].values[j]
b_to = B_pd["To"].values[j]
b_value = B_pd['Value'].values[j]
if (a_from >= b_to):
# a_from outside b range
continue # next b
elif (a_from >= b_from):
# a_from within b range
if a_to <= b_to:
C_dp = C_dp.append({"C_from": a_from, "C_To": a_to, "C_value": b_value}, ignore_index=True)
break # next a
else:
C_dp = C_dp.append({"C_from": a_from, "C_To": b_to, "C_value": b_value}, ignore_index=True)
if j < len(B_pd):
spillover = True
continue
if spillover:
if a_to <= b_to:
C_dp = C_dp.append({"C_from": b_from, "C_To": a_to, "C_value": b_value}, ignore_index=True)
spillover = False
break
else:
C_dp = C_dp.append({"C_from": b_from, "C_To": b_to, "C_value": b_value}, ignore_index=True)
spillover = True
continue
print(C_dp)
Output
C_from C_To C_value
0 0 20 20
1 20 50 17
2 80 100 17
3 100 120 15
4 180 200 15
5 200 210 12
6 250 300 12

subtract two columns in a data frame if they have the same ending in a loop

If my data looks like this
Index Country ted_Val1 sam_Val1 ... ted_Val10 sam_Val10
1 Australia 1 3 ... 20 5
2 Bambua 12 33 ... 15 56
3 Tambua 14 34 ... 10 58
df = pd.DataFrame([["Australia", 1, 3, 20, 5],
["Bambua", 12, 33, 15, 56],
["Tambua", 14, 34, 10, 58]
], columns=["Country", "ted_Val1", "sam_Val1", "ted_Val10", "sam_Val10"]
)
I'd like to subtract all 'val_' columns from all 'ted_' values using a list, creating a new column starting with 'dif_' such that:
Index Country ted_Val1 sam_Val1 diff_Val1 ... ted_Val10 sam_Val10 diff_val10
1 Australia 1 3 -2 ... 20 5 -15
2 Bambua 12 33 12 ... 15 56 -41
3 Tambua 14 34 14... 10 58 -48
so far I've got:
calc_vars = ['ted_Val1',
'sam_Val1',
'ted_Val10',
'sam_Val10']
for i in calc_vars:
df_diff['dif_' + str(i)] = df.['ted_' + str(i)] - df.['sam_' + str(i)]
but I'm getting errors, not sure where to go from here. As a warning this is dummy data and there can be several underscores in the names
IIUC you can use filter to choose the columns for subtraction (assuming your columns are properly sorted like your sample):
print (pd.concat([df, pd.DataFrame(df.filter(like="ted").to_numpy()-df.filter(like="sam").to_numpy(),
columns=["diff"+i.split("_")[-1] for i in df.columns if "ted_Val" in i])],1))
Country ted_Val1 sam_Val1 ted_Val10 sam_Val10 diff1 diff10
0 Australia 1 3 20 5 -2 15
1 Bambua 12 33 15 56 -21 -41
2 Tambua 14 34 10 58 -20 -48
try this,
calc_vars = ['ted_Val1', 'sam_Val1', 'ted_Val10', 'sam_Val10']
# extract even & odd values from calc_vars
# ['ted_Val1', 'ted_Val10'], ['sam_Val1', 'sam_Val10']
for ted, sam in zip(calc_vars[::2], calc_vars[1::2]):
df['diff_' + ted.split("_")[-1]] = df[ted] - df[sam]
Edit: if columns are not sorted,
ted_cols = sorted(df.filter(regex="ted_Val\d+"), key=lambda x : x.split("_")[-1])
sam_cols = sorted(df.filter(regex="sam_Val\d+"), key=lambda x : x.split("_")[-1])
for ted, sam in zip(ted_cols, sam_cols):
df['diff_' + ted.split("_")[-1]] = df[ted] - df[sam]
Country ted_Val1 sam_Val1 ted_Val10 sam_Val10 diff_Val1 diff_Val10
0 Australia 1 3 20 5 -2 15
1 Bambua 12 33 15 56 -21 -41
2 Tambua 14 34 10 58 -20 -48

Python : Print function is not giving excepted output

I have written below function in Python:
df = pd.DataFrame({'age': [32, 33, 33,34,44]})
def PROC_FREQ(dataset,arg1):
x= dataset.groupby(arg1)[arg1[0]].agg(({'Frequency':'count'}))
nombre=x.columns.tolist()[0]
x.rename(columns={nombre:'Freq'},inplace=True)
x['Pct']=round((x['Freq']/x.Freq.sum())*100,2)
x['Freq Acum'],x['Cumm Percent']=x.Freq.cumsum(),x.Pct.cumsum()
x.sort_values(arg1,ascending=[1],inplace=True)
pd.set_option('display.max_columns',500)
x=x.reset_index()
string_repr = x.to_string(index=False,justify='center').splitlines()
string_repr.insert(1, "-" * len(string_repr[0]))
out = '\n'.join(string_repr)
df_split = out.split('\n')
columns = shutil.get_terminal_size().columns
for i in range(len(df_split)):
print(df_split[i].center(columns))
and below is the code to call the function:
PROC_FREQ(df,['age'])
and below is the output of the function:
age Freq Pct Freq Acum Cumm Percent
-----------------------------------------
32 1 16.67 1 16.67
33 2 33.33 3 50.00
34 1 16.67 4 66.67
44 2 33.33 6 100.00
Last line the output is not aligned correctly.

Numpy Finding Matching number with Array

Any help is greatly appreciated!! I have been trying to solve this for the last few days....
I have two arrays:
import pandas as pd
OldDataSet = {
'id': [20,30,40,50,60,70]
,'OdoLength': [26.12,43.12,46.81,56.23,111.07,166.38]}
NewDataSet = {
'id': [3000,4000,5000,6000,7000,8000]
,'OdoLength': [25.03,42.12,45.74,46,110.05,165.41]}
df1= pd.DataFrame(OldDataSet)
df2 = pd.DataFrame(NewDataSet)
OldDataSetArray = df1.as_matrix()
NewDataSetArray = df2.as_matrix()
The result that I am trying to get is:
Array 1 and Array 2 Match by closes difference, based on left over number from Array2
20 26.12 3000 25.03
30 43.12 4000 42.12
40 46.81 6000 46
50 56.23 7000 110.05
60 111.07 8000 165.41
70 166.38 0 0
Starting at Array 1, ID 20, find the nearest which in this case would be the first Number in Array 2 ID 3000 (26.12-25.03). so ID 20, gets matched to 3000.
Where it gets tricky is if one value in Array 2 is not the closest, then it is skipped. for example, ID 40 value 46.81 is compared to 45.74, 46 and the smallest value is .81 from 46 ID 6000. So ID 40--> ID 6000. ID 5000 in array 2 is now skipped for any future comparisons. So now when comparing array 1 ID 50, it is compared to the next available number in array 2, 110.05. array 1 ID 50 is matched to Array 2 ID 7000.
UPDATE
so here's the code that i have tried and it works. Yes, it is not the greatest, so if someone has another suggestion please let me know.
import pandas as pd
import operator
OldDataSet = {
'id': [20,30,40,50,60,70]
,'OdoLength': [26.12,43.12,46.81,56.23,111.07,166.38]}
NewDataSet = {
'id': [3000,4000,5000,6000,7000,8000]
,'OdoLength': [25.03,42.12,45.74,46,110.05,165.41]}
df1= pd.DataFrame(OldDataSet)
df2 = pd.DataFrame(NewDataSet)
OldDataSetArray = df1.as_matrix()
NewDataSetArray = df2.as_matrix()
newPos = 1
CurrentNumber = 0
OldArrayLen = len(OldDataSetArray) -1
NewArrayLen = len(NewDataSetArray) -1
numberResults = []
for oldPos in range(len(OldDataSetArray)):
PreviousNumber = abs(OldDataSetArray[oldPos, 0]- NewDataSetArray[oldPos, 0])
while newPos <= len(NewDataSetArray) - 1:
CurrentNumber = abs(OldDataSetArray[oldPos, 0] - NewDataSetArray[newPos, 0])
#if it is the last row for the inner array, then match the next available
#in Array 1 to that last record
if newPos == NewArrayLen and oldPos < newPos and oldPos +1 <= OldArrayLen:
numberResults.append([OldDataSetArray[oldPos +1, 1],NewDataSetArray[newPos, 1],OldDataSetArray[oldPos +1, 0],NewDataSetArray[newPos, 0]])
if PreviousNumber < CurrentNumber:
numberResults.append([OldDataSetArray[oldPos, 1], NewDataSetArray[newPos - 1, 1], OldDataSetArray[oldPos, 0], NewDataSetArray[newPos - 1, 0]])
newPos +=1
break
elif PreviousNumber > CurrentNumber:
PreviousNumber = CurrentNumber
newPos +=1
#sort by array one values
numberResults = sorted(numberResults, key=operator.itemgetter(0))
numberResultsDf = pd.DataFrame(numberResults)
You can use NumPy broadcasting to build a distance matrix:
a = numpy.array([26.12, 43.12, 46.81, 56.23, 111.07, 166.38,])
b = numpy.array([25.03, 42.12, 45.74, 46, 110.05, 165.41,])
numpy.abs(a[:, None] - b[None, :])
# array([[ 1.09, 16. , 19.62, 19.88, 83.93, 139.29],
# [ 18.09, 1. , 2.62, 2.88, 66.93, 122.29],
# [ 21.78, 4.69, 1.07, 0.81, 63.24, 118.6 ],
# [ 31.2 , 14.11, 10.49, 10.23, 53.82, 109.18],
# [ 86.04, 68.95, 65.33, 65.07, 1.02, 54.34],
# [ 141.35, 124.26, 120.64, 120.38, 56.33, 0.97]])
of that matrix you can then find the closest elements using argmin, either row- or columnwise (depending of if you want to search in a or b).
numpy.argmin(numpy.abs(a[:, None] - b[None, :]), axis=1)
# array([0, 1, 3, 3, 4, 5])
Compute all the differences, and use `np.argmin to lookup the closest.
a,b=np.random.rand(2,10)
all_differences=np.abs(np.subtract.outer(a,b))
ia=all_differences.argmin(axis=1)
for i in range(10):
print(i,a[i],ia[i], b[ia[i]])
0 0.231603891949 8 0.21177584152
1 0.27810475456 7 0.302647382888
2 0.582133214953 2 0.548920922033
3 0.892858042793 1 0.872622982632
4 0.67293347218 6 0.677971552011
5 0.985227546492 1 0.872622982632
6 0.82431697833 5 0.83765895237
7 0.426992114791 4 0.451084369838
8 0.181147161752 8 0.21177584152
9 0.631139744522 3 0.653554586691
EDIT
with dataframes and indexes:
va,vb=np.random.rand(2,10)
na,nb=np.random.randint(0,100,(2,10))
dfa=pd.DataFrame({'id':na,'odo':va})
dfb=pd.DataFrame({'id':nb,'odo':vb})
all_differences=np.abs(np.subtract.outer(dfa.odo,dfb.odo))
ia=all_differences.argmin(axis=1)
dfc=dfa.merge(dfb.loc[ia].reset_index(drop=True),\
left_index=True,right_index=True)
Input :
In [337]: dfa
Out[337]:
id odo
0 72 0.426457
1 12 0.315997
2 96 0.623164
3 9 0.821498
4 72 0.071237
5 5 0.730634
6 45 0.963051
7 14 0.603289
8 5 0.401737
9 63 0.976644
In [338]: dfb
Out[338]:
id odo
0 95 0.333215
1 7 0.023957
2 61 0.021944
3 57 0.660894
4 22 0.666716
5 6 0.234920
6 83 0.642148
7 64 0.509589
8 98 0.660273
9 19 0.658639
Output :
In [339]: dfc
Out[339]:
id_x odo_x id_y odo_y
0 72 0.426457 64 0.509589
1 12 0.315997 95 0.333215
2 96 0.623164 83 0.642148
3 9 0.821498 22 0.666716
4 72 0.071237 7 0.023957
5 5 0.730634 22 0.666716
6 45 0.963051 22 0.666716
7 14 0.603289 83 0.642148
8 5 0.401737 95 0.333215
9 63 0.976644 22 0.666716

Categories