unable to loop through numpy arrays

I am really confused and can't seem to find an answer for my code below. I keep getting the following error:
File "C:\Users\antoniozeus\Desktop\backtester2.py", line 117, in backTest
if prices >= smas:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
As you will see in my code below, I am trying to compare two numpy arrays element by element and generate a signal once my condition is met. This is based on Apple stock data.
Stepping through one point at a time, starting at index 0, then 1, and so on: if the price is greater than or equal to the SMA (simple moving average), a signal is produced. Here is the code:
def backTest():
    #Trade Rules
    #Buy when prices are greater than our moving average
    #Sell when prices drop below our moving average
    portfolio = 50000
    tradeComm = 7.95
    stance = 'none'
    buyPrice = 0
    sellPrice = 0
    previousPrice = 0
    totalProfit = 0
    numberOfTrades = 0
    startPrice = 0
    startTime = 0
    endTime = 0
    totalInvestedTime = 0
    overallStartTime = 0
    overallEndTime = 0
    unixConvertToWeeks = 7*24*60*60
    unixConvertToDays = 24*60*60
    date, closep, highp, lowp, openp, volume = np.genfromtxt('AAPL2.txt', delimiter=',', unpack=True,
                                                              converters={ 0: mdates.strpdate2num('%Y%m%d')})
    ## FIRST SMA
    window = 10
    weights = np.repeat(1.0, window)/window
    '''valid makes sure that we only calculate from valid data, no MA on points 0:21'''
    smas = np.convolve(closep, weights, 'valid')
    prices = closep[9:]
    for price in prices:
        if stance == 'none':
            if prices >= smas:
                print "buy triggered"
                buyPrice = closep
                print "bought stock for", buyPrice
                stance = "holding"
                startTime = date
                print 'Enter Date:', startTime
                if numberOfTrades == 0:
                    startPrice = buyPrice
                    overallStartTime = date
                numberOfTrades += 1
        elif stance == 'holding':
            if prices < smas:
                print 'sell triggered'
                sellPrice = closep
                print 'finished trade, sold for:',sellPrice
                stance = 'none'
                tradeProfit = sellPrice - buyPrice
                totalProfit += tradeProfit
                print totalProfit
                print 'Exit Date:', endTime
                endTime = date
                timeInvested = endTime - startTime
                totalInvestedTime += timeInvested
                overallEndTime = endTime
                numberOfTrades += 1
        #this is our reset
        previousPrice = closep

You have numpy arrays -- smas is the output of np.convolve, which is an array, and I believe that prices is also an array. With numpy, arr > other_arr will return an ndarray, which doesn't have a well-defined truth value (hence the error).
You probably want to compare price with a single element from smas although I'm not sure which (or what np.convolve is going to return here -- It may only have a single element)...
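To make the error concrete, here is a small sketch with made-up arrays (a and b are illustrative names, not from the question). It shows that comparing two arrays gives a boolean array, that using this array directly in an if statement raises the ValueError from the question, and that .any() or .all() reduce it to a single truth value.

```python
import numpy as np

# Illustrative data only; not taken from the question's price file.
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])

# Element-wise comparison returns a boolean array, not a single bool.
mask = a >= b

# Using the array directly as a condition raises ValueError.
try:
    if mask:
        pass
except ValueError as exc:
    message = str(exc)

# Reduce to one truth value explicitly.
any_ge = bool(mask.any())  # True if at least one element satisfies a >= b
all_ge = bool(mask.all())  # True only if every element satisfies a >= b
```

Whether .any() or .all() is appropriate depends on the intent; for a step-by-step backtest, comparing one element at a time is usually what is wanted.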

I think you mean
if price >= smas
You have
if prices >= smas
which compares the whole list at once.
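Note that smas is itself an array, so each price probably needs to be paired with its own moving-average value. A minimal sketch of that idea follows, assuming prices and smas are aligned and of equal length (which holds when prices = closep[window - 1:] and the convolution uses 'valid' mode). The sample closing prices, the shorter window, and the signals list are made up for illustration.

```python
import numpy as np

# Made-up closing prices; the question reads these from AAPL2.txt.
closep = np.array([10.0, 11.0, 12.0, 9.0, 8.0, 13.0, 14.0])
window = 3  # the question uses 10; shortened here for the tiny sample

weights = np.repeat(1.0, window) / window
smas = np.convolve(closep, weights, 'valid')
prices = closep[window - 1:]  # aligned with smas, same length

stance = 'none'
signals = []  # hypothetical container for (action, price) pairs
for price, sma in zip(prices, smas):
    # price and sma are scalars here, so the comparison is unambiguous
    if stance == 'none' and price >= sma:
        signals.append(('buy', float(price)))
        stance = 'holding'
    elif stance == 'holding' and price < sma:
        signals.append(('sell', float(price)))
        stance = 'none'
```

Iterating with zip keeps the price and its moving average in step without manual index bookkeeping.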

Related

How to replace 'for' loop with more efficient code for stock market analysis example

I am working on a stock market analysis project. I am attempting to find the highest price in the past 5 days, the volume on the day of the highest price, and how many days before that highest price occurred.
I have constructed a solution utilizing a couple of 'for' loops, but would like to find a more efficient way to code this without utilizing 'for' loops. Any suggestions would be appreciated.
A1 = pd.merge(A, B, left_index = True, right_index = True)
A1["Date"] = A1.index
A1.reset_index(inplace = True)
### 5 Day High and Volume
Indexes = []
for index in range(len(A1.index) - 5):
    M = 0
    H = 0
    for i in range(1,6):
        if H < A1.iloc[i+index,2]:
            H = A1.iloc[i+index,2]
            M = i+index
    Indexes.append(M)
Vol = pd.DataFrame(columns = ['B','C'])
Vol5 = []
DH5 = []
Z = []
count = 0
for i in Indexes:
    Vol5.append(A1.iloc[i,1])
    DH5.append(A1.iloc[i,2])
    Z.append(count - i)
    count += 1
for i in range(5):
    Vol5.append(np.nan)
    DH5.append(np.nan)
    Z.append(np.nan)
Vol['B'] = Vol5
Vol.index = A1['Date']
Vol['C'] = DH5
Vol['D'] = Z

I suggest using the rolling method to find the index of the maximum value computed over the previous 5 rows:
import pandas as pd
import numpy as np
d={'date':np.random.random(10), 'open':np.random.random(10), 'high':np.random.random(10), 'low':np.random.random(10), 'close':np.random.random(10), 'volume':np.random.random(10)}
A1=pd.DataFrame(data=d)
df=A1.rolling(window=5).apply(np.argmax).shift(1).fillna(0)
Then to find the volume associated with this maximum value (in this example for the highest column):
A1['volume associated with maximum price']=A1.iloc[df.high]['volume']
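One caveat with a rolling argmax is that it returns a position inside each window, not a row position in the frame. The sketch below is one possible way to handle that, under the assumption that each window covers the current row and the four rows before it; the sample data and the column names high_5d, days_since_high and volume_at_high are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample; the question's frame comes from a merge of A and B.
df = pd.DataFrame({
    "high":   [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 16.0, 9.0],
    "volume": [100, 200, 300, 400, 500, 600, 700, 800],
})
window = 5

roll = df["high"].rolling(window)
df["high_5d"] = roll.max()

# Position of the maximum inside each window (0 = oldest row); NaN until
# the first full window is available.
rel_pos = roll.apply(np.argmax, raw=True)
df["days_since_high"] = (window - 1) - rel_pos

# Convert the window-relative position into an absolute row position.
valid = rel_pos.notna()
abs_pos = (rel_pos + np.arange(len(df)) - (window - 1))[valid].astype(int)

vol_at_high = np.full(len(df), np.nan)
vol_at_high[valid.to_numpy()] = df["volume"].to_numpy()[abs_pos.to_numpy()]
df["volume_at_high"] = vol_at_high
```

If the window should look forward instead (as the loops in the question appear to do), the same columns can be shifted afterwards.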

Trying to get a specific temperature range while also keeping in range of other two variables

min_desired =int(input("Min. Desired Temp.: "))
max_desired = int(input("Max. Desired Temp.: "))
def desired(min_desired,max_desired):
    holder= []
    count = 0
    total = 0
    with open('C:/Users/amaya/OneDrive/Desktop/Weather_final.txt','r') as weather_contents:
        weather = weather_contents.readlines()
        for lines in weather:
            #Use map to convert values on each line to float and to list
            column = list(map(float, lines.strip().split()))
            holder.append(column)
        print(holder)
    for x in holder:
        print(x)
        if x >= min_desired and x <= max_desired:
            if humidity < 70 and humidity > 40:
                if wind < 12:
                    count +=1
                    total += x
                    avg = (total/ count)
    print(count)
    print (avg)
print(desired(min_desired, max_desired))
I'm aware that 'humidity' and 'wind' are undefined and that what I've tried might be completely wrong. I'm stumped on how to get the first column, which would be 'Temp' that needs to be in a specific range.
ex. min temp = 60
max temp = 85
while taking into consideration 2 pre-set conditions
humidity must be between 70 and 40 & wind must be lower than 12
Thanks in advance for all the help!!

I would suggest using pandas. Not only will this make parsing your textfile much easier, but pandas dataframes have methods to help you select data based on any criteria you want. Using pandas, your code can be made much simpler:
import pandas as pd
weather = pd.read_csv('weather.txt', sep=" ", names=['Temperature', 'Humidity', 'Wind'])
To select data where wind < 12 and 40 < humidity < 70:
subset = weather.loc[(weather['Wind']<12) & (weather['Humidity']>40) & (weather['Humidity']<70)]
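For completeness, a sketch that also applies the temperature range and computes the count and average asked for in the question. It assumes a whitespace-separated file with three columns in the order temperature, humidity, wind; the inline sample data stands in for the real file, and the variable names are illustrative.

```python
import io
import pandas as pd

# Stand-in for the weather file: temperature, humidity, wind per line.
sample = io.StringIO("72 55 8\n90 50 5\n65 45 15\n61 60 10\n70 80 3\n")
weather = pd.read_csv(sample, sep=r"\s+",
                      names=["Temperature", "Humidity", "Wind"])

min_desired, max_desired = 60, 85

mask = (
    weather["Temperature"].between(min_desired, max_desired)  # inclusive bounds
    & (weather["Humidity"] > 40)
    & (weather["Humidity"] < 70)
    & (weather["Wind"] < 12)
)
subset = weather.loc[mask]

count = len(subset)
# Avoid a meaningless average when no row matches.
avg = subset["Temperature"].mean() if count else None
```

Each condition is wrapped in parentheses because & binds more tightly than the comparison operators.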

Normally I would use pandas because it would need simpler code and it has many other useful functions.
But here I will show how it could be done without pandas - but I don't have your data to test it.
I would use one for-loop and then I could directly convert column (or rather row) to variables
temp, humidity, wind = column
# --- functions ---
def desired(min_desired,max_desired):
    #data = []
    count = 0
    total = 0
    with open('C:/Users/amaya/OneDrive/Desktop/Weather_final.txt','r') as weather_contents:
        for line in weather_contents:
            row = list(map(float, line.strip().split()))
            #data.append(row)
            temp, humidity, wind = row
            if min_desired <= temp <= max_desired and 40 < humidity < 70 and wind < 12:
                count += 1
                total += temp
    print('count:', count)
    if count != 0:
        print('avg:', total/count) # don't divide by zero
# --- main ---
min_desired = int(input("Min. Desired Temp.: "))
max_desired = int(input("Max. Desired Temp.: "))
# without `print()` if you use `print()` inside function
desired(min_desired, max_desired)

Calculating monthly growth percentage from cumulative total growth

I am trying to calculate a constant for month-to-month growth rate from an annual growth rate (goal) in Python.
My question has arithmetic similarities to this question, but was not completely answered.
For example, if total annual sales for 2018 are $5,600,000.00 and I have an expected 30% increase for the next year, I would expect total annual sales for 2019 to be $7,280,000.00.
BV_2018 = 5600000.00
Annual_GR = 0.3
EV_2019 = (BV_2018 * Annual_GR) + BV_2018
I am using the last month of 2018 to forecast the first month of 2019
Last_Month_2018 = 522000.00
Month_01_2019 = (Last_Month_2018 * CONSTANT) + Last_Month_2018
For the second month of 2019 I would use
Month_02_2019 = (Month_01_2019 * CONSTANT) + Month_01_2019
...and so on and so forth
The cumulative sum of Month_01_2019 through Month_12_2019 needs to be equal to EV_2019.
Does anyone know how to go about calculating the constant in Python? I am familiar with the np.cumsum function, so that part is not an issue. My problem is I cannot solve for the constant I need.
Thank you in advance and please do not hesitate to ask for further clarification.
More clarification:
# get beginning value (BV)
BV = 522000.00
# get desired end value (EV)
EV = 7280000.00
We are trying to get from BV to EV (which is a cumulative sum) by calculating the cumulative sum of the [12] monthly totals. Each monthly total will have a % increase from the previous month that is constant across months. It is this % increase that I want to solve for.
Keep in mind, BV is the last month of the previous year. It is from BV that our forecast (i.e., Months 1 through 12) will be calculated. So, I'm thinking that it makes sense to go from BV to the EV plus the BV. Then, just remove BV and its value from the list, giving us EV as the cumulative total of Months 1 through 12.
I imagine using this constant in a function like this:
def supplier_forecast_calculator(sales_at_cost_prior_year, sales_at_cost_prior_month, year_pct_growth_expected):
    """
    Calculates monthly supplier forecast
    Example:
    monthly_forecast = supplier_forecast_calculator(sales_at_cost_prior_year = 5600000,
                                                    sales_at_cost_prior_month = 522000,
                                                    year_pct_growth_expected = 0.30)
    monthly_forecast.all_metrics
    """
    # get monthly growth rate
    monthly_growth_expected = CONSTANT
    # get first month sales at cost
    month1_sales_at_cost = (sales_at_cost_prior_month*monthly_growth_expected)+sales_at_cost_prior_month
    # instantiate lists
    month_list = ['Month 1'] # for months
    sales_at_cost_list = [month1_sales_at_cost] # for sales at cost
    # start loop
    for i in list(range(2,13)):
        # Append month to list
        month_list.append(str('Month ') + str(i))
        # get sales at cost and append to list
        month1_sales_at_cost = (month1_sales_at_cost*monthly_growth_expected)+month1_sales_at_cost
        # append month1_sales_at_cost to sales at cost list
        sales_at_cost_list.append(month1_sales_at_cost)
    # add total to the end of month_list
    month_list.insert(len(month_list), 'Total')
    # add the total to the end of sales_at_cost_list
    sales_at_cost_list.insert(len(sales_at_cost_list), np.sum(sales_at_cost_list))
    # put the metrics into a df
    all_metrics = pd.DataFrame({'Month': month_list,
                                'Sales at Cost': sales_at_cost_list}).round(2)
    # return the df
    return all_metrics

Let r = 1 + monthly_rate. Then the problem we are trying to solve is r + ... + r**12 = EV/BV, because month k of the forecast equals BV * r**k and the twelve months must sum to EV. We can use numpy to get the numeric solution, which should be relatively fast in practice. We solve the polynomial r + ... + r**12 - EV/BV = 0 and recover the monthly rate from r. There will be twelve complex roots, but only one of them is real and positive, and that is the one we want.
import numpy as np
# get beginning value (BV)
BV = 522000.00
# get desired end value (EV)
EV = 7280000.00
def get_monthly(BV, EV):
    coefs = np.ones(13)
    coefs[-1] -= EV / BV + 1
    # there will be a unique positive real root
    roots = np.roots(coefs)
    return roots[(roots.imag == 0) & (roots.real > 0)][0].real - 1
rate = get_monthly(BV, EV)
print(rate)
# 0.022913299846925694
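As a cross-check that avoids root selection entirely, the same equation can be solved by bisection, since the sum r + ... + r**12 grows monotonically with the rate for r > 0. This is a sketch using a different technique from the np.roots approach above; the function name monthly_rate_bisect and its bracket values are assumptions made for illustration, and it assumes EV/BV is larger than the sum at the lower bracket (about 0.01).

```python
def monthly_rate_bisect(bv, ev, periods=12, tol=1e-12):
    """Find the constant monthly rate so that the forecast months sum to ev."""
    target = ev / bv

    def total(rate):
        # Sum of r**1 + ... + r**periods with r = 1 + rate.
        r = 1.0 + rate
        return sum(r ** k for k in range(1, periods + 1))

    lo, hi = -0.99, 1.0          # assumed starting bracket
    while total(hi) < target:    # widen the bracket if the target is very large
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


BV = 522000.00
EV = 7280000.00
rate = monthly_rate_bisect(BV, EV)

# Rebuild the twelve monthly values and confirm they add up to EV.
forecast = [BV * (1.0 + rate) ** k for k in range(1, 13)]
```

Summing forecast should reproduce EV up to rounding, which is a quick way to validate any candidate rate regardless of how it was computed.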
Some comments:
roots.imag == 0 may be problematic in some cases since roots uses a numeric algorithm. As an alternative, we can pick a root with the least imaginary part (in absolute value) among all roots with a positive real part.
We can use the same method to get rates for other time intervals. For example, for weekly rates, we can replace 13 == 12 + 1 with 52 + 1.
The above polynomial has a solution by radicals, as outlined here.
Update on performance. We could also frame this as a fixed-point problem, i.e. look for a fixed point of the function
x = EV/BV * x ** (1/periods) - EV/BV + 1
The fixed point x will be equal to (1 + rate)**periods.
The following pure-Python implementation is roughly four times faster than the above numpy version on my machine.
TOLERANCE = 1e-10  # assumed value; the original post does not define it
def get_monthly_fix(BV, EV, periods=12):
    ratio = EV / BV
    r = guess = ratio
    while True:
        r = ratio * r ** (1 / periods) - ratio + 1
        if abs(r - guess) < TOLERANCE:
            return r ** (1 / periods) - 1
        guess = r
We can make this run even faster with the help of numba.jit.

I am not sure if this works (tell me if it doesn't) but try this.
def get_value(start, end, times, trials=100, _amount=None, _last=-1, _increase=None):
    #don't call with _amount, _last, or _increase! Only start, end and times
    if _amount is None:
        _amount = start / times
    if _increase is None:
        _increase = start / times
    attempt = 1
    for n in range(times):
        attempt = (attempt * _amount) + attempt
    if attempt > end:
        if _last != 0:
            _increase /= 2
            _last = 0
        _amount -= _increase
    elif attempt < end:
        if _last != 1:
            _increase /= 2
            _last = 1
        _amount += _increase
    else:
        return _amount
    if trials <= 0:
        return _amount
    return get_value(start, end, times, trials=trials-1,
                     _amount=_amount, _last=_last, _increase=_increase)
Tell me if it works.
Used like this:
get_value(522000.00, 7280000.00, 12)

Comparing values in Python data frame efficiently

I'm trading daily on Cryptocurrencies and would like to find which are the most desirable Cryptos for trading.
I have CSV file for every Crypto with the following fields:
Date Sell Buy
43051.23918 1925.16 1929.83
43051.23919 1925.12 1929.79
43051.23922 1925.12 1929.79
43051.23924 1926.16 1930.83
43051.23925 1926.12 1930.79
43051.23926 1926.12 1930.79
43051.23927 1950.96 1987.56
43051.23928 1190.90 1911.56
43051.23929 1926.12 1930.79
I would like to check:
How many quotes will end with profit:
for Buy positions - if one of the following Sells > current Buy.
for Sell positions - if one of the following Buys < current Sell.
How much time it would take for a theoretical position to become profitable.
What can be the profit potential.
I'm using the following code:
#converting from OLE to datetime
OLE_TIME_ZERO = dt.datetime(1899, 12, 30, 0, 0, 0)
def ole(oledt):
    return OLE_TIME_ZERO + dt.timedelta(days=float(oledt))
#variables initialization
buy_time = ole(43031.57567) - ole(43031.57567)
sell_time = ole(43031.57567) - ole(43031.57567)
profit_buy_counter = 0
no_profit_buy_counter = 0
profit_sell_counter = 0
no_profit_sell_counter = 0
max_profit_buy_positions = 0
max_profit_buy_counter = 0
max_profit_sell_positions = 0
max_profit_sell_counter = 0
df = pd.read_csv("C:/P/Crypto/bitcoin_test_normal_276k.csv")
#comparing to max
for index, row in df.iterrows():
    a = index + 1
    df_slice = df[a:]
    if df_slice["Sell"].max() - row["Buy"] > 0:
        max_profit_buy_positions += df_slice["Sell"].max() - row["Buy"]
        max_profit_buy_counter += 1
    for index1, row1 in df_slice.iterrows():
        if row["Buy"] < row1["Sell"] :
            buy_time += ole(row1["Date"])- ole(row["Date"])
            profit_buy_counter += 1
            break
    else:
        no_profit_buy_counter += 1
#comparing to sell
for index, row in df.iterrows():
    a = index + 1
    df_slice = df[a:]
    if row["Sell"] - df_slice["Buy"].min() > 0:
        max_profit_sell_positions += row["Sell"] - df_slice["Buy"].min()
        max_profit_sell_counter += 1
    for index2, row2 in df_slice.iterrows():
        if row["Sell"] > row2["Buy"] :
            sell_time += ole(row2["Date"])- ole(row["Date"])
            profit_sell_counter += 1
            break
    else:
        no_profit_sell_counter += 1
num_rows = len(df.index)
buy_avg_time = buy_time/num_rows
sell_avg_time = sell_time/num_rows
if max_profit_buy_counter == 0:
    avg_max_profit_buy = "There is no profitable buy positions"
else:
    avg_max_profit_buy = max_profit_buy_positions/max_profit_buy_counter
if max_profit_sell_counter == 0:
    avg_max_profit_sell = "There is no profitable sell positions"
else:
    avg_max_profit_sell = max_profit_sell_positions/max_profit_sell_counter
The code works fine for 10K-20K lines, but for a larger amount (276K) it takes a long time (more than 10 hrs).
What can I do in order to improve it?
Is there any "Pythonic" way to compare each value in a data frame to all following values?
note - the dates in the CSV are in OLE so I need to convert it to Datetime.
File for testing:
Thanks for your comment.
Here you can find the file that I used:

First, I'd want to create the cumulative maximum/minimum values for Sell and Buy per row, so it's easy to compare to. pandas has cummax and cummin, but they go the wrong way. So we'll do:
df['Max Sell'] = df[::-1]['Sell'].cummax()[::-1]
df['Min Buy'] = df[::-1]['Buy'].cummin()[::-1]
Now, we can just compare each row:
df['Buy Profit'] = df['Max Sell'] - df['Buy']
df['Sell Profit'] = df['Sell'] - df['Min Buy']
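The question compares each row only with the rows after it (df[index + 1:]), so one way to reproduce that without a loop is to shift the reversed cumulative extremes up by one row. The sketch below assumes that reading of the question; it uses the sample quotes from the question, and the names future_max_sell, future_min_buy, profitable_buys and profitable_sells are made up for illustration.

```python
import pandas as pd

# Sample quotes copied from the question.
df = pd.DataFrame({
    "Sell": [1925.16, 1925.12, 1925.12, 1926.16, 1926.12,
             1926.12, 1950.96, 1190.90, 1926.12],
    "Buy":  [1929.83, 1929.79, 1929.79, 1930.83, 1930.79,
             1930.79, 1987.56, 1911.56, 1930.79],
})

# Reverse, accumulate, reverse back: extreme over the current and all later rows.
# shift(-1) then excludes the current row, so row i only sees rows i+1 onward.
future_max_sell = df["Sell"][::-1].cummax()[::-1].shift(-1)
future_min_buy = df["Buy"][::-1].cummin()[::-1].shift(-1)

buy_profit = future_max_sell - df["Buy"]    # best case for a buy opened at row i
sell_profit = df["Sell"] - future_min_buy   # best case for a sell opened at row i

# The last row has no following rows, so its profit is NaN and is not counted.
profitable_buys = int((buy_profit > 0).sum())
profitable_sells = int((sell_profit > 0).sum())
```

This covers the profit-potential part of the question in a single pass; the time-to-profit part still needs the first later row that crosses the threshold, which the cumulative extremes alone do not give.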
I'm positive this isn't exactly what you want as I don't perfectly understand what you're trying to do, but hopefully it leads you in the right direction.
After comparing your function and mine, there is a slight difference, as your a is offset one off the index. Removing that offset, you'll see that my method produces the same results as yours, only in vastly shorter time:
for index, row in df.iterrows():
    a = index
    df_slice = df[a:]
    assert (df_slice["Sell"].max() - row["Buy"]) == df['Max Sell'][a] - df['Buy'][a]
else:
    print("All assertions passed!")
Note that this check is still as slow as your function, because it runs the same loop. The one-row offset can be fixed with shift, but I don't want to run your function long enough to figure out which way to shift it.

Conditionally setting values with df.loc inside a loop

I'm querying an MS Access db to retrieve a set of leases. My task is to calculate monthly totals for base rent for the next 60 months. The leases have start and end dates, which are needed to calculate the correct periods when a lease terminates before 60 periods have passed. My current challenge comes when I attempt to increase the base rent by a certain amount whenever it's time to increment for that specific lease. I'm at a beginner level with Python/pandas, so my approach is likely not optimal and the code is rough. A vectorized approach is probably better suited, but I'm not quite able to write such code yet.
Data:
Lease input & output
Code:
try:
    sql = 'SELECT * FROM [tbl_Leases]'
    #sql = 'SELECT * FROM [Copy Of tbl_Leases]'
    df = pd.read_sql(sql, conn)
    #print df
    #df.to_csv('lease_output.csv', index_label='IndexNo')
    df_fcst_periods = pd.DataFrame()
    # init increments
    periods = 0
    i = 0
    # create empty lists to store looped info from original df
    fcst_months = []
    fcst_lease_num = []
    fcst_base_rent = []
    fcst_method = []
    fcst_payment_int = []
    fcst_rate_inc_amt = []
    fcst_rate_inc_int = []
    fcst_rent_start = []
    # create array for period deltas, rent interval calc, pmt interval calc
    fcst_period_delta = []
    fcst_rate_int_bool = []
    fcst_pmt_int_bool = []
    for row in df.itertuples():
        # get min of forecast period or lease ending date
        min_period = min(fcst_periods, df.Lease_End_Date[i])
        # count periods to loop for future periods in new df_fcst
        periods = (min_period.year - currentMonth.year) * 12 + (min_period.month - currentMonth.month)
        for period in range(periods):
            nextMonth = (currentMonth + monthdelta(period))
            period_delta = (nextMonth.year - df.Rent_Start_Date[i].year) * 12 + (nextMonth.month - df.Rent_Start_Date[i].month)
            period_delta = float(period_delta)
            # period delta values allow us to divide by the payment & rent intervals looking for integers
            rate_int_calc = period_delta/df['Rate_Increase_Interval'][i]
            pmt_int_calc = period_delta/df['Payment_Interval'][i]
            # float.is_integer() method - returns bool
            rate_int_bool = rate_int_calc.is_integer()
            pmt_int_bool = pmt_int_calc.is_integer()
            # conditional logic to handle base rent increases
            if df['Forecast_Method'][i] == "Percentage" and rate_int_bool:
                rate_increase = df['Base_Rent'][i] * (1 + df['Rate_Increase_Amt'][i]/100)
                df.loc[df.index, "Base_Rent"] = rate_increase
                fcst_base_rent.append(df['Base_Rent'][i])
                print "Both True"
            else:
                fcst_base_rent.append(df['Base_Rent'][i])
                print rate_int_bool
            fcst_rate_int_bool.append(rate_int_bool)
            fcst_pmt_int_bool.append(pmt_int_bool)
            fcst_months.append(nextMonth)
            fcst_period_delta.append(period_delta)
            fcst_rent_start.append(df['Rent_Start_Date'][i])
            fcst_lease_num.append(df['Lease_Number'][i])
            #fcst_base_rent.append(df['Base_Rent'][i])
            fcst_method.append(df['Forecast_Method'][i])
            fcst_payment_int.append(df['Payment_Interval'][i])
            fcst_rate_inc_amt.append(df['Rate_Increase_Amt'][i])
            fcst_rate_inc_int.append(df['Rate_Increase_Interval'][i])
        i += 1
    df_fcst_periods['Month'] = fcst_months
    df_fcst_periods['Rent_Start_Date'] = fcst_rent_start
    df_fcst_periods['Lease_Number'] = fcst_lease_num
    df_fcst_periods['Base_Rent'] = fcst_base_rent
    df_fcst_periods['Forecast_Method'] = fcst_method
    df_fcst_periods['Payment_Interval'] = fcst_payment_int
    df_fcst_periods['Rate_Increase_Amt'] = fcst_rate_inc_amt
    df_fcst_periods['Rate_Increase_Interval'] = fcst_rate_inc_int
    df_fcst_periods['Period_Delta'] = fcst_period_delta
    df_fcst_periods['Rate_Increase_Interval_bool'] = fcst_rate_int_bool
    df_fcst_periods['Payment_Interval_bool'] = fcst_pmt_int_bool
except Exception, e:
    print str(e)
conn.close()

I ended up initializing a variable before the periods loop which allowed me to perform a calculation when looping to obtain the correct base rents for subsequent periods.
# init base rent, rate increase amount, new rate for leases
base_rent = df['Base_Rent'][i]
rate_inc_amt = float(df['Rate_Increase_Amt'][i])
new_rate = 0
for period in range(periods):
    nextMonth = (currentMonth + monthdelta(period))
    period_delta = (nextMonth.year - df.Rent_Start_Date[i].year) * 12 + (nextMonth.month - df.Rent_Start_Date[i].month)
    period_delta = float(period_delta)
    # period delta values allow us to divide by the payment & rent intervals looking for integers
    rate_int_calc = period_delta/df['Rate_Increase_Interval'][i]
    pmt_int_calc = period_delta/df['Payment_Interval'][i]
    # float.is_integer() method - returns bool
    rate_int_bool = rate_int_calc.is_integer()
    pmt_int_bool = pmt_int_calc.is_integer()
    # conditional logic to handle base rent increases
    if df['Forecast_Method'][i] == "Percentage" and rate_int_bool:
        new_rate = base_rent * (1 + rate_inc_amt/100)
        base_rent = new_rate
        fcst_base_rent.append(new_rate)
    elif df['Forecast_Method'][i] == "Manual" and rate_int_bool:
        new_rate = base_rent + rate_inc_amt
        base_rent = new_rate
        fcst_base_rent.append(new_rate)
    else:
        fcst_base_rent.append(base_rent)
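The escalation logic can also be isolated into a small function that is easy to test without the database. This is only a sketch of the same idea: the function name forecast_base_rent and its parameters are hypothetical, and it mirrors the integer-division check above by escalating whenever the number of months since rent start is a multiple of the increase interval (which, as in the code above, includes month 0).

```python
def forecast_base_rent(base_rent, rate_inc_amt, rate_inc_interval,
                       method, first_period_delta, periods):
    """Return the base rent for each forecast period of a single lease.

    first_period_delta is the number of months between the rent start date
    and the first forecast month (hypothetical parameter for this sketch).
    """
    rents = []
    for period in range(periods):
        period_delta = first_period_delta + period
        # Escalate when the elapsed months hit a multiple of the interval.
        if period_delta % rate_inc_interval == 0:
            if method == "Percentage":
                base_rent = base_rent * (1 + rate_inc_amt / 100)
            elif method == "Manual":
                base_rent = base_rent + rate_inc_amt
        rents.append(base_rent)
    return rents


# Example: 10% increase every 12 months, forecast starting at month 11.
pct_rents = forecast_base_rent(1000.0, 10, 12, "Percentage", 11, 3)

# Example: flat increase of 50 every 6 months, forecast starting at month 5.
manual_rents = forecast_base_rent(1000.0, 50, 6, "Manual", 5, 3)
```

Keeping the running base rent in a local variable, as in the answer above, avoids writing back into the lease DataFrame on every period.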
Still open for any alternative approaches though!
