The code below calculates compounding values starting from $100 and an array of percentage gains. It starts with the entire gains array [20, 3, 4, 55, 6.5, -10, 20, -60, 5], which compounds to 96.25 at the end, then drops the first index and recalculates the compounded value for [3, 4, 55, 6.5, -10, 20, -60, 5], which ends at 80.21. It keeps doing this until only the last element of the gains array remains ([5]). I want to calculate the maximum drawdown while f is being computed. The compounding results for the first iteration of f are [120., 123.6, 128.544, 199.2432, 212.194008, 190.9746072, 229.16952864, 91.66781146, 96.25120203]. I want to record a value only if it drops below the initial capital (Amount). The lowest value on the first iteration is 91.67, so that would be the output; on the second iteration it would be 76.39. Since the last iteration is just [5], which compounds to 105, no value goes below 100, so the output there is None. How can I implement this in the code below and get the expected output?
import numpy as np

Amount = 100

def moneyrisk(array):
    f = lambda array: Amount * np.cumprod(array / 100 + 1, axis=1)
    rep = array[None].repeat(len(array), 0)
    rep_t = np.triu(rep, k=0)
    final = f(rep_t)[:, -1]
    return final  # compounded end value for each starting index

gains = np.array([20, 3, 4, 55, 6.5, -10, 20, -60, 5])
Expected output:

[91.67, 76.39, 74.164, 71.312, 46.008, 43.2, 48., 40., None]
I think I've understood the requirement. Computing the compound factors after np.triu turns the zeros in the lower triangle into ones, which means the min method returns a valid value.
import numpy as np

gains = np.array([20, 3, 4, 55, 6.5, -10, 20, -60, 5])  # Gains in %
amount = 100

def moneyrisk(arr):
    rep = arr[None].repeat(len(arr), 0)
    rep_t = np.triu(rep, k=0)
    rep_t = 1 + rep_t * .01  # Create factors to compound in rep_t
    # Compound along each row and find the min value.
    result = amount * rep_t.cumprod(axis=1).min(axis=1)
    # Set values >= amount to None, in a list, as numpy float arrays can't hold None.
    return [x if x < amount else None for x in result]

moneyrisk(gains)
# [91.667811456, 76.38984288, 74.164896, 71.3124, 46.008, 43.2, 48.0, 40.0, None]
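For reference, here is a plain-loop version of the same calculation (a sketch added for verification, not part of the vectorized answer); it should produce the same values:

def moneyrisk_loop(arr, amount=100):
    out = []
    for start in range(len(arr)):
        value, lowest = float(amount), float(amount)
        for g in arr[start:]:
            value *= 1 + float(g) / 100   # compound one gain
            lowest = min(lowest, value)   # track the running minimum
        out.append(lowest if lowest < amount else None)
    return out

print(moneyrisk_loop(gains))  # should match the vectorized output above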
I'm attempting to write Python code to solve a transportation problem using the Least Cost method. I have a 2D numpy array that I iterate through to find the minimum, perform calculations with that minimum, and then replace it with a 0 so that the loop stops when values matches constantarray, an array of the same shape containing only 0s. The values array contains distances from points in supply to points in demand. I'm currently using a while loop to do so, but the loop isn't running because values.all() != constantarray.all() evaluates to False.
I also need the process to repeat once the arrays have been edited to move onto the next lowest number in values.
constantarray = np.zeros((len(supply), len(demand)))  # create array of 0s
sandmoved = np.zeros((len(supply), len(demand)))      # used to store information needed for later
totalcost = 0

while values.all() != constantarray.all():  # iterate until `values` only contains 0s
    m = np.argmin(values, axis=0)[0]  # find coordinates of minimum value
    n = np.argmin(values, axis=1)[0]
    if supply[m] > abs(demand[n]):  # all demand numbers are negative
        supply[m] += demand[n]  # subtract demand from supply
        totalcost += abs(demand[n]) * values[m, n]
        sandmoved[m, n] = demand[n]  # add amount of 'sand' moved to an empty array
        values[m, 0:-1] = 0  # replace entire m row with 0s since demand has been filled
        demand[n] = 0  # replace demand value with 0
    elif supply[m] < abs(demand[n]):
        demand[n] += supply[m]  # combine positive supply with negative demand
        sandmoved[m, n] = supply[m]
        totalcost += supply[m] * values[m, n]
        values[:-1, n] = 0  # replace entire column with 0s since supply has been depleted
        supply[m] = 0
There is an additional if statement for when supply[m] == demand[n], but I feel that isn't necessary. I've already tried nested for loops and many different syntax combinations for a while loop, but I just can't get it to work the way I want. Even when running the code block over and over by itself, m and n stay the same, and the function removes one value from values but doesn't add it to sandmoved. Any ideas are greatly appreciated!
Well, here is an example from an old implementation of mine:
import numpy as np

values = np.array([[3, 1, 7, 4],
                   [2, 6, 5, 9],
                   [8, 3, 3, 2]])
demand = np.array([250, 350, 400, 200])
supply = np.array([300, 400, 500])

totCost = 0
MAX_VAL = 2 * np.max(values)  # choose MAX_VAL higher than all values

while np.any(values.ravel() < MAX_VAL):
    # find row and col indices of min
    m, n = np.unravel_index(np.argmin(values), values.shape)
    if supply[m] < demand[n]:
        totCost += supply[m] * values[m, n]
        demand[n] -= supply[m]
        values[m, :] = MAX_VAL  # set all row to MAX_VAL
    else:
        totCost += demand[n] * values[m, n]
        supply[m] -= demand[n]
        values[:, n] = MAX_VAL  # set all col to MAX_VAL
Solution:
print(totCost)
# 2850
Basically, start by choosing a MAX_VAL higher than all given values and a totCost = 0, then follow the standard steps of the algorithm. Find the row and column indices of the smallest cell, say m, n. Select the m-th supply or the n-th demand, whichever is smaller, add the selected amount multiplied by values[m,n] to totCost, and set all entries of the selected row or column to MAX_VAL so it is skipped in later iterations. Update the greater value by subtracting the selected one, and repeat until all values are equal to MAX_VAL.
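If you also need the allocation matrix the asker was building (sandmoved), here is a minimal sketch of the same loop extended to record it (assuming values, supply, and demand are re-initialized first, since the loop above consumes them):

totCost = 0
moved = np.zeros((len(supply), len(demand)))  # amount shipped per (supply, demand) pair

while np.any(values < MAX_VAL):
    m, n = np.unravel_index(np.argmin(values), values.shape)
    qty = min(supply[m], demand[n])   # ship as much as the smaller side allows
    moved[m, n] = qty
    totCost += qty * values[m, n]
    supply[m] -= qty
    demand[n] -= qty
    if supply[m] == 0:
        values[m, :] = MAX_VAL        # supply row exhausted
    else:
        values[:, n] = MAX_VAL        # demand column exhausted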
I have been struggling to mock IoT sensor data. I need a list of floats that increases and decreases sequentially.
For example [0.1, 0.12, 0.13, 0.18, 1.0, 1.2, 1.0, 0.9, 0.6]
Right now I generate the list within a min/max range using this:
for k in dts:
    x = round(random.uniform(j["min"], j["max"]), 3)
    random_float_list.append(x)
The list generated from this code is not in a sequence. I need something that generates random floats in a range with no abrupt changes, where values can increase and decrease gradually.
You can generate multiple random sequences and glue them together. Something like this:
import numpy as np

def gen_floats(count, min_step_size, max_step_size, max_seq_len):
    # Start around 0
    res = [np.round(np.random.rand() - 0.5, 2)]
    while len(res) < count:
        step_size = np.random.uniform(min_step_size, max_step_size)
        # Generate random number of steps for sequence
        remaining = count - len(res)
        steps = np.random.randint(1, remaining + 1 if remaining < max_seq_len else max_seq_len)
        # Generate additive or subtractive sequence using previous values
        if np.random.rand() > 0.5:
            vals = np.round(np.linspace(res[-1] + step_size, res[-1] + steps * step_size, steps), 2)
        else:
            vals = np.round(np.linspace(res[-1] + step_size, res[-1] - steps * step_size, steps), 2)
        res.extend(vals)
    return res
Then print(gen_floats(20, 0.1, 0.5, 10)) generates something like: [0.4, 0.86, 0.25, -0.37, -0.99, -1.61, -2.23, -2.85, -2.64, -2.95, -3.26, -3.57, -3.88, -3.63, -3.38, -3.19, -2.89, -2.63, -3.15, -3.68]. You can play with params to match desired output.
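If you need repeatable output (e.g. for tests), you can seed NumPy's random generator before calling it:

np.random.seed(42)  # any fixed seed makes the sequence reproducible
print(gen_floats(20, 0.1, 0.5, 10))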
Something like this should work if you want a random sequence where you can control the min, the max, and the maximum difference between consecutive values.
It first draws a random value between start and end and appends it to the output list. Each subsequent value is a random value within ±max_diff of the last value in the output list.
import random

def rand(start, end, max_diff, elements, output):
    elements -= 1
    if output:
        if output[-1] - max_diff < start:  # To not get a value smaller than start
            output.append(round(random.uniform(start, output[-1] + max_diff), 3))
        elif output[-1] + max_diff > end:  # To not get a value bigger than end
            output.append(round(random.uniform(output[-1] - max_diff, end), 3))
        else:
            output.append(round(random.uniform(output[-1] - max_diff, output[-1] + max_diff), 3))
    else:
        output.append(round(random.uniform(start, end), 3))
    if elements > 0:
        output = rand(start, end, max_diff, elements, output)
    return output
print(rand(1,2,0.1,3,[])) #[1.381, 1.375, 1.373]
You can generate random numbers with a uniform distribution, and then sort the numbers into ascending order in the first part, and into descending order in the second part.
import numpy as np

np.random.seed(0)

def gen_rnd_sensor_data(low: float,
                        high: float,
                        n_incr: int,
                        n_decr: int) -> np.ndarray:
    incr = np.random.uniform(low=low, high=high, size=n_incr)
    incr.sort()
    decr = np.random.uniform(low=low, high=high, size=n_decr)
    decr[::-1].sort()
    return np.concatenate((incr, decr))
Then you can call this function with:
print(gen_rnd_sensor_data(0, 1, 5, 3))
This generates data between 0 and 1; the first 5 values are increasing and the last 3 are decreasing. Every time you call the function within the program you get different results, but if you rerun the program you get the same results (because the seed is fixed), so you can debug your program.
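If you want several rises and falls, as in the example in the question, one option (a small sketch building on the function above) is to concatenate multiple calls:

# Three up/down segments glued together into one series.
parts = [gen_rnd_sensor_data(0, 1, 5, 3) for _ in range(3)]
data = np.concatenate(parts)
print(data)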
Is there a way to get rid of the loop in the code below and replace it with a vectorized operation?
Given a data matrix, for each row I want to find the index of the minimal value that fits within ranges defined (per row) in a separate array.
Here's an example:
import numpy as np
np.random.seed(10)
# Values of interest, for this example a random 6 x 100 matrix
data = np.random.random((6,100))
# For each row, define an inclusive min/max range
ranges = np.array([[0.3, 0.4],
[0.35, 0.5],
[0.45, 0.6],
[0.52, 0.65],
[0.6, 0.8],
[0.75, 0.92]])
# For each row, find the index of the minimum value that fits inside the given range
result = np.zeros(6, dtype=int)
for i in range(6):
    ind = np.where((ranges[i][0] <= data[i]) & (data[i] <= ranges[i][1]))[0]
    result[i] = ind[np.argmin(data[i, ind])]

print(result)
# Result: [35  8 22  8 34 78]

print(data[np.arange(6), result])
# Result: [ 0.30070006  0.35065639  0.45784951  0.52885388  0.61393513  0.75449247]
Approach #1 : Using broadcasting and np.minimum.reduceat -
mask = (ranges[:,None,0] <= data) & (data <= ranges[:,None,1])
r,c = np.nonzero(mask)
cut_idx = np.unique(r, return_index=1)[1]
out = np.minimum.reduceat(data[mask], cut_idx)
Improvement to avoid np.nonzero and compute cut_idx directly from mask :
cut_idx = np.concatenate(( [0], np.count_nonzero(mask[:-1],1).cumsum() ))
Approach #2 : Using broadcasting and filling invalid places with NaNs and then using np.nanargmin (note this assumes every row has at least one in-range value, as np.nanargmin raises an error on an all-NaN row) -
mask = (ranges[:,None,0] <= data) & (data <= ranges[:,None,1])
result = np.nanargmin(np.where(mask, data, np.nan), axis=1)
out = data[np.arange(6),result]
Approach #3 : If you are not iterating enough (just like you have a loop of 6 iterations in the sample), you might want to stick to a loop for memory efficiency, but make use of more efficient masking with a boolean array instead -
out = np.zeros(6)
for i in range(6):
    mask_i = (ranges[i, 0] <= data[i]) & (data[i] <= ranges[i, 1])
    out[i] = np.min(data[i, mask_i])
Approach #4 : There is one more loopy solution possible here. The idea would be to sort each row of data. Then, use the two range limits for each row to decide on the start and stop indices with help from np.searchsorted. Further, we would use those indices to slice and then get the minimum values. The benefit of slicing that way is that we would be working with views, which is very efficient in both memory and performance.
The implementation would look something like this -
out = np.zeros(6)
sdata = np.sort(data, axis=1)
for i in range(6):
    start = np.searchsorted(sdata[i], ranges[i, 0])
    stop = np.searchsorted(sdata[i], ranges[i, 1], 'right')
    out[i] = np.min(sdata[i, start:stop])
Furthermore, we could get those start, stop indices in a vectorized manner following an implementation of vectorized searchsorted.
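For completeness, here is one common way to vectorize searchsorted over rows (a sketch, assuming the query values fall within the data's overall min/max range): offset each sorted row so the flattened array is globally sorted, then make a single searchsorted call -

def searchsorted2d(a, b, side='left'):
    # a : (m, n) array with each row sorted; b : (m,) one search value per row.
    m, n = a.shape
    # Shift each row by a gap larger than the overall data range so that
    # the flattened array is globally sorted.
    gap = a.max() - a.min() + 1
    offset = gap * np.arange(m)
    idx = np.searchsorted((a + offset[:, None]).ravel(), b + offset, side=side)
    return idx - n * np.arange(m)  # convert flat indices back to per-row indices

starts = searchsorted2d(sdata, ranges[:, 0])
stops = searchsorted2d(sdata, ranges[:, 1], side='right')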
Based on a suggestion by @Daniel F, for the case when the ranges are within the limits of the given data, we could simply use the start indices -
out[i] = sdata[i, start]
Assuming at least one value in range, you don't even have to bother with the upper limit:
result = np.empty(6)
for i in range(6):
    lt = (ranges[i, 0] >= data[i]).sum()
    result[i] = np.argpartition(data[i], lt)[lt]
Actually, you could even vectorize the whole thing using argpartition
lt = (ranges[:,None,0] >= data).sum(1)
result = np.argpartition(data, lt)[np.arange(data.shape[0]), lt]
Of course, this is only efficient if data.shape[0] << data.shape[1], as otherwise you're basically sorting
The following is my script. Each equal part has self.number samples, and in0 is the input sample list. I get an error as follows:
pn[i] = pn[i] + d
IndexError: list index out of range
Is the problem the size of pn? How can I define a list with a certain size but no exact numbers in it yet?
for i in range(0, len(in0) / self.number):
    pn = []
    m = i * self.number
    for d in in0[m: m + self.number]:
        pn[i] += d
    if pn[i] >= self.alpha:
        out[i] = 1
    elif pn[i] <= self.beta:
        out[i] = 0
    else:
        if pn[i] >= self.noise:
            out[i] = 1
        else:
            out[i] = 0
There are a number of problems in the code as posted; however, the gist seems to be something you'd want to do with numpy arrays instead of iterating over lists.
For example, the set of if/else cases that check whether pn[i] >= some_value and then set a corresponding entry in another list with the result (true/false) could be done as a one-liner with an array operation, much faster than iterating over lists.
import numpy as np
# for example, assuming you have 9 numbers in your list
# and you want them divided into 3 sublists of 3 values each
# in0 is your original list, which for example might be:
in0 = [1.05, -0.45, -0.63, 0.07, -0.71, 0.72, -0.12, -1.56, -1.92]
# convert into array
in2 = np.array(in0)
# reshape to 3 rows, the -1 means that numpy will figure out
# what the second dimension must be.
in2 = in2.reshape((3,-1))
print(in2)
output:
[[ 1.05 -0.45 -0.63]
[ 0.07 -0.71 0.72]
[-0.12 -1.56 -1.92]]
With this 2-d array structure, element-wise summing is super easy. So is element-wise threshold checking. Plus 'vectorizing' these operations has big speed advantages if you are working with large data.
# add corresponding entries, we want to add the columns together,
# as each row should correspond to your sub-lists.
pn = in2.sum(axis=0) # you can sum row-wise or column-wise, or all elements
print(pn)
output: [ 1. -2.72 -1.83]
# it is also trivial to check the threshold conditions
# here I check each entry in pn against a scalar
alpha = 0.0
out1 = ( pn >= alpha )
print(out1)
output: [ True False False]
# you can easily convert booleans to 1/0
x = out1.astype('int') # or simply out1 * 1
print(x)
output: [1 0 0]
# if you have a list of element-wise thresholds
beta = np.array([0.0, 0.5, -2.0])
out2 = (pn >= beta)
print(out2)
output: [True False True]
I hope this helps. Using the correct data structures for your task can make the analysis much easier and faster. There is a wealth of documentation on numpy, which is the standard numeric library for python.
You initialize pn to an empty list just inside the for loop, never assign anything into it, and then attempt to access an index i. There is nothing at index i because there is nothing at any index in pn yet.
for i in range(0, len(in0) / self.number):
    pn = []
    m = i * self.number
    for d in in0[m: m + self.number]:
        pn[i] += d
If you are trying to add the value d to the pn list, you should do this instead:
pn.append(d)
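Putting it together, here is a minimal corrected sketch of the loop (my own reconstruction, assuming out should collect one thresholded value per chunk and that this code lives in a method where self.number, self.alpha, self.beta, and self.noise are defined):

out = []
for i in range(0, len(in0) // self.number):
    m = i * self.number
    pn = sum(in0[m: m + self.number])  # sum of the i-th chunk
    if pn >= self.alpha:
        out.append(1)
    elif pn <= self.beta:
        out.append(0)
    else:
        out.append(1 if pn >= self.noise else 0)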
I have a Python function that takes two lists, looks for pairs where both inputs have positive values at the same index, and creates two output lists by appending each of those two positive values to the corresponding output. I have a working function:
def get_pairs_in_first_quadrant(x_in, y_in):
    """If both x_in[i] and y_in[i] are > 0 then both will be appended to the output lists.
    If either is negative then the pair will be absent from the output lists.

    :param x_in: A list of positive or negative floats
    :param y_in: A list of positive or negative floats
    :return: Two lists of positive floats, each <= the inputs in length.
    """
    x_filtered, y_filtered = [], []
    for x, y in zip(x_in, y_in):
        if x > 0 and y > 0:
            x_filtered.append(x)
            y_filtered.append(y)
    return x_filtered, y_filtered
How can I make this faster using numpy?
You can do this by simply finding the indices where they are both positive:
import numpy as np

a = np.random.random(10) - .5
b = np.random.random(10) - .5

def get_pairs_in_first_quadrant(x_in, y_in):
    i = np.nonzero((x_in > 0) & (y_in > 0))  # main line of interest
    return x_in[i], y_in[i]

print(a)  # [-0.18012451 -0.40924713 -0.3788772   0.3186816   0.14811581 -0.04021951 -0.21278312 -0.36762629 -0.45369899 -0.46374929]
print(b)  # [ 0.33005969 -0.03167875  0.11387641  0.22101336  0.38412264 -0.3880842   0.08679424  0.3126209  -0.08760505 -0.40921421]
print(get_pairs_in_first_quadrant(a, b))  # (array([ 0.3186816 ,  0.14811581]), array([ 0.22101336,  0.38412264]))
I was interested in Jaime's suggestion of just using boolean indexing without calling nonzero, so I ran some timing tests. The results are somewhat interesting since the advantage ratio is non-monotonic in the number of positive matches, but basically, at least for speed, it doesn't really matter which is used (though nonzero is usually a bit faster, and can be about twice as fast):
from timeit import timeit

threshold = .6
a = np.random.random(10000) - threshold
b = np.random.random(10000) - threshold

def f1(x_in, y_in):
    i = np.nonzero((x_in > 0) & (y_in > 0))  # main line of interest
    return x_in[i], y_in[i]

def f2(x_in, y_in):
    i = (x_in > 0) & (y_in > 0)  # main line of interest
    return x_in[i], y_in[i]

print(threshold, len(f1(a, b)[0]), len(f2(a, b)[0]))
print(timeit("f1(a, b)", "from __main__ import a, b, f1, f2", number=1000))
print(timeit("f2(a, b)", "from __main__ import a, b, f1, f2", number=1000))
Which gives, for different threshold values:
0.05 9086 9086
0.0815141201019
0.104746818542
0.5 2535 2535
0.0715141296387
0.153401851654
0.95 21 21
0.027126789093
0.0324990749359