I'm working on a time-series problem, and I have a list of events such that each data point represents several objects being pulled from an inventory.
Each time the value drops below some threshold, I want to add a constant number to the inventory.
For example, I want:
(threshold = 55, constant = 20)
70 60 50 45 30 0 -5 -75
to become:
70 60 70 65 70 60 75 25
Is there a "pythonic" way (pandas, numpy, etc...) to do it with no loops?
Edit: the addition of the constant can occur multiple times, and it only affects the future (i.e. indices greater than the observed index). This is the code I'm using right now, and my goal is to lose the for loop:
threshold = 55
constant = 20
a = np.array([70, 60, 50, 45, 30, 0, -5, -75])
b = a.copy()
for i in range(len(b)):
    if b[i] <= threshold:
        temp_add_array = np.zeros(b.shape)
        indexes_to_add = np.array(range(len(b))) >= i
        temp_add_array[indexes_to_add] += constant
        b += temp_add_array.astype(int)
print(b)
print('*************')
print('[70 60 70 65 70 60 75 25]')
Since you're allowing for numpy:
>>> import numpy as np
# threshold and constant
>>> t, c = 55, 20
>>> data = np.asarray([70, 60, 50, 45, 30, 0, -5, -75])
# if you allow for data == threshold
>>> np.where(data >= t, data, data + c*((t-1-data) // c + 1))
array([70, 60, 70, 65, 70, 60, 55, 65])
# if you enforce data > threshold
>>> np.where(data > t, data, data + c*((t-data) // c + 1))
array([70, 60, 70, 65, 70, 60, 75, 65])
But there is really no need for an external dependency for a task like this:
# threshold and constant
>>> t, c = 55, 20
>>> data = [70, 60, 50, 45, 30, 0, -5, -75]
# if you allow for data == threshold
>>> [x if x >= t else x + c*((t-1-x)//c + 1) for x in data]
[70, 60, 70, 65, 70, 60, 55, 65]
# if you enforce data > threshold
>>> [x if x > t else x + c*((t-x)//c + 1) for x in data]
[70, 60, 70, 65, 70, 60, 75, 65]
Regarding the OP's edit:
I doubt there's a (readable) solution to your problem without using a loop; the best thing I could come up with:
>>> import numpy as np
>>> a = np.asarray([70, 60, 50, 45, 30, 0, -5, -75])
# I don't think you *can* get rid of the loop since there are forward dependencies in the data
>>> def stock_inventory(data: np.ndarray, threshold: int, constant: int) -> np.ndarray:
...     res = data.copy()
...     for i, e in enumerate(res):
...         if e <= threshold:
...             res[i:] += constant
...     return res
...
>>> stock_inventory(a, threshold=55, constant=20)
array([70, 60, 70, 65, 70, 60, 75, 25])
Assuming a numpy ndarray...
original array is named a
subtract the threshold value from a - name the result b
make a boolean array of b < 0 - name this array c
integer/floor divide b by -1 * constant - name this d (it could be named b as it is no longer needed)
add one to d - name this e
use c as a boolean index to add constant * e to a for those values that were below the threshold: a[c] += constant * e[c]
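A minimal sketch of those steps (this is the per-element version, like the answers above, not the cumulative replenishment from the OP's edit); the variable names simply follow the description:
import numpy as np

threshold, constant = 55, 20
a = np.array([70, 60, 50, 45, 30, 0, -5, -75])

b = a - threshold         # subtract the threshold from a
c = b < 0                 # boolean array of values below the threshold
d = b // (-1 * constant)  # integer/floor divide b by -1 * constant
e = d + 1                 # add one: number of top-ups each element needs
a[c] += constant * e[c]   # apply the top-ups only where needed
print(a)                  # [70 60 70 65 70 60 75 65]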
Related
I have two lists of marks for the same set of students. For example:
A = [22, 2, 88, 3, 93, 84]
B = [66, 0, 6, 33, 99, 45]
If I accept only students above a threshold according to list A then I can look up their marks in list B. For example, if I only accept students with at least a mark of 80 from list A then their marks in list B are [6, 99, 45].
I would like to compute the smallest threshold for A which gives at least 90% of students in the derived set in B getting at least 50. In this example the threshold will have to be 93 which gives the list [99] for B.
Another example:
A = [3, 36, 66, 88, 99, 52, 55, 42, 10, 70]
B = [5, 30, 60, 80, 80, 60, 45, 45, 15, 60]
In this case we have to set the threshold to 66 which then gives 100% of [60, 80, 80, 60] getting at least 50.
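To make the selection rule concrete, here is a minimal sketch of how the derived list in B is formed for a given threshold (the names are just for illustration):
A = [22, 2, 88, 3, 93, 84]
B = [66, 0, 6, 33, 99, 45]
threshold = 80
derived_b = [b for a, b in zip(A, B) if a >= threshold]
print(derived_b)  # [6, 99, 45]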
This is an O(nlogn + m) approach (due to sorting) where n is the length of A and m is the length of B:
from operator import itemgetter
from itertools import accumulate
def find_threshold(lst_a, lst_b):
    # find the order of the indices of lst_a according to the marks
    indices, values = zip(*sorted(enumerate(lst_a), key=itemgetter(1)))
    # cumulative count of the elements of lst_b that are at least 50, in that sorted order
    cumulative = list(accumulate(int(lst_b[j] >= 50) for j in indices))
    for i, value in enumerate(values):
        # accept students with mark >= value; at least 90% of them must score at least 50 in lst_b
        passed = cumulative[-1] - (cumulative[i - 1] if i > 0 else 0)
        if passed >= 0.9 * (len(values) - i):
            return value
    return None
print(find_threshold([22, 2, 88, 3, 93, 84], [66, 0, 6, 33, 99, 45]))
print(find_threshold([3, 36, 66, 88, 99, 52, 55, 42, 10, 70], [5, 30, 60, 80, 80, 60, 45, 45, 15, 60]))
Output
93
66
First, define a function that will tell you if 90% of students in a set scored more than 50:
def setb_90pc_pass(b):
return sum(score >= 50 for score in b) >= len(b) * 0.9
Next, loop over the scores in A in ascending order, setting each of them as the threshold. Filter your lists according to that threshold, and check whether they fulfill your condition:
for threshold in sorted(A):
    filtered_a, filtered_b = [], []
    for ai, bi in zip(A, B):
        if ai >= threshold:
            filtered_a.append(ai)
            filtered_b.append(bi)
    if setb_90pc_pass(filtered_b):
        break
print(threshold)
I want to write a program where a variable is incremented every time through the while loop, and at the end all of the values should be stored in a list. The values of the list should then be summed with sum().
My problem is that when I execute my program it only shows me the last number. I want to have something like l = [5,10,15,...,175] and not just l = [175] (I hope it's clear what I mean)
def calc_cost():
    x = 0
    k = 34
    j = 0
    while x <= k:
        x = x + 1
        j = j + 5
        l = []
        l.append(j)
    print(sum(l))
print(calc_cost())
def calc_cost():
    x = 0
    k = 34
    j = 0
    l = []
    while x <= k:
        x = x + 1
        j = j + 5
        l.append(j)
    print(l)
    return sum(l)
print(calc_cost())
I made the edit I suggested. I also returned the sum so it could be printed by the line: print(calc_cost())
Here is the output:
[5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175]
3150
Here's a similar approach using the range function:
def calc_cost():
    k = 34
    l = list(range(5, (k + 2) * 5, 5))
    print(l)
    return sum(l)
print(calc_cost())
What would be the most efficient way to find the frequency/count of elements in non-overlapping intervals? For example:
limits = [0, 25, 40, 60]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
For the above lists, I want to find the number of elements in data that are within two adjacent limits. So for the above, the count would be something like:
0-25: 6;
25-40: 1;
40-60: 3;
All I can think of is O(n^2) in time. Is there a better way to do it?
This doesn't need Counter, since the number of bins is known; it swaps dict lookups for list indexing when binning.
from bisect import bisect_right
def bin_it(limits, data):
    "Bin data according to (ascending) limits."
    bins = [0] * (len(limits) + 1)  # adds under/over range bins too
    for d in data:
        bins[bisect_right(limits, d)] += 1
    return bins

if __name__ == "__main__":
    limits = [0, 25, 40, 60]
    data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
    bins = bin_it(limits, data)
    print(f" < {limits[0]:2} :", bins[0])
    for lo, hi, count in zip(limits, limits[1:], bins[1:]):
        print(f">= {lo:2} .. < {hi:2} :", count)
    print(f">= {limits[-1]:2} ... :", bins[-1])
"""
SAMPLE OUTPUT:
< 0 : 0
>= 0 .. < 25 : 6
>= 25 .. < 40 : 1
>= 40 .. < 60 : 3
>= 60 ... : 0
"""
I recommend this approach, which implements what you want in O(n log n):
limits = [0, 25, 40, 60] # m
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18] # n
data += limits # O(n+m)
data.sort() # O((n+m)log(n+m)) = O(nlogn)
result=dict() # O(1)
cnt = 0 # O(1)
aux ='' # O(1)
i = 0 # O(1)
for el in data:  # O(n+m)
    if el == limits[i]:
        i += 1
        if cnt > 0:
            aux += '-' + str(el)
            result[aux] = cnt  # average = O(1)
            cnt = 0
            aux = str(el)
        else:
            aux = str(el)
    else:
        cnt += 1
print(result)
# {'0-25': 6, '25-40': 1, '40-60': 3}
I annotated the time complexity of each important line to derive the total time complexity of the code. The total time complexity is O((n+m)log(n+m)), which can be written as O(n log n).
Improvement
You can improve this if you can make some assumptions about the inputs. If you know the range of the limits and the data, you can replace the sort with counting sort. Counting sort runs in O(n), so the total time complexity of the code becomes O(n).
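A minimal counting-sort sketch for that improvement, assuming the values are non-negative integers with a known upper bound (the max_value parameter here is an assumption, not part of the original code):
def counting_sort(values, max_value):
    # counting sort for non-negative integers bounded by max_value, O(n + max_value)
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)
    return out

# drop-in replacement for data.sort() above, when the value range is known
data = counting_sort(data, max_value=60)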
Here is a simple O(NlogN) approach. Sort your data, then use a two pointer approach to place each element in the correct interval.
limits = [0, 25, 40, 60]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
data.sort()
n,m = len(data), len(limits)
count = [0]*(m-1)
# count[i] represents count between limits[i] and limits[i+1]
low = 0 # lower index of interval we are currently checking
ptr = 0
while ptr < n:
    i = data[ptr]
    if i >= limits[low] and i <= limits[low+1]:
        count[low] += 1
        ptr += 1
    elif i >= limits[low]:
        # i is past the current interval, so move to the next one
        if low == len(limits) - 2:
            break  # no more intervals; any remaining data is above the last limit
        low += 1
    else:
        ptr += 1  # i is below the first limit; skip it
print(count)
limits = [0, 25, 40, 60, 80]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18, 25, 45, 85]
dict_data = {}
i = 0
count = 1
while i < len(limits)-1:
    for j in data:
        if j in range(limits[i], limits[i+1]):
            if '{}-{}'.format(limits[i], limits[i+1]) in dict_data:
                dict_data['{}-{}'.format(limits[i], limits[i+1])] += count
            else:
                dict_data['{}-{}'.format(limits[i], limits[i+1])] = count
    i += 1
print(dict_data)
You could use Counter (from collections) to manage the tallying and bisect to categorize:
from collections import Counter
from bisect import bisect_left
limits = [0, 25, 40, 60, 80]
data = [15, 5, 2, 56, 45, 23, 6, 59, 33, 18]
r = Counter(limits[bisect_left(limits,d)-1] for d in data)
print(r)
Counter({0: 6, 40: 3, 25: 1})
This has a time complexity of O(N log M), where M is the number of limit breaks and N is the number of data items.
I would like to create a matrix with cells that increment by 10. For example, the output of a 3x3 matrix should be:
[[10, 20, 30], [40, 50, 60], [70, 80, 90]]
The code I currently have creates a 3x3 matrix filled with 0s:
print([[0 for x in range(3)] for y in range(3)])
output: [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
Try this on for size
print([[30*y + 10*(x + 1) for x in range(3)] for y in range(3)])
What this does is swap out the 0 you were using for 30*y + 10*(x + 1), which is exactly what you need to generate your matrix. For a more general solution that lets you scale to k by k matrices you can use
k = 3  # matrix dimension
print([[10*k*y + 10*(x + 1) for x in range(k)] for y in range(k)])
For different numbers of rows and columns you can use
rows = 3  # e.g. 3 rows
cols = 4  # e.g. 4 columns
print([[10*cols*y + 10*(x + 1) for x in range(cols)] for y in range(rows)])
The numpy package is quite flexible for things like this:
import numpy as np
m = np.arange(10, 100, 10) #array [10, 20, 30, 40, 50, 60, 70, 80, 90]
m = m.reshape(3,3) # array [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print(m.tolist()) # array converted to list if you need
Output:
[[10, 20, 30], [40, 50, 60], [70, 80, 90]]
import numpy as np
x = np.array(range(10,100,10)).reshape(3,3)
print(x)
[[10 20 30]
[40 50 60]
[70 80 90]]
The code is not very compact but it gets the job done:
matrix = []
bar = []
foo = 10
for i in range(3):
    for j in range(3):
        bar.append(foo)
        foo = foo + 10
    matrix.append(bar)
    bar = []
print(matrix)
There are three types of food provided, i.e. meat, cake and pizza,
and N different stores selling them, where I can only pick one type of food from
each store. Also, I can only buy items in A, B and C quantities, where 'A' means meat from a total of 'A' different stores (see the example). My task is
to consume food so that I get the maximum amount of energy.
example,
10 <= number of stores
5 3 2 <= out of 10 stores I can pick meat from 5 stores only. Similarly,
I can pick cake from 3 out of 10 stores...
56 44 41 1 <= Energy level of meat, cake and pizza - (56, 44, 41) for the first store.
56 84 45 2
40 98 49 3
91 59 73 4
69 94 42 5
81 64 80 6
55 76 26 7
63 24 22 8
81 60 44 9
52 95 11 10
So to maximize my energy, I can consume...
Meat from store numbers:
[1, 4, 7, 8, 9] => [56, 91, 55, 63, 81]
Cake from store numbers:
[3, 5, 10] => [98, 94, 95]
Pizza from store numbers:
[2, 6] => [45, 80]
This leads me to eventually obtain a maximum energy level of 758.
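For reference, a quick check of that total (just restating the numbers above):
meat = [56, 91, 55, 63, 81]   # stores 1, 4, 7, 8, 9
cake = [98, 94, 95]           # stores 3, 5, 10
pizza = [45, 80]              # stores 2, 6
print(sum(meat) + sum(cake) + sum(pizza))  # 758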
As I am new to dynamic programming, I tried to solve it by generating unique combinations like,
10C5 * (10-5)C3 * (10-5-3)C2 = 2520
Here is my code,
nStores = 10
a, b, c = 5, 3, 2
matrix = [
[56,44,41],
[56,84,45],
[40,98,49],
[91,59,73],
[69,94,42],
[81,64,80],
[55,76,26],
[63,24,22],
[81,60,44],
[52,95,11]
]
count = a + b + c
data = []
allOverCount = [i for i in range(count)]
def genCombination(offset, depth, passedData, reductionLevel=3):
    if depth == 0:
        first = set(data)
        if reductionLevel == 3:
            return genCombination(0, b, [i for i in allOverCount if i not in first], reductionLevel=2)
        elif reductionLevel == 2:
            return genCombination(0, c, [i for i in allOverCount if i not in first], reductionLevel=1)
        elif reductionLevel == 1:
            xAns = 0
            for i in range(len(data)):
                if i < a:
                    xAns += matrix[data[i]][0]
                elif i < a + b:
                    xAns += matrix[data[i]][1]
                else:
                    xAns += matrix[data[i]][2]
            return xAns
    oneData = 0
    for i in range(offset, len(passedData) - depth + 1):
        data.append(passedData[i])
        oneData = max(oneData, genCombination(i + 1, depth - 1, passedData, reductionLevel))
        del data[-1]
    return oneData
passedData = [i for i in range(count)]
finalOutput = genCombination(0,a,passedData)
print(finalOutput)
I know this is not the right way to do it. How can I optimize it?
This is a solution using Linear Programming through pulp (https://pypi.org/project/PuLP) that gives me the optimal solution:
Maximum energy level: 758.0
Mapping of stores per foodtype: {1: [9, 2, 4], 0: [3, 8, 0, 6, 7], 2: [1, 5]}
I think the performance should be better than a hand-coded exhaustive solver.
from collections import defaultdict
import pulp
# data
nStores = 10
a, b, c = max_stores = 5, 3, 2
matrix = [
[56, 44, 41],
[56, 84, 45],
[40, 98, 49],
[91, 59, 73],
[69, 94, 42],
[81, 64, 80],
[55, 76, 26],
[63, 24, 22],
[81, 60, 44],
[52, 95, 11]
]
# create an LP problem
lp = pulp.LpProblem("maximize energy", sense=pulp.LpMaximize)
# create the list of indices for the variables
# the variables are binary variables for each combination of store and food_type
# the variable alpha[(store, food_type)] = 1 if the food_type is taken from the store
index = {(store, food_type) for store in range(nStores) for food_type in range(3)}
alpha = pulp.LpVariable.dicts("alpha", index, lowBound=0, cat="Binary")
# add the constraint on max stores
for food_type, n_store_food_type in enumerate(max_stores):
    lp += sum(alpha[(store, food_type)] for store in range(nStores)) <= n_store_food_type
# only one food type can be taken per store
for store in range(nStores):
    lp += sum(alpha[(store, food_type)] for food_type in range(3)) <= 1
# add the objective to maximise
lp += sum(alpha[(store, food_type)] * matrix[store][food_type] for store, food_type in index)
# solve the problem
lp.solve()
# collect the results
stores_for_foodtype = defaultdict(list)
for (store, food_type) in index:
    # check if the variable is active
    if alpha[(store, food_type)].varValue:
        stores_for_foodtype[food_type].append(store)
print(f"Maximum energy level: {lp.objective.value()}")
print(f"Mapping of stores per foodtype: {dict(stores_for_foodtype)}")
It looks like a modification to knapsack would solve it.
Let's define our dp table as a 4-dimensional array dp[N+1][A+1][B+1][C+1].
A cell dp[n][a][b][c] means that we have considered n shops and, out of them, picked a shops for meat,
b shops for cake and c shops for pizza; the cell stores the maximum energy we can have.
Transitions are easy too; from some state dp[n][a][b][c] we can move to:
dp[n+1][a][b][c] if we skip the (n+1)-th shop
dp[n+1][a+1][b][c] if we buy meat from shop n+1
dp[n+1][a][b+1][c] if we buy cake from shop n+1
dp[n+1][a][b][c+1] if we buy pizza from shop n+1
All that's left is to fill the dp table. Sample code:
N = 10
A,B,C = 5,3,2
energy = [
[56, 44, 41],
[56, 84, 45],
[40, 98, 49],
[91, 59, 73],
[69, 94, 42],
[81, 64, 80],
[55, 76, 26],
[63, 24, 22],
[81, 60, 44],
[52, 95, 11]
]
dp = {}
for n in range(N+1):
    for a in range(A+1):
        for b in range(B+1):
            for c in range(C+1):
                dp[n,a,b,c] = 0

answer = 0
for n in range(N+1):
    for a in range(A+1):
        for b in range(B+1):
            for c in range(C+1):
                # Case 1: skip the (n+1)-th shop
                if (n+1,a,b,c) in dp:
                    dp[n+1,a,b,c] = max(dp[n+1,a,b,c], dp[n,a,b,c])
                # Case 2: buy meat from the (n+1)-th shop
                if (n+1,a+1,b,c) in dp:
                    dp[n+1,a+1,b,c] = max(dp[n+1,a+1,b,c], dp[n,a,b,c] + energy[n][0])
                # Case 3: buy cake from the (n+1)-th shop
                if (n+1,a,b+1,c) in dp:
                    dp[n+1,a,b+1,c] = max(dp[n+1,a,b+1,c], dp[n,a,b,c] + energy[n][1])
                # Case 4: buy pizza from the (n+1)-th shop
                if (n+1,a,b,c+1) in dp:
                    dp[n+1,a,b,c+1] = max(dp[n+1,a,b,c+1], dp[n,a,b,c] + energy[n][2])
                answer = max(answer, dp[n,a,b,c])
print(answer)