How can I reduce time complexity on this algorithm? - python

I have this exercise and the goal is to solve it with complexity less than O(n^2).
You have an array of length N filled with event probabilities. Create another array in which element i holds the probability that all events up to position i happen.
I have coded this O(n^2) solution. Any ideas how to improve it?
probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = list()
for i in range(len(probabilityTable)):
    finalTable.append(1)
    for j in range(i):
        finalTable[i] *= probabilityTable[j]
for item in finalTable:
    print(item)

probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = probabilityTable.copy()
for i in range(1, len(probabilityTable)):
    finalTable[i] = finalTable[i] * finalTable[i - 1]
for item in finalTable:
    print(item)

new_probs = [probabilityTable[0]]
for prob in probabilityTable[1:]:
    # running product of the probabilities seen so far
    new_probs.append(new_probs[-1] * prob)
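
The running product is also available in the standard library. A minimal sketch, assuming the events are independent: `itertools.accumulate` with `operator.mul` gives the inclusive prefix products in O(n), and passing `initial=1` (Python 3.8+) and dropping the last value reproduces the exclusive products that the question's nested loop computes. The names `inclusive` and `exclusive` are illustrative only.

```python
from itertools import accumulate
from operator import mul

probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]

# Inclusive: element i is the product of probabilityTable[0..i].
inclusive = list(accumulate(probabilityTable, mul))

# Exclusive: element i is the product of probabilityTable[0..i-1],
# matching the question's nested loop (the first element is 1).
exclusive = list(accumulate(probabilityTable, mul, initial=1))[:-1]

for item in exclusive:
    print(item)
```

Which of the two variants is wanted depends on whether "until position i" is meant to include event i itself.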

Related

np.arange producing elements with many decimals

I have the following loop.
x_array = []
for x in np.arange(0.01, 0.1, 0.01):
    x_array.append(x)
Why do some of the elements in x_array have so many decimals?
[0.01,
0.02,
0.03,
0.04,
0.05,
0.060000000000000005,
0.06999999999999999,
0.08,
0.09]
If you want your list of numbers without "additional" digits in the
fractional part, try the following code:
x_array = np.arange(0.01, 0.1, 0.01).round(2).tolist()
As you see, you don't even need any explicit loop.
The result is just what you want, i.e.:
[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
Another choice is:
x_array = (np.arange(1, 10) / 100).tolist()
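
The extra digits come from binary floating point: most decimal fractions, such as 0.01, have no exact binary representation, so values computed from them can land slightly off the decimal you expect. A small sketch in plain Python (no NumPy needed) that shows the effect and mirrors the two remedies above, rounding and building the values from integers:

```python
from decimal import Decimal

# 0.01 is stored as the nearest binary double, which is not exactly 1/100.
print(Decimal(0.01))

# Classic symptom of the same effect.
print(0.1 + 0.2 == 0.3)  # False

# Remedy 1: round each computed value to two decimals.
rounded = [round(0.01 + i * 0.01, 2) for i in range(9)]

# Remedy 2: start from integers and divide once.
from_ints = [i / 100 for i in range(1, 10)]

print(rounded == from_ints)
```

Rounding only changes which double is stored; the values are still binary floats, just the ones closest to the two-decimal numbers.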

Two different behaviours for .tolist(): IndexError

I am looping over the dataframe df1 to look up each maximum order, and then I want to take the matching discount_first value for that order.
For one dataset everything works:
new_rate_1 = []
for value in df1["maximum_order"]:
    new_val = df[df["New_Order_Lines"] == value]["discount_first"]
    new_val = new_val.tolist()[0]
    new_rate_1.append(new_val)
new_rate_1
[-1.3,
-1.3,
0.35,
0.8,
0.75,
0.55,
0.8,
0.85,
0.4,
0.75,
0.85,
0.85,
0.55,
0.45,
0.8,
0.65,
0.55,
0.85,
0.35,
0.85,
0.9,
0.5,
0.55,
-0.6,
0.85,
0.75,
0.35,
0.15,
0.55,
0.7,
0.8,
0.85,
0.75,
0.65,
0.75,
0.75,
0.35,
0.85,
0.4,
...
....
]
For another dataset I start getting this error:
IndexError: list index out of range
If I don't index the list within the loop, I don't get the error and the output looks like this:
[[0.8],
[0.8],
[0.55],
[0.55],
[0.55],
[0.85],
[0.55],
[0.85],
[0.85],
[0.65],
[0.65],
[0.75],
[0.7]
.....
Any suggestions or advice on how I can get rid of this behaviour?
Thanks in advance
How about using this
# new_val = new_val.tolist()[0]
new_val = new_val.values.flatten()[0]
Why loop at all when you can do it without a loop?
You can use the isin() and tolist() methods:
new_rate_1 = df.loc[df["New_Order_Lines"].isin(df1["maximum_order"]), "discount_first"].tolist()
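
The `IndexError` means that for at least one `value` the filter matched no rows, so `tolist()` returned an empty list and `[0]` had nothing to pick; indexing `.values.flatten()[0]` fails the same way on an empty result. A minimal sketch of a guard, using a hypothetical helper `first_or_default` (not a pandas function) that returns the first element of any iterable, or a fallback when it is empty:

```python
def first_or_default(values, default=None):
    """Return the first item of `values`, or `default` if it is empty."""
    return next(iter(values), default)


# Stand-ins for the lists that new_val.tolist() would return.
print(first_or_default([0.8, 0.55]))  # 0.8
print(first_or_default([]))           # None
```

Inside the loop this would replace `new_val.tolist()[0]` with `first_or_default(new_val.tolist())`; the `None` entries then mark the orders that have no match in `df`. Note also that the `isin` one-liner returns its matches in the row order of `df`, so its result does not necessarily line up one-to-one with the rows of `df1`.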

Python: Optimize weights in portfolio

I have the following dataframe with weights:
df = pd.DataFrame({'a': [0.1, 0.5, 0.1, 0.3], 'b': [0.2, 0.4, 0.2, 0.2], 'c': [0.3, 0.2, 0.4, 0.1],
'd': [0.1, 0.1, 0.1, 0.7], 'e': [0.2, 0.1, 0.3, 0.4], 'f': [0.7, 0.1, 0.1, 0.1]})
and then I normalize each row using:
df = df.div(df.sum(axis=1), axis=0)
I want to optimize the normalized weights of each row such that no weight is less than 0 or greater than 0.4.
If the weight is greater than 0.4, it will be clipped to 0.4 and the additional weight will be distributed to the other entries in a pro-rata fashion (meaning the second largest weight will receive more weight so it gets close to 0.4, and if there is any remaining weight, it will be distributed to the third and so on).
Can this be done using the "optimize" function?
Thank you.
UPDATE: I would also like to set a minimum bound for the weights. In my original question, the minimum weight bound was implicitly zero; however, I would like to set a constraint such that the minimum weight is at least 0.05, for example.
Unfortunately, I can only find a loop solution to this problem. When you trim off the excess weight and redistribute it proportionally, values that were under the limit may go over it. Then they have to be trimmed as well, and the cycle keeps repeating until no value is overweight. The same goes for underweight values.
import numpy as np
import pandas as pd

# The original data frame. No normalization yet
df = pd.DataFrame(
    {
        "a": [0.1, 0.5, 0.1, 0.3],
        "b": [0.2, 0.4, 0.2, 0.2],
        "c": [0.3, 0.2, 0.4, 0.1],
        "d": [0.1, 0.1, 0.1, 0.7],
        "e": [0.2, 0.1, 0.3, 0.4],
        "f": [0.7, 0.1, 0.1, 0.1],
    }
)

def ensure_min_weight(row: np.ndarray, min_weight: float):
    # Modifies row in place: lift underweight entries to min_weight and
    # take the missing weight pro rata from the remaining entries
    while True:
        underweight = row < min_weight
        if not underweight.any():
            break
        missing_weight = min_weight * underweight.sum() - row[underweight].sum()
        row[~underweight] -= missing_weight / row[~underweight].sum() * row[~underweight]
        row[underweight] = min_weight

def ensure_max_weight(row: np.ndarray, max_weight: float):
    # Modifies row in place: clip overweight entries to max_weight and
    # hand the excess weight pro rata to the remaining entries
    while True:
        overweight = row > max_weight
        if not overweight.any():
            break
        excess_weight = row[overweight].sum() - (max_weight * overweight.sum())
        row[~overweight] += excess_weight / row[~overweight].sum() * row[~overweight]
        row[overweight] = max_weight

values = df.to_numpy()
normalized = values / values.sum(axis=1)[:, None]
min_weight = 0.15  # just for fun
max_weight = 0.4
for i in range(len(values)):
    row = normalized[i]  # a view, so the in-place updates change normalized
    ensure_min_weight(row, min_weight)
    ensure_max_weight(row, max_weight)

# Normalized weight
assert np.isclose(normalized.sum(axis=1), 1).all(), "Normalized weight must sum up to 1"
assert ((min_weight <= normalized) & (normalized <= max_weight)).all(), f"Normalized weight must be between {min_weight} and {max_weight}"
print(pd.DataFrame(normalized, columns=df.columns))

# Raw values
# values = normalized * values.sum(axis=1)[:, None]
# print(pd.DataFrame(values, columns=df.columns))
Note that this algorithm will run into an infinite loop if your min_weight and max_weight are infeasible: try min_weight = 0.4 and max_weight = 0.5. You should handle such cases in the two ensure functions.
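
One possible way to catch such inputs before looping is sketched below. The function name `check_weight_bounds` is hypothetical, and the checks are only the obvious necessary conditions for a row of weights that sums to 1 (they are not claimed to be a full termination proof for the loops above):

```python
def check_weight_bounds(n_assets: int, min_weight: float, max_weight: float) -> None:
    """Raise ValueError if n_assets weights summing to 1 cannot fit the bounds."""
    if min_weight > max_weight:
        raise ValueError("min_weight must not exceed max_weight")
    if min_weight * n_assets > 1:
        raise ValueError("min_weight is too large: the weights would sum to more than 1")
    if max_weight * n_assets < 1:
        raise ValueError("max_weight is too small: the weights cannot reach a sum of 1")


check_weight_bounds(6, 0.15, 0.4)      # feasible, passes silently
try:
    check_weight_bounds(6, 0.4, 0.5)   # the infeasible pair mentioned in the note
except ValueError as err:
    print(err)
```

With six columns, a minimum of 0.4 per entry would require a total of at least 2.4, which is why that pair can never be satisfied.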

IndexError on 3-dimensional arrays

Noob question, but I can't seem to figure out why this is throwing an error: IndexError: index 4 is out of bounds for axis 2 with size 4
import numpy as np
numP = 4
P = np.zeros((3,3,numP))
P[:,:,1] = np.array([[0.50, 0.25, 0.25],
                     [0.20, 0.55, 0.25],
                     [0.20, 0.30, 0.50]])
P[:,:,2] = np.array([[0.70, 0.20, 0.10],
                     [0.05, 0.75, 0.20],
                     [0.10, 0.20, 0.70]])
P[:,:,3] = np.array([[0.45, 0.35, 0.20],
                     [0.20, 0.65, 0.15],
                     [0.00, 0.30, 0.70]])
P[:,:,4] = np.array([[0.60, 0.20, 0.20],
                     [0.20, 0.60, 0.20],
                     [0.05, 0.05, 0.90]])
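Python is 0-indexed (as in list[0] refers to the first element in the list, list[1] refers to the second element, etc.), so an axis of size 4 only has the indices 0 to 3.
The four assignments should therefore use P[:,:,0] through P[:,:,3], so the last assignment should be P[:,:,3].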
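
A minimal sketch of the zero-based version, with placeholder matrices standing in for the question's data (the names `matrices` and `P_stacked` are illustrative). `enumerate` generates the valid indices 0 to 3, and `np.stack` builds the same array without any manual index:

```python
import numpy as np

num_p = 4
# Placeholder data: four 3x3 matrices filled with 0.0, 1.0, 2.0, 3.0.
matrices = [np.full((3, 3), float(k)) for k in range(num_p)]

P = np.zeros((3, 3, num_p))
for k, m in enumerate(matrices):  # k runs over 0, 1, 2, 3
    P[:, :, k] = m

# Equivalent construction: stack the matrices along a new last axis.
P_stacked = np.stack(matrices, axis=2)

print(P.shape)                       # (3, 3, 4)
print(np.array_equal(P, P_stacked))  # True
```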

How to get elements from a specific range out of a list?

Does anybody have an idea how to get the elements in a list whose values fall within a specific (from - to) range?
I need a loop to check if a list contains elements in a specific range, and if there are any, I need the biggest one to be saved in a variable..
Example:
list = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
# range (0.5 - 0.58)
# biggest = 0.56
You could use a filtered comprehension to get only those elements in the range you want, then find the biggest of them using the built-in max():
lst = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
biggest = max([e for e in lst if 0.5 < e < 0.58])
# biggest = 0.56
As an alternative to other answers, you can also use filter and lambda:
lst = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
biggest = max([i for i in filter(lambda x: 0.5 < x < 0.58, lst)])
I suppose a normal if check would be faster, but I'll give this just for completeness.
Also, you should not use list = ... as a variable name, since list is a built-in in Python and the assignment shadows it.
You could also go about it a step at a time, as the approach may aid in debugging.
I used numpy in this case, which is also a helpful tool to put in your tool belt.
This should run as is:
import numpy as np
l = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
a = np.array(l)
low = 0.5
high = 0.58
index_low = (a < high)
print(index_low)
a_low = a[index_low]
print(a_low)
index_in_range = (a_low >= low)
print(index_in_range)
a_in_range = a_low[index_in_range]
print(a_in_range)
a_max = a_in_range.max()
print(a_max)
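
The question also asks for a check on whether anything falls in the range at all. `max()` raises `ValueError` on an empty sequence, so a sketch that covers the "no match" case can use its `default` argument. The inclusive bounds below are an assumption; the list-comprehension answers above use strict `<` comparisons.

```python
lst = [0.5, 0.56, 0.34, 0.45, 0.53, 0.6]
low, high = 0.5, 0.58

# Generator expression: no intermediate list; default covers "no match".
biggest = max((e for e in lst if low <= e <= high), default=None)
print(biggest)  # 0.56

# A range that contains none of the elements yields the default.
nothing = max((e for e in lst if 0.7 <= e <= 0.8), default=None)
print(nothing)  # None
```

Testing `biggest is None` afterwards tells you whether the list had any element in the range.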
