I have the following dataframe with weights:
df = pd.DataFrame({'a': [0.1, 0.5, 0.1, 0.3], 'b': [0.2, 0.4, 0.2, 0.2], 'c': [0.3, 0.2, 0.4, 0.1],
'd': [0.1, 0.1, 0.1, 0.7], 'e': [0.2, 0.1, 0.3, 0.4], 'f': [0.7, 0.1, 0.1, 0.1]})
and then I normalize each row using:
df = df.div(df.sum(axis=1), axis=0)
I want to optimize the normalized weights of each row such that no weight is less than 0 or greater than 0.4.
If the weight is greater than 0.4, it will be clipped to 0.4 and the additional weight will be distributed to the other entries in a pro-rata fashion (meaning the second largest weight will receive more weight so it gets close to 0.4, and if there is any remaining weight, it will be distributed to the third and so on).
Can this be done using the "optimize" function?
Thank you.
UPDATE: I would also like to set a minimum bound for the weights. In my original question, the minimum weight bound was automatically considered as zero. However, I would like to set a constraint such that the minimum weight is at least 0.05, for example.
Unfortunately, I can only find a loop solution to this problem. When you trim off the excess weight and redistribute it proportionally, weights that were under the limit may go over it. Then they have to be trimmed as well, and the cycle keeps repeating until no value is overweight. The same goes for underweight values.
import numpy as np
import pandas as pd

# The original data frame. No normalization yet
df = pd.DataFrame(
    {
        "a": [0.1, 0.5, 0.1, 0.3],
        "b": [0.2, 0.4, 0.2, 0.2],
        "c": [0.3, 0.2, 0.4, 0.1],
        "d": [0.1, 0.1, 0.1, 0.7],
        "e": [0.2, 0.1, 0.3, 0.4],
        "f": [0.7, 0.1, 0.1, 0.1],
    }
)

def ensure_min_weight(row: np.ndarray, min_weight: float):
    # Modifies row in place: raise underweight entries to min_weight and
    # take the missing weight proportionally from the other entries
    while True:
        underweight = row < min_weight
        if not underweight.any():
            break
        missing_weight = min_weight * underweight.sum() - row[underweight].sum()
        row[~underweight] -= missing_weight / row[~underweight].sum() * row[~underweight]
        row[underweight] = min_weight

def ensure_max_weight(row: np.ndarray, max_weight: float):
    # Modifies row in place: clip overweight entries to max_weight and
    # give the excess weight proportionally to the other entries
    while True:
        overweight = row > max_weight
        if not overweight.any():
            break
        excess_weight = row[overweight].sum() - (max_weight * overweight.sum())
        row[~overweight] += excess_weight / row[~overweight].sum() * row[~overweight]
        row[overweight] = max_weight

values = df.to_numpy()
normalized = values / values.sum(axis=1)[:, None]
min_weight = 0.15  # just for fun
max_weight = 0.4
for i in range(len(values)):
    row = normalized[i]  # a view, so the in-place changes land in normalized
    ensure_min_weight(row, min_weight)
    ensure_max_weight(row, max_weight)

# Normalized weight
assert np.isclose(normalized.sum(axis=1), 1).all(), "Normalized weight must sum up to 1"
assert ((min_weight <= normalized) & (normalized <= max_weight)).all(), (
    f"Normalized weight must be between {min_weight} and {max_weight}"
)
print(pd.DataFrame(normalized, columns=df.columns))

# Raw values
# values = normalized * values.sum(axis=1)[:, None]
# print(pd.DataFrame(values, columns=df.columns))
Note that this algorithm will run into an infinite loop if your min_weight and max_weight are illogical: try min_weight = 0.4 and max_weight = 0.5. You should handle these errors in the two ensure functions.
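One way to catch such illogical bounds is to test them before looping. The sketch below is only an illustration: the helper name check_weight_bounds is made up here, and it assumes every row has the same number of weights. It rejects bounds that no row summing to 1 could ever satisfy; it does not prove that the loops converge for every remaining input.

```python
def check_weight_bounds(n_weights: int, min_weight: float, max_weight: float) -> None:
    """Raise ValueError if no row of n_weights values summing to 1 can fit the bounds.

    Hypothetical helper, not part of the answer above.
    """
    if not 0 <= min_weight <= max_weight:
        raise ValueError("need 0 <= min_weight <= max_weight")
    # Even with every weight at the minimum, the row must not exceed 1.
    if min_weight * n_weights > 1:
        raise ValueError(f"min_weight={min_weight} is too large for {n_weights} weights")
    # Even with every weight at the maximum, the row must be able to reach 1.
    if max_weight * n_weights < 1:
        raise ValueError(f"max_weight={max_weight} is too small for {n_weights} weights")


check_weight_bounds(6, 0.15, 0.4)  # passes silently
try:
    check_weight_bounds(6, 0.4, 0.5)  # six weights of at least 0.4 cannot sum to 1
except ValueError as err:
    print(err)
```

Calling it once before the for loop, with the number of columns as n_weights, would turn the infinite loop into an immediate error.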
Related
I have a list of weights which all have a value range between 0.0 and 1.0. The sum of the values in list should be always 1.0.
Now I would like to write a function in which I can change one weight from the list by a certain value (positive or negative). The remaining weights of the list should be adjusted evenly, so that the list sums to 1.0 again at the end.
Example:
weights = [0.5, 0.2, 0.2, 0.1]
If I increase the second entry of the list by 0.3, the resulting list should look like this:
weights = [0.4, 0.5, 0.1, 0.0]
I've tried with the following function:
def change_weight(weights, index, value):
    result = []
    weight_to_change = weights[index] + value
    weights.pop(index)
    for i, weight in enumerate(weights):
        if i == index:
            result.append(weight_to_change)
        result.append(weight - value/len(weights))
    return result
This works perfectly for the example above:
weights = [0.5, 0.2, 0.2, 0.1]
print(change_weight(weights, 1, 0.3))
# like expected: [0.4, 0.5, 0.1, 0.0]
However, if I want to change the second weight by 0.5, the last element of the list gets a negative value:
weights = [0.5, 0.2, 0.2, 0.1]
print(change_weight(weights, 1, 0.5))
results in [0.33, 0.7, 0.03, -0.07]
However, I do not want any negative values in the list. Such values should instead be set to 0.0 and the remainder added or subtracted evenly to the other values.
Does anyone have an idea how I can implement this?
Here is an implementation of the idea of @RemiCuingnet:
def change_weight(weights, index, value):
    new_weight = weights[index] + value
    old_sum = sum(w for i,w in enumerate(weights) if i != index)
    new_weights = []
    for i,w in enumerate(weights):
        if i == index:
            new_weights.append(new_weight)
        else:
            new_weights.append(w*(1-new_weight)/old_sum)
    return new_weights
For example
print(change_weight([0.5, 0.2, 0.2, 0.1],1,.3))
print(change_weight([0.5, 0.2, 0.2, 0.1],1,.5))
Output:
[0.3125, 0.5, 0.12500000000000003, 0.06250000000000001]
[0.18750000000000006, 0.7, 0.07500000000000002, 0.03750000000000001]
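The implementation above rescales the other weights proportionally, so it does not reproduce the [0.4, 0.5, 0.1, 0.0] from the question. If the even redistribution described in the question is what is wanted, one possible approach is to spread the change equally, clamp anything that drops below zero, and spread the leftover over the entries that are still positive. A minimal sketch: the name change_weight_evenly is made up here, and it assumes the change can be absorbed (there is no guard against the changed weight itself leaving the 0.0 to 1.0 range).

```python
def change_weight_evenly(weights, index, value):
    """Shift weights[index] by value and spread -value evenly over the rest.

    Entries that would drop below 0.0 are clamped to 0.0 and the part they
    could not absorb is spread evenly over the entries that are still active.
    """
    result = list(weights)  # work on a copy, leave the input untouched
    result[index] += value
    remaining = -value  # amount the other entries still have to absorb
    active = [i for i in range(len(result)) if i != index]
    while active and abs(remaining) > 1e-12:
        share = remaining / len(active)
        remaining = 0.0
        still_active = []
        for i in active:
            new = result[i] + share
            if new < 0:
                remaining += new  # carry over what this entry could not absorb
                result[i] = 0.0
            else:
                result[i] = new
                still_active.append(i)
        active = still_active
    return result


print(change_weight_evenly([0.5, 0.2, 0.2, 0.1], 1, 0.3))
print(change_weight_evenly([0.5, 0.2, 0.2, 0.1], 1, 0.5))
```

Up to floating-point noise, the first call gives the [0.4, 0.5, 0.1, 0.0] expected in the question. In the second call the last entry is clamped to 0.0 and the remainder is taken from the first and third entries.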
I have this exercise and the goal is to solve it with complexity less than O(n^2).
You have an array of length N filled with event probabilities. Create another array in which, for each element i, you calculate the probability that all events up to position i happen.
I have coded this O(n^2) solution. Any ideas how to improve it?
probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = list()
for i in range(len(probabilityTable)):
    finalTable.append(1)
    for j in range(i):
        finalTable[i] *= probabilityTable[j]
for item in finalTable:
    print(item)
probabilityTable = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]
finalTable = probabilityTable.copy()
for i in range(1, len(probabilityTable)):
    finalTable[i] = finalTable[i] * finalTable[i - 1]
for item in finalTable:
    print(item)
new_probs = [probabilityTable[0]]
for prob in probabilityTable[1:]:
    # multiply, not add: the probability that all events so far happen is a product
    new_probs.append(new_probs[-1] * prob)
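The same O(n) running product can also be written with the standard library, as a sketch of the idea rather than the only way to do it. Note that the O(n^2) loop in the question multiplies only the elements before position i (so its first entry is 1), while the running product includes element i; the variable names inclusive and exclusive below are made up to show both conventions.

```python
from itertools import accumulate
from operator import mul

probability_table = [0.1, 0.54, 0.34, 0.11, 0.55, 0.75, 0.01, 0.06, 0.96]

# inclusive[i] is the product of probability_table[0..i]
inclusive = list(accumulate(probability_table, mul))

# exclusive[i] is the product of probability_table[0..i-1], as in the question's loop
exclusive = [1.0] + inclusive[:-1]

for item in inclusive:
    print(item)
```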
I have this exercise where I get to build a simple neural network with one input layer and one hidden layer... I made the code below to perform a simple matrix multiplication, but it does not give the same result as when I do the multiplication by hand. What am I doing wrong in my code?
#toes %win #fans
ih_wgt = ([0.1, 0.2, -0.1], #hid[0]
[-0.1, 0.1, 0.9], #hid[1]
[0.1, 0.4, 0.1]) #hid[2]
#hid[0] hid[1] #hid[2]
ho_wgt = ([0.3, 1.1, -0.3], #hurt?
[0.1, 0.2, 0.0], #win?
[0.0, 1.3, 0.1]) #sad?
weights = [ih_wgt, ho_wgt]
def w_sum(a,b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output
def vect_mat_mul(vec, mat):
    assert(len(vec) == len(mat))
    output = [0, 0, 0]
    for i in range(len(vec)):
        output[i]= w_sum(vec, mat[i])
        return output
def neural_network(input, weights):
    hid = vect_mat_mul(input, weights[0])
    pred = vect_mat_mul(hid, weights[1])
    return pred
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]
input = [toes[0],wlrec[0],nfans[0]]
pred = neural_network(input, weights)
print(pred)
the output of my code is:
[0.258, 0, 0]
The way I attempted to solve it by hand is as follows:
I multiplied the input vector [8.5, 0.65, 1.2] with the input weight matrix
ih_wgt = ([0.1, 0.2, -0.1], #hid[0]
[-0.1, 0.1, 0.9], #hid[1]
[0.1, 0.4, 0.1]) #hid[2]
[0.86, 0.295, 1.23]
the output vector is then fed into the network as an input vector which is then multiplied by the hidden weight matrix
ho_wgt = ([0.3, 1.1, -0.3], #hurt?
[0.1, 0.2, 0.0], #win?
[0.0, 1.3, 0.1]) #sad?
the correct output prediction:
[0.2135, 0.145, 0.5065]
Your help would be much appreciated!
You're almost there! The only problem is indentation: the return statement was inside the for loop, so the function returned after the first iteration.
def vect_mat_mul(vec, mat):
    assert(len(vec) == len(mat))
    output = [0, 0, 0]
    for i in range(len(vec)):
        output[i]= w_sum(vec, mat[i])
    return output # <-- This one was inside the for loop
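To check the fix against the hand calculation from the question, here is a compact, self-contained variant. It is a sketch that swaps the index loops for sum, zip and a list comprehension, which also removes the hard-coded [0, 0, 0]; the weights and inputs are copied from the question.

```python
def w_sum(a, b):
    """Weighted sum (dot product) of two equally long sequences."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))


def vect_mat_mul(vec, mat):
    """Multiply a vector by a matrix given as a sequence of rows."""
    assert len(vec) == len(mat)
    return [w_sum(vec, row) for row in mat]


ih_wgt = ([0.1, 0.2, -0.1],   # hid[0]
          [-0.1, 0.1, 0.9],   # hid[1]
          [0.1, 0.4, 0.1])    # hid[2]
ho_wgt = ([0.3, 1.1, -0.3],   # hurt?
          [0.1, 0.2, 0.0],    # win?
          [0.0, 1.3, 0.1])    # sad?

hid = vect_mat_mul([8.5, 0.65, 1.2], ih_wgt)
pred = vect_mat_mul(hid, ho_wgt)
print([round(h, 4) for h in hid])
print([round(p, 4) for p in pred])
```

After rounding, both printed vectors should agree with the hand-computed [0.86, 0.295, 1.23] and [0.2135, 0.145, 0.5065].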
I have a list with a series of random floats that go from negative to positive, like:
values = [0.001, 0.05, 0.09, 0.1, 0.4, 0.8, 0.9, 0.95, 0.99]
I wish to filter out the indices that first meet the greater than/less than values that I wish. For example, if I want the first closest value less than 0.1 I would get an index of 2 and if I want the first highest value greater than 0.9 I'd get 7.
I have a find_nearest method that I am using but since this dataset is randomized, this is not ideal.
EDIT: Figured out a solution.
# reversed() counts from the end of the list, so convert back to an index into values
low = len(values) - 1 - next(i for i, v in enumerate(reversed(values)) if v < 0.1)
high = next(i for i, v in enumerate(values) if v > 0.9)
If the values list gets long, you may want the bisect module from the standard library. It requires the list to be sorted, as it is here.
bisect_left and bisect_right can serve as the > and < tests.
import bisect
values = [0.001, 0.05, 0.09, 0.1, 0.4, 0.8, 0.9, 0.95, 0.99]
bisect.bisect_left(values, .1)   # 3
bisect.bisect_right(values, .1)  # 4
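To turn those insertion points into the indices asked for in the question, one option is to wrap them in two small helpers. This is a sketch that assumes the list is sorted in ascending order; the helper names are made up for illustration.

```python
import bisect

values = [0.001, 0.05, 0.09, 0.1, 0.4, 0.8, 0.9, 0.95, 0.99]


def last_index_below(sorted_values, threshold):
    """Index of the largest value strictly less than threshold, or None."""
    i = bisect.bisect_left(sorted_values, threshold) - 1
    return i if i >= 0 else None


def first_index_above(sorted_values, threshold):
    """Index of the smallest value strictly greater than threshold, or None."""
    i = bisect.bisect_right(sorted_values, threshold)
    return i if i < len(sorted_values) else None


print(last_index_below(values, 0.1))   # 2
print(first_index_above(values, 0.9))  # 7
```

Each lookup is O(log n), compared with the O(n) scan of the next(...) approach.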
I have periodic data with the index being a floating point number like so:
time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9,-1.2, 0.95, -1.1, 1.11]
df = DataFrame(data=voltage, index=time, columns=['voltage'])
df.plot(marker='o')
I want to create a cross(df, y_val, direction='rise' | 'fall' | 'cross') function that returns an array of times (index values) with all the interpolated points where the voltage equals y_val. For 'rise' only the points where the slope is positive are returned; for 'fall' only the points with a negative slope are returned; for 'cross' both are returned. So if y_val=0 and direction='cross', an array of 10 values would be returned with the X values of the crossing points (the first one being about 0.05).
I was thinking this could be done with an iterator but was wondering if there was a better way to do this.
Thanks. I'm loving Pandas and the Pandas community.
To do this I ended up with the following. It is a vectorized version which is 150x faster than one that uses a loop.
import numpy as np

def cross(series, cross=0, direction='cross'):
    """
    Given a Series returns all the index values where the data values equal
    the 'cross' value.
    Direction can be 'rising' (for rising edge), 'falling' (for only falling
    edge), or 'cross' for both edges
    """
    # Find if values are above or below the crossing value:
    above = series.values > cross
    below = np.logical_not(above)
    left_shifted_above = above[1:]
    left_shifted_below = below[1:]
    # Find indexes on left side of crossing point
    if direction == 'rising':
        idxs = (left_shifted_above & below[0:-1]).nonzero()[0]
    elif direction == 'falling':
        idxs = (left_shifted_below & above[0:-1]).nonzero()[0]
    else:
        rising = left_shifted_above & below[0:-1]
        falling = left_shifted_below & above[0:-1]
        idxs = (rising | falling).nonzero()[0]
    # Calculate x crossings with interpolation using formula for a line:
    x1 = series.index.values[idxs]
    x2 = series.index.values[idxs+1]
    y1 = series.values[idxs]
    y2 = series.values[idxs+1]
    x_crossings = (cross-y1)*(x2-x1)/(y2-y1) + x1
    return x_crossings
# Test it out:
import matplotlib.pyplot as plt
from pandas import DataFrame

time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9,-1.2, 0.95, -1.1, 1.11]
df = DataFrame(data=voltage, index=time, columns=['voltage'])
x_crossings = cross(df['voltage'])
y_crossings = np.zeros(x_crossings.shape)
plt.plot(time, voltage, '-ob', x_crossings, y_crossings, 'or')
plt.grid(True)
plt.show()
It was quite satisfying when this worked. Any improvements that can be made?
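One thing that may help when changing the vectorized version is a slow but obvious reference to compare against. The plain-Python sketch below uses a made-up name, crossings_reference, and takes lists instead of a Series. It follows the same convention as the function above: a value exactly equal to the crossing level counts as below it.

```python
def crossings_reference(times, values, level=0.0, direction="cross"):
    """Loop-based reference: interpolated x positions where values cross level."""
    result = []
    points = list(zip(times, values))
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        rising = y1 <= level < y2    # below (or equal), then above
        falling = y2 <= level < y1   # above, then below (or equal)
        wanted = (rising and direction in ("rising", "cross")) or (
            falling and direction in ("falling", "cross")
        )
        if wanted:
            # Linear interpolation between the two neighbouring samples
            result.append(x1 + (level - y1) * (x2 - x1) / (y2 - y1))
    return result


time = [0, 0.1, 0.21, 0.31, 0.40, 0.49, 0.51, 0.6, 0.71, 0.82, 0.93]
voltage = [1, -1, 1.1, -0.9, 1, -1, 0.9, -1.2, 0.95, -1.1, 1.11]

print(len(crossings_reference(time, voltage)))              # 10
print(len(crossings_reference(time, voltage, 0, "rising")))  # 5
```

Comparing its output with the vectorized result (for example with numpy.allclose) on a few random inputs is a cheap regression test.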