I am curious whether there is a more efficient way to compute this "rolling weighted sum" (I am unsure of the proper term, so I give an example below to clarify). I ask because my current code is surely not optimal with respect to memory usage, and its performance could likely be improved with NumPy's more advanced functions.
Example:
import numpy as np
A = np.append(np.linspace(0, 1, 10), np.linspace(1.1, 2, 30))
np.random.seed(0)
B = np.random.randint(3, size=40) + 1
# list of [(weight, (lower, upper))]
d = [(1, (-0.25, -0.20)), (0.5, (-0.20, -0.10)), (2, (-0.10, 0.15))]
In Python 3.7:
## A
array([0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ,
1.1 , 1.13103448, 1.16206897, 1.19310345, 1.22413793,
1.25517241, 1.2862069 , 1.31724138, 1.34827586, 1.37931034,
1.41034483, 1.44137931, 1.47241379, 1.50344828, 1.53448276,
1.56551724, 1.59655172, 1.62758621, 1.65862069, 1.68965517,
1.72068966, 1.75172414, 1.78275862, 1.8137931 , 1.84482759,
1.87586207, 1.90689655, 1.93793103, 1.96896552, 2. ])
## B
array([1, 2, 1, 2, 2, 3, 1, 3, 1, 1, 1, 3, 2, 3, 3, 1, 2, 2, 2, 2, 1, 2,
1, 1, 2, 3, 1, 3, 1, 2, 2, 3, 1, 2, 2, 2, 1, 3, 1, 3])
Expected Solution:
array([ 6. , 6.5, 8. , 10.5, 12. , 11. , 11.5, 11.5, 6.5, 13.5, 25. ,
27.5, 30.5, 34.5, 37.5, 36. , 35. , 35. , 34. , 34.5, 34. , 36.5,
33. , 34. , 34.5, 34.5, 36. , 39. , 37. , 36. , 37. , 36.5, 37.5,
39. , 36.5, 37.5, 34. , 31. , 27.5, 23. ])
The logic I want to translate into code:
Let's look at how 10.5 (the fourth element in the expected solution) is computed. d is a list of nested tuples: the first element of each tuple is a float weight, and the second is a tuple of bounds in the form (lower, upper).
We look at the fourth element of A (0.33333333) and apply bounds for each tuple in d. For the first tuple in d:
0.33333333 + (-0.25) = 0.08333333
0.33333333 + (-0.20) = 0.13333333
We go back to A to see if there are any elements between the bounds (0.08333333, 0.13333333). Because the second element of A (0.11111111) falls in this range, we pull the second element of B (2), multiply it by its weight from d (1), and add it to the fourth element of the expected output.
After iterating across all tuples in d, the fourth element of the expected output is computed as:
1 * 2 + 0.5 * 1 + 2 * (2 + 2) = 10.5
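The arithmetic above can be checked with a short brute-force sketch. It assumes the strict inequalities used in the attempted code, and B is copied from the listing above rather than regenerated; the names i and total are only for illustration.

```python
import numpy as np

A = np.append(np.linspace(0, 1, 10), np.linspace(1.1, 2, 30))
# B copied from the listing in the question
B = np.array([1, 2, 1, 2, 2, 3, 1, 3, 1, 1, 1, 3, 2, 3, 3, 1, 2, 2, 2, 2,
              1, 2, 1, 1, 2, 3, 1, 3, 1, 2, 2, 3, 1, 2, 2, 2, 1, 3, 1, 3])
# list of [(weight, (lower, upper))]
d = [(1, (-0.25, -0.20)), (0.5, (-0.20, -0.10)), (2, (-0.10, 0.15))]

i = 3  # fourth element of A (0.33333333)
total = 0.0
for weight, (lower, upper) in d:
    # elements of A strictly inside the shifted bounds
    mask = (A > A[i] + lower) & (A < A[i] + upper)
    total += weight * B[mask].sum()

print(total)  # 10.5
```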
Here is my attempted code:
D = np.zeros(len(A))
for v in d:
    weight, (_lower, _upper) = v
    lower, upper = A + _lower, A + _upper
    # row i of _A is a copy of A, compared against the bounds for A[i]
    _A = np.tile(A, (len(A), 1))
    __A = np.bitwise_and(_A > lower.reshape(-1, 1), _A < upper.reshape(-1, 1))
    D += weight * (__A @ B)
D
Hopefully this makes sense. Please feel free to ask clarifying questions. Thanks!
Since the intervals (-0.25, -0.20), (-0.20, -0.10) and (-0.10, 0.15) are consecutive subintervals that partition the interval (-0.25, 0.15), you can use np.searchsorted to find the indices at which the shifted bounds would be inserted into A to maintain order. Those indices specify the slices of B to sum. In short:
partition = np.array([-0.25, -0.20, -0.10, 0.15])
weights = np.array([1, 0.5, 2])
out = []
for n in A:
    idx = np.searchsorted(A, n + partition)
    results = np.add.reduceat(B[:idx[-1]], idx[:-1])
    out.append(np.dot(results, weights))
>>> print(out)
[7.5, 7.5, 8.0, 10.5, 12.0, 11.0, 11.5, 11.5, 6.5, 13.5, 27.5, 27.5, 31.5, 35.5, 37.5, 37.0, 36.0, 35.0, 34.0, 34.5, 34.0, 36.5, 33.0, 34.0, 34.5, 34.5, 36.0, 39.0, 37.0, 36.0, 37.0, 36.5, 37.5, 39.0, 36.5, 37.5, 34.0, 31.0, 27.5, 23.0]
Note that the results are wrong wherever a slice of B is empty: for an empty slice, np.add.reduceat returns the element at the start index instead of 0.
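One way to sidestep the empty-slice problem is to difference a prefix sum instead of calling reduceat. The following is only a sketch, not the answer's method: the name csum is introduced here for illustration, and B is copied from the question's listing. It treats each interval as half-open, [lower, upper), which is what np.searchsorted with its default side='left' yields, so it can differ from the strict-inequality version when an element of A lands exactly on a bound.

```python
import numpy as np

A = np.append(np.linspace(0, 1, 10), np.linspace(1.1, 2, 30))
# B copied from the listing in the question
B = np.array([1, 2, 1, 2, 2, 3, 1, 3, 1, 1, 1, 3, 2, 3, 3, 1, 2, 2, 2, 2,
              1, 2, 1, 1, 2, 3, 1, 3, 1, 2, 2, 3, 1, 2, 2, 2, 1, 3, 1, 3])
partition = np.array([-0.25, -0.20, -0.10, 0.15])
weights = np.array([1, 0.5, 2])

# Prefix sums with a leading zero, so csum[j] - csum[i] == B[i:j].sum(),
# which is 0 when the slice is empty (i == j).
csum = np.concatenate(([0], np.cumsum(B)))

# idx[i, k] is the insertion point of A[i] + partition[k] in the sorted A
idx = np.searchsorted(A, A[:, None] + partition)   # shape (40, 4)
sums = csum[idx[:, 1:]] - csum[idx[:, :-1]]         # shape (40, 3)
out = sums @ weights
```

This avoids both the Python loop and the N x N boolean matrix, at the cost of the half-open boundary convention described above.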
Credits to @mathfux for providing me enough guidance. Here's the final code solution that I developed based on conversations here:
partition = np.array([-0.25, -0.20, -0.10, 0.15])
weights = np.array([1, 0.5, 2])
idx = np.searchsorted(A, partition + A[:, None])
_idx = np.lib.stride_tricks.sliding_window_view(idx, 2, axis = 1)
values = np.apply_along_axis(lambda x: B[slice(*(x))].sum(), 2, _idx)
values @ weights
Related
I have a dataframe (df) with three columns (user, vector, and group); each row of the vector column holds a list of several values.
import pandas as pd
df = pd.DataFrame({'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'],
                   'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0], [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]],
                   'group': ['A', 'B', 'C', 'B', 'A', 'A']})
I would like to calculate, for each group, the element-wise sum of the vectors divided by the number of rows in that group (that is, the element-wise mean).
For example:
For group A: [(1+3+6)/3, (0+8+0)/3, (2+0+0)/3, (0+0+2)/3] = [3.3, 2.7, 0.7, 0.7].
For group B: [(1+5)/2, (8+0)/2, (0+2)/2, (2+2)/2] = [3, 4, 1, 2].
For group C: [6, 2, 0, 0].
So, the expected result is an array:
group A: [3.3, 2.7, 0.7, 0.7]
group B: [3, 4, 1, 2]
group C: [6, 2, 0, 0]
I'm not sure if you were looking for the results stored in a single array/dataframe, or if you're just looking to get the results as separate arrays.
If the latter, something like this should work for you:
for group in df.group.unique():
    print(f'Group {group} results: ')
    tmp_df = pd.DataFrame(df[df.group==group]['vector'].tolist())
    print(tmp_df.mean().values)
Output:
Group A results:
[3.33333333 2.66666667 0.66666667 0.66666667]
Group B results:
[3. 4. 1. 2.]
Group C results:
[6. 2. 0. 0.]
It's a little clunky, but it gets the job done if you're just looking to get the results.
It filters the dataframe by group, turns that group's vectors into their own tmp_df, and takes the mean of each column.
If you want, you could easily save those arrays for further manipulation.
Hope that helps!
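If the arrays are to be kept rather than printed, one possible sketch is to collect them in a dictionary keyed by group. The name group_means is made up for this example; everything else follows the loop above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user': ['user_1', 'user_2', 'user_3', 'user_4', 'user_5', 'user_6'],
    'vector': [[1, 0, 2, 0], [1, 8, 0, 2], [6, 2, 0, 0],
               [5, 0, 2, 2], [3, 8, 0, 0], [6, 0, 0, 2]],
    'group': ['A', 'B', 'C', 'B', 'A', 'A'],
})

# Map each group name to the column-wise mean of its vectors
group_means = {
    group: pd.DataFrame(sub['vector'].tolist()).mean().to_numpy()
    for group, sub in df.groupby('group')
}

print(group_means['B'])
```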
Take advantage of numpy:
import numpy as np
out = (df.groupby('group')['vector']
.agg(lambda x: np.vstack(x).mean(0).round(2))
)
print(out)
Output:
group
A [3.33, 2.67, 0.67, 0.67]
B [3.0, 4.0, 1.0, 2.0]
C [6.0, 2.0, 0.0, 0.0]
Name: vector, dtype: object
as DataFrame
out = (df.groupby('group', as_index=False)['vector']
.agg(lambda x: np.vstack(x).mean(0).round(2))
)
Output:
group vector
0 A [3.33, 2.67, 0.67, 0.67]
1 B [3.0, 4.0, 1.0, 2.0]
2 C [6.0, 2.0, 0.0, 0.0]
as array
out = np.vstack(df.groupby('group')['vector']
.agg(lambda x: np.vstack(x).mean(0).round(2))
)
Output:
[[3.33 2.67 0.67 0.67]
[3. 4. 1. 2. ]
[6. 2. 0. 0. ]]
I have data in the following format:
[('user_1', 2, 1.0),
('user_2', 6, 2.5),
('user_3', 9, 3.0),
('user_4', 1, 3.0)]
I want to use this information to create a NumPy array that has the value 1.0 in position 2, the value 2.5 in position 6, and so on. All positions not listed above should be zero, like this:
array([0, 3.0, 1.0, 0, 0, 0, 2.5, 0, 0, 3.0])
First reformat the data:
data = [
("user_1", 2, 1.0),
("user_2", 6, 2.5),
("user_3", 9, 3.0),
("user_4", 1, 3.0),
]
usernames, indices, values = zip(*data)
And then create the array:
import numpy as np
length = max(indices) + 1
arr = np.zeros(shape=(length,))
arr[list(indices)] = values
print(arr) # array([0. , 3. , 1. , 0. , 0. , 0. , 2.5, 0. , 0. , 3. ])
Note that you need to convert indices to a list,
otherwise when using it for indexing numpy will
think it is trying to index multiple dimensions.
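A small sketch of the difference, assuming the same indices and values as above: indexing a 1-D array with the tuple is read as one index per dimension and raises an IndexError, while the list triggers element-wise (fancy) indexing.

```python
import numpy as np

indices = (2, 6, 9, 1)          # tuple, as produced by zip(*data)
values = (1.0, 2.5, 3.0, 3.0)

arr = np.zeros(max(indices) + 1)
tuple_failed = False
try:
    # A tuple is interpreted as one index per dimension: arr[2, 6, 9, 1]
    arr[indices] = values
except IndexError:
    tuple_failed = True

# A list selects several positions along the single axis
arr[list(indices)] = values
print(tuple_failed, arr)
```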
I've come up with this solution:
import numpy as np
a = [('user_1', 2, 1.0),
('user_2', 6, 2.5),
('user_3', 9, 3.0),
('user_4', 1, 3.0)]
res = np.zeros(max(x[1] for x in a)+1)
for i in range(len(a)):
    res[a[i][1]] = a[i][2]
res
# array([0. , 3. , 1. , 0. , 0. , 0. , 2.5, 0. , 0. , 3. ])
First I create a zero-filled array whose length is the largest position (the number at index 1 of each tuple in list a) plus 1, because array indices start at 0 and the array must be long enough to hold that position.
Then I do a simple loop and assign the values according to the arguments in the tuple.
Say we have a meshgrid:
import numpy as np
u = np.zeros((5, 5))
# borders
u[ 0, :] = np.array([ 1.0, 2.0, 4.5, 8.0, 12.5])
u[-1, :] = np.array([ 1.0, 4.0, 9.0, 16.0, 25.0])
u[ :, 0] = np.array([ 1.0, 2.0, 4.5, 2.0, 1.0])
u[ :, -1] = np.array([12.5, 17.0, 22.0, 23.5, 25.0])
print(u)
[[ 1. 2. 4.5 8. 12.5]
[ 2. 0. 0. 0. 17. ]
[ 4.5 0. 0. 0. 22. ]
[ 2. 0. 0. 0. 23.5]
[ 1. 4. 9. 16. 25. ]]
How can I fill the inner nodes with the combination/blend/interpolation of the values at the borders? (I don't want the surface, only the values. The plot is for visual assistance.)
Edit: actually, I found a workaround. One can do a linear interpolation between each opposite pair of borders vertically and another horizontally, then take the mean of the two:
# vertical interpolation
y_borders_values = np.array([0, 4])  # row indices of the top and bottom borders
u_vert_borders_values = np.array([[u[ 0, i], u[-1, i]] for i in range(5)])
yvals = np.arange(5)
uy_inter = np.zeros_like(u)
for column in range(5):
    uy_inter[:, column] = np.interp(yvals, y_borders_values,
                                    u_vert_borders_values[column])
# horizontal interpolation
x_borders_values = np.array([0, 4])  # column indices of the left and right borders
u_hor_borders_values = np.array([[u[i, 0], u[i, -1]] for i in range(5)])
xvals = np.arange(5)
ux_inter = np.zeros_like(u)
for row in range(5):
    ux_inter[row, :] = np.interp(xvals, x_borders_values,
                                 u_hor_borders_values[row])
# final interpolation as the mean of vertical and horizontal
u[1:-1, 1:-1] = (uy_inter[1:-1, 1:-1] + ux_inter[1:-1, 1:-1]) / 2
Visualizing the result:
(Edit after Matthew's note on the wording)
But, is there any tool that does something like that with less code?
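For what it is worth, the same mean-of-two-interpolations idea can be written without loops using broadcasting. This is a sketch rather than a dedicated tool: blend_borders is a name made up here, and it assumes both interpolations run from index 0 to the last index of each axis.

```python
import numpy as np

def blend_borders(u):
    """Fill the interior of u with the mean of the vertical and horizontal
    linear interpolations between opposite borders."""
    n_rows, n_cols = u.shape
    ty = np.linspace(0, 1, n_rows)[:, None]   # 0 at the top row, 1 at the bottom
    tx = np.linspace(0, 1, n_cols)[None, :]   # 0 at the left column, 1 at the right
    vert = (1 - ty) * u[0, :] + ty * u[-1, :]          # top-to-bottom blend
    horiz = (1 - tx) * u[:, [0]] + tx * u[:, [-1]]     # left-to-right blend
    out = u.copy()
    out[1:-1, 1:-1] = ((vert + horiz) / 2)[1:-1, 1:-1]
    return out

u = np.zeros((5, 5))
u[ 0, :] = np.array([ 1.0,  2.0,  4.5,  8.0, 12.5])
u[-1, :] = np.array([ 1.0,  4.0,  9.0, 16.0, 25.0])
u[ :, 0] = np.array([ 1.0,  2.0,  4.5,  2.0,  1.0])
u[ :, -1] = np.array([12.5, 17.0, 22.0, 23.5, 25.0])

filled = blend_borders(u)
print(filled)
```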
For this problem, I have the 8 vertices of a box and an integer shrink size by which every side must be divided. For example, if the box is 8*8*8 and the shrink size is 2, I need to return a list of all the vertices of the 4*4*4 boxes that fill the big box in a 3D coordinate system.
I thought about a for loop over the range of the box size, but then I realized that if I eventually want to separate the box into many more, smaller boxes that fill the big box, I would have to write more code than is practical. How can I get this list of vertices without writing that much code?
I'm not sure if this is what you want, but here is a simple way to compute vertices in a grid with NumPy:
import numpy as np
def make_grid(x_size, y_size, z_size, shrink_factor):
    # a complex step tells np.mgrid how many points to generate, endpoints included
    n = (shrink_factor + 1) * 1j
    xx, yy, zz = np.mgrid[:x_size:n, :y_size:n, :z_size:n]
    return np.stack([xx.ravel(), yy.ravel(), zz.ravel()], axis=1)
print(make_grid(8, 8, 8, 2))
Output:
[[0. 0. 0.]
[0. 0. 4.]
[0. 0. 8.]
[0. 4. 0.]
[0. 4. 4.]
[0. 4. 8.]
[0. 8. 0.]
[0. 8. 4.]
[0. 8. 8.]
[4. 0. 0.]
[4. 0. 4.]
[4. 0. 8.]
[4. 4. 0.]
[4. 4. 4.]
[4. 4. 8.]
[4. 8. 0.]
[4. 8. 4.]
[4. 8. 8.]
[8. 0. 0.]
[8. 0. 4.]
[8. 0. 8.]
[8. 4. 0.]
[8. 4. 4.]
[8. 4. 8.]
[8. 8. 0.]
[8. 8. 4.]
[8. 8. 8.]]
Otherwise with itertools:
from itertools import product
def make_grid(x_size, y_size, z_size, shrink_factor):
    return [(x * x_size, y * y_size, z * z_size)
            for x, y, z in product((i / shrink_factor
                                    for i in range(shrink_factor + 1)), repeat=3)]
print(*make_grid(8, 8, 8, 2), sep='\n')
Output:
(0.0, 0.0, 0.0)
(0.0, 0.0, 4.0)
(0.0, 0.0, 8.0)
(0.0, 4.0, 0.0)
(0.0, 4.0, 4.0)
(0.0, 4.0, 8.0)
(0.0, 8.0, 0.0)
(0.0, 8.0, 4.0)
(0.0, 8.0, 8.0)
(4.0, 0.0, 0.0)
(4.0, 0.0, 4.0)
(4.0, 0.0, 8.0)
(4.0, 4.0, 0.0)
(4.0, 4.0, 4.0)
(4.0, 4.0, 8.0)
(4.0, 8.0, 0.0)
(4.0, 8.0, 4.0)
(4.0, 8.0, 8.0)
(8.0, 0.0, 0.0)
(8.0, 0.0, 4.0)
(8.0, 0.0, 8.0)
(8.0, 4.0, 0.0)
(8.0, 4.0, 4.0)
(8.0, 4.0, 8.0)
(8.0, 8.0, 0.0)
(8.0, 8.0, 4.0)
(8.0, 8.0, 8.0)
A solution using numpy, which allows easy block manipulation.
First I choose to represent a cube with an origin and three vectors : the unit cube is represented with orig=np.array([0,0,0]) and vects=np.array([[1,0,0],[0,1,0],[0,0,1]]).
Now a numpy function to generate the eight vertices:
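import numpy as np
orig = np.array([0, 0, 0])
vects = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
def cube(origin, edges):
    # each edge vector doubles the vertex set: 1 -> 2 -> 4 -> 8 points
    for e in edges:
        origin = np.vstack((origin, origin + e))
    return origin
cube(orig, vects)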
array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[1, 1, 0],
[0, 0, 1],
[1, 0, 1],
[0, 1, 1],
[1, 1, 1]])
Then another to span the minicubes in 3D:
def split(origin, edges, k):
    minicube = cube(origin, edges / k)
    for e in edges / k:
        minicube = np.vstack([minicube + i * e for i in range(k)])
    return minicube.reshape(k**3, 8, 3)
split(orig, vects, 2)
array([[[ 0. , 0. , 0. ],
[ 0.5, 0. , 0. ],
[ 0. , 0.5, 0. ],
[ 0.5, 0.5, 0. ],
[ 0. , 0. , 0.5],
[ 0.5, 0. , 0.5],
[ 0. , 0.5, 0.5],
[ 0.5, 0.5, 0.5]],
...
[[ 0.5, 0.5, 0.5],
[ 1. , 0.5, 0.5],
[ 0.5, 1. , 0.5],
[ 1. , 1. , 0.5],
[ 0.5, 0.5, 1. ],
[ 1. , 0.5, 1. ],
[ 0.5, 1. , 1. ],
[ 1. , 1. , 1. ]]])
My example below will work on a generic box and assumes integer coordinates.
import numpy as np
def create_cube(start_x, start_y, start_z, size):
    return np.array([
        [x,y,z]
        for z in [start_z, start_z+size]
        for y in [start_y, start_y+size]
        for x in [start_x, start_x+size]
    ])
def subdivide(box, scale):
    start = np.min(box, axis=0)
    end = np.max(box, axis=0) - scale
    return np.array([
        create_cube(x, y, z, scale)
        for z in range(start[2], end[2]+1)
        for y in range(start[1], end[1]+1)
        for x in range(start[0], end[0]+1)
    ])
cube = create_cube(1, 3, 2, 8)
Cube will look like:
array([[ 1, 3, 2],
[ 9, 3, 2],
[ 1, 11, 2],
[ 9, 11, 2],
[ 1, 3, 10],
[ 9, 3, 10],
[ 1, 11, 10],
[ 9, 11, 10]])
Running the following subdivide:
subcubes = subdivide(cube, 2)
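The subdivide function creates an ndarray with shape (343, 8, 3): 343 subcubes, obtained by sliding the 2x2x2 cube one unit at a time across the 8x8x8 cube, so neighbouring subcubes overlap. Stepping each range by scale instead of 1 would give 64 non-overlapping subcubes.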
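If the goal is to fill the big box without overlap, as the question describes, a possible variant steps each range by the subcube size. This is a sketch under that assumption: subdivide_tiled is a name introduced here, and it expects integer coordinates with a box side that is a multiple of scale.

```python
import numpy as np

def create_cube(start_x, start_y, start_z, size):
    return np.array([
        [x, y, z]
        for z in [start_z, start_z + size]
        for y in [start_y, start_y + size]
        for x in [start_x, start_x + size]
    ])

def subdivide_tiled(box, scale):
    # Hypothetical variant of subdivide: advance by `scale` so cubes do not overlap
    start = np.min(box, axis=0)
    end = np.max(box, axis=0) - scale
    return np.array([
        create_cube(x, y, z, scale)
        for z in range(start[2], end[2] + 1, scale)
        for y in range(start[1], end[1] + 1, scale)
        for x in range(start[0], end[0] + 1, scale)
    ])

cube = create_cube(1, 3, 2, 8)
tiles = subdivide_tiled(cube, 2)
print(tiles.shape)  # (64, 8, 3)
```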
I have a few numpy arrays like so:
import numpy as np
a = np.array([[1, 2, 3, 4, 5], [14, 16, 17, 27, 38]])
b = np.array([[1, 2, 3, 4, 5], [.4, .2, .5, .1, .6]])
I'd like to be able to (1) copy these arrays into a single new array and (2) sort the data so that the result is as follows:
data = [[1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [14, .4, 16, .2, 17, .5, 27, .1, 38, .6]]
In other words, each column of the original arrays must stay intact; the columns are only reordered so that the first row is ascending. I tried this:
data = np.hstack((a,b))
Which gave me the appended data, but I'm not sure how to sort it. I tried np.sort() but it didn't keep the columns the same. Thanks!
Stack those horizontally (as you already did), then get argsort indices for sorting first row and use those to sort all columns in the stacked array.
Thus, we need to add one more step, like so -
ab = np.hstack((a,b))
out = ab[:,ab[0].argsort()]
Sample run -
In [370]: a
Out[370]:
array([[ 1, 2, 3, 4, 5],
[14, 16, 17, 27, 38]])
In [371]: b
Out[371]:
array([[ 1. , 2. , 3. , 4. , 5. ],
[ 0.4, 0.2, 0.5, 0.1, 0.6]])
In [372]: ab = np.hstack((a,b))
In [373]: print ab[:,ab[0].argsort()]
[[ 1. 1. 2. 2. 3. 3. 4. 4. 5. 5. ]
[ 14. 0.4 16. 0.2 17. 0.5 27. 0.1 38. 0.6]]
Please note that to keep the original order of identical elements, we need to pass kind='mergesort' to argsort, as described in the docs.
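A minimal sketch of the stable variant, using the arrays from the question (the name order is only for illustration). With a stable sort, columns that share the same first-row value keep their stacked order, so each column of a comes before the matching column of b.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [14, 16, 17, 27, 38]])
b = np.array([[1, 2, 3, 4, 5], [.4, .2, .5, .1, .6]])

ab = np.hstack((a, b))
# Stable sort on the first row: ties keep their original left-to-right order
order = ab[0].argsort(kind='mergesort')
out = ab[:, order]
print(out)
```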
If you like something short.
np.array(list(zip(*sorted(zip(*np.hstack((a,b)))))))
>>> array([[ 1. , 1. , 2. , 2. , 3. , 3. , 4. , 4. , 5. , 5. ],
[ 0.4, 14. , 0.2, 16. , 0.5, 17. , 0.1, 27. , 0.6, 38. ]])
Version that preserves the order of the second elements:
np.array(list(zip(*sorted(zip(*np.hstack((a,b))),key=lambda x:x[0]))))
>>>array([[ 1. , 1. , 2. , 2. , 3. , 3. , 4. , 4. , 5. , 5. ],
[ 14. , 0.4, 16. , 0.2, 17. , 0.5, 27. , 0.1, 38. ,0.6]])