Delete all zeros slices from 4d numpy array

I want to remove slices from the third dimension of a 4D NumPy array if they contain only zeros.
I have a 4D NumPy array of shape [256, 256, 336, 6], and I need to delete the slices along the third dimension that contain only zeros. The result would have a shape such as [256, 256, 300, 6] if 36 slices are entirely zero. I have tried multiple approaches, including for loops, np.delete, and the all() and any() functions, without success.

You need to reduce on all axes but the one you are interested in.
An example using np.any() where there are all-zero subarrays along the axis 1 (at position 0 and 2):
import numpy as np
a=np.ones((2, 3, 2, 3))
a[:, 0, :, :] = a[:, 2, :, :] =0
mask = np.any(a, axis=(0, 2, 3))
new_a = a[:, mask, :, :]
print(new_a.shape)
# (2, 1, 2, 3)
print(new_a)
# [[[[1. 1. 1.]
# [1. 1. 1.]]]
#
#
# [[[1. 1. 1.]
# [1. 1. 1.]]]]
The same code parametrized and refactored as a function:
def remove_all_zeros(arr: np.ndarray, axis: int) -> np.ndarray:
    red_axes = tuple(i for i in range(arr.ndim) if i != axis)
    mask = np.any(arr, axis=red_axes)
    slicing = tuple(slice(None) if i != axis else mask for i in range(arr.ndim))
    return arr[slicing]
a = np.ones((2, 3, 2, 3))
a[:, 0, :, :] = a[:, 2, :, :] = 0
new_a = remove_all_zeros(a, 1)
print(new_a.shape)
# (2, 1, 2, 3)
print(new_a)
# [[[[1. 1. 1.]
# [1. 1. 1.]]]
#
#
# [[[1. 1. 1.]
# [1. 1. 1.]]]]
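For the case in the question, the same function would be called with axis=2. A minimal sketch, assuming a much smaller stand-in array than the real (256, 256, 336, 6) one and made-up positions for the empty slices:

```python
import numpy as np

def remove_all_zeros(arr: np.ndarray, axis: int) -> np.ndarray:
    """Drop every slice along `axis` that contains only zeros."""
    red_axes = tuple(i for i in range(arr.ndim) if i != axis)
    mask = np.any(arr, axis=red_axes)
    slicing = tuple(slice(None) if i != axis else mask for i in range(arr.ndim))
    return arr[slicing]

# Hypothetical stand-in for the (256, 256, 336, 6) array, shrunk to stay fast.
data = np.ones((8, 8, 12, 6))
zero_slices = [0, 5, 11]  # assumed positions of the empty slices
data[:, :, zero_slices, :] = 0

trimmed = remove_all_zeros(data, axis=2)
print(trimmed.shape)
# (8, 8, 9, 6)
```

Only the third dimension shrinks; the other three keep their sizes.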

I'm not a NumPy aficionado, but does this do what you want?
I take the following small example matrix with 4 dimensions all full of 1s and then I set some slices to zero:
import numpy as np
a=np.ones((4,4,5,2))
The shape of a is:
>>> a.shape
(4, 4, 5, 2)
I will artificially set some of the slices in dimension 3 to zero:
a[:,:,0,:]=0
a[:,:,3,:]=0
I can find the indices of the slices that are not all zeros by calculating sums (not very efficient, perhaps!):
indices = [i for i in range(a.shape[2]) if a[:,:,i,:].sum() != 0]
>>> indices
[1, 2, 4]
So, in your general case you could do this:
indices = [i for i in range(a.shape[2]) if a[:,:,i,:].sum() != 0]
a_new = a[:, :, indices, :].copy()
Then the shape of a_new is:
>>> a_new.shape
(4, 4, 3, 2)
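One caveat with the sum test: it only works if values cannot cancel out. A small sketch, assuming the data may contain negative numbers (the question does not say), shows a slice that is not all zeros but still sums to zero, and how .any() avoids the problem:

```python
import numpy as np

a = np.ones((4, 4, 5, 2))
a[:, :, 0, :] = 0       # genuinely empty slice
a[:, :, 3, :] = 0
a[0, 0, 3, 0] = 1.0     # slice 3 is not all zeros ...
a[0, 0, 3, 1] = -1.0    # ... but its values sum to zero

by_sum = [i for i in range(a.shape[2]) if a[:, :, i, :].sum() != 0]
by_any = [i for i in range(a.shape[2]) if a[:, :, i, :].any()]
print(by_sum)
# [1, 2, 4]
print(by_any)
# [1, 2, 3, 4]
```

If the array is known to be non-negative, both tests agree.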

Related

How can I manipulate a numpy array without nested loops?

If I have a MxN numpy array denoted arr, I wish to index over all elements and adjust the values like so
for m in range(arr.shape[0]):
    for n in range(arr.shape[1]):
        arr[m, n] += x**2 * np.cos(m) * np.sin(n)
Where x is a random float.
Is there a way to broadcast this over the entire array without needing to loop? Thus, speeding up the run time.

You are just adding zeros, because sin(2*pi*k) = 0 for integer k.
However, if you want to vectorize this, the function np.meshgrid could help you.
Check the following example, where I removed the 2 pi from the trigonometric functions so that the added term is nonzero.
x = 2
arr = np.arange(12, dtype=float).reshape(4, 3)
n, m = np.meshgrid(np.arange(arr.shape[1]), np.arange(arr.shape[0]), sparse=True)
arr += x**2 * np.cos(m) * np.sin(n)
arr
Edit: use the sparse argument to reduce memory consumption.
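A quick way to convince yourself that the meshgrid version matches the original double loop is to run both on copies of the same array. This is only a sketch; x = 0.5 is an assumed value, since the question just says x is a random float:

```python
import numpy as np

x = 0.5  # assumed value for illustration
base = np.arange(12, dtype=float).reshape(4, 3)

# Reference: the explicit double loop from the question.
looped = base.copy()
for m in range(looped.shape[0]):
    for n in range(looped.shape[1]):
        looped[m, n] += x**2 * np.cos(m) * np.sin(n)

# Vectorized: sparse meshgrid gives a (1, 3) column-index array
# and a (4, 1) row-index array that broadcast to (4, 3).
n_idx, m_idx = np.meshgrid(
    np.arange(base.shape[1]), np.arange(base.shape[0]), sparse=True
)
vectorized = base + x**2 * np.cos(m_idx) * np.sin(n_idx)

print(n_idx.shape, m_idx.shape)
# (1, 3) (4, 1)
print(np.allclose(looped, vectorized))
# True
```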

You can use a nested list comprehension to build the two-dimensional array:
import numpy as np
from random import random
x = random()
n, m = 10,20
arr = [[x**2 * np.cos(2*np.pi*j) * np.sin(2*np.pi*i) for j in range(m)] for i in range(n)]

In [156]: arr = np.ones((2, 3))
Replace the range with arange:
In [157]: m, n = np.arange(arr.shape[0]), np.arange(arr.shape[1])
And change the first array to (2,1) shape. A (2,1) array broadcasts with a (3,) to produce a (2,3) result.
In [158]: A = 0.23**2 * np.cos(m[:, None]) * np.sin(n)
In [159]: A
Out[159]:
array([[0. , 0.04451382, 0.04810183],
[0. , 0.02405092, 0.02598953]])
In [160]: arr + A
Out[160]:
array([[1. , 1.04451382, 1.04810183],
[1. , 1.02405092, 1.02598953]])
The meshgrid suggested in the accepted answer does the same thing:
In [161]: np.meshgrid(m, n, sparse=True, indexing="ij")
Out[161]:
[array([[0],
[1]]),
array([[0, 1, 2]])]
This broadcasting may be clearer with:
In [162]: m, n
Out[162]: (array([0, 1]), array([0, 1, 2]))
In [163]: m[:, None] * 10 + n
Out[163]:
array([[ 0, 1, 2],
[10, 11, 12]])

Numpy Dot product with nested array

I'm trying to come up with a method to perform load combinations and transient load patterning for structural/civil engineering applications.
Without patterning it's fairly simple:
list of load results = [[d],[t1],...,[ti]], where [ti] = transient load result as a numpy array = A
list of combos = [[1,0,....,0],[0,1,....,1], [dfi, tf1,.....,tfi]] , where tfi = code load factor for transient load = B
In Python this works as numpy.dot(A,B).
So my issue arises where:
`list of load results = [[d],[t1],.....[ti]]`, where [t1] = [[t11]......[t1i]] for i pattern possibilities and [t1i] = numpy array
So I have a nested array within another array and want to multiply it by a matrix of load combinations. Is there a way to implement this in one matrix operation? I can come up with a method that loops over the pattern possibilities and then takes a dot product with the load combos, but this is computationally expensive. Any thoughts?
Thanks
For an example not considering patterning see: https://github.com/buddyd16/Structural-Engineering/blob/master/Analysis/load_combo_test.py
Essentially, I need a method that gives similar results, assuming that for loads = np.array([[D],[Ex],[Ey],[F],[H],[L],[Lr],[R],[S],[Wx],[Wy]]), [L], [Lr], [R] and [S] are actually nested arrays, i.e. if D is a 1x500 array/vector, then L, Lr, R or S could be a 100x500 array.
My simple solution is:
combined_pattern = []
for pattern in load_patterns:
    loads = np.array([[D],[Ex],[Ey],[F],[H],[L[pattern]],[Lr[pattern]],[R[pattern]],[S[pattern]],[Wx],[Wy]])
    combined_pattern.append(np.dot(basic_factors, loads))
Simpler Example:
import numpy as np
#Simple
A = np.array([1,0,0])
B = np.array([0,1,0])
C = np.array([0,0,1])
Loads = np.array([A,B,C])
Factors = np.array([[1,1,1],[0.5,0.5,0.5],[0.25,0.25,0.25]])
result = np.dot(Factors, Loads)
# Looking for a faster way to accomplish the below operation
# this works but will be slow for large data sets
# bi can be up to 1x5000 in size and i can be up to 500
A = np.array([1,0,0])
b1 = np.array([1,0,0])
b2 = np.array([0,1,0])
b3 = np.array([0,0,1])
B = np.array([b1,b2,b3])
C = np.array([0,0,1])
result_list = []
for load in B:
    Loads = np.array([A,load,C])
    Factors = np.array([[1,1,1],[0.5,0.5,0.5],[0.25,0.25,0.25]])
    result = np.dot(Factors, Loads)
    result_list.append(result)
edit: Had Factors and Loads reversed in the np.dot().

In your simple example, the array shapes are:
In [2]: A.shape
Out[2]: (3,)
In [3]: Loads.shape
Out[3]: (3, 3)
In [4]: Factors.shape
Out[4]: (3, 3)
In [5]: result.shape
Out[5]: (3, 3)
The rule in dot is that the last dimension of Loads pairs with the 2nd to the last of Factors
result = np.dot(Loads,Factors)
(3,3) dot (3,3) => (3,3) # 3's in common
(m,n) dot (n,l) => (m,l) # n's in common
In the iteration, A,load and C are all (3,) and Loads is (3,3).
result_list is a list of 3 (3,3) arrays, and np.array(result_list) would be (3,3,3).
Let's make a 3d array of all the Loads:
In [16]: Bloads = np.array([np.array([A,load,C]) for load in B])
In [17]: Bloads.shape
Out[17]: (3, 3, 3)
In [18]: Bloads
Out[18]:
array([[[1, 0, 0],
[1, 0, 0],
[0, 0, 1]],
[[1, 0, 0],
[0, 1, 0],
[0, 0, 1]],
[[1, 0, 0],
[0, 0, 1],
[0, 0, 1]]])
I can easily do a dot of this Bloads and Factors with einsum:
In [19]: np.einsum('lkm,mn->lkn', Bloads, Factors)
Out[19]:
array([[[1. , 1. , 1. ],
[1. , 1. , 1. ],
[0.25, 0.25, 0.25]],
[[1. , 1. , 1. ],
[0.5 , 0.5 , 0.5 ],
[0.25, 0.25, 0.25]],
[[1. , 1. , 1. ],
[0.25, 0.25, 0.25],
[0.25, 0.25, 0.25]]])
einsum isn't the only way, but it's the easiest way (for me) to keep track of dimensions.
It's even easier to keep dimensions straight if they differ. Here they are all 3, so it's hard to keep them separate. But if B was (5,4) and Factors (4,2), then Bloads would be (5,3,4), and the einsum result (5,3,2) (the size 4 dropping out in the dot).
Constructing Bloads without a loop is a bit trickier, since the rows of B are interleaved with A and C.
In [38]: np.stack((A[None,:].repeat(3,0),B,C[None,:].repeat(3,0)),1)
Out[38]:
array([[[1, 0, 0],
[1, 0, 0],
[0, 0, 1]],
[[1, 0, 0],
[0, 1, 0],
[0, 0, 1]],
[[1, 0, 0],
[0, 0, 1],
[0, 0, 1]]])
To understand this, test the subexpressions, e.g. A[None,:], the repeat, etc.
Equivalently:
np.array((A[None,:].repeat(3,0),B,C[None,:].repeat(3,0))).transpose(1,0,2)
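The einsum above multiplies in the order Loads, Factors. The edited question uses np.dot(Factors, Loads) instead, so here is a sketch of the batched version in that order, checked against the question's loop. The subscript letters (c for combos, l for loads, p for patterns, x for points) and the use of np.broadcast_to are my own choices, not something from the question:

```python
import numpy as np

A = np.array([1, 0, 0])
B = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # one row per pattern
C = np.array([0, 0, 1])
Factors = np.array([[1, 1, 1], [0.5, 0.5, 0.5], [0.25, 0.25, 0.25]])

# Loop from the question: one (3, 3) result per pattern.
looped = np.array([np.dot(Factors, np.array([A, load, C])) for load in B])

# Stack all patterns into one (patterns, loads, points) array.
n_patterns = B.shape[0]
Bloads = np.stack(
    (np.broadcast_to(A, (n_patterns, A.size)),
     B,
     np.broadcast_to(C, (n_patterns, C.size))),
    axis=1,
)

# (combos, loads) times (patterns, loads, points) -> (patterns, combos, points)
batched = np.einsum('cl,plx->pcx', Factors, Bloads)
# matmul broadcasts the 2-D Factors over the leading pattern axis.
also_batched = Factors @ Bloads

print(batched.shape)
# (3, 3, 3)
print(np.allclose(looped, batched), np.allclose(looped, also_batched))
# True True
```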

Numpy normalize multi dim (>=3) array

I have a 5 dim array (comes from binning operations) and would like to have it normed (sum == 1 for the last dimension).
I thought I found the answer here but it says:
ValueError: Found array with dim 5. the normalize function expected <= 2.
I achieve the result with 5 nested loops, like:
for en in range(en_bin.nb):
    for zd in range(zd_bin.nb):
        for az in range(az_bin.nb):
            for oa in range(oa_bin.nb):
                # reduce fifth dimension (en reco) for normalization
                b = np.sum(a[en][zd][az][oa])
                for er in range(er_bin.nb):
                    a[en][zd][az][oa][er] /= b
but I want to vectorise operations.
For example:
In [18]: a.shape
Out[18]: (3, 1, 1, 2, 4)
In [20]: b.shape
Out[20]: (3, 1, 1, 2)
In [22]: a
Out[22]:
array([[[[[ 0.90290316, 0.00953237, 0.57925688, 0.65402645],
[ 0.68826638, 0.04982717, 0.30458093, 0.0025204 ]]]],
[[[[ 0.7973917 , 0.93050739, 0.79963614, 0.75142376],
[ 0.50401287, 0.81916812, 0.23491561, 0.77206141]]]],
[[[[ 0.44507296, 0.06625994, 0.6196917 , 0.6808444 ],
[ 0.8199077 , 0.02179789, 0.24627425, 0.43382448]]]]])
In [23]: b
Out[23]:
array([[[[ 2.14571886, 1.04519487]]],
[[[ 3.27895899, 2.33015801]]],
[[[ 1.81186899, 1.52180432]]]])

Sum along the last axis by passing axis=-1 to numpy.sum while keeping dimensions, then simply divide the array by those sums, which brings in NumPy broadcasting -
a/a.sum(axis=-1,keepdims=True)
This should be applicable for ndarrays of generic number of dimensions.
Alternatively, we could sum with axis-reduction and then add a new axis with None/np.newaxis to match up with the input array shape and then divide -
a/(a.sum(axis=-1)[...,None])
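Since the array comes from binning, some rows along the last axis may be entirely empty, in which case the plain division produces NaN. A hedged sketch of one way to guard against that, using the where argument of np.divide on made-up random data with one row zeroed on purpose:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 1, 1, 2, 4))
a[0, 0, 0, 0, :] = 0.0  # assumed edge case: an empty row of bins

totals = a.sum(axis=-1, keepdims=True)  # shape (3, 1, 1, 2, 1)

# Divide only where the total is nonzero; empty rows stay at 0.
normed = np.divide(a, totals, out=np.zeros_like(a), where=totals != 0)

print(normed.shape)
# (3, 1, 1, 2, 4)
print(normed.sum(axis=-1).shape)
# (3, 1, 1, 2)
```

Every non-empty row of normed sums to 1 along the last axis, and the empty row is left as zeros instead of NaN.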

Calculate percentage of count for a list of arrays

Simple problem, but I cannot seem to get it to work. I want to calculate the percentage a number occurs in a list of arrays and output this percentage accordingly.
I have a list of arrays which looks like this:
import numpy as np
# Create some data
listvalues = []
arr1 = np.array([0, 0, 2])
arr2 = np.array([1, 1, 2, 2])
arr3 = np.array([0, 2, 2])
listvalues.append(arr1)
listvalues.append(arr2)
listvalues.append(arr3)
listvalues
>[array([0, 0, 2]), array([1, 1, 2, 2]), array([0, 2, 2])]
Now I count the occurrences using collections, which returns a list of collections.Counter:
import collections
counter = []
for i in range(len(listvalues)):
    counter.append(collections.Counter(listvalues[i]))
counter
>[Counter({0: 2, 2: 1}), Counter({1: 2, 2: 2}), Counter({0: 1, 2: 2})]
The result I am looking for is an array with 3 columns, representing the value 0 to 2 and len(listvalues) of rows. Each cell should be filled with the percentage of that value occurring in the array:
# Result
66.66 0 33.33
0 50 50
33.33 0 66.66
So 0 occurs 66.66% in array 1, 0% in array 2 and 33.33% in array 3, and so on..
What would be the best way to achieve this?
Many thanks!

Here's an approach -
# Get lengths of each element in input list
lens = np.array([len(item) for item in listvalues])
# Form group ID array to ID elements in flattened listvalues
ID_arr = np.repeat(np.arange(len(lens)),lens)
# Extract all values & considering each row as an indexing perform counting
vals = np.concatenate(listvalues)
out_shp = [ID_arr.max()+1,vals.max()+1]
counts = np.bincount(ID_arr*out_shp[1] + vals, minlength=out_shp[0]*out_shp[1])
# Finally get the percentages with dividing by group counts
out = 100*np.true_divide(counts.reshape(out_shp),lens[:,None])
Sample run with an additional fourth array in input list -
In [316]: listvalues
Out[316]: [array([0, 0, 2]),array([1, 1, 2, 2]),array([0, 2, 2]),array([4, 0, 1])]
In [317]: print out
[[ 66.66666667 0. 33.33333333 0. 0. ]
[ 0. 50. 50. 0. 0. ]
[ 33.33333333 0. 66.66666667 0. 0. ]
[ 33.33333333 33.33333333 0. 0. 33.33333333]]
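The steps above can be wrapped in a small function. This is a sketch rather than a drop-in solution: the name percentage_table is made up, and it assumes every array is non-empty and holds non-negative integers. Passing minlength to np.bincount keeps the reshape valid even when the last array does not contain the largest value:

```python
import numpy as np

def percentage_table(listvalues):
    """Rows: input arrays. Columns: values 0..max. Cells: percentages."""
    lens = np.array([len(item) for item in listvalues])
    ids = np.repeat(np.arange(len(lens)), lens)   # row ID of each element
    vals = np.concatenate(listvalues)
    n_rows, n_cols = len(lens), int(vals.max()) + 1
    # Encode (row, value) pairs as a single flat index and count them.
    counts = np.bincount(ids * n_cols + vals, minlength=n_rows * n_cols)
    return 100 * counts.reshape(n_rows, n_cols) / lens[:, None]

# The last array deliberately lacks the maximum value 2.
listvalues = [np.array([0, 0, 2]), np.array([1, 1, 2, 2]), np.array([0, 1])]
table = percentage_table(listvalues)
print(table.shape)
# (3, 3)
```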

The numpy_indexed package has a utility function for this, called count_table, which can be used to solve your problem efficiently as such:
import numpy_indexed as npi
arrs = [arr1, arr2, arr3]
idx = [np.ones(len(a))*i for i, a in enumerate(arrs)]
(rows, cols), table = npi.count_table(np.concatenate(idx), np.concatenate(arrs))
table = table / table.sum(axis=1, keepdims=True)
print(table * 100)

You can get a list of all values and then simply iterate over the individual arrays to get the percentages:
values = sorted(set(y for row in listvalues for y in row))
print([[(a==x).sum()*100.0/len(a) for x in values] for a in listvalues])

You can create a list with the percentages with the following code :
percentage_list = [((counter[i].get(j) if counter[i].get(j) else 0)*10000)//len(listvalues[i])/100.0 for i in range(len(listvalues)) for j in range(3)]
After that, create a np array from that list :
results = np.array(percentage_list)
Reshape it so we have the good result :
results = results.reshape(3,3)
This should allow you to get what you wanted.
This is most likely not efficient, and not the best way to do this, but it has the merit of working.
Do not hesitate if you have any question.

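I would use a functional approach to solve this problem. Note that in the result below each row corresponds to a value and each column to an array, and the entries are fractions rather than percentages. For example: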
>>> import numpy as np
>>> import pprint
>>>
>>> arr1 = np.array([0, 0, 2])
>>> arr2 = np.array([1, 1, 2, 2])
>>> arr3 = np.array([0, 2, 2])
>>>
>>> arrays = (arr1, arr2, arr3)
>>>
>>> u = np.unique(np.hstack(arrays))
>>>
>>> result = [[1.0 * c.get(uk, 0) / l
... for l, c in ((len(arr), dict(zip(*np.unique(arr, return_counts=True))))
... for arr in arrays)] for uk in u]
>>>
>>> pprint.pprint(result)
[[0.6666666666666666, 0.0, 0.3333333333333333],
[0.0, 0.5, 0.0],
[0.3333333333333333, 0.5, 0.6666666666666666]]

How to vectorize this python code?

I am trying to use NumPy and vectorization operations to make a section of code run faster. I appear to have a misunderstanding of how to vectorize this code, however (probably due to an incomplete understanding of vectorization).
Here's the working code with loops (A and B are 2D arrays of a set size, already initialized):
for k in range(num_v):
    B[:] = A[:]
    for i in range(num_v):
        for j in range(num_v):
            A[i][j] = min(B[i][j], B[i][k] + B[k][j])
return A
And here is my attempt at vectorizing the above code:
for k in range(num_v):
    B = numpy.copy(A)
    A = numpy.minimum(B, B[:,k] + B[k,:])
return A
For testing these, I used the following, with the code above wrapped in a function called 'algorithm':
def setup_array(edges, num_v):
    r = range(1, num_v + 1)
    A = [[None for x in r] for y in r] # or (numpy.ones((num_v, num_v)) * 1e10) for numpy
    for i in r:
        for j in r:
            val = 1e10
            if i == j:
                val = 0
            elif (i,j) in edges:
                val = edges[(i,j)]
            A[i-1][j-1] = val
    return A
A = setup_array({(1, 2): 2, (6, 4): 1, (3, 2): -3, (1, 3): 5, (3, 6): 5, (4, 5): 2, (3, 1): 4, (4, 3): 8, (3, 4): 6, (2, 4): -4, (6, 5): -5}, 6)
B = []
algorithm(A, B, 6)
The expected outcome, and what I get with the first code is:
[[0, 2, 5, -2, 0, 10]
[8, 0, 4, -4, -2, 9]
[4, -3, 0, -7, -5, 5]
[12, 5, 8, 0, 2, 13]
[10000000000.0, 9999999997.0, 10000000000.0, 9999999993.0, 0, 10000000000.0]
[13, 6, 9, 1, -5, 0]]
The second (vectorized) function instead returns:
[[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. -4. 0. 0.]
[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. 0. 0. 0.]
[ 0. -4. 0. 0. -5. 0.]]
What am I missing?

Usually you want to vectorize code because you think it is running too slow.
If your code is too slow, then I can tell you that proper indexing will make it faster.
Instead of A[i][j] you should write A[i, j] -- this avoids creating a temporary (sub)array for A[i] before indexing it again.
Since you do this in the inner-most loop of your code this might be very costly.
Look here:
In [37]: timeit test[2][2]
1000000 loops, best of 3: 1.5 us per loop
In [38]: timeit test[2,2]
1000000 loops, best of 3: 639 ns per loop
Do this consistently in your code -- I strongly believe that this alone will already solve your performance problem!
Having said that...
... here's my take on how to vectorize
for k in range(num_v):
    numpy.minimum(A, numpy.add.outer(A[:,k], A[k,:]), A)
return A
numpy.minimum compares two arrays and returns the element-wise minimum. If you pass a third argument, it receives the output. If that argument is one of the input arrays, the whole operation happens in place.
As Peter de Rivay explains, there is a problem in your solution with broadcasting -- but mathematically what you want to do is some kind of outer product over addition of two vectors. Therefore you can use the outer operation on the add function.
NumPy’s binary ufuncs have special methods for performing certain kinds of special vectorized operations, such as reduce, accumulate, reduceat and outer.
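To check the outer-sum version against the original triple loop, both can be run on the same distance matrix. The sketch below uses a small hypothetical 3-node graph (0-based node labels, not the 6-node graph from the question) and made-up function names:

```python
import numpy as np

def shortest_paths_loops(dist):
    """Reference triple loop, following the question's code."""
    A = [row[:] for row in dist]
    n = len(A)
    for k in range(n):
        B = [row[:] for row in A]
        for i in range(n):
            for j in range(n):
                A[i][j] = min(B[i][j], B[i][k] + B[k][j])
    return A

def shortest_paths_outer(dist):
    """Same update, with the two inner loops replaced by an outer sum."""
    A = np.array(dist, dtype=float)
    for k in range(A.shape[0]):
        # candidate[i, j] = A[i, k] + A[k, j]
        candidate = np.add.outer(A[:, k], A[k, :])
        np.minimum(A, candidate, out=A)  # in-place element-wise minimum
    return A

INF = 1e10
# Hypothetical edge weights for illustration only.
edges = {(0, 1): 2.0, (1, 2): 3.0, (2, 0): 1.0, (0, 2): 10.0}
n = 3
dist = [[0.0 if i == j else edges.get((i, j), INF) for j in range(n)]
        for i in range(n)]

result = shortest_paths_outer(dist)
print(np.array_equal(result, np.array(shortest_paths_loops(dist))))
# True
```

The candidate matrix is computed from A before np.minimum writes into it, so the separate copy B from the original code is not needed.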

The problem is caused by array broadcasting in the line:
A = numpy.minimum(B, B[:,k] + B[k,:])
B is size 6 by 6, B[:,k] is an array with 6 elements, B[k,:] is an array with 6 elements.
(Because you are using the numpy array type, both B[:,k] and B[k,:] return a rank-1 array of shape N)
Numpy automatically changes the sizes to match:
First B[:,k] is added to B[k,:] to make an intermediate array result with 6 elements. (This is not what you intended)
Second this 6 element array is broadcast to a 6 by 6 matrix by repeating the rows
Third the minimum of the original matrix and this broadcast matrix is computed.
This means that your numpy code is equivalent to:
for k in range(num_v):
    B[:] = A[:]
    C=[B[i][k]+B[k][i] for i in range(num_v)]
    for i in range(num_v):
        for j in range(num_v):
            A[i][j] = min(B[i][j], C[j])
The easiest way to fix your code is to use the matrix type instead of the array type:
A = numpy.matrix(A)
for k in range(num_v):
    A = numpy.minimum(A, A[:,k] + A[k,:])
With the matrix type, A[:,k] and A[k,:] stay two-dimensional (a 6 by 1 column and a 1 by 6 row), so in this case:
A[:,k] is extended to a 6 by 6 matrix by repeating columns
A[k,:] is extended to a 6 by 6 matrix by repeating rows
The broadcasted matrices are added together to make a 6 by 6 matrix
The minimum is applied
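The same column-plus-row broadcasting can also be obtained with plain arrays by adding an explicit axis, which avoids numpy.matrix (a class the NumPy documentation no longer recommends). A minimal sketch, with a made-up function name and a hypothetical 3-node distance matrix:

```python
import numpy as np

def shortest_paths_broadcast(dist):
    A = np.array(dist, dtype=float)
    for k in range(A.shape[0]):
        col = A[:, k, np.newaxis]     # shape (n, 1): repeated across columns
        row = A[np.newaxis, k, :]     # shape (1, n): repeated across rows
        A = np.minimum(A, col + row)  # (n, 1) + (1, n) -> (n, n)
    return A

INF = 1e10
# Hypothetical 3-node graph for illustration only.
dist = [[0.0, 2.0, 10.0],
        [INF, 0.0, 3.0],
        [1.0, INF, 0.0]]

A0 = np.array(dist)
print((A0[:, 0] + A0[0, :]).shape)
# (3,)
print((A0[:, 0, np.newaxis] + A0[np.newaxis, 0, :]).shape)
# (3, 3)

result = shortest_paths_broadcast(dist)
```

The first print shows the problem described above: without the extra axis, the sum of two rank-1 arrays is again rank-1 rather than an n by n matrix.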
