Coercing numpy arrays to same dimensions - python

I have 2 matrices, and I want to perform a 'cell-wise' addition, however the matrices aren't the same size. I want to preserve the cells relative positions during the calculation (i.e. their 'co-ordinates' from the top left), so a simple (if maybe not the best) solution, seems to be to pad the smaller matrix's x and y with zeros.
This thread has a perfectly satisfactory answer for concatenating vertically, and this does work with my data, and following the suggestion in the answer, I also threw in the hstack but at the moment, it's complaining that the dimensions (excluding concatenation axis) need to match exactly. Perhaps hstack doesnt work as I anticipate or exactly equivalently to vstack, but I'm at a bit of a loss now.
This is what hstack throws at me, meanwhile vstack seems to have no problem.
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Essentially the code checks which of a pair of matrices is the shorter and/or wider, and then pads the smaller matrix with zeros to match.
Here's the code I have:
import numpy as np
A = np.random.randint(2, size = (3, 7))
B = np.random.randint(2, size = (5, 10))
# If the arrays have different row numbers:
if A.shape[0] < B.shape[0]: # Is A shorter than B?
A = np.vstack((A, np.zeros((B.shape[0] - A.shape[0], A.shape[1]))))
elif A.shape[0] > B.shape[0]: # or is A longer than B?
B = np.vstack((B, np.zeros((A.shape[0] - B.shape[0], B.shape[1]))))
# If they have different column numbers
if A.shape[1] < B.shape[1]: # Is A narrower than B?
A = np.hstack((A, np.zeros((B.shape[1] - A.shape[1], A.shape[0]))))
elif A.shape[1] > B.shape[1]: # or is A wider than B?
B = np.hstack((B, np.zeros((A.shape[1] - B.shape[1], B.shape[0]))))
It's getting late so its possible I've just missed something obvious with hstack but I can't see my logic error at the moment.

Just use np.pad :
np.pad(A,((0,2),(0,3)),'constant') # 2 is 5-3, 3 is 10-7
[[0 1 1 0 1 0 0 0 0 0]
[1 0 0 1 0 1 0 0 0 0]
[1 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
But the 4 pads width must be computed; so an another simple
method to pad the 2 array in any case is :
A = np.ones((3, 7),int)
B = np.ones((5, 2),int)
ma,na = A.shape
mb,nb = B.shape
m,n = max(ma,mb) , max(na,nb)
newA = np.zeros((m,n),A.dtype)
newA[:ma,:na]=A
newB = np.zeros((m,n),B.dtype)
newB[:mb,:nb]=B
For :
[[1 1 1 1 1 1 1]
[1 1 1 1 1 1 1]
[1 1 1 1 1 1 1]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
[[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]
[1 1 0 0 0 0 0]]

I think your hstack lines should be of the form
np.hstack((A, np.zeros((A.shape[0], B.shape[1] - A.shape[1]))))
You seem to have the rows and columns swapped.

Yes, indeed. You should swap (B.shape[1] - A.shape[1], A.shape[0]) to (A.shape[0], B.shape[1] - A.shape[1]) and so on, because you need to have the same numbers of rows to stack them horizontally.

Try b[:a.shape[0], :a.shape[1]] = b[:a.shape[0], :a.shape[1]]+a where b the larger array
Example below
import numpy as np
a = np.arange(12).reshape(3, 4)
print("a\n", a)
b = np.arange(16).reshape(4, 4)
print("b original\n", b)
b[:a.shape[0], :a.shape[1]] = b[:a.shape[0], :a.shape[1]]+a
print("b new\n",b)
output
a
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
b original
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
b new
[[ 0 2 4 6]
[ 8 10 12 14]
[16 18 20 22]
[12 13 14 15]]

Related

How to resize a Numpy array to add/replace rows with combinations determined by the values in each row of the array

So I have a program that at some point creates random arrays and I have perform an operation which is to add rows while replacing other rows based on the values found in the rows. One of the random arrays will look something like this but keep in mind that it could randomly vary in size ranging from 3x3 up to 10x10:
0 2 0 1
1 0 0 1
1 0 2 1
2 0 1 2
For every row that has at least one value equal to 2 I need to remove/replace the row and add some more rows. The number of rows added will depend on the number of combinations possible of 0s and 1s where the number of digits is equal to the number of 2s counted in each row. Each added row will introduce one of these combinations in the positions where the 2s are located. The result that I'm looking for will look like this:
0 1 0 1 # First combination to replace 0 2 0 1
0 0 0 1 # Second combination to replace 0 2 0 1 (Only 2 combinations, only one 2)
1 0 0 1 # Stays the same
1 0 1 1 # First combination to replace 1 0 2 1
1 0 0 1 # Second combination to replace 1 0 2 1 (Only 2 combinations, only one 2)
0 0 1 0 # First combination to replace 2 0 1 2
0 0 1 1 # Second combination to replace 2 0 1 2
1 0 1 1 # Third combination to replace 2 0 1 2
1 0 1 0 # Fourth combination to replace 2 0 1 2 (4 combinations, there are two 2s)
If you know a Numpy way of accomplishing this I will be grateful.
You can try the following. Create a sample array:
import numpy as np
np.random.seed(5)
a = np.random.randint(0, 3, (4, 4))
print(a)
This gives:
[[2 1 2 2]
[0 1 0 0]
[2 0 2 0]
[0 1 1 0]]
Compute the output array:
ts = (a == 2).sum(axis=1)
r = np.hstack([np.array(np.meshgrid(*[[0, 1]] * t)).reshape(t, -1).T.ravel() for t in ts if t])
out = np.repeat(a, 2**ts, axis=0)
out[out == 2] = r
print(out)
Result:
[[0 1 0 0]
[0 1 0 1]
[1 1 0 0]
[1 1 0 1]
[0 1 1 0]
[0 1 1 1]
[1 1 1 0]
[1 1 1 1]
[0 1 0 0]
[0 0 0 0]
[1 0 0 0]
[0 0 1 0]
[1 0 1 0]
[0 1 1 0]]
Not the prettiest code but it does the job. You could clean up the itertools calls but this lets you see how it works.
import numpy as np
import itertools
X = np.array([[0, 2, 0, 1],
[1, 0, 0, 1],
[1, 0, 2, 1],
[2, 0, 1, 2]])
def add(X_,Y):
if Y.size == 0:
Y = X_
else:
Y = np.vstack((Y, X_))
return(Y)
Y = np.array([])
for i in range(len(X)):
if 2 not in X[i,:]:
Y = add(X[i,:], Y)
else:
a = np.where(X[i,:]==2)[0]
n = [[i for i in itertools.chain([1, 0])] for _ in range(len(a))]
m = list(itertools.product(*n))
for j in range(len(m)):
M = 1 * X[i,:]
u = list(m[j])
for k in range(len(a)):
M[a[k]] = u[k]
Y = add(M, Y)
print(Y)
#[[0 1 0 1]
# [0 0 0 1]
# [1 0 0 1]
# [1 0 1 1]
# [1 0 0 1]
# [1 0 1 1]
# [1 0 1 0]
# [0 0 1 1]
# [0 0 1 0]]

How to find numpy array shape in a larger array?

big_array = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
print(big_array)
[[0 1 0 0 1 0 0 1]
[0 1 0 0 0 0 0 0]
[0 1 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0]
[1 0 0 0 1 0 0 0]]
Is there a way to iterate over this numpy array and for each 2x2 cluster of 0s, set all values within that cluster = 5? This is what the output would look like.
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 0]
[0 1 5 5 1 5 5 0]
[0 0 5 5 1 5 5 0]
[1 0 5 5 1 5 5 0]]
My thoughts are to use advanced indexing to set the 2x2 shape = to 5, but I think it would be really slow to simply iterate like:
1) check if array[x][y] is 0
2) check if adjacent array elements are 0
3) if all elements are 0, set all those values to 5.
big_array = [1, 7, 0, 0, 3]
i = 0
p = 0
while i <= len(big_array) - 1 and p <= len(big_array) - 2:
if big_array[i] == big_array[p + 1]:
big_array[i] = 5
big_array[p + 1] = 5
print(big_array)
i = i + 1
p = p + 1
Output:
[1, 7, 5, 5, 3]
It is a example, not whole correct code.
Here's a solution by viewing the array as blocks.
First you need to define this function rolling_window from here https://gist.github.com/seberg/3866040/revisions
Then break the array big, your starting array, into 2x2 blocks using this function.
Also generate an array which has indices of every element in big and break it similarly into 2x2 blocks.
Then generate a boolean mask where the 2x2 blocks of big are all zero, and use the index array to get those elements.
blks = rolling_window(big,window=(2,2)) # 2x2 blocks of original array
inds = np.indices(big.shape).transpose(1,2,0) # array of indices into big
blkinds = rolling_window(inds,window=(2,2,0)).transpose(0,1,4,3,2) # 2x2 blocks of indices into big
mask = blks == np.zeros((2,2)) # generate a mask of every 2x2 block which is all zero
mask = mask.reshape(*mask.shape[:-2],-1).all(-1) # still generating the mask
# now blks[mask] is every block which is zero..
# but you actually want the original indices in the array 'big' instead
inds = blkinds[mask].reshape(-1,2).T # indices into big where elements need replacing
big[inds[0],inds[1]] = 5 #reassign
You need to test this: I did not. But the idea is to break the array into blocks, and an array of indices into blocks, then develop a boolean condition on the blocks, use those to get the indices, and then reassign.
An alternative would be to iterate through indblks as defined here, then test the 2x2 obtained from big at each indblk element and reassign if necessary.
This is my attempt to help you solve your problem. My solution may be subject to fair criticism.
import numpy as np
from itertools import product
m = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
h = 2
w = 2
rr, cc = tuple(d + 1 - q for d, q in zip(m.shape, (h, w)))
slices = [(slice(r, r + h), slice(c, c + w))
for r, c in product(range(rr), range(cc))
if not m[r:r + h, c:c + w].any()]
for s in slices:
m[s] = 5
print(m)
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 5]
[0 1 5 5 1 5 5 5]
[0 5 5 5 1 5 5 5]
[1 5 5 5 1 5 5 5]]

Create arbitrary dimension sparse matrix in Python 3

So I'm using Python 3 to create a matrix of the form
L=[B 0 0
0 B 0
0 0 B]
where
B=[4 -1 0
-1 4 -1
0 -1 4]
But instead of repeating B three times, I want to do it N times (depending on input). My shot so far is the following
import numpy as np
import scipy.sparse as sp
one=np.ones(N)
four=4*np.ones(N)
data = np.array([-one, four, -one])
diags = np.array([-1,0, 1])
B=sp.spdiags(data, diags, N, N).toarray() # Create matrix B
L=np.kron(np.eye(N), B)
However, it takes a lot of time when N is big (which is necessary since this is to solve a differential equation). Is there a more effective way to do this?
No timings here (no performance guarantees from me), but for me the most natural approach (don't have much experience with kronecker) would be scipy's block_diag although i always wonder if i'm using it correctly (in this case: list-comprehension):
Code
import numpy as np
import scipy.sparse as sp
N = 2
B = np.array([[4,-1,0],[-1,4,-1],[0,-1,4]])
L = sp.block_diag([B for i in range(N)])
print(L.todense())
Out
[[ 4 -1 0 0 0 0]
[-1 4 -1 0 0 0]
[ 0 -1 4 0 0 0]
[ 0 0 0 4 -1 0]
[ 0 0 0 -1 4 -1]
[ 0 0 0 0 -1 4]]

find mean position and area of labelled objects

I have a 2D labeled image (numpy array), each label represents an object. I have to find the object's center and its area. My current solution:
centers = [np.mean(np.where(label_2d == i),1) for i in range(1,num_obj+1)]
surface_area = np.array([np.sum(label_2d == i) for i in range(1,num_obj+1)])
Note that label_2d used for centers is not the same as the one for surface area, so I can't combine both operations. My current code is about 10-100 times to slow.
In C++ I would iterate through the image once (2 for loops) and fill the table (an array), from which I would than calculate centers and surface area.
Since for loops are quite slow in python, I have to find another solution. Any advice?
You could use the center_of_mass function present in scipy.ndimage.measurements for the first problem and then use np.bincount for the second problem. Because these are in the mainstream libraries, they will be heavily optimized, so you can expect decent speed gains.
Example:
>>> import numpy as np
>>> from scipy.ndimage.measurements import center_of_mass
>>>
>>> a = np.zeros((10,10), dtype=np.int)
>>> # add some labels:
... a[3:5, 1:3] = 1
>>> a[7:9, 0:3] = 2
>>> a[5:6, 4:9] = 3
>>> print(a)
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 1 1 0 0 0 0 0 0 0]
[0 0 0 0 3 3 3 3 3 0]
[0 0 0 0 0 0 0 0 0 0]
[2 2 2 0 0 0 0 0 0 0]
[2 2 2 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
>>>
>>> num_obj = 3
>>> surface_areas = np.bincount(a.flat)[1:]
>>> centers = center_of_mass(a, labels=a, index=range(1, num_obj+1))
>>> print(surface_areas)
[4 6 5]
>>> print(centers)
[(3.5, 1.5), (7.5, 1.0), (5.0, 6.0)]
Speed gains depend on the size of your input data though, so I can't make any serious estimates on that. Would be nice if you could add that info (size of a, number of labels, timing results for the method you used and these functions) in the comments.

Insert data from one sorted array into another sorted array

I apologize if this has been asked here - I've hunted around here and in the Tentative NumPy Tutorial for an answer.
I have 2 numpy arrays. The first array is similar to:
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
(etc... It's ~700x10 in actuality)
I then have a 2nd array similar to
3 1
4 18
5 2
(again, longer - maybe 400 or so rows)
The first column of the 2nd array is always completely contained within the first column of the first array
What I'd like to do is to insert the 2nd column of the 2nd array into that first array as part of an existing column, i.e:
array a:
1 0 0 0 0
2 0 0 0 0
3 1 0 0 0
4 18 0 0 0
5 2 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
(I'd be filling in each of those columns in turn, but each covers a different range within the original)
My first try was along the lines of a[b[:,0],1]=b[:,1] which puts them into the indices of b, not the values (ie, in my example above instead of filling in rows 3, 4, and 5, I filled in 2, 3, and 4). I should have realized that!
Since then, I've tried to make it work pretty inelegantly with where(), and I think I could make it work by finding the difference in the starting values of the first columns.
I'm new to python, so perhaps I'm overly optimistic - but it seems like there should be a more elegant way and I'm just missing it.
Thanks for any insights!
If the numbers in the first column of a are in sorted order, then you could use
a[a[:,0].searchsorted(b[:,0]),1] = b[:,1]
For example:
import numpy as np
a = np.array([(1,0,0,0,0),
(2,0,0,0,0),
(3,0,0,0,0),
(4,0,0,0,0),
(5,0,0,0,0),
(6,0,0,0,0),
(7,0,0,0,0),
(8,0,0,0,0),
])
b = np.array([(3, 1),
(5, 18),
(7, 2)])
a[a[:,0].searchsorted(b[:,0]),1] = b[:,1]
print(a)
yields
[[ 1 0 0 0 0]
[ 2 0 0 0 0]
[ 3 1 0 0 0]
[ 4 0 0 0 0]
[ 5 18 0 0 0]
[ 6 0 0 0 0]
[ 7 2 0 0 0]
[ 8 0 0 0 0]]
(I changed your example a bit to show that the values in b's first column do not have to be contiguous.)
If a[:,0] is not in sorted order, then you could use np.argsort to workaround this:
a = np.array( [(1,0,0,0,0),
(2,0,0,0,0),
(5,0,0,0,0),
(3,0,0,0,0),
(4,0,0,0,0),
(6,0,0,0,0),
(7,0,0,0,0),
(8,0,0,0,0),
])
b = np.array([(3, 1),
(5, 18),
(7, 2)])
perm = np.argsort(a[:,0])
a[:,1][perm[a[:,0][perm].searchsorted(b[:,0])]] = b[:,1]
print(a)
yields
[[ 1 0 0 0 0]
[ 2 0 0 0 0]
[ 5 18 0 0 0]
[ 3 1 0 0 0]
[ 4 0 0 0 0]
[ 6 0 0 0 0]
[ 7 2 0 0 0]
[ 8 0 0 0 0]]
The setup:
a = np.arange(20).reshape(2,10).T
b = np.array([[1, 100], [3, 300], [8, 800]])
This will work if you don't know anything about a[:, 0] except that it is sorted.
index = a[:, 0].searchsorted(b[:, 0])
a[index, 1] = b[:, 1]
print a
array([[ 0, 10],
[ 1, 100],
[ 2, 12],
[ 3, 300],
[ 4, 14],
[ 5, 15],
[ 6, 16],
[ 7, 17],
[ 8, 800],
[ 9, 19]])
But if you know that a[:, 0] is a sequence of contiguous integers like your example you can do:
index = b[:,0] + a[0, 0]
a[index, 1] = b[:, 1]

Categories