Related
I am working on a large array (3000 x 3000) over which I use scipy.ndimage.label. The return is 3403 labels and the labelled array. I would like to know the indices of these labels for e.g. for label 1 I should know the rows and columns in the labelled array.
So basically like this
a[0] = array([[1, 1, 0, 0],
[1, 1, 0, 2],
[0, 0, 0, 2],
[3, 3, 0, 0]])
indices = [np.where(a[0]==t+1) for t in range(a[1])] #where a[1] = 3 is number of labels.
print indices
[(array([0, 0, 1, 1]), array([0, 1, 0, 1])), (array([1, 2]), array([3, 3])), (array([3, 3]), array([0, 1]))]
And I would like to create a list of indices for all 3403 labels like above. The above method seems to be slow. I tried using generators, it doesn't look like there is improvement.
Are there any efficient ways?
Well the idea with gaining efficiency would be to minimize the work once inside the loop. A vectorized method isn't possible given that you would have variable number of elements per label. So, with those factors in mind, here's one solution -
a_flattened = a[0].ravel()
sidx = np.argsort(a_flattened)
afs = a_flattened[sidx]
cut_idx = np.r_[0,np.flatnonzero(afs[1:] != afs[:-1])+1,a_flattened.size]
row, col = np.unravel_index(sidx, a[0].shape)
row_indices = [row[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
col_indices = [col[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
Sample input, output -
In [59]: a[0]
Out[59]:
array([[1, 1, 0, 0],
[1, 1, 0, 2],
[0, 0, 0, 2],
[3, 3, 0, 0]])
In [60]: a[1]
Out[60]: 3
In [62]: row_indices # row indices
Out[62]:
[array([0, 0, 1, 2, 2, 2, 3, 3]), # for label-0
array([0, 0, 1, 1]), # for label-1
array([1, 2]), # for label-2
array([3, 3])] # for label-3
In [63]: col_indices # column indices
Out[63]:
[array([2, 3, 2, 0, 1, 2, 2, 3]), # for label-0
array([0, 1, 0, 1]), # for label-1
array([3, 3]), # for label-2
array([0, 1])] # for label-3
The first elements off row_indices and col_indices are the expected output. The first groups from each those represent the 0-th regions, so you might want to skip those.
If I have two lists and want to iterate through subtracting one from the other how would I go about this? I was thinking broadcasting. Right now I have:
array1 = [0,2,2,0]
array2 = [2,2,0,1]
I would like to subtract array1 from each value in array2 and make a new matrix of outputs:
output = [2, 0, 0, 2,
2, 0, 0, 2,
0, -2, -2, 0,
1, -1, -1, 1]
so in the end it's a 4x4 matrix.
Is this possible? Is the easiest way to use broadcasting? I was thinking of making each row value in array2 into it's own array, subtracting that from array2 using broadcasting, then summing all the array's at the end into one big array (using Numpy)... is there an easier way?
If I have two lists and want to iterate through subtracting one from the other how would I go about this? I was thinking broadcasting. Right now I have:
array1 = [0,2,2,0]
array2 = [2,2,0,1]
I would like to subtract array1 from each value in array2 and make a new matrix of outputs:
output = [2, 0, 0, 2,
2, 0, 0, 2,
0, -2, -2, 0,
1, -1, -1, 1]
so in the end it's a 4x4 matrix.
Is this possible? Is the easiest way to use broadcasting? I was thinking of making each row value in array2 into it's own array, subtracting that from array2 using broadcasting, then summing all the array's at the end into one big array (using Numpy)... is there an easier way?
Broadcasting with numpy:
>>> a1 = np.array([0,2,2,0])
>>> a2 = np.array([2,2,0,1])
>>> a2[:, np.newaxis] - a1
array([[ 2, 0, 0, 2],
[ 2, 0, 0, 2],
[ 0, -2, -2, 0],
[ 1, -1, -1, 1]])
Something like this?
def all_differences(x, y):
return (a - b for a in y for b in x)
print(list(all_differences([0, 2, 2, 0], [2, 2, 0,1])))
# -> [2, 0, 0, 2, 2, 0, 0, 2, 0, -2, -2, 0, 1, -1, -1, 1]
It just itertates over every item in the second list for every item in the first list, and gives their difference.
This can also be solved with itertools.product and can be generalised for multiple lists:
import itertools
import functools
import operator
difference = functools.partial(functools.reduce, operator.sub)
def all_differences(*lists):
return map(difference, itertools.product(*reversed(lists)))
print(list(all_differences([0, 2, 2, 0], [2, 2, 0,1])))
Or just handling two lists:
import itertools
def all_differences(x, y):
return (b - a for (a, b) in itertools.product((x, y)))
print(list(all_differences([0, 2, 2, 0], [2, 2, 0,1])))
my situation is as follows:
I have an array of results, say
S = np.array([2,3,10,-1,12,1,2,4,4]), which I would like to insert in the last row of a scipy.sparse.lil_matrix M according to an array of column indices with possibly repeated elements (with no specific pattern), e.g.:
j = np.array([3,4,5,14,15,16,3,4,5]).
When column indices are repeated, the sum of their corresponding values in S should be inserted in the matrix M. Thus, in the example above, results [4,7,14] should be placed in columns [3,4,5] of the last row of M. In other words, I would like to achieve something like:
M[-1,j] = np.array([2+2,3+4,10+4,-1,12,1]).
Calculation speed is very important for my program, such that I should avoid using loops. Looking forward to your clever solutions! Thanks!
That kind of summation is the normal behavior for sparse matrices, especially in the csr format.
define the 3 input arrays:
In [408]: S = np.array([2,3,10,-1,12,1,2,4,4])
In [409]: j=np.array([3,4,5,14,15,16,3,4,5])
In [410]: i=np.ones(S.shape,int)
The coo format takes those 3 arrays, as is, without change
In [411]: c0=sparse.coo_matrix((S,(i,j)))
In [412]: c0.data
Out[412]: array([ 2, 3, 10, -1, 12, 1, 2, 4, 4])
But when converted to csr format, it sums repeated indices:
In [413]: c1=c0.tocsr()
In [414]: c1.data
Out[414]: array([ 4, 7, 14, -1, 12, 1], dtype=int32)
In [415]: c1.A
Out[415]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 4, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, -1, 12, 1]], dtype=int32)
That summation is also done when converting the coo to dense or array, c0.A.
and when converting to lil:
In [419]: cl=c0.tolil()
In [420]: cl.data
Out[420]: array([[], [4, 7, 14, -1, 12, 1]], dtype=object)
In [421]: cl.rows
Out[421]: array([[], [3, 4, 5, 14, 15, 16]], dtype=object)
lil_matrix does not accept the (data,(i,j)) input directly, so you have to go through coo if that is your target.
http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.coo_matrix.html
By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)
To do this as an insertion in an existing lil use an intermediate csr:
In [443]: L=sparse.lil_matrix((3,17),dtype=S.dtype)
In [444]: L[-1,:]=sparse.csr_matrix((S,(np.zeros(S.shape),j)))
In [445]: L.A
Out[445]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 4, 7, 14, 0, 0, 0, 0, 0, 0, 0, 0, -1, 12, 1]])
This statement is faster than the one using csr_matrix;
L[-1,:]=sparse.coo_matrix((S,(np.zeros(S.shape),j)))
Examine L.__setitem__ if you are really worried about speed. Off hand it looks like it normally converts a sparse matrix to array
L[-1,:]=sparse.coo_matrix((S,(np.zeros(S.shape),j))).A
takes the same time. With a small test case like this, the overhead of creating an intermediate matrix can swamp any time spent adding these duplicate indices.
In general, inserting or appending values to an existing sparse matrix is slow, regardless of whether you do this summation or not. Where possible it is best to create the data, i and j arrays for the whole matrix first, and then make the sparse matrix.
You could use a defaultdict that maps the M column indices to their value and use the map function to update this defaultdict, like so:
from collections import defaultdict
d = defaultdict(int) #Use your array type here
def f(j, s):
d[j] += s
map(f, j, S)
M[-1, d.keys()] = d.values() #keys and values are always in the same order
Instead of map, you can use filter if you don't want to create a list of None uselessly:
d = defaultdict(int) #Use your array type here
def g(e):
d[e[1]] += S[e[0]]
filter(g, enumerate(j))
M[-1, d.keys()] = d.values() #keys and values are always in the same
import numpy as np
data = np.array([[0, 0, 1, 1, 2, 2],
[1, 0, 0, 1, 2, 2],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 2, 0]])
How can I do the followings?
Within 2 by 2 patch:
if any element is 2: put 2
if any element is 1: put 1
if all elements are 0: put 0
The expected result is:
np.array([[1, 1, 2],
[1, 1, 2]])
Using extract_patches from scikit-learn you can write this as follows (copy and paste-able code):
import numpy as np
from sklearn.feature_extraction.image import extract_patches
data = np.array([[0, 0, 1, 1, 2, 2],
[1, 0, 0, 1, 2, 2],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 2, 0]])
patches = extract_patches(data, patch_shape=(2, 2), extraction_step=(2, 2))
output = patches.max(axis=-1).max(axis=-1)
Explanation: extract_patches gives you a view on patches of your array, of size patch_shape and lying on a grid of extraction_step. The result is a 4D array where the first two axes index the patch and the last two axes index the pixels within the patch. We then evaluate the maximum over the last two axes to obtain the maximum per patch.
EDIT This is actually very much related to this question
I'm not sure where you get your input from or where you are supposed to leave the output, but you can adapt this.
import numpy as np
data = np.array([[0, 0, 1, 1, 2, 2],
[1, 0, 0, 1, 2, 2],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 2, 0]])
def patchValue(i,j):
return max([data[i][j],
data[i][j+1],
data[i+1][j],
data[i+1][j+1]])
result = np.array([[0, 0, 0],
[0, 0, 0]])
for (v,i) in enumerate(range(0,4,2)):
for (w,j) in enumerate(range(0,6,2)):
result[v][w] = patchValue(i,j)
print(result)
Here's a rather lengthy one-liner that relies solely on reshaping, transposes, and taking maximums along different axes. It is fairly fast too.
data.reshape((-1,2)).max(axis=1).reshape((data.shape[0],-1)).T.reshape((-1,2)).max(axis=1).reshape((data.shape[1]/2,data.shape[0]/2)).T
Essentially what this does is reshape to take the maximum in pairs of two horizontally, then shuffle things around again and take the maximum in pairs of two vertically, ultimately giving the maximum of each block of 4, matching your desired output.
If the original array is large, and performance is an issue, the loops can be pushed down to numpy C code by manipulating the shape and strides of the original array to create the windows that you are acting on:
import numpy as np
from numpy.lib.stride_tricks import as_strided
data = np.array([[0, 0, 1, 1, 2, 2],
[1, 0, 0, 1, 2, 2],
[1, 0, 1, 0, 0, 0],
[1, 1, 0, 0, 2, 0]])
patch_shape = (2,2)
data_shape = np.array(data.shape)
# transform data to a 2x3 array of 2x2 patches/windows
# final shape of the computation on the windows can be calculated with:
# tuple(((data_shape-patch_shape) // patch_shape) + 1)
final_shape = (2,3)
# the shape of the windowed array can be calculated with:
# final_shape + patch_shape
newshape = (2, 3, 2, 2)
# the strides of the windowed array can be calculated with:
# tuple(np.array(data.strides) * patch_shape) + data.strides
newstrides = (48, 8, 24, 4)
# use as_strided to 'transform' the array
patch_array = as_strided(data, shape = newshape, strides = newstrides)
# flatten the windowed array for iteration - dim of 6x2x2
# the number of windows is the product of the 'first' dimensions of the array
# which can be calculated with:
# (np.product(newshape[:-len(patch_shape)])) + (newshape[-len(patch_array):])
dim = (6,2,2)
patch_array = patch_array.reshape(dim)
# perfom computations on the windows and reshape to final dimensions
result = [2 if np.any(patch == 2) else
1 if np.any(patch == 1) else
0 for patch in patch_array]
result = np.array(result).reshape(final_shape)
A generalized 1-d function for creating the windowed array can be found at Efficient rolling statistics with NumPy
A generalised multi-dimension function and a nice explanation can be found at Efficient Overlapping Windows with Numpy
I have a sparse matrix (numpy.array) and I would like to have the index of the nonzero elements in it.
In Matlab I would write:
[i, j] = find(CM)
and in Python what should I do?
I have tried numpy.nonzero (but I don't know how to take the indices from that) and flatnonzero (but it's not convenient for me, I need both the row and column index).
Thanks in advance!
Assuming that by "sparse matrix" you don't actually mean a scipy.sparse matrix, but merely a numpy.ndarray with relatively few nonzero entries, then I think nonzero is exactly what you're looking for. Starting from an array:
>>> a = (np.random.random((5,5)) < 0.10)*1
>>> a
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
nonzero returns the indices (here x and y) where the nonzero entries live:
>>> a.nonzero()
(array([1, 2, 3]), array([4, 2, 0]))
We can assign these to i and j:
>>> i, j = a.nonzero()
We can also use them to index back into a, which should give us only 1s:
>>> a[i,j]
array([1, 1, 1])
We can even modify a using these indices:
>>> a[i,j] = 2
>>> a
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 2],
[0, 0, 2, 0, 0],
[2, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
If you want a combined array from the indices, you can do that too:
>>> np.array(a.nonzero()).T
array([[1, 4],
[2, 2],
[3, 0]])
(there are lots of ways to do this reshaping; I chose one almost at random.)
This goes slightly beyond what you as and I only mention it since I once faced a similar problem. If you want the indices to access some other array there is some very simple sytax:
import numpy as np
array = np.random.randint(0, 2, size=(3, 3))
data = np.random.random(size=(3, 3))
Now array looks something like
>>> print array
array([[0, 1, 0],
[1, 0, 1],
[1, 1, 0]])
while data could be
>>> print data
array([[ 0.92824816, 0.43605604, 0.16627849],
[ 0.00301434, 0.94342538, 0.95297402],
[ 0.32665135, 0.03504204, 0.86902492]])
Then if we want the elements of data which are zero:
>>> print data[array==0]
array([ 0.92824816, 0.16627849, 0.94342538, 0.86902492])
Which is nice and simple.