I want to convolve an n-dimensional image which is conceptually periodic.
What I mean is the following: if I have a 2D image
>>> image2d = [[0,0,0,0],
... [0,0,0,1],
... [0,0,0,0]]
and I want to convolve it with this kernel:
>>> kernel = [[ 1,1,1],
... [ 1,1,1],
... [ 1,1,1]]
then I want the result to be:
>>> result = [[1,0,1,1],
... [1,0,1,1],
... [1,0,1,1]]
How can I do this in Python/NumPy/SciPy?
Note that I am not interested in creating the kernel, but mainly in the periodicity of the convolution, i.e. the three leftmost ones in the resulting image (if that makes sense).
This is already built in, with scipy.signal.convolve2d's optional boundary='wrap', which gives periodic boundary conditions as padding for the convolution. The mode option is set to 'same' here to make the output size match the input size.
In [1]: image2d = [[0,0,0,0],
... [0,0,0,1],
... [0,0,0,0]]
In [2]: kernel = [[ 1,1,1],
... [ 1,1,1],
... [ 1,1,1]]
In [3]: from scipy.signal import convolve2d
In [4]: convolve2d(image2d, kernel, mode='same', boundary='wrap')
Out[4]:
array([[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 1, 1]])
The only disadvantage here is that you cannot use scipy.signal.fftconvolve which is often faster.
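If you do need an FFT-based routine, one workaround (a sketch, not a built-in option) is to apply the periodic padding yourself with np.pad(..., mode='wrap') and then run fftconvolve with mode='valid', which keeps only the fully overlapping windows:
import numpy as np
from scipy.signal import fftconvolve

image2d = np.array([[0, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 0, 0]])
kernel = np.ones((3, 3))

# Wrap-pad by half the kernel size (1 here, kernel_size // 2 in
# general) so the 'valid' convolution sees the periodic borders.
padded = np.pad(image2d, 1, mode='wrap')
result = fftconvolve(padded, kernel, mode='valid')
print(result.round())  # same values as the boundary='wrap' output above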
Another approach is to compute the circular convolution directly in the frequency domain: multiplying the FFT of the image by the FFT of the (zero-padded) kernel is equivalent to a circular convolution in the spatial domain.
import numpy as np

image2d = [[0,0,0,0,0],
           [0,0,0,1,0],
           [0,0,0,0,0],
           [0,0,0,0,0]]
kernel = [[1,1,1],
          [1,1,1],
          [1,1,1]]
image2d = np.asarray(image2d)
kernel = np.asarray(kernel)

# s=image2d.shape zero-pads the kernel to the image size before
# transforming; the product of the transforms is the circular
# convolution of the two arrays.
img_f = np.fft.fft2(image2d)
krn_f = np.fft.fft2(kernel, s=image2d.shape)
conv = np.fft.ifft2(img_f*krn_f).real
>>> conv.round()
array([[ 0., 0., 0., 0., 0.],
[ 1., 0., 0., 1., 1.],
[ 1., 0., 0., 1., 1.],
[ 1., 0., 0., 1., 1.]])
Note that the kernel is placed with its top left corner at the position of the 1 in the image. You would need to roll the result to get what you are after:
k_rows, k_cols = kernel.shape
conv2 = np.roll(np.roll(conv, -(k_cols//2), axis=-1),
-(k_rows//2), axis=-2)
>>> conv2.round()
array([[ 0., 0., 1., 1., 1.],
[ 0., 0., 1., 1., 1.],
[ 0., 0., 1., 1., 1.],
[ 0., 0., 0., 0., 0.]])
This kind of 'periodic convolution' is better known as circular or cyclic convolution. See http://en.wikipedia.org/wiki/Circular_convolution.
In the case of an n-dimensional image, as in this question, you can use the scipy.ndimage.convolve function. Its mode parameter can be set to 'wrap' for a circular convolution:
result = scipy.ndimage.convolve(image, kernel, mode='wrap')
>>> import numpy as np
>>> image = np.array([[0, 0, 0, 0],
... [0, 0, 0, 1],
... [0, 0, 0, 0]])
>>> kernel = np.array([[1, 1, 1],
... [1, 1, 1],
... [1, 1, 1]])
>>> from scipy.ndimage import convolve
>>> convolve(image, kernel, mode='wrap')
array([[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 1, 1]])
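Since the question asks about n-dimensional images, note that the same call works unchanged in higher dimensions. A quick sketch in 3D (the shapes here are chosen arbitrarily for illustration):
import numpy as np
from scipy.ndimage import convolve

image3d = np.zeros((3, 4, 4))
image3d[1, 3, 3] = 1           # a single voxel near a corner
kernel3d = np.ones((3, 3, 3))

# mode='wrap' applies periodic boundary conditions on every axis.
result = convolve(image3d, kernel3d, mode='wrap')
print(int(result.sum()))       # 27: the kernel wraps around intact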
I am working with a thematic raster of land-use classes. The goal is to split the raster into smaller tiles of a given size. For example, I have a raster that is 1490 pixels wide and I want to split it into tiles of 250x250 pixels. To get tiles of equal size, I would want to increase the width of the raster to 1500 pixels so that exactly 6 tiles fit. To do so, I need to increase the size of the raster by 10 pixels.
I am currently opening the raster with the rasterio library, which returns a NumPy ndarray. Is there a function to add a buffer around this array? The goal would be something like this:
import numpy as np
a = np.array([
[1,4,5],
[4,5,5],
[1,2,2]
])
a_with_buffer = a.buffer(a, 1) # 2nd argument refers to the buffer size
Then a_with_buffer would look as follows:
[[0,0,0,0,0],
 [0,1,4,5,0],
 [0,4,5,5,0],
 [0,1,2,2,0],
 [0,0,0,0,0]]
You can use np.pad:
>>> np.pad(a, 1)
array([[0, 0, 0, 0, 0],
[0, 1, 4, 5, 0],
[0, 4, 5, 5, 0],
[0, 1, 2, 2, 0],
[0, 0, 0, 0, 0]])
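By default np.pad fills with zeros (mode='constant'). For the raster use case in the question, it also accepts a separate (before, after) amount per axis; a sketch, assuming the 10 extra pixels should go on the right edge:
import numpy as np

raster = np.zeros((1490, 1490))  # stand-in for the array from rasterio
# ((rows_before, rows_after), (cols_before, cols_after))
padded = np.pad(raster, ((0, 0), (0, 10)))
print(padded.shape)  # (1490, 1500)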
You can also create an array of zeros with np.zeros and then insert a at the index you want, like below. Try this:
>>> a = np.array([[1,4,5],[4,5,5],[1,2,2]])
>>> b = np.zeros((5,5))
>>> b[1:1+a.shape[0],1:1+a.shape[1]] = a
>>> b
array([[0., 0., 0., 0., 0.],
[0., 1., 4., 5., 0.],
[0., 4., 5., 5., 0.],
[0., 1., 2., 2., 0.],
[0., 0., 0., 0., 0.]])
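The same idea, parameterized by the buffer size instead of hard-coding the output shape (a small sketch):
import numpy as np

a = np.array([[1, 4, 5], [4, 5, 5], [1, 2, 2]])
buf = 1  # buffer width on every side
b = np.zeros((a.shape[0] + 2 * buf, a.shape[1] + 2 * buf))
b[buf:buf + a.shape[0], buf:buf + a.shape[1]] = a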
Maybe this is a basic question about numpy, but I can't see how to do it. Let's say I have a 2D numpy array like this:
import numpy as np
arr = np.array([[ 0., 460., 166., 167., 123.],
[ 0., 0., 0., 0., 0.],
[ 0., 81., 0., 21., 0.],
[ 0., 128., 23., 0., 12.],
[ 0., 36., 0., 13., 0.]])
And I want the coordinates from the subarray
[[ 0., 21.,  0.],
 [23.,  0., 12.],
 [ 0., 13.,  0.]]
I tried slicing my original array and then finding the coordinates using np.argwhere, like this:
newarr = np.argwhere(arr[2:, 2:] != 0)
#output
#[[0 1]
# [1 0]
# [1 2]
# [2 1]]
Which are indeed the coordinates from the subarray but I was expecting the coordinates corresponding to my original array, the desired output is:
[[2 3]
[3 2]
[3 4]
[4 3]]
If I use np.argwhere with my original array I get a bunch of coordinates that I don't need, so I can't figure out how to get what I need. Any help, or a pointer in the right direction, would be great. Thank you!
Assume the origin is at the top-left corner of the matrix and the matrix itself is placed in the fourth quadrant of Cartesian space, with the horizontal axis holding the column indices and the vertical axis, pointing downward, holding the row indices.
The whole sub-matrix is then simply the original shifted so that its origin sits at coordinate (2, 2). The coordinates you get from np.argwhere are relative to the sub-matrix's origin, so to map them back to the original array, just add (2, 2) to every row:
>>> np.argwhere(arr[2:, 2:] != 0) + [2, 2]
array([[2, 3],
[3, 2],
[3, 4],
[4, 3]])
For other examples:
>>> col_shift, row_shift = 3, 2
>>> arr[row_shift:, col_shift:]
array([[21., 0.],
[ 0., 12.],
[13., 0.]])
>>> np.argwhere(arr[row_shift:, col_shift:] != 0) + [row_shift, col_shift]
array([[2, 3],
[3, 4],
[4, 3]])
For a sub-matrix that lies fully inside the array, you can bound the rows and columns as well:
>>> col_shift, row_shift = 0, 1
>>> col_bound, row_bound = 4, 4
>>> arr[row_shift:row_bound, col_shift:col_bound]
array([[ 0., 0., 0., 0.],
[ 0., 81., 0., 21.],
[ 0., 128., 23., 0.]])
>>> np.argwhere(arr[row_shift:row_bound, col_shift:col_bound] != 0) + [row_shift, col_shift]
array([[2, 1],
[2, 3],
[3, 1],
[3, 2]])
You have moved two steps down and two steps to the right in the array. All that remains is to add the number of steps taken along each axis back onto the coordinates:
y = 2  # rows skipped
x = 2  # columns skipped
newarr = np.argwhere(arr[y:, x:] != 0)
# Column 0 of argwhere's output holds row indices and column 1 holds
# column indices, so add the corresponding shift to each.
rows = (newarr[:, 0] + y).reshape(-1, 1)
cols = (newarr[:, 1] + x).reshape(-1, 1)
print(np.concatenate((rows, cols), axis=1))
This question already has answers here: Convert array of indices to one-hot encoded array in NumPy (22 answers).
Closed 5 years ago.
After running kmeans I can easily get an array with the assigned cluster for every data point. Now I want to get a membership matrix (one-hot array) which has the different clusters as columns and indicates the cluster assignment for each data point by either 1 or 0 in the matrix.
My code is shown below, and it works, but I am wondering if there is a more elegant way to do the same.
import numpy as np
from sklearn.cluster import KMeans

km = KMeans(n_clusters=3).fit(data)  # data: your samples array
membership_matrix = np.stack([np.where(km.labels_ == 0, 1, 0),
                              np.where(km.labels_ == 1, 1, 0),
                              np.where(km.labels_ == 2, 1, 0)],
                             axis=1)
You can create a 'one-hot array' equivalent to your membership matrix from the array of cluster labels, as described in the linked question. Here is how you do it using np.eye:
import numpy as np
clusters = np.array([2,1,2,2,0,1])
n_clusters = max(clusters) + 1
membership_matrix = np.eye(n_clusters)[clusters]
The output is as follows:
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.]])
Here's a method that's agnostic to the number of clusters you have (with your method, you'll have to "stack" more things if you have more clusters).
This code sample assumes you have six data points and three clusters:
NUM_DATA_POINTS = 6
NUM_CLUSTERS = 3
clusters = np.array([2,1,2,2,0,1]) # hard-coded as an example, but this is your KMeans output
# create your empty membership matrix
membership = np.zeros((NUM_DATA_POINTS, NUM_CLUSTERS))
membership[np.arange(NUM_DATA_POINTS), clusters] = 1
The key feature being used here is integer array indexing: in the last line of code above, we index into the rows of membership sequentially (np.arange creates an incrementing sequence from 0 to NUM_DATA_POINTS-1) and into the columns of membership using the cluster assignments (see the NumPy reference on advanced integer indexing).
It would produce the following membership matrix:
>>> membership
array([[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.]])
You are looking for LabelBinarizer. Give this code a try:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
membership_matrix = lb.fit_transform(km.labels_)
In contrast to other solutions proposed here, this approach:
Generates a compact membership matrix when the labels are not consecutive numbers.
Is able to deal with categorical labels.
Sample run:
In [9]: lb.fit_transform([0, 1, 2, 0, 2, 2])
Out[9]:
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]])
In [10]: lb.fit_transform([0, 1, 9, 0, 9, 9])
Out[10]:
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]])
In [11]: lb.fit_transform(['first', 'second', 'third', 'first', 'third', 'third'])
Out[11]:
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[1, 0, 0],
[0, 0, 1],
[0, 0, 1]])
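If you later need to map a membership matrix back to the original labels, LabelBinarizer also provides inverse_transform; a quick sketch reusing the lb object from above:
In [12]: m = lb.fit_transform(['first', 'second', 'third', 'first', 'third', 'third'])
In [13]: lb.inverse_transform(m)
Out[13]: array(['first', 'second', 'third', 'first', 'third', 'third'], dtype='<U6')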
I am looking for a way to binarize a numpy N-d array based on a threshold using only one expression. So I have something like this:
np.random.seed(0)
np.set_printoptions(precision=3)
a = np.random.rand(4, 4)
threshold, upper, lower = 0.5, 1, 0
a is now:
array([[ 0.02 , 0.833, 0.778, 0.87 ],
[ 0.979, 0.799, 0.461, 0.781],
[ 0.118, 0.64 , 0.143, 0.945],
[ 0.522, 0.415, 0.265, 0.774]])
Now I can run these two expressions:
a[a>threshold] = upper
a[a<=threshold] = lower
and achieve what I want:
array([[ 0., 1., 1., 1.],
[ 1., 1., 0., 1.],
[ 0., 1., 0., 1.],
[ 1., 0., 0., 1.]])
But is there a way to do this with just one expression?
We may consider np.where:
np.where(a>threshold, upper, lower)
Out[6]:
array([[0, 1, 1, 1],
[1, 1, 0, 1],
[0, 1, 0, 1],
[1, 0, 0, 1]])
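Note that with integer upper and lower values np.where returns an integer array, as shown above. If you want a float result like the one in the question, a one-line sketch using float constants:
np.where(a > threshold, 1.0, 0.0)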
NumPy treats every 1-d array as a vector, a 2-d array as a sequence of vectors (a matrix), and a 3-d+ array as a generic tensor. This means that operations are applied elementwise, as vector math. So you can just do:
>>> a = (a > 0.5).astype(np.int_)
For example:
>>> np.random.seed(0)
>>> np.set_printoptions(precision=3)
>>> a = np.random.rand(4, 4)
>>> a
array([[ 0.549,  0.715,  0.603,  0.545],
       [ 0.424,  0.646,  0.438,  0.892],
       [ 0.964,  0.383,  0.792,  0.529],
       [ 0.568,  0.926,  0.071,  0.087]])
>>> a = (a > 0.5).astype(np.int_)  # Where the numpy magic happens.
>>> a
array([[1, 1, 1, 1],
       [0, 1, 0, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 0]])
What's going on here is that the comparison is applied elementwise to the 4x4 matrix: each element greater than 0.5 yields True, and everything else yields False.
Then, by calling the .astype method with np.int_ as the argument, you tell numpy to replace all boolean values with their integer representation, in effect binarizing the matrix based on your comparison value.
A shorter method is to simply multiply the boolean matrix from the condition by 1 or 1.0, depending on the type you want.
>>> a = np.random.rand(4,4)
>>> a
array([[ 0.63227032, 0.18262573, 0.21241511, 0.95181594],
[ 0.79215808, 0.63868395, 0.41706148, 0.9153959 ],
[ 0.41812268, 0.70905987, 0.54946947, 0.51690887],
[ 0.83693151, 0.10929998, 0.19219377, 0.82919761]])
>>> (a>0.5)*1
array([[1, 0, 0, 1],
[1, 1, 0, 1],
[0, 1, 1, 1],
[1, 0, 0, 1]])
>>> (a>0.5)*1.0
array([[ 1., 0., 0., 1.],
[ 1., 1., 0., 1.],
[ 0., 1., 1., 1.],
[ 1., 0., 0., 1.]])
You can also write the expression directly; it will return a boolean array, which can be used as a 1-byte unsigned integer ("uint8") array in further calculations:
print(a > 0.5)
Output:
[[False True True True]
[ True True False True]
[False True False True]
[ True False False True]]
In one line, and with custom upper/lower values, you can write, for example:
upper = 10
lower = 3
threshold = 0.5
print(lower + (a > threshold) * (upper - lower))
Assume three arrays in numpy:
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])
b can now be used as an index for a and c. For example:
In [142]: c[b]
Out[142]: array([50, 50, 50, 1, 1])
Is there any way to add up the values associated with duplicate indices using this kind of indexing? With
a[b] = c
only the last values are stored:
array([ 100., 0., 0., 10., 0.])
I would like something like this:
a[b] += c
which would give
array([ 150., 0., 0., 16., 0.])
I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...
The += operator for NumPy arrays simply doesn't work the way you are hoping, and I'm not aware of a way of making it work that way. As a workaround I suggest using numpy.bincount():
>>> numpy.bincount(b, c)
array([ 150., 0., 0., 16.])
Just append zeros as needed (or pass minlength=len(a) to bincount to get the full length directly).
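For completeness: NumPy 1.8+ also provides np.add.at, which performs the addition unbuffered so that duplicate indices accumulate exactly as the question asks. A minimal sketch with the arrays from the question:
import numpy as np

a = np.zeros(5)
b = np.array([3, 3, 3, 0, 0])
c = np.array([1, 5, 10, 50, 100])

# Unlike a[b] += c, np.add.at does not buffer the writes, so every
# occurrence of a duplicate index contributes its value from c.
np.add.at(a, b, c)
print(a)  # [150.   0.   0.  16.   0.]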
You could do something like:
def sum_unique(label, weight):
    # Sort the rows of `label` lexicographically so that equal
    # labels end up adjacent.
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    # Mark the last occurrence of each run of identical labels.
    unique = np.ones(len(label), 'bool')
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    # The cumulative sum evaluated at each run's last occurrence,
    # differenced, gives the per-label totals.
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals
And use it like this:
In [110]: coord = np.random.randint(0, 3, (10, 2))
In [111]: coord
Out[111]:
array([[0, 2],
[0, 2],
[2, 1],
[1, 2],
[1, 0],
[0, 2],
[0, 0],
[2, 1],
[1, 2],
[1, 2]])
In [112]: weights = np.ones(10)
In [113]: uniq_coord, sums = sum_unique(coord, weights)
In [114]: uniq_coord
Out[114]:
array([[0, 0],
[1, 0],
[2, 1],
[0, 2],
[1, 2]])
In [115]: sums
Out[115]: array([ 1., 1., 2., 3., 3.])
In [116]: a = np.zeros((3,3))
In [117]: x, y = uniq_coord.T
In [118]: a[x, y] = sums
In [119]: a
Out[119]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
I just thought of this, it might be easier:
In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))
In [121]: sums = np.bincount(flat_coord, weights)
In [122]: a = np.zeros((3,3))
In [123]: a.flat[:len(sums)] = sums
In [124]: a
Out[124]:
array([[ 1., 0., 3.],
[ 1., 0., 3.],
[ 0., 2., 0.]])
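A variant of the same idea that avoids writing through a.flat: bincount's minlength argument (available since NumPy 1.6) guarantees one bin per grid cell, so the result can be reshaped directly. A small sketch with the coordinates from above:
import numpy as np

coord = np.array([[0, 2], [0, 2], [2, 1], [1, 2], [1, 0],
                  [0, 2], [0, 0], [2, 1], [1, 2], [1, 2]])
weights = np.ones(10)

flat_coord = np.ravel_multi_index(coord.T, (3, 3))
# minlength=9 ensures a bin for every cell of the 3x3 grid.
a = np.bincount(flat_coord, weights, minlength=9).reshape(3, 3)
print(a)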