I have a batch of 20 flattened tensors representing 256x256 images.
>>> imgs.shape
(20, 65536)
Each image was split into 32x32 patches (64 patches per image). I have calculated a score for each patch, giving a tensor of shape (20, 64).
I would like to multiply each pixel by the score of the patch it belongs to.
imgs * score raises an error, and score.repeat(1, 1, 64) didn't repeat the scores in a way that assigns each pixel the score of its patch.
How can this be achieved?
EDIT:
A simple example:
import torch
img_size = 4
patch_size = 2
img = torch.rand((2,img_size,img_size)) # (2,4,4)
score = torch.tensor([[1,2,3,4],[5,6,7,8]]) # (2,4)
And trying to achieve
score = [[1,1,3,3],[2,2,4,4],[5,5,6,6],[7,7,8,8]]
I would suggest reshaping your scores array to preserve information about how it relates to the original image, then using repeat_interleave() twice.
Example:
import torch
img_size = 4
patch_size = 2
patches_per_axis = int(img_size / patch_size)
num_images = 2
img = torch.rand((2,img_size,img_size)) # (2,4,4)
score = torch.tensor([[1,2,3,4],[5,6,7,8]]) # (2,4)
def expand_scores(scores):
    # Unflatten scores into a (num_images, patches_per_axis, patches_per_axis) grid
    scores = scores.reshape((num_images, patches_per_axis, patches_per_axis))
    # Repeat scores to match the image dimensions, in the vertical direction
    scores = scores.repeat_interleave(repeats=patch_size, dim=1)
    # Repeat scores to match the image dimensions, in the horizontal direction
    scores = scores.repeat_interleave(repeats=patch_size, dim=2)
    # Optional: use reshape() to re-flatten the scores. If you do that here,
    # you'll need to do it to the image tensor too.
    return scores
(I added two constants, num_images and patches_per_axis, at the top of your example. In your original setup these would be 20 and 8, respectively.)
When you call expand_scores(), you'll get the following output:
tensor([[[1, 1, 2, 2],
         [1, 1, 2, 2],
         [3, 3, 4, 4],
         [3, 3, 4, 4]],

        [[5, 5, 6, 6],
         [5, 5, 6, 6],
         [7, 7, 8, 8],
         [7, 7, 8, 8]]])
You can multiply that by the pixel values:
expand_scores(score) * img
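For the original shapes in the question the same approach works after unflattening the images. A minimal sketch; imgs and score are random stand-ins with the shapes from the question:
import torch

num_images = 20
img_size = 256
patch_size = 32
patches_per_axis = img_size // patch_size  # 8

imgs = torch.rand((num_images, img_size * img_size))     # (20, 65536)
score = torch.rand((num_images, patches_per_axis ** 2))  # (20, 64)

# Unflatten the images, expand the scores as above, multiply, then re-flatten
imgs_2d = imgs.reshape(num_images, img_size, img_size)
scores_2d = score.reshape(num_images, patches_per_axis, patches_per_axis)
scores_2d = scores_2d.repeat_interleave(patch_size, dim=1)
scores_2d = scores_2d.repeat_interleave(patch_size, dim=2)
weighted = (imgs_2d * scores_2d).reshape(num_images, -1)  # back to (20, 65536)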
Using Torch, I am trying to load a large set of images into the program. But as I approach 50,000 images, the kernel starts to crash, which I assume is due to a memory limitation. A minimal example of my code (results shown for 20,000 images):
print(f"Before starting to loop: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3} GB")
X_data = []
y_data = []
for path in paths:
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    X_data.append(np.array(img/255, dtype=np.uint8))
print(f"Before convert to numpy: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3} GB")
X_data = np.array(X_data)
print(f"Before shuffle: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3} GB")
shuffle_index = np.random.permutation(X_data.shape[0])
X_data = X_data[shuffle_index]
print(f"Before Convert to tensor: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3} GB")
X_data = torch.Tensor(X_data).view(-1, 3, 128, 128)
print(f"Before save: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3} GB")
torch.save(X_data, f"X_data.pt")
print(f"After save: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3} GB")
Gives the following memory information:
Before starting to loop: 0.26 GB
Before convert to numpy: 1.29 GB
Before shuffle: 2.28 GB
Before Convert to tensor: 2.28 GB
Before save: 5.22 GB
After save: 4.14 GB
Is there something I am doing inefficiently? I have tried playing around with skipping the intermediate steps, but both torch.cat and numpy.append are just way too slow.
Is it instead recommended to store the data as files in batches and then load each batch whenever it is going to be fed through the network? I cannot find any beginner guides on how to do that, and besides, 50,000 images of size 128x128x3 seems to be a rather small amount of data to be causing issues...
Two points:
Use an in-place shuffle instead of creating an index array with np.random.permutation. The latter creates new arrays, while the former does not:
np.random.default_rng().shuffle(X_data, axis=0)
Use torch.from_numpy to create the Tensor instead of torch.Tensor. In this way, the Tensor created will share memory with the numpy array:
X_data = torch.from_numpy(X_data).view(-1, 3, 128, 128)
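A minimal sketch of the loading pipeline from the question with both changes applied (the glob pattern is an assumption; paths stands for the question's list of image paths):
import glob

import cv2
import numpy as np
import torch

paths = glob.glob("images/*.png")  # assumption; use your own list of paths

X_data = []
for path in paths:
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    X_data.append(img)  # keep raw uint8; the question's img/255 followed by a uint8 cast would zero the data
X_data = np.array(X_data)

# Point 1: shuffle in place instead of fancy-indexing with a permutation
np.random.default_rng().shuffle(X_data, axis=0)

# Point 2: from_numpy shares memory with the array instead of copying it;
# permute moves HWC to CHW (the question's .view would scramble the layout)
X_data = torch.from_numpy(X_data).permute(0, 3, 1, 2)
torch.save(X_data, "X_data.pt")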
If you want to shuffle multiple arrays of the same length in the same order, you can build a random generator with the same seed for each (if you don't need the results to be reproducible, you can first use the default generator to generate the seed to be used):
>>> a1 = np.arange(10).repeat(2).reshape(-1, 2)
>>> a2 = np.arange(10)
>>> np.random.default_rng(12345).shuffle(a1, 0)
>>> np.random.default_rng(12345).shuffle(a2, 0)
>>> a1
array([[4, 4],
       [8, 8],
       [1, 1],
       [3, 3],
       [7, 7],
       [9, 9],
       [6, 6],
       [0, 0],
       [2, 2],
       [5, 5]])
>>> a2
array([4, 8, 1, 3, 7, 9, 6, 0, 2, 5])
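As for storing the data in batches on disk and loading it on demand (asked at the end of the question): the common PyTorch pattern is a torch.utils.data.Dataset that reads images lazily, with a DataLoader handling batching and shuffling, so the full set never sits in memory at once. A minimal sketch; the class and the float conversion are illustrative, not from the question:
import cv2
import torch
from torch.utils.data import Dataset, DataLoader

class LazyImageDataset(Dataset):  # hypothetical helper, not from the question
    def __init__(self, paths, labels):
        self.paths = paths
        self.labels = labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Each image is read from disk only when its batch is requested
        img = cv2.cvtColor(cv2.imread(self.paths[idx]), cv2.COLOR_BGR2RGB)
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255
        return img, self.labels[idx]

# loader = DataLoader(LazyImageDataset(paths, labels), batch_size=64, shuffle=True)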
I'm in the process of learning some ML concepts using OpenCV, and I have a piece of Python code that I was given to translate into C++. I have very basic knowledge of Python, and I've run into some syntax that I can't seem to find the meaning of.
I have a variable being passed into a method (whole method not shown) that comes from the result of cv2.imread(), so an image. In C++, it's of type Mat:
def preprocess_image(img, side = 96):
    min_side = min(img.shape[0], img.shape[1])
    img = img[:min_side, :min_side * 2]
I have a couple questions:
What does the syntax ":min_side" do?
What is that line doing in terms of the image?
I am assuming the input image is a matrix. In Python the image is generally read as a numpy array.
1. What does the syntax ":min_side" do?
It slices the list/array, in this case a matrix.
2. What is that line doing in terms of the image?
It crops the 2D array (i.e. the matrix/image).
A simple example of slicing:
x = np.array([[0, 1, 2],[3, 4, 5], [6, 7, 8]])
print(x)
out:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
Performing slicing on this matrix (image):
x[:2, :3]
Output after slicing:
array([[0, 1, 2],
       [3, 4, 5]])
A good source to read more about it would be straight from the source: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html
The line:
img = img[:min_side, :min_side * 2]
is cropping the image so that the resulting image is min_side in height and min_side * 2 in width. The colon preceding a variable name is Python's slicing syntax. Observe:
arr = [1, 2, 3, 4, 5, 6]
length = 4
print(arr[:length])
Output:
[1, 2, 3, 4]
:min_side is shorthand for 0:min_side, i.e. it produces a slice of the object from the start up to min_side. For example:
f = [2, 4, 5, 6, 8, 9]
f[:3] # returns [2,4,5]
img = img[:min_side, :min_side * 2] produces a crop of the image (which is a numpy array) from 0 to min_side along the height and from 0 to min_side * 2 along the width. Therefore the resulting image is min_side high and min_side * 2 wide.
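To see the effect on an image-shaped array, a small sketch (the array dimensions are made up for illustration):
import numpy as np

img = np.zeros((300, 800, 3))  # a dummy "image": height 300, width 800, 3 channels
min_side = min(img.shape[0], img.shape[1])  # 300

cropped = img[:min_side, :min_side * 2]
print(cropped.shape)  # (300, 600, 3): height min_side, width min_side * 2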
I'm trying to register two images that are rotated and translated versions of one another using OpenCV. Generally speaking, the procedure is (pseudocode):
a. IF1 = FFT2(I1); IF2 = FFT2(I2)
b. R_translation = (IF1).*(IF2_conjugate)
c. R_translation = R_translation./abs(R_translation)
d. r_translation = IFFT2(R_translation)
where the maximum of r_translation corresponds to the translation. Moving on to calculate the rotation, the abs value removes the translation part,
e. IF1_abs = abs(IF1); IF2_abs = abs(IF2)
Converting to Linear-Polar coordinates,
f. IF1_abs_pol = LINPOL(IF1_abs); IF2_abs_pol = LINPOL(IF2_abs)
g. IFF1 = FFT2(IF1_abs_pol); IFF2 = FFT2(IF2_abs_pol)
h. R_rot = (IFF1).*(IFF2_conjugate)
i. R_rot = R_rot./abs(R_rot)
j. r_rot = IFFT2(R_rot)
where the maximum of r_rot corresponds to the rotation. While for translation alone the cv2.phaseCorrelate function returns the expected result, for rotation it returns odd results, so I tried the following, starting from a NumPy sketch of the translation steps a-d.
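A minimal sketch (i1, i2 and the shift are made-up placeholders):
import numpy as np

# Placeholder images: i2 is i1 circularly shifted by (1, 2) pixels
i1 = np.random.rand(64, 64)
i2 = np.roll(i1, shift=(1, 2), axis=(0, 1))

IF1 = np.fft.fft2(i1)     # a.
IF2 = np.fft.fft2(i2)
R = IF1 * np.conj(IF2)    # b.
R = R / np.abs(R)         # c.
r = np.fft.ifft2(R).real  # d.

# The peak encodes the shift: here at (-1, -2) mod 64, i.e. (63, 62)
print(np.unravel_index(r.argmax(), r.shape))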
For the rotation experiment, I took two 5x5 numpy arrays which are rotated versions of one another, like so:
a = numpy.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
a = a.astype('float')/a.astype('float').max()
b = numpy.array([[5, 5, 5, 5, 5], [4, 4, 4, 4, 4], [3, 3, 3, 3, 3], [2, 2, 2, 2, 2], [1, 1, 1, 1, 1]])
b = b.astype('float') / b.astype('float').max()
First I calculated the phase correlation myself:
center_x = numpy.floor(a.shape[0] / 2.0)  # the x center of rotation (= x center of image)
center_y = numpy.floor(a.shape[1] / 2.0)  # the y center of rotation (= y center of image)
Mvalue = a.shape[1] / numpy.sqrt(
    ((a.shape[0] / 2.0) ** 2.0) + ((a.shape[1] / 2.0) ** 2.0))  # rotation radius
Calculating the FFT, taking the absolute value (losing the translation difference data if existed), and switching to Linear-Polar coordinates and normalizing:
a_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(a)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
b_polar = cv2.linearPolar(numpy.abs(numpy.fft.fft2(b)), (center_x, center_y), Mvalue, cv2.WARP_FILL_OUTLIERS)
a_polar = a_polar / a_polar.max()
b_polar = b_polar / b_polar.max()
Another FFT step, multiplying point wise, and IFFT back:
aff = numpy.fft.fft2(a_polar)
bff = numpy.fft.fft2(b_polar)
R = aff * numpy.ma.conjugate(bff)
R = R / numpy.absolute(R)
r = numpy.fft.ifft2(R).real
r = r/r.max()
yields,
[Image: Phase correlation for rotation, b with respect to a]
According to cv2.linearPolar(), the rows span the angle (in this case with a step size of 360/5 = 72 degrees) and the columns span the radius (from 0 to the maximum radius given in Mvalue). The maximum is evident at the last row (corresponding to approximately a -90 degree shift). So far so good.
The second method is using cv2.phaseCorrelate() directly,
r_direct = cv2.phaseCorrelate(a_polar, b_polar)
which yields,
[Image: Phase correlation for rotation, b with respect to a, direct method]
The first tuple is the x, y correlation shift (in pixels?), and the third number is the fit grade. When it is close to unity, the correlation result represents the data better (the blob around the maximum is more distinct).
Other than the fact that the result is not distinct enough (why?), the result is confusing...
Generally, the first FFT step in this 5x5 example was not necessary: if rotation is the only difference, one can switch to Linear-Polar coordinates immediately and use cv2.phaseCorrelate. In that case, the result is also confusing.
Any help would be appreciated :)
Thanks!
David
I have the following blurring kernel I need to apply to every pixel in an RGB image
[ 0.0625 0.25 0.375 0.25 0.0625 ]
So, the pseudo-code looks something like this in Numpy
for i in range(rows):
    for j in range(cols):
        for k in range(3):
            final[i][j][k] = image[i-2][j][k]*0.0625 + \
                             image[i-1][j][k]*0.25 + \
                             image[i][j][k]*0.375 + \
                             image[i+1][j][k]*0.25 + \
                             image[i+2][j][k]*0.0625
I've tried searching for a question similar to this but never found these sort of data accesses in the computation.
How do I perform the above function for a Theano tensor matrix?
You can use the conv2d function for this task (see the Theano documentation and tutorials for conv2d). Notes for this solution:
Because your kernel is symmetrical, you can ignore the filter_flip parameter.
conv2d takes 4D input and kernel shapes as parameters, so you need to reshape them first.
conv2d sums over every channel (in your case the k variable indexes the RGB channels), so you should separate the channels first.
This is my example code; I use a simpler kernel here:
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d
# original image
img = [[[1, 2, 3, 4],   # R channel
        [1, 1, 1, 1],
        [2, 2, 2, 2]],
       [[1, 1, 1, 1],   # G channel
        [2, 2, 2, 2],
        [1, 2, 3, 4]],
       [[1, 1, 1, 1],   # B channel
        [1, 2, 3, 4],
        [2, 2, 2, 2]]]
# separate and reshape each channel to 4D
R = np.asarray([[img[0]]], dtype='float32')
G = np.asarray([[img[1]]], dtype='float32')
B = np.asarray([[img[2]]], dtype='float32')
# 4D kernel from the original [1, 0, 1], shaped (1, 1, 1, 3) to match filter_shape
kernel = np.asarray([[[[1, 0, 1]]]], dtype='float32')
# theano convolution
t_img = T.ftensor4("t_img")
t_kernel = T.ftensor4("t_kernel")
result = conv2d(
    input=t_img,
    filters=t_kernel,
    filter_shape=(1, 1, 1, 3),
    border_mode='half')
f = theano.function([t_img,t_kernel],result)
# compute each channel
R = f(R,kernel)
G = f(G,kernel)
B = f(B,kernel)
# reshape again
img = np.asarray([R,G,B])
img = np.reshape(img,(3,3,4))
print(img)
If you have anything to discuss about the code, please comment. Hope it helps.
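For the actual 5-tap kernel in the question, which runs along the vertical (i) axis, the same pattern should work with a (1, 1, 5, 1) filter. A sketch under that assumption, reusing t_img and t_kernel from the example above:
# Vertical 5-tap kernel from the question, reshaped to 4D:
# (num_filters, channels, kernel_height, kernel_width) = (1, 1, 5, 1)
blur = np.asarray([0.0625, 0.25, 0.375, 0.25, 0.0625], dtype='float32')
kernel5 = blur.reshape(1, 1, 5, 1)

result5 = conv2d(
    input=t_img,
    filters=t_kernel,
    filter_shape=(1, 1, 5, 1),
    border_mode='half')  # 'half' keeps the output the same size as the input
f5 = theano.function([t_img, t_kernel], result5)
# then apply f5 to each 4D channel array, exactly as f was applied above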
I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.
I suspect there is numpy, scipy, or pandas functionality to do this.
example:
data = [4,2,5,6,7,5,4,3,5,7]
for a bin size of 2:
bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]
for a bin size of 3:
bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
Just use reshape and then mean(axis=1).
As the simplest possible example:
import numpy as np
data = np.array([4,2,5,6,7,5,4,3,5,7])
print(data.reshape(-1, 2).mean(axis=1))
More generally, we'd need to do something like this to drop the last bin when it's not an even multiple:
import numpy as np
width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
print(result)
Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:
In [33]: data.reshape(2, -1)
Out[33]:
array([[4, 2, 5, 6, 7],
       [5, 4, 3, 5, 7]])
In [34]: data.reshape(2, -1).mean(0)
Out[34]: array([ 4.5,  3. ,  4. ,  5.5,  7. ])
Actually this will only work if the size of data is divisible by n, and note that reshape(2, -1).mean(0) averages strided elements (data[i] with data[i+5]) rather than consecutive bins.
Looks like Joe Kington has an answer above that handles the contiguous case.
Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:
data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]
# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]
# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]
# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
I just wrote a function to apply it to any array size or dimension you want:
data is your array
axis is the axis you want to bin
binstep is the number of points between each bin (allows overlapping bins)
binsize is the size of each bin
func is the function you want to apply to the bin (np.max for max pooling, np.mean for an average, ...)
import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    argdims = np.arange(data.ndim)
    # Move the axis to bin to the front
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # Keep only bins that fit entirely inside the array, so overlapping
    # bins (binsize > binstep) do not run past the end
    nbins = (dims[axis] - binsize) // binstep + 1
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep + binsize)), 0), 0)
            for i in np.arange(nbins)]
    data = np.array(data).transpose(argdims)
    return data
In your case it will be:
data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)
or for the bin size of 3:
bin_data_mean = binArray(data, 0, 3, 3, np.mean)
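Since binstep and binsize are independent, the same function also produces overlapping (sliding-window) bins, for example a 3-point window moved one point at a time:
data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
sliding_mean = binArray(data, 0, 1, 3, np.mean)  # 8 overlapping bins
# -> approximately [3.67, 4.33, 6.0, 6.0, 5.33, 4.0, 4.0, 5.0]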