I was reading the following statement about how convolution is equivariant with respect to translation from the Deep Learning Book.
Let g be a function mapping one image function to another image
function, such that I'=g(I) is the image function with I'(x, y)
=I(x−1, y). This shifts every pixel of I one unit to the right. If we apply this transformation to I, then apply convolution, the result will
be the same as if we applied convolution to I', then applied the
transformation g to the output.
For the last line I bolded, they are applying convolution to I', but shouldn't this be I? I' is the translated image. Otherwise it would effectively be saying:
f(g(I)) = g( f(g(I)) )
where f is convolution & g is translation.
I am trying to reproduce this myself in Python, using a 3D kernel whose depth equals the depth of the image, as would be the case in a convolution layer for a colored image (here, a house).
Here is my code for applying a translation & then convolution to an image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.misc
import scipy.ndimage

I = scipy.ndimage.imread('pics/house.jpg')

def convolution(A, B):
    return np.sum(np.multiply(A, B))

k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]])  # kernel

## Translation
translated = 100
new_I = np.zeros((I.shape[0]-translated, I.shape[1], I.shape[2]))
for i in range(translated, I.shape[0]):
    for j in range(I.shape[1]):
        for l in range(I.shape[2]):
            new_I[i-translated, j, l] = I[i, j, l]

## Convolution
conv = np.zeros((int((new_I.shape[0]-3)/2), int((new_I.shape[1]-3)/2)))
for i in range(conv.shape[0]):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(new_I[2*i:2*i+3, 2*j:2*j+3, :], k)

scipy.misc.imsave('pics/convoled_image_2nd.png', conv)
I get the following output:
Now, I switch the convolution and translation steps:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import scipy
import scipy.misc
import scipy.ndimage

I = scipy.ndimage.imread('pics/house.jpg')

def convolution(A, B):
    return np.sum(np.multiply(A, B))

k = np.array([[[0,1,-1],[1,-1,0],[0,0,0]], [[-1,0,-1],[1,-1,0],[1,0,0]], [[1,-1,0],[1,0,1],[-1,0,1]]])  # kernel

## Convolution
conv = np.zeros((int((I.shape[0]-3)/2), int((I.shape[1]-3)/2)))
for i in range(conv.shape[0]):
    for j in range(conv.shape[1]):
        conv[i, j] = convolution(I[2*i:2*i+3, 2*j:2*j+3, :], k)

## Translation
translated = 100
new_I = np.zeros((conv.shape[0]-translated, conv.shape[1]))
for i in range(translated, conv.shape[0]):
    for j in range(conv.shape[1]):
        new_I[i-translated, j] = conv[i, j]

scipy.misc.imsave('pics/conv_trans_image.png', new_I)
And now I get the following output:
Shouldn't they be the same according to the book? What am I doing wrong?
Just as the book says, the linearity properties of convolution and translation guarantee that their order is interchangeable, excepting boundary effects.
For instance:
import numpy as np
from scipy import misc, ndimage, signal

def translate(img, dx):
    img_t = np.zeros_like(img)
    if dx == 0:   img_t[:, :] = img[:, :]
    elif dx > 0:  img_t[:, dx:] = img[:, :-dx]
    else:         img_t[:, :dx] = img[:, -dx:]
    return img_t

def convolution(img, k):
    return np.sum([signal.convolve2d(img[:, :, c], k[:, :, c])
                   for c in range(img.shape[2])], axis=0)

img = ndimage.imread('house.jpg')
k = np.array([
    [[ 0,  1, -1], [1, -1, 0], [ 0, 0, 0]],
    [[-1,  0, -1], [1, -1, 0], [ 1, 0, 0]],
    [[ 1, -1,  0], [1,  0, 1], [-1, 0, 1]]])

ct = translate(convolution(img, k), 100)
tc = convolution(translate(img, 100), k)

misc.imsave('conv_then_trans.png', ct)
misc.imsave('trans_then_conv.png', tc)

if np.all(ct[2:-2, 2:-2] == tc[2:-2, 2:-2]):
    print('Equal!')
Prints:
Equal!
The problem is that you're overtranslating in the second example. After you shrink the image 2x, try translating by 50 instead.
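In code, that fix amounts to halving the shift in the translation block of the second snippet (a sketch; everything else stays the same):
## Translation after convolution: use half the shift, since the
## stride-2 convolution halved the spatial resolution
translated = 50
new_I = np.zeros((conv.shape[0] - translated, conv.shape[1]))
for i in range(translated, conv.shape[0]):
    for j in range(conv.shape[1]):
        new_I[i - translated, j] = conv[i, j]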
I have a mixture of three Gaussians and would like to compute the gradient of the log-density using PyTorch or TensorFlow. How can I do that?
from numpy import eye, log
from scipy.stats import multivariate_normal as MVN
μs = [[0, 0], [2, 0], [0, 2]] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
log_density = lambda x: log((sum([c * MVN.pdf(x) for (c, MVN) in zip(cs, MVNs)])))
Essentially I would like to compute the gradient of log_density. I tried using autograd.grad but it fails because of the array assignment.
Attempted Pytorch Solution
from torch import tensor, eye, sqrt, zeros, log, exp
from torch.distributions import MultivariateNormal as MVN
μs = [tensor([0, 0]), tensor([2, 0]), tensor([0, 2])] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
log_density = lambda x: log((sum([c * exp(MVN.log_prob(x)) for (c, MVN) in zip(cs, MVNs)])))
Attempted Autograd Solution (won't work)
from numpy import eye, log, zeros
from scipy.stats import multivariate_normal as MVN
from autograd import grad
μs = [[0, 0], [2, 0], [0, 2]] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
log_density = lambda x: log((sum([c * MVN.pdf(x) for (c, MVN) in zip(cs, MVNs)])))
gradient = grad(log_density)
# If you try using this gradient function you get an error
gradient(zeros(2))
The error I get is
ValueError: setting an array element with a sequence.
Naive Autograd Solution
There is, of course, a bad Autograd solution that won't scale well. For instance
from autograd.numpy import log, eye, zeros, array
from autograd.scipy.stats import multivariate_normal as MVN
from autograd import grad
μs = [[0, 0], [2, 0], [0, 2]] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
def log_density(x):
    return log((1/3) * MVN.pdf(x, zeros(2), eye(2)) +
               (1/3) * MVN.pdf(x, array([2, 0]), eye(2)) +
               (1/3) * MVN.pdf(x, array([0, 2]), eye(2)))
grad(log_density)(zeros(2)) # Works!
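As a hedged aside, the hard-coding seems avoidable by keeping the parameters in plain lists and looping over them inside log_density, calling the autograd-wrapped MVN.pdf directly instead of pre-built frozen scipy distributions (a sketch, assuming autograd.scipy.stats.multivariate_normal is available):
from autograd.numpy import log, eye, zeros, array
from autograd.scipy.stats import multivariate_normal as MVN
from autograd import grad

μs = [array([0., 0.]), array([2., 0.]), array([0., 2.])]  # Means
Σs = [eye(2), eye(2), eye(2)]                             # Covariance Matrices
cs = [1 / 3] * 3                                          # Mixture coefficients

def log_density(x):
    # Loop over components; autograd traces through plain Python loops.
    return log(sum(c * MVN.pdf(x, μ, Σ) for c, μ, Σ in zip(cs, μs, Σs)))

gradient = grad(log_density)
gradient(zeros(2))  # Appears to work, and scales with the number of components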
You can do
from torch import tensor, eye, sqrt, zeros, log, exp
from torch.distributions import MultivariateNormal as MVN
μs = [tensor([0, 0]), tensor([2, 0]), tensor([0, 2])] # Means
Σs = [eye(2), eye(2), eye(2)] # Covariance Matrices
cs = [1 / 3] * 3 # Mixture coefficients
MVNs = [MVN(μ, Σ) for (μ, Σ) in zip(μs, Σs)] # List of Gaussians
x = tensor((0.0,0.0), requires_grad=True)
log_density = log((sum([c * exp(MVN.log_prob(x)) for (c, MVN) in zip(cs, MVNs)])))
log_density.backward()
print(x.grad)
which will print the gradient at (0.0, 0.0). However, since PyTorch does not build a static computation graph, I could not find an easy way to calculate the gradient at another point without rebuilding the computation graph. You could try TensorFlow, which gives you more control over the computation graph and allows you to construct a graph for the gradient computation.
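That said, one hedged workaround on the PyTorch side is to wrap the rebuild in a small function, so the (tiny) graph is simply re-created at every query point; grad_log_density below is a name made up for illustration:
import torch
from torch.distributions import MultivariateNormal as MVN

μs = [torch.tensor([0., 0.]), torch.tensor([2., 0.]), torch.tensor([0., 2.])]  # Means
Σs = [torch.eye(2), torch.eye(2), torch.eye(2)]                                # Covariance Matrices
cs = [1 / 3] * 3                                                               # Mixture coefficients
MVNs = [MVN(μ, Σ) for μ, Σ in zip(μs, Σs)]

def grad_log_density(x):
    # Rebuild the small graph at each query point and differentiate it.
    x = x.clone().detach().requires_grad_(True)
    log_density = torch.log(sum(c * torch.exp(m.log_prob(x)) for c, m in zip(cs, MVNs)))
    return torch.autograd.grad(log_density, x)[0]

print(grad_log_density(torch.zeros(2)))          # gradient at (0, 0)
print(grad_log_density(torch.tensor([1., 0.])))  # gradient at (1, 0)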
Edit: With TensorFlow you could do something like:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import tensorflow_probability as tfp

@tf.function
def mygrad(x):
    print("building graph")
    us = tf.stack([tf.constant([0.0, 0.0]), tf.constant([2., 0.]), tf.constant([0., 2.])])
    covs = tf.stack([tf.eye(2), tf.eye(2), tf.eye(2)])
    cs = tf.constant([1 / 3] * 3)
    with tf.GradientTape() as gt:
        gt.watch(x)
        log_density = tf.math.log(tf.math.reduce_sum(tfp.distributions.MultivariateNormalTriL(us, covs).prob(x) * cs))
    return gt.gradient(log_density, x)

print(mygrad(tf.constant([0.0, 0.0])).numpy())  # gradient at 0.0,0.0
print(mygrad(tf.constant([1.0, 0.0])).numpy())  # gradient at 1.0,0.0
Essentially you do automatic differentiation with the tf.GradientTape and capture the computation graph in a tf.function. There is more background information in the very extensive TensorFlow API documentation.
I'm implementing a 2d periodic convolution on a synthetic image in three different ways: using scipy, using torch and using the Fourier transform (also under torch framework).
However, I've got different results. Performing the operation by hand I can see that scipy's convolution yields the correct results. torch's spatial version, on the other hand, yields the expected result inverted. Finally, the Fourier version returns something unexpected.
The code is the following:
import torch
import numpy as np
import scipy.signal as sig
import torch.nn.functional as F
import matplotlib.pyplot as plt

def numpy_periodic_conv(f, k):
    H, W = f.shape
    periodic_f = np.hstack([f, f])
    periodic_f = np.vstack([periodic_f, periodic_f])
    conv = sig.convolve2d(periodic_f, k, mode='same')
    conv = conv[H // 2:-H // 2, W // 2:-W // 2]
    return periodic_f, conv

def torch_periodic_conv(f, k):
    H, W = f.shape[-2:]
    periodic_f = f.repeat(1, 1, 2, 2)
    conv = F.conv2d(periodic_f, k, padding=1)
    conv = conv[:, :, H // 2:-H // 2, W // 2:-W // 2]
    return periodic_f.squeeze().numpy(), conv.squeeze().numpy()

def torch_fourier_conv(f, k):
    pad_x = f.shape[-2] - k.shape[-2]
    pad_y = f.shape[-1] - k.shape[-1]
    expanded_kernel = F.pad(k, [0, pad_x, 0, pad_y])
    fft_x = torch.rfft(f, 2, onesided=False)
    fft_kernel = torch.rfft(expanded_kernel, 2, onesided=False)
    real = fft_x[:, :, :, :, 0] * fft_kernel[:, :, :, :, 0] - \
           fft_x[:, :, :, :, 1] * fft_kernel[:, :, :, :, 1]
    im = fft_x[:, :, :, :, 0] * fft_kernel[:, :, :, :, 1] + \
         fft_x[:, :, :, :, 1] * fft_kernel[:, :, :, :, 0]
    fft_conv = torch.stack([real, im], -1)  # (a+bj)*(c+dj) = (ac-bd)+(ad+bc)j
    ifft_conv = torch.irfft(fft_conv, 2, onesided=False)
    return expanded_kernel.squeeze().numpy(), ifft_conv.squeeze().numpy()
if __name__ == '__main__':
    f = np.concatenate([np.ones((10, 5)), np.zeros((10, 5))], 1)
    k = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])

    f_tensor = torch.from_numpy(f).unsqueeze(0).unsqueeze(0).float()
    k_tensor = torch.from_numpy(k).unsqueeze(0).unsqueeze(0).float()

    np_periodic_f, np_periodic_conv = numpy_periodic_conv(f, k)
    tc_periodic_f, tc_periodic_conv = torch_periodic_conv(f_tensor, k_tensor)
    tc_fourier_k, tc_fourier_conv = torch_fourier_conv(f_tensor, k_tensor)

    print('Spatial numpy conv shape= ', np_periodic_conv.shape)
    print('Spatial torch conv shape= ', tc_periodic_conv.shape)
    print('Fourier torch conv shape= ', tc_fourier_conv.shape)

    r_np = dict(name='numpy', im=np_periodic_f, k=k, conv=np_periodic_conv)
    r_torch = dict(name='torch', im=tc_periodic_f, k=k, conv=tc_periodic_conv)
    r_fourier = dict(name='fourier', im=f, k=tc_fourier_k, conv=tc_fourier_conv)
    titles = ['{} im', '{} kernel', '{} conv']
    results = [r_np, r_torch, r_fourier]

    fig, axs = plt.subplots(3, 3)
    for i, r_dict in enumerate(results):
        axs[i, 0].imshow(r_dict['im'], cmap='gray')
        axs[i, 0].set_title(titles[0].format(r_dict['name']))
        axs[i, 1].imshow(r_dict['k'], cmap='gray')
        axs[i, 1].set_title(titles[1].format(r_dict['name']))
        axs[i, 2].imshow(r_dict['conv'], cmap='gray')
        axs[i, 2].set_title(titles[2].format(r_dict['name']))
    plt.show()
The results I'm obtaining:
Note: The images for both the numpy and torch versions show the periodic image, which is required to perform the periodic convolution. The kernel for the Fourier version shows the original kernel zero-padded to the image size, which is required to compute the element-wise multiplication in the frequency domain.
Edit 1: There was an error in the multiplication in the Fourier version: I was computing (ac-bd)+(ad-bc)j instead of (ac-bd)+(ad+bc)j. But now I get the convolution shifted by one column.
Edit 2: torch's spatial convolution result is inverted because the operation is actually a cross-correlation. This was confirmed on the official PyTorch forum here. Furthermore, after fixing the kernel padding as in Cris Luengo's answer, the frequency method yielded the same results as the correlations. This is rather strange to me because, as far as I know, the frequency-domain property holds for convolution, not correlation.
New-results after fixing the kernel:
The FFT result is wrong because the padding is wrong. When padding, you need to put the origin (center of the kernel) at the top-left corner of the image. See this other answer for details.
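For a rough idea of what that looks like in code, here is a hedged numpy sketch (pad_kernel_for_fft is a made-up helper name): zero-pad the kernel to the image size, then roll it so its center lands at index (0, 0).
import numpy as np

def pad_kernel_for_fft(k, H, W):
    # Zero-pad the kernel to the full image size (H, W) ...
    kh, kw = k.shape
    padded = np.zeros((H, W))
    padded[:kh, :kw] = k
    # ... then circularly shift it so the kernel's center sits at (0, 0).
    return np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))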
The difference between the other two is the difference between a convolution and a correlation. It looks like the "numpy" result is a convolution, the "torch" one a correlation.
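If a true convolution with F.conv2d is wanted (it actually computes a cross-correlation), a minimal hedged sketch is to flip the kernel in both spatial dimensions first; true_conv2d is a made-up helper name:
import torch
import torch.nn.functional as F

def true_conv2d(f, k, padding=1):
    # F.conv2d is a cross-correlation; flipping the kernel in both
    # spatial dimensions turns it into a convolution.
    k_flipped = torch.flip(k, dims=[-2, -1])
    return F.conv2d(f, k_flipped, padding=padding)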
My goal is to transform an image in such a way that three source points are mapped to three target points in an empty array. I have solved finding the correct affine matrix; however, I cannot apply an affine transformation to a color image.
More specifically, I am struggling with the correct use of the scipy.ndimage.interpolation.affine_transform method. As this question and its answers point out, the affine_transform method can be somewhat unintuitive (especially regarding offset calculation); however, user timday shows how to apply a rotation and a shearing to an image and position it in another array, while user geodata gives more background information.
My problem is to generalize the approach shown there (1) to color images and (2) to an arbitrary transformation which I calculated myself.
This is my code (which should run as is on your computer):
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt

def calcAffineMatrix(sourcePoints, targetPoints):
    # For three source- and three target points, find the affine transformation
    # Function works correctly, not part of the question
    A = []
    b = []
    for sp, trg in zip(sourcePoints, targetPoints):
        A.append([sp[0], 0, sp[1], 0, 1, 0])
        A.append([0, sp[0], 0, sp[1], 0, 1])
        b.append(trg[0])
        b.append(trg[1])
    result, resids, rank, s = np.linalg.lstsq(np.array(A), np.array(b))
    a0, a1, a2, a3, a4, a5 = result

    # Ignoring offset here, later use timday's suggested offset calculation
    affineTrafo = np.array([[a0, a1, 0], [a2, a3, 0], [0, 0, 1]], 'd')

    # Testing the correctness of transformation matrix
    for i, _ in enumerate(sourcePoints):
        src = sourcePoints[i]
        src.append(1.)
        trg = targetPoints[i]
        trg.append(1.)
        at = affineTrafo.copy()
        at[2, 0:2] = [a4, a5]
        assert(np.array_equal(np.round(np.array(src).dot(at)), np.array(trg)))
    return affineTrafo
# Prepare source image
sourcePoints = [[162., 112.], [130., 112.], [162., 240.]]
targetPoints = [[180., 102.], [101., 101.], [190., 200.]]
image = np.empty((300, 300, 3), dtype='uint8')
image[:] = 255
# Mark border for better visibility
image[0:2, :] = 0
image[-3:-1, :] = 0
image[:, 0:2] = 0
image[:, -3:-1] = 0
# Mark source points in red
for sp in sourcePoints:
    sp = [int(u) for u in sp]
    image[sp[1] - 5:sp[1] + 5, sp[0] - 5:sp[0] + 5, :] = np.array([255, 0, 0])
# Show image
plt.subplot(3, 1, 1)
plt.imshow(image)
# Prepare array in which the image is placed
array = np.empty((400, 300, 3), dtype='uint8')
array[:] = 255
a2 = array.copy()
# Mark target points in blue
for tp in targetPoints:
    tp = [int(u) for u in tp]
    a2[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
# Show array
plt.subplot(3, 1, 2)
plt.imshow(a2)
# Next 5 program lines are actually relevant for question:
# Calculate affine matrix
affineTrafo = calcAffineMatrix(sourcePoints, targetPoints)
# This follows the c_in-c_out method proposed in linked stackoverflow issue
# extended for color channel (no translation here)
c_in = np.array([sourcePoints[0][0], sourcePoints[0][1], 0])
c_out = np.array([targetPoints[0][0], targetPoints[0][1], 0])
offset = (c_in - np.dot(c_out, affineTrafo))
# Affine transform!
ndimage.interpolation.affine_transform(image, affineTrafo, order=2, offset=offset,
                                       output=array, output_shape=array.shape,
                                       cval=255)
# Mark blue target points in array, expected to be above red source points
for tp in targetPoints:
    tp = [int(u) for u in tp]
    array[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
plt.subplot(3, 1, 3)
plt.imshow(array)
plt.show()
Other approaches I tried include working with the inverse, transpose or both of affineTrafo:
affineTrafo = np.linalg.inv(affineTrafo)
affineTrafo = affineTrafo.T
affineTrafo = np.linalg.inv(affineTrafo.T)
affineTrafo = np.linalg.inv(affineTrafo).T
In his answer, geodata shows how to calculate the matrix that affine_trafo needs to do a scaling and rotation:
If one wants a scaling S first and then a rotation R it holds that T=R*S and therefore T.inv=S.inv*R.inv (note the reversed order).
Which I tried to copy using matrix decomposition (decomposing my affine transformation via SVD into a rotation, a scaling and another rotation):
u, s, v = np.linalg.svd(affineTrafo[:2,:2])
uInv = np.linalg.inv(u)
sInv = np.linalg.inv(np.diag((s)))
vInv = np.linalg.inv(v)
affineTrafo[:2, :2] = uInv.dot(sInv).dot(vInv)
Again, without success.
For all of my results, it's not (only) an offset problem. It is clearly visible from the pictures that the relative positions of source and target points do not correspond.
I searched the web and stackoverflow and did not find an answer for my problem. Please help me! :)
I finally got it working thanks to AlexanderReynolds' hint to use another library. This is of course a workaround; I could not get it working using scipy's affine_transform, so I used OpenCV's cv2.warpAffine instead. In case this is helpful to anyone else, here is my code:
import numpy as np
import matplotlib.pyplot as plt
import cv2
# Prepare source image
sourcePoints = [[162., 112.], [130., 112.], [162., 240.]]
targetPoints = [[180., 102.], [101., 101.], [190., 200.]]
image = np.empty((300, 300, 3), dtype='uint8')
image[:] = 255
# Mark border for better visibility
image[0:2, :] = 0
image[-3:-1, :] = 0
image[:, 0:2] = 0
image[:, -3:-1] = 0
# Mark source points in red
for sp in sourcePoints:
    sp = [int(u) for u in sp]
    image[sp[1] - 5:sp[1] + 5, sp[0] - 5:sp[0] + 5, :] = np.array([255, 0, 0])
# Show image
plt.subplot(3, 1, 1)
plt.imshow(image)
# Prepare array in which the image is placed
array = np.empty((400, 300, 3), dtype='uint8')
array[:] = 255
a2 = array.copy()
# Mark target points in blue
for tp in targetPoints:
    tp = [int(u) for u in tp]
    a2[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
# Show array
plt.subplot(3, 1, 2)
plt.imshow(a2)
# Calculate affine matrix and transform image
M = cv2.getAffineTransform(np.float32(sourcePoints), np.float32(targetPoints))
array = cv2.warpAffine(image, M, array.shape[:2], borderValue=[255, 255, 255])
# Mark blue target points in array, expected to be above red source points
for tp in targetPoints:
    tp = [int(u) for u in tp]
    array[tp[1] - 2:tp[1] + 2, tp[0] - 2:tp[0] + 2] = [0, 0, 255]
plt.subplot(3, 1, 3)
plt.imshow(array)
plt.show()
Comments:
Interesting how it worked almost immediately after changing the library. After having spent more than a day trying to get it to work with scipy, this is a lesson to myself to change libraries faster.
In case someone wants to find a (least squares) approximation of an affine transformation based on more than three points, this is how you get the matrix that works with cv2.warpAffine:
Code:
def calcAffineMatrix(sourcePoints, targetPoints):
    # For three or more source and target points, find the affine transformation
    A = []
    b = []
    for sp, trg in zip(sourcePoints, targetPoints):
        A.append([sp[0], 0, sp[1], 0, 1, 0])
        A.append([0, sp[0], 0, sp[1], 0, 1])
        b.append(trg[0])
        b.append(trg[1])
    result, resids, rank, s = np.linalg.lstsq(np.array(A), np.array(b))
    a0, a1, a2, a3, a4, a5 = result
    affineTrafo = np.float32([[a0, a2, a4], [a1, a3, a5]])
    return affineTrafo
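For example, a hedged usage sketch with the point lists from above; the resulting 2x3 matrix drops straight into cv2.warpAffine (note that dsize is given as (width, height)):
M = calcAffineMatrix(sourcePoints, targetPoints)
warped = cv2.warpAffine(image, M, (300, 400), borderValue=[255, 255, 255])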
How could I smooth the x[1,3] and x[3,2] elements of the array,
x = np.array([[0,0,0,0,0],[0,0,0,1,0],[0,0,0,0,0],[0,0,1,0,0],[0,0,0,0,0]])
with two two-dimensional Gaussian functions of width 1 and 2, respectively? In essence I need a function that allows me to smooth single "point-like" array elements with Gaussians of differing widths, such that I get an array with smoothly varying values.
I am a little confused by the question you asked and the comments you have posted. It seems to me that you want to use scipy.ndimage.filters.gaussian_filter, but I don't understand what you mean by:
[...] gaussian functions with different sigma values to each pixel. [...]
In fact, since you use a 2-dimensional array x the gaussian filter will have 2 parameters. The rule is: one sigma value per dimension rather than one sigma value per pixel.
Here is a short example:
import matplotlib.pyplot as pl
import numpy as np
import scipy as sp
import scipy.ndimage

n = 200  # width/height of the array
m = 1000 # number of points
sigma_y = 3.0
sigma_x = 2.0

# Create input array
x = np.zeros((n, n))
i = np.random.choice(range(0, n * n), size=m)
x[i // n, i % n] = 1.0

# Plot input array
pl.imshow(x, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.savefig("array.png")

# Apply gaussian filter
sigma = [sigma_y, sigma_x]
y = sp.ndimage.filters.gaussian_filter(x, sigma, mode='constant')

# Display filtered array
pl.imshow(y, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.title(r"$\sigma_x = " + str(sigma_x) + r"\quad \sigma_y = " + str(sigma_y) + "$")
pl.savefig("smooth_array_" + str(sigma_x) + "_" + str(sigma_y) + ".png")
Here is the initial array:
Here are some results for different values of sigma_x and sigma_y:
This makes it possible to properly account for the influence of the second parameter of scipy.ndimage.filters.gaussian_filter.
However, according to the previous quote, you might be more interested in the assignment of different weights to each pixel. In this case, scipy.ndimage.filters.convolve is the function you are looking for. Here is the corresponding example:
import matplotlib.pyplot as pl
import numpy as np
import scipy as sp
import scipy.ndimage
# Arbitrary weights
weights = np.array([[0, 0, 1, 0, 0],
                    [0, 2, 4, 2, 0],
                    [1, 4, 8, 4, 1],
                    [0, 2, 4, 2, 0],
                    [0, 0, 1, 0, 0]],
                   dtype=float)
weights = weights / np.sum(weights[:])
y = sp.ndimage.filters.convolve(x, weights, mode='constant')
# Display filtered array
pl.imshow(y, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.savefig("smooth_array.png")
And the corresponding result:
I hope this will help you.
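Coming back to the literal example in the question (smoothing x[1, 3] with width 1 and x[3, 2] with width 2), a minimal sketch is to filter each point separately and add the results, since Gaussian filtering is linear:
import numpy as np
from scipy import ndimage

x = np.array([[0, 0, 0, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]], dtype=float)

# Isolate each "point-like" element and smooth it with its own sigma.
a = np.zeros_like(x); a[1, 3] = x[1, 3]
b = np.zeros_like(x); b[3, 2] = x[3, 2]
smoothed = ndimage.gaussian_filter(a, sigma=1) + ndimage.gaussian_filter(b, sigma=2)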
I tried to convert an image to HSV and back to RGB, but somehow I lost color information.
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
I also replicated the problem in the shell; just running the following after the imports gives the same result.
plt.imshow(
    matplotlib.colors.hsv_to_rgb(
        matplotlib.colors.rgb_to_hsv(mpimg.imread('go2.jpg'))
    )
)
Can you tell me what am I doing wrong?
Edit: this is only a partial solution; see the discussion at https://github.com/matplotlib/matplotlib/pull/2569
This is an integer division issue. numpy is serious about its types and seems not to respect from __future__ import division. The simple workaround is to convert your RGB values to floats before calling rgb_to_hsv, or to patch the function as such:
def rgb_to_hsv(arr):
    """
    convert rgb values in a numpy array to hsv values
    input and output arrays should have shape (M,N,3)
    """
    arr = arr.astype('float')  # <- add this line
    out = np.zeros(arr.shape, dtype=np.float)
    arr_max = arr.max(-1)
    ipos = arr_max > 0
    delta = arr.ptp(-1)
    s = np.zeros_like(delta)
    s[ipos] = delta[ipos] / arr_max[ipos]
    ipos = delta > 0
    # red is max
    idx = (arr[:, :, 0] == arr_max) & ipos
    out[idx, 0] = (arr[idx, 1] - arr[idx, 2]) / delta[idx]
    # green is max
    idx = (arr[:, :, 1] == arr_max) & ipos
    out[idx, 0] = 2. + (arr[idx, 2] - arr[idx, 0]) / delta[idx]
    # blue is max
    idx = (arr[:, :, 2] == arr_max) & ipos
    out[idx, 0] = 4. + (arr[idx, 0] - arr[idx, 1]) / delta[idx]
    out[:, :, 0] = (out[:, :, 0] / 6.0) % 1.0
    out[:, :, 1] = s
    out[:, :, 2] = arr_max
    return out
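A minimal sketch of the float workaround mentioned above, assuming the same 'go2.jpg' file; scaling to [0, 1] is my own addition so that imshow displays the float result correctly:
import matplotlib
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Cast the uint8 RGB image to float in [0, 1] before the HSV round trip.
rgb = mpimg.imread('go2.jpg').astype(float) / 255.0
plt.imshow(matplotlib.colors.hsv_to_rgb(matplotlib.colors.rgb_to_hsv(rgb)))
plt.show()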
This problem is reproducible for me (matplotlib 1.3.0). It looks like a bug to me. The issue seems to be that in the rgb_to_hsv step, the saturation is being dropped to zero. At least for most colours:
import numpy as np
darkgreen = np.array([[[0, 100, 0]]], dtype='uint8')
matplotlib.colors.rgb_to_hsv(darkgreen) # [0.33, 1., 100.], okay so far
darkgreen2 = np.array([[[10, 100, 10]]], dtype='uint8') # very similar colour
matplotlib.colors.rgb_to_hsv(darkgreen2) # [0.33, 0., 100.], S=0 means this is a shade of gray
I think the correct place to report bugs is on the github issue tracker.