Differences between OpenCV image processing and tf.image processing - python

I recently switched from OpenCV to TensorFlow's tf.image module for image processing. However, my validation accuracy dropped by around 10%.
I believe the issue is related to
cv2.imread() vs. tf.image.decode_jpeg()
cv2.resize() vs. tf.image.resize_images()
While these differences result in worse accuracy, the images seem indistinguishable to the human eye when viewed with plt.imshow(). For example, take Image #1 of the ImageNet Validation Dataset:
First issue:
cv2.imread() takes in a string and outputs a BGR 3-channel uint8 matrix
tf.image.decode_jpeg() takes in a string tensor and outputs an RGB 3-channel uint8 tensor.
However, after converting the tf tensor to BGR format, there are very slight differences at many pixels in the image.
Using tf.image.decode_jpeg and then converting to BGR
[[ 26 41 24 ..., 57 48 46]
[ 36 39 36 ..., 24 24 29]
[ 41 26 34 ..., 11 17 27]
...,
[ 71 67 61 ..., 106 105 100]
[ 66 63 59 ..., 106 105 101]
[ 64 66 58 ..., 106 105 101]]
Using cv2.imread
[[ 26 42 24 ..., 57 48 48]
[ 38 40 38 ..., 26 27 31]
[ 41 28 36 ..., 14 20 31]
...,
[ 72 67 60 ..., 108 105 102]
[ 65 63 58 ..., 107 107 103]
[ 65 67 60 ..., 108 106 102]]
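Here is roughly how I produce the comparison above (a simplified sketch of what is in the gist linked below; the file name is a placeholder):
import tensorflow as tf
import cv2

# Decode with TensorFlow, then reverse the channel order for comparison.
raw = tf.read_file('image.jpg')
rgb = tf.image.decode_jpeg(raw, channels=3)
with tf.Session() as sess:
    tf_bgr = sess.run(rgb)[:, :, ::-1]   # RGB -> BGR

# Decode the same file with OpenCV (already BGR).
cv_bgr = cv2.imread('image.jpg')
print((tf_bgr != cv_bgr).mean())         # fraction of differing pixel values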
Second issue:
tf.image.resize_images() automatically converts a uint8 tensor to a float32 tensor, and seems to exacerbate the differences in pixel values.
I believe that tf.image.resize_images() and cv2.resize() both use bilinear interpolation by default.
tf.image.resize_images
[[ 26. 25.41850281 35.73127747 ..., 81.85855103 59.45834351 49.82373047]
[ 38.33480072 32.90485001 50.90826797 ..., 86.28446198 74.88543701 20.16353798]
[ 51.27312469 26.86172867 39.52401352 ..., 66.86851501 81.12111664 33.37636185]
...,
[ 70.59472656 75.78851318 45.48100662 ..., 70.18637085 88.56777191 97.19295502]
[ 70.66964722 59.77249908 48.16699219 ..., 74.25527954 97.58244324 105.20263672]
[ 64.93395996 59.72298431 55.17600632 ..., 77.28720856 98.95108032 105.20263672]]
cv2.resize
[[ 36 30 34 ..., 102 59 43]
[ 35 28 51 ..., 85 61 26]
[ 28 39 50 ..., 59 62 52]
...,
[ 75 67 34 ..., 74 98 101]
[ 67 59 43 ..., 86 102 104]
[ 66 65 48 ..., 86 103 105]]
Here's a gist demonstrating the behavior just mentioned. It includes the full code for how I am processing the image.
So my main questions are:
Why are the outputs of cv2.imread() and tf.image.decode_jpeg() different?
How are cv2.resize() and tf.image.resize_images() different if they use the same interpolation scheme?
Thank you!

As vijay m correctly points out, by changing the dct_method to "INTEGER_ACCURATE" you will get the same uint8 image from cv2 and tf. The problem indeed seems to be the resizing method. I also tried to force TensorFlow to use the same interpolation method as cv2 uses by default (bilinear), but the results are still different. This might be because cv2 does the interpolation on integer values while TensorFlow converts to float before interpolating, but that is only a guess. If you plot the pixel-wise difference between the image resized by TF and by cv2, you get the following histogram:
Histogram of pixel-wise difference
As you can see, it looks pretty normally distributed. (I was also surprised by the amount of pixel-wise difference.) The problem of your accuracy drop could lie exactly here. In this paper, Goodfellow et al. describe the effect of adversarial examples on classification systems. I think the problem here is something similar: if the original weights you use for your network were trained using an input pipeline which gives the results of the cv2 functions, the image from the TF input pipeline is something like an adversarial example.
(See the image on page 3 at the top for an example...I can't post more than two links.)
So in the end I think if you want to use the original network weights on the same data they were trained on, you should stay with a similar/same input pipeline. If you use the weights to finetune the network on your own data, this should not be a big concern, because you retrain the classification layer to work with the new input images (from the TF pipeline).
@Ishant Mrinal: Please have a look at the code the OP provided in the gist. He is aware of the difference between BGR (cv2) and RGB (TF) and is converting the images to the same color space.
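For completeness, here is a minimal sketch (TF 1.x; the file name and target size are placeholders) of the two adjustments discussed in this answer - decoding with the accurate DCT method and forcing bilinear resizing:
import tensorflow as tf
import cv2
import numpy as np

raw = tf.read_file('image.jpg')
# INTEGER_ACCURATE makes the JPEG decode match OpenCV's uint8 output.
decoded = tf.image.decode_jpeg(raw, channels=3, dct_method='INTEGER_ACCURATE')
# Force bilinear interpolation, which is also cv2.resize's default...
resized = tf.image.resize_images(decoded, [224, 224],
                                 method=tf.image.ResizeMethod.BILINEAR)

with tf.Session() as sess:
    tf_bgr = sess.run(resized)[:, :, ::-1]   # RGB -> BGR for comparison

cv_bgr = cv2.resize(cv2.imread('image.jpg'), (224, 224)).astype(np.float32)

# ...but the results still differ, presumably because cv2 interpolates on the
# uint8 values while TensorFlow converts to float32 first.
print(np.abs(tf_bgr - cv_bgr).mean())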

Related

Normalization problem when running my image processing

I am trying to normalize my images and used the following code to do that:
img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
When I print the image using print(img), I get the following, as if no normalization was applied to the image:
[[199 204 205 ... 29 30 34]
[195 200 203 ... 30 30 32]
[190 195 200 ... 35 31 29]
...
[ 7 3 1 ... 16 16 15]
[ 19 13 7 ... 18 18 17]
[ 35 26 19 ... 18 20 19]]
I tried another approach, such as
img/255 or img/255.0.
I still see black images, and upon printing with print(img) I get the following:
[[0.78039216 0.8 0.80392157 ... 0.11372549 0.11764706 0.13333333]
[0.76470588 0.78431373 0.79607843 ... 0.11764706 0.11764706 0.1254902 ]
[0.74509804 0.76470588 0.78431373 ... 0.1372549 0.12156863 0.11372549]
I am kind of confused as to why I get black images?
...
You probably have very small areas with luminosity that is very close to 255. That will "halt" the normalization.
What you can do is use some kind of thresholding to remove, say, all intensities from 220 to 255 and map them to 220. If you normalize that, the points with intensity 220 will be driven up to 255, but this time the darker values will get amplified too.
However, I think you're likely to get better answers if you describe in more detail what you're trying to accomplish - what the image is, and to what end you want to normalize it.
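If you want to try the thresholding idea, here is a rough sketch (the 220 cutoff and the file name are just example values):
import cv2
import numpy as np

img = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)   # hypothetical input file

clipped = np.minimum(img, 220)                         # map 220..255 down to 220
stretched = cv2.normalize(clipped, None, 0, 255, cv2.NORM_MINMAX)

# The pixels that were clipped to 220 are now driven up to 255,
# and the darker values get amplified as well.
print(img.min(), img.max(), stretched.min(), stretched.max())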

Cannot retrieve Original image from Encrypted image In Python using PIL

I am writing a script that can encrypt and decrypt an image using the RSA algorithm. My public key is (7, 187) and my private key is (23, 187). The calculation for the encryption is correct: for an entry of 41 in the image matrix, the encrypted value is 46. But the decryption does not give the appropriate result: for 46 it gives 136, and every entry of 46 in the encrypted matrix becomes 136 in the decrypted matrix. I don't know why this is happening; when I do the same calculation in the Python prompt (or shell) it gives the correct answer.
In the script, I first convert the RGB image to grayscale and then to a 2D numpy array; then for each element I apply the RSA algorithm (the keys) and save the result as an image. Then I apply the decryption key to the encrypted matrix, and that is where the problem occurs. Here's the code:
from PIL import Image
import numpy as np
from pylab import *
#encryption
img1 = (Image.open('image.jpeg').convert('L'))
img1.show()
img = array((Image.open('image.jpeg').convert('L')))
a,b = img.shape #saving the no of rows and col in a tuple
print('\n\nOriginal image: ')
print(img)
print((a,b))
tup = a,b
for i in range(0, tup[0]):
    for j in range(0, tup[1]):
        img[i][j] = (pow(img[i][j], 7) % 187)
print('\n\nEncrypted image: ')
print(img)
imgOut = Image.fromarray(img)
imgOut.show()
imgOut.save('img.bmp')
#decryption
img2 = (Image.open('img.bmp'))
img2.show()
img3 = array(Image.open('img.bmp'))
print('\n\nEncrypted image: ')
print(img3)
a1,b1 = img3.shape
print((a1,b1))
tup1 = a1,b1
for i1 in range(0, tup1[0]):
    for j1 in range(0, tup1[1]):
        img3[i1][j1] = ((pow(img3[i1][j1], 23)) % 187)
print('\n\nDecrypted image: ')
print(img3)
imgOut1 = Image.fromarray(img3)
imgOut1.show()
print(type(img))
The values of the matrices:
Original image:
[[41 42 45 ... 47 41 33]
[41 43 45 ... 44 38 30]
[41 42 46 ... 41 36 30]
...
[43 43 44 ... 56 56 55]
[45 44 45 ... 55 55 54]
[46 46 46 ... 53 54 54]]
Encrypted image:
[[ 46 15 122 ... 174 46 33]
[ 46 87 122 ... 22 47 123]
[ 46 15 7 ... 46 9 123]
...
[ 87 87 22 ... 78 78 132]
[122 22 122 ... 132 132 164]
[ 7 7 7 ... 26 164 164]]
Decrypted image:
[[136 70 24 ... 178 136 164]
[136 111 24 ... 146 141 88]
[136 70 96 ... 136 100 88]
...
[111 111 146 ... 140 140 1]
[ 24 146 24 ... 1 1 81]
[ 96 96 96 ... 52 81 81]]
Any help will be greatly appreciated. Thank You.
I think you will get on better using the third parameter of the pow() function, which does the modular arithmetic internally for you.
Here is a little example without the complexity of loading images - just imagine it is a greyscale gradient from black to white.
# Make single row greyscale gradient from 0..255
img = [ x for x in range(256) ]
# Create encrypted version
enc = [ pow(x,7,187) for x in img ]
# Decrypt back to plaintext
dec = [ pow(x,23,187) for x in enc ]
It seems to decrypt back into the original values for everything below 187; above that it goes wrong - presumably because the modulus is 187, so plaintext values of 187 and above wrap around and cannot be recovered. Maybe someone cleverer than me will be able to explain that better - please add a comment for me if you know!
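Applied to the OP's arrays it could look something like this (just a sketch; note that numpy arrays do not accept the 3-argument pow() directly, so each element is converted to a plain Python int first):
import numpy as np
from PIL import Image

img = np.array(Image.open('image.jpeg').convert('L'), dtype=np.uint8)

encrypt = np.vectorize(lambda x: pow(int(x), 7, 187))
decrypt = np.vectorize(lambda x: pow(int(x), 23, 187))

enc = encrypt(img).astype(np.uint8)   # results are always < 187, so uint8 is safe
dec = decrypt(enc).astype(np.uint8)

# Only pixels below 187 round-trip exactly, because RSA with modulus 187
# can only represent plaintext values smaller than the modulus.
print(np.array_equal(dec[img < 187], img[img < 187]))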

Remove Specific Indices From 2D Numpy Array

If I have a set of data that's of shape (1000,1000) and I know that the values I need from it are contained within the indices (25:888,11:957), how would I go about separating the two sections of data from one another?
I couldn't figure out how to get np.delete() to like the specific 2D case and I also need both the good and the bad sections of data for analysis, so I can't just specify my array bounds to be within the good indices.
I feel like there's a simple solution I'm missing here.
Is this how you want to divide the array?
In [364]: arr = np.ones((1000,1000),int)
In [365]: beta = arr[25:888, 11:957]
In [366]: beta.shape
Out[366]: (863, 946)
In [367]: arr[:25,:].shape
Out[367]: (25, 1000)
In [368]: arr[888:,:].shape
Out[368]: (112, 1000)
In [369]: arr[25:888,:11].shape
Out[369]: (863, 11)
In [370]: arr[25:888,957:].shape
Out[370]: (863, 43)
I'm imagining a square with a rectangle cut out of the middle. It's easy to specify that rectangle, but the frame has to be viewed as 4 rectangles - unless it is described via a mask of what is missing.
Checking that I got everything:
In [376]: x = np.array([_366,_367,_368,_369,_370])
In [377]: np.multiply.reduce(x, axis=1).sum()
Out[377]: 1000000
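If you do want to work with the frame values directly, here is a small sketch of the "mask of what is missing" idea (plain script form rather than the IPython session above):
import numpy as np

arr = np.ones((1000, 1000), int)

frame_mask = np.ones(arr.shape, dtype=bool)
frame_mask[25:888, 11:957] = False      # the inner "good" rectangle is excluded

good = arr[25:888, 11:957]              # shape (863, 946)
bad_values = arr[frame_mask]            # all frame values, flattened to 1-D
print(good.size + bad_values.size)      # 1000000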
Let's say your original numpy array is my_arr
Extracting the "Good" Section:
This is easy because the good section has a rectangular shape.
good_arr = my_arr[25:888, 11:957]
Extracting the "Bad" Section:
The "bad" section doesn't have a rectangular shape. Rather, it has the shape of a rectangle with a rectangular hole cut out of it.
So, you can't really store the "bad" section alone, in any array-like structure, unless you're ok with wasting some extra space to deal with the cut out portion.
What are your options for the "Bad" Section?
Option 1:
Be happy and content with having extracted the good section. Let the bad section remain as part of the original my_arr. While iterating through my_arr, you can always discriminate between good and bad items based on the indices. The disadvantage is that, whenever you want to process only the bad items, you have to do it through a nested double loop, rather than use some vectorized features of numpy.
Option 2:
Suppose we want to perform some operations, such as row-wise or column-wise totals, on only the bad items of my_arr, and suppose you don't want the overhead of the nested for loops. You can create something called a numpy masked array. With a masked array, you can perform most of your usual numpy operations, and numpy will automatically exclude masked-out items from the calculations. Note that internally there will be some memory wastage involved, just to store an item as "masked".
The code below illustrates how you can create a masked array called masked_arr from your original array my_arr:
import numpy as np
my_size = 10        # In your case, 1000
r_1, r_2 = 2, 8     # In your case, r_1 = 25, r_2 = 888 (the slice stop, exclusive)
c_1, c_2 = 3, 5     # In your case, c_1 = 11, c_2 = 957 (the slice stop, exclusive)
# Using a nested list comprehension, build a boolean mask as a list of lists, of shape (my_size, my_size).
# The mask will have False everywhere, except in the sub-region [r_1:r_2, c_1:c_2], which will have True.
mask_list = [[True if ((r in range(r_1, r_2)) and (c in range(c_1, c_2))) else False
              for c in range(my_size)] for r in range(my_size)]
# Your original, complete 2d array. Let's just fill it with some "toy data"
my_arr = np.arange((my_size * my_size)).reshape(my_size, my_size)
print (my_arr)
masked_arr = np.ma.masked_where(mask_list, my_arr)
print ("masked_arr is:\n", masked_arr, ", and its shape is:", masked_arr.shape)
The output of the above is:
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]
masked_arr is:
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 -- -- 25 26 27 28 29]
[30 31 32 -- -- 35 36 37 38 39]
[40 41 42 -- -- 45 46 47 48 49]
[50 51 52 -- -- 55 56 57 58 59]
[60 61 62 -- -- 65 66 67 68 69]
[70 71 72 -- -- 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]] , and its shape is: (10, 10)
Now that you have a masked array, you will be able to perform most of the numpy operations on it, and numpy will automatically exclude the masked items (the ones that appear as "--" when you print the masked array)
Some examples of what you can do with the masked array:
# Now, you can print column-wise totals, of only the bad items.
print (masked_arr.sum(axis=0))
# Or row-wise totals, for that matter.
print (masked_arr.sum(axis=1))
The output of the above is:
[450 460 470 192 196 500 510 520 530 540]
[45 145 198 278 358 438 518 598 845 945]

Why is feed_dict constructed when running an epoch in the PTB tutorial on Tensorflow?

Q1: I am following this tutorial on Recurrent Neural Networks, and I am wondering why you need to create feed_dict in the following part of the code:
def run_epoch(session, model, eval_op=None, verbose=False):
    state = session.run(model.initial_state)
    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op
    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h
        vals = session.run(fetches, feed_dict)
I tested it, and it seems that if you remove this part of the code, the code still runs:
def run_epoch(session, model, eval_op=None, verbose=False):
    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op
    for step in range(model.input.epoch_size):
        vals = session.run(fetches)
So my question is why do you need to reset the initial state to zeros after you feed a new batch of data?
Q2: Also, from what I understand, using feed_dict is considered to be slow. That is why it is recommended to feed data using the tf.data API. Is using feed_dict also an issue in this case? If so, how is it possible to avoid using feed_dict in this example?
UPD: Thank you a lot, @jdehesa, for your detailed response. It helps a lot! Before I close this question and accept your answer, could you clarify one point that you mentioned when answering Q1?
I see now the purpose of feed_dict. However, I am not sure that it is something that is implemented in the tutorial. From what you say:
At the beginning of each epoch, the code first takes the default "zero state" and then goes on to a loop where the current state is given as initial, the model is run and the output state is set as new current state for the next iteration.
I just looked again into the source code of the tutorial, and I do not see where the output state is set as the new current state for the next iteration. Is it done implicitly somewhere, or am I missing something?
I may also be missing something on the theoretical side. Just to make sure that I understand it correctly, here is a quick example. Assume the input data is an array that stores integer values from 0 to 120. We set the batch size to 5, so the number of data points in one batch row is 24, and the number of time steps in the unrolled RNN is 10. In this case, you only use the data points at positions 0 to 20 of each row. Then you process the data in two steps (model.input.epoch_size = 2). When you iterate over model.input.epoch_size:
state = session.run(model.initial_state)
# ...
for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
    vals = session.run(fetches, feed_dict)
you feed a batch of data like this:
> Iteration (step) 1:
x:
[[ 0 1 2 3 4 5 6 7 8 9]
[ 24 25 26 27 28 29 30 31 32 33]
[ 48 49 50 51 52 53 54 55 56 57]
[ 72 73 74 75 76 77 78 79 80 81]
[ 96 97 98 99 100 101 102 103 104 105]]
y:
[[ 1 2 3 4 5 6 7 8 9 10]
[ 25 26 27 28 29 30 31 32 33 34]
[ 49 50 51 52 53 54 55 56 57 58]
[ 73 74 75 76 77 78 79 80 81 82]
[ 97 98 99 100 101 102 103 104 105 106]]
> Iteration (step) 2:
x:
[[ 10 11 12 13 14 15 16 17 18 19]
[ 34 35 36 37 38 39 40 41 42 43]
[ 58 59 60 61 62 63 64 65 66 67]
[ 82 83 84 85 86 87 88 89 90 91]
[106 107 108 109 110 111 112 113 114 115]]
y:
[[ 11 12 13 14 15 16 17 18 19 20]
[ 35 36 37 38 39 40 41 42 43 44]
[ 59 60 61 62 63 64 65 66 67 68]
[ 83 84 85 86 87 88 89 90 91 92]
[107 108 109 110 111 112 113 114 115 116]]
At each iteration, you construct a new feed_dict with the initial state of the recurrent units at zero. So you assume at each step that you start processing the sequence from scratch. Is that correct?
Q1. feed_dict is used in this case to set the initial state of the recurrent units. By default, on each call to run, recurrent units process data with an initial "zero" state. However, if your sequences are long you may need to split them into several steps. It is important that, after each step, you save the final state of the recurrent units and feed it as the initial state for the next step, otherwise it would be as if the next step were the beginning of the sequence again (in particular, if your output is only the final output of the network after processing the whole sequence, it would be like discarding all the data prior to the last step). At the beginning of each epoch, the code first takes the default "zero state" and then goes into a loop where the current state is given as the initial state, the model is run, and the output state is set as the new current state for the next iteration.
Q2. The claim that "feed_dict is slow" can be somewhat misleading when taken as a general truism (I am not blaming you for saying it, I have seen it many times too). The problem with feed_dict is that its function is to bring non-TensorFlow data (typically NumPy data) into the TensorFlow world. It is not that it is terrible at that, it is just that it takes some extra time to move the data around, which is especially noticeable when a lot of data is involved. For example, if you want to input a batch of images through feed_dict, you need to load them from disk, decode them, convert them to a big NumPy array and pass that into feed_dict, and then TensorFlow copies all the data into the session (GPU memory or whatever); so you have two copies of the data in memory plus additional memory exchanges. tf.data helps because it does everything within TensorFlow (which also reduces the number of Python/C trips and is sometimes more convenient in general).
In your case, what is being fed through feed_dict are the initial states of the recurrent units. Unless you have several quite big recurrent layers, I'd say the performance impact is probably rather small. It is possible, though, to avoid feed_dict in this case too: you would need a set of TensorFlow variables holding the current state, set up the recurrent units to use them as the initial state (through the initial_state parameter of tf.nn.dynamic_rnn) and use their final state to update the variable values; then on each new batch you would have to reinitialize the variables to the "zero" state again. However, I would make sure that this is going to have a significant benefit before going down that route (e.g. measure runtime with and without feed_dict, even though the results will be wrong).
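To illustrate that last alternative, here is a very rough sketch (TF 1.x; the cell type, sizes and placeholder names are made up for the example rather than taken from the tutorial):
import tensorflow as tf

batch_size, num_steps, hidden_size = 5, 10, 32
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, hidden_size])

cell = tf.nn.rnn_cell.LSTMCell(hidden_size)

# Non-trainable variables hold the running state between run() calls.
c_var = tf.Variable(tf.zeros([batch_size, hidden_size]), trainable=False)
h_var = tf.Variable(tf.zeros([batch_size, hidden_size]), trainable=False)
initial_state = tf.nn.rnn_cell.LSTMStateTuple(c_var, h_var)

outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)

# After each run, copy the final state back into the variables...
update_state = tf.group(tf.assign(c_var, final_state.c),
                        tf.assign(h_var, final_state.h))
# ...and reset them to zero whenever a new sequence starts.
reset_state = tf.group(tf.assign(c_var, tf.zeros_like(c_var)),
                       tf.assign(h_var, tf.zeros_like(h_var)))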
EDIT:
As a clarification for the update, I copied here the relevant lines of the code:
state = session.run(model.initial_state)
fetches = {
    "cost": model.cost,
    "final_state": model.final_state,
}
if eval_op is not None:
    fetches["eval_op"] = eval_op
for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
    vals = session.run(fetches, feed_dict)
    cost = vals["cost"]
    state = vals["final_state"]
    costs += cost
    iters += model.input.num_steps
At the beginning of an epoch, state takes the value of model.initial_state, which, unless a feed_dict replacing its values is given, will be the default "zero" initial state. fetches is a dictionary that is passed to session.run later, so it returns another dictionary where (among other things) the key "final_state" will hold the final state value. Then, on every step, a feed_dict is created that replaces the initial_state tensor values with the data in state, and run is called with that feed_dict to retrieve the values of the tensors in fetches; vals then holds the outputs of the run call. The line state = vals["final_state"] replaces the contents of state, which was our current state value, with the output state of the last run; so on the next iteration feed_dict will hold the values of the previous final state, and the network will continue "as if" the whole sequence had been given in one go. In the next call to run_epoch, state will be initialized again to the default value of model.initial_state and the process will start from "zero" again.

Getting error: Cannot reshape array of size 122304 into shape (52,28,28)

I'm trying to reshape a numpy array as:
data3 = data3.reshape((data3.shape[0], 28, 28))
where data3 is:
[[54 68 66 ..., 83 72 58]
[63 63 63 ..., 51 51 51]
[41 45 80 ..., 44 46 81]
...,
[58 60 61 ..., 75 75 81]
[56 58 59 ..., 72 75 80]
[ 4 4 4 ..., 8 8 8]]
data3.shape is (52, 2352)
But I keep getting the following error:
ValueError: cannot reshape array of size 122304 into shape (52,28,28)
Exception TypeError: TypeError("'NoneType' object is not callable",) in <function _remove at 0x10b6477d0> ignored
What is happening and how to fix this error?
UPDATE:
I'm doing this to obtain data3 that is being used above:
def image_to_feature_vector(image, size=(28, 28)):
    return cv2.resize(image, size).flatten()

data3 = np.array([image_to_feature_vector(cv2.imread(imagePath)) for imagePath in imagePaths])
imagePaths contains paths to all the images in my dataset. I actually want to convert data3 to a flat list of 784-dim vectors, however the image_to_feature_vector function converts each image to a 2352-dim vector (28x28x3, since cv2.imread loads three color channels)!
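I guess something like the following would give me 784-dim vectors instead (just a sketch of what I think I need, loading the images in grayscale):
import cv2
import numpy as np

def image_to_feature_vector(image, size=(28, 28)):
    return cv2.resize(image, size).flatten()

# Loading in grayscale gives 28*28 = 784 values per image instead of 28*28*3.
data3 = np.array([image_to_feature_vector(cv2.imread(imagePath, cv2.IMREAD_GRAYSCALE))
                  for imagePath in imagePaths])
# data3.shape would then be (52, 784), and
# data3.reshape((data3.shape[0], 28, 28)) works.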
You can reshape a numpy array as long as the total number of elements stays the same, i.e. before (a x b x c ... n) = after (a x b x c ... n). In your case, you can transform data3 so that it has shape (156, 28, 28), or simply:
import numpy as np
data3 = np.arange(122304).reshape(52, 2352)
data3 = data3.reshape((data3.shape[0]*3, 28, 28))
print(data3.shape)
Output is of the form
[[[ 0 1 2 ..., 25 26 27]
[ 28 29 30 ..., 53 54 55]
[ 56 57 58 ..., 81 82 83]
...,
[ 700 701 702 ..., 725 726 727]
[ 728 729 730 ..., 753 754 755]
[ 756 757 758 ..., 781 782 783]]
...,
[122248 122249 122250 ..., 122273 122274 122275]
[122276 122277 122278 ..., 122301 122302 122303]]]
First, your input image's number of elements should match the number of elements in the desired feature vector.
Assuming the above is satisfied, the below should work:
# Read all the images into one numpy array. The paths of the images are in imagePaths.
data = np.array([np.array(cv2.imread(imagePaths[i])) for i in range(len(imagePaths))])
# This will contain an array of feature vectors, one per image
# (assuming each image has exactly 784 elements, e.g. a 28x28 grayscale image).
features = data.reshape(len(imagePaths), 784)
