Model Input feed:
<tf.Tensor 'serialized_test:0' shape=(?, ?, ?, ?) dtype=float32>
Image numpy array:
{'test': array([[[[ 31, 24, 14],
[ 24, 20, 11],
[ 21, 21, 19],
...,
[ 12, 23, 29],
[ 14, 25, 31],
[ 17, 28, 34]],
[[ 12, 23, 27],
[ 10, 21, 23],
[ 20, 32, 32],
...,
[ 23, 45, 56],
[ 16, 40, 50],
[ 2, 31, 39]],
[[ 6, 33, 42],
[ 0, 21, 29],
[ 5, 25, 34],
...,
[ 28, 47, 64],
[ 13, 30, 48],
[ 0, 15, 34]],
...,
[[ 29, 46, 56],
[ 50, 68, 78],
[ 29, 46, 56],
...,
[ 84, 104, 111],
[ 91, 111, 118],
[ 69, 89, 96]],
[[ 90, 110, 119],
[ 96, 116, 125],
[ 95, 115, 124],
...,
[ 70, 85, 92],
[ 81, 98, 106],
[ 86, 103, 111]],
[[100, 118, 128],
[ 71, 89, 99],
[ 62, 80, 90],
...,
[ 7, 44, 71],
[ 14, 51, 77],
[ 7, 43, 65]]],
[[[ 6, 37, 57],
[ 23, 49, 64],
[ 20, 42, 53],
...,
[ 41, 40, 36],
[ 17, 8, 3],
[ 24, 0, 0]],
[[ 28, 29, 24],
[ 19, 21, 18],
[ 20, 22, 21],
...,
[ 33, 75, 91],
[ 34, 86, 110],
[ 21, 84, 119]],
[[ 12, 81, 120],
[ 5, 77, 117],
[ 16, 85, 124],
...,
[ 74, 96, 117],
[ 74, 99, 119],
[ 51, 78, 97]],
...,
[[ 14, 22, 33],
[ 27, 36, 45],
[ 11, 20, 29],
...,
[ 56, 63, 69],
[ 74, 81, 87],
[ 50, 59, 64]],
[[ 40, 51, 55],
[ 52, 63, 67],
[ 26, 40, 41],
...,
[ 13, 33, 44],
[ 7, 25, 37],
[ 34, 50, 63]],
[[ 10, 26, 39],
[ 10, 28, 38],
[ 39, 59, 68],
...,
[ 87, 110, 126],
[ 64, 87, 103],
[ 63, 86, 102]]]], dtype=uint8)}
Required Output:
{'test': array([[[[0.12156863, 0.09411765, 0.05490196],
[0.11372549, 0.09019608, 0.05098039],
[0.09803922, 0.08235294, 0.04313725],
...,
[0.1372549 , 0.03137255, 0.01960784],
[0.18823529, 0.03529412, 0.03921569],
[0.21568627, 0.03921569, 0.05098039]],
[[0.12156863, 0.09803922, 0.05882353],
[0.10980392, 0.09019608, 0.05490196],
[0.09411765, 0.08235294, 0.04705882],
...,
[0.13333333, 0.03529412, 0.02352941],
[0.18823529, 0.04313725, 0.04705882],
[0.21960784, 0.05098039, 0.05882353]],
[[0.11764706, 0.10196078, 0.06666667],
[0.10588235, 0.09411765, 0.05882353],
[0.09019608, 0.07843137, 0.05098039],
...,
[0.1254902 , 0.03921569, 0.02745098],
[0.18823529, 0.05490196, 0.05490196],
[0.22352941, 0.06666667, 0.07058824]],
...,
[[0.06666667, 0.07058824, 0.05098039],
[0.06666667, 0.07058824, 0.05490196],
[0.06666667, 0.06666667, 0.05882353],
...,
[0.10196078, 0.03921569, 0.02745098],
[0.14117647, 0.04705882, 0.04705882],
[0.16470588, 0.05098039, 0.05882353]],
[[0.04313725, 0.04705882, 0.02745098],
[0.04313725, 0.04705882, 0.02745098],
[0.04313725, 0.04705882, 0.03137255],
...,
[0.10588235, 0.03921569, 0.03137255],
[0.14509804, 0.04705882, 0.04705882],
[0.16862745, 0.05098039, 0.05882353]],
[[0.02745098, 0.03137255, 0.01176471],
[0.02745098, 0.03137255, 0.01176471],
[0.02745098, 0.03137255, 0.01176471],
...,
[0.10588235, 0.03921569, 0.03137255],
[0.14509804, 0.04705882, 0.04705882],
[0.16862745, 0.05098039, 0.05882353]]]])}
I don't know how to do this. My actual goal is to find text objects present in the image, and this conversion is part of the preprocessing required for that.
Any help would be massively appreciated.
Edit:
The image is attached here
My inference code
from tensorflow.contrib import predictor
from PIL import Image
import numpy as np
a = predictor.from_saved_model('my_model') # this is a tensorflow saved model not a frozenmodel
image_np = np.array(Image.open("car_1.jpg"))
image_resized = np.resize(image_np, (2,70,130,3))
a({'test':image_resized})
An image array has three dimensions: height, width, and channels. In your case, the desired height is 70 and the width is 130. The number of channels is fixed by the image itself; here it is 3.
The first value in [2,70,130,3] is the batch size, that is, the number of images. Since you have only one image, the batch dimension should be 1, not 2. You would get 2 only if you supplied two images.
from PIL import Image
import requests
import numpy as np
import cv2
# reading your image
img = Image.open(requests.get('https://i.stack.imgur.com/qf5RE.jpg', stream=True).raw)
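# cv2.resize takes the target size as (width, height), so height 70 and width 130 is (130, 70)
new_img = cv2.resize(np.array(img), (130, 70))
# scale pixel values from [0, 255] to [0, 1]
new_img = new_img / 255
new_img is the desired output for a single image, with shape (70, 130, 3).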
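To feed a single image to a model whose input has a leading batch dimension, the array still needs that extra axis. Below is a minimal NumPy-only sketch, assuming the image has already been resized to height 70 and width 130; the random array is a stand-in for the real image, and the float32 cast is an assumption based on the dtype shown for the input tensor in the question. Note that np.resize, used in the question, does not interpolate: it repeats or truncates the flattened data, so it is not a substitute for an image resize.

```python
import numpy as np

# Stand-in for an already resized image: height 70, width 130, 3 channels.
image = np.random.randint(0, 256, size=(70, 130, 3), dtype=np.uint8)

# Scale to [0, 1] and cast to float32 (assumed to match the model input dtype).
scaled = image.astype(np.float32) / 255.0

# Add a leading batch axis: (70, 130, 3) -> (1, 70, 130, 3).
batch = np.expand_dims(scaled, axis=0)

print(batch.shape, batch.dtype)
```

The resulting batch could then be passed as the value for the 'test' key, in place of image_resized in the question's code.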
UPDATE:
Let's say you have a list of images. Then you can do it like this:
def reshape_images(images):
    new_images = []
    for image in images:
        # resize the current image (not the global img); dsize is (width, height)
        new_img = cv2.resize(np.array(image), (130, 70))
        new_images.append(new_img)
    # stack into a single array of shape (num_images, 70, 130, 3)
    new_images = np.stack(new_images, axis=0)
    return new_images
new_images = reshape_images([img, img])
Why is there a difference between the pixel values if I open an image with skimage.io.imread(image) and tf.image.decode_image(image)?
For example:
import numpy as np
import skimage.io
original_img = skimage.io.imread("test.jpg")
print(original_img.shape)
print(np.amax(original_img))
print(original_img)
The output is:
(110, 150, 3)
255
array([[[ 29, 65, 117],
[ 45, 43, 90],
[ 78, 39, 68],
...,
[ 30, 46, 95],
[ 30, 43, 96],
[ 31, 44, 97]],
[[ 41, 54, 89],
[ 95, 89, 123],
[ 57, 39, 65],
...,
[ 32, 46, 91],
[ 32, 46, 95],
[ 32, 45, 97]],
[[ 62, 49, 69],
[ 84, 76, 97],
[ 68, 70, 95],
...,
[ 18, 30, 70],
[ 35, 47, 95],
[ 34, 47, 99]],
...,
[[136, 124, 22],
[144, 136, 53],
[134, 123, 44],
...,
[ 16, 74, 16],
[ 39, 89, 52],
[ 53, 108, 69]],
[[161, 125, 5],
[149, 129, 42],
[129, 116, 48],
...,
[ 67, 119, 73],
[ 39, 80, 48],
[ 33, 69, 41]],
[[196, 127, 6],
[160, 111, 32],
[141, 108, 55],
...,
[ 26, 56, 32],
[ 8, 29, 10],
[ 12, 24, 12]]], dtype=uint8)
And if I open the same image with Tensorflow:
import tensorflow as tf
original_img = tf.image.decode_image(tf.io.read_file("test.jpg"))
print(np.amax(original_img))
print(original_img)
The output is:
255
<tf.Tensor: shape=(110, 150, 3), dtype=uint8, numpy=
array([[[ 44, 57, 101],
[ 40, 42, 80],
[ 65, 41, 65],
...,
[ 25, 42, 88],
[ 33, 49, 100],
[ 25, 41, 92]],
[[ 47, 53, 89],
[ 96, 95, 127],
[ 60, 44, 70],
...,
[ 29, 43, 88],
[ 40, 54, 103],
[ 19, 35, 84]],
[[ 59, 54, 74],
[ 72, 69, 90],
[ 70, 70, 96],
...,
[ 23, 35, 77],
[ 16, 29, 74],
[ 50, 64, 111]],
...,
[[145, 116, 24],
[161, 131, 43],
[141, 113, 30],
...,
[ 19, 67, 19],
[ 49, 95, 58],
[ 53, 97, 64]],
[[164, 119, 16],
[166, 123, 28],
[143, 108, 27],
...,
[ 73, 119, 80],
[ 29, 68, 37],
[ 39, 75, 47]],
[[182, 128, 20],
[160, 112, 14],
[149, 112, 32],
...,
[ 11, 57, 21],
[ 7, 44, 13],
[ 0, 14, 0]]], dtype=uint8)>
I have also noticed that if I open an image with TensorFlow, make some changes to it, save it to disk, and open it again with tf.image.decode_image(image), the pixel values differ again, though not by as much this time.
The difference comes from the algorithm used to decompress the JPEG. By default, a system-specific method is used, and tf.image.decode_image() offers no way to change it.
tf.image.decode_jpeg() has a dct_method argument that selects the decompression method. Currently two values are valid: INTEGER_FAST and INTEGER_ACCURATE.
If you open the image in the following way you should have the same output as with skimage.io.imread(image):
original_img = tf.image.decode_jpeg(tf.io.read_file("test.jpg"),dct_method="INTEGER_ACCURATE")
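To check how far two decoders actually disagree, the arrays can be compared element-wise. The sketch below is NumPy-only and uses the first pixel from each of the two printouts in the question as stand-in data. The variable names are illustrative. The cast to a signed type matters because subtracting uint8 arrays directly wraps around instead of going negative.

```python
import numpy as np

# First pixel as decoded by skimage and by TensorFlow (values from the question).
from_skimage = np.array([[[29, 65, 117]]], dtype=np.uint8)
from_tf = np.array([[[44, 57, 101]]], dtype=np.uint8)

# Naive uint8 subtraction wraps around: 29 - 44 becomes 241, not -15.
wrapped = from_skimage - from_tf

# Cast to a signed type first to get the true per-channel difference.
diff = np.abs(from_skimage.astype(np.int16) - from_tf.astype(np.int16))
max_diff = int(diff.max())

print(wrapped.tolist(), diff.tolist(), max_diff)
```

A maximum difference of 0 after switching dct_method would confirm that both libraries now decode the file identically.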
I have a 3D tensor A x B x C. For each matrix B x C, I want to extract the leading diagonal.
Is there a vectorized way of doing this in numpy or pytorch instead of looping over A?
You can use numpy.diagonal()
np.diagonal(a, axis1=1, axis2=2)
Example:
In [10]: a = np.arange(3*4*5).reshape(3,4,5)
In [11]: a
Out[11]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
In [12]: np.diagonal(a, axis1=1, axis2=2)
Out[12]:
array([[ 0, 6, 12, 18],
[20, 26, 32, 38],
[40, 46, 52, 58]])
Assuming that the leading diagonal of a generic non-square (B x C) slice starts at the top-left corner, we can reshape and slice:
a.reshape(a.shape[0],-1)[:,::a.shape[-1]+1]
Sample run -
In [193]: np.random.seed(0)
In [194]: a = np.random.randint(11,99,(3,4,5))
In [195]: a
Out[195]:
array([[[55, 58, 75, 78, 78],
[20, 94, 32, 47, 98],
[81, 23, 69, 76, 50],
[98, 57, 92, 48, 36]],
[[88, 83, 20, 31, 91],
[80, 90, 58, 75, 93],
[60, 40, 30, 30, 25],
[50, 43, 76, 20, 68]],
[[43, 42, 85, 34, 46],
[86, 66, 39, 45, 11],
[11, 47, 64, 16, 49],
[28, 90, 15, 53, 69]]])
In [196]: a.reshape(a.shape[0],-1)[:,::a.shape[-1]+1]
Out[196]:
array([[55, 94, 69, 48],
[88, 90, 30, 20],
[43, 66, 64, 53]])
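One caveat with the reshape-and-slice approach: when a slice has more than C + 1 rows, the stride of C + 1 runs past the end of the diagonal and picks up elements from the first column. A hedged sketch of a guarded version follows; the helper name batched_diagonal is made up for illustration. It truncates the result to min(B, C) entries and checks itself against np.diagonal.

```python
import numpy as np

def batched_diagonal(a):
    """Leading diagonal of every (B, C) slice of an (A, B, C) array."""
    n_slices, rows, cols = a.shape
    flat = a.reshape(n_slices, -1)
    # A step of cols + 1 walks down the diagonal of each flattened slice;
    # keep only min(rows, cols) entries so tall slices do not wrap around.
    return flat[:, ::cols + 1][:, :min(rows, cols)]

wide = np.arange(3 * 4 * 5).reshape(3, 4, 5)
tall = np.arange(3 * 6 * 4).reshape(3, 6, 4)

print(batched_diagonal(wide))
print(batched_diagonal(tall))
```

For the wide (3, 4, 5) array this matches the output of np.diagonal(a, axis1=1, axis2=2) shown in the first answer.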
In PyTorch, use torch.diagonal():
t.diagonal(dim0=-2, dim1=-1)
I have been experimenting with a Python script that scales images by a factor of 2, and it works fine. The problem is how to save the resulting image to disk so I can compare the results before and after.
import cv2
import numpy as np
img = cv2.imread('input.jpg')
res = cv2.resize(img,None,fx=2, fy=2, interpolation = cv2.INTER_CUBIC)
The result is stored in the variable res, but I want it written out as a new image. How can I do that?
My desired output is a file named result.jpg.
This is what I get when I print res:
>>> res
array([[[ 39, 43, 44],
[ 40, 44, 44],
[ 41, 45, 46],
...,
[ 54, 52, 52],
[ 52, 50, 50],
[ 51, 49, 49]],
[[ 38, 42, 44],
[ 39, 43, 44],
[ 41, 45, 46],
...,
[ 55, 53, 53],
[ 54, 52, 52],
[ 53, 51, 51]],
[[ 37, 40, 43],
[ 38, 41, 44],
[ 40, 43, 46],
...,
[ 58, 56, 55],
[ 56, 54, 54],
[ 56, 53, 53]],
...,
[[ 52, 135, 94],
[ 54, 137, 95],
[ 59, 141, 99],
...,
[ 66, 139, 101],
[ 62, 135, 96],
[ 60, 133, 94]],
[[ 47, 131, 89],
[ 49, 133, 91],
[ 55, 138, 96],
...,
[ 56, 129, 91],
[ 54, 127, 89],
[ 54, 127, 88]],
[[ 44, 128, 86],
[ 47, 130, 88],
[ 53, 136, 94],
...,
[ 50, 123, 85],
[ 50, 123, 85],
[ 50, 123, 85]]], dtype=uint8)
You can use the cv2.imwrite function, for example cv2.imwrite('result.jpg', res). It returns True if the file was written.
The function is described in the OpenCV documentation.
I converted an image to numpy array and it returned a 3D array instead of 2D (width and height).
My code is:
import PIL
from PIL import Image
import numpy as np
samp_jpg = "imgs_subset/w_1.jpg"
samp_img = Image.open(samp_jpg)
print(samp_img.size)
(3072, 2048)
I = np.asarray(samp_img)
I.shape
(2048, 3072, 3)
The 3D matrix looks like:
array([[[ 58, 95, 114],
[ 54, 91, 110],
[ 52, 89, 108],
...,
[ 48, 84, 106],
[ 50, 85, 105],
[ 51, 86, 106]],
[[ 63, 100, 119],
[ 61, 97, 119],
[ 59, 95, 117],
...,
[ 48, 84, 106],
[ 50, 85, 105],
[ 51, 86, 106]],
[[ 66, 102, 124],
[ 66, 102, 124],
[ 65, 101, 125],
...,
[ 48, 84, 106],
[ 50, 85, 105],
[ 51, 86, 106]],
...,
[[ 69, 106, 135],
[ 66, 103, 132],
[ 61, 98, 127],
...,
[ 49, 85, 111],
[ 51, 87, 113],
[ 53, 89, 115]],
[[ 59, 98, 127],
[ 57, 96, 125],
[ 56, 95, 124],
...,
[ 51, 85, 113],
[ 52, 86, 114],
[ 53, 87, 115]],
[[ 63, 102, 131],
[ 62, 101, 130],
[ 60, 101, 129],
...,
[ 53, 86, 117],
[ 52, 85, 116],
[ 51, 84, 115]]], dtype=uint8)
What does the 3rd dimension mean? It is an array of length 3 (each line in the output above).
The third dimension holds the red, green and blue channels: each pixel is an (R, G, B) triple.
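A short NumPy sketch of what that means in practice. The four pixels below are copied from the top-left corner of the printed array as stand-in data. Collapsing the channels with a weighted sum is one common way to get a 2D array; the ITU-R BT.601 luma weights used here are an assumption about what kind of 2D result is wanted, not something stated in the question.

```python
import numpy as np

# Tiny stand-in image: 2 rows, 2 columns, 3 channels (R, G, B).
rgb = np.array([[[58, 95, 114], [54, 91, 110]],
                [[63, 100, 119], [61, 97, 119]]], dtype=np.uint8)

# Indexing the last axis gives one 2D (height, width) plane per channel.
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# Weighted sum over the channel axis: (2, 2, 3) -> (2, 2).
weights = np.array([0.299, 0.587, 0.114])  # assumed BT.601 luma weights
gray = rgb @ weights

print(red.shape, gray.shape)
```

With Pillow, opening the image and calling convert('L') before np.asarray yields a 2D array directly.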