Building a simple image recognition engine with the TensorFlow CIFAR-10 example - python

I would like to build a simple engine that classifies an image dataset, and I am asking for guidance or help.
I have already trained on the dataset and saved the trained model (at 1,000,000 steps), with an eval accuracy of about 86.6%.
Here are the steps I would like to follow:
Download an image and convert it into a TensorFlow dataset (I am not sure how, since the example converts everything to the .bin format).
Feed the image to the model trained on CIFAR-10 and test whether it shows a dog, a cat, or something else (the printed output would be something like: this image is a dog with 70% confidence).
Or process a whole folder of images if I input several at once.
The whole purpose of this is to visualize the entire process and use TensorFlow in practice.
I would appreciate it if anyone could at least point me to a reference (a rough sketch of the prediction step is included below).
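For reference, a minimal sketch of step 2 (classifying one downloaded image with the trained model). It assumes the model has been re-exported as a Keras model; the TensorFlow CIFAR-10 example itself writes checkpoints, so the loading call and the file names here are assumptions:

import numpy as np
import tensorflow as tf

# Hypothetical path to a model exported with model.save(); the CIFAR-10
# example writes checkpoints instead, so adapt this to your setup.
model = tf.keras.models.load_model("cifar10_model")

# Standard CIFAR-10 label order.
cifar10_classes = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

# Load one image, resize it to the 32x32 input CIFAR-10 models expect,
# and scale pixel values to [0, 1].
img = tf.keras.utils.load_img("my_dog.jpg", target_size=(32, 32))
x = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)

logits = model.predict(x)[0]
probs = tf.nn.softmax(logits).numpy()  # skip this if the model already ends in a softmax
best = int(np.argmax(probs))
print(f"this image would be {cifar10_classes[best]} with {probs[best] * 100:.1f}% confidence")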

Related

About Image Denoising Paper Training Method: IDR: Self-Supervised Image Denoising via Iterative Data Refinement

Has anyone successfully trained the method from this paper?
I have tried many times but always fail.
I really want to train this model for raw image denoising.
Could someone help me, please?
Below is the paper's GitHub repository:
https://github.com/zhangyi-3/IDR
I would like to use the dataset provided by the author, but I don't know its structure or how to start.

How to do classification based on different types of input

I want to know whether there is any way to do classification based on different types of input.
Basically, I have the dog-vs-cat image dataset from Kaggle and also a dog-vs-cat sound dataset, and I want to show that by combining a model trained on audio alone with a model trained on images alone we can get better accuracy.
I have read about ensemble learning, where different models are combined (for example by averaging their predictions) to get higher accuracy, but those approaches classify the same type of input, whereas here I want to classify using both image and audio as inputs. I have also read about mixed inputs in Keras, but that only covers combining tabular data with images, not sound with images.
I did not find any labelled dataset of dog-vs-cat videos from which I could extract frames and audio and then apply a CNN to both for classification.
Do you have any idea how to tackle this problem?
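One way to approach this (a sketch, not something from the question) is a two-branch network: one branch for the image and one for the audio converted to a fixed-size log-mel spectrogram, fused before the final classifier. All shapes and layer sizes below are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, Model

# Image branch: small CNN over 128x128 RGB images (sizes are assumptions).
img_in = layers.Input(shape=(128, 128, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Audio branch: CNN over log-mel spectrograms (e.g. 64 mel bands x 128 frames).
aud_in = layers.Input(shape=(64, 128, 1), name="audio")
y = layers.Conv2D(32, 3, activation="relu")(aud_in)
y = layers.MaxPooling2D()(y)
y = layers.Conv2D(64, 3, activation="relu")(y)
y = layers.GlobalAveragePooling2D()(y)

# Fuse the two feature vectors and classify dog vs. cat.
z = layers.concatenate([x, y])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid", name="dog_vs_cat")(z)

model = Model(inputs=[img_in, aud_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit([image_batch, spectrogram_batch], labels, ...) requires paired
# image/audio samples; with two unpaired datasets you would instead train an
# image model and an audio model separately and average their predicted
# probabilities (late fusion).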

Pytorch - Use a UNet to perform Image Deblurring/Image Reconstruction

Currently, I'm working with a dataset where I have two kinds of images: "sharp version" of the image and "blurry version" of the same images, where a blur was added synthetically. My goal is to train a model that takes the blurry version of the images in and tries to deblur the image as much as it can so that the "deblurred image" is closer to the sharp version. In the literature, the UNet architecture seemed to be a model with good results. Additionally, I can use a pre-trained U-Net via Pytorch (https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/).
My problem is now: when I train this pre-trained U-Net with my images and then try it on my test set, the output looks nothing like the original image.
I know that this pre-trained model is usually used for biomedical image segmentation but I'm rather confused about how I have to modify the model to use it for an Image Deblurring/Reconstruction task. Does anyone have any advice on how to do this?
I would appreciate any feedback :)
The U-Net you're using is for segmentation (classification of each pixel of the image), whereas you're trying to denoise the image (making it sharper / removing the blur). That explains the results you got.
To get what you want, as DerekG said, you first need to change the number of output channels. Once you change it, you can no longer load the whole pretrained model; you will have to copy the parameters one by one, up to (but excluding) the last layer.
Since the last layer is then initialized randomly, you can retrain the model on your training set, freezing the pretrained parts or not.
Also, I'm not sure what your new dataset is, but if it's really not related to biomedical images, you should retrain the network from scratch (transfer learning shouldn't be used in such cases), and maybe even change the encoder-decoder network.
From the included link:
import torch
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=1, init_features=32,
                       pretrained=True)
The model you define has a single output channel, resulting in a grayscale image output. You need 3 output channels for an RGB image.
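A sketch of the parameter-copying idea described above, using the same torch.hub entry point: build a randomly initialized copy of the network with 3 output channels and transfer every pretrained tensor whose shape still matches, leaving the final layer random for retraining (the keyword arguments below are assumed to be accepted by that hub entry when pretrained=False):

import torch

# Pretrained U-Net with a single output channel.
pretrained = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                            in_channels=3, out_channels=1, init_features=32,
                            pretrained=True)

# Same architecture, but with 3 output channels for RGB deblurring,
# initialized randomly.
model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
                       in_channels=3, out_channels=3, init_features=32,
                       pretrained=False)

# Copy every pretrained tensor whose shape still matches; the final
# convolution (1 vs. 3 output channels) is skipped and stays random.
state = model.state_dict()
for name, tensor in pretrained.state_dict().items():
    if name in state and state[name].shape == tensor.shape:
        state[name] = tensor
model.load_state_dict(state)

# Optionally freeze the copied parts before retraining by setting
# requires_grad = False on every parameter except the final convolution.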

How to use tensorflow model for predicting my own images

I've just started with TensorFlow. I wrote a program that trains a model on the Fashion_MNIST dataset and then predicts the labels of 'test_images', and it works well so far.
What I'm curious about is how I can use my own photo of a shoe or a shirt for prediction, because all the test images have shape 28x28. How can I do this?
The task you are engaged in is data preparation and preprocessing. Given that you already have a directory with images, one thing you must do is label them; for this task I recommend labelImg.
If you also need the input to have a specific size, as in your example, you can use digital image processing software: the OpenCV library has resizing functions that work for this.
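If the goal is just to push one photo of a shoe or shirt through the trained Fashion-MNIST model, a minimal preprocessing sketch with OpenCV (the file name is a placeholder, and `model` stands for whatever trained model your program already has):

import cv2
import numpy as np

# Load the photo, convert it to grayscale, and resize it to the 28x28
# input that the Fashion-MNIST model expects.
img = cv2.imread("my_shoe.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
small = cv2.resize(gray, (28, 28))

# Fashion-MNIST items are light on a dark background, so a photo on a
# white background usually needs to be inverted before scaling to [0, 1].
x = (255 - small).astype("float32") / 255.0
x = x.reshape(1, 28, 28)  # a batch of one, same shape as the training images

# prediction = model.predict(x)
# print(np.argmax(prediction[0]))  # index of the predicted class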

Using your own Data in Tensorflow

I already know how to make a neural network using the MNIST dataset. I have been searching for tutorials on how to train a neural network on my own dataset for 3 months now, but I'm just not getting it. If someone can suggest any good tutorials or explain how all of this works, please help.
PS. I won't install NLTK. It seems like a lot of people train their neural networks on text, but I won't do that; if I installed NLTK, I would only use it once.
I suggest you use the OpenCV library. Whether you load your data from MNIST or with PIL, once it's loaded it's all just NumPy arrays. If you want to make your own images fit the MNIST format your model was trained on, here's how I did it:
1. Use cv2.imread to load all the images you want to act as training data.
2. Use cv2.cvtColor to convert all the images to grayscale and cv2.resize to resize them to 28x28.
3. Divide each pixel in all the datasets by 255.
4. Do the training as usual!
I haven't tried it with your own format, but theoretically it's the same (a sketch of these steps follows).
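Putting those four steps together, a rough sketch (the glob pattern and the label array are placeholders for your own directory layout):

import glob
import cv2
import numpy as np

images = []
for path in sorted(glob.glob("my_dataset/*.png")):     # 1. load every image
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # 2. grayscale...
    gray = cv2.resize(gray, (28, 28))                  #    ...and resize to 28x28
    images.append(gray)

x_train = np.array(images, dtype="float32") / 255.0    # 3. scale pixels to [0, 1]
y_train = np.zeros(len(images), dtype="int64")         # placeholder: replace with your real labels

# 4. Train as usual, e.g. model.fit(x_train, y_train, epochs=5)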
