I've just started with TensorFlow. I wrote a program that uses the Fashion-MNIST dataset to train a model, then predicts the labels using `test_images`, and it's working well so far.
But what I'm curious about is how I can use my own image of a shoe or shirt for prediction, because all the test images are of shape 28*28. How can I do this?
The task you are engaged in is data preparation and preprocessing. Assuming you already have a directory with images, one of the things you must do is label the images; for this task I recommend labelImg.
If you also need the input to be of a specific size, like the 28*28 in your example, you can use digital image processing software. The OpenCV library has resizing tools that work for this.
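As a minimal sketch of that pipeline, assuming `model` is your trained Fashion-MNIST Keras model (here loaded from a hypothetical saved file) and `shoe.jpg` is a hypothetical photo of your own:

```python
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("fashion_mnist.h5")  # hypothetical saved model

img = cv2.imread("shoe.jpg", cv2.IMREAD_GRAYSCALE)  # load your photo as grayscale
img = cv2.resize(img, (28, 28))                     # match the 28x28 training shape
img = 255 - img                                     # Fashion-MNIST items are light on a dark
                                                    # background; skip this if yours already is
img = img.astype("float32") / 255.0                 # scale pixels to [0, 1]
img = img.reshape(1, 28, 28)                        # add a batch dimension

pred = model.predict(img)
print("Predicted class:", np.argmax(pred))
```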
The regular TimesFormer takes 3-channel input images, while I have 4-channel images (RGBD). I am struggling to find a TimesFormer (or a model similar to TimesFormer) that takes 4-channel input images and extracts features from them.
Does anybody know of such a model? Preferably, I would like to find a pretrained model with weights.
MORE CONTEXT:
I am working with RGBD video frames and have a multiclass classification problem at the end. My videos are fairly long, between 2 and 4 minutes, so classical time-series models don't work for me. So my inputs are RGBD frames/images from the video, and at the end I would like to get a class prediction.
My idea was to divide the problem into 2 stages:
Extract features from the video into a smaller dimension with a TimesFormer-like model. Result: I would get a new data representation (dataset).
Train a classification ML network on the new data to get a class prediction.
As of Jan 2023, I don't think there's any readily available TimeSformer model/code that works on RGBD 4-channel images.
Alternatively, if you are looking for Vision Transformers that can also work with depth (RGBD data), you can find the entire list of state-of-the-art approaches and corresponding code (wherever available) here.
One good approach to start with is DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation. You can find the pre-trained models for this approach here.
If you're looking for 3D-CNN-based object detectors that can work on RGBD data, RGB-D Salient Object Detection via 3D Convolutional Neural Networks is a good one to start with. Code and pre-trained models can be found here.
Since I don't fully understand your problem statement or exact requirements, I've proposed a few things that I thought could be helpful.
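If you want to reuse a pretrained 3-channel model anyway, one common workaround (not specific to TimeSformer) is to "inflate" its patch-embedding layer to accept 4 input channels. A hedged PyTorch sketch; the attribute path to the patch embedding (e.g. `model.patch_embed.proj` in timm-style ViTs) depends on your implementation:

```python
import torch
import torch.nn as nn

def inflate_patch_embed(conv3: nn.Conv2d) -> nn.Conv2d:
    """Return a Conv2d accepting 4 input channels, initializing the extra
    (depth) channel with the mean of the pretrained RGB weights."""
    conv4 = nn.Conv2d(4, conv3.out_channels,
                      kernel_size=conv3.kernel_size,
                      stride=conv3.stride,
                      padding=conv3.padding,
                      bias=conv3.bias is not None)
    with torch.no_grad():
        conv4.weight[:, :3] = conv3.weight                            # copy RGB filters
        conv4.weight[:, 3:] = conv3.weight.mean(dim=1, keepdim=True)  # init depth channel
        if conv3.bias is not None:
            conv4.bias.copy_(conv3.bias)
    return conv4
```

The new depth channel starts from the mean of the RGB filters, so the model behaves sensibly before you fine-tune it on your RGBD data.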
I was wondering if you can detect cats using their voice. For example, we feed some data into our program, like "this is the voice of a cat", and when it detects it, it says "hello cat" or something.
Sure! Machine learning can do that. I suggest using deep learning models such as a CNN or an LSTM. What you need to prepare is a dataset of cat voices and a suitable model architecture.
Rough process:
1. Prepare the dataset. For example: 300 audio files, each containing 3 seconds of different cat voices.
2. Train the model.
3. Evaluate the model.
4. Write a script to respond to the output of the model's classification, e.g. say "hello cat" when the prediction output is a cat (a rough sketch of steps 2-4 follows below).
It's not quite that easy in practice, but I hope this example is helpful.
More detail: https://www.analyticsvidhya.com/blog/2022/03/implementing-audio-classification-project-using-deep-learning/
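Here is a rough sketch of steps 2-4 in Keras. It assumes you have already turned each clip into a fixed-size spectrogram array; the shapes, the random placeholder data, and the 0/1 cat label are all illustrative, not a fixed recipe.

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for your real spectrograms and labels:
# 300 clips, each a 128x130 single-channel spectrogram; label 1 = cat.
x = np.random.rand(300, 128, 130, 1).astype("float32")
y = np.random.randint(0, 2, size=(300,))

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(128, 130, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=10, validation_split=0.2)  # steps 2-3: train and evaluate

# Step 4: respond to one clip's prediction.
if model.predict(x[:1])[0, 0] > 0.5:
    print("hello cat")
```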
Of course, you can use machine learning to solve that problem. You can either treat it as a signal-processing/time-series problem and slide a window over your audio, then feed it to your model to classify it (cat, dog, ...); LSTMs and RNNs are your go-to in this case. Or transform your audio into a spectrogram, i.e. an image, then feed those images to a typical CNN architecture (NASNet, DenseNet, ...).
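For the spectrogram route, a minimal sketch using librosa (the filename `cat_01.wav` and the parameters are illustrative):

```python
import librosa
import numpy as np

y, sr = librosa.load("cat_01.wav", sr=22050)                  # load a short clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # mel spectrogram
mel_db = librosa.power_to_db(mel, ref=np.max)                 # log scale for CNN input
print(mel_db.shape)  # (128, time_frames) -- treat it like a grayscale image
```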
Here are references you can check:
UrbanSound (audio/signal): https://www.kaggle.com/datasets/raghavrawat/segregatedurban8ksounds
UrbanSound (spectrograms, which I created a while ago): https://www.kaggle.com/datasets/skywolfmo/urban8kspectograms
There might be other ideas and methods besides the ones I mentioned above.
Good luck
I have started doing medical image analysis for a project.
In this project I have images of human kidneys, with and without stones. The aim is to predict whether a given new image contains a stone or not.
I chose the KNN classifier model to do the classification, but I do not understand the image processing part. I have some knowledge of segmentation, and I can convert an image into an array for processing, but I need some pointers to understand the process.
Image - https://i.stack.imgur.com/9FDUM.jpg
For image classification I would recommend you use pre-trained neural networks like ResNet.
Frameworks like TensorFlow give a good API to re-train pre-trained neural networks for a different use case.
You can follow the link below:
https://www.tensorflow.org/hub/tutorials/image_retraining
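A minimal transfer-learning sketch with Keras along those lines; the `data/` directory with class subfolders (e.g. `data/stone/`, `data/no_stone/`) is a hypothetical layout:

```python
import tensorflow as tf

# Load labeled images from class subfolders (hypothetical paths).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg")
base.trainable = False                                        # freeze the pretrained backbone

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)   # ResNet's own input scaling
x = base(x, training=False)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # stone vs. no stone
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```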
Image processing is done to convert digital images into a format that is easier for a computer to compute statistics on.
Images do not always contain only the necessary information; there is noise and a lot of unnecessary background in an image that won't be required for a specific purpose.
The goal of processing an image is to extract the region of interest from the whole image.
Along with this, various enhancements are applied to the image so that we get features that are useful for inference.
Processing an image consists of various enhancement and segmentation techniques, and other steps such as histogram equalization, which in the end are used to extract features. Doing this processing generally yields better features.
Also, image processing is itself a vast topic; I recommend reading about it in papers from Google Scholar.
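As a small illustration of the steps above (equalization, noise suppression, rough segmentation, ROI extraction), assuming OpenCV and a hypothetical scan at `kidney.jpg`:

```python
import cv2

img = cv2.imread("kidney.jpg", cv2.IMREAD_GRAYSCALE)
eq = cv2.equalizeHist(img)                        # histogram equalization
blur = cv2.GaussianBlur(eq, (5, 5), 0)            # suppress noise
_, mask = cv2.threshold(blur, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # rough segmentation
roi = cv2.bitwise_and(img, img, mask=mask)        # keep only the region of interest
cv2.imwrite("kidney_roi.png", roi)
```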
I already know how to make a neural network using the MNIST dataset. I have been searching for tutorials on how to train a neural network on your own dataset for 3 months now, but I'm just not getting it. If someone can suggest any good tutorials or explain how all of this works, please help.
PS: I won't install NLTK. It seems like a lot of people train their neural networks on text, but I won't do that. If I installed NLTK, I would only use it once.
I suggest you use the OpenCV library. Whether you load MNIST data or use PIL, once loaded, images are all just NumPy arrays. If you want to make your own images fit a model trained on MNIST-style data, here's how I did it:
1. Use cv2.imread to load all the images you want to act as training data.
2. Use cv2.cvtColor to convert all the images into grayscale, and resize them to 28x28.
3. Divide each pixel in all the datasets by 255.
4. Do the training as usual!
I haven't tried it with my own format, but theoretically it's the same.
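A sketch of those four steps, assuming your images live in a hypothetical folder `my_digits/`:

```python
import glob
import cv2
import numpy as np

images = []
for path in glob.glob("my_digits/*.png"):
    img = cv2.imread(path)                        # step 1: load
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # step 2: grayscale...
    img = cv2.resize(img, (28, 28))               # ...and resize to 28x28
    images.append(img)

x = np.array(images, dtype="float32") / 255.0     # step 3: scale to [0, 1]
print(x.shape)  # (num_images, 28, 28) -- step 4: train as usual
```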
I would like to make a simple engine to classify an image dataset, and I am asking for a guide or help.
I have already trained on the dataset and saved the trained model (1000000), with an eval accuracy of about 86.6%.
Here are the steps I would like to follow:
Download an image and convert it into a TensorFlow dataset (I am not sure about this, since it's all converted to bin type).
Feed the image into the model trained on CIFAR-10 and test whether the image is a dog, a cat, or something else (the printed value would be something like: this image would be a dog with 70% confidence).
Or run the same thing over an image folder if I input several images.
The whole purpose of this is to visualize the whole process and use it for real with TensorFlow.
I would appreciate it if anyone could at least give me a reference.
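As a starting reference, a hedged sketch of the single-image prediction step with Keras; the model path `cifar10_model.h5` and the photo `dog.jpg` are hypothetical, and it assumes a Keras-format saved model rather than the binary CIFAR-10 files:

```python
import numpy as np
import tensorflow as tf

# CIFAR-10 class names in their standard label order.
CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]

model = tf.keras.models.load_model("cifar10_model.h5")        # hypothetical saved model

img = tf.keras.utils.load_img("dog.jpg", target_size=(32, 32))  # load and resize to 32x32
x = tf.keras.utils.img_to_array(img) / 255.0                    # scale to [0, 1]
x = np.expand_dims(x, axis=0)                                   # batch of one

probs = model.predict(x)[0]
print(f"This image would be {CLASSES[np.argmax(probs)]} "
      f"with {100 * probs.max():.1f}% confidence")
```

For a folder of images, you can loop the load/resize step over the files and stack the arrays into one batch before calling `model.predict`.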