I have large images (5000x3500) and I want to divide them into small 512x512 images without losing the original image coordinates. The large images are annotated/labeled, which is why I want to keep the original coordinates; I will use the small images to train a YOLO model. I am not sure whether this is called tiling. Is there any suggestion for doing it using Python or opencv-python?
This is my first question here. I am preparing a dataset for object detection. I have done the following things so far:
I have an original picture (size w4000 x h3000).
I used the annotation platform Roboflow to annotate it in COCO format, with close to 250 objects in the picture.
Roboflow returned a downscaled picture (2048x1536) with a respective json file with the annotations in COCO format.
Then, to obtain a dataset from my original picture (as I have a lot of objects and the picture is big enough), I decided to tile the original picture in patches of 224x224. For this purpose, I upscaled a bit (4032x3136) to be able to slice it properly, obtaining 252 pictures.
QUESTIONS
How can I resize the bounding boxes of the Roboflow 2048x1536 picture to my original picture (4032x3136)?
Once the bounding boxes are resized to my original picture size, how can I resize them again, adapting them to each of the patches (224x224) created by slicing the original picture?
Thank you!!
It sounds like the ultimate goal is to have tiled 224x224 images from the source 4032x3136 images with bounding boxes correctly updated.
In Roboflow, at least, you can add tiling as a preprocessing step to your original 4032x3136 images. The images will be broken into the number of tiles you select (2x2, 3x3, NxM). The bounding boxes will also be updated so that they cover the objects correctly on each individual tile.
To reimplement in code from what you've described, you would need to:
Upscale your 2048x1536 images to 4032x3136
Scale the bounding boxes accordingly
Break the images into 224x224 tiles using something like PIL (Pillow)
Update the annotations so they are expressed in the coordinates of their respective tiles; a box that spans several tiles becomes one annotation per tile it overlaps
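The box-related steps above can be sketched as follows. This is only a minimal sketch under stated assumptions: boxes are in COCO `[x, y, width, height]` pixel format, the tiles do not overlap, and the helper names (`scale_box`, `tile_origins`, `boxes_per_tile`) and the `min_area` threshold are hypothetical, not part of Roboflow or any library. Because 2048x1536 and 4032x3136 do not have the same aspect ratio, x and y are scaled with separate factors.

```python
def scale_box(box, src_size, dst_size):
    """Scale a COCO box (x, y, w, h) from src_size to dst_size, both (width, height)."""
    x, y, w, h = box
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    return (x * sx, y * sy, w * sx, h * sy)


def tile_origins(image_size, tile):
    """Top-left corners (tx, ty) of non-overlapping tiles, row by row."""
    width, height = image_size
    return [(tx, ty)
            for ty in range(0, height - tile + 1, tile)
            for tx in range(0, width - tile + 1, tile)]


def boxes_per_tile(boxes, image_size, tile=224, min_area=1.0):
    """Clip each box to every tile it overlaps and shift it to tile coordinates.

    min_area is an assumed threshold (in pixels) for dropping tiny fragments.
    Returns {(tx, ty): [(x, y, w, h), ...]} for tiles that contain boxes.
    """
    per_tile = {}
    for tx, ty in tile_origins(image_size, tile):
        clipped = []
        for x, y, w, h in boxes:
            # Intersection of the box with the tile, in full-image coordinates.
            left, top = max(x, tx), max(y, ty)
            right, bottom = min(x + w, tx + tile), min(y + h, ty + tile)
            cw, ch = right - left, bottom - top
            if cw > 0 and ch > 0 and cw * ch >= min_area:
                # Subtract the tile origin to get tile-local coordinates.
                clipped.append((left - tx, top - ty, cw, ch))
        if clipped:
            per_tile[(tx, ty)] = clipped
    return per_tile


# Example: one box annotated on the 2048x1536 export.
scaled = scale_box((100, 100, 50, 50), (2048, 1536), (4032, 3136))
tiles = boxes_per_tile([scaled], (4032, 3136), tile=224)
```

Each key `(tx, ty)` is the tile's offset in the full image, so the matching image patch can be cut with Pillow's `Image.crop((tx, ty, tx + 224, ty + 224))`, and adding `(tx, ty)` back to a tile-local box recovers its original coordinates.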
I am new to machine learning.
I have an image of size 28x200 as above. I have to detect images of size 28x28 in the large picture of size 28x200. For example, in this large image, there are three 28x28 images. How could I detect them? I read some answers using OpenCV, but I could not make them work with a NumPy array (.npy file); the answers are mainly concerned with JPG. Any solutions? Thanks in advance!
Let's assume I have a small dataset and I want to implement data augmentation. First I apply image segmentation (after this, the image will be a binary image) and then I apply data augmentation. Is this a good way?
For image augmentation in segmentation and instance segmentation, you either leave the positions of the objects in the image unchanged (for example, by manipulating only colors), or you modify these positions by applying translations and rotations.
So, yes, this way works, but you have to take into consideration the type of data you have and what you are looking to achieve. Data augmentation isn't a ready-to-go process that gives good results everywhere.
In case you have a:
Semantic segmentation: Each pixel of your image, at row i and column j, is labeled with its enclosing object. This means you have your main image I and a label image L of the same size, linking every pixel to its object label. In this case, the data augmentation is applied to both I and L, giving a pair of transformed images.
Instance segmentation: Here we generate a mask for every instance in the original image, and the augmentation is applied to all of these masks as well as to the original image; from the transformed masks we then get our new instances.
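As an illustration of applying one geometric transform to both I and L, here is a minimal NumPy sketch. The function name `augment_pair` and the choice of transforms (a random multiple-of-90-degree rotation plus an optional horizontal flip) are assumptions made for the example, not something prescribed above; the point is only that the random parameters are drawn once and reused for the image and its label.

```python
import numpy as np


def augment_pair(image, label, rng):
    """Apply the same random rotation/flip to an image and its label mask.

    image: array of shape (H, W, C); label: array of shape (H, W).
    rng:   a numpy.random.Generator, so results are reproducible.
    """
    k = int(rng.integers(0, 4))      # number of 90-degree rotations
    flip = bool(rng.integers(0, 2))  # whether to mirror horizontally

    # Rotate in the (row, column) plane so channels are left alone.
    image = np.rot90(image, k, axes=(0, 1))
    label = np.rot90(label, k, axes=(0, 1))

    if flip:
        image = np.flip(image, axis=1)
        label = np.flip(label, axis=1)

    # rot90/flip return views; copy so callers get ordinary arrays.
    return np.ascontiguousarray(image), np.ascontiguousarray(label)


# Example with a toy 3x4 label image and a matching 3-channel image.
label = np.arange(12).reshape(3, 4)
image = np.stack([label * 10, label, label], axis=-1)
aug_image, aug_label = augment_pair(image, label, np.random.default_rng(0))
```

For instance segmentation the same idea would apply, with the shared `k` and `flip` reused for every instance mask.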
EDIT:
Take a look at CLoDSA (Classification, Localization, Detection and Segmentation Augmentor); it may help you implement your idea.
If your dataset is small, you should add data augmentation during training. It is important to change the original image and the targets (masks) in the same way.
For example, if an image is rotated 90 degrees, then its mask should also be rotated 90 degrees. Since you are using the Keras library, you should check whether ImageDataGenerator also changes the target images (masks) along with the inputs. If it doesn't, you can implement the augmentations yourself. This repository shows how it is done in OpenCV:
https://github.com/kochlisGit/random-data-augmentations
I am trying to analyse an image and extract each number so that I can process it with a CNN trained on MNIST. The images show garments with a grid-like pattern; at each intersection of the grid there is a number (e.g. 0412). I want to detect which number it is and then store its coordinates. Does anyone have any recommendations on how to preprocess the image, given that it is quite noisy and contains multiple numbers? I have tried using contours and it didn't work. I also binarized the image, and there are areas of the image which are unreadable. My initial idea was to isolate each number and then process it.
Thanks in advance!
I'm training a deep neural network to improve the quality of images. The images contain some specific types of noise that I want to reduce or remove by means of a deep learning model. To do so, I take a huge dataset of similar clean high-res images with barely any noise, add the specific types of noise to the images, and train the network to regenerate the original image (a custom autoencoder network). With one of the several noise types this works very well so far. Without going too far into the details, adding that particular type of noise was easy.
Now I need to add another noise type to the images, more precisely: chroma noise like in the following image (the bottom right one): link
How do I artificially generate and add chroma noise to an image in Python? I can use the full range of image processing packages, PIL, numpy, OpenCV, torchvision...
You need to convert the image to a colorspace such as HSV or CIE Lab. You then add noise to the chromaticity channels (a, b in Lab, or H, S in HSV). Finally, convert back to RGB.
This colorspace conversion step is very common and most image toolkits should have that functionality.
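The convert, perturb, convert back recipe can be sketched as below. Note the swap: to keep the example dependency-free beyond NumPy, it uses YCbCr (JPEG-style, full range) rather than HSV or Lab, which is an assumption of this sketch; the idea is the same, since Cb and Cr carry the color information and Y carries brightness. The function name `add_chroma_noise` and the per-pixel Gaussian noise model with standard deviation `sigma` are also hypothetical choices.

```python
import numpy as np


def add_chroma_noise(rgb, sigma=10.0, seed=None):
    """Add Gaussian noise to the chroma channels of an RGB uint8 image.

    The image is converted to YCbCr, noise is added to Cb and Cr only
    (luma Y is left untouched), and the result is converted back to RGB.
    """
    rng = np.random.default_rng(seed)
    img = rgb.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    # RGB -> YCbCr (full-range, JPEG-style coefficients).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b

    # Perturb only the two chroma channels.
    cb = cb + rng.normal(0.0, sigma, cb.shape)
    cr = cr + rng.normal(0.0, sigma, cr.shape)

    # YCbCr -> RGB.
    r2 = y + 1.402 * (cr - 128.0)
    g2 = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b2 = y + 1.772 * (cb - 128.0)

    out = np.stack([r2, g2, b2], axis=-1)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Example: a flat mid-gray image picks up colored speckle.
gray = np.full((32, 32, 3), 128, dtype=np.uint8)
noisy = add_chroma_noise(gray, sigma=5.0, seed=0)
```

With OpenCV, the same structure should work by converting with `cv2.cvtColor` to Lab and adding noise to the a and b channels instead. Chroma noise in real photos often looks blotchy rather than per-pixel, so smoothing the noise before adding it may give a closer match; that refinement is left out here.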