How to load many images efficiently from a folder using OpenCV - Python

I am trying to create my own image dataset for machine learning.
The workflow I have in mind is the following:
① Load all image files in the folder as arrays.
② Label the loaded images.
③ Split the loaded data into image_data and label_data.
④ Finally, split image_data into image_train_data and image_test_data, and split label_data into label_train_data and label_test_data.
However, I am already stuck at the first step (①).
How can I load all the image data efficiently?
And if you were to build an image dataset for machine learning following this workflow, how would you handle it?
I wrote the following code:
cat_im = cv2.imread("C:\\Users\\path\\cat1.jpg")
But am I forced to write out \cat1.jpg, \cat2.jpg, \cat3.jpg, and so on by hand?

## You can find all images by their extension
import os
import glob
import cv2
## glob returns the paths of the matching images as a list
all_images_path = glob.glob(os.path.join('some_folder', 'images', '*.png'))
## Then you can loop over all the files
loaded_images = []
for image_path in all_images_path:
    image = cv2.imread(image_path)
    loaded_images.append(image)
## Let's assume your labels are just the file names, e.g. cat1.png, cat2.png, etc.
labels = []
for image_path in all_images_path:
    labels.append(os.path.basename(image_path))
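Steps ② to ④ of the workflow in the question could be sketched as follows. This is a minimal sketch under stated assumptions: the label is taken to be the file name without its extension and trailing digits (so cat1.png becomes cat), and label_from_path and split_train_test are hypothetical helper names, not part of OpenCV. The split uses only the standard library; scikit-learn's train_test_split is a common alternative.

```python
import os
import random
import re


def label_from_path(image_path):
    """Return the file name without extension and trailing digits.

    Assumption: files are named like cat1.png, cat2.png, dog1.png.
    """
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return re.sub(r"\d+$", "", stem)


def split_train_test(image_data, label_data, test_ratio=0.2, seed=0):
    """Shuffle images and labels together, then split them into train and test parts."""
    if len(image_data) != len(label_data):
        raise ValueError("image_data and label_data must have the same length")
    indices = list(range(len(image_data)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    image_train_data = [image_data[i] for i in train_idx]
    image_test_data = [image_data[i] for i in test_idx]
    label_train_data = [label_data[i] for i in train_idx]
    label_test_data = [label_data[i] for i in test_idx]
    return image_train_data, image_test_data, label_train_data, label_test_data


# Stand-in paths; with real files these would come from glob.glob(...)
paths = ["imgs/cat1.png", "imgs/cat2.png", "imgs/dog1.png", "imgs/dog2.png", "imgs/dog3.png"]
label_data = [label_from_path(p) for p in paths]
print(label_data)  # ['cat', 'cat', 'dog', 'dog', 'dog']
```

Shuffling a shared list of indices keeps each image aligned with its label, which is the main thing to preserve when image_data and label_data are stored separately.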

Related

How do I convert a folder of images to a npy file?

I have a folder containing images of gestures, but to make it work with my code I need to convert it to X.npy and Y.npy. I have looked through many questions about this kind of problem but am still in the dark. How do I approach this? How do I convert the folder into an npy dataset of my own? Is there any code or converter for this?
I found a piece of code for this purpose on GitHub.
from PIL import Image
import os, sys
import cv2
import numpy as np
'''
Converts all images in a directory to '.npy' format.
Use np.save and np.load to save and load the images.
Use it for training your neural networks in ML/DL projects.
'''
# Path to image directory
path = "/path/to/image/directory/"
dirs = os.listdir(path)
dirs.sort()
x_train = []
def load_dataset():
    # Append images to a list
    for item in dirs:
        if os.path.isfile(path + item):
            im = Image.open(path + item).convert("RGB")
            im = np.array(im)
            x_train.append(im)
if __name__ == "__main__":
    load_dataset()
    # Convert and save the list of images in '.npy' format
    imgset = np.array(x_train)
    np.save("imgds.npy", imgset)
You can refer to the code snippet in the following GitHub gist, which I found through Google, to convert a folder of images to a .npy file:
https://gist.github.com/anilsathyan7/ffb35601483ac46bd72790fde55f5c04
In this case, all the images in the folder are converted into NumPy arrays and appended to a list named x_train. To convert and save this list of images as a single '.npy' file, use the same code snippet:
imgset=np.array(x_train)
np.save("imgds.npy",imgset)
To save the images as multiple '.npy' files instead, loop over the list and save each image array on its own:
for i, im in enumerate(x_train):
    np.save("imgds" + str(i) + ".npy", im)
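Since the question asks for X.npy and Y.npy, saving images and labels as two aligned arrays might look like the sketch below. It assumes all images already have the same shape (np.stack raises an error otherwise) and uses small synthetic arrays in place of real images. The helper save_xy and the integer class ids are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_xy(images, labels, out_dir):
    """Stack same-shaped images into X and save it alongside the labels Y."""
    X = np.stack(images)    # shape: (num_images, height, width, channels)
    Y = np.asarray(labels)  # shape: (num_images,)
    if len(X) != len(Y):
        raise ValueError("need exactly one label per image")
    np.save(os.path.join(out_dir, "X.npy"), X)
    np.save(os.path.join(out_dir, "Y.npy"), Y)
    return X.shape, Y.shape


# Synthetic stand-ins for three 64x64 RGB images and their class ids
images = [np.full((64, 64, 3), i, dtype=np.uint8) for i in range(3)]
labels = [0, 1, 1]

out_dir = tempfile.mkdtemp()
print(save_xy(images, labels, out_dir))  # ((3, 64, 64, 3), (3,))

# Loading the arrays back
X = np.load(os.path.join(out_dir, "X.npy"))
Y = np.load(os.path.join(out_dir, "Y.npy"))
```

If the images in the folder differ in size, they would need to be resized to a common shape before stacking.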

How to read images using skimage

import os
import numpy as np
seed = 42
np.random.seed(seed)
Img_Width = 128
Img_Height = 128
Img_Channel = 3
Train_Path = 'stage1_train/'
Test_Path = 'stage1_test/'
# The names of the sub-folders directly under each path
train_ids = next(os.walk(Train_Path))[1]
test_ids = next(os.walk(Test_Path))[1]
print(train_ids)
X_train = np.zeros((len(train_ids), Img_Height, Img_Width, Img_Channel), dtype=np.uint8)
Y_train = np.zeros((len(train_ids), Img_Height, Img_Width, 1), dtype=bool)
The code above was given as a sample. I looked at it and tried to load my own dataset in the same way.
I want to load all the image data from one folder, but it contains two types of file: .jpg files and .png files. I want to load them into two different variables. In the sample, train_ids lists several sub-folders from which the images are loaded, but in my dataset all the images are in the same folder. How can I load them all?
This is the path where all the images are located:
F:\segmentation\ISBI2016_ISIC_Part3B_Training_Data\ISBI2016_ISIC_Part3B_Training_Data_1
[Both .jpg and .png files are present here]
My Python code is located in the segmentation folder.
Whether the image is a JPG or a PNG makes no difference to ImageIO; it will load either format into an ndarray using the same syntax.
Regarding your desire to load all images in a folder, we have an official example on how to read all images from a folder:
import imageio.v3 as iio
from pathlib import Path
images = list()
for file in Path("path/to/folder").iterdir():
    if not file.is_file():
        continue
    images.append(iio.imread(file))
If you instead want to read a list of images from several folders, this works in almost the same way:
import imageio.v3 as iio
list_of_files = [] # list of paths to images in various formats
images = [iio.imread(file_path) for file_path in list_of_files]
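To get the .jpg and .png files of a single folder into two separate variables, as the question asks, the paths can be grouped by suffix before reading. A minimal sketch, where paths_by_extension is a hypothetical helper and the folder path is a placeholder:

```python
from pathlib import Path


def paths_by_extension(folder, extensions=(".jpg", ".png")):
    """Return a dict mapping each extension to the sorted files that have it."""
    groups = {ext: [] for ext in extensions}
    for file in sorted(Path(folder).iterdir()):
        suffix = file.suffix.lower()  # treat .JPG and .jpg alike
        if file.is_file() and suffix in groups:
            groups[suffix].append(file)
    return groups


# Usage (placeholder path); each list can then be read with iio.imread:
# groups = paths_by_extension("path/to/folder")
# jpg_images = [iio.imread(p) for p in groups[".jpg"]]
# png_images = [iio.imread(p) for p in groups[".png"]]
```

Sorting the paths keeps the two lists in a stable order, which helps if files in the two groups are meant to be paired by name.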

How to find a file/ data from a given data set in python- opencv image processing project?

I have a data set of images in an image processing project. I want to input an image and scan through the data set to recognize the given image. What module, library, or approach (e.g. ML) should I use to identify my image in my Python/OpenCV code?
To find exactly the same image, you don't need any kind of ML. The image is just an array of pixels, so you can check if the array of the input image equals that of an image in your dataset.
import glob
import cv2
import numpy as np
# Read in source image (the one you want to match to others in the dataset)
source = cv2.imread('test.jpg')
# Make a list of all the images in the dataset (I assume they are images in a directory)
filelist = glob.glob(r'C:\Users\...\Images\*.JPG')
# Loop through the images, read them in and check if an image is equal to your source
for file in filelist:
    img = cv2.imread(file)
    if np.array_equal(source, img):
        print("%s is the same image as source" % file)
        break

Chunking a directory and applying image blending using PIL. Can't save images correctly with Python

Sorry for the title... The goal of this script is to take a folder full of images that are listed in a particular order and chunk them into groups of 3. It then takes the 3 images of each group and blends them together using PIL. The code below does a great job of doing what I want: if I show imgbld2, it creates 4 images in a temporary folder.
My problem is that when I save the images using imgbld2.save(), it only saves the first created image into 4 image files, instead of saving the 4 created images into 4 separate files.
I can work around this by pointing another script at the temp folder and retrieving the images with glob.glob(). But that would require me to run the script on a freshly restarted computer, which seems too messy for my taste.
Is there a better way to achieve what I'm trying to do? Or is there a saving method that I'm missing?
Any help would be appreciated. Here is the code:
from PIL import Image
import os.path
import glob
#Lists Directory
Dir = os.listdir('/path/to/Directory/of/Images')
#Glob all jpgs
im = glob.glob('/path/to/Directory/of/Images/*.jpg')
#sort jpg according to name
imsort = sorted(im)
def chunker(imsort, size=3):
    for i in range(0, len(imsort), size):
        yield imsort[i:i + size]
print('what does it look like?')
for j in chunker(imsort):
    print(j)
    img1 = Image.open(j[0])
    img2 = Image.open(j[1])
    img3 = Image.open(j[2])
    imgbld1 = Image.blend(img1, img2, 0.3)
    imgbld2 = Image.blend(imgbld1, img3, 0.3)
    imgbld2.show()
    imgbld2.save('path/to/new/folder/' + 'blended', 'JPEG')
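In the loop above, every iteration saves to the same path ('path/to/new/folder/blended'), which would make later results overwrite earlier ones. A minimal sketch of giving each chunk its own output name follows. The helper output_name and the blended_000.jpg naming scheme are assumptions made for illustration, and the PIL calls are left as comments so the naming logic can be run without image files.

```python
import os


def chunker(seq, size=3):
    """Yield successive chunks of `size` items; the last chunk may be shorter."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def output_name(out_dir, index, prefix="blended"):
    """Build a distinct, sortable file name for each blended image."""
    return os.path.join(out_dir, "{}_{:03d}.jpg".format(prefix, index))


imsort = sorted("img{:02d}.jpg".format(n) for n in range(1, 12))  # 11 stand-in names

for index, chunk in enumerate(chunker(imsort)):
    if len(chunk) < 3:
        continue  # skip an incomplete final group instead of failing on chunk[2]
    target = output_name("path/to/new/folder", index)
    # img1, img2, img3 = (Image.open(p) for p in chunk)
    # blended = Image.blend(Image.blend(img1, img2, 0.3), img3, 0.3)
    # blended.save(target, "JPEG")
    print(chunk, "->", target)
```

Using enumerate ties the file name to the position of the chunk, so the saved files keep the same order as the sorted input.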

Read mnist images into Tensorflow

I was looking at this Tensorflow tutorial.
In the tutorial the images are magically read like this:
mnist = learn.datasets.load_dataset("mnist")
train_data = mnist.train.images
My images are placed in two directories:
../input/test/
../input/train/
They all have a *.jpg ending.
So how can I read them into my program?
I don't think I can use learn.datasets.load_dataset because this seems to take in a specialized dataset structure, while I only have folders with images.
mnist.train.images is essentially a NumPy array of shape [55000, 784], where 55000 is the number of images and 784 is the number of pixels in each image (each image is 28x28).
You need to create a similar NumPy array from your data if you want to run this exact code. So you'll need to iterate over all your images, read each image as a NumPy array, flatten it, and create a matrix of size [num_examples, image_size].
The following code snippet should do it:
import os
import cv2
import numpy as np
def load_data(img_dir):
    return np.array([cv2.imread(os.path.join(img_dir, img)).flatten() for img in os.listdir(img_dir) if img.endswith(".jpg")])
A more comprehensive version that makes debugging easier:
import os
import cv2
import numpy as np
list_of_imgs = []
img_dir = "../input/train/"
for img in os.listdir(img_dir):
    img = os.path.join(img_dir, img)
    if not img.endswith(".jpg"):
        continue
    a = cv2.imread(img)
    if a is None:
        # cv2.imread returns None when the file cannot be read
        print("Unable to read image", img)
        continue
    list_of_imgs.append(a.flatten())
train_data = np.array(list_of_imgs)
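The [num_examples, image_size] layout described above can be checked without any image files. A small sketch using synthetic 28x28 arrays as stand-ins for the output of cv2.imread:

```python
import numpy as np

# Synthetic stand-ins for five 28x28 single-channel images
images = [np.random.randint(0, 256, size=(28, 28), dtype=np.uint8) for _ in range(5)]

# Flatten each image and stack the rows into one matrix
train_data = np.array([img.flatten() for img in images])
print(train_data.shape)  # (5, 784)

# The same images as 3-channel arrays give rows three times as long
color_images = [np.stack([img] * 3, axis=-1) for img in images]
color_data = np.array([img.flatten() for img in color_images])
print(color_data.shape)  # (5, 2352)
```

Note that cv2.imread returns a 3-channel array by default, so a 28x28 file read that way flattens to 2352 values. Passing cv2.IMREAD_GRAYSCALE as the second argument yields a single channel and therefore 784 values.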
Note:
If your images are not 28x28x1 (B/W images), you will need to change the neural network architecture (defined in cnn_model_fn). The architecture in the tutorial is a toy architecture which only works for simple images like MNIST. AlexNet may be a good place to start for RGB images.
You can check the answers given in "How do I convert a directory of jpeg images to TFRecords file in tensorflow?". The easiest way is to use the utility provided by TensorFlow, build_image_data.py, which does exactly what you want to do.
