Create numpy array from images in different folders - python

I am a beginner with Python, scikit-learn and numpy. I have a set of folders with images to which I want to apply different machine learning algorithms. However, I am struggling to get these images into numpy data that I can use.
These are my prerequisites:
Each folder name holds the key to what the images are. For example /birds/abc123.jpg and /birds/def456.jpg are both "birds"
Each image is 100x100px jpg
I am using Python 2.7
There are 2800 images in total
This is my code as far as I have gotten:
# Standard scientific Python imports
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import svm, metrics
import numpy as np
import os # Working with files and folders
from PIL import Image # Image processing
rootdir = os.getcwd()
key_array = []
pixel_arr = np.empty((0,10000), int)
for subdir, dirs, files in os.walk('data'):
    dir_name = subdir.split("/")[-1]
    if "x" in dir_name:
        key_array.append(dir_name)
        for file in files:
            if ".DS_Store" not in file:
                file = os.path.join(subdir, file)
                im = Image.open(file)
                im_bw = im.convert('1') #Black and white
                new_np = np.array(im_bw).reshape(1,-1)
                print new_np.shape
                pixel_arr = np.append(pixel_arr, new_np, axis=0)
What works in this code is browsing through the folders, getting the folder names and fetching the correct files/images. What I cannot get to work is creating a numpy array of shape (2800, 10000) (or maybe (10000, 2800) would be correct), i.e. 2800 rows with 10000 values in each.
This solution (I am not sure it even works) is super slow, though, and I am quite sure there must be a faster and more elegant one!
How can I create this 2800x10000 numpy array, preferably with the index number from the key_array attached?

If you don't need all the images at the same time, you can use a generator.
def get_images():
    for subdir, dirs, files in os.walk('data'):
        dir_name = subdir.split("/")[-1]
        if "x" in dir_name:
            key_array.append(dir_name)
            for file in files:
                if ".DS_Store" not in file:
                    file = os.path.join(subdir, file)
                    im = Image.open(file)
                    im_bw = im.convert('1') #Black and white
                    yield np.array(im_bw).reshape(1,-1)
This way you don't hold all the images in memory at the same time, which will probably help you out.
To use the images you would then do:
for image in get_images():
    ...
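If you do need the full 2800x10000 array the question asks for, a faster pattern than calling np.append inside the loop (which copies the whole array on every call) is to collect one flat row per image in a list and stack once at the end. The following is only a minimal sketch under assumptions: the name load_image_folders is hypothetical, every subfolder of root is treated as one class, every non-hidden file in it is assumed to be an image, and each image is converted to 8-bit grayscale and resized to 100x100 so that every row has 10000 values.

```python
import os

import numpy as np
from PIL import Image


def load_image_folders(root, size=(100, 100)):
    """Return (X, y, class_names) for images stored as root/<class>/<file>.

    X has shape (n_images, width * height); y holds, for each row of X,
    the index of its folder name in class_names.
    """
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    rows, labels = [], []
    for label, name in enumerate(class_names):
        folder = os.path.join(root, name)
        for filename in sorted(os.listdir(folder)):
            if filename.startswith("."):  # skip .DS_Store and similar
                continue
            with Image.open(os.path.join(folder, filename)) as im:
                gray = im.convert("L").resize(size)
                rows.append(np.asarray(gray, dtype=np.uint8).ravel())
            labels.append(label)
    if rows:
        X = np.stack(rows)  # one allocation instead of one per image
    else:
        X = np.empty((0, size[0] * size[1]), dtype=np.uint8)
    return X, np.array(labels, dtype=int), class_names
```

With X, y, names = load_image_folders('data'), row X[i] is one image and names[y[i]] is its folder name. The (n_samples, n_features) layout, i.e. 2800 rows of 10000 values, is the one scikit-learn estimators expect.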

Related

How do I convert a folder of images to a npy file?

I have a folder containing images of gestures. But to make it work with my code I need to convert it to X.npy and Y.npy. I have looked at many questions about this kind of problem but am still in the dark. How do I approach this? How do I convert the folder to create an npy dataset of my own? Is there any code or converter for this?
I found a piece of code for this purpose on github.
from PIL import Image
import os, sys
import cv2
import numpy as np
'''
Converts all images in a directory to '.npy' format.
Use np.save and np.load to save and load the images.
Use it for training your neural networks in ML/DL projects.
'''
# Path to image directory
path = "/path/to/image/directory/"
dirs = os.listdir( path )
dirs.sort()
x_train=[]
def load_dataset():
    # Append images to a list
    for item in dirs:
        if os.path.isfile(path+item):
            im = Image.open(path+item).convert("RGB")
            im = np.array(im)
            x_train.append(im)

if __name__ == "__main__":
    load_dataset()
    # Convert and save the list of images in '.npy' format
    imgset=np.array(x_train)
    np.save("imgds.npy",imgset)
You can refer to the code snippet in the following GitHub gist, which I found via Google, to convert a folder of images to a npy file:
https://gist.github.com/anilsathyan7/ffb35601483ac46bd72790fde55f5c04
In this case, all images in the folder are converted into NumPy arrays and appended to a list named x_train. To convert and save this list of images in a single '.npy' file, we can use the same code snippet:
imgset=np.array(x_train)
np.save("imgds.npy",imgset)
To convert and save this list of images in multiple '.npy' files, use the code snippet below:
imgset=np.array(x_train,dtype=object)
for i in range(len(imgset)):
    np.save("imgds"+str(i)+".npy",imgset[i])
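The question also asks for a Y.npy next to X.npy, which the gist does not produce. As a rough sketch, assuming you already have a list of equally sized image arrays and one integer label per image (the helper name save_xy and the way labels are obtained are illustrative, not part of the gist):

```python
import os

import numpy as np


def save_xy(images, labels, out_dir="."):
    """Save images to X.npy and labels to Y.npy; return both file paths."""
    x = np.stack(images)    # raises ValueError if the image shapes differ
    y = np.asarray(labels)
    if len(x) != len(y):
        raise ValueError("need exactly one label per image")
    x_path = os.path.join(out_dir, "X.npy")
    y_path = os.path.join(out_dir, "Y.npy")
    np.save(x_path, x)
    np.save(y_path, y)
    return x_path, y_path
```

Both files can be read back with np.load("X.npy") and np.load("Y.npy"). An array with dtype=object is stored with pickle, so loading it needs np.load(path, allow_pickle=True).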

How to read images using skimage

seed = 42
np.random.seed(seed)
Img_Width=128
Img_Height=128
Img_Channel = 3
Train_Path = 'stage1_train/'
Test_Path = 'stage1_test/'
train_ids = next(os.walk(Train_Path))[1]
test_ids = next(os.walk(Test_Path))[1]
print(train_ids)
X_train = np.zeros((len(train_ids), Img_Height, Img_Width, Img_Channel),dtype=np.uint8)
Y_train = np.zeros((len(train_ids),Img_Height, Img_Width, 1), dtype=bool)
The code above was given as a sample. I looked at it and tried to load my own dataset.
I want to load all the image data from one folder, but it contains two types of file: .jpg and .png. I want to load them into two different variables. In the sample, train_ids lets me load images from several folders, but in my dataset all the images are in the same folder. How can I load them all?
This is the path where all the images are located:
F:\segmentation\ISBI2016_ISIC_Part3B_Training_Data\ISBI2016_ISIC_Part3B_Training_Data_1
[Both .jpg and .png files are present here]
My Python code is located in the segmentation folder.
Whether the image is a JPG or a PNG makes no difference to ImageIO; it will load either format into a ndarray using the same syntax.
Regarding your desire to load all images in a folder, we have an official example on how to read all images from a folder:
import imageio.v3 as iio
from pathlib import Path
images = list()
for file in Path("path/to/folder").iterdir():
    if not file.is_file():
        continue
    images.append(iio.imread(file))
If you instead want to read a list of images from several folders, this works in almost the same way:
import imageio.v3 as iio
list_of_files = [] # list of paths to images in various formats
images = [iio.imread(file_path) for file_path in list_of_files]
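To get the .jpg and .png files of one folder into two separate variables, as the question asks, one option is to group the paths by suffix first and read each group afterwards. A small sketch using only the standard library; group_by_suffix is a hypothetical helper name, and which suffix holds which kind of data is something you would need to check against your own dataset:

```python
from collections import defaultdict
from pathlib import Path


def group_by_suffix(folder):
    """Map each lower-cased file suffix (e.g. '.jpg') to a sorted list of paths."""
    groups = defaultdict(list)
    for file in sorted(Path(folder).iterdir()):
        if file.is_file():  # ignore subfolders
            groups[file.suffix.lower()].append(file)
    return dict(groups)
```

With groups = group_by_suffix(folder), the two variables could then be filled as [iio.imread(p) for p in groups.get(".jpg", [])] and [iio.imread(p) for p in groups.get(".png", [])].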

Showing all the images with matplotlib

I'm using numpy and matplotlib to read all the images in a folder for image processing. I have already done the part that reads the image dataset from the folders and processes it as numpy arrays. The problem I'm facing is showing all the images with the matplotlib imshow function. Every time I try to show all the images with imshow, it only gives me the first image and nothing else.
My code is below:
import os
import numpy as np
import matplotlib.pyplot as mpplot
import matplotlib.image as mpimg
images = []
path = "../path/to/folder"
for root, _, files in os.walk(path):
    current_directory_path = os.path.abspath(root)
    for f in files:
        name, ext = os.path.splitext(f)
        if ext == ".jpg":
            current_image_path = os.path.join(current_directory_path,f)
            current_image = mpimg.imread(current_image_path)
            images.append(current_image)
for img in images:
    print len(img.shape)
    i = 0
    for i in range(len(img.shape)):
        mpplot.imshow(img)
        mpplot.show()
I will be thankful if somebody can help me in this.
P.S. I'm pretty new with python, numpy and also at stackoverflow. So, please don't mind if the question is unclear or not direct.
Thanks,
About showing only one plot at a time: please get familiar with matplotlib subplots.
Also, the problem is your inner loop: it runs once per entry in img.shape, so the same img is shown several times.
Try to iterate over images as below:
for img in images:
    mpplot.imshow(img)
    mpplot.show()
I think what you need is to call mpplot.figure() before each mpplot.imshow() and a single mpplot.show() after the loop; this will open a new window for each image.
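The subplots approach mentioned above could look roughly like this. It is a sketch, not the only way to do it: show_grid and its cols parameter are made-up names, and the function only builds the figure, so you still call mpplot.show() once afterwards.

```python
import matplotlib.pyplot as mpplot


def show_grid(images, cols=3):
    """Draw every image in its own subplot and return the figure."""
    rows = max(1, (len(images) + cols - 1) // cols)  # ceiling division
    fig, axes = mpplot.subplots(rows, cols, squeeze=False)
    flat_axes = axes.ravel()
    for ax in flat_axes:
        ax.axis("off")  # hide ticks, also on unused cells
    for ax, img in zip(flat_axes, images):
        ax.imshow(img)
    return fig
```

Usage would be show_grid(images) followed by mpplot.show(), which displays all images in one window instead of one after another.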

Reading image files

I have a directory containing 4 folders (1, 2, 3, 4). Each folder has jpg images in it. I used the code below to read the images. The problem is that the images all have different shapes, so I now have a list of images, each with a different shape.
1) Is there a better way to read img files from a directory? (maybe assign directly to a numpy array)
2) How can I resize the images so that they all have the same shape?
Thanks!
import imageio
import os.path
images = []
for folder in os.listdir('images'):
    for filename in os.listdir('images/'+folder):
        if filename.endswith(".jpg"):
            img = imageio.imread('images/'+folder+'/'+filename)
            img.reshape((1,img.flatten().shape[0])).shape
            images.append(img)
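One possible approach to the second question, sketched here with Pillow rather than imageio, is to resize every image to a common size before converting it to an array; the results can then be stacked into a single NumPy array, which also covers the first question. The name load_resized and the 64x64 target size are illustrative choices, not something given in the question:

```python
import os

import numpy as np
from PIL import Image


def load_resized(root, size=(64, 64)):
    """Read every .jpg under root/<folder>/ as RGB, resized to size.

    size is (width, height); the result has shape
    (n_images, height, width, 3).
    """
    images = []
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue
        for filename in sorted(os.listdir(folder_path)):
            if filename.lower().endswith(".jpg"):
                with Image.open(os.path.join(folder_path, filename)) as im:
                    images.append(np.asarray(im.convert("RGB").resize(size)))
    return np.stack(images)  # raises ValueError if no image was found
```

Resizing to a fixed size does not preserve the aspect ratio, so images with very different proportions will look stretched; cropping or padding first is an alternative if that matters for your task.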

How to iterate through a folder of images and print images inline in Jupyter notebook with Python 3.6

My following code iterates through a list of files & prints each image "inline" in a Jupyter Notebook with the filename.
from IPython.display import Image, display
listOfImageNames = ["0/IMG_1118.JPG","0/IMG_1179.JPG"]
for imageName in listOfImageNames:
    display(Image(filename=imageName))
    print(imageName)
However, I want to achieve the same output by iterating through a folder of images, without having to reference each image file in the code. I have been battling with this one. Can anyone point me in the right direction?
Use glob to search for JPG files:
import glob
from IPython.display import Image, display
for imageName in glob.glob('yourpath/*.jpg'): #assuming JPG
    display(Image(filename=imageName))
    print(imageName)
If your images are in different folders and on different levels, you should search recursively:
from IPython.display import Image, display
from glob import glob
listOfImageNames = glob('**/*.JPG', recursive=True)
for imageName in listOfImageNames:
    display(Image(filename=imageName))
    print(imageName)
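Note that the two snippets above match different spellings ('*.jpg' and '*.JPG'); on a case-sensitive file system a glob pattern only matches the case you typed. A sketch that accepts either spelling by comparing the lower-cased suffix instead (find_images is a hypothetical helper, and the default suffix list is an assumption):

```python
from pathlib import Path


def find_images(root, suffixes=(".jpg", ".jpeg")):
    """Return all files below root whose suffix matches, ignoring case."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

In the notebook you would then loop with for path in find_images('yourpath'): and call display(Image(filename=str(path))) as before.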
Display n images from a folder
If you want to show only n images from a folder, you can use the code below.
Remember to use * in file_type, like file_type = "*.jpg", so that glob returns a list of images rather than a single file.
# Display n images from a folder
import glob
from IPython.display import Image, display
file_type = "*.jpg" # Assuming all jpg images of folder (Not a single)
src_path = "your_path/"
no_of_image_to_show = 5
def display_n_images(src_path, file_type, no_of_image_to_show):
    image_folder = glob.glob(src_path + file_type) # glob will return list of jpg images
    image_folder = image_folder[0:no_of_image_to_show] # keep only the first n entries
    for a_image in image_folder:
        display(Image(filename=a_image))
        print(a_image)
display_n_images(src_path, file_type, no_of_image_to_show)
