How to load an image dataset in scikit-learn? - python

I have collected a group of images that I want to train a model on.
How do I load the image dataset? I have a folder of training data with two folders in it denoting the two different kinds of objects. How would I go about loading this data set and then training a model?

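This might help you load your dataset into a data variable from a single folder of images:
import cv2
import os
import numpy as np
path = 'path to your dataset'
list_of_files = os.listdir(path)
# Collect the images in a list; NumPy arrays have no append method
data = []
for name in list_of_files:
    # os.path.join takes the folder and the file name as separate arguments
    x = cv2.imread(os.path.join(path, name))
    # cv2.imread returns None for files it cannot read as an image
    if x is not None:
        data.append(x)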
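The question describes a training folder with one subfolder per class, which the snippet above does not cover. A minimal sketch of collecting file paths and integer labels from such a layout follows; the function name `collect_labeled_paths` and the `IMAGE_SUFFIXES` set are hypothetical, and the folder layout is assumed to be `root/<class_name>/<image files>`.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_labeled_paths(root):
    """Return (paths, labels, class_names) for a root folder
    that contains one subfolder per class."""
    class_dirs = sorted(d for d in Path(root).iterdir() if d.is_dir())
    class_names = [d.name for d in class_dirs]
    paths, labels = [], []
    for label, class_dir in enumerate(class_dirs):
        for file in sorted(class_dir.iterdir()):
            # Skip anything that is not a regular image file
            if file.is_file() and file.suffix.lower() in IMAGE_SUFFIXES:
                paths.append(file)
                labels.append(label)
    return paths, labels, class_names
```

Each path can then be read with cv2.imread(str(path)), resized to a common shape, flattened, and stacked into a 2-D array that a scikit-learn estimator accepts in fit(X, y), with labels as y.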

Related

How to read images using skimage

import os
import numpy as np
seed = 42
np.random.seed(seed)  # call the function; assigning to it would overwrite it
Img_Width=128
Img_Height=128
Img_Channel = 3
Train_Path = 'stage1_train/'
Test_Path = 'stage1_test/'
train_ids = next(os.walk(Train_Path))[1]
test_ids = next(os.walk(Test_Path))[1]
print(train_ids)
X_train = np.zeros((len(train_ids), Img_Height, Img_Width, Img_Channel),dtype=np.uint8)
Y_train = np.zeros((len(train_ids),Img_Height, Img_Width, 1), dtype=bool)
The code above was given as a sample. I am trying to adapt it to load my own dataset.
I want to load all the image data from one folder. The folder contains two types of file, .jpg and .png, and I want to load them into two different variables. In the sample, train_ids is built from several subfolders, but in my dataset all the images are in the same folder. How can I load them all?
This is the path where all the images are located:
F:\segmentation\ISBI2016_ISIC_Part3B_Training_Data\ISBI2016_ISIC_Part3B_Training_Data_1
[Both .jpg and .png files are present here]
My Python code is located in the segmentation folder.
Whether the image is a JPG or a PNG makes no difference to ImageIO; it will load either format into an ndarray using the same syntax.
Regarding your wish to load all images in a folder, we have an official example on how to read all images from a folder:
import imageio.v3 as iio
from pathlib import Path
images = list()
for file in Path("path/to/folder").iterdir():
    if not file.is_file():
        continue
    images.append(iio.imread(file))
If you instead want to read a list of images from several folders, this works in almost the same way:
import imageio.v3 as iio
list_of_files = [] # list of paths to images in various formats
images = [iio.imread(file_path) for file_path in list_of_files]
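The question also asks for the .jpg and .png files to end up in two separate variables. A minimal sketch of splitting one folder's files by extension is shown here; the helper name `split_by_extension` is hypothetical, and it only gathers paths so that each list can be passed to iio.imread afterwards.

```python
from pathlib import Path


def split_by_extension(folder):
    """Return (jpg_paths, png_paths) for the files directly inside folder."""
    jpg_paths, png_paths = [], []
    for file in sorted(Path(folder).iterdir()):
        if not file.is_file():
            continue
        suffix = file.suffix.lower()  # treat .JPG and .jpg alike
        if suffix in (".jpg", ".jpeg"):
            jpg_paths.append(file)
        elif suffix == ".png":
            png_paths.append(file)
    return jpg_paths, png_paths
```

With these two lists, jpg_images = [iio.imread(p) for p in jpg_paths] and png_images = [iio.imread(p) for p in png_paths] load each group in the same way as the list comprehension above.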

Convert a folder of images into a dataset of numpy arrays in Google Colab

I have uploaded the FairFace dataset (https://github.com/joojs/fairface) to my Google Drive and I'm trying to convert the images to a dataset of arrays that I can use in a CNN.
First, I created a list of the files for the validation set. Now I am trying to convert the images to arrays. This is what I am trying, but it says my directory does not exist.
val is the folder of validation images.
import os
from PIL import Image
from numpy import asarray
val_items = os.listdir('/content/val')
train_items = os.listdir('/content/train')
val_img_array = []
# load the image and convert into
# numpy array
for i in range(len(val_items)):
    img = Image.open('/content/val/*.jpg')
    numpydata = asarray(img)
    val_img_array.append(numpydata)
print(val_img_array)
Please give me any guidance you have. Thanks!
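You are not pointing at the mounted drive correctly. Once Google Drive is mounted, its files live under /content/drive/MyDrive, so the path to your folder should look like this:
"/content/drive/MyDrive/val/"
Note also that Image.open expects the path of a single image file. Neither the folder itself nor a wildcard such as *.jpg can be passed to it, so join the folder path with each file name returned by os.listdir instead.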
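Image.open reads one file at a time, so a wildcard such as *.jpg has to be expanded into individual paths before opening. A minimal sketch of doing that with pathlib follows; the helper name `load_folder_as_arrays` is hypothetical, and the conversion to RGB is an assumption made so that every array has three channels.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_folder_as_arrays(folder, pattern="*.jpg"):
    """Open every file in folder that matches pattern and
    return the images as a list of NumPy arrays."""
    arrays = []
    for file in sorted(Path(folder).glob(pattern)):
        with Image.open(file) as img:
            # Convert to RGB so grayscale or palette images also get 3 channels
            arrays.append(np.asarray(img.convert("RGB")))
    return arrays
```

In Colab this could be called as val_img_array = load_folder_as_arrays("/content/drive/MyDrive/val"), assuming the drive is mounted at /content/drive and the val folder sits at the top level of the drive. If all images share the same size, np.stack(val_img_array) turns the list into a single 4-D array.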

Python: training a deep neural network on an MRI slices dataset

I want to train a deep neural network on the MRI slices dataset. Here is my code:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib
file_dir = 'C:\\Users\\adam\\Downloads\\MRI_Images\\'
import glob
import cv2
images = [cv2.imread(file) for file in glob.glob("C:\\Users\\adam\\Downloads\\MRI_Images\\.png")]
(X_train_full, y_train_full), (X_test, y_test) = images
Python reports "not enough values to unpack", and I don't know why. Is it a problem that I put all the images in one folder?
I don't know the structure of your dataset directory, but glob.glob() with a pattern such as 'C:\\Users\\adam\\Downloads\\MRI_Images\\*.png' returns the paths of all matching images directly inside that folder (subfolders are not included).
That is, what you get in images is a flat list of read-in images (NumPy array format), like:
[image_0, image_1, ...]
Such a list cannot be unpacked into two (X, y) tuples, and this is why the error occurs. "Not enough values to unpack" specifically means the list held fewer than two items, so also check that your pattern contains the * wildcard and actually matches files.
Reading your train and test images separately might help:
images_trainx = [cv2.imread(file) for file in glob.glob("C:\\Users\\adam\\Downloads\\MRI_Images\\trainx\\*.png")]
images_trainy = [cv2.imread(file) for file in glob.glob("C:\\Users\\adam\\Downloads\\MRI_Images\\trainy\\*.png")]
images_testx = [cv2.imread(file) for file in glob.glob("C:\\Users\\adam\\Downloads\\MRI_Images\\testx\\*.png")]
images_testy = [cv2.imread(file) for file in glob.glob("C:\\Users\\adam\\Downloads\\MRI_Images\\testy\\*.png")]
This approach is clunky, but it is hard to get wrong.
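If the images are not already separated into train and test folders, another option is to load one list and split it in code. The sketch below is a plain-Python illustration, not the approach from the answer above: the helper name `split_into_train_test` is hypothetical, and it assumes you already have one label per image, which the question does not describe.

```python
import random


def split_into_train_test(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle samples and labels together and return
    ((X_train, y_train), (X_test, y_test))."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return (X_train, y_train), (X_test, y_test)
```

The return value has exactly the two-tuple shape that the unpacking line in the question expects, so (X_train_full, y_train_full), (X_test, y_test) = split_into_train_test(images, labels) works once a labels list exists.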

Splitting image based dataset for YOLOv3

I have a question about splitting a dataset of 20k images along with their labels. The dataset is in YOLOv3 format: each image file has a .txt file with the same name, and the text file contains the labels.
I want to split the dataset into train/test splits, is there a way to randomly select the image and its labels .txt file with it and store it in a separate folder using Python?
I want to be able to split the dataset randomly. For instance, select 16k files along with label file too and store them separately in a train folder and the remaining 4k should be stored in a test folder.
This could be done manually in the file explorer by selecting the first 16k files and moving them to a different folder, but the split would not be random, and I plan to repeat this many times for the same dataset.
Here is what the data looks like: [screenshot of images and labels]
I suggest you take a look at the following Python built-in modules
glob
random
os
shutil
for manipulating files and paths in Python. Here is my code, with comments, that might solve your problem. It's very simple:
import glob
import random
import os
import shutil
# Get all paths to your image files and text files.
# Sort both lists so each image lines up with its label file
# (glob does not guarantee any particular order).
PATH = 'path/to/dataset/'
img_paths = sorted(glob.glob(PATH+'*.jpg'))
txt_paths = sorted(glob.glob(PATH+'*.txt'))
# Calculate number of files for training, validation
data_size = len(img_paths)
r = 0.8
train_size = int(data_size * r)
# Shuffle the two lists together so the pairs stay intact
img_txt = list(zip(img_paths, txt_paths))
random.seed(43)
random.shuffle(img_txt)
img_paths, txt_paths = zip(*img_txt)
# Now split them
train_img_paths = img_paths[:train_size]
train_txt_paths = txt_paths[:train_size]
valid_img_paths = img_paths[train_size:]
valid_txt_paths = txt_paths[train_size:]
# Move them to train, valid folders
train_folder = PATH+'train/'
valid_folder = PATH+'valid/'
# exist_ok avoids an error if the folders are already there
os.makedirs(train_folder, exist_ok=True)
os.makedirs(valid_folder, exist_ok=True)
def move(paths, folder):
    for p in paths:
        shutil.move(p, folder)
move(train_img_paths, train_folder)
move(train_txt_paths, train_folder)
move(valid_img_paths, valid_folder)
move(valid_txt_paths, valid_folder)
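Pairing the two glob results with zip assumes that every image has a label file and that both lists come back in matching order. A minimal sketch that instead derives each label path from its image path, and reports images without a label, is shown here; the function name `pair_images_with_labels` is hypothetical.

```python
from pathlib import Path


def pair_images_with_labels(dataset_dir, image_suffix=".jpg"):
    """Return (pairs, missing): a list of (image_path, label_path) tuples
    and a list of images that have no matching .txt file."""
    pairs, missing = [], []
    for img in sorted(Path(dataset_dir).glob(f"*{image_suffix}")):
        # The label file shares the image's name, with a .txt extension
        label = img.with_suffix(".txt")
        if label.is_file():
            pairs.append((img, label))
        else:
            missing.append(img)
    return pairs, missing
```

The resulting pairs list can be shuffled with random.shuffle and sliced at the 80% mark in the same way as above, and both files of each pair moved together with shutil.move.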

How to load image dataset for SVM image classification task

I'm trying to make a linear SVM classifier (AD vs NC) for the classification of Alzheimer's Disease by using MRI images. How can I load the image dataset correctly?
I found an example of SVM image classification and tried to run it, but there was an error when loading the dataset.
The folder name is "images".
There are five subfolders in "images", named doller_bill, sunflower, pizza, dog, and ball. Each subfolder contains 50-60 photos in JPG format. The following is the sample code I downloaded from GitHub:
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook
from sklearn import svm, metrics, datasets
from sklearn.utils import Bunch
from sklearn.model_selection import GridSearchCV, train_test_split
from skimage.io import imread
from skimage.transform import resize
def load_image_files(container_path, dimension=(64, 64)):
    image_dir = Path(container_path)
    folders = [directory for directory in image_dir.iterdir() if
               directory.is_dir()]
    categories = [fo.name for fo in folders]
    descr = "A image classification dataset"
    images = []
    flat_data = []
    target = []
    for i, direc in enumerate(folders):
        for file in direc.iterdir():
            img = skimage.io.imread(file)
            img_resized = resize(img, dimension, anti_aliasing=True,
                                 mode='reflect')
            flat_data.append(img_resized.flatten())
            images.append(img_resized)
            target.append(i)
    flat_data = np.array(flat_data)
    target = np.array(target)
    images = np.array(images)
    return Bunch(data=flat_data,
                 target=target,
                 target_names=categories,
                 images=images,
                 DESCR=descr)
image_dataset = load_image_files("images/")
However, when I run the code, the following error appears:
NameError: name 'skimage' is not defined
Could you please help me figure out how to load the image dataset?
In my case, I have a folder named "images"
whose subfolders are named "MRI images_NC" and "MRI images_AD".
Each subfolder contains approximately 1500 photos.
Thanks again.
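name 'skimage' is not defined
means that the name skimage was never bound in your script. The import
from skimage.io import imread
makes only the function imread available, not the skimage package itself, so the call skimage.io.imread(file) inside load_image_files fails. Call imread(file) directly, or add import skimage.io at the top of the script.
If the import line itself fails with a ModuleNotFoundError, the scikit-image package is not installed. In that case, please run
pip install scikit-image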
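A NameError at the call site, rather than an error on the import line, points at how the name was imported. The small sketch below illustrates this with the standard library's os.path as a stand-in for skimage.io (an assumption made so that it runs without scikit-image installed); the variable names are illustrative only.

```python
# "from package.module import name" binds only "name",
# not "package", in the importing namespace.
namespace = {}
exec("from os.path import join", namespace)

join_is_defined = "join" in namespace  # the imported function is available
os_is_defined = "os" in namespace      # the parent package is not

print(join_is_defined)  # True
print(os_is_defined)    # False
```

By the same logic, from skimage.io import imread makes imread usable on its own, while any reference to skimage.io.imread needs a separate import skimage.io.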
