How to save tiff images into a new npy file? - python

I would like to save some TIFF images into a new .npy file.
My data are stored in 5 different folders of TIFF files. I want to open each image, convert it to a NumPy array, and then save the arrays in a new .npy file (for deep learning classification).
import numpy as np
from PIL import Image
import os
Data_dir = r"C:\Desktop\Université_2019_2020\CoursS2_Mosef\Stage\Data\Grand_Leez\shp\imagettes"
Categories = ["Bouleau_tiff", "Chene_tiff", "Erable_tiff", "Frene_tiff", "Peuplier_tiff"]
for categorie in Categories:
    path = os.path.join(Data_dir, categorie) #path for each species
    for img in os.listdir(path):
        path_img = os.path.join(path,img)
        im = Image.open(os.path.join(path_img)) #load an image file
        imarray = np.array(im) # convert it to a matrix
        imarray = np.delete(imarray, 3, axis=2)
        np.save(Data_dir, imarray)
Problem: this only gives me the last image of my last category, "Peuplier_tiff", and it is saved under the name imagette; I don't know why.
Last but not least, I have a doubt about my targets: how can I be sure that my categories are correctly assigned to the corresponding arrays?
A lot of questions,
thanks in advance for your help.
S.V
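A note on the problem above: np.save(Data_dir, imarray) writes to the same path on every iteration, so each image overwrites the previous one, and because that path ends in "imagettes" NumPy appends ".npy" to it. A minimal sketch of one way around this is to collect every array together with the index of its category and save once at the end. The function name load_labelled_images is hypothetical, and the sketch assumes that all images have the same height and width and at least three bands.

```python
import os
import numpy as np
from PIL import Image

def load_labelled_images(data_dir, categories):
    """Return (X, y): one stacked image array and one integer label per image."""
    images, labels = [], []
    for label, category in enumerate(categories):
        folder = os.path.join(data_dir, category)
        for fname in sorted(os.listdir(folder)):
            with Image.open(os.path.join(folder, fname)) as im:
                arr = np.array(im)[..., :3]  # keep the first three bands, drop the 4th
            images.append(arr)
            labels.append(label)  # position of the category in `categories`
    # np.stack needs every image to have the same height and width
    return np.stack(images), np.array(labels)

# Hypothetical usage with the names from the question:
# X, y = load_labelled_images(Data_dir, Categories)
# np.savez(os.path.join(Data_dir, "dataset.npz"), X=X, y=y)
```

Since y[i] is appended in the same loop iteration as X[i], the label of each array is the position of its folder in the category list, which is one way to be sure that targets and arrays stay aligned.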

Thanks for your response. It's working with this code:
import numpy as np
from PIL import Image
import os
new_dir = "dta_npy"
directory = r"C:\Desktop\Université_2019_2020\CoursS2_Mosef\Stage\Data\Grand_Leez\shp\imagettes"
Data_dir = os.path.join(directory, new_dir)
os.makedirs(Data_dir, exist_ok=True) # do not fail if the folder already exists
print("Directory '%s' created" %Data_dir)
Categories = ["Bouleau_tif","Chene_tif", "Erable_tif", "Frene_tif", "Peuplier_tif"]
for categorie in Categories:
    path = os.path.join(directory,categorie) #path for each species
    for img in os.listdir(path):
        im = Image.open(os.path.join(path,img)) #load an image file
        imarray = np.array(im) # convert it to a matrix
        imarray = np.delete(imarray, 3, axis=2) # drop the 4th band
        unique_name = os.path.splitext(img)[0] # file name without its extension
        np.save(os.path.join(Data_dir, unique_name), imarray) # one .npy file per image
Now my objective is to format my data, for each of my classes, in this way (click on the link):
format goal

Related

What is the most efficient way to read an hdf5 file containing an image stored as a numpy array?

I'm converting image files to hdf5 files as follows:
import h5py
import io
import os
import cv2
import numpy as np
from PIL import Image
def convertJpgtoH5(input_dir, filename, output_dir):
    filepath = input_dir + '/' + filename
    print('image size: %d bytes'%os.path.getsize(filepath))
    img_f = open(filepath, 'rb')
    binary_data = img_f.read()
    binary_data_np = np.asarray(binary_data)
    new_filepath = output_dir + '/' + filename[:-4] + '.hdf5'
    f = h5py.File(new_filepath, 'w')
    dset = f.create_dataset('image', data = binary_data_np)
    f.close()
    print('hdf5 file size: %d bytes'%os.path.getsize(new_filepath))
pathImg = '/path/to/images'
pathH5 = '/path/to/hdf5/files'
ext = [".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif"]
for img in os.listdir(pathImg):
    if img.endswith(tuple(ext)):
        convertJpgtoH5(pathImg, img, pathH5)
I later read these hdf5 files as follows:
for hf in os.listdir(pathH5):
    if hf.endswith(".hdf5"):
        hf = h5py.File(f"{pathH5}/{hf}", "r")
        key = list(hf.keys())[0]
        data = np.array(hf[key])
        img = Image.open(io.BytesIO(data))
        image = cv2.cvtColor(np.float32(img), cv2.COLOR_BGR2RGB)
        hf.close()
Is there a more efficient way to read the hdf5 files rather than converting to numpy array, opening with Pillow before using with OpenCV?
Ideally this should be closed as a duplicate because most of what you want to do is explained in the answers I referenced in my comments above. I am including those links here:
How do I process a large dataset of images in python?
Convert a folder comprising jpeg images to hdf5
There is one difference: my examples load all the image data into 1 HDF5 file, and you are creating 1 HDF5 file for each image. Frankly, I don't think there is much value doing that. You wind up with twice as many files and there's nothing gained. If you are still interested in doing that, here are 2 more answers that might help (and I updated your code at the end):
How to split a big HDF5 file into multiple small HDF5 dataset
Extracting datasets from 1 HDF5 file to multiple files
In the interest of addressing your specific question, I modified your code to use cv2 only (no need for PIL). I resized the images and saved them as 1 dataset in 1 file. If you are using the images for training and testing a CNN model, you need to do this anyway (it needs arrays of consistent size and shape). Also, I think you can save the data as uint8 -- no need for floats. See below.
import h5py
import glob
import os
import cv2
import numpy as np
def convertImagetoH5(imgfilename):
    print('image size: %d bytes'%os.path.getsize(imgfilename))
    img = cv2.imread(imgfilename) # cv2 loads images in BGR channel order
    # convert explicitly: the 2nd argument of imread takes IMREAD_* flags, not COLOR_* codes
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_resize = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT) )
    return img_resize
pathImg = '/path/to/images'
pathH5 = '/path/to/hdf5file'
ext_list = [".ppm", ".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif"]
IMG_WIDTH = 120
IMG_HEIGHT = 120
#get list of all images and number of images
all_images = []
for ext in ext_list:
    all_images.extend(glob.glob(pathImg+"/*"+ext, recursive=True))
n_images = len(all_images)
# cv2.resize returns arrays of shape (height, width, channels)
ds_img_arr = np.zeros((n_images, IMG_HEIGHT, IMG_WIDTH,3),dtype=np.uint8)
for cnt,img in enumerate(all_images):
    img_arr = convertImagetoH5(img)
    ds_img_arr[cnt]=img_arr[:]
h5_filepath = pathH5 + '/all_image_data.hdf5'
with h5py.File(h5_filepath, 'w') as h5f:
    dset = h5f.create_dataset('images', data=ds_img_arr)
print('hdf5 file size: %d bytes'%os.path.getsize(h5_filepath))
with h5py.File(h5_filepath, "r") as h5r:
    key = list(h5r.keys())[0]
    print (key, h5r[key].shape, h5r[key].dtype)
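On the efficiency part of the question: once all images sit in one dataset, a single image can be read by slicing the dataset, and h5py then reads only that slice from disk. The sketch below assumes a file laid out as above, with an (n, height, width, 3) dataset named 'images'; the helper name read_image is hypothetical.

```python
import h5py
import numpy as np

def read_image(h5_path, index, dataset="images"):
    """Read a single image from an (n, H, W, C) dataset without loading the rest."""
    with h5py.File(h5_path, "r") as h5r:
        # Slicing an h5py dataset returns a NumPy array holding just that slice
        return h5r[dataset][index]

# Hypothetical usage:
# img = read_image('/path/to/hdf5file/all_image_data.hdf5', 0)
```

The returned array is a plain NumPy array, so it can be passed to cv2 functions directly with no detour through Pillow.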
If you really want 1 HDF5 for each image, the code from your question is updated below. Again, only cv2 is used -- no need for PIL. Images are not resized. This is for completeness only (to demonstrate the process). It's not how you should manage your image data.
import h5py
import os
import cv2
import numpy as np
def convertImagetoH5(input_dir, filename, output_dir):
    filepath = input_dir + '/' + filename
    print('image size: %d bytes'%os.path.getsize(filepath))
    img = cv2.imread(filepath) # loaded in BGR channel order
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # convert to RGB explicitly
    # splitext also handles longer extensions such as .jpeg and .tiff
    new_filepath = output_dir + '/' + os.path.splitext(filename)[0] + '.hdf5'
    with h5py.File(new_filepath, 'w') as h5f:
        h5f.create_dataset('image', data =img)
    print('hdf5 file size: %d bytes'%os.path.getsize(new_filepath))
pathImg = '/path/to/images'
pathH5 = '/path/to/hdf5file'
ext = [".ppm", ".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif"]
# Loop thru image files and create a matching HDF5 file
for img in os.listdir(pathImg):
    if img.endswith(tuple(ext)):
        convertImagetoH5(pathImg, img, pathH5)
# Loop thru HDF5 files and read image dataset (as an array)
for h5name in os.listdir(pathH5):
    if h5name.endswith(".hdf5"):
        with h5py.File(f"{pathH5}/{h5name}", "r") as h5f:
            key = list(h5f.keys())[0]
            image = h5f[key][:]
            print(f'{h5name}: {image.shape}, {image.dtype}')

How to make a csv dataset from raw images in python?

I am making an ML project to recognize the silhouettes of different users. I have a raw image dataset of 1900 images. I want to convert them to a csv dataset with the labels being the names of the users. I am currently stuck on the part where the images are converted to a numpy array. The code is here:
from PIL import Image
import numpy as np
import sys
import os
import csv
# default format can be changed as needed
def createFileList(myDir, format='.jpg'):
    fileList = []
    print(myDir)
    for root, dirs, files in os.walk(myDir, topdown=False):
        for name in files:
            if name.endswith(format):
                fullName = os.path.join(root, name)
                fileList.append(fullName)
    return fileList
rahul = []
# load the original image
myFileList = createFileList(r'C:\Users\Mr.X\PycharmProjects\Gait_Project\data\rahul')
for file in myFileList:
    print(file)
    img_file = Image.open(file)
    # img_file.show()
    # get original image parameters...
    width, height = img_file.size
    format = img_file.format
    mode = img_file.mode
    # Make image Greyscale
    img_grey = img_file.convert('L')
    img_res = img_grey.resize((480, 272))
    # img_grey.save('result.png')
    # img_grey.show()
    # Save Greyscale values (np.int was removed in NumPy 1.24, so the builtin int is used)
    value = np.asarray(img_res.getdata(), dtype=int).reshape((img_res.size[1], img_res.size[0]))
    value = value.flatten()
    print(value)
    npvalue = np.array(value)
    rahul.append(npvalue)
    #with open("rahul.csv", 'a') as f:
    #    writer = csv.writer(f)
    #    writer.writerow(value)
final = np.array(rahul)
np.save("rahul.npy", final)
My goal is to make a dataset with 1900 images and 4 labels. Currently, while making the numpy array, each pixel of an image is entered in a separate column, making it 1900 rows and 200k columns; that needs to become 1900 rows and 2 columns. Any suggestion or help is appreciated.
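One possible reading of the "1900 rows and 2 columns" goal is an index file with one row per image holding the image path and its label, while the pixel data stays in the image files (or in a separate .npy array) and is loaded at training time. The sketch below assumes that each user's images live in a folder named after that user, as the data\rahul path suggests; the function name write_label_index and the column names are hypothetical.

```python
import csv
import os

def write_label_index(data_dir, csv_path, ext=".jpg"):
    """Write one CSV row per image: (path, label), where label is the parent folder name."""
    rows = []
    for root, _, files in os.walk(data_dir):
        for name in files:
            if name.lower().endswith(ext):
                rows.append((os.path.join(root, name), os.path.basename(root)))
    rows.sort()  # deterministic order, independent of the file system
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])
        writer.writerows(rows)
    return len(rows)

# Hypothetical usage:
# n_rows = write_label_index(r"...\Gait_Project\data", "labels.csv")
```

Keeping the pixels out of the CSV avoids one column per pixel, and the label column can later be mapped to integers for training.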

How to read the shape of all images and display them, present in a dataset folder through google colab?

I am trying to train on my image dataset in Google Colab. The dataset folder is present in Colab. When I try to read the images from the directory, I am only able to read the file names of the images. If I try to extract the shape of the images into an array, I get different errors with different approaches. I have tried the os library, PIL.Image and even pickle, but I am still not able to sort out or even guess what the issue could be. The errors I am getting are:
1) AttributeError: 'list' object has no attribute 'read'
2) AttributeError: 'list' object has no attribute 'seek'
Both occur when using the os.walk(path) function in a for loop and picking up the files from the resulting list of all files present in the path.
3) FileNotFoundError: [Errno 2] No such file or directory: '7119-220.jpg'
This seems weird, as it looks for the same specific file each time I run the code. If I wrap this in try and except for FileNotFoundError, I don't get any output.
Question: What mistake am I making?
import os
import matplotlib.pyplot as plt
import time
import numpy as np
from PIL import Image
imagesPath = 'Neural_Net-for-Concrete-Crack-Detection/Wall_crack_dataset/W/CW'
target_names = [item for item in os.listdir(imagesPath)
if os.path.isdir(os.path.join(imagesPath, item))]
number_train_samples = sum([len(files) for _, _, files in os.walk(imagesPath)])
image = np.zeros((256, 256), dtype=int)
total_number_samples = number_train_samples
print('Training a CNN Multi-Classifier Model ......')
print(' - # of trained samples: ', number_train_samples,
'\n - total # of samples: ', total_number_samples)
This piece works for just counting the number of image files.
from PIL import Image
import os
i=0
image = np.zeros((256, 256), dtype='uint8')
imagesPath = 'Neural_Net-for-Concrete-Crack-Detection/Wall_crack_dataset/W/CW'
for _, _, files in os.walk(imagesPath):
    for file in files:
        image = Image.open(file)
This code works better if I specify a particular image file in the directory to be plotted, but not for all.
os.walk(...) yields a 3-tuple (dirpath, dirnames, filenames). Therefore, you should try to open os.path.join(dirpath, file) instead of file:
from PIL import Image
import os
i=0
image = np.zeros((256, 256), dtype='uint8')
imagesPath = 'Neural_Net-for-Concrete-Crack-Detection/Wall_crack_dataset/W/CW'
for dirpath, _, files in os.walk(imagesPath): # <--
    for file in files:
        image = Image.open(os.path.join(dirpath, file)) # <--
If you need a dataset with shape (n_samples, channels, height, width) and you want to stick to PIL.Image, you can do this:
dataset_dir = "[DATASET_DIR]"
dataset = np.asarray([
    np.asarray( # convert from PIL.Image to np.array
        Image.open(os.path.join(dirpath, img_fname)) # open image
    ).transpose((2,0,1)) # change from (H,W,C) to (C,H,W)
    for dirpath, _, fnames in os.walk(dataset_dir) # scan the `dataset_dir`
    for img_fname in fnames # for each file in `dataset_dir`
])
Note that this requires all images to have the same shape.
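If the images do not all share one shape, one option is to convert and resize each image before stacking. This is only a sketch: the function name load_dataset and the default size are assumptions, and it expects the folder to contain nothing but readable image files.

```python
import os
import numpy as np
from PIL import Image

def load_dataset(dataset_dir, size=(256, 256)):
    """Return an array of shape (n_samples, 3, height, width)."""
    arrays = []
    for dirpath, _, fnames in os.walk(dataset_dir):
        for fname in sorted(fnames):
            with Image.open(os.path.join(dirpath, fname)) as im:
                rgb = im.convert("RGB").resize(size)  # size is (width, height)
            arrays.append(np.asarray(rgb).transpose((2, 0, 1)))  # (H,W,C) -> (C,H,W)
    return np.stack(arrays)
```

Converting to "RGB" first also gives greyscale files three channels, so the transpose works for every image.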

Feature selection using python

It's a letter recognition task with 284 images and 19 classes. I want to apply naive Bayes. First I have to convert each image to a feature vector, and to reduce extra information I should use some feature selection code, such as cropping the images to remove extra black borders. But I don't have much experience in Python.
How can I crop black spaces in images in order to decrease the size of the csv files (because there are more columns than expected)? And also, how can I resize images to be the same size?
from PIL import Image, ImageChops
from resize import trim
import numpy as np
import cv2
import os
import csv
#Useful function
def createFileList(myDir, format='.jpg'):
    fileList = []
    print(myDir)
    for root, dirs, files in os.walk(myDir, topdown=False):
        for name in files:
            if name.endswith(format):
                fullName = os.path.join(root, name)
                fileList.append(fullName)
    return fileList
# load the original image
myFileList = createFileList('image_ocr')
#print(myFileList)
for file in myFileList:
    #print(file)
    img_file = Image.open(file)
    # img_file.show()
    # get original image parameters...
    width, height = img_file.size
    format = img_file.format
    mode = img_file.mode
    # Make image Greyscale
    img_grey = img_file.convert('L')
    # Save Greyscale values (np.int was removed in NumPy 1.24, so the builtin int is used)
    value = np.asarray(img_grey.getdata(), dtype=int).reshape((img_grey.size[1], img_grey.size[0]))
    value = value.flatten()
    #print(value)
    with open("trainData.csv", 'a') as f:
        writer = csv.writer(f)
        writer.writerow(value)
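For the cropping and resizing part, a minimal sketch with Pillow alone could look like the following. It assumes bright letters on a dark background; the function names, the threshold of 10 and the 28x28 target size are all illustrative choices, not values from the question. The threshold is there because JPEG compression tends to leave border pixels that are nearly, but not exactly, black.

```python
from PIL import Image

def content_bbox(img, threshold=10):
    """Bounding box (left, upper, right, lower) of pixels brighter than `threshold`."""
    mask = img.convert("L").point(lambda p: 255 if p > threshold else 0)
    return mask.getbbox()  # None when no pixel exceeds the threshold

def crop_and_resize(img, size=(28, 28), threshold=10):
    """Trim the dark border, then resize so every image has the same shape."""
    grey = img.convert("L")
    bbox = content_bbox(grey, threshold)
    if bbox is not None:
        grey = grey.crop(bbox)
    return grey.resize(size)  # size is (width, height)
```

After this step every image flattens to the same number of values (28 * 28 = 784 with the size used here), so each CSV row has the same number of columns.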

Image to matrix using python

I am required to access all images in a folder and store them in a matrix. I was able to do it using MATLAB, and here is the code:
input_dir = 'C:\Users\Karim\Downloads\att_faces\New Folder';
image_dims = [112, 92];
filenames = dir(fullfile(input_dir, '*.pgm'));
num_images = numel(filenames);
images = [];
for n = 1:num_images
    filename = fullfile(input_dir, filenames(n).name);
    img = imread(filename);
    img = imresize(img,image_dims);
end
but I am required to do it using python and here is my python code:
import Image
import os
from PIL import Image
from numpy import *
import numpy as np
#import images
dirname = "C:\\Users\\Karim\\Downloads\\att_faces\\New folder"
#get number of images and dimentions
path, dirs, files = os.walk(dirname).next()
num_images = len(files)
image_file = "C:\\Users\\Karim\\Downloads\\att_faces\\New folder\\2.pgm"
im = Image.open(image_file)
width, height = im.size
images = []
for x in xrange(1, num_images):
    filename = os.listdir(dirname)[x]
    img = Image.open(filename)
    img = im.convert('L')
    images[:, x] = img[:]
but I am getting this error:
IOError: [Errno 2] No such file or directory: '10.pgm'
although the file is present.
I'm not quite sure what your end goal is, but try something more like this:
import numpy as np
from PIL import Image
import glob
filenames = glob.glob('/path/to/your/files/*.pgm')
images = [Image.open(fn).convert('L') for fn in filenames]
data = np.dstack([np.array(im) for im in images])
This will yield a height x width x num_images numpy array (np.array(im) has shape (height, width)), assuming that all of your images have the same dimensions.
However, your images will be unsorted, so you may want to do filenames.sort().
Also, you may or may not want things as a 3D numpy array, but that depends entirely on what you're actually doing. If you just want to operate on each "frame" individually, then don't bother stacking them into one gigantic array.
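One caveat on the sorting remark: a plain filenames.sort() orders names as text, so '10.pgm' comes before '2.pgm'. If numeric order matters, a natural-sort key is one option. This is a small sketch; the helper name natural_key is hypothetical.

```python
import re

def natural_key(path):
    """Split a name into text and integer chunks so '2.pgm' sorts before '10.pgm'."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path)]

# Hypothetical usage with the list from the answer:
# filenames = sorted(glob.glob('/path/to/your/files/*.pgm'), key=natural_key)
```

Because the pattern is a capturing group, re.split always alternates text and digit chunks starting with text, so chunks at the same position are always of the same type when two keys are compared.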
