I'm trying to sort an image by luminosity using NumPy, which I'm new to. I've managed to create a random image and sort it.
def create_image(output, width, height, arr):
    array = np.zeros([height, width, 3], dtype=np.uint8)
    numOfSwatches = len(arr)
    swatchWidth = int(width / numOfSwatches)
    for i in range(0, numOfSwatches):
        m = i * swatchWidth
        r = (i + 1) * swatchWidth
        array[:, m:r] = arr[i]
    img = Image.fromarray(array)
    img.save(output)
Which creates this image:
So far so good. Only now I want to switch from creating random images to loading them and then sorting them.
#!/usr/bin/python3
import math
import numpy as np
from PIL import Image
# --------------------------------------------------------------
def load_image( infilename ) :
    img = Image.open( infilename )
    img.load()
    data = np.asarray( img, dtype = "int32" )
    return data
# --------------------------------------------------------------
def lum (r,g,b):
    return math.sqrt( .241 * r + .691 * g + .068 * b )
myImageFile = "random_colours.png"
imageNP = load_image(myImageFile)
imageNP.sort(key=lambda rgb: lum(*rgb) )
The image should look like this:
The error I get is: TypeError: 'key' is an invalid keyword argument for this function. I may have created the NumPy array incorrectly, as the sort worked when the data was a randomly generated array.
I have never used PIL, but the following approach hopefully works (I'm not sure, as I can't reproduce your exact examples), and of course there might be more efficient ways to do it.
I'm using your functions, having changed math.sqrt to np.sqrt in the lum function, as np.sqrt works element-wise on whole arrays. By the way, I believe this won't work with an int32 type array (as in your load_image function).
The key part is NumPy's argsort function (last line), which gives the indices that would sort the given array. It is applied to one row of the luminosity array (exploiting symmetry, since every row holds the same swatches) and the result is then used to index the columns of img_array.
# Create random image (this assumes create_image is modified to end with `return img`)
np.random.seed(4)
img = create_image('test.png', 75, 75, np.random.random((25,3))*255)
# Convert to Numpy array and calculate luminosity
img_array = np.array(img, dtype = np.uint8)
luminosity = lum(img_array[...,0], img_array[...,1], img_array[...,2])
# Sort by luminosity and convert to image again
img_sorted = Image.fromarray(img_array[:,luminosity[0].argsort()])
The original picture:
And the luminosity-sorted one:
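The argsort step can be tried without PIL or an image file. The sketch below is only an illustration: the four swatch colours and the array sizes are made up, and the image is assumed to consist of vertical swatches so that one row is enough to decide the column order.

```python
import numpy as np

def lum(r, g, b):
    # Same weights as the question's lum(), but np.sqrt works on whole arrays.
    return np.sqrt(0.241 * r + 0.691 * g + 0.068 * b)

# Hypothetical stand-in for a loaded image: 4 vertical swatches, each 2 pixels wide.
colours = np.array(
    [[255, 255, 255], [0, 0, 0], [200, 30, 30], [30, 200, 30]], dtype=np.uint8
)
img_array = np.repeat(colours[np.newaxis, :, :], 2, axis=1)  # shape (1, 8, 3)
img_array = np.repeat(img_array, 5, axis=0)                  # shape (5, 8, 3)

# Work in float so the weighted sum is computed without integer truncation.
rgb = img_array.astype(np.float64)
luminosity = lum(rgb[..., 0], rgb[..., 1], rgb[..., 2])      # shape (5, 8)

# Column order that sorts the first row by luminosity, applied to every row.
order = np.argsort(luminosity[0], kind="stable")
sorted_array = img_array[:, order]

print(sorted_array[0])  # darkest swatch first, brightest last
```

For a photograph, where rows differ, a single row no longer describes the whole image, so the sort key would have to be chosen differently (for example per row, or per pixel on a flattened array).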
My code below is intended to get a batch of images and convert them to RGB. But I keep getting an error which says to convert to type uint8. I have seen other questions regarding the conversion to uint8, but none directly from an array to uint8. Any advice on how to make that happen is welcome, thank you!
from skimage import io
import numpy as np
import glob, os
from tkinter import Tk
from tkinter.filedialog import askdirectory
import cv2
# wavelength in microns
MWIR = 4.5
R = .692
G = .582
B = .140
rgb_sum = R + G + B
NRed = R/rgb_sum
NGreen = G/rgb_sum
NBlue = B/rgb_sum
path = askdirectory(title='Select PNG Folder') # shows dialog box and return the path
outpath = askdirectory(title='Select SAVE Folder')
for file in os.listdir(path):
    if file.endswith(".png"):
        imIn = io.imread(os.path.join(path, file))
        imOut = np.zeros(imIn.shape)
        for i in range(imIn.shape[0]): # Assuming Rayleigh-Jeans law
            for j in range(imIn.shape[1]):
                imOut[i,j,0] = imIn[i,j,0]/((NRed/MWIR)**4)
                imOut[i,j,1] = imIn[i,j,0]/((NGreen/MWIR)**4)
                imOut[i,j,2] = imIn[i,j,0]/((NBlue/MWIR)**4)
        io.imsave(os.path.join(outpath, file) + '_RGB.png', imOut)
The code I am trying to integrate into my own (found in another thread, where it is used to convert an array to uint8) is:
info = np.iinfo(data.dtype) # Get the information of the incoming image type
data = data.astype(np.float64) / info.max # normalize the data to 0 - 1
data = 255 * data # Now scale by 255
img = data.astype(np.uint8)
cv2.imshow("Window", img)
thank you!
imIn is normally of type uint8, but imOut is created with np.zeros(imIn.shape), which defaults to float64, so the results of the division are stored as floats. You must convert back to uint8 before saving to a PNG file:
io.imsave(os.path.join(outpath, file) + '_RGB.png', imOut.astype(np.uint8))
Note that the two loops are not necessary; you can use NumPy vector operations instead:
MWIR = 4.5
R = .692
G = .582
B = .140
vector = np.array([R, G, B])
vector = vector / vector.sum()
vector = vector / MWIR
vector = np.power(vector, 4)
for file in os.listdir(path):
    if file.endswith(".png"):
        imgIn = ...
        # The loops divide channel 0 by each factor; broadcasting does the same in one step
        imgOut = imgIn[..., :1] / vector
        io.imsave(
            os.path.join(outpath, file) + '_RGB.png',
            imgOut.astype(np.uint8))
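One caveat: astype(np.uint8) neither rescales nor clips, so float values outside 0..255 do not survive the cast in any meaningful way. The sketch below shows one possible way to bring a float image into range first. The helper name to_uint8 and the choice of a per-image min/max rescale are assumptions for illustration, not something the question prescribes.

```python
import numpy as np

def to_uint8(image):
    """Linearly rescale a float image to the range 0..255 and cast to uint8."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image carries no contrast; return all zeros.
        return np.zeros(image.shape, dtype=np.uint8)
    scaled = (image - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

# Hypothetical float output with values far above 255.
im_out = np.array([[0.0, 1400.0], [7000.0, 2800.0]])
im_u8 = to_uint8(im_out)
print(im_u8.dtype, im_u8.min(), im_u8.max())
```

If the absolute values matter across a batch of images, a fixed scale factor followed by np.clip(..., 0, 255) may be the better choice than a per-image rescale.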
I am reading .mat files from the main folder, which contains seven subfolders. Each subfolder is named with its class number.
import glob
import os
import hdf5storage
import numpy as np
DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"
files = glob.glob(DATASET_PATH + "**/*.mat", recursive= True)
class_labels = [i.split(os.sep)[-2] for i in files]
for label in range(0, len(class_labels)):
    class_labels[label] = int(class_labels[label])
files variable contains the following:
Class labels contains the following:
I want to ask a couple of things:
1) When I read the .mat files, each one comes back as a dict, and each dict contains a different variable name. How can I read the key and assign its value to an array?
array_store=[]
for f in files:
    mat = hdf5storage.loadmat(f)
    arrays = np.array(mat.keys())
    array_store.append(arrays)
2) files = glob.glob(DATASET_PATH + "**/*.mat", recursive= True) Is it possible to randomly read a specific fraction of the files from each folder inside the main folder, for example 60% for training and 40% for testing?
UPDATE
I have tried what @vopsea suggested in the answer.
The output for the train variable looks like this:
How do I make the final array of images from the files for keys 1 - 7 (an array of 256 x 256 x 11 x total number of images) and the labels (total number of images x 1)? The labels will be the same as the key values; for example, all the files associated with key 1 (188 files) will have label 1 (188 x 1).
UPDATE
resolving issue of making label and accessing key without key name.
import os
import random
import hdf5storage
import numpy as np
DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"
train_images = []
test_images = []
train_label = list()
test_label = list()
percent_train = 0.4
class_folders = next(os.walk(DATASET_PATH))[1]
for x in class_folders:
    files = os.listdir(os.path.join(DATASET_PATH,x))
    random.shuffle(files)
    n = int(len(files) * percent_train)
    train_i = []
    test_i = []
    for i,f in enumerate(files):
        abs_path= os.path.join(DATASET_PATH,x,f)
        mat = hdf5storage.loadmat(abs_path)
        if(i < n):
            train_i.append(mat.values())
            train_label.append(x)
        else:
            test_i.append(mat.values())
            test_label.append(x)
    train_images.append(train_i)
    test_images.append(test_i)
1) Could you explain a bit more what you want in question 1? What is being appended? I might be misunderstanding, but it's easy to read unknown key/value pairs:
for key, value in mat.items():
    print(key, value)
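If each file holds a single variable whose name is not known in advance, one option is to take the first entry that is not loader metadata. The sketch below uses a plain dict as a stand-in for the loaded file. The helper first_array and the sample key are hypothetical, and the "__" filter is based on the metadata keys that scipy.io.loadmat adds; whether hdf5storage produces such keys for your files is worth checking.

```python
import numpy as np

def first_array(mat):
    """Return (key, value) for the first entry that is not file metadata."""
    for key, value in mat.items():
        # Skip keys such as "__header__" or "__version__".
        if not key.startswith("__"):
            return key, np.asarray(value)
    raise KeyError("no data variable found")

# Hypothetical stand-in for the dict returned by a .mat loader.
mat = {"__header__": b"MATLAB", "cube_17": np.zeros((4, 4, 2))}
key, data = first_array(mat)
print(key, data.shape)
```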
2) I did this without glob. Shuffle the class files and slice them into two lists according to training percent. Probably best to have the same number of files for each class (or close) so training doesn't favor one especially.
import os
import random
DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"
train = {}
test = {}
percent_train = 0.4
class_folders = next(os.walk(DATASET_PATH))[1]
for x in class_folders:
    files = os.listdir(os.path.join(DATASET_PATH,x))
    random.shuffle(files)
    n = int(len(files) * percent_train)
    train[x] = files[:n]
    test[x] = files[n:]
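The shuffle-and-slice idea can be wrapped in a small function so the split is reproducible and easy to check. This is a sketch only: split_files, the seed argument, and the file names are made up, and it uses the 60/40 split from the question rather than the 0.4 in the code above.

```python
import random

def split_files(files, train_fraction, seed=None):
    """Shuffle a copy of `files` and split it into (train, test) lists."""
    rng = random.Random(seed)      # private generator, so the split is repeatable
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical file listing per class folder.
files_by_class = {
    "1": ["a{}.mat".format(i) for i in range(10)],
    "2": ["b{}.mat".format(i) for i in range(5)],
}
train = {}
test = {}
for label, files in files_by_class.items():
    train[label], test[label] = split_files(files, 0.6, seed=0)

for label in files_by_class:
    print(label, len(train[label]), len(test[label]))
```

Splitting inside each class folder keeps the class proportions the same in both sets, which is the point made above about not favouring one class.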
EDIT 2:
Is this what you mean?
import os
import random
import hdf5storage
import numpy as np
DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"
train_images = []
test_images = []
train_label = []
test_label = []
percent_train = 0.4
class_folders = next(os.walk(DATASET_PATH))[1]
for x in class_folders:
    files = os.listdir(os.path.join(DATASET_PATH,x))
    random.shuffle(files)
    n = int(len(files) * percent_train)
    for i,f in enumerate(files):
        abs_path= os.path.join(DATASET_PATH,x,f)
        mat = hdf5storage.loadmat(abs_path)
        if(i < n):
            train_images.append(mat.values())
            train_label.append(x)
        else:
            test_images.append(mat.values())
            test_label.append(x)
EDIT 3: Using dict for simplicity
Notice how simple it is to run through the images at the end. The alternative is storing two lists (data and labels), one of which will have many duplicate items. You then have to iterate through them both at the same time.
Although depending on what you're doing with this later, two lists could be the right choice.
import os
import random
import hdf5storage
import numpy as np
DATASET_PATH = "D:/Dataset/Multi-resolution_data/Visual/High/"
train_images = {}
test_images = {}
percent_train = 0.4
class_folders = next(os.walk(DATASET_PATH))[1]
for x in class_folders:
    files = os.listdir(os.path.join(DATASET_PATH,x))
    random.shuffle(files)
    n = int(len(files) * percent_train)
    # One list of images per class; assigning train_images[x] directly would keep only the last file
    train_images[x] = []
    test_images[x] = []
    for i,f in enumerate(files):
        abs_path= os.path.join(DATASET_PATH,x,f)
        mat = hdf5storage.loadmat(abs_path)
        if(i < n):
            train_images[x].append(mat.values())
        else:
            test_images[x].append(mat.values())
for img_class,img_data in train_images.items():
    print( img_class, img_data )
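To get from a per-class collection to the single image array and label column asked for in the update, the arrays can be stacked along a new last axis. The sketch below uses small random cubes in place of the real 256 x 256 x 11 data; the dictionary contents and sizes are made up, and each entry is assumed to already be a NumPy array rather than a dict view.

```python
import numpy as np

# Hypothetical stand-in for loaded data: class label -> list of image cubes.
rng = np.random.default_rng(0)
images_by_class = {
    1: [rng.random((8, 8, 3)) for _ in range(4)],
    2: [rng.random((8, 8, 3)) for _ in range(2)],
}

all_images = []
all_labels = []
for label, cubes in images_by_class.items():
    all_images.extend(cubes)
    all_labels.extend([label] * len(cubes))   # one label per image

# New last axis indexes the image: (height, width, bands, n_images).
images = np.stack(all_images, axis=-1)
# Column vector of labels: (n_images, 1).
labels = np.asarray(all_labels).reshape(-1, 1)

print(images.shape, labels.shape)
```

Many libraries expect the image index on the first axis instead; np.stack(all_images, axis=0) gives that layout.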
I want to use SimpleITK or medpy to convert 3D images into 2D images.
Alternatively, I want to get a three-dimensional matrix and then divide it into several two-dimensional matrices.
import SimpleITK as ITK
import numpy as np
#from medpy.io import load
url=r'G:\path\to\my.mha'
image = ITK.ReadImage(url)
image_array = ITK.GetArrayFromImage(image)
frame_num, width, height = image_array.shape
print(frame_num,width,height)
This only gives me the shape: 155 240 240
But I want to get the pixel values themselves, like [[1,5,2,3,1...],[54,1,3,5...],[5,8,9,6....]]
Just to add to Dave Chen's answer, as it is unclear if you want to get a set of 2D SimpleITK images or numpy arrays. The following code covers all three available options:
import SimpleITK as sitk
import numpy as np
url = "my_file.mha"
image = sitk.ReadImage(url)
max_index = image.GetDepth() # or image.GetWidth() or image.GetHeight() depending on the axis along which you want to extract
# As list of 2D SimpleITK images
list_of_2D_images = [image[:,:,i] for i in range(max_index)]
# As list of 2D numpy arrays which cannot be modified (no data copied)
list_of_2D_images_np_view = [sitk.GetArrayViewFromImage(image[:,:,i]) for i in range(max_index)]
# As list of 2D numpy arrays (data copied to numpy array)
list_of_2D_images_np = [sitk.GetArrayFromImage(image[:,:,i]) for i in range(max_index)]
Also, if you really want to work with URLs and not local files I would suggest looking at the remote download approach used in the SimpleITK notebooks repository, the relevant file is downloaddata.py.
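Once the data is a NumPy array, splitting it into 2D matrices is plain indexing. The sketch below uses a small synthetic array in place of the result of sitk.GetArrayFromImage(image), which is indexed in (z, y, x) order; the array size and the variable names are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for sitk.GetArrayFromImage(image): a tiny (z, y, x) volume.
volume = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# One 2D matrix per index along each axis.
z_slices = [volume[z] for z in range(volume.shape[0])]        # each (4, 5)
y_slices = [volume[:, y, :] for y in range(volume.shape[1])]  # each (3, 5)
x_slices = [volume[:, :, x] for x in range(volume.shape[2])]  # each (3, 4)

print(len(z_slices), z_slices[0].shape)
print(z_slices[0])  # the actual values of the first slice, not just its shape
```

Printing a slice shows the nested-list style values the question asks for; printing volume.shape only shows the dimensions.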
That's not a big deal.
CT images normally store all values as int16, so you don't need to handle floating-point numbers. In that case we can go from int16 to uint16 simply by shifting away the negative values in the image (CT images have some negative pixel values). Note that we really need a uint16 or uint8 type so that OpenCV can handle it. Since a CT image array covers a wide range of values, uint16 is the best choice so that we don't lose too much precision.
Ok, now you just need to do as follows:
import SimpleITK as sitk
import numpy as np
import cv2
mha = sitk.ReadImage('/mha/directory') #Importing mha file
array = sitk.GetArrayFromImage(mha) #Converting to array int16 (default)
#Translating each slice to the positive side
for m in range(array.shape[0]):
    array[m] = array[m] + abs(np.min(array[m]))
array = np.around(array, decimals=0) #remove any float numbers if exists.. probably not
array = np.asarray(array, dtype='uint16') #From int16 to uint16
After these steps the array is ready to be saved as PNG images with cv2.imwrite:
for index, image in enumerate(array):
    # Number the files so each slice gets its own PNG instead of overwriting one file
    cv2.imwrite('/dir/to/save/' + 'name_image_{}.png'.format(index), image)
Note that by default SimpleITK handles .mha files by the axial view. I really don't know how to change it because I've never needed it before. Anyway, in this case with some searches you can find something.
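The shift to unsigned values can also be done for the whole volume at once. Note that this is a variation on the loop above, not the same operation: it subtracts one global minimum rather than a per-slice offset, so intensities stay comparable between slices, and it widens to int32 before shifting so the arithmetic cannot overflow int16. The function name to_uint16 and the sample values are made up.

```python
import numpy as np

def to_uint16(volume):
    """Shift a signed integer volume so its global minimum is 0, then cast to uint16."""
    # Widen first: max - min of an int16 array can be as large as 65535.
    shifted = volume.astype(np.int32) - int(volume.min())
    return shifted.astype(np.uint16)

# Hypothetical CT-like values in int16, including negatives.
ct = np.array(
    [[[-1024, 0], [40, 3071]],
     [[-1000, 100], [200, 300]]],
    dtype=np.int16,
)
ct_u16 = to_uint16(ct)
print(ct_u16.dtype, ct_u16.min(), ct_u16.max())
```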
I'm not sure exactly what you want to get, but it's easy to extract a 2D slice from a 3D image in SimpleITK. SimpleITK images are indexed in (x, y, z) order.
To get a Z slice where Z=100 you can do this:
zslice = image[:, :, 100]
To get a Y slice for Y=100:
yslice = image[:, 100, :]
And an X slice for X=100:
xslice = image[100, :, :]
@zivy @Dave Chen
I've solved my problem. In fact, running this code gives you 150 240*240 PNG pictures, which is what I wanted.
# -*- coding:utf-8 -*-
import numpy as np
import subprocess
import random
import progressbar
from glob import glob
from skimage import io
np.random.seed(5) # for reproducibility
progress = progressbar.ProgressBar(widgets=[progressbar.Bar('*', '[', ']'), progressbar.Percentage(), ' '])
class BrainPipeline(object):
    '''
    A class for processing brain scans for one patient
    INPUT:  (1) filepath 'path': path to directory of one patient. Contains following mha files:
                flair, t1, t1c, t2, ground truth (gt)
            (2) bool 'n4itk': True to use n4itk normed t1 scans (defaults to True)
            (3) bool 'n4itk_apply': True to apply and save n4itk filter to t1 and t1c scans for given patient. This will only work if the
    '''
    def __init__(self, path, n4itk = True, n4itk_apply = False):
        self.path = path
        self.n4itk = n4itk
        self.n4itk_apply = n4itk_apply
        self.modes = ['flair', 't1', 't1c', 't2', 'gt']
        # slices=[[flair x 155], [t1], [t1c], [t2], [gt]], 155 per modality
        self.slices_by_mode, n = self.read_scans()
        # [ [slice1 x 5], [slice2 x 5], ..., [slice155 x 5]]
        self.slices_by_slice = n
        self.normed_slices = self.norm_slices()
    def read_scans(self):
        '''
        goes into each modality in patient directory and loads individual scans.
        transforms scans of same slice into strip of 5 images
        '''
        print('Loading scans...')
        slices_by_mode = np.zeros((5, 155, 240, 240))
        slices_by_slice = np.zeros((155, 5, 240, 240))
        flair = glob(self.path + '/*Flair*/*.mha')
        t2 = glob(self.path + '/*_T2*/*.mha')
        gt = glob(self.path + '/*more*/*.mha')
        t1s = glob(self.path + '/**/*T1*.mha')
        t1_n4 = glob(self.path + '/*T1*/*_n.mha')
        t1 = [scan for scan in t1s if scan not in t1_n4]
        scans = [flair[0], t1[0], t1[1], t2[0], gt[0]] # directories to each image (5 total)
        if self.n4itk_apply:
            print('-> Applying bias correction...')
            for t1_path in t1:
                self.n4itk_norm(t1_path) # normalize files
            scans = [flair[0], t1_n4[0], t1_n4[1], t2[0], gt[0]]
        elif self.n4itk:
            scans = [flair[0], t1_n4[0], t1_n4[1], t2[0], gt[0]]
        # range instead of xrange so the code also runs on Python 3
        for scan_idx in range(5):
            # read each image directory, save to self.slices
            slices_by_mode[scan_idx] = io.imread(scans[scan_idx], plugin='simpleitk').astype(float)
        for mode_ix in range(slices_by_mode.shape[0]): # modes 1 thru 5
            for slice_ix in range(slices_by_mode.shape[1]): # slices 1 thru 155
                slices_by_slice[slice_ix][mode_ix] = slices_by_mode[mode_ix][slice_ix] # reshape by slice
        return slices_by_mode, slices_by_slice
    def norm_slices(self):
        '''
        normalizes each slice in self.slices_by_slice, excluding gt
        subtracts mean and div by std dev for each slice
        clips top and bottom one percent of pixel intensities
        if n4itk == True, will apply n4itk bias correction to T1 and T1c images
        '''
        print('Normalizing slices...')
        normed_slices = np.zeros((155, 5, 240, 240))
        for slice_ix in range(155):
            normed_slices[slice_ix][-1] = self.slices_by_slice[slice_ix][-1]
            for mode_ix in range(4):
                normed_slices[slice_ix][mode_ix] = self._normalize(self.slices_by_slice[slice_ix][mode_ix])
        print('Done.')
        return normed_slices
    def _normalize(self, slice):
        '''
        INPUT:  (1) a single slice of any given modality (excluding gt)
                (2) index of modality assoc with slice (0=flair, 1=t1, 2=t1c, 3=t2)
        OUTPUT: normalized slice
        '''
        b, t = np.percentile(slice, (0.5,99.5))
        slice = np.clip(slice, b, t)
        if np.std(slice) == 0:
            return slice
        else:
            return (slice - np.mean(slice)) / np.std(slice)
    def save_patient(self, reg_norm_n4, patient_num):
        '''
        INPUT:  (1) int 'patient_num': unique identifier for each patient
                (2) string 'reg_norm_n4': 'reg' for original images, 'norm' normalized images, 'n4' for n4 normalized images
        OUTPUT: saves png in Norm_PNG directory for normed, Training_PNG for reg
        '''
        print('Saving scans for patient {}...'.format(patient_num))
        progress.currval = 0
        if reg_norm_n4 == 'norm': #saved normed slices
            for slice_ix in progress(range(155)): # reshape to strip
                strip = self.normed_slices[slice_ix].reshape(1200, 240)
                if np.max(strip) != 0: # set values < 1
                    strip /= np.max(strip)
                if np.min(strip) <= -1: # set values > -1
                    strip /= abs(np.min(strip))
                # save as patient_slice.png
                io.imsave('Norm_PNG/{}_{}.png'.format(patient_num, slice_ix), strip)
        elif reg_norm_n4 == 'reg':
            for slice_ix in progress(range(155)):
                strip = self.slices_by_slice[slice_ix].reshape(1200, 240)
                if np.max(strip) != 0:
                    strip /= np.max(strip)
                io.imsave('Training_PNG/{}_{}.png'.format(patient_num, slice_ix), strip)
        else:
            for slice_ix in progress(range(155)): # reshape to strip
                strip = self.normed_slices[slice_ix].reshape(1200, 240)
                if np.max(strip) != 0: # set values < 1
                    strip /= np.max(strip)
                if np.min(strip) <= -1: # set values > -1
                    strip /= abs(np.min(strip))
                # save as patient_slice.png
                io.imsave('n4_PNG/{}_{}.png'.format(patient_num, slice_ix), strip)
    def n4itk_norm(self, path, n_dims=3, n_iters='[20,20,10,5]'):
        '''
        INPUT:  (1) filepath 'path': path to mha T1 or T1c file
                (2) directory 'parent_dir': parent directory to mha file
        OUTPUT: writes n4itk normalized image to parent_dir under orig_filename_n.mha
        '''
        output_fn = path[:-4] + '_n.mha'
        # run n4_bias_correction.py path n_dim n_iters output_fn
        subprocess.call('python n4_bias_correction.py ' + path + ' ' + str(n_dims) + ' ' + n_iters + ' ' + output_fn, shell = True)
def save_patient_slices(patients, type):
    '''
    INPUT   (1) list 'patients': paths to any directories of patients to save. for example- glob("Training/HGG/**")
            (2) string 'type': options = reg (non-normalized), norm (normalized, but no bias correction), n4 (bias corrected and normalized)
    saves strips of patient slices to appropriate directory (Training_PNG/, Norm_PNG/ or n4_PNG/) as patient-num_slice-num
    '''
    for patient_num, path in enumerate(patients):
        a = BrainPipeline(path)
        a.save_patient(type, patient_num)
def s3_dump(directory, bucket):
    '''
    dump files from a given directory to an s3 bucket
    INPUT   (1) string 'directory': directory containing files to save
            (2) string 'bucket': name of s3 bucket to dump files
    '''
    # shell = True because the command is passed as a single string
    subprocess.call('aws s3 cp' + ' ' + directory + ' ' + 's3://' + bucket + ' ' + '--recursive', shell = True)
def save_labels(fns):
    '''
    INPUT list 'fns': filepaths to all labels
    '''
    progress.currval = 0
    # use the 'fns' argument rather than the global 'labels'
    for label_idx in progress(range(len(fns))):
        slices = io.imread(fns[label_idx], plugin = 'simpleitk')
        for slice_idx in range(len(slices)):
            io.imsave(r'{}_{}L.png'.format(label_idx, slice_idx), slices[slice_idx])
if __name__ == '__main__':
    url = r'G:\work\deeplearning\BRATS2015_Training\HGG\brats_2013_pat0005_1\VSD.Brain.XX.O.MR_T1.54537\VSD.Brain.XX.O.MR_T1.54537.mha'
    labels = glob(url)
    save_labels(labels)
    # patients = glob('Training/HGG/**')
    # save_patient_slices(patients, 'reg')
    # save_patient_slices(patients, 'norm')
    # save_patient_slices(patients, 'n4')
    # s3_dump('Graveyard/Training_PNG/', 'orig-training-png')
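The per-slice normalization used in _normalize can be tried on its own, without any scan files. The sketch below restates it as a standalone function applied to random data; the function name normalize_slice, its keyword arguments, and the synthetic input are made up for illustration.

```python
import numpy as np

def normalize_slice(slice_2d, low=0.5, high=99.5):
    """Clip to the given percentiles, then subtract the mean and divide by the std."""
    bottom, top = np.percentile(slice_2d, (low, high))
    clipped = np.clip(slice_2d, bottom, top)
    std = clipped.std()
    if std == 0:
        # A constant slice (for example empty background) is returned unchanged.
        return clipped
    return (clipped - clipped.mean()) / std

# Hypothetical 240 x 240 slice with roughly normal intensities.
rng = np.random.default_rng(5)
scan = rng.normal(loc=100.0, scale=20.0, size=(240, 240))
normed = normalize_slice(scan)
print(normed.mean(), normed.std())
```

After this step each non-constant slice has mean close to 0 and standard deviation close to 1, which is why save_patient rescales the strips again before writing them as PNG.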
I'm trying to build a numpy array of arrays of arrays with the code below, which gives me:
ValueError: setting an array element with a sequence.
My guess is that in numpy I need to declare the arrays as multi-dimensional from the beginning, but I'm not sure.
How can I fix the code below so that I can build an array of arrays of arrays?
from PIL import Image
import pickle
import os
import numpy
indir1 = 'PositiveResize'
trainimage = numpy.empty(2)
trainpixels = numpy.empty(80000)
trainlabels = numpy.empty(80000)
validimage = numpy.empty(2)
validpixels = numpy.empty(10000)
validlabels = numpy.empty(10000)
testimage = numpy.empty(2)
testpixels = numpy.empty(10408)
testlabels = numpy.empty(10408)
i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
        try:
            im = Image.open(os.path.join(root,f))
            Imv=im.load()
            x,y=im.size
            pixelv = numpy.empty(6400)
            ind=0
            for i in range(x):
                for j in range(y):
                    temp=float(Imv[j,i])
                    temp=float(temp/255.0)
                    pixelv[ind]=temp
                    ind+=1
            if i<40000:
                trainpixels[tr]=pixelv
                tr+=1
            elif i<45000:
                validpixels[va]=pixelv
                va+=1
            else:
                testpixels[te]=pixelv
                te+=1
            print str(i)+'\t'+str(f)
            i+=1
        except IOError:
            continue
trainimage[0]=trainpixels
trainimage[1]=trainlabels
validimage[0]=validpixels
validimage[1]=validlabels
testimage[0]=testpixels
testimage[1]=testlabels
Don't try to smash your entire object into a numpy array. If you have distinct things, use a numpy array for each one then use an appropriate data structure to hold them together.
For instance, if you want to do computations across images then you probably want to just store the pixels and labels in separate arrays.
trainpixels = np.empty([10000, 80, 80])
trainlabels = np.empty(10000)
for i in range(10000):
    trainpixels[i] = ...
    trainlabels[i] = ...
To access an individual image's data:
imagepixels = trainpixels[253]
imagelabel = trainlabels[253]
And you can easily do stuff like compute summary statistics over the images.
meanimage = np.mean(trainpixels, axis=0)
meanlabel = np.mean(trainlabels)
If you really want all the data to be in the same object, you should probably use a struct array as Eelco Hoogendoorn suggests. Some example usage:
# Construction and assignment
# (np.int has been removed from NumPy; the built-in int is equivalent)
trainimages = np.empty(10000, dtype=[('label', int), ('pixel', int, (80,80))])
for i in range(10000):
    trainimages['label'][i] = ...
    trainimages['pixel'][i] = ...
# Summary statistics
meanimage = np.mean(trainimages['pixel'], axis=0)
meanlabel = np.mean(trainimages['label'])
# Accessing a single image
image = trainimages[253]
imagepixels, imagelabel = trainimages[['pixel', 'label']][253]
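A scaled-down version of the struct array idea that runs as-is might look like this. The sizes (3 images of 4 x 4 pixels), the float pixel type, and the fill values are made up so the results are easy to verify by eye.

```python
import numpy as np

# One record per image: an integer label plus a 4 x 4 block of pixels.
image_dtype = [("label", np.int64), ("pixel", np.float64, (4, 4))]
trainimages = np.zeros(3, dtype=image_dtype)

for i in range(3):
    trainimages["label"][i] = i
    trainimages["pixel"][i] = np.full((4, 4), i / 2.0)   # stand-in pixel data

# Field access gives ordinary arrays, so whole-dataset statistics stay simple.
meanimage = trainimages["pixel"].mean(axis=0)            # shape (4, 4)

# Indexing by position gives one record holding both fields.
record = trainimages[2]
print(record["label"], record["pixel"].shape, meanimage[0, 0])
```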
Alternatively, if you want to process each one separately, you could store each image's data in separate arrays and bind them together in a tuple or dictionary, then store all of that in a list.
trainimages = []
for i in range(10000):
    pixels = ...
    label = ...
    image = (pixels, label)
    trainimages.append(image)
Now to access a single image's data:
imagepixels, imagelabel = trainimages[253]
This makes it more intuitive to access a single image, but because all the data is not in one big numpy array you don't get easy access to functions that work across images.
Refer to the examples in numpy.empty:
>>> np.empty([2, 2])
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]]) #random
Give your images a shape with the N dimensions:
testpixels = numpy.empty([96, 96])
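Applied to the question, the container needs one row per image and one column per pixel. The sketch below first reproduces the error with a 1-D array and then shows the 2-D allocation; the image count is reduced to 5 and the pixel values are synthetic stand-ins for the data read from each file.

```python
import numpy as np

n_images, n_pixels = 5, 6400   # 6400 = 80 x 80 pixels flattened, as in the question

# A 1-D array has room for one number per slot, so storing a vector in a slot fails.
flat = np.empty(n_images)
try:
    flat[0] = np.zeros(n_pixels)
    raised = False
except ValueError:
    raised = True               # "setting an array element with a sequence."

# A 2-D array gives each image a full row of 6400 values.
trainpixels = np.empty((n_images, n_pixels))
for idx in range(n_images):
    # Hypothetical stand-in for the pixel vector built from one image file.
    pixelv = np.full(n_pixels, idx / 255.0)
    trainpixels[idx] = pixelv

print(raised, trainpixels.shape)
```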