Loading high number of images to memory and save pickle

I have a dataset with 1200 cases, 30 classes per case, and 160 images per class. The images are grayscale ndarrays with dtype float64.
I would like to slice each case to keep only 30 images from each class and put them in a dictionary where the first key is the case name and the second key is the class name. After that I would like to save the whole dictionary to a pickle,
but I run out of memory every time.
import nibabel as nib

# `path` (input root) and `path_save` (output root) are pathlib.Path objects defined elsewhere
brain_all = []
for case_dir in path.iterdir():
    brain_sample = {}
    path_dir = path_save / case_dir.name
    try:
        path_dir.mkdir(parents=True, exist_ok=False)
    except FileExistsError:
        print('Folder is already there')
    for file in case_dir.iterdir():
        # keep only slices 75..104 (30 images) of each volume
        sample = nib.load(file).get_fdata()[:, :, 75:105]
        if 'flair' in file.name:
            brain_sample['flair'] = sample
        elif 't1ce' in file.name:
            brain_sample['t1ce'] = sample
    brain_all.append([file, brain_sample])
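A rough size estimate may explain the memory error, and writing one pickle per case avoids holding everything in one list. The sketch below is only an illustration under stated assumptions: the 240 x 240 image size is assumed (the question does not give it), and `estimate_bytes` and `save_case` are hypothetical helper names. Casting each slice with `sample.astype(np.float32)` would halve the estimate.

```python
import pickle
from pathlib import Path


def estimate_bytes(n_cases, n_classes, n_images, height, width, itemsize):
    """Return the raw array size in bytes for the whole dataset."""
    return n_cases * n_classes * n_images * height * width * itemsize


def save_case(case_name, case_data, out_dir):
    """Pickle one case to its own file and return the file path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{case_name}.pkl"
    with open(target, "wb") as fh:
        pickle.dump(case_data, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return target


# 240 x 240 is an assumed image size; itemsize=8 corresponds to float64.
total = estimate_bytes(1200, 30, 30, 240, 240, itemsize=8)
print(f"{total / 1e9:.1f} GB as float64, {total / 2e9:.1f} GB as float32")
```

In the loop from the question, calling `save_case(case_dir.name, brain_sample, path_save)` at the end of each case instead of appending to `brain_all` would keep only one case in memory at a time.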

Related

Saving images with different name in folder

I tried to save images to a folder like this. It saves different images, but the file name of each new image contains the names of all the previous images.
db = h5py.File('results/Results.h5', 'r')
dsets = sorted(db['data'].keys())
for k in dsets:
    db = get_data()
    imnames = sorted(db['data'].keys())
    slika = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    cv2.imwrite(f'spremljene_slike/ime_{imnames}.png', slika)
So I tried it like this. Now the names are different, but only the last generated picture is written to the folder, so I get different names with the same picture.
NUM_IMG = -1
N = len(imnames)
global NUM_IMG
if NUM_IMG < 0:
    NUM_IMG = N
start_idx,end_idx = 0,N #min(NUM_IMG, N)
In a different function:
for u in range(start_idx,end_idx):
    imname = imnames[u]
    cv2.imwrite(f'spremljene_slike/ime_{imname}.png', imname)
Can someone help? I can't figure it out.
I have a script which generates images with rendered text and saves them in an .h5 file. From there I want to save these pictures, with their corresponding names, in a different folder.

I don't see how this works at all. On line 1 you define db=h5py.File(), then on line 4 you redefine it as db=get_data(). What is get_data()?
It's hard to write code without the schema. The answer below is my best guess, assuming your images are datasets in db['data'] and you want to use the dataset names (aka keys) as the image names.
import cv2
import h5py

with h5py.File('results/Results.h5', 'r') as db:
    dsets = sorted(db['data'].keys())
    for imname in dsets:
        img_arr = db['data'][imname][()]
        slika = cv2.cvtColor(img_arr, cv2.COLOR_BGR2RGB)
        cv2.imwrite(f'spremljene_slike/ime_{imname}.png', slika)
That should be all you need to do. You will get 1 .png for each dataset named ime_{imname}.png (where imname is the matching dataset name).
Also, you can eliminate all of the intermediate variables (dsets, img_arr and slika). Compress the code above into a few lines:
with h5py.File('results/Results.h5', 'r') as db:
    for imname in sorted(db['data'].keys()):
        cv2.imwrite(f'spremljene_slike/ime_{imname}.png',
                    cv2.cvtColor(db['data'][imname][()], cv2.COLOR_BGR2RGB))
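A likely explanation for the odd file names in the first attempt is that the f-string formatted the whole `imnames` list rather than a single name. The small sketch below shows the difference; the dataset names in it are made up for illustration.

```python
imnames = ["img_0", "img_1", "img_2"]  # hypothetical dataset names

# Formatting the whole list puts every name into a single file name.
wrong = f"spremljene_slike/ime_{imnames}.png"

# Formatting one element per iteration gives one file name per image.
right = [f"spremljene_slike/ime_{name}.png" for name in imnames]

print(wrong)
print(right)
```

The second attempt in the question has a different problem: it passes the name string `imname` to `cv2.imwrite` as the image, where the pixel array is expected.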

Undersampling with image data in python

The main idea of undersampling is to randomly delete observations from the classes that have plenty of them, so that the classes in the data become more balanced.
So, how do I undersample image data in Python? Please help me :(
I took the fundus image data from Kaggle. There are 35126 images in 5 classes:
class 0: 25810 images
class 1: 2443 images
class 2: 5292 images
class 3: 873 images
class 4: 708 images
I want every class to have 708 images, matching class 4. How do I delete the rest of the images in Python?

I know it is an old question but for the sake of people looking for the answer, this code works perfectly:
import os
import random

path = r'C:/The_Path'  # You can provide the path here
n = 2500  # Number of random images to be removed
img_names = os.listdir(path)  # Get image names in folder
img_names = random.sample(img_names, n)  # Pick n random images
for image in img_names:  # Go over each image name to be deleted
    f = os.path.join(path, image)  # Create valid path to image
    os.remove(f)  # Remove the image
As your question states, you want every class to match class 4, i.e., 708 images. Simply compute the difference for each class and use it as n. For example, class 3 has 873 images, so 873 - 708 = 165 images must be removed and n = 165. Furthermore, you can turn this into a function to generalise it.
The code is adapted from the answer by https://stackoverflow.com/users/10512332/vikrant-sharma to the question "How can i delete multiple images from multiple folders using python".
Thank you!
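The suggestion to turn the snippet into a function could look roughly like the sketch below. It assumes each class lives in its own folder; `undersample_folder` and its parameters are hypothetical names, not part of the original answer. Deleting files is irreversible, so it is worth trying it on a copy of the data first.

```python
import os
import random


def undersample_folder(path, target, seed=None):
    """Randomly delete files in `path` until only `target` remain.

    Returns the names of the deleted files.
    """
    names = sorted(
        n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n))
    )
    surplus = len(names) - target
    if surplus <= 0:
        return []  # already at or below the target size
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    doomed = rng.sample(names, surplus)
    for name in doomed:
        os.remove(os.path.join(path, name))
    return doomed
```

Calling `undersample_folder(class_3_folder, 708)` on a folder with 873 images would then remove 165 of them, where `class_3_folder` stands for wherever that class is stored.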

How to create variables for Facial_Recognition from database

I'm trying to pull names and image file names from a database and feed them into a face_recognition Python program. However, the code I'm using learns each face through a separately named variable.
How can I create variables based on the amount of data that I have in the database?
What would be a better approach to this problem?
first_image = face_recognition.load_image_file("first.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
second_image = face_recognition.load_image_file("second.jpg")
second_face_encoding = face_recognition.face_encodings(second_image)[0]

You can use lists instead of storing each image/encoding in an individual variable, and fill the lists from a for loop.
Assuming you can change the filenames from first.jpg, second.jpg... to 1.jpg, 2.jpg... you can do this:
numberofimages = 10 # change this to the total number of images
images = [None] * (numberofimages+1) # create an array to store all the images
encodings = [None] * (numberofimages+1) # create an array to store all the encodings
for i in range(1, numberofimages+1):
    filename = str(i) + ".jpg" # generate image file name (eg. 1.jpg, 2.jpg...)
    # load the image and store it in the array
    images[i] = face_recognition.load_image_file(filename)
    # store the encoding
    encodings[i] = face_recognition.face_encodings(images[i])[0]
You can then access, e.g., the 3rd image and 3rd encoding like this:
images[3]
encodings[3]
If changing image file names is not an option, you can store them in a dictionary and do this:
numberofimages = 3 # change this to the total number of images
images = [None] * (numberofimages+1) # create an array to store all the images
encodings = [None] * (numberofimages+1) # create an array to store all the encodings
filenames = {
    1: "first",
    2: "second",
    3: "third"
}
for i in range(1, numberofimages+1):
    filename = filenames[i] + ".jpg" # generate file name (eg. first.jpg, second.jpg...)
    print(filename)
    # load the image and store it in the array
    images[i] = face_recognition.load_image_file(filename)
    # store the encoding
    encodings[i] = face_recognition.face_encodings(images[i])[0]
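Since the question starts from database rows, another option is a dictionary keyed by the person's name. This is a minimal sketch, assuming the query yields (name, filename) pairs; `build_known_faces` and `encode_with_face_recognition` are hypothetical helper names. It also skips images where `face_encodings` returns an empty list, which would otherwise make the `[0]` lookup fail.

```python
def build_known_faces(rows, encode):
    """Map each name to an encoding, skipping images with no usable face.

    rows   -- iterable of (name, filename) pairs, e.g. from a database query
    encode -- callable returning a list of encodings for a filename
    """
    known = {}
    for name, filename in rows:
        found = encode(filename)
        if found:  # an empty list means no face was detected
            known[name] = found[0]
    return known


def encode_with_face_recognition(filename):
    """Encode one file with the face_recognition package (imported lazily)."""
    import face_recognition

    image = face_recognition.load_image_file(filename)
    return face_recognition.face_encodings(image)
```

With rows fetched from the database, `build_known_faces(rows, encode_with_face_recognition)` would return one encoding per name, however many rows there are.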

Modifying a large number of DICOM (.dcm) files.

Bit of a simple question perhaps, but I am not making progress and would appreciate help.
I have a list of size 422. Index 0 holds 135 file paths to .dcm images, for example '~/images/0001.dcm', '~/images/0135.dcm'. Index 1 holds 112 image paths, index 2 has 110, etc.
All images are of size 512 x 512. I am looking to resize them to 64 x 64.
This is my first time working with both image and .dcm data so I'm very unsure about how to resize. I am also unsure how to access and modify the files within the 'inner' list, if you will.
Is something like this way off the mark?
IMG_PX_SIZE = 64
result = []
for i in test_list:
    result_inner_list = []
    for image in i:
        # resize all images at index position i and store in list
        new_img = cv2.resize(np.array(image.pixel_array), (IMG_PX_SIZE, IMG_PX_SIZE))
        result_inner_list.append(new_img)
    # Once all images at index position i are modified, append them to a master list.
    result.append(result_inner_list)

You seem to be struggling with two issues:
accessing the file paths
resizing the images
It is easier to solve them if you separate the two tasks. Sample code below:
IMG_PX_SIZE = 64

def resize(image):
    # your resize code here, similar to:
    # return cv2.resize(np.array(image.pixel_array), (IMG_PX_SIZE, IMG_PX_SIZE))
    pass

def read(path):
    # your file read operation here
    pass

big_list = [['~/images/0001.dcm','~/images/0135.dcm'],
            ['~/images/0002.dcm','~/images/0136.dcm']]
resized_images = [[resize(read(path)) for path in paths] for paths in big_list]
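The two placeholders could be filled in roughly as follows. This is a sketch, assuming the pydicom and opencv-python packages are installed; `resize_all` is a hypothetical helper name, and `cv2.INTER_AREA` is one reasonable interpolation choice for shrinking, not something the answer specifies. The paths are passed through `os.path.expanduser` because a leading `~` is not expanded automatically.

```python
import os

IMG_PX_SIZE = 64


def read(path):
    """Read one DICOM file and return its pixel data as a NumPy array."""
    import pydicom  # imported lazily; assumed to be installed

    return pydicom.dcmread(os.path.expanduser(path)).pixel_array


def resize(pixels):
    """Resize a 2-D pixel array to IMG_PX_SIZE x IMG_PX_SIZE."""
    import cv2  # imported lazily; assumed to be installed

    return cv2.resize(pixels, (IMG_PX_SIZE, IMG_PX_SIZE),
                      interpolation=cv2.INTER_AREA)


def resize_all(big_list, read=read, resize=resize):
    """Apply read and resize to every path, keeping the nested structure."""
    return [[resize(read(path)) for path in paths] for paths in big_list]
```

The result keeps the shape of the input: `resize_all(big_list)[0]` holds the resized images for the paths at index 0.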

How to loop through one element of a zip() function twice - Python

So here's my dilemma... I'm writing a script that reads all .png files from a folder and then converts them to a number of different dimensions which I have specified in a list. Everything works as it should, except that it quits after handling one image.
Here is my code:
sizeFormats = ["1024x1024", "114x114", "40x40", "58x58", "60x60", "640x1136", "640x960"]
def resizeImages():
    widthList = []
    heightList = []
    resizedHeight = 0
    resizedWidth = 0
    #targetPath is the path to the folder that contains the images
    folderToResizeContents = os.listdir(targetPath)
    #This splits the dimensions into 2 separate lists for height and width (ex: 640x960 adds
    #640 to widthList and 960 to heightList)
    for index in sizeFormats:
        widthList.append(index.split("x")[0])
        heightList.append(index.split("x")[1])
    #for every image in the folder, apply the dimensions from the populated lists and save
    for image,w,h in zip(folderToResizeContents,widthList,heightList):
        resizedWidth = int(w)
        resizedHeight = int(h)
        sourceFilePath = os.path.join(targetPath,image)
        imageFileToConvert = Image.open(sourceFilePath)
        outputFile = imageFileToConvert.resize((resizedWidth,resizedHeight), Image.ANTIALIAS)
        outputFile.save(sourceFilePath)
If the target folder contains 2 images called image1.png and image2.png, the following is produced (for the sake of visualization I'll add the dimensions that get applied to the image after an underscore):
image1_1024x1024.png,
..............,
image1_640x960.png (all 7 different dimensions are produced for image1 fine)
It stops there, when I need it to apply the same transformations to image2. I know this is because widthList and heightList are only 7 elements long, so the loop exits before image2 gets its turn. Is there any way I can loop through widthList and heightList for every image in targetPath?

Why not keep it simple:
for image in folderToResizeContents:
    for fmt in sizeFormats:
        (w,h) = fmt.split('x')
N.B. You are overwriting the files you produce, because you never change the name of the output path.

Nest your for loops and you can apply all 7 dimensions to each image:
for image in folderToResizeContents:
    for w,h in zip(widthList,heightList):
        # resize the image to (w, h) and save it here
The first for loop ensures it happens for each image, whereas the second for loop ensures that the image is resized to each size.

You need to re-iterate through sizeFormats for every file. zip doesn't do this unless you get even trickier with cyclic iterators for height and width.
Sometimes tools such as zip make for longer, more complicated code when a couple of nested for loops work fine. I think it's more straightforward than splitting into multiple lists and then zipping them back together again.
import os
from PIL import Image

sizeFormats = ["1024x1024", "114x114", "40x40", "58x58", "60x60", "640x1136", "640x960"]
sizeTuples = [(int(w), int(h)) for w,h in map(lambda wh: wh.split('x'), sizeFormats)]

def resizeImages():
    #for every image in the folder, apply every size and save under a new name
    for image in os.listdir(targetPath):
        sourceFilePath = os.path.join(targetPath,image)
        imageFileToConvert = Image.open(sourceFilePath)
        stem, ext = os.path.splitext(sourceFilePath)
        for resizedWidth, resizedHeight in sizeTuples:
            #Image.LANCZOS replaces Image.ANTIALIAS, which was removed in Pillow 10
            outputFile = imageFileToConvert.resize((resizedWidth,resizedHeight), Image.LANCZOS)
            #include the size in the name so the source file is not overwritten
            outputFile.save(f'{stem}_{resizedWidth}x{resizedHeight}{ext}')
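If a single loop is preferred over two nested ones, `itertools.product` pairs every image with every size, which is the behaviour `zip` was expected to give. The sketch below only plans the output names and does not touch any files; `plan_outputs` and the shortened size list are illustrative assumptions.

```python
import os
from itertools import product

size_formats = ["1024x1024", "114x114", "40x40"]  # shortened list for the example
size_tuples = [tuple(int(n) for n in fmt.split("x")) for fmt in size_formats]


def plan_outputs(image_names, sizes):
    """Return (source name, (width, height), output name) for every pairing."""
    jobs = []
    for name, (w, h) in product(image_names, sizes):
        stem, ext = os.path.splitext(name)
        jobs.append((name, (w, h), f"{stem}_{w}x{h}{ext}"))
    return jobs


for job in plan_outputs(["image1.png", "image2.png"], size_tuples):
    print(job)
```

Each planned job can then be handed to the resize-and-save step, so the loop body stays the same as in the nested version.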
