Modifying a large number of DICOM (.dcm) files.

Modifying a large number of DICOM (.dcm) files. - python

bit of a simple question perhaps but I am not making progress and would appreciate help.
I have a list of size 422. Within index 0 there are 135 file paths to .dcm images. For example '~/images/0001.dcm','~/images/0135.dcm' Within index 1, there are 112 image paths, index 2 has 110, etc.
All images are of size 512 x 512. I am looking to re-size them to be 64 x 64.
This is my first time working with both image and .dcm data so I'm very unsure about how to resize. I am also unsure how to access and modify the files within the 'inner' list, if you will.
Is something like this way off the mark?
IMG_PX_SIZE = 64
result = []
for i in test_list:
result_inner_list = []
for image in i:
# resize all images at index position i and store in list
new_img = cv2.resize(np.array(image.pixel_array (IMG_PX_SIZE,IMG_PX_SIZE))
result_inner_list.append(new_img)
# Once all images at index point i are modified, append them these to a master list.
result.append(result_inner_list)

You seem to be struggling with two issues:
accessing the file paths
resize
For you to win, better separate these two tasks, sample code below
IMG_PX_SIZE = 64
def resize(image):
# your resize code here similar to:
# return v2.resize(np.array(image.pixel_array(IMG_PX_SIZE,IMG_PX_SIZE))
pass
def read(path):
# your file read operation here
pass
big_list = [['~/images/0001.dcm','~/images/0135.dcm'],
['~/images/0002.dcm','~/images/0136.dcm']]
resized_images = [[resize(read(path)) for path in paths] for paths in big_list]

Related

Why isn't Python writing elements from one 3-D list to another?

I'm trying to create a program that creates an average out of an image by looping through each RGBA value in two images to average them out and create one composite image, but the right value isn't being written to my list comp_img, which contains all the new image data.
I'm using these 256x256 images for debugging.
But it just creates this as output:
While this is the composite color, the 64 gets wiped out entirely. Help is very much appreciated, thank you.
from PIL import Image
import numpy as np
from math import ceil
from time import sleep
red = Image.open("64red.png")
grn = Image.open("64green.png")
comp_img = []
temp = [0,0,0,0] #temp list used for appending
#temp is a blank pixel template
for i in range(red.width):
comp_img.append(temp)
temp = comp_img
#temp should now be a row template composed of pixel templates
#2d to 3d array code go here
comp_img = []
for i in range(red.height):
comp_img.append(temp)
reddata = np.asarray(red)
grndata = np.asarray(grn)
reddata = reddata.tolist() #its uncanny how easy it is
grndata = grndata.tolist()
for row, elm in enumerate(reddata):
for pxl, subelm in enumerate(elm):
for vlu_index, vlu in enumerate(subelm):
comp_img[row][pxl][vlu_index] = ceil((reddata[row][pxl][vlu_index] + grndata[row][pxl][vlu_index])/2)
#These print statements dramatically slowdown the otherwise remarkably quick program, and are solely for debugging/demonstration.
output = np.array(comp_img, dtype=np.uint8) #the ostensible troublemaker
outputImg = Image.fromarray(output)
outputImg.save("output.png")

You could simply do
comp_img = np.ceil((reddata + grndata) / 2)
This gives me
To get correct values it needs to work with 16bit values - because for uint8 it works only with values 0..255 and 255+255 gives 254 instead of 510 (it calculates it modulo 256 and (255+255) % 256 gives 254)
reddata = np.asarray(red, dtype=np.uint16)
grndata = np.asarray(grn, dtype=np.uint16)
and then it gives
from PIL import Image
import numpy as np
red = Image.open("64red.png")
grn = Image.open("64green.png")
reddata = np.asarray(red, dtype=np.uint16)
grndata = np.asarray(grn, dtype=np.uint16)
#print(reddata[128,128])
#print(grndata[128,128])
#comp_img = (reddata + grndata) // 2
comp_img = np.ceil((reddata + grndata) / 2)
#print(comp_img[128,128])
output = np.array(comp_img, dtype=np.uint8)
outputImg = Image.fromarray(output)
outputImg.save("output.png")

You really should work with just NumPy arrays and functions, but I'll explain the bug here. It's rooted in how you make the nested list. Let's look at the first level:
temp = [0,0,0,0] #temp list used for appending
#temp is a blank pixel template
for i in range(red.width):
comp_img.append(temp)
At this stage, comp_img is a list with size red.width whose every element/pixel references the same RGBA-list [0,0,0,0]. I don't just mean the values are the same, it's one RGBA-list object. When you edit the values of that one RGBA-list, you edit all the pixels' colors at once.
Just fixing that step isn't enough. You also make the same error in the next step of expanding comp_img to a 2D matrix of RGBA-lists; every row is the same list.
If you really want to make a blank comp_img first, you should just make a NumPy array of a numerical scalar dtype; there you can guarantee every element is independent:
comp_img = np.zeros((red.height, red.width, 4), dtype = np.uint8)
If you really want a nested list, you have to properly instantiate (make new) lists at every level for them to be independent. A list comprehension is easy to write:
comp_img = [[[0,0,0,0] for i in range(red.width)] for j in range(red.height)]

Undersampling with image data in python

main idea of undersampling is randomly delete the class which has sufficient observations so that the comparative ratio of two classes is significant in our data.
So, how to undersampling with image data in python? please help me:(
I took the fundus image data from Kaggle. there are 35127 images with 5 classes.
class 0: 25810 data,
class 1: 2443 data,
class 2: 5292 data,
class 3: 873 data,
class 4: 708 data,
I want each class to have as much as 708 images following the 4th class. How do I delete the rest of the images in Python?

I know it is an old question but for the sake of people looking for the answer, this code works perfectly:
path = r'C:/The_Path'# You can provide the path here
n = 2500 # Number of random images to be removed
img_names = os.listdir(path) # Get image names in folder
img_names = random.sample(img_names, n) # Pick 2500 random images
for image in img_names: # Go over each image name to be deleted
f = os.path.join(path, image) # Create valid path to image
os.remove(f) # Remove the image
As your question states, you want all classes to be equal to class 4, i.e., 708 images. Simply find out the difference and replace n, for example, the difference between the number of class 3 images and 708 images are 165 images and so n = 165. Furthermore, you can make this into a function to generalise it more.
The code has been taken from, but edited:
How can i delete multiple images from multiple folders using python
https://stackoverflow.com/users/10512332/vikrant-sharma answered the question.
Thank you!

A dynamic list/array which contains matrices and vectors

so I have the following setup. I have a folder with two different file types. One file type (.m) are files of matrices and the other file type are the corresponding vectors (.txt). They have the same name i.e. matrix1.m and matrix1.txt
I want to create a list or an array such that I can save each matrix and each corresponding vector. I thought about a panda data frame, but I do not know if this is useful here.
Is there a way to do this?
for file1 in os.listdir(path):
file1name = os.fsdecode(file1)
if file1.endswith('.m'):
file2 = file1name.replace('.m', '.txt')
file2name = os.fsdecode(file2)
matrix = np.matrix(scipy.io.mmread(os.path.join(path, file1name)).toarray())
vector = np.loadtxt(os.path.join(path, file2name)
continue

This is how I would approach something like this given-
I need to do and store some result for each pair of files
Each pair is independent
The reason the code above isn't enough is because you wanted to try and speed things up.
Don't know if this will help, but it is one approach.
pool = ThreadPool()
op_names = {name.split('.')[0] for name in os.listdir(path)}
def operation(name):
matrix_filename = f'name.m'
vector_filename = f'name.txt'
return name, do_op(matrix_filename, vector_filename)
results = pool.map(operation, op_names)

Why os.remove() or shutil.move() can only move part of files

I want to randomly choose 10 images from the training dataset to be the test data. If I only copy the selected data to the destination path, it works. But if I want to remove the source data, it can only remove some of them. I tried both os.remove() and shutil.move() function, but the issue remains. The below is my script:
for label in labels:
training_data_path_ch1 = os.path.join(training_data_folder, label, 'ch1')
test_data_path_ch1 = os.path.join(test_data_folder, label, 'ch1')
training_data_path_ch5 = os.path.join(training_data_folder, label, 'ch5')
test_data_path_ch5 = os.path.join(test_data_folder, label, 'ch5')
ch1_imgs = listdir(training_data_path_ch1)
# Randomly select 10 images
ch1_mask = np.random.choice(len(ch1_imgs), 10)
ch1_selected_imgs = [ch1_imgs[i] for i in ch1_mask]
for selected_img in ch1_selected_imgs:
ch1_img_path = os.path.join(training_data_path_ch1, selected_img)
shutil.copy2(ch1_img_path, test_data_path_ch1)
os.remove(ch1_img_path)
print('Successfully move ' + label + ' ch1 images')
And I add an image to show the running status.
You can see, the program indeed can copy the images and remove some of the images, but why it cannot remove all images?
Any ideas? I appreciate any helps!

In:
ch1_mask = np.random.choice(len(ch1_imgs), 10)
You're potentially getting the same index returned more than once which means you're then trying to copy a file you've already copied and deleted (so you can't copy it again as it's removed), instead pass replace=False, eg:
ch1_mask = np.random.choice(len(ch1_imgs), 10, replace=False)

How to loop through one element of a zip() function twice - Python

So here's my dilema... I'm writing a script that reads all .png files from a folder and then converts them to a number of different dimensions which I have specified in a list. Everything works as it should except it quits after handling one image.
Here is my code:
sizeFormats = ["1024x1024", "114x114", "40x40", "58x58", "60x60", "640x1136", "640x960"]
def resizeImages():
widthList = []
heightList = []
resizedHeight = 0
resizedWidth = 0
#targetPath is the path to the folder that contains the images
folderToResizeContents = os.listdir(targetPath)
#This splits the dimensions into 2 separate lists for height and width (ex: 640x960 adds
#640 to widthList and 960 to heightList
for index in sizeFormats:
widthList.append(index.split("x")[0])
heightList.append(index.split("x")[1])
#for every image in the folder, apply the dimensions from the populated lists and save
for image,w,h in zip(folderToResizeContents,widthList,heightList):
resizedWidth = int(w)
resizedHeight = int(h)
sourceFilePath = os.path.join(targetPath,image)
imageFileToConvert = Image.open(sourceFilePath)
outputFile = imageFileToConvert.resize((resizedWidth,resizedHeight), Image.ANTIALIAS)
outputFile.save(sourceFilePath)
The following will be returned if the target folder contains 2 images called image1.png,image2.png (for sake of visualization I'll add the dimensions that get applied to the image after an underscore):
image1_1024x1024.png,
..............,
image1_640x690.png (Returns all 7 different dimensions for image1 fine)
it stops there when I need it to apply the same transformations to image_2. I know this is because the length of widthList and heightList are only 7 elements long and so exits the loop before image2 gets its turn. Is there any way I can go about looping through widthList and heightList for every image in the targetPath?

Why not keep it simple:
for image in folderToResizeContents:
for fmt in sizeFormats:
(w,h) = fmt.split('x')
N.B. You are overwriting the files produced as you are not changing the name of the outpath.

Nest your for loops and you can apply all 7 dimensions to each image
for image in folderToResizeContents:
for w,h in zip(widthList,heightList):
the first for loop will ensure it happens for each image, whereas the second for loop will ensure that the image is resized to each size

You need to re-iterate through the sizeFormats for every file. Zip doesn't do this unless you get even trickier with cyclic iterators for height and width.
Sometimes tools such as zip make for longer more complicated code when a couple of nested for loops work fine. I think its more straight forward than splitting into multiple lists and then zipping them back together again.
sizeFormats = ["1024x1024", "114x114", "40x40", "58x58", "60x60", "640x1136", "640x960"]
sizeTuples = [(int(w), int(h)) for w,h in map(lambda wh: wh.split('x'), sizeFormats)]
def resizeImages():
#for every image in the folder, apply the dimensions from the populated lists and save
for image in os.listdir(targetPath):
for resizedWidth, resizedHeight in sizeTuples:
sourceFilePath = os.path.join(targetPath,image)
imageFileToConvert = Image.open(sourceFilePath)
outputFile = imageFileToConvert.resize((resizedWidth,resizedHeight), Image.ANTIALIAS)
outputFile.save(sourceFilePath)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Modifying a large number of DICOM (.dcm) files. - python

Related

Why isn't Python writing elements from one 3-D list to another?

Undersampling with image data in python

A dynamic list/array which contains matrices and vectors

Why os.remove() or shutil.move() can only move part of files

How to loop through one element of a zip() function twice - Python

Categories

Resources