I am a new Python user and would like to do some simple image processing. Essentially I will have a dynamic medical image - a series of 2D images at different time points which I would like to store as a 3D array. Due to the nature of the scanning technique there is likely to be occasional patient motion during certain imaging frames which makes the data unusable. I would like to delete such frames and recast the array - new dimensions (n-1, 256, 256). After deletion of the frame I would like to update the image display. What is the best way to achieve this goal? Here is the skeleton code I have so far:
import dicom
import numpy as np
import pylab
from matplotlib.widgets import Slider, Button
ds = dicom.read_file("/home/moadeep/Dropbox/FS1.dcm")
#data = ds.pixel_array
data = np.random.rand(16,256,256)
nframes = data.shape[0]
ax = pylab.subplot(111)
pylab.subplots_adjust(left=0.25, bottom=0.25)
frame = 0
l = pylab.imshow(data[frame,:,:]) # shows the 256x256 image of the 0th frame
axcolor = 'lightgoldenrodyellow'
axframe = pylab.axes([0.35, 0.1, 0.5, 0.03], axisbg=axcolor)
#add slider to scroll image frames
sframe = Slider(axframe, 'Frame', 0, nframes, valinit=0,valfmt='%1d'+'/'+str(nframes))
ax_delete = pylab.axes([0.8,0.025,0.1,0.04], axisbg=axcolor)
#Delete button to delete frame from data set
bDelete = Button(ax_delete, 'Delete')
def update(val):
    # the slider value is a float, so cast it before using it as an index
    frame = int(np.around(sframe.val))
    pylab.subplot(111)
    pylab.subplots_adjust(left=0.25, bottom=0.25)
    pylab.imshow(data[frame,:,:])
sframe.on_changed(update)
pylab.gray()
pylab.show()
The short answer to your question is use numpy.delete. E.g.
import numpy as np
data = np.arange(1000).reshape((10,10,10))
# Delete the third slice along the first axis
# (note that you can delete multiple slices at once)
data = np.delete(data, [2], axis=0)
print(data.shape)
However, this is a poor approach if you're going to be removing individual slices many times.
The longer answer is to avoid doing this each time you want to delete a slice.
Numpy arrays have to be contiguous in memory. Therefore, this will make a new copy (and delete the old) each time. This will be relatively slow, and requires you to have twice the free memory space required to store the array.
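In your case, why not store a Python list of 2D arrays? That way you can pop the slices you don't want without any problems. If you need it as a 3D array afterwards, just use numpy.stack (or numpy.array) on the list to rebuild an (n, 256, 256) array. Note that numpy.dstack would instead put the frames along the last axis.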
Of course, if you need to do 3D processing, you'll need the 3D array. Therefore, another approach would be to store a list of "bad" indices and remove them at the end using numpy.delete (note that the argument naming the items to delete can be a list, so you can just pass in your list of "bad" indices).
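As a rough sketch of the two approaches just described (the random data and the particular "bad" frame numbers are placeholders, not values from the question):

```python
import numpy as np

data = np.random.rand(16, 256, 256)

# Option 1: keep the frames in a Python list and pop the unusable ones
frames = list(data)                  # 16 separate (256, 256) slices
frames.pop(3)                        # drop frame 3
stacked = np.stack(frames, axis=0)   # rebuild an (n-1, 256, 256) array

# Option 2: collect the bad indices and delete them in a single call at the end
bad = [3, 7, 11]                     # hypothetical indices flagged while browsing
cleaned = np.delete(data, bad, axis=0)

print(stacked.shape, cleaned.shape)
```

Either way the large array is only copied once, when the final 3D array is built, rather than once per deleted frame.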
On a side note, the way you're updating the image will be very slow.
You're creating lots of images, so each one will be redrawn each time and the update will become very slow as you go on.
You're better off setting the data of the image (im.set_data(next_slice)) instead of creating a new image each time.
Better yet, use blitting, but with image data in matplotlib, it's not as advantageous as it is for other types of plots due to matplotlib's slow-ish rescaling of images.
As a quick example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
def main():
    # Set up 3D coordinates from -10 to 10 over a 200x100x100 "open" grid
    x, y, z = np.ogrid[-10:10:200j, -10:10:100j, -10:10:100j]
    # Generate a cube of interesting data
    data = np.sin(x*y*z) / (x*y*z)
    # Visualize it
    viewer = VolumeViewer(data)
    viewer.show()

class VolumeViewer(object):
    def __init__(self, data):
        self.data = data
        self.nframes = self.data.shape[0]
        # Setup the axes.
        self.fig, self.ax = plt.subplots()
        self.slider_ax = self.fig.add_axes([0.2, 0.03, 0.65, 0.03])
        # Make the slider
        self.slider = Slider(self.slider_ax, 'Frame', 1, self.nframes,
                             valinit=1, valfmt='%1d/{}'.format(self.nframes))
        self.slider.on_changed(self.update)
        # Plot the first slice of the image
        self.im = self.ax.imshow(data[0,:,:])

    def update(self, value):
        frame = int(np.round(value - 1))
        # Update the image data
        dat = self.data[frame,:,:]
        self.im.set_data(dat)
        # Reset the image scaling bounds (this may not be necessary for you)
        self.im.set_clim([dat.min(), dat.max()])
        # Redraw the plot
        self.fig.canvas.draw()

    def show(self):
        plt.show()

if __name__ == '__main__':
    main()
I am somewhat of a beginner when it comes to using Matplotlib and Python in general. I am trying to create a class that can generate multiple subplots given basic information such as how many rows and columns the user wants. I am running into an issue when trying to convert the multidimensional array that comes from matplotlib.pyplot.subplots(). Here is a copy of what I am working with currently:
import math
import numpy as np
import matplotlib.pyplot as plt
class subplotgrid:
    def __init__(self, rows, cols, width, height): # class initialization
        self.rows = rows
        self.cols = cols
        self.width = width
        self.height = height
        self.pltarray = [] # refers to the subplot number, values are 0 - (rows x cols)
    def create(self, h_spacing = 0.2, w_spacing =0.1): # generate the subplots,
        # h_spcing and v_spacing are optional params with default values
        fig, pltarray = plt.subplots(self.rows, self.cols, figsize=(rows*self.width, cols*self.height)) # Creates subplots, rows x cols, total figure size
        fig.subplots_adjust(hspace = .2, wspace=.1) # Specifies spaces between each subplot
        self.pltarray = self.pltarray.reshape(-1) # axes within plt.subplot() are multidimensional arrays that can't be iterated over, so we have to flatten it with .ravel()
    def show_scatter(self, x_array, y_array, dotSize = 0.5, title = "DEFAULT TITLE"):
        for i in range(0, rows+cols-1): # loops through all of the subplots
            self.pltarray[i].scatter(x[:,0],x[:,1], s=dotSize, c='red') # Scatter plot all points red
            self.pltarray[i].scatter(inside[:,0],inside[:,1],s=dotSize, c='blue') # Scatter plot points inside blue
            if (isinstance(title, np.ndarray)): # check if we are passing an array of titles, test passes True if we are
                pltarray[i].set_title(title[i]) # set the title to the indexed title for the title array
            else:
                axs[i].set_title(title) # set the title to a static value if no title array was found
        plt.show()
however the line self.pltarray = self.reshape(-1) throws the error:
AttributeError: 'subplotgrid' object has no attribute 'reshape'
with similar issues for .ravel() and .flatten() respectively. Why does this error show up and how does one go about fixing it?
You are calling reshape on your own class instance (self.reshape(-1)), which has no reshape method implemented. You need to reshape the axes array returned by plt.subplots() instead:
self.pltarray = pltarray.reshape(-1)
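A minimal standalone sketch of the same idea, outside the class (the 2x3 grid and the titles are arbitrary choices for illustration, and the Agg backend is selected only so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless use
import matplotlib.pyplot as plt

rows, cols = 2, 3
fig, axes = plt.subplots(rows, cols, figsize=(cols * 3, rows * 2))

# For a grid with more than one row and column, plt.subplots returns a
# 2D NumPy array of Axes. Flatten that array, not the object that owns it.
flat_axes = axes.reshape(-1)

for i, ax in enumerate(flat_axes):
    ax.set_title("subplot {}".format(i))

print(axes.shape, flat_axes.shape)
```

Inside the class, the equivalent is to keep the local array returned by plt.subplots() and store its flattened form on self.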
I am applying the sliding window technique to an image and extracting the mean pixel values of each window. The results look something like this: [[[[215.015625][123.55036272][111.66057478]]]]. How can I save these values for every window to a text file or a CSV file, so that I can later compare them for similarity? Whatever I try, I get the same error: that the array is 4D and not 1D or 2D. I would really appreciate any help. Thank you in advance.
import cv2
import matplotlib.pyplot as plt
import numpy as np
# read the image and define the stepSize and window size
# (width,height)
image2 = cv2.imread("bird.jpg")# your image path
image = cv2.resize(image2, (224, 224))
tmp = image # for drawing a rectangle
stepSize = 10
(w_width, w_height) = (60, 60 ) # window size
for x in range(0, image.shape[1] - w_width, stepSize):
    for y in range(0, image.shape[0] - w_height, stepSize):
        window = image[x:x + w_width, y:y + w_height, :]
        # classify content of the window with your classifier and
        # determine if the window includes an object (cell) or not
        # draw window on image
        cv2.rectangle(tmp, (x, y), (x + w_width, y + w_height), (255, 0, 0), 2) # draw rectangle on image
        plt.imshow(np.array(tmp).astype('uint8'))
# show all windows
plt.show()
mean_values=[]
mean_val, std_dev = cv2.meanStdDev(image)
mean_val = mean_val[:3]
mean_values.append([mean_val])
mean_values = np.asarray(mean_values)
print(mean_values)
Human Readable Option
Assuming that you want the data to be human readable, saving the data takes a little bit more work. My search showed me that there's this solution for saving 3D data to a text file. However, it's pretty simple to extend this example to 4D for your use case. This code is taken and adapted from that post, thank you Joe Kington and David Cheung.
import numpy as np
data = np.arange(2*3*4*5).reshape((2,3,4,5))
with open('test.csv', 'w') as outfile:
    # We write this header for readability; the pound symbol
    # will cause numpy to ignore it
    outfile.write('# Array shape: {0}\n'.format(data.shape))
    # Iterating through an n-dimensional array produces slices along
    # the first axis. This is equivalent to data[i,:,:,:] in this case.
    # Because we are dealing with 4D data instead of 3D data,
    # we need to add another for loop that's nested inside of the
    # previous one.
    for threeD_data_slice in data:
        for twoD_data_slice in threeD_data_slice:
            # The formatting string indicates that I'm writing out
            # the values in left-justified columns 7 characters in width
            # with 2 decimal places.
            np.savetxt(outfile, twoD_data_slice, fmt='%-7.2f')
            # Writing out a break to indicate different slices...
            outfile.write('# New slice\n')
And then once the data has been saved, all you need to do is load it and reshape it. np.loadtxt() will read the data back in as a 2D array, but reshape() allows us to recover the original structure. Again, this code is adapted from the previous post.
new_data = np.loadtxt('test.csv')
# Note that this returned a 2D array!
print(new_data.shape)
# However, going back to 4D is easy if we know the
# original shape of the array
new_data = new_data.reshape((2,3,4,5))
# Just to check that they're the same...
assert np.all(new_data == data)
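For the per-window means in the question specifically, an alternative that may be simpler is to collapse the array to 2D, with one row per window, before writing it. This is only a sketch: the mean_values array below is a made-up stand-in with shape (5, 1, 3, 1), and the file name and column labels are placeholders.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for the collected results: 5 windows, each holding a
# (1, 3, 1) block of per-channel means, i.e. a 4D array of shape (5, 1, 3, 1)
mean_values = np.arange(15, dtype=float).reshape(5, 1, 3, 1)

# Collapse every axis except the first: one row per window, one column per channel
rows = mean_values.reshape(mean_values.shape[0], -1)

path = os.path.join(tempfile.mkdtemp(), "window_means.csv")
# The header line is written with a leading '#', so np.loadtxt skips it
np.savetxt(path, rows, delimiter=",", fmt="%.6f", header="c0,c1,c2")

loaded = np.loadtxt(path, delimiter=",")
print(rows.shape, loaded.shape)
```

Because each row corresponds to one window, the file can be opened directly in a spreadsheet or compared row by row later.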
Binary Option
Assuming that human readability is not necessary, I would recommend using the built-in *.npy format which is described here. This stores the data in a binary format.
You can save the array by doing np.save('NAME_OF_ARRAY.npy', ARRAY_TO_BE_SAVED) and then load it with SAVED_ARRAY = np.load('NAME_OF_ARRAY.npy').
You can also save several numpy arrays in a single zip file with the np.savez() function, like so: np.savez('MANY_ARRAYS.npz', ARRAY_ONE, ARRAY_TWO). You load the zipped arrays in a similar fashion: SEVERAL_ARRAYS = np.load('MANY_ARRAYS.npz').
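A short round-trip sketch of both binary options follows. The array contents, the temporary directory, and the keyword names first and second are illustrative choices only. Passing arrays to np.savez as keywords gives them readable names in the archive.

```python
import os
import tempfile
import numpy as np

tmp_dir = tempfile.mkdtemp()
a = np.arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5))
b = np.linspace(0.0, 1.0, 7)

# Single array: .npy keeps the dtype and the 4D shape
npy_path = os.path.join(tmp_dir, "a.npy")
np.save(npy_path, a)
a_loaded = np.load(npy_path)

# Several arrays: .npz archive, accessed by name after loading
npz_path = os.path.join(tmp_dir, "many.npz")
np.savez(npz_path, first=a, second=b)
with np.load(npz_path) as archive:
    names = sorted(archive.files)
    b_loaded = archive["second"]

print(a_loaded.shape, names)
```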
Plotting a fairly large point cloud in python using plotly produces a graph with axes (not representative of the data range) and no data points.
The code:
import pandas as pd
import plotly.express as px
import numpy as np
all_res = np.load('fullshelf4_11_2019.npy' )
all_res.shape
(3, 6742382)
np.max(all_res[2])
697.5553566696478
np.min(all_res[2])
-676.311654692491
frm = pd.DataFrame(data=np.transpose(all_res[0:, 0:]),columns=["X", "Y", "Z"])
fig = px.scatter_3d(frm, x='X', y='Y', z='Z')
fig.update_traces(marker=dict(size=4))
fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))
fig.show()
Alternatively you could generate random data and follow the process through
all_res = np.random.rand(3, 6742382)
Which also produces a blank graph with axis scales that are incorrect.
So -- what am I doing wrong, and is there a better way to plot such a moderately large data set?
Thanks for your help!
Try plotting using ipyvolume. It can handle large point cloud datasets.
It seems like that's too much data for WebGL to handle. I managed to plot 100k points, but 1M points already caused Jupyter to crash. However, a 3D scatterplot of 6.7 million points is of questionable value anyway. You probably won't be able to make any sense out of it (except for data boundaries maybe) and it will be super slow to rotate etc.
I would try to think of alternative approaches, depending on what you want to do. Maybe pick a representative subset of points and plot those.
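One simple way to pick such a subset is uniform random sampling without replacement. The sketch below uses random data in place of the real point cloud, and the budget of 100,000 points is an assumption based on the count that plotted successfully above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the (3, N) point cloud loaded in the question
all_res = rng.random((3, 1_000_000))

n_keep = 100_000  # assumed budget; adjust to what the renderer can handle
idx = rng.choice(all_res.shape[1], size=n_keep, replace=False)
subset = all_res[:, idx]

print(subset.shape)
```

The subset can then be transposed into a DataFrame and passed to the plotting call exactly as the full array was. Uniform sampling preserves the overall density pattern but may thin out sparse regions.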
I would suggest using pythreejs for a point cloud. It has very good performance, even for a large number of points.
import pythreejs as p3
import numpy as np
N = 1_000_000
# Positions centered around the origin
positions = np.random.normal(loc=0.0, scale=100.0, size=(N, 3)).astype('float32')
# Create a buffer geometry with random color for each point
geometry = p3.BufferGeometry(
attributes={'position': p3.BufferAttribute(array=positions),
'color': p3.BufferAttribute(
array=np.random.random((N, 3)).astype('float32'))})
# Create a points material
material = p3.PointsMaterial(vertexColors='VertexColors', size=1)
# Combine the geometry and material into a Points object
points = p3.Points(geometry=geometry, material=material)
# Create the scene and the renderer
view_width = 700
view_height = 500
camera = p3.PerspectiveCamera(position=[800.0, 0, 0], aspect=view_width/view_height)
scene = p3.Scene(children=[points, camera], background="#DDDDDD")
controller = p3.OrbitControls(controlling=camera)
renderer = p3.Renderer(camera=camera, scene=scene, controls=[controller],
width=view_width, height=view_height)
renderer
How can I use something like below line to save images on a grid of 4x4 of heterogenous images? Imagine that images are identified by sample[i] and i takes 16 different values.
scipy.misc.imsave(str(img_index) + '.png', sample[1])
Similar to this answer but for 16 different images
https://stackoverflow.com/a/42041135/2414957
I am not biased towards the used method as long as it does the deed. Also, I am interested in saving images rather than showing them using plt.show() as I am using a remote server and dealing with CelebA image dataset which is a giant dataset. I just want to randomly select 16 images from my batch and save the results of DCGAN and see if it makes any sense or if it converges.
Currently, I am saving images like below:
batch_no = random.randint(0, 63)
scipy.misc.imsave('sample_gan_images/iter_%d_epoch_%d_sample_%d.png' %(itr, epoch, batch_no), sample[batch_no])
and here, I have 25 epochs and 2000 iterations and batch size is 64.
Personally, I tend to use matplotlib.pyplot.subplots for these kinds of situations. If your images are really heterogenous it might be a better choice than the image concatenation based approach in the answer you linked to.
import matplotlib.pyplot as plt
from scipy.misc import face  # in recent SciPy releases, use scipy.datasets.face instead
x = 4
y = 4
fig,axarr = plt.subplots(x,y)
ims = [face() for i in range(x*y)]
for ax,im in zip(axarr.ravel(), ims):
    ax.imshow(im)
fig.savefig('faces.png')
My big complaint about subplots is the quantity of whitespace in the resulting figure. As well, for your application you may not want the axes ticks/frames. Here's a wrapper function that deals with those issues:
import matplotlib.pyplot as plt
def savegrid(ims, rows=None, cols=None, fill=True, showax=False):
    # Parenthesize each test: "rows is None != cols is None" is a chained
    # comparison and would never be True
    if (rows is None) != (cols is None):
        raise ValueError("Set either both rows and cols or neither.")
    if rows is None:
        rows = len(ims)
        cols = 1
    gridspec_kw = {'wspace': 0, 'hspace': 0} if fill else {}
    fig,axarr = plt.subplots(rows, cols, gridspec_kw=gridspec_kw)
    if fill:
        bleed = 0
        fig.subplots_adjust(left=bleed, bottom=bleed, right=(1 - bleed), top=(1 - bleed))
    for ax,im in zip(axarr.ravel(), ims):
        ax.imshow(im)
        if not showax:
            ax.set_axis_off()
    kwargs = {'pad_inches': .01} if fill else {}
    fig.savefig('faces.png', **kwargs)
Running savegrid(ims, 4, 4) on the same set of images as used earlier yields:
With savegrid, pass the fill=False keyword arg if you want each individual image to take up less space, and pass showax=True if you want to show the axes ticks/frames.
I found this on github, also sharing it:
import numpy as np
import matplotlib.pyplot as plt
def merge_images(image_batch, size):
    # image_batch has shape (N, h, w, c); size is the (rows, cols) of the grid
    h,w = image_batch.shape[1], image_batch.shape[2]
    c = image_batch.shape[3]
    img = np.zeros((int(h*size[0]), w*size[1], c))
    for idx, im in enumerate(image_batch):
        i = idx % size[1]   # column of this image in the grid
        j = idx // size[1]  # row of this image in the grid
        img[j*h:j*h+h, i*w:i*w+w,:] = im
    return img
im_merged = merge_images(sample, [8,8])
plt.imsave('sample_gan_images/im_merged.png', im_merged )
I'm trying to overplot two arrays with different shapes but I'm unable to project one on the top of the other. For example:
#importing the relevant packages
import numpy as np
import matplotlib.pyplot as plt
def overplot(data1,data2):
    '''
    This function should make a contour plot
    of data2 over the data1 plot.
    '''
    #creating the figure
    fig = plt.figure()
    #adding an axe
    ax = fig.add_axes([1,1,1,1])
    #making the plot for the
    #first dataset
    ax.imshow(data1)
    #overplotting the contours
    #for the second dataset
    ax.contour(data2, projection = data2,
               levels = [0.5,0.7])
    #showing the figure
    plt.show(fig)
    return
if __name__ == '__main__':
    '''
    testing zone
    '''
    #creating two mock datasets
    data1 = np.random.rand(3,3)
    data2 = np.random.rand(9,9)
    #using the overplot
    overplot(data1,data2)
Currently, my output is something like:
While what I actually would like is to project the contours of the second dataset into the first one. This way, if I got images of the same object but with different resolution for the cameras I would be able to do such plots. How can I do that?
Thanks for your time and attention.
It's generally best to make the data match, and then plot it. This way you have complete control over how things are done.
In the simple example you give, you could use repeat along each axis to expand the 3x3 data to match the 9x9 data. That is, you could use, data1b = np.repeat(np.repeat(data1, 3, axis=1), 3, axis=0) to give:
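A small runnable sketch of the repeat-based approach, using mock data like that in the question. The Agg backend is chosen only so the sketch runs without a display, and the contour levels are the ones from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data1 = rng.random((3, 3))   # low-resolution image
data2 = rng.random((9, 9))   # high-resolution data to contour

# Repeat each column 3 times, then each row 3 times: 3x3 -> 9x9
data1b = np.repeat(np.repeat(data1, 3, axis=1), 3, axis=0)

fig, ax = plt.subplots()
ax.imshow(data1b)                       # both arrays now share one pixel grid
ax.contour(data2, levels=[0.5, 0.7])    # contours drawn on top of the image

print(data1b.shape)
```

Each original pixel becomes a 3x3 block of identical values, so the image looks the same but shares its index coordinates with the 9x9 array.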
But for the more interesting case of images, like you mention at the end of your question, the axes probably won't be integer multiples of each other, and you'll be better served by a spline or other type of interpolation. This difference is an example of why it's better to have control over this yourself, since there are many ways to do this type of mapping.
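As one possible way to do the spline interpolation, scipy.ndimage.zoom resamples an array by a per-axis factor. The sketch below is an assumption about how it might be applied here, not the only option. The 10x10 target shape is made up to show a non-integer ratio.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
data1 = rng.random((3, 3))    # low-resolution image
target_shape = (10, 10)       # assumed shape of the high-resolution data

# Per-axis zoom factors; these need not be integers
factors = [t / s for t, s in zip(target_shape, data1.shape)]

# order=3 requests cubic spline interpolation
data1_up = ndimage.zoom(data1, factors, order=3)

print(data1_up.shape)
```

The upsampled array can then be shown with imshow and overlaid with contours of the high-resolution data, as in the integer-multiple case. Cubic splines can overshoot the original value range slightly, so a lower order may be preferable for some images.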