What is the most efficient way to plot a 3D array in Python?
For example:
volume = np.random.rand(512, 512, 512)
where the array items represent the grayscale value of each point.
The following code works, but it is far too slow:
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.gca(projection='3d')
volume = np.random.rand(20, 20, 20)
for x in range(len(volume[:, 0, 0])):
    for y in range(len(volume[0, :, 0])):
        for z in range(len(volume[0, 0, :])):
            ax.scatter(x, y, z, c=tuple([volume[x, y, z], volume[x, y, z], volume[x, y, z], 1]))
plt.show()
For better performance, avoid calling ax.scatter multiple times, if possible.
Instead, pack all the x,y,z coordinates and colors into 1D arrays (or
lists), then call ax.scatter once:
ax.scatter(x, y, z, c=volume.ravel())
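For example, here is a minimal sketch of that approach (using np.indices to build the coordinate arrays is just one choice; anything that produces flat x, y, z in the same order as volume.ravel() will do):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection

volume = np.random.rand(20, 20, 20)

# One (x, y, z) triple per voxel, flattened in the same C order as volume.ravel().
x, y, z = np.indices(volume.shape).reshape(3, -1)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
# A single scatter call; the grayscale values are mapped through a colormap.
ax.scatter(x, y, z, c=volume.ravel(), cmap='gray')
plt.show()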
The problem (in terms of both CPU time and memory) grows as size**3, where size is the side length of the cube.
Moreover, ax.scatter will try to render all size**3 points without regard to
the fact that most of those points are obscured by those on the outer
shell.
It would help to reduce the number of points in volume -- perhaps by
summarizing or resampling/interpolating it in some way -- before rendering it.
We can also reduce the CPU and memory required from O(size**3) to O(size**2)
by only plotting the outer shell:
import functools
import itertools as IT
import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def cartesian_product_broadcasted(*arrays):
    """
    http://stackoverflow.com/a/11146645/190597 (senderle)
    """
    broadcastable = np.ix_(*arrays)
    broadcasted = np.broadcast_arrays(*broadcastable)
    dtype = np.result_type(*arrays)
    rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)
    out = np.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
# #profile # used with `python -m memory_profiler script.py` to measure memory usage
def main():
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1, projection='3d')

    size = 512
    volume = np.random.rand(size, size, size)
    x, y, z = cartesian_product_broadcasted(*[np.arange(size, dtype='int16')]*3).T
    mask = ((x == 0) | (x == size-1)
            | (y == 0) | (y == size-1)
            | (z == 0) | (z == size-1))
    x = x[mask]
    y = y[mask]
    z = z[mask]
    volume = volume.ravel()[mask]

    ax.scatter(x, y, z, c=volume, cmap=plt.get_cmap('Greys'))
    plt.show()

if __name__ == '__main__':
    main()
But note that even when plotting only the outer shell, producing a plot with size=512 still requires around 1.3 GiB of memory. Also beware that even if you have enough total memory, the program will slow down dramatically if a lack of physical RAM forces it to use swap space. If you find yourself in that situation, the only solutions are to find a smarter way to render an acceptable image from fewer points, or to buy more RAM.
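For instance, here is a minimal sketch of rendering fewer points by resampling the volume first (plain strided slicing is assumed here purely for simplicity; something like scipy.ndimage.zoom would give a smoother summary):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection

size = 512
volume = np.random.rand(size, size, size)

# Keep every 16th voxel along each axis: 512**3 points become 32**3 points.
step = 16
small = volume[::step, ::step, ::step]

# Coordinates of the retained voxels, expressed in the original index space.
x, y, z = np.indices(small.shape).reshape(3, -1) * step

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(x, y, z, c=small.ravel(), cmap=plt.get_cmap('Greys'))
plt.show()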
First, a dense grid of 512x512x512 points is way too much data to plot -- not from a technical standpoint, but because you won't be able to see anything useful in the resulting plot. You probably need to extract isosurfaces, look at slices, and so on. If most of the points are invisible, then it's probably fine, but you should pass only the nonzero points to ax.scatter to make it faster (see the sketch just below).
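A minimal sketch of that last suggestion (the threshold used to decide which voxels count as "empty" is just an assumption for illustration):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection

volume = np.random.rand(32, 32, 32)
volume[volume < 0.95] = 0  # pretend most voxels are empty

# Plot only the voxels that are actually nonzero.
x, y, z = np.nonzero(volume)

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(x, y, z, c=volume[x, y, z], cmap='gray')
plt.show()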
That said, here's how you can do it much more quickly. The tricks are to eliminate all Python loops, including ones that would be hidden in libraries like itertools.
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
# Make this bigger to generate a dense grid.
N = 8
# Create some random data.
volume = np.random.rand(N, N, N)
# Create the x, y, and z coordinate arrays. We use
# numpy's broadcasting to do all the hard work for us.
# We could shorten this even more by using np.meshgrid.
x = np.arange(volume.shape[0])[:, None, None]
y = np.arange(volume.shape[1])[None, :, None]
z = np.arange(volume.shape[2])[None, None, :]
x, y, z = np.broadcast_arrays(x, y, z)
# Turn the volumetric data into an RGB array that's
# just grayscale. There might be better ways to make
# ax.scatter happy.
c = np.tile(volume.ravel()[:, None], [1, 3])
# Do the plotting in a single call.
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.scatter(x.ravel(),
           y.ravel(),
           z.ravel(),
           c=c)
A similar solution can be achieved with product from itertools:
import numpy as np
from itertools import product
from matplotlib import pyplot as plt
N = 8
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(projection="3d")
space = np.array([*product(range(N), range(N), range(N))]) # all possible triplets of numbers from 0 to N-1
volume = np.random.rand(N, N, N) # generate random data
ax.scatter(space[:,0], space[:,1], space[:,2], c=space/8, s=volume*300)
Related
I have a sequence of data files, each containing two columns of data (x value and z value). I want to assign each file a unique constant y value in a loop, and then use the x, y, z values to make a contour plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
files = glob.glob('C:\Users\DDT\Desktop\DATA TIANYU\materials\AB2O4\synchronchron\OX1\YbFe1Mn1O4_2cyc_600_meth_ox1-*.xye')

s1 = 1
for file in files:
    t1 = s1/3
    x, z = np.loadtxt(file, skiprows=3, unpack=True, usecols=[0, 1])

    def f(x, y):
        return x*0 + y*0 + z

    l1 = np.size(x)
    y = np.full(l1, t1, dtype=int)
    X, Y = np.meshgrid(x, y)
    Z = f(X, Y)
    plt.contour(X, Y, Z)
    s1 = s1 + 1
    continue

plt.show()
This code runs without errors, but what I get is an empty figure with nothing in it.
What mistake did I make?
It is very hard to guess what you're trying to do, but here is an attempt. It supposes that all the x-arrays are equal, and that the y values really make sense (which is hard to guarantee if the files are read in an unspecified order). To get a useful plot, the data from all the files should be collected before starting to plot.
import glob
import matplotlib.pyplot as plt
import numpy as np
files = glob.glob('........')
zs = []
for file in files:
    x, z = np.loadtxt(file, skiprows=3, unpack=True, usecols=[0, 1])
    zs.append(z)
# without creating a new x, the x from the last file will be used
# x = np.linspace(0, 15, 10)
y = np.linspace(-100, 1000, len(zs))
zs = np.array(zs)
fig, axs = plt.subplots(ncols=2)
axs[0].scatter(np.tile(x, y.size), np.repeat(y, x.size), c=zs)
axs[1].contour(x, y, zs)
plt.show()
With simulated random data, the scatter plot and the contour plot would look like:
I wrote a simple script to plot the log function in Python:
import matplotlib.pyplot as plt
import numpy as np
x = list(range(1, 10000, 1))
y = [-np.log(p/10000) for p in x]
plt.scatter(x, y) # also tried with plt.plot(x, y)
plt.show()
I just want to see how the plot looks.
fn.py:5: RuntimeWarning: divide by zero encountered in log
y = [-np.log(p/10000) for p in x]
I get the above warning, and on top of that I get a blank plot where even the axis ranges are wrong.
Why is there a divide-by-zero warning when I am dividing by a nonzero number?
How can I correctly plot the function?
Although you have tagged python-3.x, it seems that you are using python-2.x where p/10000 will result in 0 for values of p < 10000 because the division operator / performs integer division in python-2.x. If that is the case, you can explicitly use 10000.0 instead of 10000 to avoid that and get a float division.
Using .0 is not needed in Python 3+, where / performs float division by default, so your original code works fine under Python 3.6.5. For Python 2, use the explicit float:
import matplotlib.pyplot as plt
import numpy as np
x = list(range(1, 10000, 1))
y = [-np.log(p/10000.0) for p in x]
plt.scatter(x, y)
plt.show()
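Alternatively, still assuming Python 2, you can keep the integer literals and opt into Python 3's division behaviour -- a minimal sketch:
from __future__ import division  # on Python 2, makes / behave as true (float) division

import matplotlib.pyplot as plt
import numpy as np

x = list(range(1, 10000, 1))
y = [-np.log(p/10000) for p in x]  # p/10000 is now a float even for p < 10000
plt.scatter(x, y)
plt.show()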
On a different note: you can simply use NumPy's arange to generate x, avoid the list completely, and use vectorized operations.
x = np.arange(1, 10000)
y = -np.log(x/10000.0)
Why import numpy and then avoid using it? You could have simply done:
from math import log
import matplotlib.pyplot as plt
x = xrange(1, 10000)
y = [-log(p / 10000.0) for p in x]
plt.scatter(x, y)
plt.show()
If you're going to bring numpy into the picture, think about doing things in a numpy-like fashion:
import matplotlib.pyplot as plt
import numpy as np
f = lambda p: -np.log(p / 10000.0)
x = np.arange(1, 10000)
plt.scatter(x, f(x))
plt.show()
Here is my resulting plot (below), but I would like it to look like the truncated dendrograms in astrodendro, such as this:
There is also a really cool looking dendrogram from this paper that I would like to recreate in matplotlib.
Below is the code for generating an iris data set with noise variables and plotting the dendrogram in matplotlib.
Does anyone know how to either: (1) truncate the branches as in the example figures; and/or (2) use astrodendro with a custom linkage matrix and labels?
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
import astrodendro
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial import distance
def iris_data(noise=None, palette="hls", desat=1):
    # Iris dataset
    X = pd.DataFrame(load_iris().data,
                     index = [*map(lambda x:f"iris_{x}", range(150))],
                     columns = [*map(lambda x: x.split(" (cm)")[0].replace(" ","_"), load_iris().feature_names)])
    y = pd.Series(load_iris().target,
                  index = X.index,
                  name = "Species")
    c = map_colors(y, mode=1, palette=palette, desat=desat)  # y.map(lambda x:{0:"red",1:"green",2:"blue"}[x])
    if noise is not None:
        X_noise = pd.DataFrame(
            np.random.RandomState(0).normal(size=(X.shape[0], noise)),
            index=X_iris.index,
            columns=[*map(lambda x:f"noise_{x}", range(noise))]
        )
        X = pd.concat([X, X_noise], axis=1)
    return (X, y, c)

def dism2linkage(DF_dism, method="ward"):
    """
    Input: A (m x m) dissimilarity Pandas DataFrame object where the diagonal is 0
    Output: Hierarchical clustering encoded as a linkage matrix
    Further reading:
    http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.linkage.html
    https://pypi.python.org/pypi/fastcluster
    """
    # Linkage Matrix
    Ar_dist = distance.squareform(DF_dism.as_matrix())
    return linkage(Ar_dist, method=method)

# Get data
X_iris_with_noise, y_iris, c_iris = iris_data(50)
# Get distance matrix
df_dism = 1 - X_iris_with_noise.corr().abs()
# Get linkage matrix
Z = dism2linkage(df_dism)

# Create dendrogram
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots(figsize=(13,3))
    D_dendro = dendrogram(
        Z,
        labels=df_dism.index,
        color_threshold=3.5,
        count_sort="ascending",
        # link_color_func=lambda k: colors[k]
        ax=ax
    )
    ax.set_ylabel("Distance")
I'm not sure this really constitutes a practical answer, but it does allow you to generate dendrograms with truncated hanging lines. The trick is to generate the plot as normal, then manipulate the resulting matplotlib plot to recreate the lines.
I couldn't get your example to work locally, so I've just created a dummy dataset.
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
a = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
b = np.random.multivariate_normal([0, 10], [[3, 1], [1, 4]], size=[5,])
X = np.concatenate((a, b),)
Z = linkage(X, 'ward')
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax)
The resulting plot is the usual long-arm dendrogram.
Now for the more interesting bit. A dendrogram is made up of a number of LineCollection objects (one for each colour). To update the lines we iterate through these, extracting the details about their constituent paths, modifying these to remove any lines reaching to a y of zero, and then recreating a LineCollection for these modified paths.
The updated path is then added to the axes, and the original is removed.
The one tricky part is determining what height to draw to instead of zero. Since we are iterating over each dendrogram's paths, we don't know which point came before -- we basically have no idea where we are. However, we can exploit the fact that hanging lines hang vertically. Assuming there are no other lines at the same x, we can look up the other known y values for a given x and use that as the basis for our new y. The downside is that, to make sure we have this number available, we have to pre-scan the data.
Note: if your dendrogram has multiple hanging lines at the same x, you would also need to track the y values and search for the nearest y above the current point for this to work.
import numpy as np
from matplotlib.path import Path
from matplotlib.collections import LineCollection
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
dendrogram(Z, ax=ax);
for c in ax.collections[:]:  # use [:] to get a copy, since we're adding to the same list
    paths = []
    for path in c.get_paths():
        segments = []
        y_at_x = {}
        # Pre-pass over all elements, to find the lowest y value at each x value.
        # We can use this to calculate where to cut our lines.
        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            # Don't store if the y is zero, or if it's higher than the current low.
            if y > 0 and y < y_at_x.get(x, np.inf):
                y_at_x[x] = y

        for n, seg in enumerate(path.iter_segments()):
            x, y = seg[0]
            if y == 0:
                # If we know the lowest y at this x, use it - 0.5, limited to >= 0.
                y = max(0, y_at_x.get(x, 0) - 0.5)
            segments.append([x, y])
        paths.append(segments)

    lc = LineCollection(paths, colors=c.get_colors())  # Recreate a LineCollection with the same params
    ax.add_collection(lc)
    ax.collections.remove(c)  # Remove the original LineCollection
The resulting dendrogram looks like this:
I've plotted a 3D mesh in Matlab with the little m-file below:
[x,n] = meshgrid(0:0.1:20, 1:1:100);
mu = 0;
sigma = sqrt(2)./n;
f = normcdf(x,mu,sigma);
mesh(x,n,f);
I am trying to obtain the same result with Python and its corresponding modules, using the code snippet below:
import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt
sigma = 1
def integrand(x, n):
    return (n/(2*sigma*np.sqrt(np.pi)))*np.exp(-(n**2*x**2)/(4*sigma**2))

tt = np.linspace(0, 20, 2000)
nn = np.linspace(1, 100, 100)
T = np.zeros([len(tt), len(nn)])
for i, t in enumerate(tt):
    for j, n in enumerate(nn):
        T[i, j], _ = quad(integrand, -np.inf, t, args=(n,))

x, y = np.mgrid[0:20:0.01, 1:101:1]
plt.pcolormesh(x, y, T)
plt.show()
But the Python output is considerably different from the Matlab one and, frankly, unacceptable.
I am afraid I am misusing functions such as linspace, enumerate, or mgrid...
Does anybody have an idea what is going wrong?
PS. Unfortunately, I couldn't insert the output plots within this thread!
Best
Edit: I changed the linspace and mgrid intervals and switched to the plot_surface method. The output is 3D now, with suitable accuracy and smoothness.
From what I see the equivalent solution would be:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
x, n = np.mgrid[0:20:0.01, 1:100:1]
mu = 0
sigma = np.sqrt(2)/n
f = norm.cdf(x, mu, sigma)
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(x, n, f, rstride=x.shape[0]//20, cstride=x.shape[1]//20, alpha=0.3)
plt.show()
Unfortunately, 3D plotting with matplotlib is not as straightforward as with Matlab.
Here is the plot from this code:
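As a quick sanity check that the two formulations agree, note that the integrand in the question is the pdf of a normal distribution with standard deviation sqrt(2)*sigma/n, so its integral from -inf to t is exactly norm.cdf(t, 0, sqrt(2)*sigma/n). A minimal sketch (the sample values of t and n are arbitrary):
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 1

def integrand(x, n):
    return (n/(2*sigma*np.sqrt(np.pi)))*np.exp(-(n**2*x**2)/(4*sigma**2))

t, n = 1.3, 5
numeric, _ = quad(integrand, -np.inf, t, args=(n,))
closed_form = norm.cdf(t, 0, np.sqrt(2)*sigma/n)
print(numeric, closed_form)  # the two values agree to high precision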
Your Matlab code generates 201 points along x:
[x,n] = meshgrid(0:0.1:20, 1:1:100);
Your Python code, on the other hand, generates only 20 points:
tt = np.linspace(0, 19, 20)
Maybe it's causing accuracy problems?
Try this code:
tt = np.linspace(0, 20, 201)
The key points in resolving the problem were:
1- Making the dimensions produced by the linspace and mgrid calls match each other.
2- Using a denser mesh to obtain a high degree of smoothness.
3- Using a 3D plotting function, such as plot_surface.
The current code is now fully valid.
I am trying to get a filled binary mask of a contour of this image.
I took a look at this question, SciPy Create 2D Polygon Mask; however, it does not seem to like my set of data.
import numpy as np
from matplotlib.nxutils import points_inside_poly
nx, ny = 10, 10
poly_verts = [(1,1), (5,1), (5,9),(3,2),(1,1)]
# Create vertex coordinates for each grid cell...
# (<0,0> is at the top left of the grid in this system)
x, y = np.meshgrid(np.arange(nx), np.arange(ny))
x, y = x.flatten(), y.flatten()
points = np.vstack((x,y)).T
grid = points_inside_poly(points, poly_verts)
grid = grid.reshape((ny,nx))
print grid
I wonder if there is another way I can try to produce a binary mask, or if someone can explain the limitations of points_inside_poly,
because the result seems to end up looking something like this:
I'm not sure what you're plotting at the end, but your example works for me:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.nxutils import points_inside_poly
from itertools import product, compress
pv = [(1,1),(5,1),(5,9),(3,2),(1,1)]
x, y = np.meshgrid(np.arange(10),np.arange(10))
x, y = x.flatten(), y.flatten()
xy = np.vstack((x,y)).T
grid = points_inside_poly(xy,pv)
xv, yv = zip(*pv)
xp, yp = zip(*compress(xy,grid))
plt.plot(xp,yp,'o',color='red',label='points')
plt.plot(xv,yv,'o',color='blue',label='vertices')
plt.xlim((0,10))
plt.ylim((0,10))
plt.legend()
plt.show()
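As a side note, matplotlib.nxutils was removed in later matplotlib releases; the same mask can be built with matplotlib.path.Path.contains_points. A minimal sketch using the vertices and grid from the question:
import numpy as np
from matplotlib.path import Path

nx, ny = 10, 10
poly_verts = [(1, 1), (5, 1), (5, 9), (3, 2), (1, 1)]

# Grid of candidate points, flattened to an (N, 2) array of (x, y) pairs.
x, y = np.meshgrid(np.arange(nx), np.arange(ny))
points = np.vstack((x.flatten(), y.flatten())).T

# Boolean mask of the points inside the polygon, reshaped back to the grid.
grid = Path(poly_verts).contains_points(points).reshape((ny, nx))
print(grid.astype(int))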