I would like to evaluate a 4d Gaussian / normal distribution on a 4d grid. Let's call the variables (x1,y1,x2,y2). Then if I have means = (x1=1,y1=0,x2=2,y2=0), I expect that when I do a 2d contour plot in the x1, x2 direction, at y1=y2=0, to see a Gaussian centered in (x1=1, x2=2). However, I see the mean/center at (x1=2,x2=0) instead.
What am I missing here? Is it how I define the grid to begin with?
For a 2d normal distribution it works as expected.
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import multivariate_normal
xy_min = -5
xy_max = 5
npoints = 50
x = np.linspace(xy_min, xy_max, npoints)
dim = 4
xx1,yy1,xx2,yy2 = np.meshgrid(x, x,x,x)
points = np.concatenate([xx1[:, :,:, :,None], yy1[:, :, :,:,None],xx2[:, :, :,:,None],yy2[:, :, :,:,None]], axis=-1)
cov = np.diag(np.ones(4))
mean=np.array([1,0,2,0])
rv = multivariate_normal.pdf(points , mean=mean, cov=cov)
plt.figure()
plt.contourf(x, x, rv[:,0,:,0])
I tried to manually reshape the evaluation points first, but it gives the same results. So I think I am missing something conceptually here?
points_resh = np.reshape(points,[npoints**4,dim],order='C')
rv_resh = multivariate_normal.pdf(points_resh , mean=mean, cov=cov)
rv2 = np.reshape(rv_resh,[npoints,npoints,npoints,npoints],order='C')
plt.figure()
plt.contourf(x, x, rv2[:,0,:,0])
** EDIT: SOLVED **
Using ij indexing for meshgrid, everything works as expected. The only thing to keep in mind is that the slice needs to be transposed for contour plotting. See the example below:
#%% Instead use ij indexing
x = np.linspace(-5, 5, 50)
y = np.linspace(-3, 3, 30)
z= np.linspace(-2, 2, 20)
w= np.linspace(-1, 1, 10)
x4d,y4d,z4d,w4d= np.meshgrid(x, y,z,w,indexing='ij')
points4d= np.concatenate([x4d[:, :,:,:,None], y4d[:, :,:,:,None], z4d[:, :,:,:,None],w4d[:, :,:,:,None]], axis=-1)
rv4d = multivariate_normal.pdf(points4d , mean=[1,0.0,2,0.0], cov=[0.1,0.1,0.1,0.1])
fig,ax=plt.subplots()
# rv4d[:,0,:,0] fixes y = y[0] and w = w[0]; transpose so that x runs along the horizontal axis
ax.contourf(x,z,rv4d[:,0,:,0].T)
ax.set(xlabel='x',ylabel='z')
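To see why the default indexing moves the peak, here is a minimal sketch. It assumes a coarse 11-point grid so that it runs quickly, and the helper name `peak_location` is made up for illustration. It evaluates the same 4d pdf with both indexing modes and reports the coordinate value found at the peak along each array axis:

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-5, 5, 11)      # coarse grid that contains the integers -5..5
mean = np.array([1, 0, 2, 0])   # (x1, y1, x2, y2)

def peak_location(indexing):
    """Grid value at the pdf maximum, listed per array axis (hypothetical helper)."""
    grids = np.meshgrid(x, x, x, x, indexing=indexing)
    points = np.stack(grids, axis=-1)                  # shape (11, 11, 11, 11, 4)
    pdf = multivariate_normal.pdf(points, mean=mean, cov=np.eye(4))
    idx = np.unravel_index(np.argmax(pdf), pdf.shape)  # index of the peak per axis
    return tuple(float(x[i]) for i in idx)

peak_xy = peak_location("xy")   # (0.0, 1.0, 2.0, 0.0): axes 0 and 1 are swapped
peak_ij = peak_location("ij")   # (1.0, 0.0, 2.0, 0.0): axes follow (x1, y1, x2, y2)
print(peak_xy, peak_ij)
```

With the default indexing='xy', axis 0 of the result runs over y1 and axis 1 over x1, so rv[:,0,:,0] is a (y1, x2) slice rather than an (x1, x2) slice. That is consistent with the peak showing up at (2, 0) in the original plot.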
I have a 2d numpy array that I want to plot so I can see how each category is positioned on the grid. The matrix (mat) looks something like this:
156 138 156
1300 137 156
138 138 1300
137 137 137
I plotted this as follows:
plt.imshow(mat, cmap='tab20', interpolation='none')
However, I want to have custom colors. I have a csv where the id's correspond with the values in the matrix:
id,R,G,B
156,200,200,200
138,170,255,245
137,208,130,40
1300,63,165,76
Is there a way I can have the values in the matrix correspond with the R, G, B values in the csv file?
Edit: someone asked for a clarification but the entire answer was deleted.
Each row has an ID and three columns representing the respective R, G, and B values. So the first row has ID 156 (a domain-specific code) with R 200, G 200 and B 200 (which is grey).
Now I have a 2d matrix that I want to plot, and on each coordinate where the value is 156 I want that pixel to be grey. Same with ID 1300, where the colors 63, 165, and 76 represent a green color that I want to use in the matrix.
Using a colormap
In principle, the table of RGB values is a kind of colormap, so it makes sense to use a matplotlib colormap to get the colors for the plot. What makes this a little more complicated here is that the values are not evenly spaced. One idea is to first map them to integers starting at 0. Creating a colormap from those values and using it with a BoundaryNorm then gives an equidistant colorbar. Finally, one may set the tick labels of the colorbar back to the initial values.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
a =np.array([[156, 138, 156],
[1300, 137, 156],
[138, 138, 1300],
[137, 137, 137]])
ca = np.array([[156,200,200,200],
[138,170,255,245],
[137,208,130,40],
[1300,63,165,76]])
u, ind = np.unique(a, return_inverse=True)
b = ind.reshape((a.shape))
colors = ca[ca[:,0].argsort()][:,1:]/255.
cmap = matplotlib.colors.ListedColormap(colors)
norm = matplotlib.colors.BoundaryNorm(np.arange(len(ca)+1)-0.5, len(ca))
plt.imshow(b, cmap=cmap, norm=norm)
cb = plt.colorbar(ticks=np.arange(len(ca)))
cb.ax.set_yticklabels(np.unique(ca[:,0]))
plt.show()
Plotting RGB array
You may create an RGB array from your data to directly plot as imshow. To this end you may index the original array with the colors from the color array and reshape the resulting array such that it is in the correct shape to be plotted with imshow.
import numpy as np
import matplotlib.pyplot as plt
a =np.array([[156, 138, 156],
[1300, 137, 156],
[138, 138, 1300],
[137, 137, 137]])
ca = np.array([[156,200,200,200],
[138,170,255,245],
[137,208,130,40],
[1300,63,165,76]])
u, ind = np.unique(a, return_inverse=True)
c = ca[ca[:,0].argsort()][:,1:]/255.
b = c[ind].reshape(a.shape + (3,))
plt.imshow(b)
plt.show()
The result is the same as above, but without colorbar (as there is no quantity to map here).
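If the color table really lives in a CSV file, the lookup can also be done with a sorted search instead of np.unique. A minimal sketch, assuming the CSV has the header id,R,G,B as in the question; it is read from a string here so the snippet is self-contained, and names such as `csv_text` and `rgb_image` are placeholders:

```python
import io
import numpy as np

# Stand-in for the CSV file from the question (assumed layout: id,R,G,B).
csv_text = """id,R,G,B
156,200,200,200
138,170,255,245
137,208,130,40
1300,63,165,76
"""
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1, dtype=int)

mat = np.array([[156, 138, 156],
                [1300, 137, 156],
                [138, 138, 1300],
                [137, 137, 137]])

# Sort the table by id so that searchsorted can locate each matrix value.
order = np.argsort(table[:, 0])
sorted_ids = table[order, 0]
sorted_rgb = table[order, 1:].astype(np.uint8)

pos = np.searchsorted(sorted_ids, mat)
pos = np.clip(pos, 0, len(sorted_ids) - 1)   # keep out-of-range ids indexable
if not np.array_equal(sorted_ids[pos], mat):
    raise ValueError("matrix contains ids that are missing from the color table")

rgb_image = sorted_rgb[pos]                  # shape (4, 3, 3), ready for plt.imshow
print(rgb_image.shape)
```

The explicit check matters because searchsorted returns an insertion position even for ids that are not in the table, which would otherwise silently pick a wrong color.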
It's not particularly elegant, but it is simple
In [72]: import numpy as np
In [73]: import matplotlib.pyplot as plt
In [74]: a = np.array([[156, 138, 156], [1300, 137, 156], [138, 138, 1300], [137, 137, 137]])
In [75]: d = { 156: [200, 200, 200],
...: 138: [170, 255, 245],
...: 137: [208, 130, 40],
...: 1300: [63, 165, 76]}
In [76]: image = np.array([[d[val] for val in row] for row in a], dtype='B')
In [77]: plt.imshow(image);
The point is to generate an array of the correct dtype ('B' encodes an unsigned 8-bit integer, i.e. np.uint8) containing the correct (and unpacked) RGB tuples.
Addendum
Following an exchange of comments under the original question, in this addendum I propose a possible solution to the problem of plotting the same type of data using plt.scatter() (the problem was a bit tougher than I expected...)
import numpy as np
import matplotlib.pyplot as plt
from random import choices, randrange
######## THIS IS FOR IMSHOW ######################################
# the like of my previous answer
values = [20,150,900,1200]
rgb = lambda x=255:(randrange(x), randrange(x), randrange(x))
colord = {v:rgb() for v in values}
nr, nc = 3, 5
data = np.array(choices(values, k=nr*nc)).reshape((nr,nc))
c = np.array([[colord[v] for v in row] for row in data], dtype='B')
######## THIS IS FOR SCATTER ######################################
# This is for having the coordinates of the scattered points, note that rows' indices
# map to y coordinates and columns' map to x coordinates
y, x = np.array([(i,j) for i in range(nr) for j in range(nc)]).T
# Scatter does not expect a 3D array of uints but a 2D array of RGB floats
c1 = (c/255.0).reshape(nr*nc,3)
######## THIS IS FOR PLOTTING ######################################
# two subplots, plot immediately the imshow
f, (ax1, ax2) = plt.subplots(nrows=2)
ax1.imshow(c)
# to make a side by side comparison we set the boundaries and aspect
# of the second plot to mimic imshow's
ax2.set_ylim(ax1.get_ylim())
ax2.set_xlim(ax1.get_xlim())
ax2.set_aspect(1)
# and finally plot the data --- the size of dots `s=900` was by trial and error
ax2.scatter(x, y, c=c1, s=900)
plt.show()
Pandas can help you collect the data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
im = pd.read_clipboard(header=None) # the matrix, copied from your post
colours = pd.read_clipboard(index_col=0,sep=',') # the colour table, copied from your post
Pandas also helps with the colormap:
colordf = colours.reindex(np.arange(1301)).fillna(0).astype(np.uint8)
And numpy.take builds the image:
rgbim = colordf.values.take(im.values,axis=0)
plt.imshow(rgbim)
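The reindex step in this answer effectively builds a lookup table with one row per possible id. The same idea in plain NumPy, as a rough sketch: it assumes the ids are non-negative integers small enough for a dense table, and `lut` is a made-up name.

```python
import numpy as np

mat = np.array([[156, 138, 156],
                [1300, 137, 156],
                [138, 138, 1300],
                [137, 137, 137]])
ids = np.array([156, 138, 137, 1300])
rgb = np.array([[200, 200, 200],
                [170, 255, 245],
                [208, 130, 40],
                [63, 165, 76]], dtype=np.uint8)

# Dense lookup table: row k holds the color for id k; unknown ids stay black.
lut = np.zeros((ids.max() + 1, 3), dtype=np.uint8)
lut[ids] = rgb

rgb_image = lut[mat]    # fancy indexing maps every id to its RGB triple
print(rgb_image.shape)  # (4, 3, 3)
```

The table costs memory proportional to the largest id, so this is only attractive when the ids are reasonably small, as they are here.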
Using pandas and numpy, (Edit for n x m matrix):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 2
m = 2
df = pd.read_csv('matrix.txt')
id = df.id.values
id = np.reshape(id, (n, m))
R = df.R.values
R = np.reshape(R/255, (n, m))
G = df.G.values
G = np.reshape(G/255, (n, m))
B = df.B.values
B = np.reshape(B/255, (n, m))
img = []
for i in range(n):
    img.append([])
    for j in range(m):
        img[i].append((R[i][j], G[i][j], B[i][j]))
plt.imshow(img)
plt.show()
Let me try to be specific. First, here is the code:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y = np.random.random((2, 1000))
xbins = np.linspace(0, 1, 10)
ybins = np.linspace(0, 1, 10)
counts, _, _ = np.histogram2d(x, y, bins=(xbins, ybins))
print(counts)
You will get a two-dimensional array from this code. Now suppose I have another array:
z = np.random.random((2, 1000))
How do I then get a three-dimensional array of counts from these three arrays? I tried:
zbins = np.linspace(0, 1, 10)
counts, _,_,_ = np.histogramdd(x, y, z, bins=(xbins, ybins, zbins))
But that did not work.
What's more, the real data file is too big to process with a loop: it would take hours to run and would not be easy to check.
Thanks for thinking about the question!
I made the following code according to your last comment
import numpy as np
data = np.random.random((1000, 3))
nbins = 10
H, [bx, by, bz]=np.histogramdd(data, bins=(nbins,nbins,nbins),
range=((0,1),(0,1),(0,1)))
H holds the number of points in each grid cell. In your previous code, histogramdd was not used correctly: the input data is the first argument, which in your case should be an N x 3 array.
See the documentation of histogramdd for details.
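If the coordinates start out as three separate 1d arrays, as in the question, they can be stacked into the N x 3 sample first. A small sketch, assuming x, y and z each hold 1000 values in [0, 1) and reusing the question's bin edges; it also checks that summing over the z axis reproduces the 2d counts:

```python
import numpy as np

rng = np.random.default_rng(1977)
x, y, z = rng.random((3, 1000))         # three separate 1d coordinate arrays

edges = np.linspace(0, 1, 10)           # 10 edges give 9 bins per axis
sample = np.column_stack([x, y, z])     # shape (1000, 3): one row per point

counts3d, (ex, ey, ez) = np.histogramdd(sample, bins=(edges, edges, edges))
counts2d, _, _ = np.histogram2d(x, y, bins=(edges, edges))

print(counts3d.shape)                                   # (9, 9, 9)
print(np.array_equal(counts3d.sum(axis=2), counts2d))   # True
```

No Python loop over the points is involved, so this should scale to large files much better than counting by hand.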
Gridding data (d) on an irregular grid (x and y) using SciPy's griddata is time-consuming when there are many datasets. However, the longitudes and latitudes (x and y) are always the same; only the data (d) change. In this case, once griddata has been used, how can the procedure be repeated with different d arrays to get faster results?
import numpy as np, matplotlib.pyplot as plt
from scipy.interpolate import griddata
x = np.array([110, 112, 114, 115, 119, 120, 122, 124]).astype(float)
y = np.array([60, 61, 63, 67, 68, 70, 75, 81]).astype(float)
d = np.array([4, 6, 5, 3, 2, 1, 7, 9]).astype(float)
ulx, lrx = np.min(x), np.max(x)
uly, lry = np.max(y), np.min(y)
xi = np.linspace(ulx, lrx, 15)
yi = np.linspace(uly, lry, 15)
grided_data = griddata((x, y), d, (xi.reshape(1,-1), yi.reshape(-1,1)), method='nearest',fill_value=0)
plt.imshow(grided_data)
plt.show()
The above code works for one array of d.
But I have hundreds of other arrays.
griddata with method='nearest' ends up using NearestNDInterpolator. That is a class that creates a callable interpolator object, which is then called with xi:
elif method == 'nearest':
ip = NearestNDInterpolator(points, values, rescale=rescale)
return ip(xi)
So you could create your own NearestNDInterpolator and call it multiple times with different xi.
But I think in your case you want to change the values. Looking at the code for that class I see
self.tree = cKDTree(self.points)
self.values = y
and __call__ does:
dist, i = self.tree.query(xi)
return self.values[i]
I don't know the relative cost of creating the tree versus query.
So it should be easy to change values between uses of __call__. And it looks like values could have multiple columns, since it's just indexing on the 1st dimension.
This interpolator is simple enough that you could write your own using the same tree idea.
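As a rough sketch of that idea, using the public cKDTree class directly instead of SciPy's interpolator: the tree is built and queried once, and the resulting index array is reused for every new d. The coordinates are those from the question; `nearest` and `regrid` are made-up names, and this only covers method='nearest'.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.interpolate import griddata

x = np.array([110, 112, 114, 115, 119, 120, 122, 124], dtype=float)
y = np.array([60, 61, 63, 67, 68, 70, 75, 81], dtype=float)

xi = np.linspace(x.min(), x.max(), 15)
yi = np.linspace(y.max(), y.min(), 15)
grid_x, grid_y = np.meshgrid(xi, yi)             # both have shape (15, 15)

# One-off cost: build the tree and find the nearest source point per grid cell.
tree = cKDTree(np.column_stack([x, y]))
_, nearest = tree.query(np.column_stack([grid_x.ravel(), grid_y.ravel()]))
nearest = nearest.reshape(grid_x.shape)

def regrid(d):
    """Nearest-neighbor gridding of a new data array (hypothetical helper)."""
    return np.asarray(d)[nearest]

d = np.array([4, 6, 5, 3, 2, 1, 7, 9], dtype=float)
gridded = regrid(d)

# Cross-check against griddata for this one array.
reference = griddata((x, y), d, (grid_x, grid_y), method='nearest')
print(np.array_equal(gridded, reference))

# Many data arrays at once: index along the point axis.
many = np.stack([d, 2 * d, d[::-1]])             # shape (3, 8)
stacked = many[:, nearest]                       # shape (3, 15, 15)
```

After the single query, each additional array costs only one fancy-indexing operation, so hundreds of d arrays can be gridded without rebuilding or re-querying the tree.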
Here's a Nearest Interpolator that lets you repeat the interpolation for the same points, but different z values. I haven't done timings yet to see how much time it saves
import numpy as np
from scipy import interpolate

# Note: this relies on private SciPy helpers (names starting with "_"),
# which may change between SciPy versions.
class MyNearest(interpolate.NearestNDInterpolator):
    # normal interpolation, but returns the nearest-neighbor indices as well
    def __call__(self, *args):
        xi = interpolate.interpnd._ndim_coords_from_arrays(args, ndim=self.points.shape[1])
        xi = self._check_call_shape(xi)
        xi = self._scale_x(xi)
        dist, i = self.tree.query(xi)
        return i, self.values[i]

def my_griddata(points, values, method='linear', fill_value=np.nan,
                rescale=False):
    points = interpolate.interpnd._ndim_coords_from_arrays(points)
    if points.ndim < 2:
        ndim = points.ndim
    else:
        ndim = points.shape[-1]
    assert(ndim==2)
    # simplified call for 2d 'nearest'
    ip = MyNearest(points, values, rescale=rescale)
    return ip # ip(xi) # return the interpolator object, not values
ip = my_griddata((xreg, yreg), z, method='nearest',fill_value=0)
print(ip)
xi = (xi.reshape(1,-1), yi.reshape(-1,1))
I, data = ip(xi)
print(data.shape)
print(I.shape)
print(np.allclose(z[I],data))
z1 = xreg+yreg # new z data
data = z1[I] # should show diagonal color bars
So as long as z has the same shape as before (and as xreg), z[I] will return the nearest value for each xi.
And it can interpolate 2d data as well (e.g. (225,n) shaped):
z1 = np.array([xreg+yreg, xreg-yreg]).T
print(z1.shape) # (225,2)
data = z1[I]
print(data.shape) # (20,20,2)