From 3D data to colormap [duplicate] - python

This question already has answers here:
Matplotlib: how to make imshow read x,y coordinates from other numpy arrays?
(4 answers)
Set Matplotlib colorbar size to match graph
(9 answers)
Closed 5 years ago.
PREAMBLE: I have seen these, but I can't figure out from the answers how to do the plot. Also, I'm new to Python and matplotlib.
I have a data file of the form
X Y Z
0.05 1 z
0.10 1 z
... ... ...
0.95 1 z
0.05 2 z
... ... ...
... ... ...
0.95 10 z
with z in [-0.02:0.5] for each of them. This results in 190 (x,y,z) points.
I acquire the data in this way
data_file = open('tau.txt', 'r')
buffer = data_file.read()
data_file.close()
# list(...) so that each row can be indexed under Python 3 as well
data = [list(map(float, row.split('\t'))) for row in
        buffer.strip().split("\n")]
As the link suggests, I convert them into a grid
mu = []
alpha = []
tau = []
for elements in data:
    mu.append(elements[0])
    alpha.append(elements[1])
    tau.append(elements[2])
x_data = np.asarray(mu)
y_data = np.asarray(alpha)
z_data = np.asarray(tau)
xi = np.linspace(0.05,0.95,19)
yi = np.linspace(1,10,10)
ar = griddata(x_data,y_data,z_data,xi,yi,interp='nn')
Then I do the plot: I would like each (x,y) co-ordinate to have a square centered on it, with a colorbar showing the z value.
cmap = mpl.colors.LinearSegmentedColormap.from_list('my_colormap',
                                                    ['white','grey','black'],256)
img = plt.imshow(ar,interpolation='nearest',cmap=cmap,origin='lower')
plt.colorbar(img,cmap=cmap)
I obtain this:
First of all, I want the colourbar to be the same height as the plot itself. I can't work out how to fix this.
Moreover, if you look at the file you immediately see that the axis ranges are not right: x has to be in [0.05:0.95] and y in [1:10]. y is simply shifted by 1 (the white line, with all z=0, should be at y=1 and not y=0), while x takes values I can't explain.
I think it is important to note that, apart from these issues, the plot is right, both in the z values and in the trend.
How can I fix my problem(s)?

imshow is meant for plotting images and matrices on a grid the same size as your matrix or image, so by default its axes show array indices rather than your x and y values. That's why your x- and y-axes look that way.
For what you are trying to do, use pcolormesh or pcolor
in combination with numpy.meshgrid to get the correct x and y spacing.
These functions also support non-regular grid spacings.
This page has some information on how it works.
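A minimal sketch of that suggestion, assuming the 19 x 10 grid from the question. The z values here are placeholders (the real ones would come from tau.txt), and shading="nearest" assumes matplotlib 3.3 or newer; it centres each cell on its (x, y) coordinate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Grid centres taken from the question: x in [0.05, 0.95], y in [1, 10].
xi = np.linspace(0.05, 0.95, 19)
yi = np.linspace(1, 10, 10)

# meshgrid returns 2D coordinate arrays: one row per y value, one column per x value.
X, Y = np.meshgrid(xi, yi)

# Placeholder z values (assumption): replace with the gridded data from the file.
Z = 0.5 * X * Y / 10.0

fig, ax = plt.subplots()
# shading="nearest" draws one cell per (x, y) point, centred on that point.
mesh = ax.pcolormesh(X, Y, Z, cmap="Greys", shading="nearest")
fig.colorbar(mesh, ax=ax)
ax.set_xlabel("x")
ax.set_ylabel("y")
```

With this, the axes are labelled in data units (0.05 to 0.95 and 1 to 10) instead of array indices. In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end.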

Related

plt.imshow() shows only one color

I am trying to plot a heatmap from a 2000x2000 NumPy array. I have tried every solution from this post and many others. I have tried many cmaps and interpolation combinations.
This is the code that prepares the data:
def parse_cords(cord: float):
    cord = str(cord).split(".")
    h_map[int(cord[0])][int(cord[1])] += 1
df["coordinate"] is a pandas Series of floats, each encoding an x,y coordinate as x.y (integer part x, fractional digits y). x and y range from 0 to 1999.
I have decided to modify the array so that values will range from 0 to 1, but I have tested the code also without changing the range.
h_map = np.zeros((2000, 2000), dtype='int')
cords = df["coordinate"].map(lambda cord: parse_cords(cord))
maximum = float(np.max(h_map))
precent = lambda x: x/maximum
h_map = precent(h_map)
h_map looks like this:
[[0.58396242 0.08840799 0.03153833 ... 0.00285187 0.00419393 0.06324442]
[0.09075658 0.11172622 0.01476262 ... 0.00134206 0.00687804 0.0082201 ]
[0.02986076 0.01862104 0.03959067 ... 0.00100654 0.00134206 0.00251636]
...
[0.00301963 0.00134206 0.00134206 ... 0.00100654 0.00150981 0.00553598]
[0.00419393 0.00268411 0.00100654 ... 0.00201309 0.00402617 0.01342057]
[0.05183694 0.00251636 0.00184533 ... 0.00301963 0.00838785 0.1016608 ]]
Now the plot:
fig, ax = plt.subplots(figsize=figsize)
ax = plt.imshow(h_map)
And result:
[Figure: final plot]
The result is always a heatmap with only a single color depending on the cmap used. Is my array just too big to be plotted like this or am I doing something wrong?
EDIT:
I have added plt.colorbar() and removed scaling from 0 to 1. The plot knows the range of data (0 to 5500) but assumes that every value is equal to 0.
I think that is because you only provide one color channel. Therefore, plt.imshow() interprets the data as a black-and-white image. You could either add more channels or use a different function, e.g. sns.heatmap().
import seaborn as sns

Plotting per-point alpha values in 3D scatterplot throws ValueError

I have data in form of a 3D array, with "intensities" at every point. Depending on the intensity, I want to plot the point with a higher alpha. There are a lot of low-value outliers, so color coding (with scalar floats) won't work since they eclipse the real data.
What I have tried:
#this generates a 3D array with higher values around the center
a = np.array([0,1,2,3,4,5,4,3,2,1])
aa = np.outer(a,a)
aaa = np.einsum("ij,jk,jl",aa,aa,aa)
x_,y_,z_,v_ = [],[],[],[]
from matplotlib.colors import to_rgb,to_rgba
for x in range(aaa.shape[0]):
    for y in range(aaa.shape[1]):
        for z in range(aaa.shape[2]):
            x_.append(x)
            y_.append(y)
            z_.append(z)
            v_.append(aaa[x,y,z])
r,g,b = to_rgb("blue")
color = np.array([[r,g,b,a] for a in v_])
fig = plt.figure()
ax = fig.add_subplot(projection = '3d')
ax.scatter(x_,y_,z_,c =color)
plt.show()
The scatterplot documentation says that color can be a 2D array of RGBA, which I do pass. However, when I try to run the code, I get the following error:
ValueError: 'c' argument has 4000 elements, which is inconsistent with 'x' and 'y' with size 1000.
I just found my own answer.
The "A 2D array in which the rows are RGB or RGBA." statement in the documentation was a bit confusing - one needs to convert the RGBA rows to RGBA objects first, so that list comprehension should read:
color = np.array([to_rgba([r,g,b,a]) for a in v_])
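One thing worth checking alongside this: matplotlib expects every RGBA component, alpha included, to lie between 0 and 1, while the raw intensities in aaa are far larger than 1. That may be part of why the original array was rejected. A minimal sketch, assuming it is acceptable to scale the intensities by their maximum before using them as alpha (the scaling is my assumption, not something stated in the question):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgb

# Same test volume as in the question: higher values around the centre.
a = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1])
aa = np.outer(a, a)
aaa = np.einsum("ij,jk,jl", aa, aa, aa)

# Coordinates of every voxel, flattened in the same order as aaa.ravel().
x_, y_, z_ = np.indices(aaa.shape).reshape(3, -1)
v_ = aaa.ravel().astype(float)

# Assumption: scale intensities into [0, 1] so they are valid alpha values.
alpha = v_ / v_.max()

# One RGBA row per point: fixed blue, per-point alpha.
r, g, b = to_rgb("blue")
color = np.empty((alpha.size, 4))
color[:, :3] = (r, g, b)
color[:, 3] = alpha

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x_, y_, z_, c=color)
```

The color array has one row per point (1000 x 4), which is the shape the documentation describes.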

Center a colormap around 0 in Mayavi

I'm plotting a point cloud and coloring by residual error. I'd like the colormap to remain centered on 0, so that 0 error is white.
I see answers for matplotlib. What about Mayavi?
from mayavi import mlab
mlab.points3d(x, y, z, e, colormap='RdBu')
You can set the vmin and vmax of the colormap explicitly with mlab.points3d. So, you could just make sure that vmin = -vmax. Something like this:
mylimit = 10
mlab.points3d(x, y, z, e, colormap='RdBu',vmin=-mylimit,vmax=mylimit)
Or, you could set the limit automatically with something like:
mylimit = max(abs(e.min()),abs(e.max()))
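To see what that limit works out to, here is a small sketch with NumPy only. The residual values in e are made up for illustration; in the real case e would be the error array passed to mlab.points3d.

```python
import numpy as np

# Placeholder residuals (assumption): the negative side has the larger magnitude.
e = np.array([-3.2, 0.4, 1.5, -0.7, 2.1])

# Largest absolute value on either side of zero.
mylimit = max(abs(e.min()), abs(e.max()))

# Symmetric limits, so that 0 sits at the middle of the colormap.
vmin, vmax = -mylimit, mylimit
print(vmin, vmax)
```

These are the values that would then be passed as vmin=vmin, vmax=vmax. np.abs(e).max() gives the same limit in one call.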
In case anybody wishes to do this but so that the full extent of the colorbar is used, here is a solution I made (with help from here) for mayavi that stretches the colorbar so that the centre of it is on zero:
#Mayavi surface
s = mlab.surf(data)
#Get the lut table of the data
lut = s.module_manager.scalar_lut_manager.lut.table.asarray()
maxd = np.max(data)
mind = np.min(data)
#Data range
dran = maxd - mind
#Proportion of the data range at which the centred value lies
zdp = abs(mind / dran)
#The +0.5's here are because floats are rounded down when converted to ints
#index equal portion of distance along colormap
cmzi = int(zdp * 255 + 0.5)
#linspace from zero to 128, with number of points matching portion to side of zero
topi = np.linspace(0, 127, cmzi) + 0.5
#and for other side
boti = np.linspace(128, 255, 255 - cmzi) + 0.5
#convert these linspaces to ints and map the new lut from these
shift_index = np.hstack([topi.astype(int), boti.astype(int)])
s.module_manager.scalar_lut_manager.lut.table = lut[shift_index]
#Force update of the figure now that we have changed the LUT
mlab.draw()
Note that if you wish to do this multiple times for the same surface (ie. if you're modifying the mayavi scalars rather than redrawing the plot) you need to make a record of the initial lut table and modify that each time.
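The index arithmetic can be tried without Mayavi. This is only a sketch, assuming a 256-row RGBA lookup table and a data range that straddles zero. The function name recentre_lut and the stand-in table are hypothetical, not part of the Mayavi API. It always remaps from an untouched copy of the table, as the note above suggests, and it builds 256 indices, whereas the snippet above builds 255.

```python
import numpy as np

def recentre_lut(original_lut, vmin, vmax):
    """Return a copy of original_lut stretched so its midpoint lands on zero.

    Hypothetical helper: assumes vmin < 0 < vmax.
    """
    if not vmin < 0 < vmax:
        raise ValueError("data range must straddle zero")
    n = original_lut.shape[0]
    half = n // 2
    # Fraction of the data range that lies below zero.
    zero_frac = abs(vmin) / (vmax - vmin)
    # Row of the new table that should show the colormap's centre colour.
    zero_idx = int(round(zero_frac * (n - 1)))
    # Spread the lower half of the old table over the rows below zero_idx,
    # and the upper half over the remaining rows.
    lower = np.linspace(0, half - 1, zero_idx)
    upper = np.linspace(half, n - 1, n - zero_idx)
    idx = np.rint(np.concatenate([lower, upper])).astype(int)
    return original_lut[idx]

# Stand-in for the table read from Mayavi (assumption): row i holds the value i.
original_lut = np.column_stack([np.arange(256)] * 4)

# Data from -1 to 3: zero sits a quarter of the way along the range.
recentred = recentre_lut(original_lut, -1.0, 3.0)
```

Because original_lut is never modified, the function can be called again with a new range whenever the scalars change, and the result assigned back to the lut table.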

Python scatter plot 2 dimensional array

I'm trying to do something that I think should be pretty straightforward, but I can't seem to get it to work.
I'm trying to plot 16 byte values measured over time to see how they change. I'm trying to use a scatter plot to do this with:
x axis being the measurement index
y axis being the index of the byte
and the color indicating the value of the byte.
I have the data stored in a numpy array where data[2][14] would give me the value of the 14th byte in the 2nd measurement.
Every time I try to plot this, I'm getting either:
ValueError: x and y must be the same size
IndexError: index 10 is out of bounds for axis 0 with size 10
Here is the sample test I'm using:
import numpy
import numpy.random as nprnd
import matplotlib.pyplot as plt
#generate random measurements
# 10 measurements of 16 byte values
x = numpy.arange(10)
y = numpy.arange(16)
test_data = nprnd.randint(low=0,high=65535, size=(10, 16))
#scatter plot the measurements with
# x - measurement index (0-9 in this case)
# y - byte value index (0-15 in this case)
# c = test_data[x,y]
plt.scatter(x,y,c=test_data[x][y])
plt.show()
I'm sure it is something stupid I'm doing wrong but I can't seem to figure out what.
Thanks for the help.
Try using a meshgrid to define your point locations, and don't forget to index into your NumPy array properly (with [x,y] rather than [x][y]):
x, y = numpy.meshgrid(x,y)
plt.scatter(x,y,c=test_data[x,y])
plt.show()
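To see why the indexing matters, here is a small NumPy-only sketch (no plotting). The variable names xx, yy and colors are mine. test_data[x][y] first selects rows with x and then selects rows again with y, which is where the IndexError comes from, while test_data[xx, yy] pairs each measurement index with each byte index.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10)   # measurement index
y = np.arange(16)   # byte index
test_data = rng.integers(0, 65535, size=(10, 16))

# Chained indexing: test_data[x] is still (10, 16), and indexing its first
# axis with values up to 15 then fails.
try:
    test_data[x][y]
    chained_failed = False
except IndexError:
    chained_failed = True

# meshgrid yields one (x, y) pair per point; both arrays have shape (16, 10).
xx, yy = np.meshgrid(x, y)

# Fancy indexing with two index arrays picks test_data[x_i, y_i] for each pair.
colors = test_data[xx, yy]
print(chained_failed, xx.shape, colors.shape)
```

colors has the same shape as xx and yy, so the three arrays can be passed to plt.scatter together.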

Two colour scatter plot in R or in python

I have a dataset of three columns and n number of rows. column 1 contains name, column 2 value1, and column 3 value2 (rank2).
I want to plot a scatter plot with the outlier values displaying names.
The R commands I am using are:
tiff('scatterplot.tiff')
data<-read.table("scatterplot_data", header=T)
attach(data)
reg1<-lm(A~B)
plot(A,B,col="red")
abline(reg1)
outliers<-data[which(2^(data[,2]-data[,3]) >= 4 | 2^(data[,2]-data[,3]) <=0.25),]
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50)
dev.off()
and I get a figure like this:
What I want is for the labels in the lower half to be one colour and the labels in the upper half another, say green and red respectively.
Any suggestions, or adjustment in the commands?
You already have a logical test that works to your satisfaction. Just use it in the color spec to text:
text(outliers[,2], outliers[,3],labels=outliers[,1],cex=0.50,
     col=ifelse(2^(outliers[,2]-outliers[,3]) >= 4, "blue", "green"))
It's untested of course because you offered no test case, but my reasoning is that every row of 'outliers' already satisfies one of the two conditions, so ifelse() should return "blue" for the values >= 4 and "green" for the ones <= 0.25, and that this should give you the proper alignment of color choices with the 'outliers' rows.
This uses Python, with matplotlib to plot and NumPy to fit the data. The trick with NumPy is to create an index or mask to filter out the results that you want.
EDIT: Want to selectively color the top and bottom outliers? It's a simple combination of both masks that we created:
import numpy as np
import matplotlib.pyplot as plt
# Create some data
N = 1000
X = np.random.normal(5,1,size=N)
Y = X + np.random.normal(0,5.5,size=N)/np.random.normal(5,.1)
NAMES = ["foo"]*N # Customize names here
# Fit a first-degree polynomial (a straight line)
(a,b)=np.polyfit(X,Y,1)
# Find all points above the line
idx = (X*a + b) < Y
# Scatter according to that index
plt.scatter(X[idx],Y[idx], color='r')
plt.scatter(X[~idx],Y[~idx], color='g')
# Find top 10 outliers
err = ((X*a+b) - Y) ** 2
idx_L = np.argsort(err)[-10:]
for i in idx_L:
    plt.text(X[i], Y[i], NAMES[i])
# Color the outliers purple or black
top = idx_L[idx[idx_L]]
bot = idx_L[~idx[idx_L]]
plt.scatter(X[top],Y[top], color='purple')
plt.scatter(X[bot],Y[bot], color='black')
XF = np.linspace(0,10,1000)
plt.plot(XF, XF*a + b, 'k--')
plt.axis('tight')
plt.show()
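The mask combination is easier to follow on a handful of points. A minimal sketch with made-up data and names (all values here are assumptions for illustration), without any plotting: it fits a line, picks the two worst outliers, and splits them by which side of the line they fall on.

```python
import numpy as np

# Six made-up points with one clear outlier above the line and one below.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([0.1, 0.9, 4.0, 3.0, 2.0, 5.0])
NAMES = ["p0", "p1", "p2", "p3", "p4", "p5"]

# Fit a straight line Y = a*X + b.
a, b = np.polyfit(X, Y, 1)

# Mask: True where the point lies above the fitted line.
above = Y > a * X + b

# Indices of the two points with the largest squared error.
err = (a * X + b - Y) ** 2
worst = np.argsort(err)[-2:]

# Combine both selections: outliers above the line, outliers below it.
top = worst[above[worst]]
bot = worst[~above[worst]]

# One label colour per group, as asked for in the question.
labels = [(NAMES[i], "red") for i in top] + [(NAMES[i], "green") for i in bot]
print(labels)
```

Each (name, colour) pair could then be drawn with plt.text(X[i], Y[i], name, color=colour).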
