I have an MxN (say, 1000x50) array. I want to plot each 50-point line onto the same axes and get a heatmap of their density.
Simply doing a plt.pcolor(data) is not what I want, since I don't want to plot the matrix.
This is what I want to plot, but as I said it doesn't provide me with the heatmap I need.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(1000, 50)
fig, ax = plt.subplots()
for i in range(1000):
    ax.plot(data[i], '.')
plt.show()
I would like a way of combining these into a density plot (I assume it will have something to do with histograms and binning?).
EDIT: simply adding an alpha value to the plot (ax.plot(data[i], '.r', alpha=0.01)) achieves something similar to what I want. I would, however, like a heatmap with different colours.
As you already pointed out in your question, probably one of the simplest approaches involves histograms. A linear approximation of the histogram is probably enough for this application.
You can use np.histogram to calculate bin heights and edges and use scipy.interpolate.interp1d to obtain a function that provides an interpolation of the histogram. We can define a simple helper function to get the approximate density around each value in one column of the data array:
import scipy.interpolate as interp

def get_density(vals, bins=30, kind="linear"):
    # Normalized histogram of the values
    y, bin_edges = np.histogram(vals, bins=bins, density=True)
    # Interpolate the histogram at the bin centres...
    x = (bin_edges[1:] + bin_edges[:-1]) / 2.
    f = interp.interp1d(x, y, kind=kind, fill_value="extrapolate")
    # ...and evaluate the interpolant at every original value
    return f(vals)
Then you can use any colormap you want to map the density to a color value. The easiest way to go from here is to use plt.scatter instead of plot, where you can provide a specific color for every data point.
I would do something like this:
fig, ax = plt.subplots()
for i in range(data.shape[1]):
    colors = plt.cm.viridis(get_density(data[:, i]))
    ax.scatter(i*np.ones(data.shape[0]), data[:, i], c=colors, marker='.')
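Because the colors are passed to scatter explicitly, plt.colorbar has nothing to draw from by default. Continuing the snippet above, a possible sketch for adding a colorbar (the 0-1 range is an assumption; get_density does not guarantee densities in that interval):

import matplotlib as mpl
# build a stand-alone mappable so the colorbar has something to draw from
sm = mpl.cm.ScalarMappable(cmap=plt.cm.viridis, norm=mpl.colors.Normalize(vmin=0, vmax=1))
sm.set_array([])  # needed by older matplotlib versions
fig.colorbar(sm, ax=ax, label='approximate density')
plt.show()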
I am trying to make a plot of the gravitational redshift as a function of distance. However, I have a problem with the plotting: I want to start the plot at rs=1.0, because no object can be detected within the Schwarzschild radius, rs=1.0 in my case.
I tried using a mask, but it did not work. Is there any way to make the contour plot start at roughly r>1? In the figure above, I want imshow to plot the amount of redshift starting from the blue solid circle, not at r=0 (I have no idea why it starts there).
import numpy as np
import matplotlib.pyplot as plt
rs = 1
ang = np.linspace(0, 2*np.pi, 2000)
x, y = np.mgrid[2:100, 2:100]
dist = np.hypot(x, y)  # linear distance from the point (0, 0)
z = np.sqrt(1/dist)
f = 1/np.sqrt((1 - rs*z)/(1 - rs/4)) * (1/10)
plt.imshow(f, interpolation='bilinear')
a = np.cos(ang)  # circle of radius rs = 1
b = np.sin(ang)
plt.xlim(0, 15)
plt.ylim(0, 15)
plt.plot(a, b)
plt.colorbar()
plt.show()
I think there is a misunderstanding about the kind of plot. plt.imshow creates a color-mapped image of a 2D array, but the axes do not show the independent data variables, only the indices of the array. This is different from e.g. plt.contourf.
In fact, your array f doesn't even have values at [x=1, y=1], as x and y start at 2...
Let's compare imshow and contourf:
fig, axs = plt.subplots(1, 2)
axs[0].imshow(f, interpolation='bilinear')
axs[0].set_xlim(0,15)
axs[0].set_ylim(0,15)
axs[1].contourf(x, y, f)
axs[1].set_aspect(1)
axs[1].set_xlim(0,15)
axs[1].set_ylim(0,15)
Or in other words: check the limits of your axes without setting xlim and ylim: they go from -0.5 to 97.5 instead of 2 to 99...
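For instance, a quick sanity check (just an illustration, on a fresh imshow without any limits set):

fig, ax = plt.subplots()
ax.imshow(f, interpolation='bilinear')
print(ax.get_xlim())  # roughly (-0.5, 97.5): array indices, not the values from np.mgrid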
However, imshow has some keyword arguments that are useful here.
Look what happens to the above plot with
axs[0].imshow(f, interpolation='bilinear', origin='lower', extent=[2, 99, 2, 99])
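Putting it together, a minimal sketch that reuses f, a and b from the question's code and draws the circle in the same data coordinates (extent=[2, 99, 2, 99] is simply the range covered by np.mgrid[2:100, 2:100]):

fig, ax = plt.subplots()
# map the image onto the actual coordinate range instead of the array indices
im = ax.imshow(f, interpolation='bilinear', origin='lower', extent=[2, 99, 2, 99])
ax.plot(a, b)  # the r = rs circle from the question
ax.set_xlim(0, 15)
ax.set_ylim(0, 15)
fig.colorbar(im)
plt.show()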
I want to plot a KDE for some data that covers a large range in x-values, so I want to use a logarithmic scale for the x-axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the x-scale to logarithmic. When I take the logarithm of my x-data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the ticks using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but in my eyes that just looks wrong and is nothing close to the axis labels (including the minor ticks) I would get by simply using
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x-data? Or is it possible to change only the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and adjust the scaling of the labels afterwards?
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x data with the logarithm already applied. I don't like the labels on the x-axis, though.
The left column shows the plots when the original data is used for the KDE and all the other plots, and the scale is changed afterwards using
ax.set_xscale('log')
For some reason the KDE does not look the way it is supposed to. This is not a result of erroneous data, since it looks just fine when the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.
I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and, after the KDE has been performed, transforms the x-values back to the original values. Afterwards I can simply plot the contours and use ax.set_xscale('log').
import numpy as np
import scipy.stats as st

def logx_kde(x, y, xmin, xmax, ymin, ymax):
    x = np.log10(x)
    # Perform the kernel density estimate on (log10(x), y)
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    # Transform the grid back to the original (linear) x-values
    return np.power(10, xx), yy, f
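A minimal usage sketch (the data here is made up just to show the call; the grid limits for x are passed in log10 units, matching the function above):

import matplotlib.pyplot as plt

# hypothetical data spanning several decades in x
x = np.power(10, np.random.randn(500))
y = np.random.randn(500)

xx, yy, f = logx_kde(x, y, np.log10(x).min(), np.log10(x).max(), y.min(), y.max())

fig, ax = plt.subplots()
ax.contourf(xx, yy, f)
ax.set_xscale('log')  # xx has already been transformed back to linear values
plt.show()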
I have run numpy.histogram() on a bunch of subsets of a larger datasets. I want to separate the calculations from the graphical output, so I would prefer not to call matplotlib.pyplot.hist() on the data itself.
In principle, both of these functions take the same inputs: the raw data itself, before binning. The numpy version just returns the nbin+1 bin edges and nbin frequencies, whereas the matplotlib version goes on to make the plot itself.
So is there an easy way to generate the histograms from the numpy.histogram() output itself, without redoing the calculations (and having to save the inputs)?
To be clear, the numpy.histogram() output consists of the nbin frequencies together with the nbin+1 edges of the nbin bins; there is no matplotlib routine which takes those as input.
You can plot the output of numpy.histogram using plt.bar.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
a = np.random.rayleigh(scale=3,size=100)
bins = np.arange(10)
frq, edges = np.histogram(a, bins)
fig, ax = plt.subplots()
ax.bar(edges[:-1], frq, width=np.diff(edges), edgecolor="black", align="edge")
plt.show()
New in matplotlib 3.4.0
It's no longer necessary to manually reconstruct a bar chart, as there's now a built-in method:
Use the new plt.stairs method for the common case where you know the values and edges of the steps, for instance when plotting the output of np.histogram.
Note that stairs are plotted as lines by default, so use fill=True for a solid histogram:
import numpy as np
import matplotlib.pyplot as plt

a = np.random.RandomState(1).rayleigh(3, size=100)
counts, edges = np.histogram(a, bins=range(10))
plt.stairs(counts, edges, fill=True)
If you want a more conventional "bar" aesthetic, combine with plt.vlines:
plt.stairs(counts, edges, fill=True)
plt.vlines(edges, 0, counts.max(), colors='w')
If you don't need the counts and edges, just unpack np.histogram directly into plt.stairs:
plt.stairs(*np.histogram(a), fill=True)
And as usual, there is an ax.stairs counterpart:
fig, ax = plt.subplots()
ax.stairs(*np.histogram(a), fill=True)
I have a use case where I have three data series: x, y, z.
I would like to make a scatter plot using (x, y) as coordinates and z as the color of the scatter points, via the cmap keyword of plt.scatter. However, I would like to highlight some specific point with a different marker type and size than the other points.
A minimum example is like below:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
x, y, z = np.random.randn(3, 10)
plt.scatter(x, y, c=z, cmap=matplotlib.cm.jet)
plt.colorbar()
If I want to use a different marker type for (x[5], y[5], z[5]), how could I do that?
The only way I can think of is to call plt.scatter again for this point, e.g. plt.scatter([x[5]], [y[5]]), and define its color by manually finding the colormap color corresponding to z[5]. However, this is quite tedious. Is there a better way?
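For reference, that "manual" colormap lookup could look roughly like this (a sketch; the Normalize limits are simply taken from z):

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

x, y, z = np.random.randn(3, 10)
norm = matplotlib.colors.Normalize(vmin=z.min(), vmax=z.max())
sc = plt.scatter(x, y, c=z, cmap=matplotlib.cm.jet, norm=norm)
# look up the colormap color for z[5] by hand and re-plot that point with another marker
plt.scatter(x[5], y[5], color=matplotlib.cm.jet(norm(z[5])), marker='s', s=121)
plt.colorbar(sc)
plt.show()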
Each scatter plot has one single marker; you cannot, by default, use different markers within a single scatter plot. Hence, if you are happy to only change the marker size and keep the marker itself the same, you can supply an array of different sizes to scatter's s argument.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(10)
x,y,z = np.random.randn(3,10)
sizes = [36]*len(x)
sizes[5] = 121
plt.scatter(x,y,c=z,s=sizes, cmap=plt.cm.jet)
plt.colorbar()
plt.show()
If you really need a different marker style, you can plot a second scatter plot. You can then set the color limits of the second scatter to the ones from the first.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(10)
x,y,z = np.random.randn(3,10)
xs, ys, zs = [x[5]], [y[5]], [z[5]]
print(xs, ys, zs)
y[5] = np.nan  # hide the highlighted point in the first scatter
sc = plt.scatter(x,y,c=z,s=36, cmap=plt.cm.jet)
climx, climy = sc.get_clim()
plt.scatter(xs,ys,c=zs,s=121, marker="s", cmap=plt.cm.jet, vmin=climx, vmax=climy )
plt.colorbar()
plt.show()
Finally, a somewhat more involved solution for using several different markers in the same scatter plot is given in this answer; a sketch of the general idea follows below.
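One way that idea might look (a sketch of grouping points by marker and sharing a single norm; this is my own illustration, not necessarily the linked answer's code):

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)
x, y, z = np.random.randn(3, 10)
markers = np.array(['o'] * len(x))
markers[5] = 's'  # hypothetical per-point marker choice
norm = plt.Normalize(z.min(), z.max())  # one norm shared by all groups

fig, ax = plt.subplots()
for m in np.unique(markers):
    sel = markers == m
    sc = ax.scatter(x[sel], y[sel], c=z[sel], cmap=plt.cm.jet, norm=norm,
                    marker=m, s=121 if m == 's' else 36)
fig.colorbar(sc)
plt.show()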
I've calculated some values representing a potential as a function of x, y using the relaxation method. I want to display a contour plot with colors (not lines), but the examples at matplotlib are all fancy 3D plots. I have a ufinal object, which is a 2-dimensional numpy array. I did see some nice answers with very nice plots here on SO, but I wasn't able to use them properly with my data. I was able to produce a 3D plot using the examples, but that's not what I need:
fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y = meshgrid(x, y)
surf = ax.plot_surface(X, Y, ufinal, rstride=1, cstride=1, cmap=cm.jet, linewidth=0.1)
fig.colorbar(surf, shrink=0.5, aspect=5)
As suggested I've tried using the contourf example like so:
CS = plt.contourf(X, Y, ufinal, cmap=cm.jet)
plt.clabel(CS, inline=1, fontsize=10)
plt.title('Simplest default with labels')
As David said, use contourf:
import numpy as np
import pylab as pl

# sample data: a smooth 2D bump on a [0, 1) x [0, 1) grid
x, y = np.mgrid[:1:1E-3, :1:1E-3]
xs = (x - 0.3)**2.
ys = (y - 0.5)**2.
z = np.exp(-1*(xs/0.5 + ys/0.3))

pl.contourf(x, y, z, 20)
pl.show()
In case anyone is still interested, I found a solution for the granularity (a.k.a. the nice-looking-ness problem) as part of the answer over here:
Symmetrical Log color scale in matplotlib contourf plot
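If the issue is only that the filled contours look too coarse, one common fix (a sketch, assuming that is the granularity problem being referred to) is simply to request many more contour levels, reusing x, y, z and pl from the example above:

# more levels give a much smoother-looking fill
levels = np.linspace(z.min(), z.max(), 100)
pl.contourf(x, y, z, levels=levels)
pl.colorbar()
pl.show()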