Plotting 4D information - python

Suppose I have a function that takes two real numbers x, y as input and returns two real numbers w, z, i.e., myfunc(x,y)=w,z, so a list of x,y points gives a corresponding list of w,z points. I want to visualize this on a plot. One way I know is to treat w,z as a point in 2D space, convert it to polar coordinates (angle theta and magnitude r), and use a scatter plot in which the hue encodes theta and the luminance encodes r. The following is pseudo-code in Python:
w,z = myfunc(x,y)
theta, r = cartesian2polar(w,z)
cmap = matplotlib.cm.hsv
my_cmap = convert cmap so that theta corresponds to a hue and r is the luminance
plt.scatter(x,y,c=my_cmap)
The problem is that the scatter plot is relatively slow when I have many data points. Is there any other way to implement this that is much faster? Maybe by using imshow, since my x,y points actually come from meshgrid.
EDIT:
I found this post, which does exactly what I need.

The bottleneck is computing the cmap.
Could you generate the cmap once and for all? Perhaps you could lower the resolution of the cmap and, instead of a continuous cmap, use a discrete one.
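Since the points come from meshgrid, one option is to skip the colormap entirely and build an RGB image that imshow can draw in a single call. The following is a minimal sketch, not the method from the linked post: `myfunc` here is a made-up stand-in, `field_to_rgb` is a hypothetical helper, and mapping r to the HSV "value" channel (brightness) is an assumption about what "luminance" should mean.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import hsv_to_rgb


def myfunc(x, y):
    # Hypothetical stand-in for the asker's function (real and imaginary part of (x+iy)^2)
    return x**2 - y**2, 2 * x * y


def field_to_rgb(w, z):
    """Map each (w, z) pair to an RGB color: hue from the angle, brightness from the radius."""
    theta = np.arctan2(z, w)                        # angle in [-pi, pi]
    r = np.hypot(w, z)                              # magnitude
    hue = (theta % (2 * np.pi)) / (2 * np.pi)       # rescaled to [0, 1]
    value = r / r.max() if r.max() > 0 else r       # rescaled to [0, 1]
    hsv = np.stack([hue, np.ones_like(hue), value], axis=-1)
    return hsv_to_rgb(hsv)


x, y = np.meshgrid(np.linspace(-1, 1, 400), np.linspace(-1, 1, 300))
rgb = field_to_rgb(*myfunc(x, y))                   # shape (300, 400, 3)

fig, ax = plt.subplots()
# origin="lower" puts the first row of the grid at the bottom, matching meshgrid
ax.imshow(rgb, origin="lower", aspect="auto",
          extent=[x.min(), x.max(), y.min(), y.max()])
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call plt.show() or fig.savefig(...) to see the result. The color conversion is vectorized over the whole grid, so the cost is one array operation plus one image draw rather than one marker per point.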

Related

Contour Plot of Binary Data (0 or 1)

I have x values, y values, and z values. The z values are either 0 or 1, essentially indicating whether an (x,y) pair is a threat (1) or not a threat (0).
I have been trying to plot a 2D contour plot using the matplotlib contourf. This seems to have been interpolating between my z values, which I don't want. So, I did a bit of searching and found that I could use pcolormesh to better plot binary data. However, I am still having some issues.
First, the colorbar of my pcolormesh plot doesn't show two distinct colors (white or red). Instead, it shows a full spectrum from white to red. See the attached plot for what I mean. How do I change this so that the colorbar only shows two colors, for 0 and 1? Second, is there a way to draw a grid of squares into the contour plot so that it is clearer for which x and y intervals the 0s and 1s occur? Third, my code calls for minorticks. However, these do not show up in the plot. Why?
The code I use is shown here. The vels and ms for x and y can really be anything, and threat_bin is just the corresponding 0 or 1 values for all the (vels,ms) pairs:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
fig=plt.figure(figsize=(6,5))
ax2=fig.add_subplot(111)
XX,YY=np.meshgrid(vels, ms)
cp=ax2.pcolormesh(XX/1000.0,YY,threat_bin, cmap=cm.Reds)
ax2.minorticks_on()
ax2.set_ylabel('Initial Meteoroid Mass (kg)')
ax2.set_xlabel('Initial Meteoroid Velocity (km/s)')
ax2.set_yscale('log')
fig.colorbar(cp, ticks=[0,1], label='Threat Binary')
plt.show()
Please be simple with your recommendations, and let me know the code I should include or change with respect to what I have at the moment.
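One possible approach to the first two points is a two-color ListedColormap with a BoundaryNorm, plus edgecolors on the mesh to outline each cell. The following is only a sketch under stated assumptions: the data is made up, and the arrays `vel_edges` and `mass_edges` are hypothetical names that hold cell edges rather than cell centers. It does not address the minor-tick question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm, ListedColormap

# Made-up stand-ins for the asker's data. These arrays are cell EDGES,
# so the value grid has one fewer entry along each axis.
vel_edges = np.linspace(10e3, 70e3, 7)           # m/s, 6 cells
mass_edges = np.logspace(0, 6, 7)                # kg, 6 cells
rng = np.random.default_rng(0)
threat_bin = rng.integers(0, 2, size=(6, 6))     # rows follow mass, columns follow velocity

# Exactly two colors; the boundaries put 0 in the first bin and 1 in the second
cmap = ListedColormap(["white", "red"])
norm = BoundaryNorm([-0.5, 0.5, 1.5], cmap.N)

fig, ax = plt.subplots(figsize=(6, 5))
cp = ax.pcolormesh(vel_edges / 1000.0, mass_edges, threat_bin,
                   cmap=cmap, norm=norm,
                   edgecolors="k", linewidth=0.5)   # outline every cell
ax.set_yscale("log")
ax.set_xlabel("Initial Meteoroid Velocity (km/s)")
ax.set_ylabel("Initial Meteoroid Mass (kg)")
fig.colorbar(cp, ticks=[0, 1], label="Threat Binary")
```

With the norm in place the colorbar shows two flat blocks, one per value, instead of a gradient.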

Creating a pseudo color plot with a linear and nonlinear axis and computing values based on the center of grid values

I have the equation: z(x,y)=1+x^(2/3)y^(-3/4)
I would like to calculate values of z for x=[0,100] and y=[10^1,10^4]. I will do this for 100 points in each axis direction. My grid, then, will be 100x100 points. In the x-direction I want the points spaced linearly. In the y-direction I want the points spaced logarithmically.
Were I to need these values I could easily go through the following:
import numpy as np
x=np.linspace(0,100,100)
y=np.logspace(1,4,100)
z=np.zeros( (len(x), len(y)) )
for i in range(len(x)):
    for j in range(len(y)):
        z[i,j]=1+x[i]**(2/3)*y[j]**(-3/4)
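The same grid can also be filled without explicit loops by using NumPy broadcasting. This is a small sketch of that idea; the name `z_vec` is introduced here for illustration only.

```python
import numpy as np

x = np.linspace(0, 100, 100)
y = np.logspace(1, 4, 100)

# x[:, None] has shape (100, 1) and y[None, :] has shape (1, 100),
# so the product broadcasts to (100, 100) with z_vec[i, j] = z(x[i], y[j])
z_vec = 1 + x[:, None] ** (2 / 3) * y[None, :] ** (-3 / 4)
```

As in the loop version, the first index follows x and the second follows y.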
The problem for me comes with visualizing these results. I know that I would need to create a grid of points. I feel my options are to create a meshgrid with the values and then use pcolor.
My issue here is that the values at the center of each block do not coincide with the calculated values. In the x-direction I could fix this by shifting the x-vector by half of dx (the step between successive values). I'm not so sure how I would do this for the y-axis. Furthermore, if I wanted to compute values for each of the y-direction values, including the end points, they would not all show up.
In the final visualization I would like to have the y-axis as a log scale and the x-axis as a linear scale. I would also like the tick marks to fall in the center of the cells, correlating with the correct value. Can someone point me to the correct plotting functions for this? I have to resolve the issue using pcolor or pcolormesh.
Should you require more details, please let me know.
In current matplotlib, you can use pcolormesh with shading='nearest', and it will center the blocks on the values:
import numpy as np
import matplotlib.pyplot as plt
y_plot = np.log10(y)
z[5, 5] = 0 # to make it more evident
# z was filled as z[i, j] with i indexing x, while pcolormesh expects rows to follow y, hence z.T
plt.pcolormesh(x, y_plot, z.T, shading="nearest")
plt.colorbar()
ax = plt.gca()
ax.set_xticks(x)
ax.set_yticks(y_plot)
plt.axvline(x[5])
plt.axhline(y_plot[5])
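If a true log-scaled y-axis is wanted rather than a linear axis of log10(y), one alternative is to compute the cell edges explicitly: arithmetic midpoints in x and geometric midpoints in y. The sketch below assumes evenly spaced centers (in x, and in log10 of y); `centers_to_edges` is a hypothetical helper, not a matplotlib function.

```python
import numpy as np
import matplotlib.pyplot as plt


def centers_to_edges(centers, log=False):
    """Return len(centers) + 1 edges whose midpoints (in linear or log10 space) are the centers."""
    c = np.log10(centers) if log else np.asarray(centers, dtype=float)
    mid = 0.5 * (c[:-1] + c[1:])                 # interior edges between neighbours
    first = c[0] - (mid[0] - c[0])               # mirror the first half-step outward
    last = c[-1] + (c[-1] - mid[-1])             # mirror the last half-step outward
    edges = np.concatenate(([first], mid, [last]))
    return 10.0 ** edges if log else edges


x = np.linspace(0, 100, 100)
y = np.logspace(1, 4, 100)
z = 1 + x[:, None] ** (2 / 3) * y[None, :] ** (-3 / 4)   # z[i, j] = z(x[i], y[j])

x_edges = centers_to_edges(x)
y_edges = centers_to_edges(y, log=True)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(x_edges, y_edges, z.T)      # rows of z.T follow y
ax.set_yscale("log")
fig.colorbar(mesh)
```

Because the edges are one element longer than the centers, every computed value, including the end points, gets its own cell.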

Python - Interpolating a gap in scattered data points

I'm trying to interpolate a gap I have between data points. The data I have is 2 arrays of time and acceleration. The acceleration array consists of values that can be considered periodic. The original data points with the gap look like this:
[Image: data points with gap]
I am trying to do the interpolation by using the scipy.interpolate.interp1d as illustrated below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
interpolation_func = interpolate.interp1d(time, acceleration,
                                          kind='slinear')
new_time = np.arange(np.min(time), np.max(time), 0.1)
new_acc = interpolation_func(new_time)
plt.figure(2, figsize=(14, 8))
plt.title('Interpolated uncalibrated acceleration data')
plt.scatter(new_time, new_acc, c=new_time[:], s=1, vmin=np.min(new_time),
            vmax=np.max(new_time))
plt.colorbar()
plt.xlabel('Time [s]')
plt.ylabel('Acceleration')
plot_fig2 = (output_folder + "kinematic_plot2.png")
plt.savefig(plot_fig2)
However, the result I'm getting is not accurate, because I get a straight line connecting the last point of the first group of scattered points, on the left side of the gap, to the first point of the second group, on the right side of the gap. The wrong result looks like this:
[Image: wrong result]
I have tried other options from the scipy.interpolate.interp1d function, other than the kind slinear, but all of them would flatten the scattered points on both sides of the gap and fill in the gap with a polynomial graph, which is not what I need. Are there any options in python to interpolate the gap I have between the scattered points?
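Because the data is described as periodic, one possible direction is to fit a periodic model to the points on both sides of the gap and evaluate that model inside the gap, instead of interpolating between neighbouring samples. The sketch below makes strong assumptions that may not hold for the real data: a single sinusoid with a known frequency `freq`, and synthetic, noise-free samples. `fit_sinusoid` and `eval_sinusoid` are hypothetical helpers.

```python
import numpy as np


def fit_sinusoid(t, a, freq):
    """Least-squares fit of a ~ c0 + c1*sin(2*pi*freq*t) + c2*cos(2*pi*freq*t)."""
    phase = 2 * np.pi * freq * t
    design = np.column_stack([np.ones_like(t), np.sin(phase), np.cos(phase)])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return coef


def eval_sinusoid(t, coef, freq):
    """Evaluate the fitted model at the times t."""
    phase = 2 * np.pi * freq * t
    return coef[0] + coef[1] * np.sin(phase) + coef[2] * np.cos(phase)


# Synthetic data with a gap between t = 4 and t = 6 (assumed, not the asker's data)
freq = 0.5
time = np.concatenate([np.arange(0.0, 4.0, 0.05), np.arange(6.0, 10.0, 0.05)])
acceleration = 2.0 * np.sin(2 * np.pi * freq * time + 0.3)

coef = fit_sinusoid(time, acceleration, freq)
gap_time = np.arange(4.0, 6.0, 0.1)
gap_acc = eval_sinusoid(gap_time, coef, freq)    # model values inside the gap
```

If the frequency is not known in advance, it would have to be estimated first, and a signal with several harmonics would need more columns in the design matrix.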

Plot 2 histograms with different length of data points in one graph using matplotlib

I have two sets of data, one containing around 11 million data points and the other around 5000. I would like to plot them both on one histogram. Because of the difference in size, I need to normalise the frequency so that I can plot them on the same figure. Below I have simulated what I did with my data to be able to plot them. I have used normed=True.
from numpy.random import randn
import matplotlib.pyplot as plt
import random
datalist1=[]
for x in range(1,50000):
    datalist1.append(random.uniform(1,2))
datalist2=randn(5000000)
fig= plt.figure(1)
# density=True is the current name of the old normed=True argument
plt.hist(datalist1,bins=20,color='b',alpha=0.3,label='theoretical',histtype='stepfilled', density=True)
plt.hist(datalist2,bins=20,alpha=0.5,color='g',label='experimental',histtype='stepfilled',density=True)
plt.xlabel("Value")
plt.ylabel("Normalised Frequency")
plt.legend()
plt.show()
Can you please tell me if this is a good way to get around this issue? I would like the tallest bar of each histogram to have a height of 1 (or 100%).
The normed=True setting (renamed density=True in later Matplotlib releases, which no longer accept normed) normalizes the histogram to an area of 1. That gives the histogram an interpretation as an estimate of a probability density function.
In short, it actually makes sense not to normalize on the peak but on the area.
But if you really want to normalize by height you can modify the polygon data of the histogram:
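# density=True is the current name of the old normed=True argument
h = plt.hist(datalist1,bins=20,color='b',alpha=0.3,label='theoretical',histtype='stepfilled', density=True)
p = h[2][0]  # the filled polygon drawn for this histogram
p.xy[:,1] /= p.xy[:, 1].max()  # rescale its heights so the peak is 1
h = plt.hist(datalist2,bins=20,alpha=0.5,color='g',label='experimental',histtype='stepfilled',density=True)
p = h[2][0]
p.xy[:,1] /= p.xy[:, 1].max()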
This solution feels a bit hackish, but at least it's quick and dirty :)
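An alternative that avoids editing the polygon afterwards is to compute the counts first, divide by the largest count, and draw the result. The sketch below assumes Matplotlib 3.4 or newer for Axes.stairs, and uses made-up data of smaller size; `peak_normalized_hist` is a hypothetical helper.

```python
import numpy as np
import matplotlib.pyplot as plt


def peak_normalized_hist(data, bins=20):
    """Histogram whose tallest bin has height 1."""
    counts, edges = np.histogram(data, bins=bins)
    return counts / counts.max(), edges


rng = np.random.default_rng(0)
theoretical = rng.uniform(1, 2, size=50_000)       # made-up sample
experimental = rng.standard_normal(500_000)        # made-up sample

fig, ax = plt.subplots()
for data, color, label in [(theoretical, "b", "theoretical"),
                           (experimental, "g", "experimental")]:
    heights, edges = peak_normalized_hist(data)
    ax.stairs(heights, edges, fill=True, color=color, alpha=0.4, label=label)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency relative to peak")
ax.legend()
```

Note that peak-normalized heights no longer integrate to 1, so they cannot be read as probability densities.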

Scatterplot Contours In Matplotlib

I have a massive scatterplot (~100,000 points) that I'm generating in matplotlib. Each point has a location in this x/y space, and I'd like to generate contours containing certain percentiles of the total number of points.
Is there a function in matplotlib which will do this? I've looked into contour(), but I'd have to write my own function to work in this way.
Thanks!
Basically, you want a density estimate of some sort. There are multiple ways to do this:
1. Use a 2D histogram of some sort (e.g. matplotlib.pyplot.hist2d or matplotlib.pyplot.hexbin). (You could also display the results as contours: just use numpy.histogram2d and then contour the resulting array.)
2. Make a kernel-density estimate (KDE) and contour the results. A KDE is essentially a smoothed histogram. Instead of a point falling into a particular bin, it adds a weight to surrounding bins (usually in the shape of a gaussian "bell curve").
Using a 2D histogram is simple and easy to understand, but fundamentally gives "blocky" results.
There are some wrinkles to doing the second one "correctly" (i.e. there's no one correct way). I won't go into the details here, but if you want to interpret the results statistically, you need to read up on it (particularly the bandwidth selection).
At any rate, here's an example of the differences. I'm going to plot each one similarly, so I won't use contours, but you could just as easily plot the 2D histogram or gaussian KDE using a contour plot:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
np.random.seed(1977)
# Generate 200 correlated x,y points
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)
x, y = data.T
nbins = 20
fig, axes = plt.subplots(ncols=2, nrows=2, sharex=True, sharey=True)
axes[0, 0].set_title('Scatterplot')
axes[0, 0].plot(x, y, 'ko')
axes[0, 1].set_title('Hexbin plot')
axes[0, 1].hexbin(x, y, gridsize=nbins)
axes[1, 0].set_title('2D Histogram')
axes[1, 0].hist2d(x, y, bins=nbins)
# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents
k = gaussian_kde(data.T)
xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
axes[1, 1].set_title('Gaussian KDE')
axes[1, 1].pcolormesh(xi, yi, zi.reshape(xi.shape))
fig.tight_layout()
plt.show()
One caveat: With very large numbers of points, scipy.stats.gaussian_kde will become very slow. It's fairly easy to speed it up by making an approximation: just take the 2D histogram and blur it with a gaussian filter of the right radius and covariance. I can give an example if you'd like.
One other caveat: If you're doing this in a non-cartesian coordinate system, none of these methods apply! Getting density estimates on a spherical shell is a bit more complicated.
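The histogram-and-blur approximation mentioned in the first caveat could look roughly like the following. This is a sketch, not the answerer's own example: `fast_density` is a hypothetical helper, and both the bin count and the filter width `sigma` (in units of bins) are arbitrary choices that play the role of the KDE bandwidth. An isotropic filter is used here, which ignores the data covariance.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def fast_density(x, y, bins=100, sigma=2.0):
    """Approximate a KDE by blurring a 2D histogram; sigma is measured in bins."""
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
    return gaussian_filter(hist, sigma=sigma), xedges, yedges


rng = np.random.default_rng(1977)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 100_000)
density, xedges, yedges = fast_density(data[:, 0], data[:, 1])
```

The result can be drawn with plt.pcolormesh(xedges, yedges, density.T) or passed to a contour call. The transpose is needed because numpy.histogram2d puts x along the first axis.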
I have the same question.
If you want to plot contours that each contain a given fraction of the points, you can use the following algorithm:
Create a 2D histogram:
h2, xedges, yedges = np.histogram2d(X, Y, bins=[30, 30])
h2 is now a 2D array whose entries are the number of points in each rectangle. Sort the counts in descending order and accumulate them:
hravel = np.sort(np.ravel(h2))[::-1]  # all bin counts, largest first
hcumsum = np.cumsum(hravel)
Now the ugly hack: for every bin, store the fraction of all points that lie in bins whose count is equal to or greater than that bin's count. The result goes into a separate array so that values already written are not matched again by a later count:
hunique = np.unique(hravel)
hsum = np.sum(h2)
frac = np.empty_like(h2)
for h in hunique:
    frac[h2 == h] = hcumsum[np.flatnonzero(hravel == h)[-1]] / hsum
Now plot contours of frac; the contour at level p roughly encloses the bins that together hold a fraction p of all points.
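The steps above can be sketched as a self-contained function followed by a contour call. The data here is made up, `enclosed_fraction` is a hypothetical helper name, and the levels 0.5 and 0.9 are arbitrary examples.

```python
import numpy as np
import matplotlib.pyplot as plt


def enclosed_fraction(x, y, bins=30):
    """For each bin, the fraction of all points lying in bins at least as full as it."""
    h, xedges, yedges = np.histogram2d(x, y, bins=bins)
    flat = h.ravel()
    # Unique counts in ascending order, and each bin's index into them
    vals, inverse = np.unique(flat, return_inverse=True)
    # Total number of points held by the bins sharing each unique count
    points_per_val = np.bincount(inverse, weights=flat)
    # Accumulate from the fullest bins downward
    cum_from_top = np.cumsum(points_per_val[::-1])[::-1]
    frac = (cum_from_top / flat.sum())[inverse].reshape(h.shape)
    return frac, xedges, yedges


rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 100_000).T

frac, xedges, yedges = enclosed_fraction(x, y)
xc = 0.5 * (xedges[:-1] + xedges[1:])            # bin centers
yc = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
cs = ax.contour(xc, yc, frac.T, levels=[0.5, 0.9])   # transpose: rows must follow y
ax.clabel(cs, fmt="%.1f")
```

The enclosed fraction is only approximate because it is resolved at the level of whole bins; finer bins give a closer match at the cost of noisier contours.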
