Connect 2 points from separate graphs in python (matplotlib) - python

I am trying to plot a graph like the following and want to connect the points with lines. However, as you can see, some of the points (above 0.04 on the X axis) partially overlap, which makes it impossible to show the connections between them.
What I want to do is make 2 separate graphs: one containing all the points above 0.04 (zoomed in, so that the points are separated) and another with just the one point in the top left corner.
Note that the size of the points also carries meaning, so I cannot make the points smaller or larger (unless the change is uniform across all the points).
What is a good way to do this? Does matplotlib provide a function for this? Or is there another Python library apart from matplotlib where I can do this in a better way?

Edit: Based on this post, a better solution than my previous one might be:
import matplotlib.pylab as pl
import matplotlib
import numpy as np
pl.close('all')
x = np.linspace(0.019, 0.021, 4)
y = np.linspace(0.09, 0.10, 4)
s = np.random.randint(10, 200, 4)
fig = pl.figure()
ax1=pl.subplot(121)
pl.scatter(x, y, s=s)
pl.xlim(0.01, 0.04)
pl.ylim(0.04, 0.12)
pl.xticks([0.01,0.02,0.03,0.04])
pl.yticks([0.04,0.06,0.08,0.10,0.12])
ax2=pl.subplot(122)
pl.scatter(x, y, s=s)
pl.xlim(0.018, 0.022)
pl.ylim(0.08, 0.11)
pl.xticks([0.018,0.020,0.022])
pl.yticks([0.08,0.095,0.11])
# Map each point: data coordinates -> display pixels -> figure coordinates
transFigure = fig.transFigure.inverted()
for i in range(x.size):
    xy1 = transFigure.transform(ax1.transData.transform([x[i],y[i]]))
    xy2 = transFigure.transform(ax2.transData.transform([x[i],y[i]]))
    line = matplotlib.lines.Line2D((xy1[0],xy2[0]),(xy1[1],xy2[1]),
                                   transform=fig.transFigure)
    fig.lines.append(line)
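As an alternative to transforming the coordinates by hand, matplotlib also ships `matplotlib.patches.ConnectionPatch`, which is meant for drawing a line between points in two different axes. The following is only a minimal sketch under assumptions not in the original post: it uses the same dummy data as above, the `Agg` backend so it runs without a display, and `fig.add_artist`, which needs a reasonably recent matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.patches import ConnectionPatch
import numpy as np

# Same dummy data as in the answer above
x = np.linspace(0.019, 0.021, 4)
y = np.linspace(0.09, 0.10, 4)
s = np.random.default_rng(0).integers(10, 200, 4)

fig, (ax1, ax2) = plt.subplots(1, 2)

# Left panel: overview; right panel: zoomed in on the same points
ax1.scatter(x, y, s=s)
ax1.set_xlim(0.01, 0.04)
ax1.set_ylim(0.04, 0.12)
ax2.scatter(x, y, s=s)
ax2.set_xlim(0.018, 0.022)
ax2.set_ylim(0.08, 0.11)

# One connector per point, from its position in ax1 to its position in ax2
connectors = []
for xi, yi in zip(x, y):
    con = ConnectionPatch(xyA=(xi, yi), coordsA="data", axesA=ax1,
                          xyB=(xi, yi), coordsB="data", axesB=ax2,
                          color="gray")
    fig.add_artist(con)
    connectors.append(con)

fig.canvas.draw()  # render once; use fig.savefig(...) or plt.show() as needed
```

Because each connector stores data coordinates rather than precomputed figure coordinates, it should follow the points if the axes limits change later.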
The other (old) solution:
Interesting question. I came up with the "solution" below (although it ain't pretty...); it uses ax.transData.transform to convert the data coordinates to display (pixel) coordinates, and ax.annotate to draw the arrows. Unfortunately, this solution only works if you keep the figure dpi (dots per inch) equal to the figure ppi (points per inch; 72 in the code below).
If I can think of a better solution, I'll post it here.
import matplotlib.pylab as pl
import numpy as np
x = np.linspace(0.019, 0.021, 4)
y = np.linspace(0.09, 0.10, 4)
s = np.random.randint(10, 200, 4)
# Transform the data coordinates to figure (pixel) coordinates
def get_display_coordinates(x,y):
    ax = pl.gca()
    xd = np.zeros_like(x)
    yd = np.zeros_like(y)
    for i in range(x.size):
        xd[i], yd[i] = ax.transData.transform([x[i], y[i]])
    return xd, yd
pl.figure(dpi=72)
ax=pl.subplot(121)
sc=pl.scatter(x, y, s=s)
pl.xlim(0.01, 0.04)
pl.ylim(0.04, 0.12)
pl.xticks([0.01,0.02,0.03,0.04])
pl.yticks([0.04,0.06,0.08,0.10,0.12])
xd_1, yd_1 = get_display_coordinates(x,y)
ax=pl.subplot(122)
pl.scatter(x, y, s=s)
pl.xlim(0.018, 0.022)
pl.ylim(0.08, 0.11)
pl.xticks([0.018,0.020,0.022])
pl.yticks([0.08,0.095,0.11])
xd_2, yd_2 = get_display_coordinates(x,y)
for i in range(x.size):
    ax.annotate("",
                xy=(xd_2[i], yd_2[i]), xycoords='figure pixels',
                xytext=(xd_1[i], yd_1[i]), textcoords='figure pixels',
                arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
pl.savefig('test.png', dpi=72)

Related

How do I correctly implement contours of histograms with logscale binning in numpy/matplotlib

I am trying to plot contours of data that has been binned using numpy.histogram2d, except the bins are set using numpy.logspace (equal binning in log space).
Unfortunately, this results in a strange behavior that I can't seem to resolve: the placement of the contours does not match the location of the points in x/y. I plot both the 2d histogram of the data, and the contours, and they do not overlap.
It looks like what is actually happening is the contours are being placed on the physical location of the plot in linear space where I expect them to be placed in log space.
It's a strange phenomenon that I think is best described by the following plots, which use identical data binned in different ways:
Here is a minimum working example to produce the logbinned data:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(loc=500, scale=100,size=10000)
y = np.random.normal(loc=600, scale=60, size=10000)
nbins = 50
bins = (np.logspace(np.log10(10),np.log10(1000),nbins),np.logspace(np.log10(10),np.log10(1000),nbins))
HH, xe, ye = np.histogram2d(x,y,bins=bins)
plt.hist2d(x,y,bins=bins,cmin=1);
grid = HH.transpose()
extent = np.array([xe.min(), xe.max(), ye.min(), ye.max()])
cs = plt.contourf(grid,2,extent=extent,extend='max',cmap='plasma',alpha=0.5,zorder=100)
plt.contour(grid,2,extent=extent,colors='k',zorder=100)
plt.yscale('log')
plt.xscale('log')
It's fairly clear what is happening: the contour is getting misplaced due to the scaling of the bins. I'd like to be able to plot the histogram and the contour here together.
If anyone has an idea of how to resolve this, that would be very helpful - thanks!
This is your problem:
cs = plt.contourf(grid,2,extent=extent,...)
You are passing in a single 2d array specifying the values of the histograms, but you aren't passing the x and y coordinates these data correspond to. By only passing in extent there's no way for pyplot to do anything other than assume that the underlying grid is uniform, stretched out to fit extent.
So instead what you have to do is to define x and y components for each value in grid. You have to think a bit how to do this, because you have (n, n)-shaped data and (n+1,)-shaped edges to go with it. We should probably choose the center of each bin to associate a data point with. So we need to find the midpoint of each bin, and pass those arrays to contour[f].
Something like this:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
size = 10000
x = rng.normal(loc=500, scale=100, size=size)
y = rng.normal(loc=600, scale=60, size=size)
nbins = 50
bins = (np.geomspace(10, 1000, nbins),) * 2
HH, xe, ye = np.histogram2d(x, y, bins=bins)
fig, ax = plt.subplots()
ax.hist2d(x, y, bins=bins, cmin=1)
grid = HH.transpose()
# compute bin midpoints
midpoints = (xe[1:] + xe[:-1])/2, (ye[1:] + ye[:-1])/2
cs = ax.contourf(*midpoints, grid, levels=2, extend='max', cmap='plasma', alpha=0.5, zorder=100)
ax.contour(*midpoints, grid, levels=2, colors='k', zorder=100)
# these are a red herring during debugging:
#ax.set_yscale('log')
#ax.set_xscale('log')
(I've cleaned up your code a bit.)
Alternatively, if you want to avoid having those white strips at the top and edge, you can keep your bin edges, and pad your grid with zeros:
grid_padded = np.pad(grid, [(0, 1)])
cs = ax.contourf(xe, ye, grid_padded, levels=2, extend='max', cmap='plasma', alpha=0.5, zorder=100)
ax.contour(xe, ye, grid_padded, levels=2, colors='k', zorder=100)
This gives us something like
This seems prettier, but if you think about your data this is less exact, because your data points are shifted with respect to the bin coordinates they correspond to. If you look closely you can see the contours being shifted with respect to the output of hist2d. You could fix this by generating geomspaces with one more final value which you only use for this final plotting step, and again use the midpoints of these edges (complete with a last auxiliary one).
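The auxiliary-edge idea from the last paragraph could be sketched roughly as follows. This is one reading of that suggestion, not code from the answer: the helper names `extend_edges` and `midpoints` are hypothetical, and the extra edge simply continues the geometric progression of the existing ones.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=500, scale=100, size=10000)
y = rng.normal(loc=600, scale=60, size=10000)

nbins = 50
edges = np.geomspace(10, 1000, nbins)
HH, xe, ye = np.histogram2d(x, y, bins=(edges, edges))

def extend_edges(e):
    """Append one auxiliary edge that continues the geometric spacing (hypothetical helper)."""
    ratio = e[-1] / e[-2]
    return np.append(e, e[-1] * ratio)

def midpoints(e):
    """Arithmetic midpoint of each pair of neighbouring edges (hypothetical helper)."""
    return (e[1:] + e[:-1]) / 2

# One more midpoint than there are bins, so the zero-padded grid still lines up
xm = midpoints(extend_edges(xe))
ym = midpoints(extend_edges(ye))
grid_padded = np.pad(HH.transpose(), [(0, 1)])

# The first nbins-1 midpoints are the true bin centres; only the last one is auxiliary
print(xm.shape, ym.shape, grid_padded.shape)
```

With these arrays, `ax.contourf(xm, ym, grid_padded, levels=2, ...)` should place every histogram value at its own bin midpoint while the padded row and column of zeros close off the contours.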

Using numpy arrays to count the number of points within the cells of a regular grid

I am working with a large number of 3D points, each with x,y,z values stored in numpy arrays.
For background, the points will always fall within a cylinder of fixed radius, and height = max z value of the points.
My objective is to split the bounding cylinder (or column, if that is easier) into strata of e.g. 1 m height, and then count the number of points within each cell of a regular grid (e.g. 1 m x 1 m) overlaid on each stratum.
Conceptually, the operation would be the same as overlaying a raster and counting the points intersecting each pixel.
The grid of cells can form a square or a disk, it doesn't matter.
After a lot of searching and reading, my current thinking is to use some combination of numpy.linspace and numpy.meshgrid to generate the vertices of each cell stored within an array and test each cell against each point to see if it is 'in'. This seems inefficient, especially when working with thousands of points.
The numpy / scipy suite seems well suited to the problem, but I have not found a solution yet. Any suggestions would be much appreciated.
I have included a few example points and some code to visualize the data.
# Setup
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load in X,Y,Z values from a sub-sample of 10 points for testing
# XY Values are scaled to a reasonable point of origin
z_vals = np.array([3.08,4.46,0.27,2.40,0.48,0.21,0.31,3.28,4.09,1.75])
x_vals = np.array([22.88,20.00,20.36,24.11,40.48,29.08,36.02,29.14,32.20,18.96])
y_vals = np.array([31.31,25.04,31.86,41.81,38.23,31.57,42.65,18.09,35.78,31.78])
# This plot is instructive to visualize the problem
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x_vals, y_vals, z_vals, c='b', marker='o')
plt.show()
I am not sure I understand perfectly what you are looking for, but since every "cell" seems to have a 1m side for all directions, couldn't you:
round all your values to integers (rasterize your data) probably with some floor function;
create a bijection from these integer coordinates to something more convenient with something like:
(64**2)*x + (64)*y + z # assuming all values are in [0,63]
You can instead put z at the front if you want to focus on height more easily later
compute the histogram of each "cell" (several functions in numpy/scipy can do it);
revert the bijection if needed (ie. know the "true" coordinates of each cell once the count is known)
Maybe I didn't understand well, but in case it helps...
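The steps above can be sketched as follows, using the sample points from the question. The constant `BASE = 64` follows the answer's assumption that all cell indices lie in [0, 63], and `np.unique` is just one of several ways to count the occurrences of each key.

```python
import numpy as np

z_vals = np.array([3.08, 4.46, 0.27, 2.40, 0.48, 0.21, 0.31, 3.28, 4.09, 1.75])
x_vals = np.array([22.88, 20.00, 20.36, 24.11, 40.48, 29.08, 36.02, 29.14, 32.20, 18.96])
y_vals = np.array([31.31, 25.04, 31.86, 41.81, 38.23, 31.57, 42.65, 18.09, 35.78, 31.78])

# Step 1: rasterize, i.e. floor every coordinate to its 1 m cell index
cells = np.floor(np.column_stack([x_vals, y_vals, z_vals])).astype(int)

# Step 2: bijection from (ix, iy, iz) to a single integer key
BASE = 64  # assumption from the answer: every index is in [0, 63]
assert cells.min() >= 0 and cells.max() < BASE
keys = cells[:, 0] * BASE**2 + cells[:, 1] * BASE + cells[:, 2]

# Step 3: count how many points share each key
unique_keys, counts = np.unique(keys, return_counts=True)

# Step 4: revert the bijection to recover the cell indices of each count
ix, rest = np.divmod(unique_keys, BASE**2)
iy, iz = np.divmod(rest, BASE)

for cx, cy, cz, n in zip(ix, iy, iz, counts):
    print(f"cell ({cx}, {cy}, {cz}): {n} point(s)")
```

Only occupied cells appear in the result, which keeps memory use proportional to the number of points rather than to the size of the grid.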
Thanks @Baruchel. It turns out the n-dimensional histograms suggested by @DilithiumMatrix provide a fairly simple solution to the problem I posted. After some reading, here is my current solution for anyone else who faces a similar problem.
As this is my first Python/Numpy effort any improvements/suggestions, especially regarding performance, would be welcome. Thanks.
# Setup
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load in X,Y,Z values from a sub-sample of 10 points for testing
# XY Values are scaled to a reasonable point of origin
z_vals = np.array([3.08,4.46,0.27,2.40,0.48,0.21,0.31,3.28,4.09,1.75])
x_vals = np.array([22.88,20.00,20.36,24.11,40.48,29.08,36.02,29.14,32.20,18.96])
y_vals = np.array([31.31,25.04,31.86,41.81,38.23,31.57,42.65,18.09,35.78,31.78])
# Updated code below
# Variables needed for 2D,3D histograms
xmax, ymax, zmax = int(x_vals.max())+1, int(y_vals.max())+1, int(z_vals.max())+1
xmin, ymin, zmin = int(x_vals.min()), int(y_vals.min()), int(z_vals.min())
xrange, yrange, zrange = xmax-xmin, ymax-ymin, zmax-zmin
xedges = np.linspace(xmin, xmax, (xrange + 1), dtype=int)
yedges = np.linspace(ymin, ymax, (yrange + 1), dtype=int)
zedges = np.linspace(zmin, zmax, (zrange + 1), dtype=int)
# Make the 2D histogram
h2d, xedges, yedges = np.histogram2d(x_vals, y_vals, bins=(xedges, yedges))
assert h2d.sum() == len(x_vals), "Unclassified points in the array"
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
plt.imshow(h2d.transpose(), extent=extent, interpolation='none', origin='lower')
# histogram2d puts x along axis 0, while imshow draws axis 0 as rows (y) starting from the top,
# hence the transpose and origin='lower'
# Plot settings, not sure yet on matplotlib update/override objects
plt.grid(True, which='both')
plt.xticks(xedges)
plt.yticks(yedges)
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')
plt.plot(x_vals, y_vals, 'ro')
plt.show()
# 3-dimensional histogram with 1 x 1 x 1 m bins. Produces point counts in each 1m3 cell.
xyzstack = np.stack([x_vals,y_vals,z_vals], axis=1)
h3d, Hedges = np.histogramdd(xyzstack, bins=(xedges, yedges, zedges))
assert h3d.sum() == len(x_vals), "Unclassified points in the array"
h3d.shape # Shape of the array should be same as the edge dimensions
testzbin = np.sum(np.logical_and(z_vals >= 1, z_vals < 2)) # Slice to test with
np.sum(h3d[:,:,1]) == testzbin # Test num points in second bins
np.sum(h3d, axis=2) # Sum of all vertical points above each x,y 'pixel'
# only in this example the h2d and np.sum(h3d,axis=2) arrays will match as no z bins have >1 points
# Remaining issue - how to get a r x c count of empty z bins.
# i.e. for each 'pixel' how many z bins contained no points?
# Possible solution is to reshape to use logical operators
count2d = h3d.reshape(xrange * yrange, zrange) # Maintain dimensions per num 3D cells defined
zerobins = (count2d == 0).sum(1)
zerobins.shape
# Get back to x,y grid with counts - ready for output as image with counts=pixel digital number
bincount_pixels = zerobins.reshape(xrange,yrange)
# Appears to work, perhaps there is a way without reshaping?
PS if you are facing a similar problem scikit patch extraction looks like another possible solution.
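On the closing question in the code comments (counting empty z bins per pixel without reshaping), one possible approach is to reduce the boolean array along the z axis directly. This is a sketch, not part of the original answer; it rebuilds the same 1 m edges with `np.arange` and compares the result against the reshape-based version.

```python
import numpy as np

z_vals = np.array([3.08, 4.46, 0.27, 2.40, 0.48, 0.21, 0.31, 3.28, 4.09, 1.75])
x_vals = np.array([22.88, 20.00, 20.36, 24.11, 40.48, 29.08, 36.02, 29.14, 32.20, 18.96])
y_vals = np.array([31.31, 25.04, 31.86, 41.81, 38.23, 31.57, 42.65, 18.09, 35.78, 31.78])

def unit_edges(v):
    """Integer bin edges of width 1 covering all values of v."""
    return np.arange(int(v.min()), int(v.max()) + 2)

xedges, yedges, zedges = unit_edges(x_vals), unit_edges(y_vals), unit_edges(z_vals)

# 3D histogram with 1 x 1 x 1 m cells
points = np.column_stack([x_vals, y_vals, z_vals])
h3d, _ = np.histogramdd(points, bins=(xedges, yedges, zedges))

# For each x,y pixel, count the z bins that contain no points (no reshape needed)
empty_per_pixel = (h3d == 0).sum(axis=2)

# Reshape-based version from the answer, for comparison
nx, ny, nz = h3d.shape
via_reshape = (h3d.reshape(nx * ny, nz) == 0).sum(1).reshape(nx, ny)

print(empty_per_pixel.shape, np.array_equal(empty_per_pixel, via_reshape))
```

Summing along `axis=2` keeps the x,y layout intact, so the result can be passed to `imshow` in the same way as `h2d`.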

Plot staggered histograms/lines as in FACS

My question is basically exactly the same as this one, but for matplotlib. I'm sure it has something to do with axes or subplots, but I don't think I fully understand those paradigms (a fuller explanation would be great).
As I loop through a set of comparisons, I'd like the base y value of each new plot to be set slightly below the previous one to get something like this:
One other (potential) wrinkle is that I'm generating these plots in a loop, so I don't necessarily know how many plots there will be at the outset. I think this is one of the things that I'm getting hung up on with subplots/axes, because it seems like you need to set them ahead of time.
Any ideas would be greatly appreciated.
EDIT: I made a little progress I think:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.random.random(100)
y = np.random.random(100)
fig = plt.figure()
ax = fig.add_axes([1,1,1,1])
ax2 = fig.add_axes([1.02,.9,1,1])
ax.plot(x, color='red')
ax.fill_between([i for i in range(len(x))], 0, x, color='red', alpha=0.5)
ax2.plot(y, color='green')
ax2.fill_between([i for i in range(len(y))], 0, y, color='green', alpha=0.5)
Gives me:
Which is close to what I want...
Is this the sort of thing you want?
What I did was define the y-distance between the baselines of each curve. For the ith curve, I calculated the minimum Y-value, then set that minimum to be i times the y-distance, adjusting the height of the entire curve accordingly. I used a decreasing z-order to ensure that the filled part of the curves were not obscured by the baselines.
Here's the code:
import numpy as np
import matplotlib.pyplot as plt
delta_Y = .5
zorder = 0
for i, Y in enumerate(data):
    baseline = min(Y)
    #change needed for minimum of Y to be delta_Y above previous curve
    y_change = delta_Y * i - baseline
    Y = Y + y_change
    plt.fill_between(np.linspace(0, 1000, 1000), Y, np.ones(1000) * delta_Y * i, zorder = zorder)
    zorder -= 1
Code that generates dummy data:
def gauss(X):
    return np.exp(-X**2 / 2.0)
#create data
X = np.linspace(-10, 10, 100)
data = []
for i in range(10):
    arr = np.zeros(1000)
    arr[i * 100: i * 100 + 100] = gauss(X)
    data.append(arr)
data.reverse()
You could also look into installing JoyPy through:
pip install joypy
Pretty dynamic tool created by Leonardo Taccari, if what you are looking into is "stacked" distribution plots like so:
Example 1 - Joy Plot using JoyPy:
Example 2 - Joy Plot on Iris dataset:
Leonardo also has a neat description of the package and how to use it here.
Alternatively Seaborn has a package but I found it less easy to use.
Hope that helps!
So I managed to get a little bit farther by adding an additional Axes instance in each loop.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#instantiate data sets
x = np.random.random(100)
y = np.random.random(100)
z = np.random.random(100)
plots = [x, y, z]
fig = plt.figure()
#Sets the default vertical position
pos = 1
def making_plot(ax, p):
    ax.plot(p)
    # Prevents the background from covering over the earlier plots
    # (set_facecolor replaces set_axis_bgcolor, which newer matplotlib no longer provides)
    ax.set_facecolor('none')
for p in plots:
    ax = fig.add_axes([1,pos,1,1])
    pos -= 0.3
    making_plot(ax, p)
plt.show()
Clearly, I could spend more time making this prettier, but this does the job.

xticks and labels stuck on one side of the subplot

I am plotting 3 maps on one figure. For some reason, when I go to label the x axis, the numbers are all crammed on one side of the plot. Is there any way to space the values out?
for j in range(0,3):
    data = mydatalist[j]
    a.append(fig.add_subplot(3,2,j+1))
    m.append(Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
                     llcrnrlon=30, urcrnrlon=390, resolution='c', ax=a[j]))
    x = np.linspace(30,390,288)
    y = np.linspace(-90, 90, 234)
    x, y = np.meshgrid(x, y)
    x, y = m[j](x,y)
    cintervals = [-0.1,-0.09, -0.08, -0.07, -0.06,-0.05, -0.04, -0.03, -0.02,-0.01,
                  0, 0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1]
    mesh = m[j].contourf(x,y,data,cintervals, cmap=plt.cm.jet)
    xlab=np.concatenate([np.arange(30,181,30),np.arange(-150,31,30)])
    plt.xticks(np.linspace(30, 390, 13),xlab)
    plt.tick_params(labelsize=8)
plt.show()
Your problem is a mismatch between map (projection) coordinates and lat/long values.
You assign your x ticks to be displayed along the x axis, spaced according to
np.linspace(30, 390, 13)
However - if you look at your values in x (i.e. the actual x co-ordinates that you are plotting against in the contourf line), you see they run from 0 to 40030154.74248523.
To avoid this - replace
plt.xticks(np.linspace(30, 390, 13),xlab)
with
plt.xticks(np.linspace(min(x[0]),max(x[0]), len(xlab)),xlab)
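To see why the hard-coded ticks bunch up, here is a small NumPy-only sketch. It does not use Basemap; instead it assumes the projected x coordinates run from 0 to the maximum value quoted above, and compares the hard-coded tick positions with positions spread over that extent.

```python
import numpy as np

# Tick labels from the question: 30..180 followed by -150..30 (degrees of longitude)
xlab = np.concatenate([np.arange(30, 181, 30), np.arange(-150, 31, 30)])

# Assumed extent of the projected x coordinates, using the value quoted above
x_min, x_max = 0.0, 40030154.74248523

# Hard-coded ticks: all 13 positions fall in a tiny fraction of the axis width
bad_ticks = np.linspace(30, 390, len(xlab))
fraction_used = (bad_ticks[-1] - x_min) / (x_max - x_min)

# Ticks spread evenly over the projected extent instead
good_ticks = np.linspace(x_min, x_max, len(xlab))

print(f"hard-coded ticks cover {fraction_used:.2e} of the axis")
print(good_ticks[:3])
```

Evenly spaced positions only match the longitude labels when the projection is linear in longitude along x, which appears to be the case for the Miller projection used here.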
Note - you can produce this effect with a lot smaller but complete example, which might have helped you to isolate the issue. Take a look at how to produce a Minimal, complete and verifiable example. As it stands, your code doesn't run as it is missing a, m, mydatalist and the required imports.
I've put in the code below that you might have provided - retaining the subplot loop - although in reality you will likely get the same effect even with just one plot, rather than subplots.
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
x=np.linspace(30,390,288)
y = np.linspace(-90, 90, 234)
xg, yg = np.meshgrid(x, y)
fig = plt.figure()
for j in range(0,3):
    a = fig.add_subplot(3,2,j+1)
    m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90, llcrnrlon=30,urcrnrlon=390, resolution='c', ax=a)
    m.drawcoastlines() # Just put something on the map - doesn't need to be your complex contour plot
    x, y = m(xg,yg)
    #You can see the problem with using hard-coded 30,390 if you print this
    #x=30 and x=390 are both in the lowest 0.001% of the x axis
    #print(x)
    xlab=np.concatenate([np.arange(30,181,30),np.arange(-150,31,30)])
    plt.xticks(np.linspace(30,390,13),xlab)
    #Working version commented below
    #plt.xticks(np.linspace(min(x[0]),max(x[0]), len(xlab)),xlab)
    plt.tick_params(labelsize=8)
plt.show()
Switching to a Gall Stereographic projection solved the problem for me, although I'm not sure why it does not work with a Miller projection.

Create a stack of polar plots using Matplotlib/Python

I need to generate a stack of 2D polar plots (a 3D cylindrical plot) so that I can view a distorted cylinder. I want to use matplotlib since I already have it installed and want to distribute my code to others who only have matplotlib. For example, say I have a bunch of 2-D arrays. Is there any way I can do this without having to download an external package? Here's my code.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
A0 = 55.0
offset = 60.0
R = [116.225, 115.105, 114.697, 115.008, 115.908, 117.184, 118.61, 119.998, 121.224, 122.216,
     122.93, 123.323, 123.343, 122.948, 122.134, 120.963, 119.575, 118.165, 116.941, 116.074, 115.66,
     115.706, 116.154, 116.913, 117.894, 119.029, 120.261, 121.518, 122.684, 123.594, 124.059,
     123.917, 123.096, 121.661, 119.821, 117.894, 116.225]
fig = plt.figure()
ax = fig.add_axes([0.1,0.1,0.8,0.8],polar=True) # Polar plot
ax.plot(theta,R,lw=2.5)
ax.set_rmax(1.5*(A0)+offset)
plt.show()
I have 10 more similar 2D polar plots and I want to stack them up nicely. If there's any better way to visualize a distorted cylinder in 3D, I'm totally open to suggestions. Any help would be appreciated. Thanks!
If you want to stack polar charts using matplotlib, one approach is to use the Axes3D module. You'll notice that I used polar coordinates first and then converted them back to Cartesian when I was ready to plot them.
from numpy import *
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
n = 1000
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib; add_subplot does
ax = fig.add_subplot(111, projection='3d')
for k in linspace(0, 5, 5):
    THETA = linspace(0, 2*pi, n)
    R = ones(THETA.shape)*cos(THETA*k)
    # Convert to Cartesian coordinates
    X = R*cos(THETA)
    Y = R*sin(THETA)
    ax.plot(X, Y, k-2)
plt.show()
If you play with the last argument of ax.plot, it controls the height of each slice. For example, if you want to project all of your data down to a single axis you would use ax.plot(X, Y, 0). For a more exotic example, you can map the height of the data onto a function, say a saddle ax.plot(X, Y, -X**2+Y**2 ). By playing with the colors as well, you could in theory represent multiple 4 dimensional datasets (though I'm not sure how clear this would be).
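For the distorted-cylinder case in the question, the same polar-to-Cartesian conversion can be applied to a whole stack of profiles at once. The sketch below only prepares the coordinate arrays with NumPy; the radii are hypothetical stand-ins (the question shows just one measured profile), and the slice heights are assumed to be evenly spaced.

```python
import numpy as np

# Same angular sampling as in the question: -180..180 degrees in 10 degree steps
theta = np.deg2rad(np.arange(-180.0, 190.0, 10))

# Hypothetical radii for 11 slices; replace with the measured R arrays
n_slices = 11
heights = np.arange(n_slices, dtype=float)  # assumed: one unit of height per slice
base = 119.0 + 4.0 * np.cos(3 * theta)
R = np.stack([base + 0.5 * k * np.sin(theta) for k in range(n_slices)])

# Convert every slice to Cartesian coordinates; theta broadcasts across the rows
X = R * np.cos(theta)
Y = R * np.sin(theta)
Z = np.broadcast_to(heights[:, None], R.shape)

print(X.shape, Y.shape, Z.shape)
```

Each row of X, Y, Z is one closed outline, so looping over rows with ax.plot(X[i], Y[i], Z[i]) on a 3D axes draws the stack slice by slice, and passing the full arrays to ax.plot_surface(X, Y, Z) should give a continuous surface.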
