Scatter plot over 2D-histogram in matplotlib with log-scale - python

I have two sets of points with values (x, y). One is enormous (300k) and one is small (2k). I want to show a scatter plot of the latter over a 2D-histogram of the former in log-log scale. plt.xscale('log')-like commands keep messing up the histogram and when I just take logs of x's and y's and then do all the plotting, my ticks are say -3 not 10^-3 and the pretty logarithmic minor ticks are missing altogether. What's the most elegant solution in matplotlib? Do I have to dig into the artist layer?

If you forgive a bit of self-advertisement, you may use my library physt (see https://github.com/janpipek/physt). Then, you can write code like this:
import numpy as np
import matplotlib.pyplot as plt
from physt import h2
# Data
r1 = np.random.normal(0, 1, 20000)
r2 = np.random.normal(0, .3, 20000) + r1
x = np.exp(r1)
y = np.exp(r2)
# Plot scatter
fig, ax = plt.subplots()
ax.scatter(x[:1000], y[:1000], s=2)
H = h2(x, y, "exponential")
H.plot(ax=ax, zorder=-1) # Necessary to put behind
This, I hope, solves your problem.
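If you would rather stay with plain Matplotlib, one possible route is to bin the data on logarithmically spaced edges yourself and draw the counts with pcolormesh before switching both axes to log scale. This is only a sketch with synthetic log-normal data; the bin count (40), the colormap and the variable names are my own assumptions and not part of the question.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic log-normal data standing in for the large and the small point set
r1 = rng.normal(0, 1, 20000)
x = np.exp(r1)
y = np.exp(r1 + rng.normal(0, 0.3, 20000))

# Bin edges equally spaced in log10, spanning the data range (40 bins per axis)
x_edges = np.logspace(np.log10(x.min()), np.log10(x.max()), 41)
y_edges = np.logspace(np.log10(y.min()), np.log10(y.max()), 41)
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

fig, ax = plt.subplots()
# histogram2d returns counts[x_bin, y_bin]; pcolormesh wants rows along y, hence .T
mesh = ax.pcolormesh(x_edges, y_edges, counts.T, cmap="Greys", zorder=-1)
ax.scatter(x[:1000], y[:1000], s=2, color="tab:red")
ax.set_xscale("log")
ax.set_yscale("log")
fig.colorbar(mesh, ax=ax, label="count")
```

Because the data stay in their original units, the axes keep the usual 10^n tick labels and the logarithmic minor ticks. Call plt.show() or fig.savefig(...) to see the result.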


Resizing Twinned axis for plotting of different type of data versus a common parameter

I am trying to plot some data. It is recorded as different parameters against a common parameter. For example, say time on X Axis and Temperature, Wind Speed etc. on Y Axis.
I have been using Origin to do my plotting. In Origin, such manipulations are carried out by use of Layers. Each axis on a different layer can share the X or Y axis of the host, and it can be resized so that the plot is drawn on some percentage of the actual area. Please see the attached sketch.
I am trying to achieve something similar in python to shift to open-source. After some reading, I found out twinx() and twiny() are two axes functions that can clone/share an axis. Spines can be moved as required through spine['pos'].set_position().
My problem is that, although X axis is shared, I can not individually resize the Y axis of individual parameters. On using set_position, whole figure changes. Request the community for a solution. I am including a snippet of representative code.
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (8,11))
host = fig.add_axes([0.1,0.1,0.7,0.8])
host.plot([1,2,3],[1,4,9])
parm1 = host.twinx()
#parm1.set_position([0.1,0.1,0.8,0.5]) #this doesn't work
parm1.spines['right'].set_position(('axes',1.1))
..
..
I do not think there is an existing function in matplotlib that can do what you want in one call. However, with a little bit of work, you can achieve something like that:
import matplotlib.pyplot as plt
import numpy as np
fig, (ax, bx) = plt.subplots(nrows=2, ncols=1, sharex=True)
bx.yaxis.set_label_position('right')
bx.yaxis.tick_right()
ax.spines['bottom'].set_visible(False)
bx.spines['top'].set_visible(False)
fig.subplots_adjust(hspace=0)
bx.set_xlabel('X Axis')
ax.set_ylabel('Cosine')
bx.set_ylabel('Sine')
x = np.linspace(-np.pi, np.pi, 100)
ax.plot(x, np.cos(x), color='xkcd:avocado', linestyle='dashdot', linewidth=3)
bx.plot(x, np.sin(x), color='xkcd:purple', linestyle='dotted', linewidth=3)
plt.show()
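If the stacked panels should not have equal heights, which is what the question asks about, the same subplots call can take height ratios through gridspec_kw. A minimal sketch, where the 2:1 ratio is an arbitrary choice of mine:

```python
import numpy as np
import matplotlib.pyplot as plt

# height_ratios makes the top panel twice as tall as the bottom one;
# hspace=0 removes the gap so the two panels look like one plot area
fig, (ax, bx) = plt.subplots(
    nrows=2, ncols=1, sharex=True,
    gridspec_kw={"height_ratios": [2, 1], "hspace": 0},
)

x = np.linspace(-np.pi, np.pi, 100)
ax.plot(x, np.cos(x))
bx.plot(x, np.sin(x))

# Put the lower panel's ticks and label on the right, as in the answer above
bx.yaxis.tick_right()
bx.yaxis.set_label_position("right")
ax.set_ylabel("Cosine")
bx.set_ylabel("Sine")
bx.set_xlabel("X Axis")
```

Each parameter then occupies its own fraction of the figure height while still sharing the X axis.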

Plotting KDE with logarithmic x-data in Matplotlib

I want to plot a KDE for data that covers a large range of x-values, so I want to use a logarithmic scale for the x-axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the xscale to logarithmic. When I take the logarithm of my x-data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the ticks using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but in my eyes that looks just wrong and is nothing close to the axis labels (including the minor ticks) that I would get by just using
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x-data? Or is it possible to change only the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and change the scaling of the labels afterwards?
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x data with the logarithm already applied. I don't like the labels on the x-axis, though.
The left column displays the plots when the original data is used for the KDE and all the other plots, and the scale is changed afterwards using
ax.set_xscale('log')
For some reason the KDE does not look like it is supposed to. This is not a result of erroneous data either, since it looks just fine if the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.
I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and, after the KDE has been performed, transforms the x-values back to the original values. Afterwards I can simply plot the contours and use ax.set_xscale('log'):
import numpy as np
import scipy.stats as st
def logx_kde(x, y, xmin, xmax, ymin, ymax):
    # Work in log10(x); xmin and xmax must therefore be given in log10 units
    x = np.log10(x)
    # Perform the kernel density estimate
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    # Transform the x grid back to the original scale
    return np.power(10, xx), yy, f
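The answer stops at the function definition, so here is how it might be called. This is a sketch only: the function is repeated so the block runs on its own, the sample data follows the question's working example, and the grid limits taken from the data range are my assumption. Note that xmin and xmax are passed in log10 units, because the grid is built after the logarithm has been applied.

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt


def logx_kde(x, y, xmin, xmax, ymin, ymax):
    """KDE in (log10(x), y) space; xmin and xmax are given in log10 units."""
    logx = np.log10(x)
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    kernel = st.gaussian_kde(np.vstack([logx, y]))
    f = np.reshape(kernel(positions).T, xx.shape)
    # Return the x grid on the original (linear) scale
    return np.power(10, xx), yy, f


# Sample data as in the question's working example
rng = np.random.default_rng(0)
data = rng.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]

# Grid limits: the data range, with the x limits expressed in log10 units
xx, yy, f = logx_kde(x, y, np.log10(x.min()), np.log10(x.max()), y.min(), y.max())

fig, ax = plt.subplots()
ax.contour(xx, yy, f)
ax.set_xscale("log")  # proper 10^n labels and minor ticks
```

The density is estimated in log space, so the bandwidth is chosen for the logarithmic data, while the axis itself is a regular log axis.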

Is it possible to test if the legend is covering any data in matplotlib/pyplot

Python beginner so apologies if incorrect terminology at any point.
I am using the legend(loc='best', ...) method and it works 99% of the time. However, when stacking more than 9 plots (i.e. i>9 in example below) on a single figure, with individual labels, it defaults to center and covers the data.
Is there a way to run a test in the script that will give a true/false value if the legend is covering any data points?
Very simplified code:
fig = plt.figure()
for i in data:
    plt.plot(i[x, y], label=LABEL)
fig.legend(loc='best')
fig.savefig()
Example of legend covering data
One way is to add some extra space at the bottom, top, left or right side of the axis (in your case I would prefer top or bottom) by changing the limits slightly. Doing so makes the legend fit below the data. Add the extra space by setting a different y-limit with ax.set_ylim(-3e-4, 1.5e-4) (the upper limit is approximately what it is in your figure, and the lower limit of -3e-4 is an estimate of what you need).
You also need to split the legend into more columns, with the keyword ncol=N when creating the legend.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.linspace(0, 1, 100)
y = 3.5 * x - 2
for i in range(9):
    ax.plot(x, y + i / 10., label='iiiiiiiiiiii={}'.format(i))
ax.set_ylim(-3, 1.5)
ax.legend(loc='lower center', ncol=3) # ncol=3 looked nice for me, maybe you need to change this
plt.show()
EDIT
Another solution is to put the legend in a separate axis like I do in the code below. The data-plot does not need to care about making space for the legend or anything and you should have enough space in the axis below to put all your line-labels. If you need more space, you can easily change the ratio of the upper axis to the lower axis.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(211)
ax_leg = fig.add_subplot(212)
x = np.linspace(0, 1, 100)
y = 3.5 * x - 2
lines = []
for i in range(9):  # plot the actual data
    li, = ax.plot(x, y + i / 10., label='iiiiiiiiiiii={}'.format(i))
    lines.append(li)
for line in lines:  # empty proxy lines, just to build the legend
    ax_leg.plot([], [], color=line.get_color(), label=line.get_label())
ax_leg.legend(loc='center', ncol=3)  # ncol=3 looked nice for me, maybe you need to change this
ax_leg.axis('off')
plt.show()
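As for the literal question, a true/false test: I am not aware of a ready-made call for this, but a rough check can be put together from the legend's bounding box and the data transform. The helper below, legend_covers_data, is a hypothetical name of my own. It only tests the plotted vertices of ax.lines, not the segments between them, and it uses an axes legend rather than fig.legend, so treat it as a sketch.

```python
import numpy as np
import matplotlib.pyplot as plt


def legend_covers_data(ax, legend):
    """Return True if any vertex of the lines in ax falls inside the legend box.

    Only line vertices are tested, not the segments between them.
    """
    ax.figure.canvas.draw()            # positions are only final after a draw
    bbox = legend.get_window_extent()  # legend box in display (pixel) coordinates
    for line in ax.lines:
        xy = np.asarray(line.get_xydata(), dtype=float)
        if xy.size == 0:
            continue
        pixels = ax.transData.transform(xy)  # data -> display coordinates
        inside = (
            (pixels[:, 0] >= bbox.x0) & (pixels[:, 0] <= bbox.x1)
            & (pixels[:, 1] >= bbox.y0) & (pixels[:, 1] <= bbox.y1)
        )
        if inside.any():
            return True
    return False


fig, ax = plt.subplots()
x = np.linspace(0, 1, 100)
ax.plot(x, x, label="diagonal")

covered = legend_covers_data(ax, ax.legend(loc="center"))    # legend sits on the line
clear = legend_covers_data(ax, ax.legend(loc="upper left"))  # legend is in an empty corner
```

A script could use such a check to fall back to one of the layouts above (extra axis space, or a separate legend axis) whenever it returns True.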

Matplotlib: multiple 3D lines all get drawn using the final y-value in my loop

I am trying to plot multiple lines in a 3D figure. Each line represents a month: I want them displayed parallel in the y-direction.
My plan was to loop over a set of Y values, but I cannot make this work properly, as using the ax.plot command (see working code below) produces a dozen lines all at the position of the final Y value. Confusingly, swapping ax.plot for ax.scatter does produce a set of parallel lines of data (albeit in the form of a set of dots; ax.view_init set to best display the parallel aspect of the result).
How can I produce a plot with multiple parallel lines?
My current workaround is to replace the loop with a dozen different arrays of Y values, and that can't be the right answer.
from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
# preamble
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
cs = ['r','g','b','y','r','g','b','y','r','g','b','y']
# x axis
X = np.arange(24)
# y axis
y = np.array([15,45,75,105,135,165,195,225,255,285,315,345])
Y = np.zeros(24)
# data - plotted against z axis
Z = np.random.rand(24)
# populate figure
for step in range(0,12):
    Y[:] = y[step]
    # ax.plot(X,Y,Z, color=cs[step])
    ax.scatter(X,Y,Z, color=cs[step])
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
# set initial view of plot
ax.view_init(elev=80., azim=345.)
plt.show()
I'm still learning python, so simple solutions (or, preferably, those with copious explanatory comments) are greatly appreciated.
Use
ax.plot(X, np.array(Y), Z, color=cs[step])
or
Y = [y[step]] * 24
This looks like a bug in mpl: we are not copying the data when you hand it in, so every line shares the same np.array object, and when you update that array all of your lines change with it.
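Put together, the loop from the question could look like the sketch below, which builds a fresh Y array on every pass so that no two lines can share data. The y positions follow the question; the random Z values and the colour cycling are placeholders of mine.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers '3d' on older Matplotlib)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

X = np.arange(24)
y_positions = np.arange(15, 360, 30)  # 15, 45, ..., 345 as in the question
rng = np.random.default_rng(0)

lines = []
for step, y0 in enumerate(y_positions):
    Y = np.full(X.shape, y0)  # a new array each pass, never modified afterwards
    Z = rng.random(24)        # placeholder data for this month
    (line,) = ax.plot(X, Y, Z, color="C{}".format(step % 4))
    lines.append(line)

ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.view_init(elev=80.0, azim=345.0)
```

Since each line owns its own array, this works whether or not the plotting call copies its input.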

Create a stack of polar plots using Matplotlib/Python

I need to generate a stack of 2D polar plots (a 3D cylindrical plot) so that I can view a distorted cylinder. I want to use matplotlib since I already have it installed and want to distribute my code to others who only have matplotlib. For example, say I have a bunch of 2-D arrays. Is there any way I can do this without having to download an external package? Here's my code.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-180.0,190.0,10)
theta = (np.pi/180.0 )*x # in radians
A0 = 55.0
offset = 60.0
R = [116.225, 115.105, 114.697, 115.008, 115.908, 117.184, 118.61, 119.998,
     121.224, 122.216, 122.93, 123.323, 123.343, 122.948, 122.134, 120.963,
     119.575, 118.165, 116.941, 116.074, 115.66, 115.706, 116.154, 116.913,
     117.894, 119.029, 120.261, 121.518, 122.684, 123.594, 124.059, 123.917,
     123.096, 121.661, 119.821, 117.894, 116.225]
fig = plt.figure()
ax = fig.add_axes([0.1,0.1,0.8,0.8],polar=True) # Polar plot
ax.plot(theta,R,lw=2.5)
ax.set_rmax(1.5*(A0)+offset)
plt.show()
I have 10 more similar 2D polar plots and I want to stack them up nicely. If there's any better way to visualize a distorted cylinder in 3D, I'm totally open to suggestions. Any help would be appreciated. Thanks!
If you want to stack polar charts using matplotlib, one approach is to use the Axes3D module. You'll notice that I used polar coordinates first and then converted them back to Cartesian when I was ready to plot them.
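import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib
import matplotlib.pyplot as plt
n = 1000
fig = plt.figure()
# fig.gca(projection='3d') is not accepted by recent Matplotlib releases; add_subplot works everywhere
ax = fig.add_subplot(111, projection='3d')
for k in np.linspace(0, 5, 5):
    THETA = np.linspace(0, 2*np.pi, n)
    R = np.ones(THETA.shape)*np.cos(THETA*k)
    # Convert to Cartesian coordinates
    X = R*np.cos(THETA)
    Y = R*np.sin(THETA)
    # The third argument is the height (z) of this slice
    ax.plot(X, Y, k-2)
plt.show()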
The last argument of ax.plot controls the height of each slice. For example, if you want to project all of your data down onto a single plane, use ax.plot(X, Y, 0). For a more exotic example, you can map the height of the data onto a function, say a saddle: ax.plot(X, Y, -X**2+Y**2). By playing with the colors as well, you could in theory represent multiple four-dimensional datasets (though I'm not sure how clear this would be).
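Applied to the distorted cylinder from the question, the same idea might look like the sketch below: one closed radial profile per slice, each drawn at its own height. The angular grid matches the question, but the radial profiles are made up (a slowly rotating two-lobed distortion), and the slice heights and colormap are my own assumptions. Replace R with your measured arrays.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers '3d' on older Matplotlib)

theta = np.radians(np.arange(-180.0, 190.0, 10.0))  # same angular grid as the question
heights = np.arange(11)                             # hypothetical slice heights, one per profile

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
for z in heights:
    # Made-up radial profile standing in for one measured R array
    R = 120.0 + 4.0 * np.cos(2 * theta + 0.3 * z)
    # Convert polar to Cartesian coordinates and draw the slice at height z
    X = R * np.cos(theta)
    Y = R * np.sin(theta)
    ax.plot(X, Y, float(z), color=plt.cm.viridis(z / heights.max()))

ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("slice")
```

Colouring the slices by height makes it easier to follow how the distortion changes along the cylinder axis.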
