I want to plot a KDE for data that covers a large range of x-values, so I want to use a logarithmic scale for the x-axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the xscale to logarithmic. When I take the logarithm of my x-data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the tick labels using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but to my eyes that looks wrong and is nowhere near the axis labels (including the minor ticks) that I would get by simply using
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x-data? Or is it possible to change only the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and change the scaling of the labels afterwards?
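For the second part of the question (relabelling the ticks while plotting log10 values), one possible sketch uses matplotlib's ticker module. The data here is hypothetical and a histogram stands in for the KDE; the tick handling is the same for any plot drawn in log10(x) space. Major ticks are placed at whole decades and labelled as powers of ten, and minor ticks are placed where a real log axis would put them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

rng = np.random.default_rng(0)
x = 10 ** rng.normal(0.0, 1.0, 500)   # hypothetical data spanning several decades
log_x = np.log10(x)                   # the values that are actually plotted

fig, ax = plt.subplots()
ax.hist(log_x, bins=30)

# Major ticks at whole decades, labelled as powers of ten.
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_major_formatter(
    ticker.FuncFormatter(lambda v, pos: r"$10^{%d}$" % round(v))
)

# Minor ticks at log10(m * 10**k) for m = 2..9, mimicking a real log axis.
lo, hi = int(np.floor(log_x.min())), int(np.ceil(log_x.max()))
minor_ticks = np.array(
    [k + np.log10(m) for k in range(lo, hi) for m in range(2, 10)]
)
ax.xaxis.set_minor_locator(ticker.FixedLocator(minor_ticks))
```

The axis itself stays linear, so anything plotted in log10 units keeps its shape; only the tick positions and labels imitate a log scale.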
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x-data with the logarithm already applied. I don't like the labels on the x-axis, though.
The left column shows the plots when the original data is used for the KDE and all the other plots, and the scale is changed afterwards using
ax.set_xscale('log')
For some reason the KDE does not look the way it is supposed to. This is not the result of erroneous data either, since it looks just fine when the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.
I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and, after the KDE has been performed, transforms the x-values back to the original scale. Afterwards I can simply plot the contours and use ax.set_xscale('log').
import numpy as np
import scipy.stats as st
def logx_kde(x, y, xmin, xmax, ymin, ymax):
    # Work in log10(x); xmin and xmax must therefore be given in log10 units
    x = np.log10(x)
    # Perform the kernel density estimate
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    # Transform the x grid back to the original scale for plotting
    return np.power(10, xx), yy, f
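A minimal usage sketch, assuming the sample data from the question (with a seeded generator) and taking the grid limits from the data itself. The function is repeated so the snippet runs on its own; note that the x limits are passed in log10 units, because the grid is built in log space.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

def logx_kde(x, y, xmin, xmax, ymin, ymax):
    """KDE in (log10(x), y) space; xmin and xmax are given in log10 units."""
    log_x = np.log10(x)
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    kernel = st.gaussian_kde(np.vstack([log_x, y]))
    f = np.reshape(kernel(positions).T, xx.shape)
    return np.power(10, xx), yy, f

rng = np.random.default_rng(0)
data = rng.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x, y = np.power(10, data[:, 0]), data[:, 1]

# Grid limits taken from the data; the x limits are in log10 units.
log_x = np.log10(x)
xx, yy, f = logx_kde(x, y, log_x.min(), log_x.max(), y.min(), y.max())

fig, ax = plt.subplots()
ax.contour(xx, yy, f)    # xx is already back on the original scale
ax.set_xscale("log")     # so a genuine log axis can be used
```

Because the density is estimated in log space and only the grid coordinates are transformed back, the contours keep the shape they have in the log10 plot while the axis gets proper logarithmic ticks.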
Related
I would like to plot a series of curves in the same Axes, each offset from the next by a constant amount in y. Because the data I have needs to be displayed in log scale, simply adding a y offset to each curve (as done here) does not give the desired output.
I have tried using matplotlib.transforms to achieve the same, i.e. artificially shifting the curve in Figure coordinates. This achieves the desired result, but requires adjusting the Axes y limits so that the shifted curves are visible. Here is an example to illustrate this, though such data would not require log scale to be visible:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1,1)
for i in range(1,19):
    x, y = np.arange(200), np.random.rand(200)
    dy = 0.5*i
    shifted = mpl.transforms.offset_copy(ax.transData, y=dy, fig=fig, units='inches')
    ax.set_xlim(0, 200)
    ax.set_ylim(0.1, 1e20)
    ax.set_yscale('log')
    ax.plot(x, y, transform=shifted, c=mpl.cm.plasma(i/18), lw=2)
The problem is that to make all the shifted curves visible, I would need to adjust the ylim to a very high number, which compresses all the curves so that the features visible because of the log scale cannot be seen anymore.
Since the displayed y axis values are meaningless to me, is there any way to artificially extend the Axes limits to display all the curves, without having to make the Figure very large? Apparently this can be done with seaborn, but if possible I would like to stick to matplotlib.
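One possible alternative, which is not from the original post: on a log axis, multiplying a curve by a constant factor shifts it by a constant visual distance, so a multiplicative offset plays the role that an additive offset plays on a linear axis. A minimal sketch with stand-in data follows; `decades_per_step` is a hypothetical spacing parameter, and since the y values are stated to be meaningless the ticks are hidden.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_curves = 18
decades_per_step = 0.5   # hypothetical: each curve sits half a decade above the last

fig, ax = plt.subplots()
x = np.arange(200)
factors = 10.0 ** (decades_per_step * np.arange(n_curves))
for i, factor in enumerate(factors):
    y = rng.random(200) + 0.1   # stand-in data, strictly positive for the log axis
    ax.plot(x, y * factor, color=plt.cm.plasma(i / (n_curves - 1)), lw=1)

ax.set_yscale("log")
ax.set_xlim(0, 200)
# The absolute y values carry no meaning after scaling, so hide the ticks.
ax.tick_params(axis="y", which="both", left=False, labelleft=False)
```

The axis limits then follow from the data automatically. This does not remove the trade-off between curve spacing and feature height, though: the more decades the stack spans, the less vertical room each curve gets.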
EDIT:
This is the kind of data I need to plot (an X-ray diffraction pattern varying with temperature):
I have an MxN (say, 1000x50) array. I want to plot each 50-point line onto the same plot, and have a heatmap of their density.
Simply doing a plt.pcolor(data) is not what I want, since I don't want to plot the matrix.
This is what I want to plot, but as I said it doesn't provide me with the heatmap I need.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(1000, 50)
fig, ax = plt.subplots()
for i in range(0,1000):
    ax.plot(data[i], '.')
plt.show()
I would like a way of getting this together (I assume it will have something to do with histograms and binning?).
EDIT: simply adding an alpha value to the plot ( ax.plot(data[i], '.r', alpha=0.01)) achieves something similar to what I want. I would like, however, to have a heatmap with different colours.
As you already pointed out in your question, probably one of the simplest approaches involves histograms. A linear approximation of the histogram is probably enough for this application.
You can use np.histogram to calculate bin heights and edges and use scipy.interpolate.interp1d to obtain a function that provides an interpolation of the histogram. We can define a simple helper function to get the approximate density around each value in one column of the data array:
import scipy.interpolate as interp
def get_density(vals, bins=30, kind="linear"):
    # Normalised histogram of the values
    y, bin_edges = np.histogram(vals, bins=bins, density=True)
    # Bin centres serve as the interpolation nodes
    x = (bin_edges[1:] + bin_edges[:-1])/2.
    f = interp.interp1d(x, y, kind=kind, fill_value="extrapolate")
    # Approximate density at each original value
    return f(vals)
Then you can use any colormap you want to map the density to a color value. The easiest way to go from here is to use plt.scatter instead of plot, where you can provide a specific color for every data point.
I would do something like this:
fig, ax = plt.subplots()
for i in range(data.shape[1]):
    colors = plt.cm.viridis(get_density(data[:, i]))
    ax.scatter(i*np.ones(data.shape[0]), data[:, i], c=colors, marker='.')
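One caveat worth a sketch: a colormap called directly on floats expects values in [0, 1], while a histogram with density=True is not confined to that range, and the extrapolation can dip below zero. A possible variant, assuming hypothetical normally distributed data, passes the raw densities to scatter together with a Normalize instance so that the colours span the observed range and a colorbar can be added. The helper is repeated so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import scipy.interpolate as interp
from matplotlib.colors import Normalize

def get_density(vals, bins=30, kind="linear"):
    y, bin_edges = np.histogram(vals, bins=bins, density=True)
    x = (bin_edges[1:] + bin_edges[:-1]) / 2.0
    f = interp.interp1d(x, y, kind=kind, fill_value="extrapolate")
    return f(vals)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 50))   # hypothetical stand-in for the MxN array

# Density per column, clipped because extrapolation can go slightly negative.
density = np.column_stack(
    [get_density(data[:, i]) for i in range(data.shape[1])]
)
density = np.clip(density, 0, None)
norm = Normalize(vmin=0, vmax=density.max())

# Flatten column by column so positions, values and colours stay aligned.
cols = np.repeat(np.arange(data.shape[1]), data.shape[0])
fig, ax = plt.subplots()
sc = ax.scatter(cols, data.T.ravel(), c=density.T.ravel(),
                cmap="viridis", norm=norm, marker=".")
fig.colorbar(sc, ax=ax, label="estimated density")
```

A single scatter call with one shared norm also keeps the colour scale comparable across columns.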
I have two sets of points with values (x, y). One is enormous (300k) and one is small (2k). I want to show a scatter plot of the latter over a 2D-histogram of the former in log-log scale. plt.xscale('log')-like commands keep messing up the histogram and when I just take logs of x's and y's and then do all the plotting, my ticks are say -3 not 10^-3 and the pretty logarithmic minor ticks are missing altogether. What's the most elegant solution in matplotlib? Do I have to dig into the artist layer?
If you forgive a bit of self-advertisement, you may use my library physt (see https://github.com/janpipek/physt). Then, you can write code like this:
import numpy as np
import matplotlib.pyplot as plt
from physt import h2
# Data
r1 = np.random.normal(0, 1, 20000)
r2 = np.random.normal(0, .3, 20000) + r1
x = np.exp(r1)
y = np.exp(r2)
# Plot scatter
fig, ax = plt.subplots()
ax.scatter(x[:1000], y[:1000], s=2)
H = h2(x, y, "exponential")
H.plot(ax=ax, zorder=-1) # Necessary to put behind
Which, I hope, is the solution to your problem:
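For comparison, a sketch that stays within numpy and matplotlib (an alternative to the physt call above, not taken from it): the histogram is computed on bin edges that are evenly spaced in log10, drawn with pcolormesh, and only then are both axes switched to log scale, which gives the usual 10^k labels and minor ticks. The bin count of 40 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
r1 = rng.normal(0, 1, 20000)
r2 = rng.normal(0, 0.3, 20000) + r1
x, y = np.exp(r1), np.exp(r2)

# Bin edges evenly spaced in log10, so the cells look uniform on log axes.
x_edges = np.logspace(np.log10(x.min()), np.log10(x.max()), 41)
y_edges = np.logspace(np.log10(y.min()), np.log10(y.max()), 41)
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

fig, ax = plt.subplots()
# histogram2d returns counts indexed [x, y]; pcolormesh expects [row=y, col=x].
mesh = ax.pcolormesh(x_edges, y_edges, counts.T, norm=LogNorm())
ax.scatter(x[:1000], y[:1000], s=2, color="k")   # small sample drawn on top
ax.set_xscale("log")
ax.set_yscale("log")
fig.colorbar(mesh, ax=ax, label="counts")
```

Empty cells are masked by LogNorm and left uncoloured.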
I have a set of data that when plotted most points congregate to the left of the x axis:
plt.plot(x, y, marker='o')
plt.title('Original')
plt.show()
ORIGINAL GRAPH
I want to use scipy to interpolate the data and later fit a quadratic curve to it. I am avoiding simply fitting a quadratic curve without interpolation, since that would bias the resulting curve towards the mass of data at one end of the x axis. I tried this by using
from scipy.interpolate import interp1d
f = interp1d(x, y, kind='quadratic')
# Array with points in between min(x) and max(x) for interpolation
x_interp = np.linspace(min(x), max(x), num=np.size(x))
# Plot graph with interpolation
plt.plot(x_interp, f(x_interp), marker='o')
plt.title('Interpolated')
plt.show()
and got INTERPOLATED GRAPH.
However, what I intend to get is something like this:
EXPECTED GRAPH
What am I doing wrong?
My values for x can be found here and values for y here.
Thank you!
Solution 1
I'm pretty sure this does what you want. It fits a second degree (quadratic) polynomial to your data, then plots that function on an evenly spaced array of x values ranging from the minimum to the maximum of your original x data.
new_x = np.linspace(min(x), max(x), num=np.size(x))
coefs = np.polyfit(x,y,2)
new_line = np.polyval(coefs, new_x)
Plotting it returns:
plt.scatter(x,y)
plt.scatter(new_x,new_line,c='g', marker='^', s=5)
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
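Since the linked x and y values are not included here, this is a self-contained sketch of the same polyfit/polyval steps on hypothetical data whose x-values crowd towards the left edge, as described in the question. The coefficients in `true_coefs` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: most x-values sit near the left edge of the range.
x = np.sort(rng.exponential(scale=0.2, size=200))
true_coefs = (1.5, -2.0, 0.5)   # a, b, c in a*x**2 + b*x + c
y = np.polyval(true_coefs, x) + rng.normal(0, 0.05, x.size)

# Least-squares quadratic fit; coefficients are returned highest power first.
coefs = np.polyfit(x, y, 2)

# Evaluate the fitted polynomial on an evenly spaced grid over the x range.
new_x = np.linspace(x.min(), x.max(), num=x.size)
new_line = np.polyval(coefs, new_x)

# Residuals of the fit at the original sample positions.
residuals = y - np.polyval(coefs, x)
```

The fitted curve is defined everywhere between min(x) and max(x), so it can be drawn on the even grid regardless of how unevenly the original samples are spaced.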
if that wasn't what you meant...
However, from your question, it seems like you might be trying to force all your original y-values onto evenly spaced x-values (if that's not your intention, let me know, and I'll just delete this part).
This is also possible. There are lots of ways to do it, but I've done it here in pandas:
import pandas as pd
xy_df=pd.DataFrame({'x_orig': x, 'y_orig': y})
sorted_x_y=xy_df.sort_values('x_orig')
sorted_x_y['new_x'] = np.linspace(min(x), max(x), np.size(x))
plt.figure(figsize=[5,5])
plt.scatter(sorted_x_y['new_x'], sorted_x_y['y_orig'])
plt.xlim(min(x)-0.00001,max(x)+0.00001)
plt.xticks(rotation=90)
plt.tight_layout()
Which looks pretty different from your original data... which is why I think it might not be exactly what you're looking for.
I have already binned data to plot a histogram. For this reason I'm using the plt.bar() function. I'd like to set both axes in the plot to a logarithmic scale.
If I call plt.bar(x, y, width=10, color='b', log=True), the y-axis is set to log, but I can't make the x-axis logarithmic.
I've tried plt.xscale('log'), but unfortunately this doesn't work properly: the x-axis ticks vanish and the bars no longer have equal width.
I would be grateful for any help.
By default, the bars of a barplot have a width of 0.8. Therefore they appear larger for smaller x values on a logarithmic scale. If instead of specifying a constant width, one uses the distance between the bin edges and supplies this to the width argument, the bars will have the correct width. One would also need to set the align to "edge" for this to work.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
x = np.logspace(0, 5, num=21)
y = (np.sin(1.e-2*(x[:-1]-20))+3)**10
fig, ax = plt.subplots()
ax.bar(x[:-1], y, width=np.diff(x), log=True,ec="k", align="edge")
ax.set_xscale("log")
plt.show()
I cannot reproduce missing ticklabels for a logarithmic scaling. This may be due to some settings in the code that are not shown in the question or due to the fact that an older matplotlib version is used. The example here works fine with matplotlib 2.0.
If the goal is to have equal width bars, assuming datapoints are not equidistant, then the most proper solution is to set width as
plt.bar(x, y, width=c*np.array(x), color='b', log=True) for a constant c appropriate for the plot. Alignment can be anything.
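To see why a width proportional to x works: with align="edge" a bar at position x with width c*x spans [x, x*(1+c)], and on a log axis that interval has length log10(1+c) no matter what x is. A small sketch with made-up positions and an arbitrary c:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical, unevenly spaced bar positions and heights.
x = np.array([1.0, 3.0, 20.0, 150.0, 4000.0])
y = np.array([5.0, 40.0, 12.0, 300.0, 80.0])
c = 0.5   # hypothetical constant; tune it for the plot

fig, ax = plt.subplots()
ax.bar(x, y, width=c * x, align="edge", log=True, color="b")
ax.set_xscale("log")

# Each bar spans [x, x*(1+c)], so its extent in log10 units is the same
# for every bar: log10(1 + c).
log_widths = np.log10(x * (1 + c)) - np.log10(x)
```

With centred alignment the bar spans [x*(1-c/2), x*(1+c/2)] instead, which is again a constant ratio, so the bars still look equally wide.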
I know this is a very old question and you may have solved it already, but I came to this post because I had a similar problem on the y axis, and I managed to solve it just by using ax.set_ylim(df['my data'].min()+100, df['my data'].max()+100). My y axis holds some sensitive information that I thought would be best shown on a log scale, but when I set the log scale I couldn't read the numbers properly (as in this post for the x axis), so I dropped the idea of using log and used the min and max arguments instead. This sets the scale of my graph much like a log scale would. I'm still looking for a way that doesn't need the 100 offsets in set_ylim.
While this does not actually use pyplot.bar, I think this method could be helpful in achieving what the OP is trying to do. I found this to be easier than trying to calibrate the width as a function of the log-scale, though it's more steps. Create a line collection whose width is independent of the chart scale.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.collections as coll
#Generate data and sort into bins
a = np.random.logseries(0.5, 1000)
hist, bin_edges = np.histogram(a, bins=20, density=False)
x = bin_edges[:-1] # remove the top-end from bin_edges to match dimensions of hist
lines = []
for i in range(len(x)):
    pair=[(x[i],0), (x[i], hist[i])]
    lines.append(pair)
linecoll = coll.LineCollection(lines, linewidths=10, linestyles='solid')
fig, ax = plt.subplots()
ax.add_collection(linecoll)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlim(min(x)/10,max(x)*10)
ax.set_ylim(0.1,1.1*max(hist)) #since this is an unweighted histogram, the logy doesn't make much sense.
Resulting plot - no frills
One drawback is that the "bars" will be centered, but this could be changed by offsetting the x-values by half of the linewidth value ... I think it would be
x_new = x + (linewidth/2)*10**np.round(np.log10(x), 0).