I have already binned data to plot a histogram. For this reason I'm using the plt.bar() function. I'd like to set both axes in the plot to a logarithmic scale.
Using plt.bar(x, y, width=10, color='b', log=True) lets me set the y-axis to a log scale, but I can't make the x-axis logarithmic.
I've tried plt.xscale('log'), but unfortunately this doesn't work right: the x-axis ticks vanish and the bars no longer have equal width.
I would be grateful for any help.
By default, the bars of a barplot have a width of 0.8. Therefore they appear larger for smaller x values on a logarithmic scale. If instead of specifying a constant width, one uses the distance between the bin edges and supplies this to the width argument, the bars will have the correct width. One would also need to set the align to "edge" for this to work.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
x = np.logspace(0, 5, num=21)
y = (np.sin(1.e-2*(x[:-1]-20))+3)**10
fig, ax = plt.subplots()
ax.bar(x[:-1], y, width=np.diff(x), log=True,ec="k", align="edge")
ax.set_xscale("log")
plt.show()
I cannot reproduce missing ticklabels for a logarithmic scaling. This may be due to some settings in the code that are not shown in the question or due to the fact that an older matplotlib version is used. The example here works fine with matplotlib 2.0.
If the goal is to have equal width bars, assuming datapoints are not equidistant, then the most proper solution is to set width as
plt.bar(x, y, width=c*np.array(x), color='b', log=True) for a constant c appropriate for the plot. Alignment can be anything.
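To see why a width proportional to x gives bars of equal visual width: with align="edge", each bar spans x to x*(1+c), so its extent in decades is log10(1+c) regardless of x. A minimal check of that arithmetic follows; the helper name log_widths, the x values and the constant c are made up for illustration.

```python
import math

def log_widths(xs, c):
    """Width of each bar in decades when drawn with width=c*x, align='edge'."""
    return [math.log10(x + c * x) - math.log10(x) for x in xs]

xs = [1, 10, 250, 4000]   # illustrative, non-equidistant positions
c = 0.5                   # arbitrary constant; tune it for the plot
widths = log_widths(xs, c)
```

Every entry of widths equals log10(1.5) up to rounding, so the bars look equally wide once plt.xscale('log') is applied.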
I know this is a very old question and you may have solved it already, but I came to this post because I had a similar problem on the y-axis, and I managed to solve it just by using ax.set_ylim(df['my data'].min()+100, df['my data'].max()+100). My y-axis holds some sensitive information that I thought was best shown on a log scale, but when I set the log scale I couldn't see the numbers properly (like the x-axis in this post), so I dropped the idea of using log and used the min and max arguments instead. This scales my graph much like a log scale would. I am still looking for a way that doesn't need the +100 in set_ylim.
While this does not actually use pyplot.bar, I think this method could be helpful in achieving what the OP is trying to do. I found this to be easier than trying to calibrate the width as a function of the log-scale, though it's more steps. Create a line collection whose width is independent of the chart scale.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.collections as coll
#Generate data and sort into bins
a = np.random.logseries(0.5, 1000)
hist, bin_edges = np.histogram(a, bins=20, density=False)
x = bin_edges[:-1] # remove the top-end from bin_edges to match dimensions of hist
lines = []
for i in range(len(x)):
    pair = [(x[i], 0), (x[i], hist[i])]
    lines.append(pair)
linecoll = coll.LineCollection(lines, linewidths=10, linestyles='solid')
fig, ax = plt.subplots()
ax.add_collection(linecoll)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlim(min(x)/10,max(x)*10)
ax.set_ylim(0.1,1.1*max(hist)) #since this is an unweighted histogram, the logy doesn't make much sense.
Resulting plot - no frills
One drawback is that the "bars" will be centered, but this could be changed by offsetting the x-values by half of the linewidth value ... I think it would be
x_new = x + (linewidth/2)*10**np.round(np.log10(x), 0).
I would like to plot a series of curves in the same Axes, each having a constant y offset from each other. Because the data I have needs to be displayed in log scale, simply adding a y offset to each curve (as done here) does not give the desired output.
I have tried using matplotlib.transforms to achieve the same, i.e. artificially shifting the curve in Figure coordinates. This achieves the desired result, but requires adjusting the Axes y limits so that the shifted curves are visible. Here is an example to illustrate this, though such data would not require log scale to be visible:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1,1)
for i in range(1,19):
    x, y = np.arange(200), np.random.rand(200)
    dy = 0.5*i
    shifted = mpl.transforms.offset_copy(ax.transData, y=dy, fig=fig, units='inches')
    ax.set_xlim(0, 200)
    ax.set_ylim(0.1, 1e20)
    ax.set_yscale('log')
    ax.plot(x, y, transform=shifted, c=mpl.cm.plasma(i/18), lw=2)
The problem is that to make all the shifted curves visible, I would need to adjust the ylim to a very high number, which compresses all the curves so that the features visible because of the log scale cannot be seen anymore.
Since the displayed y axis values are meaningless to me, is there any way to artificially extend the Axes limits to display all the curves, without having to make the Figure very large? Apparently this can be done with seaborn, but if possible I would like to stick to matplotlib.
EDIT:
This is the kind of data I need to plot (an X-ray diffraction pattern varying with temperature):
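On a log axis, a fixed vertical shift corresponds to multiplying the data by a constant factor rather than adding a constant. The sketch below is not from the question; it only illustrates that arithmetic, with a hypothetical helper stack_log and an arbitrary spacing of one decade per curve.

```python
import math

def stack_log(curves, decades_per_curve=1.0):
    """Scale curve i by 10**(i*decades_per_curve) so that, on a log y-axis,
    consecutive curves are separated by the same visual distance."""
    return [[y * 10 ** (i * decades_per_curve) for y in curve]
            for i, curve in enumerate(curves)]

curves = [[1.0, 5.0, 2.0]] * 3          # three identical toy curves
stacked = stack_log(curves)
# Gap between the first two curves, measured in decades at each sample
gaps = [math.log10(b / a) for a, b in zip(stacked[0], stacked[1])]
```

Each stacked curve could then be drawn with ax.plot on an axis using ax.set_yscale('log'). The y limits still have to span all the decades used, so this does not by itself avoid a large axis range.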
I want to plot a KDE for data that covers a large range of x-values, so I want to use a logarithmic scale for the x-axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the xscale to logarithmic. When I take the logarithm of my x-data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the ticks using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but in my eyes that looks just wrong and is nothing close to the axis labels (including the minor ticks) that I would get by just using
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x-data? Or is it possible to just change the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and change the scaling of the labels afterwards?
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x data with the logarithm already applied. I don't like the labels on the x-axis, though.
The left column displays the plots when the original data is used for the KDE and all the other plots, and the scale is changed afterwards using
ax.set_xscale('log')
For some reason the KDE does not look like it is supposed to. This is also not a result of erroneous data, since it looks just fine if the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.
I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and after the KDE was performed, transforms the x-values back to the original values. Afterwards I can simply plot the contours and use ax.set_xscale('log')
import numpy as np
import scipy.stats as st
def logx_kde(x, y, xmin, xmax, ymin, ymax):
    x = np.log10(x)
    # Perform the kernel density estimate
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    # Transform the x grid back to the original (non-log) values
    return np.power(10, xx), yy, f
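Note that logx_kde builds its grid after taking log10 of x, so xmin and xmax are expected in log10 units, while ymin and ymax stay in data units. A small sketch of deriving such bounds follows; the helper name log10_bounds and the padding of 0.1 decades are assumptions for illustration only.

```python
import math

def log10_bounds(values, pad=0.1):
    """Return (lo, hi) in log10 units, padded by `pad` decades on each side."""
    logs = [math.log10(v) for v in values]   # values must be strictly positive
    return min(logs) - pad, max(logs) + pad

x = [0.02, 0.5, 3.0, 40.0, 1000.0]           # toy data spanning several decades
xmin, xmax = log10_bounds(x)
```

These bounds would be passed as logx_kde(x, y, xmin, xmax, ymin, ymax), and the returned grid drawn with ax.contour(xx, yy, f) before calling ax.set_xscale('log').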
I'm trying to make plots that are formatted the same way despite coming from different datasets and I'm running into issues with getting consistent text positions and appropriate axis limits because the datasets are not scaled exactly the same. For example - say I generate the following elevation profile:
import matplotlib.pyplot as plt
import numpy as np
Distance=np.array([1000,3000,7000,15000,20000])
Elevation=np.array([100,200,350,800,400])
def MyPlot(X,Y):
    fig = plt.figure()
    ax = fig.add_subplot(111, aspect='equal')
    ax.plot(X,Y)
    fig.set_size_inches(fig.get_size_inches()*2)
    ax.set_ylim(min(Y)-50, max(Y)+500)
    ax.set_xlim(min(X)-50, max(X)+50)
    MaxPoint=X[np.argmax(Y)], max(Y)
    ax.scatter(MaxPoint[0], MaxPoint[1], s=10)
    ax.text(MaxPoint[0], MaxPoint[1]+100, s='Maximum = '+str(MaxPoint[1]), fontsize=8)
MyPlot(Distance,Elevation)
And then I have another dataset that's scaled differently:
Distance2=Distance*4
Elevation2=Elevation*5
MyPlot(Distance2,Elevation2)
Because of the fact that a unit change is relatively much larger in the first dataset than the second dataset, the text and axis labels do not get formatted as I'd like in the 2nd plot. Is there a way to adjust text position and axis limits that adjusts to the relative scale of the dataset?
First off, for placing text with an offset such as this, you almost never want to use text. Instead, use annotate. The advantage is that you can give an offset of the text in points instead of data units.
Next, to reduce the density of tick locations, use ax.locator_params and change the nbins parameter. nbins controls the tick density. Tick locations will still be automatically chosen, but reducing nbins will reduce the maximum number of tick locations. If you do lower nbins, you may want to also change the numbers that matplotlib considers "even" when picking tick intervals. That way, you have more options to get the expected number of ticks.
Finally, to avoid manually setting limits with a set padding, consider using margins(some_percentage) to pad the extents by a percentage of the current limits.
To show a complete example of all:
import matplotlib.pyplot as plt
import numpy as np
distance=np.array([1000,3000,7000,15000,20000])
elevation=np.array([100,200,350,800,400])
def plot(x, y):
    fig, ax = plt.subplots(figsize=(8, 2))
    # Plot your data and place a marker at the peak location
    maxpoint=x[np.argmax(y)], max(y)
    ax.scatter(maxpoint[0], maxpoint[1], s=10)
    ax.plot(x, y)
    # Reduce the maximum number of ticks and give matplotlib more flexibility
    # in the tick intervals it can choose.
    # Essentially, this will more or less always have two ticks on the y-axis
    # and 4 on the x-axis
    ax.locator_params(axis='y', nbins=3, steps=range(1, 11))
    ax.locator_params(axis='x', nbins=5, steps=range(1, 11))
    # Annotate the peak location. The text will always be 5 points from the
    # data location.
    ax.annotate('Maximum = {:0.0f}'.format(maxpoint[1]), size=8,
                xy=maxpoint, xytext=(5, 5), textcoords='offset points')
    # Give ourselves lots of padding on the y-axis, less on the x
    ax.margins(x=0.01, y=0.3)
    ax.set_ylim(bottom=y.min())
    # Set the aspect of the plot to be equal and add some x/y labels
    ax.set(xlabel='Distance', ylabel='Elevation', aspect=1)
    plt.show()
plot(distance,elevation)
And if we change the data:
plot(distance * 4, elevation * 5)
Finally, you might consider placing the annotation just above the top of the axis, instead of offset from the point:
ax.annotate('Maximum = {:0.0f}'.format(maxpoint[1]), ha='center',
            size=8, xy=(maxpoint[0], 1), xytext=(0, 5),
            textcoords='offset points',
            xycoords=('data', 'axes fraction'))
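As a rough model of why margins adapts to differently scaled datasets: on a linear axis the padding added to each side is the margin times the data interval, so it grows with the data instead of staying at a fixed number of data units. The helper padded_limits below is a hypothetical stand-in for that arithmetic, not matplotlib's own code.

```python
def padded_limits(data, margin):
    """Limits after padding each side by `margin` times the data interval."""
    lo, hi = min(data), max(data)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad

elevation = [100, 200, 350, 800, 400]
small = padded_limits(elevation, 0.3)                    # original dataset
large = padded_limits([5 * e for e in elevation], 0.3)   # rescaled dataset
```

The padding for the rescaled dataset is five times larger in data units, which keeps the peak at the same relative position in both plots.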
Maybe you should use seaborn, which draws plots without borders. I think it's a very good approach.
It will look like this:
You should add the line import seaborn at the beginning of the script.
When I plot a function in matplotlib, the plot is framed by a rectangle. I want the ratio of the length and height of this rectangle to be given by the golden mean, i.e., dx/dy=1.618033...
If the x and y scale are linear I found this solution using google
import numpy as np
import matplotlib.pyplot as pl
golden_mean = (np.sqrt(5)-1.0)/2.0
dy=pl.gca().get_ylim()[1]-pl.gca().get_ylim()[0]
dx=pl.gca().get_xlim()[1]-pl.gca().get_xlim()[0]
pl.gca().set_aspect((dx/dy)*golden_mean,adjustable='box')
If it is a log-log plot I came up with this solution
dy=np.abs(np.log10(pl.gca().get_ylim()[1])-np.log10(pl.gca().get_ylim()[0]))
dx=np.abs(np.log10(pl.gca().get_xlim()[1])-np.log10(pl.gca().get_xlim()[0]))
pl.gca().set_aspect((dx/dy)*golden_mean,adjustable='box')
However, for a semi-log plot, when I call set_aspect, I get
UserWarning: aspect is not supported for Axes with xscale=log, yscale=linear
Can anyone think of a work-around for this?
The simplest solution would be to take the logarithm of your data and then use the method for lin-lin.
You can then label the axes to make it look like a normal log plot.
ticks = np.arange(min_logx, max_logx, 1)
ticklabels = [r"$10^{}$".format(tick) for tick in ticks]
pl.yticks(ticks, ticklabels)
If you have values higher than 10e9, you will need three pairs of braces: two pairs for the LaTeX braces and one for the .format()
ticklabels = [r"$10^{{{}}}$".format(tick) for tick in ticks]
Edit:
If you also want the intermediate ticks within each decade, you want to use the minor ticks as well.
They need to be located at log10(2), log10(3), ..., log10(9), log10(20), log10(30), ...
You can create and set them with:
minor_ticks = []
for i in range(min_exponent, max_exponent):
    for j in range(2, 10):
        minor_ticks.append(i + np.log10(j))
pl.gca().set_yticks(minor_ticks, minor=True)
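Putting the pieces of this answer together for a semi-log plot with a logarithmic x-axis: a sketch, assuming the x data has been replaced by log10(x) and is drawn on ordinary linear axes so that set_aspect is accepted. The helper names semilog_aspect and log_ticks and the example limits are illustrative; unlike np.arange above, log_ticks includes the upper exponent among the major ticks.

```python
import math

GOLDEN = (math.sqrt(5) - 1) / 2   # same constant as golden_mean in the question

def semilog_aspect(x_limits, y_limits):
    """Aspect for linear axes showing log10(x) against y, giving a box whose
    width/height equals the golden ratio."""
    dx = abs(math.log10(x_limits[1]) - math.log10(x_limits[0]))
    dy = abs(y_limits[1] - y_limits[0])
    return (dx / dy) * GOLDEN

def log_ticks(min_exponent, max_exponent):
    """Major and minor tick positions in log10 units, plus major labels."""
    major = list(range(min_exponent, max_exponent + 1))
    labels = [r"$10^{{{}}}$".format(e) for e in major]
    minor = [e + math.log10(j)
             for e in range(min_exponent, max_exponent)
             for j in range(2, 10)]
    return major, labels, minor

aspect = semilog_aspect((1, 1000), (0, 50))   # x from 1 to 1000, y from 0 to 50
major, labels, minor = log_ticks(0, 3)
```

The results would be applied with ax.set_xticks(major), ax.set_xticklabels(labels), ax.set_xticks(minor, minor=True) and ax.set_aspect(aspect, adjustable='box').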
Changing the vertical distance between two subplots using tight_layout(h_pad=-1) changes the total figure size. How can I define the figure size when using tight_layout?
Here is the code:
#define figure
pl.figure(figsize=(10, 6.25))
ax1=subplot(211)
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
subplot(212)
x=linspace(0,50)
pl.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
And here is the result:
If I write
pl.tight_layout(h_pad=-2)
in the last line, then I get this:
As you can see, the figure is bigger...
You can use a GridSpec object to control precisely width and height ratios, as answered on this thread and documented here.
Experimenting with your code, I could produce something like what you want, by using a height_ratio that assigns twice the space to the upper subplot, and increasing the h_pad parameter to the tight_layout call. This does not sound completely right, but maybe you can adjust this further ...
import numpy as np
from matplotlib.pyplot import *
import matplotlib.pyplot as pl
import matplotlib.gridspec as gridspec
#define figure
fig = pl.figure(figsize=(10, 6.25))
gs = gridspec.GridSpec(2, 1, height_ratios=[2,1])
ax1=subplot(gs[0])
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
ax2=subplot(gs[1])
x=np.linspace(0,50)
ax2.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
fig.tight_layout(h_pad=-5)
show()
There were other issues, like correcting the imports, adding numpy, and plotting to ax2 instead of directly with pl. The output I see is this:
This case is peculiar because of the fact that the default aspect ratios of images and plots are not the same. So it is worth noting for people looking to remove the spaces in a grid of subplots consisting of images only or of plots only that you may find an appropriate solution among the answers to this question (and those linked to it): How to remove the space between subplots in matplotlib.pyplot?.
The aspect ratios of the subplots in this particular example are as follows:
# Default aspect ratio of images:
ax1.get_aspect()
# 1.0
# Which is as it is expected based on the default settings in rcParams file:
matplotlib.rcParams['image.aspect']
# 'equal'
# Default aspect ratio of plots:
ax2.get_aspect()
# 'auto'
The size of ax1 and the space beneath it are adjusted automatically based on the number of pixels along the x-axis (i.e. width) so as to preserve the 'equal' aspect ratio while fitting both subplots within the figure. As you mentioned, using fig.tight_layout(h_pad=xxx) or the similar fig.set_constrained_layout_pads(hspace=xxx) is not a good option as this makes the figure larger.
To remove the gap while preserving the original figure size, you can use fig.subplots_adjust(hspace=xxx) or the equivalent plt.subplots(gridspec_kw=dict(hspace=xxx)), as shown in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
np.random.seed(1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25),
gridspec_kw=dict(hspace=-0.206))
# For those not using plt.subplots, you can use this instead:
# fig.subplots_adjust(hspace=-0.206)
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
# Create plot of a line that is aligned with the image above
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())
plt.show()
I am not aware of any way to define the appropriate hspace automatically so that the gap can be removed for any image width. As stated in the docstring for fig.subplots_adjust(), it corresponds to the height of the padding between subplots, as a fraction of the average axes height. So I attempted to compute hspace by dividing the gap between the subplots by the average height of both subplots like this:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Compute negative hspace to close the vertical gap between subplots
ax1_h = ax1_y1-ax1_y0
ax2_h = ax2_y1-ax2_y0
avg_h = (ax1_h+ax2_h)/2
gap = ax1_y0-ax2_y1
hspace=-(gap/avg_h) # this divided by 2 also does not work
fig.subplots_adjust(hspace=hspace)
Unfortunately, this does not work. Maybe someone else has a solution for this.
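One possible route, offered only as a sketch: GridSpec computes the cell height as cell_h = total / (nrows + hspace*(nrows-1)) and the separation as hspace*cell_h, and an equal-aspect image is shrunk and centred inside its cell. Solving for the hspace at which the image touches the plot below gives the helper closing_hspace. This assumes the default subplot parameters (left=0.125, right=0.9, bottom=0.11, top=0.88), an image limited by the axes width, the default centred anchor, two rows, and no tight or constrained layout. It has not been checked against every matplotlib version, and both helper names are hypothetical.

```python
def cell_height(total_height, hspace, nrows=2):
    """Height of one grid cell, following GridSpec's
    cell_h = total / (nrows + hspace * (nrows - 1))."""
    return total_height / (nrows + hspace * (nrows - 1))

def closing_hspace(image_height, total_height):
    """hspace at which a centred image of fixed height touches the axes below
    in a two-row grid: solves hspace*cell_h + (cell_h - image_height)/2 = 0."""
    return (image_height - total_height / 2) / (total_height - image_height / 2)

# Assumed defaults: left=0.125, right=0.9, bottom=0.11, top=0.88
fig_w, fig_h = 10, 6.25
axes_width_in = (0.9 - 0.125) * fig_w            # axes width in inches
image_height = axes_width_in * 10 / 50 / fig_h   # 10x50 image, figure fraction
total_height = 0.88 - 0.11

hspace = closing_hspace(image_height, total_height)
cell_h = cell_height(total_height, hspace)
gap = hspace * cell_h + (cell_h - image_height) / 2   # remaining gap, should vanish
```

For the figure above this gives an hspace of roughly -0.21, close to the hand-tuned -0.206.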
It is also worth mentioning that I tried removing the gap between subplots by editing the y positions like in this example:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Set new y positions: shift ax1 down over gap
gap = ax1_y0-ax2_y1
ax1.set_position([ax1_x0, ax1_y0-gap, ax1_x1, ax1_y1-gap])
ax2.set_position([ax2_x0, ax2_y0, ax2_x1, ax2_y1])
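Unfortunately, this (and variations of this) produces seemingly unpredictable results, including a figure resizing similar to when using fig.tight_layout(). One thing to check is that set_position expects [left, bottom, width, height], while the lists above pass the x1 and y1 corner coordinates in the width and height slots. Beyond that, maybe someone else has an explanation for what is happening here behind the scenes.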