seaborn modify y axis (log scale) to show more values - python

I have plotted a boxplot using seaborn; the y axis uses a log scale (time in milliseconds). I would like to make the y axis clearer by showing more values on it. How could I achieve that? The code used and the graph generated are below.
ax2 = sns.boxplot(x="xVals", y="Time", data=df2, whis=[0, 100])
ax2.set(yscale="log")

Try this (it needs from matplotlib import ticker):
ax2.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax2.xaxis.set_minor_locator(ticker.MultipleLocator(1))
More info:
https://matplotlib.org/api/ticker_api.html
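Note that the snippet above targets the x axis. For the log-scaled y axis in the question, a LogLocator with extra sub-multiples may be closer to what is wanted. The following is only a sketch under assumptions: the line plot and its numbers are made up and stand in for the seaborn boxplot, and ax plays the role of ax2. The choice of 1, 2 and 5 as sub-multiples is just one reasonable option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [3, 40, 500, 900])  # made-up data standing in for the boxplot
ax.set_yscale("log")

# Major ticks at 1, 2 and 5 times each power of ten.
locator = ticker.LogLocator(base=10, subs=(1.0, 2.0, 5.0), numticks=20)
ax.yaxis.set_major_locator(locator)

# Plain-number labels on every major tick; the default log formatter
# may leave the non-decade ticks unlabelled.
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda v, pos: f"{v:g}"))

fig.canvas.draw()
major_ticks = ax.yaxis.get_majorticklocs()
```

With a seaborn plot the same two set_major_* calls should apply to the Axes object that sns.boxplot returns.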

Related

Seaborn histplot uses weird y axis limits?

I'm iterating through all columns of my df to plot their densities to see if and how I need to transform/normalize my data. I'm using Seaborn and this code:
fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(16,40))
fig.tight_layout()  # otherwise the plots overlapped each other and I couldn't see the column names
for i, column in enumerate(df.columns):
    sns.histplot(df[column], ax=axes[i//n_cols, i%n_cols], kde=True, legend=True, fmt='g')
This results in a mostly okay graph, however the scaling of the y axis is waaay too big in some cases:
City 3 and 4 are just fine, however, the highest Count for City 4 is at around 200, yet the plot scales y until 10 000, which makes the data hard to interpret. The x axis also goes way beyond where it should, as the highest cost is at about 1000000, but the plot goes until 25000000. When I plot City 4 separately and force a ylim of 200 and xlim of 1000000 I get a much more understandable plot:
Why is the y axis (and actually, the x axis also) scaled so weirdly, and how can I change my code to scale it down so that I don't get a ylim much higher than the actually displayed data?
Thank you!
Set sharey to False.
Each subplot then scales its y axis to the maximum of its own data.
Example:
fig, axes = plt.subplots(nrows=n_rows, ncols=n_cols, figsize=(16,40), sharey=False)

Plot two datasets at same position based on their index

I'm trying to plot two datasets (called Height and Temperature) on different y axes.
Both datasets have the same length.
Both datasets are linked together by a third dataset, RH.
I have tried to use matplotlib to plot the data using twinx() but I am struggling to align both datasets on the same plot.
Here is the plot I want to align.
The horizontal black line on the figure marks the 0°C level found from Height and was used to test whether both datasets would be aligned when plotted. They are not: there is a noticeable difference between the black line and the 0°C tick from Temperature.
Rather than the two y axes changing independently from each other I would like to plot each index from Height and Temperature at the same y position on the plot.
Here is the code that I used to create the plot:
#Define number of subplots sharing y axis
f, ax1 = plt.subplots()
ax1.minorticks_on()
ax1.grid(which='major',axis='both',c='grey')
#Set axis parameters
ax1.set_ylabel('Height $(km)$')
ax1.set_ylim([np.nanmin(Height), np.nanmax(Height)])
#Plot RH
ax1.plot(RH, Height, label='Original', lw=0.5)
ax1.set_xlabel(r'RH $(\%)$')
ax2 = ax1.twinx()
ax2.plot(RH, Temperature, label='Original', lw=0.5, c='black')
ax2.set_ylabel(r'Temperature ($^\circ$C)')
ax2.set_ylim([np.nanmin(Temperature), np.nanmax(Temperature)])
Any help on this would be amazing. Thanks.
Maybe the atmosphere is wrong. :)
It sounds like you are trying to align the two y axes at particular values. Why are you doing this? The relationship of Height vs. Temperature is non-linear, so I think you are setting the stage for a confusing graph. Any particular line you plot can only be interpreted against one vertical axis.
If needed, I think you will be forced to "do some math" on the limits of the y axes. This link may be helpful:
align scales
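As a rough idea of what "doing some math" on the limits could look like, here is a minimal sketch. The helper aligned_limits and all the numbers are hypothetical, not taken from the question. It picks limits for the second axis so that one chosen value (for example 0 °C) lands at the same height on the plot as one chosen value on the first axis. Only that single pair of values is aligned; since Height vs. Temperature is non-linear, the rest of the two scales will still not correspond.

```python
def aligned_limits(primary_lim, primary_value, secondary_lim, secondary_value):
    """Limits for the secondary axis that put secondary_value at the same
    height on the plot as primary_value on the primary axis."""
    p_lo, p_hi = primary_lim
    s_lo, s_hi = secondary_lim
    # Fractional position (0 = bottom, 1 = top) of the reference value.
    frac = (primary_value - p_lo) / (p_hi - p_lo)
    if not 0 < frac < 1:
        raise ValueError("primary_value must lie strictly inside primary_lim")
    # Smallest span that keeps the whole secondary data range visible.
    span = max((secondary_value - s_lo) / frac,
               (s_hi - secondary_value) / (1 - frac))
    return (secondary_value - frac * span,
            secondary_value + (1 - frac) * span)


# Hypothetical numbers: heights from 0 to 10 km with the 0 °C level at 4 km,
# temperatures from -50 to 20 °C.
new_lim = aligned_limits((0.0, 10.0), 4.0, (-50.0, 20.0), 0.0)
```

The result would then be passed to ax2.set_ylim(new_lim) in place of the nanmin/nanmax limits.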

Matplotlib: how to scale the y axis according to the y-value? [duplicate]

I'm trying to create a histogram of a data column and plot it logarithmically (y-axis) and I'm not sure why the following code does not work:
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('foo.bar')
fig = plt.figure()
ax = fig.add_subplot(111)
plt.hist(data, bins=(23.0, 23.5,24.0,24.5,25.0,25.5,26.0,26.5,27.0,27.5,28.0))
ax.set_xlim(23.5, 28)
ax.set_ylim(0, 30)
ax.grid(True)
plt.yscale('log')
plt.show()
Instead of plt.yscale('log') I've also tried adding Log=true in the plt.hist line, and I tried ax.set_yscale('log'), but nothing seems to work. I get either an empty plot or a y-axis that is indeed logarithmic (with the code as shown above) but with no data plotted (no bins).
Try
plt.yscale('log', nonposy='clip')  # in Matplotlib 3.3 and later the keyword is nonpositive='clip'
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.yscale
The issue is that the bottom of each bar is at y=0, and by default invalid points are masked out when the log transformation is applied (log(0) is undefined). There was discussion of changing this default, but I don't remember which way it went. So when Matplotlib tries to draw the rectangles for your bar plot, the bottom edge is masked out and no rectangles are drawn.
The hist function accepts a log parameter.
You can do this:
plt.hist(data, bins=bins, log=True)
np.logspace(0, 1, ...) returns logarithmically spaced values in [1, 10]. In my case xx is a NumPy vector with values > 0, so the following code does the trick by rescaling those values to [0, max(xx)]:
logbins=np.max(xx)*(np.logspace(0, 1, num=1000) - 1)/9
hh,ee=np.histogram(xx, density=True, bins=logbins)
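If the data are strictly positive, another option is to build the edges directly from the log10 of the data range, so the bins are evenly spaced on a log axis. This is a sketch with synthetic data; xx here is made up and only stands in for the array mentioned above, and the number of edges (30) is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
xx = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # synthetic, strictly positive

# 30 edges spaced evenly in log10 between the smallest and largest value.
logbins = np.logspace(np.log10(xx.min()), np.log10(xx.max()), num=30)
hh, ee = np.histogram(xx, bins=logbins, density=True)

# Equal steps in log10 mean equal-looking bins once the x axis is log scaled.
log_steps = np.diff(np.log10(ee))
```

Unlike the rescaled version, these edges never include 0, which a log axis could not show anyway.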

Plotting KDE with logarithmic x-data in Matplotlib

I want to plot a KDE for data that covers a large range of x values, so I want to use a logarithmic scale for the x axis. For plotting I was using seaborn and the solution from Plotting 2D Kernel Density Estimation with Python, both of which fail once I set the xscale to logarithmic. When I take the logarithm of my x data beforehand, everything looks fine, except that the ticks and tick labels are still linear, with the logarithm of the actual values as the labels. I could manually change the ticks using something like:
labels = np.array(ax.get_xticks().tolist(), dtype=np.float64)
new_labels = [r'$10^{%.1f}$' % (labels[i]) for i in range(len(labels))]
ax.set_xticklabels(new_labels)
but to my eyes that looks wrong and is nothing like the axis labels (including the minor ticks) I would get by just using
ax.set_xscale('log')
Is there an easier way to plot a KDE with logarithmic x data? Or is it possible to change only the tick or label scale without changing the scaling of the data, so that I could plot the logarithmic values of x and change the scaling of the labels afterwards?
Edit:
The plot I want to create looks like this:
The two right columns are what it is supposed to look like. There I used the x data with the logarithm already applied. I don't like the labels on the x axis, though.
The left column shows the plots when the original data is used for the KDE and all the other plots, and the scale is changed afterwards using
ax.set_xscale('log')
For some reason the KDE does not look the way it is supposed to. This is not a result of erroneous data either, since it looks just fine when the logarithmic data is used.
Edit 2:
A working example of code is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = np.random.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]
fig, ax = plt.subplots(2, 1)
sns.kdeplot(data=np.log10(x), data2=y, ax=ax[0])
sns.kdeplot(data=x, data2=y, ax=ax[1])
ax[1].set_xscale('log')
plt.show()
The ax[1] plot is not displayed correctly for me (the x-axis is inverted), but the general behavior is the same as for the case described above. I believe the problem lies with the bandwidth of the kde, which should probably account for the logarithmic x-data.
I found an answer that works for me and wanted to post it in case someone else has a similar problem.
Based on the accepted answer from this post, I defined a function that first applies the logarithm to the x-data and, after the KDE is performed, transforms the x-values back to the original values. Afterwards I can simply plot the contours and use ax.set_xscale('log').
import numpy as np
import scipy.stats as st
def logx_kde(x, y, xmin, xmax, ymin, ymax):
    # xmin and xmax are grid limits in log10 units, because the KDE is evaluated on log10(x)
    x = np.log10(x)
    # Perform the kernel density estimate
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    # Transform the grid back to the original x values
    return np.power(10, xx), yy, f
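A possible way to call this function, sketched with synthetic data modelled on the question's example. The function is repeated so the snippet runs on its own. Choosing the grid limits from the data range is my assumption; the point to note is that xmin and xmax have to be given as log10 values, because the grid is built in log space before being transformed back.

```python
import numpy as np
import scipy.stats as st


def logx_kde(x, y, xmin, xmax, ymin, ymax):
    # Same function as in the answer; xmin and xmax are log10 values.
    x = np.log10(x)
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    positions = np.vstack([xx.ravel(), yy.ravel()])
    kernel = st.gaussian_kde(np.vstack([x, y]))
    f = np.reshape(kernel(positions).T, xx.shape)
    return np.power(10, xx), yy, f


# Synthetic data modelled on the question's example.
rng = np.random.default_rng(0)
data = rng.multivariate_normal((0, 0), [[0.8, 0.05], [0.05, 0.7]], 100)
x = np.power(10, data[:, 0])
y = data[:, 1]

# Grid limits: log10 of the x range, plain y range.
log_x = np.log10(x)
xx, yy, f = logx_kde(x, y, log_x.min(), log_x.max(), y.min(), y.max())
```

The returned arrays can then go to something like ax.contour(xx, yy, f), followed by ax.set_xscale('log').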

Setting both axes logarithmic in bar plot matplotlib

I have already binned data to plot a histogram. For this reason I'm using the plt.bar() function. I'd like to set both axes in the plot to a logarithmic scale.
If I set plt.bar(x, y, width=10, color='b', log=True), the y-axis becomes logarithmic, but I can't set the x-axis to a logarithmic scale.
I've tried plt.xscale('log'), but unfortunately this doesn't work right: the x-axis ticks vanish and the bars don't have equal widths.
I would be grateful for any help.
By default, the bars of a barplot have a width of 0.8. Therefore they appear larger for smaller x values on a logarithmic scale. If instead of specifying a constant width, one uses the distance between the bin edges and supplies this to the width argument, the bars will have the correct width. One would also need to set the align to "edge" for this to work.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(1)
x = np.logspace(0, 5, num=21)
y = (np.sin(1.e-2*(x[:-1]-20))+3)**10
fig, ax = plt.subplots()
ax.bar(x[:-1], y, width=np.diff(x), log=True,ec="k", align="edge")
ax.set_xscale("log")
plt.show()
I cannot reproduce missing ticklabels for a logarithmic scaling. This may be due to some settings in the code that are not shown in the question or due to the fact that an older matplotlib version is used. The example here works fine with matplotlib 2.0.
If the goal is to have equal width bars, assuming datapoints are not equidistant, then the most proper solution is to set width as
plt.bar(x, y, width=c*np.array(x), color='b', log=True) for a constant c appropriate for the plot. Alignment can be anything.
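To see why a width proportional to x gives bars that look equally wide on a log axis, here is a small numerical check. The x positions and the constant c are made up for illustration. An edge-aligned bar spans x to x*(1 + c), so its width in log10 units is log10(1 + c) regardless of x. A centered bar spans x*(1 - c/2) to x*(1 + c/2), which is also a fixed ratio.

```python
import numpy as np

x = np.array([1.0, 7.0, 30.0, 400.0, 9000.0])  # made-up, unevenly spaced positions
c = 0.5
width = c * x

# Apparent width on a log10 axis for edge-aligned bars: [x, x + width].
edge_log_width = np.log10(x + width) - np.log10(x)

# Apparent width for center-aligned bars: [x - width/2, x + width/2].
center_log_width = np.log10(x + width / 2) - np.log10(x - width / 2)
```

Both arrays come out constant across x, which is why the alignment does not matter for the equal-width property (it only shifts where the bar sits relative to x).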
I know this is a very old question and you might have solved it already, but I came to this post because I had a similar problem on the y axis, and I managed to solve it just by using ax.set_ylim(df['my data'].min()+100, df['my data'].max()+100). On the y axis I have data that I thought would be best shown on a log scale, but when I set the log scale I couldn't see the numbers properly (as in this post for the x axis), so I dropped the idea of using a log scale and used the min and max arguments instead. This scales my graph much like a log scale would. I'm still looking for a way that doesn't need the +100 offsets in set_ylim.
While this does not actually use pyplot.bar, I think this method could be helpful in achieving what the OP is trying to do. I found this to be easier than trying to calibrate the width as a function of the log-scale, though it's more steps. Create a line collection whose width is independent of the chart scale.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.collections as coll
#Generate data and sort into bins
a = np.random.logseries(0.5, 1000)
hist, bin_edges = np.histogram(a, bins=20, density=False)
x = bin_edges[:-1] # remove the top-end from bin_edges to match dimensions of hist
lines = []
for i in range(len(x)):
    pair=[(x[i],0), (x[i], hist[i])]
    lines.append(pair)
linecoll = coll.LineCollection(lines, linewidths=10, linestyles='solid')
fig, ax = plt.subplots()
ax.add_collection(linecoll)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlim(min(x)/10,max(x)*10)
ax.set_ylim(0.1,1.1*max(hist)) #since this is an unweighted histogram, the logy doesn't make much sense.
Resulting plot - no frills
One drawback is that the "bars" will be centered, but this could be changed by offsetting the x-values by half of the linewidth value ... I think it would be
x_new = x + (linewidth/2)*10**round(np.log10(x),0).
