Seaborn jointplot axis on log scale with kind="hex"

I'd like to show the chart below, but with the x-axis on a log scale.
df = pd.DataFrame(np.random.randint(0,100,size=(100, 2)), columns=list('XY'))
sns.jointplot(data=df,x='X',y='Y',height=3,kind='hex')
To be clear, I don't want to log X first, rather I want the numbers to stay the same but the distance between the axis ticks to change. In altair, it would look like the following (I can't do hex in altair, although please correct me if I'm wrong on that):
EDIT: Matt suggested adding xscale="log". That gets me very nearly there. I just need a way to go from powers of ten to normal integers in the tick labels.

You can use the xscale="log" keyword argument, which gets passed to the Matplotlib hexbin function that is used under-the-hood by seaborn. E.g.,
sns.jointplot(data=df, x='X', y='Y', height=3, kind='hex', xscale="log")
As stated in the comments, there are various ways to set the axis tick labels to not be in scientific format. The simplest is to do:
import matplotlib.ticker as mticker
grid = sns.jointplot(data=df, x='X', y='Y', height=3, kind='hex', xscale="log")
grid.ax_joint.xaxis.set_major_formatter(mticker.ScalarFormatter())
If instead you want, e.g., 1000 to be formatted with a comma so that it is 1,000, you could instead do:
grid.ax_joint.xaxis.set_major_formatter(mticker.StrMethodFormatter("{x:,.0f}"))

Related

How to ensure even spacing between labels on x axis of matplotlib graph?

I have been given some data for which I need to plot a histogram. I used the pandas hist() function and plotted it using matplotlib. The code runs on a remote server, so I cannot view the figure directly and save it as an image instead. Here is what the image looks like
Here is my code below
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)  # raw_data is the data supplied to me
plt.savefig('/path/to/file.png')
plt.close()
As you can see the x axis labels are overlapping. So I used this function plt.tight_layout() like so
import matplotlib.pyplot as plt
df_hist = pd.DataFrame(np.array(raw_data)).hist(bins=5)
plt.tight_layout()
plt.savefig('/path/to/file.png')
plt.close()
There is some improvement now
But the labels are still too close. Is there a way to ensure the labels do not touch each other and that there is fair spacing between them? I also want to resize the image to make it smaller.
I checked the documentation here https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html but I'm not sure which parameter to use for savefig.

Since raw_data is not already a pandas dataframe there's no need to turn it into one to do the plotting. Instead you can plot directly with matplotlib.
There are many different ways to achieve what you'd like. I'll start by setting up some data which looks similar to yours:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gamma
raw_data = gamma.rvs(a=1, scale=1e6, size=100)
If we go ahead and use matplotlib to create the histogram we may find the xticks too close together:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
ax.hist(raw_data, bins=5)
fig.tight_layout()
The xticks are hard to read with all the zeros, regardless of spacing. So, one thing you may wish to do would be to use scientific formatting. This makes the x-axis much easier to interpret:
ax.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
Another option, without using scientific formatting would be to rotate the ticks (as mentioned in the comments):
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
Finally, you also mentioned altering the size of the image. Note that this is best done when the figure is initialised. You can set the size of the figure with the figsize argument. The following would create a figure 5" wide and 3" in height:
fig, ax = plt.subplots(1, 1, figsize=[5, 3])
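As a rough sketch of how the pieces above combine, and of how the saved image size follows from figsize and the dpi argument of savefig, consider the following. The gamma-distributed stand-in data and the in-memory buffer are assumptions made for illustration; in practice you would pass your own file path to savefig.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for raw_data (assumption: skewed values around 1e6).
rng = np.random.default_rng(0)
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

fig, ax = plt.subplots(figsize=(5, 3))  # figure size in inches
ax.hist(raw_data, bins=5)
ax.ticklabel_format(style="sci", axis="x", scilimits=(0, 0))
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()

# The pixel size of the saved image is figsize multiplied by dpi.
buffer = io.BytesIO()  # pass a file path instead to write to disk
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)

# Read the width and height back from the PNG header.
png = buffer.getvalue()
width = int.from_bytes(png[16:20], "big")
height = int.from_bytes(png[20:24], "big")
print(width, height)
```

Lowering dpi shrinks the saved image without changing the layout in inches, while lowering figsize gives the labels less room, so the two are worth adjusting separately.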

I think the two best fixes were mentioned by Pam in the comments.
You can rotate the labels with
plt.xticks(rotation=45)
For more information, look here: Rotate axis text in python matplotlib
The real problem is too many zeros that don't provide any extra info. Numpy arrays are pretty easy to work with, so pd.DataFrame(np.array(raw_data)/1000).hist(bins=5) should remove three zeros from the x-axis values (the y-axis shows counts, so it is unaffected). Then just add a 'kilo' to the x-axis label.
To change the size of the graph use rcParams.
from matplotlib import rcParams
rcParams['figure.figsize'] = 7, 5.75 #the numbers are the dimensions
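An alternative to dividing the data by 1000 is to leave the data alone and only change how the ticks are labelled. The sketch below does this with a FuncFormatter; the stand-in data and the name in_thousands are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Hypothetical stand-in for raw_data.
rng = np.random.default_rng(0)
raw_data = rng.gamma(shape=1.0, scale=1e6, size=100)

# Label ticks in thousands while leaving the plotted values untouched.
in_thousands = FuncFormatter(lambda x, pos: f"{x / 1000:,.0f}k")

fig, ax = plt.subplots(figsize=(7, 5.75))
ax.hist(raw_data, bins=5)
ax.xaxis.set_major_formatter(in_thousands)
ax.tick_params(axis="x", rotation=45)
```

Because the formatter is applied at draw time, the labels stay correct if the axis limits change later.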

MemoryError with numpy arange

I want to create an array of powers of 10 as a label for the y axis of a plot.
I am using the plt.yticks() with matplotlib imported as plt but this does not matter here anyway.
I have plots where the y axis varies from 1e3 to 1e15. Those are log plots.
Matplotlib automatically displays ticks at every second power of ten (1e3, 1e5, ...), and I want a tick at every power of ten instead (in order to be able to use the minor ticks properly).
I want to use the plt.yticks(numpy.arange(1e3, 1e15, 10)) command as said, but numpy.arange(1e3, 1e15, 10) results in a MemoryError. Isn't it supposed to output an array of length 13? Why does the memory fill up?
How can I get around this issue without building the array manually?
I also tried using built-in range but it won't work with floats.
Thank you.

Try the logspace from NumPy as
plt.yticks(numpy.logspace(3, 15, 13))
Here you give the first and the last exponent (powers of 10) and the total number of points, endpoints included. If you print the above array, you get the following
array([1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08, 1.e+09, 1.e+10,
1.e+11, 1.e+12, 1.e+13, 1.e+14, 1.e+15])
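To see why arange runs out of memory here, it may help to work out how many elements it would have to allocate. The sketch below is only back-of-the-envelope arithmetic under the assumption of 8 bytes per float64 element; it never builds the large array.

```python
import math
import numpy as np

start, stop, step = 1e3, 1e15, 10

# arange steps additively (1000, 1010, 1020, ...), so it would need
# this many elements:
n_elements = math.ceil((stop - start) / step)
approx_terabytes = n_elements * 8 / 1e12  # 8 bytes per float64
print(n_elements, approx_terabytes)

# logspace steps multiplicatively: one value per power of ten.
ticks = np.logspace(3, 15, num=13)
print(len(ticks))
```

In other words, the step of 10 is added rather than multiplied, which is why the result is nowhere near 13 elements long.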

You could also just do:
10. ** np.arange(3,16)
The decimal point is important: without it, the result uses NumPy's default integer dtype, which overflows for the larger powers on platforms where that default is int32.
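The overflow can be reproduced on any platform by forcing a 32-bit integer dtype, as in the sketch below. The explicit astype(np.int32) is an assumption used to mimic a platform with a 32-bit default integer; it is not needed in real code.

```python
import numpy as np

exponents = np.arange(3, 16)

# Float base: the result is a float64 array, which has ample range for 1e15.
as_float = 10.0 ** exponents

# Forcing int32 wraps around silently once the result exceeds 2**31 - 1,
# which happens from 10**10 onwards.
as_int32 = 10 ** exponents.astype(np.int32)

print(as_float[-1])
print(as_int32[-1])
```

The integer version raises no error, so the wrong tick positions would only show up in the plot itself.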

An alternative way to do this, rather than explicitly defining the tick positions, is to use a LogLocator from the matplotlib.ticker module, and manually increase the number of ticks (by default, it will try to set a nice-looking number of ticks; i.e. so it doesn't look too cramped).
In this example, I set the number of ticks to 13 on the Axes on the right (using numticks=13), and you can see this increases the number of ticks so there is one on each integer power of 10.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# Create figure and axes
fig, (ax1, ax2) = plt.subplots(ncols=2)
# Make yscale logarithmic
ax1.set_yscale('log')
ax2.set_yscale('log')
# Set y limits
ax1.set_ylim(1e3, 1e15)
ax2.set_ylim(1e3, 1e15)
# On ax2, lets tell the locator how many ticks we want
ax2.yaxis.set_major_locator(ticker.LogLocator(numticks=13))
ax1.set_title('default ticks')
ax2.set_title('LogLocator with numticks=13')
plt.show()
EDIT:
To add minor ticks with this method, we can use another LogLocator, and this time set the subs option to say where we want minor ticks in each decade. Here I haven't set minor ticks on every 0.1 because it would be too cramped, so I have only done it for a subset. Note that if you set minor ticks like this, you also need to turn off tick labels for the minor ticks, using a NullFormatter.
ax2.yaxis.set_major_locator(ticker.LogLocator(numticks=13))
ax2.yaxis.set_minor_locator(ticker.LogLocator(subs=(0.2,0.4,0.6,0.8),numticks=13))
ax2.yaxis.set_minor_formatter(ticker.NullFormatter())

In this case the function logspace from numpy is more suitable.
The answer to the example is
np.logspace(3,15,num=15-3+1, endpoint=True)

Pyplot how to reduce xticks and xticklabels density?

I have to plot several curves with very high xtick density, say 1000 date strings. To prevent these tick labels overlapping each other I manually set them to be 60 dates apart. Code below:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
fig = plt.figure(1)
ax = plt.subplot(1, 1, 1)
tick_spacing = 60
for i in range(5):
    plt.plot(ts_index, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
fig.savefig(r".\net_value_curves.png", )
fig.clf()
I'm running this piece of code in PyCharm Community Edition 2017.2.2 with a Python 3.6 kernel. Now comes the funny thing: whenever I ran the code in the normal "run" mode (i.e. just hit the execution button and let the code run "freely" till interruption or termination), then the figure I got would always miss xticklabels:
However, if I ran the code in "debug" mode and ran it step by step then I would get an expected figure with complete xticklabels:
This is really weird. Anyway, I just hope to find a way that can ensure me getting the desired output (the second figure) in the normal "run" mode. How can I modify my current code to achieve this?
Thanks in advance!

Your x axis data are strings. Hence you will get one tick per data point. This is probably not what you want. Instead use the dates to plot. Because you are using pandas, this is easily converted,
dates = pd.to_datetime(ts_index, format="%Y%m%d")
You may then get rid of your manual xtick locating and formatting, because matplotlib will automatically choose some nice tick locations for you.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts_index = pd.period_range(start="20060429", periods=1000).strftime("%Y%m%d")
dates = pd.to_datetime(ts_index, format="%Y%m%d")
fig, ax = plt.subplots()
for i in range(5):
    plt.plot(dates, 1 + i * 0.01 * np.arange(0, 1000), label="group %d"%i)
plt.legend(loc='best')
plt.title(r'net value curves')
plt.xticks(rotation="vertical")
plt.xlabel(r'date')
plt.ylabel('net value')
plt.grid(True)
plt.show()
However in case you do want to have some manual control over the locations and formats you may use matplotlib.dates locators and formatters.
import matplotlib.dates as mdates
# tick every 3 months
plt.gca().xaxis.set_major_locator(mdates.MonthLocator((1,4,7,10)))
# format as "%Y%m%d"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%Y%m%d"))
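For reference, a self-contained sketch of the manual approach could look like the following. It builds the dates directly with pd.date_range instead of converting strings, which is an assumption made to keep the example short; the rest follows the snippets above.

```python
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

dates = pd.date_range(start="2006-04-29", periods=1000)  # daily timestamps

fig, ax = plt.subplots()
for i in range(5):
    ax.plot(dates, 1 + i * 0.01 * np.arange(1000), label=f"group {i}")

# Major tick on the first day of January, April, July and October.
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 4, 7, 10)))
formatter = mdates.DateFormatter("%Y%m%d")
ax.xaxis.set_major_formatter(formatter)

ax.tick_params(axis="x", rotation=90)
ax.legend(loc="best")

# The formatter turns Matplotlib's internal date numbers back into strings.
example_label = formatter(mdates.date2num(datetime(2007, 1, 1)))
print(example_label)
```

Since the x values are real dates here, the tick count stays small no matter how many points are plotted.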

In general, the Axis object computes and places ticks using a Locator object. Locators and Formatters are meant to be easily replaceable, with appropriate methods of Axis. The default Locator does not seem to be doing the trick for you so you can replace it with anything you want using axes.xaxis.set_major_locator. This problem is not complicated enough to write your own, so I would suggest that MaxNLocator fits your needs fairly well. Your example seems to work well with nbins=16 (which is what you have in the picture, since there are 17 ticks).
You need to add an import:
from matplotlib.ticker import MaxNLocator
You need to replace the block
xticks = ax.get_xticks()
xlabels = ax.get_xticklabels()
ax.set_xticks(xticks[::tick_spacing])
ax.set_xticklabels(xlabels[::tick_spacing])
with
ax.xaxis.set_major_locator(MaxNLocator(nbins=16))
or just
ax.xaxis.set_major_locator(MaxNLocator(16))
You may want to play around with the other arguments (all of which have to be keywords, except nbins). Pay particular attention to the integer argument.
Note that for the Locator and Formatter APIs we work with an Axis object, not Axes. Axes is the whole plot, while Axis is the thing with the spines on it. Axes usually contains two Axis objects and all the other stuff in your plot.
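A minimal sketch of this replacement on string-valued x data is shown below. It plots a single curve rather than five, and the choice of integer=True is an assumption made so that ticks fall on whole category positions; the data mirrors the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# 1000 date strings, as in the question. Matplotlib treats them as
# categories located at positions 0, 1, 2, ...
labels = list(pd.date_range(start="2006-04-29", periods=1000).strftime("%Y%m%d"))

fig, ax = plt.subplots()
ax.plot(labels, 1 + 0.01 * np.arange(1000))

# Ask for at most 16 intervals, restricted to integer positions so that
# each tick corresponds to a whole category index.
ax.xaxis.set_major_locator(MaxNLocator(nbins=16, integer=True))
ax.tick_params(axis="x", rotation=90)

tick_positions = ax.get_xticks()
print(len(tick_positions))
```

Only the locator is swapped; the existing category formatter still turns each tick position back into its date string.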

You can set the visibility of the xticks-labels to False
for label in plt.gca().xaxis.get_ticklabels()[::N]:
    label.set_visible(False)
This will set every Nth label invisible.
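If the goal is the reverse, that is, to keep only every Nth label and hide the rest, the same idea can be applied with an index test. The sketch below uses hypothetical category names and N = 5, and assumes the labels are hidden after the data has been plotted.

```python
import matplotlib.pyplot as plt

categories = [f"item {i}" for i in range(20)]  # hypothetical labels
N = 5

fig, ax = plt.subplots()
ax.plot(categories, range(20))

# Keep every Nth label visible and hide all the others.
for i, label in enumerate(ax.xaxis.get_ticklabels()):
    label.set_visible(i % N == 0)

visible_count = sum(
    tick.label1.get_visible() for tick in ax.xaxis.get_major_ticks()
)
print(visible_count)
```

Note that this hides labels only; the tick marks themselves are still drawn at every position.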

How to improve this seaborn countplot?

I used the following code to generate the countplot in python using seaborn:
sns.countplot( x='Genres', data=gn_s)
But I got the following output:
I can't see the items on x-axis clearly as they are overlapping. How can I correct that?
Also I would like all the items to be arranged in a decreasing order of count. How can I achieve that?

You can make the x-axis tick labels vertical, for example:
g = sns.countplot( x='Genres', data=gn_s)
g.set_xticklabels(g.get_xticklabels(),rotation=90)
Or, you can also do:
plt.xticks(rotation=90)

Bring in matplotlib to set up an axis ahead of time, so that you can modify the axis tick labels by rotating them 90 degrees and/or changing font size. To arrange your samples in order, you need to modify the source. I assume you're starting with a pandas dataframe, so something like:
gn_s = gn_s.sort_values(by='Genres', ascending=False)  # sorts by genre name, not by count
labels = gn_s['Genres'].unique()  # labels in the order the bars will be drawn
fig, ax1 = plt.subplots(1,1)
sns.countplot( x='Genres', data=gn_s, ax=ax1)
ax1.set_xticklabels(labels, rotation=90)
would probably help.
edit Taking andrewnagyeb's suggestion from the comments to order the plot:
sns.countplot( x='Genres', data=gn_s, order = gn_s['Genres'].value_counts().index)
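A small self-contained sketch of ordering by count and rotating the labels together is given below. The three-genre DataFrame is a made-up stand-in for gn_s, and it assumes seaborn is installed.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for gn_s: one row per title.
gn_s = pd.DataFrame({"Genres": ["Drama"] * 5 + ["Action"] * 1 + ["Comedy"] * 3})

# value_counts() sorts by descending count, so its index gives the bar order.
order = gn_s["Genres"].value_counts().index

fig, ax = plt.subplots()
sns.countplot(x="Genres", data=gn_s, order=order, ax=ax)
ax.tick_params(axis="x", rotation=90)  # rotate labels so they do not overlap

print(list(order))
```

Passing order explicitly leaves the DataFrame untouched, so the same data can be reused for other plots.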

What is the correct way to replace matplotlib tick labels with computed values?

I have a figure with a log axis
and I would like to relabel the axis ticks with logs of the values, rather than the values themselves
The way I've accomplished this is with
plt.axes().set_xticklabels([math.log10(x) for x in plt.axes().get_xticks()])
but I wonder if there isn't a less convoluted way to do this.
What is the correct idiom for systematically relabeling ticks on matplotlib plots with values computed from the original tick values?

Look into the Formatter classes. Unless you are putting text on your ticks you should almost never directly use set_xticklabels or set_yticklabels. Doing so completely decouples your tick labels from your data: if you adjust the view limits, the tick labels will remain the same.
In your case, a formatter already exists for this:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
fig, ax = plt.subplots()
ax.loglog(np.logspace(0, 5), np.logspace(0, 5)**2)
ax.xaxis.set_major_formatter(matplotlib.ticker.LogFormatterExponent())
See the matplotlib.ticker.LogFormatterExponent documentation.
In general you can use FuncFormatter. For an example of how to use FuncFormatter, see "matplotlib: change yaxis tick labels", which is one of many examples floating around SO.
A concise example for what you want, lifted directly from JoeKington in the comments:
from math import log10
from matplotlib.ticker import FuncFormatter
ax.xaxis.set_major_formatter(
    FuncFormatter(lambda x, pos: '{:0.1f}'.format(log10(x))))
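To illustrate why a formatter is preferable to fixed label strings, here is a small runnable sketch. The name log_label and the chosen view limits are assumptions for illustration only.

```python
from math import log10

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

fig, ax = plt.subplots()
x = np.logspace(0, 5)
ax.loglog(x, x**2)

# Label each x tick with log10 of its value. The label is computed from
# the tick value each time the figure is drawn.
log_label = FuncFormatter(lambda value, pos: f"{log10(value):0.1f}")
ax.xaxis.set_major_formatter(log_label)

# Changing the view afterwards keeps the labels in sync with the ticks.
ax.set_xlim(1e2, 1e4)
print(log_label(1000, 0))
```

With set_xticklabels, the same change of limits would have left the old labels attached to new tick positions.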

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
