Use locale with seaborn - python

Currently I'm trying to visualize some data I am working on with seaborn. I need to use a comma as decimal separator, so I was thinking about simply changing the locale. I found this answer to a similar question, which sets the locale and uses matplotlib to plot some data.
This also works for me, but when using seaborn instead of matplotlib directly, it doesn't use the locale anymore. Unfortunately, I can't find any setting to change in seaborn or any other workaround. Is there a way?
Here is some example data. Note that I had to use 'german' instead of "de_DE". The x-axis labels all use the standard point as the decimal separator.
import locale
# Set to German locale to get comma decimal separator
locale.setlocale(locale.LC_NUMERIC, 'german')
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Tell matplotlib to use the locale we set above
plt.rcParams['axes.formatter.use_locale'] = True
df = pd.DataFrame([[1,2,3],[4,5,6]]).T
df.columns = [0.3,0.7]
sns.boxplot(data=df)

The "numbers" shown on the x axis for such boxplots are determined via a matplotlib.ticker.FixedFormatter (you can check this with print(ax.xaxis.get_major_formatter())).
This fixed formatter simply puts labels on the ticks one by one from a list of labels. This makes sense because your boxes are positioned at 0 and 1, yet you want them to be labeled 0.3 and 0.7. The concept probably becomes clearer when you think about what should happen for a dataframe with df.columns=["apple","banana"].
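To see why the locale has no effect, here is a small sketch in plain matplotlib (no seaborn needed) that calls a FixedFormatter directly. It returns the stored string for the tick position and ignores the numeric tick value, so there is no step at which a locale could be applied. The strings "0.3" and "0.7" simply mirror the example's column names.

```python
import matplotlib.ticker as ticker

# A FixedFormatter holds a plain list of strings, one per tick position.
fmt = ticker.FixedFormatter(["0.3", "0.7"])

# It is called as fmt(tick_value, tick_position); only the position is used.
first = fmt(0, 0)           # box at x=0 -> "0.3"
second = fmt(1, 1)          # box at x=1 -> "0.7"
ignored = fmt(123.456, 0)   # a different value at position 0 still gives "0.3"
missing = fmt(0, None)      # no position given -> empty string

print(first, second, ignored, repr(missing))
```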
So the FixedFormatter ignores the locale, because it just takes the labels as they are. The solution I would propose here (although some of those in the comments are equally valid) would be to format the labels yourself.
ax.set_xticklabels(["{:n}".format(l) for l in df.columns])
The n format is the same as the usual g format, but it takes the locale into account (see the Python format specification mini-language). Of course, any other format of your choice works as well. Also note that setting the labels via ax.set_xticklabels only works here because boxplot places the boxes at fixed locations. For other types of plots with continuous axes this is not recommended; use the concepts from the linked answers instead.
Complete code:
import locale
# Set to German locale to get comma decimal separator
locale.setlocale(locale.LC_NUMERIC, 'german')
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame([[1,2,3],[4,5,6]]).T
df.columns = [0.3,0.7]
ax = sns.boxplot(data=df)
ax.set_xticklabels(["{:n}".format(l) for l in df.columns])
plt.show()
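Since the question notes that 'german' worked where "de_DE" did not, a slightly more portable variant is to try several locale names in turn. The sketch below is an assumption-laden helper, not part of the answer above: the name locale_labels and the candidate list are hypothetical ('german' is typical on Windows, 'de_DE.UTF-8' on Linux and macOS). It formats the values with "{:n}" under the first locale that exists, giving labels such as 0,3 with a German locale, and restores the previous LC_NUMERIC setting afterwards so the rest of the program is unaffected.

```python
import locale


def locale_labels(values, candidates=("de_DE.UTF-8", "de_DE", "german")):
    """Hypothetical helper: format numbers with "{:n}" under the first
    available locale from `candidates` (names are assumptions).

    If none of the names is installed, the current LC_NUMERIC locale is
    used unchanged. The previous LC_NUMERIC setting is always restored.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
                break
            except locale.Error:
                continue  # this locale name is not available on this system
        # "n" behaves like "g" but uses the locale's decimal separator
        return ["{:n}".format(v) for v in values]
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


print(locale_labels([0.3, 0.7]))
```

With the boxplot from the complete code, the call would become ax.set_xticklabels(locale_labels(df.columns)).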

Related

Seaborn lineplot unexpected behaviour

I am hoping to understand why the following Seaborn lineplot behaviour occurs.
Spikes occur throughout the time series, and additional data appears to the left of the actual data.
How can I prevent this unexpected behaviour in Seaborn?
Regular plot of data:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
aussie_property[['Sydney(SYDD)']].plot();
Seaborn plot of data:
sns.lineplot(data=aussie_property, x='date', y='Sydney(SYDD)');
This is not a seaborn problem but a question of ambiguous datetimes.
Convert the date column to datetime objects with the following code:
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], dayfirst=True)
and you get the expected plot with seaborn.
Generally, it is advisable to provide the format during datetime conversions, e.g.,
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], format="%d/%m/%Y")
because, as we have seen here, dates like 10/12/2020 are ambiguous. The parser first assumed the data was month/day/year, later noticed this could not be the case, and switched to parsing your input as day/month/year, which gives rise to these time-travelling spikes in your seaborn graph. Why didn't you see them in the pandas plot, you ask? Well, that plot is drawn against the index, so the conversion problem goes unnoticed there.
More information on the format codes can be found in the Python documentation.
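A minimal sketch of the explicit-format approach, assuming the raw strings look like the day/month/year values described above (the sample values are made up for illustration). With format="%d/%m/%Y", 10/12/2020 is always read as 10 December 2020, and a string that does not match the format raises an error instead of being silently reinterpreted.

```python
import pandas as pd

# Hypothetical sample in the same day/month/year layout as the 'Date' column.
raw = pd.Series(["10/12/2020", "11/12/2020", "13/12/2020"])

# An explicit format removes the ambiguity: every value is day/month/year.
dates = pd.to_datetime(raw, format="%d/%m/%Y")
print(dates.tolist())

# With an explicit format, non-matching input raises instead of being guessed.
rejected = False
try:
    pd.to_datetime(pd.Series(["2020-12-10"]), format="%d/%m/%Y")
except ValueError as err:
    rejected = True
    print("rejected:", err)
```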

Seaborn bar plot - different y axis values?

I am very new to coding and really stuck with a graph I am trying to produce for a Uni assignment.
This is what it looks like:
I am pretty happy with the styling; my concern is with the y-axis. I understand that because one value is much higher than the rest, it is difficult to see the true values of the bars further down the scale.
Is there any way to change this?
Or can anyone recommend a different graph type that may show this data more clearly?
Thanks!
You can try using a combination of ScalarFormatter on the y-axis and MultipleLocator to set the spacing of the y-axis ticks. You can read more about such customisations in the article "Customising tricks for visualising data in Python".
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns  # seaborn.apionly no longer exists in current seaborn; import seaborn directly
# PoliceForce and TotalNRMReferrals are your own data; change this call to match how you are plotting
ax_data = sns.barplot(x=PoliceForce, y=TotalNRMReferrals)
# Place a major tick every 40 units; change 40 to the tick spacing you want
ax_data.yaxis.set_major_locator(ticker.MultipleLocator(40))
ax_data.yaxis.set_major_formatter(ticker.ScalarFormatter())
plt.show()
Based on your current graph, I would suggest a smaller tick interval (try values lower than 100, say 50). This would present the graph in a more readable fashion. I hope this helps answer your question.
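Here is a self-contained sketch of the same locator/formatter calls with made-up values (one bar much taller than the rest, as in the question). It uses matplotlib's ax.bar so it runs without the original data; sns.barplot returns a matplotlib Axes, so the two tick calls apply to it unchanged. The labels and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Hypothetical data: one bar dwarfs the others.
labels = ["Force A", "Force B", "Force C", "Force D"]
values = [820, 95, 60, 40]

fig, ax = plt.subplots()
ax.bar(labels, values)  # sns.barplot(...) would return an Axes accepting the same calls

# A major tick every 50 units, labelled as plain numbers.
ax.yaxis.set_major_locator(ticker.MultipleLocator(50))
ax.yaxis.set_major_formatter(ticker.ScalarFormatter())

fig.canvas.draw()  # compute the ticks and their labels
ticks = ax.get_yticks()
print(ticks)
```

Note that the locator only changes where the ticks go; the short bars stay the same height relative to the tall one.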

Pandas timeseries plot showing abnormal characters

I used pandas to plot some random time-series data, and I found that it was showing each month as a number followed by a square. Is there any way to fix this?
Here is the code:
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
>>> ts.plot()
<matplotlib.axes._subplots.AxesSubplot object at 0xb2072b0c>
>>> plt.show()
Here is the plot:
This can happen if your system locale is set to a non-English one. Given your name, I am going to assume you might be using a Chinese locale. In that case, Pandas or Matplotlib generates month labels containing Chinese date characters, but the rendering engine you are using cannot display them, so they appear as squares.
You have at least two options:
Change your system locale, at least when running this code.
Try a different "backend" for Matplotlib. You can get the list of backends available on your system by following the question "List of all available matplotlib backends".
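A minimal sketch of the first option, changing the locale only while this code runs. It assumes the month names come from locale-aware strftime formatting, so it switches just the LC_TIME category to the portable "C" locale (English month names) for the duration of the plot and then restores the previous setting. Whether this alone fixes your setup depends on where the non-English locale is coming from, so treat it as something to try rather than a guaranteed fix.

```python
import datetime
import locale

import matplotlib
matplotlib.use("Agg")  # off-screen backend; the point here is the label text
import numpy as np
import pandas as pd

# Remember the current time-formatting locale, then switch to the "C" locale.
previous = locale.setlocale(locale.LC_TIME)
locale.setlocale(locale.LC_TIME, "C")
try:
    ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
    ax = ts.plot()
    ax.figure.canvas.draw()  # tick labels are formatted at draw time
    tick_texts = [t.get_text() for t in ax.get_xticklabels()]
    month_name = datetime.date(2000, 1, 1).strftime("%b")
finally:
    # Restore whatever LC_TIME setting was active before.
    locale.setlocale(locale.LC_TIME, previous)

print(month_name, tick_texts)
```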

How to get rid of grid lines when plotting with Seaborn + Pandas with secondary_y

I'm plotting two data series with Pandas with seaborn imported. Ideally I would like the horizontal grid lines shared between both the left and the right y-axis, but I'm under the impression that this is hard to do.
As a compromise, I would like to remove the grid lines altogether. However, the following code still produces horizontal grid lines for the secondary y-axis.
import pandas as pd
import numpy as np
import seaborn as sns
data = pd.DataFrame(np.cumsum(np.random.normal(size=(100,2)),axis=0),columns=['A','B'])
data.plot(secondary_y=['B'],grid=False)
You can take the Axes object returned by the plot call and call .grid(False) on both axes.
# data.plot returns the left (primary) Axes; the secondary one is ax.right_ax
ax = data.plot(secondary_y=['B'])
# Turn off the grid on the left axis
ax.grid(False)
# Turn off the grid on the secondary (right) axis
ax.right_ax.grid(False)
Alternatively, turn the grid off through the seaborn style:
sns.set_style("whitegrid", {'axes.grid' : False})
Note that the style can be any valid seaborn style you choose.
For a nice article on this, refer to this site.
The problem comes from the default pandas plotting style (or whatever style you chose). I'm not sure how things work behind the scenes, but these style parameters override the formatting that you pass to the plot function. You can see a list of them in the mpl_style dictionary.
In order to get around it, you can do this:
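import numpy as np
import pandas as pd
import matplotlib
# Old pandas only: this option was later deprecated and removed, so skip it on current versions
# pd.options.display.mpl_style = 'default'
new_style = {'grid': False}
# Turn the grid off for all axes created after this call
matplotlib.rc('axes', **new_style)
data = pd.DataFrame(np.cumsum(np.random.normal(size=(100,2)),axis=0),columns=['A','B'])
data.plot(secondary_y=['B'])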
This feels like buggy behavior in Pandas, with not all of the keyword arguments getting passed to both Axes. But if you want to have the grid off by default in seaborn, you just need to call sns.set_style("dark"). You can also use sns.axes_style in a with statement if you only want to change the default for one figure.
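A sketch of the with-statement variant mentioned above, assuming a seaborn version where axes_style works as a context manager. The "dark" style sets axes.grid to False only inside the block, so both the primary axis and the secondary axis that pandas creates there start without grid lines, while figures made later use the previous defaults again. The figure is drawn inside the block so the ticks (and their grid lines) are created while the style is active.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for the sketch
import numpy as np
import pandas as pd
import seaborn as sns

data = pd.DataFrame(np.cumsum(np.random.normal(size=(100, 2)), axis=0), columns=["A", "B"])

# "dark" has no grid; the style applies only inside this block.
with sns.axes_style("dark"):
    ax = data.plot(secondary_y=["B"])
    ax.figure.canvas.draw()  # create the ticks while the style is active

left_grid = [line.get_visible() for line in ax.get_ygridlines()]
right_grid = [line.get_visible() for line in ax.right_ax.get_ygridlines()]
print(any(left_grid), any(right_grid))
```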
You can just set:
sns.set_style("ticks")
The "ticks" style has no grid lines, so the plot goes back to normal.
