Seaborn lineplot unexpected behaviour - python

I am hoping to understand why the following Seaborn lineplot behaviour occurs.
Spikes are occurring through the time-series and additional data has been added to the left of the actual data.
How can I prevent this unexpected behaviour in Seaborn?
Regular plot of data:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
aussie_property[['Sydney(SYDD)']].plot();
Seaborn plot of data:
sns.lineplot(data=aussie_property, x='date', y='Sydney(SYDD)');

This is not a seaborn problem but a question of ambiguous datetimes.
Convert date to a datetime object with the following code:
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], dayfirst=True)
and you get the expected plot with seaborn.
Generally, it is advisable to provide the format during datetime conversions, e.g.,
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], format="%d/%m/%Y")
because, as we have seen here, dates like 10/12/2020 are ambiguous. The parser first assumed month/day/year; when it later reached a date that cannot be valid in that order (a "month" greater than 12), it switched to day/month/year. The result is a mix of both interpretations, which produces the time-travelling spikes in your seaborn graph. Why didn't you see them in the pandas plot? That plot is drawn against the index rather than the date column, so the conversion problem never shows up there.
More information on the format codes can be found in the Python documentation.
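To make the ambiguity concrete, here is a minimal sketch. The date strings are made up for illustration and are not taken from the aussie_property data. It shows that the same string is read month-first by default, while dayfirst=True or an explicit format gives the day/month/year reading and a cleanly increasing time axis.

```python
import pandas as pd

# Hypothetical day/month/year strings, standing in for the 'Date' column.
raw = pd.Series(["10/12/2020", "11/12/2020", "13/12/2020"])

# With no hint, an ambiguous string is interpreted as month/day/year.
month_first = pd.to_datetime("10/12/2020")

# dayfirst=True tells the parser to prefer day/month/year.
day_first = pd.to_datetime("10/12/2020", dayfirst=True)

# An explicit format removes the guesswork for the whole column.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.date(), day_first.date())  # 2020-10-12 2020-12-10
print(parsed.is_monotonic_increasing)        # True
```

Checking is_monotonic_increasing on the parsed column is a quick way to spot the kind of back-and-forth jumps that show up as spikes in a line plot.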

Related

In a pairplot, how can I not show confidence intervals but display grid lines instead? [duplicate]

I'm plotting two data series with Pandas with seaborn imported. Ideally I would like the horizontal grid lines shared between both the left and the right y-axis, but I'm under the impression that this is hard to do.
As a compromise I would like to remove the grid lines all together. The following code however produces the horizontal gridlines for the secondary y-axis.
import pandas as pd
import numpy as np
import seaborn as sns
data = pd.DataFrame(np.cumsum(np.random.normal(size=(100,2)),axis=0),columns=['A','B'])
data.plot(secondary_y=['B'],grid=False)
You can take the Axes object out after plotting and perform .grid(False) on both axes.
# Gets the axes object out after plotting
ax = data.plot(...)
# Turns off grid on the left Axis.
ax.grid(False)
# Turns off grid on the secondary (right) Axis.
ax.right_ax.grid(False)
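A self-contained sketch of the approach above, under one assumption: instead of importing seaborn, it switches the grid on through matplotlib's rcParams to stand in for a grid-enabled style. The helper grid_visible is hypothetical and only there to check the result.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumption: force grid lines on, standing in for a grid-enabled seaborn style.
plt.rcParams["axes.grid"] = True

data = pd.DataFrame(
    np.cumsum(np.random.normal(size=(100, 2)), axis=0), columns=["A", "B"]
)

ax = data.plot(secondary_y=["B"])  # returns the primary (left) Axes
ax.grid(False)                     # turn the grid off on the left axis
ax.right_ax.grid(False)            # and on the secondary (right) axis


def grid_visible(axes):
    """Return True if any major grid line on the given Axes is visible."""
    lines = list(axes.get_xgridlines()) + list(axes.get_ygridlines())
    return any(line.get_visible() for line in lines)


print(grid_visible(ax), grid_visible(ax.right_ax))
```

The right_ax attribute is set by pandas on the primary Axes when secondary_y is used, which is why both calls are needed.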
sns.set_style("whitegrid", {'axes.grid' : False})
Note that the style can be any valid one you choose.
For a nice article on this, refer to this site.
The problem is with using the default pandas formatting (or whatever formatting you chose). I am not sure how things work behind the scenes, but these parameters override the formatting that you pass to the plot function. You can see a list of them in the mpl_style dictionary.
To get around it, you can do this (note that the pd.options.display.mpl_style option only exists in older pandas versions):
import matplotlib
import numpy as np
import pandas as pd
pd.options.display.mpl_style = 'default'
new_style = {'grid': False}
matplotlib.rc('axes', **new_style)
data = pd.DataFrame(np.cumsum(np.random.normal(size=(100,2)),axis=0),columns=['A','B'])
data.plot(secondary_y=['B'])
This feels like buggy behavior in Pandas, with not all of the keyword arguments getting passed to both Axes. But if you want to have the grid off by default in seaborn, you just need to call sns.set_style("dark"). You can also use sns.axes_style in a with statement if you only want to change the default for one figure.
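As a rough sketch of the with-statement variant mentioned above: sns.axes_style returns a style dictionary that can also be used as a context manager, so only Axes created inside the block pick up the grid-free style. The synthetic data mirrors the question; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

data = pd.DataFrame(
    np.cumsum(np.random.normal(size=(100, 2)), axis=0), columns=["A", "B"]
)

# The "dark" style has no grid; it only affects Axes created inside this block.
with sns.axes_style("dark"):
    ax = data.plot(secondary_y=["B"])

# The style dictionary can be inspected to confirm the grid setting.
grid_setting = sns.axes_style("dark")["axes.grid"]
print(grid_setting)
```

Figures created after the with block fall back to whatever style was active before it.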
You can just set:
sns.set_style("ticks")
It goes back to normal.

Why is my boxplot not showing up in python? [duplicate]

I am new to Python and am working on displaying a boxplot for a dataset with 2 numeric columns and 1 character column with values (A,B,C,D). I want to show a boxplot of the values for either of the 2 numeric columns by the character column. I have followed some tutorials online but the plots are not showing up.
I have tried adding .show() or .plot() on the end of some of my code, but receive warnings that those attributes don't exist. I have tried using matplotlib and it seems to work better when I use that module, but I want to learn how to do this when using pandas.
import pandas as pd
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
I want a boxplot to show up automatically when I run this code or be able to run one more line to have it pop up, but everything I've tried has failed. What step(s) am I missing?
You should call plt.show() after creating the plot. Look at the following code:
import pandas as pd
import matplotlib.pyplot as plt
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
plt.show()
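Since the Excel path in the question is elided, here is a variant that can be run as-is. It assumes a small made-up DataFrame with the same column roles (Col1 numeric, Col2 holding the categories A to D); only the data is invented, the plotting calls are the same as above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for TestFile.xlsx: one numeric and one categorical column.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    {
        "Col1": rng.normal(size=40),
        "Col2": np.repeat(["A", "B", "C", "D"], 10),
    }
)

# pandas draws the boxplot on a matplotlib figure but does not display it.
data.boxplot("Col1", by="Col2")
fig = plt.gcf()  # the figure pandas just drew on

# In a plain script, this call is what actually opens the plot window.
plt.show()
```

In a Jupyter notebook the figure usually renders without plt.show(), which is why many tutorials omit it.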
The Seaborn library makes it easy to plot one DataFrame column against another. Place a categorical column on the x-axis and a numerical column on the y-axis. Seaborn also has a fancier variant of the boxplot called boxenplot.
import seaborn as sns
import matplotlib.pyplot as plt
# Col2 holds the categories (A, B, C, D) and Col1 the numeric values
sns.boxplot(x=data['Col2'], y=data['Col1'])
plt.show()
sns.boxenplot(x=data['Col2'], y=data['Col1'])
plt.show()

Use locale with seaborn

Currently I'm trying to visualize some data I am working on with seaborn. I need to use a comma as decimal separator, so I was thinking about simply changing the locale. I found this answer to a similar question, which sets the locale and uses matplotlib to plot some data.
This also works for me, but when using seaborn instead of matplotlib directly, it doesn't use the locale anymore. Unfortunately, I can't find any setting to change in seaborn or any other workaround. Is there a way?
Here is some example data. Note that I had to use 'german' instead of "de_DE". The x-axis labels all still use a point as the decimal separator.
import locale
# Set to German locale to get comma decimal separator
locale.setlocale(locale.LC_NUMERIC, 'german')
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Tell matplotlib to use the locale we set above
plt.rcParams['axes.formatter.use_locale'] = True
df = pd.DataFrame([[1,2,3],[4,5,6]]).T
df.columns = [0.3,0.7]
sns.boxplot(data=df)
The "numbers" shown on the x axis for such boxplots are determined via a matplotlib.ticker.FixedFormatter (find out via print(ax.xaxis.get_major_formatter())).
This fixed formatter just puts labels on ticks one by one from a list of labels. This makes sense because your boxes are positioned at 0 and 1, yet you want them to be labeled as 0.3, 0.7. I suppose this concept becomes clearer when thinking about what should happen for a dataframe with df.columns=["apple","banana"].
So the FixedFormatter ignores the locale, because it just takes the labels as they are. The solution I would propose here (although some of those in the comments are equally valid) would be to format the labels yourself.
ax.set_xticklabels(["{:n}".format(l) for l in df.columns])
The n format here is just the same as the usual g, but takes into account the locale. (See python format mini language). Of course using any other format of choice is equally possible. Also note that setting the labels here via ax.set_xticklabels only works because of the fixed locations used by boxplot. For other types of plots with continuous axes, this would not be recommended, and instead the concepts from the linked answers should be used.
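The effect of the n format can be checked without any plotting. The sketch below is only an illustration: the helper localized_labels and the list of candidate locale names are assumptions, since which German locale name exists depends on the operating system ('german' is the one from the question).

```python
import locale


def localized_labels(values):
    """Format numbers with the 'n' presentation type, which honours LC_NUMERIC."""
    return ["{:n}".format(v) for v in values]


# Assumption: candidate names for a German locale; availability depends on the OS.
for name in ("german", "de_DE.UTF-8", "de_DE"):
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        break
    except locale.Error:
        continue  # not installed here; the current locale stays in effect

labels = localized_labels([0.3, 0.7])
# ['0,3', '0,7'] under a German locale, ['0.3', '0.7'] under the default C locale
print(labels)
```

The resulting list is what would be passed to ax.set_xticklabels for a boxplot.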
Complete code:
import locale
# Set to German locale to get comma decimal separator
locale.setlocale(locale.LC_NUMERIC, 'german')
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame([[1,2,3],[4,5,6]]).T
df.columns = [0.3,0.7]
ax = sns.boxplot(data=df)
ax.set_xticklabels(["{:n}".format(l) for l in df.columns])
plt.show()

Any easy way to get grouped box plot using plotly and cufflinks?

I know that for a scatter plot, you can write something like
df.iplot(kind='scatter', x='myX',y='myY',categories='myGroup')
supposing that df is a dataframe with those variables.
However, this won't work if I change it to
df.iplot(kind='box', x='myX',y='myY',categories='myGroup')
which still ends up as a scatter plot. Is the categories setting not supported for box plots yet, or did I miss something?
I was looking for a solution to this too and could not find one, but I did find a workaround. Take the popular Titanic data set from Kaggle as an example. Box plot of age by passenger class:
import cufflinks as cf
cf.go_offline()
box_age = train[['Pclass', 'Age']]
box_age.pivot(columns='Pclass', values='Age').iplot(kind='box')
You can do this in one step, but splitting it into two steps (or three, if you store the pivoted table in a variable) keeps the code clean. The second step pivots the data so that each passenger class becomes its own column, leaving exactly one non-null value per row; iplot handles the null values. I tested this against seaborn and both give the same result, so the approach is reliable. In case you want to try both, here is the seaborn code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.figure(figsize=(12, 7))
sns.boxplot(x='Pclass', y='Age', data=train, palette='winter')
Note: I am using Jupyter Notebook, which is why %matplotlib inline is there.
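To see what the pivot step in the cufflinks workaround produces, here is a small sketch using a made-up five-row stand-in for the Titanic train DataFrame. The values are invented for illustration; only the pivot call matches the code above.

```python
import pandas as pd

# Made-up stand-in for the Titanic data: passenger class and age.
train = pd.DataFrame(
    {
        "Pclass": [1, 2, 3, 1, 3],
        "Age": [38.0, 27.0, 22.0, 54.0, 19.0],
    }
)

# Each class becomes its own column; the original row index is kept.
wide = train[["Pclass", "Age"]].pivot(columns="Pclass", values="Age")

print(wide)
# Every row holds exactly one age, in the column of that passenger's class.
print(wide.notna().sum(axis=1).tolist())  # [1, 1, 1, 1, 1]
```

Because each column now holds the ages of one class (with NaN elsewhere), a per-column box plot is equivalent to a box plot grouped by class.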
