Any easy way to get grouped box plot using plotly and cufflinks? - python

I know that for a scatter plot you can write something like
df.iplot(kind='scatter', x='myX',y='myY',categories='myGroup')
supposing that df is a dataframe with those variables.
However, this won't work if I change to
df.iplot(kind='box', x='myX',y='myY',categories='myGroup')
It still ends up as a scatter plot. Is the categories setting not supported for box plots yet, or did I miss something?

I was looking for a solution to this too and could not find one, but I did find a workaround. Take the popular Titanic data set from Kaggle as an example, and draw a box plot of age by passenger class:
import cufflinks as cf
cf.go_offline()
box_age = train[['Pclass', 'Age']]
box_age.pivot(columns='Pclass', values='Age').iplot(kind='box')
You can do this in one step, but the code reads more cleanly in two (or three, if you store the pivoted table in its own object). The second step pivots the data so that each passenger class becomes its own column, which leaves at most one non-null value per row. iplot takes care of the null values. I compared the result against seaborn and both give the same plot, so the approach looks reliable. In case you want to try both, here is the seaborn code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.figure(figsize=(12, 7))
sns.boxplot(x='Pclass', y='Age', data=train, palette='winter')
Note: I am using Jupyter Notebook, which is why %matplotlib inline is there.
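To make the pivot step in the answer above concrete, here is a minimal sketch on a small made-up frame. The `train` data below is a hypothetical stand-in for the Titanic file, not the real data set; only the column names `Pclass` and `Age` come from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic training frame used in the answer.
train = pd.DataFrame({
    "Pclass": [1, 2, 3, 1, 2, 3],
    "Age": [38.0, 27.0, 22.0, 54.0, 14.0, 4.0],
})

# Long -> wide: one column per passenger class, rows keep the original index.
wide = train[["Pclass", "Age"]].pivot(columns="Pclass", values="Age")

# Each original row contributes its Age to exactly one class column,
# so every other cell in that row is NaN.
non_null_per_row = wide.notna().sum(axis=1)
print(wide)
print(non_null_per_row.tolist())

# With cufflinks set up as in the answer, the wide frame can be plotted directly:
# wide.iplot(kind="box")
```

Because each class is now a separate column, a box plot of the wide frame draws one box per class, which is the grouping the question asked for.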

Related

Default Colormap of Seaborn When Making a 2D Histogram?

In Python, let's say I define two numpy arrays that I want to display in a 2D histogram:
import seaborn
import numpy as np
import matplotlib.pyplot as plt
num_samples = int(8e+4)
u0 = np.random.rand(num_samples)
E_incid = 90*np.random.rand(num_samples)+10
Now, with seaborn, I get this output:
However, since I need to switch to matplotlib, I want the same color scheme in plt.hist2d(). I looped over all available colormaps (cf. this link) to see which cmap resembles the one from seaborn, but I did not find a match.
The closest match I found (with cmap='PuBu'):
Clearly, these two plots do not look similar. How can I force matplotlib to use the same colormap as seaborn? Thanks!

Seaborn lineplot unexpected behaviour

I am hoping to understand why the following Seaborn lineplot behaviour occurs.
Spikes are occurring through the time-series and additional data has been added to the left of the actual data.
How can I prevent this unexpected behaviour in Seaborn?
Regular plot of data:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
aussie_property[['Sydney(SYDD)']].plot();
Seaborn plot of data:
sns.lineplot(data=aussie_property, x='date', y='Sydney(SYDD)');
This is not a seaborn problem but a question of ambiguous datetimes.
Convert date to a datetime object with the following code:
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], dayfirst=True)
and you get the expected plot with seaborn.
Generally, it is advisable to provide the format during datetime conversions, e.g.,
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], format="%d/%m/%Y")
because, as we have seen here, dates like 10/12/2020 are ambiguous. The parser first assumed month/day/year, then hit values whose first field exceeds 12 and so cannot be a month, and parsed those as day/month/year. That mix of interpretations gives rise to the time-travelling spikes in your seaborn graph. Why didn't you see them in the pandas plot? That plot is drawn against the index, not the parsed dates, so the conversion problem does not show up there.
More information on the format codes can be found in the Python documentation.
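As a small illustration of why the explicit format matters, the sketch below parses a few made-up day-first strings (they are hypothetical, not taken from the asker's data) and contrasts them with a month-first reading of the same text.

```python
import pandas as pd

# Hypothetical day-first date strings; the first two are ambiguous on their own.
raw = pd.Series(["10/12/2020", "11/12/2020", "25/12/2020"])

# An explicit format removes the guesswork: every value is read as day/month/year.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())

# For contrast, the same string read month-first lands in October, not December.
month_first = pd.to_datetime("10/12/2020", format="%m/%d/%Y")
print(month_first.strftime("%Y-%m-%d"))
```

With the format given, all three values fall in December and stay in chronological order, so a line plot over them has no backward jumps.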

Why is my boxplot not showing up in python? [duplicate]

I am new to Python and am working on displaying a boxplot for a dataset with 2 numeric columns and 1 character column with values (A,B,C,D). I want to show a boxplot of the values for either of the 2 numeric columns by the character column. I have followed some tutorials online but the plots are not showing up.
I have tried adding .show() or .plot() on the end of some of my code, but receive warnings that those attributes don't exist. I have tried using matplotlib and it seems to work better when I use that module, but I want to learn how to do this when using pandas.
import pandas as pd
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
I want a boxplot to show up automatically when I run this code or be able to run one more line to have it pop up, but everything I've tried has failed. What step(s) am I missing?
You need to call plt.show() after creating the plot. pandas draws through matplotlib, so the figure only appears once matplotlib is told to show it:
import pandas as pd
import matplotlib.pyplot as plt
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
plt.show()
The seaborn library makes it easy to draw all sorts of plots from two columns of a dataframe. Place a categorical column on the x-axis and a numerical column on the y-axis. Seaborn also has a fancier version of the boxplot, known as boxenplot.
import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(x=data['Col2'], y=data['Col1'])  # Col2 is the categorical column, Col1 the numeric one
plt.show()
sns.boxenplot(x=data['Col2'], y=data['Col1'])
plt.show()

Seaborn bar plot - different y axis values?

I am very new to coding and really stuck with a graph I am trying to produce for a uni assignment.
This is what it looks like:
I am pretty happy with the styling; my concern is the y-axis. I understand that because one value is much higher than the rest, it is difficult to read the values further down the scale.
Is there any way to change this?
Or can anyone recommend a different graph type that may show this data more clearly?
Thanks!
You can try combining ScalarFormatter on the y-axis with MultipleLocator to set the tick frequency of the y-axis values. You can read more in "Customising tricks for visualising data in Python".
import numpy as np
import seaborn as sns  # seaborn.apionly was removed in seaborn 0.9
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ax_data = sns.barplot(x= PoliceForce, y = TotalNRMReferrals) # change as per how you are plotting, just for an example
ax_data.yaxis.set_major_locator(ticker.MultipleLocator(40)) # it would have a tick frequency of 40, change 40 to the tick-frequency you want.
ax_data.yaxis.set_major_formatter(ticker.ScalarFormatter())
plt.show()
Based on your current graph, I would suggest a smaller tick interval (try values lower than 100, say 50). This should make the graph more readable. I hope this helps answer your question.
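For a self-contained version of the tick settings described in this answer, here is a sketch that uses plain matplotlib bars. The force names and referral counts are made up for illustration; the asker's actual data is not shown in the question.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Hypothetical data: one force dwarfs the others, as described in the question.
forces = ["Force A", "Force B", "Force C", "Force D"]
referrals = [620, 45, 30, 12]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(forces, referrals)

# Place a major tick at every multiple of 50 on the y-axis.
ax.yaxis.set_major_locator(ticker.MultipleLocator(50))
# Render tick labels as plain numbers.
ax.yaxis.set_major_formatter(ticker.ScalarFormatter())

ax.set_ylabel("Total NRM referrals")
fig.tight_layout()
```

In a script, add plt.show() at the end to display the figure. The interval of 50 is only a starting point; pick whatever spacing keeps the small bars readable.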

How to make small multiple box plots with a long data frame in Python

I have a long data frame like the simplified sample below:
import pandas as pd
import numpy as np
data={'nm':['A','B']*12,'var':['vol','vol','ratio','ratio','price','price']*4,'value':np.random.randn(24)}
sample=pd.DataFrame(data)
sample
I wish to create small multiple box plots using var as the facet, nm as the category, and value as the value. How can I do so using matplotlib or seaborn? I've searched for similar code, but the examples looked complex.
Perhaps you can start with seaborn's catplot:
import seaborn as sns
sns.catplot(x='nm', y='value', col='var', kind='box', data=sample)
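If you would rather stay in plain matplotlib, which the question also allows, one possible approach is to loop over the facets yourself. This is a sketch, not the seaborn method from the answer: it rebuilds the `sample` frame from the question with a seeded random generator (the seed is an assumption, added only for reproducibility) and draws one subplot per value of var.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same layout as the question's sample frame; the seed is an added assumption.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "nm": ["A", "B"] * 12,
    "var": ["vol", "vol", "ratio", "ratio", "price", "price"] * 4,
    "value": rng.standard_normal(24),
})

facets = list(sample["var"].unique())  # one subplot per distinct var
fig, axes = plt.subplots(1, len(facets), figsize=(9, 3), sharey=True)

for ax, name in zip(axes, facets):
    subset = sample[sample["var"] == name]
    # Collect the values for each nm category within this facet.
    groups = {nm: grp["value"].to_numpy() for nm, grp in subset.groupby("nm")}
    ax.boxplot(list(groups.values()))
    # boxplot puts boxes at positions 1..N, so label those positions by category.
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(list(groups.keys()))
    ax.set_title(name)

fig.tight_layout()
```

In a script, finish with plt.show(). The catplot call does the same faceting in one line, so this version is mainly useful when seaborn is not available or you need full control over each axis.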
