Why is my boxplot not showing up in python? [duplicate] - python

This question already has answers here:
How to show matplotlib plots?
(6 answers)
Closed 3 years ago.
I am new to Python and am working on displaying a boxplot for a dataset with 2 numeric columns and 1 character column with values (A,B,C,D). I want to show a boxplot of the values for either of the 2 numeric columns by the character column. I have followed some tutorials online but the plots are not showing up.
I have tried adding .show() or .plot() on the end of some of my code, but receive warnings that those attributes don't exist. I have tried using matplotlib and it seems to work better when I use that module, but I want to learn how to do this when using pandas.
import pandas as pd
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
I want a boxplot to show up automatically when I run this code or be able to run one more line to have it pop up, but everything I've tried has failed. What step(s) am I missing?

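You need to call plt.show() after creating the plot. pandas draws through matplotlib, so import matplotlib.pyplot and call show() once the boxplot has been built: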
import pandas as pd
import matplotlib.pyplot as plt
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
plt.show()
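
If no plot window can open at all (for example, when the script runs on a machine without a display), saving the figure to a file is a possible fallback. The sketch below is only an illustration: the data is made up to stand in for TestFile.xlsx, the output file name is hypothetical, and the Agg backend is selected on the assumption that no GUI is available.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove this line for a normal window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the Excel file: one numeric and one character column
data = pd.DataFrame({
    "Col1": [1.0, 2.5, 3.0, 4.2, 5.1, 2.2, 3.3, 4.4],
    "Col2": list("ABCDABCD"),
})

data.boxplot("Col1", by="Col2")  # pandas creates a matplotlib figure for the plot
fig = plt.gcf()                  # that new figure is now the current one

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "boxplot_col1_by_col2.png")
fig.savefig(out_path)
```

With an interactive backend, replacing the savefig call with plt.show() opens the window instead.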

The seaborn library makes it easy to draw many kinds of plots from two columns of a DataFrame. Put a categorical column on the x-axis and a numeric column on the y-axis. Seaborn also has an enhanced variant of the box plot called boxenplot.
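import matplotlib.pyplot as plt
import seaborn as sns
# In the question, Col2 is the character column and Col1 a numeric one
sns.boxplot(x=data['Col2'], y=data['Col1'])
plt.show()
# boxenplot takes the same arguments
sns.boxenplot(x=data['Col2'], y=data['Col1'])
plt.show()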

Related

Seaborn heatmap keeps adding color bars in loop over datasets [duplicate]

This question already has answers here:
Seaborn plots in a loop
(6 answers)
How to plot in multiple subplots
(12 answers)
Closed 1 year ago.
I'm trying to plot heatmaps in a loop over various datasets but with every new heatmap, a new color bar gets added (the map looks fine and no extra maps are added). I could use a workaround by resetting the color bar inside the loop but I would prefer to know what is going on and have a cleaner solution. Thanks in advance for any help!
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
# read file
atlas = ['A','B','C','D']
output_path = '/Users/polo/Desktop/Heatmaps/'
for at in range(len(atlas)):
    data = pd.read_csv('/Users/polo/Desktop/correl_input_{}.csv'.format(atlas[at]))
    hmap = sns.heatmap(data,cmap='seismic',linewidths=.5,vmin=-0.1, vmax=0.1)
    hmap.set_ylim(0, 5)
    plt.savefig(output_path + 'Heatmap_{}.png'.format(atlas[at]), dpi=1200,bbox_inches='tight')
    #plt.savefig('{}_plot.png', format='png', dpi=1200,bbox_inches='tight')
TL;DR: your solution of calling plt.figure() is the correct one, as suggested here.
Being built on top of matplotlib, seaborn uses the concepts of figures and axes. The canonical way to create a matplotlib.pyplot chart begins with instantiating a figure and axes with f, ax = plt.subplots(). When you call an axes-level plot such as heatmap without passing ax, seaborn calls matplotlib.pyplot.gca(), which returns the current axes. If no axes exist yet, a new figure and axes are created under the hood.
I'm guessing that in your loop every heatmap is drawn onto the same axes, so the maps cover one another, while each call takes space from those axes for one more colorbar. Starting a new figure in each iteration with plt.figure() (or f, ax = plt.subplots()) is what you want.
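
A minimal sketch of that fix, under stated assumptions: random 5x5 tables stand in for the CSV files, the save call is left commented out with a hypothetical file name, and the Agg backend is chosen so that no window is needed. Each iteration gets its own figure and passes its axes to heatmap explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: figures are only saved, never shown
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
axes_per_figure = []

for name in ["A", "B", "C", "D"]:
    # Made-up stand-in for pd.read_csv(...): a 5x5 table of small values
    data = pd.DataFrame(rng.uniform(-0.1, 0.1, size=(5, 5)))

    fig, ax = plt.subplots()  # fresh figure, so nothing carries over from the last loop
    sns.heatmap(data, cmap="seismic", linewidths=0.5, vmin=-0.1, vmax=0.1, ax=ax)

    # fig.savefig(f"Heatmap_{name}.png", bbox_inches="tight")  # hypothetical file name

    axes_per_figure.append(len(fig.axes))  # heatmap axes + one colorbar axes
    plt.close(fig)  # release the figure once it is no longer needed
```

Closing each figure after saving also keeps memory use flat when the loop runs over many datasets.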

Seaborn lineplot unexpected behaviour

I am hoping to understand why the following Seaborn lineplot behaviour occurs.
Spikes are occurring through the time-series and additional data has been added to the left of the actual data.
How can I prevent this unexpected behaviour in Seaborn?
Regular plot of data:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
aussie_property[['Sydney(SYDD)']].plot();
Seaborn plot of data:
sns.lineplot(data=aussie_property, x='date', y='Sydney(SYDD)');
This is not a seaborn problem but a question of ambiguous datetimes.
Convert date to a datetime object with the following code:
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], dayfirst=True)
and you get the plot you expected from seaborn.
Generally, it is advisable to provide the format during datetime conversions, e.g.,
aussie_property['date'] = pd.to_datetime(aussie_property['Date'], format="%d/%m/%Y")
because, as we have seen here, dates like 10/12/2020 are ambiguous. The parser initially assumed month/day/year, later hit dates for which that cannot be right, and switched to parsing those as day/month/year, which gives rise to the time-travelling spikes in your seaborn graph. Why didn't you see them in the pandas plot? That plot is drawn against the index rather than the date column, so the conversion problem never shows up there.
More information on the format codes can be found in the Python documentation.
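
To make the ambiguity concrete, here is a small sketch with made-up date strings (not the poster's data). It compares the default parse of an ambiguous string with the dayfirst hint and with an explicit format.

```python
import pandas as pd

# Made-up day/month/year strings; the first is ambiguous, the second is not
raw = pd.Series(["10/12/2020", "13/12/2020"])

# An explicit format leaves nothing to guess: both rows land in December
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# Without a hint, the ambiguous string is read month-first (October 12)
guessed = pd.to_datetime("10/12/2020")

# With dayfirst=True the same string is read as 10 December
hinted = pd.to_datetime("10/12/2020", dayfirst=True)

print(parsed.dt.month.tolist(), guessed.month, hinted.month)
```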

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with the sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've tried using pandas to group by well name and then plot, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
If I understand your question correctly, you want one plot for each combination of Well and Parameter: no subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that, since your data has just one date per well per parameter, each plot is just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline  # Jupyter magic; leave this line out in a plain Python script
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
                   'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
                   'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
                   'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
                   })
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]
    for param in temp1.Parameter.unique():
        fig = plt.figure()  # one new figure per well/parameter combination
        temp2 = temp1[temp1.Parameter==param]
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
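
The same loop can also be written with a single groupby over both columns, which is closer to what the question attempted. This is a sketch rather than the answer's method: it reuses the sample frame above, converts SampleDate to real datetimes (an added step), keeps the figures in a dictionary named figures (a name chosen here for illustration), and selects the Agg backend on the assumption that the figures will be saved rather than shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive windows
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'WellName': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'SampleDate': ['2018-02-15', '2018-03-31', '2018-06-07', '2018-11-14',
                   '2018-02-15', '2018-11-14', '2018-02-15', '2018-03-31',
                   '2018-11-14'],
    'Parameter': ['Arsenic', 'Lead', 'Iron', 'Magnesium', 'Arsenic', 'Iron',
                  'Arsenic', 'Lead', 'Magnesium'],
    'Value': [0.2, 1.6, 0.05, 3, 0.3, 0.79, 0.3, 2.7, 2.8],
})
# Real datetimes give a proper time axis instead of string categories
df['SampleDate'] = pd.to_datetime(df['SampleDate'])

figures = {}
# Each group holds the rows for one (well, parameter) pair
for (well, param), group in df.groupby(['WellName', 'Parameter']):
    fig, ax = plt.subplots()
    ax.plot(group['SampleDate'], group['Value'], marker='o')
    ax.set_title('Well {} and Parameter {}'.format(well, param))
    ax.set_xlabel('SampleDate')
    ax.set_ylabel('Value')
    figures[(well, param)] = fig
```

Each figure can then be saved with figures[key].savefig(...) under whatever naming scheme suits the data.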

how to make small multiple box plots with long data frame in python

I have a long data frame like the simplified sample below:
import pandas as pd
import numpy as np
data={'nm':['A','B']*12,'var':['vol','vol','ratio','ratio','price','price']*4,'value':np.random.randn(24)}
sample=pd.DataFrame(data)
sample
I wish to create small multiple box plots using var as the facet, nm as the category and value as the value. How can I do so using matplotlib or seaborn? I've searched for similar code, but the examples looked complex.
Perhaps you can start with seaborn's catplot:
import seaborn as sns
sns.catplot(x='nm', y='value', col='var', kind='box', data=sample)
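
Since the question also mentions matplotlib, here is a rough matplotlib-only sketch of the same small multiples, assuming the sample frame from the question. It draws one panel per value of var and one box per value of nm; the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = {'nm': ['A', 'B'] * 12,
        'var': ['vol', 'vol', 'ratio', 'ratio', 'price', 'price'] * 4,
        'value': np.random.randn(24)}
sample = pd.DataFrame(data)

# One panel per value of var, laid out side by side
fig, axes = plt.subplots(1, sample['var'].nunique(), figsize=(9, 3))
for ax, (var, panel) in zip(axes, sample.groupby('var')):
    names = sorted(panel['nm'].unique())
    # One array of values per category, in the same order as names
    values = [panel.loc[panel['nm'] == nm, 'value'].to_numpy() for nm in names]
    ax.boxplot(values)
    ax.set_xticks(range(1, len(names) + 1))  # boxes sit at positions 1..n
    ax.set_xticklabels(names)
    ax.set_title(var)
fig.tight_layout()
```

The catplot call is shorter; the manual version is mainly useful when each panel needs individual tweaking.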

Any easy way to get grouped box plot using plotly and cufflinks?

I know that for a scatter plot, you can write something like
df.iplot(kind='scatter', x='myX',y='myY',categories='myGroup')
supposing that df is a dataframe with those variables.
However, this won't work if I change to
df.iplot(kind='box', x='myX',y='myY',categories='myGroup')
because it still ends up as a scatter plot. Is the categories setting not supported for box plots yet, or am I missing something?
I was looking for that solution too, but could not find any help. I did find a hack, though. Take the popular Titanic data set from Kaggle as an example. A box plot of age by passenger class:
import cufflinks as cf
cf.go_offline()
box_age = train[['Pclass', 'Age']]
box_age.pivot(columns='Pclass', values='Age').iplot(kind='box')
You can do this in one step, but splitting it into two (or three, if you store the pivot table in its own object) keeps the code clean. The second step pivots the data so that each Pclass value becomes its own column; every row then holds at most one non-null value, and iplot copes with the nulls. I checked the result against seaborn, and both give the same plot, so the approach looks reliable. In case you want to try both, here is the seaborn code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.figure(figsize=(12, 7))
sns.boxplot(x='Pclass', y='Age', data=train, palette='winter')
Note: I am using Jupyter Notebook, which is why %matplotlib inline is there.
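
The pivot step from the cufflinks answer can be inspected on its own with plain pandas. The sketch below uses a tiny made-up frame in place of the Kaggle train data (the ages are invented and none are missing) to show the wide shape that gets handed to iplot.

```python
import pandas as pd

# Made-up stand-in for the Titanic train frame
train = pd.DataFrame({
    "Pclass": [1, 2, 3, 1, 3],
    "Age": [38.0, 27.0, 22.0, 54.0, 4.0],
})

# Each Pclass value becomes a column; the original row index is kept
wide = train[["Pclass", "Age"]].pivot(columns="Pclass", values="Age")
print(wide)

# Every row carries its age in exactly one class column, NaN elsewhere
non_null_per_row = wide.notna().sum(axis=1)
```

Each column of wide is then one group for the box plot, which is why a box plot of the pivoted frame ends up grouped by class.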
