How to output a large number of histograms in a pandas groupby - python

df is a dataframe with a days column. There are 100 days. I want to look at a histogram for my data column for each of the 100 days. The problem is that this code outputs everything on a single chart and all histograms are stacked together. Two questions:
Any advice to get one histogram for each day?
Any advice to save each histogram to an appropriately named file?
Note: When I replace hist in my code below with describe, it correctly gives me 100 describe results, one per day. Also, the type of the grouper.get_group(days) object is a pandas Series.
My simple code:
grouper = df.groupby('days')['data']
for days in grouper.groups.keys():
    print(grouper.get_group(days).hist())

One option would be to use inline plotting, either in the IPython qtconsole or the IPython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
for days in grouper.groups.keys():
    grouper.get_group(days).hist()
    plt.show()  # finish the current figure so the next day starts a new one

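Actually, if you are using the IPython notebook, you can simply do:
df.groupby('days')['data'].hist()
Any method called on the groupby object is applied to each group in turn (not in parallel); this is the strength of groupby, and there is no need to iterate explicitly. Note that, depending on the pandas version and plotting backend, the per-group histograms may still be drawn onto the same axes, in which case an explicit loop is the safer route.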
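For the second part of the question (saving each histogram to an appropriately named file), here is a minimal sketch. It assumes the column names `days` and `data` from the question. The function name `save_group_histograms`, the output directory, and the `hist_days_<key>.png` naming scheme are hypothetical choices, and the `Agg` backend is selected only so the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_group_histograms(df, out_dir, group_col="days", value_col="data"):
    """Draw one histogram per group on its own figure and save it as a PNG."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # Iterating over a grouped column yields (group key, Series) pairs.
    for key, values in df.groupby(group_col)[value_col]:
        fig, ax = plt.subplots()      # fresh figure, so histograms never stack
        values.hist(ax=ax)
        ax.set_title(f"{group_col} = {key}")
        path = os.path.join(out_dir, f"hist_{group_col}_{key}.png")  # hypothetical naming scheme
        fig.savefig(path)
        plt.close(fig)                # free memory when looping over many groups
        paths.append(path)
    return paths


# Synthetic stand-in for the asker's data: 5 days, 20 values per day.
rng = np.random.default_rng(0)
example = pd.DataFrame({
    "days": np.repeat(np.arange(1, 6), 20),
    "data": rng.normal(size=100),
})
saved = save_group_histograms(example, tempfile.mkdtemp())
```

Creating the figure explicitly and passing `ax=ax` is what keeps each day on its own chart; closing the figure after saving matters once the loop runs over 100 days.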

Related

same data produces different pandas plot

I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I try to create a graph using another method with the exact same data, the graph looks totally off:
from pandas import DataFrame
df = DataFrame (DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data in the second method. However, the data used in the first graph was created from df. I was hoping someone might have an idea of what is going on here.
The messed-up y-axis could be the hint. With numerical data, there are usually about 4-12 y-axis ticks and labels. With non-numerical data, there is one tick for each "category".
Check the data type of the y-data in the second dataset: df['open_price'].dtype. If it is not a numeric type (for example, the prices are stored as strings), matplotlib treats each value as a separate category.
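If the dtype check does show strings, converting the columns before plotting is one way to get a numeric axis. The sketch below assumes that diagnosis; the three rows are made-up placeholder values, not the asker's data, and only the column names `begins_at` and `open_price` come from the question.

```python
import pandas as pd

# Placeholder rows (made up for illustration): prices stored as strings.
df = pd.DataFrame({
    "begins_at": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "open_price": ["0.0047", "0.0056", "0.0102"],
})

# Convert strings to floats; values that cannot be parsed become NaN.
df["open_price"] = pd.to_numeric(df["open_price"], errors="coerce")
# Parse the timestamps so the x-axis is a real time axis as well.
df["begins_at"] = pd.to_datetime(df["begins_at"])

print(df.dtypes)
```

After the conversion, `plt.plot(df['begins_at'], df['open_price'])` should draw a continuous numeric y-axis instead of one tick per distinct value.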

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've tried using pandas to group by well name and then plot, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
So, if I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that, since this sample data has just one date per well per parameter, each plot is just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
                   'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
                   'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
                   'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
                   })
# parse the dates so the x-axis is a real time axis rather than string categories
df['SampleDate'] = pd.to_datetime(df['SampleDate'])
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]                # rows for this well
    for param in temp1.Parameter.unique():
        fig = plt.figure()                       # new figure for each well/parameter pair
        temp2 = temp1[temp1.Parameter==param]    # rows for this parameter
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
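The nested loop above can also be written with a single `groupby` over both columns, which is roughly what the asker was attempting. This is a sketch under the same column names as the answer's sample frame; the function name `plot_well_parameters` is hypothetical, and the `Agg` backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "WellName": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "SampleDate": ["2018-02-15", "2018-03-31", "2018-06-07", "2018-11-14",
                   "2018-02-15", "2018-11-14", "2018-02-15", "2018-03-31",
                   "2018-11-14"],
    "Parameter": ["Arsenic", "Lead", "Iron", "Magnesium", "Arsenic", "Iron",
                  "Arsenic", "Lead", "Magnesium"],
    "Value": [0.2, 1.6, 0.05, 3, 0.3, 0.79, 0.3, 2.7, 2.8],
})


def plot_well_parameters(frame):
    """Return one figure per (WellName, Parameter) combination."""
    frame = frame.assign(SampleDate=pd.to_datetime(frame["SampleDate"]))
    figures = {}
    # Grouping by two columns yields a (well, parameter) tuple as the key.
    for (well, param), group in frame.groupby(["WellName", "Parameter"]):
        group = group.sort_values("SampleDate")
        fig, ax = plt.subplots()
        ax.plot(group["SampleDate"], group["Value"], marker="o")
        ax.set_title(f"Well {well} and Parameter {param}")
        ax.set_xlabel("SampleDate")
        ax.set_ylabel("Value")
        fig.autofmt_xdate()  # tilt date labels so they do not overlap
        figures[(well, param)] = fig
    return figures


figures = plot_well_parameters(df)
```

Each figure can then be saved with `fig.savefig(...)` or shown with `plt.show()`; with many wells it is worth calling `plt.close(fig)` after saving.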

Why is my boxplot not showing up in python? [duplicate]

I am new to Python and am working on displaying a boxplot for a dataset with 2 numeric columns and 1 character column with values (A,B,C,D). I want to show a boxplot of the values for either of the 2 numeric columns by the character column. I have followed some tutorials online but the plots are not showing up.
I have tried adding .show() or .plot() on the end of some of my code, but receive warnings that those attributes don't exist. I have tried using matplotlib and it seems to work better when I use that module, but I want to learn how to do this when using pandas.
import pandas as pd
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
I want a boxplot to show up automatically when I run this code or be able to run one more line to have it pop up, but everything I've tried has failed. What step(s) am I missing?
You should use plt.show(). pandas draws the boxplot with matplotlib, so the figure only appears once matplotlib is told to show it:
import pandas as pd
import matplotlib.pyplot as plt
datafile="C:\\Users\\…\\TestFile.xlsx"
data=pd.read_excel(datafile)
data.boxplot('Col1', by='Col2')
plt.show()
The Seaborn library makes it easy to draw all sorts of plots between two columns of a dataframe. Place a categorical column on the x-axis and a numerical column on the y-axis (here Col2 is the categorical column and Col1 the numerical one, matching the boxplot call above). Seaborn also has a fancier variant of the boxplot known as boxenplot.
import seaborn as sns
import matplotlib.pyplot as plt
sns.boxplot(x=data['Col2'], y=data['Col1'])
plt.show()
sns.boxenplot(x=data['Col2'], y=data['Col1'])
plt.show()
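Since the Excel file from the question is not available, here is a self-contained sketch of the pandas-only route using synthetic data. The column names `Col1` and `Col2` follow the question; the random values, the output file name, and the `Agg` backend (used so the code runs without a display) are assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for TestFile.xlsx: one numeric and one character column.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Col1": rng.normal(size=40),
    "Col2": np.repeat(["A", "B", "C", "D"], 10),
})

# Create the figure explicitly and let pandas draw into it.
fig, ax = plt.subplots()
data.boxplot(column="Col1", by="Col2", ax=ax)

# Save to a file; in an interactive session, plt.show() would display it instead.
out_path = os.path.join(tempfile.mkdtemp(), "boxplot.png")
fig.savefig(out_path)
```

Holding on to `fig` makes it explicit which figure gets shown or saved, which helps when several plots are created in one script.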

How do I "reset the index" for a matplotlib plot?

I have the following code:
fig, ax = plt.subplots(1, 1)
calls["2016-12-24"].resample("1h").sum().plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().plot(ax=ax)
which generates the following image:
How can I make this so the lines share the x-axis? In other words, how do I make them not switch days?
If you don't care about keeping the correct datetime as the index, you could just reset the index for each series, as you suggested. This will overlap all the time series, if that is what you're trying to achieve.
# drop the datetime index so each day is plotted against integer positions (0, 1, 2, ...)
calls["2016-12-24"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
Otherwise, you could try resampling the three columns at the same time. Have a go with the code below, but without knowing what your original dataframe looks like, I'm not sure it will fit your case. You should post some more information about the input dataframe.
# have a try with the below
calls[["2016-12-24","2016-12-25","2016-12-26"]].resample('1h').sum().plot()
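Another way to put the days on a shared x-axis is to reshape the hourly totals into an hour-of-day by date table and plot that. This is only a sketch: it assumes `calls` is a Series of counts with a DatetimeIndex, which the question does not confirm, and it uses synthetic data (one call every 15 minutes) with the `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import pandas as pd

# Synthetic stand-in for `calls`: one call every 15 minutes over three days.
index = pd.date_range("2016-12-24", "2016-12-26 23:45", freq="15min")
calls = pd.Series(1, index=index)

hourly = calls.resample("1h").sum()

# Rows are hour of day (0-23), columns are calendar dates.
table = pd.DataFrame({
    "date": hourly.index.date,
    "hour": hourly.index.hour,
    "calls": hourly.to_numpy(),
}).pivot(index="hour", columns="date", values="calls")

ax = table.plot()  # one line per date, all sharing the 0-23 hour axis
ax.set_xlabel("hour of day")
```

Compared with resetting the index, this keeps the hour labels meaningful even when a day has missing hours, because rows are aligned by hour rather than by position.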

plot.ly Bar plot axis labels

I am trying to plot a Bar plot of a pandas df column.
df[z1z2].head()
MN-SW_TO_MN-SE 562
IA-2_TO_MN-SE 345
MN-SW_TO_MN-WC 259
MN-SW_TO_MN-SW 184
ND_TO_MN-NW 163
Name: z1z2, dtype: int64
In [126]:
data = [Bar(y=df['z1z2'].value_counts()[0:50])]
iplot(data)
Note that df['z1z2'].iplot(kind=Bar.....) fails with credential errors. I guess you can't call iplot directly from pandas offline?
In any case, how do I label the x-axis with the categories, i.e. MN-SW_TO_MN-SE etc.?
I am working offline within Jupyter notebook.
If someone can also point me to documentation that shows examples of what else is possible for Bar plots or any other plot I would be grateful.
Yes, I do know where the full reference documentation is for plot.ly. However, it is not intuitive and not easy to use for anyone that isn't a developer.
You can pass your categorical x-values directly to Plotly. In the example below the first column contains the categories (x=df.iloc[:,0]).
import string
import pandas as pd
import plotly
plotly.plotly.sign_in('username', 'api_key')
data = [[c, i] for i, c in enumerate(string.ascii_uppercase)]
df = pd.DataFrame(data)
plotly.plotly.plot([plotly.graph_objs.Bar(x=df.iloc[:,0], y=df.iloc[:,1])])
What else is possible with Plotly's bar charts? The best way for me was to look at existing examples and try to reverse engineer them. The documentation is fairly complete but cryptic without examples.
