boxplot on groupby timegrouper without subplots using pandas - python

I am doing a groupby using pd.TimeGrouper on a time series dataset. When I plot a boxplot on this groupby object, the plot is split into subplots. I don't want to divide the plot area into subplots. I tried passing subplots=False, but it throws KeyError: 'value'.
This is the plot I am getting with subplots.
The code:
df['timestamp1'] = df['timestamp'].values.astype('datetime64[s]')
df=df.groupby(pd.TimeGrouper(key="timestamp1",freq="3H"),group_keys=True,as_index=True)
df.boxplot(column="value",subplots=True)
The DataFrame object I am using is:
I want to plot all the box plots in the same area without dividing it into subplots.
Thanks a lot in advance.

This might actually be a bug. You can get the desired outcome by selecting only the timestamp1 and value columns, therefore eliminating the need to use the column parameter.
df[['timestamp1', 'value']].groupby(pd.TimeGrouper('3H', key='timestamp1'))\
.boxplot(subplots=False)
I went ahead and submitted an issue for this on GitHub.
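For readers on newer pandas versions: pd.TimeGrouper was later deprecated and removed, and pd.Grouper accepts the same key and freq arguments. Below is a minimal sketch of the same grouping with pd.Grouper. The sample data is hypothetical (the asker's DataFrame is not shown), and only the grouping step is executed here.

```python
import pandas as pd

# Hypothetical sample data: one reading per hour for 12 hours.
df = pd.DataFrame({
    "timestamp1": pd.date_range("2021-01-01", periods=12, freq="h"),
    "value": range(12),
})

# pd.Grouper takes the same key/freq arguments that pd.TimeGrouper did.
# Only the two relevant columns are selected, as in the answer above.
grouped = df[["timestamp1", "value"]].groupby(
    pd.Grouper(key="timestamp1", freq="3h")
)

# Number of rows that fall into each 3-hour bin.
sizes = grouped.size()
print(sizes)
```

With matplotlib installed, grouped.boxplot(subplots=False) can then be called on this object in the same way as in the answer.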

Related

Easily show mean value for plotly express bar plot

Plotly Express's bar chart stacks the observations by default, showing the sum.
import seaborn as sns
import plotly.express as px
df = sns.load_dataset("penguins")
px.bar(data_frame=df, x="species", y="bill_depth_mm")
I'm trying to display the mean for each species, which is what most other popular Python libraries return.
I could manually calculate the mean of each species and build a new dictionary or DataFrame. However, I feel like there should be an easy way to display the mean directly from Plotly.
I've checked the docs and SO with no luck. What am I missing?
I don't think you're missing anything. I imagine what the Plotly developers had in mind is that DataFrames being passed to the px.bar method have one y-value per unique category as evidenced by this documentation showing how Plotly Express works with long or wide format data. In the medals dataset, there are 9 bars for 9 unique categories.
As you said, this means that you would need to calculate the mean for each unique species, and this can be accomplished by passing a groupby mean of your DataFrame directly to the data_frame parameter, even if it's not the most elegant.
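# Average only the plotted column per species; averaging the whole frame
# fails on the text columns (island, sex) in recent pandas versions.
fig = px.bar(
    data_frame=df.groupby("species", as_index=False)["bill_depth_mm"].mean(),
    x="species",
    y="bill_depth_mm"
)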

How can I get a plot with errorbars in Pandas?

I have a set of data held in a DataFrame, and another DataFrame with the associated errors. I would like to plot this with seaborn, but I can't find a way to do it. I can get a scatterplot, but not one with error bars.
I would like something like this, which was produced with Matplotlib, although it is fine if I can't plot the lines. I am able to get a basic scatterplot with the sns.scatterplot() method, but can't find any way to add error bars to it. Does anyone know how to do this in seaborn?
Thanks

How to plot categorical variables on x axis vs a numerical variable? Looking to make an area plot

I want to make an area plot in seaborn or matplotlib using an index of categorical variables. I have tried a few things but I can't seem to get it to work. Here's an image of my dataframe. Thanks for any help.
Here are some examples of what I've tried.
plt.plot(areaData.index.values,areaData['Badassery'], data=areaData)
plt.plot(areaData['Badassery'])
I'm not really sure what else I should be doing. Usually I get errors like "Series objects are mutable, and thus can't be hashed" or a blank chart.
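It appears that your categorical variable is set as the index. Reset it so that you can use it as a column.
# reset the index; reset_index returns a new DataFrame, so assign the result
areaData = areaData.reset_index()
# pass x=<name of the former index column> to put the categories on the x axis
areaData.plot.area(y='Badassery')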
Refer to the documentation on area plots with pandas.

Hvplot/bokeh summed Bar chart from Pandas Dataframe

I'm trying to print a "simple" Bar chart, using HVPlot and bokeh in jupyter notebook.
Here is some simplified data:
My Data originally looks like this:
My goal is to get a bar chart like this (note that it doesn't have to be stacked; the only important thing is the totals):
Since I couldn't figure out how to get a bar chart with the sum of certain columns, I used pandas.melt to reshape the data to look like this:
With this data I can plot it, but the values aren't summed. Instead, there are multiple bars behind each other.
Here is the code I used to test:
testd = {'Name': ['Item1', 'Item2','Item3','Item3'],'Filter': ['F1','F2','F1','F1'],
'Count': [1,5,2,1], 'CountCategory': ['CountA','CountB','CountA','CountD']}
testdf = pd.DataFrame(data=testd)
testdf.hvplot.bar('CountCategory','Count',groupby='Filter', rot=90, aggregator=np.sum)
Omitting aggregator=np.sum doesn't change anything.
Does anyone know how to properly plot this?
It doesn't have to use the "transposed" data since I'm only doing that because I have no idea how to plot the Original Data.
And another question would be if there is a possibility
The aggregator is used by the datashade/rasterize operation to aggregate the data and indeed has no effect on bar plots. If you want to aggregate the data, I recommend doing so using pandas methods. However, in your case I don't think that's the issue. The main problem in implementing the plot you requested is that in HoloViews the legend is generally linked to the styling, which means that you can't easily get the legend to display the filter and color each bar separately.
You could do this and add the Filter as a hover column, which means you still have access to it:
testdf.hvplot.bar('CountCategory', 'Count', by='Name', stacked=True, rot=90, hover_cols=['Filter'])
I'll probably raise an issue in HoloViews to support a legend decoupled from the styling.
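As a rough sketch of the pandas aggregation mentioned above, the counts can be summed per category before plotting, so each category ends up with a single row. It reuses the sample data from the question; the variable name totals is only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
testd = {
    "Name": ["Item1", "Item2", "Item3", "Item3"],
    "Filter": ["F1", "F2", "F1", "F1"],
    "Count": [1, 5, 2, 1],
    "CountCategory": ["CountA", "CountB", "CountA", "CountD"],
}
testdf = pd.DataFrame(data=testd)

# Sum Count per CountCategory so that each category has exactly one row.
totals = testdf.groupby("CountCategory", as_index=False)["Count"].sum()
print(totals)
```

Passing totals to hvplot, for example totals.hvplot.bar('CountCategory', 'Count', rot=90), should then give one bar per category. Note that this drops the Name and Filter columns.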

How can I map the values in an array to the bar labels in Python

I am new to data visualization with Python. I want to plot the results of a groupby() in a bar chart. I have converted a categorical array using the pd.factorize() function, then created a plot from the results of the groupby.
Here is my code:
# pd.factorize returns (codes, uniques); keep the integer codes as a column
data['fact_dow'] = pd.factorize(data['DayOfWeek'])[0]
data['fact_dow'].groupby(data['fact_dow']).count().plot(kind='bar',figsize=(14,8))
plt.show()
The resulting image is:
It looks almost right, but the x-labels are the factorized codes, and I need to map them to their corresponding values.
Does anyone know how to do this in a Pythonic way? Other suggestions on how to do it are also welcome.
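If data['DayOfWeek'] holds the labels, set them with plt.xticks. Note that plt.xticks takes the tick positions first and the labels second, so use the unique values returned by pd.factorize, for example plt.xticks(range(len(uniques)), uniques) where codes, uniques = pd.factorize(data['DayOfWeek']).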
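Another option is to relabel the grouped result itself before plotting, using the unique values that pd.factorize returns next to the codes. This is a minimal sketch with hypothetical sample data (the asker's DataFrame is not shown), and the column name fact_dow follows the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
data = pd.DataFrame({"DayOfWeek": ["Mon", "Tue", "Mon", "Wed", "Tue", "Mon"]})

# pd.factorize returns (codes, uniques); uniques[code] recovers the label.
codes, uniques = pd.factorize(data["DayOfWeek"])
data["fact_dow"] = codes

# Count rows per integer code, as in the question.
counts = data.groupby("fact_dow")["DayOfWeek"].count()

# Replace the integer codes in the index with their original labels.
counts.index = [uniques[code] for code in counts.index]
print(counts)
```

Calling counts.plot(kind='bar') on the relabeled Series should then show the day names on the x axis. Grouping directly on data['DayOfWeek'] avoids the factorize step altogether, although the bars are then sorted alphabetically.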
