Hvplot/bokeh summed Bar chart from Pandas Dataframe - python

I'm trying to plot a "simple" bar chart using hvPlot and Bokeh in a Jupyter notebook.
Here is some simplified data:
My data originally looks like this:
My goal is to get a bar chart like this one (note that it doesn't have to be stacked; the only important thing is the totals):
Since I couldn't figure out how to get a bar chart with the sum of certain columns, I used pandas.melt to reshape the data so it looks like this:
With this data I can plot it, but the values aren't summed. Instead, multiple bars are drawn behind each other.
Here is the code I used to test:
import numpy as np
import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on DataFrames
testd = {'Name': ['Item1', 'Item2', 'Item3', 'Item3'], 'Filter': ['F1', 'F2', 'F1', 'F1'],
'Count': [1, 5, 2, 1], 'CountCategory': ['CountA', 'CountB', 'CountA', 'CountD']}
testdf = pd.DataFrame(data=testd)
testdf.hvplot.bar('CountCategory', 'Count', groupby='Filter', rot=90, aggregator=np.sum)
It doesn't change anything if I omit aggregator=np.sum.
Does anyone know how to properly plot this?
It doesn't have to use the "transposed" data; I'm only doing that because I have no idea how to plot the original data.
And another question would be if there is a possibility

The aggregator is used by the datashade/rasterize operations to aggregate the data and indeed has no effect on bar plots. If you want to aggregate the data, I recommend doing so with pandas methods. However, in your case I don't think that's the issue: the main problem in implementing the plot you requested is that in HoloViews the legend is generally linked to the styling, which means you can't easily get the legend to display the filter while coloring each bar separately.
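Doing the aggregation in pandas, as recommended above, can look like the following minimal sketch. It reuses the sample data from the question and sums Count per (Filter, CountCategory) pair, so that each bar would correspond to one total instead of several overlapping bars. Only pandas is needed for this step.

```python
import pandas as pd

# Same melted sample data as in the question
testd = {'Name': ['Item1', 'Item2', 'Item3', 'Item3'],
         'Filter': ['F1', 'F2', 'F1', 'F1'],
         'Count': [1, 5, 2, 1],
         'CountCategory': ['CountA', 'CountB', 'CountA', 'CountD']}
testdf = pd.DataFrame(data=testd)

# Sum the counts per (Filter, CountCategory) pair so each bar is a single total
totals = testdf.groupby(['Filter', 'CountCategory'], as_index=False)['Count'].sum()
print(totals)
```

Assuming hvPlot is installed and imported (import hvplot.pandas), totals.hvplot.bar('CountCategory', 'Count', groupby='Filter', rot=90) should then draw one summed bar per category for the selected filter.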
You could do this and add the Filter as a hover column, which means you still have access to it:
testdf.hvplot.bar('CountCategory', 'Count', by='Name', stacked=True, rot=90, hover_cols=['Filter'])
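To check what the stacked by='Name' chart above is drawing, the same numbers can be laid out in wide form with pandas. This is a sketch using the question's sample data; each row is one bar, each column one stacked segment, and the row sum is the bar's total height.

```python
import pandas as pd

# Same melted sample data as in the question
testdf = pd.DataFrame({'Name': ['Item1', 'Item2', 'Item3', 'Item3'],
                       'Filter': ['F1', 'F2', 'F1', 'F1'],
                       'Count': [1, 5, 2, 1],
                       'CountCategory': ['CountA', 'CountB', 'CountA', 'CountD']})

# One row per CountCategory, one column per Name; missing combinations become 0
wide = testdf.pivot_table(index='CountCategory', columns='Name',
                          values='Count', aggfunc='sum', fill_value=0)

# Height of each stacked bar is the sum across its segments
totals = wide.sum(axis=1)
print(wide)
print(totals)
```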
I'll probably raise an issue in HoloViews to support a legend decoupled from the styling.

Related

Easily show mean value for plotly express bar plot

Plotly Express's bar chart stacks the observations by default, showing the sum.
import seaborn as sns
import plotly.express as px
df = sns.load_dataset("penguins")
px.bar(data_frame=df, x="species", y="bill_depth_mm")
I'm trying to display the mean for each species, which is what most other popular Python libraries return.
I could manually calculate the mean of each species and make a new dictionary/DataFrame. However, I feel like there should be an easy way to display the mean directly from Plotly.
I've checked the docs and SO with no luck. What am I missing?
I don't think you're missing anything. I imagine what the Plotly developers had in mind is that DataFrames being passed to the px.bar method have one y-value per unique category as evidenced by this documentation showing how Plotly Express works with long or wide format data. In the medals dataset, there are 9 bars for 9 unique categories.
As you said, this means that you would need to calculate the mean for each unique species, and this can be accomplished by passing a groupby mean of your DataFrame directly to the data_frame parameter, even if it's not the most elegant.
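fig = px.bar(
data_frame=df.groupby(['species'])['bill_depth_mm'].mean().reset_index(),  # select the numeric column first; non-numeric columns make .mean() fail in recent pandas
x="species",
y="bill_depth_mm"
)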

Bokeh vbar with group counts after filtering with a CDSView

I'd like to create a Bokeh vertical bar chart (vbar) coupled with a scatter plot. I need to start from a given ColumnDataSource (then filtered with a CDSView), as this is only a part of a complex visualization and I need all the plots to be linked together.
The only tricky part, for which I didn't find anything after extensive research, is: how can I show counts (i.e., the values provided in the top parameter of vbar) based only on data points that are both filtered AND selected in the scatter plot? Both plots receive the same filters, from which a view is created.
The documentation shows how to link selected points in different plots (by using the same ColumnDataSource object), how to filter them, and how to use vbar, but not all of them together.
Any hint? Thank you very much in advance.

How can I arrange two faceted side-by-side charts horizontally in Altair?

Altair offers a lovely feature for faceting charts with the facet method. For example, the following dataset visualizes nicely:
print(df[['Year', 'Profile', 'Saison', 'Pos']].to_csv())
,Year,Profile,Saison,Pos
0,2017,6.0,Sommer,VL
1,2017,6.0,Winter,VL
13,2017,6.0,Winter,HL
12,2017,6.0,Sommer,HL
18,2017,6.0,Sommer,HR
6,2017,6.0,Sommer,VR
7,2017,6.0,Winter,VR
19,2017,6.0,Winter,HR
14,2018,5.5,Winter,HL
8,2018,5.5,Winter,VR
15,2018,5.5,Sommer,HL
20,2018,4.3,Winter,HR
21,2018,5.0,Sommer,HR
3,2018,5.5,Sommer,VL
2,2018,6.2,Winter,VL
9,2018,4.5,Sommer,VR
17,2019,4.5,Sommer,HL
11,2019,4.2,Sommer,VR
22,2019,3.5,Winter,HR
10,2019,5.28,Winter,VR
5,2019,4.6,Sommer,VL
4,2019,4.9,Winter,VL
16,2019,4.0,Winter,HL
23,2019,4.5,Sommer,HR
with the following command:
alt.Chart(df).mark_bar().encode(x='Year:O', y='Profile:Q').facet(row='Saison:N', column='Pos:N')
But, as you can see, I still have a lot of space horizontally and would like to use it by placing the Winter plot right next to the Sommer plot:
I understand that I already used the column grid to facet over the attribute Pos, but visually, for me, the Winter and Sommer plots are two separate plots (just like here), which I'd like to place side by side.
I tried to create two different charts in the same cell and emit them side by side using HTML, but in the Jupyter environment there is a limitation of just one Altair/Vega plot per cell.
Is there any method I can use to arrange these charts horizontally?
In Altair, there is no good way to do this, because faceted charts cannot be nested according to the Vega-Lite schema. However, the Vega-Lite renderer actually does handle this in some cases, despite it technically being disallowed by the schema.
So you can hack it by doing something like this:
import altair as alt
chart = alt.Chart(df).mark_bar().encode(
x='Year:O',
y='Profile:Q'
).facet('Saison:N')
spec = alt.FacetChart(
data=df,
spec=chart,
facet=alt.Facet('Pos:N')
).to_json(validate=False)
print(spec)
The resulting spec can be pasted by hand into http://vega.github.io/editor to reveal this (vega editor link):
You'll even notice that the vega editor flags parts of the spec as invalid. This is admittedly not the most satisfying answer, but it sort of works.
Hopefully in the future the Vega-Lite schema will add actual support for nested facets, so they can be used more directly from Altair.

Skip weekends on stock charts with matplotlib

This is not a duplicate, because existing answers to similar questions don't describe exactly what I need.
Matplotlib has great formatters inside and I love to use them:
ax.xaxis.set_major_locator(matplotlib.dates.MonthLocator())
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b%y'))
They let me plot such stock market charts:
This is what I need, but it has one issue: weekends. They are present on the x-axis and make my chart a little ugly.
Other questions about this issue advise creating a custom formatter and show examples of such formatters. But none of them produce pretty labels like matplotlib does:
May19, Jun19, Jul19...
I mean this line of code:
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b%y'))
My question is: please help me format the x-axis like matplotlib does (May19, Jun19, Jul19...) without creating weekends, when the stock market is closed.
What you could almost always do is something similar to what Nic Wanavit suggested.
Manually set your labels, depending on what you need on your axis.
Especially in this case, the plot looks a bit ugly because your data has time spans with no actual data points (the weekends), so pyplot simply connects the surrounding points across the corresponding length of the x-axis.
What you can do instead is plot your data at equal spacing, which is correct if the data is daily; otherwise, consider interpolating it, e.g. with pandas' built-in interpolation.
To stop pyplot from automatically using the date index, I had to do this:
df['plotidx'] = list(range(len(df['close'])))
Here, all the closing values for the stock are stored in a column named 'close'.
Then plot 'close' against 'plotidx'.
Then you can obtain all the ticks created via
labels = [item.get_text() for item in ax.get_xticklabels()]
Adjust them as desired with
labels[i] = string_for_the_label_no_i
Then get them back on the graph using
ax.xaxis.set_ticklabels(labels)
You then need to "update" (redraw) the plot. Also keep in mind that, as the documentation also warns, resizing the figure a lot can leave the labels in strange locations.
It is a workaround, but it worked fine for me, because it feels natural to plot the data equally spaced next to each other rather than making up data for the weekends.
Greets
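A self-contained sketch of the equal-spacing workaround, combined with the '%b%y' labels the question asks for. The prices are hypothetical placeholder values on business days only, and putting one labelled tick on the first trading day of each month is an assumption about the desired layout rather than part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: business days only (no weekends), placeholder prices
dates = pd.bdate_range("2019-05-01", "2019-08-30")
df = pd.DataFrame({"close": np.linspace(100.0, 110.0, len(dates))}, index=dates)

# Plot against integer positions so Friday -> Monday leaves no gap
df["plotidx"] = np.arange(len(df))

fig, ax = plt.subplots()
ax.plot(df["plotidx"], df["close"])

# Mark the first trading day of each month (year*12+month changes there)
month_id = (df.index.year * 12 + df.index.month).to_numpy()
is_month_start = np.r_[True, month_id[1:] != month_id[:-1]]

tick_pos = df["plotidx"].to_numpy()[is_month_start]
tick_labels = list(df.index[is_month_start].strftime("%b%y"))  # e.g. May19

ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_labels)
```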
To set the x ticks, assuming the dates are stored in the DataFrame column df['dates']:
ax.xaxis.set_ticks(df['dates'])

How can I map the values in an array to the bar labels in Python

I am new to data visualization with Python. I want to plot groupby() results in a bar chart. I encoded a categorical column with pd.factorize() and then created a plot from the results of groupby.
Here is my code:
codes, uniques = pd.factorize(data['DayOfWeek'])  # integer codes plus the original labels
data['fact_dow'] = codes
data['fact_dow'].groupby(data['fact_dow']).count().plot(kind='bar', figsize=(14, 8))
plt.show()
The resulting image is:
It looks almost right, but the x-labels are the factorized codes; I need to map them back to their corresponding values.
Does anyone know how to do this in a Pythonic way? Also, if there are other ways to do it, please comment.
If the data['DayOfWeek'] corresponds to the labels, then use plt.xticks(data['DayOfWeek'])
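Note that plt.xticks takes tick positions first and labels second (plt.xticks(ticks, labels)). Another option, sketched below, is to map the codes back through the uniques array that pd.factorize returns before plotting, so pandas labels the bars itself. The DayOfWeek values are hypothetical sample data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample column standing in for the question's data['DayOfWeek']
data = pd.DataFrame({"DayOfWeek": ["Mon", "Tue", "Mon", "Wed", "Tue", "Mon"]})

# pd.factorize returns integer codes and the unique values in order of appearance
codes, uniques = pd.factorize(data["DayOfWeek"])
data["fact_dow"] = codes

# Count rows per code, then rename each code to its original label
counts = data["fact_dow"].groupby(data["fact_dow"]).count()
counts = counts.rename(index=dict(enumerate(uniques)))

ax = counts.plot(kind="bar", figsize=(14, 8))
```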
