How to plot pandas grouped values using pygal? - python

I have a csv like this:
name,version,color
AA,"version 1",yellow
BB,"version 2",black
CC,"version 3",yellow
DD,"version 1",black
AA,"version 1",green
BB,"version 2",green
FF,"version 3",green
GG,"version 3",red
BB,"version 3",yellow
BB,"version 2",red
BB,"version 1",black
I would like to draw a bar chart, which shows versions on x axis and an amount (number) of different colors on y axis.
So I want to group DataFrame by version, check which colors belong to a particular version, count colors and display the results on the pygal bar chart.
It should look similar to this:
What I tried so far:
df = pd.read_csv(results)
new_df = df.groupby('version')['color'].value_counts()
bar_chart = pygal.Bar(width=1000, height=600,
                      legend_at_bottom=True, human_readable=True,
                      title='versions vs colors',
                      x_title='Version',
                      y_title='Number')
versions = []
for index, row in new_df.iteritems():
    versions.append(index[0])
    bar_chart.add(index[1], row)
bar_chart.x_labels = map(str, versions)
bar_chart.render_to_file('bar-chart.svg')
Unfortunately, this does not work: the color counts are not matched to the proper version.
I also tried using matplotlib.pyplot and it works like a charm:
pd.crosstab(df['version'],df['color']).plot.bar(ax=ax)
plt.draw()
This works as well:
df.groupby(['version','color']).size().unstack(fill_value=0).plot.bar()
But the generated chart is not accurate enough for me. I would like to have a pygal chart.
I also checked:
How to plot pandas groupby values in a graph?
How to plot a pandas dataframe?
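One possible way to get the grouped counts into pygal, offered as a sketch rather than a confirmed answer: pygal expects one series per legend entry (here, one per color), with one value per x label (here, one per version). A crosstab already has that shape. The helper names `color_counts_by_version` and `render_bar_chart` are made up for this example, and the CSV is inlined so the snippet runs on its own.

```python
import io
import pandas as pd

CSV = """name,version,color
AA,"version 1",yellow
BB,"version 2",black
CC,"version 3",yellow
DD,"version 1",black
AA,"version 1",green
BB,"version 2",green
FF,"version 3",green
GG,"version 3",red
BB,"version 3",yellow
BB,"version 2",red
BB,"version 1",black
"""


def color_counts_by_version(df):
    """Return (x_labels, {color: [count for each version]})."""
    # Rows are versions, columns are colors; missing combinations become 0.
    table = pd.crosstab(df["version"], df["color"])
    labels = [str(v) for v in table.index]
    series = {color: [int(n) for n in table[color]] for color in table.columns}
    return labels, series


def render_bar_chart(labels, series, path="bar-chart.svg"):
    """Draw one pygal series per color (hypothetical helper)."""
    import pygal  # imported here so the counting step runs without pygal

    chart = pygal.Bar(title="versions vs colors", x_title="Version",
                      y_title="Number", legend_at_bottom=True)
    chart.x_labels = labels
    for color, counts in series.items():
        chart.add(color, counts)
    chart.render_to_file(path)


df = pd.read_csv(io.StringIO(CSV))
labels, series = color_counts_by_version(df)
print(labels)
print(series)
```

Calling `render_bar_chart(labels, series)` afterwards should write the SVG. The key difference from the loop in the question is that `add` is called once per color with a full list of values, instead of once per (version, color) pair.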

Related

Plotly: Plotting columns of a dataframe resulting in blank plot

I've been attempting to create a line graph with subplots for each column of a dataframe in Pandas. My dataframe has variable names as the column names, datetime objects as the index, and percentages (floats) as the values.
I'm referencing Plotly: How to create subplots from each column in a pandas dataframe?, but when adapted for my scenario it results in a blank figure - no exceptions or anything, and it does print out an empty box whose size I can adjust, but nothing in it.
My code is
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# get the number of columns
num_alerts = len(helpfulness_graph_data.columns)
# Get the alert names
alert_types = helpfulness_graph_data.columns.values.tolist()
# Create a subplot figure object with 1 column, num_rows rows, and titles for each alert
fig = make_subplots(
    rows=num_alerts, cols=1,
    subplot_titles=alert_types)
j = 1
for i in helpfulness_graph_data.columns:
    #print(helpfulness_graph_data[i].values)
    fig.append_trace(
        go.Scatter(
            {'x': helpfulness_graph_data.index,
             'y': helpfulness_graph_data[i].values}),
        row=j, col=1)
    j += 1
fig.update_layout(height=1200, width=600, title_text="Helpfulness Over Time")
fig.show()
For anyone else who comes across this: this apparently happens sometimes when running plotly in JupyterLab. I found some additional questions with suggested solutions, but none of them worked for me. What did work, however, was running it in plain old Jupyter, so that's what I'd recommend.

Plotly: How to display individual value on histogram?

I am trying to make dynamic plots with plotly. I want to plot a count of data that have been aggregated (using groupby).
I want to facet the plot by color (and maybe even by column). The problem is that I want the value count to be displayed on each bar. With a histogram, I get smooth bars but I can't find how to display the count.
With a bar plot I can display the count, but I don't get smooth bars, and the count appears on each segment composing a bar rather than once for the whole bar.
Here is my code for the barplot
val = pd.DataFrame(data2.groupby(["program", "gender"])["experience"].value_counts())
px.bar(x=val.index.get_level_values(0), y=val, color=val.index.get_level_values(1), barmode="group", text=val)
It's basically the same for the histogram.
Thank you for your help!
px.histogram does not seem to have a text attribute. So if you're willing to do any binning before producing your plot, I would use px.bar. Normally, you apply text to your bar plot using px.bar(..., text=<something>), but this gives the result you've described, with text for every subcategory of your data. Since px.bar adds traces in the order in which the source is organized, we can simply set the text on the last subcategory added, using fig.data[-1].text = sums. The only challenge that remains is some data munging to retrieve the correct sums.
Plot:
Complete code with data example:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data
df = pd.DataFrame({'x':['a', 'b', 'c', 'd'],
                   'y1':[1, 4, 9, 16],
                   'y2':[1, 4, 9, 16],
                   'y3':[6, 8, 4.5, 8]})
df = df.set_index('x')
# calculations
# column sums for transposed dataframe
sums = []
for col in df.T:
    sums.append(df.T[col].sum())
# change dataframe format from wide to long for input to plotly express
df = df.reset_index()
df = pd.melt(df, id_vars = ['x'], value_vars = df.columns[1:])
fig = px.bar(df, x='x', y='value', color='variable')
fig.data[-1].text = sums
fig.update_traces(textposition='inside')
fig.show()
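If your first graph is built with the graph objects library, you can try:
# Use textposition='auto' for direct text.
# go.Bar takes no `color` or `barmode` argument: `barmode` is a layout
# property, and grouping by color needs one go.Bar trace per group.
counts = val.iloc[:, 0]  # the single column of counts
fig = go.Figure(data=[go.Bar(x=val.index.get_level_values(0),
                             y=counts, text=counts, textposition='auto')])
fig.update_layout(barmode="group")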

python plotly plot value to thousands

I am planning to use plotly and want to display values in abbreviated form (k, M, ...). For example, a value shown on the chart as 8247294 should be shown as 8.25M.
I tried something like this:
x = [x for x in range(1,len(table))] #date
y = table['revenue'].values.tolist()
fig = go.Figure(go.Scatter(x=x, y=y, text=y, mode="lines+markers+text",
                           line=dict(color='firebrick', width=4)))
fig.update_layout(width=900, height=650)
fig.update_layout(
    tickformat='k')
It is not working. So what's the correct way of doing it?
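One possible approach, as a sketch rather than a confirmed answer: `tickformat` belongs to an axis rather than to the layout itself, and Plotly format strings follow d3-format, where the `s` type applies SI prefixes. Settings such as `fig.update_layout(yaxis_tickformat=".3s")` for the ticks and `fig.update_traces(texttemplate="%{y:.3s}")` for the point labels should therefore show 8247294 as 8.25M. If you would rather build the label strings yourself and pass them as `text`, a small helper like the one below works. The name `abbreviate` and its suffix list are assumptions made for this example; it rounds to a fixed number of decimals rather than to significant digits.

```python
def abbreviate(value, decimals=2):
    """Format a number with k/M/B/T suffixes, e.g. 8247294 -> '8.25M'."""
    suffixes = ["", "k", "M", "B", "T"]
    scaled = float(value)
    magnitude = 0
    # Divide by 1000 until the rounded value drops below 1000.
    while abs(round(scaled, decimals)) >= 1000 and magnitude < len(suffixes) - 1:
        scaled /= 1000.0
        magnitude += 1
    text = f"{scaled:.{decimals}f}"
    if "." in text:
        # Drop trailing zeros and a dangling decimal point.
        text = text.rstrip("0").rstrip(".")
    return text + suffixes[magnitude]


revenue = [950, 1500, 8247294]
labels = [abbreviate(v) for v in revenue]
print(labels)
```

The resulting `labels` list can be passed as `text=labels` to `go.Scatter` in place of `text=y`.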

Parsing CSV file using pandas

I have been using matplotlib for quite some time now and it is great. However, I want to switch to pandas, and my first attempt at it didn't go so well.
My data set looks like this:
sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
summer,165,484,2.6,544
winter,178,584,2.6,545
sam,112,684,2.6,546
zack,145,784,2.6,547
mike,110,984,2.6,548
etc.....
First I want to search the csv for anything with the name mike and put it in its own list. With this list I want to be able to do some math, for example add sam[3] + winter[4] or compute sam[1]/10. The last part would be to plot its columns against each other.
Going through this page
http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table
The only thing I see is if I have a column header, however, I don't have any headers. I only know the position in a row of the values I want.
So my question is:
How do I create a list for each name (sam, winter, summer)?
Is this method efficient if my csv has millions of data points?
Could I use matplotlib plotting to plot a pandas dataframe?
i.e.:
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(mike[1], winter[3], label='Mike vs Winter speed', color = 'red')
You can read a csv without headers:
data=pd.read_csv(filepath, header=None)
Columns will be numbered starting from 0.
Selecting and filtering:
all_summers = data[data[0]=='summer']
If you want to do some operations grouping by the first column, it will look like this:
data.groupby(0).sum()
data.groupby(0).count()
...
Selecting a row after grouping:
sums = data.groupby(0).sum()
sums.loc['sam']
Plotting example:
import matplotlib.pyplot as plt
sums.plot()
plt.show()
For more details about plotting, see: http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html
df = pd.read_csv(filepath, header=None)
mike = df[df[0]=='mike'].values.tolist()
winter = df[df[0]=='winter'].values.tolist()
Then you can plot those lists as you wanted to above:
fig1 = plt.figure(figsize= (10,10))
ax = fig1.add_subplot(211)
ax.plot(mike, winter, label='Mike vs Winter speed', color = 'red')
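The pieces above can be combined into one small sketch, using the sample rows from the question inlined as a string so that it runs without a file. Integer column labels (0 to 4) come from `header=None`. The dictionary `by_name` is a name introduced only for this example.

```python
import io
import pandas as pd

RAW = """sam,123,184,2.6,543
winter,124,284,2.6,541
summer,178,384,2.6,542
summer,165,484,2.6,544
winter,178,584,2.6,545
sam,112,684,2.6,546
zack,145,784,2.6,547
mike,110,984,2.6,548
"""

# No header row: columns are labelled 0, 1, 2, 3, 4.
data = pd.read_csv(io.StringIO(RAW), header=None)

# One small DataFrame per name, each re-indexed from 0.
by_name = {name: rows.reset_index(drop=True) for name, rows in data.groupby(0)}
sam = by_name["sam"]
winter = by_name["winter"]

# Arithmetic on single cells (first row of each group) and on whole columns.
first_sum = sam.loc[0, 3] + winter.loc[0, 4]
sam_scaled = sam[1] / 10

# Per-name totals of every numeric column.
totals = data.groupby(0).sum()
print(first_sum)
print(sam_scaled.tolist())
print(totals)
```

Because each group stays a DataFrame, `sam[1]` is a whole column rather than one row, which is usually what plotting needs. `totals.plot()` followed by `plt.show()` draws through matplotlib, so the two libraries can be mixed freely.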

Python Bokeh - blending

Note from maintainers: This question concerns the obsolete bokeh.charts API, removed years ago. For information on creating all kinds of Bar charts with modern Bokeh, see:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
OBSOLETE:
I am trying to create a bar chart from a dataframe df in Python Bokeh library. The data I have simply looks like:
value datetime
5 01-01-2015
7 02-01-2015
6 03-01-2015
... ... (for 3 years)
I would like to have a bar chart that shows 3 bars per month:
one bar for the MEAN of 'value' for the month
one bar for the MAX of 'value' for the month
one bar for the MIN of 'value' for the month
I am able to create a bar chart for any one of MEAN/MAX/MIN with:
from bokeh.charts import Bar, output_file, show
p = Bar(df, 'datetime', values='value', title='mybargraph',
        agg='mean', legend=None)
output_file('test.html')
show(p)
How could I have all 3 bars (mean, max, min) on the same plot, if possible stacked above each other?
It looks like blend could help me (as in this example: http://docs.bokeh.org/en/latest/docs/gallery/stacked_bar_chart.html), but I cannot find a detailed explanation of how it works. The Bokeh website is amazing, but this particular item is not documented in much detail.
That blend example put me on the right track.
import pandas as pd
from pandas import Series
from dateutil.parser import parse
from bokeh.plotting import figure
from bokeh.layouts import row
from bokeh.charts import Bar, output_file, show
from bokeh.charts.attributes import cat, color
from bokeh.charts.operations import blend
output_file("datestats.html")
Just some sample data, feel free to alter it as you see fit.
First I had to wrangle the data into a proper format.
# Sample data
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
         "01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
         "01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"
         ]
It looked like your date format was "day-month-year" - I used the dateutil.parser so pandas would recognize it properly.
# Parse the date strings day-first into datetime objects
days = [parse(x, dayfirst=True) for x in dates]
You also needed it grouped by month - I used pandas resample to downsample the dates, get the appropriate values for each month, and merge into a dataframe.
# Put data into dataframe broken into min, mean, and max values for each month
ts = Series(vals, index=days)
firstmerge = pd.merge(ts.resample('M').min().to_frame(name="min"),
                      ts.resample('M').mean().to_frame(name="mean"),
                      left_index=True, right_index=True)
frame = pd.merge(firstmerge, ts.resample('M').max().to_frame(name="max"),
                 left_index=True, right_index=True)
Bokeh allows you to use the pandas dataframe's index as the chart's x values, but it didn't like the datetime values, so I added a new column for date labels. See the timeseries comment below (***).
# You can use DataFrame index for bokeh x values but it doesn't like timestamp
frame['Month'] = frame.index.strftime('%m-%Y')
Finally we get to the charting part. Just like the Olympic medal example, we pass some arguments to Bar.
Play with these however you like, but note that I added the legend by building it outside of the chart altogether. If you have a lot of data points it gets very messy on the chart the way it's built here.
# Main object to render with stacking
bar = Bar(frame,
          values=blend('min', 'mean', 'max',
                       name='values', labels_name='stats'),
          label=cat(columns='Month', sort=False),
          stack=cat(columns='values', sort=False),
          color=color(columns='values',
                      palette=['SaddleBrown', 'Silver', 'Goldenrod'],
                      sort=True),
          legend=None,
          title="Statistical Values Grouped by Month",
          tooltips=[('Value', '#values')]
          )
# Legend info (displayed as separate chart using bokeh.layouts' row)
factors = ["min", "mean", "max"]
x = [0] * len(factors)
y = factors
pal = ['SaddleBrown', 'Silver', 'Goldenrod']
p = figure(width=100, toolbar_location=None, y_range=factors)
p.rect(x, y, color=pal, width=10, height=1)
p.xaxis.major_label_text_color = None
p.xaxis.major_tick_line_color = None
p.xaxis.minor_tick_line_color = None
# Display chart
show(row(bar, p))
If you copy/paste this code, this is what you will see.
If you render it yourself or if you serve it: hover over each block to see the tooltips (values).
I didn't abstract everything I could (colors come to mind).
This is the type of chart you wanted to build, but it seems like a different chart style would display the data more informatively since stacked totals (min + mean + max) don't provide meaningful information. But I don't know what your data really are.
***You might consider a timeseries chart. This could remove some of the data wrangling done before plotting.
You might also consider grouping your bars instead of stacking them. That way you could easily visualize each month's numbers.
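Since the `bokeh.charts` API used above has been removed, the part of this answer that remains reusable is the data wrangling. As a sketch under that assumption, the monthly min/mean/max table can be built in a single groupby step with plain pandas, using the same sample values and day-first dates. The variable name `stats` is introduced for this example only.

```python
import pandas as pd

vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
dates = ["01-01-2015", "02-01-2015", "03-01-2015", "04-01-2015",
         "01-02-2015", "02-02-2015", "03-02-2015", "04-02-2015",
         "01-03-2015", "02-03-2015", "03-03-2015", "04-03-2015"]

# Parse explicitly as day-month-year so 02-01-2015 is 2 January, not 1 February.
index = pd.to_datetime(dates, format="%d-%m-%Y")
ts = pd.Series(vals, index=index)

# Group by calendar month and compute all three statistics at once.
stats = ts.groupby(ts.index.to_period("M")).agg(["min", "mean", "max"])

# String labels for the x axis, since plotting libraries may not accept periods.
stats["Month"] = list(stats.index.strftime("%m-%Y"))
print(stats)
```

The resulting frame has one row per month with `min`, `mean`, `max` and `Month` columns, which is the wide layout that grouped or stacked bar charts in most plotting libraries can consume.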
