Pandas - Stacked bar chart with multiple boolean columns - python

I have data like the sample below. I would like to make a stacked bar chart where the x-axis is the ball color and each segment of a bar is the percentage of balls of that color that have a given attribute (note that the segments of a bar will not necessarily sum to 100). I'm trying something like this:
import pandas as pd
data = {'Ball Color': ['Red', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
        'Heavy?': [True, True, False, True, False, True],
        'Shiny?': [True, True, False, True, True, False]}
code_samp = pd.DataFrame(data)
code_samp.groupby('Ball Color')[['Heavy?', 'Shiny?']].value_counts().plot.bar()
But value_counts is only supported for Series. Any ideas? Thanks in advance.

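Use sum() to plot how many balls of each color have each attribute (True counts as 1):
code_samp.groupby('Ball Color').sum().plot.bar()
or mean() to plot the fraction of balls of each color that have it:
code_samp.groupby('Ball Color').mean().plot.bar()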
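Since the question asks for percentages in a stacked chart, here is a minimal sketch that builds on the mean() idea. It assumes the sample data from the question; the name pct and the y-axis label are only illustrative.

```python
import pandas as pd

data = {
    "Ball Color": ["Red", "Blue", "Blue", "Red", "Red", "Red"],
    "Heavy?": [True, True, False, True, False, True],
    "Shiny?": [True, True, False, True, True, False],
}
code_samp = pd.DataFrame(data)

# The mean of a boolean column is the fraction of True values;
# multiply by 100 to express it as a percentage.
pct = code_samp.groupby("Ball Color")[["Heavy?", "Shiny?"]].mean() * 100

# stacked=True puts the attribute segments on top of each other per color.
ax = pct.plot.bar(stacked=True)
ax.set_ylabel("% of balls with attribute")
```

Because the attributes are independent, a bar can reach anywhere between 0 and 200 here, which matches the note in the question that the bars do not sum to 100.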

Related

y2 axis selectable/dropdown range

I'm new to Python and programming.
I am trying to build some data visualizations to improve efficiency.
I want to generate a scatter plot with Plotly that has two y-axes, y1 and y2. On y1 I want two permanent data series. On y2 I want multiple data series that can each be selected to be shown or hidden.
The data is imported from an Excel file.
I found a way to do it the other way around (permanent data on y2, selectable data on y1), but it does not work well: the legend goes crazy, the format changes to a line plot, and in the end it is not what I want. I want the permanent data on y1 and the selectable data on y2.
import plotly.graph_objects as go
from plotly.subplots import make_subplots
fig = make_subplots(specs=[[{"secondary_y": True}]])
for column in df.columns.to_list():
    fig.add_trace(
        go.Scatter(x=df["Lap"],
                   y=df["P_TYR_FA (bar) mean"], name="TYR_FA"), secondary_y=False)
    fig.add_trace(
        go.Scatter(x=df["Lap"],
                   y=df["P_TYR_RA (bar) mean"], name="TYR_RA"), secondary_y=False)
    fig.add_trace(
        go.Scatter(x=df["Lap"],
                   y=df['P_INT (mbar) mean'], name="TYR_FR"), secondary_y=False)
    fig.add_trace(
        go.Scatter(x=df["Lap"],
                   y=df['P_FUE (bar) mean'], name="TYR_FF"), secondary_y=True)
fig.update_layout(
    updatemenus=[go.layout.Updatemenu(
        active=0,
        buttons=list(
            [dict(label='All',
                  method='update',
                  args=[{'visible': [True, True, True, True]},
                        {'title': 'All'}]),
             dict(label='P_TYR_FA',
                  method='update',
                  args=[{'visible': [True, False, False, True]},  # the index of True aligns with the indices of plot traces
                        {'title': 'MSFT'}]),
             dict(label='P_TYR_RA',
                  method='update',
                  args=[{'visible': [False, True, False, True]},
                        {'title': 'AAPL'}]),
             dict(label='P_INT',
                  method='update',
                  args=[{'visible': [False, False, True, True]},
                        {'title': 'AMZN'}]),
             dict(label='P_FUE',
                  method='update',
                  args=[{'visible': [False, False, False, True]},
                        {'title': 'GOOGL'}]),
             ])
    )
    ])
fig.update_layout(template="plotly_dark")
fig.show()
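One possible direction, offered as a sketch rather than a tested fix for this data: generate the 'visible' lists so that the permanent traces are always True and only the selectable ones change. The helper build_buttons and the split of the columns into permanent and selectable are assumptions made for illustration; the question does not say which columns belong to which group.

```python
def build_buttons(permanent, selectable):
    """Build dropdown button dicts in the shape Plotly's updatemenus expects.

    Assumes the traces were added to the figure in this order:
    all permanent traces first, then all selectable traces.
    """
    n_perm = len(permanent)
    # "All" shows every trace.
    buttons = [dict(label="All", method="update",
                    args=[{"visible": [True] * (n_perm + len(selectable))}])]
    for name in selectable:
        # Permanent traces stay visible; only the chosen selectable one is shown.
        mask = [True] * n_perm + [other == name for other in selectable]
        buttons.append(dict(label=name, method="update",
                            args=[{"visible": mask}]))
    return buttons


# Assumed grouping of the columns from the question (hypothetical).
permanent = ["P_TYR_FA (bar) mean", "P_TYR_RA (bar) mean"]
selectable = ["P_INT (mbar) mean", "P_FUE (bar) mean"]
buttons = build_buttons(permanent, selectable)
```

To use it, add the permanent traces first with secondary_y=False, then the selectable ones with secondary_y=True, in the same order as the two lists, and pass the result with fig.update_layout(updatemenus=[dict(active=0, buttons=buttons)]). Adding each trace exactly once, outside any loop over the columns, should also keep the legend from filling up with duplicates.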

Plotly Scatter plot: how to create a scatter or line plot for only one group

My question might seem very easy, but I am having a difficult time understanding how to create a scatter plot or line plot for only one group of values. For example, my data frame has 3 columns.
My table looks like the following:
fruit   lb  price
orange  1   1.4
orange  2   1.7
apple   3   2.1
apple   1   1.4
kiwi    2   1.1
I want to create a scatter plot that has lb on the x-axis and price on the y-axis. However, I want to make the plot only for the orange category. What parameter should I use to specify the orange category?
What I have now is this:
px.scatter(df, x=df.lb, y=df.price)
Adding a dropdown for user selection will accomplish your goal. Use graph objects to add one trace per type of fruit, then give each dropdown button a list of show/hide flags, one per trace, in the order the traces were added. The dropdown offers "ALL" plus one entry for each fruit on its own, and selecting an entry toggles the traces between shown and hidden. Please refer to the examples in the reference.
import plotly.graph_objects as go
fig = go.Figure()
# One trace per fruit, in the order orange, apple, kiwi
for f in df['fruit'].unique():
    dff = df.query('fruit == @f')
    fig.add_trace(go.Scatter(mode='markers', x=dff.lb, y=dff.price, name=f, visible=True))
fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=list([
                dict(label="ALL",
                     method="update",
                     args=[{"visible": [True, True, True]},
                           {"title": "All fruit"}]),
                dict(label="Orange",
                     method="update",
                     args=[{"visible": [True, False, False]},
                           {"title": "Orange"}]),
                dict(label="Apple",
                     method="update",
                     args=[{"visible": [False, True, False]},
                           {"title": "Apple"}]),
                dict(label="Kiwi",
                     method="update",
                     args=[{"visible": [False, False, True]},
                           {"title": "Kiwi"}]),
            ]),
        )
    ])
fig.show()

Boxplot with a boolean column and an int value

I would like to create a boxplot of the distribution of the variable duree (duration) according to whether or not the film belongs to the Dramas category (True or False).
Unfortunately, the two options below do not take into account whether the in_drama column is True or False.
Note that the two columns are in the same DataFrame.
movies.boxplot(column= 'in_drama', by='duree', figsize= (7,7));
# sns.catplot(x="in_drama", y="duree" , kind="box", data=movies);
For the pandas boxplot, set by='in_drama' and column to the duration column (named 'durée' in the sample data below, 'duree' in the question). This gives one box for in_drama == False and one for in_drama == True, each showing the distribution of the duration:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
movies = pd.DataFrame({'in_drama': [False, False, False, False, False, True, True, True, True, True],
'durée': [95, 118, 143, 89, 91, 145, 168, 193, 139, 141]})
movies.boxplot(by='in_drama', column='durée', figsize=(7, 7))
plt.show()
The seaborn plot should also work. As only one subplot is needed, sns.boxplot can be used directly.
sns.set()
sns.boxplot(x="in_drama", y="durée", data=movies)
[Figure: the pandas boxplot on the left, the seaborn boxplot on the right]
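To check the plots against numbers, the same grouping can be summarized with describe(). This is a small sketch using the sample data from the answer; the name summary is only illustrative.

```python
import pandas as pd

movies = pd.DataFrame({
    "in_drama": [False] * 5 + [True] * 5,
    "durée": [95, 118, 143, 89, 91, 145, 168, 193, 139, 141],
})

# One row of summary statistics per in_drama value. The "50%" column is
# the median, which should correspond to the line inside each box.
summary = movies.groupby("in_drama")["durée"].describe()
print(summary[["25%", "50%", "75%"]])
```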

heatmap and dendrogram (clustermap) error using Plotly

The last example in Plotly's documentation for Dendrograms has an error. When executing this code, I get this error in two locations due to 'extend':
AttributeError: 'tuple' object has no attribute 'extend'
They are produced by these lines: figure['data'].extend(dendro_side['data']) and figure['data'].extend(heatmap)
If anyone has run into this problem, please see my solution below! Happy coding!
I have a quick and accurate solution to run the last example code in Plotly's documentation for Dendrograms. Note that I am using Plotly offline in a Jupyter Notebook.
figure['data'] is a tuple, so it has no extend method. Figure has an add_traces method, and it should replace extend.
The three key lines are:
figure.add_traces(dendro_side['data'])
figure.add_traces(heatmap)
plotly.offline.iplot(figure, filename='dendrogram_with_heatmap')
Here is the full example code with my corrections and necessary imports, below:
# Imports
import numpy as np
import plotly
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly.offline import init_notebook_mode
from scipy.spatial.distance import pdist, squareform
# Enable offline plotting in the Jupyter notebook
init_notebook_mode(connected=True)
# Get Data
data = np.genfromtxt("http://files.figshare.com/2133304/ExpRawData_E_TABM_84_A_AFFY_44.tab",names=True,usecols=tuple(range(1,30)),dtype=float, delimiter="\t")
data_array = data.view((float, len(data.dtype.names)))  # np.float was removed in NumPy 1.24; the builtin float is equivalent
data_array = data_array.transpose()
labels = data.dtype.names
# Initialize figure by creating upper dendrogram
figure = ff.create_dendrogram(data_array, orientation='bottom', labels=labels)
for i in range(len(figure['data'])):
    figure['data'][i]['yaxis'] = 'y2'
# Create Side Dendrogram
dendro_side = ff.create_dendrogram(data_array, orientation='right')
for i in range(len(dendro_side['data'])):
    dendro_side['data'][i]['xaxis'] = 'x2'
# Add Side Dendrogram Data to Figure
figure.add_traces(dendro_side['data'])
# Create Heatmap
dendro_leaves = dendro_side['layout']['yaxis']['ticktext']
dendro_leaves = list(map(int, dendro_leaves))
data_dist = pdist(data_array)
heat_data = squareform(data_dist)
heat_data = heat_data[dendro_leaves,:]
heat_data = heat_data[:,dendro_leaves]
heatmap = [
go.Heatmap(
x = dendro_leaves,
y = dendro_leaves,
z = heat_data,
colorscale = 'Blues'
)
]
heatmap[0]['x'] = figure['layout']['xaxis']['tickvals']
heatmap[0]['y'] = dendro_side['layout']['yaxis']['tickvals']
figure.add_traces(heatmap)
# Edit Layout
figure['layout'].update({'width':800, 'height':800,
'showlegend':False, 'hovermode': 'closest',
})
# Edit xaxis
figure['layout']['xaxis'].update({'domain': [.15, 1],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'ticks':""})
# Edit xaxis2
figure['layout'].update({'xaxis2': {'domain': [0, .15],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'showticklabels': False,
'ticks':""}})
# Edit yaxis
figure['layout']['yaxis'].update({'domain': [0, .85],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'showticklabels': False,
'ticks': ""})
# Edit yaxis2
figure['layout'].update({'yaxis2':{'domain':[.825, .975],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'showticklabels': False,
'ticks':""}})
# Plot using Plotly Offline
plotly.offline.iplot(figure, filename='dendrogram_with_heatmap')

Plotting multiple stacked bar graph given a pandas dataframe in Python

There are a few things that I would like to express in a bar chart, and I have no clue how to do it using basic graphing techniques in matplotlib. I have a dataframe, shown below, and would like to obtain a bar chart as follows. The x-axis is based on the Type column of the dataframe; within a single bar, the different colors are based on the Name column, and the size of each segment is defined by the Count value. The colors of the different names need not be the same across different types, as long as the colors within a single bar are different.
You can use pivot and then plot (pivot accepts keyword arguments only in pandas 2.0 and later):
df.pivot(index='Type', columns='Name', values='Count').plot(kind='bar', stacked=True, color=['b', 'g', 'orange', 'm', 'r'])
Edit: To sort the Name columns by their values in row 'A':
df.pivot(index='Type', columns='Name', values='Count').sort_values(by='A', ascending=False, axis=1)\
    .plot(kind='bar', stacked=True, color=['g', 'r', 'b', 'orange', 'm'])
If you want to change the size of the plot, use the argument figsize=(15, 5):
df.pivot(index='Type', columns='Name', values='Count').sort_values(by='A', ascending=False, axis=1)\
    .plot(kind='bar', stacked=True, color=['g', 'r', 'b', 'orange', 'm'], figsize=(15, 5))
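The original dataframe is not shown, so here is a self-contained sketch of the same pivot-then-plot idea on made-up data. The values, the Name labels x, y, z and the variable wide are all hypothetical.

```python
import pandas as pd

# Hypothetical data in the long format described in the question.
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "C"],
    "Name": ["x", "y", "z", "x", "y", "z"],
    "Count": [2, 5, 3, 4, 6, 1],
})

# One row per Type and one column per Name. Type/Name pairs that do not
# occur become NaN, so fill them with 0 before plotting.
wide = df.pivot(index="Type", columns="Name", values="Count").fillna(0)

# Order the Name columns by their Count in row "A", largest first.
wide = wide.sort_values(by="A", axis=1, ascending=False)

# Each Type becomes one bar; each Name becomes one colored segment.
ax = wide.plot(kind="bar", stacked=True, figsize=(15, 5))
```

Leaving out the color argument lets pandas pick a distinct default color per Name, which already satisfies the requirement that colors differ within a bar.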
