Data Availability Chart in Python


I am wondering if Python has something to plot the data availability of time series with multiple variables. An example is shown below taken from Visavail.js - A Time Data Availability Chart.

Here's a suggestion using plotly in a Jupyter Notebook:
Code:
import random
import pandas as pd
import plotly.express as px
from random import choices

# random data with a somewhat higher
# probability of 1 than 0 to mimic OP's data
random.seed(1)
vals = [0, 1]
prob = [0.4, 0.6]
data = []
for i in range(5):
    data.append([choices(vals, prob)[0] for _ in range(10)])

# organize data in a pandas dataframe
df = pd.DataFrame(data).T
df.columns = ['Balance Sheet', 'Closing Price', 'Weekly Report', 'Analyst Data', 'Annual Report']
# daily dates starting 2080-01-01 (pd.datetime no longer exists in pandas 2.x)
drng = pd.date_range('2080-01-01', periods=df.shape[0]).tolist()
df['date'] = [d.strftime('%Y-%m-%d') for d in drng]
dfm = pd.melt(df, id_vars=['date'], value_vars=df.columns[:-1])

# plotly express
fig = px.bar(dfm, x="date", y="variable", color='value', orientation='h',
             hover_data=["date"],
             height=600,
             color_continuous_scale=['firebrick', '#2ca02c'],
             title='Data Availability Plot',
             template='plotly_white',
             )
fig.update_layout(yaxis=dict(title=''),
                  xaxis=dict(title='', showgrid=False, gridcolor='grey',
                             tickvals=[],
                             ),
                  )
fig.show()

Try Plotly Gantt charts:
https://plotly.com/python/gantt/
The benefit is that you can use actual times as values and also separate resources and tasks.
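A Gantt chart needs start and end times rather than per-day flags, so the 0/1 availability data first has to be collapsed into contiguous runs. The following is only a minimal sketch of that idea, not the method from the linked page: `availability_intervals` and `plot_availability` are hypothetical helper names, the dates are assumed to be consecutive days, and pandas and plotly are imported only inside the plotting function.

```python
from datetime import date, timedelta
from itertools import groupby


def availability_intervals(dates, flags):
    """Collapse per-day 0/1 flags into (start, end, available) runs.

    Assumes `dates` are consecutive days in ascending order.
    `end` is exclusive: the day after the last day of the run.
    """
    runs = []
    for available, group in groupby(zip(dates, flags), key=lambda p: bool(p[1])):
        days = [d for d, _ in group]
        runs.append((days[0], days[-1] + timedelta(days=1), available))
    return runs


def plot_availability(series):
    """Draw one Gantt row per variable; `series` maps name -> (dates, flags)."""
    import pandas as pd          # imported lazily: only needed for plotting
    import plotly.express as px

    rows = [
        {"variable": name, "start": start, "end": end,
         "status": "available" if ok else "missing"}
        for name, (dates, flags) in series.items()
        for start, end, ok in availability_intervals(dates, flags)
    ]
    return px.timeline(
        pd.DataFrame(rows), x_start="start", x_end="end",
        y="variable", color="status",
        color_discrete_map={"available": "#2ca02c", "missing": "firebrick"},
    )


days = [date(2080, 1, 1) + timedelta(days=i) for i in range(6)]
for run in availability_intervals(days, [1, 1, 0, 0, 1, 0]):
    print(run)
```

Calling `plot_availability({"Closing Price": (days, [1, 1, 0, 0, 1, 0])}).show()` should then draw one row with green and red segments, with real dates on the x axis.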

Related

Plotly horizontal bar comparison

The attached chart was made with the R plotly package. Does something similar exist, or can it be done, in Python using the plotly package?

You can create diverging stacked bars in plotly-python by plotting the bars for the male and female populations as separate traces, making the population values negative for the men, and then passing the original values as customdata so that the hover text for men displays positive values.
I followed the method outlined by @empet in his answer here, and modified the categories and hovertemplate to fit your example.
import numpy as np
import pandas as pd
import plotly.graph_objects as go

d = {'Age': ['0-19', '20-29', '30-39', '40-49', '50-59', '60-Inf'],
     'Male': [1000, 2000, 4200, 5000, 3500, 1000],
     'Female': [1000, 2500, 4000, 4800, 2000, 1000],
     }
df = pd.DataFrame(d)

fig = go.Figure()
# male bars point left: negative x, original values kept in customdata for hover
fig.add_trace(go.Bar(x=-df['Male'].values,
                     y=df['Age'],
                     orientation='h',
                     name='Male',
                     customdata=df['Male'],
                     hovertemplate="Age: %{y}<br>Pop:%{customdata}<br>Gender:Male<extra></extra>"))
fig.add_trace(go.Bar(x=df['Female'],
                     y=df['Age'],
                     orientation='h',
                     name='Female',
                     hovertemplate="Age: %{y}<br>Pop:%{x}<br>Gender:Female<extra></extra>"))
fig.update_layout(barmode='relative',
                  height=400,
                  width=700,
                  yaxis_autorange='reversed',
                  bargap=0.01,
                  legend_orientation='h',
                  legend_x=-0.05, legend_y=1.1
                  )
fig.show()
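With this approach the hover text shows positive values for men, but the x-axis ticks on the left side would still carry a minus sign. One possible way around that, sketched here as an assumption rather than as part of the original answer, is to set tick positions and tick labels explicitly. `symmetric_ticks` is a hypothetical helper, and `max_value` is assumed to be a multiple of `step`.

```python
def symmetric_ticks(max_value, step):
    """Return (tickvals, ticktext) for an axis spanning -max_value..max_value.

    Tick positions keep their sign; tick labels show absolute values.
    Assumes max_value is a positive multiple of step.
    """
    tickvals = list(range(-max_value, max_value + 1, step))
    ticktext = [str(abs(v)) for v in tickvals]
    return tickvals, ticktext


tickvals, ticktext = symmetric_ticks(5000, 2500)
print(tickvals)   # positions on the x axis
print(ticktext)   # labels shown to the reader

# With the figure from the answer (assumed to be named `fig`):
# fig.update_xaxes(tickvals=tickvals, ticktext=ticktext)
```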

How to format plotly legend when using marker color?

I want to follow up on this post: Plotly: How to colorcode plotly graph objects bar chart using Python?.
When using plotly express, and specifying 'color', the legend is correctly produced as seen in the post by vestland.
This is my plotly express code:
data = {'x_data': np.random.random_sample((5,)),
        'y_data': ['A', 'B', 'C', 'D', 'E'],
        'c_data': np.random.randint(1, 100, size=5)
        }
df = pd.DataFrame(data=data)
fig = px.bar(df,
             x='x_data',
             y='y_data',
             orientation='h',
             color='c_data',
             color_continuous_scale='YlOrRd'
             )
fig.show()
But when using go.Bar, the legend is incorrectly displayed as illustrated here:
This is my code using graph objects:
bar_trace = go.Bar(name='bar_trace',
                   x=df['x_data'],
                   y=df['y_data'],
                   marker={'color': df['c_data'], 'colorscale': 'YlOrRd'},
                   orientation='h'
                   )
layout = go.Layout(showlegend=True)
fig = go.FigureWidget(data=[bar_trace], layout=layout)
fig.show()
I'm learning how to use FigureWidget, and it seems it can't be used with plotly express, so I have to learn how to plot with graph objects. How do I link the legend to the data so that it works like the plotly express example in vestland's post?

This really comes down to understanding what a high-level API (plotly express) does. When you specify color in px and the column is categorical, it creates one trace per category value. Hence the two ways of creating a figure below are mostly equivalent. The legend shows an item for each trace, not for each color.
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

df = pd.DataFrame({"x": np.linspace(0, 10, 10), "y": np.linspace(5, 15, 10), "color": np.random.choice(list("ABCD"), 10)})
# plotly express: one trace per category, created implicitly
px.bar(df, x="x", y="y", color="color", orientation="h").show()
# graph objects: the same thing, with one trace added per group explicitly
fig = go.Figure()
for name, group in df.groupby("color"):
    fig.add_trace(go.Bar(x=group["x"], y=group["y"], name=name, orientation="h"))
fig.show()
Supplementary, based on comments:
You do not have to use graph objects if you are using FigureWidget(). As the second figure below demonstrates, you can create the figure with plotly express and then generate a FigureWidget() from it.
For continuous data, the normal pattern is to use a single trace and a colorbar (also demonstrated in the second figure). However, if you want a discrete legend, create a trace per value in c_data and use sample_colorscale() from plotly.colors: https://plotly.com/python-api-reference/generated/plotly.colors.html
import plotly.express as px
import plotly.colors
import plotly.graph_objects as go
import numpy as np
import pandas as pd

# simulate data frame...
df = pd.DataFrame(
    {
        "x_data": np.linspace(0, 10, 10),
        "y_data": np.linspace(5, 15, 10),
        "c_data": np.random.randint(0, 4, 10),
    }
)

# first figure: build a trace per value in c_data using graph objects,
# so the legend gets one entry per value
bar_traces = [
    go.Bar(
        name=str(c),  # name each trace after its c_data value so legend entries differ
        x=d["x_data"],
        y=d["y_data"],
        marker={
            "color": plotly.colors.sample_colorscale(
                "YlOrRd",
                d["c_data"] / df["c_data"].max(),
            )
        },
        orientation="h",
    )
    for c, d in df.groupby("c_data")
]
layout = go.Layout(showlegend=True)
fig = go.FigureWidget(data=bar_traces, layout=layout)
fig.show()

# second figure: create with plotly express (single trace plus colorbar),
# then wrap it in a FigureWidget
fig = px.bar(
    df,
    x="x_data",
    y="y_data",
    color="c_data",
    orientation="h",
    color_continuous_scale="YlOrRd",
)
fig = go.FigureWidget(data=fig.data, layout=fig.layout)
fig.show()
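The first figure above maps c_data onto the colorscale by dividing by the column maximum, which implicitly assumes non-negative values and a maximum greater than zero. A more defensive variant, given here only as a sketch with a hypothetical helper name, rescales the values to the range 0 to 1 with a min-max transform before they are handed to sample_colorscale().

```python
def to_unit_interval(values, lo=None, hi=None):
    """Min-max scale `values` into [0, 1] for use as colorscale sample points.

    `lo` and `hi` default to the min and max of `values`; pass the global
    min/max explicitly when scaling one group of a larger data set.
    If all values are equal, the midpoint 0.5 is returned for each.
    """
    values = list(values)
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


points = to_unit_interval([0, 1, 3])
print(points)

# Assumed usage with the answer's code (names taken from that snippet):
# plotly.colors.sample_colorscale(
#     "YlOrRd",
#     to_unit_interval(d["c_data"], lo=df["c_data"].min(), hi=df["c_data"].max()),
# )
```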

Different output for Jupyter than Atom

I've created a waterfall graph using Jupyter and Atom (I'm looking for a decent substitute for Jupyter, especially for dataframe visualisation).
The thing is that I used exactly the same code in both editors, but the resulting graph is different.
Does someone have an explanation?
Here is the code used:
import pandas as pd
import numpy as np
import plotly
import plotly.graph_objs as go

# read the CSV file
df = pd.read_csv('C:/Users/Usuario/Desktop/python/HP/waterfall.csv', sep=';')
df['Measure'] = df['Measure'].str.lower()
display(df)

# store values in different variables
x = df['Deal ID']
y = df['deal value (USD)']
measure = df['Measure']
text = df['deal value (USD)']

# let's create the figure
fig = go.Figure(go.Waterfall(
    measure=measure,
    x=x,
    y=y,
    text=text,
    textposition="outside",
    decreasing={"marker": {"color": "Maroon", "line": {"color": "red", "width": 2}}},
    increasing={"marker": {"color": "Teal"}},
    totals={"marker": {"color": "deep sky blue", "line": {"color": "blue", "width": 3}}},
    showlegend=False
))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False, visible=False)
fig.update_traces(hovertemplate=None)
fig.update_layout(title='Total deal value per customer X', height=470,
                  margin=dict(t=90, b=20, l=70, r=70),
                  hovermode="x unified",
                  xaxis_title='QvsQ ', yaxis_title="deal value in USD",
                  plot_bgcolor='rgba(0,0,0,0)',
                  #paper_bgcolor='#333',
                  title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                  font=dict(color='#8a8d93'))
Atom output: (image not available)
Jupyter output: https://i.stack.imgur.com/wYPEG.png
Thanks

How to get standard notation (rather than scientific) when hovering over pie chart in Plotly

I have a pie chart that displays worldwide movie sales by rating. When I hover over the chart, the worldwide sales are displayed in scientific notation. How do I fix this so that worldwide sales are shown in standard notation instead? I would appreciate it if anyone has a solution to this in express or graph objects (or both).
Thank you.
# formatting and importing data
import pandas as pd
movie_dataframe = pd.read_csv("https://raw.githubusercontent.com/NicholasTuttle/public_datasets/main/movie_data.csv")  # importing dataset to dataframe
# regex=False treats ',' and '$' as literal characters ('$' is an end-of-string anchor in a regex)
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].str.replace(',', '', regex=False)  # removing commas from column
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].str.replace('$', '', regex=False)  # removing dollar signs from column
movie_dataframe['worldwide_gross'] = movie_dataframe['worldwide_gross'].astype(float)
# narrowing dataframe to specific columns
movies_df = movie_dataframe.loc[:, ['title', 'worldwide_gross', 'rating', 'rt_score', 'rt_freshness']]

# plotly express
import plotly.express as px
fig = px.pie(movies_df,
             values=movies_df['worldwide_gross'],
             names=movies_df['rating'],
             )
fig.show()

# plotly graph objects
import plotly.graph_objects as go
fig = go.Figure(go.Pie(
    values=movies_df['worldwide_gross'],
    labels=movies_df['rating']
))
fig.show()

Have a look here: https://plotly.com/python/hover-text-and-formatting/#disabling-or-customizing-hover-of-columns-in-plotly-express
Basically, you pass hover_data a dictionary that maps a column name to a format string. The format string follows the d3-format syntax.
import plotly.express as px
fig = px.pie(
    movies_df, values=movies_df['worldwide_gross'], names=movies_df['rating'],
    hover_data={
        "worldwide_gross": ':.d',
        # "worldwide_gross": ':.2f',  # float
    }
)
fig.show()
For the graph objects API you need to write a hovertemplate:
https://plotly.com/python/reference/pie/#pie-hovertemplate
import plotly.graph_objects as go
fig = go.Figure(go.Pie(
    values=movies_df['worldwide_gross'],
    labels=movies_df['rating'],
    hovertemplate='Rating: %{label}<br />World wide gross: %{value:d}<extra></extra>'
))
fig.show()
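The d3-format syntax used in these format strings is modeled on Python's own format specification mini-language, so for the common specifiers the effect can be previewed in plain Python before it goes into a hovertemplate. The sketch below rests on that assumption: the two syntaxes are not identical (d3 has extra types such as SI prefixes), the sample value is made up, and `pie_hovertemplate` is a hypothetical helper.

```python
def pie_hovertemplate(value_spec=",.0f"):
    """Build a pie hovertemplate whose value uses the given format spec."""
    return (
        "Rating: %{label}<br>"
        f"Worldwide gross: %{{value:{value_spec}}}"
        "<extra></extra>"
    )


# made-up sample value, only used to preview the format specs
gross = 2068223624.0
for spec in (",.0f", ".2f", ".3e"):
    print(f"{spec!r:8} -> {format(gross, spec)}")

print(pie_hovertemplate())
```

The resulting string can be passed as `hovertemplate=` to `go.Pie`, or applied to a plotly express figure with `fig.update_traces(hovertemplate=...)`.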

Plotly: How to prepare data visualization for below image using scatter bubble chart?

Here is my dataset after cleaning the CSV file:
Here is the output I want:
I want to display years on the x axis and column values on the y axis, and I want to display bubbles with different colors and sizes, along with a play button for animation.
I am new to data science. Can someone help me achieve this?

Judging by your dataset and attached image, what you're asking for is something like this:
But I'm not sure that is what you actually want. You see, with your particular dataset there aren't enough dimensions to justify an animation, or even a bubble plot. This is because you're only looking at one value, so you end up showing the same value through the bubble sizes and on the y axis. And there's really no need to change your dataset, given that your provided screenshot is in fact your desired plot. But we can talk more about that if you'd like.
Since you haven't provided a sample dataset, I've used a dataset that's available through plotly express and reshaped it so that it matches your dataset:
Complete code:
# imports
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import math
import numpy as np

# color cycle
colors = px.colors.qualitative.Alphabet*10

# sample data with similar structure as OP
df = px.data.gapminder().query("continent=='Americas'")
dfp = df.pivot(index='year', columns='country', values='pop')
dfp = dfp[['United States', 'Mexico', 'Argentina', 'Brazil', 'Colombia']]
dfp = dfp.sort_values(by='United States', ascending=False)
dfp = dfp.T
dfp.columns = [str(yr) for yr in dfp.columns]
dfp = dfp[dfp.columns[::-1]].T

# build figure and add one trace per country
fig = go.Figure()
for col, country in enumerate(dfp):
    vals = dfp[country].values
    yVals = [col]*len(vals)
    fig.add_trace(go.Scatter(
        y=yVals,
        x=dfp.index,
        mode='markers',
        marker=dict(color=colors[col],
                    size=vals,
                    sizemode='area',
                    #sizeref=2.*max(vals)/(40.**2),
                    sizeref=2.*max(dfp.max())/(40.**2),
                    sizemin=4),
        name=country
    ))

# edit y tick layout: one tick per country column in dfp
tickVals = np.arange(0, len(dfp.columns))
fig.update_layout(
    yaxis=dict(tickmode='array',
               tickvals=tickVals,
               ticktext=dfp.columns.tolist()))
fig.show()
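The sizeref expression in the code above follows the scaling rule given in plotly's bubble chart documentation for sizemode='area': twice the largest size value divided by the square of the desired maximum marker diameter in pixels. Pulled out into a small function it looks roughly like this; `area_sizeref` is a hypothetical name, and the default of 40 pixels is simply the value used in the answer.

```python
def area_sizeref(sizes, max_marker_px=40):
    """Compute `sizeref` for markers with sizemode='area'.

    The largest value in `sizes` is then drawn with a diameter of about
    `max_marker_px` pixels; smaller values scale by area.
    """
    largest = max(sizes)
    return 2.0 * largest / (max_marker_px ** 2)


populations = [100, 400, 800]   # made-up values for illustration
print(area_sizeref(populations))
print(area_sizeref(populations, max_marker_px=20))
```

Computing it from the maximum over all traces, as the answer does with `max(dfp.max())`, keeps bubble sizes comparable between countries; the commented-out per-trace variant would instead rescale each country on its own.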
