plotly express for large data sets

import plotly.express as px
import pandas as pd
dfa = pd.DataFrame()
dfa["travel_time(min)"] = range(100000)
fig = px.ecdf(dfa["travel_time(min)"], x="travel_time(min)")
#fig.write_html("debug.html")
fig.show()
With 100k points the resulting figure lags (with 10k points it works fine).
How can I fix this? Is it possible to precalculate the graphic somehow?
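One possible approach, sketched here under assumptions rather than taken from an accepted answer, is to precalculate the ECDF with NumPy and send only a thinned set of points to the browser. The helper names downsampled_ecdf and plot_ecdf and the max_points default are hypothetical; px.line with line_shape="hv" stands in for px.ecdf.

```python
import numpy as np


def downsampled_ecdf(values, max_points=2000):
    """Return (x, y) for the ECDF of `values`, thinned to at most `max_points`."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    y = np.arange(1, n + 1) / n  # fraction of observations <= x
    # Evenly spaced positions that always include the first and last observation
    keep = np.unique(np.linspace(0, n - 1, min(n, max_points)).round().astype(int))
    return x[keep], y[keep]


def plot_ecdf(values, max_points=2000):
    """Build a step-line figure from the thinned ECDF (hypothetical helper)."""
    # Imported here so the numeric helper can be used without plotly installed
    import plotly.express as px

    x, y = downsampled_ecdf(values, max_points)
    return px.line(x=x, y=y, line_shape="hv",
                   labels={"x": "travel_time(min)", "y": "probability"})
```

With the question's data this could be called as plot_ecdf(dfa["travel_time(min)"]).show(). The thinning keeps the overall shape of the curve but drops fine detail between the retained points, so max_points is a trade-off between responsiveness and resolution.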

Related

Build slice by slice a heatmap in Plotly

I am trying to build a heatmap slice by slice. How can I update the data shown in the graph without generating a whole new figure? The following code produces a new plot on every iteration. I tried fig.update_traces(), but it didn't work.
What am I missing?
Thanks
import numpy as np
import pandas as pd
import plotly.express as px
import time
df = pd.DataFrame(np.random.rand(1, 100))
for i in range(0, 10):
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, pd.DataFrame(np.random.rand(1, 100))], ignore_index=True)
    time.sleep(1)
    fig = px.imshow(df)
    fig.show()
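A minimal sketch of one way to update a single heatmap in place, assuming the code runs in a Jupyter notebook where go.FigureWidget and its widget dependencies are available. The function names append_slice and run_live_heatmap are hypothetical, and the random rows stand in for the real slices.

```python
import time

import numpy as np


def append_slice(z, new_row):
    """Return z with one more row stacked at the bottom."""
    return np.vstack([z, new_row])


def run_live_heatmap(n_slices=10, width=100, delay=1.0):
    """Display one heatmap and grow it row by row (Jupyter only)."""
    # Imported here so append_slice can be used without plotly installed
    import plotly.graph_objects as go
    from IPython.display import display

    z = np.random.rand(1, width)
    fig = go.FigureWidget(data=[go.Heatmap(z=z)])
    display(fig)  # the figure is rendered once
    for _ in range(n_slices):
        z = append_slice(z, np.random.rand(1, width))
        fig.data[0].z = z  # assigning to the trace updates the displayed widget
        time.sleep(delay)
    return fig
```

The difference from the loop in the question is that px.imshow and fig.show() are not called inside the loop; only the z data of the existing trace is reassigned.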

Plotly: How to increase the number of colors to assure unique colors for all lines?

I'd like to plot a simple line plot, but if I have more than 10 variables, plotly uses the same color twice. How can I avoid this and always get a new color for each new variable?
import pandas as pd
import numpy as np
pd.set_option("plotting.backend", "plotly")
df=pd.DataFrame(np.random.rand(100, 12)).cumsum()
df.plot()

You can pass a list of colors with the color keyword, e.g. df.plot(color=my_list).
If your list has at least as many colors as your DataFrame has columns, no color is repeated.
Here is an example (note that it uses the matplotlib backend):
import pandas as pd
import numpy as np
import colorcet as cc
pd.set_option("plotting.backend", "matplotlib")
df=pd.DataFrame(np.random.rand(100, 12)).cumsum()
# one color per column, so slice by the number of columns (df.shape[1])
df.plot(color=cc.b_rainbow_bgyrm_35_85_c71[::15][:df.shape[1]])

This is documented at https://plotly.com/python/discrete-color/. The pandas plotly backend is an interface to Plotly Express, so you can pass color_discrete_sequence to df.plot().
The code below uses a different set of colors.
import pandas as pd
import numpy as np
pd.set_option("plotting.backend", "plotly")
df=pd.DataFrame(np.random.rand(100, 12)).cumsum()
color_seq = ['#AA0DFE',
'#3283FE',
'#85660D',
'#782AB6',
'#565656',
'#1C8356',
'#16FF32',
'#F7E1A0',
'#E2E2E2',
'#1CBE4F',
'#C4451C',
'#DEA0FD',
'#FE00FA',
'#325A9B',
'#FEAF16',
'#F8A19F',
'#90AD1C',
'#F6222E',
'#1CFFCE',
'#2ED9FF',
'#B10DA1',
'#C075A6',
'#FC1CBF',
'#B00068',
'#FBE426',
'#FA0087']
df.plot(color_discrete_sequence=color_seq)

If you'd like to do this dynamically with regards to an arbitrary number of traces, you can sample a continuous plotly colorscale using px.colors.sample_colorscale() like this:
colors = px.colors.sample_colorscale("viridis", [n/(n_colors -1) for n in range(n_colors)])
df.plot(color_discrete_sequence=colors)
Plot 1 - 12 traces
Plot 2 - 50 traces
Complete code:
import pandas as pd
import numpy as np
import plotly.express as px
pd.set_option("plotting.backend", "plotly")
# data
df=pd.DataFrame(np.random.uniform(low=-2, high=2, size=(100,12))).cumsum()
# colors
n_colors = len(df.columns)
colors = px.colors.sample_colorscale("viridis", [n/(n_colors -1) for n in range(n_colors)])
# plot
df.plot(color_discrete_sequence=colors)

Fastest way to parse multiple header names to Plotly (Python

I've been experimenting with plotly and trying to plot multiple traces. I wrote the following code, which plots two traces on the same graph:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data = pd.read_csv("data.csv")
headers = pd.read_csv("data.csv", index_col=0, nrows=0).columns.tolist()
fig = go.Figure()
fig = px.line(data, x="DateTime", y=[headers[0], headers[1]])
fig.show()
In this example the first and second headers are plotted as traces on the graph. Is there a way other than y=[headers[n],headers[n+1]]... to get all the lines drawn? I tried passing the headers list without an index, but it raises a ValueError:
Plotly Express cannot process wide-form data with columns of different type.
So, is there a plotly-specific way to make this more efficient and readable than writing out every index in the plot definition, or can it be done with standard Python?
EDIT: the actual data sample is a csv providing int values with a header and date :
DateTime X Y Z
01-JAN-2018,5,6,7...
02-JAN-2018,7,8,9

If your sample data is what is in your CSV, it's simply a matter of defining y as the numeric columns:
import io
import pandas as pd
import plotly.express as px
headers = pd.read_csv(io.StringIO("""DateTime,X,Y,Z
01-JAN-2018,5,6,7
02-JAN-2018,7,8,9
"""))
px.line(headers, x="DateTime", y=headers.select_dtypes("number").columns)
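As an alternative sketch, assuming every column except DateTime should become a trace, the y columns can be derived by dropping the x column from the header list. Parsing the dates with the format "%d-%b-%Y" is an assumption based on the sample rows, and the helper name line_figure is hypothetical.

```python
import io

import pandas as pd

csv_text = """DateTime,X,Y,Z
01-JAN-2018,5,6,7
02-JAN-2018,7,8,9
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the dates so the x axis is a real time axis rather than plain strings
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d-%b-%Y")
# Every column except the x column becomes one trace
y_columns = df.columns.drop("DateTime").tolist()


def line_figure(frame, y_cols):
    """Build the multi-trace line plot (hypothetical helper)."""
    # Imported here so the column selection can be run without plotly installed
    import plotly.express as px

    return px.line(frame, x="DateTime", y=y_cols)
```

Calling line_figure(df, y_columns).show() would then draw X, Y and Z. Unlike select_dtypes("number"), this variant also picks up non-numeric columns, so it only suits files where all remaining columns share one type.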

Plotly bar chart not ascending/descending

I have a bar chart in plotly that I have produced, however, it is not in any type of order. How would I sort to ascending or descending?
What I am doing:
fig = px.bar(data, x='Old_SKU', y='u_power')
fig = data.sort_values('u_power', ascending=True)
fig.show()

I'm not sure what your desired output is, or what your data looks like. In any case, fig in plotly terms is normally a plotly figure object. When you run fig = data.sort_values('u_power', ascending=True), you're not building a figure but sorting a dataframe. So far I can only imagine that you'd like to turn an unsorted bar chart into one whose bars are sorted by u_power (before and after images omitted).
Or maybe you're expecting a continuous increase or decrease? In that case you will have to share a dataset. Nevertheless, with a few tweaks depending on your dataset, the following snippet should not be far from a working solution:
import plotly.express as px
import numpy as np
import pandas as pd
var = np.random.randint(low=2, high=6, size=20).tolist()
data = pd.DataFrame({'u_power':var,
'Old_SKU':np.arange(0, len(var))})
# fig = px.bar(data, x='Old_SKU', y='u_power', barmode='stack')
fig = px.bar(data.sort_values('u_power'), x='Old_SKU', y='u_power', barmode='stack')
fig.show()
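One caveat with a numeric x column such as Old_SKU: plotly places each bar at its x value, so sorting the rows alone may not change the visual order. A minimal sketch of a workaround, assuming the SKUs are meant as labels rather than numbers, is to convert them to strings after sorting so the axis becomes categorical and follows the row order. The helper names sorted_for_bars and sorted_bar_figure are hypothetical.

```python
import pandas as pd


def sorted_for_bars(data, x, y, ascending=True):
    """Sort rows by y and turn x into string labels so bar order follows row order."""
    out = data.sort_values(y, ascending=ascending).copy()
    out[x] = out[x].astype(str)  # string labels give a categorical axis
    return out


def sorted_bar_figure(data, x, y, ascending=True):
    """Build the sorted bar chart (hypothetical helper)."""
    # Imported here so the sorting step can be run without plotly installed
    import plotly.express as px

    return px.bar(sorted_for_bars(data, x, y, ascending), x=x, y=y)
```

For the data in the question this could be used as sorted_bar_figure(data, 'Old_SKU', 'u_power').show(), with ascending=False for descending bars.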

Plotly Animated Bar Graph Showing 1 subgroup only in Jupyter

Issue: When I run my code, only one status (subgroup) shows. The data set is very simple: create date, status, and count. At this point I can only think that something is wrong with my data set. Why does it show only one of my three statuses? Or would it work better with a hosted file? It seems to just iterate through the list rather than keeping each data point intact until the end. The other code block, which reads its data from GitHub, works fine.
Sample of my data set:
Status,Create Date,Count
None,17-Apr-12,8
None,30-Apr-12,9
None,23-Aug-12,10
None,3-Oct-12,11
None,9-Jan-13,12
None,29-Jan-13,13
QBOS,31-Jan-13,1
QBDS,1-Feb-13,1
My code:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
df = pd.read_csv('qb7.csv')
df.columns = ['Status','Create Date','Count']
includes=['None','QBDS', 'QBOS']
df=df[df['Status'].isin(includes)]
df['Create Date']= pd.to_datetime(df['Create Date']).dt.strftime('%Y-%m-%d')
fig = px.bar(df,
             x="Status",
             y="Count",
             color="Status",
             animation_frame="Create Date", hover_name="Status",
             range_y=[0,8000])
fig.show()
Sample of what I want to make:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
df = pd.read_csv('https://raw.githubusercontent.com/shinokada/covid-19-stats/master/data/daily-new-confirmed-cases-of-covid-19-tests-per-case.csv')
df.columns = ['Country','Code','Date','Confirmed','Days since confirmed']
includes=['United States','Russia', 'India','Brazil']
df=df[df['Country'].isin(includes)]
df['Date']= pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')
fig = px.bar(df, x="Country", y="Confirmed", color="Country",
             animation_frame="Date", animation_group="Country", range_y=[0,35000])
fig.show()

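I think the intended graph does not appear because the statuses do not cover the same set of dates: in your sample, each Create Date frame contains a row for only one Status, so only one bar is drawn per frame. The intended result is achieved when every Status has a row for every Create Date, as in the aligned data below.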
import pandas as pd
import numpy as np
import io
data = '''
Status,Create Date,Count
None,17-Apr-12,8
None,30-Apr-12,9
None,23-Aug-12,10
None,3-Oct-12,11
None,9-Jan-13,12
None,29-Jan-13,13
QBOS,17-Apr-12,8
QBOS,30-Apr-12,9
QBOS,23-Aug-12,10
QBOS,3-Oct-12,11
QBOS,9-Jan-13,12
QBOS,29-Jan-13,13
QBDS,17-Apr-12,8
QBDS,30-Apr-12,9
QBDS,23-Aug-12,10
QBDS,3-Oct-12,11
QBDS,9-Jan-13,12
QBDS,29-Jan-13,13
'''
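# keep_default_na=False keeps the literal status "None" as a string;
# depending on the pandas version, read_csv may otherwise parse it as a missing value
df = pd.read_csv(io.StringIO(data), sep=',', keep_default_na=False)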
import plotly.graph_objects as go
import plotly.express as px
# df = pd.read_csv('qb7.csv')
df.columns = ['Status','Create Date','Count']
includes=['None','QBDS', 'QBOS']
df=df[df['Status'].isin(includes)]
df['Create Date']= pd.to_datetime(df['Create Date']).dt.strftime('%Y-%m-%d')
fig = px.bar(df,
             x="Status",
             y="Count",
             color="Status",
             animation_frame="Create Date", hover_name="Status",
             range_y=[0,30])
fig.show()
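Rather than typing the aligned rows by hand, the missing Status and Create Date combinations could be generated with pandas. The sketch below makes two assumptions that are not confirmed by the question: that Count is a running total, so the last known value is carried forward, and that a status has a count of 0 before its first row. The helper names fill_missing_frames and animated_bar are hypothetical.

```python
import io

import pandas as pd

SAMPLE = """Status,Create Date,Count
None,17-Apr-12,8
None,30-Apr-12,9
None,23-Aug-12,10
None,3-Oct-12,11
None,9-Jan-13,12
None,29-Jan-13,13
QBOS,31-Jan-13,1
QBDS,1-Feb-13,1
"""


def fill_missing_frames(df):
    """Return one row per (Create Date, Status) pair so every frame has every status."""
    df = df.copy()
    df["Create Date"] = pd.to_datetime(df["Create Date"], format="%d-%b-%y")
    dates = df["Create Date"].drop_duplicates().sort_values()
    statuses = df["Status"].drop_duplicates().sort_values()
    full_index = pd.MultiIndex.from_product(
        [dates, statuses], names=["Create Date", "Status"]
    )
    counts = df.set_index(["Create Date", "Status"])["Count"].reindex(full_index)
    # Assumption: carry the last known count forward, and use 0 before the first row
    counts = counts.groupby(level="Status").ffill().fillna(0).astype(int)
    filled = counts.reset_index(name="Count")
    filled["Create Date"] = filled["Create Date"].dt.strftime("%Y-%m-%d")
    return filled


def animated_bar(filled):
    """Build the animated bar chart from the aligned data (hypothetical helper)."""
    # Imported here so the data preparation can be run without plotly installed
    import plotly.express as px

    return px.bar(filled, x="Status", y="Count", color="Status",
                  animation_frame="Create Date", range_y=[0, 30])


# keep_default_na=False keeps the literal status "None" as a string
df = pd.read_csv(io.StringIO(SAMPLE), keep_default_na=False)
filled = fill_missing_frames(df)
```

animated_bar(filled).show() would then display all three statuses in every frame. The reindex step expects at most one row per date and status, which holds for the sample shown in the question.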
