Memory efficient way of plotting several line segements in Plotly?

Memory efficient way of plotting several line segements in Plotly? - python

I have a collection of timestamps that correspond roughly to the time these modules called "metal cells" are being utilized. At present, I am grabbing these by pairs and plotting them by creating a small line segment using go.Scatter and adding it to a list which later functions as the data argument for a go.Figure object.
for mcell in metal_cells:
mdf = cell_dep_grouped.get_group(mcell).reset_index(drop=True)
for i in range(0, mdf.shape[0], 2):
start_row = mdf.iloc[i]
finish_row = mdf.iloc[i+1]
buffer_dat = go.Scatter(x = [start_row.Time, finish_row.Time],
y = [mcell, mcell],
line = {'color': chem_palette[start_row.Recipe]},
legendgroup = start_row.Recipe,
name = start_row.Recipe,
showlegend = not plotted_chem[start_row.Recipe])
plot_data.append(buffer_dat)
The output looks as the picture I have attached, and I would like it to continue looking as closely to it because it clearly highlights the time the module was being utilized using the scatter segments, and downtime as implied by the empty space between segments. The main issue is I need this plot to be interactive - the main interactive feature being that, upon performing a selection, some calculations are made and some of the traces change. But because the plot consists of 80 or so go.Scatter objects, it is quite slow. Is there a more efficient way of plotting segments like these that make the plot lighter and faster?

Related

Holoviews: how to customize histogram for linked time series Curve plots

I am just getting started with Holoviews. My questions are on customizing histograms, but also I am sharing a complete example as it may be helpful for other newbies to look at, since the documentation for Holoviews is very thorough but can be overwhelming.
I have a number of time series in text files loaded as Pandas DataFrames where:
each file is for a specific location
at each location about 10 time series were collected, each with about 15,000 points
I am building a small interactive tool where a Selector can be used to choose the location / DataFrame, and then another Selector to pick 3 of 10 of the time series to be plotted together.
My goal is to allow linked zooms (both x and y scales). The questions and code will focus on this aspect of the tool.
I cannot share the actual data I am using, unfortunately, as it is proprietary, but I have created 3 random walks with specific data ranges that are consistent with the actual data.
## preliminaries ##
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews.util.transform import dim
from holoviews.selection import link_selections
from holoviews import opts
from holoviews.operation.datashader import shade, rasterize
import hvplot.pandas
hv.extension('bokeh', width=100)
## create random walks (one location) ##
data_df = pd.DataFrame()
npoints=15000
np.random.seed(71)
x = np.arange(npoints)
y1 = 1300+2.5*np.random.randn(npoints).cumsum()
y2 = 1500+2*np.random.randn(npoints).cumsum()
y3 = 3+np.random.randn(npoints).cumsum()
data_df.loc[:,'x'] = x
data_df.loc[:,'rand1'] = y1
data_df.loc[:,'rand2'] = y2
data_df.loc[:,'rand3'] = y3
This first block is just to plot the data and show how, by design, one of the random walks have different range from the other two:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],value_label='y',width=800,height=400)
As a result, although hvplot subplots work out of the box (for linking), ranges are different so the scaling is not quite there:
data_df.hvplot(x='x',y=['rand1','rand2','rand3'],
value_label='y',subplots=True,width=800,height=200).cols(1)
So, my first attempt was to adapt the Python-based Points example from Linked brushing in the documentation:
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
hv.Points(data_df, dim).opts(color=c)
for c, dim in zip(colors, [['x', d] for d in dims])
])
link_selections(layout).opts(opts.Points(width=1200, height=300)).cols(1)
That is already an amazing result for a 20 minutes effort!
However, what I would really like is to plot a curve rather than points, and also see a histogram, so I adapted the comprehension syntax to work with Curve (after reading the documentation pages Applying customization, and Composing elements):
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([hv.Curve(data_df,'x',dim).opts(height=300,width=1200,
color=c).hist(dim) for c,
dim in zip(colors,[d for d in dims])])
link_selections(layout).cols(1)
Which is almost exactly what I want. But I still struggle with the different layers of opts syntax.
Question 1: with the comprehension from the last code block, how would I make the histogram share color with the curves?
Now, suppose I want to rasterize the plots (although I do not think is quite yet necessary with 15,000 points like in this case), I tried to adapt the first example with Points:
cmaps = ['Blues', 'Greens', 'Reds']
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
shade(rasterize(hv.Points(data_df, dims),
cmap=c)).opts(width=1200, height = 400).hist(dims[1])
for c, dims in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)
This is a decent start, but again I struggle with the options/customization.
Question 2: in the above cod block, how would I pass the colormaps (it does not work as it is now), and how do I make the histogram reflect data values as in the previous case (and also have the right colormap)?
Thank you!

Sander answered how to color the histogram, but for the other question about coloring the datashaded plot, Datashader renders your data with a colormap rather than a single color, so the parameter is named cmap rather than color. So you were correct to use cmap in the datashaded case, but (a) cmap is actually a parameter to shade (which does the colormapping of the output of rasterize), and (b) you don't really need shade, as you can let Bokeh do the colormapping in most cases nowadays, in which case cmap is an option rather than an argument. Example:
from bokeh.palettes import Blues, Greens, Reds
cmaps = [Blues[256][200:], Greens[256][200:], Reds[256][200:]]
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout([
rasterize(hv.Points(data_df, ds)).opts(cmap=c,width=1200, height = 400).hist(dims[1])
for c, ds in zip(cmaps, [['x', d] for d in dims])
])
link_selections(layout).cols(1)

To answer your first question to make the histogram share the color of the curve, I've added .opts(opts.Histogram(color=c)) to your code.
When you have a layout you can specify the options of an element inside the layout like that.
colors = hv.Cycle('Category10').values
dims = ['rand1', 'rand2', 'rand3']
layout = hv.Layout(
[hv.Curve(data_df,'x',dim)
.opts(height=300,width=600, color=c)
.hist(dim)
.opts(opts.Histogram(color=c))
for c, dim in zip(colors,[d for d in dims])]
)
link_selections(layout).cols(1)

Set colors in Matplotlib.pie by label name

I have created a large matrix of pie plots using a function that runs through a datafrane. I am only plotting in the pie charts two variables. When one of the variables is not present in the specific data, matplotlib automatically switches the colors. See sample picture below.
How would I make sure the colors stay consistent based on values? Would I manipulate the colors argument in my function?
my def code that I run the data through
#function to make matrix
def pie(v, l, color=None):
plt.pie(v, labels=l.values, colors = ????, autopct='%0.f')
#function being called for data - l='coverage'
g = sns.FacetGrid(market_covered_sum, col="mkt_mcap_decile", row="market",
margin_titles=True)
g.map(pie, "MKT_Cap_mn", "coverage").set_axis_labels(" ", " ")
I want to keep the colors consistent, and change them to a color code once I can keep consistent.

Filtering "bad" data from a pandas dataframe and plotting the "good" results

I am working with a function that plots the light curves of various sources from two pandas data frames. I would like to write a loop to cycle through each light curve in the data frame that will plot the good light curves. The function works by plotting two curves on the same plot, one from each data frame. I need the loop to count the data points from one of the curves (the LSST curve) and eliminate plots that have less than 5 points. All of the HiTS plots have enough data points, but the LSST plots have had filters applied to them. I am still relatively new to Python so I am pretty confused about how to go about doing this. The function in question is listed below:
def plotLsstHitsLightCurve(obj, srcTable, row, lcPath='random file path'):
plt.figure(figsize=(10,8))
# Plot LSST curve
srcRowFilter = (srcTable['diaObjectId'] == obj)
srcRow = srcTable.loc[srcRowFilter]
plt.errorbar(srcRow['midPointTai'], srcRow['magCol'],yerr=srcRow['magErrCol'],ls=':', marker='o', label='LSST')
# PLot HiTS Curve
tok = row['internalID'].split('_')
field = '_'.join([tok[0],tok[1]])
ccd = tok[2]
lightcurveFile = field + '_' + ccd + '_LC_50.tar.gz'
tarball = tarfile.open(os.path.join(lcPath,field,ccd,lightcurveFile))
data = tarball.extractfile(str(row['internalID'])+ '_g.dat')
dfl = pd.read_csv(data,sep='\t') # lead a file with a light curve data into a pandas dataframe
plt.errorbar(dfl.MJD,dfl.MAG_AP1,dfl.MAGERR_AP1, marker='o',linestyle=':', label='HiTS')
obj = goodObj.iloc[idx[2]]['diaObjectId']
row = hitsDf.iloc[2]
enter code here
enter code here

How to make the confidence interval (error bands) show on seaborn lineplot

I'm trying to create a plot of classification accuracy for three ML models, depending on the number of features used from the data (the number of features used is from 1 to 75, ranked according to a feature selection method). I did 100 iterations of calculating the accuracy output for each model and for each "# of features used". Below is what my data looks like (clsf from 0 to 2, timepoint from 1 to 75):
data
I am then calling the seaborn function as shown in documentation files.
sns.lineplot(x= "timepoint", y="acc", hue="clsf", data=ttest_df, ci= "sd", err_style = "band")
The plot comes out like this:
plot
I wanted there to be confidence intervals for each point on the x-axis, and don't know why it is not working. I have 100 y values for each x value, so I don't see why it cannot calculate/show it.

You could try your data set using Seaborn's pointplot function instead. It's specifically for showing an indication of uncertainty around a scatter plot of points. By default pointplot will connect values by a line. This is fine if the categorical variable is ordinal in nature, but it can be a good idea to remove the line via linestyles = "" for nominal data. (I used join = False in my example)
I tried to recreate your notebook to give a visual, but wasn't able to get the confidence interval in my plot exactly as you describe. I hope this is helpful for you.
sb.set(style="darkgrid")
sb.pointplot(x = 'timepoint', y = 'acc', hue = 'clsf',
data = ttest_df, ci = 'sd', palette = 'magma',
join = False);

plotly - multiple traces using a shared slider variable

As the title hints, I'm struggling to create a plotly chart that has multiple lines that are functions of the same slider variable.
I hacked something together using bits and pieces from the documentation: https://pastebin.com/eBixANqA. This works for one line.
Now I want to add more lines to the same chart, but this is where I'm struggling. https://pastebin.com/qZCMGeAa.
I'm getting a PlotlyListEntryError: Invalid entry found in 'data' at index, '0'
Path To Error: ['data'][0]
Can someone please help?

It looks like you were using https://plot.ly/python/sliders/ as a reference, unfortunately I don't have time to test with your code, but this should be easily adaptable. If you create each trace you want to plot in the same way that you have been:
trace1 = [dict(
type='scatter',
visible = False,
name = "trace title",
mode = 'markers+lines',
x = x[0:step],
y = y[0:step]) for step in range(len(x))]
where I note in my example my data is coming from pre-defined lists, where you are using a function, that's probably the only change you'll really need to make besides your own step size etc.
If you create a second trace in the same way, for example
trace2 = [dict(
type='scatter',
visible = False,
name = "trace title",
mode = 'markers+lines',
x = x2[0:step],
y = y2[0:step]) for step in range(len(x2))]`
Then you can put all your data together with the following
all_traces = trace1 + trace2
then you can just go ahead and plot it provided you have your layout set up correctly (it should remain unchanged from your single trace example):
fig = py.graph_objs.Figure(data=all_traces, layout=layout)
py.offline.iplot(fig)
Your slider should control both traces provided you were following https://plot.ly/python/sliders/ to get the slider working. You can combine multiple data dictionaries this way in order to have multiple plots controlled by the same slider.
I do note that if your lists of dictionaries containing data are of different length, that this gets topsy-turvy.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Memory efficient way of plotting several line segements in Plotly? - python

Related

Holoviews: how to customize histogram for linked time series Curve plots

Set colors in Matplotlib.pie by label name

Filtering "bad" data from a pandas dataframe and plotting the "good" results

How to make the confidence interval (error bands) show on seaborn lineplot

plotly - multiple traces using a shared slider variable

Categories

Resources