How to reduce Plotly HTML size in Python? - python

I am doing three plots:
Line plot, Boxplot and Histogram.
None of these plots really needs hoverinfo, and none of them needs to plot every point of the data.
As you can see, the plots are very simple. However, when dealing with huge data (30 million observations), the resulting HTML weighs 5 MB, which is a lot because there are 100 more plots like this.
So far I have made some optimizations.
When saving to HTML I pass these parameters:
fig.to_html(include_plotlyjs="cdn", full_html=False)
This reduces the file size a lot, but it is not enough.
For the line plot I have also tried setting line = {"simplify": True} and hoverinfo = "skip". However, the file size is almost the same.
Any help or workaround is appreciated.
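
Since the plots do not need every point, one possible workaround is to shrink the data before it reaches Plotly, because the HTML embeds every value passed to a trace. The following is a minimal sketch, assuming NumPy is available. The helper names (decimate, histogram_summary, box_summary, build_figures) and the defaults of 2000 points and 50 bins are illustrative choices, not part of Plotly. It also assumes a reasonably recent Plotly version in which go.Box accepts precomputed q1, median, q3, lowerfence and upperfence.

```python
import numpy as np


def decimate(y, max_points=2000):
    """Keep at most max_points samples by taking every k-th value."""
    y = np.asarray(y)
    step = max(1, int(np.ceil(len(y) / max_points)))
    x = np.arange(len(y))[::step]
    return x, y[::step]


def histogram_summary(values, bins=50):
    """Bin in Python so that only bin centers and counts end up in the HTML."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, counts


def box_summary(values):
    """Precompute box statistics; fences use the usual 1.5 * IQR rule."""
    values = np.asarray(values)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower = values[values >= q1 - 1.5 * iqr].min()
    upper = values[values <= q3 + 1.5 * iqr].max()
    return dict(q1=[q1], median=[median], q3=[q3],
                lowerfence=[lower], upperfence=[upper])


def build_figures(values):
    """Build the three figures from reduced data only."""
    # Imported here so the helpers above can be used without Plotly installed.
    import plotly.graph_objects as go

    x, y = decimate(values)
    line = go.Figure(go.Scatter(x=x, y=y, mode="lines", hoverinfo="skip"))
    centers, counts = histogram_summary(values)
    hist = go.Figure(go.Bar(x=centers, y=counts, hoverinfo="skip"))
    box = go.Figure(go.Box(hoverinfo="skip", **box_summary(values)))
    return line, hist, box
```

Each figure can then be saved with fig.to_html(include_plotlyjs="cdn", full_html=False) as before. Note that plain stride decimation can skip short spikes, and the quartiles from np.percentile may differ slightly from the ones Plotly would compute itself.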

Related

I want to make my plots smooth in Python. How can I do it?

I have 600 files, each containing two columns: i and phase. I want to plot i against phase, but my plots are not smooth. I want to write Python code that produces these 600 plots smoothly so that I can feed them to a CNN. I tried interpolation before, but it didn't work.
Thank you for your help.
This is one of my plots that isn't smooth:

Problem plotting three sets of data in one graph

EDIT - I was being stupid, and trying to plot strings. I converted to int and plotted again fine. Thanks to ImportanceOfBeingErnest for the hint.
I have data from 3 sensors which I want to plot, using matplotlib
Each array is of different length, and I plot them using the following line of code
plt.plot(s_1,'r',s_3,'b',s_4,'g')
plt.show()
This produces the following graph
As you can see, the green trace is not correct, and the y-axis scale is off (there is a 6 after the 21).
I'm really not sure what the problem is here.
When I plot the data individually, they are fine:
It is the last one in this series that is plotted strangely in the graph with all three at once.
To be clear, I don't understand why separately the graphs plot fine, but when the three are printed in one plot the y-axis gets messed up.
Any advice around what the issue with the three-in-one plot is would be great.
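
The fix mentioned in the EDIT (converting the string readings to int before plotting) could look roughly like the sketch below. The helper names to_int_series and plot_sensors are hypothetical, and it assumes every reading is a string that int() can parse. When matplotlib receives strings it treats them as categories and places them in order of first appearance, which is why a 6 can end up after the 21 on the axis.

```python
def to_int_series(raw):
    """Convert sensor readings that arrived as strings into ints."""
    return [int(v) for v in raw]


def plot_sensors(s_1, s_3, s_4):
    """Plot the three sensor traces after converting them to numbers."""
    # Imported here so the conversion helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    plt.plot(to_int_series(s_1), 'r',
             to_int_series(s_3), 'b',
             to_int_series(s_4), 'g')
    plt.show()
```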

How to plot a dataframe that contains values spread over a large spectrum of values?

I have the following dataframe, resulting from running a grid search over several regression models:
As can be seen, many values are grouped around 0.0009, but several are a few orders of magnitude larger in absolute value (-1.6, -2.3, etc.).
I would like to plot these results, but I can't seem to find a way to get a readable plot. I have tried a bar plot, but I get something like:
How can I make this bar plot more readable? Or what other kind of plot would be more suitable to visualize such data?
Edit: Here is the dataframe, exported as CSV:
,a,b,c,d
LinearRegression,0.000858399508896,-4.11609208874e+20,0.000952538859738,0.000952538859733
RandomForestRegressor,-1.62264355718,-2.30218457629,0.0008957696846039999,0.0008990722465239999
ElasticNet,0.000883257900658,0.0008525502791760002,0.000884706195921,0.000929498696126
Lasso,7.92193516085e-05,-1.84086765436e-05,7.92193516085e-05,-1.84086765436e-05
ExtraTreesRegressor,-6.320170496909999,-6.30420308033,,
Ridge,0.0008584791396339999,0.0008601028734780001,,
SGDRegressor,-4.62522968756,,,
You could give the graph a log scale, which is often used for plotting data with a very large range. This muddies the interpretation slightly, because equal distances along the axis now correspond to equal ratios (orders of magnitude) rather than equal differences. You can read about log scales here:
https://en.wikipedia.org/wiki/Logarithmic_scale
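
One caveat: several of the scores are negative, and a plain log axis cannot display negative values. A possible variant is matplotlib's "symlog" scale, which is linear near zero and logarithmic further out in both directions. The sketch below uses column a from the CSV above. The function names and the threshold of 1e-5 are assumptions for illustration, and the linthresh keyword assumes matplotlib 3.3 or newer.

```python
import numpy as np

# Column "a" from the CSV in the question.
scores = {
    "LinearRegression": 0.000858399508896,
    "RandomForestRegressor": -1.62264355718,
    "ElasticNet": 0.000883257900658,
    "Lasso": 7.92193516085e-05,
    "ExtraTreesRegressor": -6.320170496909999,
    "Ridge": 0.0008584791396339999,
    "SGDRegressor": -4.62522968756,
}


def signed_log10(values, linthresh=1e-5):
    """Sign-preserving log transform: near-linear below linthresh, log above."""
    v = np.asarray(values, dtype=float)
    return np.sign(v) * np.log10(1.0 + np.abs(v) / linthresh)


def plot_scores(scores, linthresh=1e-5):
    """Bar plot on a symmetric log axis so tiny and large values are both visible."""
    # Imported here so signed_log10 can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(scores), list(scores.values()))
    ax.set_yscale("symlog", linthresh=linthresh)
    ax.tick_params(axis="x", labelrotation=45)
    return fig
```

signed_log10 is an alternative for tools without a symlog axis: transform the values first and plot them on a linear axis.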

Bokeh line graph looping

I’ve been working on Bokeh plots and I’m trying to plot a line graph taking values from a database. But the plot kind of traces back to the initial point and I don’t want that. I want a plot which starts at one point and stops at a certain point (and does not circle back). I’ve tried plotting it with other tools like SQLite browser and Excel and the plot seems OK, which means I must be doing something wrong with the Bokeh stuff and that the data points themselves are not in error.
I’ve attached the images for reference and the line of code doing the line plot. Is there something I’ve missed?
>>> image = fig.line("x", "y", color=color, source=something)
(Assume x and y are integer values and I’ve specified x and y ranges as DataRange1d(bounds=(0,None)))
Bokeh does not "auto-close" lines. You can see this is the case by looking at any number of examples in the docs and repository, but here is one in particular:
http://docs.bokeh.org/en/latest/docs/gallery/stocks.html
Bokeh's .line method will only "close up" if that is what is in the data (i.e., if the last point in the data is a repeat of the first point). I suggest you actually inspect the data values in source.data and I believe you will find this to be the case. Then the question is why is that the case and how to prevent it from doing that, but that is not really a Bokeh question.
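
The inspection suggested above can be sketched as a small check on the column data. This assumes source.data behaves like a dict mapping column names to sequences, with columns named "x" and "y" as in the question. The helper names are hypothetical and not part of Bokeh.

```python
def closes_on_itself(data, x="x", y="y"):
    """Return True if the last point repeats the first one."""
    xs, ys = list(data[x]), list(data[y])
    return len(xs) > 1 and xs[0] == xs[-1] and ys[0] == ys[-1]


def without_closing_point(data, x="x", y="y"):
    """Return a copy of the columns with a repeated final point dropped."""
    xs, ys = list(data[x]), list(data[y])
    if closes_on_itself(data, x, y):
        xs, ys = xs[:-1], ys[:-1]
    return {x: xs, y: ys}
```

For example, closes_on_itself(source.data) can be called right before fig.line to confirm whether the data itself is responsible for the loop.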

Plotting a histogram in log-log scale with identical bar thickness

I'm trying to plot input data in a histogram in log-log scale (to quickly see whether it could fit a power law), but I'm having trouble getting the output I want. I'm using Python and more specifically the matplotlib/numpy libraries:
thebins = N.linspace(min_data.min(),min_data.max(),int(sys.argv[len(sys.argv)-1]))
thebins = N.log(thebins)
bar_min = plt.hist(min_data,bins=thebins,alpha=0.40,label=['Minimal Distance'],log=True)
min_data is my 1d data array, the two first lines are for creating the bins and then putting them in a log scale. The final line is for 'filling' the bins/histogram with log y scale.
The graphical output is:
It may seem fussy, but I'm not satisfied with having bins of different thickness; it seems to me that the data is harder to read, or can even be misread, because of that. Not all log-log histograms have bins of the same width, but I'm convinced it can be done in Python. Do you have an idea of how to change my code to get there?
Thank you in advance ;)
Should have been a no-brainer: I only had to take the log of my data for the x axis, and then build the histogram passing the argument log=True for the y axis.
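
A minimal sketch of that approach, assuming min_data is a 1-D array of strictly positive values. The function names, the default of 30 bins and the choice of base-10 logs are illustrative assumptions. Because the bins are equally spaced in log space, every bar has the same width on the plot.

```python
import numpy as np


def log_binned_counts(data, n_bins=30):
    """Histogram of log10(data) with equal-width bins in log space."""
    data = np.asarray(data, dtype=float)
    log_data = np.log10(data[data > 0])  # the log is undefined for values <= 0
    counts, edges = np.histogram(log_data, bins=n_bins)
    return counts, edges


def plot_log_log_hist(min_data, n_bins=30):
    """Plot log10 of the data on x, with a log-scaled count axis on y."""
    # Imported here so log_binned_counts can be used without matplotlib.
    import matplotlib.pyplot as plt

    data = np.asarray(min_data, dtype=float)
    log_data = np.log10(data[data > 0])
    plt.hist(log_data, bins=n_bins, alpha=0.40,
             label='Minimal Distance', log=True)
    plt.xlabel('log10(minimal distance)')
    plt.legend()
    plt.show()
```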
