Plotly Scatter3d goes empty with a huge number of data points - python

I am trying to plot a huge number of data points. With the following code, it works properly:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

N = 615677
df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N),
                       z=np.random.randn(N)))
marker_data = go.Scatter3d(
    x=np.random.randn(N),
    y=np.random.randn(N),
    z=np.random.randn(N),
    marker=go.scatter3d.Marker(size=1),
    mode='markers',
)
fig = go.Figure(data=marker_data)
fig.show()
figure 1, N=615677, normal plot
However, if I set
N = 615678
I get an empty graph: it plots only the axes, without any data points.
figure 2, N=615678, wrong plot
Does anyone know what causes this? I can work around it by downsampling, but that may not be the best way.

Related

xaxes labels not displaying correctly

I'm trying to create a histogram based on the following dataset.
I want independent x axes with labels, so I tried the following code:
fig = go.Figure()
fig = px.histogram( x=df["mun"], y=df["cust"], color=df["prod"], facet_col=df["pr"] )
fig.update_xaxes(matches=None, showticklabels=True)
fig.show()
As you can see the second plot does not show the labels for x. I don't understand why this is happening. How can I fix it?
I don't know why this is happening - it may be some bug in the categorical axis labels when plotly generates facet plots.
You can manually specify categoryarray=['D','E'] when you update the xaxes, which is admittedly a brittle workaround:
fig.update_xaxes(matches=None, showticklabels=True, categoryarray=['D','E'])

How can I return a matplotlib figure from a function?

I need to plot changing molecule numbers against time. But I'm also trying to investigate the effects of parallel processing, so I'm trying to avoid writing to global variables. At the moment I have the following two numpy arrays: tao_all, which contains all the time points to be plotted on the x-axis, and popul_num_all, which contains the changing molecule numbers to be plotted on the y-axis.
The current code I've got for plotting is as follows:
for i, label in enumerate(['Enzyme', 'Substrate', 'Enzyme-Substrate complex', 'Product']):
    figure1 = plt.plot(tao_all, popul_num_all[:, i], label=label)
plt.legend()
plt.tight_layout()
plt.show()
I need to encapsulate this in a function that takes the above arrays as the input and returns the graph. I've read a couple of other posts on here that say I should write my results to an axis and return the axis? But I can't quite get my head around applying that to my problem?
Cheers
def plot_func(x, y):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    return fig
Usage:
fig = plot_func([1,2], [3,4])
Alternatively you may want to return ax. For details about Figure and Axes see the docs. You can get the axes array from the figure by fig.axes and the figure from the axes by ax.get_figure().
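Applied to the arrays in the question, that pattern might look like the minimal sketch below. The function name plot_populations and the stand-in data are assumptions for illustration; only tao_all, popul_num_all and the four labels come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

LABELS = ['Enzyme', 'Substrate', 'Enzyme-Substrate complex', 'Product']

def plot_populations(tao_all, popul_num_all, labels=LABELS):
    """Plot one line per column of popul_num_all and return (fig, ax)."""
    fig, ax = plt.subplots()
    for i, label in enumerate(labels):
        ax.plot(tao_all, popul_num_all[:, i], label=label)
    ax.legend()
    fig.tight_layout()
    return fig, ax

# Stand-in data (hypothetical): 50 time points, 4 species.
tao_all = np.linspace(0.0, 10.0, 50)
popul_num_all = np.random.default_rng(0).integers(0, 100, size=(50, 4))
fig, ax = plot_populations(tao_all, popul_num_all)
```

Nothing here touches module-level plotting state beyond creating the figure, so the caller decides what to do with it, for example fig.savefig(...). For interactive use, drop the matplotlib.use("Agg") line.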
In addition to the answer above, I suggest matplotlib's animation.FuncAnimation if you are working with time series and want to improve your visualization.
You can find the details here https://matplotlib.org/api/_as_gen/matplotlib.animation.FuncAnimation.html

How to reproduce this legend with multiple curves?

I've been working hard on a package of functions for my work, and I'm stuck on a layout problem. Sometimes I need to work with many column subplots (1 row x N columns), and the standard matplotlib legend is not always helpful and makes it hard to visualize all the data.
I've been trying to create something like the picture below. I already tried to create one subplot for the curves and another one for the legends (and display the x-axis scale as a horizontal plot). I also tried offsetting the x-axis spines, but when I have a lot of curves plotted inside the same subplot the legend becomes huge.
The following image is from a piece of software. I'd like to create a similar look. Notice that these legends are "static": they remain fixed regardless of zooming. Another observation: I don't need all the ticks or anything like that.
What I already have is the following (the code is a mess because I'm trying many different solutions, and it is not organized or pythonic yet):
import numpy as np
import matplotlib.ticker
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1,2, sharey = True)
ax[0].semilogx(np.zeros_like(dados.Depth)+0.02, dados.Depth)
ax[0].semilogx(dados.AHT90, dados.Depth, label = 'aht90')
ax[0].set_xlim(0.2,2000)
ax[0].grid(True, which = 'both', axis = 'both')
axres1 = ax[0].twiny()
axres1.semilogx(dados.AHT90, dados.Depth, label = 'aht90')
axres1.set_xlim(0.2 , 2000)
axres1.set_xticks(np.logspace(np.log10(0.2),np.log10(2000),2))
axres1.spines["top"].set_position(("axes", 1.02))
axres1.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
axres1.tick_params(axis='both', which='both', labelsize=6)
axres1.set_xlabel('sss')#, labelsize = 5)
axres2 = ax[0].twiny()
axres2.semilogx(dados.AHT10, dados.Depth, label = 'aht10')
axres2.set_xlim(0.2 , 2000)
axres2.set_xticks(np.logspace(np.log10(0.2),np.log10(2000),2))
axres2.spines["top"].set_position(("axes", 1.1))
axres2.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
axres2.tick_params(axis='both', which='both', labelsize=6)
axres2.set_xlabel('aht10')#, labelsize = 5)
fig.show()
and the result is:
However, I'm having trouble making this automatic. As I add more curves, it is not practical to keep setting the spine position by hand with
set_position(("axes", 1.02))
Another problem is that the more curves I add, the more this kind of "legend" grows upward, and I have to adjust the subplot size with
fig.subplots_adjust(top=0.75)
I'd also like to make that adjustment automatic, without updating the parameter every time I add more curves.
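One possible way to automate both steps is to compute each spine offset from the curve's index and then derive the top margin from the total stack height. This is only a sketch under assumptions: the helper name add_stacked_top_axes, the parameters first and step, the 0.98 ceiling, and the stand-in data are all hypothetical, and the spacing would likely need tuning for real label sizes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def add_stacked_top_axes(fig, ax, depth, curves, first=1.02, step=0.08):
    """Add one twiny axis per curve, stacking the top spines upward.

    curves maps a label to an array of positive values (log x scale).
    first and step are spine offsets in units of the axes height.
    """
    twins = []
    for i, (name, values) in enumerate(curves.items()):
        twin = ax.twiny()
        twin.semilogx(values, depth, label=name)
        twin.spines["top"].set_position(("axes", first + i * step))
        twin.set_xlabel(name, fontsize=6)
        twin.tick_params(axis="both", which="both", labelsize=6)
        twins.append(twin)

    # Total height of axes plus stacked spines, in units of the axes height.
    stack_height = first + len(curves) * step
    # Choose the axes height so the whole stack ends at 0.98 of the figure.
    bottom = fig.subplotpars.bottom
    axes_height = (0.98 - bottom) / stack_height
    fig.subplots_adjust(top=bottom + axes_height)
    return twins

# Stand-in data (hypothetical) in place of dados.Depth, dados.AHT90, etc.
depth = np.linspace(0.0, 100.0, 50)
curves = {
    "aht10": np.linspace(1.0, 100.0, 50),
    "aht30": np.linspace(2.0, 500.0, 50),
    "aht90": np.linspace(5.0, 1500.0, 50),
}
fig, ax = plt.subplots()
twins = add_stacked_top_axes(fig, ax, depth, curves)
```

Adding a fourth entry to curves then shifts the new spine and shrinks the plot area without any hand-edited numbers.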

Data not plotting, but no errors

I am trying to plot some precipitation data. The code I'm using is modified slightly from this code here.
The code works fine when I plot using the data from the site used in the link, but when I use a different dataset I have, it doesn't plot. The biggest difference between this dataset and the dataset used in the link's example, is my dataset is global data. The dataset I am using is also netcdf, is not masked, and I am loading it the same way as the example.
I am familiar with the data and know for a fact I should be seeing something and the contour values used in the example are reasonable for this other set of data I am using.
My code is the same, except for some changes in the section that plots the figure (below), which I have modified so it will plot a specific area instead of CONUS like in the example (using ax.set_extent).
When I do not set the extent it appears to plot the data, but then none of the boundaries (coastlines, state lines, etc.) plot. Based on this, I'm guessing it's something with either the dataset itself, something with set_extent, or a combination of things that is causing it to go wrong. I am not getting back any kind of errors when I plot it, either way. However, there might be something else I'm missing with it.
In the end, I'm actually comparing my dataset to the dataset used in the example link, so I would like them in the same projection.
Thanks for any insight and let me know if you need more information about the data itself!
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.set_extent((x1,x0,y0,y1))
# draw coastlines, state and country boundaries, edge of map.
ax.coastlines()
ax.add_feature(cfeature.BORDERS)
ax.add_feature(cfeature.STATES)
cs1 = ax.contourf(ym, xm, data1, clevs, cmap=cmap, norm=norm)
# add colorbar.
cbar = plt.colorbar(cs1, orientation='horizontal')
#cbar.set_label(data1.units)
#ax.set_title(prcpvar.long_name + ' for period ending ' + nc.creation_time)
plt.savefig('ncep_model')  # save before show(); saving afterwards can write an empty figure
plt.show()
Results when extent is not included in code above:
Edit 1:
I'll add that I was able to successfully plot the data with this code below (from a default template I made). I tried to change the projection to stereographic, but I was having trouble getting it to plot correctly using basemap because I've never used it before. As an alternative, if you can't figure out the error with the code above and could instead help with changing the projection for the code below, I would also take that. At this point I just want my data to plot correctly in the correct projection I want!
(I also included the results for the code below to confirm that the data should be showing up in this location)
LLlat = 40.
LLlon = 263.
URlat = 44.
URlon = 270.
lat = xm
lon = ym
%matplotlib inline
plt.figure(1,figsize=(10, 8),)
plt.title('Convective Precipitation 8/28/2018 0Z (in) Valid July 2018')
map = Basemap(projection='cyl',\
              llcrnrlat=LLlat,urcrnrlat=URlat,\
              llcrnrlon=LLlon,urcrnrlon=URlon,\
              rsphere=6371200.,resolution='i')
map.drawcoastlines(linewidth=0.5) # Draw some coastlines
map.drawstates(linewidth=0.5) # Draw state boundaries
map.drawrivers(color='#000000')
map.drawparallels(np.arange(-90.,91.,30),labels=[1,0,0,0]) # Drawing lines of latitude
map.drawmeridians(np.arange(0.,330.,60),labels=[0,0,0,1]) # Drawing lines of longitude
lons,lats = map(lon,lat) # Setting up the grid in cylindrical coords.
cs = plt.contourf(lons,lats,data1[:,:], clevs,cmap=cmap, norm=norm)
cb = plt.colorbar(cs,orientation='horizontal')
plt.show()
Edit 2:
I've added the resulting plot when I don't include the set_extent in the first chunk of code (Don't know if that will help at all, but thought I'd include it as well)
So it'd be really useful to have more information on your data, like a link to sample file, but my guess is that your data do not give coordinates in a stereographic projection, unlike the original data. When plotting with Cartopy, if you do not specify otherwise, all plot commands assume that the x,y values given are in the projection specified for the axes (for the original code this was ccrs.Stereographic). If this is not the case, such as when plotting lon/lats, you need to specify this by passing transform to the plotting command, as below where I specify that the x,y values are lat/lons:
data_proj = ccrs.PlateCarree()
cs1 = ax.contourf(ym, xm, data1, clevs, cmap=cmap, norm=norm,
                  transform=data_proj)

How to reduce the number of data points in a scatter chart?

Currently I have a problem plotting a huge amount of X,Y data in a scatter chart using plotly's engine and Python: the browser can't render this many points without crashing after some time. (I've also tried the Scattergl option https://plot.ly/python/webgl-vs-svg/)
Are there any algorithms to reduce this huge number of points without losing the original shape of the scatter chart? Maybe something like the iterative end-point fit algorithm?
EDIT:
some code
# 'import plotly.plotly as py' dropped: unused here, and plotly 4+ moved it to chart_studio
import plotly.graph_objs as go
from plotly.offline import plot
import numpy as np

N = 1000000
trace = go.Scattergl(
    x = np.random.randn(N),
    y = np.random.randn(N),
    mode = 'markers',
    marker = dict(
        line = dict(
            width = 1,
            color = '#404040')
    )
)
data = [trace]
layout = go.Layout(title='A Simple Plot', width=1000, height=350)
fig = go.Figure(data=data, layout=layout)
plot(fig)
If you are just trying to visualize the regions where the data points exist, it might be more effective to convert the x-y data into a grid of densities. This may be better than a scatter plot because when you have a very large number of points, the points can obscure each other so you really have no idea how many points there are in certain areas.
I'm not familiar with plotly (I use matplotlib.pyplot) but I see there is at least one way to do this.
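As a rough illustration of the density-grid idea, the sketch below bins the points with NumPy. The helper name density_grid and the bin count of 200 are assumptions chosen for the example.

```python
import numpy as np

def density_grid(x, y, bins=200):
    """Bin scattered points into a 2D grid of counts.

    Returns bin centres along x and y, and counts shaped (y_bins, x_bins).
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
    y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
    # histogram2d indexes counts as [x_bin, y_bin]; transpose so rows follow y.
    return x_centers, y_centers, counts.T

rng = np.random.default_rng(0)
N = 1_000_000
x = rng.standard_normal(N)
y = rng.standard_normal(N)
xc, yc, z = density_grid(x, y)
```

The million points reduce to a 200 x 200 grid, which could then be passed to a heatmap trace (for example go.Heatmap(x=xc, y=yc, z=z) in plotly) instead of a scatter trace.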
One way would be to randomly sample from the scatter points. As long as you're sampling enough points, it can be extremely likely you have a similar shape.
For example, to randomly sample 10,000 of the 1 million points you would use
i_plot = np.random.choice(N, size=10000, replace=False)
trace = go.Scattergl(
    x = np.random.randn(N)[i_plot],
    y = np.random.randn(N)[i_plot],
    mode = 'markers',
    marker = dict(
        line = dict(
            width = 1,
            color = '#404040')
    )
)
This snippet might look silly, but in reality you'll have actual arrays instead of np.random.randn(N), so it will make sense to randomly sample from those arrays.
You'll want to test different numbers of points, and probably increase it to the maximum number of points the engine can handle without lagging or crashing.
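With real arrays, the key detail is to apply the same random indices to both coordinates so each sampled point keeps its original (x, y) pair. A minimal sketch, where the helper name sample_points and its seed parameter are assumptions for illustration:

```python
import numpy as np

def sample_points(x, y, n_keep, seed=None):
    """Return a random subset of n_keep (x, y) pairs, sampled without replacement."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_keep >= len(x):
        return x, y  # nothing to drop
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=n_keep, replace=False)
    # The same index array is used for both coordinates.
    return x[idx], y[idx]

# Stand-in data (hypothetical) where y depends on x, so pairing is checkable.
x_all = np.arange(1_000_000)
y_all = 2 * x_all
x_small, y_small = sample_points(x_all, y_all, 10_000, seed=0)
```

The sampled arrays can then be passed as x and y to go.Scattergl in place of the full data.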
You should try the Datashader package (http://datashader.readthedocs.io/en/latest/), which focuses on exactly that: transforming a huge number of data points into something more amenable to visualization. Its authors also explain why their approach might be better than a simple heatmap: https://anaconda.org/jbednar/plotting_pitfalls/notebook
