xaxes labels not displaying correctly - python

I'm trying to create a histogram based on the following dataset.
I want independent x axes with labels, so I tried the following code:
import plotly.express as px
# df is the dataset described above; a separate go.Figure() call is not needed
fig = px.histogram(x=df["mun"], y=df["cust"], color=df["prod"], facet_col=df["pr"])
fig.update_xaxes(matches=None, showticklabels=True)
fig.show()
As you can see the second plot does not show the labels for x. I don't understand why this is happening. How can I fix it?

I don't know why this is happening; it may be a bug in how Plotly handles categorical axis labels when it generates facet plots.
You can manually specify categoryarray=['D','E'] when you update the x-axes, which is admittedly a brittle workaround:
fig.update_xaxes(matches=None, showticklabels=True, categoryarray=['D','E'])

Related

Plotly scatter3d go empty dealing with a huge datapoints

I am trying to plot a huge number of data points. If I use the following code, it works properly:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
N = 615677
df = pd.DataFrame(dict(x=np.random.randn(N),
                       y=np.random.randn(N),
                       z=np.random.randn(N)))
marker_data = go.Scatter3d(
    x=np.random.randn(N),
    y=np.random.randn(N),
    z=np.random.randn(N),
    marker=go.scatter3d.Marker(size=1),
    mode='markers',
)
fig = go.Figure(data=marker_data)
fig.show()
figure 1, N=615677, normal plot
However, if I set
N = 615678
I get an empty graph: only the axes are drawn, without any data points.
figure 2, N=615678, wrong plot
Does anyone know what causes this? I can work around it by downsampling, but that may not be the best way.
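For reference, the downsampling workaround mentioned above could look roughly like this. It is a sketch only: max_points is an arbitrary budget chosen for illustration, not a documented Plotly limit, and the random data stands in for the real points.

```python
import numpy as np

N = 615678            # point count from the question
max_points = 200_000  # hypothetical budget, chosen only for illustration

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, N))

# Pick a random subset of indices without replacement, then apply the same
# indices to every coordinate so the points stay aligned.
keep = rng.choice(N, size=min(N, max_points), replace=False)
x_small, y_small, z_small = x[keep], y[keep], z[keep]
```

The reduced arrays can then be passed to go.Scatter3d in place of the full ones.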

Gibberish / malformed negative y-axis values in plotly charts in python

I'm trying to draw a bar plot in Plotly that represents net gains, so it will have both positive and negative bar values. However, the negative values on the y-axis are somehow displayed as gibberish. I tried several things, including the update_layout function, but nothing seems to work. I'm using the make_subplots function because I want to plot multiple visualizations in one figure.
I'm running this code on Databricks.
My code and the resulting plot:
import pandas as pd
import plotly.subplots
net_gains = pd.DataFrame()
net_gains["general_net_gain"] = [-2, 2, -1, 2]
fig = plotly.subplots.make_subplots(rows=1, cols=1)
fig.add_bar(x=net_gains.index, y=net_gains["general_net_gain"], row=1, col=1)
fig.update_layout(height=400, width=500, showlegend=True)

Size legend for plotly express scatterplot in Python

Here is a Plotly Express scatterplot with marker color, size and symbol representing different fields in the data frame. There is a legend for symbol and a colorbar for color, but there is nothing to indicate what marker size represents.
Is it possible to display a "size" legend? In the legend I'm hoping to show some example marker sizes and their respective values.
A similar question was asked for R and I'm hoping for a similar result in Python. I've tried adding markers using fig.add_trace(), and this would work, except I don't know how to make the sizes equal.
import pandas as pd
import plotly.express as px
import random
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':list(range(1,11,1)),
'Size':random.sample(range(10,150), 10)
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
Scatterplot Image:
Thank you
You cannot achieve this if your data is on a continuous (metric) scale, like the values in your range. Plotly will always interpret such data as continuous, even if it looks discrete in the output. Since you are showing groups, your data has to be categorical, like a factor in R. One possible solution is to use a list comprehension to convert everything to str. I did it in two steps so you can follow:
import pandas as pd
import plotly.express as px
import random
check = sorted(random.sample(range(10,150), 10))
check = [str(num) for num in check]
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':check,
'Size':list(range(1,11,1))
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
That gives:
Keep in mind that you also get the symbol label, as you now have TWO groups!
You may want to sort the values in the list before converting them to strings, as in this picture (I added the sorting to the code above).
UPDATE
Hey There,
Yes, but as far as I know only in matplotlib, and it is a little hacky, as you simulate scatter plots. I can only show you a modified matplotlib example, but maybe it helps you work it out yourself:
import matplotlib.pyplot as plt
from numpy.random import randn
z = randn(10)
red_dot, = plt.plot(z, "ro", markersize=5)
red_dot_other, = plt.plot(z*2, "ro", markersize=20)
plt.legend([red_dot, red_dot_other], ["Yes", "No"], markerscale=0.5)
plt.show()
That gives:
As you can see, you are working with two different plots, one for each marker size that should appear in the legend. In the legend these plots are merged together. The size of the legend markers is controlled through markerscale, which scales the markersize of each plot. Because we have two plots with TWO different marker sizes, the legend shows two different marker sizes. markerscale is normally a value between 0 and 1, but you can also go above that, e.g. 1.5 for 150%.
You can achieve this by customizing the legend handlers in matplotlib; see here:
https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html
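As an alternative to building the legend from separate plots, matplotlib scatter plots also offer legend_elements(prop="sizes"), which generates handles for a few representative marker sizes. The sketch below uses made-up data shaped like the question's data frame; it is a matplotlib example, not a Plotly one, and the number of legend entries (num=4) is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 11)
y = np.arange(1, 11)
sizes = rng.integers(10, 150, size=10)  # made-up size values

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, s=sizes, c=x)

# legend_elements(prop="sizes") returns proxy handles and labels for a few
# representative marker sizes taken from the scatter plot.
handles, labels = scatter.legend_elements(prop="sizes", num=4, alpha=0.6)
ax.legend(handles, labels, title="Size", loc="upper left")
```

Use fig.savefig(...) or switch to an interactive backend and call plt.show() to see the result.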

How can I return a matplotlib figure from a function?

I need to plot changing molecule numbers against time. But I'm also trying to investigate the effects of parallel processing, so I'm trying to avoid writing to global variables. At the moment I have the following two numpy arrays: tao_all, which contains all the time points to be plotted on the x-axis, and popul_num_all, which contains the changing molecule numbers to be plotted on the y-axis.
The current code I've got for plotting is as follows:
for i, label in enumerate(['Enzyme', 'Substrate', 'Enzyme-Substrate complex', 'Product']):
    figure1 = plt.plot(tao_all, popul_num_all[:, i], label=label)
plt.legend()
plt.tight_layout()
plt.show()
I need to encapsulate this in a function that takes the above arrays as input and returns the graph. I've read a couple of other posts on here that say I should write my results to an axis and return the axis, but I can't quite get my head around applying that to my problem.
Cheers
import matplotlib.pyplot as plt
def plot_func(x, y):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    return fig
Usage:
fig = plot_func([1,2], [3,4])
Alternatively you may want to return ax. For details about Figure and Axes see the docs. You can get the axes array from the figure by fig.axes and the figure from the axes by ax.get_figure().
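Applied to the arrays from the question, the same idea might look like the sketch below. The function name plot_populations and the sample data are made up for illustration; only tao_all, popul_num_all and the four labels come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

SPECIES = ["Enzyme", "Substrate", "Enzyme-Substrate complex", "Product"]

def plot_populations(tao_all, popul_num_all, labels=SPECIES):
    """Plot one line per column of popul_num_all and return the Figure."""
    fig, ax = plt.subplots()
    for i, label in enumerate(labels):
        ax.plot(tao_all, popul_num_all[:, i], label=label)
    ax.legend()
    fig.tight_layout()
    return fig

# Made-up data standing in for the simulation output.
tao_all = np.linspace(0.0, 10.0, 50)
popul_num_all = np.column_stack(
    [np.exp(-tao_all * k) for k in (0.1, 0.2, 0.3, 0.4)]
)
fig = plot_populations(tao_all, popul_num_all)
```

Because the function only touches the Figure and Axes it creates, it does not write to any global variables; the caller decides whether to show or save the returned figure.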
In addition to the above answer, I suggest using matplotlib's animation.FuncAnimation if you are working with time series and want to improve your visualization.
You can find the details here https://matplotlib.org/api/_as_gen/matplotlib.animation.FuncAnimation.html

How to reproduce this legend with multiple curves?

I've been working hard on a package of functions for my work, and I'm stuck on a layout problem. Sometimes I need to work with many column subplots (1 row x N columns), and the standard matplotlib legend is sometimes not helpful and makes it hard to visualize all the data.
I've been trying to create something like the picture below. I already tried to create one subplot for the curves and another one for the legends (and display the x-axis scale as a horizontal plot). I also tried offsetting the x-axis spine, but when I have a lot of curves plotted inside the same subplot the legend becomes huge.
The following image is from a software package. I'd like to create a similar look. Notice that these legends are "static": they remain fixed regardless of zooming. Another observation: I don't need all the ticks or anything like that.
What I already have is the following (the code is a mess because I'm trying many different solutions, and it is not organized or pythonic yet):
import matplotlib.pyplot as plt
import matplotlib.ticker
import numpy as np
fig, ax = plt.subplots(1,2, sharey = True)
ax[0].semilogx(np.zeros_like(dados.Depth)+0.02, dados.Depth)
ax[0].semilogx(dados.AHT90, dados.Depth, label = 'aht90')
ax[0].set_xlim(0.2,2000)
ax[0].grid(True, which = 'both', axis = 'both')
axres1 = ax[0].twiny()
axres1.semilogx(dados.AHT90, dados.Depth, label = 'aht90')
axres1.set_xlim(0.2 , 2000)
axres1.set_xticks(np.logspace(np.log10(0.2),np.log10(2000),2))
axres1.spines["top"].set_position(("axes", 1.02))
axres1.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
axres1.tick_params(axis='both', which='both', labelsize=6)
axres1.set_xlabel('sss')#, labelsize = 5)
axres2 = ax[0].twiny()
axres2.semilogx(dados.AHT10, dados.Depth, label = 'aht10')
axres2.set_xlim(0.2 , 2000)
axres2.set_xticks(np.logspace(np.log10(0.2),np.log10(2000),2))
axres2.spines["top"].set_position(("axes", 1.1))
axres2.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
axres2.tick_params(axis='both', which='both', labelsize=6)
axres2.set_xlabel('aht10')#, labelsize = 5)
fig.show()
and the result is:
However, I'm having trouble making this automatic. If I add more curves, it is not practical to keep setting the spine position by hand with
set_position(("axes", 1.02))
Another problem is that the more curves I add, the more this kind of "legend" grows upward, and I have to adjust the subplot size with
fig.subplots_adjust(top=0.75)
I also want to make this adjustment automatic, without having to update that parameter whenever I add more curves.
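One way to automate both steps is to offset each top spine by a fixed number of points instead of an axes fraction, and then compute the top margin from the number of curves. The sketch below is only one possible approach: the helper name add_stacked_top_axes, the step_pts and pad_pts values, and the sample data are all made up, and the spacing would need tuning for real plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

def add_stacked_top_axes(fig, host, depth, curves, step_pts=36, pad_pts=20):
    """Draw each curve on its own twiny axis and stack the top spines."""
    twins = []
    for i, (name, values) in enumerate(curves.items()):
        twin = host.twiny()
        twin.semilogx(values, depth)
        twin.set_xlabel(name, fontsize=6)
        twin.tick_params(axis="x", which="both", labelsize=6)
        # Offset in points, so it does not depend on the axes height.
        twin.spines["top"].set_position(("outward", i * step_pts))
        twins.append(twin)
    # Reserve room above the axes: stack height as a fraction of the figure.
    fig_height_pts = fig.get_figheight() * 72
    needed_pts = len(curves) * step_pts + pad_pts
    fig.subplots_adjust(top=1 - needed_pts / fig_height_pts)
    return twins

# Made-up data standing in for the Depth, AHT90 and AHT10 columns.
rng = np.random.default_rng(0)
depth = np.linspace(1000, 2000, 200)
curves = {
    "aht90": 10 ** rng.uniform(0, 3, depth.size),
    "aht10": 10 ** rng.uniform(0, 3, depth.size),
    "aht30": 10 ** rng.uniform(0, 3, depth.size),
}

fig, host = plt.subplots(figsize=(4, 8))
twins = add_stacked_top_axes(fig, host, depth, curves)
```

With many curves on a short figure the computed top margin can become too small for subplots_adjust to accept, so the figure height may need to grow with the number of curves as well.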
