I've added a trendline to my plotly graph - please let me know if this is the best way to do it or not...
I want to find values on the trendline which aren't in the data. See that on the trace/trendline in the graph provided below, I only get hover values for where the data has an x value. So I have a datapoint at x = 4.567123, and I can only hover at x = 4.567123 on the trendline. I want data values for x and y when x = 1, 2, 3, 4, 5, 6.
Anyone know how? Thank you!
My dataframe is called mat. Please see the used columns below, the code I use, and the graph.
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
# Plotting OIS spread vs Years to Maturity
x = mat['Years to Maturity']
y = mat['OIS Spread']
# Creating Polynomial Function
z = np.polyfit(x, y, 3)
f = np.poly1d(z)
fig = px.scatter(mat, x, y, color = x)
fig.add_trace(go.Scatter(
    x = x,
    y = f(x),
    mode = 'lines',
    hovertemplate = '%{x}, %{y}',
    showlegend = False,
    name = 'Cubic'
))
fig.update_layout(
    title = 'OIS Spread: ' + today_str)
fig.show()
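One way to get trendline values at x = 1, 2, 3, 4, 5, 6 is to evaluate the fitted polynomial on those x values rather than on the data's own x values. The sketch below is a minimal illustration, not a tested solution for the original dataframe: the arrays `x` and `y` are made-up stand-ins for mat['Years to Maturity'] and mat['OIS Spread'], and `x_new` / `y_new` are names introduced here.

```python
import numpy as np

# Hypothetical stand-in data; in the question these would be
# mat['Years to Maturity'] and mat['OIS Spread'].
x = np.array([0.5, 1.3, 2.2, 3.1, 4.567123, 5.8, 6.4])
y = np.array([10.0, 12.5, 15.1, 16.0, 18.2, 19.9, 20.3])

# Same cubic fit as in the question
z = np.polyfit(x, y, 3)
f = np.poly1d(z)

# Evaluate the fitted polynomial at the x values of interest,
# which do not have to be present in the data
x_new = np.arange(1, 7)  # 1, 2, 3, 4, 5, 6
y_new = f(x_new)

for xi, yi in zip(x_new, y_new):
    print(f"x = {xi}, fitted y = {yi:.4f}")
```

Passing `x_new` and `y_new` as the x and y of the go.Scatter trendline trace, instead of x and f(x), should then make the hover points land on those whole-number x values.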
Hi, I'm new to Python and would like to label the footballers on my scatterplot with their names if their Goals or npxG are greater than the average values I have calculated.
I wondered whether I could use a for/while loop to go through the data and plot the relevant players' names?
I've struggled to figure out the most efficient way for this to be done.
Please see the scatter plot and code below for additional context. Any help would be greatly appreciated, Thanks.
df = pd.read_csv('C:/Users/alexo/Documents/Data/football data/shooting_top5_leagues_21_22.csv',encoding = 'ISO-8859-1')
striker_df = df.loc[(df['Pos']=='FW') & (df['90s']>= 15)]
sns.set_style('darkgrid')
sns.set(rc = {'figure.figsize':(15,8)})
graph = sns.scatterplot(x=striker_df.Gls,y=striker_df.npxG_p90,hue=striker_df.League,size=striker_df.npxG_pSh,edgecolor = 'black')
# averageline x axis
graph.axvline(9.751677852348994,c='grey',ls='--')
# average line yaxis
graph.axhline(0.34438111920973147,c='grey',ls='--')
#adding label names for specific players
#title
plt.title('Best Strikers across Europes Top 5 leagues 21/22',size=17,c='black')
# add credits
Notes = 'By Alex Orlandini'
CREDIT_1 = "data: statsbomb via fbref"
graph.text(
    36, 0.1, f"{Notes}\n{CREDIT_1}", size=10,
    color="#000000",
    ha="right");
Yes, you can loop through specific players and add the arrow and text.
Just a matter of getting the x, y coordinate of the data point, then deciding where to place the label. I had to pull my own data since you didn't share yours.
I would also avoid hard coding that average. I'd have that as a calculated variable.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#df = pd.read_csv('C:/Users/alexo/Documents/Data/football data/shooting_top5_leagues_21_22.csv',encoding = 'ISO-8859-1')
df = pd.read_html('https://fbref.com/en/comps/Big5/shooting/players/Big-5-European-Leagues-Stats', header=1)[0]
df = df[df['Rk'].ne('Rk')]
df['npxG'] = df['npxG'].astype(float)
df['90s'] = df['90s'].astype(float)
df['npxG/Sh'] = df['npxG/Sh'].astype(float)
df['Gls'] = df['Gls'].astype(int)
df['npxG_p90'] = df['npxG'] / df['90s']
df['League'] = df['Comp'].str.split(' ', n=1, expand=True)[1]
df = df.rename(columns={'npxG/Sh':'npxG_pSh'})
striker_df = df.loc[(df['Pos']=='FW') & (df['90s']>= 15)].copy()
sns.set_style('darkgrid')
sns.set(rc = {'figure.figsize':(15,8)})
x_axis_column = 'Gls'
y_axis_column = 'npxG_p90'
graph = sns.scatterplot(x = striker_df[x_axis_column],
                        y = striker_df[y_axis_column],
                        hue = striker_df.League,
                        size = striker_df.npxG_pSh,
                        edgecolor = 'black')
# averageline x axis
avgX = striker_df[x_axis_column].mean()
graph.axvline(avgX, c='grey', ls='--')
# average line yaxis
avgY = striker_df[y_axis_column].mean()
graph.axhline(avgY, c='grey', ls='--')
xOffset = (striker_df[x_axis_column].max() - striker_df[x_axis_column].min()) *.10
yOffset = (striker_df[y_axis_column].max() - striker_df[y_axis_column].min()) *.10
#adding label names for specific players
for player in ['Robert Lewandowski', 'Kylian Mbappé', 'Patrik Schick', 'Arnaut Groeneveld']:
    # Label coordinate, Custom arrow
    x = striker_df[striker_df['Player'] == player].iloc[0][x_axis_column]
    y = striker_df[striker_df['Player'] == player].iloc[0][y_axis_column]
    plt.annotate(player, xy=(x, y), xytext=(x + xOffset, y + yOffset),
                 horizontalalignment="center",
                 arrowprops=dict(arrowstyle='->', lw=2, color='black')
                 )
#title
plt.title('Best Strikers across Europes Top 5 leagues 21/22',size=17,c='black')
# add credits
Notes = 'By Alex Orlandini'
CREDIT_1 = "data: statsbomb via fbref"
graph.text(
    36, 0.1, f"{Notes}\n{CREDIT_1}", size=10,
    color="#000000",
    ha="right");
Or you can iterate through a dataframe:
#adding label names for specific players
striker_df['calc'] = striker_df[x_axis_column] + striker_df[y_axis_column]
striker_df = striker_df.sort_values('calc', ascending = False)
top_players = striker_df.head(8)
for idx, row in top_players.iterrows():
    # Label coordinate, Custom arrow
    player = row['Player']
    x = row[x_axis_column]
    y = row[y_axis_column]
    plt.annotate(player, xy=(x, y), xytext=(x + xOffset, y),
                 horizontalalignment="center",
                 arrowprops=dict(arrowstyle='->', lw=2, color='black')
                 )
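The question asked specifically for players whose Goals or npxG are above the average. A boolean mask can select those rows before looping over them. This is only a sketch under assumptions: the small `striker_df` below is invented stand-in data, and `above_avg` and `labels` are names introduced for illustration.

```python
import pandas as pd

# Hypothetical stand-in for striker_df; the real one comes from the fbref data.
striker_df = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Gls': [35, 8, 12, 5],
    'npxG_p90': [0.90, 0.50, 0.30, 0.20],
})

x_axis_column = 'Gls'
y_axis_column = 'npxG_p90'

# Averages calculated from the data rather than hard coded
avgX = striker_df[x_axis_column].mean()
avgY = striker_df[y_axis_column].mean()

# Keep players above average on either axis ("|" means "or")
above_avg = striker_df[(striker_df[x_axis_column] > avgX) |
                       (striker_df[y_axis_column] > avgY)]

# Collect (name, x, y) for each selected player
labels = [(row.Player, row.Gls, row.npxG_p90)
          for row in above_avg.itertuples(index=False)]
print(labels)
```

Each tuple in `labels` can then be passed to plt.annotate in the same way as in the loops above.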
Okay, so my initial idea is to make a line plot in plotly and color the line with one color before a certain threshold t, and another color after it. It works for 23 points or fewer, but not for more, using this method:
import numpy as np
import plotly.graph_objects as go
X = [j for j in range(0, 100)]
Y = [j for j in range(100000, 200000, 1000)]
X = X[:23]
Y = Y[:23]
X = np.array(X)
Y = np.array(Y)
t = 4
x = X[X <= t] # Include the threshold
y = Y[X <= t]
bx = X[X >= t]
by = Y[X >= t]
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, line=dict(width=4, color='grey'), name="useless data"))
fig.add_trace(go.Scatter(x=bx, y=by, line=dict(width=4, color='blue'), name="useful data"))
fig.update_layout(xaxis_title="x axis", yaxis_title="y axis")
fig.show()
So this works normally, and if you run it, you will see that 4 is included in the blue points. But now, please remove the lines where only 23 values are taken (X = X[:23], Y = Y[:23]). You will see that 4 is no longer part of the blue points; moreover, the markers themselves disappear from the blue line: you can hover and see the data, but you can't see the actual points! Does anyone know why this happens? Is it an actual bug, or is it normal behaviour and I am missing something? Thank you in advance!
Bug? Not necessarily. Weird behaviour? Perhaps...
In any case, the solution in your case is:
fig.data[1].mode = 'lines+markers'
What you seem to be struggling with here is caused by two things:
When hovering over a point where there is one trace represented by a line, and one trace represented by a marker, plotly will display the information for the marker even though the line is placed on top.
For an increasing length of a go.Scatter() trace, plotly will stop showing markers after a certain threshold.
And this is arguably the weird part: the exact threshold does not seem to be determined by the length of the trace alone. We'll take a look at that in the end.
Details:
1. Hover behavior
Just run your code as it is, and hover over 4:
Now deselect useless data by clicking the name in the legend and you'll get:
If you zoom in a bit, you'll see that the data is actually there, it just won't show on hover when both traces are activated:
So, what to do about it?
Just include:
fig.data[1].mode = 'lines+markers'
And get:
2. Marker trigger threshold for go.Scatter
In your case, this threshold seems to be reached once X and Y hold more than 23 values, since you're seeing the exact behaviour you're describing. So, what's weird about this? The next part:
3. Varying marker trigger threshold
First of all, why is there a threshold? Probably because a trace with too many markers arguably looks weird:
In your figure, the markers disappear at 24 values. But in a figure built only with go.Figure(go.Scatter(x = x, y = y)), the threshold is 20:
pts = 20
x = np.arange(1,pts)  # 19 points: markers are shown
y = np.arange(1,pts)
fig = go.Figure(go.Scatter(x = x, y = y))
fig.show()
pts = 21
x = np.arange(1,pts)  # 20 points: markers are no longer shown
y = np.arange(1,pts)
fig = go.Figure(go.Scatter(x = x, y = y))
fig.show()
And I'm not sure why. But I think that would make for a good question on its own.
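One possible explanation, going by the documented default for the `mode` attribute of scatter traces (fewer than 20 points gives 'lines+markers', otherwise 'lines'), is that the count is taken per trace rather than over the whole figure. The sketch below only counts the points that end up in each trace for the question's split at t = 4. The helper `split_counts` and the constant `CUTOFF` are illustrative assumptions, not plotly API.

```python
import numpy as np

# Assumed cutoff, taken from the documented default for scatter `mode`
CUTOFF = 20

def split_counts(n_values, t=4):
    """Return (grey, blue) point counts for the question's split at t."""
    X = np.arange(n_values)
    grey = int((X <= t).sum())  # points in the "useless data" trace
    blue = int((X >= t).sum())  # points in the "useful data" trace
    return grey, blue

for n in (23, 24, 100):
    grey, blue = split_counts(n)
    default = 'lines+markers' if blue < CUTOFF else 'lines'
    print(f"{n} values -> grey: {grey} points, blue: {blue} points, "
          f"blue default mode would be '{default}'")
```

Under that assumption, the blue trace holds 19 points when 23 values are used and 20 points when 24 are used, which would match both the 23/24 observation and the single-trace threshold of 20.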
The reason:
This is a 'feature' of Plotly Scatter plots. Once a trace holds enough points (the documentation for `mode` puts the cutoff at 20), the default mode switches from 'lines+markers' to 'lines', presumably for rendering efficiency and plot cleanliness. Thus, the markers are no longer drawn and only the solid line remains.
The fix:
Simply add mode='lines+markers' to the trace.
Full working example:
This is your source code, with the minor fix mentioned above:
import numpy as np
import plotly.graph_objects as go
X = [j for j in range(0, 100)]
Y = [j for j in range(100000, 200000, 1000)]
#X = X[:23]
#Y = Y[:23]
X = np.array(X)
Y = np.array(Y)
t = 4
x = X[X <= t] # Include the threshold
y = Y[X <= t]
bx = X[X >= t]
by = Y[X >= t]
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='lines+markers', line=dict(width=1, color='grey'), name="useless data"))
fig.add_trace(go.Scatter(x=bx, y=by, mode='lines+markers', line=dict(width=1, color='blue'), name="useful data"))
fig.update_layout(xaxis_title="x axis", yaxis_title="y axis")
fig.show()
In a Streamtube plot of Python plotly, how do you fix the radius of the tube and not let it vary by the divergence of the field?
Run the following code as shown in this page:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/streamtube-wind.csv').drop(['Unnamed: 0'],axis=1)
fig = go.Figure(data=go.Streamtube(
    x = df['x'],
    y = df['y'],
    z = df['z'],
    u = df['u'],
    v = df['v'],
    w = df['w'],
    starts = dict(
        x = [80] * 16,
        y = [20,30,40,50] * 4,
        z = [0,0,0,0,5,5,5,5,10,10,10,10,15,15,15,15]
    ),
    sizeref = 0.3,
    colorscale = 'Portland',
    showscale = False,
    maxdisplayed = 3000
))
fig.show()
you will see something like:
The radius of the tube varies according to the divergence of the field, according to this documentation. However, this is annoying and I don't want the radius of the tube to vary. I want to fix the radius of the tubes and keep their coloring varying according to the magnitude of the field, as it already does. I don't see any parameter for that in the documentation. I also searched the page for "local radius" and "divergence" and found nothing related. This drives me crazy.
I'm trying to create a plot of x1 against y and make the points coloured based on another variable x2.
x1 = times
y = wait
x2 = triage (from 1-5)
Every variable contains data and they all have the same length. What is special about x2 (triage) is that it is numbered from 1-5, and I want to colour the points based on those values, so 1 is red, 2 is orange, 3 is yellow, 4 is green and 5 is blue.
Code UPDATED
X = dataset.iloc[:, 0:3].values
y = dataset.iloc[:, 3].values
triage = X[:, 0]
week = X[:, 1]
times = X[:, 2]
wait = y
df = pd.DataFrame(dict(times=times, wait=wait, triage=triage))
fig, ax = plt.subplots()
colors = {'1':'red', '2':'orange', '3':'yellow', '4':'green', '5':'blue'}
grouped = df.groupby('triage')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='times', y='wait', label=key,
               triage=colors[key])
plt.show()
I managed to try something, but it still does not quite work. Can anyone see what could be wrong here?
First, it is important to provide a complete and verifiable example. The data dependency here does not allow for it. However, to answer your question you need to index into your conditions. I made some mock data for you to produce the results you want.
import pandas as pd, numpy as np, matplotlib.pyplot as plt
# mockdata
data = np.zeros((10, 3))
data[:, :2] = np.random.rand(len(data), 2)
data[:, -1] = np.random.randint(0, 2, size = len(data))
df = pd.DataFrame(data, columns = 'x y condition'.split(' '))
fig, ax = plt.subplots()
# index into different subsets and plot
for condition, selection in df.groupby('condition'):
    # scatter draws unconnected markers, one colour per condition
    ax.scatter(selection.x, selection.y, label = condition)
ax.legend(title = 'condition')
ax.set(xlabel = 'x', ylabel = 'y')
fig.show()
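Applied to the triage data from the question, the same grouping idea might look like the sketch below. It rests on two assumptions that the question does not confirm: that the triage values are stored as numbers (hence integer dictionary keys instead of the string keys in the question), and that the keyword argument that sets the colour should be `color` rather than `triage`. The data frame is invented stand-in data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data for times, wait and triage (1-5)
df = pd.DataFrame({
    'times':  [1, 2, 3, 4, 5, 6],
    'wait':   [30, 12, 45, 20, 5, 60],
    'triage': [1, 2, 3, 4, 5, 1],
})

# Integer keys, assuming triage is numeric rather than text
colors = {1: 'red', 2: 'orange', 3: 'yellow', 4: 'green', 5: 'blue'}

fig, ax = plt.subplots()
for key, group in df.groupby('triage'):
    # `color` sets the marker colour for the whole group
    ax.scatter(group['times'], group['wait'], color=colors[key], label=key)

ax.legend(title='triage')
ax.set(xlabel='times', ylabel='wait')
```

Calling plt.show() afterwards displays the figure. If the triage column actually holds strings, the string keys from the question would be the ones to keep.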
I have a data set of clusters, and the output looks something like this:
1 [1,2]
2 [1,6]
1 [2,4]
Here 1, 2, ... is the cluster id and [1,2] and so on are the points. I want to plot each point's x and y coordinates on the two axes, label each point in the graph with its cluster id, and give points with different ids different colors. How do I go about it?
Thanks
If one axis is the cluster id I don't get how you fit both the x and y coordinates onto the other axis. So I plotted the x,y on the x and y axis and used the cluster id as a label; you can swap which value goes into which axis, I guess:
import matplotlib.pyplot as plt
from ast import literal_eval
data = """1 [1,2]
2 [1,6]
1 [2,4]"""
def cluster_color(clusternum):
    '''Some kind of map of cluster ID to a color'''
    if clusternum == 1:
        return [(1, 0, 0)]
    if clusternum == 2:
        return [(0, 0, 1)]
    else:
        return [(0, 1, 0)]
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
def datalines(pt):
    '''Pick the coordinates and label out of this text format'''
    label, xy = pt.split(' ')
    xy = literal_eval(xy)
    assert(len(xy) == 2)
    return xy[0], xy[1], label
for pt in data.splitlines():
    if len(pt) > 0:
        x, y, cluster = datalines(pt)
        ax.scatter(x, y, c= cluster_color(float(cluster)))
        ax.text(x + .01, y + .01, cluster)
fig.show()
N.b.: if you have a lot of data, don't call scatter for each point separately; append x, y , cluster to three separate lists and scatter the lists.
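A minimal sketch of that batched variant, assuming the same text format as above. The `palette` dictionary and the list names `xs`, `ys` and `clusters` are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
from ast import literal_eval

data = """1 [1,2]
2 [1,6]
1 [2,4]"""

# Hypothetical mapping of cluster id to colour; unknown ids fall back to green
palette = {1: 'red', 2: 'blue'}

# Collect coordinates and cluster ids into three separate lists
xs, ys, clusters = [], [], []
for line in data.splitlines():
    if line:
        label, xy = line.split(' ')
        x, y = literal_eval(xy)
        xs.append(x)
        ys.append(y)
        clusters.append(int(label))

fig, ax = plt.subplots()
# A single scatter call for all points, with one colour per point
ax.scatter(xs, ys, c=[palette.get(c, 'green') for c in clusters])

# Text labels still need one call per point
for x, y, c in zip(xs, ys, clusters):
    ax.text(x + .01, y + .01, str(c))
```

Calling plt.show() afterwards displays the figure.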