Annotating scatterplot points with DF column text Matplotlib - python

I'm fairly new to Python and I'm struggling to annotate plots at the moment.
I've come from R so I'm used to the ease of being able to annotate scatterplot points with minimum code.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
url = ('https://fbref.com/en/share/nXtrf')
df = pd.read_html(url)[0]
df = df[['Unnamed: 1_level_0', 'Unnamed: 2_level_0', 'Play', 'Perf']].copy()
df.columns = df.columns.droplevel()
df = df[['Player','Squad','Min','SoTA','Saves']]
df = df.drop([25])
df['Min'] = pd.to_numeric(df['Min'])
df['SoTA'] = pd.to_numeric(df['SoTA'])
df['Saves'] = pd.to_numeric(df['Saves'])
df['Min'] = df[df['Min'] > 1600]['Min']
df = df.dropna()
df.plot(x = 'Saves', y = 'SoTA', kind = "scatter")
I've tried numerous ways to annotate this plot. I'd like the points to be annotated with the corresponding data from the 'Player' column.
I've tried using a label_point function that I found while looking for a workaround, but I keep getting KeyError: 0 with most of the approaches I try.
Any assistance would be great. Thanks.

You could loop through both columns and add a text for each entry. Note that you need to save the ax returned by df.plot(...).
ax = df.plot(x='Saves', y='SoTA', kind="scatter")
for x, y, player in zip(df['Saves'], df['SoTA'], df['Player']):
    ax.text(x, y, f'{player}', ha='left', va='bottom')
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin, xmax + 0.15 * (xmax - xmin)) # some more margin to fit the texts
An alternative is to use the mplcursors library to show an annotation while hovering (or after a click):
import mplcursors
mplcursors.cursor(hover=True)
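As a variation on the loop above, `ax.annotate` with `textcoords='offset points'` nudges each label away from its marker by a fixed number of points, so the offset does not depend on the data scale. The sketch below uses a small made-up table (the player names and numbers are placeholders, not the fbref data). It also hints at one plausible cause of the `KeyError: 0` from the question, which is an assumption since the `label_point` function was not shown: after `drop` and `dropna` the index has gaps, so label-based lookups such as `df['Player'][0]` can fail, while iterating over the row values does not depend on the index.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data standing in for the scraped table (hypothetical values).
df = pd.DataFrame({
    "Player": ["Keeper A", "Keeper B", "Keeper C", "Keeper D"],
    "Saves": [60, 75, 82, 91],
    "SoTA": [90, 105, 120, 131],
})
# Removing index label 0 leaves a gap, so df['Player'][0] would raise KeyError.
df = df.drop([0])

ax = df.plot(x="Saves", y="SoTA", kind="scatter")

# itertuples iterates over row values, so gaps in the index do not matter.
for row in df.itertuples(index=False):
    ax.annotate(
        row.Player,
        xy=(row.Saves, row.SoTA),   # the point being labelled (data coordinates)
        xytext=(4, 4),              # label offset from the point, in points
        textcoords="offset points",
        ha="left",
        va="bottom",
    )

labels = [t.get_text() for t in ax.texts]
```

Calling `df.reset_index(drop=True)` after filtering would be another way to make positional lookups safe again.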

Related

How to plot on exactly rows of a dataframe

That's not easy to describe in words, so here is a picture to help explain:
As the image shows, I want to plot a line on each row separately based on their values on a data frame. Is it possible with Python libraries?
Here's an example to get you started: it uses table to plot the dataframe and overplots the stacked lines. The line for each row is shifted by ymax, the maximum value in the dataframe, to prevent overlapping.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# make sample data
np.random.seed(0)
df = pd.DataFrame(np.random.rand(41,5))
df.index = [f'Row {i}' for i in df.index]
fig, ax = plt.subplots(figsize=(4,10))
ax.set_axis_off()
# plot data as table
plt.matplotlib.table.table(ax, [[f'{v:.1f}' for v in row] for row in df.to_numpy()], rowLabels=df.index, bbox=[0,0,1,1])
# plot curve over table
ymax = df.max().max()
ax.set_ylim(0, ymax * len(df))
ax.plot((df.to_numpy() + ((len(df) - 1 - df.reset_index(drop=True).index.to_numpy()) * ymax)[:, None]).T, color='C0')
To use alternating colors, you can set the color cycler:
from cycler import cycler
# ...
ax.set_prop_cycle(cycler(color='rg'))
ax.plot((df.to_numpy() + ((len(df) - 1 - df.reset_index(drop=True).index.to_numpy()) * ymax)[:, None]).T)
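The one-line `ax.plot(...)` call packs the stacking arithmetic tightly. A small sketch of the same idea on a 3x4 array, using only NumPy and made-up numbers, may make the offsets easier to follow: row `i` of `n` rows is lifted by `(n - 1 - i) * ymax`, so the first row ends up in the topmost band, matching the table's top-to-bottom order.

```python
import numpy as np

data = np.array([
    [0.2, 0.8, 0.5, 0.1],   # row 0 -> top band
    [0.9, 0.3, 0.4, 0.6],   # row 1 -> middle band
    [0.1, 0.7, 1.0, 0.2],   # row 2 -> bottom band
])
n_rows = data.shape[0]
ymax = data.max()            # height of one band

# One offset per row: the first row gets the largest shift.
offsets = (n_rows - 1 - np.arange(n_rows)) * ymax

# [:, None] turns the offsets into a column so they broadcast across each row.
stacked = data + offsets[:, None]

# ax.plot treats each column of a 2-D array as one line, hence the transpose.
lines_for_plot = stacked.T
```

With `ax.set_ylim(0, ymax * n_rows)`, each band then lines up with one table row.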

Python Seaborn Chart - Shadow Area

Sorry for the noob question, but how can I add a shaded area/color between the upper and lower lines in a seaborn chart?
The main code I've been working on is the following:
plt.figure(figsize=(18,10))
sns.set(style="darkgrid")
palette = sns.color_palette("mako_r", 3)
sns.lineplot(x="Date", y="Value", hue='Std_Type', style='Value_Type', sizes=(.25, 2.5), palette = palette, data=tbl4)
The idea is to get some effect like below (the example from seaborn website):
But I could not replicate the effect although my data structure is pretty much in the same fashion as fmri (seaborn example)
from seaborn link:
import seaborn as sns
sns.set(style="darkgrid")
# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")
# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri)
Do you have some ideas?
I tried to change the chart style, but if I go to a distplot or relplot, for example, the x_axis cannot show the timeframe...
Check this code:
# import
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style = 'darkgrid')
# data generation
time = pd.date_range(start = '2006-01-01', end = '2020-01-01', freq = 'M')
tbl4 = pd.DataFrame({'Date': time,
                     'down': 1 - 0.5*np.random.randn(len(time)),
                     'up': 4 + 0.5*np.random.randn(len(time))})
tbl4 = tbl4.melt(id_vars = 'Date',
                 value_vars = ['down', 'up'],
                 var_name = 'Std_Type',
                 value_name = 'Value')
# figure plot
fig, ax = plt.subplots(figsize=(18,10))
sns.lineplot(ax = ax,
             x = 'Date',
             y = 'Value',
             hue = 'Std_Type',
             data = tbl4)
# fill area
plt.fill_between(x = tbl4[tbl4['Std_Type'] == 'down']['Date'],
                 y1 = tbl4[tbl4['Std_Type'] == 'down']['Value'],
                 y2 = tbl4[tbl4['Std_Type'] == 'up']['Value'],
                 alpha = 0.3,
                 facecolor = 'green')
plt.show()
which gives me this plot:
Since I do not have access to your data, I generated random ones. Replace them with yours.
The shaded area is drawn with plt.fill_between, where you specify the x array (common to both curves), the two limits of the area as y1 and y2 and, optionally, a color and its transparency with the facecolor and alpha parameters respectively.
You cannot do it through the ci parameter, since that is used to show the confidence interval computed from repeated observations at each x value.
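Because `fill_between` needs the two curves sampled at the same x values, it can help to pivot the long-form table to wide form first, so that each `Std_Type` becomes a column aligned on `Date`. The sketch below assumes the same column names as above (`Date`, `Std_Type`, `Value`) and uses generated data. It draws the lines with plain matplotlib to stay self-contained, but the same `ax` could equally be passed to `sns.lineplot`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generated long-form data with the same layout as tbl4 (values are made up).
rng = np.random.default_rng(0)
dates = pd.date_range(start="2019-01-01", periods=24, freq="MS")
long_df = pd.DataFrame({
    "Date": np.tile(dates, 2),
    "Std_Type": np.repeat(["down", "up"], len(dates)),
    "Value": np.concatenate([
        1 + 0.5 * rng.standard_normal(len(dates)),
        4 + 0.5 * rng.standard_normal(len(dates)),
    ]),
})

# Wide form: one row per date, one column per Std_Type, aligned on Date.
wide = long_df.pivot(index="Date", columns="Std_Type", values="Value")

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(wide.index, wide["down"], label="down")
ax.plot(wide.index, wide["up"], label="up")
band = ax.fill_between(wide.index, wide["down"], wide["up"],
                       alpha=0.3, color="green")
ax.legend()
```

Pivoting also surfaces missing dates as NaN in one of the columns, which is easier to spot than a silent length mismatch between the two filtered Series.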

How to add axis offset in matplotlib plot?

I'm drawing several point plots in seaborn on the same graph. The x-axis is ordinal, not numerical; the ordinal values are the same for each point plot. I would like to shift each plot a bit to the side, the way pointplot(dodge=...) parameter does within multiple lines within a single plot, but in this case for multiple different plots drawn on top of each other. How can I do that?
Ideally, I'd like a technique that works for any matplotlib plot, not just seaborn specifically. Adding an offset to the data won't work easily, since the data is not numerical.
Example that shows the plots overlapping and making them hard to read (dodge within each plot works okay)
import pandas as pd
import seaborn as sns
df1 = pd.DataFrame({'x':list('ffffssss'), 'y':[1,2,3,4,5,6,7,8], 'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
sns.pointplot(data=df1, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='<')
sns.pointplot(data=df2, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='>')
I could use something other than seaborn, but the automatic confidence / error bars are very convenient so I'd prefer to stick with seaborn here.
Answering this for the most general case first.
A dodge can be implemented by shifting the artists in the figure by some amount. It might be useful to use points as units of that shift. E.g. you may want to shift your markers on the plot by 5 points.
This shift can be accomplished by adding a translation to the data transform of the artist. Here I propose a ScaledTranslation.
Now to keep this most general, one may write a function which takes the plotting method, the axes and the data as input, and in addition some dodge to apply, e.g.
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
The full functional code:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
import pandas as pd
def draw_dodge(*args, **kwargs):
    func = args[0]
    dodge = kwargs.pop("dodge", 0)
    ax = kwargs.pop("ax", plt.gca())
    trans = ax.transData + transforms.ScaledTranslation(dodge/72., 0,
                                   ax.figure.dpi_scale_trans)
    artist = func(*args[1:], **kwargs)
    def iterate(artist):
        if hasattr(artist, '__iter__'):
            for obj in artist:
                iterate(obj)
        else:
            artist.set_transform(trans)
    iterate(artist)
    return artist
X = ["a", "b"]
Y = np.array([[1,2],[2,2],[3,2],[1,4]])
Dodge = np.arange(len(Y),dtype=float)*10
Dodge -= Dodge.mean()
fig, ax = plt.subplots()
for y,d in zip(Y,Dodge):
    draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
ax.margins(x=0.4)
plt.show()
You may use this with ax.plot, ax.scatter etc. However not with any of the seaborn functions, because they don't return any useful artist to work with.
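To see what the `ScaledTranslation` in `draw_dodge` amounts to, the following sketch (plain matplotlib, with arbitrary axis limits and an arbitrary test point) compares where a data point lands in display coordinates with and without the extra translation. A dodge given in points is divided by 72 to get inches, and `dpi_scale_trans` converts inches to pixels, so the shift should be `dodge / 72 * fig.dpi` pixels regardless of the axis limits.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import transforms

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

dodge = 5  # shift in points (1 point = 1/72 inch)
shifted = ax.transData + transforms.ScaledTranslation(
    dodge / 72.0, 0, fig.dpi_scale_trans
)

point = (3, 4)                               # data coordinates
plain_px = ax.transData.transform(point)     # display (pixel) coordinates
shifted_px = shifted.transform(point)

# Horizontal shift in pixels; the vertical component stays zero.
pixel_shift = shifted_px - plain_px
expected_shift = np.array([dodge / 72.0 * fig.dpi, 0.0])
```

Because the translation is applied after `transData`, it works the same way for categorical x values, which is why it suits the ordinal axis in the question.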
Now for the case in question, the remaining problem is to get the data in a useful format. One option would be the following.
df1 = pd.DataFrame({'x':list('ffffssss'),
                    'y':[1,2,3,4,5,6,7,8],
                    'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
N = len(np.unique(df1["x"].values))*len([df1,df2])
Dodge = np.linspace(-N,N,N)/N*10
fig, ax = plt.subplots()
k = 0
for df in [df1,df2]:
    for (n, grp) in df.groupby("h"):
        # select "y" so the string column "h" is not aggregated
        x = grp.groupby("x")["y"].mean()
        std = grp.groupby("x")["y"].std()
        draw_dodge(ax.errorbar, x.index, x.values,
                   yerr=std.values, ax=ax,
                   dodge=Dodge[k], marker="o", label=n)
        k+=1
ax.legend()
ax.margins(x=0.4)
plt.show()
You can use linspace to easily shift your graphs to where you want them to start and end. It also makes it easy to scale the graphs so that they are visually the same width:
import numpy as np
import matplotlib.pyplot as plt
start_offset = 3
end_offset = start_offset
y1 = np.random.randint(0, 10, 20)  # 20 random ints from 0 to 9
y2 = np.random.randint(0, 10, 10)  # 10 random ints from 0 to 9
x1 = np.linspace(0, 20, y1.size)  # y1.size evenly spaced x values from 0 to 20
x2 = np.linspace(0, 20, y2.size)
plt.plot(x1, y1)
plt.plot(x2, y2)
plt.show()
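The snippet above defines `start_offset` and `end_offset` but never passes them to `linspace`. One way they might be used, which is a guess at the intent rather than the answerer's code, is to shrink the x range of one series so that it starts and ends inside the other:

```python
import numpy as np

rng = np.random.default_rng(0)
start_offset = 3
end_offset = start_offset

y1 = rng.integers(0, 10, 20)   # 20 random ints in [0, 10)
y2 = rng.integers(0, 10, 10)   # 10 random ints in [0, 10)

# y1 spans the full range; y2 is inset by the offsets at both ends.
x1 = np.linspace(0, 20, y1.size)
x2 = np.linspace(0 + start_offset, 20 - end_offset, y2.size)
```

Both series can then be drawn with `plt.plot(x1, y1)` and `plt.plot(x2, y2)`. Note that this relies on numerical x values, so it does not directly apply to the ordinal axis in the question.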

Date removed from x axis on overlaid plots matplotlib

I am trying to show time series lines representing an effort amount using matplotlib and pandas.
I've got all my DataFrames to overlay in one plot; however, when I do, Python seems to strip the dates from the x axis and substitute some numbers. (I'm not sure where these come from, but my guess is that not all days contain the same data, so Python has reverted to using an index id number.) If I plot any one of these on its own, it comes up with dates on the x-axis.
Any hints or solutions to make the x axis show date for the multiple plot would be much appreciated.
This is the single figure plot with time axis:
Code I'm using to plot is
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(b342,color='black')
ax.plot(b343,color='blue')
ax.plot(b344,color='red')
ax.plot(b345,color='green')
ax.plot(b346,color='pink')
ax.plot(fi,color='yellow')
plt.show()
This is the multiple plot fig with weird x axis:
One option would be to manually specify the x-axis based on the DataFrame index, and then plot directly using matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# make up some data
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["FishEffort"] )
    df.df_name = str(i)
    dfs.append(df)
# plot it directly using matplotlib instead of through the DataFrame
fig = plt.figure()
ax = fig.add_subplot()
for df in dfs:
    plt.plot(df.index,df["FishEffort"], label = df.df_name)
plt.legend()
plt.show()
Another option would be to concatenate your DataFrames and plot using Pandas. If you give your "FishEffort" field the correct label name when loading the data or via DataFrame.rename then the labels will be specified automatically.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 100
dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
dfs = []
for i in range(3):
    df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                      columns = ["DataFrame #" + str(i) ] )
    df.df_name = str(i)
    dfs.append(df)
df = pd.concat(dfs, axis = 1)
df.plot()
I've found an answer that does what I want. It seems that calling plt.plot wasn't using the date as the x axis, whereas plotting through pandas, as described in the pandas documentation, did the trick.
ax = b342.plot(label='342')
b343.plot(ax=ax, label='test')
b344.plot(ax=ax)
b345.plot(ax=ax)
b346.plot(ax=ax)
fi.plot(ax=ax)
plt.show()
I was wondering if anyone knew how to change the labels here?
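On the follow-up about labels: when a DataFrame is plotted through pandas, the legend entries come from the column names, so renaming the columns before plotting is one way to control them. A minimal sketch with generated data (the names `b342` and `b343` mirror the question; the label strings and values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range(start="2015-01-01", periods=50, name="yearDate")
rng = np.random.default_rng(0)
b342 = pd.DataFrame({"FishEffort": rng.random(50)}, index=dates)
b343 = pd.DataFrame({"FishEffort": 2 * rng.random(50)}, index=dates)

# Rename the shared column so each frame carries its own legend label.
combined = pd.concat(
    [
        b342.rename(columns={"FishEffort": "Boat 342"}),
        b343.rename(columns={"FishEffort": "Boat 343"}),
    ],
    axis=1,
)

ax = combined.plot()  # the DatetimeIndex is used for the x axis
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

For an axes that is already drawn, calling `ax.legend([...])` with one label per line, in plotting order, is another option.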

Clustered barchart in matplotlib?

How do I plot a barchart similar to
Clustered bar plot in gnuplot using python matplotlib?
date|name|empid|app|subapp|hours
20140101|A|0001|IIC|I1|2.5
20140101|A|0001|IIC|I2|3
20140101|A|0001|IIC|I3|4
20140101|A|0001|CAR|C1|2.5
20140101|A|0001|CAR|C2|3
20140101|A|0001|CAR|C3|2
20140101|A|0001|CAR|C4|2
Trying to plot the subapp hours by app for the same person. Couldn't see an example in the demo pages of matplotlib.
EDIT: None of the examples cited below seem to work for unequal # of bars for each category as above.
The examples didn't manage unequal # of bars but you can use another approach. I'll post you an example.
Note: I use pandas to manipulate your data, if you don't know about it you should give it a try http://pandas.pydata.org/:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
df = pd.read_table("data.csv",sep="|")
grouped = df.groupby('app')['hours']
colors = "rgbcmyk"
fig, ax = plt.subplots()
initial_gap = 0.1
start = initial_gap
width = 1.0
gap = 0.05
for app,group in grouped:
    size = group.shape[0]
    ind = np.linspace(start,start + width, size+1)[:-1]
    w = width / size  # bar width; also works when a group has a single bar
    start = start + width + gap
    # align='edge' so ind gives the left edge of each bar
    ax.bar(ind,group,w,align='edge',color=list(colors[:size]))
tick_loc = (np.arange(len(grouped)) * (width+gap)) + initial_gap + width/2
# fix the tick positions first, then label them
ax.set_xticks(tick_loc)
ax.set_xticklabels([app for app,_ in grouped])
plt.show()
And data.csv contains the data:
date|name|empid|app|subapp|hours
20140101|A|0001|IIC|I1|2.5
20140101|A|0001|IIC|I2|3
20140101|A|0001|IIC|I3|4
20140101|A|0001|CAR|C1|2.5
20140101|A|0001|CAR|C2|3
20140101|A|0001|CAR|C3|2
20140101|A|0001|CAR|C4|2
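The position arithmetic in the loop can be pulled into a small helper, which makes it easier to check for clusters of different sizes. `cluster_layout` is a hypothetical name introduced here, not a matplotlib function; it returns the left edge and width of every bar plus the center of each cluster for tick placement.

```python
import numpy as np

def cluster_layout(sizes, width=1.0, gap=0.05, initial_gap=0.1):
    """Return (left_edges, bar_widths, tick_centers) for clusters of bars.

    sizes: number of bars in each cluster, e.g. [4, 3].
    Every cluster occupies `width` on the x axis, so bars in larger
    clusters are narrower.
    """
    left_edges, bar_widths, tick_centers = [], [], []
    start = initial_gap
    for size in sizes:
        w = width / size
        left_edges.append(start + w * np.arange(size))
        bar_widths.append(w)
        tick_centers.append(start + width / 2)
        start += width + gap
    return left_edges, bar_widths, tick_centers

# CAR has 4 sub-apps and IIC has 3 in the sample data (groupby sorts the keys).
edges, widths, ticks = cluster_layout([4, 3])
```

Each cluster can then be drawn with `ax.bar(edges[i], heights, widths[i], align='edge')`, where `heights` holds that group's hours.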
