Problem with Friends of Tracking Code NOOB [PYTHON] - python

I am learning Python and have run into some issues with this code:
https://github.com/Slothfulwave612/Football-Analytics-Using-Python/blob/master/03.%20Analyzing%20Event%20Data/pass_map.py
My question is really simple:
I would like to use a for loop to apply the pass map code to multiple football matches.
import matplotlib.pyplot as plt
import json
from pandas.io.json import json_normalize
from FCPython import createPitch
pitch_length_X = 120
pitch_width_Y = 80
(fig,ax) = createPitch(pitch_length_X, pitch_width_Y,'yards','gray')
## the code integrated in order to analyze multiple matches
P1TMP = [16205, 16131, 16265]
for i in P1TMP:
    ## match id for our El Clasico
    match_id = int(i)
    home_team = 'Barcelona'
    player_name = 'Lionel Andrés Messi Cuccittini'
    ## this is the name of our event data file for
    ## our required El Clasico
    file_name = str(match_id) + '.json'
    ## loading the required event data file
    my_data = json.load(open('/content/drive/My Drive/20200515 CHIRINGUITO/events/' + file_name, 'r', encoding='utf-8'))
    ## get the nested structure into a dataframe
    ## store the dataframe in a dictionary with the match id as key
    df = json_normalize(my_data, sep='_').assign(match_id = file_name[:-5])
    ## making the list of all column names
    column = list(df.columns)
    ## all the type names we have in our dataframe
    all_type_name = list(df['type_name'].unique())
    ## creating a data frame for pass
    ## and then removing the null values
    ## only listing the player_name in the dataframe
    pass_df = df.loc[df['type_name'] == 'Pass', :].copy()
    pass_df.dropna(inplace=True, axis=1)
    pass_df = pass_df.loc[pass_df['player_name'] == player_name, :]
    ## creating a data frame for ball receipt
    ## removing all the null values
    ## and only listing Barcelona players in the dataframe
    breceipt_df = df.loc[df['type_name'] == 'Ball Receipt*', :].copy()
    breceipt_df.dropna(inplace=True, axis=1)
    breceipt_df = breceipt_df.loc[breceipt_df['team_name'] == 'Barcelona', :]
    pass_comp, pass_no = 0, 0
    ## pass_comp: completed pass
    ## pass_no: unsuccessful pass
    ## iterating through the pass dataframe
    for row_num, passed in pass_df.iterrows():
        if passed['player_name'] == player_name:
            ## for away side
            x_loc = passed['location'][0]
            y_loc = passed['location'][1]
            pass_id = passed['id']
            summed_result = sum(breceipt_df.iloc[:, 14].apply(lambda x: pass_id in x))
            if summed_result > 0:
                ## if pass made was successful
                color = 'blue'
                label = 'Successful'
                pass_comp += 1
            else:
                ## if pass made was unsuccessful
                color = 'green'
                label = 'Unsuccessful'
                pass_no += 1
            ## plotting circle at the player's position
            shot_circle = plt.Circle((pitch_length_X - x_loc, y_loc), radius=2, color=color, label=label)
            shot_circle.set_alpha(alpha=0.2)
            ax.add_patch(shot_circle)
            ## parameters for making the arrow
            pass_x = 120 - passed['pass_end_location'][0]
            pass_y = passed['pass_end_location'][1]
            dx = ((pitch_length_X - x_loc) - pass_x)
            dy = y_loc - pass_y
            ## making an arrow to display the pass
            pass_arrow = plt.Arrow(pitch_length_X - x_loc, y_loc, -dx, -dy, width=1, color=color)
            ## adding arrow to the plot
            ax.add_patch(pass_arrow)
    ## computing pass accuracy
    pass_acc = (pass_comp / (pass_comp + pass_no)) * 100
    pass_acc = str(round(pass_acc, 2))
    ## adding text to the plot
    plt.text(20, 85, '{} pass map vs Real Madrid'.format(player_name), fontsize=15)
    plt.text(20, 82, 'Pass Accuracy: {}'.format(pass_acc), fontsize=15)
    ## handling labels
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    plt.legend(by_label.values(), by_label.keys(), loc='best', bbox_to_anchor=(0.9, 1, 0, 0),fontsize=12)
    ## editing the figure size and saving it
    fig.set_size_inches(12, 8)
    fig.savefig('{} passmap.png'.format(match_id), dpi=200)
    ## showing the plot
    plt.show()
I have only edited the code to analyze multiple matches with a for loop:
P1TMP = [16205, 16131, 16265]
for i in P1TMP:
And the results:
In the first image the result is almost perfect, but the filter on the kind of pass is not working.
In the second image the passes are a mix of the passes of the first match and the second match. I only want the passes of the second match.
And the third image is a mix of matches 1, 2 and 3. I only need the passes of the third match.
Thanks in advance for your support.
Best Regards

It's combining all the matches into one plot because each figure is "drawn" on top of the previous one. There are a few other things you need to change too:
The away team will not always be Real Madrid, so make that dynamic.
Adjust the figure text accordingly so it's not always "vs. Real Madrid".
Save the file under a dynamic name so the images don't overwrite each other.
Instead of using plt.text for the titles (which is fine if you want to annotate at specific x, y coordinates), use plt.title() and plt.suptitle(). They center the text and give a nicer layout.
You use i as your match id variable when iterating, but then you don't change/include that in the loop.
This is the main issue: (fig,ax) = createPitch(pitch_length_X, pitch_width_Y,'yards','gray')
is what creates your "blank canvas" to plot on, so it needs to be called before each plot. It's like grabbing a new blank sheet of paper to draw on. If you only use the first sheet, everything goes on that one sheet. So move that call into your for loop.
Code
import json
import matplotlib.pyplot as plt
import pandas as pd
from FCPython import createPitch
## Note Statsbomb data uses yards for their pitch dimensions
pitch_length_X = 120
pitch_width_Y = 80
## match ids to analyze
match_list = [16205, 16131, 16265]
teamA = 'Barcelona' #<--- adjusted here
for match_id in match_list:
    ## calling the function to create a pitch map
    ## yards is the unit for measurement and
    ## gray will be the line color of the pitch map
    (fig,ax) = createPitch(pitch_length_X, pitch_width_Y,'yards','gray') #< moved into for loop
    player_name = 'Lionel Andrés Messi Cuccittini'
    ## this is the name of the event data file
    ## for the current match
    file_name = str(match_id) + '.json'
    ## loading the required event data file
    with open('Statsbomb/data/events/' + file_name, 'r', encoding='utf-8') as f:
        my_data = json.load(f)
    ## get the nested structure into a dataframe
    ## and tag every row with the match id
    ## (pd.json_normalize replaces the deprecated pandas.io.json.json_normalize)
    df = pd.json_normalize(my_data, sep='_').assign(match_id = file_name[:-5])
    teamB = [x for x in list(df['team_name'].unique()) if x != teamA ][0] #<--- get other team name
    ## making the list of all column names
    column = list(df.columns)
    ## all the type names we have in our dataframe
    all_type_name = list(df['type_name'].unique())
    ## creating a data frame for pass
    ## and then removing the null values
    ## only listing the player_name in the dataframe
    pass_df = df.loc[df['type_name'] == 'Pass', :].copy()
    pass_df.dropna(inplace=True, axis=1)
    pass_df = pass_df.loc[pass_df['player_name'] == player_name, :]
    ## creating a data frame for ball receipt
    ## removing all the null values
    ## and only listing teamA (Barcelona) players in the dataframe
    breceipt_df = df.loc[df['type_name'] == 'Ball Receipt*', :].copy()
    breceipt_df.dropna(inplace=True, axis=1)
    breceipt_df = breceipt_df.loc[breceipt_df['team_name'] == teamA, :]
    pass_comp, pass_no = 0, 0
    ## pass_comp: completed pass
    ## pass_no: unsuccessful pass
    ## iterating through the pass dataframe
    for row_num, passed in pass_df.iterrows():
        if passed['player_name'] == player_name:
            ## for away side
            x_loc = passed['location'][0]
            y_loc = passed['location'][1]
            pass_id = passed['id']
            summed_result = sum(breceipt_df.iloc[:, 14].apply(lambda x: pass_id in x))
            if summed_result > 0:
                ## if pass made was successful
                color = 'blue'
                label = 'Successful'
                pass_comp += 1
            else:
                ## if pass made was unsuccessful
                color = 'red'
                label = 'Unsuccessful'
                pass_no += 1
            ## plotting circle at the player's position
            shot_circle = plt.Circle((pitch_length_X - x_loc, y_loc), radius=2, color=color, label=label)
            shot_circle.set_alpha(alpha=0.2)
            ax.add_patch(shot_circle)
            ## parameters for making the arrow
            pass_x = 120 - passed['pass_end_location'][0]
            pass_y = passed['pass_end_location'][1]
            dx = ((pitch_length_X - x_loc) - pass_x)
            dy = y_loc - pass_y
            ## making an arrow to display the pass
            pass_arrow = plt.Arrow(pitch_length_X - x_loc, y_loc, -dx, -dy, width=1, color=color)
            ## adding arrow to the plot
            ax.add_patch(pass_arrow)
    ## computing pass accuracy
    pass_acc = (pass_comp / (pass_comp + pass_no)) * 100
    pass_acc = str(round(pass_acc, 2))
    ## adding text to the plot
    plt.suptitle('{} pass map vs {}'.format(player_name, teamB), fontsize=15) #<-- make dynamic and change to suptitle
    plt.title('Pass Accuracy: {}'.format(pass_acc), fontsize=15) #<-- change to title
    ## handling labels
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    plt.legend(by_label.values(), by_label.keys(), loc='best', bbox_to_anchor=(0.9, 1, 0, 0), fontsize=12)
    ## editing the figure size and saving it
    fig.set_size_inches(12, 8)
    fig.savefig('{} passmap.png'.format(match_id), dpi=200) #<-- dynamic file name
    ## showing the plot
    plt.show()
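One edge case the loop does not cover: if the chosen player made no passes in one of the matches, pass_comp + pass_no is zero and the accuracy line raises ZeroDivisionError. Below is a minimal sketch of two small helpers that could be called from inside the loop. The names pass_accuracy and opponent_of are hypothetical (they are not part of FCPython or the Statsbomb data), and returning None for "no data" is just one possible convention.

```python
def pass_accuracy(completed, failed):
    """Return the pass accuracy in percent, or None if no passes were attempted."""
    total = completed + failed
    if total == 0:
        return None
    return round(completed / total * 100, 2)


def opponent_of(team_names, team):
    """Return the first name in team_names that is not `team`, or None."""
    for name in team_names:
        if name != team:
            return name
    return None


# Example with made-up numbers
acc = pass_accuracy(3, 1)
rival = opponent_of(['Barcelona', 'Real Madrid'], 'Barcelona')
title = 'Pass Accuracy: {}'.format('n/a' if acc is None else acc)
print(rival, title)
```

In the loop, teamB could then come from opponent_of(df['team_name'].unique(), teamA), and the title text can fall back to 'n/a' when the player has no passes in that match.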

Related

Matplotlib PathPatch Colors and Legends not Matching

I have a dataset that is a list of lists.
Each list is a category to be plotted as a box plot.
Each list has a list of up to 9 components to be plotted into subplots.
The function I am using is below; it was based on this answer. I pulled it out of my work and added some mock data, so it should be a minimal example.
neonDict = {
    0:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8
}
import numpy as np  # needed for the mock data below
import matplotlib as mpl
import matplotlib.pyplot as plt
def coloredBoxPlot(axis, data,edgeColor,fillColor):
    bp = axis.boxplot(data,vert=False,patch_artist=True)
    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edgeColor)
    for patch in bp['boxes']:
        patch.set(facecolor=fillColor)
    return bp
def plotCalStats(data, prefix='Channel', savedir=None,colors=['#00597c','#a8005c','#00aeea','#007d50','#400080','#e07800'] ):
    csize = mpl.rcParams['figure.figsize']
    cdpi = mpl.rcParams['figure.dpi']
    mpl.rcParams['figure.figsize'] = (12,8)
    mpl.rcParams['figure.dpi'] = 1080
    pkdata = []
    labels = []
    lstyles = []
    fg, ax = plt.subplots(3,3)
    for pk in range(len(neonDict)):
        px = pk // 3
        py = pk % 3
        ax[px,py].set_xlabel('Max Pixel')
        ax[px,py].set_ylabel('')
        ax[px,py].set_title(str(neonDict[pk]) + ' nm')
        pkdata.append([])
    for cat in range(len(data)):
        bp = ''
        for acal in data[cat]:
            for apeak in acal.peaks:
                pkdata[apeak].append(acal.peaks[apeak][0])
        for pk in range(9):
            px = pk // 3
            py = pk % 3
            bp = coloredBoxPlot(ax[px,py], pkdata[pk], colors[cat], '#ffffff')
        if len(data[cat]) > 0:
            #print(colors[cat])
            #print(bp['boxes'][0].get_edgecolor())
            labels.append(prefix+' '+str(cat))
            lstyles.append(bp['boxes'][0])
    fg.legend(lstyles,labels)
    fg.suptitle('Calibration Summary by '+prefix)
    fg.tight_layout()
    if savedir is not None:
        plt.savefig(savedir + 'Boxplots.png')
    plt.show()
    mpl.rcParams['figure.figsize'] = csize
    mpl.rcParams['figure.dpi'] = cdpi
    return
class acal:
    def __init__(self):
        self.peaks = {}
        for x in range(9):
            self.peaks[x] = (np.random.randint(20*x,20*(x+1)),)
mockData = [[acal() for y in range(100)] for x in range(6)]
#Some unused channels
mockData[2] = []
mockData[3] = []
mockData[4] = []
plotCalStats(mockData)
So the issue is that the plot colors do not match the legend, even if I only add a label when data exists (which should rule out any issue with calling boxplot on an empty data set and not getting an appropriate PathPatch).
The printouts verify that the colors are correctly stored in the PathPatch. (I can add my digits -> hex converter if that is questioned.)
Attached is the output. One can see that I get a purple box but no purple in the legend. Purple is category 4, which is empty.
Any ideas why the labels don't match the actual style? Thanks much!
EDITS:
To address question on 'confusing'.
I have six categories of data, and each category comes from a single event. Each event has 9 components. I want to compare all events, for each individual component, for each category, on a single plot as shown below.
Each subplot is an individual component built from the series of data for each category (Channel).
The link I provided (which, as I said, my code is adapted from) shows how to create a single box plot on one axis for 2 data sets. I've basically done the same thing for 6 data sets on 9 axes, where 3 data sets are empty. They don't have to be; I left them empty to illustrate the issue, since with all 6 data sets present it would be hard to tell that the colors are messed up.
Regarding the alpha:
The alphas are always 'ff' when giving only RGB data to matplotlib. If I call get_edgecolors, it will return a tuple (RGBA) where A = 1.0.
See commented out print statement.
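To make that comparison concrete, here is a small sketch of turning an RGBA float tuple (the kind get_edgecolor() returns) back into a hex string and comparing it with the color that was requested. The helper names rgba_to_hex and same_color are assumptions for illustration, not matplotlib functions; each channel is rounded and zero-padded to two hex digits, and a 7-character '#rrggbb' string is treated as fully opaque.

```python
def rgba_to_hex(r, g, b, a=1.0):
    """Convert RGBA floats in [0, 1] to a '#rrggbbaa' string."""
    channels = (r, g, b, a)
    return '#' + ''.join(format(round(c * 255), '02x') for c in channels)


def same_color(hex_a, hex_b):
    """Compare two hex colors, treating a missing alpha as 'ff'."""
    def normalise(h):
        h = h.lower()
        return h + 'ff' if len(h) == 7 else h
    return normalise(hex_a) == normalise(hex_b)


# '#00597c' expressed as floats, the way matplotlib stores it
edge = (0x00 / 255, 0x59 / 255, 0x7c / 255, 1.0)
print(rgba_to_hex(*edge))
print(same_color('#00597C', rgba_to_hex(*edge)))
```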
EDIT2:
If I restrict it down to a single category, it makes the box plot view less confusing.
Single Example (see how box plot color is orange, figure says it's blue)
All colors off
Feel like this used to work....
I'm uncertain how the error presented as it did, but the issue has to do with reformatting the data before creating the box plot.
Removing pkdata.append([]) from the subplot-creation loop (which runs before the loop over the categories) and adding
pkdata = [[],[],[],[],[],[],[],[],[]] at the start of each iteration of the category loop fixed the issue. The former was sending in all previous channel data...
Output is now better. Full solution attached.
Most likely, since the plot uses data from pkdata, an empty channel (data[cat]) plotted the previous data (from data[cat-1]) because it was still in pkdata (in fact, the data of every previous data[cat] was still in pkdata). I only check data[cat] for data on each loop when adding to the legend. So the legend was set up for channels 0, 1 and 5, for example, but the plots showed channel 0 as 0, 0+1 as 1, 0+1 as 2, 0+1 as 3, 0+1 as 4, and 0+1+5 as 5. Channel 4 (purple) therefore had data to plot but was never added to the legend, which gave the impression of a 'misaligned' legend when it was really data without a legend entry.
The single-channel plot is actually all 6 channels overlapping. The final channel 5, drawn in orange, covers everything before it, including the original channel 0 data, which is the channel the data really belongs to and the one that was correctly added to the legend.
neonDict = {
    0:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8
}
import numpy as np  # needed for the mock data below
import matplotlib as mpl
import matplotlib.pyplot as plt
def getHex(r,g,b,a=1.0):
    ## round each channel to 0..255 and zero-pad it to two hex digits
    colors = [round(r * 255), round(g * 255), round(b * 255), round(a * 255)]
    s = '#'
    for x in range(4):
        s += format(colors[x], '02x')
    return s
def getRGB(colstr):
    try:
        r = int(colstr[1:3],16) / 255
        g = int(colstr[3:5],16) / 255
        b = int(colstr[5:7],16) / 255
        if len(colstr) == 7:
            a = 1.0
        else:
            a = int(colstr[7:],16) / 255
        return (r,g,b,a)
    except Exception as e:
        print(e)
        raise e
def compareHexColors(col1,col2):
    ## ASSUME #RRGGBB or #RRGGBBAA
    ## If shorter than 9 characters (no alpha), append 'ff'
    if len(col1) < 9:
        col1 += 'ff'
    if len(col2) < 9:
        col2 += 'ff'
    return col1.lower() == col2.lower()
def coloredBoxPlot(axis, data,edgeColor,fillColor):
    bp = axis.boxplot(data,vert=False,patch_artist=True)
    for element in ['boxes', 'whiskers', 'fliers', 'means', 'medians', 'caps']:
        plt.setp(bp[element], color=edgeColor)
    for patch in bp['boxes']:
        patch.set(facecolor=fillColor)
    return bp
def plotCalStats(data, prefix='Channel', savedir=None,colors=['#00597c','#a8005c','#00aeea','#007d50','#400080','#e07800'] ):
    csize = mpl.rcParams['figure.figsize']
    cdpi = mpl.rcParams['figure.dpi']
    mpl.rcParams['figure.figsize'] = (12,8)
    mpl.rcParams['figure.dpi'] = 1080
    pkdata = []
    labels = []
    lstyles = []
    fg, ax = plt.subplots(3,3)
    for pk in range(len(neonDict)):
        px = pk // 3
        py = pk % 3
        ax[px,py].set_xlabel('Max Pixel')
        ax[px,py].set_ylabel('')
        ax[px,py].set_title(str(neonDict[pk]) + ' nm')
    for cat in range(len(data)):
        bp = ''
        ## fresh buffers for every category, so no earlier channel leaks in
        pkdata = [[],[],[],[],[],[],[],[],[]]
        for acal in data[cat]:
            for apeak in acal.peaks:
                pkdata[apeak].append(acal.peaks[apeak][0])
        for pk in range(9):
            px = pk // 3
            py = pk % 3
            bp = coloredBoxPlot(ax[px,py], pkdata[pk], colors[cat], '#ffffff')
        if len(data[cat]) > 0:
            print(compareHexColors(colors[cat],getHex(*bp['boxes'][0].get_edgecolor())))
            labels.append(prefix+' '+str(cat))
            lstyles.append(bp['boxes'][0])
    fg.legend(lstyles,labels)
    fg.suptitle('Calibration Summary by '+prefix)
    fg.tight_layout()
    if savedir is not None:
        plt.savefig(savedir + 'Boxplots.png')
    plt.show()
    mpl.rcParams['figure.figsize'] = csize
    mpl.rcParams['figure.dpi'] = cdpi
    return
class acal:
    def __init__(self,center):
        self.peaks = {}
        for x in range(9):
            self.peaks[x] = [10*x + (center) + (np.random.randint(10)-1)/2.0,0,0]
mockData = [[acal(x) for y in range(1000)] for x in range(6)]
#Some unused channels
mockData[2] = []
mockData[3] = []
mockData[4] = []
plotCalStats(mockData)
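The accumulation problem can be reproduced without matplotlib. The sketch below uses made-up data and hypothetical function names (group_shared, group_reset); it only contrasts a buffer created once before the category loop with one recreated for every category.

```python
def group_shared(categories, n_peaks):
    """Buggy variant: one buffer is shared by all categories."""
    buffer = [[] for _ in range(n_peaks)]
    plotted = []
    for cat in categories:
        for peaks in cat:
            for idx, value in enumerate(peaks):
                buffer[idx].append(value)
        # snapshot of what would be handed to boxplot for this category
        plotted.append([list(b) for b in buffer])
    return plotted


def group_reset(categories, n_peaks):
    """Fixed variant: a fresh buffer for every category."""
    plotted = []
    for cat in categories:
        buffer = [[] for _ in range(n_peaks)]
        for peaks in cat:
            for idx, value in enumerate(peaks):
                buffer[idx].append(value)
        plotted.append(buffer)
    return plotted


# Three categories with two peaks each; the middle category is empty.
categories = [[(1, 10)], [], [(3, 30)]]
shared = group_shared(categories, 2)
reset = group_reset(categories, 2)
print(shared[1])  # the empty category still carries category 0's values
print(reset[1])   # the empty category really is empty
```

With the shared buffer, the empty category would be drawn with the first category's data, which matches the box that appeared without a legend entry.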

How to animate a 2D scatter plot given X, Y coordinates and time with appearing and disappearing points?

I have a data frame like the below:
Every row represents a person. Each person stays at 3 different locations for the times given in the dataframe. The first few people don't stay at location1; they are "born" at location2. The rest of them stay at every location (all 3 locations).
I would like to animate every person at the given X, Y coordinates given on the data frame and represent them as dots or any other shape. Here is the flow:
Every person should appear at the first given location (location1) at the given time. Their color should be blue at this state.
Stay at location1 until location2_time and then appear at location2. Their color should be red at this state.
Stay at location2 until location3_time and then appear at location3. Their color should be red at this state.
Stay at location3 for 3 seconds and disappear forever.
There can be several people on the visual at the same time. How can I do that?
There are some good answers in the links below. However, in those solutions the points don't disappear.
How can i make points of a python plot appear over time?
How to animate a scatter plot?
The following is an implementation with python-ffmpeg, pandas, matplotlib, and seaborn. You can find output video on my YouTube channel (link is unlisted).
Each frame with figures is saved directly to memory. New figures are generated only when the state of the population changes (person appears/moves/disappears).
You should definitely split this code into smaller pieces if you are using it in a Python package:
from numpy.random import RandomState, SeedSequence
from numpy.random import MT19937
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import ffmpeg
RESOLUTION = (12.8, 7.2) # * 100 pixels
NUMBER_OF_FRAMES = 900
class VideoWriter:
    # Courtesy of https://github.com/kylemcdonald/python-utils/blob/master/ffmpeg.py
    def __init__(
        self,
        filename,
        video_codec="libx265",
        fps=15,
        in_pix_fmt="rgb24",
        out_pix_fmt="yuv420p",
        input_args=None,
        output_args=None,
    ):
        self.filename = filename
        self.process = None
        self.input_args = {} if input_args is None else input_args
        self.output_args = {} if output_args is None else output_args
        self.input_args["r"] = self.input_args["framerate"] = fps
        self.input_args["pix_fmt"] = in_pix_fmt
        self.output_args["pix_fmt"] = out_pix_fmt
        self.output_args["vcodec"] = video_codec
    def add(self, frame):
        if self.process is None:
            height, width = frame.shape[:2]
            self.process = (
                ffmpeg.input(
                    "pipe:",
                    format="rawvideo",
                    s="{}x{}".format(width, height),
                    **self.input_args,
                )
                .filter("crop", "iw-mod(iw,2)", "ih-mod(ih,2)")
                .output(self.filename, **self.output_args)
                .global_args("-loglevel", "quiet")
                .overwrite_output()
                .run_async(pipe_stdin=True)
            )
        conv = frame.astype(np.uint8).tobytes()
        self.process.stdin.write(conv)
    def close(self):
        if self.process is None:
            return
        self.process.stdin.close()
        self.process.wait()
def figure_to_array(figure):
    """adapted from: https://stackoverflow.com/questions/21939658/"""
    figure.canvas.draw()
    buf = figure.canvas.tostring_rgb()
    n_cols, n_rows = figure.canvas.get_width_height()
    return np.frombuffer(buf, dtype=np.uint8).reshape(n_rows, n_cols, 3)
# Generate data for the figure
rs1 = RandomState(MT19937(SeedSequence(123456789)))
time_1 = np.round(rs1.rand(232) * NUMBER_OF_FRAMES).astype(np.int16)
time_2 = time_1 + np.round(rs1.rand(232) * (NUMBER_OF_FRAMES - time_1)).astype(np.int16)
time_3 = time_2 + np.round(rs1.rand(232) * (NUMBER_OF_FRAMES - time_2)).astype(np.int16)
loc_1_x, loc_1_y, loc_2_x, loc_2_y, loc_3_x, loc_3_y = np.round(rs1.rand(6, 232) * 100, 1)
df = pd.DataFrame({
    "loc_1_time": time_1,
    "loc_1_x": loc_1_x,
    "loc_1_y": loc_1_y,
    "loc_2_time": time_2,
    "loc_2_x": loc_2_x,
    "loc_2_y": loc_2_y,
    "loc_3_time": time_3,
    "loc_3_x": loc_3_x,
    "loc_3_y": loc_3_y,
})
"""The stack answer starts here"""
# Add extra column for disappear time
df["disappear_time"] = df["loc_3_time"] + 3
all_times = df[["loc_1_time", "loc_2_time", "loc_3_time", "disappear_time"]]
change_times = np.unique(all_times)
# Prepare ticks for plotting the figure across frames
x_values = df[["loc_1_x", "loc_2_x", "loc_3_x"]].values.flatten()
x_ticks = np.array(np.linspace(x_values.min(), x_values.max(), 6), dtype=np.uint8)
y_values = df[["loc_1_y", "loc_2_y", "loc_3_y"]].values.flatten()
y_ticks = np.array(np.round(np.linspace(y_values.min(), y_values.max(), 6)), dtype=np.uint8)
sns.set_theme(style="whitegrid")
video_writer = VideoWriter("endermen.mp4")
if 0 not in change_times:
    # Generate empty figure if no person arrives at t=0
    fig, ax = plt.subplots(figsize=RESOLUTION)
    ax.set_xticklabels(x_ticks)
    ax.set_yticklabels(y_ticks)
    ax.set_title("People movement. T=0")
    video_writer.add(figure_to_array(fig))
    loop_range = range(1, NUMBER_OF_FRAMES)
else:
    loop_range = range(NUMBER_OF_FRAMES)
palette = sns.color_palette("tab10") # Full tab10 palette; sliced below to one color per location group present
animation_data_df = pd.DataFrame(columns=["x", "y", "location", "index"])
for frame_idx in loop_range:
    if frame_idx in change_times:
        plt.close("all")
        # Get person who appears/moves/disappears
        indexes, loc_nums = np.where(all_times == frame_idx)
        loc_nums += 1
        for i, loc in zip(indexes, loc_nums):
            if loc != 4:
                x, y = df[[f"loc_{loc}_x", f"loc_{loc}_y"]].iloc[i]
            if loc == 1: # location_1
                animation_data_df = animation_data_df.append(
                    {"x": x, "y": y, "location": loc, "index": i},
                    ignore_index=True
                )
            else:
                data_index = np.where(animation_data_df["index"] == i)[0][0]
                if loc in (2, 3): # location_2 or 3
                    animation_data_df.loc[[data_index], :] = x, y, loc, i
                elif loc == 4: # Disappear
                    animation_data_df.iloc[data_index] = np.nan
        current_palette_size = np.sum(~np.isnan(np.unique(animation_data_df["location"])))
        fig, ax = plt.subplots(figsize=RESOLUTION)
        sns.scatterplot(
            x="x", y="y", hue="location", data=animation_data_df, ax=ax, palette=palette[:current_palette_size]
        )
        ax.set_xticks(x_ticks)
        ax.set_xticklabels(x_ticks)
        ax.set_yticks(y_ticks)
        ax.set_yticklabels(y_ticks)
        ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
        ax.set_title(f"People movement. T={frame_idx}")
    video_writer.add(figure_to_array(fig))
video_writer.close()
Edit: There was a bug in which location_3 wasn't removed after 3 seconds. Fixed now.
Modifying the code from this question to include only the positions you want automatically removes the old ones, because any old position that is not among the new ones is dropped. This works the same whether you animate by time, by iterations or by anything else. I have opted for iterations here since it's easier and I don't know how you are handling your dataset. The code does have one bug, though: the last remaining point (or points, if several last the same amount of time) won't disappear. That is easy to solve if you don't need to draw anything afterwards. If you do, for example when there is a gap in the data with no people before the data resumes, I haven't found a workaround.
import math
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
#The t0,t1,t2,t3 are the times (in iterations) that the position changes
#If t0 is None then the person will never be displayed
people = [
    # t0 x1 y1 t1 x2 y2 t2 x3 y3 t3
    [ 0, 1, 0.1, 1, 2, 0.2, 2, 3, 0.3, 3],
    [ 2, None, None, None, 2, 1, 3, 4, 1, 7],
    [ 2, float("NaN"), float("NaN"), float("NaN"), 2, 0.8, 4, 4, 0.8, 10],
]
fig = plt.figure()
plt.xlim(0, 5)
plt.ylim(0, 1)
graph = plt.scatter([], [])
def animate(i):
    points = []
    colors = []
    # iterate over a copy, because people is modified inside the loop
    for person in list(people):
        if person[0] is None or math.isnan(person[0]) or i < person[0]:
            continue
        # Position 1
        elif person[3] is not None and not (math.isnan(person[3])) and i <= person[3]:
            new_point = [person[1], person[2]]
            color = "b"
        # Position 2
        elif person[6] is not None and not (math.isnan(person[6])) and i <= person[6]:
            new_point = [person[4], person[5]]
            color = "r"
        # Position 3
        elif person[9] is not None and not (math.isnan(person[9])) and i <= person[9]:
            new_point = [person[7], person[8]]
            color = "r"
        else:
            people.remove(person)
            new_point = []
        if new_point != []:
            points.append(new_point)
            colors.append(color)
    if points != []:
        graph.set_offsets(points)
        graph.set_facecolors(colors)
    else:
        # You can use graph.remove() to fix the last point not disappearing but you won't be able to plot anything after that
        # graph.remove()
        pass
    return graph
ani = FuncAnimation(fig, animate, repeat=False, interval=500)
plt.show()
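One possible way to sidestep both the list mutation and the point that never disappears is to compute the visible points for each frame without touching the list at all. The sketch below is a variant under stated assumptions, not the answerer's code: visible_points and is_missing are hypothetical helpers, and the rows use the same ten-element layout as the people list.

```python
import math

# Same row layout as above: t0, x1, y1, t1, x2, y2, t2, x3, y3, t3
people = [
    [0, 1, 0.1, 1, 2, 0.2, 2, 3, 0.3, 3],
    [2, None, None, None, 2, 1, 3, 4, 1, 7],
    [2, float("nan"), float("nan"), float("nan"), 2, 0.8, 4, 4, 0.8, 10],
]


def is_missing(value):
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def visible_points(rows, i):
    """Return (points, colors) for frame i without modifying rows."""
    points, colors = [], []
    for t0, x1, y1, t1, x2, y2, t2, x3, y3, t3 in rows:
        if is_missing(t0) or i < t0:
            continue  # not yet visible
        if not is_missing(t1) and i <= t1:
            points.append((x1, y1))
            colors.append("b")
        elif not is_missing(t2) and i <= t2:
            points.append((x2, y2))
            colors.append("r")
        elif not is_missing(t3) and i <= t3:
            points.append((x3, y3))
            colors.append("r")
        # past t3: the person has left, so nothing is added
    return points, colors


print(visible_points(people, 2))
print(visible_points(people, 11))
```

Inside animate(i), the result could be passed to graph.set_offsets and graph.set_facecolors. When points is empty, handing set_offsets an empty array of shape (0, 2), for example np.empty((0, 2)), should clear the scatter without calling graph.remove(), so later frames can still draw.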

ValueError: Dimensions of labels and X must be compatible

After running the code, this happens:
ValueError: Dimensions of labels and X must be compatible
I do not quite understand what the error above means.
Honestly, I'm pretty new to Python. I was following some reference code to make a boxplot and ran into this error. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
title = "Annual Bus Population"
titlelen = len(title)
print("{:*^{titlelen}}".format(title, titlelen=titlelen+6))
print()
filename = 'annual-bus-population-by-passenger-capacity.csv'
data = np.genfromtxt(filename, dtype=["i4", "U50", "i8"], delimiter=",", names=True)
#print("Original data: " + str(data.shape))
null_rows = np.isnan(data['number'])
nonnull_values = data[null_rows==False]
#print("Filtered data: " + str(nonnull_values.shape))
labels = list(set(data['capacity']))
capacities = np.arange(0,len(labels))
capacity_number = data[['capacity','number']]
numbers = capacity_number['number']
values_nine = numbers[capacity_number ['capacity'] == '<10']
values_fifteen = numbers[capacity_number['capacity'] == '10-15']
values_twenty = numbers[capacity_number['capacity'] == '16-20']
values_twentyfive = numbers[capacity_number['capacity'] == '21-25']
values_thirty= numbers[capacity_number ['capacity'] == '21-30']
values_thirtyfive = numbers[capacity_number ['capacity'] == '31-35']
values_fourty = numbers[capacity_number ['capacity'] == '36-40']
values_fourtyfive = numbers[capacity_number ['capacity'] == '40-45']
values_fifty = numbers[capacity_number ['capacity'] == '45-50']
values_fiftyfive = numbers[capacity_number ['capacity'] == '51-55']
values_sixty = numbers[capacity_number ['capacity'] == '56-60']
values_sixtyfive = numbers[capacity_number ['capacity'] == '61-65']
values_seventy = numbers[capacity_number ['capacity'] == '66-70']
values_moreseventy = numbers[capacity_number ['capacity'] == '>70']
values_total = [values_nine,values_fifteen,values_twenty,values_twentyfive,values_thirty,values_thirtyfive,values_fourty,values_fourtyfive,values_fifty,values_fiftyfive,values_sixty,values_sixtyfive,values_seventy,values_moreseventy]
#print(values_total.shape)
#print()
plt.figure(2, figsize=(30,30))
plt.title(title,fontsize=50)
plt.ylabel('Number of passengers',fontsize=40)
plt.yticks(fontsize=30)
plt.xticks(fontsize=30,rotation='vertical')
bp_dict = plt.boxplot(values_total,labels=labels,patch_artist=True)
## change outline color, fill color and linewidth of the boxes
for box in bp_dict['boxes']:
    # change outline color
    box.set( color='#7570b3', linewidth=2)
    # change fill color
    box.set( facecolor = '#1b9e77' )
## change color and linewidth of the whiskers
for whisker in bp_dict['whiskers']:
    whisker.set(color='#7570b3', linewidth=2)
## change color and linewidth of the caps
for cap in bp_dict['caps']:
    cap.set(color='#7570b3', linewidth=2)
## change color and linewidth of the medians
for median in bp_dict['medians']:
    median.set(color='#b2df8a', linewidth=2)
## change the style of fliers and their fill
for flier in bp_dict['fliers']:
    flier.set(marker='D', color='#e7298a', alpha=0.5)
print(bp_dict.keys())
for line in bp_dict['medians']:
    # get position data for median line
    x, y = line.get_xydata()[1] # top of median line
    # overlay median value
    plt.text(x, y, '%.1f' % y,
             horizontalalignment='center',fontsize=30) # draw above, centered
fliers = []
for line in bp_dict['fliers']:
    ndarray = line.get_xydata()
    if (len(ndarray)>0):
        max_flier = ndarray[:,1].max()
        max_flier_index = ndarray[:,1].argmax()
        x = ndarray[max_flier_index,0]
        print("Flier: " + str(x) + "," + str(max_flier))
        plt.text(x,max_flier,'%.1f' % max_flier,horizontalalignment='center',fontsize=30,color='green')
plt.show()
The error was in this line:
bp_dict = plt.boxplot(values_total,labels=labels,patch_artist=True)
Dataset gotten from:
https://data.gov.sg/dataset/annual-age-bus-population-by-passenger-capacity
Any help is greatly appreciated ^^
thanks
Your error is in your labels variable. Specifically, you have extra values in it such as 15-Nov. Also, you lose the order of the labels when you use the set() function, so they come out in a random order. I'm not quite sure what you need to do to fix it tonight, but you can just remove the labels parameter from your call to plt.boxplot() to get something working. Then you can figure out labels that work.
The error is trying to say "The dimensions of the data and dimension of the labels do not match".
Good luck!
labels should be the feature names (one per column, i.e. axis=1), so that each column is drawn as its own box in the same plot.
But your labels variable is just the list of values from a single column (capacity), which is not correct.
You need to either pivot_table your dataframe, or use a boxplot function that supports grouping (pandas' DataFrame.boxplot accepts a by argument, e.g. by='capacity', which possibly suits your case), or try the seaborn library, which probably gives more options.
Try using the following for plotting for older versions.
bp_dict = plt.boxplot(values_total.transpose(),labels=labels,patch_artist=True)
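One way to keep the labels and the data in step is to build both from the same pass over the rows, so their lengths match by construction and the order is stable (unlike set()). This is only a sketch with made-up rows; group_by_label is a hypothetical helper, not something from numpy or matplotlib.

```python
def group_by_label(categories, values):
    """Group values by category, keeping first-seen category order."""
    groups = {}
    for cat, val in zip(categories, values):
        groups.setdefault(cat, []).append(val)
    labels = list(groups)  # dicts keep insertion order (Python 3.7+)
    data = [groups[label] for label in labels]
    return labels, data


# Made-up rows standing in for data['capacity'] and data['number']
categories = ['<10', '10-15', '<10', '16-20', '10-15']
numbers = [5, 7, 6, 9, 8]
labels, data = group_by_label(categories, numbers)
print(labels)
print(data)
print(len(labels) == len(data))
```

The resulting data and labels could then be passed to the plt.boxplot call from the question in place of values_total and labels. Any unexpected category in the file (such as the 15-Nov value mentioned above) gets its own box instead of causing a length mismatch.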

matplotlib widget updates the wrong data

I'm making a plot to compare band structure calculations from two different methods. This means plotting multiple lines for each set of data. I want to have a set of widgets that controls each set of data separately. The code below works if I only plot one set of data, but I can't get the widgets to work properly for two sets of data.
#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider, TextBox
#cols = ['blue', 'red', 'green', 'purple']
cols = ['#3f54bf','#c14142','#59bf3f','#b83fbf']
finam = ['wan_band.dat','wan_band.pwx.dat']
#finam = ['wan_band.dat'] # this works
lbot = len(finam)*0.09 + 0.06
fig, ax = plt.subplots()
plt.subplots_adjust(bottom=lbot)
ax.margins(x=0) # lines go to the edge of the horizontal axes
def setlines(lines, txbx1, txbx2):
''' turn lines on/off based on text box values '''
try:
mn = int(txbx1) - 1
mx = int(txbx2) - 1
for ib in range(len(lines)):
if (ib<mn) or (ib>mx):
lines[ib].set_visible(False)
else :
lines[ib].set_visible(True)
plt.draw()
except ValueError as err:
print('Invalid range')
#end def setlines(cnt, lines, txbx1, txbx2):
def alphalines(lines, valin):
''' set lines' opacity '''
maxval = int('ff',16)
maxval = hex(int(valin*maxval))[2:]
for ib in range(bcnt):
lines[ib].set_color(cols[cnt]+maxval)
plt.draw()
#end def alphalines(lines, valtxt):
lines = [0]*len(finam) # 2d list to hold Line2Ds
txbox1 = [0]*len(finam) # list of Lo Band TextBoxes
txbox2 = [0]*len(finam) # list of Hi Band TextBoxes
alslid = [0]*len(finam) # list of Line Opacity Sliders
for cnt, fnam in enumerate(finam):
    ptcnt = 0 # point count
    fid = open(fnam, 'r')
    fiit = iter(fid)
    for line in fiit:
        if line.strip() == '' :
            break
        ptcnt += 1
    fid.close()
    bandat_raw = np.loadtxt(fnam)
    bcnt = int(np.round((bandat_raw.shape[0] / (ptcnt))))
    print(ptcnt)
    print(bcnt)
    # get views of the raw data that are easier to work with
    kbandat = bandat_raw[:ptcnt,0] # k point length along path
    ebandat = bandat_raw.reshape((bcnt,ptcnt,2))[:,:,1] # band energy # k-points
    lines[cnt] = [0]*bcnt # point this list element to another list
    for ib in range(bcnt):
        #l, = plt.plot(kbandat, ebandat[ib], c=cols[cnt],lw=1.0)
        l, = ax.plot(kbandat, ebandat[ib], c=cols[cnt],lw=1.0)
        lines[cnt][ib] = l
    y0 = 0.03 + 0.07*cnt
    bxht = 0.035
    axbox1 = plt.axes([0.03, y0, 0.08, bxht]) # x0, y0, width, height
    axbox2 = plt.axes([0.13, y0, 0.08, bxht])
    txbox1[cnt] = TextBox(axbox1, '', initial=str(1))
    txbox2[cnt] = TextBox(axbox2, '', initial=str(bcnt))
    txbox1[cnt].on_submit( lambda x: setlines(lines[cnt], x, txbox2[cnt].text) )
    txbox2[cnt].on_submit( lambda x: setlines(lines[cnt], txbox1[cnt].text, x) )
    axalpha = plt.axes([0.25, y0, 0.65, bxht])
    alslid[cnt] = Slider(axalpha, '', 0.1, 1.0, valinit=1.0)
    salpha = alslid[cnt]
    alslid[cnt].on_changed( lambda x: alphalines(lines[cnt], x) )
#end for cnt, fnam in enumerate(finam):
plt.text(0.01, 1.2, 'Lo Band', transform=axbox1.transAxes)
plt.text(0.01, 1.2, 'Hi Band', transform=axbox2.transAxes)
plt.text(0.01, 1.2, 'Line Opacity', transform=axalpha.transAxes)
plt.show()
All the widgets only control the last data set plotted instead of the individual data sets I tried to associate with each widget. Here is a sample output:
Here the bottom slider should be changing the blue lines' opacity, but instead it changes the red lines' opacity. Originally the variables txbox1, txbox2, and alslid were not lists. I changed them to lists though to ensure they weren't garbage collected but it didn't change anything.
Here is the test data set1 and set2 I've been using. They should be saved as files 'wan_band.dat' and 'wan_band.pwx.dat' as per the hard coded list finam in the code.
I figured it out: the lambdas I used to partially apply the callbacks look up the loop variable only when they are called, so they were always being evaluated with the last value of the loop variable. Switching to functools.partial, which binds the value when the callback is created, fixed the issue.
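To make the difference concrete, here is a minimal sketch without any matplotlib widgets. The function report is a hypothetical stand-in for a callback such as setlines or alphalines, and the three lists are only there to compare the binding styles.

```python
from functools import partial

def report(index, value):
    """Hypothetical stand-in for a widget callback; returns what it was called with."""
    return (index, value)

late_bound = []    # plain lambdas: cnt is looked up when the callback runs
with_default = []  # lambdas with a default argument: cnt is captured per iteration
with_partial = []  # functools.partial: cnt is bound when the callback is created

for cnt in range(2):
    late_bound.append(lambda x: report(cnt, x))
    with_default.append(lambda x, cnt=cnt: report(cnt, x))
    with_partial.append(partial(report, cnt))

late = [cb(0.5) for cb in late_bound]
default = [cb(0.5) for cb in with_default]
bound = [cb(0.5) for cb in with_partial]

print(late)     # [(1, 0.5), (1, 0.5)]  every callback sees the last cnt
print(default)  # [(0, 0.5), (1, 0.5)]
print(bound)    # [(0, 0.5), (1, 0.5)]
```

In the code from the question, a default argument such as lambda x, cnt=cnt: setlines(lines[cnt], x, txbox2[cnt].text) would presumably have the same effect as functools.partial. Note that alphalines also reads the globals cnt and bcnt directly, so those values would need to be passed in as arguments as well for the slider to act on the right data set.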

Linking plots (box select, lasso etc) in Bokeh generated in a loop

I am plotting some data with Bokeh, using a for loop to iterate over the columns in my dataframe. For some reason the box select and lasso tools, which I have managed to link in plots that were created explicitly (i.e. not generated in a for loop), do not seem to work now.
Do I need to increment some bokeh function within the for loop?
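import numpy as np
import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
#example dataframe (one array of 10 random values per column)
array = {'variable': ['var1', 'var2', 'var3', 'var4'],
         'var1': np.random.rand(10),
         'var2': np.random.rand(10),
         'var3': np.random.rand(10),
         'var4': np.random.rand(10)}
cols = ['var1',
        'var2',
        'var3',
        'var4']
df = pd.DataFrame(array, columns = cols)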
w = 500
h = 400
#collect plots in a list (start with an empty)
plots = []
#iterate over the columns in the dataframe
# specify the tools in TOOLS
#add additional lines to show tolerance bands etc
for c in df[cols]:
    source = ColumnDataSource(data = dict(x = df.index, y = df[c]))
    TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select,lasso_select"
    f = figure(tools = TOOLS, width = w, plot_height = h, title = c + ' Run Chart',
               x_axis_label = 'Run ID', y_axis_label = c)
    f.line('x', 'y', source = source, name = 'data')
    f.triangle('x', 'y', source = source)
    #data mean line
    f.line(df.index, df[c].mean(), color = 'orange')
    #tolerance lines
    f.line (df.index, df[c + 'USL'][0], color = 'red', line_dash = 'dashed', line_width = 2)
    f.line (df.index, df[c + 'LSL'][0], color = 'red', line_dash = 'dashed', line_width = 2)
    #append the new plot in this loop to the existing list of plots
    plots.append(f)
#link all the x_ranges
for i in plots:
    i.x_range = plots[0].x_range
#plot
p = gridplot(plots, ncols = 2)
output_notebook()
show(p)
I expect to produce plots which are linked and allow me to box or lasso select some points on one chart and for them to be highlighted on the others. However, the plots only let me select on one plot with no linked behaviour.
SOLUTION
This may seem like a bit of a noob problem, but I am sure someone else will come across it, so here is the answer!
Bokeh works by referring to a data source object (the ColumnDataSource). You can pass your whole dataframe into it and then refer to explicit x and y column names when creating the glyphs (e.g. my f.line, f.triangle etc).
So I moved the 'source' outside of the loop to prevent it being reset on each iteration and just passed my df to it. Within the loop, I then use the loop's column name plus a descriptor string (USL, LSL, mean) for the y values and 'index' for the x values.
I add a box select tool explicitly, with a 'name' defined, so that the box only selects the glyphs I want it to select (i.e. I don't want it to select my constant-value mean and spec limit lines).
Also, be careful if you want to output to an HTML file or similar: you will probably need to suppress your in-notebook output, as Bokeh does not like having duplicate plots open. I have not included my HTML output solution here.
As for adding a linked lasso tool to plots generated in a loop, I could only find an explicit box select tool generator, so I am not sure whether this is possible.
So here it is:
from bokeh.models import BoxSelectTool
#keep the source out of the loop to stop it resetting every time
Source = ColumnDataSource(df)
#collect plots in a list (start with an empty one)
plots = []
for c in cols:
    TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
    f = figure(tools = TOOLS, width = w, plot_height = h, title = c + ' Run Chart',
               x_axis_label = 'Run ID', y_axis_label = c)
    f.line(x = 'index', y = c , source = Source, name = 'data')
    f.triangle(x = 'index', y = c, source = Source, name = 'data')
    #data mean line
    f.line(x = 'index', y = c + '_mean', source = Source, color = 'orange')
    #tolerance lines
    f.line (x = 'index', y = c + 'USL', color = 'red', line_dash = 'dashed', line_width = 2, source = Source)
    f.line (x = 'index', y = c + 'LSL', color = 'red', line_dash = 'dashed', line_width = 2, source = Source)
    # Add BoxSelect tool - this allows points on one plot to be highlighted on all linked plots. Note only the delta info
    # is linked, using name='data'. Again names can be used to ensure only the relevant glyphs are highlighted.
    bxselect1 = BoxSelectTool(renderers=f.select(name='data'))
    f.add_tools(bxselect1)
    plots.append(f)
#tie the x_ranges together so that panning is linked between plots
for i in plots:
    i.x_range = plots[0].x_range
forp = gridplot(plots, ncols = 2)
show(forp)
