How to plot a line on each row of a dataframe - python

This is hard to describe in words, so here is a picture to illustrate:
As the image shows, I want to plot a line on each row separately, based on that row's values in the data frame. Is this possible with Python libraries?

Here's an example to get you started: it uses table to plot the dataframe and overplots the stacked lines. The line for each row is shifted by ymax, the maximum value in the dataframe, to prevent overlapping.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# make sample data
np.random.seed(0)
df = pd.DataFrame(np.random.rand(41,5))
df.index = [f'Row {i}' for i in df.index]
fig, ax = plt.subplots(figsize=(4,10))
ax.set_axis_off()
# plot data as table
plt.matplotlib.table.table(ax, df.applymap('{:.1f}'.format).values.tolist(), rowLabels=df.index, bbox=[0,0,1,1])
# plot curve over table
ymax = df.max().max()
ax.set_ylim(0, ymax * len(df))
ax.plot((df.to_numpy() + ((len(df) - 1 - df.reset_index(drop=True).index.to_numpy()) * ymax)[:, None]).T, color='C0')
To use alternating colors, you can set the color cycler:
from cycler import cycler
# ...
ax.set_prop_cycle(cycler(color='rg'))
ax.plot((df.to_numpy() + ((len(df) - 1 - df.reset_index(drop=True).index.to_numpy()) * ymax)[:, None]).T)
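The offset expression is dense when written on one line. The following sketch breaks it into steps and checks that each row ends up in its own horizontal band. The small 3 x 4 array is made up for illustration and is not the sample data above.

```python
import numpy as np

# Hypothetical sample values; any non-negative 2-D array works.
data = np.array([[0.2, 0.9, 0.4, 0.1],
                 [0.5, 0.3, 0.8, 0.6],
                 [0.7, 0.1, 0.2, 1.0]])

n_rows = data.shape[0]
ymax = data.max()

# Row 0 is drawn at the top of the table, so it gets the largest offset.
offsets = (n_rows - 1 - np.arange(n_rows)) * ymax

# Broadcasting: offsets[:, None] has shape (n_rows, 1) and is added to every column.
shifted = data + offsets[:, None]

# Each row i now lies inside the band [offsets[i], offsets[i] + ymax].
for i in range(n_rows):
    assert offsets[i] <= shifted[i].min() and shifted[i].max() <= offsets[i] + ymax

# ax.plot draws one line per column of its input, hence the transpose.
lines_input = shifted.T
```

Passing lines_input to ax.plot would then draw one line per original row, as in the answer.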

Related

How to plot heatmap onto mplsoccer pitch?

Wondering how I can plot a seaborn plot onto a different matplotlib plot. Currently I have two plots (one a heatmap, the other a soccer pitch), but when I plot the heatmap onto the pitch, I get the results below. (Plotting the pitch onto the heatmap isn't pretty either.) Any ideas how to fix it?
Note: Plots don't need a colorbar and the grid structure isn't required either. Just care about the heatmap covering the entire space of the pitch. Thanks!
import pandas as pd
import numpy as np
from mplsoccer import Pitch
import seaborn as sns
nmf_shot_W = pd.read_csv('https://raw.githubusercontent.com/lucas-nelson-uiuc/datasets/main/nmf_show_W.csv').iloc[:, 1:]
nmf_shot_ThierryHenry = pd.read_csv('https://raw.githubusercontent.com/lucas-nelson-uiuc/datasets/main/nmf_show_Hth.csv')['Thierry Henry']
pitch = Pitch(pitch_type='statsbomb', line_zorder=2,
pitch_color='#22312b', line_color='#efefef')
dfdfdf = np.array(np.matmul(nmf_shot_W, nmf_shot_ThierryHenry)).reshape((24,25))
g_ax = sns.heatmap(dfdfdf)
pitch.draw(ax=g_ax)
Current output:
Desired output:
Use the built-in pitch.heatmap:
pitch.heatmap expects a stats dictionary of binned data, bin mesh, and bin centers:
stats (dict) – The keys are statistic (the calculated statistic), x_grid and y_grid (the bin's edges), and cx and cy (the bin centers).
In the mplsoccer heatmap demos, they construct this stats object using pitch.bin_statistic because they have raw data. However, you already have binned data ("calculated statistic"), so reconstruct the stats object manually by building the mesh and centers:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mplsoccer import Pitch
nmf_shot_W = pd.read_csv('71878281/nmf_show_W.csv', index_col=0)
nmf_shot_ThierryHenry = pd.read_csv('71878281/nmf_show_Hth.csv')['Thierry Henry']
statistic = np.dot(nmf_shot_W, nmf_shot_ThierryHenry.to_numpy()).reshape((24, 25))
# construct stats object from binned data, bin mesh, and bin centers
y, x = statistic.shape
x_grid = np.linspace(0, 120, x + 1)
y_grid = np.linspace(0, 80, y + 1)
cx = x_grid[:-1] + 0.5 * (x_grid[1] - x_grid[0])
cy = y_grid[:-1] + 0.5 * (y_grid[1] - y_grid[0])
stats = dict(statistic=statistic, x_grid=x_grid, y_grid=y_grid, cx=cx, cy=cy)
# use pitch.draw and pitch.heatmap as per mplsoccer demo
pitch = Pitch(pitch_type='statsbomb', line_zorder=2, pitch_color='#22312b', line_color='#efefef')
fig, ax = pitch.draw(figsize=(6.6, 4.125))
pcm = pitch.heatmap(stats, ax=ax, cmap='plasma')
cbar = fig.colorbar(pcm, ax=ax, shrink=0.6)
cbar.outline.set_edgecolor('#efefef')
cbar.ax.yaxis.set_tick_params(color='#efefef')
plt.setp(plt.getp(cbar.ax.axes, 'yticklabels'), color='#efefef')
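As a quick sanity check on the mesh construction, the sketch below rebuilds the edges and centers for the same 24 x 25 grid over the 120 x 80 coordinate range used above. The array of zeros is only a placeholder for the real statistic, and the midpoint formula is an equivalent alternative to the one in the answer.

```python
import numpy as np

# Placeholder for the binned statistic (24 rows x 25 columns, as in the answer).
statistic = np.zeros((24, 25))

n_rows, n_cols = statistic.shape
x_grid = np.linspace(0, 120, n_cols + 1)  # 26 edges along x
y_grid = np.linspace(0, 80, n_rows + 1)   # 25 edges along y

# Midpoints of consecutive edges; equivalent to edge + half a bin width.
cx = 0.5 * (x_grid[:-1] + x_grid[1:])
cy = 0.5 * (y_grid[:-1] + y_grid[1:])

# Same result as the formulation used in the answer.
assert np.allclose(cx, x_grid[:-1] + 0.5 * (x_grid[1] - x_grid[0]))
assert cx.shape == (n_cols,) and cy.shape == (n_rows,)
```

There is always one more edge than there are bins along each axis, which is why the grids use x + 1 and y + 1 points.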

Reading specific rows and plotting using Matplotlib

I have an Excel sheet that has a column of image frames. These frame numbers are not uniformly distributed, e.g. frame 1 may have entries from row 1 to 20 and frame 2 from 21 to 25 and so on. I want to read this data from an Excel sheet that has x and y coordinates for each frame and plot these coordinates in a scatter plot using matplotlib. Here's my code; frame numbers are identified as image index.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
from matplotlib.pyplot import figure
df_xlsx = pd.read_excel('X10.xlsx')
temp = df_xlsx['Image index'][0]
i = 0; #number of the row
xList = []
yList = []
dt = df_xlsx.loc[df_xlsx['Image index'] == 19]
xList = np.array(dt['X position'])
yList = np.array(dt['Y position'])
rList = np.array(dt['Diameter'])
figure(figsize=(10.24,7.68), dpi=100)
fig, ax = plt.subplots()
plt.xlim([0,1024])
plt.ylim([0,768])
plt.scatter(xList, yList, color ='r')
plt.axis('off')
plt.gcf().set_size_inches((10.24,7.68))
for i in range(len(xList)):
    circle1 = plt.Circle((xList[i], yList[i]), rList[i], color='r')
    ax.add_artist(circle1)
plt.tight_layout(pad=0)
plt.savefig('f=19.png',dpi=100)
plt.show()
Excel sheet example
The problem is that every time I need to enter the image index and then save the plot. Can this be done in a loop, so that a separate plot is generated for each frame number (image index)? This would save me a lot of time, as I have lots of frames and Excel sheets. I am new to Python.
You can use groupby to step over the image index:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('X10.xlsx')
for idx, group in df.groupby('Image index'):
    fig, ax = plt.subplots(figsize=(10.24, 7.68), dpi=100)
    diameter = 1 * group['Diameter']**2  # marker area; adjust the factor to taste
    ax.scatter(group['X position'], group['Y position'], s=diameter)
    ax.set_xlim(0, 1024)
    ax.set_ylim(0, 768)
    ax.axis('off')
    plt.tight_layout(pad=0)
    plt.savefig(f'Image_index_{idx}.png', dpi=100)
    plt.close(fig)  # release the figure so memory doesn't grow with the number of frames
Some notes on the other changes I made:
You don't need to cast the DataFrame columns to arrays or lists.
You can pass a size parameter s to plt.scatter() to draw the circles; you will need to scale the values to suit your plot, e.g. by multiplying by some factor other than 1. Note that matplotlib interprets s as the area of the marker (in points squared), not the diameter.
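To make the scaling remark concrete, here is a rough sketch of converting a diameter given in data units into an s value. It relies on s being measured in points squared (1 point = 1/72 inch), so a circular marker with s = d**2 is about d points across. The helper name diameter_to_marker_area and the assumption that the axis spans the full figure width are mine, not part of the answer.

```python
def diameter_to_marker_area(diameter, data_span, axis_inches):
    """Return a scatter `s` value (points**2) for a marker `diameter` in data units.

    data_span   -- length of the axis in data units (e.g. 1024 for xlim 0..1024)
    axis_inches -- physical length of that axis in inches
                   (assumed here: the axis fills the whole figure width)
    """
    points_per_data_unit = axis_inches * 72.0 / data_span
    diameter_points = diameter * points_per_data_unit
    return diameter_points ** 2

# A 10-unit-wide particle on a 1024-unit axis drawn 10.24 inches wide.
s = diameter_to_marker_area(10, data_span=1024, axis_inches=10.24)
```

The result is only approximate in practice, since margins and the aspect ratio change how many inches the axis really occupies.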

Annotating scatterplot points with DF column text Matplotlib

I'm fairly new to Python and I'm struggling to annotate plots at the moment.
I've come from R, so I'm used to being able to annotate scatterplot points with minimal code.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
url = ('https://fbref.com/en/share/nXtrf')
df = pd.read_html(url)[0]
df = df[['Unnamed: 1_level_0', 'Unnamed: 2_level_0', 'Play', 'Perf']].copy()
df.columns = df.columns.droplevel()
df = df[['Player','Squad','Min','SoTA','Saves']]
df = df.drop([25])
df['Min'] = pd.to_numeric(df['Min'])
df['SoTA'] = pd.to_numeric(df['SoTA'])
df['Saves'] = pd.to_numeric(df['Saves'])
df['Min'] = df[df['Min'] > 1600]['Min']
df = df.dropna()
df.plot(x = 'Saves', y = 'SoTA', kind = "scatter")
I've tried numerous ways to annotate this plot. I'd like each point to be annotated with the corresponding value from the 'Player' column.
I've tried using a label_point function that I found while looking for a workaround, but I keep getting KeyError: 0 with most approaches I try.
Any assistance would be great. Thanks.
You could loop through both columns and add a text for each entry. Note that you need to save the ax returned by df.plot(...).
ax = df.plot(x='Saves', y='SoTA', kind="scatter")
for x, y, player in zip(df['Saves'], df['SoTA'], df['Player']):
    ax.text(x, y, f'{player}', ha='left', va='bottom')
xmin, xmax = ax.get_xlim()
ax.set_xlim(xmin, xmax + 0.15 * (xmax - xmin)) # some more margin to fit the texts
An alternative is to use the mplcursors library to show an annotation while hovering (or after a click):
import mplcursors
mplcursors.cursor(hover=True)
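Another option for static labels is ax.annotate, which can offset each label by a fixed number of points instead of data units. Below is a minimal sketch; the player names and numbers are placeholders, not fbref data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data standing in for the cleaned dataframe.
df = pd.DataFrame({
    "Player": ["Keeper A", "Keeper B", "Keeper C"],
    "Saves": [80, 95, 110],
    "SoTA": [120, 130, 150],
})

ax = df.plot(x="Saves", y="SoTA", kind="scatter")

# itertuples avoids label-based lookups such as df["Player"][0], which can
# raise KeyError once rows have been dropped and the index no longer contains 0.
for row in df.itertuples(index=False):
    ax.annotate(row.Player, xy=(row.Saves, row.SoTA),
                xytext=(4, 4), textcoords="offset points")
```

Rows removed by drop and dropna leave gaps in the index, which is one likely cause of the KeyError: 0 mentioned in the question.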

How to change what the axis of a plot is based on? (Python, Matplotlib)

I want to create a graph of 2 * height (which is the meter values in the index) versus the time squared (which are the decimal values in the columns). How can I go about doing this? (In matplotlib)
For clarity, I want the y-axis to be 2 * index values, and the x-axis to be the times squared from within the columns. I would like this to be a series of line graphs
It should end up looking something like this:
In your comment you say you use df1.plot() to draw lines. df.plot() uses the dataframe index as x values by default. You want the y-axis to be 2 * index values and the x-axis to be the squared times from the columns. This requires transforming the dataframe values, so I suggest using ax.plot() for more control.
Here is a program that uses numpy.linalg.lstsq, which performs a least-squares fit, to find the best-fit line through the given points.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from io import StringIO
TESTDATA = StringIO("""Height Trial:1 Trial:2 Trial:3 Trial:4 Trial:5 Trial:6 Trial:7
1.029 0.4667 0.4616 0.4569 0.4579 0.4653 0.4578 0.4484
1.095 0.4752 0.4773 0.4721 0.4738 0.4713 0.4745 0.4663
1.168 0.4836 0.4834 0.4873 0.4890 0.4890 0.4904 0.4902
1.315 0.5139 0.5117 0.5161 0.5108 0.5224 0.5129 0.5187
1.540 0.5644 0.5677 0.5804 0.5535 0.5636 0.5605 0.5609
1.807 0.6051 0.6124 0.6014 0.6035 0.5977 0.6012 0.6209
""")
df = pd.read_csv(TESTDATA, sep=r'\s+')  # whitespace-delimited columns
df.set_index(['Height'], inplace=True)
fig, ax = plt.subplots()
for column in df:
    x = df[column]**2  # fall time squared
    y = df.index*2  # twice the height
    A = np.vstack([x, np.ones(len(x))]).T
    # least-squares fit of y = k*x + b
    k, b = np.linalg.lstsq(A, y, rcond=None)[0]
    line = ax.plot(x, y, 'o')
    ax.plot(x, k*x+b, label=f'y={k:.5f}x+{b:.5f}', color=line[0].get_color(), linestyle='dashed')
plt.legend()
plt.xlabel('Fall time, squared (s²)')
plt.ylabel('Twice the height (m)')
plt.title('Measurement of Acceleration due to Gravity on Earth')
plt.show()
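For a straight-line fit, np.polyfit with degree 1 should give the same slope and intercept as the lstsq call. The sketch below checks this on the Trial:1 column of the data above. Assuming free fall from rest, 2h = g t², so the slope is an estimate of g in m/s².

```python
import numpy as np

# Trial:1 fall times (s) and heights (m) copied from the data above.
times = np.array([0.4667, 0.4752, 0.4836, 0.5139, 0.5644, 0.6051])
heights = np.array([1.029, 1.095, 1.168, 1.315, 1.540, 1.807])

x = times ** 2   # fall time squared
y = 2 * heights  # twice the height

# Least-squares fit via the design matrix [x, 1], as in the answer.
A = np.vstack([x, np.ones(len(x))]).T
k, b = np.linalg.lstsq(A, y, rcond=None)[0]

# Same fit via polyfit (degree 1 returns the slope first, then the intercept).
k2, b2 = np.polyfit(x, y, 1)

assert np.allclose([k, b], [k2, b2])
print(f"slope (estimate of g): {k:.2f} m/s^2, intercept: {b:.2f} m")
```

polyfit is shorter for simple polynomial fits, while the lstsq form extends more naturally to other design matrices.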
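A simpler alternative is to call plt.plot directly with the x values and the y values:
import matplotlib.pyplot as plt
# Trial:1 fall times (s) and heights (m) from the data above
times = [0.4667, 0.4752, 0.4836, 0.5139, 0.5644, 0.6051]
heights = [1.029, 1.095, 1.168, 1.315, 1.540, 1.807]
times_squared = [t**2 for t in times]
twice_height = [2 * h for h in heights]
plt.plot(times_squared, twice_height, '--', color='tab:blue')  # any matplotlib color works
# Label the axes and the plot
plt.xlabel('Name_x_axis')
plt.ylabel('Name_y_axis')
plt.title('Plot_name')
# Show the plot
plt.show()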

Clustered barchart in matplotlib?

How do I plot a barchart similar to
Clustered bar plot in gnuplot using python matplotlib?
date|name|empid|app|subapp|hours
20140101|A|0001|IIC|I1|2.5
20140101|A|0001|IIC|I2|3
20140101|A|0001|IIC|I3|4
20140101|A|0001|CAR|C1|2.5
20140101|A|0001|CAR|C2|3
20140101|A|0001|CAR|C3|2
20140101|A|0001|CAR|C4|2
Trying to plot the subapp hours by app for the same person. Couldn't see an example in the demo pages of matplotlib.
EDIT: None of the examples cited below seem to work for unequal # of bars for each category as above.
The cited examples don't handle an unequal number of bars per category, but another approach works. Here is an example.
Note: I use pandas to manipulate your data; if you don't know it, you should give it a try: http://pandas.pydata.org/
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
df = pd.read_table("data.csv",sep="|")
grouped = df.groupby('app')['hours']
colors = "rgbcmyk"
fig, ax = plt.subplots()
initial_gap = 0.1
start = initial_gap
width = 1.0
gap = 0.05
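for app, group in grouped:
    size = group.shape[0]
    # left edges of the bars in this cluster
    ind = np.linspace(start, start + width, size + 1)[:-1]
    w = width / size  # bar width; also works for a single-bar cluster
    start = start + width + gap
    # align='edge' because ind holds left edges (the default is 'center')
    plt.bar(ind, group, w, color=list(colors[:size]), align='edge')
tick_loc = (np.arange(len(grouped)) * (width+gap)) + initial_gap + width/2
# set the tick positions before the labels so they line up
ax.xaxis.set_major_locator(mtick.FixedLocator(tick_loc))
ax.set_xticklabels([app for app, _ in grouped])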
plt.show()
The file data.csv contains:
date|name|empid|app|subapp|hours
20140101|A|0001|IIC|I1|2.5
20140101|A|0001|IIC|I2|3
20140101|A|0001|IIC|I3|4
20140101|A|0001|CAR|C1|2.5
20140101|A|0001|CAR|C2|3
20140101|A|0001|CAR|C3|2
20140101|A|0001|CAR|C4|2
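The position arithmetic in the answer's loop is easier to follow in isolation. The sketch below computes the left edges, bar widths and tick locations for clusters of unequal size, using the same width and gap conventions as the answer. The helper name cluster_positions is hypothetical.

```python
import numpy as np

def cluster_positions(sizes, width=1.0, gap=0.05, initial_gap=0.1):
    """Return (left_edges, bar_widths, tick_locations) for clusters of the given sizes."""
    left_edges, bar_widths = [], []
    start = initial_gap
    for size in sizes:
        # Split the cluster's span [start, start + width) into `size` equal bars.
        edges = np.linspace(start, start + width, size + 1)[:-1]
        left_edges.append(edges)
        bar_widths.append(width / size)
        start += width + gap
    # One tick per cluster, centered under it.
    ticks = np.arange(len(sizes)) * (width + gap) + initial_gap + width / 2
    return left_edges, bar_widths, ticks

# In the sample data, CAR has 4 sub-apps and IIC has 3.
edges, widths, ticks = cluster_positions([4, 3])
```

Every cluster occupies the same total width, so bars in smaller clusters are wider. Whether that is preferable to equal bar widths is a design choice.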
