Highlight a single point with a marker in lineplot - python

I would like to highlight a single point on my line plot using a marker. So far I have managed to create the plot and place the highlight where I want it.
The problem is that I have 4 different lines (4 different categorical attributes), and the marker is placed on every single line, as in the following image:
I would like to place the marker only on the 2020 line (the purple one). This is my code so far:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker  # the code below refers to this module as `ticker`
import numpy as np
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(15,10))
gs0 = gridspec.GridSpec(2,2, figure=fig, hspace=0.2)
ax1 = fig.add_subplot(gs0[0,:]) # lineplot
ax2 = fig.add_subplot(gs0[1,0]) #Used for another plot not shown here
ax3 = fig.add_subplot(gs0[1,1]) #Used for another plot not shown here
flatui = ["#636EFA", "#EF553B", "#00CC96", "#AB63FA"]
sns.lineplot(ax=ax1,x="number of weeks", y="avg streams", hue="year", data=df, palette=flatui, marker = 'o', markersize=20, fillstyle='none', markeredgewidth=1.5, markeredgecolor='black', markevery=[5])
ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.0f}'.format(x/1000) + 'K'))
ax1.set(title='Streams trend')
ax1.xaxis.set_major_locator(ticker.MultipleLocator(2))
I used the markevery argument to place a marker at position 5. Is there a way to also specify which line/category the marker is placed on?
EDIT: This is my dataframe:
avg streams date year number of weeks
0 145502.475 01-06 2017 0
1 158424.445 01-13 2017 1
2 166912.255 01-20 2017 2
3 169132.215 01-27 2017 3
4 181889.905 02-03 2017 4
... ... ... ... ...
181 760505.945 06-26 2020 25
182 713891.695 07-03 2020 26
183 700764.875 07-10 2020 27
184 753817.945 07-17 2020 28
185 717685.125 07-24 2020 29
186 rows × 4 columns

markevery is a Line2D property. sns.lineplot doesn't return the lines so you need to get the line you want to annotate from the Axes. Remove all the marker parameters from the lineplot call and add ...
lines = ax1.get_lines()
If the 2020 line/data is the fourth in the series,
line = lines[3]
line.set_marker('o')
line.set_markersize(20)
line.set_markevery([5])
line.set_fillstyle('none')
line.set_markeredgewidth(1.5)
line.set_markeredgecolor('black')
# or
props = {'marker':'o','markersize':20, 'fillstyle':'none','markeredgewidth':1.5,
'markeredgecolor':'black','markevery': [5]}
line.set(**props)
Another option, inspired by Quang Hoang's comment, would be to draw a circle at the point, deriving its coordinates from the DataFrame.
x = 5 # your spec
wk = df['number of weeks']==5
yr = df['year']==2020
s = df[wk & yr]
y = s['avg streams'].to_numpy()
# or
y = df.loc[(df['year']==2020) & (df['number of weeks']==5),'avg streams'].to_numpy()
ax1.plot(x,y, 'ko', markersize=20, fillstyle='none', markeredgewidth=1.5)
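The first approach can be sketched in a self-contained form. This is only an illustration under stated assumptions: it uses plain Matplotlib with made-up data instead of seaborn and the real DataFrame, and it finds the 2020 line by its label, which works here only because the labels are set explicitly. Lines created by seaborn may not carry the year as their label, in which case indexing ax1.get_lines() as shown above is the safer route.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

weeks = np.arange(30)
fig, ax = plt.subplots()

# Made-up data: one straight line per year, each labelled with its year
for i, year in enumerate([2017, 2018, 2019, 2020]):
    ax.plot(weeks, 100_000 * (i + 1) + 1_000 * weeks, label=str(year))

# Pick the line to highlight by label instead of by position
target = next(line for line in ax.get_lines() if line.get_label() == "2020")

# Apply the marker properties to that single Line2D only
target.set(marker="o", markersize=20, fillstyle="none",
           markeredgewidth=1.5, markeredgecolor="black", markevery=[5])

others = [line for line in ax.get_lines() if line is not target]
```

Because markevery is set on one Line2D object, the remaining three lines keep their default (no) marker.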

Related

How to show last row of Pandas DataFrame in box plot

Random data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(np.random.normal(size=(20,4)))
data
0 1 2 3
0 -0.710006 -0.748083 -1.261515 0.048941
1 0.856541 0.533073 0.649113 -0.236297
2 -0.091005 -0.244658 -2.194779 0.632878
3 -0.059058 0.807661 -0.418446 -0.295255
4 -0.103701 0.775622 0.258412 0.024411
5 -0.447976 -0.034419 -1.521598 -0.903301
6 1.451105 0.549661 -1.655751 -0.147499
7 1.479374 -1.475347 0.665726 0.236611
8 -1.427979 -1.812916 0.522802 0.006066
9 0.198515 1.203476 -0.475389 -1.721707
10 0.286255 0.564450 0.590050 -0.657811
11 -1.076161 1.820218 -0.315127 -0.848114
12 0.061848 0.303502 0.978169 0.024630
13 -0.307827 -1.047835 0.547052 -0.647217
14 0.679214 0.734134 0.158803 -0.334951
15 0.469675 1.043391 -1.449727 1.335354
16 -0.483831 -0.988185 0.264027 -0.831833
17 -2.013968 -0.200699 1.076526 1.275300
18 -0.199473 -1.630597 -1.697146 -0.177458
19 1.245289 0.132349 1.054312 -0.082550
data.boxplot(vert= False, figsize = (15,10))
I want to add red dots to the box plot indicating the last value (bottom) in each column. For example (red dots I've edited in are not in their exact position, but this gives you a general idea):
Thank you.
You could just add a scatter plot on top of the boxplot.
For the provided example, it looks like this:
fig, ax = plt.subplots(figsize=(8,5))
data.boxplot(vert=False, patch_artist=True, ax=ax, zorder=1)  # `data` is the DataFrame from the question
lastrow = data.iloc[-1,:]
print(lastrow)
ax.scatter(x=lastrow, y=[*range(1,len(lastrow)+1)], color='r', zorder=2)
# for displaying the values of the red points:
for i, val in enumerate(lastrow,1):
    ax.annotate(text=f"{val:.2f}", xy=(val,i+0.1))

How to change colormap in joypy plot?

I have a dataframe which looks like this:
Team Minute Type
148 12 1
148 22 1
143 27 1
148 29 1
143 32 1
143 32 1
I created a joyplot using the Python library joypy
fig, axes = joypy.joyplot(df, by="Team", column="Minute", figsize =(10,16), x_range = [0,94], linewidth = 1, colormap=plt.cm.viridis)
Which gave me this plot:
All Good.
However, the colourmap is meaningless now so I am trying to color the plots according to a second dataframe - which is the sum of Type for all the teams.
To do that, I created a norm, and a colourmap using these lines:
norm = plt.Normalize(group_df["Type"].min(), group_df["Type"].max())
cmap = plt.cm.viridis
sm = matplotlib.cm.ScalarMappable(cmap=cmap, norm=norm)
ar = np.array(group_df["Type"])
Cm = cmap(norm(ar))
sm.set_array([])
Here's where the problem arose as I can't figure out how to change the color of the joyplots. I tried a couple of approaches:
I tried to pass this Cm as the colormap argument. However, that raised an error: TypeError: 'numpy.ndarray' object is not callable
I tried to use a for loop over the axes and Cm:
for col, ax in zip(Cm, axes):
    ax.set_facecolor(col)
    #ax.patch.set_facecolor(col) ##Also tried this; didn't change anything
How can I get greater control over the colours of the joyplot and change them around? Any help would be appreciated.
MCVE
Sample of the csv file I'm reading in(Actual shape of dataframe is (4453,2)):
Team Minute
0 148 5
1 148 5
2 148 11
3 148 11
4 148 12
5 148 22
6 143 27
My code:
df = pd.read_csv(r"path")
##getting the sum for every team - total of 20 teams
group_df = df.groupby(["Team"]).size().to_frame("Count").reset_index()
df["Minute"] = pd.to_numeric(df["Minute"])
##Trying to create a colormap
norm = plt.Normalize(group_df["Count"].min(), group_df["Count"].max())
cmap = plt.cm.viridis
sm = matplotlib.cm.ScalarMappable(cmap=cmap, norm=norm)
ar = np.array(group_df["Count"])
Cm = cmap(norm(ar))
sm.set_array([])
fig, axes = joypy.joyplot(df, by="Team", column="Minute", figsize =(10,16), x_range = [0,94], colormap = plt.cm.viridis)
I want to color every subplot in the plot by the total count of the team from the group_df["Count"] values. Currently, the colormap is just uniform and not according to the total value. The picture above is what's produced.
joypy fills the colors of the KDE curves sequentially from a colormap. So in order to have the colors match to a third variable you can supply a colormap which contains the colors in the order you need. This can be done using a ListedColormap.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(21)
import pandas as pd
import joypy
df = pd.DataFrame({"Team" : np.random.choice([143,148,159,167], size=200),
"Minute" : np.random.randint(0,100, size=200)})
##getting the count for every team (4 teams in this random example)
group_df = df.groupby(["Team"]).size().to_frame("Count").reset_index()
print(group_df)
##Trying to create a colormap
norm = plt.Normalize(group_df["Count"].min(), group_df["Count"].max())
ar = np.array(group_df["Count"])
original_cmap = plt.cm.viridis
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(ar)))
sm = matplotlib.cm.ScalarMappable(cmap=original_cmap, norm=norm)
sm.set_array([])
fig, axes = joypy.joyplot(df, by="Team", column="Minute", x_range = [0,94], colormap = cmap)
fig.colorbar(sm, ax=axes, label="Count")
plt.show()
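To see why the ListedColormap trick works without installing joypy, the colormap can be sampled by hand. The sketch below assumes that joypy asks the colormap for the i-th of n curves at the position i/n; that sampling rule is an assumption about joypy's internals, not something stated above, and the counts are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

# Made-up per-team counts, in the order the teams would be plotted
counts = np.array([55, 48, 40, 57])

norm = plt.Normalize(counts.min(), counts.max())
colors = plt.cm.viridis(norm(counts))  # one RGBA row per team
cmap = ListedColormap(colors)

# Assumed sampling rule: curve i of n is coloured with cmap(i / n)
n = len(counts)
sampled = np.array([cmap(i / n) for i in range(n)])
```

Under that assumption each curve receives exactly the colour computed from its own count, because a ListedColormap with n entries maps the position i/n to entry i (exact for n = 4; other values of n can be affected by floating-point rounding).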

Final plot in a series of matplotlib subplots has increased y tick label padding

I have a Pandas dataframe that contains columns representing year, month within year and a binary outcome (0/1). I want to plot a column of barcharts with one barchart per year. I've used the subplots() function in matplotlib.pyplot with sharex = True and sharey = True. The graphs look fine except the padding between the y-ticks and the y-tick labels is different on the final (bottom) graph.
An example dataframe can be created as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate dataframe containing dates over several years and a random binary outcome
tempDF = pd.DataFrame()
tempDF['date'] = pd.date_range(start = pd.to_datetime('2014-01-01'),end = pd.to_datetime('2017-12-31'))
tempDF['case'] = np.random.choice([0,1],size = [len(tempDF.index)],p = [0.9,0.1])
# Create a dataframe that summarises proportion of cases per calendar month
tempGroupbyCalendarMonthDF = tempDF.groupby([tempDF['date'].dt.year,tempDF['date'].dt.month]).agg({'case': sum,
'date': 'count'})
tempGroupbyCalendarMonthDF.index.names = ['year','month']
tempGroupbyCalendarMonthDF = tempGroupbyCalendarMonthDF.reset_index()
# Rename columns to something more meaningful
tempGroupbyCalendarMonthDF = tempGroupbyCalendarMonthDF.rename(columns = {'case': 'numberCases',
'date': 'monthlyTotal'})
# Calculate percentage positive cases per month
tempGroupbyCalendarMonthDF['percentCases'] = (tempGroupbyCalendarMonthDF['numberCases']/tempGroupbyCalendarMonthDF['monthlyTotal'])*100
The final dataframe looks something like:
year month monthlyTotal numberCases percentCases
0 2014 1 31 5 16.129032
1 2014 2 28 5 17.857143
2 2014 3 31 3 9.677419
3 2014 4 30 1 3.333333
4 2014 5 31 4 12.903226
.. ... ... ... ... ...
43 2017 8 31 2 6.451613
44 2017 9 30 2 6.666667
45 2017 10 31 3 9.677419
46 2017 11 30 2 6.666667
47 2017 12 31 1 3.225806
Then the plots are produced as shown below. The subplots() function is used to return an array of axes. The code steps through each axis and plots the values. The x-axis ticks and labels are only displayed on the final (bottom) plot. Finally, to get a common y-axis label, an additional subplot is added that covers all the bar graphs, with all of its axes and labels (except the y-axis label) hidden.
# Calculate minimum and maximum years in dataset
minYear = tempDF['date'].min().year
maxYear = tempDF['date'].max().year
# Set a few parameters
barWidth = 0.80
labelPositionX = 0.872
labelPositionY = 0.60
numberSubplots = maxYear - minYear + 1
fig, axArr = plt.subplots(numberSubplots,figsize = [8,10],sharex = True,sharey = True)
# Keep track of which year to plot, starting with first year in dataset.
currYear = minYear
# Step through each subplot
for ax in axArr:
    # Plot the data
    rects = ax.bar(tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'month'],
                   tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'percentCases'],
                   width = barWidth)
    # Format the axes
    ax.set_xlim([0.8,13])
    ax.set_ylim([0,40])
    ax.grid(True)
    ax.tick_params(axis = 'both',
                   left = 'on',
                   bottom = 'off',
                   top = 'off',
                   right = 'off',
                   direction = 'out',
                   length = 4,
                   width = 2,
                   labelsize = 14)
    # Turn on the x-axis ticks and labels for final plot only
    if currYear == maxYear:
        ax.tick_params(bottom = 'on')
        xtickPositions = [1,2,3,4,5,6,7,8,9,10,11,12]
        ax.set_xticks([x + barWidth/2 for x in xtickPositions])
        ax.set_xticklabels(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
    ytickPositions = [10,20,30]
    ax.set_yticks(ytickPositions)
    # Add label (in this case, the year) to each subplot
    # (The transform = ax.transAxes makes positions relative to current axis.)
    ax.text(labelPositionX, labelPositionY,currYear,
            horizontalalignment = 'center',
            verticalalignment = 'center',
            transform = ax.transAxes,
            family = 'sans-serif',
            fontweight = 'bold',
            fontsize = 18,
            color = 'gray')
    # Onto the next year...
    currYear = currYear + 1
# Fine-tune overall figure
# ========================
# Make subplots close to each other.
fig.subplots_adjust(hspace=0)
# To display a common y-axis label, create a large axis that covers all the subplots
fig.add_subplot(111, frameon=False)
# Hide ticks and tick labels of the big axis
plt.tick_params(labelcolor='none', top='off', bottom='off', left='off', right='off')
# Add y label that spans several subplots
plt.ylabel("Percentage cases (%)", fontsize = 16, labelpad = 20)
plt.show()
The figure that is produced is almost exactly what I want but the y-axis tick labels on the bottom plot are set further from the axis compared with all the other plots. If the number of plots produced is altered (by using a wider range of dates), the same pattern occurs, namely, it's only the final plot that appears different.
I'm almost certainly not seeing the wood for the trees but can anyone spot what I've done wrong?
EDIT
The above code was originally run on Matplotlib 1.4.3 (see comment by ImportanceOfBeingErnest). However, after updating to Matplotlib 2.0.2 the code failed to run (KeyError: 0). The reason appears to be that the default in Matplotlib 2.x is for bars to be center-aligned, whereas 1.x aligned them to their left edge. To get the above code to run, either adjust the x-axis range and tick positions so that the bars don't extend beyond the y-axis, or set align='edge' in the plotting function, i.e.:
rects = ax.bar(tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'month'],
tempGroupbyCalendarMonthDF.loc[tempGroupbyCalendarMonthDF['year'] == currYear,'percentCases'],
width = barWidth,
align = 'edge')
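The effect of the align argument can be checked directly on a single bar. The following is a minimal sketch with made-up values (one bar at x = 1, using the same bar width as above); it only compares where the left edge of the bar ends up under each setting.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

bar_width = 0.80
fig, (ax_center, ax_edge) = plt.subplots(2, sharex=True)

# Same bar drawn twice: centred on x = 1, and starting at x = 1
centered = ax_center.bar([1], [10], width=bar_width, align="center")
edged = ax_edge.bar([1], [10], width=bar_width, align="edge")

center_left = centered[0].get_x()  # left edge of the centred bar (x - width/2)
edge_left = edged[0].get_x()       # left edge of the edge-aligned bar (x)
```

With align='edge' the bar for January starts at x = 1 and its middle sits at 1 + barWidth/2, which is where the original code places the tick labels; with align='center' the bar starts at about 0.6, left of the x-limit of 0.8.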

How to border a bar for a particular data period in Python

I have a dataset of a year and its numerical description. Example:
X Y
1890 6
1900 4
2000 1
2010 9
I plot a bar like plt.bar(X,Y) and it looks like:
How can I make the step of the X scale more detailed, for example, 2 years?
Can I somehow outline every 5 years with another color, red, for instance?
There are some different ways to do this. This is a possible solution:
import matplotlib.pyplot as plt
import numpy as np
x = [1890,1900,2000,2010]
y = [6,4,1,9]
stepsize = 10 # Choose your step here
fig, ax = plt.subplots()
ax.bar(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
plt.show()
The result is:
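The second part of the question (outlining some bars in red) is not covered by the code above. A possible sketch follows, with two assumptions that are not from the original posts: ticks are placed with MultipleLocator so that they fall on round years, and "bordering" is read as giving a red edge to every bar whose year lies in a chosen period. The period (1995 to 2015) and the bar width of 8 are hypothetical values picked for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

x = [1890, 1900, 2000, 2010]
y = [6, 4, 1, 9]

fig, ax = plt.subplots()
bars = ax.bar(x, y, width=8)  # width of 8 years is an arbitrary choice

# One tick every 10 years, at multiples of 10
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))

# Hypothetical period whose bars get a red outline
period = (1995, 2015)
for year, bar in zip(x, bars):
    if period[0] <= year <= period[1]:
        bar.set_edgecolor("red")
        bar.set_linewidth(2)

highlighted = [year for year, bar in zip(x, bars) if bar.get_linewidth() == 2]
```

Each element of the container returned by ax.bar is a Rectangle patch, so any per-bar styling (edge colour, line width, hatch) can be applied the same way.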

Color line by third variable - Python

I have the following data set:
In[55]: usdbrl
Out[56]:
Date Price Open High Low Change STD
0 2016-03-18 3.6128 3.6241 3.6731 3.6051 -0.31 0.069592
1 2016-03-17 3.6241 3.7410 3.7449 3.6020 -3.16 0.069041
2 2016-03-16 3.7422 3.7643 3.8533 3.7302 -0.62 0.068772
3 2016-03-15 3.7656 3.6610 3.7814 3.6528 2.83 0.071474
4 2016-03-14 3.6618 3.5813 3.6631 3.5755 2.23 0.070348
5 2016-03-11 3.5820 3.6204 3.6692 3.5716 -1.09 0.076458
6 2016-03-10 3.6215 3.6835 3.7102 3.6071 -1.72 0.062977
7 2016-03-09 3.6849 3.7543 3.7572 3.6790 -1.88 0.041329
8 2016-03-08 3.7556 3.7826 3.8037 3.7315 -0.72 0.013700
9 2016-03-07 3.7830 3.7573 3.7981 3.7338 0.63 0.000000
I want to plot Price against Date:
But I would like to color the line by a third variable (in my case Date or Change).
Could anybody help with this please?
Thanks.
I've written a simple function that maps a given property to a color:
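import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
def plot_colourline(x,y,c):
    # Normalize c to [0, 1] and map it through the jet colormap
    c = cm.jet((c-np.min(c))/(np.max(c)-np.min(c)))
    ax = plt.gca()
    # Draw each segment separately, in its own colour
    for i in np.arange(len(x)-1):
        ax.plot([x[i],x[i+1]], [y[i],y[i+1]], c=c[i])
    return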
This function normalizes the desired property and gets a color from the jet colormap; you may want to use a different one. It then gets the current axes and plots each segment of your data in its own colour. Because it uses a for loop, you should avoid it for very large data sets; for normal purposes, however, it is useful.
Consider the following example as a test:
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = 1.*np.arange(n)
y = np.random.rand(n)
prop = x**2
fig = plt.figure(1, figsize=(5,5))
ax = fig.add_subplot(111)
plot_colourline(x,y,prop)
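For larger data sets, the per-segment loop can be replaced by a single LineCollection. This is an alternative technique, not the function above: a minimal sketch on the same kind of test data, where the fixed random seed and the colorbar label are choices made only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

n = 100
x = np.arange(n, dtype=float)
y = np.random.default_rng(0).random(n)
prop = x**2  # the third variable that drives the colour

# Build an (n-1, 2, 2) array: one [start, end] pair of (x, y) points per segment
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

fig, ax = plt.subplots(figsize=(5, 5))
lc = LineCollection(segments, cmap="jet",
                    norm=plt.Normalize(prop.min(), prop.max()))
lc.set_array(prop[:-1])  # one colour value per segment
ax.add_collection(lc)

# Set the limits explicitly to cover the data
ax.set_xlim(x.min(), x.max())
ax.set_ylim(y.min(), y.max())
fig.colorbar(lc, ax=ax, label="prop")
```

All segments are drawn as one artist, and the collection doubles as the mappable for the colorbar.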
You could color the data points by a third variable, if that would help:
dates = [dt.date() for dt in pd.to_datetime(df.Date)]
plt.scatter(dates, df.Price, c=df.Change, s=100, lw=0)
plt.plot(dates, df.Price)
plt.colorbar()
plt.show()
