Random data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(np.random.normal(size=(20,4)))
data
0 1 2 3
0 -0.710006 -0.748083 -1.261515 0.048941
1 0.856541 0.533073 0.649113 -0.236297
2 -0.091005 -0.244658 -2.194779 0.632878
3 -0.059058 0.807661 -0.418446 -0.295255
4 -0.103701 0.775622 0.258412 0.024411
5 -0.447976 -0.034419 -1.521598 -0.903301
6 1.451105 0.549661 -1.655751 -0.147499
7 1.479374 -1.475347 0.665726 0.236611
8 -1.427979 -1.812916 0.522802 0.006066
9 0.198515 1.203476 -0.475389 -1.721707
10 0.286255 0.564450 0.590050 -0.657811
11 -1.076161 1.820218 -0.315127 -0.848114
12 0.061848 0.303502 0.978169 0.024630
13 -0.307827 -1.047835 0.547052 -0.647217
14 0.679214 0.734134 0.158803 -0.334951
15 0.469675 1.043391 -1.449727 1.335354
16 -0.483831 -0.988185 0.264027 -0.831833
17 -2.013968 -0.200699 1.076526 1.275300
18 -0.199473 -1.630597 -1.697146 -0.177458
19 1.245289 0.132349 1.054312 -0.082550
data.boxplot(vert= False, figsize = (15,10))
I want to add red dots to the box plot indicating the last value (bottom) in each column. For example (red dots I've edited in are not in their exact position, but this gives you a general idea):
Thank you.
You could just add a scatter plot on top of the boxplot.
For the provided example, it looks like this:
fig, ax = plt.subplots(figsize=(8,5))
# `data` is the DataFrame from the question
data.boxplot(vert=False, patch_artist=True, ax=ax, zorder=1)
lastrow = data.iloc[-1,:]
print(lastrow)
# the boxes are drawn at y = 1, 2, ..., n, so the dots use the same positions
ax.scatter(x=lastrow, y=[*range(1,len(lastrow)+1)], color='r', zorder=2)
# for displaying the values of the red points:
for i, val in enumerate(lastrow,1):
    ax.annotate(text=f"{val:.2f}", xy=(val,i+0.1))
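For reference, here is a self-contained sketch of the same overlay on freshly generated data. The seed, the `Agg` backend and the names `last_row`, `positions` and `dots` are assumptions made for this example, not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
data = pd.DataFrame(rng.normal(size=(20, 4)))

fig, ax = plt.subplots(figsize=(8, 5))
data.boxplot(vert=False, ax=ax)

# Horizontal boxes sit at y = 1, 2, ..., n_columns by default.
last_row = data.iloc[-1]
positions = np.arange(1, len(last_row) + 1)

# A zorder above the box artists keeps the dots visible.
dots = ax.scatter(last_row.to_numpy(), positions, color="r", zorder=3)
```

The x coordinates of the dots are the last row's values and the y coordinates are the box positions, so each dot lands on its own column's box.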
I would like to highlight a single point on my line plot using a marker. So far I have managed to create the plot and insert the highlight where I wanted.
The problem is that I have 4 different lines (4 different categorical attributes) and the marker is placed on every single line, as in the following image:
I would like to place the marker only on the 2020 line (the purple one). This is my code so far:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker  # used below as `ticker`
import numpy as np
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(15,10))
gs0 = gridspec.GridSpec(2,2, figure=fig, hspace=0.2)
ax1 = fig.add_subplot(gs0[0,:]) # lineplot
ax2 = fig.add_subplot(gs0[1,0]) #Used for another plot not shown here
ax3 = fig.add_subplot(gs0[1,1]) #Used for another plot not shown here
flatui = ["#636EFA", "#EF553B", "#00CC96", "#AB63FA"]
sns.lineplot(ax=ax1,x="number of weeks", y="avg streams", hue="year", data=df, palette=flatui, marker = 'o', markersize=20, fillstyle='none', markeredgewidth=1.5, markeredgecolor='black', markevery=[5])
ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.0f}'.format(x/1000) + 'K'))
ax1.set(title='Streams trend')
ax1.xaxis.set_major_locator(ticker.MultipleLocator(2))
I used the markevery field to place a marker at position 5. Is there a way to also specify which line/category the marker is placed on?
EDIT: This is my dataframe:
avg streams date year number of weeks
0 145502.475 01-06 2017 0
1 158424.445 01-13 2017 1
2 166912.255 01-20 2017 2
3 169132.215 01-27 2017 3
4 181889.905 02-03 2017 4
... ... ... ... ...
181 760505.945 06-26 2020 25
182 713891.695 07-03 2020 26
183 700764.875 07-10 2020 27
184 753817.945 07-17 2020 28
185 717685.125 07-24 2020 29
186 rows × 4 columns
markevery is a Line2D property. sns.lineplot returns the Axes rather than the lines, so you need to get the line you want to annotate from the Axes. Remove all the marker parameters from the lineplot call and add ...
lines = ax1.get_lines()
If the 2020 line/data is the fourth in the series,
line = lines[3]
line.set_marker('o')
line.set_markersize(20)
line.set_markevery([5])
line.set_fillstyle('none')
line.set_markeredgewidth(1.5)
line.set_markeredgecolor('black')
# or
props = {'marker':'o','markersize':20, 'fillstyle':'none','markeredgewidth':1.5,
'markeredgecolor':'black','markevery': [5]}
line.set(**props)
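The idea of styling only one Line2D can be checked in isolation. The sketch below uses plain matplotlib with made-up data and picks the line by its label rather than by position. Whether seaborn assigns such labels to the lines it draws depends on the version, so treat the label lookup as an assumption that holds for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

weeks = np.arange(30)
fig, ax = plt.subplots()
for year in (2017, 2018, 2019, 2020):
    # made-up values, one line per year, labelled with the year
    ax.plot(weeks, weeks * (year - 2016), label=str(year))

# Select the 2020 line by label instead of relying on its index.
target = next(line for line in ax.get_lines() if line.get_label() == "2020")

# Marker properties are set on this line only; the others stay unmarked.
target.set(marker="o", markersize=20, fillstyle="none",
           markeredgewidth=1.5, markeredgecolor="black", markevery=[5])
```

If the labels are not available, indexing into `ax.get_lines()` as shown above remains the fallback.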
Another option, inspired by Quang Hoang's comment, is to draw a circle at the point, looking up its coordinates in the DataFrame.
x = 5 # your spec
wk = df['number of weeks']==5
yr = df['year']==2020
s = df[wk & yr]
y = s['avg streams'].to_numpy()
# or
y = df.loc[(df['year']==2020) & (df['number of weeks']==5),'avg streams'].to_numpy()
ax1.plot(x,y, 'ko', markersize=20, fillstyle='none', markeredgewidth=1.5)
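The lookup step can be tried on its own with a tiny stand-in DataFrame. The values below are invented and the column names follow the question. The size check is an extra guard added here, on the assumption that exactly one row should match.

```python
import pandas as pd

# Invented stand-in for the question's DataFrame.
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "number of weeks": [4, 5, 4, 5],
    "avg streams": [100.0, 110.0, 200.0, 220.0],
})

x = 5
mask = (df["year"] == 2020) & (df["number of weeks"] == x)
y = df.loc[mask, "avg streams"].to_numpy()

# Fail early if the point to highlight is missing or ambiguous.
if y.size != 1:
    raise ValueError(f"expected exactly one matching row, found {y.size}")
```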
I have a dataframe which looks like this:
Team Minute Type
148 12 1
148 22 1
143 27 1
148 29 1
143 32 1
143 32 1
I created a joyplot using the Python library joypy
fig, axes = joypy.joyplot(df, by="Team", column="Minute", figsize =(10,16), x_range = [0,94], linewidth = 1, colormap=plt.cm.viridis)
Which gave me this plot:
All Good.
However, the colourmap is meaningless now so I am trying to color the plots according to a second dataframe - which is the sum of Type for all the teams.
To do that, I created a norm, and a colourmap using these lines:
norm = plt.Normalize(group_df["Type"].min(), group_df["Type"].max())
cmap = plt.cm.viridis
sm = matplotlib.cm.ScalarMappable(cmap=cmap, norm=norm)
ar = np.array(group_df["Type"])
Cm = cmap(norm(ar))
sm.set_array([])
This is where the problem arose: I can't figure out how to change the colors of the joyplots. I tried a couple of approaches:
I tried to pass this Cm as the colormap argument. However, that raised an error: TypeError: 'numpy.ndarray' object is not callable
I tried to use a for loop over the axes and Cm:
for col, ax in zip(Cm, axes):
    ax.set_facecolor(col)
    #ax.patch.set_facecolor(col) ##Also tried this; didn't change anything
How can I get greater control over the colours of the joyplot and change them around? Any help would be appreciated.
MCVE
Sample of the csv file I'm reading in (actual shape of the dataframe is (4453,2)):
Team Minute
0 148 5
1 148 5
2 148 11
3 148 11
4 148 12
5 148 22
6 143 27
My code:
df = pd.read_csv(r"path")
##getting the sum for every team - total of 20 teams
group_df = df.groupby(["Team"]).size().to_frame("Count").reset_index()
df["Minute"] = pd.to_numeric(df["Minute"])
##Trying to create a colormap
norm = plt.Normalize(group_df["Count"].min(), group_df["Count"].max())
cmap = plt.cm.viridis
sm = matplotlib.cm.ScalarMappable(cmap=cmap, norm=norm)
ar = np.array(group_df["Count"])
Cm = cmap(norm(ar))
sm.set_array([])
fig, axes = joypy.joyplot(df, by="Team", column="Minute", figsize =(10,16), x_range = [0,94], colormap = plt.cm.viridis)
I want to color every subplot in the plot by the total count of the team from the group_df["Count"] values. Currently, the colormap is just uniform and not according to the total value. The picture above is what's produced.
joypy fills the colors of the KDE curves sequentially from a colormap. So in order to have the colors match to a third variable you can supply a colormap which contains the colors in the order you need. This can be done using a ListedColormap.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(21)
import pandas as pd
import joypy
df = pd.DataFrame({"Team" : np.random.choice([143,148,159,167], size=200),
                   "Minute" : np.random.randint(0,100, size=200)})
##getting the sum for every team - total of 20 teams
group_df = df.groupby(["Team"]).size().to_frame("Count").reset_index()
print(group_df)
##Trying to create a colormap
norm = plt.Normalize(group_df["Count"].min(), group_df["Count"].max())
ar = np.array(group_df["Count"])
original_cmap = plt.cm.viridis
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(ar)))
sm = matplotlib.cm.ScalarMappable(cmap=original_cmap, norm=norm)
sm.set_array([])
fig, axes = joypy.joyplot(df, by="Team", column="Minute", x_range = [0,94], colormap = cmap)
fig.colorbar(sm, ax=axes, label="Count")
plt.show()
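The colormap construction can be inspected without joypy. In this sketch the `counts` array is hypothetical and stands in for group_df["Count"]. It checks that the ListedColormap holds one color per group, in group order, with the smallest and largest counts mapped to the two ends of viridis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, Normalize
import numpy as np

counts = np.array([40, 55, 48, 57])  # hypothetical per-team counts
norm = Normalize(counts.min(), counts.max())
viridis = plt.cm.viridis

# One RGBA row per group, in the same order as `counts`.
colors = viridis(norm(counts))
listed = ListedColormap(colors)

first_color = listed(0)   # color of the group with count 40 (the minimum)
last_color = listed(3)    # color of the group with count 57 (the maximum)
```

The order of the colors has to match the order in which the groups are drawn, which is why the counts come from a groupby on the same column.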
Despite trying some solutions available on SO and at Matplotlib's documentation, I'm still unable to disable Matplotlib's creation of weekend dates on the x-axis.
As you can see below, it adds dates to the x-axis that are not in the original Pandas column.
I'm plotting my data using (commented lines are unsuccessful in achieving my goal):
fig, ax1 = plt.subplots()
x_axis = df.index.values
ax1.plot(x_axis, df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(x_axis, df['R'], color='r')
# plt.xticks(np.arange(len(x_axis)), x_axis)
# fig.autofmt_xdate()
# ax1.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
fig.tight_layout()
plt.show()
An example of my Pandas dataframe is below, with dates as index:
2019-01-09 1.007042 2585.898714 4.052480e+09 19.980000 12.07 1
2019-01-10 1.007465 2581.828491 3.704500e+09 19.500000 19.74 1
2019-01-11 1.007154 2588.605258 3.434490e+09 18.190001 18.68 1
2019-01-14 1.008560 2582.151225 3.664450e+09 19.070000 14.27 1
Some suggestions I've found include a custom ticker (here and here); however, although I don't get errors, the plot is missing my second series.
Any suggestions on how to disable date interpolation in matplotlib?
The matplotlib site recommends creating a custom formatter class. This class will contain logic that tells the axis label not to display anything if the date is a weekend. Here's an example using a dataframe I constructed from the 2018 data that was in the image you'd attached:
df = pd.DataFrame(
data = {
"Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
"MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
"Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
"Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
"Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
"R": [0,0,0,0,0,1,1]
},
index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
'2018-01-12', '2018-01-15', '2018-01-16'])
Move the dates from the index to their own column:
df = df.reset_index().rename({'index': 'Date'}, axis=1, copy=False)
df['Date'] = pd.to_datetime(df['Date'])
Create the custom formatter class:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import Formatter
%config InlineBackend.figure_format = 'retina' # Get nicer looking graphs for retina displays
class CustomFormatter(Formatter):
    def __init__(self, dates, fmt='%Y-%m-%d'):
        self.dates = dates
        self.fmt = fmt
    def __call__(self, x, pos=0):
        'Return the label for time x at position pos'
        ind = int(np.round(x))
        if ind >= len(self.dates) or ind < 0:
            return ''
        return self.dates[ind].strftime(self.fmt)
Now let's plot the MP and R series. Pay attention to the line where we call the custom formatter:
formatter = CustomFormatter(df['Date'])
fig, ax1 = plt.subplots()
ax1.xaxis.set_major_formatter(formatter)
ax1.plot(np.arange(len(df)), df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(np.arange(len(df)), df['R'], color='r')
fig.autofmt_xdate()
fig.tight_layout()
plt.show()
The above code outputs this graph:
Now, no weekend dates, such as 2018-01-13, are displayed on the x-axis.
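The same index-to-date mapping can also be written as a plain function wrapped in matplotlib's FuncFormatter, which makes it easy to test without drawing anything. This is a minimal sketch. The three dates and the name `format_date` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# The data is plotted against 0, 1, 2, ... so the weekend takes up no space.
dates = pd.to_datetime(["2018-01-12", "2018-01-15", "2018-01-16"])

def format_date(x, pos=None):
    """Map an integer x position back to the date stored at that row."""
    ind = int(np.round(x))
    if ind < 0 or ind >= len(dates):
        return ""  # tick lies outside the data, show no label
    return dates[ind].strftime("%Y-%m-%d")

formatter = FuncFormatter(format_date)
# usage: ax.xaxis.set_major_formatter(formatter)
```

Because the x values are row positions rather than dates, Friday 2018-01-12 and Monday 2018-01-15 end up next to each other on the axis.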
If you simply want to keep weekend dates off the tick labels while the graph still scales correctly in time, matplotlib has built-in functionality for this in matplotlib.dates. Specifically, the WeekdayLocator pretty much solves this problem single-handedly. It's a one-line solution (the rest just fabricates data for testing). Note that this works whether or not the data includes weekends:
import matplotlib.pyplot as plt
import datetime
import numpy as np
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU
DT_FORMAT="%Y-%m-%d"
if __name__ == "__main__":
    N = 14
    # Fake data
    x = list(zip([2018]*N, [5]*N, list(range(1,N+1))))
    x = [datetime.datetime(*y) for y in x]
    x = [y for y in x if y.weekday() < 5]
    random_walk_steps = 2 * np.random.randint(0, 6, len(x)) - 3
    random_walk = np.cumsum(random_walk_steps)
    y = np.arange(len(x)) + random_walk
    # Make a figure and plot everything
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ### HERE IS THE BIT THAT ANSWERS THE QUESTION
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter(DT_FORMAT))
    # plot stuff
    fig.autofmt_xdate()
    plt.tight_layout()
    plt.show()
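A quick way to see what the locator does is to read back the tick positions it produces. The sketch below uses made-up data and the `Agg` backend, both assumptions for this example, and converts the ticks to datetimes to confirm that none falls on a weekend. The locator only decides where ticks go: the weekend still takes up space on the time axis.

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR

# Two weeks of weekday-only dates in May 2018.
days = [datetime.datetime(2018, 5, d) for d in range(1, 15)]
days = [d for d in days if d.weekday() < 5]

fig, ax = plt.subplots()
ax.plot(days, range(len(days)))
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))

# Read the tick positions back and convert them to datetimes.
tick_dates = mdates.num2date(ax.xaxis.get_majorticklocs())
weekdays_only = all(d.weekday() < 5 for d in tick_dates)
```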
If you are trying to avoid matplotlib drawing a connecting line across the days missing from your dataset, you can exploit the fact that matplotlib breaks the line wherever a NaN (np.nan) is encountered. Pandas makes it easy to insert NaN rows for the days that are not in your dataset with pd.DataFrame.asfreq():
df = pd.DataFrame(data = {
        "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
        "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
        "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
        "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
        "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
        "R": [0,0,0,0,0,1,1]
    },
    index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15', '2018-01-16'])
df.index = pd.to_datetime(df.index)
#rescale R so I don't need to worry about twinax
df.loc[df.R==0, 'R'] = df.loc[df.R==0, 'R'] + df.MP.min()
df.loc[df.R==1, 'R'] = df.loc[df.R==1, 'R'] * df.MP.max()
df = df.asfreq('D')
df
Col 1 MP Col 3 Col 4 Col 5 R
2018-01-08 1.000325 2743.002071 3.242650e+09 9.52 5.04 2743.002071
2018-01-09 1.000807 2754.011543 3.453480e+09 10.08 5.62 2743.002071
2018-01-10 1.001207 2746.121450 3.576350e+09 9.82 5.29 2743.002071
2018-01-11 1.000355 2760.169848 3.641320e+09 9.88 6.58 2743.002071
2018-01-12 1.001512 2780.756857 3.573970e+09 10.16 8.32 2743.002071
2018-01-13 NaN NaN NaN NaN NaN NaN
2018-01-14 NaN NaN NaN NaN NaN NaN
2018-01-15 1.003237 2793.953050 3.573970e+09 10.16 9.57 2793.953050
2018-01-16 1.000979 2792.675162 4.325970e+09 11.66 9.53 2793.953050
df[['MP', 'R']].plot(); plt.show()
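The effect of asfreq can be seen on a three-row Series. This is a minimal sketch with invented values: the two weekend days appear as NaN rows, and those rows are what interrupt the plotted line.

```python
import pandas as pd

# Friday, Monday, Tuesday: the weekend is absent from the index.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2018-01-12", "2018-01-15", "2018-01-16"]),
)

# Reindex to daily frequency; days without data are filled with NaN.
daily = s.asfreq("D")
missing_days = daily[daily.isna()].index
```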
I have a dataset of a year and its numerical description. Example:
X Y
1890 6
1900 4
2000 1
2010 9
I plot a bar like plt.bar(X,Y) and it looks like:
How can I make the step of the X scale more detailed, for example, 2 years?
Can I somehow draw a border in another color, red for instance, every 5 years?
There are some different ways to do this. This is a possible solution:
import matplotlib.pyplot as plt
import numpy as np  # needed for np.arange below
x = [1890,1900,2000,2010]
y = [6,4,1,9]
stepsize = 10 # Choose your step here
fig, ax = plt.subplots()
ax.bar(x,y)
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, stepsize))
plt.show()
The result is:
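Because np.arange starts at the left axis limit, the ticks above can land on non-round years. A possible alternative, sketched here under assumptions, is matplotlib's MultipleLocator, which places ticks on multiples of the step. The colored borders are one reading of the second question: the `highlight` set is hypothetical, and the bar width and line width are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

x = [1890, 1900, 2000, 2010]
y = [6, 4, 1, 9]
stepsize = 10
highlight = {1900, 2000}  # hypothetical years to outline in red

fig, ax = plt.subplots()

# One edge color per bar: red for highlighted years, black otherwise.
edge_colors = ["red" if year in highlight else "black" for year in x]
bars = ax.bar(x, y, width=8, edgecolor=edge_colors, linewidth=2)

# Ticks on every multiple of `stepsize`, independent of the axis limits.
ax.xaxis.set_major_locator(MultipleLocator(stepsize))
ax.tick_params(axis="x", labelrotation=90)

ticks = ax.xaxis.get_majorticklocs()
```

A step of 2 over this range gives about 60 ticks, so rotating the labels or widening the figure may be needed.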