I have a dataframe which looks like this:
Team Minute Type
148 12 1
148 22 1
143 27 1
148 29 1
143 32 1
143 32 1
I created a joyplot using the Python library joypy:
fig, axes = joypy.joyplot(df, by="Team", column="Minute", figsize =(10,16), x_range = [0,94], linewidth = 1, colormap=plt.cm.viridis)
Which gave me this plot:
All Good.
However, the colourmap is meaningless now so I am trying to color the plots according to a second dataframe - which is the sum of Type for all the teams.
To do that, I created a norm, and a colourmap using these lines:
norm = plt.Normalize(group_df["Type"].min(), group_df["Type"].max())
cmap = plt.cm.viridis
sm = matplotlib.cm.ScalarMappable(cmap=cmap, norm=norm)
ar = np.array(group_df["Type"])
Cm = cmap(norm(ar))
sm.set_array([])
Here's where the problem arose as I can't figure out how to change the color of the joyplots. I tried a couple of approaches:
I tried to pass this Cm as the colormap argument. However, that raised an error: TypeError: 'numpy.ndarray' object is not callable
I tried to use a for loop over the axes and Cm -
for col, ax in zip(Cm, axes):
    ax.set_facecolor(col)
    #ax.patch.set_facecolor(col) ##Also tried this; didn't change anything
How can I get greater control over the colours of the joyplot and change them around? Any help would be appreciated.
MCVE
Sample of the csv file I'm reading in (the actual shape of the dataframe is (4453, 2)):
Team Minute
0 148 5
1 148 5
2 148 11
3 148 11
4 148 12
5 148 22
6 143 27
My code:
df = pd.read_csv(r"path")
##getting the sum for every team - total of 20 teams
group_df = df.groupby(["Team"]).size().to_frame("Count").reset_index()
df["Minute"] = pd.to_numeric(df["Minute"])
##Trying to create a colormap
norm = plt.Normalize(group_df["Count"].min(), group_df["Count"].max())
cmap = plt.cm.viridis
sm = matplotlib.cm.ScalarMappable(cmap=cmap, norm=norm)
ar = np.array(group_df["Count"])
Cm = cmap(norm(ar))
sm.set_array([])
fig, axes = joypy.joyplot(df, by="Team", column="Minute", figsize =(10,16), x_range = [0,94], colormap = plt.cm.viridis)
I want to color every subplot in the plot by the total count of the team from the group_df["Count"] values. Currently, the colormap is just uniform and not according to the total value. The picture above is what's produced.
joypy fills the colors of the KDE curves sequentially from a colormap. So, to make the colors follow a third variable, you can supply a colormap that contains the colors in the order you need. This can be done using a ListedColormap.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(21)
import pandas as pd
import joypy
df = pd.DataFrame({"Team" : np.random.choice([143,148,159,167], size=200),
                   "Minute" : np.random.randint(0,100, size=200)})
## count the rows for every team (4 teams in this synthetic example)
group_df = df.groupby(["Team"]).size().to_frame("Count").reset_index()
print(group_df)
## build a colormap with one color per team, ordered like group_df
norm = plt.Normalize(group_df["Count"].min(), group_df["Count"].max())
ar = np.array(group_df["Count"])
original_cmap = plt.cm.viridis
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(ar)))
sm = matplotlib.cm.ScalarMappable(cmap=original_cmap, norm=norm)
sm.set_array([])
fig, axes = joypy.joyplot(df, by="Team", column="Minute", x_range = [0,94], colormap = cmap)
fig.colorbar(sm, ax=axes, label="Count")
plt.show()
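To see what the ListedColormap in the answer actually contains, here is a minimal sketch that builds the per-row colors without calling joypy. The counts are made-up values, and the assumption (worth checking against your joypy version) is that joypy draws the groups in the same sorted order that groupby produces, so that entry i of the colormap ends up on row i.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, Normalize

# Hypothetical per-team counts, assumed to be in the order the rows are drawn.
group_df = pd.DataFrame({"Team": [143, 148, 159, 167],
                         "Count": [40, 65, 50, 45]})

# Scale the counts to [0, 1] and look each one up in viridis.
norm = Normalize(group_df["Count"].min(), group_df["Count"].max())
row_colors = plt.cm.viridis(norm(group_df["Count"].to_numpy()))  # shape (n_teams, 4)

# A colormap whose i-th entry is the color intended for the i-th row.
cmap = ListedColormap(row_colors)

for team, count, rgba in zip(group_df["Team"], group_df["Count"], cmap.colors):
    print(team, count, np.round(rgba, 3))
```

The team with the smallest count gets the first viridis color and the team with the largest count gets the last one; cmap can then be passed as the colormap argument, as in the answer above.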
I have the following dataframe where it contains the best equipment in operation ranked by 1 to 300 (1 is the best, 300 is the worst) over a few days (df columns)
Equipment 21-03-27 21-03-28 21-03-29 21-03-30 21-03-31 21-04-01 21-04-02
P01-INV-1-1 1 1 1 1 1 2 2
P01-INV-1-2 2 2 4 4 5 1 1
P01-INV-1-3 4 4 3 5 6 10 10
I would like to customize a line plot (example found here) but I'm having some troubles trying to modify the example code provided:
import matplotlib.pyplot as plt
import numpy as np
def energy_rank(data, marker_width=0.1, color='blue'):
    y_data = np.repeat(data, 2)
    x_data = np.empty_like(y_data)
    x_data[0::2] = np.arange(1, len(data)+1) - (marker_width/2)
    x_data[1::2] = np.arange(1, len(data)+1) + (marker_width/2)
    lines = []
    lines.append(plt.Line2D(x_data, y_data, lw=1, linestyle='dashed', color=color))
    for x in range(0,len(data)*2, 2):
        lines.append(plt.Line2D(x_data[x:x+2], y_data[x:x+2], lw=2, linestyle='solid', color=color))
    return lines
data = ranks.head(4).to_numpy() #ranks is the above dataframe
artists = []
for row, color in zip(data, ('red','blue','green','magenta')):
    artists.extend(energy_rank(row, color=color))
fig, ax = plt.subplots()
ax.set_xticklabels(ranks.columns) # set X axis to be dataframe columns
ax.set_xticklabels(ax.get_xticklabels(), rotation=35, fontsize = 10)
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([15,0])
ax.set_xbound([.5,8.5])
When using ax.set_xticklabels(ranks.columns), for some reason, it only plots 5 of the 7 days from ranks columns, removing specifically the first and last values. I tried to duplicate those values but this did not work as well. I end up having this below:
In summary, I would like to know if it's possible to do 3 customizations:
1. Show all dates from the ranks columns on the X axis.
2. Invert the Y axis. ax.set_ybound([15,0]) is not working. It would make more sense to see the graph starting with 0 on top, since 1 is the most important rank to look at.
3. Add a label at the end of each line, at the last day (last value on the X axis). I could add a legend box, but it often gets really messy when you plot more data, so adding just the text at the end of each line would make it look cleaner.
Please let me know if those customizations are impossible to do and any help is really appreciated! Thank you in advance!
To show all the dates, use plt.xticks() with one tick position per column and start set_xbound at 0. To reverse the y axis, use ax.set_ylim(ax.get_ylim()[::-1]). To label each line the way you described, you can use ax.annotate and place each annotation at the last data point of its series.
fig, ax = plt.subplots()
plt.xticks(np.arange(len(ranks.columns)), list(ranks.columns), rotation = 35, fontsize = 10)
plt.xlabel('Date')
plt.ylabel('Rank')
for artist in artists:
    ax.add_artist(artist)
ax.set_ybound([0,15])
ax.set_ylim(ax.get_ylim()[::-1])
ax.set_xbound([0,8.5])
ax.annotate('Series 1', xy =(7.1, 2), color = 'red')
ax.annotate('Series 2', xy =(7.1, 1), color = 'blue')
ax.annotate('Series 3', xy =(7.1, 10), color = 'green')
plt.show()
Here is the plot for the three rows of data in your sample dataframe:
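As a variation on the answer above, the label positions can be read from the DataFrame instead of being typed in by hand. This is only a sketch: it rebuilds the three sample rows from the question, and it swaps the energy_rank Line2D artists for plain ax.plot calls with a horizontal-bar marker to keep the example short.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The three sample rows from the question, one row per piece of equipment.
ranks = pd.DataFrame(
    [[1, 1, 1, 1, 1, 2, 2],
     [2, 2, 4, 4, 5, 1, 1],
     [4, 4, 3, 5, 6, 10, 10]],
    index=["P01-INV-1-1", "P01-INV-1-2", "P01-INV-1-3"],
    columns=["21-03-27", "21-03-28", "21-03-29", "21-03-30",
             "21-03-31", "21-04-01", "21-04-02"])

fig, ax = plt.subplots()
x = np.arange(len(ranks.columns))  # one x position per date column

for (name, row), color in zip(ranks.iterrows(), ("red", "blue", "green")):
    ax.plot(x, row.to_numpy(), linestyle="dashed", marker="_",
            markersize=12, markeredgewidth=2, color=color)
    # Put the label just to the right of the last data point of this series.
    ax.annotate(name, xy=(x[-1], row.iloc[-1]), xytext=(6, 0),
                textcoords="offset points", va="center", color=color)

ax.set_xticks(x)                                  # a tick for every date
ax.set_xticklabels(ranks.columns, rotation=35, fontsize=10)
ax.set_xlim(-0.5, len(x) + 0.5)                   # leave room for the labels
ax.set_ylim(11, 0)                                # larger value first: rank 1 near the top
ax.set_xlabel("Date")
ax.set_ylabel("Rank")
```

Passing the limits to set_ylim in descending order inverts the axis directly; call plt.show() to display the figure.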
I would like to highlight a single point on my lineplot graph using a marker. So far I managed to create my plot and insert the highlight where I wanted.
The problem is that I have 4 different lineplots (4 different categorical attributes) and I get the marker placed on every single lineplot, like in the following image:
I would like to place the marker only on the 2020 line (the purple one). This is my code so far:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker  # used below as ticker.FuncFormatter and ticker.MultipleLocator
import numpy as np
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(15,10))
gs0 = gridspec.GridSpec(2,2, figure=fig, hspace=0.2)
ax1 = fig.add_subplot(gs0[0,:]) # lineplot
ax2 = fig.add_subplot(gs0[1,0]) #Used for another plot not shown here
ax3 = fig.add_subplot(gs0[1,1]) #Used for another plot not shown here
flatui = ["#636EFA", "#EF553B", "#00CC96", "#AB63FA"]
sns.lineplot(ax=ax1,x="number of weeks", y="avg streams", hue="year", data=df, palette=flatui, marker = 'o', markersize=20, fillstyle='none', markeredgewidth=1.5, markeredgecolor='black', markevery=[5])
ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.0f}'.format(x/1000) + 'K'))
ax1.set(title='Streams trend')
ax1.xaxis.set_major_locator(ticker.MultipleLocator(2))
I used the markevery field to place a marker at position 5. Is there a way to also specify on which line/category the marker is placed?
EDIT: This is my dataframe:
avg streams date year number of weeks
0 145502.475 01-06 2017 0
1 158424.445 01-13 2017 1
2 166912.255 01-20 2017 2
3 169132.215 01-27 2017 3
4 181889.905 02-03 2017 4
... ... ... ... ...
181 760505.945 06-26 2020 25
182 713891.695 07-03 2020 26
183 700764.875 07-10 2020 27
184 753817.945 07-17 2020 28
185 717685.125 07-24 2020 29
186 rows × 4 columns
markevery is a Line2D property. sns.lineplot doesn't return the lines so you need to get the line you want to annotate from the Axes. Remove all the marker parameters from the lineplot call and add ...
lines = ax1.get_lines()
If the 2020 line/data is the fourth in the series,
line = lines[3]
line.set_marker('o')
line.set_markersize(20)
line.set_markevery([5])
line.set_fillstyle('none')
line.set_markeredgewidth(1.5)
line.set_markeredgecolor('black')
# or
props = {'marker':'o','markersize':20, 'fillstyle':'none','markeredgewidth':1.5,
'markeredgecolor':'black','markevery': [5]}
line.set(**props)
Another option, inspired by Quang Hoang's comment, would be to draw a circle at the point, taking its coordinates from the DataFrame.
x = 5 # your spec
wk = df['number of weeks']==5
yr = df['year']==2020
s = df[wk & yr]
y = s['avg streams'].to_numpy()
# or
y = df.loc[(df['year']==2020) & (df['number of weeks']==5),'avg streams'].to_numpy()
ax1.plot(x,y, 'ko', markersize=20, fillstyle='none', markeredgewidth=1.5)
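Because markevery is a Line2D property, the same idea can be shown with matplotlib alone, where each line is created explicitly and so there is no need to guess its index in ax1.get_lines(). The sketch below is not the seaborn code from the question: the data is synthetic and only the column names are taken from the question's DataFrame.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the question's data: 30 weeks for each of four years.
rng = np.random.default_rng(0)
weeks = np.arange(30)
df = pd.concat(
    [pd.DataFrame({"year": year,
                   "number of weeks": weeks,
                   "avg streams": rng.uniform(1e5, 8e5, size=weeks.size)})
     for year in (2017, 2018, 2019, 2020)],
    ignore_index=True)

fig, ax1 = plt.subplots(figsize=(15, 5))

# Draw one line per year and remember which Line2D belongs to which year.
lines = {}
for year, grp in df.groupby("year"):
    (line,) = ax1.plot(grp["number of weeks"], grp["avg streams"], label=str(year))
    lines[year] = line

# Marker properties are applied to the 2020 line only; markevery=[5]
# restricts the marker to the data point at index 5 of that line.
lines[2020].set(marker="o", markersize=20, fillstyle="none",
                markeredgewidth=1.5, markeredgecolor="black", markevery=[5])

ax1.set(title="Streams trend", xlabel="number of weeks", ylabel="avg streams")
ax1.legend(title="year")
```

Keeping the lines in a dictionary keyed by year avoids relying on the order in which the plotting library happened to add them to the Axes.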
I am plotting a histogram using matplotlib but my y-axis range is in the millions. How can I scale the y-axis so that instead of printing 5000000 it will print 5?
Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
filename = './norstar10readlength.csv'
df=pd.read_csv(filename, sep=',',header=None)
n, bins, patches = plt.hist(x=df.values, bins=10, color='#0504aa',
alpha=0.7, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('My Very Own Histogram')
maxfreq = n.max()
# Set a clean upper y-axis limit.
plt.ylim(ymax=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)
plt.show()
And here is the plot I am generating now
An elegant solution is to apply a FuncFormatter to format y labels.
Instead of your source data, I used the following DataFrame:
Val
0 800000
1 2600000
2 6700000
3 1400000
4 1700000
5 1600000
and made a bar plot. "Ordinary" bar plot:
df.Val.plot.bar(rot=0, width=0.75);
yields a picture with original values on the y axis (1000000 to 7000000).
But if you run:
from matplotlib.ticker import FuncFormatter
def lblFormat(n, pos):
    return str(int(n / 1e6))
lblFormatter = FuncFormatter(lblFormat)
ax = df.Val.plot.bar(rot=0, width=0.75)
ax.yaxis.set_major_formatter(lblFormatter)
then y axis labels are integers (the number of millions):
So you can arrange your code something like this:
n, bins, patches = plt.hist(x=df.values, ...)
#
# Other drawing actions, up to "plt.ylim" (including)
#
ax = plt.gca()
ax.yaxis.set_major_formatter(lblFormatter)
plt.show()
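Put together for a histogram, the approach might look like the sketch below. The data is a synthetic stand-in for the CSV in the question, and the formatter name millions is just an illustrative choice; it uses the "g" format rather than int() so that ticks such as 500000 show up as 0.5 instead of being truncated to 0.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def millions(n, pos):
    """Format a tick value n as a number of millions (pos is unused)."""
    return f"{n / 1e6:g}"


# Synthetic data, large enough for the bin counts to reach the millions.
rng = np.random.default_rng(0)
values = rng.normal(size=4_000_000)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(values, bins=10, color='#0504aa',
                           alpha=0.7, rwidth=0.85)
ax.grid(axis='y', alpha=0.75)
ax.set_xlabel('Value')
ax.set_ylabel('Frequency (millions)')  # say in the label what the ticks mean
ax.set_title('My Very Own Histogram')

# Only the tick labels change; the bar heights stay the raw counts.
ax.yaxis.set_major_formatter(FuncFormatter(millions))
```

Since the underlying counts are untouched, n still holds the real frequencies for any later computation.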
You can also modify your df itself; you just need to decide on a ratio.
For example, if you want 50000 to become 5, the ratio is 5/50000, which is 0.0001; for the 5000000 in the question it would be 5/5000000, which is 0.000001.
Once you have the ratio, just multiply all the y-axis values in your DataFrame by it.
Hope this helps!
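For a histogram, the bar heights are counts computed by hist rather than values stored in the DataFrame, so one way to apply the ratio idea is through the weights argument. This is a sketch under that assumption, using synthetic data and a ratio of 1e-6 so that the bars read in millions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
values = rng.normal(size=1_000_000)  # synthetic stand-in for df.values
ratio = 1e-6                         # one sample contributes one millionth

fig, ax = plt.subplots()
# Every sample carries the weight `ratio`, so each bar height is
# (number of samples in the bin) * ratio instead of the raw count.
weights = np.full(values.shape, ratio)
n, bins, patches = ax.hist(values, bins=10, weights=weights)
ax.set_ylabel('Frequency (millions)')
```

Unlike a tick formatter, this changes the returned n as well, so any limit computed from n.max() has to be on the scaled values.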
I have a seaborn count plot, but instead of colour bars I need the value above each bar. My input is pandas data frame.
ax = sns.countplot(x="variable", hue="value", data=pd.melt(dfs))
Here, dfs has many entries for different columns.
For example, I would like "man" above the blue bar, "woman" above the brown bar and "child" above the green bar, instead of the colour legend.
Sometimes it's easier to not try to find ways to tweak seaborn, but rather to directly use matplotlib and build a chart up from scratch.
Here, we can assume to have a dataframe named counts that looks like
hue c m w
class
A 20 31 29
B 40 112 63
C 85 203 117
where the index gives the positions along the x axis and the columns are the different hues. In the following, groupedbarplot is a function that takes such a dataframe as input, plots the bars as groups, and in addition adds a label to each one of them.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
def groupedbarplot(df, width=0.8, annotate="values", ax=None, **kw):
    ax = ax or plt.gca()
    n = len(df.columns)
    w = 1./n
    pos = (np.linspace(w/2., 1-w/2., n)-0.5)*width
    w *= width
    bars = []
    for col, x in zip(df.columns, pos):
        bars.append(ax.bar(np.arange(len(df))+x, df[col].values, width=w, **kw))
        for val, xi in zip(df[col].values, np.arange(len(df))+x):
            if annotate:
                txt = val if annotate == "values" else col
                ax.annotate(txt, xy=(xi, val), xytext=(0,2),
                            textcoords="offset points",
                            ha="center", va="bottom")
    ax.set_xticks(np.arange(len(df)))
    ax.set_xticklabels(df.index)
    return bars
df = pd.DataFrame({"class" : np.random.choice(list("ABC"), size=700, p=[.1,.3,.6]),
                   "hue" : np.random.choice(["m", "w" ,"c"], size=700, p=[.5,.3,.2])})
counts = df.groupby(["class", "hue"]).size().unstack()
groupedbarplot(counts, annotate="col")
plt.show()
We could also label the values directly, groupedbarplot(counts, annotate="values")
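If a recent Matplotlib is available, a shorter route to the same kind of labels may be Axes.bar_label. The sketch below assumes Matplotlib 3.4 or newer (where bar_label was added) and uses the small counts table shown above as literal data; it is an alternative to groupedbarplot, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The example counts table from above: index = x positions, columns = hues.
counts = pd.DataFrame({"c": [20, 40, 85],
                       "m": [31, 112, 203],
                       "w": [29, 63, 117]},
                      index=pd.Index(["A", "B", "C"], name="class"))

# pandas draws one group of bars (one BarContainer) per column.
ax = counts.plot.bar(rot=0, width=0.8, legend=False)

# Write the column name above every bar of the matching container.
for col, container in zip(counts.columns, ax.containers):
    ax.bar_label(container, labels=[col] * len(container), padding=2)
```

Leaving out the labels argument makes bar_label print the bar heights instead of the column names.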
I am plotting a DataFrame as a scatter graph.
My dataframe looks somewhat like this:
Sector AvgDeg
0 1 52
1 2 52
2 3 52
3 4 54
4 5 52
... ... ...
I plot it using this code:
df.plot.scatter(x='Sector', y='AvgDeg', s=df['AvgDeg'], color='LightBlue',grid=True)
plt.show()
and I'm getting this result:
What I need is to draw every dot with a different color and with the corresponding legend. For example: -blue dot- 'Sector 1', -red dot- 'Sector 2', and so on.
Do you have any idea how to do this? Tks!!
What you have to do is pass the c parameter of scatter a list with one entry per point.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
txt = ["text1", "text2", "text3", "text4"]
fig, ax = plt.subplots()
x = np.arange(1, 5)
y = np.arange(1, 5)
# c holds one value per point; each value is mapped to a color through cmap
# s is the size of each point
# cmap is the color map you want to use
ax.scatter(x, y, s=40, cmap=cmap_light, c=np.arange(1, 5))
for i, j in enumerate(txt):
    # display the text next to each point
    ax.annotate(j, (x[i], y[i]))
plt.show()
What this gives you as a result is -
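To color more points, for example 31, x, y and c all need 31 entries, so with x and y extended to 31 values you would call:
ax.scatter(x, y,s = 40, cmap = cmap_light, c=np.arange(1, 32))
Note that cmap_light only lists three colors, so with 31 points several of them will share a color; use a colormap with more entries if every point should look different. Similarly, you can annotate those points by extending the txt list above.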
I would do it this way:
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.style.use('ggplot')
colorlist = list(mpl.colors.ColorConverter.colors.keys())
ax = df.plot.scatter(x='Sector', y='AvgDeg', s=df.AvgDeg*1.2,
                     c=(colorlist * len(df))[:len(df)])
df.apply(lambda x: ax.text(x.Sector, x.AvgDeg, 'Sector {}'.format(x.Sector)), axis=1)
plt.show()
Result
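Neither answer above produces the legend asked for in the question. One way to get it is to draw each sector with its own scatter call and label; the following is a minimal sketch under that approach, using the five sample rows from the question and the tab10 colormap as an arbitrary choice of distinct colors.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The five sample rows shown in the question.
df = pd.DataFrame({"Sector": [1, 2, 3, 4, 5],
                   "AvgDeg": [52, 52, 52, 54, 52]})

fig, ax = plt.subplots()
cmap = plt.cm.tab10  # 10 distinct colors; reused cyclically beyond that

# One scatter call per row so that every dot gets its own legend entry.
for i, row in enumerate(df.itertuples(index=False)):
    ax.scatter(row.Sector, row.AvgDeg, s=row.AvgDeg,
               color=cmap(i % cmap.N), label=f"Sector {row.Sector}")

ax.set_xlabel("Sector")
ax.set_ylabel("AvgDeg")
ax.grid(True)
legend = ax.legend(title="Legend", ncol=2)
```

With many sectors the legend grows quickly, so the ncol argument or a legend placed outside the axes may be worth considering.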