Labelling a matplotlib histogram bin with an arrow - python

I have a histogram plot which could be replicated with the MWE below:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
pd.Series(np.random.normal(0, 100, 1000)).plot(kind='hist', bins=50)
Which creates a plot like this:
How would I then go about labelling the bin with an arrow for a given integer?
For example see below, where an arrow labels the bin containing the integer 300.
EDIT: I should add that, ideally, the y-coordinates of the arrow would be set automatically from the height of the bar it labels, if possible!

You can use annotate to add an arrow:
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns
import numpy as np
fig, ax = plt.subplots()
series = pd.Series(np.random.normal(0, 100, 1000))
series.plot(kind='hist', bins=50, ax=ax)
ax.annotate("",
xy=(300, 5), xycoords='data',
xytext=(300, 20), textcoords='data',
arrowprops=dict(arrowstyle="->",
connectionstyle="arc3"),
)
In this example, I added an arrow that goes from coordinates (300, 20) to (300, 5).
To scale the arrow automatically to the height of the bin, you can plot the histogram with matplotlib's ax.hist, which returns the bin counts and bin edges, and then use numpy.where to find which bin contains the desired position.
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns
import numpy as np
nbins = 50
labeled_bin = 200
fig, ax = plt.subplots()
series = pd.Series(np.random.normal(0, 100, 1000))
## plot the histogram and return the bin position and values
ybins, xbins, _ = ax.hist(series, bins=nbins)
## find which bin contains the position you want to label
## (first edge strictly greater than the value; the bin is the one just before it)
ind_bin = np.where(xbins > labeled_bin)[0]
if len(ind_bin) > 0 and ind_bin[0] > 0:
    ## get position (bin centre) and value (bar height) of the bin
    x_bin = xbins[ind_bin[0]-1]/2. + xbins[ind_bin[0]]/2.
    y_bin = ybins[ind_bin[0]-1]
    ## add the arrow, from 20 units above the bar down to 5 units above it
    ax.annotate("",
                xy=(x_bin, y_bin + 5), xycoords='data',
                xytext=(x_bin, y_bin + 20), textcoords='data',
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="arc3"),
                )
else:
    print("Labeled bin is outside range")
plt.show()
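A more compact variant of the same idea, sketched under a few assumptions: np.searchsorted replaces np.where for the bin lookup, and the arrow length is given in screen points (textcoords='offset points') rather than in data units, so the arrow keeps the same size whatever the bar heights are. The helper name annotate_bin and its offset parameter are hypothetical, introduced only for this example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to get a window
import matplotlib.pyplot as plt


def annotate_bin(ax, counts, edges, value, label="", offset=30):
    """Hypothetical helper: draw an arrow pointing down at the top of the bar
    whose bin contains `value`. Returns the (x, y) the arrow points at,
    or None if `value` lies outside the histogram range."""
    if value < edges[0] or value > edges[-1]:
        return None
    # searchsorted(side="right") - 1 gives the bin i with edges[i] <= value < edges[i+1]
    i = np.searchsorted(edges, value, side="right") - 1
    i = min(i, len(counts) - 1)  # the last bin also includes its right edge
    x = 0.5 * (edges[i] + edges[i + 1])  # bar centre
    y = counts[i]                        # bar height
    ax.annotate(label, xy=(x, y), xycoords="data",
                xytext=(0, offset), textcoords="offset points",  # label sits `offset` points above the bar
                ha="center", arrowprops=dict(arrowstyle="->"))
    return x, y


rng = np.random.default_rng(0)
data = rng.normal(0, 100, 1000)
fig, ax = plt.subplots()
counts, edges, _ = ax.hist(data, bins=50)
annotate_bin(ax, counts, edges, 200, label="200")
```

Because the offset is in points, the arrow does not need retuning when the bin counts change, which is the main thing the fixed +5/+20 data offsets above cannot do.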

Julien Spronck showed the best way, I think. Alternatively, you can use plt.arrow; example code is below. The y-coordinate is determined automatically by counting how many elements lie within a tolerance window around the target value (you can define the tolerance yourself). Note that this count only approximates the bar height, because the tolerance window does not coincide with the histogram's bin edges. You can play with the parameters (length of the arrow head, length of the arrow). Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
mySer = pd.Series(np.random.normal(0, 100, 1000))
mySer.plot(kind='hist', bins=50)
# that is where you want to add the arrow
ind = 200
# determine how many elements you have in the bin (with a certain tolerance)
n = len(mySer[(mySer > ind*0.95) & (mySer < ind*1.05)])
# define length of the arrow
lenArrow = 10
lenHead = 2
wiArrow = 5
plt.arrow(ind, n+lenArrow+lenHead, 0, -lenArrow, head_width=wiArrow+3, head_length=lenHead, width=wiArrow, fc='k', ec='k')
plt.show()
This gives you the following output (for 200 instead of 300 as in your example):
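If you want the arrow tip to sit exactly on the bar rather than on the tolerance-window count, one option is to recompute the bins with np.histogram: pandas' hist plot and plt.hist should derive the same edges for the same bins=50, so the count of the matching bin is the bar height. The helper bar_height is hypothetical; the arrow parameters are the ones from the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() interactively
import matplotlib.pyplot as plt


def bar_height(values, target, bins=50):
    """Hypothetical helper: height of the histogram bar containing `target`,
    using the bin edges np.histogram computes for the same `bins`."""
    counts, edges = np.histogram(values, bins=bins)
    if target < edges[0] or target > edges[-1]:
        return 0
    # bin i satisfies edges[i] <= target < edges[i+1]; the last bin includes its right edge
    i = min(np.searchsorted(edges, target, side="right") - 1, len(counts) - 1)
    return int(counts[i])


mySer = pd.Series(np.random.normal(0, 100, 1000))
mySer.plot(kind='hist', bins=50)
ind = 200
n = bar_height(mySer, ind)  # exact bar height instead of the tolerance count
lenArrow, lenHead, wiArrow = 10, 2, 5
# the arrow body is lenArrow long plus a head of lenHead, so the tip lands at y = n
plt.arrow(ind, n + lenArrow + lenHead, 0, -lenArrow,
          head_width=wiArrow + 3, head_length=lenHead, width=wiArrow, fc='k', ec='k')
```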

Related

Seaborn: How to change the color of individual bars in histogram?

I searched online but didn't find a solution.
I have this graph and I want to change the color of the first bar, but if I use the 'color' parameter it changes all the bars.
Is it possible to do this?
Thank you so much!
You could access the list of generated rectangles via ax.patches, and then recolor the first one:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'Sales': 100000 * (np.random.rand(80) ** 1.5) + 18000})
ax = sns.histplot(x='Sales', data=df, bins=4, color='skyblue', alpha=1)
ax.patches[0].set_facecolor('salmon')
plt.show()
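The same trick works with plain matplotlib, where ax.hist returns the bar rectangles directly as its third return value, so there is no need to go through ax.patches. A minimal sketch, assuming the same synthetic sales data (drawn here from a seeded NumPy generator):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sales = 100000 * (rng.random(80) ** 1.5) + 18000

fig, ax = plt.subplots()
# the third return value holds the bar Rectangles in left-to-right order
counts, edges, patches = ax.hist(sales, bins=4, color='skyblue')
patches[0].set_facecolor('salmon')  # recolour only the first bar
```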
To get a separation at exactly 40,000, you could create two histograms on the same subplot; the binrange= parameter sets their exact limits:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'Sales': 100000 * (np.random.rand(80) ** 1.5) + 18000})
# either choose a fixed limit, or set it exactly at one fourth
limit = 40000
# limit = df['Sales'].min() + 0.25 * (df['Sales'].max() - df['Sales'].min())
ax = sns.histplot(x='Sales', data=df[df['Sales'] <= limit],
bins=1, binrange=(df['Sales'].min(), limit), color='salmon')
sns.histplot(x='Sales', data=df[df['Sales'] > limit],
bins=3, binrange=(limit, df['Sales'].max()), color='skyblue', ax=ax)
plt.show()
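Alternatively, a single histogram can get the split at exactly 40,000 by passing explicit bin edges instead of a bin count: one bin from the minimum up to the limit, then three equal-width bins up to the maximum. A sketch with plain matplotlib, assuming the minimum of the data lies below the limit:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
sales = 100000 * (rng.random(80) ** 1.5) + 18000
limit = 40000

# one bin [min, limit), then three equal-width bins from limit to max
bin_edges = np.concatenate(([sales.min()], np.linspace(limit, sales.max(), 4)))

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(sales, bins=bin_edges, color='skyblue')
patches[0].set_facecolor('salmon')  # the bar below the limit
```

Unlike the two-histogram version, the bars share one set of edges, so there is no risk of a gap or overlap at the limit.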
Use:
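import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
s = [1,1,2,2,1,3,4]
# give the value 1 its own hue category so its bar gets a different colour
s = pd.DataFrame({'val': s, 'col':['1' if x==1 else '0' for x in s]})
sns.histplot(data=s, x="val", hue="col")
plt.show()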
The output:
Well, the exact way will depend on which plotting software you are using. Your best bet is to split your data into two sets, one for the first bar and one for the rest, and plot each set in its own colour.

How to plot Multiline Graphs Via Seaborn library in Python?

I have written a code that looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
exp1= sns.lineplot(data=df1)
plt.savefig('exp1.png')
exp1_smooth= sns.lmplot(x='Size', y='Time', data=df, ci=None, order=4, truncate=False)
plt.savefig('exp1_smooth.png')
That gives me Graph_1:
Size, which should be the x-axis, shows up as a constant line, but as you can see in my code it varies (10, 100, 1000).
Why does this produce a constant line? I want a multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 & power2).
I also want a smoothed version of the same graph, but my attempt gives me an error. What needs to be done to get a smooth multi-line graph with x-axis = Size (T) and y-axis = Encrypt_Time and Decrypt_Time (power1 & power2)?
I don't think that is the issue: the line representing Size looks constant, but it is NOT. In your code, sns.lineplot(data=df1) treats the DataFrame as wide-form data and draws every column, Size included, against the row index.
The Size values range from 10 to about 1000, while the smallest division of the y-axis is 20,000 (20 times larger than the largest Size value), which makes the Size line look horizontal on your graph.
You can try larger values to see the slope clearly.
If you want Size as the x-axis, you can try the example below:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([10.03,100.348,1023.385])
power1 = np.array([100000,86000,73000])
power2 = np.array([1008000,95000,1009000])
df1 = pd.DataFrame(data = {'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})
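fig, ax = plt.subplots()
# one line per measurement, both plotted against Size on the x-axis
sns.lineplot(data=df1, x='Size', y='Encrypt_Time', ax=ax, label='Encrypt_Time')
sns.lineplot(data=df1, x='Size', y='Decrypt_Time', ax=ax, label='Decrypt_Time')
ax.set_ylabel('Time')
plt.show()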
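For the smooth version, the lmplot call in the question fails because df is never defined and df1 has no Time column; lmplot expects long-form data (for example df1.melt(id_vars='Size', var_name='Kind', value_name='Time') together with hue='Kind'), and order=4 is underdetermined with only three points per line. One alternative, sketched below with NumPy and matplotlib only, fits a quadratic through each line's three points. Fitting in log10(Size) is an assumption made because Size spans two decades: a quadratic in raw Size would dip far below zero between the Decrypt_Time points.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get a window with plt.show()
import matplotlib.pyplot as plt

T = np.array([10.03, 100.348, 1023.385])
power1 = np.array([100000, 86000, 73000])
power2 = np.array([1008000, 95000, 1009000])
df1 = pd.DataFrame({'Size': T, 'Encrypt_Time': power1, 'Decrypt_Time': power2})

log_size = np.log10(df1['Size'])
# 200 x-values evenly spaced on a log scale between the smallest and largest Size
x_smooth = np.logspace(log_size.min(), log_size.max(), 200)

fig, ax = plt.subplots()
fits = {}
for col in ['Encrypt_Time', 'Decrypt_Time']:
    # quadratic in log10(Size): with three points it passes through all of them
    fits[col] = np.polyfit(log_size, df1[col], deg=2)
    line, = ax.plot(x_smooth, np.polyval(fits[col], np.log10(x_smooth)), label=col)
    # original measurements as markers in the same colour
    ax.plot(df1['Size'], df1[col], 'o', color=line.get_color())
ax.set_xscale('log')
ax.set_xlabel('Size')
ax.set_ylabel('Time')
ax.legend()
```

With only three measurements per line, any smooth curve is a visual interpolation rather than a model of the data; adding more Size values would make the curve far more trustworthy.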

Seaborn: reverse cbar

I have a Dataframe which represents a binary matrix (0 and 1), with labels on rows and columns. I'm using the following code to print the matrix assigning each label a color:
import seaborn as sns
import matplotlib.pylab as plt
import matplotlib as mpl
import pandas as pd
import numpy as np
N = 100
M = 200
p = 0.8
df = pd.DataFrame(np.random.choice([0,1], (M,N), p=(p, 1-p)),
columns=sorted((list(range(10))*N)[0:N]),
index=sorted((list(range(10))*N)[0:M]))
cmap = mpl.colors.ListedColormap([(.8, .8, .8, 1.0)] + [plt.cm.jet(i) for i in range(plt.cm.jet.N-1)])
ax = sns.heatmap(df.apply(lambda s: (s.name==s.index)*s*(s.index+1)), mask=df.eq(0), cmap=cmap )
My issue is that the colors displayed in the cbar are in the reversed order with respect to those shown in the figure (and so are the labels). How can I reverse the colors and the labels in the cbar?
I tried:
ax.invert_yaxis()
but it also changes the structure of the plot.
Is there a solution?
You can grab the colorbar via ax.collections[0].colorbar and then call invert_yaxis() on its ax.
ax.collections[0].colorbar.ax.invert_yaxis()
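A self-contained check of that call on a plain matplotlib QuadMesh, the same kind of artist seaborn.heatmap draws with pcolormesh. fig.colorbar stores the new colorbar on the mappable's colorbar attribute, which is why it can be reached through ax.collections[0]; the small data array is an arbitrary stand-in.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

data = np.arange(12).reshape(3, 4)  # arbitrary stand-in for the heatmap data
fig, ax = plt.subplots()
mesh = ax.pcolormesh(data, cmap='viridis')
cbar = fig.colorbar(mesh, ax=ax)

# flip only the colorbar's axis; the main plot's axes are left untouched
ax.collections[0].colorbar.ax.invert_yaxis()
```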

how to reduce y-axis in matplot with same distance

I want this plot's y-axis to be centered at 38, and the y-axis scaled such that the 'humps' disappear. How do I accomplish this?
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
s=['05/02/2019', '06/02/2019', '07/02/2019', '08/02/2019',
'09/02/2019', '10/02/2019', '11/02/2019', '12/02/2019',
'13/02/2019', '20/02/2019', '21/02/2019', '22/02/2019',
'23/02/2019', '24/02/2019', '25/02/2019']
df = pd.DataFrame()
df[0]=['38.02', '33.79', '34.73', '36.47', '35.03', '33.45',
'33.82', '33.38', '34.68', '36.93', '33.44', '33.55',
'33.18', '33.07', '33.17']
# Data for plotting
fig, ax = plt.subplots(figsize=(17, 2))
for i,j in zip(s,df[0]):
ax.annotate(str(j),xy=(i,j+0.8))
ax.plot(s, df[0])
ax.set(xlabel='Dates', ylabel='Latency',
title='Hongkong to sing')
ax.grid()
#plt.yticks(np.arange(min(df[p]), max(df[p])+1, 2))
fig.savefig("test.png")
plt.show()
I'm not entirely certain this is what you're looking for, but you can adjust the y-limits explicitly to change the scale, e.g.
ax.set_ylim([ax.get_ylim()[0], 42])
This sets only the upper bound and leaves the lower limit unchanged, which would give you
You can supply any values you find appropriate, e.g.
ax.set_ylim([22, 52])
will give you something that looks like
Also note that the tick labels and general appearance of your plot will differ from what is shown here.
Edit - Here is the complete code as requested:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
s=['05/02/2019', '06/02/2019', '07/02/2019', '08/02/2019',
'09/02/2019', '10/02/2019', '11/02/2019', '12/02/2019',
'13/02/2019', '20/02/2019', '21/02/2019', '22/02/2019',
'23/02/2019', '24/02/2019', '25/02/2019']
df[0]=['38.02','33.79','34.73','36.47','35.03','33.45',
'33.82','33.38','34.68','36.93','33.44','33.55',
'33.18','33.07','33.17']
# Data for plotting
fig, ax = plt.subplots(figsize=(17, 3))
#for i,j in zip(s,df[0]):
# ax.annotate(str(j),xy=(i,j+0.8))
ax.plot(s, pd.to_numeric(df[0]))
ax.set(xlabel='Dates', ylabel='Latency',
title='Hongkong to sing')
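# fix one tick per date before labelling them; the dates are day/month/year
ax.set_xticks(range(len(s)))
ax.set_xticklabels(pd.to_datetime(s, format='%d/%m/%Y').strftime('%m.%d'), rotation=45)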
ax.set_ylim([22, 52])
plt.show()
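If the goal is for the axis to be centred exactly at 38, the limits can be computed instead of hand-picked: take the largest deviation from 38, multiply it by a factor (larger factors flatten the humps more) and set symmetric limits. The helper centered_ylim and its scale factor of 3 are hypothetical choices for this sketch; with this data they give limits close to the [22, 52] used above.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

latency = pd.to_numeric(pd.Series(['38.02', '33.79', '34.73', '36.47', '35.03', '33.45',
                                   '33.82', '33.38', '34.68', '36.93', '33.44', '33.55',
                                   '33.18', '33.07', '33.17']))


def centered_ylim(values, center, scale=3.0):
    """Hypothetical helper: symmetric y-limits around `center`. The half-range is
    `scale` times the largest deviation from `center`, so a larger `scale`
    flattens the curve more."""
    half = scale * np.max(np.abs(np.asarray(values) - center))
    return center - half, center + half


fig, ax = plt.subplots(figsize=(17, 3))
ax.plot(latency.values)
ax.set_ylim(*centered_ylim(latency, center=38))
```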

Python-Matplotlib boxplot. How to show percentiles 0,10,25,50,75,90 and 100?

I would like to plot an EPSgram (see below) using Python and Matplotlib.
The boxplot function only plots quartiles (0, 25, 50, 75, 100). So, how can I add two more boxes?
I put together a sample, if you're still curious. It uses scipy.stats.scoreatpercentile, but you may be getting those numbers from elsewhere:
from random import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import scoreatpercentile
x = np.array([random() for _ in range(100)])
# percentiles of interest
perc = [min(x), scoreatpercentile(x,10), scoreatpercentile(x,25),
scoreatpercentile(x,50), scoreatpercentile(x,75),
scoreatpercentile(x,90), max(x)]
midpoint = 0 # time-series time
fig = plt.figure()
ax = fig.add_subplot(111)
# min/max
ax.broken_barh([(midpoint-.01,.02)], (perc[0], perc[1]-perc[0]))
ax.broken_barh([(midpoint-.01,.02)], (perc[5], perc[6]-perc[5]))
# 10/90
ax.broken_barh([(midpoint-.1,.2)], (perc[1], perc[2]-perc[1]))
ax.broken_barh([(midpoint-.1,.2)], (perc[4], perc[5]-perc[4]))
# 25/75
ax.broken_barh([(midpoint-.4,.8)], (perc[2], perc[3]-perc[2]))
ax.broken_barh([(midpoint-.4,.8)], (perc[3], perc[4]-perc[3]))
ax.set_ylim(-0.5,1.5)
ax.set_xlim(-10,10)
ax.set_yticks([0,0.5,1])
ax.grid(True)
plt.show()
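The same figure without the SciPy dependency, as a sketch: np.percentile with its default linear interpolation should agree with scoreatpercentile's default, and a small hypothetical helper eps_box draws the six nested boxes at a given x position, so several time steps can share one axes. If whiskers at the 10th and 90th percentiles are enough, plt.boxplot(x, whis=(10, 90)) is a simpler route, though it draws only one box.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def eps_box(ax, values, position=0.0):
    """Hypothetical helper: nested boxes for the 0-10-25-50-75-90-100
    percentiles of `values`, centred at x = `position`. Returns the percentiles."""
    p = np.percentile(values, [0, 10, 25, 50, 75, 90, 100])
    # (half-width of the box, index of lower percentile, index of upper percentile)
    for half_width, lo, hi in [(0.01, 0, 1), (0.01, 5, 6),   # min-10 and 90-max
                               (0.1, 1, 2), (0.1, 4, 5),     # 10-25 and 75-90
                               (0.4, 2, 3), (0.4, 3, 4)]:    # 25-50 and 50-75
        ax.broken_barh([(position - half_width, 2 * half_width)], (p[lo], p[hi] - p[lo]))
    return p


rng = np.random.default_rng(0)
fig, ax = plt.subplots()
perc = eps_box(ax, rng.random(100), position=0.0)
ax.set_xlim(-1, 1)
ax.grid(True)
```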
