Is it possible to put numbers on top of a matplot histogram? - python

import matplotlib.pyplot as plt
import numpy as np
randomnums = np.random.normal(loc=9,scale=6, size=400).astype(int)+15
Output:
array([25, 22, 19, 26, 24, 9, 19, 32, 30, 25, 29, 17, 21, 14, 17, 27, 27,
28, 17, 17, 20, 21, 16, 28, 20, 24, 15, 20, 20, 13, 33, 21, 30, 27,
8, 22, 24, 25, 23, 13, 24, 20, 16, 32, 15, 26, 34, 16, 21, 21, 28,
22, 23, 18, 20, 22, 23, 22, 23, 26, 22, 25, 19, 29, 14, 27, 21, 23,
24, 19, 25, 15, 22, 23, 19, 19, 23, 21, 22, 17, 25, 15, 24, 25, 23 ...
h = sorted(randomnums)
plt.hist(h,density=False)
plt.show()
Output: [histogram image]
From my research, I only found how to put numbers on top of the bars of a bar chart, but I want to put them on top of the bars of a histogram. Is it possible?

An adapted version of the answer I linked in the comments of the question. Thanks a lot for the suggestions in the comments below this post!
import matplotlib.pyplot as plt
import numpy as np
h = np.random.normal(loc=9,scale=6, size=400).astype(int)+15
fig, ax = plt.subplots(figsize=(16, 10))
ax.hist(h, density=False)
# each histogram bar is a Rectangle patch; label it with its height (the bin count)
for rect in ax.patches:
    height = rect.get_height()
    ax.annotate(f'{int(height)}', xy=(rect.get_x()+rect.get_width()/2, height),
                xytext=(0, 5), textcoords='offset points', ha='center', va='bottom')
plt.show()
...gives, e.g., a histogram with each bar's count printed above it. [image]
See also: matplotlib.axes.Axes.annotate.
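If the installed Matplotlib is version 3.4 or newer, `Axes.bar_label` may be a shorter route to the same result. The sketch below assumes that version. The seeded generator and the non-interactive `Agg` backend are my additions so that it runs reproducibly without a display. `ax.hist` returns the bar container as its third value, which is passed straight to `bar_label`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# seeded generator (an addition for reproducibility, not in the original post)
rng = np.random.default_rng(0)
h = rng.normal(loc=9, scale=6, size=400).astype(int) + 15

fig, ax = plt.subplots(figsize=(8, 5))
# for a single dataset, hist returns (counts, bin edges, BarContainer)
counts, bin_edges, bars = ax.hist(h, density=False)
# one annotation per bar, placed 5 points above the bar's top edge
labels = ax.bar_label(bars, fmt="%d", padding=5)
# call plt.show() or fig.savefig(...) here as needed
```

`bar_label` returns the list of annotation objects, so font size or colour can still be adjusted afterwards.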

Related

how to detect the peaks above 23 using scipy

I have some random data that I plotted in order to find the peaks that originate from zero. I used this code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
x = np.array([0, 7, 18, 24, 26, 27, 26, 25, 26, 16, 20, 16, 23, 33, 27, 27,
22, 26, 27, 26, 25, 24, 25, 26, 23, 25, 26, 24, 23, 12, 22, 11, 15, 24, 11,
12, 11, 27, 19, 25, 26, 21, 23, 26, 13, 9, 22, 18, 23, 26, 26, 25, 10, 22,
27, 25, 19, 10, 15, 20, 21, 13, 16, 16, 15, 19, 17, 20, 24, 26, 20, 23, 23,
25, 19, 15, 16, 27, 26, 27, 28, 24, 23, 24, 27, 28, 30, 31, 30, 9, 0, 11,
16, 25, 25, 22, 25, 25, 11, 15, 24, 24, 24, 17, 0, 23, 21, 0, 24, 26, 24,
26, 26, 26, 24, 25, 24, 24, 22, 22, 22, 23, 24, 26])
zero_locs = np.where(x==0) # find zeros in x
search_lims = np.append(zero_locs, len(x)) # limits for search area
diff_x = np.diff(x) # find the derivative of x
diff_x_mapped = diff_x > 0 # True where x is rising; the first False after a zero marks a peak
# from every zero, search for the first peak within the range of current
# zero location to next zero location
peak_locs = []
for i in range(len(search_lims)-1):
    peak_locs.append(search_lims[i] +
                     np.where(diff_x_mapped[search_lims[i]:search_lims[i+1]]==0)[0][0])
fig= plt.figure(figsize=(19,5))
plt.plot(x)
plt.plot(np.array(peak_locs), x[np.array(peak_locs)], "x", color = 'r')
This is my code. The data is vehicle speed. For every run from 0 up to the maximum peak the detection works, but it should only detect peaks above 26. I tried but was unable to get that to work. My graph output is: [image]
Only the starting point is detected correctly. The remaining 3 points should not be detected; only peaks that rise from 0 to above 26 should be. How can I do this?
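One possible reading of the requirement is "for each stretch that starts at a zero, report the first local peak whose value is strictly above a threshold". The sketch below follows that reading only; the helper name `first_peak_above` and the short `speed` array are hypothetical, chosen so the result is easy to check by hand. It uses `scipy.signal.find_peaks` to locate local maxima and filters them by value afterwards.

```python
import numpy as np
from scipy.signal import find_peaks

def first_peak_above(x, threshold):
    """Hypothetical helper: first local peak > threshold after each zero."""
    x = np.asarray(x)
    peaks, _ = find_peaks(x)              # indices of all local maxima
    peaks = peaks[x[peaks] > threshold]   # keep only peaks strictly above the threshold
    zeros = np.flatnonzero(x == 0)        # where each stretch starts
    bounds = np.append(zeros, len(x))     # stretch i spans bounds[i] .. bounds[i+1]
    result = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        in_stretch = peaks[(peaks > start) & (peaks < stop)]
        if in_stretch.size:               # stretches without a qualifying peak are skipped
            result.append(int(in_stretch[0]))
    return result

# made-up speed trace with three stretches starting at index 0, 6 and 10
speed = np.array([0, 7, 18, 24, 27, 25, 0, 10, 20, 15, 0, 12, 28, 30, 29])
print(first_peak_above(speed, 26))  # [4, 13]
```

The middle stretch peaks at 20, which is below the threshold, so nothing is reported for it. If the threshold should be inclusive, `find_peaks(x, height=threshold)` would do the filtering directly.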

how to change Label font size in Seaborn Bar plot?

I want to change the font size of the labels on my bars. The snippet below isn't working: the labels almost overlap each other.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
sns.set_context('paper')
report_id = ['Report_1', 'Report_2', 'Report_3', 'Report_4', 'Report_5', 'Report_6', 'Report_7', 'Report_8', 'Report_9',
'Report_10', 'Report_11', 'Report_12', 'Report_13', 'Report_14', 'Report_15', 'Report_16', 'Report_17',
'Report_18', 'Report_19', 'Report_20', 'Report_21', 'Report_22', 'Report_23', 'Report_24', 'Report_25',
'Report_26', 'Report_27', 'Report_28', 'Report_29', 'Report_30', 'Report_31', 'Report_32', 'Report_33',
'Report_34', 'Report_35', 'Report_36', 'Report_37', 'Report_38', 'Report_39', 'Report_40', 'Report_41',
'Report_42', 'Report_43', 'Report_44', 'Report_45', 'Report_46', 'Report_47', 'Report_48', 'Report_49',
'Report_50', 'Report_51', 'Report_52', 'Report_53', 'Report_54', 'Report_55', 'Report_56', 'Report_57',
'Report_58', 'Report_59', 'Report_60']
report_value = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60]
df = pd.DataFrame({'report_id': report_id, 'report_value': report_value})
sns.set(rc={'figure.figsize': (12, 60)})
ax = sns.barplot(y="report_id", x="report_value", data=df, palette="GnBu_d")
ax.tick_params(labelsize=3)
initialx = 0
for p in ax.patches:
    ax.text(p.get_width(), initialx + p.get_height() / 10, "{:1.0f}".format(p.get_width()))
    initialx += 1
plt.show()
Output image: [bar plot with overlapping value labels]
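`ax.text` accepts the usual text keyword arguments, so passing `fontsize` there is one way to size the bar labels independently of the tick labels. The sketch below is a stand-in built on assumptions: it uses plain `ax.barh` with ten made-up reports instead of the seaborn call, the `Agg` backend so it runs without a display, and an arbitrary font size of 8. The same loop over `ax.patches` should apply to the axes returned by `sns.barplot`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt

# shortened, made-up data in the same shape as the question's
report_id = [f"Report_{i}" for i in range(1, 11)]
report_value = list(range(1, 11))

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(report_id, report_value)

labels = []
for p in ax.patches:
    labels.append(
        ax.text(
            p.get_width(),                   # x: at the end of the bar
            p.get_y() + p.get_height() / 2,  # y: vertical centre of the bar
            f"{p.get_width():.0f}",
            fontsize=8,                      # label size, independent of tick labels
            va="center",
        )
    )
```

Taking the y position from each patch, rather than from a running counter, keeps the labels aligned even if the bar spacing changes.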

Return deviant value from mean values in a graph

I have the following file, data.txt.
This file contains a number of bounding boxes and their respective heights. I have written a function to extract the heights of all the boxes from the JSON input in data.txt:
heights [43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19,
20, 20, 20, 21, 20, 20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19,
20, 13, 20, 18, 18, 13, 12, 19, 25, 17, 13, 38, 38, 20, 19, 16]
I wrote the following script to plot the height of each box:
import matplotlib.pyplot as plt
import seaborn as sns
# heights is the list extracted above
box_number=[]
box_height=[]
for index2, num2 in enumerate(heights):
    print('box number',index2, 'box height',num2)
    box_number.append(index2)
    box_height.append(num2)
#ax = sns.lineplot(x=box_number, y=box_height)
# recent seaborn versions require x and y as keyword arguments
ax = sns.stripplot(x=box_number, y=box_height)
ax.set(xlabel ='box number', ylabel ='height of box')
# giving title to the plot
plt.title('My first graph')
# function to show plot
plt.show()
Here's the output: [strip plot image]
I want to write a function that prints the boxes that are very tall, i.e. whose height deviates strongly from the mean height. In short, it should print box numbers 0, 44 and 45. How can I do this?
(Each time I will get a different set of boxes, but I have to find the mean of their heights and print the boxes that are too tall.)
There are several strategies to discover outliers. The definition of outlier is what matters at the end of the day. If you want a simple computation as you described, you can do something like this:
import numpy as np
# heights
hs = [43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20, 20, 20, 21, 20,
20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20, 13, 20, 18, 18, 13, 12, 19,
25, 17, 13, 38, 38, 20, 19, 16]
# let's say that an outlier is a height that is farther than 2*std from the mean
outliers_definition = np.abs(hs - np.mean(hs)) > 2 * np.std(hs)
# you can get their indexes this way
outliers_idx = np.argwhere(outliers_definition)
print(outliers_idx)
# array([[ 0],
#        [44],
#        [45]], dtype=int64)
Notice that the mean here also takes the outliers into account, so the outliers themselves shift the reference value. You could use the median instead, for example. If you want something more robust, there is a vast literature on outlier detection, and I recommend taking a look at it.
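As a rough illustration of the median-based idea, the sketch below scores each height by its distance from the median in units of the median absolute deviation (MAD). The 0.6745 scaling and the 3.5 cutoff are a common convention for this "modified z-score", not something from the question, and the test is one-sided because only boxes that are too tall are wanted.

```python
import numpy as np

hs = np.array([43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20, 20, 20, 21, 20,
               20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20, 13, 20, 18, 18, 13, 12, 19,
               25, 17, 13, 38, 38, 20, 19, 16])

median = np.median(hs)
# median absolute deviation: a spread estimate that outliers barely influence
mad = np.median(np.abs(hs - median))
# modified z-score; assumes mad > 0 (it is 0 when more than half the values are identical)
modified_z = 0.6745 * (hs - median) / mad
# one-sided: flag only boxes far ABOVE the median (cutoff 3.5 is a convention, tune as needed)
tall_idx = np.flatnonzero(modified_z > 3.5)
print(tall_idx)  # [ 0 44 45]
```

With this data the median is 19 and the MAD is 2, so the two very short boxes (height 8) are ignored by the one-sided test while 43, 38 and 38 are flagged.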

How do I initialize a numpy array starting at a particular number?

I can initialize a numpy array and reshape it at the time of creation.
test = np.arange(32).reshape(4, 8)
which produces this:
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
... but I'd like to know how to start the sequential numbering at a given point, say at 13 rather than at 0. How is that done in numpy?
I've looked for answers and found something somewhat similar but it seems there would be a numpy command to do this.
arange takes an optional start argument.
start = 13 # Any number works here
np.arange(start, start + 32).reshape(4, 8)
# array([[13, 14, 15, 16, 17, 18, 19, 20],
# [21, 22, 23, 24, 25, 26, 27, 28],
# [29, 30, 31, 32, 33, 34, 35, 36],
# [37, 38, 39, 40, 41, 42, 43, 44]])
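An equivalent formulation, sketched here only to show the alternative, is to build the zero-based array first and add the offset afterwards. Broadcasting applies the scalar to every element, so both routes should give the same array.

```python
import numpy as np

start = 13
a = np.arange(start, start + 32).reshape(4, 8)  # start passed to arange
b = np.arange(32).reshape(4, 8) + start         # offset added after reshaping
print(np.array_equal(a, b))  # True
```

The first form avoids creating an intermediate array, which is why it is usually the more direct choice.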

How to implement a plot of the regression model in my code?

I have a little school project and would like to show the plot of the function in any way, maybe like this:
I know that my code is probably bad, and if you have any improvements, just throw them at me.
This is the code I have worked on so far... I coded the data into the program by hand.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn import linear_model
Xtrain = np.array([[15, 15, 20, 30, 20, 20],
[10, 10, 15, 25, 15, 15],
[20, 20, 25, 35, 25, 25],
[20, 20, 30, 20, 30, 20],
[15, 15, 25, 15, 25, 15],
[25, 25, 35, 25, 35, 25],
[30, 30, 30, 30, 10, 10],
[25, 25, 25, 25, 10, 10],
[35, 25, 35, 35, 15, 15],
[20, 20, 30, 25, 30, 25],
[15, 15, 25, 20, 25, 20],
[25, 25, 35, 30, 35, 30],
[10, 10, 15, 25, 30, 20],
[10, 10, 10, 20, 25, 15],
[20, 20, 20, 30, 35, 25],
[20, 25, 25, 20, 30, 20],
[15, 20, 20, 15, 25, 15],
[25, 30, 30, 25, 35, 25]])
ytrain = np.array([20, 15, 25, 20, 15, 25, 15, 10, 20, 20, 15, 25, 15, 10, 20, 20, 15, 25])
lr = LogisticRegression().fit(Xtrain, ytrain)
yhat = lr.predict(Xtrain)
print (accuracy_score(ytrain, yhat))
The problem is that your Xtrain (in other words, your x-axis) is composed of 6 variables, which means it is 6-dimensional. On top of that there is the y dimension of ytrain, for a total of 7 dimensions. It is very hard to visualize 7 dimensions on a 2D diagram. However, suppose you want to plot the first column of Xtrain against ytrain and overlay the predicted yhat on the same axes; you can do that as below. Please note that this will not serve your original purpose of plotting the full Xtrain.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn import linear_model
import matplotlib.pyplot as plt
Xtrain = np.array([[15, 15, 20, 30, 20, 20],
[10, 10, 15, 25, 15, 15],
[20, 20, 25, 35, 25, 25],
[20, 20, 30, 20, 30, 20],
[15, 15, 25, 15, 25, 15],
[25, 25, 35, 25, 35, 25],
[30, 30, 30, 30, 10, 10],
[25, 25, 25, 25, 10, 10],
[35, 25, 35, 35, 15, 15],
[20, 20, 30, 25, 30, 25],
[15, 15, 25, 20, 25, 20],
[25, 25, 35, 30, 35, 30],
[10, 10, 15, 25, 30, 20],
[10, 10, 10, 20, 25, 15],
[20, 20, 20, 30, 35, 25],
[20, 25, 25, 20, 30, 20],
[15, 20, 20, 15, 25, 15],
[25, 30, 30, 25, 35, 25]])
ytrain = np.array([20, 15, 25, 20, 15, 25, 15, 10, 20, 20, 15, 25, 15, 10, 20, 20, 15, 25])
lr = LogisticRegression().fit(Xtrain, ytrain)
yhat = lr.predict(Xtrain)
plt.scatter(x=Xtrain[:,0],y=ytrain,color="blue")
plt.scatter(x=Xtrain[:,0],y=yhat,color="red")
plt.show()
The output is a scatter plot with the observed values in blue and the predicted values in red. [image] The predicted and observed values are very close in this case. Please let me know if my explanation makes sense or if I misread the requirement completely.
