How to build a histogram of numpy 2 dimensional array - python

I have a 256x256 matrix of values and I would like to plot a histogram of these values.
If I am not mistaken, the histogram must be calculated on a vector of values, correct? Here is what I have tried:
from skimage.measure import compare_ssim
import numpy as np
import matplotlib.pyplot as plt
d = np.load("BB_Digital.npy")
n, bins, patches = plt.hist(x=d.ravel(), color='#0504aa', bins='auto', alpha=0.7, rwidth=0.85)
plt.grid(axis='y', alpha=0.75)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Blue channel Co-occurency matrix')
maxfreq = n.max()
# Set a clean upper y-axis limit.
plt.ylim(ymax=np.ceil(maxfreq / 10) * 10 if maxfreq % 10 else maxfreq + 10)
plt.show()
But then, I have a very strange result:
When I don't use the ravel function (use the 2D matrix) the following result is shown:
However, both histograms seem to be wrong, as I verified later:
>>> np.count_nonzero(d==0)
51227
>>> np.count_nonzero(d==1)
2529
>>> np.count_nonzero(d==2)
1275
>>> np.count_nonzero(d==3)
885
>>> np.count_nonzero(d==4)
619
>>> np.count_nonzero(d==5)
490
>>> np.count_nonzero(d==6)
403
>>> np.max(d)
12518
>>> np.min(d)
0
How can I build a correct histogram?
P.s: Here is the file if you could help me.

The data seems to be discrete. Setting explicit bin boundaries at the halves could show the frequency of each value. As there are very high but infrequent values, the following example cuts off at 50:
import numpy as np
from matplotlib import pyplot as plt
d = np.load("BB_Digital.npy")
plt.hist(d.ravel(), bins=np.arange(-0.5, 51), color='#0504aa', alpha=0.7, rwidth=0.85)
plt.yscale('log')
plt.margins(x=0.02)
plt.show()
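As a cross-check on any histogram of discrete data, the exact per-value counts can be computed directly. The following is a minimal sketch, assuming the array holds non-negative integers (as the count_nonzero output in the question suggests); the small array `d` is made up and only stands in for the contents of BB_Digital.npy:

```python
import numpy as np

# Hypothetical stand-in for np.load("BB_Digital.npy"); assumed non-negative integers.
d = np.array([[0, 0, 1, 3],
              [0, 2, 1, 0],
              [0, 0, 0, 7],
              [1, 0, 2, 0]])

# np.unique returns each distinct value together with how often it occurs.
values, counts = np.unique(d.ravel(), return_counts=True)

# np.bincount gives the same counts indexed by value (zero for absent values).
dense_counts = np.bincount(d.ravel())

for v, c in zip(values, counts):
    print(f"value {v}: {c} occurrences")
```

These counts should agree with the bar heights of a histogram whose bin edges sit at the half-integers, and they could also be drawn directly with plt.bar(values, counts).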
Another visualization could show a pcolormesh where the colors use a logarithmic scale. As the values start at 0, adding 1 avoids minus infinity:
from matplotlib import pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np
d = np.load("BB_Digital.npy")
plt.pcolormesh(d + 1, norm=LogNorm(), cmap='inferno')
plt.colorbar()
plt.show()
Yet another visualization concentrates on the diagonal values:
plt.plot(np.diagonal(d), color='navy')
ind_max = np.argmax(np.diagonal(d))
plt.vlines(ind_max, 0, d[ind_max, ind_max], colors='crimson', ls=':')
plt.yscale('log')

Related

how to reduce y-axis in matplot with same distance

I want this plot's y-axis to be centered at 38, and the y-axis scaled such that the 'humps' disappear. How do I accomplish this?
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
s=['05/02/2019', '06/02/2019', '07/02/2019', '08/02/2019',
'09/02/2019', '10/02/2019', '11/02/2019', '12/02/2019',
'13/02/2019', '20/02/2019', '21/02/2019', '22/02/2019',
'23/02/2019', '24/02/2019', '25/02/2019']
df[0]=['38.02', '33.79', '34.73', '36.47', '35.03', '33.45',
'33.82', '33.38', '34.68', '36.93', '33.44', '33.55',
'33.18', '33.07', '33.17']
# Data for plotting
fig, ax = plt.subplots(figsize=(17, 2))
for i,j in zip(s,df[0]):
    ax.annotate(str(j),xy=(i,j+0.8))
ax.plot(s, df[0])
ax.set(xlabel='Dates', ylabel='Latency',
title='Hongkong to sing')
ax.grid()
#plt.yticks(np.arange(min(df[p]), max(df[p])+1, 2))
fig.savefig("test.png")
plt.show()
I'm not entirely certain if this is what you're looking for, but you can adjust the y-limits explicitly to change the scale, e.g.
ax.set_ylim([ax.get_ylim()[0], 42])
This sets only the upper bound and leaves the lower limit unchanged.
You can supply any values you find appropriate, e.g.
ax.set_ylim([22, 52])
which widens the visible range, so the variation in the data appears flatter.
Also note that the tick labels and general appearance of your plot will differ from what is shown here.
Edit - Here is the complete code as requested:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame()
s=['05/02/2019', '06/02/2019', '07/02/2019', '08/02/2019',
'09/02/2019', '10/02/2019', '11/02/2019', '12/02/2019',
'13/02/2019', '20/02/2019', '21/02/2019', '22/02/2019',
'23/02/2019', '24/02/2019', '25/02/2019']
df[0]=['38.02','33.79','34.73','36.47','35.03','33.45',
'33.82','33.38','34.68','36.93','33.44','33.55',
'33.18','33.07','33.17']
# Data for plotting
fig, ax = plt.subplots(figsize=(17, 3))
#for i,j in zip(s,df[0]):
# ax.annotate(str(j),xy=(i,j+0.8))
ax.plot(s, pd.to_numeric(df[0]))
ax.set(xlabel='Dates', ylabel='Latency',
title='Hongkong to sing')
# the date strings are day-first (dd/mm/yyyy), so parse them accordingly
ax.set_xticklabels(pd.to_datetime(s, dayfirst=True).strftime('%m.%d'), rotation=45)
ax.set_ylim([22, 52])
plt.show()
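If the axis should be centered on a particular value such as 38, the limits can be computed rather than guessed. This is only a sketch: `centered_ylim` and its `pad` parameter are hypothetical names introduced here, and the latency list is a shortened copy of the question's data.

```python
def centered_ylim(values, center, pad=1.0):
    """Return (low, high) limits symmetric about `center` that cover all values."""
    # The largest distance from the center decides how far the axis must reach.
    half_span = max(abs(v - center) for v in values) + pad
    return center - half_span, center + half_span


latency = [38.02, 33.79, 34.73, 36.47, 35.03, 33.45]
low, high = centered_ylim(latency, center=38, pad=1.0)
print(low, high)
```

The result can be passed to ax.set_ylim(low, high). A larger `pad` stretches the range further and makes the curve look flatter.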

Labelling a matplotlib histogram bin with an arrow

I have a histogram plot which could be replicated with the MWE below:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
pd.Series(np.random.normal(0, 100, 1000)).plot(kind='hist', bins=50)
Which creates a plot like this:
How would I then go about labelling the bin with an arrow for a given integer?
For example see below, where an arrow labels the bin containing the integer 300.
EDIT: I should add ideally the y coordinates of the arrow should be set automatically by the height of the bar it is labelling - if possible!
You can use annotate to add an arrow:
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns
import numpy as np
fig, ax = plt.subplots()
series = pd.Series(np.random.normal(0, 100, 1000))
series.plot(kind='hist', bins=50, ax=ax)
ax.annotate("",
xy=(300, 5), xycoords='data',
xytext=(300, 20), textcoords='data',
arrowprops=dict(arrowstyle="->",
connectionstyle="arc3"),
)
In this example, I added an arrow that goes from coordinates (300, 20) to (300, 5).
In order to automatically scale your arrow to the value in the bin, you can use matplotlib hist to plot the histogram and get the values back and then use numpy where to find which bin corresponds to the desired position.
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns
import numpy as np
nbins = 50
labeled_bin = 200
fig, ax = plt.subplots()
series = pd.Series(np.random.normal(0, 100, 1000))
## plot the histogram and return the bin position and values
ybins, xbins, _ = ax.hist(series, bins=nbins)
## find out in which bin belongs the position where you want the label
ind_bin = np.where(xbins >= labeled_bin)[0]
if len(ind_bin) > 0 and ind_bin[0] > 0:
    ## get position and value of the bin
    x_bin = xbins[ind_bin[0]-1]/2. + xbins[ind_bin[0]]/2.
    y_bin = ybins[ind_bin[0]-1]
    ## add the arrow
    ax.annotate("",
                xy=(x_bin, y_bin + 5), xycoords='data',
                xytext=(x_bin, y_bin + 20), textcoords='data',
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="arc3"),
                )
else:
    print("Labeled bin is outside range")
@Julien Spronck showed the best way, I think. Alternatively, you can also use arrow; the example code can be found below. The y-coordinate is determined automatically by counting how many elements fall near the chosen position (within a tolerance which you can define yourself). You can play with the parameters (length of the arrow head, length of the arrow). Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
mySer = pd.Series(np.random.normal(0, 100, 1000))
mySer.plot(kind='hist', bins=50)
# that is where you want to add the arrow
ind = 200
# determine how many elements you have in the bin (with a certain tolerance)
n = len(mySer[(mySer > ind*0.95) & (mySer < ind*1.05)])
# define length of the arrow
lenArrow = 10
lenHead = 2
wiArrow = 5
plt.arrow(ind, n+lenArrow+lenHead, 0, -lenArrow, head_width=wiArrow+3, head_length=lenHead, width=wiArrow, fc='k', ec='k')
plt.show()
This gives you the following output (for 200 instead of 300 as in your example):
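Both answers need to know which bin contains a given value. A possible way to look that up directly from the bin edges is sketched below; `bin_for_value` is a hypothetical helper, and the seeded random data is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed; illustrative data only
samples = rng.normal(0, 100, 1000)
counts, edges = np.histogram(samples, bins=50)


def bin_for_value(value, edges):
    """Return the index of the bin containing `value`, or None if it is outside."""
    if value < edges[0] or value > edges[-1]:
        return None
    # side='right' assigns a value equal to a left edge to that bin;
    # the min() keeps the last edge in the final bin, as np.histogram does.
    idx = np.searchsorted(edges, value, side='right') - 1
    return int(min(idx, len(edges) - 2))


i = bin_for_value(200, edges)
if i is not None:
    x_center = 0.5 * (edges[i] + edges[i + 1])   # horizontal position for the arrow
    height = counts[i]                           # bar height the arrow should point at
```

With `x_center` and `height` in hand, the arrow can be placed with ax.annotate as in the first answer, using the exact bar height instead of a tolerance band.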

Ordered colored plot after clustering using python

I have a 1D array called data=[5 1 100 102 3 4 999 1001 5 1 2 150 180 175 898 1012]. I am using python scipy.cluster.vq to find clusters within it. There are 3 clusters in the data. After clustering when I'm trying to plot the data, there is no order in it.
It would be great if it's possible to plot the data in the same order as it is given and color different sections belong to different groups or clusters.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq
data = np.loadtxt('rawdata.csv', delimiter=' ')
#----------------------kmeans------------------
centroid,_ = kmeans(data, 3)
idx,_ = vq(data, centroid)
x=np.linspace(0,(len(data)-1),len(data))
fig = plt.figure(1)
plt.plot(x,data)
plot1=plt.plot(data[idx==0],'ob')
plot2=plt.plot(data[idx==1],'or')
plot3=plt.plot(data[idx==2],'og')
plt.show()
Here is my plot
http://s29.postimg.org/9gf7noe93/figure_1.png
(The blue graph in the background is in-order, after clustering,it messed up)
Thanks!
Update :
I wrote the following code to implement in-order colored plot after clustering,
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq
data = np.loadtxt('rawdata.csv', delimiter=' ')
#----------------------kmeans-----------------------------
centroid,_ = kmeans(data, 3) # three clusters
idx,_ = vq(data, centroid)
x=np.linspace(0,(len(data)-1),len(data))
fig = plt.figure(1)
plt.plot(x,data)
for i in range(0,(len(data)-1)):
    if data[i] in data[idx==0]:
        plt.plot(x[i],(data[i]),'ob' )
    if data[i] in data[idx==1]:
        plt.plot(x[i],(data[i]),'or' )
    if data[i] in data[idx==2]:
        plt.plot(x[i],(data[i]),'og' )
plt.show()
The problem with the above code is that it is too slow. My array has over 3 million elements, so this code will take forever to finish its job.
I would really appreciate it if someone could provide a vectorized version of the above code.
Thanks!
You can plot each data point against its distance from its cluster center and annotate it with its index, which shows how the points scatter within their clusters:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, vq
from scipy.spatial.distance import cdist
data=np.array([ 5, 1, 100, 102, 3, 4, 999, 1001, 5, 1, 2, 150, 180, 175, 898, 1012])
centroid,_ = kmeans(data, 3)
idx,_ = vq(data, centroid)
X=data.reshape(len(data),1)
Y=centroid.reshape(len(centroid),1)
D_k = cdist( X, Y, metric='euclidean' )
colors = ['red', 'green', 'blue']
pId=range(0,(len(data)-1))
cIdx = [np.argmin(D) for D in D_k]
dist = [np.min(D) for D in D_k]
r=np.vstack((data,dist)).T
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
mark=['^','o','>']
for i, ((x,y), kls) in enumerate(zip(r, cIdx)):
    ax.plot(r[i,0],r[i,1],color=colors[kls],marker=mark[kls])
    ax.annotate(str(i), xy=(x,y), xytext=(0.5,0.5), textcoords='offset points',
                size=8,color=colors[kls])
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('Data')
ax.set_ylabel('Distance')
plt.show()
Update:
If you are keen on using a vectorized procedure, you can do it as follows for randomly generated data:
data=np.random.uniform(1,1000,3000)
# note: cIdx must be recomputed for this new data, as above
@np.vectorize
def plotting(i):
    ax.plot(i,data[i],color=colors[cIdx[i]],marker=mark[cIdx[i]])
mark=['>','o','^']
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
plotting(range(len(data)))
ax.set_xlabel('index')
ax.set_ylabel('Data')
plt.show()
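Note that np.vectorize still calls the function once per element, so it is unlikely to help much with millions of points. A single scatter call that receives one color per point keeps the original order and avoids the Python loop. The sketch below is one possible approach, not the answerer's method: the centroids are assumed values chosen for illustration (in practice they would come from kmeans), and the labels are assigned with plain numpy broadcasting instead of vq.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless; remove for interactive use
import matplotlib.pyplot as plt

data = np.array([5, 1, 100, 102, 3, 4, 999, 1001, 5, 1, 2,
                 150, 180, 175, 898, 1012], dtype=float)

# Assumed centroids for illustration; normally the output of kmeans(data, 3).
centroids = np.array([3.0, 150.0, 1000.0])

# Nearest-centroid label per point via broadcasting: shape (n_points, n_centroids).
labels = np.abs(data[:, None] - centroids[None, :]).argmin(axis=1)

palette = np.array(['blue', 'red', 'green'])
x = np.arange(len(data))

fig, ax = plt.subplots()
ax.plot(x, data, color='lightgray', zorder=1)                    # data in its original order
ax.scatter(x, data, c=palette[labels].tolist(), s=20, zorder=2)  # one call colors every point
ax.set_xlabel('index')
ax.set_ylabel('Data')
```

Indexing the palette with the label array is what replaces the per-point if statements; call plt.show() or fig.savefig(...) to view the result.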

Using Colormaps to set color of line in matplotlib

How does one set the color of a line in matplotlib with scalar values provided at run time using a colormap (say jet)? I tried a couple of different approaches here and I think I'm stumped. values[] is a sorted array of scalars, curves is a set of 1-d arrays, and labels is an array of text strings. All of the arrays have the same length.
fig = plt.figure()
ax = fig.add_subplot(111)
jet = colors.Colormap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    retLine, = ax.plot(line, color=colorVal)
    #retLine.set_color()
    lines.append(retLine)
ax.legend(lines, labels, loc='upper right')
ax.grid()
plt.show()
The error you are receiving is due to how you define jet. You are creating the base class Colormap with the name 'jet', but this is very different from getting the default definition of the 'jet' colormap. This base class should never be created directly, and only the subclasses should be instantiated.
What you've found with your example is a buggy behavior in Matplotlib. There should be a clearer error message generated when this code is run.
This is an updated version of your example:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
import numpy as np
# define some random data that emulates your intended code:
NCURVES = 10
np.random.seed(101)
curves = [np.random.random(20) for i in range(NCURVES)]
values = range(NCURVES)
fig = plt.figure()
ax = fig.add_subplot(111)
# replace the next line
#jet = colors.Colormap('jet')
# with
jet = cm = plt.get_cmap('jet')
cNorm = colors.Normalize(vmin=0, vmax=values[-1])
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=jet)
print(scalarMap.get_clim())
lines = []
for idx in range(len(curves)):
    line = curves[idx]
    colorVal = scalarMap.to_rgba(values[idx])
    colorText = (
        'color: (%4.2f,%4.2f,%4.2f)'%(colorVal[0],colorVal[1],colorVal[2])
    )
    retLine, = ax.plot(line,
                       color=colorVal,
                       label=colorText)
    lines.append(retLine)
#added this to get the legend to work
handles,labels = ax.get_legend_handles_labels()
ax.legend(handles, labels, loc='upper right')
ax.grid()
plt.show()
Resulting in:
Using a ScalarMappable is an improvement over the approach presented in my related answer:
creating over 20 unique legend colors using matplotlib
I thought it would be beneficial to include what I consider to be a simpler method using numpy's linspace coupled with matplotlib's cm-type object. It's possible that the above solution is for an older version. I am using Python 3.4.3, matplotlib 1.4.3, and numpy 1.9.3, and my solution is as follows.
import matplotlib.pyplot as plt
from matplotlib import cm
from numpy import linspace
start = 0.0
stop = 1.0
number_of_lines= 1000
cm_subsection = linspace(start, stop, number_of_lines)
colors = [ cm.jet(x) for x in cm_subsection ]
for i, color in enumerate(colors):
    plt.axhline(i, color=color)
plt.ylabel('Line Number')
plt.show()
This results in 1000 uniquely-colored lines that span the entire cm.jet colormap as pictured below. If you run this script you'll find that you can zoom in on the individual lines.
Now say I want my 1000 line colors to just span the greenish portion between lines 400 to 600. I simply change my start and stop values to 0.4 and 0.6 and this results in using only 20% of the cm.jet color map between 0.4 and 0.6.
So in a one line summary you can create a list of rgba colors from a matplotlib.cm colormap accordingly:
colors = [ cm.jet(x) for x in linspace(start, stop, number_of_lines) ]
In this case I use the commonly invoked map named jet but you can find the complete list of colormaps available in your matplotlib version by invoking:
>>> from matplotlib import cm
>>> dir(cm)
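The linspace approach spaces the colors evenly. When each curve should instead be colored by its own scalar value, as in the question, the values can be normalized first and then passed through the colormap in one call. The following is a rough sketch under that assumption; the `values` and `curves` here are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

values = np.array([0.5, 2.0, 3.5, 8.0])     # made-up scalars, one per curve
curves = [np.sin(np.linspace(0, 6, 50) + v) for v in values]

cmap = plt.get_cmap('jet')
norm = Normalize(vmin=values.min(), vmax=values.max())

# Normalize maps the scalars into [0, 1]; calling the colormap on that
# array returns an (n, 4) array of RGBA colors, one row per curve.
line_colors = cmap(norm(values))

fig, ax = plt.subplots()
for curve, value, color in zip(curves, values, line_colors):
    ax.plot(curve, color=color, label=f"value = {value}")
ax.legend(loc='upper right')
```

Unlike evenly spaced colors, this places two curves with similar scalar values close together on the colormap.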
A combination of line styles, markers, and qualitative colors from matplotlib:
import itertools
import matplotlib as mpl
import matplotlib.pyplot as plt
N = 8*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, colormap)):
    plt.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=4);
UPDATE: Supporting not only ListedColormap, but also LinearSegmentedColormap
import itertools
import matplotlib.pyplot as plt
Ncolors = 8
#colormap = plt.cm.Dark2# ListedColormap
colormap = plt.cm.viridis# LinearSegmentedColormap
Ncolors = min(colormap.N,Ncolors)
mapcolors = [colormap(int(x*colormap.N/Ncolors)) for x in range(Ncolors)]
N = Ncolors*4+10
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
fig,ax = plt.subplots(gridspec_kw=dict(right=0.6))
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, mapcolors)):
    ax.plot([0,1,2],[0,2*i,2*i], color=color, linestyle=linestyle,marker=marker,label=i)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=3,prop={'size': 8})
You can do it the way I described in an answer from my since-deleted account. It is rather simple and looks nice.
I usually use the third of the three versions below; I have not checked versions 1 and 2.
from matplotlib.pyplot import cm
import numpy as np
# n should be the number of curves to plot: n = len(array_of_curves_to_plot)
# version 1:
color=cm.rainbow(np.linspace(0,1,n))
for i,c in zip(range(n),color):
    ax1.plot(x, y,c=c)
# or version 2 (faster and better):
color=iter(cm.rainbow(np.linspace(0,1,n)))
c=next(color)
plt.plot(x,y,c=c)
# or version 3:
color=iter(cm.rainbow(np.linspace(0,1,n)))
for i in range(n):
    c=next(color)
    ax1.plot(x, y,c=c)
example of 3:
Ship RAO of Roll vs Ikeda damping in function of Roll amplitude A44

Python-Matplotlib boxplot. How to show percentiles 0,10,25,50,75,90 and 100?

I would like to plot an EPSgram (see below) using Python and Matplotlib.
The boxplot function only plots quartiles (0, 25, 50, 75, 100). So, how can I add two more boxes?
I put together a sample, if you're still curious. It uses scipy.stats.scoreatpercentile, but you may be getting those numbers from elsewhere:
from random import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import scoreatpercentile
x = np.array([random() for _ in range(100)])
# percentiles of interest
perc = [min(x), scoreatpercentile(x,10), scoreatpercentile(x,25),
scoreatpercentile(x,50), scoreatpercentile(x,75),
scoreatpercentile(x,90), max(x)]
midpoint = 0 # time-series time
fig = plt.figure()
ax = fig.add_subplot(111)
# min/max
ax.broken_barh([(midpoint-.01,.02)], (perc[0], perc[1]-perc[0]))
ax.broken_barh([(midpoint-.01,.02)], (perc[5], perc[6]-perc[5]))
# 10/90
ax.broken_barh([(midpoint-.1,.2)], (perc[1], perc[2]-perc[1]))
ax.broken_barh([(midpoint-.1,.2)], (perc[4], perc[5]-perc[4]))
# 25/75
ax.broken_barh([(midpoint-.4,.8)], (perc[2], perc[3]-perc[2]))
ax.broken_barh([(midpoint-.4,.8)], (perc[3], perc[4]-perc[3]))
ax.set_ylim(-0.5,1.5)
ax.set_xlim(-10,10)
ax.set_yticks([0,0.5,1])
ax.grid(True)
plt.show()
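The percentile values can also be obtained with numpy alone. This is a small sketch that swaps np.percentile in for scoreatpercentile (a substitution, not the answerer's code), with seeded random data standing in for real measurements:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed; illustrative data only
x = rng.random(100)

levels = [0, 10, 25, 50, 75, 90, 100]
perc = np.percentile(x, levels)         # one call returns all seven levels

# (bottom, height) pairs for the six stacked segments between adjacent levels,
# in the form expected as the second argument of broken_barh.
segments = [(lo, hi - lo) for lo, hi in zip(perc[:-1], perc[1:])]
```

Each entry of `segments` corresponds to one of the six broken_barh calls above, so the bars could be drawn in a loop with a width chosen per segment.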
