producing histogram with y axis as relative frequency?


Today my task is to produce a histogram where the y axis is a relative frequency rather than just an absolute count. I've located another question regarding this (see: Setting a relative frequency in a matplotlib histogram); however, when I try to implement it, I get the error message:
AttributeError: 'list' object has no attribute 'size'
despite having the exact same code given in the answer, and despite their data also being stored in a list.
In addition, I have tried the method here (http://www.bertplot.com/visualization/?p=229) to no avail, as the output still doesn't show the y axis ranging from 0 to 1.
import numpy as np
import matplotlib.pyplot as plt
import random
from tabulate import tabulate
import matplotlib.mlab as mlab
precision = 100000000000
def MarkovChain(n, s):
    """Build a random n x n row-stochastic matrix and track entries of its powers."""
    matrix = []
    for l in range(n):
        lineLst = []
        sum = 0
        crtPrec = precision
        for i in range(n-1):
            val = random.randrange(crtPrec)
            sum += val
            lineLst.append(float(val)/precision)
            crtPrec -= val
        lineLst.append(float(precision - sum)/precision)
        # list.append returns None, so append here and convert after the loop
        matrix.append(lineLst)
    matrix2 = np.array(matrix)
    print("The initial probability matrix.")
    print(tabulate(matrix2))
    baseprob = []
    baseprob2 = []
    baseprob3 = []
    baseprob4 = []
    for i in range(1, s):  # changed to do a range 1-s instead of 1000
        # must use the loop variable here, not s (s is always the same)
        matrix_n = np.linalg.matrix_power(matrix2, i)
        baseprob.append(matrix_n.item(0))
        baseprob2.append(matrix_n.item(1))
        baseprob3.append(matrix_n.item(2))
    baseprob = np.array(baseprob)
    baseprob2 = np.array(baseprob2)
    baseprob3 = np.array(baseprob3)
    baseprob4 = np.array(baseprob4)
    # Here I tried to make a histogram using the plt.hist() command, but density=True
    # (normed=True in older Matplotlib) doesn't work like I assumed it would.
    '''
    plt.hist(baseprob, bins=20, density=True)
    plt.show()
    '''
    # Here I tried to make a histogram using the method from the second link in my post.
    # The code runs, but the resulting graph doesn't have the relative frequency on the y axis.
    '''
    n, bins, patches = plt.hist(baseprob, bins=30, density=True, facecolor="green")
    y = mlab.normpdf(bins, mu, sigma)
    plt.plot(bins, y, 'b-')
    plt.title('Main Plot Title', fontsize=25, horizontalalignment='right')
    plt.ylabel('Count', fontsize=20)
    plt.yticks(fontsize=15)
    plt.xlabel('X Axis Label', fontsize=20)
    plt.xticks(fontsize=15)
    plt.show()
    '''
    # Here I tried to make a histogram using the method seen in the Stack Overflow question I mentioned.
    # The figure that pops out looks correct in terms of the axes, but no actual data is plotted.
    # Instead the error below is shown in the console.
    # AttributeError: 'list' object has no attribute 'size'
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.hist(baseprob, weights=np.zeros_like(baseprob) + 1./baseprob.size)
    n, bins, patches = ax.hist(baseprob, bins=100, density=True, cumulative=False)
    ax.set_xlabel('Bins', size=20)
    ax.set_ylabel('Frequency', size=20)
    plt.show()
    print("The final probability matrix.")
    print(tabulate(matrix_n))
    # np.linalg.eig needs an array; in Python 3, zip(*matrix_n) would be an iterator
    matrixTranspose = matrix_n.T
    evectors = np.linalg.eig(matrixTranspose)[1][:, 0]
    print("The steady state vector is:")
    print(evectors)
MarkovChain(5, 1000)
The first two methods I tried are commented out (wrapped in triple-quoted strings), so to reproduce their output, remove the surrounding quotes.
As you can tell, I'm really new to programming. Also, this is not for a homework assignment in a computer science class, so there are no moral issues associated with just providing me with code.

Matplotlib functions usually expect NumPy arrays as input, and NumPy arrays have a size attribute (the number of elements). Python lists do not, so evaluating baseprob.size in the weights= argument while baseprob is still a list causes your error. Convert the list with np.array(your_list). You can do this after the loop where you build the lists with append, something like:
baseprob = []
baseprob2 = []
baseprob3 = []
baseprob4 = []
for i in range(1, s):  # changed to do a range 1-s instead of 1000
    # must use the loop variable here, not s (s is always the same)
    matrix_n = np.linalg.matrix_power(matrix, i)
    baseprob.append(matrix_n.item(0))
    baseprob2.append(matrix_n.item(1))
    baseprob3.append(matrix_n.item(2))
# convert once, after the loop has finished appending
baseprob = np.array(baseprob)
baseprob2 = np.array(baseprob2)
baseprob3 = np.array(baseprob3)
baseprob4 = np.array(baseprob4)
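A quick way to see the difference described above, as a minimal sketch (the four values are made-up placeholders standing in for baseprob):

```python
import numpy as np

# A plain Python list, like baseprob before the conversion
baseprob_list = [0.2, 0.5, 0.3, 0.5]

# Lists have no .size attribute (len() is the list equivalent)
has_size = hasattr(baseprob_list, "size")

# np.asarray converts the list to an ndarray, which does have .size
baseprob = np.asarray(baseprob_list)

# One weight per sample, each equal to 1/N, so the bar heights sum to 1
weights = np.ones_like(baseprob) / baseprob.size
```

The resulting weights can then be passed as ax.hist(baseprob, weights=weights).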
EDIT: minimal hist example
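import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
baseprob = np.random.randn(1000000)
# each sample weighs 1/N, so the bar heights are relative frequencies summing to 1
ax.hist(baseprob, weights=np.zeros_like(baseprob) + 1./baseprob.size, bins=100,
        label='relative frequency')
# density histogram (normed=1 in older Matplotlib) overlaid for comparison
n, bins, patches = ax.hist(baseprob, bins=100, density=True, cumulative=False, alpha=0.4,
                           label='density')
ax.set_xlabel('Bins', size=20)
ax.set_ylabel('Frequency', size=20)
ax.legend()
plt.show()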
which gives a relative-frequency histogram with the density histogram overlaid (the figure was not preserved).
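This is likely also why density=True (the replacement for the old normed keyword) did not give a 0 to 1 axis in the question: density scales the bars so that height times bin width sums to 1, whereas the 1/N weights make the heights themselves sum to 1. A minimal NumPy check of that, assuming normally distributed sample data in place of baseprob:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=10_000)   # stand-in for baseprob
bins = 100

# Relative frequency: each sample weighs 1/N, so bar heights sum to 1
rel_freq, edges = np.histogram(data, bins=bins,
                               weights=np.ones_like(data) / data.size)

# Density: heights times bin widths sum to 1, so heights alone can exceed 1
dens, _ = np.histogram(data, bins=bins, density=True)
widths = np.diff(edges)
```

For narrow bins the density heights grow, while the relative-frequency heights always stay between 0 and 1.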

Related

Making parts of a line graph a different colour depending on their y value in Matplotlib

I'm making a program which takes a random list of data and will plot it.
I want the colour of the graph to change if it goes above a certain value.
https://matplotlib.org/gallery/lines_bars_and_markers/multicolored_line.html
Matplotlib has an example doing just this, but it seems to require a function as input for the graph rather than lists.
Does anyone know how to either convert this to work for lists, or another way of doing it?
Here's my code so far (without my horrific failed attempts to colour code it):
from matplotlib import pyplot as plt
import random
import sys
import numpy as np
# setting the max and min values where I want the colour to change
A_min = 2
B_max = 28
# makes lists for later
A_min_lin = []
B_max_lin = []
# simulating a corruption of the data where it returns all zeros
sim_crpt = random.randint(0, 10)
print(sim_crpt)
randomy = []
if sim_crpt == 0:
    # making a list of zeros for corrupted data
    for i in range(0, 20):
        randomy.append(0)
    print(randomy)
else:
    # making a random set of values for the y axis
    for i in range(0, 20):
        n = random.randint(0, 30)
        randomy.append(n)
    print(randomy)
# making an x axis for time
time = np.arange(0, 20, 1)
# making lists to plot straight lines showing the maximum and minimum values
for i in range(0, len(time)):
    A_min_lin.append(A_min)
    B_max_lin.append(B_max)
# testing to see if more than 5 y values are zero, to report corrupted data
tracker = 0
for i in randomy:
    if i == 0:
        tracker += 1
if tracker > 5:
    sys.exit("Error, no data")
# plotting and showing the different graphs
plt.plot(time, randomy)
plt.plot(time, A_min_lin)
plt.plot(time, B_max_lin)
plt.legend(['Data', 'Minimum for linear', "Maximum for linear"])
plt.show()

You can use np.interp to generate the fine-grain data to plot:
# continues from the question's code (np, plt, time and randomy are defined there)
from matplotlib.collections import LineCollection
# fine grain time
new_time = np.linspace(time.min(), time.max(), 1000)
# interpolate the y values
new_randomy = np.interp(new_time, time, randomy)
# this is adapted from the linked Matplotlib example with a few modifications
points = np.array([new_time, new_randomy]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
fig, axs = plt.subplots()
norm = plt.Normalize(new_randomy.min(), new_randomy.max())
lc = LineCollection(segments, cmap='viridis', norm=norm)
# Set the values used for colormapping
lc.set_array(new_randomy[1:])
lc.set_linewidth(2)
line = axs.add_collection(lc)
fig.colorbar(line, ax=axs)
# set the limits
axs.set_xlim(new_time.min(), new_time.max())
axs.set_ylim(new_randomy.min(), new_randomy.max())
plt.show()
Output: (figure not preserved)
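The question asked for the colour to change when the line crosses a threshold, rather than for a continuous colormap. One way to do that, sketched here assuming three fixed colours (below A_min, between the limits, above B_max), is to give each LineCollection segment its own colour based on the y value at its midpoint. The sample data is made up for the sketch:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

A_min, B_max = 2, 28  # thresholds from the question
time = np.arange(0, 20, 1)
# hypothetical data in the question's 0..30 range
randomy = np.array([5, 1, 0, 12, 29, 30, 15, 3, 27, 28,
                    10, 0, 2, 18, 25, 29, 7, 1, 14, 20])

# Densify so colour changes land close to the threshold crossings
new_time = np.linspace(time.min(), time.max(), 1000)
new_y = np.interp(new_time, time, randomy)

# One segment between each pair of consecutive points: shape (N-1, 2, 2)
points = np.array([new_time, new_y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Pick a colour per segment from its midpoint y value
mid_y = 0.5 * (new_y[:-1] + new_y[1:])
seg_colors = np.where(mid_y > B_max, "tab:red",
                      np.where(mid_y < A_min, "tab:blue", "tab:green"))

lc = LineCollection(segments, colors=list(seg_colors), linewidths=2)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.axhline(A_min, ls="--", color="grey", label="Minimum for linear")
ax.axhline(B_max, ls="--", color="grey", label="Maximum for linear")
ax.set_xlim(new_time.min(), new_time.max())
ax.set_ylim(new_y.min() - 1, new_y.max() + 1)
ax.legend()
```

With real data, replace randomy with the list from the question; np.interp accepts plain lists, so no conversion is needed.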

How to add (or annotate) value labels (or frequencies) on a matplotlib "histogram" chart

I want to add frequency labels to the histogram generated using plt.hist.
Here is the data:
import numpy as np
np.random.seed(30)
d = np.random.randint(1, 101, size=25)
print(sorted(d))
I looked up other questions on Stack Overflow, such as
Adding value labels on a matplotlib bar chart
and their answers, but apparently the objects returned by a pandas bar plot (plot(kind='bar')) are different from those returned by plt.hist, and I got errors while using the get_height or get_width functions, as suggested in some of the answers for bar plots.
Similarly, I couldn't find the solution by going through the Matplotlib documentation on histograms.
I got this error (the error output was not preserved).

Here is how I managed it. If anyone has suggestions to improve my answer (specifically the for loop: I think there must be a better way to write it without using n = 0 and n = n + 1), I'd welcome them.
# import base packages
import numpy as np
import matplotlib.pyplot as plt
# generate data
np.random.seed(30)
d = np.random.randint(1, 101, size=25)
print(sorted(d))
# generate histogram
# plt.hist returns 3 objects: n (i.e. frequencies), bins, patches
# edges 1, 11, ..., 101 so that values from 92 to 100 also fall inside a bin
freq, bins, patches = plt.hist(d, edgecolor='white', label='d', bins=range(1, 102, 10))
# x coordinate for labels
bin_centers = np.diff(bins)*0.5 + bins[:-1]
n = 0
for fr, x, patch in zip(freq, bin_centers, patches):
    height = int(freq[n])
    plt.annotate("{}".format(height),
                 xy=(x, height),              # top centre of the histogram bar
                 xytext=(0, 0.2),             # offsetting label position above its bar
                 textcoords="offset points",  # offset (in points) from the *xy* value
                 ha='center', va='bottom'
                 )
    n = n+1
plt.legend()
plt.show()
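On the loop question: zip already yields each count as fr, so the n counter can be dropped entirely. A minimal sketch using the Axes interface, with the bin edges extended to 101 so that values up to 100 are counted:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

np.random.seed(30)
d = np.random.randint(1, 101, size=25)

fig, ax = plt.subplots()
# Edges 1, 11, ..., 101: ten bins covering every value from 1 to 100
freq, bins, patches = ax.hist(d, bins=range(1, 102, 10),
                              edgecolor="white", label="d")

bin_centers = 0.5 * (bins[:-1] + bins[1:])
labels = []
for fr, x in zip(freq, bin_centers):  # zip gives each count directly
    labels.append(ax.annotate(f"{int(fr)}", xy=(x, fr), xytext=(0, 2),
                              textcoords="offset points",
                              ha="center", va="bottom"))
ax.legend()
```

On Matplotlib 3.4 or newer, ax.bar_label(patches) should produce similar labels in a single call.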

How to plot a thermometer?

In a recent, very broad question it was asked how to plot several symbols, like "circles, squares, rectangles, stars, thermometers, and boxplots" with matplotlib. From that list, all but thermometers are obvious, as they are either shown in the documentation or in many existing Stack Overflow answers. Since the OP did not seem interested in thermometers at all, I'd rather ask a new question specifically about thermometers here.
How to plot thermometers in matplotlib?
In principle you can plot any symbol you like, making it either a marker or a Path. There does not seem to be any Unicode symbol for thermometers though. Font Awesome has a thermometer symbol, and plotting Font Awesome symbols in matplotlib is possible. Yet there are only 5 different filling levels.
Also, the color of such font symbol is uniform, yet ideally one would have the inner part of a thermometer (the "mercury pillar") in a different color (probably mostly red for associative reasons) or in different colors as to encode temperature in color as well.
So is it possible to have a temperature symbol where the mercury pillar encodes temperature (or in fact any other quantity) in terms of color and filling level? And if so, how?
(I gave an answer below, alternatives to or improvements of that method are welcome as further answers here.)

An option to plot a thermometer consisting of two parts is to create two Paths, the outer hull and the inner mercury pillar. For this one can create the Paths from scratch and allow the inner path to be variable depending on a (normalized) input parameter.
Then plotting both paths as individual scatter plots is possible. In the following, we create a class that has a scatter method, which works similarly to a usual scatter, except that it also takes the additional arguments temp for the temperature and tempnorm for the normalization of the temperature.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.path as mpath
class TemperaturePlot():
    @staticmethod
    def get_hull():
        # outer and inner outline of the glass (MOVETO=1, LINETO=2, CURVE4=4)
        verts1 = np.array([[0,-128],[70,-128],[128,-70],[128,0],
                           [128,32.5],[115.8,61.5],[96,84.6],[96,288],
                           [96,341],[53,384],[0,384]])
        verts2 = verts1[:-1,:] * np.array([-1,1])
        codes1 = [1,4,4,4,4,4,4,2,4,4,4]
        verts3 = np.array([[0,-80],[44,-80],[80,-44],[80,0],
                           [80,34.3],[60.7,52],[48,66.5],[48,288],
                           [48,314],[26.5,336],[0,336]])
        verts4 = verts3[:-1,:] * np.array([-1,1])
        verts = np.concatenate((verts1, verts2[::-1], verts4, verts3[::-1]))
        codes = codes1 + codes1[::-1][:-1]
        return mpath.Path(verts/256., codes+codes)
    @staticmethod
    def get_mercury(s=1):
        # bulb plus pillar; s in [0, 1] sets the pillar's filling level
        a = 0; b = 64; c = 35
        d = 320 - b
        e = (1-s)*d
        verts1 = np.array([[a,-b],[c,-b],[b,-c],[b,a],[b,c],[c,b],[a,b]])
        verts2 = verts1[:-1,:] * np.array([-1,1])
        verts3 = np.array([[0,0],[32,0],[32,288-e],[32,305-e],
                           [17.5,320-e],[0,320-e]])
        verts4 = verts3[:-1,:] * np.array([-1,1])
        codes = [1] + [4]*12 + [1,2,2,4,4,4,4,4,4,2,2]
        verts = np.concatenate((verts1, verts2[::-1], verts3, verts4[::-1]))
        return mpath.Path(verts/256., codes)
    def scatter(self, x, y, temp=1, tempnorm=None, ax=None, **kwargs):
        self.ax = ax or plt.gca()
        temp = np.atleast_1d(temp)
        ec = kwargs.pop("edgecolor", "black")
        kwargs.update(linewidth=0)
        self.inner = self.ax.scatter(x, y, **kwargs)
        kwargs.update(c=None, facecolor=ec, edgecolor=None, color=None)
        self.outer = self.ax.scatter(x, y, **kwargs)
        self.outer.set_paths([self.get_hull()])
        if not tempnorm:
            mi, ma = np.nanmin(temp), np.nanmax(temp)
            if mi == ma:
                mi = 0
            tempnorm = plt.Normalize(mi, ma)
        ipaths = [self.get_mercury(tempnorm(t)) for t in temp]
        self.inner.set_paths(ipaths)
Usage of this class could look like this,
plt.rcParams["figure.figsize"] = (5.5,3)
plt.rcParams["figure.dpi"] = 72*3
fig, ax = plt.subplots()
p = TemperaturePlot()
p.scatter([.25,.5,.75], [.3,.4,.5], s=[800,1200,1600], temp=[28,39,35], color="C3",
          ax=ax, transform=ax.transAxes)
plt.show()
where we plot three thermometers with different temperatures depicted by the fill of the "mercury" pillar. Since no normalization is given, the temperatures [28,39,35] are normalized between their minimum and maximum.
Or we can use both color (c) and temp to show the temperature, as in
np.random.seed(42)
fig, ax = plt.subplots()
n = 42
x = np.linspace(0,100,n)
y = np.cumsum(np.random.randn(n))+5
ax.plot(x,y, color="darkgrey", lw=2.5)
p = TemperaturePlot()
p.scatter(x[::4],y[::4]+3, s=300, temp=y[::4], c=y[::4], edgecolor="k", cmap="RdYlBu_r")
ax.set_ylim(-6,18)
plt.show()
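Because the default normalization runs from the minimum to the maximum temperature, the coolest thermometer always shows an empty pillar (only the bulb) and the warmest a full one. Passing a fixed tempnorm keeps the fill on an absolute scale. A small sketch of the fill fractions, assuming a 0 to 40 degree range purely for illustration:

```python
import numpy as np
from matplotlib.colors import Normalize

temps = np.array([28, 39, 35])

# Default in TemperaturePlot.scatter: normalize between min and max temperature
auto = Normalize(temps.min(), temps.max())
auto_fill = np.array([float(auto(t)) for t in temps])

# Fixed scale (assumed 0..40 degrees), comparable across figures
fixed = Normalize(0, 40)
fixed_fill = np.array([float(fixed(t)) for t in temps])

# Top of the mercury pillar in get_mercury's units (before dividing by 256):
# e = (1 - s) * (320 - 64), so the top sits at 320 - e = 64 + 256 * s
pillar_top = 64 + 256 * fixed_fill
```

With the class above this would be p.scatter(..., temp=[28,39,35], tempnorm=plt.Normalize(0, 40)).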

python scatter plot with errorbars and colors mapping a physical quantity

I'm trying to make a fairly simple scatter plot with error bars and a semilogy scale. What is a little bit different from the tutorials I have found is that the color of the scatterplot should trace a different quantity. On one hand, I was able to make a scatterplot with the error bars for my data, but with just one color. On the other hand, I made a scatterplot with the right colors, but without the error bars.
I'm not able to combine the two.
Here is an example using fake data:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
n=100
Lx_gas = 1e40*np.random.random(n) + 1e37
Tx_gas = np.random.random(n) + 0.5
Lx_plus_error = Lx_gas
Tx_plus_error = Tx_gas/2.
Tx_minus_error = Tx_gas/4.
#actually positive numbers, this is the quantity that should be traced by the
#color, in this example I use random numbers
Lambda = np.random.random(n)
#this is actually different from zero, but I want to be sure that this simple
#code works with the log axis
Lx_minus_error = np.zeros_like(Lx_gas)
#normalize the color, to be between 0 and 1
colors = np.asarray(Lambda)
colors -= colors.min()
colors *= (1./colors.max())
#build the error arrays
Lx_error = [Lx_minus_error, Lx_plus_error]
Tx_error = [Tx_minus_error, Tx_plus_error]
##--------------
##important part of the script
##this works, but all the dots are of the same color
#plt.errorbar(Tx_gas, Lx_gas, xerr = Tx_error,yerr = Lx_error,fmt='o')
##this is what is should be in terms of colors, but it is without the error bars
#plt.scatter(Tx_gas, Lx_gas, marker='s', c=colors)
##what I tried (and failed)
plt.errorbar(Tx_gas, Lx_gas, xerr = Tx_error,yerr = Lx_error,\
color=colors, fmt='o')
ax = plt.gca()
ax.set_yscale('log')
plt.show()
I even tried to plot the scatterplot after the errorbar, but for some reason everything else plotted in the same window ends up in the background behind the error bars.
Any ideas?
Thanks!

You can color the error bars by setting the colors of the LineCollection objects returned by errorbar, as described here.
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
n=100
Lx_gas = 1e40*np.random.random(n) + 1e37
Tx_gas = np.random.random(n) + 0.5
Lx_plus_error = Lx_gas
Tx_plus_error = Tx_gas/2.
Tx_minus_error = Tx_gas/4.
#actually positive numbers, this is the quantity that should be traced by the
#color, in this example I use random numbers
Lambda = np.random.random(n)
#this is actually different from zero, but I want to be sure that this simple
#code works with the log axis
Lx_minus_error = np.zeros_like(Lx_gas)
#normalize the color, to be between 0 and 1
colors = np.asarray(Lambda)
colors -= colors.min()
colors *= (1./colors.max())
#build the error arrays
Lx_error = [Lx_minus_error, Lx_plus_error]
Tx_error = [Tx_minus_error, Tx_plus_error]
sct = plt.scatter(Tx_gas, Lx_gas, marker='s', c=colors)
cb = plt.colorbar(sct)
_, __ , errorlinecollection = plt.errorbar(Tx_gas, Lx_gas, xerr = Tx_error,yerr = Lx_error, marker = '', ls = '', zorder = 0)
error_color = sct.to_rgba(colors)
errorlinecollection[0].set_color(error_color)
errorlinecollection[1].set_color(error_color)
ax = plt.gca()
ax.set_yscale('log')
plt.show()
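Two details about the colour normalization in this code, shown as a small sketch with random stand-in data: np.asarray does not copy an existing array, so colors -= ... also rescales Lambda in place; and matplotlib.colors.Normalize computes the same 0 to 1 rescaling without mutating anything.

```python
import numpy as np
from matplotlib.colors import Normalize

rng = np.random.default_rng(1)
Lambda = rng.random(100)  # stand-in for the physical quantity

# np.asarray returns the very same object for an ndarray, so in-place
# operations on `colors` (as in the code above) also modify Lambda
colors = np.asarray(Lambda)
aliased = colors is Lambda

# Non-mutating min/max rescaling, written out and via Normalize
manual = (Lambda - Lambda.min()) / (Lambda.max() - Lambda.min())
norm = Normalize(vmin=Lambda.min(), vmax=Lambda.max())
via_norm = np.asarray(norm(Lambda))
```

Since scatter applies this kind of min/max normalization itself when c is numeric, passing c=Lambda directly should give the same colours, with the colorbar labelled in Lambda's own units.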

Defining and plotting a Schechter function: plot problems

I'm currently defining a function in Python as:
def schechter_fit(logM, phi=5.96E-11, log_M0=11.03, alpha=-1.35, e=2.718281828):
    schechter = phi*(10**((alpha+1)*(logM-log_M0)))*(e**(pow(-10,logM-log_M0)))
    return schechter
schechter_range = numpy.linspace(10.0, 11.9, 10000)
And then plotting said function as:
import numpy
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid.axislines import SubplotZero
schechter_range = numpy.linspace(10, 12, 10000)
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
ax.plot(schechter_range, schechter_fit(schechter_range), 'k')
The graphical output I am receiving is just a blank plot with no curve. There must be a problem with how I have defined the function, but I can't see it. The plot should look something like this (image not preserved).
I'm new to Python functions, so perhaps my equation isn't quite right. This is what I am looking to plot, and the parameters I am starting with (image not preserved).

The function you describe returns a complex result over most of your input range. Here I added +0j to the input to allow for an imaginary result; if you don't do this, you just get a bunch of nans (which Matplotlib doesn't plot). Here are the plots (image not preserved):
import numpy
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid.axislines import SubplotZero
schechter_range = numpy.linspace(10, 12, 10000)
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
def schechter_fit(logM, phi=5.96E-11, log_M0=11.03, alpha=-1.35, e=2.718281828):
    schechter = phi*(10**((alpha+1)*(logM-log_M0)))*(e**(pow(-10,logM-log_M0)))
    return schechter
y = schechter_fit(schechter_range+0j)  # Note the +0j here to allow an imaginary result
ax.plot(schechter_range, y.real, 'b', label="Re Part")
ax.plot(schechter_range, y.imag, 'r', label="Im Part")
ax.legend()
plt.show()
Now that you can see why the data is not plotting, and that complex numbers are being generated (which, physically, you know you don't want), it is reasonable to figure out where they come from. They originate from pow(-10,logM-log_M0), and from there it is clear that the expression assumes the wrong operator precedence: the equation isn't pow(-10,logM-log_M0), but -pow(10,logM-log_M0). Making this correction gives the following (after taking the log, because I can see the log in the plot in the question; image not preserved):
I also extended the lower bound from 10 to 8, so the region of constant slope is clear and it better matches the graph shown in the question. This is still off by a factor on the y-axis, but I'm guessing that's a factor of (SFR/M*) that's not being applied correctly (it's difficult to know without seeing the context and the full y-axis).
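A minimal sketch of the corrected function, assuming the same parameter values as the question. It uses np.exp instead of a hard-coded e, and the minus sign sits outside the power. Note that in Python -10**x already parses as -(10**x), because ** binds tighter than unary minus; the problem was the explicit pow(-10, ...), which raises -10 to a fractional power:

```python
import numpy as np

def schechter_fit(logM, phi=5.96e-11, log_M0=11.03, alpha=-1.35):
    """Schechter form from the question with the precedence fix:
    exp(-10**x) instead of e**pow(-10, x)."""
    x = np.asarray(logM, dtype=float) - log_M0
    return phi * 10.0**((alpha + 1) * x) * np.exp(-10.0**x)

# Lower bound extended to 8, as in the answer
schechter_range = np.linspace(8, 12, 10000)
y = schechter_fit(schechter_range)  # real and positive everywhere, no +0j needed
```

Plotting np.log10(y) against schechter_range should then give the log-scale curve.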

I did almost the same as tom10, except that I took the log of your expression directly, which turns the factors into summands and may make things easier to debug.
I did not really test the formula!
import numpy
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid.axislines import SubplotZero
def log_schechter_fit(logM, SFR_M=5.96E-11, log_M0=11.03,
                      alpha=-1.35):
    schechter = numpy.log(SFR_M)
    schechter += (alpha+1)*(logM-log_M0)*numpy.log(10)
    schechter += pow(-10,logM-log_M0)
    return schechter
schechter_range = numpy.linspace(10, 12, 10000)
# for i in range(10,13):
for i in numpy.linspace(10, 11.03, 10):
    print(i, log_schechter_fit(i+0j))
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
ax.set_xlim([10,12])
y = log_schechter_fit(schechter_range+0j)
ax.plot(schechter_range, y.real, 'b', label="Re Part")
ax.plot(schechter_range, y.imag, 'r', label="Im Part")
ax.legend()
and I got (image not preserved):
UPDATE
Again using tom10's comments on operator precedence, and changing the last part of the function:
LOG_10 = numpy.log(10)
SFR_M = 5.96E-11
LOG_SFR_M = numpy.log(SFR_M)
def log_schechter_fit(logM, log_SFR_M=LOG_SFR_M, log_M0=11.03,
                      alpha=-1.35):
    schechter = log_SFR_M
    schechter += (alpha+1)*(logM-log_M0)*LOG_10
    schechter -= pow(10,logM-log_M0)
    return schechter
I can reproduce the plot of the accepted answer. The shape of the curve fits, but I cannot explain the discrepancy in the values compared with the original plot posted in the question...
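One thing that may be worth checking (an assumption, since the axis of the original plot cannot be seen here): numpy.log is the natural logarithm, while plots of this kind often use log10. Dividing by ln 10 converts between the two; this changes the values but not the shape. A sketch that also checks the log form against the direct formula:

```python
import numpy as np

LOG_10 = np.log(10)
LOG_SFR_M = np.log(5.96e-11)

def log_schechter_fit(logM, log_SFR_M=LOG_SFR_M, log_M0=11.03, alpha=-1.35):
    """Natural log of the Schechter form (the UPDATE version above)."""
    x = np.asarray(logM, dtype=float) - log_M0
    return log_SFR_M + (alpha + 1) * x * LOG_10 - 10.0**x

def schechter_fit(logM, phi=5.96e-11, log_M0=11.03, alpha=-1.35):
    """Direct form with the precedence fix, for comparison."""
    x = np.asarray(logM, dtype=float) - log_M0
    return phi * 10.0**((alpha + 1) * x) * np.exp(-10.0**x)

logM = np.linspace(8, 12, 500)
ln_phi = log_schechter_fit(logM)
log10_phi = ln_phi / LOG_10  # base-10 log, if the reference plot uses log10
```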
