pyplot hist() frequency histogram does not normalize to 1

I have a problem when using hist(). I created an array with 396 entries containing every possible outcome of
X + 0.5 Y + Z,
where X, Y, Z are integers with 0 <= X, Z <= 5 and 0 <= Y <= 10. All I want is a frequency histogram whose bar heights equal the probability that each value occurs, assuming X, Y, Z are independent and uniformly distributed. I thought I'd give hist() a try:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def activity(x, y, z):
    return 1 * x + 0.5 * y + 1 * z

rangex = np.arange(6)
rangey = np.arange(11)
rangez = np.arange(6)
rhs = 10
activities = [activity(x, y, z) for x in rangex for y in rangey for z in rangez]
activities = pd.Series(activities)
fig, axes = plt.subplots(1, 1)
# originally normed=True; newer Matplotlib versions call this argument density
n, bins, patches = axes.hist(activities, bins=np.linspace(-0.25, 15.25, num=32),
                             density=True)
Here is the weird thing: n sums to 2!
With this choice of bins, I made sure that each item falls into exactly one bin, and I know that exactly 31 bins are necessary, because my values range from 0 through 15 in steps of 0.5.
I am sorry I couldn't simplify this example. A random trial with 100 values yielded correct frequencies. Here, instead, this is what the histogram looks like:
From the picture, it is obvious that the frequencies do not sum to 1. In the center, for example, there are 11 bars of height 0.1 or above. The frequencies of the red plot, however, do sum to 1.
My question: Why do I get a wrong normalization?
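See below the code to manually calculate a correct histogram:
# .items() replaces .iteritems(), which newer pandas versions no longer provide
barposs, barheight = zip(*activities.value_counts(normalize=True).items())
# align='edge' makes each bar span [value - 0.25, value + 0.25], which the -0.25 offset assumes
plt.bar(np.array(barposs) - 0.25, np.array(barheight), width=0.5, color='red', align='edge')
I would appreciate any useful comments on this.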
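A likely explanation, sketched here with np.histogram rather than the plotting call: density normalization makes the bar areas (height times bin width) sum to 1, not the bar heights. With a bin width of 0.5, the heights therefore sum to 2. The variable names below are chosen for this illustration; passing per-sample weights of 1/N is one way to get heights that are probabilities.

```python
import numpy as np

# Every outcome of X + 0.5*Y + Z, as in the question (396 values).
values = np.array([x + 0.5 * y + z
                   for x in range(6) for y in range(11) for z in range(6)])
bins = np.linspace(-0.25, 15.25, num=32)   # 31 bins of width 0.5
widths = np.diff(bins)

# density=True: heights are counts / (N * bin_width), so the AREA sums to 1.
density, _ = np.histogram(values, bins=bins, density=True)

# Weighting each sample by 1/N makes the HEIGHTS sum to 1 instead.
weights = np.full(values.size, 1.0 / values.size)
prob, _ = np.histogram(values, bins=bins, weights=weights)

print(density.sum())             # sum of heights under density normalization
print((density * widths).sum())  # total area
print(prob.sum())                # sum of probability-weighted heights
```

The same weights array can be passed to hist() through its weights argument in place of the density flag.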

Related

Make a parabola steeper at both sides while keeping both ends

I have a parabola with both axes running from 0 to 1, as follows:
The parabola is created and normalized with the following code:
import matplotlib.pyplot as plt
import numpy as np

# normalize array
def min_max_scale_array(arr):
    arr = np.array(arr)
    return (arr - arr.min())/(arr.max()-arr.min())

x = np.linspace(-50,50,100)
y = x**2
x = min_max_scale_array(x)
y = min_max_scale_array(y)
fig, ax = plt.subplots()
ax.plot(x, y)
I want to create another one with the same end points but with both sides steeper, like this:
I thought of joining an exponential curve and its reflection, but that would make the resulting parabola look pointy at the bottom.
Can you show me how to achieve this? Thank you!

If you want to modify any arbitrary curve, you can change the x values, for example taking a power of it:
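# x and y are defined as in the question
for factor in [1.1, 1.5, 2, 3, 4]:
    x2 = 2*x - 1  # map x from [0, 1] to [-1, 1]
    x3 = (abs(x2)**(1/factor))*np.sign(x2)/2 + 0.5  # warp, then map back to [0, 1]
    ax.plot(x3, y, label=f'{factor=}')
ax.legend()  # show the factor labels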
output:
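To see why this keeps both ends in place, here is the same x transformation wrapped in a small function (the name steepen_x is made up for this sketch) and evaluated at a few points. It assumes x has already been scaled to [0, 1], as in the question.

```python
import numpy as np

def steepen_x(x, factor):
    """Push x values in [0, 1] away from the centre 0.5 and towards the ends."""
    x2 = 2 * x - 1                                   # [0, 1] -> [-1, 1]
    return np.sign(x2) * np.abs(x2) ** (1 / factor) / 2 + 0.5

x = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
print(steepen_x(x, 2))
```

The end points 0 and 1 and the centre 0.5 map to themselves, while interior points move outwards, so the bottom of the curve gets wider and the sides get steeper.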

You can change the exponent to get a steeper curve with the same values at the extremes. Pick a larger exponent that is an even integer (an odd exponent gives an antisymmetric curve rather than a parabola-like one).
y = x**4
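A quick numerical check of that claim, reusing min_max_scale_array from the question. The names y2 and y4 are introduced only for this sketch.

```python
import numpy as np

def min_max_scale_array(arr):
    arr = np.array(arr)
    return (arr - arr.min()) / (arr.max() - arr.min())

x = np.linspace(-50, 50, 100)
y2 = min_max_scale_array(x**2)   # original parabola
y4 = min_max_scale_array(x**4)   # steeper, flatter-bottomed variant

# Both curves reach 1 at the ends and 0 at the bottom.
print(y2[0], y2[-1], y4[0], y4[-1], y2.min(), y4.min())
```

After scaling, the quartic stays at or below the parabola everywhere, so it is flatter near the bottom and rises more sharply towards the ends.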

Regular Distribution of Points in the Volume of a Sphere

I'm trying to generate a regular set of n points within the volume of a sphere. I found this similar answer (https://scicomp.stackexchange.com/questions/29959/uniform-dots-distribution-in-a-sphere) on generating a regular, uniformly spread set of n points on the surface of a sphere, with the following code:
import numpy as np
n = 5000
r = 1
z = []
y = []
x = []
alpha = 4.0*np.pi*r*r/n
d = np.sqrt(alpha)
m_nu = int(np.round(np.pi/d))
d_nu = np.pi/m_nu
d_phi = alpha/d_nu
count = 0
for m in range(0, m_nu):
    nu = np.pi*(m+0.5)/m_nu
    m_phi = int(np.round(2*np.pi*np.sin(nu)/d_phi))
    for k in range(0, m_phi):  # k instead of n, so the requested count n is not overwritten
        phi = 2*np.pi*k/m_phi
        xp = r*np.sin(nu)*np.cos(phi)
        yp = r*np.sin(nu)*np.sin(phi)
        zp = r*np.cos(nu)
        x.append(xp)
        y.append(yp)
        z.append(zp)
        count = count + 1
which works as intended:
How can I modify this to generate a regular set of n points in the volume of a sphere?

Another method to do this, yielding uniformity in volume:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
dim_len = 30
spacing = 2 / dim_len
point_cloud = np.mgrid[-1:1:spacing, -1:1:spacing, -1:1:spacing].reshape(3, -1).T
point_radius = np.linalg.norm(point_cloud, axis=1)
sphere_radius = 0.5
in_points = point_radius < sphere_radius
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(point_cloud[in_points, 0], point_cloud[in_points, 1], point_cloud[in_points, 2])
plt.show()
Output (matplotlib distorts the view, but it is a sphere sampled uniformly in volume):
The idea: sample a uniform grid, then keep only the points whose radius places them inside the sphere.
Uniform sampling reference [see this answer's edit history for naive sampling].
This method has the drawback of generating redundant points which are then discarded.
It has the upside of vectorization, which probably makes up for the drawback; I didn't check.
With fancy indexing, one could generate the same points without the redundant ones, but I doubt that can be easily (or at all) vectorized.

Sample uniformly along X. For every value of X, solve X²+Y²=1 for the two bounding values of Y and sample uniformly between them. Then, for every (X, Y) pair, solve X²+Y²+Z²=1 for the two bounding values of Z and sample uniformly between them.
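A minimal sketch of that recipe, reading "sample uniformly" as regularly spaced samples because the question asks for a regular arrangement. The function name nested_chord_points and the per-axis counts nx, ny, nz are assumptions made for this illustration.

```python
import numpy as np

def nested_chord_points(nx=9, ny=9, nz=9, r=1.0):
    """Regular samples along X, then along each Y chord, then along each Z chord."""
    points = []
    for x in np.linspace(-r, r, nx):
        # Bounds of Y from x^2 + y^2 = r^2 (clamped against rounding below zero).
        y_max = np.sqrt(max(r * r - x * x, 0.0))
        for y in np.linspace(-y_max, y_max, ny):
            # Bounds of Z from x^2 + y^2 + z^2 = r^2.
            z_max = np.sqrt(max(r * r - x * x - y * y, 0.0))
            for z in np.linspace(-z_max, z_max, nz):
                points.append((x, y, z))
    return np.array(points)

pts = nested_chord_points()
print(pts.shape)
```

Every chord receives the same number of samples regardless of its length, so points bunch together (and coincide at the very ends) where the chords are short. Whether that is acceptable depends on what "regular" needs to mean for the application.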

given percentiles find distribution function python

From https://stackoverflow.com/a/30460089/2202107, we can generate CDF of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt
N = 100
Z = np.random.normal(size = N)
# method 1
H,X1 = np.histogram( Z, bins = 10, density = True )  # 'normed' in older NumPy versions
dx = X1[1] - X1[0]
F1 = np.cumsum(H)*dx
#method 2
X2 = np.sort(Z)
F2 = np.array(range(N))/float(N)
# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
Question: How do we generate the "original" normal distribution, given only x (eg X2) and y (eg F2) coordinates?

My first thought was plt.plot(x, np.gradient(y)), but the gradient of y came out flat (the data points are evenly spaced in y, but not in x). This kind of data often comes up in percentile calculations. The key is to resample the data so that it is evenly spaced in x rather than in y, using interpolation:
x = X2
y = F2
num_points = 10
xinterp = np.linspace(-2, 2, num_points)
yinterp = np.interp(xinterp, x, y)
# for normalizing so that the sum of all bars equals 1.0
tot_val = 1.0
# integrating 1 over yinterp gives yinterp[-1] - yinterp[0];
# np.trapz is available as np.trapezoid in NumPy 2.0+
normalization_factor = tot_val/np.trapz(np.ones(len(xinterp)), yinterp)
plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()
The output looks good to me:
I put my approach here for examination. Let me know if my logic is flawed.
One issue: when num_points is large, the plot looks bad. That is a discretization issue, and I am not sure how to avoid it.
Related posts:
I failed to understand why the answer in https://stats.stackexchange.com/a/6065/131632 was so complicated.
I also didn't understand why my approach differs from the one in "Generate distribution given percentile ranks".
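For comparison, here is a small variant of the same idea that returns a density (values that integrate to roughly 1) by passing the x grid to np.gradient as the spacing. It is a sketch under stated assumptions: the input arrays X and F are a hypothetical stand-in for (X2, F2), built from exact standard-normal quantiles so the result is reproducible, and the helper name density_from_cdf is made up.

```python
from statistics import NormalDist

import numpy as np

# Stand-in for (X2, F2): evenly spaced probabilities and their normal quantiles.
N = 1000
F = (np.arange(N) + 0.5) / N
X = np.array([NormalDist().inv_cdf(p) for p in F])

def density_from_cdf(x, cdf, grid):
    """Interpolate the CDF onto an even x grid, then differentiate with respect to x."""
    cdf_on_grid = np.interp(grid, x, cdf)
    return np.gradient(cdf_on_grid, grid)

grid = np.linspace(-2, 2, 21)
pdf = density_from_cdf(X, F, grid)
print(pdf[10])   # estimate at x = 0; the exact normal density there is about 0.3989
```

With noisy sample data in place of exact quantiles, a finer grid amplifies the noise in the differences, which matches the discretization issue mentioned above.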

displaying Mandelbrot set in python using matplotlib.pyplot and numpy

I am trying to plot the Mandelbrot set and am having trouble getting the expected plot.
As I understand it, the Mandelbrot set is made up of the values c for which the iteration z = z**2 + c does not diverge. I used the initial value z = 0.
Initially, I was getting a straight line. I looked for solutions online to see where I went wrong. Using the following link in particular, I attempted to improve my code:
https://scipy-lectures.org/intro/numpy/auto_examples/plot_mandelbrot.html
Here is my improved code. I don't really understand the reason for using np.newaxis, or why I am plotting the final z values that converge. Am I misunderstanding the definition of the Mandelbrot set?
import numpy as np
import matplotlib.pyplot as plt
# initial values
loop = 50 # number of iterations
div = 600 # divisions
# all possible values of c
c = np.linspace(-2,2,div)[:,np.newaxis] + 1j*np.linspace(-2,2,div)[np.newaxis,:]
z = 0
for n in range(0,loop):
    z = z**2 + c
plt.rcParams['figure.figsize'] = [12, 7.5]
z = z[abs(z) < 2] # removing z values that diverge
plt.scatter(z.real, z.imag, color = "black" ) # plotting points
plt.xlabel("Real")
plt.ylabel("i (imaginary)")
plt.xlim(-2,2)
plt.ylim(-1.5,1.5)
plt.savefig("plot.png")
plt.show()
and got the following image, which looks closer to the Mandelbrot set than anything I have gotten so far. But it looks more like a starfish with scattered dots around it.
For reference, here is my initial code before the improvement:
# initial values
loop = 50
div = 50
clist = np.linspace(-2,2,div) + 1j*np.linspace(-1.5,1.5,div) # range of c values
all_results = []
for c in clist: # for each value of c
    z = 0 # starting point
    for a in range(0,loop):
        negative = 0 # unstable
        z = z**2 + c
        if np.abs(z) > 2:
            negative +=1
        if negative > 2:
            break
    if negative == 0:
        all_results.append([c,"blue"]) #converging
    else:
        all_results.append([c,"black"]) # not converging

Alternatively, with another small change to the code in the question, one can use the iteration count to color the plot. Store the value of n at which the absolute value of z first exceeds 2 (meaning the sequence diverges), and color the points outside the Mandelbrot set with it:
import matplotlib.pyplot as plt
import numpy as np
# initial values
loop = 50 # number of iterations
div = 600 # divisions
# all possible values of c
c = np.linspace(-2,2,div)[:,np.newaxis] + 1j*np.linspace(-2,2,div)[np.newaxis,:]
# array of ones of same dimensions as c
# (dtype int: the np.int alias used originally is gone from recent NumPy)
ones = np.ones(np.shape(c), int)
# Array that will hold colors for plot, initial value set here will be
# the color of the points in the mandelbrot set, i.e. where the series
# converges.
# For the code below to work, this initial value must at least be 'loop'.
# Here it is loop + 5
color = ones * loop + 5
z = 0
for n in range(0,loop):
    z = z**2 + c
    diverged = np.abs(z)>2
    # Store value of n at which series was detected to diverge.
    # The later the series is detected to diverge, the higher
    # the 'color' value.
    color[diverged] = np.minimum(color[diverged], ones[diverged]*n)
plt.rcParams['figure.figsize'] = [12, 7.5]
# contour plot with real and imaginary parts of c as axes
# and colored according to 'color'
plt.contourf(c.real, c.imag, color)
plt.xlabel("Real($c$)")
plt.ylabel("Imag($c$)")
plt.xlim(-2,2)
plt.ylim(-1.5,1.5)
plt.savefig("plot.png")
plt.show()

The plot doesn't look correct because the code in the question plots z (the iterated variable). When iterating z = z*z + c, the Mandelbrot set consists of those values of c (real and imaginary parts) for which the sequence doesn't diverge. Hence the small change to the code shown below gives the correct Mandelbrot plot:
import matplotlib.pyplot as plt
import numpy as np
# initial values
loop = 50 # number of iterations
div = 600 # divisions
# all possible values of c
c = np.linspace(-2,2,div)[:,np.newaxis] + 1j*np.linspace(-2,2,div)[np.newaxis,:]
z = 0
for n in range(0,loop):
    z = z**2 + c
plt.rcParams['figure.figsize'] = [12, 7.5]
p = c[abs(z) < 2] # keeping only the c values for which z has not diverged
plt.scatter(p.real, p.imag, color = "black" ) # plotting points
plt.xlabel("Real")
plt.ylabel("i (imaginary)")
plt.xlim(-2,2)
plt.ylim(-1.5,1.5)
plt.savefig("plot.png")
plt.show()
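One side effect of iterating the whole array for all 50 steps is that entries which have already escaped keep being squared until they overflow to inf and nan, which makes NumPy emit runtime warnings. The result is still right, because abs(nan) < 2 is False. A minimal sketch that avoids the overflow by only updating points that are still bounded (the name mandelbrot_mask is made up for this example):

```python
import numpy as np

def mandelbrot_mask(c, loop=50):
    """Return a boolean array: True where z = z**2 + c stays within |z| <= 2."""
    c = np.atleast_1d(np.asarray(c, dtype=complex))
    z = np.zeros_like(c)
    bounded = np.ones(c.shape, dtype=bool)
    for _ in range(loop):
        # Only iterate points that have not escaped yet; escaped ones stay frozen.
        z[bounded] = z[bounded] ** 2 + c[bounded]
        bounded &= np.abs(z) <= 2
    return bounded

print(mandelbrot_mask([0, -1, 1, 1j, 0.5]))
```

Applied to the 2-D grid c from the code above, c[mandelbrot_mask(c)] selects the same kind of point set that is passed to plt.scatter there.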

Complete turn x angle values are mapped to half turn using asin function, how to mirror them back?

I have angles that form a complete turn in an array x, e.g. from -90 to 270 (it could be defined otherwise, such as from 0 to 360 or -180 to 180), with a step of 1 or whatever.
The asin function only returns values between -90 and +90.
Thus, angles < -90 or > 90 get "mapped" into that range.
E.g. y = some_asin_func(over_sin(x)) always yields a y value between -90 and +90. So y is stuck between -90 and +90.
I need to recover which x input each y relates to, because it is ambiguous at the moment: for example, the function gives the same y value for x = 120 and x = 60, or for x = -47 and x = 227. That is not what I want.
Put another way: I need y to make a complete turn as x does, ranging from where x starts to where x ends.
An image will explain it better:
Here, x ranges from -90 (left) to 270 (right of the graph).
The valid part of the curve is between x=-90 and x=+90 (left half of the graph).
All other values are mirrored about y=90 or y=-90.
For x=180, for example, I get y=0 and it should be y=180.
For x=270, I get y=-90 but it should be y=270, thus +360.
Here's a code sample:
import math
import numpy as np
import matplotlib.pyplot as plt
# Vary A to get different curves like in the images:
# A=0 gives a triangle-like shape, A=90 a square-like shape.
A = 50
x = np.linspace(-90,270,int(1e3))
u = np.sin(math.pi*A/180)*np.cos(math.pi*x/180)
v = 180*(np.arcsin(u))/math.pi
y = 180*np.arcsin(np.sin(math.pi*x/180)/np.cos(math.pi*v/180))/math.pi
plt.plot(x,y)
plt.grid(True)
Once again, the left half of the graph is completely correct.
The right half also behaves correctly, but in the end it must be mirrored about a horizontal axis at y=+90 when x>90, like this:
That is, it is as if the function were mirrored about y=-90 and y=+90 wherever x is outside the range [-90,+90], and only there.
I want to un-mirror it outside the valid [-90,+90] range:
about y=-90 where y is lower than -90
about y=+90 where y is greater than +90
And of course, modulo each complete turn.
Here is another example where x ranges from -180 to 180, with the desired behavior:
Current:
Wanted:
I first tested something simple:
A = 50
x = np.linspace(-180,180,int(1e3))
u = np.sin(math.pi*A/180)*np.cos(math.pi*x/180)
v = 180*(np.arcsin(u))/math.pi
y = 180*np.arcsin(np.sin(math.pi*x/180)/np.cos(math.pi*v/180))/math.pi
for i,j in np.ndenumerate(x):
    xval = (j-180)%180-180
    if (xval < -90):
        y[i] = y[i]-val  # val is never defined in this attempt
    elif (xval > 90):
        y[i] = y[i]+val
plt.plot(x,y);
plt.grid(True)
plt.show()
which doesn't work at all, but I think the basic idea is there...
I guess it may take some kind of modulo trick, but I can't figure it out.

Here is a solution that fixes the periodicity of the cos function by 'brute force', calculating an offset and a sign correction based on the x value. I'm sure there is something better out there, but to find it I would almost need a drawing of the angles and distances involved.
from matplotlib import pyplot as plt
import numpy as np
fig, ax = plt.subplots(1,1, figsize=(4,4))
x = np.linspace(-540,540,1000)
sign = np.sign(np.cos(np.pi*x/180))
offset = ((x-90)//180)*180
for A in range(1,91,9):
    u = np.sin(np.pi*A/180)*np.cos(np.pi*x/180)
    v = 180*(np.arcsin(u))/np.pi
    y = 180*np.arcsin(np.sin(np.pi*x/180)/np.cos(np.pi*v/180))/np.pi
    y = sign*y + offset
    ax.plot(x,y)
ax.grid(True)
plt.show()
The result for the interval [-540, 540] looks like this:
Note that you can get pi also from numpy, so you don't need to import math -- I altered the code accordingly.
EDIT:
Apparently I first slightly misunderstood the OP's desired output. If the calculation of offset is just slightly changed, the result is as requested:
from matplotlib import pyplot as plt
import numpy as np
fig, ax = plt.subplots(1,1, figsize=(4,4))
x = np.linspace(-720,720,1000)
sign = np.sign(np.cos(np.pi*x/180))
offset = ((x-90)//180 +1 )*180 - ((x-180)//360+1)*360
for A in range(1,91,9):
    u = np.sin(np.pi*A/180)*np.cos(np.pi*x/180)
    v = 180*(np.arcsin(u))/np.pi
    y = 180*np.arcsin(np.sin(np.pi*x/180)/np.cos(np.pi*v/180))/np.pi
    y = sign*y + offset
    ax.plot(x,y)
ax.grid(True)
plt.show()
The result now looks like this:

Thank you @Thomas Kühn, it seems fine, except that I wanted to keep the y-values within a single turn. Anyway, that's only aesthetics.
Here's what I found on my side. It may not be perfect, but it works:
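import math
import numpy as np
A = 50
x = np.linspace(-180,180,int(1e3))  # same x as in the question's attempt
u = np.sin(math.pi*A/180)*np.cos(math.pi*x/180)
v = 180*(np.arcsin(u))/math.pi
y = 180*np.arcsin(np.sin(math.pi*x/180)/np.cos(math.pi*v/180))/math.pi
for i,j in np.ndenumerate(x):
    val = (j-180)%360-180  # x wrapped into [-180, 180)
    if (val < -90):
        y[i] = -180-y[i]  # un-mirror about y = -90
    elif (val > 90):
        y[i] = 180-y[i]  # un-mirror about y = +90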
Here are some expected results:
Range from -180 to +180
Range from 0 to +360
Range from -720 to +720
Range from -360 to +360 with some different A values.
Funny thing is that it reminds me of some electronics diagrams as well.
Periodic phenomena are everywhere!
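The element-by-element loop above can also be written with np.where. The following is a sketch of the same un-mirroring rule, not a different method; the function name unmirrored_angle is made up, and the np.clip call is an added safeguard (an assumption on my part) against rounding pushing the arcsin argument slightly outside [-1, 1].

```python
import numpy as np

def unmirrored_angle(x_deg, A_deg):
    """Same formula as above, with y un-mirrored so it follows x over a full turn."""
    x = np.asarray(x_deg, dtype=float)
    u = np.sin(np.radians(A_deg)) * np.cos(np.radians(x))
    v = np.arcsin(u)                                   # in radians
    ratio = np.clip(np.sin(np.radians(x)) / np.cos(v), -1.0, 1.0)
    y = np.degrees(np.arcsin(ratio))                   # stuck in [-90, 90]
    val = (x - 180) % 360 - 180                        # x wrapped into [-180, 180)
    y = np.where(val < -90, -180 - y, y)               # un-mirror about y = -90
    y = np.where(val > 90, 180 - y, y)                 # un-mirror about y = +90
    return y

x = np.array([-170.0, -100.0, -45.0, 0.0, 60.0, 120.0, 170.0, 200.0])
print(unmirrored_angle(x, 0))
```

With A = 0 the formula reduces to asin(sin(x)), so the un-mirrored output should reproduce x wrapped into [-180, 180), which makes for a convenient sanity check.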
