I need to generate a 10-dimensional vector (a collection of 10 random numbers) sampled uniformly from the unit sphere, so the sum of the squares of the 10 values should be 1.
This is the exact question for which I need to generate those points:
Implement the Perceptron algorithm and run it on the following
synthetic data sets in R^10: pick w* = [1,0,0,…,0]; generate 1000
points x by sampling uniformly at random over the unit sphere and
then removing those that have margin γ smaller than 0.1; generate
label y = sign((w*)^T x).
There is a theorem saying that if X = (X1,...,XN) is a vector whose components Xi are independent standard normal random variables, then X/NORM(X) is uniformly distributed on the unit sphere, where NORM is the Euclidean norm. So you have to sample 10 values from a standard normal distribution (for example with numpy) and then normalize the result.
As @Andrex suggested, here is the right solution:
import numpy as np
import math
s = np.random.normal(0, 1, 10)
norm=math.sqrt(sum(s*s))
result=s/norm
where result is the answer. You can evaluate the result:
sum([x*x for x in result])
1.0
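For the full data set described in the question, the same idea can be vectorized. The following is only a minimal sketch, not the code from the answer above: the helper name sample_unit_sphere, the seed, and the reading of "margin" as |(w*)^T x| are assumptions made for illustration.

```python
import numpy as np

def sample_unit_sphere(num_points, dim, rng):
    """Return num_points rows, each uniformly distributed on the unit sphere in R^dim."""
    g = rng.standard_normal((num_points, dim))             # i.i.d. N(0, 1) entries
    return g / np.linalg.norm(g, axis=1, keepdims=True)    # scale each row to length 1

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
w_star = np.zeros(10)
w_star[0] = 1.0                  # w* = [1, 0, ..., 0]

x = sample_unit_sphere(1000, 10, rng)
margin = np.abs(x @ w_star)      # assumed definition: distance to the hyperplane w*.x = 0
x = x[margin >= 0.1]             # drop points with margin smaller than 0.1
y = np.sign(x @ w_star)          # labels in {-1, +1}
```

Because the filter removes points, fewer than 1000 samples remain; if exactly 1000 are needed, one option is to keep sampling until enough points pass the filter.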
For a project, I need to be able to sample random points uniformly from linear subspaces (i.e. lines and hyperplanes) within a certain radius. Since these are linear subspaces, they must go through the origin. This should work for any dimension n of the ambient space R^n from which we draw our subspaces.
I want my range of values to be from -0.5 to 0.5 (ie, all the points should fall within a hypercube whose center is at the origin and length is 1). I have tried to do the following to generate random subspaces and then points from those subspaces but I don't think it's exactly correct (I think I'm missing some form of normalization for the points):
import numpy as np
def make_pd_line_in_rn(p, n, amount=1000):
    # n is the dimension we draw our subspaces from
    # p is the dimension of the subspace we want to draw (e.g. p=1 => line, p=2 => plane, etc.)
    # assume that n >= p
    coeffs = np.random.rand(n, p) - 0.5
    t = np.random.rand(amount, p) - 0.5
    return np.matmul(t, coeffs.T)
I'm not really good at visualizing the 3D stuff and have been banging my head against the wall for a couple of days.
Here is a perfect example of what I'm trying to achieve:
I think I'm missing some form of normalization for the points
Yes, you identified the issue correctly. Let me sum up your algorithm as it stands:
Generate a random subspace basis coeffs made of p random vectors in dimension n;
Generate coordinates t for amount points in the basis coeffs
Return the coordinates of the amount points in R^n, which is the matrix product of t and coeffs.
This works, except for one detail: the basis coeffs is not an orthonormal basis. The vectors of coeffs do not define a hypercube of side length 1; instead, they define a random parallelepiped.
To fix your code, you need to generate a random orthonormal basis instead of coeffs. You can do that using scipy.stats.ortho_group.rvs, or if you don't want to import scipy.stats, refer to the accepted answer to this question: How to create a random orthonormal matrix in python numpy?
from scipy.stats import ortho_group  # ortho_group.rvs random orthogonal matrix
import numpy as np                   # np.random.rand random matrix
def make_pd_line_in_rn(p, n, amount=1000):
    # n is the dimension we draw our subspaces from
    # p is the dimension of the subspace we want to draw (e.g. p=1 => line, p=2 => plane, etc.)
    # assume that n >= p
    coeffs = ortho_group.rvs(n)[:p]   # first p rows of a random n x n orthogonal matrix
    t = np.random.rand(amount, p) - 0.5
    return np.matmul(t, coeffs)
Please note that this method returns a rotated hypercube, aligned with the subspace. This makes sense; for instance, if you want to draw a square on a plane embedded in R^3, then the square has to be aligned with the plane (otherwise it's not in the plane).
If what you wanted instead is the intersection of a dimension-n hypercube with the dimension-p subspace, as suggested in the comments, then please do clarify your question.
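If importing scipy.stats is not wanted, one possible NumPy-only route is a QR decomposition of a Gaussian matrix. This is a sketch under assumptions: the helper name random_orthonormal_rows and the seed are hypothetical, and it is not claimed to reproduce the exact distribution of ortho_group.rvs; it only guarantees that the rows are orthonormal.

```python
import numpy as np

def random_orthonormal_rows(p, n, rng):
    """Return a (p, n) array whose rows are orthonormal vectors in R^n (requires n >= p)."""
    gaussian = rng.standard_normal((n, p))
    q, _ = np.linalg.qr(gaussian)   # reduced QR: q has shape (n, p), orthonormal columns
    return q.T

rng = np.random.default_rng(0)      # arbitrary seed
p, n, amount = 2, 5, 1000
basis = random_orthonormal_rows(p, n, rng)
t = rng.random((amount, p)) - 0.5   # coordinates in [-0.5, 0.5) along each basis vector
points = t @ basis                  # shape (amount, n), all inside the p-dim subspace

# Projecting back onto the basis recovers the original coordinates,
# which is what orthonormality buys over a random parallelepiped.
recovered = points @ basis.T
```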
I've looked around and all solutions for generating uniform random points in/on the unit ball are designed for 2 or 3 dimensions.
What is a (tractable) way to generate uniform random points inside a ball in arbitrary dimension? Particularly, not just on the surface of the ball.
To preface, generating random points in the cube and throwing out the points with norm greater than 1 is not feasible in high dimension. The ratio of the volume of the unit ball to the volume of the cube [-1, 1]^d that encloses it goes to 0 as the dimension grows. Even in 10 dimensions only about 0.25% of random points in that cube are also inside the unit ball.
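The acceptance rate of that rejection approach can be checked with the closed-form volume of the unit ball, pi^(d/2) / Gamma(d/2 + 1). The function name ball_to_cube_ratio below is just an illustrative choice.

```python
import math

def ball_to_cube_ratio(d):
    """Volume of the unit ball in R^d divided by the volume of the cube [-1, 1]^d."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume

# Fraction of uniformly sampled cube points that would be accepted
for d in (2, 3, 10, 20):
    print(d, ball_to_cube_ratio(d))
```

For d = 10 this evaluates to roughly 0.0025, i.e. about 0.25% of the samples are kept.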
The best way to generate uniformly distributed random points in a d-dimension ball appears to be by thinking of polar coordinates (directions instead of locations). Code is provided below.
Pick a random point on the unit sphere (the surface of the unit ball) with uniform distribution.
Pick a random radius where the likelihood of a radius corresponds to the surface area of a ball with that radius in d dimensions.
This selection process will (1) make all directions equally likely, and (2) make all points on the surface of balls within the unit ball equally likely. This will generate our desired uniformly random distribution over the entire interior of the ball.
Picking a random direction (on the unit sphere)
In order to achieve (1) we can randomly generate a vector from d independent draws of a Gaussian distribution and normalize it to unit length. This works because a Gaussian distribution has a probability density function (PDF) with x^2 in an exponent. That implies that the joint density (for independent random variables this is the product of their PDFs) will have (x_1^2 + x_2^2 + ... + x_d^2) in the exponent. Notice that this resembles the definition of a sphere in d dimensions, meaning the joint distribution of d independent samples from a Gaussian distribution is invariant to rotation (so the normalized vectors are uniform over the sphere).
Here is what 200 random points generated in 2D look like.
Picking a random radius (with appropriate probability)
In order to achieve (2) we can generate a radius by using the inverse of a cumulative distribution function (CDF) that corresponds to the surface area of a sphere of radius r in d dimensions. That surface area is proportional to r^(d-1), so the probability of landing within radius r (its integral, i.e. the volume) is proportional to r^d, meaning we can use r^d over the range [0,1] as a CDF. Now a random sample is generated by mapping uniform random numbers u in the range [0,1] through the inverse, u^(1/d).
Here is a visual of the CDF x^2 (for two dimensions); randomly generated numbers in [0,1] get mapped to the corresponding x coordinate on this curve (e.g. 0.1 → 0.316).
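The inverse-CDF step can be checked numerically on its own. A small sketch, with the sample size and seed chosen arbitrarily: if the radii follow the CDF r^d, the fraction of radii below 0.5 should be close to 0.5^d.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed
d = 2
u = rng.random(100_000)            # uniform samples in [0, 1)
radii = u ** (1.0 / d)             # inverse of the CDF r^d

# A uniform value of 0.1 maps to a radius of 0.1 ** (1/2), about 0.316
print(0.1 ** (1.0 / d))

# Empirical CDF at r = 0.5; the theoretical value is 0.5 ** d = 0.25
fraction_inside = np.mean(radii <= 0.5)
print(fraction_inside)
```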
Code for the above
Finally, here is some Python code (assumes you have NumPy installed) that computes all of the above.
# Generate "num_points" random points in "dimension" that have uniform
# probability over the unit ball scaled by "radius" (length of points
# are in range [0, "radius"]).
def random_ball(num_points, dimension, radius=1):
    from numpy import random, linalg
    # First generate random directions by normalizing the length of a
    # vector of random-normal values (these distribute evenly on ball).
    random_directions = random.normal(size=(dimension,num_points))
    random_directions /= linalg.norm(random_directions, axis=0)
    # Second generate a random radius with probability proportional to
    # the surface area of a ball with a given radius.
    random_radii = random.random(num_points) ** (1/dimension)
    # Return the list of random (direction & length) points.
    return radius * (random_directions * random_radii).T
For posterity, here is a visual of 5000 random points generated with the above code.
What function can I use in Python if I want to sample a truncated integer power law?
That is, given two parameters a and m, generate a random integer x in the range [1,m) that follows a distribution proportional to 1/x^a.
I've been searching around numpy.random, but I haven't found this distribution.
AFAIK, neither NumPy nor SciPy defines this distribution for you. However, using SciPy it is easy to define your own discrete distribution function using scipy.stats.rv_discrete:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
def truncated_power_law(a, m):
    x = np.arange(1, m+1, dtype='float')
    pmf = 1/x**a
    pmf /= pmf.sum()  # normalize so the probabilities sum to 1
    return stats.rv_discrete(values=(range(1, m+1), pmf))
a, m = 2, 10
d = truncated_power_law(a=a, m=m)
N = 10**4
sample = d.rvs(size=N)
# bin edges 0.5, 1.5, ..., m+0.5 so that every value 1..m gets its own bin
plt.hist(sample, bins=np.arange(m+1)+0.5)
plt.show()
I don't use Python, so rather than risk syntax errors I'll try to describe the solution algorithmically. This is a brute-force discrete inversion. It should translate quite easily into Python. I'm assuming 0-based indexing for the array.
Setup:
Generate an array cdf of size m with cdf[0] = 1 as the first entry, cdf[i] = cdf[i-1] + 1/(i+1)**a for the remaining entries.
Scale all entries by dividing cdf[m-1] into each -- now they actually are CDF values.
Usage:
Generate your random values by generating a Uniform(0,1) and
searching through cdf[] until you find an entry greater than your
uniform. Return the index + 1 as your x-value.
Repeat for as many x-values as you want.
For instance, with a,m = 2,10, I calculate the probabilities directly as:
[0.6452579827864142, 0.16131449569660355, 0.07169533142071269, 0.04032862392415089, 0.02581031931145657, 0.017923832855178172, 0.013168530260947229, 0.010082155981037722, 0.007966147935634743, 0.006452579827864143]
and the CDF is:
[0.6452579827864142, 0.8065724784830177, 0.8782678099037304, 0.9185964338278814, 0.944406753139338, 0.9623305859945162, 0.9754991162554634, 0.985581272236501, 0.9935474201721358, 1.0]
When generating, if I got a Uniform outcome of 0.90 I would return x=4 because 0.918... is the first CDF entry larger than my uniform.
If you're worried about speed you could build an alias table, but with a geometric decay the probability of early termination of a linear search through the array is quite high. With the given example, for instance, you'll terminate on the first peek almost 2/3 of the time.
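One possible Python translation of the inversion described above is sketched here. It swaps the linear search for a binary search via np.searchsorted, which is a deliberate substitution rather than the exact procedure in the answer; the function names build_cdf and sample_power_law are hypothetical.

```python
import numpy as np

def build_cdf(a, m):
    """CDF over x = 1..m for probabilities proportional to 1 / x**a."""
    x = np.arange(1, m + 1, dtype=float)
    cdf = np.cumsum(x ** (-a))
    return cdf / cdf[-1]             # scale so the last entry is exactly 1

def sample_power_law(cdf, size, rng):
    """Draw integers by inverting the CDF with uniform random numbers."""
    u = rng.random(size)             # uniform in [0, 1)
    # side="right" returns the index of the first CDF entry greater than u
    return np.searchsorted(cdf, u, side="right") + 1

rng = np.random.default_rng(0)       # arbitrary seed
cdf = build_cdf(a=2, m=10)
samples = sample_power_law(cdf, size=10_000, rng=rng)
```

With a, m = 2, 10 the first CDF entry is about 0.645, and a uniform outcome of 0.90 maps to x = 4, matching the worked numbers above.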
Use numpy.random.zipf and just reject any samples greater than or equal to m
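A rough sketch of that rejection idea follows. Note that NumPy's Zipf sampler requires a > 1, so this only covers that case; the function name truncated_zipf and the batch-and-filter loop are assumptions for illustration.

```python
import numpy as np

def truncated_zipf(a, m, size, rng):
    """Draw `size` integers in [1, m) with probability proportional to 1 / x**a (a > 1)."""
    kept = np.empty(0, dtype=np.int64)
    while kept.size < size:
        draw = rng.zipf(a, size=size)                   # untruncated Zipf samples
        kept = np.concatenate([kept, draw[draw < m]])   # reject samples >= m
    return kept[:size]

rng = np.random.default_rng(0)    # arbitrary seed
sample = truncated_zipf(a=2, m=10, size=10_000, rng=rng)
```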
I need a uniform distribution of points on a 4 dimensional sphere. I know this is not as trivial as picking 3 angles and using polar coordinates.
In 3 dimensions I use
from math import acos, cos, pi, sin
from random import random
u = random()
costheta = 2*u - 1  # uniformly distributed between -1 and 1
theta = acos(costheta)
phi = 2*pi*random()
x = costheta
y = sin(theta)*cos(phi)
z = sin(theta)*sin(phi)
This gives a uniform distribution of x, y and z.
How can I obtain a similar distribution for 4 dimensions?
A standard way, though perhaps not the fastest, is to use Muller's method to generate uniformly distributed points on an N-sphere:
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
N = 600
dim = 3
norm = np.random.normal
normal_deviates = norm(size=(dim, N))
radius = np.sqrt((normal_deviates**2).sum(axis=0))
points = normal_deviates/radius
fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
ax.scatter(*points)
ax.set_aspect('equal')
plt.show()
Simply change dim = 3 to dim = 4 to generate points on the unit sphere in 4 dimensions (the 3D scatter plot only applies when dim = 3).
Take a point in 4D space whose coordinates are distributed normally, and calculate its unit vector. This will be on the unit 4-sphere.
import random
import math
# four independent standard normal coordinates
x = random.normalvariate(0, 1)
y = random.normalvariate(0, 1)
z = random.normalvariate(0, 1)
w = random.normalvariate(0, 1)
# normalize to unit length
r = math.sqrt(x*x + y*y + z*z + w*w)
x /= r
y /= r
z /= r
w /= r
print(x, y, z, w)
I like @unutbu's answer if the Gaussian sampling really creates an evenly spread spherical distribution (unlike sampling from a cube), but to avoid sampling from a Gaussian distribution and having to prove that, there is a simple alternative: sample uniformly inside the ball by rejection, then project onto the sphere.
Generate points from a uniform distribution over the cube [-1, 1]^n.
Compute the squared radius of each point (avoid the square root).
Discard points:
Discard points for which the squared radius is greater than 1 (thus, for which the unsquared radius is greater than 1).
Discard points too close to a radius of zero to avoid numerical instabilities related to the division in the next step.
For each sampled point kept, divide the sampled point by its norm so as to renormalize it to unit radius.
Wash and repeat for more points because of discarded samples.
This obviously works in an n-dimensional space, since the radius is always the L2-norm in higher dimensions.
It is fast in that the rejection test avoids a square root and no Gaussian sampling is needed, but it is not a vectorized algorithm.
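The steps above can be sketched as follows. This is an illustrative, non-vectorized version; the function name sphere_by_rejection and the threshold eps are assumptions, not values from the answer.

```python
import numpy as np

def sphere_by_rejection(num_points, dim, rng, eps=1e-12):
    """Uniform points on the unit sphere in R^dim via rejection from the cube [-1, 1]^dim."""
    points = []
    while len(points) < num_points:
        p = rng.uniform(-1.0, 1.0, size=dim)    # uniform point in the cube
        r2 = p @ p                              # squared radius, no square root needed
        if r2 > 1.0 or r2 < eps:                # outside the ball, or too close to zero
            continue
        points.append(p / np.sqrt(r2))          # project the kept point onto the sphere
    return np.array(points)

rng = np.random.default_rng(0)    # arbitrary seed
pts = sphere_by_rejection(200, 4, rng)
```

The fraction of kept points is the ball-to-cube volume ratio, so the loop runs longer as the dimension grows.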
I found a good solution for sampling from N-dim sphere. The main idea is:
If Y is drawn from the uncorrelated multivariate normal distribution, then S = Y / ||Y|| has the uniform distribution on the unit d-sphere. Multiplying S by U^(1/d), where U has the uniform distribution on the unit interval (0,1), creates the uniform distribution in the unit d-dimensional ball.
Here is the python code to do this:
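import numpy as np
n_dims, n_samples, radius = 3, 1000, 1.0  # example values (assumed here); in my case radius is one
Y = np.random.multivariate_normal(mean=[0], cov=np.eye(1,1), size=(n_dims, n_samples))
Y = np.squeeze(Y, -1)
Y /= np.sqrt(np.sum(Y * Y, axis=0))  # normalize each column to unit length
U = np.random.uniform(low=0, high=1, size=(n_samples)) ** (1/n_dims)
Y *= U * radius  # scale each point by its random radius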
This is what I get for the sphere:
I am comparing the FFT of interpolated gas-concentration data, which is unevenly sampled over a period, with the Lomb-Scargle periodogram of the same data. I am using scipy's fft function to calculate the Fourier transform and then squaring its modulus to give what I believe to be the power spectral density, in units of parts per billion (ppb) squared.
I can get the Lomb-Scargle plot to match almost exactly the pattern of the FFT but never the same scale of magnitude: the FFT power spectral density is always higher, even though I thought the Lomb-Scargle power was a power spectral density. The Lomb code I am using, http://www.astropython.org/snippet/2010/9/Fast-Lomb-Scargle-algorithm, normalises the dataset by taking away the average and dividing by 2 times the variance of the data, so I have normalised the FFT data in the same manner, but the magnitudes still do not match.
Therefore I did some more research and found that the normalised Lomb-Scargle power could be unitless, and therefore I cannot get the plots to match. This leads me to 2 questions:
What units (if any) is the power spectral density of a normalised Lomb-Scargle periodogram in?
How would I proceed to match my fft plot with my lomb-scargle plot, in terms of magnitude and pattern?
Thank you.
The squared modulus of the Fourier transform of a series is defined as the energy spectral density (ESD). You need to divide the ESD by the length of the series to convert to an estimate of power spectral density (PSD).
Units
The units of a PSD are [units]**2/[frequency] where [units] represents the units of your original series.
Normalization
To check for proper normalization, one can numerically integrate the PSD of a white noise (with known variance). If the integrated spectrum equals the variance of the series, the normalization is correct. A factor of 2 (too low) is not incorrect, though, and may indicate the PSD is normalized to be double-sided; in that case, just multiply by 2 and you have a properly normalized, single-sided PSD.
Using numpy, the randn function generates pseudo-random numbers that are Gaussian distributed. For example
10 * np.random.randn(1, 100)
produces a 1-by-100 array with mean=0 and variance=100. If the sampling frequency is, say, 1 Hz, the single-sided PSD will theoretically be flat at 200 units**2/Hz, from [0,0.5] Hz; the integrated spectrum would thus be 100, equaling the variance of the series.
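This normalization check can be tried with a plain FFT periodogram. The sketch below is one way to do it, with the series length, seed, and variable names chosen for illustration; it uses a longer series than the 100 points above so that the averages are less noisy.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed
fs = 1.0                              # sampling frequency in Hz
n = 4096                              # even length, so the last rfft bin is Nyquist
x = 10 * rng.standard_normal(n)       # white noise with variance close to 100

X = np.fft.rfft(x)                    # bins for frequencies 0 .. fs/2
psd = np.abs(X) ** 2 / (fs * n)       # double-sided PSD, units**2/Hz
psd[1:-1] *= 2                        # single-sided: double all bins except DC and Nyquist

df = fs / n                           # frequency resolution
integrated = psd.sum() * df           # numerical integral of the PSD

print(psd[1:-1].mean())               # should be near 200 units**2/Hz
print(integrated, np.mean(x ** 2))    # the two should agree (Parseval)
```

Here the integral matches the mean square of the series, which for zero-mean noise is essentially its variance.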
Update
I modified the example included in the python code you linked to demonstrate the normalization for a normally distributed series of length 20, with variance 1, and sampling frequency 10:
import numpy
import lomb
numpy.random.seed(999)
nd = 20
fs = 10
x = numpy.arange(nd)
y = numpy.random.randn(nd)
fx, fy, nout, jmax, prob = lomb.fasper(x, y, 1., fs)
fNy = fx[-1]
fy = fy/fs
Si = numpy.mean(fy)*fNy
print(fNy, Si, Si*2)
This gives, for me:
5.26315789474 0.482185882163 0.964371764327
which shows you a few things:
The "Nyquist" frequency asked for is actually the sampling frequency.
The result needs to be divided by the sampling frequency.
The output is normalized for a double-sided PSD, so multiplying by 2 makes the integrated spectrum nearly 1.
In the time since this question was asked and answered, the AstroPy project has gained a Lomb-Scargle method, and this question is addressed in the documentation: http://docs.astropy.org/en/stable/stats/lombscargle.html#psd-normalization-unnormalized
In brief, you can compute a Fourier periodogram and compare it to the astropy Lomb-Scargle periodogram as follows
import numpy as np
from astropy.timeseries import LombScargle  # astropy >= 3.2; older releases: from astropy.stats import LombScargle
def fourier_periodogram(t, y):
    N = len(t)
    frequency = np.fft.fftfreq(N, t[1] - t[0])
    y_fft = np.fft.fft(y)
    positive = (frequency > 0)  # keep only the positive frequencies
    return frequency[positive], (1. / N) * abs(y_fft[positive]) ** 2
t = np.arange(100)
y = np.random.randn(100)
frequency, PSD_fourier = fourier_periodogram(t, y)
PSD_LS = LombScargle(t, y).power(frequency, normalization='psd')
np.allclose(PSD_fourier, PSD_LS)
# True
Since AstroPy is a common tool used in astronomy, I thought this might be more useful than an answer based on the code snippet mentioned above.