I want to use the dendrogram function from SciPy.
I have the following data:
I have a list with seven different means. For example:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
Each mean is calculated for a different user. For example:
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
My aim is to display the data described above as a dendrogram.
I tried the following:
Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756, 31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]
# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
Z = linkage(Y)
# Plot the dendrogram with the results above
dendrogram(Z, leaf_rotation=45., leaf_font_size=12. , show_contracted=True)
plt.style.use("seaborn-whitegrid")
plt.title("Dendrogram to find clusters")
plt.ylabel("Distance")
plt.show()
But it says:
ValueError: Length n of condensed distance matrix 'y' must be a binomial coefficient, i.e.there must be a k such that (k \choose 2)=n)!
I already tried to convert my data into a matrix. With:
# Attempt with matrix
#X = np.concatenate((X, Y),)
#Z = linkage(X)
But that doesn't work either!
Are there any suggestions?
Thanks :-)
The first argument of linkage is either an n x m array, representing n points in m-dimensional space, or a one-dimensional array containing the condensed distance matrix. These are two very different meanings! The first is the raw data, i.e. the observations. The second format assumes that you have already computed all the distances between your observations, and you are providing these distances to linkage, not the original points.
It looks like you want the first case (raw data), with m = 1. So you must reshape the input to have shape (n, 1).
Replace this:
Z = linkage(Y)
with:
Z = linkage(np.reshape(Y, (len(Y), 1)))
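As a minimal sketch of this fix applied to the data from the question: it assumes the user names in X are meant to label the leaves, and it passes no_plot=True only so that the snippet runs without a display.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]
X = ["user1", "user2", "user3", "user4", "user5", "user6", "user7"]

# Treat each mean as one observation in 1-dimensional space: shape (7, 1).
points = np.reshape(Y, (len(Y), 1))
Z = linkage(points)  # defaults: single linkage, Euclidean distance

# labels= attaches the user names to the leaves.
# Drop no_plot=True and call plt.show() afterwards to draw the figure.
tree = dendrogram(Z, labels=X, no_plot=True)
leaf_order = tree["ivl"]  # leaf labels in left-to-right plotting order
```

The linkage matrix Z has one row per merge, so 7 observations give 6 rows.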
You are passing 7 values in Y, so len(Y) = 7.
Because Y is one-dimensional, linkage treats it as a condensed distance matrix. As per the documentation of linkage, its length must then satisfy
{n \choose 2} = len(Y)
which means
1/2 * (n - 1) * n = len(Y)
for some integer number of observations n. No integer n gives 7 (n = 4 gives 6 and n = 5 gives 10), hence the error.
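To make the length condition concrete, here is a small sketch (the variable names are illustrative) that builds the condensed distance matrix for the 7 observations with scipy.spatial.distance.pdist and checks that its length is n(n-1)/2:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

Y = [71.407452200146807, 0, 33.700136456196823, 1112.3757110973756,
     31.594949722819372, 34.823881975554166, 28.36368420190157]

points = np.reshape(Y, (-1, 1))   # 7 observations, 1 feature each
n = len(points)

# Condensed form: one distance per unordered pair of observations.
condensed = pdist(points)
expected_length = n * (n - 1) // 2

# linkage accepts either the raw observations or the condensed distances.
Z_from_points = linkage(points)
Z_from_distances = linkage(condensed)
```

For n = 7 the condensed vector has 21 entries, which is why a bare list of 7 numbers is rejected.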
Error: ValueError: shapes (3,1) and (3,2) not aligned: 1 (dim 1) != 3 (dim 0)
The error occurs because the matrices are different sizes, but how can I multiply two matrices with different size and where the resulting output should be: [-0.78 0.85]?
import numpy as np
x1 = 3-7/3;
x2 = 2-4/3;
x3 = 1-5/3;
X = ([x1], [x2],[x3])
V = ([-0.99, -0.13], [-0.09, 0.70],[0.09, -0.70])
res = np.dot(X,V)
print("Res: ",res)
Any help is appreciated!
Mathematical question, for better understanding:
A principal component analysis is carried out on a dataset comprised of three data points x1, x2 and x3 collected in an N × M matrix X such that each row of the matrix is a data point. Suppose the matrix X̃ corresponds to X with the mean of each column subtracted, i.e.
X = ([3.00, 2.00, 1.00],[4.00, 1.00, 2.00],[0.00, 1.00, 2.00])
and suppose X̃ has the singular value decomposition:
V = ([-0.99, -0.13, -0.00], [-0.09, 0.70, -0.71],[0.09, -0.70, -0.71])
What are the coordinates (rounded to two significant digits) of the first observation x1 projected onto the 2-dimensional subspace containing the maximal variation?
Answer:
The projection can be found by subtracting the mean from X
and projecting onto the first two columns of V. The first point with the mean subtracted has coordinates: [3-7/3 2-4/3 1-5/3]
This should be (left) multiplied with the first two columns of V:
([3-7/3], [2-4/3],[1-5/3]) * ([-0.99, -0.13], [-0.09, 0.70],[0.09, -0.70]) = [-0.78 0.85]
So I am trying to find out how to calculate this in python.
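I am assuming you wish to perform matrix multiplication. This only works when the inner dimensions match: an (n, k) array can be multiplied by a (k, m) array. Here X has shape (3, 1) and V has shape (3, 2), so X must first become a (1, 3) row vector. You can achieve the desired result by using reshape and numpy.matmul().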
Code:
import numpy as np
x1 = 3-7/3;
x2 = 2-4/3;
x3 = 1-5/3;
X = np.array([[x1], [x2],[x3]])
X = X.reshape(1, 3)
V = np.array([[-0.99, -0.13], [-0.09, 0.70],[0.09, -0.70]])
res = np.matmul(X, V)
print("Res: ",res)
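The same result can also be reached starting from the full data matrix in the question. This is only a sketch, with illustrative variable names: it centers every column, then projects all rows onto the first two columns of V in one step.

```python
import numpy as np

# Data matrix from the question: one data point per row.
X = np.array([[3.0, 2.0, 1.0],
              [4.0, 1.0, 2.0],
              [0.0, 1.0, 2.0]])
V = np.array([[-0.99, -0.13, -0.00],
              [-0.09,  0.70, -0.71],
              [ 0.09, -0.70, -0.71]])

# Subtract the column means (7/3, 4/3, 5/3) from every row.
X_centered = X - X.mean(axis=0)

# (3, 3) @ (3, 2) -> (3, 2): each row is a point in the 2-D subspace.
projected = X_centered @ V[:, :2]
first_point = np.round(projected[0], 2)
```

Here first_point matches the expected [-0.78 0.85] from the question.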
I have used interp2 in MATLAB, as in the following code, which is part of @rayryeng's answer to "Three dimensional (3D) matrix interpolation in Matlab":
d = size(volume_image)
[X,Y] = meshgrid(1:1/scaleCoeff(2):d(2), 1:1/scaleCoeff(1):d(1));
for ind = z
%Interpolate each slice via interp2
M2D(:,:,ind) = interp2(volume_image(:,:,ind), X, Y);
end
Example of Dimensions:
The image size is 512x512 and the number of slices is 133. So:
volume_image(rows, columns, slices in 3rd dimension): 512x512x133
X: 288x288
Y: 288x288
scaleCoeff(2): 0.5625
scaleCoeff(1): 0.5625
z = 1 up to 133 ,hence z: 1x133
ind: 1 up to 133
M2D(:,:,ind) finally is 288x288x133
Also, MATLAB's syntax for size is (rows, columns, slices in 3rd dimension), while Python's is (slices in 3rd dim, rows, columns).
However, after converting the MATLAB code to Python, an error occurred (ValueError: Invalid length for input z for non rectangular grid):
for ind in range(0, len(z)+1):
    M2D[ind, :, :] = interpolate.interp2d(X, Y, volume_image[ind, :, :]) # ValueError: Invalid length for input z for non rectangular grid
What is wrong? Thank you so much.
In MATLAB, interp2 has as arguments:
result = interp2(input_x, input_y, input_z, output_x, output_y)
You are using only the last 3 arguments; the first two are then assumed to be input_x = 1:size(input_z,2) and input_y = 1:size(input_z,1).
In Python, scipy.interpolate.interp2d is quite different: it takes the first 3 input arguments of the MATLAB function, and returns an object that you can call to get interpolated values:
f = scipy.interpolate.interp2d(input_x, input_y, input_z)
result = f(output_x, output_y)
Following the example from the documentation, I get to something like this:
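import numpy as np
from scipy import interpolate
# Input grid: x runs along the columns, y along the rows
x = np.arange(0, volume_image.shape[2])
y = np.arange(0, volume_image.shape[1])
f = interpolate.interp2d(x, y, volume_image[ind, :, :])
# As in the MATLAB call, columns use scaleCoeff(2) (index 1 here) and rows use scaleCoeff(1) (index 0 here)
xnew = np.arange(0, volume_image.shape[2], 1/scaleCoeff[1])
ynew = np.arange(0, volume_image.shape[1], 1/scaleCoeff[0])
M2D[ind, :, :] = f(xnew, ynew)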
[Code not tested, please let me know if there are errors.]
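As far as I know, interp2d is deprecated in recent SciPy releases, so here is a hedged sketch of the same per-slice resampling with scipy.interpolate.RegularGridInterpolator instead. The helper resample_slice and the demo data are hypothetical, not part of the original code, and linear interpolation is an assumption.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def resample_slice(slice_2d, scale_rows, scale_cols):
    """Linearly resample one 2-D slice (hypothetical helper)."""
    n_rows, n_cols = slice_2d.shape
    interp = RegularGridInterpolator(
        (np.arange(n_rows), np.arange(n_cols)), slice_2d, method="linear")
    # Query coordinates spaced 1/scale apart, like 1:1/scale:d in MATLAB
    # (0-based here), clipped so they never leave the input grid.
    n_new_rows = int(np.floor((n_rows - 1) * scale_rows)) + 1
    n_new_cols = int(np.floor((n_cols - 1) * scale_cols)) + 1
    new_rows = np.minimum(np.arange(n_new_rows) / scale_rows, n_rows - 1)
    new_cols = np.minimum(np.arange(n_new_cols) / scale_cols, n_cols - 1)
    rr, cc = np.meshgrid(new_rows, new_cols, indexing="ij")
    # The interpolator expects points with shape (..., 2): (row, column).
    return interp(np.stack([rr, cc], axis=-1))

# Demo on a 5x5 slice whose value is 10*row + column.
rows, cols = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
demo_slice = 10.0 * rows + cols
resampled = resample_slice(demo_slice, 0.5, 0.5)
```

With a 512x512 slice and a scale of 0.5625 in both directions, the same arithmetic gives a 288x288 result, matching the sizes in the question.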
You might be interested in scipy.ndimage.zoom. If you are interpolating from one regular grid to another, it is much faster and easier to use than scipy.interpolate.interp2d.
See this answer for an example:
https://stackoverflow.com/a/16984081/1295595
You'd probably want something like:
import scipy.ndimage as ndimage
M2D = ndimage.zoom(volume_image, (1, scaleCoeff[0], scaleCoeff[1]))
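A small self-contained sketch of that call on synthetic data (the volume, the scale factors and order=1 for linear interpolation are assumptions made for the demo):

```python
import numpy as np
from scipy import ndimage

# Synthetic volume laid out as (slices, rows, columns).
volume = np.random.default_rng(0).random((4, 16, 16))
scale = (0.5, 0.5)  # (row scale, column scale), made up for this demo

# Zoom factor 1 leaves the slice axis untouched; order=1 is linear.
resized = ndimage.zoom(volume, (1, scale[0], scale[1]), order=1)
```

The whole stack is resized in one call, so no loop over slices is needed.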
I am using scipy.stats.binned_statistic_2d to bin irregular data onto a uniform grid by finding the mean of points within every bin.
import numpy as np
import scipy as sp
import scipy.stats
import matplotlib.pyplot as plt
x, y = np.meshgrid(np.sort(np.random.uniform(0, 1, 100)), np.sort(np.random.uniform(0, 1, 100)))
z = np.sin(x*y)
statistic, xedges, yedges, binnumber = sp.stats.binned_statistic_2d(x.ravel(), y.ravel(), values=z.ravel(), statistic='mean', bins=[np.arange(0, 1.1, .1), np.arange(0, 1.1, .1)])
plt.figure(1)
plt.pcolormesh(x,y,z, vmin = 0, vmax = 1)
plt.figure(2)
plt.pcolormesh(xedges,yedges,statistic, vmin = 0, vmax = 1)
This produces the plots I expected (figures not shown): one of the scattered data and one of the gridded data.
But the data I want to grid has NaNs in it. This is what the result is like when I add NaNs:
x, y = np.meshgrid(np.sort(np.random.uniform(0, 1, 100)), np.sort(np.random.uniform(0, 1, 100)))
z = np.sin(x*y)
z[50:55,50:55] = np.nan
statistic, xedges, yedges, binnumber = sp.stats.binned_statistic_2d(x.ravel(), y.ravel(), values=z.ravel(), statistic='mean', bins=[np.arange(0, 1.1, .1), np.arange(0, 1.1, .1)])
plt.figure(3)
plt.pcolormesh(x,y,z, vmin = 0, vmax = 1)
plt.figure(4)
plt.pcolormesh(xedges,yedges,statistic, vmin = 0, vmax = 1)
[Figures not shown: the scattered data with NaNs, and the gridded result.]
Obviously, if a bin is entirely filled with NaNs, the resulting mean of that bin should still be NaN. However, I would like bins that are not entirely filled with NaNs to just result in the mean of the non-NaN numbers.
I've tried replacing the "statistic" argument in sp.stats.binned_statistic_2d with np.nanmean. This works, but it is very slow on large datasets. I've tried digging into the underlying code of `sp.stats.binned_statistic_2d`, but I can't figure out exactly how it calculates the mean, or how to make it ignore NaNs in its calculation.
Any ideas?
I had the same problem, so I changed the definition of binned_statistic_dd in scipy.stats and saved a local copy so that it is not overwritten when SciPy is updated.
I added 'nanmean' to the list of known_stats and added this branch:
    elif statistic == 'nanmean':
        result.fill(np.nan)
        for i in np.unique(binnumbers):
            for vv in builtins.range(Vdim):
                result[vv, i] = np.nanmean(values[vv, binnumbers == i])
Full new definition:
def binned_statistic_dd(sample, values, statistic='mean',
                        bins=10, range=None, expand_binnumbers=False,
                        binned_statistic_result=None):
    """
Compute a multidimensional binned statistic for a set of data.
This is a generalization of a histogramdd function. A histogram divides
the space into bins, and returns the count of the number of points in
each bin. This function allows the computation of the sum, mean, median,
or other statistic of the values within each bin.
Parameters
----------
sample : array_like
Data to histogram passed as a sequence of N arrays of length D, or
as an (N,D) array.
values : (N,) array_like or list of (N,) array_like
The data on which the statistic will be computed. This must be
the same shape as `sample`, or a list of sequences - each with the
same shape as `sample`. If `values` is such a list, the statistic
will be computed on each independently.
statistic : string or callable, optional
The statistic to compute (default is 'mean').
The following statistics are available:
* 'mean' : compute the mean of values for points within each bin.
Empty bins will be represented by NaN.
* 'median' : compute the median of values for points within each
bin. Empty bins will be represented by NaN.
* 'count' : compute the count of points within each bin. This is
identical to an unweighted histogram. `values` array is not
referenced.
* 'sum' : compute the sum of values for points within each bin.
This is identical to a weighted histogram.
* 'std' : compute the standard deviation within each bin. This
is implicitly calculated with ddof=0. If the number of values
within a given bin is 0 or 1, the computed standard deviation value
will be 0 for the bin.
* 'min' : compute the minimum of values for points within each bin.
Empty bins will be represented by NaN.
* 'max' : compute the maximum of values for point within each bin.
Empty bins will be represented by NaN.
* function : a user-defined function which takes a 1D array of
values, and outputs a single numerical statistic. This function
will be called on the values in each bin. Empty bins will be
represented by function([]), or NaN if this returns an error.
bins : sequence or positive int, optional
The bin specification must be in one of the following forms:
* A sequence of arrays describing the bin edges along each dimension.
* The number of bins for each dimension (nx, ny, ... = bins).
* The number of bins for all dimensions (nx = ny = ... = bins).
range : sequence, optional
A sequence of lower and upper bin edges to be used if the edges are
not given explicitly in `bins`. Defaults to the minimum and maximum
values along each dimension.
expand_binnumbers : bool, optional
'False' (default): the returned `binnumber` is a shape (N,) array of
linearized bin indices.
'True': the returned `binnumber` is 'unraveled' into a shape (D,N)
ndarray, where each row gives the bin numbers in the corresponding
dimension.
See the `binnumber` returned value, and the `Examples` section of
`binned_statistic_2d`.
binned_statistic_result : binnedStatisticddResult
Result of a previous call to the function in order to reuse bin edges
and bin numbers with new values and/or a different statistic.
To reuse bin numbers, `expand_binnumbers` must have been set to False
(the default)
.. versionadded:: 0.17.0
Returns
-------
statistic : ndarray, shape(nx1, nx2, nx3,...)
The values of the selected statistic in each two-dimensional bin.
bin_edges : list of ndarrays
A list of D arrays describing the (nxi + 1) bin edges for each
dimension.
binnumber : (N,) array of ints or (D,N) ndarray of ints
This assigns to each element of `sample` an integer that represents the
bin in which this observation falls. The representation depends on the
`expand_binnumbers` argument. See `Notes` for details.
See Also
--------
numpy.digitize, numpy.histogramdd, binned_statistic, binned_statistic_2d
Notes
-----
Binedges:
All but the last (righthand-most) bin is half-open in each dimension. In
other words, if `bins` is ``[1, 2, 3, 4]``, then the first bin is
``[1, 2)`` (including 1, but excluding 2) and the second ``[2, 3)``. The
last bin, however, is ``[3, 4]``, which *includes* 4.
`binnumber`:
This returned argument assigns to each element of `sample` an integer that
represents the bin in which it belongs. The representation depends on the
`expand_binnumbers` argument. If 'False' (default): The returned
`binnumber` is a shape (N,) array of linearized indices mapping each
element of `sample` to its corresponding bin (using row-major ordering).
If 'True': The returned `binnumber` is a shape (D,N) ndarray where
each row indicates bin placements for each dimension respectively. In each
dimension, a binnumber of `i` means the corresponding value is between
(bin_edges[D][i-1], bin_edges[D][i]), for each dimension 'D'.
.. versionadded:: 0.11.0
Examples
--------
>>> from scipy import stats
>>> import matplotlib.pyplot as plt
>>> from mpl_toolkits.mplot3d import Axes3D
Take an array of 600 (x, y) coordinates as an example.
`binned_statistic_dd` can handle arrays of higher dimension `D`. But a plot
of dimension `D+1` is required.
>>> mu = np.array([0., 1.])
>>> sigma = np.array([[1., -0.5],[-0.5, 1.5]])
>>> multinormal = stats.multivariate_normal(mu, sigma)
>>> data = multinormal.rvs(size=600, random_state=235412)
>>> data.shape
(600, 2)
Create bins and count how many arrays fall in each bin:
>>> N = 60
>>> x = np.linspace(-3, 3, N)
>>> y = np.linspace(-3, 4, N)
>>> ret = stats.binned_statistic_dd(data, np.arange(600), bins=[x, y],
... statistic='count')
>>> bincounts = ret.statistic
Set the volume and the location of bars:
>>> dx = x[1] - x[0]
>>> dy = y[1] - y[0]
>>> x, y = np.meshgrid(x[:-1]+dx/2, y[:-1]+dy/2)
>>> z = 0
>>> bincounts = bincounts.ravel()
>>> x = x.ravel()
>>> y = y.ravel()
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111, projection='3d')
>>> with np.errstate(divide='ignore'): # silence random axes3d warning
... ax.bar3d(x, y, z, dx, dy, bincounts)
Reuse bin numbers and bin edges with new values:
>>> ret2 = stats.binned_statistic_dd(data, -np.arange(600),
... binned_statistic_result=ret,
... statistic='mean')
"""
    known_stats = ['mean', 'median', 'count', 'sum', 'std', 'min', 'max',
                   'nanmean']
    if not callable(statistic) and statistic not in known_stats:
        raise ValueError('invalid statistic %r' % (statistic,))
    try:
        bins = index(bins)
    except TypeError:
        # bins is not an integer
        pass
    # If bins was an integer-like object, now it is an actual Python int.
    # NOTE: for _bin_edges(), see e.g. gh-11365
    if isinstance(bins, int) and not np.isfinite(sample).all():
        raise ValueError('%r contains non-finite values.' % (sample,))
    # `Ndim` is the number of dimensions (e.g. `2` for `binned_statistic_2d`)
    # `Dlen` is the length of elements along each dimension.
    # This code is based on np.histogramdd
    try:
        # `sample` is an ND-array.
        Dlen, Ndim = sample.shape
    except (AttributeError, ValueError):
        # `sample` is a sequence of 1D arrays.
        sample = np.atleast_2d(sample).T
        Dlen, Ndim = sample.shape
    # Store initial shape of `values` to preserve it in the output
    values = np.asarray(values)
    input_shape = list(values.shape)
    # Make sure that `values` is 2D to iterate over rows
    values = np.atleast_2d(values)
    Vdim, Vlen = values.shape
    # Make sure `values` match `sample`
    if(statistic != 'count' and Vlen != Dlen):
        raise AttributeError('The number of `values` elements must match the '
                             'length of each `sample` dimension.')
    try:
        M = len(bins)
        if M != Ndim:
            raise AttributeError('The dimension of bins must be equal '
                                 'to the dimension of the sample x.')
    except TypeError:
        bins = Ndim * [bins]
    if binned_statistic_result is None:
        nbin, edges, dedges = _bin_edges(sample, bins, range)
        binnumbers = _bin_numbers(sample, nbin, edges, dedges)
    else:
        edges = binned_statistic_result.bin_edges
        nbin = np.array([len(edges[i]) + 1 for i in builtins.range(Ndim)])
        # +1 for outlier bins
        dedges = [np.diff(edges[i]) for i in builtins.range(Ndim)]
        binnumbers = binned_statistic_result.binnumber
    result = np.empty([Vdim, nbin.prod()], float)
    if statistic == 'mean':
        result.fill(np.nan)
        flatcount = np.bincount(binnumbers, None)
        a = flatcount.nonzero()
        for vv in builtins.range(Vdim):
            flatsum = np.bincount(binnumbers, values[vv])
            result[vv, a] = flatsum[a] / flatcount[a]
    elif statistic == 'std':
        result.fill(0)
        flatcount = np.bincount(binnumbers, None)
        a = flatcount.nonzero()
        for vv in builtins.range(Vdim):
            for i in np.unique(binnumbers):
                # NOTE: take std dev by bin, np.std() is 2-pass and stable
                binned_data = values[vv, binnumbers == i]
                # calc std only when binned data is 2 or more for speed up.
                if len(binned_data) >= 2:
                    result[vv, i] = np.std(binned_data)
    elif statistic == 'count':
        result.fill(0)
        flatcount = np.bincount(binnumbers, None)
        a = np.arange(len(flatcount))
        result[:, a] = flatcount[np.newaxis, :]
    elif statistic == 'sum':
        result.fill(0)
        for vv in builtins.range(Vdim):
            flatsum = np.bincount(binnumbers, values[vv])
            a = np.arange(len(flatsum))
            result[vv, a] = flatsum
    elif statistic == 'median':
        result.fill(np.nan)
        for i in np.unique(binnumbers):
            for vv in builtins.range(Vdim):
                result[vv, i] = np.median(values[vv, binnumbers == i])
    elif statistic == 'min':
        result.fill(np.nan)
        for i in np.unique(binnumbers):
            for vv in builtins.range(Vdim):
                result[vv, i] = np.min(values[vv, binnumbers == i])
    elif statistic == 'max':
        result.fill(np.nan)
        for i in np.unique(binnumbers):
            for vv in builtins.range(Vdim):
                result[vv, i] = np.max(values[vv, binnumbers == i])
    elif statistic == 'nanmean':
        result.fill(np.nan)
        for i in np.unique(binnumbers):
            for vv in builtins.range(Vdim):
                result[vv, i] = np.nanmean(values[vv, binnumbers == i])
    elif callable(statistic):
        with np.errstate(invalid='ignore'), suppress_warnings() as sup:
            sup.filter(RuntimeWarning)
            try:
                null = statistic([])
            except Exception:
                null = np.nan
        result.fill(null)
        for i in np.unique(binnumbers):
            for vv in builtins.range(Vdim):
                result[vv, i] = statistic(values[vv, binnumbers == i])
    # Shape into a proper matrix
    result = result.reshape(np.append(Vdim, nbin))
    # Remove outliers (indices 0 and -1 for each bin-dimension).
    core = tuple([slice(None)] + Ndim * [slice(1, -1)])
    result = result[core]
    # Unravel binnumbers into an ndarray, each row the bins for each dimension
    if(expand_binnumbers and Ndim > 1):
        binnumbers = np.asarray(np.unravel_index(binnumbers, nbin))
    if np.any(result.shape[1:] != nbin - 2):
        raise RuntimeError('Internal Shape Error')
    # Reshape to have output (`result`) match input (`values`) shape
    result = result.reshape(input_shape[:-1] + list(nbin-2))
    return BinnedStatisticddResult(result, edges, binnumbers)
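A possible alternative that avoids patching SciPy, sketched under the assumption that it is acceptable to drop the NaN points before binning: the built-in 'mean' then only sees finite values, and a bin whose points were all NaN ends up empty, which 'mean' reports as NaN. The helper name binned_nanmean_2d and the demo data are made up for illustration.

```python
import numpy as np
from scipy import stats

def binned_nanmean_2d(x, y, z, bins):
    """Bin-wise mean of z that ignores NaN values (hypothetical helper)."""
    x, y, z = (np.ravel(arr) for arr in (x, y, z))
    keep = ~np.isnan(z)  # discard points whose value is NaN
    res = stats.binned_statistic_2d(
        x[keep], y[keep], z[keep], statistic="mean", bins=bins)
    return res.statistic, res.x_edge, res.y_edge

# Tiny demo on a 2x2 grid of bins covering [0, 1] x [0, 1].
x = np.array([0.1, 0.2, 0.6, 0.7, 0.1, 0.2])
y = np.array([0.1, 0.2, 0.1, 0.2, 0.6, 0.7])
z = np.array([1.0, np.nan, 3.0, 5.0, np.nan, np.nan])
edges = [np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.5, 1.0])]
grid, xedges, yedges = binned_nanmean_2d(x, y, z, edges)
```

In the demo, the bin holding 1.0 and a NaN averages to 1.0, while the bin holding only NaNs stays NaN. Since this uses the vectorized 'mean' branch rather than a per-bin Python loop, it should avoid most of the slowdown seen with a callable, though I have not benchmarked it.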
I want to translate the following group coloring octave function to python and use it with pyplot.
Function input:
x - Data matrix (m x n)
a - A parameter.
index - A vector of size "m" with values in range [: a]
(For example, if a = 4, index can be [random.choice(range(4)) for i in range(m)])
Each value in "index" is the number of the group that the corresponding data point belongs to.
The function should plot all the data points from x and color them in different colors (Number of different colors is "a").
The function in octave:
p = hsv(a); % This is an a x 3 matrix
colors = p(index, :); % ****This is an m x 3 matrix****
scatter(X(:,1), X(:,2), 10, colors);
I couldn't find a function like hsv in python, so I wrote it myself (I think I did..):
p = colors.hsv_to_rgb(numpy.column_stack((
numpy.linspace(0, 1, a), numpy.ones((a ,2)) )) )
But I can't figure out how to do the matrix selection p(index, :) in python (numpy).
Especially because the size of "index" is bigger than "a".
Thanks in advance for your help.
So, you want to take an m x 3 of HSV values, and convert each row to RGB?
import numpy as np
import colorsys
mymatrix = np.matrix([[11,12,13],
[21,22,23],
[31,32,33]])
def to_hsv(x):
    return colorsys.rgb_to_hsv(*x)
#Apply the to_hsv function to each matrix row.
print np.apply_along_axis(to_hsv, axis=1, arr=mymatrix)
This produces (output from Python 2, where colorsys applies integer division to these integer inputs):
[[ 0.5 0. 13. ]
[ 0.5 0. 23. ]
[ 0.5 0. 33. ]]
Follow through on your comment:
If I understand correctly, you have a matrix p that is an a x 3 matrix, and you want to randomly select rows from the matrix over and over again, until you have a new matrix that is m x 3?
Ok. Let's say you have a matrix p defined as follows:
a = 5
p = np.random.randint(5, size=(a, 3))
Now, make a list of m random integers between 0 and a-1 (indices start at 0 and end at a-1, so here 0 to 4):
m = 20
index = np.random.randint(a, size=m)
Now access the right indexes and plug them into a new matrix:
p_prime = np.matrix([p[i] for i in index])
Produces a 20 x 3 matrix.
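NumPy can also do this row selection directly with integer-array indexing, which is the closest counterpart to Octave's p(index, :) when index is 0-based. The sketch below is an illustration only: the evenly spaced hues are an approximation of hsv(a), and the values of a, m and index are made up.

```python
import colorsys
import numpy as np

a = 4    # number of groups / colors
m = 10   # number of data points
rng = np.random.default_rng(0)
index = rng.integers(0, a, size=m)  # group of each point, values in 0..a-1

# a x 3 table of RGB colors with evenly spaced hues.
hues = np.linspace(0, 1, a, endpoint=False)
p = np.array([colorsys.hsv_to_rgb(h, 1.0, 1.0) for h in hues])

# Integer-array indexing: row i of the result is p[index[i]], so index
# may be longer than p and may repeat values.
point_colors = p[index, :]
```

The resulting m x 3 array can then be passed to pyplot, for example plt.scatter(x[:, 0], x[:, 1], s=10, c=point_colors).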
On the numpy page they give the example of
s = np.random.dirichlet((10, 5, 3), 20)
which is all fine and great; but what if you want to generate random samples from a 2D array of alphas?
alphas = np.random.randint(10, size=(20, 3))
If you try np.random.dirichlet(alphas), np.random.dirichlet([x for x in alphas]), or np.random.dirichlet((x for x in alphas)), it results in a
ValueError: object too deep for desired array. The only thing that seems to work is:
y = np.empty(alphas.shape)
for i in range(len(alphas)):
    y[i] = np.random.dirichlet(alphas[i])
print(y)
...which is far from ideal for my code structure. Why is this the case, and can anyone think of a more "numpy-like" way of doing this?
Thanks in advance.
np.random.dirichlet is written to generate samples for a single Dirichlet distribution. That code is implemented in terms of the Gamma distribution, and that implementation can be used as the basis for a vectorized code to generate samples from different distributions. In the following, dirichlet_sample takes an array alphas with shape (n, k), where each row is an alpha vector for a Dirichlet distribution. It returns an array also with shape (n, k), each row being a sample of the corresponding distribution from alphas. When run as a script, it generates samples using dirichlet_sample and np.random.dirichlet to verify that they are generating the same samples (up to normal floating point differences).
import numpy as np
def dirichlet_sample(alphas):
    """
    Generate samples from an array of alpha distributions.
    """
    r = np.random.standard_gamma(alphas)
    return r / r.sum(-1, keepdims=True)
if __name__ == "__main__":
    alphas = 2 ** np.random.randint(0, 4, size=(6, 3))
    np.random.seed(1234)
    d1 = dirichlet_sample(alphas)
    print("dirichlet_sample:")
    print(d1)
    np.random.seed(1234)
    d2 = np.empty(alphas.shape)
    for k in range(len(alphas)):
        d2[k] = np.random.dirichlet(alphas[k])
    print("np.random.dirichlet:")
    print(d2)
    # Compare d1 and d2:
    err = np.abs(d1 - d2).max()
    print("max difference:", err)
Sample run:
dirichlet_sample:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
np.random.dirichlet:
[[ 0.38980834 0.4043844 0.20580726]
[ 0.14076375 0.26906604 0.59017021]
[ 0.64223074 0.26099934 0.09676991]
[ 0.21880145 0.33775249 0.44344606]
[ 0.39879859 0.40984454 0.19135688]
[ 0.73976425 0.21467288 0.04556287]]
max difference: 5.55111512313e-17
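As an additional sanity check, here is a sketch of the same gamma-normalization idea using NumPy's newer Generator API. The alphas, the seed and the number of draws are arbitrary choices for illustration. It draws many samples per alpha row and compares the sample means with the theoretical Dirichlet mean alpha_i / sum(alpha).

```python
import numpy as np

def dirichlet_rows(alphas, rng):
    """One Dirichlet sample per row of alphas via normalized gamma draws."""
    gammas = rng.standard_gamma(alphas)
    return gammas / gammas.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1234)
alphas = np.array([[10.0, 5.0, 3.0],
                   [1.0, 1.0, 2.0]])

# Repeat the alpha array 20000 times to get many draws per row at once.
n_draws = 20000
stacked = np.broadcast_to(alphas, (n_draws,) + alphas.shape)
draws = dirichlet_rows(stacked, rng)   # shape (n_draws, 2, 3)

sample_mean = draws.mean(axis=0)
expected_mean = alphas / alphas.sum(axis=-1, keepdims=True)
```

Every sample sums to 1 along its last axis, and sample_mean should approach expected_mean as the number of draws grows.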
I think you're looking for
y = np.array([np.random.dirichlet(x) for x in alphas])
for your list comprehension. Otherwise you're simply passing a python list or tuple. I imagine the reason numpy.random.dirichlet does not accept your list of alpha values is because it's not set up to - it already accepts an array, which it expects to have a dimension of k, as per the documentation.