I have two 2D arrays (or arrays of higher dimension): one that defines means (M) and one that defines standard deviations (S). Is there a Python library (numpy, scipy, ...) that allows me to generate an array (X) containing samples drawn from the corresponding distributions?
In other words: each entry x_ij is a sample drawn from the normal distribution defined by the corresponding mean m_ij and standard deviation s_ij.
Yes, numpy can help here. The np.random.normal function accepts array-like inputs for its loc (mean) and scale (standard deviation) parameters:
import numpy as np
means = np.arange(10) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stddevs = np.ones(10) # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
samples = np.random.normal(means, stddevs)
array([-1.69515214, -0.20680708, 0.61345775, 2.98154162, 2.77888087,
7.22203785, 5.29995343, 8.52766436, 9.70005434, 9.58381479])
This works even when the inputs are multidimensional:
means = np.arange(10).reshape(2,5) # make it multidimensional with shape 2, 5
stddevs = np.ones(10).reshape(2,5)
samples = np.random.normal(means, stddevs)
array([[-0.76585438, 1.22226145, 2.85554809, 2.64009423, 4.67255324],
[ 3.21658151, 4.59969355, 6.87946817, 9.14658687, 8.68465692]])
The resulting array again has shape (2, 5).
If you want only different means but the same standard deviation, you can pass one array and one scalar; broadcasting still gives you an array of the right shape:
means = np.arange(10)
samples = np.random.normal(means, 1)
array([ 0.54018686, -0.35737881, 2.08881115, 3.08742942, 4.4426366 ,
3.6694955 , 5.27515536, 8.68300816, 8.83893819, 7.71284217])
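Tying this back to the question: if M and S are your arrays of means and standard deviations, a single call gives one sample per entry. A minimal sketch (the example values for M and S here are made up):
import numpy as np

M = np.array([[0.0, 1.0], [2.0, 3.0]])  # means, one per entry
S = np.array([[1.0, 0.5], [2.0, 1.0]])  # standard deviations, one per entry

# X[i, j] is drawn from Normal(M[i, j], S[i, j])
X = np.random.normal(M, S)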
Given an x-dataset,
x = np.array([1, 2, 3, 4, 5])
what is the most efficient way to create a NumPy array where each x coordinate is paired with a y coordinate of 0? I am wondering if there is a way that doesn't require any hard-coding, so that x can vary in length without causing failure.
As per your problem statement, the following is one way to do it.
# initialize an array of zeros
In [36]: res = np.zeros((2, *x.shape), dtype=x.dtype)
# fill `x` as first row
In [37]: res[0] = x
In [38]: res
Out[38]:
array([[1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0]])
When we initialize the array of zeros, we use 2 for the axis-0 dimension since the requirement is a 2D array; for the column size we simply take the length from the x array itself. For reasonably large arrays, this approach would be the fastest (an alternative one-liner is sketched below).
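If you prefer a one-liner, a stacking-based sketch that builds the same result (np.zeros_like keeps x's dtype; the preallocated version above is usually at least as fast, since stacking copies both inputs):
import numpy as np

x = np.array([1, 2, 3, 4, 5])
res = np.vstack((x, np.zeros_like(x)))  # shape (2, 5)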
I'm looking to implement a hardware-efficient multiplication of a list of large matrices (on the order of 200,000 x 200,000). The matrices are very nearly the identity matrix, but with some elements changed to irrational numbers.
In an effort to reduce the memory footprint and make the computation faster, I want to store the 0s and 1s of the identity as single bytes, like so:
import numpy as np
size = 200000
large_matrix = np.identity(size, dtype=np.uint8)
and then change just a few elements to a different data type:
import sympy as sp
# sympy object
irr1 = sp.sqrt(2)
# float
irr2 = np.e
large_matrix[123456, 100456] = irr1
large_matrix[100456, 123456] = irr2
Is it possible to hold only these elements of the matrix in a different data type, while all the other elements remain bytes? I don't want to have to change everything to a float just because one element needs to be a float.
-----Edit-----
If it's not possible in numpy, then how can I find a solution without numpy?
Maybe you can have a look at SciPy's coordinate-based sparse matrix (coo_matrix). SciPy then stores only the nonzero entries (well suited to such large, mostly-empty matrices), and the coordinate format lets you access and modify the data as you intend.
From its documentation:
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> # Constructing a matrix using ijv format
>>> row = np.array([0, 3, 1, 0])
>>> col = np.array([0, 3, 1, 2])
>>> data = np.array([4, 5, 7, 9])
>>> m = coo_matrix((data, (row, col)), shape=(4, 4))
>>> m.toarray()
array([[4, 0, 9, 0],
[0, 7, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 5]])
It does not store a dense matrix but only a set of coordinates with their values, which takes much less space than filling a whole matrix with zeros. (Note that getsizeof does not see the arrays held inside the sparse object, so the numbers below are only indicative; the real savings show up at scale.)
>>> from sys import getsizeof
>>> getsizeof(m)
56
>>> getsizeof(m.toarray())
176
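Applied to the setup in the question, a minimal sketch (the irrational values are stored as 8-byte floats here; exact sympy objects would require dtype=object, which scipy's sparse machinery does not reliably support):
import numpy as np
from scipy.sparse import coo_matrix

size = 200000
# diagonal of ones, plus the two off-diagonal entries
row = np.concatenate([np.arange(size), [123456, 100456]])
col = np.concatenate([np.arange(size), [100456, 123456]])
data = np.concatenate([np.ones(size), [np.sqrt(2), np.e]])
# ~200,002 stored values instead of 40 GB of dense uint8 entries
large_matrix = coo_matrix((data, (row, col)), shape=(size, size))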
By definition, NumPy arrays only have one dtype. You can see in the NumPy documentation:
A numpy array is homogeneous, and contains elements described by a dtype object. A dtype object can be constructed from different combinations of fundamental numeric types.
Further reading: https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html
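To see the homogeneity in action, a small sketch: assigning a float into a uint8 array silently casts it to the array's dtype, so the fractional part is lost:
import numpy as np

a = np.identity(4, dtype=np.uint8)
a[0, 1] = np.sqrt(2)  # cast to uint8 on assignment
print(a[0, 1])        # prints 1, not 1.4142...
(A dtype=object array could hold sympy objects, but it stores full Python object pointers and loses the memory savings the question is after.)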
I have three parameter arrays, each containing n parameter values. Now I need to draw m independent samples using the same parameter settings, and I was wondering if there is an efficient way of doing this.
Example:
p1 = [1, 2, 3, 4], p2 = [4, 4, 4, 4], p3 = [6, 7, 7, 5]
One sample would be generated as:
np.random.triangular(left=p1, mode=p2, right=p3)
resulting in
[3, 6, 3, 4.5]
But I would like to get m of those, in a single ndarray ideally.
A solution could of course be to initialize a sample ndarray of shape (n, m) and fill each column using a loop. However, generating all random values simultaneously is generally quicker, hence I would like to figure out whether that's possible.
NOTE: passing size=(n, m) does not work for array-valued parameters.
It's true that strictly speaking, adding the parameter size=(n, m) doesn't work. But size=(m, n) does!
In general, numpy puts the new sample dimensions first: the trailing dimensions of size must be compatible with the shape of the parameter arrays.
>>> import numpy
>>> numpy.random.triangular(left=p1, mode=p2, right=p3, size=(10, 4))
array([[2.90526206, 3.90549642, 4.17820463, 4.49103927],
[4.128539 , 5.64750789, 4.2343925 , 4.14951323],
[4.55117141, 4.18380231, 4.94283228, 4.17310084],
[3.7047425 , 6.19969199, 3.9318881 , 4.73317286],
[5.0613046 , 4.88435654, 4.04345036, 4.41236136],
[3.6946254 , 2.28868213, 4.29268451, 4.61406735],
[4.26315216, 3.84219428, 4.79651309, 4.02510467],
[3.1213574 , 3.87407067, 4.20976142, 4.11963155],
[2.89005644, 4.43081604, 5.96604977, 4.0194683 ],
[5.28800737, 3.80200832, 4.45966515, 4.46419704]])
This can be generalized for arrays that broadcast in more complex ways. Here's an example that creates four distinct samples of a 2x2x2 array based on broadcasted parameters. Note that again, the first value is the number of samples, and the remaining ones describe the shape of each sample:
>>> a = numpy.arange(2)
>>> numpy.random.triangular(a[:, None, None],
...                         a[None, :, None] + 2,
...                         a[None, None, :] + 4,
...                         size=(4, 2, 2, 2))
array([[[[1.96335621, 1.88351682],
[2.27347214, 3.23075503]],
[[2.53612351, 2.33322979],
[2.73651868, 2.7414705 ]]],
[[[3.80046148, 3.83468891],
[3.43258814, 3.33174839]],
[[3.05200913, 4.47039698],
[2.89013357, 1.99638614]]],
[[[1.91325759, 2.64773446],
[1.73132514, 3.47843725]],
[[1.88526414, 2.86937885],
[3.12001437, 1.58742945]]],
[[[0.58692663, 1.08249125],
[3.4744866 , 1.95300333]],
[[1.72887756, 2.68527515],
[1.95189437, 4.49416249]]]])
Suppose I have the following array:
import numpy as np
a = np.array([[[1,2,3],[1,2,3]],[[4,5,6],[7,8,7]]])
print(a.shape)
# (2, 2, 3)
So, on every 2D grid (3 grids in the example above) I want to compute the mean:
mean = [np.mean(a[:, :, i]) for i in range(3)]
print(mean)
# [3.25, 4.25, 4.75]
Is there a method in numpy that achieves this directly? I tried using mean with a single axis, but the result is not what I expected.
You can accomplish this using np.mean(axis=...) and specifying a tuple of axes to average over:
a.mean(axis=tuple(range(len(a.shape) - 1)))
This computes the mean over every dimension/axis except the last one (note how the range of axis indices goes from 0 up to len - 1, exclusive, thus ignoring the last axis).
This method extends to deeper arrays. For example, if you have an array of shape (2, 6, 5, 4, 3), it computes the mean as a.mean(axis=(0, 1, 2, 3)), giving you an array of 3 means (corresponding to the 3 elements in the last dimension).
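As a quick check, the general expression reproduces the list-comprehension result from the question:
import numpy as np

a = np.array([[[1, 2, 3], [1, 2, 3]], [[4, 5, 6], [7, 8, 7]]])
print(a.mean(axis=tuple(range(a.ndim - 1))))  # [3.25 4.25 4.75]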
I'm trying to efficiently replicate numpy's ndarray.choose() method in TensorFlow.
Here's a numpy example of what I'm looking for:
b = np.arange(15).reshape(3, 5)
c = np.array([1,0,4])
c.choose(b.T) # trying to replicate in tensorflow
-> array([ 1, 5, 14])
The best I've been able to do is generate a batch_size x batch_size matrix (which is huge if the batch size is large) and take its diagonal:
tf_b = tf.constant(b)
tf_c = tf.constant(c)
sess.run(tf.diag_part(tf.gather(tf.transpose(tf_b), tf_c)))
-> array([ 1, 5, 14])
Is there a way to do this that is just linear in the first dimension (instead of squared)?
Yeah, there's an easier way to do this. Flatten your b array to 1D, so it's [0, 1, 2, ..., 13, 14]. Take an array of indices in the range of the number of 'choices' you are making (3 in your case): [0, 1, 2]. Multiply this range by the second dimension of your original shape, i.e. the number of options per choice (5 in your case), giving [0, 5, 10]. Then add your indices to this to obtain [1, 5, 14]. Now you're ready to call tf.gather(), as sketched below.
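Applied to the arrays from the question, a minimal sketch (written so it also runs eagerly; under TF 1.x you would evaluate result with sess.run):
import numpy as np
import tensorflow as tf

b = np.arange(15).reshape(3, 5)
c = np.array([1, 0, 4])

tf_b = tf.constant(b)
tf_c = tf.constant(c)

# start-of-row offsets [0, 5, 10] plus the chosen index within each row
index = tf.range(3, dtype=tf.int64) * 5 + tf_c
flat = tf.reshape(tf_b, [-1])    # [0, 1, 2, ..., 14]
result = tf.gather(flat, index)  # -> [1, 5, 14]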
Here is some code that I've taken from elsewhere that does a similar thing for RNN outputs. Yours will be slightly different, but the idea is the same.
index = tf.range(0, batch_size) * max_length + (length - 1)
flat = tf.reshape(output, [-1, out_size])
relevant = tf.gather(flat, index)
return relevant
At a high level, the operation is pretty straightforward: use the range operation to get the index of the beginning of each row, then add the index of the position you want within each row. Doing it in 1D is easiest, which is why we flatten.