Is there an efficient way to generate multinomial random variables in parallel? - python

numpy.random has the following function to generate multinomial random samples.
multinomial(n, p, size)
But I wonder if there is an efficient way to generate multinomial samples for different parameters n and p. For example,
n = np.array([[10],
              [20]])
p = np.array([[0.1, 0.2, 0.7],
              [0.4, 0.4, 0.2]])
and even for higher-dimensional n and p like these:
n = np.array([[[10],
               [20]],
              [[10],
               [20]]])
p = np.array([[[0.1, 0.2, 0.7],
               [0.1, 0.2, 0.7]],
              [[0.3, 0.2, 0.5],
               [0.4, 0.1, 0.5]]])
I know that for univariate random variables we can do this kind of thing, but I don't know how to do it for the multinomial in Python.
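If you are on NumPy 1.22 or newer, the Generator API broadcasts n against the leading dimensions of pvals, so a single vectorized call can draw one sample per parameter pair. A minimal sketch (assuming n is given as one count per row of p):
import numpy as np

rng = np.random.default_rng(0)

n = np.array([10, 20])               # one count per row of p
p = np.array([[0.1, 0.2, 0.7],
              [0.4, 0.4, 0.2]])

# NumPy >= 1.22: Generator.multinomial broadcasts n against the
# leading dimensions of pvals, drawing one sample per row here.
samples = rng.multinomial(n, p)
print(samples)                       # shape (2, 3); row i sums to n[i]
The same call should extend to the higher-dimensional case (e.g. n of shape (2, 2) against p of shape (2, 2, 3)); on older NumPy versions, a plain Python loop over np.random.multinomial is the fallback.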

Related

How to build OneHot Decoder in python

I have encoded my images (masks) with dimensions (img_width x img_height x 1) with one-hot encoding in this way:
import numpy as np

def OneHotEncoding(im, n_classes):
    one_hot = np.zeros((im.shape[0], im.shape[1], n_classes), dtype=np.uint8)
    for i, unique_value in enumerate(np.unique(im)):
        one_hot[:, :, i][im == unique_value] = 1
    return one_hot
After some deep learning manipulation, the softmax activation function outputs probabilities instead of 0 and 1 values, so in my decoder I wanted to implement the following approach:
1. Threshold the output to obtain 0 or 1 only.
2. Multiply each channel by a weight equal to the channel index.
3. Take the max between labels along the channel axis.
import numpy as np

arr = np.array([
    [[0.1,0.2,0,5],[0.2,0.4,0.7],[0.3,0.5,0.8]],
    [[0.3,0.6,0  ],[0.4,0.9,0.1],[0  ,0  ,0.2]],
    [[0.7,0.1,0.1],[0,6,0.1,0.1],[0.6,0.6,0.3]],
    [[0.6,0.2,0.3],[0.4,0.5,0.3],[0.1,0.2,0.7]]
])
# print(arr.dtype,arr.shape)
def oneHotDecoder(img):
    # Thresholding
    img[img<0.5] = 0
    img[img>=0.5] = 1
    # weights of the labels
    img = [i*img[:,:,i] for i in range(img.shape[2])]
    # take the max label
    img = np.amax(img, axis=2)
    print(img.shape)
    return img
arr2 = oneHotDecoder(arr)
print(arr2)
My questions are:
How do I get rid of this error:
line 15, in oneHotDecoder
img[img<0.5]=0 TypeError: '<' not supported between instances of 'list' and 'float'
Are there any other issues in my implementation that you would suggest improving?
Thanks in advance.
You have typos with commas and dots in some of your items (e.g. your first list should be [0.1, 0.2, 0.5] instead of [0.1, 0.2, 0, 5]). Because of those ragged sublists, np.array builds an object array of Python lists rather than a 3D float array, which is exactly why comparing it to a float raises the TypeError.
The fixed list is:
l = [
    [[0.1,0.2,0.5],[0.2,0.4,0.7],[0.3,0.5,0.8]],
    [[0.3,0.6,0  ],[0.4,0.9,0.1],[0  ,0  ,0.2]],
    [[0.7,0.1,0.1],[0.6,0.1,0.1],[0.6,0.6,0.3]],
    [[0.6,0.2,0.3],[0.4,0.5,0.3],[0.1,0.2,0.7]]
]
Then you could do:
np.array(l) # note that np.dstack(l) would also build a numeric array, but with shape (3, 3, 4) rather than (4, 3, 3)
Which would yield:
array([[[0.1, 0.2, 0.5],
        [0.2, 0.4, 0.7],
        [0.3, 0.5, 0.8]],

       [[0.3, 0.6, 0. ],
        [0.4, 0.9, 0.1],
        [0. , 0. , 0.2]],

       [[0.7, 0.1, 0.1],
        [0.6, 0.1, 0.1],
        [0.6, 0.6, 0.3]],

       [[0.6, 0.2, 0.3],
        [0.4, 0.5, 0.3],
        [0.1, 0.2, 0.7]]])
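As for other issues: after the list comprehension, img is a Python list of 2D arrays, so np.amax(img, axis=2) first stacks them along a new leading axis and then reduces the wrong one; the max over labels would have to be taken along axis=0 there. A fully vectorized sketch of the same threshold/weight/max idea (assuming the channel axis is last; one_hot_decoder is a hypothetical name):
import numpy as np

def one_hot_decoder(img, threshold=0.5):
    # Binarize the per-channel probabilities.
    binary = (img >= threshold).astype(np.uint8)
    # Weight each channel by its index, then take the max across
    # channels so each pixel gets the highest label that fired.
    weights = np.arange(img.shape[-1])
    return np.amax(binary * weights, axis=-1)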

How to L2 Normalize a list of lists in Python using Sklearn

s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194],
      [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831],
      [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925],
      [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
from sklearn.preprocessing import normalize
X = normalize(s2)
this is throwing error:
ValueError: setting an array element with a sequence.
Since I don't have enough reputation to comment, I'm posting this as an answer.
Let's quickly look at your data.
I converted the given data into a NumPy array. Since the inner lists don't all have the same length, it looks like this:
>>> n2 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n2
array([list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]),
list([0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831]),
list([0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925]),
list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194])],
dtype=object)
As you can see, the converted values are not a proper 2D sequence of values; NumPy built an object array of lists. To fix this you need to keep the same length for every internal list (it looks like 0.16666666666666666 was copied one time too many in your array; if not, fix the length some other way), after which it will look like:
>>> n3 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.319381788645692], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n3
array([[0.2 , 0.2 , 0.2 , 0.30216512, 0.24462871],
[0.2 , 0.48925742, 0.2 , 0.2 , 0.38325815],
[0.31938179, 0.16666667, 0.16666667, 0.16666667, 0.31938179],
[0.2 , 0.2 , 0.2 , 0.30216512, 0.24462871]])
As you can see, n3 has now become a proper 2D sequence of values.
And if you use the normalize function, it simply works:
>>> X = normalize(n3)
>>> X
array([[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139],
[0.28108867, 0.6876236 , 0.28108867, 0.28108867, 0.53864762],
[0.59581303, 0.31091996, 0.31091996, 0.31091996, 0.59581303],
[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139]])
For how to use NumPy arrays so as to avoid this issue, please have a look at this SO link: ValueError: setting an array element with a sequence.
Important: I removed one element from the 3rd list so that all the lists have the same length.
I did that because I really believe it's a copy-paste error. If not, comment below and I will modify my answer.
import numpy as np
from sklearn.preprocessing import normalize

s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194],
      [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831],
      [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925],
      [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
X = normalize(np.array(s2))
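For reference, normalize with its default L2 norm just divides each row by its Euclidean length, which is easy to replicate in plain NumPy. A sketch, assuming row-wise normalization of the fixed s2 from the snippet above:
import numpy as np

# s2 is the fixed list of lists defined above.
arr = np.array(s2)
# Divide each row by its L2 norm; keepdims makes the shapes broadcast.
X = arr / np.linalg.norm(arr, axis=1, keepdims=True)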

Selecting 3 random elements from an array in python

I am trying to select three random elements from within an array.
I currently have implemented:
result = np.random.uniform(np.min(dataset[:,1]), np.max(dataset[:,1]), size=3)
which returns three random floats in the min-max range. I am struggling to find a way to select random elements within the array, instead of random floats that may not exist as elements inside the array.
I have also tried:
result = random.choice(dataset[:,0])
which only returns a single element. Is it possible to return 3 with this function?
You can use random.sample() if you want to sample without replacement, i.e. the same element can't be picked twice.
>>> import random
>>> l = [0.3, 0.2, 0.1, 0.4, 0.5, 0.6]
>>> random.sample(l, 3)
[0.3, 0.5, 0.1]
If you want to sample with replacement, you can use random.choices():
>>> import random
>>> l = [0.3, 0.2, 0.1, 0.4, 0.5, 0.6]
>>> random.choices(l, k=3)
[0.3, 0.5, 0.3]
You can use random.choices instead:
result = random.choices(dataset[:,0], k=3)
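Since dataset is already a NumPy array, a NumPy-native alternative is np.random.choice, which covers both cases via its replace flag. A sketch, with a hypothetical stand-in for the question's dataset:
import numpy as np

# Hypothetical stand-in for the question's dataset.
dataset = np.array([[0.3, 1.0], [0.2, 2.0], [0.1, 3.0],
                    [0.4, 4.0], [0.5, 5.0], [0.6, 6.0]])

with_replacement = np.random.choice(dataset[:, 0], size=3)
without_replacement = np.random.choice(dataset[:, 0], size=3, replace=False)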

Sample from a 2d probability numpy array?

Say that I have a 2d array ar like this:
0.9, 0.1, 0.3
0.4, 0.5, 0.1
0.5, 0.8, 0.5
And I want to sample from [1, 0] according to this probability array.
rdchoice = lambda x: numpy.random.choice([1, 0], p=[x, 1-x])
I have tried two methods:
1) reshape it into a 1d array first and use numpy.random.choice and then reshape it back to 2d:
np.array(list(map(rdchoice, ar.reshape((-1,))))).reshape(ar.shape)
2) use the vectorize function.
func = numpy.vectorize(rdchoice)
func(ar)
But these two ways are both too slow, and I learned that np.vectorize is essentially a for-loop under the hood; in my experiments, map was no faster than vectorize.
I think this can be done faster. If the 2d array is large, these approaches are unbearably slow.
You should be able to do this like so:
>>> p = np.array([[0.9, 0.1, 0.3], [0.4, 0.5, 0.1], [0.5, 0.8, 0.5]])
>>> (np.random.rand(*p.shape) < p).astype(int)
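This works because each entry of np.random.rand(*p.shape) is an independent Uniform(0, 1) draw and P(u < p) = p, so every cell is its own Bernoulli trial with its own probability. A quick sanity check (a sketch, averaging many draws):
>>> import numpy as np
>>> p = np.array([[0.9, 0.1, 0.3], [0.4, 0.5, 0.1], [0.5, 0.8, 0.5]])
>>> # The empirical mean over many draws should approach p.
>>> (np.random.rand(100_000, *p.shape) < p).astype(int).mean(axis=0)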
Actually, I can use np.random.binomial:
import numpy as np
p = [[0.9, 0.1, 0.3],
     [0.4, 0.5, 0.1],
     [0.5, 0.8, 0.5]]
np.random.binomial(1, p)

Linear algebra in Numpy

I am working on matrix multiplications in NumPy using np.dot(). As the data set is very large, I would like to reduce the overall run time as much as possible, i.e. perform as few np.dot() products as possible.
Specifically, I need to calculate the overall matrix product as well as the associated flow from each element of my values vector.
Is there a way in NumPy to calculate all of this together in one or two np.dot() products?
In the code below, is there a way to reduce the number of np.dot() products and still get the same output?
import pandas as pd
import numpy as np
vector = pd.DataFrame([1, 2, 3],
                      ['A', 'B', 'C'], ["Values"])
matrix = pd.DataFrame([[0.5, 0.4, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]],
                      index=['A', 'B', 'C'], columns=['A', 'B', 'C'])
# Can the number of matrix multiplications in this part be reduced?
overall = np.dot(vector.T, matrix)
from_A = np.dot(vector.T * [1,0,0], matrix)
from_B = np.dot(vector.T * [0,1,0], matrix)
from_C = np.dot(vector.T * [0,0,1], matrix)
print("Overall:", overall)
print("From A:", from_A)
print("From B:", from_B)
print("From C:", from_C)
If the vectors you use to select a row are indeed the unit vectors, you are much better off not doing matrix multiplication at all for from_A, from_B, from_C. Matrix multiplication requires far more additions and multiplications than you need just to multiply each row of the matrix by its corresponding entry in the vector:
from_ABC = matrix.values * vector.values
You will only need a single call to np.dot to get overall.
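In fact, since overall is just the sum of the per-element flows, even that last np.dot call can be dropped. A sketch of the idea on the question's data:
import numpy as np
import pandas as pd

vector = pd.DataFrame([1, 2, 3], ['A', 'B', 'C'], ["Values"])
matrix = pd.DataFrame([[0.5, 0.4, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]],
                      index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

# Row i of from_ABC is the flow from element i of the vector.
from_ABC = matrix.values * vector.values      # shape (3, 3)
# The overall product is the column-wise sum of those flows,
# equal to vector.T @ matrix.
overall = from_ABC.sum(axis=0)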
You could define a 3 x 3 shaped 2D array of those scaling values and perform matrix-multiplication, like so -
scale = np.array([[1,0,0],[0,1,0],[0,0,1]])
from_ABC = np.dot(vector.values.ravel()*scale,matrix)
Sample run -
In [901]: from_A
Out[901]: array([[ 0.5, 0.4, 0.1]])
In [902]: from_B
Out[902]: array([[ 0.9, 1.6, 0.5]])
In [903]: from_C
Out[903]: array([[ 0.8, 1.3, 1.9]])
In [904]: from_ABC
Out[904]:
array([[ 0.5, 0.4, 0.1],
[ 0.9, 1.6, 0.5],
[ 0.8, 1.3, 1.9]])
Here's an alternative with np.einsum to do all those in one step -
np.einsum('ij,ji,ik->jk',vector.values,scale,matrix)
Sample run -
In [915]: np.einsum('ij,ji,ik->jk',vector.values,scale,matrix)
Out[915]:
array([[ 0.5, 0.4, 0.1],
[ 0.9, 1.6, 0.5],
[ 0.8, 1.3, 1.9]])
