Sample from a 2d probability numpy array? - python

Say that I have a 2d array ar like this:
0.9, 0.1, 0.3
0.4, 0.5, 0.1
0.5, 0.8, 0.5
And I want to sample from [1, 0] according to this probability array.
rdchoice = lambda x: numpy.random.choice([1, 0], p=[x, 1-x])
I have tried two methods:
1) reshape it into a 1d array first and use numpy.random.choice and then reshape it back to 2d:
np.array(list(map(rdchoice, ar.reshape((-1,))))).reshape(ar.shape)
2) use the vectorize function.
func = numpy.vectorize(rdchoice)
func(ar)
But both of these ways are too slow, and I learned that vectorize is essentially a for-loop under the hood; in my experiments, map was no faster than vectorize.
I think this can be done faster. If the 2d array is large, these approaches are unbearably slow.

You should be able to do this like so:
>>> p = np.array([[0.9, 0.1, 0.3], [0.4, 0.5, 0.1], [0.5, 0.8, 0.5]])
>>> (np.random.rand(*p.shape) < p).astype(int)
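As a quick sanity check (a sketch I'm adding, assuming the goal is an independent Bernoulli draw per cell), the mean over many such draws should approach p:
import numpy as np

p = np.array([[0.9, 0.1, 0.3],
              [0.4, 0.5, 0.1],
              [0.5, 0.8, 0.5]])

# 10000 independent samples of the whole array; each entry is 1 with probability p[i, j]
samples = (np.random.rand(10000, *p.shape) < p).astype(int)
print(samples.mean(axis=0))  # close to p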

Actually, I can use np.random.binomial:
import numpy as np
p = [[0.9, 0.1, 0.3],
[0.4, 0.5, 0.1],
[0.5, 0.8, 0.5]]
np.random.binomial(1, p)
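A short note (my addition): with n=1, np.random.binomial broadcasts over the array of probabilities and returns a 0/1 array of the same shape, so it is equivalent to the comparison trick above:
import numpy as np

p = np.array([[0.9, 0.1, 0.3],
              [0.4, 0.5, 0.1],
              [0.5, 0.8, 0.5]])
out = np.random.binomial(1, p)  # each entry is 1 with probability p[i, j]
print(out.shape)                # (3, 3)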

Related

Is there an efficient way to generate multinomial random variables in parallel?

numpy.random has the following function to generate multinomial random samples.
multinomial(n, p, size)
But I wonder if there is an efficient way to generate multinomial samples for different parameters n and p. For example,
n = np.array([[10],
              [20]])
p = np.array([[0.1, 0.2, 0.7],
              [0.4, 0.4, 0.2]])
and even for higher-dimensional n and p like these:
n = np.array([[[10],
               [20]],
              [[10],
               [20]]])
p = np.array([[[0.1, 0.2, 0.7],
               [0.1, 0.2, 0.7]],
              [[0.3, 0.2, 0.5],
               [0.4, 0.1, 0.5]]])
I know we can do this kind of thing for univariate random variables, but I don't know how to do it for the multinomial in Python.
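For reference, a naive baseline (my sketch, not from the thread): pair the flattened n values with the rows of p and call np.random.multinomial once per pair. It is still a Python loop, but it shows the shape of the desired output for the 2D case above:
import numpy as np

n = np.array([[10],
              [20]])
p = np.array([[0.1, 0.2, 0.7],
              [0.4, 0.4, 0.2]])

# one multinomial draw per (n_i, p_i) pair; result has shape (2, 3)
samples = np.stack([np.random.multinomial(int(ni), pi)
                    for ni, pi in zip(n.ravel(), p)])
print(samples)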

How to build OneHot Decoder in python

I have encoded my images(masks) with dimensions (img_width x img_height x 1) with OneHotEncoder in this way:
import numpy as np

def OneHotEncoding(im, n_classes):
    one_hot = np.zeros((im.shape[0], im.shape[1], n_classes), dtype=np.uint8)
    for i, unique_value in enumerate(np.unique(im)):
        one_hot[:, :, i][im == unique_value] = 1
    return one_hot
After doing some data manipulation with deep learning, the softmax activation function produces probabilities instead of 0 and 1 values, so in my decoder I wanted to implement the following approach:
Threshold the output to obtain 0 or 1 only.
Multiply each channel by a weight equal to the channel index.
Take the max between labels along the channels axis.
import numpy as np

arr = np.array([
    [[0.1,0.2,0,5],[0.2,0.4,0.7],[0.3,0.5,0.8]],
    [[0.3,0.6,0 ],[0.4,0.9,0.1],[0 ,0 ,0.2]],
    [[0.7,0.1,0.1],[0,6,0.1,0.1],[0.6,0.6,0.3]],
    [[0.6,0.2,0.3],[0.4,0.5,0.3],[0.1,0.2,0.7]]
])
# print(arr.dtype, arr.shape)

def oneHotDecoder(img):
    # Thresholding
    img[img<0.5]=0
    img[img>=0.5]=1
    # weights of the labels
    img = [i*img[:,:,i] for i in range(img.shape[2])]
    # take the max label
    img = np.amax(img, axis=2)
    print(img.shape)
    return img

arr2 = oneHotDecoder(arr)
print(arr2)
My questions are:
How do I get rid of this error:
line 15, in oneHotDecoder
    img[img<0.5]=0
TypeError: '<' not supported between instances of 'list' and 'float'
Are there any other issues in my implementation that you would suggest improving?
Thanks in advance.
You have typos (commas in place of dots) in some of your items (e.g. your first inner list should be [0.1, 0.2, 0.5] instead of [0.1, 0.2, 0, 5]).
The fixed list is:
l = [
[[0.1,0.2,0.5],[0.2,0.4,0.7],[0.3,0.5,0.8]],
[[0.3,0.6,0 ],[0.4,0.9,0.1],[0 ,0 ,0.2]],
[[0.7,0.1,0.1],[0.6,0.1,0.1],[0.6,0.6,0.3]],
[[0.6,0.2,0.3],[0.4,0.5,0.3],[0.1,0.2,0.7]]
]
Then you could do:
np.array(l) # np.dstack(l) would work as well
Which would yield:
array([[[0.1, 0.2, 0.5],
[0.2, 0.4, 0.7],
[0.3, 0.5, 0.8]],
[[0.3, 0.6, 0. ],
[0.4, 0.9, 0.1],
[0. , 0. , 0.2]],
[[0.7, 0.1, 0.1],
[0.6, 0.1, 0.1],
[0.6, 0.6, 0.3]],
[[0.6, 0.2, 0.3],
[0.4, 0.5, 0.3],
[0.1, 0.2, 0.7]]])
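Beyond the typo fix, here is a sketch of the decoder (my addition; the axis change is an assumption on my part, not part of the answer above): the list comprehension stacks the weighted channels along a new leading axis, so the per-pixel maximum has to be taken over axis=0 rather than axis=2:
import numpy as np

def one_hot_decoder(img):
    img = np.where(img >= 0.5, 1, 0)                            # threshold to 0/1
    channels = [i * img[:, :, i] for i in range(img.shape[2])]  # weight each channel by its index
    return np.amax(channels, axis=0)                            # per-pixel label map

arr = np.array(l)                  # l is the fixed list from above
print(one_hot_decoder(arr).shape)  # (4, 3)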

Python - matrix multiplication code problem

I have this exercise where I get to build a simple neural network with one input layer and one hidden layer... I wrote the code below to perform a simple matrix multiplication, but it's not producing the same result as when I do the multiplication by hand. What am I doing wrong in my code?
          #toes %win #fans
ih_wgt = ([0.1, 0.2, -0.1],  #hid[0]
          [-0.1, 0.1, 0.9],  #hid[1]
          [0.1, 0.4, 0.1])   #hid[2]

          #hid[0] hid[1] hid[2]
ho_wgt = ([0.3, 1.1, -0.3],  #hurt?
          [0.1, 0.2, 0.0],   #win?
          [0.0, 1.3, 0.1])   #sad?

weights = [ih_wgt, ho_wgt]

def w_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

def vect_mat_mul(vec, mat):
    assert(len(vec) == len(mat))
    output = [0, 0, 0]
    for i in range(len(vec)):
        output[i] = w_sum(vec, mat[i])
        return output

def neural_network(input, weights):
    hid = vect_mat_mul(input, weights[0])
    pred = vect_mat_mul(hid, weights[1])
    return pred

toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

input = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(input, weights)
print(pred)
the output of my code is:
[0.258, 0, 0]
The way I attempted to solve it by hand is as follows:
I multiplied the input vector [8.5, 0.65, 1.2] with the input weight matrix
ih_wgt = ([0.1, 0.2, -0.1],  #hid[0]
          [-0.1, 0.1, 0.9],  #hid[1]
          [0.1, 0.4, 0.1])   #hid[2]
which gives:
[0.86, 0.295, 1.23]
the output vector is then fed into the network as an input vector which is then multiplied by the hidden weight matrix
ho_wgt = ([0.3, 1.1, -0.3],  #hurt?
          [0.1, 0.2, 0.0],   #win?
          [0.0, 1.3, 0.1])   #sad?
which gives the correct output prediction:
[0.2135, 0.145, 0.5065]
Your help would be much appreciated!
You're almost there! The only problem is a simple indentation issue:
def vect_mat_mul(vec, mat):
    assert(len(vec) == len(mat))
    output = [0, 0, 0]
    for i in range(len(vec)):
        output[i] = w_sum(vec, mat[i])
    return output  # <-- This one was inside the for loop
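As a quick cross-check (a NumPy sketch I'm adding, not part of the original answer), the same two-layer computation with np.dot reproduces the hand-computed result:
import numpy as np

ih_wgt = np.array([[0.1, 0.2, -0.1],
                   [-0.1, 0.1, 0.9],
                   [0.1, 0.4, 0.1]])
ho_wgt = np.array([[0.3, 1.1, -0.3],
                   [0.1, 0.2, 0.0],
                   [0.0, 1.3, 0.1]])

x = np.array([8.5, 0.65, 1.2])
hid = ih_wgt.dot(x)     # [0.86, 0.295, 1.23]
pred = ho_wgt.dot(hid)  # [0.2135, 0.145, 0.5065]
print(pred)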

Selecting 3 random elements from an array in python

I am trying to select three random elements from within an array.
I currently have implemented:
result = np.random.uniform(np.min(dataset[:,1]), np.max(dataset[:,1]), size=3)
This returns three random floats between the min and max range. I am struggling to find a way to select random elements from within the array, rather than random floats which may not exist as elements of the array.
I have also tried:
result = random.choice(dataset[:,0])
This only returns a single element; is it possible to return 3 with this function?
You can use random.sample() if you want to sample without replacement, i.e. the same element can't be picked twice.
>>> import random
>>> l = [0.3, 0.2, 0.1, 0.4, 0.5, 0.6]
>>> random.sample(l, 3)
[0.3, 0.5, 0.1]
If you want to sample with replacement, you can use random.choices():
>>> import random
>>> l = [0.3, 0.2, 0.1, 0.4, 0.5, 0.6]
>>> random.choices(l, k=3)
[0.3, 0.5, 0.3]
You can use random.choices instead:
result = random.choices(dataset[:,0], k=3)
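If dataset is a NumPy array, np.random.choice can also draw three elements directly (a small sketch with a hypothetical dataset standing in for the asker's; pass replace=False if repeats are not allowed):
import numpy as np

dataset = np.array([[0.3, 1.0], [0.2, 2.0], [0.1, 3.0], [0.4, 4.0], [0.5, 5.0]])  # hypothetical data

print(np.random.choice(dataset[:, 0], size=3))                 # with replacement
print(np.random.choice(dataset[:, 0], size=3, replace=False))  # without replacement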

Linear algebra in Numpy

I am working on matrix multiplications in NumPy using np.dot(). As the data set is very large, I would like to reduce the overall run time as far as possible, i.e. perform as few np.dot() products as possible.
Specifically, I need to calculate the overall matrix product as well as the associated flow from each element of my values vector.
Is there a way in NumPy to calculate all of this together in one or two np.dot() products?
In the code below, is there a way to reduce the number of np.dot() products and still get the same output?
import pandas as pd
import numpy as np
vector = pd.DataFrame([1, 2, 3],
                      ['A', 'B', 'C'], ["Values"])
matrix = pd.DataFrame([[0.5, 0.4, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]],
                      index=['A', 'B', 'C'], columns=['A', 'B', 'C'])
# Can the number of matrix multiplications in this part be reduced?
overall = np.dot(vector.T, matrix)
from_A = np.dot(vector.T * [1,0,0], matrix)
from_B = np.dot(vector.T * [0,1,0], matrix)
from_C = np.dot(vector.T * [0,0,1], matrix)
print("Overall:", overall)
print("From A:", from_A)
print("From B:", from_B)
print("From C:", from_C)
If the vectors you use to select the rows are indeed the unit vectors, you are much better off not doing matrix multiplication at all for from_A, from_B, from_C. Matrix multiplication requires many more additions and multiplications than you need just to multiply each row of the matrix by its corresponding entry in the vector:
from_ABC = matrix.values * vector.values
You will only need a single call to np.dot to get overall.
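A small follow-up sketch (my addition, assuming the unit-vector case described above): since overall is just the sum of the per-element contributions, it can be recovered from from_ABC without any np.dot at all:
import numpy as np
import pandas as pd

vector = pd.DataFrame([1, 2, 3], ['A', 'B', 'C'], ["Values"])
matrix = pd.DataFrame([[0.5, 0.4, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]],
                      index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

from_ABC = matrix.values * vector.values  # row i is the flow from element i
overall = from_ABC.sum(axis=0)            # no np.dot needed
print(overall)                                          # [1.2 2.5 2.3]
print(np.allclose(overall, np.dot(vector.T, matrix)))   # True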
You could define a 3 x 3 shaped 2D array of those scaling values and perform matrix-multiplication, like so -
scale = np.array([[1,0,0],[0,1,0],[0,0,1]])
from_ABC = np.dot(vector.values.ravel()*scale,matrix)
Sample run -
In [901]: from_A
Out[901]: array([[ 0.5, 0.4, 0.1]])
In [902]: from_B
Out[902]: array([[ 0.9, 1.6, 0.5]])
In [903]: from_C
Out[903]: array([[ 0.8, 1.3, 1.9]])
In [904]: from_ABC
Out[904]:
array([[ 0.5, 0.4, 0.1],
[ 0.9, 1.6, 0.5],
[ 0.8, 1.3, 1.9]])
Here's an alternative with np.einsum to do all those in one step -
np.einsum('ij,ji,ik->jk',vector.values,scale,matrix)
Sample run -
In [915]: np.einsum('ij,ji,ik->jk',vector.values,scale,matrix)
Out[915]:
array([[ 0.5, 0.4, 0.1],
[ 0.9, 1.6, 0.5],
[ 0.8, 1.3, 1.9]])
