How can I randomly set elements to zero in TF?

The pure numpy solution is:
import numpy as np
data = np.random.rand(5,5) #data is of shape (5,5) with floats
masking_prob = 0.5 #probability of an element to get masked
indices = np.random.choice(np.prod(data.shape), replace=False, size=int(np.prod(data.shape)*masking_prob))
data[np.unravel_index(indices, data.shape)] = 0. #set to zero
How can I achieve this in TensorFlow?

Use tf.nn.dropout:
import tensorflow as tf
import numpy as np
data = np.random.rand(5,5)
array([[0.38658212, 0.6896139 , 0.92139911, 0.45646086, 0.23185075],
[0.03461688, 0.22073962, 0.21254995, 0.20046708, 0.43419155],
[0.49012903, 0.45495968, 0.83753471, 0.58815975, 0.90212244],
[0.04071416, 0.44375078, 0.55758641, 0.31893155, 0.67403431],
[0.52348073, 0.69354454, 0.2808658 , 0.6628248 , 0.82305081]])
prob = 0.5
tf.nn.dropout(data, rate=prob).numpy()*(1-prob)
array([[0.38658212, 0.6896139 , 0.92139911, 0. , 0. ],
[0.03461688, 0. , 0. , 0.20046708, 0. ],
[0.49012903, 0.45495968, 0. , 0. , 0. ],
[0. , 0.44375078, 0.55758641, 0.31893155, 0. ],
[0.52348073, 0.69354454, 0.2808658 , 0.6628248 , 0. ]])
Dropout scales the values it keeps by 1/(1-rate), so I counter this by multiplying by (1-prob). Note that unlike the numpy solution, which masks an exact fraction of the elements, tf.nn.dropout zeroes each element independently with probability rate, so the actual number of masked elements varies from call to call.
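If you need the exact-fraction behaviour of the numpy version, here is a minimal sketch in plain TF 2.x ops (mask_exact_fraction is my own helper name, not a TF API):
import numpy as np
import tensorflow as tf

def mask_exact_fraction(x, prob=0.5):
    # Zero an exact fraction of elements, chosen without replacement,
    # mirroring the np.random.choice(..., replace=False) approach.
    flat = tf.reshape(x, [-1])
    n = tf.size(flat)
    k = tf.cast(tf.cast(n, tf.float32) * prob, tf.int32)
    # Shuffle all flat indices and zero out the first k of them.
    idx = tf.random.shuffle(tf.range(n))[:k]
    mask = tf.tensor_scatter_nd_update(
        tf.ones_like(flat),
        tf.expand_dims(idx, 1),
        tf.zeros(tf.shape(idx), dtype=x.dtype))
    return tf.reshape(flat * mask, tf.shape(x))

print(mask_exact_fraction(tf.constant(np.random.rand(5, 5), tf.float32)))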

For future users looking for a TF 2.x compatible answer, this is what I came up with:
import tensorflow as tf
import numpy as np
input_tensor = np.random.rand(5,5).astype(np.float32)
def my_numpy_func(x):
    # x will be a numpy array with the contents of the input to the
    # tf.function
    p = 0.5
    indices = np.random.choice(np.prod(x.shape), replace=False, size=int(np.prod(x.shape)*p))
    x[np.unravel_index(indices, x.shape)] = 0.
    return x

@tf.function(input_signature=[tf.TensorSpec((None, None), tf.float32)])
def tf_function(input):
    y = tf.numpy_function(my_numpy_func, [input], tf.float32)
    return y
tf_function(tf.constant(input_tensor))
You can also use this code in the context of a Dataset via the map() operation, as sketched below.
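For example, a minimal sketch reusing tf_function from above (the ten random 5x5 arrays are just dummy data):
ds = tf.data.Dataset.from_tensor_slices(
    np.random.rand(10, 5, 5).astype(np.float32))
ds = ds.map(tf_function)  # each (5,5) element is masked independently
for element in ds.take(1):
    print(element.numpy())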

Related

Standardising data of irregular shape (TypeError: only size-1 arrays can be converted to Python scalars)

So I have an array XsN of shape (590,) and I am trying to standardise the data.
This is an example of one of the 590 elements in my array:
print(XsN[:1])
[array([[ 0. , 0.27229556, -1.8033657 , ..., 0. ,
0. , 0. ],
[ 0. , 0.20665401, -1.9340569 , ..., 0. ,
0. , 0. ],
[ 4. , 0. , 0.04352444, ..., 0. ,
0. , 0. ],
...,
[10. , 0. , -0.5655 , ..., 0. ,
0. , 0. ],
[10. , 0. , 0.9150001 , ..., 0. ,
0. , 0. ],
[10. , 0. , 1.0005 , ..., 0. ,
0. , 0. ]], dtype=float32)]
I'm then reshaping it so that it has shape (590,1):
XsN_2 = XsN.reshape(-1,1)
Now when I use StandardScaler:
from sklearn.preprocessing import StandardScaler
standardized_data = StandardScaler().fit_transform(XsN_2)
I get the error that
TypeError: only size-1 arrays can be converted to Python scalars
and
ValueError: setting an array element with a sequence.
I understand that it expects a number but finds an ndarray instead; I'm just not sure how to standardise data of shape (590,) where each element is its own ndarray.
Edit 1:
Referring to this csv file: https://gofile.io/?c=YGxCWQ
Here is some code with a sample data:
import pandas as pd
from sklearn.preprocessing import StandardScaler
imp = pd.read_csv('foo.csv', sep=',', header=None)
data = imp.values
print(data)
standardized_data = StandardScaler().fit_transform(data)
The error I get now is:
ValueError: could not convert string to float
Is there any way I can standardise this data?
Without access to your original data in the form of a valid .csv file it is a little difficult to debug this. From the look of what you printed it seems like XsN is a list of arrays, so you may want to loop through each in turn or convert it into an array with expanded dimensions.
Here is an example of standardizing some dummy data which I think resembles the structure of your data. Hope that helps.
import numpy as np

n = 100
# Create feature 1
mean1 = 10
standard_dev1 = 2
col1 = np.random.normal(loc=mean1,scale=standard_dev1,size=[n,1])
# Create feature 2
mean2 = 20
standard_dev2 = 4
col2 = np.random.normal(loc=mean2,scale=standard_dev2,size=[n,1])
data = np.concatenate([col1,col2],axis=1)
print(f"means of raw data: {data.mean(axis=0)}")
>>>
means of raw data: [10.15783287 19.82541124]
print(f"standard devations of raw data: {data.std(axis=0)}")
>>>
standard devations of raw data: [2.00049111 3.87277793]
from sklearn.preprocessing import StandardScaler
standardized_data = StandardScaler().fit_transform(data)
print(f"means of standardized data: {standardized_data.mean(axis=0)}")
>>>
means of standardized data: [-6.92779167e-16 -1.78745907e-15]
print(f"standard devations of standardized data: {standardized_data.std(axis=0)}")
>>>
standard devations of standardized data: [1. 1.]
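For the original (590,) object array itself, a minimal sketch of the "convert and stack" route (this assumes each of the 590 element arrays has the same number of columns; the dummy XsN below just mimics that structure):
import numpy as np
from sklearn.preprocessing import StandardScaler

# Dummy stand-in for XsN: 2-D float arrays with varying row counts
# but a shared column count (required for stacking).
XsN = np.empty(3, dtype=object)
for i in range(3):
    XsN[i] = np.random.rand(4 + i, 5).astype(np.float32)

lengths = [a.shape[0] for a in XsN]        # rows per piece
stacked = np.vstack(list(XsN))             # one 2-D matrix of all rows
scaled = StandardScaler().fit_transform(stacked)
pieces = np.split(scaled, np.cumsum(lengths)[:-1])  # back to pieces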

Min-max scaling along rows in numpy array

I have a numpy array and I want to rescale values along each row to values between 0 and 1 using the following procedure:
If the maximum value along a given row is X_max and the minimum value along that row is X_min, then the rescaled value (X_rescaled) of a given entry (X) in that row should become:
X_rescaled = (X - X_min)/(X_max - X_min)
As an example, let's consider the following array (arr):
arr = np.array([[1.0,2.0,3.0],[0.1, 5.1, 100.1],[0.01, 20.1, 1000.1]])
print arr
array([[ 1.00000000e+00, 2.00000000e+00, 3.00000000e+00],
[ 1.00000000e-01, 5.10000000e+00, 1.00100000e+02],
[ 1.00000000e-02, 2.01000000e+01, 1.00010000e+03]])
Presently, I am trying to use MinMaxscaler from scikit-learn in the following way:
from sklearn.preprocessing import MinMaxScaler
result = MinMaxScaler(arr)
But I keep getting my initial array back, i.e. result turns out to be the same as arr. What am I doing wrong?
How can I scale the array arr in the manner that I require (min-max scaling along each row)? Thanks in advance.
In your snippet, MinMaxScaler(arr) only constructs a scaler object (arr is passed as a constructor argument) and never calls fit_transform, so no scaling happens. MinMaxScaler is also a bit clunky to use here; sklearn.preprocessing.minmax_scale is more convenient. It operates along columns, so use the transpose:
>>> import numpy as np
>>> from sklearn import preprocessing
>>>
>>> a = np.random.random((3,5))
>>> a
array([[0.80161048, 0.99572497, 0.45944366, 0.17338664, 0.07627295],
[0.54467986, 0.8059851 , 0.72999058, 0.08819178, 0.31421126],
[0.51774372, 0.6958269 , 0.62931078, 0.58075685, 0.57161181]])
>>> preprocessing.minmax_scale(a.T).T
array([[0.78888024, 1. , 0.41673812, 0.10562126, 0. ],
[0.63596033, 1. , 0.89412757, 0. , 0.314881 ],
[0. , 1. , 0.62648851, 0.35384099, 0.30248836]])
>>>
>>> b = np.array([(4, 1, 5, 3), (0, 1.5, 1, 3)])
>>> preprocessing.minmax_scale(b.T).T
array([[0.75 , 0. , 1. , 0.5 ],
[0. , 0.5 , 0.33333333, 1. ]])
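If you'd rather not involve scikit-learn at all, the formula from the question maps directly onto numpy broadcasting; a minimal sketch:
>>> import numpy as np
>>> arr = np.array([[1.0, 2.0, 3.0], [0.1, 5.1, 100.1], [0.01, 20.1, 1000.1]])
>>> # X_rescaled = (X - X_min)/(X_max - X_min), per row; keepdims=True
>>> # keeps the row minima/maxima broadcastable against arr.
>>> row_min = arr.min(axis=1, keepdims=True)
>>> row_max = arr.max(axis=1, keepdims=True)
>>> (arr - row_min) / (row_max - row_min)
array([[0.        , 0.5       , 1.        ],
       [0.        , 0.05      , 1.        ],
       [0.        , 0.02008819, 1.        ]])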

Predicting missing values in recommender System

I am trying to implement Non-negative Matrix Factorization so as to find the missing values of a matrix for a Recommendation Engine Project. I am using the nimfa library to implement matrix factorization. But can't seem to figure out how to predict the missing values.
The missing values in this matrix are represented by 0.
a = [[1.,         0.45643546, 0.,         0.1,        0.10327956, 0.0225877 ],
     [0.15214515, 1.,         0.04811252, 0.07607258, 0.23570226, 0.38271325],
     [0.,         0.14433757, 1.,         0.07905694, 0.,         0.42857143],
     [0.1,        0.22821773, 0.07905694, 1.,         0.,         0.27105237],
     [0.06885304, 0.47140452, 0.,         0.,         1.,         0.13608276],
     [0.00903508, 0.4592559,  0.17142857, 0.10842095, 0.08164966, 1.        ]]
import nimfa
import numpy

model = nimfa.Lsnmf(a, max_iter=100000, rank=4)
#fit the model
fit = model()
#get U and V matrices from fit
U = fit.basis()
V = fit.coef()
print numpy.dot(U, V)
But the answer given is nearly the same as a, so I can't predict the zero values.
Please tell me which method to use, or point me to other possible implementations and resources. I want to minimize the following error function when predicting the values:
error = ||a - UV||_F + c*||U||_F + c*||V||_F
where ||.||_F denotes the Frobenius norm.
I have not used nimfa before so I cannot answer on exactly how to do that, but with sklearn you can use a preprocessing step to impute the missing values, like this (note that in recent scikit-learn versions, sklearn.preprocessing.Imputer has been replaced by sklearn.impute.SimpleImputer):
In [28]: import numpy as np
In [29]: from sklearn.preprocessing import Imputer
# prepare a numpy array
In [30]: a = np.array(a)
In [31]: a
Out[31]:
array([[ 1. , 0.45643546, 0. , 0.1 , 0.10327956,
0.0225877 ],
[ 0.15214515, 1. , 0.04811252, 0.07607258, 0.23570226,
0.38271325],
[ 0. , 0.14433757, 1. , 0.07905694, 0. ,
0.42857143],
[ 0.1 , 0.22821773, 0.07905694, 1. , 0. ,
0.27105237],
[ 0.06885304, 0.47140452, 0. , 0. , 1. ,
0.13608276],
[ 0.00903508, 0.4592559 , 0.17142857, 0.10842095, 0.08164966,
1. ]])
In [32]: pre = Imputer(missing_values=0, strategy='mean')
# replace entries equal to 0 with the column mean
In [33]: pre.fit_transform(a)
Out[33]:
array([[ 1. , 0.45643546, 0.32464951, 0.1 , 0.10327956,
0.0225877 ],
[ 0.15214515, 1. , 0.04811252, 0.07607258, 0.23570226,
0.38271325],
[ 0.26600665, 0.14433757, 1. , 0.07905694, 0.35515787,
0.42857143],
[ 0.1 , 0.22821773, 0.07905694, 1. , 0.35515787,
0.27105237],
[ 0.06885304, 0.47140452, 0.32464951, 0.27271009, 1. ,
0.13608276],
[ 0.00903508, 0.4592559 , 0.17142857, 0.10842095, 0.08164966,
1. ]])
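For anyone on a recent scikit-learn, a minimal sketch of the same zero-as-missing imputation with the replacement class, sklearn.impute.SimpleImputer (the 3x3 matrix is just cut-down dummy data):
import numpy as np
from sklearn.impute import SimpleImputer

a = np.array([[1.0,        0.45643546, 0.0       ],
              [0.15214515, 1.0,        0.04811252],
              [0.0,        0.14433757, 1.0       ]])

# Treat exact zeros as missing and fill them with the column mean.
print(SimpleImputer(missing_values=0.0, strategy='mean').fit_transform(a))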

List indices must be integers, not tuple?

I have been given a .mat file which is 1024*1024*360, i.e. a 3D object. I have divided the data into three .mat files A, B and C, each of shape 1024*1024*120. I am loading them into a matrix 'mat' of shape 1024*360, one at a time, deleting each one afterwards to free memory. Basically it's just a 2D slice of the 3D object at index 240. Later I am trying to plot the image. Following is my code:
import scipy.io
import numpy as np
mat = np.zeros((1024,360))
x = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/A.mat')
x = x.values()
mat[:,0:120]= x[240,:,:]
del x
y = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/B.mat')
y = y.values()
mat[:,120:240]= y[240,:,:]
del y
z = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/C.mat')
z = z.values()
mat[:,240:360]= z[240,:,:]
del z
import matplotlib.py as plt
imageplot = plt.imshow(matrix)
I am getting this error:
mat[:,0:120]= x[240,:,:]
TypeError: List indices must be integers, not tuple
Can anyone suggest what I am doing wrong here?
You have to create a numpy array from the original x matrix, because a normal Python list doesn't support numpy-style fancy indexing like matrix[x,y,z], only matrix[x][y][z].
import scipy.io
import numpy as np
mat = np.zeros((1024,360))
x = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/A.mat')
x = np.array((x.values()))
mat[:,0:120]= x[240,:,:]
del x
y = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/B.mat')
y = np.array((y.values()))
mat[:,120:240]= y[240,:,:]
del y
z = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/C.mat')
z = np.array((z.values()))
mat[:,240:360]= z[240,:,:]
del z
import matplotlib.pyplot as plt
imageplot = plt.imshow(mat)
Alternatively, you can use x[240][:][:] instead of x[240,:,:].
Update:
Because the following code works fine, I guess the problem lies in the dimensions of the loaded matrices, i.e. in what x.values() returns. Please check that first with print x.shape (shape is an attribute, not a method).
import numpy as np
mat = np.zeros((1024,360))
x = np.zeros((1024,1024,120))
mat[:,0:120] = x[240,:,:]
print mat
[[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
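As an aside, scipy.io.loadmat returns a dict keyed by the variable names stored in the .mat file (plus metadata keys like '__header__'), so indexing by name sidesteps the .values() problem entirely. A minimal sketch, where 'A' is an assumed variable name inside A.mat (check sorted(x.keys()) for the real one):
import scipy.io
import numpy as np

mat = np.zeros((1024, 360))
x = scipy.io.loadmat('/home/imaging/Desktop/PRAKRITI/Project/A.mat')
# 'A' is assumed; use the actual variable name stored in the file.
mat[:, 0:120] = x['A'][240, :, :]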

How to fill upper triangle of numpy array with zeros in place?

What is the best way to fill in the lower triangle of a numpy array with zeros in place so that I don't have to do the following:
a=np.random.random((5,5))
a = np.triu(a)
since np.triu returns a copy, not a view. Preferably this would also require no index lists, since I am working with large arrays.
Digging into the internals of triu you'll find that it just multiplies the input by the output of tri.
So you can just multiply the array in-place by the output of tri:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape)
>>> a
array([[ 0.46026582, 0. , 0. , 0. , 0. ],
[ 0.76234296, 0.5298908 , 0. , 0. , 0. ],
[ 0.08797149, 0.14881991, 0.9302515 , 0. , 0. ],
[ 0.54794779, 0.36896506, 0.92901552, 0.73747726, 0. ],
[ 0.62917827, 0.61674542, 0.44999905, 0.80970863, 0.41860336]])
Like triu, this still creates a second array (the output of tri), but at least it performs the operation itself in-place. The splat (*a.shape) is a bit of a shortcut; consider basing your function on the full signature of triu for something more robust. Note that you can still specify a diagonal:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape, k=2)
>>> a
array([[ 0.25473126, 0.70156073, 0.0973933 , 0. , 0. ],
[ 0.32859487, 0.58188318, 0.95288351, 0.85735005, 0. ],
[ 0.52591784, 0.75030515, 0.82458369, 0.55184033, 0.01341398],
[ 0.90862183, 0.33983192, 0.46321589, 0.21080121, 0.31641934],
[ 0.32322392, 0.25091433, 0.03980317, 0.29448128, 0.92288577]])
I now see that the question title and body describe opposite behaviors. Just in case, here's how you can fill the lower triangle with zeros. This requires you to specify the -1 diagonal:
>>> a = np.random.random((5, 5))
>>> a *= 1 - np.tri(*a.shape, k=-1)
>>> a
array([[0.6357091 , 0.33589809, 0.744803 , 0.55254798, 0.38021111],
[0. , 0.87316263, 0.98047459, 0.00881754, 0.44115527],
[0. , 0. , 0.51317289, 0.16630385, 0.1470729 ],
[0. , 0. , 0. , 0.9239731 , 0.11928557],
[0. , 0. , 0. , 0. , 0.1840326 ]])
If speed and memory use are still a limitation and Cython is available, a short Cython function will do what you want.
Here's a working version designed for a C-contiguous array with double precision values.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef make_lower_triangular(double[:,:] A, int k):
    """ Set all the entries of array A that lie above
    diagonal k to 0. """
    cdef int i, j
    # Rows at or beyond A.shape[1] - k have nothing above diagonal k,
    # so they can be skipped (also handles non-square arrays).
    for i in range(min(A.shape[0], A.shape[1] - k)):
        for j in range(max(0, i+k+1), A.shape[1]):
            A[i,j] = 0.
This should be significantly faster than any version that involves multiplying by a large temporary array.
import numpy as np

n = 3
A = np.zeros((n,n))
for p in range(n):
    A[0,p] = p+1
    if p > 0:
        A[1,p] = p+3
    if p > 1:
        A[2,p] = p+4
This creates the upper triangular matrix [[1, 2, 3], [0, 4, 5], [0, 0, 6]], with entries counting up from 1.
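If the full-size temporary from np.tri is the real concern and Cython isn't an option, a minimal pure-numpy sketch that zeroes above the main diagonal in place, one row at a time, using only basic slicing:
import numpy as np

a = np.random.random((5, 5))
# Row i keeps columns 0..i; slice assignment writes into a directly,
# so no array-sized temporary is allocated.
for i in range(a.shape[0]):
    a[i, i + 1:] = 0.0
print(a)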
