The objective is to randomly assign a constant value to entries in the lower triangle (tril) of a NumPy array.
I wonder whether there is a more efficient and compact approach than the proposed solution below.
import numpy as np
import random
rand_n2 = np.random.randn(10,10)
arr=np.tril(rand_n2,-1)
n=np.where(arr!=0)
nsize=n[0].shape[0]
ndrop=2 # Total number of locations to assign the constant value
rand_idx = random.sample(range(nsize), ndrop) # distinct positions; range(1,nsize) would never pick position 0
for idx in range(ndrop):
    arr[n[0][rand_idx[idx]],n[1][rand_idx[idx]]]=10 # Assign constant value to random tril location
You could initialize a matrix with random numbers and zero out the upper triangle (including the diagonal). Then you draw random positions from the lower-triangle indices and overwrite them:
import numpy as np
# create the matrix with random values
size = 5
arr = np.random.rand(size, size)
arr[np.triu_indices(size, k=0)] = 0
# set values randomly
val = 10
k_max = 2
ix = np.random.choice(size*(size-1)//2, k_max, replace=False)  # replace=False avoids drawing the same position twice
rnd = np.tril_indices(size, k=-1)
arr[(rnd[0][ix], rnd[1][ix])] = val
array([[ 0. , 0. , 0. , 0. , 0. ],
[ 0.50754565, 0. , 0. , 0. , 0. ],
[ 0.98920062, 0.53945212, 0. , 0. , 0. ],
[ 0.54987252, 10. , 0.22052519, 0. , 0. ],
[10. , 0.82057924, 0.86199411, 0.85397047, 0. ]])
Don't know if this is much more efficient and compact, but I feel it's a bit cleaner and easier to read:
import numpy as np
rand_n2 = np.random.randn(10,10)
arr=np.tril(rand_n2,-1)
# create list of lower triangular indices
tril_idx = [(i,j) for i in range(1,10) for j in range(i)]
# shuffle indices, i.e. draw two at random
np.random.shuffle(tril_idx)
ndrop = 2 # Total number of locations to assign the constant value
for idx in tril_idx[:ndrop]:
    arr[idx] = 10 # Assign constant value to random tril location
Instead of using the double list comprehension to create the list of lower triangular indices, you can use np.tril_indices() as well. Just take care, since this returns a tuple of arrays rather than an array of tuples.
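As a minimal sketch of that np.tril_indices() variant: the helper name assign_random_tril and its parameters are made up for illustration, and it assumes the newer np.random.default_rng generator API is acceptable. It picks distinct positions from the strictly lower triangle and overwrites them in place.

```python
import numpy as np

def assign_random_tril(arr, value, n_assign, rng=None):
    """Set n_assign distinct strictly-lower-triangle entries of arr to value, in place.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rng = np.random.default_rng() if rng is None else rng
    # tril_indices returns a tuple (rows, cols) of index arrays
    rows, cols = np.tril_indices(arr.shape[0], k=-1, m=arr.shape[1])
    # replace=False guarantees that no position is drawn twice
    picks = rng.choice(rows.size, size=n_assign, replace=False)
    arr[rows[picks], cols[picks]] = value
    return arr

rng = np.random.default_rng(0)
arr = np.tril(rng.standard_normal((10, 10)), -1)
assign_random_tril(arr, 10, 2, rng)
print(arr)
```

Because the row and column arrays are indexed with the same picks, no Python loop over positions is needed.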
I'm trying to scale the following NumPy array based on its minimum and maximum values.
array = [[17405.051 17442.4 17199.6 17245.65 ]
[17094.949 17291.75 17091.15 17222.75 ]
[17289. 17294.9 17076.551 17153. ]
[17181.85 17235.1 17003.9 17222. ]]
Formula used is:
m=(x-xmin)/(xmax-xmin)
wherein m is an individually scaled item, x is an individual item, xmax is the highest value and xmin is the smallest value of the array.
My question is how do I print the scaled array?
P.S. - I can't use MinMaxScaler as I need to scale a given number (outside the array) by plugging it in the mentioned formula with xmin & xmax of the given array.
I tried scaling the individual items by iterating over the array but I'm unable to put together the scaled array.
I'm new to NumPy, any suggestions would be welcome.
Thank you.
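Use the methods ndarray.min() and ndarray.max(), or ndarray.ptp() (which gets the range of the values in the array; the method was removed in NumPy 2.0, where the function np.ptp(ar) is the equivalent):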
>>> ar = np.array([[17405.051, 17442.4, 17199.6, 17245.65 ],
... [17094.949, 17291.75, 17091.15, 17222.75 ],
... [17289., 17294.9, 17076.551, 17153. ],
... [17181.85, 17235.1, 17003.9, 17222. ]])
>>> min_val = ar.min()
>>> range_val = ar.ptp()
>>> (ar - min_val) / range_val
array([[0.91482554, 1. , 0.44629418, 0.55131129],
[0.2076374 , 0.65644242, 0.19897377, 0.4990878 ],
[0.65017104, 0.663626 , 0.16568073, 0.34002281],
[0.40581528, 0.527252 , 0. , 0.49737742]])
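Since the question also needs to scale a number that lies outside the array, one possible sketch is to keep xmin and xmax around and reuse them. The function name scale and the value 17300.0 are assumptions chosen only for illustration.

```python
import numpy as np

ar = np.array([[17405.051, 17442.4, 17199.6, 17245.65],
               [17094.949, 17291.75, 17091.15, 17222.75],
               [17289., 17294.9, 17076.551, 17153.],
               [17181.85, 17235.1, 17003.9, 17222.]])

# Keep the extremes so the same formula can be reused later
x_min = ar.min()
x_max = ar.max()

def scale(x):
    """Apply m = (x - xmin) / (xmax - xmin); works on arrays and scalars alike."""
    return (x - x_min) / (x_max - x_min)

scaled = scale(ar)        # whole array at once, no explicit loop
outside = scale(17300.0)  # hypothetical number that is not in the array
print(scaled)
print(outside)
```

Broadcasting applies the formula element-wise, so the same function handles the full array and a single number.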
I think you should learn more about the basic operation of numpy.
import numpy as np
array_list = [[17405.051, 17442.4, 17199.6, 17245.65 ],
[17094.949, 17291.75, 17091.15, 17222.75 ],
[17289., 17294.9, 17076.551, 17153., ],
[17181.85, 17235.1, 17003.9, 17222. ]]
# Convert list into numpy array
array = np.array(array_list)
# Create empty list
scaled_array_list=[]
for x in array:
    m = (x - np.min(array))/(np.max(array)-np.min(array))
    scaled_array_list.append(m)
# Convert list into numpy array
scaled_array = np.array(scaled_array_list)
scaled_array
My version is by iterating over the array as you said.
You can also put everything in a function and use it in future:
def scaler(array_to_scale):
    # Create empty list
    scaled_array_list=[]
    # Use the function argument, not the global variable "array"
    for x in array_to_scale:
        m = (x - np.min(array_to_scale))/(np.max(array_to_scale)-np.min(array_to_scale))
        scaled_array_list.append(m)
    # Convert list into numpy array
    scaled_array = np.array(scaled_array_list)
    return scaled_array
# Here it is our input
array_list = [[17405.051, 17442.4, 17199.6, 17245.65 ],
[17094.949, 17291.75, 17091.15, 17222.75 ],
[17289., 17294.9, 17076.551, 17153., ],
[17181.85, 17235.1, 17003.9, 17222. ]]
# Convert list into numpy array
array = np.array(array_list)
scaler(array)
Output:
array([[0.91482554, 1. , 0.44629418, 0.55131129],
[0.2076374 , 0.65644242, 0.19897377, 0.4990878 ],
[0.65017104, 0.663626 , 0.16568073, 0.34002281],
[0.40581528, 0.527252 , 0. , 0.49737742]])
I have a square symmetric matrix like this:
[[-1. -0.70710678 -0.70710678 -0.70710678 -0. ]
[-0.70710678 -1. -1. -1. -0. ]
[-0.70710678 -1. -1. -1. -0. ]
[-0.70710678 -1. -1. -1. -0. ]
[-0. -0. -0. -0. -1. ]]
I would like to analyze all numbers below or above the diagonal, but not the diagonal.
What can I do to find unique values from this matrix except the values in the diagonal?
Expected output : [-0., -0.70710678]
You can get the diagonal values with arr.diagonal() and np.unique, then remove them from the unique values of the array:
unique = np.unique(arr)
index = np.ravel([np.where(unique == i) for i in np.unique(arr.diagonal())])
values = np.delete(unique, index)
print(values) # [-0.70710678 -0. ]
If a is the name of the numpy array with the representation you provided, then
print(np.array(np.setdiff1d(a, a.diagonal())))
does the trick with output
[-0.70710678 0. ]
(Original Answer) Alternatively,
import numpy as np
b = np.unique(a[~np.eye(a.shape[0],dtype=bool)].reshape(a.shape[0],-1))
print(b)
print(np.setdiff1d(b, a.diagonal()))
Printing b will output the unique values in the array a with the main diagonal elements deleted. The next line removes those numbers in the diagonal of a that are in b.
The output is
[-1. -0.70710678 0. ]
[-0.70710678 0. ]
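Because the matrix is symmetric, it should be enough to look at the strictly lower triangle only. The following is a sketch under that assumption; the matrix is rebuilt here from the question, with s standing in for the printed value -0.70710678.

```python
import numpy as np

s = -1 / np.sqrt(2)  # assumed exact value behind the printed -0.70710678
a = np.array([[-1., s, s, s, -0.],
              [s, -1., -1., -1., -0.],
              [s, -1., -1., -1., -0.],
              [s, -1., -1., -1., -0.],
              [-0., -0., -0., -0., -1.]])

# Values strictly below the diagonal (k=-1 excludes the diagonal itself)
below = a[np.tril_indices_from(a, k=-1)]

# Unique off-diagonal values that do not also occur on the diagonal
result = np.setdiff1d(np.unique(below), a.diagonal())
print(result)
```

Note that this compares floats exactly, so values that differ only by rounding error would be treated as distinct.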
You can use Python sets, assuming a is the input:
b = np.array(list(set(a.flatten())-set(np.diagonal(a))))
output: array([-0.70710678, -0. ])
NB. this is faster for small arrays (the provided 25 items example) and roughly as fast as numpy operations for larger arrays (tested on 1M (1000x1000) and 100M (10k x 10k) items with 1000 unique possibilities)
timing:
code for the perfplot:
import numpy as np
import perfplot
def guy(a):
    unique = np.unique(a)
    index = np.ravel([np.where(unique == i) for i in np.unique(a.diagonal())])
    values = np.delete(unique, index)
    return values

def mozway(a):
    return np.array(list(set(a.flatten())-set(np.diagonal(a))))

def oda(a):
    b = np.unique(a[~np.eye(a.shape[0],dtype=bool)].reshape(a.shape[0],-1))
    return np.setdiff1d(b, a.diagonal())

def oda_setdiff(a):
    return np.array(np.setdiff1d(a, a.diagonal()))

perfplot.show(
    setup=lambda n: np.random.randint(0,1000, size=(n,n)),
    kernels=[guy, oda, oda_setdiff, mozway],
    n_range=[2**k for k in range(11)],
    xlabel="array shape in each dimension",
    equality_check=None,
)
The pure numpy solution is:
import numpy as np
data = np.random.rand(5,5) #data is of shape (5,5) with floats
masking_prob = 0.5 #probability of an element to get masked
indices = np.random.choice(np.prod(data.shape), replace=False, size=int(np.prod(data.shape)*masking_prob))
data[np.unravel_index(indices, data.shape)] = 0. #set to zero (unravel_index needs the shape, not the array)
How can I achieve this in TensorFlow?
Use tf.nn.dropout:
import tensorflow as tf
import numpy as np
data = np.random.rand(5,5)
array([[0.38658212, 0.6896139 , 0.92139911, 0.45646086, 0.23185075],
[0.03461688, 0.22073962, 0.21254995, 0.20046708, 0.43419155],
[0.49012903, 0.45495968, 0.83753471, 0.58815975, 0.90212244],
[0.04071416, 0.44375078, 0.55758641, 0.31893155, 0.67403431],
[0.52348073, 0.69354454, 0.2808658 , 0.6628248 , 0.82305081]])
prob = 0.5  # dropout rate, i.e. probability of an element being zeroed
tf.nn.dropout(data, rate=prob).numpy()*(1-prob)
array([[0.38658212, 0.6896139 , 0.92139911, 0. , 0. ],
[0.03461688, 0. , 0. , 0.20046708, 0. ],
[0.49012903, 0.45495968, 0. , 0. , 0. ],
[0. , 0.44375078, 0.55758641, 0.31893155, 0. ],
[0.52348073, 0.69354454, 0.2808658 , 0.6628248 , 0. ]])
Dropout scales the remaining values by 1/(1-prob), so I counter this by multiplying by (1-prob).
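To see why the multiplication by (1-prob) restores the original values, here is a NumPy-only emulation of dropout-style scaling. It is a sketch of the idea, not TensorFlow's actual implementation, and the variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5, 5))
rate = 0.5  # probability of zeroing an element

# Each element is kept independently with probability 1 - rate
keep = rng.random(data.shape) >= rate

# Dropout-style output: kept values are scaled up by 1 / (1 - rate)
dropped = np.where(keep, data / (1 - rate), 0.0)

# Multiplying by (1 - rate) undoes the scaling on the kept values
restored = dropped * (1 - rate)
print(restored)
```

One difference from the np.random.choice version in the question: elements are dropped independently here, so the number of zeros varies from run to run instead of being exactly a fixed fraction.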
For future users looking for a TF 2.x compatible answer, this is what I came up with:
import tensorflow as tf
import numpy as np
input_tensor = np.random.rand(5,5).astype(np.float32)
def my_numpy_func(x):
    # x will be a numpy array with the contents of the input to the
    # tf.function
    p = 0.5
    indices = np.random.choice(np.prod(x.shape), replace=False, size=int(np.prod(x.shape)*p))
    x[np.unravel_index(indices, x.shape)] = 0.
    return x
@tf.function(input_signature=[tf.TensorSpec((None, None), tf.float32)])
def tf_function(input):
    y = tf.numpy_function(my_numpy_func, [input], tf.float32)
    return y
tf_function(tf.constant(input_tensor))
You can also use this code in the context of a Dataset by using the map() operation.
What is the best way to fill in the lower triangle of a numpy array with zeros in place so that I don't have to do the following:
a=np.random.random((5,5))
a = np.triu(a)
since np.triu returns a copy, not a view. Preferably this would require no list indexing as well, since I am working with large arrays.
Digging into the internals of triu you'll find that it just multiplies the input by the output of tri.
So you can just multiply the array in-place by the output of tri:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape)
>>> a
array([[ 0.46026582, 0. , 0. , 0. , 0. ],
[ 0.76234296, 0.5298908 , 0. , 0. , 0. ],
[ 0.08797149, 0.14881991, 0.9302515 , 0. , 0. ],
[ 0.54794779, 0.36896506, 0.92901552, 0.73747726, 0. ],
[ 0.62917827, 0.61674542, 0.44999905, 0.80970863, 0.41860336]])
Like triu, this still creates a second array (the output of tri), but at least it performs the operation itself in-place. The splat is a bit of a shortcut; consider basing your function on the full version of triu for something robust. But note that you can still specify a diagonal:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape, k=2)
>>> a
array([[ 0.25473126, 0.70156073, 0.0973933 , 0. , 0. ],
[ 0.32859487, 0.58188318, 0.95288351, 0.85735005, 0. ],
[ 0.52591784, 0.75030515, 0.82458369, 0.55184033, 0.01341398],
[ 0.90862183, 0.33983192, 0.46321589, 0.21080121, 0.31641934],
[ 0.32322392, 0.25091433, 0.03980317, 0.29448128, 0.92288577]])
I now see that the question title and body describe opposite behaviors. Just in case, here's how you can fill the lower triangle with zeros. This requires you to specify the -1 diagonal:
>>> a = np.random.random((5, 5))
>>> a *= 1 - np.tri(*a.shape, k=-1)
>>> a
array([[0.6357091 , 0.33589809, 0.744803 , 0.55254798, 0.38021111],
[0. , 0.87316263, 0.98047459, 0.00881754, 0.44115527],
[0. , 0. , 0.51317289, 0.16630385, 0.1470729 ],
[0. , 0. , 0. , 0.9239731 , 0.11928557],
[0. , 0. , 0. , 0. , 0.1840326 ]])
If speed and memory use are still a limitation and Cython is available, a short Cython function will do what you want.
Here's a working version designed for a C-contiguous array with double precision values.
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef make_lower_triangular(double[:,:] A, int k):
    """ Set all the entries of array A that lie above
    diagonal k to 0. """
    cdef int i, j
    for i in range(min(A.shape[0], A.shape[0] - k)):
        for j in range(max(0, i+k+1), A.shape[1]):
            A[i,j] = 0.
This should be significantly faster than any version that involves multiplying by a large temporary array.
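If Cython is not available, the same row-by-row idea can be sketched in plain NumPy with slice assignment. The function name zero_lower_triangle_inplace is hypothetical; this version zeroes the entries below the main diagonal, which is the behavior described in the question body.

```python
import numpy as np

def zero_lower_triangle_inplace(a):
    """Zero every entry below the main diagonal of a 2-D array, in place.

    Hypothetical helper: uses one slice assignment per row, so it
    allocates no full-size temporary array and no index lists.
    """
    n_rows, n_cols = a.shape
    for i in range(1, n_rows):
        # Columns 0 .. i-1 of row i lie below the diagonal
        a[i, :min(i, n_cols)] = 0
    return a

a = np.random.default_rng(0).random((5, 5))
expected = np.triu(a)           # copy-based reference result
zero_lower_triangle_inplace(a)  # modifies a directly
print(np.array_equal(a, expected))
```

The loop runs at Python speed over rows only, so it is likely slower than the Cython version for very tall arrays, but each row is handled by a vectorized slice write.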
import numpy as np
n=3
A=np.zeros((n,n))
for p in range(n):
    A[0,p] = p+1
    if p >0 :
        A[1,p]=p+3
    if p >1 :
        A[2,p]=p+4
This creates an upper triangular matrix whose entries start at 1.
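The hard-coded loop above only works for n=3. A sketch that should generalize to any n uses np.triu_indices, which lists the upper-triangle positions in row-major order:

```python
import numpy as np

n = 3
A = np.zeros((n, n))

# Row and column indices of the upper triangle, diagonal included
rows, cols = np.triu_indices(n)

# Fill those positions with 1, 2, 3, ... in row-major order
A[rows, cols] = np.arange(1, rows.size + 1)
print(A)
```

For n=3 this fills the first row with 1, 2, 3, the second with 4, 5 and the last with 6, matching the loop version.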
Using some experimental data, I cannot for the life of me work out how to use splrep to create a B-spline. The data are here: http://ubuntuone.com/4ZFyFCEgyGsAjWNkxMBKWD
Here is an excerpt:
#Depth Temperature
1 14.7036
-0.02 14.6842
-1.01 14.7317
-2.01 14.3844
-3 14.847
-4.05 14.9585
-5.03 15.9707
-5.99 16.0166
-7.05 16.0147
and here's a plot of it with depth on y and temperature on x:
Here is my code:
import numpy as np
from scipy.interpolate import splrep, splev
tdata = np.genfromtxt('t-data.txt',
skip_header=1, delimiter='\t')
depth = tdata[:, 0]
temp = tdata[:, 1]
# Find the B-spline representation of 1-D curve:
tck = splrep(depth, temp)
### fails here with "Error on input data" returned. ###
I know I am doing something bleedingly stupid, but I just can't see it.
You just need to have your values sorted from smallest to largest :). It shouldn't be a problem for you @a different ben, but beware, readers from the future: depth[indices] will throw a TypeError if depth is a list instead of a numpy array!
>>> indices = np.argsort(depth)
>>> depth = depth[indices]
>>> temp = temp[indices]
>>> splrep(depth, temp)
(array([-7.05, -7.05, -7.05, -7.05, -5.03, -4.05, -3. , -2.01, -1.01,
1. , 1. , 1. , 1. ]), array([ 16.0147 , 15.54473241, 16.90606794, 14.55343229,
15.12525673, 14.0717599 , 15.19657895, 14.40437622,
14.7036 , 0. , 0. , 0. , 0. ]), 3)
Hat tip to @FerdinandBeyer for the suggestion of argsort instead of my ugly "zip the values, sort the zip, re-assign the values" method.
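Putting the pieces together, a minimal end-to-end sketch might look like the following. It uses only the nine rows from the excerpt in the question rather than the full data file, and the names fine_depth and fine_temp are chosen for illustration.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# The excerpt from the question (depth is not in increasing order)
depth = np.array([1, -0.02, -1.01, -2.01, -3, -4.05, -5.03, -5.99, -7.05])
temp = np.array([14.7036, 14.6842, 14.7317, 14.3844, 14.847,
                 14.9585, 15.9707, 16.0166, 16.0147])

# splrep needs x in increasing order, so sort both arrays by depth
order = np.argsort(depth)
depth_sorted = depth[order]
temp_sorted = temp[order]

# B-spline representation (knots, coefficients, degree)
tck = splrep(depth_sorted, temp_sorted)

# Evaluate the spline on a finer depth grid
fine_depth = np.linspace(depth_sorted[0], depth_sorted[-1], 50)
fine_temp = splev(fine_depth, tck)
print(fine_temp[:5])
```

With the default settings and no weights, splrep produces an interpolating spline, so evaluating it at the original depths should return the original temperatures.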