Python- Numpy array in function with conditions - python

I have a numpy array (say xs), for which I am writing a function to create another array (say ys), which has the same values as of xs until the first half of xs and two times of xs in the remaining half. for example, if xs=[0,1,2,3,4,5], the required output is [0,1,2,6,8,10]
I wrote the following function:
import numpy as np
xs=np.arange(0,6,1)
def step(xs):
ys1=np.array([]);ys2=np.array([])
if xs.all() <=2:
ys1=xs
else:
ys2=xs*2
return np.concatenate((ys1,ys2))
print(xs,step(xs))
Which produces output: `array([0., 1., 2., 3., 4., 5.]), ie the second condition is not executed. Does anybody know how to fix it? Thanks in advance.

You can use vectorised operations instead of Python-level iteration. With the below method, we first copy the array and then multiply the second half of the array by 2.
import numpy as np
xs = np.arange(0,6,1)
def step(xs):
arr = xs.copy()
arr[int(len(arr)/2):] *= 2
return arr
print(xs, step(xs))
[0 1 2 3 4 5] [ 0 1 2 6 8 10]

import numpy as np
xs=np.arange(0,6,1)
def f(a):
it = np.nditer([a, None])
for x, y in it:
y[...] = x if x <= 2 else x * 2
return it.operands[1]
print(f(xs))
[ 0 1 2 6 8 10]
Sorry, I did not find your bug, but I felt it can be implemented differently.

Related

numpy vectorize use on (2,) array

I have a numpy array of (m, 2) and I want to transform it to shape of (m, 1) using a function below.
def func(x):
if x == [1., 1.]:
return 0.
if x == [-1., 1.] or x == [-1., -1.]:
return 1.
if x == [1., -1.]:
return 2.
I want this for applied on each (2,) vector inside the (m, 2) array resulting an (m, 1) array. I tried to use numpy.vectorize but it seems that the function gets applied in each element of a array (which makes sense in general purpose case). So I have failed to apply it.
My intension is not to use for loop. Can anyone help me with this? Thanks.
import numpy as np
def f(a, b):
return a + b
F = np.vectorize(f)
x = np.asarray([[1, 2], [3, 4], [5, 6]]).T
print(F(*x))
Output:
[3, 7, 11]

Summing array values by repeating index for an array

I want to sum the values in vals into elements of a smaller array a specified in an index list idx.
import numpy as np
a = np.zeros((1,3))
vals = np.array([1,2,3,4])
idx = np.array([0,1,2,2])
a[0,idx] += vals
This produces the result [[ 1. 2. 4.]] but I want the result [[ 1. 2. 7.]], because it should add the 3 from vals and 4 from vals into the 2nd element of a.
I can achieve what I want with:
import numpy as np
a = np.zeros((1,3))
vals = np.array([1,2,3,4])
idx = np.array([0,1,2,2])
for i in np.unique(idx):
fidx = (idx==i).astype(int)
psum = (vals * fidx).sum()
a[0,i] = psum
print(a)
Is there a way to do this with numpy without using a for loop?
Possible with np.add.at as long as the shapes align, i.e., a will need to be 1D here.
a = a.squeeze()
np.add.at(a, idx, vals)
a
array([1., 2., 7.])

How to numpy.nan*0=0

I would like to have python make the product
numpy.nan*0
return 0 (instead of nan), but e.g.
numpy.nan*4
still return nan.
My application: I have some numpy matrices which I am multiplying with one another. These contain many nan entries, and plenty of zeros.
The nans always represent unknown, but finite values which are known to become zero when multiplied with zero.
So I would like A*B return [1,nan],[nan,1] in the following example:
import numpy as np
A=np.matrix('1 0; 0 1')
B=np.matrix([[1, np.nan],[np.nan, 1]])
Is this possible?
Many thanks
You can use the numpy function numpy.nan_to_num()
import numpy as np
A = np.matrix('1 0; 0 1')
B = np.matrix([[1, np.nan],[np.nan, 1]])
C = np.nan_to_num(A) * np.nan_to_num(B)
The outcome will be [[1., 0.], [0., 1.]].
I don't think it is possible to override the behavior of nan * 0 in numpy directly because that multiplication is performed at a very low level.
However, you can provide your own Python class with the desired multiplication behavior, but be warned: This will seriously kill performance.
import numpy as np
class MyNumber(float):
def __mul__(self, other):
if other == 0 and np.isnan(self) or self == 0 and np.isnan(other):
return 0.0
return float(self) * other
def convert(x):
x = np.asmatrix(x, dtype=object) # use Python objects as matrix elements
x.flat = [MyNumber(i) for i in x.flat] # convert each element to MyNumber
return x
A = convert([[1, 0], [0, 1]])
B = convert([[1, np.nan], [np.nan, 1]])
print(A * B)
# [[1.0 nan]
# [nan 1.0]]

how to use sparse vectors and matrices in Python?

I am trying to do something very simple, but confused by the abundance of information about sparse matrices and vectors in Python.
I want to create two vectors, x and y, one of length 5 and one of length 6, being sparse. Then I want to set one coordinate in each one of them. Then I want to create a matrix A, sparse, which is 5 x 6 and add to it the outer product between x and y. I then want to do SVD on that A.
Here is what I tried, and it goes wrong in many ways.
from scipy import sparse;
import numpy as np;
import scipy.sparse.linalg as ssl;
x = sparse.bsr_matrix(np.zeros(5));
x[1] = 1;
y = sparse.bsr_matrix(np.zeros(6));
y[1] = 2;
A = sparse.coo_matrix(5, 6);
A = A + np.outer(x,y.transpose())
svdresult = ssl.svds(A,1);
At first, you should determine data you want to store in sparse matrix before constructing it. Otherwise you should use sparse.csc_matrix or sparse.csr_matrix instead. Then you can assign or change data like this:
x[0, 1] = 1
At second, outer product of vectors x and y is equivalent to x.transpose() * y.
Here is working code:
from scipy import sparse
import numpy as np
import scipy.sparse.linalg as ssl
x = np.zeros(5)
x[1] = 1
x_bsr = sparse.bsr_matrix(x)
y = np.zeros(6)
y[1] = 2
y_bsr = sparse.bsr_matrix(y)
A = sparse.coo_matrix((5, 6)) # Sparse matrix 5 x 6
B = x_bsr.transpose().dot(y_bsr) # Outer product of x and y
svdresult = ssl.svds((A + B), 1)
Output:
(array([[ 5.55111512e-17],
[ -1.00000000e+00],
[ 0.00000000e+00],
[ -2.77555756e-17],
[ 1.11022302e-16]]), array([ 2.]), array([[ 0., -1., 0., 0., 0., 0.]]))

Better way to shuffle two numpy arrays in unison

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.
This code works, and illustrates my goals:
def shuffle_in_unison(a, b):
assert len(a) == len(b)
shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
permutation = numpy.random.permutation(len(a))
for old_index, new_index in enumerate(permutation):
shuffled_a[new_index] = a[old_index]
shuffled_b[new_index] = b[old_index]
return shuffled_a, shuffled_b
For example:
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
[1, 1],
[3, 3]]), array([2, 1, 3]))
However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.
Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.
One other thought I had was this:
def shuffle_in_unison_scary(a, b):
rng_state = numpy.random.get_state()
numpy.random.shuffle(a)
numpy.random.set_state(rng_state)
numpy.random.shuffle(b)
This works...but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy version, for example.
Your can use NumPy's array indexing:
def unison_shuffled_copies(a, b):
assert len(a) == len(b)
p = numpy.random.permutation(len(a))
return a[p], b[p]
This will result in creation of separate unison-shuffled arrays.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays a and b look like this:
a = numpy.array([[[ 0., 1., 2.],
[ 3., 4., 5.]],
[[ 6., 7., 8.],
[ 9., 10., 11.]],
[[ 12., 13., 14.],
[ 15., 16., 17.]]])
b = numpy.array([[ 0., 1.],
[ 2., 3.],
[ 4., 5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[ 0., 1., 2., 3., 4., 5., 0., 1.],
# [ 6., 7., 8., 9., 10., 11., 2., 3.],
# [ 12., 13., 14., 15., 16., 17., 4., 5.]])
Now we create views simulating the original a and b:
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
This solution could be adapted to the case that a and b have different dtypes.
Very simple solution:
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]
the two arrays x,y are now both randomly shuffled in the same way
James wrote in 2015 an sklearn solution which is helpful. But he added a random state variable, which is not needed. In the below code, the random state from numpy is automatically assumed.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)
from np.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array
# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]
Shuffle any number of arrays together, in-place, using only NumPy.
import numpy as np
def shuffle_arrays(arrays, set_seed=-1):
"""Shuffles arrays in-place, in the same order, along axis=0
Parameters:
-----------
arrays : List of NumPy arrays.
set_seed : Seed value if int >= 0, else seed is random.
"""
assert all(len(arr) == len(arrays[0]) for arr in arrays)
seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed
for arr in arrays:
rstate = np.random.RandomState(seed)
rstate.shuffle(arr)
And can be used like this
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c])
A few things to note:
The assert ensures that all input arrays have the same length along
their first dimension.
Arrays shuffled in-place by their first dimension - nothing returned.
Random seed within positive int32 range.
If a repeatable shuffle is needed, seed value can be set.
After the shuffle, the data can be split using np.split or referenced using slices - depending on the application.
you can make an array like:
s = np.arange(0, len(a), 1)
then shuffle it:
np.random.shuffle(s)
now use this s as argument of your arrays. same shuffled arguments return same shuffled vectors.
x_data = x_data[s]
x_label = x_label[s]
There is a well-known function that can handle this:
from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)
Just setting test_size to 0 will avoid splitting and give you shuffled data.
Though it is usually used to split train and test data, it does shuffle them too.
From documentation
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data into a
single call for splitting (and optionally subsampling) data in a
oneliner.
This seems like a very simple solution:
import numpy as np
def shuffle_in_unison(a,b):
assert len(a)==len(b)
c = np.arange(len(a))
np.random.shuffle(c)
return a[c],b[c]
a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([11, 22, 33])
shuffle_in_unison(a,b)
Out[94]:
(array([[3, 3],
[2, 2],
[1, 1]]),
array([33, 22, 11]))
One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.
# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
np.random.seed(seed)
np.random.shuffle(a)
np.random.seed(seed)
np.random.shuffle(b)
That's it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.
EDIT, don't use np.random.seed() use np.random.RandomState instead
def shuffle(a, b, seed):
rand_state = np.random.RandomState(seed)
rand_state.shuffle(a)
rand_state.seed(seed)
rand_state.shuffle(b)
When calling it just pass in any seed to feed the random state:
a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)
Output:
>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]
Edit: Fixed code to re-seed the random state
Say we have two arrays: a and b.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])
We can first obtain row indices by permutating first dimension
indices = np.random.permutation(a.shape[0])
[1 2 0]
Then use advanced indexing.
Here we are using the same indices to shuffle both arrays in unison.
a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]
This is equivalent to
np.take(a, indices, axis=0)
[[4 5 6]
[7 8 9]
[1 2 3]]
np.take(b, indices, axis=0)
[[6 6 6]
[4 2 0]
[9 1 1]]
If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array, and randomly swap it to another position in the array
for old_index in len(a):
new_index = numpy.random.randint(old_index+1)
a[old_index], a[new_index] = a[new_index], a[old_index]
b[old_index], b[new_index] = b[new_index], b[old_index]
This implements the Knuth-Fisher-Yates shuffle algorithm.
Shortest and easiest way in my opinion, use seed:
random.seed(seed)
random.shuffle(x_data)
# reset the same seed to get the identical random sequence and shuffle the y
random.seed(seed)
random.shuffle(y_data)
most solutions above work, however if you have column vectors you have to transpose them first. here is an example
def shuffle(self) -> None:
"""
Shuffles X and Y
"""
x = self.X.T
y = self.Y.T
p = np.random.permutation(len(x))
self.X = x[p].T
self.Y = y[p].T
With an example, this is what I'm doing:
combo = []
for i in range(60000):
combo.append((images[i], labels[i]))
shuffle(combo)
im = []
lab = []
for c in combo:
im.append(c[0])
lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)
I extended python's random.shuffle() to take a second arg:
def shuffle_together(x, y):
assert len(x) == len(y)
for i in reversed(xrange(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = int(random.random() * (i+1))
x[i], x[j] = x[j], x[i]
y[i], y[j] = y[j], y[i]
That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.
Just use numpy...
First merge the two input arrays 1D array is labels(y) and 2D array is data(x) and shuffle them with NumPy shuffle method. Finally split them and return.
import numpy as np
def shuffle_2d(a, b):
rows= a.shape[0]
if b.shape != (rows,1):
b = b.reshape((rows,1))
S = np.hstack((b,a))
np.random.shuffle(S)
b, a = S[:,0], S[:,1:]
return a,b
features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(train, test)

Categories