Better way to shuffle two numpy arrays in unison

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.
This code works, and illustrates my goals:
def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
For example:
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
[1, 1],
[3, 3]]), array([2, 1, 3]))
However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.
Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.
One other thought I had was this:
def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)
This works...but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.

You can use NumPy's array indexing:
def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]
This will result in creation of separate unison-shuffled arrays.
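A minimal sketch of the same indexing idea using NumPy's newer Generator API. The function name `unison_shuffled_copies_rng` and the `seed` parameter are illustrative assumptions, not part of the answer above. Like the original, it returns shuffled copies and leaves the inputs untouched.

```python
import numpy as np

def unison_shuffled_copies_rng(a, b, seed=None):
    """Return copies of a and b reordered by one shared permutation."""
    assert len(a) == len(b)
    rng = np.random.default_rng(seed)   # seed=None gives a fresh random stream
    p = rng.permutation(len(a))         # one permutation of the row indices
    return a[p], b[p]                   # fancy indexing always copies

a = np.array([[1, 1], [2, 2], [3, 3]])
b = np.array([1, 2, 3])
sa, sb = unison_shuffled_copies_rng(a, b, seed=0)
```

Because both results are built from the same index array `p`, row `i` of `sa` still corresponds to element `i` of `sb`.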

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html

Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
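The state-reset argument can be checked on a small case. This is a sketch that assumes the legacy `numpy.random.RandomState` class and uses a private instance, so the global generator is left alone; the array contents are made up for illustration.

```python
import numpy as np

a = np.arange(10).reshape(5, 2)   # rows [0, 1], [2, 3], ...
b = np.arange(0, 10, 2)           # b[i] == a[i, 0] before shuffling

rng = np.random.RandomState(42)   # private generator, arbitrary seed
state = rng.get_state()           # remember the state before the first shuffle
rng.shuffle(a)                    # shuffles rows of a in place
rng.set_state(state)              # rewind to the remembered state
rng.shuffle(b)                    # replays the same random draws on b
```

After both calls, `a[:, 0]` and `b` hold the same values in the same order, which is the correspondence the question asks for.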
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays a and b look like this:
a = numpy.array([[[ 0., 1., 2.],
[ 3., 4., 5.]],
[[ 6., 7., 8.],
[ 9., 10., 11.]],
[[ 12., 13., 14.],
[ 15., 16., 17.]]])
b = numpy.array([[ 0., 1.],
[ 2., 3.],
[ 4., 5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[ 0., 1., 2., 3., 4., 5., 0., 1.],
# [ 6., 7., 8., 9., 10., 11., 2., 3.],
# [ 12., 13., 14., 15., 16., 17., 4., 5.]])
Now we create views simulating the original a and b:
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
This solution could be adapted to the case that a and b have different dtypes.
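A sketch of the production layout described above, where the combined array is created first and the two views are derived from it. The sizes (`n`, `a_width`, `b_width`) and the filler data are assumptions chosen only to make the example small.

```python
import numpy as np

n = 4
a_width, b_width = 6, 2   # flattened per-row sizes of the two logical arrays
c = np.arange(n * (a_width + b_width), dtype=float).reshape(n, -1)

# Views into c: no data is copied, so shuffling c reorders both of them.
a2 = c[:, :a_width].reshape(n, 2, 3)
b2 = c[:, a_width:].reshape(n, 2)
assert np.shares_memory(a2, c) and np.shares_memory(b2, c)

# Record which b-row belongs to which a-row before shuffling.
before = {tuple(ra.ravel()): tuple(rb) for ra, rb in zip(a2, b2)}

np.random.default_rng(0).shuffle(c)   # in-place shuffle along axis 0
```

After the shuffle, each row of `a2` is still paired with the row of `b2` recorded in `before`, and no second copy of the data was made.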

Very simple solution:
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]
the two arrays x,y are now both randomly shuffled in the same way

James wrote an sklearn solution in 2015 which is helpful, but the random_state argument he added is not required. In the code below, NumPy's global random state is used automatically.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)

from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array
# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]

Shuffle any number of arrays together, in-place, using only NumPy.
import numpy as np
def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0
    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed
    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)
And can be used like this
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c])
A few things to note:
The assert ensures that all input arrays have the same length along their first dimension.
Arrays are shuffled in-place along their first dimension; nothing is returned.
The random seed is drawn from the positive int32 range.
If a repeatable shuffle is needed, the seed value can be set.
After the shuffle, the data can be split using np.split or referenced using slices - depending on the application.

you can make an array like:
s = np.arange(0, len(a), 1)
then shuffle it:
np.random.shuffle(s)
Now use s to index your arrays. Indexing both arrays with the same shuffled indices reorders them in the same way.
x_data = x_data[s]
x_label = x_label[s]

There is a well-known function that can handle this:
from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)
Just setting test_size to 0 will avoid splitting and give you shuffled data.
Though it is usually used to split train and test data, it does shuffle them too.
From documentation
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data into a
single call for splitting (and optionally subsampling) data in a
oneliner.

This seems like a very simple solution:
import numpy as np
def shuffle_in_unison(a,b):
    assert len(a)==len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)
    return a[c],b[c]
a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([11, 22, 33])
shuffle_in_unison(a,b)
Out[94]:
(array([[3, 3],
[2, 2],
[1, 1]]),
array([33, 22, 11]))

One way in which in-place shuffling can be done for connected lists is using a seed (it could be random) and using numpy.random.shuffle to do the shuffling.
# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
    np.random.seed(seed)
    np.random.shuffle(a)
    np.random.seed(seed)
    np.random.shuffle(b)
That's it. This will shuffle both a and b in the exact same way. This is also done in-place which is always a plus.
EDIT: don't use np.random.seed(); use np.random.RandomState instead:
def shuffle(a, b, seed):
    rand_state = np.random.RandomState(seed)
    rand_state.shuffle(a)
    rand_state.seed(seed)
    rand_state.shuffle(b)
When calling it just pass in any seed to feed the random state:
a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)
Output:
>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]
Edit: Fixed code to re-seed the random state

Say we have two arrays: a and b.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])
We can first obtain row indices by permuting the first dimension:
indices = np.random.permutation(a.shape[0])
[1 2 0]
Then use advanced indexing.
Here we are using the same indices to shuffle both arrays in unison.
a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]
This is equivalent to
np.take(a, indices, axis=0)
[[4 5 6]
[7 8 9]
[1 2 3]]
np.take(b, indices, axis=0)
[[6 6 6]
[4 2 0]
[9 1 1]]
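A short check of the equivalence claimed above, assuming the fixed permutation `[1, 2, 0]` from the example instead of a random one. It also shows that plain row indexing, `a[indices]`, gives the same result as the two longer forms.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 1, 1], [6, 6, 6], [4, 2, 0]])
indices = np.array([1, 2, 0])   # fixed permutation so the result is predictable

a_rows = a[indices]             # row indexing along the first axis
b_rows = b[indices]

# The explicit advanced-indexing form and np.take select the same rows.
a_explicit = a[indices[:, np.newaxis], np.arange(a.shape[1])]
a_take = np.take(a, indices, axis=0)
```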

If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array and randomly swap it with an element at the same or an earlier position:
for old_index in range(len(a)):
    new_index = numpy.random.randint(old_index+1)
    # Swap through index lists: a plain tuple swap of NumPy rows swaps views
    # and would overwrite data in multi-dimensional arrays.
    a[[old_index, new_index]] = a[[new_index, old_index]]
    b[[old_index, new_index]] = b[[new_index, old_index]]
This implements the Knuth-Fisher-Yates shuffle algorithm.
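A self-contained sketch of an in-place Fisher-Yates shuffle over two arrays, written in the backward-iterating form. The function name `fisher_yates_in_unison` and the optional `rng` argument are assumptions for illustration. Each swap only creates a temporary of two rows, never a copy of the whole array.

```python
import numpy as np

def fisher_yates_in_unison(a, b, rng=None):
    """Shuffle a and b in place along axis 0 with the same swaps."""
    assert len(a) == len(b)
    rng = np.random.default_rng() if rng is None else rng
    for i in range(len(a) - 1, 0, -1):
        j = int(rng.integers(0, i + 1))   # uniform in 0..i inclusive
        # Index-list assignment copies the two rows before writing them back,
        # so it is safe for multi-dimensional arrays as well.
        a[[i, j]] = a[[j, i]]
        b[[i, j]] = b[[j, i]]

a = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])
b = np.array([1, 2, 3, 4])
fisher_yates_in_unison(a, b, np.random.default_rng(1))
```

The Python-level loop makes this slower than `numpy.random.shuffle` for large arrays, so it trades speed for low memory use.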

Shortest and easiest way in my opinion, use seed:
random.seed(seed)
random.shuffle(x_data)
# reset the same seed to get the identical random sequence and shuffle the y
random.seed(seed)
random.shuffle(y_data)

Most solutions above work; however, if you have column vectors you have to transpose them first. Here is an example:
def shuffle(self) -> None:
    """
    Shuffles X and Y
    """
    x = self.X.T
    y = self.Y.T
    p = np.random.permutation(len(x))
    self.X = x[p].T
    self.Y = y[p].T

With an example, this is what I'm doing:
from random import shuffle
combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))
shuffle(combo)
im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)

I extended python's random.shuffle() to take a second arg:
import random
def shuffle_together(x, y):
    assert len(x) == len(y)
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i+1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]
That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.

Just use numpy...
First merge the two input arrays, where the 1D array holds the labels (y) and the 2D array holds the data (x), then shuffle the merged array with NumPy's shuffle method. Finally, split it back into two arrays and return them.
import numpy as np
def shuffle_2d(a, b):
    rows = a.shape[0]
    # make the labels a column vector so they can be stacked next to the data
    if b.shape != (rows,1):
        b = b.reshape((rows,1))
    S = np.hstack((b,a))
    np.random.shuffle(S)
    b, a = S[:,0], S[:,1:]
    return a,b
features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)

Related

Construct a 2D, 3x3 matrix with random numbers from 1 to 8 with no duplicates

import numpy as np
random_matrix = np.random.randint(0,10,size=(3,3))
print(random_matrix)

If you want an answer where we don't have to rely on numpy then you can do this:
import random
# Generates a randomized list between 0-9, where 0 is replaced by "#"
x = ["#" if i == 0 else i for i in random.sample(range(10), k=9)]
print(x)
# Slices the list into a 3x3 format
newx = [x[idx:idx+3] for idx in range(0, len(x), 3)]
print(newx)
Output:
[6, 2, 7, 4, '#', 8, 9, 1, 3]
[[6, 2, 7], [4, '#', 8], [9, 1, 3]]

import numpy
x = numpy.arange(0, 9)
numpy.random.shuffle(x)
x = numpy.reshape(x, (3,3))
print(numpy.where(x==0, '#', x))
Note that with my solution the integers are converted to strings. I don't know if that matters to you; if it does, let me know and I will look for another solution.

You can achieve your goal using a few steps:
Generate the sequence of values (in some range) from which you would like to fill the matrix.
Randomly sample the required number of elements from this sequence into a new sequence.
Reshape this new sequence into a matrix of the desired shape.
import numpy as np
from random import sample
#step one
values = range(0,11)
#step two
random_sequence = sample(values, 9)
#step three
random_matrix = np.array(random_sequence).reshape(3,3)
Because you sample without replacement from a sequence of unique values, the new sequence, and therefore the matrix, is guaranteed to contain no duplicates.
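The same three steps can also be sketched with NumPy alone. This assumes the matrix should hold each of the values 0 to 8 exactly once, with 0 as the placeholder that other answers replace with "#"; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for repeatability

# permutation(9) returns the values 0..8 in random order, each exactly once,
# so reshaping it gives a 3x3 matrix without duplicates.
matrix = rng.permutation(9).reshape(3, 3)
```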

You can use np.random.choice with replace=False to generate the (3, 3) array:
x = np.random.choice(np.arange(9), size=(3, 3), replace=False)
Replacing 0 with np.nan:
>>> np.where(x, x, np.nan)
array([[ 4., 1., 3.],
[ 5., nan, 8.],
[ 2., 6., 7.]])
However, I think Hampus Larsson's answer is better, as this problem is not appropriate for numpy if you intend to replace 0 with the string "#".

you could use numpy but random is enough
import random
numbers = list(range(9))
random.shuffle(numbers)
my_list = [[numbers[i*3 + j] for j in range(0,3)] for i in range(0,3)]

Python Optimization: Using vector technique to find power of each matrix in an numpy array

3D numpy array A contains a series (in this example, I am choosing 3) of 2D numpy array D of shape 2 x 2. The D matrix is as follows:
D = np.array([[1,2],[3,4]])
A is initialized and assigned as below:
idx = np.arange(3)
A = np.zeros((3,2,2))
A[idx,:,:] = D # This gives A = [[[1,2],[3,4]],[[1,2],[3,4]],\
# [[1,2],[3,4]]]
# In mathematical notation: A = {D, D, D}
Now, essentially what I require after the execution of the codes is:
Mathematically, A = {D^0, D^1, D^2} = {D0, D1, D2}
where D0 = [[1,0],[0,1]], D1 = [[1,2],[3,4]], D2=[[7,10],[15,22]]
Is it possible to apply power to each matrix element in A without using a for-loop? I would be doing larger matrices with more in the series.
I had defined, n = np.array([0,1,2]) # corresponding to powers 0, 1 and 2 and tried
Result = np.power(A,n) but I do not get the desired output.
Is there are an efficient way to do it?
Full code:
D = np.array([[1,2],[3,4]])
idx = np.arange(3)
A = np.zeros((3,2,2))
A[idx,:,:] = D # This gives A = [[[1,2],[3,4]],[[1,2],[3,4]],\
# [[1,2],[3,4]]]
# In mathematical notation: A = {D, D, D}
n = np.array([0,1,2])
Result = np.power(A,n) # ------> Not the desired output.

A cumulative product exists in numpy, but not for matrices. Therefore, you need to make your own 'matcumprod' function. You can use np.dot for this, but np.matmul (or the @ operator) is specialized for matrix multiplication.
Since you state your powers always go from 0 to some_power, I suggest the following function:
def matcumprod(D, upto):
    # Res[i] holds D raised to the power i, for i = 0 .. upto-1
    Res = np.empty((upto, *D.shape), dtype=D.dtype)
    Res[0, :, :] = np.eye(D.shape[0])
    for i in range(1,upto):
        Res[i, :, :] = Res[i-1,:,:] @ D
    return Res
By the way, a loop often times outperforms a built-in numpy function if the latter uses a lot of memory, so don't fret over it if your powers stay within bounds...

Alright, I spent a lot of time on this problem but could not find a vectorized solution of the kind you'd like. So I would instead first propose a basic solution, and then an optimization for the case where you need consecutive powers.
The function you're looking for is called numpy.linalg.matrix_power
import numpy as np
D = np.matrix([[1,2],[3,4]])
idx = np.arange(3)
A = np.zeros((3,2,2))
A[idx,:,:] = D # This gives A = [[[1,2],[3,4]],[[1,2],[3,4]],\
# [[1,2],[3,4]]]
# In mathematical notation: A = {D, D, D}
np.zeros(A.shape)
n = np.array([0,1,2])
result = [np.linalg.matrix_power(D, i) for i in n]
np.array(result)
#Output:
array([[[ 1, 0],
[ 0, 1]],
[[ 1, 2],
[ 3, 4]],
[[ 7, 10],
[15, 22]]])
However, if you notice, you end up calculating multiple powers for the same base matrix. We could instead utilize the intermediate results and go from there, using numpy.linalg.multi_dot
def all_powers_arr_of_matrix(A):
    result = np.zeros(A.shape)
    result[0] = np.linalg.matrix_power(A[0], 0)
    for i in range(1, A.shape[0]):
        result[i] = np.linalg.multi_dot([result[i - 1], A[i]])
    return result
result = all_powers_arr_of_matrix(A)
#Output:
array([[[ 1., 0.],
[ 0., 1.]],
[[ 1., 2.],
[ 3., 4.]],
[[ 7., 10.],
[15., 22.]]])
Also, we can avoid creating the matrix A entirely, saving some time.
def all_powers_matrix(D, *rangeargs): #end exclusive
    ''' Expects 2D matrix.
    Use as all_powers_matrix(D, end) or
    all_powers_matrix(D, start, end)
    '''
    if len(rangeargs) == 1:
        start = 0
        end = rangeargs[0]
    elif len(rangeargs) == 2:
        start = rangeargs[0]
        end = rangeargs[1]
    else:
        print("incorrect args")
        return None
    result = np.zeros((end - start, *D.shape))
    # result[0] is D**start; each later entry multiplies the previous one by D
    result[0] = np.linalg.matrix_power(D, start)
    for i in range(1, end - start):
        result[i] = np.linalg.multi_dot([result[i - 1], D])
    return result
result = all_powers_matrix(D, 3)
#Output:
array([[[ 1., 0.],
[ 0., 1.]],
[[ 1., 2.],
[ 3., 4.]],
[[ 7., 10.],
[15., 22.]]])
Note that you'd need to add error handling if you decide to use these functions as-is.

To calculate power of matrix D, one way could be to find the eigenvalues and right eigenvectors of it with np.linalg.eig and then raise the power of the diagonal matrix as it is easier, then after some manipulation, you can use two np.einsum to calculate A
#get eigvalues and eigvectors
eigval, eigvect = np.linalg.eig(D)
# to check how it works, you can do:
print (np.dot(eigvect*eigval,np.linalg.inv(eigvect)))
#[[1. 2.]
# [3. 4.]]
# so you get back on D
#use power as ufunc of outer with n on the eigenvalues to get all the one you want
arrp = np.power.outer( eigval, n).T
#apply_along_axis to create the diagonal matrix along the last axis
diagp = np.apply_along_axis( np.diag, axis=-1, arr=arrp)
#finally use two np.einsum to calculate with the subscript to get what you want
A = np.einsum('lij,jk -> lik',
np.einsum('ij,kjl -> kil',eigvect,diagp), np.linalg.inv(eigvect)).round()
print (A)
print (A.shape)
#[[[ 1. 0.]
# [-0. 1.]]
#
# [[ 1. 2.]
# [ 3. 4.]]
#
# [[ 7. 10.]
# [15. 22.]]]
#
#(3, 2, 2)
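The eigendecomposition approach can also be written more compactly with broadcasting and a single einsum. This is a sketch under the assumptions that D is diagonalizable with real eigenvalues (true for the example matrix) and that small floating-point errors are acceptable; the names `powers` and `inv_vect` are illustrative.

```python
import numpy as np

D = np.array([[1.0, 2.0], [3.0, 4.0]])
n = np.array([0, 1, 2])                      # requested powers

eigval, eigvect = np.linalg.eig(D)
inv_vect = np.linalg.inv(eigvect)

# powers[k, j] = eigval[j] ** n[k], shape (len(n), 2)
powers = eigval[None, :] ** n[:, None]

# result[k] = eigvect @ diag(powers[k]) @ inv_vect, for all k at once
result = np.einsum('ij,kj,jl->kil', eigvect, powers, inv_vect)
```

The result matches `np.linalg.matrix_power(D, k)` for each `k` only up to rounding, which is why the answer above calls `.round()` on its output.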

I don't have a full solution, but there are some things I wanted to mention which are a bit too long for the comments.
You might first look into addition chain exponentiation if you are computing big powers of big matrices. This is basically asking how many matrix multiplications are required to compute A^k for a given k. For instance A^5 = A(A^2)^2, so you need only three matrix multiplies: A^2, (A^2)^2 and A(A^2)^2. This might be the simplest way to gain some efficiency, but you will probably still have to use explicit loops.
Your question is also related to the problem of computing Ax, A^2x, ... , A^kx for a given A and x. This is an active area of research right now (search "matrix powers kernel"), since computing such a sequence efficiently is useful for parallel/communication avoiding Krylov subspace methods. If you're looking for a very efficient solution to your problem it might be worth looking into some of the results about this.
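The repeated-squaring idea behind the A^5 example can be sketched as follows. Binary exponentiation is one simple addition chain, not necessarily the shortest one for every k, and the function name `matrix_power_by_squaring` is an assumption for illustration.

```python
import numpy as np

def matrix_power_by_squaring(A, k):
    """Compute A**k for a square matrix A and an integer k >= 0."""
    assert k >= 0
    result = np.eye(A.shape[0], dtype=A.dtype)   # A**0
    base = A.copy()
    while k > 0:
        if k & 1:                 # this bit of k is set: multiply it in
            result = result @ base
        k >>= 1
        if k:                     # square only if more bits remain
            base = base @ base
    return result

D = np.array([[1, 2], [3, 4]])
D5 = matrix_power_by_squaring(D, 5)
```

`np.linalg.matrix_power` can be used to cross-check the result for a single power.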

sum a 3x3 array on a given point to another matrix maintaining boundaries

suppose I have this 2d array A:
[[0,0,0,0],
[0,0,0,0],
[0,0,0,0],
[0,0,0,4]]
and I want to sum B:
[[1,2,3]
[4,5,6]
[7,8,9]]
centered on A[0][0] so the result would be:
array_sum(A,B,0,0) =
[[5,6,0,4],
[8,9,0,0],
[0,0,0,0],
[2,0,0,5]]
I was thinking that I should make a function that compares if its on a boundary and then adjust the index for that:
def array_sum(A,B,i,j):
    ...
    if i == 0 and j == 0:
        A[-1][-1] = A[-1][-1]+B[0][0]
        ...
    else:
        A[i-1][j-1] = A[i][j]+B[0][0]
        A[i][j] = A[i][j]+B[1][1]
        A[i+1][j+1] = A[i][j]+B[2][2]
        ...
but I don't know if there is a better way of doing that, I was reading about broadcasting or maybe using convolute for that, but I'm not sure if there is a better way to do that.

Assuming B.shape is all odd numbers, you can use np.indices, manipulate them to point where you want, and use np.add.at
def array_sum(A, B, loc = (0, 0)):
    A_ = A.copy()
    ix = np.indices(B.shape)
    new_loc = np.array(loc) - np.array(B.shape) // 2
    new_ix = np.mod(ix + new_loc[:, None, None],
                    np.array(A.shape)[:, None, None])
    np.add.at(A_, tuple(new_ix), B)
    return A_
Testing:
array_sum(A, B)
Out:
array([[ 5., 6., 0., 4.],
[ 8., 9., 0., 7.],
[ 0., 0., 0., 0.],
[ 2., 3., 0., 5.]])
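An alternative sketch of the same wrap-around sum using np.roll: B is placed in the top-left corner of a zero array shaped like A, and that array is then rolled so that the centre of B lands on the target location. The function name `array_sum_roll` is an assumption, and the sketch assumes 2D inputs with B no larger than A in either dimension.

```python
import numpy as np

def array_sum_roll(A, B, loc=(0, 0)):
    """Add B to a copy of A, centred on loc, wrapping around the edges."""
    assert B.shape[0] <= A.shape[0] and B.shape[1] <= A.shape[1]
    padded = np.zeros_like(A)
    padded[:B.shape[0], :B.shape[1]] = B
    # Shift so that B's centre element moves from (h//2, w//2) to loc.
    shift = (loc[0] - B.shape[0] // 2, loc[1] - B.shape[1] // 2)
    return A + np.roll(padded, shift, axis=(0, 1))

A = np.zeros((4, 4), dtype=int)
A[3, 3] = 4
B = np.arange(1, 10).reshape(3, 3)
result = array_sum_roll(A, B)
```

This allocates a temporary the size of A, so it is convenient for small arrays but wasteful when A is much larger than B.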

As a rule of thumb slice indexing is faster (~2x) than fancy indexing. This appears to be true even for the small example in OP. Downside: the code is slightly more complicated.
import numpy as np
from numpy import s_ as _
from itertools import product, starmap
def wrapsl1d(N, n, c):
    # check in 1D whether a patch of size n centered at c in a vector
    # of length N fits or has to be wrapped around
    # return appropriate slice objects for both vector and patch
    assert n <= N
    l = (c - n//2) % N
    h = l + n
    # return list of pairs (index into A, index into patch)
    # 2 pairs if we wrap around, otherwise 1 pair
    return [_[l:h, :]] if h <= N else [_[l:, :N-l], _[:h-N, n+N-h:]]
def use_slices(A, patch, center=(0, 0)):
    slAptch = product(*map(wrapsl1d, A.shape, patch.shape, center))
    # the product now has elements [(idx0A, idx0ptch), (idx1A, idx1ptch)]
    # transpose them:
    slAptch = starmap(zip, slAptch)
    out = A.copy()
    for sa, sp in slAptch:
        out[sa] += patch[sp]
    return out

Is it possible to speed up an element-by-element array operation in Python?

I have a list of times (called times in my code, produced by the code suggested to me in the thread astropy.io fits efficient element access of a large table) and I want to do some statistical tests for periodicity, using Zn^2 and epoch folding tests. Some steps in the code take quite a while to run, and I am wondering if there is a faster way to do it. I have tried the equivalent map and lambda functions, but that takes even longer. My list of times has several hundred or maybe thousands of elements, depending on the dataset. Here is my code:
phase=[(x-mintime)*testfreq[m]-int((x-mintime)*testfreq[m]) for x in times]
# the above step takes 3 seconds for the dataset I am using for testing
# testfreq[m] is just one of several hundred frequencies I am testing
# times is of type numpy.ndarray
phasebin=[int(ph*numbins)for ph in phase]
# 1 second (numbins is 20)
powerarray=[phasebin.count(n) for n in range(0,numbins-1)]
# 0.3 seconds
poweravg=np.mean(powerarray)
chisq[m]=sum([(pow-poweravg)**2/poweravg for pow in powerarray])
# the above 2 steps are very quick
for n in range(0,maxn): # maxn is 3
    cosparam=sum([(np.cos(2*np.pi*(n+1)*ph)) for ph in phase])
    sinparam=sum([(np.sin(2*np.pi*(n+1)*ph)) for ph in phase])
    # these steps each take 4 seconds
    z2[m,n]=sum(z2[m,])+(cosparam**2+sinparam**2)/count
    # this is quick (count is the number of times)
As this steps through several hundred frequencies on either side of frequencies identified through an FFT search, it takes a very long time to run. The same functionality in a lower level language runs much more quickly, but I need some of the Python modules for plotting, etc. I am hoping that Python can be persuaded to do some of the operations, particularly the phase, phasebin, powerarray, cosparam, and sinparam calculations, significantly faster, but I am not sure how to make this happen. Can anyone tell me how this can be done, or do I have to write and call functions in C or fortran? I know that this could be done in a few minutes e.g. in fortran, but this Python code takes hours as it is.
Thanks very much.

Instead of Python lists, you could use the numpy library, which is much faster for linear-algebra-type operations. For example, to add two arrays element-wise:
>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([2,3,4,5,6])
>>> a + b
array([ 3, 5, 7, 9, 11])
Similarly, you can multiply arrays by scalars which multiplies each element as you'd expect
>>> 2 * a
array([ 2, 4, 6, 8, 10])
As far as speed, here is the Python list equivalent of adding two lists
>>> c = [1,2,3,4,5]
>>> d = [2,3,4,5,6]
>>> [i+j for i,j in zip(c,d)]
[3, 5, 7, 9, 11]
Then timing the two
>>> from timeit import timeit
>>> setup = '''
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([2,3,4,5,6])'''
>>> timeit('a+b', setup)
0.521275608325351
>>> setup = '''
c = [1,2,3,4,5]
d = [2,3,4,5,6]'''
>>> timeit('[i+j for i,j in zip(c,d)]', setup)
1.2781205834379108
In this small example numpy was more than twice as fast.

for loop substitute - operating on complete arrays
First multiply phase by 2*pi*n using broadcasting
phase = np.arange(10)
maxn = 3
ens = np.arange(1, maxn+1) # array([1, 2, 3])
two_pi_ens = 2*np.pi*ens
b = phase * two_pi_ens[:, np.newaxis]
b.shape is (3, 10): one row for each value in range(1, maxn+1)
Take the cosine then sum to get the three cosine parameters
c = np.cos(b)
c_param = c.sum(axis = 1) # c_param.shape is (3,)
Take the sine then sum to get the three sine parameters
s = np.sin(b)
s_param = s.sum(axis = 1) # s_param.shape is (3,)
Sum of the squares divided by count
d = (np.square(c_param) + np.square(s_param)) / count
# d.shape is (3,)
Assign to z2
for n in range(maxn):
    z2[m,n] = z2[m,:].sum() + d[n]
That loop is doing a cumulative sum. numpy ndarrays have a cumsum method.
If maxn is small (3 in your case) it may not be noticeably faster.
z2[m,:] += d
z2[m,:].cumsum(out = z2[m,:])
To illustrate:
>>> a = np.ones((3,3))
>>> a
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
>>> m = 1
>>> d = (1,2,3)
>>> a[m,:] += d
>>> a
array([[ 1., 1., 1.],
[ 2., 3., 4.],
[ 1., 1., 1.]])
>>> a[m,:].cumsum(out = a[m,:])
array([ 2., 5., 9.])
>>> a
array([[ 1., 1., 1.],
[ 2., 5., 9.],
[ 1., 1., 1.]])
>>>
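Putting the pieces together, the whole per-frequency computation from the question can be sketched without Python-level loops. This makes several assumptions: the function name `fold_statistics` is hypothetical, all `numbins` bins are counted (the question's `range(0,numbins-1)` skips the last one), and the Z_n^2 values are taken to be the cumulative sum over harmonics, as interpreted in the answer above.

```python
import numpy as np

def fold_statistics(times, freq, numbins=20, maxn=3):
    """Return (chi-square of the folded profile, Z_n^2 for n = 1..maxn)."""
    t = np.asarray(times, dtype=float)
    x = (t - t.min()) * freq
    phase = x - np.floor(x)                       # fractional part, in [0, 1)
    phasebin = (phase * numbins).astype(int)
    counts = np.bincount(phasebin, minlength=numbins)
    mean = counts.mean()
    chisq = np.sum((counts - mean) ** 2 / mean)

    harmonics = np.arange(1, maxn + 1)[:, None]   # shape (maxn, 1) for broadcasting
    arg = 2 * np.pi * harmonics * phase           # shape (maxn, len(times))
    power = (np.cos(arg).sum(axis=1) ** 2 + np.sin(arg).sum(axis=1) ** 2) / t.size
    return chisq, np.cumsum(power)

times = np.array([0.0, 0.13, 0.27, 0.41, 0.58, 0.74, 0.92, 1.31, 1.77, 2.05])
chisq, z2 = fold_statistics(times, 1.0, numbins=4, maxn=3)
```

The loop over test frequencies stays in Python, but each iteration now runs a handful of array operations instead of several list comprehensions over every event time.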

Python Dynamic Array allocation, Matlab style

I'm trying to move a few Matlab libraries that I've built to the python environment. So far, the biggest issue I faced is the dynamic allocation of arrays based on index specification. For example, using Matlab, typing the following:
x = [1 2];
x(5) = 3;
would result in:
x = [ 1 2 0 0 3]
In other words, I didn't know before hand the size of (x), nor its content. The array must be defined on the fly, based on the indices that I'm providing.
In python, trying the following:
from numpy import *
x = array([1,2])
x[4] = 3
Would result in the following error: IndexError: index out of bounds. One workaround is growing the array in a loop and then assigning the desired value:
from numpy import *
x = array([1,2])
idx = 4
for i in range(size(x),idx+1):
    x = append(x,0)
x[idx] = 3
print(x)
It works, but it's not very convenient and it might become very cumbersome for n-dimensional arrays. I thought about subclassing ndarray to achieve my goal, but I'm not sure if it would work. Does anybody know of a better approach?
Thanks for the quick reply. I didn't know about the __setitem__ method (I'm fairly new to Python). I simply overrode it in a subclass of ndarray as follows:
import numpy as np
class marray(np.ndarray):
    def __setitem__(self, key, value):
        # Array properties
        nDim = np.ndim(self)
        dims = list(np.shape(self))
        # Requested Index
        if type(key)==int: key=key,
        nDim_rq = len(key)
        dims_rq = list(key)
        for i in range(nDim_rq): dims_rq[i]+=1
        # Provided indices match current array number of dimensions
        if nDim_rq==nDim:
            # Define new dimensions
            newdims = []
            for iDim in range(nDim):
                v = max([dims[iDim],dims_rq[iDim]])
                newdims.append(v)
            # Resize if necessary
            if newdims != dims:
                self.resize(newdims,refcheck=False)
        return super(marray, self).__setitem__(key, value)
And it works like a charm! However, I need to modify the above code so that __setitem__ allows changing the number of dimensions, following this request:
a = marray([0,0])
a[3,1,0] = 0
Unfortunately, when I try to use numpy functions such as
self = np.expand_dims(self,2)
the returned type is numpy.ndarray instead of __main__.marray. Any idea how I could make numpy functions return an marray when an marray is provided as input? I think it should be doable using __array_wrap__, but I could never find exactly how. Any help would be appreciated.

Took the liberty of updating my old answer from Dynamic list that automatically expands. Think this should do most of what you need/want
class matlab_list(list):
    def __init__(self):
        def zero():
            while 1:
                yield 0
        self._num_gen = zero()
    def __setitem__(self,index,value):
        if isinstance(index, int):
            self.expandfor(index)
            return super(matlab_list,self).__setitem__(index,value)
        elif isinstance(index, slice):
            if index.stop<index.start:
                return super(matlab_list,self).__setitem__(index,value)
            else:
                self.expandfor(index.stop if abs(index.stop)>abs(index.start) else index.start)
            return super(matlab_list,self).__setitem__(index,value)
    def expandfor(self,index):
        # pad with zeros until the requested index exists
        rng = []
        if abs(index)>len(self)-1:
            if index<0:
                rng = range(abs(index)-len(self))
                for i in rng:
                    self.insert(0,next(self._num_gen))
            else:
                rng = range(abs(index)-len(self)+1)
                for i in rng:
                    self.append(next(self._num_gen))
# Usage
spec_list = matlab_list()
spec_list[5] = 14

This isn't quite what you want, but...
x = np.array([1, 2])
try:
    x[index] = value
except IndexError:
    oldsize = len(x) # will be trickier for multidimensional arrays; you'll need to use x.shape or something and take advantage of numpy's advanced slicing ability
    x = np.resize(x, index+1) # Python uses C-style 0-based indices
    x[oldsize:index] = 0 # You could also do x[oldsize:] = 0, but that would mean you'd be assigning to the final position twice.
    x[index] = value
>>> x = np.array([1, 2])
>>> x = np.resize(x, 5)
>>> x[2:5] = 0
>>> x[4] = 3
>>> x
array([1, 2, 0, 0, 3])
Due to how numpy stores the data linearly under the hood (though whether it stores as row-major or column-major can be specified when creating arrays), multidimensional arrays are pretty tricky here.
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.resize(x, (6, 4))
array([[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6],
[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6]])
You'd need to do this or something similar:
>>> y = np.zeros((6, 4))
>>> y[:x.shape[0], :x.shape[1]] = x
>>> y
array([[ 1., 2., 3., 0.],
[ 4., 5., 6., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
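The zero-padding step above generalizes to any number of dimensions. The helper below is a sketch: the name `assign_with_growth` is hypothetical, it only handles non-negative integer indices with one index per existing dimension, and it returns a new array instead of resizing in place, so the caller has to rebind the result.

```python
import numpy as np

def assign_with_growth(x, index, value):
    """Set x[index] = value, zero-padding x first if index is out of bounds."""
    index = (index,) if np.isscalar(index) else tuple(index)
    assert len(index) == x.ndim
    new_shape = tuple(max(s, i + 1) for s, i in zip(x.shape, index))
    if new_shape != x.shape:
        grown = np.zeros(new_shape, dtype=x.dtype)
        # copy the old contents into the leading corner of the larger array
        grown[tuple(slice(0, s) for s in x.shape)] = x
        x = grown
    x[index] = value
    return x

x = assign_with_growth(np.array([1, 2]), 4, 3)                        # 1D case
y = assign_with_growth(np.array([[1, 2, 3], [4, 5, 6]]), (3, 3), 9)   # 2D case
```

Growing by one element at a time with this helper copies the array on every call, so it is best reserved for occasional out-of-bounds writes.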

A python dict will work well as a sparse array. The main issue is the syntax for initializing the sparse array will not be as pretty:
listarray = [100,200,300]
dictarray = {0:100, 1:200, 2:300}
but after that the syntax for inserting or retrieving elements is the same
dictarray[5] = 2345
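To mirror the Matlab example from the question with a dict, missing positions can be read back as zeros with `dict.get`. This is a sketch; the variable names are illustrative.

```python
sparse = {0: 1, 1: 2}     # plays the role of x = [1 2]
sparse[4] = 3             # like x(5) = 3 in Matlab (0-based here)

# Materialize a dense list, treating indices that were never set as 0.
dense = [sparse.get(i, 0) for i in range(max(sparse) + 1)]
```

Only the three assigned entries are stored; the zeros exist only in the dense list.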
