What is the fastest way to stack numpy arrays in a loop?


My code generates a numpy array (data_transform) on each iteration of a for loop. The first iteration produces an array of shape (40, 2) and the second one of shape (175, 2). I want to concatenate these two arrays into a single array of shape (215, 2). I tried np.concatenate and np.append, but I get an error saying the arrays must be the same size. Here is an example of what my code looks like:
result_arr = np.array([])
for label in labels_set:
    data = [index for index, value in enumerate(labels_list) if value == label]
    for i in data:
        sub_corpus.append(corpus[i])
    data_sub_tfidf = vec.fit_transform(sub_corpus)
    data_transform = pca.fit_transform(data_sub_tfidf)
    #Append array
    sub_corpus = []
I have also tried np.row_stack, but it only gives me an array of shape (175, 2), which is just the second array I want to concatenate.

What @hpaulj meant by
Stick with list append when doing loops.
is:
# use a normal list
result_list = []
for label in labels_set:
    # ... build data_sub_tfidf as in the question ...
    data_transform = pca.fit_transform(data_sub_tfidf)
    # append the data_transform object to that list
    # Note: this is not np.append(), which is slow here
    result_list.append(data_transform)
# and join the arrays after the loop
# This avoids repeated memory allocation inside the loop:
# only one large chunk of memory is allocated, since
# the final size of the concatenated array is known.
result_arr = np.concatenate(result_list)
# or, equivalently for 2-D arrays
result_arr = np.vstack(result_list)
# Note: np.stack(result_list, axis=0) does not work here, because
# np.stack requires all arrays to have exactly the same shape.
Your arrays don't really have incompatible shapes. They differ along only one axis (40 rows versus 175 rows) and match along the other (2 columns). In that case you can always concatenate along the axis that differs.
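A minimal sketch of the list-then-concatenate pattern, assuming random arrays stand in for the output of pca.fit_transform. The shapes (40, 2) and (175, 2) come from the question; the seed and the names chunk_shapes, chunks, and stacked are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Stand-ins for the per-label PCA outputs; only the row counts differ.
chunk_shapes = [(40, 2), (175, 2)]

chunks = []
for shape in chunk_shapes:
    data_transform = rng.normal(size=shape)  # placeholder for pca.fit_transform(...)
    chunks.append(data_transform)            # cheap Python list append

# One allocation after the loop; rows are joined along axis 0.
stacked = np.concatenate(chunks, axis=0)
print(stacked.shape)  # (215, 2)
```

The list only holds references to the per-iteration arrays, so nothing is copied until the single np.concatenate call at the end.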

Using np.append with an empty initial array c (note that this copies c on every iteration):
a = np.array([[8,3,1],[2,5,1],[6,5,2]])
b = np.array([[2,5,1],[2,5,2]])
matrix = [a,b]
c = np.empty([0,matrix[0].shape[1]])
for v in matrix:
    c = np.append(c, v, axis=0)
print(c)
Output:
[[8. 3. 1.]
 [2. 5. 1.]
 [6. 5. 2.]
 [2. 5. 1.]
 [2. 5. 2.]]

If you have an array a of shape (40, 2) and an array b of shape (175, 2), you can get a final array of shape (215, 2) simply with np.concatenate([a, b]).

Related

Masking arrays using numpy

I have an array and I want to mask it while keeping its shape, i.e. without deleting the masked elements.
For example, in this code:
input = torch.randn(2, 5)
mask = input > 0
input = input[mask]
input = input *1000000000000
print(input)
Printing input gives the result of the multiplication on the unmasked elements only, as a 1D array with the masked elements removed.

You're overwriting your original array when you do input = input[mask]. If you omit that step, you can modify the selected values in place and keep the remaining values as they are:
i = np.random.randn(2, 5)
print(i)
[[ 0.48857855 0.97799014 2.29587523 -2.37257331 1.28193921]
 [ 0.62932172 1.37433223 -1.2427145 0.31424802 1.34534568]]
mask = i > 0
i[mask] *= 1000000000000
print(i)
[[ 4.88578545e+11 9.77990142e+11 2.29587523e+12 -2.37257331e+00 1.28193921e+12]
 [ 6.29321720e+11 1.37433223e+12 -1.24271450e+00 3.14248021e+11 1.34534568e+12]]
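If the original array should stay untouched, np.where is one way to build a new array of the same shape instead of modifying in place. A small sketch, assuming NumPy arrays rather than the torch tensor used in the question (the names arr and scaled, and the factor 1000, are illustrative):

```python
import numpy as np

arr = np.array([[0.5, -1.0, 2.0],
                [-3.0, 4.0, -0.25]])
mask = arr > 0

# Scale only the positive entries; everything else is copied unchanged.
scaled = np.where(mask, arr * 1000, arr)

print(scaled.shape)  # (2, 3)
print(scaled)
```

Unlike arr[mask], which always returns a flattened 1-D selection, np.where keeps the original layout because it picks element by element between its two inputs.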

Performing a random addition between two arrays

What I'm trying to do is get two different arrays, where the first array is just filled with zeros and second array would be populated by random numbers. I would like to perform an operation where only certain elements from the latter array are added to the array filled with zeros and the rest of elements within the former array remain as zero. I'm trying to get the addition done in a random way as well. I just added the code below as an example. I honestly don't know how to perform something like this and I would be very grateful for any help or suggestions! Thank you!
shape = (6, 3)
empty_array = np.zeros(shape)
random_array = 0.1 * np.random.randn(*shape)
sum = np.add(empty_array, random_array)

You can use a binary mask with the density P:
P = 0.5
# Repeat the next two lines as needed
mask = np.random.binomial(1, P, size=empty_array.size)\
    .reshape(shape).astype(bool)
empty_array[mask] += random_array[mask]
If you plan to add more random elements, you may want to re-generate the mask at each further iteration.
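The same masking idea can be sketched with NumPy's Generator API, assuming a uniform draw compared against P is an acceptable substitute for the binomial call (rng, target, and the seed are illustrative names and values, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(42)
shape = (6, 3)
P = 0.5  # probability that any given element receives a value

target = np.zeros(shape)
random_array = 0.1 * rng.standard_normal(shape)

# Boolean mask: True with probability P, independently per element.
mask = rng.random(shape) < P
target[mask] += random_array[mask]

# Elements outside the mask are still exactly zero.
print(np.count_nonzero(target[~mask]))  # 0
```

Because the mask is drawn independently per element, the number of filled cells varies from run to run around P times the array size.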

If I understand your comments correctly, you want to place random numbers at random indices, covering a given fraction of the whole array. You do not need to generate a full random array and then use only part of it, since random number generation gets costly at larger scales:
sz = shape[0]*shape[1]
#this is your for example 20% threshold
threshold = 0.2
#create random numbers and random indices
random_array = np.random.rand(int(threshold*sz))
random_idx = np.random.randint(0,sz,int(threshold*sz))
#now you can add this random_array to random indices of your desired array
empty_array.reshape(-1)[random_idx] += random_array
Another solution:
sz = shape[0]*shape[1]
#this is your for example 20% threshold
threshold = 0.2
random_array = np.random.rand(int(threshold*sz))
#pad with enough zeros and randomly shuffle and finally reshape it
random_array.resize(sz)
np.random.shuffle(random_array)
#now you can add this random_array to any array of your choice
empty_array += random_array.reshape(shape)
Sample output:
[[0.         0.         0.        ]
 [0.         0.         0.        ]
 [0.         0.         0.7397274 ]
 [0.         0.         0.        ]
 [0.         0.         0.79541551]
 [0.75684113 0.         0.        ]]
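One caveat with the first variant: np.random.randint can return the same index more than once, and a fancy-indexed += applies only one update per repeated index, so fewer cells than intended may end up filled. A hedged sketch that draws distinct indices instead, assuming the Generator API is available (rng, n_fill, target, and the value range are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (6, 3)
threshold = 0.2  # fraction of elements to fill

target = np.zeros(shape)
n_fill = int(threshold * target.size)

# Draw n_fill distinct flat indices, so every index is hit exactly once.
flat_idx = rng.choice(target.size, size=n_fill, replace=False)

# Values in [0.5, 1.0) so that every filled cell is visibly nonzero.
values = rng.uniform(0.5, 1.0, size=n_fill)

# reshape(-1) returns a view for a contiguous array, so this updates target.
target.reshape(-1)[flat_idx] += values

print(np.count_nonzero(target))  # 3
```

If repeated indices are actually wanted and should accumulate, np.add.at(target.reshape(-1), idx, values) is the usual alternative to +=.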

How to use numpy empty_like

I've got the following array:
rxn_probability = [1. 0. 0.]
And I want to create another array, num_rxn, that has the same shape and size of rxn_probability, which contains a number of the reactions, so in this case num_rxn would be: [1, 2, 3]. Starting at 1 and increasing until it reaches the same size and shape of rxn_probability, so that if I change the size of rxn_probability the size and shape of num_rxn will automatically be changed.
So far I've tried:
num_rxn = np.array(range(len(rxn_probability + 1)))
(also tried using arange in a similar way)
But this outputs:
[0 1 2]
which isn't what I want because it doesn't start at 1 or end at 3.
I've been reading about numpy.empty_like but I'm not sure if that would be the best or right solution. Any ideas?
Cheers

You can use np.arange and then reshape to fit the shape of rxn_probability:
num_rxn = np.arange(1, rxn_probability.size + 1).reshape(rxn_probability.shape)
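A short runnable check of this approach, assuming rxn_probability is a NumPy array (the 2-D array grid is a hypothetical extra case added for illustration). np.empty_like would only allocate an uninitialized array of the same shape, so it would still have to be filled afterwards:

```python
import numpy as np

rxn_probability = np.array([1.0, 0.0, 0.0])

# Count from 1 up to the number of elements, then match the original shape.
num_rxn = np.arange(1, rxn_probability.size + 1).reshape(rxn_probability.shape)
print(num_rxn)  # [1 2 3]

# The same expression adapts to a 2-D input.
grid = np.zeros((2, 3))
labels = np.arange(1, grid.size + 1).reshape(grid.shape)
print(labels.shape)  # (2, 3)
```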

Numpy appending two-dimensional arrays together

I am trying to create a function which exponentiates a 2-D matrix and keeps the result in a 3D array, where the first dimension is indexing the exponent. This is important because the rows of the matrix I am exponentiating represent information about different vertices on a graph. So for example if we have A, A^2, A^3, each is shape (50,50) and I want a matrix D = (3,50,50) so that I can go D[:,1,:] to retrieve all the information about node 1 and be able to do matrix multiplication with that. My code is currently as
def expo(times,A,n):
    temp = A;
    result = csr_matrix.toarray(temp)
    for i in range(0,times):
        temp = np.dot(temp,A)
        if i == 0:
            result = np.array([result,csr_matrix.toarray(temp)]) # this creates a (2,50,50) array
        if i > 0:
            result = np.append(result,csr_matrix.toarray(temp),axis=0) # this does not work
    return result
However, this is not working because in the "i>0" case the temp array has shape (50,50) and cannot be appended. I am not sure how to make this work and I am rather confused by dimensionality in Numpy, e.g. why things are (50,1) sometimes and just (50,) other times. Would anyone be able to help me make this code work and explain generally how these things should be done in Numpy?

Documentation reference
If you want to stack matrices in numpy, you can use the stack function.
If you also want the index to correspond to the exponent, you might want to add an identity matrix (A to the power 0) to the beginning of your output:
MWE
import numpy as np
def expo(A, n):
    result = [np.eye(len(A)), A]
    for _ in range(n - 1):
        result.append(result[-1].dot(A))
    # If you do not really need the 3D array,
    # you could also just return the list
    return np.stack(result, axis=0)
result = expo(np.array([[1, -2], [-2, 1]]), 3)
print(result)
# [[[  1.   0.]
#   [  0.   1.]]
#
#  [[  1.  -2.]
#   [ -2.   1.]]
#
#  [[  5.  -4.]
#   [ -4.   5.]]
#
#  [[ 13. -14.]
#   [-14.  13.]]]
print(result[1])
# [[ 1. -2.]
#  [-2.  1.]]
Comments
As you can see, we first simply create the list of matrices, and then convert them to an array at the end. I am not sure if you really need the 3D array though, as you could also just index the list that was created, but that depends on your use case, if that is convenient or not.
I guess the axis keyword argument of a lot of numpy functions can be confusing at first, but the documentation usually has good examples that, combined with some trial and error, should get you pretty far. For example, for numpy.stack, the very first example is indeed exactly what you want to do.
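On the (50,) versus (50,1) part of the question, a small sketch may help. It uses a hypothetical 2x2 matrix A in place of the 50x50 one, and also shows why the np.append call in the question fails unless the new matrix first gains a leading axis:

```python
import numpy as np

A = np.arange(4).reshape(2, 2)

row = A[0]        # shape (2,): one axis, a plain 1-D vector
col = A[:, [0]]   # shape (2, 1): two axes, a single-column matrix
print(row.shape, col.shape)  # (2,) (2, 1)

# np.append along axis 0 needs matching ndim, so a (2, 2) block must
# first gain a leading axis before it can join a (k, 2, 2) array.
stacked = np.stack([A, A @ A])                        # shape (2, 2, 2)
A3 = A @ A @ A
stacked = np.append(stacked, A3[np.newaxis], axis=0)  # shape (3, 2, 2)
print(stacked.shape)  # (3, 2, 2)
```

Indexing with an integer drops an axis, while indexing with a list or slice keeps it, which is usually where the difference between (n,) and (n, 1) comes from.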

Error with Padlen in signal.filtfilt in Python

I am working with the "scipy.signal" library in Python and I have the following code:
from scipy import signal
b = [0.001016, 0.00507999, 0.01015998, 0.01015998, 0.00507999, 0.001016]
a = [1., -3.0820186, 4.04351697, -2.76126457, 0.97291013, -0.14063199]
data = [[ 1.]
[ 1.]
[ 1.]
...]
# length = 264
y = signal.filtfilt(b, a, data)
But when I execute the code I get the following error message:
The length of the input vector x must be at least padlen, which is 18.
What could I do?

It appears that data is a two-dimensional array with shape (264, 1). By default, filtfilt filters along the last axis of the input array, so in your case it is trying to filter along an axis where the length of the data is 1, which is not long enough for the default padding method.
I assume you meant to interpret data as a one-dimensional array. You can add the argument axis=0
y = signal.filtfilt(b, a, data, axis=0)
to filter along the first dimension (i.e. down the column), in which case the output y will also have shape (264, 1). Alternatively, you can convert the input to a one-dimensional array by flattening it with np.ravel(data) or by using indexing to select the first (and only) column, data[:, 0]. (The latter will only work if data is, in fact, a numpy array and not a list of lists.) E.g.
y = signal.filtfilt(b, a, np.ravel(data))
In that case, the output y will also be a one-dimensional array, with shape (264,).
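Both fixes can be tried on synthetic data. This is a sketch under assumptions: the coefficients come from signal.butter(5, 0.2), a stand-in filter that also has six coefficients per polynomial, and data is a column of ones like the one shown in the question:

```python
import numpy as np
from scipy import signal

# Assumed filter: 5th-order Butterworth, 6 coefficients in b and in a.
b, a = signal.butter(5, 0.2)

data = np.ones((264, 1))  # column vector, as in the question

# Default axis=-1 filters along the length-1 axis and fails.
try:
    signal.filtfilt(b, a, data)
    raised = False
except ValueError:
    raised = True

y_col = signal.filtfilt(b, a, data, axis=0)     # shape (264, 1)
y_flat = signal.filtfilt(b, a, np.ravel(data))  # shape (264,)
print(raised, y_col.shape, y_flat.shape)
```

The default padlen is 3 * max(len(a), len(b)), which is where the 18 in the error message comes from for six coefficients.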

Assuming you have a two-dimensional array with shape (264, 2), you can also use np.hsplit() to split data into two separate arrays like so:
import numpy as np
arr1, arr2 = np.hsplit(data, 2)
You can view the shape of each individual array, for example:
print(arr1.shape)
Each piece still has shape (264, 1), so you need axis=0 here as well. Your code will then look something like this:
y1 = signal.filtfilt(b, a, arr1, axis=0)
y2 = signal.filtfilt(b, a, arr2, axis=0)
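Splitting is optional: filtfilt with axis=0 filters every column independently in a single call. Since each piece returned by np.hsplit still has shape (264, 1), the split version needs axis=0 too. A sketch comparing the two, assuming random two-column data and a stand-in Butterworth filter (neither comes from the original post):

```python
import numpy as np
from scipy import signal

b, a = signal.butter(5, 0.2)          # assumed example filter
rng = np.random.default_rng(1)
data = rng.standard_normal((264, 2))  # two signals stored as columns

# Filter both columns in one call.
y_all = signal.filtfilt(b, a, data, axis=0)

# Equivalent split-then-filter version.
arr1, arr2 = np.hsplit(data, 2)           # each has shape (264, 1)
y1 = signal.filtfilt(b, a, arr1, axis=0)
y2 = signal.filtfilt(b, a, arr2, axis=0)

print(np.allclose(y_all, np.hstack([y1, y2])))
```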
