How to merge several arrays stored in a list - Python

I want to concatenate several arrays stored in a list. The lengths of the arrays differ. I already read this solution, but unfortunately I could not solve my problem. This is a simplified version of the input data:
arr_all = [array([[1, 2, 10],
                  [5, 8, 3]]),
           array([[1, 0, 5]]),
           array([[0, 1, 8]]),
           array([[9, 13, 0]]),
           array([[2, 10, 2],
                  [1.1, 3, 3]]),
           array([[25, 0, 0]])]
n_data_sets=2
n_repetition=3
Now I want to merge (concatenate) the first array of arr_all (arr_all[0]) with the fourth one (arr_all[3]), the second one (arr_all[1]) with the fifth one (arr_all[4]), and the third one (arr_all[2]) with the last one (arr_all[5]). In fact, here I have two data sets (n_data_sets=2) which are repeated three times (n_repetition=3). In reality I have several data sets that are repeated tens of times. I want to put each data set into a single array of my list. The input is ordered by repetition, but I want it ordered by the data sets within each repetition. My expected result is:
arr_all = [array([[1, 2, 10],
                  [5, 8, 3],
                  [9, 13, 0]]),
           array([[1, 0, 5],
                  [2, 10, 2],
                  [1.1, 3, 3]]),
           array([[0, 1, 8],
                  [25, 0, 0]])]
My input data was a list of six arrays (n_repetition times n_data_sets), but my result has n_repetition arrays.
Thanks in advance for any feedback.

To further Alexander's response, this is what I came up with:
import numpy as np

arr_all = [np.array([[1, 2, 10], [5, 8, 3]]),
           np.array([[1, 0, 5]]),
           np.array([[0, 1, 8]]),
           np.array([[9, 13, 0]]),
           np.array([[2, 10, 2], [1.1, 3, 3]]),
           np.array([[25, 0, 0]])]
n_data_sets = 2
n_repetition = 3

new_array = []
for i in range(n_repetition):
    dataset = arr_all[i]
    for j in range(n_data_sets - 1):
        # The other copies of data set i sit n_repetition entries further on
        dataset = np.concatenate([dataset, arr_all[i + n_repetition * (j + 1)]])
    new_array.append(dataset)
print(new_array)
I also found a cleaner method, though it is possibly worse in terms of time:
import numpy as np

arr_all = [np.array([[1, 2, 10], [5, 8, 3]]),
           np.array([[1, 0, 5]]),
           np.array([[0, 1, 8]]),
           np.array([[9, 13, 0]]),
           np.array([[2, 10, 2], [1.1, 3, 3]]),
           np.array([[25, 0, 0]])]
n_data_sets = 2
n_repetition = 3

# Arrange the list into an (n_repetition, n_data_sets) grid, column-major
# (order='F'), then concatenate each row. Newer NumPy versions may require
# building the ragged array explicitly first, e.g.
# np.array(arr_all, dtype=object).reshape(n_repetition, n_data_sets, order='F')
reshaped = np.reshape(arr_all, (n_repetition, n_data_sets), order='F')
new = []
for arr in reshaped:
    new.append(np.concatenate(arr))
print(new)
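For what it's worth, the same grouping can also be written as a one-liner with slice strides. This is a sketch under the same assumptions as the loops above: the arrays belonging to one data set sit n_repetition entries apart and all share the same number of columns.
import numpy as np

# arr_all[i::n_repetition] picks data set i from every repetition,
# e.g. [arr_all[0], arr_all[3]] for i=0 with n_repetition=3
merged = [np.concatenate(arr_all[i::n_repetition]) for i in range(n_repetition)]
print(merged)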

To always merge the first half with the second half (if this was your intention), you can do something like this (which will work if you have an even number of arrays):
import numpy as np

arr_all = [np.array([[1, 2, 10],
                     [5, 8, 3]]),
           np.array([[1, 0, 5]]),
           np.array([[0, 1, 8]]),
           np.array([[9, 13, 0]]),
           np.array([[2, 10, 2],
                     [1.1, 3, 3]]),
           np.array([[25, 0, 0]])]

half = len(arr_all) // 2
new = []
for i in range(half):
    new.append(np.concatenate((arr_all[i], arr_all[i + half]), axis=0))
print(new)
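If you prefer to avoid the index arithmetic, the same pairing can be expressed with zip; a small sketch under the same even-length assumption:
import numpy as np

half = len(arr_all) // 2
# zip pairs each array in the first half with its partner in the second half
new = [np.concatenate(pair, axis=0) for pair in zip(arr_all[:half], arr_all[half:])]
print(new)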

Related

Rearranging 2D numpy array by 2D row and column arrays

I have tried to find a similar question, but so far it seems only half of my question can be answered.
I have a 2D numpy array, e.g.:
a = np.array([[6, 4, 5],
              [4, 7, 8],
              [2, 8, 9]])
And I also have two further numpy arrays, indicating the rows and columns by which I would like to rearrange (or not):
rows = np.array([[0, 0, 0],
                 [1, 0, 1],
                 [2, 2, 2]])
cols = np.array([[0, 1, 2],
                 [0, 0, 2],
                 [0, 1, 2]])
Now I would like to rearrange the array a based on these indices, so that the result is:
result = np.array([[6, 4, 5],
                   [4, 6, 8],
                   [2, 8, 9]])
Doing this only for columns or only for rows is easy, e.g. see this thread:
np.array(list(map(lambda x, y: y[x], cols, a)))
This is a typical case of fancy/array indexing:
result = a[rows, cols]
Output:
array([[6, 4, 5],
       [4, 6, 8],
       [2, 8, 9]])
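To see why this works, note that the two index arrays are paired elementwise: each output element result[i, j] is a[rows[i, j], cols[i, j]]. A self-contained version of the example above:
import numpy as np

a = np.array([[6, 4, 5],
              [4, 7, 8],
              [2, 8, 9]])
rows = np.array([[0, 0, 0],
                 [1, 0, 1],
                 [2, 2, 2]])
cols = np.array([[0, 1, 2],
                 [0, 0, 2],
                 [0, 1, 2]])

# Fancy indexing: result[i, j] = a[rows[i, j], cols[i, j]]
result = a[rows, cols]
print(result)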

Numpy - Minimum memory usage when slicing images?

I have a memory usage problem in Python but haven't been able to find a satisfying solution yet.
The problem is quite simple:
I have a collection of images as numpy arrays of shape (n_samples, size_image). I need to slice each image in the same way and feed these slices to a classification algorithm all at once.
How do you take numpy array slices without duplicating data in memory?
Naively, as slices are simple "views" of the original data, I assume that there must be a way to do the slicing without copying data in memory.
The problem becomes critical when dealing with large datasets such as the MNIST handwritten digits dataset.
I have tried to find a solution using numpy.lib.stride_tricks.as_strided but struggle to get it to work on collections of images.
A similar toy problem would be to slice the scikit handwritten digits in a memory-friendly way:
from sklearn.datasets import load_digits
digits = load_digits()
X = digits.data
X has shape (1797, 64), i.e. each picture is an 8x8 array.
With a window size of 6x6 this gives (8-6+1)*(8-6+1) = 9 slices of size 36 per image, resulting in an array sliced_X of shape (16173, 36).
Now the question is: how do you get from X to sliced_X without using too much memory?
I would start off assuming that the input array is (M, n1, n2) (if it's not, we can always reshape it). Here's an implementation that gives a sliding windowed view into it, with an output array of shape (M, n1-b1+1, n2-b2+1, b1, b2) for a block size of (b1, b2) -
def strided_lastaxis(a, blocksize):
    d0, d1, d2 = a.shape
    s0, s1, s2 = a.strides
    strided = np.lib.stride_tricks.as_strided
    # One entry per valid window position, then the window itself;
    # the window axes reuse the strides of the original axes
    out_shp = (d0,) + tuple(np.array([d1, d2]) - blocksize + 1) + blocksize
    return strided(a, out_shp, (s0, s1, s2, s1, s2))
Being a view, it won't occupy any more memory space, so we are doing okay on memory. But keep in mind that we shouldn't reshape the view, as that would force a memory copy.
Here's a sample run with a manual check -
Setup input and get output:
In [72]: a = np.random.randint(0, 9, (2, 6, 6))
In [73]: out = strided_lastaxis(a, blocksize=(4, 4))
In [74]: np.may_share_memory(a, out)  # Verify this is a view
Out[74]: True
In [75]: a
Out[75]:
array([[[1, 7, 3, 5, 6, 3],
        [3, 2, 3, 0, 1, 5],
        [6, 3, 5, 5, 3, 5],
        [0, 7, 0, 8, 2, 4],
        [0, 3, 7, 3, 4, 4],
        [0, 1, 0, 8, 8, 1]],

       [[4, 1, 4, 5, 0, 8],
        [0, 6, 5, 6, 6, 7],
        [6, 3, 1, 8, 6, 0],
        [0, 1, 1, 7, 6, 8],
        [6, 3, 3, 1, 6, 1],
        [0, 0, 2, 4, 8, 3]]])
In [76]: out.shape
Out[76]: (2, 3, 3, 4, 4)
Output values:
In [77]: out[0, 0, 0]
Out[77]:
array([[1, 7, 3, 5],
       [3, 2, 3, 0],
       [6, 3, 5, 5],
       [0, 7, 0, 8]])
In [78]: out[0, 0, 1]
Out[78]:
array([[7, 3, 5, 6],
       [2, 3, 0, 1],
       [3, 5, 5, 3],
       [7, 0, 8, 2]])
In [79]: out[0, 0, 2]
Out[79]:
array([[3, 5, 6, 3],
       [3, 0, 1, 5],
       [5, 5, 3, 5],
       [0, 8, 2, 4]])  # ... and so on ...
In [80]: out[1, 2, 2]  # last block
Out[80]:
array([[1, 8, 6, 0],
       [1, 7, 6, 8],
       [3, 1, 6, 1],
       [2, 4, 8, 3]])
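Two side notes beyond the original answer: you can verify that flattening the overlapping windows does force a copy, and on NumPy 1.20+ the same view can be built with numpy.lib.stride_tricks.sliding_window_view instead of hand-computed strides. A small sketch:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.random.randint(0, 9, (2, 6, 6))

# Same (2, 3, 3, 4, 4) windowed view, without manual stride arithmetic
out = sliding_window_view(a, (4, 4), axis=(1, 2))
print(np.may_share_memory(a, out))  # True: still a view

# Overlapping windows cannot be flattened in place,
# so this reshape silently makes a copy
flat = out.reshape(-1, 16)
print(np.shares_memory(a, flat))    # False: the data was copied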

Implicit looping through numpy array to replace values

I'm new to Python and am trying to find a way to implicitly replace values in "array_to_replace" with one of the two rows of "values_to_use", based on the values in "array_of_positions":
First, the setup:
values_to_use = np.array([[0.5, 0.3, 0.4], [0.6, 0.7, 0.75]])
array_of_positions = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0, 1])
array_to_replace = np.array([[5, 5, 4], [6, 5, 4], [1, 2, 3], [9, 9, 9], [8, 8, 8], [7, 7, 7], [6, 5, 7], [5, 7, 9], [1, 3, 5], [3, 3, 3]])
Then, the brute force way to do what I want, which is to replace values in "array_to_replace" based on conditional values in "array_of_positions", is something like the following:
for pos in range(len(array_to_replace)):
    if array_of_positions[pos] == 0:
        array_to_replace[pos] = values_to_use[0]
    else:
        array_to_replace[pos] = values_to_use[1]
Would you have any recommendations on how to make this happen implicitly?
The answer for this turned out to be pretty simple. To get what I wanted, all I needed to do was the following:
print(values_to_use[array_of_positions])
This gave me what I needed.
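For completeness, here is the full replacement as a runnable snippet using the arrays from the question: indexing values_to_use with the integer array picks row 0 or row 1 for each position, producing the (10, 3) result in one step.
import numpy as np

values_to_use = np.array([[0.5, 0.3, 0.4], [0.6, 0.7, 0.75]])
array_of_positions = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0, 1])

# Each entry of array_of_positions selects a whole row of values_to_use
array_to_replace = values_to_use[array_of_positions]
print(array_to_replace.shape)  # (10, 3)
print(array_to_replace)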

Efficient way of making a list of pairs from an array in Numpy

I have a numpy array x (with shape (n, 4)) of integers like:
[[0, 1, 2, 3],
 [1, 2, 7, 9],
 [2, 1, 5, 2],
 ...]
I want to transform the array into an array of pairs:
[0,1]
[0,2]
[0,3]
[1,2]
...
so the first element makes a pair with the other elements in the same sub-array. I already have a for-loop solution:
y = np.array([[x[j, 0], x[j, i]] for i in range(1, 4) for j in range(0, n)], dtype=int)
but since looping over a numpy array is not efficient, I tried slicing as the solution. I can do the slicing for every column as:
y[1] = np.array([x[:, 0], x[:, 1]]).T
# [[0, 1], [1, 2], [2, 1], ...]
I can repeat this for all columns. My questions are:
How can I append y[2] to y[1], ... such that the shape is (N, 2)?
If the number of columns is not small (in this example 4), how can I find y[i] elegantly?
What are the alternative ways to achieve the final array?
The cleanest way of doing this I can think of would be:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,) + (1,)*n, axis=1)
>>> y
array([[ 0,  0,  0,  1,  2,  3],
       [ 4,  4,  4,  5,  6,  7],
       [ 8,  8,  8,  9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
This will make two copies of the data, so it will not be the most efficient method. That would probably be something like:
>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
Like Jaimie, I first tried a repeat of the 1st column followed by reshaping, but then decided it was simpler to make 2 intermediary arrays, and hstack them:
x = np.array([[0, 1, 2, 3], [1, 2, 7, 9], [2, 1, 5, 2]])
m, n = x.shape
x1 = x[:, 0].repeat(n - 1)[:, None]
x2 = x[:, 1:].reshape(-1, 1)
np.hstack([x1, x2])
producing
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])
There are probably other ways of doing this sort of rearrangement. The result will copy the original data one way or another. My guess is that as long as you are using compiled functions like reshape and repeat, the time differences won't be significant.
Suppose the numpy array is
arr = np.array([[0, 1, 2, 3],
                [1, 2, 7, 9],
                [2, 1, 5, 2]])
You can get the array of pairs as
import itertools

m, n = arr.shape
new_arr = np.array([x for i in range(m)
                    for x in itertools.product(arr[i, 0:1], arr[i, 1:n])])
The output would be
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])

Python delete row in numpy array

I have a large numpy array (8 by 30000) and I want to delete some rows according to a criterion. This criterion is only applicable to one column.
Example:
>>> p = np.array([[0, 1, 3], [1, 5, 6], [4, 3, 56], [1, 34, 4]])
>>> p
array([[ 0,  1,  3],
       [ 1,  5,  6],
       [ 4,  3, 56],
       [ 1, 34,  4]])
Here I would like to remove every row in which the value of the 3rd column is >30, i.e. here row 3.
As the array is pretty large, I'd like to avoid for loops. I thought of this:
>>> p[~(p > 30).any(1), :]
array([[0, 1, 3],
       [1, 5, 6]])
But this obviously removes the two last rows, because it checks every column rather than just the 3rd. Any ideas on how to do this in an efficient way?
p = p[~(p[:, 2] > 30)]
or (if your condition is easily invertible):
p = p[p[:, 2] <= 30]
returns
array([[ 0,  1,  3],
       [ 1,  5,  6],
       [ 1, 34,  4]])
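If you want something that reads closer to the question's title, np.delete works too; a small sketch (the boolean mask above is usually preferable, since np.delete also builds a new array):
import numpy as np

p = np.array([[0, 1, 3], [1, 5, 6], [4, 3, 56], [1, 34, 4]])

# Drop the rows whose 3rd column exceeds 30
p = np.delete(p, np.flatnonzero(p[:, 2] > 30), axis=0)
print(p)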
