Efficient way of inserting elements in a 2D array - python

I am unsuccessful in turning this function into a vectorised one:
import numpy as np

a = np.asarray([[1, 2, 3], [3, 4, 5]])
inds = np.asarray([0, 2])
vals = np.asarray([10, 12])

def new_insert(arr, inds, vals):
    ret = np.zeros((arr.shape[0], arr.shape[1] + 1))
    for i in range(arr.shape[0]):
        ret[i] = np.insert(arr[i], inds[i], vals[i])
    return ret

print(new_insert(a, inds, vals))
With output:
[[ 10.   1.   2.   3.]
 [  3.   4.  12.   5.]]
Any help?

You can switch to a 1d view of your array a:
shape = a.shape
a.shape = np.multiply(*shape)
Recalculate the indices for the flattened (row-major) array:
ind1d = [i * shape[1] + e for i, e in enumerate(inds)]
Insert into the 1d array:
b = np.insert(a, ind1d, vals)
and reshape the result back to 2d:
b.shape = (shape[0], shape[1] + 1)
So, finally, we get
>>> b
array([[10,  1,  2,  3],
       [ 3,  4, 12,  5]])
A one-liner, proposed by @askewchan in the comments, uses the np.ravel_multi_index helper function to flatten the indices (starting again from the original 2d a):
>>> np.insert(a.flat, np.ravel_multi_index((np.arange(inds.size), inds),
...           a.shape), vals).reshape(a.shape[0], -1)
array([[10,  1,  2,  3],
       [ 3,  4, 12,  5]])

Figured I'd post my comment to @alko's answer as an answer, since it looks a bit confusing as one line:
b = np.insert(a.flat, np.ravel_multi_index((np.arange(inds.size), inds), a.shape), vals).reshape(a.shape[0], -1)
This is basically the same as @alko's, but it has a few advantages:
It does not modify a itself, because it uses the a.flat iterator instead of actually changing the shape of a.
It avoids potential bugs by using np.ravel_multi_index to build the ind1d array instead of computing it manually.
It is a tiny bit (10%) faster.
In steps similar to @alko's, this is what it does:
ind1d = np.ravel_multi_index((np.arange(inds.size), inds), a.shape)
where inds holds the column indices, so np.arange supplies the row indices. Then insert into the a.flat iterator instead of the reshaped a:
b = np.insert(a.flat, ind1d, vals)
Finally, reshape:
b = b.reshape(a.shape[0], -1) # the -1 allows any shape at the end
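Putting the pieces together, here is a minimal runnable sketch of the one-liner wrapped in a function, using the question's variables (the name insert_per_row is just for illustration):
import numpy as np

def insert_per_row(arr, inds, vals):
    # Flatten the (row, column) insertion points into flat indices,
    # insert into a flattened copy, then restore the 2d shape.
    flat_inds = np.ravel_multi_index((np.arange(inds.size), inds), arr.shape)
    return np.insert(arr.flat, flat_inds, vals).reshape(arr.shape[0], -1)

a = np.asarray([[1, 2, 3], [3, 4, 5]])
inds = np.asarray([0, 2])
vals = np.asarray([10, 12])

print(insert_per_row(a, inds, vals))
# [[10  1  2  3]
#  [ 3  4 12  5]]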

Related

Numpy: How to stack arrays in columns?

Let's say that I have n numpy arrays of the same length. I would like to create a numpy matrix such that each column of the matrix is one of those arrays. How can I achieve this? Right now I'm doing it in a loop and it produces the wrong results.
Note: I have to be able to stack them next to each other one by one, iteratively.
My code looks like the following; assume that get_array is a function that returns a certain array based on its argument. I don't know how many columns I'm going to have until after the loop.
matrix = np.empty((n_rows,))
for item in sorted_arrays:
    array = get_array(item)
    matrix = np.vstack((matrix, array))
Any help would be appreciated.
You could try putting all your arrays (or lists) into a matrix and then transposing it. This will work if all arrays are the same length.
mymatrix = np.asmatrix((array1, array2, array3)) #... putting arrays into matrix.
mymatrix = mymatrix.transpose()
This should output a matrix with each array as a column. Hope this helps.
Time and again, we recommend collecting the arrays in a list, and making the final array with one call. That's more efficient, and usually easier to get right.
alist = []
for item in sorted_arrays:
    alist.append(get_array(item))
or
alist = [get_array(item) for item in sorted_arrays]
There are various ways of assembling the list. Since you want columns, and assuming get_array produces equal sized 1d arrays:
arr = np.column_stack(alist)
Collecting them in rows and transposing that works too:
arr = np.array(alist).T
arr = np.vstack(alist).T
arr = np.stack(alist).T
arr = np.stack(alist, axis=1)
If the arrays are already 2d:
arr = np.concatenate(alist, axis=1)
All the stack variants use concatenate under the hood; they differ only in how they tweak the shape(s) of the input arrays. The key to using concatenate is understanding dimensions and shapes, and how to add dimensions as needed. Sooner or later that kind of coding becomes second nature.
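As a quick illustration of how those variants relate (a small sketch with three made-up equal-length arrays, not part of the original answer):
import numpy as np

alist = [np.arange(3), np.arange(3) + 10, np.arange(3) + 100]

# Each of these builds a (3, 3) array whose columns are the list elements:
print(np.column_stack(alist).shape)   # (3, 3)
print(np.stack(alist, axis=1).shape)  # (3, 3)
print(np.vstack(alist).T.shape)       # (3, 3)

# stack(..., axis=1) is just concatenate on arrays with an extra axis added:
expanded = [x[:, None] for x in alist]          # each (3,) becomes (3, 1)
print(np.concatenate(expanded, axis=1).shape)   # (3, 3)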
If they vary in shape or dimensions, things get messier.
Equally good is to put the arrays into a pre-allocated array, but you need to know the desired final shape:
arr = np.zeros((m, n), dtype)
for i, item in enumerate(sorted_arrays):
    arr[:, i] = get_array(item)
Here n is len(sorted_arrays), and m is the length of one get_array(item) result. You also need to know the expected dtype (int, float, etc.).
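For completeness, a self-contained sketch of the pre-allocated version; the get_array here is a stand-in, since the real one isn't shown in the question:
import numpy as np

def get_array(item):
    # Stand-in for the real get_array: returns a length-4 float array.
    return np.full(4, float(item))

sorted_arrays = [3, 1, 2]
m, n = 4, len(sorted_arrays)   # m: length of each array, n: number of columns

arr = np.zeros((m, n), dtype=float)
for i, item in enumerate(sorted_arrays):
    arr[:, i] = get_array(item)

print(arr.shape)  # (4, 3)
print(arr[0])     # [3. 1. 2.]  (first row: one value from each column)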
If you have NumPy arrays a, b, c, d of the same length, the following code will accomplish what you want:
out_matrix = np.vstack([a, b, c, d]).transpose()
An example:
In [3]: a = np.array([1, 2, 3, 4])
In [4]: b = np.array([5, 6, 7, 8])
In [5]: c = np.array([2, 3, 4, 5])
In [6]: d = np.array([6, 8, 2, 4])
In [10]: np.vstack([a, b, c, d]).transpose()
Out[10]:
array([[1, 5, 2, 6],
       [2, 6, 3, 8],
       [3, 7, 4, 2],
       [4, 8, 5, 4]])

Seemingly inconsistent slicing behavior in numpy arrays

I ran across something that seemed to me like inconsistent behavior in Numpy slices. Specifically, please consider the following example:
import numpy as np
a = np.arange(9).reshape(3,3) # a 2d numpy array
y = np.array([1,2,2]) # vector that will be used to index the array
b = a[np.arange(len(a)),y] # a vector (what I want)
c = a[:,y] # a matrix ??
I wanted to obtain a vector such that the i-th element is a[i,y[i]]. I tried two things (b and c above) and was surprised that b and c are not the same... in fact one is a vector and the other is a matrix! I was under the impression that : was shorthand for "all elements" but apparently the meaning is somewhat more subtle.
After trial and error I somewhat understand the difference now (b == np.diag(c)), but would appreciate clarification on why they are different, what exactly using : implies, and how to understand when to use either case.
Thanks!
It's hard to understand advanced indexing (with lists or arrays) without understanding broadcasting.
In [487]: a=np.arange(9).reshape(3,3)
In [488]: idx = np.array([1,2,2])
Index with a (3,) and (3,) producing shape (3,) result:
In [489]: a[np.arange(3),idx]
Out[489]: array([1, 5, 8])
Index with (3,1) and (3,), result is (3,3)
In [490]: a[np.arange(3)[:,None],idx]
Out[490]:
array([[1, 2, 2],
       [4, 5, 5],
       [7, 8, 8]])
The slice : does basically the same thing. There are subtle differences, but here it's the same.
In [491]: a[:,idx]
Out[491]:
array([[1, 2, 2],
       [4, 5, 5],
       [7, 8, 8]])
ix_ does the same thing, converting the (3,) & (3,) to (3,1) and (1,3):
In [492]: np.ix_(np.arange(3),idx)
Out[492]:
(array([[0],
        [1],
        [2]]), array([[1, 2, 2]]))
A broadcasted sum might help visualize the two cases:
In [495]: np.arange(3)*10+idx
Out[495]: array([ 1, 12, 22])
In [496]: np.add(*np.ix_(np.arange(3)*10, idx))
Out[496]:
array([[ 1,  2,  2],
       [11, 12, 12],
       [21, 22, 22]])
When you pass
np.arange(len(a)), y
you can view the result as all of the index pairs formed by zipping the two sequences you passed. In this case, indexing by np.arange(len(a)) and y
np.arange(len(a))
# [0, 1, 2]
y
# [1, 2, 2]
effectively takes elements: (0, 1), (1, 2), and (2, 2).
print(a[0, 1], a[1, 2], a[2, 2]) # 0th, 1st, 2nd elements from each indexer
# 1 5 8
In the second case, you take the entire slice along the first dimension. (Nothing before the colon.) So this is all elements along the 0th axis. You then specify with y that you want the 1st, 2nd, and 2nd element along each row. (0-indexed.)
As you pointed out, it may seem a bit unintuitive that the results are different, given that the pieces of each index select the same rows on their own:
a[:] == a[np.arange(len(a))]
and yet a[:, y] and a[np.arange(len(a)), y] do not give the same result.
However, NumPy indexing cares what type of data structure you pass when indexing (tuples, integers, slices, arrays, etc.). Things can become hairy very quickly.
The detail behind that is this: first consider all NumPy indexing to be of the general form x[obj], where obj is the evaluation of whatever you passed. How NumPy "behaves" depends on what type of object obj is:
Advanced indexing is triggered when the selection object, obj, is a
non-tuple sequence object, an ndarray (of data type integer or bool),
or a tuple with at least one sequence object or ndarray (of data type
integer or bool).
...
The definition of advanced indexing means that x[(1,2,3),] is
fundamentally different than x[(1,2,3)]. The latter is equivalent to
x[1,2,3] which will trigger basic selection while the former will
trigger advanced indexing. Be sure to understand why this occurs.
In your first case, obj = np.arange(len(a)), y is a tuple containing ndarrays, which fits the definition quoted above. This triggers advanced indexing and forces the behavior described there.
As for the second case, [:, y]:
When there is at least one slice (:), ellipsis (...) or np.newaxis in
the index (or the array has more dimensions than there are advanced
indexes), then the behaviour can be more complicated. It is like
concatenating the indexing result for each advanced index element.
Demonstrated:
# Concatenate the indexing result for each advanced index element.
np.vstack((a[0, y], a[1, y], a[2, y]))
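If what you ultimately want is the per-row picks a[i, y[i]], the paired-index form is the natural spelling; np.take_along_axis gives an equivalent result (a small sketch, not part of the original answers):
import numpy as np

a = np.arange(9).reshape(3, 3)
y = np.array([1, 2, 2])

b = a[np.arange(len(a)), y]                             # paired row/column indices
c = np.take_along_axis(a, y[:, None], axis=1).ravel()   # same per-row picks

print(b)                      # [1 5 8]
print(np.array_equal(b, c))   # True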

numpy.delete not removing column from array

I'm attempting to remove each column one at a time from an array and, based on the documentation and this question, thought the following should work:
print(all_input_data.shape)
for n in range(9):
    print(n)
    testArray = all_input_data.copy()
    print(testArray.shape)
    np.delete(testArray, [n], axis=1)
    print(testArray.shape)
    print(testArray[0:1][:])
The original matrix is all_input_data.
This is not causing any columns to be deleted or generating any other change to the array. The initial output for the snippet above is:
(682120, 9)
0
(682120, 9)
(682120, 9)
[[  2.37000000e+02   1.60000000e+01   9.90000000e+01   1.04910000e+03
    9.29000000e-01   9.86000000e-01   8.43000000e-01   4.99290000e+01
    1.97000000e+00]]
The delete command is not changing the shape of the matrix at all.
np.delete returns a copy of the input array with elements removed.
Return a new array with sub-arrays along an axis deleted.
There is no in-place deletion of array elements in NumPy.
Because np.delete returns a copy and does not modify the input, there is no need to manually make a copy of all_input_data:
import numpy as np

all_input_data = np.random.rand(100, 9)
for n in range(9):
    print(n)
    testArray = np.delete(all_input_data, [n], axis=1)
    print(testArray.shape)
    print(testArray[0:1][:])
From the linked question, consider this:
In [2]: a = np.arange(12).reshape(3,4)
In [3]: np.delete(a, [1,3], axis=1)
Out[3]:
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In other words, if you want to keep the changes you should assign the result to a new variable, but considering the size of your matrix that might not be practical. What you could do instead is use slice notation indexing. It is explained here.
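For instance, a boolean column mask selects everything except column n without np.delete, and a plain slice even gives a view when the columns you keep are contiguous (a small sketch, not from the original answer):
import numpy as np

all_input_data = np.random.rand(100, 9)
n = 3  # column to leave out

# Boolean mask over the columns: keeps every column except n (still a copy).
mask = np.ones(all_input_data.shape[1], dtype=bool)
mask[n] = False
without_n = all_input_data[:, mask]
print(without_n.shape)   # (100, 8)

# Pure slices return views (no copy), but only for contiguous column ranges,
# e.g. dropping the first or the last column:
no_first = all_input_data[:, 1:]
no_last = all_input_data[:, :-1]
print(no_first.shape, no_last.shape)   # (100, 8) (100, 8)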

Index of multidimensional array

I have a problem using multi-dimensional vectors as indices for multi-dimensional vectors. Say I have C.ndim == idx.shape[0], then I want C[idx] to give me a single element. Allow me to explain with a simple example:
import numpy as np

A = np.arange(0, 10)
B = 10 + A
C = np.array([A.T, B.T])
C = C.T
idx = np.array([3, 1])
Now, C[3] gives me row 3 and C[1] gives me row 1. C[idx] then gives me a vstack of both rows. However, I need to get the single element C[3, 1]. How would I achieve that given arrays C and idx?
Edit:
An answer suggested tuple(idx). This works perfectly for a single idx. But:
Let's take it to the next level: say INDICES is an array in which I have stacked several idx-like rows vertically. tuple(INDICES) doesn't do what I want, so C[tuple(INDICES)] won't work. Is there a clean way of doing this, or will I need to iterate over the rows?
If you convert idx to a tuple, it'll be interpreted as basic and not advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
       [7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])

Reverse part of an array using NumPy

I am trying to use array slicing to reverse part of a NumPy array. If my array is, for example,
a = np.array([1,2,3,4,5,6])
then I can get a slice b
b = a[::-1]
which is a view on the original array. What I would like is a view that is partially reversed, for example
1,4,3,2,5,6
I have encountered performance problems with NumPy if you don't play along exactly with how it is designed, so I would like to avoid "fancy" indexing if it is possible.
Assign the reversed slice back into the original array:
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[3:0:-1]
>>> a
array([1, 4, 3, 2, 5, 6])
Or, if you don't like the off-by-one indices:
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[1:4][::-1]
>>> a
array([1, 4, 3, 2, 5, 6])
You can use a permutation matrix (arguably the most NumPy-ish way to partially reverse an array):
a = np.array([1, 2, 3, 4, 5, 6])
new_order_for_index = [1, 4, 3, 2, 5, 6]  # Careful: indices run from 1 to n!

# Permutation matrix
m = np.zeros((len(a), len(a)))
for index, new_index in enumerate(new_order_for_index):
    m[index, new_index - 1] = 1

print(np.dot(m, a))
# [1. 4. 3. 2. 5. 6.]  (float, because m is a float matrix)
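If you'd rather build the permutation programmatically for an arbitrary sub-range instead of hard-coding the new order, one possible sketch (the helper name partial_reverse_order is made up, and it uses 0-based indices unlike the list above):
import numpy as np

def partial_reverse_order(n, start, stop):
    # 0-based permutation that reverses positions start..stop-1 of a length-n array.
    order = np.arange(n)
    order[start:stop] = order[start:stop][::-1]
    return order

a = np.array([1, 2, 3, 4, 5, 6])
order = partial_reverse_order(len(a), 1, 4)   # [0 3 2 1 4 5]

# Permutation matrix built from the order, as in the answer above:
m = np.eye(len(a))[order]
print(m @ a)      # [1. 4. 3. 2. 5. 6.]

# The same permutation applied by indexing directly (no n-by-n matrix needed):
print(a[order])   # [1 4 3 2 5 6]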
