numpy.delete not removing column from array - python

I'm attempting to remove each column one at a time from an array and, based on the documentation and this question, thought the following should work:
print(all_input_data.shape)
for n in range(9):
    print(n)
    testArray = all_input_data.copy()
    print(testArray.shape)
    np.delete(testArray, [n], axis=1)
    print(testArray.shape)
    print(testArray[0:1][:])
The original matrix is all_input_data.
This is not causing any columns to be deleted or generating any other change to the array. The initial output for the snippet above is:
(682120, 9)
0
(682120, 9)
(682120, 9)
[[ 2.37000000e+02 1.60000000e+01 9.90000000e+01 1.04910000e+03
9.29000000e-01 9.86000000e-01 8.43000000e-01 4.99290000e+01
1.97000000e+00]]
The delete command is not changing the shape of the matrix at all.

np.delete returns a copy of the input array with the given elements removed. From the documentation:
Return a new array with sub-arrays along an axis deleted.
There is no in-place deletion of array elements in numpy.
Because np.delete returns a copy and does not modify its input, there is no need to manually make a copy of all_input_data:
import numpy as np

all_input_data = np.random.rand(100, 9)
for n in range(9):
    print(n)
    testArray = np.delete(all_input_data, [n], axis=1)
    print(testArray.shape)
    print(testArray[0:1][:])

From the linked question, consider this:
In [2]: a = np.arange(12).reshape(3,4)
In [3]: np.delete(a, [1,3], axis=1)
Out[3]:
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In other words, if you want to keep the result you have to assign it to a variable; but considering the size of your matrix, copying it nine times may not be practical. What you could do instead is select columns by indexing. It is explained here.
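For instance, one way to follow that indexing suggestion is to keep every column except n with a boolean mask instead of calling np.delete (a sketch, with random data standing in for the asker's all_input_data):

```python
import numpy as np

# stand-in for the asker's (682120, 9) array
all_input_data = np.random.rand(100, 9)

for n in range(9):
    # mask out column n, keep the rest
    mask = np.ones(all_input_data.shape[1], dtype=bool)
    mask[n] = False
    testArray = all_input_data[:, mask]
    print(n, testArray.shape)  # (100, 8) each time
```

Like np.delete, this produces a copy, but it avoids copying the full array first.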

Related

ordering an array based on values of another array

This question is probably basic for some of you, but I am new to Python. I have an initial array:
initial_array = np.array([1, 6, 3, 4])
I have another array:
value_array = np.array([10, 2, 3, 15])
I want an array called output_array that looks at the values in value_array and reorders initial_array accordingly.
My result should look like this:
output_array = np.array([4, 1, 3, 6])
Does anyone know if this is possible to do in Python?
So far I have tried:
for i in range(4):
    # find position of element
You can use numpy.argsort to get a sort index from value_array, then rearrange initial_array by that index in reverse order with [::-1].
>>> idx_sort = value_array.argsort()
>>> initial_array[idx_sort[::-1]]
array([4, 1, 3, 6])
You could use stack to put the arrays together, basically adding value_array as a second column to the initial array, then sort by that column.
import numpy as np

initial_array = np.array([1, 6, 3, 4])
value_array = np.array([10, 2, 3, 15])
output_array = np.stack((initial_array, value_array), axis=1)
output_array = output_array[output_array[:, 1].argsort()][::-1]
print(output_array[:, 0])  # the first column holds the reordered initial values
The [::-1] part is for descending order; remove it to get ascending order. I am assuming initial_array and value_array have the same length.
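Another common idiom for descending order, sketched here for comparison (not part of either answer above), is to argsort the negated values:

```python
import numpy as np

initial_array = np.array([1, 6, 3, 4])
value_array = np.array([10, 2, 3, 15])

# argsort of -value_array sorts descending without a [::-1] reversal
output_array = initial_array[np.argsort(-value_array)]
print(output_array)  # [4 1 3 6]
```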

Numpy: How to stack arrays in columns?

Let's say that I have n numpy arrays of the same length. I would like to now create a numpy matrix, such that each column of the matrix is one of the numpy arrays. How can I achieve this? Currently I'm doing it in a loop and it produces the wrong results.
Note: I have to be able to stack them next to each other one by one iteratively.
My code looks like the following; assume that get_array is a function that returns a certain array based on its argument. I don't know until after the loop how many columns I'm going to have.
matrix = np.empty((n_rows,))
for item in sorted_arrays:
    array = get_array(item)
    matrix = np.vstack((matrix, array))
Any help would be appreciated.
You could try putting all your arrays (or lists) into a matrix and then transposing it. This will work if all arrays are the same length.
mymatrix = np.asmatrix((array1, array2, array3)) #... putting arrays into matrix.
mymatrix = mymatrix.transpose()
This should output a matrix with each array as a column. Hope this helps.
Time and again, we recommend collecting the arrays in a list, and making the final array with one call. That's more efficient, and usually easier to get right.
alist = []
for item in sorted_arrays:
    alist.append(get_array(item))
or
alist = [get_array(item) for item in sorted_arrays]
There are various ways of assembling the list. Since you want columns, and assuming get_array produces equal sized 1d arrays:
arr = np.column_stack(alist)
Collecting them in rows and transposing that works too:
arr = np.array(alist).T
arr = np.vstack(alist).T
arr = np.stack(alist).T
arr = np.stack(alist, axis=1)
If the arrays are already 2d
arr = np.concatenate(alist, axis=1)
All the stack variations use concatenate, just varying in how they tweak the shape(s) of the input arrays. The key to using concatenate is to understand the dimensions and shapes, and how to add dimensions as needed. With practice you should, sooner or later, become fluent in that kind of coding.
If they vary in shape or dimensions, things get messier.
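To make the dimension-tweaking concrete, here is a sketch showing that column_stack is roughly equivalent to giving each 1d array a trailing axis and concatenating along it:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# a[:, None] turns shape (3,) into (3, 1); concatenate then joins columns
manual = np.concatenate([a[:, None], b[:, None]], axis=1)
auto = np.column_stack([a, b])

print(manual)
print(np.array_equal(manual, auto))  # True
```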
Equally good is to put the arrays into a pre-allocated array, but then you need to know the desired final shape:
arr = np.zeros((m, n), dtype)
for i, item in enumerate(sorted_arrays):
    arr[:, i] = get_array(item)
Here n is len(sorted_arrays), and m is the length of one get_array(item). You also need to know the expected dtype (int, float, etc.).
If you have np arrays a, b, c, d of the same length, the following code will accomplish what you want:
out_matrix = np.vstack([a, b, c, d]).transpose()
An example:
In [3]: a = np.array([1, 2, 3, 4])
In [4]: b = np.array([5, 6, 7, 8])
In [5]: c = np.array([2, 3, 4, 5])
In [6]: d = np.array([6, 8, 2, 4])
In [10]: np.vstack([a, b, c, d]).transpose()
Out[10]:
array([[1, 5, 2, 6],
       [2, 6, 3, 8],
       [3, 7, 4, 2],
       [4, 8, 5, 4]])

Is there any way to delete specific elements of a numpy array "in-place" in Python?

When calling np.delete(), I am not interested in defining a new variable for the reduced-size array. I want to execute the delete on the original numpy array. Any thoughts?
>>> arr = np.array([[1, 2], [5, 6], [9, 10]])
>>> arr
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])
>>> np.delete(arr, 1, 0)
array([[ 1,  2],
       [ 9, 10]])
>>> arr
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])
but I want:
>>> arr
array([[ 1,  2],
       [ 9, 10]])
NumPy arrays are fixed-size, so there can't be an in-place version of np.delete. Any such function would have to change the array's size.
The closest you can get is reassigning the arr variable:
arr = np.delete(arr, 1, 0)
The delete call doesn't modify the original array, it copies it and returns the copy after the deletion is done.
>>> arr1 = np.array([[1, 2], [5, 6], [9, 10]])
>>> arr2 = np.delete(arr1, 1, 0)
>>> arr1
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])
>>> arr2
array([[ 1,  2],
       [ 9, 10]])
If it's a matter of performance, you might want to try (but test it, since I'm not sure) creating a view* instead of using np.delete. You can do it by indexing, which should be an in-place operation:
import numpy as np
arr = np.array([[1, 2], [5, 6], [9, 10]])
arr = arr[(0, 2), :]
print(arr)
resulting in:
[[ 1 2]
[ 9 10]]
This, however, will not free the memory occupied by the excluded row. It might increase performance, but memory-wise you might have the same or a worse problem. Also notice that, as far as I know, there is no way of indexing by exclusion (for instance, arr[~1] would be very useful), which forces you to spend resources building an index array.
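One way to index by exclusion, at the cost of building exactly such an index array, is a boolean mask (a sketch, not from the original answer):

```python
import numpy as np

arr = np.array([[1, 2], [5, 6], [9, 10]])

# boolean mask that is True everywhere except row 1
mask = np.ones(arr.shape[0], dtype=bool)
mask[1] = False

arr = arr[mask]  # fancy indexing: returns a copy, not a view
print(arr)
```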
For most cases I think the suggestion other users have given, namely
arr = np.delete(arr, 1, 0)
is the best. In some cases it might be worth exploring the other alternative.
EDIT: *This is actually incorrect (thanks @user2357112). Fancy indexing does not create a view but instead returns a copy, as can be seen in the documentation (which I should have checked before jumping to conclusions, sorry about that):
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
As such, I'm unsure whether the fancy indexing suggestion is worth anything as an actual suggestion, unless it has a performance gain over the np.delete method (which I'll try to verify when the opportunity arises, see EDIT2).
EDIT2: I performed a very simple test to see if there is any performance gain from using fancy indexing as opposed to the delete function. I used timeit (actually the first time I've used it, but it seems the number of executions per snippet is 1,000,000, hence the high numbers for time):
import numpy as np
import timeit

def test1():
    arr = np.array([[1, 2], [5, 6], [9, 10]])
    arr = arr[(0, 2), :]
    return arr

def test2():
    arr = np.array([[1, 2], [5, 6], [9, 10]])
    arr = np.delete(arr, 1, 0)
    return arr

print("Equality test: ", np.array_equal(test1(), test2()))
print(timeit.timeit("test1()", setup="from __main__ import test1"))
print(timeit.timeit("test2()", setup="from __main__ import test2"))
The results are these:
Equality test: True
5.43569152576767
9.476918448174644
This represents a very considerable speed gain. Nevertheless, notice that building the index sequence for the fancy indexing also takes time. Whether it is worth it will surely depend on the problem being solved.
You could implement your own version of delete, which copies elements after the ones to be deleted forward and then returns a view excluding the (now obsolete) last element:
import numpy as np

# "in-place" delete: shifts elements left, then returns a shorter view
def np_delete(arr, obj, axis=None):
    # this is only a simplified example
    assert isinstance(obj, int)
    assert axis is None
    for i in range(obj + 1, arr.size):
        arr[i - 1] = arr[i]
    return arr[:-1]

Test = 10 * np.arange(10)
print(Test)
deleteIndex = 5
print(np.delete(Test, deleteIndex))
print(np_delete(Test, deleteIndex))
There is nothing wrong with your code; you just have to overwrite the variable:
arr = np.array([[1,2], [5,6], [9,10]])
arr = np.delete(arr, 1, 0)

How do I sum a numpy array of size (m*n,) in groups of n?

Suppose I have a, where a.shape is (m*n,). How do I efficiently create a new array comprising the m sums of each group of n elements?
The best I came up with is:
a.reshape((m, n)).sum(axis=1)
but this creates an extra new array.
I think there is nothing wrong with using reshape and then taking the sum of the rows; I cannot think of anything faster. According to the manual, reshape should (if possible) return a view on the original array, so no large amount of data is copied. When a view is created, numpy only creates a new header with different strides and shape, pointing into the data of the original array. This costs constant time and memory, independent of the array size.
In [23]: x = np.arange(12)
In [24]: y = x.reshape((3, 4))
In [25]: y
Out[25]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [26]: y.base is x # check if it is a view
Out[26]: True
There is another trick, a variant on cumsum: np.add.reduceat. In this case:
np.add.reduceat(a, np.arange(0, m*n, n))
For m, n = 100, 10, it is about 2x as fast as a.reshape((m, n)).sum(axis=1).
I haven't used it much, so it took a bit of digging to find in the documentation.
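Both approaches produce the same grouped sums; a quick sanity check for m, n = 100, 10:

```python
import numpy as np

m, n = 100, 10
a = np.arange(m * n)

# sum each consecutive group of n elements, two ways
via_reshape = a.reshape((m, n)).sum(axis=1)
via_reduceat = np.add.reduceat(a, np.arange(0, m * n, n))

print(np.array_equal(via_reshape, via_reduceat))  # True
```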

Efficient way of inserting elements in a 2D array

I am unsuccessful in turning this function into a vectorised one:
a = np.asarray([[1, 2, 3], [3, 4, 5]])
inds = np.asarray([0, 2])
vals = np.asarray([10, 12])

def new_insert(arr, inds, vals):
    ret = np.zeros((arr.shape[0], arr.shape[1] + 1))
    for i in range(arr.shape[0]):
        ret[i] = np.insert(arr[i], inds[i], vals[i])
    return ret

print(new_insert(a, inds, vals))
With output:
[[ 10.   1.   2.   3.]
 [  3.   4.  12.   5.]]
Any help?
You can switch to a 1d view of your array a:
shape = a.shape
a.shape = np.multiply(*shape)
recalculate the indices for the 1d array (in row-major order, element (i, e) sits at flat index i*shape[1] + e):
ind1d = [i*shape[1] + e for i, e in enumerate(ind)]
insert into the 1d array:
b = np.insert(a, ind1d, vals)
and reshape the result back to 2d:
b.shape = (shape[0], shape[1]+1)
So, finally, we get
>>> b
array([[10, 1, 2, 3],
[ 3, 4, 12, 5]])
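The steps above can be put together as one runnable sketch; note that the row-major flat index of (row i, column e) is i * ncols + e:

```python
import numpy as np

a = np.asarray([[1, 2, 3], [3, 4, 5]])
ind = np.asarray([0, 2])    # column to insert before, one per row
vals = np.asarray([10, 12])

shape = a.shape
# flat (row-major) position of each (row, column) insertion point
ind1d = [i * shape[1] + e for i, e in enumerate(ind)]
b = np.insert(a.ravel(), ind1d, vals).reshape(shape[0], shape[1] + 1)
print(b)
# [[10  1  2  3]
#  [ 3  4 12  5]]
```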
A one-liner, proposed by @askewchan in the comments, using the np.ravel_multi_index helper function to flatten the index:
>>> np.insert(a.flat, np.ravel_multi_index((np.arange(ind.size), ind),
... a.shape), vals).reshape(a.shape[0], -1)
array([[10, 1, 2, 3],
[ 3, 4, 12, 5]])
Figured I'd post my comment on @alko's answer as an answer, since it looks a bit confusing as one line:
b = np.insert(a.flat, np.ravel_multi_index((np.arange(ind.size), ind), a.shape), vals).reshape(a.shape[0], -1)
This is basically the same as @alko's, but it has a few advantages:
It does not modify a itself, by using the a.flat iterator instead of actually changing the shape of a.
It avoids potential bugs by using np.ravel_multi_index to create the ind1d array instead of computing it manually.
It is a tiny bit (10%) faster.
In steps similar to @alko's, this is what it does:
ind1d = np.ravel_multi_index((np.arange(ind.size), ind), a.shape)
Here ind holds the column indices, so np.arange supplies the row indices. Then, insert into the a.flat iterator instead of the reshaped a:
b = np.insert(a.flat, ind1d, vals)
Finally, reshape:
b = b.reshape(a.shape[0], -1) # the -1 allows any shape at the end
