python numpy mutate selection - python

i encountered a strange bug(?) in numpy:
Given a nested array:
p = np.asarray([[1., 2., 3.], [-4., -5., -6.], [1,2,-4]], dtype=np.float32)
which is
array([[ 1., 2., 3.],
[-4., -5., -6.],
[ 1., 2., -4.]], dtype=float32)
i want to mutate the third entry of the array conditional like
p[p[:, 2] <0][:, 2] *= -1
The last statement however does not mutate p.
I HOPE for output like
array([[ 1., 2., 3.],
[-4., -5., 6.],
[ 1., 2., 4.]], dtype=float32)
but in fact it does nothing at all. p stays unchanged.
I tested many things and i do not come to a conclusion why p does not mutate.
Of course i can somehow work around this, but this seems strange to me.
Cheers and thanks in advance.
Daniel

Reversing the order of your square brackets should fix it:
p[:, 2][p[:, 2] < 0] *= -1
Boolean indexing returns a copy, unless you are doing an assignment to it, which you can achieve by making it be the last indexing operation.

You've modified a copy of the original array. If you want to mutate original array you should use something like this:
p[p[:, 2] <0, 2] *= -1

p[boolean_array] returns a copy, so you modify your copy but leave your original unchanged. You could use np.where instead for example. Something like p[:,2] = np.where(p[:,2], p[:,2], -p[:,2])

Related

How to delete multiple value from matrix numpy low computational cost

I've recently been trying my hand at numpy, and I'm trying to find a solution to delete the elements inside the matrix at column 2 equal to the value stored in the variable element.
Since I am a large amount of data I would need to know if there was a more efficient method which takes less time to execute than the classic for.
I enclose an example:
element = [ 85., 222., 166., 238.]
matrix = [[228., 1., 222.],
[140., 0., 85.],
[140., 0., 104.],
[230., 0., 217.],
[115., 1., 250.],
[12., 1., 166.],
[181., 1., 238.]]
the output:
matrix = [[140., 0., 104.],
[230., 0., 217.],
[115., 1., 250.]]
The method I used is the following:
for y in element:
matrix = matrix[(matrix[:,2]!= y)]
When running it for a large amount of data it takes a long time. Is there anything more efficient, so that you can save on execution?
Since you tagged numpy, I'd assume matrix is a numpy array. With that, you can use np.isin for your purpose:
matrix = np.array(matrix)
matrix[~np.isin(np.array(matrix)[:,2], element)]
Output:
array([[140., 0., 104.],
[230., 0., 217.],
[115., 1., 250.]])

Python Median Filter for 1D numpy array

I have a numpy.array with a dimension dim_array. I'm looking forward to obtain a median filter like scipy.signal.medfilt(data, window_len).
This in fact doesn't work with numpy.array may be because the dimension is (dim_array, 1) and not (dim_array, ).
How to obtain such filter?
Next, another question, how can I obtain other filter, i.e., min, max, mean?
Based on this post, we could create sliding windows to get a 2D array of such windows being set as rows in it. These windows would merely be views into the data array, so no memory consumption and thus would be pretty efficient. Then, we would simply use those ufuncs along each row axis=1.
Thus, for example sliding-median` could be computed like so -
np.median(strided_app(data, window_len,1),axis=1)
For the other ufuncs, just use the respective ufunc names there : np.min, np.max & np.mean. Please note this is meant to give a generic solution to use ufunc supported functionality.
For the best performance, one must still look into specific functions that are built for those purposes. For the four requested functions, we have the builtins, like so -
Median : scipy.signal.medfilt.
Max : scipy.ndimage.filters.maximum_filter1d.
Min : scipy.ndimage.filters.minimum_filter1d.
Mean : scipy.ndimage.filters.uniform_filter1d
The fact that applying of a median filter with the window size 1 will not change the array gives us a freedom to apply the median filter row-wise or column-wise.
For example, this code
from scipy.ndimage import median_filter
import numpy as np
arr = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
median_filter(arr, size=3, cval=0, mode='constant')
#with cval=0, mode='constant' we set that input array is extended with zeros
#when window overlaps edges, just for visibility and ease of calculation
outputs an expected filtered with window (3, 3) array
array([[0., 2., 0.],
[2., 5., 3.],
[0., 5., 0.]])
because median_filter automatically extends the size to all dimensions, so the same effect we can get with:
median_filter(arr, size=(3, 3), cval=0, mode='constant')
Now, we can also apply median_filter row-wise with setting 1 to the first element of size
median_filter(arr, size=(1, 3), cval=0, mode='constant')
Output:
array([[1., 2., 2.],
[4., 5., 5.],
[7., 8., 8.]])
And column-wise with the same logic
median_filter(arr, size=(3, 1), cval=0, mode='constant')
Output:
array([[1., 2., 3.],
[4., 5., 6.],
[4., 5., 6.]])

logically adding in numpy

I want to do the following operation. But It likes the histogram operation.
maxIndex = 6
dst =zeros((1,6))
a =array([1,2,3,4,7,0,3,4,5,7])
index=array([1,1,1,3,3,4,4,5,5,5])
a's length == index's length,
for i in (a.size):
dst[index[i]] = dst[index[i]] + a[i]
How can I do this more pythonic. and more efficiently
If I understand correctly, I think you are looking for numpy.bincount:
dst = numpy.bincount(index, weights=a, minlength=maxIndex)
This give me array([ 0., 6., 0., 11., 3., 16.]) as the output. If you don't want to calculate maxIndex by hand, you can omit minlength parameter from the function call and numpy will return an appropriately-sized array for you.

How do I add a column to a python (matix) multi-dimensional array? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What's the simplest way to extend a numpy array in 2 dimensions?
I've been frustrated as a Matlab user switching over to python because I don't know all the tricks and get stuck hacking together code until it works. Below is an example where I have a matrix that I want to add a dummy column to. Surely, there is a simpler way then the zip vstack zip method below. It works, but it is totally a noob attempt. Please enlighten me. Thank you in advance for taking the time for this tutorial.
# BEGIN CODE
from pylab import *
# Find that unlike most things in python i must build a dummy matrix to
# add stuff in a for loop.
H = ones((4,10-1))
print "shape(H):"
print shape(H)
print H
### enter for loop to populate dummy matrix with interesting data...
# stuff happens in the for loop, which is awesome and off topic.
### exit for loop
# more awesome stuff happens...
# Now I need a new column on H
H = zip(*vstack((zip(*H),ones(4)))) # THIS SEEMS LIKE THE DUMB WAY TO DO THIS...
print "shape(H):"
print shape(H)
print H
# in conclusion. I found a hack job solution to adding a column
# to a numpy matrix, but I'm not happy with it.
# Could someone educate me on the better way to do this?
# END CODE
Use np.column_stack:
In [12]: import numpy as np
In [13]: H = np.ones((4,10-1))
In [14]: x = np.ones(4)
In [15]: np.column_stack((H,x))
Out[15]:
array([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
In [16]: np.column_stack((H,x)).shape
Out[16]: (4, 10)
There are several functions that let you concatenate arrays in different dimensions:
np.vstack along axis=0
np.hstack along axis=1
np.dstack along axis=2
In your case, the np.hstack looks what you want. np.column_stack stacks a set 1D arrays as a 2D array, but you have already a 2D array to start with.
Of course, nothing prevents you to do it the hard way:
>>> new = np.empty((a.shape[0], a.shape[1]+1), dtype=a.dtype)
>>> new.T[:a.shape[1]] = a.T
Here, we created an empty array with an extra column, then used some tricks to set the first columns to a (using the transpose operator T, so that new.T has an extra row compared to a.T...)

How to operate elementwise on a matrix of type scipy.sparse.csr_matrix?

In numpy if you want to calculate the sinus of each entry of a matrix (elementise) then
a = numpy.arange(0,27,3).reshape(3,3)
numpy.sin(a)
will get the job done! If you want the power let's say to 2 of each entry
a**2
will do it.
But if you have a sparse matrix things seem more difficult. At least I haven't figured a way to do that besides iterating over each entry of a lil_matrix format and operate on it.
I've found this question on SO and tried to adapt this answer but I was not succesful.
The Goal is to calculate elementwise the squareroot (or the power to 1/2) of a scipy.sparse matrix of CSR format.
What would you suggest?
The following trick works for any operation which maps zero to zero, and only for those operations, because it only touches the non-zero elements. I.e., it will work for sin and sqrt but not for cos.
Let X be some CSR matrix...
>>> from scipy.sparse import csr_matrix
>>> X = csr_matrix(np.arange(10).reshape(2, 5), dtype=np.float)
>>> X.A
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.]])
The non-zero elements' values are X.data:
>>> X.data
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.])
which you can update in-place:
>>> X.data[:] = np.sqrt(X.data)
>>> X.A
array([[ 0. , 1. , 1.41421356, 1.73205081, 2. ],
[ 2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ]])
Update In recent versions of SciPy, you can do things like X.sqrt() where X is a sparse matrix to get a new copy with the square roots of elements in X.

Categories