From the docs, here is how element-wise division normally works:
a1 = np.array([8,12,14])
b1 = np.array([4,6,7])
a1/b1
array([2, 2, 2])
That works. I am trying the same thing, I think, on different arrays, and it doesn't: for two 3-element vectors it returns a 3x3 matrix. I even made sure their "shape is same", but it made no difference.
>> t
array([[ 3.17021277e+00],
[ 4.45795858e-15],
[ 7.52842809e-01]])
>> s
array([ 1.00000000e+00, 7.86202619e+02, 7.52842809e-01])
>> t/s
array([[ 3.17021277e+00, 4.03231011e-03, 4.21098897e+00],
[ 4.45795858e-15, 5.67024132e-18, 5.92149984e-15],
[ 7.52842809e-01, 9.57568432e-04, 1.00000000e+00]])
>> t/s.T
array([[ 3.17021277e+00, 4.03231011e-03, 4.21098897e+00],
[ 4.45795858e-15, 5.67024132e-18, 5.92149984e-15],
[ 7.52842809e-01, 9.57568432e-04, 1.00000000e+00]])
This is because the shapes of your two arrays are t.shape = (3, 1) and s.shape = (3,), so the broadcasting rules apply. They state that if two dimensions are equal, the operation is performed element-wise; if they are not equal, it fails unless one of them is 1, and this is where it becomes interesting: the array with the dimension of size 1 has the operation repeated over all elements of the other array's dimension. (Note that s.T changes nothing here: transposing a 1-D array is a no-op.)
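To see the rule in action, here is a minimal sketch with toy values (the names mirror your arrays):
import numpy as np

t = np.arange(3.0).reshape(3, 1)   # shape (3, 1)
s = np.ones(3)                     # shape (3,)
# s is treated as shape (1, 3); both size-1 axes are stretched to 3
print((t / s).shape)               # (3, 3)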
I guess what you want to do would be
t[:,0] / s
or
np.squeeze(t) / s
Both of which get rid of the size-1 second dimension of t. This really is not a bug; it is a feature, because if you have two vectors and you want to perform an operation between all pairs of their elements, you do exactly that:
a = np.arange(3)
b = np.arange(3)
Element-wise, you can now do:
a * b  # array([0, 1, 4])
If you instead want this operation performed between all pairs of elements, you can insert dimensions of size one like so:
a[np.newaxis,:] * b[:,np.newaxis]
Try it out! It really is a convenient concept, although I do see how this is confusing at first.
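For instance, continuing the example above, the inserted axes give the full outer product:
import numpy as np

a = np.arange(3)
b = np.arange(3)
# shapes (1, 3) * (3, 1) broadcast to (3, 3): every pairing of elements
print(a[np.newaxis, :] * b[:, np.newaxis])
# [[0 0 0]
#  [0 1 2]
#  [0 2 4]]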
I am searching for a faster and maybe more elegant way to compute the following:
I have a matrix A and I want to compute the row-wise dot product of A. That is, I want to compute Ai.T * Ai, where the index i indicates the i-th row of matrix A.
import numpy as np
A=np.arange(40).reshape(20,2)
sol=[np.dot(A[ii,:].reshape(1,2).T,A[ii,:].reshape(1,2)) for ii in range(20)]
This results in a list of 2x2 matrices, with np.shape(sol) == (20, 2, 2).
I already had a look at np.einsum, but could not make it work so far.
If the only solution is one where all 20 2x2 matrices are summed, that is also okay, since I want to sum them in the end anyway :)
Thanks
Using np.dot -
A.T.dot(A)
Using np.einsum -
np.einsum('ij,ik->jk',A,A)
Sample run -
>>> A=np.arange(40).reshape(20,2)
>>> sol=[np.dot(A[ii,:].reshape(1,2).T,A[ii,:].reshape(1,2)) for ii in range(20)]
>>> sol = np.array(sol)
>>> sol.sum(0)
array([[ 9880, 10260],
[10260, 10660]])
>>> A.T.dot(A)
array([[ 9880, 10260],
[10260, 10660]])
>>> np.einsum('ij,ik->jk',A,A)
array([[ 9880, 10260],
[10260, 10660]])
If the result must be a 20-element array, I think you need -
np.einsum('ij,ik->i',A,A)
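And if you instead need the full (20, 2, 2) stack of per-row outer products without the Python loop, the same idea works by keeping the row axis i in the output (a quick sketch):
import numpy as np

A = np.arange(40).reshape(20, 2)
# out[i] = outer(A[i], A[i]); keeping axis i in the output avoids the loop
stack = np.einsum('ij,ik->ijk', A, A)
print(stack.shape)        # (20, 2, 2)
print(stack.sum(axis=0))  # matches A.T.dot(A)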
I do not understand the behaviour of numpy.array_split with subindices. When I take an array of a given length, select subindices, and then use array_split, I get different behaviour depending on whether the number of subindices is odd or even. Let's look at an example:
import numpy as np
a = np.ones(2750001) # fake array
t = np.arange(a.size) # fake time basis
indA = ((t >= 5e5) & (t <= 1e6))   # first selection: odd number of elements (500001)
indB = ((t >= 5e5+1) & (t <= 1e6)) # second selection: even number of elements (500000)
# now perform array_split
print(np.shape(np.array_split(a[indA],10)))
# (10,)
print(np.shape(np.array_split(a[indB],10)))
# (10, 50000)
Now we have different results: for the even number, the shape command gives (10, 50000), whereas for the odd number it gives (10,) (presumably the 10 sub-arrays). I'm a bit surprised, and I would like to understand the reason. I know that array_split can also be used when the number of splits does not evenly divide the array, but I would like some insight, because I need to use this inside a loop where I do not know a priori whether the number of indices will be even or odd.
I think the surprising behavior has more to do with np.shape than np.array_split:
In [58]: np.shape([(1,2),(3,4)])
Out[58]: (2, 2)
In [59]: np.shape([(1,2),(3,4,5)])
Out[59]: (2,)
np.shape(a) is showing the shape of the array np.asarray(a):
def shape(a):
    try:
        result = a.shape
    except AttributeError:
        result = asarray(a).shape
    return result
So, when np.array_split returns a list of arrays of unequal length, np.asarray(a) is a 1-dimensional array of object dtype:
In [61]: np.asarray([(1,2),(3,4,5)])
Out[61]: array([(1, 2), (3, 4, 5)], dtype=object)
When array_split returns a list of arrays of equal length, then np.asarray(a) returns a 2-dimensional array:
In [62]: np.asarray([(1,2),(3,4)])
Out[62]:
array([[1, 2],
[3, 4]])
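As a practical aside for the loop mentioned in the question, one option (just a sketch) is to inspect the chunks themselves rather than calling np.shape on the whole list, which works whether or not the split is even:
import numpy as np

a = np.ones(2750001)
chunks = np.array_split(a[500000:1000001], 10)  # 500001 elements, uneven split
print([len(c) for c in chunks])
# [50001, 50000, 50000, 50000, 50000, 50000, 50000, 50000, 50000, 50000]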
I ran into a very strange problem with the map function: it increases a dimension automatically.
import numpy
matrix = range(4)
matrix = numpy.reshape(matrix, (2, 2))
vector = numpy.ones((1, 2))
newMatrix = map(lambda line: line / vector, matrix)
numpy.shape(newMatrix)  # I got (2,1,2)
I am confused, the matrix has the shape(2,2), but why after the map() function, the newMatrix has such a shape (2,1,2)? How can I fix with this problem?
I think what you are trying to do is simply newMatrix = matrix / vector. Remember that numpy performs element-wise operations. map does what it is defined to do, i.e. it returns a list after applying your function to each item in the iterable. So map operates on one row of your matrix at a time, and each row divided by the (1, 2) vector gives a (1, 2) result. You have two rows; thus, your new shape is 2 x 1 x 2.
This example may illustrate what is going on (I replaced your 'matrix' and 'vector' names with neutral variable names):
In [13]: x = np.arange(4).reshape(2,2)
In [14]: y=np.ones((1,2))
In [15]: list(map(lambda line:line/y, x))
Out[15]: [array([[ 0., 1.]]), array([[ 2., 3.]])]
Notice the 2 arrays have shape (1,2), which matches that of y. x[0,:]/y shows this as well. Wrap that list in np.array..., and you get a (2,1,2).
Notice what happens when I use a 1d array, z:
In [16]: z=np.ones((2,))
In [17]: list(map(lambda line:line/z, x))
Out[17]: [array([ 0., 1.]), array([ 2., 3.])]
I ran this sample in Python 3, where map returns an iterator. To get an array from that I have to use
np.array(list(map(...)))
I don't think I've seen map used with numpy arrays before. In Python 2 it returns a list, which np.shape then converts to an array in order to report its shape. A more common version of your iteration is to wrap a list comprehension in np.array...:
np.array([line/y for line in x])
But as noted in the other answer, you don't need iteration for this simple case. x/y is sufficient. How to avoid iteration is a frequent SO question.
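To make the shape difference concrete, here is a small sketch comparing the direct broadcast with the iterated version:
import numpy as np

x = np.arange(4).reshape(2, 2)
y = np.ones((1, 2))
print((x / y).shape)                           # (2, 2): plain broadcasting
print(np.array([row / y for row in x]).shape)  # (2, 1, 2): each row/y is (1, 2)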
I have some physical simulation code, written in python and using numpy/scipy. Profiling the code shows that 38% of the CPU time is spent in a single doubly nested for loop - this seems excessive, so I've been trying to cut it down.
The goal of the loop is to create an array of indices, showing which elements of a 1D array the elements of a 2D array are equal to.
indices[i,j] = where(1D_array == 2D_array[i,j])
As an example, if 1D_array = [7.2, 2.5, 3.9] and
2D_array = [[7.2, 2.5]
[3.9, 7.2]]
We should have
indices = [[0, 1]
[2, 0]]
I currently have this implemented as
for i in range(ni):
    for j in range(nj):
        out[i, j] = np.abs(1D_array - 2D_array[i, j]).argmin()
The argmin is needed as I'm dealing with floating point numbers, and so the equality is not necessarily exact. I know that every number in the 1D array is unique, and that every element in the 2D array has a match, so this approach gives the correct result.
Is there any way of eliminating the double for loop?
Note:
I need the index array to perform the following operation:
f = complex_function(1D_array)
output = f[indices]
This is faster than the alternative, as the 2D array has a size of NxN compared with 1xN for the 1D array, and the 2D array has many repeated values. If anyone can suggest a different way of arriving at the same output without going through an index array, that could also be a solution.
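For context, here is a minimal sketch of that intended use; complex_function is the stand-in name from above, and its body here is invented purely for illustration:
import numpy as np

def complex_function(x):
    # placeholder for the expensive per-value computation
    return np.sin(x) * np.exp(-x)

array1d = np.array([7.2, 2.5, 3.9])
indices = np.array([[0, 1], [2, 0]])
f = complex_function(array1d)  # evaluated once per unique value (N values)
output = f[indices]            # fanned out onto the N x N grid of repeats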
In pure Python you can do this using a dictionary in O(N) time, the only time penalty is going to be the Python loop involved:
>>> arr1 = np.array([7.2, 2.5, 3.9])
>>> arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])
>>> indices = dict(np.hstack((arr1[:, None], np.arange(3)[:, None])))
>>> np.fromiter((indices[item] for item in arr2.ravel()), dtype=arr2.dtype).reshape(arr2.shape)
array([[ 0., 1.],
[ 2., 0.]])
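One caveat: with dtype=arr2.dtype the indices come back as floats, as shown above. Since numpy requires integer (or boolean) arrays for indexing, a small tweak gives integer indices:
out = np.fromiter((indices[item] for item in arr2.ravel()),
                  dtype=np.intp).reshape(arr2.shape)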
The dictionary method that some others have suggested might work, but it requires that you know ahead of time that every element in your target array (the 2d array) has an exact match in your search array (your 1d array). Even when this should be true in principle, you still have to deal with floating point precision issues; for example, try .1 * 3 == .3.
Another approach is to use numpy's searchsorted function. searchsorted takes a sorted 1d search array and any target array, and finds for every item of the target its insertion point into the search array; from that, the closest element is easy to pick out. I've adapted this answer for your situation; take a look at it for a description of how the find_closest function works.
import numpy as np

def find_closest(A, target):
    # sort A and remember the original positions
    order = A.argsort()
    A = A[order]
    # insertion points of target into the sorted A
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A) - 1)
    # step back by one wherever the left neighbour is closer
    left = A[idx - 1]
    right = A[idx]
    idx -= target - left < right - target
    # translate back to positions in the unsorted array
    return order[idx]
array1d = np.array([7.2, 2.5, 3.9])
array2d = np.array([[7.2, 2.5],
[3.9, 7.2]])
indices = find_closest(array1d, array2d)
print(indices)
# [[0 1]
# [2 0]]
To get rid of the two Python for loops, you can do all of the equality comparisons "in one go" by adding new axes to the arrays (making them broadcastable with each other).
Bear in mind that this produces a new array containing arr1.size * arr2.size values. If this is a very big number, this approach could be infeasible depending on the limitations of your memory. Otherwise, it should be reasonably quick:
>>> (arr1[:,np.newaxis] == arr2[:,np.newaxis]).argmax(axis=1)
array([[0, 1],
[2, 0]], dtype=int32)
If you need to get the index of the closest matching value in arr1 instead, use:
np.abs(arr1[:,np.newaxis] - arr2[:,np.newaxis]).argmin(axis=1)
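A quick check of this closest-match variant, with the sample values perturbed slightly to mimic floating point noise:
import numpy as np

arr1 = np.array([7.2, 2.5, 3.9])
arr2 = np.array([[7.2, 2.5], [3.9, 7.2]]) + 1e-9  # no longer exactly equal
print(np.abs(arr1[:, np.newaxis] - arr2[:, np.newaxis]).argmin(axis=1))
# [[0 1]
#  [2 0]]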
I'm trying to get a submatrix of a numpy 2D array and modify it. Sometimes I get a copy, and modifying it does not affect the original array:
In [650]: d=np.random.rand(5,5)
In [651]: may_share_memory(d, d[[0,1],:][:,[2,3]])
Out[651]: False
In [652]: d[[0,1],:][:,[2,3]]=2
In [653]: d
Out[653]:
array([[ 0.0648922 , 0.41408311, 0.88024646, 0.22471181, 0.81811439],
[ 0.32154096, 0.88349028, 0.30755883, 0.55301128, 0.61138144],
[ 0.18398833, 0.40208368, 0.69888324, 0.93197147, 0.43538379],
[ 0.55633382, 0.80531999, 0.71486132, 0.4186339 , 0.76487239],
[ 0.81193408, 0.4951559 , 0.97713937, 0.33904998, 0.27660239]])
while sometimes it seems I get a view, even though may_share_memory also returns False:
In [662]: d[np.ix_([0,1],[2,3])]=1
In [663]: d
Out[663]:
array([[ 0.0648922 , 0.41408311, 1. , 1. , 0.81811439],
[ 0.32154096, 0.88349028, 1. , 1. , 0.61138144],
[ 0.18398833, 0.40208368, 0.69888324, 0.93197147, 0.43538379],
[ 0.55633382, 0.80531999, 0.71486132, 0.4186339 , 0.76487239],
[ 0.81193408, 0.4951559 , 0.97713937, 0.33904998, 0.27660239]])
In [664]: may_share_memory(d, d[np.ix_([0,1],[2,3])])
Out[664]: False
What is stranger is that if I assign that 'view' to a variable, it becomes a 'copy' (again, modification does not affect the original array):
In [658]: d2=d[np.ix_([0,1],[2,3])]
In [659]: may_share_memory(d,d2)
Out[659]: False
In [660]: d2+=1
In [661]: d
Out[661]:
array([[ 0.0648922 , 0.41408311, 0.88024646, 0.22471181, 0.81811439],
[ 0.32154096, 0.88349028, 0.30755883, 0.55301128, 0.61138144],
[ 0.18398833, 0.40208368, 0.69888324, 0.93197147, 0.43538379],
[ 0.55633382, 0.80531999, 0.71486132, 0.4186339 , 0.76487239],
[ 0.81193408, 0.4951559 , 0.97713937, 0.33904998, 0.27660239]])
I agree; this is strange. Yet there is a logic to it.
Note that a sliced assignment is a special overloaded method in Python. A sliced assignment doesn't create the view and then write to it; it writes to the array directly. You can't create a view onto an ndarray for a[[2,0,1]], because you can't express this view as a strided array, which is the fundamental interface all numpy functions demand. But a sliced assignment can directly consume the indices and act on them. Arguably, for consistency, such a sliced assignment should modify a copy; but what would be the point of that, if you don't bind the newly created array to a new name?
It is somewhat awkward in Python in general that assignment and sliced assignment are completely different beasts, which do completely different things. That is also what is at the root of this: sliced assignment and slicing on the right-hand side call different functions and are conceptually somewhat distinct. may_share_memory refers to the behavior of right-hand-side slicing, not of sliced assignment.
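Here is a minimal sketch of that distinction (toy array; the values are illustrative):
import numpy as np

d = np.zeros((5, 5))

# a single fancy __setitem__: consumes the indices and writes into d directly
d[np.ix_([0, 1], [2, 3])] = 1
print(d[:2, 2:4])  # [[1. 1.] [1. 1.]]

# chained indexing: __getitem__ first builds a fancy-indexed copy,
# then __setitem__ writes into that temporary copy, so d is unchanged
d[[0, 1], :][:, [2, 3]] = 2
print(d[:2, 2:4])  # still [[1. 1.] [1. 1.]]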
What you're seeing is the difference between "fancy" indexing and normal indexing.
Also, for clarity, d[np.ix_([0,1],[2,3])] = 1 is not a view; it's an assignment. See @EelcoHoogendoorn's answer for more explanation in that regard. The root of your confusion seems to be with __setitem__ vs __getitem__, which Eelco addresses, but I thought I'd add a few numpy-specific clarifications.
Any time you index with a sequence of coordinates (np.ix_ returns an array of indices), it's "fancy" indexing and will always return a copy.
Anything you can do with slicing will always return a view.
For example:
In [1]: import numpy as np
In [2]: x = np.arange(10)
In [3]: y = x[3:5]
In [4]: z = x[[3, 4]]
In [5]: z[0] = 100
In [6]: x
Out[6]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [7]: y[0] = 100
In [8]: x
Out[8]: array([  0,   1,   2, 100,   4,   5,   6,   7,   8,   9])
The reason for this is that numpy arrays have to be semi-contiguous in memory (more precisely, they have to be able to be described by an offset, strides, and shape).
Any type of slicing can be described this way (even something like x[:, 3:100:5, None]).
An arbitrary sequence of coordinates (e.g. x[[1, 4, 5, 100]]) cannot be.
Therefore, numpy always returns a view if slicing is used, and a copy if "fancy indexing" (a.k.a. using a sequence of indices or a boolean mask) is used.
Assignments (e.g. x[blah] = y), however, will always modify a numpy array in-place.
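A short sketch makes the rule easy to check with the same may_share_memory test from the question:
import numpy as np

x = np.arange(10)
y = x[3:5]                        # slice: a view described by offset/strides/shape
z = x[[3, 4]]                     # fancy index: has to be a fresh copy
print(np.may_share_memory(x, y))  # True
print(np.may_share_memory(x, z))  # False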