I ran into a strange problem with the map function: it seems to add a dimension automatically.
import numpy as np

matrix = range(4)
matrix = np.reshape(matrix, (2, 2))
vector = np.ones((1, 2))
newMatrix = map(lambda line: line / vector, matrix)
np.shape(newMatrix)  # I got (2, 1, 2)
I am confused: matrix has shape (2, 2), so why does newMatrix end up with shape (2, 1, 2) after map()? How can I fix this?
I think what you are trying to do is simply newMatrix = matrix / vector. Remember that numpy performs element-wise operations with broadcasting. map is doing what it is defined to do, i.e. it returns a list after applying your function to each item in the iterable. So map operates on one row of your matrix at a time, and each row divided by vector has shape (1, 2). You have two rows; thus, your new shape is 2 x 1 x 2.
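A minimal sketch of that broadcasting approach (reusing the names from the question):

import numpy as np

matrix = np.arange(4).reshape(2, 2)   # shape (2, 2)
vector = np.ones((1, 2))              # shape (1, 2)

newMatrix = matrix / vector           # broadcasting: each row is divided by vector
print(newMatrix.shape)                # (2, 2), not (2, 1, 2)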
This example may illustrate what is going on (I replaced your 'matrix' and 'vector' names with neutral variable names):
In [13]: x = np.arange(4).reshape(2,2)
In [14]: y=np.ones((1,2))
In [15]: list(map(lambda line:line/y, x))
Out[15]: [array([[ 0., 1.]]), array([[ 2., 3.]])]
Notice the 2 arrays have shape (1,2), which matches that of y. x[0,:]/y shows this as well. Wrap that list in np.array..., and you get a (2,1,2).
Notice what happens when I use a 1d array, z:
In [16]: z=np.ones((2,))
In [17]: list(map(lambda line:line/z, x))
Out[17]: [array([ 0., 1.]), array([ 2., 3.])]
I ran this sample in Python 3, where map returns an iterator. To get an array from that I have to use
np.array(list(map(...)))
I don't think I've seen map used with numpy arrays before. In Python 2, map returns a list of (1,2) arrays; np.shape treats that nested list as an array, which is why you got (2,1,2). A more common version of your iteration is to wrap a list comprehension in np.array:
np.array([line/y for line in x])
But as noted in the other answer, you don't need iteration for this simple case. x/y is sufficient. How to avoid iteration is a frequent SO question.
Related
Consider the following array manipulations:
import numpy as np
def f(x):
    x += 1

x = np.zeros(1)
f(x) # changes `x`
f(x[0]) # doesn't change `x`
x[0] += 1 # changes `x`
Why does x[0] behave differently depending on whether += 1 happens inside or outside the function f?
Can I pass a part of the array to the function, such that the function modifies the original array?
Edit: If we considered = instead of +=, we would probably maintain the core of the question while getting rid of some irrelevant complexity.
You don't even need the function call to see this difference.
x is an array:
In [138]: type(x)
Out[138]: numpy.ndarray
Indexing an element of the array returns a np.float64 object. It in effect "takes" the value out of the array; it is not a reference to the element of the array.
In [140]: y=x[0]
In [141]: type(y)
Out[141]: numpy.float64
This y is a lot like a python float; you can += the same way:
In [142]: y += 1
In [143]: y
Out[143]: 1.0
but this does not change x:
In [144]: x
Out[144]: array([0.])
But this does change x:
In [145]: x[0] += 1
In [146]: x
Out[146]: array([1.])
y=x[0] does a x.__getitem__ call. x[0]=3 does a x.__setitem__ call. += uses __iadd__, but it's similar in effect.
Another example:
Changing x:
In [149]: x[0] = 3
In [150]: x
Out[150]: array([3.])
but attempting to do the same to y fails:
In [151]: y[()] = 3
Traceback (most recent call last):
File "<ipython-input-151-153d89268cbc>", line 1, in <module>
y[()] = 3
TypeError: 'numpy.float64' object does not support item assignment
although reading y[()] (indexing without assignment) is allowed.
Basic indexing of an array with a slice does produce a view that can be modified:
In [154]: x = np.zeros(5)
In [155]: x
Out[155]: array([0., 0., 0., 0., 0.])
In [156]: y= x[0:2]
In [157]: type(y)
Out[157]: numpy.ndarray
In [158]: y += 1
In [159]: y
Out[159]: array([1., 1.])
In [160]: x
Out[160]: array([1., 1., 0., 0., 0.])
===
Python list and dict examples of the x[0]+=1 kind of action:
In [405]: alist = [1,2,3]
In [406]: alist[1]+=12
In [407]: alist
Out[407]: [1, 14, 3]
In [408]: adict = {'a':32}
In [409]: adict['a'] += 12
In [410]: adict
Out[410]: {'a': 44}
__iadd__ can be thought of as a __getitem__ followed by a __setitem__ with the same index.
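A rough sketch of that expansion (not the literal implementation, just the effective behaviour):

import numpy as np

x = np.zeros(1)

# roughly what x[0] += 1 does under the hood:
tmp = x[0]    # __getitem__: a np.float64, a copy of the value, not a reference
tmp += 1      # the scalar is immutable, so this just rebinds tmp to a new object
x[0] = tmp    # __setitem__: writes the result back into the array

print(x)      # [1.]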
The issue is not scope, since the only thing that depends on scope is the available names. All objects can be accessed in any scope that has a name for them. The issue is one of mutability vs immutability and understanding what operators do.
x is a mutable numpy array. f runs x += 1 directly on it. += is the operator that invokes in-place addition. In other words, it does x = x.__iadd__(1)*. Notice the reassignment to x, which happens in the function. That is a feature of the in-place operators that allows them to operate on immutable objects. In this case, ndarray.__iadd__ is a true in-place operator which just returns x, and everything works as expected.
Now let's analyze f(x[0]) the same way. x[0] calls x.__getitem__(0)*. When you pass in a scalar int index, numpy extracts the element as a scalar object (here a numpy.float64; for other dtypes it could be an integer or even a tuple-like record). Either way, the object is immutable. Once it has been extracted by __getitem__, the += operator in f rebinds the name x inside f to the new object, but the change is not seen outside the function, much less in the array. In this scenario, f has no reference to the original array, so no change is to be expected.
The example of x[0] += 1 is not the same as calling f(x[0]). It is equivalent to calling x.__setitem__(0, x.__getitem__(0).__iadd__(1))*. The call to f was only the part with type(x).__getitem__(0).__iadd__(1), which returns a new object, but never reassigns as __setitem__ does. The key is that [] = (__setitem__) in python is an entirely different operator from [] (__getitem__) and = (assignment) separately.
To make the second example, f(x[0]), work, you would have to pass in a mutable object. An integer index extracts a single scalar object, and an array (fancy) index makes a copy. However, a slice index returns a view that is mutable and tied to the original array's memory. Therefore, you can do
f(x[0:1]) # changes `x`
In this case f does the following: x.__getitem__(slice(0, 1, None)).__iadd__(1). The key is that __getitem__ returns a mutable view into the original array, not an immutable int.
To see why it is important not only that the object is mutable but that it is a view into the original array, try f(x[[0]]). Indexing with a list produces an array, but a copy. x[[0]].__iadd__ will modify that copy in place, but the copy is never written back into the original array, so the change will not propagate.
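Putting the three cases together in one runnable sketch (using the f from the question):

import numpy as np

def f(x):
    x += 1

x = np.zeros(3)

f(x[0])      # __getitem__ returns an immutable scalar; nothing propagates
print(x)     # [0. 0. 0.]

f(x[0:1])    # slice: a view into x, so the in-place add is visible outside
print(x)     # [1. 0. 0.]

f(x[[0]])    # fancy index: a copy of the data, so the change is lost
print(x)     # [1. 0. 0.]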
* This is an approximation. When invoked by an operator, dunder methods are actually called as type(x).__operator__(x, ...), not x.__operator__(...).
As per this comment and this answer:
The x[0] inside of f(x[0]) performs __getitem__ on x. In this particular case (as opposed to indexing a slice of the array, for example), the value returned by this operation doesn't allow modifying the original array.
x[0] = 1 performs __setitem__ on x.
__getitem__ and __setitem__ can be defined/overloaded to do anything. They don't even have to be consistent with each other.
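A contrived sketch with a made-up class, purely to show that the two methods are independent:

class Weird:
    def __getitem__(self, key):
        return 42                        # always "reads" the same value

    def __setitem__(self, key, value):
        print("ignoring", key, value)    # silently discards writes

w = Weird()
w[0] = 99        # calls __setitem__, which throws the value away
print(w[0])      # 42 -- __getitem__ is completely unrelated to __setitem__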
I have an array of size m x n.
I want to pass each of the m rows individually to a function and save the result back into the same row.
What would be an efficient way of doing this with numpy?
Currently I am using for loops to achieve this:
X : size(m x n)
p : size(m x n)
for i in np.arange(X.shape[0]):
    X[i] = some_func(X[i], p[i])
Since you are modifying the row of X, you can skip the indexing and use zip to iterate on the rows:
In [833]: X=np.ones((2,3)); p=np.arange(6).reshape(2,3)
In [834]: for x,y in zip(X,p):
...: x[:] = x + y
...:
In [835]: X
Out[835]:
array([[1., 2., 3.],
[4., 5., 6.]])
If you still needed the index you could add enumerate:
for i,(x,y) in enumerate(zip(X,p)):...
There isn't much difference in efficiency in these alternatives. You still have to call your function m times. You still have to select rows, either by index or by iteration. Both are a bit slower on arrays than on the equivalent list.
The best thing is to write your function so it works directly with the 2d arrays, and doesn't need iteration.
X+p
But if the function is too complex for that, then its evaluation time is likely to be relatively high (compared to the iteration mechanism).
You can make a list of the rows of the X and p matrices using a list comprehension, as shown below. Then you can pass each pair of rows as parameters to your some_func:
import numpy as np

X = np.random.randint(9, size=(3, 3))
p = np.random.randint(9, size=(3, 3))
print(X.shape, p.shape)

XList = [row for row in X]
pList = [row for row in p]
print(XList)
print(pList)

for i in np.arange(len(XList)):
    X[i] = some_func(XList[i], pList[i])
I would like to build up a numpy matrix using rows I get in a loop. But how do I initialize the matrix? If I write
A = []
A = numpy.vstack((A, [1, 2]))
I get
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What's the best practice for this?
NOTE: I do not know the number of rows in advance. The number of columns is known.
Unknown number of rows
One way is to form a list of lists, and then convert to a numpy array in one operation:
final = []
# x is some generator
for item in x:
    final.append(item)
A = np.array(final)
Or, more elegantly, given a generator x:
A = np.array(list(x))
This solution is time-efficient but memory-inefficient.
Known number of rows
Append operations on numpy arrays are expensive and not recommended. If you know the size of the final array in advance, you can instantiate an empty (or zero) array of your desired size, and then fill it with values. For example:
A = np.zeros((10, 2))
A[0] = [1, 2]
Or in a loop, with a trivial assignment to demonstrate syntax:
A = np.zeros((2, 2))
# in reality, x will be some generator whose length you know in advance
x = [[1, 2], [3, 4]]
for idx, item in enumerate(x):
    A[idx] = item
print(A)
array([[ 1., 2.],
[ 3., 4.]])
I have some physical simulation code, written in python and using numpy/scipy. Profiling the code shows that 38% of the CPU time is spent in a single doubly nested for loop - this seems excessive, so I've been trying to cut it down.
The goal of the loop is to create an array of indices, showing which elements of a 1D array the elements of a 2D array are equal to.
indices[i,j] = where(1D_array == 2D_array[i,j])
As an example, if 1D_array = [7.2, 2.5, 3.9] and
2D_array = [[7.2, 2.5],
            [3.9, 7.2]]
We should have
indices = [[0, 1],
           [2, 0]]
I currently have this implemented as
for i in range(ni):
    for j in range(nj):
        out[i, j] = np.abs(1D_array - 2D_array[i, j]).argmin()
The argmin is needed as I'm dealing with floating point numbers, and so the equality is not necessarily exact. I know that every number in the 1D array is unique, and that every element in the 2D array has a match, so this approach gives the correct result.
Is there any way of eliminating the double for loop?
Note:
I need the index array to perform the following operation:
f = complex_function(1D_array)
output = f[indices]
This is faster than the alternative, as the 2D array has a size of NxN compared with 1xN for the 1D array, and the 2D array has many repeated values. If anyone can suggest a different way of arriving at the same output without going through an index array, that could also be a solution
In pure Python you can do this using a dictionary in O(N) time, the only time penalty is going to be the Python loop involved:
>>> arr1 = np.array([7.2, 2.5, 3.9])
>>> arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])
>>> indices = dict(np.hstack((arr1[:, None], np.arange(3)[:, None])))
>>> np.fromiter((indices[item] for item in arr2.ravel()), dtype=arr2.dtype).reshape(arr2.shape)
array([[ 0., 1.],
[ 2., 0.]])
The dictionary method that some others have suggested might work, but it requires that you know ahead of time that every element in your target array (the 2d array) has an exact match in your search array (your 1d array). Even when this should be true in principle, you still have to deal with floating point precision issues; for example, try .1 * 3 == .3.
Another approach is to use numpy's searchsorted function. searchsorted takes a sorted 1d search array and a target array of any shape, then finds the closest elements in the search array for every item in the target array. I've adapted this answer for your situation; take a look at it for a description of how the find_closest function works.
import numpy as np
def find_closest(A, target):
    order = A.argsort()
    A = A[order]
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A) - 1)
    left = A[idx - 1]
    right = A[idx]
    idx -= target - left < right - target
    return order[idx]
array1d = np.array([7.2, 2.5, 3.9])
array2d = np.array([[7.2, 2.5],
[3.9, 7.2]])
indices = find_closest(array1d, array2d)
print(indices)
# [[0 1]
# [2 0]]
To get rid of the two Python for loops, you can do all of the equality comparisons "in one go" by adding new axes to the arrays (making them broadcastable with each other).
Bear in mind that this produces a new array containing arr1.size * arr2.size values. If this is a very big number, this approach could be infeasible depending on the limitations of your memory. Otherwise, it should be reasonably quick:
>>> (arr1[:,np.newaxis] == arr2[:,np.newaxis]).argmax(axis=1)
array([[0, 1],
[2, 0]], dtype=int32)
If you need to get the index of the closest matching value in arr1 instead, use:
np.abs(arr1[:,np.newaxis] - arr2[:,np.newaxis]).argmin(axis=1)
From the docs, here is how element-wise division normally works:
a1 = np.array([8,12,14])
b1 = np.array([4,6,7])
a1/b1
array([2, 2, 2])
Which works. I am trying the same thing, I think, on different arrays, and it doesn't work. For two 3-element vectors it returns a 3x3 matrix. I even made sure their "shape is the same", but no difference.
>> t
array([[ 3.17021277e+00],
[ 4.45795858e-15],
[ 7.52842809e-01]])
>> s
array([ 1.00000000e+00, 7.86202619e+02, 7.52842809e-01])
>> t/s
array([[ 3.17021277e+00, 4.03231011e-03, 4.21098897e+00],
[ 4.45795858e-15, 5.67024132e-18, 5.92149984e-15],
[ 7.52842809e-01, 9.57568432e-04, 1.00000000e+00]])
>> t/s.T
array([[ 3.17021277e+00, 4.03231011e-03, 4.21098897e+00],
[ 4.45795858e-15, 5.67024132e-18, 5.92149984e-15],
[ 7.52842809e-01, 9.57568432e-04, 1.00000000e+00]])
This is because the shapes of your two arrays are t.shape = (3, 1) and s.shape = (3,), so the broadcasting rules apply. They state that dimensions are compared from the right: if two dimensions are equal, the operation is done element-wise; if they differ, it fails unless one of them is 1, and this is where it becomes interesting: an axis of length 1 is stretched, so the operation is repeated over all elements along the other array's dimension. Here s is treated as shape (1, 3), and the result is broadcast up to (3, 3).
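In terms of shapes (a quick sketch with placeholder values, not the numbers from the question):

import numpy as np

t = np.arange(3.0).reshape(3, 1)   # shape (3, 1)
s = np.arange(1.0, 4.0)            # shape (3,)

# s is treated as (1, 3); both length-1 axes are stretched,
# so every element of t is divided by every element of s
print((t / s).shape)        # (3, 3)
print((t[:, 0] / s).shape)  # (3,) -- the element-wise fix suggested below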
I guess what you want to do would be
t[:,0] / s
or
np.squeeze(t) / s
Both of which get rid of the singleton second dimension of t, leaving a plain (3,) vector. This really is not a bug, it is a feature, because if you have two vectors and you want to perform an operation between all combinations of their elements, this is exactly how you do it:
a = np.arange(3)
b = np.arange(3)
Element-wise, you can now do:
a * b  # [0, 1, 4]
If you instead want this operation performed between all combinations of elements, you can insert dimensions of size one like so:
a[np.newaxis,:] * b[:,np.newaxis]
Try it out! It really is a convenient concept, although I do see how this is confusing at first.
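For instance, with a and b as above, the result is the full table of pairwise products:

import numpy as np

a = np.arange(3)
b = np.arange(3)

print(a * b)                                # [0 1 4] -- element-wise
print(a[np.newaxis, :] * b[:, np.newaxis])  # all pairwise products:
# [[0 0 0]
#  [0 1 2]
#  [0 2 4]]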