Passing elements to function efficiently - python

I have an array of size m x n.
I want to pass each of the m rows individually to a function and save the result back in the same row.
What would be an efficient way of doing this with numpy?
Currently I am using a for loop to achieve this:
X : size(m x n)
p : size(m x n)
for i in np.arange(X.shape[0]):
    X[i] = some_func(X[i], p[i])

Since you are modifying the rows of X, you can skip the indexing and use zip to iterate on the rows:
In [833]: X=np.ones((2,3)); p=np.arange(6).reshape(2,3)
In [834]: for x,y in zip(X,p):
     ...:     x[:] = x + y
     ...:
In [835]: X
Out[835]:
array([[1., 2., 3.],
       [4., 5., 6.]])
If you still needed the index you could add enumerate:
for i,(x,y) in enumerate(zip(X,p)):...
There isn't much difference in efficiency between these alternatives. You still have to call your function m times, and you still have to select rows, whether by index or by iteration. Both are a bit slower on arrays than on an equivalent list of lists.
The best thing is to write your function so it works directly with the 2d arrays, and doesn't need iteration.
X+p
But if the function is too complex for that, then its evaluation time is likely to be relatively high (compared to the iteration mechanism).
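As a quick check, here is a minimal sketch (using a toy stand-in for `some_func`, an assumption for illustration) showing that the whole-array call gives the same result as the per-row loop:

```python
import numpy as np

# Toy stand-in for the question's some_func (an assumption for this sketch):
# it simply adds the parameter row to the data row.
def some_func(row, params):
    return row + params

X = np.ones((2, 3))
p = np.arange(6).reshape(2, 3)

# Per-row loop, writing results back into X.
X_loop = X.copy()
for i in range(X_loop.shape[0]):
    X_loop[i] = some_func(X_loop[i], p[i])

# Whole-array version: one call, no Python-level loop.
X_vec = some_func(X, p)

print(np.array_equal(X_loop, X_vec))  # True
```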

You can collect the rows of the X and p matrices into lists using a list comprehension, as shown below. Then you can easily send matching rows of X and p as parameters to your some_func:
import numpy as np
X = np.random.randint(9, size=(3, 3))
p = np.random.randint(9, size=(3, 3))
print(X.shape, p.shape)
XList = [row for row in X]
pList = [row for row in p]
print(XList)
print(pList)
for i in range(len(XList)):
    X[i] = some_func(XList[i], pList[i])


Convert numpy.ndarray to a list

I'm trying to convert this numpy.ndarray to a list
[[105.53518731]
[106.45317529]
[107.37373843]
[108.00632646]
[108.56373502]
[109.28813113]
[109.75593207]
[110.57458371]
[111.47960639]]
I'm using this function to convert it.
conver = conver.tolist()
the output is this. I'm not sure whether it's a list, and if so, whether I can access its elements by doing conver[0], etc.
[[105.5351873125], [106.45317529411764], [107.37373843478261], [108.00632645652173], [108.56373502040816], [109.28813113157895], [109.75593206666666], [110.57458370833334], [111.47960639393939]]
finally, after I convert it to a list, I try to multiply the list members by 1.05 and get this error!
TypeError: can't multiply sequence by non-int of type 'float'
You start with a 2d array, with shape (n,1), like this:
In [342]: arr = np.random.rand(5,1)*100
In [343]: arr
Out[343]:
array([[95.39049043],
       [19.09502087],
       [85.45215423],
       [94.77657561],
       [32.7869103 ]])
tolist produces a list - but it contains lists; each [] layer denotes a list. Notice that the [] nesting matches the array's:
In [344]: arr.tolist()
Out[344]:
[[95.39049043424225],
[19.095020872584335],
[85.4521542296349],
[94.77657561477125],
[32.786910295446425]]
To get a number you have to index through each list layer:
In [345]: arr.tolist()[0]
Out[345]: [95.39049043424225]
In [346]: arr.tolist()[0][0]
Out[346]: 95.39049043424225
In [347]: arr.tolist()[0][0]*1.05
Out[347]: 100.16001495595437
If you first turn the array into a 1d one, the list indexing is simpler:
In [348]: arr.ravel()
Out[348]: array([95.39049043, 19.09502087, 85.45215423, 94.77657561, 32.7869103 ])
In [349]: arr.ravel().tolist()
Out[349]:
[95.39049043424225,
19.095020872584335,
85.4521542296349,
94.77657561477125,
32.786910295446425]
In [350]: arr.ravel().tolist()[0]
Out[350]: 95.39049043424225
But if your primary goal is to multiply the elements, doing it with the array is simpler:
In [351]: arr * 1.05
Out[351]:
array([[100.16001496],
       [ 20.04977192],
       [ 89.72476194],
       [ 99.5154044 ],
       [ 34.42625581]])
You can access elements of the array with:
In [352]: arr[0,0]
Out[352]: 95.39049043424225
But if you do need to iterate, the tolist() option is good to know. Iterating on lists is usually faster than iterating on an array. With an array you should try to use the fast whole-array methods.
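To see exactly where the question's TypeError comes from, here is a minimal sketch contrasting list and array multiplication:

```python
import numpy as np

vals = [[105.53518731], [106.45317529]]

# A Python list supports * only with an int (repetition), hence the TypeError:
try:
    vals * 1.05
except TypeError as exc:
    print(exc)  # can't multiply sequence by non-int of type 'float'

# An array multiplies element-wise instead:
arr = np.array(vals)
scaled = arr * 1.05
print(scaled)
```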
You converted to a list of lists, so you can't broadcast. Stack it back into a 1d array first:
import numpy as np
x = [[105.53518731],
[106.45317529],
[107.37373843],
[108.00632646],
[108.56373502],
[109.28813113],
[109.75593207],
[110.57458371],
[111.47960639],]
x = np.hstack(x)
x * 1.05
array([110.81194668, 111.77583405, 112.74242535, 113.40664278,
       113.99192177, 114.75253769, 115.24372867, 116.1033129 ,
       117.05358671])
Yes, it's a list; you can check the type of a variable with:
type(a)
To multiply each element by 1.05, run the code below:
x = [float(i[0]) * 1.05 for i in a]
print(x)
Try this:
import numpy as np
a = [[105.53518731],
[106.45317529],
[107.37373843],
[108.00632646],
[108.56373502],
[109.28813113],
[109.75593207],
[110.57458371],
[111.47960639]]
b = [elem[0] for elem in a]
b = np.array(b)
print(b*1.05)
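Equivalently (a sketch on the same data), the nested list can go straight into an array and be flattened with ravel, after which broadcasting handles the scaling:

```python
import numpy as np

a = [[105.53518731], [106.45317529], [107.37373843]]

# np.array builds a (3, 1) array; ravel() flattens it to shape (3,).
flat = np.array(a).ravel()
print(flat * 1.05)
```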

Create a list of lists of N-dimensional numpy arrays

I want to create a list of 2x2 numpy arrays
array([[0, 0],
       [1, 1]])
for example I want to fill a list with 8 of these arrays.
x = []
for j in range(9):
    for i in np.random.randint(2, size=(2, 2)):
        x.append([i])
this gives me a 1x1 array
z = iter(x)
next(z)
[array([0, 1])]
what am I missing here ?
You missed that you are iterating over a 2x2 array 9 times. Each iteration yields a row of the array, which is what you see when you look at the first element - the first row of the first matrix. Not only that, you append this row within a list, so you actually have 18 lists with a single element each. What you want to do is append the matrix directly, with no inner loop and definitely no additional [] around it, or better yet:
x = [np.random.randint(2, size=(2, 2)) for _ in range(9)]
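As a quick sanity check of that comprehension (seeded only so this sketch is reproducible), the result is a flat list of nine 2x2 arrays:

```python
import numpy as np

np.random.seed(0)  # seeded only so this sketch is reproducible
x = [np.random.randint(2, size=(2, 2)) for _ in range(9)]

print(len(x))      # 9 matrices in the list
print(x[0].shape)  # each one is 2x2
```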

Create matrix in a loop with numpy

I would like to build up a numpy matrix using rows I get in a loop. But how do I initialize the matrix? If I write
A = []
A = numpy.vstack((A, [1, 2]))
I get
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What's the best practice for this?
NOTE: I do not know the number of rows in advance. The number of columns is known.
Unknown number of rows
One way is to form a list of lists, and then convert to a numpy array in one operation:
final = []
# x is some generator
for item in x:
    final.append(item)
A = np.array(final)
Or, more elegantly, given a generator x:
A = np.array(list(x))
This solution is time-efficient but memory-inefficient.
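A minimal sketch of that pattern with an actual generator (the `rows` generator here is hypothetical, standing in for whatever produces your rows):

```python
import numpy as np

# A hypothetical generator producing an unknown number of rows:
def rows():
    for k in range(3):
        yield [k, k + 1]

# Collect everything in a list, then convert in one operation.
A = np.array(list(rows()))
print(A.shape)  # (3, 2)
```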
Known number of rows
Append operations on numpy arrays are expensive and not recommended. If you know the size of the final array in advance, you can instantiate an empty (or zero) array of your desired size, and then fill it with values. For example:
A = np.zeros((10, 2))
A[0] = [1, 2]
Or in a loop, with a trivial assignment to demonstrate syntax:
A = np.zeros((2, 2))
# in reality, x will be some generator whose length you know in advance
x = [[1, 2], [3, 4]]
for idx, item in enumerate(x):
    A[idx] = item
print(A)
[[1. 2.]
 [3. 4.]]

map() function increases the dimension in python

I ran into a very strange problem with the map function: it increases a dimension automatically.
matrix = range(4)
matrix = numpy.reshape(matrix,(2,2))
vector = numpy.ones((1,2))
newMatrix = map(lambda line: line/vector, matrix)
np.shape(newMatrix) # I got (2,1,2)
I am confused, the matrix has the shape(2,2), but why after the map() function, the newMatrix has such a shape (2,1,2)? How can I fix with this problem?
I think what you are trying to do is simply newMatrix = matrix / vector. Remember that numpy performs element-wise operations. map is doing what it is defined to do, i.e. return a list after applying your function to each item in the iterator. So map operates on one row of your matrix at a time. You have two rows; thus, your new shape is 2 x 1 x 2.
This example may illustrate what is going on (I replaced your 'matrix' and 'vector' names with neutral variable names):
In [13]: x = np.arange(4).reshape(2,2)
In [14]: y=np.ones((1,2))
In [15]: list(map(lambda line:line/y, x))
Out[15]: [array([[ 0., 1.]]), array([[ 2., 3.]])]
Notice the 2 arrays have shape (1,2), which matches that of y. x[0,:]/y shows this as well. Wrap that list in np.array..., and you get a (2,1,2).
Notice what happens when I use a 1d array, z:
In [16]: z=np.ones((2,))
In [17]: list(map(lambda line:line/z, x))
Out[17]: [array([ 0., 1.]), array([ 2., 3.])]
I ran this sample in Python3, where map returns a generator. To get an array from that I have to use
np.array(list(map(...)))
I don't think I've seen the use of map with numpy arrays before. I'm a little surprised that in Python2 it returns an array, not just a list. A more common version of your iteration is to wrap a list comprehension in np.array...
np.array([line/y for line in x])
But as noted in the other answer, you don't need iteration for this simple case. x/y is sufficient. How to avoid iteration is a frequent SO question.
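A side-by-side sketch of the two shapes being discussed: each mapped row keeps y's (1,2) shape, so stacking the results adds a dimension, while direct division just broadcasts.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)
y = np.ones((1, 2))

# map applies the lambda to one row at a time; each row/(1,2) result
# keeps y's shape, so stacking the pieces adds a dimension.
mapped = np.array(list(map(lambda line: line / y, x)))
direct = x / y  # plain broadcasting, no iteration

print(mapped.shape)  # (2, 1, 2)
print(direct.shape)  # (2, 2)
```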

Creating index array in numpy - eliminating double for loop

I have some physical simulation code, written in python and using numpy/scipy. Profiling the code shows that 38% of the CPU time is spent in a single doubly nested for loop - this seems excessive, so I've been trying to cut it down.
The goal of the loop is to create an array of indices, showing which elements of a 1D array the elements of a 2D array are equal to.
indices[i,j] = where(1D_array == 2D_array[i,j])
As an example, if 1D_array = [7.2, 2.5, 3.9] and
2D_array = [[7.2, 2.5],
            [3.9, 7.2]]
We should have
indices = [[0, 1],
           [2, 0]]
I currently have this implemented as
for i in range(ni):
    for j in range(nj):
        out[i, j] = np.abs(1D_array - 2D_array[i, j]).argmin()
The argmin is needed as I'm dealing with floating point numbers, and so the equality is not necessarily exact. I know that every number in the 1D array is unique, and that every element in the 2D array has a match, so this approach gives the correct result.
Is there any way of eliminating the double for loop?
Note:
I need the index array to perform the following operation:
f = complex_function(1D_array)
output = f[indices]
This is faster than the alternative, as the 2D array has a size of NxN compared with 1xN for the 1D array, and the 2D array has many repeated values. If anyone can suggest a different way of arriving at the same output without going through an index array, that could also be a solution
In pure Python you can do this using a dictionary in O(N) time, the only time penalty is going to be the Python loop involved:
>>> arr1 = np.array([7.2, 2.5, 3.9])
>>> arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])
>>> indices = dict(np.hstack((arr1[:, None], np.arange(3)[:, None])))
>>> np.fromiter((indices[item] for item in arr2.ravel()), dtype=arr2.dtype).reshape(arr2.shape)
array([[ 0.,  1.],
       [ 2.,  0.]])
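A variant of the same dictionary idea (a sketch) that keeps integer indices by building the lookup with enumerate instead of hstack:

```python
import numpy as np

arr1 = np.array([7.2, 2.5, 3.9])
arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])

# Map each value to its position in arr1. Float keys are safe here only
# because the 2d values are exact copies of the 1d values.
lookup = {v: i for i, v in enumerate(arr1)}
indices = np.array([[lookup[v] for v in row] for row in arr2])
print(indices)  # [[0 1]
                #  [2 0]]
```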
The dictionary method that some others have suggested might work, but it requires that you know ahead of time that every element in your target array (the 2d array) has an exact match in your search array (your 1d array). Even when this should be true in principle, you still have to deal with floating point precision issues; for example, try .1 * 3 == .3.
Another approach is to use numpy's searchsorted function. searchsorted takes a sorted 1d search array and any target array, and then finds the closest elements in the search array for every item in the target array. I've adapted this answer for your situation; take a look at it for a description of how the find_closest function works.
import numpy as np

def find_closest(A, target):
    # searchsorted needs a sorted array; sort A and remember the order
    order = A.argsort()
    A = A[order]
    # for each target value, find its insertion point, then step back
    # one position when the left neighbor is actually closer
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    idx -= target - left < right - target
    return order[idx]

array1d = np.array([7.2, 2.5, 3.9])
array2d = np.array([[7.2, 2.5],
                    [3.9, 7.2]])
indices = find_closest(array1d, array2d)
print(indices)
# [[0 1]
# [2 0]]
To get rid of the two Python for loops, you can do all of the equality comparisons "in one go" by adding new axes to the arrays (making them broadcastable with each other).
Bear in mind that this produces a new array containing len(arr1)*len(arr2) values. If this is a very big number, this approach could be infeasible depending on the limitations of your memory. Otherwise, it should be reasonably quick:
>>> (arr1[:,np.newaxis] == arr2[:,np.newaxis]).argmax(axis=1)
array([[0, 1],
       [2, 0]], dtype=int32)
If you need to get the index of the closest matching value in arr1 instead, use:
np.abs(arr1[:,np.newaxis] - arr2[:,np.newaxis]).argmin(axis=1)
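A quick check of the broadcasting approach on the question's example data:

```python
import numpy as np

arr1 = np.array([7.2, 2.5, 3.9])
arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])

# (3,1) against (2,1,2) broadcasts to (2,3,2); argmax over axis 1 then
# returns, for every 2d element, the position of its match in arr1.
indices = (arr1[:, np.newaxis] == arr2[:, np.newaxis]).argmax(axis=1)
print(indices)
```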
