mapping over 2 numpy.ndarray simultaneously - python

Here's the problem. Let's say I have a matrix A =
array([[ 1.,  0.,  2.],
       [ 0.,  0.,  2.],
       [ 0., -1.,  3.]])
and a vector of indices p = array([0, 2, 1]). I want to turn the 3x3 matrix A into an array of length 3 (call it v) where v[j] = A[j, p[j]] for j = 0, 1, 2. I can do it the following way:
v = list(map(lambda pair: pair[0][pair[1]], zip(A, p)))
So for the above matrix A and a vector of indices p I expect to get array([1, 2, -1]) (ie 0th element of row 0, 2nd element of row 1, 1st element of row 2).
But can I achieve the same result by using native numpy (ie without explicitly zipping and then mapping)? Thanks.

I don't think such a function exists as a single dedicated call. To achieve what you want, I can think of two easy ways. You could do:
np.diag(A[:, p])
Here the array p is applied as a column index for every row such that on the diagonal you will have the elements that you are looking for.
As an alternative, you can avoid producing a lot of unnecessary entries by using:
A[np.arange(A.shape[0]), p]
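For reference, a quick check (not part of the original answer) of both approaches on the example above, with numpy imported as np:
>>> A = np.array([[1., 0., 2.], [0., 0., 2.], [0., -1., 3.]])
>>> p = np.array([0, 2, 1])
>>> np.diag(A[:, p])
array([ 1.,  2., -1.])
>>> A[np.arange(A.shape[0]), p]
array([ 1.,  2., -1.])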

Related

How do I remove rows in a list containing numpy arrays based on a condition?

I have the following numpy array arr_split:
import numpy as np
arr1 = np.array([[1.,2,3], [4,5,6], [7,8,9]])
arr_split = np.array_split(arr1, indices_or_sections=4, axis=0)
arr_split
Output:
[array([[1., 2., 3.]]),
array([[4., 5., 6.]]),
array([[7., 8., 9.]]),
array([], shape=(0, 3), dtype=float64)]
How do I remove the rows which are "empty" (i.e. in the above example, the last entry)? The result arr_split can have any number of "empty" rows; the example above just happens to have only one.
I have tried using list comprehension, as per below:
arr_split[[(arr_split[i].shape[0] != 0) for i in range(len(arr_split))]]
but this doesn't work because the list comprehension [(arr_split[i].shape[0] != 0) for i in range(len(arr_split))] part returns a list, when I actually just need the elements in the list to feed into arr_split[] as indices.
Anyone know how I could fix this or is there another way of doing this? If possible, looking for the easiest way of doing this without too many loops or if statements.
You can change the indices_or_sections value to the length of the first axis; this will prevent any empty arrays from being produced:
import numpy as np
arr1 = np.array([[1.,2,3], [4,5,6], [7,8,9]])
arr_split = np.array_split(arr1, indices_or_sections=arr1.shape[0], axis=0)
arr_split
>>> [
array([[1., 2., 3.]]),
array([[4., 5., 6.]]),
array([[7., 8., 9.]])
]
Just loop through and check the size. Only add them to the new list if they have a size greater than 0.
arr_split_new = [arr for arr in arr_split if arr.size > 0]
You can use enumerate to get the indexes and size to check whether each array is empty:
indexes = [idx for idx, v in enumerate(arr_split) if v.size != 0]
[0, 1, 2]
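If you then want the non-empty arrays themselves rather than their positions, the index list can feed a comprehension (a small follow-up, not from the original answer):
non_empty = [arr_split[i] for i in indexes]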

reshape list of numpy arrays and then reshape back

I have a list which consists of several numpy arrays with different shapes.
I want to flatten this list of arrays into a single numpy vector, change elements of the vector, and then reshape it back into the original list of arrays.
For example:
input
[numpy.zeros((2,2)), numpy.ones((3,3))]
First
To vector
[0,0,0,0,1,1,1,1,1,1,1,1,1]
Second
Every time, change only one element; for example, change the element at index 1 from 0 to 2:
[0,2,0,0,1,1,1,1,1,1,1,1,1]
Last
convert it back to
[array([[0,2],[0,0]]),array([[1,1,1],[1,1,1],[1,1,1]])]
Is there any fast implementation? Thanks very much.
It seems like converting to a list and back will be inefficient. Instead, why not figure out which array to index (and where) and then just update that index? e.g.
def change_element(arr1, arr2, ix, value):
    which = ix >= arr1.size                 # False -> index falls in arr1, True -> arr2
    arr = [arr1, arr2][which]
    ix = ix - arr1.size if which else ix
    arr.ravel()[ix] = value
And here's some example usage:
>>> arr1 = np.zeros((2, 2))
>>> arr2 = np.ones((3, 3))
>>> change_element(arr1, arr2, 1, 2)
>>> change_element(arr1, arr2, 6, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ]])
>>> change_element(arr1, arr2, 7, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 3.14, 1. , 1. ],
[ 1. , 1. , 1. ]])
A few notes -- This updates the arrays in place. It doesn't create new arrays. If you really need to create new arrays, I suppose you could np.copy them and return. Also, this relies on the arrays sharing memory before and after the ravel: ravel returns a view when the array's data are contiguous in memory (as freshly created arrays are) and a copy otherwise, in which case the assignment would not reach the original array.
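If you want to check whether ravel gives a view for a particular array, np.shares_memory is one way to test it (a small sketch of my own, not part of the original answer):
>>> a = np.ones((3, 3))
>>> np.shares_memory(a, a.ravel())   # contiguous array: ravel returns a view
True
>>> b = a[:, ::2]                    # non-contiguous slice
>>> np.shares_memory(b, b.ravel())   # here ravel has to copy
False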
Generalizing to more arrays is actually quite easy. We just need to walk down the list of arrays and see if ix is less than the array size. If it is, we've found our array. If it isn't, we need to subtract the array's size from ix to represent the number of elements we've traversed thus far:
def change_element(arrays, ix, value):
    for arr in arrays:
        if ix < arr.size:
            arr.ravel()[ix] = value
            return
        ix -= arr.size
And you can call this similar to before:
change_element([arr1, arr2], 6, 3.14159)
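With the arrays from the example above (arr1 is 2x2, arr2 is 3x3), index 6 falls past arr1's 4 elements, so this updates arr2.ravel()[2], i.e. arr2[0, 2], just as the two-array version did.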
@mgilson probably has the best answer for you, but if you absolutely have to convert to a flat list first and then go back again (perhaps because you need to do something else with the flat list as well), then you can do this with list comprehensions:
import numpy as np

lst = [np.zeros((2, 2)), np.ones((3, 3))]
tlist = [e for a in lst for e in a.ravel()]
tlist[1] = 2

i = 0
lst2 = []
dims = [a.shape for a in lst]
for n, m in dims:
    lst2.append(np.array(tlist[i:i+n*m]).reshape(n, m))
    i += n*m
lst2
[array([[ 0., 2.],
[ 0., 0.]]), array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])]
Of course, you lose the information about your array sizes when you flatten, so you need to store them somewhere (here, in dims).
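For completeness, a purely numpy sketch of the same round trip (my own variant, not part of either answer) uses np.concatenate to flatten and np.split at the cumulative sizes to rebuild:
import numpy as np

lst = [np.zeros((2, 2)), np.ones((3, 3))]
shapes = [a.shape for a in lst]

flat = np.concatenate([a.ravel() for a in lst])   # one long 1-D vector
flat[1] = 2                                       # change a single element

split_points = np.cumsum([a.size for a in lst])[:-1]
lst2 = [piece.reshape(shape)
        for piece, shape in zip(np.split(flat, split_points), shapes)]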

How to append numpy.array to other numpy.array?

I want to create a 2D numpy.array, knowing at the beginning only its shape, i.e. shape=2. Now, in a for loop, I want to create the i-th one-dimensional numpy.array and add it to the main matrix of shape=2, so I'll get something like this:
matrix=
[numpy.array 1]
[numpy.array 2]
...
[numpy.array n]
How can I achieve that? I tried to use:
matrix = np.empty(shape=2)
for i in np.arange(100):
    array = np.zeros(random_value)
    matrix = np.append(matrix, array)
But as a result of print(np.shape(matrix)), after loop, I get something like:
(some_number, )
How can I append each new array in the next row of the matrix? Thank you in advance.
I would suggest working with a list:
matrix = []
for i in range(10):
    a = np.ones(2)
    matrix.append(a)
matrix = np.array(matrix)
A list does not have the downside of being copied in memory every time you use append, so you avoid the problem described by ali_m. At the end of your operation you just convert the list object into a numpy array.
I suspect the root of your problem is the meaning of 'shape' in np.empty(shape=2)
If I run a small version of your code
matrix = np.empty(shape=2)
for i in np.arange(3):
    array = np.zeros(3)
    matrix = np.append(matrix, array)
I get
array([ 9.57895902e-259, 1.51798693e-314, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000])
See those 2 odd numbers at the start? Those are produced by np.empty(shape=2). That matrix starts as a (2,) shaped array, not an empty 2d array. append just adds sets of 3 zeros to that, resulting in a (11,) array.
Now if you started with a 2d array with the right number of columns, and did concatenate along the first dimension, you would get a multirow array (rows only have meaning in 2d or larger).
mat = np.zeros((1,3))
for i in range(1,3):
    mat = np.concatenate([mat, np.ones((1,3))*i], axis=0)
produces:
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
A better way of doing an iterative construction like this is with list append
alist = []
for i in range(0,3):
    alist.append(np.ones((1,3))*i)
mat = np.vstack(alist)
alist is:
[array([[ 0., 0., 0.]]), array([[ 1., 1., 1.]]), array([[ 2., 2., 2.]])]
mat is
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
With vstack you can get by with np.ones((3,))*i, since it turns all of its inputs into 2d arrays.
np.append would work, but it also requires the axis=0 parameter and exactly 2 arrays. It gets misused, often by mistaken analogy to the list append. It is just another front end to concatenate, so I prefer not to use it.
Notice that other posters assumed your random value changed during the iteration. That would produce arrays of differing lengths. For 1d appending that would still produce one long 1d array, but a 2d append wouldn't work, because a 2d array can't be ragged (see the sketch after the next example):
mat = np.zeros((2,), int)
for i in range(4):
    mat = np.append(mat, np.ones((i,), int)*i)
# array([0, 0, 1, 2, 2, 3, 3, 3])
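To see the 2d failure mentioned above, a minimal sketch (not from the original answer): appending a row with a different number of columns along axis=0 raises an error, because all dimensions except the concatenation axis must match.
mat2 = np.zeros((2, 3), int)
np.append(mat2, np.ones((1, 4), int), axis=0)
# raises ValueError: the rows have 3 and 4 columns, so they cannot be stacked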
The function you are looking for is np.vstack
Here is a modified version of your example
import numpy as np

matrix = np.empty(shape=2)
for i in np.arange(3):
    array = np.zeros(2)
    matrix = np.vstack((matrix, array))
The result is
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
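Note that the first row of that result is still the uninitialized row created by np.empty(shape=2) (it just happens to print as zeros here). A variant of my own that avoids it starts from an array with zero rows:
import numpy as np

matrix = np.zeros((0, 2))              # zero rows, two columns
for i in np.arange(3):
    matrix = np.vstack((matrix, np.zeros(2)))
matrix.shape                            # (3, 2)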

Python time optimisation of for loop using newaxis

I need to calculate n points (3D) with equal spacing along a defined line (3D).
I know the starting and end points of the line. First, I used:
for k in range(nbin):
    step = k/float(nbin-1)
    bin_point.append(beam_entry + (step*(beamlet_intersection - beam_entry)))
Then I found that using append for large arrays takes more time, so I changed the code to this:
bin_point = [start_point+((k/float(nbin-1))*(end_point-start_point)) for k in range(nbin)]
I got a suggestion that using newaxis will further improve the time.
The modified code looks like this:
step = arange(nbin) / float(nbin-1)
bin_point = start_point + (step[:,newaxis,newaxis] * (end_point - start_point)[newaxis,:,:])
But I could not understand the newaxis function, and I am also not sure whether the same code will work if the structure or the shape of start_point and end_point is changed. Similarly, how can I use newaxis to modify the following code?
for j in range(32): # for all los
    line_dist[j] = sqrt([sum(l) for l in (end_point[j]-start_point[j])**2])
Sorry for being so clunky; to be more clear, the structure of start_point and end_point is:
array([ [[1,1,1],[],[],[]....[]],
[[],[],[],[]....[]],
[[],[],[],[]....[]]......,
[[],[],[],[]....[]] ])
Explanation of the newaxis version in the question: these are not matrix multiplies; ndarray multiplication is element-by-element multiplication with broadcasting. step[:,newaxis,newaxis] is num_steps x 1 x 1 and point[newaxis,:,:] is 1 x num_points x num_dimensions. Broadcasting together ndarrays with shapes (num_steps x 1 x 1) and (1 x num_points x num_dimensions) works because the broadcasting rules require every dimension to be either 1 or the same; a dimension of 1 just means "repeat the array along that axis as many times as the corresponding dimension of the other array". This results in an ndarray with shape (num_steps x num_points x num_dimensions) in a very efficient way; the i, j, k entry is the k-th coordinate of the i-th step along the j-th line (given by the j-th pair of start and end points).
Walkthrough:
>>> start_points = numpy.array([[1, 0, 0], [0, 1, 0]])
>>> end_points = numpy.array([[10, 0, 0], [0, 10, 0]])
>>> steps = numpy.arange(10)/9.0
>>> start_points.shape
(2, 3)
>>> steps.shape
(10,)
>>> steps[:,numpy.newaxis,numpy.newaxis].shape
(10, 1, 1)
>>> (steps[:,numpy.newaxis,numpy.newaxis] * start_points).shape
(10, 2, 3)
>>> (steps[:,numpy.newaxis,numpy.newaxis] * (end_points - start_points)) + start_points
array([[[ 1., 0., 0.],
[ 0., 1., 0.]],
[[ 2., 0., 0.],
[ 0., 2., 0.]],
[[ 3., 0., 0.],
[ 0., 3., 0.]],
[[ 4., 0., 0.],
[ 0., 4., 0.]],
[[ 5., 0., 0.],
[ 0., 5., 0.]],
[[ 6., 0., 0.],
[ 0., 6., 0.]],
[[ 7., 0., 0.],
[ 0., 7., 0.]],
[[ 8., 0., 0.],
[ 0., 8., 0.]],
[[ 9., 0., 0.],
[ 0., 9., 0.]],
[[ 10., 0., 0.],
[ 0., 10., 0.]]])
As you can see, this produces the correct answer :) In this case broadcasting (10,1,1) and (2,3) results in (10,2,3). What you had is broadcasting (10,1,1) and (1,2,3) which is exactly the same and also produces (10,2,3).
The code for the distance part of the question does not need newaxis: the inputs are num_points x num_dimensions, the output is num_points, so one dimension has to be removed. That is actually the axis you sum along. This should work:
line_dist = numpy.sqrt(numpy.sum((end_point - start_point) ** 2, axis=1))
Here numpy.sum(..., axis=1) means sum along that axis only, rather than over all elements: an ndarray with shape num_points x num_dimensions summed along axis=1 produces a result of shape (num_points,), which is correct.
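For example, with the start_points and end_points from the walkthrough above:
>>> numpy.sqrt(numpy.sum((end_points - start_points) ** 2, axis=1))
array([ 9.,  9.])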
EDIT: removed code example without broadcasting.
EDIT: fixed up order of indexes.
EDIT: added line_dist
I'm not through understanding all you wrote, but some things I can already tell you; maybe they help.
newaxis is a marker rather than a function (in fact, it is plain None). It is used to add an (unused) dimension to a multi-dimensional value. With it you can make a 3D value out of a 2D value (or even more). Each dimension already present in the input value must be represented by a colon : in the index (assuming you want to use all values, otherwise it gets complicated beyond our use case); the dimensions to be added are denoted by newaxis.
Example:
input is a one-dimensional vector (1D): 1,2,3
output shall be a matrix (2D).
There are two ways to accomplish this: the vector could fill the rows with one value each (a column vector), or the vector could fill just the first and only row of the matrix (a row vector). The first is created by vector[:,newaxis], the second by vector[newaxis,:]. Results of this:
>>> array([ 7,8,9 ])[:,newaxis]
array([[7],
[8],
[9]])
>>> array([ 7,8,9 ])[newaxis,:]
array([[7, 8, 9]])
(Dimensions of multi-dimensional values are represented by nesting of arrays of course.)
If you have more dimensions in the input, use the colon more than once (otherwise the deeper nested dimensions are simply ignored, i.e. the arrays are treated as simple values). I won't paste a representation of this here, as it won't clarify things due to the visual complexity when 3D and 4D values are written on a 2D display using nested brackets. I hope it gets clear anyway.
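A short illustration of the 2D case (my own example, not from the original answer): each existing dimension gets a colon, and newaxis marks where the new length-1 dimension goes.
>>> from numpy import array, newaxis
>>> m = array([[1, 2, 3], [4, 5, 6]])   # 2D input, shape (2, 3)
>>> m[:, :, newaxis].shape
(2, 3, 1)
>>> m[newaxis, :, :].shape
(1, 2, 3)
>>> m[:, newaxis, :].shape
(2, 1, 3)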
newaxis reshapes the array in such a way that, when you multiply, numpy uses broadcasting. Here is a good tutorial on broadcasting.
step[:, newaxis, newaxis] is the same as step.reshape((step.shape[0], 1, 1)) (if step is 1d). Either method of reshaping should be very fast, because reshaping arrays in numpy is very cheap: it just makes a view of the array. And you should only need to do it once.
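A minimal check of that equivalence, assuming step is 1d as in the question:
>>> import numpy as np
>>> step = np.arange(4) / 3.0
>>> step[:, np.newaxis, np.newaxis].shape
(4, 1, 1)
>>> np.array_equal(step[:, np.newaxis, np.newaxis], step.reshape((step.shape[0], 1, 1)))
True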

Numpy setting j, j+1, j+2 to a

Is there a short, efficient way of "glueing" two arrays together such that, if the arrays differ in length, the glued product takes the values of the longer array and fills them in between the values of the shorter one until the new product has the same length as the sum of the lengths of the two arrays? Or: is there a way to create an array where x = [a j j j b j j j], that is to say, take an array that has values [a b] and create a new one by filling 3 j's between each element of that array to get [a j j j b]?
There is the obvious way of doing this with a loop, since I know the size of the product beforehand, but I suspect there must be a more "numpyic" solution at hand.
It is easy to do when both arrays I want to "glue" are of the same size and the product is [a j b j c j], i.e. every other element, as can be seen here:
np.append(np.zeros((10,1)),np.ones((10,1)),1).reshape(2*10)
array([ 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0.,
1., 0., 1., 0., 1., 0., 1.])
but you cannot do
np.append(np.zeros((10,1)),np.ones((20,1)),1).reshape(20+10)
I apologize if the question isn't clear enough; please do tell me which parts I can clarify. My English is broken.
Assuming that len(A) == n and len(B) == N, and that N is a multiple of n, i.e. there is some integer m such that N = m*n, you can do:
import numpy as np
A = np.zeros(10)
B = np.ones(20)
n = len(A)
C = np.concatenate([A.reshape(n, 1), B.reshape(n, -1)], axis=1)
C = C.ravel()
This is pretty much what you have in the question, but the trick is to reshape B to be (n, m) instead of (N, 1), i.e. (10, 2) instead of (20, 1) in this case. The -1 in reshape is shorthand for "whatever will make it work"; it's a lazy way of writing B.reshape(n, len(B)//n).
Based on your question it seems like the array B might just be a homogeneous array (i.e. all(B == j)), in which case you could just do:
import numpy as np
A = np.zeros(10)
j = 1.
C = np.zeros([10, 3])
C[:, 0] = A
C[:, 1:] = j
C = C.ravel()
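For the [a j j j b j j j] pattern from the question, the same idea works with three filler columns (my own example values):
>>> import numpy as np
>>> A = np.array([5., 7.])
>>> j = 1.
>>> C = np.zeros((len(A), 4))   # one column for A plus three columns of j
>>> C[:, 0] = A
>>> C[:, 1:] = j
>>> C.ravel()
array([ 5.,  1.,  1.,  1.,  7.,  1.,  1.,  1.])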
