I have a little bit of a tricky problem here...
Given two arrays A and B
A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])
I would like to replace the values in B with the index of that value in A. In this example case, that would look like:
[1, 1, 3, 0, 2, 2, 2]
For the problem I'm working on, A and B contain the same set of values and all of the entries in A are unique.
The simple way to solve this is to use something like:
B_new = np.empty_like(B)
for idx in range(len(A)):
    ind = np.where(B == A[idx])[0]
    B_new[ind] = idx  # store the index into A, not the value
But the B array I'm working with contains almost a million elements and using a for loop gets super slow. There must be a way to vectorize this, but I can't figure it out. The closest I've come is to do something like
np.intersect1d(A, B, return_indices=True)
But this only gives me the first occurrence of each element of A in B. Any suggestions?
The solution from @mozway works well for small arrays but not for big ones, as it runs in O(n**2) time (i.e. quadratic time; see time complexity for more information). Here is a much better solution for big arrays, running in O(n log n) time (i.e. quasi-linear) and based on a fast binary search:
unique_values, index = np.unique(A, return_index=True)
result = index[np.searchsorted(unique_values, B)]
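As a minimal sketch, the two lines above can be wrapped in a helper and run on the arrays from the question. The name index_in is hypothetical, and the sketch assumes every value of B occurs in A; np.searchsorted does not check this and would return an insertion position for a missing value.

```python
import numpy as np

def index_in(A, B):
    """For each element of B, return its position in A (hypothetical helper)."""
    # np.unique sorts A; `order` maps each sorted value back to its index in A.
    unique_values, order = np.unique(A, return_index=True)
    # Binary-search every element of B in the sorted values.
    pos = np.searchsorted(unique_values, B)
    return order[pos]

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])
result = index_in(A, B)
print(result.tolist())  # [1, 1, 3, 0, 2, 2, 2]
```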
Use numpy broadcasting:
np.where(B[:, None]==A)[1]
NB. the values in A must be unique
Output:
array([1, 1, 3, 0, 2, 2, 2])
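To see why this works, and why it gets expensive for large inputs, here is a small sketch that exposes the intermediate boolean matrix. The variable names mask, rows and cols are only for illustration.

```python
import numpy as np

A = np.array([8, 5, 3, 7])
B = np.array([5, 5, 7, 8, 3, 3, 3])

# B[:, None] has shape (7, 1); comparing it with A (shape (4,))
# broadcasts to a (7, 4) boolean matrix, i.e. len(B) * len(A) entries.
mask = B[:, None] == A
print(mask.shape)  # (7, 4)

# Each row holds exactly one True because the values in A are unique,
# so the column index of that True is the position of B's value in A.
rows, cols = np.where(mask)
print(cols.tolist())  # [1, 1, 3, 0, 2, 2, 2]
```

With a million elements in B, the intermediate matrix has a million rows, which is where the time and memory go.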
Though I can't tell exactly what the complexity of this is, I believe it will perform quite well:
A.argsort()[np.unique(B, return_inverse = True)[1]]
array([1, 1, 3, 0, 2, 2, 2], dtype=int64)
i=np.arange(1,4,dtype=int)
a=np.arange(9).reshape(3,3)
and
a
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
a[:,0:1]
>>>array([[0],
[3],
[6]])
a[:,0:2]
>>>array([[0, 1],
[3, 4],
[6, 7]])
a[:,0:3]
>>>array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Now I want to vectorize the array to print them all together. I try
a[:,0:i]
or
a[:,0:i[:,None]]
It gives TypeError: only integer scalar arrays can be converted to a scalar index
Short answer:
[a[:,:j] for j in i]
What you are trying to do is not a vectorizable operation. Wikipedia defines vectorization as a batch operation on a single array, instead of on individual scalars:
In computer science, array programming languages (also known as vector or multidimensional languages) generalize operations on scalars to apply transparently to vectors, matrices, and higher-dimensional arrays.
...
... an operation that operates on entire arrays can be called a vectorized operation...
In terms of CPU-level optimization, the definition of vectorization is:
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
The problem with your case is that the result of each individual operation has a different shape: (3, 1), (3, 2) and (3, 3). They can not form the output of a single vectorized operation, because the output has to be one contiguous array. Of course, it can contain (3, 1), (3, 2) and (3, 3) arrays inside of it (as views), but that's what your original array a already does.
What you're really looking for is just a single expression that computes all of them:
[a[:,:j] for j in i]
... but it's not vectorized in the sense of performance optimization. Under the hood it's a plain old for loop that computes each item one by one.
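A short sketch of what the list comprehension produces, assuming the a and i from the question: three arrays of different shapes, each of which is a view into a rather than a copy. np.shares_memory is used here only to check that.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
i = np.arange(1, 4)

# One slice per stop value; the loop runs in Python, not inside NumPy.
slices = [a[:, :j] for j in i]

shapes = [s.shape for s in slices]
print(shapes)  # [(3, 1), (3, 2), (3, 3)]

# Basic slicing returns views, so no element data is copied.
shares = [np.shares_memory(a, s) for s in slices]
print(shares)  # [True, True, True]
```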
I ran into this problem when using numpy.concatenate to emulate a C++-like push_back for 2D vectors. If A and B are two 2D numpy arrays, then numpy.concatenate(A,B) raises the error.
The fix was simply to add the missing brackets: numpy.concatenate( ( A,B ) ). They are required because the arrays to be concatenated make up a single argument.
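A minimal sketch of both calls, using two small made-up arrays. In the incorrect call B ends up in the axis parameter, which has to be an integer, hence the error.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6]])

# Correct: the arrays are passed together as one tuple.
stacked = np.concatenate((A, B))  # default axis=0 appends B's rows to A
print(stacked.shape)  # (3, 2)

# Incorrect: B is taken as the `axis` argument.
try:
    np.concatenate(A, B)
except TypeError as exc:
    print("TypeError:", exc)
```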
This could be unrelated to this specific problem, but I ran into a similar issue where I used NumPy indexing on a Python list and got the same exact error message:
# incorrect
weights = list(range(1, 129)) + list(range(128, 0, -1))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
# TypeError: only integer scalar arrays can be converted to a scalar index
It turns out I needed to turn weights, a 1D Python list, into a NumPy array before I could apply multi-dimensional NumPy indexing. The code below works:
# correct
weights = np.array(list(range(1, 129)) + list(range(128, 0, -1)))
mapped_image = weights[image[:, :, band]] # image.shape = [800, 600, 3]
Try the following to change your array to 1D:
a.reshape(-1)
You can use numpy.ravel to return a flattened array from n-dimensional array:
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> a.ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
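For comparison, a small sketch of the common flattening calls on the same a. It only illustrates shapes and whether memory is shared; which one to use depends on whether you need a copy.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

flat_view = a.ravel()      # 1-D; a view when the memory layout allows it
flat_copy = a.flatten()    # 1-D; always a copy
row_2d = a.reshape(1, -1)  # still 2-D, shape (1, 9)

print(flat_view.shape, flat_copy.shape, row_2d.shape)  # (9,) (9,) (1, 9)
print(np.shares_memory(a, flat_view))  # True
print(np.shares_memory(a, flat_copy))  # False
```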
I had a similar problem and solved it using list...not sure if this will help or not
classes = list(unique_labels(y_true, y_pred))
This problem arises when a vector is used where a scalar is expected.
For example, in a for loop the argument to range should be a scalar; if you pass a vector there, you get this error. To avoid the problem, use the length of the vector instead.
I ran across this error while trying to access elements of a list using a 1-D array. I was pointed to this page but didn't find the answer I was looking for.
Let l be the list and myarray be my 1D array. The correct way to access list l using elements of myarray is
np.take(l,myarray)
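A quick sketch with made-up values for l and myarray, showing np.take next to two equivalent alternatives: converting the list to an array first, or looping when a plain list result is wanted.

```python
import numpy as np

l = [10, 20, 30, 40]
myarray = np.array([3, 0, 2])

taken = np.take(l, myarray)       # converts l to an array internally
indexed = np.asarray(l)[myarray]  # explicit conversion, then fancy indexing
picked = [l[k] for k in myarray]  # stays a plain Python list

print(taken.tolist())  # [40, 10, 30]
```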
I feel silly, because this is such a simple thing, but I haven't found the answer either here or anywhere else.
Is there no straightforward way of indexing a numpy array with another?
Say I have a 2D array
>> A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
if I want to access element [3,1] I type
>> A[3,1]
8
Now, say I store this index in an array
>> ind = np.array([3,1])
and try using the index this time:
>> A[ind]
array([[7, 8],
[3, 4]])
the result is not A[3,1]
The question is: having arrays A and ind, what is the simplest way to obtain A[3,1]?
Just use a tuple:
>>> A[(3, 1)]
8
>>> A[tuple(ind)]
8
The A[] actually calls the special method __getitem__:
>>> A.__getitem__((3, 1))
8
and using a comma creates a tuple:
>>> 3, 1
(3, 1)
Putting these two basic Python principles together solves your problem.
You can store your index in a tuple in the first place, if you don't need NumPy array features for it.
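As a runnable sketch of the difference, using the A and ind from the question: an array index selects whole rows, while the same values as a tuple address a single element.

```python
import numpy as np

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
ind = np.array([3, 1])

rows = A[ind]            # array index: rows 3 and 1
element = A[tuple(ind)]  # tuple index: same as A[3, 1]

print(rows.tolist())  # [[7, 8], [3, 4]]
print(element)        # 8
```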
That is because by giving an array you actually ask
A[[3,1]]
This returns rows 3 and 1 of the 2D array, instead of the element at row 3, column 1 that you want.
You can use
A[ind[0],ind[1]]
If you want to look up several elements at the same time, you can also use:
A[indx,indy]
Where indx and indy are numpy arrays of indexes for the first and second dimension accordingly.
See here for all possible indexing methods for numpy arrays: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html
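A small sketch of looking up several elements at once. The values in indx, indy and coords are made up for illustration; coords shows one way to handle indices stored as an array of (row, column) pairs.

```python
import numpy as np

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])

# One array per dimension: picks A[3, 1], A[0, 0] and A[2, 1].
indx = np.array([3, 0, 2])
indy = np.array([1, 0, 1])
picked = A[indx, indy]
print(picked.tolist())  # [8, 1, 6]

# The same lookups stored as (row, column) pairs, one pair per row.
coords = np.array([[3, 1], [0, 0], [2, 1]])
same = A[tuple(coords.T)]  # transpose, then split into one array per axis
print(same.tolist())  # [8, 1, 6]
```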
I want to multiply each element of a list with each element of another list.
lst1 = [1, 2, 1, 2]
lst2 = [2, 2, 2]
lst3 = []
for item in lst1:
    for i in lst2:
        rs = i * item
        lst3.append(rs)
This works, but it is very inefficient on large datasets and the loop can take a very long time to complete. Note that the lengths of both lists can vary.
I am fine with using non-built-in data structures. I checked numpy, and ndarray seems to support something called broadcasting, but I am not sure whether it is the way to go. So far, multiplying an array by a scalar works as expected.
arr = np.arange(3)
arr * 2
This returns:
array([0, 2, 4])
But the way it works with another array is a bit different, and I can't seem to achieve the above.
I guess it must be something straightforward, but I can't seem to find the exact solution at the moment. Any input will be highly appreciated. Thanks.
Btw, there is a similar question for Scheme, without considering efficiency, here
Edit: Thanks for your answers. Multiplication works, see Dval's answer. However, I also need to do addition and possibly division in exactly the same way. For that reason, I updated the question a bit.
Edit: I can work with numpy array itself, so I don't need to convert list to array and back.
NumPy is the way to go, specifically numpy.outer, which returns the product of every pair of elements as a matrix. Using .flatten() collapses it to 1D.
import numpy
lst1 = numpy.array([1, 2, 1, 2])
lst2 = numpy.array([2, 2, 2])
numpy.outer(lst1, lst2).flatten()
To add to the updated question, addition seems to work in a similar way:
numpy.add.outer(lst1, lst2).flatten()
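The same pattern should cover the division mentioned in the question, since every binary NumPy ufunc has an .outer method. A sketch with the lists from the question:

```python
import numpy as np

lst1 = np.array([1, 2, 1, 2])
lst2 = np.array([2, 2, 2])

# ufunc.outer applies the operation to every (lst1[i], lst2[j]) pair.
products = np.multiply.outer(lst1, lst2).ravel()  # same values as np.outer
sums = np.add.outer(lst1, lst2).ravel()
quotients = np.divide.outer(lst1, lst2).ravel()

print(products.tolist())  # [2, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 4]
print(sums.tolist())      # [3, 3, 3, 4, 4, 4, 3, 3, 3, 4, 4, 4]
```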
Linear operations on arrays like this are the meat-n-potatoes of numpy. Once you have defined the arrays, matrix like operations on them are easy, and relatively fast. That includes outer products and inner (matrix) products, as well as element by element operations.
For example:
In [133]: a=np.array([1,2,1,2])
In [134]: b=np.array([2,2,2])
A list comprehension version of your double loop:
In [135]: [i*j for i in a for j in b]
Out[135]: [2, 2, 2, 4, 4, 4, 2, 2, 2, 4, 4, 4]
A numpy product using broadcasting. Think of a[:,None] as turning a into a column vector.
In [136]: a[:,None]*b
Out[136]:
array([[2, 2, 2],
[4, 4, 4],
[2, 2, 2],
[4, 4, 4]])
element by element division also works
In [137]: a[:,None]/b
Out[137]:
array([[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])
But this gets more useful when combining operations.
There is overhead in converting lists to arrays, so I wouldn't recommend it for small occasional calculations.
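As an illustration of combining operations, here is a sketch that mixes multiplication, addition and division in one broadcast expression. The formula itself is arbitrary and only meant to show the mechanics.

```python
import numpy as np

a = np.array([1, 2, 1, 2])
b = np.array([2, 2, 2])

col = a[:, None]  # shape (4, 1), broadcasts against b of shape (3,)

# Every step works on the whole (4, 3) grid of pairs at once.
combined = (col * b + col) / b
print(combined.shape)  # (4, 3)
print(combined[:2].tolist())  # [[1.5, 1.5, 1.5], [3.0, 3.0, 3.0]]

flat = combined.ravel()  # 1-D, in the same order as the nested loops
```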
Use numpy - it's a library designed for complex matrix-based arithmetic.
import numpy
lst1 = numpy.array([1, 2, 1, 2])
lst2 = numpy.array([2, 2, 2])
numpy.outer(lst1, lst2)
I'm really confused by the index logic of numpy arrays with several dimensions. Here is an example:
import numpy as np
A = np.arange(18).reshape(3,2,3)
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17]]])
this gives me an array of shape (3,2,3), call them (x,y,z) for sake of argument. Now I want an array B with the elements from A corresponding to x = 0,2 y =0,1 and z = 1,2. Like
array([[[ 1, 2],
[4, 5]],
[[13, 14],
[16, 17]]])
Naively I thought that
B=A[[0,2],[0,1],[1,2]]
would do the job. But it gives
array([ 1, 17])
and does not work.
A[[0,2],:,:][:,:,[1,2]]
does the job. But I still wonder what's wrong with my first try. And what is the best way to do what I want to do?
There are two types of indexing in NumPy: basic and advanced. Basic indexing uses tuples of slices and integers for indexing, and does not copy the array, but rather creates a view with adjusted strides. Advanced indexing, in contrast, also uses lists or arrays of indices and copies the array.
Your first attempt
B = A[[0, 2], [0, 1], [1, 2]]
uses advanced indexing. In advanced indexing, all index lists are first broadcast to the same shape, and this shape is used for the output array. In this case, they already have the same shape, so the broadcasting does not do anything. The output array will also have this shape of two entries. The first entry of the output array is obtained by using all first indices of the three lists, and the second by using all second indices:
B = numpy.array([A[0, 0, 1], A[2, 1, 2]])
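A quick check of this pairing behaviour, using the A from the question:

```python
import numpy as np

A = np.arange(18).reshape(3, 2, 3)

# The three lists are matched up element by element:
# (0, 0, 1) and (2, 1, 2), giving two scalars rather than a 2x2x2 block.
paired = A[[0, 2], [0, 1], [1, 2]]
manual = np.array([A[0, 0, 1], A[2, 1, 2]])

print(paired.tolist())  # [1, 17]
```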
Your second approach
B = A[[0,2],:,:][:,:,[1,2]]
does work, but it is inefficient. It uses advanced indexing twice, so your data will be copied twice.
To get what you actually want with advanced indexing, you can use
A[np.ix_([0,2],[0,1],[1,2])]
as pointed out by nikow. This will copy the data only once.
In your example, you can get away without copying the data at all, using only basic indexing:
B = A[::2, :, 1:]
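The sketch below compares the np.ix_ route with a pure slicing route on the A from the question. It assumes the wanted block is x = 0, 2, all y, and z = 1, 2, and uses np.shares_memory to show which result is a view. The name expected is only for the check.

```python
import numpy as np

A = np.arange(18).reshape(3, 2, 3)
expected = np.array([[[1, 2], [4, 5]], [[13, 14], [16, 17]]])

# np.ix_ reshapes each list so that the three broadcast to a (2, 2, 2) grid.
ix = np.ix_([0, 2], [0, 1], [1, 2])
print([idx.shape for idx in ix])  # [(2, 1, 1), (1, 2, 1), (1, 1, 2)]

copied = A[ix]        # advanced indexing: one copy
view = A[::2, :, 1:]  # basic indexing: a view, no copy

print(np.array_equal(copied, expected), np.array_equal(view, expected))  # True True
print(np.shares_memory(A, copied), np.shares_memory(A, view))            # False True
```

The slicing route only works here because the wanted indices happen to be evenly spaced along each axis; np.ix_ handles arbitrary index lists.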
I recommend the following advanced tutorial, which explains the various indexing methods: NumPy MedKit
Once you understand the powerful ways to index arrays (and how they can be combined) it will make sense. If your first try was valid then this would collide with some of the other indexing techniques (reducing your options in other use cases).
In your example you can exploit that the third index covers a continuous range:
A[[0,2],:,1:]
You could also use
A[np.ix_([0,2],[0,1],[1,2])]
which is handy in more general cases, when the latter indices are not continuous. np.ix_ simply constructs three index arrays.
As Sven pointed out in his answer, there is a more efficient way in this specific case (using a view instead of a copied version).
Edit: As pointed out by Sven my answer contained some errors, which I have removed. I still think that his answer is better, but unfortunately I can't delete mine now.
If you wanted
array([[[ 1, 2],
[ 4, 5]],
[[13, 14],
[16, 17]]])
you can use
A[(0,2),:,1:]
The general pattern is A[indices you want, rows you want, columns you want].