How can I use an array as a set of indices into another array? I have the following 2D array of six index pairs:
array([[2, 0],
       [3, 0],
       [3, 1],
       [5, 0],
       [5, 1],
       [5, 2]])
I want to use these pairs as indices and put the value 10 at the corresponding positions of a new zero matrix. The output should look like this:
array([[ 0,  0,  0],
       [ 0,  0,  0],
       [10,  0,  0],
       [10, 10,  0],
       [ 0,  0,  0],
       [10, 10, 10]])
So far I have tried this:
from numpy import *
a = array([[2,0],[3,0],[3,1],[5,0],[5,1],[5,2]])
b = zeros((6,3),dtype ='int32')
b[a] = 10
But this gives me the wrong output.
In [1]: import numpy as np
In [2]: a = np.array([[2,0],[3,0],[3,1],[5,0],[5,1],[5,2]])
In [3]: b = np.zeros((6,3), dtype='int32')
In [4]: b[a[:,0], a[:,1]] = 10
In [5]: b
Out[5]:
array([[ 0,  0,  0],
       [ 0,  0,  0],
       [10,  0,  0],
       [10, 10,  0],
       [ 0,  0,  0],
       [10, 10, 10]])
Why it works:
If you index b with two numpy arrays in an assignment,
b[x, y] = z
then think of NumPy as moving simultaneously over each element of x, each element of y, and each element of z (let's call them xval, yval and zval), and assigning to b[xval, yval] the value zval. When z is a constant, "moving over z" just returns the same value each time.
That's what we want, with x being the first column of a and y being the second column of a. Thus, choose x = a[:, 0], and y = a[:, 1].
b[a[:,0], a[:,1]] = 10
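For intuition, the assignment above is equivalent to this explicit loop (a sketch only; NumPy actually performs this in vectorized C code):

for xval, yval in zip(a[:, 0], a[:, 1]):
    b[xval, yval] = 10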
Why b[a] = 10 does not work
When you write b[a], think of NumPy as creating a new array by moving over each element of a (let's call each one idx) and placing in the new array the value of b[idx] at the location of idx in a.
idx is a value in a. So it is an int32. b is of shape (6,3), so b[idx] is a row of b of shape (3,). For example, when idx is
In [37]: a[1,1]
Out[37]: 0
b[a[1,1]] is
In [38]: b[a[1,1]]
Out[38]: array([0, 0, 0])
So
In [33]: b[a].shape
Out[33]: (6, 2, 3)
So let's repeat: NumPy is creating a new array by moving over each element of a and placing in the new array the value of b[idx] at the location of idx in a. As idx moves over a, an array of shape (6,2) would be created. But since b[idx] is itself of shape (3,), at each location in the (6,2)-shaped array, a (3,)-shaped value is being placed. The result is an array of shape (6,2,3).
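Spelled out as a loop, b[a] behaves like this sketch:

tmp = np.empty((6, 2, 3), dtype=b.dtype)
for i in range(6):
    for j in range(2):
        idx = a[i, j]        # an int32 value from a
        tmp[i, j] = b[idx]   # a (3,)-shaped row of b
# tmp is exactly what b[a] returns: shape (6, 2, 3)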
Now, when you make an assignment like
b[a] = 10
a temporary array of shape (6,2,3) with values b[a] is created, then the assignment is performed. Since 10 is a constant, this assignment places the value 10 at each location in the (6,2,3)-shaped array.
Then the values from the temporary array are reassigned back to b (see the reference in the docs). Thus the values in the (6,2,3)-shaped array are copied back to the (6,3)-shaped b array, overwriting each other along the way. The main point is that you do not obtain the assignments you desire.
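Concretely, after b[a] = 10 every row of b whose index appears anywhere in a (rows 0, 1, 2, 3 and 5 here) ends up all 10s, while row 4, which never appears in a, stays zero:

In [6]: b = np.zeros((6,3), dtype='int32')
In [7]: b[a] = 10
In [8]: b
Out[8]:
array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10],
       [10, 10, 10],
       [ 0,  0,  0],
       [10, 10, 10]])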
Related
How do I combine multiple column vectors into a matrix? For example, if I have three 10 x 1 vectors, how do I put them into a 10 x 3 matrix?
Here's what I've tried so far:
D0 =np.array([[np.cos(2*np.pi*f*time)],[np.sin(2*np.pi*f*time)],np.ones((len(time),1)).transpose()],'float').transpose()
This gives me something like this:
[[[ 1.00000000e+00 0.00000000e+00 1.00000000e+00]]
[[ 9.99999741e-01 7.19053432e-04 1.00000000e+00]]
[[ 9.99998966e-01 1.43810649e-03 1.00000000e+00]]
...
[[ 9.99998966e-01 -1.43810649e-03 1.00000000e+00]]
[[ 9.99999741e-01 -7.19053432e-04 1.00000000e+00]]
[[ 1.00000000e+00 -2.15587355e-14 1.00000000e+00]]]
But I don't think this is right; it looks more like an array of lists, and I couldn't do matrix multiplication with this form. I tried numpy.concatenate as well, but that didn't work for me either. Looking into stack next.
In Matlab notation, I need to get this into a form
D0 = [cos(2*pi*f*t1), sin(2*pi*f*t1), 1; cos(2*pi*f*t2), sin(2*pi*f*t2), 1; ...] etc
So that I can find the least squares solution s_hat:
s_hat = (D0^T D0)^-1(D0^T x)
where x is another input vector containing the samples of the sinusoid I'm trying to fit.
In Matlab, I could just type
D0 = [cos(2*pi*f*time), sin(2*pi*f*time), repmat(1, length(time), 1)]
to create the D0 matrix. How do I do this in python?
Thank you!
Here you have equivalent complete examples in Matlab and Python/NumPy:
% Matlab
f = 0.1;
time = [0; 1; 2; 3];
D0 = [cos(2*pi*f*time), sin(2*pi*f*time), repmat(1,length(time),1)]
# Python
import numpy as np
f = 0.1
time = np.array([0, 1, 2, 3])
D0 = np.array([np.cos(2*np.pi*f*time), np.sin(2*np.pi*f*time), np.ones(time.size)]).T
print(D0)
Note that unlike Matlab, Python/NumPy has no special syntax to distinguish rows from columns (, vs. ; in Matlab). Similarly, a 1D NumPy array has no notion of either being a "column" or "row" vector. When merging several 1D NumPy arrays into a single 2D array, as above, each 1D array ends up as a row in the 2D array. As you want them as columns, you need to transpose the 2D array, here accomplished simply by the .T attribute.
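To see that behavior in isolation, here's a small throwaway sketch (the names are mine, for illustration):

import numpy as np
c1, c2, c3 = np.arange(4), np.arange(4)**2, np.ones(4)
M = np.array([c1, c2, c3])   # three 1D arrays become the rows: shape (3, 4)
M.T                          # transposed, they become the columns: shape (4, 3)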
If the arrays really are (10,1) shape, then simply concatenate:
In [60]: x,y,z = np.ones((10,1),int), np.zeros((10,1),int), np.arange(10)[:,None]
In [61]: np.concatenate([x,y,z], axis=1)
Out[61]:
array([[1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 0, 3],
       [1, 0, 4],
       [1, 0, 5],
       [1, 0, 6],
       [1, 0, 7],
       [1, 0, 8],
       [1, 0, 9]])
If they are actually 1d, you'll have to fiddle with dimensions in one way or another. For example, reshape or add a dimension, as I did with z above. Or use some function that does that for you:
In [62]: x,y,z = np.ones((10,),int), np.zeros((10,),int), np.arange(10)
In [63]: z.shape
Out[63]: (10,)
In [64]: np.array([x,y,z]).shape
Out[64]: (3, 10)
In [65]: np.array([x,y,z]).T # transpose
Out[65]:
array([[1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 0, 3],
       [1, 0, 4],
       [1, 0, 5],
       [1, 0, 6],
       [1, 0, 7],
       [1, 0, 8],
       [1, 0, 9]])
np.array([...]) joins the arrays on a new initial dimension. Remember in Python/numpy the first dimension is the outermost one (MATLAB is the reverse).
stack variants tweak the dimensions, and then do concatenate:
In [66]: np.stack([x,y,z],axis=1).shape
Out[66]: (10, 3)
In [67]: np.column_stack([x,y,z]).shape
Out[67]: (10, 3)
In [68]: np.vstack([x,y,z]).shape
Out[68]: (3, 10)
===
D0 =np.array([[np.cos(2*np.pi*f*time)],[np.sin(2*np.pi*f*time)],np.ones((len(time),1)).transpose()],'float').transpose()
I'm guessing f is a scalar, and time is a 1d array (shape (10,))
[np.cos(2*np.pi*f*time)]
wraps a (10,) in [], which when turned into an array becomes (1,10) shape.
np.ones((len(time),1)).transpose() is (10,1) transposed to (1,10).
np.array(....) of these creates a (3,1,10) array. Transpose of that is (10,1,3).
If you drop the [] wrappers and the shapes that create (1,10) arrays:
D0 = np.array([np.cos(2*np.pi*f*time), np.sin(2*np.pi*f*time), np.ones(len(time))]).transpose()
would join 3 (10,) arrays to make (3,10), which then transposes to (10,3).
Alternatively,
D0 = np.concatenate([[np.cos(2*np.pi*f*time)], [np.sin(2*np.pi*f*time)], np.ones((1,len(time)))], axis=0)
joins the 3 (1,10) arrays to make a (3,10), which you can transpose.
I have a numpy array A of n 1x3 arrays where n is the total number of possible combinations of elements in the 1x3 arrays, where each element ranges from 0 to 50. That is,
A = [[0,0,0],[0,0,1]...[0,1,0]...[50,50,50]]
and
len(A) = 50*50*50 = 125000
I have a numpy array B of m 1x3 arrays where m = 10 million, and the arrays can have values belonging to the set described by A.
I want to count up how many of each combination is present in B, that is, how many times [0,0,0] appears in B, how many times [0,0,1] appears...how many times [50,50,50] appears. So far I have the following:
for i in range(len(A)):
    for j in range(len(B)):
        if np.array_equal(A[i], B[j]):
            y[i] += 1
where y keeps track of how many times the ith array occurs. So, y[0] is how many times [0,0,0] appeared in B, y[1] is how many times [0,0,1] appeared... y[124999] is how many times [50,50,50] appeared, etc.
The problem is this takes forever. It has to check 10 million entries, 125000 times. Is there a quicker and more efficient way to do this?
Here is a fast approach. It processes 10 million tuples out of range(50)^3 in a fraction of a second and is roughly 100 times faster than the next best solution (@Primusa's):
It uses the fact that there is a straightforward translation between such tuples and the numbers 0 through 50^3 - 1. (The mapping happens to be the same as the one between the rows of your A and the row numbers.) The functions np.ravel_multi_index and np.unravel_index implement this translation and its inverse.
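Concretely, the translation is just base-50 positional notation (a worked example, assuming row-major order):

# [i, j, k]  <->  i*50*50 + j*50 + k
# e.g. [2, 1, 0]  ->  2*2500 + 1*50 + 0  =  5050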
Once B is translated into numbers, their frequencies can be determined very efficiently using np.bincount. Below I reshape the result to get a 50x50x50 histogram, but that is just a matter of taste and can be left out. (I have taken the liberty of using only the numbers 0 through 49, so len(A) becomes exactly 125000):
>>> B = np.random.randint(0, 50, (10000000, 3))
>>> Br = np.ravel_multi_index(B.T, (50, 50, 50))
>>> result = np.bincount(Br, minlength=125000).reshape(50, 50, 50)
Let's look at a smaller example for demonstration:
>>> B = np.random.randint(0, 3, (10, 3))
>>> Br = np.ravel_multi_index(B.T, (3, 3, 3))
>>> result = np.bincount(Br, minlength=27).reshape(3, 3, 3)
>>>
>>> B
array([[1, 1, 2],
       [2, 1, 2],
       [2, 0, 0],
       [2, 1, 0],
       [2, 0, 2],
       [0, 0, 2],
       [0, 0, 2],
       [0, 2, 2],
       [2, 0, 0],
       [0, 2, 0]])
>>> result
array([[[0, 0, 2],
        [0, 0, 0],
        [1, 0, 1]],

       [[0, 0, 0],
        [0, 0, 1],
        [0, 0, 0]],

       [[2, 0, 1],
        [1, 0, 1],
        [0, 0, 0]]])
To query for example how many times [2,1,0] is in B one would do
>>> result[2,1,0]
1
As mentioned above: to convert between indices into your A and the actual rows of A (which are the indices into my result), np.ravel_multi_index and np.unravel_index can be used. Or you can leave out the last reshape (i.e. use result = np.bincount(Br, minlength=125000)); then the counts are indexed exactly the same as A.
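For example, a round trip through those two functions:

>>> np.ravel_multi_index((2, 1, 0), (50, 50, 50))
5050
>>> np.unravel_index(5050, (50, 50, 50))
(2, 1, 0)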
You can use a dict() to speed this up so you only pass through the 10 million entries once.
The first thing you want to do is change all the sublists in A to hashable objects, so you can use them as keys in a dict.
Converting all the sublists to tuples:
A = [tuple(i) for i in A]
Then create a dict() with every value in A as the key and the value being 0.
d = {i:0 for i in A}
Now, for each subarray in your numpy array, you just want to convert it to a tuple and increment d[that tuple] by 1:
for subarray in B:
    d[tuple(subarray)] += 1
d is now a dictionary where each key's value is how many times that key occurred in B.
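As an aside (my addition, not part of the original answer), collections.Counter collapses the build-and-count steps into one pass:

from collections import Counter
d = Counter(map(tuple, B))  # counts every row-tuple of B
d[(0, 0, 1)]                # frequency of [0, 0, 1]; 0 if absent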
You can find the unique rows of array B and their counts by calling np.unique over its first axis with return_counts=True. Then you can use broadcasting to find the indices of B's unique rows in A by calling the ndarray.all and ndarray.any methods on the proper axes. Then all you need is simple indexing:
In [82]: unique, counts = np.unique(B, axis=0, return_counts=True)
In [83]: indices = np.where((unique == A[:,None,:]).all(axis=2).any(axis=0))[0]
# Get items from A that exist in B
In [84]: unique[indices]
# Get the counts
In [85]: counts[indices]
Example:
In [86]: arr = np.array([[2 ,3, 4], [5, 6, 0], [2, 3, 4], [1, 0, 4], [3, 3, 3], [5, 6, 0], [2, 3, 4]])
In [87]: a = np.array([[2, 3, 4], [1, 9, 5], [3, 3, 3]])
In [88]: unique, counts = np.unique(arr, axis=0, return_counts=True)
In [89]: indices = np.where((unique == a[:,None,:]).all(axis=2).any(axis=0))[0]
In [90]: unique[indices]
Out[90]:
array([[2, 3, 4],
       [3, 3, 3]])
In [91]: counts[indices]
Out[91]: array([3, 1])
You can do this:
y=[np.where(np.all(B==arr,axis=1))[0].shape[0] for arr in A]
Here arr iterates over A; np.all checks which rows of B match it; np.where returns the positions of those matches as an array; and shape then gives the length of that array, in other words the desired frequency.
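Unrolled into an explicit loop for clarity, that one-liner amounts to:

y = []
for arr in A:
    rows = np.where(np.all(B == arr, axis=1))[0]  # indices of rows in B equal to arr
    y.append(rows.shape[0])                       # their count is the frequency of arr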
I have a 2D numpy array, A. I want to apply np.bincount() to each column of the matrix A to generate another 2D array B that is composed of the bincounts of each column of the original matrix A.
My problem is that np.bincount() is a function that takes a 1d array-like. It's not an array method like B = A.max(axis=1) for example.
Is there a more pythonic/numpythic way to generate this B array other than a nasty for-loop?
import numpy as np
states = 4
rows = 8
cols = 4
A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))
for x in range(A.shape[1]):
    B[:,x] = np.bincount(A[:,x])
Using the same philosophy as in this post, here's a vectorized approach:
m = A.shape[1]
n = A.max()+1
A1 = A + (n*np.arange(m))
out = np.bincount(A1.ravel(),minlength=n*m).reshape(m,-1).T
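A quick sanity check of the offsetting trick against the plain per-column loop (my own test on small random data, not part of the original answer):

A = np.random.randint(0, 4, (8, 4))
m, n = A.shape[1], A.max() + 1
out = np.bincount((A + n*np.arange(m)).ravel(), minlength=n*m).reshape(m, -1).T
ref = np.array([np.bincount(A[:, j], minlength=n) for j in range(m)]).T
assert (out == ref).all()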
I would suggest to use np.apply_along_axis, which will allow you to apply a 1D-method (in this case np.bincount) to 1D slices of a higher dimensional array:
import numpy as np
states = 4
rows = 8
cols = 4
A = np.random.randint(0,states,(rows,cols))
B = np.zeros((states,cols))
B = np.apply_along_axis(np.bincount, axis=0, arr=A)
You'll have to be careful, though. This (as well as your suggested for-loop) only works if the output of np.bincount has the right shape. If the maximum state is not present in one or more columns of your array A, the output for those columns will have a smaller length, and the code will fail with a ValueError.
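One way around that caveat (a sketch, assuming the number of states is known up front, as in the snippet above) is to pin the output length with minlength:

B = np.apply_along_axis(lambda col: np.bincount(col, minlength=states), axis=0, arr=A)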
This solution using the numpy_indexed package (disclaimer: I am its author) is fully vectorized, thus does not include any python loops behind the scenes. Also, there are no restrictions on the input; not every column needs to contain the same set of unique values.
import numpy_indexed as npi
rowidx, colidx = np.indices(A.shape)
(bin, col), B = npi.count_table(A.flatten(), colidx.flatten())
This gives an alternative (sparse) representation of the same result, which may be much more appropriate if the B array does indeed contain many zeros:
(bin, col), count = npi.count((A.flatten(), colidx.flatten()))
Note that apply_along_axis is just syntactic sugar for a for-loop, and has the same performance characteristics.
Yet another possibility:
import numpy as np
def bincount_columns(x, minlength=None):
    # Use enough bins for the largest value present, or minlength if that is larger.
    nbins = x.max() + 1
    if minlength is not None:
        nbins = max(nbins, minlength)
    ncols = x.shape[1]
    count = np.zeros((nbins, ncols), dtype=int)
    # For each element x[i, j], increment count[x[i, j], j].
    # np.add.at is unbuffered, so repeated indices all register.
    colidx = np.arange(ncols)[None, :]
    np.add.at(count, (x, colidx), 1)
    return count
For example,
In [110]: x
Out[110]:
array([[4, 2, 2, 3],
       [4, 3, 4, 4],
       [4, 3, 4, 4],
       [0, 2, 4, 0],
       [4, 1, 2, 1],
       [4, 2, 4, 3]])
In [111]: bincount_columns(x)
Out[111]:
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2]])
In [112]: bincount_columns(x, minlength=7)
Out[112]:
array([[1, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 3, 2, 0],
       [0, 2, 0, 2],
       [5, 0, 4, 2],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
I have a lot of data in a database in (x, y, value) triplet form.
I would like to be able to dynamically create a 2D numpy array from this data by setting the value at the coords (x, y) of the array.
For instance if I have :
(0,0,8)
(0,1,5)
(0,2,3)
(1,0,4)
(1,1,0)
(1,2,0)
(2,0,1)
(2,1,2)
(2,2,5)
The resulting array should be:
array([[8,5,3],[4,0,0],[1,2,5]])
I'm new to numpy. Is there any method in numpy to do so? If not, what approach would you advise?
Extending the answer from @MaxU, in case the coordinates are not ordered in a grid fashion (or in case some coordinates are missing), you can create your array as follows:
import numpy as np
a = np.array([(0,0,8), (0,1,5), (0,2,3),
              (1,0,4), (1,1,0), (1,2,0),
              (2,0,1), (2,1,2), (2,2,5)])
Here a represents your coordinates. It is an (N, 3) array, where N is the number of coordinates (it doesn't have to contain ALL the coordinates). The first column of a (a[:, 0]) contains the Y positions while the second column (a[:, 1]) contains the X positions. Similarly, the last column (a[:, 2]) contains your values.
Then you can extract the maximum dimensions of your target array:
# Maximum Y and X coordinates
ymax = a[:, 0].max()
xmax = a[:, 1].max()
# Target array
target = np.zeros((ymax+1, xmax+1), a.dtype)
And finally, fill the array with data from your coordinates:
target[a[:, 0], a[:, 1]] = a[:, 2]
The line above sets values in target at a[:, 0] (all Y) and a[:, 1] (all X) locations to their corresponding a[:, 2] value (your value).
>>> target
array([[8, 5, 3],
       [4, 0, 0],
       [1, 2, 5]])
Additionally, if you have missing coordinates, and you want to replace those missing values by some number, you can initialize the array as:
default_value = -1
target = np.full((ymax+1, xmax+1), default_value, a.dtype)
This way, the coordinates not present in your list will be filled with -1 in the target array.
Why not use sparse matrices? (Their coordinate format is pretty much the format of your triplets.)
First, split the triplets into rows, columns, and data using numpy.hsplit(). (Use numpy.squeeze() to convert the resulting 2D arrays to 1D arrays.)
>>> row, col, data = [np.squeeze(splt) for splt
...                   in np.hsplit(triplets, triplets.shape[-1])]
Use the sparse matrix in coordinate format, and convert it to an array.
>>> from scipy.sparse import coo_matrix
>>> coo_matrix((data, (row, col))).toarray()
array([[8, 5, 3],
       [4, 0, 0],
       [1, 2, 5]])
Is that what you want?
In [37]: a = np.array([(0,0,8)
   ....:               ,(0,1,5)
   ....:               ,(0,2,3)
   ....:               ,(1,0,4)
   ....:               ,(1,1,0)
   ....:               ,(1,2,0)
   ....:               ,(2,0,1)
   ....:               ,(2,1,2)
   ....:               ,(2,2,5)])
In [38]:
In [38]: a
Out[38]:
array([[0, 0, 8],
       [0, 1, 5],
       [0, 2, 3],
       [1, 0, 4],
       [1, 1, 0],
       [1, 2, 0],
       [2, 0, 1],
       [2, 1, 2],
       [2, 2, 5]])
In [39]:
In [39]: a[:, 2].reshape(3,len(a)//3)
Out[39]:
array([[8, 5, 3],
       [4, 0, 0],
       [1, 2, 5]])
or a bit more flexible (after your comment):
In [48]: a[:, 2].reshape([int(len(a) ** .5)] * 2)
Out[48]:
array([[8, 5, 3],
       [4, 0, 0],
       [1, 2, 5]])
Explanation:
this gives you the 3rd column (value):
In [42]: a[:, 2]
Out[42]: array([8, 5, 3, 4, 0, 0, 1, 2, 5])
In [49]: [int(len(a) ** .5)]
Out[49]: [3]
In [50]: [int(len(a) ** .5)] * 2
Out[50]: [3, 3]
How would I do the following in numpy?
1. Select all rows of an array containing not more than 50% zero values.
2. Select the first n (let's say 2) rows from all rows satisfying 1.
3. Do something and place the modified rows at the same indices of a zero array with the same shape as in 1.
The following results in an array where no new values are assigned:
In [177]:
a = np.array([[0,0,3],[4,5,6],[7,0,0],[10,11,12],[13,14,15]])
b = np.zeros_like(a)
a
Out[177]:
array([[ 0,  0,  3],
       [ 4,  5,  6],
       [ 7,  0,  0],
       [10, 11, 12],
       [13, 14, 15]])
In [178]:
# select all rows containing not more than 50% 0 values
percent = np.sum(a == 0, axis=-1) / float(a.shape[1])
percent = percent >= 0.5
slice = np.invert(percent).nonzero()[0]
In [183]:
# select first two rows satisfying 'slice'
a[slice][0:2]
Out[183]:
array([[ 4,  5,  6],
       [10, 11, 12]])
In [182]:
# do something and place modified rows on same index of zero array
b[slice][0:2] = a[slice][0:2] * 2
In [184]:
b
Out[184]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
The problem is that b[slice] creates a copy rather than a view (it triggers fancy indexing). The code b[slice][0:2] creates a view of this copy (not the original b!). Therefore...
b[slice][0:2] = a[slice][0:2] * 2
...is assigning the corresponding rows of a to a view of the copy of b.
Because it can lead to these situations, it's better not to chain indexing operations in this way. Instead, just compute the relevant row numbers for slice first and then do the assignment:
slice = np.invert(percent).nonzero()[0][:2] # first two rows
b[slice] = a[slice] * 2
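With the example arrays above, b then holds the doubled rows at the correct indices:

In [185]: b
Out[185]:
array([[ 0,  0,  0],
       [ 8, 10, 12],
       [ 0,  0,  0],
       [20, 22, 24],
       [ 0,  0,  0]])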