I am trying to access column values, but indexing a 2-D NumPy array gives me row values instead.
The general format is arr_2d[row][col] or arr_2d[row,col]. Comma notation is recommended for clarity.
arr_2d = np.arange(0,9).reshape((3,3))
# sub array
arr_2d[0:2,1:]
arr_2d[:,0] # returned as a 1-D array, but the data is the first column.
I want to access column data, but this gives me row values:
arr_2d[:][0] # gives the first row.
What is the difference between comma notation and bracket notation?
arr_2d[row][col] only works as you intended, i.e. like arr_2d[row,col], if you pass an integer as the row index, not a slice.
For example:
>>> arr_2d = np.arange(0,9).reshape((3,3))
>>> arr_2d[1][2]
5
>>> arr_2d[1,2]
5
But:
>>> arr_2d[:][2]
array([6, 7, 8])
This is because arr_2d[:] is simply a view of the entire original array (not a copy), with the same contents:
>>> arr_2d[:]
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> arr_2d
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
and:
>>> arr_2d[2]
array([6, 7, 8])
# so no surprises here:
>>> arr_2d[:][2]
array([6, 7, 8])
The notation arr_2d[:,0] translates to "select all items in dimension 0, and the first item in dimension 1", amounting to the entire first column (item 0 of all rows).
The notation arr_2d[:][0] is chaining two operations:
arr_2d[:] means select all items in dimension 0 - basically referring to the entire matrix.
[0] simply selects the first item in the matrix returned by the first operation - returning the first row.
In order to select the first row, you can use either arr_2d[0] or more 'verbosely' arr_2d[0, :] (which translates to "all columns of the first row").
You could access the same items using both notations, but in different ways. For example -
In order to select the 3rd item in the 2nd row you could use:
Comma notation - arr_2d[1, 2]
Bracket notation - arr_2d[1][2]
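As a minimal sketch of the difference described above (the variable names are only illustrative), the snippet below compares the single comma-indexing operation with the chained bracket form, and checks that arr_2d[:] shares memory with the original array:

```python
import numpy as np

arr_2d = np.arange(9).reshape(3, 3)

first_col = arr_2d[:, 0]   # one indexing operation over both axes
first_row = arr_2d[:][0]   # two chained operations: full view, then item 0 of it

# arr_2d[:] is a view, so it shares its data buffer with arr_2d.
view = arr_2d[:]
shares = np.shares_memory(view, arr_2d)

print(first_col)  # [0 3 6]
print(first_row)  # [0 1 2]
print(shares)     # True
```

Because the first step of arr_2d[:][0] returns the whole matrix again, the second step behaves exactly like arr_2d[0].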
I need to access the first element of each list within a list. Usually I do this via numpy arrays by indexing:
import numpy as np
nparr=np.array([[1,2,3],[4,5,6],[7,8,9]])
first_elements = nparr[:,0]
b/c:
print(nparr[0,:])
[1 2 3]
print(nparr[:,0])
[1 4 7]
Unfortunately I have to tackle non-rectangular dynamic arrays now so numpy won't work.
But Python's standard lists behave strangely (at least for me):
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0][:])
[1, 2, 3]
print(pylist[:][0])
[1, 2, 3]
I guess either lists don't support this (which would lead to a second question: what to use instead?) or I got the syntax wrong?
You have a few options. Here's one.
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0]) # [1, 2, 3]
print([row[0] for row in pylist]) # [1, 4, 7]
Alternatively, if you want to transpose pylist (make its rows into columns), you could do the following.
pylist_transpose = [*zip(*pylist)]
print(pylist_transpose[0]) # [1, 4, 7]
pylist_transpose will always be a rectangular array with a number of rows equal to the length of the shortest row in pylist.
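Since the question mentions non-rectangular data, here is a small sketch with a hypothetical ragged list (the name ragged and its contents are made up for illustration). It shows the list comprehension, the truncating behaviour of zip, and itertools.zip_longest as one possible way to pad short rows instead:

```python
from itertools import zip_longest

# Hypothetical ragged data: rows of different lengths.
ragged = [[1, 2, 3], [4, 5], [6]]

# Column 0 exists in every non-empty row, so a comprehension is enough.
first_elements = [row[0] for row in ragged if row]

# zip stops at the shortest row; zip_longest pads short rows instead.
truncated = [list(col) for col in zip(*ragged)]
padded = [list(col) for col in zip_longest(*ragged, fillvalue=None)]

print(first_elements)  # [1, 4, 6]
print(truncated)       # [[1, 4, 6]]
print(padded)          # [[1, 4, 6], [2, 5, None], [3, None, None]]
```

Whether padding with None is appropriate depends on what the missing entries should mean in your data.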
I have 3 numpy.ndarray vectors: X, Y and intensity. I would like to combine them into a NumPy array, then sort by the third column (or the first one). I tried the following code:
m=np.column_stack((X,Y))
m=np.column_stack((m,intensity))
m=np.sort(m,axis=2)
Then I got the error: ValueError: axis(=2) out of bounds.
When I print m, I get:
array([[ 109430, 285103, 121],
[ 134497, 284907, 134],
[ 160038, 285321, 132],
...,
[12374406, 2742429, 148],
[12371858, 2741994, 148],
[12372221, 2742017, 161]])
How can I fix it, that is, get a sorted array?
axis=2 does not refer to the column index but to a dimension of the array. NumPy will look for a third dimension in the data and try to sort along it, but m has only two dimensions (axes 0 and 1), hence the error. Sorting along the first dimension (axis=0) sorts the values within each column from smallest to largest, going down the rows. Sorting along the second dimension (axis=1) sorts the values within each row from smallest to largest, going across the columns. Examples are given below.
Furthermore, sort works differently depending on the kind of array. Two kinds are considered: unstructured and structured.
Unstructured
import numpy as np
X = np.random.randn(10)
Y = np.random.randn(10)
intensity = np.random.randn(10)
m=np.column_stack((X,Y))
m=np.column_stack((m,intensity))
m is being treated as an unstructured array because there are no fields linked to any of the columns. In other words, if you call np.sort() on m, it will just sort them from smallest to largest from top to bottom if axis=0 and left to right if axis=1. The rows are not being preserved.
Original:
[[ 1.20122251 1.41451461 -1.66427245]
[ 1.3657312 -0.2318793 -0.23870104]
[-0.30280613 0.79123814 -1.64082042]]
Axis=1:
[[-1.66427245 1.20122251 1.41451461]
[-0.23870104 -0.2318793 1.3657312 ]
[-1.64082042 -0.30280613 0.79123814]]
Axis = 0:
[[-0.30280613 -0.2318793 -1.66427245]
[ 1.20122251 0.79123814 -1.64082042]
[ 1.3657312 1.41451461 -0.23870104]]
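The same behaviour can be reproduced with small integers, which are easier to follow by eye. This is only a sketch with made-up data; it also shows that asking for axis=2 on a 2-D array raises an error that can be caught as ValueError:

```python
import numpy as np

m = np.array([[3, 1, 2],
              [9, 7, 8],
              [6, 4, 5]])

print(m.ndim)                 # 2, so the only valid axes are 0 and 1

down = np.sort(m, axis=0)     # each column sorted independently, top to bottom
across = np.sort(m, axis=1)   # each row sorted independently, left to right

try:
    np.sort(m, axis=2)
    axis_error = None
except ValueError as exc:     # NumPy's AxisError is a subclass of ValueError
    axis_error = exc

print(down)
print(across)
print(axis_error)
```

In both sorted results the original rows are broken up, which is why a plain np.sort is usually not what you want for tabular data.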
Structured
As you can see, the rows are not kept together. If you would like to keep each row intact, you need to add labels to the datatypes and create a structured array from them. You can then sort by any column with order=label_name.
dtype = [("a",float),("b",float),("c",float)]
m = [tuple(x) for x in m]
labelled_arr = np.array(m,dtype)
print(np.sort(labelled_arr,order="a"))
This will get:
[(-0.30280612629541204, 0.7912381363389004, -1.640820419927318)
(1.2012225144719493, 1.4145146097431947, -1.6642724545574712)
(1.3657312047892836, -0.23187929505306418, -0.2387010374198555)]
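The order argument also accepts a list of field names, which is handy when the first field contains ties. A minimal sketch with illustrative data (the values are invented so that column "a" has a repeated entry):

```python
import numpy as np

# Illustrative data; the repeated value in field "a" forces a tie.
rows = [(2.0, 9.0, 1.0),
        (1.0, 5.0, 3.0),
        (2.0, 4.0, 2.0)]
dtype = [("a", float), ("b", float), ("c", float)]
labelled = np.array(rows, dtype=dtype)

# Sort by "a" first; rows with equal "a" are then ordered by "b".
by_a_then_b = np.sort(labelled, order=["a", "b"])

print(by_a_then_b["a"].tolist())  # [1.0, 2.0, 2.0]
print(by_a_then_b["b"].tolist())  # [5.0, 4.0, 9.0]
print(by_a_then_b["c"].tolist())  # [3.0, 2.0, 1.0]
```

Each record moves as a unit, so the values in "c" follow their rows.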
Another more convenient way of doing this would be passing the data into a pandas dataframe which automatically creates column names from 0 to n-1. Then you can just call the sort_values method and pass in the column index you want and follow it by axis=0 if you would like it to be sorted from top to bottom just like in numpy.
Example:
import pandas as pd
pd.DataFrame(m).sort_values(0,axis = 0)
Output:
0 1 2
2 -0.302806 0.791238 -1.640820
0 1.201223 1.414515 -1.664272
1 1.365731 -0.231879 -0.238701
You are getting that error because your array has no axis 2. Axes are zero-indexed, so a 2-D array only has axes 0 and 1. Regardless, np.sort will sort every column, or every row, independently. Consider from the docs:
order : str or list of str, optional When a is an array with fields
defined, this argument specifies which fields to compare first,
second, etc. A single field can be specified as a string, and not all
fields need be specified, but unspecified fields will still be used,
in the order in which they come up in the dtype, to break ties.
For example:
In [28]: a
Out[28]:
array([[0, 0, 1],
[1, 2, 3],
[3, 1, 8]])
In [29]: np.sort(a, axis = 0)
Out[29]:
array([[0, 0, 1],
[1, 1, 3],
[3, 2, 8]])
In [30]: np.sort(a, axis = 1)
Out[30]:
array([[0, 0, 1],
[1, 2, 3],
[1, 3, 8]])
So, I think what you really want is this neat little idiom:
In [32]: a[a[:,2].argsort()]
Out[32]:
array([[0, 0, 1],
[1, 2, 3],
[3, 1, 8]])
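Applied to the first three rows printed in the question, the argsort idiom might look like the sketch below. The kind="stable" argument and the descending variant are additions for illustration, not part of the original answer:

```python
import numpy as np

# First three rows of the array printed in the question.
m = np.array([[109430, 285103, 121],
              [134497, 284907, 134],
              [160038, 285321, 132]])

order = m[:, 2].argsort(kind="stable")  # row order that sorts column 2
by_intensity = m[order]                  # whole rows move together
by_intensity_desc = m[order[::-1]]       # reversed order for descending sort

print(order.tolist())  # [0, 2, 1]
print(by_intensity)
```

Unlike np.sort(m, axis=0), this keeps every (X, Y, intensity) triple intact.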
I need a good explanation (reference) to explain NumPy slicing within (for) loops. I have three cases.
def example1(array):
    for row in array:
        row = row + 1
    return array
def example2(array):
    for row in array:
        row += 1
    return array
def example3(array):
    for row in array:
        row[:] = row + 1
    return array
A simple case:
ex1 = np.arange(9).reshape(3, 3)
ex2 = ex1.copy()
ex3 = ex1.copy()
returns:
>>> example1(ex1)
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> example2(ex2)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> example3(ex3)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
It can be seen that the first result differs from the second and third.
First example:
You extract a row and add 1 to it. Then you redefine the pointer row but not what the array contains! So it will not affect the original array.
Second example:
You make an in-place operation - obviously this will affect the original array - as long as it is an array.
If you were doing a double loop it wouldn't work anymore:
def example4(array):
    for row in array:
        for column in row:
            column += 1
    return array
example4(np.arange(9).reshape(3,3))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
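This doesn't work because column is not an array: iterating over a row yields individual NumPy scalars, which are immutable, so column += 1 creates a new value and rebinds the name column instead of calling np.ndarray's __iadd__ (which modifies the data the array points to). So the second example only works because your rows are numpy arrays.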
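If an element-wise double loop is really needed, one possible workaround is to assign through an index on the array itself. The following is a small sketch under that assumption (the name arr is illustrative):

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# Elements pulled out of the array are NumPy scalars, not views into it.
element = arr[0][0]
is_scalar = isinstance(element, np.integer)

# Writing through an index on the array itself does modify the data.
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        arr[i, j] += 1

print(is_scalar)  # True
print(arr)
```

In practice the loop-free form arr += 1 does the same thing and is much faster.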
Third example:
row[:] = row + 1 is interpreted as something like row[0] = row[0]+1, row[1] = row[1]+1, ... Again this works in place, so it affects the original array.
Bottom Line
If you are operating on mutable objects, like lists or np.ndarray you need to be careful what you change. Such an object only points to where the actual data is stored in memory - so changing this pointer (example1) doesn't affect the saved data. You need to follow the pointer (either directly by [:] (example3) or indirectly with array.__iadd__ (example2)) to change the saved data.
In the first code, you don't do anything with the new computed row; you rebind the name row, and there is no connection to the array anymore.
In the second and the third, you don't rebind, but assign values to the old variable. With += some internal function is called, which varies depending on the type of the object you let it act upon. See links below.
If you write row + 1 on the right hand side, a new array is computed. In the first case, you tell python to give it the name row (and forget the original object which was called row before). And in the third, the new array is written to the slice of the old row.
For further reading, follow the link in the comment on the question by @Thiru above. Or read about assignment and rebinding in general...
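The three cases can be checked directly on a single row. This is only an illustrative sketch; np.shares_memory is used here to tell a new array apart from a view:

```python
import numpy as np

array = np.zeros((2, 3), dtype=int)
row = array[0]                 # a view into the first row

rebound = row + 1              # new array; `row = row + 1` would bind the name to it
rebound_shares = np.shares_memory(rebound, array)

row += 1                       # in place: writes through the view
after_iadd = array[0].tolist()

row[:] = row + 1               # slice assignment copies new values into the view
after_slice = array[0].tolist()

print(rebound_shares)  # False
print(after_iadd)      # [1, 1, 1]
print(after_slice)     # [2, 2, 2]
```

Only the last two operations touch the memory owned by array, which matches the outputs of example2 and example3.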
One interesting question:
I would like to delete some elements from a numpy array. As the simplified example code below shows, it works as long as I don't delete the last element, but it fails if I try to delete the last element.
Below code works fine:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,4,1]:
    values = np.delete(values,i)
print(values)
The output is:
[0 1 2 3 4 5]
[0 2 4]
If we only change 4 to 5, then it will fail:
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,i)
print(values)
The error message:
IndexError: index 5 is out of bounds for axis 0 with size 5
Why does this error only happen when deleting the last element? What's the correct way to do such tasks?
Keep in mind that np.delete(arr, ind) deletes the element at index ind NOT the one with that value.
This means that as you delete things, the array is getting shorter. So you start with
values = np.array([0,1,2,3,4,5])
values = np.delete(values, 3)
# values is now [0,1,2,4,5]: the element at index 3 was deleted, so only 5 elements remain
# the next call tries to delete the element at index 5, but the array indices only go from 0-4
np.delete(values, 5) # raises IndexError
One way to solve the problem is to sort the indices that you want to delete in descending order (if you really want to delete from the array).
inds_to_delete = sorted([3,1,5], reverse=True) # [5,3,1]
# then delete in order of largest to smallest ind
for ind in inds_to_delete:
    values = np.delete(values, ind)
Or:
inds_to_keep = np.array([0,2,4])
values = values[inds_to_keep]
A probably faster way (because you don't need to delete every single value but all at once) is using a boolean mask:
values = np.array([0,1,2,3,4,5])
tobedeleted = np.array([False, True, False, True, False, True])
# Indices 1, 3 and 5 are True, so those elements will be dropped.
values_deleted = values[~tobedeleted]
#that just gives you what you want.
Using a boolean mask is recommended in the NumPy reference for np.delete.
To your question: you delete one element, so the array gets shorter and index 5 is no longer in the array, because the former index 5 now has index 4. Delete in descending order if you want to use np.delete.
If you really want to delete with np.delete use the shorthand:
np.delete(values, [3,5,1])
If you want to delete where the values are (not the index) you have to alter the procedure a bit. If you want to delete all values 5 in your array you can use:
values[values != 5]
or with multiple values to delete:
to_delete = (values == 5) | (values == 3) | (values == 1)
values[~to_delete]
All of these give you the desired result. I'm not sure what your data really looks like, so I can't say for sure which will be the most appropriate.
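To make the comparison concrete, here is a sketch that applies three index-based approaches to the array from the question and arrives at the same result each time. The variable names are illustrative:

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])
indices = [3, 5, 1]

# 1. One call with all indices (they refer to the original array).
all_at_once = np.delete(values, indices)

# 2. Repeated single deletions, largest index first so that the
#    positions still to be deleted do not shift.
one_by_one = values.copy()
for i in sorted(indices, reverse=True):
    one_by_one = np.delete(one_by_one, i)

# 3. Boolean mask built from the indices.
keep = np.ones(values.shape, dtype=bool)
keep[indices] = False
masked = values[keep]

print(all_at_once)  # [0 2 4]
print(one_by_one)   # [0 2 4]
print(masked)       # [0 2 4]
```

None of these modifies values in place; each returns a new array.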
The problem is that you have already deleted items from values, so when you try to delete the item at index 5 there is no longer a value at that index; it's now at index 4.
If you sort the list of indices to delete and iterate over them from large to small, that should work around this issue.
import numpy as np
values = np.array([0,1,2,3,4,5])
print(values)
for i in [5,3,1]: # iterate in descending order
    values = np.delete(values,i)
print(values)
If you want to remove the elements of indices 3,4,1 , just do np.delete(values,[3,4,1]).
If in the first case you want to delete the fourth (index=3) item, then the fifth of the rest and finally the second of the rest, then due to the order of the operations you delete the second, fourth and sixth of the initial array. It's therefore logical that the second case fails.
You can compute the shifts (in the example, fifth becomes sixth) in this way:
def multidelete(values,todelete):
    todelete=np.array(todelete)
    shift=np.triu((todelete>=todelete[:,None]),1).sum(0)
    return np.delete(values,todelete+shift)
Some tests:
In [91]: multidelete([0, 1, 2, 3, 4, 5],[3,4,1])
Out[91]: array([0, 2, 4])
In [92]: multidelete([0, 1, 2, 3, 4, 5],[1,1,1])
Out[92]: array([0, 4, 5])
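N.B. In older NumPy versions np.delete did not complain and did nothing if the bad index (or indices) were in a list: np.delete(values,[8]) simply returned the values unchanged. Recent versions raise an IndexError instead.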
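Passing a boolean array as the index argument of np.delete was deprecated in older NumPy versions. You can use the function np.where() instead, like this (note that this removes elements by value, not by position):
values = np.array([0,1,2,3,4,5])
print(values)
for i in [3,5,1]:
    values = np.delete(values,np.where(values==i))
    # values = np.delete(values,values==i) # boolean mask; older NumPy versions emit a warning here
print(values)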
I know this question is old, but for further reference (as I found a similar source problem):
Instead of writing a for loop, a solution is to filter the array with NumPy's isin function, like so:
>>> import numpy as np
>>> # np.isin(element, test_elements, assume_unique=False, invert=False)
>>> arr = np.array([1, 4, 7, 10, 5, 10])
>>> ~np.isin(arr, [4, 10])
array([ True, False, True, False, True, False])
>>> arr = arr[ ~np.isin(arr, [4, 10]) ]
>>> arr
array([1, 7, 5])
So for this particular case we can write:
values = np.array([0,1,2,3,4,5])
torem = [3,4,1]
values = values[ ~np.isin(values, torem) ]
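which outputs: array([0, 2, 5]). Note that isin removes elements by value, not by position; here the values happen to equal their indices.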
Here's how you can do it without any loop or explicit indexing, using numpy.setdiff1d:
>>> import numpy as np
>>> array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> array_1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> remove_these = np.array([1,3,5,7,9])
>>> remove_these
array([1, 3, 5, 7, 9])
>>> np.setdiff1d(array_1, remove_these)
array([ 2, 4, 6, 8, 10])
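One caveat worth knowing: np.setdiff1d returns the sorted unique values, so it behaves differently from an isin filter when the input is unsorted or contains duplicates. A small sketch with made-up data:

```python
import numpy as np

arr = np.array([10, 4, 7, 10, 5, 1])
remove = [4, 5]

via_isin = arr[~np.isin(arr, remove)]    # keeps original order and duplicates
via_setdiff = np.setdiff1d(arr, remove)  # sorted, duplicates collapsed

print(via_isin.tolist())     # [10, 7, 10, 1]
print(via_setdiff.tolist())  # [1, 7, 10]
```

If order or repeated values matter, the isin filter is probably the safer choice.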
I feel silly, because this is such a simple thing, but I haven't found the answer either here or anywhere else.
Is there no straightforward way of indexing a numpy array with another?
Say I have a 2D array
>> A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
if I want to access element [3,1] I type
>> A[3,1]
8
Now, say I store this index in an array
>> ind = np.array([3,1])
and try using the index this time:
>> A[ind]
array([[7, 8],
[3, 4]])
the result is not A[3,1]
The question is: having arrays A and ind, what is the simplest way to obtain A[3,1]?
Just use a tuple:
>>> A[(3, 1)]
8
>>> A[tuple(ind)]
8
The A[] actually calls the special method __getitem__:
>>> A.__getitem__((3, 1))
8
and using a comma creates a tuple:
>>> 3, 1
(3, 1)
Putting these two basic Python principles together solves your problem.
You can store your index in a tuple in the first place, if you don't need NumPy array features for it.
That is because by passing an array you are actually asking for
A[[3,1]]
which returns the rows at index 3 and index 1 of the 2-D array, instead of the single element at row 3, column 1 that you want.
You can use
A[ind[0],ind[1]]
You can also use this form if you want several indexes at the same time:
A[indx,indy]
where indx and indy are numpy arrays of indexes for the first and second dimension respectively.
See here for all possible indexing methods for numpy arrays: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html
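Both suggestions can be combined in one short sketch. The array pairs below is hypothetical: it is assumed to hold one (row, column) pair per line:

```python
import numpy as np

A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])

ind = np.array([3, 1])
single = A[tuple(ind)]       # same as A[3, 1]

# Several (row, col) pairs stored one per line in a hypothetical array.
pairs = np.array([[3, 1], [0, 0], [2, 1]])
rows, cols = pairs[:, 0], pairs[:, 1]
several = A[rows, cols]      # picks A[3, 1], A[0, 0], A[2, 1]

print(single)   # 8
print(several)  # [8 1 6]
```

Passing the row and column index arrays separately is what tells NumPy to pair them up element by element.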