Looping through Numpy Array elements - python

Is there a more readable way to code a loop in Python that goes through each element of a Numpy array? I have come up with the following code, but it seems cumbersome & not very readable:
import numpy as np
arr01 = np.random.randint(1,10,(3,3))
for i in range(0,(np.shape(arr01[0])[0]+1)):
    for j in range(0,(np.shape(arr01[1])[0]+1)):
        print (arr01[i,j])
I could make it more explicit such as:
import numpy as np
arr01 = np.random.randint(1,10,(3,3))
rows = np.shape(arr01[0])[0]
cols = np.shape(arr01[1])[0]
for i in range(0, (rows + 1)):
    for j in range(0, (cols + 1)):
        print (arr01[i,j])
However, that still seems more cumbersome than in other languages. For example, equivalent code in VBA could read (supposing the array had already been populated):
dim i, j as integer
for i = lbound(arr01,1) to ubound(arr01,1)
    for j = lbound(arr01,2) to ubound(arr01,2)
        msgBox arr01(i, j)
    next j
next i

You can use NumPy's nditer function if you don't need the index values:
for elem in np.nditer(arr01):
    print(elem)
EDIT: If you need the indexes (as a tuple, for a 2D array), then:
for index, elem in np.ndenumerate(arr01):
    print(index, elem)

Seems like you've skipped over some intro Python chapters. With a list there are several simple ways of iterating:
In [1]: alist = ['a','b','c']
In [2]: for i in alist: print(i) # on the list itself
a
b
c
In [3]: len(alist)
Out[3]: 3
In [4]: for i in range(len(alist)): print(i,alist[i]) # index is ok
0 a
1 b
2 c
In [5]: for i,v in enumerate(alist): print(i,v) # but enumerate is simpler
0 a
1 b
2 c
Note the indexes: range(3) is sufficient, since alist[3] produces an IndexError.
In [6]: arr = np.arange(6).reshape(2,3)
In [7]: arr
Out[7]:
array([[0, 1, 2],
       [3, 4, 5]])
In [8]: for row in arr:
   ...:     for col in row:
   ...:         print(row,col)
   ...:
[0 1 2] 0
[0 1 2] 1
[0 1 2] 2
[3 4 5] 3
[3 4 5] 4
[3 4 5] 5
The shape is a tuple. The row count is then arr.shape[0], and the column count arr.shape[1]. Or you can 'unpack' both at once:
In [9]: arr.shape
Out[9]: (2, 3)
In [10]: n,m = arr.shape
In [11]: [arr[i,j] for i in range(n) for j in range(m)]
Out[11]: [0, 1, 2, 3, 4, 5]
But we can get the same flat list of values with ravel and optional conversion to list:
In [12]: arr.ravel()
Out[12]: array([0, 1, 2, 3, 4, 5])
In [13]: arr.ravel().tolist()
Out[13]: [0, 1, 2, 3, 4, 5]
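Applied to the question's own loop, a minimal sketch (the names arr01, rows, cols and values are only illustrative) unpacks the shape once and lets range stop at the last valid index, so no +1 is needed:

```python
import numpy as np

arr01 = np.random.randint(1, 10, (3, 3))

# arr01.shape is a tuple, so both dimensions can be unpacked at once
rows, cols = arr01.shape

values = []
for i in range(rows):          # i runs 0 .. rows-1; range already excludes the end
    for j in range(cols):      # j runs 0 .. cols-1
        values.append(arr01[i, j])

# the nested loop visits elements in the same (row-major) order as ravel()
print(values == arr01.ravel().tolist())
```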
But usually with numpy arrays, you shouldn't be iterating at all. Learn enough of the numpy basics so you can work with the whole array, not elements.
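As a rough illustration of working with the whole array (these particular operations are examples chosen here, not taken from the answer), the same 2x3 array can be transformed and summarized without any Python loop:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Elementwise arithmetic applies to every element at once
doubled = arr * 2            # [[0, 2, 4], [6, 8, 10]]

# Reductions work over the whole array or along one axis
total = arr.sum()            # 15
row_sums = arr.sum(axis=1)   # [3, 12]   (one value per row)
col_max = arr.max(axis=0)    # [3, 4, 5] (one value per column)

# Comparisons give boolean arrays that can select elements
evens = arr[arr % 2 == 0]    # [0, 2, 4]
```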
nditer can be used, as the other answer shows, to iterate through an array in a flat manner, but there are a number of details about it that could easily confuse a beginner. There are a couple of intro pages to nditer, but they should be read in full. Usually I discourage its use.
In [14]: for i in np.nditer(arr):
    ...:     print(i, type(i), i.shape)
    ...:
0 <class 'numpy.ndarray'> () # this element is a 0d array, not a scalar integer
1 <class 'numpy.ndarray'> ()
2 <class 'numpy.ndarray'> ()
...
Iterating with ndenumerate, or over the result of tolist, produces different types of elements. The type may matter if you try to do more than display the value, so be careful.
In [15]: list(np.ndenumerate(arr))
Out[15]: [((0, 0), 0), ((0, 1), 1), ((0, 2), 2), ((1, 0), 3), ((1, 1), 4), ((1, 2), 5)]
In [16]: for ij, v in np.ndenumerate(arr):
    ...:     print(ij, v, type(v))
    ...:
(0, 0) 0 <class 'numpy.int64'>
(0, 1) 1 <class 'numpy.int64'>
...
In [17]: for i, v in enumerate(arr.ravel().tolist()):
    ...:     print(i, v, type(v))
    ...:
0 0 <class 'int'>
1 1 <class 'int'>
...
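A small self-contained check of those three element types (taking only the first element of each iteration; the variable names are illustrative). The exact NumPy integer class, e.g. numpy.int64 or numpy.int32, is platform dependent, so the sketch only tests for np.integer:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# nditer yields 0-d array views, not scalars
first_nditer = next(iter(np.nditer(arr)))
# ndenumerate yields (index tuple, NumPy scalar) pairs
first_index, first_nd = next(iter(np.ndenumerate(arr)))
# tolist() converts the values to plain Python ints
first_list = arr.ravel().tolist()[0]

print(type(first_nditer), first_nditer.shape)  # a numpy.ndarray with shape ()
print(first_index, type(first_nd))              # (0, 0) and a NumPy integer type
print(type(first_list))                         # a plain Python int
```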


What does '...' mean in a python slice [duplicate]

What is the meaning of x[...] below?
a = np.arange(6).reshape(2,3)
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 2 * x
While the proposed duplicate, "What does the Python Ellipsis object do?", answers the question in a general Python context, its use in an nditer loop requires, I think, additional information.
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#modifying-array-values
Regular assignment in Python simply changes a reference in the local or global variable dictionary instead of modifying an existing variable in place. This means that simply assigning to x will not place the value into the element of the array, but rather switch x from being an array element reference to being a reference to the value you assigned. To actually modify the element of the array, x should be indexed with the ellipsis.
That section includes your code example.
So, in my words, x[...] = ... modifies x in place; x = ... would have broken the link to the nditer variable without changing the array. It's like x[:] = ... but works with arrays of any dimension (including 0d). In this context x isn't just a number, it's an array.
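A minimal sketch contrasting the two kinds of assignment on the question's array. It uses nditer as a context manager (supported in recent NumPy versions), and before and unchanged are illustrative names:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
before = a.copy()

# Plain assignment only rebinds the name x to a new object; the array is untouched
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x = 2 * x
unchanged = np.array_equal(a, before)
print(unchanged)  # True

# x is a 0-d view into a, so x[...] = ... writes through to the element
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x
print(a.tolist())  # [[0, 2, 4], [6, 8, 10]]
```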
Perhaps the closest thing to this nditer iteration, without nditer is:
In [667]: for i, x in np.ndenumerate(a):
     ...:     print(i, x)
     ...:     a[i] = 2 * x
     ...:
(0, 0) 0
(0, 1) 1
...
(1, 2) 5
In [668]: a
Out[668]:
array([[ 0,  2,  4],
       [ 6,  8, 10]])
Notice that I had to index and modify a[i] directly. I could not have used x = 2*x. In this iteration x is a scalar, and thus not mutable:
In [669]: for i,x in np.ndenumerate(a):
     ...:     x[...] = 2 * x
...
TypeError: 'numpy.int32' object does not support item assignment
But in the nditer case x is a 0d array, and mutable.
In [671]: for x in np.nditer(a, op_flags=['readwrite']):
     ...:     print(x, type(x), x.shape)
     ...:     x[...] = 2 * x
     ...:
0 <class 'numpy.ndarray'> ()
4 <class 'numpy.ndarray'> ()
...
And because it is 0d, x[:] cannot be used instead of x[...]:
----> 3 x[:] = 2 * x
IndexError: too many indices for array
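The same behaviour can be reproduced on a standalone 0-d array. This is a small sketch (z and slice_failed are illustrative names) showing that ... and the empty tuple () both work on 0-d arrays, while : needs at least one dimension to slice:

```python
import numpy as np

z = np.array(5)          # a 0-d array, like the nditer loop variable
print(z.shape, z.ndim)   # () 0

z[...] = 7               # Ellipsis expands to zero ':' here, so this is allowed
print(z[()])             # 7  (the empty tuple index returns the scalar)

slice_failed = False
try:
    z[:] = 9             # ':' needs a dimension to slice, and z has none
except IndexError as exc:
    slice_failed = True
    print("IndexError:", exc)
```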
A simpler array iteration might also give insight:
In [675]: for x in a:
     ...:     print(x, x.shape)
     ...:     x[:] = 2 * x
     ...:
[ 0 8 16] (3,)
[24 32 40] (3,)
this iterates on the rows (1st dim) of a. x is then a 1d array, and can be modified with either x[:]=... or x[...]=....
And if I add the external_loop flag from the next section, x is now a 1d array, and x[:] = would work. But x[...] = still works and is more general. x[...] is used in all the other nditer examples.
In [677]: for x in np.nditer(a, op_flags=['readwrite'], flags=['external_loop']):
     ...:     print(x, type(x), x.shape)
     ...:     x[...] = 2 * x
[ 0 16 32 48 64 80] <class 'numpy.ndarray'> (6,)
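To see how external_loop groups elements, a sketch on a fresh arange(6).reshape(2, 3) array (not the doubled a from the session above) collects each chunk. With the default order a contiguous array comes back as one 1d chunk; with order='F' there is one chunk per column:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Default order: a C-contiguous array is handed over as a single 1-d chunk
c_chunks = [chunk.tolist() for chunk in np.nditer(a, flags=['external_loop'])]

# Fortran order: one chunk per column, each of length a.shape[0]
f_chunks = [chunk.tolist()
            for chunk in np.nditer(a, flags=['external_loop'], order='F')]

print(c_chunks)  # [[0, 1, 2, 3, 4, 5]]
print(f_chunks)  # [[0, 3], [1, 4], [2, 5]]
```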
Read and experiment with this nditer page all the way through to the end. By itself, nditer is not that useful in Python. It does not speed up iteration, not until you port your code to cython. np.ndindex is one of the few non-compiled numpy functions that uses nditer.
The ellipsis ... means as many : as needed.
For people who don't have time, here is a simple example:
In [64]: X = np.reshape(np.arange(9), (3,3))
In [67]: Y = np.reshape(np.arange(2*3*4), (2,3,4))
In [70]: X
Out[70]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [71]: X[:,0]
Out[71]: array([0, 3, 6])
In [72]: X[...,0]
Out[72]: array([0, 3, 6])
In [73]: Y
Out[73]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
In [74]: Y[:,0]
Out[74]:
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])
In [75]: Y[...,0]
Out[75]:
array([[ 0,  4,  8],
       [12, 16, 20]])
In [76]: X[0,...,0]
Out[76]: array(0)
In [77]: Y[0,...,0]
Out[77]: array([0, 4, 8])
This makes it easy to manipulate only one dimension at a time.
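The equivalences in the session above can be checked directly. This sketch rebuilds X and Y the same way and compares each ellipsis form with its explicit-colon spelling:

```python
import numpy as np

X = np.arange(9).reshape(3, 3)
Y = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# '...' stands for as many ':' as needed to fill out the remaining dimensions
print(np.array_equal(Y[..., 0], Y[:, :, 0]))     # True: last axis fixed
print(np.array_equal(Y[0, ...], Y[0, :, :]))     # True: first axis fixed
print(np.array_equal(Y[0, ..., 0], Y[0, :, 0]))  # True: '...' becomes a single ':'

# With both axes of X indexed, '...' expands to nothing and a 0-d array comes back
print(X[0, ..., 0].shape)                        # ()
```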
One caveat: you can have only one ellipsis in any given indexing expression, or the expression would be ambiguous about how many : should be put in each place.
I believe a very good parallel (one most people are probably used to) is to think of it this way:
import numpy as np
random_array = np.random.rand(2, 2, 2, 2)
In this case, random_array[:, :, :, 0] and random_array[..., 0] are the same.
You can use it to analyse only a specific dimension. Say you have a batch of 50 128x128 RGB images with shape (50, 3, 128, 128); if you want to slice the same piece of every image in every color channel, you could do either image[:,:,50:70,20:80] or image[...,50:70,20:80].
Just be aware that you can't use it more than once in the same expression: [...,0,...] is invalid.
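A small sketch of the image-batch case, using a hypothetical all-zeros batch in place of real images, confirms that both spellings give the same (50, 3, 20, 60) slice and that a second ellipsis is rejected:

```python
import numpy as np

# Hypothetical batch of 50 RGB images, 128x128 each, laid out as (N, C, H, W)
images = np.zeros((50, 3, 128, 128), dtype=np.uint8)

crop_a = images[:, :, 50:70, 20:80]
crop_b = images[..., 50:70, 20:80]   # '...' stands in for the leading ':, :'
print(crop_a.shape, np.array_equal(crop_a, crop_b))  # (50, 3, 20, 60) True

# Only one Ellipsis is allowed per index expression
try:
    images[..., 0, ...]
    two_ellipses_ok = True
except IndexError:
    two_ellipses_ok = False
print(two_ellipses_ok)  # False
```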

Loop append of numpy array element in python

Can someone explain how to append elements to a numpy array in a loop, based on a condition?
I wrote some code that should append 2 if the i-th element of array A is 0, and append 1 if it is not 0.
Here is the code itself:
import numpy as np
def finalconcat(somearray):
    for i in somearray:
        arraysome=[]
        if somearray[i]==0:
            arraysome=np.append(arraysome,[2],axis=0)
        else:
            arraysome=np.append(arraysome,[1],axis=0)
    return arraysome
Let me give you an example:
A=np.array([1,0,2,3,4,5])
C=finalconcat(B)
print(C)
It should come out:
[1,2,1,1,1,1]
But it comes out like:
[1.]
Please explain to me what is wrong here, I just don't understand what could be wrong...
You have several issues:
1. arraysome=[] is inside your loop, so on each iteration over somearray you are emptying arraysome. Consequently, you can never end up with more than one element in arraysome when you are all done.
2. You have for i in somearray. On each iteration i will be the next element of somearray; it will not be iterating over the indices of the array. Yet later you have if somearray[i]==0:. This should just be if i==0:.
3. If you want the resulting elements of arraysome to be integers rather than floats, then you should initialize it as a numpy array of integers.
4. You have C=finalconcat(B), but B is not defined.
You should really spend some time reading the PEP 8 – Style Guide for Python Code.
import numpy as np
def finalconcat(somearray):
    # np.int was removed in NumPy 1.24; the builtin int works as a dtype
    arraysome = np.array([], dtype=int)
    for i in somearray:
        if i == 0:
            arraysome = np.append(arraysome, [2], axis=0)
        else:
            arraysome = np.append(arraysome, [1], axis=0)
    return arraysome
a = np.array([1, 0, 2, 3, 4, 5])
c = finalconcat(a)
print(c)
Prints:
[1 2 1 1 1 1]
For iteration like this it's better to use lists. np.append is just a poorly named cover for np.concatenate, which returns a whole new array with each call. List append works in-place, and is more efficient. And easier to use:
def finalconcat(somearray):
    rec = [2 if i==0 else 1 for i in somearray]
    # arr = np.array(rec)
    return rec
In [31]: a = np.array([1, 0, 2, 3, 4, 5])
In [32]: np.array([2 if i==0 else 1 for i in a])
Out[32]: array([1, 2, 1, 1, 1, 1])
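The paragraph above recommends list append, while the snippet uses a list comprehension. A minimal sketch of the explicit loop form (keeping the question's function name; rec and value are illustrative) builds a Python list and converts it to an array only once, at the end:

```python
import numpy as np

def finalconcat(somearray):
    # list.append grows the list in place; no new array is made per iteration
    rec = []
    for value in somearray:
        rec.append(2 if value == 0 else 1)
    # a single conversion to an array at the end
    return np.array(rec)

a = np.array([1, 0, 2, 3, 4, 5])
print(finalconcat(a))  # [1 2 1 1 1 1]
```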
But it's better to use whole-array methods, such as:
In [33]: b = np.ones_like(a)
In [34]: b
Out[34]: array([1, 1, 1, 1, 1, 1])
In [35]: b[a==0] = 2
In [36]: b
Out[36]: array([1, 2, 1, 1, 1, 1])
or
In [37]: np.where(a==0, 2, 1)
Out[37]: array([1, 2, 1, 1, 1, 1])

Tracking Multi-Index and Modifying Values in NumPy

I have a 2D array that I am iterating over, in an effort to use the index values to make calculations and then assign each calculated value to that index.
In the NumPy documentation, an example is provided for modifying values using an iterator:
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 2 * x
However, this doesn't seem to work when tracking an index using the following method:
it = np.nditer(a, flags=['multi_index'])
while not it.finished:
    it[...] = . . .
    it.iternext()
I am, however, able to use the it.multi_index values, but it seems unnecessarily verbose. Is there a simpler way to achieve this, either through a different approach or different syntax?
it = np.nditer(a, flags=['multi_index'])
while not it.finished:
    matrix[it.multi_index[0]][it.multi_index[1]] = . . .
    it.iternext()
EDIT
Here is an example of a multi_index iteration attempting to modify values using iterator indexing and failing.
matrix = np.zeros((5,5))
it = np.nditer(matrix, flags=['multi_index'])
while not it.finished:
    it[...] = 1
    it.iternext()
The error produced is
TypeError Traceback (most recent call last)
<ipython-input-79-3f4cabcbfde6> in <module>()
25 it = np.nditer(matrix, flags=['multi_index'])
26 while not it.finished:
---> 27 it[...] = 1
28 it.iternext()
TypeError: invalid index type for iterator indexing
In your first iteration example:
In [1]: arr = np.arange(12).reshape(3,4)
In [2]: for x in np.nditer(arr, op_flags=['readwrite']):
   ...:     print(x, type(x))
   ...:     x[...] = 2 * x
   ...:
0 <class 'numpy.ndarray'>
1 <class 'numpy.ndarray'>
2 <class 'numpy.ndarray'>
3 <class 'numpy.ndarray'>
4 <class 'numpy.ndarray'>
....
11 <class 'numpy.ndarray'>
In [3]: x
Out[3]: array(22)
In [4]: arr
Out[4]:
array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22]])
Turn on multi_index:
In [9]: it = np.nditer(arr, flags=['multi_index'],op_flags=['readwrite'])
In [10]: while not it.finished:
    ...:     print(it[0], it.multi_index)
    ...:     it.iternext()
    ...:
0 (0, 0)
2 (0, 1)
4 (0, 2)
...
20 (2, 2)
22 (2, 3)
This is the same iteration through the elements of arr, but it also generates the 2d index tuple. it is the nditer object, with various methods and attributes; in this case it has a multi_index attribute. The current iteration variable is in it[0].
I can modify elements either with [...] in place, or by indexing into arr:
In [11]: it = np.nditer(arr, flags=['multi_index'],op_flags=['readwrite'])
In [13]: while not it.finished:
    ...:     it[0][...] *= 2
    ...:     arr[it.multi_index] += 100
    ...:     it.iternext()
    ...:
In [14]: arr   # values are doubled, then increased by 100
Out[14]:
array([[100, 104, 108, 112],
       [116, 120, 124, 128],
       [132, 136, 140, 144]])
Without multi_index I could still create an nditer object, and iterate with the while not finished syntax. Instead of accessing x[...] I'd have to use it[0][...].
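Applying the it[0][...] form to the question's failing EDIT example gives a sketch like the one below. Two things change: op_flags=['readwrite'] is added (the default operand is read-only), and the write goes through it[0] rather than it itself. The formula i * 10 + j is a hypothetical stand-in for the real calculation:

```python
import numpy as np

matrix = np.zeros((5, 5))

with np.nditer(matrix, flags=['multi_index'], op_flags=['readwrite']) as it:
    while not it.finished:
        i, j = it.multi_index        # 2-d index of the current element
        it[0][...] = i * 10 + j      # write through the 0-d view of the element
        it.iternext()

print(matrix[2, 3])  # 23.0
```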
np.ndindex is a more convenient way of generating a multi_index. Look at its code. It's one of the few numpy functions that uses np.nditer.
In [26]: for idx in np.ndindex(arr.shape):
    ...:     print(idx)
    ...:     arr[idx] -= 100
    ...:
(0, 0)
(0, 1)
...
(2, 3)
In [27]: arr
Out[27]:
array([[ 0,  4,  8, 12],
       [16, 20, 24, 28],
       [32, 36, 40, 44]])
But while it is fun to play with nditer, it isn't practical, at least not in pure Python code. It is most useful as a stepping stone toward using it in cython or pure C code. See the final example on the iteration page.
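For the original goal (values computed from each cell's index), one whole-array alternative swapped in here rather than taken from the answer is np.indices or np.fromfunction. A sketch, again with i * 10 + j as a hypothetical formula:

```python
import numpy as np

shape = (3, 4)

# np.indices returns one integer array per dimension holding that axis's index
i, j = np.indices(shape)

# The index-based formula is evaluated on whole arrays at once
result = i * 10 + j

# np.fromfunction packages the same idea; dtype=int keeps the indexes integral
same = np.fromfunction(lambda i, j: i * 10 + j, shape, dtype=int)
print(np.array_equal(result, same))  # True
```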

Iterate over numpy array columnwise

np.nditer automatically iterates over the elements of an array row-wise. Is there a way to iterate over the elements of an array column-wise?
x = np.array([[1,3],[2,4]])
for i in np.nditer(x):
    print(i)
# 1
# 3
# 2
# 4
What I want is:
for i in Columnwise Iteration(x):
    print(i)
# 1
# 2
# 3
# 4
Is my best bet just to transpose my array before doing the iteration?
For completeness, you don't necessarily have to transpose the array before iterating through its elements. With np.nditer you can specify the order in which to iterate. The default, order='K', follows the array's memory layout, which for a normally created array is row-major (C-like) order. You can override this and choose column-major (Fortran-like) order, which is what you want: pass the additional keyword argument order='F' to np.nditer:
In [16]: x = np.array([[1,3],[2,4]])
In [17]: for i in np.nditer(x,order='F'):
   ....:     print(i)
   ....:
1
2
3
4
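A quick sketch comparing the two orders side by side. Because nditer yields 0-d arrays, each value is converted with int() to get plain numbers:

```python
import numpy as np

x = np.array([[1, 3], [2, 4]])

c_order = [int(v) for v in np.nditer(x, order='C')]  # row by row
f_order = [int(v) for v in np.nditer(x, order='F')]  # column by column

print(c_order)  # [1, 3, 2, 4]
print(f_order)  # [1, 2, 3, 4]
```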
You can read more about how to control the order of iteration here: http://docs.scipy.org/doc/numpy-1.10.0/reference/arrays.nditer.html#controlling-iteration-order
You could use the shape and slice each column
>>> [x[:, i] for i in range(x.shape[1])]
[array([1, 2]), array([3, 4])]
You could transpose it?
>>> x = np.array([[1,3],[2,4]])
>>> [y for y in x.T]
[array([1, 2]), array([3, 4])]
Or less elegantly:
>>> [np.array([x[j,i] for j in range(x.shape[0])]) for i in range(x.shape[1])]
[array([1, 2]), array([3, 4])]
nditer is not the best iteration tool for this case. It is useful when working toward a compiled (cython) solution, but not in pure Python coding.
Look at some regular iteration strategies:
In [832]: x=np.array([[1,3],[2,4]])
In [833]: x
Out[833]:
array([[1, 3],
       [2, 4]])
In [834]: for i in x: print(i) # print each row
[1 3]
[2 4]
In [835]: for i in x.T: print(i) # print each column
[1 2]
[3 4]
In [836]: for i in x.ravel(): print(i) # print values in order
1
3
2
4
In [837]: for i in x.T.ravel(): print(i) # print values in column order
1
2
3
4
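A related option not used in the answer is ravel(order='F'), which reads the values column by column without a separate transpose. A sketch checking it against the x.T.ravel() idiom:

```python
import numpy as np

x = np.array([[1, 3], [2, 4]])

# ravel(order='F') flattens in column-major order
flat_f = x.ravel(order='F').tolist()
print(flat_f)  # [1, 2, 3, 4]

# same sequence as transposing first and then raveling in the default order
print(flat_f == x.T.ravel().tolist())  # True

# iterating over x.T yields the columns themselves as 1-d arrays
columns = [col.tolist() for col in x.T]
print(columns)  # [[1, 2], [3, 4]]
```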
You comment: "I need to fill values into an array based on the index of each cell in the array."
What do you mean by index?
A crude 2d iteration with indexing:
In [838]: for i in range(2):
     .....:     for j in range(2):
     .....:         print((i,j), x[i,j])
(0, 0) 1
(0, 1) 3
(1, 0) 2
(1, 1) 4
ndindex uses nditer to generate similar indexes:
In [841]: for i,j in np.ndindex(x.shape):
     .....:     print((i,j), x[i,j])
     .....:
(0, 0) 1
(0, 1) 3
(1, 0) 2
(1, 1) 4
enumerate is a good Python way of getting both values and indexes:
In [847]: for i,v in enumerate(x): print(i,v)
0 [1 3]
1 [2 4]
Or you can use meshgrid to generate all the indexes as arrays:
In [843]: I,J=np.meshgrid(range(2),range(2))
In [844]: I
Out[844]:
array([[0, 1],
       [0, 1]])
In [845]: J
Out[845]:
array([[0, 0],
       [1, 1]])
In [846]: x[I,J]
Out[846]:
array([[1, 2],
       [3, 4]])
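Note that x[I,J] above comes out transposed. That is because meshgrid defaults to indexing='xy'. A sketch comparing it with indexing='ij' (the variable names are illustrative):

```python
import numpy as np

x = np.array([[1, 3], [2, 4]])

# Default indexing='xy': I varies along columns, J along rows, so x[I, J] is x.T
Ixy, Jxy = np.meshgrid(range(2), range(2))
print(x[Ixy, Jxy].tolist())  # [[1, 2], [3, 4]]

# indexing='ij' gives matrix-style index arrays, so x[I, J] reproduces x
Iij, Jij = np.meshgrid(range(2), range(2), indexing='ij')
print(x[Iij, Jij].tolist())  # [[1, 3], [2, 4]]
```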
Note that most of these iterative methods just treat your array as a list of lists. They don't take advantage of the array nature, and will be slow compared to methods that work with the whole x.

The most frequent pattern of specific columns in Pandas.DataFrame in python

I know how to get the most frequent element of list of list, e.g.
a = [[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[3,2]]
print(max(a, key=a.count))
should print [3, 4] even though the most frequent number is 1 for the first element and 2 for the second element.
My question is how to do the same kind of thing with Pandas.DataFrame.
For example, I'd like to know the implementation of the following method get_max_freq_elem_of_df:
def get_max_freq_elem_of_df(df):
    # do some things
    return freq_list
df = pd.DataFrame([[3,4], [3,4],[3,4], [1,2], [1,2], [1,1],[1,3],[2,2],[4,2]])
x = get_max_freq_elem_of_df(df)
print(x)  # => should print [3,4]
Please note that the DataFrame.mode() method does not work. For the example above, df.mode() returns [1, 2], not [3, 4], because it takes the mode of each column separately.
Update: I have explained why DataFrame.mode() doesn't work.
You could use groupby.size and then find the max:
>>> df.groupby([0,1]).size()
0  1
1  1    1
   2    2
   3    1
2  2    1
3  4    3
4  2    1
dtype: int64
>>> df.groupby([0,1]).size().idxmax()
(3, 4)
In plain Python you'd use Counter*:
In [11]: from collections import Counter
In [12]: c = Counter(df.itertuples(index=False))
In [13]: c
Out[13]: Counter({(3, 4): 3, (1, 2): 2, (1, 3): 1, (2, 2): 1, (4, 2): 1, (1, 1): 1})
In [14]: c.most_common(1) # get the top 1 most common items
Out[14]: [((3, 4), 3)]
In [15]: c.most_common(1)[0][0] # get the item (rather than the (item, count) tuple)
Out[15]: (3, 4)
* Note that your solution
max(a, key=a.count)
(although it works) is O(N^2), since on each iteration it needs to iterate through a (to get the count), whereas Counter is O(N).
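The same Counter approach works on the original list of lists, with one extra step: lists are unhashable, so each pair is converted to a tuple first. A minimal sketch (counts, most_common_pair and count are illustrative names):

```python
from collections import Counter

a = [[3, 4], [3, 4], [3, 4], [1, 2], [1, 2], [1, 1], [1, 3], [2, 2], [3, 2]]

# One pass over a (O(N)), unlike max(a, key=a.count), which rescans a for every item
counts = Counter(map(tuple, a))
most_common_pair, count = counts.most_common(1)[0]
print(most_common_pair, count)  # (3, 4) 3
```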
