I have an array that I would like to iterate over and modify the array itself, through inserts or deletions.
for idx, ele in enumerate(np.nditer(array)):
if idx + 1 < array.shape[0] and ele > array[idx+1]:
array = np.delete(array, idx+1)
print(ele)
Given [5, 4, 3, 2, 1] I want the loop to print out 5 3 1 because 4 and 2 are smaller than their previous elements. But because python creates an iterator based on the first instance of array so it prints 5 4 3 2 1. I want to know if I can get
Generally speaking I want the iterator to be modified if I modify the array within the body of my loop.
You cannot mutate the length of a numpy array, because numpy assigns the required memory for an array upon its creation.
With
array = np.delete(array, idx+1)
You are creating a new array on the right hand side of the = and reassign the name array.
The return value for enumerate(np.nditer(array)) has already been created at that point and won't recognize that the name array has been rebound.
In principle, you can iterate over a sequence and mutate its length at the same time (generally not a good idea). The object just needs to have methods that let you mutate its length (like lists, for example).
Consider:
>>> l = [5, 4, 3, 2, 1]
>>> for idx, ele in enumerate(l):
...: if ele == 3:
...: l.pop(idx) # mutates l
...: print(ele)
...:
5
4
3
1
>>> l
[5, 4, 2, 1]
Notice that
l is mutated.
The loop does not print 2 because popping an element reduces the indexes of all the remaining elements by one. Now l[2] == 2, but index 2 has already been visited by the iterator, so the next print-call prints l[3] which is 1.
This proves that mutations to l have effect on subsequent iterations.
Instead of looping over an array, you can use where method to find
indices of elements meeting some condition.
Then to delete the selected element (or elements), you can use
delete method, passing the source array, and a list of indices.
Then save the result, e.g. under the same variable.
To add an element, you can use append or insert methods
(for details see Numpy documentation).
I have also found a SO post concerning how to loop and delete over an array.
See Deleting elements from numpy array with iteration
Related
Short version: Given a list with two elements [i, j], how do I get the i, j -element of a 2d array instead of the i, j rows: arr[[i,j]] to arr[i,j].
I've seen similar cases where *list has been used, but I don't entirely understand how that operator works.
Deeper context:
I have a function that returns a nested list, where each sub-list is a pair of indices to be used in an array:
grid = np.full((3,3), 1)
def path():
...
return [[i_1, j_1], [i_2, j_2], ...]
for i in path():
grid[path()[i]] = 0
But since path()[i] is a list, grid[path()[i]] == 0 sets two rows equal to zero, and not a single element. How do I prevent that?
While not stricly necessary, a faster solution would be preferable as this operation is to be done many times.
The thing that is confusing you is the difference in mathematic notation and indexing 2D (or n-dimensional) lists in Python/programming languages.
If you have a 2D matrix in mathematics, called X, and let's say you'd want to access the element in the first row and first column, you'd write X(1, 1).
If you have a 2D array, it's elements are lists. So, if you want to access the 1st row of an array called X you'd do:
X[0] # 0 because indexation starts at 0
Keep in mind that the previous statement returns a new list. You can index this list as well. Since we need the first column, we do:
X[0][0] # element in 1st row and 1st column of matrix X
Thus the answer is you need to successively index the matrix, i.e. apply chain indexing.
As for your original question, here is my answer. Let's say a is the 2-element list which contains the indices i and j which you want to access in the 2D array. I'll denote the 2D array as b. Then you apply chain indexing, the first index is the first element of a and the second index is the second element of a:
a = [0, 0] # list with 2 elements containing the indices
b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # 2D array in which you want to access b[0][0]
i, j = a[0], a[1]
print(b[i][j])
Obviously you don't need to use variables i, j, I just did so for clarity:
print(b[a[0]][a[1]])
Both statements print out the element in the 1st row and 1st column of b:
1
Edit: I fixed y so that x,y have the same length
I don't understand much about programing but I have a giant mass of data to analyze and it has to be done in Python.
Say I have two arrays:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
and say I want to choose the values in y which are greater than 17, and keep only the values in x which has the same index as the left values in y. for example I want to erase the first value of y (25) and accordingly the matching value in x (1).
I tried this:
filter=np.where(y>17, 0, y)
but I don't know how to filter the x values accordingly (the actual data are much longer arrays so doing it "by hand" is basically imposible)
Solution: using #mozway tip, now that x,y have the same length the needed code is:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20,80,45])
x_filtered=x[y>17]
As your question is not fully clear and you did not provide the expected output, here are two possibilities:
filtering
Nunique arrays can be sliced by an array (iterable) of booleans.
If the two arrays were the same length you could do:
x[y>17]
Here, xis longer than y so we first need to make it the same length:
import numpy as np
x=np.array([1,2,3,4,5,6,7,8,9,10])
y=np.array([25,18,16,19,30,5,9,20])
x[:len(y)][y>17]
Output: array([1, 2, 4, 5, 8])
replacement
To select between x and y based on a condition, use where:
np.where(y>17, x[:len(y)], y)
Output:
array([ 1, 2, 16, 4, 5, 5, 9, 8])
As someone with little experience in Numpy specifically, I wrote this answer before seeing #mozway's excellent answer for filtering. My answer works on more generic containers than Numpy's arrays, though it uses more concepts as a result. I'll attempt to explain each concept in enough detail for the answer to make sense.
TL;DR:
Please, definitely read the rest of the answer, it'll help you understand what's going on.
import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([25,18,16,19,30,5,9,20])
filtered_x_list = []
filtered_y_list = []
for i in range(min(len(x), len(y))):
if y[i] > 17:
filtered_y_list.append(y[i])
filtered_x_list.append(x[i])
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
# These lines are just for us to see what happened
print(filtered_x) # prints [1 2 4 5 8]
print(filtered_y) # prints [25 18 19 30 20]
Pre-requisite Knowledge
Python containers (lists, arrays, and a bunch of other stuff I won't get into)
Lets take a look at the line:
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
What's Python doing?
The first thing it's doing is creating a list:
[1, 2, 3] # and so on
Lists in Python have a few features that are useful for us in this solution:
Accessing elements:
x_list = [ 1, 2, 3 ]
print(x_list[0]) # prints 1
print(x_list[1]) # prints 2, and so on
Adding elements to the end:
x_list = [ 1, 2, 3 ]
x_list.append(4)
print(x_list) # prints [1, 2, 3, 4]
Iteration:
x_list = [ 1, 2, 3 ]
for x in x_list:
print(x)
# prints:
# 1
# 2
# 3
Numpy arrays are slightly different: we can still access and iterate elements in them, but once they're created, we can't modify them - they have no .append, and there are other modifications one can do with lists (like changing one value, or deleting a value) we can't do with numpy arrays.
So the filtered_x_list and the filtered_y_list are empty lists we're creating, but we're going to modify them by adding the values we care about to the end.
The second thing Python is doing is creating a numpy array, using the list to define its contents. The array constructor can take a list expressed as [...], or a list defined by x_list = [...], which we're going to take advantage of later.
A little more on iteration
In your question, for every x element, there is a corresponding y element. We want to test something for each y element, then act on the corresponding x element, too.
Since we can access the same element in both arrays using an index - x[0], for instance - instead of iterating over one list or the other, we can iterate over all indices needed to access the lists.
First, we need to figure out how many indices we're going to need, which is just the length of the lists. len(x) lets us do that - in this case, it returns 10.
What if x and y are different lengths? In this case, I chose the smallest of the two - first, do len(x) and len(y), then pass those to the min() function, which is what min(len(x), len(y)) in the code above means.
Finally, we want to actually iterate through the indices, starting at 0 and ending at len(x) - 1 or len(y) - 1, whichever is smallest. The range sequence lets us do exactly that:
for i in range(10):
print(i)
# prints:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
So range(min(len(x), len(y))), finally, gets us the indices to iterate over, and finally, this line makes sense:
for i in range(min(len(x), len(y))):
Inside this for loop, i now gives us an index we can use for both x and y.
Now, we can do the comparison in our for loop:
for i in range(min(len(x), len(y))):
if y[i] > 17:
filtered_y_list.append(y[i])
Then, including xs for the corresponding ys is a simple case of just appending the same x value to the x list:
for i in range(min(len(x), len(y))):
if y[i] > 17:
filtered_y_list.append(y[i])
filtered_x_list.append(x[i])
The filtered lists now contain the numbers you're after. The last two lines, outside the for loop, just create numpy arrays from the results:
filtered_x = np.array(filtered_x_list)
filtered_y = np.array(filtered_y_list)
Which you might want to do, if certain numpy functions expect arrays.
While there are, in my opinion, better ways to do this (I would probably write custom iterators that produce the intended results without creating new lists), they require a somewhat more advanced understanding of programming, so I opted for something simpler.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
I'm learning Python, coming from a C#/Java background, and was playing around with list behavior. I've read some of the documentation, but I don't understand how, or why, slices with indices larger than length-1 make is possible to append items.
ls = ["a", "b", "c", "d"]
n = len(ls) # n = 4
letter = ls[4] # index out of range (list indexes are 0 - 3)
# the following all do the same thing
ls += ["echo"]
ls.append("foxtrot")
ls[len(ls):] = ["golf"] # equivalent to ls[len(ls): len(ls)]
print(ls)
Although it seems odd to me, I understand how slices can modify the list they operate on. What I don't get is why list[len(list)] results in the expected out of bounds error, but list[len(list):] doesn't. I understand that slices are fundamentally different than indexes, just not what happens internally when a slice begins with an index outside of the list values.
Why can I return a slice from a list that starts with a non-existent element (len(list))? And why does this allow me to expand the list?
Also, of the first three above methods for appending an item, which one is preferred from a convention or performance perspective? Are there performance disadvantages to any of them?
While a.append(x) and a[len(a):] = [x] do the same thing, the first is clearer and should almost always be used. Slice replacement may have a place, but you shouldn't look for reasons to do it. If the situation warrants it, it should be obvious when to do it.
Start with the fact that indexing returns an element of the list, and slicing returns a "sublist" of a list. While an element at a particular either exists or does not exist, it is mathematically convenient to think of every list have the empty list as a "sublist" between any two adjacent positions. Consider
>>> a = [1, 2, 3]
>>> a[2:2]
[]
Now consider the following sequence. At each step, we increment the starting index of the slice
>>> a[0:] # A copy of the list
[1, 2, 3]
>>> a[1:]
[2, 3]
>>> a[2:]
[3]
Each time we do so, we remove one more element from the beginning of the list for the resulting slice. So what should happen when the starting index finally equals the final index (which is implicitly the length of the list)?
>>> a[3:]
[]
You get the empty list, which is consistent with the following identities
a + [] = a
a[:k] + a[k:] = a
(The first is a special case of the second where k >= len(a).
Now, once we see why it makes sense for the slice to exist, what does it mean to replace that slice with another list? In other words, what list do we get if we replace the empty list at the end of [1, 2, 3] with the list [4]? You get [1, 2, 3, 4], which is consistent with
[1, 2, 3] + [4] = [1, 2, 3, 4]
Consider another sequence, this time involving slice replacements.
>>> a = [1,2,3]; a[0:] = ['a']; a
['a']
>>> a = [1,2,3]; a[1:] = ['a']; a
[1, 'a']
>>> a = [1,2,3]; a[2:] = ['a']; a
[1, 2, 'a']
>>> a = [1,2,3]; a[3:] = ['a']; a
[1, 2, 3, 'a']
Another way of looking at slice replacement is to treat it as appending a list to a sublist of the original. When that sublist is the list itself, then it is equal to appending to the original.
Why can I return a slice from a list that starts with a non-existent element?
I can't tell for sure, but my guess would be that it's because a reasonable value can be returned (there are no elements this refers to, so return an empty list).
And why does this allow me to expand the list?
Your slice selects the last 0 elements of the list. You then replace those elements with the elements of another list.
Also, of the first three above methods for appending an item, which one is preferred
Use .append() if you're adding a single element, and .extend() if you have elements inside another list.
I'm not sure about performance, but my guess would be that those are pretty well optimized.
I did not expect this to work, since I was modifying the object being iterated over, but I didn't expect it to fail this way. I actually expected an exception to be raised.
>>> x = [1, 2, 3]
>>> for a in x:
... print a, x.pop(0)
...
1 1
3 2
>>> x
[3]
With a slightly larger range:
>>> x = [1, 2, 3, 4]
>>> for a in x:
... print a, x.pop(0)
...
1 1
3 2
>>> x
[3, 4]
And a little bit larger still:
>>> x = [1, 2, 3, 4, 5]
>>> for a in x:
... print a, x.pop(0)
...
1 1
3 2
5 3
>>> x
[4, 5]
It's like the for loop is creating a generator from the list, but comparing the "index" to the length of the list to decide when iteration is over.
It still seems like this should generate an exception though, not this bizarre behavior. Is there some reason it doesn't raise an exception?
As you have intuited, the for loop is using an index internally, incrementing it by 1 on each iteration, and stopping when the index exceeds the length of the list. This is how iteration is defined for lists. It is defined in other ways for other types, and you can define it for your own classes by implementing __iter__.
Modifying a list while iterating over it is legal, and predictable once you understand how it works. When you remove one or more items, the items after it shift down to lower indexes, and if the item removed is at an index less than or equal to the loop's current index, you end up skipping items when the loop increments the index.
There are a number of solutions: iterate in reverse order, iterate over a copy, make a list of the indices to be deleted and do that separately, build a new list instead of modifying the existing one, use a while loop instead of an if statement if you are conditionally removing items... there are probably other approaches as well.
I would like to ask what the following does in Python.
It was taken from http://danieljlewis.org/files/2010/06/Jenks.pdf
I have entered comments telling what I think is happening there.
# Seems to be a function that returns a float vector
# dataList seems to be a vector of flat.
# numClass seems to an int
def getJenksBreaks( dataList, numClass ):
# dataList seems to be a vector of float. "Sort" seems to sort it ascendingly
dataList.sort()
# create a 1-dimensional vector
mat1 = []
# "in range" seems to be something like "for i = 0 to len(dataList)+1)
for i in range(0,len(dataList)+1):
# create a 1-dimensional-vector?
temp = []
for j in range(0,numClass+1):
# append a zero to the vector?
temp.append(0)
# append the vector to a vector??
mat1.append(temp)
(...)
I am a little confused because in the pdf there are no explicit variable declarations. However I think and hope I could guess the variables.
Yes, the method append() adds elements to the end of the list. I think your interpretation of the code is correct.
But note the following:
x =[1,2,3,4]
x.append(5)
print(x)
[1, 2, 3, 4, 5]
while
x.append([6,7])
print(x)
[1, 2, 3, 4, 5, [6, 7]]
If you want something like
[1, 2, 3, 4, 5, 6, 7]
you may use extend()
x.extend([6,7])
print(x)
[1, 2, 3, 4, 5, 6, 7]
Python doesn't have explicit variable declarations. It's dynamically typed, variables are whatever type they get assigned to.
Your assessment of the code is pretty much correct.
One detail: The range function goes up to, but does not include, the last element. So the +1 in the second argument to range causes the last iterated value to be len(dataList) and numClass, respectively. This looks suspicious, because the range is zero-indexed, which means it will perform a total of len(dataList) + 1 iterations (which seems suspicious).
Presumably dataList.sort() modifies the original value of dataList, which is the traditional behavior of the .sort() method.
It is indeed appending the new vector to the initial one, if you look at the full source code there are several blocks that continue to concatenate more vectors to mat1.
append is a list function used to append a value at the end of the list
mat1 and temp together are creating a 2D array (eg = [[], [], []]) or matrix of (m x n)
where m = len(dataList)+1 and n = numClass
the resultant matrix is a zero martix as all its value is 0.
In Python, variables are implicitely declared. When you type this:
i = 1
i is set to a value of 1, which happens to be an integer. So we will talk of i as being an integer, although i is only a reference to an integer value. The consequence of that is that you don't need type declarations as in C++ or Java.
Your understanding is mostly correct, as for the comments. [] refers to a list. You can think of it as a linked-list (although its actual implementation is closer to std::vectors for instance).
As Python variables are only references to objects in general, lists are effectively lists of references, and can potentially hold any kind of values. This is valid Python:
# A vector of numbers
vect = [1.0, 2.0, 3.0, 4.0]
But this is perfectly valid code as well:
# The list of my objects:
list = [1, [2,"a"], True, 'foo', object()]
This list contains an integer, another list, a boolean... In Python, you usually rely on duck typing for your variable types, so this is not a problem.
Finally, one of the methods of list is sort, which sorts it in-place, as you correctly guessed, and the range function generates a range of numbers.
The syntax for x in L: ... iterates over the content of L (assuming it is iterable) and sets the variable x to each of the successive values in that context. For example:
>>> for x in ['a', 'b', 'c']:
... print x
a
b
c
Since range generates a range of numbers, this is effectively the idiomatic way to generate a for i = 0; i < N; i += 1 type of loop:
>>> for i in range(4): # range(4) == [0,1,2,3]
... print i
0
1
2
3