I'm not very familiar with Python. I've been reading the book 'Python for Data Analysis' recently, and I'm a bit confused about NumPy boolean indexing and setting.
The book said:
Selecting data from an array by boolean indexing always creates a copy of the data, even if the returned array is unchanged.
Setting values with boolean arrays works in a common-sense way.
And I have tried it with the following code:
First:
data = np.random.randn(7, 4)
data[data < 0] = 0 # this could change the `data`
Second:
data = np.random.randn(7, 4)
copied = data[data < 0]
copied[1] = 1 # this couldn't change the `data`
I do not quite understand this; can anyone explain it? In my understanding, copied should be a pointer to the data[data < 0] slice.
While data[data < 0] = 0 sorta looks like a view being set to 0, that's not what's actually happening. In reality, indexing an ndarray on the left-hand side of = calls __setitem__, which handles the piecewise assignment.
When the ndarray is on the other side of the =, __setitem__ isn't called and you get a copy (as boolean indexing always does), which is independent of the original array.
Essentially:
foo[foo != bar] = bar # calls __setitem__
foo[:2] = bar # calls __setitem__
bar = foo[foo != bar] # makes a copy
bar = foo[:2] # makes a view
As a rule of thumb numpy creates a view where possible and a copy where necessary.
When is a view possible? When the data can be addressed using strides; for example, for a 2D array A, each A[i, j] sits in memory at address base + i*strides[0] + j*strides[1]. If you create a subarray using just slices, this will always be the case, which is why you get a view.
For logical and advanced indexing it will typically not be possible to find a base and strides which happen to address the right elements. Therefore these operations return a new array with data copied.
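To see the view/copy distinction in practice, here's a minimal sketch (the array contents are arbitrary) that checks whether each result shares memory with the original:

```python
import numpy as np

a = np.zeros((3, 4))
a[1:] = 7.0

view = a[:2]        # basic slicing: addressable with strides, so a view
copied = a[a > 5]   # boolean indexing: no base + strides fits, so a copy

print(np.shares_memory(a, view))    # True: same underlying buffer
print(np.shares_memory(a, copied))  # False: independent data

view[0, 0] = -1.0   # writes through the view into a
print(a[0, 0])      # -1.0

copied[0] = -1.0    # only changes the copy, a is untouched
print((a == -1.0).sum())  # 1: only the element changed via the view
```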
Based on the sequence of the code:
data = np.random.randn(7, 4) : This step creates an array of size 7 by 4
data[data < 0] = 0 : sets every element of data which is < 0 to 0
copied = data[data < 0] : This step generates an empty array, as there is no element in data which is < 0 because of step 2
copied[1] = 1 : This step raises an IndexError, as copied is an empty array and thus index 1 does not exist
I have a 3D int64 Numpy array, which is output from skimage.measure.label. I need a list of 3D indices that match each of our possible (previously known) values, separated out by which indices correspond to each value.
Currently, we do this by the following idiom:
for cur_idx, count in values_counts.items():
    region = labels[:, :, :] == cur_idx
    [dim1_indices, dim2_indices, dim3_indices] = np.nonzero(region)
While this code works and produces correct output, it is quite slow, especially the np.nonzero part, as we call this 200+ times on a large array. I realize that there is probably a faster way to do this via, say, numba, but we'd like to avoid adding on additional requirements unless needed.
Ultimately, what we're looking for is a list of indices that correspond to each (nonzero) value relatively efficiently. Assume our number of values <1000 but our array size >100x1000x1000. So, for example, on the array created by the following:
x = np.zeros((4,4,4))
x[3,3,3] = 1; x[1,0,3] = 2; x[1,2,3] = 3
we would want some idx_value dict/array mapping each value to its indices, e.g. idx_value[1] = [[3, 3, 3]], idx_value[2] = [[1, 0, 3]], idx_value[3] = [[1, 2, 3]].
I've tried tackling problems similar to the one you describe, and I think the np.argwhere function is probably your best option for reducing runtime (see the NumPy docs). See the code example below for how this could be used per the constraints you identify above.
import numpy as np
x = np.zeros((4,4,4))
x[3,3,3] = 1; x[1,0,3] = 2; x[1,2,3] = 3
# Instantiate dictionary/array to store indices
idx_value = {}
# Get indices for each value
idx_value[3] = np.argwhere(x == 3)
idx_value[2] = np.argwhere(x == 2)
idx_value[1] = np.argwhere(x == 1)
# Display idx_value - consistent with indices we set before
>>> idx_value
{3: array([[1, 2, 3]]), 2: array([[1, 0, 3]]), 1: array([[3, 3, 3]])}
For the first use case, I think you would still have to use a for loop to iterate over the values you're searching over, but it could be done as:
# Instantiate dictionary/array
idx_value = {}
# Now loop by incrementally adding key/value pairs
for cur_idx, count in values_counts.items():
    idx_value[cur_idx] = np.argwhere(labels == cur_idx)
NOTE: This incrementally creates a dictionary where each key is an idx to search for, and each value is a np.array object of shape (N_matches, 3).
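If the per-value np.argwhere calls are still too slow, one alternative (my own sketch, not part of the answer above) is to sort the flattened labels once and split the coordinates by value, so the large array is only traversed once rather than 200+ times:

```python
import numpy as np

x = np.zeros((4, 4, 4), dtype=np.int64)
x[3, 3, 3] = 1; x[1, 0, 3] = 2; x[1, 2, 3] = 3

flat = x.ravel()
order = np.argsort(flat, kind="stable")  # one sort instead of one scan per value
coords = np.column_stack(np.unravel_index(order, x.shape))
vals, starts = np.unique(flat[order], return_index=True)

# split the coordinate rows into one block per value (skip the zero block)
ends = list(starts[1:]) + [len(flat)]
idx_value = {int(v): coords[s:e]
             for v, s, e in zip(vals, starts, ends)
             if v != 0}

print(idx_value)
```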
I've been trying to do the following as a batch operation in numpy or torch (no looping). Is this possible?
Suppose I have:
indices: [[3],[2]] (2x1)
output: [[0,0,0,0,1], [0,0,0,1,1]] (2xfixed_num) where fixed_num is 5 here
Essentially, I want to make indices up to that index value 0 and the rest 1 for each element.
OK, so I actually assume this is some sort of HW assignment - but maybe it's not; either way it was fun to do. Here's a solution for your specific example; maybe you can generalize it to an array of any shape:
def fill_ones(arr, idxs):
    x = np.where(np.arange(arr.shape[1]) <= idxs[0], 0, 1)  # This is the important logic.
    y = np.where(np.arange(arr.shape[1]) <= idxs[1], 0, 1)
    return np.array([x, y])
So where the comment is located, we use a condition to assign 0 to all indices up to some index value, and 1 after it. This actually creates a new array, as opposed to a mask that we could apply to the original array, so maybe it's "dirtier".
Also, I suspect it's possible to generalize to arrays of more than 2 dimensions, but the solution I'm imagining now uses a for-loop. Hope this helps!
Note: arr is just a numpy array of whatever shape you want the output to be, and idxs is a tuple of the indices past which you want the array elements to turn into 1's - hope that is clear.
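For what it's worth, the specific example in the question can also be done fully vectorized with a single broadcast comparison, no per-row np.where calls; this is my own sketch, not part of the answer above:

```python
import numpy as np

indices = np.array([[3], [2]])   # shape (2, 1)
fixed_num = 5

# np.arange(5) has shape (5,); comparing against (2, 1) broadcasts to (2, 5):
# positions <= the row's index become 0, the rest become 1
out = (np.arange(fixed_num) > indices).astype(int)
print(out)  # [[0 0 0 0 1]
            #  [0 0 0 1 1]]
```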
Given an array and a mask, we can assign new values to the positions that are TRUE in the mask:
import numpy as np
a = np.array([1,2,3,4,5,6])
mask1 = (a==2) | (a==5)
a[mask1] = 100
print(a)
# [ 1 100 3 4 100 6]
However, if we apply a second mask over the first one, we can access the values but we cannot modify them:
a = np.array([1,2,3,4,5,6])
mask1 = (a==2) | (a==5)
mask2 = (a[mask1]==2)
print(a[mask1][mask2])
# [2]
a[mask1][mask2] = 100
print(a)
# [ 1 2 3 4 5 6 ]
Why does it happen?
(Even if it seems a bizarre way to do this. Just out of curiosity)
This is probably because you mix getters and setters, which prevents the assignment from propagating back to the original array.
It's because you use mask1 as an indexer:
>>> mask1
array([False, True, False, False, True, False], dtype=bool)
now by setting a[mask1] = 100, you will set all the elements where mask1 was true thus resulting in
>>> a
array([ 1, 100, 3, 4, 100, 6])
note that you have only called a "setter" so to speak on a.
Now for a[mask1][mask2] = 100 you actually call both a getter and a setter. Indeed, you can write this as:
temp = a[mask1]    # getter
temp[mask2] = 100  # setter
as a result you only set the value in temp, and thus the value is not "backpropagated", so to speak, to a itself. You should see temp as a copy (although internally it is definitely possible that a Python interpreter handles it differently).
Note: there can be circumstances where this behavior works: if temp is, for instance, a view on an array, it supports backwards propagation. The NumPy indexing documentation, for instance, describes which operations return a view instead of a copy.
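For instance, if the intermediate result is produced by slicing instead of masking, the write does propagate, because slicing returns a view:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
temp = a[1:3]    # basic slicing: temp is a view on a
temp[0] = 100    # writes through the view into a's buffer
print(a)         # [  1 100   3   4   5   6]
```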
You are chaining advanced* indexing operations for the assignment, which prevents the value 100 being written back to the original array.
a[mask1] returns a new array with a copy of the original data. Writing a[mask1][mask2] = 100 means that this new array is indexed with mask2 and the value 100 assigned to it. This leaves a unchanged.
Simply viewing the items will appear to work fine because the values you pick out from the copy a[mask1] are the values you would want from the original array (although this is still inefficient as data is copied multiple times).
*advanced (or "fancy") indexing is triggered with a boolean array or an array of indices. It always returns a new array, unlike basic indexing which returns a view onto the original data (this is triggered, for example, by slicing).
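If you do need the two-stage selection to write back into a, one way (my sketch, using the same setup as the question) is to convert the first mask to integer positions and subselect those with the second mask, so the assignment involves only a single indexing operation on a:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
mask1 = (a == 2) | (a == 5)
mask2 = (a[mask1] == 2)

idx = np.flatnonzero(mask1)[mask2]  # positions in a selected by both masks
a[idx] = 100                        # one __setitem__ directly on a
print(a)  # [  1 100   3   4   5   6]
```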
I would like to run the contraction algorithm on an array of vertices n^2 times so as to calculate the minimum cut of a graph. After the first for-loop iteration, the array is altered and the remaining iterations use the altered array, which is not what I want. How can I simulate pointers so as to have the original input array during each for-loop iteration?
def n_squared_runs(array):
    min_cut, length = 9999, len(array) ** 2
    for i in range(0, length):
        # perform operation on original input array
        array = contraction(array)
        if len(array) < min_cut:
            min_cut = len(array)
    return min_cut
The contraction() operation should create and return a new array, then, and not modify in place the array it receives as a parameter. You should also use a different variable name for the returned array: clearly, if you use array to name both the parameter and the local variable, the parameter will get overwritten inside the function.
This has nothing to do with pointers, but with the contracts of the functions in use. If the original array must be preserved, then the helper functions need to make sure that this restriction is enforced. Notice that in Python if you do this:
array = [1, 2, 3]
f(array)
The array received by the f function is the same one that was declared "outside" of it; all that f receives is a reference to the array, not a copy of it, so naturally any modifications you make to the array inside f will be reflected outside. It's also worth pointing out that parameters in Python are passed by assignment (the reference itself is copied); there is no such thing as pointers or pass-by-reference in the language.
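A quick sketch of the difference between mutating the shared object and rebinding the local name:

```python
def f(lst):
    lst.append(99)   # mutates the object the caller also references
    lst = [0]        # rebinds only the local name; the caller is unaffected

array = [1, 2, 3]
f(array)
print(array)  # [1, 2, 3, 99]
```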
Don't overwrite the original array.
def n_squared_runs(array):
    min_cut, length = 9999, len(array) ** 2
    for i in range(0, length):
        # perform operation on original input array
        new_array = contraction(array)
        if len(new_array) < min_cut:
            min_cut = len(new_array)
    return min_cut
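Alternatively, if contraction() does mutate its input in place and you can't change it, you can pass a defensive copy each iteration. A sketch (the contraction here is a made-up stand-in that shrinks its input; copy.deepcopy is the cautious choice for nested data):

```python
import copy

def contraction(arr):
    # stand-in for the real algorithm: mutates its input in place
    arr.pop()
    return arr

def n_squared_runs(array):
    min_cut, length = 9999, len(array) ** 2
    for i in range(length):
        # copy first, so the caller's array survives every iteration
        new_array = contraction(copy.deepcopy(array))
        if len(new_array) < min_cut:
            min_cut = len(new_array)
    return min_cut

original = [1, 2, 3]
print(n_squared_runs(original))  # 2
print(original)                  # [1, 2, 3] -- unchanged
```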
So this is what I tried to do.
vectorized = [0] * length
for i, key in enumerate(foo_dict.keys()):
    vector = vectorized
    vector[i] = 1
    print(vector)
    vector = vectorized
print(vectorized)
So what I was hoping for: say the length is 4, so I create
a 4-dimensional vector:
vectorized = [0, 0, 0, 0]
Now, depending on the index into the dictionary (which is also of length 4 in this case),
create a vector with value 1 at that index while the rest are zero,
so vector = [1,0,0,0], [0,1,0,0], and so on.
Now instead what is happening is:
vector = [1,0,0,0],[1,1,0,0] .. and finally [1,1,1,1]
even vectorized is now
[1,1,1,1]
What am I doing wrong, and how do I achieve what I want?
Basically I am trying to create unit vectors.
Thanks
This line (these lines, really):
vector = vectorized
copies the list reference. You need to do a shallow copy of the sequence contents.
vector = vectorized[:]
You are creating a single list and then giving it several different names. Remember that a = b doesn't create a new object. It just means that a and b are both names for the same thing.
Try this instead:
for ...:
    vector = [0] * length
    ...
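Putting that together, a complete version of the loop that builds the unit vectors (foo_dict here is a stand-in for the questioner's dictionary):

```python
length = 4
foo_dict = {"a": 0, "b": 1, "c": 2, "d": 3}  # stand-in dictionary

vectors = []
for i, key in enumerate(foo_dict.keys()):
    vector = [0] * length   # fresh list each iteration, nothing shared
    vector[i] = 1
    vectors.append(vector)

print(vectors)  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```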
The line
vector = vectorized
is not making a copy of vectorized. Any time you change vector from then on, you are also changing vectorized.
You can change the first line to:
vector = vectorized[:]
or
import copy
vector = copy.copy(vectorized)
If you want to make a copy.
In Python, when you assign a list to a new name, the new name is just another reference to the same list rather than a brand-new one.
So when you try to modify the value of "vector", you are actually changing the value of "vectorized".
And in your case,
vector[i] = 1
is the same as
vectorized[i] = 1
Your problem is that when you write vector = vectorized, it is not creating a copy of the list; rather, it binds both names to the same object.
Assignment statements in Python do not copy objects, they create bindings between a target and an object.
http://docs.python.org/library/copy.html
This should help you get it sorted out.
And here's a little snippet from the python REPL to show you what I mean.
>>> vectorized = [0] * 4
>>> print(vectorized)
[0, 0, 0, 0]
>>> vector = vectorized
>>> vector[1] = 1
>>> print(vectorized)
[0, 1, 0, 0]
EDIT: Jeez you guys are fast!