Assume I have two numpy arrays as follows:
{0: array([ 2, 4, 8, 9, 12], dtype=int64),
1: array([ 1, 3, 5], dtype=int64)}
Now I want to replace each value with the ID (the dictionary key) of the array it belongs to, i.e. the values in array 0 become 0 and those in array 1 become 1. Then both arrays should be merged so that the IDs appear in the order of the original index values.
I.e. desired output:
array([1, 0, 1, 0, 1, 0, 0, 0])
But this is what I get:
np.concatenate((h1,h2), axis=0)
array([0, 0, 0, 0, 0, 1, 1, 1])
(Each array contains only unique values, if this helps.)
How can this be done?
Your description of merging is a bit unclear. But here's something that makes sense
In [399]: dd ={0: np.array([ 2, 4, 8, 9, 12]),
...: 1: np.array([ 1, 3, 5])}
In [403]: res = np.zeros(13, int)
In [404]: res[dd[0]] = 0
In [405]: res[dd[1]] = 1
In [406]: res
Out[406]: array([0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
Or to make the assignments clearer:
In [407]: res = np.zeros(13, int)
In [408]: res[dd[0]] = 2
In [409]: res[dd[1]] = 1
In [410]: res
Out[410]: array([0, 1, 2, 1, 2, 1, 0, 0, 2, 2, 0, 0, 2])
Otherwise the talk of index positions doesn't make a whole lot of sense.
Something like this?
d = {0: np.array([ 2, 4, 8, 9, 12], dtype=np.int64),
     1: np.array([ 1, 3, 5], dtype=np.int64)}
(np.concatenate([d[0],d[1]]).argsort(kind="stable")>=len(d[0])).view(np.uint8)
# array([1, 0, 1, 0, 1, 0, 0, 0], dtype=uint8)
np.concatenate just appends the lists/arrays one after another.
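The argsort idea above can be sketched for any number of arrays. This is only an illustration under assumptions: the helper name merge_labels is hypothetical, the dictionary keys are taken to be the integer IDs, and the values are assumed to be unique across all arrays.

```python
import numpy as np

def merge_labels(groups):
    """Return the ID of each value, ordered by the values themselves."""
    keys = list(groups)
    values = np.concatenate([groups[k] for k in keys])
    # One ID per value, in the same order as the concatenated values.
    labels = np.repeat(keys, [len(groups[k]) for k in keys])
    # A stable sort of the values gives the order in which the IDs should appear.
    order = np.argsort(values, kind="stable")
    return labels[order]

d = {0: np.array([2, 4, 8, 9, 12]), 1: np.array([1, 3, 5])}
print(merge_labels(d))  # [1 0 1 0 1 0 0 0]
```

Building the labels with np.repeat and reordering them avoids the comparison against len(d[0]), which only works for exactly two arrays.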
Maybe an unconventional way to go about it, but you could repeat the [0, 1] pattern once for each element of the shorter array, using numpy.tile, and then append repeated 1 values, using numpy.repeat, for the difference in length between the two arrays?
import numpy
# h1 and h2 are the two arrays from the question.
shortest = min(len(h1), len(h2))
diff = abs(len(h1) - len(h2))
# Repeat the [0, 1] pattern once per element of the shorter array.
A = numpy.tile([0, 1], shortest)
# Pad with 1s for the extra elements of the longer array.
B = numpy.repeat(1, diff)
C = numpy.concatenate((A, B), axis=0)
Maybe not the most general or elegant way to go about this, but if that is all your solution requires, it could do the job in the meantime.
Let's say we have a 1d numpy array filled with some int values. And let's say that some of them are 0.
Is there any way, using numpy array's power, to fill all the 0 values with the last non-zero values found?
for example:
arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)
print(arr)
[1 1 1 2 2 4 6 8 8 8 8 8 2]
A way to do it would be with this function:
def fill_zeros_with_last(arr):
    last_val = None # I don't really care about the initial value
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val
However, this is using a raw python for loop instead of taking advantage of the numpy and scipy power.
If we knew that a reasonably small number of consecutive zeros are possible, we could use something based on numpy.roll. The problem is that the number of consecutive zeros is potentially large...
Any ideas? or should we go straight to Cython?
Disclaimer:
I believe that long ago I found a question on Stack Overflow asking something like this or very similar, but I wasn't able to find it again. :-(
Maybe I missed the right search terms, sorry for the duplicate then. Maybe it was just my imagination...
Here's a solution using np.maximum.accumulate:
def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]
We construct an array prev which has the same length as arr, and such that prev[i] is the index of the last non-zero entry at or before the i-th entry of arr (or 0 if there is no such entry). For example, if:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
Then prev looks like:
array([ 0, 0, 0, 3, 3, 5, 6, 7, 7, 7, 7, 7, 12])
Then we just index into arr with prev and we obtain our result. A test:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
Note: Be careful to understand what this does when the first entry of your array is zero:
>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
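If leading zeros should receive a specific value instead of staying zero, the same np.maximum.accumulate trick can be extended. The following is a minimal sketch, not part of the answer above: the function name fill_zeros_with_last_or_initial and the initial parameter are assumptions made for illustration.

```python
import numpy as np

def fill_zeros_with_last_or_initial(arr, initial=0):
    arr = np.asarray(arr)
    nonzero = arr != 0
    # Index of the most recent non-zero entry at or before each position
    # (0 for positions that precede the first non-zero entry).
    prev = np.where(nonzero, np.arange(len(arr)), 0)
    prev = np.maximum.accumulate(prev)
    # True from the first non-zero entry onwards.
    seen = np.cumsum(nonzero) > 0
    # Positions before the first non-zero entry get the initial value.
    return np.where(seen, arr[prev], initial)

print(fill_zeros_with_last_or_initial(np.array([0, 0, 1, 0, 0]), initial=-1))
# [-1 -1  1  1  1]
```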
Inspired by jme's answer here and by Bas Swinckels' (in the linked question) I came up with a different combination of numpy functions:
def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt-1]], initial)
I think it's succinct and also works, so I'm posting it here for the record. Still, jme's is also succinct and easy to follow and seems to be faster, so I'm accepting it :-)
If the 0s only come in runs of length 1, this use of nonzero might work:
In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])
I can handle your arr by applying this repeatedly until I is empty.
In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
In [287]: while True:
.....:     I=np.nonzero(arr==0)[0]
.....:     if len(I)==0: break
.....:     arr[I] = arr[I-1]
.....:
In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
If the strings of 0s are long it might be better to look for those strings and handle them as a block. But if most strings are short, this repeated application may be the fastest route.
So let's say I have a list that looks like:
x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
I then have another list with indices that needs to be removed from list x:
x_remove = [1, 4, 5]
I can then use the numpy command delete to remove this from x and end up with:
x_final = np.delete(x, x_remove)
>>> x_final = [1, 0, 1, 0, 0, 0, 0]
So far so good. Now I then figure out that I don't want to use the entire list x, but start perhaps from index 2. So basically:
x_new = x[2:]
>>> x_new = [0, 1, 1, 1, 0, 0, 0, 0]
However, I still need to remove the indices in the x_remove list, but now, as you can see, the indices no longer refer to the same positions as before, so the wrong items are removed. The same thing happens if I do it the other way around (i.e. first removing the indices and then slicing from index 2). So basically it will/should look like:
x_new_final = [0, 1, 1, 0, 0] (first slice, then apply the remove list)
x_new_final_v2 = [1, 0, 0, 0, 0] (first apply the remove list, then slice)
x_new_final_correct_one = [0, 1, 0, 0, 0, 0] (as it should be)
So is there some way in which I can start my list at various indices (through slicing) and still use the delete command to remove the correct indices, i.e. those that correspond to the full list?
You could change the x_remove list depending on the slice location. For example:
slice_location = 2
x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
x_remove = [1, 4, 5]
x_new = x[slice_location:]
# Shift the indices by the slice offset and drop those that fall before the slice.
x_remove = [i - slice_location for i in x_remove if i - slice_location >= 0]
x_new = np.delete(x_new, x_remove)
x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
x_remove = [1, 4, 5]
for index, value in enumerate(x):
    for remove_index in x_remove:
        # Blank out the element whose position matches an index to remove.
        if index == remove_index:
            x[index] = ""
final_list = [final_value for final_value in x if final_value != ""]
print(final_list)
Try it this simple way...
First let's explore alternatives for the simple removal (without this change-in-starting-position issue):
First make an x with unique and easily recognized values:
In [787]: x = list(range(10))
In [788]: x
Out[788]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A list comprehension method - maybe not the fastest, but fairly clear and bug free:
In [789]: [v for i,v in enumerate(x) if i not in x_remove]
Out[789]: [0, 2, 3, 6, 7, 8, 9]
Your np.delete approach:
In [790]: np.delete(x, x_remove)
Out[790]: array([0, 2, 3, 6, 7, 8, 9])
That has a downside of converting x to an array, which is not a trivial task (time wise). It also makes a new array. My guess is that it is slower.
Try in-place removal:
In [791]: y=x[:]
In [792]: for i in x_remove:
...:     del y[i]
...:
In [793]: y
Out[793]: [0, 2, 3, 4, 6, 8, 9]
oops - wrong. We need to start from the end (largest index). This is a well known Python 'recipe':
In [794]: y=x[:]
In [795]: for i in x_remove[::-1]:
...:     del y[i]
...:
In [796]: y
Out[796]: [0, 2, 3, 6, 7, 8, 9]
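Reversing x_remove only deletes from the end when the list is already in ascending order. A small sketch that does not rely on that, assuming the indices are valid and non-negative (the helper name delete_in_place is made up for this example):

```python
def delete_in_place(items, indices):
    # Delete from the largest index down, so earlier deletions
    # do not shift the positions of the remaining targets.
    for i in sorted(set(indices), reverse=True):
        del items[i]
    return items

y = list(range(10))
delete_in_place(y, [4, 1, 5])  # indices given out of order on purpose
print(y)  # [0, 2, 3, 6, 7, 8, 9]
```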
Under the covers np.delete is taking a masked approach:
In [797]: arr = np.array(x)
In [798]: arr
Out[798]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [799]: mask = np.ones(arr.shape, bool)
In [800]: mask[x_remove] = False
In [801]: mask
Out[801]:
array([ True, False, True, True, False, False, True, True, True,
True])
In [802]: arr[mask]
Out[802]: array([0, 2, 3, 6, 7, 8, 9])
Now to the question of applying x_remove to a slice of x. The slice of x does not have a record of the slice parameters. That is you can't readily determine that y = x[2:] is missing two values. (Well, I could deduce it by comparing some attributes of x and y, but not from y alone).
So regardless of how you do the delete, you will have to first adjust the values of x_remove.
In [803]: x2 = np.array(x_remove)-2
In [804]: x2
Out[804]: array([-1, 2, 3])
In [805]: [v for i,v in enumerate(x[2:]) if i not in x2]
Out[805]: [2, 3, 6, 7, 8, 9]
This works OK, but that -1 is potentially a problem. We don't want it to mean the last element. So we have to first filter out the negative indices to be safe.
In [806]: np.delete(x[2:], x2)
/usr/local/bin/ipython3:1: FutureWarning: in the future negative indices will not be ignored by `numpy.delete`.
#!/usr/bin/python3
Out[806]: array([2, 3, 6, 7, 8, 9])
If delete didn't ignore negative indices, it could get a mask like this - with a False at the end:
In [808]: mask = np.ones(arr[2:].shape, bool)
In [809]: mask[x2] = False
In [810]: mask
Out[810]: array([ True, True, False, False, True, True, True, False])
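One way to avoid adjusting the indices at all is to build the mask on the full list and then slice the data and the mask together. This is a sketch under the assumption that x_remove always refers to positions in the full list; the helper name delete_then_slice is hypothetical.

```python
import numpy as np

def delete_then_slice(x, x_remove, start):
    arr = np.asarray(x)
    # Mask built against the full array, so x_remove needs no adjustment.
    keep = np.ones(arr.shape, dtype=bool)
    keep[np.asarray(x_remove, dtype=int)] = False
    # Slice the data and the mask with the same offset.
    return arr[start:][keep[start:]]

x = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(delete_then_slice(x, [1, 4, 5], 2))  # [0 1 0 0 0 0]
```

Because indices before the slice start are simply cut off along with the mask, no negative indices ever reach the indexing step.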
I am working on a large array (3000 x 3000) over which I use scipy.ndimage.label. It returns the labelled array and the number of labels (3403). I would like to know the indices of each label, e.g. for label 1 I want the rows and columns where it occurs in the labelled array.
So basically like this
a[0] = array([[1, 1, 0, 0],
[1, 1, 0, 2],
[0, 0, 0, 2],
[3, 3, 0, 0]])
indices = [np.where(a[0]==t+1) for t in range(a[1])] #where a[1] = 3 is number of labels.
print(indices)
[(array([0, 0, 1, 1]), array([0, 1, 0, 1])), (array([1, 2]), array([3, 3])), (array([3, 3]), array([0, 1]))]
And I would like to create a list of indices for all 3403 labels like above. The above method seems to be slow. I tried using generators, but there doesn't seem to be any improvement.
Are there any efficient ways?
Well, the idea for gaining efficiency is to minimize the work done inside the loop. A fully vectorized method isn't possible, given that each label has a variable number of elements. So, with those factors in mind, here's one solution -
a_flattened = a[0].ravel()
sidx = np.argsort(a_flattened)
afs = a_flattened[sidx]
cut_idx = np.r_[0,np.flatnonzero(afs[1:] != afs[:-1])+1,a_flattened.size]
row, col = np.unravel_index(sidx, a[0].shape)
row_indices = [row[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
col_indices = [col[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
Sample input, output -
In [59]: a[0]
Out[59]:
array([[1, 1, 0, 0],
[1, 1, 0, 2],
[0, 0, 0, 2],
[3, 3, 0, 0]])
In [60]: a[1]
Out[60]: 3
In [62]: row_indices # row indices
Out[62]:
[array([0, 0, 1, 2, 2, 2, 3, 3]), # for label-0
array([0, 0, 1, 1]), # for label-1
array([1, 2]), # for label-2
array([3, 3])] # for label-3
In [63]: col_indices # column indices
Out[63]:
[array([2, 3, 2, 0, 1, 2, 2, 3]), # for label-0
array([0, 1, 0, 1]), # for label-1
array([3, 3]), # for label-2
array([0, 1])] # for label-3
The elements of row_indices and col_indices are the expected output. The first group in each represents the 0-th region (the unlabelled background), so you might want to skip those.
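For lookups by label, the same sort-and-split steps can be wrapped in a function that returns a dictionary. This is a rough sketch assuming a non-empty 2-D labelled array; the name label_indices is invented here, and a stable sort is used so the indices within each label stay in row-major order.

```python
import numpy as np

def label_indices(labelled):
    flat = labelled.ravel()
    # Stable sort groups equal labels together and keeps row-major order within a label.
    sidx = np.argsort(flat, kind="stable")
    sorted_labels = flat[sidx]
    # Boundaries between consecutive groups of equal labels.
    cut_idx = np.r_[0, np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1]) + 1, flat.size]
    rows, cols = np.unravel_index(sidx, labelled.shape)
    return {
        int(sorted_labels[i]): (rows[i:j], cols[i:j])
        for i, j in zip(cut_idx[:-1], cut_idx[1:])
    }

labelled = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 2],
                     [0, 0, 0, 2],
                     [3, 3, 0, 0]])
rows, cols = label_indices(labelled)[2]
print(rows, cols)  # [1 2] [3 3]
```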
I have the following array
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7,0,10,11]
I would like to find the start and the end index of the array where the values are zeros consecutively. For the array above the output would be as follows
[3,8],[12,15],[19]
I want to achieve this as efficiently as possible.
Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)
import numpy as np
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example:
In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
In [237]: runs = zero_runs(a)
In [238]: runs
Out[238]:
array([[ 3, 9],
[12, 16],
[19, 20]])
With this format, it is simple to get the number of zeros in each run:
In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])
It's always a good idea to check the edge cases:
In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])
In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])
In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)
In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
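If the inclusive format from the question is needed, the half-open runs can be converted afterwards. A minimal sketch, repeating zero_runs so the snippet runs on its own; the wrapper name inclusive_zero_runs is an assumption for illustration.

```python
import numpy as np

def zero_runs(a):
    # 1 where a is 0, padded with a 0 at each end so every run has a start and an end.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def inclusive_zero_runs(a):
    result = []
    for start, stop in zero_runs(a).tolist():
        last = stop - 1  # stop is exclusive, so the last zero sits at stop - 1
        result.append([start] if last == start else [start, last])
    return result

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
print(inclusive_zero_runs(a))  # [[3, 8], [12, 15], [19]]
```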
You can use itertools to achieve your expected result.
from itertools import groupby
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
b = range(len(a))
# Group consecutive indices by the value they point to in a.
for group in groupby(iter(b), lambda x: a[x]):
    if group[0]==0:
        lis=list(group[1])
        print([min(lis),max(lis)])
Here is a custom function; I'm not sure it's the most efficient, but it works:
def getZeroIndexes(li):
    begin = 0
    end = 0
    indexes = []
    zero = False
    for ind,elt in enumerate(li):
        if not elt and not zero:
            begin = ind
            zero = True
        if not elt and zero:
            end = ind
        if elt and zero:
            zero = False
            if begin == end:
                indexes.append(begin)
            else:
                indexes.append((begin, end))
    # A run that reaches the end of the list is never closed inside the loop.
    if zero:
        indexes.append(begin if begin == end else (begin, end))
    return indexes