Assume I have a dict of two numpy arrays, as follows:
{0: array([ 2, 4, 8, 9, 12], dtype=int64),
1: array([ 1, 3, 5], dtype=int64)}
Now I want to replace each array's values with its key, i.e. the values in array 0 become 0 and the values in array 1 become 1, and then merge both arrays so that the result follows the order of the original index values.
I.e. desired output:
array([1, 0, 1, 0, 1, 0, 0, 0])
But this is what I get:
np.concatenate((h1,h2), axis=0)
array([0, 0, 0, 0, 0, 1, 1, 1])
(Each array contains only unique values, if this helps.)
How can this be done?
Your description of merging is a bit unclear, but here's something that makes sense:
In [399]: dd ={0: np.array([ 2, 4, 8, 9, 12]),
...: 1: np.array([ 1, 3, 5])}
In [403]: res = np.zeros(13, int)
In [404]: res[dd[0]] = 0
In [405]: res[dd[1]] = 1
In [406]: res
Out[406]: array([0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
Or to make the assignments clearer:
In [407]: res = np.zeros(13, int)
In [408]: res[dd[0]] = 2
In [409]: res[dd[1]] = 1
In [410]: res
Out[410]: array([0, 1, 2, 1, 2, 1, 0, 0, 2, 2, 0, 0, 2])
Otherwise the talk of index positions doesn't make a whole lot of sense.
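As a small follow-up sketch (not in the original answer), continuing with dd from above: indexing res at the sorted union of the two index arrays recovers the 8-element output the question asked for.
res = np.zeros(13, int)
res[dd[1]] = 1                                    # positions from dd[0] simply stay 0
res[np.sort(np.concatenate([dd[0], dd[1]]))]      # keep only the occupied positions, in index order
# array([1, 0, 1, 0, 1, 0, 0, 0])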
Something like this?
d = {0: array([ 2, 4, 8, 9, 12], dtype=int64),
1: array([ 1, 3, 5], dtype=int64)}
(np.concatenate([d[0],d[1]]).argsort(kind="stable")>=len(d[0])).view(np.uint8)
# array([1, 0, 1, 0, 1, 0, 0, 0], dtype=uint8)
np.concatenate just appends lists/arrays; it doesn't reorder anything by index.
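For completeness, a hedged sketch of the same idea in two explicit steps (using d from above): build the labels with np.repeat, then reorder them by the argsort of the concatenated index arrays.
idx = np.concatenate([d[0], d[1]])
labels = np.repeat([0, 1], [len(d[0]), len(d[1])])    # 0 for d[0]'s entries, 1 for d[1]'s
labels[np.argsort(idx, kind="stable")]                # reorder by the original index values
# array([1, 0, 1, 0, 1, 0, 0, 0])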
Maybe an unconventional way to go about it, but for this data you could tile the [1 0] pattern once per element of the shorter array, and then pad with the longer array's label for the difference in lengths, using numpy.tile, numpy.repeat and numpy.concatenate:
import numpy

temp = min(len(h1), len(h2))          # length of the shorter array
diff = abs(len(h1) - len(h2))         # difference in lengths

A = numpy.tile([1, 0], temp)          # alternating pattern for the overlapping part
B = numpy.repeat(0, diff)             # pad with 0, the label of the longer array here
C = numpy.concatenate((A, B), axis=0)
# array([1, 0, 1, 0, 1, 0, 0, 0])
Maybe not the most dynamic or most elegant way to go about this, but if your problem is exactly as stated, it could do the job in the meantime.
Related
I would like to know the fastest way to extract the indices of the first n non-zero values per column in a 2D array.
For example, with the following array:
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would have [0, 0, 1, 1, 2] as xs and [0, 3, 2, 5, 3] as ys: two values in the first and second columns, one in the third, and none in the fourth.
Here is how it is currently done:
x = []
y = []
n = 3
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]
    if len(a):
        x.extend([i]*len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
Here is a method, admittedly a bit convoluted since it chains several functions, that does not require sorting the array (only a linear scan is needed to find the non-null values):
n = 2
# Get indices with non null values, columns indices first
nnull = np.stack(np.where(arr.T != 0))
# split indices by unique value of column
cols_ids = np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] + 1)
# Take n in each (max) and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis = 1)
outputs:
array([[0, 0, 1, 1, 2],
[0, 3, 2, 5, 3]], dtype=int64)
Here is one approach using argsort; it gives a different order, though:
n = 2
m = arr!=0
# non-zero values first
idx = np.argsort(~m, axis=0)
# get first 2 and ensure non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y,x = np.where(m2)
# slice
x, idx[y,x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))
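If you need the column-grouped order that the other answers produce, one hedged follow-up (not part of the original answer) is to lexsort that result by column:
vals = idx[y, x]
order = np.lexsort((vals, x))      # primary key: column x, ties broken by row index
x[order], vals[order]
# (array([0, 0, 1, 1, 2]), array([0, 3, 2, 5, 3]))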
Use a shifted comparison, offsetting by n, on the first (column) output of the transposed array's nonzero:
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
mat = [[ 1. 2. 3. 4. 5.]
[ 6. 7. 8. 9. 10.]
[11. 12. 13. 14. 15.]]
Suppose I have this NumPy array.
Say I need to take the 2nd element of each row (i.e. the 2nd column), convert each value to binary, and turn each into a bit vector.
How can I do it using NumPy?
For instance, if I select the 2nd column of this NumPy array, my output should look as follows:
[[0 0 1 0],
[0 1 1 1],
[1 1 0 0]]
I tried as follows:
my_data = np.genfromtxt('data_input')
print(my_data)
my_data_2nd_column = my_data[:, 1]
my_data_2nd_column_binary = Utils.encode(my_data_2nd_column)
my_2nd_column_binary = np.apply_along_axis(Utils.encode, 1, my_data)
print(my_2nd_column_binary)
Numpy has a built-in function for this. First, you can get a particular column using indexing:
>>> arr
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> arr[:, [1]]
array([[ 2],
[ 7],
[12]])
Then, you could use the built-in function, but make sure you convert to unsigned, 8-bit integers:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)
array([[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0]], dtype=uint8)
Of course, if you need the second dimension to have length 4, just slice again; it is probably worth copying the result if you are going to do lots of operations on it:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)[:, -4:]
array([[0, 0, 1, 0],
[0, 1, 1, 1],
[1, 1, 0, 0]], dtype=uint8)
I did this example without using the numpy library.
I commented all the functions.
mat = [[ 1, 2, 3, 4, 1,],
[ 6, 7, 8, 9, 40,],
[11, 12, 13, 14, 15,]]
# convert the binary into a vector of elements
def split(word):
    return [int(char) for char in word]

# returns the vector size of the largest binary
def binaryBig(lista):
    maior = max(lista, key=int)
    temp = "{0:b}".format(maior)
    return len(split(temp))

# convert the element to binary
def binary(x, big):
    temp = split(format(x, "b"))
    for n in range(len(temp), big):
        temp.insert(0, 0)
    return temp

# create the matrix with the binaries
def createBinaryMat(lista):
    big = binaryBig(lista)
    mat = []
    for i in lista:
        mat.append(binary(i, big))
    return mat

# select the column and return the created matrix
def binaryElementsOfColum(colum, mat):
    lista = []
    for i in mat:
        lista.append(i[colum])
    return createBinaryMat(lista)

for i in binaryElementsOfColum(4, mat):
    print(i)
Output:
[0, 0, 0, 0, 0, 1]
[1, 0, 1, 0, 0, 0]
[0, 0, 1, 1, 1, 1]
Let's say we have a 1d numpy array filled with some int values. And let's say that some of them are 0.
Is there any way, using the power of numpy arrays, to fill all the 0 values with the last non-zero value found?
for example:
arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)
print(arr)
[1 1 1 2 2 4 6 8 8 8 8 8 2]
A way to do it would be with this function:
def fill_zeros_with_last(arr):
    last_val = None  # I don't really care about the initial value
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val
However, this uses a raw Python for loop instead of taking advantage of the power of numpy and scipy.
If we knew that a reasonably small number of consecutive zeros are possible, we could use something based on numpy.roll. The problem is that the number of consecutive zeros is potentially large...
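For illustration (a sketch of that roll idea, not a solution): a single roll-and-fill pass only propagates a value across one zero, so it would have to be repeated as many times as the longest run of zeros:
shifted = np.roll(arr, 1)              # previous element (wraps around at index 0)
np.where(arr == 0, shifted, arr)
# array([1, 1, 0, 2, 2, 4, 6, 8, 8, 0, 0, 0, 2])  # only the first zero of each run got filled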
Any ideas? or should we go straight to Cython?
Disclaimer:
I have a feeling that long ago I found a question on Stack Overflow asking something like this, or something very similar, but I wasn't able to find it again. :-(
Maybe I missed the right search terms, sorry for the duplicate then. Maybe it was just my imagination...
Here's a solution using np.maximum.accumulate:
def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]
We construct an array prev which has the same length as arr, such that prev[i] is the index of the last non-zero entry at or before the i-th entry of arr. For example, if:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
Then prev looks like:
array([ 0, 0, 0, 3, 3, 5, 6, 7, 7, 7, 7, 7, 12])
Then we just index into arr with prev and we obtain our result. A test:
>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
Note: Be careful to understand what this does when the first entry of your array is zero:
>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
Inspired by jme's answer here and by Bas Swinckels' (in the linked question) I came up with a different combination of numpy functions:
def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt-1]], initial)
I think it's succinct and also works, so I'm posting it here for the record. Still, jme's is also succinct and easy to follow and seems to be faster, so I'm accepting it :-)
If the 0s only come in runs of length 1, this use of nonzero might work:
In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])
I can handle your arr by applying this repeatedly until I is empty.
In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
In [287]: while True:
   .....:     I = np.nonzero(arr==0)[0]
   .....:     if len(I)==0: break
   .....:     arr[I] = arr[I-1]
   .....:
In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
If the runs of 0s are long, it might be better to locate those runs and handle each as a block. But if most runs are short, this repeated application may be the fastest route.
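For the record, here is a hedged sketch of that block-based idea (my own variation, with the name fill_zero_runs made up for illustration): locate each run of zeros, then broadcast the value just before the run across it. It assumes the array does not start with a zero.
import numpy as np

def fill_zero_runs(arr):
    a = np.asarray(arr)
    z = np.concatenate(([0], (a == 0).astype(np.int8), [0]))
    edges = np.flatnonzero(np.diff(z)).reshape(-1, 2)   # [start, stop) of each zero run
    lengths = edges[:, 1] - edges[:, 0]
    fill = np.repeat(a[edges[:, 0] - 1], lengths)       # value preceding each run, repeated over it
    out = a.copy()
    out[a == 0] = fill
    return out

fill_zero_runs([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
# array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])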
I have a numpy array like this:
a = numpy.array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0])
The goal is to find the start and end indices of the runs of zeros and ones. I want to use these ranges on another numpy array containing timestamps, to find out how much time each zero phase takes. Something like that:
dur = numpy.diff(time[start idx, end idx])
However, numpy where gives me all indices:
numpy.where(a==0)
(array([ 3, 4, 5, 9, 10, 11], dtype=int64),)
I would need only the start and end index of each zero phase, like [[3,5],[9,11]]. How can I achieve that?
Here's one approach -
def start_stop(a, val=0):
    n = np.concatenate(([False], a==val, [False]))
    idx = np.flatnonzero(np.diff(n))
    # or idx = np.flatnonzero(n[1:] != n[:-1])
    return idx[::2], idx[1::2]-1
Shorter way -
def start_stop_v2(a, val=0):
    idx = np.flatnonzero(np.diff(np.r_[0, a==val, 0]))
    return idx[::2], idx[1::2]-1
One-liner -
np.flatnonzero(np.diff(np.r_[0,a==0,0])).reshape(-1,2) - [0,1]
Sample run -
In [324]: a
Out[324]: array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0])
In [325]: start_stop(a, val=0)
Out[325]: (array([3, 9]), array([ 5, 11]))
In [326]: start_stop_v2(a, val=0)
Out[326]: (array([3, 9]), array([ 5, 11]))
In [327]: np.flatnonzero(np.diff(np.r_[0,a==0,0])).reshape(-1,2) - [0,1]
Out[327]:
array([[ 3, 5],
[ 9, 11]])
Re-using np.where(a==val)
To solve it by re-using the result of np.where(a==val) -
In [388]: idx = numpy.where(a==0)[0]
In [389]: mask = np.r_[True,np.diff(idx)!=1,True]
In [390]: idx[mask[:-1]] # starts
Out[390]: array([3, 9])
In [391]: idx[mask[1:]] # stops
Out[391]: array([ 5, 11])
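Coming back to the question's goal, a short hedged usage sketch (assuming a timestamp array time aligned with a): the duration of each zero phase is the difference between the timestamps at the stop and start indices.
starts, stops = start_stop(a, val=0)
dur = time[stops] - time[starts]   # one duration per zero phase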
I have the following array
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
I would like to find the start and end indices of the runs of consecutive zeros in the array. For the array above, the output would be as follows:
[3,8],[12,15],[19]
I want to achieve this as efficiently as possible.
Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)
import numpy as np
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example:
In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
In [237]: runs = zero_runs(a)
In [238]: runs
Out[238]:
array([[ 3, 9],
[12, 16],
[19, 20]])
With this format, it is simple to get the number of zeros in each run:
In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])
It's always a good idea to check the edge cases:
In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])
In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])
In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)
In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
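If you want the exact format from the question ([3,8],[12,15],[19], i.e. inclusive end indices and a bare index for single-element runs), a small hedged conversion of these half-open rows could look like this:
runs = zero_runs(a)
[[s, e - 1] if e - s > 1 else [s] for s, e in runs]
# [[3, 8], [12, 15], [19]]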
You can use itertools to achieve your expected result.
from itertools import groupby

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
b = range(len(a))
for group in groupby(iter(b), lambda x: a[x]):
    if group[0] == 0:
        lis = list(group[1])
        print([min(lis), max(lis)])
Here is a custom function; not sure it's the most efficient, but it works:
def getZeroIndexes(li):
    begin = 0
    end = 0
    indexes = []
    zero = False
    for ind, elt in enumerate(li):
        if not elt and not zero:
            begin = ind
            zero = True
        if not elt and zero:
            end = ind
        if elt and zero:
            zero = False
            if begin == end:
                indexes.append(begin)
            else:
                indexes.append((begin, end))
    return indexes
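A quick usage check against the array from the question (note that this returns tuples for multi-element runs and a bare index for single ones, slightly different from the format asked for):
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
print(getZeroIndexes(a))
# [(3, 8), (12, 15), 19]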