Distance to next non-zero element in one-dimensional numpy array - python

I have a one-dimensional numpy array consisting of ones and zeroes, like this:
[0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
For each non-zero element of the array, I want to calculate the "distance" to the next non-zero element. That is, I want to answer the question "How far away is the next non-zero element?" So the result for the above array would be:
[0, 0, 0, 1, 3, 0, 0, 6, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
Is there a built-in numpy function for this? And if not, what's the most efficient way to implement this in numpy?

Here is a two-liner. If you don't want to overwrite the original a, operate on a.copy() instead.
import numpy as np
a = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
ix = np.where(a)[0]
a[ix[:-1]] = np.diff(ix)
a[ix[-1]] = 0  # the last non-zero element has no next one
print(a)  # --> [0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0 0]

Probably not the best answer.
np.where gives you the indices of the non-zero elements in increasing order. By iterating through that result you know the location of each 1 and of the following 1, so you can build the result array yourself easily. If the 1s are sparse, this is probably fairly efficient.
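A sketch of that loop-based baseline (not the vectorized version):

```python
import numpy as np

a = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
nz = np.where(a)[0]            # indices of the non-zero elements
result = np.zeros_like(a)
for cur, nxt in zip(nz[:-1], nz[1:]):
    result[cur] = nxt - cur    # distance to the following non-zero element
print(result)                  # [0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0 0]
```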
Let me see if I can think of something more numpy-ish.
== UPDATE ==
Ah, just came to me
import numpy as np
x = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
# Find the indices of the ones in the array
temp = np.where(x)[0]
# find the difference between adjacent elements
deltas = temp[1:] - temp[:-1]
# Build the result based on these
result = np.zeros_like(x)
result[temp[:-1]] = deltas

Let's try:
import numpy as np
arr = np.array([0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1])
# create output
res = np.zeros_like(arr)
# indices of the non-zero elements
where, = np.where(arr)
# at each non-zero index (except the last), store the gap to the next one
res[where[:-1]] = np.diff(where)
print(res)
Output
[0 0 0 1 3 0 0 6 0 0 0 0 0 4 0 0 0 0]

Related

Transform 2d numpy array into 2d one hot encoding

How would I transform
a = [[0, 6],
     [3, 7],
     [5, 5]]
into
b = [[1, 0, 0, 0, 0, 0, 1, 0],
     [0, 0, 0, 1, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 1, 0, 0]]
Note that the final row of b has only one value set to 1, because of the repeated index in the final row of a.
Using indexing:
import numpy as np

a = np.array([[0, 6],
              [3, 7],
              [5, 5]])
b = np.zeros((len(a), a.max()+1), dtype=int)
b[np.arange(len(a)), a.T] = 1
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1, 0, 0]])
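The fancy-indexing line works because np.arange(len(a)) (shape (3,)) broadcasts against a.T (shape (2, 3)), pairing each row index with both of its column values; a repeated pair like (2, 5) simply assigns 1 to the same cell twice, which is why the duplicate in [5, 5] is harmless. A quick check of that behavior:

```python
import numpy as np

a = np.array([[0, 6], [3, 7], [5, 5]])
b = np.zeros((len(a), a.max() + 1), dtype=int)
# each broadcast (row, column) pair is set to 1;
# the duplicated pair (2, 5) just writes the same cell twice
b[np.arange(len(a)), a.T] = 1
print(b[2])  # [0 0 0 0 0 1 0 0]
```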
This can also be done using numpy broadcasting and boolean comparison in the following way:
a = np.array([[0, 6],
              [3, 7],
              [5, 5]])
# Add a new axis so each element can be compared along the last axis,
# then compare against arange(max+1); matching positions become True.
b = (a[:, :, None] == np.arange(a.max() + 1))
# Check if any element along axis 1 is true and convert the type to int
b = b.any(1).astype(int)
Output:
array([[1, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1, 0, 0]])

Numpy re-index to first N natural numbers

I have a matrix that has a quite sparse index (the largest values in both rows and columns are beyond 130000), but only a few of those rows/columns actually have non-zero values.
Thus, I want to remap the row and column indices so that only the ones with non-zero entries are represented, renumbered as the first N natural numbers.
Visually, I want an example matrix like this
1 0 1
0 0 0
0 0 1
to look like this
1 1
0 1
where a row or column is dropped only if all of its values are zero.
Since I do have the matrix in a sparse format, I could simply create a dictionary, map each index to an increasing counter (for rows and columns separately), and get a result:
row_dict = {}
col_dict = {}
row_ind = 0
col_ind = 0
# el looks like this: (row, column, value)
for el in sparse_matrix:
    if el[0] not in row_dict:
        row_dict[el[0]] = row_ind
        row_ind += 1
    if el[1] not in col_dict:
        col_dict[el[1]] = col_ind
        col_ind += 1
# now recreate the matrix with the new indices
But I was looking for maybe an internal function in NumPy. Also note that I do not really know how to word the question, so there might well be a duplicate out there that I do not know of; Any pointers in the right direction are appreciated.
You can use np.unique:
>>> import numpy as np
>>> from scipy import sparse
>>>
>>> A = np.random.randint(-100, 10, (10, 10)).clip(0, None)
>>> A
array([[6, 0, 5, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 7, 0, 0, 0, 0, 4, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 4, 0],
[9, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 4, 0, 0, 0, 0, 0, 0]])
>>> B = sparse.coo_matrix(A)
>>> B
<10x10 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in COOrdinate format>
>>> runq, ridx = np.unique(B.row, return_inverse=True)
>>> cunq, cidx = np.unique(B.col, return_inverse=True)
>>> C = sparse.coo_matrix((B.data, (ridx, cidx)))
>>> C.A
array([[6, 5, 0, 0, 0],
[0, 0, 7, 4, 9],
[0, 0, 0, 4, 0],
[9, 0, 0, 0, 0],
[0, 0, 4, 0, 0]])
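The key mechanism here is return_inverse: np.unique returns both the sorted unique values and, for each original element, its position within that unique array, which is exactly the compacted 0..N-1 index. A minimal illustration:

```python
import numpy as np

rows = np.array([0, 2, 2, 4, 5, 9])  # sparse row indices
uniq, inv = np.unique(rows, return_inverse=True)
print(uniq)  # [0 2 4 5 9]   -> the rows that actually occur
print(inv)   # [0 1 1 2 3 4] -> each original row remapped to 0..N-1
```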

How to exceed the limitation of numpy.array() when converting a list of arrays to an array of arrays?

I have a list of arrays, each containing 16 ints:
ListOfArray = [array([0, 1, ..., 15]), array([0, 1, ..., 15]), array([0, 1, ..., 15]), ..., array([0, 1, ..., 15])]
I want to convert it to an array of arrays.
So I use :
ListOfArray=numpy.array(ListOfArray)
or:
ListOfArray=numpy.asarray(ListOfArray)
or :
ArrayOfArray=numpy.asarray(ListOfArray)
All give the same result.
If my list contains fewer than 17716 arrays, I get the normal result:
[[0 0 0 ... 0 0 1]
[1 0 0 ... 0 1 0]
[0 0 0 ... 0 0 1]
...
[0 1 1 ... 1 0 0]
[0 1 1 ... 0 0 0]
[0 1 1 ... 0 0 1]]
But from 17716 arrays onward, I get this:
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]) ...
array([0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0])
array([0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1])
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
It seems there is a limit somewhere. Why?
Can it be exceeded?
Edit:
There is no problem with numpy.array after all; the 17-value array was not intended. I converted wav frames into binary and then into strings of sixteen 0s and 1s, prefixing 0 for negative values and 1 for positive ones, then converted to a list and finally an array. I didn't expect the value -32768 (-0b1000000000000000); I believed -32767 and 32767 (15 binary digits) were the extremes, so that one sample needed an extra digit and produced a 17-element array.
It's pretty ugly code and I'm not proud of it, but if you have advice for something less patchwork, here it is:
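The edit's point can be seen directly: Python's bin() keeps the minus sign rather than using two's complement, and -32768 needs one more binary digit than 32767:

```python
print(bin(32767))   # '0b111111111111111'   -> 15 binary digits
print(bin(-32768))  # '-0b1000000000000000' -> 16 binary digits
```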
import numpy as np
import wave
import struct

f = wave.open('Test16PCM.wav', 'rb')
nf = f.getnframes()
frames = f.readframes(nf)
f.close()

L = []
# extract sample values (only the left track)
for i in range(0, ((nf - 1) * 4) + 1, 4):
    L.append(struct.unpack('<h', frames[i:(i + 2)])[0])

Lbin = []  # convert int values to binary strings, prefixed with 0 (negative) or 1 (positive)
for i in L:
    a = str(bin(i))
    if a[0] == "-":  # something like "-0b00101101"
        a = a[3:]
        while len(a) < 16:  # pad to the same fixed width (was 15 before the correction)
            a = '0' + a
        Lbin.append('0' + a)
    else:  # something like "0b00101101"
        a = a[2:]
        while len(a) < 16:
            a = '0' + a
        Lbin.append('1' + a)

Lout = []
for i in Lbin:
    temp = []
    for j in i:
        temp.append(int(j))
    Lout.append(np.array(temp))
Lout = np.asarray(Lout)
print(Lout)
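As for less patchwork-y code: the bit extraction can be vectorized with numpy, assuming you keep the same (unusual) sign-flag-then-16-bit-magnitude layout. This is a sketch, not a drop-in replacement; it starts from an int16 sample array rather than reading the wav file:

```python
import numpy as np

samples = np.array([0, 5, -32768, 32767], dtype=np.int16)
# widen before abs(): abs(-32768) would overflow int16
mag = np.abs(samples.astype(np.int32)).astype(np.uint16)
sign = (samples >= 0).astype(np.uint8)  # 1 for positive, 0 for negative
# extract 16 bits per sample, most significant bit first
bits = ((mag[:, None] >> np.arange(15, -1, -1)) & 1).astype(np.uint8)
out = np.hstack([sign[:, None], bits])  # shape (n_samples, 17)
print(out.shape)  # (4, 17)
```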

Numpy array creation with patterns

Is it possible, in a fast way, to create a (large) 2-D numpy array which
contains a value n times per row (randomly placed)? E.g., for n = 3:
1 0 1 0 1
0 0 1 1 1
1 1 1 0 0
...
Same as 1., but place contiguous groups of size n randomly per row, e.g.
1 1 1 0 0
0 0 1 1 1
1 1 1 0 0
...
Of course, I could enumerate all rows, but I am wondering if there's a way to create the array using np.fromfunction or some faster way?
The answer to your first question has a simple one-line solution, which I imagine is pretty efficient. Functions like np.random.shuffle or np.random.permutation do something similar under the hood, but they would require a Python loop over the rows, which might become a problem if you have very many short rows.
The second question also has a pure numpy solution which should be quite efficient, although it is a little less elegant.
import numpy as np

rows = 20
cols = 10
n = 3

# fixed number of ones per row in random places
print((np.argsort(np.random.rand(rows, cols)) < n).view(np.uint8))

# fixed number of ones per row in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
I = np.arange(rows * n) // n  # each row index repeated n times
J = (np.random.randint(0, cols - n + 1, (rows, 1)) + np.arange(n)).flatten()
data[I, J] = 1
print(data)
Edit: here is a slightly longer, but more elegant and more performant solution to your second question:
import numpy as np

rows = 20
cols = 10
n = 3

def running_view(arr, window, axis=-1):
    """
    Return a running view of length 'window' over 'axis'.
    The returned array has an extra last dimension, which spans the window.
    """
    shape = list(arr.shape)
    shape[axis] -= (window - 1)
    assert shape[axis] > 0
    return np.lib.stride_tricks.as_strided(
        arr,
        shape + [window],
        arr.strides + (arr.strides[axis],))

# fixed number of ones per row in a random contiguous place
data = np.zeros((rows, cols), np.uint8)
I = np.arange(rows)
J = np.random.randint(0, cols - n + 1, rows)
running_view(data, n)[I, J, :] = 1
print(data)
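On recent NumPy (1.20+), the hand-rolled running_view above can be replaced with np.lib.stride_tricks.sliding_window_view; pass writeable=True so you can assign through the view. A sketch under that version assumption:

```python
import numpy as np

rows, cols, n = 20, 10, 3
data = np.zeros((rows, cols), np.uint8)
# shape (rows, cols - n + 1, n): all length-n windows of each row
view = np.lib.stride_tricks.sliding_window_view(data, n, axis=1, writeable=True)
I = np.arange(rows)
J = np.random.randint(0, cols - n + 1, rows)
view[I, J, :] = 1  # writes n contiguous ones into each row of data
print(data.sum(axis=1))  # every row sums to n
```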
First of all you need to import some functions from numpy:
from numpy.random import rand, randint
from numpy import array, argsort
Case 1:
a = rand(10, 5)
b = []
n = 3  # number of 1's
for i in range(len(a)):
    b.append((argsort(a[i]) >= (len(a[i]) - n)) * 1)
b = array(b)
Result:
print(b)
array([[1, 0, 0, 1, 1],
       [1, 0, 0, 1, 1],
       [0, 1, 0, 1, 1],
       [1, 0, 1, 0, 1],
       [1, 0, 0, 1, 1],
       [1, 1, 0, 0, 1],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 0, 1],
       [1, 0, 1, 0, 1],
       [0, 1, 1, 1, 0]])
Case 2:
a = rand(10, 5)
b = []
n = 3  # max number of 1's
for i in range(len(a)):
    k = randint(0, n + 1)  # random count of 1's for this row
    b.append((argsort(a[i]) >= (len(a[i]) - k)) * 1)
b = array(b)
Result:
print(b)
array([[0, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 0, 0, 1, 1],
       [0, 1, 1, 0, 1],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [1, 0, 1, 1, 0]])
I think that should work. To get the result, I generate rows of random floats and use argsort to find which entries are among the n biggest of each row, then cast the booleans to ints (boolean * 1 -> int).
Just for the fun of it, I tried to find a solution for your first question even though I'm quite new to Python. Here is what I have so far:
np.vstack([np.random.permutation([np.random.randint(0, 2),
                                  np.random.randint(0, 2),
                                  np.random.randint(0, 2), 0, 0, 0])
           for _ in range(6)])
array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 1]])
It is not the final answer, but maybe it can help you find an alternate solution using random numbers and permutation.

Creating sublists from a given list of items

I'd like to say first that the following question is not for homework; I finished my software engineering degree a few months ago. Anyway, at work today a friend asked me this strange partitioning problem:
"I have a list with 1000 rows, each row representing a number, and I want to create 10 sub-lists, each having a similar sum of the numbers from the main list. How can I do that?"
For example, say the main list is composed of 5, 4, 3, 2 and 1. It's simple: I create two sub-lists,
one with 5 and 3, the other with 4, 2 and 1. The sums of the lists are similar: 8 for the first, 7 for the second.
I can't figure out the algorithm, even though I know it must be simple; I'm missing something.
Let A be the input array. I'll assume it is sorted ascending.
A = [2,3,6,8,11]
Let M[i] be the number of sublists found so far to have sum equal to i.
Start with only M[0] = 1, because there is one list whose sum is zero: the empty list.
M = [1,0,0,...]
Then take each item from the list A one by one.
When you take an item, update the number of ways you can compose a list of each sum, now that the new item may be used.
Suppose a is the new item:
for each j (from high to low):
    if M[j] != 0:
        M[j+a] = M[j+a] + M[j]
If any M[j] reaches 10 during this process, you can stop the algorithm.
Also, modify it to remember which items make up each sum, so you can recover the actual lists at the end!
Notes:
You can use sparse representation for M
This is similar to those Knapsack and subset sum problems.
Perhaps you might find many better algorithms reading on those.
Here is working code in Python:
A = [2, 3, 6, 8, 11]
t = sum(A)
M = [0] * (t + 1)
M[0] = 1
print('init M :', M)
for a in A:
    for j in range(len(M) - 1, -1, -1):
        if M[j] != 0:
            M[j + a] += M[j]
    print('use', a, ':', M)
And its output:
init M : [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
use 2 : [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
use 3 : [1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
use 6 : [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
use 8 : [1, 0, 1, 1, 0, 1, 1, 0, 2, 1, 1, 2, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
use 11 : [1, 0, 1, 1, 0, 1, 1, 0, 2, 1, 1, 3, 0, 2, 2, 0, 2, 2, 0, 3, 1, 1, 2, 0, 1, 1, 0, 1, 1, 0, 1]
Take the interpretation of M[11] = 3 at the end, for example:
it means there are 3 sublists whose sum equals 11.
If you trace the progress, you can see the sublists are {2,3,6}, {3,8} and {11}.
To account for the fact that you allow the 10 sublists to have merely similar sums, not exactly the same sum, you might want to change the termination condition from "terminate if any M[j] >= 10" to something like "terminate if sum(M[j:j+3]) >= 10".
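The suggestion above to "remember the items" can be sketched by storing the actual sublists alongside each sum; note this can use exponential memory in the worst case, unlike the pure counting version:

```python
A = [2, 3, 6, 8, 11]
subsets = {0: [[]]}  # sum -> list of sublists achieving that sum
for a in A:
    # iterate sums in descending order so each item is used at most once
    for s in sorted(subsets, reverse=True):
        for sub in subsets[s]:
            subsets.setdefault(s + a, []).append(sub + [a])
print(subsets[11])  # [[2, 3, 6], [3, 8], [11]]
```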
