Kinect + Python - Fill depth for shadows

Kinect + Python - Fill depth for shadows - python

The Kinect camera returns a depth image for the whole view. Due to the way the image is captured, some small areas are invisible to the camera. For those areas 2047 is returned.
I want to fill those areas with the value that is left of them - which is the most likely value for that area. I have the depth as numpy uint16 array. The trivial solution would be:
for x in xrange(depth.shape[1]):
for y in xrange(depth.shape[0]):
if depth[y,x] == 2047 and x > 0:
depth[y,x] = depth[y,x-1]
This takes around 16 seconds to execute (Raspberry 2) per 640 x 480 frame.
I came up with a solution using indexes:
w = numpy.where(depth == 2047)
w = zip(w[0], w[1])
for index in w:
if depth[index] == 2047 and index[1] > 0:
depth[index] = depth[index[0],index[1] - 1]
This takes around 0.6 seconds to execute for a test frame. Much faster but still far from perfect. Index computation and zip only take 0.04 seconds, so the main performance killer is the loop.
I reduced it to 0.3 seconds by using item():
for index in w:
if depth.item(index) == 2047 and index[1] > 0:
depth.itemset(index, depth.item(index[0],index[1] - 1))
Can this be improved further using only python (+numpy/opencv)? Compared to how fast simple filtering is, it should be possible to be faster than 0.05s

You have islands going behind the places where the elements in the input array are 2. The job here is to fill the shadows with the element right before starting the shadows. So, one way would be to find out the start and stop places of those islands and put x and -x at those places respectively, where x is the element right before starting of each island. Then, do cumsum along the rows, which would effectively fill the shodow-islands with x. That's all there is for a vectorized solution! Here's the implementation -
# Get mask of places to be updated
mask = np.zeros(np.array(depth.shape) + [0,1],dtype=bool)
mask[:,1:-1] = depth[:,1:] == 2047
# Get differentiation along the second axis and thus island start and stops
diffs = np.diff(mask.astype(int),axis=1)
start_mask = diffs == 1
stop_mask = diffs == -1
# Get a mapping array that has island places filled with the start-1 element
map_arr = np.zeros_like(diffs)
map_arr[start_mask] = depth[start_mask]
map_arr[stop_mask] = -depth[start_mask]
map_filled_arr = map_arr.cumsum(1)[:,:-1]
# Use mask created earlier to selectively set elements from map array
valid_mask = mask[:,1:-1]
depth[:,1:][valid_mask] = map_filled_arr[valid_mask]
Benchmarking
Define functions :
def fill_depth_original(depth):
for x in xrange(depth.shape[1]):
for y in xrange(depth.shape[0]):
if depth[y,x] == 2047 and x > 0:
depth[y,x] = depth[y,x-1]
def fill_depth_original_v2(depth):
w = np.where(depth == 2047)
w = zip(w[0], w[1])
for index in w:
if depth[index] == 2047 and index[1] > 0:
depth[index] = depth[index[0],index[1] - 1]
def fill_depth_vectorized(depth):
mask = np.zeros(np.array(depth.shape) + [0,1],dtype=bool)
mask[:,1:-1] = depth[:,1:] == 2047
diffs = np.diff(mask.astype(int),axis=1)
start_mask = diffs == 1
stop_mask = diffs == -1
map_arr = np.zeros_like(diffs)
map_arr[start_mask] = depth[start_mask]
map_arr[stop_mask] = -depth[start_mask]
map_filled_arr = map_arr.cumsum(1)[:,:-1]
valid_mask = mask[:,1:-1]
depth[:,1:][valid_mask] = map_filled_arr[valid_mask]
Runtime tests and verify outputs :
In [303]: # Create a random array and get a copy for profiling vectorized method
...: depth = np.random.randint(2047-150,2047+150,(500,500))
...: depthc1 = depth.copy()
...: depthc2 = depth.copy()
...:
In [304]: fill_depth_original(depth)
...: fill_depth_original_v2(depthc1)
...: fill_depth_vectorized(depthc2)
...:
In [305]: np.allclose(depth,depthc1)
Out[305]: True
In [306]: np.allclose(depth,depthc2)
Out[306]: True
In [307]: # Create a random array and get a copy for profiling vectorized method
...: depth = np.random.randint(2047-150,2047+150,(500,500))
...: depthc1 = depth.copy()
...: depthc2 = depth.copy()
...:
In [308]: %timeit fill_depth_original(depth)
...: %timeit fill_depth_original_v2(depthc1)
...: %timeit fill_depth_vectorized(depthc2)
...:
10 loops, best of 3: 89.6 ms per loop
1000 loops, best of 3: 1.47 ms per loop
100 loops, best of 3: 10.3 ms per loop
So, the second approach listed in the question still looks like winning!

Related

Finding an integer at a given position i in a sequence of increasing and juxtaposing integers

Disclosure: This is for homework help
I want to find an integer at a given position i while repeatedly constructing and adding to a sequence of integers preferably with decent run time and performance.

You're rebuilding the sub-sequences from 1 at each iteration of the while instead of simply keeping a sequence and adding the next number in the following iteration, and then extend the main list with that.
Also, you should defer the str.join until after the while and not build strings at each iteration:
from itertools import count
def give_output(digitPos):
c = count(1)
l, lst = [], []
while len(lst) <= digitPos:
l.append(next(c)) # update previous sub-sequence
lst.extend(l)
return int(''.join(map(str, lst))[digitPos-1])
Timings:
In [10]: %%timeit
...: giveOutput(500)
...:
1000 loops, best of 3: 219 µs per loop
In [11]: %%timeit
...: give_output(500)
...:
10000 loops, best of 3: 126 µs per loop
About half the time!
You can even do better if you pick the i th item using a div-mod approach instead of building a large string; I'll leave that to you.

In fact, moving the list and index outside the function might cache your results better:
list1 = []
i = 2
def giveOutput():
global list1
global i
digitPos = int(input())
while len(list1) <= digitPos:
list1.extend(list(map(int, ''.join(map(str, range(1, i))))))
i = i + 1
print(list1[digitPos -1])
This only really works well when you are given a number of testcases.
Update: (Thanks to Moses for ideas about building up strings)
in fact your lists could just be a strings:
all_digits = ''
digits_up_to_i = ''
i = 1
def giveOutput():
global all_digits
global digits_up_to_i
global i
digitPos = int(input())
while len(all_digits) <= digitPos:
digits_up_to_i += str(i)
all_digits += digits_up_to_i
i = i + 1
print(all_digits[digitPos -1])

Spedup distance and summary computation between two HUGE multi-dimensional arrays in python

I have only a year of experience with using python. I would like to find summary statistics based on two multi-dimensional arrays DF_All and DF_On. Both have X,Y values. A function is created that computes distance as sqrt((X-X0)^2 + (Y-Y0)^2) and generates summaries as shown in the code below. My question is: Is there any way to make this code run faster? I would prefer a native python method but other strategies (like numba are also welcomed).
The example (toy) code below takes only 50 milliseconds to run on my windows-7 x64 desktop. But my DF_All has more than 10,000 rows and I need to do this calculation a huge number of times as well resulting in a huge execution time.
import numpy as np
import pandas as pd
import json, random
# create data
KY = ['ER','WD','DF']
DS = ['On','Off']
DF_All = pd.DataFrame({'KY': np.random.choice(KY,20,replace = True),
'DS': np.random.choice(DS,20,replace = True),
'X': random.sample(range(1,100),20),
'Y': random.sample(range(1,100),20)})
DF_On = DF_All[DF_All['DS']=='On']
# function
def get_values(DF_All,X = list(DF_On['X'])[0],Y = list(DF_On['Y'])[0]):
dist_vector = np.sqrt((DF_All['X'] - X)**2 + (DF_All['Y'] - Y)**2) # computes distance
DF_All = DF_All[dist_vector<35] # filters if distance is < 35
# print(DF_All.shape)
DS_summary = [sum(DF_All['DS']==x) for x in ['On','Off']] # get summary
KY_summary = [sum(DF_All['KY']==x) for x in ['ER','WD','DF']] # get summary
joined_summary = DS_summary + KY_summary # join two summary lists
return(joined_summary) # return
Array_On = DF_On.values.tolist() # convert to array then to list
Values = [get_values(DF_All,ZZ[2],ZZ[3]) for ZZ in Array_On] # list comprehension to get DS and KY summary for all rows of Array_On list
Array_Updated = [x + y for x,y in zip(Array_On,Values)] # appending the summary list to Array_On list
Array_Updated = pd.DataFrame(Array_Updated) # converting to pandas dataframe
print(Array_Updated)

Here's an approach making use of vectorization by getting rid of the looping there -
from scipy.spatial.distance import cdist
def get_values_vectorized(DF_All, Array_On):
a = DF_All[['X','Y']].values
b = np.array(Array_On)[:,2:].astype(int)
v_mask = (cdist(b,a) < 35).astype(int)
DF_DS = DF_All.DS.values
DS_sums = v_mask.dot(DF_DS[:,None] == ['On','Off'])
DF_KY = DF_All.KY.values
KY_sums = v_mask.dot(DF_KY[:,None] == ['ER','WD','DF'])
return np.column_stack(( DS_sums, KY_sums ))
Using a bit less memory, a tweaked one -
def get_values_vectorized_v2(DF_All, Array_On):
a = DF_All[['X','Y']].values
b = np.array(Array_On)[:,2:].astype(int)
v_mask = cdist(a,b) < 35
DF_DS = DF_All.DS.values
DS_sums = [((DF_DS==x)[:,None] & v_mask).sum(0) for x in ['On','Off']]
DF_KY = DF_All.KY.values
KY_sums = [((DF_KY==x)[:,None] & v_mask).sum(0) for x in ['ER','WD','DF']]
out = np.column_stack(( np.column_stack(DS_sums), np.column_stack(KY_sums)))
return out
Runtime test -
Case #1 : Original sample size of 20
In [417]: %timeit [get_values(DF_All,ZZ[2],ZZ[3]) for ZZ in Array_On]
100 loops, best of 3: 16.3 ms per loop
In [418]: %timeit get_values_vectorized(DF_All, Array_On)
1000 loops, best of 3: 386 µs per loop
Case #2: Sample size of 2000
In [420]: %timeit [get_values(DF_All,ZZ[2],ZZ[3]) for ZZ in Array_On]
1 loops, best of 3: 1.39 s per loop
In [421]: %timeit get_values_vectorized(DF_All, Array_On)
100 loops, best of 3: 18 ms per loop

Find all occurences of a specified match of two numbers in numpy array

what i need to achieve is to get array of all indexes, where in my data array filled with zeros and ones is step from zero to one. I need very quick solution, because i have to work with milions of arrays of hundrets milions length. It will be running in computing centre. For instance..
data_array = np.array([1,1,0,1,1,1,0,0,0,1,1,1,0,1,1,0])
result = [3,9,13]

try this:
In [23]: np.where(np.diff(a)==1)[0] + 1
Out[23]: array([ 3, 9, 13], dtype=int64)
Timing for 100M element array:
In [46]: a = np.random.choice([0,1], 10**8)
In [47]: %timeit np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1
1 loop, best of 3: 1.46 s per loop
In [48]: %timeit np.where(np.diff(a)==1)[0] + 1
1 loop, best of 3: 1.64 s per loop

Here's the procedure:
Compute the diff of the array
Find the index where the diff == 1
Add 1 to the results (b/c len(diff) = len(orig) - 1)
So try this:
index = numpy.nonzero((data_array[1:] - data_array[:-1]) == 1)[0] + 1
index
# [3, 9, 13]

Well thanks a lot to all of you. Solution with nonzero is probably better for me, because I need to know steps from 0->1 and also 1->0 and finally calculate differences. So this is my solution. Any other advice appreciated .)
i_in = np.nonzero( (data_array[1:] - data_array[:-1]) == 1 )[0] +1
i_out = np.nonzero( (data_array[1:] - data_array[:-1]) == -1 )[0] +1
i_return_in_time = (i_in - i_out[:i_in.size] )

Since it's an array filled with 0s and 1s, you can benefit from just comparing rather than performing arithmetic operation between the one-shifted versions to directly give us the boolean array, which could be fed to np.flatnonzero to get us the indices and the final output.
Thus, we would have an implementation like so -
np.flatnonzero(data_array[1:] > data_array[:-1])+1
Runtime test -
In [26]: a = np.random.choice([0,1], 10**8)
In [27]: %timeit np.nonzero((a[1:] - a[:-1]) == 1)[0] + 1
1 loop, best of 3: 1.91 s per loop
In [28]: %timeit np.where(np.diff(a)==1)[0] + 1
1 loop, best of 3: 1.91 s per loop
In [29]: %timeit np.flatnonzero(a[1:] > a[:-1])+1
1 loop, best of 3: 954 ms per loop

Most efficient way to extract parts of one array based on another

I have a time series with about 150 million points. I need to zoom in on 3 million points. That is, I need to extract the 100 time points surrounding each of those 3 million areas of interest in this 150 million point time series.
Attempt:
def get_waveforms(data,spiketimes,lookback=100,lookahead=100):
answer = zeros((len(spiketimes),(lookback+lookahead)))
duration = len(data)
for i in xrange(len(spiketimes)):
if(spiketimes[i] - lookback) > 0 and spiketimes[i] + lookahead) < duration:
answer[i,:] = data[(spiketimes[i]-lookback):(spiketimes[i]+lookahead)]
return answer
This eats up all available memory on my Mac. It explodes if I try to pass and array of where len(array) > 100000. Is there a more memory efficient or (hopefully) more elegant approach to pull out parts of one array based on another?
Related
This answer is related. However, I'm not exactly sure how to apply it and avoid a loop. Would I, effectively, be indexing the time series vector over and over with the columns of a boolean matrix?

You are allocating an array of 200 * len(spiketimes) floats, so for your 100,000 item spiketimes should only be about 160 MB, which doesn't seem like much. On the other hand, if you go to 1,000,000 spiketimes, a 1.6 GB single array may be a stretch for some systems. If you have the memory, you can vectorize the extraction with something like this:
def get_waveforms(data, spiketimes, lookback=100, lookahead=100) :
offsets = np.arange(-lookback, lookahead)
indices = spiketimes + offsets[:, None]
ret = np.take(data, indices, mode='clip')
ret[:, spiketimes < lookback] = 0
ret[:, spiketimes + lookahead >= len(data)] = 0
return ret
The handling of the spiketimes too close to the edges of data mimics that in your function with loops.
The wise thing to do when you have so much data is to take views into it. That is harder to vectorize (or at least I haven't figured how to), but since you aren't copying any of the data, the python loop will not be much slower:
def get_waveforms_views(data, spiketimes, lookback=100, lookahead=100) :
ret = []
for j in spiketimes :
if j < lookback or j + lookahead >= len(data) :
ret.append(None)
else :
ret.append(data[j - lookback:j + lookahead])
return ret
With the following test data:
data_points, num_spikes = 1000000, 10000
data = np.random.rand(data_points)
spiketimes = np.random.randint(data_points, size=(num_spikes))
I get these timings:
In [2]: %timeit get_waveforms(data, spiketimes)
1 loops, best of 3: 320 ms per loop
In [3]: %timeit get_waveforms_views(data, spiketimes)
1 loops, best of 3: 313 ms per loop

Is there lexographical version of searchsorted in numpy?

I have two arrays which are lex-sorted.
In [2]: a = np.array([1,1,1,2,2,3,5,6,6])
In [3]: b = np.array([10,20,30,5,10,100,10,30,40])
In [4]: ind = np.lexsort((b, a)) # sorts elements first by a and then by b
In [5]: print a[ind]
[1 1 1 2 2 3 5 6 6]
In [7]: print b[ind]
[ 10 20 30 5 10 100 10 30 40]
I want to do a binary search for (2, 7) and (5, 150) expecting (4, 7) as the answer.
In [6]: np.lexsearchsorted((a,b), ([2, 5], [7,150]))
We have searchsorted function but that works only on 1D arrays.

EDIT: Edited to reflect comment.
def comp_leq(t1,t2):
if (t1[0] > t2[0]) or ((t1[0] == t2[0]) and (t1[1] > t2[1])):
return 0
else:
return 1
def bin_search(L,item):
from math import floor
x = L[:]
while len(x) > 1:
index = int(floor(len(x)/2) - 1)
#Check item
if comp_leq(x[index], item):
x = x[index+1:]
else:
x = x[:index+1]
out = L.index(x[0])
#If greater than all
if item >= L[-1]:
return len(L)
else:
return out
def lexsearch(a,b,items):
z = zip(a,b)
return [bin_search(z,item) for item in items]
if __name__ == '__main__':
a = [1,1,1,2,2,3,5,6,6]
b = [10,20,30,5,10,100,10,30,40]
print lexsearch(a,b,([2,7],[5,150])) #prints [4,7]

This code seems to do it for a set of (exactly) 2 lexsorted arrays
You might be able to make it faster if you create a set of values[-1], and than create a dictionary with the boundries for them.
I haven't checked other cases apart from the posted one, so please verify it's not bugged.
def lexsearchsorted_2(arrays, values, side='left'):
assert len(arrays) == 2
assert (np.lexsort(arrays) == range(len(arrays[0]))).all()
# here it will be faster to work on all equal values in 'values[-1]' in one time
boundries_l = np.searchsorted(arrays[-1], values[-1], side='left')
boundries_r = np.searchsorted(arrays[-1], values[-1], side='right')
# a recursive definition here will make it work for more than 2 lexsorted arrays
return tuple([boundries_l[i] +
np.searchsorted(arrays[-2[boundries_l[i]:boundries_r[i]],
values[-2][i],
side=side)
for i in range(len(boundries_l))])
Usage:
import numpy as np
a = np.array([1,1,1,2,2,3,5,6,6])
b = np.array([10,20,30,5,10,100,10,30,40])
lexsearchsorted_2((b, a), ([7,150], [2, 5])) # return (4, 7)

I ran into the same issue and came up with a different solution. You can treat the multi-column data instead as single entries using a structured data type. A structured data type will allow one to use argsort/sort on the data (instead of lexsort, although lexsort appears faster at this stage) and then use the standard searchsorted. Here is an example:
import numpy as np
from itertools import repeat
# Setup our input data
# Every row is an entry, every column what we want to sort by
# Unlike lexsort, this takes columns in decreasing priority, not increasing
a = np.array([1,1,1,2,2,3,5,6,6])
b = np.array([10,20,30,5,10,100,10,30,40])
data = np.transpose([a,b])
# Sort the data
data = data[np.lexsort(data.T[::-1])]
# Convert to a structured data-type
dt = np.dtype(zip(repeat(''), repeat(data.dtype, data.shape[1]))) # the structured dtype
data = np.ascontiguousarray(data).view(dt).squeeze(-1) # the dtype change leaves a trailing 1 dimension, ascontinguousarray is required for the dtype change
# You can also first convert to the structured data-type with the two lines above then use data.sort()/data.argsort()/np.sort(data)
# Search the data
values = np.array([(2,7),(5,150)], dtype=dt) # note: when using structured data types the rows must be a tuple
pos = np.searchsorted(data, values)
# pos is (4,7) in this example, exactly what you would want
This works for any number of columns, uses the built-in numpy functions, the columns remain in the "logical" order (decreasing priority), and it should be quite fast.
A compared the two two numpy-based methods time-wise.
#1 is the recursive method from #j0ker5 (the one below extends his example with his suggestion of recursion and works with any number of lexsorted rows)
#2 is the structured array from me
They both take the same inputs, basically like searchsorted except a and v are as per lexsort.
import numpy as np
def lexsearch1(a, v, side='left', sorter=None):
def _recurse(a, v):
if a.shape[1] == 0: return 0
if a.shape[0] == 1: return a.squeeze(0).searchsorted(v.squeeze(0), side)
bl = np.searchsorted(a[-1,:], v[-1], side='left')
br = np.searchsorted(a[-1,:], v[-1], side='right')
return bl + _recurse(a[:-1,bl:br], v[:-1])
a,v = np.asarray(a), np.asarray(v)
if v.ndim == 1: v = v[:,np.newaxis]
assert a.ndim == 2 and v.ndim == 2 and a.shape[0] == v.shape[0] and a.shape[0] > 1
if sorter is not None: a = a[:,sorter]
bl = np.searchsorted(a[-1,:], v[-1,:], side='left')
br = np.searchsorted(a[-1,:], v[-1,:], side='right')
for i in xrange(len(bl)): bl[i] += _recurse(a[:-1,bl[i]:br[i]], v[:-1,i])
return bl
def lexsearch2(a, v, side='left', sorter=None):
from itertools import repeat
a,v = np.asarray(a), np.asarray(v)
if v.ndim == 1: v = v[:,np.newaxis]
assert a.ndim == 2 and v.ndim == 2 and a.shape[0] == v.shape[0] and a.shape[0] > 1
a_dt = np.dtype(zip(repeat(''), repeat(a.dtype, a.shape[0])))
v_dt = np.dtype(zip(a_dt.names, repeat(v.dtype, a.shape[0])))
a = np.asfortranarray(a[::-1,:]).view(a_dt).squeeze(0)
v = np.asfortranarray(v[::-1,:]).view(v_dt).squeeze(0)
return a.searchsorted(v, side, sorter).ravel()
a = np.random.randint(100, size=(2,10000)) # Values to sort, rows in increasing priority
v = np.random.randint(100, size=(2,10000)) # Values to search for, rows in increasing priority
sorted_idx = np.lexsort(a)
a_sorted = a[:,sorted_idx]
And the timing results (in iPython):
# 2 rows
%timeit lexsearch1(a_sorted, v)
10 loops, best of 3: 33.4 ms per loop
%timeit lexsearch2(a_sorted, v)
100 loops, best of 3: 14 ms per loop
# 10 rows
%timeit lexsearch1(a_sorted, v)
10 loops, best of 3: 103 ms per loop
%timeit lexsearch2(a_sorted, v)
100 loops, best of 3: 14.7 ms per loop
Overall the structured array approach is faster, and can be made even faster if you design it to work with the flipped and transposed versions of a and v. It gets even faster as the numbers of rows/keys goes up, barely slowing down when going from 2 rows to 10 rows.
I did not notice any significant timing difference between using a_sorted or a and sorter=sorted_idx so I left those out for clarity.
I believe that a really fast method could be made using Cython, but this is as fast as it is going to get with pure pure Python and numpy.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Kinect + Python - Fill depth for shadows - python

Related

Finding an integer at a given position i in a sequence of increasing and juxtaposing integers

Spedup distance and summary computation between two HUGE multi-dimensional arrays in python

Find all occurences of a specified match of two numbers in numpy array

Most efficient way to extract parts of one array based on another

Is there lexographical version of searchsorted in numpy?

Categories

Resources