NumPy - calculate histogram intersection - python

The following data represent two histograms, each split into 13 bins:
key      A            B
0        1.274580708  0
1-9      2.466224824  1.700493692
10-18    5.045757621  4.059243006
19-27    7.413716262  5.320899616
28-36    8.958855646  6.747120132
37-45    10.41325305  7.899067471
46-54    11.14150951  9.434997257
55-63    10.91949012  11.24520022
64-72    11.29095648  12.94569391
73-81    10.95054297  12.83598464
82-90    10.10976255  12.6165661
91-99    8.128781795  10.80636314
100      1.886568472  4.388370817
I'm trying to follow this article in order to calculate the intersection between those 2 histograms, using this method:
def histogram_intersection(h1, h2, bins):
    bins = numpy.diff(bins)
    sm = 0
    for i in range(len(bins)):
        sm += min(bins[i]*h1[i], bins[i]*h2[i])
    return sm
Since my data is already binned as a histogram, I can't use NumPy's built-in histogram function, so I'm not sure how to supply this function with the data it needs.
How can I process my data to fit the algorithm?

Since you have the same number of bins for both of the histograms you can use:
def histogram_intersection(h1, h2):
    sm = 0
    for i in range(13):
        sm += min(h1[i], h2[i])
    return sm

You can calculate it faster and more simply with Numpy:
#!/usr/bin/env python3
import numpy as np

A = np.array([1.274580708, 2.466224824, 5.045757621, 7.413716262, 8.958855646, 10.41325305, 11.14150951, 10.91949012, 11.29095648, 10.95054297, 10.10976255, 8.128781795, 1.886568472])
B = np.array([0, 1.700493692, 4.059243006, 5.320899616, 6.747120132, 7.899067471, 9.434997257, 11.24520022, 12.94569391, 12.83598464, 12.6165661, 10.80636314, 4.388370817])

def histogram_intersection(h1, h2):
    sm = 0
    for i in range(13):
        sm += min(h1[i], h2[i])
    return sm

print(histogram_intersection(A, B))
print(np.sum(np.minimum(A, B)))
Output
88.44792356099998
88.447923561
But if you time it, Numpy only takes 60% of the time:
%timeit histogram_intersection(A,B)
5.02 µs ± 65.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.sum(np.minimum(A,B))
3.22 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

A couple of caveats first: in your data the bins are ranges, while in the algorithm they are numbers, so you must define numeric bin values before you can use it.
Furthermore, min(bins[i]*h1[i], bins[i]*h2[i]) equals bins[i]*min(h1[i], h2[i]), so the result can be obtained with:
import numpy as np
import pandas as pd

hists = pd.read_clipboard(index_col=0)  # your data, copied to the clipboard
bins = np.arange(-4, 112, 9)            # one numeric value per bin; note these edges differ from yours
mins = hists.min(axis='rows')           # column-wise minimum of A and B
intersection = np.dot(mins, bins)
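For completeness, here is a minimal sketch (not from the answers above) of the width-weighted formula from the question, assuming one plausible set of numeric bin edges for the 13 ranges; substitute the real edges if they differ:

import numpy as np

A = np.array([1.274580708, 2.466224824, 5.045757621, 7.413716262, 8.958855646, 10.41325305, 11.14150951, 10.91949012, 11.29095648, 10.95054297, 10.10976255, 8.128781795, 1.886568472])
B = np.array([0, 1.700493692, 4.059243006, 5.320899616, 6.747120132, 7.899067471, 9.434997257, 11.24520022, 12.94569391, 12.83598464, 12.6165661, 10.80636314, 4.388370817])

# Hypothetical edges for the bins 0, 1-9, 10-18, ..., 91-99, 100 (14 edges for 13 bins)
edges = np.array([-0.5, 0.5, 9.5, 18.5, 27.5, 36.5, 45.5, 54.5, 63.5, 72.5, 81.5, 90.5, 99.5, 100.5])
widths = np.diff(edges)                           # one width per bin
intersection = np.sum(widths * np.minimum(A, B))  # width-weighted histogram intersection
print(intersection)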

Related

Fastest way to get dot product of a matrix and each point in the given array

I have an N x 3 array of N points, each with X, Y and Z coordinates. I need to rotate each point, and I have the 3 x 3 rotation matrix. To do this, I take the dot product of the rotation matrix and each point. The problem is that the array of points is quite large (~1,000,000 x 3), so it takes a noticeable amount of time to compute the rotated points.
The only solution I have come up with so far is a simple for loop iterating over the array of points. Here is an example with a small piece of data:
import numpy as np

# Given set of points Nx3: np.ndarray
points = np.array([
    [285.679, 219.75, 524.733],
    [285.924, 219.404, 524.812],
    [285.116, 219.217, 524.813],
    [285.839, 219.557, 524.842],
    [285.173, 219.507, 524.606]
])
points_t = points.T

# Array to store results
points_rotated = np.empty((points_t.shape))

# Given rotation matrix 3x3: np.ndarray
rot_matrix = np.array([
    [0.57423549, 0.81862056, -0.01067613],
    [-0.81866133, 0.57405696, -0.01588193],
    [-0.00687256, 0.0178601, 0.99981688]
])

for i in range(points.shape[0]):
    points_rotated[:, i] = np.dot(rot_matrix, points_t[:, i])

# Result
points_rotated.T
# [[ 338.33677093 -116.05910163 526.59831864]
# [ 338.1933725 -116.45955203 526.6694408 ]
# [ 337.5762975 -115.90543822 526.67265381]
# [ 338.26949115 -116.30261156 526.70275207]
# [ 337.84863885 -115.78233784 526.47047941]]
I don't feel confident using NumPy as I'm quite new to it, so I'm sure there is at least a more elegant way to do this. I've heard about np.einsum() but I don't yet understand how to apply it here, and I'm not sure it would be quicker. The main problem is still the computation time, so I want to know how to make this run faster.
Thank you very much in advance!
You can write the code using Numba parallelization in no-Python mode:
import numpy as np
import numba as nb

@nb.njit("float64[:, ::1](float64[:, ::1], float64[:, ::1])", parallel=True)
def dot(a, b):  # call as dot(points, rot_matrix)
    dot_ = np.zeros((a.shape[0], b.shape[0]))
    for i in nb.prange(a.shape[0]):
        for j in range(b.shape[0]):
            dot_[i, j] = a[i, 0] * b[j, 0] + a[i, 1] * b[j, 1] + a[i, 2] * b[j, 2]
    return dot_
It should be faster than ko3's answer due to the parallelization, the explicit signature, and writing out the algebra instead of calling np.dot. I have tested similar code (applied just to the upper triangle of an array) in place of a dot product in another answer, and as far as I can remember it was at least 2x faster.
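For reference, a minimal usage sketch (not part of the original answer), reusing the dot function and imports above; since dot(a, b) computes a @ b.T, passing (points, rot_matrix) yields the rotated points:

points = np.random.rand(1_000_000, 3)
rot_matrix = np.array([[0.57423549, 0.81862056, -0.01067613],
                       [-0.81866133, 0.57405696, -0.01588193],
                       [-0.00687256, 0.0178601, 0.99981688]])

# Each output row i is rot_matrix @ points[i]
points_rotated = dot(points, rot_matrix)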
I have tried some code to compare the performance of each solution (on my system) on a 1,000,000 x 3 matrix:
Using np.matmul
Using numba njit with the matrix-multiplication operator (@)
Using numba njit with your code
Here are the three functions and their results.
import numpy as np
from numba import njit

points = np.random.rand(1000000, 3)
rot_matrix = np.array([
    [0.57423549, 0.81862056, -0.01067613],
    [-0.81866133, 0.57405696, -0.01588193],
    [-0.00687256, 0.0178601, 0.99981688]
])

# Using np.matmul
def function_matmul(points, rot_matrix):
    return np.matmul(rot_matrix, points.T).T

# Using numba njit with the matrix-multiplication operator (@)
@njit
def function_numba_dot(points, rot_matrix):
    return (rot_matrix @ points.T).T

# Using numba njit with your code
@njit
def function_ori_code(points, rot_matrix):
    points_t = points.T
    points_rotated = np.empty((points_t.shape))
    for i in range(points.shape[0]):
        points_rotated[:, i] = np.dot(rot_matrix, points_t[:, i])
    return points_rotated.T
%timeit function_matmul(points, rot_matrix)
16.5 ms ± 665 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit function_numba_dot(points, rot_matrix)
16.3 ms ± 69.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit function_ori_code(points, rot_matrix)
260 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
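Since the question also mentions np.einsum, here is a short sketch (not taken from the answers above) showing that the whole rotation is a single matrix product, written three equivalent ways:

import numpy as np

points = np.random.rand(1_000_000, 3)
rot_matrix = np.array([[0.57423549, 0.81862056, -0.01067613],
                       [-0.81866133, 0.57405696, -0.01588193],
                       [-0.00687256, 0.0178601, 0.99981688]])

rotated_a = points @ rot_matrix.T                       # one matrix product for all points
rotated_b = np.einsum('ij,nj->ni', rot_matrix, points)  # the same contraction spelled out with einsum
rotated_c = np.matmul(rot_matrix, points.T).T           # the form used in the answer above

assert np.allclose(rotated_a, rotated_b) and np.allclose(rotated_a, rotated_c)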

Fastest way for 2D rolling window quantile?

I want to calculate a rolling quantile of a large 2D matrix with dimensions (1e6, 1e5), column-wise. I am looking for the fastest way, since I need to perform this operation thousands of times and it is very computationally expensive. For the experiments, window=1000 and q=0.1 are used.
import numpy as np
import pandas as pd
import multiprocessing as mp
from functools import partial
import numba as nb
X = np.random.random((10000,1000)) # Original array has dimensions of about (1e6, 1e5)
My current approaches:
Pandas: %timeit: 5.8 s ± 15.5 ms per loop
def pd_rolling_quantile(X, window, q):
    return pd.DataFrame(X).rolling(window).quantile(quantile=q)
Numpy Strided: %timeit: 2min 42s ± 3.29 s per loop
def strided_app(a, L, S):
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S*n, n))

def np_1d(x, window, q):
    return np.pad(np.percentile(strided_app(x, window, 1), q*100, axis=-1), (window-1, 0), mode='constant')

def np_rolling_quantile(X, window, q):
    results = []
    for i in np.arange(X.shape[1]):
        results.append(np_1d(X[:, i], window, q))
    return np.column_stack(results)
Multiprocessing: %timeit: 1.13 s ± 27.6 ms per loop
def mp_rolling_quantile(X, window, q):
    pool = mp.Pool(processes=12)
    results = pool.map(partial(pd_rolling_quantile, window=window, q=q), [X[:, i] for i in np.arange(X.shape[1])])
    pool.close()
    pool.join()
    return np.column_stack(results)
Numba: %timeit: 2min 28s ± 182 ms per loop
@nb.njit
def nb_1d(x, window, q):
    out = np.zeros(x.shape[0])
    for i in np.arange(x.shape[0] - window + 1) + window:
        out[i-1] = np.quantile(x[i-window:i], q=q)
    return out

def nb_rolling_quantile(X, window, q):
    results = []
    for i in np.arange(X.shape[1]):
        results.append(nb_1d(X[:, i], window, q))
    return np.column_stack(results)
The timings are not great; ideally I am targeting a 10-50x speedup. I would appreciate any suggestions on how to speed it up, perhaps with lower-level languages (Cython) or other Numpy/Numba/Tensorflow based methods. Thanks!
I would recommend the new rolling-quantiles package.
To demonstrate, even the somewhat naive approach of constructing a separate filter for each column outperforms the above single-threaded pandas experiment:
import rolling_quantiles as rq

pipes = [rq.Pipeline(rq.LowPass(window=1000, quantile=0.1)) for i in range(1000)]
%timeit [pipe.feed(X[:, i]) for i, pipe in enumerate(pipes)]
1.34 s ± 7.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
versus
df = pd.DataFrame(X)
%timeit df.rolling(1000).quantile(0.1)
5.63 s ± 27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Both can be trivially parallelized by means of multiprocessing, as you showed.
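A rough sketch of that combination (not from the original answer): each worker constructs its own filter and processes one column, so only plain NumPy arrays cross process boundaries. It assumes the package is importable as rolling_quantiles and that Pipeline.feed returns the filtered column as an array:

import numpy as np
import multiprocessing as mp
import rolling_quantiles as rq  # assumed import name for the rolling-quantiles package

def one_column(col, window=1000, q=0.1):
    # Build the filter inside the worker so nothing non-picklable is passed between processes
    pipe = rq.Pipeline(rq.LowPass(window=window, quantile=q))
    return pipe.feed(col)

def rq_rolling_quantile(X, processes=12):
    with mp.Pool(processes) as pool:
        cols = pool.map(one_column, [X[:, i] for i in range(X.shape[1])])
    return np.column_stack(cols)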

Python speed up function to allocate time dependent elements

I have the following problem: the code below is relatively slow, and I bet there is a faster way to do this work.
I have two pandas time series. dataY is at daily frequency, dataX is at monthly frequency, and they overlap in time. I want to create a matrix newX which has len(dataY) rows and lags columns, where lags is the number of monthly observations from dataX back in time. Thus, for each day in dataY, I want a row in newX containing the past lags observations up to that day.
Below are my function and an example that runs it.
import pandas as pd
import numpy as np

def get_X(dataY, dataX, freq, lags):
    # provide dataY and dataX with a datetime index
    # monthly data
    if freq == 'm':
        firstday = dataY.index[0]
        firstmonth = dataX.index[0]
        # which Series starts first
        if firstday < firstmonth:
            startingday = firstmonth + pd.DateOffset(months=lags)
        else:
            startingday = firstday + pd.DateOffset(months=lags)
        # initialize adjusted Series
        newY = dataY[startingday:]
        newX = np.zeros((len(newY), lags))
        for i in range(len(newY)):
            iday = newY.index[i]
            # allocate the lagged values of dataX per timestamp in dataY
            for l in range(0, lags):
                newX[i, l] = np.array(dataX[(iday + pd.DateOffset(months=-l)).strftime('%Y-%m')])[0]
        newX = pd.DataFrame(newX, index=newY.index)
    return [newX, newY]

nobs = 500
dataY = pd.DataFrame(range(0, nobs), index=pd.date_range(start='1/1/2018', periods=nobs))
indexX = pd.date_range(start='1/1/2016', end=dataY.index[len(dataY)-1], freq='M') + pd.DateOffset(days=1)
dataX = pd.DataFrame(range(0, len(indexX)), index=indexX)
%timeit -r5 get_X(dataY, dataX, freq='m', lags=12)
This example leads to the following outcome
1.42 s ± 113 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)
The actual setup, however, uses nobs = 5000 and lags = 24, which is even slower:
1min 30s ± 2.25 s per loop (mean ± std. dev. of 5 runs, 1 loop each)
I highly appreciate any help.
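No answer is reproduced here, but as a rough sketch of one possible vectorized approach (assuming, as in the example, that dataX has exactly one observation per month): index dataX by monthly periods and do one reindex per lag instead of one lookup per day.

def get_X_fast(dataY, dataX, lags):
    # Re-index the monthly series by period so each lag becomes a single vectorized lookup
    x = dataX.iloc[:, 0].copy()
    x.index = x.index.to_period('M')
    start = max(dataY.index[0], dataX.index[0]) + pd.DateOffset(months=lags)
    newY = dataY[start:]
    periods = newY.index.to_period('M')
    # One reindex per lag fills a whole column of newX at once
    cols = {l: x.reindex(periods - l).to_numpy() for l in range(lags)}
    newX = pd.DataFrame(cols, index=newY.index)
    return [newX, newY]

newX, newY = get_X_fast(dataY, dataX, lags=12)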

Format RGB to Hex using Numpy Array Operations

My goal is to convert a list of pixels from RGB to hex as quickly as possible. The format is a NumPy multidimensional array (RGB color space), and ideally it would be converted from RGB to hex while maintaining its shape.
My attempt uses a list comprehension and, with the exception of performance, it solves the problem. Performance-wise, adding the ravel and the list comprehension really slows this down. Unfortunately I just don't know enough math to work out how to speed this up:
Edited: updated the code to reflect the most recent changes. Currently running at around 24 ms on a 35,000-pixel image.
def np_array_to_hex(array):
    array = np.asarray(array, dtype='uint32')
    array = (1 << 24) + ((array[:, :, 0] << 16) + (array[:, :, 1] << 8) + array[:, :, 2])
    return [hex(x)[-6:] for x in array.ravel()]
>>> np_array_to_hex(img)
['afb3bc', 'abaeb5', 'b3b4b9', ..., '8b9dab', '92a4b2', '9caebc']
I tried it with a LUT ("Look Up Table") - it takes a few seconds to initialise and it uses 100MB (0.1GB) of RAM, but that's a small price to pay amortised over a million images:
#!/usr/bin/env python3
import numpy as np

def np_array_to_hex1(array):
    array = np.asarray(array, dtype='uint32')
    array = ((array[:, :, 0] << 16) + (array[:, :, 1] << 8) + array[:, :, 2])
    return array

def np_array_to_hex2(array):
    array = np.asarray(array, dtype='uint32')
    array = (1 << 24) + ((array[:, :, 0] << 16) + (array[:, :, 1] << 8) + array[:, :, 2])
    return [hex(x)[-6:] for x in array.ravel()]

def me(array, LUT):
    h, w, d = array.shape
    # Reshape to a color vector
    z = np.reshape(array, (-1, 3))
    # Make array and fill with 32-bit colour number
    y = np.zeros((h*w), dtype=np.uint32)
    y = z[:, 0]*65536 + z[:, 1]*256 + z[:, 2]
    return LUT[y]

# Define dummy image of 35,000 RGB pixels
w, h = 175, 200
im = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)

# %timeit np_array_to_hex1(im)
# 112 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# %timeit np_array_to_hex2(im)
# 8.42 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# This may take time to set up, but amortize that over a million images...
LUT = np.zeros((256*256*256), dtype='a6')
for i in range(256*256*256):
    h = hex(i)[2:].zfill(6)
    LUT[i] = h

# %timeit me(im, LUT)
# 499 µs ± 8.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So that appears to be roughly 4x slower than your fastest version (which doesn't work), and 17x faster than your slowest version (which does work).
My next suggestion is to use multi-threading or multi-processing so all your CPU cores get busy in parallel and reduce your overall time by a factor of 4 or more assuming you have a reasonably modern 4+ core CPU.
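As a rough sketch of that suggestion (not part of the original answer): with the fork start method (the default on Linux), the big LUT built above is shared copy-on-write, so each worker can reuse it without duplicating it; images below is a hypothetical list of RGB arrays standing in for your real batch.

from multiprocessing import Pool

def convert(im):
    # Reuses the module-level LUT and the me() function defined above
    return me(im, LUT)

if __name__ == '__main__':
    images = [np.random.randint(0, 256, (200, 175, 3), dtype=np.uint8) for _ in range(16)]
    with Pool() as pool:
        hex_images = pool.map(convert, images)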

Calculate histograms along axis

Is there a way to calculate many histograms along an axis of an nD-array? The method I currently have uses a for loop to iterate over all other axes and calculate a numpy.histogram() for each resulting 1D array:
import numpy
import itertools

data = numpy.random.rand(4, 5, 6)

# axis=-1, place `200001` and `[slice(None)]` on any other position to process along other axes
out = numpy.zeros((4, 5, 200001), dtype="int64")
indices = [
    numpy.arange(4), numpy.arange(5), [slice(None)]
]

# Iterate over all axes, calculate histogram for each cell
for idx in itertools.product(*indices):
    out[idx] = numpy.histogram(
        data[idx],
        bins=2 * 100000 + 1,
        range=(-100000 - 0.5, 100000 + 0.5),
    )[0]

out.shape  # (4, 5, 200001)
Needless to say, this is very slow; however, I couldn't find a way to solve this using numpy.histogram, numpy.histogram2d or numpy.histogramdd.
Here's a vectorized approach making use of the efficient tools np.searchsorted and np.bincount: searchsorted gives us the locations where each element is to be placed based on the bins, and bincount does the counting for us.
Implementation -
def hist_laxis(data, n_bins, range_limits):
    # Setup bins and determine the bin location for each element for the bins
    R = range_limits
    N = data.shape[-1]
    bins = np.linspace(R[0], R[1], n_bins+1)
    data2D = data.reshape(-1, N)
    idx = np.searchsorted(bins, data2D, 'right') - 1

    # Some elements would be off limits, so get a mask for those
    bad_mask = (idx == -1) | (idx == n_bins)

    # We need to use bincount to get bin based counts. To have unique IDs for
    # each row and not get confused by the ones from other rows, we need to
    # offset each row by a scale (using row length for this).
    scaled_idx = n_bins * np.arange(data2D.shape[0])[:, None] + idx

    # Set the bad ones to be last possible index+1 : n_bins*data2D.shape[0]
    limit = n_bins * data2D.shape[0]
    scaled_idx[bad_mask] = limit

    # Get the counts and reshape to multi-dim
    counts = np.bincount(scaled_idx.ravel(), minlength=limit+1)[:-1]
    counts.shape = data.shape[:-1] + (n_bins,)
    return counts
Runtime test
Original approach -
def org_app(data, n_bins, range_limits):
    R = range_limits
    m, n = data.shape[:2]
    out = np.zeros((m, n, n_bins), dtype="int64")
    indices = [
        np.arange(m), np.arange(n), [slice(None)]
    ]

    # Iterate over all axes, calculate histogram for each cell
    for idx in itertools.product(*indices):
        out[idx] = np.histogram(
            data[idx],
            bins=n_bins,
            range=(R[0], R[1]),
        )[0]
    return out
Timings and verification -
In [2]: data = np.random.randn(4, 5, 6)
...: out1 = org_app(data, n_bins=200001, range_limits=(- 2.5, 2.5))
...: out2 = hist_laxis(data, n_bins=200001, range_limits=(- 2.5, 2.5))
...: print np.allclose(out1, out2)
...:
True
In [3]: %timeit org_app(data, n_bins=200001, range_limits=(- 2.5, 2.5))
10 loops, best of 3: 39.3 ms per loop
In [4]: %timeit hist_laxis(data, n_bins=200001, range_limits=(- 2.5, 2.5))
100 loops, best of 3: 3.17 ms per loop
In the loopy solution we are looping over the first two axes, so let's increase their lengths, as that shows how well the vectorized one scales -
In [59]: data = np.random.randn(400, 500, 6)
In [60]: %timeit org_app(data, n_bins=21, range_limits=(- 2.5, 2.5))
1 loops, best of 3: 9.59 s per loop
In [61]: %timeit hist_laxis(data, n_bins=21, range_limits=(- 2.5, 2.5))
10 loops, best of 3: 44.2 ms per loop
In [62]: 9590/44.2 # Speedup number
Out[62]: 216.9683257918552
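A small add-on sketch (not from the answer): hist_laxis works along the last axis, so histograms along any other axis can be obtained by moving that axis to the end first.

def hist_along_axis(data, n_bins, range_limits, axis=-1):
    # Move the requested axis to the end, then reuse hist_laxis as-is
    moved = np.moveaxis(data, axis, -1)
    return hist_laxis(moved, n_bins, range_limits)

# e.g. histograms along axis 1 of a (4, 5, 6) array -> result shape (4, 6, n_bins)
out = hist_along_axis(np.random.randn(4, 5, 6), n_bins=21, range_limits=(-2.5, 2.5), axis=1)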
The first solution provides a nice short idiom using np.searchsorted, which has the cost of a sort and many searches. But numpy has a fast route in its source code (written in Python, in fact) which can compute the bin index arithmetically when the bin edges are equally spaced; it uses only a vectorized subtraction, a multiplication and some comparisons instead.
This solution follows the numpy histogram source for the index computation and type resolution, and handles weights as well as complex numbers. It is basically the first solution combined with numpy's histogram fast route, plus some extra type and iteration details.
_range = range

def hist_np_laxis(a, bins=10, range=None, weights=None):
    # Initialize empty histogram
    N = a.shape[-1]
    data2D = a.reshape(-1, N)
    limit = bins * data2D.shape[0]

    # gh-10322 means that type resolution rules are dependent on array
    # shapes. To avoid this causing problems, we pick a type now and stick
    # with it throughout.
    bin_type = np.result_type(range[0], range[1], a)
    if np.issubdtype(bin_type, np.integer):
        bin_type = np.result_type(bin_type, float)
    bin_edges = np.linspace(range[0], range[1], bins+1, endpoint=True, dtype=bin_type)

    # Histogram is an integer or a float array depending on the weights.
    if weights is None:
        ntype = np.dtype(np.intp)
    else:
        ntype = weights.dtype
    n = np.zeros(limit, ntype)

    # Pre-compute histogram scaling factor
    norm = bins / (range[1] - range[0])

    # We set a block size, as this allows us to iterate over chunks when
    # computing histograms, to minimize memory usage.
    BLOCK = 65536

    # We iterate over blocks here for two reasons: the first is that for
    # large arrays, it is actually faster (for example for a 10^8 array it
    # is 2x as fast) and it results in a memory footprint 3x lower in the
    # limit of large arrays.
    for i in _range(0, data2D.shape[0], BLOCK):
        tmp_a = data2D[i:i+BLOCK]
        block_size = tmp_a.shape[0]
        if weights is None:
            tmp_w = None
        else:
            tmp_w = weights[i:i + BLOCK]

        # Only include values in the right range
        keep = (tmp_a >= range[0])
        keep &= (tmp_a <= range[1])
        if not np.logical_and.reduce(np.logical_and.reduce(keep)):
            tmp_a = tmp_a[keep]
            if tmp_w is not None:
                tmp_w = tmp_w[keep]

        # This cast ensures no type promotions occur below, which gh-10322
        # makes unpredictable. Getting it wrong leads to precision errors
        # like gh-8123.
        tmp_a = tmp_a.astype(bin_edges.dtype, copy=False)

        # Compute the bin indices, and for values that lie exactly on
        # last_edge we need to subtract one
        f_indices = (tmp_a - range[0]) * norm
        indices = f_indices.astype(np.intp)
        indices[indices == bins] -= 1

        # The index computation is not guaranteed to give exactly
        # consistent results within ~1 ULP of the bin edges.
        decrement = tmp_a < bin_edges[indices]
        indices[decrement] -= 1
        # The last bin includes the right edge. The other bins do not.
        increment = ((tmp_a >= bin_edges[indices + 1])
                     & (indices != bins - 1))
        indices[increment] += 1

        # Offset each row's bin indices so every row gets its own block of
        # `bins` slots in the flat counts array
        indices = ((bins*np.arange(i, i+block_size)[:, None] * keep)[keep].reshape(indices.shape) + indices).reshape(-1)

        # We now compute the histogram using bincount
        if ntype.kind == 'c':
            n.real += np.bincount(indices, weights=tmp_w.real,
                                  minlength=limit)
            n.imag += np.bincount(indices, weights=tmp_w.imag,
                                  minlength=limit)
        else:
            n += np.bincount(indices, weights=tmp_w,
                             minlength=limit).astype(ntype)

    n.shape = a.shape[:-1] + (bins,)
    return n
data = np.random.randn(4, 5, 6)
out1 = hist_laxis(data, n_bins=200001, range_limits=(- 2.5, 2.5))
out2 = hist_np_laxis(data, bins=200001, range=(- 2.5, 2.5))
print(np.allclose(out1, out2))
True
%timeit hist_np_laxis(data, bins=21, range=(- 2.5, 2.5))
92.1 µs ± 504 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit hist_laxis(data, n_bins=21, range_limits=(- 2.5, 2.5))
55.1 µs ± 3.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Although the first solution is faster in the small example and even the larger example:
data = np.random.randn(400, 500, 6)
%timeit hist_np_laxis(data, bins=21, range=(- 2.5, 2.5))
264 ms ± 2.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit hist_laxis(data, n_bins=21, range_limits=(- 2.5, 2.5))
71.6 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
It is not ALWAYS faster:
data = np.random.randn(400, 6, 500)
%timeit hist_np_laxis(data, bins=101, range=(- 2.5, 2.5))
71.5 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit hist_laxis(data, n_bins=101, range_limits=(- 2.5, 2.5))
76.9 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
However, the numpy variation is only faster when the last axis is large, and even then only by a slight margin. In all other cases I tried, the first solution is much faster regardless of the bin count and the size of the first two dimensions. The only important line, indices = ((bins*np.arange(i, i+block_size)[:, None] * keep)[keep].reshape(indices.shape) + indices).reshape(-1), might be optimizable further, though I have not found a faster method yet.
This also suggests that the sheer number of O(n) vectorized operations is outdoing the O(n log n) of the sort and the repeated incremental searches.
However, realistic use cases will have a last axis with a lot of data and the prior axes with few, so the samples used to benchmark the first solution are somewhat contrived relative to the performance profile one actually wants.
Axis addition for histogram is noted as an issue in the numpy repo: https://github.com/numpy/numpy/issues/13166.
An xhistogram library has sought to solve this problem as well: https://xhistogram.readthedocs.io/en/latest/
