Given x, I want to produce a numpy array containing x and log(x): if x has shape s, the result should have shape (*s, 2). What's the neatest way to do this? x may just be a float, in which case I want a result with shape (2,).
An ugly way to do this is:
import numpy as np
x = np.asarray(x)
result = np.empty((*x.shape, 2))
result[..., 0] = x
result[..., 1] = np.log(x)
It's important to separate aesthetics from performance. Sometimes ugly code is
fast. In fact, that's the case here. Although creating an empty array and then
assigning values to slices may not look beautiful, it is fast.
import numpy as np
import timeit
import itertools as IT
import pandas as pd
def using_empty(x):
    x = np.asarray(x)
    result = np.empty(x.shape + (2,))
    result[..., 0] = x
    result[..., 1] = np.log(x)
    return result

def using_concat(x):
    x = np.asarray(x)
    return np.concatenate([x, np.log(x)], axis=-1).reshape(x.shape+(2,), order='F')

def using_stack(x):
    x = np.asarray(x)
    return np.stack([x, np.log(x)], axis=x.ndim)

def using_ufunc(x):
    return np.array([x, np.log(x)])
using_ufunc = np.vectorize(using_ufunc, otypes=[np.ndarray])
tests = [np.arange(600),
         np.arange(600).reshape(20,30),
         np.arange(960).reshape(8,15,8)]
# check that all implementations return the same result
for x in tests:
    assert np.allclose(using_empty(x), using_concat(x))
    assert np.allclose(using_empty(x), using_stack(x))
timing = []
funcs = ['using_empty', 'using_concat', 'using_stack', 'using_ufunc']
for test, func in IT.product(tests, funcs):
    timing.append(timeit.timeit(
        '{}(test)'.format(func),
        setup='from __main__ import test, {}'.format(func), number=1000))
timing = pd.DataFrame(np.array(timing).reshape(-1, len(funcs)), columns=funcs)
print(timing)
yields the following timeit results on my machine:
   using_empty  using_concat  using_stack  using_ufunc
0     0.024754      0.025182     0.030244     2.414580
1     0.025766      0.027692     0.031970     2.408344
2     0.037502      0.039644     0.044032     3.907487
So using_empty is the fastest (of the options tested, applied to these test arrays).
Note that np.stack does exactly what you want, so
np.stack([x, np.log(x)], axis=x.ndim)
looks reasonably pretty, but it is also the slowest of the three options tested.
Note that along with being much slower, using_ufunc returns an array of object dtype:
In [236]: x = np.arange(6)
In [237]: using_ufunc(x)
Out[237]:
array([array([ 0., -inf]), array([ 1.,  0.]),
       array([ 2.        ,  0.69314718]),
       array([ 3.        ,  1.09861229]),
       array([ 4.        ,  1.38629436]),
       array([ 5.        ,  1.60943791])], dtype=object)
which is not the same as the desired result:
In [240]: using_empty(x)
Out[240]:
array([[ 0.        ,        -inf],
       [ 1.        ,  0.        ],
       [ 2.        ,  0.69314718],
       [ 3.        ,  1.09861229],
       [ 4.        ,  1.38629436],
       [ 5.        ,  1.60943791]])
In [238]: using_ufunc(x).shape
Out[238]: (6,)
In [239]: using_empty(x).shape
Out[239]: (6, 2)
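As a quick check of the float case from the question (a sketch reusing the using_empty approach): np.asarray turns a plain float into a 0-d array, so x.shape + (2,) is just (2,).

import numpy as np

x = np.asarray(2.0)                # 0-d array, shape ()
result = np.empty(x.shape + (2,))  # shape (2,)
result[..., 0] = x
result[..., 1] = np.log(x)
print(result)        # [2.         0.69314718]
print(result.shape)  # (2,)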
Related
I have a numpy array of shape (m, 2) and I want to transform it to shape (m, 1) using the function below.
def func(x):
    if x == [1., 1.]:
        return 0.
    if x == [-1., 1.] or x == [-1., -1.]:
        return 1.
    if x == [1., -1.]:
        return 2.
I want this applied to each (2,) vector inside the (m, 2) array, resulting in an (m, 1) array. I tried to use numpy.vectorize, but it seems the function gets applied to each element of the array (which makes sense in the general-purpose case), so I have failed to apply it.
My intention is to avoid a for loop. Can anyone help me with this? Thanks.
import numpy as np
def f(a, b):
    return a + b
F = np.vectorize(f)
x = np.asarray([[1, 2], [3, 4], [5, 6]]).T
print(F(*x))
Output:
[ 3  7 11]
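The above demonstrates the unpacking trick in isolation. Applied back to the original mapping, here is a boolean-mask sketch (assuming every row is exactly one of the four listed patterns) that avoids both a for loop and np.vectorize:

import numpy as np

def func_rows(arr):
    # arr has shape (m, 2); classify each row by its sign pattern
    out = np.empty(arr.shape[0])
    out[(arr[:, 0] == 1.) & (arr[:, 1] == 1.)] = 0.
    out[arr[:, 0] == -1.] = 1.   # covers both [-1, 1] and [-1, -1]
    out[(arr[:, 0] == 1.) & (arr[:, 1] == -1.)] = 2.
    return out.reshape(-1, 1)

print(func_rows(np.array([[1., 1.], [-1., 1.], [-1., -1.], [1., -1.]])))
# [[0.]
#  [1.]
#  [1.]
#  [2.]]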
I have a list of indices
a = [
    [1,2,4],
    [0,2,3],
    [1,3,4],
    [0,2]]
What's the fastest way to convert this to a numpy array of ones, where each index shows the position where 1 would occur?
I.e. what I want is:
output = array([
    [0,1,1,0,1],
    [1,0,1,1,0],
    [0,1,0,1,1],
    [1,0,1,0,0]])
I know the max size of the array beforehand. I know I could loop through each list and insert a 1 at each index position, but is there a faster/vectorized way to do this?
My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.
How about this:
ncol = 5
nrow = len(a)
out = np.zeros((nrow, ncol), int)
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out
# array([[0, 1, 1, 0, 1],
#        [1, 0, 1, 1, 0],
#        [0, 1, 0, 1, 1],
#        [1, 0, 1, 0, 0]])
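To see what the fancy indexing is doing, here is a small sketch printing the intermediate row and column index arrays for the example above:

import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
rows = np.arange(len(a)).repeat([*map(len, a)])  # one row index per stored entry
cols = np.concatenate(a)                         # the flattened column indices
print(rows)  # [0 0 0 1 1 1 2 2 2 3 3]
print(cols)  # [1 2 4 0 2 3 1 3 4 0 2]

Assigning out[rows, cols] = 1 then sets exactly those (row, col) pairs.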
Here are timings for a 1000x1000 binary array, note that I use an optimized version of the above, see function pp below:
pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms
Code to produce timings:
import itertools as it
import numpy as np
def make_data(n,m):
    I,J = np.where(np.random.random((n,m))<np.random.random((n,1)))
    return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1,n))))]

def pp():
    sz = np.fromiter(map(len,a),int,nrow)
    out = np.zeros((nrow,ncol),int)
    out[np.arange(nrow).repeat(sz),np.fromiter(it.chain.from_iterable(a),int,sz.sum())] = 1
    return out

def ts():
    out = np.zeros((nrow,ncol),int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

def u9():
    out = np.zeros((nrow,ncol),int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out
nrow,ncol = 1000,1000
a = make_data(nrow,ncol)
from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()
print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")
This might not be the fastest way; you will need to compare the execution times of these answers using large arrays in order to find out. Here's my solution:
output = np.zeros((4,5), int)
for i, ix in enumerate(a):
    output[i][ix] = 1
# output ->
# array([[0, 1, 1, 0, 1],
#        [1, 0, 1, 1, 0],
#        [0, 1, 0, 1, 1],
#        [1, 0, 1, 0, 0]])
If you can and want to use Cython, you can create a readable (at least if you don't mind the typing) and fast solution.
Here I'm using the IPython bindings of Cython to compile it in a Jupyter notebook:
%load_ext cython
%%cython
cimport cython
cimport numpy as cnp
import numpy as np
@cython.boundscheck(False)  # remove this if you cannot guarantee that nrow/ncol are correct
@cython.wraparound(False)
cpdef cnp.int_t[:, :] mseifert(list a, int nrow, int ncol):
    cdef cnp.int_t[:, :] out = np.zeros([nrow, ncol], dtype=int)
    cdef list subl
    cdef int row_idx
    cdef int col_idx
    for row_idx, subl in enumerate(a):
        for col_idx in subl:
            out[row_idx, col_idx] = 1
    return out
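Hypothetical usage in the same notebook (the function returns a typed memoryview, so wrap it in np.asarray to view the result as an ordinary array):

import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
np.asarray(mseifert(a, 4, 5))
# array([[0, 1, 1, 0, 1],
#        [1, 0, 1, 1, 0],
#        [0, 1, 0, 1, 1],
#        [1, 0, 1, 0, 0]])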
To compare the performance of the solutions presented here I use my library simple_benchmark:
(The benchmark plot is not reproduced here; it uses logarithmic axes to simultaneously show the differences for small and large arrays.) According to my benchmark, my function is actually the fastest of the solutions, though it's also worth pointing out that none of the solutions are too far off.
Here is the complete code I used for the benchmark:
import numpy as np
from simple_benchmark import BenchmarkBuilder, MultiArgument
import itertools
b = BenchmarkBuilder()
@b.add_function()
def pp(a, nrow, ncol):
    sz = np.fromiter(map(len, a), int, nrow)
    out = np.zeros((nrow, ncol), int)
    out[np.arange(nrow).repeat(sz), np.fromiter(itertools.chain.from_iterable(a), int, sz.sum())] = 1
    return out

@b.add_function()
def ts(a, nrow, ncol):
    out = np.zeros((nrow, ncol), int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

@b.add_function()
def u9(a, nrow, ncol):
    out = np.zeros((nrow, ncol), int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out

b.add_functions([mseifert])

@b.add_arguments("number of rows/columns")
def argument_provider():
    for n in range(2, 13):
        ncols = 2**n
        a = [
            sorted(set(np.random.randint(0, ncols, size=np.random.randint(0, ncols))))
            for _ in range(ncols)
        ]
        yield ncols, MultiArgument([a, ncols, ncols])
r = b.run()
r.plot()
This may not be the best way, but it's the only way I can think of:
output = np.zeros((4,5))
for i, (x, y) in enumerate(zip(a, output)):
    y[x] = 1
    output[i] = y
print(output)
Which outputs:
[[ 0.  1.  1.  0.  1.]
 [ 1.  0.  1.  1.  0.]
 [ 0.  1.  0.  1.  1.]
 [ 1.  0.  1.  0.  0.]]
How about using array indexing? If you knew more about your input, you could get rid of the penalty for having to convert to a linear array first.
import numpy as np
def main():
    row_count = 4
    col_count = 5
    a = [[1,2,4],[0,2,3],[1,3,4],[0,2]]

    # iterate through each row, concatenate all indices and convert them to linear;
    # numpy append performs a copy even if you don't want it, list append is faster
    b = []
    for row_idx, row in enumerate(a):
        b.append(np.array(row, dtype=np.int64) + (row_idx * col_count))
    linear_idxs = np.hstack(b)
    # could skip previous steps if given index inputs well beforehand, or in linear index order
    c = np.zeros(row_count * col_count)
    c[linear_idxs] = 1
    c = c.reshape(row_count, col_count)
    print(c)

if __name__ == "__main__":
    main()
# output:
# [[0. 1. 1. 0. 1.]
#  [1. 0. 1. 1. 0.]
#  [0. 1. 0. 1. 1.]
#  [1. 0. 1. 0. 0.]]
Depending on your use case, you might look into using sparse matrices. The input matrix looks suspiciously like a Compressed Sparse Row (CSR) matrix. Perhaps something like
import numpy as np
from scipy.sparse import csr_matrix
from itertools import accumulate
def ragged2csr(inds):
    lens = [len(x) for x in inds]
    # indptr must start at 0 and contain the cumulative row lengths (length nrow + 1)
    indptr = np.array([0] + list(accumulate(lens)))
    indices = np.array([val for sublist in inds for val in sublist])
    data = np.ones(indices.size)
    return csr_matrix((data, indices, indptr))
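For example, a sketch using the a from the question (note that csr_matrix infers the shape from the largest index, so pass shape=(nrow, ncol) explicitly if trailing columns may be empty):

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
m = ragged2csr(a)
print(m.toarray().astype(int))
# [[0 1 1 0 1]
#  [1 0 1 1 0]
#  [0 1 0 1 1]
#  [1 0 1 0 0]]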
Again, if it fits in your use case, a sparse matrix would allow elementwise/masking operations to scale with the number of nonzeros, rather than the number of elements (rows*columns), which could bring significant speedup (for a sparse enough matrix).
Another good introduction to CSR matrices is section 3.4 of Iterative Methods. In this case, data is aa, indices is ja and indptr is ia. This format also has the benefit of being very popular among different packages/libraries.
I'm trying to fill a 2D array with complex(x, y), where x and y are from two arrays:
xstep = np.linspace(xmin, xmax, Nx)
ystep = np.linspace(ymin, ymax, Ny)
However I can't figure out how to "spread" these values out on a 2D array.
So far my attempts are not really working out. I was hoping for something along the lines of:
result = np.array(xstep + (1j * ystep))
Maybe something from fromfunction, meshgrid or full, but I can't quite make it work.
As an example, say I do this:
xstep = np.linspace(0, 1, 2) # array([0., 1.])
ystep = np.linspace(0, 1, 3) # array([0. , 0.5, 1. ])
I'm trying to construct an answer:
array([
    [0+0j, 0+0.5j, 0+1j],
    [1+0j, 1+0.5j, 1+1j]
])
Note that I am not married to linspace, so any quicker method would also do; it is just my natural starting point for creating this array, being new to NumPy.
In [4]: xstep = np.linspace(0, 1, 2)
In [5]: ystep = np.linspace(0, 1, 3)
In [6]: xstep[:, None] + 1j*ystep
Out[6]:
array([[0.+0.j , 0.+0.5j, 0.+1.j ],
       [1.+0.j , 1.+0.5j, 1.+1.j ]])
xstep[:, None] is equivalent to xstep[:, np.newaxis] and its purpose is to add a new axis to xstep on the right. Thus, xstep[:, None] is a 2D array of shape (2, 1).
In [19]: xstep[:, None].shape
Out[19]: (2, 1)
xstep[:, None] + 1j*ystep is thus the sum of a 2D array of shape (2, 1) and a 1D array of shape (3,).
NumPy broadcasting resolves this apparent shape conflict by automatically adding new axes (of length 1) on the left. So, by NumPy broadcasting rules, 1j*ystep is promoted to an array of shape (1, 3).
(Notice that xstep[:, None] is required to explicitly add new axes on the right, but broadcasting will automatically add axes on the left. This is why 1j*ystep[None, :] was unnecessary though valid.)
Broadcasting further promotes both arrays to the common shape (2, 3) (but in a memory-efficient way, without copying the data). The values along the axes of length 1 are broadcasted repeatedly:
In [15]: X, Y = np.broadcast_arrays(xstep[:, None], 1j*ystep)
In [16]: X
Out[16]:
array([[0., 0., 0.],
       [1., 1., 1.]])
In [17]: Y
Out[17]:
array([[0.+0.j , 0.+0.5j, 0.+1.j ],
       [0.+0.j , 0.+0.5j, 0.+1.j ]])
You can use np.ogrid with imaginary "step" to obtain linspace semantics:
y, x = np.ogrid[0:1:2j, 0:1:3j]
y + 1j*x
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
#        [1.+0.j , 1.+0.5j, 1.+1.j ]])
Here the ogrid line means: make an open 2D grid; axis 0: 0 to 1, 2 steps; axis 1: 0 to 1, 3 steps. The type of the slice "step" acts as a switch: if it is imaginary (in fact, anything of complex type), its absolute value is taken and the expression is treated like a linspace; otherwise, range semantics apply.
The return values
y, x
# (array([[0.],
#         [1.]]), array([[0. , 0.5, 1. ]]))
are "broadcast ready", so in the example we can simply add them and obtain a full 2D grid.
If we allow ourselves an imaginary "stop" parameter in the second slice (which only works with linspace semantics, so depending on your style you may prefer to avoid it) this can be condensed to one line:
sum(np.ogrid[0:1:2j, 0:1j:3j])
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
#        [1.+0.j , 1.+0.5j, 1.+1.j ]])
A similar but potentially more performant method would be preallocation and then broadcasting:
out = np.empty((y.size, x.size), complex)
out.real[...], out.imag[...] = y, x
out
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
#        [1.+0.j , 1.+0.5j, 1.+1.j ]])
And another one using outer sum:
np.add.outer(np.linspace(0,1,2), np.linspace(0,1j,3))
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
#        [1.+0.j , 1.+0.5j, 1.+1.j ]])
Use reshape(-1, 1) on xstep, as in:
xstep = np.linspace(0, 1, 2) # array([0., 1.])
ystep = np.linspace(0, 1, 3) # array([0. , 0.5, 1. ])
result = np.array(xstep.reshape(-1,1) + (1j * ystep))
result
array([[0.+0.j , 0.+0.5j, 0.+1.j ],
       [1.+0.j , 1.+0.5j, 1.+1.j ]])
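Since the question mentions meshgrid, here is an equivalent sketch using np.meshgrid for comparison (indexing='ij' keeps xstep varying along axis 0):

import numpy as np

xstep = np.linspace(0, 1, 2)
ystep = np.linspace(0, 1, 3)
X, Y = np.meshgrid(xstep, ystep, indexing='ij')  # both have shape (2, 3)
result = X + 1j * Y
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
#        [1.+0.j , 1.+0.5j, 1.+1.j ]])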
I'm currently trying to apply Chi-Squared analysis to some data.
I want to plot a colourmap of varying values depending on the two coefficients of a model
def f(x, coeff):
    return coeff[0] + numpy.exp(coeff[1] * x)

def chi_squared(coeff, x, y, y_err):
    return numpy.sum(((y - f(x, coeff) / y_err)**2)
us = numpy.linspace(u0, u1, n)
vs = numpy.linspace(v0, v1, n)
rs = numpy.meshgrid(us, vs)
chi = numpy.vectorize(chi_squared)
chi(rs, x, y, y_error)
I tried vectorizing the function to be able to pass a meshgrid of the varying coefficients to produce the colormap.
The values of x, y, y_err are all 1D arrays of length n.
And u, v are the various changing coefficients.
However this doesn't work, resulting in
IndexError: invalid index to scalar variable.
This is because coeff is passed as a scalar rather than a vector, however I don't know how to correct this.
Update
My aim is to take an array of coordinates
rs = [[[u0, v0], [u1, v0], ..., [un, v0]], ..., [[u0, vm], ..., [un, vm]]]
Where each coordinate is the coefficient parameters to be passed to the chi-squared method.
This should return a 2D array populated with Chi-Squared values for the appropriate coordinate
chi = [[c00, c10, ..., cn0], ..., [c0m, c1m, ..., cnm]]
I can then use this data to plot a colormap using imshow
Here's my first attempt to run your code:
In [44]: def f(x, coeff):
...: return coeff[0] + numpy.exp(coeff[1] * x)
...:
...: def chi_squared(coeff, x, y, y_err):
...: return numpy.sum((y - f(x, coeff) / y_err)**2)
(I had to remove the ( in that last line)
First guess at possible array values:
In [45]: x = np.arange(3)
In [46]: y = x
In [47]: y_err = x
In [48]: us = np.linspace(0,1,3)
In [49]: rs = np.meshgrid(us,us)
In [50]: rs
Out[50]:
[array([[ 0. ,  0.5,  1. ],
        [ 0. ,  0.5,  1. ],
        [ 0. ,  0.5,  1. ]]),
 array([[ 0. ,  0. ,  0. ],
        [ 0.5,  0.5,  0.5],
        [ 1. ,  1. ,  1. ]])]
In [51]: chi_squared(rs, x, y, y_err)
/usr/local/bin/ipython3:5: RuntimeWarning: divide by zero encountered in true_divide
import sys
Out[51]: inf
oops, y_err shouldn't have a 0. Try again:
In [52]: y_err = np.array([1,1,1])
In [53]: chi_squared(rs, x, y, y_err)
Out[53]: 53.262865105526018
It also works if I turn the rs list into an array:
In [55]: np.array(rs).shape
Out[55]: (2, 3, 3)
In [56]: chi_squared(np.array(rs), x, y, y_err)
Out[56]: 53.262865105526018
Now, what was the purpose of vectorize?
The f function returns a (n,n) array:
In [57]: f(x, rs)
Out[57]:
array([[ 1.        ,  1.5       ,  2.        ],
       [ 1.        ,  2.14872127,  3.71828183],
       [ 1.        ,  3.21828183,  8.3890561 ]])
Let's modify chi_squared to give sum an axis parameter:
In [61]: def chi_squared(coeff, x, y, y_err, axis=None):
...: return numpy.sum((y - f(x, coeff) / y_err)**2, axis=axis)
In [62]: chi_squared(np.array(rs), x, y, y_err)
Out[62]: 53.262865105526018
In [63]: chi_squared(np.array(rs), x, y, y_err, axis=0)
Out[63]: array([ 3. , 6.49033483, 43.77253028])
In [64]: chi_squared(np.array(rs), x, y, y_err, axis=1)
Out[64]: array([ 1.25 , 5.272053 , 46.74081211])
I'm tempted to change the coeff to coeff0, coeff1, to give more control from the start on how this parameter is passed, but it probably doesn't make a difference.
Update
Now that you've been more specific about how the coeff values relate to x, y etc, I see that this can be solved with simple broadcasting. No need to use np.vectorize.
First, define a grid that has a different size; that way we, and the code, won't think that each dimension of the coeff grid has anything to do with the x,y values.
In [134]: rs = np.meshgrid(np.linspace(0,1,4), np.linspace(0,1,5), indexing='ij')
In [135]: coeff=np.array(rs)
In [136]: coeff.shape
Out[136]: (2, 4, 5)
Now look at what f looks like when given this coeff and x.
In [137]: f(x, coeff[...,None]).shape
Out[137]: (4, 5, 3)
coeff is effectively (4,5,1), while x is (1,1,3), resulting in a (4,5,3) shape (by broadcasting rules).
The same thing happens inside chi_squared, with the final step of sum on the last axis (size 3):
In [138]: chi_squared(coeff[...,None], x, y, y_err, axis=-1)
Out[138]:
array([[  2.        ,   1.20406718,   1.93676807,   8.40646968,  32.99441808],
       [  2.33333333,   2.15923164,   3.84810347,  11.80559574,  38.73264336],
       [  3.33333333,   3.78106277,   6.42610554,  15.87138846,  45.13753532],
       [  5.        ,   6.06956056,   9.67077427,  20.60384785,  52.20909393]])
In [139]: _.shape
Out[139]: (4, 5)
One value for each coeff pair of values, the (4,5) grid.
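Putting the pieces together, a minimal self-contained sketch of the broadcasting approach (using the same toy x, y, y_err values as above):

import numpy as np

def f(x, coeff):
    return coeff[0] + np.exp(coeff[1] * x)

def chi_squared(coeff, x, y, y_err, axis=None):
    return np.sum((y - f(x, coeff) / y_err)**2, axis=axis)

x = np.arange(3)
y = x
y_err = np.ones(3)

# grid of coefficient pairs: coeff[0] holds the u values, coeff[1] the v values
rs = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 5), indexing='ij')
coeff = np.array(rs)                                      # shape (2, 4, 5)
chi = chi_squared(coeff[..., None], x, y, y_err, axis=-1)
print(chi.shape)   # (4, 5): one value per (u, v) pair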
I am trying multiprocessing in Python. I have written some code which does a vector add, but I couldn't get the output out of the function. That is, the output Z prints out 0 rather than 2.
from multiprocessing import Process
import numpy as np
numThreads = 16
num = 16
numIter = num/numThreads
X = np.ones((num, 1))
Y = np.ones((num, 1))
Z = np.zeros((num, 1))
def add(X, Y, Z, j):
    Z[j] = X[j] + Y[j]

if __name__ == '__main__':
    jobs = []
    for i in range(numThreads):
        p = Process(target=add, args=(X, Y, Z, i,))
        jobs.append(p)
    for i in range(numThreads):
        jobs[i].start()
    for i in range(numThreads):
        jobs[i].join()
    print Z[0]
Edit: I took clocker's advice and changed my code to this:
import multiprocessing
import numpy as np
numThreads = 16
numRows = 32000
numCols = 2
numOut = 3
stride = numRows / numThreads
X = np.ones((numRows, numCols))
W = np.ones((numCols, numOut))
B = np.ones((numRows, numOut))
Y = np.ones((numRows, numOut))
def conv(idx):
    Y[idx*stride:idx*stride+stride] = X[idx*stride:idx*stride+stride].dot(W) + B[idx*stride:idx*stride+stride]

if __name__ == '__main__':
    pool = multiprocessing.Pool(numThreads)
    pool.map(conv, range(numThreads))
    print Y
And the output is the unchanged, all-ones Y rather than the computed result.
The reason your last line print Z[0] returns [0] instead of [2] is that each of the processes makes an independent copy of Z (or maybe just Z[j]; I'm not completely sure about that) before modifying it. Either way, running in a separate process guarantees that your original version will be unchanged.
If you were to use the threading module instead, the last line would indeed return [2] as expected, but that is not multiprocessing.
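For illustration, a sketch of that threading variant (threads share the process's memory, so the in-place writes to Z are visible; note that the GIL means this gains no CPU parallelism for pure-Python work):

import threading
import numpy as np

num = 16
X = np.ones((num, 1))
Y = np.ones((num, 1))
Z = np.zeros((num, 1))

def add(X, Y, Z, j):
    Z[j] = X[j] + Y[j]

threads = [threading.Thread(target=add, args=(X, Y, Z, i)) for i in range(num)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(Z[0])  # [2.]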
So, you probably want to use multiprocessing.Pool instead. Going along with your experiment purely for illustration, one could do the following:
In [40]: pool = multiprocessing.Pool()
In [41]: def add_func(j):
....: return X[j] + Y[j]
In [42]: pool = multiprocessing.Pool(numThreads)
In [43]: pool.map(add_func, range(numThreads))
Out[43]:
[array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.]),
array([ 2.])]
Have fun!
For the second part of your question, the problem is that the conv() function does not return any value. While the process pool gets a copy of X, B and W to pull values from, the Y inside conv() is local to each process that is launched. To get the newly computed values of Y, you would use something like this:
def conv(idx):
    Ylocal_section = X[idx*stride:idx*stride+stride].dot(W) + B[idx*stride:idx*stride+stride]
    return Ylocal_section

results = pool.map(conv, range(numThreads))  # then apply each result to Y
for idx in range(numThreads):
    Y[idx*stride:idx*stride+stride] = results[idx]
Parallelism can get complicated really fast, and at this point I would evaluate existing libraries that can perform fast 2D convolution. numpy and scipy libraries can be super efficient and might serve your needs better.