I use the code below to create an empty matrix:
import numpy as np
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(x)
y = np.empty_like(x)
print(y)
# I get the output below:
[[2097184 2097184 2097184]
[2097184 2097184 2097184]
[2097184 2097184 2097184]
[2097184 2097184 2097184]]
Why does 2097184 stand for empty?
It doesn't stand for anything. From the documentation:
This function does not initialize the returned array; to do that use zeros_like or ones_like instead. It may be marginally faster than the functions that do set the array values.
So the contents of the array are whatever happens to be in the memory that it used for it. In this case, it was a bunch of 2097184 values. The next time you try it you'll probably get something different.
You use this when you don't care what's in the array, because you're going to overwrite it.
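As a minimal sketch of that usage pattern (the doubling loop is only an illustrative assumption, not something from the question), the array is allocated with empty_like and every element is written before any element is read:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Same shape and dtype as x; the initial contents are arbitrary.
y = np.empty_like(x)

# Overwrite every row before reading anything from y.
for i in range(x.shape[0]):
    y[i] = x[i] * 2

print(y.shape, y.dtype == x.dtype)
```

Only after the loop has filled every row is it safe to rely on the values in y.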
The empty_like function does not initialize the array (that's why it is much faster than zeros_like and ones_like). The shape of the array is exactly the same as x, but the values are uninitialized: they are whatever happened to be in the memory allocated to the array.
In addition, it's just a more efficient alternative to zeros_like or ones_like:
%%timeit
np.zeros_like(x)
>>> 18.4 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
np.ones_like(x)
>>> 14.1 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
np.empty_like(x)
>>> 2.09 µs ± 62.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I have a 1D array of integers with D elements (i.e. idx = np.array([i0, i1, ...]), s.t. idx.size = D), where each element corresponds to the index along that dimension of an ND array with D dimensions (i.e. data s.t. data.ndim = D). How can I index the data array using the index array idx?
In Python I would do data[tuple(idx)], but tuples aren't supported in numba nopython mode.
My current workaround is to use data.ravel() and convert from ND indices to 1D indices of the flattened array, but it seems like there must be an easier (and computationally faster) solution. Is there a take_along_each_axis(data, idx) method somewhere?
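The ravel-based workaround described above could look roughly like the sketch below. The helper name flat_index is hypothetical, and the code assumes a C-ordered array. It is plain Python with an explicit loop, and whether it compiles unchanged in numba nopython mode has not been checked here.

```python
import numpy as np

def flat_index(idx, shape):
    """Convert an N-D index into an index of the C-order flattened array."""
    flat = 0
    for i in range(len(shape)):
        # Horner-style accumulation: each axis scales everything before it.
        flat = flat * shape[i] + idx[i]
    return flat

data = np.arange(2 * 3 * 4).reshape(2, 3, 4)
idx = np.array([1, 2, 3])

value = data.ravel()[flat_index(idx, data.shape)]
print(value == data[tuple(idx)])
```

The result agrees with np.ravel_multi_index for C-ordered shapes, so that function can serve as a reference when testing outside numba.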
Let's do a bit of time testing:
In [135]: data = np.ones((100,100,100,100)); idx = (50,50,50,50)
That's nearly a GB of memory: not huge enough to cause a memory error, but still a reasonable test. Actually, I get the same time for basic indexing with much smaller arrays, and for other idx values:
In [136]: timeit data[idx]
212 ns ± 9.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The interpreter translates that into a method call:
In [137]: timeit data.__getitem__(idx)
283 ns ± 4.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Indexing the 'flat' array can be done with:
In [138]: timeit data.flat[np.ravel_multi_index(idx,data.shape)]
6.65 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
or taking the conversion out of the loop:
In [139]: %%timeit x=np.ravel_multi_index(idx,data.shape)
...: data.flat[x]
574 ns ± 23.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [142]: %%timeit x=np.ravel_multi_index(idx,data.shape);df=data.flat
...: df[x]
345 ns ± 6.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I think there are cases where flat indexing is faster, but this isn't one.
So as a standalone operation I don't see the point of writing an njit version. I suppose if it's part of some larger operation it could be worth it.
It seems numpy.transpose only saves the strides and actually performs the transpose lazily, according to this.
So when does the data movement actually happen, and how is the data moved? With many memcpy calls, or some other trick?
I followed the path:
array_reshape,
PyArray_Newshape,
PyArray_NewCopy,
PyArray_NewLikeArray,
PyArray_NewFromDescr,
PyArray_NewFromDescrAndBase,
PyArray_NewFromDescr_int
but I see nothing about an axis permutation. When does it actually happen?
Update 2021/1/19
Thanks for the answers. The numpy array copy with transpose is here; it uses a common macro to implement it. The algorithm is very naive and does not use any SIMD acceleration or consider cache friendliness.
The answer to your question is: Numpy doesn't move data.
Did you see PyArray_Transpose on line 688 of your above links? There is a permute in this function,
n = permute->len;
axes = permute->ptr;
...
for (i = 0; i < n; i++) {
    int axis = axes[i];
    ...
    permutation[i] = axis;
}
Any array shape is purely metadata, used by Numpy to understand how to handle the data, as memory is always stored linearly and contiguously. There is therefore no reason to move or reorder any data, from the docs here,
Other operations, such as transpose, don't move data elements around in the array, but rather change the information about the shape and strides so that the indexing of the array changes, but the data in the array doesn't move.
Typically these new versions of the array metadata but the same data buffer are
new 'views' into the data buffer. There is a different ndarray object, but it
uses the same data buffer. This is why it is necessary to force copies through
use of the .copy() method if one really wants to make a new and independent
copy of the data buffer.
The only reason to copy may be to maximize cache efficiency, although Numpy already considers this,
As it turns out, numpy is smart enough when dealing with ufuncs to determine which index is the most rapidly varying one in memory and uses that for the innermost loop.
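A small sketch of the view behaviour described in the quoted documentation (the array contents are arbitrary illustration): the transpose has the strides reversed, shares the buffer with the original, and only an explicit .copy() produces an independent buffer.

```python
import numpy as np

A = np.arange(6, dtype=np.int64).reshape(2, 3)
At = A.T  # new ndarray object, same data buffer

print(A.strides, At.strides)    # the stride tuples are reversed
print(np.shares_memory(A, At))  # no data was moved or copied

At[0, 1] = 100  # writing through the view also changes A[1, 0]

C = At.copy()   # forces a new, independent buffer
C[0, 0] = -1    # does not affect A
```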
Tracing through the numpy C code is a slow and tedious process. I prefer to deduce patterns of behavior from timings.
Make a sample array and its transpose:
In [168]: A = np.random.rand(1000,1000)
In [169]: At = A.T
First a fast view, with no copying of the data buffer:
In [171]: timeit B = A.ravel()
262 ns ± 4.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
A fast copy (presumably uses some fast block memory copying):
In [172]: timeit B = A.copy()
2.2 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
A slow copy (presumably requires traversing the source in its strided order, and the target in its own order):
In [173]: timeit B = A.copy(order='F')
6.29 ms ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Copying At without having to change the order - fast:
In [174]: timeit B = At.copy(order='F')
2.23 ms ± 51.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Like [173] but going from 'F' to 'C':
In [175]: timeit B = At.copy(order='C')
6.29 ms ± 4.16 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [176]: timeit B = At.ravel()
6.54 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Copies with simpler strided reordering fall somewhere in between:
In [177]: timeit B = A[::-1,::-1].copy()
3.75 ms ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [178]: timeit B = A[::-1].copy()
3.73 ms ± 6.48 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [179]: timeit B = At[::-1].copy(order='K')
3.98 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This astype also requires the slower copy:
In [182]: timeit B = A.astype('float128')
6.7 ms ± 8.12 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
PyArray_NewFromDescr_int is described as Generic new array creation routine. While I can't figure out where it copies data from the source to the target, it clearly is checking order and strides and dtype. Presumably it handles all cases where the generic copy is required. The axis permutation isn't a special case.
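The view-versus-copy distinction behind these timings can also be checked directly, without reading the C source. This is a sketch on a smaller array than the one timed above: ravel returns a view of the C-contiguous array but has to copy when flattening the transpose in C order.

```python
import numpy as np

A = np.random.rand(100, 100)
At = A.T

ravel_of_A = A.ravel()    # C-contiguous source: no copy needed
ravel_of_At = At.ravel()  # F-contiguous source flattened in C order: copy

print(np.shares_memory(A, ravel_of_A))
print(np.shares_memory(At, ravel_of_At))

# The requested order determines the memory layout of the copy.
f_copy = At.copy(order='F')
c_copy = At.copy(order='C')
print(f_copy.flags['F_CONTIGUOUS'], c_copy.flags['C_CONTIGUOUS'])
```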
I am wondering if there is any downside to using b = np.array(a) rather than b = np.copy(a) to copy a Numpy array a into b. When I %timeit, the former can be up to 100% faster.
In both cases b is a is False, and I can manipulate b leaving a intact, so I suppose this does what is expected from .copy().
Am I missing anything? What is improper about using np.array to copy an array?
With Python 3.6.5 and numpy 1.14.2 (the speed difference closes rapidly for larger sizes):
a = np.arange(1000)
%timeit np.array(a)
501 ns ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.copy(a)
1.1 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
From documentation of numpy.copy:
This is equivalent to:
>>> np.array(a, copy=True)
Also, if you look at the source code:
def copy(a, order='K'):
    return array(a, order=order, copy=True)
Some timings:
In [1]: import numpy as np
In [2]: a = np.ascontiguousarray(np.random.randint(0, 20000, 1000))
In [3]: %timeit b = np.array(a)
562 ns ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit b = np.array(a, order='K', copy=True)
1.1 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit b = np.copy(a)
1.21 µs ± 9.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: a = np.ascontiguousarray(np.random.randint(0, 20000, 1000000))
In [7]: %timeit b = np.array(a)
310 µs ± 6.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [8]: %timeit b = np.array(a, order='K', copy=True)
311 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit b = np.copy(a)
313 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: print(np.__version__)
1.13.3
It is unexpected that simply explicitly setting parameters to their default values changes the speed of execution of np.array(). On the other hand, maybe just processing these explicit arguments adds enough execution time to make a difference for small arrays. Indeed, from the source code for the numpy.array(), one can see that there are many more checks and more processing being performed when keyword arguments are provided, for example, see goto full_path. When keyword parameters are not set, the execution skips all the way down to goto finish. This overhead (of additional processing of keyword arguments) is what you detect in timings for small arrays. For larger arrays this overhead is insignificant in comparison to the actual time of copying the arrays.
"What is improper about using np.array to do copy an array?"
I'd argue it is harder to read, because it is not obvious that array makes a copy. For example, the similar asarray does not make a copy if it doesn't have to. The reader basically has to know the default value of the copy keyword argument to be sure.
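The difference in copy behaviour can be made explicit with a short check (a sketch; the variable names are illustrative):

```python
import numpy as np

a = np.arange(5)

b = np.array(a)    # copies, because copy defaults to True
c = np.copy(a)     # copies, and says so in its name
d = np.asarray(a)  # no copy needed here, so a itself is returned

print(np.shares_memory(a, b), np.shares_memory(a, c), d is a)
```

With np.copy the intent is visible at the call site, whereas np.array relies on the reader knowing the default.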
As AGN pointed out, np.array is faster than np.copy because the latter is essentially a wrapper around the former. This means Python "loses" some extra time on the additional function call. A similar thing happens with decorators.
This extra time is insignificant for practical purposes, and you gain better code readability.
You can test it by using a big array (where the array creation takes the main time), and you'll see very little differences in %timeit for both.
Is there a built-in function to join two 1D arrays into a 2D array?
Consider an example:
X=np.array([1,2])
y=np.array([3,4])
result=np.array([[1,3],[2,4]])
I can think of 2 simple solutions.
The first one is pretty straightforward.
np.transpose([X,y])
The other one employs a lambda function.
np.array(list(map(lambda i: [X[i], y[i]], range(len(X)))))
While the second one looks more complex, it seems to be almost twice as fast as the first one.
Edit
A third solution involves the zip() function.
np.array(list(zip(X, y)))
It's faster than the lambda function but slower than the column_stack solution suggested by @Divakar.
np.column_stack((X,y))
Take scalability into consideration. If we increase the size of the arrays, pure numpy solutions are much faster than solutions involving Python built-in operations:
np.random.seed(1234)
X = np.random.rand(10000)
y = np.random.rand(10000)
%timeit np.array(list(map(lambda i: [X[i],y[i]], range(len(X)))))
6.64 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.array(list(zip(X, y)))
4.53 ms ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.column_stack((X,y))
19.2 µs ± 30.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.transpose([X,y])
16.2 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.vstack((X, y)).T
14.2 µs ± 94.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Taking into account all proposed solutions, np.vstack((X, y)).T is the fastest when working with larger arrays.
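Before comparing speed, it is worth confirming that the candidates really produce the same array. A quick sketch using the small example from the question (the dictionary is just a convenient way to loop over them):

```python
import numpy as np

X = np.array([1, 2])
y = np.array([3, 4])

candidates = {
    "transpose": np.transpose([X, y]),
    "zip": np.array(list(zip(X, y))),
    "column_stack": np.column_stack((X, y)),
    "vstack.T": np.vstack((X, y)).T,
}

expected = np.array([[1, 3], [2, 4]])
for name, result in candidates.items():
    print(name, np.array_equal(result, expected))
```

Note that vstack takes the arrays as a single tuple argument.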
This is one way:
import numpy as np
X = np.array([1,2])
y = np.array([3,4])
result = np.vstack((X, y)).T
print(result)
# [[1 3]
# [2 4]]
I am interested in finding the fastest way of carrying out a simple operation in Python 3.6 using Numpy. I wish to define a function and map a given array to an array of function values. Here is simplified code that does that using map:
import numpy as np
def func(x):
    return x**2
xRange = np.arange(0,1,0.01)
arr_func = np.array(list(map(func, xRange)))
However, as I am running it with a complicated function and using large arrays, runtime speed is very important for me. Is there a known faster way?
EDIT My question is not the same as this one, because I am asking about assigning from a function, as opposed to a generator.
Check the related How do I build a numpy array from a generator?, where the most compelling option seems to be preallocating the numpy array and setting values, instead of creating a throwaway intermediate list.
arr_func = np.empty(len(xRange))
for i in range(len(xRange)):
    arr_func[i] = func(xRange[i])
With a complex function that can't be rewritten with compiled numpy functions, we can't make big improvements in speed.
Define a function with math methods that require scalars, for example:
import math
def func(x):
    return math.sin(x)**2 + math.cos(x)**2
In [868]: x = np.linspace(0,np.pi,10000)
For reference, do a straightforward list comprehension:
In [869]: np.array([func(i) for i in x])
Out[869]: array([ 1., 1., 1., ..., 1., 1., 1.])
In [870]: timeit np.array([func(i) for i in x])
13.4 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Your list map is slightly faster:
In [871]: timeit np.array(list(map(func, x)))
12.6 ms ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For a 1d array like this, np.array can be replaced with np.fromiter. It works with a generator as well, including the Py3 map.
In [875]: timeit np.fromiter(map(func, x),float)
13.1 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So that could get around the possible time penalty of creating a whole list first. But in this case it doesn't help.
Another iterator is np.frompyfunc. It is used by np.vectorize, but usually is faster with less overhead. It returns a dtype object array:
In [876]: f = np.frompyfunc(func, 1, 1)
In [877]: f(x)
Out[877]: array([1.0, 1.0, 1.0, ..., 1.0, 1.0, 1.0], dtype=object)
In [878]: timeit f(x)
11.1 ms ± 298 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [879]: timeit f(x).astype(float)
11.2 ms ± 85.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
A slight speed improvement. I noticed more of an improvement with 1000 item x. This is even better if your problem requires several arrays that may be broadcasted against each other.
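To illustrate the broadcasting remark, here is a sketch with a two-argument scalar function. The function hyp and the input values are hypothetical, chosen only for the example.

```python
import math
import numpy as np

def hyp(a, b):
    # Scalar-only function, standing in for code that needs math methods.
    return math.hypot(a, b)

f2 = np.frompyfunc(hyp, 2, 1)  # 2 inputs, 1 output

rows = np.array([3.0, 6.0])[:, None]  # shape (2, 1)
cols = np.array([4.0, 8.0, 0.0])      # shape (3,)

# The inputs are broadcast against each other, giving a (2, 3) result.
out = f2(rows, cols).astype(float)
print(out.shape)
```

The astype(float) call converts the object-dtype result into a regular float array, as in the single-argument case above.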
Assigning to a preallocated out array may save memory, and is often recommended as an alternative to the list append iteration. But here it does not give a speed improvement:
In [882]: %%timeit
...: out = np.empty_like(x)
...: for i,j in enumerate(x): out[i]=func(j)
16.1 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(the use of enumerate is slightly faster than range iteration).
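For contrast, when a function can be rewritten with compiled numpy functions, the whole array can be passed in one call and no Python-level iteration is needed. A sketch using the same expression as above (the names func_scalar and func_array are illustrative; no timings are claimed here):

```python
import math
import numpy as np

def func_scalar(x):
    # Works on one Python/numpy scalar at a time.
    return math.sin(x)**2 + math.cos(x)**2

def func_array(x):
    # Same expression written with numpy ufuncs; accepts a whole array.
    return np.sin(x)**2 + np.cos(x)**2

x = np.linspace(0, np.pi, 10000)

looped = np.array([func_scalar(i) for i in x])
vectorized = func_array(x)

print(np.allclose(looped, vectorized))
```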