Merging 1D arrays into a 2D array - python

Is there a built-in function to join two 1D arrays into a 2D array?
Consider an example:
X=np.array([1,2])
y=np.array([3,4])
result=np.array([[1,3],[2,4]])
I can think of 2 simple solutions.
The first one is pretty straightforward.
np.transpose([X,y])
The other one employs a lambda function.
np.array(list(map(lambda i: [X[i],y[i]], range(len(X)))))
While the second one looks more complex, it seems to be almost twice as fast as the first one.
Edit
A third solution involves the zip() function.
np.array(list(zip(X, y)))
It's faster than the lambda solution but slower than the column_stack solution suggested by Divakar.
np.column_stack((X,y))

Scalability should also be taken into account. As the arrays grow, pure NumPy solutions become considerably faster than solutions involving Python built-in operations:
np.random.seed(1234)
X = np.random.rand(10000)
y = np.random.rand(10000)
%timeit np.array(list(map(lambda i: [X[i],y[i]], range(len(X)))))
6.64 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.array(list(zip(X, y)))
4.53 ms ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.column_stack((X,y))
19.2 µs ± 30.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.transpose([X,y])
16.2 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.vstack((X, y)).T
14.2 µs ± 94.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Taking all proposed solutions into account, np.vstack((X, y)).T is the fastest for larger arrays.

This is one way:
import numpy as np
X = np.array([1,2])
y = np.array([3,4])
result = np.vstack((X, y)).T
print(result)
# [[1 3]
# [2 4]]
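For completeness, np.stack with axis=1 (not benchmarked above) expresses the same column-pairing directly:
import numpy as np
X = np.array([1,2])
y = np.array([3,4])
# Stack the two 1-D arrays as columns of a 2-D array
result = np.stack((X, y), axis=1)
print(result)
# [[1 3]
#  [2 4]]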

Related

Need help in understanding the loop speed with timeit function in python

I need help understanding how %timeit works in the following two programs.
Program A
a = [1,3,2,4,1,4,2]
%timeit [val + 5 for val in a]
830 ns ± 45.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Program B
import numpy as np
a = np.array([1,3,2,4,1,4,2])
%timeit [a+5]
1.07 µs ± 23.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
My confusion:
µs is bigger than ns. How can the NumPy version execute slower than the for loop here?
In 1.07 µs ± 23.7 ns per loop, why is part of the timing reported in ns and not in µs?
NumPy adds overhead, which impacts speed on small datasets; vectorization mostly pays off on large datasets. (As for the units: %timeit scales the mean and the standard deviation independently for readability, so a µs mean can come with an ns deviation.)
You should try larger sizes:
N = 10_000_000
a = list(range(N))
%timeit [val + 5 for val in a]
import numpy as np
a = np.arange(N)
%timeit a+5
Output:
1.51 s ± 318 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
55.8 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
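A minor aside, not from the original answer: the brackets in [a+5] additionally build a one-element Python list around the result, so the bare expression is the cleaner thing to time, although at this array size NumPy's per-call dispatch overhead still dominates either way:
import numpy as np
a = np.array([1,3,2,4,1,4,2])
%timeit a + 5   # bare vectorized add; list-wrapping cost removed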

What's under the hood of numpy's 'mean' function such that it works faster than built in python methods?

I've been exploring the performance differences between numpy functions and Python's normal built-in functions, and I want to know how numpy functions are optimized such that there's almost a 100x speed-up.
Below is some code I wrote to highlight the execution-time difference between numpy's mean() and a manual calculation of the mean using sum() and len():
import numpy as np
import time
n = 10**7
a = np.random.randn(n)
start = time.perf_counter()
mean = sum(a)/len(a)
seconds1 = time.perf_counter()-start
start = time.perf_counter()
mean = np.mean(a)
seconds2 = time.perf_counter()-start
print("First method takes time {:.3f}s".format(seconds1))
print("Second method takes time {:.3f}s".format(seconds2))
Output:
First method takes time 1.687s
Second method takes time 0.013s
Make a numpy array:
In [130]: a=np.arange(10000)
Apply the numpy sum function:
In [131]: timeit np.sum(a)
16.2 µs ± 22.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
mean is a bit slower, since it has to divide by the number of elements (and may do a few other checks):
In [132]: timeit np.mean(a)
34.9 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
np.sum actually delegates the action to the sum method of the array, so using that directly is a bit faster:
In [133]: timeit a.sum()
13.3 µs ± 25.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Python sum isn't a bad function, but it iterates over its argument. Iterating (in Python code) on an array is slow:
In [134]: timeit sum(a)
1.16 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Converting the array to a list first saves time:
In [135]: timeit sum(a.tolist())
369 µs ± 7.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Better yet if we just time the list operation:
In [136]: %%timeit alist=a.tolist()
...: sum(alist)
57.2 µs ± 294 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
When working with numpy arrays, it is best to use numpy's own methods (or numpy functions). Conversely, when using Python functions, it is generally better to work with lists.
Using a numpy function on a list is slow, because it has to first convert the list to an array:
In [137]: %%timeit alist=a.tolist()
...: np.sum(alist)
795 µs ± 28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
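To isolate that conversion cost (an extra measurement, not in the original answer), one can time the list-to-array conversion on its own; the expectation is that it accounts for most of the np.sum(alist) time:
%%timeit alist = a.tolist()
np.array(alist)   # just the list -> array conversion, no summation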

NumPy: Create a multidimensional array from an iterable

I have an iterable of tuples, and I'd like to build an ndarray from it. Say that the shape would be (12345, 67890). What would be an efficient and elegant way to do so?
Here are a few options, and why I ruled them out:
np.array(my_tuples) starts allocating the array before it knows the size, which requires inefficient relocations according to NumPy's documentation.
Create an array with uninitialized content using np.ndarray((12345, 67890)) and then do a loop that populates it with data. It works and it's efficient, but a bit inelegant because it requires multiple statements.
Use np.fromiter which appears to be geared towards 1-dimensional arrays only.
Does anyone have a better solution?
(I've seen this question, but I'm not seeing any promising answers there.)
Define a generator:
def foo(m,n):
    for i in range(m):
        yield list(range(i,i+n))
Timing several alternatives:
In [93]: timeit np.array(list(foo(3000,4000)))
1.74 s ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [94]: timeit list(foo(3000,4000))
663 ms ± 3.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [95]: timeit np.stack([np.array(row) for row in foo(3000,4000)])
1.32 s ± 2.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [96]: timeit np.concatenate([np.array(row, ndmin=2) for row in foo(3000,4000)])
1.33 s ± 23.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [97]: %%timeit
...: arr = np.empty((3000,4000),int)
...: for i,row in enumerate(foo(3000,4000)):
...:     arr[i] = row
...:
1.29 s ± 3.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and with a flat generator:
def foo1(m,n):
    for i in range(m):
        for j in range(n):
            yield i+j
In [104]: timeit np.fromiter(foo1(3000,4000),int).reshape(3000,4000)
1.54 s ± 5.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
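One more option, not in the original answers and requiring NumPy 1.23 or newer: np.fromiter accepts a subarray dtype, which builds the 2-D array directly from the row generator:
import numpy as np

def foo(m,n):
    for i in range(m):
        yield list(range(i,i+n))

# Each yielded row becomes one length-4000 subarray item
arr = np.fromiter(foo(3000,4000), dtype=np.dtype((int, 4000)), count=3000)
print(arr.shape)   # (3000, 4000)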
Use fromiter() with .reshape(). Reshaping a contiguous array returns a view, so it requires no extra memory or processing (see the check at the end of this answer).
I suspect you'll find this not elegant enough, but fast it is:
from timeit import timeit
import itertools as it
def x():
for i in range(3000):
yield list(range(i,i+4000))
timeit(lambda:np.fromiter(it.chain.from_iterable(x()),int,12000000).reshape(3000,4000),number=10)
# 5.048861996969208
Compare that to, for example
timeit(lambda:np.concatenate(list(x()),0),number=10)
# 12.466914481949061
Btw. if you do not know the total number of elements in advance, no big deal:
timeit(lambda:np.fromiter(it.chain.from_iterable(x()),int).reshape(3000,-1),number=10)
# 5.331893905065954
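As a footnote to the reshape claim above: for a contiguous array, reshape returns a view that shares the original buffer, so it is essentially free (a quick check, not part of the original answer):
import numpy as np
flat = np.fromiter(range(12), int)   # small 1-D stand-in
grid = flat.reshape(3, 4)
print(grid.base is flat)             # True: a view, no copy made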

Fast numpy row slicing on a matrix

I have the following issue: I have a matrix yj of size (m,200) (m = 3683), and I have a dictionary that, for each key, returns a numpy array of row indices for yj (the array size differs per key, in case anyone is wondering).
Now, I have to access this matrix lots of times (around 1M times), and my code is slowing down because of the indexing (I've profiled the code, and 65% of the time is spent on this step).
Here is what I've tried out:
First of all, use the indices for slicing:
>> %timeit yj[R_u_idx_train[1]]
10.5 µs ± 79.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The variable R_u_idx_train is the dictionary that has the row indices.
I thought that maybe boolean indexing might be faster:
>> %timeit yj[R_u_idx_train_mask[1]]
10.5 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
R_u_idx_train_mask is a dictionary that returns a boolean array of size m where the indices given by R_u_idx_train are set to True.
I also tried np.ix_
>> cols = np.arange(0,200)
>> %timeit ix_ = np.ix_(R_u_idx_train[1], cols); yj[ix_]
42.1 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I also tried np.take
>> %timeit np.take(yj, R_u_idx_train[1], axis=0)
2.35 ms ± 88.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And while this seems great, it is not, since it gives an array of shape (R_u_idx_train[1].shape[0], R_u_idx_train[1].shape[0]) when it should be (R_u_idx_train[1].shape[0], 200). I guess I'm not using the method correctly.
I also tried np.compress
>> %timeit np.compress(R_u_idx_train_mask[1], yj, axis=0)
14.1 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Finally I tried to index with a boolean matrix
>> %timeit yj[R_u_idx_train_mask2[1]]
244 µs ± 786 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So, is 10.5 µs ± 79.7 ns per loop the best I can do? I could try to use cython but that seems like a lot of work for just indexing...
Thanks a lot.
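An aside on the boolean-mask variant above: the mask dictionary can be built from the index dictionary along these lines (a sketch using the question's names; the toy data here is only illustrative):
import numpy as np

m = 3683   # number of rows in yj, as stated in the question
# Toy stand-in for the question's R_u_idx_train dictionary
R_u_idx_train = {1: np.array([0, 5, 42]), 2: np.array([7, 9])}

R_u_idx_train_mask = {}
for key, idx in R_u_idx_train.items():
    mask = np.zeros(m, dtype=bool)   # all False
    mask[idx] = True                 # True at the stored row indices
    R_u_idx_train_mask[key] = mask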
A very smart solution was given by V.Ayrat in the comments.
>> newdict = {k: yj[R_u_idx_train[k]] for k in R_u_idx_train.keys()}
>> %timeit newdict[1]
202 ns ± 6.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Anyway maybe it would still be cool to know if there is a way to speed it up using numpy!
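A side note on the np.take attempt in the question: with axis=0, np.take is documented to behave like fancy indexing along that axis, so the square shape reported there is not what axis=0 produces; a quick sanity check:
import numpy as np
yj = np.random.rand(3683, 200)
idx = np.array([0, 10, 20])
out = np.take(yj, idx, axis=0)
print(out.shape)                     # (3, 200)
print(np.array_equal(out, yj[idx])) # True: same result as fancy indexing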

Faster return_inverse in np.unique

I have a large numpy 1D array with over 100 million elements and am applying np.unique to it
import numpy as np
x = np.random.randint(0,10000, size=100_000_000)
_, index = np.unique(x, return_inverse=True)
What I actually need is the index that is returned from np.unique but I do not need the unique array at all (i.e., it is throwaway). Since, in my real use case, I need to call np.unique many times on different arrays (all with the same length), this becomes the bottleneck. I'm guessing that a lot of the time is spent on sorting the unique array.
What is the fastest way to obtain the index for a large 1D array (it may be over a billion elements in length)?
Is there a parallelized option?
Here's a way with array-assignment + masking + indexing trickery specific to the case of positive integers only in the input array x -
def return_inverse_only(x, maxnum=None):
    if maxnum is None:
        maxnum = x.max()+1       # Determines extent of indexing array
    p = np.zeros(maxnum, dtype=bool)
    p[x] = 1                     # Mark which values occur in x
    p2 = np.empty(maxnum, dtype=np.uint64)
    c = p.sum()                  # Number of distinct values
    p2[p] = np.arange(c)         # Map each occurring value to its sorted rank
    out = p2[x]                  # Look up the rank of every element
    return out
If the max number in the input array is known beforehand, feed in that number plus one as maxnum to boost performance further.
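A tiny worked example (a quick check, not part of the original answer) showing that the trick matches np.unique's inverse:
import numpy as np
x = np.array([5, 3, 5, 7, 3])
_, inv = np.unique(x, return_inverse=True)
print(inv)                        # [1 0 1 2 0]
print(return_inverse_only(x))     # [1 0 1 2 0]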
Timings on large arrays -
In [146]: np.random.seed(0)
...: x = np.random.randint(0,10000, size=100000)
In [147]: %timeit np.unique(x, return_inverse=True)
...: %timeit return_inverse_only(x)
...: %timeit return_inverse_only(x, maxnum=10000)
10.9 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
539 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
446 µs ± 30 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [148]: np.random.seed(0)
...: x = np.random.randint(0,10000, size=1000000)
In [149]: %timeit np.unique(x, return_inverse=True)
...: %timeit return_inverse_only(x)
...: %timeit return_inverse_only(x, maxnum=10000)
149 ms ± 5.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
6.1 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.3 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [150]: np.random.seed(0)
...: x = np.random.randint(0,10000, size=10000000)
In [151]: %timeit np.unique(x, return_inverse=True)
...: %timeit return_inverse_only(x)
...: %timeit return_inverse_only(x, maxnum=10000)
1.88 s ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
67.9 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
30x+ speedup!
