Numpy - create matrix with rows of vector - python

I have a vector [x,y,z,q] and I want to create a matrix:
[[x,y,z,q],
[x,y,z,q],
[x,y,z,q],
...
[x,y,z,q]]
with m rows. I think this could be done in some smart way, using broadcasting, but I can only think of doing it with a for loop.

This is certainly possible with broadcasting: add the vector to a column of m zeros, like so -
np.zeros((m,1),dtype=vector.dtype) + vector
NumPy also has a built-in function, np.tile, for exactly this task -
np.tile(vector,(m,1))
Sample run -
In [496]: vector
Out[496]: array([4, 5, 8, 2])
In [497]: m = 5
In [498]: np.zeros((m,1),dtype=vector.dtype) + vector
Out[498]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
In [499]: np.tile(vector,(m,1))
Out[499]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
You can also use np.repeat after adding a leading axis to the vector with np.newaxis/None for the same effect, like so -
In [510]: np.repeat(vector[None],m,axis=0)
Out[510]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
You can also use integer array indexing to get the replications, like so -
In [525]: vector[None][np.zeros(m,dtype=int)]
Out[525]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
And finally, np.broadcast_to simply creates a 2D view into the input vector, so it is virtually free and needs no extra memory. Note that the view is read-only and all of its rows share the same data, so make a copy if you need to modify it. So, we would simply do -
In [22]: np.broadcast_to(vector,(m,len(vector)))
Out[22]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
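To make the "view" point concrete, here is a minimal sketch (reusing the sample vector from above) that checks the stride and the writeable flag of the broadcast result, and takes a copy when the rows need to be edited:

```python
import numpy as np

vector = np.array([4, 5, 8, 2])
m = 5

view = np.broadcast_to(vector, (m, len(vector)))
print(view.shape)             # (5, 4)
print(view.strides[0])        # 0: every row points at the same memory
print(view.flags.writeable)   # False: the view is read-only

# Take an explicit copy when the rows must be modified independently.
rows = view.copy()
rows[0, 0] = 99
print(vector[0], rows[1, 0])  # 4 4
```

The copy costs the same memory as np.tile, so the view only pays off when the result is read and never written.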
Runtime test -
Here's a quick runtime test comparing the various approaches -
In [12]: vector = np.random.rand(10000)
In [13]: m = 10000
In [14]: %timeit np.broadcast_to(vector,(m,len(vector)))
100000 loops, best of 3: 3.4 µs per loop # virtually free!
In [15]: %timeit np.zeros((m,1),dtype=vector.dtype) + vector
10 loops, best of 3: 95.1 ms per loop
In [16]: %timeit np.tile(vector,(m,1))
10 loops, best of 3: 89.7 ms per loop
In [17]: %timeit np.repeat(vector[None],m,axis=0)
10 loops, best of 3: 86.2 ms per loop
In [18]: %timeit vector[None][np.zeros(m,dtype=int)]
10 loops, best of 3: 89.8 ms per loop

Related

Sort paired array of 3d array (replace for loop)

I have the following 3d array:
import numpy as np
z = np.array([[[10, 2],
[ 5, 3],
[ 4, 4]],
[[ 7, 6],
[ 4, 2],
[ 5, 8]]])
I want to sort the pairs within each 2D block by their first value.
Currently I am using following code:
from operator import itemgetter
np.array([sorted(x,key=itemgetter(0)) for x in z])
array([[[ 4, 4],
[ 5, 3],
[10, 2]],
[[ 4, 2],
[ 5, 8],
[ 7, 6]]])
How can I make the code more efficient/faster by removing the for loop?
For a numpy one liner you can use numpy.argsort:
import numpy as np
a = np.array([[[10, 2],
[ 5, 3],
[ 4, 4]],
[[ 7, 6],
[ 4, 2],
[ 5, 8]]])
a[np.arange(0,2)[:,None], a[:,:,0].argsort()]
array([[[ 4, 4],
[ 5, 3],
[10, 2]],
[[ 4, 2],
[ 5, 8],
[ 7, 6]]])
For such a small array this takes about the same time, but scaling up the size gives quite an improvement, for instance:
from operator import itemgetter
a = np.random.randint(0,10, (2,100_000,2))
%timeit a[np.arange(0,2)[:,None], a[:,:,0].argsort()]
26.9 ms ± 351 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [sorted(x,key=itemgetter(0)) for x in a]
327 ms ± 6.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
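As a possible variation on the argsort indexing above, np.take_along_axis (available in NumPy 1.15 and later) applies the same ordering without hard-coding the number of blocks. This is only a sketch on the sample array from the question:

```python
import numpy as np

z = np.array([[[10, 2], [5, 3], [4, 4]],
              [[7, 6], [4, 2], [5, 8]]])

# Row order within each block, by the first value of each pair.
order = z[:, :, 0].argsort(axis=1)            # shape (2, 3)

# Apply that order along axis 1; the trailing axis of length 1 broadcasts
# over both values of each pair, so the pairs stay together.
result = np.take_along_axis(z, order[:, :, None], axis=1)
print(result)
```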
You can use map() to achieve the same result without a for-loop. The sort function can be user-defined, a lambda, or a partial of sorted:
(1) By first creating a sort function:
>>> def mysort(it):
...     return sorted(it, key=itemgetter(0))
...
>>> list(map(mysort, z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
(2) Same as above, but with a lambda instead:
>>> list(map(lambda it: sorted(it, key=itemgetter(0)), z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
(3a) With a partial:
>>> from functools import partial
>>> psort = partial(sorted, key=itemgetter(0))
>>> list(map(psort, z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
(3b) Or the partial defined in-place:
>>> list(map(partial(sorted, key=itemgetter(0)), z))
[[[4, 4], [5, 3], [10, 2]], [[4, 2], [5, 8], [7, 6]]]
Your question has a list of lists of lists, rather than a 3d numpy array. For numpy-oriented solutions, see this answer.
FYI, (2) and (3b) are roughly equivalent, but have their differences.
Among options 1-3, my preference is the lambda in (2).
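Why not simply np.sort(z,axis=1)? Note that this sorts each column independently, so the pairs are not kept together and the result differs from the output asked for in the question: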
import numpy as np
z = np.array([[[10, 2],
[ 5, 3],
[ 4, 4]],
[[ 7, 6],
[ 4, 2],
[ 5, 8]]])
print(np.sort(z,axis=1))
[[[ 4 2]
[ 5 3]
[10 4]]
[[ 4 2]
[ 5 6]
[ 7 8]]]
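A quick way to see the difference between np.sort(z,axis=1) and a sort by the first value is to check whether the original pairs survive. The sketch below does that for the first block of the sample array; the set comparison is just an illustrative check, not part of either answer:

```python
import numpy as np

z = np.array([[[10, 2], [5, 3], [4, 4]],
              [[7, 6], [4, 2], [5, 8]]])

col_sorted = np.sort(z, axis=1)  # each column sorted on its own
pair_sorted = z[np.arange(len(z))[:, None], z[:, :, 0].argsort()]

original_pairs = {tuple(p) for p in z[0]}
print({tuple(p) for p in col_sorted[0]} == original_pairs)   # False
print({tuple(p) for p in pair_sorted[0]} == original_pairs)  # True
```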

numpy array - efficiently subtract each row of B from A

I have two numpy arrays a and b. I want to subtract each row of b from a. I tried to use:
a1 - b1[:, None]
This works for small arrays, but takes too long when it comes to real world data sizes.
a = np.arange(16).reshape(8,2)
a
Out[35]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]])
b = np.arange(6).reshape(3,2)
b
Out[37]:
array([[0, 1],
[2, 3],
[4, 5]])
a - b[:, None]
Out[38]:
array([[[ 0, 0],
[ 2, 2],
[ 4, 4],
[ 6, 6],
[ 8, 8],
[10, 10],
[12, 12],
[14, 14]],
[[-2, -2],
[ 0, 0],
[ 2, 2],
[ 4, 4],
[ 6, 6],
[ 8, 8],
[10, 10],
[12, 12]],
[[-4, -4],
[-2, -2],
[ 0, 0],
[ 2, 2],
[ 4, 4],
[ 6, 6],
[ 8, 8],
[10, 10]]])
%%timeit
a - b[:, None]
The slowest run took 10.36 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.18 µs per loop
This approach is too slow / inefficient for larger arrays.
a1 = np.arange(18900 * 41).reshape(18900, 41)
b1 = np.arange(2674 * 41).reshape(2674, 41)
%%timeit
a1 - b1[:, None]
1 loop, best of 3: 12.1 s per loop
%%timeit
for index in range(len(b1)):
a1 - b1[index]
1 loop, best of 3: 2.35 s per loop
Is there any numpy trick I can use to speed this up?
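You are running into memory limits: the result has shape (2674, 18900, 41), about 2 billion elements, which is roughly 16.6 GB with 8-byte integers.
If 8 bits are sufficient to store your data, use uint8. Check the value range first: the values in your example exceed 255, and unsigned subtraction wraps around instead of going negative.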
import numpy as np
a1 = np.arange(18900 * 41,dtype=np.uint8).reshape(18900, 41)
b1 = np.arange(2674 * 41,dtype=np.uint8).reshape(2674, 41)
%time c1=(a1-b1[:,None])
#1.02 s
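If a smaller dtype is not an option, another way to stay within memory is to process b in blocks so that only one slab of the result exists at a time. The following is a rough sketch under that assumption; diff_in_chunks and the chunk size are hypothetical names and values, not something from the question:

```python
import numpy as np

def diff_in_chunks(a, b, chunk=256):
    """Yield (start, block) where block == a - b[start:start + chunk, None]."""
    for start in range(0, len(b), chunk):
        yield start, a - b[start:start + chunk, None]

# Size in bytes of the full result for the shapes in the question,
# assuming 8-byte integers (roughly 16.6 GB).
full_bytes = 2674 * 18900 * 41 * 8
print(full_bytes / 1e9)

# Small check that the blocks reproduce the one-shot broadcast.
a = np.arange(16).reshape(8, 2)
b = np.arange(6).reshape(3, 2)
blocks = [block for _, block in diff_in_chunks(a, b, chunk=2)]
rebuilt = np.concatenate(blocks, axis=0)
print(np.array_equal(rebuilt, a - b[:, None]))
```

In real use each block would be reduced or written out before the next one is computed, rather than collected in a list as in this check.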

Delete and duplicate rows in numpy array

In Python, let's say I have a 1366x768 numpy array. I want to delete every second row (the 0th row remains, the 1st is removed, the 2nd remains, the 3rd is removed, and so on) and at the same time fill each gap with a duplicate of the preceding (undeleted) row.
Is it possible in numpy?
One approach -
a[::2].repeat(2,axis=0)
This returns a new array; to change a itself, assign the result back to it.
Sample run -
In [105]: a
Out[105]:
array([[2, 5, 1, 1],
[2, 0, 2, 5],
[1, 1, 5, 7],
[0, 7, 1, 8],
[8, 5, 2, 3],
[2, 1, 0, 6],
[5, 6, 1, 6],
[7, 1, 4, 7],
[3, 8, 1, 4],
[5, 8, 8, 8]])
In [106]: a[::2].repeat(2,axis=0)
Out[106]:
array([[2, 5, 1, 1],
[2, 5, 1, 1],
[1, 1, 5, 7],
[1, 1, 5, 7],
[8, 5, 2, 3],
[8, 5, 2, 3],
[5, 6, 1, 6],
[5, 6, 1, 6],
[3, 8, 1, 4],
[3, 8, 1, 4]])
If we care about performance, here's another approach using NumPy strides -
def strided_app(a):
    m0,n0 = a.strides
    m,n = a.shape
    strided = np.lib.stride_tricks.as_strided
    return strided(a,shape=(m//2,2,n),strides=(2*m0,0,n0)).reshape(-1,n)
Sample run -
In [154]: a
Out[154]:
array([[4, 8, 7, 7],
[5, 5, 1, 7],
[1, 8, 1, 3],
[6, 6, 5, 6],
[0, 2, 6, 3],
[6, 6, 8, 7],
[7, 6, 8, 1],
[7, 8, 8, 2],
[4, 0, 2, 8],
[5, 8, 1, 4]])
In [155]: strided_app(a)
Out[155]:
array([[4, 8, 7, 7],
[4, 8, 7, 7],
[1, 8, 1, 3],
[1, 8, 1, 3],
[0, 2, 6, 3],
[0, 2, 6, 3],
[7, 6, 8, 1],
[7, 6, 8, 1],
[4, 0, 2, 8],
[4, 0, 2, 8]])
Timings -
In [156]: arr = np.arange(1000000).reshape(1000, 1000)
# Proposed soln-1
In [157]: %timeit arr[::2].repeat(2,axis=0)
1000 loops, best of 3: 1.26 ms per loop
# @Psidom's soln
In [158]: %timeit arr[1::2] = arr[::2]
1000 loops, best of 3: 928 µs per loop
In [159]: arr = np.arange(1000000).reshape(1000, 1000)
# Proposed soln-2
In [160]: %timeit strided_app(arr)
1000 loops, best of 3: 830 µs per loop
Looks like you have an even number of rows, in which case you can use assignment (assign the values of the even-indexed rows to the corresponding odd-indexed rows):
arr = np.array([[1,4],[3,1],[2,3],[2,2]])
arr[1::2] = arr[::2]
arr
#array([[1, 4],
# [1, 4],
# [2, 3],
# [2, 3]])
This avoids copying the entire array, but doesn't work as written if the array has an odd number of rows, because the two slices then have different lengths.
Timing: Here is a comparison of the timings; the assignment does seem faster.
arr = np.arange(1000000).reshape(1000, 1000)
%timeit arr[::2].repeat(2,axis=0)
1000 loops, best of 3: 913 µs per loop
%timeit arr[1::2] = arr[::2]
1000 loops, best of 3: 655 µs per loop
This works for both an even and an odd number of rows.
for i in range(1,len(a),2):
    a[i] = a[i-1]
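The slice assignment shown earlier can be adapted to an odd number of rows by trimming the source slice to the number of odd-indexed rows. This is a sketch only; duplicate_even_rows is a hypothetical helper name:

```python
import numpy as np

def duplicate_even_rows(arr):
    """Overwrite each odd-indexed row with the row before it, in place."""
    n_odd = len(arr) // 2            # number of odd-indexed rows
    arr[1::2] = arr[0:2 * n_odd:2]   # the same number of even-indexed rows
    return arr

arr = np.arange(10).reshape(5, 2)    # odd number of rows
duplicate_even_rows(arr)
print(arr)
```

With an odd row count the last row has no odd-indexed partner and is left as it is.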

Pandas/Numpy Get matrix from column of arrays

I have a pandas dataframe with a column of lists.
df:
inputs
0 [1, 2, 3]
1 [4, 5, 6]
2 [7, 8, 9]
3 [10, 11, 12]
I need the matrix
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
An efficient way to do this?
Note: When I try df.inputs.as_matrix() the output is
array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=object)
which has shape (4,), not (4,3) as desired.
You can convert the column to a list and then pass it to np.array. If all the lists in the column have the same length, this produces a 2D array:
arr = np.array(df.inputs.tolist())
#array([[ 1, 2, 3],
# [ 4, 5, 6],
# [ 7, 8, 9],
# [10, 11, 12]])
arr.shape
# (4, 3)
Another option, as commented by @piRSquared, is to use .values to access the underlying numpy object first and then convert it to a list. This is marginally faster with the example given:
%timeit df.inputs.values.tolist()
# 100000 loops, best of 3: 5.52 µs per loop
%timeit df.inputs.tolist()
# 100000 loops, best of 3: 11.5 µs per loop
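To illustrate why the shape comes out as (4,) and how the list round-trip fixes it, here is a NumPy-only sketch. The object array col is a stand-in built by hand for what the column holds, which is an assumption made to keep the example free of pandas:

```python
import numpy as np

# Stand-in for df.inputs.values: a 1-D object array whose elements are lists.
col = np.empty(4, dtype=object)
for i, row in enumerate([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]):
    col[i] = row

# Going through a plain list of lists lets NumPy infer the 2D shape.
arr = np.array(col.tolist())
print(col.shape, arr.shape)
```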

Sub matrix of a list of lists (without numpy)

Suppose I have a matrix composed of a list of lists like so:
>>> LoL=[list(range(10)) for i in range(10)]
>>> LoL
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
Assume, also, that I have a numpy matrix of the same structure called LoLa:
>>> LoLa=np.array(LoL)
Using numpy, I could get a submatrix of this matrix like this:
>>> LoLa[1:4,2:5]
array([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]])
I can replicate the numpy matrix slice in pure Python like so:
>>> r=(1,4)
>>> s=(2,5)
>>> [LoL[i][s[0]:s[1]] for i in range(len(LoL))][r[0]:r[1]]
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
Which is not the easiest thing in the world to read nor the most efficient :-)
Question: Is there an easier way (in pure Python) to slice an arbitrary matrix as a sub matrix?
In [74]: [row[2:5] for row in LoL[1:4]]
Out[74]: [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
You could also mimic NumPy's syntax by defining a subclass of list:
class LoL(list):
    def __init__(self, *args):
        list.__init__(self, *args)
    def __getitem__(self, item):
        try:
            return list.__getitem__(self, item)
        except TypeError:
            rows, cols = item
            return [row[cols] for row in self[rows]]
lol = LoL([list(range(10)) for i in range(10)])
print(lol[1:4, 2:5])
also yields
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
Using the LoL subclass won't win any speed tests:
In [85]: %timeit [row[2:5] for row in x[1:4]]
1000000 loops, best of 3: 538 ns per loop
In [82]: %timeit lol[1:4, 2:5]
100000 loops, best of 3: 3.07 us per loop
but speed isn't everything -- sometimes readability is more important.
For one, you can use slice objects directly, which helps a bit with both the readability and performance:
r = slice(1,4)
s = slice(2,5)
[LoL[i][s] for i in range(len(LoL))[r]]
And if you just iterate over the list-of-lists directly, you can write that as:
[row[s] for row in LoL[r]]
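The slice-object version above can be wrapped in a small helper so the row and column ranges are passed in one call. This is a minimal sketch; submatrix is a hypothetical name:

```python
def submatrix(mat, rows, cols):
    """Return the rows/cols sub-block of a list of lists; both are slice objects."""
    return [row[cols] for row in mat[rows]]

LoL = [list(range(10)) for _ in range(10)]

print(submatrix(LoL, slice(1, 4), slice(2, 5)))
# [[2, 3, 4], [2, 3, 4], [2, 3, 4]]

# Steps and negative bounds work the same way as in ordinary slicing.
print(submatrix(LoL, slice(None, None, 5), slice(-2, None)))
# [[8, 9], [8, 9]]
```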
Do this,
submat = [ [ mat[ i ][ j ] for j in range( index1, index2 ) ] for i in range( index3, index4 ) ]
submat will be the rectangular chunk of your original big matrix covering rows index3 to index4 - 1 and columns index1 to index2 - 1 (square if index4 - index3 == index2 - index1).
I don't know if it's easier, but let me throw an idea on the table:
from itertools import product
r = (1, 4)
s = (2, 5)
array = [LoL[i][j] for i,j in product(range(*r), range(*s))]
This is a flattened version of the submatrix you want.
