What is the dask equivalent of numpy.tile?


Dask (http://dask.pydata.org/en/latest/array-api.html) is a flexible parallel computing library for analytics. In contrast to NumPy, it scales to big data, and it has many similar methods. How can I achieve the same effect as numpy.tile on a dask array?

Using dask.array.concatenate() could be a possible workaround.
Demo in NumPy:
In [374]: x = numpy.arange(4).reshape((2, 2))
In [375]: x
Out[375]:
array([[0, 1],
[2, 3]])
In [376]: n = 3
In [377]: numpy.tile(x, n)
Out[377]:
array([[0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3]])
In [378]: numpy.concatenate([x for i in range(n)], axis=1)
Out[378]:
array([[0, 1, 0, 1, 0, 1],
[2, 3, 2, 3, 2, 3]])
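The same idea can be wrapped in a small helper that tiles along every axis using nothing but concatenate. This is only a sketch: the name tile_via_concatenate and the xp parameter are made up for illustration, and the code is exercised here with NumPy only.

```python
import numpy as np

def tile_via_concatenate(x, reps, xp=np):
    """Repeat x along each axis using only concatenate.

    reps holds one repetition count per axis of x.
    xp is the array module to call concatenate on (NumPy by default).
    """
    if len(reps) != x.ndim:
        raise ValueError("need one repetition count per axis")
    out = x
    for axis, n in enumerate(reps):
        if n > 1:
            # Join n copies of the current result along this axis.
            out = xp.concatenate([out] * n, axis=axis)
    return out

x = np.arange(4).reshape((2, 2))
tiled = tile_via_concatenate(x, (1, 3))
print(tiled)
```

Since dask.array also provides concatenate with an axis argument, passing xp=dask.array together with a dask array should build the same result lazily, though that path is an assumption and is not tested here. Recent dask releases also appear to include dask.array.tile, so check the array API page for your version first.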

Related

How does the transpose of high-dimensional arrays work?

It's easy to understand the concept of transpose for a 2-D array, but I really cannot understand how the transpose of high-dimensional arrays works.
For example
c = np.indices([4,5]).T.reshape(20,1,2)
d = np.indices([4,5]).reshape(20,1,2)
np.all(c==d) # output is False
Why are the outputs of c and d inconsistent?

In [143]: c = np.indices([4,5])
In [144]: c
Out[144]:
array([[[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]])
In [145]: c.shape
Out[145]: (2, 4, 5)
In [146]: c.T.shape
Out[146]: (5, 4, 2)
Look at one 2d array from the size 2 dimension:
In [150]: c[0,:,:]
Out[150]:
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3]])
In [151]: c.T[:,:,0]
Out[151]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
The 2nd is the usual 2d transpose, a (5,4) array.
MATLAB doesn't do transpose on 3d arrays, or at least it doesn't call it that. It may have a way of making such a change. numpy, using a general shape/strides multidimensional implementation, easily generalizes the 2d transpose to 1d, 3d, or more.
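A short check of what .T does to the axes may help. The sketch below uses the arrays from the question; the index values i, j, k are arbitrary picks for illustration.

```python
import numpy as np

c = np.indices([4, 5])   # shape (2, 4, 5)
t = c.T                  # shape (5, 4, 2): the axis order is reversed

# .T is the same as transposing with the axes listed in reverse.
same_as_transpose = np.array_equal(t, c.transpose(2, 1, 0))

# Element (i, j, k) of c becomes element (k, j, i) of c.T.
i, j, k = 1, 2, 3
same_element = t[k, j, i] == c[i, j, k]

# reshape reads elements in row-major order of the array it is given,
# so reshaping c.T and reshaping c walk the data in different orders.
reshaped_equal = np.array_equal(c.T.reshape(20, 1, 2), c.reshape(20, 1, 2))
print(same_as_transpose, same_element, reshaped_equal)
```

This is why c and d in the question differ: both reshapes produce the shape (20, 1, 2), but they start from differently ordered axes.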

Forming a Co-variance matrix for a 2D numpy array

I am trying to figure out a fully vectorised way to compute the co-variance matrix for a 2D numpy array for a given base kernel function. For example if the input is X = [[a,b],[c,d]] for a kernel function k(x_1,x_2) the covariance matrix will be
K=[[k(a,a),k(a,b),k(a,c),k(a,d)],
[k(b,a),k(b,b),k(b,c),k(b,d)],
[k(c,a),k(c,b),k(c,c),k(c,d)],
[k(d,a),k(d,b),k(d,c),k(d,d)]].
How do I go about doing this? I am confused about how to repeat the values and then apply the function, and about what the most efficient way of doing this might be.

You can use np.meshgrid to get two matrices with values for the first and second parameter to the k function.
In [8]: X = np.arange(4).reshape(2,2)
In [9]: np.meshgrid(X, X)
Out[9]:
[array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]]),
array([[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3]])]
You can then just pass these matrices to the k function:
In [10]: k = lambda x1, x2: (x1-x2)**2
In [11]: X1, X2 = np.meshgrid(X, X)
In [12]: k(X1, X2)
Out[12]:
array([[0, 1, 4, 9],
[1, 0, 1, 4],
[4, 1, 0, 1],
[9, 4, 1, 0]])

Here's another way
k(X.reshape(-1, 1), X.reshape(1, -1))
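The two approaches can be compared directly. In this sketch, k_asym is a made-up asymmetric kernel added only to show where the two results differ; the squared-difference kernel is the one from the answer above.

```python
import numpy as np

X = np.arange(4).reshape(2, 2)
flat = X.ravel()

def k(x1, x2):
    # Symmetric kernel from the answer above.
    return (x1 - x2) ** 2

def k_asym(x1, x2):
    # Hypothetical asymmetric kernel, for comparison only.
    return x1 - x2

# meshgrid: the first output varies along columns, the second along rows.
X1, X2 = np.meshgrid(X, X)
K_mesh = k(X1, X2)

# Broadcasting: a column vector against a row vector.
K_bcast = k(flat[:, None], flat[None, :])

A_mesh = k_asym(X1, X2)
A_bcast = k_asym(flat[:, None], flat[None, :])
print(K_bcast)
```

For a symmetric kernel both give the same matrix. For an asymmetric one, the meshgrid version with default indexing is the transpose of the broadcasting version, so it matters which one matches the K layout in the question.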

numpy indexing operations for 3D matrix

Is there an elegant/quick way to reproduce this without the for loops? I have a 3D matrix of values and a 2D matrix that gives the indices from which to copy the 3rd dimension's values while creating a new 3D matrix of the same shape. Here is an implementation with a lot of loops.
np.random.seed(0)
x = np.random.randint(5, size=(2, 3, 4))
y = np.random.randint(x.shape[1], size=(3, 4))
z = np.zeros((2, 3, 4))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]

This puzzled me for a bit, until I realized you aren't using all of y. y is (3,4), but you are indexing over (2,3):
In [28]: x[np.arange(2)[:,None], y[:2,:3],:]
Out[28]:
array([[[4, 0, 0, 4],
[4, 0, 3, 3],
[3, 1, 3, 2]],
[[3, 0, 3, 0],
[2, 1, 0, 1],
[1, 0, 1, 4]]])
We could use all of y with:
In [32]: x[np.arange(2)[:,None,None],y,np.arange(4)]
Out[32]:
array([[[4, 0, 3, 2],
[4, 0, 3, 2],
[3, 0, 0, 3]],
[[3, 1, 1, 4],
[3, 1, 1, 4],
[1, 1, 3, 1]]])
The 3 indexes broadcast to (2,3,4), but the selection is different from your z.
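To confirm that the first indexing expression reproduces the loops from the question, the two can be compared side by side. This sketch uses np.random.RandomState(0) instead of the global seed, so the concrete numbers are not claimed to match the outputs shown above.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randint(5, size=(2, 3, 4))
y = rng.randint(x.shape[1], size=(3, 4))

# Loop version from the question.
z = np.zeros(x.shape, dtype=x.dtype)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z[i, j, :] = x[i, y[i, j], :]

# Vectorized version: rows has shape (2, 1) and broadcasts against
# the (2, 3) slice of y; the last axis is taken whole.
rows = np.arange(x.shape[0])[:, None]
z_vec = x[rows, y[:x.shape[0], :x.shape[1]], :]
print(np.array_equal(z, z_vec))
```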

np.choose not giving desired result after broadcasting

I would like to pick the nth elements as specified in maxsuit from suitCounts. I did broadcast the maxsuit array so I do get a result, but not the desired one. Any suggestions what I'm doing conceptually wrong is appreciated. I don't understand the result of np.choose(self.maxsuit[:,:,None]-1, self.suitCounts), which is not what I'm looking for.
>>> self.maxsuit
Out[38]:
array([[3, 3],
[1, 1],
[1, 1]], dtype=int64)
>>> self.maxsuit[:,:,None]-1
Out[33]:
array([[[2],
[2]],
[[0],
[0]],
[[0],
[0]]], dtype=int64)
>>> self.suitCounts
Out[34]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
>>> np.choose(self.maxsuit[:,:,None]-1, self.suitCounts)
Out[35]:
array([[[2, 2, 0, 0],
[1, 1, 1, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[2, 1, 3, 0],
[1, 0, 3, 0]]])
The desired result would be:
[[3,3],[4,3],[2,1]]

You could use advanced-indexing for a broadcasted way to index into the array, like so -
In [415]: val # Data array
Out[415]:
array([[[2, 1, 3, 0],
[1, 0, 3, 0]],
[[4, 1, 2, 0],
[3, 0, 3, 0]],
[[2, 2, 0, 0],
[1, 1, 1, 0]]])
In [416]: idx # Indexing array
Out[416]:
array([[3, 3],
[1, 1],
[1, 1]])
In [417]: m,n = val.shape[:2]
In [418]: val[np.arange(m)[:,None],np.arange(n),idx-1]
Out[418]:
array([[3, 3],
[4, 3],
[2, 1]])
A bit cleaner way with np.ogrid to use open range arrays -
In [424]: d0,d1 = np.ogrid[:m,:n]
In [425]: val[d0,d1,idx-1]
Out[425]:
array([[3, 3],
[4, 3],
[2, 1]])
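Another option, not used in the answer above, is np.take_along_axis (available in NumPy 1.15 and later). A minimal sketch with the same val and idx arrays:

```python
import numpy as np

val = np.array([[[2, 1, 3, 0], [1, 0, 3, 0]],
                [[4, 1, 2, 0], [3, 0, 3, 0]],
                [[2, 2, 0, 0], [1, 1, 1, 0]]])
idx = np.array([[3, 3], [1, 1], [1, 1]])  # 1-based positions along the last axis

# The index array needs the same number of dimensions as val,
# so add a trailing axis, then drop it again from the result.
picked = np.take_along_axis(val, (idx - 1)[:, :, None], axis=2)[:, :, 0]
print(picked)
```

This avoids building the row and column ranges by hand, at the cost of the extra axis bookkeeping.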

This is the best I can do with choose
In [23]: np.choose([[1,2,0],[1,2,0]], suitcounts[:,:,:3])
Out[23]:
array([[4, 2, 3],
[3, 1, 3]])
choose prefers that we use a list of arrays, rather than a single one. It's supposed to prevent misuse. So the problem could be written as:
In [24]: np.choose([[1,2,0],[1,2,0]], [suitcounts[0,:,:3], suitcounts[1,:,:3], suitcounts[2,:,:3]])
Out[24]:
array([[4, 2, 3],
[3, 1, 3]])
The idea is to select items from the 3 subarrays, based on an index array like:
In [25]: np.array([[1,2,0],[1,2,0]])
Out[25]:
array([[1, 2, 0],
[1, 2, 0]])
The output will match the indexing array in shape. The choice arrays have to match in shape as well, hence my use of [...,:3].
Values for the first column are selected from suitcounts[1,:,:3], for the 2nd column from suitcounts[2...] etc.
choose is limited to 32 choices; this is a limitation imposed by the broadcasting mechanism.
Speaking of broadcasting I could simplify the expression
In [26]: np.choose([1,2,0], suitcounts[:,:,:3])
Out[26]:
array([[4, 2, 3],
[3, 1, 3]])
This broadcasts [1,2,0] to match the 2x3 shape of the subarrays.
I could get the target order by reordering the columns:
In [27]: np.choose([0,1,2], suitcounts[:,:,[2,0,1]])
Out[27]:
array([[3, 4, 2],
[3, 3, 1]])
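The result of the original call in the question can be explained the same way: choose treats the first axis of suitCounts as the list of choices, so each index value selects a whole (2, 4) sub-array rather than a position along the last axis. The sketch below rebuilds that output with explicit loops; the variable names are chosen for illustration.

```python
import numpy as np

suit_counts = np.array([[[2, 1, 3, 0], [1, 0, 3, 0]],
                        [[4, 1, 2, 0], [3, 0, 3, 0]],
                        [[2, 2, 0, 0], [1, 1, 1, 0]]])
maxsuit = np.array([[3, 3], [1, 1], [1, 1]])

index = maxsuit[:, :, None] - 1      # shape (3, 2, 1)
out = np.choose(index, suit_counts)  # broadcasts to shape (3, 2, 4)

# Same thing spelled out: the index picks which sub-array to read from,
# while j and k stay at the same position inside that sub-array.
expected = np.empty_like(out)
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        expected[i, j, :] = suit_counts[index[i, j, 0], j, :]
print(np.array_equal(out, expected))
```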

making an array of n columns where each successive row increases by one

In numpy, I would like to be able to input n for rows and m for columns and end with the array that looks like:
[(0,0,0,0),
(1,1,1,1),
(2,2,2,2)]
So that would be a 3x4. Each column is just a copy of the previous one and the row increases by one each time. As an example:
The input would be 4, then 6, and the output would be an array
[(0,0,0,0,0,0),
(1,1,1,1,1,1),
(2,2,2,2,2,2),
(3,3,3,3,3,3)]
4 rows and 6 columns where the row increases by one each time. Thanks for your time.

So many possibilities...
In [51]: n = 4
In [52]: m = 6
In [53]: np.tile(np.arange(n), (m, 1)).T
Out[53]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [54]: np.repeat(np.arange(n).reshape(-1,1), m, axis=1)
Out[54]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [55]: np.outer(np.arange(n), np.ones(m, dtype=int))
Out[55]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Here's one more. The neat trick here is that the values are not duplicated--only memory for the single sequence [0, 1, 2, ..., n-1] is allocated.
In [67]: from numpy.lib.stride_tricks import as_strided
In [68]: seq = np.arange(n)
In [69]: rep = as_strided(seq, shape=(n,m), strides=(seq.strides[0],0))
In [70]: rep
Out[70]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Be careful with the as_strided function. If you don't get the arguments right, you can crash Python.
To see that seq has not been copied, change seq in place, and then check rep:
In [71]: seq[1] = 99
In [72]: rep
Out[72]:
array([[ 0, 0, 0, 0, 0, 0],
[99, 99, 99, 99, 99, 99],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3]])
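A less risky way to get the same kind of zero-copy view, not mentioned in the answer above, is np.broadcast_to (NumPy 1.10 and later). It checks the shapes for you and returns a read-only view. A minimal sketch:

```python
import numpy as np

n, m = 4, 6
seq = np.arange(n)

# Broadcast the (n, 1) column to (n, m) without copying any data.
rep = np.broadcast_to(seq[:, None], (n, m))
print(rep)

# The view is read-only, but it still reflects in-place changes to seq.
seq[1] = 99
print(rep[1])
```

If a writable, independent array is needed, calling rep.copy() gives one.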

import numpy as np
def foo(n, m):
    return np.array([np.arange(n)] * m).T

Natively (no Python lists):
rows, columns = 4, 6
numpy.arange(rows).reshape(-1, 1).repeat(columns, axis=1)
#>>> array([[0, 0, 0, 0, 0, 0],
#>>> [1, 1, 1, 1, 1, 1],
#>>> [2, 2, 2, 2, 2, 2],
#>>> [3, 3, 3, 3, 3, 3]])

You can easily do this using built-in Python functions. The program counts from 0 to 3, converting each number to a string, and repeats the string 6 times.
print([6*str(n) for n in range(0,4)])
Here is the output.
ks-MacBook-Pro:~ kyle$ pbpaste | python
['000000', '111111', '222222', '333333']

One more for fun
np.zeros((n, m), dtype=int) + np.arange(n, dtype=int)[:,None]

As has been mentioned, there are many ways to do this.
Here's what I'd do:
import numpy as np
def makearray(m, n):
    A = np.empty((m,n))
    A.T[:] = np.arange(m)
    return A
Here's an amusing alternative that will work if you aren't going to be changing the contents of the array.
It should save some memory.
Be careful, though: because this doesn't allocate a full array, it will have multiple entries pointing to the same memory address.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def makearray(m, n):
    A = np.arange(m)
    return as_strided(A, strides=(A.strides[0],0), shape=(m,n))
In either case, as I have written them, a 3x4 array can be created by makearray(3, 4)

Using count from the built-in module itertools:
>>> from itertools import count
>>> rows = 4
>>> columns = 6
>>> cnt = count()
>>> [[next(cnt)]*columns for i in range(rows)]
[[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]

You can simply do:
>>> nc=5
>>> nr=4
>>> [[k]*nc for k in range(nr)]
[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]

Several other possibilities using a (n,1) array
a = np.arange(n)[:,None]  # or np.arange(n).reshape(-1,1)
a*np.ones((m),dtype=int)
a[:,np.zeros((m),dtype=int)]
If used with a (m,) array, just leave it (n,1), and let broadcasting expand it for you.
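As an illustration of that last point, here is a small sketch where the (n,1) column is combined with an (m,) array. The row values are invented for the example.

```python
import numpy as np

n, m = 4, 6
col = np.arange(n)[:, None]   # shape (n, 1)
row = np.arange(m) * 10       # shape (m,), hypothetical data

# Broadcasting expands col across the columns; no tiled copy is built first.
total = col + row
print(total.shape)
```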
