List of tuples of vectors --> two matrices - python

In Python, I have a list of tuples, each of them containing two nx1 vectors.
data = [(np.array([0,0,3]), np.array([0,1])),
        (np.array([1,0,4]), np.array([1,1])),
        (np.array([2,0,5]), np.array([2,1]))]
Now, I want to split this list into two matrices, with the vectors as columns.
So I'd want:
x = np.array([[0,1,2],
              [0,0,0],
              [3,4,5]])
y = np.array([[0,1,2],
              [1,1,1]])
Right now, I have the following:
def split(data):
    x, y = zip(*data)
    x = np.asarray(x).T  # np.asarray returns a new array, so the result must be assigned
    y = np.asarray(y).T
    return (x, y)
This works fine, but I was wondering whether a cleaner method exists, one that doesn't use the zip(*) function and/or doesn't require converting and transposing the x and y matrices.

This is for pure entertainment, since I'd go with the zip solution if I were to do what you're trying to do.
But a way without zipping would be vstack along your axis 1.
a = np.array(data, dtype=object)  # dtype=object is required on NumPy >= 1.24 because the vectors differ in length
f = lambda axis: np.vstack(a[:, axis]).T
x, y = f(0), f(1)
>>> x
array([[0, 1, 2],
[0, 0, 0],
[3, 4, 5]])
>>> y
array([[0, 1, 2],
[1, 1, 1]])

Comparing the best elements of all previously proposed methods, I think it's best as follows*:
def split(data):
    x, y = zip(*data)   # split the list into two tuples of 1-D arrays, x and y
    x = np.vstack(x).T  # stack the arrays in x as rows, then transpose so they become columns
    y = np.vstack(y).T  # same for y
    return (x, y)
* this is a snippet of my code
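For comparison, here is a minimal sketch of the same idea using np.column_stack, which treats each 1-D array as a column and so makes the explicit transpose unnecessary. The name split_columns is hypothetical and does not come from the answers above; zip(*data) is still used to separate the two halves of each tuple.

```python
import numpy as np

data = [(np.array([0, 0, 3]), np.array([0, 1])),
        (np.array([1, 0, 4]), np.array([1, 1])),
        (np.array([2, 0, 5]), np.array([2, 1]))]


def split_columns(data):
    """Hypothetical helper: return two matrices whose columns are the input vectors."""
    xs, ys = zip(*data)  # two tuples of 1-D arrays
    # column_stack places each 1-D array in its own column
    return np.column_stack(xs), np.column_stack(ys)


x, y = split_columns(data)
print(x)
print(y)
```

Whether this reads as cleaner than vstack followed by .T is a matter of taste; both build the same arrays.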

Related

Replacing array at i-th dimension

Let's say I have a two-dimensional array
import numpy as np
a = np.array([[1, 1, 1], [2,2,2], [3,3,3]])
and I would like to replace the third vector (in the second dimension) with zeros. I would do
a[:, 2] = np.array([0, 0, 0])
But what if I would like to be able to do that programmatically? I mean, let's say that the variable x = 1 contains the dimension along which I want to do the replacing. How would the function replace(arr, dimension, value, arr_to_be_replaced) have to look if I wanted to call it as replace(a, x, 2, np.array([0, 0, 0]))?
numpy has a similar function, insert. However, it doesn't replace at dimension i; it returns a copy with an additional vector.
All solutions are welcome, but I prefer a solution that doesn't recreate the array, so as to save memory.
arr[:, 1]
is basically shorthand for
arr[(slice(None), 1)]
that is, a tuple with slice elements and integers.
Knowing that, you can construct a tuple of slice objects manually, adjust the values depending on an axis parameter and use that as your index. So for
import numpy as np
arr = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
axis = 1
idx = 2
arr[:, idx] = np.array([0, 0, 0])
#      ^- axis position
you can use
slices = [slice(None)] * arr.ndim
slices[axis] = idx
arr[tuple(slices)] = np.array([0, 0, 0])
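Wrapped into a function, the approach above might look like the following sketch. The name replace_along_axis and its argument order are assumptions chosen for illustration, not an existing NumPy function. The assignment writes into the existing array, so no copy is made.

```python
import numpy as np


def replace_along_axis(arr, axis, index, values):
    """Hypothetical helper: overwrite arr at position `index` along `axis`, in place."""
    slices = [slice(None)] * arr.ndim  # equivalent to ':' on every axis
    slices[axis] = index               # pick a single position on the chosen axis
    arr[tuple(slices)] = values        # in-place assignment, no new array
    return arr


a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
replace_along_axis(a, 1, 2, np.array([0, 0, 0]))  # same effect as a[:, 2] = 0

b = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
replace_along_axis(b, 0, 2, np.array([0, 0, 0]))  # same effect as b[2, :] = 0
```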

Einsum for high dimensions

Considering the 3 arrays below:
np.random.seed(0)
X = np.random.randint(10, size=(4,5))
W = np.random.randint(10, size=(3,4))
y = np.random.randint(3, size=(5,1))
I want to add each column of the matrix X to the row of W given by the corresponding index in y. So, for example, if the first element in y is 2, I'll add the first column of X to the third row of W (index 2 in Python). I repeat this until all columns of X have been added to their respective rows of W.
I could do it in different ways:
1- using a for loop:
for i,j in enumerate(y):
    W[j]+=X[:,i]
2- using the add.at function:
np.add.at(W,(y.ravel()),X.T)
3- but I can't understand how to do it using einsum.
I was given a solution, but I really can't understand it.
N = y.max()+1
W[:N] += np.einsum('ijk,lk->il',(np.arange(N)[:,None,None] == y.ravel()),X)
Could anyone explain this structure to me?
1 - (np.arange(N)[:,None,None] == y.ravel(), X). I imagine this part refers to adding the columns of X to the specific rows of W, according to y. But where is W? And why do we have to transform W into 4 dimensions in this case?
2 - 'ijk,lk->il' - I didn't understand this either.
i - refers to the rows,
j - columns,
k - each element,
l - what does 'l' refer to?
If anyone can understand this and explain it to me, I would really appreciate it.
Thanks in advance.
Let's simplify the problem by dropping one dimension and using values that are easy to verify manually:
W = np.zeros(3, int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
y = np.array([0, 1, 1, 2, 2])
X = np.array([1, 2, 3, 4, 5])
Values from X are added to the entries of W selected by y:
for i, j in enumerate(y):
    W[j] += X[i]
W is calculated as [1, 5, 9] (check quickly by hand).
Now, how could this code be vectorized? We can't do a simple W[y] += X, as y has duplicate values in it and the different sums would overwrite each other at indices 1 and 2.
What could be done is to broadcast the values into a new dimension of len(y) and then sum up over this newly created dimension.
N = W.shape[0]
select = (np.arange(N) == y[:, None]).astype(int)
This takes the index range of W ([0, 1, 2]) and, in a new dimension, sets the values to 1 where they match y and to 0 otherwise. select contains this array:
array([[1, 0, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 1]])
It has len(y) == len(X) rows and len(W) columns and shows for every y/row, what index of W it contributes to.
Let's multiply X with this array, mult = select * X[:, None]:
array([[1, 0, 0],
[0, 2, 0],
[0, 3, 0],
[0, 0, 4],
[0, 0, 5]])
We have effectively spread out X into a new dimension, and sorted it in a way we can get it into shape W by summing over the newly created dimension. The sum over the rows is the vector we want to add to W:
sum_Xy = np.sum(mult, axis=0) # [1, 5, 9]
W += sum_Xy
The computation of select and mult can be combined with np.einsum:
# `select` has shape (len(y)==len(X), len(W)), or `yw`
# `X` has shape len(X)==len(y), or `y`
# we want something `len(W)`, or `w`, and to reduce the other dimension
sum_Xy = np.einsum("yw,y->w", select, X)
And that's it for the one-dimensional example. For the two-dimensional problem posed in the question it is exactly the same approach: introduce an additional dimension, broadcast the y indices, and then reduce the additional dimension with einsum.
If you internalize how every step works for the one-dimensional example, I'm sure you can work out how the code is doing it in two dimensions, as it is just a matter of getting the indices right (W rows, X columns).
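As a hedged sketch of how the one-dimensional recipe carries over, the following reuses the arrays from the question (with y flattened) and checks two einsum variants against the explicit loop. The subscript letters w, c and x are my own labels for "rows of W", "columns of X" and "rows of X"; they do not appear in the original answers.

```python
import numpy as np

np.random.seed(0)
X = np.random.randint(10, size=(4, 5))
W = np.random.randint(10, size=(3, 4))
y = np.random.randint(3, size=(5, 1)).ravel()  # flattened to shape (5,)

# Reference result from the explicit loop in the question.
W_loop = W.copy()
for i, j in enumerate(y):
    W_loop[j] += X[:, i]

# One-hot selection: select[w, c] is 1 when column c of X belongs to row w of W.
N = W.shape[0]
select = (np.arange(N)[:, None] == y).astype(int)  # shape (3, 5)

# "wc,xc->wx": multiply, then sum over c (the columns of X).
W_einsum = W + np.einsum("wc,xc->wx", select, X)

# The expression from the question carries an extra axis j of length 1,
# which einsum sums away together with k.
mask = (np.arange(N)[:, None, None] == y).astype(int)  # shape (3, 1, 5)
W_question = W + np.einsum("ijk,lk->il", mask, X)

print(np.array_equal(W_loop, W_einsum), np.array_equal(W_loop, W_question))
```

Read this way, W never enters the einsum call: einsum only builds the array of increments, which is then added to W in a separate step.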

How to plot pairwise distances of two-dimensional vectors?

I have a set of data in Python like:
x y angle
I want to calculate the distance between every possible pair of points and plot the distances against the difference between the two angles.
x, y, a = np.loadtxt('w51e2-pa-2pk.log', unpack=True)
n = 0
f=(((x[n])-x[n+1:])**2+((y[n])-y[n+1:])**2)**0.5
d = a[n]-a[n+1:]
plt.scatter(f,d)
There are 255 points in my data.
f is the distance and d is the difference between two angles.
My question is: can I set n = [1, 2, 3, ..., 255] and repeat the calculation to get f and d for all possible pairs?
You can obtain the pairwise distances through broadcasting by considering it as an outer operation on the array of 2-dimensional vectors as follows:
vecs = np.stack((x, y)).T
np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
For example,
In [1]: import numpy as np
...: x = np.array([1, 2, 3])
...: y = np.array([3, 4, 6])
...: vecs = np.stack((x, y)).T
...: np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
...:
Out[1]:
array([[ 0. , 1.41421356, 3.60555128],
[ 1.41421356, 0. , 2.23606798],
[ 3.60555128, 2.23606798, 0. ]])
Here, the (i, j)'th entry is the distance between the i'th and j'th vectors.
The case of the pairwise differences between angles is similar, but simpler, as you only have one dimension to deal with:
In [2]: a = np.array([10, 12, 15])
...: a[np.newaxis, :] - a[: , np.newaxis]
...:
Out[2]:
array([[ 0, 2, 5],
[-2, 0, 3],
[-5, -3, 0]])
Moreover, plt.scatter does not care that the results are given as matrices, and putting everything together using the notation of the question, you can obtain the plot of angles by distances by doing something like
vecs = np.stack((x, y)).T
f = np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
d = a[np.newaxis, :] - a[:, np.newaxis]
plt.scatter(f, d)
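The full matrices contain every pair twice (once with each sign of the angle difference) plus a zero diagonal. If only the unique pairs are wanted, one possible sketch uses np.triu_indices to enumerate the index pairs with i < j. The sample values below are taken from the examples above and stand in for the data file; the sign convention a[i] - a[j] follows the question's a[n] - a[n+1:].

```python
import numpy as np

# Small stand-in data; in the question these come from np.loadtxt.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 4.0, 6.0])
a = np.array([10.0, 12.0, 15.0])

# All index pairs (i, j) with i < j, i.e. each unordered pair exactly once.
i, j = np.triu_indices(len(x), k=1)

f = np.hypot(x[i] - x[j], y[i] - y[j])  # Euclidean distance for each pair
d = a[i] - a[j]                         # angle difference for each pair

# f and d are 1-D arrays of length n*(n-1)/2, ready for plt.scatter(f, d).
print(f)
print(d)
```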
You can use a for loop and range() to iterate over n, e.g. like this:
n = len(x)
for i in range(n):
    # do something with the current index
    # e.g. print the points
    print(x[i])
    print(y[i])
But note that if you use i+1 inside the last iteration, this will already be outside of your list.
Also be careful with your calculation. (x[n])-x[n+1:] only works because np.loadtxt returns NumPy arrays, so the single value x[n] is broadcast against the slice x[n+1:]. With plain Python lists the same expression raises a TypeError, since you cannot subtract a list from a number.
Maybe you will even have to use two nested loops to do what you want. I guess that you want to calculate the distance between each pair of points, so a two-dimensional array may be the data structure you want.
If you are interested in all combinations of the points in x and y, I suggest using itertools, which will give you all possible combinations. Then you can do it as follows:
import itertools
f = [((x[i]-x[j])**2 + (y[i]-y[j])**2)**0.5 for i, j in itertools.product(range(255), range(255)) if i != j]
# and similar for the angles
But maybe there is even an easier way...

Creating an X by Y dimension array of (a,b) points in Python 2.7

I've been oddly bashing my head against this problem for several hours, and would appreciate any help!
I would like to create a (for example) 100x100 array in which each index is a (x,y) coordinate. The overall goal here is the following:
I have x,y coordinates and would like to arrange them in a 2D space so that I can use the np.diagonal function to return the (x,y) coordinates along a line. I'll then use those (x,y) points to compare particular values.
The first step here is actually creating the array and I just can't seem to do it.
I'm not sure about the numpy part of your request, but you can create the array like so:
coords = [[(y,x) for x in range(100)] for y in range(100)]
>>> coords[50][2]
(50, 2)
If you just want the values along the diagonal, why don't you just create a 1D list?
import numpy as np
xs = np.linspace(1,10,100) # assuming x goes from 1 to 10
ys = np.linspace(2,3, 100) # assuming y goes from 2 to 3
xy = zip(xs, ys)
You no longer need to build the 2D array and then call diagonal on it.
Working on Jaime's suggestion:
>>> x, y = numpy.mgrid[0:100, 0:100]
>>> z = numpy.array([x, y]).transpose([1,2,0])
>>> z[50, 2]
array([50, 2])
EDIT: Given an array of points p, of the shape (2, P), this is how you would find out which of these points are underneath diagonal n:
>>> d = numpy.diagonal(z, n)
>>> cond0 = p[0, None] < d[0, :, None]
>>> cond1 = p[1, None] < d[1, :, None]
>>> good_indices_full = numpy.where(numpy.logical_and(cond0, cond1))
>>> good_indices = good_indices_full[1]
(I prefer to work with "good_indices", i.e. write stuff like p[:, good_indices], rather than the full tuple of arrays that numpy.where gives back).
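A compact sketch of the grid-plus-diagonal idea, assuming integer coordinates 0..99 as in the mgrid example above. The offset value 3 is an arbitrary choice for illustration. np.diagonal puts the diagonal axis last, so the result is transposed to get one (x, y) pair per row.

```python
import numpy as np

xg, yg = np.mgrid[0:100, 0:100]
z = np.stack((xg, yg), axis=-1)  # shape (100, 100, 2); z[i, j] == [i, j]

# Diagonal with offset 3: the points z[0, 3], z[1, 4], ..., z[96, 99].
# np.diagonal returns shape (2, 97) here, so transpose to (97, 2).
diag = np.diagonal(z, offset=3).T

print(z[50, 2])
print(diag[:3])
```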

Roll rows of a matrix independently

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
[1, 2, 3],
[0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row,x in zip(A, r)]))
[[0 0 4]
[1 2 3]
[0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure you can do it using advanced indexing; whether it is the fastest way probably depends on your array size (if your rows are large it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
# Always use a non-negative shift, so that column_indices are valid.
# (could also use the modulo operation; note that the next line modifies r in place)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
result = A[rows, column_indices]
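The modulo variant mentioned in the comment can be sketched as follows, here combined with np.take_along_axis (available since NumPy 1.15) so that the caller's r is left untouched. The function name roll_rows is hypothetical.

```python
import numpy as np


def roll_rows(A, r):
    """Hypothetical helper: roll row i of the 2-D array A by r[i], like np.roll per row."""
    n = A.shape[1]
    # Source column for every output position, wrapped into [0, n) by the modulo.
    cols = (np.arange(n)[np.newaxis, :] - np.asarray(r)[:, np.newaxis]) % n
    return np.take_along_axis(A, cols, axis=1)


A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

result = roll_rows(A, r)
expected = np.array([np.roll(row, s) for row, s in zip(A, r)])
print(result)
```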
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous - np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hood. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r):
    # Concatenate with sliced to cover all rolls
    a_ext = np.concatenate((a,a[:,:-1]),axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext,(1,n))[np.arange(len(r)), (n-r)%n,0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
...: [1, 2, 3],
...: [0, 0, 5]])
In [328]: r = np.array([2, 0, -1])
In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
[1, 2, 3],
[0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:,np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with a large number of rows and columns -
In [324]: np.random.seed(0)
...: a = np.random.rand(10000,1000)
...: r = np.random.randint(-1000,1000,(10000))
# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop
# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
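If scikit-image is not available, the same windowing idea can be sketched with NumPy's own sliding_window_view (NumPy 1.20 or newer). This is an assumed substitute for view_as_windows, not the code that was benchmarked above, and the name sliding_window_roll is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_window_roll(a, r):
    """Hypothetical helper: roll each row of the 2-D array a by the matching entry of r."""
    n = a.shape[1]
    # Append the first n-1 columns so every rolled row is a contiguous window.
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)
    # windows[i, s] is the length-n slice of row i starting at column s.
    windows = sliding_window_view(a_ext, n, axis=1)  # shape (rows, n, n)
    return windows[np.arange(a.shape[0]), (n - np.asarray(r)) % n]


A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

rolled = sliding_window_roll(A, r)
print(rolled)
```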
In case you want a more general solution (dealing with any shape and with any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll for each dimensions of a single axis.
    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        How many positions to shift for each dimension. Shape: `(arr.shape[axis],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    # list(...) so the last index array can be replaced below,
    # whether ogrid returns a list or a tuple
    all_idcs = list(np.ogrid[tuple(slice(0, n) for n in arr.shape)])
    # Convert to a positive shift (note: this modifies `shifts` in place)
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]
    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
I implement a pure numpy.lib.stride_tricks.as_strided solution as follows
from numpy.lib.stride_tricks import as_strided
def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]),*range(arr.shape[1]-1)]].copy() #need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0 ,strd_1, strd_1))
    return result[np.arange(arr.shape[0]), (n-m)%n]
A = np.array([[4, 0, 0],
[1, 2, 3],
[0, 0, 5]])
r = np.array([2, 0, -1])
out = custom_roll(A, r)
Out[789]:
array([[0, 0, 4],
[1, 2, 3],
[0, 5, 0]])
By using a fast Fourier transform we can apply a transformation in the frequency domain and then use the inverse fast Fourier transform to obtain the row shift.
So this is a pure numpy solution that takes only one line:
import numpy as np
from numpy.fft import fft, ifft
# The row shift function using the fast Fourier transform
# rshift(A,r) where A is a 2D array, r the row shift vector
# (the final .round() means this is only suitable for integer-valued data)
def rshift(A,r):
    return np.real(ifft(fft(A,axis=1)*np.exp(2*1j*np.pi/A.shape[1]*r[:,None]*np.r_[0:A.shape[1]][None,:]),axis=1).round())
This will apply a left shift, but we can simply negate the exponent of the exponential to turn the function into a right shift function:
ifft(fft(...)*np.exp(-2*1j...)
It can be used like that:
# Example:
A = np.array([[1,2,3,4],
[1,2,3,4],
[1,2,3,4]])
r = np.array([1,-1,3])
print(rshift(A,r))
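To make the shift direction concrete, here is a small self-contained check, written as a sketch that restates rshift in a slightly more spread-out form and compares it with np.roll. By the DFT shift theorem, multiplying the spectrum by exp(+2πi·k·r/n) yields x[(j + r) mod n], which corresponds to np.roll(row, -r), i.e. a left shift. The variable names phase, shifted and expected are my own.

```python
import numpy as np
from numpy.fft import fft, ifft


def rshift(A, r):
    """Shift row i of A left by r[i] positions using the DFT shift theorem."""
    n = A.shape[1]
    # Phase ramp exp(2*pi*i*k*r/n) for every row shift r and frequency index k.
    phase = np.exp(2j * np.pi / n * r[:, None] * np.arange(n)[None, :])
    # Rounding removes floating-point noise; this assumes integer-valued data.
    return np.real(ifft(fft(A, axis=1) * phase, axis=1)).round()


A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1, -1, 3])

shifted = rshift(A, r)
expected = np.array([np.roll(row, -s) for row, s in zip(A, r)])
print(shifted)
```

Note that the result is a float array even for integer input, so a cast such as .astype(int) may be wanted afterwards.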
Building on divakar's excellent answer, you can easily apply this logic to a 3D array (which was the problem that brought me here in the first place). Here's an example: basically flatten your data, roll it and reshape it afterwards:
def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0]*cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    # the original referenced an undefined name `triggered_cube` here
    rolled_cube = rolled_cube.reshape(cube.shape[0], cube.shape[1], cube.shape[2])
    return rolled_cube
def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """ Calculates the number of positions along the time axis we need to shift
    elements in order to trig the data.
    We return a 1D numpy array of shape (X*Y,)
    """
    # argmax(...) finds the first position along the time axis where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't have index out of bound
    roll_matrix[roll_matrix>cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix
def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # viewW is skimage.util.shape.view_as_windows, imported as in divakar's answer
    # Concatenate with sliced to cover all rolls
    # otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = cube_flattened.shape[1]
    result = viewW(a_ext,(1,n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result
Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data. I timed it on 400x400x2000 data stored as int8. An equivalent for loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds and strided_indexing.... ~0.5 seconds.
