Reformat table in Python - python

I have a table in a Python script with numpy in the following shape:
[array([[a1, b1, c1], ..., [x1, y1, z1]]),
array([a2, b2, c2, ..., x2, y2, z2])
]
I would like to reshape it to a format like this:
(array([[a2],
[b2],
.
.
.
[z2]],
dtype = ...),
array([[a1],
[b1],
.
.
.
[z1]])
)
To be honest, I'm also quite confused about the different parentheses. [array1, array2] is a list of arrays, right? What is (array1, array2), then?

Round brackets (1, 2) are tuples, square brackets [1, 2] are lists. To convert your data structure, use expand_dims and flatten.
import numpy as np
a = [
np.array([[1, 2, 3], [4, 5, 6]]),
np.array([10, 11, 12, 13, 14])
]
print(a)
b = (
np.expand_dims(a[1], axis=1),
np.expand_dims(a[0].flatten(), axis=1)
)
print(b)

#[array1, array2] is a Python list of two NumPy arrays (ndarray)
#(array1, array2) is a Python tuple of two NumPy arrays (ndarray)
# list.reverse() works in place and returns None, so iterate over reversed(your_list) instead
tuple(array.reshape((-1, 1)) for array in reversed(your_list))
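To check the resulting shapes, here is a minimal sketch on sample data shaped like the list in the question. The names `data` and `cols` are made up for illustration and do not come from the original post.

```python
import numpy as np

# Hypothetical sample data mirroring the question: a 2-D array and a 1-D array.
data = [
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([10, 11, 12, 13, 14]),
]

# Reverse the order, then turn each array into a single column.
# reshape(-1, 1) flattens any shape into (n, 1).
cols = tuple(arr.reshape(-1, 1) for arr in reversed(data))

print(type(cols).__name__)      # tuple
print([c.shape for c in cols])  # [(5, 1), (6, 1)]
```

Using reshape(-1, 1) covers both the 1-D and the 2-D input, so no separate flatten step is needed.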


Sort an array of multi D points by distance to a reference point

I have a reference point p_ref stored in a numpy array with a shape of (1024,), something like:
print(p_ref)
>>> array([ p1, p2, p3, ..., p_n])
I also have a numpy array A_points with a shape of (1024,5000) containing 5000 points, each having 1024 dimensions like p_ref. My problem: I would like to sort the points in A_points by their (Euclidean) distance to p_ref!
How can I do this? I read about scipy.spatial.distance.cdist and scipy.spatial.KDTree, but they both weren't doing exactly what I wanted and when I tried to combine them I made a mess. Thanks!
For reference and consistency let's assume:
p_ref = np.array([0,1,2,3])
A_points = np.reshape(np.array([10,3,2,13,4,5,16,3,8,19,4,11]), (4,3))
Expected output:
array([[ 3, 2, 10],
[ 4, 5, 13],
[ 3, 8, 16],
[ 4, 11, 19]])
EDIT: Updated on suggestions by the OP.
I hope I understand you correctly, but you can calculate the distance between two vectors by using numpy.linalg.norm. Using this it should be as simple as:
A_sorted = sorted( A_points.T, key = lambda x: np.linalg.norm(x - p_ref ) )
A_sorted = np.reshape(A_sorted, (3,4)).T
You can do something like this -
A_points[:,np.linalg.norm(A_points-p_ref[:,None],axis=0).argsort()]
Another with np.einsum that should be more efficient than np.linalg.norm -
d = A_points-p_ref[:,None]
out = A_points[:,np.einsum('ij,ij->j',d,d).argsort()]
Further optimize to leverage fast matrix-multiplication to replace last step -
A_points[:,((A_points**2).sum(0)+(p_ref**2).sum()-2*p_ref.dot(A_points)).argsort()]
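As a sanity check, the three variants can be run on the sample data from the question. This is only a sketch; the names `order_norm`, `order_einsum`, `order_matmul` and `result` are introduced here for illustration.

```python
import numpy as np

p_ref = np.array([0, 1, 2, 3])
A_points = np.reshape(np.array([10, 3, 2, 13, 4, 5, 16, 3, 8, 19, 4, 11]), (4, 3))

# Each column of A_points is one point; subtract p_ref from every column.
d = A_points - p_ref[:, None]

# Three ways to get the sort order of the columns by distance to p_ref.
order_norm = np.linalg.norm(d, axis=0).argsort()
order_einsum = np.einsum('ij,ij->j', d, d).argsort()
sq = (A_points**2).sum(0) + (p_ref**2).sum() - 2 * p_ref.dot(A_points)
order_matmul = sq.argsort()

result = A_points[:, order_norm]
print(result)
# [[ 3  2 10]
#  [ 4  5 13]
#  [ 3  8 16]
#  [ 4 11 19]]
```

The einsum and matrix-multiplication variants sort by squared distance, which gives the same order because the square root is monotonic.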

count overlap between two numpy arrays [duplicate]

I want to get the intersecting (common) rows across two 2D numpy arrays. E.g., if the following arrays are passed as inputs:
array([[1, 4],
[2, 5],
[3, 6]])
array([[1, 4],
[3, 6],
[7, 8]])
the output should be:
array([[1, 4],
[3, 6]])
I know how to do this with loops. I'm looking at a Pythonic/Numpy way to do this.
For short arrays, using sets is probably the clearest and most readable way to do it.
Another way is to use numpy.intersect1d. You'll have to trick it into treating the rows as a single value, though... This makes things a bit less readable...
import numpy as np
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)
For large arrays, this should be considerably faster than using sets.
You could use Python's sets:
>>> import numpy as np
>>> A = np.array([[1,4],[2,5],[3,6]])
>>> B = np.array([[1,4],[3,6],[7,8]])
>>> aset = set([tuple(x) for x in A])
>>> bset = set([tuple(x) for x in B])
>>> np.array([x for x in aset & bset])
array([[1, 4],
[3, 6]])
As Rob Cowie points out, this can be done more concisely as
np.array([x for x in set(tuple(x) for x in A) & set(tuple(x) for x in B)])
There's probably a way to do this without all the going back and forth from arrays to tuples, but it's not coming to me right now.
I could not understand why no pure numpy way had been suggested, so I found one that uses numpy broadcasting. The basic idea is to transform one of the arrays to 3D by swapping axes. Let's construct two arrays:
a=np.random.randint(10, size=(5, 3))
b=np.zeros_like(a)
b[:4,:]=a[np.random.randint(a.shape[0], size=4), :]
With my run it gave:
a=array([[5, 6, 3],
[8, 1, 0],
[2, 1, 4],
[8, 0, 6],
[6, 7, 6]])
b=array([[2, 1, 4],
[2, 1, 4],
[6, 7, 6],
[5, 6, 3],
[0, 0, 0]])
The steps are (arrays can be interchanged) :
# a is n x m and b is k x m
c = np.swapaxes(a[:,:,None],1,2)==b # transform a to n x 1 x m
# c has n x k x m dimensions due to comparison broadcast
# c[i, j, :] holds the elementwise comparison between a[i, :] and b[j, :]
# Decrease dimension to n x k with product:
c = np.prod(c,axis=2)
#To get around duplicates://
# Calculate cumulative sum along axis 0 (over the rows of a)
c= c*np.cumsum(c,axis=0)
# compare with 1, so that only the first matching row of a stays True in each column
c=c==1
#//
# sum along axis 1 (over the rows of b), so that a length-n vector is produced
c=np.sum(c,axis=1).astype(bool)
# The intersection between a and b is a[c]
result=a[c]
As a two-line function, to reduce memory use (correct me if I'm wrong):
def array_row_intersection(a,b):
    tmp=np.prod(np.swapaxes(a[:,:,None],1,2)==b,axis=2)
    return a[np.sum(np.cumsum(tmp,axis=0)*tmp==1,axis=1).astype(bool)]
which gave result for my example:
result=array([[5, 6, 3],
[2, 1, 4],
[6, 7, 6]])
This should be faster than the set solutions, as it uses only simple numpy operations while constantly reducing dimensions; how well it scales to big matrices depends on their shape, as the benchmark below shows. I may have made mistakes in my comments, as I got the answer by experimentation and instinct. The equivalent for column intersection can be found either by transposing the arrays or by changing the steps a little. Also, if duplicates are wanted, the steps inside "//" have to be skipped. The function can be edited to return only the boolean array of the indices, which came in handy for me while trying to index different arrays with the same vector. Benchmark for the voted answer and mine (the number of elements in each dimension plays a role in which one to choose):
Code:
import timeit

def voted_answer(A,B):
    nrows, ncols = A.shape
    dtype={'names':['f{}'.format(i) for i in range(ncols)],
           'formats':ncols * [A.dtype]}
    C = np.intersect1d(A.view(dtype), B.view(dtype))
    return C.view(A.dtype).reshape(-1, ncols)

a_small=np.random.randint(10, size=(10, 10))
b_small=np.zeros_like(a_small)
b_small=a_small[np.random.randint(a_small.shape[0],size=[a_small.shape[0]]),:]
a_big_row=np.random.randint(10, size=(10, 1000))
b_big_row=a_big_row[np.random.randint(a_big_row.shape[0],size=[a_big_row.shape[0]]),:]
a_big_col=np.random.randint(10, size=(1000, 10))
b_big_col=a_big_col[np.random.randint(a_big_col.shape[0],size=[a_big_col.shape[0]]),:]
a_big_all=np.random.randint(10, size=(100,100))
b_big_all=a_big_all[np.random.randint(a_big_all.shape[0],size=[a_big_all.shape[0]]),:]
print('Small arrays:')
print('\t Voted answer:',timeit.timeit(lambda:voted_answer(a_small,b_small),number=100)/100)
print('\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_small,b_small),number=100)/100)
print('Big column arrays:')
print('\t Voted answer:',timeit.timeit(lambda:voted_answer(a_big_col,b_big_col),number=100)/100)
print('\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_big_col,b_big_col),number=100)/100)
print('Big row arrays:')
print('\t Voted answer:',timeit.timeit(lambda:voted_answer(a_big_row,b_big_row),number=100)/100)
print('\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_big_row,b_big_row),number=100)/100)
print('Big arrays:')
print('\t Voted answer:',timeit.timeit(lambda:voted_answer(a_big_all,b_big_all),number=100)/100)
print('\t Proposed answer:',timeit.timeit(lambda:array_row_intersection(a_big_all,b_big_all),number=100)/100)
with results:
Small arrays:
Voted answer: 7.47108459473e-05
Proposed answer: 2.47001647949e-05
Big column arrays:
Voted answer: 0.00198730945587
Proposed answer: 0.0560171294212
Big row arrays:
Voted answer: 0.00500325918198
Proposed answer: 0.000308241844177
Big arrays:
Voted answer: 0.000864889621735
Proposed answer: 0.00257176160812
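The verdict: if you have to compare two arrays with many rows and few columns (for example, big arrays of 2D points), use the voted answer. If the matrices are big in all dimensions, the voted answer is also the better choice. For small arrays, or arrays with few rows and many columns, the proposed answer was faster in this run. So the choice depends on the shape of your data.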
Numpy broadcasting
We can create a boolean mask using broadcasting which can be then used to filter the rows in array A which are also present in array B
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
m = (A[:, None] == B).all(-1).any(1)
>>> A[m]
array([[1, 4],
[3, 6]])
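To make the broadcasting step easier to follow, here is a sketch that splits the one-liner into its intermediate arrays and prints their shapes. The names `eq` and `row_match` are introduced here for illustration only.

```python
import numpy as np

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])

# A[:, None] has shape (3, 1, 2); comparing with B (3, 2) broadcasts to (3, 3, 2).
eq = A[:, None] == B
# all(-1): True where every column of a row pair matches -> shape (3, 3).
row_match = eq.all(-1)
# any(1): True for each row of A that matches at least one row of B -> shape (3,).
m = row_match.any(1)

print(eq.shape, row_match.shape, m.shape)  # (3, 3, 2) (3, 3) (3,)
print(m)                                   # [ True False  True]
```

Note that the intermediate array holds one boolean per (row of A, row of B, column) combination, so memory use grows with the product of both row counts.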
Another way to achieve this using structured array:
>>> a = np.array([[3, 1, 2], [5, 8, 9], [7, 4, 3]])
>>> b = np.array([[2, 3, 0], [3, 1, 2], [7, 4, 3]])
>>> av = a.view([('', a.dtype)] * a.shape[1]).ravel()
>>> bv = b.view([('', b.dtype)] * b.shape[1]).ravel()
>>> np.intersect1d(av, bv).view(a.dtype).reshape(-1, a.shape[1])
array([[3, 1, 2],
[7, 4, 3]])
Just for clarity, the structured view looks like this:
>>> a.view([('', a.dtype)] * a.shape[1])
array([[(3, 1, 2)],
[(5, 8, 9)],
[(7, 4, 3)]],
dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
This could also work (note that a set does not preserve row order):
np.array(list(set(map(tuple, a)).intersection(map(tuple, b))))
Without Index
Visit https://gist.github.com/RashidLadj/971c7235ce796836853fcf55b4876f3c
def intersect2D(Array_A, Array_B):
    """
    Find row intersection between 2D numpy arrays, a and b.
    """
    # ''' Using Tuple ''' #
    intersectionList = list(set([tuple(x) for x in Array_A for y in Array_B if(tuple(x) == tuple(y))]))
    print ("intersectionList = \n",intersectionList)
    # ''' Using Numpy function "array_equal" ''' #
    """ This method is valid for an ndarray """
    intersectionList = list(set([tuple(x) for x in Array_A for y in Array_B if(np.array_equal(x, y))]))
    print ("intersectionList = \n",intersectionList)
    # ''' Using set and bitwise and '''
    intersectionList = [list(y) for y in (set([tuple(x) for x in Array_A]) & set([tuple(x) for x in Array_B]))]
    print ("intersectionList = \n",intersectionList)
    return intersectionList
With Index
Visit https://gist.github.com/RashidLadj/bac71f3d3380064de2f9abe0ae43c19e
def intersect2D(Array_A, Array_B):
    """
    Find row intersection between 2D numpy arrays, a and b.
    Returns another numpy array with shared rows and index of items in A & B arrays
    """
    # [[IDX], [IDY], [value]] where Equal
    # ''' Using Tuple ''' #
    IndexEqual = np.asarray([(i, j, x) for i,x in enumerate(Array_A) for j, y in enumerate (Array_B) if(tuple(x) == tuple(y))]).T
    # ''' Using Numpy array_equal ''' #
    IndexEqual = np.asarray([(i, j, x) for i,x in enumerate(Array_A) for j, y in enumerate (Array_B) if(np.array_equal(x, y))]).T
    idx, idy, intersectionList = (IndexEqual[0], IndexEqual[1], IndexEqual[2]) if len(IndexEqual) != 0 else ([], [], [])
    return intersectionList, idx, idy
A = np.array([[1,4],[2,5],[3,6]])
B = np.array([[1,4],[3,6],[7,8]])
def matching_rows(A,B):
    matches=[i for i in range(B.shape[0]) if np.any(np.all(A==B[i],axis=1))]
    if len(matches)==0:
        return B[matches]
    return np.unique(B[matches],axis=0)
>>> matching_rows(A,B)
array([[1, 4],
[3, 6]])
This of course assumes the rows are all the same length.
import numpy as np
A=np.array([[1, 4],
[2, 5],
[3, 6]])
B=np.array([[1, 4],
[3, 6],
[7, 8]])
intersectingRows=[(B==irow).all(axis=1).any() for irow in A]
print(A[intersectingRows])

Faster way to build text file in python

I have two 3d numpy arrays, call them a and b, 512x512x512. I need to write them to a text file:
a1 b1
a2 b2
a3 b3
...
This can be accomplished with a triple loop:
lines = []
for x in range(nx):
    for y in range(ny):
        for z in range(nz):
            lines.append('{} {}'.format(a[x][y][z], b[x][y][z]))
print('\n'.join(lines))
But this is brutally slow (10 minutes when I'd prefer a few seconds on a mac pro).
I am using python 3.6, latest numpy, and am happy to use other libraries, build extensions, whatever is necessary. What is the best way to get this faster?
You could use np.stack and reshape the array to (-1, 2) (two columns) array, then use np.savetxt:
a = np.arange(8).reshape(2,2,2)
b = np.arange(8, 16).reshape(2,2,2)
np.stack([a, b], axis=-1).reshape(-1, 2)
#array([[ 0, 8],
# [ 1, 9],
# [ 2, 10],
# [ 3, 11],
# [ 4, 12],
# [ 5, 13],
# [ 6, 14],
# [ 7, 15]])
Then you can save the file as:
np.savetxt("*.txt", np.stack([a, b], axis=-1).reshape(-1, 2), fmt="%d")
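For a complete round trip, here is a minimal sketch that writes the paired values to a file and reads them back. The temporary directory, the file name `pairs.txt` and the variable names `pairs`, `out_path` and `loaded` are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(8).reshape(2, 2, 2)
b = np.arange(8, 16).reshape(2, 2, 2)

# Pair up corresponding elements: one row per (a, b) pair.
pairs = np.stack([a, b], axis=-1).reshape(-1, 2)

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "pairs.txt")
np.savetxt(out_path, pairs, fmt="%d")

# Read the file back to confirm the two-column layout.
loaded = np.loadtxt(out_path, dtype=int)
print(loaded[:3])
# [[ 0  8]
#  [ 1  9]
#  [ 2 10]]
```

For floating-point data a different fmt (for example "%.6g") would be needed; "%d" is only suitable for the integer sample used here.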
You could use flatten() and dstack(); see the example below:
a = np.random.random([5,5,5]).flatten()
b = np.random.random([5,5,5]).flatten()
c = np.dstack((a,b))
print(c)
will result in
[[[ 0.31314428 0.35367513]
[ 0.9126653 0.40616986]
[ 0.42339608 0.57728441]
[ 0.50773896 0.15861347]
....
It's a bit difficult to understand your problem without knowing what kind of data you have in those arrays, but it looks like numpy.savetxt could be useful for you.
Here's how it works:
import numpy as np
a = np.array(range(10))
np.savetxt("myfile.txt", a)
And here's the documentation:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html

Partially unpacking of tuple in Numpy array indexing

In order to solve a problem which is only possible element by element I need to combine NumPy's tuple indexing with an explicit slice.
def f(shape, n):
    """
    :param shape: any shape of an array
    :type shape: tuple
    :type n: int
    """
    x = numpy.zeros( (n,) + shape )
    for i in numpy.ndindex(shape): # i = (k, l, ...)
        x[:, k, l, ...] = numpy.random.random(n)
x[:, *i] results in a SyntaxError and x[:, i] is interpreted as numpy.array([ x[:, k] for k in i ]). Unfortunately, it's not possible to have the n-dimension last (x = numpy.zeros(shape+(n,)) for x[i] = numpy.random.random(n)) because of the further usage of x.
EDIT: Here is an example, as requested in the comments.
>>> n, shape = 2, (3,4)
>>> x = np.arange(24).reshape((n,)+(3,4))
>>> print(x)
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
>>> i = (1,2)
>>> print(x[ ??? ]) # '???' expressed by i with any length is the question
array([ 6, 18])
If I understand the question correctly, you have a multi-dimensional numpy array and want to index it by combining a : slice with some number of other indices from a tuple i.
The index to the numpy array is a tuple, so you can basically just combine those 'partial' indices to one tuple and use that as the index. A naive approach might look like this
x[ (:,) + i ] = numpy.random.random(n) # does not work
but this will give a syntax error. Instead of :, you have to use the slice builtin.
x[ (slice(None),) + i ] = numpy.random.random(n)
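Applied to the example array from the question, this can be sketched as follows. The names `index`, `y` and `idx` are introduced here for illustration.

```python
import numpy as np

n, shape = 2, (3, 4)
x = np.arange(24).reshape((n,) + shape)

i = (1, 2)
# slice(None) is the object form of a bare ':'.
index = (slice(None),) + i
print(x[index])  # [ 6 18]

# The same kind of index works for assignment inside an ndindex loop.
y = np.zeros((n,) + shape)
for idx in np.ndindex(shape):
    y[(slice(None),) + idx] = np.random.random(n)
```

Because the index is an ordinary tuple, this works for a shape of any length without changing the code.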

Get product of two one dimensional array in python

Something very simple in Matlab, but I can't get it in Python. How to get the following:
x=np.array([1,2,3])
y=np.array([4,5,6,7])
z=x.T*y
z=
[[4,5,6,7],
[8,10,12,14],
[12,15,18,21]]
As in
x [4][5][6][7]
[1]
[2]
[3]
In scientific python that would be an outer product np.outer(x,y)
See http://docs.scipy.org/doc/numpy/reference/generated/numpy.outer.html:
>>> import numpy
>>> x=numpy.array([1,2,3])
>>> y=numpy.array([4,5,6,7])
>>> numpy.outer(x,y)
array([[ 4, 5, 6, 7],
[ 8, 10, 12, 14],
[12, 15, 18, 21]])
In MATLAB, size(x) is (1,3). So x' is (3,1). Multiply that by y, which is (1,4), produces (3,4) shape.
In numpy, x.shape is (3,). x.T is the same. So to get the same outer product, you need to expand the dimensions of x and y. One way is with reshape.
z = x.reshape(3,1)* y.reshape(1,4)
numpy also lets you do this with a newaxis indexing (None also works). It also automatically adds beginning newaxis if that is needed. So this also does the job:
z = x[:,np.newaxis]*y
np.outer does exactly this (with a minor embellishment): a.ravel()[:, newaxis]*b.ravel()[newaxis,:].
There's another tool in numpy
z = np.einsum('i,j->ij',x,y)
It is based on an indexing notation that is popular in physics, and is especially useful in writing more complicated inner (dot) products.
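A short sketch confirming that the approaches above give the same result on the arrays from the question. The names `z_outer`, `z_broadcast` and `z_einsum` are introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6, 7])

z_outer = np.outer(x, y)                # dedicated outer-product function
z_broadcast = x[:, np.newaxis] * y      # (3, 1) * (4,) broadcasts to (3, 4)
z_einsum = np.einsum('i,j->ij', x, y)   # index notation: z[i, j] = x[i] * y[j]

print(z_outer)
# [[ 4  5  6  7]
#  [ 8 10 12 14]
#  [12 15 18 21]]
print(np.array_equal(z_outer, z_broadcast), np.array_equal(z_outer, z_einsum))
# True True
```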
Using list comprehension:
x = [1, 2, 3]
y = [4, 5, 6, 7]
z = [[i * j for j in y] for i in x]
