Faster way to build text file in Python

I have two 3d numpy arrays, call them a and b, 512x512x512. I need to write them to a text file:
a1 b1
a2 b2
a3 b3
...
This can be accomplished with a triple loop:
lines = []
for x in range(nx):
    for y in range(ny):
        for z in range(nz):
            lines.append('{} {}'.format(a[x][y][z], b[x][y][z]))
print('\n'.join(lines))
But this is brutally slow (10 minutes, when I'd prefer a few seconds, on a Mac Pro).
I am using Python 3.6 and the latest numpy, and I am happy to use other libraries, build extensions, whatever is necessary. What is the best way to make this faster?

You could use np.stack and reshape the result into a (-1, 2) (two-column) array, then use np.savetxt:
a = np.arange(8).reshape(2,2,2)
b = np.arange(8, 16).reshape(2,2,2)
np.stack([a, b], axis=-1).reshape(-1, 2)
#array([[ 0,  8],
#       [ 1,  9],
#       [ 2, 10],
#       [ 3, 11],
#       [ 4, 12],
#       [ 5, 13],
#       [ 6, 14],
#       [ 7, 15]])
Then you can save the file as:
np.savetxt("*.txt", np.stack([a, b], axis=-1).reshape(-1, 2), fmt="%d")

You could use flatten() and dstack(); see the example below:
a = np.random.random([5,5,5]).flatten()
b = np.random.random([5,5,5]).flatten()
c = np.dstack((a,b))
print(c)
will result in
[[[ 0.31314428  0.35367513]
  [ 0.9126653   0.40616986]
  [ 0.42339608  0.57728441]
  [ 0.50773896  0.15861347]
  ....
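One caveat, not from the original answer: np.dstack on two 1-D arrays returns an array of shape (1, N, 2), so for the question's two-column text file you would drop the leading axis or use np.column_stack before calling np.savetxt. A rough sketch (the output filename is just an example):
import numpy as np

a = np.random.random([5, 5, 5]).flatten()
b = np.random.random([5, 5, 5]).flatten()

# column_stack pairs the flattened values directly into an (N, 2) array
c = np.column_stack((a, b))
np.savetxt("pairs.txt", c)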

It's a bit difficult to understand your problem without knowing what kind of data you have in those arrays, but it looks like numpy.savetxt could be useful for you.
Here's how it works:
import numpy as np
a = np.array(range(10))
np.savetxt("myfile.txt", a)
And here's the documentation:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html
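A small addition beyond the linked docs: by default np.savetxt writes values in scientific notation ('%.18e') separated by spaces; the fmt and delimiter arguments control that. A quick sketch with the same data (the filename is just an example):
import numpy as np

a = np.array(range(10))

# Same data, but written as plain integers, one per line
np.savetxt("myfile.txt", a, fmt="%d")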

Related

Sort an array of multi D points by distance to a reference point

I have a reference point p_ref stored in a numpy array with a shape of (1024,), something like:
print(p_ref)
>>> array([ p1, p2, p3, ..., p_n])
I also have a numpy array A_points with a shape of (1024,5000) containing 5000 points, each having 1024 dimensions like p_ref. My problem: I would like to sort the points in A_points by their (Euclidean) distance to p_ref!
How can I do this? I read about scipy.spatial.distance.cdist and scipy.spatial.KDTree, but neither did exactly what I wanted, and when I tried to combine them I made a mess. Thanks!
For reference and consistency let's assume:
p_ref = np.array([0,1,2,3])
A_points = np.reshape(np.array([10,3,2,13,4,5,16,3,8,19,4,11]), (4,3))
Expected output:
array([[ 3,  2, 10],
       [ 4,  5, 13],
       [ 3,  8, 16],
       [ 4, 11, 19]])
EDIT: Updated based on suggestions from the OP.
I hope I understand you correctly, but you can calculate the distance between two vectors by using numpy.linalg.norm. Using this it should be as simple as:
A_sorted = sorted( A_points.T, key = lambda x: np.linalg.norm(x - p_ref ) )
A_sorted = np.reshape(A_sorted, (3,4)).T
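As a quick check (not part of the original answer), plugging in the reference p_ref and A_points from above reproduces the expected output:
import numpy as np

p_ref = np.array([0, 1, 2, 3])
A_points = np.reshape(np.array([10, 3, 2, 13, 4, 5, 16, 3, 8, 19, 4, 11]), (4, 3))

# Points are columns, so iterate over A_points.T and sort by distance to p_ref
A_sorted = sorted(A_points.T, key=lambda x: np.linalg.norm(x - p_ref))
A_sorted = np.reshape(A_sorted, (3, 4)).T
print(A_sorted)
# [[ 3  2 10]
#  [ 4  5 13]
#  [ 3  8 16]
#  [ 4 11 19]]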
You can do something like this -
A_points[:,np.linalg.norm(A_points-p_ref[:,None],axis=0).argsort()]
Another with np.einsum that should be more efficient than np.linalg.norm -
d = A_points-p_ref[:,None]
out = A_points[:,np.einsum('ij,ij->j',d,d).argsort()]
Further optimize by leveraging fast matrix-multiplication to replace the last step -
A_points[:,((A_points**2).sum(0)+(p_ref**2).sum()-2*p_ref.dot(A_points)).argsort()]

Numpy fastest way to apply array of functions to matrix columns

I have an array of functions shape (n,) and a numpy matrix of shape (m, n). Now I want to apply each function to its corresponding column in the matrix, i.e.
matrix[:, i] = funcs[i](matrix[:, i])
I could do this with a for loop (see example below), but using for loops is generally discouraged in numpy. My question is what is the quickest (and preferably most elegant) way to do this?
A working example
import numpy as np
# Example of functions to apply to each column
funcs = np.array([np.vectorize(lambda x: x+1),
np.vectorize(lambda x: x-2),
np.vectorize(lambda x: x+3)])
# Initialise dummy matrix
matrix = np.random.rand(50, 3)
# Apply each function to each column
for i in range(funcs.shape[0]):
    matrix[:, i] = funcs[i](matrix[:, i])
For an array that has many rows and a few columns, a simple column iteration should be time effective:
In [783]: funcs = [lambda x: x+1, lambda x: x+2, lambda x: x+3]
In [784]: arr = np.arange(12).reshape(4,3)
In [785]: for i in range(3):
     ...:     arr[:,i] = funcs[i](arr[:,i])
     ...:
In [786]: arr
Out[786]:
array([[ 1,  3,  5],
       [ 4,  6,  8],
       [ 7,  9, 11],
       [10, 12, 14]])
If the functions work with 1d array inputs, there's no need for np.vectorize (np.vectorize is generally slower than plain iteration anyway). Also, for iteration like this there's no need to wrap the list of functions in an array; it's faster to iterate on lists.
A variation on the indexed iteration:
In [787]: for f, col in zip(funcs, arr.T):
     ...:     col[:] = f(col)
     ...:
In [788]: arr
Out[788]:
array([[ 2,  5,  8],
       [ 5,  8, 11],
       [ 8, 11, 14],
       [11, 14, 17]])
I use arr.T here so the iteration is on the columns of arr, not the rows.
A general observation: a few iterations on a complex task is perfectly good numpy style. Many iterations on simple tasks is slow, and should be performed in compiled code where possible.
A loop is efficient here since the job in the loop is heavy.
A readable solution is just:
np.vectorize(apply)(funcs, matrix)
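Note that apply is not a built-in in Python 3 (it was removed after Python 2), so a minimal runnable sketch would define it explicitly:
import numpy as np

funcs = np.array([lambda x: x + 1, lambda x: x - 2, lambda x: x + 3])
matrix = np.random.rand(50, 3)

# apply(f, x) simply calls f on x; np.vectorize then broadcasts it
# element-wise, pairing each column's function with that column's values.
def apply(f, x):
    return f(x)

result = np.vectorize(apply)(funcs, matrix)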

Reformat table in Python

I have a table in a Python script with numpy in the following shape:
[array([[a1, b1, c1], ..., [x1, y1, z1]]),
array([a2, b2, c2, ..., x2, y2, z2])
]
I would like to reshape it to a format like this:
(array([[a2],
[b2],
.
.
.
[z2]],
dtype = ...),
array([[a1],
[b1],
.
.
.
[z1]])
)
To be honest, I'm also quite confused about the different parentheses. [array1, array2] is a list of arrays, right? What is (array1, array2), then?
Round brackets (1, 2) are tuples, square brackets [1, 2] are lists. To convert your data structure, use expand_dims and flatten.
import numpy as np
a = [
np.array([[1, 2, 3], [4, 5, 6]]),
np.array([10, 11, 12, 13, 14])
]
print(a)
b = (
np.expand_dims(a[1], axis=1),
np.expand_dims(a[0].flatten(), axis=1)
)
print(b)
# [array1, array2] is a Python list of two numpy arrays (ndarray)
# (array1, array2) is a Python tuple of two numpy arrays (ndarray)
tuple(array.reshape((-1, 1)) for array in reversed(your_list))
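For example, with the list a from the previous answer standing in for your_list (a quick illustration, not from the original answer):
import numpy as np

your_list = [
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([10, 11, 12, 13, 14]),
]

# Reverse the list, then reshape each array into a single column
result = tuple(arr.reshape((-1, 1)) for arr in reversed(your_list))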

Get product of two one-dimensional arrays in Python

Something very simple in Matlab, but I can't get it in Python. How to get the following:
x=np.array([1,2,3])
y=np.array([4,5,6,7])
z=x.T*y
z = [[ 4,  5,  6,  7],
     [ 8, 10, 12, 14],
     [12, 15, 18, 21]]
As in (x down the side, y across the top):
       [4] [5] [6] [7]
x: [1]
   [2]
   [3]
In scientific Python, that would be an outer product: np.outer(x, y).
See http://docs.scipy.org/doc/numpy/reference/generated/numpy.outer.html:
>>> import numpy
>>> x=numpy.array([1,2,3])
>>> y=numpy.array([4,5,6,7])
>>> numpy.outer(x,y)
array([[ 4,  5,  6,  7],
       [ 8, 10, 12, 14],
       [12, 15, 18, 21]])
In MATLAB, size(x) is (1,3), so x' is (3,1). Multiplying that by y, which is (1,4), produces a (3,4) result.
In numpy, x.shape is (3,). x.T is the same. So to get the same outer product, you need to expand the dimensions of x and y. One way is with reshape.
z = x.reshape(3,1)* y.reshape(1,4)
numpy also lets you do this with newaxis indexing (None also works), and broadcasting automatically adds a leading newaxis where one is needed. So this also does the job:
z = x[:,np.newaxis]*y
np.outer does exactly this (with a minor embellishment): a.ravel()[:, newaxis]*b.ravel()[newaxis,:].
There's another tool in numpy
z = np.einsum('i,j->ij',x,y)
It is based on an indexing notation that is popular in physics, and is especially useful in writing more complicated inner (dot) products.
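As a quick sanity check (not from the original answer), the einsum form matches np.outer for the example x and y:
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6, 7])

# 'i,j->ij' pairs every element of x with every element of y
z = np.einsum('i,j->ij', x, y)
assert np.array_equal(z, np.outer(x, y))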
Using a list comprehension:
x = [1, 2, 3]
y = [4, 5, 6, 7]
z = [[i * j for j in y] for i in x]

numpy ndarray slicing and iteration

I'm trying to slice and iterate over a multidimensional array at the same time. I have a solution that's functional, but it's kind of ugly, and I bet there's a slick way to do the iteration and slicing that I don't know about. Here's the code:
import numpy as np
x = np.arange(64).reshape(4,4,4)
y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
for j in range(0,4,2)
for k in range(0,4,2)]
y = np.array(y)
z = np.array([np.min(u) for u in y]).reshape(y.shape[1:])
Your last reshape doesn't work if y is still a plain list, because a list has no shape defined. Without it you get:
>>> x = np.arange(64).reshape(4,4,4)
>>> y = [x[i:i+2,j:j+2,k:k+2] for i in range(0,4,2)
... for j in range(0,4,2)
... for k in range(0,4,2)]
>>> z = np.array([np.min(u) for u in y])
>>> z
array([ 0, 2, 8, 10, 32, 34, 40, 42])
But despite that, what you probably want is reshaping your array to 6 dimensions, which gets you the same result as above:
>>> xx = x.reshape(2, 2, 2, 2, 2, 2)
>>> zz = xx.min(axis=-1).min(axis=-2).min(axis=-3)
>>> zz
array([[[ 0,  2],
        [ 8, 10]],

       [[32, 34],
        [40, 42]]])
>>> zz.ravel()
array([ 0, 2, 8, 10, 32, 34, 40, 42])
It's hard to tell exactly what you want from that last reshape, but you can use stride_tricks to get a "slicker" way. It's rather tricky.
import numpy.lib.stride_tricks
# as_strided returns a view with custom strides; x2[i,j,k] matches y[4*i+2*j+k]
x2 = numpy.lib.stride_tricks.as_strided(
    x, shape=(2,2,2,2,2,2),
    strides=(numpy.array([32,8,2,16,4,1])*x.dtype.itemsize))
z2 = x2.min(axis=-1).min(axis=-1).min(axis=-1)
Still, I can't say this is much more readable. (Or efficient, as each min call will make temporaries.)
Note, my answer differs from Jaime's because I tried to match your elements of y. You can tell if you replace the min with max.
