(Inverse-) Sorting 2d numpy array column-wise - python

The following code sorts a 2D NumPy array column-wise and then restores the original order:
import numpy as np
#Column-wise sort and inverse sort of image (2d array)
nrows = 10
ncols = 5
a = np.random.randint(nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)
ori_indices = np.zeros_like(a)
for c in range(ncols):
    ori_indices[:,c] = np.argsort(np.argsort(a[:,c]))
#Do some work on sorted array, like e.g row-wise filtering
#After processing sorted array, move it back to original order
a_backsorted = np.zeros_like(a)
for c in range(ncols):
    a_backsorted[:,c] = a_sorted[:,c][ori_indices[:,c]]
print (a); print ()
print (a_backsorted); print ()
print (a_sorted); print ()
The code works as is, but I guess there is a more efficient implementation without the for loops (using fancy indexing).

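You can try a_sorted[::-1] to reverse the row order of the array. Note that this puts each column in descending order rather than restoring the original order: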
print (a_sorted); print ()
print (a_sorted[::-1])
[[0 0 0 2 0]
[2 0 0 2 2]
[4 0 2 6 4]
[4 2 3 7 5]
[4 4 4 7 6]
[5 5 4 8 7]
[6 5 4 8 7]
[7 6 8 9 8]
[8 7 9 9 9]
[8 8 9 9 9]]
[[8 8 9 9 9]
[8 7 9 9 9]
[7 6 8 9 8]
[6 5 4 8 7]
[5 5 4 8 7]
[4 4 4 7 6]
[4 2 3 7 5]
[4 0 2 6 4]
[2 0 0 2 2]
[0 0 0 2 0]]

#Column-wise sort and inverse sort of image (2d array)
import numpy as np
#Define random array and sort it
nrows = 10
ncols = 5
a = np.random.randint(nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)
#Save original order of columns
ori_indices = np.argsort(np.argsort(a, axis=0), axis=0)
#Do some work on sorted array, like e.g row-wise filtering.
#....
#After processing sorted array, move it back to original order:
c=np.array([[i] for i in range(ncols)]).T
a_backsorted = a_sorted[ori_indices, c]
#Check results
print (a); print ()
print (a_backsorted); print ()
print (a_sorted); print ()
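As a side note on the snippet above, the helper array c is only a (1, ncols) row of column numbers that broadcasts against ori_indices. A minimal sketch, assuming the same setup but with a seeded generator (rng is introduced here for reproducibility and is not part of the original answer), suggests that np.arange(ncols) can play the same role:

```python
import numpy as np

nrows, ncols = 10, 5
rng = np.random.default_rng(0)  # assumption: seeded generator for a repeatable demo
a = rng.integers(0, nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)

# Rank of every element within its column (inverse of the sorting permutation)
ori_indices = np.argsort(np.argsort(a, axis=0), axis=0)

# Column selector as written in the answer: shape (1, ncols)
c_explicit = np.array([[i] for i in range(ncols)]).T
# Shorter equivalent: shape (ncols,), broadcast to (nrows, ncols) when indexing
c_short = np.arange(ncols)

back_explicit = a_sorted[ori_indices, c_explicit]
back_short = a_sorted[ori_indices, c_short]

print(np.array_equal(back_explicit, a))
print(np.array_equal(back_short, a))
```

Both index arrays broadcast to the shape of ori_indices, so each rank is paired with the number of the column it came from.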

import numpy as np
nrows = 10; ncols = 5
a = np.random.randint(nrows, size=(nrows, ncols))
a_sorted = np.sort(a, axis=0)
a_backsorted = np.zeros_like(a)
c = np.array([[i] for i in range(ncols)]).T
a_backsorted[np.argsort(a, axis=0), c] = a_sorted
The column-wise sort is reverted by inserting the values of the sorted array at the argsorted positions in the backsorted array. Since this is done column-wise, the argsorted positions are paired with the column numbers held in the c array.
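If your NumPy version provides np.take_along_axis and np.put_along_axis (they were added in NumPy 1.15), the same scatter-back idea can be sketched without building the column array by hand. This is only an illustration under that assumption; the names order, a_back and rng are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded generator for a repeatable demo
a = rng.integers(0, 10, size=(10, 5))

# Sorting permutation of every column
order = np.argsort(a, axis=0)

# Apply the permutation: same values as np.sort(a, axis=0)
a_sorted = np.take_along_axis(a, order, axis=0)

# ... work on a_sorted here ...

# Scatter the sorted values back to the positions they came from
a_back = np.empty_like(a_sorted)
np.put_along_axis(a_back, order, a_sorted, axis=0)

print(np.array_equal(a_back, a))
```

Only one argsort is needed here, because the permutation is used once to gather and once to scatter.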

Related

An efficient way to concatenate rows of a 2-dim array according to a given list of pairs of indexes

Suppose I have a 2 dimensional array with a very large number of rows, and a list of pairs of indexes of that array. I want to create a new 2 dim array, whose rows are concatenations of the rows of the original array, made according to the list of pairs of indexes. For example:
a =
1 2 3
4 5 6
7 8 9
0 0 0
indexes = [[0,0], [0,1], [2,3]]
the returned array should be:
1 2 3 1 2 3
1 2 3 4 5 6
7 8 9 0 0 0
Obviously I can iterate the list of indexes, but my question is whether there is a more efficient way of doing this. I should say that the list of indexes is also very large.
First convert indexes to a Numpy array:
ind = np.array(indexes)
Then generate your result as:
result = np.concatenate([a[ind[:,0]], a[ind[:,1]]], axis=1)
The result is:
array([[1, 2, 3, 1, 2, 3],
[1, 2, 3, 4, 5, 6],
[7, 8, 9, 0, 0, 0]])
Another possible formula (with the same result):
result = np.concatenate([ a[ind[:,i]] for i in range(ind.shape[1]) ], axis=1)
You can do this in one line using NumPy as:
a = np.arange(12).reshape(4, 3)
print(a)
b = [[0, 0], [1, 1], [2, 3]]
b = np.array(b)
print(b)
c = a[b.reshape(-1)].reshape(-1, a.shape[1]*b.shape[1])
print(c)
'''
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[0 0]
[1 1]
[2 3]]
[[ 0 1 2 0 1 2]
[ 3 4 5 3 4 5]
[ 6 7 8 9 10 11]]
'''
You can use horizontal stacking np.hstack:
c = np.array(indexes)
np.hstack((a[c[:,0]],a[c[:,1]]))
output:
[[1 2 3 1 2 3]
[1 2 3 4 5 6]
[7 8 9 0 0 0]]
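To see how the answers above relate, here is a small sketch on the arrays from the question. It indexes a with the whole (n_pairs, 2) index array and flattens the last two axes, then compares that with the concatenate and hstack versions. The variable names via_concat, via_reshape and via_hstack are made up for the comparison:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [0, 0, 0]])
indexes = [[0, 0], [0, 1], [2, 3]]
ind = np.array(indexes)

# Two separate row lookups, joined side by side
via_concat = np.concatenate([a[ind[:, 0]], a[ind[:, 1]]], axis=1)

# One lookup: a[ind] has shape (n_pairs, 2, n_cols); merge the last two axes
via_reshape = a[ind].reshape(len(ind), -1)

# Same as the concatenate version, written with hstack
via_hstack = np.hstack((a[ind[:, 0]], a[ind[:, 1]]))

print(via_reshape)
```

All three avoid a Python-level loop over the index pairs, which matters when the list of pairs is very large.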

Delete specific Values of an Array: Python

I have an array of the shape (1179648, 909).
The problem is that some rows are filled with 0's only. I am checking for this as follows:
for i in range(spectra1Only.shape[0]):
    for j in range(spectra1Only.shape[1]):
        if spectra1Only[i,j] == 0:
            pass  # row i contains a zero and should be removed
I now want to remove the whole row i if any 0 appears in it, so that only the needed data remains.
My question is: what would be the best method to do so? Remove? Del? numpy.delete? Or any other method?
You can use Boolean indexing with np.any along axis=1:
spectra1Only = spectra1Only[~(spectra1Only == 0).any(1)]
Here's a demonstration:
A = np.random.randint(0, 9, (5, 5))
print(A)
[[5 0 3 3 7]
[3 5 2 4 7]
[6 8 8 1 6]
[7 7 8 1 5]
[8 4 3 0 3]]
print(A[~(A == 0).any(1)])
[[3 5 2 4 7]
[6 8 8 1 6]
[7 7 8 1 5]]
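The question mentions both rows that contain only zeros and rows in which any zero appears, and the two readings need different masks. A minimal sketch on a small made-up array (A, without_any_zero and without_all_zero are illustrative names):

```python
import numpy as np

A = np.array([[5, 0, 3],
              [3, 5, 2],
              [0, 0, 0],
              [7, 7, 8]])

# Drop every row that contains at least one zero
without_any_zero = A[~(A == 0).any(axis=1)]

# Drop only the rows that consist entirely of zeros
without_all_zero = A[(A != 0).any(axis=1)]

print(without_any_zero)
print(without_all_zero)
```

Both build a single Boolean mask of length n_rows and index once, so no Python loop over the 1179648 rows is required.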

Python numpy: reshape list into repeating 2D array

I'm new to python and I have a question about numpy.reshape. I currently have 2 lists of values like this:
x = [0,1,2,3]
y = [4,5,6,7]
And I want them to be in separate 2D arrays, where each item is repeated for the length of the original lists, like this:
xx = [[0,0,0,0]
[1,1,1,1]
[2,2,2,2]
[3,3,3,3]]
yy = [[4,5,6,7]
[4,5,6,7]
[4,5,6,7]
[4,5,6,7]]
Is there a way to do this with numpy.reshape, or is there a better method I could use? I would very much appreciate a detailed explanation. Thanks!
numpy.meshgrid will do this for you.
N.B. From your requested output, it looks like you want ij indexing, not the default xy
from numpy import meshgrid
x = [0,1,2,3]
y = [4,5,6,7]
xx,yy=meshgrid(x,y,indexing='ij')
print(xx)
>>> [[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
print(yy)
>>> [[4 5 6 7]
[4 5 6 7]
[4 5 6 7]
[4 5 6 7]]
For reference, here's xy indexing
xx,yy=meshgrid(x,y,indexing='xy')
print(xx)
>>> [[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
print(yy)
>>> [[4 4 4 4]
[5 5 5 5]
[6 6 6 6]
[7 7 7 7]]
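Since the question asks what meshgrid is doing, here is a rough sketch that builds the same ij-indexed arrays by hand with np.repeat and np.tile and checks them against meshgrid. The names xx_manual and yy_manual are made up for this illustration:

```python
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([4, 5, 6, 7])

# Each x value becomes one row, repeated len(y) times along the columns
xx_manual = np.repeat(x[:, np.newaxis], len(y), axis=1)

# The whole y vector is one row, stacked len(x) times
yy_manual = np.tile(y, (len(x), 1))

# Reference result from meshgrid with matrix ('ij') indexing
xx, yy = np.meshgrid(x, y, indexing='ij')

print(np.array_equal(xx_manual, xx))
print(np.array_equal(yy_manual, yy))
```

With ij indexing, xx[i, j] is x[i] and yy[i, j] is y[j], which is the layout requested in the question.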

Write multidimensional numpy array to csv

I have a multidimensional numpy array containing function values, and I'd like to write it to a long csv. How can I do that cleanly? I couldn't find a numpy function but maybe I was googling the wrong terms. An example:
#!/usr/bin/python
import csv
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([50, 51])
z = np.array([99, 100, 101])
f = np.arange(24).reshape((4, 2, 3)) # Contains f(x, y, z)
assert f.shape == (x.size, y.size, z.size)
## I'd like to create a csv file whose columns are x, y, z, f
## How can I do that?
## np.savetxt("test.csv", a, delimiter=",")
## TypeError: float argument required, not numpy.ndarray
## Works, but does numpy already have a function that does this?
with open("test.csv", "w", newline="") as csvfile:  # text mode for Python 3's csv module
    writer = csv.writer(csvfile, delimiter=",", quotechar="'", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["x", "y", "z", "f"])
    for x_index in range(x.size):
        for y_index in range(y.size):
            for z_index in range(z.size):
                writer.writerow([x[x_index], y[y_index], z[z_index],
                                 f[x_index, y_index, z_index]])
I have three vectors x, y, z and an X-by-Y-by-Z array containing function values f(x, y, z). In other words, f[i, j, k] contains the function value f that corresponds to x[i], y[j] and z[k]. Is there a cleaner way to write a long csv with columns x,y,z,f?
Here's head test.csv:
x,y,z,f
1,50,99,0
1,50,100,1
1,50,101,2
1,51,99,3
1,51,100,4
1,51,101,5
2,50,99,6
2,50,100,7
2,50,101,8
Edit: This seems to work as well:
import itertools
x_y_z = np.array(list(itertools.product(x, y, z)))
assert x_y_z.shape[0] == f.size
output_array = np.hstack((x_y_z, f.flatten().reshape((f.size, 1))))
np.savetxt("test2.csv", output_array, comments="", delimiter=",", fmt="%i",
           header="x,y,z,f")
Am I reinventing the wheel?
In fact, yes, it's slightly more complicated than it needs to be.
Given 3 lists x,y and z
import numpy as np
x = [1,2,3]
y = [4,5]
z = [6,7,8]
You need to expand these lists to get all possible combinations; use numpy.repeat this way:
new_x = np.array(x).repeat(len(y)*len(z))
print(new_x)
>> [1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3]
# repeat each y value len(z) times, then stack one copy per x value
new_y = np.array([y]).repeat(len(z),axis=1).repeat(len(x),axis=0)
print(new_y.ravel())
>> [4 4 4 5 5 5 4 4 4 5 5 5 4 4 4 5 5 5]
new_z = np.array([z]).repeat(len(x)*len(y),axis=0)
print(new_z.ravel())
>> [6 7 8 6 7 8 6 7 8 6 7 8 6 7 8 6 7 8]
# reshape y and z just like new_x
new_y = new_y.reshape(new_x.shape)
new_z = new_z.reshape(new_x.shape)
Then just concatenate them:
# suppose this is your vector f
f = np.array(range(len(x)*len(y)*len(z)))
matrix = np.array([new_x,new_y,new_z,f]).T
# or matrix = np.column_stack((new_x, new_y, new_z, f))
print(matrix)
>>
[[ 1 4 6 0]
[ 1 4 7 1]
[ 1 4 8 2]
[ 1 5 6 3]
[ 1 5 7 4]
[ 1 5 8 5]
[ 2 4 6 6]
[ 2 4 7 7]
[ 2 4 8 8]
[ 2 5 6 9]
[ 2 5 7 10]
[ 2 5 8 11]
[ 3 4 6 12]
[ 3 4 7 13]
[ 3 4 8 14]
[ 3 5 6 15]
[ 3 5 7 16]
[ 3 5 8 17]]
Finally, save the array as CSV:
np.savetxt('file_name.csv', matrix, delimiter=',', fmt='%d')
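The repeat-based construction above can also be sketched with np.meshgrid using indexing='ij', which yields coordinate arrays with the same shape as f. This is a sketch on the arrays from the question, not a definitive recipe; the temporary directory and the names table and lines are assumptions made so the example cleans up after itself:

```python
import os
import tempfile

import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([50, 51])
z = np.array([99, 100, 101])
f = np.arange(24).reshape((4, 2, 3))  # f[i, j, k] belongs to x[i], y[j], z[k]

# Coordinate arrays with the same (4, 2, 3) shape as f
xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')

# One row per grid point: columns x, y, z, f
table = np.column_stack((xx.ravel(), yy.ravel(), zz.ravel(), f.ravel()))

# Write to a throwaway location and read the file back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    np.savetxt(path, table, delimiter=",", fmt="%d",
               header="x,y,z,f", comments="")
    with open(path) as fh:
        lines = fh.read().splitlines()

print("\n".join(lines[:5]))
```

Because ravel flattens all four arrays in the same row-major order, each row of table keeps x, y, z and f aligned.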

Performance of NumPy for algorithms concerning individual elements of an array

I'm interested in the performance of NumPy, when it comes to algorithms that check whether a condition is True for an element and its affiliations (e.g. neighbouring elements) and assign a value according to the condition.
An example may be: (I make this up now)
I generate a 2d array of 1's and 0's, randomly.
Then I check whether the first element of the array is the same with its neighbors.
If the similar ones are the majority, I switch (0 -> 1 or 1 -> 0) that particular element.
And I proceed to the next element.
I guess that this kind of element-wise condition and element-wise operation is pretty slow with NumPy. Is there a way I can make the performance better?
For example, would creating the array with dtype bool and adjusting the code help?
Thanks in advance.
Maybe http://www.scipy.org/Cookbook/GameOfLifeStrides helps you.
It looks like you are doing some kind of image processing; you can try scipy.ndimage.
from scipy.ndimage import convolve
import numpy as np
np.random.seed(0)
x = np.random.randint(0,2,(5,5))
print(x)
w = np.ones((3,3), dtype=np.int8)
w[1,1] = 0
y = convolve(x, w, mode="constant")
print(y)
the outputs are:
[[0 1 1 0 1]
[1 1 1 1 1]
[1 0 0 1 0]
[0 0 0 0 1]
[0 1 1 0 0]]
[[3 4 4 5 2]
[3 5 5 5 3]
[2 4 4 4 4]
[2 3 3 3 1]
[1 1 1 2 1]]
y is the sum of the neighbors of every element. Doing the same convolution on an array of all ones gives the number of neighbors of every element:
>>> n = convolve(np.ones((5,5),np.int8), w, mode="constant")
>>> n
[[3 5 5 5 3]
[5 8 8 8 5]
[5 8 8 8 5]
[5 8 8 8 5]
[3 5 5 5 3]]
then you can do element-wise operations with x, y, n, and get your result.
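One way to finish that last step, sketched under several assumptions: the rule used here (flip a cell when a strict majority of its existing neighbors share its value) is only one reading of the question, all cells are updated at once from the original array rather than one after another, and neighbor_sum is a hypothetical NumPy-only stand-in for the convolve call with mode="constant". The array x is copied from the output shown above.

```python
import numpy as np

def neighbor_sum(arr):
    """Sum of the 8 neighbors of each cell; cells outside the array count as 0."""
    rows, cols = arr.shape
    padded = np.pad(arr, 1, mode="constant")
    total = np.zeros(arr.shape, dtype=np.int64)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            total += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return total

x = np.array([[0, 1, 1, 0, 1],
              [1, 1, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 1, 1, 0, 0]])

y = neighbor_sum(x)                # neighbors that are 1
n = neighbor_sum(np.ones_like(x))  # neighbors that exist at all

# Neighbors holding the same value as the cell itself
same = np.where(x == 1, y, n - y)

# Flip where more than half of the neighbors are similar
flip = 2 * same > n
result = np.where(flip, 1 - x, x)

print(result)
```

The only Python loop runs over the eight neighbor offsets, not over the array elements, so the cost per element stays inside NumPy.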
