numpy array with n prefilled columns - python

I need to generate specialized numpy arrays. Assume I have a function:
def gen_array(start, end, n_cols):
It should behave like this, generating three columns where each column goes from start (inclusive) to end (exclusive):
>>> gen_array(20, 25, 3)
array([[20, 20, 20],
[21, 21, 21],
[22, 22, 22],
[23, 23, 23],
[24, 24, 24]])
My rather naïve implementation looks like this:
import numpy as np
def gen_array(start, end, n_columns):
    a = np.arange(start, end).reshape(end-start, 1)  # create a column vector from start to end
    return np.dot(a, [np.ones(n_columns)])  # replicate across n_columns
(It's okay, though not required, that the np.dot converts values to floats.)
I'm sure there's a better, more efficient and more numpy-ish way to accomplish the same thing. Suggestions?
Update
Building on a suggestion by #msi_gerva to use np.tile, my current best attempt is:
def gen_array(start, end, n_cols):
    return np.tile(np.arange(start, end).reshape(-1, 1), (1, n_cols))
... which seems pretty good to me.

In addition to numpy.arange and numpy.reshape, use numpy.repeat to extend your data.
import numpy as np
def gen_array(start, end, n_cols):
    return np.arange(start, end).repeat(n_cols).reshape(-1, n_cols)
print(gen_array(20, 25, 3))
# [[20 20 20]
# [21 21 21]
# [22 22 22]
# [23 23 23]
# [24 24 24]]

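The simplest approach I found:
[:, None] adds a new axis, turning the range into a column, and multiplying by np.ones(n_cols) broadcasts that column across n_cols columns (the result is float):
np.arange(start, end)[:, None] * np.ones(n_cols)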

np.arange(start, end)[:, np.newaxis].repeat(n_cols, axis=1)
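A small sketch to check that the suggested approaches agree. It also shows np.broadcast_to, which none of the answers above mention: it builds the repeated array as a read-only view without copying the data, and the .copy() is only needed if you want to write into the result. The function names are hypothetical.

```python
import numpy as np

def gen_array_tile(start, end, n_cols):
    return np.tile(np.arange(start, end).reshape(-1, 1), (1, n_cols))

def gen_array_repeat(start, end, n_cols):
    return np.arange(start, end).repeat(n_cols).reshape(-1, n_cols)

def gen_array_broadcast(start, end, n_cols):
    col = np.arange(start, end)[:, np.newaxis]  # shape (end - start, 1)
    # broadcast_to returns a read-only view; copy() turns it into a normal writable array
    return np.broadcast_to(col, (end - start, n_cols)).copy()

result = gen_array_broadcast(20, 25, 3)
print(result)
print(np.array_equal(result, gen_array_tile(20, 25, 3)))    # True
print(np.array_equal(result, gen_array_repeat(20, 25, 3)))  # True
```

If you only read from the array, dropping the .copy() avoids allocating the full (end - start) x n_cols block.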

Related

Tensorflow: Interleaving two ragged tensors

Is it possible to interleave two Ragged Tensors in Tensorflow? Example:
I have two RaggedTensors with the same "shape":
a = [[[10,10]],[[20,20],[21,21]]]
b = [[[30,30]],[[40,40],[41,41]]]
I would like to interleave them so the resulting tensor looks like this:
c = [[[10,10],[30,30]],[[20,20],[40,40],[21,21],[41,41]]]
Note that both tensors a and b always have the same "shape".
I have been trying to use the stack and concat functions, but both of them return undesired shapes:
tf.stack([a,b],axis=-1)
c = [[[[10, 30], [10, 30]]], [[[20, 40], [20, 40]], [[21, 41], [21, 41]]]]
tf.concat([a,b],axis=-1)
c = [[[10, 10, 30, 30]], [[20, 20, 40, 40], [21, 21, 41, 41]]]
I have seen some other solutions for regular tensors that reshape the resulting tensor c after applying the stack/concat functions, e.g.:
a = tf.constant([[[10, 10]], [[20, 20]]])
b = tf.constant([[[30, 30]], [[40, 40]]])
tf.reshape(
    tf.concat([a[..., tf.newaxis], b[..., tf.newaxis]], axis=1),
    [a.shape[0], -1, a.shape[-1]])
c = [[[10, 10],[30, 30]],[[20, 20],[40, 40]]]
However, as far as I know, this doesn't work for Ragged Tensors, because their shape is None in some dimensions (I am using TF 2.6).
If this is part of a preprocessing step, the tf.data.Dataset API is one route. Its "interleave" function lets you vary the interleaving pattern through the block_length setting, and it can interleave an arbitrary number of lists.
# I made it a little longer to emphasize the raggedness
a = tf.ragged.constant([[[10,10]],[[20,20],[21,21],[23,23]]])
b = tf.ragged.constant([[[30,30]],[[40,40],[41,41]]])
c = tf.concat([a,b],axis=1)
NUM_ELEMENTS=7
# Option 1) more flexible
def ragged_to_ds(x):
    return tf.data.Dataset.from_tensor_slices(x)
tf.data.Dataset.from_tensor_slices(c).interleave(ragged_to_ds, block_length=1).batch(NUM_ELEMENTS).get_single_element()
# Option 2) less mess, but unbatch makes copies of the data
tf.data.Dataset.from_tensor_slices(c).unbatch().batch(NUM_ELEMENTS).get_single_element()
The tf.data.Dataset API can be powerful and expressive and can help with a large number of data processing and rearranging tasks.
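If you need exactly the within-row pattern from the question (a0, b0, a1, b1, ... inside each row), here is a different sketch that avoids tf.data altogether; it is not taken from the answer above. It assumes, as the question states, that a and b always have identical row lengths, and it builds the inputs with ragged_rank=1 so the innermost pairs are dense. interleave_rows is a hypothetical helper name.

```python
import tensorflow as tf

# ragged_rank=1: only the second dimension is ragged; each innermost pair is dense
a = tf.ragged.constant([[[10, 10]], [[20, 20], [21, 21]]], ragged_rank=1)
b = tf.ragged.constant([[[30, 30]], [[40, 40], [41, 41]]], ragged_rank=1)

def interleave_rows(a, b):
    """Interleave two RaggedTensors row by row: a0, b0, a1, b1, ... (hypothetical helper)."""
    # The trick relies on both tensors having the same row lengths
    tf.debugging.assert_equal(a.row_lengths(), b.row_lengths())
    # a.values / b.values hold the items of all rows back to back, shape [n_items, 2]
    pairs = tf.stack([a.values, b.values], axis=1)      # [n_items, 2, 2]
    flat = tf.reshape(pairs, [-1, a.values.shape[-1]])  # a0, b0, a1, b1, ...
    # Every output row holds twice as many items as the matching input row
    return tf.RaggedTensor.from_row_lengths(flat, a.row_lengths() * 2)

c = interleave_rows(a, b)
print(c.to_list())
# [[[10, 10], [30, 30]], [[20, 20], [40, 40], [21, 21], [41, 41]]]
```

Because items with the same global index in a.values and b.values always belong to the same row when the row lengths match, pairing them globally and then doubling the row lengths keeps every item in its original row.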

How to sum a data cube with python

I am trying to collapse a FITS data cube with Python. I know that there are specialized packages for this, but it is for lecture purposes. I first extract a subcube in Z:
hdu.data = hdu.data[3365:3405, :, :]
subcube = hdu.data
The subcube has dimensions Z=40, Y=50 and X=26. I want to collapse the cube in an old-fashioned way, with a double loop over X and Y, to get a simple 2D image.
for i in range(1, xdim):
    for j in range(1, ydim):
        Sum[j,i] = subcube[:,j,i].sum()
I get an error message: IndexError: index 26 is out of bounds for axis 1 with size 26.
I know that Python orders the cube dimensions as Z, Y, X rather than X, Y, Z as IDL does, for example, but I cannot figure out why I get this error.
Python indices start at 0. You need to do range(xdim) and range(ydim) in your for loops.
Python indices start at 0, so the valid indices for X are 0-25. The same applies to Y and Z.
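Maybe a simple nested list comprehension over the subcube can help you? Iterating over subcube directly walks the Z axis first, so transpose it to (Y, X, Z) first; that way each innermost sum runs over Z:
z_flatten = [[sum(pixel) for pixel in row] for row in subcube.transpose(1, 2, 0)]  # shape (ydim, xdim)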
The existing answers pointing out that Python is 0-indexed are correct, but no one has pointed out yet that you don't even need to create an empty array with np.zeros or use any for loops to do this.
Numpy already allows you to apply most operations along a specific axis of your array, as opposed to looping over the dimensions of your sub-cube and summing just one pixel at a time.
For example let's make a 3x4x4 data cube:
>>> cube = np.arange(3 * 4 * 4).reshape((3, 4, 4))
>>> cube
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]])
Say you want to sum all layers of a 3x3 slice of this cube:
>>> cube[:, :3, :3].sum(axis=0)
array([[48, 51, 54],
[60, 63, 66],
[72, 75, 78]])
In your case, the equivalent would be
subcube[:, :ydim, :xdim].sum(axis=0)
This is equivalent to what you're trying to do, but much more efficient.
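To check that the two really agree, here is a small comparison of the corrected double loop (indices starting at 0) and the single axis sum. The data is random and only borrows the question's Z, Y, X shape; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
subcube = rng.random((40, 50, 26))  # made-up data with the question's (Z, Y, X) shape
zdim, ydim, xdim = subcube.shape

# Corrected double loop: indices run from 0 to dim - 1
loop_sum = np.zeros((ydim, xdim))
for i in range(xdim):
    for j in range(ydim):
        loop_sum[j, i] = subcube[:, j, i].sum()

# Vectorized: collapse the Z axis (axis 0) in one call
axis_sum = subcube.sum(axis=0)

print(axis_sum.shape)                  # (50, 26)
print(np.allclose(loop_sum, axis_sum))  # True
```

np.allclose is used rather than exact equality because the two code paths may add the floats in a different order.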
As a general note, although you read your data cube out of a FITS file, astropy.io.fits returns a Numpy array, so any documentation or questions you can find about Numpy arrays apply. At that point it generally isn't important that the data came from a FITS file. I point this out because it might help you in the future if you're struggling to perform operations on Numpy arrays.

How to slice arrays with a percentage of overlap

I have a set of data like this:
numpy.array([[3, 7],[5, 8],[6, 19],[8, 59],[10, 42],[12, 54], [13, 32], [14, 19], [99, 19]])
which I want to split into a number of chunks with a percentage of overlap, for each column separately. For example, for column 1, splitting into 3 chunks with 50% overlap (resulting in a 2-D array):
[[3, 5, 6, 8,],
[6, 8, 10, 12,],
[10, 12, 13, 14,]]
(ignoring the last row, which would be [13, 14, 99] and not the same size as the rest).
I'm trying to write a function that takes the array, the number of chunks and the overlap percentage, and returns the result.
That's a window function, so use skimage.util.view_as_windows:
from skimage.util import view_as_windows
out = view_as_windows(in_arr[:, 0], window_shape = 4, step = 2)
If you need numpy only, you can use this recipe
For a numpy-only solution, a fairly fast approach is:
def rolling(a, window, step):
    shape = ((a.size - window)//step + 1, window)
    strides = (step*a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
And you can call it like so:
rolling(arr[:,0].copy(), 4, 2)
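Remark: I got unexpected outputs for rolling(arr[:,0], 4, 2), so I took a copy instead. The reason is that arr[:,0] is a non-contiguous view: consecutive elements are arr.strides[0] bytes apart, not a.itemsize bytes as rolling assumes. The copy is contiguous, so the strides are right.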
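Here is a sketch of the function the question asks for, built on numpy.lib.stride_tricks.sliding_window_view (available since NumPy 1.20), which is not used in the answers above. It reads the real strides of its input, so column views like arr[:, 0] work without a copy. How to turn "n chunks with p overlap" into a window length is an assumption: this picks the largest window that fits n_chunks windows into the column and drops any leftover tail, which reproduces the question's example. overlapping_chunks is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def overlapping_chunks(col, n_chunks, overlap):
    """Split a 1-D array into n_chunks equal windows overlapping by `overlap` (0 <= overlap < 1).

    Assumption: window = largest length such that n_chunks windows fit in len(col);
    trailing elements that do not fill a whole window are dropped.
    """
    col = np.asarray(col)
    window = int(len(col) // (1 + (n_chunks - 1) * (1 - overlap)))
    step = max(1, int(window * (1 - overlap)))  # floor keeps all n_chunks windows in range
    # sliding_window_view returns every window at step 1; keep every `step`-th one
    return sliding_window_view(col, window)[::step][:n_chunks]

arr = np.array([[3, 7], [5, 8], [6, 19], [8, 59], [10, 42],
                [12, 54], [13, 32], [14, 19], [99, 19]])
for k in range(arr.shape[1]):
    print(overlapping_chunks(arr[:, k], 3, 0.5))
# column 0 -> [[3 5 6 8] [6 8 10 12] [10 12 13 14]]
```

The result is a read-only view into arr; call .copy() on it if you need to modify the chunks.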

How to reverse engineer original array from boolean indexed array?

OK, so I wrote some code for vectorizing a symmetric matrix. It takes the unique elements and turns them into a 1-D vector, while also multiplying the off-diagonal elements by sqrt(2):
def vectorize_mat(mat):
    assert mat.shape[0] == mat.shape[1], 'Matrix is not square'
    n = int(mat.shape[0])
    vec_len = 0.5*n*(n+1)
    weight_mat = (np.tri(n,k=-1)*np.sqrt(2))+np.identity(n)
    mask_mat = np.tri(n).astype(bool)
    vec_mat = (mat*weight_mat)[mask_mat]
    return vec_mat
This works really well. Now I'm trying to figure out how to reconstruct the original array from the vector. I've recovered the original matrix dimensions like so:
v = len(vec_mat)
n = isqrt(2*v)
where isqrt() is an integer square root taken from the question "Integer square root in python".
But I'm struggling with what to do next. I can now reconstruct the weight and mask matrices, so I could vectorize the weight matrix and divide the vector by it, or divide the reconstructed matrix by the weight matrix, to undo the weighting step. What I don't know how to do is undo the reshaping caused by the boolean indexing. Maybe there's a super simple answer out there, but I can't seem to see it.
To answer your headline question: indexing, including boolean indexing, can be used for assignment.
Here is an example. Let us first extract the lower triangle using a mask.
>>> a = np.arange(25).reshape(5, 5)
>>> y, x = np.ogrid[:5, :5]
>>> lower = y>=x
>>> b = a[lower]
Now b contains the lower triangle. We can use the same mask to reconstruct the lower triangle and fill the upper triangle symmetrically:
>>> recon = np.empty_like(a)
>>> recon[lower] = b
>>> recon.T[lower] = b
>>> recon
array([[ 0, 5, 10, 15, 20],
[ 5, 6, 11, 16, 21],
[10, 11, 12, 17, 22],
[15, 16, 17, 18, 23],
[20, 21, 22, 23, 24]])
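Putting the question's pieces together with this mask-assignment idea, a full inverse could look like the sketch below. unvectorize_mat is a hypothetical name; it uses math.isqrt (Python 3.8+) in place of the linked isqrt recipe, relying on the same fact the question uses: for v = n(n+1)/2, isqrt(2v) == n.

```python
import math
import numpy as np

def vectorize_mat(mat):
    # Same as the question: keep the lower triangle, off-diagonal entries scaled by sqrt(2)
    assert mat.shape[0] == mat.shape[1], 'Matrix is not square'
    n = mat.shape[0]
    weight_mat = np.tri(n, k=-1) * np.sqrt(2) + np.identity(n)
    mask_mat = np.tri(n).astype(bool)
    return (mat * weight_mat)[mask_mat]

def unvectorize_mat(vec):
    n = math.isqrt(2 * len(vec))  # len(vec) = n(n+1)/2
    weight_mat = np.tri(n, k=-1) * np.sqrt(2) + np.identity(n)
    mask_mat = np.tri(n).astype(bool)
    mat = np.zeros((n, n))
    # Undo the weighting and scatter the values back into the lower triangle
    mat[mask_mat] = vec / weight_mat[mask_mat]
    # Mirror into the upper triangle by assigning through the transposed view
    mat.T[mask_mat] = mat[mask_mat]
    return mat

rng = np.random.default_rng(1)
sym = rng.random((4, 4))
sym = sym + sym.T  # a random symmetric test matrix
print(np.allclose(unvectorize_mat(vectorize_mat(sym)), sym))  # True
```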

Removing every nth element in an array

How do I remove every nth element in an array?
import numpy as np
x = np.array([0,10,27,35,44,32,56,35,87,22,47,17])
n = 3 # remove every 3rd element
...something like the opposite of x[0::n]? I've tried this, but of course it doesn't work:
for i in np.arange(0,len(x),n):
    x = np.delete(x,i)
You're close. Pass the entire arange as the index array to delete instead of deleting each element in turn (deleting in a loop shifts the positions of the remaining elements), e.g.:
import numpy as np
x = np.array([0,10,27,35,44,32,56,35,87,22,47,17])
x = np.delete(x, np.arange(0, x.size, 3))
# [10 27 44 32 35 87 47 17]
Here is another way, using reshape, if the length of your array is a multiple of n:
import numpy as np
x = np.array([0,10,27,35,44,32,56,35,87,22,47,17])
x = x.reshape(-1,3)[:,1:].flatten()
# [10 27 44 32 35 87 47 17]
On my computer it runs almost twice as fast as the np.delete solution (between 1.8x and 1.9x, to be honest).
You can also easily perform fancier operations, such as deleting m values out of every n.
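A short sketch of both ideas. The boolean mask (not from the answers above) works for any array length, while the reshape trick for "delete m out of every n" needs the length to be a multiple of the block size; m and n_block are illustrative names.

```python
import numpy as np

x = np.array([0, 10, 27, 35, 44, 32, 56, 35, 87, 22, 47, 17])
n = 3

# Keep positions whose index is not a multiple of n; works for any length
keep_mask = np.arange(x.size) % n != 0
print(x[keep_mask])  # [10 27 44 32 35 87 47 17]

# Drop the first m values of every block of n_block values (length must be a multiple of n_block)
m, n_block = 2, 4
print(x.reshape(-1, n_block)[:, m:].ravel())  # [27 35 56 35 47 17]
```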
Here's a super fast version for 2D arrays that removes every m-th row and every n-th column (assuming the array's shape is a multiple of (m, n)):
array2d = np.arange(60).reshape(6, 10)
m, n = (3, 5)
remove = lambda x, q: x.reshape(x.shape[0], -1, q)[..., 1:].reshape(x.shape[0], -1).T
remove(remove(array2d, n), m)
returns:
array([[11, 12, 13, 14, 16, 17, 18, 19],
[21, 22, 23, 24, 26, 27, 28, 29],
[41, 42, 43, 44, 46, 47, 48, 49],
[51, 52, 53, 54, 56, 57, 58, 59]])
To generalize to arbitrary shapes, pad or trim the input array, depending on your situation.
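As an alternative to padding or trimming, boolean masks combined with np.ix_ handle any 2D shape directly. This is a swapped-in technique, not the answer's method, and no speed claim is made for it; remove_every is a hypothetical name. Like the reshape version, it drops rows 0, m, 2m, ... and columns 0, n, 2n, ...

```python
import numpy as np

def remove_every(arr, m, n):
    """Drop rows 0, m, 2m, ... and columns 0, n, 2n, ... of a 2-D array of any shape."""
    row_keep = np.arange(arr.shape[0]) % m != 0
    col_keep = np.arange(arr.shape[1]) % n != 0
    # np.ix_ turns the two 1-D masks into an open mesh selecting kept rows x kept columns
    return arr[np.ix_(row_keep, col_keep)]

array2d = np.arange(60).reshape(6, 10)
print(remove_every(array2d, 3, 5))  # same result as remove(remove(array2d, 5), 3)

# A shape that is not a multiple of (m, n), checked against np.delete
odd = np.arange(77).reshape(7, 11)
expected = np.delete(np.delete(odd, np.arange(0, 7, 3), axis=0), np.arange(0, 11, 5), axis=1)
print(np.array_equal(remove_every(odd, 3, 5), expected))  # True
```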
Speed comparison:
from time import time
start = time()
for _ in range(100000):
    res = remove(remove(array2d, n), m)
print('remove', time() - start)
start = time()
for _ in range(100000):
    tmp = np.delete(array2d, np.arange(0, array2d.shape[0], m), axis=0)
    res = np.delete(tmp, np.arange(0, array2d.shape[1], n), axis=1)
print('delete', time() - start)
"""
'remove'
0.3835930824279785
'delete'
3.173515558242798
"""
So in this run the reshape-based method is about 8x faster than numpy.delete.
