How to tile a 1D numpy array using uneven subarrays as tiles? - python

Is there a way to use sub-arrays of a 1-D array as the input tiles for np.tile? I start with:
a 1D array,
the sizes of each of the tiles,
the number of repeats for each tile.
In this case, the number of repeats for each tile is equal to the number of elements in that tile.
Example:
arr = np.array([0,1,2,3,4])
tile_sizes = np.array([2, 3])
num_repeats = tile_sizes
#do some np.tile thing here
and the output array will be:
np.array([0,1,0,1,2,3,4,2,3,4,2,3,4])
Note that the first 2 elements (0 and 1) form a tile of shape (2,), which is repeated 2 times. The next tile has 3 elements (2, 3, and 4) and is repeated 3 times.
The use-case for this will involve arrays of a million elements, so memory and speed are concerns, meaning broadcasting is preferred.
A non-broadcasting way to achieve this looks like:
tiles = np.split(arr, np.cumsum(tile_sizes)[:-1])
repeated_tiles = [np.tile(tile, tile.shape[0]) for tile in tiles]
output = np.concatenate(repeated_tiles)
output
>>>>>
array([0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4])

It's not a perfect solution, but you can get rid of the list comprehension using np.repeat if that helps.
a = np.arange(5)
tile_sizes = np.array([2, 3])
# dtype=object keeps the uneven sub-arrays as separate items (np.object was removed in NumPy 1.24)
tiles = np.array(np.split(a, np.cumsum(tile_sizes)[:-1]), dtype=object)
tiles = np.concatenate(np.repeat(tiles, tile_sizes))
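If the Python-level loop over tiles needs to go entirely, one possible approach is to build a single index array and use fancy indexing. The sketch below is not taken from the answers above; the function name tile_uneven and its num_repeats parameter are illustrative assumptions. It allocates a few temporary integer arrays as long as the output, so memory use should be checked for very large inputs.

```python
import numpy as np

def tile_uneven(arr, tile_sizes, num_repeats=None):
    """Repeat consecutive chunks of `arr`; chunk i has tile_sizes[i] elements.

    Hypothetical helper: if num_repeats is None, each chunk is repeated
    as many times as it has elements (the case described in the question).
    """
    arr = np.asarray(arr)
    tile_sizes = np.asarray(tile_sizes)
    num_repeats = tile_sizes if num_repeats is None else np.asarray(num_repeats)

    starts = np.cumsum(tile_sizes) - tile_sizes        # first input index of each tile
    block_lens = tile_sizes * num_repeats              # output length of each tile's block
    block_starts = np.cumsum(block_lens) - block_lens  # first output index of each block

    # Position of every output element inside its own block.
    offset = np.arange(block_lens.sum()) - np.repeat(block_starts, block_lens)
    # Wrap that position around the tile length and shift to the tile's start.
    idx = np.repeat(starts, block_lens) + offset % np.repeat(tile_sizes, block_lens)
    return arr[idx]

output = tile_uneven(np.array([0, 1, 2, 3, 4]), np.array([2, 3]))
print(output)  # [0 1 0 1 2 3 4 2 3 4 2 3 4]
```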

Related

Efficiently create 2d numpy array given 1 dimension and a constant

Given an x-dataset,
x = np.array([1, 2, 3, 4, 5])
what is the most efficient way to create the NumPy array where each x coordinate is paired with a y-coordinate of value 0? I am wondering if there is a way specifically that doesn't require any hard coding, so that x could vary in length without causing failure.
As per your problem statement, the following is one way to do it.
# initialize an array of zeros
In [36]: res = np.zeros((2, *x.shape), dtype=x.dtype)
# fill `x` as first row
In [37]: res[0] = x
In [38]: res
Out[38]:
array([[1, 2, 3, 4, 5],
       [0, 0, 0, 0, 0]])
When we initialize the array of zeros, we use 2 for the axis-0 dimension since your requirement is to create a 2D array. For the column size we simply take the length from the x array, so x can vary in length. For reasonably large arrays, this approach should be among the fastest, since the result is allocated once and x is copied into it.
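As a brief alternative sketch (not part of the answer above), the same result can be built by stacking x on top of a zero array of matching shape and dtype. If each (x, 0) pair should be a row instead, np.column_stack gives the transposed layout.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Two rows: x on top, zeros below; works for any length of x.
res = np.vstack([x, np.zeros_like(x)])

# One (x, 0) pair per row, if that layout is preferred.
pairs = np.column_stack([x, np.zeros_like(x)])

print(res.shape)    # (2, 5)
print(pairs.shape)  # (5, 2)
```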

Randomly select rows from numpy array based on a condition

Let's say I have 2 arrays of arrays: labels is 1D and data is 5D. Note that both arrays have the same first dimension.
To simplify things let's say labels contain only 3 arrays :
labels=np.array([[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]], dtype=object)
And let's say I have a datalist of data arrays (length=3) where each array has a 5D shape where the first dimension of each one is the same as the arrays of the labels array.
In this example, datalist has 3 arrays of shapes : (8,3,100,10,1), (5,3,100,10,1) and (10,3,100,10,1) respectively. Here, the first dimension of each of these arrays is the same as the lengths of each array in label.
Now I want to reduce the number of zeros in each array of labels and keep the other values. Let's say I want to keep only 3 zeros for each array. Therefore, the length of each array in labels as well as the first dimension of each array in data will be 6, 4 and 8.
To reduce the number of zeros in each array of labels, I want to randomly select and keep only 3 of them. The same randomly selected indexes will then be used to select the corresponding rows from data.
For this example, the new_labels array will be something like this :
new_labels=np.array([[0,0,1,1,2,0],[4,0,0,0],[0,3,2,1,0,1,7,0]], dtype=object)
Here's what I have tried so far :
all_ind = []          # to store indexes where value == 0 for all arrays
indexes_to_keep = []  # to store the randomly selected indexes
new_labels = []       # to store the final results
for i in range(len(labels)):
    ind = []  # to store indexes where value == 0 for one array
    for j in range(len(labels[i])):
        if labels[i][j] == 0:
            ind.append(j)
    all_ind.append(ind)
for k in range(len(labels)):
    indexes_to_keep.append(np.random.choice(all_ind[k], 3))
    aux = np.zeros(len(labels[k]) - len(all_ind[k]) + 3)
    ....
    ....
    Here, how can I fill **aux** with the values ?
    ....
    ....
    new_labels.append(aux)
Any suggestions ?
NumPy arrays of different lengths cannot be combined into one regular array, so you have to iterate over the items and process each one separately. Assuming you only want to optimize that per-item step, masking might work pretty well here:
def specific_choice(x, n):
    '''Keep n randomly chosen zeros of the list x, plus all non-zero values.'''
    x = np.array(x)
    mask = x != 0                # True for every non-zero label
    idx = np.flatnonzero(~mask)  # positions of the zeros
    np.random.shuffle(idx)       # shuffles idx in place, quite fast
    idx = idx[:n]                # keep n of the shuffled zero positions
    mask[idx] = True
    return x[mask]  # or return mask if you need it to index data as well
Iterating over a list is faster than iterating over an array, so an effective usage would be:
labels = [[0,0,0,1,1,2,0,0],[0,4,0,0,0],[0,3,0,2,1,0,0,1,7,0]]
output = [specific_choice(n, 3) for n in labels]
Output (one possible result, since the zeros to keep are picked at random):
[array([0, 1, 1, 2, 0, 0]), array([0, 4, 0, 0]), array([0, 3, 0, 2, 1, 1, 7, 0])]
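The question also asks for the same selection to be applied to the rows of data. A minimal sketch of that, assuming a variant of the function that returns the boolean mask instead of the filtered labels: the name keep_n_zeros_mask, the rng argument, and the zero-filled placeholder datalist are all illustrative assumptions, not part of the original answer.

```python
import numpy as np

def keep_n_zeros_mask(labels_row, n, rng):
    """Boolean mask keeping all non-zero labels plus at most n random zeros."""
    labels_row = np.asarray(labels_row)
    mask = labels_row != 0
    zero_idx = np.flatnonzero(~mask)
    # Sample without replacement so the same zero is never picked twice.
    keep = rng.choice(zero_idx, size=min(n, zero_idx.size), replace=False)
    mask[keep] = True
    return mask

rng = np.random.default_rng(0)
labels = [[0, 0, 0, 1, 1, 2, 0, 0], [0, 4, 0, 0, 0], [0, 3, 0, 2, 1, 0, 0, 1, 7, 0]]
# Placeholder data with the shapes described in the question.
datalist = [np.zeros((len(row), 3, 100, 10, 1)) for row in labels]

masks = [keep_n_zeros_mask(row, 3, rng) for row in labels]
new_labels = [np.asarray(row)[m] for row, m in zip(labels, masks)]
new_data = [d[m] for d, m in zip(datalist, masks)]  # same rows as the kept labels

print([len(l) for l in new_labels])     # [6, 4, 8]
print([d.shape[0] for d in new_data])   # [6, 4, 8]
```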

numpy's transpose method can't convert 1D row ndarray to a column one [duplicate]

Let's consider a as a 1D row/horizontal array:
import numpy as np
N = 10
a = np.arange(N) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a.shape # (10,)
Now I want b to be a 1D column/vertical array, the transpose of a:
b = a.transpose() # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b.shape # (10,)
But the .transpose() method returns an identical ndarray with the exact same shape!
What I expected to see was
np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
which can be achieved by
c = a.reshape(a.shape[0], 1) # or c = a; c.shape = (c.shape[0], 1)
c.shape # (10, 1)
and to my surprise, it has a shape of (10, 1) instead of (1, 10).
In Octave/Scilab I could do:
N = 10
b = 0:(N-1)
a = b'
size(b) % ans = 1 10
size(a) % ans = 10 1
I understand that numpy ndarrays are not matrices (as discussed here), but the behavior of the numpy's transpose function just doesn't make sense to me! I would appreciate it if you could help me understand how this behavior makes sense and what am I missing here.
P.S. So what I have understood so far is that b = a.transpose() is the equivalent of b = a; b.shape = b.shape[::-1] which if you had a "2D array" of (N, 1) would return a (1, N) shaped array, as you would expect from a transpose operator. However, numpy seems to treat the "1D array" of (N,) as a 0D scalar. I think they should have named this method something else, as this is very misleading/confusing IMHO.
To understand the numpy array better, you should take a look at this review paper: The NumPy array: a structure for efficient numerical computation
In short, numpy ndarrays have this attribute called the stride, which is
the number of bytes to skip in memory to proceed to the next element.
For a (10, 10) array of bytes, for example, the strides may be (10,
1), in other words: proceed one byte to get to the next column and ten
bytes to locate the next row.
For your ndarray a, a.strides = (8,), which shows that it has only one dimension, and that to get to the next element along this single dimension you need to advance 8 bytes in memory (assuming the default integer dtype is 64-bit, as it is on most 64-bit platforms).
Strides are useful for representing transposes:
By modifying strides, for example, an array can be transposed or
reshaped at zero cost (no memory needs to be copied).
So if there were a 2-dimensional ndarray, say b = np.ones((3,5)), then b.strides = (40, 8), while b.transpose().strides = (8, 40). As you can see, a transposed 2D ndarray is simply the exact same array with its strides reordered. Since your 1D ndarray has only one dimension, reordering its strides (i.e. taking its transpose) doesn't do anything.
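The stride values mentioned above can be checked directly. This is a small sketch; the dtype is set explicitly to np.int64 here (an assumption added for reproducibility) so that the 8-byte stride does not depend on the platform's default integer size.

```python
import numpy as np

a = np.arange(10, dtype=np.int64)  # 1D, 8 bytes per element
b = np.ones((3, 5))                # 2D, float64 by default

print(a.strides)            # (8,)
print(a.T.shape)            # (10,)  transposing a 1D array changes nothing
print(b.strides)            # (40, 8)
print(b.T.strides)          # (8, 40)
print(np.shares_memory(b, b.T))  # True: the transpose is a view, not a copy
```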
As you already mentioned, numpy arrays are not matrices. The definition of the transpose function is:
Permute the dimensions of an array.
This means that numpy's transpose method moves data from one dimension to another. As a 1D array has only one dimension, there is no other dimension to move the data to. So you need to add a dimension before transpose has any effect. This behavior also makes sense because it is consistent with higher-dimensional (3D, 4D, ...) arrays.
There is a clean way to achieve what you want:
N = 10
a = np.arange(N)
a[:, np.newaxis]
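For comparison, here is a short sketch of a few equivalent ways to get a column from a 1D array; the variable names are illustrative. All of them add the missing second dimension first, after which transposing behaves as expected.

```python
import numpy as np

a = np.arange(10)

col_newaxis = a[:, np.newaxis]  # add a trailing axis
col_reshape = a.reshape(-1, 1)  # -1 lets NumPy infer the length
row = a[np.newaxis, :]          # explicit 2D row
col_from_row = row.T            # transpose now has two axes to swap

print(col_newaxis.shape)   # (10, 1)
print(row.shape)           # (1, 10)
print(col_from_row.shape)  # (10, 1)
```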

Numpy Basics - How to Interpret [:,] in array access

I have an nd-array A
A.shape
(2, 500, 3)
What's the difference between A[:] and A[:,2]
Coming from Python, the ',' in the array access is confusing me a lot.
The commas separate the subscripts for each dimension. So, for example, if the matrix M is defined as
M = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
then M[2, 1] would be 8 (third row, second column).
The subscript for each dimension can also be a slice, where : represents a full slice, like a slice in normal Python sequences. For example, M[:, 2] would select from every row the third column, which would be [3, 6, 9].
Any additional dimensions for which a subscript is not provided are implicitly full slices. In your example, A[:,2] is equivalent to A[:, 2, :]. If you consider the (2, 500, 3) shaped array to be two stacked matrices with 500 rows and 3 columns, then A[:, 2, :] would select from both matrices the third row (and every column of the third row), which should have a shape of (2, 3).
When you have multidimensional NumPy arrays, the indexing operation [] accepts a tuple of slice() objects (and integers). If the tuple has fewer entries than the array has dimensions, this is equivalent to having a slice(None) (which abbreviates to :) in all the remaining dimensions. Note that NumPy also accepts ..., which means "fill the rest of the dimensions with :"; this is especially useful if you want to "fill" the initial dimensions.
So, to recap, the following expressions give identical results on your array A with A.ndim == 3:
A[:, 2]
A[:, 2, :]
A[:, 2, ...]
A[slice(None), 2]
A[slice(None), 2, slice(None)]
A[(slice(None), 2) + tuple(slice(None) for _ in range(A.ndim - 2))]
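The equivalence of these expressions can be confirmed with a quick sketch. The array contents here are arbitrary (an assumption; only the shape (2, 500, 3) comes from the question).

```python
import numpy as np

A = np.arange(2 * 500 * 3).reshape(2, 500, 3)

variants = [
    A[:, 2],
    A[:, 2, :],
    A[:, 2, ...],
    A[slice(None), 2],
    A[slice(None), 2, slice(None)],
    A[(slice(None), 2) + tuple(slice(None) for _ in range(A.ndim - 2))],
]

print(variants[0].shape)  # (2, 3)
print(all(np.array_equal(variants[0], v) for v in variants))  # True
print(A[:].shape)         # (2, 500, 3)  a full slice selects everything
```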

Can not reshape after numpy.bincount() (ValueError)

If I generate the b array using np.random.uniform() I can reshape it with no issues (so I can multiply it by the larger array a). But if I try the same line generating b using np.bincount(), I get a
ValueError: cannot reshape array of size 7 into shape (20,)
even though both the a and b arrays have the exact same shape in both blocks.
import numpy as np
a = np.random.uniform(0., 1., 20)
# Works
b = np.random.uniform(0., 1., 7)
b.resize(a.shape)
d = b * a
# Does not work
c = [0, 4, 5, 4, 1, 3, 4, 5, 6, 6, 5, 6, 4, 6, 3, 1, 5, 4, 6, 0]
b = np.bincount(c)
b.reshape(a.shape)
d = b * a
NumPy's resize can change the total number of elements. It discards elements if the new shape is smaller and fills with zeros if the new shape is bigger (or repeats the array's values if you use the np.resize function instead of the method). So it's no problem to "resize" an array from size 7 to size 20. The documentation of the np.resize function says:
Return a new array with the specified shape.
If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape) which fills with zeros instead of repeated copies of a.
However reshape needs to keep the number of elements constant. That's why you can't reshape an array of length 7 to an array of size 20.
Gives a new shape to an array without changing its data.
Also the reshape method (and function) don't change the array in-place. Only the resize method does that (the resize function also doesn't!).
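The three behaviors described above can be compared side by side in a small sketch. The variable names are illustrative, and refcheck=False is passed to the resize method as a precaution (an assumption for this example) so the in-place resize does not fail in interactive sessions that hold extra references to the array.

```python
import numpy as np

c = [0, 4, 5, 4, 1, 3, 4, 5, 6, 6, 5, 6, 4, 6, 3, 1, 5, 4, 6, 0]
counts = np.bincount(c)            # 7 elements, one per value 0..6

# np.resize (function): returns a new array, padded with repeated copies.
repeated = np.resize(counts, 20)

# ndarray.resize (method): changes the array in place, padded with zeros.
padded = counts.copy()
padded.resize(20, refcheck=False)

# reshape must keep the number of elements, so 7 -> 20 fails.
try:
    counts.reshape(20)
    error = None
except ValueError as exc:
    error = str(exc)

print(counts.size, repeated.size, padded.size)  # 7 20 20
print(error)
```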
Thanks @user2357112 for pointing that out!
