Concatenate arrays inside a tuple in a simpler way - python

I have a tuple X whose elements are 2-D numpy arrays that share the same first dimension but have different second dimensions. I want to concatenate those arrays into one big array. For example:
X = (np.array of shape [10, 3], np.array of shape [10, 5], np.array of shape [10, 7]).
I want to make a final array Y of shape [10, 15], which is the concatenation of all the elements in tuple X.
I did something like this. It works, but is there a shorter/simpler way to do it? Thanks!
def concat_arrays(data: tuple) -> np.ndarray:
    final_array = data[0]
    for i in range(len(data)):
        if i > 0:
            final_array = np.hstack((final_array, data[i]))
    return final_array

It is this simple:
def concat_arrays(data: tuple) -> np.ndarray:
    return np.hstack(data)
There is no need to iterate through and stack one at a time - that's why you are asked to pass a tuple instead of two separate arguments. (That isn't a real restriction in Python anyway - because *args exists - but still.)
But of course, there is no point to writing this, as we could simply use np.hstack directly.
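For example, with arrays matching the shapes in the question (the values here are arbitrary placeholders):

import numpy as np

X = (np.zeros((10, 3)), np.zeros((10, 5)), np.zeros((10, 7)))
Y = np.hstack(X)
print(Y.shape)  # (10, 15)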

Related

numpy: get ndarray's value from an index array

I have a high-dimensional numpy array whose number of dimensions is not fixed. I need to retrieve a value with an index list whose length equals the number of dimensions of the numpy array.
In other words, I need a function:
def get_value_by_list_index(target_array, index_list):
    # len(index_list) == target_array.ndim
    # target_array can have any number of dimensions
    # return the element at the index specified by index_list
For example, for a 3-dimensional array data and a list [i1, i2, i3], the function should return data[i1][i2][i3].
Is there a good way to achieve this task?
If you know the ndarray holds values of a type that is well representable by Python's built-in types:
source_array.item(*index_iterable)
will do the job.
If you need to work with ndarrays of more complex types that might not have a python built-in type representation, things are harder.
You could implement exactly what you sketch in your comment:
data[i1][i2][i3]
# note that I didn't like the name of your function
def get_value_by_index_iterable(source_array, index_iterable):
    subarray = source_array
    for index in index_iterable:
        subarray = subarray[index]
    return subarray
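A quick sanity check with a small 3-D array (the shape and indices here are chosen just for illustration):

import numpy as np

data = np.arange(24).reshape(2, 3, 4)
index_list = [1, 2, 3]

print(data[1][2][3])                                  # 23
print(data.item(*index_list))                         # 23, as a plain Python int
print(get_value_by_index_iterable(data, index_list))  # 23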

expand numpy array in n dimensions

I am trying to 'expand' an array (generate a new array with proportionally more elements in all dimensions). I have an array with known numbers (let's call it X) and I want to make it j times bigger (in each dimension).
So far I generated a new array of zeros with more elements, then I used broadcasting to insert the original numbers in the new array (at fixed intervals).
Finally, I used linspace to fill the gaps, but this part is actually not directly relevant to the question.
The code I used (for n=3) is:
import numpy as np
new_shape = (np.array(X.shape) - 1) * ratio + 1
new_array = np.zeros(shape=new_shape)
new_array[::ratio,::ratio,::ratio] = X
My problem is that this is not general: I would have to modify the third line based on ndim. Is there a way to use such broadcasting for any number of dimensions in my array?
Edit: to be more precise, the third line would have to be:
new_array[::ratio,::ratio] = X
if ndim=2
or
new_array[::ratio,::ratio,::ratio,::ratio] = X
if ndim=4
etc. etc. I want to avoid having to write code for each case of ndim
p.s. If there is a better tool to do the entire process (such as 'inner-padding') that I am not aware of, I will be happy to learn about it.
Thank you
array = array[..., np.newaxis] will add another dimension
You can use slice notation -
slicer = tuple(slice(None,None,ratio) for i in range(X.ndim))
new_array[slicer] = X
Build the slicing tuple manually. ::ratio is equivalent to slice(None, None, ratio):
new_array[(slice(None, None, ratio),)*new_array.ndim] = ...
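Putting the pieces together, a sketch that works for any number of dimensions (the 3-D shape and ratio here are just examples):

import numpy as np

X = np.arange(8).reshape(2, 2, 2)
ratio = 3

new_shape = (np.array(X.shape) - 1) * ratio + 1
new_array = np.zeros(shape=tuple(new_shape))

# One slice(None, None, ratio) per dimension, equivalent to [::ratio, ::ratio, ...]
slicer = tuple(slice(None, None, ratio) for _ in range(X.ndim))
new_array[slicer] = X
print(new_array.shape)  # (4, 4, 4)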

Random array from list of arrays by numpy.random.choice()

I have a list of arrays similar to lstB and want to pick a random collection of 2D arrays. The problem is that numpy somehow does not treat objects in lists equally:
lstA = [numpy.array(0), numpy.array(1)]
lstB = [numpy.array([0,1]), numpy.array([1,0])]
print(numpy.random.choice(lstA)) # returns 0 or 1
print(numpy.random.choice(lstB)) # returns ValueError: must be 1-dimensional
Is there an elegant fix for this?
Let's call it semi-elegant...
# force 1d object array
swap = lstB[0]
lstB[0] = None
arrB = np.array(lstB)
# reinsert value
arrB[0] = swap
# and clean up
lstB[0] = swap
# draw
numpy.random.choice(arrB)
# array([1, 0])
Explanation: the problem you encountered appears to be that numpy, when converting the input list to an array, makes as deep an array as it can. Since all your list elements are sequences of the same length, that array is 2-D. The hack shown here forces numpy to make a 1-D array of object dtype instead by temporarily inserting an incompatible element.
However, I personally would not use this, because if you draw multiple subarrays with this method you'll get a 1-D array of arrays, which is probably not what you want and is tedious to convert.
So I'd actually second what one of the comments recommends, i.e. draw ints and then use advanced indexing into np.array(lstB).
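A sketch of that recommended approach (names match the question's lstB):

import numpy as np

lstB = [np.array([0, 1]), np.array([1, 0])]
arrB = np.array(lstB)                      # shape (2, 2)

idx = np.random.choice(len(lstB), size=3)  # e.g. array([1, 0, 1])
sample = arrB[idx]                         # shape (3, 2), an ordinary 2-D array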

When should I use hstack/vstack vs append vs concatenate vs column_stack?

Simple question: what is the advantage of each of these methods? It seems that, given the right parameters (and ndarray shapes), they all work equivalently. Do some work in place? Do some have better performance? Which functions should I use when?
If you have two matrices, you're good to go with just hstack and vstack.
If you're stacking a matrix and a vector, hstack becomes tricky to use, so column_stack is a better option.
If you're stacking two vectors, you've got three options (illustrated below).
And concatenate in its raw form is useful for 3D and above; see my article Numpy Illustrated for details.
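For instance, with two 1-D vectors the three options give three different results:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

np.hstack((a, b))        # array([1, 2, 3, 4, 5, 6])        -> shape (6,)
np.vstack((a, b))        # array([[1, 2, 3], [4, 5, 6]])    -> shape (2, 3)
np.column_stack((a, b))  # array([[1, 4], [2, 5], [3, 6]])  -> shape (3, 2)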
All of these functions are written in Python except np.concatenate, so with an IPython shell you can just use ?? to inspect their source.
If not, here's a summary of their code:
vstack
concatenate([atleast_2d(_m) for _m in tup], 0)
i.e. turn all inputs into 2d (or more) and concatenate on the first axis
hstack
concatenate([atleast_1d(_m) for _m in tup], axis=<0 or 1>)
column_stack
transforms arrays (if needed) with
array(arr, copy=False, subok=True, ndmin=2).T
append
concatenate((asarray(arr), values), axis=axis)
In other words, they all work by tweaking the dimensions of the input arrays, and then concatenating on the right axis. They are just convenience functions.
And the newer np.stack:
arrays = [asanyarray(arr) for arr in arrays]
shapes = set(arr.shape for arr in arrays)
result_ndim = arrays[0].ndim + 1
axis = normalize_axis_index(axis, result_ndim)
sl = (slice(None),) * axis + (_nx.newaxis,)
expanded_arrays = [arr[sl] for arr in arrays]
concatenate(expanded_arrays, axis=axis, out=out)
That is, it expands the dims of all inputs (a bit like np.expand_dims), and then concatenates. With axis=0, the effect is the same as np.array.
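For example:

import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

np.stack((a, b))          # array([[1, 2], [3, 4]]), same as np.array((a, b))
np.stack((a, b), axis=1)  # array([[1, 3], [2, 4]])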
hstack documentation now adds:
The functions concatenate, stack and
block provide more general stacking and concatenation operations.
np.block is also new. It, in effect, recursively concatenates along the nested lists.
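For example, np.block can assemble a block matrix from a nested list:

import numpy as np

A = np.ones((2, 2))
B = np.zeros((2, 2))

np.block([[A, B],
          [B, A]])  # a 4x4 array with A on the diagonal blocks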
numpy.vstack: stack arrays in sequence vertically (row wise). Equivalent to np.concatenate(tup, axis=0). For examples see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html
numpy.hstack: stack arrays in sequence horizontally (column wise). Equivalent to np.concatenate(tup, axis=1), except for 1-D arrays, where it concatenates along the first axis. For examples see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html
append is a method of Python's built-in data structure list; it adds one element to the list at a time. To add multiple elements, you would use extend. Simply put, numpy's functions are much more powerful.
example:
suppose gray.shape = (n0,n1)
np.vstack((gray,gray,gray)) will have shape (n0*3, n1), you can also do it by np.concatenate((gray,gray,gray),axis=0)
np.hstack((gray,gray,gray)) will have shape (n0, n1*3), you can also do it by np.concatenate((gray,gray,gray),axis=1)
np.dstack((gray,gray,gray)) will have shape (n0, n1,3).
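Verifying those shapes with concrete numbers:

import numpy as np

gray = np.zeros((4, 5))  # n0=4, n1=5

print(np.vstack((gray, gray, gray)).shape)  # (12, 5)
print(np.hstack((gray, gray, gray)).shape)  # (4, 15)
print(np.dstack((gray, gray, gray)).shape)  # (4, 5, 3)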
In IPython you can look at the source code of a function by typing its name followed by ??. Taking a look at hstack we can see that it's actually just a wrapper around concatenate (similarly with vstack and column_stack):
np.hstack??
def hstack(tup):
    ...
    arrs = [atleast_1d(_m) for _m in tup]
    # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    if arrs[0].ndim == 1:
        return _nx.concatenate(arrs, 0)
    else:
        return _nx.concatenate(arrs, 1)
So I guess just use whichever one has the most logical sounding name to you.

How to return an array of at least 4D: efficient method to simulate numpy.atleast_4d

numpy provides three handy routines to turn an array into at least a 1D, 2D, or 3D array, e.g. through numpy.atleast_3d
I need the equivalent for one more dimension: atleast_4d. I can think of various ways using nested if statements, but I was wondering whether there is a more efficient and faster method of returning the array in question. In your answer, I would be interested to see an estimate (in O(n) terms) of the speed of execution, if you can.
The np.array function has an optional ndmin keyword argument that:
Specifies the minimum number of dimensions that the resulting array
should have. Ones will be pre-pended to the shape as needed to meet
this requirement.
If you also set copy=False you should get close to what you are after.
As a do-it-yourself alternative, if you want extra dimensions trailing rather than leading:
arr.shape += (1,) * (4 - arr.ndim)
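Both approaches side by side (a minimal sketch; the input shape is arbitrary):

import numpy as np

x = np.ones((5, 5))

y = np.array(x, copy=False, ndmin=4)  # dimensions prepended
print(y.shape)                        # (1, 1, 5, 5)

z = x.copy()
z.shape += (1,) * (4 - z.ndim)        # dimensions appended
print(z.shape)                        # (5, 5, 1, 1)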
Why couldn't it just be something as simple as this:
import numpy as np

def atleast_4d(x):
    if x.ndim < 4:
        y = np.expand_dims(np.atleast_3d(x), axis=3)
    else:
        y = x
    return y
i.e. if the number of dimensions is less than four, call atleast_3d and append an extra dimension at the end; otherwise just return the array unchanged.
