Python/Numpy - merge 2 items in each "row" of 1d array - python

I have a quick question:
I have an array like this:
array([('A', 'B'),
       ('C', 'D')],
      dtype=[('group1', '<U4'), ('group2', '<U4')])
And I would like to combine group1 and group2 into 1 like this:
array([('A_B',),
       ('C_D',)],
      dtype=[('group3', '<U4')])
I tried a few things from other answers, like this:
combi = np.array([])
for group in test_array:
    combi = np.append(combi, np.array(group[0] + "_" + group[1]))
This does give me a new array with what I want, but when I try to add it to the original array I get an error that I can't figure out (I don't really know what it means):
np.append(test_array, combi, axis=1)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1
I tried np.concatenate as well, but it gave the same error.
Could someone help me?

The error means that you are trying to append a 1D array (shape (n,)) to another 1D array along the second dimension (axis=1), which is impossible because your arrays have only one dimension.
If you don't specify the axis (or use axis=0), however, you end up with just a plain 1D array like array(['A_B', 'C_D']). To get a structured array as requested, you need to create a new array, for example np.array(combi, dtype=[('group3', '<U4')]). Note that '<U4' holds at most 4 characters, so longer combined strings would be truncated; widen the dtype if needed.
You can do the same vectorized without a loop:
np.array(np.char.add(np.char.add(a['group1'], '_'), a['group2']), dtype=[('group3', '<U4')])
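A minimal sketch of the vectorized approach, assuming the structured array is named a as in the one-liner above. The wider '<U9' field is an assumption (two 4-character strings plus the separator) chosen so that nothing gets truncated, and the result is filled field by field rather than cast in one call:

```python
import numpy as np

a = np.array([('A', 'B'), ('C', 'D')],
             dtype=[('group1', '<U4'), ('group2', '<U4')])

# Element-wise string concatenation: group1 + '_' + group2
combined = np.char.add(np.char.add(a['group1'], '_'), a['group2'])

# Allocate the structured result and fill its single field.
# '<U9' is an assumption: 4 + 1 + 4 characters at most.
result = np.empty(a.shape, dtype=[('group3', '<U9')])
result['group3'] = combined

print(result['group3'])  # ['A_B' 'C_D']
```

The original array a is left untouched; result is a new structured array with a single field.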

Related

Is there a way to write a python function that will create 'N' arrays? (see body)

I have a numpy array of shape (20, 3). (So 20 3-by-1 arrays. Correct me if I'm wrong, I am still pretty new to python.)
I need to separate it into 3 arrays of shape 20,1 where the first array is 20 elements that are the 0th element of each 3 by 1 array. Second array is also 20 elements that are the 1st element of each 3 by 1 array, etc.
I am not sure if I need to write a function for this. Here is what I have tried:
Essentially I'm trying to create an array of 3 20 by 1 arrays that I can later index to get the separate 20 by 1 arrays.
a = np.load() #loads file
num=20 #the num is if I need to change array size
num_2=3
for j in range(0,num):
    for l in range(0,num_2):
        array_elements = np.zeros(3)
        array_elements[l] = a[j:][l]
This gives the following error:
'''
ValueError: setting an array element with a sequence
'''
I have also tried making it a dictionary and making the dictionary values lists that are appended, but it only gives the first or last value of the 20 that I need.
Your array has shape (20, 3), this means it's a 2-dimensional array with 20 rows and 3 columns in each row.
You can access data in this array by indexing with numbers, or with ':' to indicate ranges. You want to split it into 3 arrays, one per column. To do this, pick the column with a number and use ':' to mean 'all of the rows'. So the three columns are a[:, 0], a[:, 1] and a[:, 2]. Each of these has shape (20,); if you really need shape (20, 1), index with a one-element slice instead, e.g. a[:, 0:1].
You can then assign these to separate variables if you wish, e.g. arr = a[:, 0], but this is just a view of the original data in array a. This means any change made to arr is also made to the corresponding data in a.
If you want a new array so that this doesn't happen, use the .copy() method. If you set arr = a[:, 0].copy(), arr is completely separate from a, and changes made to one will not affect the other.
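The difference between a view and a copy can be seen in a small sketch. The array contents here are made up with np.arange as a stand-in for the loaded file:

```python
import numpy as np

a = np.arange(60).reshape(20, 3)   # stand-in for the loaded (20, 3) data

col_view = a[:, 0]                 # shape (20,), shares memory with a
col_2d = a[:, 0:1]                 # shape (20, 1), also a view
col_copy = a[:, 0].copy()          # independent copy

col_view[0] = -1                   # also changes a[0, 0]
col_copy[1] = -2                   # leaves a untouched

print(col_view.shape, col_2d.shape)  # (20,) (20, 1)
print(a[0, 0], a[1, 0])              # -1 3
```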
Essentially you want to group your values by their column index. There are plenty of ways of doing this. One option is to split the array horizontally, stack the pieces into a new array and reshape it.
old_length = 3
new_length = 20
a = np.array(np.hsplit(a, old_length)).reshape(old_length, new_length)
Edit: The same result is obtained more simply by transposing the array. Rotating with np.rot90(a, k=-1) (or k=3, which tells numpy to rotate by 90 degrees k times) also gives shape (3, 20), but each row then holds its column in reversed order, so the transpose is the safer choice.
a = a.T
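A quick check on a small made-up array (the (4, 3) shape is only for illustration) suggests that the split-and-reshape result matches a plain transpose, while np.rot90 with k=-1 returns each row in reversed order:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)    # small stand-in: 4 rows, 3 columns

by_split = np.array(np.hsplit(a, 3)).reshape(3, 4)
by_transpose = a.T
by_rot = np.rot90(a, k=-1)

print(np.array_equal(by_split, by_transpose))         # True
print(np.array_equal(by_rot, by_transpose))           # False
print(np.array_equal(by_rot[:, ::-1], by_transpose))  # True
```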

Numpy array dimensions are unexpectedly reordered after indexing [duplicate]

I have the following minimal example:
a = np.zeros((5,5,5))
a[1,1,:] = [1,1,1,1,1]
print(a[1,:,range(4)])
I would expect as output an array with 5 rows and 4 columns, where we have ones on the second row. Instead it is an array with 4 rows and 5 columns with ones on the second column. What is happening here, and what can I do to get the output I expected?
This is an example of mixed basic and advanced indexing, as discussed in https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The slice dimension has been appended to the end.
With one scalar index this is a marginal case for the ambiguity described there. It's been discussed in previous SO questions and one or more bug/issues.
Numpy sub-array assignment with advanced, mixed indexing
In this case you can replace the range with a slice, and get the expected order:
In [215]: a[1,:,range(4)].shape
Out[215]: (4, 5) # slice dimension last
In [216]: a[1,:,:4].shape
Out[216]: (5, 4)
In [219]: a[1][:,[0,1,3]].shape
Out[219]: (5, 3)
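The three variants can be compared side by side. This is only a sketch of the shapes involved, using a list in place of range(4):

```python
import numpy as np

a = np.zeros((5, 5, 5))
a[1, 1, :] = 1

idx = list(range(4))

mixed = a[1, :, idx]      # scalar and list are both advanced indices, split by a slice
sliced = a[1, :, :4]      # basic indexing only
two_step = a[1][:, idx]   # pick the first axis first, then the columns

print(mixed.shape, sliced.shape, two_step.shape)  # (4, 5) (5, 4) (5, 4)
print(np.array_equal(mixed.T, sliced))            # True
```

The data is the same in all three cases; only the axis order of the mixed result differs, so a transpose recovers the expected layout.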

Call numpy ravel and include an integer multi-index

I have a 4-dimensional array. I am going to turn it into a 1-dim array. I use numpy ravel and it works fine with the default parameters.
However I would also like the positions/indices in the 4-dim array.
I want something like this as row in my output.
x,y,z,w,value
With x being the first dimension of my initial array and so on.
The obvious approach is iteration, however I was told to avoid it when I can.
for i in range(test.shape[0]):
    for j in range(test.shape[1]):
        for k in range(test.shape[2]):
            for l in range(test.shape[3]):
                print(i,j,k,l,test[(i,j,k,l)])
It will be too slow when I use a larger dataset.
Is there a way to configure ravel to do this, or any other approach faster than iteration?
Use np.indices with sparse=False, combined with np.concatenate to build the array. np.indices provides the first n columns, and np.concatenate appends the last one:
test = np.random.randint(10, size=(3, 5, 4, 2))
index = np.indices(test.shape, sparse=False) # shape: (4, 3, 5, 4, 2)
data = np.concatenate((index, test[None, ...]), axis=0).reshape(test.ndim + 1, -1).T
A more detailed breakdown:
index is a (4, *test.shape) array: one sub-array per dimension of test, each holding the index along that dimension.
To make test concatenatable with index, you need to prepend a unit dimension, which is what test[None, ...] does. None is synonymous with np.newaxis, and Ellipsis, or ..., means "all the remaining dimensions".
When you concatenate along axis=0, you are appending test to the array of indices. For every position in test, the 5 values along the first axis are now the four indices followed by the value. The remaining axes reflect the shape of test, but besides that, you have what you want.
The goal is to flatten out the trailing dimensions, so you get a (5, N) array, where N = np.prod(test.shape). That's what the final reshape does. test.ndim + 1 is the number of indices plus one for the value. -1 can appear at most once in a reshape; it means "infer this dimension from the remaining element count". The trailing .T then turns the (5, N) array into N rows of x, y, z, w, value.
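The steps above can be checked on a small array. This sketch uses np.arange instead of random data so the output is reproducible; the intermediate name stacked is introduced only for illustration:

```python
import numpy as np

test = np.arange(2 * 3 * 4 * 2).reshape(2, 3, 4, 2)

index = np.indices(test.shape)                               # (4, 2, 3, 4, 2)
stacked = np.concatenate((index, test[None, ...]), axis=0)   # (5, 2, 3, 4, 2)
data = stacked.reshape(test.ndim + 1, -1).T                  # (48, 5)

print(data.shape)  # (48, 5)
print(data[:3])    # first rows: x, y, z, w, value

# Every row should hold a valid index followed by the value stored there
ok = all(test[tuple(row[:4])] == row[4] for row in data)
print(ok)          # True
```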

Find minimum values of numpy columns

Looking to print the minimum values of numpy array columns.
I am using a loop in order to do this.
The array is shaped (20, 3) and I want to find the min values of columns, starting with the first (i.e. col_value=0)
I have coded
col_value=0
for col_value in X:
    print(X[:, col_value].min)
    col_value += 1
However, it is coming up with an error
"arrays used as indices must be of integer (or boolean) type"
How do I fix this?
Let me suggest an alternative approach that you might find useful. numpy's min() has an axis argument that you can use to find the minimum values along a given dimension.
Example:
X = np.random.randn(20, 3)
print(X.min(axis=0))
This prints a numpy array with the minimum value of each column of X.
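For context on the error in the question: iterating with for col_value in X yields the rows of X (float arrays), not column numbers, so X[:, col_value] ends up being indexed with a float array. A minimal sketch of a loop over column positions next to the axis-based call, using made-up values:

```python
import numpy as np

X = np.array([[11.0, 2.0, 14.0],
              [5.0, 15.0, 7.0],
              [8.0, 9.0, 20.0]])

# Loop over column positions rather than over the array itself
for col in range(X.shape[1]):
    print(X[:, col].min())   # note the parentheses: min is a method

col_mins = X.min(axis=0)     # same values without a loop
print(col_mins)              # [5. 2. 7.]
```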
You don't need col_value=0, nor do you need col_value+=1. To get the minimum of an array, call min() with parentheses:
import numpy as np
x = np.array([1,23,4,6,0])
print(x.min())
EDIT:
Sorry didn't see that you wanted to iterate through columns.
import numpy as np
X = np.array([[1,2], [3,4]])
for col in X.T:
    print(col.min())
Transposing the matrix is one of the simplest solutions, because iterating over the transpose yields the columns.
X=np.array([[11,2,14],
            [5,15, 7],
            [8,9,20]])
X=X.T #Transposing the array
for i in X:
    print(min(i))

Reshaping Numpy Arrays to a multidimensional array

For a numpy array I have found that
x = numpy.array([]).reshape(0,4)
is fine and allows me to append (0,4) arrays to x without the array losing its structure (i.e. it doesn't just become a list of numbers). However, when I try
x = numpy.array([]).reshape(2,3)
it throws an error. Why is this?
This output shows what it means to reshape an array:
np.array([2, 3, 4, 5, 6, 7]).reshape(2, 3)
Output -
array([[2, 3, 4],
[5, 6, 7]])
So reshaping just means rearranging the existing elements into a new shape. Intuitively, reshape(0, 4) means converting the current array into a layout with 0 rows and 4 columns. 0 rows means no elements, so it works because your array is empty. Similarly, (2, 3) means 2 rows and 3 columns, which requires 6 elements, and an empty array has none.
reshape is not an 'append' function. It reshapes the array you give it to the dimensions you want.
np.array([]).reshape(0,4) works because you reshape a zero element array to a 0x4(=0 elements) array.
np.array([]).reshape(2,3) doesn't work because you're trying to reshape a zero element array to a 2x3(=6 elements) array.
To create a 2x3 array filled with zeros, use np.zeros((2,3)) instead; np.empty((2,3)) allocates the same shape without initializing the values.
And in case you're wondering, numpy arrays can't grow in place. np.append and np.concatenate exist, but each call returns a new array with all the data copied. If you need to add rows repeatedly, collect them in a list and convert to a numpy array once at the end. Preferably, you only create a numpy array when you no longer need to append data.
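A short sketch of the points above. The row values are made up, and the list-then-convert pattern is one common option rather than the only one:

```python
import numpy as np

x = np.array([]).reshape(0, 4)       # 0 elements -> 0 x 4, the sizes match
print(x.shape)                       # (0, 4)

try:
    np.array([]).reshape(2, 3)       # 0 elements cannot fill 6 slots
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

# Collect rows in a plain list and convert once at the end
rows = []
for i in range(3):
    rows.append([i, i + 1, i + 2, i + 3])
result = np.array(rows)
print(result.shape)                  # (3, 4)

# np.vstack also works on the (0, 4) array, but returns a new array each time
x = np.vstack([x, [0, 1, 2, 3]])
print(x.shape)                       # (1, 4)
```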
