I have one array with dimension (1538,4) called X_scaled and another array with dimensions (1538,1) called Y_mlp. I want to add Y_mlp to X_scaled such that Y_mlp becomes the fifth column in X_scaled. How can I do this?
You're looking for np.hstack.
numpy.hstack(tup)
Take a sequence of arrays and stack them horizontally to make a single
array.
import numpy as np
X_scaled, Y_mlp = ..., ...
Y_mlp = Y_mlp.reshape(-1, 1)
out = np.hstack((X_scaled, Y_mlp))
print(out.shape)
Output:
(1538, 5)
Concatenation occurs along the second dimension.
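As a minimal sketch, here is the same operation with stand-in arrays that have the shapes from the question (the values are arbitrary placeholders, not the asker's data). It also shows np.concatenate and np.column_stack, which should give the same result here:

```python
import numpy as np

# Stand-in data with the shapes from the question (values are arbitrary).
X_scaled = np.zeros((1538, 4))
Y_mlp = np.ones((1538, 1))

out_hstack = np.hstack((X_scaled, Y_mlp))
out_concat = np.concatenate((X_scaled, Y_mlp), axis=1)
# column_stack also accepts a 1-D array and treats it as a column.
out_colstack = np.column_stack((X_scaled, Y_mlp.ravel()))

print(out_hstack.shape)  # (1538, 5)
```

In all three results the fifth column (index 4) holds the values of Y_mlp.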
I have five Numpy arrays of shapes:
(2577, 42)
(2580, 100)
(2580, 236)
(2580, 8)
(2580, 37)
When I try to concatenate them, all of them work except the (2577, 42) one. I get an error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2580 and the array at index 4 has size 2577
The code I am using:
dataset = np.concatenate((onehot_b, num_v, onehot_s, onehot_c, onehot_s), axis=1)
Is there a way to fix this?
The error is pretty clear. You cannot concatenate arrays along axis 1 when their sizes along axis 0 differ. One possible way out is to convert the numpy arrays to lists and append all the list rows to your dataset.
Numpy does not allow non-rectangular arrays, meaning that all sub-arrays must have the same size along the same axis. In your case, 2577 and 2580 are sizes along axis=0, which is not the axis you are stacking over, so they are not added together and must be equal. If you can change all of the arrays to have the same first dimension, you can use concatenate. If you insist on stacking them as they are, another way is to stack the arrays themselves rather than their contents:
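dataset = np.asarray([onehot_b, num_v, onehot_s, onehot_c, onehot_s], dtype=object)  # recent NumPy versions require dtype=object for ragged input
This will create an array of arrays for you: an object array whose elements are the original arrays.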
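A small sketch of both routes, using hypothetical stand-in arrays rather than the asker's data. Option A assumes that dropping the extra trailing rows is acceptable and that the remaining rows still line up, which only you can judge for your dataset. Option B keeps the arrays separate inside an object array, filled element by element:

```python
import numpy as np

# Hypothetical stand-ins: one array has fewer rows than the others.
parts = [np.zeros((10, 3)), np.zeros((10, 2)), np.zeros((7, 4))]

# Collect the row counts so the mismatching array is easy to spot.
row_counts = [p.shape[0] for p in parts]
print(row_counts)  # [10, 10, 7]

# Option A (assumption: trailing rows may be dropped and rows stay aligned).
n_rows = min(row_counts)
trimmed = np.concatenate([p[:n_rows] for p in parts], axis=1)
print(trimmed.shape)  # (7, 9)

# Option B: keep the arrays as separate elements of a 1-D object array.
container = np.empty(len(parts), dtype=object)
for i, p in enumerate(parts):
    container[i] = p
print(container.shape)  # (3,)
```

Option B gives up most vectorized operations, since each element is an independent array.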
I'm currently learning about broadcasting in Numpy, and in the book I'm reading (Python for Data Analysis by Wes McKinney) the author gives the following example to "demean" a two-dimensional array:
import numpy as np
arr = np.random.randn(4, 3)
print(arr.mean(0))
demeaned = arr - arr.mean(0)
print(demeaned)
print(demeaned.mean(0))
Which effectively causes the array demeaned to have a mean of 0.
I had the idea to apply this to an image-like, three-dimensional array:
import numpy as np
arr = np.random.randint(0, 256, (400,400,3))
demeaned = arr - arr.mean(2)
Which of course failed, because according to the broadcasting rule, the trailing dimensions have to match, and that's not the case here:
print(arr.shape) # (400, 400, 3)
print(arr.mean(2).shape) # (400, 400)
Now, I have gotten it to work mostly, by subtracting the mean from every single index in the third dimension of the array:
means = arr.mean(2)
demeaned = np.ones(arr.shape)
for i in range(3):
    demeaned[...,i] = arr[...,i] - means
print(demeaned.mean(2))
At this point, the returned values are very close to zero, and I think that's a precision error. Am I actually right with this thought, or is there another caveat that I missed?
Also, this doesn't seem to be the cleanest, most 'numpy' way to achieve what I wanted. Is there a function or a principle that I can make use of to improve the code?
As of numpy version 1.7.0, np.mean, and several other functions, accept a tuple in their axis parameter. This means that you can perform the operation on the planes of the image all at once:
m = arr.mean(axis=(0, 1))
This mean will have shape (3,), with one element for each plane of the image.
If you want to subtract the means of each pixel individually, you have to remember that broadcasting aligns shape tuples on the right edge. That means that you need to insert an extra dimension:
n = arr.mean(axis=2)
n = n.reshape(*n.shape, 1)
Or
n = arr.mean(axis=2)[..., None]
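As a sketch of both variants on a small random stand-in image (the 4x4x3 size and the seed are arbitrary choices for illustration), keepdims=True is another way to keep the length-1 axis so the mean broadcasts back against the original array:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 256, size=(4, 4, 3)).astype(float)

# Per-pixel mean: keepdims=True leaves a length-1 last axis in place.
per_pixel_mean = arr.mean(axis=2, keepdims=True)   # shape (4, 4, 1)
demeaned_pixels = arr - per_pixel_mean

# Per-plane mean: shape (3,) already lines up with the last axis.
per_channel_mean = arr.mean(axis=(0, 1))           # shape (3,)
demeaned_channels = arr - per_channel_mean

# Residual means are zero only up to floating-point rounding.
print(np.allclose(demeaned_pixels.mean(axis=2), 0))         # True
print(np.allclose(demeaned_channels.mean(axis=(0, 1)), 0))  # True
```

The tiny nonzero residuals you may see when printing the means directly are ordinary floating-point rounding, which is why the check uses np.allclose rather than an exact comparison.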
Try np.apply_along_axis().
np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)
Output: you get the array of the same shape where each cell is demeaned in the dimension you want (the second parameter, here it is 2).
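A quick check, on a small made-up array, that this should give the same values as the broadcasting approach. Note that np.apply_along_axis calls the Python function once per 1-D slice, so it is usually the slower of the two on large arrays:

```python
import numpy as np

arr = np.arange(24, dtype=float).reshape(2, 4, 3)

# Demean every length-3 slice along the last axis, one slice at a time.
via_apply = np.apply_along_axis(lambda x: x - np.mean(x), 2, arr)

# Same computation expressed with broadcasting.
via_broadcast = arr - arr.mean(axis=2, keepdims=True)

print(via_apply.shape)                        # (2, 4, 3)
print(np.allclose(via_apply, via_broadcast))  # True
```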
This is an example of my error. Say I created a numpy array
X = np.zeros((1000, 50))
where 1000 is the number of features (rows) and 50 is the number of examples (columns).
Since I am adding examples one by one, I have to replace the columns of the array one at a time to get the final feature array. I tried this:
X[:,i] = example
where example is of size (1000, 1), and i is iterated for every example. This does not work because X[:,i] is of shape (1000,), a rank 1 array. How do I code it so that each example replaces a column of the X array without throwing the broadcast error? Thank you.
Reshape your vector before assigning it.
X[:,i] = example.reshape(-1,)
This will suppress the second dimension and turn example into shape (1000,)
Or, to avoid assigning one by one in the loop, you can put all of your arrays in a list, call np.array on the list, and transpose the result so that they become columns. This will probably work better if you can construct your list of arrays in a list comprehension.
Example:
arrs = [np.random.randint(10, size=5) for _ in range(5)]
X = np.array(arrs).T
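A minimal sketch with small hypothetical sizes (6 features, 4 examples) and made-up example vectors of shape (n_features, 1), showing three ways that should fill the columns without a broadcast error:

```python
import numpy as np

n_features, n_examples = 6, 4

# Hypothetical examples, each a column vector of shape (n_features, 1).
examples = [np.full((n_features, 1), float(i)) for i in range(n_examples)]

# 1) Flatten each example to shape (6,) before assigning it to a column.
X = np.zeros((n_features, n_examples))
for i, example in enumerate(examples):
    X[:, i] = example.ravel()

# 2) Slice with i:i+1 so the target keeps shape (6, 1) and matches as is.
X2 = np.zeros((n_features, n_examples))
for i, example in enumerate(examples):
    X2[:, i:i + 1] = example

# 3) Build the whole array at once from the list of column vectors.
X3 = np.hstack(examples)

print(X.shape, X2.shape, X3.shape)  # (6, 4) (6, 4) (6, 4)
```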
I'm trying to load MNIST dataset into arrays.
When I use
(X_train, y_train), (X_test, y_test)= mnist.load_data()
I get an array y_test(10000,) but I want it to be in the shape of (10000,1).
What is the difference between array(10000,1) and array(10000,)?
How can I convert the first array to the second array?
Your first Array with shape (10000,) is a 1-Dimensional np.ndarray.
Since the shape attribute of numpy Arrays is a Tuple and a tuple of length 1 needs a trailing comma the shape is (10000,) and not (10000) (which would be an int). So currently your data looks like this:
import numpy as np
a = np.arange(5) # >>> array([0, 1, 2, 3, 4])
print(a.shape) # >>> (5,)
What you want is a 2-Dimensional array with shape of (10000, 1).
Adding a dimension of length 1 doesn't require any additional data; it is basically an "empty" dimension. To add a dimension to an existing array you can use either np.expand_dims() or np.reshape().
Using np.expand_dims:
import numpy as np
b = np.array(np.arange(5)) # >>> array([0, 1, 2, 3, 4])
b = np.expand_dims(b, axis=1) # >>> array([[0],[1],[2],[3],[4]])
The function was specifically made for the purpose of adding empty dimensions to arrays. The axis keyword specifies which position the newly added dimension will occupy.
Using np.reshape:
import numpy as np
a = np.arange(5)
X_test_reshaped = np.reshape(a, (-1, 1)) # >>> array([[0],[1],[2],[3],[4]])
The (-1, 1) argument specifies what the new shape should look like after the reshape operation. It is passed positionally here because the name of that keyword differs between NumPy versions. The -1 itself will be replaced by the length that 'fits the data' by numpy internally.
Reshape is a more powerful function than expand_dims and can be used in many different ways. You can read more about its other uses in the numpy docs for numpy.reshape().
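To see the options side by side, here is a short sketch on a small stand-in for y_test (five made-up labels instead of the 10000 from MNIST). All three calls should produce the same (n, 1) column:

```python
import numpy as np

y_test = np.arange(5)                  # stand-in labels, shape (5,)

a = y_test.reshape(-1, 1)              # -1 lets NumPy infer the length
b = np.expand_dims(y_test, axis=1)     # insert a new axis at position 1
c = y_test[:, np.newaxis]              # indexing form of the same idea

print(a.shape, b.shape, c.shape)       # (5, 1) (5, 1) (5, 1)
```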
An array with a shape of (10,1) is a 2D array with 10 rows and a single column.
An array with a shape of (10,) is a 1D array.
To convert (10,1) to (10,), you can simply collapse the column. For example, take an array x with x.shape = (10,1). Indexing the single column with x[:, 0] collapses it, so x[:, 0].shape = (10,).
To convert (10,) to (10,1), you can add a dimension by using np.newaxis (after import numpy as np, assuming we are using numpy arrays here). Take an array y, for example, with y.shape = (10,). Using y[:, np.newaxis], you get a new array with the shape of (10,1).
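A short round-trip sketch on a made-up array, showing a few ways to drop the length-1 axis and then restore it:

```python
import numpy as np

x = np.arange(10).reshape(10, 1)       # shape (10, 1)

flat_index = x[:, 0]                   # pick the single column
flat_ravel = x.ravel()                 # flatten to 1-D
flat_squeeze = x.squeeze(axis=1)       # remove the length-1 axis explicitly

print(flat_index.shape, flat_ravel.shape, flat_squeeze.shape)  # (10,) (10,) (10,)

# Going back: add the axis again with np.newaxis.
restored = flat_index[:, np.newaxis]
print(restored.shape)                  # (10, 1)
```

Passing axis=1 to squeeze makes it raise an error if that axis is not of length 1, which can catch shape mistakes early.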
I am analyzing some image datasets using keras. I am stuck because my images come in two different dimensions. Please see the snapshot. features has 14637 images of dimension (10,10,3) and features2 has dimension (10,10,100).
Is there any way that I can merge/concatenate these two datasets together?
If features and features2 contain the features of the same batch of images, that is features[i] is the same image of features2[i] for each i, then it would make sense to group the features in a single array using the numpy function concatenate():
newArray = np.concatenate((features, features2), axis=3)
Where 3 is the axis along which the arrays will be concatenated. In this case, you'll end up with a new array having dimension (14637, 10, 10, 103).
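A scaled-down sketch of that call, using 5 placeholder images instead of 14637 and arrays filled with zeros and ones so the two sources are easy to tell apart. axis=-1 refers to the last axis, which is axis 3 for these 4-D arrays:

```python
import numpy as np

# Placeholder data: 5 images instead of 14637, same trailing dimensions.
features = np.zeros((5, 10, 10, 3))
features2 = np.ones((5, 10, 10, 100))

merged = np.concatenate((features, features2), axis=-1)
print(merged.shape)  # (5, 10, 10, 103)

# The first 3 channels come from features, the remaining 100 from features2.
first_part = merged[..., :3]
second_part = merged[..., 3:]
```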
However, if they refer to completely different batches of images and you would like to merge them along the first axis, so that the 14637 images of features2 are placed after the first 14637 images, then there is no way you can end up with a single array, since a numpy array is structured like a matrix, not like a list of objects.
For instance, if you try to execute:
a = np.array([[0, 1, 2]])  # shape = (1, 3)
b = np.array([[0, 1]])  # shape = (1, 2)
c = np.concatenate((a, b), axis=0)
Then, you'll get:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
since you are concatenating along axis = 0 but axis 1's dimensions differ.
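The same two arrays can illustrate the rule from both sides. This sketch catches the error for axis=0 and then shows that axis=1 should work, because along axis=1 only the row counts have to agree:

```python
import numpy as np

a = np.array([[0, 1, 2]])   # shape (1, 3)
b = np.array([[0, 1]])      # shape (1, 2)

# axis=0 requires equal sizes along axis 1 (3 vs 2), so this fails.
try:
    np.concatenate((a, b), axis=0)
    failed = False
except ValueError:
    failed = True

# axis=1 requires equal sizes along axis 0 (1 vs 1), so this succeeds.
c = np.concatenate((a, b), axis=1)
print(failed, c.shape)  # True (1, 5)
```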
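If you are dealing with numpy arrays, you should be able to use the concatenate method and specify the axis along which the data should be merged. Basically: np.concatenate((array_a, array_b), axis=2), where axis=2 is the last axis of a single (10, 10, 3) image. With a leading batch dimension it would be axis=3.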
I think it would be better if you use a class.
class your_class:
    def __init__(self, array_1, array_2):
        # store one entry from each source array
        self.array_1 = array_1
        self.array_2 = array_2

final_array = []
for x in range(len(your_previous_one_array)):
    # pair the x-th entry of each array in a new object
    temp_class = your_class(your_previous_one_array[x], your_previous_two_array[x])
    final_array.append(temp_class)