Adding column of ones to numpy array - python

I am trying to add a column of ones to a numpy array but cannot find an easy solution to what I feel should be a straightforward problem. The number of rows in my array may change, so the solution needs to generalise.
import numpy as np
X = np.array([[1,45,23,56,34,23],
[2,46,24,57,35,23]])
My desired output:
array([[ 1, 45, 23, 56, 34, 23, 1],
[ 2, 46, 24, 57, 35, 23, 1]])
I have tried using np.append and np.insert, but they either flatten the array or replace the values.
Thanks.

You can use np.hstack:
np.hstack((X,np.ones([X.shape[0],1], X.dtype)))
Output:
array([[ 1, 45, 23, 56, 34, 23, 1],
[ 2, 46, 24, 57, 35, 23, 1]])

You can use append, but you have to tell it which axis you want it to work along:
np.append(X, [[1],[1]], axis=1)
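Since the number of rows may change, the column of ones can be built from X.shape[0] instead of being written out by hand. A minimal sketch, assuming X is the 2-D array from the question (the name ones_col is only illustrative):

```python
import numpy as np

X = np.array([[1, 45, 23, 56, 34, 23],
              [2, 46, 24, 57, 35, 23]])

# Build an (n_rows, 1) column of ones with the same dtype as X.
ones_col = np.ones((X.shape[0], 1), dtype=X.dtype)

# axis=1 appends along the columns; without axis, np.append flattens its inputs.
result = np.append(X, ones_col, axis=1)
print(result)
```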

You can use numpy.c_:
np.c_[X, [1, 1]]
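The literal [1, 1] only fits an array with two rows. One way to make this independent of the row count is to wrap the call in a small helper; the function name add_ones_column below is hypothetical and X is assumed to be 2-D:

```python
import numpy as np

def add_ones_column(X):
    """Return a copy of the 2-D array X with a trailing column of ones."""
    # np.c_ stacks 1-D arrays as columns, so a vector of length n_rows is enough.
    return np.c_[X, np.ones(X.shape[0], dtype=X.dtype)]

X = np.array([[1, 45, 23, 56, 34, 23],
              [2, 46, 24, 57, 35, 23]])
print(add_ones_column(X))

# The same call works when the number of rows changes.
print(add_ones_column(np.zeros((4, 2), dtype=int)).shape)  # (4, 3)
```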

You might use numpy.insert in the following way:
import numpy as np
X = np.array([[1,45,23,56,34,23], [2,46,24,57,35,23]])
X1 = np.insert(X, X.shape[1], 1, axis=1)
print(X1)
Output:
[[ 1 45 23 56 34 23 1]
[ 2 46 24 57 35 23 1]]

Related

Numpy: Apply function pairwise on two arrays of different length

I am trying to make use of numpy vectorized operations, but I am struggling with the following task. The setting is two arrays of different length (X1, X2). I want to apply a method to each pair (e.g. X1[0] with X2[0], X2[1], etc). I wrote the following working code using loops, but I'd like to get rid of the loops.
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i] - X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can reshape one of your vectors to be (N, 1) and then use vectorize which will broadcast the operation as normal:
import numpy as np
X1 = np.arange(5)
X2 = np.arange(3)
print(X1, X2)
# [0 1 2 3 4] [0 1 2]
def my_op(x, y):
    return x + y
np.vectorize(my_op)(X1[:, np.newaxis], X2)
# array([[0, 1, 2],
# [1, 2, 3],
# [2, 3, 4],
# [3, 4, 5],
# [4, 5, 6]])
Note that my_op is just defined as an example; if your function is actually anything included in numpy's vectorized operations, it'd be much faster to just use that directly, e.g.:
X1[:, np.newaxis] + X2
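The question applies my_method to the difference X1[i] - X2[j], and the same broadcasting idea covers that case. The sketch below uses np.abs as a stand-in for my_method, which is an assumption: it only works this way if the real function accepts whole arrays and operates elementwise.

```python
import numpy as np

X1 = np.arange(5)
X2 = np.arange(3)

# Stand-in for the question's my_method; assumed to work elementwise on arrays.
def my_method(d):
    return np.abs(d)

# diff[i, j] == X1[i] - X2[j], with shape (len(X1), len(X2)).
diff = X1[:, np.newaxis] - X2
result = my_method(diff)

# Same values as the nested loops from the question.
looped = np.asarray([[my_method(a - b) for b in X2] for a in X1])
print(np.array_equal(result, looped))
```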
itertools.product might be what you're looking for:
from itertools import product
import numpy as np
x1 = np.array(...)
x2 = np.array(...)
result = np.array([my_method(x_1 - x_2) for x_1, x_2 in product(x1,x2)])
Alternatively you could also use a double list comprehension:
result = np.array([my_method(x_1 - x_2) for x_1 in x1 for x_2 in x2])
This obviously depends on what my_method is doing and operating on and what you have stored in x1 and x2.
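Note that both comprehensions return a flat 1-D array, while the loops in the question build a 2-D array. A minimal sketch of restoring that layout with reshape, using made-up sample data and a hypothetical my_method (neither comes from the question):

```python
from itertools import product
import numpy as np

x1 = np.arange(4)        # sample data, assumed for illustration
x2 = np.arange(3) * 10   # sample data, assumed for illustration

def my_method(d):        # hypothetical stand-in
    return d ** 2

# product iterates x1 in the outer position and x2 in the inner position,
# so the flat result is ordered row by row.
flat = np.array([my_method(a - b) for a, b in product(x1, x2)])
result = flat.reshape(len(x1), len(x2))

print(result.shape)  # (4, 3)
```

After the reshape, result[i, j] holds my_method(x1[i] - x2[j]), matching the nested-loop version.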
Assuming a simple function my_method(a, b), which adds the two numbers.
And this input:
X1 = np.arange(10)
X2 = np.arange(10,60,10)
Your code is:
result = []
for i in range(len(X1)):
    result.append([])
    for j in range(len(X2)):
        tmp = my_method(X1[i], X2[j])
        result[i].append(tmp)
result = np.asarray(result)
You can replace it with broadcasting:
X1[:,None]+X2
output:
array([[10, 20, 30, 40, 50],
[11, 21, 31, 41, 51],
[12, 22, 32, 42, 52],
[13, 23, 33, 43, 53],
[14, 24, 34, 44, 54],
[15, 25, 35, 45, 55],
[16, 26, 36, 46, 56],
[17, 27, 37, 47, 57],
[18, 28, 38, 48, 58],
[19, 29, 39, 49, 59]])
Now you need to check whether your operation can be vectorized; please share details on what you want to achieve. Functions can be wrapped with numpy.vectorize, but this is not a magic tool: it still loops over the elements, which can be slow. A true vector operation is best.
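When the operation is a binary numpy ufunc (np.add, np.subtract, np.multiply, and so on), its outer method is another way to build the pairwise table. This is an alternative to the broadcasting shown above, not something taken from the answer; the sketch also compares it with np.vectorize on the same inputs:

```python
import numpy as np

X1 = np.arange(10)
X2 = np.arange(10, 60, 10)

# Binary ufuncs expose an .outer method that applies the operation
# to every pair (X1[i], X2[j]).
table = np.add.outer(X1, X2)

# np.vectorize gives the same values but calls the Python function per element.
slow = np.vectorize(lambda a, b: a + b)(X1[:, None], X2)

print(table.shape)
print(np.array_equal(table, X1[:, None] + X2))
print(np.array_equal(table, slow))
```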

Avoid for-loop to split array into multiple arrays by index values using numpy

Input: There are two input arrays:
value_array = [56, 10, 65, 37, 29, 14, 97, 46]
index_array = [ 0, 0, 1, 0, 3, 0, 1, 1]
Output: I want to split value_array using index_array without using a for-loop. So the output array will be:
split_array = [[56, 10, 37, 14], # index 0
[65, 97, 46], # index 1
[], # index 2
[29]] # index 3
Is there any way to do that using numpy without any for-loop? I have looked at numpy.where but cannot figure out how to do it.
For-loop: Here is how to do it with a for-loop, which I want to avoid.
split_array = []
for i in range(max(index_array) + 1):
    split_array.append([])
for i in range(len(value_array)):
    split_array[index_array[i]].append(value_array[i])
Does this suffice?
Solution 1 (Note: for loop is not over the entire index array)
import numpy as np
value_array = np.array([56, 10, 65, 37, 29, 14, 97, 46])
index_array = np.array([ 0, 0, 1, 0, 3, 0, 1, 1])
max_idx = np.max(index_array)
split_array = []
for idx in range(max_idx + 1):
    split_array.append([])
    # .tolist() converts numpy scalars to plain Python ints
    split_array[-1].extend(value_array[np.where(index_array == idx)].tolist())
print(split_array)
[[56, 10, 37, 14], [65, 97, 46], [], [29]]
Solution 2 (Note: indices that never occur, such as 2, produce no empty group)
import numpy as np
value_array = np.array([56, 10, 65, 37, 29, 14, 97, 46])
index_array = np.array([ 0, 0, 1, 0, 3, 0, 1, 1])
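# a stable sort keeps values with the same index in their original order
value_array = value_array[index_array.argsort(kind="stable")]
# positions where the sorted index changes; always 1-D, even for a single split
split_idxs = np.flatnonzero(np.diff(np.sort(index_array))) + 1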
print(np.array_split(value_array, split_idxs))
[array([56, 10, 37, 14]), array([65, 97, 46]), array([29])]
You can do this with numpy by converting the lists to arrays:
import numpy as np
value_array=np.array(value_array)
index_array=np.array(index_array)
# np.unique returns the distinct indices in sorted order
split_array=[value_array[np.where(index_array==j)[0]] for j in np.unique(index_array)]
You could do:
import numpy as np
value_array = np.array([56, 10, 65, 37, 29, 14, 97, 46])
index_array = np.array([ 0, 0, 1, 0, 3, 0, 1, 1])
# find the unique values in index array and the corresponding counts
unique, counts = np.unique(index_array, return_counts=True)
# create an array with 0 for the missing indices
zeros = np.zeros(index_array.max() + 1, dtype=np.int32)
zeros[unique] = counts # zeros = [4 3 0 1] 0 -> 4, 1 -> 3, 2 -> 0, 3 -> 1
# group by index array
so = value_array[np.argsort(index_array)] # so = [56 10 37 14 65 97 46 29]
# finally split using the counts
res = np.split(so, zeros.cumsum()[:-1])
print(res)
Output
[array([56, 10, 37, 14]), array([65, 97, 46]), array([], dtype=int64), array([29])]
The time complexity of this approach is O(N log N).
Additionally if you don't care about the missing indices, you could use the following:
_, counts = np.unique(index_array, return_counts=True)
res = np.split(value_array[np.argsort(index_array)], counts.cumsum()[:-1])
print(res)
Output
[array([56, 10, 37, 14]), array([65, 97, 46]), array([29])]
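A compact variant of the same idea, offered as a sketch: np.bincount returns the count for every index from 0 up to the maximum, including zero for missing ones, so the separate zeros array is not needed. It assumes index_array holds non-negative integers, and it requests a stable sort so that values sharing an index keep their original order.

```python
import numpy as np

value_array = np.array([56, 10, 65, 37, 29, 14, 97, 46])
index_array = np.array([0, 0, 1, 0, 3, 0, 1, 1])

# A stable sort keeps values with the same index in their original order.
order = np.argsort(index_array, kind="stable")

# counts[k] is how often k occurs, including 0 for indices that never occur.
counts = np.bincount(index_array)

# Split the grouped values at the cumulative group boundaries.
split_array = np.split(value_array[order], np.cumsum(counts)[:-1])
print([group.tolist() for group in split_array])
# [[56, 10, 37, 14], [65, 97, 46], [], [29]]
```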

How to sort 2D array column by ascending and row by descending in Python 3

This is my array:
import numpy as np
boo = np.array([
[10, 55, 12],
[0, 81, 33],
[92, 11, 3]
])
If I print:
[[10 55 12]
[ 0 81 33]
[92 11 3]]
How can I sort the array so that the columns are ascending and the rows are descending, like this:
[[33 81 92]
[10 12 55]
[0 3 11]]
# import the necessary packages.
import numpy as np
# create the array.
boo = np.array([
[10, 55, 12],
[0, 81, 33],
[92, 11, 3]
])
# np.sort returns a sorted copy of the array.
# axis=0 sorts the values within each column.
boo = np.sort(boo, axis=0)
# print the intermediate result.
print(boo)
# then sort again along axis=1, i.e. within each row.
boo = np.sort(boo, axis=1)
# finally reverse the row order so the largest row comes first.
print(boo[::-1])
# the final print shows:
[[33 81 92]
 [10 12 55]
 [ 0  3 11]]

Python Create vertical Numpy array

I have written code that builds an array from my lists. The array must be vertical, like a column vector, but the reshape method does not seem to do anything.
import numpy as np
data = [[ 28, 29, 30, 19, 20, 21],
[ 31, 32, 33, 22, 23, 24],
[ 1, 34, 35, 36, 25, 26],
[ 2, 19, 20, 21, 10, 11],
[ 3, 4, 5, 6, 7, 8 ]]
index = []
for i in range(len(data)):
    index.append([data[i][0], data[i][1], data[i][2],
                  data[i][3], data[i][4], data[i][5]])
    y = np.array([index[i]])
    # y.reshape(6,1)
Is there any solution for these cases? Thank you.
I'm looking for something like this to remain:
If you want to view each row as a column, first convert the list to an array (data = np.array(data)), then transpose it in any one of the following ways:
index = data.T
index = np.transpose(data)
index = data.transpose()
index = np.swapaxes(data, 0, 1)
index = np.moveaxis(data, 1, 0)
...
Each column of index will be a row of data. If you just want to access one column at a time, you can do that too. For example, to get row 3 (4th row) of the original array, any of the following would work:
y = data[3, :]
y = data[3]
y = index[:, 3]
You can get a column vector from the result by explicitly reshaping it to one:
y = y.reshape(-1, 1)
y = np.reshape(y, (-1, 1))
y = np.expand_dims(y, 1)
Remember that reshaping creates a new array object which, where possible, views the same data as the original. The only way I know to reshape an array in-place is to assign to its shape attribute:
y.shape = (y.size, 1)
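The view behaviour can be checked with np.shares_memory. A small sketch, using two rows of the question's data converted to an array (the variable name col is illustrative):

```python
import numpy as np

data = np.array([[28, 29, 30, 19, 20, 21],
                 [ 2, 19, 20, 21, 10, 11]])

y = data[1]              # one row, shape (6,)
col = y.reshape(-1, 1)   # column vector, shape (6, 1)

print(col.shape)                    # (6, 1)
print(np.shares_memory(col, data))  # True: reshape returned a view here

# Writing through the view changes the original array as well.
col[0, 0] = -1
print(data[1, 0])                   # -1
```

If an independent column vector is needed, calling .copy() on the reshaped result avoids this coupling.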
You can use flatten() from numpy https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html
(if you want a copy of the original array without modifying the original)
import numpy as np
data = [[ 28, 29, 30, 19, 20, 21],
[ 31, 32, 33, 22, 23, 24],
[ 1, 34, 35, 36, 25, 26],
[ 2, 19, 20, 21, 10, 11],
[ 3, 4, 5, 6, 7, 8 ]]
data = np.array(data).flatten()
print(data.shape)
(30,)
You can also use ravel()
(it returns a view rather than a copy whenever possible)
data = np.array(data).ravel()
If your array is always 2-D, this also works:
data = data.reshape(-1)
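The copy versus view difference between flatten() and ravel() can be seen directly. A minimal sketch on a small contiguous array, which is an assumption: for non-contiguous input, ravel() may have to copy as well.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

flat_copy = data.flatten()  # always a copy
flat_view = data.ravel()    # a view for a contiguous array like this one

print(np.shares_memory(flat_copy, data))  # False
print(np.shares_memory(flat_view, data))  # True

# Changing the copy leaves data untouched; changing the view does not.
flat_copy[0] = 100
flat_view[1] = 200
print(data[0].tolist())  # [1, 200, 3]
```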

define a 3d numpy array with a range in column-wise format

I want to define a 3d numpy array with shape = [3, 3, 5] whose values are a range starting at 11 with step = 3, filled in a column-wise manner. I mean:
B[:,:,0] = [[ 11, 20, 29],
[ 14, 23, 32],
[17, 26, 35]]
B[:,:,1] = [[ 38, ...],
[ 41, ...],
[ 44, ...]]
...
I am new to numpy and suspect it can be done with np.arange or np.mgrid, but I don't know how.
How can this be done with minimal lines of code?
You can calculate the end of the range by multiplying the number of elements (the product of the shape) by the step and adding the start. Then it's just a reshape and a transpose to move the axes around:
import numpy as np
start = 11
step = 3
shape = [5, 3, 3]
end = np.prod(shape) * step + start
B = np.arange(start, end, step).reshape(shape).transpose(2, 1, 0)
B[:, :, 0]
# array([[11, 20, 29],
# [14, 23, 32],
# [17, 26, 35]])
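An alternative sketch that avoids the transpose: reshape accepts order="F" (Fortran order), which fills the first axis fastest and so gives the column-wise layout directly for the target shape. This is a different route from the answer above, shown here only for comparison:

```python
import numpy as np

start, step = 11, 3
shape = (3, 3, 5)

n = int(np.prod(shape))  # total number of elements
values = np.arange(start, start + n * step, step)

# order="F" lets the first index vary fastest, i.e. columns are filled first.
B = values.reshape(shape, order="F")

print(B[:, :, 0])
# [[11 20 29]
#  [14 23 32]
#  [17 26 35]]
```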
