The elementwise square of subset of numpy array - python

I have the following numpy array
np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I want to square the second and third columns elementwise while leaving the first column unchanged, yielding the result:
np.array([[1, 4, 9],
[4, 25, 36],
[7, 64, 81]])
What I have tried
I extracted the first column, then extracted the second and third columns and squared them using the numpy.square function:
arr1 = arr[:, 0]
arr2 = np.square(arr[:, 1:])
and then concatenated them
np.c_[arr1, arr2]
Is there a single-step solution?

Select every column except the first with [:, 1:] and square that slice in place:
arr[:, 1:] **= 2
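As a minimal sketch of what this one-liner does, the example below applies it to a copy of the array from the question and compares it with a version built without in-place assignment. The names `squared` and `alt` are illustrative only. Note that `**=` modifies the array it is applied to and keeps that array's dtype.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Work on a copy so the original array stays untouched.
squared = arr.copy()
squared[:, 1:] **= 2   # square columns 1 and 2 in place

# Same result built without in-place assignment:
# keep column 0 as a 2D slice and stack the squared remainder next to it.
alt = np.hstack((arr[:, :1], arr[:, 1:] ** 2))

print(squared)
```

If the original array is no longer needed, the copy can be skipped and the slice assignment applied to `arr` directly.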

Related

How to sum specific row values together in Sparse COO matrix to reshape matrix

I have a sparse coo matrix built in python using the scipy library. An example data set looks something like this:
>>> v.toarray()
array([[1, 0, 2, 4],
[0, 0, 3, 1],
[4, 5, 6, 9]])
I would like to add the columns at index 0 and index 2 together, and the columns at index 1 and index 3 together, so the shape would change from (3, 4) to (3, 2).
However, looking at the docs, the sum function doesn't seem to support slicing of any sort. The only way I have thought of to do something like that is to loop over the matrix as an array and use numpy to get the summed values, like so:
a_col = []
b_col = []
for x in range(len(v.toarray())):
    a_col.append(np.sum(v.toarray()[x, [0, 2]], axis=0))
    b_col.append(np.sum(v.toarray()[x, [1, 3]], axis=0))
Then use those values for a_col and b_col to create the matrix again.
But surely there should be a way to handle it with the sum method?
You can add the values with a simple loop and 2D slicing, and then take the columns you want:
v = np.array([[1, 0, 2, 4],
[0, 0, 3, 1],
[4, 5, 6, 9]])
for i in range(2):
    v[:, i] = v[:, i] + v[:, i+2]
print(v[:, :2])
Output
[[ 3 4]
[ 3 1]
[10 14]]
You can also use csr_matrix.dot with a selection matrix to achieve the same result (here csr is the matrix in CSR format):
csr = csr_matrix(csr.dot(np.array([[1,0,1,0],[0,1,0,1]]).T))
#csr.data
#[ 3, 4, 3, 1, 10, 14]
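The selection-matrix idea can be sketched end to end as follows. This is only an illustration under the assumption that SciPy is installed; the names `selector` and `summed` are made up for the example. Multiplying by a sparse selector keeps the result sparse, so the matrix never needs to be converted to a dense array.

```python
import numpy as np
from scipy import sparse

v = sparse.coo_matrix(np.array([[1, 0, 2, 4],
                                [0, 0, 3, 1],
                                [4, 5, 6, 9]]))

# Selection matrix of shape (4, 2): output column 0 collects input
# columns 0 and 2, output column 1 collects input columns 1 and 3.
selector = sparse.csr_matrix(np.array([[1, 0],
                                       [0, 1],
                                       [1, 0],
                                       [0, 1]]))

summed = v.tocsr() @ selector   # sparse result of shape (3, 2)
print(summed.toarray())
```

Each column of `selector` is a 0/1 indicator of which input columns are added together, so other groupings only require a different selector.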

Split numpy 2D array based on separate label array

I have a 2D numpy array A. For example:
A = np.array([[1, 2],
[3, 4],
[5, 6],
[7, 8],
[9, 0]])
I have another label array B corresponding to rows of A. For example:
B = np.array([0, 1, 2, 0, 1])
I want to split A into 3 arrays based on their labels, so the result would be:
[[[1, 2],
[7, 8]],
[[3, 4],
[9, 0]],
[[5, 6]]]
Are there any numpy built in functions to achieve this?
Right now, my solution is rather ugly and involves repeatedly calling numpy.where in a for-loop, and slicing the index tuples to contain only the rows.
Here's one way to do it:
hstack both arrays together,
sort the array by the last column,
split the array at the first index of each unique value.
a = np.hstack((A,B[:,None]))
a = a[a[:, -1].argsort()]
a = np.split(a[:,:-1], np.unique(a[:, -1], return_index=True)[1][1:])
OUTPUT:
[array([[1, 2],
[7, 8]]),
array([[3, 4],
[9, 0]]),
array([[5, 6]])]
If the output can always be an array because the labels are equally distributed, you only need to sort the data by label and reshape:
idx = B.argsort()
n = np.flatnonzero(np.diff(B[idx]))[0] + 1  # number of rows per label
result = A[idx].reshape(A.shape[0] // n, n, A.shape[1])
If the labels aren't equally distributed, you'll have to make a list in the outer dimension:
_, indices, counts = np.unique(B, return_counts=True, return_inverse=True)
result = np.split(A[indices.argsort()], counts.cumsum()[:-1])
Using the equivalent of np.where is not very efficient, but you can do it without a loop:
b, idx = np.unique(B, return_inverse=True)
mask = idx[:, None] == np.arange(b.size)
result = np.split(A[idx.argsort()], np.count_nonzero(mask, axis=0).cumsum()[:-1])
You can compute the mask simultaneously for all the labels, count the number of matching elements in each category (np.count_nonzero(mask, axis=0)), and use the cumulative sum of those counts as split points for the sorted A (A[idx.argsort()]). The last index is stripped off the cumulative sum because np.split always adds an implicit total index.
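The sort-then-split idea can be wrapped in a small helper, sketched below. `split_by_label` is a hypothetical name and not a NumPy function. It assumes a stable sort is wanted so that rows keep their original relative order within each group.

```python
import numpy as np

def split_by_label(A, B):
    """Group the rows of A by the labels in B (hypothetical helper)."""
    # A stable sort keeps rows with equal labels in their original order.
    order = np.argsort(B, kind="stable")
    labels, counts = np.unique(B, return_counts=True)
    # Cumulative counts mark where each group ends; the final total is
    # dropped because np.split already ends the last piece at the array end.
    return labels, np.split(A[order], np.cumsum(counts)[:-1])

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 0]])
B = np.array([0, 1, 2, 0, 1])

labels, groups = split_by_label(A, B)
for label, group in zip(labels, groups):
    print(label, group.tolist())
```

Returning the labels alongside the groups makes it easy to tell which piece belongs to which label when the labels are not simply 0, 1, 2.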
You could also use Pandas for this because it's designed for labelled data and has a powerful groupby method.
import pandas as pd
index = pd.Index(B, name='label')
df = pd.DataFrame(A, index=index)
groups = {k: v.values for k, v in df.groupby('label')}
print(groups)
This produces a dictionary of arrays of the grouped values:
{0: array([[1, 2],
[7, 8]]), 1: array([[3, 4],
[9, 0]]), 2: array([[5, 6]])}
For a list of the arrays you can do this instead:
groups = [v.values for k, v in df.groupby('label')]
This is probably the simplest way:
groups = [A[B == label, :] for label in np.unique(B)]
print(groups)
Output:
[array([[1, 2],
[7, 8]]), array([[3, 4],
[9, 0]]), array([[5, 6]])]

Python numpy 2D array sum over certain indices

There is a 2-d array like this:
img = [
[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
[[2, 2, 2], [3, 2, 3], [6, 7, 6]],
[[9, 8, 1], [9, 8, 3], [9, 8, 5]]
]
And I just want to get the sum over certain indices, like these:
indices = [[0, 0], [0, 1]] # which means img[0][0] and img[0][1]
There was a similar question about a 1-D array on Stack Overflow in this link, but I got an error when I tried to use print(img[indices]). To be clear, I want to select the elements of img that indices points to, and then get their sum.
Expected output
[5, 7, 9]
Use NumPy:
import numpy as np
img = np.array(img)
img[tuple(indices)].sum(axis = 0)
#array([5, 7, 9])
If the expected result is [5, 7, 9], i.e. the column-wise sum of the selected rows, then it is easy. Wrap indices in a tuple so that each inner list indexes a separate axis:
img = np.asarray(img)
indices = [[0, 0], [0, 1]]
img[tuple(indices)].sum(axis = 0)
Result:
array([5, 7, 9])
When you supply a fancy index, each element of the index tuple represents a different axis. The shape of the index arrays broadcasts to the shape of the output you get.
In your case, the rows of indices.T are the indices in each axis (this assumes indices has been converted to an array, for example with indices = np.asarray(indices)). You can convert them into an index tuple and append slice(None), which is the programmatic equivalent of :. You can then take the sum of the resulting 2D array directly:
img[tuple(indices.T) + (slice(None),)].sum(0)
Another way is to use the splat operator:
img[(*indices.T, slice(None))].sum(0)
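The question's indices happen to equal their own transpose, which hides why the transpose matters. The sketch below uses a different, made-up set of positions (`pairs` is an illustrative name) to show how a list of (row, column) pairs becomes one index array per axis.

```python
import numpy as np

img = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[2, 2, 2], [3, 2, 3], [6, 7, 6]],
    [[9, 8, 1], [9, 8, 3], [9, 8, 5]],
])

# Each row of `pairs` is one (row, column) position in `img`.
pairs = np.array([[0, 1], [2, 0]])

rows, cols = pairs.T            # rows = [0, 2], cols = [1, 0]
selected = img[rows, cols]      # stacks img[0, 1] and img[2, 0]
total = selected.sum(axis=0)

# The indices from the question give the expected [5, 7, 9].
question_total = img[tuple(np.array([[0, 0], [0, 1]]).T)].sum(axis=0)

print(total, question_total)
```

Indexing with `img[rows, cols]` leaves the last axis untouched, which is the same as appending `slice(None)` explicitly.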

Which way are rows and columns in a numpy 2-d array used as a matrix?

When using a numpy array as a matrix, in which order are rows and columns?
For example:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Is [1, 2, 3] the first row or the first column?
I cannot find this information in the documentation, perhaps because the answer is too obvious.
[1, 2, 3] is the first row.
The examples in the numpy ndarray documentation actually give you some hints:
>>> x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
>>> # The element of x in the *second* row, *third* column, namely, 6.
>>> x[1, 2]
6
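A quick way to check the convention yourself is to index the array from the question along each axis. In this small sketch the variable names are illustrative; the first index selects the row and the second selects the column.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_row = x[0]        # index along axis 0 picks a row
first_col = x[:, 0]     # index along axis 1 picks a column

print(first_row)        # [1 2 3]
print(first_col)        # [1 4 7]
print(x.shape)          # (number of rows, number of columns)
```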

Modify different columns in each row of a 2D NumPy array

I have the following problem:
Let's say I have an array defined like this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What I would like to do is to make use of Numpy multiple indexing and set several elements to 0. To do that I'm creating a vector:
indices_to_remove = [1, 2, 0]
What I want it to mean is the following:
Remove element with index '1' from the first row
Remove element with index '2' from the second row
Remove element with index '0' from the third row
The result should be the array [[1,0,3],[4,5,0],[0,8,9]]
I've managed to get the values of the elements I would like to modify with the following code:
values = np.diagonal(np.take(A, indices_to_remove, axis=1))
However, that doesn't allow me to modify them. How could this be solved?
You could use integer array indexing to assign those zeros -
A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
Sample run -
In [445]: A
Out[445]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [446]: indices_to_remove
Out[446]: [1, 2, 0]
In [447]: A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
In [448]: A
Out[448]:
array([[1, 0, 3],
[4, 5, 0],
[0, 8, 9]])
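The same row/column pairing also works for reading the selected values, which replaces the np.diagonal(np.take(...)) construction from the question. The sketch below additionally shows np.put_along_axis as one possible alternative for the assignment. The names `rows`, `values` and `B` are illustrative, and the work is done on a copy so that A stays intact.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
indices_to_remove = np.array([1, 2, 0])

# Pair row i with column indices_to_remove[i] to read one value per row.
rows = np.arange(A.shape[0])
values = A[rows, indices_to_remove]

# Alternative assignment: put_along_axis expects the index array to have
# the same number of dimensions as the target, hence the [:, None].
B = A.copy()
np.put_along_axis(B, indices_to_remove[:, None], 0, axis=1)

print(values)
print(B)
```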
