How can I merge rows in np matrix? - python

I've got a numpy matrix that has 2 rows and N columns, e.g. (if N=4):
[[ 1 3 5 7]
[ 2 4 6 8]]
The goal is to create the string 1,2,3,4,5,6,7,8.
Merge the rows such that the elements from the first row take the odd positions (1, 3, ..., 2N - 1) (the index starts from 1) and the elements from the second row take the even positions (2, 4, ..., 2N).
The following code works but it isn't really nice:
xs = []
for i in range(number_of_cols):
    xs.append(nums.item(0, i))
ys = []
for i in range(number_of_cols):
    ys.append(nums.item(1, i))
nums_str = ""
for i in range(number_of_cols):
    nums_str += '{},{},'.format(xs[i], ys[i])
Join the result list with a comma as a delimiter (','.join(row))
How can I merge the rows using built in functions (or just in a more elegant way overall)?

Specify F order when flattening (or ravel):
In [279]: arr = np.array([[1,3,5,7],[2,4,6,8]])
In [280]: arr
Out[280]:
array([[1, 3, 5, 7],
[2, 4, 6, 8]])
In [281]: arr.ravel(order='F')
Out[281]: array([1, 2, 3, 4, 5, 6, 7, 8])
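To get from the flattened array to the comma-separated string the question asks for, something like the following should work. This is a minimal sketch; the names interleaved and nums_str are only illustrative.

```python
import numpy as np

nums = np.array([[1, 3, 5, 7], [2, 4, 6, 8]])

# Column-major (Fortran) order walks down each column first,
# which interleaves the two rows: 1, 2, 3, 4, ...
interleaved = nums.ravel(order='F')

# str() on each element, then join with commas
nums_str = ','.join(map(str, interleaved))
print(nums_str)  # 1,2,3,4,5,6,7,8
```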

Joining rows can be done this way :
>>> a = np.arange(12).reshape(3,4)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.hstack([a[i,:] for i in range(a.shape[0])])
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Then it's simple to convert this array into string.

Here's one way of doing it:
out_str = ','.join(nums.T.ravel().astype('str'))
We first transpose the array with .T, then flatten it with .ravel(), then convert each element from int to str, and finally apply ','.join() to combine all the str elements.
Trying it out:
import numpy as np
nums = np.array([[1,3,5,7],[2,4,6,8]])
out_str = ','.join(nums.T.ravel().astype('str'))
print (out_str)
Result:
1,2,3,4,5,6,7,8

Related

How to sum a numpy along the row axis by including only certain values per row according to variable length indices?

For the following array:
array = [
[1, 5, 6, 8, 10, 3],
[3, 2, 4, 9, 11, 7],
[8, 0, 9, 6, 23, 4]
]
How could we sum the elements (per row) as indicated by these indices:
indices = [
[2, 4, 5],
[1, 3],
[4]
]
that is to say that:
for the first row only the values on indices [2, 4, 5] will be considered when summing up -> (6 + 10 + 3)
for the second row only the values on indices [1, 3] will be considered when summing up -> (2 + 9)
and so on
Output:
array([19, 11, 23])
The output has the same shape as if we did array.sum(axis=1) but not every value is included. Instead, the participants of each row are determined by the indices array.
I have thought of creating a mask for that purpose, but I did not know how to pass the indices to it.
Try this:
arr = np.array(array)
out = np.array([arr[idx, ind].sum() for idx, ind in enumerate(indices)])
out
Output : array([19, 11, 23])
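Since the question mentions a mask, here is a rough sketch of that route using the same data. The names rows, cols and mask are illustrative and not part of the answer above. The idea is to turn the ragged index lists into flat (row, column) pairs and set those positions to True.

```python
import numpy as np

arr = np.array([
    [1, 5, 6, 8, 10, 3],
    [3, 2, 4, 9, 11, 7],
    [8, 0, 9, 6, 23, 4],
])
indices = [[2, 4, 5], [1, 3], [4]]

# Row number repeated once per index in that row: [0, 0, 0, 1, 1, 2]
rows = np.repeat(np.arange(len(indices)), [len(ind) for ind in indices])
# All column indices flattened: [2, 4, 5, 1, 3, 4]
cols = np.concatenate(indices)

mask = np.zeros(arr.shape, dtype=bool)
mask[rows, cols] = True

# Zero out everything outside the mask, then sum each row
out = np.where(mask, arr, 0).sum(axis=1)
print(out)  # [19 11 23]
```

Building rows and cols still loops over the Python lists once, but the summation itself is a single vectorized call.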

Shift values in numpy array by differing amounts

I have an array a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9]) that continues like that. I would like to shift this array but I'm not sure if I can use np.roll() here.
The array I would like to produce is [0, 0, 0, 2, 2, 3, 15, 15, 7].
As you can see, the first like numbers which are in array a (in this case the three '2's) should be replaced with '0's. Everything should then be shifted such that the '3's are replaced with '2's, the '15' is replaced with the '3' etc. Ideally I would like to do this operation without any for loop as I need it to run quickly.
I realise this operation may be a bit confusing so please ask questions.
If you want to stick with NumPy, you can achieve this using np.unique by returning the counts per unique elements with the return_counts option.
Then, simply roll the values and construct a new array with np.repeat:
>>> s, i, c = np.unique(a, return_index=True, return_counts=True)
(array([ 2, 3, 7, 9, 15]), array([0, 3, 6, 8, 5]), array([3, 2, 2, 1, 1]))
The three outputs are, respectively: the sorted unique elements, the index of the first occurrence of each unique element, and the count per unique element.
np.unique sorts the values, so we first need to restore the original order of both the values and the counts. We can then shift the values with np.roll:
>>> idx = np.argsort(i)
>>> v = np.roll(s[idx], 1)
>>> v[0] = 0
array([ 0, 2, 3, 15, 7])
Alternatively with np.append, this requires a whole copy though:
>>> v = np.append([0], s[idx][:-1])
array([ 0, 2, 3, 15, 7])
Finally reassemble:
>>> np.repeat(v, c[idx])
array([ 0, 0, 0, 2, 2, 3, 15, 15, 7])
Another, more general solution also works when a value recurs later in a (for example [1, 1, 2, 1]). It relies on np.diff.
You can get the indices of the elements with:
>>> i = np.diff(np.append(a, [0])).nonzero()[0] + 1
array([3, 5, 6, 8, 9])
>>> idx = np.append([0], i)
array([0, 3, 5, 6, 8, 9])
The shifted values are then obtained by indexing a zero-prepended copy of a with idx:
>>> v = np.append([0], a)[idx]
array([ 0, 2, 3, 15, 7, 9])
And the counts per element with:
>>> c = np.append(np.diff(i, prepend=0), [0])
array([3, 2, 1, 2, 1, 0])
Finally, reassemble:
>>> np.repeat(v, c)
array([ 0, 0, 0, 2, 2, 3, 15, 15, 7])
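The run-based idea above can be wrapped in one function. This is only a sketch under the assumption that "shift" means each run of equal values takes the value of the previous run, with the first run filled by a placeholder. The name shift_runs and the fill parameter are hypothetical.

```python
import numpy as np

def shift_runs(a, fill=0):
    """Replace each run of equal values with the previous run's value."""
    a = np.asarray(a)
    if a.size == 0:
        return a.copy()
    # Index where each run starts (position 0 always starts a run)
    starts = np.flatnonzero(np.r_[True, a[1:] != a[:-1]])
    # Length of each run
    lengths = np.diff(np.r_[starts, a.size])
    # Value of the previous run; the first run gets `fill`
    values = np.r_[fill, a[starts[:-1]]]
    return np.repeat(values, lengths)

a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
print(shift_runs(a))  # [ 0  0  0  2  2  3 15 15  7]
```

Comparing neighbours with != rather than appending a sentinel 0 means the function does not depend on the last value of a being nonzero.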
This is not using numpy, but one approach that comes to mind is to use itertools.groupby to collect contiguous runs of the same elements. Then shift all the elements (by prepending a 0) and use the counts to repeat them.
from itertools import chain, groupby
def shift(data):
    values = [(k, len(list(g))) for k,g in groupby(data)]
    keys = [0] + [i[0] for i in values]
    reps = [i[1] for i in values]
    return list(chain.from_iterable([[k]*rep for k, rep in zip(keys, reps)]))
For example
>>> a = np.array([2,2,2,3,3,15,7,7,9])
>>> shift(a)
[0, 0, 0, 2, 2, 3, 15, 15, 7]
You can try this code:
import numpy as np
a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
diff_a=np.diff(a)
idx=np.flatnonzero(diff_a)
val=diff_a[idx]
val=np.insert(val[:-1],0, a[0]) #update value
diff_a[idx]=val
res=np.append([0],np.cumsum(diff_a))
print(res)
You can try this:
import numpy as np
a = np.array([2, 2, 2, 3, 3, 15, 7, 7, 9])
z = a - np.pad(a, (1,0))[:-1]
z[m] = np.pad(z[(m := z!=0)], (1,0))[:-1]
print(z.cumsum())
It gives:
[ 0 0 0 2 2 3 15 15 7]

Numpy concatenate lists where first column is in range n

I am trying to select all rows in a numpy matrix named matrix with shape (25323, 9), where the values of the first column are inside the range of start and end for each tuple on the list range_tuple. Ultimately, I want to create a new numpy matrix with the result where final has a shape of (n, 9). The following code returns this error: TypeError: only integer scalar arrays can be converted to a scalar index. I have also tried initializing final with numpy.zeros((1,9)) and used np.concatenate but get similar results. I do get a compiled result when I use final.append(result) instead of using np.concatenate but the shape of the matrix gets lost. I know there is a proper solution to this problem, any help would be appreciated.
final = []
for i in range_tuples:
    copy = np.copy(matrix)
    start = i[0]
    end = i[1]
    result = copy[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate(final, result)
final = np.matrix(final)
In [33]: arr
Out[33]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]])
In [34]: tups = [(0,6),(3,12),(9,10),(15,14)]
In [35]: alist=[]
    ...: for start, stop in tups:
    ...:     res = arr[(arr[:,0]<stop)&(arr[:,0]>=start), :]
    ...:     alist.append(res)
    ...:
Check the list; note that the elements differ in shape; some have 1 or 0 rows. It's a good idea to test these edge cases.
In [37]: alist
Out[37]:
[array([[0, 1, 2],
[3, 4, 5]]), array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]]), array([[ 9, 10, 11]]), array([], shape=(0, 3), dtype=int64)]
vstack joins them:
In [38]: np.vstack(alist)
Out[38]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[ 9, 10, 11]])
Here concatenate also works, because default axis is 0, and all inputs are already 2d.
Try the following
final = np.empty((0,9))
for start, end in range_tuples:
    result = matrix[(matrix[:,0] < end) & (matrix[:,0] > start)]
    final = np.concatenate((final, result))
Two things changed. First, final is initialized as a numpy array with zero rows. Second, the arrays are passed to concatenate as a single sequence (a tuple or list), see docs. In your code, result was interpreted as the value of the axis parameter.
Notes
I used tuple unpacking to make the loop clearer.
The copy is not needed.
Appending to a Python list can be faster than concatenating inside the loop. The final result can afterwards be obtained through reshaping, if result is always of the same length.
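The last note can be sketched as follows: collect the per-range selections in a list and concatenate once after the loop. The small matrix and range_tuples here are stand-in data (the real matrix has shape (25323, 9)), and pieces is an illustrative name.

```python
import numpy as np

# Stand-in data; first column holds 0, 3, 6, ..., 21
matrix = np.arange(24).reshape(8, 3)
range_tuples = [(0, 6), (3, 12), (9, 10)]

pieces = []
first_col = matrix[:, 0]
for start, end in range_tuples:
    # Rows whose first column lies strictly between start and end
    pieces.append(matrix[(first_col > start) & (first_col < end)])

# One concatenate at the end; fall back to an empty (0, ncols) array
# if there were no ranges at all
final = np.concatenate(pieces) if pieces else np.empty((0, matrix.shape[1]))
print(final.shape)  # (3, 3)
```

A range that matches nothing contributes an array with zero rows, which concatenate handles without special casing.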
I would simply create a boolean mask to select rows that satisfy required conditions.
EDIT: I missed that you are working with matrix (as opposed to ndarray). The answer was edited for matrix.
Assume following input data:
matrix = np.matrix([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuple = [(0, 2), (1, 4), (1, 9), (5, 9), (0, 100)]
Then, first, I would convert range_tuple to a numpy.matrix:
range_mat = np.matrix(range_tuple)
Now, create the mask:
mask = np.ravel((matrix[:, 0] > range_mat[:, 0]) & (matrix[:, 0] < range_mat[:, 1]))
Apply the mask:
final = matrix[mask] # or matrix[mask].copy() if you intend to modify matrix
To check:
print(final)
[[1 2 3]
[2 1 7]
[8 9 0]]
If length of range_tuple can be different from the number of rows in the matrix, then do this:
n = min(range_mat.shape[0], matrix.shape[0])
mask = np.pad(
    np.ravel(
        (matrix[:n, 0] > range_mat[:n, 0]) & (matrix[:n, 0] < range_mat[:n, 1])
    ),
    (0, matrix.shape[0] - n)
)
final = matrix[mask]
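The mask above pairs the i-th row with the i-th range. If the intent is instead to keep every row whose first column falls inside any of the ranges, as in the loop from the question, broadcasting is one option. This is a sketch with made-up data and plain ndarrays rather than np.matrix. Unlike the loop, a row that matches several ranges appears only once in the result.

```python
import numpy as np

# Made-up example data, not from the question
matrix = np.array([[1, 2, 3], [5, 6, 7], [2, 1, 7], [3, 4, 5], [8, 9, 0]])
range_tuples = [(0, 2), (4, 9)]

ranges = np.asarray(range_tuples)          # shape (n_ranges, 2)
first = matrix[:, 0][:, None]              # shape (n_rows, 1)

# inside[r, k] is True when row r lies strictly inside range k
inside = (first > ranges[:, 0]) & (first < ranges[:, 1])

# Keep rows that fall inside at least one range
final = matrix[inside.any(axis=1)]
print(final)
```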

Averaging over n elements

I have a numpy array like this [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Let's assume I want the average of 3 elements, my target array should then look like this:
[2, 2, 2, 5, 5, 5, 8, 8, 8, 10]
Notice that when there is no triplet available I want to calculate the average over the remaining elements
Is there a neat way to do that with array operations?
You could reshape the array to use the mean function, for example:
a = np.arange(1,11)
b = a[:a.size//3*3]
b.shape = (-1,3)
c = np.mean(b, axis=1)
# c == array([2., 5., 8.])
Then reassign the results in the original array:
c.shape = (-1,1) # i.e. (len(b), 1)
b[:] = c
print(a)
# [ 2  2  2  5  5  5  8  8  8 10]
Note that this works because b is a view of a. Also, the last element is left untouched rather than averaged as you asked, but that is easy to fix, e.g.:
a[9:] = np.mean(a[9:])
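A variant that handles the leftover elements without a separate step could use np.add.reduceat. This is a sketch and not part of the answer above. The name block_mean is hypothetical, and the result is returned as a new float array instead of being written back into a.

```python
import numpy as np

def block_mean(a, n):
    """Replace each block of n elements (and the shorter tail) by its mean."""
    a = np.asarray(a, dtype=float)
    starts = np.arange(0, a.size, n)           # start index of each block
    sums = np.add.reduceat(a, starts)          # sum of each block
    counts = np.diff(np.r_[starts, a.size])    # block sizes; the last may be < n
    return np.repeat(sums / counts, counts)

print(block_mean(np.arange(1, 11), 3).tolist())
# [2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 8.0, 8.0, 8.0, 10.0]
```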
I have done most of it in a one liner just for fun :D
*Notice I'm using sum() to flatten the list.. (that's some weird python trick)
def custom_avg(group: int, arr):
    # one inner list per chunk: the chunk mean repeated len(chunk) times
    out = [[np.mean(arr[i:i+group])] * len(arr[i:i+group]) for i in range(0, len(arr), group)]
    return sum(out, [])
Enjoy! good luck.

how to delete multiple rows in an array

I have to delete the last three rows of an array. It was a list, but I had to convert it into an array so that I can use the np.delete function.
I tried the np.delete function. It deletes column-wise instead of row-wise.
I want to delete rows, not columns. When I change the axis to 1, it gives the error message AxisError: axis 1 is out of bounds for array of dimension 1
My call is featureStr2=np.delete(f, slice(3,-1), axis=0). I want to delete the last 3 rows. The array looks like this:
1 2 3 4 5
6 7 8 9 20
11 23 54 6 7
2 3 4 5 6 7
1 2 3 4 5
Out put of the code is. I want output to delete last 3 rows.
Don't delete in numpy. Deleting triggers a reallocation, which is expensive. The cheap (proper) solution is to just create a view using indexing:
arr = arr[:-3, ...]
You can drop rows with pandas by using drop, either with row labels or with positions taken from the index.
import numpy as np
import pandas as pd
df = ([1,2,3,4,5], [6,7,8,9,20],[11,23,54,6,7],[2,3,4,5,6,7],[1,2,3,4,5])
series = pd.DataFrame(df)
Using drop with row labels:
series1 = series.drop([2,3,4])
print(series1)
Using positions from the index:
series1 = series.drop(series.index[[2,3,4]])
print(series1)
What you need are the obj and axis arguments:
Syntax: numpy.delete(arr, obj, axis=None)
obj: the row or column number, or a sequence of indices, to delete
axis: 0 for rows and 1 for columns
E.g., I'm assuming your array looks like this:
a = np.array([[1,2,3,4,5], [2,4,5,6,7], [3,4,5,6,7], [5,7,8,9,1]])
>>> np.delete(a, [2,3], axis=0)
array([[1, 2, 3, 4, 5],
[2, 4, 5, 6, 7]])
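P.S. Older NumPy versions (before 1.19) ignored negative indices passed to np.delete in a sequence, while newer versions count them from the end. To stay safe across versions, I suggest you compute the non-negative indices of the rows you want to delete first and then pass them as obj to np.delete().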
The simplest way is to just use basic indexing:
>>> import numpy as np
>>> arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 20], [11, 23, 54, 6, 7],
...                 [2, 3, 4, 6, 7], [1, 2, 3, 4, 5]])
>>> arr[:-3]
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 20]])
In NumPy versions before 1.19, np.delete(arr, obj, axis=None) ignored negative indices passed as a sequence in its obj argument; newer versions count them from the end.
Also, if the array is large, supplying the index of every row, column or element to be deleted becomes tedious.
>>> np.delete(arr, [2,3,4], axis=0)
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 20]])
But by using np.s_ you can supply a slice to the function
>>> np.delete(arr, np.s_[2:5], axis=0)
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 20]])
You can supply negative indexing to np.s_
>>> np.delete(arr, np.s_[-3:], axis=0)
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 20]])
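The two approaches give the same rows but differ in memory behaviour, which a short check can illustrate. This is a sketch; n, view and copied are illustrative names.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 20], [11, 23, 54, 6, 7],
                [2, 3, 4, 6, 7], [1, 2, 3, 4, 5]])
n = 3  # number of trailing rows to drop; must be > 0, since arr[:-0] is empty

view = arr[:-n]                               # basic slicing: no data copied
copied = np.delete(arr, np.s_[-n:], axis=0)   # np.delete: builds a new array

print(np.array_equal(view, copied))    # True
print(np.shares_memory(view, arr))     # True
print(np.shares_memory(copied, arr))   # False
```

Because view shares memory with arr, writing to it also changes arr; use arr[:-n].copy() if an independent array is needed.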
