How to extract columns from an indexed matrix? - python

I have the following matrix:
M = np.matrix([[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
And I receive a vector indexing the columns of the matrix:
index = np.array([1,1,2,2,2,2,3,4,4,4])
This vector has 4 different values, so my objective is to create a list containing four new matrices so that the first matrix is made by the first two columns of M, the second matrix is made by columns 3 to 6 and so on:
M1 = np.matrix([[1,2],[11,12],[21,22]])
M2 = np.matrix([[3,4,5,6],[13,14,15,16],[23,24,25,26]])
M3 = np.matrix([[7],[17],[27]])
M4 = np.matrix([[8,9,10],[18,19,20],[28,29,30]])
l = [M1, M2, M3, M4]
I need to do this in an automated way, since the number of rows and columns of M as well as the indexing scheme are not fixed. How can I do this?

There are 3 points to note:
For a variable number of variables, as in this case, the recommended solution is to use a dictionary.
You can use simple numpy indexing for the individual case.
Unless you have a very specific reason, use numpy.array instead of numpy.matrix.
Combining these points, you can use a dictionary comprehension:
d = {k: np.array(M[:, np.where(index==k)[0]]) for k in np.unique(index)}
Result:
{1: array([[ 1, 2],
[11, 12],
[21, 22]]),
2: array([[ 3, 4, 5, 6],
[13, 14, 15, 16],
[23, 24, 25, 26]]),
3: array([[ 7],
[17],
[27]]),
4: array([[ 8, 9, 10],
[18, 19, 20],
[28, 29, 30]])}
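If a list is preferred over a dictionary, as in the question, the same idea can be sketched with a boolean mask per label. The helper name `split_columns` is hypothetical, and the sketch assumes `index` has one entry per column of `M`:

```python
import numpy as np

def split_columns(M, index):
    """Return a list of column blocks of M, one per distinct label in index."""
    M = np.asarray(M)          # plain ndarray, so slices stay 2-D arrays
    index = np.asarray(index)
    # A boolean mask per label selects the matching columns.
    return [M[:, index == k] for k in np.unique(index)]

M = np.arange(1, 31).reshape(3, 10)   # same values as the matrix in the question
index = np.array([1, 1, 2, 2, 2, 2, 3, 4, 4, 4])
blocks = split_columns(M, index)
print([b.shape for b in blocks])
```

Because the mask is built from the labels themselves, this does not require equal labels to be contiguous or to start at 1.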

import numpy as np
M = np.matrix([[1,2,3,4,5,6,7,8,9,10],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30]])
index = np.array([1,1,2,2,2,2,3,4,4,4])
m = [[], [], [], []]  # one list of column positions per label
for i, c in enumerate(index):
    m[c - 1].append(i)  # store column position i under its label c
for idx in m:
    print(M[:, idx])
This is a little hard-coded: it assumes you always want 4 matrices, with labels 1 to 4. You can generalise it as needed.

Related

How to use numpy to add each row of a matrix to every row of another matrix

Here is the example:
import numpy as np
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[7,8],[9,10],[11,12],[12,13]])
I want to add each row of a to every row of b and then sum the result. For example, [1,2] is added to every row of b: 1+7=8, 2+8=10, 8+10=18; 1+9=10, 2+10=12, 10+12=22, and so on. The result should look like [[18,22,26,28...],[22,26,....],[26,30....]]
How can I do that? I know numpy can be more efficient than a loop, but how do I express this calculation with matrices?
I believe this does what you want:
>>> a = np.array([[1,2],[3,4],[5,6]])
>>> b = np.array([[7,8],[9,10],[11,12],[12,13]])
>>> np.sum(a, axis=1)[:,None] + np.sum(b, axis=1)[None,:]
array([[18, 22, 26, 28],
[22, 26, 30, 32],
[26, 30, 34, 36]])
You can use list comprehensions:
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12], [12, 13]])
[[sum(i) + sum(j) for j in b] for i in a]
Output:
[[18, 22, 26, 28], [22, 26, 30, 32], [26, 30, 34, 36]]
This can also be done succinctly as follows:
np.einsum('ijk-> ij', a[:,None,:]+b)
Let me explain each step. It combines the einsum and broadcasting concepts of numpy.
Step 1- a[:,None,:] reshapes matrix a to shape (3,1,2). This mid axis with value 1 is helpful for broadcasting.
Step 2- a[:,None,:] + b broadcasts a and adds matrix b to get a resultant matrix of shape (3,4,2).
Step 3- np.einsum('ijk-> ij', a[:,None,:]+b) does sum reduction along the last axis of matrix obtained from previous step.
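The three steps can be checked against the row-sum answer with a short sketch. The intermediate names (`expanded`, `pairwise`, and so on) are only illustrative:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12], [12, 13]])

expanded = a[:, None, :]                         # Step 1: shape (3, 1, 2)
pairwise = expanded + b                          # Step 2: broadcast to shape (3, 4, 2)
via_einsum = np.einsum('ijk->ij', pairwise)      # Step 3: sum over the last axis

# Summing each row first and broadcasting the two 1-D results gives the same table
# without building the (3, 4, 2) intermediate.
via_row_sums = a.sum(axis=1)[:, None] + b.sum(axis=1)[None, :]

print(expanded.shape, pairwise.shape, via_einsum.shape)
print(np.array_equal(via_einsum, via_row_sums))
```

For large inputs the row-sum form is probably the cheaper of the two, since its intermediates are only 1-D.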

randomly split data in n groups?

I am currently trying to write code for splitting a given data into a number of groups.
The groups should be created randomly and they should encompass together the entire data.
So let's suppose there's an array A of eg. shape = (3, 3, 3) that has 27 root elements e:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I want to create n groups such that g1 & g2 & ... & gn will "add up" to the original array A.
I shuffled A as following
def shuffle(array):
    shuf = array.ravel()
    np.random.shuffle(shuf)
    return np.reshape(shuf, array.shape)
But how do I create n groups (n < e) randomly?
Thanks!
Leo
Though not so elegant, the following code spreads the array over n groups, ensuring that each group gets at least one element and distributing the rest randomly.
import numpy as np
def shuffle_and_group(array, n):
    shuf = array.ravel()
    np.random.shuffle(shuf)
    shuf = list(shuf)
    groups = []
    for i in range(n):  # ensuring no empty group
        groups.append([shuf.pop()])
    for num in shuf:  # spread the remaining
        groups[np.random.randint(n)].append(num)
    return groups
array = np.arange(15)
print(shuffle_and_group(array, 9))
In case you worry about the time, the code will have time complexity of O(e) where e is the number of elements.
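A different way to get non-empty random groups, sketched here as an alternative rather than as the method above, is to shuffle once and cut the shuffled data at random positions. The function name `random_groups` and the `seed` parameter are hypothetical, and the sketch assumes n is at most the number of elements:

```python
import numpy as np

def random_groups(array, n, seed=None):
    """Split the elements of `array` into n non-empty random groups."""
    rng = np.random.default_rng(seed)
    flat = rng.permutation(np.asarray(array).ravel())   # shuffled copy; input is untouched
    # n - 1 distinct cut positions between 1 and e - 1 guarantee n non-empty pieces.
    cuts = np.sort(rng.choice(np.arange(1, flat.size), size=n - 1, replace=False))
    return np.split(flat, cuts)

A = np.arange(27).reshape(3, 3, 3)
groups = random_groups(A, 4, seed=0)
print([len(g) for g in groups])
```

The group sizes follow a different distribution than in the loop-based version, so pick whichever matches the intended notion of "random".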

Repeat each row n times in a matrix and add a column of 1*n next to each part

I want to repeat each row of my matrix and add another column next to it.
Imagine here is my matrix
A = [[11, 12], [13, 14], [15, 16], [17, 18]]
and I want repetition of 2 times for each row, then the result will be
B = [[1, 11, 12], [2, 11, 12], [1, 13, 14], [2, 13, 14], [1, 15, 16], [2, 15, 16], [1, 17, 18], [2, 17, 18]]
I already tried below code
k = 2
B = [A] * k
which gives me a memory error in my full code.
I do not know how to use pandas, so I am using numpy.
Is there an efficient way to do this in numpy that avoids the memory error and gives the correct result?
P.S.: I didn't add my full code, as I am working with a huge dataset and this is just one small piece of it.
Instead of
k = 2
B = [A] * k
which nests k references to the whole matrix inside a new outer list, build each repeated row directly and prepend its repetition number:
k = 2
B = [[i + 1] + row for row in A for i in range(k)]  # counter starts from 1 instead of 0
Each entry of B is a new list, so the rows of A are left unchanged.
# Here is my own answer, which also solved the memory problem.
# I guess the memory problem came from exceeding an array size limit.
# I am not sure why, but the code below worked in practice.
A = np.repeat(A, k, axis=0)
AB = [[1], [2]]
AB = np.reshape(AB, (-1,1))
AB = np.tile(AB,((len(A)//k),1))
B = np.hstack((AB, A))
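The same steps can be sketched for any k by generating the counter column with np.arange instead of hard-coding [[1], [2]]. The helper name `repeat_with_counter` is hypothetical:

```python
import numpy as np

def repeat_with_counter(A, k):
    """Repeat each row of A k times and prepend a 1..k counter column."""
    A = np.asarray(A)
    repeated = np.repeat(A, k, axis=0)                         # each row k times, in place
    counter = np.tile(np.arange(1, k + 1), A.shape[0])[:, None]  # 1..k, once per original row
    return np.hstack((counter, repeated))

A = [[11, 12], [13, 14], [15, 16], [17, 18]]
B = repeat_with_counter(A, 2)
print(B)
```

The result has shape (len(A) * k, number of columns + 1), so memory use grows linearly with k.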

From a 2D array, create 2nd 2D array of Unique(non-repeated) random selected values from 1st array (values not shared among rows) without using a loop

This is a follow up on this question.
From a 2d array, create another 2d array composed of randomly selected values from original array (values not shared among rows) without using a loop
I am looking for a way to create a 2D array whose rows are randomly selected unique values (non-repeating) from another row, without using a loop.
Here is a way to do it using a loop.
pool = np.random.randint(0, 30, size=[4,5])
seln = np.empty([4,3], int)
for i in range(0, pool.shape[0]):
    seln[i] = np.random.choice(pool[i], 3, replace=False)
print('pool = ', pool)
print('seln = ', seln)
>pool = [[ 1 11 29 4 13]
[29 1 2 3 24]
[ 0 25 17 2 14]
[20 22 18 9 29]]
seln = [[ 8 12 0]
[ 4 19 13]
[ 8 15 24]
[12 12 19]]
Here is a method that does not use a loop; however, it can select the same value multiple times in each row.
pool = np.random.randint(0, 30, size=[4,5])
print(pool)
array([[ 4, 18, 0, 15, 9],
[ 0, 9, 21, 26, 9],
[16, 28, 11, 19, 24],
[20, 6, 13, 2, 27]])
# New array shape
new_shape = (pool.shape[0],3)
# Indices where to randomly choose from
ix = np.random.choice(pool.shape[1], new_shape)
array([[0, 3, 3],
[1, 1, 4],
[2, 4, 4],
[1, 2, 1]])
ixs = (ix.T + range(0,np.prod(pool.shape),pool.shape[1])).T
array([[ 0, 3, 3],
[ 6, 6, 9],
[12, 14, 14],
[16, 17, 16]])
pool.flatten()[ixs].reshape(new_shape)
array([[ 4, 15, 15],
[ 9, 9, 9],
[11, 24, 24],
[ 6, 13, 6]])
I am looking for a method that does not use a loop, and where a value selected from a row cannot be selected again.
Here is a way without explicit looping. However, it requires generating an array of random numbers of the size of the original array. That said, the generation is done using compiled code so it should be pretty fast. Ties among the random floats are essentially impossible, and even a tie would not break the selection, because argsort always returns a permutation of the column indices.
m,n = 4,5
pool = np.random.randint(0, 30, size=[m,n])
new_width = 3
mask = np.argsort(np.random.rand(m,n))<new_width
pool[mask].reshape(m, new_width)
How it works:
We generate a random array of floats and argsort it. By default, argsort works along the last axis, which is axis 1 for a 2d array, so row i of the result lists the column indices that would sort row i of the random array. Because the floats are random, each row is a random permutation of 0,...,n-1.
We then mark the entries of this array that are less than new_width. Each row contains the numbers 0,...,n-1 in a random order, so exactly new_width of them will be less than new_width. This means each row of mask will have exactly new_width entries which are True, and the rest will be False (when you use a comparison operator between an ndarray and a scalar it is applied component-wise).
Finally, the boolean mask is applied to the original data to grab new_width many entries from each row.
You could also use np.vectorize for your loop solution, although that is just shorthand for a loop.
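A closely related variant, sketched here under the assumption that NumPy 1.17 or later is available, keeps the first new_width column indices of each random permutation and gathers them with np.take_along_axis. Unlike the boolean mask, this also returns the picks in random order. The function name and `seed` parameter are hypothetical:

```python
import numpy as np

def sample_rows_without_replacement(pool, width, seed=None):
    """Pick `width` distinct positions from every row of `pool`, without a Python loop."""
    rng = np.random.default_rng(seed)
    pool = np.asarray(pool)
    # Each row of `order` is a random permutation of the column indices.
    order = np.argsort(rng.random(pool.shape), axis=1)
    # Keep the first `width` columns of each permutation and gather row by row.
    return np.take_along_axis(pool, order[:, :width], axis=1)

pool = np.arange(20).reshape(4, 5)
seln = sample_rows_without_replacement(pool, 3, seed=0)
print(seln)
```

Note that distinct positions give distinct values only if the row itself has no repeated values, which is the same behaviour as np.random.choice with replace=False.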

2D numpy argsort index returns 3D when used in the original matrix

I am trying to obtain the top 2 values from each row in a matrix using argsort. The indexing is working, as in argsort is returning the correct values. However, when I put the argsort result as an index, it returns a 3 dimensional result.
For example:
test_mat = np.matrix([[0 for i in range(5)] for j in range(5)])
for i in range(5):
    for j in range(5):
        test_mat[i, j] = i * j
test_mat[range(2,3)] = test_mat[range(2,3)] * -1
last_two = range(-1, -3, -1)
index = np.argsort(test_mat, axis=1)
index = index[:, last_k]
This gives:
index.shape
Out[402]: (5L, 5L)
test_mat[index].shape
Out[403]: (5L, 5L, 5L)
Python is new to me and I find indexing to be very confusing in general even after reading the various array manuals. I spend more time trying to get the right values out of objects than actually solving problems. I'd welcome any tips on where to properly learn what is going on. Thanks.
You can use linear indexing to solve your case, like so -
# Say A is your 2D input array
# Get sort indices for the top 2 values in each row
idx = A.argsort(1)[:,::-1][:,:2]
# Get row offset numbers
row_offset = A.shape[1]*np.arange(A.shape[0])[:,None]
# Add row offsets with top2 sort indices giving us linear indices of
# top 2 elements in each row. Index into input array with those for output.
out = np.take( A, idx + row_offset )
Here's a step-by-step sample run -
In [88]: A
Out[88]:
array([[34, 45, 16, 20, 24],
[37, 13, 49, 37, 21],
[42, 36, 35, 24, 18],
[26, 28, 21, 13, 44]])
In [89]: idx = A.argsort(1)[:,::-1][:,:2]
In [90]: idx
Out[90]:
array([[1, 0],
[2, 3],
[0, 1],
[4, 1]])
In [91]: row_offset = A.shape[1]*np.arange(A.shape[0])[:,None]
In [92]: row_offset
Out[92]:
array([[ 0],
[ 5],
[10],
[15]])
In [93]: np.take( A, idx + row_offset )
Out[93]:
array([[45, 34],
[49, 37],
[42, 36],
[44, 28]])
You can directly get the top 2 values from each row with just sorting along the second axis and some slicing, like so -
out = np.sort(A,1)[:,:-3:-1]
