Based on BFS output how to select sparse matrix elements - python

I am trying to select sparse matrix elements row-wise based on a BFS output array. Suppose my BFS output is
[1, 2, 3, 6, 4, 7, 5, 8, 11, 9, 12, 10, 13, 15, 14, 16, 17, 18, 19, 20]
and I have a 20x20 sparse matrix, for example.
Now I want to use the BFS output as row indices, select the nonzero values from the sparse matrix in the same order as the BFS output array, and plot them. Here is my code, which does part of the job but not exactly what I want.
import numpy as np
import matplotlib.pyplot as plt
a = np.loadtxt('sparsematrix.txt', float, delimiter=',') # import data
y = np.reshape(a, np.size(a))
pos = np.delete(y, np.arange(0, y.size, 19))
plt.plot(pos)
plt.xlabel('sample')
plt.ylabel('position')
The problems with the above code are:
1. It selects every value row-wise, but not in the order defined by my BFS output. (It should use the BFS output array as row index numbers to select nonzero values one by one.)
2. It selects all the values, even zeros. How do I get only the nonzero values?
3. Indexing starts from 0 and goes to 19. I want indexing to start from 1.

Important Update
Now I understand exactly what you wanted after reading Christian's answer. I wrote another function to complement the one already given. Here is the whole program:
sparseMatrix = ([0,4,5,0],[2,0,4,0],[0,3,3,0],[6,6,0,0])
iList = [3,1,2,4]
def AnalyzeSparseMatrix( sMatrix, iList ):
    orderedArray = [] #The array you want as output
    for i in iList:
        orderedArray += AnalyzeRowWise(sMatrix[i-1]) #Add the non-zero values of the selected row of the sparse matrix
    return orderedArray #Returns the non-zero values, ordered by the BFS output list
def AnalyzeRowWise( oldArray ):
    newMatrix = []
    #Code to analyze row-wise
    for data in oldArray:
        if(data != 0): #Condition
            newMatrix.append(data)
    return newMatrix
#Test program
print (AnalyzeSparseMatrix(sparseMatrix, iList)) #Output: [3, 3, 4, 5, 2, 4, 6, 6]
The new function AnalyzeSparseMatrix() takes two arguments: the first is the sparse matrix and the second is the BFS output list. It returns the desired list, which you can assign to another variable, for example:
finalOrderedList = AnalyzeSparseMatrix( sparseMatrix, iList )
The comments in the code above explain what almost every line does.
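The same selection can also be written more compactly. This is only a sketch under the same assumptions as the code above (1-based row numbers in the BFS list, a matrix stored as nested sequences); the name select_nonzero_in_order is made up for illustration and is not part of the original answer:

```python
def select_nonzero_in_order(matrix, bfs_order):
    """Return the non-zero values of the rows listed in bfs_order (1-based)."""
    # Visit the rows in BFS order and keep each row's non-zero values left to right
    return [value for i in bfs_order for value in matrix[i - 1] if value != 0]


sparse_matrix = ([0, 4, 5, 0], [2, 0, 4, 0], [0, 3, 3, 0], [6, 6, 0, 0])
bfs_order = [3, 1, 2, 4]

result = select_nonzero_in_order(sparse_matrix, bfs_order)
print(result)  # [3, 3, 4, 5, 2, 4, 6, 6]
```

The nested comprehension reads in the same order as the two loops it replaces: the outer loop walks the BFS list, the inner loop walks the chosen row.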

I've assumed that this is what you want:
bfs_output = a list of row indexes, where 1 is the first/top row of a matrix
matrix m = some matrix with elements that can be zero or non-zero
list l = a list composed of non-zero elements chosen from m
Elements from m are chosen as follows:
1. Select the row, r, in m indicated by the first/next value in bfs_output
2. Starting from the first column in r, select the non-zero elements in r
3. Append the elements chosen in step 2 to l
4. Repeat until there are no more row indexes in bfs_output
For example, with bfs_output = [2 3 1] and
matrix =
0 3 1
0 2 0
4 0 0
the result is list = [2 4 3 1].
I am not sure if this is what you are after. If it is, we can use NumPy's built-in selection functions to pick the non-zero elements from a NumPy array and to choose rows in the order we want.
from io import StringIO
import numpy as np
bfs_output = [2,3,1]
file = StringIO(u"0,3,1\n0,2,0\n4,0,0")
matrix = np.loadtxt(file, delimiter=",")
# bfs_output counts rows from 1, while NumPy indexes from 0,
# so subtract 1 from every element before selecting rows
select_rows = matrix[np.subtract(bfs_output,1)]
select_rows_1d = np.reshape(select_rows,np.size(select_rows))
# keep only the non-zero elements (named "selected" to avoid shadowing the built-in list)
selected = select_rows_1d[select_rows_1d != 0]
print(selected) # output = [2. 4. 3. 1.]

Related

Get column indices of row-wise maximum values of a 2D array (with random tie-breaking)

Given a 2D numpy array, I want to construct an array out of the column indices of the maximum value of each row. So far, arr.argmax(1) works well. However, for my specific case, for some rows, 2 or more columns may contain the maximum value. In that case, I want to select a column index randomly (not the first index as it is the case with .argmax(1)).
For example, for the following arr:
arr = np.array([
[0, 1, 0],
[1, 1, 0],
[2, 1, 3],
[3, 2, 2]
])
there can be two possible outcomes: array([1, 0, 2, 0]) and array([1, 1, 2, 0]) each chosen with 1/2 probability.
I have code that returns the expected output using a list comprehension:
idx = np.arange(arr.shape[1])
ans = [np.random.choice(idx[ix]) for ix in arr == arr.max(1, keepdims=True)]
but I'm looking for an optimized numpy solution. In other words, how do I replace the list comprehension with numpy methods to make the code feasible for bigger arrays?
Use scipy.stats.rankdata and apply_along_axis as follows.
import numpy as np
from scipy.stats import rankdata
ranks = rankdata(-arr, axis = 1, method = "min")
func = lambda x: np.random.choice(np.where(x==1)[0])
idx = np.apply_along_axis(func, 1, ranks)
print(idx)
It returns [1 0 2 0] or [1 1 2 0].
The main idea is that rankdata computes the rank of every value in each row; because the array is negated, the maximum value gets rank 1. func randomly chooses one of the indices whose rank is 1. Finally, apply_along_axis applies func to every row of ranks.
After some advice I got offline, it turns out that the maximum can be picked at random by multiplying the boolean array that flags the row-wise maximum values by a random array of the same shape. All that remains is a simple argmax(1) call.
# boolean array that flags maximum values of each row
mxs = arr == arr.max(1, keepdims=True)
# random array where non-maximum values are zero and maximum values are random values
random_arr = np.random.rand(*arr.shape) * mxs
# row-wise maximum of the auxiliary array
ans = random_arr.argmax(1)
A timeit test shows that for data of shape (507_563, 12), this code runs in ~172 ms on my machine while the loop in the question runs for 11 sec, so this is about 63x faster.
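The masking idea can be wrapped in a small function to check that every tie really is broken at random. This is a sketch with two changes of my own that are not in the answer above: it uses np.random.default_rng for reproducibility, and it draws the random keys from [0.5, 1.0) so that a maximum can never receive a key of exactly 0 and tie with the masked-out entries. The name random_argmax is hypothetical.

```python
import numpy as np


def random_argmax(arr, rng):
    """Row-wise argmax that breaks ties uniformly at random."""
    # Flag every position that holds its row's maximum
    is_max = arr == arr.max(axis=1, keepdims=True)
    # Strictly positive random keys for maxima, zero everywhere else
    keys = rng.uniform(0.5, 1.0, size=arr.shape) * is_max
    # The largest key in each row marks a randomly chosen maximum
    return keys.argmax(axis=1)


arr = np.array([
    [0, 1, 0],
    [1, 1, 0],
    [2, 1, 3],
    [3, 2, 2],
])
rng = np.random.default_rng(0)
ans = random_argmax(arr, rng)

# Every chosen column must hold the row maximum
print((arr[np.arange(len(arr)), ans] == arr.max(axis=1)).all())  # True
```

Because the keys are independent and identically distributed, each tied column is equally likely to end up with the largest key.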

Python Group Values in a Nested List At Corresponding Indexes

Given a list of NumPy arrays, I want to go through each index across the arrays and keep a count that depends on the element found there, storing the counts in a single list.
The minimal runnable example below illustrates the problem.
import numpy as np
lst = [
    np.array([1,0,1]),
    np.array([1,1,1]),
    np.array([2,2,1])
]
print(lst)
# Problem: keep a count for each corresponding index across the arrays.
# If the element is 2, subtract 1. If it is 1, add 1. A 0 contributes nothing.
# The output should be a single list with the calculated count for each index.
# Expected output:
# lst = [1, 0, 3]
# 1 at index 0: two 1s in the first position add 2, and the 2 subtracts 1
# 0 at index 1: one 1 brings the count to 1, and the 2 subtracts 1
# 3 at index 2: three 1s add 3 to the count
I have looked into from collections import defaultdict and from itertools import groupby, but I couldn't understand how to use that to do anything else but sort a nested list.
Apologies if this issue is within the two sources I provided above.
If all the inner arrays have the same length you could transform it into a 2-dimensional array to leverage numpy vectorization:
# transform into two dimensional array
arr = np.array(lst)
# replace every 2 with -1, leaving the other values unchanged
t = np.where(arr == 2, -1, arr)
# sum along the first axis (the corresponding positions of the inner arrays)
res = t.sum(axis=0).tolist()
print(res)
Output
[1, 0, 3]
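An equivalent way to get the same result is to count the 1s and the 2s separately and subtract. This is a sketch, not part of the answer above; it assumes, as that answer does, that all inner arrays have the same length:

```python
import numpy as np

lst = [
    np.array([1, 0, 1]),
    np.array([1, 1, 1]),
    np.array([2, 2, 1]),
]

# Stack the arrays into one 2D array, one inner array per row
arr = np.stack(lst)
# Per column: (number of 1s) minus (number of 2s)
counts = (arr == 1).sum(axis=0) - (arr == 2).sum(axis=0)

print(counts.tolist())  # [1, 0, 3]
```

Counting explicitly means that any value other than 1 or 2 is ignored, whereas summing after np.where would add such values to the total.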

Finding the max value in each row of a 2D array when a row has multiple max values in Python

I'm looking for a way to find the max value in each row of a 2D array and save its index in another vector. I know that I could do that with this code:
max_index = np.argmax(vec, axis=1)
My problem is that when one row has multiple max values, argmax returns the first index. Let's assume we have this matrix:
vec = np.array([[1, 0, 1],
                [1, 2, 3],
                [0, 5, 5]])
So I am thinking of replacing the index of the max with -1 when there are multiple max values in one row.
In the end, max_index should look like this:
max_index = [-1, 2, -1]
Thanks in advance
Trick: Take the argmax from left and right and check whether they coincide:
L = np.argmax(vec,1)
R = np.argmax(vec[:,::-1],1)
np.where(L+R==len(vec[0])-1,L,-1)
# array([-1, 2, -1])
If your original problem is to find the last index of multiple max values, you can follow these approaches:
Approach #1
np.argmax((vec.max(axis=1)[...,None] == vec).cumsum(axis=1), axis=1)
Using -1 to mean the last index when a row has repeated max values fails for a row like [[1,1,0]], where the last max is not in the last column.
vec.max(axis=1) gives the max along each row.
vec.max(axis=1)[...,None] converts it into a 2D column array.
(vec.max(axis=1)[...,None] == vec) compares each element in a row with that row's max.
(vec.max(axis=1)[...,None] == vec).cumsum(axis=1) gives a cumulative sum that first reaches its largest value at the last max, so its argmax is the index of the last max value.
Case 1: vec = [[1, 0, 1], [1, 2, 3], [0, 5, 5]], the result is [2, 2, 2]
Case 2: vec = [[1, 1, 0], [1, 2, 3], [0, 5, 5]], the result is [1, 2, 2]
Approach #2
R = np.argmax(vec[:,::-1],1) # Get the index of max from right side
result = vec.shape[1]-1-R
Here I reverse the columns and take the argmax, then convert the position in the reversed row back to an index in the original row.
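To see that the two approaches agree, they can be run side by side on both cases. This is a small sketch; the function names last_argmax_cumsum and last_argmax_reversed are invented here for illustration:

```python
import numpy as np


def last_argmax_cumsum(vec):
    """Approach #1: argmax of the running count of maxima in each row."""
    is_max = vec.max(axis=1)[..., None] == vec
    return np.argmax(is_max.cumsum(axis=1), axis=1)


def last_argmax_reversed(vec):
    """Approach #2: argmax on the column-reversed array, mapped back."""
    r = np.argmax(vec[:, ::-1], axis=1)
    return vec.shape[1] - 1 - r


case1 = np.array([[1, 0, 1], [1, 2, 3], [0, 5, 5]])
case2 = np.array([[1, 1, 0], [1, 2, 3], [0, 5, 5]])

for vec in (case1, case2):
    print(last_argmax_cumsum(vec), last_argmax_reversed(vec))
# [2 2 2] [2 2 2]
# [1 2 2] [1 2 2]
```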
Maybe this can solve your issue:
# Create a copy of vec
vec_without_max = np.copy(vec)
# Overwrite the max found before in each row with the smallest value of the dtype
for i in range(np.shape(vec)[0]):
    vec_without_max[i][max_index[i]] = np.iinfo(vec_without_max[i][max_index[i]].dtype).min
# Find the max of each row in the copy, where the first max is now masked out
max_index_again = np.argmax(vec_without_max, axis=1)
# Compare the two results: if both indices hold the same value, the max is repeated, so set max_index to -1
for i in range(np.shape(vec)[0]):
    if vec[i][max_index[i]] == vec[i][max_index_again[i]]:
        max_index[i] = -1
This script returns
max_index = [-1, 2, -1]
for the example you posted, and it should work with 2D arrays of any size.
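The question's goal can also be reached without loops by counting how many times the maximum occurs in each row. This is a sketch rather than any of the posted answers, and the function name unique_argmax_or_minus_one is made up:

```python
import numpy as np


def unique_argmax_or_minus_one(vec):
    """Index of each row's max, or -1 when the max occurs more than once."""
    # Flag every position equal to its row's maximum
    is_max = vec == vec.max(axis=1, keepdims=True)
    # Number of times the maximum occurs in each row
    n_max = is_max.sum(axis=1)
    # Keep the argmax only where the maximum is unique
    return np.where(n_max == 1, vec.argmax(axis=1), -1)


vec = np.array([[1, 0, 1],
                [1, 2, 3],
                [0, 5, 5]])
max_index = unique_argmax_or_minus_one(vec)
print(max_index)  # [-1  2 -1]
```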

How to make a list that is all zeros except for one column, which should have the value 1

Can anyone help with this?
If I try this code,
a = np.array([2,1,5],[2,5,3])
b = np.zeros_like(a)
c=b[np.arange(len(a)), a.argmax()] = 1
print(c)
It gives the error: too many indices for array
My goal is to make a list in which all columns are zero except the one holding the highest value in the input NumPy array, which should be set to 1.
The output should be ([0,0,1],[0,1,0])
Code:
matrix = np.array([[2, 1, 5], [2, 5, 3]])
imax = matrix.argmax(1)
n_labels = np.size(matrix, 1)
onehot = np.eye(n_labels)[imax]
How it works, line by line:
1. Define the NumPy array.
2. Get the index of the maximum value in each row.
3. Get the number of columns in the matrix, N.
4. Create an NxN matrix (let's call it M) in which all the values are zero except the main diagonal, which is 1. For each index i from step 2, select row i from M.
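The attempt in the question can also be made to work directly. The sketch below assumes the intent was to pass one nested list to np.array and to take the argmax per row; unlike np.eye, it keeps the integer dtype of the input:

```python
import numpy as np

# One nested list, so the result is a 2x3 array
a = np.array([[2, 1, 5], [2, 5, 3]])

# Start from zeros with the same shape and dtype as the input
onehot = np.zeros_like(a)
# Pair each row number with the column of that row's maximum and set it to 1
onehot[np.arange(len(a)), a.argmax(axis=1)] = 1

print(onehot.tolist())  # [[0, 0, 1], [0, 1, 0]]
```

The two changes from the original attempt are the extra pair of brackets in np.array and the axis argument to argmax, which gives one column index per row instead of a single index into the flattened array.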

Sequential testing of elements in a 2D array?

Apologies if this is a simple question - I'm new to Python and numpy - but I'd be very grateful for your help.
I've got a 2D numpy array of data arranged in rows and columns, with the first column representing time, and subsequent columns showing values of different parameters at each point in time.
I want to read down a given column of data from top to bottom (i.e. time = 0 to time = number of rows), and test each element in that column in sequence to find the very first instance (and only the first instance) where the data values in that column meet given criteria.
This is different to testing 'all' or 'any' of the elements in a column 'all at once' by testing and iterating using the numpy arange() function.
As a minimal working example in pseudocode, if my array is:
myarray =
[[1, 4 ....]
[2, 3 ....]
[3, 8 ....]
[4, 9 ....]....]
...where the first column is time, and the second column contains the values of data collected at each time point.
I want to be able to iterate over the rows in sequence from top to bottom and test:
threshold = 5
for row = 0 to number of rows:
    if data in [column 1, row] > threshold:
        print "The first time point at which the data exceed the threshold is at time = 3 "
        break
What is the most Pythonic (i.e. efficient and intelligible) way of doing this?
Is it necessary to convert the array into a list before iterating & testing, or is it possible to sequentially iterate and test over the array directly?
Hope this makes some sort of sense...
Many thanks in anticipation
Dave
Try this code:
>>> myarray = [[1, 4], [2, 3], [3, 8], [4, 9]]
>>> stop = False
>>> for row in myarray:
...     for d in row:
...         if d > 5:
...             print("Row: ", row, "Data: ", d)
...             stop = True
...             break
...     if stop:
...         break
...
Row:  [3, 8] Data:  8
>>>
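If the data are already in a NumPy array, the first match can also be found without an explicit loop and without converting to a list. This is a sketch, assuming column 0 holds the time and column 1 holds the values to test, as described in the question:

```python
import numpy as np

myarray = np.array([[1, 4], [2, 3], [3, 8], [4, 9]])
threshold = 5

# True for every row whose value in column 1 exceeds the threshold
exceeds = myarray[:, 1] > threshold

if exceeds.any():
    # argmax on a boolean array returns the position of the first True
    first_row = int(np.argmax(exceeds))
    first_time = int(myarray[first_row, 0])
    print("The first time point at which the data exceed the threshold is at time =", first_time)
else:
    first_time = None
    print("The data never exceed the threshold")
```

The any() check matters because argmax returns 0 when no element is True, which would otherwise be mistaken for a match in the first row.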
