Numpy matrix sum without looping - python

I'm trying to build a NumPy matrix without double for loops.
If I have a matrix:
x = [val, val, val]
[val, val, val]
[val, val, val]
and I want to subtract each of the other two rows from every row, while placing the results in a larger, mostly zero matrix. Each row subtraction (in this example) has 3 elements. (I'm doing it with larger matrices, though.)
new = [row 1 - 2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[row 1 - 3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, row 2 - 1, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, row 2 - 3, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, row 3 - 1]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, row 3 - 2]
And then something similar with the columns, except that the items are spread out horizontally, if that makes sense (each item here is a single value, whereas above each entry is a whole row difference):
new = [col 1 - 2, 0.0, 0.0, col 1 - 2, 0.0, 0.0, col 1 - 2, 0.0, 0.0]
[col 1 - 3, 0.0, 0.0, col 1 - 3, 0.0, 0.0, col 1 - 3, 0.0, 0.0]
[0.0, col 2 - 1, 0.0, 0.0, col 2 - 1, 0.0, 0.0, col 2 - 1, 0.0]
[0.0, col 2 - 3, 0.0, 0.0, col 2 - 3, 0.0, 0.0, col 2 - 3, 0.0]
[0.0, 0.0, col 3 - 1, 0.0, 0.0, col 3 - 1, 0.0, 0.0, col 3 - 1]
[0.0, 0.0, col 3 - 2, 0.0, 0.0, col 3 - 2, 0.0, 0.0, col 3 - 2]
If someone has the numpy magic to figure this out, I'll lose it ha.
Edit: better example with small matrix:
x = [[.5, 0.],
[.1, 1.2]]
turns into
new = [[ 0.4, -1.2, 0., 0. ],
[ 0., 0., -0.4, 1.2]]
and for column version
y = [[.2, .9],
[.6, .1]]
turns into
new = [[-0.7, 0., 0.5, 0. ],
[ 0., 0.7, 0., -0.5]]

Here is some indexing madness which I believe does what you are asking for:
>>> import numpy as np
>>> def magic(data):
...     n, m = data.shape
...     assert n==m
...     rows = np.zeros((n, n-1, n, n), data.dtype)
...     cols = np.zeros((n, n-1, n, n), data.dtype)
...     idx = np.argsort(np.identity(n), kind='mergesort', axis=1)
...     self = idx[:, -1] # should be just 0, 1, 2, 3, ...
...     other = idx[:, :-1]
...     rows[self, :, self, :] = data[:, None, :] - data[other[..., None], self]
...     cols[self, ..., self] = data.T[:, None, :] - data.T[other[..., None], self]
...     return rows.reshape(-1, n*n), cols.reshape(-1, n*n)
...
>>> magic(np.array([[.5,0], [.1,1.2]]))
(array([[ 0.4, -1.2, 0. , 0. ],
[ 0. , 0. , -0.4, 1.2]]), array([[ 0.5, 0. , -1.1, 0. ],
[ 0. , -0.5, 0. , 1.1]]))
>>> magic(np.array([[.2,.9], [.6,.1]]))
(array([[-0.4, 0.8, 0. , 0. ],
[ 0. , 0. , 0.4, -0.8]]), array([[-0.7, 0. , 0.5, 0. ],
[ 0. , 0.7, 0. , -0.5]]))
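For checking a vectorized version on small inputs, a plain double-loop reference can help. The following is only a sketch of the layout as I read it from the question's examples; the name pairwise_diffs_loop is made up for illustration and is not part of the answer above.

```python
import numpy as np

def pairwise_diffs_loop(data):
    """Loop-based reference (hypothetical helper) for the layout in the question."""
    n = data.shape[0]
    rows = np.zeros((n * (n - 1), n * n), dtype=data.dtype)
    cols = np.zeros((n * (n - 1), n * n), dtype=data.dtype)
    out = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # row version: difference of rows i and j goes into block i
            rows[out, i * n:(i + 1) * n] = data[i] - data[j]
            # column version: difference of columns i and j, one value every n slots
            cols[out, i::n] = data[:, i] - data[:, j]
            out += 1
    return rows, cols

x = np.array([[.5, 0.], [.1, 1.2]])
y = np.array([[.2, .9], [.6, .1]])
rows_x, _ = pairwise_diffs_loop(x)
_, cols_y = pairwise_diffs_loop(y)
print(rows_x)
print(cols_y)
```

On the two small matrices from the question this should reproduce the expected results, which makes it a convenient baseline to compare against with np.allclose.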

Related

str.replace() can not replace zero dot (0.) to zero dot zero (0.0)

I have an np.array called arr which is:
arr = np.array([[0.0, 0.0, 0.0], [1 / 3, 1 / 3, 0], [0.0, 0.0, 0.0]])
and I want to write its information to a single-line string called s as:
[[0.0, 0.0, 0.0], [1 / 3, 1 / 3, 0], [0.0, 0.0, 0.0]]
For this, I am using this type of conversion (in my code it is in a function):
import re
import numpy as np
arr = np.array([[0.0, 0.0, 0.0], [1 / 3, 1 / 3, 0], [0.0, 0.0, 0.0]])
s = np.array_str(arr, precision=4)
s = re.sub('(\d) +(-|\d)', r'\1,\2', s)
s.replace('^0. $', '0.0')
# s.replace('0. ', '0.0') #gives same result
s.replace('\n', ',')
print(s)
However, the result is:
[[0. 0. 0. ]
[0.3333,0.3333,0. ]
[0. 0. 0. ]]
Strings are immutable, so s.replace() returns a new string instead of changing s. You need to catch that return value and save it as the s variable, or under another variable name.
import re
import numpy as np
arr = np.array([[0.0, 0.0, 0.0], [1 / 3, 1 / 3, 0], [0.0, 0.0, 0.0]])
s = np.array_str(arr, precision=4)
s = re.sub('(\d) +(-|\d)', r'\1,\2', s)
s = s.replace('0. ', '0.0')  # keep the string that replace() returns
s = s.replace('\n', ',')
print(s)
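The difference between calling replace() and assigning its result can be seen in isolation. This is a minimal sketch on a made-up one-line string, not the array output from the question:

```python
text = "[0. 0. 0. ]"

# Calling replace() without assigning the result leaves `text` untouched.
text.replace("0. ", "0.0 ")
after_discarded_call = text

# Assigning the return value keeps the modified string.
fixed = text.replace("0. ", "0.0 ")

print(after_discarded_call)  # [0. 0. 0. ]
print(fixed)                 # [0.0 0.0 0.0 ]
```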
You could use nested comprehensions on the array and process it using the fractions module:
from fractions import Fraction
s = "["+", ".join("["+", ".join(f"{Fraction(n).limit_denominator(10000)}"
for n in row)+"]"
for row in arr )+"]"
print(s)
[[0, 0, 0], [1/3, 1/3, 0], [0, 0, 0]]
Note that, while this is close to your expected result, there is no way to distinguish the 0.0 from the 0 in the original array because this information is lost when the numbers are converted to floats by numpy. So all 0s will be printed the same way.
For more than 2 dimensions, you could generalize this into a recursive function:
from fractions import Fraction
def arrayToStr(arr):
    # recurse into sub-arrays, format scalars as fractions
    if isinstance(arr, np.ndarray):
        return "[" + ", ".join(arrayToStr(n) for n in arr) + "]"
    if not arr:
        return "0.0"
    return f"{Fraction(arr).limit_denominator(10000)}"
This will print zeroes as 0.0 though I cannot fathom why you would want to do that specifically (and only) for zero.
Output:
arr = np.array([[0.0, 0.0, 0.0], [1 / 3, 1 / 3, 0], [0.0, 0.0, 0.0]])
print(arrayToStr(arr))
[[0.0, 0.0, 0.0], [1/3, 1/3, 0.0], [0.0, 0.0, 0.0]]
arr = np.arange(24).reshape((4,3,2))/6
print(arrayToStr(arr))
[[[0.0, 1/6], [1/3, 1/2], [2/3, 5/6]], [[1, 7/6], [4/3, 3/2], [5/3, 11/6]], [[2, 13/6], [7/3, 5/2], [8/3, 17/6]], [[3, 19/6], [10/3, 7/2], [11/3, 23/6]]]
If you don't mind getting the 1/3 values as decimals, you could use the json module which would do the formatting more directly:
import json
s = json.dumps(arr.tolist())
print(s)
[[0.0, 0.0, 0.0], [0.3333333333333333, 0.3333333333333333, 0.0], [0.0, 0.0, 0.0]]

How to get the max value and coordinates of a connected component?

For example, given a predicted probability map, like a
a = np.array([[0.1, 0.2, 0.3, 0.0, 0.0, 0.0],
[0.1, 0.92, 0.3, 0.0, 0.2, 0.1],
[0.1, 0.9, 0.3, 0.0, 0.7, 0.89],
[0.0, 0.0, 0.0, 0.0, 0.4, 0.5],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
How can I find the maximum probabilities (0.92, 0.89) and their coordinates ((1, 1), (2, 5)) for the two connected components in a?
Use np.where or np.argwhere
>>> np.unique(a)[-2:]
array([0.89, 0.92])
>>> np.where(np.isin(a, np.unique(a)[-2:]))
(array([1, 2]), array([1, 5]))
# OR
>>> np.argwhere(np.isin(a, np.unique(a)[-2:]))
array([[1, 1],
[2, 5]])
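Taking the two largest values overall only works when each component happens to contain one of them. If the maximum is needed per connected component, one option is to label the components first. The sketch below uses scipy.ndimage and assumes that every nonzero cell counts as foreground and that the default 4-connectivity is acceptable; both are my assumptions, not something stated in the question.

```python
import numpy as np
from scipy import ndimage

a = np.array([[0.1, 0.2, 0.3, 0.0, 0.0, 0.0],
              [0.1, 0.92, 0.3, 0.0, 0.2, 0.1],
              [0.1, 0.9, 0.3, 0.0, 0.7, 0.89],
              [0.0, 0.0, 0.0, 0.0, 0.4, 0.5],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

# Label connected regions of nonzero cells (labels start at 1, 0 is background).
labels, num = ndimage.label(a > 0)
index = np.arange(1, num + 1)

# Maximum value and its position within each labelled component.
max_probs = ndimage.maximum(a, labels=labels, index=index)
max_coords = ndimage.maximum_position(a, labels=labels, index=index)

print(num, max_probs, max_coords)
```

For 8-connectivity, ndimage.label accepts a structure argument such as np.ones((3, 3)).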
Here is my answer, but maybe too complicated.
import cv2
import numpy as np

def nms_cls(loc, cls):
    """
    Find the max class and prob point in a mask
    :param loc: binary prediction with 0 and 1 (h, w)
    :param cls: multi-classes prediction with prob (c, h, w)
    :return: list of dicts with keys 'coord', 'cls' and 'prob'
    """
    prob = np.max(cls, axis=0)  # (H, W)
    cls_idx = np.argmax(cls, axis=0)
    point_list = []
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(loc, connectivity=8, ltype=None)
    # label 0 is the background, so start at 1
    for i in range(1, num_labels):
        # get the mask of connected component i
        label_i = np.copy(labels)
        label_i[label_i != i] = 0
        label_i[label_i > 0] = 1
        prob_mask_i = prob * label_i
        # get max prob's coords and class
        state_i = {}
        state_i['coord'] = np.unravel_index(prob_mask_i.argmax(), prob_mask_i.shape)
        state_i['cls'] = cls_idx[state_i['coord'][0], state_i['coord'][1]]
        state_i['prob'] = prob[state_i['coord'][0], state_i['coord'][1]]
        point_list.append(state_i)
    return point_list

How can I do element-wise multiplication of two arrays when one element is derived by a formula?

I'm trying to multiply two arrays together element-wise:
expected_state = np.array([-1.004, 0.002, 0.0])
b = np.array([[1.0, 0.0, 0.0], [[stoch_rate[1]*(2*(popul_num[0]) - 1)/2], 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.4, 0.0]])
Each element of expected_state should be multiplied positionally with the corresponding element of each row in b
So [[-1.004*1.0, 0.002*0.0, 0.0*0.0], [-1.004*[stoch_rate[1]*(2*(popul_num[0]) - 1)/2], 0.002*0.0....etc]]
Array b is defined in a function so that the first element in row two can change as stoch_rate and popul_num change as the program executes.
def update_matrix(popul_num, stoch_rate):
    """Specific to this model
    will need to change if different model
    implements equation 24 of the Gillespie paper"""
    b = np.array([[1.0, 0.0, 0.0], [[stoch_rate[1]*(2*(popul_num[0]) - 1)/2], 0.0, 0.0], [0.0, 0.5,
        0.0], [0.0, 0.4, 0.0]])
    return b
So far I've used nested for loops to try and do the multiplication:
for j in range(len(evaluate_propensity)):
    for i in range(len(popul_num)):
        denominator[j] += (exptd_state_array[i]*b[j, i]) # TypeError: can't multiply sequence by non-int of type 'numpy.float64'
But I get the TypeError: can't multiply sequence by non-int of type 'numpy.float64'
I've had a look at some other posts which say this happens when trying to multiply list indices by non-integers, because list indices can't be fractional. I understand that, but the elements of my arrays are meant to be floats, so I'm not sure how to overcome it.
Then, after reading some more, I found that the error occurs when the program tries to multiply by the [stoch_rate[1]*(2*(popul_num[0]) - 1)/2] element of array b. Does the TypeError come from that formula-derived element of the array, and if so, how can it be fixed?
Cheers
EDIT:
popul_num = np.array([1.0E5, 0, 0]) # array of molecule numbers for 3 species in model
stoch_rate = np.array([1.0, 0.002, 0.5, 0.04]) # rates of the 4 reactions in the model
evaluate_propensity = np.array([a, b, c, d]) # An array of the probability of each reaction occurring; it is calculated dynamically on each iteration, so it isn't hard coded.
exptd_state_array and expected_state are the same thing; sorry, I forgot to change the shorthand.
popul_num = np.array([1.0E5, 0, 0]) # array of molecule numbers for 3 species in model
stoch_rate = np.array([1.0, 0.002, 0.5, 0.04]) # rates of the 4 reactions in the model
evaluate_propensity = np.array((.25,.25,.25,.25)) # An array of the probability of each reaction occurring; it is calculated dynamically on each iteration, so it isn't hard coded.
denominator = np.zeros(4,)
expected_state = np.array([-1.004, 0.002, 0.0])
exptd_state_array = expected_state
b = np.array([[1.0, 0.0, 0.0], [[stoch_rate[1]*(2*(popul_num[0]) - 1)/2], 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.4, 0.0]])
b
array([[1.0, 0.0, 0.0],
[list([199.999]), 0.0, 0.0],
[0.0, 0.5, 0.0],
[0.0, 0.4, 0.0]], dtype=object)
So b has mixed types. The list is generated by the square brackets around [stoch_rate[1]*(2*(popul_num[0]) - 1)/2].
Multiplying a list by an integer is defined as repeated concatenation with itself: 3 * [5] = [5, 5, 5]. This fails with floats, as @hpaulj pointed out in the comments.
Leaving out the square brackets:
b = np.array([[1.0, 0.0, 0.0], [stoch_rate[1]*(2*(popul_num[0]) - 1)/2, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.4, 0.0]])
b
array([[ 1. , 0. , 0. ],
[199.999, 0. , 0. ],
[ 0. , 0.5 , 0. ],
[ 0. , 0.4 , 0. ]])
Then the double loop executes without the TypeError.
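Once b is a plain float array, the double loop is not strictly needed either. As a sketch using the values from the question (the name products is mine), broadcasting multiplies expected_state against every row of b, and summing each row gives the same numbers the loop accumulates into denominator:

```python
import numpy as np

popul_num = np.array([1.0e5, 0, 0])
stoch_rate = np.array([1.0, 0.002, 0.5, 0.04])
expected_state = np.array([-1.004, 0.002, 0.0])

# No square brackets around the formula, so b is a float array of shape (4, 3).
b = np.array([[1.0, 0.0, 0.0],
              [stoch_rate[1] * (2 * popul_num[0] - 1) / 2, 0.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.4, 0.0]])

# Element-wise product: expected_state is broadcast across each row of b.
products = b * expected_state

# Row sums; this is the matrix-vector product b @ expected_state.
denominator = products.sum(axis=1)
print(products)
print(denominator)
```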

Find groups of values that are !=0 in a list

I'm looking for an easy way to find "plateaus" or groups in python lists. As input, I have something like this:
mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
I want to extract the middle position of every "group". Group is defined in this case as data that is !=0 and for example at least 3 positions long. Enclaved single zeros (like on position 6) should be ignored.
Basically, I want to get the following output:
myoutput = [8, 20]
For my use case, it is not really important to get very precise output data. [10,21] would still be fine.
To conclude everything: first group: [0.143, 0.0, 0.22, 0.135, 0.44, 0.1]; second group: [0.33, 0.65, 0.22]. Now, the position of the middle element (or left or right from the middle, if there is no true middle value). So in the output 8 would be the middle of the first group and 20 the middle of the second group.
I've already tried some approaches. But they are not as stable as I wanted them to be (for example: more enclaved zeros can cause problems). So before investing more time in this idea, I wanted to ask if there is a better way to implement this feature. I even think that this could be a generic problem. Is there maybe already standard code that solves it?
There are other questions that describe roughly the same problem, but I have also the need to "smooth" the data before processing.
Smooth the data to get rid of enclosed zeros:
import numpy as np
from collections import deque
def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth
y_smooth = smooth(mydata, 20)
Find start points in the smoothed list (if a value is != 0 and the value before it was 0, it is a start point). When an endpoint is found, use the last start point and the current endpoint to compute the middle position of the group and write it to a deque.
laststart = 0
lastend = 0
myoutput = deque()
for i in range(1, len(y_smooth)-1):
    #detect start:
    if y_smooth[i]!=0 and y_smooth[i-1]==0:
        laststart = i
    #detect end:
    elif y_smooth[i]!=0 and y_smooth[i+1]==0 and laststart+2 < i:
        lastend = i
        myoutput.appendleft(laststart+(lastend-laststart)/2)
EDIT: To simplify everything, I gave only a short example of my input data at the beginning. This short list actually causes a problematic smoothing output: the whole list gets smoothed, and no zero is left. (Plots linked in the original post: actual input data; actual input data after smoothing.)
A fairly simple way of finding groups as you described is to convert the data to a mask holding 1 for values inside groups and 0 for values outside, then compute the difference between consecutive values. That gives 1 at the start of a group and -1 at its end.
Here's an example of that:
import numpy as np
mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
arr = np.array(mydata)
mask = (arr!=0).astype(int) #array that contains 1 for every nonzero value, 0 otherwise (np.int was removed from NumPy)
padded_mask = np.pad(mask,(1,),"constant") #add a zero at the start and at the end to handle edge cases
edge_mask = padded_mask[1:] - padded_mask[:-1] #diff between a value and the following one
#if there's a 1 in edge mask it's a group start
#if there's a -1 it's a group stop
#where gives us the index of those starts and stops
starts = np.where(edge_mask == 1)[0]
stops = np.where(edge_mask == -1)[0]
print(starts,stops)
#we format groups and drop groups that are too small
groups = [group for group in zip(starts,stops) if (group[0]+2 < group[1])]
for group in groups:
    print("start,stop : {} middle : {}".format(group,(group[0]+group[1])/2) )
And the output :
[ 5 7 19] [ 6 11 22]
start,stop : (7, 11) middle : 9.0
start,stop : (19, 22) middle : 20.5
Your smoothed data has no zeros left:
import numpy as np
def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    print(box)
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth
mydata = [0.0, 0.0, 0.0, 0.0,-0.2, 0.143,
0.0, 0.22, 0.135, 0.44, 0.1, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.33, 0.65, 0.22, 0.0, 0.0, 0.0,
0.0, 0.0]
y_smooth = smooth(mydata, 27)
print(y_smooth)
Output:
[ 0.0469 0.0519 0.0519 0.0519 0.0519 0.0519
0.0519 0.0519 0.0519 0.0519 0.0684 0.1009
0.1119 0.1119 0.1119 0.1119 0.10475 0.10475
0.09375 0.087 0.065 0.06 0.06 0.06
0.06 0.06 0.06 ]
A way to find it in your original data would be:
def findGroups(data, minGrpSize=1):
    startpos = -1
    endpos = -1
    pospos = []
    # iterate over the data argument (the original read the global mydata)
    for idx,v in enumerate(data):
        if v > 0 and startpos == -1:
            startpos = idx
        elif v == 0.0:
            if startpos > -1:
                if idx < (len(data)-1) and data[idx+1] != 0.0:
                    pass # ignore one 0.0 in a run
                else:
                    endpos = idx
        if startpos > -1:
            if endpos >-1 or idx == len(data)-1: # both set or last one
                if (endpos - startpos) >= minGrpSize:
                    pospos.append((startpos,endpos))
                startpos = -1
                endpos = -1
    return pospos
pos = findGroups(mydata,1)
print(*map(lambda x: sum(x) // len(x), pos))
pos = findGroups(mydata,3)
print(*map(lambda x: sum(x) // len(x), pos))
pos = findGroups(mydata,5)
print(*map(lambda x: sum(x) // len(x), pos))
Output:
8 20
8 20
8
Part 2 - find the group midpoint:
mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
groups = []
last_start = 0
last_end = 0
in_group = 0
for i in range(1, len(mydata) - 1):
    if not in_group:
        if mydata[i] and not mydata[i - 1]:
            last_start = i
            in_group = 1
    else:  # a group continued.
        if mydata[i]:
            last_end = i
        elif last_end - last_start > 1:  # we have a group i.e. not single non-zero value
            groups.append(((last_end - last_start)//2) + last_start)
            last_start, last_end, in_group = (0, 0, 0)
        else:  # it was just a single non-zero.
            last_start, last_end, in_group = (0, 0, 0)
print(groups)
Output:
[8, 20]
Full numpy solution would be something like this: (not fully optimized)
import numpy as np
input_data = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.143,
0.0, 0.22, 0.135, 0.44, 0.1, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.33, 0.65, 0.22, 0.0, 0.0, 0.0,
0.0, 0.0])
# Find transitions between zero and nonzero
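# cast to int because recent NumPy versions reject subtraction of boolean arrays
non_zeros = (input_data > 0).astype(int)
changes = np.ediff1d(non_zeros, to_begin=int(not non_zeros[0]),
                     to_end=int(not non_zeros[-1]))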
change_idxs = np.nonzero(changes)[0]
# Filter out small holes
holes = change_idxs.reshape(change_idxs.size//2, 2)
hole_sizes = holes[:, 1]-holes[:, 0]
big_holes = holes[hole_sizes > 1]
kept_change_idxs = np.r_[0, big_holes.flatten(), input_data.size]
# Get midpoints of big intervals
intervals = kept_change_idxs.reshape(kept_change_idxs.size//2, 2)
big_intervals = intervals[intervals[:, 1]-intervals[:, 0] >= 3]
print((big_intervals[:, 0]+big_intervals[:, 1])//2)

Get to a dictionary format (by just saving non-zero values and indices) from a sparse matrix in Python [closed]

I have a sparse matrix; imagine something like the following:
X=([1.5 0.0 0.0 71.9 0.0 0.0 0.0],
[0.0 10.0 0.0 2.0 0.0 0.0 0.0],
[0.0 0.0 0.0 0.0 0.0 0.0 11.0])
Is there an existing method that can convert such a matrix into the following file format (or matrix), where each row contains only the nonzero values and their corresponding indices in X?
Example
X1=( 0:1.5 3:71.9
1:10 3:2
6:11 )
My question is: is there an existing way to produce such a dictionary from a sparse matrix in Python?
You could use a scipy.sparse.csr_matrix. It contains the data you are looking for in its indptr, indices and data attributes:
import scipy.sparse as sparse
X = sparse.csr_matrix([[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0],
[0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]])
for row in range(X.shape[0]):
    sl = slice(X.indptr[row], X.indptr[row+1])
    pairs = zip(X.indices[sl], X.data[sl])
    print(' '.join(['{}:{}'.format(idx, val) for idx, val in pairs]))
yields
0:1.5 3:71.9
1:10.0 3:2.0
6:11.0
Because you have a matrix of rows and columns, I think you should record both the row and the column of each nonzero value for ease of reference later. This can be done without importing any libraries:
>>> x
[[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0], [0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]]
>>>
>>> l = []
>>>
>>> for i,subl in enumerate(x):
...     for j, item in enumerate(subl):
...         if item:
...             l.append(([i,j],item))
...
>>> l
[([0, 0], 1.5), ([0, 3], 71.9), ([1, 1], 10.0), ([1, 3], 2.0), ([2, 6], 11.0)]
This should get you a long way there (in Python 3, zip returns an iterator, so wrap the call in list() to see the pairs):
X = np.array(
[[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0],
[0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]])
>>> zip(np.argwhere(X).tolist(), X[X != 0])
[([0, 0], 1.5),
([0, 3], 71.900000000000006),
([1, 1], 10.0),
([1, 3], 2.0),
([2, 6], 11.0)]
You can also use a nested dictionary comprehension:
>>> {(row, col): val
for row, data in enumerate(X)
for col, val in enumerate(data)
if val != 0}
{(0, 0): 1.5,
(0, 3): 71.900000000000006,
(1, 1): 10.0,
(1, 3): 2.0,
(2, 6): 11.0}
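If the goal is one dictionary per row, keyed by column index as in the X1 example from the question, a NumPy-only sketch could look like this. The names row_dicts and lines are made up for illustration, and a dense array is assumed as input:

```python
import numpy as np

X = np.array(
    [[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0],
     [0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]])

# One dict per row: column index -> nonzero value.
row_dicts = [{int(j): float(row[j]) for j in np.flatnonzero(row)} for row in X]

# The same information in the "index:value" text layout.
lines = [" ".join(f"{j}:{v}" for j, v in d.items()) for d in row_dicts]

print(row_dicts)
print("\n".join(lines))
```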
