I'm looking for an easy way to find "plateaus" or groups in python lists. As input, I have something like this:
mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
I want to extract the middle position of every "group". A group is defined here as a run of data that is != 0 and, for example, at least 3 positions long. Enclosed single zeros (like the one at position 6) should be ignored.
Basically, I want to get the following output:
myoutput = [8, 20]
For my use case, it is not really important to get very precise output data. [10,21] would still be fine.
To summarize: the first group is [0.143, 0.0, 0.22, 0.135, 0.44, 0.1]; the second group is [0.33, 0.65, 0.22]. I want the position of the middle element of each group (or the element left or right of the middle, if there is no true middle value). So in the output, 8 would be the middle of the first group and 20 the middle of the second group.
I've already tried some approaches. But they are not as stable as I wanted them to be (for example: more enclaved zeros can cause problems). So before investing more time in this idea, I wanted to ask if there is a better way to implement this feature. I even think that this could be a generic problem. Is there maybe already standard code that solves it?
There are other questions that describe roughly the same problem, but I also need to "smooth" the data before processing.
Smooth the data to get rid of enclosed zeros:
import numpy as np
def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth
y_smooth = smooth(mydata, 20)
Find start points in the smoothed list (if a value is != 0 and the value before it was 0, it should be a start point). When an endpoint is found, use the last start point and the current endpoint to get the middle position of the group and write it to a deque:
from collections import deque
laststart = 0
lastend = 0
myoutput = deque()
for i in range(1, len(y_smooth)-1):
    #detect start:
    if y_smooth[i]!=0 and y_smooth[i-1]==0:
        laststart = i
    #detect end:
    elif y_smooth[i]!=0 and y_smooth[i+1]==0 and laststart+2 < i:
        lastend = i
        myoutput.appendleft(laststart+(lastend-laststart)/2)
EDIT: To simplify everything, I gave only a short example of my input data at the beginning. This short list actually causes a problematic smoothing output: the whole list gets smoothed, and no zero is left. (Images in the original post: actual input data; actual input data after smoothing.)
A fairly simple way of finding groups as you described is to convert the data to a boolean array, with 1 for data inside groups and 0 for data outside them, and then compute the difference between consecutive values. This gives you 1 at the start of a group and -1 at its end.
Here's an example of that:
import numpy as np
mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
arr = np.array(mydata)
mask = (arr!=0).astype(int) #array that contains 1 for every non-zero value, 0 otherwise (np.int was removed from recent NumPy releases)
padded_mask = np.pad(mask,(1,),"constant") #add a zero at the start and at the end to handle edge cases
edge_mask = padded_mask[1:] - padded_mask[:-1] #diff between a value and the following one
#if there's a 1 in edge mask it's a group start
#if there's a -1 it's a group stop
#where gives us the index of those starts and stops
starts = np.where(edge_mask == 1)[0]
stops = np.where(edge_mask == -1)[0]
print(starts,stops)
#we format groups and drop groups that are too small
groups = [group for group in zip(starts,stops) if (group[0]+2 < group[1])]
for group in groups:
    print("start,stop : {} middle : {}".format(group,(group[0]+group[1])/2) )
And the output :
[ 5 7 19] [ 6 11 22]
start,stop : (7, 11) middle : 9.0
start,stop : (19, 22) middle : 20.5
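The edge-detection idea above does not yet bridge enclosed single zeros, which is why the first group is reported as (7, 11). Here is a minimal sketch of one way to extend it. The function name `group_midpoints` and the parameters `max_gap` and `min_len` are made up for illustration and are not part of the answer above.

```python
import numpy as np

def group_midpoints(data, max_gap=1, min_len=3):
    """Return the middle index of every non-zero run, bridging small zero gaps."""
    mask = np.asarray(data) != 0
    # Pad with False on both sides so runs touching the edges are detected too.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    stops = np.flatnonzero(edges == -1)   # one past the last index of each run

    merged = []
    for start, stop in zip(starts.tolist(), stops.tolist()):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = stop          # gap is small enough: extend the previous run
        else:
            merged.append([start, stop])

    # Keep only runs that are long enough and report their middle index.
    return [(start + stop) // 2 for start, stop in merged if stop - start >= min_len]

mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.33, 0.65, 0.22,
          0.0, 0.0, 0.0, 0.0, 0.0]
print(group_midpoints(mydata))  # [8, 20]
```

Raising `max_gap` lets longer runs of enclosed zeros be treated as part of a group, which may help with the stability problem mentioned in the question.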
Your smoothed data has no zeros left:
import numpy as np
def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    print(box)
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth
mydata = [0.0, 0.0, 0.0, 0.0,-0.2, 0.143,
0.0, 0.22, 0.135, 0.44, 0.1, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.33, 0.65, 0.22, 0.0, 0.0, 0.0,
0.0, 0.0]
y_smooth = smooth(mydata, 27)
print(y_smooth)
Output:
[ 0.0469 0.0519 0.0519 0.0519 0.0519 0.0519
0.0519 0.0519 0.0519 0.0519 0.0684 0.1009
0.1119 0.1119 0.1119 0.1119 0.10475 0.10475
0.09375 0.087 0.065 0.06 0.06 0.06
0.06 0.06 0.06 ]
A way to find the groups in your original data would be:
def findGroups(data, minGrpSize=1):
    startpos = -1
    endpos = -1
    pospos = []
    for idx,v in enumerate(data):
        if v > 0 and startpos == -1:
            startpos = idx
        elif v == 0.0:
            if startpos > -1:
                if idx < (len(data)-1) and data[idx+1] != 0.0:
                    pass # ignore one 0.0 in a run
                else:
                    endpos = idx
        if startpos > -1:
            if endpos >-1 or idx == len(data)-1: # both set or last one
                if (endpos - startpos) >= minGrpSize:
                    pospos.append((startpos,endpos))
                startpos = -1
                endpos = -1
    return pospos
pos = findGroups(mydata,1)
print(*map(lambda x: sum(x) // len(x), pos))
pos = findGroups(mydata,3)
print(*map(lambda x: sum(x) // len(x), pos))
pos = findGroups(mydata,5)
print(*map(lambda x: sum(x) // len(x), pos))
Output:
8 20
8 20
8
Part 2 - find the group midpoint:
mydata = [0.0, 0.0, 0.0, 0.0, 0.0, 0.143, 0.0, 0.22, 0.135, 0.44, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.33, 0.65, 0.22, 0.0, 0.0, 0.0, 0.0, 0.0]
groups = []
last_start = 0
last_end = 0
in_group = 0
for i in range(1, len(mydata) - 1):
    if not in_group:
        if mydata[i] and not mydata[i - 1]:
            last_start = i
            in_group = 1
    else: # a group continued.
        if mydata[i]:
            last_end = i
        elif last_end - last_start > 1: # we have a group i.e. not single non-zero value
            groups.append(((last_end - last_start)//2) + last_start)
            last_start, last_end, in_group = (0, 0, 0)
        else: # it was just a single non-zero.
            last_start, last_end, in_group = (0, 0, 0)
print(groups)
Output:
[8, 20]
A full NumPy solution would be something like this (not fully optimized):
import numpy as np
input_data = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.143,
0.0, 0.22, 0.135, 0.44, 0.1, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.33, 0.65, 0.22, 0.0, 0.0, 0.0,
0.0, 0.0])
# Find transitions between zero and nonzero
# (cast to int first: NumPy does not support subtracting boolean arrays)
non_zeros = (input_data > 0).astype(int)
changes = np.ediff1d(non_zeros, to_begin=int(not non_zeros[0]),
                     to_end=int(not non_zeros[-1]))
change_idxs = np.nonzero(changes)[0]
# Filter out small holes
holes = change_idxs.reshape(change_idxs.size//2, 2)
hole_sizes = holes[:, 1]-holes[:, 0]
big_holes = holes[hole_sizes > 1]
kept_change_idxs = np.r_[0, big_holes.flatten(), input_data.size]
# Get midpoints of big intervals
intervals = kept_change_idxs.reshape(kept_change_idxs.size//2, 2)
big_intervals = intervals[intervals[:, 1]-intervals[:, 0] >= 3]
print((big_intervals[:, 0]+big_intervals[:, 1])//2)
I used two online references to construct a neural network in python with four input nodes, a layer of 4 hidden nodes, and 6 output nodes. When I run the network, the loss increases rather than decreasing which I believe means its predictions are getting worse.
Sorry for the ocean of code, I have no idea where in the code the issue could be. Nothing that I did has been able to fix this. Is there something wrong with my code, or is my assumption about the loss function wrong?
import numpy as np
#defining inputs and real outputs
inputData = np.array([[10.0, 5.0, 15.0, 3.0],
[9.0, 6.0, 16.0, 4.0],
[8.0, 4.0, 17.0, 5.0],
[7.0, 3.0, 18.0, 6.0],
[6.0, 2.0, 19.0, 7.0]])
statsReal = np.array([[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]])
def sigmoid(x):
    return 1/(1 + np.exp(-x))
def sigmoid_d_dx(x):
    return sigmoid(x) * (1 - sigmoid(x))
def softmax(A):
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)
#defining the hidden and output nodes, and the weights and biases for hidden and output layers
instances = inputData.shape[0]
attributes = inputData.shape[1]
hidden_nodes = 4
output_nodes = 6
wh = np.random.rand(attributes,hidden_nodes)
bh = np.random.randn(hidden_nodes)
wo = np.random.rand(hidden_nodes,output_nodes)
bo = np.random.randn(output_nodes)
learningRate = 10e-4
error_cost = []
for epoch in range(100):
    #Feedforward Phase 1
    zh = np.dot(inputData, wh) + bh
    ah = sigmoid(zh)
    #Feedforward Phase 2
    zo = np.dot(ah, wo) + bo
    ao = softmax(zo)
    #Backpropagation Phase 1
    dcost_dzo = ao - statsReal
    dzo_dwo = ah
    dcost_wo = np.dot(dzo_dwo.T, dcost_dzo)
    dcost_bo = dcost_dzo
    #Backpropagation Phase 2
    dzo_dah = wo
    dcost_dah = np.dot(dcost_dzo, dzo_dah.T)
    dah_dzh = sigmoid_d_dx(zh)
    dzh_dwh = inputData
    dcost_wh = np.dot(dzh_dwh.T, dah_dzh * dcost_dah)
    dcost_bh = dcost_dah*dah_dzh
    #Weight Updates
    wh -= learningRate * dcost_wh
    bh -= learningRate * dcost_bh.sum(axis=0)
    wo -= learningRate * dcost_wo
    bo -= learningRate * dcost_bo.sum(axis=0)
    loss = np.sum(-statsReal * np.log(ao))
    print(loss)
    error_cost.append(loss)
print(error_cost)
Your network does learn when you train it with reasonable data.
Try this data, for example. I added one distinct case for every class and one-hot encoded the targets. I also scaled the inputs to [0.0, 1.0].
inputData = np.array([[1.0, 0.5, 0.0, 0.0],
[0.0, 1.0, 0.5, 0.0],
[1.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 0.0, 0.5],
[0.0, 0.0, 0.0, 1.0],
[1.0, 1.0, 0.5, 0.0]])
statsReal = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
Increase the learning rate:
learningRate = 10e-2
Train for more epochs and print a little less often:
for epoch in range(1000):
    #....
    if epoch % 100 == 99: print(loss)
Output of your loss function:
6.116573523774877
2.6901680150532847
1.323221228926058
0.7688474199923144
0.5186915091033664
0.38432651801528794
0.3024486736712547
0.24799685736356275
0.20944414625474833
0.1808455098847857
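One reason the one-hot targets matter: the backpropagation step `dcost_dzo = ao - statsReal` is the gradient of the softmax cross-entropy loss only when each target row sums to 1. The rows of the original `statsReal` sum to 3.0, so that update does not follow the gradient of the loss being printed. Below is a rough numerical check of this. The helper names `cross_entropy` and `numeric_grad` are made up for illustration, and the softmax subtracts the row maximum for numerical stability, which the original code does not do.

```python
import numpy as np

def softmax(z):
    # Subtracting the row maximum does not change the result but avoids overflow.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(z, t):
    return np.sum(-t * np.log(softmax(z)))

def numeric_grad(z, t, eps=1e-6):
    # Central finite differences of the loss with respect to every logit.
    grad = np.zeros_like(z)
    for idx in np.ndindex(*z.shape):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[idx] += eps
        z_minus[idx] -= eps
        grad[idx] = (cross_entropy(z_plus, t) - cross_entropy(z_minus, t)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
z = rng.normal(size=(1, 6))                                # arbitrary logits
one_hot = np.array([[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]])       # row sums to 1
original = np.array([[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]])      # row sums to 3

# Does "softmax(z) - target" agree with the numerically estimated gradient?
one_hot_matches = np.allclose(softmax(z) - one_hot, numeric_grad(z, one_hot), atol=1e-5)
original_matches = np.allclose(softmax(z) - original, numeric_grad(z, original), atol=1e-5)
print(one_hot_matches, original_matches)  # True False
```

For targets whose rows do not sum to 1, the gradient with respect to the logits is `ao * t.sum(axis=1, keepdims=True) - t`, so either normalize the targets or use a loss that fits them.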
For example, given a predicted probability map like a:
a = np.array([[0.1, 0.2, 0.3, 0.0, 0.0, 0.0],
[0.1, 0.92, 0.3, 0.0, 0.2, 0.1],
[0.1, 0.9, 0.3, 0.0, 0.7, 0.89],
[0.0, 0.0, 0.0, 0.0, 0.4, 0.5],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
How can I find the maximum probabilities (about 0.9 each) and their coordinates ((1, 1) and (2, 5)) of the two connected components in a?
Use np.where or np.argwhere
>>> np.unique(a)[-2:]
array([0.89, 0.92])
>>> np.where(np.isin(a, np.unique(a)[-2:]))
(array([1, 2]), array([1, 5]))
# OR
>>> np.argwhere(np.isin(a, np.unique(a)[-2:]))
array([[1, 1],
[2, 5]])
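Note that taking the two largest unique values only works when each component's maximum happens to be among the top two values overall. A sketch that actually labels the connected components first could look like this. It swaps in `scipy.ndimage`, which neither the question nor the answer above uses, and it assumes that "connected" means non-zero cells touching horizontally or vertically (the default structuring element of `ndimage.label`).

```python
import numpy as np
from scipy import ndimage

a = np.array([[0.1, 0.2, 0.3, 0.0, 0.0, 0.0],
              [0.1, 0.92, 0.3, 0.0, 0.2, 0.1],
              [0.1, 0.9, 0.3, 0.0, 0.7, 0.89],
              [0.0, 0.0, 0.0, 0.0, 0.4, 0.5],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

# Label every connected region of non-zero cells: 1, 2, ... (0 is background).
labels, num_components = ndimage.label(a > 0)
component_ids = list(range(1, num_components + 1))

# Per-component maximum value and the position where it occurs.
max_values = ndimage.maximum(a, labels=labels, index=component_ids)
max_positions = ndimage.maximum_position(a, labels=labels, index=component_ids)

for value, position in zip(max_values, max_positions):
    print(float(value), tuple(int(p) for p in position))
```

For the array above this reports 0.92 at (1, 1) and 0.89 at (2, 5).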
Here is my answer, though it may be too complicated.
import cv2
import numpy as np

def nms_cls(loc, cls):
    """
    Find the max class and prob point in a mask
    :param loc: binary prediction with 0 and 1 (h, w)
    :param cls: multi-classes prediction with prob (c, h, w)
    :return: list of dicts with the keys 'coord', 'cls' and 'prob'
    """
    prob = np.max(cls, axis=0) # (H, W)
    cls_idx = np.argmax(cls, axis=0)
    point_list = []
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(loc, connectivity=8, ltype=None)
    for i in range(1, num_labels): # label 0 is the background, so start at 1
        # get the mask of connected component i
        label_i = (labels == i)
        prob_mask_i = prob * label_i
        # get max prob's coords and class
        state_i = {}
        state_i['coord'] = np.unravel_index(prob_mask_i.argmax(), prob_mask_i.shape)
        state_i['cls'] = cls_idx[state_i['coord'][0], state_i['coord'][1]]
        state_i['prob'] = prob[state_i['coord'][0], state_i['coord'][1]]
        point_list.append(state_i)
    return point_list
Below is a portion of Python code that is meant to work with the FreeCAD libraries. I can provide the full code if requested.
What's weird in this code is that appending to a list, mr_fus.References, has no effect on the size of the list. I also tried appending to a dummy list, temp, and its size came out as expected.
Here is the definition of References.
Although type() indicates that References is a list, it does not look like an ordinary list to me. I am curious whether it is possible for a list to refuse to add an element to itself.
temp = [] # just for comparison
for i in range(1,len(App.ActiveDocument.Shape004.Shape.Faces)):
    mr_fus.References.append((App.ActiveDocument.Shape004.Shape, App.ActiveDocument.Shape004.Shape.Faces[i]))
    temp.append((App.ActiveDocument.Shape004.Shape, App.ActiveDocument.Shape004.Shape.Faces[i]))
print(type(mr_fus.References)) # <class 'list'>
print(len(mr_fus.References)) # 0 # why??
print(len(temp)) # 18
EDIT: Here is a reproducible example though not minimal.
import sys
import math
import numpy
sys.path.append('/usr/lib/freecad/lib/')
import FreeCAD
import Draft
import Part
import BOPTools.JoinFeatures
doc = FreeCAD.newDocument('newdoc')
ZERO = 1e-10
def U(c, xl):
    u = c[5]
    if c[6] != 0:
        t = c[0] + c[1] * abs((xl + c[2]) / c[3]) ** c[4]
        if abs(t) <= ZERO:
            t = 0
        u += c[6] * t ** (1./c[7])
    return u
class Fuselage:
    def H(xl):
        if xl < 0.4:
            c = [1.0, -1.0, -0.4, 0.4, 1.8, 0.0, 0.25, 1.8]
        elif xl < 0.8:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0]
        elif xl < 1.9:
            c = [1.0, -1.0, -0.8, 1.1, 1.5, 0.05, 0.2, 0.6]
        elif xl < 2.0:
            c = [1.0, -1.0, -1.9, 0.1, 2.0, 0.0, 0.05, 2.0]
        return U(c, xl)
    def W(xl):
        if xl < 0.4:
            c = [1.0, -1.0, -0.4, 0.4, 2.0, 0.0, 0.25, 2.0]
        elif xl < 0.8:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.0]
        elif xl < 1.9:
            c = [1.0, -1.0, -0.8, 1.1, 1.5, 0.05, 0.2, 0.6]
        elif xl < 2.0:
            c = [1.0, -1.0, -1.9, 0.1, 2.0, 0.0, 0.05, 2.0]
        return U(c, xl)
    def Z(xl):
        if xl < 0.4:
            c = [1.0, -1.0, -0.4, 0.4, 1.8, -0.08, 0.08, 1.8]
        elif xl < 0.8:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        elif xl < 1.9:
            c = [1.0, -1.0, -0.8, 1.1, 1.5, 0.04, -0.04, 0.6]
        elif xl < 2.0:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 0.04, 0.0, 0.0]
        return U(c, xl)
    def N(xl):
        if xl < 0.4:
            c = [2.0, 3.0, 0.0, 0.4, 1.0, 0.0, 1.0, 1.0]
        elif xl < 0.8:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
        elif xl < 1.9:
            c = [5.0, -3.0, -0.8, 1.1, 1.0, 0.0, 1.0, 1.0]
        elif xl < 2.0:
            c = [2.0, 0.0, 0.0, 0.0, 0.0, 0.04, 1.0, 1.0]
        return U(c, xl)
class Pylon:
    def H(xl):
        if xl < 0.8:
            c = [1.0, -1.0, -0.8, 0.4, 3.0, 0.0, 0.145, 3.0]
        elif xl < 1.018:
            c = [1.0, -1.0, -0.8, 0.218, 2.0, 0.0, 0.145, 2.0]
        return U(c, xl)
    def W(xl):
        if xl < 0.8:
            c = [1.0, -1.0, -0.8, 0.4, 3.0, 0.0, 0.166, 3.0]
        elif xl < 1.018:
            c = [1.0, -1.0, -0.8, 0.218, 2.0, 0.0, 0.166, 2.0]
        return U(c, xl)
    def Z(xl):
        if xl < 0.4:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 0.125, 0.0, 0.0]
        elif xl < 1.018:
            c = [1.0, -1.0, -0.8, 1.1, 1.5, 0.065, 0.06, 0.6]
        return U(c, xl)
    def N(xl):
        if xl < 0.4:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
        elif xl < 1.018:
            c = [0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
        return U(c, xl)
def yl(xl, phi, part):
    return r(part.H(xl), part.W(xl), part.N(xl), phi) * math.sin(phi)
def zl(xl, phi, part):
    return r(part.H(xl), part.W(xl), part.N(xl), phi) * math.cos(phi) + part.Z(xl)
def r(H, W, N, phi):
    a = abs(0.5 * H * math.sin(phi)) ** N + abs(0.5 * W * math.cos(phi)) ** N
    b = (0.25 * H * W) ** N
    return (b / a) ** (1./N)
xm = numpy.linspace(0.00001, 2, 10)
xp = numpy.linspace(0.40001, 1.018, 10)
p = numpy.linspace(0, 2*math.pi, 10)
def makepart(part, x):
    polygons = []
    for i in range(len(x)-1):
        points = []
        for j in range(len(p)-1):
            points.append(FreeCAD.Vector(x[i], yl(x[i], p[j], part), zl(x[i], p[j], part)))
        points.append(points[0])
        polygons.append(Part.makePolygon(points))
    loft = Part.makeLoft(polygons)
    cap1 = Part.Face(polygons[0])
    cap2 = Part.Face(polygons[-1])
    shell = Part.Shell(loft.Faces+[cap1, cap2])
    Part.show(shell)
    return shell
fuselage = makepart(Fuselage, xm)
pylon = makepart(Pylon, xp)
# join-connect fuselage and pylon
# let's call the joined object 'heli'
heli = BOPTools.JoinFeatures.makeConnect(name = 'Connected')
heli.Objects = [App.ActiveDocument.Shape, App.ActiveDocument.Shape001]
heli.Proxy.execute(heli)
heli.purgeTouched()
# make heli a solid
s = Part.Solid(Part.Shell(heli.Shape.Faces))
Part.show(s)
# make a sphere
sphere = Part.makeSphere(2,FreeCAD.Vector(1,0,0))
Part.show(sphere)
# cut heli from sphere
cut = sphere.cut(s)
Part.show(cut)
# extract seam of sphere
import CompoundTools.CompoundFilter
f = CompoundTools.CompoundFilter.makeCompoundFilter(name = 'CompoundFilter')
f.Base = App.ActiveDocument.Shape004
f.FilterType = 'window-volume'
f.Proxy.execute(f)
f.purgeTouched()
# make a line from seam of sphere to heli
line = Draft.makeWire([heli.Shape.Vertex27.Point, App.ActiveDocument.CompoundFilter.Shape.Vertex1.Point])
# split heli and line
import BOPTools.SplitFeatures
split = BOPTools.SplitFeatures.makeBooleanFragments(name= 'BooleanFragments')
split.Objects = [App.ActiveDocument.Shape004, App.ActiveDocument.Line]
split.Mode = 'Standard'
split.Proxy.execute(split)
split.purgeTouched()
#export to step
#split.Shape.exportStep("robin.step")
import ObjectsFem
mesh = ObjectsFem.makeMeshGmsh(FreeCAD.ActiveDocument, 'FEMMeshGmsh')
mesh.CharacteristicLengthMin = 0.5
mesh.CharacteristicLengthMax = 0.5
mesh.ElementDimension = 3
FreeCAD.ActiveDocument.ActiveObject.Part = FreeCAD.ActiveDocument.Shape004
mr_fus = ObjectsFem.makeMeshRegion(FreeCAD.ActiveDocument, FreeCAD.ActiveDocument.FEMMeshGmsh, 0.5, 'fus')
mr_outer = ObjectsFem.makeMeshRegion(FreeCAD.ActiveDocument, FreeCAD.ActiveDocument.FEMMeshGmsh, 1.0, 'outer')
mr_fus.CharacteristicLength = 0.7
temp = []
for i in range(1,len(App.ActiveDocument.Shape004.Shape.Faces)):
    mr_fus.References.append((App.ActiveDocument.Shape004.Shape, App.ActiveDocument.Shape004.Shape.Faces[i]))
    temp.append((App.ActiveDocument.Shape004.Shape, App.ActiveDocument.Shape004.Shape.Faces[i]))
print(type(mr_fus.References))
print(len(mr_fus.References))
print(len(temp))
EDIT: About FreeCAD version:
OS: Ubuntu 19.04
Word size of OS: 64-bit
Word size of FreeCAD: 64-bit
Version: 0.18.4.
Build type: Release
Python version: 3.7.3
Qt version: 5.12.2
Coin version: 4.0.0a
OCC version: 7.3.0
Locale: English/United States (en_US)
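One possible explanation for the symptom, offered as an assumption and not confirmed anywhere in this post, is that reading the attribute hands back a freshly built list on every access, so append() only changes a temporary copy. The pure-Python sketch below reproduces that behaviour with a made-up class; `MeshRegion` here is a stand-in written for illustration and is not the FreeCAD API.

```python
class MeshRegion:
    """Hypothetical stand-in for a document object; NOT the real FreeCAD class."""

    def __init__(self):
        self._references = []

    @property
    def References(self):
        # Every read returns a new list, so callers never see the stored one.
        return list(self._references)

    @References.setter
    def References(self, value):
        self._references = list(value)

region = MeshRegion()

# Appending to the value returned by the getter only mutates a throwaway copy.
region.References.append("face 1")
length_after_append = len(region.References)

# Building the list separately and assigning it back goes through the setter.
refs = region.References
refs.append("face 1")
region.References = refs
length_after_assign = len(region.References)

print(type(region.References), length_after_append, length_after_assign)
```

If FreeCAD behaves this way, collecting the tuples in an ordinary list (like temp above) and assigning that list to mr_fus.References in one step would be worth trying.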
Trying to build a numpy matrix without double for loops
If I have a matrix:
x = [val, val, val]
[val, val, val]
[val, val, val]
and I want to subtract each of the other two rows from every row, while simultaneously spreading the results into a larger matrix that holds the final result. Each row subtraction (in this example) yields 3 elements. (I'm doing this with larger matrices, though.)
new = [row 1 - 2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[row 1 - 3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, row 2 - 1, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, row 2 - 3, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, row 3 - 1]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, row 3 - 2]
And then something similar with the columns, except that the items are filled in horizontally, if that makes sense (each item is a single value, unlike above, where each item is a whole row):
new = [col 1 - 2, 0.0, 0.0, col 1 - 2, 0.0, 0.0, col 1 - 2, 0.0, 0.0]
[col 1 - 3, 0.0, 0.0, col 1 - 3, 0.0, 0.0, col 1 - 3, 0.0, 0.0]
[0.0, col 2 - 1, 0.0, 0.0, col 2 - 1, 0.0, 0.0, col 2 - 1, 0.0]
[0.0, col 2 - 3, 0.0, 0.0, col 2 - 3, 0.0, 0.0, col 2 - 3, 0.0]
[0.0, 0.0, col 3 - 1, 0.0, 0.0, col 3 - 1, 0.0, 0.0, col 3 - 1]
[0.0, 0.0, col 3 - 2, 0.0, 0.0, col 3 - 2, 0.0, 0.0, col 3 - 2]
If someone has the numpy magic to figure this, I'll lose it ha.
Edit: better example with small matrix:
x = [[.5, 0.],
[.1, 1.2]]
turns into
new = [[ 0.4, -1.2, 0., 0. ],
[ 0., 0., -0.4, 1.2]]
and for column version
y = [[.2, .9],
[.6, .1]]
turns into
new = [[-0.7, 0., 0.5, 0. ],
[ 0., 0.7, 0., -0.5]]
Here is some indexing madness which I believe does what you are asking for:
>>> def magic(data):
...     n, m = data.shape
...     assert n==m
...     rows = np.zeros((n, n-1, n, n), data.dtype)
...     cols = np.zeros((n, n-1, n, n), data.dtype)
...     idx = np.argsort(np.identity(n), kind='mergesort', axis=1)
...     self = idx[:, -1] # should be just 0, 1, 2, 3, ...
...     other = idx[:, :-1]
...     rows[self, :, self, :] = data[:, None, :] - data[other[..., None], self]
...     cols[self, ..., self] = data.T[:, None, :] - data.T[other[..., None], self]
...     return rows.reshape(-1, n*n), cols.reshape(-1, n*n)
...
>>> magic(np.array([[.5,0], [.1,1.2]]))
(array([[ 0.4, -1.2, 0. , 0. ],
[ 0. , 0. , -0.4, 1.2]]), array([[ 0.5, 0. , -1.1, 0. ],
[ 0. , -0.5, 0. , 1.1]]))
>>> magic(np.array([[.2,.9], [.6,.1]]))
(array([[-0.4, 0.8, 0. , 0. ],
[ 0. , 0. , 0.4, -0.8]]), array([[-0.7, 0. , 0.5, 0. ],
[ 0. , 0.7, 0. , -0.5]]))
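Since the indexing is hard to follow, a plain double-loop reference can help to check the vectorized version on small inputs. The sketch below is only that: it follows my reading of the layout in the question (row differences go into block i of the output row, column differences go into every n-th slot starting at i), and the function name `pairwise_reference` is made up.

```python
import numpy as np

def pairwise_reference(data):
    """Loop-based reference for the row and column difference matrices."""
    n = data.shape[0]
    rows = np.zeros((n * (n - 1), n * n), dtype=data.dtype)
    cols = np.zeros((n * (n - 1), n * n), dtype=data.dtype)
    out = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Row i minus row j, written into the i-th block of n columns.
            rows[out, i * n:(i + 1) * n] = data[i] - data[j]
            # Column i minus column j, one value per block, at offset i.
            cols[out, i::n] = data[:, i] - data[:, j]
            out += 1
    return rows, cols

x = np.array([[0.5, 0.0], [0.1, 1.2]])
y = np.array([[0.2, 0.9], [0.6, 0.1]])
row_result, _ = pairwise_reference(x)
_, col_result = pairwise_reference(y)
print(row_result)
print(col_result)
```

On the two small matrices from the question this reproduces the expected `new` arrays, so it can serve as a slow ground truth when testing a vectorized implementation.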
I have a sparse matrix; imagine something like the following:
X=([1.5 0.0 0.0 71.9 0.0 0.0 0.0],
[0.0 10.0 0.0 2.0 0.0 0.0 0.0],
[0.0 0.0 0.0 0.0 0.0 0.0 11.0])
Is there an existing method that can convert such a matrix into the following file format (or matrix), where each row contains only the nonzero values and their corresponding column indices in X?
Example
X1=( 0:1.5 3:71.9
1:10 3:2
6:11 )
In other words, is there an existing way to produce such a dictionary from a sparse matrix in Python?
You could use a scipy.sparse.csr_matrix. It contains the data you are looking for in its indptr, indices and data attributes:
import scipy.sparse as sparse
X = sparse.csr_matrix([[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0],
[0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]])
for row in range(X.shape[0]):
    sl = slice(X.indptr[row], X.indptr[row+1])
    pairs = zip(X.indices[sl], X.data[sl])
    print(' '.join(['{}:{}'.format(idx, val) for idx, val in pairs]))
yields
0:1.5 3:71.9
1:10.0 3:2.0
6:11.0
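If an actual dictionary is wanted instead of printed lines, the COO format exposes the row index, column index and value of every stored entry directly. This is a small sketch along those lines; the nested `{row: {column: value}}` layout is my assumption about what "such a dictionary" means.

```python
import scipy.sparse as sparse

X = sparse.coo_matrix([[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0],
                       [0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]])

# X.row, X.col and X.data are parallel arrays: one entry per stored value.
rows = {}
for r, c, v in zip(X.row.tolist(), X.col.tolist(), X.data.tolist()):
    rows.setdefault(r, {})[c] = v

print(rows)  # {0: {0: 1.5, 3: 71.9}, 1: {1: 10.0, 3: 2.0}, 2: {6: 11.0}}
```

Rows without any non-zero value simply do not appear as keys in the result.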
Because you have a matrix of rows and columns, I think you should record both the row and the column of each non-zero value for easy reference later. This can be done without importing any libraries:
>>> x
[[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0], [0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]]
>>> l = []
>>> for i, subl in enumerate(x):
...     for j, item in enumerate(subl):
...         if item:
...             l.append(([i, j], item))
...
>>> l
[([0, 0], 1.5), ([0, 3], 71.9), ([1, 1], 10.0), ([1, 3], 2.0), ([2, 6], 11.0)]
This should get you a long way there (in Python 3, zip returns an iterator, so wrap it in list to see the pairs):
>>> import numpy as np
>>> X = np.array(
...     [[1.5, 0.0, 0.0, 71.9, 0.0, 0.0, 0.0],
...      [0.0, 10.0, 0.0, 2.0, 0.0, 0.0, 0.0],
...      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 11.0]])
>>> list(zip(np.argwhere(X).tolist(), X[X != 0].tolist()))
[([0, 0], 1.5), ([0, 3], 71.9), ([1, 1], 10.0), ([1, 3], 2.0), ([2, 6], 11.0)]
You can also use a nested dictionary comprehension (X.tolist() converts the values to plain Python floats):
>>> {(row, col): val
...  for row, data in enumerate(X.tolist())
...  for col, val in enumerate(data)
...  if val != 0}
{(0, 0): 1.5, (0, 3): 71.9, (1, 1): 10.0, (1, 3): 2.0, (2, 6): 11.0}