Given a numpy array that can be subset to the indices of elements meeting a given criterion, how do I build triplets (or quadruplets, quintuplets, ...) out of the resulting pairs of indices?
In the example below, pairs_tuples is equal to [(1, 0), (3, 0), (3, 1), (3, 2)]. triplets_tuples should be [(0, 1, 3)] because all of its pairs (i.e. (1, 0), (3, 0), (3, 1)) meet the condition, whereas (3, 2) cannot be extended to a triplet, since neither (2, 0) nor (2, 1) meets the condition.
import numpy as np
a = np.array([[0. , 0. , 0. , 0. , 0. ],
[0.96078379, 0. , 0. , 0. , 0. ],
[0.05498203, 0.0552454 , 0. , 0. , 0. ],
[0.46005028, 0.45468466, 0.11167813, 0. , 0. ],
[0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0. ]])
pairs = np.where((a >= .11) & (a <= .99))
pairs_tuples = list(zip(pairs[0].tolist(), pairs[1].tolist()))
# [(1, 0), (3, 0), (3, 1), (3, 2)]
How to get to the below?
triplets_tuples = [(0, 1, 3)]
quadruplets_tuples = []
quintuplets_tuples = []
This has an easy part and an NP part. Here's the solution to the easy part.
Let's assume you have the full correlation matrix:
>>> c = a + a.T
>>> c
array([[0. , 0.96078379, 0.05498203, 0.46005028, 0.1030161 ],
[0.96078379, 0. , 0.0552454 , 0.45468466, 0.10350956],
[0.05498203, 0.0552454 , 0. , 0.11167813, 0.00109096],
[0.46005028, 0.45468466, 0.11167813, 0. , 0.00928037],
[0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0. ]])
What you're doing is converting this into an adjacency matrix:
>>> adj = (c >= .11) & (c <= .99)
>>> adj.astype(int) # for readability below - False and True take a lot of space
array([[0, 1, 0, 1, 0],
[1, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[1, 1, 1, 0, 0],
[0, 0, 0, 0, 0]])
This now represents a graph where rows and columns correspond to nodes, and a 1 is an edge between them. We can use networkx to visualize this:
import networkx
g = networkx.from_numpy_matrix(adj)
networkx.draw(g)
You're looking for maximal fully-connected subgraphs, or "cliques", within this graph. This is the Clique problem, and is the NP part. Thankfully, networkx can solve that too:
>>> list(networkx.find_cliques(g))
[[3, 0, 1], [3, 2], [4]]
Here [3, 0, 1] is one of your triplets.
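If you then want them grouped by size, as in triplets_tuples / quadruplets_tuples above, a minimal sketch (assuming the maximal cliques returned by find_cliques are what you are after) is to filter the clique list by length:
cliques = [tuple(sorted(c)) for c in networkx.find_cliques(g)]  # g built above from adj
triplets_tuples = [c for c in cliques if len(c) == 3]     # [(0, 1, 3)]
quadruplets_tuples = [c for c in cliques if len(c) == 4]  # []
quintuplets_tuples = [c for c in cliques if len(c) == 5]  # []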
I have the following numpy array, which is basically a 3-channel image:
arr = np.zeros((6, 4, 3), dtype=np.float32)
# dictionary of values, key is array location
values_of_channel_0 = {
(0, 2) : 1,
(1, 0) : 1,
(1, 3) : 5,
(2, 1) : 2,
(2, 2) : 3,
(2, 3) : 1,
(3, 0) : 1,
(3, 2) : 2,
(4, 0) : 2,
(4, 2) : 20,
(5, 0) : 1,
(5, 2) : 10,
(5, 3) : 1
}
I am trying to find the most elegant way to set all the values of the first channel (channel 0) according to the dictionary. Here is what I tried:
locations = list(values_of_channel_0.keys())
values = list(values_of_channel_0.values())
arr[locations, 0] = values # trying to set the first channel
But this fails.
Is there a way in which this can be done without looping over keys and values?
What's wrong with a simple loop? Something will have to iterate over the key/value pairs in your dictionary in any case.
import numpy as np
arr = np.zeros((6, 4, 3), dtype=np.float32)
# dictionary of values, key is array location
values_of_channel_0 = {
(0, 2) : 1,
(1, 0) : 1,
(1, 3) : 5,
(2, 1) : 2,
(2, 2) : 3,
(2, 3) : 1,
(3, 0) : 1,
(3, 2) : 2,
(4, 0) : 2,
(4, 2) : 20,
(5, 0) : 1,
(5, 2) : 10,
(5, 3) : 1
}
for (a, b), v in values_of_channel_0.items():
    arr[a, b, 0] = v
print(arr)
Result:
[[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 1. 0. 0.]
[ 0. 0. 0.]]
[[ 1. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 5. 0. 0.]]
[[ 0. 0. 0.]
[ 2. 0. 0.]
[ 3. 0. 0.]
[ 1. 0. 0.]]
[[ 1. 0. 0.]
[ 0. 0. 0.]
[ 2. 0. 0.]
[ 0. 0. 0.]]
[[ 2. 0. 0.]
[ 0. 0. 0.]
[20. 0. 0.]
[ 0. 0. 0.]]
[[ 1. 0. 0.]
[ 0. 0. 0.]
[10. 0. 0.]
[ 1. 0. 0.]]]
If you insist on not looping for the assignment, you can construct a data structure that can be assigned at once:
channel_0 = [[values_of_channel_0[b, a] if (b, a) in values_of_channel_0 else 0 for a in range(4)] for b in range(6)]
arr[..., 0] = channel_0
But this is clearly rather pointless and not even more efficient. If you have some control over how values_of_channel_0 is constructed, you could consider constructing it as a nested list or array of the right dimensions immediately, to allow for this type of assignment.
Users @mechanicpig and @michaelszczesny offer a very clean alternative (which will be more efficient, since it relies on the efficient implementation of zip()):
arr[(*zip(*values_of_channel_0), 0)] = list(values_of_channel_0.values())
Edit: you asked for an explanation of the lefthand side.
This hinges on the unpacking operator *. *values_of_channel_0 spreads all the keys of the dictionary values_of_channel_0 into a call to zip(). Since these keys are all 2-tuples of int, zip will yield two tuples, one with all the first coordinates (0, 1, 1, ...) and the second with the second coordinates (2, 0, 3, ...).
Since the call to zip() is also preceded by *, these two values will be spread to index arr[], together with a final coordinate 0. So this:
arr[(*zip(*values_of_channel_0), 0)] = ...
Is essentially the same as:
arr[((0, 1, 1, ...), (2, 0, 3, ...), 0)] = ...
That's a selection of arr with exactly as many elements as the dictionary has entries, each at the needed coordinates. And so assigning list(values_of_channel_0.values()) to it works and has the desired effect of assigning the matching values to the correct coordinates.
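To see the transposition in isolation, here is a small illustration with a hypothetical three-key dictionary (not the one from the question):
keys = [(0, 2), (1, 0), (1, 3)]
rows, cols = zip(*keys)
# rows == (0, 1, 1), cols == (2, 0, 3)
# arr[(rows, cols, 0)] then selects arr[0, 2, 0], arr[1, 0, 0] and arr[1, 3, 0]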
I have data in the following format:
[('user_1', 2, 1.0),
('user_2', 6, 2.5),
('user_3', 9, 3.0),
('user_4', 1, 3.0)]
And I want to use this information to create a NumPy array that has the value 1.0 in position 2, the value 2.5 in position 6, etc. All positions not listed above should be zeros. Like this:
array([0, 3.0, 1.0, 0, 0, 0, 2.5, 0, 0, 3.0])
First reformat the data:
data = [
("user_1", 2, 1.0),
("user_2", 6, 2.5),
("user_3", 9, 3.0),
("user_4", 1, 3.0),
]
usernames, indices, values = zip(*data)
And then create the array:
length = max(indices) + 1
arr = np.zeros(shape=(length,))
arr[list(indices)] = values
print(arr) # array([0. , 3. , 1. , 0. , 0. , 0. , 2.5, 0. , 0. , 3. ])
Note that you need to convert indices to a list; otherwise, when it is used for indexing, numpy will think it is trying to index multiple dimensions.
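As a small illustration of that point (a sketch with made-up values, not from the question): a tuple is interpreted as a multi-dimensional index, while a list is treated as an index array along the first axis:
import numpy as np
arr = np.zeros(10)
idx = (2, 6, 9)
# arr[idx] would be read as arr[2, 6, 9] and raise IndexError (too many indices)
arr[list(idx)] = [1.0, 2.5, 3.0]  # works: assigns to positions 2, 6 and 9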
I've come up with this solution:
import numpy as np
a = [('user_1', 2, 1.0),
('user_2', 6, 2.5),
('user_3', 9, 3.0),
('user_4', 1, 3.0)]
res = np.zeros(max(x[1] for x in a)+1)
for i in range(len(a)):
    res[a[i][1]] = a[i][2]
res
# array([0. , 3. , 1. , 0. , 0. , 0. , 2.5, 0. , 0. , 3. ])
First I create a zero-filled array whose length is the maximum position (the number at index 1 of each tuple in list a) plus 1, since an array indexed up to position 9 needs length 10.
Then I do a simple loop and assign each value to the position given in its tuple.
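A slightly more direct version of the same loop (just an alternative style, same result), unpacking each tuple instead of indexing it by position:
for _, position, value in a:
    res[position] = value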
I have a numpy 2D array (50x50) filled with values. I would like to flatten the 2D array into one column (2500x1), but the location of these values are very important. The indices can be converted to spatial coordinates, so I want another two (x,y) (2500x1) arrays so I can retrieve the x,y spatial coordinate of the corresponding value.
For example:
My 2D array:
--------x-------
[[0.5 0.1 0. 0.] |
[0. 0. 0.2 0.8] y
[0. 0. 0. 0. ]] |
My desired output:
#Values
[[0.5]
[0.1]
[0. ]
[0. ]
[0. ]
[0. ]
[0.2]
[0.8]
...],
#Corresponding x index, where I will retrieve the x spatial coordinate from
[[0]
[1]
[2]
[3]
[0]
[1]
[2]
[3]
...],
#Corresponding y index, where I will retrieve the y spatial coordinate from
[[0]
[0]
[0]
[0]
[1]
[1]
[1]
[1]
...],
Any clues on how to do this? I've tried a few things but they have not worked.
For simplicity, let's reproduce your array with this chunk of code:
value = np.arange(6).reshape(2, 3)
First, we create variables x and y which contain the indices for each dimension:
x = np.arange(value.shape[0])
y = np.arange(value.shape[1])
np.meshgrid is the method related to the issue you described:
xx, yy = np.meshgrid(x, y, indexing='ij')  # 'ij' so xx holds the row index and yy the column index, matching the C-order flatten below
Finally, reshape everything into the single-column shape you want with these lines:
xx = xx.reshape(-1, 1)
yy = yy.reshape(-1, 1)
value = value.reshape(-1, 1)
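A quick self-check (assuming the 'ij' indexing used above) that the three flattened columns stay aligned:
original = value.reshape(2, 3)  # undo the reshape for the comparison
assert (original[xx[:, 0], yy[:, 0]] == value[:, 0]).all()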
According to your example, with np.indices:
data = np.arange(2500).reshape(50, 50)
y_indices, x_indices = np.indices(data.shape)
Reshaping your data:
data = data.reshape(-1,1)
x_indices = x_indices.reshape(-1,1)
y_indices = y_indices.reshape(-1,1)
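And a similar sanity check for this approach (a sketch using the variables above): each flattened value can be matched back to its 2D position through the index columns:
original = data.reshape(50, 50)  # undo the reshape for the comparison
assert (original[y_indices[:, 0], x_indices[:, 0]] == data[:, 0]).all()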
Assuming you want to flatten and reshape into a single column, use reshape:
a = np.array([[0.5, 0.1, 0., 0.],
[0., 0., 0.2, 0.8],
[0., 0., 0., 0. ]])
a.reshape((-1, 1)) # 1 column, as many rows as necessary (-1)
output:
array([[0.5],
[0.1],
[0. ],
[0. ],
[0. ],
[0. ],
[0.2],
[0.8],
[0. ],
[0. ],
[0. ],
[0. ]])
getting the coordinates
y,x = a.shape
np.tile(np.arange(x), y)
# array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
np.repeat(np.arange(y), x)
# array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
or simply using unravel_index:
Y, X = np.unravel_index(range(a.size), a.shape)
# (array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
# array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]))
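If the single-column shape from the question is also wanted for the coordinates, the same arrays can simply be reshaped (a small follow-up using a, X and Y from above):
values_col = a.reshape(-1, 1)
x_col = X.reshape(-1, 1)
y_col = Y.reshape(-1, 1)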
I have a numpy array which represents the adjacency of faces in a 3D model. In general the nth row and column represent the nth face of the model. If a 1 is located in the upper right triangle of the matrix, it represents a convex connection between two faces. If a 1 is located in the lower left triangle, it represents a concave connection.
For example in the matrix below, there are convex connections between faces 1 and 2, 1 and 3, 2 and 3 and so on.
1 2 3 4 5 6
1 [[ 0. 1. 1. 0. 0. 0.]
2 [ 0. 0. 1. 1. 1. 1.]
3 [ 0. 0. 0. 0. 0. 0.]
4 [ 0. 0. 0. 0. 1. 0.]
5 [ 0. 0. 0. 0. 0. 0.]
6 [ 0. 0. 0. 0. 0. 0.]]
I'd like to be able to record how many concave and convex connections each face has.
i.e. Face 1 has: 0 concave and 2 convex connections
Possibly even record which faces they are connected to.
i.e. Face 1 has: 0 concave and 2 convex (2, 3) connections
So far I have tried using np.nonzero() to return the indices of the 1's. However, this returns the indices in a format which doesn't seem to be very easy to work with (a separate array for the row and column indices):
(array([0, 0, 1, 1, 1, 1, 3]), array([1, 2, 2, 3, 4, 5, 4]))
Can anyone help me with an easier way to carry out this task? Thanks
try this:
import numpy as np
a = np.array([[0, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0]]).astype(float)
concave = {}
convex = {}
# walk over the (row, column) position of every 1, shifted to 1-based face numbers
for i, j in zip(np.nonzero(a)[0] + 1, np.nonzero(a)[1] + 1):
    if j > i:
        if i not in convex:
            convex[i] = []
        if j not in convex:
            convex[j] = []
        convex[i].append(j)
        convex[j].append(i)
    else:
        if i not in concave:
            concave[i] = []
        if j not in concave:
            concave[j] = []
        concave[i].append(j)
        concave[j].append(i)
print('concave relations : {} and number of relations is {}'.format(concave, sum(len(v) for v in concave.values())))
print('convex relations : {} and number of relations is {}'.format(convex, sum(len(v) for v in convex.values())))
gives the result:
concave relations : {} and number of relations is 0
convex relations : {1: [2, 3], 2: [1, 3, 4, 5, 6], 3: [1, 2], 4: [2, 5], 5: [2, 4], 6: [2]} and number of relations is 14
where each dictionary key is the face number and its value is the list of faces it is connected to.
The logic is: for every non-zero pair (i, j),
if i > j, then j is a concave connection of face i and i is a concave connection of face j;
if j > i, then j is a convex connection of face i and i is a convex connection of face j.
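One possible way (just a sketch, reusing the concave and convex dicts built above) to print the per-face summary asked for in the question:
for face in range(1, a.shape[0] + 1):
    print('Face {} has: {} concave and {} convex {} connections'.format(
        face,
        len(concave.get(face, [])),
        len(convex.get(face, [])),
        tuple(convex.get(face, []))))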
import numpy as np
A = np.array([[0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
convex = np.triu(A, 1) # upper triangle
concave = np.tril(A, -1) # lower triangle
convex_indices = list(zip(np.nonzero(convex)[0] + 1, np.nonzero(convex)[1] + 1))
concave_indices = list(zip(np.nonzero(concave)[0] + 1, np.nonzero(concave)[1] + 1))
num_convex = len(convex_indices)
num_concave = len(concave_indices)
print('There are {} convex connections between faces: {}'.format(num_convex, ', '.join(str(e) for e in convex_indices)))
print('There are {} concave connections between faces: {}'.format(num_concave, ', '.join(str(e) for e in concave_indices)))
# will print:
# There are 7 convex connections between faces: (1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (2, 6), (4, 5)
# There are 0 concave connections between faces:
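To also get per-face counts such as "Face 1 has 0 concave and 2 convex connections" (a sketch based on the arrays above): a face takes part in an upper-triangle connection whether it appears as the row or as the column, so the counts are the row sums plus the column sums:
convex_per_face = convex.sum(axis=1) + convex.sum(axis=0)    # index 0 is face 1
concave_per_face = concave.sum(axis=1) + concave.sum(axis=0)
print(convex_per_face)   # [2 5 2 2 2 1]
print(concave_per_face)  # [0 0 0 0 0 0]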
I'm trying to efficiently map an N * 1 numpy array of ints to an N * 3 numpy array of floats using a ufunc.
What I have so far:
import numpy
map = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = numpy.frompyfunc(lambda x: numpy.array(map[x], numpy.float32), 1, 1)
input = numpy.array([1, 2, 3], numpy.int32)
ufunc(input) gives an array of dtype object (each element being a length-3 float32 array). I'd like this as a plain 3 * 3 array with dtype float32.
You could use np.hstack:
import numpy as np
mapping = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = np.frompyfunc(lambda x: np.array(mapping[x], np.float32), 1, 1)  # frompyfunc takes no dtype argument; the lambda already builds float32 arrays
data = np.array([1, 2, 3], np.int32)
result = np.hstack(ufunc(data))
print(result)
# [ 0. 0. 0. 0.5 0.5 0.5 1. 1. 1. ]
print(result.dtype)
# float32
print(result.shape)
# (9,)
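If the (N, 3) shape from the question is wanted directly, np.vstack can be used instead of np.hstack to stack each per-element array as a row (a small variation on the above):
result_2d = np.vstack(ufunc(data))
print(result_2d.shape)  # (3, 3)
print(result_2d.dtype)  # float32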
If your mapping is a numpy array, you can just use fancy indexing for this:
>>> valmap = numpy.array([(0, 0, 0), (0.5, 0.5, 0.5), (1, 1, 1)])
>>> input = numpy.array([1, 2, 3], numpy.int32)
>>> valmap[input-1]
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])
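Note that numpy.array infers float64 here; if float32 is specifically needed (as in the question), the lookup table can be built with that dtype, e.g.:
valmap = numpy.array([(0, 0, 0), (0.5, 0.5, 0.5), (1, 1, 1)], dtype=numpy.float32)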
You can use ndarray fancy indexing to get the same result; I think it should be faster than frompyfunc:
map_array = np.array([[0,0,0],[0,0,0],[0.5,0.5,0.5],[1,1,1]], dtype=np.float32)
index = np.array([1,2,3,1])
map_array[index]
Or you can just use list comprehension:
map = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
np.array([map[i] for i in [1,2,3,1]], dtype=np.float32)
Unless I misread the doc, the output of np.frompyfunc on a scalar is indeed an object: when using an ndarray as input, you'll get an ndarray with dtype=object.
A workaround is to use the np.vectorize function:
F = np.vectorize(lambda x: mapper.get(x), 'fff')
Here, we force the dtype of F's output to be 3 floats (hence the 'fff').
>>> mapper = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
>>> inp = [1, 2, 3]
>>> F(inp)
(array([ 0. , 0.5, 1. ], dtype=float32), array([ 0., 0.5, 1.], dtype=float32), array([ 0. , 0.5, 1. ], dtype=float32))
OK, not quite what we want: it's a tuple of three float arrays (as we gave 'fff'), the first array being equivalent to [mapper[i][0] for i in inp]. So, with a bit of manipulation:
>>> np.array(F(inp)).T
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]], dtype=float32)