I have the following numpy array, which is basically a 3-channel image:
arr = np.zeros((6, 4, 3), dtype=np.float32)
# dictionary of values, key is array location
values_of_channel_0 = {
(0, 2) : 1,
(1, 0) : 1,
(1, 3) : 5,
(2, 1) : 2,
(2, 2) : 3,
(2, 3) : 1,
(3, 0) : 1,
(3, 2) : 2,
(4, 0) : 2,
(4, 2) : 20,
(5, 0) : 1,
(5, 2) : 10,
(5, 3) : 1
}
I am trying to find the most elegant way to set all the values of the first channel (channel 0) according to the dictionary. Here is what I tried:
locations = list(values_of_channel_0.keys())
values = list(values_of_channel_0.values())
arr[locations, 0] = values # trying to set channel 0
But this fails.
Is there a way in which this can be done without looping over keys and values?
What's wrong with a simple loop? Something will have to iterate over the key/value pairs in your dictionary in any case.
import numpy as np
arr = np.zeros((6, 4, 3), dtype=np.float32)
# dictionary of values, key is array location
values_of_channel_0 = {
(0, 2) : 1,
(1, 0) : 1,
(1, 3) : 5,
(2, 1) : 2,
(2, 2) : 3,
(2, 3) : 1,
(3, 0) : 1,
(3, 2) : 2,
(4, 0) : 2,
(4, 2) : 20,
(5, 0) : 1,
(5, 2) : 10,
(5, 3) : 1
}
for (a, b), v in values_of_channel_0.items():
    arr[a, b, 0] = v
print(arr)
Result:
[[[ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 1.  0.  0.]
  [ 0.  0.  0.]]

 [[ 1.  0.  0.]
  [ 0.  0.  0.]
  [ 0.  0.  0.]
  [ 5.  0.  0.]]

 [[ 0.  0.  0.]
  [ 2.  0.  0.]
  [ 3.  0.  0.]
  [ 1.  0.  0.]]

 [[ 1.  0.  0.]
  [ 0.  0.  0.]
  [ 2.  0.  0.]
  [ 0.  0.  0.]]

 [[ 2.  0.  0.]
  [ 0.  0.  0.]
  [20.  0.  0.]
  [ 0.  0.  0.]]

 [[ 1.  0.  0.]
  [ 0.  0.  0.]
  [10.  0.  0.]
  [ 1.  0.  0.]]]
If you insist on not looping for the assignment, you can construct a data structure that can be assigned at once:
channel_0 = [[values_of_channel_0[b, a] if (b, a) in values_of_channel_0 else 0 for a in range(4)] for b in range(6)]
arr[..., 0] = channel_0
But this is clearly rather pointless and not even more efficient. If you have some control over how values_of_channel_0 is constructed, you could consider constructing it as a nested list or array of the right dimensions immediately, to allow for this type of assignment.
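For example, a rough sketch of what that could look like (only the first few entries from the dictionary above are shown; the rest follow the same pattern):
# Sketch: build the (6, 4) channel directly instead of a dictionary,
# then assign the whole channel at once.
channel_0 = np.zeros((6, 4), dtype=np.float32)
channel_0[0, 2] = 1
channel_0[1, 0] = 1
channel_0[1, 3] = 5
# ... and so on for the remaining positions ...
arr[..., 0] = channel_0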
Users @mechanicpig and @michaelszczesny offer a very clean alternative (which will be more efficient, since it relies on the efficient implementation of zip()):
arr[(*zip(*values_of_channel_0), 0)] = list(values_of_channel_0.values())
Edit: you asked for an explanation of the left-hand side.
This hinges on the unpacking operator *. *values_of_channel_0 spreads all the keys of the dictionary values_of_channel_0 into a call to zip(). Since these keys are all 2-tuples of int, zip will yield two tuples, one with all the first coordinates (0, 1, 1, ...) and the second with the second coordinates (2, 0, 3, ...).
Since the call to zip() is also preceded by *, these two values will be spread to index arr[], together with a final coordinate 0. So this:
arr[(*zip(*values_of_channel_0), 0)] = ...
Is essentially the same as:
arr[((0, 1, 1, ...), (2, 0, 3, ...), 0)] = ...
That selects exactly as many elements of arr as there are entries in the dictionary, at precisely the needed coordinates. And so assigning list(values_of_channel_0.values()) to it works and has the desired effect of assigning the matching values to the correct coordinates.
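For concreteness, here is the same indexing written out step by step (a small sketch, assuming Python 3.7+ so the dictionary preserves insertion order):
rows, cols = zip(*values_of_channel_0)
print(rows)  # (0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5)
print(cols)  # (2, 0, 3, 1, 2, 3, 0, 2, 0, 2, 0, 2, 3)
arr[rows, cols, 0] = list(values_of_channel_0.values())  # same effect as the one-liner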
Related
When we use arrays as indexes, cupy/numpy ignores duplicates.
Example:
import cupy as cp
matrix = cp.zeros((3, 3))
xi = cp.asarray([0, 1, 1, 2])
yi = cp.asarray([0, 1, 1, 2])
matrix[xi, yi] += 1
print(matrix.get())
Output:
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
Desired output:
[[1. 0. 0.]
[0. 2. 0.]
[0. 0. 1.]]
The second (1, 1) index is ignored. How can the operation be applied for duplicate indexes as well?
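One way to get the desired output with numpy is np.add.at, which performs an unbuffered in-place addition so repeated indices accumulate (CuPy provides cupyx.scatter_add for the same purpose). A minimal sketch:
import numpy as np

matrix = np.zeros((3, 3))
xi = np.asarray([0, 1, 1, 2])
yi = np.asarray([0, 1, 1, 2])

# unbuffered addition: the duplicate (1, 1) index is applied twice
np.add.at(matrix, (xi, yi), 1)
print(matrix)
# [[1. 0. 0.]
#  [0. 2. 0.]
#  [0. 0. 1.]]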
I would like to make a sparse matrix from a dense one, such that in each row or column only the n largest elements are preserved. I do the following:
import numpy as np
import scipy.sparse as spsp

def sparsify(K, min_nnz=5):
    '''
    This function eliminates the elements which are not among the
    min_nnz largest in their row or column.

    Parameters
    ----------
    K : ndarray
        the input matrix
    min_nnz:
        the minimal number of elements in a row or column to be preserved
    '''
    # keep an element if it is among the min_nnz largest of its row
    # or among the min_nnz largest of its column
    cond = np.bitwise_or(K >= -np.partition(-K, min_nnz - 1, axis=1)[:, min_nnz - 1][:, None],
                         K >= -np.partition(-K, min_nnz - 1, axis=0)[min_nnz - 1, :][None, :])
    return spsp.csr_matrix(np.where(cond, K, 0))
This approach works as intended but does not seem to be the most efficient or the most robust one. What would you recommend to do it in a better way?
The example of usage:
A = np.random.rand(10, 10)
A_sp = sparsify(A, min_nnz = 3)
Instead of making another dense matrix, you can use coo_matrix to build up using only the values you need:
return spsp.coo_matrix((K[cond], np.where(cond)), shape = K.shape)
As for the rest, you can maybe short-circuit the second dimension, but your time savings will be completely dependent on your inputs:
def sparsify(K, min_nnz=5):
    '''
    This function eliminates the elements which are not among the
    min_nnz largest in their row or column.

    Parameters
    ----------
    K : ndarray
        the input matrix
    min_nnz:
        the minimal number of elements in a row or column to be preserved
    '''
    # keep the min_nnz largest elements of each column
    cond = K >= -np.partition(-K, min_nnz - 1, axis=0)[min_nnz - 1, :]
    # rows that do not yet have min_nnz surviving elements
    mask = cond.sum(1) < min_nnz
    # top those rows up with their own min_nnz largest elements
    cond[mask] = np.bitwise_or(cond[mask],
                               K[mask] >= -np.partition(-K[mask],
                                                        min_nnz - 1,
                                                        axis=1)[:, min_nnz - 1][:, None])
    return spsp.coo_matrix((K[cond], np.where(cond)), shape=K.shape)
Testing:
sparsify(A)
Out[]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 58 stored elements in COOrdinate format>
sparsify(A).A
Out[]:
array([[0. , 0. , 0.61362248, 0. , 0.73648987,
0.64561856, 0.40727807, 0.61674005, 0.53533315, 0. ],
[0.8888361 , 0.64548039, 0.94659603, 0.78474203, 0. ,
0. , 0.78809603, 0.88938798, 0. , 0.37631541],
[0.69356682, 0. , 0. , 0. , 0. ,
0.7386594 , 0.71687659, 0.67750768, 0.58002451, 0. ],
[0.67241433, 0.71923718, 0.95888737, 0. , 0. ,
0. , 0.82773085, 0.69788448, 0.63736915, 0.4263064 ],
[0. , 0.65831794, 0. , 0. , 0.59850093,
0. , 0. , 0.61913869, 0.65024867, 0.50860294],
[0.75522891, 0. , 0.93342402, 0.8284258 , 0.64471939,
0.6990814 , 0. , 0. , 0. , 0.32940821],
[0. , 0.88458635, 0.62460096, 0.60412265, 0.66969674,
0. , 0.40318741, 0. , 0. , 0.44116059],
[0. , 0. , 0.500971 , 0.92291245, 0. ,
0.8862903 , 0. , 0.375885 , 0.49473635, 0. ],
[0.86920647, 0.85157893, 0.89883006, 0. , 0.68427193,
0.91195162, 0. , 0. , 0.94762875, 0. ],
[0. , 0.6435456 , 0. , 0.70551006, 0. ,
0.8075527 , 0. , 0.9421039 , 0.91096934, 0. ]])
sparsify(A).A.astype(bool).sum(0)
Out[]: array([5, 6, 7, 5, 5, 6, 5, 7, 7, 5])
sparsify(A).A.astype(bool).sum(1)
Out[]: array([6, 7, 5, 7, 5, 6, 6, 5, 6, 5])
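If you want to verify the constraint programmatically (a small addition on top of the checks above, assuming A has no exact zeros among its kept entries), the counts can be turned into assertions:
S = sparsify(A).A
assert (np.count_nonzero(S, axis=0) >= 5).all()  # every column keeps >= 5 entries
assert (np.count_nonzero(S, axis=1) >= 5).all()  # every row keeps >= 5 entries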
Given a numpy array that can be subset to indices of array elements meeting given criteria, how do I create tuples of triplets (or quadruplets, quintuplets, ...) from the resulting pairs of indices?
In the example below, pairs_tuples is equal to [(1, 0), (3, 0), (3, 1), (3, 2)]. triplets_tuples should be [(0, 1, 3)] because all of its elements (i.e. (1, 0), (3, 0), (3, 1)) have pairwise values meeting the condition, whereas (3, 2) does not.
a = np.array([[0. , 0. , 0. , 0. , 0. ],
[0.96078379, 0. , 0. , 0. , 0. ],
[0.05498203, 0.0552454 , 0. , 0. , 0. ],
[0.46005028, 0.45468466, 0.11167813, 0. , 0. ],
[0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0. ]])
pairs = np.where((a >= .11) & (a <= .99))
pairs_tuples = list(zip(pairs[0].tolist(), pairs[1].tolist()))
# [(1, 0), (3, 0), (3, 1), (3, 2)]
How to get to the below?
triplets_tuples = [(0, 1, 3)]
quadruplets_tuples = []
quintuplets_tuples = []
This has an easy part and an NP part. Here's the solution to the easy part.
Let's assume you have the full correlation matrix:
>>> c = a + a.T
>>> c
array([[0. , 0.96078379, 0.05498203, 0.46005028, 0.1030161 ],
[0.96078379, 0. , 0.0552454 , 0.45468466, 0.10350956],
[0.05498203, 0.0552454 , 0. , 0.11167813, 0.00109096],
[0.46005028, 0.45468466, 0.11167813, 0. , 0.00928037],
[0.1030161 , 0.10350956, 0.00109096, 0.00928037, 0. ]])
What you're doing is converting this into an adjacency matrix:
>>> adj = (c >= .11) & (c <= .99)
>>> adj.astype(int) # for readability below - False and True take a lot of space
array([[0, 1, 0, 1, 0],
[1, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[1, 1, 1, 0, 0],
[0, 0, 0, 0, 0]])
This now represents a graph where columns and rows correspond to nodes, and a 1 is an edge between them. We can use networkx to visualize this:
import networkx
g = networkx.from_numpy_matrix(adj)
networkx.draw(g)
You're looking for maximal fully-connected subgraphs, or "cliques", within this graph. This is the Clique problem, and is the NP part. Thankfully, networkx can solve that too:
>>> list(networkx.find_cliques(g))
[[3, 0, 1], [3, 2], [4]]
Here [3, 0, 1] is one of your triplets.
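A short follow-up, in case it helps: the maximal cliques can then be grouped by size to match the variable names from the question (a sketch on top of the answer above):
cliques = [tuple(sorted(c)) for c in networkx.find_cliques(g)]
triplets_tuples = [c for c in cliques if len(c) == 3]      # [(0, 1, 3)]
quadruplets_tuples = [c for c in cliques if len(c) == 4]   # []
quintuplets_tuples = [c for c in cliques if len(c) == 5]   # []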
Given an N×N grid, where every position in the grid is marked either as "Yes", "No" or " " (space).
Find all possible sequences in the grid containing 5 consecutive " " (spaces). Here "consecutive spaces" can be either horizontal, vertical or diagonal.
We can assume that the grid is represented as a dictionary, wherein every key of the dictionary represents a position as a coordinate and every value represents whether there is a "Yes", "No" or " " at that position.
For example, grid[(1, 2)] = "Yes" indicates that there is a "Yes" at position (1, 2) on the grid.
We can also assume that the value of N is known in advance.
My initial approach to this problem involved looping through the entire grid from the beginning, checking for horizontal sequences, then vertical sequences and then diagonal sequences. However, this would prove to be inefficient as I would have to continuously re-calculate lengths of the sequences each time, making sure they are equal to 5 as well as checking whether the sequences are consecutive.
I was looking for a more elegant approach, a better way to do this. Is there a Python library that allows for such computations? I have tried looking but didn't find anything that fit the constraints of the problem.
Any guidance would be greatly appreciated!
Use scipy; specifically, use the scipy.ndimage.label function, which labels connected sequences together.
import scipy, scipy.ndimage
# N and grid dictionary are already known
G = scipy.zeros([N,N])
for k, v in grid.iteritems():
    if v.lower() == 'yes':
        G[tuple(k)] = 1
    elif v.lower() == 'no':
        G[tuple(k)] = -1
def get_consecutive_spaces(G, chain=5):
    sequences = []
    # Generate a pattern for each directional sequence. (1) Horizontal,
    # (2) vertical, (3) top-left to bottom-right diagonal, (4) bottom-left
    # to top-right diagonal
    patterns = [scipy.ndimage.label(G == 0, structure=scipy.array([[0,0,0],
                                                                   [1,1,1],
                                                                   [0,0,0]])),
                scipy.ndimage.label(G == 0, structure=scipy.array([[0,1,0],
                                                                   [0,1,0],
                                                                   [0,1,0]])),
                scipy.ndimage.label(G == 0, structure=scipy.array([[1,0,0],
                                                                   [0,1,0],
                                                                   [0,0,1]])),
                scipy.ndimage.label(G == 0, structure=scipy.array([[0,0,1],
                                                                   [0,1,0],
                                                                   [1,0,0]]))]
    # Loop over patterns, then find any labelled sequence >= a size of chain=5
    for lab_arr, n in patterns:
        for i in range(1, n+1):
            b = lab_arr == i
            b_inds = scipy.where(b)
            if len(b_inds[0]) < chain:
                continue
            sequences.append((tuple(b_inds[0]), tuple(b_inds[1])))
    return sequences
E.g.
>>> G = scipy.sign(scipy.random.random([12,12]) - 0.5)*(scipy.random.random([12,12]) < 0.5)
>>> print G
[[-0. 1. -1. 1. 1. -1. 1. 0. -0. 1. 0. -0.]
[ 1. 1. -0. -0. -1. 1. 1. -1. -1. 1. -0. -1.]
[ 0. 1. 1. 1. 0. 1. -0. 0. 0. 0. -0. 0.]
[ 1. -0. -1. 0. -1. -0. 1. 0. -0. -0. -0. 1.]
[-0. 1. -1. 1. -0. -0. -1. -0. 1. 1. -0. 0.]
[ 0. -1. 1. -0. 1. 0. -0. -1. -1. -0. 0. -1.]
[-1. -0. 0. -1. -1. -0. -1. 0. 0. 0. -1. 0.]
[-0. 1. 0. 0. 0. 1. 1. -1. 0. -0. -1. 0.]
[ 1. 1. 0. 1. -1. -1. 0. 0. -1. 1. 0. 0.]
[-0. 0. 0. -1. -0. -1. 1. -0. 0. 1. 1. 0.]
[-1. 1. -1. 1. 0. 1. 0. 1. 1. 1. 1. 0.]
[-1. 1. -0. 0. -1. 0. -0. -1. 1. -1. -0. 0.]]
>>> sequences = get_consecutive_spaces(G)
>>> for s in sequences: print s
((2, 2, 2, 2, 2, 2), (6, 7, 8, 9, 10, 11))
((0, 1, 2, 3, 4, 5), (10, 10, 10, 10, 10, 10))
((6, 7, 8, 9, 10, 11), (11, 11, 11, 11, 11, 11))
((0, 1, 2, 3, 4, 5, 6, 7), (11, 10, 9, 8, 7, 6, 5, 4))
((2, 3, 4, 5, 6), (6, 5, 4, 3, 2))
((4, 5, 6, 7, 8), (11, 10, 9, 8, 7))
Note that this returns sequences of length greater than or equal to chain; it does not restrict itself to sequences of exactly 5. Changing it to return only length-5 sequences is a trivial fix, as sketched below.
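One possible reading of that fix (a sketch; windows_of_five is a name introduced here, not part of the answer above) is to split every returned run into its length-5 windows:
def windows_of_five(sequences, chain=5):
    # split each run into all of its length-`chain` windows, e.g. a run of
    # 8 consecutive spaces yields 4 overlapping windows of 5
    windows = []
    for rows, cols in sequences:
        for i in range(len(rows) - chain + 1):
            windows.append((rows[i:i + chain], cols[i:i + chain]))
    return windows

five_only = windows_of_five(get_consecutive_spaces(G))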
How can I convert an ndarray to a matrix in numpy? I'm trying to import data from a csv and turn it into a matrix.
from numpy import array, matrix, recfromcsv
my_vars = ['docid','coderid','answer1','answer2']
toy_data = matrix( array( recfromcsv('toy_data.csv', names=True)[my_vars] ) )
print toy_data
print toy_data.shape
But I get this:
[[(1, 1, 3, 3) (1, 2, 4, 1) (1, 3, 7, 2) (2, 1, 3, 3) (2, 2, 4, 4)
(2, 4, 3, 1) (3, 1, 3, 3) (3, 2, 4, 3) (3, 3, 3, 4) (4, 4, 5, 1)
(4, 5, 6, 2) (4, 2, 4, 3) (5, 2, 5, 4) (5, 3, 3, 1) (5, 4, 7, 2)
(6, 1, 3, 3) (6, 5, 4, 1) (6, 2, 5, 2)]]
(1, 18)
What do I have to do to get a 4 by 18 matrix out of this code? There's got to be an easy answer to this question, but I just can't find it.
If the ultimate goal is to make a matrix, there's no need to create a recarray with named columns. You could use np.loadtxt to load the csv into an ndarray, then use np.asmatrix to convert it to a matrix:
import numpy as np
toy_data = np.asmatrix(np.loadtxt('toy_data.csv', delimiter=',', skiprows=1))
print toy_data
print toy_data.shape
yields
[[ 1. 1. 3. 3.]
[ 1. 2. 4. 1.]
[ 1. 3. 7. 2.]
[ 2. 1. 3. 3.]
[ 2. 2. 4. 4.]
[ 2. 4. 3. 1.]
[ 3. 1. 3. 3.]
[ 3. 2. 4. 3.]
[ 3. 3. 3. 4.]
[ 4. 4. 5. 1.]
[ 4. 5. 6. 2.]
[ 4. 2. 4. 3.]
[ 5. 2. 5. 4.]
[ 5. 3. 3. 1.]
[ 5. 4. 7. 2.]
[ 6. 1. 3. 3.]
[ 6. 5. 4. 1.]
[ 6. 2. 5. 2.]]
(18, 4)
Note: the skiprows argument is used to skip over the header in the csv.
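If the csv happened to contain more columns than the four listed in my_vars, loadtxt's usecols argument can pick them out by position (a sketch, assuming here that they are the first four columns):
toy_data = np.asmatrix(np.loadtxt('toy_data.csv', delimiter=',',
                                  skiprows=1, usecols=(0, 1, 2, 3)))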
You can just read all your values into a vector, then reshape it.
import numpy

fo = open("toy_data.csv")

def _ReadCSV(fileobj):
    # yield every comma-separated value in the remaining lines as a float
    for line in fileobj:
        for el in line.split(","):
            yield float(el)

header = map(str.strip, fo.readline().split(","))
a = numpy.fromiter(_ReadCSV(fo), numpy.float64)
a.shape = (-1, len(header))
But there may be an even more direct way with newer numpy.
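One such route (a sketch, assuming a purely numeric csv with a single header row) is to let genfromtxt parse the header into field names and then flatten the structured array:
import numpy
from numpy.lib import recfunctions as rfn

# genfromtxt reads the header into field names; structured_to_unstructured
# (numpy >= 1.16) turns the structured array into a plain 2-D array
data = numpy.genfromtxt("toy_data.csv", delimiter=",", names=True)
a = rfn.structured_to_unstructured(data)  # shape (n_rows, n_columns)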