Numpy: calculate edges of a matrix - python

I have the following to calculate the difference of a matrix, i.e. the i-th element - the (i-1) element.
How can I (easily) calculate the difference for each element horizontally and vertically? With a transpose?
import numpy as np
inputarr = np.arange(12)
inputarr.shape = (3, 4)
inputarr += 1
# shift each row one position to the right, padding with 0
newarr = list()
for x in inputarr:
    newarr.append(np.hstack((np.array([0]), x[:-1])))
z = np.array(newarr)
print(inputarr)
print('first differences')
print(inputarr - z)
Output
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
first differences
[[1 1 1 1]
[5 1 1 1]
[9 1 1 1]]

Check out numpy.diff.
From the documentation:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by out[n] = a[n+1] - a[n] along
the given axis, higher order differences are calculated by using diff
recursively.
An example:
>>> import numpy as np
>>> a = np.arange(12).reshape((3,4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> np.diff(a,axis = 1) # row-wise
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
>>> np.diff(a, axis = 0) # column-wise
array([[4, 4, 4, 4],
[4, 4, 4, 4]])
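To reproduce the output in the question, where the first element of each row is kept as if a zero were placed in front of it, `np.diff` also takes a `prepend` argument. A minimal sketch, assuming NumPy 1.16 or later (where `prepend` is available); the names `horizontal` and `vertical` are only illustrative:

```python
import numpy as np

inputarr = np.arange(1, 13).reshape(3, 4)

# Difference along each row, with a 0 placed before the first column
horizontal = np.diff(inputarr, axis=1, prepend=0)

# Difference down each column, with a row of 0s placed before the first row
vertical = np.diff(inputarr, axis=0, prepend=0)

print(horizontal)
print(vertical)
```

No transpose is needed: the `axis` argument selects the direction.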

Related

Find the row index number of an array in a 2D numpy array

If I have a 2D numpy array A:
[[6 9 6]
[1 1 2]
[8 7 3]]
And I have access to array [1 1 2]. Clearly, [1 1 2] belongs to index 1 of array A. But how do I do this?
You can find the index of the matching row by combining np.all and np.where:
import numpy as np
a = np.array([[6, 9, 6],
[1, 1, 2],
[8, 7, 3]])
row = [1, 1, 2]
i = np.where(np.all(a==row, axis=1))
print(i[0][0])
np.where returns a tuple of index arrays (one per dimension), which is why you need to apply [0][0] consecutively to obtain an int.
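Note that `i[0][0]` raises an IndexError when no row matches. A small sketch of a helper that handles that case; `find_row` is a hypothetical name, and returning -1 for "not found" is an assumption, not something from the answers here:

```python
import numpy as np

def find_row(a, row):
    """Return the index of the first row of `a` equal to `row`, or -1 if there is none."""
    # Compare every row with `row`, then keep the positions where all columns match
    matches = np.flatnonzero((a == row).all(axis=1))
    return int(matches[0]) if matches.size else -1

a = np.array([[6, 9, 6],
              [1, 1, 2],
              [8, 7, 3]])

print(find_row(a, [1, 1, 2]))
print(find_row(a, [0, 0, 0]))
```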
One option:
a = np.array([[6, 9, 6],
[1, 1, 2],
[8, 7, 3]])
b = np.array([1, 1, 2])
np.nonzero((a == b).all(1))[0]
output: array([1])
arr1 = [[6,9,6],[1,1,2],[8,7,3]]
ind = arr1.index([1,1,2])
Output:
ind = 1
EDIT for 2D np.array:
arr1 = np.array([[6,9,6],[1,1,2],[8,7,3]])
ind = [l for l in range(len(arr1)) if (arr1[l,:] == np.array([1,1,2])).all()]
import numpy as np
a = np.array([[6, 9, 6],
[1, 1, 2],
[8, 7, 3]])
b = np.array([1, 1, 2])
[x for x,y in enumerate(a) if (y==b).all()] # here enumerate will keep the track of index
#output
[1]

How can I shuffle node labels and get a new weight vector using NumPy in Python?

I am saving the edge weights of an undirected graph in a row vector. For instance, if I have a graph as pictured below
The vector that I create is [5, 3, 4, 1, 2, 7], ordered by node number in ascending order. Now, if I swap the node labels of nodes 1 and 4, I obtain the following graph:
In this scenario, the vector that I should have is [2, 7, 4, 1, 5, 3]. My question is: if I have an n by m NumPy array, where n is the number of graphs and m is the number of edges, how can I shuffle the node labels for each row and get the updated array efficiently?
Suppose I have a set of graphs consisting of four nodes, as shown below. My intention is to randomly shuffle the node labels in each network and then get the updated weights in an array of the same size.
np.random.seed(2)
arr = np.random.randint(10, size=(5, 6))
arr
array([[8, 8, 6, 2, 8, 7],
[2, 1, 5, 4, 4, 5],
[7, 3, 6, 4, 3, 7],
[6, 1, 3, 5, 8, 4],
[6, 3, 9, 2, 0, 4]])
You can do it like this:
import numpy as np
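def get_arr_from_edges(a):
    # number of nodes n, recovered from the edge count n*(n-1)/2
    n = int(np.sqrt(len(a) * 2)) + 1
    # strict upper-triangle mask, filled row by row: (1,2), (1,3), ..., (n-1,n)
    mask = np.tri(n, dtype=bool, k=-1).T
    # keep the input dtype so integer weights stay integers
    out = np.zeros((n, n), dtype=a.dtype)
    out[mask] = a
    out += out.T
    return out
def get_edges_from_arr(a):
    mask = np.tri(a.shape[0], dtype=bool, k=-1).T
    out = a[mask]
    return out
def swap_nodes(a, nodes):
    # a has shape (graphs, n, n); nodes are 1-based labels; a is modified in place
    a[:, [nodes[0] - 1, nodes[1] - 1], :] = a[:, [nodes[1] - 1, nodes[0] - 1], :]
    a[:, :, [nodes[0] - 1, nodes[1] - 1]] = a[:, :, [nodes[1] - 1, nodes[0] - 1]]
    return a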
arr = np.array([
[8, 8, 6, 2, 8, 7],
[2, 1, 5, 4, 4, 5],
[7, 3, 6, 4, 3, 7],
[6, 1, 3, 5, 8, 4],
[6, 3, 9, 2, 0, 4],
])
nodes_to_swap = (1, 4)
# initialize node-arr
node_arrs = np.apply_along_axis(get_arr_from_edges, axis=1, arr=arr)
# swap nodes
node_arrs = swap_nodes(node_arrs, nodes_to_swap)
# return remapped edges
edges = np.array([get_edges_from_arr(node_arr) for node_arr in node_arrs])
print(edges)
Gives the following result:
[[8 7 6 2 8 8]
[4 5 5 4 2 1]
[3 7 6 4 7 3]
[8 4 3 5 6 1]
[0 4 9 2 6 3]]
The idea is to build a connection matrix from the edges, where each edge weight is stored at the indices of the two nodes it connects.
Then you just swap the rows and columns of the nodes you want to swap. If you want this process to be random, you could create random node pairs and call the function multiple times with these pairs. This process is non-commutative, so if you swap multiple node pairs, the order matters!
After that, you read the remapped edges out of the array with the swapped rows and columns (this is basically the inverse of the first step).
I am sure there are further optimizations possible using NumPy's vast functionality.
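For the random case asked about in the question, the same connection-matrix idea can be applied to all graphs at once with one permutation per graph. This is only a sketch under stated assumptions: `relabel_edges` is a hypothetical helper, nodes are 0-based here, and `perms[k, i]` is taken to mean "the old node that receives new label i in graph k".

```python
import numpy as np

def relabel_edges(weights, perms):
    """Relabel the nodes of each graph and return the reordered edge weights."""
    weights = np.asarray(weights)
    perms = np.asarray(perms)
    n_graphs, n_edges = weights.shape
    n_nodes = perms.shape[1]
    # Upper-triangle positions in the order (0,1), (0,2), ..., (n-2,n-1)
    iu, ju = np.triu_indices(n_nodes, k=1)
    if len(iu) != n_edges:
        raise ValueError("edge count does not match the number of nodes")
    # One symmetric connection matrix per graph
    adj = np.zeros((n_graphs, n_nodes, n_nodes), dtype=weights.dtype)
    adj[:, iu, ju] = weights
    adj[:, ju, iu] = weights
    # Permute rows and columns of every matrix with that graph's permutation
    g = np.arange(n_graphs)[:, None, None]
    shuffled = adj[g, perms[:, :, None], perms[:, None, :]]
    return shuffled[:, iu, ju]

# Swapping nodes 1 and 4 (0-based: 0 and 3) reproduces the example from the question
print(relabel_edges([[5, 3, 4, 1, 2, 7]], [[3, 1, 2, 0]]))  # [[2 7 4 1 5 3]]

arr = np.array([[8, 8, 6, 2, 8, 7],
                [2, 1, 5, 4, 4, 5],
                [7, 3, 6, 4, 3, 7],
                [6, 1, 3, 5, 8, 4],
                [6, 3, 9, 2, 0, 4]])
rng = np.random.default_rng(0)
# argsort of random numbers gives an independent random permutation per row
perms = np.argsort(rng.random((arr.shape[0], 4)), axis=1)
shuffled = relabel_edges(arr, perms)
print(shuffled)
```

A full permutation per graph also sidesteps the ordering issue of chained pairwise swaps, since each graph is relabelled in a single step.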

How to get the indexes of the greatest N values greater than a threshold in Numpy?

For a project I need to get, from an array with shape (k, m), the indexes of the N greatest values of each row that are also greater than a fixed threshold.
For example, if k=3, m=5, N=3, the threshold is 5 and the array is:
[[3 2 6 7 0],
[4 1 6 4 0],
[7 10 6 9 8]]
I should get the result (or the flattened version, I don't care):
[[2, 3],
[2],
[1, 3, 4]]
The indexes don't have to be sorted.
My code is currently:
indexes = []
for row, inds in enumerate(np.argsort(results, axis=1)[:, -N:]):
    for index in inds:
        if results[row, index] > threshold:
            indexes.append(index)
but I feel like I am not using Numpy to its full capacity.
Does anybody know a better and more elegant solution?
How about this method:
import numpy as np
arr = np.array(
[[3, 2, 6, 7, 0],
[4, 1, 6, 4, 0],
[7, 10, 6, 9, 8]]
)
t = 5
n = 3
sorted_idxs = arr.argsort(1)[:, -n:]
sorted_arr = np.sort(arr, 1)[:, -n:]
item_nums = np.cumsum((sorted_arr > t).sum(1))
masked_idxs = sorted_idxs[sorted_arr > t]
# drop the last offset, otherwise np.split appends a trailing empty array
idx_lists = np.split(masked_idxs, item_nums[:-1])
output:
[array([2, 3]), array([2]), array([4, 3, 1])]
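Since the indexes do not have to be sorted, a full sort is not strictly needed. A sketch using `np.argpartition` instead, which only guarantees that the N largest entries of each row end up in the last N positions; `top_n_above` is a hypothetical helper name:

```python
import numpy as np

def top_n_above(arr, n, threshold):
    """For each row, return the indexes of its n largest values that exceed threshold."""
    arr = np.asarray(arr)
    n = min(n, arr.shape[1])
    # Indexes of the n largest values per row, in no particular order
    top_idx = np.argpartition(arr, -n, axis=1)[:, -n:]
    # Look up the corresponding values and apply the threshold
    top_val = np.take_along_axis(arr, top_idx, axis=1)
    keep = top_val > threshold
    # Rows can keep different numbers of indexes, so return a list of arrays
    return [idx[mask] for idx, mask in zip(top_idx, keep)]

arr = np.array([[3, 2, 6, 7, 0],
                [4, 1, 6, 4, 0],
                [7, 10, 6, 9, 8]])
result = top_n_above(arr, n=3, threshold=5)
print(result)
```

The order of the indexes within each row is not specified here, so compare results as sets if you test against an expected answer.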

Populating a 2D array to calculate a function of two linspaces

I have this set of equations I want to perform:
x = np.linspace(0, 2, 3)
y = np.linspace(x, x+2, 3)
I then want to populate the 2D array with a calculation that does:
a = 2*x + y
So for example, given an array:
x = [0, 1, 2]
Then, the array y is:
y = [[0, 1, 2],
[1, 2, 3],
[2, 3, 4]]
When I perform the operation a = 2*x + y I should get the array:
a = [[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]
How do I do this, keeping in mind I want to perform this operation quickly for array of size up to 10000x10000 (or larger)?
Alternatively, keep your code and add two transposes:
print((2*x+y.T).T)
Output (floats, because np.linspace returns a float array):
[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
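The same result can be obtained without transposing by letting broadcasting add 2*x[i] to row i. A minimal sketch, assuming the intended formula is a[i, j] = 2*x[i] + y[i, j], as the expected output in the question suggests:

```python
import numpy as np

x = np.linspace(0, 2, 3)
y = np.linspace(x, x + 2, 3)   # array-valued start/stop needs NumPy 1.16 or later

# x[:, None] has shape (3, 1), so 2*x[i] is added to every entry of row i
a = 2 * x[:, None] + y
print(a)
```

This is a single vectorized pass over the data. For a 10000x10000 float64 result, the output alone takes about 800 MB, so memory is likely to matter more than the arithmetic.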

Select certain rows (condition met), but only some columns in Python/Numpy

I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 for the rows where the value of the second column meets a certain condition (i.e. equals a fixed value). I first tried to select only the rows, with all 4 columns, via:
I = A[A[:,1] == i]
which works. Then I tried (similarly to MATLAB, which I know very well):
I = A[A[:,1] == i, [0,2,3]]
which doesn't work. How to do it?
EXAMPLE DATA:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
[6 1 3 4]
[3 2 5 6]]
>>> i = 2
# I want to get the columns 1, 3 and 4
# for every row which has the value i in the second column.
# In this case, this would be row 1 and 3 with columns 1, 3 and 4:
[[1 3 4]
[3 5 6]]
I am now currently using this:
I = A[A[:,1] == i]
I = I[:, [0,2,3]]
But I thought that there had to be a nicer way of doing it... (I am used to MATLAB)
>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # select columns
array([[ 5, 6, 8],
[ 9, 10, 12]])
# fancier equivalent of the previous
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5, 6, 8],
[ 9, 10, 12]])
For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323
Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:
>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5, 6, 8],
[ 9, 10, 12]])
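Applied to the data from the question, the np.ix_ approach looks like the sketch below. The direct form A[A[:,1] == i, [0,2,3]] fails because NumPy tries to broadcast the two index arrays against each other (2 selected rows against 3 columns) rather than forming their cross product. The variable names `rows`, `cols` and `row_idx` are only illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [6, 1, 3, 4],
              [3, 2, 5, 6]])
i = 2
rows = A[:, 1] == i      # boolean mask over rows
cols = [0, 2, 3]         # columns 1, 3 and 4 in 0-based numbering

# np.ix_ pairs every selected row with every selected column
selected = A[np.ix_(rows, cols)]
print(selected)

# Equivalent without np.ix_: make the row indexes a column vector so they broadcast
row_idx = np.flatnonzero(rows)
same = A[row_idx[:, None], cols]
```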
If you would rather give the column indexes than a boolean mask, you can write it this way:
A[:, [0, 2, 3]][A[:, 1] == i]
Going back to your example:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
[6 1 3 4]
[3 2 5 6]]
>>> i = 2
>>> print(A[:, [0, 2, 3]][A[:, 1] == i])
[[1 3 4]
[3 5 6]]
Seriously,
>>> a=np.array([[1,2,3], [1,3,4], [2,2,5]])
>>> a[a[:,0]==1][:,[0,1]]
array([[1, 2],
[1, 3]])
>>>
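This also works for the example, where the filter value i = 2 happens to coincide with the 1-based number of the column being tested:
I = np.array([row[[x for x in range(A.shape[1]) if x != i-1]] for row in A if row[i-1] == i])
print(I)
Edit: since indexing starts from 0, i-1 is used as the column index.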
I hope this answers your question. A piece of script I have implemented using pandas is:
df_targetrows = df.loc[df[col2filter]*somecondition*, [col1,col2,...,coln]]
For example,
targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']]
This returns a DataFrame with only the columns ['symbol','date','rtns'] from stockdf, restricted to the rows where stockdf['rtns'] > .04.
Hope this helps.
