Formatting numpy arrays to find the sum between 2 values in Python

I am trying to modify the function below so it produces the expected values. The function is meant to sum all the values of Numbers that fall between two consecutive elements of the limits. None of the values in Numbers are between 0 and 2, so the first result is 0. However, the values between 2 and 5 within Numbers are 3 and 4, so the next result is 3+4=7. The function was taken from this issue: issue.
import numpy as np

def formating(a, b):
    # sort the limits
    x = np.sort(b)
    # digitize: assign each value in a to a bin defined by the limits
    l = np.digitize(a, x)
    # sum the values of a that fall into each bin
    result = np.bincount(l, weights=a)
    return result
Numbers = np.array([3, 4, 5, 7, 8, 10, 20])
limit1 = np.array([0, 2, 5, 12, 15])
limit2 = np.array([0, 2, 5, 12])
limit3 = np.array([0, 2, 5, 12, 15, 22])
result1 = formating(Numbers, limit1)
result2 = formating(Numbers, limit2)
result3 = formating(Numbers, limit3)
Current output
result1: [ 0. 0. 7. 30. 0. 20.]
result2: [ 0. 0. 7. 30. 20.]
result3: [ 0. 0. 7. 30. 0. 20.]
Wanted Output:
result1: [ 0. 7. 30. 0.]
result2: [ 0. 7. 30. ]
result3: [ 0. 7. 30. 0. 20.]

So just throw out the bins for numbers that fall outside the limits.
result1 = result1[1:len(limit1)]
result2 = result2[1:len(limit2)]
result3 = result3[1:len(limit3)]
Or, for smarter results, end the function with:
result = np.bincount(l, weights=a)
return result[1:len(b)]
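Putting that together, a minimal sketch of the corrected function (same formating name as in the question) could look like this:

import numpy as np

def formating(a, b):
    x = np.sort(b)
    l = np.digitize(a, x)
    result = np.bincount(l, weights=a)
    # drop bin 0 (values below the first limit) and any bins past the last limit
    return result[1:len(b)]

Numbers = np.array([3, 4, 5, 7, 8, 10, 20])
limit1 = np.array([0, 2, 5, 12, 15])
print(formating(Numbers, limit1))  # [ 0.  7. 30.  0.]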

Related

Matrix element repetition bug

I'm trying to create a matrix that reads:
[0,1,2]
[3,4,5]
[6,7,8]
However, my elements keep repeating. How do I fix this?
import numpy as np

n = 3
X = np.empty(shape=[0, n])
for i in range(3):
    for j in range(1, 4):
        for k in range(1, 7):
            X = np.append(X, [[(3*i), ((3*j)-2), ((3*k)-1)]], axis=0)
print(X)
Results:
[[ 0. 1. 2.]
[ 0. 1. 5.]
[ 0. 1. 8.]
[ 0. 1. 11.]
[ 0. 1. 14.]
[ 0. 1. 17.]
[ 0. 4. 2.]
[ 0. 4. 5.]
I'm not really sure how you think your code was supposed to work. You are appending a row to X on every iteration, i.e. 3 * 3 * 6 times, so you end up with a 54 x 3 matrix.
I think maybe you meant to do:
for i in range(3):
    X = np.append(X, [[3*i, 3*i+1, 3*i+2]], axis=0)
Just so you know, appending to arrays is usually discouraged (build a list of lists instead, then convert it to a numpy array).
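A minimal sketch of that list-of-lists approach (for the same 3x3 target) might be:

import numpy as np

rows = []
for i in range(3):
    rows.append([3*i, 3*i + 1, 3*i + 2])  # build each row as a plain Python list
X = np.array(rows)                        # convert to a numpy array once, at the end
print(X)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]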
You could also do
>>> np.arange(9).reshape((3, 3))
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Formatting multidimensional numpy arrays to find the sum between 2 values in Python

The function below is meant to sum the second-column values of Numbers whose first-column value Numbers[:,0] falls between two consecutive elements of the limits limit1-3. For the first calculation, none of the values of Numbers[:,0] are between 0 and 2 (the first two elements of limit1), so the result is 0. For the second calculation, 3 and 4 in Numbers[:,0] are between 2 and 5 in limit1, so the corresponding second-column values are summed: 1+3=4. How could I implement this in the function below?
def formating(a, b, c):
    # Formating goes here
    x = np.sort(c)
    # digitize
    l = np.digitize(a, x)
    # output:
    result = np.bincount(l, weights=b)
    return result[1:len(b)]
Numbers = np.array([[3, 1], [4, 3], [5, 3], [7, 11], [8, 9], [10, 20], [20, 45]])
limit1 = np.array([0, 2, 5, 12, 15])
limit2 = np.array([0, 2, 5, 12])
limit3 = np.array([0, 2, 5, 12, 15, 22])
result1 = formating(Numbers[:, 0], Numbers[:, 1], limit1)
result2 = formating(Numbers[:, 0], Numbers[:, 1], limit2)
result3 = formating(Numbers[:, 0], Numbers[:, 1], limit3)
Expected Output
result1: [ 0. 4. 43. 0. ]
result2: [ 0. 4. 43. ]
result3: [ 0. 4. 43. 0. 45.]
Current Output
result1: [ 0. 4. 43. 0. 45.]
result2: [ 0. 4. 43. 45.]
result3: [ 0. 4. 43. 0. 45.]
This:
return result[1:len(b)]
should be
return result[1:len(c)]
Your return vector is dependent on the length of your bins, not your input data.
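With that one-character fix applied, a minimal sketch of the whole function (same names as in the question) would be:

import numpy as np

def formating(a, b, c):
    x = np.sort(c)
    l = np.digitize(a, x)
    result = np.bincount(l, weights=b)
    # slice by the number of limits c, not by the number of weights b
    return result[1:len(c)]

Numbers = np.array([[3, 1], [4, 3], [5, 3], [7, 11], [8, 9], [10, 20], [20, 45]])
limit1 = np.array([0, 2, 5, 12, 15])
print(formating(Numbers[:, 0], Numbers[:, 1], limit1))  # [ 0.  4. 43.  0.]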

A robust way to keep the n largest elements in rows or columns of a matrix

I would like to make a sparse matrix from a dense one, such that in each row or column only the n largest elements are preserved. I do the following:
import numpy as np
import scipy.sparse as spsp

def sparsify(K, min_nnz=5):
    '''
    Eliminate the elements that are smaller than the min_nnz-th largest
    element of their row and of their column.

    Parameters
    ----------
    K : ndarray
        the input matrix
    min_nnz : int
        the minimal number of elements to be preserved in each row or column
    '''
    cond = np.bitwise_or(K >= -np.partition(-K, min_nnz - 1, axis=1)[:, min_nnz - 1][:, None],
                         K >= -np.partition(-K, min_nnz - 1, axis=0)[min_nnz - 1, :][None, :])
    return spsp.csr_matrix(np.where(cond, K, 0))
This approach works as intended, but it does not seem to be the most efficient or robust one. How would you recommend doing it better?
The example of usage:
A = np.random.rand(10, 10)
A_sp = sparsify(A, min_nnz = 3)
Instead of making another dense matrix, you can use coo_matrix to build up using only the values you need:
return spsp.coo_matrix((K[cond], np.where(cond)), shape = K.shape)
As for the rest, you can maybe short-circuit the second dimension, but your time savings will be completely dependent on your inputs:
def sparsify(K, min_nnz=5):
    '''
    Eliminate the elements that are smaller than the min_nnz-th largest
    element of their row and of their column.

    Parameters
    ----------
    K : ndarray
        the input matrix
    min_nnz : int
        the minimal number of elements to be preserved in each row or column
    '''
    # keep the min_nnz largest elements of each column
    cond = K >= -np.partition(-K, min_nnz - 1, axis=0)[min_nnz - 1, :]
    # rows that do not yet have min_nnz elements kept
    mask = cond.sum(1) < min_nnz
    # for those rows only, also keep their min_nnz largest elements
    cond[mask] = np.bitwise_or(cond[mask],
                               K[mask] >= -np.partition(-K[mask],
                                                        min_nnz - 1,
                                                        axis=1)[:, min_nnz - 1][:, None])
    return spsp.coo_matrix((K[cond], np.where(cond)), shape=K.shape)
Testing:
sparsify(A)
Out[]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 58 stored elements in COOrdinate format>
sparsify(A).A
Out[]:
array([[0. , 0. , 0.61362248, 0. , 0.73648987,
0.64561856, 0.40727807, 0.61674005, 0.53533315, 0. ],
[0.8888361 , 0.64548039, 0.94659603, 0.78474203, 0. ,
0. , 0.78809603, 0.88938798, 0. , 0.37631541],
[0.69356682, 0. , 0. , 0. , 0. ,
0.7386594 , 0.71687659, 0.67750768, 0.58002451, 0. ],
[0.67241433, 0.71923718, 0.95888737, 0. , 0. ,
0. , 0.82773085, 0.69788448, 0.63736915, 0.4263064 ],
[0. , 0.65831794, 0. , 0. , 0.59850093,
0. , 0. , 0.61913869, 0.65024867, 0.50860294],
[0.75522891, 0. , 0.93342402, 0.8284258 , 0.64471939,
0.6990814 , 0. , 0. , 0. , 0.32940821],
[0. , 0.88458635, 0.62460096, 0.60412265, 0.66969674,
0. , 0.40318741, 0. , 0. , 0.44116059],
[0. , 0. , 0.500971 , 0.92291245, 0. ,
0.8862903 , 0. , 0.375885 , 0.49473635, 0. ],
[0.86920647, 0.85157893, 0.89883006, 0. , 0.68427193,
0.91195162, 0. , 0. , 0.94762875, 0. ],
[0. , 0.6435456 , 0. , 0.70551006, 0. ,
0.8075527 , 0. , 0.9421039 , 0.91096934, 0. ]])
sparsify(A).A.astype(bool).sum(0)
Out[]: array([5, 6, 7, 5, 5, 6, 5, 7, 7, 5])
sparsify(A).A.astype(bool).sum(1)
Out[]: array([6, 7, 5, 7, 5, 6, 6, 5, 6, 5])

Minimum sum route through numpy array

This is a two part question.
Part 1
Given the following Numpy array:
foo = np.array([[22.5, 20. ,  0. , 20. ],
                [24. , 40. ,  0. ,  8. ],
                [ 0. ,  0. , 50. ,  9.9],
                [ 0. ,  0. ,  0. ,  9. ],
                [ 0. ,  0. ,  0. ,  2.5]])
what is the most efficient way to (i) find the two smallest possible sums of values across columns (considering only cell values greater than zero), where each column contributes exactly one row, and (ii) keep track of the array index locations visited on that route?
For example, in the example above this would be: minimum_bar = 22.5 + 20 + 50 + 2.5 = 95 at indices [0,0], [0,1], [2,2], [4,3] and next_best_bar = 22.5 + 20 + 50 + 8 = 100.5 at indices [0,0], [0,1], [2,2], [1,3].
Part 2
Similar to Part 1, but now with the constraint that the row-wise sums of foo (if that row is used in the solution) must be greater than the values in an array (for example np.array([10, 10, 10, 10, 10])). In other words, sum(row[0]) > array[0] = 62.5 > 10 = True, but sum(row[4]) > array[4] = 2.5 > 10 = False.
In which case the result is: minimum_bar = 22.5 + 20 + 50 + 9.9 = 102.4 at indices [0,0], [0,1], [2,2], [2,3] and next_best_bar = 22.5 + 20 + 50 + 20 = 112.5 at indices [0,0], [0,1], [2,2], [0,3].
My initial approach was to find all possible routes (combinations of indices using itertools) but this solution does not scale well for large matrix sizes (e.g., mxn=500x500).
Here's one solution that I came up with (hopefully I didn't misunderstand anything in your question)
def minimum_routes(foo):
    assert len(foo) >= 2
    assert np.all(np.any(foo > 0, axis=0))
    foo = foo.astype(float)
    # treat non-positive entries as unusable
    foo[foo <= 0] = np.inf
    # sort each column so the smallest usable value sits in row 0
    foo.sort(0)
    minimum_bar = foo[0]
    next_best_bar = minimum_bar.copy()
    # swap in the second-smallest value of the column where doing so costs the least
    c = np.argmin(np.abs(foo[0] - foo[1]))
    next_best_bar[c] = foo[1, c]
    return minimum_bar, next_best_bar
Let's test it:
foo = np.array([[22.5, 20. ,  0. , 20. ],
                [24. , 40. ,  0. ,  8. ],
                [ 0. ,  0. , 50. ,  9.9],
                [ 0. ,  0. ,  0. ,  9. ],
                [ 0. ,  0. ,  0. ,  2.5]])
# PART 1
minimum_bar, next_best_bar = minimum_routes(foo)
# (array([22.5, 20. , 50. , 2.5]), array([24. , 20. , 50. , 2.5]))
# PART 2
constraint = np.array([10, 10, 10, 10, 10])
minimum_bar, next_best_bar = minimum_routes(foo[foo.sum(1) > constraint])
# (array([22.5, 20. , 50. , 8. ]), array([24., 20., 50., 8.]))
To find the indices:
np.where(foo == minimum_bar)
np.where(foo == next_best_bar)

Compare neighbours boolean numpy array in grid

I want to write a function which compares the 8 neighbours of a node in my grid. When a minimum of 3 of the neighbours have the same value as the central node, we can define the node as happy.
For example, in this array the central node has value 0; it has 3 neighbours with value 0, so the node is happy:
array([[ 1,  0,  1],
       [ 1,  0,  1],
       [-1,  0,  0]])
I expect a boolean output, True or False.
Can I do something like this, or can I easily use numpy for this?
def nodehappiness(grid, i, j, drempel=3):
    if i,j => 3:
        node == True
Thanks in advance
Try this:
def neighbours(grid, i, j):
    rows = np.array([-1, -1, -1, 0, 0, 1, 1, 1])
    cols = np.array([-1, 0, 1, -1, 1, -1, 0, 1])
    return grid[rows+i, cols+j]
Edit: Example:
grid = np.arange(25).reshape((5, 5))
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24]])

neighbours(grid, 0, 0)
# array([24, 20, 21,  4,  1,  9,  5,  6])
Explanation:
With numpy you can use negative indices, allowing you to easily access the last entries of an array. This will also work for multiple dimensions:
x = np.array([0, 1, 2, 3])
x[-1]
# 3
x.reshape((2, 2))
# array([[0, 1],
#        [2, 3]])
x[-1, -1]
# 3
You are interested in 8 entries of the matrix.
left above -> row - 1, column - 1
above -> row - 1, column + 0
right above -> row - 1, column + 1
left -> row + 0, column - 1
...
That's what the arrays rows and cols represent. By adding i and j you get all the entries around those coordinates.
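Building on that, one possible sketch of the happiness check itself (reusing the neighbours helper above; the nodehappiness name and drempel threshold come from the question) could be:

import numpy as np

def neighbours(grid, i, j):
    rows = np.array([-1, -1, -1, 0, 0, 1, 1, 1])
    cols = np.array([-1, 0, 1, -1, 1, -1, 0, 1])
    return grid[rows + i, cols + j]

def nodehappiness(grid, i, j, drempel=3):
    # a node is "happy" when at least drempel of its 8 (wrap-around) neighbours
    # share its value
    return np.count_nonzero(neighbours(grid, i, j) == grid[i, j]) >= drempel

grid = np.array([[ 1, 0, 1],
                 [ 1, 0, 1],
                 [-1, 0, 0]])
print(nodehappiness(grid, 1, 1))  # True: three neighbours equal the central 0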
Try this.
y = []
l = len(x)
for i in range(0, l):
    for j in range(0, l):
        # skip the central element
        if i == int(l/2) and j == int(l/2):
            continue
        y.append(x[j, i])
Are you searching for something like this?
def neighbour(grid, i, j):
    # flatten the 3x3 window around (i, j) and drop the central entry
    return np.delete((grid[i-1:i+2, j-1:j+2]).reshape(1, 9), 4)

# Test code
grid = np.arange(16).reshape(4, 4)
b = neighbour(grid, 2, 2)
Some hackery using ndimage.generic_filter:
from scipy import ndimage
import numpy as np

def get_neighbors(arr):
    output = []

    def f(x):
        # record each neighbourhood passed in by generic_filter
        output.append(x)
        return 0

    # footprint has the same shape as arr, with the central element zeroed out
    # so that the centre is excluded from each neighbourhood
    t = tuple(int((x - 1) / 2) for x in arr.shape)
    footprint = np.ones_like(arr)
    footprint[t] = 0
    ndimage.generic_filter(arr, f, footprint=footprint, mode='wrap')
    return np.array(output)
arr = np.arange(9).reshape(3, 3)
neighbors = get_neighbors(arr)
neighbors_grid = neighbors.reshape(*arr.shape, -1)
print(neighbors)
print(neighbors_grid)
Which prints:
# neighbors
[[8. 6. 7. 2. 1. 5. 3. 4.]
[6. 7. 8. 0. 2. 3. 4. 5.]
[7. 8. 6. 1. 0. 4. 5. 3.]
[2. 0. 1. 5. 4. 8. 6. 7.]
[0. 1. 2. 3. 5. 6. 7. 8.]
[1. 2. 0. 4. 3. 7. 8. 6.]
[5. 3. 4. 8. 7. 2. 0. 1.]
[3. 4. 5. 6. 8. 0. 1. 2.]
[4. 5. 3. 7. 6. 1. 2. 0.]]
# neighbors_grid
[[[8. 6. 7. 2. 1. 5. 3. 4.]
  [6. 7. 8. 0. 2. 3. 4. 5.]
  [7. 8. 6. 1. 0. 4. 5. 3.]]

 [[2. 0. 1. 5. 4. 8. 6. 7.]
  [0. 1. 2. 3. 5. 6. 7. 8.]
  [1. 2. 0. 4. 3. 7. 8. 6.]]

 [[5. 3. 4. 8. 7. 2. 0. 1.]
  [3. 4. 5. 6. 8. 0. 1. 2.]
  [4. 5. 3. 7. 6. 1. 2. 0.]]]
If you merely want the padded array:
padded = np.pad(arr, pad_width=1, mode='wrap')
print(padded)
Which of course gives:
[[8 6 7 8 6]
[2 0 1 2 0]
[5 3 4 5 3]
[8 6 7 8 6]
[2 0 1 2 0]]
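If you go that route, a hypothetical helper (not part of the answer above) that slices each node's 8 neighbours out of the padded array could look like:

import numpy as np

def neighbours_from_padded(arr, i, j):
    padded = np.pad(arr, pad_width=1, mode='wrap')
    window = padded[i:i + 3, j:j + 3]       # 3x3 window centred on arr[i, j]
    return np.delete(window.reshape(9), 4)  # drop the central element

arr = np.arange(9).reshape(3, 3)
print(neighbours_from_padded(arr, 0, 0))
# [8 6 7 2 1 5 3 4]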
