Nearest value of a given value - python

Given this pandas Series:
x = pd.Series([5, 10])
I use searchsorted in a loop to find the index of the value closest to each looped number, X.
for X in xrange(1, 11):
    print X, x.searchsorted(X)
This returns:
1 [0]
2 [0]
3 [0]
4 [0]
5 [0]
6 [1]
7 [1]
8 [1]
9 [1]
10 [1]
What I'm trying to achieve is having 6 and 7 return [0] because those numbers are closer to 5 than 10.

Instead of .searchsorted() you could also check for the index of the pd.Series value that minimizes the absolute difference to the value in question:
import numpy as np
for X in range(1, 11):
    print(X, (np.abs(x - X)).argmin())
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 1
9 1
10 1
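If the values are sorted and there are many lookups, searchsorted can still be used by comparing the two neighbours around each insertion point. The following is a minimal sketch under that assumption; the helper name nearest_index is hypothetical, it expects at least two sorted values, and ties go to the left neighbour.

```python
import numpy as np

def nearest_index(sorted_values, targets):
    """Index of the nearest element of sorted_values for each target (hypothetical helper)."""
    sorted_values = np.asarray(sorted_values)
    targets = np.atleast_1d(targets)
    # Insertion points, clipped so that both idx - 1 and idx are valid positions.
    idx = np.clip(np.searchsorted(sorted_values, targets), 1, len(sorted_values) - 1)
    left = sorted_values[idx - 1]
    right = sorted_values[idx]
    # Step back by one where the left neighbour is at least as close as the right one.
    return idx - (targets - left <= right - targets)

print(nearest_index([5, 10], np.arange(1, 11)))
```

For the Series above this yields index 0 for 1 through 7 and index 1 for 8 through 10, matching the argmin approach.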

Related

How to fill a zeros matrix using for loop?

I have an array full of zeros created as A = np.zeros((m, m)) with m = 6.
I want to fill this array so that each element equals the sum of its row and column number, i.e. A(x,y) = x+y.
How can I do this using a for loop and a while loop?
A method that avoids the loop, which gives a significant performance improvement on a large ndarray:
A = np.zeros((6,6))
m = A.shape[0]
n = A.shape[1]
x = np.transpose(np.array([*range(1,m+1)]*n).reshape(n,m))
y = np.array([*range(1,n+1)]*m).reshape(m,n)
A = x+y
print(A)
[[ 2 3 4 5 6 7]
[ 3 4 5 6 7 8]
[ 4 5 6 7 8 9]
[ 5 6 7 8 9 10]
[ 6 7 8 9 10 11]
[ 7 8 9 10 11 12]]
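The same grid can probably be built more compactly with NumPy broadcasting. This is a sketch rather than part of the original answer; the variable name idx is an assumption, and rows and columns are numbered from 1 as in the output above.

```python
import numpy as np

m = 6
idx = np.arange(1, m + 1)          # row/column numbers 1..m
# A column vector plus a row vector broadcasts to an (m, m) grid of sums.
A = idx[:, None] + idx[None, :]
print(A)
```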
A = np.zeros((6,6))
for i in range(0, A.shape[0]):
    for j in range(0, A.shape[1]):
        A[i][j] = i + j + 2
If you want the row and column numbers to start from 1, you can use this code directly; if you want them to start from 0, remove the "+2" in the fourth line.
Explanation:
The outer loop traverses the rows and the inner loop traverses the columns. Each cell is accessed as A[i][j] and assigned i+j+2 (or just i+j), so the original array is filled in place with the new values.
Have you tried this?
for y in range(len(A)):
    for x in range(len(A[y])):
        A[y][x] = x + y
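The question also asks for a while loop, which none of the answers show. A minimal sketch, assuming rows and columns are numbered from 1, could look like this:

```python
import numpy as np

m = 6
A = np.zeros((m, m))

i = 0
while i < m:
    j = 0
    while j < m:
        # (i + 1) and (j + 1) are the 1-based row and column numbers.
        A[i, j] = (i + 1) + (j + 1)
        j += 1
    i += 1

print(A)
```

The counters have to be reset and incremented by hand, which is why the for-loop form is usually preferred.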

Divide matrix into submatrix python

The program must accept an integer matrix of size R*C and four integers X, Y, P, Q as the input. The program must divide the matrix into nine submatrices based on the following condition. The program must divide the matrix horizontally after the Xth row and Yth row. Then the program must divide the matrix vertically after the Pth column and Qth column. Finally, the program must print the sum of integers in each submatrix as the output.
Input:
6 5
6 9 2 9 2
7 1 9 3 2
9 9 1 2 6
6 5 7 1 9
6 6 6 2 3
1 6 7 9 7
3 5 2 4
Output:
41 26 10 23 16 12 7 16 7
Explanation:
Here X = 3, Y=5, P = 2 and Q = 4
The nine submatrices and their sums are given below.
1st submatrix sum= 6+9+7+1+9+9 =41
6 9
7 1
9 9
2nd submatrix sum= 2+9+9+3+1+2 =26
2 9
9 3
1 2
3rd submatrix sum= 2+2+6 = 10
2
2
6
4th submatrix sum= 6+5+6+6 = 23
6 5
6 6
5th submatrix sum = 7+1+6+2 = 16
7 1
6 2
6th submatrix sum = 9 + 3 = 12
9
3
7th submatrix sum = 1 + 6 = 7
1 6
8th submatrix sum = 7 + 9 = 16
7 9
9th submatrix sum = 7
7
My program:
r,c=map(int,input().split())
m=[list(map(int,input().split())) for i in range(r)]
x,y,p,q=list(map(int,input().split()))
for i in range(x):
    for j in range(p):
        print(m[i][j])
    print()
How do I iterate from a given row and column to find each submatrix and print its sum?
Here are five solutions...
After reading all input like you did, you could go through the three boundary pairs for rows and the three boundary pairs for columns:
print(*(sum(sum(row[j:J]) for row in m[i:I])
        for i, I in [(0, x), (x, y), (y, r)]
        for j, J in [(0, p), (p, q), (q, c)]))
Same idea, slicing earlier / less often:
print(*(sum(sum(row[j:J]) for row in rows)
        for rows in [m[:x], m[x:y], m[y:]]
        for j, J in [(0, p), (p, q), (q, c)]))
Or without slicing:
print(*(sum(m[i][j]
            for i in range(*irange)
            for j in range(*jrange))
        for irange in [(0, x), (x, y), (y, r)]
        for jrange in [(0, p), (p, q), (q, c)]))
Or go through the matrix and update the right one of the nine sums:
sums = [0] * 9
for i in range(r):
    for j in range(c):
        sums[((i>=x)+(i>=y)) * 3 + (j>=p)+(j>=q)] += m[i][j]
print(*sums)
Again a variation:
sums = [0] * 9
for i in range(r):
    for j in range(c):
        sums[(0 if i<x else 3 if i<y else 6) +
             (0 if j<p else 1 if j<q else 2)] += m[i][j]
print(*sums)
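If NumPy is available, the nine block sums can also be sketched with np.add.reduceat, which sums contiguous segments starting at the given indices. This is a swapped-in technique, not one of the solutions above; the function name submatrix_sums is hypothetical, and the sketch assumes 0 < x < y < r and 0 < p < q < c.

```python
import numpy as np

def submatrix_sums(matrix, x, y, p, q):
    """Sums of the nine blocks cut after rows x, y and columns p, q (hypothetical helper)."""
    m = np.asarray(matrix)
    # Collapse the rows into three bands: [0, x), [x, y), [y, end).
    row_bands = np.add.reduceat(m, [0, x, y], axis=0)
    # Collapse the columns of each band into three blocks: [0, p), [p, q), [q, end).
    blocks = np.add.reduceat(row_bands, [0, p, q], axis=1)
    return blocks.ravel().tolist()

sample = [
    [6, 9, 2, 9, 2],
    [7, 1, 9, 3, 2],
    [9, 9, 1, 2, 6],
    [6, 5, 7, 1, 9],
    [6, 6, 6, 2, 3],
    [1, 6, 7, 9, 7],
]
result = submatrix_sums(sample, 3, 5, 2, 4)
print(*result)  # 41 26 10 23 16 12 7 16 7
```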

How do you use the range function to count from 0 to 8 by increments of 1 for each iteration in a for loop

I have a triple for loop that prints a pair of numbers N K, starting at 0 0 and going up to 2 2. The third for loop counts from 0 to 8. The code looks as follows:
for N in range(0,3):
    for K in range(0,3):
        print(N,K)
        for P in range(0,9):
            print(P)
If you run this code you get the obvious output:
0 0
0
1
2
3
4
5
6
7
8
0 1
0
1
2
3
4
5
6
7
8
0 2
0
1
2
3
4
5
6
7
8
...
And so on. Instead of printing 0 to 8 after each N K pair, I want output that looks like:
0 0
0
0 1
1
0 2
2
1 0
3
1 1
4
1 2
5
2 0
6
2 1
7
2 2
8
My first guess was an if statement that said:
if P == Q:
    break
where Q was several sets of sums and even the N,K array. However, I couldn't figure out the best way to get the output I want. I do think an if statement is the best way to achieve this, but I'm not quite sure how to approach it. P is necessary for the rest of my code, as it will be used in some subplots.
As P just increments by one at each print, you can compute the index directly as N * 3 + K:
for N in range(0, 3):
    for K in range(0, 3):
        print(N, K)
        print(N * 3 + K)
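The formula generalises to any grid width, and divmod recovers the pair from the flat index. A small sketch, where the names flat_index, grid_position, n_rows and n_cols are illustrative assumptions:

```python
n_rows, n_cols = 3, 3

def flat_index(row, col, n_cols):
    # Each completed row contributes n_cols positions before the current column.
    return row * n_cols + col

def grid_position(index, n_cols):
    # divmod returns (quotient, remainder), i.e. (row, col).
    return divmod(index, n_cols)

for N in range(n_rows):
    for K in range(n_cols):
        P = flat_index(N, K, n_cols)
        # The round trip recovers the original pair.
        assert grid_position(P, n_cols) == (N, K)
        print(N, K, P)
```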
You can use zip to traverse two iterables in parallel. In this case, one of the iterables is the result of a nested loop, which can be expressed with itertools.product, as follows:
import itertools
for (N, K), P in zip(itertools.product(range(3), range(3)), range(9)):
    print(N, K)
    print(P)
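When the counter only has to run alongside the pairs, enumerate avoids spelling out range(9) separately. A minimal sketch; the list name triples is an assumption used only to collect the results:

```python
import itertools

triples = []
# enumerate numbers the (N, K) pairs in the order product generates them.
for P, (N, K) in enumerate(itertools.product(range(3), range(3))):
    triples.append((N, K, P))
    print(N, K)
    print(P)
```

This keeps P in step with the pairs even if the ranges change later.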

Compare current column value to different column value by row slices

Assuming a dataframe like this
In [5]: data = pd.DataFrame([[9,4],[5,4],[1,3],[26,7]])
In [6]: data
Out[6]:
    0  1
0   9  4
1   5  4
2   1  3
3  26  7
I want to count how many values in a slice of 2 rows on column 0 are greater than or equal to the value in column 1.
For the first 4 in column 1, a slice of 2 on column 0 yields 5 and 1, so the output would be 1, since only 5 is greater than or equal to 4. For the second 4, the next slice on column 0 yields 1 and 26, so the output would again be 1, because only 26 is greater than 4. I can't use a rolling window, since iterating through rolling window values is not implemented.
I need something like a slice of the previous n rows and then I can iterate, compare and count how many times any of the values in that slice are above the current row.
I did this with plain lists instead of working on the data frame directly. See the code below:
# The column labels of the frame above are the integers 0 and 1.
list1, list2 = data[0].tolist(), data[1].tolist()
outList = []
for ix in range(len(list1)):
    # Only rows with two full rows ahead are counted; the rest get 0.
    if ix < len(list1) - 2:
        if list2[ix] < list1[ix + 1] and list2[ix] < list1[ix + 2]:
            outList.append(2)
        elif list2[ix] < list1[ix + 1] or list2[ix] < list1[ix + 2]:
            outList.append(1)
        else:
            outList.append(0)
    else:
        outList.append(0)
data['2_rows_forward_moving_tag'] = pd.Series(outList)
Output:
    0  1  2_rows_forward_moving_tag
0   9  4                          1
1   5  4                          1
2   1  3                          0
3  26  7                          0
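A vectorised alternative can be sketched with Series.shift, which lines up each row with the values a fixed number of rows ahead. The function name forward_count and its parameters are hypothetical. Unlike the list version above, this sketch uses >= as stated in the question and also counts partial windows near the end of the frame, so row 2 gets 1 rather than 0.

```python
import pandas as pd

data = pd.DataFrame([[9, 4], [5, 4], [1, 3], [26, 7]])

def forward_count(df, window=2, value_col=0, threshold_col=1):
    """Count, per row, how many of the next `window` values meet the row's threshold."""
    counts = pd.Series(0, index=df.index)
    for k in range(1, window + 1):
        # shift(-k) pairs row i with the value k rows ahead; rows past the end
        # become NaN, and NaN >= x is False, so they contribute nothing.
        counts += (df[value_col].shift(-k) >= df[threshold_col]).astype(int)
    return counts

print(forward_count(data).tolist())  # [1, 1, 1, 0]
```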

Identify clusters linked by delta to the left and different delta to the right

Consider the sorted array a:
a = np.array([0, 2, 3, 4, 5, 10, 11, 11, 14, 19, 20, 20])
If I specified left and right deltas,
delta_left, delta_right = 1, 1
Then this is how I'd expect the clusters to be assigned:
# point   interval [x - 1, x + 1]   cluster
#     0   [-1,  1]                  1
#     2   [ 1,  3]                  2
#     3   [ 2,  4]                  2
#     4   [ 3,  5]                  2
#     5   [ 4,  6]                  2
#    10   [ 9, 11]                  3
#    11   [10, 12]                  3
#    11   [10, 12]                  3
#    14   [13, 15]                  4
#    19   [18, 20]                  5
#    20   [19, 21]                  5
#    20   [19, 21]                  5
NOTE: Although the interval [-1, 1] shares an edge with [1, 3], neither interval contains the other's point, so their clusters are not joined.
Assuming the cluster assignments were stored in an array named clusters, I'd expect the results to look like this
print(clusters)
[1 2 2 2 2 3 3 3 4 5 5 5]
However, suppose I change the left and right deltas to be different:
delta_left, delta_right = 2, 1
This means that for a value of x it should be combined with any other point in the interval [x - 2, x + 1]
# point   interval [x - 2, x + 1]   cluster
#     0   [-2,  1]                  1
#     2   [ 0,  3]                  1
#     3   [ 1,  4]                  1
#     4   [ 2,  5]                  1
#     5   [ 3,  6]                  1
#    10   [ 8, 11]                  2
#    11   [ 9, 12]                  2
#    11   [ 9, 12]                  2
#    14   [12, 15]                  3
#    19   [17, 20]                  4
#    20   [18, 21]                  4
#    20   [18, 21]                  4
NOTE: Although the interval [9, 12] shares an edge with [12, 15], neither interval contains the other's point, so their clusters are not joined.
Assuming the cluster assignments were stored in an array named clusters, I'd expect the results to look like this:
print(clusters)
[1 1 1 1 1 2 2 2 3 4 4 4]
We will leverage np.searchsorted and logic to find cluster edges.
First, let's take a closer look at what np.searchsorted does:
Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
What I'll do is execute np.searchsorted on a with the values a - delta_left. Let's look at that for delta_left = 1:
# a
# [ 0  2  3  4  5 10 11 11 14 19 20 20]
#
# a - delta_left
# [-1  1  2  3  4  9 10 10 13 18 19 19]
-1 would get inserted at position 0 to maintain order
1 would get inserted at position 1 to maintain order
2 would get inserted at position 1 as well, indicating that 2 might be in the same cluster as 1
3 would get inserted at position 2 indicating that 3 might be in the same cluster as 2
so on and so forth
What we notice is that a new cluster can start only at an element whose value less delta_left would be inserted at that element's own position.
We do this again for the right side, with one difference. By default, when several elements are equal, np.searchsorted inserts in front of them. To identify the ends of clusters, I want to insert after the identical elements, so I'll use the parameter side='right':
If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of a).
Now the logic. A cluster can only begin if a prior cluster has ended, with the exception of the first cluster. We'll then consider a shifted version of the results of our second np.searchsorted
Let's now define our function
def delta_cluster(a, dleft, dright):
    # use to track whether searchsorted results are at correct positions
    rng = np.arange(len(a))
    edge_left = a.searchsorted(a - dleft)
    starts = edge_left == rng
    # we append 0 to shift
    edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])
    ends = edge_right == rng
    return (starts & ends).cumsum()
demonstration
with left, right deltas equal to 1 and 1
print(delta_cluster(a, 1, 1))
[1 2 2 2 2 3 3 3 4 5 5 5]
with left, right deltas equal to 2 and 1
print(delta_cluster(a, 2, 1))
[1 1 1 1 1 2 2 2 3 4 4 4]
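To see how the pieces fit together, the intermediate arrays can be printed for the deltas 1 and 1. This sketch only unrolls the steps of the function above on the example array and adds nothing new to the method.

```python
import numpy as np

a = np.array([0, 2, 3, 4, 5, 10, 11, 11, 14, 19, 20, 20])
dleft, dright = 1, 1
rng = np.arange(len(a))

# Where would each a[i] - dleft be inserted? Equal to i means nothing to the left is in reach.
edge_left = a.searchsorted(a - dleft)
# Where does the reach of the previous element end? Shifted right by one position.
edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])

starts = edge_left == rng
ends = edge_right == rng
clusters = (starts & ends).cumsum()

print(edge_left)   # [0 1 1 2 3 5 5 5 8 9 9 9]
print(edge_right)  # [ 0  1  3  4  5  5  8  8  8  9 12 12]
print(clusters)    # [1 2 2 2 2 3 3 3 4 5 5 5]
```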
Extra Credit
What if a isn't sorted?
I'll utilize information learned from this post
def delta_cluster(a, dleft, dright):
    s = a.argsort()
    size = s.size
    if size > 1000:
        y = np.empty(s.size, dtype=np.int64)
        y[s] = np.arange(s.size)
    else:
        y = s.argsort()
    a = a[s]
    rng = np.arange(len(a))
    edge_left = a.searchsorted(a - dleft)
    starts = edge_left == rng
    edge_right = np.append(0, a.searchsorted(a + dright, side='right')[:-1])
    ends = edge_right == rng
    return (starts & ends).cumsum()[y]
demonstration
b = np.random.permutation(a)
print(b)
[14 10 3 11 20 0 19 20 4 11 5 2]
print(delta_cluster(a, 2, 1))
[1 1 1 1 1 2 2 2 3 4 4 4]
print(delta_cluster(b, 2, 1))
[3 2 1 2 4 1 4 4 1 2 1 1]
print(delta_cluster(b, 2, 1)[b.argsort()])
[1 1 1 1 1 2 2 2 3 4 4 4]
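As a cross-check on sorted input, the results can be compared against a simpler rule. This sketch assumes the linking rule is read as "two neighbouring points join when either one lies in the other's interval", which for sorted data means a new cluster starts wherever the gap to the previous point exceeds both deltas. The function name delta_cluster_by_gaps is hypothetical, and the sketch assumes a non-empty, sorted array.

```python
import numpy as np

def delta_cluster_by_gaps(a, dleft, dright):
    """Cluster labels for a sorted, non-empty array, derived from neighbour gaps."""
    a = np.asarray(a)
    # A break occurs where the gap is larger than both the left and the right delta.
    breaks = np.diff(a) > max(dleft, dright)
    # The first point opens cluster 1; every break opens the next cluster.
    return np.concatenate(([1], 1 + breaks.cumsum()))

a = np.array([0, 2, 3, 4, 5, 10, 11, 11, 14, 19, 20, 20])
print(delta_cluster_by_gaps(a, 1, 1))  # [1 2 2 2 2 3 3 3 4 5 5 5]
print(delta_cluster_by_gaps(a, 2, 1))  # [1 1 1 1 1 2 2 2 3 4 4 4]
```

On the example array this reproduces both expected label arrays from the question.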
