Fast way to round array values based on condition - python

I have an array like this:
a = np.array([
[0.02, 1.01, 4.01, 3.00, 5.12],
[2.11, 1.50, 3.98, 0.52, 5.01]])
and a "condition" array:
c = np.array([0, 1, 4, 5])
I want to round a[i][j]=c[k] if c[k] - const < a[i][j] < c[k] + const, otherwise a[i][j] = 0
For example, if const = 0.05. The result could be:
a_result = [[0 1 4 0 0]
[0 0 4 0 5]]
The naive way is to use three nested for loops to check each a[i][j] against each c[k]. However, this is very slow when a is big. Is there a fast, idiomatic Python way to do this?
For loop (slow) solution:
a_result = np.full(a.shape, 0)
const = 0.05
mh, mw = a.shape
# Visit every row, every column and every condition value
for i in range(mh):
    for j in range(mw):
        for k in range(len(c)):
            if a[i][j] > (c[k] - const) and a[i][j] < (c[k] + const):
                a_result[i][j] = c[k]

Approach #1
One vectorized approach would be with broadcasting -
c[(np.abs(a - c[:,None,None]) < const).argmax(0)]
Sample run -
In [312]: a
Out[312]:
array([[ 0.02, 1.01, 4.01, 3. , 5.12],
[ 2.11, 1.5 , 3.98, 0.52, 5.01]])
In [313]: c
Out[313]: array([0, 1, 4, 5])
In [314]: c[(np.abs(a - c[:,None,None]) < const).argmax(0)]
Out[314]:
array([[0, 1, 4, 0, 0],
[0, 0, 4, 0, 5]])
Approach #2
Another one that would be closer to what we had in the question, but vectorized, like so -
mask = ((c[:,None,None] - const) < a) & (a < (c[:,None,None] + const))
out = c[mask.argmax(0)]
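Note that both broadcasting approaches lean on argmax returning 0 where no condition value matches, which then selects c[0]; that happens to be 0 in the sample. A minimal sketch for a c that does not start with 0, assuming a is 2-D and that non-matching cells should still become 0 (the function name round_to_conditions is made up for illustration):

```python
import numpy as np

def round_to_conditions(a, c, const):
    # mask[k, i, j] is True where a[i, j] lies strictly within const of c[k]
    mask = np.abs(a - c[:, None, None]) < const
    # argmax picks the first matching k; cells with no match are zeroed explicitly
    return np.where(mask.any(axis=0), c[mask.argmax(axis=0)], 0)

a = np.array([[0.02, 1.01, 4.01, 3.00, 5.12],
              [2.11, 1.50, 3.98, 0.52, 5.01]])
c = np.array([1, 4, 5])  # no leading 0 this time
result = round_to_conditions(a, c, 0.05)
print(result)
```

Like the approaches above, this builds a temporary array of shape (len(c),) + a.shape, so it trades memory for speed.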
Approach #3
Here's another with memory efficiency in mind, based on this post -
idx = np.searchsorted(c, a, side="left").clip(max=c.size-1)
mask = (idx > 0) & \
    ( (idx == len(c)) | (np.fabs(a - c[idx-1]) < np.fabs(a - c[idx])) )
idx0 = idx-mask
out = c[idx0]
out[np.abs(c[idx0] - a) >= const] = 0


Unable to access float object in a 2D array in Python

I need to return the vector solution x of Ux = b for an upper triangular matrix U and vector b using back substitution, but I'm unable to actually access an element of the matrix U.
def BackSub(U, b):
    n = len(U)
    x = [0 for i in range(n)]
    for i in range(n - 1, -1, -1):
        s = 0
        for j in range(n - 1, i, -1):
            s += (U[i][j])*b[j]
        b[i] = (b[i] - s)/(U[i][i])
    return b
b = [5, 6, 3, 2]
U = [[ 1, 2, 1, 0],
[ 0, 3, -5, 9],
[ 0, 0, 0.5, 1],
[ 0, 0, 0, 7]]
N = GaussElim(U, b)
x = BackSub(N, b)
It returns
TypeError: 'float' object is not subscriptable for U[i][i]
The GaussElim function is this
import numpy as np
def GaussElim(A, b):
    n = len(b) #n is matrix size
    #Elimination phase
    for k in range(0 , n - 1): #k is the pivot row
        for i in range(k + 1, n): #i is the row being eliminated
            if A[i][k] != 0:
                factor = A[i][k]/ A[k][k]
                A[i][k + 1 : n] = A[i][k + 1 : n] - np.multiply(factor, A[k][k + 1 : n])
                b[i] = b[i] - np.multiply(factor, b[k])
    #Back substitution
    for k in range(n - 1, -1, -1):
        b[k] = (b[k] - np.dot(A[k][k + 1 : n], b[k + 1 : n]))/A[k][k]
    return b
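For reference, a minimal sketch of back substitution on its own, assuming U is a square upper triangular matrix with a non-zero diagonal. Note that GaussElim as posted returns the vector b rather than a matrix, so N in the call above appears to be one-dimensional, which would explain an error about subscripting a scalar. The name back_substitute is chosen here for illustration:

```python
import numpy as np

def back_substitute(U, b):
    # Convert to float arrays; the inputs are only read, never modified
    U = np.asarray(U, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    x = np.zeros(n)
    # Solve from the last row upwards
    for i in range(n - 1, -1, -1):
        # Subtract the contribution of the already-solved unknowns
        s = U[i, i + 1:] @ x[i + 1:]
        x[i] = (b[i] - s) / U[i, i]
    return x

U = [[1, 2, 1, 0],
     [0, 3, -5, 9],
     [0, 0, 0.5, 1],
     [0, 0, 0, 7]]
b = [5, 6, 3, 2]
x = back_substitute(U, b)
print(x)
```

Multiplying U by the returned x should reproduce b up to rounding, which is a quick way to check the result.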

How to iterate through a NumPy array while filtering cells values by index and performing math on it

I am trying to filter, loop and do arithmetic within the same iteration, but can't seem to find the right approach. I have a NumPy array of size 6, 2 that holds two sets of values I want to subtract from each other, but I want the values filtered before the subtraction takes place.
So if a value is greater than the one in the other column, the lower value should be subtracted from the higher value, and vice versa. This also needs to happen in a loop that iterates through the array while performing the filtering and the arithmetic.
This is my code example:
#minus price
print('minus price trying appending')
minus_p_orgp1 = np.append(dif_p_times1, fp, axis=0)
print(minus_p_orgp1)
for ii, vv in enumerate(minus_p_orgp1):
    print('greater')
    greater_1 = np.all(ii > 0, axis=0)
    greater_0 = np.all(ii <= 0, axis=0)
    if greater_1 < greater_0:
        iit = greater_0 - greater_1
    if greater_1 > greater_0:
        iit = greater_1 - greater_0
print(iit, ii, vv)
ssss = np.zeros(minus_p_orgp1.size - 1)
for i in range(len(minus_p_orgp1) - 1):
    if minus_p_orgp1[i] < minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i + 1] - minus_p_orgp1[i]
    elif minus_p_orgp1[i + 1] > minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i] - minus_p_orgp1[i + 1]
print(ssss)
This is a print of the array, where the upper vector is dif_p_times1 and the lower vector is fp:
minus price trying appending
[[79340.33057205 78379.24102508 72188.80527274 76557.26239563
72857.90423589 71137.7943199 ]
[43528.22 43705. 43931.07 44571.24
44330.43 44465.64 ]]
What can I do to achieve my goal?
I have also tried to do the process with just having the array being two separate vectors with size 6, 1. But that also seems very difficult, let me know what you think.
I have also just tried this; however it just prints out zeros when running the code:
trii = np.array([[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]])
print(trii)
print(minus_p_orgp1[~(trii >= 1)])
print('it works')
itt = minus_p_orgp1[~(trii >= 1)]
itt1 = minus_p_orgp1[~(trii >= 0)]
sssss = np.zeros(dif_p_times1.size - 1)
ssss = np.zeros(minus_p_orgp1.size - 1)
for i in range(len(dif_p_times1) - 1):
    for ii in range(len(fp) - 1):
        if itt < itt1:
            sssss[i] = itt[i] + itt1[i + 1]
            ssss[i, ii] = fp[ii + 1] - dif_p_times1[i]
        elif itt > itt1:
            sssss[i] = itt[i + 1] + itt1[i]
            ssss[i, ii] = dif_p_times1[i] - fp[ii + 1]
print(sssss)
[[0 0 0 0 0 0]
[1 1 1 1 1 1]]
[63455.70703442 68744.47486851 77804.44752373 79686.34612013
69322.78250338 83255.08459329]
it does something
[0. 0. 0. 0. 0.]
Here is a new attempt, however it still doesn't work:
ssss = np.zeros(minus_p_orgp1.size - 1)
x = minus_p_orgp1[::2]
y = minus_p_orgp1[::-2]
z = ssss[::2]
for z, x, y in range(len(minus_p_orgp1) - 1):
    if x[i + 1] < y[i]:
        z[i] = y[i + 1] - x[i]
    elif x[i + 1] > y[i]:
        z[i] = x[i + 1] - y[i]
print(z)
Is there a way to iterate through multidimensional arrays that hold real data (not created with the reshape/arange functions) while still being able to apply boolean filtering to individual cells and then perform a math operation on them?
I have looked at nditer and numba, but every example seems to transpose the dimensions of a 2-D array and reduce the values, which with my real values returned only a single value and not an array.
Looking at your first block
minus_p_orgp1 = np.append(dif_p_times1, fp, axis=0)
print(minus_p_orgp1)
looks like minus_p_orgp1 is a (2,6) array, just a row join of the 2 arrays.
for ii, vv in enumerate(minus_p_orgp1):
    print('greater')
    greater_1 = np.all(ii > 0, axis=0)
    greater_0 = np.all(ii <= 0, axis=0)
    if greater_1 < greater_0:
        iit = greater_0 - greater_1
    if greater_1 > greater_0:
        iit = greater_1 - greater_0
print(iit, ii, vv)
vv is, iteratively, the 2 rows of minus_p_orgp1. ii is, iteratively, 0 and 1. So this np.all(ii>0...) business is just testing whether ii is 0 or 1. It sets iit in each loop, but doesn't save the value anywhere. At the end of the loop it has the last value, but so what?
In the following, size is 12 (2*6), so ssss is np.zeros(11).
But the iteration is over 2-1, i.e. it just evaluates for i=0:
ssss = np.zeros(minus_p_orgp1.size - 1)
for i in range(len(minus_p_orgp1) - 1):
    if minus_p_orgp1[i] < minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i + 1] - minus_p_orgp1[i]
    elif minus_p_orgp1[i + 1] > minus_p_orgp1[i]:
        ssss[i] = minus_p_orgp1[i] - minus_p_orgp1[i + 1]
print(ssss)
minus_p_orgp1[i] < minus_p_orgp1[i] is an array of 6 False. I expect that to raise an ambiguity error when used in the if clause.
minus_p_orgp1[i + 1] > minus_p_orgp1[i] makes more sense, but still can't be used in elif. Are you taking the difference across columns or rows?
The initial word description is a bit vague, but it sure sounds like you just want the positive difference between two arrays:
In [68]: x = np.array([1,3,2,4]); y = np.array([2,1,3,3])
In [69]: x-y
Out[69]: array([-1, 2, -1, 1])
In [70]: y-x
Out[70]: array([ 1, -2, 1, -1])
In [71]: np.abs(x-y)
Out[71]: array([1, 2, 1, 1])
an iterative equivalent (which would work with lists just as well):
In [72]: z = np.zeros(4, int)
In [73]: for i in range(4):
    ...:     if x[i]>y[i]:
    ...:         z[i] = x[i]-y[i]
    ...:     else:
    ...:         z[i] = y[i]-x[i]
    ...:
In [74]: z
Out[74]: array([1, 2, 1, 1])
or simply:
In [75]: [abs(i-j) for i,j in zip(x,y)]
Out[75]: [1, 2, 1, 1]
As per hpaulj's answer, this worked:
[abs(i-j) for i,j in zip(x,y)]
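Applied to the stacked array from the question, the same idea might look like the sketch below. It assumes minus_p_orgp1 has shape (2, 6) as in the printed output, with the values rounded here for brevity:

```python
import numpy as np

# Rows correspond to dif_p_times1 and fp from the question (values rounded)
minus_p_orgp1 = np.array([
    [79340.33, 78379.24, 72188.81, 76557.26, 72857.90, 71137.79],
    [43528.22, 43705.00, 43931.07, 44571.24, 44330.43, 44465.64],
])

# Larger minus smaller, element by element, regardless of which row is larger
diff = np.abs(minus_p_orgp1[0] - minus_p_orgp1[1])
print(diff)
```

No explicit loop or if/elif is needed, because np.abs already covers both orderings of the two values.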

Count number of clusters of non-zero values in Python?

My data looks something like this:
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
Essentially, there's a bunch of zeroes before non-zero numbers and I am looking to count the number of groups of non-zero numbers separated by zeros. In the example data above, there are 3 groups of non-zero data so the code should return 3.
Number of zeros between groups of non-zeros is variable
Any good ways to do this in python? (Also using Pandas and Numpy to help parse the data)
With a as the input array, we could have a vectorized solution -
m = a!=0
out = (m[1:] > m[:-1]).sum() + m[0]
Alternatively for performance, we might use np.count_nonzero which is very efficient to count bools as is the case here, like so -
out = np.count_nonzero(m[1:] > m[:-1]) + m[0]
Basically, we get a mask of non-zeros and count rising edges. To account for the first element that could be non-zero too and would not have any rising edge, we need to check it and add to the total sum.
Also, please note that if input a is a list, we need to use m = np.asarray(a)!=0 instead.
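The rising-edge idea can be wrapped into a small helper. This is a sketch only; the name count_nonzero_groups and the empty-input guard are additions for illustration, not part of the answer above:

```python
import numpy as np

def count_nonzero_groups(a):
    # Accept lists as well as arrays
    m = np.asarray(a) != 0
    if m.size == 0:
        return 0
    # A rising edge is a False immediately followed by a True
    rising = np.count_nonzero(m[1:] > m[:-1])
    # A group that starts at index 0 has no rising edge, so count it separately
    return int(rising + m[0])

a = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
print(count_nonzero_groups(a))
```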
Sample runs for three cases -
In [92]: a # Case1 :Given sample
Out[92]:
array([ 0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0,
0, 6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7,
1])
In [93]: m = a!=0
In [94]: (m[1:] > m[:-1]).sum() + m[0]
Out[94]: 3
In [95]: a[0] = 7 # Case2 :Add a non-zero elem/group at the start
In [96]: m = a!=0
In [97]: (m[1:] > m[:-1]).sum() + m[0]
Out[97]: 4
In [99]: a[-2:] = [0,4] # Case3 :Add a non-zero group at the end
In [100]: m = a!=0
In [101]: (m[1:] > m[:-1]).sum() + m[0]
Out[101]: 5
You may achieve it via using itertools.groupby() with list comprehension expression as:
>>> from itertools import groupby
>>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
3
A simple Python solution: just count changes from 0 to non-zero by keeping track of the previous value (rising edge detection):
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
previous = 0
count = 0
for c in a:
    if previous==0 and c!=0:
        count+=1
    previous = c
print(count) # 3
pad array with a zero on both sides with np.concatenate
find where zero with a == 0
find boundaries with np.diff
sum up boundaries found with sum
divide by two because we will have found twice as many as we want
def nonzero_clusters(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
demonstration
nonzero_clusters(
[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
)
3
nonzero_clusters([0, 1, 2, 0, 1, 2])
2
nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
2
nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
3
timing
a = np.random.choice((0, 1), 100000)
code
from itertools import groupby
def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]
def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    return count
def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
def user(a):
    return sum([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
sum ([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])

Numpy sum running length of non-zero values

Looking for a fast vectorized function that returns the rolling number of consecutive non-zero values. The count should start over at 0 whenever encountering a zero. The result should have the same shape as the input array.
Given an array like this:
x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])
The function should return this:
array([1, 2, 3, 0, 0, 1, 0, 1, 2])
This post lists a vectorized approach which basically consists of two steps:
Initialize a zeros vector of the same size as input vector, x and set ones at places corresponding to non-zeros of x.
Next up, in that vector, we need to put minus of runlengths of each island right after the ending/stop positions for each "island". The intention is to use cumsum again later on, which would result in sequential numbers for the "islands" and zeros elsewhere.
Here's the implementation -
import numpy as np
#Append zeros at the start and end of input array, x
xa = np.hstack([[0],x,[0]])
# Get an array of ones and zeros, with ones for nonzeros of x and zeros elsewhere
xa1 =(xa!=0)+0
# Find consecutive differences on xa1
xadf = np.diff(xa1)
# Find start and stop+1 indices and thus the lengths of "islands" of non-zeros
starts = np.where(xadf==1)[0]
stops_p1 = np.where(xadf==-1)[0]
lens = stops_p1 - starts
# Mark indices where "minus ones" are to be put for applying cumsum
put_m1 = stops_p1[stops_p1 < x.size]
# Setup vector with ones for nonzero x's, "minus lens" at stops +1 & zeros elsewhere
vec = xa1[1:-1] # Note: this will change xa1, but it's okay as not needed anymore
vec[put_m1] = -lens[0:put_m1.size]
# Perform cumsum to get the desired output
out = vec.cumsum()
Sample run -
In [116]: x
Out[116]: array([ 0. , 2.3, 1.2, 4.1, 0. , 0. , 5.3, 0. , 1.2, 3.1, 0. ])
In [117]: out
Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)
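The runtime tests call a function named sumrunlen_vectorized whose body is not shown. One way the steps above might be wrapped into such a function is sketched here; this is an assumption about its shape, and it copies the working vector instead of modifying xa1 in place:

```python
import numpy as np

def sumrunlen_vectorized(x):
    x = np.asarray(x)
    # 1 where x is non-zero, 0 elsewhere, padded with a zero on each side
    xa1 = np.concatenate(([0], (x != 0).astype(int), [0]))
    xadf = np.diff(xa1)
    starts = np.where(xadf == 1)[0]      # first index of each island
    stops_p1 = np.where(xadf == -1)[0]   # one past the last index of each island
    lens = stops_p1 - starts
    vec = xa1[1:-1].copy()
    # An island that runs to the end of x has no position after it to reset
    keep = stops_p1 < x.size
    vec[stops_p1[keep]] = -lens[keep]
    return vec.cumsum()

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
out = sumrunlen_vectorized(x)
print(out)
```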
Runtime tests -
Here are some runtime tests comparing the proposed approach against the other itertools.groupby based approach -
In [21]: N = 1000000
...: x = np.random.rand(1,N)
...: x[x>0.5] = 0.0
...: x = x.ravel()
...:
In [19]: %timeit sumrunlen_vectorized(x)
10 loops, best of 3: 19.9 ms per loop
In [20]: %timeit sumrunlen_loopy(x)
1 loops, best of 3: 2.86 s per loop
You can use itertools.groupby and np.hstack :
>>> import numpy as np
>>> x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])
>>> from itertools import groupby
>>> np.hstack([[i if j!=0 else j for i,j in enumerate(g,1)] for _,g in groupby(x,key=lambda x: x!=0)])
array([ 1., 2., 3., 0., 0., 1., 0., 1., 2.])
This groups the array elements by whether they are non-zero, then uses a list comprehension with enumerate to replace each non-zero group with its running 1-based index, and finally flattens the result with np.hstack.
This sub-problem came up in Kick Start 2021 Round A for me. My solution:
def current_run_len(a):
    a_ = np.hstack([0, a != 0, 0]) # first in starts and last in stops defined
    d = np.diff(a_)
    starts = np.where(d == 1)[0]
    stops = np.where(d == -1)[0]
    a_[stops + 1] = -(stops - starts) # +1 for behind-last
    return a_[1:-1].cumsum()
In fact, the problem also required a version that counts down within each consecutive sequence. So here is another version with an optional keyword argument; with rev=False it behaves the same as the function above:
def current_run_len(a, rev=False):
    a_ = np.hstack([0, a != 0, 0]) # first in starts and last in stops defined
    d = np.diff(a_)
    starts = np.where(d == 1)[0]
    stops = np.where(d == -1)[0]
    if rev:
        a_[starts] = -(stops - starts)
        cs = -a_.cumsum()[:-2]
    else:
        a_[stops + 1] = -(stops - starts) # +1 for behind-last
        cs = a_.cumsum()[1:-1]
    return cs
Results:
a = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1])
print('a = ', a)
print('current_run_len(a) = ', current_run_len(a))
print('current_run_len(a, rev=True) = ', current_run_len(a, rev=True))
a = [1 1 1 1 0 0 0 1 1 0 1 0 0 0 1]
current_run_len(a) = [1 2 3 4 0 0 0 1 2 0 1 0 0 0 1]
current_run_len(a, rev=True) = [4 3 2 1 0 0 0 2 1 0 1 0 0 0 1]
For an array that consists of 0s and 1s only, you can simplify [0, a != 0, 0] to [0, a, 0]. But the version as-posted also works for arbitrary non-zero numbers.

How to iterate Numpy array and perform calculation only if element matches a criteria?

I want to iterate over a NumPy array and process only the elements that match a specific criterion. In the code below, I want to perform the calculation only if the element is greater than 1.
a = np.array([[1,3,5],
[2,4,3],
[1,2,0]])
for i in range(0, a.shape[0]):
    for j in range(0, a.shape[1]):
        if a[i,j] > 1:
            a[i,j] = (a[i,j] - 3) * 5
Is it possible to use a single line of code instead of the double loop above, and perhaps make it faster?
Method #1: use a boolean array to index:
>>> a = np.array([[1,3,5], [2,4,3], [1,2,0]])
>>> a[a > 1] = (a[a > 1] - 3) * 5
>>> a
array([[ 1, 0, 10],
[-5, 5, 0],
[ 1, -5, 0]])
This computes a > 1 twice, although you could assign it to a variable instead. (In practice it's very unlikely to be a bottleneck, of course, although if a is large enough memory can be an issue.)
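A short sketch of the variant mentioned above, with the condition stored in a variable (called mask here) so that it is evaluated only once:

```python
import numpy as np

a = np.array([[1, 3, 5], [2, 4, 3], [1, 2, 0]])
mask = a > 1                    # evaluate the condition once
a[mask] = (a[mask] - 3) * 5     # only the selected elements are recomputed
print(a)
```

The mask is a boolean array with the same shape as a, so it costs one byte per element in addition to a itself.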
Method #2: use np.where:
>>> a = np.array([[1,3,5], [2,4,3], [1,2,0]])
>>> np.where(a > 1, (a-3)*5, a)
array([[ 1, 0, 10],
[-5, 5, 0],
[ 1, -5, 0]])
This only computes a > 1 once, but OTOH computes (ax-3)*5 for every element ax in a, instead of only doing it for those elements that really need it.
for index, x in np.ndenumerate(a):
    if x > 1:
        a[index] = (a[index] - 3) * 5
