I have a (8864,40) array A, containing both negative and positive values. I want to divide the positive values of the array by the maximum value of A, and divide the negative values by the minimum of A. Is it possible to do this whilst also keeping the shape of the array A? Any help would be appreciated.
Please see the snippet below:
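import numpy as np
A[A > 0] /= np.max(A)  # A must have a float dtype for in-place division
A[A < 0] /= np.min(A)  # negatives divided by the (negative) minimum become positive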
Something like this?
np.where(A > 0, A/A.max(), A/A.min())  # returns a new array with the same shape as A
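A minimal sketch comparing the two approaches on a small stand-in array (the 3x4 values below are made up, not the asker's (8864,40) data). Both keep the shape of A; the in-place version mutates the array and needs a float dtype (in-place true division on an integer array raises a casting error), while np.where returns a new array. Dividing a negative value by the negative minimum gives a positive result, so the last variant divides by abs(A.min()) for the case where the sign should be kept.

```python
import numpy as np

# Small stand-in for the (8864, 40) array; the values are illustrative only.
A = np.array([[-4.0, 2.0, 0.0, 8.0],
              [-1.0, -2.0, 4.0, 1.0],
              [6.0, -3.0, 0.0, -0.5]])

# Version 1: np.where builds a new array with the same shape as A.
B = np.where(A > 0, A / A.max(), A / A.min())

# Version 2: in-place boolean-mask division on a float copy.
C = A.copy()
C[C > 0] /= C.max()   # positives scaled by the maximum
C[C < 0] /= C.min()   # negatives scaled by the minimum (they become positive)

# Sign-preserving variant: divide negatives by |min| so they stay negative.
D = np.where(A > 0, A / A.max(), A / abs(A.min()))

print(B.shape, C.shape, D.shape)  # all (3, 4)
```

np.where evaluates both branches for every element, which is harmless here but means a zero maximum or minimum would trigger division warnings.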
If it is a list, you can use a list comprehension such as:
x = [-2, 1, 3, 0, -4, -1, 0, 5, 2]
y = [i / max(x) if i > 0 else i / abs(min(x)) for i in x]
print(x)
print(y)
that produces
[-2, 1, 3, 0, -4, -1, 0, 5, 2]
[-0.5, 0.2, 0.6, 0.0, -1.0, -0.25, 0.0, 1.0, 0.4]
where the sign of each number (- or +) is preserved. Without abs(), the negative values would be divided by a negative minimum, so you would get only non-negative values.
I do not quite understand the phrase
Is it possible to do this whilst also keeping the shape of the array A?
By any chance, does "shape" mean the sign?
Related
I understand the output of torch.where() as described in the documentation.
However, I do not understand the output it produces when x and y are not given, as shown below (the dimensionality of this output keeps varying even though the shape of x remains the same). Can someone help me understand?
import torch
y = torch.ones(3, 2)
x = torch.randn(3, 2)
print(x)
----------------------------
tensor([[-0.0022, 0.4871],
[ 0.0788, 0.2937],
[ 0.1909, -2.1636]])
----------------------------
print(torch.where(x > 0, x, y))
----------------------------
tensor([[1.0000, 0.4871],
[0.0788, 0.2937],
[0.1909, 1.0000]])
----------------------------
print(torch.where(x > 0))
(tensor([0, 1, 1, 2]), tensor([1, 0, 1, 0]))
This form of torch.where returns the indices of the elements that satisfy the condition.
print(f"Y={torch.where(x > 0)[0].numpy()}")
print(f"X={torch.where(x > 0)[1].numpy()}")
--------------------------------
Y=[0 1 1 2]
X=[1 0 1 0]
Here you can see the coordinates of the positive entries in the matrix more clearly.
After trying out a few examples with tensors of different shapes (including 3- and 4-dimensional tensors), I understood that when only the boolean condition is passed to torch.where() (without the x and y parameters), it outputs a separate tensor for each dimension, holding the corresponding indices of the elements that meet the condition.
So, in the above output ---- (tensor([0, 1, 1, 2]), tensor([1, 0, 1, 0])) ---- tensor([0, 1, 1, 2]) represents the indices of 1st dimension for all the elements that satisfy the condition x > 0, and the tensor([1, 0, 1, 0]) represents the indices of 2nd dimension of the same elements.
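The same behaviour can be checked with NumPy, whose np.where(condition) is a shorthand for np.nonzero(condition); the PyTorch documentation likewise states that torch.where(condition) is identical to torch.nonzero(condition, as_tuple=True). The sketch below uses NumPy rather than PyTorch and a fixed 3-D array (values chosen arbitrarily) to show that the number of returned index arrays follows x.ndim, while their length follows the number of matching elements.

```python
import numpy as np

# Fixed 3-D example (shape 2x2x3); the values are arbitrary.
x = np.array([[[ 0.5, -1.0,  2.0],
               [-0.3,  0.0,  1.5]],
              [[-2.0,  0.7, -0.1],
               [ 3.0, -4.0,  0.2]]])

idx = np.where(x > 0)          # tuple with one index array per dimension
print(len(idx))                # 3, because x.ndim == 3
print(idx)                     # each array has one entry per positive element

# Zipping the arrays gives the full coordinates of every positive element.
coords = list(zip(*idx))
print(coords)

# The tuple can be used directly as an index to pull out the matching values.
print(x[idx])
```

This is also why the output "changes dimensionality" between runs on random data: the count of positive entries changes, so the length of each index array changes.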
I have the following array
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
and would like to apply two thresholds, such that all values below -1.0 are set to 1 and all values above -0.3 are set to 0. For the values in between, the following rule should apply: if the last value was below -1.0, then it should be a 1, but if the last value was above -0.3, then it should be a 0.
For the example array above, the output should be
target = np.array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])
If multiple consecutive values are between -1.0 and -0.3, then it should go back as far as required until there is a value above or below the two thresholds and set the output accordingly.
I tried to achieve this by iterating over the array and using a while loop inside the for loop to find the next occurrence where the value is above the threshold, but it doesn't work:
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
p = []
def function(array, p):
    for i in np.nditer(array):
        if i < -1:
            while i <= -0.3:
                p.append(1)
                i += 1
        else:
            p.append(0)
            i += 1
    return p
a = function(array, p)
print(a)
How can I apply the two thresholds to my array as described above?
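For reference, the rule described in the question can be written as a plain loop that carries the last decided state forward. This is only a slow, straightforward sketch (the function name hysteresis_loop is made up), but it is handy for checking a vectorized solution. It treats values <= -1.0 as "low" and values >= -0.3 as "high", matching the answer below, and assumes that in-between values before any decided value map to 0.

```python
import numpy as np

def hysteresis_loop(values, low=-1.0, high=-0.3):
    """Return 1/0 per element, remembering the last value outside the band."""
    out = np.zeros(len(values), dtype=np.int8)
    state = 0  # no decided value yet: in-between values default to 0
    for k, v in enumerate(values):
        if v <= low:
            state = 1
        elif v >= high:
            state = 0
        # values strictly between low and high keep the previous state
        out[k] = state
    return out

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
print(hysteresis_loop(array))  # [0 1 1 1 0 0 0 1 1 0 0 0 0]
```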
What you are trying to achieve is called "thresholding with hysteresis". For this, I adapted the very nice algorithm from this answer:
Given your test data,
import numpy as np
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
you detect which values are at or below the first threshold -1.0, and which are at or above the second threshold -0.3:
low_values = array <= -1.0
high_values = array >= -0.3
These are the values for which you know the result: either 1 or 0. For all other values, the result depends on the nearest known value to their left. Thus, all values for which either low_values or high_values is True are known.
You can get the indices of all known elements with:
known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]
To find the result for all unknown values, we use the np.cumsum function on the known_values array. The Booleans are interpreted as 0 or 1, so this gives us the following array:
acc = np.cumsum(known_values)
which will result in the following for your example:
[ 0 1 2 2 3 4 5 6 7 8 9 10 11].
Now, known_idx[acc - 1] will contain the index of the last known value for each point. With low_values[known_idx[acc - 1]] you get a True if the last known value was below -1.0 and a False if it was above -0.3:
result = low_values[known_idx[acc - 1]]
There is one problem left: If the initial value is below -1.0 or above -0.3, then everything works out perfectly fine. But if it is in-between, then it would depend on its left neighbor - which it doesn't have. So in your case, you simply define it to be zero.
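We can do that by checking where acc equals 0. A zero in acc means that no known value has occurred yet, i.e. every value up to that point is between -1.0 and -0.3, so those positions have to be set to zero. (Checking only acc[0] is not enough when several leading values are in between, because known_idx[acc - 1] wraps around to the last known value for all of them.)
result[acc == 0] = False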
Finally, as we were doing lots of comparisons, our result array is a boolean array. To convert it to integer 0 and 1, we simply call
result = np.int8(result)
and we get our desired result:
array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0], dtype=int8)
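Putting the steps together, a minimal sketch of the whole procedure as one function (the name hysteresis_threshold and its default thresholds are assumptions taken from this question). It zeroes every position that precedes the first known value, not just index 0, and returns all zeros when no value lies outside the band, since known_idx would otherwise be empty and indexing it would fail.

```python
import numpy as np

def hysteresis_threshold(array, low=-1.0, high=-0.3):
    low_values = array <= low            # known to map to 1
    high_values = array >= high          # known to map to 0
    known_values = low_values | high_values
    known_idx = np.nonzero(known_values)[0]
    if known_idx.size == 0:
        # nothing is outside the band, so every value takes the default 0
        return np.zeros(array.shape, dtype=np.int8)
    acc = np.cumsum(known_values)        # number of known values seen so far
    # index of the last known value at or before each position
    result = low_values[known_idx[acc - 1]]
    result[acc == 0] = False             # no known value yet: default to 0
    return result.astype(np.int8)

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
print(hysteresis_threshold(array))  # [0 1 1 1 0 0 0 1 1 0 0 0 0]
```

For an input such as [-0.5, -0.6, 0.0, -2.0], a fix applied only to result[0] would return [0, 1, 0, 1]; masking with acc == 0 gives [0, 0, 0, 1].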
Is there a way to avoid this loop and so optimize the code?
import numpy as np
cLoss = 0
dist_ = np.array([0,1,0,1,1,0,0,1,1,0]) # just an example, longer in reality
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1]) # just an example, longer in reality
t = float(dist_.size)
for i in range(len(dist_)):
    labels = TLabels[dist_ == dist_[i]]
    cLoss += 1 - TLabels[i]*(1. * np.sum(labels)/t)
print(cLoss)
Note: dist_ and TLabels are both numpy arrays with the same shape (t,1)
I am not sure exactly what you want to do, but are you aware of scipy.ndimage.measurements for computing on arrays with labels? It looks like you want something like:
cLoss = len(dist_) - sum(TLabels * scipy.ndimage.measurements.sum(TLabels,dist_,dist_) / len(dist_))
My first question: what is labels at each step in the loop?
With dist_ = array([2,1,2]) and TLabels=array([1,2,3])
I get
[-1 1]
[1]
[-1 1]
The different lengths immediately raise a warning flag; it may be difficult to vectorize this.
With the longer arrays in the edited example
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
[-1 1 -1 -1 -1]
[ 1 1 1 1 -1]
[ 1 1 1 1 -1]
[-1 1 -1 -1 -1]
The labels vectors are all the same length. Is that normal, or just a coincidence of values?
If I drop a couple of elements off dist_, the labels are:
In [375]: for i in range(len(dist_)):
     ...:     labels = TLabels[dist_ == dist_[i]]
     ...:     v = (1.*np.sum(labels)/t); v1 = 1-TLabels[i]*v
     ...:     print(labels, v, TLabels[i], v1)
     ...:     cLoss += v1
     ...:
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, 1, 1.25)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([-1, 1, -1, -1]), -0.25, -1, 0.75)
(array([1, 1, 1, 1]), 0.5, 1, 0.5)
Again the labels differ in length, but there are really only a few distinct calculations: there is one v value for each distinct dist_ value.
Working through the details, it looks like you are just calculating sum(labels)**2 for each distinct dist_ value, summing those, and subtracting that total divided by t from t.
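Writing the loop out algebraically confirms this. With $T_i$ the entries of TLabels, $t$ the number of elements, $g(i)$ the group of equal dist_ values that element $i$ belongs to, and $S_g = \sum_{j \in g} T_j$ the sum of TLabels over group $g$ (so np.sum(labels) in iteration $i$ equals $S_{g(i)}$), grouping the terms by $g$ gives

$$\text{cLoss} = \sum_{i=1}^{t}\left(1 - \frac{T_i\,S_{g(i)}}{t}\right) = t - \frac{1}{t}\sum_{g} S_g \sum_{i \in g} T_i = t - \frac{1}{t}\sum_{g} S_g^{2}.$$

For the 10-element example in the question, the two groups have $S_0 = -3$ and $S_1 = 3$, so cLoss $= 10 - 18/10 = 8.2$.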
This looks like a groupby problem. You want to divide dist_ into groups with a common value and sum some function of their corresponding TLabels values. Python's itertools has a groupby function, and so does pandas. itertools.groupby only groups consecutive equal elements, so it requires dist_ to be sorted first; pandas groupby does not.
Try sorting dist_ and see if that adds any clarity to the problem.
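Since every element of a group contributes T_i * S_g / t and the T_i of a group add up to S_g (the sum of TLabels over that group), the loop collapses to t - sum(S_g**2)/t. A minimal sketch of that, assuming NumPy only: np.unique(..., return_inverse=True) assigns each element its group number without an explicit sort on your side, and np.bincount with weights sums TLabels per group. The function names closs_loop and closs_vectorized are made up for illustration.

```python
import numpy as np

dist_ = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 0])
TLabels = np.array([-1, 1, 1, 1, 1, -1, -1, 1, -1, -1])

def closs_loop(dist_, TLabels):
    """The original loop, kept for comparison."""
    t = float(dist_.size)
    c = 0.0
    for i in range(len(dist_)):
        labels = TLabels[dist_ == dist_[i]]
        c += 1 - TLabels[i] * (np.sum(labels) / t)
    return c

def closs_vectorized(dist_, TLabels):
    """Group by distinct dist_ values without a Python loop."""
    dist_ = np.ravel(dist_)         # also accepts the (t, 1) shape from the note
    TLabels = np.ravel(TLabels)
    t = float(dist_.size)
    # inverse maps each element to the index of its distinct dist_ value
    _, inverse = np.unique(dist_, return_inverse=True)
    group_sums = np.bincount(inverse, weights=TLabels)   # S_g for each group
    return t - np.sum(group_sums ** 2) / t

print(closs_loop(dist_, TLabels), closs_vectorized(dist_, TLabels))
```

Both functions should agree up to floating-point rounding; the vectorized one does a single pass per NumPy call instead of one boolean comparison over the whole array per element.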
I'm not sure whether this is any better, since I didn't fully understand why you want to do this. Many variables in your loop take only two values, so they can be computed in advance.
Also, the entries of dist_ could be used directly as a boolean switch, but I used an explicit copy anyway.
import numpy as np
dist_ = np.array([0,1,0,1,1,0,0,1,1,0])
TLabels = np.array([-1,1,1,1,1,-1,-1,1,-1,-1])
t = len(dist_)
dist_zeros = dist_ == 0
# group sums divided by t: index 0 for dist_ == 0, index 1 for dist_ == 1
one_zero_sum = [sum(TLabels[dist_zeros])/t, sum(TLabels[~dist_zeros])/t]
cLoss = sum([1-x*one_zero_sum[dist_[y]] for y,x in enumerate(TLabels)])
which results in cLoss = 8.2. I am using Python 3; in Python 2, sum(...)/t would be integer division here unless you divide by float(t) or use from __future__ import division.
Let's say I have:
c = array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], float)
Then I take a fast Fourier transform:
r = rfft(c)
which produces the following complex array:
r = [ 21.+0.j , -3.+5.19615242j , -3.+1.73205081j , -3.+0.j ]
The number of elements in the new array is N//2 + 1 (here 6//2 + 1 = 4).
I'm trying to tell Python to change the values of SPECIFIC elements in the new array. I want to keep the FIRST 50% of the elements and set the others equal to zero, so that the result would look like
r = [ 21.+0.j , -3.+5.19615242j , 0 , 0 ]
how would I go about this?
rfft returns a NumPy array, which makes this kind of manipulation easy:
import numpy as np
c = [1,2,3,4,5,6]
r = np.fft.rfft(c)
r[r.shape[0]//2:] = 0  # // keeps the slice index an integer in Python 3
r
>> array([21.+0.j, -3.+5.1961j, 0.+0.j , 0.+0.j])
You can use slice notation and pad the result with zeros to the original length (list.extend works in place and returns None, so concatenate with + instead):
r = list(r[:len(r)//2]) + [0] * (len(r) - len(r)//2)
The * syntax just repeats the zero element the specified number of times.
You can split the list in half, then append a list of zeros of the same length as the remaining part:
>>> i
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> i[:len(i)//2] + [0]*len(i[len(i)//2:])
[1, 2, 3, 4, 5, 0, 0, 0, 0, 0]
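A small sketch that works for any input length under Python 3 (where / always returns a float, so the slice index needs //). It copies the rfft output and zeroes everything from len(r) // 2 onward, which for the 6-sample example keeps the first two coefficients; for an odd number of coefficients it keeps the smaller half. The helper name zero_upper_half is hypothetical.

```python
import numpy as np

def zero_upper_half(r):
    """Return a copy of r with the second half of its entries set to 0."""
    out = np.array(r, copy=True)
    out[len(out) // 2:] = 0
    return out

c = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
r = np.fft.rfft(c)                 # N // 2 + 1 = 4 complex coefficients
print(len(r))                      # 4
print(zero_upper_half(r))
```

Working on a copy leaves the original spectrum intact; assign the slice directly on r instead if modifying it in place is what you want.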
I want to iterate over a numpy array and process only the elements that match specific criteria. In the code below, I want to perform a calculation only if the element is greater than 1.
import numpy as np
a = np.array([[1,3,5],
              [2,4,3],
              [1,2,0]])
for i in range(0, a.shape[0]):
    for j in range(0, a.shape[1]):
        if a[i,j] > 1:
            a[i,j] = (a[i,j] - 3) * 5
Is it possible to replace the double loop above with a single line of code, and perhaps make it faster?
Method #1: use a boolean array to index:
>>> a = np.array([[1,3,5], [2,4,3], [1,2,0]])
>>> a[a > 1] = (a[a > 1] - 3) * 5
>>> a
array([[ 1, 0, 10],
[-5, 5, 0],
[ 1, -5, 0]])
This computes a > 1 twice, although you could assign it to a variable instead. (In practice it's very unlikely to be a bottleneck, of course, although if a is large enough memory can be an issue.)
Method #2: use np.where:
>>> a = np.array([[1,3,5], [2,4,3], [1,2,0]])
>>> np.where(a > 1, (a-3)*5, a)
array([[ 1, 0, 10],
[-5, 5, 0],
[ 1, -5, 0]])
This only computes a > 1 once, but on the other hand it computes (ax-3)*5 for every element ax in a, instead of only for the elements that really need it.
for index, x in np.ndenumerate(a):
    if x > 1:
        a[index] = (a[index] - 3) * 5
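A quick sketch confirming that the approaches give the same result, with the mask computed once as suggested for method #1. The function names are made up for illustration, and each one works on a copy so the input is left untouched. The np.ndenumerate version is still a Python-level loop over every element, so on large arrays it will generally be slower than the two vectorized versions.

```python
import numpy as np

def with_mask(a):
    a = a.copy()
    mask = a > 1                      # computed once and reused
    a[mask] = (a[mask] - 3) * 5
    return a

def with_where(a):
    # evaluates (a - 3) * 5 for every element, then picks per element
    return np.where(a > 1, (a - 3) * 5, a)

def with_ndenumerate(a):
    a = a.copy()
    for index, x in np.ndenumerate(a):
        if x > 1:
            a[index] = (x - 3) * 5
    return a

a = np.array([[1, 3, 5], [2, 4, 3], [1, 2, 0]])
print(with_mask(a))
```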