I have the following code that converts a noisy square wave to a noiseless one:
import numpy as np
threshold = 0.5
low = 0
high = 1
time = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
amplitude = np.array([0.1, -0.2, 0.2, 1.1, 0.9, 0.8, 0.98, 0.2, 0.1, -0.1])
# using list comprehension
new_amplitude_1 = [low if a<threshold else high for a in amplitude]
print(new_amplitude_1)
# gives: [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
# using numpy's where
new_amplitude_2 = np.where(amplitude > threshold)
print(new_amplitude_2)
# gives: (array([3, 4, 5, 6]),)
Is it possible to use np.where() to obtain the same result for new_amplitude_2 as the list comprehension gives for new_amplitude_1 in this case?
I read some tutorials online, but I can't see how to express an if/else inside np.where(). Maybe I should use another function?
Here's how you can do it using np.where:
np.where(amplitude < threshold, low, high)
# array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
You can also do it without np.where:
new_ampl2 = (amplitude > 0.5).astype(np.int32)
print(new_ampl2)
# array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
I'm new to Python and Stackoverflow, so I'm sorry in advance if this question is silly and/or duplicated.
I'm trying to write a code that replaces every nth 0 in the numpy array that consists of 0 and 1.
For example, if I want to replace every third 0 with 0.5, the expected result is:
Input: np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1])
Output: np.array([0, 0, 0.5, 0, 1, 1, 1, 1, 1, 0, 0.5, 1, 0, 1])
And I wrote the following code.
import numpy as np
arr = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1])
counter = 0
for i in range(len(arr)):
    if arr[i] == 0 and counter % 3 == 0:
        arr[i] = 0.5
    counter += 1
print(arr)
The expected output is [0, 0, 0.5, 0, 1, 1, 1, 1, 1, 0, 0.5, 1, 0, 1].
However, the output is exactly the same as the input; no values are replaced.
Does anyone know why this does not replace any values, and how I can fix it?
Thank you.
Reasonably quick and dirty:
Find the indices of entries that are zero
indices = np.flatnonzero(arr == 0)
Take every third of those indices
indices = indices[::3]
As noted in a comment, you need a float type
arr = arr.astype(float)
Set those indices to 0.5
arr[indices] = 0.5
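Putting those steps together into one runnable snippet (using the asker's array). One detail worth noting: `indices[::3]` starts the count at the first zero; the asker's expected output replaces the third zero first, which corresponds to starting the slice at 2:

```python
import numpy as np

arr = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1])
arr = arr.astype(float)             # an int array silently truncates 0.5 to 0

indices = np.flatnonzero(arr == 0)  # positions of all zeros
arr[indices[2::3]] = 0.5            # every third zero, counting from the third
# arr.tolist() == [0, 0, 0.5, 0, 1, 1, 1, 1, 1, 0, 0.5, 1, 0, 1]
```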
I have a Python matrix (numpy array), for example like this one:
a = array([[0, 2, 1, 1.4142, 4, 7],
           [3, 0, 1.4142, 9, 2, 0],
           [1.4142, 0, 0, 1, 1, 3]])
I want to set to 0 every element of this array that is different from both 1 and sqrt(2) (1.4142). That is:
a = array([[0, 0, 1, 1.4142, 0, 0],
           [0, 0, 1.4142, 0, 0, 0],
           [1.4142, 0, 0, 1, 1, 0]])
I have tried this
a[(a != 1).any() or not (np.isclose(a, np.sqrt(2))).any()] = 0
and some variations, but I can't make it work. Thanks.
Just use masking -
m1 = np.isclose(a,1) # use a==1 for exact matches
m2 = np.isclose(a,np.sqrt(2))
a[~(m1 | m2)] = 0
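Applied to the array from the question, this gives the desired result. One caveat: 1.4142 matches np.sqrt(2) only to four decimals, which sits right at the edge of np.isclose's default tolerance, so an explicit atol is safer:

```python
import numpy as np

a = np.array([[0,      2, 1,      1.4142, 4, 7],
              [3,      0, 1.4142, 9,      2, 0],
              [1.4142, 0, 0,      1,      1, 3]])

m1 = np.isclose(a, 1)                       # entries equal to 1
m2 = np.isclose(a, np.sqrt(2), atol=1e-4)   # 1.4142 ~ sqrt(2) to 4 decimals
a[~(m1 | m2)] = 0                           # zero out everything else
```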
You can try this:
np.where((a == 1.4142), a, a == 1)
Why not check the sum and product of the elements for both arrays? Correct me if I am wrong, but this should work for positive numbers.
Assume I have two numpy arrays, stored in a dict as follows:
{0: array([ 2, 4, 8, 9, 12], dtype=int64),
1: array([ 1, 3, 5], dtype=int64)}
Now I want to replace each array with the ID at the front, i.e. the values in array 0 become 0 and in array 1 become 1, then both arrays should be merged, whereby the index order must be correct.
I.e. desired output:
array([1, 0, 1, 0, 1, 0, 0, 0])
But that's what I get:
np.concatenate((h1,h2), axis=0)
array([0, 0, 0, 0, 0, 1, 1, 1])
(Each array contains only unique values, if this helps.)
How can this be done?
Your description of merging is a bit unclear, but here's something that makes sense:
In [399]: dd ={0: np.array([ 2, 4, 8, 9, 12]),
...: 1: np.array([ 1, 3, 5])}
In [403]: res = np.zeros(13, int)
In [404]: res[dd[0]] = 0
In [405]: res[dd[1]] = 1
In [406]: res
Out[406]: array([0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
Or to make the assignments clearer:
In [407]: res = np.zeros(13, int)
In [408]: res[dd[0]] = 2
In [409]: res[dd[1]] = 1
In [410]: res
Out[410]: array([0, 1, 2, 1, 2, 1, 0, 0, 2, 2, 0, 0, 2])
Otherwise, the talk of index positions doesn't make a whole lot of sense.
Something like this?
d = {0: np.array([ 2, 4, 8, 9, 12], dtype=np.int64),
     1: np.array([ 1, 3, 5], dtype=np.int64)}
(np.concatenate([d[0],d[1]]).argsort(kind="stable")>=len(d[0])).view(np.uint8)
# array([1, 0, 1, 0, 1, 0, 0, 0], dtype=uint8)
np.concatenate just appends arrays; it doesn't interleave them by index.
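An equivalent, more explicit way to write that one-liner is to build a label array and reorder it by the sorted index positions (same idea, just spelled out):

```python
import numpy as np

d = {0: np.array([2, 4, 8, 9, 12]), 1: np.array([1, 3, 5])}

idx = np.concatenate([d[0], d[1]])               # the index positions
labels = np.concatenate([np.full(len(d[0]), 0),  # the ID each position came from
                         np.full(len(d[1]), 1)])
out = labels[np.argsort(idx, kind="stable")]     # reorder labels by index position
print(out)   # [1 0 1 0 1 0 0 0]
```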
Maybe an unconventional way to go about it, but you could repeat the [0, 1] pattern for the length of the shortest array, and then append repeated 1 values for the length difference between the two arrays?
shortest = min(len(h1), len(h2))
diff = abs(len(h1) - len(h2))
A = numpy.tile([0, 1], shortest)   # tile (not repeat) keeps the alternating pattern
B = numpy.repeat(1, diff)          # pad with 1s for the length difference
C = numpy.concatenate((A, B), axis=0)
Maybe not the most dynamic or kindest way to go about this but if your solution requires just that, then it could do the job in the meantime.
So say I have two clustering outcomes that look like this:
clustering = [[8, 9, 10, 11], [14, 13, 4, 7, 6, 12, 5, 15], [1, 2, 0, 3]]
correct_clustering = [[2, 8, 10, 0, 15], [12, 13, 9, 14], [11, 3, 5, 1, 4, 6, 7]]
How would I go about comparing the outcome contained in clustering to the one contained in correct_clustering? I want some number between 0 and 1. I was thinking about calculating the fraction of pairs which are correctly clustered together in the same cluster, but I can't think of a programmatic way to do it.
The best practice measures are indeed based on pair counting.
In particular the adjusted Rand index (ARI) is the standard measure here.
You don't actually count pairs one by one: the number of pairs from a set of n objects can be computed with the binomial coefficient, simply n*(n-1)/2, i.e. (n*(n-1))>>1.
You'll need this for each cluster and each cluster intersection.
The results of all intersections are aggregated, and it is easy to see that this is invariant to permutations of the clusters (and hence to the cluster labels). The Rand index is the accuracy of predicting whether two objects a, b are in the same cluster or in different clusters. The ARI improves on this by adjusting for chance: on a very unbalanced problem, a random result can score a high accuracy, but its ARI will be close to 0 on average.
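As a sketch of the pair-counting idea from the question, here is the plain (unadjusted) Rand index, computed directly as the fraction of item pairs on which the two clusterings agree:

```python
from itertools import combinations

clustering = [[8, 9, 10, 11], [14, 13, 4, 7, 6, 12, 5, 15], [1, 2, 0, 3]]
correct_clustering = [[2, 8, 10, 0, 15], [12, 13, 9, 14], [11, 3, 5, 1, 4, 6, 7]]

def to_labels(clusters):
    # flatten a list of clusters into one label per item, ordered by item id
    labels = {m: k for k, members in enumerate(clusters) for m in members}
    return [labels[i] for i in sorted(labels)]

def rand_index(a, b):
    # fraction of pairs on which both clusterings agree: both say
    # "same cluster", or both say "different clusters"
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i, j in combinations(range(n), 2))
    return agree / (n * (n - 1) // 2)

score = rand_index(to_labels(clustering), to_labels(correct_clustering))
```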
Use the Rand Index:
import numpy as np
from scipy.special import comb
def rand_index_score(clusters, classes):
    tp_plus_fp = comb(np.bincount(clusters), 2).sum()
    tp_plus_fn = comb(np.bincount(classes), 2).sum()
    A = np.c_[(clusters, classes)]
    tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
             for i in set(clusters))
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    tn = comb(len(A), 2) - tp - fp - fn
    return (tp + tn) / (tp + fp + fn + tn)
clusters = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
classes = [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 2, 1, 0, 2, 2, 2, 0]
rand_index_score(clusters, classes)
0.6764705882352942
You can use the function adjusted_rand_score in sklearn:
from sklearn.metrics import adjusted_rand_score
clustering = sorted((i, num) for num, lst in enumerate(clustering) for i in lst)
clustering = [i for _, i in clustering]
# [2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
correct_clustering = sorted((i, num) for num, lst in enumerate(correct_clustering) for i in lst)
correct_clustering = [i for _, i in correct_clustering]
# [0, 2, 0, 2, 2, 2, 2, 2, 0, 1, 0, 2, 1, 1, 1, 0]
ari = adjusted_rand_score(correct_clustering, clustering)
# -0.012738853503184737
The function returns values between -1 and 1, so to get a value between 0 and 1 you need to rescale:
ari_scaled = (ari + 1) / 2
# 0.49363057324840764
I am new to the numpy library.
How are the values converted to get the output below, and what happens internally?
>>> np.convolve([1, 2, 3], [0, 1, 0.5])
# array([ 0. , 1. , 2.5, 4. , 1.5])
np.convolve(a, v, mode='full') slides one array across the other, step by step from left to right. At each step i it computes v[i]*a, i.e. the scaled copy v[i]*a[0], v[i]*a[1], ..., v[i]*a[n-1], shifted right by i positions. That gives len(v) such arrays, and adding them together produces the result.
The result of np.convolve([1, 2, 3], [0, 1, 0.5]) is calculated as following:
step 1 (v[0] = 0):
1, 2, 3
0, 1, 0.5
a = 0, 0, 0
step 2 (v[1] = 1):
1, 2, 3
0, 1, 0.5
b = 0, 1, 2, 3
step 3 (v[2] = 0.5):
1, 2, 3
0, 1, 0.5
c = 0, 0, 0.5, 1, 1.5
finally, adding a, b, and c:
0, 0, 0,
+ 0, 1, 2, 3,
+ 0, 0, 0.5, 1, 1.5
-------------------
= 0, 1, 2.5, 4, 1.5
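The hand calculation above can be checked in code: build each shifted partial product v[i]*a, sum them, and compare against np.convolve:

```python
import numpy as np

a = np.array([1, 2, 3])
v = np.array([0, 1, 0.5])

out = np.zeros(len(a) + len(v) - 1)   # full-mode output length: 3 + 3 - 1 = 5
for i, coef in enumerate(v):
    out[i:i + len(a)] += coef * a     # v[i]*a, shifted right by i positions

print(out.tolist())   # [0.0, 1.0, 2.5, 4.0, 1.5]
assert np.allclose(out, np.convolve(a, v))
```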