Find with Python/NumPy the number of combinations in a tuple

I have a tuple (1,2,5,3,2,1,3,4,1) and want to find the number of combinations of 1 and 2. So in this example it should return 2, because there are three 1s but only two 2s.
The task that I want to solve is:
Give the number of possible combinations of 1 and 2. Each number can only be used once per combination.
I already solved the issue with this code:
def min_count(tup, number1, number2):
    count1 = tup.count(number1)
    count2 = tup.count(number2)
    if count1 < count2:
        return count1
    else:
        return count2
But because I want to learn more magic things with numpy, I would like to know if there is a better solution to use it here.

Your if/else can be expressed more compactly with min:
In [707]: tup = (1,2,5,3,2,1,3,4,1)
In [708]: max(tup.count(1), tup.count(2))
Out[708]: 3
In [709]: min(tup.count(1),tup.count(2))
Out[709]: 2
numpy won't improve on this. There is np.bincount, which counts all values in a range.
In [710]: arr = np.array(tup)
In [711]: arr
Out[711]: array([1, 2, 5, 3, 2, 1, 3, 4, 1])
In [712]: np.bincount(arr)
Out[712]: array([0, 3, 2, 2, 1, 1], dtype=int32)
We could select a couple of values and, as before, take their min:
In [716]: np.bincount(arr)[[1,2]]
Out[716]: array([3, 2], dtype=int32)
In [717]: min(np.bincount(arr)[[1,2]])
Out[717]: 2
Keep in mind that np.array(tup) takes time; so sticking with list operations is often faster.
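If you want to check that on your own data, here is a minimal timing sketch using the standard timeit module (the variable names are just for illustration, and the numbers will vary by machine and tuple size):
import timeit
import numpy as np
tup = (1, 2, 5, 3, 2, 1, 3, 4, 1)
# pure-Python counting
t_list = timeit.timeit(lambda: min(tup.count(1), tup.count(2)), number=100000)
# numpy: tuple-to-array conversion plus bincount
t_numpy = timeit.timeit(lambda: min(np.bincount(np.array(tup))[[1, 2]]), number=100000)
print(t_list, t_numpy)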
Another array approach is to use a broadcasted == test
In [532]: arr == [[1],[2]]
Out[532]:
array([[ True, False, False, False, False, True, False, False, True],
[False, True, False, False, True, False, False, False, False]], dtype=bool)
In [533]: _.sum(axis=1)
Out[533]: array([3, 2])
using sum on the booleans to count them.
There's also the Counter class, which can count all values in one call:
In [534]: from collections import Counter
In [535]: Counter(tup)
Out[535]: Counter({1: 3, 2: 2, 3: 2, 4: 1, 5: 1})
In [536]: min(_[1],_[2])
Out[536]: 2

Related

How to count repeated elements in a numpy 2d array?

I have many very large padded numpy 2d arrays, simplified to array A, shown below. Array Z is the basic pad array:
A = np.array(([1, 2, 3], [2, 3, 4], [0, 0, 0], [0, 0, 0], [0, 0, 0]))
Z = np.array([0, 0, 0])
How to count the number of pads in array A in the simplest / fastest pythonic way?
This works (zCount=3), but seems verbose, loopy and unpythonic:
zCount = 0
for a in A:
    if a.any() == Z.any():
        zCount += 1
zCount
Also tried a one-line list comprehension, which doesn't work (don't know why not):
[zCount += 1 for a in A if a.any() == Z.any()]
zCount
Also tried a list count, but 'truth value of array with more than one element is ambiguous':
list(A).count(Z)
I have searched for a simple numpy expression without success. np.count_nonzero gives a full elementwise boolean for [0]. Is there a one-word / one-line counting expression for [0, 0, 0]? (My actual arrays are approx. shape (100, 30) and I have up to millions of these. I am trying to deal with them in batches, so any simple time savings when generating a count would be helpful.) Thanks.
Try:
>>> np.equal(A, Z).all(axis=1).sum()
3
Step by step:
>>> np.equal(A, Z)
array([[False, False, False],
[False, False, False],
[ True, True, True],
[ True, True, True],
[ True, True, True]])
>>> np.equal(A, Z).all(axis=1)
array([False, False, True, True, True])
>>> np.equal(A, Z).all(axis=1).sum()
3
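Since the question mentions processing these arrays in batches of millions, the same expression extends to a 3-D stack; a minimal sketch, assuming the arrays all share a shape and are stacked along axis 0:
batch = np.stack([A, A, A])                 # hypothetical batch of shape (3, 5, 3)
np.equal(batch, Z).all(axis=2).sum(axis=1)  # pad-row count per array in the batch
# -> array([3, 3, 3])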

NumPy mean of multiple slices

Let's say I have an array a where I would like to calculate the mean across multiple slices defined in idx:
a = np.arange(10)
idx = np.random.choice([0,1], a.size).astype(bool)
a, idx
Out[1]: (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([False, False, True, True, False, False, False, True, True,
True]))
With the desired output:
array([2.5, 8. ])
Of course I could write a simple for loop, but I would prefer a fully vectorized approach given that the arrays sizes can become quite big and 2D.
It's possible to do it completely vectorized:
edges = np.diff(idx.astype(np.int8), prepend=0, append=0)
rising = np.where(edges == 1)[0]
falling = np.where(edges == -1)[0]
cum = np.insert(np.cumsum(a), 0, 0)
means = (cum[falling] - cum[rising]) / (falling - rising)
This takes about 0.2 seconds on my machine with a = np.arange(10**7).
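For the idx shown above, the intermediate arrays work out as follows (a short trace of the same code, showing why the cumulative-sum trick yields the per-slice means):
edges    # [ 0  0  1  0 -1  0  0  1  0  0 -1]
rising   # [2 7]   -> indices where the True runs start
falling  # [4 10]  -> one past the ends of the True runs
cum      # [ 0  0  1  3  6 10 15 21 28 36 45]
# per-slice sums divided by per-slice lengths:
# (cum[falling] - cum[rising]) / (falling - rising) = [5 24] / [2 3] = [2.5 8.]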

Count how many times a value is exceeded in a list

In a Python list, I need to count how many times a value is exceeded.
This code counts how many values exceed a limit.
Suppose I have this example, and I want to count how many times 2 is exceeded.
array = [1, 2, 3, 4, 1, 2, 3, 1]
a = pd.Series(array)
print(len(a[a >= 2]))
# prints 5
How can I collapse consecutive values, such that 2 is returned instead?
First compute exc = a.ge(2) - a Series answering the question:
Is the current value >= 2?
Then, to get a number of sequences of "exceeding" elements, run:
result = (exc.shift().ne(exc) & exc).sum()
The result for your data is just 2.
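Put together, a runnable version of that approach (assuming pandas is imported as pd) might look like:
import pandas as pd
a = pd.Series([1, 2, 3, 4, 1, 2, 3, 1])
exc = a.ge(2)                                # True where the value is >= 2
result = (exc.shift().ne(exc) & exc).sum()   # count the starts of the True runs
print(result)                                # 2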
I think you are very close.
>>> a = np.array([1, 2, 3, 4, 1, 2, 3, 1])
>>> b = a >= 2
>>> b
array([False, True, True, True, False, True, True, False])
Now, instead of counting Trues, you need to count how many times you see a False, True transition. You can compare each item in b to the item before it, b[i] > b[i-1], to find the False, True pairs, and you need to consider the start of the array as well.
>>> c = np.r_[ b[0], b[1:] > b[:-1] ]
>>> c
array([ False, True, False, False, False, True, False, False])
>>> np.sum( c )
2
where
>>> b[1:]
array([ True, True, True, False, True, True, False])
>>> b[:-1]
array([False, True, True, True, False, True, True])
You can use a set to remove duplicates before converting it to a numpy array.
import numpy as np
import pandas as pd
array = [1, 2, 3, 4, 1, 2, 3, 1]
arr_set = set(array)
a = pd.Series(list(arr_set))
print(len(a[a >= 2]))
You can also do this with numpy by only showing unique values and then filtering.
len(a.unique()[a.unique() >= 2])

What is the '<' doing in this line?: data += dt < b

Input:
dt = [6,7,8,9,10]
data = [1,2,3,4,5]
b = 8.0
b = np.require(b, dtype=np.float)
data += dt < b
data
Output:
array([2, 3, 3, 4, 5])
I tried to input different numbers but still couldn't figure out what the '<' is doing there.
Also, it seems to work only when b is np.float (hence the conversion).
The < with numpy arrays does an element-wise comparison. That means it returns an array with True where the condition holds and False where it does not. The np.require line is necessary here so that NumPy arrays are actually used. You could drop the np.require if you converted your data and dt to np.arrays beforehand.
Then the result is added (element-wise) to the numeric array. In this context True is equal to 1 and False to zero.
>>> dt < b # which elements are smaller than b?
array([ True, True, False, False, False])
>>> 0 + (dt < b) # boolean arrays in arithmetic operations with numbers
array([1, 1, 0, 0, 0])
So it adds 1 to every element of data where the element in dt is smaller than 8.
dt is a list:
In [50]: dt = [6,7,8,9,10]
In [51]: dt < 8
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-51-3d06f93227f5> in <module>()
----> 1 dt < 8
TypeError: '<' not supported between instances of 'list' and 'int'
< (.__lt__) is not defined between a list and an int.
But if one side of the comparison is an ndarray, then the numpy definition of __lt__ applies. dt is turned into an array, and it does an element-by-element comparison.
In [52]: dt < np.array(8)
Out[52]: array([ True, True, False, False, False])
In [53]: np.array(dt) < 8
Out[53]: array([ True, True, False, False, False])
numpy array operations also explain the data += part:
In [54]: data = [1,2,3,4,5] # a list
In [55]: data + (dt < np.array(8)) # list=>array, and boolean array to integer array
Out[55]: array([2, 3, 3, 4, 5])
In [56]: data
Out[56]: [1, 2, 3, 4, 5]
In [57]: data += (dt < np.array(8))
In [58]: data
Out[58]: array([2, 3, 3, 4, 5])
Actually I'm a bit surprised that, with the +=, data has been changed from a list to an array. It means the data += ... has been implemented as an assignment:
data = data + (dt <np.array(8))
Normally + for a list is a concatenate (with data reset to the original list first):
In [61]: data += ['a','b','c']
In [62]: data
Out[62]: [1, 2, 3, 4, 5, 'a', 'b', 'c']
# equivalent of: data.extend(['a','b','c'])
You can often get away with using lists in array contexts, but it's better to make objects arrays, so you do get these implicit, and sometimes unexpected, conversions.
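A small sketch of the more explicit version, converting both lists to arrays up front so no implicit conversion happens later:
import numpy as np
dt = np.asarray([6, 7, 8, 9, 10])
data = np.asarray([1, 2, 3, 4, 5])
b = 8.0
data += dt < b          # now truly in-place, since data is already an array
# data -> array([2, 3, 3, 4, 5])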
This is just an alias (or shortcut, or convenience notation) for the equivalent function numpy.less():
In [116]: arr1 = np.arange(8)
In [117]: scalar = 6.0
# comparison that generates a boolean mask
In [118]: arr1 < scalar
Out[118]: array([ True, True, True, True, True, True, False, False])
# same operation as above
In [119]: np.less(arr1, scalar)
Out[119]: array([ True, True, True, True, True, True, False, False])
Let's see how this boolean array can be added to a non-boolean array in this case. It is possible due to type coercion.
# sample array
In [120]: some_arr = np.array([1, 1, 1, 1, 1, 1, 1, 1])
# addition after type coercion
In [122]: some_arr + (arr1 < scalar)
Out[122]: array([2, 2, 2, 2, 2, 2, 1, 1])
# same output achieved with `numpy.less()`
In [123]: some_arr + np.less(arr1, scalar)
Out[123]: array([2, 2, 2, 2, 2, 2, 1, 1])
So, type coercion happens on the boolean array and then addition is performed.
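The coercion can also be made explicit with astype, which shows exactly what is being added (same values as above):
(arr1 < scalar).astype(int)              # array([1, 1, 1, 1, 1, 1, 0, 0])
some_arr + (arr1 < scalar).astype(int)   # array([2, 2, 2, 2, 2, 2, 1, 1])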

Vectorized element assignment involving comparisons between matrices in Numpy

I'm currently trying to replace the for-loops in this code chunk with vectorized operations in Numpy:
def classifysignal(samplemat, binedges, nbinmat, nodatacode):
    ndata, nsignals = np.shape(samplemat)
    classifiedmat = np.zeros(shape=(ndata, nsignals))
    ncounts = 0
    for i in range(ndata):
        for j in range(nsignals):
            classifiedmat[i,j] = nbinmat[j]
            for e in range(nbinmat[j]):
                if samplemat[i,j] == nodatacode:
                    classifiedmat[i,j] == nodatacode
                    break
                elif samplemat[i,j] <= binedges[j, e]:
                    classifiedmat[i,j] = e
                    ncounts += 1
                    break
    ncounts = float(ncounts/nsignals)
    return classifiedmat, ncounts
However, I'm having a little trouble conceptualizing how to replace the third for loop (i.e. the one beginning with for e in range(nbinmat[j])), since it entails comparing individual elements of two separate matrices before assigning a value, with the indices of these elements (i and e) being completely decoupled from each other. Is there a simple way to do this using whole-array operations, or would sticking with for-loops be best?
PS: My first Stackoverflow question, so if anything's unclear/more details are needed, please let me know! Thanks.
Without some concrete examples and explanation it's hard (or at least takes work) to figure out what you are trying to do, especially in the inner loop. So let's tackle a few pieces and try to simplify them.
In [59]: C=np.zeros((3,4),int)
In [60]: N=np.arange(4)
In [61]: C[:]=N
In [62]: C
Out[62]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
means that classifiedmat[i,j] = nbinmat[j] can be moved out of the loops
classifiedmat = np.zeros(samplemat.shape)
classifiedmat[:] = nbinmat
and
In [63]: S=np.arange(12).reshape(3,4)
In [64]: C[S>8]=99
In [65]: C
Out[65]:
array([[ 0, 1, 2, 3],
[ 0, 1, 2, 3],
[ 0, 99, 99, 99]])
suggests that
if samplemat[i,j] == nodatacode:
classifiedmat[i,j] == nodatacode
could be replaced with
classifiedmat[samplemat==nodatacode] = nodatacode
I haven't worked out whether the loop and break modify this replacement or not.
A possible model for the inner loop is:
In [83]: B=np.array((np.arange(4),np.arange(2,6))).T
In [84]: for e in range(2):
    ....:     C[S<=B[:,e]]=e
    ....:
In [85]: C
Out[85]:
array([[ 1, 1, 1, 1],
[ 0, 1, 2, 3],
[ 0, 99, 99, 99]])
You could also compare all values of S and B with:
In [86]: S[:,:,None]<=B[None,:,:]
Out[86]:
array([[[ True, True],
[ True, True],
[ True, True],
[ True, True]],
[[False, False],
[False, False],
[False, False],
[False, False]],
[[False, False],
[False, False],
[False, False],
[False, False]]], dtype=bool)
The fact that you are iterating over:
for e in range(nbinmat[j]):
may throw off all these equivalences. I'm not going to try to figure out its significance. But maybe I've given you some ideas.
Well, if you want to use vector operations you need to solve the problem using linear algebra. I can't rethink the problem for you, but the general approach I would take is something like:
res = Subtract samplemat from binedges
res = Normalize values in res to 0 and 1 (use clip?), i.e. if > 0 then 1, else 0
ncount = sum ( res )
classifiedMat = res * binedges
And so on.
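For what it's worth, here is one hedged sketch of how the innermost loop could be vectorized per column with np.searchsorted. It assumes each binedges[j, :nbinmat[j]] is sorted ascending (which the break-on-first-match logic suggests) and that the nodatacode branch was meant to be an assignment; it is an illustration, not a drop-in replacement:
import numpy as np

def classifysignal_sketch(samplemat, binedges, nbinmat, nodatacode):
    ndata, nsignals = samplemat.shape
    classifiedmat = np.empty((ndata, nsignals))
    nodata = samplemat == nodatacode
    ncounts = 0
    for j in range(nsignals):                # loop only over signals, not samples
        edges = binedges[j, :nbinmat[j]]
        # index of the first edge e with samplemat[:, j] <= edges[e];
        # returns nbinmat[j] when no edge qualifies, matching the original default
        e = np.searchsorted(edges, samplemat[:, j], side='left')
        classifiedmat[:, j] = e
        ncounts += np.count_nonzero((e < nbinmat[j]) & ~nodata[:, j])
    classifiedmat[nodata] = nodatacode       # nodata cells override their bin
    return classifiedmat, float(ncounts) / nsignals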
