Python: Finding differences between elements of a list

Given a list of numbers, how does one find the differences between each (i)-th element and the (i+1)-th?
Is it better to use a lambda expression or maybe a list comprehension?
For example:
Given a list t=[1,3,6,...], the goal is to find a list v=[2,3,...] because 3-1=2, 6-3=3, etc.

>>> t
[1, 3, 6]
>>> [j-i for i, j in zip(t[:-1], t[1:])] # or use itertools.izip in py2k
[2, 3]

The other answers are correct, but if you're doing numerical work you might want to consider numpy. Using numpy, the answer is:
v = numpy.diff(t)
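For example:
>>> import numpy
>>> numpy.diff([1, 3, 6])
array([2, 3])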

If you don't want to use numpy nor zip, you can use the following solution:
>>> t = [1, 3, 6]
>>> v = [t[i+1]-t[i] for i in range(len(t)-1)]
>>> v
[2, 3]

Starting in Python 3.10, the new itertools.pairwise function makes it possible to slide through pairs of elements, and thus to map over rolling pairs:
from itertools import pairwise
[y-x for (x, y) in pairwise([1, 3, 6, 7])]
# [2, 3, 1]
The intermediate result is an iterator of consecutive pairs:
pairwise([1, 3, 6, 7])
# -> (1, 3), (3, 6), (6, 7)

You can use itertools.tee and zip to efficiently build the result:
from itertools import tee
# python2 only:
#from itertools import izip as zip
def differences(seq):
    iterable, copied = tee(seq)
    next(copied)
    for x, y in zip(iterable, copied):
        yield y - x
Or using itertools.islice instead:
from itertools import islice
def differences(seq):
    nexts = islice(seq, 1, None)
    for x, y in zip(seq, nexts):
        yield y - x
You can also avoid using the itertools module:
def differences(seq):
    iterable = iter(seq)
    prev = next(iterable)
    for element in iterable:
        yield element - prev
        prev = element
All these solutions work in constant space if you don't need to store all the results, and they support infinite iterables.
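For example, a minimal sketch feeding the generator version an infinite iterable (itertools.count is used here just for illustration):
from itertools import count, islice
squares = (x*x for x in count())   # infinite stream of squares 0, 1, 4, 9, ...
diffs = differences(squares)       # lazy: nothing is consumed yet
print(list(islice(diffs, 5)))      # [1, 3, 5, 7, 9]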
Here are some micro-benchmarks of the solutions:
In [12]: L = range(10**6)
In [13]: from collections import deque
In [15]: %timeit deque(differences_tee(L), maxlen=0)
10 loops, best of 3: 122 ms per loop
In [16]: %timeit deque(differences_islice(L), maxlen=0)
10 loops, best of 3: 127 ms per loop
In [17]: %timeit deque(differences_no_it(L), maxlen=0)
10 loops, best of 3: 89.9 ms per loop
And the other proposed solutions:
In [18]: %timeit [x[1] - x[0] for x in zip(L[1:], L)]
10 loops, best of 3: 163 ms per loop
In [19]: %timeit [L[i+1]-L[i] for i in range(len(L)-1)]
1 loops, best of 3: 395 ms per loop
In [20]: import numpy as np
In [21]: %timeit np.diff(L)
1 loops, best of 3: 479 ms per loop
In [35]: %%timeit
    ...: res = []
    ...: for i in range(len(L) - 1):
    ...:     res.append(L[i+1] - L[i])
    ...:
1 loops, best of 3: 234 ms per loop
Note that:
zip(L[1:], L) is equivalent to zip(L[1:], L[:-1]), since zip already stops at the shortest input; however, it avoids a whole copy of L.
Accessing the single elements by index is very slow, because every index access is a method call in Python.
numpy.diff is slow because it first has to convert the list to an ndarray. Obviously, if you start with an ndarray it is much faster:
In [22]: arr = np.array(L)
In [23]: %timeit np.diff(arr)
100 loops, best of 3: 3.02 ms per loop

I would suggest using
v = np.diff(t)
which is simple and easy to read.
But if you want v to have the same length as t, then:
v = np.diff([t[0]] + t)   # prepend the first element
or
v = np.diff(t + [t[-1]])  # append the last element
FYI: this will only work for lists.
For numpy arrays:
v = np.diff(np.append(t[0], t))

Using the := walrus operator available in Python 3.8+:
>>> t = [1, 3, 6]
>>> prev = t[0]; [-prev + (prev := x) for x in t[1:]]
[2, 3]

A functional approach:
>>> import operator
>>> a = [1,3,5,7,11,13,17,21]
>>> list(map(operator.sub, a[1:], a[:-1]))  # list() is needed on Python 3, where map is lazy
[2, 2, 2, 4, 2, 4, 4]
Using generators:
>>> import operator, itertools
>>> g1, g2 = itertools.tee((x*x for x in range(5)), 2)
>>> list(map(operator.sub, itertools.islice(g1, 1, None), g2))
[1, 3, 5, 7]
Using indices:
>>> [a[i+1]-a[i] for i in range(len(a)-1)]
[2, 2, 2, 4, 2, 4, 4]

OK, I think I found the proper solution:
v = [x[0]-x[1] for x in zip(t[1:],t[:-1])]

I suspect this is what the numpy diff command does anyway, but just for completeness you can simply difference the sub-vectors:
from numpy import array as a
a(x[1:])-a(x[:-1])
In addition, I wanted to add solutions for some generalizations of the question:
Solution with periodic boundaries
Sometimes with numerical integration you will want to difference a list with periodic boundary conditions (so that the first element computes its difference from the last). In this case the numpy.roll function is helpful:
v-np.roll(v,1)
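For example (a quick sketch):
import numpy as np
v = np.array([1, 3, 6])
v - np.roll(v, 1)   # array([-5,  2,  3]); the first entry wraps around to v[0] - v[-1]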
Solutions with zero prepended
Another numpy solution (just for completeness) is to use
numpy.ediff1d(v)
This works like numpy.diff, but only on a vector (it flattens the input array). It offers the ability to prepend or append numbers to the resulting vector. This is useful when handling accumulated fields, as is often the case with fluxes in meteorological variables (e.g. rain, latent heat etc.), since you want a resulting list of the same length as the input, with the first entry untouched.
Then you would write
np.ediff1d(v,to_begin=v[0])
Of course, you can also do this with the np.diff command; in this case, though, you need to prepend zero to the series with the prepend keyword:
np.diff(v,prepend=0.0)
All the above solutions return a vector that is the same length as the input.

You can also convert the difference into an easily readable transition matrix using
v = t.reshape((c,r)).T - t.T
where c is the number of items in the list and r = 1, since a list is basically a vector (a 1-D array). Note that for the subtraction to broadcast into a full c-by-c matrix, t must be a 2-D row vector (e.g. np.atleast_2d(t)), not a plain 1-D array.
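If the reshape bookkeeping is hard to follow, here is a sketch of the same pairwise-difference matrix built with plain broadcasting (assuming t is a 1-D numpy array):
import numpy as np
t = np.array([1, 3, 6])
transitions = t[None, :] - t[:, None]   # transitions[i, j] == t[j] - t[i]
# array([[ 0,  2,  5],
#        [-2,  0,  3],
#        [-5, -3,  0]])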

My way
>>>v = [1,2,3,4,5]
>>>[v[i] - v[i-1] for i, value in enumerate(v[1:], 1)]
[1, 1, 1, 1]

Related

What is the alternative for numpy bincount when using negative integers? [duplicate]

Suppose I have the following NumPy array:
a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
How can I find the most frequent number in this array?
If your list contains all non-negative ints, you should take a look at numpy.bincount:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
and then probably use np.argmax:
a = np.array([1,2,3,1,2,1,1,1,3,2,2,1])
counts = np.bincount(a)
print(np.argmax(counts))
For a more complicated list (that perhaps contains negative numbers or non-integer values), you can use np.histogram in a similar way. Alternatively, if you just want to work in python without using numpy, collections.Counter is a good way of handling this sort of data.
from collections import Counter
a = [1,2,3,1,2,1,1,1,3,2,2,1]
b = Counter(a)
print(b.most_common(1))
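For the np.histogram route mentioned above, a rough sketch (the bin count of 10 is an arbitrary choice):
import numpy as np
a = np.array([1.1, 2.2, 3.3, 1.2, 1.1, 2.2, 1.1])
counts, edges = np.histogram(a, bins=10)
i = np.argmax(counts)
print(edges[i], edges[i + 1])   # bounds of the most populated bin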
You may use
values, counts = np.unique(a, return_counts=True)
ind = np.argmax(counts)
print(values[ind]) # prints the most frequent element
ind = np.argpartition(-counts, kth=10)[:10]
print(values[ind]) # prints the 10 most frequent elements
If some element is as frequent as another one, this code will return only the first element.
If you're willing to use SciPy:
>>> from scipy.stats import mode
>>> mode([1,2,3,1,2,1,1,1,3,2,2,1])
(array([ 1.]), array([ 6.]))
>>> most_frequent = mode([1,2,3,1,2,1,1,1,3,2,2,1])[0][0]
>>> most_frequent
1.0
Performance (using IPython) of some solutions found here:
>>> # small array
>>> a = [12,3,65,33,12,3,123,888000]
>>>
>>> import collections
>>> collections.Counter(a).most_common()[0][0]
3
>>> %timeit collections.Counter(a).most_common()[0][0]
100000 loops, best of 3: 11.3 µs per loop
>>>
>>> import numpy
>>> numpy.bincount(a).argmax()
3
>>> %timeit numpy.bincount(a).argmax()
100 loops, best of 3: 2.84 ms per loop
>>>
>>> import scipy.stats
>>> scipy.stats.mode(a)[0][0]
3.0
>>> %timeit scipy.stats.mode(a)[0][0]
10000 loops, best of 3: 172 µs per loop
>>>
>>> from collections import defaultdict
>>> def jjc(l):
...     d = defaultdict(int)
...     for i in a:
...         d[i] += 1
...     return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]
...
>>> jjc(a)[0]
3
>>> %timeit jjc(a)[0]
100000 loops, best of 3: 5.58 µs per loop
>>>
>>> max(map(lambda val: (a.count(val), val), set(a)))[1]
12
>>> %timeit max(map(lambda val: (a.count(val), val), set(a)))[1]
100000 loops, best of 3: 4.11 µs per loop
>>>
For small arrays like the one in the problem, the best is 'max' with 'set'.
According to @David Sanders, if you increase the array size to something like 100,000 elements, the "max w/set" algorithm ends up being the worst by far, whereas the "numpy bincount" method is the best.
Starting in Python 3.4, the standard library includes the statistics.mode function to return the single most common data point.
from statistics import mode
mode([1, 2, 3, 1, 2, 1, 1, 1, 3, 2, 2, 1])
# 1
If there are multiple modes with the same frequency, statistics.mode returns the first one encountered on Python 3.8+; on Python 3.4-3.7 it raises StatisticsError instead.
Starting in Python 3.8, the statistics.multimode function returns a list of the most frequently occurring values in the order they were first encountered:
from statistics import multimode
multimode([1, 2, 3, 1, 2])
# [1, 2]
Also, if you want to get the most frequent value (positive or negative) without loading any modules, you can use the following code:
lVals = [1,2,3,1,2,1,1,1,3,2,2,1]
print(max(map(lambda val: (lVals.count(val), val), set(lVals))))
While most of the answers above are useful, in case you:
1) need it to support non-positive-integer values (e.g. floats or negative integers ;-)), and
2) aren't on Python 2.7 (which collections.Counter requires), and
3) prefer not to add the dependency of scipy (or even numpy) to your code, then a purely python 2.6 solution that is O(nlogn) (i.e., efficient) is just this:
from collections import defaultdict
a = [1,2,3,1,2,1,1,1,3,2,2,1]
d = defaultdict(int)
for i in a:
d[i] += 1
most_frequent = sorted(d.iteritems(), key=lambda x: x[1], reverse=True)[0]
In Python 3 the following should work:
max(set(a), key=lambda x: a.count(x))
I like the solution by JoshAdel.
But there is just one catch.
The np.bincount() solution only works on non-negative integers.
If you have strings, collections.Counter solution will work for you.
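For example:
>>> from collections import Counter
>>> Counter(['apple', 'banana', 'apple']).most_common(1)
[('apple', 2)]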
Here is a general solution that may be applied along an axis, regardless of values, using purely numpy. I've also found that this is much faster than scipy.stats.mode if there are a lot of unique values.
import numpy
def mode(ndarray, axis=0):
    # Check inputs
    ndarray = numpy.asarray(ndarray)
    ndim = ndarray.ndim
    if ndarray.size == 1:
        return (ndarray[0], 1)
    elif ndarray.size == 0:
        raise Exception('Cannot compute mode on empty array')
    try:
        axis = range(ndarray.ndim)[axis]
    except:
        raise Exception('Axis "{}" incompatible with the {}-dimension array'.format(axis, ndim))
    # If array is 1-D and numpy version is > 1.9 numpy.unique will suffice
    if all([ndim == 1,
            int(numpy.__version__.split('.')[0]) >= 1,
            int(numpy.__version__.split('.')[1]) >= 9]):
        modals, counts = numpy.unique(ndarray, return_counts=True)
        index = numpy.argmax(counts)
        return modals[index], counts[index]
    # Sort array
    sort = numpy.sort(ndarray, axis=axis)
    # Create array to transpose along the axis and get padding shape
    transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
    shape = list(sort.shape)
    shape[axis] = 1
    # Create a boolean array along strides of unique values
    strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                 numpy.diff(sort, axis=axis) == 0,
                                 numpy.zeros(shape=shape, dtype='bool')],
                                axis=axis).transpose(transpose).ravel()
    # Count the stride lengths
    counts = numpy.cumsum(strides)
    counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
    counts[strides] = 0
    # Get shape of padded counts and slice to return to the original shape
    shape = numpy.array(sort.shape)
    shape[axis] += 1
    shape = shape[transpose]
    slices = [slice(None)] * ndim
    slices[axis] = slice(1, None)
    # Reshape and compute final counts
    counts = counts.reshape(shape).transpose(transpose)[slices] + 1
    # Find maximum counts and return modals/counts
    slices = [slice(None, i) for i in sort.shape]
    del slices[axis]
    index = numpy.ogrid[slices]
    index.insert(axis, numpy.argmax(counts, axis=axis))
    return sort[index], counts[index]
Expanding on this method: when finding the mode of the data, you may also want the index within the actual array, to see how far the value is from the center of the distribution.
(_, idx, counts) = np.unique(a, return_index=True, return_counts=True)
index = idx[np.argmax(counts)]
mode = a[index]
Remember that np.argmax returns only the first maximum, so discard the mode (or handle the tie explicitly) when several counts share the maximum.
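A minimal sketch that collects every tied mode instead of silently keeping the first:
values, counts = np.unique(a, return_counts=True)
modes = values[counts == counts.max()]   # all values whose count ties the maximum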
You can use the following approach:
x = np.array([[2, 5, 5, 2], [2, 7, 8, 5], [2, 5, 7, 9]])
u, c = np.unique(x, return_counts=True)
print(u[c == np.amax(c)])
This will give the answer: array([2, 5])
Using np.bincount together with np.argmax, you can get the most common value in a numpy array. If your array is an image array, use np.ravel() or ndarray.flatten() to convert it to a 1-dimensional array first.
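A minimal sketch of that idea (the image array here is made up for illustration):
import numpy as np
img = np.random.randint(0, 256, size=(32, 32))     # hypothetical grayscale image
most_common_pixel = np.bincount(img.ravel()).argmax()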
I was recently doing a project and used collections.Counter (which tortured me).
In my opinion, Counter has very, very bad performance; it's just a class wrapping dict().
What's worse, if you use cProfile to profile its methods, you'll see a lot of '__missing__' and '__instancecheck__' calls wasting the whole time.
Be careful using its most_common(): every call invokes a sort, which makes it extremely slow. And if you use most_common(x), it invokes a heap sort, which is also slow.
By the way, numpy's bincount has a problem too: np.bincount([1,2,4000000]) returns an array with 4,000,001 elements (one bin for every integer from 0 up to the maximum value).

How to tell when more than one index matches?

I have an array of values that are often the same, and I am trying to find the index of the smallest one. But I also want to know about all the entries that share that value.
So, for example, I have the array a = [1, 2, 3, 4]; to find the index of the smallest element I use a.index(min(a)), which returns 0. But if I had the array a = [1, 1, 1, 1], the same expression would still return 0.
I want to know when multiple indices match what I am searching for, and what those indices are. How would I go about doing this?
list.index(value) returns the index of the first occurrence of value in list.
A better idea is to use a simple list comprehension and enumerate:
indices = [i for i, x in enumerate(iterable) if x == v]
where v is the value you want to search for and iterable is an object that supports iterator protocol e.g. it can be a generator or a sequence (like list).
For your specific use case, that'll look like
def smallest(seq):
    m = min(seq)
    return [i for i, x in enumerate(seq) if x == m]
Some examples:
In [23]: smallest([1, 2, 3, 4])
Out[23]: [0]
In [24]: smallest([1, 1, 1, 1])
Out[24]: [0, 1, 2, 3]
If you're not sure whether seq is empty or not, you can pass the default=-1 (or some other value) argument to the min function (Python 3.4+):
m = min(seq, default=-1)
On older Pythons, consider using m = min(seq or (-1,)) (again, with any value) instead.
A different approach using numpy.where could look like
In [1]: import numpy as np
In [2]: def np_smallest(seq):
   ...:     return np.where(seq==seq.min())[0]
In [3]: np_smallest(np.array([1,1,1,1]))
Out[3]: array([0, 1, 2, 3])
In [4]: np_smallest(np.array([1,2,3,4]))
Out[4]: array([0])
This approach is slightly less efficient than the list comprehension for small lists, but if you face large arrays, numpy may save you some time.
In [5]: seq = np.random.randint(100, size=1000)
In [6]: %timeit np_smallest(seq)
100000 loops, best of 3: 10.1 µs per loop
In [7]: %timeit smallest(seq)
1000 loops, best of 3: 194 µs per loop
Here is my solution:
def all_smallest(seq):
    """Takes sequence, returns list of all smallest elements"""
    min_i = min(seq)
    amount = seq.count(min_i)
    ans = []
    if amount > 1:
        for n, i in enumerate(seq):
            if i == min_i:
                ans.append(n)
            if len(ans) == amount:
                return ans
    return [seq.index(min_i)]
The code is very straightforward; I think it is all clear here without any explanation.
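For example:
>>> all_smallest([3, 1, 2, 1])
[1, 3]
>>> all_smallest([1, 2, 3, 4])
[0]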

Grouping repetitions in an array? [duplicate]

This question already has an answer here:
What's the most Pythonic way to identify consecutive duplicates in a list?
I am looking for a function that takes a one-dimensional sorted array and returns a two-dimensional array with two columns: the first column containing the non-repeated items and the second column containing the number of repetitions of each item. Right now my code is as follows:
def priorsGrouper(priors):
    if priors.size==0:
        ret=priors;
    elif priors.size==1:
        ret=priors[0],1;
    else:
        ret=numpy.zeros((1,2));
        pointer1,pointer2=0,0;
        while(pointer1<priors.size):
            counter=0;
            while(pointer2<priors.size and priors[pointer2]==priors[pointer1]):
                counter+=1;
                pointer2+=1;
            ret=numpy.row_stack((ret,[priors[pointer1],pointer2-pointer1]))
            pointer1=pointer2;
    return ret;

print priorsGrouper(numpy.array([1,2,2,3]))
My output is as follows:
[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]
 [ 3.  1.]]
First of all, I cannot get rid of the [0, 0] row. Secondly, I want to know whether there is a numpy or scipy function for this, or whether mine is OK.
Thanks.
You could use np.unique to get the unique values in x, as well as an array of indices (called inverse). The inverse can be thought of as "labels" for the elements in x. Unlike x itself, the labels are always integers, starting at 0.
Then you can take a bincount of the labels. Since the labels start at 0, the bincount won't be filled with a lot of zeros that you don't care about.
Finally, column_stack will join y and the bincount into a 2D array:
In [84]: x = np.array([1,2,2,3])
In [85]: y, inverse = np.unique(x, return_inverse=True)
In [86]: y
Out[86]: array([1, 2, 3])
In [87]: inverse
Out[87]: array([0, 1, 1, 2])
In [88]: np.bincount(inverse)
Out[88]: array([1, 2, 1])
In [89]: np.column_stack((y,np.bincount(inverse)))
Out[89]:
array([[1, 1],
[2, 2],
[3, 1]])
Sometimes when an array is small, it turns out that using plain Python methods is faster than NumPy functions. I wanted to check if that was the case here and, if so, how large x would have to be before NumPy methods are faster.
Here is a graph of the performance of various methods as a function of the size of x:
In [173]: x = np.random.random(1000)
In [174]: x.sort()
In [156]: %timeit using_unique(x)
10000 loops, best of 3: 99.7 us per loop
In [180]: %timeit using_groupby(x)
100 loops, best of 3: 3.64 ms per loop
In [157]: %timeit using_counter(x)
100 loops, best of 3: 4.31 ms per loop
In [158]: %timeit using_ordered_dict(x)
100 loops, best of 3: 4.7 ms per loop
For len(x) of 1000, using_unique is over 35x faster than any of the plain Python methods tested.
So it looks like using_unique is fastest, even for very small len(x).
Here is the program used to generate the graph:
import numpy as np
import collections
import itertools as IT
import matplotlib.pyplot as plt
import timeit
def using_unique(x):
    y, inverse = np.unique(x, return_inverse=True)
    return np.column_stack((y, np.bincount(inverse)))

def using_counter(x):
    result = collections.Counter(x)
    return np.array(sorted(result.items()))

def using_ordered_dict(x):
    result = collections.OrderedDict()
    for item in x:
        result[item] = result.get(item, 0) + 1
    return np.array(list(result.items()))  # list() needed on Python 3, where items() is a view

def using_groupby(x):
    return np.array([(k, sum(1 for i in g)) for k, g in IT.groupby(x)])

fig, ax = plt.subplots()
timing = collections.defaultdict(list)
Ns = [int(round(n)) for n in np.logspace(0, 3, 10)]
for n in Ns:
    x = np.random.random(n)
    x.sort()
    timing['unique'].append(
        timeit.timeit('m.using_unique(m.x)', 'import __main__ as m', number=1000))
    timing['counter'].append(
        timeit.timeit('m.using_counter(m.x)', 'import __main__ as m', number=1000))
    timing['ordered_dict'].append(
        timeit.timeit('m.using_ordered_dict(m.x)', 'import __main__ as m', number=1000))
    timing['groupby'].append(
        timeit.timeit('m.using_groupby(m.x)', 'import __main__ as m', number=1000))
ax.plot(Ns, timing['unique'], label='using_unique')
ax.plot(Ns, timing['counter'], label='using_counter')
ax.plot(Ns, timing['ordered_dict'], label='using_ordered_dict')
ax.plot(Ns, timing['groupby'], label='using_groupby')
plt.legend(loc='best')
plt.ylabel('milliseconds')
plt.xlabel('size of x')
plt.show()
If order is not important, use Counter.
from collections import Counter
% Counter([1,2,2,3])
= Counter({2: 2, 1: 1, 3: 1})
% Counter([1,2,2,3]).items()
[(1, 1), (2, 2), (3, 1)]
To preserve order (by first appearance), you can implement your own version of Counter:
from collections import OrderedDict
def OrderedCounter(seq):
    res = OrderedDict()
    for x in seq:
        res.setdefault(x, 0)
        res[x] += 1
    return res
% OrderedCounter([1,2,2,3])
= OrderedDict([(1, 1), (2, 2), (3, 1)])
% OrderedCounter([1,2,2,3]).items()
= [(1, 1), (2, 2), (3, 1)]
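Note that on Python 3.7+, where regular dicts preserve insertion order, a plain Counter already keeps keys in order of first appearance:
>>> from collections import Counter
>>> list(Counter([1, 2, 2, 3]).items())
[(1, 1), (2, 2), (3, 1)]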
If you want to count repetitions of an item you can use a dictionary:
l = [1, 2, 2, 3]
d = {}
for i in l:
    if i not in d:
        d[i] = 1
    else:
        d[i] += 1
result = [[k, v] for k, v in d.items()]
For your example this returns:
[[1, 1],
[2, 2],
[3, 1]]
Good luck.
First of all, you don't need to end your statements with semicolons (;); this isn't C. :-)
Second, line 5 (and others) sets ret to value,value, but that isn't a list:
>type foo.py
def foo():
return [1],2
a,b = foo()
print "a = {0}".format(a)
print "b = {0}".format(b)
Gives:
>python foo.py
a = [1]
b = 2
Third: there are easier ways to do this. Here's what comes to mind:
Use the set constructor to create a unique list of items
Create a list of the number of times each entry in the set occurs in the input
Use zip() to combine and return the two lists as a sequence of tuples (although this isn't exactly what you were asking for)
Here's one way:
def priorsGrouper(priors):
    """Find out how many times each element occurs in a list.
    #param[in] priors List of elements
    #return Two-dimensional list: first row is the unique elements,
            second row is the number of occurrences of each element.
    """
    # Generate a `list' containing only unique elements from the input
    mySet = set(priors)
    # Create the list that will store the number of occurrences
    occurrenceCounts = []
    # Count how many times each element occurs on the input:
    for element in mySet:
        occurrenceCounts.append(priors.count(element))
    # Combine the two:
    combinedArray = zip(mySet, occurrenceCounts)
    return combinedArray  # note: the original snippet was missing this return
# End of priorsGrouper() ----------------------------------------------
# Check zero-element case
print priorsGrouper([])
# Check multi-element case
sampleInput = ['a','a', 'b', 'c', 'c', 'c']
print priorsGrouper(sampleInput)

Python: fast way to compute the average of several (same length) lists?

Is there a simple way to calculate the mean of several (same length) lists in Python? Say I have [[1, 2, 3], [5, 6, 7]] and want to obtain [3, 4, 5]. This is to be done 100,000 times, so I want it to be fast.
In case you're using numpy (which seems to be more appropriate here):
>>> import numpy as np
>>> data = np.array([[1, 2, 3], [5, 6, 7]])
>>> np.average(data, axis=0)
array([ 3., 4., 5.])
In [6]: l = [[1, 2, 3], [5, 6, 7]]
In [7]: [(x+y)/2 for x,y in zip(*l)]
Out[7]: [3, 4, 5]
(You'll need to decide whether you want integer or floating-point maths, and which kind of division to use.)
On my computer, the above takes 1.24us:
In [11]: %timeit [(x+y)/2 for x,y in zip(*l)]
1000000 loops, best of 3: 1.24 us per loop
Thus processing 100,000 inputs would take 0.124s.
Interestingly, NumPy arrays are slower on such small inputs:
In [27]: In [21]: a = np.array(l)
In [28]: %timeit (a[0] + a[1]) / 2
100000 loops, best of 3: 5.3 us per loop
In [29]: %timeit np.average(a, axis=0)
100000 loops, best of 3: 12.7 us per loop
If the inputs get bigger, the relative timings will no doubt change.
Extending NPE's answer: for a list containing n sublists which you want to average, use this (a numpy solution might be faster, but mine uses only built-ins):
def average(l):
    llen = len(l)
    def divide(x): return x / llen
    return map(divide, map(sum, zip(*l)))
This sums up all sublists and then divides the result by the number of sublists, producing the average. You could inline the len computation and turn divide into a lambda like lambda x: x / len(l), but using an explicit function and pre-computing the length should be a bit faster.
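For example, on Python 3 (where map is lazy, so the result is wrapped in list() to display it):
>>> list(average([[1, 2, 3], [5, 6, 7]]))
[3.0, 4.0, 5.0]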
A slightly modified version for smooth work with RGB pixels:
def average(*l):
    l = tuple(l)
    def divide(x): return x // len(l)
    return list(map(divide, map(sum, zip(*l))))

print(average([0, 20, 200], [100, 40, 100]))
# [50, 30, 150]

number of values in a list greater than a certain number

I have a list of numbers and I want to get the number of times a number appears in the list that meets a certain criterion. I can use a list comprehension (or a list comprehension in a function), but I am wondering if someone has a shorter way.
# list of numbers
j=[4,5,6,7,1,3,7,5]
#list comprehension of values of j > 5
x = [i for i in j if i>5]
#value of x
len(x)
#or function version
def length_of_list(list_of_numbers, number):
    x = [i for i in list_of_numbers if i > number]  # note: i, not j
    return len(x)
length_of_list(j, 5)
Is there an even more condensed version?
You could do something like this:
>>> j = [4, 5, 6, 7, 1, 3, 7, 5]
>>> sum(i > 5 for i in j)
3
It might initially seem strange to add True to True this way, but I don't think it's unpythonic; after all, bool is a subclass of int in all versions since 2.3:
>>> issubclass(bool, int)
True
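and booleans really do behave as the integers 0 and 1 in arithmetic:
>>> True + True
2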
You can create a smaller intermediate result like this:
>>> j = [4, 5, 6, 7, 1, 3, 7, 5]
>>> len([1 for i in j if i > 5])
3
If you are otherwise using numpy, you can save a few strokes, but I don't think it gets much faster or more compact than senderle's answer:
import numpy as np
j = np.array(j)
sum(j > 5)   # elementwise comparison, then sum the booleans
A (somewhat) different way:
from functools import reduce  # needed on Python 3
reduce(lambda acc, x: acc + (1 if x > 5 else 0), j, 0)
If you are using NumPy (as in ludaavic's answer), for large arrays you'll probably want to use NumPy's sum function rather than Python's builtin sum for a significant speedup -- e.g., a >1000x speedup for 10 million element arrays on my laptop:
>>> import numpy as np
>>> ten_million = 10 * 1000 * 1000
>>> x, y = (np.random.randn(ten_million) for _ in range(2))
>>> %timeit sum(x > y) # time Python builtin sum function
1 loops, best of 3: 24.3 s per loop
>>> %timeit (x > y).sum() # wow, that was really slow! time NumPy sum method
10 loops, best of 3: 18.7 ms per loop
>>> %timeit np.sum(x > y) # time NumPy sum function
10 loops, best of 3: 18.8 ms per loop
(above uses IPython's %timeit "magic" for timing)
A different way of counting, using the bisect module:
>>> from bisect import bisect
>>> j = [4, 5, 6, 7, 1, 3, 7, 5]
>>> j.sort()
>>> b = 5
>>> index = bisect(j, b)  # find the insertion point for b
>>> print(len(j) - index)
3
I'll add a map and filter version because why not.
sum(map(lambda x:x>5, j))
sum(1 for _ in filter(lambda x:x>5, j))
You can do it like this using a function:
l = [34, 56, 78, 2, 3, 5, 6, 8, 45, 6]
print("The list : " + str(l))

def count_greater30(l):
    count = 0
    for i in l:
        if i > 30:
            count = count + 1
    return count

print("Count greater than 30 is : " + str(count_greater30(l)))
This is a little bit longer, but here is a detailed solution for beginners:
from functools import reduce
from statistics import mean
two_dim_array = [[1, 5, 7, 3, 2], [2, 4 ,1 ,6, 8]]
# convert two dimensional array to one dimensional array
one_dim_array = reduce(list.__add__, two_dim_array)
arithmetic_mean = mean(one_dim_array)
exceeding_count = sum(i > arithmetic_mean for i in one_dim_array)
