How to drop-while in python? - python

I've got a list of numbers, e.g.:
l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,0,0,0,0]
What I want to know is, how many leading values do I need to drop to get a list of all zeros?
So the answer here is 4.
I'm thinking, reverse the list, then use a for loop and a counter to run down the list until I find the first non-zero element, and then subtract counter and list length, but it seems a bit ugly.
Is there a nice 'pythonic' way to do it?
(Edit for clarity:
l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,1,0,0,0]
should go to 11, so I can't just use filter. I want to know how long the producer took to settle down to the point where the output becomes continuously zero)

You can use itertools.dropwhile and itertools.takewhile for this:
>>> l = [0.01,0.02,0.01,-0.01,0,0,0,0,0,0,0,0,0,0]
>>> import itertools
>>> list(itertools.dropwhile(lambda x: x != 0, l))
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> list(itertools.takewhile(lambda x: x != 0, l))
[0.01, 0.02, 0.01, -0.01]
>>> sum(1 for _ in itertools.takewhile(lambda x: x != 0, l))
4
However, if you want the list to contain only 0, then dropping from the front might not work if there are zeros and then non-zero elements again. Instead, you might better start from the end, using reversed, until you find the first non-zero element.
>>> sum(1 for _ in itertools.takewhile(lambda x: x == 0, reversed(l)))
10
>>> sum(1 for _ in itertools.dropwhile(lambda x: x == 0, reversed(l)))
4
Here, the first is the number of consecutive zeros starting from the end of the list, and the second the number of the remaining elements starting with the first non-zero, again from the end.

point = next(index for index, value in enumerate(reversed(l)) if value != 0)
point = len(l) - point if point else -1
We iterate over the list in reversed order till we get the first non 0 element. We use that index and subtract it from length to get the actual point.
updated code as suggest in comment.
Thanks tobias_k

There isn't a particularly Pythonic and efficient way to do this. You could iterate backwards over the list using range, but I think it's slightly cleaner to use the reversed list iterator:
def nonzeros(seq):
for i, v in enumerate(reversed(seq)):
if v:
break
return len(seq) - i
lst = [0.01,0.02,0.01,-0.01,0,0,0,0,0,0,0,0,0,0]
print(nonzeros(lst))
lst = [0.01,0.02,0.01,-0.01,0,0,0,0,0,0,1,0,0,0,0]
print(nonzeros(lst))
output
4
11

Reversing the list is an O(n) operation anyway, so there's no point. Just walk the list and note the index of the last non-zero element.
last = -1
for i, value in enumerate(l):
if value != 0:
last = i
(Consider using a tolerance test instead of strict equality for value.)
After the walk, last + 1 is the index of the first 0 in the longest all-zero suffix of your list. That is, all(x == 0 for x in l[last+1:]) will be true.

How about a verbatim solution, e.g. find the maximal index of a non-zero element?
res = max(i for i, x in enumerate(lst) if x != 0) + 1

pop would make your counting easy:
l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,0,0,0,0]
while not l.pop():
pass
result = len(l) + 1
assert result == 4
Edit
I would make it a function though:
def foo(original):
clone = original[:]
while not clone.pop(): pass
return len(clone) + 1
l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,0,0,0,0]
assert foo(l) == 4
l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,1,0,0,0]
assert foo(l) == 11

The length of the list is stored with its internal data. Start with the length of the full list, and then iterate through the list backwards until you find a non-zero value.
Worst case complexity should be O(n) if the list is all zeros.
It would be lightning fast in the case of a very long list with only a couple of zeros at the end before the first non-zero value, such as my_list = [5] * 1000000 + [0, 0].
my_list = [0.01, 0.02, 0.01, -0.01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
n = len(my_list)
while n:
n -= 1
if my_list[n] != 0:
n += 1
break
>>> n
4

Given:
>>> l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,1,0,0,0]
You can use groupby on a reversed iterator of the list to group the value of the last item for as long as that value is == the value before it:
>>> last_grp=next((k, len(l)-sum(1 for _ in v)) for k,v in groupby(reversed(l)))
>>> last_grp
(0, 11)
The first element of the tuple returned will be the repeated value of the last group -- 0 in this case. The length of that group then is how long. Subtract that from the overall list length for the index to the start of the group.
reversed and groupby are iterators. next returns the next value of an iterator. Since this is the last group, it is only needed once. This is efficient on any size list.
This works on a group of anything where l[x-1]==l[x] and the value of k is set to whatever value that is. groupby does just that -- groups items of the same value together.
You could also use groupby to find ranges where some condition is True or False; in this case, created than 0:
di={True:[], False:[]}
for k, v in groupby(enumerate(l), key=lambda t: t[1]>0):
grp=list(v)
di[k].append((grp[0][0], grp[-1][0]))
>>> di
{False: [(3, 9), (11, 13)], True: [(0, 2), (10, 10)]}
So the list l has a value greater than 0 in each range of [(0, 2), (10, 10)] and a value less than or equal to 0 in a range of [(3, 9), (11, 13)]

l = [0.01,0.02,0.01,-0.01,0,0,0,0,0,0,1,0,0,0]
for i,j in enumerate(reversed(l)) :
if j:
print (len(l[:-i]))
break
Output:
11

This is probably more convoluted than you would want it to be but its my 2 cents anyway:
l=[0.01,0.02,0.01,-0.01,0,0,0,0,0,0,1,0,0,0]
s = len(l) - next(i for i, x in enumerate(l[::-1]) if x != 0)
print(s) # 11

Update-> Earlier I misunderstood the question. (Thanks to dawg)
One way can be to convert the reversed list into bool array and search for first non-zero(True) value in the list.
For both operations (converting and searching) we can use built-in functions and thus better speed but it comes at cost of some memory(you haven't mentioned anything about memory consumption,so I'm assuming extra memory is available).
Here's the Code
bool_list = map(bool, reversed(l))
index = bool_list.index(True)
if index == -1:
# No such sub-array found
return len(bool_list)
else:
# Start index of the required sub-array
return len(bool_list) - index
Here, we used reversed instead of slice operator [::-1] for reversing because it is a generator function and returns element on the go without consuming any additional memory. We just need memory for bool array.

Related

Create a list counting sequential occurrences of numbers in another list

I have this list
x = [2,3,4,2,2]
I want to create a new list of the same size where each element will represent a count of the number of times the corresponding number has appeared in the original list before the current element. For example, the first 3 elements are 2,3,4 So, I want the first 3 values in the new list to be 0, and in x[3] since 2 is repeated, I want the value to be 1 and x[4] it's again 2 I want the value to be 2 (the number 2 has been encountered in the list twice up to this point).
My expected answer is the new list = [0,0,0,1,2]
Similarly for the list [2,3,4,2,2,3,3] I want the new list to be [0,0,0,1,2,1,2]
You can leverage itertools.count and collections.defaultdict to make a simple, efficient list comprehension. By setting count() as the default value for your default dictionary you get a new counter item for every unique number in the list that increments as you call next() on it.
from itertools import count
from collections import defaultdict
c = defaultdict(count)
x = [2,3,4,2,2]
[next(c[n]) for n in x ]
# [0, 0, 0, 1, 2]
You can keep track of the number of times a number has been met and add it to the result accordingly.
>>> d = defaultdict(int)
>>> res = []
>>> for el in x:
... res.append(d[el])
... d[el]+=1
...
>>> res
[0, 0, 0, 1, 2]
A shorter but probably less efficient solution:
[x[:i].count(n) for i,n in enumerate(x)]
I don't see any shortcuts to solving this other than full exploration.
counters = {} # keep track of counts
x= [2,3,4,2,2]
new_list = [counters[c] for c in x if not counters.update({c:counters.get(c,-1)+1})]
I came up with this one:
print([x[:i].count(x[i]) for i in range(len(x))])
First iterate over the index of the list: for i in range(len)
Then take the value at position i of the original list and count how many times it occurs before that position. For the original list x = [2,3,4,2,2] we obtain:
i=0 [].count(2) -> 0
i=1 [2].count(3) -> 0
i=2 [2,3].count(4) -> 0
i=3 [2,3,4].count(2) -> 1
i=4 [2,3,4,2].count(2) -> 2
You could try this as well:
list = [2, 3, 4, 2, 2]
repeats = []
repeatsCounter = 0
for i in range(len(list)): #loops through the list
for j in range(i - 1, -1, -1): #loops through every value until i
if list[i] == list[j]:
repeatsCounter += 1
break
repeats.append(repeatsCounter)

Making a new list that returns the minimum value digits' position/index from the previous list

Make a function called positions_lowest(lorig1), with at least one element, with integer numbers, possibly repeated, and returns a new list (lpos) containing the positions where the minimum number is in the original list.
For example:
lorig=[10,2,-3,34.5, 22,-3,1]
lres = positions_lowest(lorig)
print(lres)
# gives the output:
[2, 5] # because the minimum digit is -3 and positions are 2 and 5
I tried many times fixing my code and it is becoming more and more complicated for this simple question, below is my code. It does not even execute
def positions_lowest(lorig):
lpos = []
if lorig:
min_val = lorig[0]
[(i,j) for i,val in enumerate(lorig) if j < min_val]
if j == min_j:
lpos.append(i)
else:
min_val = j
lpos = [i]
return lpos
Try to split the problem into two steps - finding the minimum element and then returning all indices of the minimum element. This will make the problem easier in my opinion. There is a built-in function for the first step - min:
In [1]: a = [10, 2, -3, 34.5, 22, -3, 1]
In [2]: minv = min(a)
# minv is now -3
Now try using a list comprehension to get all indices of minv. I think you are on the right path. You should use enumerate over the list (to be able to get the indices) and then compare each value to minv.
This is not my answer, its from https://stackoverflow.com/a/15098701/12128167
You will have to make changes if you want different variable names or formats but this is
sweet and simple.
a = [10,2,-3,34.5, 22,-3,1]
def locate_min(a):
smallest = min(a)
return smallest, [index for index, element in enumerate(a)
if smallest == element]
print(locate_min(a))
Output:
(-3, [2, 5])
where -3 is the minimum number and the list of elements shows the index of the minimum number.
IF YOU DON'T WANT MINIMUM
return [index for index, element in enumerate(a)
if smallest == element]
Change code like above.
Gives you:
[2, 5]

Is there a way to find a max value between a range?

For example,
l = [1, -9, 2, 5, 9, 16, 11, 0, 21]
and if the range is 10 (10 meaning any numbers higher than 10 wont be considered as the max), I want the code to return 9.
You can first delete all elements too large and then find the max:
filtered = filter(lambda x: x <= limit, list)
val = max(filtered, default = None) # the `default` part means that that's returned if there are no elements
filtered is a filter object which contains all elements less than or equal to the limit. val is the maximum value in that.
Alternatively,
filtered = [x for x in list if x <= limit]
val = max(filtered, default = None)
filtered contains all elements in the list if and only if they are less than the limit. val is the maximum of filtered.
Alternatively,
val = max((x for x in list if x <= limit), default = None)
This combines the two steps from the above method by using an argument comprehension.
Alternatively,
val = max(filter(limit.__ge__, list), default = None)
limit.__ge__ is a function that means x => limit >= x (ge means Greater-Equal). This is the shortest and least readable way of writing it.
Also please rename list
list is a global variable (the list type in Python). Please don't overwrite global variables ;_;
The following is not radically different, conceptually, than #HyperNeutrino's excellent answer, but I think it's somewhat clearer (per the Zen):
from __future__ import print_function
l = [1, -9, 2, 5, 9, 16, 11, 0, 21]
def lim(x, n):
if x <= n:
return x
print(max(lim(a,10) for a in l))
The cleanest and most space efficient method is to utilize a conditioned generator expression:
maxl = max(num for num in l if num <= 10)
This loops over the list l once, ignoring any numbers not satisfying num <= 10 and finds the maximum. No additional list is build.

Permutations with repetition without two consecutive equal elements

I need a function that generates all the permutation with repetition of an iterable with the clause that two consecutive elements must be different; for example
f([0,1],3).sort()==[(0,1,0),(1,0,1)]
#or
f([0,1],3).sort()==[[0,1,0],[1,0,1]]
#I don't need the elements in the list to be sorted.
#the elements of the return can be tuples or lists, it doesn't change anything
Unfortunatly itertools.permutation doesn't work for what I need (each element in the iterable is present once or no times in the return)
I've tried a bunch of definitions; first, filterting elements from itertools.product(iterable,repeat=r) input, but is too slow for what I need.
from itertools import product
def crp0(iterable,r):
l=[]
for f in product(iterable,repeat=r):
#print(f)
b=True
last=None #supposing no element of the iterable is None, which is fine for me
for element in f:
if element==last:
b=False
break
last=element
if b: l.append(f)
return l
Second, I tried to build r for cycle, one inside the other (where r is the class of the permutation, represented as k in math).
def crp2(iterable,r):
a=list(range(0,r))
s="\n"
tab=" " #4 spaces
l=[]
for i in a:
s+=(2*i*tab+"for a["+str(i)+"] in iterable:\n"+
(2*i+1)*tab+"if "+str(i)+"==0 or a["+str(i)+"]!=a["+str(i-1)+"]:\n")
s+=(2*i+2)*tab+"l.append(a.copy())"
exec(s)
return l
I know, there's no need you remember me: exec is ugly, exec can be dangerous, exec isn't easy-readable... I know.
To understand better the function I suggest you to replace exec(s) with print(s).
I give you an example of what string is inside the exec for crp([0,1],2):
for a[0] in iterable:
if 0==0 or a[0]!=a[-1]:
for a[1] in iterable:
if 1==0 or a[1]!=a[0]:
l.append(a.copy())
But, apart from using exec, I need a better functions because crp2 is still too slow (even if faster than crp0); there's any way to recreate the code with r for without using exec? There's any other way to do what I need?
You could prepare the sequences in two halves, then preprocess the second halves to find the compatible choices.
def crp2(I,r):
r0=r//2
r1=r-r0
A=crp0(I,r0) # Prepare first half sequences
B=crp0(I,r1) # Prepare second half sequences
D = {} # Dictionary showing compatible second half sequences for each token
for i in I:
D[i] = [b for b in B if b[0]!=i]
return [a+b for a in A for b in D[a[-1]]]
In a test with iterable=[0,1,2] and r=15, I found this method to be over a hundred times faster than just using crp0.
You could try to return a generator instead of a list. With large values of r, your method will take a very long time to process product(iterable,repeat=r) and will return a huge list.
With this variant, you should get the first element very fast:
from itertools import product
def crp0(iterable, r):
for f in product(iterable, repeat=r):
last = f[0]
b = True
for element in f[1:]:
if element == last:
b = False
break
last = element
if b:
yield f
for no_repetition in crp0([0, 1, 2], 12):
print(no_repetition)
# (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
# (1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
Instead of filtering the elements, you could generate a list directly with only the correct elements. This method uses recursion to create the cartesian product:
def product_no_repetition(iterable, r, last_element=None):
if r == 0:
return [[]]
else:
return [p + [x] for x in iterable
for p in product_no_repetition(iterable, r - 1, x)
if x != last_element]
for no_repetition in product_no_repetition([0, 1], 12):
print(no_repetition)
I agree with #EricDuminil's comment that you do not want "Permutations with repetition." You want a significant subset of the product of the iterable with itself multiple times. I don't know what name is best: I'll just call them products.
Here is an approach that builds each product line without building all the products then filtering out the ones you want. My approach is to work primarily with the indices of the iterable rather than the iterable itself--and not all the indices, but ignoring the last one. So instead of working directly with [2, 3, 5, 7] I work with [0, 1, 2]. Then I work with the products of those indices. I can transform a product such as [1, 2, 2] where r=3 by comparing each index with the previous one. If an index is greater than or equal to the previous one I increment the current index by one. This prevents two indices from being equal, and this also gets be back to using all the indices. So [1, 2, 2] is transformed to [1, 2, 3] where the final 2 was changed to a 3. I now use those indices to select the appropriate items from the iterable, so the iterable [2, 3, 5, 7] with r=3 gets the line [3, 5, 7]. The first index is treated differently, since it has no previous index. My code is:
from itertools import product
def crp3(iterable, r):
L = []
for k in range(len(iterable)):
for f in product(range(len(iterable)-1), repeat=r-1):
ndx = k
a = [iterable[ndx]]
for j in range(r-1):
ndx = f[j] if f[j] < ndx else f[j] + 1
a.append(iterable[ndx])
L.append(a)
return L
Using %timeit in my Spyder/IPython configuration on crp3([0,1], 3) shows 8.54 µs per loop while your crp2([0,1], 3) shows 133 µs per loop. That shows a sizeable speed improvement! My routine works best where iterable is short and r is large--your routine finds len ** r lines (where len is the length of the iterable) and filters them while mine finds len * (len-1) ** (r-1) lines without filtering.
By the way, your crp2() does do filtering, as shown by the if lines in your code that is execed. The sole if in my code does not filter a line, it modifies an item in the line. My code does return surprising results if the items in the iterable are not unique: if that is a problem, just change the iterable to a set to remove the duplicates. Note that I replaced your l name with L: I think l is too easy to confuse with 1 or I and should be avoided. My code could easily be changed to a generator: replace L.append(a) with yield a and remove the lines L = [] and return L.
How about:
from itertools import product
result = [ x for x in product(iterable,repeat=r) if all(x[i-1] != x[i] for i in range(1,len(x))) ]
Elaborating on #peter-de-rivaz's idea (divide and conquer). When you divide the sequence to create into two subsequences, those subsequences are the same or very close. If r = 2*k is even, store the result of crp(k) in a list and merge it with itself. If r=2*k+1, store the result of crp(k) in a list and merge it with itself and with L.
def large(L, r):
if r <= 4: # do not end the divide: too slow
return small(L, r)
n = r//2
M = large(L, r//2)
if r%2 == 0:
return [x + y for x in M for y in M if x[-1] != y[0]]
else:
return [x + y + (e,) for x in M for y in M for e in L if x[-1] != y[0] and y[-1] != e]
small is an adaptation from #eric-duminil's answer using the famous for...else loop of Python:
from itertools import product
def small(iterable, r):
for seq in product(iterable, repeat=r):
prev, *tail = seq
for e in tail:
if e == prev:
break
prev = e
else:
yield seq
A small benchmark:
print(timeit.timeit(lambda: crp2( [0, 1, 2], 10), number=1000))
#0.16290732200013736
print(timeit.timeit(lambda: crp2( [0, 1, 2, 3], 15), number=10))
#24.798989593000442
print(timeit.timeit(lambda: large( [0, 1, 2], 10), number=1000))
#0.0071403849997295765
print(timeit.timeit(lambda: large( [0, 1, 2, 3], 15), number=10))
#0.03471425700081454

Python split list by one element

I am analysing data using Python and I have a list of N 2d data arrays. I would like to look at these elements one by one and compare them to the average of the other N-1 elements.
Is there a built-in method in Python to loop over a list and have on one hand a single element and on the other the rest of the list.
I know how to do it the ``ugly'' way by looping over an integer and joining the left and right part:
for i in xrange(N):
my_element = my_list[i]
my_sublist = my_list[:i] + my_list[i+1:]
but is there a Pythonista way of doing it?
We can calculate the sum of all elements. Then we can easily for each element find the sum of the rest of the elements. We subtract from the total sum the value of the current element and then in order to find the average we divide by N - 1:
s = sum(my_list)
for my_element in my_list:
avg_of_others = (s - my_element) / float(len(my_list) - 1)
...
EDIT:
This is an example how it can be extended to numpy:
import numpy as np
l = np.array([(1, 2), (1, 3)])
m = np.array([(3, 1), (2, 4)])
my_list = [l, m]
s = sum(my_list)
for my_element in my_list:
avg_of_others = (s - my_element) / float(len(my_list) - 1)
Nothing built-in that I know of, but maybe a little less copy-intensive using generators:
def iwithout(pos, seq):
for i, elem in enumerate(seq):
if i != pos:
yield elem
for elem, others in (elem, iwithout(i, N) for i, elem in enumerate(N)):
...
# others is now a generator expression running over N
# leaving out elem
I would like to look at these elements one by one and compare them to the average of the other N-1 elements.
For this specific use case, you should just calculate the sum of the entire list once and substract the current element to calculate the average, as explained in JuniorCompressor's answer.
Is there a built-in method in Python to loop over a list and have on one hand a single element and on the other the rest of the list.
For the more general problem, you could use collections.deque to pop the next element from the one end, giving you that element and the remaining elements from the list, and then add it back to the other end before the next iteration of the loop. Both operations are O(1).
my_queue = collections.deque(my_list)
for _ in enumerate(my_list):
my_element = my_queue.popleft() # pop next my_element
my_sublist = my_queue # rest of queue without my_element
print my_element, my_sublist # do stuff...
my_queue.append(my_element) # re-insert my_element
Sample output for my_list = range(5):
0 deque([1, 2, 3, 4])
1 deque([2, 3, 4, 0])
2 deque([3, 4, 0, 1])
3 deque([4, 0, 1, 2])
4 deque([0, 1, 2, 3])

Categories