Track value changes in a repetitive list in Python

I have a list with repeating values as shown below:
x = [1, 1, 1, 2, 2, 2, 1, 1, 1]
This list is generated from a pattern matching regular expression (not shown here). The list is guaranteed to have repeating values (many, many repeats - hundreds, if not thousands), and is never randomly arranged because that's what the regex is matching each time.
What I want is to track the list indices at which the entries change from the previous value. So for the above list x, I want to obtain a change-tracking list [3, 6] indicating that x[3] and x[6] are different from their previous entries in the list.
I managed to do this, but I was wondering if there was a cleaner way. Here's my code:
x = [1, 1, 1, 2, 2, 2, 1, 1, 1]
flag = []
for index, item in enumerate(x):
    if index != 0:
        if x[index] != x[index-1]:
            flag.append(index)
print(flag)
Output: [3, 6]
Question: Is there a cleaner way to do what I want, in fewer lines of code?

It can be done using a list comprehension, with a range function
>>> x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
>>> [i for i in range(1,len(x)) if x[i]!=x[i-1] ]
[3, 6]
>>> x = [1, 1, 1, 2, 2, 2, 1, 1, 1]
>>> [i for i in range(1,len(x)) if x[i]!=x[i-1] ]
[3, 6]

You can do something like this using itertools.izip, itertools.tee and a list-comprehension:
from itertools import izip, tee
it1, it2 = tee(x)
next(it2)
print [i for i, (a, b) in enumerate(izip(it1, it2), 1) if a != b]
# [3, 6]
Another alternative uses itertools.groupby on enumerate(x). groupby groups consecutive items that share the same key (here the value, via itemgetter(1)), so all we need is the index of the first item of each group except the first one:
from itertools import groupby
from operator import itemgetter
it = (next(g)[0] for k, g in groupby(enumerate(x), itemgetter(1)))
next(it) # drop the first group
print list(it)
# [3, 6]
If NumPy is an option:
>>> import numpy as np
>>> np.where(np.diff(x) != 0)[0] + 1
array([3, 6])
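On Python 3 there is no izip (the built-in zip is already lazy), and since Python 3.10 the tee-and-advance trick above is available directly as itertools.pairwise. A minimal sketch, assuming Python 3; the fallback branch reproduces the pairwise recipe from the itertools documentation for older versions, and change_indices is just an illustrative name:

```python
try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        # Recipe from the itertools docs: s -> (s0, s1), (s1, s2), ...
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def change_indices(seq):
    """Return every index i at which seq[i] differs from seq[i - 1]."""
    # enumerate(..., 1) numbers each pair by the index of its second item
    return [i for i, (prev, cur) in enumerate(pairwise(seq), 1) if prev != cur]


print(change_indices([1, 1, 1, 2, 2, 2, 1, 1, 1]))  # [3, 6]
```

Because pairwise accepts any iterable, this also works on a generator or regex-match iterator without building a list first.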

I'm here to add the obligatory answer that contains a list comprehension.
flag = [i+1 for i, value in enumerate(x[1:]) if (x[i] != value)]

Instead of indexing into the list twice for every position, you can use an iterator to fetch the next element in the list:
>>> x =[1, 1, 1, 2, 2, 2, 3, 3, 3]
>>> i_x=iter(x[1:])
>>> [i for i,j in enumerate(x[:-1],1) if j!=next(i_x)]
[3, 6]

itertools.izip_longest is what you are looking for:
from itertools import islice, izip_longest
flag = []
leader, trailer = islice(x, 1, None), iter(x)  # leader starts at x[1], trailer at x[0]
for i, (current, previous) in enumerate(izip_longest(leader, trailer), 1):
    # Skip comparing the last entry to nothing
    # If None is a valid value use a different sentinel for izip_longest
    if current is None:
        continue
    if current != previous:
        flag.append(i)

Related

Grouping numbers in a list of floats in ascending order [duplicate]

Assume no consecutive integers are in the list.
I've tried using NumPy (np.diff) for the difference between each element, but haven't been able to use that to achieve the answer. Two examples of the input (first line) and expected output (second line) are below.
[6, 0, 4, 8, 7, 6]
[[6], [0, 4, 8], [7], [6]]
[1, 4, 1, 2, 4, 3, 5, 4, 0]
[[1, 4], [1, 2, 4], [3, 5], [4], [0]]
You could use itertools.zip_longest to iterate over pairs of consecutive elements, together with enumerate to track the indices where the sequence stops increasing, and append the corresponding slices to your output list.
from itertools import zip_longest
nums = [1, 4, 1, 2, 4, 3, 5, 4, 0]
results = []
start = 0
for i, (a, b) in enumerate(zip_longest(nums, nums[1:])):
    if b is None or b <= a:
        results.append(nums[start:i+1])
        start = i + 1
print(results)
# [[1, 4], [1, 2, 4], [3, 5], [4], [0]]
Here's a simple way to do what you're asking without any extra libraries:
inp = [6, 0, 4, 8, 7, 6]  # your input list
result_list = []
sublist = []
previous_number = None
for current_number in inp:
    if previous_number is None or current_number > previous_number:
        # still ascending, add to the current sublist
        sublist.append(current_number)
    else:
        # no longer ascending, add the current sublist
        result_list.append(sublist)
        # start a new sublist
        sublist = [current_number]
    previous_number = current_number
if sublist:
    # add the last sublist, if there's anything there
    result_list.append(sublist)
print(result_list)  # [[6], [0, 4, 8], [7], [6]]
Just because I feel kind, here is a version that also works with negative numbers:
seq = [6, 0, 4, 8, 7, 6]
seq_by_incr_groups = []  # Will hold the result
incr_seq = []  # The current group of increasing values
previous_value = 0  # Used to check whether the current value is increasing
for curr_value in seq:  # Iterate over the list
    if curr_value > previous_value:  # Increasing: it belongs to the current group
        incr_seq.append(curr_value)
    else:  # Not increasing: store the current group and start a new one
        if incr_seq:  # The group is empty if the first number in the list is not positive
            seq_by_incr_groups.append(incr_seq)
        incr_seq = []
        incr_seq.append(curr_value)
    previous_value = curr_value  # Remember it for the comparison in the next iteration
if incr_seq:  # Add the last group of increasing numbers, if any
    seq_by_incr_groups.append(incr_seq)
print(seq_by_incr_groups)
The code below should help you. However, I would recommend that you use descriptive names and handle corner cases:
li1 = [6, 0, 4, 8, 7, 6]
li2 = [1, 4, 1, 2, 4, 3, 5, 4, 0]
def inc_seq(li1):
    lix = []
    li_t = []
    for i in range(len(li1)):
        if i < (len(li1) - 1) and li1[i] >= li1[i + 1]:
            # the next value does not increase: close the current group
            li_t.append(li1[i])
            lix.append(li_t)
            li_t = []
        else:
            li_t.append(li1[i])
    if li_t:
        # the last group is never closed inside the loop, so add it here
        lix.append(li_t)
    print(lix)
inc_seq(li1)
inc_seq(li2)
You can write a simple script and you don't need numpy as far as I have understood your problem statement. Try the script below. I have tested it using Python 3.6.7 and Python 2.7.15+ on my Ubuntu machine.
def breakIntoList(inp):
    if not inp:
        return []
    sublist = [inp[0]]
    output = []
    for a in inp[1:]:
        if a > sublist[-1]:
            sublist.append(a)
        else:
            output.append(sublist)
            sublist = [a]
    output.append(sublist)
    return output
nums = [1, 4, 1, 2, 4, 3, 5, 4, 0]  # avoid shadowing the built-in list
print(nums)
print(breakIntoList(nums))
Explanation:
The function first checks whether the input list is empty; if it is, it returns an empty list.
It then initialises sublist to hold elements in increasing order, starting with the input list's first element.
We iterate through the input list starting from its second element (index 1), checking whether the current element is greater than the last element of sublist (sublist[-1]). If it is, we append the current element to sublist. If not, the current element cannot extend the increasing run, so we append sublist to the output list and start a new sublist containing the current element.
At the end, we append the remaining sublist to the output list.
Here's an alternative using dict, list comprehensions, and zip:
seq = [1, 4, 1, 2, 4, 3, 5, 4, 0]
dict_seq = {i:j for i,j in enumerate(seq)}
# Get the index where numbers start to decrease
idx = [0] # Adding a zero seems counter-intuitive now; we'll see the benefit later.
for k, v in dict_seq.items():
    if k > 0:
        if dict_seq[k] < dict_seq[k-1]:
            idx.append(k)
# Using zip, slice and handling the last entry
inc_seq = [seq[i:j] for i, j in zip(idx, idx[1:])] + [seq[idx[-1]:]]
Output
print(inc_seq)
[[1, 4], [1, 2, 4], [3, 5], [4], [0]]
By initialising idx = [0] and pairing idx with idx[1:], we can zip the two lists to form the slices [0:2], [2:5], [5:7] and [7:8] in the list comprehension.
>>> print(idx)
[0, 2, 5, 7, 8]
>>> for i, j in zip(idx, idx[1:]):
...     print('[{}:{}]'.format(i, j))
...
[0:2]
[2:5]
[5:7]
[7:8]  # <-- the last slice [8:] still needs to be added
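The question mentions trying np.diff; it can be turned into the grouping by combining it with np.split. A minimal sketch, assuming NumPy is installed (increasing_runs is an illustrative name): np.diff(arr) <= 0 is True wherever the next value fails to increase, and adding 1 to those positions gives the index where each new run starts. It works for floats as well as integers.

```python
import numpy as np


def increasing_runs(values):
    """Split values into consecutive runs of strictly increasing numbers."""
    arr = np.asarray(values)
    if arr.size == 0:
        return []
    # positions i where arr[i + 1] <= arr[i]; +1 gives the start of the next run
    breaks = np.flatnonzero(np.diff(arr) <= 0) + 1
    # np.split cuts arr before each index in breaks
    return [run.tolist() for run in np.split(arr, breaks)]


print(increasing_runs([6, 0, 4, 8, 7, 6]))  # [[6], [0, 4, 8], [7], [6]]
```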

Shuffle two Python lists

I am having trouble figuring out a way to shuffle two Python lists.
I have two different lists.
first = [1,2,3]
second = [4,5,6]
I want my final list to be a combination of these two lists but shuffled in a particular way.
combined = [1,4,2,5,3,6]
I can shuffle both lists and combine them, but the result would be something like [2,1,3,6,5,4], whereas what I want is [1,4,2,5,3,6].
The combined list should have one item from the first list, and then the subsequent item from the second list.
The two lists might even be of different lengths.
first = [1,2,3]
second = [4,5,6,7]
def shuffle(f, s):
    newlist = []
    maxlen = max(len(f), len(s))
    for i in range(maxlen):
        try:
            newlist.append(f[i])
        except IndexError:
            pass
        try:
            newlist.append(s[i])
        except IndexError:
            pass
    return newlist
print(shuffle(first, second))
If both lists are always going to be the same length then you could do:
x = [1, 2, 3]
y = [4, 5, 6]
z = []
for i in range(len(x)):  # Could be range(len(y)) too, as they are the same length
    z.append(x[i])
    z.append(y[i])
If they are different lengths, you will have to do things slightly differently: check which list is longer and loop over its length. Then, on each iteration, check whether you are past the end of the shorter one.
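A minimal sketch of a variant of that idea, which interleaves only up to the length of the shorter list and then appends the rest of the longer one with slicing (interleave_two is an illustrative name):

```python
def interleave_two(x, y):
    """Alternate items from x and y, then append whatever is left of the longer list."""
    n = min(len(x), len(y))
    z = []
    for i in range(n):  # both lists still have an item at index i
        z.append(x[i])
        z.append(y[i])
    # at most one of these slices is non-empty
    z.extend(x[n:])
    z.extend(y[n:])
    return z


print(interleave_two([1, 2, 3], [4, 5, 6, 7]))  # [1, 4, 2, 5, 3, 6, 7]
```

Slicing past the end of a list simply returns an empty list, so no explicit "which one is longer" branch is needed.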
Purely because I want to understand itertools one day:
from itertools import chain, zip_longest, filterfalse
first = [1, 2, 3]
second = [4, 5, 6, 9, 10, 11]
result = filterfalse(
lambda x: x is None,
chain.from_iterable(zip_longest(first, second)))
print(tuple(result))
Enjoy!
For lists of unequal length, use itertools.zip_longest:
from itertools import chain, zip_longest
a = [1, 2, 3]
b = [4, 5, 6, 7]
combined = [x for x in chain(*zip_longest(a, b)) if x is not None]
print(combined) # -> [1, 4, 2, 5, 3, 6, 7]
For Python 2, replace zip_longest with izip_longest.
This is based on pylang's answer on Interleave multiple lists of the same length in Python
Here it is generalized, as a function in a module.
#!/usr/bin/env python3
from itertools import chain, zip_longest
def interleave(*args):
    """
    Interleave iterables.
    >>> list(interleave([1, 2, 3], [4, 5, 6, 7]))
    [1, 4, 2, 5, 3, 6, 7]
    >>> ''.join(interleave('abc', 'def', 'ghi'))
    'adgbehcfi'
    """
    for x in chain(*zip_longest(*args)):
        if x is not None:
            yield x
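The zip_longest-based versions above silently drop any None that is genuinely part of the input, because None is also the default padding value. A sketch that avoids this by padding with a private sentinel object instead (interleave_keep_none and _MISSING are illustrative names; fillvalue is a real zip_longest parameter):

```python
from itertools import chain, zip_longest

_MISSING = object()  # private sentinel: it cannot occur in any caller's data


def interleave_keep_none(*iterables):
    """Interleave iterables like interleave() above, but keep real None values."""
    # Pad the shorter iterables with the sentinel instead of None ...
    padded = zip_longest(*iterables, fillvalue=_MISSING)
    # ... and drop only the padding, compared by identity
    for x in chain.from_iterable(padded):
        if x is not _MISSING:
            yield x


print(list(interleave_keep_none([1, None, 3], [4, 5, 6, 7])))  # [1, 4, None, 5, 3, 6, 7]
```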

Pythonic list slicing with variable step size?

To frame the question, let's assume I've got the following list in Python, where X is some arbitrarily large natural number:
l = [1, 2, 3, 4, 5, 6, 7, ... X]
And I want to slice it such that I take the first, second, third, fifth, eighth, etc. elements of the list, abiding by the Fibonacci sequence. E.g. an operation akin to:
l_prime = [l[0], l[1], l[2], l[4], l[7], l[12], ...]
I'm comfortable with Python indexing notation, of l[start:end:step_size], and I'm wondering if there's a way to index Python lists within this notational paradigm with a step size that varies after each index is added to my new sliced list. Or, would I need to use some other technique to solve the prior problem I posed?
If you can use numpy this is really easy.
import numpy as np
l = np.array([1, 2, 3, 4, ..., X])
fibs = np.array([0, 1, 2, 4, 7])
print(l[fibs])
If you want to retrieve multiple elements from the list you can use the function itemgetter():
from operator import itemgetter
lst = [1, 2, 3, 4, 5, 6, 7, 8]
ind = [0, 0, 1, 2, 4, 7]
itemgetter(*ind)(lst)
# (1, 1, 2, 3, 5, 8)
You can first write a generator that gives you the fibonacci numbers:
def fibs():
    prev1 = 1
    prev2 = 2
    yield prev1
    yield prev2
    while True:
        prev1 += prev2
        prev2 += prev1
        yield prev1
        yield prev2
And then you can use list comprehension to map each of the fibonacci numbers fib to l[fib - 1]:
import itertools
result = [l[fib - 1] for fib in itertools.takewhile(lambda x: x <= len(l), fibs())]
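To address the original idea of a step size that changes after each index: slice notation only supports a fixed step, but a running sum of step sizes gives the indices. A minimal sketch, assuming Python 3.8+ for the initial argument of itertools.accumulate; fib_steps and variable_step_slice are illustrative names. The steps 1, 1, 2, 3, 5, ... produce the indices 0, 1, 2, 4, 7, 12, ..., i.e. the 1st, 2nd, 3rd, 5th, 8th and 13th elements:

```python
from itertools import accumulate, takewhile


def fib_steps():
    """Yield the Fibonacci numbers 1, 1, 2, 3, 5, ... to use as step sizes."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


def variable_step_slice(seq, steps):
    """Return seq[0], seq[s1], seq[s1 + s2], ... for as long as the index is in range.

    steps must yield positive integers; otherwise the index never runs past the
    end of seq and an infinite steps iterator would loop forever.
    """
    n = len(seq)
    # accumulate(..., initial=0) yields the running index 0, s1, s1 + s2, ...
    indices = accumulate(steps, initial=0)
    return [seq[i] for i in takewhile(lambda i: i < n, indices)]


l = list(range(1, 21))
print(variable_step_slice(l, fib_steps()))  # [1, 2, 3, 5, 8, 13]
```

With a constant step (for example iter([2] * 10)) it reduces to an ordinary seq[::2], which is a quick sanity check of the index arithmetic.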

Cycle a list from alternating sides

Given a list
a = [0,1,2,3,4,5,6,7,8,9]
how can I get
b = [0,9,1,8,2,7,3,6,4,5]
That is, produce a new list in which each successive element is alternately taken from the two sides of the original list?
>>> [a[-i//2] if i % 2 else a[i//2] for i in range(len(a))]
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
Explanation:
This code picks numbers from the beginning (a[i//2]) and from the end (a[-i//2]) of a, alternatingly (if i%2 else). A total of len(a) numbers are picked, so this produces no ill effects even if len(a) is odd.
[-i//2 for i in range(len(a))] yields 0, -1, -1, -2, -2, -3, -3, -4, -4, -5,
[ i//2 for i in range(len(a))] yields 0, 0, 1, 1, 2, 2, 3, 3, 4, 4,
and i%2 alternates between False and True,
so the indices we extract from a are: 0, -1, 1, -2, 2, -3, 3, -4, 4, -5.
My assessment of pythonicness:
The nice thing about this one-liner is that it's short and shows symmetry (+i//2 and -i//2).
The bad thing, though, is that this symmetry is deceptive:
One might think that -i//2 is the same as i//2 with the sign flipped, i.e. -(i//2). But unary minus binds tighter than //, so -i//2 means (-i)//2, and in Python integer division returns the floor of the result instead of truncating towards zero. So -1//2 == -1.
Also, I find accessing list elements by index less pythonic than iteration.
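The precedence and rounding points above can be checked directly; a short sketch:

```python
# Unary minus binds tighter than //, so -i // 2 means (-i) // 2
neg_then_floor = [-i // 2 for i in range(6)]
# Flipping the sign after dividing gives a different sequence
floor_then_neg = [-(i // 2) for i in range(6)]

print(neg_then_floor)  # [0, -1, -1, -2, -2, -3]
print(floor_then_neg)  # [0, 0, -1, -1, -2, -2]
# // floors toward negative infinity; int() of a float truncates toward zero
print(-1 // 2, int(-1 / 2))  # -1 0
```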
Cycle between getting items from the forward iterator and the reversed one. Just make sure you stop at len(a) with islice.
from itertools import islice, cycle
iters = cycle((iter(a), reversed(a)))
b = [next(it) for it in islice(iters, len(a))]
>>> b
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
This can easily be put into a single line but then it becomes much more difficult to read:
[next(it) for it in islice(cycle((iter(a),reversed(a))),len(a))]
Putting it in one line would also prevent you from using the other half of the iterators if you wanted to:
>>> iters = cycle((iter(a), reversed(a)))
>>> [next(it) for it in islice(iters, len(a))]
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
>>> [next(it) for it in islice(iters, len(a))]
[5, 4, 6, 3, 7, 2, 8, 1, 9, 0]
A very nice one-liner in Python 2.7:
results = list(sum(zip(a, reversed(a))[:len(a)/2], ()))
# [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
First you zip the list with its reverse, take half that list, sum the tuples to form one tuple, and then convert to list.
In Python 3, zip returns an iterator rather than a list, so you have to use islice from itertools:
from itertools import islice
results = list(sum(islice(zip(a, reversed(a)),0,int(len(a)/2)),()))
Edit: This only works for even-length lists; for odd lengths the middle element is omitted :( Changing int(len(a)/2) to int(len(a)/2) + 1 gives you a duplicated middle value instead, so be warned.
Use the right toolz.
from toolz import interleave, take
b = list(take(len(a), interleave((a, reversed(a)))))
First, I tried something similar to Raymond Hettinger's solution with itertools (Python 3).
from itertools import chain, islice
interleaved = chain.from_iterable(zip(a, reversed(a)))
b = list(islice(interleaved, len(a)))
If you don’t mind sacrificing the source list, a, you can just pop back and forth:
b = [a.pop(-1 if i % 2 else 0) for i in range(len(a))]
Edit:
b = [a.pop(-bool(i % 2)) for i in range(len(a))]
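Note that a.pop(0) has to shift every remaining element, so the pop-based version above does quadratic work on long lists. A collections.deque supports O(1) pops from both ends; here is a minimal sketch (alternate_ends is an illustrative name) that also copies the input, so the source list is not sacrificed:

```python
from collections import deque


def alternate_ends(seq):
    """Return the items of seq taken alternately from the front and the back."""
    d = deque(seq)  # a copy, so the caller's list is left untouched
    out = []
    take_front = True
    while d:
        # popleft() and pop() are both O(1) on a deque
        out.append(d.popleft() if take_front else d.pop())
        take_front = not take_front
    return out


print(alternate_ends([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))  # [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
```

Odd lengths need no special case: the middle element is simply the last one left in the deque.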
Not terribly different from some of the other answers, but it avoids a conditional expression for determining the sign of the index.
a = range(10)
b = [a[i // (2*(-1)**(i&1))] for i in a]
i & 1 alternates between 0 and 1, so (-1)**(i&1) alternates between 1 and -1. This causes the index divisor to alternate between 2 and -2, which makes the index alternate from end to end as i increases. The sequence is a[0], a[-1], a[1], a[-2], a[2], a[-3], etc.
(I iterate i over a since in this case each value of a is equal to its index. In general, iterate over range(len(a)).)
The basic principle behind your question is a so-called roundrobin algorithm. The itertools-documentation-page contains a possible implementation of it:
from itertools import cycle, islice
def roundrobin(*iterables):
    """This function is taken from the python documentation!
    roundrobin('ABC', 'D', 'EF') --> A D E B F C
    Recipe credited to George Sakkis"""
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)  # next instead of __next__ for py2
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))
So all you have to do is split your list into two sublists, one starting from the left end and one from the right end:
import math
mid = math.ceil(len(a)/2) # Just so that the next line doesn't need to calculate it twice
list(roundrobin(a[:mid], a[:mid-1:-1]))
# Gives you the desired result: [0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
Alternatively, you could create a longer list (alternating items from the sequence read left to right and from the complete sequence read right to left) and take only the relevant elements:
list(roundrobin(a, reversed(a)))[:len(a)]
Or use it as an explicit generator with next:
rr = roundrobin(a, reversed(a))
[next(rr) for _ in range(len(a))]
Or the speedy variant suggested by @Tadhg McDonald-Jensen (thank you!):
list(islice(roundrobin(a,reversed(a)),len(a)))
Not sure whether this can be written more compactly, but it is memory-efficient, as it only uses iterators/generators (it does still generate 2*len(a) items before the filter keeps the first len(a)):
a = [0,1,2,3,4,5,6,7,8,9]
iter1 = iter(a)
iter2 = reversed(a)
b = [item for n, item in enumerate(
    next(it) for _ in a for it in (iter1, iter2)  # "it" avoids shadowing the built-in iter
) if n < len(a)]
For fun, here is an itertools variant (Python 2, where izip is the lazy zip):
>>> from itertools import chain, islice, izip
>>> a = [0,1,2,3,4,5,6,7,8,9]
>>> list(chain.from_iterable(izip(islice(a, len(a)//2), reversed(a))))
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]
This works when len(a) is even. Odd-length input would need special handling.
Enjoy!
Not at all elegant, just a clumsy one-liner:
a = range(10)
[val for pair in zip(a[:len(a)//2],a[-1:(len(a)//2-1):-1]) for val in pair]
Note that it assumes a list of even length; with an odd length it drops the middle element. I got some of the idea from here.
Two versions not seen yet:
b = list(sum(zip(a, a[::-1]), ())[:len(a)])
and
import itertools as it
b = [a[j] for j in it.accumulate(i*(-1)**i for i in range(len(a)))]
mylist = [0,1,2,3,4,5,6,7,8,9]
result = []
for i in mylist:
    result += [i, mylist.pop()]
Note:
Beware: as @Tadhg McDonald-Jensen pointed out in a comment, this destroys half of the original list object.
It also only works for even-length lists: with an odd length, the middle element appears twice.
One way to do this for even-sized lists (inspired by this post):
a = range(10)
b = [val for pair in zip(a[:5], a[5:][::-1]) for val in pair]
I would do something like this
a = [0,1,2,3,4,5,6,7,8,9]
b = []
i = 0
j = len(a) - 1
mid = (i + j) // 2
while i <= j:
    if i == mid and len(a) % 2 == 1:
        b.append(a[i])
        break
    b.extend([a[i], a[j]])
    i = i + 1
    j = j - 1
print(b)
You can partition the list into two parts about the middle, reverse the second half and zip the two partitions, like so:
a = [0,1,2,3,4,5,6,7,8,9]
mid = len(a)//2
l = []
for x, y in zip(a[:mid], a[:mid-1:-1]):
    l.append(x)
    l.append(y)
# if the length is odd
if len(a) % 2 == 1:
    l.append(a[mid])
print(l)
Output:
[0, 9, 1, 8, 2, 7, 3, 6, 4, 5]

Picking the most common element from a bunch of lists

I have a list l of lists [l1, ..., ln] of equal length
I want to compare l1[k], l2[k], ..., ln[k] for every k in range(len(l1)) and build another list l0 by picking the element that appears most frequently at each position.
So, if l1 = [1, 2, 3], l2 = [1, 4, 4] and l3 = [0, 2, 4], then l0 = [1, 2, 4]. If there is a tie, I will look at the lists that make up the tie and choose the value from the list with the highest priority. Priorities are given a priori: each list is assigned a priority.
For example, suppose value 1 appears in lists l1 and l3, value 2 in lists l2 and l4, and value 3 in l5, and the priority order is l5 > l2 > l3 > l1 > l4. Then I will pick 2: both 1 and 2 occur most often (twice each), and 2 appears in l2, whose priority is higher than that of l1 and l3.
How do I do this in python without creating a for loop with lots of if/else conditions?
You can use the Counter class from the collections module. Using the map function will reduce your explicit looping. You will need an if/else statement only for the case where there is no single most frequent value:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    if len(counts) == 1 or counts[0][1] > counts[1][1]:  # is there a single most common value?
        list0.append(counts[0][0])  # takes the value with the highest count
    else:
        list0.append(k_vals[0])  # tie: takes the element from the first list
list0 is the answer you are looking for. I just hate using l because it's easy to confuse with the number 1
Edit (based on comments):
Incorporating your comments, use a while loop instead of the if/else statement (this assumes your_lists is ordered from highest to lowest priority):
i = len(k_vals) - 1
while len(counts) > 1 and counts[0][1] == counts[1][1]:
    counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
    i -= 1  # go back farther if there's still a tie
list0.append(counts[0][0])  # takes the value with the highest count once there's no tie
So the whole thing is now:
import collections
list0 = []
list_length = len(your_lists[0])
for k in range(list_length):
    k_vals = list(map(lambda x: x[k], your_lists))  # collect all values at position k
    counts = collections.Counter(k_vals).most_common()  # (value, count) tuples sorted by count
    i = len(k_vals) - 1
    while len(counts) > 1 and counts[0][1] == counts[1][1]:  # in case of a tie
        counts = collections.Counter(k_vals[:i]).most_common()  # ignore the lowest-priority element
        i -= 1  # go back farther if there's still a tie
    list0.append(counts[0][0])  # takes the value with the highest count
You throw in one more tiny loop, but on the bright side there are no if/else statements at all!
Just transpose the sublists and get the Counter.most_common element key from each group:
from collections import Counter
lists = [[1, 2, 3],[1, 4, 4],[0, 2, 4]]
print([Counter(sub).most_common(1)[0][0] for sub in zip(*lists)])
If they are individual lists just zip those:
l1, l2, l3 = [1, 2, 3], [1, 4, 4], [0, 2, 4]
print([Counter(sub).most_common(1)[0][0] for sub in zip(l1,l2,l3)])
I'm not sure that taking the first element of the group when there is a tie makes sense, since it may not be one of the tied values, but it is trivial to implement: get the two most common elements and check whether their counts are equal:
def most_cm(lists):
    for sub in zip(*lists):
        # get the two most frequent values
        comm = Counter(sub).most_common(2)
        # if their counts are equal, just return the element from l1
        yield comm[0][0] if len(comm) == 1 or comm[0][1] != comm[1][1] else sub[0]
We also need if len(comm) == 1 in case all the elements are the same or we will get an IndexError.
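If you mean taking the tied element that comes from the earlier list (i.e. l2 comes before l5), note that Counter.most_common() orders elements with equal counts in the order they were first encountered (Python 3.7+), so with the lists ordered by priority, Counter(sub).most_common(1)[0][0] already gives you that element.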
For a decent number of sublists:
In [61]: lis = [[randint(1,10000) for _ in range(10)] for _ in range(100000)]
In [62]: list(most_cm(lis))
Out[62]: [5856, 9104, 1245, 4304, 829, 8214, 9496, 9182, 8233, 7482]
In [63]: timeit list(most_cm(lis))
1 loops, best of 3: 249 ms per loop
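If the lists can be reordered from highest to lowest priority, the question's tie-break rule (pick the tied value that appears in the highest-priority list) can also be made explicit instead of relying on Counter's ordering. A minimal sketch; most_common_by_priority and its priority argument (one number per list, larger meaning more important) are illustrative names, not part of any library:

```python
from collections import Counter


def most_common_by_priority(lists, priority):
    """Pick, for each position k, the most frequent value among lists[i][k].

    priority[i] is a number for lists[i]; larger means higher priority.
    Ties are broken in favour of the value found in the highest-priority list.
    """
    # Sort the lists from highest to lowest priority; the key only looks at the
    # priority, so the lists themselves are never compared
    ordered = [lst for _, lst in sorted(zip(priority, lists),
                                        key=lambda pair: pair[0], reverse=True)]
    result = []
    for column in zip(*ordered):
        counts = Counter(column)
        best = max(counts.values())
        # column is in priority order, so the first value with the top count wins
        result.append(next(v for v in column if counts[v] == best))
    return result


# The tie example from the question: priority l5 > l2 > l3 > l1 > l4
l1, l2, l3, l4, l5 = [1], [2], [1], [2], [3]
print(most_common_by_priority([l1, l2, l3, l4, l5], [2, 4, 3, 1, 5]))  # [2]
```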
Solution is:
a = [1, 2, 3]
b = [1, 4, 4]
c = [0, 2, 4]
print [max(set(element), key=element.count) for element in zip(a, b, c)]
That's what you're looking for:
from collections import Counter
from operator import itemgetter
l0 = [max(Counter(li).items(), key=itemgetter(1))[0] for li in zip(*l)]
If you are OK taking any one of a set of elements that are tied as most common, and you can guarantee that you won't hit an empty list within your list of lists, then here is a way using Counter (so, from collections import Counter):
l = [[1, 0, 2, 3, 4, 7, 8],
     [2, 0, 2, 1, 0, 7, 1],
     [2, 0, 1, 4, 0, 1, 8]]
res = []
for k in range(len(l[0])):
    res.append(Counter(lst[k] for lst in l).most_common()[0][0])
Doing this in IPython and printing the result:
In [86]: res
Out[86]: [2, 0, 2, 1, 0, 7, 8]
Try this:
l1 = [1,2,3]
l2 = [1,4,4]
l3 = [0,2,4]
lists = [l1, l2, l3]
print [max(set(x), key=x.count) for x in zip(*lists)]
