How can we count text 'changes' in a list? - python

How can we count text 'changes' in a list?
The list below has 0 'changes'. Everything is the same.
['Comcast', 'Comcast', 'Comcast', 'Comcast', 'Comcast', 'Comcast']
The list below has 3 changes. First change is Comcast>Sprint, second change is Sprint>AT&T and third change is AT&T>Comcast.
['Comcast', 'Comcast', 'Sprint', 'AT&T', 'Comcast', 'Comcast']
I Googled this before posting here. Finding unique items seems pretty easy. Finding changes, or switches, seems not so easy.

One option is to use itertools.groupby. This counts the number of "chunks" and subtract one (to get the "boundaries").
from itertools import groupby
lst = ['Comcast', 'Comcast', 'Sprint', 'AT&T', 'Comcast', 'Comcast']
output = sum(1 for _ in groupby(lst)) - 1
print(output) # 3

You want to compare elements pairwise, so you can create an iterator to pair up adjacent elements:
>>> l = [ 1, 1, 2, 3, 1, 1, 1, ]
>>> list(zip(l[:-1], l[1:]))
[(1, 1), (1, 2), (2, 3), (3, 1), (1, 1), (1, 1)]
Then iterate over them and test if they're pairwise equal:
>>> [x == y for (x, y) in zip(l[0:-1], l[1:])]
[True, False, False, False, True, True]
Then count them where they are not equal:
>>> sum(1 for (x, y) in zip(l[0:-1], l[1:]) if x != y)
3

Related

How to use next iterator within a list comprehension in python3 to get a list without any leading zeroes [duplicate]

This question already has answers here:
Problem removing leading zeros using a list comprehension best expression
(2 answers)
Closed 3 years ago.
Trying to remove all the leading zeroes from a list of array using next() and enumerate within a list comprehension. Came across the below code which works. Can anyone explain clearly what the code does.
example : result = [0,0,1,2,0,0,3] returns result = [1,2,0,0,3]
Edited* - the code just removes the leading zeroes
result = result[next((i for i, x in enumerate(result) if x != 0), len(result)):]
print(result)
Trying to remove all the leading zeroes from a list of array using
next() and enumerate within a list comprehension.
Are you obligated to use next(), enumerate() and a list comprehension? An alternate approach:
from itertools import dropwhile
from operator import not_ as is_zero
result = dropwhile(is_zero, [0, 0, 1, 2, 0, 0, 3])
print(*result)
OUTPUT
% python3 test.py
1 2 0 0 3
%
We can potentially explain the original code:
result = [0, 0, 1, 2, 0, 0, 3]
result[next((i for i, x in enumerate(result) if x != 0), len(result)):]
By breaking it down into pieces and executing them:
enumerate(result) # list of indexes and values [(i0, x0), (i1, x1), ...]
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 0), (5, 0), (6, 3)]
[i for i, x in enumerate(result)] # just the indexes
[i for i, x in [(0, 0), (1, 0), ..., (5, 0), (6, 3)]] # what effectively happens
[0, 1, 2, 3, 4, 5, 6]
[i for i, x in enumerate(result) if x != 0] # just the indexes of non-zero values
[2, 3, 6]
# not needed with this example input, used to make an all
# zero list like [0, 0, ..., 0] return the empty list []
len(result)
7
# pull off the first element of list of indexes of non-zero values
next((i for i, x in enumerate(result) if x != 0), len(result))
next(iter([2, 3, 6]), 7) # what effectively happens
2
result[next((i for i, x in enumerate(result) if x != 0), len(result)):] # slice
result[2:] # what effectively happens
[1, 2, 0, 0, 3]
So lets unpack the code from inside out.
(i for i, x in enumerate(result) if x != 0) is a generator for all indices of values that are not zero.
next((i for i, x in enumerate(result) if x != 0), len(result)) returns the first value of the generator (so the index of the first value that is not zero). len(result) is the default value, if the generator does not return any value. So we could also extract this result into a new variable.
index = next((i for i, x in enumerate(result) if x != 0), len(result))
result = result[index:]
The last step is a simple list comprehension and only takes values from the list with an index equals or higher than the given one.

Is There A Universal Selector Option For if...in Clauses?

I have a "large" list of tuples:
thelist=[(1,2),(1,3),(2,3)]
I want to check whether any tuple in the list starts with a 1, and if it does, print "aaa":
for i in thelist:
templist.append((i[0],i))
for i in templist:
if i[0]==1:
print("aaa")
break
Which is rather ardurous as I have to create the templist. Is there any way I can do this:
if (1,_) in thelist:
print("aaa")
Where _ is the universal selector. Note that the list would be very large and thus it is very costly to implement another list.
There isn't, although you can just use any
any(i[0] == 1 for i in thelist) --> Returns true if the first element is 1
If you don’t actually need the actual tuple, like you do in your example, then you can actually use tuple unpacking for exactly that purpose:
>>> the_list = [(1, 2), (1, 3), (2, 3)]
>>> for x, y in the_list:
if x == 1:
print('aaa')
break
aaa
If you add a * in front of the y, you can also unpack tuples of different sizes, collecting the remainder of the tuple:
>>> other_list = [(1, 2, 3, 4, 5), (1, 3), (2, 3)]
>>> for x, *y in other_list:
if x == 1:
print(y)
break
[2, 3, 4, 5]
Otherwise, if you just want to filter your list based on some premise and then do something on those filtered items, you can use filter with a custom function:
>>> def startsWithOne(x):
return x[0] == 1
>>> thelist = [(1, 2), (1, 3), (2, 3)]
>>> for x in filter(starts_with_one, the_list):
print(x)
(1, 2)
(1, 3)
This is probably the most flexible way which also avoids creating a separate list in memory, as the elements are filtered lazily when you interate the list with your loop.
Finally, if you just want to figure out if any of your items starts with a 1, like you do in your example code, then you could just do it like this:
>>> if any(filter(starts_with_one, the_list)):
print('aaa')
aaa
But I assume that this was just an oversimplified example.

How to isolate specific rows of a Cartesian Product - Python

I've created a cartesian product and printed from the following code:
A = [0, 3, 5]
B = [5, 10, 15]
C = product(A, B)
for n in C:
print(n)
And we have a result of
(0, 5)
(0, 10)
(0, 15)
(3, 5)
(3, 10)
(3, 15)
(5, 5)
(5, 10)
(5, 15)
Is it possible to check the sum of each set within the cartesian product to a target value of 10? Can we then return or set aside the sets that match?
Result should be:
(0, 10)
(5, 5)
Finally, how can I count the frequency of numbers in my resulting subset to show that 0 appeared once, 10 appeared once, and 5 appeared twice? I would appreciate any feedback or guidance.
You can use a list comprehension like this:
C = [p for p in product(A, B) if sum(p) == 10]
And to count the frequency of the numbers in the tuples, as #Amadan suggests, you can use collections.Counter after using itertools.chain.from_iterable to chain the sub-lists into one sequence:
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(C))
which returns, given your sample input:
Counter({5: 2, 0: 1, 10: 1})
which is an instance of a dict subclass, so you can iterate over its items for output:
for n, freq in Counter({5: 2, 0: 1, 10: 1}).items():
print('{}: {}'.format(n, freq))

Find overlapping elements in a list of tuples?

From my understanding of the intersection function, it finds complete overlap between elements in a list. For example:
tup_1 = [(1,2,3),(4,5,6)]
tup_2 = [(4,5,6)]
ol_tup = set(tup_1).intersection(tup_2)
print ol_tup
would yield:
set([(4, 5, 6)])
However, suppose my list of tuples are set up as this:
tup_1 = [(1,2,3),(4,5,5)]
tup_2 = [(4,5,6)]
Where there's an overlap in 2 elements of the 2nd tuple in tup_1 and 1st tuple in tup_2. If I want to python to return these 2 tuples: (4,5,5) and (4,5,6), is there an easier way than this nested for loop (below)?
for single_tuple_1 in tup_1:
for single_tuple_2 in tup_2:
if single_tuple_1[0] == single_tuple_2[0] and single_tuple_1[1] == single_tuple_2[1]:
print single_tuple_1,single_tuple_2
EDIT:
For this case, suppose order matters and suppose the tuples contain 5 elements:
tup_1 = [(1,2,3,4,5),(4,5,6,7,8),(11,12,13,14,15)]
tup_2 = [(1,2,3,4,8),(4,5,1,7,8),(11,12,13,14,-5)]
And I would like to find the tuples that intersect with each other in their respective first 4 elements. So the result should be:
[(1,2,3,4,5),(1,2,3,4,8),(11,12,13,14,15),(11,12,13,14,-5)]
How would the code change to accommodate this?
If you want to return all the pairs of "overlapping" tuples there's no way around comparing all the pairs, i.e. a quadratic algorithm. But you could make the code a bit more elegant using a list comprehension, product for the combinations and zip and sum for the comparison:
>>> tup_1 = [(1,2,3),(4,5,5),(7,8,9)]
>>> tup_2 = [(4,5,6),(0,5,5),(9,8,7)]
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
... if sum(1 for ai, bi in zip(a, b) if ai == bi) >= 2]
[((4, 5, 5), (4, 5, 6)), ((4, 5, 5), (0, 5, 5))]
Note: This checks whether two tuples have the same element in at least two positions, i.e. order matters. If order should not matter, you can convert a and b to set instead and check the size of their intersection, but that might fail for repeated numbers, i.e. the intersection of (1,1,2) and (1,1,3) would just be 1 instead of 2.
If you only want to match the first two, or first two and last two elements, you can compare slices of the tuples in an accordant disjunction:
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
... if a[:2] == b[:2]]
[((4, 5, 5), (4, 5, 6))]
>>> [(a, b) for (a, b) in itertools.product(tup_1, tup_2)
... if a[:2] == b[:2] or a[-2:] == b[-2:]]
[((4, 5, 5), (4, 5, 6)), ((4, 5, 5), (0, 5, 5))]
This is one way using a list comprehension. The logic as written checks for an overlap of at least 2 elements.
Note that if there is no overlap you will be left with the one element of tup_2, but that can be trivially identified.
from itertools import chain
tup_1 = [(1,2,3),(4,5,5)]
tup_2 = [(4,5,6)]
y = sorted(tup_2[0])
res = [i for i in chain(tup_1, tup_2) if
sum(i==j for i, j in zip(sorted(i), y)) > 1]
print res
[(4, 5, 5), (4, 5, 6)]

Using Python's list index() method on a list of tuples or objects?

Python's list type has an index() method that takes one parameter and returns the index of the first item in the list matching the parameter. For instance:
>>> some_list = ["apple", "pear", "banana", "grape"]
>>> some_list.index("pear")
1
>>> some_list.index("grape")
3
Is there a graceful (idiomatic) way to extend this to lists of complex objects, like tuples? Ideally, I'd like to be able to do something like this:
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> some_list.getIndexOfTuple(1, 7)
1
>>> some_list.getIndexOfTuple(0, "kumquat")
2
getIndexOfTuple() is just a hypothetical method that accepts a sub-index and a value, and then returns the index of the list item with the given value at that sub-index. I hope
Is there some way to achieve that general result, using list comprehensions or lambas or something "in-line" like that? I think I could write my own class and method, but I don't want to reinvent the wheel if Python already has a way to do it.
How about this?
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> [x for x, y in enumerate(tuple_list) if y[1] == 7]
[1]
>>> [x for x, y in enumerate(tuple_list) if y[0] == 'kumquat']
[2]
As pointed out in the comments, this would get all matches. To just get the first one, you can do:
>>> [y[0] for y in tuple_list].index('kumquat')
2
There is a good discussion in the comments as to the speed difference between all the solutions posted. I may be a little biased but I would personally stick to a one-liner as the speed we're talking about is pretty insignificant versus creating functions and importing modules for this problem, but if you are planning on doing this to a very large amount of elements you might want to look at the other answers provided, as they are faster than what I provided.
Those list comprehensions are messy after a while.
I like this Pythonic approach:
from operator import itemgetter
tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
def collect(l, index):
return map(itemgetter(index), l)
# And now you can write this:
collect(tuple_list,0).index("cherry") # = 1
collect(tuple_list,1).index("3") # = 2
If you need your code to be all super performant:
# Stops iterating through the list as soon as it finds the value
def getIndexOfTuple(l, index, value):
for pos,t in enumerate(l):
if t[index] == value:
return pos
# Matches behavior of list.index
raise ValueError("list.index(x): x not in list")
getIndexOfTuple(tuple_list, 0, "cherry") # = 1
One possibility is to use the itemgetter function from the operator module:
import operator
f = operator.itemgetter(0)
print map(f, tuple_list).index("cherry") # yields 1
The call to itemgetter returns a function that will do the equivalent of foo[0] for anything passed to it. Using map, you then apply that function to each tuple, extracting the info into a new list, on which you then call index as normal.
map(f, tuple_list)
is equivalent to:
[f(tuple_list[0]), f(tuple_list[1]), ...etc]
which in turn is equivalent to:
[tuple_list[0][0], tuple_list[1][0], tuple_list[2][0]]
which gives:
["pineapple", "cherry", ...etc]
You can do this with a list comprehension and index()
tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
[x[0] for x in tuple_list].index("kumquat")
2
[x[1] for x in tuple_list].index(7)
1
Inspired by this question, I found this quite elegant:
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> next(i for i, t in enumerate(tuple_list) if t[1] == 7)
1
>>> next(i for i, t in enumerate(tuple_list) if t[0] == "kumquat")
2
I would place this as a comment to Triptych, but I can't comment yet due to lack of rating:
Using the enumerator method to match on sub-indices in a list of tuples.
e.g.
li = [(1,2,3,4), (11,22,33,44), (111,222,333,444), ('a','b','c','d'),
('aa','bb','cc','dd'), ('aaa','bbb','ccc','ddd')]
# want pos of item having [22,44] in positions 1 and 3:
def getIndexOfTupleWithIndices(li, indices, vals):
# if index is a tuple of subindices to match against:
for pos,k in enumerate(li):
match = True
for i in indices:
if k[i] != vals[i]:
match = False
break;
if (match):
return pos
# Matches behavior of list.index
raise ValueError("list.index(x): x not in list")
idx = [1,3]
vals = [22,44]
print getIndexOfTupleWithIndices(li,idx,vals) # = 1
idx = [0,1]
vals = ['a','b']
print getIndexOfTupleWithIndices(li,idx,vals) # = 3
idx = [2,1]
vals = ['cc','bb']
print getIndexOfTupleWithIndices(li,idx,vals) # = 4
ok, it might be a mistake in vals(j), the correction is:
def getIndex(li,indices,vals):
for pos,k in enumerate(lista):
match = True
for i in indices:
if k[i] != vals[indices.index(i)]:
match = False
break
if(match):
return pos
z = list(zip(*tuple_list))
z[1][z[0].index('persimon')]
tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
def eachtuple(tupple, pos1, val):
for e in tupple:
if e == val:
return True
for e in tuple_list:
if eachtuple(e, 1, 7) is True:
print tuple_list.index(e)
for e in tuple_list:
if eachtuple(e, 0, "kumquat") is True:
print tuple_list.index(e)
Python's list.index(x) returns index of the first occurrence of x in the list. So we can pass objects returned by list compression to get their index.
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> [tuple_list.index(t) for t in tuple_list if t[1] == 7]
[1]
>>> [tuple_list.index(t) for t in tuple_list if t[0] == 'kumquat']
[2]
With the same line, we can also get the list of index in case there are multiple matched elements.
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11), ("banana", 7)]
>>> [tuple_list.index(t) for t in tuple_list if t[1] == 7]
[1, 4]
I guess the following is not the best way to do it (speed and elegance concerns) but well, it could help :
from collections import OrderedDict as od
t = [('pineapple', 5), ('cherry', 7), ('kumquat', 3), ('plum', 11)]
list(od(t).keys()).index('kumquat')
2
list(od(t).values()).index(7)
7
# bonus :
od(t)['kumquat']
3
list of tuples with 2 members can be converted to ordered dict directly, data structures are actually the same, so we can use dict method on the fly.
This is also possible using Lambda expressions:
l = [('rana', 1, 1), ('pato', 1, 1), ('perro', 1, 1)]
map(lambda x:x[0], l).index("pato") # returns 1
Edit to add examples:
l=[['rana', 1, 1], ['pato', 2, 1], ['perro', 1, 1], ['pato', 2, 2], ['pato', 2, 2]]
extract all items by condition:
filter(lambda x:x[0]=="pato", l) #[['pato', 2, 1], ['pato', 2, 2], ['pato', 2, 2]]
extract all items by condition with index:
>>> filter(lambda x:x[1][0]=="pato", enumerate(l))
[(1, ['pato', 2, 1]), (3, ['pato', 2, 2]), (4, ['pato', 2, 2])]
>>> map(lambda x:x[1],_)
[['pato', 2, 1], ['pato', 2, 2], ['pato', 2, 2]]
Note: The _ variable only works in the interactive interpreter. More generally, one must explicitly assign _, i.e. _=filter(lambda x:x[1][0]=="pato", enumerate(l)).
I came up with a quick and dirty approach using max and lambda.
>>> tuple_list = [("pineapple", 5), ("cherry", 7), ("kumquat", 3), ("plum", 11)]
>>> target = 7
>>> max(range(len(tuple_list)), key=lambda i: tuple_list[i][1] == target)
1
There is a caveat though that if the list does not contain the target, the returned index will be 0, which could be misleading.
>>> target = -1
>>> max(range(len(tuple_list)), key=lambda i: tuple_list[i][1] == target)
0

Categories