An algorithm to find transitions in Python - python

I want to implement an algorithm that gets the index of letter changes.
I have the below list, here I want to find the beginning of every letter changes and put a result list except the first one. Because, for the first one, we should get the last index of occurrence of it. Let me give you an example:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
Transitions:
'A','A','A','A','A','A','A','A','A','A','A','A'-->'B'-->'C','C'-->'X'-->'D'-->'X'-->'B','B'-->'A','A','A','A'
Here, after A letters finish, B starts, we should put the index of last A and the index of first B and so on, but we should not include X letter into the result list.
Desired result:
[(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
So far, I have done this code, this finds other items except the (11, 'A'). How can I modify my code to get the desired result?
for i in range(len(letters)):
if letters[i]!='X' and letters[i]!=letters[i-1]:
result.append((i,(letters[i])))
My result:
[(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')] ---> missing (11, 'A').

Now that you've explained you want the first index of every letter after the first, here's a one-liner:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
[(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Now, your first entry is different. For this, you need to use a recipe which finds the last index of each item:
import itertools
grouped = [(len(list(g))-1,k) for k,g in (itertools.groupby(letters))]
weird_transitions = [grouped[0]] + [(n+1, b) for (n, (a,b)) in enumerate(zip(letters,letters[1:])) if a!=b and b!='X']
#=> [(11, 'A'), (12, 'B'), (13, 'C'), (16, 'D'), (18, 'B'), (20, 'A')]
Of course, you could avoid creating the whole list of grouped, because you only ever use the first item from groupby. I leave that as an exercise for the reader.
This will also give you an X as the first item, if X is the first (set of) items. Because you say nothing about what you're doing, or why the Xs are there, but omitted, I can't figure out if that's the right behaviour or not. If it's not, then probably use my entire other recipe (in my other answer), and then take the first item from that.

Your question is a bit confusing, but this code should do what you want.
firstChangeFound = False
for i in range(len(letters)):
if letters[i]!='X' and letters[i]!=letters[i-1]:
if not firstChangeFound:
result.append((i-1, letters[i-1])) #Grab the last occurrence of the first character
result.append((i, letters[i]))
firstChangeFound = True
else:
result.append((i, letters[i]))

You want (Or, you don't, as you finally explained - see my other answer):
import itertools
import functional # get it from pypi
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
grouped = [(len(list(g)),k) for k,g in (itertools.groupby(letters))]
#=> [(12, 'A'), (1, 'B'), (2, 'C'), (1, 'D'), (2, 'B'), (4, 'A')]
#-1 to take this from counts to indices
filter(lambda (a,b): b!='X',functional.scanl(lambda (a,b),(c,d): (a+c,d), (-1,'X'), grouped))
#=> [(11, 'A'), (12, 'B'), (14, 'C'), (16, 'D'), (19, 'B'), (23, 'A')]
This gives you the last index of each letter run, other than Xs. If you want the first index after the relevant letter, then switch the -1 to 0.
scanl is a reduce which returns intermediate results.
As a general rule, it makes sense to either filter first or last, unless that is for some reason expensive, or the filtering can easily be accomplished without increasing complexity.
Also, your code is relatively hard to read and understand, because you iterate by index. That's unusual in python, unless manipulating the index numerically. If you're visiting every item, it's usual to iterate directly.
Also, why do you want this particular format? It's usual to have the format as (unique item,data) because that can easily be placed in a dict.

With minimal change to your code, and following Josh Caswell's suggestion:
for i, letter in enumerate(letters[1:], 1):
if letter != 'X' and letters[i] != letters[i-1]:
result.append((i, letter))
first_change = result[0][0]
first_stretch = ''.join(letters[:first_change]).rstrip('X')
if first_stretch:
result.insert(0, (len(first_stretch) - 1, first_stretch[-1]))

Here's a solution which uses groupby to generate a single sequence from which both first and last indices can be extracted.
import itertools
import functools
letters = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'C', 'X', 'D', 'X', 'B', 'B', 'A', 'A', 'A', 'A']
groupbysecond = functools.partial(itertools.groupby,key=operator.itemgetter(1))
def transitions(letters):
#segregate transition and non-transition indices
grouped = groupbysecond(enumerate(zip(letters,letters[1:])))
# extract first such entry from each group
firsts = (next(l) for k,l in grouped)
# group those entries together - where multiple, there are first and last
# indices of the run of letters
regrouped = groupbysecond((n,a) for n,(a,b) in firsts)
# special case for first entry, which wants last index of first letter
kfirst,lfirst = next(regrouped)
firstitem = (tuple(lfirst)[-1],) if kfirst != 'X' else ()
#return first item, and first index for all other letters
return itertools.chain(firstitem,(next(l) for k,l in regrouped if k != 'X'))

letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
prev = letters[0]
result = []
for i in range(len(letters)):
if prev!=letters[i]:
result.append((i-1,prev))
if letters[i]!='X':
prev = letters[i]
else:
prev = letters[i+1]
result.append((len(letters)-1,letters[-1]))
print result
RESULTS IN: (Not OP's desired results, sorry I must have misunderstood. see JSutton's ans)
[(11,'A'), (12,'B'), (14,'C'), (16,'D'), (19,'B'), (23,'A')]
which is actually the index of the last instance of a letter before they change or the list ends.

With an aid of dictionary to keep running time linear in number of input, here is a solution:
letters=['A','A','A','A','A','A','A','A','A','A','A','A','B','C','C','X','D','X','B','B','A','A','A','A']
def f(letters):
result = []
added = {}
for i in range(len(letters)):
if (i+1 == len(letters)):
break
if letters[i+1]!='X' and letters[i+1]!=letters[i]:
if(i not in added and letters[i]!='X'):
result.append((i, letters[i]))
added[i] = letters[i]
if(i+1 not in added):
result.append((i+1, letters[i+1]))
added[i+1] = letters[i+1]
return result
Basically, my the solution always tries to add both indices where a change occurred. But the dictionary (which has constant time lookup tells us if we already added the element or not to exclude duplicates). This takes care of adding the first element. Otherwise you can use an if statement to indicate first round which will only run once. However, I argue that this solution has same running time. As long as you do not check if you added the element by looking up the list itself (since this is linear time lookup at worst), this will result in O(n^2) time which is bad!

Here's my suggestion. It has three steps.
Fist, find all the starting indexes for each run of letters.
Replace the index in the first non-X run with the index of the end of its run, which will be one less than the start of the following run.
Filter out all X runs.
The code:
def letter_runs(letters):
prev = None
results = []
for index, letter in enumerate(letters):
if letter != prev:
prev = letter
results.append((index, letter))
if results[0][1] != "X":
results[0] = (results[1][0]-1, results[0][1])
else: # if first run is "X" second must be something else!
results[1] = (results[2][0]-1, results[1][1])
return [(index, letter) for index, letter in results if letter != "X"]

Related

Printing Tuple's items [duplicate]

I stumbled across the following code:
for i, a in enumerate(attributes):
labels.append(Label(root, text = a, justify = LEFT).grid(sticky = W))
e = Entry(root)
e.grid(column=1, row=i)
entries.append(e)
entries[i].insert(INSERT,"text to insert")
I don't understand the i, a bit, and searching for information on for didn't yield any useful results. When I try and experiment with the code I get the error:
ValueError: need more than 1 value to unpack
Does anyone know what it does, or a more specific term associated with it that I can google to learn more?
You could google "tuple unpacking". This can be used in various places in Python. The simplest is in assignment:
>>> x = (1,2)
>>> a, b = x
>>> a
1
>>> b
2
In a for-loop it works similarly. If each element of the iterable is a tuple, then you can specify two variables, and each element in the loop will be unpacked to the two.
>>> x = [(1,2), (3,4), (5,6)]
>>> for item in x:
... print "A tuple", item
A tuple (1, 2)
A tuple (3, 4)
A tuple (5, 6)
>>> for a, b in x:
... print "First", a, "then", b
First 1 then 2
First 3 then 4
First 5 then 6
The enumerate function creates an iterable of tuples, so it can be used this way.
Enumerate basically gives you an index to work with in the for loop. So:
for i,a in enumerate([4, 5, 6, 7]):
print(i, ": ", a)
Would print:
0: 4
1: 5
2: 6
3: 7
Take this code as an example:
elements = ['a', 'b', 'c', 'd', 'e']
index = 0
for element in elements:
print element, index
index += 1
You loop over the list and store an index variable as well. enumerate() does the same thing, but more concisely:
elements = ['a', 'b', 'c', 'd', 'e']
for index, element in enumerate(elements):
print element, index
The index, element notation is required because enumerate returns a tuple ((1, 'a'), (2, 'b'), ...) that is unpacked into two different variables.
[i for i in enumerate(['a','b','c'])]
Result:
[(0, 'a'), (1, 'b'), (2, 'c')]
You can combine the for index,value approach with direct unpacking of tuples using ( ). This is useful where you want to set up several related values in your loop that can be expressed without an intermediate tuple variable or dictionary, e.g.
users = [
('alice', 'alice#example.com', 'dog'),
('bob', 'bob#example.com', 'cat'),
('fred', 'fred#example.com', 'parrot'),
]
for index, (name, addr, pet) in enumerate(users):
print(index, name, addr, pet)
prints
0 alice alice#example.com dog
1 bob bob#example.com cat
2 fred fred#example.com parrot
The enumerate function returns a generator object which, at each iteration, yields a tuple containing the index of the element (i), numbered starting from 0 by default, coupled with the element itself (a), and the for loop conveniently allows you to access both fields of those generated tuples and assign variable names to them.
Short answer, unpacking tuples from a list in a for loop works.
enumerate() creates a tuple using the current index and the entire current item, such as (0, ('bob', 3))
I created some test code to demonstrate this:
list = [('bob', 3), ('alice', 0), ('john', 5), ('chris', 4), ('alex', 2)]
print("Displaying Enumerated List")
for name, num in enumerate(list):
print("{0}: {1}".format(name, num))
print("Display Normal Iteration though List")
for name, num in list:
print("{0}: {1}".format(name, num))
The simplicity of Tuple unpacking is probably one of my favourite things about Python :D
let's get it through with an example:
list = [chips, drinks, and, some, coding]
i = 0
while i < len(list):
if i % 2 != 0:
print(i)
i+=1
output:[drinks,some]
now using EMUNERATE fuction:(precise)
list = [chips, drinks, and, coding]
for index,items in enumerate(list):
print(index,":",items)
OUTPUT: 0:drinks
1:chips
2:drinks
3:and
4:coding

List of tuples and mean calculation [duplicate]

This question already has answers here:
Finding the average of a list
(25 answers)
Closed 1 year ago.
I have the following list of tuples:
list = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0, 'd'), (2435, 'x')]
I would like to calculate the mean of the first component of all tuples. I did the following:
s = []
for i in range(len(list)):
a = list[0][i]
if a =! 0:
s.append(a)
else:
pass
mean = sum(s) / len(s)
and it works, but my question is whether there is any way to avoid using for loops? since I have a very large list of tuples and due to time calculation I need to find another way if that possible.
According to the above stated for loop method. How could I find the mean with regard to the wights? I mean, e.g. the last element in the list is (2435, 'x') and the number 2435 is very large in comparison to that one in (1, 'x') which is 1. Any ideas would be very appreciated. Thanks in advance.
The loop is unavoidable as you need to iterate over all the elements at least once as John describes.
However, you can use an iterator based approach to get rid of creating a list to save on space:
mean = sum(elt[0] for elt in lst)/len(lst)
Update: I realize you only need the mean of elements that are non-zero. You can modify your approach to not store the elements in the list.
total = 0
counts = 0
for elt in lst:
if elt[0]:
total += elt[0]
counts += 1
mean = total/counts
A pandas approach:
import pandas as pd
tuples = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0, 'd'), (2435, 'x')]
df = pd.DataFrame(tuples)
df[0][df[0]!=0].mean() #1001.2
Careful timing would be needed to see if this is any better than what you are currently doing. The actual mean calculation should be faster, but the gain could well be negated by the cost of conversion.
You do need a for loop, but you can use list comprehension to make it cleaner.
Also, python standard library has a very nice statistics module that you can use for the calculation of the mean.
As extra note, please, do not use list as a variable name, it can be confused with the type list.
from statistics import mean
mylist = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0,'d'), (2435, 'x')]
m = mean([item[0] for item in mylist if item[0] != 0])
print(m)
1001.2
In Python 2.7
items = [item[0] for item in mylist if item[0] != 0]
mean = sum(items)/len(items)
print(mean)
1001.2
Finish up by refactoring the list comprehension to show more meaningful variable names, for example items = [number for number, letter in mylist if number != 0]

Character count in Python

The task is given: need to get a word from user, then total characters in the word must be counted and displayed in sorted order (count must be descending and characters must be ascending -
i.e.,
if the user gives as "management"
then the output should be
**a 2
e 2
m 2
n 2
g 1
t 1**
this is the code i written for the task:
string=input().strip()
set1=set(string)
lis=[]
for i in set1:
lis.append(i)
lis.sort()
while len(lis)>0:
maxi=0
for i in lis:
if string.count(i)>maxi:
maxi=string.count(i)
for j in lis:
if string.count(j)==maxi:
print(j,maxi)
lis.remove(j)
this code gives me following output for string "management"
a 2
m 2
e 2
n 2
g 1
t 1
m & e are not sorted.
What is wrong with my code?
The issue with your code lies in that you're trying to remove an element from the list while you're still iterating over it. This can cause problems. Presently, you remove "a", whereupon "e" takes its spot - and the list advances to the next letter, "m". Thus, "e" is skipped 'till the next iteration.
Try separating your printing and your removal, and don't remove elements from a list you're currently iterating over - instead, try adding all other elements to a new list.
string=input().strip()
set1=set(string)
lis=[]
for i in set1:
lis.append(i)
lis.sort()
while len(lis)>0:
maxi=0
for i in lis:
if string.count(i)>maxi:
maxi=string.count(i)
for j in lis:
if string.count(j)==maxi:
print(j,maxi)
dupelis = lis
lis = []
for k in dupelis:
if string.count(k)!=maxi:
lis.append(k)
managementa 2e 2m 2n 2g 1t 1
Demo
The problem with your code is the assignment of the variable maxi and the two for loops. "e" wont come second because you are assigning maxi as "2" and string.count(i) will be less than maxi.
for i in lis:
if string.count(i)>maxi:
maxi=string.count(i)
for j in lis:
if string.count(j)==maxi:
print(j,maxi)
There are several ways of achieving what you are looking for. You can try the solutions as others have explained.
you can use a simple Counter for that
from collections import Counter
Counter("management")
Counter({'a': 2, 'e': 2, 'm': 2, 'n': 2, 'g': 1, 't': 1})
I'm not really sure what you are trying to achieve by adding a while loop and then two nested for loops inside it. But the same thing can be achieved by a single for loop.
for i in lis:
print(i, string.count(i))
With this the output will be:
a 2
e 2
g 1
m 2
n 2
t 1
As answered before, you can use a Counter to get the counts of characters, no need to make a set or list.
For sorting, you'd be well off using the inbuilt sorted function which accepts a function in the key parameter. Read more about sorting and lambda functions.
>>> from collections import Counter
>>> c = Counter('management')
>>> sorted(c.items())
[('a', 2), ('e', 2), ('g', 1), ('m', 2), ('n', 2), ('t', 1)]
>>> alpha_sorted = sorted(c.items())
>>> sorted(alpha_sorted, key=lambda x: x[1])
[('g', 1), ('t', 1), ('a', 2), ('e', 2), ('m', 2), ('n', 2)]
>>> sorted(alpha_sorted, key=lambda x: x[1], reverse=True) # Reverse ensures you get descending sort
[('a', 2), ('e', 2), ('m', 2), ('n', 2), ('g', 1), ('t', 1)]
The easiest way to count the characters is to use Counter, as suggested by some previous answers. After that, the trick is to come up with a measure that takes both the count and the character into account to achieve the sorting. I have the following:
from collections import Counter
c = Counter('management')
sc = sorted(c.items(),
key=lambda x: -1000 * x[1] + ord(x[0]))
for char, count in sc:
print(char, count)
c.items() gives a list of tuples (character, count). We can use sorted() to sort them.
The parameter key is the key. sorted() puts items with lower keys (i.e. keys with smaller values) first, so I have to make a big count have a small value.
I basically give a lot of negative weight (-1000) to the count (x[1]), then augment that with the ascii value of character (ord(x[0])). The result is a sorting order that takes into account the count first, the character second.
An underlying assumption is that ord(x[0]) never exceeds 1000, which should be true of English characters.

Longest common subsequence test case fails

I have implemented this algorithm for the Longest common subsequence, I have tested it for every possible test case, it works, but when I submit it to the online grader of the course, it says it failed on case 11, I can't think of any possible test case that would break it down. Can you help? It returns idx longest subsequence.
def lcs2(a, b):
idx = 0
for i in a:
if i not in b:
a.remove(i)
if len(a) <= len(b):
for i in a:
if i in b:
idx += 1; b = b[b.index(i)+1:]
else:
for i in b:
if i in a:
idx += 1; a = a[a.index(i)+1:]
return idx
I can hack you with a sample:
a = [1,2,3,4,5]
b = [2,3,4,1,5]
one correct solution is Dynamic Programming
What you seem to be searching for is not the longest common subsequence, but the length of this longest common subsequence. However, the wrong assumption is that the first of the two lists contains the start of this subsequence at an earlier index than the second list.
The answer supplied already gives an example of where this happens:
a = [1,2,3,4,5]
b = [2,3,4,1,5]
I will expand with:
c = [3,4,2,1,5]
You will see that:
import itertools as it
rets = [lcs2(x,y) for x,y in it.permutations([a,b,c],2)]
combis = [(x,y) for x,y in it.permutations(['a','b','c'],2)]
print(*zip(rets, combis), sep='\n')
#(2, ('a', 'b'))
#(2, ('a', 'c'))
#(4, ('b', 'a'))
#(3, ('b', 'c'))
#(3, ('c', 'a'))
#(4, ('c', 'b'))
In other words, the lcs2 function you defined is a-symmetrical and therefore not correct.

Finding key with minimum value in OrderedDict

I need to find the key whose value is the lowest in the ordered dictionary but only when it is True for the position in my_list.
from collections import OrderedDict
my_list = [True,False,False,True,True,False,False,False,]
my_lookup = OrderedDict([('a', 2), ('b', 9), ('c', 4), ('d', 7),
('e', 3), ('f', 0), ('g', -5), ('h', 9)])
I know how to do it with a for loop like
mins=[]
i=0
for item in my_lookup.items():
if my_list[i]:
mins.append(item)
i+=1
print min(mins,key=lambda x:x[1])[0]
prints
a
because a is lowest that is also True in my_list.
This works but it is long and I want to know how to do it with comprehensions or one line?
You can use itertools.compress with key to min being get method,
>>> from itertools import compress
>>> min(compress(my_lookup, my_list), key=my_lookup.get)
a
You can also combine generator expression and min:
>>> min(((value, key) for ((key, value), flag) in zip(my_lookup.iteritems(), my_list) if flag))[1]
'a'
Two-liner:
from itertools import izip
print min((lookup_key for (lookup_key, keep) in izip(my_lookup.iterkeys(),
my_list) if keep), key=my_lookup.get)
For what it's worth, a slight variation of the original code performs reasonably (on par with ovgolovin's approach) and is quite readable:
minimum = (None, float('inf'))
for i, item in enumerate(my_lookup.items()):
if not my_list[i]:
continue
if item[1] < minimum[1]:
minimum = item
return minimum[0]
Even the original code is only slightly slower than this, based on ovgolovin's ideone benchmark. Of course, Jared's solution is quite faster and quite shorter.

Categories