Counting in one list based on another - python

I'd like to count the elements of x, in order from the first one, that also exist in y. So for:
x = [a,b,c,d,e,f,g,h]
y = [c,a,b,z,k,f,g,d,s,t]
I'd want a function that returns 4, as 'a', 'b', 'c', 'd' are in y but 'e' is not. y is random, but it never has any duplicates. x is constant and len(x) = 8.
x and y are both lists of strings.
That means for:
x = [a,b,c,d,e,f,g,h]
y = [c,a,k,z,k,f,g,d,s,t]
I'd like the function to return 1.
I've tried something with a nested loop:
i = 0
h = 0
for s in x:
    for t in y:
        if s == t:
            i = i + 1  # i is what I'm looking for in the end.
            h = 0
        elif h == 9:
            break
        else:
            h = h + 1
My idea was to count the delta from one 't' to the next 't', but I can't get it to work properly because I just can't wrap my head around the required math.
Thanks a lot for your suggestions already and please enjoy your day!

In my previous answer, the code would throw an error when all elements of x were in y - so, here is my revised code:
print(([value in y for value in x] + [False]).index(False))
It does the job, but it's really hard to read. Let's split it up (the comments explain what each line does):
# This is our new list. In the previous code, this was a tuple - I'll get into
# that later. Basically, for each element in x, it checks whether that value is in
# y, resulting in a new list of boolean values. (In the last code, I used the map
# function with a lambda, but this is definitely more readable).
# For example, in OP's example, this list would look like
# [True, True, True, True, False, True, True, False]
new_list = [value in y for value in x]
# This is the step lacking with the old code and why I changed to a list.
# This adds a last False value, which prevents the index function from throwing an
# error if it doesn't find a value in the list (it instead returns the index this
# last False value is at). I had to convert from a tuple because
# you cannot add to a tuple, but you can add to a list. I was using a tuple in the
# last code because it is supposed to be faster than a list.
new_list_with_overflow = (new_list + [False])
# This is the final result. The index function finds the first element that is equal
# to False, i.e. the index of the first element of x that is not in y.
result = new_list_with_overflow.index(False)
# Finally, print out the result.
print(result)
Hopefully this explains what that one line is doing!
Some more links for reading:
What's the difference between lists and tuples?
How do I concatenate two lists in Python?
Python Docs on List Comprehensions
Here is another (arguably less readable) code snippet:
print((*(value in y for value in x), False).index(False))
A benefit of this code is that it uses tuples, so it may be slightly faster than the previous code, with the drawback of being a bit harder to understand. It also requires Python 3.5 or later, because unpacking with * inside a tuple display was added by PEP 448. However, I'll leave this as an exercise for you to figure out! You might want to check out what the * does.
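For reference, here is what the * does in that snippet: inside a tuple display, *iterable unpacks every item of the iterable into the new tuple. This minimal sketch compares it with the more explicit tuple(...) + (False,) form; the variable names flags, unpacked and explicit are only for illustration:

```python
x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y = ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']

# A generator expression producing one boolean per element of x.
flags = (value in y for value in x)

# *flags unpacks the generator into the tuple, then False is appended.
unpacked = (*flags, False)

# The same tuple built explicitly, without star-unpacking.
explicit = tuple(value in y for value in x) + (False,)

print(unpacked == explicit)   # True
print(unpacked.index(False))  # 4
```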
EDIT: The original answer is below. It only works when at least one element of x is not in y; if every element of x is in y, .index(False) raises a ValueError. The solutions above are also more readable.
A "pythonic" one-liner:
print(tuple(map(lambda value: value in y, x)).index(False))
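A quick way to handle the failure mode mentioned in the EDIT: when every element of x is in y, the tuple contains no False and .index raises ValueError, which can be caught and turned into len(x). A minimal sketch; count_prefix is a hypothetical helper name, not part of the original answer:

```python
def count_prefix(x, y):
    """Count how many leading elements of x appear in y (hypothetical helper)."""
    flags = tuple(map(lambda value: value in y, x))
    try:
        return flags.index(False)
    except ValueError:
        # No False in flags: every element of x was found in y.
        return len(x)

x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print(count_prefix(x, ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']))  # 4
print(count_prefix(x, x + ['z']))  # 8
```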

Here's the function you need:
def counter(x, y):
    print("_" * 50)
    print("x: " + str(x))
    print("y: " + str(y))
    for i, letter in enumerate(x):
        if letter not in y:
            break
    else:
        # No break: every element of x is in y (or x is empty), so count them all.
        i = len(x)
    print("i: " + str(i))
    return i
counter(
["a","b","c","d","e","f","g","h"],
["c","a","b","z","k","f","g","d","s","t"]
)
counter(
["a","b","c","d","e","f","g","h"],
["a","b","z","k","f","g","d","s","t"]
)
counter(
["a","b","c","d","e","f","g","h"],
["c","a","b","z","k","f","g","d","s","t", "e"]
)
Output:
__________________________________________________
x: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y: ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']
i: 4
__________________________________________________
x: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y: ['a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']
i: 2
__________________________________________________
x: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
y: ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't', 'e']
i: 7

Using itertools.takewhile:
from itertools import takewhile
result = len(list(takewhile(lambda item : item in y, x)))
It takes every item of x, starting from the first one, until the condition lambda item: item in y is no longer satisfied; len() then counts how many items were taken.
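One possible refinement, sketched under the assumption that y may be large (for an 8-element x the gain is negligible): convert y to a set first so each `in` check is O(1) on average, and count with sum() instead of building an intermediate list. leading_matches is a hypothetical name:

```python
from itertools import takewhile

def leading_matches(x, y):
    """Count the leading elements of x that are present in y."""
    y_set = set(y)  # O(1) average membership tests
    # takewhile stops at the first element of x that is not in y_set;
    # sum(1 for ...) counts the yielded items without storing them.
    return sum(1 for _ in takewhile(lambda item: item in y_set, x))

x = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print(leading_matches(x, ['c', 'a', 'b', 'z', 'k', 'f', 'g', 'd', 's', 't']))  # 4
print(leading_matches(x, ['c', 'a', 'k', 'z', 'f', 'g', 'd', 's', 't']))  # 1
```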

Related

How to remove elements from a list that appear less than k = 2?

I am trying to keep elements of a list that appear at least twice, and remove the elements that appear less than twice.
For example, my list can look like:
letters = ['a', 'a', 'b', 'b', 'b', 'c']
I want to get a list of the letters that appear at least twice, i.e. the following list:
letters_appear_twice = ['a', 'b']
But since this is part of a bigger program, I don't know exactly what my lists look like, only that I want to keep the letters that are repeated at least twice. But for the sake of understanding, we can assume I know what the list looks like!
I have tried the following:
'''
letters = ['a', 'a', 'b', 'b', 'b', 'c']
for x in set(letters):
    if letters.count(x) > 2:
        while x in letters:
            letters.remove(x)
print(letters)
'''
But this doesn't quite work like I want it to...
Thank you in advance for any help!
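Two things stand out in the attempt above: the condition `> 2` selects letters that appear more than twice (here only 'b'), and the loop then removes those letters, which is the opposite of what is wanted. If you instead wanted to keep every occurrence of the letters that appear at least k times (one reading of the title, not the expected output stated above), a minimal sketch with collections.Counter is:

```python
from collections import Counter

letters = ['a', 'a', 'b', 'b', 'b', 'c']
k = 2

counts = Counter(letters)  # one pass: {'a': 2, 'b': 3, 'c': 1}
# Slice assignment replaces the contents of the existing list object,
# so any other reference to `letters` sees the filtered result.
letters[:] = [x for x in letters if counts[x] >= k]
print(letters)  # ['a', 'a', 'b', 'b', 'b']
```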
letters = ['a', 'a', 'b', 'b', 'b', 'c']
res = []
for x in set(letters):
    if letters.count(x) >= 2:
        res.append(x)
print(res)
Prints (the order may vary, since sets are unordered):
['b', 'a']
Using your code above, you can make a new list and append to it.
new_list = []
for x in set(letters):
    if letters.count(x) >= 2:
        new_list.append(x)
print(new_list)
Output (the order may vary):
['b', 'a']
It's easier to create a new list than to manipulate the source list:
def letters_more_or_equal_to_k(letters, k):
    result = []
    for x in set(letters):
        if letters.count(x) >= k:
            result.append(x)
    result.sort()
    return result
def main():
    letters = ['a', 'a', 'b', 'b', 'b', 'c']
    k = 2
    result = letters_more_or_equal_to_k(letters, k)
    print(result)  # prints ['a', 'b']
if __name__ == "__main__":
    main()
If you don't mind shuffling the values, here's one possible solution:
from collections import Counter
letters = ['a', 'a', 'b', 'b', 'b', 'c']
c = Counter(letters)
to_remove = {x for x, i in c.items() if i < 2}
result = list(set(letters) - to_remove)
print(result)
Output:
['a', 'b']
You can always sort later.
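This solution should be more efficient for lists with more than ~10 unique elements, because Counter counts every element in a single pass instead of calling letters.count() once per unique element.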
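If you want the distinct letters in order of first appearance rather than in set order, here is a minimal sketch that combines the same Counter with dict.fromkeys, which preserves insertion order on Python 3.7+. letters_at_least_k is a hypothetical name:

```python
from collections import Counter

def letters_at_least_k(letters, k):
    """Return each letter that occurs at least k times, in first-seen order."""
    counts = Counter(letters)
    # dict.fromkeys drops duplicates while keeping the first occurrence's position.
    return [x for x in dict.fromkeys(letters) if counts[x] >= k]

print(letters_at_least_k(['a', 'a', 'b', 'b', 'b', 'c'], 2))  # ['a', 'b']
print(letters_at_least_k(['c', 'b', 'b', 'c', 'a'], 2))  # ['c', 'b']
```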

Creating an irregular list of lists from a single list

I'm trying to create a list of lists from a single list. I'm able to do this if the lists in the new list of lists have the same number of elements; however, this will not always be the case.
As I said, the function below works when the lists in the list of lists all have the same number of elements.
I've tried using regular expressions to determine if an element matches a pattern, using
pattern2=re.compile(r'\d\d\d\d\d\d'), because the first value of each new list will always be 6 digits and it will be the only one that follows that format. However, I'm not sure of the syntax for getting it to stop at the next match and create another list.
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]
The code above works if every sublist contains the same number of elements.
Below is what I expect.
OldList=[111111,a,b,c,d,222222,a,b,c,333333,a,d,e,f]
DesiredList=[[111111,a,b,c,d],[222222,a,b,c],[333333,a,d,e,f]]
Many thanks indeed.
Cheers
There is likely a much more efficient way to do this (with fewer loops), but here is one approach: find the indexes of the breakpoints, then slice the list from index to index, appending None to the list of indexes so that the last slice captures the remaining items. If your 6-digit numbers are really strings, then you can drop the str() inside re.match().
import re
d = [111111,'a','b','c','d',222222,'a','b','c',333333,'a','d','e','f']
indexes = [i for i, x in enumerate(d) if re.match(r'\d{6}', str(x))]
groups = [d[s:e] for s, e in zip(indexes, indexes[1:] + [None])]
print(groups)
# [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
You can use a fold.
First, define a function to locate the start flag:
>>> def is_start_flag(v):
...     return len(v) == 6 and v.isdigit()
That will be useful if the flags are not exactly what you expected them to be, or to exclude some false positives, or even if you need a regex.
Then use functools.reduce:
>>> L = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
>>> import functools
>>> functools.reduce(lambda acc, x: acc+[[x]] if is_start_flag(x) else acc[:-1]+[acc[-1]+[x]], L, [])
[['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
If the next element x is a start flag, append a new list [x] to the accumulator. Otherwise, add the element to the current list, i.e. the last list of the accumulator.
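For comparison, here is a minimal single-pass sketch that avoids both the regex and the repeated list copying of the reduce version (acc[:-1] + [...] builds a new list on every step). split_on_flags and is_flag are hypothetical names; is_flag mirrors is_start_flag above but also accepts ints, which is an assumption about your data. Unlike the reduce version, items before the first flag form their own leading group instead of raising an error:

```python
def is_flag(v):
    """True for a 6-digit marker, given as an int or a string (assumption)."""
    s = str(v)
    return len(s) == 6 and s.isdigit()

def split_on_flags(items, is_start=is_flag):
    """Yield sublists, starting a new one at each item where is_start(item) is true."""
    current = []
    for item in items:
        if is_start(item) and current:
            yield current  # finish the previous group
            current = []
        current.append(item)
    if current:
        yield current  # the last group has no following flag

old = [111111, 'a', 'b', 'c', 'd', 222222, 'a', 'b', 'c', 333333, 'a', 'd', 'e', 'f']
print(list(split_on_flags(old)))
# [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
```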

Reverse a List by Swap Ends

I am trying to reverse a list's order by finding three bugs in this function. The function is supposed to swap the first and last elements of a list, the second and second-to-last elements, and so on. I believe I found two, but am having trouble fixing the line list[j] = y.
def reverse(list):
    """Reverses elements of a list."""
    for i in range(len(list)):
        j = len(list) - i
        x = list[i]
        y = list[j-1]
        list[i] = x
        list[j] = y
l = ['a', 'b', 'c', 'd', 'e']
reverse(l)
print(l)
Homework I suspect...
But - we all need a break from homework. By looping over the whole list you're reversing it twice.
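def reverse(list):
    """Reverses elements of a list."""
    # // keeps the range argument an int (len(list)/2 is a float in Python 3)
    for i in range(len(list)//2):
        j = i + 1
        x = list[i]
        y = list[-j]
        list[-j] = x
        list[i] = y
l = ['a', 'b', 'c', 'd', 'e']
reverse(l)  # works in place and returns None, so don't assign the result to l
print(l)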
resulting in
['e', 'd', 'c', 'b', 'a']
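To see why looping over the whole list undoes the work: every pair is swapped once on the way to the middle and swapped back on the way out. A minimal sketch (swap_all is a hypothetical name) that swaps over the full length and leaves the list unchanged:

```python
def swap_all(seq):
    """Swap seq[i] with its mirror position for EVERY i (the buggy full-length loop)."""
    n = len(seq)
    for i in range(n):
        j = n - 1 - i  # mirror index of i
        seq[i], seq[j] = seq[j], seq[i]

l = ['a', 'b', 'c', 'd', 'e']
swap_all(l)
print(l)  # ['a', 'b', 'c', 'd', 'e']: each pair was swapped twice
```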
You have a couple of problems. Your first problem is that you use list[j] = y instead of list[j-1] = x. You defined y correctly with j-1, but you should be setting list[j-1] to the other value, x. Another problem is that you are going from the beginning of the list all the way to the end. Once you get more than halfway through the list, you are undoing your work. You also don't need len(list)-i, because negative indexing lets you write -i-1 instead. Here is the updated code:
def reverse(seq):
    """Reverses elements of a list."""
    for i in range(len(seq)//2):
        x = seq[i]
        y = seq[-i-1]
        seq[i] = y
        seq[-i-1] = x
l = ['a', 'b', 'c', 'd', 'e']
reverse(l)
print(l)
Output:
['e', 'd', 'c', 'b', 'a']
You don't even need to define x and y. Instead, do this:
def reverse(seq):
    """Reverses elements of a list."""
    for i in range(len(seq)//2):
        # tuple assignment swaps the pair without temporary variables
        seq[i], seq[-i-1] = seq[-i-1], seq[i]
I also changed your naming. There's probably a better name than seq, but list is unacceptable because it conflicts with the built-in type.
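An equivalent way to express the "swap ends" idea is with two indexes that walk toward each other and stop when they meet, which avoids the half-length arithmetic. A minimal sketch; reverse_in_place is a hypothetical name:

```python
def reverse_in_place(seq):
    """Reverse seq in place by swapping the outermost pair and moving inward."""
    left, right = 0, len(seq) - 1
    while left < right:
        seq[left], seq[right] = seq[right], seq[left]
        left += 1
        right -= 1

l = ['a', 'b', 'c', 'd', 'e']
reverse_in_place(l)
print(l)  # ['e', 'd', 'c', 'b', 'a']
```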
Use this code:
l = ['a', 'b', 'c', 'd', 'e']
l=l[::-1]
print(l)
Why complicate such a simple construction? Or, if you don't want to do it that way, try the list method:
l.reverse()
which reverses the list in place. Python has a lot of functions ready to use.
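The built-in options differ in whether they create a new list. A short sketch: slicing returns a new reversed list, list.reverse() reverses in place and returns None, and reversed() returns an iterator over the current contents:

```python
original = ['a', 'b', 'c', 'd', 'e']

reversed_copy = original[::-1]  # new list; original is untouched
print(reversed_copy, original)  # ['e', 'd', 'c', 'b', 'a'] ['a', 'b', 'c', 'd', 'e']

result = original.reverse()  # in place; the method returns None
print(result, original)  # None ['e', 'd', 'c', 'b', 'a']

print(list(reversed(original)))  # ['a', 'b', 'c', 'd', 'e']
```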

Operate on a list in a pythonic way when output depends on other elements

I have a task requiring an operation on every element of a list, with the outcome of the operation depending on other elements in the list.
For example, I might like to concatenate a list of strings, starting a new concatenated string whenever an element starts with a particular character:
This code solves the problem:
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
concat = []
for element in x:
    if element.startswith('*'):
        concat.append(element)
    else:
        concat[len(concat) - 1] += element
resulting in:
concat
Out[16]: ['*abc', '*de', '*f', '*g']
But this seems horribly un-Pythonic. How should one operate on the elements of a list when the outcome of the operation depends on previous outcomes?
A few relevant excerpts from import this (the arbiter of what is Pythonic):
Simple is better than complex.
Readability counts.
Explicit is better than implicit.
I would just use code like this, and not worry about replacing the for loop with something "flatter".
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
partials = []
for element in x:
    if element.startswith('*'):
        partials.append([])
    partials[-1].append(element)
concat = list(map("".join, partials))  # list() because map is lazy in Python 3
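If you'd rather not build the intermediate list of lists, the same loop can be written as a generator that yields each joined group as soon as it is complete. A minimal sketch; join_groups and its marker parameter are hypothetical names, and raising ValueError for a leading unmarked string is an assumption (the original loop fails on that input too, with an IndexError):

```python
def join_groups(strings, marker='*'):
    """Yield concatenated runs of strings, starting a new run at each marker."""
    current = None
    for s in strings:
        if s.startswith(marker):
            if current is not None:
                yield current  # the previous run is complete
            current = s
        elif current is None:
            raise ValueError('first element must start with the marker')
        else:
            current += s
    if current is not None:
        yield current

x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
print(list(join_groups(x)))  # ['*abc', '*de', '*f', '*g']
```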
You could use a regex to accomplish this succinctly. This does, however, sort of circumvent your question regarding how to operate on dependent list elements. Credit to mbomb007 for improving the allowed-character handling.
import re
z = re.findall(r'\*[^*]+', "".join(x))
Outputs:
['*abc', '*de', '*f', '*g']
A small benchmark:
Donkey Kong's answer:
import timeit
setup = '''
import re
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = ['*a', 'b', 'c', '*d', 'e', '*f', '*g'] * 100
'''
print (min(timeit.Timer('re.findall("\*[^\*]+","".join(x))', setup=setup).repeat(7, 1000)))
print (min(timeit.Timer('re.findall("\*[^\*]+","".join(y))', setup=setup).repeat(7, 1000)))
Returns 0.00226416693456, and 0.06827958075, respectively.
Chepner's answer:
setup = '''
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = ['*a', 'b', 'c', '*d', 'e', '*f', '*g'] * 100
def chepner(x):
    partials = []
    for element in x:
        if element.startswith('*'):
            partials.append([])
        partials[-1].append(element)
    concat = map("".join, partials)
    return concat
'''
print (min(timeit.Timer('chepner(x)', setup=setup).repeat(7, 1000)))
print (min(timeit.Timer('chepner(y)', setup=setup).repeat(7, 1000)))
Returns 0.00456210269896 and 0.364635824689, respectively.
Saksham's answer
setup = '''
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = ['*a', 'b', 'c', '*d', 'e', '*f', '*g'] * 100
'''
print (min(timeit.Timer("['*'+item for item in ''.join(x).split('*') if item]", setup=setup).repeat(7, 1000)))
print (min(timeit.Timer("['*'+item for item in ''.join(y).split('*') if item]", setup=setup).repeat(7, 1000)))
Returns 0.00104848906006, and 0.0556093171512 respectively.
tl;dr: Saksham's is slightly faster than mine, and Chepner's is slower than both of ours.
How about this:
>>> x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
>>> ['*'+item for item in ''.join(x).split('*') if item]
['*abc', '*de', '*f', '*g']
"".join(x).split("*")
may be sufficient (note that it drops the '*' markers and yields a leading empty string). Of course, this may be a contrived example in your OP that is oversimplified, in which case this will not work.
I feel that this is very Pythonic:
# assumes no empty strings, or no spaces in strings
"".join(x).replace('*', ' *').split()
Here's a functional approach to it:
from functools import reduce
# assumes no empty strings
def reduction(l, it):
    if it[0] == '*':
        return l + [it]
    else:
        new_l, last = l[:-1], l[-1]
        return new_l + [last + it]
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
print(reduce(reduction, x, []))
['*abc', '*de', '*f', '*g']
If you are a fan of lambdas (not very Pythonic), you could get away with this:
# Don't do this, it's ugly and unreadable.
reduce(lambda l, it: l + [it] if it.startswith('*') else l[:-1] + [l[-1]+it], x, [])
This is awfully close to what itertools.groupby does, and in fact with a little helping of curry I can make it keep grouping until a "break" condition occurs, such as startswith('*').
from itertools import groupby
def new_group_when_true(pred):
    group_num = 0
    def group_for_elem(elem):
        nonlocal group_num
        if pred(elem):
            group_num += 1
        return group_num
    return group_for_elem
l = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
test = new_group_when_true(lambda elem: elem.startswith('*'))
grouped = [list(v) for k,v in groupby(l, test)]
Resulting in:
>>> print(grouped)
[['*a', 'b', 'c'], ['*d', 'e'], ['*f'], ['*g']]
The nonlocal keyword requires Python 3, of course. Another possibility would be to make a class, along the lines of the groupby "equivalent code" from the itertools docs.
I don't know that this is more Pythonic than your code, but I think the idea of going to the standard library to see if something almost fits your needs is a useful point.
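Another standard-library route that avoids nonlocal state: itertools.accumulate can turn the start markers into running group numbers, which groupby can then use as keys. A minimal sketch (any elements before the first '*' would simply form a group numbered 0):

```python
from itertools import accumulate, groupby

l = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']

# Running count of start markers: 1, 1, 1, 2, 2, 3, 4
group_ids = accumulate(int(s.startswith('*')) for s in l)

# Pair each string with its group id and group consecutive equal ids.
pairs = zip(group_ids, l)
joined = ["".join(s for _, s in run) for _, run in groupby(pairs, key=lambda p: p[0])]
print(joined)  # ['*abc', '*de', '*f', '*g']
```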

Ordered Sets Python 2.7

I have a list that I'm attempting to remove duplicate items from. I'm using Python 2.7.1, so I can simply use the set() function. However, this reorders my list, which in my particular case is unacceptable.
Below is a function I wrote that does this. However, I'm wondering if there's a better/faster way. Any comments on it would also be appreciated.
def ordered_set(list_):
    newlist = []
    lastitem = None
    for item in list_:
        if item != lastitem:
            newlist.append(item)
            lastitem = item
    return newlist
The above function assumes that none of the items will be None, and that the items are in order (i.e., equal items are adjacent, as in ['a', 'a', 'a', 'b', 'b', 'c', 'd']).
Given ['a', 'a', 'a', 'b', 'b', 'c', 'd'], the above function returns ['a', 'b', 'c', 'd'].
Another very fast method with set:
def remove_duplicates(lst):
    dset = set()
    # relies on the fact that dset.add() always returns None.
    return [item for item in lst
            if item not in dset and not dset.add(item)]
Use an OrderedDict:
from collections import OrderedDict
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
    d[x] = True
# prints a b c d
for x in d:
    print x,
print
Assuming the input sequence is unordered, here's an O(N) solution (in both space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
...     seen = set()
...     for i in s:
...         if i not in seen:
...             yield i
...             seen.add(i)
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
    # list() so that Python 3 also returns a list rather than a keys view
    return list(OrderedDict((item, None) for item in _list).keys())
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
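On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so the same idea works without OrderedDict (this does not apply to the Python 2.7 in the question, and the items must be hashable). A minimal sketch; dedupe_dict is a hypothetical name:

```python
def dedupe_dict(items):
    """Remove duplicates, keeping the first occurrence of each item (Python 3.7+)."""
    # dict.fromkeys maps every item to None; later duplicates reuse the existing key.
    return list(dict.fromkeys(items))

print(dedupe_dict(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
# ['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
```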
I think this is perfectly OK. You get O(n) performance, which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.
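Since the input in the question is already grouped, itertools.groupby expresses the same "keep one item per run" logic without the None sentinel, so it also works when None appears in the list. A minimal sketch (collapse_runs is a hypothetical name); it should run on both 2.7 and 3:

```python
from itertools import groupby

def collapse_runs(items):
    """Keep one item from each run of consecutive equal items."""
    # groupby yields a (key, group) pair for each run of equal neighbours.
    return [key for key, _ in groupby(items)]

print(collapse_runs(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
print(collapse_runs([None, None, 1, 1, 2]))  # [None, 1, 2]
```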
If your list isn't sorted, then your question doesn't make sense,
e.g. [1,2,1] could become [1,2] or [2,1].
If your list is large, you may want to write your result back into the same list using slice assignment (x[:] = ...), which updates the existing list object in place:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
For in-place deletion, see Remove items from a list while iterating or Remove items from a list while iterating without using extra memory in Python.
One trick you can use: if you know x is sorted, and you know x[i] == x[i+j], then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete these j values, you can just copy the values you want into a new list).
So while you can't beat n operations if every element is unique (i.e. len(set(x)) == len(x)), there is probably an algorithm that has n comparisons as its worst case but n/2 comparisons as its best case (or fewer than n/2 if you somehow know in advance that len(x)/len(set(x)) > 2 because of the data you've generated).
The optimal algorithm would probably use binary search to find the maximum j for each minimum i in a divide-and-conquer approach. Initial divisions would probably be of length len(x)/approximated(len(set(x))). Hopefully it could be carried out such that even if len(x) == len(set(x)), it still uses only n operations.
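One concrete way to get the "jump to the end of each run" behaviour for a sorted list is bisect.bisect_right, which binary-searches for the first index past the current value. This is a sketch of the idea, not a tuned implementation: it needs the list to be fully sorted (not just grouped), and it costs O(u log n) comparisons for u unique values, so it only pays off when runs are long. unique_sorted is a hypothetical name:

```python
from bisect import bisect_right

def unique_sorted(x):
    """Return the distinct values of a sorted list, skipping each run by binary search."""
    result = []
    i = 0
    while i < len(x):
        result.append(x[i])
        # First index whose value is greater than x[i], searching only from i onward.
        i = bisect_right(x, x[i], i)
    return result

print(unique_sorted(['a', 'a', 'a', 'b', 'b', 'c', 'd']))  # ['a', 'b', 'c', 'd']
```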
There is a unique_everseen solution described in
http://docs.python.org/2/library/itertools.html
from itertools import ifilterfalse  # Python 2 name; Python 3 calls it filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
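On Python 3, ifilterfalse is available as itertools.filterfalse. A minimal adaptation of the recipe for Python 3, with usage examples taken from the recipe's own comments:

```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    """List unique elements, preserving order. Remember all elements ever seen."""
    seen = set()
    if key is None:
        # filterfalse lazily drops elements for which seen.__contains__ is true.
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element

print(list(unique_everseen('AAAABBBCCDAABBB')))  # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'D']
```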
Looks OK to me. If you really want to use sets, do something like this:
def ordered_set(_list):
    result = set()
    lastitem = None
    for item in _list:
        if item != lastitem:
            result.add(item)
            lastitem = item
    # sorted() only recovers the original order because the input is assumed sorted
    return sorted(tuple(result))
I don't know what performance you will get, so you should test it; probably about the same, because of the method-call overhead!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this (it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
