Counting elements in a list with given property values - python

I'm using Python 2.7. I have a list of objects that each include a property called vote that can have values of either yes or no, and I want to count the number of yes and no votes. One way to do this is:
list = [ {vote:'yes'}, {vote:'no'}, {vote:'yes'} ] #...
numYesVotes = len([x for x in list if x.vote=='yes'])
numNoVotes = len([x for x in list if x.vote=='no'])
but it seems horribly wasteful/inefficient to me to build these lists only to get their lengths.
First question: Am I right about that? Wouldn't it be a good bit more efficient to simply loop through the list once and increment counter variables? i.e.:
numYesVotes = numNoVotes = 0
for x in list:
    if x.vote == 'yes':
        numYesVotes += 1
    else:
        numNoVotes += 1
or is there something about list comprehensions that would make them more efficient here?
Second question: I'm just learning to use lambdas, and I have a feeling this is a perfect use case for one, but I can't quite figure out how to use one here. How might I do that?

See collections.Counter:
from collections import Counter
Counter(x.vote for x in mylst)
Edit:
Example:
yn = Counter("yes" if x%2 else "no" for x in range(10))
print(yn["yes"],yn["no"])
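As a minimal sketch of the Counter approach applied to data like the question's: the literal in the question is not valid Python as written, so the shape used here (each ballot is a dict with a "vote" key) and the names ballots and tally are assumptions for illustration.

```python
from collections import Counter

# Assumed data shape: each ballot is a dict with a "vote" key.
ballots = [{"vote": "yes"}, {"vote": "no"}, {"vote": "yes"}]

# One pass over the list; no intermediate lists are built.
tally = Counter(b["vote"] for b in ballots)

print(tally["yes"], tally["no"])  # 2 1
```

A Counter returns 0 for keys it has never seen, so asking for a vote value that never occurred does not raise a KeyError.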

Note that it is faster to do:
sum(1 for x in L if x.vote=='yes')
Than:
len([x for x in L if x.vote=='yes'])
As no list has to be created.
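A small runnable sketch of the generator form, assuming the items expose a vote attribute as in the question. The Ballot namedtuple is a hypothetical stand-in for the asker's objects.

```python
from collections import namedtuple

# Hypothetical record type standing in for the question's objects.
Ballot = namedtuple("Ballot", ["vote"])
L = [Ballot("yes"), Ballot("no"), Ballot("yes")]

# Generator expression: matches are counted as they are produced,
# so no temporary list is created.
num_yes = sum(1 for x in L if x.vote == "yes")

# Booleans are integers in Python (True == 1), so summing the
# comparison results directly gives the same kind of count.
num_no = sum(x.vote == "no" for x in L)

print(num_yes, num_no)  # 2 1
```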

The lambda form:
t = lambda lst: len([v for x in lst for k, v in x.items() if v == 'yes'])
#print (t(mylst))
Here mylst is the list whose yes or no votes you want to count. The lambda walks each dict's items() and counts the values equal to 'yes'.
Demo:
mylst = [ {"vote":'yes'}, {"vote":'no'}, {"vote":'yes'} ]
t =lambda lst: len([ v for x in lst for k,v in x.items() if v=='yes'])
print (t(mylst))
>>>
2
>>>

Related

How to delete an element of a list and save the original index of the deleted element?

I want to delete the elements of a list that are equal to a given value.
I can do it like this:
List =[1,2,3.....]
List = [x for x in List if x != 2]
How can I save the indexes of the deleted elements?
I want to use these indexes to delete elements from another list.
The simplest solution is to build a list of booleans marking which positions to keep, then use that mask to strip the elements from both of your lists. itertools provides a handy compress utility to apply the mask quickly:
from itertools import compress
tokeep = [x != 2 for x in List]
List = list(compress(List, tokeep))
otherlist = list(compress(otherlist, tokeep))
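A runnable sketch of the mask approach with sample data. The lists values and labels are made up for illustration, and the deleted line is an addition that recovers the original indices the question asks about.

```python
from itertools import compress

# Sample data (assumed for illustration).
values = [1, 2, 3, 2, 5]
labels = ["a", "b", "c", "d", "e"]

# One boolean per position: True means "keep this position".
tokeep = [x != 2 for x in values]

# Original indices of the elements that will be dropped.
deleted = [i for i, keep in enumerate(tokeep) if not keep]

values = list(compress(values, tokeep))
labels = list(compress(labels, tokeep))

print(deleted)  # [1, 3]
print(values)   # [1, 3, 5]
print(labels)   # ['a', 'c', 'e']
```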
Alternatively (and frankly more clearly) you can just use one loop to strip both inputs; listcomps are fun, but sometimes they're not the way to go.
newlist = []
newotherlist = []
for x, y in zip(List, otherlist):
    if x != 2:
        newlist.append(x)
        newotherlist.append(y)
which gets the same effect in a single pass. Even if it does feel less overtly clever, it's very clear, which is a good thing; brevity for the sake of brevity that creates complexity is not a win.
And now, to contradict that last paragraph, the amusingly overtly clever and brief solution to one-line this:
List, otherlist = map(list, zip(*[(x, y) for x, y in zip(List, otherlist) if x != 2]))
For the love of sanity, please don't actually use this, I just had to write it for funsies.
You can also leverage enumerate to delete the first matching element and keep its index:
for index, val in enumerate(List):
    if val == value:
        del List[index]
        break
print(index)
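The loop above stops at the first match. If every occurrence should go, one possible sketch is to collect the matching indices first and then delete from the end, so that earlier indices stay valid. The helper name delete_all and the sample lists are hypothetical.

```python
def delete_all(items, others, value):
    """Delete every occurrence of value from items, and the same
    positions from others. Returns the original indices deleted."""
    indices = [i for i, v in enumerate(items) if v == value]
    # Delete from the end so that earlier indices remain valid.
    for i in reversed(indices):
        del items[i]
        del others[i]
    return indices

nums = [1, 2, 3, 2]
names = ["a", "b", "c", "d"]
removed = delete_all(nums, names, 2)

print(removed, nums, names)  # [1, 3] [1, 3] ['a', 'c']
```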
Based on documentation
list_first = ['d', 'a']
list_second = ['x', 'z']
def remove_from_lists(element):
    index_deleted = list_first.index(element)
    list_first.remove(element)
    list_second.pop(index_deleted)
remove_from_lists('d')

Count the number of elements that return 1 in a list in one liner

I have a really basic and simple question but for some reason I can't figure it out. I have the following code in python:
counter = 0
for el in mylist:
    if self.check_el(el):
        counter += 1
I want to write it in one line. Is that possible?
Yeah, here is one option:
counter = sum(1 for el in mylist if self.check_el(el))
counter = sum(int(self.check_el(el)) for el in mylist)
sum(map(lambda el: bool(self.check_el(el)), mylist))
Or if you know check_el always returns a bool:
sum(map(self.check_el, mylist))
I suggest
len(list(filter(self.check_el, mylist)))
It filters out all elements not fulfilling self.check_el, converts the filter object into a list, and then takes the length of said list. If you want to iterate over the elements, filter(self.check_el, mylist) is iterable.
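The one-liners above can be compared side by side in a short sketch. The module-level check_el below is a hypothetical predicate standing in for self.check_el.

```python
def check_el(el):
    # Hypothetical predicate standing in for self.check_el.
    return el % 2 == 0

mylist = [1, 2, 3, 4, 6]

# Count matches with a generator expression.
count_gen = sum(1 for el in mylist if check_el(el))

# Sum the results directly; relies on check_el returning a bool.
count_map = sum(map(check_el, mylist))

# Materialize the matches, then take the length.
count_filter = len(list(filter(check_el, mylist)))

print(count_gen, count_map, count_filter)  # 3 3 3
```

The first two forms avoid building a list of the matching elements; the third builds one and discards it after measuring its length.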

Intersection in sets

I have my_dict with sets as values and I have x which is also a set.
I need to return a list of the keys in my_dict whose sets contain all the numbers in x. If a set in my_dict does not contain all the numbers in x, I do not want to return its key.
I want to use intersection (&) but it returns all the sets in my_dict.
my_dict = {1: {1,2,3,4,5},
           2: {1,2,3,7,8},
           3: {1,2,3,4}
           }
x = {1,2,5}
new_list = []
for i in my_dict:
    if my_dict[i] & x:
        new_list.append(i)
print(new_list)
Output:
[1, 2, 3]
I need to receive [1] instead of [1, 2, 3]
When the intersection equals x, all the values in x are present in the set from the dictionary.
for i in my_dict:
    if (my_dict[i] & x)==x:
        new_list.append(i)
print(new_list)
Edit: as suggested in the comments below you can also do
for i in my_dict:
    if x.issubset(my_dict[i]):
        new_list.append(i)
print(new_list)
I suggest you use the set.issuperset method, rather than using the & operator. Why combine several operators when a method exists to do exactly what you want?
new_list = []
for i in my_dict:
    if my_dict[i].issuperset(x):
        new_list.append(i)
Note that I'd normally write this with a list comprehension:
newlist = [key for key, value in my_dict.items() if value.issuperset(x)]
The intersection between a my_dict value and x should be equal to x, which means x should be a subset of that my_dict value.
my_dict = {1: {1,2,3,4,5},
           2: {1,2,3,7,8},
           3: {1,2,3,4}}
x = {1,2,5}
new_list = []
for i,j in my_dict.items():
    if x.issubset(j):
        new_list.append(i)
print(new_list)
This can also be solved using the issubset method. Here's an example:
for i in my_dict:
    if x.issubset(my_dict[i]):
        new_list.append(i)
Output: [1]
In this example, we're checking whether the value of each key-value pair in the dictionary is a superset of x (in other words, x is a subset of my_dict[i]). If that is the case, we append the key to the desired list.
To check whether the entirety of a set is within another set, the nicest (in my opinion) way is to use the comparison operators, which are overridden for sets. The >= operator means "is a superset of" and is equivalent to the set.issuperset method, while <= means "is a subset of" and is equivalent to set.issubset. The strict forms > and < are also available to test for proper supersets and subsets, i.e. excluding equal sets.
Here's quite an idiomatic way of doing it:
new_list = []
for key, value in my_dict.items():
    if value >= x:
        new_list.append(key)
The problem with your original code is that it checks whether there is any intersection between the two sets, i.e. whether they share even just one element, when you seem to want to check whether all of x is in the set you're checking against.
I would also advise using a list comprehension if you want to simplify the code, unless you have other steps you also need to do.
new_list = [key for key, value in my_dict.items() if value >= x]
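To make the difference between the two tests concrete, here is a short sketch using the question's data. The names any_shared and all_present are introduced only for this illustration.

```python
my_dict = {1: {1, 2, 3, 4, 5}, 2: {1, 2, 3, 7, 8}, 3: {1, 2, 3, 4}}
x = {1, 2, 5}

# Truthiness of the intersection: true if ANY element is shared.
any_shared = [k for k, v in my_dict.items() if v & x]

# Superset test: true only if ALL elements of x are present.
all_present = [k for k, v in my_dict.items() if v >= x]

print(any_shared)   # [1, 2, 3]
print(all_present)  # [1]

# >= allows equality; > requires a proper (strict) superset.
print({1, 2, 5} >= x, {1, 2, 5} > x)  # True False
```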

Python, finding unique words in multiple lists

I have the following code:
a= ['hello','how','are','hello','you']
b= ['hello','how','you','today']
len_b=len(b)
for word in a:
    count=0
    while count < len_b:
        if word == b[count]:
            a.remove(word)
            break
        else:
            count=count+1
print a
The goal is that it basically outputs (contents of list a)-(contents of list b),
so the wanted result in this case would be a = ['are','hello'],
but when I run my code I get a = ['how','are','you'].
Can anybody either point out what is wrong with my implementation, or suggest a better way to solve this?
You can use a set to get all non-duplicate elements.
So you could do set(a) - set(b) for the difference of sets. Note that this discards duplicates, so it yields {'are'} rather than ['are', 'hello'].
The reason your code gives the wrong result is that you are mutating the list a while iterating over it.
If you want to solve it correctly, you can try the method below. It uses a dictionary to keep track of how many times each word should appear in the result, and a list comprehension to build the result:
>>> a = ['hello','how','are','hello','you']
>>> b = ['hello','how','you','today']
>>>
>>> cnt_a = {}
>>> for w in a:
...     cnt_a[w] = cnt_a.get(w, 0) + 1
...
>>> for w in b:
...     if w in cnt_a:
...         cnt_a[w] -= 1
...         if cnt_a[w] == 0:
...             del cnt_a[w]
...
>>> [y for k, v in cnt_a.items() for y in [k] * v]
['hello', 'are']
It works well when there are duplicates, even in the resulting list. It may not preserve the order, but it can easily be modified to do so if you want.
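One possible order-preserving variant, sketched here with collections.Counter instead of a hand-rolled dictionary. This is a swapped-in technique rather than the answerer's code, and it assumes that when some copies of a word are removed, the copies nearest the end of a are the ones to keep, which matches the wanted result in the question.

```python
from collections import Counter

a = ['hello', 'how', 'are', 'hello', 'you']
b = ['hello', 'how', 'you', 'today']

# Counter subtraction keeps only positive counts: hello -> 1, are -> 1.
remaining = Counter(a) - Counter(b)

# Walk a from the end so the later occurrences are the ones kept,
# then reverse to restore the original order.
result = []
for word in reversed(a):
    if remaining[word] > 0:
        result.append(word)
        remaining[word] -= 1
result.reverse()

print(result)  # ['are', 'hello']
```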
set(a+b) is alright, too, if you only need the unique words across both lists. Be aware that this is the union of the two lists, not the difference.

Removing an element from a list based on a predicate

I want to remove every element from a list that contains 'X' or 'N'. I have to apply this to a large genome. Here is an example:
input:
codon=['AAT','XAC','ANT','TTA']
expected output:
codon=['AAT','TTA']
For the basic case:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if "X" not in x and "N" not in x]
['AAT', 'TTA']
But if you have a huge amount of data, I suggest you use a dict or set.
And if you have many characters other than X and N, you can do it like this:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not any(ch for ch in list(x) if ch in ["X","N","Y","Z","K","J"])]
['AAT', 'TTA']
NOTE: list(x) can be just x, and ["X","N","Y","Z","K","J"] can be just "XNYZKJ". Also see gnibbler's answer, which is the best one.
Another way, not the fastest, but I think it reads nicely:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not any(y in x for y in "XN")]
['AAT', 'TTA']
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not set("XN")&set(x)]
['AAT', 'TTA']
This way will be faster for long codons (assuming there is some repetition)
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    if s not in memo:
        memo[s]=not any(y in s for y in "XN")
    return memo[s]
print filter(pred,codon)
Here is the method suggested by James Brooks, you'd have to test to see which is faster for your data
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    if s not in memo:
        memo[s]= not set("XN")&set(s)
    return memo[s]
print filter(pred,codon)
For this sample codon, the version using sets is about 10% slower
There is also the method of doing it using filter
lst = filter(lambda x: 'X' not in x and 'N' not in x, list)
filter(lambda x: 'N' not in x and 'X' not in x, your_list)
your_list = [x for x in your_list if 'N' not in x and 'X' not in x]
I like gnibbler’s memoization approach a lot. Either method using memoization should be identically fast in the big picture on large data sets, as the memo dictionary should quickly be filled and the actual test should be rarely performed. With this in mind, we should be able to improve the performance even more for large data sets. (This comes at some cost for very small ones, but who cares about those?) The following code only has to look up an item in the memo dict once when it is present, instead of twice (once to determine membership, another to extract the value).
codon = ['AAT', 'XAC', 'ANT', 'TTA']
def pred(s,memo={}):
    try:
        return memo[s]
    except KeyError:
        memo[s] = not any(y in s for y in "XN")
        return memo[s]
filtered = filter(pred, codon)
As I said, this should be noticeably faster when the genome is large (or at least not extremely small).
If you don’t want to duplicate the list, but just iterate over the filtered list, do something like:
for item in (item for item in codon if pred(item)):
    do_something(item)
If you're dealing with extremely large lists, you want to use methods that don't involve traversing the entire list any more than you absolutely need to.
Your best bet is likely to be creating a filter function, and using itertools.ifilter, e.g.:
new_seq = itertools.ifilter(lambda x: 'X' not in x and 'N' not in x, seq)
This defers actually testing every element in the list until you actually iterate over it. Note that you can filter a filtered sequence just as you can the original sequence:
new_seq1 = itertools.ifilter(some_other_predicate, new_seq)
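itertools.ifilter exists only in Python 2. In Python 3 the built-in filter already returns a lazy iterator, so the same idea can be sketched as below. The predicate is_clean and the second filter are hypothetical, and the sketch assumes the goal is to keep codons containing neither 'X' nor 'N'.

```python
seq = ['AAT', 'XAC', 'ANT', 'TTA']

def is_clean(codon):
    # Hypothetical predicate: keep codons with neither 'X' nor 'N'.
    return 'X' not in codon and 'N' not in codon

# In Python 3, filter() is lazy: nothing is tested until iteration starts.
clean = filter(is_clean, seq)

# A filtered iterator can itself be filtered again.
starts_with_a = filter(lambda c: c.startswith('A'), clean)

# Consuming the outer iterator drives both filters in a single pass.
result = list(starts_with_a)

print(result)  # ['AAT']
```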
Edit:
Also, a little testing shows that memoizing found entries in a set is likely to provide enough of an improvement to be worth doing, and using a regular expression is probably not the way to go:
>>> import re, timeit
>>> seq = ['AAT','XAC','ANT','TTA']
>>> p = re.compile('[XN]')
>>> timeit.timeit('[x for x in seq if not p.search(x)]', 'from __main__ import p, seq')
3.4722548536196314
>>> timeit.timeit('[x for x in seq if "X" not in x and "N" not in x]', 'from __main__ import seq')
1.0560532134670666
>>> s = set(('XAC', 'ANT'))
>>> timeit.timeit('[x for x in seq if x not in s]', 'from __main__ import s, seq')
0.87923730529996647
Any reason for duplicating the entire list? How about:
>>> def pred(item, haystack="XN"):
...     return any(needle in item for needle in haystack)
...
>>> lst = ['AAT', 'XAC', 'ANT', 'TTA']
>>> idx = 0
>>> while idx < len(lst):
...     if pred(lst[idx]):
...         del lst[idx]
...     else:
...         idx = idx + 1
...
>>> lst
['AAT', 'TTA']
I know that list comprehensions are all the rage these days, but if the list is long we don't want to duplicate it without any reason, right? You can take this to the next step and create a nice utility function:
>>> def remove_if(coll, predicate):
...     idx = len(coll) - 1
...     while idx >= 0:
...         if predicate(coll[idx]):
...             del coll[idx]
...         idx = idx - 1
...     return coll
...
>>> lst = ['AAT', 'XAC', 'ANT', 'TTA']
>>> remove_if(lst, pred)
['AAT', 'TTA']
>>> lst
['AAT', 'TTA']
As S.Mark requested here is my version. It's probably slower but does make it easier to change what gets removed.
def filter_genome(genome, killlist=set("X N".split())):
    return [codon for codon in genome if 0 == len(set(codon) & killlist)]
It is (asymptotically) faster to use a regular expression than to search the same string several times for individual characters: with a regular expression, the sequence is read at most once (instead of twice when the letters are not found, as in gnibbler's original answer, for instance). With gnibbler's memoization, the regular expression approach reads:
import re
remove = re.compile('[XN]').search
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    if s not in memo:
        memo[s]= not remove(s)
    return memo[s]
print filter(pred,codon)
This should be (asymptotically) faster than using the "in s" or the "set" checks (i.e., the code above should be faster for long enough strings s).
I originally thought that gnibbler's answer could be written in a faster and more compact way with dict.setdefault():
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    return memo.setdefault(s, not any(y in s for y in "XN"))
print filter(pred,codon)
However, as gnibbler noted, the value in setdefault is always evaluated (even though, in principle, it could be evaluated only when the dictionary key is not found).
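The eager evaluation can be observed directly. In this sketch, make_test is a hypothetical helper that wraps the codon test so that each call is logged; the setdefault version runs the test on every lookup, while the explicit membership check runs it once per distinct codon.

```python
def make_test(log):
    # Build a test function that records every invocation in log.
    def test(s):
        log.append(s)
        return not any(y in s for y in "XN")
    return test

codons = ['AAT', 'AAT', 'AAT']

# setdefault: the argument expression is evaluated before the call,
# so the test runs even when the key is already cached.
eager_log, memo = [], {}
eager_test = make_test(eager_log)
for s in codons:
    memo.setdefault(s, eager_test(s))

# Explicit check: the test runs only on a cache miss.
lazy_log, memo = [], {}
lazy_test = make_test(lazy_log)
for s in codons:
    if s not in memo:
        memo[s] = lazy_test(s)

print(len(eager_log), len(lazy_log))  # 3 1
```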
If you want to modify the actual list instead of creating a new one here is a simple set of functions that you can use:
from typing import TypeVar, Callable, List
T = TypeVar("T")
def list_remove_first(lst: List[T], accept: Callable[[T], bool]) -> None:
    for i, v in enumerate(lst):
        if accept(v):
            del lst[i]
            return
def list_remove_all(lst: List[T], accept: Callable[[T], bool]) -> None:
    for i in reversed(range(len(lst))):
        if accept(lst[i]):
            del lst[i]
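An alternative to explicit deletion loops, not taken from the answers above, is slice assignment. It does build a temporary filtered list, but it then replaces the contents of the existing list object, so other references to that list see the change. The name alias is introduced only to show this.

```python
codon = ['AAT', 'XAC', 'ANT', 'TTA']
alias = codon  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list in place,
# so every reference to it sees the filtered result.
codon[:] = [c for c in codon if 'X' not in c and 'N' not in c]

print(alias)           # ['AAT', 'TTA']
print(alias is codon)  # True
```

Whether this or a reverse deletion loop is preferable depends on how many elements are removed: each del shifts the tail of the list, while slice assignment rewrites the list once.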
