Uniqueify returning an empty list - python

I'm new to python and trying to make a function Uniqueify(L) that will be given either a list of numbers or a list of strings (non-empty), and will return a list of the unique elements of that list.
So far I have:
def Uniquefy(x):
    a = []
    for i in range(len(x)):
        if x[i] in a == False:
            a.append(x[i])
    return a
It looks like the if str(x[i]) in a == False: is failing, and that's causing the function to return an empty list.
Any help you guys can provide?

Relational operators all have exactly the same precedence and are chained. This means that this line:
if x[i] in a == False:
is evaluated as follows:
if (x[i] in a) and (a == False):
This is obviously not what you want.
The solution is to remove the second relational operator:
if x[i] not in a:
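To see the chaining for yourself, here is a small check that compares the three spellings. The names a and item are made up for the illustration and are not from the question.

```python
# Illustrative values only; `a` and `item` are not from the question.
a = []
item = 1

# Chained form: Python reads this as (item in a) and (a == False).
chained = item in a == False
expanded = (item in a) and (a == False)

# What the question intended: a plain membership test.
intended = item not in a

print(chained, expanded, intended)  # False False True
```

Because the chained form is always false here, the append never runs, which is why the function returns an empty list.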

You can just create a set based on the list which will only contain unique values:
>>> s = ["a", "b", "a"]
>>> print set(s)
set(['a', 'b'])

The best option here is to use a set instead! By definition, sets only contain unique items and putting the same item in twice will not result in two copies.
If you need to create it from a list and need a list back, try this. However, if there's not a specific reason you NEED a list, then just pass around a set instead (that would be the duck-typing way anyway).
def uniquefy(x):
    return list(set(x))

You can use the built in set type to get unique elements from a collection:
x = [1,2,3,3]
unique_elements = set(x)

You should use set() here. Membership tests with in take constant time on average for a set, versus linear time for a list:
def Uniquefy(x):
    a = set()
    for item in x:
        if item not in a:
            a.add(item)
    return list(a)
Or equivalently:
def Uniquefy(x):
    return list(set(x))

If order matters:
def uniquefy(x):
    s = set()
    return [i for i in x if i not in s and s.add(i) is None]
Else:
def uniquefy(x):
    return list(set(x))
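If order matters and you are on Python 3.7 or later, another option (not from the answers above, so treat it as a sketch) is to lean on the fact that dict keys are unique and keep insertion order. The function name uniquefy_ordered is made up here, and the elements must be hashable.

```python
def uniquefy_ordered(items):
    """Return the unique elements of `items`, keeping first-seen order."""
    # dict keys are unique and (Python 3.7+) remember insertion order.
    return list(dict.fromkeys(items))

print(uniquefy_ordered(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
print(uniquefy_ordered([3, 1, 3, 2, 1]))            # [3, 1, 2]
```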

Related

Using Filter Function to categorize list by specific index + Printing by specific index or list name

I am creating and manipulating a list in Python, I am having trouble categorizing my lists via filter function...
I have 3 lists, that I append into one list, I frequently print (the_list) along the way, here is my code:
list1 = ['Georgia', 10.5, 'Peach'];
list2 = ['Florida', 21.3, 'Sunshine'];
list3= ['Alabama', 4.9, 'Dixie'];
#List that includes list1, list2 and list3 called "the_list"
the_list = []
the_list.append(list1)
the_list.append(list2)
the_list.append(list3)
the_list
#insert new values into the list (these values represent state numbers)
list1.insert(3, 4)
list2.insert(3, 27)
list3.insert(3, 22)
#print the modified list
print (the_list)
#Organize the list from lowest to highest (based off numbers in index 1)
the_list.sort(key=lambda tup: tup[1])
print (the_list)
#filter states by category based off their population
#This is where I need help
#Small States
def lessThanTen(index):
    return index < 10
    the_list
filter(lessThanTen, the_list)
print (the_list)
#Big States
def greaterThanTen(index):
    return index > 10
    the_list
filter(greaterThanTen, the_list)
print (the_list)
print (the_list)
Is there a way to filter these lists into categories by a specific index number, in this case index [1] (Which is population), and subsequently output these list items by printing, either their list name or their value at index [0]...example 'Georgia' or "list1"
Python filter documentation:
filter(function, iterable) Construct an iterator from those elements
of iterable for which function returns true. iterable may be either a
sequence, a container which supports iteration, or an iterator. If
function is None, the identity function is assumed, that is, all
elements of iterable that are false are removed.
Note that filter(function, iterable) is equivalent to the generator
expression (item for item in iterable if function(item)) if function
is not None and (item for item in iterable if item) if function is
None.
It's unclear what you mean, but I'll try to help you the best I can.
First of all: your greaterThanTen function takes index as its parameter, or at least it looks that way. filter doesn't pass an index to greaterThanTen; it passes the element itself (here, a whole inner list).
Another thing: I don't know if you realize that filter only returns one 'category' as output, since you can only filter on one condition at a time. Also, filter doesn't modify the original list but creates a new sequence, so filter(greaterThanTen, the_list) on its own doesn't change anything. What you should do is assign the result: the_list = list(filter(greaterThanTen, the_list)) (in Python 3 filter returns an iterator, hence the list() call).
If you want to filter by the value at index 1 of each element in the list, you can do this:
filter(lambda element: yourFilterFunction(element[1]), the_list)
This is similar to the function you're using as a key to sort.
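Putting those points together, a minimal sketch might look like the following. The rows reuse the question's layout, while less_than_ten, small and big are names made up for the example.

```python
# Sample rows in the question's layout: [name, population, nickname].
the_list = [
    ["Alabama", 4.9, "Dixie"],
    ["Georgia", 10.5, "Peach"],
    ["Florida", 21.3, "Sunshine"],
]

def less_than_ten(value):
    return value < 10

# filter() hands each whole row to the callable, so pick out row[1] there.
# It returns a lazy iterator in Python 3; list() materializes the result.
small = list(filter(lambda row: less_than_ten(row[1]), the_list))
big = list(filter(lambda row: not less_than_ten(row[1]), the_list))

print([row[0] for row in small])  # ['Alabama']
print([row[0] for row in big])    # ['Georgia', 'Florida']
print(len(the_list))              # 3, the original list is unchanged
```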
One more thing: why is the_list written inside greaterThanTen after the return? It has no effect, because a function stops executing at the return statement.
Printing:
If you want to print a value from a specific index in a list just ask for that index.
print(LIST[index])
I hope this helps.
If you want to pass index as argument and maintain some flexibility of your list (you may have population at index other than 1), you can do this
def greaterThanTen(index):
    return lambda x: x[index] > 10
def lessThanTen(index):
    return lambda x: x[index] < 10
def myfilter(f, L):
    return [x[0] for x in filter(f, L)]
print(myfilter(greaterThanTen(1), the_list)) # -> ['Georgia', 'Florida']
print(myfilter(lessThanTen(1), the_list)) # -> ['Alabama']
Or more generically,
import operator
def index_vs_num(index):
    ops = {
        '>' : operator.gt,
        '<' : operator.lt,
        '>=': operator.ge,
        '<=': operator.le,
        '=' : operator.eq
    }
    return lambda relation: lambda num: lambda x: ops[relation](x[index], num)
greaterThanTwenty = index_vs_num(1)('>')(20)
# the 1st argument is the index of your population in the list
# the 2nd argument is the type of comparison
# the 3rd argument is the number to compare against
lessThanFive = index_vs_num(1)('<')(5)
def filter_by_outindex(*index):
    def filter_by_f(f, L):
        try:
            return [x[index[0]] for x in filter(f, L)]
        except IndexError:
            return list(filter(f, L))
    return filter_by_f
myfilter=filter_by_outindex(0)
print(myfilter(greaterThanTwenty, the_list)) # -> ['Florida']
print(myfilter(lessThanFive, the_list)) # -> ['Alabama']
I think this is what you actually want to achieve.
You could probably just do some sort of list comprehension and avoid filtering altogether.
the_final_list = [x for x in the_list if x[1] < 10]
This to me is simpler/more readable and accomplishes your objective.

python intersect of dict items

Suppose I have a dict like:
aDict[1] = '3,4,5,6,7,8'
aDict[5] = '5,6,7,8,9,10,11,12'
aDict[n] = '5,6,77,88'
The keys are arbitrary, and there could be any number of them. I want to consider every value in the dictionary.
I want to treat each string as comma-separated values, and find the intersection across the entire dictionary (the elements common to all dict values). So in this case the answer would be '5,6'. How can I do this?
from functools import reduce # if Python 3
reduce(lambda x, y: x.intersection(y), (set(x.split(',')) for x in aDict.values()))
First of all, you need to convert these to real lists.
l1 = '3,4,5,6,7,8'.split(',')
Then you can use sets to do the intersection.
result = set(l1) & set(l2) & set(l3)
Python sets are ideal for that task. Consider the following:
intersections = None
for value in aDict.values():
    temp = set([int(num) for num in value.split(",")])
    if intersections is None:
        intersections = temp
    else:
        intersections = intersections.intersection(temp)
print(intersections)
result = None
for csv_list in aDict.values():
    aList = csv_list.split(',')
    if result is None:
        result = set(aList)
    else:
        result = result & set(aList)
print(result)
Since set.intersection() accepts any number of sets, you can make do without any use of reduce():
set.intersection(*(set(v.split(",")) for v in aDict.values()))
Note that this version won't work for an empty aDict.
If you are using Python 3, and your dictionary values are bytes objects rather than strings, just split at b"," instead of ",".
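If the empty-dict case matters to you, one way to handle it is sketched below. The function name common_values and the key 9 (standing in for the question's n) are made up for the example, and sorting by int assumes every token is an integer.

```python
def common_values(a_dict):
    """Return the comma-separated values shared by every dict value."""
    sets = [set(value.split(",")) for value in a_dict.values()]
    if not sets:
        # set.intersection() with no arguments raises TypeError,
        # so the empty dictionary is handled before calling it.
        return ""
    shared = set.intersection(*sets)
    # Sort numerically so the output is deterministic (assumes integer tokens).
    return ",".join(sorted(shared, key=int))

a_dict = {1: "3,4,5,6,7,8", 5: "5,6,7,8,9,10,11,12", 9: "5,6,77,88"}
print(common_values(a_dict))    # 5,6
print(repr(common_values({})))  # ''
```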

How to check if a list is contained inside another list without a loop?

Are there any builtins to check if a list is contained inside another list without doing any loop?
I looked for that in dir(list) but found nothing useful.
Depends on what you mean by "contained". Maybe this:
if set(a) <= set(b):
    print("a is in b")
Assuming that you want to see if all elements of sublist are also elements of superlist:
all(x in superlist for x in sublist)
You might want to use a set
if set(a).issubset(b):
    print('a is contained in b')
The solution depends on what values you expect from your lists.
If there is the possibility of a repeated value, and you need to check that there are enough copies of it in the tested container, then here is a time-inefficient solution:
def contained(candidate, container):
    temp = container[:]
    try:
        for v in candidate:
            temp.remove(v)
        return True
    except ValueError:
        return False
test this function with:
>>> a = [1,1,2,3]
>>> b = [1,2,3,4,5]
>>> contained(a,b)
False
>>> a = [1,2,3]
>>> contained(a,b)
True
>>> a = [1,1,2,4,4]
>>> b = [1,1,2,2,2,3,4,4,5]
>>> contained(a,b)
True
Of course this solution can be greatly improved: list.remove() is potentially time-consuming and can be avoided using clever sorting and indexing. But I don't see how to avoid a loop here...
(Anyway, any other solution will be implemented using sets or list comprehensions, which use loops internally...)
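One possible improvement, offered as a sketch and not as the answer's own method, is to count occurrences with collections.Counter and avoid the repeated list.remove() calls. It still loops internally, and it assumes the elements are hashable.

```python
from collections import Counter

def contained(candidate, container):
    """True if `container` has at least as many of each value as `candidate`."""
    # Counter subtraction drops zero and negative counts, so an empty
    # result means every element of `candidate` was covered.
    return not (Counter(candidate) - Counter(container))

print(contained([1, 1, 2, 3], [1, 2, 3, 4, 5]))                 # False
print(contained([1, 2, 3], [1, 2, 3, 4, 5]))                    # True
print(contained([1, 1, 2, 4, 4], [1, 1, 2, 2, 2, 3, 4, 4, 5]))  # True
```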
If you want to validate that all the items from list2 are in list1, you can pass the following generator expression to all():
all(elem in list1 for elem in list2)
You can also replace list1 and list2 directly with the code that will return that list:
all([snack in ["banana", "apple", "lemon", "chocolate", "chips"] for snack in ["chips","chocolate"]])
That all + list comprehension can be translated into this loop for a better understanding of the code:
return_value = True
for snack in snacks:
    if snack not in groceries:
        return_value = False
        break

Removing an element from a list based on a predicate

I want to remove every element that contains 'X' or 'N' from a list. I have to apply this to a large genome. Here is an example:
input:
codon=['AAT','XAC','ANT','TTA']
expected output:
codon=['AAT','TTA']
For basic purposes:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if "X" not in x and "N" not in x]
['AAT', 'TTA']
But if you have a huge amount of data, I suggest you use a dict or set.
And if you have many characters other than X and N, you can do this:
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not any(ch for ch in list(x) if ch in ["X","N","Y","Z","K","J"])]
['AAT', 'TTA']
NOTE: list(x) can be just x, and ["X","N","Y","Z","K","J"] can be just "XNYZKJ". See also gnibbler's answer, which is the best one.
Another not fastest way but I think it reads nicely
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not any(y in x for y in "XN")]
['AAT', 'TTA']
>>> [x for x in ['AAT','XAC','ANT','TTA'] if not set("XN")&set(x)]
['AAT', 'TTA']
This way will be faster for long codons (assuming there is some repetition)
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    if s not in memo:
        memo[s]=not any(y in s for y in "XN")
    return memo[s]
print(list(filter(pred,codon)))
Here is the method suggested by James Brooks, you'd have to test to see which is faster for your data
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    if s not in memo:
        memo[s]= not set("XN")&set(s)
    return memo[s]
print(list(filter(pred,codon)))
For this sample codon, the version using sets is about 10% slower
There is also the method of doing it using filter
lst = filter(lambda x: 'X' not in x and 'N' not in x, list)
filter(lambda x: 'N' not in x and 'X' not in x, your_list)
your_list = [x for x in your_list if 'N' not in x and 'X' not in x]
I like gnibbler’s memoization approach a lot. Either method using memoization should be identically fast in the big picture on large data sets, as the memo dictionary should quickly be filled and the actual test should be rarely performed. With this in mind, we should be able to improve the performance even more for large data sets. (This comes at some cost for very small ones, but who cares about those?) The following code only has to look up an item in the memo dict once when it is present, instead of twice (once to determine membership, another to extract the value).
codon = ['AAT', 'XAC', 'ANT', 'TTA']
def pred(s,memo={}):
    try:
        return memo[s]
    except KeyError:
        memo[s] = not any(y in s for y in "XN")
        return memo[s]
filtered = filter(pred, codon)
As I said, this should be noticeably faster when the genome is large (or at least not extremely small).
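If you would rather not manage the memo dictionary by hand, the standard library's functools.lru_cache gives the same kind of memoization. This is a sketch of that alternative, not the code from the answers above.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pred(s):
    """True if codon `s` contains neither 'X' nor 'N'."""
    return not any(ch in s for ch in "XN")

codon = ["AAT", "XAC", "ANT", "TTA", "AAT"]
filtered = [c for c in codon if pred(c)]

print(filtered)                # ['AAT', 'TTA', 'AAT']
print(pred.cache_info().hits)  # 1
```

The single cache hit comes from the repeated 'AAT'; on a real genome with many repeated codons the hit count would dominate.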
If you don’t want to duplicate the list, but just iterate over the filtered list, do something like:
for item in (item for item in codon if pred(item)):
    do_something(item)
If you're dealing with extremely large lists, you want to use methods that don't involve traversing the entire list any more than you absolutely need to.
Your best bet is likely to be creating a filter function and using itertools.ifilter (in Python 3, the built-in filter is already lazy and ifilter no longer exists), e.g.:
new_seq = itertools.ifilter(lambda x: 'X' not in x and 'N' not in x, seq)
This defers actually testing every element in the list until you actually iterate over it. Note that you can filter a filtered sequence just as you can the original sequence:
new_seq1 = itertools.ifilter(some_other_predicate, new_seq)
Edit:
Also, a little testing shows that memoizing found entries in a set is likely to provide enough of an improvement to be worth doing, and using a regular expression is probably not the way to go:
>>> seq = ['AAT','XAC','ANT','TTA']
>>> p = re.compile('[XN]')
>>> timeit.timeit('[x for x in seq if not p.search(x)]', 'from __main__ import p, seq')
3.4722548536196314
>>> timeit.timeit('[x for x in seq if "X" not in x and "N" not in x]', 'from __main__ import seq')
1.0560532134670666
>>> s = set(('XAC', 'ANT'))
>>> timeit.timeit('[x for x in seq if x not in s]', 'from __main__ import s, seq')
0.87923730529996647
Any reason for duplicating the entire list? How about:
>>> def pred(item, haystack="XN"):
...     return any(needle in item for needle in haystack)
...
>>> lst = ['AAT', 'XAC', 'ANT', 'TTA']
>>> idx = 0
>>> while idx < len(lst):
...     if pred(lst[idx]):
...         del lst[idx]
...     else:
...         idx = idx + 1
...
>>> lst
['AAT', 'TTA']
I know that list comprehensions are all the rage these days, but if the list is long we don't want to duplicate it without any reason right? You can take this to the next step and create a nice utility function:
>>> def remove_if(coll, predicate):
...     idx = len(coll) - 1
...     while idx >= 0:
...         if predicate(coll[idx]):
...             del coll[idx]
...         idx = idx - 1
...     return coll
...
>>> lst = ['AAT', 'XAC', 'ANT', 'TTA']
>>> remove_if(lst, pred)
['AAT', 'TTA']
>>> lst
['AAT', 'TTA']
As S.Mark requested here is my version. It's probably slower but does make it easier to change what gets removed.
def filter_genome(genome, killlist=set("X N".split())):
    return [codon for codon in genome if 0 == len(set(codon) & killlist)]
It is (asymptotically) faster to use a regular expression than to search the same string many times for a certain character: with a regular expression the sequence is read at most once (instead of twice when the letters are not found, as in gnibbler's original answer, for instance). With gnibbler's memoization, the regular expression approach reads:
import re
remove = re.compile('[XN]').search
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    if s not in memo:
        memo[s]= not remove(s)
    return memo[s]
print(list(filter(pred,codon)))
This should be (asymptotically) faster than using the "in s" or the "set" checks (i.e., the code above should be faster for long enough strings s).
I originally thought that gnibbler's answer could be written in a faster and more compact way with dict.setdefault():
codon = ['AAT','XAC','ANT','TTA']
def pred(s,memo={}):
    return memo.setdefault(s, not any(y in s for y in "XN"))
print(list(filter(pred,codon)))
However, as gnibbler noted, the value in setdefault is always evaluated (even though, in principle, it could be evaluated only when the dictionary key is not found).
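A quick way to convince yourself of this is to count how often the test runs. In the sketch below, expensive_check and calls are made-up names standing in for the real predicate.

```python
calls = []

def expensive_check(s):
    """Stand-in for the real test; records every call for inspection."""
    calls.append(s)
    return not any(ch in s for ch in "XN")

memo = {}
for codon in ["AAT", "AAT", "AAT"]:
    # The second argument is computed before setdefault() runs,
    # even when the key is already present.
    memo.setdefault(codon, expensive_check(codon))

print(len(calls))  # 3
print(memo)        # {'AAT': True}
```

The check ran three times for a single distinct key, so setdefault saves the dictionary write but not the computation.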
If you want to modify the actual list instead of creating a new one here is a simple set of functions that you can use:
from typing import TypeVar, Callable, List
T = TypeVar("T")
def list_remove_first(lst: List[T], accept: Callable[[T], bool]) -> None:
    for i, v in enumerate(lst):
        if accept(v):
            del lst[i]
            return
def list_remove_all(lst: List[T], accept: Callable[[T], bool]) -> None:
    for i in reversed(range(len(lst))):
        if accept(lst[i]):
            del lst[i]
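Repeated del on a list can get slow when many elements are removed, because each deletion shifts the tail. A possible alternative, sketched here under the assumption that a temporary list is acceptable, is slice assignment, which still updates the same list object. The name remove_all_in_place is made up for the example.

```python
def remove_all_in_place(lst, accept):
    """Remove every element for which accept(element) is true, in place."""
    # Slice assignment replaces the contents of the same list object,
    # so other references to `lst` see the change.
    lst[:] = [v for v in lst if not accept(v)]

codons = ["AAT", "XAC", "ANT", "TTA"]
alias = codons
remove_all_in_place(codons, lambda s: "X" in s or "N" in s)

print(codons)           # ['AAT', 'TTA']
print(alias is codons)  # True
```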

python list that matches everything

I probably didn't ask correctly: I would like a list value that can match any list: the "inverse" of (None,)
but even with (None,) it will match item as None (which I don't want)
The point is I have a function working with: [x for x in my_list if x[field] not in filter_list]
and I would like to filter everything or nothing without making tests like:
if filter_list==(None,): return [] and if filter_list==('*',): return my_list
PS: I wanted to simplify my question leading to some errors (list identifier) or stupid thing [x for x in x] ;)
Hi,
I need to do some filtering with list comprehension in python.
if I do something like that:
[x for x in list if x in (None,)]
I get rid of all values, which is fine
but I would like to have the same thing to match everything
I can do something like:
[x for x in list if x not in (None,)]
but it won't be homogeneous with the rest
I tried some things but for example (True,) matches only 1
Note that the values to filter are numeric, but if you have something generic (like (None,) to match nothing), it would be great
Thanks
Louis
__contains__ is the magic method that checks if something is in a sequence:
class everything(object):
    def __contains__(self, _):
        return True
for x in (1,2,3):
    print(x in everything())
The better syntax would be:
[x for x in lst if x is None]
[x for x in lst if x is not None]
What do you mean by
I would like to have the same thing to match everything
Just do
[x for x in list]
and every item in list is matched.
You could change your program to accept a filter object, instead of a list.
The abstract base filter would have a matches method that returns true if x "matches".
Your general case filters would be constructed with a list argument, and would filter on membership of the list - the matches function would search the list and return true if the argument was in the list.
You could also have two special subclasses of the filter object : none and all.
These would have special match functions which either always return true (all) or false (none).
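A rough sketch of that design could look like this. All class and function names here (Filter, ListFilter, AllFilter, NoneFilter, apply_filter) are made up for the illustration, and the values are assumed to be hashable, as the numeric values in the question are.

```python
class Filter:
    """Base class: subclasses decide whether a value matches."""
    def matches(self, x):
        raise NotImplementedError

class ListFilter(Filter):
    def __init__(self, values):
        self.values = set(values)  # set for fast membership tests
    def matches(self, x):
        return x in self.values

class AllFilter(Filter):
    def matches(self, x):
        return True

class NoneFilter(Filter):
    def matches(self, x):
        return False

def apply_filter(items, flt):
    # Mirrors the question's "x not in filter_list": matched items are dropped.
    return [x for x in items if not flt.matches(x)]

data = [1, 2, 3, 4]
print(apply_filter(data, ListFilter([2, 3])))  # [1, 4]
print(apply_filter(data, AllFilter()))         # []
print(apply_filter(data, NoneFilter()))        # [1, 2, 3, 4]
```

The calling code stays the same for all three cases, which is the homogeneity the question asks for.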
You don't need an if, you can just say
[x for x in list]
but I would like to have the same
thing to match everything
To match everything, you don't need if statement
[x for x in list1]
or If you really like to do
[x for x in list1 if x in [x]]
Answering your revised question: the list that "matches" all possible values is effectively of infinite length. So you can't do what you want to do without an if test. I suggest that your arg should be either a list or one of two values representing the "all" and "none" cases:
FILTER_NONE = object() # or []
FILTER_ALL = object()
def filter_func(alist, filter_list):
    if filter_list is FILTER_ALL:
        return []
    elif filter_list is FILTER_NONE:
        return alist
        # or maybe alist[:] # copy the list
    return [x for x in alist if x not in filter_list]
If filter_list is large, you may wish to replace the last line with:
    filter_set = set(filter_list)
    return [x for x in alist if x not in filter_set]
Alternatively, don't bother; just document that filter_list (renamed as filter_collection) can be anything that supports __contains__() and remind readers that sets will be faster than lists.
