I have a list that I'm attempting to remove duplicate items from. I'm using python 2.7.1 so I can simply use the set() function. However, this reorders my list. Which for my particular case is unacceptable.
Below is a function I wrote; which does this. However I'm wondering if there's a better/faster way. Also any comments on it would be appreciated.
def ordered_set(list_):
newlist = []
lastitem = None
for item in list_:
if item != lastitem:
newlist.append(item)
lastitem = item
return newlist
The above function assumes that none of the items will be None, and that the items are in order (ie, ['a', 'a', 'a', 'b', 'b', 'c', 'd'])
The above function returns ['a', 'a', 'a', 'b', 'b', 'c', 'd'] as ['a', 'b', 'c', 'd'].
Another very fast method with set:
def remove_duplicates(lst):
dset = set()
# relies on the fact that dset.add() always returns None.
return [item for item in lst
if item not in dset and not dset.add(item)]
Use an OrderedDict:
from collections import OrderedDict
l = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
d = OrderedDict()
for x in l:
d[x] = True
# prints a b c d
for x in d:
print x,
print
Assuming the input sequence is unordered, here's O(N) solution (both in space and time).
It produces a sequence with duplicates removed, while leaving unique items in the same relative order as they appeared in the input sequence.
>>> def remove_dups_stable(s):
... seen = set()
... for i in s:
... if i not in seen:
... yield i
... seen.add(i)
>>> list(remove_dups_stable(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e']))
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I know this has already been answered, but here's a one-liner (plus import):
from collections import OrderedDict
def dedupe(_list):
return OrderedDict((item,None) for item in _list).keys()
>>> dedupe(['q', 'w', 'e', 'r', 'q', 'w', 'y', 'u', 'i', 't', 'e', 'p', 't', 'y', 'e'])
['q', 'w', 'e', 'r', 'y', 'u', 'i', 't', 'p']
I think this is perfectly OK. You get O(n) performance which is the best you could hope for.
If the list were unordered, then you'd need a helper set to contain the items you've already visited, but in your case that's not necessary.
if your list isn't sorted then your question doesn't make sense.
e.g. [1,2,1] could become [1,2] or [2,1]
if your list is large you may want to write your result back into the same list using a SLICE to save on memory:
>>> x=['a', 'a', 'a', 'b', 'b', 'c', 'd']
>>> x[:]=[x[i] for i in range(len(x)) if i==0 or x[i]!=x[i-1]]
>>> x
['a', 'b', 'c', 'd']
for inline deleting see Remove items from a list while iterating or Remove items from a list while iterating without using extra memory in Python
one trick you can use is that if you know x is sorted, and you know x[i]=x[i+j] then you don't need to check anything between x[i] and x[i+j] (and if you don't need to delete these j values, you can just copy the values you want into a new list)
So while you can't beat n operations if everything in the set is unique i.e. len(set(x))=len(x)
There is probably an algorithm that has n comparisons as its worst case but can have n/2 comparisons as its best case (or lower than n/2 as its best case if you know somehow know in advance that len(x)/len(set(x))>2 because of the data you've generated):
The optimal algorithm would probably use binary search to find maximum j for each minimum i in a divide and conquer type approach. Initial divisions would probably be of length len(x)/approximated(len(set(x))). Hopefully it could be carried out such that even if len(x)=len(set(x)) it still uses only n operations.
There is unique_everseen solution described in
http://docs.python.org/2/library/itertools.html
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in ifilterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
Looks ok to me. If you really want to use sets do something like this:
def ordered_set (_list) :
result = set()
lastitem = None
for item in _list :
if item != lastitem :
result.add(item)
lastitem = item
return sorted(tuple(result))
I don't know what performance you will get, you should test it; probably the same because of method's overheat!
If you really are paranoid, just like me, read here:
http://wiki.python.org/moin/HowTo/Sorting/
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
Just remembered this(it contains the answer):
http://www.peterbe.com/plog/uniqifiers-benchmark
Related
I want to rearrange a list based on another list which have common elements between them.
my list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
Above lists have common 'a','b' and 'c' as common elements.the expected outcome for is as below
my_result = ['a','b','c','q','s','f','l','x']
Thanks in Advance
Sky
my_list = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
res1=[x for x in base_list if x in my_list] # common elements
res2=[x for x in my_list if x not in res1] #
res3=res1+res2
Output :
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
Create a custom key for sorted as shown in this document. Set the value arbitrarily high for the letters that don't appear in the base_list so they end up in the back. Since sorted is considered stable those that aren't in the base_list will remain untouched in terms of original order.
l = ['q','s','b','f','l','c','x','a']
base_list = ['z','a','b','c']
def custom_key(letter):
try:
return base_list.index(letter)
except ValueError:
return 1_000
sorted(l, key=custom_key)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
A (probably non optimal) way:
>>> sorted(my_list, key=lambda x: base_list.index(x) if x in base_list
else len(base_list)+1)
['a', 'b', 'c', 'q', 's', 'f', 'l', 'x']
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g',
'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u',
'v', 'w', 'x', 'y', 'z']
endlist = []
def loopfunc(n, lis):
if n ==0:
endlist.append(lis[0]+lis[1]+lis[2]+lis[3]+lis[4])
for i in alphabet:
if n >0:
lis.append(i)
loopfunc(n-1, lis )
loopfunc(5, [])
This program is supposed to make endlist be:
endlist = [aaaaa, aaaab, aaaac, ... zzzzy, zzzzz]
But it makes it:
endlist = [aaaaa, aaaaa, aaaaa, ... , aaaaa]
The lenght is right, but it won't make different words. Can anyone help me see why?
The only thing you ever add to endlist is the first 5 elements of lis, and since you have a single lis that is shared among all the recursive calls (note that you never create a new list in this code other than the initial values for endlist and lis, so every append to lis is happening to the same list), those first 5 elements are always the a values that you appended in your first 5 recursive calls. The rest of the alphabet goes onto the end of lis and is never reached by any of your other code.
Since you want string in the end, it's a little easier just to use strings for collecting your items. This avoids the possibility of shared mutable references which is cause your issues. With that the recursion becomes pretty concise:
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def loopfunc(n, lis=""):
if n < 1:
return [lis]
res = []
for a in alphabet:
res.extend(loopfunc(n-1, lis + a))
return res
l = loopfunc(5)
print(l[0], l[1], l[-1], l[-2])
# aaaaa aaaab zzzzz zzzzy
Note that with n=5 you'll have almost 12 million combinations. If you plan on having larger n values, it may be worth rewriting this as a generator.
I'm trying to create a list of lists from a single list. I'm able to do this if the new list of lists have the same number of elements, however this will not always be the case
As said earlier, the function below works when the list of lists have the same number of elements.
I've tried using regular expressions to determine if an element matches a pattern using
pattern2=re.compile(r'\d\d\d\d\d\d') because the first value on my new list of lists will always be 6 digits and it will be the only one that follows that format. However, i'm not sure of the syntax of getting it to stop at the next match and create another list
def chunks(l,n):
for i in range(0,len(l),n):
yield l[i:i+n]
The code above works if the list of lists will contain the same number of elements
Below is what I expect.
OldList=[111111,a,b,c,d,222222,a,b,c,333333,a,d,e,f]
DesiredList=[[111111,a,b,c,d],[222222,a,b,c],[333333,a,d,e,f]]
Many thanks indeed.
Cheers
Likely a much more efficient way to do this (with fewer loops), but here is one approach that finds the indexes of the breakpoints and then slices the list from index to index appending None to the end of the indexes list to capture the remaining items. If your 6 digit numbers are really strings, then you could eliminate the str() inside re.match().
import re
d = [111111,'a','b','c','d',222222,'a','b','c',333333,'a','d','e','f']
indexes = [i for i, x in enumerate(d) if re.match(r'\d{6}', str(x))]
groups = [d[s:e] for s, e in zip(indexes, indexes[1:] + [None])]
print(groups)
# [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
You can use a fold.
First, define a function to locate the start flag:
>>> def is_start_flag(v):
... return len(v) == 6 and v.isdigit()
That will be useful if the flags are not exactly what you expected them to be, or to exclude some false positives, or even if you need a regex.
Then use functools.reduce:
>>> L = d = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
>>> import functools
>>> functools.reduce(lambda acc, x: acc+[[x]] if is_start_flag(x) else acc[:-1]+[acc[-1]+[x]], L, [])
[['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
If the next element x is the start flag, then append a new list [x] to the accumulator. Else, add the element to the current list, ie the last list of the accumulator.
Is there any way to create an iterator to repeat elements in a list certain times? For example, a list is given:
color = ['r', 'g', 'b']
Is there a way to create a iterator in form of itertools.repeatlist(color, 7) that can produce the following list?
color_list = ['r', 'g', 'b', 'r', 'g', 'b', 'r']
You can use itertools.cycle() together with itertools.islice() to build your repeatlist() function:
from itertools import cycle, islice
def repeatlist(it, count):
return islice(cycle(it), count)
This returns a new iterator; call list() on it if you must have a list object.
Demo:
>>> from itertools import cycle, islice
>>> def repeatlist(it, count):
... return islice(cycle(it), count)
...
>>> color = ['r', 'g', 'b']
>>> list(repeatlist(color, 7))
['r', 'g', 'b', 'r', 'g', 'b', 'r']
The documentation of cycle says:
Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).
I'm curious why python doesn't provide a more efficient implementation:
def cycle(it):
while True:
for x in it:
yield x
def repeatlist(it, count):
return [x for (i, x) in zip(range(count), cycle(it))]
This way, you don't need to save a whole copy of the list. And it works if list is infinitely long.
I'm new to programming in general, so looking to really expand my skills here. I'm trying to write a script that will grab a list of strings from an object, then order them based on a template of my design. Any items not in the template will be added to the end.
Here's how I'm doing it now, but could someone suggest a better/more efficient way?
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
listFinal = []
for thing in listTemplate:
if thing in originalList:
listFinal.append(thing)
originalList.pop(originalList.index(thing))
for thing in originalList:
listFinal.append(thing)
originalList.pop(originalList.index(thing))
Try this:
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
order = { element:index for index, element in enumerate(listTemplate) }
sorted(originalList, key=lambda element: order.get(element, float('+inf')))
=> ['a', 'b', 'c', 'd', 'z']
This is how it works:
First, we build a dictionary indicating, for each element in listTemplate, its relative order with respect to the others. For example a is 0, b is 1 and so on
Then we sort originalList. If one of its elements is present in the order dictionary, then use its relative position for ordering. If it's not present, return a positive infinite value - this will guarantee that the elements not in listTemplate will end up at the end, with no further ordering among them.
The solution in the question, although correct, is not very pythonic. In particular, whenever you have to build a new list, try to use a list comprehension instead of explicit looping/appending. And it's not a good practice to "destroy" the input list (using pop() in this case).
You can create a dict using the listTemplate list, that way the expensive(O(N)) list.index operations can be reduced to O(1) lookups.
>>> lis1 = ['b', 'a', 'c', 'z', 'd']
>>> lis2 = ['a', 'b', 'c', 'd']
Use enumerate to create a dict with the items as keys(Considering that the items are hashable) and index as values.
>>> dic = { x:i for i,x in enumerate(lis2) }
Now dic looks like:
{'a': 0, 'c': 2, 'b': 1, 'd': 3}
Now for each item in lis1 we need to check it's index in dic, if the key is not found we return float('inf').
Function used as key:
def get_index(key):
return dic.get(key, float('inf'))
Now sort the list:
>>> lis1.sort(key=get_index)
>>> lis1
['a', 'b', 'c', 'd', 'z']
For the final step, you can just use:
listFinal += originalList
and it will add these items to the end.
There is no need to create a new dictionary at all:
>>> len_lis1=len(lis1)
>>> lis1.sort(key = lambda x: lis2.index(x) if x in lis2 else len_lis1)
>>> lis1
['a', 'b', 'c', 'd', 'z']
Here is a way that has better computational complexity:
# add all elements of originalList not found in listTemplate to the back of listTemplate
s = set(listTemplate)
listTemplate.extend(el for el in originalList if el not in s)
# now sort
rank = {el:index for index,el in enumerate(listTemplate)}
listFinal = sorted(originalList, key=rank.get)