Fast sorting of large nested lists - python

I am looking to find out the likelihood of parameter combinations using Monte Carlo Simulation.
I've got 4 parameters and each can have about 250 values.
I have randomly generated 250,000 scenarios for each of those parameters using some probability distribution function.
I now want to find out which parameter combinations are the most likely to occur.
To achieve this I have started by filtering out any duplicates from my 250,000 randomly generated samples in order to reduce the length of the list.
I then iterated through this reduced list and checked how many times each scenario occurs in the original 250,000 long list.
I have a large list of 250,000 items which contains lists, like this:
a = [[1,2,5,8],[1,2,5,8],[3,4,5,6],[3,4,5,7],....,[3,4,5,7]]# len(a) is equal to 250,000
I want to find a fast and efficient way of having each list within my list only occurring once.
The end goal is to count the occurrences of each list within list a.
So far I've got:
# Removing duplicates from list a and storing this as a new list temp
b_set = set(tuple(x) for x in a)
temp = [list(x) for x in b_set]
temp.sort(key=lambda x: a.index(x))

# I then iterate through each of my possible lists (i.e. temp)
# and count how many times they occur in a
most_likely_dict = {}
for scenario in temp:
    freq = a.count(scenario)
    most_likely_dict[str(scenario)] = freq
At the moment it takes a good 15 minutes to run. Any suggestion on how to turn that into a few seconds would be greatly appreciated!

You can take out the sorting part, since the final result is a dictionary and the order of its keys doesn't matter for the counting, then use a dict comprehension:
>>> a = [[1,2],[1,2],[3,4,5],[3,4,5], [3,4,5]]
>>> a_tupled = [tuple(i) for i in a]
>>> b_set = set(a_tupled)
>>> {repr(i): a_tupled.count(i) for i in b_set}
{'(1, 2)': 2, '(3, 4, 5)': 3}
Calling list on your tuples will add more overhead, but you can if you want to:
>>> {repr(list(i)): a_tupled.count(i) for i in b_set}
{'[3, 4, 5]': 3, '[1, 2]': 2}
Or just use a Counter:
>>> from collections import Counter
>>> Counter(tuple(i) for i in a)
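Counter({(3, 4, 5): 3, (1, 2): 2})
Applied to the original 250,000-scenario problem, a minimal sketch (assuming a is the nested list from the question) that counts every scenario in one pass and pulls out the most frequent combinations:
from collections import Counter

# one pass over a: tuples are hashable, so Counter can count them directly
counts = Counter(tuple(scenario) for scenario in a)

# the ten most likely parameter combinations, most frequent first
for scenario, freq in counts.most_common(10):
    print(list(scenario), freq)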

{str(item):a.count(item) for item in a}
Input:
a = [[1,2,5,8],[1,2,5,8],[3,4,5,6],[3,4,5,7],[3,4,5,7]]
Output:
{'[3, 4, 5, 6]': 1, '[1, 2, 5, 8]': 2, '[3, 4, 5, 7]': 2}

Related

If values exist in list, put them at the back of the list in Python

I have a list with values and a list with some given numbers:
my_list = [1, 3, 5, 6, 8, 10]
my_numbers = [2, 3, 4]
Now I want to know if the values in my_numbers exist in my_list and, if so, put the matching value(s) to the back of my_list. I can do this for example like this:
for number in my_numbers:
    if number in my_list:
        my_list.remove(number)
        my_list.append(number)
Specifics:
I can be certain that there are no duplicates in either of the lists, due to my program setup.
The order of which the matching numbers in my_numbers are put in the back of my_list does not matter.
Question: Can I do this more efficiently, performance-wise?
One possible solution is this: rebuild the list my_list from two parts:
the items that are not in my_numbers
the items that are in my_numbers
Note that I would suggest using a set for the lookup. A membership test for a set is O(1) (constant time), whereas such a test for a list is O(n), where n is the length of the list.
This means that the total runtime of the code below is O(max(m,n)), where m, n are the lengths of the lists. Your original solution was more like O(m*n), which is much slower if either of the lists is large.
my_numbers_set = set(my_numbers)
my_list = [x for x in my_list if x not in my_numbers_set] + \
          [x for x in my_list if x in my_numbers_set]
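If you would rather make a single pass over my_list, a minimal sketch of the same idea (the kept/moved names are my own):
my_numbers_set = set(my_numbers)
kept, moved = [], []
for x in my_list:
    # send matches to the back, keep everything else in its original order
    (moved if x in my_numbers_set else kept).append(x)
my_list = kept + moved  # [1, 5, 6, 8, 10, 3]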

sort a list of integers by rank in python

I'm trying to create a function that, for each number, counts how many
elements are smaller than it (this is the number's rank) and then places the number at its rank in the sorted list. Say I have a list
L=(4,7,9,10,6,11,3)
What I want to produce is a corresponding list
K=(1,3,4,5,2,6,0)
where element K[i] has the value of the 'rank' of the element for the corresponding location in L. I wrote this code:
def sort_by_rank(lst):
    for i in range(len(lst)):
        # rank = how many elements are smaller than lst[i]?
        rank = 0
        for elem in lst:
            if elem < lst[i]:
                rank += 1
        lst[rank] = lst[i]
    return lst
but it has a bug that I haven't managed to track down.
The simplest way to do this is to make a sorted copy of the list, then look up the indexes:
L = [4,7,9,10,6,11,3]
s = sorted(L)
result = [s.index(x) for x in L] # [1, 3, 4, 5, 2, 6, 0]
Now, this is the naive approach, and it works great for small lists like yours, but for long lists it would be slow since list.index is relatively slow and it's being run over and over.
To make it more efficient, use enumerate to make a dict of element-index pairs. Lookups in a dict are much faster than list.index.
s = {x: i for i, x in enumerate(sorted(set(L)))}
result = [s[x] for x in L] # [1, 3, 4, 5, 2, 6, 0]
Here I'm also converting L to a set in case of duplicates, otherwise later occurrences would override the indexes of earlier ones.
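For instance, with a small list containing duplicates (my own example values), equal values share one rank:
L = [4, 7, 7, 3]
s = {x: i for i, x in enumerate(sorted(set(L)))}  # {3: 0, 4: 1, 7: 2}
result = [s[x] for x in L]                        # [1, 2, 2, 0]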
For a more efficient approach, you can use enumerate to pair each item with its original index before sorting, then use another enumerate over the sorted sequence to build a dict mapping each original index to its sorted position, and finally iterate over the range of the list to output the new indices in a list comprehension:
L=(4,7,9,10,6,11,3)
d = {o: i for i, (o, _) in enumerate(sorted(enumerate(L), key=lambda t: t[1]))}
print([d[i] for i in range(len(L))])
This outputs:
[1, 3, 4, 5, 2, 6, 0]
First of all, your L and K are not lists, they are tuples. Your code probably fails because you're trying to modify values within a tuple.
You can use the map function to map each element to the number of elements smaller than it.
result = list(map(lambda item: sum([number < item for number in L]), L))
Note that when summing True and False values, True counts as 1 and False as 0. By summing the new list, where each element is True/False depending on whether item is larger than number, we are essentially counting how many True values are in the list, which is the rank you're after.
My apologies, I did not see the part where you need the numbers sorted by rank. You can use the key argument to sort them:
L = [(item, sum([number < item for number in L])) for item in L]
L.sort(key=lambda item: item[1])
where each element in L is converted to a tuple of (original_value, its_rank).
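For the example list from the question, the result is a list of (value, rank) tuples sorted by rank:
[(3, 0), (4, 1), (6, 2), (7, 3), (9, 4), (10, 5), (11, 6)]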

python3 : Getting unordered elements after using set. Why, how? [duplicate]

Recently I noticed that when I convert a list to a set, the order of the elements changes and appears sorted.
Consider this example:
x=[1,2,20,6,210]
print(x)
# [1, 2, 20, 6, 210] # the order is same as initial order
set(x)
# set([1, 2, 20, 210, 6]) # in the set(x) output order is sorted
My questions are -
Why is this happening?
How can I do set operations (especially set difference) without losing the initial order?
A set is an unordered data structure, so it does not preserve the insertion order.
This depends on your requirements. If you have a normal list and want to remove some set of elements while preserving the order of the list, you can do this with a list comprehension:
>>> a = [1, 2, 20, 6, 210]
>>> b = set([6, 20, 1])
>>> [x for x in a if x not in b]
[2, 210]
If you need a data structure that supports both fast membership tests and preservation of insertion order, you can use the keys of a Python dictionary, which starting from Python 3.7 is guaranteed to preserve the insertion order:
>>> a = dict.fromkeys([1, 2, 20, 6, 210])
>>> b = dict.fromkeys([6, 20, 1])
>>> dict.fromkeys(x for x in a if x not in b)
{2: None, 210: None}
b doesn't really need to be ordered here – you could use a set as well. Note that a.keys() - b.keys() returns the set difference as a set, so it won't preserve the insertion order.
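If you need this in several places, a small helper (the name ordered_difference is mine, just a sketch of the same dict.fromkeys pattern):
def ordered_difference(seq, to_remove):
    # keep the first occurrence of each element of seq, in order,
    # dropping anything in to_remove; relies on dicts preserving
    # insertion order (Python 3.7+)
    drop = set(to_remove)
    return [x for x in dict.fromkeys(seq) if x not in drop]

ordered_difference([1, 2, 20, 6, 210], [6, 20, 1])  # [2, 210]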
In older versions of Python, you can use collections.OrderedDict instead:
>>> a = collections.OrderedDict.fromkeys([1, 2, 20, 6, 210])
>>> b = collections.OrderedDict.fromkeys([6, 20, 1])
>>> collections.OrderedDict.fromkeys(x for x in a if x not in b)
OrderedDict([(2, None), (210, None)])
Note that set() does not keep the order even in Python 3.6+ (it is the built-in dict that preserves insertion order there). Here is another solution that works for Python 2 and 3:
>>> x = [1, 2, 20, 6, 210]
>>> sorted(set(x), key=x.index)
[1, 2, 20, 6, 210]
Remove duplicates and preserve order with the function below:
def unique(sequence):
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]
How to remove duplicates from a list while preserving order in Python
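A quick check of that function with a list containing repeats:
unique([1, 2, 20, 6, 210, 2, 1])  # [1, 2, 20, 6, 210]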
Answering your first question, a set is a data structure optimized for set operations. Like a mathematical set, it does not enforce or maintain any particular order of the elements. The abstract concept of a set does not enforce order, so the implementation is not required to. When you create a set from a list, Python has the liberty to change the order of the elements for the needs of the internal implementation it uses for a set, which is able to perform set operations efficiently.
In mathematics, there are sets and ordered sets (osets).
set: an unordered container of unique elements (Implemented)
oset: an ordered container of unique elements (NotImplemented)
In Python, only sets are directly implemented. We can emulate osets with regular dict keys (3.7+).
Given
a = [1, 2, 20, 6, 210, 2, 1]
b = {2, 6, 5}
Code
oset = dict.fromkeys(a).keys()
# dict_keys([1, 2, 20, 6, 210])
Demo
Replicates are removed, insertion-order is preserved.
list(oset)
# [1, 2, 20, 6, 210]
Set-like operations on dict keys.
oset - b
# {1, 20, 210}
oset | b
# {1, 2, 5, 6, 20, 210}
oset & b
# {2, 6}
oset ^ b
# {1, 5, 20, 210}
Details
Note: an unordered structure does not preclude ordered elements. Rather, maintained order is not guaranteed. Example:
assert {1, 2, 3} == {2, 3, 1} # sets (order is ignored)
assert [1, 2, 3] != [2, 3, 1] # lists (order is guaranteed)
One may be pleased to discover that a list and a multiset (mset) are two more fascinating mathematical data structures:
list: an ordered container of elements that permits replicates (Implemented)
mset: an unordered container of elements that permits replicates (NotImplemented)*
Summary
Container | Ordered | Unique | Implemented
----------|---------|--------|------------
set | n | y | y
oset | y | y | n
list | y | n | y
mset | n | n | n*
*A multiset can be indirectly emulated with collections.Counter(), a dict-like mapping of multiplicities (counts).
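A tiny sketch of that Counter-based multiset emulation (my own example values):
from collections import Counter

m1 = Counter([1, 1, 2, 3])
m2 = Counter([1, 2, 2, 4])
m1 & m2  # Counter({1: 1, 2: 1})              multiset intersection: min of counts
m1 | m2  # Counter({1: 2, 2: 2, 3: 1, 4: 1})  multiset union: max of counts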
You can remove the duplicated values and keep the list order of insertion with one line of code, Python 3.8.2
mylist = ['b', 'b', 'a', 'd', 'd', 'c']
results = list({value:"" for value in mylist})
print(results)
>>> ['b', 'a', 'd', 'c']
results = list(dict.fromkeys(mylist))
print(results)
>>> ['b', 'a', 'd', 'c']
As noted in other answers, sets are data structures (and mathematical concepts) that do not preserve element order.
However, by using a combination of sets and dictionaries, it is possible to achieve whatever you want - try using these snippets:
# save the element order in a dict:
x_dict = {x: y for y, x in enumerate(my_list)}
x_set = set(my_list)
# perform desired set operations
...
# retrieve an ordered list from the set (slots for elements that were
# removed stay None and are filtered out at the end):
new_list = [None] * len(my_list)
for element in new_set:
    new_list[x_dict[element]] = element
new_list = [x for x in new_list if x is not None]
Building on Sven's answer, I found that using collections.OrderedDict like so helped me accomplish what you want, plus it allows me to add more items to the dict:
import collections
x=[1,2,20,6,210]
z=collections.OrderedDict.fromkeys(x)
z
OrderedDict([(1, None), (2, None), (20, None), (6, None), (210, None)])
If you want to add items but still treat it like a set you can just do:
z['nextitem']=None
And you can perform an operation like z.keys() on the dict and get the set:
list(z.keys())
[1, 2, 20, 6, 210]
One more simple way is to create an empty list, say "unique_list", and append the unique elements from the original list to it, for example:
unique_list = []
for i in original_list:
    if i not in unique_list:
        unique_list.append(i)
This will give you all the unique elements as well as maintain the order.
Late to answer, but you can use pandas' pd.Series to convert the list while preserving the order:
import pandas as pd
x = pd.Series([1, 2, 20, 6, 210, 2, 1])
print(pd.unique(x))
Output:
array([ 1, 2, 20, 6, 210])
Works for a list of strings
x = pd.Series(['c', 'k', 'q', 'n', 'p','c', 'n'])
print(pd.unique(x))
Output
['c' 'k' 'q' 'n' 'p']
An implementation of the highest score concept above that brings it back to a list:
def SetOfListInOrder(incominglist):
    from collections import OrderedDict
    outtemp = OrderedDict()
    for item in incominglist:
        outtemp[item] = None
    return list(outtemp)
Tested (briefly) on Python 3.6 and Python 2.7.
In case you have a small number of elements in the two initial lists on which you want to do the set difference operation, instead of using collections.OrderedDict, which complicates the implementation and makes it less readable, you can use:
# initial lists on which you want to do set difference
>>> nums = [1,2,2,3,3,4,4,5]
>>> evens = [2,4,4,6]
>>> evens_set = set(evens)
>>> result = []
>>> for n in nums:
...     if n not in evens_set and n not in result:
...         result.append(n)
...
>>> result
[1, 3, 5]
Its time complexity is not that good but it is neat and easy to read.
It's interesting that people often use a 'real world problem' to poke fun at a definition from theoretical computer science.
If a set had an order, you would first need to settle the following questions.
If your list has duplicate elements, what should the order be when you turn it into a set? What is the order when we union two sets? What is the order when we intersect two sets that order the same elements differently?
Besides, a set is much faster at looking up a particular key, which is what makes set operations efficient (and that's why you want a set rather than a list).
If you really care about the index, just keep it as a list. If you still want to do set operations on the elements of many lists, the simplest way is to create a dictionary for each list, with the same keys that are in the set, each mapped to a list of all the indexes of that key in the original list.
def indx_dic(l):
    dic = {}
    for i in range(len(l)):
        if l[i] in dic:
            dic[l[i]].append(i)
        else:
            dic[l[i]] = [i]
    return dic
a = [1,2,3,4,5,1,3,2]
set_a = set(a)
dic_a = indx_dic(a)
print(dic_a)
# {1: [0, 5], 2: [1, 7], 3: [2, 6], 4: [3], 5: [4]}
print(set_a)
# {1, 2, 3, 4, 5}
We can use collections.Counter for this:
# tested on python 3.7
>>> from collections import Counter
>>> lst = ["1", "2", "20", "6", "210"]
>>> for i in Counter(lst):
...     print(i, end=" ")
1 2 20 6 210
>>> for i in set(lst):
...     print(i, end=" ")
20 6 2 1 210
You can remove the duplicated values and keep the original insertion order, if you want:
lst = [1, 2, 1, 3]
new_lst = []
for num in lst:
    if num not in new_lst:
        new_lst.append(num)
# new_lst = [1, 2, 3]
Don't use sets for removing duplicates if order is something you need;
use sets for searching, i.e.
x in my_list
takes O(n) time, whereas
x in my_set
takes O(1) time (in most cases).
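A rough way to see the difference yourself (timings will vary by machine; just a sketch using timeit):
import timeit

data_list = list(range(100000))
data_set = set(data_list)
# membership test for a value near the end of the list vs. in the set
print(timeit.timeit('99999 in data', globals={'data': data_list}, number=1000))  # linear scan, slow
print(timeit.timeit('99999 in data', globals={'data': data_set}, number=1000))   # hash lookup, fast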
Here's an easy way to get the unique elements, though in sorted order rather than the original order:
x = [1, 2, 20, 6, 210]
print(sorted(set(x)))

Fastest way to compare ordered lists and count common elements *including* duplicates

I need to compare two lists of numbers and count how many elements of the first list are present in the second list. For example,
a = [2, 3, 3, 4, 4, 5]
b1 = [0, 2, 2, 3, 3, 4, 6, 8]
Here I should get a result of 4: I should count '2' once (as it occurs only once in the first list), '3' twice, and '4' once (as it occurs only once in the second list). I was using the following code:
def scoreIn(list1, list2):
    score = 0
    list2c = list(list2)
    for i in list1:
        if i in list2c:
            score += 1
            list2c.remove(i)
    return score
It works correctly, but it is too slow for my case (I call it 15,000 times). I read a hint about 'walking' through sorted lists, which was supposed to be faster, so I tried it like this:
def scoreWalk(list1, list2):
    score = 0
    i = 0
    j = 0
    len1 = len(list1)  # we assume that list2 is never shorter than list1
    while i < len1:
        if list1[i] == list2[j]:
            score += 1
            i += 1
            j += 1
        elif list1[i] > list2[j]:
            j += 1
        else:
            i += 1
    return score
Unfortunately this code is even slower. Is there any way to make it more efficient? In my case, both lists are sorted, contain only integers, and list1 is never longer than list2.
You can use the intersection feature of collections.Counter to solve the problem in an easy and readable way:
>>> from collections import Counter
>>> intersection = Counter( [2,3,3,4,4,5] ) & Counter( [0, 2, 2, 3, 3, 4, 6, 8] )
>>> intersection
Counter({3: 2, 2: 1, 4: 1})
As #Bakuriu says in the comments, to obtain the number of elements in the intersection (including duplicates), like your scoreIn function, you can then use sum( intersection.values() ).
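For the example above:
>>> sum( intersection.values() )
4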
However, doing it this way you're not actually taking advantage of the fact that your data is pre-sorted, nor of the fact (mentioned in the comments) that you're doing this over and over again with the same list.
Here is a more elaborate solution more specifically tailored for your problem. It uses a Counter for the static list and directly uses the sorted dynamic list. On my machine it runs in 43% of the run-time of the naïve Counter approach on randomly generated test data.
def common_elements(static_counter, dynamic_sorted_list):
    last = None       # previous element in the dynamic list
    count = 0         # count seen so far for this element in the dynamic list
    total_count = 0   # total common elements seen, eventually the return value
    for x in dynamic_sorted_list:
        # since the list is sorted, if there's more than one element they
        # will be consecutive.
        if x == last:
            # one more of the same as the previous element
            # all we need to do is increase the count
            count += 1
        else:
            # this is a new element that we haven't seen before.
            # first "flush out" the current count we've been keeping.
            # - count is the number of times it occurred in the dynamic list
            # - static_counter[last] is the number of times it occurred in
            #   the static list (the Counter class counted this for us)
            # thus the number of occurrences the two have in common is the
            # smaller of these numbers. (Note that unlike a normal dictionary,
            # which would raise KeyError, a Counter will return zero if we try
            # to look up a key that isn't there at all.)
            total_count += min(static_counter[last], count)
            # now set count and last to the new element, starting a new run
            count = 1
            last = x
    if count > 0:
        # since we only "flushed" above once we'd iterated _past_ an element,
        # the last unique value hasn't been counted. count it now.
        total_count += min(static_counter[last], count)
    return total_count
The idea of this is that you do some of the work up front when you create the Counter object. Once you've done that work, you can use the Counter object to quickly look up counts, just like you look up values in a dictionary: static_counter[ x ] returns the number of times x occurred in the static list.
Since the static list is the same every time, you can do this once and use the resulting quick-lookup structure 15 000 times.
On the other hand, setting up a Counter object for the dynamic list may not pay off performance-wise. There is a little bit of overhead involved in creating a Counter object, and we'd only use each dynamic list Counter one time. If we can avoid constructing the object at all, it makes sense to do so. And as we saw above, you can in fact implement what you need by just iterating through the dynamic list and looking up counts in the other counter.
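A usage sketch under that assumption (here I treat the second list from the question as the static one, built into a Counter once and reused across calls):
from collections import Counter

static_counter = Counter([0, 2, 2, 3, 3, 4, 6, 8])   # built once
common_elements(static_counter, [2, 3, 3, 4, 4, 5])  # returns 4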
The scoreWalk function in your post does not handle the case where the biggest item is only in the static list, e.g. scoreWalk( [1,1,3], [1,1,2] ). Correcting that, however, it actually performs better than any of the Counter approaches for me, contrary to the results you report. There may be a significant difference in the distribution of your data to my uniformly-distributed test data, but double-check your benchmarking of scoreWalk just to be sure.
Lastly, consider that you may be using the wrong tool for the job. You're not after short, elegant and readable -- you're trying to squeeze every last bit of performance out of a rather simple task. CPython allows you to write modules in C. One of the primary use cases for this is to implement highly optimized code. It may be a good fit for your task.
You can do this with a dict comprehension:
>>> a = [2, 3, 3, 4, 4, 5]
>>> b1 = [0, 2, 2, 3, 3, 4, 6, 8]
>>> {k: min(b1.count(k), a.count(k)) for k in set(a)}
{2: 1, 3: 2, 4: 1, 5: 0}
This is much faster if set(a) is small. If set(a) is more than 40 items, the Counter based solution is faster.
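If you only need the total number of common elements (including duplicates), sum the values, which matches the expected result of 4 from the question:
>>> sum(min(b1.count(k), a.count(k)) for k in set(a))
4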
