Removing Similarities Python - python

The list contains other lists:
L = [[3, 3], [4, 2], [3, 2]]
If the first element of a sublist is equal to the first element of another sublist, the one with the higher second element has to be removed from the list.
So the new list is:
L = [[4,2], [3,2]]
How to do this as efficiently as possible?

from collections import OrderedDict
L.sort(key=lambda x: x[1], reverse=True)
L = OrderedDict(L).items()
Why that works
If you do a dict(L) with L a list or tuple, this is more or less equivalent to:
{k: v for k, v in L}
As you can see, later values override prior values if duplicate keys (k) are present.
We can make use of this if we are able to put L in the correct order.
In your case, we don't really care about the order of the keys, but we want lower values (i.e. second elements of the sublists) to appear later. This way, any lower value overwrites a higher value with the same key.
It is sufficient to sort by the second elements of the sublists (in reverse order). Since list.sort() is stable this also preserves the original order of the entries as much as possible.
L.sort(key=lambda x: x[1], reverse=True)
collections.OrderedDict(L) now makes the elements unique by first element, keeping insertion order.
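As a minimal sketch of the two steps on the question's data (the name `result` and the conversion of the items back into lists are additions for illustration, not part of the original answer):

```python
from collections import OrderedDict

L = [[3, 3], [4, 2], [3, 2]]

# Sort by second element, largest first, so that smaller values come later.
L.sort(key=lambda x: x[1], reverse=True)

# Later pairs overwrite earlier pairs that share the same key.
result = [list(item) for item in OrderedDict(L).items()]
print(result)  # [[3, 2], [4, 2]]
```

Note that the key 3 keeps the position of its first insertion, which is why [3, 2] comes first in the output.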
The sort() is O(n ln n) and the dict creation adds another O(n). It's possible to do without the sort:
d = OrderedDict()
for k, v in L:
    ev = d.get(k, None)
    # update value. Always if key is not present or conditionally
    # if existing value is larger than current value
    d[k] = v if ev is None or ev > v else ev
L = d.items()
But that is a lot more code and probably not at all or not much faster in pure Python.
Edits: (1) make it work with non-integer keys (2) It's enough to sort by second elements, no need for a full sort.

If you don't care about the ordering of the elements in the output list, then you can create a dictionary that maps first items to second items, then construct your result from the smallest values.
from collections import defaultdict
L = [[3, 3], [4, 2], [3, 2]]
d = defaultdict(list)
for k, v in L:
    d[k].append(v)
# Python 3: use items() and the print() function
result = [[k, min(v)] for k, v in d.items()]
print(result)
Result:
[[3, 2], [4, 2]]
This is pretty efficient - O(n) average case, O(n*log(n)) worst case.

You can use this too.
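x = [[3, 3], [4, 2], [3, 2]]
# Iterate over a copy so that removing items does not skip elements.
# Note: this only drops sublists whose two elements are equal, which
# happens to give the expected result for this example.
for i in x[:]:
    if i[0] == i[1]:
        x.remove(i)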

Related

Get partitioned indices of sorted 2D list

I have "2D" list and I want to make partitions/groups of the list indices based on the first value of the nested list, and then return the sorted index of the partitions/groups based on the second value in the nested list. For example
test = [[1, 2], [1, 1], [1, 5], [2, 3], [2, 1], [1, 10]]
sorted_partitions(test)
>>> [[1, 0, 2, 5], [4, 3]]
# because the groupings are [(1, [1, 1]), (0, [1, 2]), (2, [1, 5]), (5, [1, 10]), (4, [2, 1]), (3, [2, 3])]
My quick solution is to use itertools and groupby. I'm sure there's some alternate approaches with other libraries, or even without libraries.
import itertools
# masochist one-liner
sorted_partitions = lambda z: [[val[0] for val in l] for l in [list(g) for k, g in itertools.groupby(sorted(enumerate(z), key=lambda x:x[1]), key=lambda x: x[1][0])]] # not PEP compliant
# cleaner version
def sorted_partitions(x):
    sorted_inds = sorted(enumerate(x), key=lambda x:x[1])
    grouped_tuples = [list(g) for k, g in itertools.groupby(sorted_inds, key=lambda x: x[1][0])]
    partitioned_inds = [[val[0] for val in l] for l in grouped_tuples]
    return partitioned_inds
After coming up with what I thought would be an improvement to my original attempt, I decided to run some timing tests. To my surprise, the bisect version was not actually faster. So the best implementation is currently:
from collections import defaultdict
def my_sorted_partitions(l):
    groups = defaultdict(list)
    for i, (group, val) in enumerate(l):
        groups[group].append((val, i))
    res = []
    for group, vals_i in sorted(groups.items()):
        res.append([i for val, i in sorted(vals_i)])
    return res
It is very similar to the original, but uses a defaultdict instead of groupby. This means there is no need to sort the input list (which is required to use groupby). It is now necessary to sort the groups dict (by keys) but assuming num_groups << num_elements it is efficient. Lastly, we need to sort each group (by values) but since they are smaller it might be more efficient.
The attempted improvement using bisect (which removes the need to sort the values, but apparently the "repeated sorting" costs more):
import bisect
def bisect_sorted_partitions(l):
    groups = defaultdict(list)
    for i, (group, val) in enumerate(l):
        bisect.insort(groups[group], (val, i))
    res = []
    for group, vals_i in sorted(groups.items()):
        res.append([i for val, i in vals_i])
    return res
And the timing done in this REPL.
The input is randomly generated, but the results from an example run are:
My: 28.024
Bisect: 60.325
Orig: 200.61
Where Orig is the answer by the OP.
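A rough way to repeat such a comparison locally is sketched below with timeit. The function names, the input size, the number of groups and the repeat count are all arbitrary assumptions made for this sketch, and the absolute numbers will differ from the ones quoted above.

```python
import bisect
import random
import timeit
from collections import defaultdict

def dict_partitions(rows):
    # Group (value, index) pairs by key, then sort each group once.
    groups = defaultdict(list)
    for i, (group, val) in enumerate(rows):
        groups[group].append((val, i))
    return [[i for _, i in sorted(members)]
            for _, members in sorted(groups.items())]

def bisect_partitions(rows):
    # Keep each group sorted while inserting.
    groups = defaultdict(list)
    for i, (group, val) in enumerate(rows):
        bisect.insort(groups[group], (val, i))
    return [[i for _, i in members]
            for _, members in sorted(groups.items())]

random.seed(0)  # fixed seed so that runs are repeatable
data = [[random.randrange(5), random.randrange(1000)] for _ in range(2000)]

for name, func in (("dict", dict_partitions), ("bisect", bisect_partitions)):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(name, round(seconds, 4))
```

Both functions should return identical results, so checking that before timing is a cheap sanity test.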

Breaking nested dictionary into list

I have been handling data from graph theory lately. I need to convert the shortest paths, given as a nested dictionary, into a list of lists.
For example:
What I received:
{0:{2:{4:{},3:{}}},1:{2:{2:{},7:{}}}}
What I want:
[[4,2,0],[3,2,0],[2,2,1],[7,2,1]]
My first thought was to define a function that appends to the list. However, I have been struggling with that for several hours already. I have no clue how to reach the deepest keys of the dictionary…
I look forward to your advice and really appreciate your help!
(P.s I have the number of layers of the nest, such as 3 for the above example)
A recursive approach works for this. Define a function that will return the path to the nodes in your dictionary like so:
For each item in the dictionary
if the item is not a dictionary, or an empty dictionary, we yield its key.
if the item is a non-empty dictionary, we get the path to the children of this item, and append the item's key to this path before yielding.
def dict_keys_to_list(d):
    for k, v in d.items():
        if isinstance(v, dict) and v:
            for c in dict_keys_to_list(v):
                yield c + [k]
        else:
            yield [k]
To use this:
d = {0:{2:{4:{},3:{}}},1:{2:{2:{},7:{}}}}
x = list(dict_keys_to_list(d))
print(x)
gives:
[[4, 2, 0], [3, 2, 0], [2, 2, 1], [7, 2, 1]]
You can use a recursive generator:
def unnest(d, val=[]):
    if d == {}:
        yield val
    else:
        for k,v in d.items():
            yield from unnest(v, [k]+val)
list(unnest(d))
Output:
[[4, 2, 0], [3, 2, 0], [2, 2, 1], [7, 2, 1]]
How it works:
for each key, value pair, run a recursive iteration on the sub dictionaries passing the key as parameter.
at each step the new key is added on front of the list
when an empty dictionary is found the path is over, yield the list
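The same walk can also be written without recursion, which may help if the nesting is deep. The sketch below uses an explicit stack; the name `leaf_paths` is made up for illustration, and it assumes every value in the structure is a dictionary.

```python
def leaf_paths(tree):
    paths = []
    stack = [(tree, [])]  # (sub-dictionary, path collected so far)
    while stack:
        node, path = stack.pop()
        if not node:
            # An empty dictionary marks the end of a path.
            paths.append(path)
            continue
        # Push children in reverse so they are popped in insertion order.
        for key, child in reversed(list(node.items())):
            stack.append((child, [key] + path))
    return paths

d = {0: {2: {4: {}, 3: {}}}, 1: {2: {2: {}, 7: {}}}}
print(leaf_paths(d))  # [[4, 2, 0], [3, 2, 0], [2, 2, 1], [7, 2, 1]]
```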

sort a list of integers by rank in python

I'm trying to create a function that for each number, it counts how many
elements are smaller than it (this is the number’s rank) and then places the number in its rank in the sorted list. Say I have a list
L=(4,7,9,10,6,11,3)
What I want to produce is a corresponding list
K=(1,3,4,5,2,6,0)
where element K[i] has the value of the 'rank' of the element for the corresponding location in L. I wrote this code:
def sort_by_rank(lst):
    for i in range(len(lst)):
        # rank = how many elements are smaller than lst[i]?
        rank = 0
        for elem in lst:
            if elem < lst[i]:
                rank += 1
        lst[rank] = lst[i]
    return lst
but it has a bug that I can't manage to find.
The simpler way to do this is make a sorted copy of the list, then get the indexes:
L = [4,7,9,10,6,11,3]
s = sorted(L)
result = [s.index(x) for x in L] # [1, 3, 4, 5, 2, 6, 0]
Now, this is the naive approach, and it works great for small lists like yours, but for long lists it would be slow since list.index is relatively slow and it's being run over and over.
To make it more efficient, use enumerate to make a dict of element-index pairs. Lookups in a dict are much faster than list.index.
s = {x: i for i, x in enumerate(sorted(set(L)))}
result = [s[x] for x in L] # [1, 3, 4, 5, 2, 6, 0]
Here I'm also converting L to a set in case of duplicates, otherwise later occurrences would override the indexes of earlier ones.
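If duplicates should keep the literal meaning "how many elements are strictly smaller", one alternative is a binary search on the sorted copy. This is only a sketch: the function name `ranks` is made up for illustration, and `bisect_left` is swapped in for the dict lookup used above.

```python
from bisect import bisect_left

def ranks(values):
    ordered = sorted(values)
    # bisect_left returns the position of the first element >= v,
    # which equals the number of elements strictly smaller than v.
    return [bisect_left(ordered, v) for v in values]

print(ranks([4, 7, 9, 10, 6, 11, 3]))  # [1, 3, 4, 5, 2, 6, 0]
print(ranks([2, 2, 1]))                # [1, 1, 0]
```

Each lookup is O(log n), so the whole computation stays O(n log n), dominated by the sort.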
For a more efficient approach, sort the (index, value) pairs produced by enumerate by value, then enumerate the sorted pairs to build a dict that maps each original index to its position in the sorted order. A list comprehension over range(len(L)) then reads the ranks back in the original order:
L=(4,7,9,10,6,11,3)
d = {o: i for i, (o, _) in enumerate(sorted(enumerate(L), key=lambda t: t[1]))}
print([d[i] for i in range(len(L))])
This outputs:
[1, 3, 4, 5, 2, 6, 0]
First of all, your L and K are not lists, they are tuples. Your code probably fails because you're trying to modify values within a tuple.
You can use map function to map each element to the amount of numbers smaller than it.
result = list(map(lambda item: sum([number < item for number in L]), L))
Note that when summing True and False values, True counts as 1 and False counts as 0. Summing the new list, where each element is True or False depending on whether item is larger than number, therefore counts how many numbers are smaller than item, which should be what you're asking for.
My apologies, I missed the part where you need the numbers sorted by rank. You can use the key argument of sort for that:
L = [(item, sum([number < item for number in L])) for item in L]
L.sort(key=lambda item: item[1])
where each element in L is converted to tuple of (original_value, its_rank)
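For the placement step the question describes, note that if lst is a list, the original loop writes into lst while it is still reading from it, so values can be overwritten before they are ranked. A minimal sketch that writes into a separate output list instead is shown below; it assumes all elements are distinct, since equal elements would share a rank and collide.

```python
def sort_by_rank(lst):
    result = [None] * len(lst)  # separate output, the input stays untouched
    for value in lst:
        # rank = how many elements are smaller than value
        rank = sum(1 for elem in lst if elem < value)
        result[rank] = value
    return result

print(sort_by_rank((4, 7, 9, 10, 6, 11, 3)))  # [3, 4, 6, 7, 9, 10, 11]
```

Because the input is only read, this works for tuples as well as lists.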

Compare elements inside list of lists in Python

I'm trying to create a new list of lists by removing the rows with a duplicated value within existing list of lists.
fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9]]
sec = []
for row in fir:
    if sec is None:
        sec.append(row)
    elif row[0] not in sec:
        sec.append(row)
print(sec)
Expected output:
[['a35', 1], ['3r', 6], [5, 9]]
Actual output:
[['a35', 1], ['a35', 2], ['3r', 6], ['3r', 8], [5, 9]]
I want to create a list of lists in which the values of row[0] are unique and not duplicated (e.g. the row with 'a35' should be included only once).
How can I achieve this?
You can simply record the first value of each row that you have already seen. Your code goes wrong because it compares that first value with whole rows (for example, 'a35' with ['a35',1]), which never match.
fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9]]
sec = []
index = []
for f in fir:
    if f[0] not in index:
        index.append(f[0])
        sec.append(f)
print(sec)
Your current code fails because after the first iteration sec looks like this: [['a35',1]]. On the second iteration row has value of ['a35',2] which can't be found from sec thus it gets appended there.
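The membership test can be checked in isolation. In this small sketch the variable names `found_in_rows` and `found_in_firsts` are made up for illustration:

```python
sec = [['a35', 1]]
row = ['a35', 2]

# 'a35' is compared against whole rows such as ['a35', 1], so no match.
found_in_rows = row[0] in sec
# Comparing against the first elements only gives the intended answer.
found_in_firsts = row[0] in [r[0] for r in sec]

print(found_in_rows, found_in_firsts)  # False True
```

The `sec is None` branch in the question is also never taken, because an empty list is not None.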
You could use groupby to group the inner lists based on the first element. groupby returns iterable of (key, it) tuple where key is value returned by second parameter and it is iterable of elements in within the group:
>>> from itertools import groupby
>>> fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9]]
>>> [next(g) for _, g in groupby(fir, lambda x: x[0])]
[['a35', 1], ['3r', 6], [5, 9]]
Note that the above assumes that lists with the same first element are next to each other in fir. If that's not the case you could sort fir before passing it to groupby, but that only works if the first elements can be compared with each other. With your data that's not the case, since it mixes strings and ints, which can't be compared on Python 3. You could collect the items into an OrderedDict instead:
from collections import OrderedDict
fir = [['a35',1],['a35',2],['3r',6],['3r',8],[5,9],['a35',7]]
d = OrderedDict()
for x in fir:
    d.setdefault(*x)
print([list(x) for x in d.items()])
Output:
[['a35', 1], ['3r', 6], [5, 9]]
Use List Comprehension to achieve this:
sec=[i for i in fir if i[0] not in [fir[idx][0] for idx in range(0,fir.index(i))]]
This takes each item from fir and compares its first element with the first elements of all items that precede it (xrange from Python 2 is replaced by range here).
As you have only two items in the inner list and you don't want to have duplicates,
Dictionary would have been the perfect data structure for your case.
I think that when you loop over fir, you should keep a separate list recording which keys you have already put in sec.
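A sketch of that bookkeeping with a set instead of a list, so that each membership check is constant time on average. The function name `unique_by_first` is made up for illustration, and the sketch assumes the first elements are hashable (strings and ints are).

```python
def unique_by_first(rows):
    seen = set()   # first elements that are already in the result
    result = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            result.append(row)
    return result

fir = [['a35', 1], ['a35', 2], ['3r', 6], ['3r', 8], [5, 9]]
print(unique_by_first(fir))  # [['a35', 1], ['3r', 6], [5, 9]]
```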

Sort python list based on length, and then based on content

I need to sort a list based on element's length, and then based on the contents.
For example, with an input [[1,2,3,4],[1,2,3],[2,3,4]], I need to get [[1,2,3,4],[2,3,4],[1,2,3]]: [1,2,3,4] has the most elements, and [2,3,4] is bigger than [1,2,3] in its first element. With an input [[2,3,5],[1,2,3],[2,3,4]], [[2,3,5],[2,3,4],[1,2,3]] should be returned, comparing element by element when the lengths are equal.
I could easily sort the list by the length of element, but how can I resort after that?
>>> a = [[1,2,3,4],[1,2,3],[2,3,4]]
>>> sorted(a, key=len, reverse=True)
[[1, 2, 3, 4], [1, 2, 3], [2, 3, 4]]
Don't. Sort once.
key=lambda a: (-len(a), a)
The easiest way is to have key return a tuple. So in this case:
sorted(a, key=lambda item: (len(item), item), reverse=True)
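A quick check of the tuple key with reverse=True on both inputs from the question (the names `by_len_then_content`, `first` and `second` are made up for this sketch):

```python
def by_len_then_content(rows):
    # Tuples compare by length first, then by list contents;
    # reverse=True flips both criteria to descending order.
    return sorted(rows, key=lambda item: (len(item), item), reverse=True)

first = by_len_then_content([[1, 2, 3, 4], [1, 2, 3], [2, 3, 4]])
second = by_len_then_content([[2, 3, 5], [1, 2, 3], [2, 3, 4]])
print(first)   # [[1, 2, 3, 4], [2, 3, 4], [1, 2, 3]]
print(second)  # [[2, 3, 5], [2, 3, 4], [1, 2, 3]]
```

By contrast, a key of (-len(a), a) without reverse orders lists of equal length ascending, so [1, 2, 3] would come before [2, 3, 4].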
Define your own comparison function:
def comparing(a, b):
    if len(a) != len(b):
        return len(a) - len(b)
    else:
        return sum(a) - sum(b)
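In Python 3 a two-argument comparison function has to be wrapped with functools.cmp_to_key before it can be passed to sorted. The sketch below is a variant, not the function above: comparing sums is not the same as comparing element by element, so the name `compare_rows` and its tie-break on list contents are assumptions chosen to match the question.

```python
from functools import cmp_to_key

def compare_rows(a, b):
    # Negative if a sorts before b, positive if after, zero if equal.
    if len(a) != len(b):
        return len(a) - len(b)
    # Lists compare element by element; this yields -1, 0 or 1.
    return (a > b) - (a < b)

rows = [[2, 3, 5], [1, 2, 3], [2, 3, 4], [1, 2, 3, 4]]
ordered = sorted(rows, key=cmp_to_key(compare_rows), reverse=True)
print(ordered)  # [[1, 2, 3, 4], [2, 3, 5], [2, 3, 4], [1, 2, 3]]
```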
