Get partitioned indices of sorted 2D list - python

I have "2D" list and I want to make partitions/groups of the list indices based on the first value of the nested list, and then return the sorted index of the partitions/groups based on the second value in the nested list. For example
test = [[1, 2], [1, 1], [1, 5], [2, 3], [2, 1], [1, 10]]
sorted_partitions(test)
>>> [[1, 0, 2, 5], [4, 3]]
# because the groupings are [(1, [1, 1]), (0, [1, 2]), (2, [1, 5]), (5, [1, 10]), (4, [2, 1]), (3, [2, 3])]

My quick solution is to use itertools and groupby. I'm sure there's some alternate approaches with other libraries, or even without libraries.
# masochist one-liner (a lambda cannot contain "return")
sorted_partitions = lambda z: [[val[0] for val in l] for l in [list(g) for k, g in itertools.groupby(sorted(enumerate(z), key=lambda x: x[1]), key=lambda x: x[1][0])]]  # not PEP compliant
# cleaner version
import itertools

def sorted_partitions(x):
    sorted_inds = sorted(enumerate(x), key=lambda t: t[1])
    grouped_tuples = [list(g) for k, g in itertools.groupby(sorted_inds, key=lambda t: t[1][0])]
    partitioned_inds = [[val[0] for val in group] for group in grouped_tuples]
    return partitioned_inds

After coming up with what I thought would be an improvement to my original attempt, I decided to do some runtime tests. To my surprise, the bisect version didn't actually improve anything. So the best implementation is currently:
from collections import defaultdict

def my_sorted_partitions(l):
    groups = defaultdict(list)
    for i, (group, val) in enumerate(l):
        groups[group].append((val, i))
    res = []
    for group, vals_i in sorted(groups.items()):
        res.append([i for val, i in sorted(vals_i)])
    return res
It is very similar to the original, but uses a defaultdict instead of groupby, so there is no need to sort the input list (which groupby requires). It is now necessary to sort the groups dict by key, but assuming num_groups << num_elements that is cheap. Lastly, each group has to be sorted by value, but since the groups are smaller this may be more efficient.
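For example, on the test list from the question this reproduces the expected grouping:
test = [[1, 2], [1, 1], [1, 5], [2, 3], [2, 1], [1, 10]]
my_sorted_partitions(test)
>>> [[1, 0, 2, 5], [4, 3]]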
The attempted improvement using bisect (which removes the need to sort the values, but apparently the "repeated sorting" costs more):
import bisect

def bisect_sorted_partitions(l):
    groups = defaultdict(list)
    for i, (group, val) in enumerate(l):
        bisect.insort(groups[group], (val, i))
    res = []
    for group, vals_i in sorted(groups.items()):
        res.append([i for val, i in vals_i])
    return res
The timing was done in this REPL.
The input is randomly generated, but the results from an example run are:
My: 28.024
Bisect: 60.325
Orig: 200.61
Where Orig is the answer by the OP.
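The benchmark script itself is in the linked REPL; a minimal sketch of how such a comparison could look (the input shape below is an assumption, not the exact data used):
import random
import timeit

# hypothetical input: 10,000 (group, value) pairs drawn from a small number of groups
data = [[random.randint(1, 10), random.randint(1, 1000)] for _ in range(10_000)]

for fn in (my_sorted_partitions, bisect_sorted_partitions, sorted_partitions):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=100))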

Related

How to efficiently get shared key subsets of multiple dicts as multiple arrays?

Is there an efficient way to get the intersection of (the keys of) multiple dictionaries?
Similar to iterating over shared keys in two dictionaries, except the idea is not to iterate but rather to get the key set so it can be used to take the subset of each dict.
d1 = {'a':[1,2], 'b':[2,2]}
d2 = {'e':[3,2], 'b':[5,1], 'a':[5,5]}
d3 = {'b':[8,2], 'a':[3,3], 'c': [1,2]}
So intersection manually is simple
d1.keys() & d2.keys() & d3.keys()
but what about an arbitrary list of dicts? I feel like there is a better way than this:
d_list = [d1, d2, d3]
inter_keys = {}
for i in range(len(d_list)):
    if i == 0:
        inter_keys = d_list[i]
    inter_keys = inter_keys & d_list[i].keys()
Then getting a subset
subsets = []
for n in d_list:
    subsets.append({k: n[k] for k in inter_keys})
and finally use it to get the value subset
v = [ x.values() for x in subsets ]
Really, the last part is written as v = np.array([np.array(list(x.values())) for x in subsets]) to get the ndarray:
[[[2 2] [1 2]]
[[5 1] [5 5]]
[[8 2] [3 3]]]
I was thinking there may be an approach using something like numpy.where to get the subset more efficiently, but I'm not sure.
I think your code can be simplified to:
In [383]: d_list = [d1, d2, d3]
In [388]: inter_keys = d_list[0].keys()
In [389]: for n in d_list[1:]:
     ...:     inter_keys &= n.keys()
     ...:
In [390]: inter_keys
Out[390]: {'a', 'b'}
In [391]: np.array([[n[k] for k in inter_keys] for n in d_list])
Out[391]:
array([[[1, 2],
        [2, 2]],

       [[5, 5],
        [5, 1]],

       [[3, 3],
        [8, 2]]])
That is, iteratively get the intersection of keys, followed by extraction of the values into a list of lists, which can be made into an array.
inter_keys starts as a dict.keys object, but becomes a set; both work with &=.
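To illustrate that type change (a small sketch using d1 and d2 from the question):
ks = d1.keys()       # a dict_keys view
ks &= d2.keys()      # equivalent to ks = ks & d2.keys(), which returns a plain set
print(type(ks))      # <class 'set'>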
I don't think there's a way around the double loop with dict indexing, n[k], at its core. Unless you can work directly with the values or items lists, there isn't a way around accessing dict items one by one.
The subsets list of dicts is an unnecessary intermediate step.
All the keys and values can be extracted into a list of lists, but that doesn't help with selecting a common subset:
In [406]: big_list = [list(d.items()) for d in d_list]
In [407]: big_list
Out[407]:
[[('a', [1, 2]), ('b', [2, 2])],
[('e', [3, 2]), ('b', [5, 1]), ('a', [5, 5])],
[('b', [8, 2]), ('a', [3, 3]), ('c', [1, 2])]]
Assuming that the lists of values in your dictionaries are of the same length, you can use this approach:
import numpy as np

d1 = {'a':[1,2], 'b':[2,2]}
d2 = {'e':[3,2], 'b':[5,1], 'a':[5,5]}
d3 = {'b':[8,2], 'a':[3,3], 'c':[1,2]}
d_list = [d1, d2, d3]

inter_map = {} if len(d_list) == 0 else d_list[0]
for d_it in d_list[1:]:
    # combine element lists based on the current intersection. keys that do not match once are removed from inter_map
    inter_map = {k: inter_map[k] + d_it[k] for k in d_it.keys() & inter_map.keys()}

# inter_map holds a key->value list mapping at this point
values = np.array([item for sublist in inter_map.values() for item in sublist]).reshape(
    [len(inter_map.keys()), 2 * len(d_list)])

# real_values restructures the values into the order used in your program, assumes you always have 2 values per sublist
real_values = np.zeros(shape=[len(d_list), 2 * len(inter_map.keys())])
for i, k in enumerate(inter_map.keys()):
    real_values[:, 2*i:2*(i+1)] = values[i].reshape([len(d_list), 2])
Please note that this code is not deterministic: the key order of inter_map comes from a set intersection, which is not guaranteed to be the same across different runs of the program.

Is there any way to use itertools groupby to remove adjacent duplicating items in a list, but keep the original index?

For example, if I have a list [1,2,2,3,2,2,1], I would like to return [1,2,3,2,1] with indices (0,1,3,4,6).
I'm not very familiar with groupby. I learned how to eliminate adjacent duplicate items, but is it possible to somehow also extract the indices with groupby()? Thank you.
Yes, it's possible: you can combine itertools.groupby with enumerate() and the right key=.
For example:
from itertools import groupby

lst = [1,2,2,3,2,2,1]
rv, index = [], []
for v, g in groupby(enumerate(lst), key=lambda k: k[1]):
    rv.append(v)
    index.append(next(g)[0])
print(rv)
print(index)
Prints:
[1, 2, 3, 2, 1]
[0, 1, 3, 4, 6]
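As a variation on the same idea, the (index, value) pairs can also be built in a single expression and then unzipped:
from itertools import groupby

lst = [1, 2, 2, 3, 2, 2, 1]
# take, for each group, the index of its first element together with the group key
index, rv = zip(*((next(g)[0], v) for v, g in groupby(enumerate(lst), key=lambda k: k[1])))
print(list(rv))     # [1, 2, 3, 2, 1]
print(list(index))  # [0, 1, 3, 4, 6]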

Subtract dictionary value from key

I have a dictionary like this:
a = {(8, 9): [[0, 0], [4, 5]], (3, 4): [[1, 2], [6, 7]]}
I would like to subtract the sum of the corresponding elements of the nested lists in the values from each element of each key, and replace the key with the result.
For example:
new_key[0] = 8 - (0+4) = 4, new_key[1] = 9 - (0+5) = 4
Hence the new key becomes (4, 4) and it replaces (8, 9)
I am not able to understand how to access the list of lists that is the value for each key!
Any ideas as to how to do this?
See Access item in a list of lists for indexing a list of lists.
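For instance, with the dictionary from the question, the nested value can be reached by chained indexing:
val = a[(8, 9)]   # the value is a list of lists: [[0, 0], [4, 5]]
val[0]            # first inner list: [0, 0]
val[1][0]         # first element of the second inner list: 4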
For your specific case, this should work
b = {(k[0]-v[0][0]-v[1][0], k[1]-v[0][1]-v[1][1]): v for k, v in a.items()}
Or, renaming the keys in place:
for key in list(a.keys()):
    new_key = []
    new_key.append(key[0] - (a[key][0][0] + a[key][1][0]))
    new_key.append(key[1] - (a[key][0][1] + a[key][1][1]))
    a[tuple(new_key)] = a.pop(key)  # pop() returns the value; the key must be a tuple, since lists aren't hashable
Iterate through the dictionary to get the keys and values and create a new dictionary b. Then just point a at the new dictionary b:
a = {(8, 9): [[0, 0], [4, 5]], (3, 4): [[1, 2], [6, 7]]}
b = {}
for key, val in a.items():
    new_key = (key[0]-(val[0][0]+val[1][0]), key[1]-(val[0][1]+val[1][1]))
    b[new_key] = val
a = b
del b
Try This:
b = {}
for key, value in a.items():
    new_key = key[0] - (value[0][0] + value[1][0])
    new_key_1 = key[1] - (value[0][1] + value[1][1])
    u_key = (new_key, new_key_1)
    b[u_key] = value
print(b)
The following code will work just as well with any size of key tuple (2, 5, 87, whatever).
There is no simple way to rename a dictionary key, but you can insert a new key and delete the old one. This isn't recommended for a couple of reasons:
to be safe, you'll need to check to make sure you're not doing something weird like creating a new key only to delete that same key
be careful when you iterate over a dictionary while changing it
If you need a dictionary result, the safest thing is to generate an entirely new dictionary based on a, as has been done here.
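For reference, the insert-and-delete rename pattern for a single key would look like this (a small sketch using the (8, 9) -> (4, 4) example from the question):
old_key = (8, 9)
new_key = (4, 4)
if new_key != old_key:              # avoid deleting the key we just created
    a[new_key] = a.pop(old_key)     # insert under the new key, drop the old one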
The problem you're trying to solve is easier if you transpose the dictionary values.
After calculating the new key, (8, 9): [[0, 0], [4, 5]] should become:
(8 - sum([0, 4]), 9 - sum([0, 5])): [[0, 0], [4, 5]]
Now see how transposing helps:
transposed([[0, 0], [4, 5]]) == [[0, 4], [0, 5]]
then the new key[0] calculation is:
key[0] - sum(transposed(values)[0])
and the new key[1] calculation is:
key[1] - sum(transposed(values)[1])
So transposing makes the calculation easier.
Python dictionaries can't have lists as keys (lists are not hashable) so I've built the key as a list, then converted it to a tuple at the end.
a = {
    (8, 9): [[0, 0], [4, 5]],
    (3, 4): [[1, 2], [6, 7]]
}

def transpose(m):
    return list(zip(*m))

results = {}
for source_keys, source_values in a.items():
    transposed_values = transpose(source_values)
    key = []
    for n, key_item in enumerate(source_keys):
        subtractables = sum(transposed_values[n])
        key.append(key_item - subtractables)
    results[tuple(key)] = source_values

print(results)
$ python transpose.py
{(4, 4): [[0, 0], [4, 5]], (-4, -5): [[1, 2], [6, 7]]}
The following solution works with three assumptions:
The keys are iterables of integers
The values are iterables of iterables of integers
The inner iterables of each value are the same length as the corresponding key
Practically, this means that the length of the value can be anything, as long as it's a 2D list, and that you can have keys of different lengths, even in the same dictionary, as long as the values match along the inner dimension.
You would want to transpose the values to make the sum easier to compute. The idiom zip(*value) lets you do this quite easily. Then you map sum onto that and subtract the result from the elements of the key.
Another thing to keep in mind is that replacing keys during iteration is a very bad idea. You're better off creating an entirely new dictionary to hold the updated mapping.
from operator import sub
from itertools import starmap

a = {(8, 9): [[0, 0], [4, 5]], (3, 4): [[1, 2], [6, 7]]}
b = {
    tuple(starmap(sub, zip(k, map(sum, zip(*v))))): v
    for k, v in a.items()
}
The result is
{(4, 4): [[0, 0], [4, 5]], (-4, -5): [[1, 2], [6, 7]]}
Here is an IDEOne Link to play with.
The full version of the loop would look like this:
b = {}
for k, v in a.items():
    vt = zip(*v)                 # transposed values
    sums = map(sum, vt)          # sum of the 1st elements, 2nd elements, etc.
    subs = zip(k, sums)          # match elements of the key with the sums to subtract
    diffs = starmap(sub, subs)   # subtract the sums from the key
    new_key = tuple(diffs)       # evaluate the generators
    b[new_key] = v

Filtering a (Nx1) list in Python

I have a list of the form
[(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
I want to scan the list and return the first elements of those pairs whose second value is repeated. (I apologize I couldn't frame this better.)
For example, in the given list there are the pairs (2,3) and (4,3); I see that 3 is repeated, so I wish to return 2 and 4. Similarly, from (3,4),(1,4),(5,4) I will return 3, 1, and 5 because 4 is repeated.
I have implemented the bubble search but that is obviously very slow.
for i in range(0, p):
    for j in range(i+1, p):
        if arr[i][1] == arr[j][1]:
            print(arr[i][0], arr[j][0])
How do I go about it?
You can use collections.defaultdict. This will return a mapping from the second item to a list of first items. You can then filter for repetition via a dictionary comprehension.
from collections import defaultdict

lst = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]

d = defaultdict(list)
for i, j in lst:
    d[j].append(i)
print(d)
# defaultdict(list, {3: [2, 4], 4: [3, 1, 5], 5: [6]})

res = {k: v for k, v in d.items() if len(v) > 1}
print(res)
# {3: [2, 4], 4: [3, 1, 5]}
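If you only need the first elements themselves rather than the grouping, you can flatten the filtered values:
flat = [i for v in res.values() for i in v]
print(flat)
# [2, 4, 3, 1, 5]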
Using numpy allows you to avoid explicit for loops:
import numpy as np
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
a = np.array(l)
items, counts = np.unique(a[:,1], return_counts=True)
is_duplicate = np.isin(a[:,1], items[counts > 1]) # get elements that have more than one count
print(a[is_duplicate, 0]) # return elements with duplicates
# tuple(map(tuple, a[is_duplicate, :])) # use this to get tuples in output
(toggle comment to get output in form of tuples)
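For the example list, the print shows the first elements whose second value is repeated:
[2 4 3 1 5]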
pandas is another option:
import pandas as pd

l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
df = pd.DataFrame(l, columns=['first', 'second'])
df.groupby('second').filter(lambda x: len(x) > 1)
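The filter keeps the rows whose 'second' value occurs more than once; to get just the first elements as a plain list, select that column:
kept = df.groupby('second').filter(lambda x: len(x) > 1)
print(kept['first'].tolist())
# [2, 4, 3, 1, 5]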

Removing Similarities Python

The list contains other lists:
L = [[3, 3], [4, 2], [3, 2]]
If the first element of a sublist is equal to the first element of another sublist, the one with the higher second element has to be removed from the whole list.
So the new list is:
L = [[4,2], [3,2]]
How to do this as efficiently as possible?
from collections import OrderedDict

L.sort(key=lambda x: x[1], reverse=True)
L = OrderedDict(L).items()
Why that works
If you do dict(L) with L a list or tuple of key/value pairs, this is more or less equivalent to:
{k: v for k, v in L}
As you can see, later values override prior values if duplicate keys (k) are present.
We can make use of this if we are able to put L in the correct order.
In your case, we don't really care about the order of the keys, but we want lower values (i.e. second elements of the sublists) to appear later. This way, any lower value overwrites a higher value with the same key.
It is sufficient to sort by the second elements of the sublists (in reverse order). Since list.sort() is stable this also preserves the original order of the entries as much as possible.
L.sort(key=lambda x: x[1], reverse=True)
collections.OrderedDict(L) now makes the elements unique by first element, keeping insertion order.
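Worked through on the example list:
L = [[3, 3], [4, 2], [3, 2]]
L.sort(key=lambda x: x[1], reverse=True)   # [[3, 3], [4, 2], [3, 2]] -- the stable sort keeps the original order of ties
dict(L)                                    # {3: 2, 4: 2} -- the later, smaller value wins for key 3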
The sort() is O(n log n) and the dict creation adds another O(n). It's possible to do without the sort:
d = OrderedDict()
for k, v in L:
    ev = d.get(k, None)
    # update the value: always if the key is not present, or conditionally
    # if the existing value is larger than the current value
    d[k] = v if ev is None or ev > v else ev
L = d.items()
But that is a lot more code and probably not at all, or not much, faster in pure Python.
Edits: (1) make it work with non-integer keys (2) It's enough to sort by second elements, no need for a full sort.
If you don't care about the ordering of the elements in the output list, then you can create a dictionary that maps first items to second items, then construct your result from the smallest values.
from collections import defaultdict

L = [[3, 3], [4, 2], [3, 2]]

d = defaultdict(list)
for k, v in L:
    d[k].append(v)
result = [[k, min(v)] for k, v in d.items()]
print(result)
Result:
[[3, 2], [4, 2]]
This is pretty efficient - O(n) average case, O(n*log(n)) worst case.
You can use this too, although it only removes sublists whose first and second elements are equal, which happens to be enough for this particular example:
x = [[3, 3], [4, 2], [3, 2]]
for i in x[:]:  # iterate over a copy so that removing items doesn't skip elements
    if i[0] == i[1]:
        x.pop(x.index(i))
