I have a fairly large list of pairs of integers (List[List[int]]).
How can I deduplicate them, treating pairs that differ only in order (like [2, 9] and [9, 2]) as equal? Preferably without using third-party libraries.
I've seen several solutions that use numpy, but I'm unlikely to be able to use it in olympiads.
# Example input:
nums = [[2, 9], [3, 6], [9, 2], [6, 3]]
for i in nums:
    # some code here
# Output:
# nums = [[2, 9], [3, 6]]
I tried doing this, but I guess it's not a very fast solution, since the not in test scans the whole unique list for every pair:
# Example input:
nums = [[2, 9], [3, 6], [9, 2], [6, 3]]
unique = []
for i in nums:
    if sorted(i) not in unique:
        unique.append(sorted(i))
# Output:
print(unique) # [[2, 9], [3, 6]]
To deal with sets not being hashable, you can create a set of frozensets this way:
unique = {frozenset(i) for i in nums}
Then you can use whichever means to turn the results into the objects you want; for example:
unique = [list(i) for i in unique]
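Put together as one runnable snippet (one caveat: a degenerate pair such as [2, 2] collapses to the single-element frozenset({2}), so this assumes the two entries of each pair are distinct):

```python
nums = [[2, 9], [3, 6], [9, 2], [6, 3]]

# frozensets are hashable, so they can live inside a set;
# building the set discards order-only duplicates
unique = [list(fs) for fs in {frozenset(pair) for pair in nums}]

# neither the order of the pairs nor the order of the elements
# within each pair is guaranteed
print(unique)
```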
Turn each pair into a sorted tuple, put them all into a set, then turn that back into a list of lists.
>>> nums = [[2, 9], [3, 6], [9, 2], [6, 3]]
>>> {tuple(sorted(n)) for n in nums}
{(2, 9), (3, 6)}
>>> [list(t) for t in {tuple(sorted(n)) for n in nums}]
[[2, 9], [3, 6]]
The tuple is necessary because a set (which is created via the {} set comprehension expression) needs to contain hashable (immutable) objects.
This answer is wrong. The set constructor does not sort its elements.
As #Swifty mentioned, you can use set to solve this problem. Send each pair through the set constructor to discard the order of its elements, then convert the result to a tuple to make it hashable, and use set again to remove duplicate tuples.
nums = [[2, 9], [3, 6], [9, 2], [6, 3]]
num_tuples = set(tuple(set(pair)) for pair in nums)
print(num_tuples) # {(9, 2), (3, 6)}
Warning: As pointed out by #Samwise
This is a little dodgy because you're assuming that tuple(set(pair)) will deterministically create the same tuple ordering for any given set. This is probably true in practice (IIRC when you iterate over a set the items always come out in hash order, at least in CPython 3) but I'm not sure it's necessarily guaranteed by the Python spec.
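If you want to avoid that assumption entirely, canonicalize each pair with sorted() before making the tuple, as in the earlier answer; this gives a deterministic ordering regardless of how sets happen to iterate:

```python
nums = [[2, 9], [3, 6], [9, 2], [6, 3]]

# sorting each pair yields one canonical tuple per unordered pair,
# so no reliance on set iteration order is needed
num_tuples = {tuple(sorted(pair)) for pair in nums}

print(num_tuples)  # (2, 9) and (3, 6), in some set order
```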
Related
I have two lists like the following:
A = [[1, 2, 3], [1, 2, 4], [4, 5, 6]]
and
B = [[1, 2, 3], [1, 2, 6], [4, 5, 6], [4, 3, 6]]
And I wish to calculate the difference, which is equal to the following:
A - B =[[1, 2, 4]]
In other words, I want to treat A and B as a set of lists (all of the sample size, in this example it is 3) and find the difference (i.e, remove all lists in B, which are also in A and return the rest.).
Is there a faster way than using multiple for loops for this?
Simple list comprehension will do the trick:
[a for a in A if a not in B]
output:
[[1, 2, 4]]
If you convert the second list to a set first, then membership tests are asymptotically faster; the downside is you have to convert the rows to tuples so that they can be in a set. (Consider having the rows as tuples instead of lists in the first place.)
def list_of_lists_subtract(a, b):
    b_set = {tuple(row) for row in b}
    return [row for row in a if tuple(row) not in b_set]
Note that "asymptotically faster" only means this should be faster for large inputs; the simpler version will likely be faster for small inputs. If performance is critical then it's up to you to benchmark the alternatives on realistic data.
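For example, with the lists from the question (definition repeated here so the snippet is runnable on its own):

```python
def list_of_lists_subtract(a, b):
    # build a set of tuples once, so each membership test is O(1) on average
    b_set = {tuple(row) for row in b}
    return [row for row in a if tuple(row) not in b_set]

A = [[1, 2, 3], [1, 2, 4], [4, 5, 6]]
B = [[1, 2, 3], [1, 2, 6], [4, 5, 6], [4, 3, 6]]

print(list_of_lists_subtract(A, B))  # [[1, 2, 4]]
```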
You can try this.
Convert the first list of lists to a set of tuples S1
Convert the second list of lists to a set of tuples S2
Use the difference method or simply S1 - S2 to get the set of tuples that are present in S1 but not in S2
Convert the result obtained to the desired format (in this case, a list of lists).
# (Untested)
A = [[1, 2, 3], [1, 2, 4], [4, 5, 6]]
B = [[1, 2, 3], [1, 2, 6], [4, 5, 6], [4, 3, 6]]
set_A = {tuple(item) for item in A}
set_B = {tuple(item) for item in B}
difference_set = set_A - set_B
difference_list = [list(item) for item in sorted(difference_set)]
print(difference_list)
I have the list
a = [([4, 7, 9], [3], 5.5), ([2, 5, 8], [3], 5.5), ([3], [4, 7, 9], 5.5), ([3], [2, 5, 8], 5.5)]
and I am trying to remove duplicate tuples that have the same combination of lists.
For example, ([4, 7, 9], [3], 5.5) and ([3], [4, 7, 9], 5.5) are the same. So the output after removing the duplicate tuples will look something like:
a = [([4, 7, 9], [3], 5.5), ([2, 5, 8], [3], 5.5)]
with any order of the lists in the tuples allowed.
Edit (based on #DYZ's feedback): Fully flattened tuples are not allowed. For example, (4,7,9,3,5.5) is not allowed. The output should still be of the form: ([list 1], [list2], constant).
I tried to adapt a method that is related in Remove duplicated lists in list of lists in Python, but I have reached a mental deadlock..
Is it possible to modify the code further in the linked question, or is there a more efficient way to do this?
Sort the elements of each tuple in a by their length (treating elements which aren't lists as having length -1). Then find the indices of the unique elements of the resulting list, and use those to index into the unsorted list.
asort = [sorted(aa, key=lambda x: len(x) if isinstance(x, list) else -1) for aa in a]
inds = [i for i,x in enumerate(asort) if asort.index(x)==i]
a = [a[i] for i in inds]
You can use a dictionary for this job. Create an empty dictionary:
from itertools import chain
d = {}
Insert each tuple and its flattened form into the dictionary as the value and the key, respectively:
for t in a:
    # Flatten the tuple
    flat = chain.from_iterable(part if isinstance(part, list) else [part]
                               for part in t)
    maps_to = frozenset(flat)  # Sets cannot be used as keys; frozensets can
    d[maps_to] = t  # Add it to the dict; the most recent addition "survives"
list(d.values())
#[([3], [4, 7, 9], 5.5), ([3], [2, 5, 8], 5.5)]
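One caveat with a frozenset key: it discards multiplicity, so e.g. ([3], [3], 5.5) and ([3], 5.5) would collide. If that matters for your data, a sketch of an alternative (hypothetical) canonical key keeps every part and only normalizes their order; sorting by repr here is just one way to get a deterministic order for mixed tuple/float parts:

```python
a = [([4, 7, 9], [3], 5.5), ([2, 5, 8], [3], 5.5),
     ([3], [4, 7, 9], 5.5), ([3], [2, 5, 8], 5.5)]

d = {}
for t in a:
    # make each part hashable, then sort the parts so that their
    # order inside the tuple doesn't affect the key
    key = tuple(sorted(((tuple(p) if isinstance(p, list) else p) for p in t),
                       key=repr))
    d[key] = t  # the most recent duplicate "survives"

print(list(d.values()))
# [([3], [4, 7, 9], 5.5), ([3], [2, 5, 8], 5.5)]
```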
So I currently have a nested list.
org_network=[[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
I need to figure out how to manipulate it to create lists of possible combinations of the nested lists. These combinations cannot have lists that share numbers. Here is an example of what the result should be:
network_1=[[1,2,3],[7,9,10]]
network_2=[[1,4,5],[7,9,10]]
network_3=[[1,3,6],[7,9,10]]
Note:
1. This code is going to be linked to a constantly updated csv file, so the org_network list will have varying amounts of elements within it (which also means that there will be numerous resulting networks).
I have been working on this for about four hours and have yet to figure it out. Any help would be very appreciated. I have primarily been trying to use for loops and any() functions to no avail. Thanks for any help.
You can use itertools.combinations() with set intersection:
>>> from itertools import combinations
>>> org_network=[[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
>>> [[x, y] for x, y in combinations(org_network, r=2) if not set(x).intersection(y)]
[[[1, 2, 3], [7, 9, 10]], [[1, 4, 5], [7, 9, 10]], [[1, 3, 6], [7, 9, 10]]]
Here is an approach that will be efficient if the number of unique elements is small relative to the number of sets.
Steps:
For each unique element, store indices of all sets in which the element does not occur.
For each set s in the network, find all other sets that contain every element of s using data from the first step.
Iterate over pairs, discarding duplicates based on ID order.
from functools import reduce
org_network = [[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
# convert to sets
sets = [set(lst) for lst in org_network]
# all unique numbers
uniqs = set().union(*sets)
# map each unique number to sets that do not contain it:
other = {x: {i for i, s in enumerate(sets) if x not in s} for x in uniqs}
# iterate over sets:
for i, s in enumerate(sets):
    # find all sets not overlapping with set i
    no_overlap = reduce(lambda l, r: l.intersection(r), (other[x] for x in s))
    # iterate over non-overlapping sets
    for j in no_overlap:
        # discard duplicates
        if j <= i:
            continue
        print([org_network[i], org_network[j]])
# result
# [[1, 2, 3], [7, 9, 10]]
# [[1, 4, 5], [7, 9, 10]]
# [[1, 3, 6], [7, 9, 10]]
Edit: If combinations of size greater than two are required, it is possible to modify the above approach. Here is an extension that uses depth-first search to traverse all pairwise disjoint combinations.
def not_overlapping(set_ids):
    candidates = reduce(
        lambda l, r: l.intersection(r), (other[x] for sid in set_ids for x in sets[sid])
    )
    mid = max(set_ids)
    return {c for c in candidates if c > mid}

# this will also produce "combinations" consisting of a single element
def iter_combinations():
    combs = [[i] for i in range(len(sets))]
    while combs:
        comb = combs.pop()
        extension = not_overlapping(comb)
        combs.extend(comb + [e] for e in extension)
        yield [org_network[i] for i in comb]

def iter_combinations_long():
    for comb in iter_combinations():
        if len(comb) > 1:
            yield comb
all_combs = list(iter_combinations_long())
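If it helps to sanity-check the results, here's a much less efficient brute-force sketch that simply tests every combination of two or more sets for pairwise disjointness; it should agree with the approach above on small inputs:

```python
from itertools import combinations

org_network = [[1, 2, 3], [1, 4, 5], [1, 3, 6], [7, 9, 10]]
sets = [set(lst) for lst in org_network]

def disjoint_combinations(org_network, sets):
    results = []
    for r in range(2, len(sets) + 1):
        for idxs in combinations(range(len(sets)), r):
            # pairwise disjoint iff the union is as large as the summed sizes
            union = set().union(*(sets[i] for i in idxs))
            if len(union) == sum(len(sets[i]) for i in idxs):
                results.append([org_network[i] for i in idxs])
    return results

print(disjoint_combinations(org_network, sets))
# [[[1, 2, 3], [7, 9, 10]], [[1, 4, 5], [7, 9, 10]], [[1, 3, 6], [7, 9, 10]]]
```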
How could I use a list or tuple as a key for a dictionary in Python? Let us suppose I have a set of subsets as L = [[1, 2, 3], [4, 5], [6, 19]]. Now I want to store a value for each subset. How could I handle this in Python?
One way you can do it is, to convert the individual elements as the key.
Example:
{"".join(map(str, x)): x for x in L}
This would give an output of
{'123': [1, 2, 3], '45': [4, 5], '619': [6, 19]}
for your example
Note that it is not the most efficient way to go about it.
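Alternatively, tuples are hashable, so (unlike lists) they can be used as dict keys directly; this also sidesteps ambiguities like [1, 23] and [12, 3] both joining to '123'. A small sketch, where the stored values are just the subset sums as a stand-in for whatever you want to store:

```python
L = [[1, 2, 3], [4, 5], [6, 19]]

# tuple(...) makes each subset hashable so it can serve as a key
d = {tuple(subset): sum(subset) for subset in L}

print(d[(4, 5)])  # 9
```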
Hello, I've been coding for a couple of months now and know the basics, but I'm having a set membership problem for which I can't find a solution.
I have a list of lists of pairs of integers, and I want to remove the lists that have the integer "a" in them. I thought using sets was the easiest way. Below is the code:
## This is the item to test against.
a = set([3])
## This is the list to test.
groups = [[3, 2], [3, 4], [1, 2], [5, 4], [4, 3]]
## This is a list that will contain the lists present
## in groups which do not contain "a"
groups_no_a = []
for group in groups:
    group = set(group)
    if a in group:
        groups_no_a.append(group)
    ## I thought the problem had something to do with
    ## clearing the variable so I put this in,
    ## but to no remedy.
    group.clear()
print groups_no_a
I had also tried using s.issubset(t), until I realized that it tests whether every element of s is in t.
Thank you!
You want to test if there is no intersection:
if not a & group:
or
if not a.intersection(group):
or, inversely, that the sets are disjoint:
if a.isdisjoint(group):
The method forms take any iterable; you don't even have to turn group into a set for that. The following one-liner would work too:
groups_no_a = [group for group in groups if a.isdisjoint(group)]
Demo:
>>> a = set([3])
>>> groups = [[3, 2], [3, 4], [1, 2], [5, 4], [4, 3]]
>>> [group for group in groups if a.isdisjoint(group)]
[[1, 2], [5, 4]]
If all you are testing for is one element, then creating sets may cost more in performance than what you gain in membership testing, and it may be faster to just do:
3 not in group
where group is a short list.
You can use the timeit module to compare pieces of Python code to see what works best for your specific typical list sizes.
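For instance, a rough timeit sketch along these lines (the group contents and iteration count here are made up; measure with your own typical data):

```python
import timeit

group = [3, 2, 7, 8]
a = {3}  # set built once, outside the timed statement

# plain membership test on the short list
t_list = timeit.timeit("3 not in group", globals={"group": group},
                       number=100_000)

# set-based disjointness test
t_set = timeit.timeit("a.isdisjoint(group)", globals={"a": a, "group": group},
                      number=100_000)

print(t_list, t_set)
```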
Maybe you could use a list comprehension:
a = 3
groups = [[3, 2], [3, 4], [1, 2], [5, 4], [4, 3]]
print [x for x in groups if a not in x]
Edit based on a comment:
Well to those curious, what I want to do is: I have a list like the following: [ [error, [ [group_item_1, group_item_2], [...], [...], [...] ] ], [more like the previous], [...] ], and I want to get the item with least error that doesn't have "a" in group_item_1 or group_item_2. The lists are already sorted by error. I sorta almost got it :D
This should do the trick:
from itertools import chain, ifilter
def flatten(listOfLists):
"Flatten one level of nesting"
return chain.from_iterable(listOfLists)
errors_list = [ ['error0', [ [30, 2], [3, 4], [1, 2], [5, 4], [4, 3] ] ], ['error1', [ [31, 2], [3, 4], [1, 2], [5, 4], [4, 3] ] ] ]
a = 30
result = next(ifilter(lambda err: a not in flatten(err[1]), reversed(errors_list)), None)
print result #finds error1 as it has no 30 on its list
Rather than making a = set([3]), why not do the following?
a = 3
groups = [[3, 2], [3, 4], [1, 2], [5, 4], [4, 3]]
groups_no_a = [group for group in groups if a not in group]
You don't need to use sets here, you can test for membership of elements in lists. You also seem to have in, where I think you should have not in.
This code is similar to yours, and should work:
## This is the item to test against.
a = 3
## This is the list to test.
groups = [[3, 2], [3, 4], [1, 2], [5, 4], [4, 3]]
## This is a list that will contain the lists present
## in groups which do not contain a
groups_no_a = []
for group in groups:
    if a not in group:
        groups_no_a.append(group)
print groups_no_a
However, a shorter, more Pythonic way uses list comprehensions:
groups_no_a = [i for i in groups if a not in i]
If you are testing whether an item is in a much longer list, you should use sets instead for performance.