I am looking for the right data structure to meet the following requirement.
There are multiple arrays with unordered elements, for example,
[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]
After processing the data, the result should be,
[1, 2], [2, 2, 3], [2], [1, 2, 3]
The elements in each array are sorted, and duplicate arrays are filtered out.
Here are my thoughts:
Data structure Set(Arrays)? - Failed. It seems the built-in set can only hold one array
set([])
Data structure Array(Sets)? - Failed. The built-in set does not allow duplicate elements. Is there a data structure in Python like multiset in C++?
Convert each list to a tuple (so it can be an item of a set), then back to a list.
>>> [list(i) for i in set([tuple(sorted(i)) for i in a])]
[[1, 2], [2], [2, 2, 3], [1, 2, 3]]
lst = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
list(map(list, set(map(tuple, map(sorted, lst)))))
Output:
[[1, 2], [2], [2, 2, 3], [1, 2, 3]]
Try this:
[list(i) for i in set(map(tuple, a))]
EDIT:
This assumes each list is already sorted. Thanks to #PM2RING for reminding me.
If not, add this line first:
a = [sorted(i) for i in a]
Thanks again to #PM2RING: one liner
[list(i) for i in set(map(tuple, (sorted(i) for i in a)))]
Some of the solutions currently here are destroying ordering. I'm not sure if that's important to you or not, but here is a version which preserves original ordering:
>>> from collections import OrderedDict
>>> A = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
>>> [list(k) for k in OrderedDict.fromkeys(tuple(sorted(a)) for a in A)]
[[1, 2], [2, 2, 3], [2], [1, 2, 3]]
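On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea can probably be written without OrderedDict. This is only a sketch under that version assumption; the variable name unique is chosen here for illustration.

```python
A = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed for plain dicts from Python 3.7 onward).
unique = [list(k) for k in dict.fromkeys(tuple(sorted(a)) for a in A)]

print(unique)
```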
No, Python doesn't have a built-in multiset; the closest equivalent in the standard modules is collections.Counter, which is a type of dictionary. A Counter may be suitable for your needs, but it's hard to tell without more context.
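As a rough sketch of how a Counter could stand in for a multiset here: two lists with the same elements and the same multiplicities produce equal Counters, and a frozenset of the Counter's (element, count) pairs is hashable, so it can serve as a deduplication key. The helper name multiset_key is made up for this illustration.

```python
from collections import Counter

data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

def multiset_key(seq):
    # Hashable fingerprint of a multiset: a frozenset of (element, count) pairs.
    return frozenset(Counter(seq).items())

seen = set()
unique = []
for seq in data:
    key = multiset_key(seq)
    if key not in seen:
        seen.add(key)
        unique.append(sorted(seq))

print(unique)
```

Unlike the tuple(sorted(...)) key, this fingerprint does not require the elements to be orderable, only hashable; the final sorted call is kept just to match the requested output.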
Note that sets do not preserve order of addition. If you need to preserve the initial ordering of the lists, you can do what you want like this:
data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
a = set()
outlist = []
for s in data:
    t = tuple(sorted(s))
    if t not in a:
        a.add(t)
        outlist.append(list(t))
print(outlist)
output
[[1, 2], [2, 2, 3], [2], [1, 2, 3]]
If the number of input lists is fairly small you don't need the set (and the list<->tuple conversions), just test membership in outlist. However, that's not efficient for larger input lists since it performs a linear search on the list.
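A minimal sketch of that simpler variant, assuming the input is small enough that a linear membership test on outlist is acceptable:

```python
data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

outlist = []
for s in data:
    t = sorted(s)
    # Linear search through outlist: fine for a few lists, slow for many.
    if t not in outlist:
        outlist.append(t)

print(outlist)
```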
I have a dictionary in which each key has a list of lists (a nested list) as its value. For example, imagine we have:
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}
My question is: how can I access each element of the dictionary and concatenate the sublists with the same index? For example, the first list from every key:
[1,2] from the first key +
[2,1] from the second and
[1,5] from the third one
How can I do this?
You can access the nested lists while iterating through the dictionary, add each one to a new list, and then apply the sum function.
Code:
x={1: [[1,2],[3,5]] , 2:[[2,1],[2,6]], 3:[[1,5],[5,4]]}
ans=[]
for key in x:
    ans += x[key][0]
print(sum(ans))
Output:
12
Assuming you want a list of the first elements, you can do:
>>> x={1: [[1,2],[3,5]] , 2:[[2,1],[2,6]], 3:[[1,5],[5,4]]}
>>> y = [a[0] for a in x.values()]
>>> y
[[1, 2], [2, 1], [1, 5]]
If you want the second element, you can use a[1], etc.
The output you expect is not entirely clear (do you want to sum? concatenate?), but what seems clear is that you want to handle the values as matrices.
You can use numpy for that:
summing the values
import numpy as np
sum(map(np.array, x.values())).tolist()
output:
[[4, 8], [10, 15]] # [[1+2+1, 2+1+5], [3+2+5, 5+6+4]]
concatenating the matrices (horizontally)
import numpy as np
np.hstack(list(map(np.array, x.values()))).tolist()
output:
[[1, 2, 2, 1, 1, 5], [3, 5, 2, 6, 5, 4]]
As explained in How to iterate through two lists in parallel?, zip does exactly that: iterates over a few iterables at the same time and generates tuples of matching-index items from all iterables.
In your case, the iterables are the values of the dict. So just unpack the values to zip:
x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}
for y in zip(*x.values()):
    print(y)
Gives:
([1, 2], [2, 1], [1, 5])
([3, 5], [2, 6], [5, 4])
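If the goal is to concatenate the same-index sublists rather than just group them, one possible sketch combines zip with itertools.chain.from_iterable. The name concatenated is chosen here for illustration, and this assumes "concatenate" means joining the sublists end to end.

```python
from itertools import chain

x = {1: [[1, 2], [3, 5]], 2: [[2, 1], [2, 6]], 3: [[1, 5], [5, 4]]}

# zip groups the sublists by index; chain.from_iterable joins each group
# into one flat list.
concatenated = [list(chain.from_iterable(group)) for group in zip(*x.values())]

print(concatenated)
```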
Please could I have help with the following query in Python 3.9.
I have the following sublists:
[0, 1]
[1, 3]
[2, 5]
I would like to make a new list with each of these sublists repeated a different number of times. Required output:
[[0,1],[0,1],[0,1],[1,3],[1,3],[2,5],[2,5],[2,5],[2,5]]
I have tried doing the following:
[[[0,1]]*3,[[1,3]]*2,[[2,5]]*4]
However I get this:
[[[0,1],[0,1],[0,1]],[[1,3],[1,3]],[[2,5],[2,5],[2,5],[2,5]]]
How do I get my desired output? Or alternatively, how do I just flatten it by one level? Thank you
You can just unpack the sublists:
[*[[0,1]]*3, *[[1,3]]*2, *[[2,5]]*4]
# [[0, 1], [0, 1], [0, 1], [1, 3], [1, 3], [2, 5], [2, 5], [2, 5], [2, 5]]
Note, however, that the resulting sublists are not independent: the repeated entries are references to the same list object, so a change made to one of them shows up in all of its copies! Better use generators/comprehensions:
[*([0,1] for _ in range(3)),
*([1,3] for _ in range(2)),
*([2,5] for _ in range(4))]
# [[0, 1], [0, 1], [0, 1], [1, 3], [1, 3], [2, 5], [2, 5], [2, 5], [2, 5]]
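The difference between the two approaches can be seen in a short sketch; the names shared and independent are made up for this illustration.

```python
# Multiplying a list of lists repeats references to the same inner list.
shared = [*[[0, 1]] * 3]
shared[0].append(99)

# A comprehension evaluates the [0, 1] literal anew on every iteration.
independent = [[0, 1] for _ in range(3)]
independent[0].append(99)

print(shared)       # the change appears in every entry
print(independent)  # the change appears in the first entry only
```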
The more general question of 1-level flattening has been asked and answered multiple times, but the main options are the nested comprehension:
[x for sub in lst for x in sub]
or itertools.chain.from_iterable:
from itertools import chain
[*chain.from_iterable(lst)]
You can build the list with generator expressions:
r1 = [0, 1]
r2 = [1, 3]
r3 = [2, 5]
h = [*(r1 for x in range(3)),
*(r2 for x in range(2)),
*(r3 for x in range(4))]
print(h)
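When the sublists and their repeat counts are available as data, the same result can probably be produced with a single comprehension. This is a sketch only; the names sublists and counts are assumptions, and list(sub) is used so that every repeated entry is an independent copy.

```python
sublists = [[0, 1], [1, 3], [2, 5]]
counts = [3, 2, 4]

# Pair each sublist with its count, then emit a fresh copy that many times.
result = [list(sub) for sub, n in zip(sublists, counts) for _ in range(n)]

print(result)
```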
I have the following code to remove from the data list all sublists of which nums is a subset, and I don't understand why it's not working:
data=[[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
nums=[1,2]
for each in data:
    if set(nums).issubset(each):
        data.remove(each)
print(data)
>>[[1, 2, 4], [1, 3, 4], [2, 3, 4]]
Why isn't [1,2,4] being removed when nums is a subset of it, as seen below?
set(nums).issubset([1,2,4])
>>True
You're modifying the list you're iterating over.
This is a nicer solution:
data=[[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
nums=[1,2]
data = [each for each in data if not set(nums).issubset(each)]
print(data)
For learning purposes, see this code, which also works. The difference from your code is that here we're not modifying the data list inside the for loop.
data=[[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
nums=[1,2]
new_data = []
for each in data:
    if not set(nums).issubset(each):
        new_data.append(each)
data = new_data
print(data)
Because the iterator is unaware that you removed an element. After the removal, every later element shifts one position to the left, so when the iterator moves on to the second position it finds [1, 3, 4], meaning that [1, 2, 4] was skipped.
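The skipping can be made visible by recording which elements the loop actually visits. This is a small diagnostic sketch; the visited list is added here purely for illustration.

```python
data = [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]
nums = [1, 2]

visited = []
for each in data:
    visited.append(each)
    if set(nums).issubset(each):
        # Removing shifts later elements left, but the iterator's
        # internal index still advances by one.
        data.remove(each)

print(visited)  # [1, 2, 4] never shows up here
print(data)
```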
For your information, there is also a filterfalse function in the very useful itertools module.
from itertools import filterfalse
data = list(filterfalse(set(nums).issubset, data))
I have the following list:
list = [1, 2, 3, [3, [1, 2]]]
The desired result is:
[[[2, 1], 3], 3, 2, 1]
How can I sort that list by the size of each sublist and by element?
Here's one way to recursively sort the list:
def recursive_sort(item):
    if isinstance(item, list):
        item[:] = sorted(item, key=recursive_sort)
        return 0, -len(item)
    else:
        return 1, -item
lst = [1, 2, 3, [3, [1, 2], [2, 3, 6]]]
print(sorted(lst, key=recursive_sort))
# [[[6, 3, 2], [2, 1], 3], 3, 2, 1]
Caveat: This is more of an academic exercise and should never be used in production code. The state of the list during a sort (at least with Timsort in CPython) is undefined, so you shouldn't count on this to always work.
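A variant that avoids mutating anything from inside a key function might look like the sketch below. It builds new sorted lists bottom-up and uses the same ordering rule (lists first, longest first; then numbers in descending order). The function names sort_key and sorted_nested are made up for this illustration.

```python
def sort_key(item):
    # Lists sort before numbers; longer lists first; larger numbers first.
    if isinstance(item, list):
        return (0, -len(item))
    return (1, -item)

def sorted_nested(item):
    # Sort the innermost lists first, then sort the current level.
    if isinstance(item, list):
        return sorted((sorted_nested(sub) for sub in item), key=sort_key)
    return item

lst = [1, 2, 3, [3, [1, 2]]]
print(sorted_nested(lst))
```

The input list is left untouched, since every level is rebuilt with sorted rather than modified in place.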
Does Python offer a way to iterate over all "consecutive sublists" of a given list L - i.e. sublists of L where any two consecutive elements are also consecutive in L - or should I write my own?
(Example: if L = [1, 2, 3], then the set over which I want to iterate is {[1], [2], [3], [1, 2], [2,3], [1, 2, 3]}. [1, 3] is skipped since 1 and 3 are not consecutive in L.)
I don't think there's a built-in for exactly that, but it probably wouldn't be too difficult to code up by hand: you're basically just looping through all of the possible lengths from 1 to len(L), and then taking all sublists of each length.
You could probably use itertools.chain() to combine the sequences for each sublist length together into a generator for all of them.
Example:
>>> import itertools
>>> a = [1,2,3,4]
>>> list(
...     itertools.chain(
...         *[[a[i:i+q] for q in range(1,len(a)-i+1)] for i in range(len(a))]
...     )
... )
[[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [2], [2, 3], [2, 3, 4], [3], [3, 4], [4]]
If you prefer them in the increasing-length-and-then-lexicographical-order sequence that you described, you'd want this instead:
itertools.chain(*[[a[q:i+q] for q in range(len(a)-i+1)] for i in range(1,len(a)+1)])
Try something like this:
def iter_sublists(l):
    n = len(l)+1
    for i in range(n):
        for j in range(i+1, n):
            yield l[i:j]
>>> print(list(iter_sublists([1,2,3])))
[[1], [1, 2], [1, 2, 3], [2], [2, 3], [3]]
This should work:
def sublists(lst):
    for sublen in range(1,len(lst)+1):
        for idx in range(0,len(lst)-sublen+1):
            yield lst[idx:idx+sublen]
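A compact alternative sketch, assuming Python 3: every consecutive sublist corresponds to one (start, stop) pair of slice boundaries with start < stop, and itertools.combinations can enumerate those pairs. The function name consecutive_sublists is made up for this illustration.

```python
from itertools import combinations

def consecutive_sublists(lst):
    # Each pair of distinct boundaries 0 <= start < stop <= len(lst)
    # defines exactly one non-empty contiguous slice.
    for start, stop in combinations(range(len(lst) + 1), 2):
        yield lst[start:stop]

print(list(consecutive_sublists([1, 2, 3])))
```

This yields the sublists grouped by starting position rather than by length.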