Removing duplicates from a 4D list

I'm working with a 4D list and trying to remove some duplicate inner lists. My attempt doesn't quite work; here is my code:
mylist = [[[], [[4, 3], [4, 3]], [[3, 2], [2, 3], [3, 4]]], [[[4, 2], [2, 3]], [[4, 3], [4, 3]], [[3, 2], [2, 3], [3, 4]]]]
final_list = []
for i in mylist:
    current = []
    for j in i:
        for k in j:
            for l in zip(k, k[1:]):
                if list(l) not in current:
                    current.append(list(l))
    final_list.append(current)
print(final_list)
final_list = [[[3, 2], [2, 3], [3, 4]], [[3, 2], [2, 3], [3, 4]]]
So instead of removing elements, I end up appending the values that are the same. This is my desired output, with the duplicate [4, 3] removed from the second sublist of each outer list:
final_list = [[[], [[4, 3]], [[3, 2], [2, 3], [3, 4]]], [[[4, 2], [2, 3]], [[4, 3]], [[3, 2], [2, 3], [3, 4]]]]
I think there should be an easier way than so many nested for loops, so any help would be appreciated. Thank you so much!

You can try this with itertools.groupby, which collapses runs of consecutive equal elements:
import itertools
final_list=[[list(sbls for sbls,_ in itertools.groupby(sbls)) for sbls in ls] for ls in mylist]
Same as:
final_list=[[[sbls[i] for i in range(len(sbls)) if i == 0 or sbls[i] != sbls[i-1]] for sbls in ls] for ls in mylist]
Both outputs:
final_list
[[[], [[4, 3]], [[3, 2], [2, 3], [3, 4]]],
[[[4, 2], [2, 3]], [[4, 3]], [[3, 2], [2, 3], [3, 4]]]]
It can be done manually as well, with for loops, similar to your original approach:
flist = []
for ls in mylist:
    new_ls = []
    for sbls in ls:
        new_sbls = []
        for elem in sbls:
            if elem not in new_sbls:
                new_sbls.append(elem)
        new_ls.append(new_sbls)
    flist.append(new_ls)
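The two approaches above behave differently when equal sublists are not adjacent: itertools.groupby only collapses consecutive repeats, while the membership test drops every later repeat. A minimal sketch of the difference, using hypothetical helper names (dedupe_consecutive, dedupe_all) and made-up sample data that are not part of the original answer:

```python
from itertools import groupby


def dedupe_consecutive(items):
    """Collapse runs of equal neighbours (what the groupby version does)."""
    return [key for key, _ in groupby(items)]


def dedupe_all(items):
    """Keep only the first occurrence of each item (what the loop version does)."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


# Illustrative data: [4, 3] repeats both adjacently and non-adjacently
sample = [[4, 3], [2, 3], [4, 3], [4, 3]]

print(dedupe_consecutive(sample))  # [[4, 3], [2, 3], [4, 3]]
print(dedupe_all(sample))          # [[4, 3], [2, 3]]
```

For the mylist in the question both give the same result, because its duplicates are always adjacent.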

You could use itertools.chain twice to reduce your list to a two-dimensional list. Now you can search for duplicates (e.g. by using count to count the number of occurrences; there are many solutions for this). Once you have found all your duplicate entries, iterate over your original list and remove all but one occurrence of each duplicate:
import itertools
flat_list = itertools.chain(*itertools.chain(*mylist))
# TODO find duplicates in flat list
duplicates = ...
# TODO remove all duplicates from the original list
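One possible way to approach the counting step, sketched under assumptions: collections.Counter is my choice here rather than something the answer prescribes, and the inner pairs are converted to tuples because lists cannot be dictionary keys. Note that counting over the flattened list flags pairs that repeat anywhere in the structure, which is broader than the per-sublist duplicates the question asks about.

```python
from collections import Counter
from itertools import chain

mylist = [[[], [[4, 3], [4, 3]], [[3, 2], [2, 3], [3, 4]]],
          [[[4, 2], [2, 3]], [[4, 3], [4, 3]], [[3, 2], [2, 3], [3, 4]]]]

# Flatten two levels so that only the innermost pairs remain
flat_list = chain.from_iterable(chain.from_iterable(mylist))

# Count each pair; tuples are hashable, lists are not
counts = Counter(tuple(pair) for pair in flat_list)

# Pairs that occur more than once anywhere in the structure
duplicates = [list(pair) for pair, n in counts.items() if n > 1]
print(duplicates)  # [[4, 3], [3, 2], [2, 3], [3, 4]]
```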

Related

How can I modify my code in order to avoid duplicate entries in this generator function?

The problem is as follows: Write a function choose_gen(S, k) that produces a generator that yields all the k-element subsets of a set S (represented as a sorted list of values without duplicates) in some arbitrary order.
Here is what I have so far:
def choose_gen(l: object, k: object) -> object:
    if k>len(l):
        return None
    elif k == len(l):
        yield sorted(l)
        return
    for i in l:
        aux = l[:]
        aux.remove(i)
        result = choose_gen(aux, k)
        if result:
            yield from result
It runs but does not avoid duplicate subsets. Could somebody please help me solve this issue? Thanks in advance.
An example input would be:
print([s for s in choose_gen([1,3,5,7], 2)])
actual output: [[5, 7], [3, 7], [3, 5], [5, 7], [1, 7], [1, 5], [3, 7], [1, 7], [1, 3], [3, 5], [1, 5], [1, 3]]
expected output: [[5, 7], [3, 7], [3, 5], [1, 7], [1, 5], [1, 3]]

I am not sure. But
I think that in the 6th line you have to write something after return. You have left it empty.

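Or try:
# Lists are unhashable, so convert each subset to a tuple before using it as a dict key
new_menu = [tuple(s) for s in choose_gen([1, 3, 5, 7], 2)]
# dict.fromkeys drops repeated keys while keeping first-seen order
final_new_menu = [list(t) for t in dict.fromkeys(new_menu)]
print(final_new_menu)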
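The duplicates arise because the same subset is reached by removing elements in different orders. A sketch of one way to avoid that by construction, assuming S is a list: fix the first element of the subset, then choose the remaining k - 1 elements only from positions after it. This is an alternative formulation, not the asker's original algorithm, and the order of the results differs from the expected output in the question (the task allows arbitrary order).

```python
def choose_gen(s, k):
    """Yield each k-element subset of the list s exactly once."""
    if k == 0:
        yield []
        return
    # Pick the subset's first element at index i, then draw the rest
    # only from the elements that come after it.
    for i in range(len(s) - k + 1):
        for rest in choose_gen(s[i + 1:], k - 1):
            yield [s[i]] + rest


print(list(choose_gen([1, 3, 5, 7], 2)))
# [[1, 3], [1, 5], [1, 7], [3, 5], [3, 7], [5, 7]]
```

When k is larger than len(s), the range is empty and the generator simply yields nothing. The standard library's itertools.combinations produces the same subsets as tuples.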

Combination of betting odds in Python

So I'm new to Python and I've decided to work on a project that I'm interested in. I've connected to an API to get betting odds from different bookies. I've successfully got the data and stored in a Sqlite3 database. The next step is to compare the odds, and this is where I'm getting stuck.
So let's say I have a list of odds from 3 bookies:
bookie1 = [1,2]
bookie2 = [3,4]
bookie3 = [5,6]
then I have the odds from all bookies in 1 list, such as:
bookies_all = [ [1,2], [3,4], [5,6] ]
How do I get the combinations of odds from the 3 bookies?
I expect the output to look something like this:
combos = [[1,3], [1,5], [1,4], [1,6], [2,3], [2,5], [2,4], [2,6], [3,5], [3,6],[4,5], [4,6]]
Is the best option to loop through the list?

I've coded this up and it gives me all the combinations I need.
bookies_all = [[1, 2], [3, 4], [5, 6]]
combos = []
count = 0
for outer in bookies_all:
    for inner in bookies_all:
        temp_list = [outer[0], inner[1]]
        count += 1
        combos.append(temp_list)
print(combos)
Output: [[1, 2], [1, 4], [1, 6], [3, 2], [3, 4], [3, 6], [5, 2], [5, 4], [5, 6]]
The combinations in bold are the ones I want. This code works for this example.
I will test it out for scenarios where the bookies_all list has more values.

You can use itertools.combinations to find the combinations of bookies, then use a list comprehension to interleave the items:
from itertools import combinations
bookies_all = [[1, 2], [3, 4], [5, 6]]
all_comb = list(combinations(bookies_all, 2))
#print(all_comb)
combos = [[i, j] for c in all_comb for i in c[0] for j in c[1]]
print(combos)
Output:
[[1, 3], [1, 4], [2, 3], [2, 4], [1, 5], [1, 6], [2, 5], [2, 6], [3, 5], [3, 6], [4, 5], [4, 6]]
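The nested loops inside that comprehension amount to a Cartesian product of each pair of bookies. As a sketch, the same result can be written with itertools.product (a swapped-in alternative, not part of the answer above):

```python
from itertools import combinations, product

bookies_all = [[1, 2], [3, 4], [5, 6]]

# For every pair of bookies, pair each odd of the first with each odd of the second
combos = [list(pair)
          for first, second in combinations(bookies_all, 2)
          for pair in product(first, second)]
print(combos)
# [[1, 3], [1, 4], [2, 3], [2, 4], [1, 5], [1, 6], [2, 5], [2, 6], [3, 5], [3, 6], [4, 5], [4, 6]]
```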

Remove sublist after first element appears n times [closed]

I have a long nested list. Each sublist contains 2 elements. What I would like to do is iterate over the full list and remove sublists once I've found the first element more than 3 times.
Example:
ls = [[1,1], [1,2], [1,3], [1,4], [2,2], [2,3], [3,4], [3,5], [3,6], [3,7]]
desired_result = [[1,1], [1,2], [1,3], [2,2], [2,3], [3,4], [3,5], [3,6]]

If the input is sorted by the first element, you could use groupby and islice:
from itertools import groupby, islice
from operator import itemgetter
ls = [[1, 1], [1, 2], [1, 3], [1, 4], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6], [3, 7]]
result = [e for _, group in groupby(ls, key=itemgetter(0)) for e in islice(group, 3)]
print(result)
Output
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]
The idea is to group the elements by the first value using groupby, and then fetch the first 3 values, if they exist, using islice.
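The sortedness condition matters because groupby only groups consecutive items. A small sketch with made-up unsorted input and a hypothetical helper name (first_n_per_run), showing what happens without sorting and after a stable sort:

```python
from itertools import groupby, islice
from operator import itemgetter

# Illustrative input that is NOT sorted by its first element
unsorted_ls = [[1, 1], [2, 2], [1, 2], [1, 3], [1, 4]]


def first_n_per_run(pairs, n=3):
    """Keep at most n items from each consecutive run sharing a first element."""
    return [e for _, group in groupby(pairs, key=itemgetter(0)) for e in islice(group, n)]


# groupby sees two separate runs of 1, so four sublists starting with 1 survive
print(first_n_per_run(unsorted_ls))
# [[1, 1], [2, 2], [1, 2], [1, 3], [1, 4]]

# Sorting first (sorted is stable) restores the intended cap, at the cost of reordering
print(first_n_per_run(sorted(unsorted_ls, key=itemgetter(0))))
# [[1, 1], [1, 2], [1, 3], [2, 2]]
```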

You can do it like below:
ls = [[1,1], [1,2], [1,3], [1,4], [2,2], [2,3], [3,4], [3,5], [3,6], [3,7]]
val_count = dict.fromkeys(set([i[0] for i in ls]), 0)
new_ls = []
for i in ls:
    if val_count[i[0]] < 3:
        val_count[i[0]] += 1
        new_ls.append(i)
print(new_ls)
Output:
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]

Probably not the shortest answer.
The idea is to count occurrences while you're iterating over ls
from collections import defaultdict
filtered_ls = []
counter = defaultdict(int)
for l in ls:
    counter[l[0]] += 1
    if counter[l[0]] > 3:
        continue
    filtered_ls += [l]
print(filtered_ls)
# [[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]

You can use collections.defaultdict to aggregate by first value in O(n) time. Then use itertools.chain to construct a list of lists.
from collections import defaultdict
from itertools import chain
dd = defaultdict(list)
for key, val in ls:
    if len(dd[key]) < 3:
        dd[key].append([key, val])
res = list(chain.from_iterable(dd.values()))
print(res)
# [[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]

Ghillas BELHADJ's answer is good, but you should consider defaultdict for this task. The idea is taken from Raymond Hettinger, who suggested using defaultdict for grouping and counting tasks:
from collections import defaultdict
def remove_sub_lists(a_list, nth_occurence):
    found = defaultdict(int)
    for sublist in a_list:
        first_index = sublist[0]
        print(first_index)  # debug output
        found[first_index] += 1
        if found[first_index] <= nth_occurence:
            yield sublist
max_3_times_first_index = list(remove_sub_lists(ls, 3))

If the list is already sorted, you can use itertools.groupby then just keep the first three items from each group
>>> import itertools
>>> ls = [[1,1], [1,2], [1,3], [1,4], [2,2], [2,3], [3,4], [3,5], [3,6], [3,7]]
>>> list(itertools.chain.from_iterable(list(g)[:3] for _,g in itertools.groupby(ls, key=lambda i: i[0])))
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 4], [3, 5], [3, 6]]

Here's an option that doesn't use any modules:
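countDict = {}
new_ls = []
for i in ls:
    if str(i[0]) not in countDict.keys():
        countDict[str(i[0])] = 1
    else:
        countDict[str(i[0])] += 1
    # Collect the kept sublists in a new list; calling ls.remove(i) while
    # iterating over ls would skip the element that follows each removal
    if countDict[str(i[0])] <= 3:
        new_ls.append(i)
ls[:] = new_ls  # update ls in place once the loop has finished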
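As a cautionary sketch: removing items from a list while a for loop is iterating over that same list makes the loop skip the element that follows each removal. The sample data and the helper names (cap_by_removing, cap_by_building) below are made up for illustration.

```python
def cap_by_removing(items, limit=3):
    """Mutates items while iterating over it: shown only to expose the pitfall."""
    counts = {}
    for item in items:
        counts[item[0]] = counts.get(item[0], 0) + 1
        if counts[item[0]] > limit:
            items.remove(item)  # shifts later elements left; the next one is skipped
    return items


def cap_by_building(items, limit=3):
    """Builds a new list and leaves the input untouched."""
    counts = {}
    result = []
    for item in items:
        counts[item[0]] = counts.get(item[0], 0) + 1
        if counts[item[0]] <= limit:
            result.append(item)
    return result


sample = [[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
print(cap_by_building(sample))        # [[1, 1], [1, 2], [1, 3]]
print(cap_by_removing(list(sample)))  # [[1, 1], [1, 2], [1, 3], [1, 5]]
```

With the list from the question the removing variant happens to give the right answer, because no first element occurs more than four times.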

Data structure to represent multiple equivalent keys in set in Python?

Currently, I want to find the right data structure to meet the following requirement.
There are multiple arrays with unordered elements, for example,
[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]
After processing the data, the result is
[1, 2], [2, 2, 3], [2], [1, 2, 3]
with the elements in each array sorted and duplicate arrays filtered out.
Here are my thoughts:
Data structure Set(Arrays)? - Failed. It seems there is only one array in the built-in set
set([])
Data structure Array(Sets)? - Failed. There are no duplicate elements in the built-in set. Is there a data structure in Python like multiset in C++?

Transform each list into a tuple (so that it can be an item of a set), then back into a list.
>>> [list(i) for i in set([tuple(sorted(i)) for i in a])]
[[1, 2], [2], [2, 2, 3], [1, 2, 3]]

lst = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
# In Python 3, map returns an iterator, so wrap the result in list()
list(map(list, set(map(tuple, map(sorted, lst)))))
Output:
[[1, 2], [2], [2, 2, 3], [1, 2, 3]]

Try this:
[list(i) for i in set(map(tuple, a))]
EDIT:
This assumes each list is already sorted. Thanks to #PM2RING for reminding me.
If not, add this line above:
a = [sorted(i) for i in a]
Thanks again to #PM2RING for the one-liner:
[list(i) for i in set(map(tuple, (sorted(i) for i in a)))]

Some of the solutions currently here are destroying ordering. I'm not sure if that's important to you or not, but here is a version which preserves original ordering:
>>> from collections import OrderedDict
>>> A = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
>>> [list(k) for k in OrderedDict.fromkeys(tuple(sorted(a)) for a in A)]
[[1, 2], [2, 2, 3], [2], [1, 2, 3]]
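Since Python 3.7, a plain dict also guarantees insertion order, so the same idea works without OrderedDict. A sketch assuming Python 3.7 or later:

```python
A = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

# dict keys are unique and keep first-seen order
unique = [list(k) for k in dict.fromkeys(tuple(sorted(a)) for a in A)]
print(unique)  # [[1, 2], [2, 2, 3], [2], [1, 2, 3]]
```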

No, Python doesn't have a built-in multiset; the closest equivalent in the standard modules is collections.Counter, which is a type of dictionary. A Counter may be suitable for your needs, but it's hard to tell without more context.
Note that sets do not preserve order of addition. If you need to preserve the initial ordering of the lists, you can do what you want like this:
data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]
a = set()
outlist = []
for s in data:
    t = tuple(sorted(s))
    if t not in a:
        a.add(t)
        outlist.append(list(t))
print(outlist)
output
[[1, 2], [2, 2, 3], [2], [1, 2, 3]]
If the number of input lists is fairly small you don't need the set (and the list<->tuple conversions), just test membership in outlist. However, that's not efficient for larger input lists since it performs a linear search on the list.
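To illustrate the collections.Counter suggestion, here is a hedged sketch of treating each list as a multiset. Freezing the Counter's items into a frozenset is my own assumption about how to make it usable as a set member, since Counter objects themselves are unhashable; the variable names are illustrative.

```python
from collections import Counter

data = [[1, 2], [2, 1], [3, 2, 2], [2], [2, 1, 3], [2, 2, 3]]

# A Counter compares equal regardless of element order and keeps multiplicities
print(Counter([3, 2, 2]) == Counter([2, 2, 3]))  # True
print(Counter([2, 2, 3]) == Counter([2, 3]))     # False

seen = set()
unique = []
for lst in data:
    # Freeze the (element, count) pairs so the multiset can live in a set
    key = frozenset(Counter(lst).items())
    if key not in seen:
        seen.add(key)
        unique.append(sorted(lst))
print(unique)  # [[1, 2], [2, 2, 3], [2], [1, 2, 3]]
```

For this particular task the sorted-tuple key shown above is simpler; a Counter-based key mainly helps when the elements are hashable but cannot be sorted.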

Python reverse sublist of multi dimensional list

I have the following list:
list = [[1, 2], [3, 4], [5, 6]]
How can I reverse each sublist? i.e.
list = [[2, 1], [4, 3], [6, 5]]

Use a list comprehension:
[sublist[::-1] for sublist in outerlist]
Demo:
>>> outerlist = [[1, 2], [3, 4], [5, 6]]
>>> [sublist[::-1] for sublist in outerlist]
[[2, 1], [4, 3], [6, 5]]
This produces a new list. You can also reverse sublists in place by calling the list.reverse() method on each one in a loop:
for sublist in outerlist:
    sublist.reverse()

The comprehension and slice syntax is great, but if you want the result to happen in-place with the same outer list, I suggest this might be more readable:
for elem in outerlist:
    elem.reverse()
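The practical difference between the two styles shows up when something else holds a reference to an inner list. A small sketch (the name first_sublist is illustrative):

```python
outerlist = [[1, 2], [3, 4], [5, 6]]
first_sublist = outerlist[0]  # a second reference to the same inner list

# Slicing builds new inner lists; the originals are untouched
reversed_copy = [sublist[::-1] for sublist in outerlist]
print(reversed_copy)   # [[2, 1], [4, 3], [6, 5]]
print(first_sublist)   # [1, 2]

# list.reverse() mutates each inner list, so every reference sees the change
for sublist in outerlist:
    sublist.reverse()
print(outerlist)       # [[2, 1], [4, 3], [6, 5]]
print(first_sublist)   # [2, 1]
```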
