Remove duplicates and combine multiple lists into one?

How do I remove duplicates and combine multiple lists into one like so:
function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]) should return exactly:
[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]

The easiest way is to use a defaultdict.
>>> from collections import defaultdict
>>> l = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
>>> d = defaultdict(list)
>>> for i, j in l:
...     d[i].append(j)  # append the value to the list stored under the key
...
>>> d
defaultdict(<class 'list'>, {'hello': ['me.txt'], 'good': ['me.txt', 'money.txt'], 'rep': ['money.txt']})
>>> # to get it as a list
>>> out = [[key, d[key]] for key in d]
>>> out
[['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]

Try This ( no library needed ):
your_input_data = [ ["hello","me.txt"], ["good","me.txt"], ["good","me.txt"], ["good","money.txt"], ["rep", "money.txt"] ]
my_dict = {}
for box in your_input_data:
    if box[0] in my_dict:
        # key already seen: collect only the values not stored yet
        buffer_items = []
        for items in box[1:]:
            if items not in my_dict[box[0]]:
                buffer_items.append(items)
        remove_dup = list(set(buffer_items + my_dict[box[0]]))
        my_dict[box[0]] = remove_dup
    else:
        # first time this key appears: store its values
        buffer_items = []
        for items in box[1:]:
            buffer_items.append(items)
        remove_dup = list(set(buffer_items))
        my_dict[box[0]] = remove_dup
last_point = [[keys, values] for keys, values in my_dict.items()]
print(last_point)
Good Luck ...
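Because the code above passes each list through set(), the order of the file names in the result is not guaranteed. A minimal sketch that drops duplicates while keeping first-seen order, assuming Python 3.7+ (where plain dicts preserve insertion order). The name combine_unique is made up for this illustration:

```python
def combine_unique(pairs):
    """Group values by key, dropping repeated values but keeping first-seen order."""
    grouped = {}
    for key, *values in pairs:
        # an inner dict acts as an ordered set: its keys are unique
        # and keep insertion order (Python 3.7+)
        bucket = grouped.setdefault(key, {})
        for value in values:
            bucket[value] = None
    return [[key, list(bucket)] for key, bucket in grouped.items()]


data = [["hello", "me.txt"], ["good", "me.txt"], ["good", "me.txt"],
        ["good", "money.txt"], ["rep", "money.txt"]]
result = combine_unique(data)
print(result)
# [['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]
```

Wrap the return value in sorted() if the keys must come back in alphabetical order, as the question asks.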

You can do it with a plain dictionary too.
In [30]: l1 = [["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
In [31]: d2 = {}
In [32]: for i, j in l1:
    ...:     if i not in d2:
    ...:         d2[i] = [j]        # first value for this key: start a list
    ...:     else:
    ...:         d2[i].append(j)    # key already seen: add to its list
    ...:
In [33]: d2
Out[33]: {'good': ['me.txt', 'money.txt'], 'hello': ['me.txt'], 'rep': ['money.txt']}
In [34]: out = [[key, d2[key]] for key in d2]
In [35]: out
Out[35]:
[['rep', ['money.txt']],
 ['hello', ['me.txt']],
 ['good', ['me.txt', 'money.txt']]]
(The key order shown depends on the Python and IPython version.)

Let's first understand the underlying problem.
There is a common pattern for this kind of list problem. Suppose you have a list:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
and you want to convert it to a dict, using the first element of each tuple as the key and the second element as the value. A naive conversion gives something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples, such as (2006,1) and (2006,5), share a key but have different values. You want those values collected under the single key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem, first create a new dict, then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])
That is, check whether the key is already in the new dict. If it is not, start a new list for it; if it is, append the value to the existing list.
Full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
Solution to your actual problem:
list_1=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
no_duplicates={}
for item in list_1:
    if item[0] not in no_duplicates:
        # first time this key appears: start its list with the remaining elements
        no_duplicates[item[0]]=list(item[1:])
    else:
        no_duplicates[item[0]].extend(item[1:])
list_result=[]
for key,value in no_duplicates.items():
    list_result.append([key,value])
print(list_result)
Output (on Python 3.7+, where dicts keep insertion order):
[['hello', ['me.txt']], ['good', ['me.txt', 'money.txt']], ['rep', ['money.txt']]]

yourList=[["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]
expectedList=[["good", ["me.txt", "money.txt"]], ["hello", ["me.txt"]], ["rep", ["money.txt"]]]
def getall(allsec, listKey, uniqlist):
    # returns [key, values] the first time a key is seen, and None afterwards
    if listKey not in uniqlist:
        uniqlist.append(listKey)
        return [listKey, [x[1] for x in allsec if x[0] == listKey]]
uniqlist=[]
result=sorted(list(filter(lambda x: x is not None, [getall(yourList,elem[0],uniqlist) for elem in yourList])))
print(result)
hope this helps

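This can easily be solved using a dict of sets.
def combine_duplicates(given_list):
    data = {}
    for element_1, element_2 in given_list:
        # set.add() returns None, so add in place instead of assigning its result
        data.setdefault(element_1, set()).add(element_2)
    # sets are unordered, so sort each one to get a stable result
    return [[k, sorted(v)] for k, v in data.items()]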

Using Python to create a function that gives you the exact required output can be done as follows:
from collections import defaultdict

def function(data):
    entries = defaultdict(list)
    for k, v in data:
        entries[k].append(v)
    return sorted([k, v] for k, v in entries.items())

print(function([["hello","me.txt"],["good","me.txt"],["good","money.txt"], ["rep", "money.txt"]]))
The output is sorted before being returned as per your requirement. This would display the return from the function as:
[['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
It also ensures that the keys are sorted. A dictionary is used to deal with the removal of duplicates (as keys need to be unique).
A defaultdict() is used to simplify the building of lists within the dictionary. The alternative would be to try and append a new value to an existing key, and if there is a KeyError exception, then add the new key instead as follows:
def function(data):
    entries = {}
    for k, v in data:
        try:
            entries[k].append(v)
        except KeyError:
            # first time this key is seen: create its list
            entries[k] = [v]
    return sorted([k, v] for k, v in entries.items())
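Another option, not part of the answer above, is dict.setdefault, which returns the existing list for a key or stores and returns a new one. The sketch below also skips values already recorded for a key, on the assumption that a repeated pair should appear only once. The name combine_sorted is hypothetical:

```python
def combine_sorted(data):
    entries = {}
    for k, v in data:
        values = entries.setdefault(k, [])  # existing list, or a new empty one
        if v not in values:                 # skip a value already stored for this key
            values.append(v)
    # sort by key, and sort each value list, to get a deterministic result
    return [[k, sorted(vs)] for k, vs in sorted(entries.items())]


result = combine_sorted([["hello", "me.txt"], ["good", "me.txt"], ["good", "me.txt"],
                         ["good", "money.txt"], ["rep", "money.txt"]])
print(result)
# [['good', ['me.txt', 'money.txt']], ['hello', ['me.txt']], ['rep', ['money.txt']]]
```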

Create an empty array, push the element at index 0 of each child array into it, and join the values into a single space-separated string (JavaScript):
var your_input_data = [ ["hello","hi", "jel"], ["good"], ["good2","lo"], ["good3","lt","ahhahah"], ["rep", "nice","gr8", "job"] ];
var myprint = [];
for (var i in your_input_data) {
    myprint.push(your_input_data[i][0]);
}
console.log(myprint.join(' '));

Related

how can I merge lists to create a python dictionary

So I have tried many methods to do this but could not find a working solution. In this problem, I have two python arrays and I would like to join them to create a big dictionary. It would go something like this:
list1 = [
    [2, "ford"],
    [4, "Ferrari"],
    [3, "Mercedes"],
    [1, "BMW"]
]
list2 = [
    [4, "mustang"],
    [3, "LaFerrari"],
    [2, "CLA"],
    [1, "M5"],
    [6, "opel"]
]
The result that I would like to have is a dictionary that looks like this:
result = {
    1: ["BMW","M5"], 2: ["ford","CLA"], 3: ["Mercedes","LaFerrari"], 4: ["Ferrari","mustang"], 6: ["opel"]
}
So it just basically needs to merge these two arrays based on the "key" (which is just the [0] place in the array)

This looks like a task for collections.defaultdict. I would do:
import collections
list1 = [
[1, "ford"],
[2,"Ferrari"],
[3, "Mercedes"],
[4, "BMW"]
]
list2 = [
[1, "mustang"],
[2,"LaFerrari"],
[3,"CLA"],
[4,"M5"]
]
result = collections.defaultdict(list)
for key, value in list1:
    result[key].append(value)
for key, value in list2:
    result[key].append(value)
result = dict(result)
print(result)
Output:
{1: ['ford', 'mustang'], 2: ['Ferrari', 'LaFerrari'], 3: ['Mercedes', 'CLA'], 4: ['BMW', 'M5']}
Here I used a defaultdict of lists. Unlike a plain dict, if you access a key that does not exist yet, it first stores list() (i.e. an empty list) under that key and then performs the requested action (appending, in this case). At the end I convert it to a dict just to fulfill your requirement (create a Python dictionary).

Use collections.defaultdict:
from collections import defaultdict
result = defaultdict(list)
for k, v in list1 + list2:
    result[k].append(v)
print(dict(result))
#{2: ['ford', 'CLA'], 4: ['Ferrari', 'mustang'], 3: ['Mercedes', 'LaFerrari'], 1: ['BMW', 'M5'], 6: ['opel']}
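list1 + list2 builds a temporary combined list and only handles two inputs. A small sketch of the same idea for any number of lists, using itertools.chain to iterate over them in turn without copying. The function name merge_pairs is an assumption made for this example:

```python
from collections import defaultdict
from itertools import chain


def merge_pairs(*lists):
    """Merge any number of [key, value] lists into {key: [values...]}."""
    merged = defaultdict(list)
    # chain.from_iterable walks the input lists one after another
    for key, value in chain.from_iterable(lists):
        merged[key].append(value)
    return dict(merged)


list1 = [[2, "ford"], [4, "Ferrari"], [3, "Mercedes"], [1, "BMW"]]
list2 = [[4, "mustang"], [3, "LaFerrari"], [2, "CLA"], [1, "M5"], [6, "opel"]]
result = merge_pairs(list1, list2)
print(result)
# {2: ['ford', 'CLA'], 4: ['Ferrari', 'mustang'], 3: ['Mercedes', 'LaFerrari'], 1: ['BMW', 'M5'], 6: ['opel']}
```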

I am also pretty new to Python, but I think something like this should work if both lists have the same keys:
list1 = [
[1, "ford"],
[2, "Ferrari"],
[3, "Mercedes"],
[4, "BMW"]
]
list2 = [
[1, "mustang"],
[2, "LaFerrari"],
[3, "CLA"],
[4, "M5"]
]
dict1 = dict(list1)
dict2 = dict(list2)
result = {}
for key, val in dict1.items():
    result[key] = [val]
for key, val in dict2.items():
    result[key].append(val)
print(result)
output
{1: ['ford', 'mustang'], 2: ['Ferrari', 'LaFerrari'], 3: ['Mercedes', 'CLA'], 4: ['BMW', 'M5']}
As already mentioned, I am a newbie too, so there is probably a more "pythonic" way of doing this.

First, create a dict using the values in list1. Then update the lists in the dict with the values from list2, or create new lists for the keys in list2 which don't exist in list1:
result = {i: [j] for i, j in list1}  # create initial dict from all values in list1
for i, j in list2:
    if i in result:
        result[i].append(j)  # add to preexisting list corresponding to key
    else:
        result[i] = [j]  # create new list corresponding to key
If your sublists can hold more than one value per key, you can use this version, where the add logic is handled in a separate function:
result = {}
def add_to_dict(d, key, val):
    if key in d:
        d[key].append(val)
    else:
        d[key] = [val]
for el in (list1 + list2):
    key, *vals = el
    for val in vals:
        add_to_dict(result, key, val)
Here, rather than assuming each sublist has only 2 elements, we unpack the first element as the key and the rest of the elements into a list called vals. Then we iterate over that list and apply the same adding logic.

If you're sure that both lists contain the same number of items and that every key appears in both (1, 2, 3, 4 in your example), you can build each dict once and combine them:
d1, d2 = dict(list1), dict(list2)
result = {k: [d1[k], d2[k]] for k in d1}
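With the question's data, key 6 exists only in list2, so a comprehension that indexes both dicts directly would either raise a KeyError or miss that key. A hedged sketch that iterates over the union of the keys instead, assuming each key appears at most once per list (dict() keeps only the last value for a repeated key):

```python
list1 = [[2, "ford"], [4, "Ferrari"], [3, "Mercedes"], [1, "BMW"]]
list2 = [[4, "mustang"], [3, "LaFerrari"], [2, "CLA"], [1, "M5"], [6, "opel"]]

d1, d2 = dict(list1), dict(list2)

# d1.keys() | d2.keys() is the set union of both key views;
# each key collects a value from every dict that actually contains it
result = {
    k: [d[k] for d in (d1, d2) if k in d]
    for k in sorted(d1.keys() | d2.keys())
}
print(result)
# {1: ['BMW', 'M5'], 2: ['ford', 'CLA'], 3: ['Mercedes', 'LaFerrari'], 4: ['Ferrari', 'mustang'], 6: ['opel']}
```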

Create dictionary from dict and list

I have a dictionary :
dicocategory = {}
dicocategory["a"] = ["crapow", "Killian", "pauk", "victor"]
dicocategory["b"] = ["graton", "fred"]
dicocategory["c"] = ["babar", "poca", "german", "Georges", "nowak"]
dicocategory["d"] = ["crado", "cradi", "hibou", "distopia", "fiboul"]
dicocategory["e"] = ["makenkosapo"]
and a list :
my_list = ['makenkosapo', 'Killian', 'Georges', 'poca', 'nowak']
I want to create a new dictionary with my dicocategory's keys as new keys and items of my list as values.
To get the keys of my new dict (removing duplicate content and adapted to my list) I made :
def tablemain(my_list):
    tableheaders = list()
    for value in my_list:
        tableheaders.append([k for k, v in dicocategory.items() if value in v])
    convertlist = [j for i in tableheaders for j in i]
    headerstablefinal = list(set(convertlist))
    return headerstablefinal
giving me:
['e', 'a', 'c']
My problem is: I don't know how to put the items of my list in the corresponding keys.
EDIT:
Below is the output I want:
{"a" : ['Killian'], 'c' : ['Georges', 'poca', 'nowak'], 'e' : ['makenkosapo']}
The list my_list can change, so I want something that creates the new dictionary whatever the list contains.
If my new list is :
my_list = ['crapow', 'german', 'pauk']
My output will be :
{'a':['crapow', 'pauk'], 'c':['german']}
Do you have any idea?
Thank you

You can use a couple of dictionary comprehensions. Calculate the intersection in the first, and in the second remove instances where the intersection is empty:
my_set = set(my_list)
# calculate intersection
res = {k: set(v) & my_set for k, v in dicocategory.items()}
# remove zero intersection values
res = {k: v for k, v in res.items() if v}
print(res)
{'a': {'Killian'},
'c': {'Georges', 'nowak', 'poca'},
'e': {'makenkosapo'}}
More efficiently, you can use a generator expression to avoid an intermediary dictionary:
# generate intersection
gen = ((k, set(v) & my_set) for k, v in dicocategory.items())
# remove zero intersection values
res = {k: v for k, v in gen if v}
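The set-based versions above return sets, whereas the question shows lists in the order of my_list. A minimal sketch that keeps lists and that order. The helper name filter_categories is invented for this example:

```python
dicocategory = {
    "a": ["crapow", "Killian", "pauk", "victor"],
    "b": ["graton", "fred"],
    "c": ["babar", "poca", "german", "Georges", "nowak"],
    "d": ["crado", "cradi", "hibou", "distopia", "fiboul"],
    "e": ["makenkosapo"],
}


def filter_categories(categories, wanted):
    """Keep, per category, the items of `wanted` it contains, in `wanted` order."""
    result = {}
    for key, members in categories.items():
        member_set = set(members)  # fast membership tests
        hits = [item for item in wanted if item in member_set]
        if hits:  # drop categories with no match
            result[key] = hits
    return result


res = filter_categories(dicocategory, ['makenkosapo', 'Killian', 'Georges', 'poca', 'nowak'])
print(res)
# {'a': ['Killian'], 'c': ['Georges', 'poca', 'nowak'], 'e': ['makenkosapo']}
```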

You can get a dictionary containing only the keys whose lists share at least one value with your list (note that each key keeps its full original list, not just the matching items):
{k: v for k, v in dicocategory.items() if set(v).intersection(my_list)}
You won't be able to put that directly into a DataFrame though as the lists differ in length.

Split list into sublists based on string split

I have a list like this:
a = [['cat1.subcat1.item1', 0], ['cat1.subcat1.item2', 'hello'], ['cat1.subcat2.item1', 1337], ['cat2.item1', 'test']]
So there may be several subcategories with items, separated by dots. The number of categories and the depth of nesting are not fixed and can differ between categories.
I want the list to look like this:
a = [['cat1', [
        ['subcat1', [
            ['item1', 0],
            ['item2', 'hello']
        ]],
        ['subcat2', [
            ['item1', 1337]
        ]],
     ]],
     ['cat2', [
        ['item1', 'test']
     ]]
]
I hope this makes sense.
In the end I need a json string out of this. If it is somehow easier it could also directly be converted to the json string.
Any idea how to achieve this? Thanks!

You should use a nested dictionary structure. This can be processed efficiently using collections.defaultdict and functools.reduce.
Conversion to a regular dictionary is possible, though usually not necessary.
Solution
from collections import defaultdict
from functools import reduce
from operator import getitem

def getFromDict(dataDict, mapList):
    """Iterate nested dictionary"""
    return reduce(getitem, mapList, dataDict)

# a defaultdict whose missing keys become nested defaultdicts of the same kind
tree = lambda: defaultdict(tree)
d = tree()
for i, j in a:
    path = i.split('.')
    getFromDict(d, path[:-1])[path[-1]] = j
Result
def default_to_regular_dict(d):
    """Convert nested defaultdict to regular dict of dicts."""
    if isinstance(d, defaultdict):
        d = {k: default_to_regular_dict(v) for k, v in d.items()}
    return d

res = default_to_regular_dict(d)
{'cat1': {'subcat1': {'item1': 0,
                      'item2': 'hello'},
          'subcat2': {'item1': 1337}},
 'cat2': {'item1': 'test'}}
Explanation
getFromDict(d, path[:-1]) takes a list path[:-1] and recursively accesses dictionary values corresponding to the list items from dictionary d. I've implemented this bit functionally via functools.reduce and operator.getitem.
We then access the key path[-1], the last element of the list, from the resulting dictionary tree. This will be a dictionary since d is a defaultdict of dictionaries. We can then assign value j to this dictionary.
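The question ultimately asks for a JSON string. One possible sketch, using plain dicts with setdefault instead of the defaultdict tree and then json.dumps. The function name nest is made up here, and the code assumes no dotted path is both a leaf and a parent of another path:

```python
import json


def nest(pairs):
    """Turn [['a.b.c', value], ...] into nested dicts {'a': {'b': {'c': value}}}."""
    root = {}
    for dotted, value in pairs:
        *parents, leaf = dotted.split(".")
        node = root
        for name in parents:
            # descend into the child dict, creating it on first use
            node = node.setdefault(name, {})
        node[leaf] = value
    return root


a = [['cat1.subcat1.item1', 0], ['cat1.subcat1.item2', 'hello'],
     ['cat1.subcat2.item1', 1337], ['cat2.item1', 'test']]
json_string = json.dumps(nest(a))
print(json_string)
# {"cat1": {"subcat1": {"item1": 0, "item2": "hello"}, "subcat2": {"item1": 1337}}, "cat2": {"item1": "test"}}
```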

Not as pretty as @jpp's solution, but hey, at least I tried. This uses the merge function for merging deep dicts, as seen in this answer.
def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a
a = [['cat1.subcat1.item1', 0], ['cat1.subcat1.item2', 'hello'], ['cat1.subcat2.item1', 1337], ['cat2.item1', 'test']]
# convert to dict
b = {x[0]:x[1] for x in a}
res = {}
# iterate over dict
for k, v in list(b.items()):
    s = k.split('.')
    temp = {}
    # iterate over reverse indices,
    # build temp dict from the ground up
    for i in reversed(range(len(s))):
        if i == len(s)-1:
            temp = {s[i]: v}
        else:
            temp = {s[i]: temp}
        # once the outermost level is reached, merge temp dict into the main dict res
        if i == 0:
            res = merge(res, temp)
            temp = {}
print(res)
# {'cat1': {'subcat1': {'item1': 0, 'item2': 'hello'}, 'subcat2': {'item1': 1337}}, 'cat2': {'item1': 'test'}}

python select the lowest alphanumeric value with a reference from multiple lists

I have multiple lists like this:
#Symbol ID
['AAA','MG_00013']
['AAA','MG_00177']
['AAA','MG_00005']
['BBB','MG_0045']
['BBB','MG_00080']
['CCC','MG_0002'] # and so on...
and I would like to choose the list with a same symbol with the smallest ID.
So, the end result is like this:
#Symbol ID
['AAA','MG_00005']
['BBB','MG_0045']
['CCC','MG_0002'] #...
To do that, I have made them into a list of lists
listoflists =[['AAA','MG_00013'],['AAA','MG_00177'],['AAA','MG_00005'],['BBB','MG_0045'],['BBB','MG_00080'],['CCC','MG_0002']]
I'm lost from here...
for i in listoflists:
    if i[0] == i[0]:
        test.append(i[1])
for i in test:
    print(i)
which gives a False result.
I think the logic is to group them into a list like the one below, then compare the alphanumeric IDs and select the lowest one.
[('AAA',['MG_00013','MG_00177','MG_00005'])]
However, I'm completely lost and frustrated now...
Could you please help me work through this?
===============================================
Everybody helping me out is so great!
However, the length of the ID has to be considered.
For example, everybody gives me BBB with MG_00080, but it is supposed to be MG_0045, as 45 is less than 80...

I would think something like a dictionary might be better, but this will give your expected output.
import itertools
listoflists =[['AAA','MG_00013'],['AAA','MG_00177'],['AAA','MG_00005'],['BBB','MG_0045'],['BBB','MG_00080'],['CCC','MG_0002']]
minlists = [
    min(value, key=lambda lst: lst[1])
    for _, value in itertools.groupby(listoflists, lambda lst: lst[0])
]
print(minlists)
outputs
[['AAA', 'MG_00005'], ['BBB', 'MG_00080'], ['CCC', 'MG_0002']]
EDIT: The comparison of IDs was not clear to me, but to compare them pseudo-numerically (not lexicographically), replace key=lambda lst: lst[1] with
key=lambda lst: int(lst[1][3:])
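Note that itertools.groupby only groups consecutive items with equal keys, so input that is not already ordered by symbol has to be sorted first. A sketch combining that with the numeric comparison, assuming every ID has the form 'MG_' followed by digits. The names id_number and lowest_per_symbol are illustrative only:

```python
from itertools import groupby


def id_number(row):
    # 'MG_0045' -> 45; assumes the ID is 'MG_' followed only by digits
    return int(row[1][3:])


def lowest_per_symbol(rows):
    # groupby needs rows with the same symbol to be adjacent, so sort by symbol first
    ordered = sorted(rows, key=lambda row: row[0])
    return [min(group, key=id_number)
            for _, group in groupby(ordered, key=lambda row: row[0])]


rows = [['BBB', 'MG_00080'], ['AAA', 'MG_00013'], ['CCC', 'MG_0002'],
        ['AAA', 'MG_00177'], ['BBB', 'MG_0045'], ['AAA', 'MG_00005']]
result = lowest_per_symbol(rows)
print(result)
# [['AAA', 'MG_00005'], ['BBB', 'MG_0045'], ['CCC', 'MG_0002']]
```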

This is a good spot for a defaultdict:
from collections import defaultdict
D = defaultdict(list)
for k, v in listoflists:
    D[k].append(v)
# min() compares the ID strings lexicographically
result = [[k, min(D[k])] for k in D]

ll =[['AAA','MG_00013'],
['AAA','MG_00177'],
['AAA','MG_00005'],
['BBB','MG_0045'],
['BBB','MG_00080'],
['CCC','MG_0002']]
d = {}
for l in ll:
    # If key is not in the dict, insert the entry into dict
    if l[0] not in d:
        d[l[0]] = l[1]
    # If key is already in the dict, update the entry if value is smaller
    elif int(l[1][3:]) < int(d[l[0]][3:]):
        d[l[0]] = l[1]
print(d)
Output:
{'AAA': 'MG_00005', 'BBB': 'MG_0045', 'CCC': 'MG_0002'}

You could convert it into a dictionary of lists
d = { k[0] : [] for k in listoflists }
for k in listoflists: d[k[0]].append(k[1])
ans = [ [k,min(d[k])] for k in d ]
print(ans)
or just
d = { k[0] : [] for k in listoflists }
for k in listoflists: d[k[0]].append(k[1])
for k in d: print(k, min(d[k]))

Deleting from dict if found in new list in Python

Say I have a dictionary with whatever number of values.
And then I create a list.
If any of the values of the list are found in the dictionary, regardless of whether or not it is a key or an index how do I delete the full value?
E.g:
dictionary = {1:3,4:5}
list = [1]
...
dictionary = {4:5}
How do I do this without creating a new dictionary?

for key, value in list(dic.items()):  # iterate over a copy so deleting is safe
    if key in lst or value in lst:
        del dic[key]
No need to create a separate list or dictionary.
I interpreted "whether or not it is a key or an index" to mean "whether or not it is a key or a value [in the dictionary]"

it's a bit complicated because of your "values" requirement:
>>> dic = {1: 3, 4: 5}
>>> ls = set([1])
>>> dels = []
>>> for k, v in dic.items():
...     if k in ls or v in ls:
...         dels.append(k)
...
>>> for i in dels:
...     del dic[i]
...
>>> dic
{4: 5}

A one liner to do this would be :
[dictionary.pop(x) for x in list if x in dictionary.keys()]
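The one-liner builds a throwaway list just for its side effect. A plain loop with dict.pop(key, default) does the same without the membership test, since the default suppresses the KeyError for absent keys. This sketch removes by key only, and uses the name to_remove (an assumption) rather than list, which shadows the built-in:

```python
dictionary = {1: 3, 4: 5}
to_remove = [1, 9]  # 9 is not a key; pop's default makes that harmless

for key in to_remove:
    # pop(key, None) deletes the key if present and does nothing otherwise
    dictionary.pop(key, None)

print(dictionary)
# {4: 5}
```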

dictionary = {1:3,4:5}
list = [1]
for key in list:
    if key in dictionary:
        del dictionary[key]

>>> dictionary = {1:3,4:5}
>>> list = [1]
>>> for x in list:
...     if x in dictionary:
...         del(dictionary[x])
...
>>> dictionary
{4: 5}

def remKeys(dictionary, list):
    for i in list:
        if i in dictionary.keys():
            dictionary.pop(i)
    return dictionary

I would do something like:
for i in list:
    if i in dictionary:  # dict.has_key() was removed in Python 3
        del dictionary[i]
But I am sure there are better ways.

A few more testcases to define how I interpret your question:
#!/usr/bin/env python
def test(beforedic,afterdic,removelist):
    d = beforedic
    l = removelist
    for i in l:
        # iterate over a copy of the items so entries can be deleted safely
        for (k,v) in list(d.items()):
            if k == i or v == i:
                del d[k]
    assert d == afterdic,"d is "+str(d)
test({1:3,4:5},{4:5},[1])
test({1:3,4:5},{4:5},[3])
test({1:3,4:5},{1:3,4:5},[9])
test({1:3,4:5},{4:5},[1,3])

If the dictionary is small enough, it's easier to just make a new one. Removing all items whose key is in the set s from the dictionary d:
d = dict((k, v) for (k, v) in d.items() if not k in s)
Removing all items whose key or value is in the set s from the dictionary d:
d = dict((k, v) for (k, v) in d.items() if not k in s and not v in s)
