Python - split list of lists by value

I want to split the following list of lists
a = [["aa",1,3]
["aa",3,3]
["sdsd",1,3]
["sdsd",6,0]
["sdsd",2,5]
["fffffff",1,3]]
into the three following lists of lists:
a1 = [["aa",1,3]
["aa",3,3]]
a2 = [["sdsd",1,3]
["sdsd",6,0]
["sdsd",2,5]]
a3 = [["fffffff",1,3]]
That is, according to the first value of each list. I need to do this for a list of lists with thousands of elements... How can I do it efficiently?

You're better off making a dictionary. If you really want to make a bunch of variables, you'll have to use globals(), which isn't really recommended.
a = [["aa",1,3]
["aa",3,3]
["sdsd",1,3]
["sdsd",6,0]
["sdsd",2,5]
["fffffff",1,3]]
d = {}
for sub in a:
key = sub[0]
if key not in d: d[key] = []
d[key].append(sub)
Or, using collections.defaultdict:
import collections
d = collections.defaultdict(list)
for sub in a:
    d[sub[0]].append(sub)
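For completeness, here is a minimal sketch of the globals() route mentioned above (the names a1, a2, ... are made up for illustration); it works, but it pollutes the module namespace, which is why the dictionary is preferable:
for i, key in enumerate(d, start=1):
    globals()['a' + str(i)] = d[key]   # dynamically creates a1, a2, a3, ... (not recommended)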

If the input is already sorted on the first element, itertools.groupby works:
from itertools import groupby
from operator import itemgetter
a = [["aa",1,3],
["aa",3,3],
["sdsd",1,3],
["sdsd",6,0],
["sdsd",2,5],
["fffffff",1,3]]
b = { k : list(v) for k, v in groupby(a, itemgetter(0))}
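If the input is not sorted, note that groupby only groups consecutive items with equal keys, so sort by the first element first. A minimal sketch:
a_sorted = sorted(a, key=itemgetter(0))
b = {k: list(v) for k, v in groupby(a_sorted, itemgetter(0))}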

Create a dictionary with the first element as the key and the matching lists as the value. You will get a dictionary where each value is the group of lists sharing the same first element. For example:
a = [["aa", 1, 3],
["aa", 3, 3],
["sdsd", 1, 3],
["sdsd", 6, 0],
["sdsd", 2, 5],
["fffffff", 1, 3]]
d = {}
for e in a:
d[e[0]] = d.get(e[0]) or []
d[e[0]].append(e)
Now you can simply get the lists separately:
a1 = d['aa']
a2 = d['sdsd']

A defaultdict will work nicely here:
a = [["aa",1,3],
["aa",3,3],
["sdsd",1,3],
["sdsd",6,0],
["sdsd",2,5],
["fffffff",1,3]]
from collections import defaultdict
d = defaultdict(list)
for thing in a:
d[thing[0]] += thing,
for separate_list in d.values():
print separate_list
Output
[['aa', 1, 3], ['aa', 3, 3]]
[['sdsd', 1, 3], ['sdsd', 6, 0], ['sdsd', 2, 5]]
[['fffffff', 1, 3]]

Related

How to use the list names stored in a list for performing list operations?

I have a set of lists, and the operation I want to perform is to find the first occurrence of an element in each list. Suppose I have the following lists:
D1 = [1,2,3,4,5]
D2 = [2,7,6,5,4]
D3 = [1,2,6,8,7]
Now, for each list, I want to find out at what position the element 2 first occurs. To that end, I have first stored the names of the lists in a separate list, using this code:
s_list = []
for i in range(1,4):
    s_list.append('D'+str(i))
print(s_list)
Now, using these names, I want to perform the final operation using this code:
elem = 2
index_pos = []
for i in s_list:
    k = i.index(elem)
    index_pos.append(k)
print(index_pos)
However, on doing this, I get the following error:
TypeError: must be str, not int
I have tried str(i) instead of i but the error remains the same.
It would really be helpful if anyone can point out what I am doing wrong.
You can pack the lists themselves into a bigger list:
lists = [D1, D2, D3]
and then iterate over that:
elem = 2
index_pos = []
for l in lists:
    k = l.index(elem)
    index_pos.append(k)
print(index_pos)
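With the lists above, the output is:
[1, 0, 1]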
The way you put the "D" lists into s_list is not right: it ends up holding strings, not the lists themselves. Instead, it could be done the following way.
Example:
D1 = [1,2,3,4,5]
D2 = [2,7,6,5,4]
D3 = [1,2,6,8,7]
s_list = [D1,D2,D3]
print(s_list)
Output:
[[1, 2, 3, 4, 5], [2, 7, 6, 5, 4], [1, 2, 6, 8, 7]]
With this s_list, your original code works unchanged:
elem = 2
index_pos = []
for i in s_list:
    k = i.index(elem)
    index_pos.append(k)
print(index_pos)
Output:
[1, 0, 1]
The problem is that the elements of the list are strings. It is printed like:
['D1','D2','D3']
That is why you get an error. I suggest you define a function that takes the element and the lists as parameters and then returns the indexes:
def return_indexes(ele, *args):
    return [i.index(ele) for i in args if ele in i]
D1 = [1,2,3,4,5]
D2 = [2,7,6,5,4]
D3 = [1,2,6,8,7]
print(return_indexes(2, D1, D2, D3))
# Output: [1, 0, 1]
You can pass any number of lists and it will return the result. Just make sure that the element to find is always the first argument passed to the function.
In the line for i in s_list: you are iterating over s_list, which just contains the names of the lists as strings, whereas you really want to iterate over the lists named D1, D2 and D3 themselves.
So I think this fixes that problem by iterating over the lists themselves and their names at the same time.
D1 = [1,2,3,4,5]
D2 = [2,7,6,5,4]
D3 = [1,2,6,8,7]
s_list = []
for i in range(1,4):
    s_list.append('D'+str(i))

index_pos = []
for d_list, list_name in zip([D1, D2, D3], s_list):
    index_pos.append((d_list.index(2), list_name))
print(index_pos)
Out:
[(1, 'D1'), (0, 'D2'), (1, 'D3')]

Combining a nested list without affecting the key and value direction in python

I have a program that stores data in a list. The current input and the desired output are in this format:
# Current Input
[{'Devices': ['laptops', 'tablets'],
'ParentCategory': ['computers', 'computers']},
{'Devices': ['touch'], 'ParentCategory': ['phones']}]
# Desired Output
[{'Devices': ['laptops', 'tablets','touch'],
'ParentCategory': ['computers', 'computers','phones']}]
Can you give me an idea on how to combine the lists with another python line of code or logic to get the desired output?
You can do something like this:
def convert(a):
d = {}
for x in a:
for key,val in x.items():
if key not in d:
d[key] = []
d[key] += val
return d
The code above is for Python 3.
If you're on Python 2.7, I believe you should replace items with iteritems.
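For example, applied to the input from the question (the variable name data here is just for illustration; wrap the result in a list if you need exactly the format shown in the question):
data = [{'Devices': ['laptops', 'tablets'],
         'ParentCategory': ['computers', 'computers']},
        {'Devices': ['touch'], 'ParentCategory': ['phones']}]
print(convert(data))
# {'Devices': ['laptops', 'tablets', 'touch'], 'ParentCategory': ['computers', 'computers', 'phones']}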
Solution using a dict comprehension: build the merged dictionary first by figuring out which keys it should have, then by concatenating all the lists for each key. The set of keys, and each resulting list, are built using itertools.chain.from_iterable.
from itertools import chain

def merge_dicts(*dicts):
    return {
        k: list(chain.from_iterable(d[k] for d in dicts if k in d))
        for k in set(chain.from_iterable(dicts))
    }
Usage:
>>> merge_dicts({'a': [1, 2, 3], 'b': [4, 5]}, {'a': [6, 7], 'c': [8]})
{'a': [1, 2, 3, 6, 7], 'b': [4, 5], 'c': [8]}
>>> ds = [
{'Devices': ['laptops', 'tablets'],
'ParentCategory': ['computers', 'computers']},
{'Devices': ['touch'],
'ParentCategory': ['phones']}
]
>>> merge_dicts(*ds)
{'ParentCategory': ['computers', 'computers', 'phones'],
'Devices': ['laptops', 'tablets', 'touch']}

Filtering a (Nx1) list in Python

I have a list of the form
[(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
I want to scan the list and return those elements whose second value (the value at index 1) is repeated. (I apologize I couldn't frame this better.)
For example, in the given list the pairs are (2,3),(4,3) and I see that 3 is repeated so I wish to return 2 and 4. Similarly, from (3,4),(1,4),(5,4) I will return 3, 1, and 5 because 4 is repeated.
I have implemented a brute-force nested-loop search, but that is obviously very slow.
for i in range(0,p):
    for j in range(i+1,p):
        if (arr[i][1] == arr[j][1]):
            print(arr[i][0],arr[j][0])
How do I go about it?
You can use collections.defaultdict. This will return a mapping from the second item to a list of first items. You can then filter for repetition via a dictionary comprehension.
from collections import defaultdict
lst = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
d = defaultdict(list)
for i, j in lst:
    d[j].append(i)
print(d)
# defaultdict(list, {3: [2, 4], 4: [3, 1, 5], 5: [6]})
res = {k: v for k, v in d.items() if len(v)>1}
print(res)
# {3: [2, 4], 4: [3, 1, 5]}
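If you only need the flat list of first elements, you can chain the surviving groups together (on Python 3.7+ the order follows the first appearance of each second value):
from itertools import chain
print(list(chain.from_iterable(res.values())))
# [2, 4, 3, 1, 5]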
Using numpy allows you to avoid explicit for loops:
import numpy as np
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
a = np.array(l)
items, counts = np.unique(a[:,1], return_counts=True)
is_duplicate = np.isin(a[:,1], items[counts > 1]) # get elements that have more than one count
print(a[is_duplicate, 0]) # return elements with duplicates
# tuple(map(tuple, a[is_duplicate, :])) # use this to get tuples in output
(Uncomment the last line to get the output as tuples.)
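For the sample list, the printed result is:
[2 4 3 1 5]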
pandas is another option:
import pandas as pd
l = [(2,3),(4,3),(3,4),(1,4),(5,4),(6,5)]
df = pd.DataFrame(l, columns=['first', 'second'])
df.groupby('second').filter(lambda x: len(x) > 1)
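With the sample list, this keeps every row whose 'second' value occurs more than once, giving roughly:
   first  second
0      2       3
1      4       3
2      3       4
3      1       4
4      5       4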

Get a dict whose keys are combinations of a given dict without duplicate values

I have a dict like this:
dic = {'01':[1,2], '02':[1], '03':[2,3]}
What I want to achieve is a new dict whose keys are combinations of the original keys (grouped in 2 only) and whose values contain no duplicates.
In this simple example, the output will be:
newDic = {'0102':[1,2], '0103':[1,2,3],'0203':[1,2,3]}
thanks a bunch!!
You can use itertools.combinations to get the different combinations of keys, and then use set to get the unique values of the combined lists. Put it all into a dictionary comprehension like this:
>>> dic = {'01':[1,2], '02':[1], '03':[2,3]}
>>> import itertools as IT
>>> {a+b: list(set(dic[a]+dic[b])) for a,b in IT.combinations(dic, 2)}
{'0203': [1, 2, 3], '0301': [1, 2, 3], '0201': [1, 2]}
You can also use join and sorted to have the keys the way you want them:
>>> {''.join(sorted([a,b])): list(set(dic[a]+dic[b])) for a,b in IT.combinations(dic, 2)}
{'0203': [1, 2, 3], '0103': [1, 2, 3], '0102': [1, 2]}
newDic = { a+b : list(set(dic[a] + dic[b])) for a in dic for b in dic if b>a }
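This relies on the string comparison b > a to visit each unordered pair of keys exactly once. With the example dict it produces the desired result (the order inside each value list may vary, since it comes from a set):
{'0102': [1, 2], '0103': [1, 2, 3], '0203': [1, 2, 3]}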

Removing Elements in A from B

I have two lists, let's say:
a = [1,2,3]
b = [1,2,3,1,2,3]
I would like to remove 1, 2 and 3 from list b, but not all occurrences. The resulting list should have:
b = [1,2,3]
I currently have:
for element in a:
    try:
        b.remove(element)
    except ValueError:
        pass
However, this has poor performance when a and b get very large. Is there a more efficient way of getting the same results?
EDIT
To clarify 'not all occurrences', I mean I do not wish to remove both '1's from b, as there was only one '1' in a.
I would do this:
set_a = set(a)
new_b = []
for x in b:
    if x in set_a:
        set_a.remove(x)
    else:
        new_b.append(x)
Unlike the other set solutions, this maintains order in b (if you care about that).
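With the example from the question (a = [1, 2, 3], b = [1, 2, 3, 1, 2, 3]) this leaves new_b == [1, 2, 3]: the first occurrence of each value in b consumes the matching entry in set_a and is dropped, while the later occurrences are kept.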
I would do something like this:
from collections import defaultdict
a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]
# Build up the count of occurrences in b
d = defaultdict(int)
for bb in b:
    d[bb] += 1
# Remove one for each occurrence in a
for aa in a:
    d[aa] -= 1
# Create a list for all elements that still have a count of one or more
result = []
for k, v in d.iteritems():
    if v > 0:
        result += [k] * v
Or, if you are willing to be slightly more obscure:
from operator import iadd
result = reduce(iadd, [[k] * v for k, v in d.iteritems() if v > 0], [])
defaultdict generates a count of the occurrences of each key. Once it has been built up from b, it is decremented for each occurrence of a key in a. Then we collect the elements that are still left over, allowing them to occur multiple times.
defaultdict works with python 2.6 and up. If you are using a later python (2.7 and up, I believe), you can look into collections.Counter.
Later: you can also generalize this and create subtractions of counter-style defaultdicts:
from collections import defaultdict
from operator import iadd
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
def build_dd(lst):
    d = defaultdict(int)
    for item in lst:
        d[item] += 1
    return d

def subtract_dd(left, right):
    return {k: left[k] - v for k, v in right.iteritems()}

db = build_dd(b)
da = build_dd(a)
result = reduce(iadd,
                [[k] * v for k, v in subtract_dd(db, da).iteritems() if v > 0],
                [])
print result
But the reduce expression is pretty obscure now.
Later still: in python 2.7 and later, using collections.Counter, it looks like this:
from collections import Counter
base = [1, 2, 3]
missing = [4, 5, 6]
extra = [7, 8, 9]
a = base + missing
b = base * 4 + extra
result = Counter(b) - Counter(a)
print result
assert result == dict([(k, 3) for k in base] + [(k, 1) for k in extra])
Generally, you always want to avoid list.remove() (you are right, it hurts performance badly). Also, it is much faster (O(1) on average) to look up elements in a dictionary or a set than in a list, so create a set out of a (and, if order doesn't matter, out of b as well).
Something like this:
sa = set(a)
new_b = [x for x in b if x not in sa]
# here you create a third list instead of modifying b in place, but I bet that's OK
However, I have no idea what your actual algorithm for choosing elements to remove is. Please elaborate on "but not all occurrences".
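With the question's example, this comprehension keeps nothing:
a = [1, 2, 3]
b = [1, 2, 3, 1, 2, 3]
sa = set(a)
print([x for x in b if x not in sa])
# [] -- every value in b also appears in a, so all occurrences are removed,
# which is not what the clarified question asks for; see the counting answers above for that.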
