Python - itertools.groupby 2 - python

Just having trouble with itertools.groupby. Given a list of dictionaries,
my_list= [
"AD01", "AD01AA", "AD01AB", "AD01AC", "AD01AD","AD02", "AD02AA", "AD02AB", "AD02AC"]
from this list, I expected to create a dictionary, where the key is the shortest name and the values ​​are the longest names
example
[
{"Legacy" : "AD01", "rphy" : ["AD01AA", "AD01AB", "AD01AC", "AD01AD"]},
{"Legacy" : "AD02", "rphy" : ["AD02AA", "AD02AB", "AD02AC"]},
]
could you help me please

You can use itertools.groupby, with some nexts:
from itertools import groupby
my_list= ["AD01", "AD01AA", "AD01AB", "AD01AC", "AD01AD","AD02", "AD02AA", "AD02AB", "AD02AC"]
groups = groupby(my_list, len)
output = [{'Legacy': next(g), 'rphy': list(next(groups)[1])} for _, g in groups]
print(output)
# [{'Legacy': 'AD01', 'rphy': ['AD01AA', 'AD01AB', 'AD01AC', 'AD01AD']},
# {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]
This is not robust to reordering of the input list.
Also, if there is some "gap" in the input, e.g., if "AD01" does not have corresponding 'rphy' entries, then it will throw a StopIteration error as you have found out. In that case you can use a more conventional approach:
from itertools import groupby
my_list= ["AD01", "AD02", "AD02AA", "AD02AB", "AD02AC"]
output = []
for item in my_list:
if len(item) == 4:
dct = {'Legacy': item, 'rphy': []}
output.append(dct)
else:
dct['rphy'].append(item)
print(output)
# [{'Legacy': 'AD01', 'rphy': []}, {'Legacy': 'AD02', 'rphy': ['AD02AA', 'AD02AB', 'AD02AC']}]

One approach would be: (see the note at the end of the answer)
from itertools import groupby
from pprint import pprint
my_list = [
"AD01",
"AD01AA",
"AD01AB",
"AD01AC",
"AD01AD",
"AD02",
"AD02AA",
"AD02AB",
"AD02AC",
]
res = []
for _, g in groupby(my_list, len):
lst = list(g)
if len(lst) == 1:
res.append({"Legacy": lst[0], "rphy": []})
else:
res[-1]["rphy"].append(lst)
pprint(res)
output:
[{'Legacy': 'AD01', 'rphy': [['AD01AA', 'AD01AB', 'AD01AC', 'AD01AD']]},
{'Legacy': 'AD02', 'rphy': [['AD02AA', 'AD02AB', 'AD02AC']]}]
This assumes that your data always starts with your desired key(the name which has the smallest name compare to the next values).
Basically in every iteration you check then length of the created list from groupby. If it is 1, this mean it's your key, if not, it will add the next items to the dictionary.
Note: This code would break if there aren't at least 2 names with the length larger than the keys between two keys.

Related

What is the correct way to write this function?

I was making a program where first parameter is a list and second parameter is a list of dictionaries. I want to return a list of lists like this:
As an example, if this were a function call:
make_lists(['Example'],
[{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]
)
the expected return value would be:
[ ['Made-up'] ]
As an second example, if this were a function call:
make_lists(['Hint', 'Num'],
[{'Hint': 'Length 2 Not Required', 'Num' : 8675309},
{'Num': 1, 'Hint' : 'Use 1st param order'}]
)
the expected return value would be:
[ ['Length 2 Not Required', 8675309],
['Use 1st param order', 1]
]
I have written a code for this but my code does not return a list of lists, it just returns a single list. Please can someone explain?
def make_lists(s,lod):
a = []
lol =[]
i = 0
for x in lod:
for y in x:
for k in s:
if(y==k):
lol.append(x.get(y))
i = i+1
return lol
Expected Output:
[ ['Length 2 Not Required', 8675309],['Use 1st param order', 1] ]
Output:
['Length 2 Not Required', 8675309, 1, 'Use 1st param order']
The whole point of dictionaries, is that you can access them by key:
def make_lists(keys, dicts):
result = []
for d in dicts:
vals = [d[k] for k in keys if k in d]
if len(vals) > 0:
result.append(vals)
return result
Let's have a look what happens here:
We still have the result array, which accumulates the answers, but now it's called result instead of lol
Next we iterate through every dictionary:
for d in dicts:
For each dictionary d, we create a list, which is a lookup in that dictionary for the keys in keys, if the key k is in the dictionary d:
vals = [d[k] for k in keys if k in d]
The specs don't detail this, but I assume if none of the keys are in the dictionary, you don't want it added to the array. For that, we have a check if vals have any results, and only then we add it to the results:
if len(vals) > 0:
result.append(vals)
Try this code - I've managed to modify your existing code slighty, and added explanation in the comments. Essentially, you just need to use a sub-list and add that to the master list lol, and then in each loop iteration over elements in lod, append to the sub-list instead of the outermost list.
def make_lists(s,lod):
a = []
lol =[]
i = 0
for x in lod:
## Added
# Here we want to create a new list, and add it as a sub-list
# within 'lol'
lols = []
lol.append(lols)
## Done
for y in x:
for k in s:
if(y==k):
# Changed 'lol' to 'lols' here
lols.append(x.get(y))
i = i+1
return lol
print(make_lists(['Example'], [{'Example': 'Made-up', 'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Hint': 'Length 2 Not Required', 'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Prints:
[['Made-up']]
[['Length 2 Not Required', 8675309], [1, 'Use 1st param order']]
A simpler solution
For a cleaner (and potentially more efficient approach), I'd suggest using builtins like map and using a list comprehension to tackle this problem:
def make_lists(s, lod):
return [[*map(dict_obj.get, s)] for dict_obj in lod]
But note, that this approach includes elements as None in cases where the desired keys in s are not present in the dictionary objects within the list lod.
To work around that, you can pass the result of map to the filter builtin function so that None values (which represent missing keys in dictionaries) are then stripped out in the result:
def make_lists(s, lod):
return [[*filter(None, map(dict_obj.get, s))] for dict_obj in lod]
print(make_lists(['Example'], [{'Extra Keys' : 'Possible'}]))
print(make_lists(['Hint', 'Num'], [{'Num' : 8675309}, {'Num': 1, 'Hint' : 'Use 1st param order'}]))
Output:
[[]]
[[8675309], ['Use 1st param order', 1]]

python edit tuple duplicates in a list

my target is:
while for looping a list I would like to check for duplicates and if there are some i would like to append a number to it see following example
my list output as an example:
[('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
in a loop i would like to edit those duplicates so instead of the 2nd microsoft i would like to have microsoft1 (if there would be 3 microsoft guys so the third guy would have microsoft2)
with this i can filter the duplicates but i dont know how to edit them directly in the list
list = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
names = []
double = []
for u in list[1:]:
names.append(u[1])
list_size = len(names)
for i in range(list_size):
k = i + 1
for j in range(k, list_size):
if names[i] == names[j] and names[i] not in double:
double.append(names[i])
This is one approach using collections.defaultdict.
Ex:
from collections import defaultdict
lst = [('name','company'), ('someguy','microsoft'), ('anotherguy','microsoft'), ('thirdguy','amazon')]
seen = defaultdict(int)
result = []
for k, v in lst:
if seen[v]:
result.append((k, "{}_{}".format(v, seen[v])))
else:
result.append((k,v))
seen[v] += 1
print(result)
Output:
[('name', 'company'),
('someguy', 'microsoft'),
('anotherguy', 'microsoft_1'),
('thirdguy', 'amazon')]

coupling str elements from a list to a tuple list

I have the following list:
lines
['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North' ]
I would like to couple them in a tuple list as follows, with respect to their names:
tuple_list
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I thought maybe I could do a string search in the elements of the lines but it wont be efficient. Is there a better way to order lines elements in a way which would look like tuple_list
Paring Criteria:
If the both elements have the same Area_name: ('North', 'Mid', 'South')
E.g.: 'line_North_Mid' should be coupled with 'line_Mid_North'
Try this:
from itertools import combinations
tuple_list = [i for i in combinations(lines,2) if i[0].split('_')[1] == i[1].split('_')[2] and i[0].split('_')[2] == i[1].split('_')[1]]
or I think this is better:
[i for i in combinations(lines,2) if i[0].split('_')[1:] == i[1].split('_')[1:][::-1]]
An order-agnostic O(n) solution is possible using collections.defaultdict. The idea is to use as our dictionary keys the last 2 components of your strings delimited by '_', appending values from your input list. Then extract values and convert to a list of tuples.
from collections import defaultdict
L = ['line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North', 'line_Mid_North']
dd = defaultdict(list)
for item in L:
dd[frozenset(item.rsplit('_', maxsplit=2)[1:])].append(item)
res = list(map(tuple, dd.values()))
# [('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North')]
You can use the following list comprehension:
lines = ['line_Mid_North', 'line_North_Mid',
'line_North_South', 'line_South_North',
'line_Mid_South', 'line_South_Mid']
[(j,i) for i in lines for j in lines if j not in i
if set(j.split('_')[1:]) < set(i.split('_'))][::2]
[('line_Mid_North', 'line_North_Mid'),
('line_North_South', 'line_South_North'),
('line_Mid_South', 'line_South_Mid')]
I suggest you have a function that returns the same key for string that are supposed to be together (a grouping-key).
def key(s):
# ignore first part and sort other 2 parts, so they will always be in same order
_, part_1, part_2 = s.split('_')
return tuple(sorted([part_1, part_2]))
The you have to use some grouping method; I used defaultdict for example:
import collections
lines = [
'line_North_Mid', 'line_South_Mid',
'line_North_South', 'line_Mid_South',
'line_South_North','line_Mid_North',
]
dd = collections.defaultdict(list)
for s in lines:
dd[key(s)].append(s) # those with same key get grouped
print(list(tuple(v) for v in dd.values()))
# [
# ('line_North_Mid', 'line_Mid_North'),
# ('line_South_Mid', 'line_Mid_South'),
# ('line_North_South', 'line_South_North'),
# ]

Parse dict into specific list format using Python

Say I have the following dict
{'red':'boop','white':'beep','rose':'blip'}
And I want to get it to a list like so
['red','boop','end','white','beep','rose','blip','end']
The key / value which is to be placed in front of the list is an input.
So I essentially I want [first_key, first_value,end, .. rest of the k/v pairs..,end]
I wrote a brute force approach but I feel like there's a more pythonic way of doing it (and also because once implemented the snippet would make my code O(n^2) )
for item in lst_items
data_lst = []
for key, value in item.iteritems():
data_lst.append(key)
ata_lst.append(value)
#insert 'end' at the appropiate indeces
#more code ...
Any pythonic approach?
The below relies on itertools.chain.from_iterable to flatten the items into a single list. We pull the first two values from the chain and then use them to build a new list, which we extend with the rest of the values.
from itertools import chain
def ends(d):
if not d:
return []
c = chain.from_iterable(d.iteritems())
l = [next(c), next(c), "end"]
l.extend(c)
l.append("end")
return l
ends({'red':'boop','white':'beep','rose':'blip'})
# ['rose', 'blip', 'end', 'white', 'beep', 'red', 'boop', 'end']
If you know the key you want first, and don't care about the rest, we can use a lazily evaluated generator expression to remove it from the flattened list.
def ends(d, first):
if not d:
return []
c = chain.from_iterable((k, v) for k, v in d.iteritems() if k != first)
l = [first, d[first], "end"]
l.extend(c)
l.append("end")
return l
ends({'red':'boop','white':'beep','rose':'blip'}, 'red')
# ['red', 'boop', 'end', 'rose', 'blip', 'white', 'beep', 'end']
The first key is specified in first variable:
first = 'red'
d = {'red':'boop','white':'beep','rose':'blip'}
new_l = [first, d[first], 'end']
for k, v in d.items():
if k == first:
continue
new_l.append(k)
new_l.append(v)
new_l.append('end')
print(new_l)
Prints:
['red', 'boop', 'end', 'white', 'beep', 'rose', 'blip', 'end']
You could use enumerate and check the current index:
>>> d = {'red':'boop','white':'beep','rose':'blip'}
>>> [x for i, e in enumerate(d.items())
... for x in (e + ("end",) if i in (0, len(d)-1) else e)]
...
['white', 'beep', 'end', 'red', 'boop', 'rose', 'blip', 'end']
However, your original idea, first chaining the keys and values and then inserting the "end" items would not have O(n²), either. It would be O(n) followed by another O(n), hence still O(n).
from itertools import chain
list(chain(*item.items())) + ['end']
data_lst = [x for k, v in lst_itemsL.iteritems() for x in (k, v) ]
data_lst.insert(2, 'end')
data_lst.append('end')
This is pythonic; though will likely have the same efficiency (which can't be helped here).
This should be faster than placing if blocks inside the loops...

Merge nested list items based on a repeating value

Although poorly written, this code:
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
if marker_array[i-1][1] != marker_array[i][1]:
marker_array_DS.append(marker_array[i])
print marker_array_DS
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.
Quick stab at it... use itertools.groupby to do the grouping for you, but do it over a generator that converts the 2 element list into a 3 element.
from itertools import groupby
from operator import itemgetter
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
def my_group(iterable):
temp = ((el + [''])[:3] for el in marker_array)
for k, g in groupby(temp, key=itemgetter(1)):
fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
yield filter(None, [fst, k, snd])
print list(my_group(marker_array))
from collections import defaultdict
d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
d1[pxa[1]].extend(pxa[:1])
d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-tuples (which I think is unlikely):
for p in res:
if not p[-1]:
p.pop()
marker_array = [['hard','2','soft'],['heavy','2','light'],['rock','2','feather'],['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
if marker_array[i][1] not in marker_array_hit:
marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
lists = [item for item in marker_array if item[1] == i]
temp = []
first_part = ' '.join([str(item[0]) for item in lists])
temp.append(first_part)
temp.append(i)
second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
if second_part != '':
temp.append(second_part);
marker_array_DS.append(temp)
print marker_array_DS
I learned python for this because I'm a shameless rep whore
marker_array = [
['hard','2','soft'],
['heavy','2','light'],
['rock','2','feather'],
['fast','3'],
['turtle','4','wet'],
]
data = {}
for arr in marker_array:
if len(arr) == 2:
arr.append('')
(first, index, last) = arr
firsts, lasts = data.setdefault(index, [[],[]])
firsts.append(first)
lasts.append(last)
results = []
for key in sorted(data.keys()):
current = [
" ".join(data[key][0]),
key,
" ".join(data[key][1])
]
if current[-1] == '':
current = current[:-1]
results.append(current)
print results
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
A different solution based on itertools.groupby:
from itertools import groupby
# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
for marker in markers:
yield marker + [""] * (3 - len(marker))
def concatenated(markers):
# use groupby to iterator over lists of markers sharing the same key
for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
# get separate lists of left and right words
lefts, rights = zip(*[(m[0],m[2]) for m in markers_in_category])
# remove empty strings from both lists
lefts, rights = filter(bool, lefts), filter(bool, rights)
# yield the concatenated entry for this key (also removing the empty string at the end, if necessary)
yield filter(bool, [" ".join(lefts), key, " ".join(rights)])
The generator concatenated(markers) will yield the results. This code correctly handles the ['fast', '3'] case and doesn't return an additional third element in such cases.

Categories