I want to group all the lists in a list based on the last element of each list, and also count the number of times each last element occurs. The challenge I am finding is that the inner lists can all be of different sizes.
E.g. input:
[['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
And I am trying to get the output to be
('a', 2, [('dd','ee'),('ff', 'gg', 'hh')]), ( 'b', 2, [('aa'), ('cc')]), ( 'c', 1, [('bb')])
Finally, I want to convert the result to a pandas DataFrame. If anyone can help or guide me, it would be much appreciated.
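Readable version:
import itertools
import operator
mylist = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd', 'ee', 'a'], ['ff', 'gg', 'hh', 'a']]
mylist.sort(key=operator.itemgetter(-1))  # sort by last element (groupby needs sorted input)
result = []
for k, g in itertools.groupby(mylist, key=operator.itemgetter(-1)):
    # remove last element from each sublist:
    g = [tuple(sublist[:-1]) for sublist in g]
    result.append((k, len(g), g))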
Without importing a library:
mylist = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd', 'ee', 'a'], ['ff', 'gg', 'hh', 'a']]
instances = {}
for sublist in mylist:
    leading_elements, last_element = sublist[:-1], sublist[-1]
    instances.setdefault(last_element, []).append(tuple(leading_elements))
result = tuple()
for key, val in instances.items():
    # wrap in a 1-tuple so each group stays a separate (key, count, items) entry
    result += ((key, len(val), val),)
Use itertools.groupby
>>> from itertools import groupby
>>> l = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
>>>
>>> f = lambda sl: sl[-1]
>>> res = [(k, [tuple(sl[:-1]) for sl in v]) for k,v in groupby(sorted(l, key=f), f)]
>>> res = [(k, len(v), v) for k,v in res]
>>> print(res)
[('a', 2, [('dd', 'ee'), ('ff', 'gg', 'hh')]), ('b', 2, [('aa',), ('cc',)]), ('c', 1, [('bb',)])]
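The question also asks for a pandas DataFrame. A minimal sketch, assuming a dictionary-based grouping is acceptable instead of sort + groupby: it builds the same (key, count, members) triples and then one dict per row. The function name group_by_last and the column names last, count and groups are hypothetical. pandas is deliberately not imported, so the block runs without it.

```python
from collections import defaultdict

def group_by_last(rows):
    """Group sublists by their last element; return (key, count, members) sorted by key."""
    groups = defaultdict(list)
    for row in rows:
        # Everything except the last element becomes the grouped tuple
        groups[row[-1]].append(tuple(row[:-1]))
    return [(key, len(members), members) for key, members in sorted(groups.items())]

rows = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd', 'ee', 'a'], ['ff', 'gg', 'hh', 'a']]
result = group_by_last(rows)

# One dict per group: a row-oriented shape that tabular libraries accept
records = [{'last': key, 'count': count, 'groups': members} for key, count, members in result]
print(result)
```

With pandas installed, passing records to pandas.DataFrame (or passing result together with a columns=[...] list) should give one row per group.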
Related
I would like to print out dictionary keys, each repeated as many times as its value and interleaved evenly, like:
a = {'A': 3, 'B': 5}
=> ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
a = {'A': 4, 'B': 1}
=> ['A', 'B', 'A', 'A', 'A']
I know I can use a while loop to output each key and decrement its count every time until all counts are 0, but is there a better way to do it?
def func(d: dict):
    res = []
    while any(i > 0 for i in d.values()):
        for k, c in d.items():
            if c > 0:
                res.append(k)
                d[k] -= 1
    return res
(I'm assuming you're using a version of Python that guarantees the iteration order of dictionaries)
Here's an itertools-y approach. It creates a generator for each letter that yields the letter the given number of times, and it combines all of them together with zip_longest so they get yielded evenly.
from itertools import repeat, zip_longest
def iterate_evenly(d):
    generators = [repeat(k, v) for k, v in d.items()]
    exhausted = object()  # sentinel marking "this generator has run out"
    for row in zip_longest(*generators, fillvalue=exhausted):
        for x in row:
            if x is not exhausted:
                yield x
print(list(iterate_evenly({"A": 3, "B": 5})))
print(list(iterate_evenly({"A": 4, "B": 1})))
Result:
['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
['A', 'B', 'A', 'A', 'A']
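A small sketch of why the answer uses a private exhausted = object() sentinel instead of zip_longest's default None fill value: None is a legal dictionary key, so a filter on "is not None" would silently drop it. The counts dict is a hypothetical example.

```python
from itertools import repeat, zip_longest

def iterate_evenly(d):
    # A private sentinel cannot collide with any real key, unlike None
    exhausted = object()
    generators = [repeat(k, v) for k, v in d.items()]
    for row in zip_longest(*generators, fillvalue=exhausted):
        for x in row:
            if x is not exhausted:
                yield x

counts = {None: 2, 'B': 3}
with_sentinel = list(iterate_evenly(counts))

# The None-padding variant cannot tell padding apart from a real None key
naive = [x for row in zip_longest(*[[k] * v for k, v in counts.items()]) for x in row if x is not None]

print(with_sentinel)
print(naive)
```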
You can do the same thing in fewer lines, although it becomes harder to read.
from itertools import repeat, zip_longest
def iterate_evenly(d):
    exhausted = object()
    return [x for row in zip_longest(*(repeat(k, v) for k, v in d.items()), fillvalue=exhausted)
            for x in row if x is not exhausted]
print(iterate_evenly({"A": 3, "B": 5}))
print(iterate_evenly({"A": 4, "B": 1}))
For a one-liner.
First, create a list with two elements: a list of As and a list of Bs:
>>> d = {'A': 3, 'B': 5}
>>> [[k]*v for k, v in d.items()]
[['A', 'A', 'A'], ['B', 'B', 'B', 'B', 'B']]
[k]*v means a list containing v copies of k. Second, interleave the As and Bs. We need zip_longest because zip would stop at the end of the shortest list:
>>> import itertools
>>> list(itertools.zip_longest(*[[k]*v for k, v in d.items()]))
[('A', 'B'), ('A', 'B'), ('A', 'B'), (None, 'B'), (None, 'B')]
Now, just flatten the list and remove None values:
>>> [v for vs in itertools.zip_longest(*[[k]*v for k, v in d.items()]) for v in vs if v is not None]
['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
Other example:
>>> d = {'A': 4, 'B': 1}
>>> [v for vs in itertools.zip_longest(*[[k]*v for k, v in d.items()]) for v in vs if v is not None]
['A', 'B', 'A', 'A', 'A']
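You can just use sum with a generator expression:
res = sum(([key]*value for key, value in d.items()), [])
This exploits the fact that sum can "add" anything that supports the + operator, such as lists, together with sequence multiplication ("A"*4 == "AAAA"). Note that res keeps each key's copies together (['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'] for {'A': 3, 'B': 5}) rather than interleaving them.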
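Because sum concatenates lists one at a time, every step copies the accumulated result, so the cost grows quadratically with the total count. A sketch of the same grouped output built with itertools.chain.from_iterable, which visits each piece once; d is the question's example dict.

```python
from itertools import chain

d = {'A': 3, 'B': 5}

# sum(..., []) builds a new list on every +, copying what it has so far
grouped_sum = sum(([key] * value for key, value in d.items()), [])

# chain.from_iterable walks the same pieces lazily, in linear time
grouped_chain = list(chain.from_iterable([key] * value for key, value in d.items()))

print(grouped_sum)
print(grouped_chain)
```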
If you want the order to be randomized, use the random module:
from random import shuffle
shuffle(res)
If, as Thierry Lathuille notes, you want to cycle through the values in the original order, you can use some itertools magic:
from itertools import chain, zip_longest
res = [*filter(
    bool,  # drop the None padding (this also drops falsy keys such as '' or 0)
    chain(*zip_longest(
        *([key]*val for key, val in d.items()))
    )
)]
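If keys might be falsy (for example '' or 0), filter(bool, ...) would remove them. A sketch of an alternative adapted from the "roundrobin" recipe in the itertools documentation, which never pads and therefore needs no filtering. The wrapper name interleave is hypothetical.

```python
from itertools import cycle, islice

def roundrobin(*iterables):
    # Adapted from the itertools documentation's "roundrobin" recipe
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for step in nexts:
                yield step()
        except StopIteration:
            # Drop the exhausted iterator and keep cycling over the rest
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

def interleave(d):
    # One list of repeated keys per entry, visited in dictionary order
    return list(roundrobin(*([key] * count for key, count in d.items())))

print(interleave({'A': 3, 'B': 5}))
print(interleave({'': 2, 'B': 1}))
```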
As an alternative to the replication & zip_longest approach, let's try to simplify the OP's original code:
def function(dictionary):
    result = []
    while dictionary:
        result.extend(dictionary)
        dictionary = {k: v - 1 for k, v in dictionary.items() if v > 1}
    return result
print(function({'A': 3, 'B': 5}))
print(function({'A': 4, 'B': 1}))
OUTPUT
% python3 test.py
['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
['A', 'B', 'A', 'A', 'A']
%
Although it might look otherwise, it's not destructive on the dictionary argument, unlike the OP's original code.
It could also be done by sorting the (position, character) tuples formed by expanding each dictionary entry:
a = {'A': 3, 'B': 5}
result = [c for _, c in sorted((p, c) for k, n in a.items() for p, c in enumerate([k] * n))]
print(result)  # ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
If the dictionary's order is usable, you can forgo the sort and use this:
result = [c for i in range(max(a.values())) for c,n in a.items() if i<n]
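The range-based one-liner calls max(a.values()), which raises ValueError for an empty dictionary. A minimal sketch wrapping it in a function (the name interleave_by_count is hypothetical) that uses max's default argument so an empty dict yields an empty list.

```python
def interleave_by_count(a):
    # max() raises ValueError on an empty sequence, so supply a default
    rounds = max(a.values(), default=0)
    # Round i emits every key whose count is still greater than i,
    # in the dictionary's insertion order
    return [c for i in range(rounds) for c, n in a.items() if i < n]

print(interleave_by_count({'A': 3, 'B': 5}))
print(interleave_by_count({}))
```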
I have two lists:
a= [0,0,0,1,1,1,3,3,3]
b= ['a','b','c','d','e','f','g','h','i']
output = [['a','b','c'],['d','e','f'],['g','h','i']]
a and b are lists of the same length.
I need to build the output list in such a way that whenever the value in list a changes (from 0 to 1, or from 1 to 3), a new sublist is started in the output.
Can someone please help?
Use groupby:
from itertools import groupby
from operator import itemgetter
a = [0, 0, 0, 1, 1, 1, 3, 3, 3]
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
output = [list(map(itemgetter(1), group)) for _, group in groupby(zip(a, b), key=itemgetter(0))]
print(output)
Output
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
A simpler method without any imports, using a dictionary. Note that it collects all equal values of a together, even when they are not adjacent:
a = [0, 0, 0, 1, 1, 1, 3, 3, 3]
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
d = {}
for key, item in zip(a, b):
    d.setdefault(key, []).append(item)  # one list per distinct value of a, in first-seen order
lofl = list(d.values())
>>> lofl
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
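Grouping by runs (groupby) and grouping by value (a dictionary) only agree when equal values are adjacent. A sketch on a hypothetical input where the value 0 reappears after a 1, showing how the two differ.

```python
from itertools import groupby

a = [0, 0, 1, 1, 0]
b = ['a', 'b', 'c', 'd', 'e']

# groupby starts a new group every time the key changes
by_runs = [[item for _, item in grp] for _, grp in groupby(zip(a, b), key=lambda pair: pair[0])]

# A dict collects every item with the same key, wherever it appears
by_value = {}
for key, item in zip(a, b):
    by_value.setdefault(key, []).append(item)

print(by_runs)
print(list(by_value.values()))
```

The question asks for a new sublist "whenever the value changes", which matches the run-based result.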
Using groupby, you could do:
from itertools import groupby
a= [0,0,0,1,1,1,3,3,3]
b= ['a','b','c','d','e','f','g','h','i']
iter_b = iter(b)
output = [[next(iter_b) for _ in group] for key, group in groupby(a)]
print(output)
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
groupby yields successive groups of identical values of a. For each group, we create a list containing as many of the next elements of b as there are values in the group.
Since you added the algorithm tag, I assume you want a solution without so much magic:
>>> def merge_lists(A, B):
...     output = []
...     sub_list = []
...     current = A[0]
...     for i in range(len(A)):
...         if A[i] == current:
...             sub_list.append(B[i])
...         else:
...             output.append(sub_list)
...             sub_list = []
...             sub_list.append(B[i])
...             current = A[i]
...     output.append(sub_list)
...     return output
...
>>> a = [0, 0, 0, 1, 1, 1, 3, 3, 3]
>>> b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
>>> merge_lists(a, b)
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
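The loop above reads A[0] up front, so it fails with IndexError on empty input. A sketch of the same no-magic idea that pairs the lists with zip and compares against a sentinel instead; the name split_on_change is hypothetical.

```python
def split_on_change(keys, values):
    """Start a new sublist whenever the key differs from the previous one."""
    output = []
    previous = object()  # sentinel: never equal to a real key
    for key, value in zip(keys, values):
        if key != previous:
            output.append([])  # key changed: open a new sublist
            previous = key
        output[-1].append(value)
    return output

print(split_on_change([0, 0, 0, 1, 1, 1, 3, 3, 3], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']))
```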
How to slice a list based on the length of its elements?
For example, how do I turn
['A', 'E', 'LA', 'ELA']
into
[['A', 'E'], ['LA'], ['ELA']]
Using itertools.groupby
from itertools import groupby
l = ['A', 'E', 'LA', 'ELA']
[list(g) for _,g in groupby(l,len)]
#Output:
#[['A', 'E'], ['LA'], ['ELA']]
You can try this:
l = ['A', 'E', 'LA', 'ELA', 'B', 'CD']
maximum = max(len(i) for i in l)
minimum = min(len(i) for i in l)
l = [[i for i in l if len(i) == s] for s in range(minimum, maximum + 1)]
print(l)
Output:
[['A', 'E', 'B'], ['LA', 'CD'], ['ELA']]
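The range-based version scans the whole list once per length and produces an empty sublist for every length that does not occur. A sketch that buckets the strings in a single pass with collections.defaultdict; the name group_by_length is hypothetical.

```python
from collections import defaultdict

def group_by_length(words):
    buckets = defaultdict(list)
    for word in words:
        buckets[len(word)].append(word)
    # Only lengths that actually occur get a bucket, ordered by length
    return [buckets[n] for n in sorted(buckets)]

print(group_by_length(['A', 'E', 'LA', 'ELA', 'B', 'CD']))
print(group_by_length(['A', 'ABCD']))
```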
I have this dictionary:
n ={'b': [['a'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}
and require the following output:
n ={'b': ['a', 'c'], 'a': ['c', 'b'], 'c': ['b']}
I tried to use itertools and join but couldn't get it to work, can anyone help out?
Just use chain.from_iterable from itertools to combine these:
from itertools import chain
from_it = chain.from_iterable
{k: list(from_it(i)) for k, i in n.items()}
If you require unique values in the lists (which according to the title you don't), you can additionally wrap the result of from_it in a set.
I would iterate over the dict and flatten each value's inner lists into a single list.
For uniqueness, you can additionally wrap each flattened list in set().
n = {'b': [['a', 'b'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}
new_n = {}
for k, v in n.items():
    new_n[k] = [inner_item for item in v for inner_item in item]
print(new_n)
You can try this:
from itertools import chain
n ={'b': [['a'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}
new_n = {a: list(set(chain(*b))) for a, b in n.items()}
Output (the order inside each list can vary, because it comes from a set):
{'a': ['c', 'b'], 'c': ['b'], 'b': ['a', 'c']}
A one-liner solution (not recommended) is:
{key: list(set([item for subarr in value for item in subarr])) for key, value in n.items()}
It is much harder to read, though. If you really do not want to import anything, you can write a helper function:
def flat_and_unique_list(list_of_lists):
    return list(set(item for sub_list in list_of_lists for item in sub_list))
{key: flat_and_unique_list(value) for key, value in n.items()}
A solution with sum:
>>> {k: sum(v, []) for k, v in n.items()}
{'a': ['c', 'b', 'c'], 'b': ['a', 'c'], 'c': ['b']}
sum(iterable, start=0, /)
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
Therefore, using an empty list as start value works.
Remove duplicates (without preserving order) using set:
>>> {k: list(set(sum(v, []))) for k, v in n.items()}
{'a': ['c', 'b'], 'b': ['a', 'c'], 'c': ['b']}
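If the order of first appearance matters, dict.fromkeys can deduplicate instead of set, since dictionaries preserve insertion order (guaranteed from Python 3.7). A sketch on the question's n, flattening with chain.from_iterable.

```python
from itertools import chain

n = {'b': [['a'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}

# dict.fromkeys keeps the first occurrence of each item, in order
flat_unique = {k: list(dict.fromkeys(chain.from_iterable(v))) for k, v in n.items()}
print(flat_unique)
```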
I have a string abccddde
I need to find substrings like:
a, b, c, cc, d, dd, ddd, e
Substrings like ab or cd are not valid.
I tried finding all the substrings of the string, but it's not efficient:
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i, length)]
This is outputting:
['a', 'ab', 'abc', 'abcc', 'abccd', 'abccdd', 'abccddd', 'abccddde', 'b', 'bc', 'bcc', 'bccd', 'bccdd', 'bccddd', 'bccddde', 'c', 'cc', 'ccd', 'ccdd', 'ccddd', 'ccddde', 'c', 'cd', 'cdd', 'cddd', 'cddde', 'd', 'dd', 'ddd', 'ddde', 'd', 'dd', 'dde', 'd', 'de', 'e']
This method generates every possible substring, and that is what makes it inefficient.
Please Help!
You can use itertools.groupby() for this:
from itertools import groupby
s = 'abccdddcce'
l1 = ["".join(g) for k, g in groupby(s)]
l2 = [a[:i+1] for a in l1 for i in range(len(a))]
print(l2)
Output:
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'c', 'cc', 'e']
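For larger input data, replace the lists with generator expressions (parentheses instead of square brackets):
l1 = ("".join(g) for k, g in groupby(s))
l2 = (a[:i+1] for a in l1 for i in range(len(a)))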
itertools.groupby gives you each run of consecutive identical characters. Then, for each run, you yield the character repeated 1, 2, ... up to the run length times.
from itertools import groupby
def substrings(s):
    for char, group in groupby(s):
        substr = ''
        for i in group:
            substr += i
            yield substr
for result in substrings('abccdddcce'):
    print(result)
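Here is one way using regex (with re imported and s = 'abccddde'). Note that each match only contributes the whole run, its first character and the run minus its first character, so runs of four or more miss prefixes (e.g. 'dd' from 'dddd'), and the order within a run is arbitrary because of set: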
In [85]: [j for i in re.findall(r'((\w)(\2+)?)', s) for j in set(i) if j]
Out[85]: ['a', 'b', 'c', 'cc', 'ddd', 'dd', 'd', 'e']
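A sketch of a regex variant that yields every prefix of each run, in order: (.)\1* matches a maximal run of one repeated character, and the comprehension slices out each prefix. The function name run_prefixes is hypothetical; re.DOTALL is passed so that newline characters also form runs.

```python
import re

def run_prefixes(s):
    # (.)\1* matches a maximal run of one repeated character
    return [m.group()[:i]
            for m in re.finditer(r'(.)\1*', s, re.DOTALL)
            for i in range(1, len(m.group()) + 1)]

print(run_prefixes('abccddde'))
print(run_prefixes('dddd'))
```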
from itertools import groupby
def runlength_compress(src):
    return ((k, sum(1 for _ in g)) for k, g in groupby(src))
def contiguous_substrings(src):
    return [c*(i+1) for c, count in runlength_compress(src) for i in range(count)]
print(contiguous_substrings('abccddde'))
The following will do what you want. I don't know if it's efficient compared to other solutions, though.
def get_all_substrings(text):
    res = []
    prev = ''
    s = ''
    for c in text:
        if c == prev:
            s += c
        else:
            s = prev = c
        res.append(s)
    return res
# Output
>>> get_all_substrings('abccddde')
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'e']
>>> get_all_substrings('abccdddec')
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'e', 'c']
Timings
import timeit
import random
size = 100
values = 'abcde'
s = ''.join(random.choice(values) for _ in range(size))
print(s)
print(timeit.timeit("get_all_substrings(s)",
setup = 'from __main__ import s, get_all_substrings',
number = 10000) )
# Example for size 100 input
abbaaebacddbdedbdbbacadcdddabaeabacdcbeebbccaadebdcecadcecceececcacebacecbbccdedddddabaeeceeeccabdcc
0.16761969871275967
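The timing above measures only one function. Before comparing speeds, it is worth confirming that two approaches actually agree. A sketch that checks a groupby-based version against the same loop logic as the answer above on random strings; both function names are hypothetical, and the fixed seed only makes the check repeatable.

```python
import random
from itertools import groupby

def via_groupby(s):
    # Every prefix of each run of identical characters
    out = []
    for _, grp in groupby(s):
        run = ''.join(grp)
        out.extend(run[:i] for i in range(1, len(run) + 1))
    return out

def via_loop(text):
    # Same logic as the loop-based get_all_substrings answer
    res, prev, s = [], '', ''
    for c in text:
        if c == prev:
            s += c
        else:
            s = prev = c
        res.append(s)
    return res

random.seed(0)  # fixed seed so the check is repeatable
for _ in range(200):
    sample = ''.join(random.choice('abcde') for _ in range(random.randint(0, 30)))
    assert via_groupby(sample) == via_loop(sample), sample
```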