I want to group all the lists in a list, based on the last element of each list, and also count the number of times each last element occurs. The challenge is that the inner lists can be of different sizes.
E.g. input:
[['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
And I am trying to get the output to be
('a', 2, [('dd','ee'),('ff', 'gg', 'hh')]), ( 'b', 2, [('aa'), ('cc')]), ( 'c', 1, [('bb')])
Finally, I want to convert the result to a pandas DataFrame. If anyone can help or guide me, it would be much appreciated.
Readable version
import itertools
import operator

mylist.sort(key=operator.itemgetter(-1))  # sort by last element
result = []
for k, g in itertools.groupby(mylist, key=operator.itemgetter(-1)):
    # remove last element from each sublist:
    g = [tuple(sublist[:-1]) for sublist in g]
    result.append((k, len(g), g))
Without importing a library
mylist = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
instances = {}
for sublist in mylist:
    leading_elements, last_element = sublist[:-1], sublist[-1]
    # collect the leading elements under their shared last element
    instances.setdefault(last_element, []).append(tuple(leading_elements))
result = tuple()
for key, val in instances.items():
    result += ((key, len(val), val),)  # append one (key, count, items) tuple
Use itertools.groupby
>>> from itertools import groupby
>>> l = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
>>> f = lambda sl: sl[-1]
>>> res = [(k, [tuple(sl[:-1]) for sl in v]) for k,v in groupby(sorted(l, key=f), f)]
>>> res = [(k, len(v), v) for k,v in res]
>>> print(res)
[('a', 2, [('dd', 'ee'), ('ff', 'gg', 'hh')]), ('b', 2, [('aa',), ('cc',)]), ('c', 1, [('bb',)])]
I would like to print out a dictionary's keys, each repeated as many times as its value, interleaved evenly, like:
a = {'A': 3, 'B': 5}
=> ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
a = {'A': 4, 'B': 1}
=> ['A', 'B', 'A', 'A', 'A']
I know I can use a while loop that emits each key and decrements its count until every count is 0, but is there a better way to do it?
def func(d: dict):
    res = []
    while any(i > 0 for i in d.values()):
        for k, c in d.items():
            if c > 0:
                res.append(k)  # emit the key once per round
                d[k] -= 1      # note: this mutates the caller's dict
    return res
(I'm assuming you're using a version of Python that guarantees the iteration order of dictionaries)
Here's an itertools-y approach. It creates a generator for each letter that yields the letter the given number of times, and it combines all of them together with zip_longest so they get yielded evenly.
from itertools import repeat, zip_longest
def iterate_evenly(d):
    generators = [repeat(k, v) for k,v in d.items()]
    exhausted = object()
    for round in zip_longest(*generators, fillvalue=exhausted):
        for x in round:
            if x is not exhausted:
                yield x
print(list(iterate_evenly({"A": 3, "B": 5})))
print(list(iterate_evenly({"A": 4, "B": 1})))
['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
['A', 'B', 'A', 'A', 'A']
You can do the same thing in fewer lines, although it becomes harder to read.
from itertools import repeat, zip_longest
def iterate_evenly(d):
    exhausted = object()
    return [x for round in zip_longest(*(repeat(k, v) for k,v in d.items()), fillvalue=exhausted) for x in round if x is not exhausted]
print(iterate_evenly({"A": 3, "B": 5}))
print(iterate_evenly({"A": 4, "B": 1}))
For a one-liner.
First, create a list with two elements: a list of As and a list of Bs:
>>> d = {'A': 3, 'B': 5}
>>> [[k]*v for k, v in d.items()]
[['A', 'A', 'A'], ['B', 'B', 'B', 'B', 'B']]
[k]*v means: a list containing k repeated v times. Second, interleave the As and Bs. We need zip_longest because zip would stop at the end of the shortest list:
>>> import itertools
>>> list(itertools.zip_longest(*[[k]*v for k, v in d.items()]))
[('A', 'B'), ('A', 'B'), ('A', 'B'), (None, 'B'), (None, 'B')]
Now, just flatten the list and remove None values:
>>> [v for vs in itertools.zip_longest(*[[k]*v for k, v in d.items()]) for v in vs if v is not None]
['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
Other example:
>>> d = {'A': 4, 'B': 1}
>>> [v for vs in itertools.zip_longest(*[[k]*v for k, v in d.items()]) for v in vs if v is not None]
['A', 'B', 'A', 'A', 'A']
You can just use sum with a generator comprehension:
res = sum(([key]*value for key, value in d.items()), [])
This exploits the fact that sum can "add" anything that can use the + operators, like lists, in addition to sequence multiplication ("A"*4 == "AAAA").
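To make the sum trick concrete, here is a minimal sketch (the names flat and flat_chain are only for illustration). Note that it concatenates the repeated keys rather than interleaving them, and that itertools.chain.from_iterable produces the same result without building a new list at every + step:

```python
from itertools import chain

d = {"A": 3, "B": 5}

# sum() starts from the empty list and applies + to each repeated list in turn.
flat = sum(([key] * value for key, value in d.items()), [])

# chain.from_iterable gives the same concatenation lazily,
# avoiding an intermediate list for every + step.
flat_chain = list(chain.from_iterable([key] * value for key, value in d.items()))

print(flat)  # ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
```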
If you want the order to be randomized, use the random module:
from random import shuffle
shuffle(res)  # shuffles the list in place
If, as Thierry Lathuille notes, you want to cycle through the values in the original order, you can use some itertools magic:
from itertools import chain, zip_longest
res = [*filter(
    bool,  # drop Nones
    chain.from_iterable(zip_longest(
        *([key]*val for key, val in d.items()))))]
As an alternative to the replication & zip_longest approach, let's try to simplify the OP's original code:
def function(dictionary):
    result = []
    while dictionary:
        result.extend(dictionary)  # one round: every remaining key, in order
        dictionary = {k: v - 1 for k, v in dictionary.items() if v > 1}
    return result
print(function({'A': 3, 'B': 5}))
print(function({'A': 4, 'B': 1}))
% python3 test.py
['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
['A', 'B', 'A', 'A', 'A']
Although it might look otherwise, it's not destructive on the dictionary argument, unlike the OP's original code.
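A quick way to convince yourself of the non-destructive claim is to check the caller's dictionary afterwards. The sketch below uses a hypothetical function name, interleave, and assumes all counts are positive; it rebuilds the dictionary each round, so only the local name is rebound:

```python
def interleave(counts):
    """Emit each key once per round until its count is used up."""
    result = []
    while counts:
        result.extend(counts)  # iterating a dict yields its keys in order
        # This builds a NEW dict and rebinds the local name;
        # the dict object passed in by the caller is never modified.
        counts = {k: v - 1 for k, v in counts.items() if v > 1}
    return result

original = {"A": 3, "B": 5}
out = interleave(original)
print(out)       # ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
print(original)  # {'A': 3, 'B': 5}
```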
It could also be done using a sort of the (position,character) tuples formed by expanding each dictionary entry:
a = {'A': 3, 'B': 5}
result = [c for _,c in sorted( (p,c) for c,n in a.items() for p,c in enumerate(c*n))]
print(result) # ['A', 'B', 'A', 'B', 'A', 'B', 'B', 'B']
If the dictionary's order is usable, you can forgo the sort and use this:
result = [c for i in range(max(a.values())) for c,n in a.items() if i<n]
I have two lists:
a= [0,0,0,1,1,1,3,3,3]
b= ['a','b','c','d','e','f','g','h','i']
output = [['a','b','c'],['d','e','f'],['g','h','i']]
a and b are lists of the same length.
I need an output list such that whenever the value in list a changes (from 0 to 1, or from 1 to 3), a new sub-list is started in the output.
Can someone please help?
Use groupby:
from itertools import groupby
from operator import itemgetter
a = [0, 0, 0, 1, 1, 1, 3, 3, 3]
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
output = [list(map(itemgetter(1), group)) for _, group in groupby(zip(a, b), key=itemgetter(0))]
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
A simpler method without using any imports by utilizing dictionary:
a= [0,0,0,1,1,1,3,3,3]
b= ['a','b','c','d','e','f','g','h','i']
d = {e: [] for e in set(a)}  # create an empty list for each of a's unique values
for i, e in enumerate(a):    # put each element of b into the list for its value
    d[e].append(b[i])
lofl = list(d.values())
>>> lofl
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
Using groupby, you could do:
from itertools import groupby
a= [0,0,0,1,1,1,3,3,3]
b= ['a','b','c','d','e','f','g','h','i']
iter_b = iter(b)
output = [[next(iter_b) for _ in group] for key, group in groupby(a)]
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
groupby yields successive groups of identical values of a. For each group, we create a list containing as many of the next elements of b as there are values in the group.
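One thing worth seeing in action is that groupby only groups consecutive equal values, so a value that reappears later starts a new group. A small sketch with made-up inputs:

```python
from itertools import groupby

a = [0, 0, 1, 0]           # 0 appears again after the 1
b = ["w", "x", "y", "z"]

# Pair each key with its item, group on the key, keep only the items.
runs = [[item for _, item in group]
        for _, group in groupby(zip(a, b), key=lambda pair: pair[0])]

print(runs)  # [['w', 'x'], ['y'], ['z']]
```

That matches the "start a new list whenever a changes" requirement. If you instead wanted all items sharing a value in one list regardless of position, a dictionary keyed on the value would be the better fit.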
As you added the algorithm tag, I believe you want a solution without so much magic.
>>> def merge_lists(A, B):
...     output = []
...     sub_list = []
...     current = A[0]
...     for i in range(len(A)):
...         if A[i] == current:
...             sub_list.append(B[i])
...         else:
...             output.append(sub_list)
...             sub_list = []
...             sub_list.append(B[i])
...             current = A[i]
...     output.append(sub_list)
...     return output
...
>>> a= [0,0,0,1,1,1,3,3,3]
>>> b= ['a','b','c','d','e','f','g','h','i']
>>> merge_lists(a, b)
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
How to slice a list based on the length of its elements?
For example, how do I turn
['A', 'E', 'LA', 'ELA']
into
[['A', 'E'], ['LA'], ['ELA']]?
Using itertools.groupby
from itertools import groupby
l = ['A', 'E', 'LA', 'ELA']
[list(g) for _,g in groupby(l,len)]
#[['A', 'E'], ['LA'], ['ELA']]
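This works here because the input is already ordered by length. As a hedged sketch (the words list is made up), sorting by len first makes the same approach work when equal-length items are not adjacent:

```python
from itertools import groupby

words = ["A", "LA", "E", "ELA", "B"]  # lengths are not in order

# Without sorting, each change of length starts a new group.
unsorted_groups = [list(g) for _, g in groupby(words, key=len)]

# sorted() is stable, so items keep their relative order within each length.
sorted_groups = [list(g) for _, g in groupby(sorted(words, key=len), key=len)]

print(unsorted_groups)  # [['A'], ['LA'], ['E'], ['ELA'], ['B']]
print(sorted_groups)    # [['A', 'E', 'B'], ['LA'], ['ELA']]
```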
You can try this:
l = ['A', 'E', 'LA', 'ELA', 'B', 'CD']
maximum = max([len(i) for i in l])
minimum = min([len(i) for i in l])
l = list([i for i in l if len(i)==s] for s in range(minimum, maximum+1))
[['A', 'E', 'B'], ['LA', 'CD'], ['ELA']]
I have this dictionary:
n ={'b': [['a'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}
and require the following output:
n ={'b': ['a', 'c'], 'a': ['c', 'b'], 'c': ['b']}
I tried to use itertools and join but couldn't get it to work, can anyone help out?
Just use chain.from_iterable from itertools to combine these:
from itertools import chain
from_it = chain.from_iterable
{k: list(from_it(i)) for k, i in n.items()}
If you require unique values in the lists (which according to the title you don't), you can additionally wrap the result of from_it in a set.
I would iterate over the dict and flatten each list of lists into a single list.
For uniqueness you can cast each flattened list to a set.
n ={'b': [['a', 'b'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}
new_n = {}
for k,v in n.items():
    new_n[k] = [inner_item for item in v for inner_item in item]
print (new_n)
You can try this:
from itertools import chain
n ={'b': [['a'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}
new_n = {a:list(set(chain(*[i[0] if len(i) == 1 else i for i in b]))) for a, b in n.items()}
{'a': ['c', 'b'], 'c': ['b'], 'b': ['a', 'c']}
A one-liner solution (not recommended) is:
{key: list(set([item for subarr in value for item in subarr])) for key, value in n.items()}
It is much harder to read though. If you really do not want to import anything, you can write a helper function.
def flat_and_unique_list(list_of_lists):
    return list(set([item for sub_list in list_of_lists for item in sub_list]))
{key: flat_and_unique_list(value) for key, value in n.items()}
A solution with sum:
>>> {k: sum(v, []) for k, v in n.items()}
{'a': ['c', 'b', 'c'], 'b': ['a', 'c'], 'c': ['b']}
sum(iterable, start=0, /)
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
Therefore, using an empty list as start value works.
Remove duplicates, without preserving order, using set:
>>> {k: list(set(sum(v, []))) for k, v in n.items()}
{'a': ['c', 'b'], 'b': ['a', 'c'], 'c': ['b']}
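If the order of first appearance matters, one possible alternative to set is dict.fromkeys, which keeps insertion order on Python 3.7+. A minimal sketch (flat_unique is just an illustrative name):

```python
from itertools import chain

n = {'b': [['a'], ['c']], 'a': [['c', 'b'], ['c']], 'c': [['b']]}

# Flatten each list of lists, then drop duplicates while keeping
# the order in which items were first seen.
flat_unique = {
    key: list(dict.fromkeys(chain.from_iterable(lists)))
    for key, lists in n.items()
}

print(flat_unique)  # {'b': ['a', 'c'], 'a': ['c', 'b'], 'c': ['b']}
```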
I have a string abccddde
I need to find substrings like:
a, b, c, cc, d, dd, ddd, e
substrings ab or cd are not valid.
I tried finding all the substrings of the string, but it's not efficient:
def get_all_substrings(input_string):
    length = len(input_string)
    return [input_string[i:j+1] for i in range(length) for j in range(i,length)]
This is outputting:
['a', 'ab', 'abc', 'abcc', 'abccd', 'abccdd', 'abccddd', 'abccddde', 'b', 'bc', 'bcc', 'bccd', 'bccdd', 'bccddd', 'bccddde', 'c', 'cc', 'ccd', 'ccdd', 'ccddd', 'ccddde', 'c', 'cd', 'cdd', 'cddd', 'cddde', 'd', 'dd', 'ddd', 'ddde', 'd', 'dd', 'dde', 'd', 'de', 'e']
This is the method I followed to find the substrings, but it generates all the possibilities, which is what makes it inefficient.
Please help!
You can use itertools.groupby() for this:
from itertools import groupby
s = 'abccdddcce'
l1 = ["".join(g) for k, g in groupby(s)]
l2 = [a[:i+1] for a in l1 for i in range(len(a))]
print(l2)
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'c', 'cc', 'e']
For larger input data, replace the lists with generators.
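As a rough sketch of what that could look like, the two list comprehensions can become generator expressions so nothing is materialised until you iterate. The function name lazy_prefixes is hypothetical:

```python
from itertools import groupby

def lazy_prefixes(s):
    # Each run of identical characters, produced lazily.
    runs = ("".join(group) for _, group in groupby(s))
    # Every prefix of every run, also produced lazily.
    return (run[:i + 1] for run in runs for i in range(len(run)))

for piece in lazy_prefixes("abccdddcce"):
    print(piece)
```

Wrapping the result in list() gives the same output as the list-based version above.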
itertools.groupby can tell you the number of consecutive chars. After that, for each group you have the char repeated up to that number.
from itertools import groupby
def substrings(s):
    for char, group in groupby(s):
        substr = ''
        for i in group:
            substr += i
            yield substr
for result in substrings('abccdddcce'):
    print(result)
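Here is one way using regex (assuming re is imported and s = 'abccddde'; the order within each run is arbitrary because a set is used):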
In [85]: [j for i in re.findall(r'((\w)(\2+)?)', s) for j in set(i) if j]
Out[85]: ['a', 'b', 'c', 'cc', 'ddd', 'dd', 'd', 'e']
from itertools import groupby
def runlength_compress(src):
    return ((k, sum(1 for _ in g)) for k,g in groupby(src))
def contiguous_substrings(src):
    return [c*(i+1) for c, count in runlength_compress(src) for i in range(count)]
The following will do what you want. I don't know if its efficient compared to other solutions though.
def get_all_substrings(text):
    res = []
    prev = ''
    s = ''
    for c in text:
        if c == prev:
            s += c           # same char as before: extend the current run
        else:
            s = prev = c     # new char: start a new run
        res.append(s)
    return res
# Output
>>> get_all_substrings('abccddde')
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'e']
>>> get_all_substrings('abccdddec')
['a', 'b', 'c', 'cc', 'd', 'dd', 'ddd', 'e', 'c']
import timeit
import random
size = 100
values = 'abcde'
s = ''.join(random.choice(values) for _ in range(size))
print(timeit.timeit('get_all_substrings(s)',
                    setup='from __main__ import s, get_all_substrings',
                    number=10000))
# Example for size 100 input