I have a list of lists of strings that looks roughly like this; each inner list always contains exactly 5 elements:
text_list = [['aaa','bbb','ccc','ddd','eee'],
['fff','ggg','hhh','iii','jjj'],
['xxx','mmm','ccc','bbb','aaa'],
['fff','xxx','aaa','bbb','ddd'],
['aaa','bbb','ccc','ddd','eee'],
['fff','xxx','aaa','ddd','eee'],
['iii','xxx','ggg','jjj','aaa']]
The objective is simple: group together lists that are similar, where similarity means that the first 3 elements of one list are all found among the elements of another list.
So for the example above, the output might look like this (each number is the index of a list):
[[0,2,4],[3,5]]
Notice that a group containing the same indices as another group, just in a different order, is removed.
I've written the following code to extract the groups, but it returns duplicates and I am unsure how to proceed. I also suspect this is not the most efficient way to do the extraction, since the real list can contain millions of groups:
grouped_list = []
for i in range(0, len(text_list)):
    int_temp = []
    for m in range(0, len(text_list)):
        if i == m:
            continue
        bool_check = all(x in text_list[m] for x in text_list[i][0:3])
        if bool_check:
            if len(int_temp) == 0:
                int_temp.append(i)
                int_temp.append(m)
                continue
            int_temp.append(m)
    grouped_list.append(int_temp)
## remove entries with no groups
grouped_list = [x for x in grouped_list if x != []]
Is there a better way to go about this? How do I remove the duplicate group afterwards? Thank you.
Edit:
To be clearer: I would like to retrieve the lists that are similar to each other, using only the first 3 elements of each list. For example, take the first 3 elements of list A and check whether lists B, C, D, ... each contain all 3 of them. Repeat this for every list, then remove any group that duplicates another one.
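You can build a set of frozensets of indices, where each frozenset holds the index of a sub-list together with the indices of every sub-list that contains all of its first 3 items. Because frozensets ignore order, duplicate groups collapse automatically: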
groups = set()
sets = list(map(set, text_list))
for i, lst in enumerate(text_list):
    groups.add(frozenset((i, *(j for j, s in enumerate(sets) if set(lst[:3]) <= s))))
print([sorted(group) for group in groups if len(group) > 1])
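The same frozenset idea can also deduplicate the output of the loop from the question. The sketch below re-expresses that loop in condensed form (an assumption: it is intended to behave the same as the original nested loop) and then collapses groups that differ only in order:

```python
text_list = [['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
             ['fff', 'ggg', 'hhh', 'iii', 'jjj'],
             ['xxx', 'mmm', 'ccc', 'bbb', 'aaa'],
             ['fff', 'xxx', 'aaa', 'bbb', 'ddd'],
             ['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
             ['fff', 'xxx', 'aaa', 'ddd', 'eee'],
             ['iii', 'xxx', 'ggg', 'jjj', 'aaa']]

# The question's loop, condensed: for each row i, collect i plus every other
# row m that contains all of row i's first 3 items; rows with no match are skipped.
grouped_list = []
for i, row in enumerate(text_list):
    matches = [m for m, other in enumerate(text_list)
               if m != i and all(x in other for x in row[:3])]
    if matches:
        grouped_list.append([i, *matches])

print(grouped_list)  # [[0, 2, 4], [3, 5], [4, 0, 2], [5, 3]]

# frozensets ignore order, so [0, 2, 4] and [4, 0, 2] become the same entry.
unique = {frozenset(g) for g in grouped_list}
print(sorted(sorted(g) for g in unique))  # [[0, 2, 4], [3, 5]]
```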
If the input list is long, it is faster to build a set of frozensets of the first 3 items of every sub-list and use it to filter all 3-item combinations of each sub-list. The time complexity is then essentially linear in the length of the input list rather than quadratic, despite the overhead of generating the combinations (10 per 5-item sub-list):
from itertools import combinations
sets = {frozenset(lst[:3]) for lst in text_list}
groups = {}
for i, lst in enumerate(text_list):
    for c in map(frozenset, combinations(lst, 3)):
        if c in sets:
            groups.setdefault(c, []).append(i)
print([sorted(group) for group in groups.values() if len(group) > 1])
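The first snippet prints groups in arbitrary set order and the second in the order their keys were first encountered. A minimal sketch that wraps each approach in a function (the names groups_quadratic and groups_linear are hypothetical) and sorts the result so the two can be compared directly. On the sample data both return the same groups, though they are not guaranteed to agree on every input (for example, rows with repeated items):

```python
from itertools import combinations

text_list = [['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
             ['fff', 'ggg', 'hhh', 'iii', 'jjj'],
             ['xxx', 'mmm', 'ccc', 'bbb', 'aaa'],
             ['fff', 'xxx', 'aaa', 'bbb', 'ddd'],
             ['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
             ['fff', 'xxx', 'aaa', 'ddd', 'eee'],
             ['iii', 'xxx', 'ggg', 'jjj', 'aaa']]

def groups_quadratic(rows):
    """Set-of-frozensets approach: compares every row against every other row."""
    sets = [set(row) for row in rows]
    groups = set()
    for i, row in enumerate(rows):
        head = set(row[:3])  # built once per row instead of once per comparison
        groups.add(frozenset([i, *(j for j, s in enumerate(sets) if head <= s)]))
    return sorted(sorted(g) for g in groups if len(g) > 1)

def groups_linear(rows):
    """Combinations approach: looks up every 3-item combination in a set of heads."""
    heads = {frozenset(row[:3]) for row in rows}
    groups = {}
    for i, row in enumerate(rows):
        for c in map(frozenset, combinations(row, 3)):
            if c in heads:
                groups.setdefault(c, []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

print(groups_quadratic(text_list))  # [[0, 2, 4], [3, 5]]
print(groups_linear(text_list))     # [[0, 2, 4], [3, 5]]
```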
I have two lists
parent_list = ['a', 'b', 'c']
list2 = [33.33, 38.33, 43.33, 48.33, 53.33, 58.33, 63.33, 68.33, 73.33, 78.33, 83.33, 88.33, 93.33, 98.33]
A sample of my expected output is given below:
[[('a', 33.333333333333336),('b', 38.333333333333336),('c', 43.333333333333336)],
[('a', 38.333333333333336),('b', 38.333333333333336),('c', 38.333333333333336)]]
It's a Cartesian product of both lists, but a combination is removed if the first element of any tuple is repeated within it, so that each resulting list contains every element of the parent list exactly once.
To do this, I am doing the following.
Cartesian product
from itertools import product
def cartesian_product(list1, list2):
    return list(product(list1, list2))
all_combi = cartesian_product(parent_list, list2)
Making combinations
from itertools import combinations
combi = combinations(all_combi, 3)
combi = list(set(combi))
Removing unwanted lists.
Final = []
for i in combi:
    # Track the first elements already seen in this combination
    seen = set()
    # Keep a tuple only if its first element has not appeared yet
    Output = [(a, b) for a, b in i if not (a in seen or seen.add(a))]
    Final.append(Output)
print(Final)
Removing lists with fewer than 3 elements:
new_list=[x for x in Final if len(x)>=3]
I am looking for a more optimized way of achieving the desired output.
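If the goal is exactly one value from list2 for every element of parent_list, with values allowed to repeat (as in the second sample), the combinations never need to be generated and filtered: itertools.product with repeat= produces them directly. A minimal sketch under that assumption, using the rounded list2 values shown above:

```python
from itertools import product

parent_list = ['a', 'b', 'c']
list2 = [33.33, 38.33, 43.33, 48.33, 53.33, 58.33, 63.33,
         68.33, 73.33, 78.33, 83.33, 88.33, 93.33, 98.33]

# Pick one value from list2 for each element of parent_list (repeats allowed),
# then pair the chosen values back up with parent_list.
result = [list(zip(parent_list, values))
          for values in product(list2, repeat=len(parent_list))]

print(len(result))  # 2744, i.e. 14 ** 3
print(result[0])    # [('a', 33.33), ('b', 33.33), ('c', 33.33)]
```

This skips building all C(42, 3) = 11480 combinations of the 42 (letter, value) pairs only to discard most of them.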
A Python list consists of a number of items that is divisible by 3.
The list looks like this:
the_list = ['ab','cd','e','fgh','i', 'jklm']
I want to merge the items 3 at a time across the entire list. How should I do that? (The list could have any number of items, as long as that number is divisible by 3.)
expected_output = ['abcde', 'fghijklm']
You can slice the list while iterating an index over the length of the list with a step of 3 in a list comprehension:
[''.join(the_list[i:i + 3]) for i in range(0, len(the_list), 3)]
You can also create an iterator from the list and use zip with itertools.repeat to group 3 items at a time:
from itertools import repeat
i = iter(the_list)
[''.join(t) for t in zip(*repeat(i, 3))]
Both of the above return:
['abcde', 'fghijklm']
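The two approaches only agree because the length is guaranteed to be divisible by 3. As a sketch of what happens otherwise, using a hypothetical 7-item list: slicing keeps the short last chunk, while zip silently drops it.

```python
from itertools import repeat

the_list = ['ab', 'cd', 'e', 'fgh', 'i', 'jklm', 'n']  # hypothetical: 7 items

# Slicing past the end is allowed, so the final chunk is just shorter.
sliced = [''.join(the_list[i:i + 3]) for i in range(0, len(the_list), 3)]

# zip stops as soon as the shared iterator runs out mid-chunk.
it = iter(the_list)
zipped = [''.join(t) for t in zip(*repeat(it, 3))]

print(sliced)  # ['abcde', 'fghijklm', 'n']
print(zipped)  # ['abcde', 'fghijklm']
```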
Here's one way using a list comprehension and range:
output = [''.join(the_list[i:i+3]) for i in range(0, len(the_list), 3)]
I have a list of lists of strings, and I would like to find the longest string in each inner list and store the results in another list. For example:
tableData = [['apples','bananas','oranges','cherries'],
['Alice','Bob','Carol','David'],
['dogs','cats','moose','goose']]
widths = [0] * len(tableData)
I want to store in widths[0] the width of the longest string in tableData[0], in widths[1] the width of the longest string in tableData[1], and so on. How can I compare the lengths of the words to find the longest?
The only way I could think of is this:
for f in range(len(tableData)):
    longest = 0
    for c in range(len(tableData[f])):
        # Compare each word's length with the longest seen so far in this list
        if len(tableData[f][c]) > longest:
            longest = len(tableData[f][c])
    widths[f] = longest
widths = [len(max(lst, key=len)) for lst in tableData]
For your list of lists called tableData, this gives:
[8, 5, 5]
Explanation:
[max(lst, key=len) for lst in tableData] gives you a list containing the longest string of each nested list. Applying len() to each of these then gives their lengths.
The same result can be achieved using:
widths = [len(sorted(lst, key=len)[-1]) for lst in tableData]
Here sorted(lst, key=len) sorts the elements of each list by length, from shortest to longest. Using [-1] then selects the last element of each sorted list, i.e. the longest string, and len() computes its length.
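Both versions above fail on an empty inner list: max raises ValueError and sorted(...)[-1] raises IndexError. If empty lists can occur, max accepts a default argument. A minimal sketch, assuming an empty list should have width 0 (the empty list is added here purely for illustration):

```python
tableData = [['apples', 'bananas', 'oranges', 'cherries'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose'],
             []]  # hypothetical empty inner list

# default='' is returned instead of raising when a list is empty,
# and len('') gives a width of 0.
widths = [len(max(lst, key=len, default='')) for lst in tableData]
print(widths)  # [8, 5, 5, 0]
```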
Don't worry about initializing widths. List comprehensions are your friends here.
# creates a list of lists of word lengths
lengths_of_words = [[len(word) for word in x] for x in tableData]
# finds the max length for each list
widths = [max(y) for y in lengths_of_words]
Could also be shortened to
widths = [max([len(word) for word in x]) for x in tableData]
This is the old, incorrect answer if people would like to learn from my mistake:
widths = [len(max(x)) for x in tableData]
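The old answer is wrong because max() without a key compares strings alphabetically rather than by length. On this particular data the mistake only shows up in the first list; a short sketch contrasting the two:

```python
tableData = [['apples', 'bananas', 'oranges', 'cherries'],
             ['Alice', 'Bob', 'Carol', 'David'],
             ['dogs', 'cats', 'moose', 'goose']]

# Without key=len, max() picks the alphabetically last string:
# 'oranges' (7 letters) rather than 'cherries' (8 letters).
wrong = [len(max(x)) for x in tableData]
right = [len(max(x, key=len)) for x in tableData]

print(wrong)  # [7, 5, 5]
print(right)  # [8, 5, 5]
```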
I need to split a list like:
m=[[1,2,3,4,5,6,7,8,9,0],[11,12,13,14,15,16,17,18,19,20],[21,22,23,24,25,26,27,28,29,30],[31,32,33,34,35,36,37,38,39,40],[41,42,43,44,45,46,47,48,49,50],[51,52,53,54,55,56,57,58,59,60],[61,62,63,64,65,66,67,68,69,70],[71,72,73,74,75,76,77,78,79,80],[81,82,83,84,85,86,87,88,89,90],[91,92,93,94,95,96,97,98,99,100]]
into smaller 5x5 lists like:
m1=[[1,2,3,4,5],[11,12,13,14,15],[21,22,23,24,25],[31,32,33,34,35],[41,42,43,44,45]]
m2=[[6,7,8,9,0],[16,17,18,19,20],[26,27,28,29,30],[36,37,38,39,40],[46,47,48,49,50]]
and have a new list that contains these smaller lists:
new_list=[m1,m2,m3,m4]
thanks
First, how can you split a list of 10 elements into two lists of 5 elements?
def split_list(m):
    return m[:len(m)//2], m[len(m)//2:]
Now, we want to map that over each list in m:
mm = [split_list(sublist) for sublist in m]
But now we have a list of pairs of lists, not a pair of lists of lists. How do you fix that? zip is the answer: it turns an X of Y of foo into a Y of X of foo:
new_list = list(zip(*mm))
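To see what zip(*mm) does to the shape, a tiny sketch on a hypothetical 2-row input, where each inner pair is the (left half, right half) of one row:

```python
# Hypothetical input: a list of (left, right) pairs, one per row.
mm = [([1, 2], [3, 4]),
      ([5, 6], [7, 8])]

# zip(*mm) regroups it as (all left halves, all right halves).
print(list(zip(*mm)))  # [([1, 2], [5, 6]), ([3, 4], [7, 8])]
```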
If you don't like the fact that this gives you a list of tuples of lists instead of a list of lists of lists, just use a list comprehension with the list function:
new_list = [list(m) for m in zip(*mm)]
If you want to change it to split any list into N/5 groups of 5, instead of 2 groups of N/2, that's just a matter of changing the first function. A general-purpose grouper function will do that, like the one from the itertools recipes (or see this question for other options, if you prefer):
from itertools import zip_longest
def grouper(iterable, n=5):
    # Collect items into chunks of n; a short final chunk is padded with None
    args = [iter(iterable)] * n
    return zip_longest(*args)
So:
new_list = [list(mm) for mm in zip(*(grouper(sublist, 5) for sublist in m))]
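The question asks for 5x5 blocks, which means chunking both the rows and the columns. A minimal sketch that does both directly with slicing, assuming m is rectangular and both of its dimensions are divisible by 5 (size is a name introduced here):

```python
# Rebuild the 10x10 matrix from the question.
m = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]] + [list(range(10 * r + 1, 10 * r + 11)) for r in range(1, 10)]

size = 5  # block edge length

# For every block origin (r, c), take `size` rows starting at r and
# `size` columns starting at c; blocks come out row-major: m1, m2, m3, m4.
new_list = [[row[c:c + size] for row in m[r:r + size]]
            for r in range(0, len(m), size)
            for c in range(0, len(m[0]), size)]

m1, m2, m3, m4 = new_list
print(m1)  # [[1, 2, 3, 4, 5], [11, 12, 13, 14, 15], [21, 22, 23, 24, 25], [31, 32, 33, 34, 35], [41, 42, 43, 44, 45]]
print(m2)  # [[6, 7, 8, 9, 0], [16, 17, 18, 19, 20], [26, 27, 28, 29, 30], [36, 37, 38, 39, 40], [46, 47, 48, 49, 50]]
```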