I have a list of lists of strings that looks roughly like this; each inner list always contains 5 elements:
text_list = [['aaa','bbb','ccc','ddd','eee'],
['fff','ggg','hhh','iii','jjj'],
['xxx','mmm','ccc','bbb','aaa'],
['fff','xxx','aaa','bbb','ddd'],
['aaa','bbb','ccc','ddd','eee'],
['fff','xxx','aaa','ddd','eee'],
['iii','xxx','ggg','jjj','aaa']]
The objective is simple: take the first 3 elements of each list, compare them against all of the elements of every other list, and group together the lists that contain all 3.
So from the above example the output might look like this (output is the index of the list):
[[0,2,4],[3,5]]
Notice that a group containing the same elements as another group, only in a different order, is removed.
I've written the following code to extract the groups, but it returns duplicates and I am unsure how to proceed. I also think this might not be the most efficient way to do the extraction, as the real list can contain upwards of millions of groups:
grouped_list = []
for i in range(0,len(text_list)):
    int_temp = []
    for m in range(0,len(text_list)):
        if i == m:
            continue
        bool_check = all( x in text_list[m] for x in text_list[i][0:3])
        if bool_check:
            if len(int_temp) == 0:
                int_temp.append(i)
                int_temp.append(m)
                continue
            int_temp.append(m)
    grouped_list.append(int_temp)
## remove index with no groups
grouped_list = [x for x in grouped_list if x != []]
Is there a better way to go about this? How do I remove the duplicate group afterwards? Thank you.
Edit:
To be clearer, I would like to retrieve the lists that are similar to each other, using only the first 3 elements of each list. For example, take the first 3 elements of list A and check whether lists B, C, D... contain all 3 of them. Repeat for every list, then remove any group that duplicates the elements of another group.
You can build a set of frozensets to keep track of indices of groups with the first 3 items being a subset of the rest of the members:
groups = set()
sets = list(map(set, text_list))
for i, lst in enumerate(text_list):
    groups.add(frozenset((i, *(j for j, s in enumerate(sets) if set(lst[:3]) <= s))))
print([sorted(group) for group in groups if len(group) > 1])
If the input list is long, it would be faster to create a set of frozensets of the first 3 items of all sub-lists and use the set to filter all combinations of 3 items from each sub-list, so that the time complexity is essentially linear to the input list rather than quadratic despite the overhead in generating combinations:
from itertools import combinations
sets = {frozenset(lst[:3]) for lst in text_list}
groups = {}
for i, lst in enumerate(text_list):
    for c in map(frozenset, combinations(lst, 3)):
        if c in sets:
            groups.setdefault(c, []).append(i)
print([sorted(group) for group in groups.values() if len(group) > 1])
I have a list of lists like this:
my_list_of_lists = [['sparrow','sparrow','sparrow','junco','jay','robin'],
['sparrow','sparrow','junco', 'sparrow','robin','robin'],
['sparrow','sparrow','sparrow','sparrow','jay','robin']]
I would like to do a position-by-position comparison between every pair of lists, like this:
#1 with 2
['sparrow','sparrow','sparrow','junco','jay','robin']
['sparrow','sparrow','junco', 'sparrow','robin','robin']
#1 with 3
['sparrow','sparrow','sparrow','junco','jay','robin']
['sparrow','sparrow','sparrow','sparrow','jay','robin']
#2 with 3
['sparrow','sparrow','junco', 'sparrow','robin','robin']
['sparrow','sparrow','sparrow','sparrow','jay','robin']
So the pairs for 1 with 2:
pairs = [('sparrow','sparrow'), ('sparrow','sparrow'), ('sparrow','junco'), ('junco','sparrow'), ('jay','robin'), ('robin','robin')]
I would like to get the counts and frequency of the pairs in each pairwise comparison:
sparrowsparrow_counts = 2
juncosparrow_counts = 2
jayrobin_counts = 1
robinrobin_counts = 1
frequency_of_combos = {('sparrow', 'sparrow'): .333, ('sparrow', 'junco'): .333, ('jay', 'robin'): .167, ('robin', 'robin'): .167}
I've tried zipping but I end up zipping all of the lists (not the pairs) into tuples and I'm stumped on the rest.
I think it's somewhat related to How to calculate counts and frequencies for pairs in list of lists? but I can't figure out how to apply this to my data.
Zip the two lists, then filter out the pairs that don't match, and use collections.Counter to count them:
from collections import Counter
a = ['sparrow','sparrow','sparrow','junco','jay','robin']
b = ['sparrow','sparrow','junco', 'sparrow','robin','robin']
c = Counter([ i for i in zip(a,b) if i[0] == i[1]])
print(c)
Counter({('sparrow', 'sparrow'): 2, ('robin', 'robin'): 1})
You seem to have the frequency part figured out, but that should clear up the use of zip and Counter.
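To cover every pairwise comparison and the frequencies in one go, one possible sketch combines itertools.combinations with Counter. It assumes that ('sparrow', 'junco') and ('junco', 'sparrow') should count as the same pair, which is what the expected counts in the question suggest; the helper name pair_frequencies is made up for illustration.

```python
from collections import Counter
from itertools import combinations

my_list_of_lists = [
    ['sparrow', 'sparrow', 'sparrow', 'junco', 'jay', 'robin'],
    ['sparrow', 'sparrow', 'junco', 'sparrow', 'robin', 'robin'],
    ['sparrow', 'sparrow', 'sparrow', 'sparrow', 'jay', 'robin'],
]

def pair_frequencies(a, b):
    """Count position-wise pairs of a and b, ignoring the order inside a pair."""
    # Assumption: sorting each pair makes ('sparrow', 'junco') and
    # ('junco', 'sparrow') land on the same key.
    counts = Counter(tuple(sorted(pair)) for pair in zip(a, b))
    total = sum(counts.values())
    freqs = {pair: n / total for pair, n in counts.items()}
    return counts, freqs

# Keys are (index of first list, index of second list).
results = {
    (i, j): pair_frequencies(a, b)
    for (i, a), (j, b) in combinations(enumerate(my_list_of_lists), 2)
}

for (i, j), (counts, freqs) in results.items():
    print(f"list {i + 1} with list {j + 1}:", dict(counts))
```

If the order inside a pair does matter, dropping the sorted call keeps ('sparrow', 'junco') and ('junco', 'sparrow') as separate keys.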
I want to separate different numbers in this list:
[([2437], [0.235]), ([4942], [0.217])]
and put them in new lists. Python reports the length of this list as 2. I need two lists like these: [2437, 4942] and [0.235, 0.217].
How can I get these lists from the list above in Python?
You can use the following method for Python 3:
>>> my_list = [([2437], [0.235]), ([4942], [0.217])]
>>> sub_list1, sub_list2 = [[*item1, *item2] for item1, item2 in zip(*my_list)]
>>> sub_list1
[2437, 4942]
>>> sub_list2
[0.235, 0.217]
If it's always two tuples of two single item lists, another way to unpack:
l1, l2 = [[inner_list[i][0] for inner_list in my_list] for i in range(2)]
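If the outer list can hold more than two tuples, or the inner lists more than one item, a sketch along the following lines might be more forgiving. It transposes with zip and flattens each column with itertools.chain.from_iterable; the names ids and scores are only illustrative.

```python
from itertools import chain

my_list = [([2437], [0.235]), ([4942], [0.217])]

# zip(*my_list) groups the first inner lists together and the second inner
# lists together; chain.from_iterable then flattens each group.
ids, scores = (list(chain.from_iterable(column)) for column in zip(*my_list))

print(ids)
print(scores)
```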
I have a function that takes in any 2-D array and returns a 2-D array (in the same format as the input array) with the values squared.
i.e [[1,2],[3,4]] -----> [[1,4],[9,16]]
my code so far:
m0 = [[1,2],[3,4]]
empty_list = []
for x in m0:
    for i in x:
        empty_list.append(i**2)
This gives me a 1-D array, but how would I return a 2-D array in the same shape as the input?
You can make a recursive function to handle any depth of nested lists:
def SquareList(L):
    if type(L) is list:
        return [SquareList(x) for x in L]
    else:
        return L**2
Example:
> print(SquareList([1,[3],[2,[3]],4]))
[1, [9], [4, [9]], 16]
Working with an outer list
The point is that you need an extra outer list to store the rows. So we build up a temporary list for each row and append it to the result:
m0 = [[1,2],[3,4]]
result = []
for sublist in m0:
    row = []
    for item in sublist:
        row.append(item**2)
    result.append(row)
Notice that we here iterate over the items of the sublist.
Using list comprehension
We can, however, write this more elegantly with a list comprehension:
result = [[x*x for x in sublist] for sublist in m0]
Note: if you have to square a number x, it is usually more efficient to use x * x than to write x ** 2.
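A rough way to check the x * x versus x ** 2 claim on your own machine is a small timeit comparison. This is only a sketch: the function names are made up, and the numbers depend on the interpreter version and the size of the values, so no particular result is implied.

```python
import timeit

m0 = [[1, 2], [3, 4]]

def square_mul(matrix):
    # Square each item by multiplying it with itself.
    return [[x * x for x in row] for row in matrix]

def square_pow(matrix):
    # Square each item with the power operator.
    return [[x ** 2 for x in row] for row in matrix]

# Both spellings must produce the same nested list before timing them.
assert square_mul(m0) == square_pow(m0)

t_mul = timeit.timeit(lambda: square_mul(m0), number=100_000)
t_pow = timeit.timeit(lambda: square_pow(m0), number=100_000)
print(f"x * x : {t_mul:.3f}s")
print(f"x ** 2: {t_pow:.3f}s")
```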
Using numpy (for rectangular lists)
In case the list is rectangular (all sublists have the same length), we can use numpy instead:
from numpy import array
a0 = array(m0)
result = a0 ** 2
You can just do this by a list comprehension:
empty_list = [[m0[i][j]**2 for j in range(len(m0[i]))] for i in range(len(m0))]
Or, closer to your code style:
empty_list = [row[:] for row in m0]  # copy each row so m0 itself is not modified
for i in range(len(m0)):
    for j in range(len(m0[i])):
        empty_list[i][j] = m0[i][j] ** 2
Your problem is that you never created a 2-D list; you just appended every value to the single 1-D list you created.
I need to split a list like:
m=[[1,2,3,4,5,6,7,8,9,0],
[11,12,13,14,15,16,17,18,19,20],
[21,22,23,24,25,26,27,28,29,30],
[31,32,33,34,35,36,37,38,39,40],
[41,42,43,44,45,46,47,48,49,50],
[51,52,53,54,55,56,57,58,59,60],
[61,62,63,64,65,66,67,68,69,70],
[71,72,73,74,75,76,77,78,79,80],
[81,82,83,84,85,86,87,88,89,90],
[91,92,93,94,95,96,97,98,99,100]]
into smaller 5x5 lists like:
m1=[[1,2,3,4,5],[11,12,13,14,15],[21,22,23,24,25],[31,32,33,34,35],[41,42,43,44,45]]
m2=[[6,7,8,9,0],[16,17,18,19,20],[26,27,28,29,30],[36,37,38,39,40],[46,47,48,49,50]]
and have a new list that contains these smaller lists:
new_list=[m1,m2,m3,m4]
thanks
First, how can you split a list of 10 elements into two lists of 5 elements?
def split_list(m):
    return m[:len(m)//2], m[len(m)//2:]
Now, we want to map that over each list in m:
mm = [split_list(sublist) for sublist in m]
But now we have a list of pairs of lists, not a pair of lists of lists. How do you fix that? zip is the answer: it turns an X of Y of foo into a Y of X of foo:
new_list = list(zip(*mm))
If you don't like the fact that this gives you a list of tuples of lists instead of a list of lists of lists, just use a list comprehension with the list function:
new_list = [list(m) for m in zip(*mm)]
If you want to change it to split any list into N/5 groups of 5, instead of 2 groups of N/2, that's just a matter of changing the first function. A general-purpose grouper function will do that, like the one from the itertools recipes (or see this question for other options, if you prefer):
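import itertools
def grouper(iterable, n=5):
    # n references to the same iterator, so each output tuple takes n consecutive items
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args)  # izip_longest in Python 2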
So:
new_list = [list(mm) for mm in zip(*(grouper(sublist, 5) for sublist in m))]
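Note that splitting only the columns yields two 10x5 halves, whereas the question asks for four 5x5 blocks. A minimal sketch that slices both rows and columns could look like the following. The function name split_blocks is hypothetical, and the input here is a generated 10x10 stand-in with the values 1 to 100 (the question's first row ends in 0 rather than 10).

```python
def split_blocks(matrix, size=5):
    """Split a rectangular list of lists into size x size blocks.

    Blocks are returned left to right, then top to bottom.
    """
    return [
        [row[c:c + size] for row in matrix[r:r + size]]
        for r in range(0, len(matrix), size)
        for c in range(0, len(matrix[0]), size)
    ]

# Stand-in for the question's list: 10 rows holding the values 1..100.
m = [[10 * r + c + 1 for c in range(10)] for r in range(10)]

new_list = split_blocks(m)
m1, m2, m3, m4 = new_list
print(m1)
print(m2)
```

Slicing keeps the inner rows as lists, so no tuple-to-list conversion is needed afterwards.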