Form the sequence of a union of two sets efficiently - python

Dear Reader of this post,
do you have an idea how to form the sequence of a union of two lists efficiently in Python? Suppose there are the two following lists:
list_a = [['a','b'],['a','c'],['b','c']]
list_b = ['h','i']
and I want to calculate:
new_list = [['a','b','h'],['a','b','i'],['a','c','h'],['a','c','i'],['b','c','h'],
['b','c','i']].
Up to now, I am using the following loops:
new_list = [r+[j] for j in list_b for r in list_a]
However, I find this ugly. I do not like the two loops. I prefer using functions instead.
Do you have another idea to achieve the desired task? (Among other things, I tried to follow the suggestion in "Get the cartesian product of a series of lists in Python", but to no avail.)
I am grateful for any suggestions!
Best regards,
Fabian

You could use itertools.product:
import itertools
[a+[b] for a, b in itertools.product(list_a, list_b)]
However, there's nothing really wrong with the way you did it. Using two loops in a list comprehension is fine if you really need to get every combination of elements from both iterables. The product solution, though, does have the advantage that it is more easily extended to more iterables (e.g., it's easy to get every combination of three or four or ten lists, whereas deeply nested loops can become cumbersome).
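To illustrate the point about extending to more iterables, here is a minimal sketch. The third list, list_c, is hypothetical and not part of the question; it only shows that adding another input to product does not require another nested loop.

```python
import itertools

list_a = [['a', 'b'], ['a', 'c'], ['b', 'c']]
list_b = ['h', 'i']
list_c = ['x', 'y']  # hypothetical third list, not in the question

# Each product item is a tuple (pair, b, c); build a new list per combination.
extended = [pair + [b, c]
            for pair, b, c in itertools.product(list_a, list_b, list_c)]

print(len(extended))  # 12
print(extended[0])    # ['a', 'b', 'h', 'x']
```

The order matches nested loops with list_a outermost and list_c innermost.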

How does it look?
>>> new_list = []
>>> for item in itertools.product(list_a, list_b):
...     new_list.append([x for li in list(item) for x in li])
...
>>> new_list
[['a', 'b', 'h'], ['a', 'b', 'i'], ['a', 'c', 'h'], ['a', 'c', 'i'], ['b', 'c', 'h'], ['b', 'c', 'i']]
>>>
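One caveat with the flattening approach above: it iterates over every element of each pair, including the string itself, so it only gives the expected result for single-character strings. A small sketch, assuming hypothetical multi-character values in list_b:

```python
import itertools

list_a = [['a', 'b'], ['a', 'c']]
list_b = ['hi', 'jk']  # hypothetical multi-character values

# Flattening every element of the pair also iterates over the string itself.
flattened = [[x for part in item for x in part]
             for item in itertools.product(list_a, list_b)]

# Concatenating the pair with a one-element list keeps each string whole.
concatenated = [pair + [s] for pair, s in itertools.product(list_a, list_b)]

print(flattened[0])     # ['a', 'b', 'h', 'i']
print(concatenated[0])  # ['a', 'b', 'hi']
```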

Related

How to efficiently get common items from two lists that may have duplicates?

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
for e in my_list:
    if e in my_list_2:
        c.append(e)
...
If my_list is long, this would be very inefficient. If I convert both lists into sets and then use set's intersection() method to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?
A dict is already a hash map, so lookups are practically as efficient as in a set, and you may not need any extra work to collect the values. If it weren't, you could pack the values into a set and check that before checking the dict.
However, a larger improvement may be to write a generator for the values instead of building a new intermediate list, and to iterate over it where you actually need the values:
def foo(src_dict, check_list):
    for value in check_list:
        if value in src_dict:
            yield value
With the edit, you may find you're better off packing all the inputs into a set
def foo(src_list, check_list):
    hashmap = set(src_list)
    for value in check_list:
        if value in hashmap:
            yield value
If you know a lot about the inputs you can do better, but that is an unusual case. For example, if the lists are sorted you could bisect; and if the list to check against is huge, or there are very few values to check, it may matter in which order you do things and whether building a set is worth it at all.
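For the lists in the question, the set-based generator could be used roughly as follows. This is a sketch: the name common_items is made up for illustration, and it assumes the items are hashable. Note the argument order: the list whose duplicates should survive is the one being iterated.

```python
def common_items(src_list, check_list):
    """Yield items of check_list that also occur in src_list, keeping duplicates."""
    lookup = set(src_list)  # built once; membership tests are O(1) on average
    for value in check_list:
        if value in lookup:
            yield value


my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

# Iterate over my_list (duplicates kept), look up in a set built from my_list_2.
result = list(common_items(my_list_2, my_list))
print(result)  # ['a', 'b', 'a']
```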
I am not sure about time efficiency, but, personally speaking, list comprehension would always be more of interest to me:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']
First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)
The above may be slower than just
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = []
for e in my_list:
    if e in my_list_2:
        c.append(e)
print(c)
But when the lists are significantly larger, the code block utilizing the set.intersection() method will be significantly more efficient (faster).
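Two things are worth checking with the set.intersection() version: my_list.count(x) rescans the whole list once per shared value, and iterating over a set groups the duplicates together in no guaranteed order instead of keeping their original positions. As a hedged variation (not the code from the answer above), collections.Counter counts everything in a single pass, and sorting the shared values makes the output deterministic:

```python
from collections import Counter

my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

counts = Counter(my_list)               # one pass over my_list
common = set(my_list) & set(my_list_2)  # distinct shared values

# Duplicates are grouped by value; sorted() is used only to fix the order.
grouped = [x for x in sorted(common) for _ in range(counts[x])]
print(grouped)  # ['a', 'a', 'b']
```

This yields the same items as ['a', 'b', 'a'] but not in the original order, so it only fits if order does not matter.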
Sorry for not reading the post carefully; it can no longer be deleted. However, here is an attempt at a solution.
c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) -", c(my_list, my_list_2))
Output:
(list_1,list_2,duplicate_items) - (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])
or can be
[i for i in my_list if i in my_list_2]
output:
['a', 'b', 'a']

How to make an n-dimensional list from one-dimensional lists with a loop

I'm learning Python and I have been trying to build a list of lists automatically. For example:
I have the following list, which I need to split into separate letters and then recombine into a list of two lists of letters:
lists=[['a,b,c,h'],['d,e,f,g']]
print('list length', len(lists))
splited=[]
splited1=lists[0][0].split(',')
print(splited1) #['a', 'b', 'c', 'h']
splited2=lists[1][0].split(',')
print(splited2) #['d', 'e', 'f', 'g']
listcomb=[splited1,splited2]
print(listcomb) #[['a', 'b', 'c', 'h'], ['d', 'e', 'f', 'g']]
This is what I want: a list that contains 2 lists. But in case I need more lists inside that list, I want to automate it with a for loop.
My attempt, which didn't work:
listcomb2=zip(splited1,splited2)
print(listcomb2)
sepcomb = list()
print(type(sepcomb))
for x in range(len(lists)):
    sep=lists[x][0].split(',')
    sepcomb[x]=[sep]
print(sepcomb)
I'm having problems with splitting the letters and then joining them in a new list. Pls help
Your code needs a small tweak. The length of sepcomb is 0, so sepcomb[x]=[sep] assigns to index x, which does not exist yet, and raises an IndexError. Use the append method to avoid this problem.
Change:
for x in range(len(lists)):
    sep=lists[x][0].split(',')
    sepcomb[x]=[sep]
to
for x in range(len(lists)):
    sep=lists[x][0].split(',')
    sepcomb.append(sep)
Method-2
sepcomb = list(i[0].split(',') for i in lists)
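To make the difference concrete, here is a small self-contained sketch using the data from the question. It first reproduces the failing index assignment on an empty list and then builds the result with append:

```python
lists = [['a,b,c,h'], ['d,e,f,g']]

sepcomb = []
try:
    sepcomb[0] = lists[0][0].split(',')  # index 0 does not exist yet
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # list assignment index out of range

# append grows the list by one element per iteration.
for inner in lists:
    sepcomb.append(inner[0].split(','))

print(sepcomb)  # [['a', 'b', 'c', 'h'], ['d', 'e', 'f', 'g']]
```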
You can simply do the following:
final=[splited1]+[splited2]
Or a better way directly from the lists variable would be:
final=[value[0].split(',') for value in lists]
Here you go:
lists = [['a,b,c,h'],['d,e,f,g']]
listcomb = []
for each in lists:
    splited = each[0].split(',')
    listcomb.append(splited)
print(listcomb)

How to merge two unequal nested list in python?

list1 = [['apple','b','c'] ,['dolly','e','f']]
list2 =[['awsme','b','c'] ,['dad','e','f'],['tally','e','f']]
list_combine = [item for sublst in zip(list1, list2) for item in sublst]
print(list_combine)
Expected Output:
list_combine = [['apple','b','c'] ,['dolly','e','f'],['awsme','b','c'] ,['dad','e','f'],['tally','e','f']]
How can I merge 2 nested lists of unequal length into a single nested list in Python?
You can just use the '+' operator to join 2 lists.
list_combine = list1 + list2
print(list_combine)
Output
list_combine = [['apple','b','c'] ,['dolly','e','f'],['awsme','b','c'] ,['dad','e','f'],['tally','e','f']]
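As for why the zip attempt in the question falls short: zip stops at the shorter list and interleaves the items, so the last sublist of list2 is dropped. A short sketch comparing it with concatenation (itertools.chain is shown as an equivalent alternative):

```python
from itertools import chain

list1 = [['apple', 'b', 'c'], ['dolly', 'e', 'f']]
list2 = [['awsme', 'b', 'c'], ['dad', 'e', 'f'], ['tally', 'e', 'f']]

# zip stops at the shorter list and interleaves the pairs.
zipped = [item for pair in zip(list1, list2) for item in pair]
print(len(zipped))  # 4, the 'tally' sublist is missing

# Concatenation keeps every sublist, with list1 first.
combined = list1 + list2
chained = list(chain(list1, list2))
print(len(combined))  # 5
```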
You can simply concatenate the two lists into a new variable, such as list3 or whatever you want to call it.
Also, following PEP 8, I reformatted your code to make it more readable: a space before a comma is discouraged, while a space after it is recommended. The same convention applies to written English.
Try the following, and let me know if you have any questions about my answer:
list1 = [['apple', 'b', 'c'], ['dolly', 'e', 'f']]
list2 = [['awsme', 'b', 'c'], ['dad', 'e', 'f'], ['tally', 'e', 'f']]
list3 = list1 + list2
print(list3)
I hope it would be useful.

Getting specific indexed distinct values in nested lists

I have a nested list of around 1 million records like:
l = [['a', 'b', 'c', ...], ['d', 'b', 'e', ...], ['f', 'z', 'g', ...],...]
I want to get the distinct values at the second index of the inner lists, so that my resulting list looks like:
resultant = ['b', 'z', ...]
I have tried nested loops, but they are not fast. Any help will be appreciated!
Since you want the unique items, you can use collections.OrderedDict.fromkeys() to keep both the order and the uniqueness (the keys are stored in a hash table), and use zip() to get the second items. In Python 2:
from collections import OrderedDict
list(OrderedDict.fromkeys(zip(*my_lists)[1]))
In Python 3.x, since zip() returns an iterator, you can do this:
colls = zip(*my_lists)
next(colls)  # skip the first column
list(OrderedDict.fromkeys(next(colls)))
Or use a generator expression within OrderedDict.fromkeys():
list(OrderedDict.fromkeys(i[1] for i in my_lists))
Demo:
>>> lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
>>>
>>> list(OrderedDict().fromkeys(sub[1] for sub in lst))
['b', 'z']
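As a side note for newer interpreters: plain dicts preserve insertion order in Python 3.7 and later, so the same idea can be sketched without OrderedDict. The second part shows what transposing with zip(*lst) returns:

```python
lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]

# dict.fromkeys keeps the first occurrence of each key, in order (Python 3.7+).
distinct = list(dict.fromkeys(sub[1] for sub in lst))
print(distinct)  # ['b', 'z']

# zip(*lst) transposes rows into columns; index 1 is the second column.
second_column = list(zip(*lst))[1]
print(second_column)  # ('b', 'b', 'z')
```

The generator expression avoids building all columns just to read one, which may matter with around a million rows.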
You can unzip the list of lists, then choose the second tuple and pass it to set, as below (this works in Python 2, where zip() returns a list):
This code takes 4.05311584473e-06 milliseconds on my laptop
list(set(zip(*lst)[1]))
Input:
lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
Output:
['b', 'z']
Would that work for you?
result = set([inner_list[1] for inner_list in l])
I can think of two options.
Set comprehension:
res = {x[1] for x in l}
I think numpy arrays work faster than list/set comprehensions, so converting this list to an array and then using array functions can be faster. Here:
import numpy as np
res = np.unique(np.array(l)[:, 1])
Let me explain: np.array(l) converts the list to a 2D array, then [:, 1] takes the second column (index 1, counting from 0), which consists of the second item of each sublist in the original l, and finally np.unique keeps only the unique values.

Merging a list of strings and a list of lists

This may be a duplicate, but I couldn't find a specific answer.
I also found one answer while composing this question, but I would like to know if there is a better option, or one that works without knowing which item is the list of strings.
My question:
la=['a', 'b', 'c']
lb=[['d','e'], ['f','g'], ['i','j']]
I would like:
[['a','d','e'], ['b','f','g'], ['c','i','j']]
I discovered the following works specifically for my example;
la=['a', 'b', 'c']
lb=[['d','e'], ['f','g'], ['i','j']]
[ [x] + y for x,y in zip(la, lb)]
[['a', 'd', 'e'], ['b', 'f', 'g'], ['c', 'i', 'j']]
It works because I make the string list into a list before concatenating and avoids the TypeError: cannot concatenate 'str' and 'list' objects
Is there a more elegant solution?
You can use numpy.column_stack:
>>> la=['a', 'b', 'c']
>>> lb=[['d','e'], ['f','g'], ['i','j']]
>>> import numpy as np
>>> np.column_stack((la,lb))
array([['a', 'd', 'e'],
       ['b', 'f', 'g'],
       ['c', 'i', 'j']],
      dtype='|S1')
If you want an expression, I can't think of anything better than using zip as above. If you want to explicitly insert the elements of la at the heads of the elements of lb, I'd do
for i in range( len(la) ):
    lb[i].insert(0, la[i])
which avoids having to know what zip is or does. Maybe also check first:
if len(la) != len(lb) : raise IndexError("List lengths differ")
Without that, it'll "work" when lb is longer than la. BTW, this isn't exactly the same with respect to corner cases and duck typing, and it modifies lb in place. It seems safer to use insert, a method that should exist only on a list-like object, than "+".
Also, purely personally, I'd write the above on one line
for i in range( len(la) ): lb[i].insert(0, la[i])
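If modifying lb in place is not wanted, the zip expression and the length check can be combined. The following is a minimal sketch; the helper name prepend_each is made up for illustration, and it assumes la holds the single items and lb holds the sequences to extend:

```python
def prepend_each(heads, tails):
    """Return new lists with heads[i] placed in front of tails[i]."""
    if len(heads) != len(tails):
        raise ValueError("List lengths differ")
    # list(tail) copies each inner sequence, so the inputs stay untouched.
    return [[head] + list(tail) for head, tail in zip(heads, tails)]


la = ['a', 'b', 'c']
lb = [['d', 'e'], ['f', 'g'], ['i', 'j']]

merged = prepend_each(la, lb)
print(merged)  # [['a', 'd', 'e'], ['b', 'f', 'g'], ['c', 'i', 'j']]
print(lb)      # [['d', 'e'], ['f', 'g'], ['i', 'j']]
```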
