How to find combinations of a list in a Dictionary? - python

I am trying to find any permutation of a list among the tuple keys of a dictionary.
For example, what would be the best way to find any permutation of [1,2,3] in a dictionary which is formatted like this: {(1,3,2):'text',(3,1,2):'text'}?
The only matches that would qualify for [1,2,3] would be (1,2,3),(1,3,2),(2,1,3),(2,3,1),(3,2,1),(3,1,2).
Matches that wouldn't qualify include tuples that don't contain all of the items (for example: (1,2) or (2,)), and tuples that contain extra items (for example: (1,2,3,4) or (2,3,7,1)).

Use itertools.permutations() to generate all permutations of the list and test each one against the dictionary:
from itertools import permutations
if any(tuple(perm) in yourdictionary for perm in permutations(yourlist)):
    pass  # match found
However, this tests up to n! candidate keys, so you really want to rethink your data structure. If you made your keys frozenset() objects instead, you would simply test for:
if frozenset(yourlist) in yourdictionary:
    pass  # match found
which would be a lot faster.
Demos:
>>> from itertools import permutations
>>> yourdictionary = {(1,3,2):'text',(3,1,2):'text'}
>>> yourlist = [1, 2, 3]
>>> any(tuple(perm) in yourdictionary for perm in permutations(yourlist))
True
>>> yourdictionary = {frozenset([1, 2, 3]): 'text', frozenset([4, 5, 6]): 'othertext'}
>>> frozenset(yourlist) in yourdictionary
True
>>> frozenset([2, 3]) in yourdictionary
False

Related

python efficient way to compare item in list of tuples

Is there an efficient way without for loops to compare if an item inside a list of tuples is the same across all tuples in Python?
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
The expected output is: return all unique values for the item at index 1 of each tuple.
unique = list()
for i in lst_tups:
    item = i[1]
    unique.append(item)
set(unique)
Expected output:
Unique values: [1]
True if all are equal, otherwise False
I think the set comprehension is an acceptable way:
>>> unique = {i[1] for i in lst_tups}
>>> unique
{1}
If you want to avoid the for loop anyway, you can use operator.itemgetter and map (for large lists, it will be slightly more efficient than set comprehension, but the readability is worse):
>>> from operator import itemgetter
>>> unique = set(map(itemgetter(1), lst_tups))
>>> unique
{1}
Then you can confirm whether the elements are all the same by judging whether the length of the set is 1:
>>> len(unique) == 1
True
If you only want the True/False result, or the items you want to compare are unhashable (such as dicts), you can use itertools.pairwise (Python 3.10+) to compare adjacent elements instead (though that doesn't mean it will be faster):
>>> from itertools import pairwise, starmap
>>> from operator import itemgetter, eq
>>> all(i[1] == j[1] for i, j in pairwise(lst_tups))
True
>>> all(starmap(eq, pairwise(map(itemgetter(1), lst_tups))))
True
Following the questions raised in the comments: when the item to compare is at another position in the tuple, or is the element of the sequence itself, the methods above need only a slight modification. Here are two more general solutions (they rely on the imports shown above):
def all_equal_by_set(iterable):
    return len(set(iterable)) == 1
def all_equal_by_compare(iterable):
    return all(starmap(eq, pairwise(iterable)))
Then you just need to call them like this:
>>> all_equal_by_set(map(itemgetter(1), lst_tups))
True
>>> all_equal_by_set(tup[1] for tup in lst_tups)  # Note: this is a generator expression, not a set comprehension.
True
>>> all_equal_by_compare(map(itemgetter(1), lst_tups))
True
>>> all_equal_by_compare(tup[1] for tup in lst_tups)
True
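For Python versions older than 3.10, where itertools.pairwise is not available, a similar check can be sketched by comparing every element with the first one. This is only an illustration: the name all_equal_to_first is hypothetical, and treating an empty input as "all equal" is an assumption (note that all_equal_by_set above returns False for an empty input).

```python
from operator import itemgetter

def all_equal_to_first(iterable):
    """Return True if every element equals the first one (True for an empty iterable)."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True  # assumption: an empty input counts as "all equal"
    # Only == is used, so unhashable items such as dicts work too
    return all(item == first for item in iterator)

lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
print(all_equal_to_first(map(itemgetter(1), lst_tups)))    # True
print(all_equal_to_first([{'a': 1}, {'a': 1}, {'a': 2}]))  # False
```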
Solution without using for loop.
import operator
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]
unique = set(map(operator.itemgetter(1),lst_tups))
print(unique) # {1}
Please consider the above code and comment on whether it is efficient enough by your standards.
You can use chain.from_iterable and slicing with a step of three: [1::3]. This assumes every tuple has exactly three items.
from itertools import chain
res = list(chain.from_iterable(lst_tups))[1::3]
print(set(res))
# Print True if all are equal, otherwise False
if len(set(res)) == 1:
    print('True')
else:
    print('False')
Output:
{1}
True
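Because the [1::3] slice depends on every tuple having exactly three items, a different technique that avoids hard-coding the tuple length is to transpose the list with zip(*lst_tups) and pick the column you need. This is a swapped-in alternative rather than the approach above, and it assumes the tuples all have the same length (zip stops at the shortest one).

```python
lst_tups = [('Hello', 1, 'Name:'), ('Goodbye', 1, 'Surname:'), ('See you!', 1, 'Time:')]

# zip(*lst_tups) groups the first items together, then the second items, and so on
columns = list(zip(*lst_tups))
second_items = columns[1]
unique = set(second_items)

print(second_items)      # (1, 1, 1)
print(unique)            # {1}
print(len(unique) == 1)  # True
```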

Add element to set in the form of list comprehension in Python

I have a list of sets. I want to add an element to each of these sets, and I want to do this with list comprehension. This is what I have tried:
In [1]: sets1 = [set()]
In [2]: sets2 = [{1,2}, {1,2,3}]
In [3]: [e.add(0) for e in sets1]
Out[3]: [None]
In [4]: [e.add(0) for e in sets2]
Out[4]: [None, None]
My desired output is:
[{0}]
[{1,2,0}, {1,2,3,0}]
Why does the above code return None instead of adding the elements to the sets, and how can I make this work?
I would suggest:
[e | {0} for e in sets1]
or:
[e.union({0}) for e in sets1]
I wouldn't use a list comprehension in this case, a plain for loop would be simpler:
for subset in sets1:
    subset.add(0)
print(sets1)
should give you the desired output.
As I already pointed out in the comments, your approach only seemingly did not work:
set.add works in place and returns None (hence the Nones in your output). If you want your desired output, run the list comprehension but don't save its result, then check sets1 and sets2 afterwards.
They should be [{0}] and [{1,2,0}, {1,2,3,0}] (the display order may vary because sets are unordered).
In other words, sets1 and sets2 already hold the results you want, because add modifies the existing sets rather than generating a new list.
You can print(sets1) and print(sets2) to verify.
Let's first reproduce your problem.
>>> test_set = set()
>>> test_set
set()
>>> print(test_set.add(0))
None
>>> test_set
{0}
As you can see, test_set.add(0) returns None. But it is an in-place operation, so the item did get added, as the snippet above shows.
How to solve the problem:
You can union after making the element a set rather than using the add method.
>>> [i.union({0}) for i in sets2]
[{0, 1, 2}, {0, 1, 2, 3}]
If you have a list/set of elements to add to the existing list of sets, you can do the following:
>>> elements_to_add = [3,4,5]
>>> [i.union(set(elements_to_add)) for i in sets2]
[{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}]
However, this is not an in-place operation. sets2 would be exactly the same before and after running the above list comprehension.
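To make the difference concrete, here is a small sketch that contrasts the non-mutating form (a comprehension with |) against the mutating form (a plain loop with update). The variable name with_zero is only illustrative.

```python
sets2 = [{1, 2}, {1, 2, 3}]

# Non-mutating: builds a new list of new sets; sets2 is left untouched
with_zero = [s | {0} for s in sets2]
print(with_zero == [{0, 1, 2}, {0, 1, 2, 3}])  # True
print(sets2 == [{1, 2}, {1, 2, 3}])            # True

# Mutating: update() changes each existing set in place and returns None,
# so a plain for loop reads better than a comprehension here
for s in sets2:
    s.update([3, 4, 5])
print(sets2 == [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}])  # True
```

Which form fits depends on whether other code still holds references to the original sets and expects to see the change.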

Indexing a list with an unique index

I have a list say l = [10,10,20,15,10,20]. I want to assign each unique value a certain "index" to get [1,1,2,3,1,2].
This is my code:
a = list(set(l))
res = [a.index(x) for x in l]
Which turns out to be very slow.
l has 1M elements, and 100K unique elements. I have also tried map with lambda and sorting, which did not help. What is the ideal way to do this?
You can do this in O(N) time using a defaultdict and a list comprehension:
>>> from itertools import count
>>> from collections import defaultdict
>>> lst = [10, 10, 20, 15, 10, 20]
>>> d = defaultdict(count(1).next)
>>> [d[k] for k in lst]
[1, 1, 2, 3, 1, 2]
In Python 3, use count(1).__next__ instead of count(1).next.
In case you're wondering how it works:
The default_factory (count(1).next in this case) passed to defaultdict is called only when a missing key is looked up. For the first 10 the key is missing, so the factory is called and the value becomes 1. For the next 10 the key is no longer missing, so the previously stored 1 is reused. 20 is again a missing key, so Python calls default_factory again to get its value, and so on.
d at the end will look like this:
>>> d
defaultdict(<method-wrapper 'next' of itertools.count object at 0x1057c83b0>,
{10: 1, 20: 2, 15: 3})
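For reference, a Python 3 spelling of the same idea might look like the sketch below. It wraps the counter in a lambda instead of passing the bound __next__ method, which is purely a readability choice; the variable name counter is an assumption.

```python
from collections import defaultdict
from itertools import count

lst = [10, 10, 20, 15, 10, 20]

counter = count(1)
# The factory runs only when a key is missing, handing out 1, 2, 3, ...
d = defaultdict(lambda: next(counter))
result = [d[k] for k in lst]

print(result)   # [1, 1, 2, 3, 1, 2]
print(dict(d))  # {10: 1, 20: 2, 15: 3}
```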
The slowness of your code arises because a.index(x) performs a linear search and you perform that linear search for each of the elements in l. So for each of the 1M items you perform (up to) 100K comparisons.
The fastest way to transform one value to another is looking it up in a map. You'll need to create the map and fill in the relationship between the original values and the values you want. Then retrieve the value from the map when you encounter another of the same value in your list.
Here is an example that makes a single pass through l. There may be room for further optimization to eliminate the need to repeatedly reallocate res when appending to it.
res = []
conversion = {}
i = 0  # start at 1 instead to get the 1-based indexes shown in the question
for x in l:
    if x not in conversion:
        value = conversion[x] = i
        i += 1
    else:
        value = conversion[x]
    res.append(value)
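The same single pass can be written more compactly with dict.setdefault, using the current size of the dictionary as the next index. This is a sketch rather than a tuned implementation; the function name index_by_first_appearance and the start parameter are made up for illustration.

```python
def index_by_first_appearance(values, start=1):
    """Give each distinct value an index in order of first appearance."""
    conversion = {}
    result = []
    for x in values:
        # setdefault stores the new index only when x has not been seen before;
        # otherwise it returns the index recorded earlier
        result.append(conversion.setdefault(x, len(conversion) + start))
    return result

print(index_by_first_appearance([10, 10, 20, 15, 10, 20]))           # [1, 1, 2, 3, 1, 2]
print(index_by_first_appearance([10, 10, 20, 15, 10, 20], start=0))  # [0, 0, 1, 2, 0, 1]
```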
Well I guess it depends on if you want it to return the indexes in that specific order or not. If you want the example to return:
[1,1,2,3,1,2]
then you can look at the other answers submitted. However if you only care about getting a unique index for each unique number then I have a fast solution for you
import numpy as np
l = [10,10,20,15,10,20]
a = np.array(l)
x,y = np.unique(a,return_inverse = True)
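and for this example the output of y is:
y = [0,0,2,1,0,2]
Here each value is the zero-based position of the element in the sorted unique values x = [10, 15, 20].
I tested this for 1,000,000 entries and it was done essentially immediately.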
Your solution is slow because its complexity is O(nm) with m being the number of unique elements in l: a.index() is O(m) and you call it for every element in l.
To make it O(n), get rid of index() and store indexes in a dictionary:
>>> idx, indexes = 1, {}
>>> for x in l:
...     if x not in indexes:
...         indexes[x] = idx
...         idx += 1
...
>>> [indexes[x] for x in l]
[1, 1, 2, 3, 1, 2]
If l contains only integers in a known range, you could also store indexes in a list instead of a dictionary for faster lookups.
You can use collections.OrderedDict.fromkeys() to collect the unique items in order of first appearance, enumerate them starting at 1 to build a dict that maps each item to its index, and then pass that dict to operator.itemgetter(*lst) to look up the index of every item in the original list:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> itemgetter(*lst)({j:i for i,j in enumerate(OrderedDict.fromkeys(lst),1)})
(1, 1, 2, 3, 1, 2)
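On Python 3.7 and later, plain dicts preserve insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. The sketch below also swaps itemgetter(*lst) for a list comprehension, which avoids unpacking a very long list into function arguments; the name positions is an assumption.

```python
lst = [10, 10, 20, 15, 10, 20]

# dict.fromkeys keeps the first occurrence of each value, in order (Python 3.7+)
positions = {value: i for i, value in enumerate(dict.fromkeys(lst), 1)}
result = [positions[value] for value in lst]

print(positions)  # {10: 1, 20: 2, 15: 3}
print(result)     # [1, 1, 2, 3, 1, 2]
```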
For completeness, you can also do it eagerly:
from itertools import count
wordid = dict(zip(set(list_), count(1)))
This uses a set to obtain the unique words in list_, pairs
each of those unique words with the next value from count() (which
counts upwards), and constructs a dictionary from the results.
Note that a set is unordered, so the ids will not necessarily follow the order of first appearance.
Original answer, written by nneonneo.

How to find the number of instances of an item in a list of lists

I want part of a script I am writing to do something like this.
x=0
y=0
list=[["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
row=list[y]
item=row[x]
print list.count(item)
The problem is that this will print 0 because it isn't searching the individual lists. How can I make it return the total number of instances instead?
Search per sublist, adding up results per contained list with sum():
sum(sub.count(item) for sub in lst)
Demo:
>>> lst = [["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
>>> item = 'cat'
>>> sum(sub.count(item) for sub in lst)
3
sum() is a built-in function that adds up the items of an iterable.
The (x.count(item) for x in list) part is a "generator expression" (similar to a list comprehension): a handy way to produce values one at a time without building an intermediate list.
item_count = sum(x.count(item) for x in list)
That should do it
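Wrapped in a function, and with the outer list renamed so it no longer shadows the built-in list, the same idea could look like this sketch. The names count_in_rows and rows are made up for illustration.

```python
def count_in_rows(rows, item):
    """Count how many times item appears across all the inner lists."""
    return sum(row.count(item) for row in rows)

rows = [["cat", "dog", "mouse", 1], ["cat", "dog", "mouse", 2], ["cat", "dog", "mouse", 3]]
print(count_in_rows(rows, "cat"))   # 3
print(count_in_rows(rows, 2))       # 1
print(count_in_rows(rows, "bird"))  # 0
```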
Using collections.Counter and itertools.chain.from_iterable:
>>> from collections import Counter
>>> from itertools import chain
>>> lst = [["cat","dog","mouse",1],["cat","dog","mouse",2],["cat","dog","mouse",3]]
>>> count = Counter(item for item in chain.from_iterable(lst) if not isinstance(item, int))
>>> count
Counter({'mouse': 3, 'dog': 3, 'cat': 3})
>>> count['cat']
3
I filtered out the ints because I didn't see why you had them in the first place.

python list sort

I have got a list i.e.
ls= [u'Cancer',u"Men's",u'Orthopedics',u'Pediatric',u"Senior's",u"Women's"]
ls.sort() does not seem to work here due to presence of single quote in the list elements.
I need to sort this list. Any idea???
Actually, the question is valid, and the answer is not exactly correct in the general case.
If the test data were not already sorted, it would not be alphabetized as expected: the apostrophes would push some entries into the wrong position. Stripping them in a key function fixes that:
>>> l = ["'''b", "a", "a'ab", "aaa"]
>>> l.sort()
>>> l
["'''b", 'a', "a'ab", 'aaa']
>>> keyfunc = lambda s: s.replace("'", "")  # compare strings with apostrophes removed
>>> l.sort(key=keyfunc)
>>> l
['a', 'aaa', "a'ab", "'''b"]
>>> ls
[u'Cancer', u"Men's", u'Orthopedics', u'Pediatric', u"Senior's", u"Women's"]
>>> ls.sort()
>>> ls
[u'Cancer', u"Men's", u'Orthopedics', u'Pediatric', u"Senior's", u"Women's"]
Since the list was already sorted, it didn't change. sort has no problem with ' but note that the apostrophe sorts before all of the A-Z and a-z characters:
>>> ls
[u'abc', u'abz', u"ab'"]
>>> ls.sort()
>>> ls
[u"ab'", u'abc', u'abz']
>>>
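If the goal is a dictionary-style ordering, one option is a key function that drops apostrophes and ignores case. This is a sketch under that assumption; the function name sort_key and the sample list words are made up for illustration.

```python
def sort_key(text):
    """Key that ignores apostrophes and letter case when comparing strings."""
    return text.replace("'", "").lower()

words = ["Women's", "abz", "ab'", "Cancer", "abc", "Men's"]

# Default ordering compares code points: uppercase first, apostrophe before letters
print(sorted(words))
# ['Cancer', "Men's", "Women's", "ab'", 'abc', 'abz']

# With the key, apostrophes and case no longer affect the order
print(sorted(words, key=sort_key))
# ["ab'", 'abc', 'abz', 'Cancer', "Men's", "Women's"]
```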
