Intersection in sets - python

I have my_dict with sets as values and I have x which is also a set.
I need to return a list of the keys in my_dict whose sets contain all the numbers in x. If a set in my_dict does not contain all the numbers in x, I do not want to return it.
I want to use intersection (&), but it returns all the keys in my_dict.
my_dict = {1: {1,2,3,4,5},
           2: {1,2,3,7,8},
           3: {1,2,3,4}
}
x = {1,2,5}
new_list = []
for i in my_dict:
    if my_dict[i] & x:
        new_list.append(i)
print(new_list)
Output:
[1, 2, 3]
I need to receive [1] instead of [1, 2, 3]

When the intersection equals x, every value in x is present in the set in the dictionary.
for i in my_dict:
    if (my_dict[i] & x) == x:
        new_list.append(i)
print(new_list)
Edit: as suggested in the comments below, you can also do
for i in my_dict:
    if x.issubset(my_dict[i]):
        new_list.append(i)
print(new_list)
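To see why the bare & test passes every key while the equality and issubset tests do not, here is a small side-by-side sketch on the question's data (the variable names overlap_keys, full_match_keys and subset_keys are just illustrative):

```python
my_dict = {1: {1, 2, 3, 4, 5},
           2: {1, 2, 3, 7, 8},
           3: {1, 2, 3, 4}}
x = {1, 2, 5}

# A non-empty intersection is truthy, so any overlap at all passes this test.
overlap_keys = [k for k, s in my_dict.items() if s & x]

# Comparing the intersection with x requires every element of x to be present.
full_match_keys = [k for k, s in my_dict.items() if (s & x) == x]

# x.issubset(s) expresses the same condition without building the intersection.
subset_keys = [k for k, s in my_dict.items() if x.issubset(s)]

print(overlap_keys)     # [1, 2, 3]
print(full_match_keys)  # [1]
print(subset_keys)      # [1]
```

Sets 2 and 3 share {1, 2} with x, which is enough to make & truthy but not enough to equal x.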

I suggest you use the set.issuperset method rather than the & operator. Why combine several operators when a method exists to do exactly what you want?
new_list = []
for i in my_dict:
    if my_dict[i].issuperset(x):
        new_list.append(i)
Note that I'd normally write this with a list comprehension:
new_list = [key for key, value in my_dict.items() if value.issuperset(x)]

The intersection between a my_dict value and x should be equal to x, which means x should be a subset of that my_dict value:
my_dict = {1: {1,2,3,4,5},
           2: {1,2,3,7,8},
           3: {1,2,3,4}}
x = {1,2,5}
new_list = []
for i, j in my_dict.items():
    if x.issubset(j):
        new_list.append(i)
print(new_list)

This can also be solved using the issubset function. Here's an example:
for i in my_dict:
    if x.issubset(my_dict[i]):
        new_list.append(i)
Output: [1]
In this example, we check whether x is a subset of each value in the dictionary (in other words, whether my_dict[i] is a superset of x); if it is, we append the key to the desired list.

To check whether the entirety of one set is within another set, the nicest way (in my opinion) is to use the comparison operators, which sets overload as subset and superset tests: value >= x means "value is a superset of x" and is equivalent to value.issuperset(x), while value > x additionally requires the two sets to differ (a proper superset). Likewise, <= and < are the non-strict and strict subset tests. The advantage of this approach is that both the strict and the non-strict checks are available with familiar operators.
Here's quite an idiomatic way of doing it:
new_list = []
for key, value in my_dict.items():
    if value >= x:
        new_list.append(key)
The problem with your original code is that it checks whether there is any intersection between the two sets, i.e. whether they share even a single element, when you seem to want to check whether all of x is in the set you're checking against.
I would also advise using a list comprehension if you want to simplify the code, unless you have other steps you also need to do:
new_list = [key for key, value in my_dict.items() if value >= x]
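The difference between the strict and non-strict operators matters when a set in my_dict is exactly equal to x. A minimal sketch (same and bigger are made-up example sets):

```python
x = {1, 2, 5}
same = {1, 2, 5}
bigger = {1, 2, 3, 4, 5}

# >= is the non-strict superset test, identical to set.issuperset.
print(same >= x, same.issuperset(x))      # True True
# > is the strict (proper) superset test: the sets must also differ.
print(same > x, bigger > x)               # False True
# <= and < are the mirrored subset tests.
print(x <= same, x < same, x < bigger)    # True False True
```

So value >= x is the right choice here; value > x would silently drop a set that contains exactly the numbers in x and nothing else.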

Related

Easier way to check if an item from one list of tuples doesn't exist in another list of tuples in python

I have two lists of tuples, say,
list1 = [('item1',),('item2',),('item3',), ('item4',)] # Contains just one item per tuple
list2 = [('item1', 'd',),('item2', 'a',),('item3', 'f',)] # Contains multiple items per tuple
Expected output: 'item4' # Item that doesn't exist in list2
As shown in the example above, I want to find which items in the tuples of list1 do not appear as the first element of any tuple in list2. What is the easiest way to do this without writing two for loops?
Assuming your tuple structure is exactly as shown above, this would work:
tuple(set(x[0] for x in list1) - set(x[0] for x in list2))
or, better, with set comprehensions:
tuple({x[0] for x in list1} - {x[0] for x in list2})
result:
('item4',)
This gives you {'item4'}:
next(zip(*list1)) - dict(list2).keys()
The next(zip(*list1)) gives you the tuple ('item1', 'item2', 'item3', 'item4').
The dict(list2).keys() gives you dict_keys(['item1', 'item2', 'item3']), which happily offers you set operations like that set difference.
This is the only way I can think of doing it; not sure if it helps, though. I removed the trailing commas from the items in list1, because I don't see why they are there and they affect the code (without them, each item is a plain string rather than a one-element tuple).
list1 = [('item1'),('item2'),('item3'), ('item4')] # No trailing commas: these are plain strings, not tuples
list2 = [('item1', 'd',),('item2', 'a',),('item3', 'f',)] # Contains multiple items per tuple
not_in_tuple = []
OutputTuple = [(a) for a, b in list2]
for i in list1:
    if i in OutputTuple:
        pass
    else:
        not_in_tuple.append(i)
for i in not_in_tuple:
    print(i)
You don't really have a choice but to loop over the two lists. One efficient way is to first construct a set of the first elements of list2:
items = {e[0] for e in list2}
list3 = list(filter(lambda x:x[0] not in items, list1))
Output:
>>> list3
[('item4',)]
Try set.difference:
>>> set(next(zip(*list1))).difference(dict(list2))
{'item4'}
>>>
Or even better:
>>> set(list1) - {x[:1] for x in list2}
{('item4',)}
>>>
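Plain difference (-) and symmetric difference (^) happen to agree on the question's data, but only because every first element in list2 also appears in list1. A sketch with a hypothetical extra 'item9' in list2 shows where they diverge:

```python
list1 = [('item1',), ('item2',), ('item3',), ('item4',)]
# Hypothetical variant of list2 with an extra 'item9' that list1 lacks.
list2 = [('item1', 'd'), ('item2', 'a'), ('item3', 'f'), ('item9', 'z')]

firsts2 = {t[:1] for t in list2}   # one-element tuples, comparable to list1's

# Difference: only what list1 has and list2 lacks.
missing = set(list1) - firsts2
# Symmetric difference: also picks up what list2 has and list1 lacks.
either_only = set(list1) ^ firsts2

print(missing)                 # {('item4',)}
print(sorted(either_only))     # [('item4',), ('item9',)]
```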
This is a set difference operation:
set1 = set(j[0] for j in list1)
set2 = set(j[0] for j in list2)
result = set1.difference(set2)
output:
{'item4'}
for i in list1:
    a = i[0]
    for j in list2:
        b = j[0]
        if a == b:
            break
    else:
        print(a)
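In the loop above, the else belongs to the inner for, not to the if: it runs only when the inner loop finishes without hitting break, i.e. when no match was found. A sketch that wraps the same logic in a function and collects the results instead of printing them (the name missing_firsts is hypothetical):

```python
def missing_firsts(list1, list2):
    # Return first elements of list1's tuples that no tuple in list2 starts with.
    missing = []
    for i in list1:
        a = i[0]
        for j in list2:
            if a == j[0]:
                break          # found a match: the else clause is skipped
        else:
            # Runs only when the inner loop ended without a break.
            missing.append(a)
    return missing

list1 = [('item1',), ('item2',), ('item3',), ('item4',)]
list2 = [('item1', 'd'), ('item2', 'a'), ('item3', 'f')]
print(missing_firsts(list1, list2))  # ['item4']
```

Unlike the set-based answers, this compares every pair, so it takes time proportional to len(list1) * len(list2); it does, however, preserve list1's order.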

Parse a list, check if it has elements from another list and print out these elements

I have a list populated from entries of a log; for sake of simplicity, something like
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"......]
This list can have an undefined number of entries, which may or may not be in sequence, since I run multiple operations in async fashion.
Then I have another list, which I use as a reference to select only certain entries; it may look like
list_template = ["entry1", "entry2", "entry3"]
I am trying to use the second list, to get sequences of entries, so I can isolate the single sequence, taking only the first instance found of each entry.
Since I am not dealing with numbers, I can't use set, so I tried a loop inside a loop, comparing values in each list.
This does not work, because another entry may appear before the one I am looking for (say I want entry1, entry2, entry3, and the loop finds entry1 but then finds entry3; since I compare every element of each list, it will be happy to find any element).
for item in listlog:
    entry, value = item.split(":")
    for reference_entry in list_template:
        if entry == reference_entry:
            print(item)
            break
In a nutshell, I have to find a sequence as in the template list, while these items are not necessarily in order. I am trying to parse the list only once; otherwise I could do a very expensive multi-pass over the log for each element of the template list, until I find the first occurrence and bail out. I thought that the loop inside a loop would be more efficient, since my reference list is always smaller than the log list and usually has only a few elements.
How would you approach this problem, in the most efficient and pythonic way? All that I can think of, is multiple passes on the log list
You can use a dict:
>>> listlog
['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
>>> list_template
['entry1', 'entry2', 'entry3']
>>> my_dict = {}
>>> for x in listlog:
...     key, value = x.split(":")
...     if key not in my_dict and key in list_template:
...         my_dict[key] = value
...
>>> my_dict
{'entry1': 'abcde', 'entry2': 'abbds', 'entry3': 'orieqor'}
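Two refinements to the dict approach may help with the "parse once" goal: strings are hashable, so the template can be a set after all (membership tests do not require numbers), and the scan can stop as soon as every template entry has been found. A sketch, where first_occurrences is a hypothetical helper name and split(":", 1) assumes values might themselves contain colons:

```python
def first_occurrences(listlog, list_template):
    # One pass over the log, keeping the first value seen for each template entry.
    wanted = set(list_template)   # strings are hashable, so a set works fine here
    found = {}
    for item in listlog:
        key, value = item.split(":", 1)
        if key in wanted and key not in found:
            found[key] = value
            if len(found) == len(wanted):
                break             # every template entry seen: stop scanning early
    return found

listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo",
           "entry3:orieqor", "entry2:iroewiow"]
list_template = ["entry1", "entry2", "entry3"]
print(first_occurrences(listlog, list_template))
# {'entry1': 'abcde', 'entry2': 'abbds', 'entry3': 'orieqor'}
```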
Disclaimer: This answer could use someone's insight on performance. Sure, list/dict comprehensions and zip are pythonic, but the following may very well be a poor use of those tools.
You could use zip. Note that the values must be looked up by key first: zipping ref directly against the filtered values would pair them in data's order rather than ref's, mismatching keys and values.
>>> data = ["a:12", "b:32", "c:54"]
>>> ref = ['c', 'b']
>>> lookup = dict(item.split(':') for item in data)
>>> matches = zip(ref, [lookup[key] for key in ref])
>>> for k, v in matches:
...     print("{}:{}".format(k, v))
...
c:54
b:32
Here's another (worse? I'm not sure, performance-wise) way to get around this:
>>> data = ["a:12", "b:32", "c:54"]
>>> data_dict = {x:y for x,y in [item.split(':') for item in data]}
>>> ["{}:{}".format(key, val) for key,val in data_dict.items() if key in ref]
['b:32', 'c:54']
Explanation:
Convert your initial list into a dict using a dict comprehension.
For each (key, val) pair in the dict, join both into a string if the key is found in the ref list.
You can use a list comprehension such as this:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
print([item for item in listlog if re.search('entry', item)])
# ['entry1:abcde', 'entry2:abbds', 'entry1:eorieo', 'entry3:orieqor', 'entry2:iroewiow']
Then you can split them as you wish and create a dictionary if you want:
import re
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo", "entry3:orieqor", "entry2:iroewiow"]
mylist = [item for item in listlog if re.search('entry', item)]
def create_dict(string, dict_splitter=':'):
    _dict = {}
    temp = string.split(dict_splitter)
    key = temp[0]
    value = temp[1]
    _dict[key] = value
    return _dict
mydictionary = {}
for x in mylist:
    x = str(x)
    mydictionary.update(create_dict(x))
for k, v in mydictionary.items():
    print(k, v)
# entry1 eorieo
# entry2 iroewiow
# entry3 orieqor
As you can see, this method needs refinement: because update() overwrites the value each time a key repeats, the dictionary ends up holding the last occurrence of each entry rather than the first. It would be better to store a value only the first time its key appears, which is easier than you might think.
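One way to keep the first occurrence instead of the last is dict.setdefault, which only stores a value when the key is not already present. A minimal sketch on the same log data (first_seen is an illustrative name):

```python
listlog = ["entry1:abcde", "entry2:abbds", "entry1:eorieo",
           "entry3:orieqor", "entry2:iroewiow"]

first_seen = {}
for item in listlog:
    key, value = item.split(":", 1)
    # setdefault stores value only if key is absent, so later repeats are ignored.
    first_seen.setdefault(key, value)

for k, v in first_seen.items():
    print(k, v)
# entry1 abcde
# entry2 abbds
# entry3 orieqor
```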

Creating a list by iterating over a dictionary

I defined a dictionary like this (list is a list of integers):
my_dictionary = {'list_name' : list, 'another_list_name': another_list}
Now, I want to create a new list by iterating over this dictionary. In the end, I want it to look like this:
my_list = [list_name_list_item1, list_name_list_item2,
list_name_list_item3, another_list_name_another_list_item1]
And so on.
So my question is: How can I realize this?
I tried
for key in my_dictionary.keys():
    k = my_dictionary[key]
    for value in my_dictionary.values():
        v = my_dictionary[value]
        v = str(v)
        my_list.append(k + '_' + v)
But instead of the desired output, I receive a TypeError (unhashable type: 'list') in line 4 of this example.
You're trying to get a dictionary item by its value, whereas you already have your value.
Do it in one line using a list comprehension:
my_dictionary = {'list_name' : [1,4,5], 'another_list_name': [6,7,8]}
my_list = [k+"_"+str(v) for k,lv in my_dictionary.items() for v in lv]
print(my_list)
result:
['another_list_name_6', 'another_list_name_7', 'another_list_name_8', 'list_name_1', 'list_name_4', 'list_name_5']
Note that dictionaries only preserve insertion order from Python 3.7 onward (the output above reflects an older version), so on older versions the order of the list isn't guaranteed either. You could fix the order by sorting the items by key:
my_list = [k+"_"+str(v) for k,lv in sorted(my_dictionary.items()) for v in lv]
Try this:
my_list = []
for key in my_dictionary:
    for item in my_dictionary[key]:
        my_list.append(str(key) + '_' + str(item))
Hope this helps.
Your immediate problem is that dict.values() gives you the values from the dictionary, not the keys, so when you attempt a lookup with one of them on line 4, it fails: in this case the values are lists, which are unhashable and can't be used as keys. In another case, say {1:2, 3:4}, it would fail with a KeyError, and {1:2, 2:1} would not raise an error, but would likely give confusing behaviour.
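The three failure modes just described can be reproduced with a small sketch that mimics the question's "use each value as a key" pattern (lookup_values_as_keys is a hypothetical name, and the sample dictionaries are the ones from the paragraph above):

```python
def lookup_values_as_keys(d):
    # Mimics the question's bug: every *value* is used as a lookup key.
    return [d[value] for value in d.values()]

cases = [
    ("lists", {'list_name': [1, 4, 5]}),   # list values are unhashable
    ("ints", {1: 2, 3: 4}),                # hashable, but 2 is not a key
    ("swapped", {1: 2, 2: 1}),             # values happen to be keys too
]

results = {}
for label, d in cases:
    try:
        results[label] = lookup_values_as_keys(d)
    except (TypeError, KeyError) as exc:
        results[label] = type(exc).__name__

print(results)  # {'lists': 'TypeError', 'ints': 'KeyError', 'swapped': [1, 2]}
```

The last case "succeeds" but returns the values of the wrong keys, which is the confusing behaviour mentioned above.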
As for your actual question, lists do not attach names to their data the way dictionaries do; their elements are identified only by index.
def f():
    a = 1
    b = 2
    c = 3
    l = [a, b, c]
    return l
Calling f() will return [1, 2, 3], with any concept of a, b, and c being lost entirely.
If you want to simply concatenate the lists in your dictionary, making a copy of the first, then calling .extend() on it will suffice:
my_list = my_dictionary['list_name'][:]
my_list.extend(my_dictionary['another_list_name'])
If you're looking to keep the order of the lists' items, while still referring to them by name, look into the OrderedDict class in collections.
You've written an outer loop over keys, then an inner loop over values, and tried to use each value as a key, which is where the program failed. Simply use the dictionary's items method to iterate over key,value pairs instead:
["{}_{}".format(k,v) for k,v in d.items()]
Oops, failed to parse the format desired; we were to produce each item in the inner list. Not to worry...
import itertools
d={1:[1,2,3],2:[4,5,6]}
list(itertools.chain(*(
    ["{}_{}".format(k,i) for i in l]
    for (k,l) in d.items() )))
This is a little more complex. We again take key,value pairs from the dictionary, then make an inner loop over the list that was the value and format those into strings. This produces inner sequences, so we flatten it using chain and *, and finally save the result as one list.
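Edit: Turns out that with Python 3.4.3, doing this with nested generator expressions gives wrong results; I had to turn the inner one into a list. The cause is late binding: each inner generator looks up k only when it is consumed, and the * unpacking exhausts the outer generator first, so by then k already holds the last key.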
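The late-binding effect can be reproduced on any Python 3 version with a small sketch (the two-key dictionary d is just example data). The only difference between the two expressions is whether the inner sequence is a generator or a list:

```python
import itertools

d = {1: [1, 2], 2: [3, 4]}

# Inner *generators*: l is evaluated immediately, but k is looked up only
# when each generator is consumed. The * unpacking exhausts the outer
# generator first, so by then k holds the last key.
lazy = list(itertools.chain(*(
    ("{}_{}".format(k, i) for i in l)
    for (k, l) in d.items())))

# Inner *lists* are built immediately, while k still has the right value.
eager = list(itertools.chain(*(
    ["{}_{}".format(k, i) for i in l]
    for (k, l) in d.items())))

print(lazy)   # ['2_1', '2_2', '2_3', '2_4']
print(eager)  # ['1_1', '1_2', '2_3', '2_4']
```

Note that the items of the lists are still correct in the lazy version; only the key is wrong, which is why the problem is easy to miss.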
Edit again: As someone posted in a since deleted answer (which confuses me), I'm overcomplicating things. You can do the flattened nesting in a chained comprehension:
["{}_{}".format(k,v) for k,l in d.items() for v in l]
That method was also posted by Jean-François Fabre.
Use list comprehensions like this
d = {"test1":[1,2,3,],"test2":[4,5,6],"test3":[7,8,9]}
new_list = [str(item[0])+'_'+str(v) for item in d.items() for v in item[1]]
Output:
new_list:
['test1_1',
'test1_2',
'test1_3',
'test3_7',
'test3_8',
'test3_9',
'test2_4',
'test2_5',
'test2_6']
Let's initialize our data
In [1]: l0 = [1, 2, 3, 4]
In [2]: l1 = [10, 20, 30, 40]
In [3]: d = {'name0': l0, 'name1': l1}
Note that in my example, different from yours, the lists' content is not strings... aren't lists heterogeneous containers?
That said, you cannot simply join the keys and the lists' items; you'd better convert those values to strings using the str(...) builtin.
Now comes the solution to your problem. I use a list comprehension with two loops: the outer loop comes first and runs over the items (i.e., key-value pairs) of the dictionary; the inner loop comes second and runs over the items of the corresponding list.
In [4]: res = ['_'.join((str(k), str(i))) for k, l in d.items() for i in l]
In [5]: print(res)
['name0_1', 'name0_2', 'name0_3', 'name0_4', 'name1_10', 'name1_20', 'name1_30', 'name1_40']
In [6]:
In your case, using str(k)+'_'+str(i) would be fine as well, but the usual idiom for joining strings with a fixed separator 'text' is the 'text'.join(...) method. Note that .join takes a SINGLE argument, an iterable, hence in the list comprehension I used join((..., ...)) to collect the joinands into a single argument.

How to delete a dictionary from a list based on values from another dictionary in a list?

I've spent about 6 hours on this, so I feel justified in asking a question. :)
My problem
list1 = [
    {'extension': 1000, 'id': 1},
    {'extension': 1001, 'id': 1},
    {'extension': 1000, 'id': 2},
]
list2 = [
    {'Stationextension': 1000, 'id': 1, 'name': 'bob'},
    {'Stationextension': 1001, 'id': 1, 'name': 'sal'},
    {'Stationextension': 1000, 'id': 2, 'name': 'cindy'},
]
My pseudo code is this.
Delete list2[d] (whole dictionary from list of dicts) where list1[d]['extension'] == list2[d]['Stationextension'] AND id == id.
List 1 is much smaller and has to stay unchanged relative to list 2 if that matters.
Using Python3.3, I have something like
[x for x in list2 if (list1[x]['stationExtension'] == x.get('extension'))]
Thanks !
list2 = [x for x in list2 if {'extension': x['Stationextension'], 'id': x['id']} not in list1]
or, if you know that the lengths and indexes will match:
list2 = [y for x, y in zip(list1, list2) if not (x['extension'] == y['Stationextension'] and x['id'] == y['id'])]
The right way to do this is to use the right data structures for the job. Instead of figuring out how to search a list based on a key, use a dict with that key. For example:
dict1 = {x['extension']: x for x in list1}
Now you can keep just the entries of list2 that have no match in list1:
[x for x in list2 if x['Stationextension'] not in dict1]
Elsewhere in your question, it seems like you wanted to use the extension and the id, not just the extension, as a key. But that's just as easy; you can use a tuple:
dict1 = {(x['extension'], x['id']): x for x in list1}
[x for x in list2 if (x['Stationextension'], x['id']) not in dict1]
If you want to know why your code doesn't work… let's look at it:
[x for x in list2 if (list1[x]['stationExtension'] == x.get('extension'))]
list1[x] is trying to use the dict as an index, which doesn't mean anything. What you really want to say is "keep x unless, for some element of list1, element['extension'] == x['Stationextension']". And you can translate that almost directly to Python:
[x for x in list2
 if not any(element['extension'] == x['Stationextension']
            for element in list1)]
And of course, add and element['id'] == x['id'] to the end of the condition.
But again, the dict is a better solution. Besides being a whole lot simpler to read and write, it's also more efficient. Searching for any matches in a list requires checking each element in the list; searching for a match in a dict just requires hashing the key and looking it up in a hash table. So the any() approach takes O(NM) time (where N is the length of list1 and M is the length of list2), while the dict approach takes only O(N + M): O(N) to build the dict, then one constant-time lookup per element of list2.
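Since only membership matters for deleting entries, a set of (extension, id) tuples is enough; the dict values are never used. A sketch of the full O(N + M) filter on the question's data, with list1 shortened (an assumption for illustration) so that one entry survives:

```python
list1 = [
    {'extension': 1000, 'id': 1},
    {'extension': 1001, 'id': 1},
]
list2 = [
    {'Stationextension': 1000, 'id': 1, 'name': 'bob'},
    {'Stationextension': 1001, 'id': 1, 'name': 'sal'},
    {'Stationextension': 1000, 'id': 2, 'name': 'cindy'},
]

# Build the lookup once: O(N) for N = len(list1).
to_delete = {(e['extension'], e['id']) for e in list1}

# One average-case O(1) membership test per element of list2: O(M) overall.
list2 = [e for e in list2 if (e['Stationextension'], e['id']) not in to_delete]

print([e['name'] for e in list2])  # ['cindy']
```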
Assuming that you want to delete the items from list2 whose "Stationextension" value matches an "extension" value in list1:
You had better first create a set of all the values to delete; that avoids a linear search every time you check whether an entry should stay:
indexes = set(item["extension"] for item in list1)
Also, you usually can't delete items from a list while iterating over it, so build a new list instead:
new_list2 = []
for item in list2:
    if item["Stationextension"] not in indexes:
        new_list2.append(item)
list2 = new_list2
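Here is what goes wrong when deleting during iteration, plus an in-place alternative to rebinding the name. A minimal sketch with a made-up list of numbers:

```python
nums = [1, 2, 2, 3]
for n in nums:
    if n == 2:
        nums.remove(n)      # shifts later items left, so the iterator skips one
print(nums)  # [1, 2, 3]  (the second 2 survived)
buggy_result = nums

nums = [1, 2, 2, 3]
alias = nums
# Slice assignment replaces the contents in place, so other references see the change.
nums[:] = [n for n in nums if n != 2]
print(alias)  # [1, 3]
```

Plain rebinding (list2 = new_list2) is fine when nothing else holds a reference to the old list; slice assignment is the option when something does.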
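Python 2.7 (this also works in Python 3, assuming list1 and list2 have the same length):
removelist = []
for i, x in enumerate(list2):
    if x['Stationextension'] == list1[i]['extension'] and x['id'] == list1[i]['id']:
        removelist += [i]
And once you've decided what to delete, remove those entries from list2, highest index first so the remaining indexes stay valid:
for x in reversed(removelist):
    del list2[x]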

iterating quickly through list of tuples

I wonder whether there's a quicker and less time consuming way to iterate over a list of tuples, finding the right match. What I do is:
# this is a very long list.
my_list = [ (old1, new1), (old2, new2), (old3, new3), ... (oldN, newN)]
# go through entire list and look for match
for j in my_list:
    if j[0] == VALUE:
        PAIR_FOUND = True
        MATCHING_VALUE = j[1]
        break
This code can take quite some time to execute, depending on the number of items in the list. I'm sure there's a better way of doing this.
I think that you can use
for j, k in my_list:
    [ ... stuff ... ]
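Tuple unpacking also combines well with next() and a generator expression, which stops at the first match and supplies a default when nothing matches. This is still a linear O(N) scan, just more compact. A sketch, where find_new and the sample list are hypothetical:

```python
def find_new(my_list, value, default=None):
    # Linear scan with early exit: second element of the first matching pair.
    return next((new for old, new in my_list if old == value), default)

my_list = [("a", 1), ("b", 2), ("c", 3)]
print(find_new(my_list, "b"))  # 2
print(find_new(my_list, "z"))  # None
```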
Assuming a bit more memory usage is not a problem and the first item of each tuple is hashable, you can create a dict from your list of tuples; looking up the value is then as simple as looking up a key in the dict. Something like:
dct = dict(tuples)
val = dct.get(key) # None if item not found else the corresponding value
EDIT: To create a reverse mapping, use something like:
revDct = dict((val, key) for (key, val) in tuples)
The question is dead but still knowing one more way doesn't hurt:
my_list = [ (old1, new1), (old2, new2), (old3, new3), ... (oldN, newN)]
for first, *args in my_list:
    if first == VALUE:
        PAIR_FOUND = True
        MATCHING_VALUE = args  # args is a list, e.g. [new1]
        break
The code can be cleaned up, but if you are using a list to store your tuples, any such lookup will be O(N).
If lookup speed is important, you should use a dict to store your tuples. The key should be the 0th element of your tuples, since that's what you're searching on. You can easily create a dict from your list:
my_dict = dict(my_list)
Then, (VALUE, my_dict[VALUE]) will give you your matching tuple (assuming VALUE exists).
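One caveat before replacing the loop with dict(my_list): if the same first element appears more than once, the dict keeps the last value, while the original loop stops at the first match. A sketch with made-up data, showing that building the dict from the reversed list makes the first occurrence win:

```python
my_list = [("red", 1), ("blue", 2), ("red", 3)]

# dict() keeps the *last* value for a repeated key ...
print(dict(my_list))  # {'red': 3, 'blue': 2}

# ... whereas the original loop stops at the *first* match.
# Building the dict from the reversed list makes the first occurrence win.
first_wins = dict(reversed(my_list))
print(first_wins["red"])  # 1
```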
I wonder whether the below method is what you want.
You can use defaultdict.
>>> from collections import defaultdict
>>> s = [('red',1), ('blue',2), ('red',3), ('blue',4), ('red',1), ('blue',4)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4, 4]), ('red', [1, 3, 1])]
