Python Set Comprehension Nested in Dict Comprehension

I have a list of tuples, where each tuple contains a string and a number in the form of:
[(string_1, num_a), (string_2, num_b), ...]
The strings are non-unique, and so are the numbers; e.g., (string_1, num_m) or (string_9, num_b) are likely to exist in the list.
I'm attempting to create a dictionary with the string as the key and a set of all numbers occurring with that string as the value:
dict = {string_1: {num_a, num_m}, string_2: {num_b}, ...}
I've done this somewhat successfully with the following dictionary comprehension with nested set comprehension:
#st_id_list = [(string_1, num_a), ...]
#st_dict = {string_1: {num_a, num_m}, ...}
st_dict = {
    st[0]: set(
        st_[1]
        for st_ in st_id_list
        if st_[0] == st[0]
    )
    for st in st_id_list
}
There's only one issue: st_id_list is 18,000 items long. This snippet of code takes less than ten seconds to run for a list of 500 tuples, but over twelve minutes to run for the full 18,000 tuples. I have to think this is because I've nested a set comprehension inside a dict comprehension.
Is there a way to avoid this, or a smarter way to do it?

You have a double loop, so you take O(N**2) time to produce your dictionary. For 500 items, 250,000 steps are taken, and for your 18k items, 324 million steps need to be done.
Here is an O(N) loop instead, so 500 steps for your smaller dataset and 18,000 steps for the larger one:
st_dict = {}
for st, id in st_id_list:
    st_dict.setdefault(st, set()).add(id)
This uses the dict.setdefault() method to ensure that for a given key (your string values), there is at least an empty set available if the key is missing, then adds the current id value to that set.
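To check that the single-pass loop builds the same mapping as the nested comprehension from the question, both can be run on the same input. This is a sketch; the sample pairs and the function names `quadratic_grouping` and `linear_grouping` are made up for illustration.

```python
def quadratic_grouping(pairs):
    # The question's approach: for every pair, rescan the whole list (O(N**2))
    return {
        st[0]: set(st_[1] for st_ in pairs if st_[0] == st[0])
        for st in pairs
    }


def linear_grouping(pairs):
    # The answer's approach: one pass, creating each set on first sight (O(N))
    result = {}
    for key, num in pairs:
        result.setdefault(key, set()).add(num)
    return result


sample = [("a", 1), ("b", 2), ("a", 3), ("c", 2), ("a", 1)]
print(linear_grouping(sample))  # {'a': {1, 3}, 'b': {2}, 'c': {2}}
print(quadratic_grouping(sample) == linear_grouping(sample))  # True
```

Both produce identical dictionaries; only the amount of work differs as the list grows.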
You can do the same with a collections.defaultdict() object:
from collections import defaultdict
st_dict = defaultdict(set)
for st, id in st_id_list:
    st_dict[st].add(id)
The defaultdict() uses the factory passed in to set a default value for missing keys.
The disadvantage of the defaultdict approach is that the object continues to produce default values for missing keys after your loop, which can hide application bugs. Use st_dict.default_factory = None to disable the factory explicitly to prevent that.
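A short sketch of that advice, using made-up sample pairs: once `default_factory` is set to `None`, looking up a missing key raises `KeyError` like a plain dict instead of quietly inserting an empty set.

```python
from collections import defaultdict

pairs = [("string_1", 1), ("string_2", 2), ("string_1", 3)]

st_dict = defaultdict(set)
for key, num in pairs:
    st_dict[key].add(num)  # missing keys get a fresh empty set

# Turn the factory off once the dictionary is built
st_dict.default_factory = None

try:
    st_dict["typo"]  # without the line above, this would insert an empty set
except KeyError:
    print("KeyError: 'typo' is not a key")
```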

Why are you using two loops when you could do it in one loop, like this:
list_1 = [('string_1', 'num_a'), ('string_2', 'num_b'), ('string_1', 'num_m'), ('string_9', 'num_b')]
string_num = {}
for i in list_1:
    if i[0] not in string_num:
        string_num[i[0]] = {i[1]}
    else:
        string_num[i[0]].add(i[1])
print(string_num)
Output (on Python 3.7+, where dicts keep insertion order; the order of elements inside each set may vary):
{'string_1': {'num_a', 'num_m'}, 'string_2': {'num_b'}, 'string_9': {'num_b'}}

Related

Function that makes dict from string but swaps keys and values?

I'm trying to make a function that takes a list of strings as input, like the one below:
swap_values_dict(['Summons: Bahamut, Shiva, Chocomog',
                  'Enemies: Bahamut, Shiva, Cactaur'])
and creates a dictionary from them using the words after the colons as keys and the words before the colons as values. I need to clarify that, at this point, there are only two strings in the list. I plan to split the strings into sublists and, from there, try to assign them to a dictionary.
The output should look like
{'Bahamut': ['Summons','Enemies'],'Shiva':['Summons','Enemies'],'Chocomog':['Summons'],'Cactaur':['Enemies']}
As you can see, the words after the colon in the original list have become keys, while the words before the colon (categories) have become the values. If one of the values appears in both lists, it is assigned two values in the final dictionary. I would like to be able to make similar dictionaries out of many lists of different sizes, not just ones that contain two strings. Could this be done without list comprehensions, using only for loops and if statements?
What I've Tried So Far
title_list = []
for i in range(len(mobs)):  # loops over the index of each string in the list
    titles = (mobs[i].split(":"))[0]  # gets titles from list using split
    title_list.append(titles)
title_list
This code returns ['Summons', 'Enemies'], which isn't the result I wanted, but I think it could help me write the function. I had planned on separating the keys and values into separate lists and then zipping them together afterwards into a dictionary.

Try:
def swap_values_dict(lst):
    tmp = {}
    for s in lst:
        k, v = map(str.strip, s.split(":"))
        tmp[k] = list(map(str.strip, v.split(",")))
    out = {}
    for k, v in tmp.items():
        for i in v:
            out.setdefault(i, []).append(k)
    return out
print(
swap_values_dict(
[
"Summons: Bahamut, Shiva, Chocomog",
"Enemies: Bahamut, Shiva, Cactaur",
]
)
)
Prints:
{'Bahamut': ['Summons', 'Enemies'], 'Shiva': ['Summons', 'Enemies'], 'Chocomog': ['Summons'], 'Cactaur': ['Enemies']}

I'd use a defaultdict. It saves you the trouble of manually checking if a key exists in your dictionary and constructing a new empty list, making for a rather concise function:
from collections import defaultdict
def swap_values_dict(mobs):
    result = defaultdict(list)
    for elem in mobs:
        role, members = elem.split(': ')
        for m in members.split(', '):
            result[m].append(role)
    # convert to a plain dict so later lookups of missing keys raise KeyError
    return dict(result)
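The question also asked whether this can be done with only for loops and if statements. A sketch under that constraint, assuming every string contains exactly one colon followed by comma-separated names; the function name `swap_values_dict_loops` is hypothetical.

```python
def swap_values_dict_loops(mobs):
    result = {}
    for entry in mobs:
        # "Summons: Bahamut, Shiva" -> "Summons", " Bahamut, Shiva"
        role, members = entry.split(":")
        role = role.strip()
        for member in members.split(","):
            member = member.strip()
            if member not in result:
                result[member] = []  # first time this name is seen
            result[member].append(role)
    return result


print(swap_values_dict_loops([
    "Summons: Bahamut, Shiva, Chocomog",
    "Enemies: Bahamut, Shiva, Cactaur",
]))
```

It works for any number of input strings, since each one is processed independently and appended to the shared result.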

Pythonic way to get the index of element from a list of dicts depending on multiple keys

I am very new to Python, and I have the following problem. I came up with the following solution, and I am wondering whether it is "pythonic" or not. If not, what would be the best solution?
The problem is:
I have a list of dicts.
Each dict has at least three items.
I want to find the position in the list of the dict with three specific values.
This is my Python example:
import collections
import random
# let's build the list, for the example
dicts = []
dicts.append({'idName':'NA','idGroup':'GA','idFamily':'FA'})
dicts.append({'idName':'NA','idGroup':'GA','idFamily':'FB'})
dicts.append({'idName':'NA','idGroup':'GB','idFamily':'FA'})
dicts.append({'idName':'NA','idGroup':'GB','idFamily':'FB'})
dicts.append({'idName':'NB','idGroup':'GA','idFamily':'FA'})
dicts.append({'idName':'NB','idGroup':'GA','idFamily':'FB'})
dicts.append({'idName':'NB','idGroup':'GB','idFamily':'FA'})
dicts.append({'idName':'NB','idGroup':'GB','idFamily':'FB'})
# let's shuffle it, again for example
random.shuffle(dicts)
# now I want to have for each combination the index
# I use a recursive defaultdict definition
# because it permits creating a dict of dict
# even if it is not initialized
def tree(): return collections.defaultdict(tree)
# initiate mapping
mapping = tree()
# fill the mapping
for i,d in enumerate(dicts):
    idFamily = d['idFamily']
    idGroup = d['idGroup']
    idName = d['idName']
    mapping[idName][idGroup][idFamily] = i
# I end up with the mapping providing me with the index within
# list of dicts
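One property of the recursive `tree()` mapping worth knowing: reading a path that was never stored does not raise an error, it silently creates empty sub-dictionaries. A minimal sketch with a single made-up entry:

```python
import collections


def tree():
    return collections.defaultdict(tree)


mapping = tree()
mapping["NA"]["GA"]["FA"] = 0

print(mapping["NA"]["GA"]["FA"])  # 0

# A typo in the lookup path returns an empty tree instead of raising KeyError...
missing = mapping["NA"]["GX"]["FA"]
print(len(missing))  # 0
# ...and the lookup itself has added "GX" as a new key under "NA"
print(sorted(mapping["NA"]))  # ['GA', 'GX']
```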

Looks reasonable to me, but perhaps a little too much. You could instead do:
mapping = {
(d['idName'], d['idGroup'], d['idFamily']) : i
for i, d in enumerate(dicts)
}
Then access it with mapping['NA', 'GA', 'FA'] instead of mapping['NA']['GA']['FA']. But it really depends how you're planning to use the mapping. If you need to be able to take mapping['NA'] and use it as a dictionary then what you have is fine.
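A sketch of the flat tuple-key version, using a few of the question's dicts (not shuffled here, so the indices are predictable). `mapping.get(...)` returns `None` for a combination that isn't present, without creating any entries:

```python
dicts = [
    {"idName": "NA", "idGroup": "GA", "idFamily": "FA"},
    {"idName": "NA", "idGroup": "GA", "idFamily": "FB"},
    {"idName": "NB", "idGroup": "GB", "idFamily": "FA"},
]

# One entry per (idName, idGroup, idFamily) combination, mapped to its index
mapping = {
    (d["idName"], d["idGroup"], d["idFamily"]): i
    for i, d in enumerate(dicts)
}

print(mapping["NA", "GA", "FB"])        # 1
print(mapping.get(("NB", "GA", "FA")))  # None: combination not in the list
print(len(mapping))                     # 3
```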

Checking items in a list of dictionaries in python

I have a list of dictionaries:
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4},...]
"ID" is a unique identifier for each dictionary. Considering the list is huge, what is the fastest way to check whether a dictionary with a certain "ID" is in the list and, if not, append one? Then I need to update its "VALUE" ("VALUE" is updated if the dict is already in the list; otherwise a certain value is written).

Don't use a list; use a dictionary instead, mapping IDs to nested dictionaries:
a = {
1: {'VALUE': 2, 'foo': 'bar'},
42: {'VALUE': 45, 'spam': 'eggs'},
}
Note that you don't need to include the ID key in the nested dictionary; doing so would be redundant.
Now you can simply check whether a key exists:
if someid in a:
    a[someid]['VALUE'] = newvalue
I did make the assumption that your ID keys are not necessarily sequential numbers. I also made the assumption you need to store other information besides VALUE; otherwise just a flat dictionary mapping ID to VALUE values would suffice.
A dictionary lets you look up values by key in O(1) time (constant time independent of the size of the dictionary). Lists let you look up elements in constant time too, but only if you know the index.
If you don't, and have to scan through the list, you have an O(N) operation, where N is the number of elements. You need to look at each and every dictionary in your list to see if it matches ID, and if ID is not present, that means you have to search from start to finish. A dictionary will still tell you in O(1) time that the key is not there.
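A sketch of that conversion plus the "update if present, otherwise add" step the question asks for, assuming the IDs are unique; the helper name `upsert` is hypothetical. Note that the dicts in `by_id` are the same objects as in `a`, so an update through `by_id` is also visible in the list, while new entries only go into `by_id`.

```python
a = [{"ID": 1, "VALUE": 2}, {"ID": 2, "VALUE": 2}, {"ID": 3, "VALUE": 4}]

# One O(N) pass to index the list by ID
by_id = {item["ID"]: item for item in a}


def upsert(by_id, some_id, new_value):
    # O(1) membership test instead of scanning the list
    if some_id in by_id:
        by_id[some_id]["VALUE"] = new_value
    else:
        by_id[some_id] = {"ID": some_id, "VALUE": new_value}


upsert(by_id, 2, 10)  # existing ID: value updated
upsert(by_id, 42, 7)  # new ID: entry added
print(by_id[2], by_id[42])
```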

If you can, convert to a dictionary as the other answers suggest, but in case you have a reason* not to change the data structure storing your items, here's what you can do:
items = [{"ID":1, "VALUE":2}, {"ID":2, "VALUE":2}, {"ID":3, "VALUE":4}]
def set_value_by_id(id, value):
    # Try to find the item, if it exists
    for item in items:
        if item["ID"] == id:
            break
    # Make and append the item if it doesn't exist
    else:  # Here, `else` means "if the loop terminated not via break"
        item = {"ID": id}
        items.append(item)
    # In either case, set the value
    item["VALUE"] = value
* Some valid reasons I can think of include preserving the order of items and allowing duplicate items with the same id. For ways to make dictionaries work with those requirements, you might want to take a look at OrderedDict and this answer about duplicate keys.

Converting your list into a dict makes checking for values much more efficient.
d = dict((item['ID'], item['VALUE']) for item in a)
# new_items: an iterable of (ID, VALUE) pairs
for new_key, new_value in new_items:
    if new_key not in d:
        d[new_key] = new_value

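If existing keys also need to be updated, plain assignment is enough, since it adds missing keys and overwrites existing ones (a setdefault() call before it would be redundant):
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
    d[new_key] = new_value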

Answering the question you asked, without changing the data structure: there's no faster way than looping over every element and doing a dictionary lookup on each one, but you can push the loop down into the Python runtime instead of using a Python for loop.
I haven't measured whether it ends up faster, though.
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4}]
id = 2
# list() is needed on Python 3, where filter() returns a lazy iterator
tmp = list(filter(lambda d: d['ID'] == id, a))
# the filter will either return an empty list, or a list of one item.
if not tmp:
    tmp = {"ID": id, "VALUE": "default"}
    a.append(tmp)
else:
    tmp = tmp[0]
# tmp is bound to the found/new dictionary
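A commonly used alternative to `filter` here is `next()` over a generator expression: it stops at the first match instead of checking every element, and returns a default (here `None`) when nothing matches. It is still an O(N) scan in the worst case. A sketch with the same sample data; the helper name `find_or_add` is hypothetical, and `target_id` replaces `id` to avoid shadowing the built-in.

```python
a = [{"ID": 1, "VALUE": 2}, {"ID": 2, "VALUE": 2}, {"ID": 3, "VALUE": 4}]


def find_or_add(items, target_id, default="default"):
    # Stops scanning as soon as a matching dict is found
    found = next((d for d in items if d["ID"] == target_id), None)
    if found is None:
        found = {"ID": target_id, "VALUE": default}
        items.append(found)
    return found


print(find_or_add(a, 2))  # {'ID': 2, 'VALUE': 2}
print(find_or_add(a, 5))  # {'ID': 5, 'VALUE': 'default'}
print(len(a))             # 4
```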

Creating (seeding) large dictionaries efficiently in Python

I have a long (500K+ rows) two column spreadsheet that looks like this:
Name Code
1234 A
1234 B
1456 C
4556 A
4556 B
4556 C
...
So there is an element (with a Name) that can have a number of Codes. But instead of one row per code, I would like a list of all codes that occur for each element. What I want is a dictionary like this:
{"1234":["A","B"],"1456":["C"],"4556":["A","B","C"], ...}
What I have tried is this (and I'm not including the file reading syntax).
codelist = {}
for row in rows:
    name, code = row.split()
    if name in codelist.keys():
        codelist[name].append(code)
    else:
        codelist[name] = [code]
This creates the right output but progress becomes incredibly slow. So I've tried priming my dictionary with keys:
allnames = [.... list of all the names ...]
codelist = dict.fromkeys(allnames)
for row in rows:
    name, code = row.split()
    if codelist[name]:
        codelist[name].append(code)
    else:
        codelist[name] = [code]
This is dramatically faster, and my question is why? Doesn't the program each time still have to search all the keys in the dict? Is there another way to speed up the dict search that doesn't include traversing a tree?
Interestingly, I get an error when I use the same conditional check as before (if name in codelist.keys():) after priming my dictionary:
Traceback (most recent call last):
File ....
codelist[name].append(code)
AttributeError: 'NoneType' object has no attribute 'append'
Now there is a key, but no list to append to. So I use codelist[name], which is None (of type NoneType) as well, and that appears to work. What does it mean when mydict["primed key"] is None?

The former one is slower because .keys() has to create a list of all keys in memory first and then the in operator performs a search on it. So, it is an O(N) search for each line from the text file, hence it is slow.
On the other hand a simple key in dict search takes O(1) time.
dict.fromkeys(allnames)
The default value assigned by dict.fromkeys is None, so you can't use append on it.
>>> d = dict.fromkeys('abc')
>>> d
{'a': None, 'c': None, 'b': None}
A better solution would be to use collections.defaultdict here; if that is not an option, use a normal dict with either a simple if-else check or dict.setdefault.
In Python 3, .keys() returns a view object, and a membership test on a keys view is a hash lookup, so it is O(1) rather than the O(N) list search described above. It is still slightly slower than a plain key in dict test because of the extra method call.
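For completeness, the `dict.setdefault` option mentioned above, sketched on the sample rows from the question; the rows are assumed to be already-read lines of whitespace-separated `Name Code` pairs.

```python
rows = ["1234 A", "1234 B", "1456 C", "4556 A", "4556 B", "4556 C"]

codelist = {}
for row in rows:
    name, code = row.split()
    # Inserts an empty list the first time a name is seen, then appends to it
    codelist.setdefault(name, []).append(code)

print(codelist)
# {'1234': ['A', 'B'], '1456': ['C'], '4556': ['A', 'B', 'C']}
```

One trade-off: `setdefault` builds a new empty list on every call, even when the key already exists, which `defaultdict(list)` avoids.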

You might want to have a look at the defaultdict container to avoid the checks; it also doesn't need the list of names up front:
from collections import defaultdict
codelist = defaultdict(list)
for row in rows:
    name, code = row.split()
    codelist[name].append(code)

How to compare an element of a tuple (int) to determine if it exists in a list

I have the two following lists:
# List of tuples representing the index of resources and their unique properties
# Format of (ID,Name,Prefix)
resource_types=[('0','Group','0'),('1','User','1'),('2','Filter','2'),('3','Agent','3'),('4','Asset','4'),('5','Rule','5'),('6','KBase','6'),('7','Case','7'),('8','Note','8'),('9','Report','9'),('10','ArchivedReport',':'),('11','Scheduled Task',';'),('12','Profile','<'),('13','User Shared Accessible Group','='),('14','User Accessible Group','>'),('15','Database Table Schema','?'),('16','Unassigned Resources Group','#'),('17','File','A'),('18','Snapshot','B'),('19','Data Monitor','C'),('20','Viewer Configuration','D'),('21','Instrument','E'),('22','Dashboard','F'),('23','Destination','G'),('24','Active List','H'),('25','Virtual Root','I'),('26','Vulnerability','J'),('27','Search Group','K'),('28','Pattern','L'),('29','Zone','M'),('30','Asset Range','N'),('31','Asset Category','O'),('32','Partition','P'),('33','Active Channel','Q'),('34','Stage','R'),('35','Customer','S'),('36','Field','T'),('37','Field Set','U'),('38','Scanned Report','V'),('39','Location','W'),('40','Network','X'),('41','Focused Report','Y'),('42','Escalation Level','Z'),('43','Query','['),('44','Report Template ','\\'),('45','Session List',']'),('46','Trend','^'),('47','Package','_'),('48','RESERVED','`'),('49','PROJECT_TEMPLATE','a'),('50','Attachments','b'),('51','Query Viewer','c'),('52','Use Case','d'),('53','Integration Configuration','e'),('54','Integration Command','f'),('55','Integration Target','g'),('56','Actor','h'),('57','Category Model','i'),('58','Permission','j')]
# This is a list of resource ID's that we do not want to reference directly, ever.
unwanted_resource_types=[0,1,3,10,11,12,13,14,15,16,18,20,21,23,25,27,28,32,35,38,41,47,48,49,50,57,58]
I'm attempting to compare the two in order to build a third list containing the 'Name' of each unique resource type that currently exists in unwanted_resource_types. For example, the final result list should be:
result = ['Group','User','Agent','ArchivedReport','Scheduled Task','...','...']
I've tried the following, which I thought should work:
result = []
for res in resource_types:
    if res[0] in unwanted_resource_types:
        result.append(res[1])
When that failed to populate result, I also tried:
result = []
for res in resource_types:
    for type in unwanted_resource_types:
        if res[0] == type:
            result.append(res[1])
This also didn't work. Is there something I'm missing? I believe this would be the right place to use a list comprehension, but I don't fully understand those yet (the Python docs are a bit too succinct for me in this case).
I'm also open to completely rethinking this problem, but I do need to retain the list of tuples as it's used elsewhere in the script. Thank you for any assistance you may provide.

Your resource types are using strings, and your unwanted resources are using ints, so you'll need to do some conversion to make it work.
Try this:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
or using a list comprehension:
result = [item[1] for item in resource_types if int(item[0]) in unwanted_resource_types]

The numbers in resource_types are numbers contained within strings, whereas the numbers in unwanted_resource_types are plain numbers, so your comparison is failing. This should work:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])

The problem is that your triples contain strings and your unwanted resources contain numbers, change the data to
resource_types=[(0,'Group','0'), ...
or use int() to convert the strings to ints before comparison, and it should work. Your result can be computed with a list comprehension as in
result=[rt[1] for rt in resource_types if int(rt[0]) in unwanted_resource_types]
If you change ('0', ...) into (0, ... you can leave out the int() call.
Additionally, you may change the unwanted_resource_types variable into a set, like
unwanted_resource_types=set([0,1,3, ... ])
to improve speed (if speed is an issue, else it's unimportant).
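Putting these suggestions together on a shortened copy of the data: convert the string IDs with `int()` and test membership against a set. The excerpt of `resource_types` below is only for illustration.

```python
# Excerpt of the question's data: (ID, Name, Prefix), with IDs stored as strings
resource_types = [
    ("0", "Group", "0"), ("1", "User", "1"), ("2", "Filter", "2"),
    ("3", "Agent", "3"), ("10", "ArchivedReport", ":"),
    ("11", "Scheduled Task", ";"),
]
# Set literal: average O(1) membership tests
unwanted_resource_types = {0, 1, 3, 10, 11}

result = [name for rid, name, _prefix in resource_types
          if int(rid) in unwanted_resource_types]
print(result)  # ['Group', 'User', 'Agent', 'ArchivedReport', 'Scheduled Task']
```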

The one-liner:
result = list(map(lambda x: dict(map(lambda a: (int(a[0]), a[1]), resource_types))[x], unwanted_resource_types))
without any explicit loop does the job. (On Python 3, map() returns a lazy iterator, hence the outer list() call.)
Ok - you don't want to use this in production code - but it's fun. ;-)
Comment:
The inner dict(map(lambda a: (int(a[0]), a[1]), resource_types)) creates a dictionary from the input data:
{0: 'Group', 1: 'User', 2: 'Filter', 3: 'Agent', ...
The outer map chooses the names from the dictionary.
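The one-liner rebuilds the whole dictionary for every unwanted ID. A more practical sketch builds the ID-to-name dictionary once and then looks each ID up; the data excerpt is again only for illustration, and IDs missing from the lookup are skipped rather than raising `KeyError`.

```python
resource_types = [
    ("0", "Group", "0"), ("1", "User", "1"), ("2", "Filter", "2"),
    ("3", "Agent", "3"),
]
unwanted_resource_types = [0, 1, 3]

# Build the int ID -> name lookup once, instead of once per unwanted ID
names_by_id = {int(rid): name for rid, name, _prefix in resource_types}

# Keep the order of unwanted_resource_types; IDs with no entry are skipped
result = [names_by_id[rid] for rid in unwanted_resource_types if rid in names_by_id]
print(result)  # ['Group', 'User', 'Agent']
```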

We Keep Coding
