Adding items to sets in a dictionary

Adding items to sets in a dictionary - python

I have a list of dictionaries that maps different IDs to a central ID. I have a document with these different IDs associated with terms. I have created a function that now has a key the central ID from the different IDs in the document. The goFile is the document where in the first column there's an ID and in the second one there's a GOterm. The mappingList is a list containing dictionaries in which the ID in the goFile is mapped to a main ID.
My expected output is a dictionary with a main ID as a key and a set with the go terms associated with it as value.
def parseGO(mappingList, goFile):
# open the file
file = open(goFile)
# this will be the dictionary that this function returns
# entries will have as a key an Ensembl ID
# and the value will be a set of GO terms
GOdict = {}
GOset = set()
for line in file:
splitline = line.split(' ')
GO_term = splitline[1]
value_ID = splitline[0]
for dict in mappingList:
if value_ID in dict:
ENSB_term = dict[value_ID]
#my best try
for dict in mappingList:
for key in GOdict.keys():
if value_ID in dict and key == dict[value_ID]:
GOdict[ENSB_term].add(GO_term)
GOdict[ENSB_term] = GOset
return GOdict
My problem is that now I have to add to the central ID in my GOdict the terms that are associated in the document to the different IDs. To avoid duplicates i use a set (GOset). How do I do it? All my try end having all the terms mapped to all the main IDs.
Some sample:
mappingList = [{'1234': 'mainID1', '456': 'mainID2'}, {'789': 'mainID2'}]
goFile:
1234 GOTERM1
1234 GOTERM2
456 GOTERM1
456 GOTERM3
789 GOTERM1
expected output:
GOdict = {'mainID1': set([GOTERM1, GOTERM2]), 'mainID2': set([GOTERM1, GOTERM3])}

First off, you shouldn't use the variable name 'dict', as it shadows the built-in dict class, and will cause you problems at some point.
The following should work for you:
from collections import defaultdict
def parse_go(mapping_list, go_file):
go_dict = defaultdict(set)
with open(go_file) as f: # Better garbage handling using 'with'
for line in f:
(value_id, go_term) = line.split() # Feel free to change the split behaviour
# work better for you.
for map_dict in mapping_list:
if value_id in map_dict:
go_dict[map_dict[value_id]].add(go_term)
return go_dict
The code is fairly straightforward, but here's a breakdown anyway.
We use a default dictionary instead of a normal dictionary so we can eliminate all that if in or setdefault() boilerplate.
For each line in the file, we check if the first item (value_id) is a key in any of the mapping dictionaries, and if so, adds the lines second item (go_term) to that value_id's set in the dictionary.
EDIT: Request for doing this without defaultdict(). Assume that go_dict is just a normal dictionary (go_dict = {}), your for loop would look like:
for map_dict in mapping_list:
if value_id in map_dict:
esnb_entry = go_dict.setdefault(map_dict[value_id], set())
esnb_entry.add(go_term)

Related

How to create a dictionary whose values are sets?

I'm working on an exercise that requires me to build two dictionaries, one whose keys are country names, and the values are the GDP. This part works fine.
The second dictionary is where I'm lost, as the keys are supposed to be the letters A‐Z and the values are sets of country names. I tried using a for loop, which I've commented on below, where the issue lies.
If the user enters a string with only one letter (like A), the program should print all the countries that begin with that letter. When you run the program, however, it only prints out one country for each letter.
The text file contains 228 lines. ie:
1:Qatar:98900
2:Liechtenstein:89400
3:Luxembourg:80600
4:Bermuda:69900
5:Singapore:59700
6:Jersey:57000
etc.
And here's my code.
initials = []
countries=[]
incomes=[]
dictionary={}
dictionary_2={}
keywordFile = open("raw.txt", "r")
for line in keywordFile:
line = line.upper()
line = line.strip("\n")
line = line.split(":")
initials.append(line[1][0]) # first letter of second element
countries.append(line[1])
incomes.append(line[2])
for i in range(0,len(countries)):
dictionary[countries[i]] = incomes[i]
this for loop should spit out 248 values (one for each country), where the key is the initial and the value is the country name. However, it only spits out 26 values (one country for each letter in the alphabet)
for i in range(0,len(countries)):
dictionary_2[initials[i]] = countries[i]
print(dictionary_2)
while True:
inputS = str(input('Enter an initial or a country name.'))
if inputS in dictionary:
value = dictionary.get(inputS, "")
print("The per capita income of {} is {}.".format((inputS.title()), value ))
elif inputS in dictionary_2:
value = dictionary_2.get(inputS)
print("The countries that begin with the letter {} are: {}.".format(inputS, (value.title())))
elif inputS.lower() in "quit":
break
else:
print("Does not exit.")
print("End of session.")
I'd appreciate any input leading me in the right direction.

Use defaultdict to make sure each value of your initials dict is a set, and then use the add method. If you just use = you'll be overwriting the initial keys value each time, defaultdict is an easier way of using an expression like:
if initial in dict:
dict[initial].add(country)
else:
dict[initial] = {country}
See the full working example below, and also note that i'm using enumerate instead of range(0,len(countries)), which i'd also recommend:
#!/usr/bin/env python3
from collections import defaultdict
initials, countries, incomes = [],[],[]
dict1 = {}
dict2 = defaultdict(set)
keywordFile = """
1:Qatar:98900
2:Liechtenstein:89400
3:Luxembourg:80600
4:Bermuda:69900
5:Singapore:59700
6:Jersey:57000
""".split("\n\n")
for line in keywordFile:
line = line.upper().strip("\n").split(":")
initials.append(line[1][0])
countries.append(line[1])
incomes.append(line[2])
for i,country in enumerate(countries):
dict1[country] = incomes[i]
dict2[initials[i]].add(country)
print(dict2["L"])
Result:
{'LUXEMBOURG', 'LIECHTENSTEIN'}
see: https://docs.python.org/3/library/collections.html#collections.defaultdict

The values for dictionary2 should be such that they can contain a list of countries. One option is to use a list as the values in your dictionary. In your code, you are overwriting the values for each key whenever a new country with the same initial is to be added as the value.
Moreover, you can use the setdefault method of the dictionary type. This code:
dictionary2 = {}
for country in countries:
dictionary2.setdefault(country[0], []).append(country)
should be enough to create the second dictionary elegantly.
setdefault, either returns the value for the key (in this case the key is set to the first letter of the country name) if it already exists, or inserts a new key (again, the first letter of the country) into the dictionary with a value that is an empty set [].
edit
if you want your values to be set (for faster lookup/membership test), you can use the following lines:
dictionary2 = {}
for country in countries:
dictionary2.setdefault(country[0], set()).add(country)

Here's a link to a live functioning version of the OP's code online.
The keys in Python dict objects are unique. There can only ever be one 'L' key a single dict. What happens in your code is that first the key/value pair 'L':'Liechtenstein' is inserted into dictionary_2. However, in a subsequent iteration of the for loop, 'L':'Liechtenstein' is overwritten by 'L':Luxembourg. This kind of overwriting is sometimes referred to as "clobbering".
Fix
One way to get the result that you seem to be after would be to rewrite that for loop:
for i in range(0,len(countries)):
dictionary_2[initials[i]] = dictionary_2.get(initials[i], set()) | {countries[i]}
print(dictionary_2)
Also, you have to rewrite the related elif statement beneath that:
elif inputS in dictionary_2:
titles = ', '.join([v.title() for v in dictionary_2[inputS]])
print("The countries that begin with the letter {} are: {}.".format(inputS, titles))
Explanation
Here's a complete explanation of the dictionary_2[initials[i]] = dictionary_2.get(initials[i], set()) | {countries[i]} line above:
dictionary_2.get(initials[i], set())
If initials[i] is a key in dictionary_2, this will return the associated value. If initials[i] is not in the dictionary, it will return the empty set set() instead.
{countries[i]}
This creates a new set with a single member in it, countries[i].
dictionary_2.get(initials[i], set()) | {countries[i]}
The | operator adds all of the members of two sets together and returns the result.
dictionary_2[initials[i]] = ...
The right hand side of the line either creates a new set, or adds to an existing one. This bit of code assigns that newly created/expanded set back to dictionary_2.
Notes
The above code sets the values of dictionary_2 as sets. If you want to use list values, use this version of the for loop instead:
for i in range(0,len(countries)):
dictionary_2[initials[i]] = dictionary_2.get(initials[i], []) + [countries[i]]
print(dictionary_2)

You're very close to what you're looking for, You could populate your dictionaries respectively while looping over the contents of the file raw.txt that you're reading. You can also read the contents of the file first and then perform the necessary operations to populate the dictionaries. You could achieve your requirement with nice oneliners in python using dict comprehensions and groupby. Here's an example:
country_per_capita_dict = {}
letter_countries_dict = {}
keywordFile = [line.strip() for line in open('raw.txt' ,'r').readlines()]
You now have a list of all lines in the keywordFile as follows:
['1:Qatar:98900', '2:Liechtenstein:89400', '3:Luxembourg:80600', '4:Bermuda:69900', '5:Singapore:59700', '6:Jersey:57000', '7:Libya:1000', '8:Sri Lanka:5000']
As you loop over the items, you can split(':') and use the [1] and [2] index values as required.
You could use dictionary comprehension as follows:
country_per_capita_dict = {entry.split(':')[1] : entry.split(':')[2] for entry in keywordFile}
Which results in:
{'Qatar': '98900', 'Libya': '1000', 'Singapore': '59700', 'Luxembourg': '80600', 'Liechtenstein': '89400', 'Bermuda': '69900', 'Jersey': '57000'}
Similarly using groupby from itertools you can obtain:
from itertools import groupby
country_list = country_per_capita_dict.keys()
country_list.sort()
letter_countries_dict = {k: list(g) for k,g in groupby(country_list, key=lambda x:x[0]) }
Which results in the required dictionary of initial : [list of countries]
{'Q': ['Qatar'], 'S': ['Singapore'], 'B': ['Bermuda'], 'L': ['Luxembourg', 'Liechtenstein'], 'J': ['Jersey']}
A complete example is as follows:
from itertools import groupby
country_per_capita_dict = {}
letter_countries_dict = {}
keywordFile = [line.strip() for line in open('raw.txt' ,'r').readlines()]
country_per_capita_dict = {entry.split(':')[1] : entry.split(':')[2] for entry in keywordFile}
country_list = country_per_capita_dict.keys()
country_list.sort()
letter_countries_dict = {k: list(g) for k,g in groupby(country_list, key=lambda x:x[0]) }
print (country_per_capita_dict)
print (letter_countries_dict)
Explanation:
The line:
country_per_capita_dict = {entry.split(':')[1] : entry.split(':')[2] for entry in keywordFile}
loops over the following list
['1:Qatar:98900', '2:Liechtenstein:89400', '3:Luxembourg:80600', '4:Bermuda:69900', '5:Singapore:59700', '6:Jersey:57000', '7:Libya:1000', '8:Sri Lanka:5000'] and splits each entry in the list by :
It then takes the value at index [1] and [2] which are the country names and the per capita value and makes them into a dictionary.
country_list = country_per_capita_dict.keys()
country_list.sort()
This line, extracts the name of all the countries from the dictionary created before into a list and sorts them alphabetically for groupby to work correctly.
letter_countries_dict = {k: list(g) for k,g in groupby(country_list, key=lambda x:x[0]) }
This lambda expression takes the input as the list of countries and groups together the names of countries where each x starts with x[0] into list(g).

Updating a dictionary with integer keys

I'm working on a short assignment where I have to read in a .txt file and create a dictionary in which the keys are the number of words in a sentence and the values are the number of sentences of a particular length. I've read in the file and determined the length of each sentence already, but I'm having troubles creating the dictionary.
I've already initialized the dictionary and am trying to update it (within a for loop that iterates over the sentences) using the following code:
for snt in sentences:
words = snt.split(' ')
sDict[len(words)]+=1
It gives me a KeyError on the very first iteration. I'm sure it has to do with my syntax but I'm not sure how else to update an existing entry in the dictionary.

When you initialize the dictionary, it starts out empty. The next thing you do is look up a key so that you can update its value, but that key doesn't exist yet, because the dictionary is empty. The smallest change to your code is probably to use the get dictionary method. Instead of this:
sDict[len(words)]+=1
Use this:
sDict[len(words)] = sDict.get(len(words), 0) + 1
The get method looks up a key, but if the key doesn't exist, you are given a default value. The default default value is None, and you can specify a different default value, which is the second argument, 0 in this case.
The better solution is probably collections.Counter, which handles the common use case of counting occurrences:
import collections
s = map(str.split, sentences)
sDict = collections.Counter(map(len, s))

defaultdicts were invented for this purpose:
from collections import defaultdict
sDict = defaultdict(int)
for snt in sentences:
sDict[len(snt.split())] += 1
If you are restricted to the use of pure dictionaries in the context of your assignment, then you need to test for existence of the key before incrementing its value in order to prevent a KeyError:
sDict = {}
for snt in sentences:
num_words = len(snt.split())
if num_words in sDict:
sDict[num_words] += 1
else:
sDict[num_words] = 1

Add multiple values to dictionary

Here is my code:
for response in responses["result"]:
ids = {}
key = response['_id'].encode('ascii')
print key
for value in response['docs']:
ids[key].append(value)
Traceback:
File "people.py", line 47, in <module>
ids[key].append(value)
KeyError: 'deanna'
I am trying to add multiple values to a key. Throws an error like above

Check out setdefault:
ids.setdefault(key, []).append(value)
It looks to see if key is in ids, and if not, sets that to be an empty list. Then it returns that list for you to inline call append on.
Docs:
http://docs.python.org/2/library/stdtypes.html#dict.setdefault

If I'm reading this correctly your intention is to map the _id of a response to its docs. In that case you can bring down everything you have above to a dict comprehension:
ids = {response['_id'].encode('ascii'): response['docs']
for response in responses['result']}
This also assumes you meant to have id = {} outside of the outermost loop, but I can't see any other reasonable interpretation.
If the above is not correct,
You can use collections.defaultdict
import collections # at top level
#then in your loop:
ids = collections.defaultdict(list) #instead of ids = {}
A dictionary whose default value will be created by calling the init argument, in this case calling list() will produce an empty list which can then be appended to.
To traverse the dictionary you can iterate over it's items()
for key, val in ids.items():
print(key, val)

The reason you're getting a KeyError is this: In the first iteration of your for loop, you look up the key in an empty dictionary. There is no such key, hence the KeyError.
The code you gave will work, if you first insert an empty list into the dictionary under to appropriate key. Then append the values to the list. Like so:
for response in responses["result"]:
ids = {}
key = response['_id'].encode('ascii')
print key
if key not in ids: ## <-- if we haven't seen key yet
ids[key] = [] ## <-- insert an empty list into the dictionary
for value in response['docs']:
ids[key].append(value)
The previous answers are correct. Both defaultdict and dictionary.setdefault are automatic ways of inserting the empty list.

Accessing nested values in nested dictionaries in Python 3.3

I'm writing in Python 3.3.
I have a set of nested dictionaries (shown below) and am trying to search using a key at the lowest level and return each of the values that correspond to the second level.
Patients = {}
Patients['PatA'] = {'c101':'AT', 'c367':'CA', 'c542':'GA'}
Patients['PatB'] = {'c101':'AC', 'c367':'CA', 'c573':'GA'}
Patients['PatC'] = {'c101':'AT', 'c367':'CA', 'c581':'GA'}
I'm trying to use a set of 'for loops' to search pull out the value attached to the c101 key in each Pat* dictionary nested under the main Patients dictionary.
This is what I have so far:
pat = 'PatA'
mutations = Patients[pat]
for Pat in Patients.values(): #iterate over the Pat* dictionaries
for mut in Pat.keys(): #iterate over the keys in the Pat* dictionaries
if mut == 'c101': #when the key in a Pat* dictionary matches 'c101'
print(Pat[mut].values()) #print the value attached to the 'c101' key
I get the following error, suggesting that my for loop returns each value as a string and that this can't then be used as a dictionary key to pull out the value.
Traceback (most recent call last):
File "filename", line 13, in
for mut in Pat.keys():
AttributeError: 'str' object has no attribute 'keys'
I think I'm missing something obvious to do with the dictionaries class, but I can't quite tell what it is. I've had a look through this question, but I don't think its quite what I'm asking.
Any advice would be greatly appreciated.

Patients.keys() gives you the list of keys in Patients dictionary (['PatA', 'PatC', 'PatB']) not the list of values hence the error. You can use dict.items to iterate over key: value pairs like this:
for patient, mutations in Patients.items():
if 'c101' in mutations.keys():
print(mutations['c101'])
To make your code working:
# Replace keys by value
for Pat in Patients.values():
# Iterate over keys from Pat dictionary
for mut in Pat.keys():
if mut == 'c101':
# Take value of Pat dictionary using
# 'c101' as a key
print(Pat['c101'])
If you want you can create list of mutations in simple one-liner:
[mutations['c101'] for p, mutations in Patients.items() if mutations.get('c101')]

Patients = {}
Patients['PatA'] = {'c101':'AT', 'c367':'CA', 'c542':'GA'}
Patients['PatB'] = {'c101':'AC', 'c367':'CA', 'c573':'GA'}
Patients['PatC'] = {'c101':'AT', 'c367':'CA', 'c581':'GA'}
for keys,values in Patients.iteritems():
# print keys,values
for keys1,values1 in values.iteritems():
if keys1 is 'c101':
print keys1,values1
#print values1

Converting dict values into a set while preserving the dict

I have a dict like this:
(100002: 'APPLE', 100004: 'BANANA', 100005: 'CARROT')
I am trying to make my dict have ints for the keys (as it does now) but have sets for the values (rather than strings as it is now.) My goal is to be able to read from a .csv file with one column for the key (an int which is the item id number) and then columns for things like size, shape, and color. I want to add this information into my dict so that only the information for keys already in dict are added.
My goal dict might look like this:
(100002: set(['APPLE','MEDIUM','ROUND','RED']), 100004: set(['Banana','MEDIUM','LONG','YELLOW']), 100005: set(['CARROT','MEDIUM','LONG','ORANGE'])
Starting with my dict of just key + string for item name, I tried code like this to read the extra information in from a .csv file:
infile = open('FileWithTheData.csv', 'r')
for line in infile.readlines():
spl_line = line.split(',')
if int(spl_line[0]) in MyDict.keys():
MyDict[int(spl_line[0])].update(spl_line[1:])
Unfortunately this errors out saying AttributeError: 'str' object has no attribute 'update'. My attempts to change my dictionary's values into sets so that I can then .update them have yielded things like this: (100002: set(['A','P','L','E']), 100004: set(['B','A','N']), 100005: set(['C','A','R','O','T']))
I want to convert the values to a set so that the string that is currently the value will be the first string in the set rather than breaking up the string into letters and making a set of those letters.
I also tried making the values a set when I create the dict by zipping two lists together but it didn't seem to make any difference. Something like this
MyDict = dict(zip(listofkeys, set(listofnames)))
still makes the whole listofnames list into a set but it doesn't achieve my goal of making each value in MyDict into a set with the corresponding string from listofnames as the first string in the set.
How can I make the values in MyDict into a set so that I can add additional strings to that set without turning the string that is currently the value in the dict into a set of individual letters?
EDIT:
I currently make MyDict by using one function to generate a list of item ids (which are the keys) and another function which looks up those item ids to generate a list of corresponding item names (using a two column .csv file as the data source) and then I zip them together.
ANSWER:
Using the suggestions here I came up with this solution. I found that the section that has set()).update can easily be changed to list()).append to yield a list rather than a set (so that the order is preserved.) I also found it easier to update by .csv data input files by adding the column containing names to the FileWithTheData.csv so that I didn't have to mess with making the dict, converting the values to sets, and then adding in more data. My code for this section now looks like this:
MyDict = {}
infile = open('FileWithTheData.csv', 'r')
for line in infile.readlines():
spl_line = line.split(',')
if int(spl_line[0]) in itemidlist: #note that this is the list I was formerly zipping together with a corresponding list of names to make my dict
MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict

Your error is because originally your MyDict variable maps an integer to a string. When you are trying to update it you are treating the value like a set, when it is a string.
You can use a defaultdict for this:
combined_dict = defaultdict(set)
# first add all the values from MyDict
for key, value in MyDict.iteritems():
combined_dict[int(key)].add(value)
# then add the values from the file
infile = open('FileWithTheData.csv', 'r')
for line in infile.readlines():
spl_line = line.split(',')
combined_dict[int(sp_line[0])].update(spl_line[1:])

Your issue is with how you are initializing MyDict, try changing it to the following:
MyDict = dict(zip(listofkeys, [set([name]) for name in listofnames]))
Here is a quick example of the difference:
>>> listofkeys = [100002, 100004, 100005]
>>> listofnames = ['APPLE', 'BANANA', 'CARROT']
>>> dict(zip(listofkeys, set(listofnames)))
{100002: 'CARROT', 100004: 'APPLE', 100005: 'BANANA'}
>>> dict(zip(listofkeys, [set([name]) for name in listofnames]))
{100002: set(['APPLE']), 100004: set(['BANANA']), 100005: set(['CARROT'])}
set(listofnames) is just going to turn your list into a set, and the only effect that might have is to reorder the values as seen above. You actually want to take each string value in your list, and convert it to a one-element set, which is what the list comprehension does.
After you make this change, your current code should work fine, although you can just do the contains check directly on the dictionary instead of explicitly checking the keys (key in MyDict is the same as key in MyDict.keys()).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Adding items to sets in a dictionary - python

Related

How to create a dictionary whose values are sets?

Updating a dictionary with integer keys

Add multiple values to dictionary

Accessing nested values in nested dictionaries in Python 3.3

Converting dict values into a set while preserving the dict

Categories

Resources