Building Nested dictionary in Python reading in line by line from file - python

The way I go about nested dictionary is this:
dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp
dicty == {"A" : {"a" : 1, "b" : 2}}
The problem starts when I try to implement this on a big file, reading in line by line.
This is printing the content per line in a list:
['proA', 'macbook', '0.666667']
['proA', 'smart', '0.666667']
['proA', 'ssd', '0.666667']
['FrontPage', 'frontpage', '0.710145']
['FrontPage', 'troubleshooting', '0.971014']
I would like to end up with a nested dictionary (ignore decimals):
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
As I am reading in line by line, I have to check whether or not the first word is still found in the file (they are all grouped), before I add it as a complete dict to the higher dict.
This is my implementation:
def doubleDict(filename):
    dicty = dict()
    with open(filename, "r") as f:
        row = 0
        tmp = dict()
        oldword = ""
        for line in f:
            values = line.rstrip().split(" ")
            print(values)
            if oldword == values[0]:
                tmp[values[1]] = values[2]
            else:
                if oldword is not "":
                    dicty[oldword] = tmp
                tmp.clear()
                oldword = values[0]
                tmp[values[1]] = values[2]
            row += 1
            if row % 25 == 0:
                print(dicty)
                break #print(row)
    return(dicty)
I would actually like to have this in pandas, but for now I would be happy if this would work as a dict. For some reason after reading in just the first 5 lines, I end up with:
{'proA': {'frontpage': '0.710145', 'troubleshooting': '0.971014'}},
which is clearly incorrect. What is wrong?

Use a collections.defaultdict() object to auto-instantiate nested dictionaries:
from collections import defaultdict
def doubleDict(filename):
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        # start counting at 1 so the check below fires on line 25, not line 1
        for i, line in enumerate(f, 1):
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                print(dicty)
                break #print(row)
    return dicty
I used enumerate() to generate the line count here, starting at 1 so that the check fires after 25 lines rather than on the very first one; this is much simpler than keeping a separate counter going.
Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary, and retrieve it again by using values[0]; there is no need to keep the temp reference around:
>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
All the defaultdict then does is keep us from having to test if we already created that nested dictionary. Instead of:
if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value
we simply omit the if test as defaultdict will create a new dictionary for us if the key was not yet present.
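As a quick check, here is a minimal sketch of the same idea run against the five sample rows from the question. The in-memory sample_lines list and the function name build_nested are assumptions made for illustration; in the real code the lines come from the file.

```python
from collections import defaultdict

# Sample rows copied from the question; a stand-in for the real file.
sample_lines = [
    "proA macbook 0.666667",
    "proA smart 0.666667",
    "proA ssd 0.666667",
    "FrontPage frontpage 0.710145",
    "FrontPage troubleshooting 0.971014",
]

def build_nested(lines):
    """Group 'outer inner value' rows into a dict of dicts."""
    nested = defaultdict(dict)
    for line in lines:
        outer, inner, value = line.split()
        # defaultdict creates nested[outer] on first access
        nested[outer][inner] = value
    # Convert to a plain dict so that missing keys raise KeyError again
    return dict(nested)

result = build_nested(sample_lines)
print(result)
```

Converting back to a plain dict at the end is optional; it only stops later lookups of unknown keys from silently creating empty entries.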

While this isn't the ideal way to do things, you're pretty close to making it work.
Your main problem is that you're reusing the same tmp dictionary. After you insert it into dicty under the first key, you then clear it and start filling it with the new values. Replace tmp.clear() with tmp = {} to fix that, so you have a different dictionary for each key, instead of the same one for all keys.
Your second problem is that you're never storing the last tmp value in the dictionary when you reach the end, so add another dicty[oldword] = tmp after the for loop.
Your third problem is that you're checking if oldword is not "":. That may be true even if it's an empty string, because you're comparing identity, not equality. Just change that to if oldword:. (This one, you'll usually get away with, because small strings are usually interned and will usually share identity… but you shouldn't count on that.)
If you fix all three of those, you get this:
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
I'm not sure how to turn this into the format you claim to want, because that format isn't even a valid dictionary. But hopefully this gets you close.
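The first problem (reusing one tmp dictionary) can be reproduced in isolation. This is only a small sketch with values borrowed from the question; the name fixed is made up for the comparison.

```python
# Variant 1: clear() mutates the dictionary that dicty already refers to.
dicty = {}
tmp = {"macbook": "0.666667"}
dicty["proA"] = tmp          # stores a reference to tmp, not a copy
tmp.clear()                  # empties the one shared dictionary
tmp["frontpage"] = "0.710145"
print(dicty)                 # 'proA' now shows the FrontPage data

# Variant 2: rebinding tmp leaves the stored dictionary untouched.
fixed = {}
tmp = {"macbook": "0.666667"}
fixed["proA"] = tmp
tmp = {}                     # a brand-new dictionary for the next key
tmp["frontpage"] = "0.710145"
print(fixed)                 # 'proA' still holds its own data
```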
There are two simpler ways to do it:
Group the values with, e.g., itertools.groupby, then transform each group into a dict and insert it all in one step. This, like your existing code, requires that the input already be batched by values[0].
Use the dictionary as a dictionary. You can look up each key as it comes in and add to the value if found, create a new one if not. A defaultdict or the setdefault method will make this concise, but even if you don't know about those, it's pretty simple to write it out explicitly, and it'll still be less verbose than what you have now.
The second version is already explained very nicely in Martijn Pieters's answer.
The first can be written like this:
import itertools
import operator
def doubleDict(filename):
    with open(filename, "r") as f:
        rows = (line.rstrip().split(" ") for line in f)
        return {k: {values[1]: values[2] for values in g}
                for k, g in itertools.groupby(rows, key=operator.itemgetter(0))}
Of course that doesn't print out the dict so far after every 25 rows, but that's easy to add by turning the comprehension into an explicit loop (and ideally using enumerate instead of keeping an explicit row counter).
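One possible shape for that explicit loop is sketched below. It is not the only way to write it: the function name double_dict_grouped, the report_every parameter, and the in-memory sample_lines are assumptions for illustration, and the function takes any iterable of lines rather than a filename so it can be tried without a file.

```python
import itertools

def double_dict_grouped(lines, report_every=25):
    """Build the nested dict group by group, printing progress as it goes."""
    rows = (line.split() for line in lines)
    numbered = enumerate(rows, start=1)          # (row_number, [outer, inner, value])
    result = {}
    # Group consecutive rows that share the same first field
    for key, group in itertools.groupby(numbered, key=lambda pair: pair[1][0]):
        inner = result.setdefault(key, {})
        for row_number, (_, name, value) in group:
            inner[name] = value
            if row_number % report_every == 0:
                print(row_number, result)        # progress report
    return result

sample_lines = [
    "proA macbook 0.666667",
    "proA smart 0.666667",
    "proA ssd 0.666667",
    "FrontPage frontpage 0.710145",
    "FrontPage troubleshooting 0.971014",
]
nested = double_dict_grouped(sample_lines, report_every=2)
```

Using setdefault for the inner dictionary means that a key which shows up again later in the input extends its existing entry instead of replacing it.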

Related

Multiple dictionary list of values assignment with a single for loop for multiple keys

I want to create a dictionary with a list of values for multiple keys with a single for loop in Python3. For me, the time execution and memory footprint are of utmost importance since the file which my Python3 script is reading is rather long.
I have already tried the following simple script:
p_avg = []
p_y = []
m_avg = []
m_y = []
res_dict = {}
with open('/home/user/test', 'r') as f:
    for line in f:
        p_avg.append(float(line.split(" ")[5].split(":")[1]))
        p_y.append(float(line.split(" ")[6].split(":")[1]))
        m_avg.append(float(line.split(" ")[1].split(":")[1]))
        m_y.append(float(line.split(" ")[2].split(":")[1]))
res_dict['p_avg'] = p_avg
res_dict['p_y'] = p_y
res_dict['m_avg'] = m_avg
res_dict['m_y'] = m_y
print(res_dict)
The format of my home/user/test file is:
n:1 m_avg:7588.39 m_y:11289.73 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64
n:2 m_avg:7587.60 m_y:11288.54 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64
n:3 m_avg:7598.56 m_y:11304.50 m_u:148.01 m_v:225.33 p_avg:9.32 p_y:7.60 p_u:26.43 p_v:24.60
.
.
.
The Python script shown above works but first it is too long and repetitive, second, I am not sure how efficient it is. I was eventually thinking to create the same with list-comprehensions. Something like that:
(res_dict['p_avg'], res_dict['p_y']) = [(float(line.split(" ")[5].split(":")[1]), float(line.split(" ")[6].split(":")[1])) for line in f]
But for all four dictionary keys. Do you think that using list comprehension could reduce the used memory footprint of the script and the speed of execution? What should be the right syntax for the list-comprehension?
[EDIT] I have changed the dict -> res_dict as it was mentioned that it is not a good practice, I have also fixed a typo, where the p_y wasn't pointing to the right value and added a print statement to print the resulting dictionary as mentioned by the other users.
You can make use of defaultdict. There is no need to split the line each time, and to make it more readable you can use a lambda to extract the fields for each item.
from collections import defaultdict
res = defaultdict(list)
# take the part after the colon and convert it to a float, as in the question
extract = lambda x: float(x.split(':')[1])
with open('/home/user/test', 'r') as f:
    for line in f:
        items = line.split()
        res['p_avg'].append(extract(items[5]))
        res['p_y'].append(extract(items[6]))
        res['m_avg'].append(extract(items[1]))
        res['m_y'].append(extract(items[2]))
You can initialize your dict to contain the string/list pairs, and then append directly as you iterate through every line. Also, you don't want to keep calling split() on line on each iteration. Rather, just call once and save to a local variable and index from this variable.
# Initialize dict to contain string key and list value pairs
dictionary = {'p_avg':[],
              'p_y':[],
              'm_avg':[],
              'm_y':[]
              }
with open('/home/user/test', 'r') as f:
    for line in f:
        items = line.split() # store line.split() so you don't split multiple times per line
        dictionary['p_avg'].append(float(items[5].split(':')[1]))
        dictionary['p_y'].append(float(items[6].split(':')[1])) # I think you meant index 6 here
        dictionary['m_avg'].append(float(items[1].split(':')[1]))
        dictionary['m_y'].append(float(items[2].split(':')[1]))
You can just pre-define dict attributes:
d = {
'p_avg': [],
'p_y': [],
'm_avg': [],
'm_y': []
}
and then append directly to them:
with open('/home/user/test', 'r') as f:
    for line in f:
        splitted_line = line.split(" ")
        d['p_avg'].append(float(splitted_line[5].split(":")[1]))
        d['p_y'].append(float(splitted_line[6].split(":")[1]))
        d['m_avg'].append(float(splitted_line[1].split(":")[1]))
        d['m_y'].append(float(splitted_line[2].split(":")[1]))
P.S. Never use the names of built-ins, such as dict or list, as variable names. Shadowing them can cause many confusing errors!
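Since every field in the file already has the form name:value, another option is to look fields up by name instead of by position. The following is only a sketch under that assumption; collect_columns, WANTED and the in-memory sample list are hypothetical names, and the two sample lines are copied from the question.

```python
from collections import defaultdict

WANTED = ("p_avg", "p_y", "m_avg", "m_y")

def collect_columns(lines, wanted=WANTED):
    """Return {name: [float, ...]} for the requested field names."""
    columns = defaultdict(list)
    for line in lines:
        # "n:1 m_avg:7588.39 ..." -> {"n": "1", "m_avg": "7588.39", ...}
        fields = dict(item.split(":", 1) for item in line.split())
        for name in wanted:
            columns[name].append(float(fields[name]))
    return dict(columns)

sample = [
    "n:1 m_avg:7588.39 m_y:11289.73 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64",
    "n:2 m_avg:7587.60 m_y:11288.54 m_u:147.92 m_v:223.53 p_avg:9.33 p_y:7.60 p_u:26.43 p_v:24.64",
]
res_dict = collect_columns(sample)
print(res_dict)
```

This splits each line once and keeps working if the column order changes, at the cost of building a small temporary dictionary per line.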

How to create a dictionary whose values are sets?

I'm working on an exercise that requires me to build two dictionaries, one whose keys are country names, and the values are the GDP. This part works fine.
The second dictionary is where I'm lost, as the keys are supposed to be the letters A‐Z and the values are sets of country names. I tried using a for loop, which I've commented on below, where the issue lies.
If the user enters a string with only one letter (like A), the program should print all the countries that begin with that letter. When you run the program, however, it only prints out one country for each letter.
The text file contains 228 lines. ie:
1:Qatar:98900
2:Liechtenstein:89400
3:Luxembourg:80600
4:Bermuda:69900
5:Singapore:59700
6:Jersey:57000
etc.
And here's my code.
initials = []
countries=[]
incomes=[]
dictionary={}
dictionary_2={}
keywordFile = open("raw.txt", "r")
for line in keywordFile:
    line = line.upper()
    line = line.strip("\n")
    line = line.split(":")
    initials.append(line[1][0]) # first letter of second element
    countries.append(line[1])
    incomes.append(line[2])
for i in range(0,len(countries)):
    dictionary[countries[i]] = incomes[i]
This for loop should spit out 248 values (one for each country), where the key is the initial and the value is the country name. However, it only spits out 26 values (one country for each letter in the alphabet):
for i in range(0,len(countries)):
    dictionary_2[initials[i]] = countries[i]
print(dictionary_2)
while True:
    inputS = str(input('Enter an initial or a country name.'))
    if inputS in dictionary:
        value = dictionary.get(inputS, "")
        print("The per capita income of {} is {}.".format((inputS.title()), value ))
    elif inputS in dictionary_2:
        value = dictionary_2.get(inputS)
        print("The countries that begin with the letter {} are: {}.".format(inputS, (value.title())))
    elif inputS.lower() in "quit":
        break
    else:
        print("Does not exit.")
print("End of session.")
I'd appreciate any input leading me in the right direction.
Use defaultdict to make sure each value of your initials dict is a set, and then use the add method. If you just use =, you'll overwrite the initial key's value each time. defaultdict is an easier way of writing an expression like:
if initial in dict:
    dict[initial].add(country)
else:
    dict[initial] = {country}
See the full working example below, and also note that I'm using enumerate instead of range(0,len(countries)), which I'd also recommend:
#!/usr/bin/env python3
from collections import defaultdict
initials, countries, incomes = [],[],[]
dict1 = {}
dict2 = defaultdict(set)
keywordFile = """
1:Qatar:98900
2:Liechtenstein:89400
3:Luxembourg:80600
4:Bermuda:69900
5:Singapore:59700
6:Jersey:57000
""".strip().split("\n")
for line in keywordFile:
    line = line.upper().strip("\n").split(":")
    initials.append(line[1][0])
    countries.append(line[1])
    incomes.append(line[2])
for i,country in enumerate(countries):
    dict1[country] = incomes[i]
    dict2[initials[i]].add(country)
print(dict2["L"])
Result:
{'LUXEMBOURG', 'LIECHTENSTEIN'}
see: https://docs.python.org/3/library/collections.html#collections.defaultdict
The values for dictionary2 should be such that they can contain a list of countries. One option is to use a list as the values in your dictionary. In your code, you are overwriting the values for each key whenever a new country with the same initial is to be added as the value.
Moreover, you can use the setdefault method of the dictionary type. This code:
dictionary2 = {}
for country in countries:
dictionary2.setdefault(country[0], []).append(country)
should be enough to create the second dictionary elegantly.
setdefault either returns the value for the key (in this case the key is the first letter of the country name) if it already exists, or inserts a new key (again, the first letter of the country) into the dictionary with a value that is an empty list [].
Edit:
If you want your values to be sets (for faster lookup/membership tests), you can use the following lines:
dictionary2 = {}
for country in countries:
dictionary2.setdefault(country[0], set()).add(country)
The keys in Python dict objects are unique. There can only ever be one 'L' key in a single dict. What happens in your code is that first the key/value pair 'L':'Liechtenstein' is inserted into dictionary_2. However, in a subsequent iteration of the for loop, 'L':'Liechtenstein' is overwritten by 'L':'Luxembourg'. This kind of overwriting is sometimes referred to as "clobbering".
Fix
One way to get the result that you seem to be after would be to rewrite that for loop:
for i in range(0,len(countries)):
    dictionary_2[initials[i]] = dictionary_2.get(initials[i], set()) | {countries[i]}
print(dictionary_2)
Also, you have to rewrite the related elif statement beneath that:
    elif inputS in dictionary_2:
        titles = ', '.join([v.title() for v in dictionary_2[inputS]])
        print("The countries that begin with the letter {} are: {}.".format(inputS, titles))
Explanation
Here's a complete explanation of the dictionary_2[initials[i]] = dictionary_2.get(initials[i], set()) | {countries[i]} line above:
dictionary_2.get(initials[i], set())
If initials[i] is a key in dictionary_2, this will return the associated value. If initials[i] is not in the dictionary, it will return the empty set set() instead.
{countries[i]}
This creates a new set with a single member in it, countries[i].
dictionary_2.get(initials[i], set()) | {countries[i]}
The | operator adds all of the members of two sets together and returns the result.
dictionary_2[initials[i]] = ...
The right hand side of the line either creates a new set, or adds to an existing one. This bit of code assigns that newly created/expanded set back to dictionary_2.
Notes
The above code sets the values of dictionary_2 as sets. If you want to use list values, use this version of the for loop instead:
for i in range(0,len(countries)):
    dictionary_2[initials[i]] = dictionary_2.get(initials[i], []) + [countries[i]]
print(dictionary_2)
You're very close to what you're looking for. You could populate your dictionaries while looping over the contents of the file raw.txt that you're reading. You can also read the contents of the file first and then perform the necessary operations to populate the dictionaries. You could achieve this with concise one-liners in Python using dict comprehensions and groupby. Here's an example:
country_per_capita_dict = {}
letter_countries_dict = {}
keywordFile = [line.strip() for line in open('raw.txt' ,'r').readlines()]
You now have a list of all lines in the keywordFile as follows:
['1:Qatar:98900', '2:Liechtenstein:89400', '3:Luxembourg:80600', '4:Bermuda:69900', '5:Singapore:59700', '6:Jersey:57000', '7:Libya:1000', '8:Sri Lanka:5000']
As you loop over the items, you can split(':') and use the [1] and [2] index values as required.
You could use dictionary comprehension as follows:
country_per_capita_dict = {entry.split(':')[1] : entry.split(':')[2] for entry in keywordFile}
Which results in:
{'Qatar': '98900', 'Liechtenstein': '89400', 'Luxembourg': '80600', 'Bermuda': '69900', 'Singapore': '59700', 'Jersey': '57000', 'Libya': '1000', 'Sri Lanka': '5000'}
Similarly using groupby from itertools you can obtain:
from itertools import groupby
# sorted() works on Python 3, where dict.keys() returns a view without .sort()
country_list = sorted(country_per_capita_dict)
letter_countries_dict = {k: list(g) for k,g in groupby(country_list, key=lambda x:x[0]) }
Which results in the required dictionary of initial : [list of countries]
{'B': ['Bermuda'], 'J': ['Jersey'], 'L': ['Libya', 'Liechtenstein', 'Luxembourg'], 'Q': ['Qatar'], 'S': ['Singapore', 'Sri Lanka']}
A complete example is as follows:
from itertools import groupby
country_per_capita_dict = {}
letter_countries_dict = {}
keywordFile = [line.strip() for line in open('raw.txt' ,'r').readlines()]
country_per_capita_dict = {entry.split(':')[1] : entry.split(':')[2] for entry in keywordFile}
# sorted() works on Python 3, where dict.keys() returns a view without .sort()
country_list = sorted(country_per_capita_dict)
letter_countries_dict = {k: list(g) for k,g in groupby(country_list, key=lambda x:x[0]) }
print (country_per_capita_dict)
print (letter_countries_dict)
Explanation:
The line:
country_per_capita_dict = {entry.split(':')[1] : entry.split(':')[2] for entry in keywordFile}
loops over the following list
['1:Qatar:98900', '2:Liechtenstein:89400', '3:Luxembourg:80600', '4:Bermuda:69900', '5:Singapore:59700', '6:Jersey:57000', '7:Libya:1000', '8:Sri Lanka:5000'] and splits each entry in the list by :
It then takes the value at index [1] and [2] which are the country names and the per capita value and makes them into a dictionary.
country_list = sorted(country_per_capita_dict)
This line extracts the names of all the countries from the dictionary created before into a list and sorts them alphabetically, which groupby needs in order to work correctly.
letter_countries_dict = {k: list(g) for k,g in groupby(country_list, key=lambda x:x[0]) }
Here groupby walks the sorted list and groups consecutive country names that share the same first letter (the key, x[0]); list(g) turns each group into a list.
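To see why the sort matters, here is a small sketch that runs groupby on the same country names with and without sorting. The variable names unsorted_groups and sorted_groups are made up for the comparison.

```python
from itertools import groupby
from operator import itemgetter

countries = ["Qatar", "Liechtenstein", "Luxembourg", "Bermuda",
             "Singapore", "Jersey", "Libya", "Sri Lanka"]

first_letter = itemgetter(0)   # same role as lambda x: x[0]

# Without sorting, groupby starts a new group every time the key changes,
# so a later 'L' group replaces the earlier one in the dict.
unsorted_groups = {k: list(g) for k, g in groupby(countries, key=first_letter)}

# With sorting, all names sharing a first letter are adjacent.
sorted_groups = {k: list(g) for k, g in groupby(sorted(countries), key=first_letter)}

print(unsorted_groups["L"])
print(sorted_groups["L"])
```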

How can I combine separate dictionary outputs from a function in one dictionary?

For our python project we have to solve multiple questions. We are however stuck at this one:
"Write a function that, given a FASTA file name, returns a dictionary with the sequence IDs as keys, and a tuple as value. The value denotes the minimum and maximum molecular weight for the sequence (sequences can be ambiguous)."
import collections
from Bio import Seq
from itertools import product
def ListMW(file_name):
    seq_records = SeqIO.parse(file_name, 'fasta',alphabet=generic_dna)
    for record in seq_records:
        dictionary = Seq.IUPAC.IUPACData.ambiguous_dna_values
        result = []
        for i in product(*[dictionary[j] for j in record]):
            result.append("".join(i))
        molw = []
        for sequence in result:
            molw.append(SeqUtils.molecular_weight(sequence))
        tuple= (min(molw),max(molw))
        if min(molw)==max(molw):
            dict={record.id:molw}
        else:
            dict={record.id:(min(molw), max(molw))}
        print(dict)
Using this code we manage to get this output:
{'seq_7009': (6236.9764, 6367.049999999999)}
{'seq_418': (3716.3642000000004, 3796.4124000000006)}
{'seq_9143_unamb': [4631.958999999999]}
{'seq_2888': (5219.3359, 5365.4089)}
{'seq_1101': (4287.7417, 4422.8254)}
{'seq_107': (5825.695099999999, 5972.8073)}
{'seq_6946': (5179.3118, 5364.420900000001)}
{'seq_6162': (5531.503199999999, 5645.577399999999)}
{'seq_504': (4556.920899999999, 4631.959)}
{'seq_3535': (3396.1715999999997, 3446.1969999999997)}
{'seq_4077': (4551.9108, 4754.0073)}
{'seq_1626_unamb': [3724.3894999999998]}
As you can see, this is not one dictionary but multiple dictionaries under each other. So is there any way we can change our code or type an extra command to get it in this format:
{'seq_7009': (6236.9764, 6367.049999999999),
'seq_418': (3716.3642000000004, 3796.4124000000006),
'seq_9143_unamb': (4631.958999999999),
'seq_2888': (5219.3359, 5365.4089),
'seq_1101': (4287.7417, 4422.8254),
'seq_107': (5825.695099999999, 5972.8073),
'seq_6946': (5179.3118, 5364.420900000001),
'seq_6162': (5531.503199999999, 5645.577399999999),
'seq_504': (4556.920899999999, 4631.959),
'seq_3535': (3396.1715999999997, 3446.1969999999997),
'seq_4077': (4551.9108, 4754.0073),
'seq_1626_unamb': (3724.3894999999998)}
Or in some way manage to make clear that it should use the seq_ID as the key and the molecular weight as the value for one dictionary?
Create a dictionary right before your for loop, then update it inside the loop, like this:
import collections
from Bio import Seq
from itertools import product
def ListMW(file_name):
    seq_records = SeqIO.parse(file_name, 'fasta',alphabet=generic_dna)
    retDict = {}
    for record in seq_records:
        dictionary = Seq.IUPAC.IUPACData.ambiguous_dna_values
        result = []
        for i in product(*[dictionary[j] for j in record]):
            result.append("".join(i))
        molw = []
        for sequence in result:
            molw.append(SeqUtils.molecular_weight(sequence))
        tuple= (min(molw),max(molw))
        if min(molw)==max(molw):
            retDict[record.id] = molw
        else:
            retDict[record.id] = (min(molw), max(molw))
    # instead of printing inside the loop, return the combined dictionary
    # and print it at the end of your function / script
    return retDict
Right now, you're creating a new dict on each iteration of your loop and printing it, so it is normal for your code to print lots of separate dicts.
You're creating a dictionary with 1 entry at each iteration.
You want to:
define a dict variable (better use dct to avoid reusing built-in type name) before your loop
rewrite the assignment to dict in the loop
So before the loop:
dct = {}
and in the loop (instead of your if + dict = code), in a ternary expression, with min & max computed only once:
minval = min(molw)
maxval = max(molw)
dct[record.id] = molw if minval == maxval else (minval,maxval)
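Stripped of the Biopython parts, the pattern is just "create one dictionary, fill it in the loop, return it". The sketch below shows only that pattern; min_max_by_id is a hypothetical helper, the IDs and weights are made-up placeholder numbers rather than real molecular weights, and it always stores a (min, max) tuple, which is one reading of the assignment text.

```python
def min_max_by_id(weights_by_id):
    """Collapse {id: [weights, ...]} into one {id: (min, max)} dictionary."""
    summary = {}                       # created once, before the loop
    for seq_id, weights in weights_by_id.items():
        lowest, highest = min(weights), max(weights)   # computed only once
        summary[seq_id] = (lowest, highest)            # add an entry, don't rebind
    return summary

# Placeholder input, not real data
example_weights = {
    "seq_a": [10.0, 12.5, 11.0],
    "seq_b_unamb": [7.25],
}
combined = min_max_by_id(example_weights)
print(combined)
```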

Is there a special value that doesn't insert a key in a dictionary

Is there a way of assigning a special key to a dictionary that actually does nothing?
I want to do something like:
mydict = {}
key, value = 'foo', 'bar'
mydict[key] = value # now my dict has {'foo': 'bar'}
Now here I want some "special" value of key such that when I run:
mydict[key] = value
It doesn't actually do anything, so mydict is still {'foo': 'bar'} (no extra keys or values added)
I tried using:
d[None] = None # It actually adds {None: None} to the dict
d[] = [] # Invalid syntax
Why I need this:
Well it's basically to handle an initial case.
I have a file which is actually a FASTA format:
>id_3362
TGTCAGTGTTCCCCGTGGCCCTGCGGTTGGAATTGCAGCGGGTCGCTTTAGTTCTGGCAT
ATATTTTGACGGTGCCGGCCGGCGATACTGACGTGTGAGGACTTGAATTTGTACCAGCGC
AACACTTCCAAAGCCTGGACTAGGTTGT
>id_4743
CGGGGGATCTAATGTGGCTGCCACGGGTTGAAAAATGG
>id_5443
ATATTTTGACGGTGCCGGCCGGCGATACTGACGTGTGAGGACTTGAATTTGTACCAGCGC
AACACTTCCAAAGCCTGGACTAGGTTGT
My approach is to read line by line, concatenating the lines into a sequence until the next key is found (line starting with >).
Then I save the key (id) with the associated value (sequence) in a dictionary, update the key and start accumulating the next sequence.
Of course I can have a dedicated code (repeated) that handles the first case (which I think it's not a clean approach) or I can have an if inside the loop that reads each line (which will execute every time)
So the cleanest approach would be every time an id is found, save the previous id with the accumulated seq to the dictionary, but to handle the first line I need some special value for the key.
Here's my code:
def read_fasta(filename):
    mydict = {}
    id = None # this has to be the special character I'm looking for
    seq = ''
    with open(filename) as f:
        for line in f:
            if line[0] == '>':
                mydict[id] = seq # save current id and seq
                id = line[1:].rstrip('\n') # update id
                seq = '' # clean seq
            else:
                seq += line.rstrip('\n') # accumulate seq
As you can see, in this code the first line will insert the value {None:''} to the dictionary.
I could of course delete this key at the very end, but I'm wondering if I can have an initial value that doesn't insert anything when executed.
Any suggestions?
You could of course do:
id = None
then:
if id is not None: mydict[id] = seq
If you want to avoid insertion without if testing, you could also use a non-hashable value at start.
id = []
then catch the "unhashable type" exception. That works, although it is ugly, and it adds almost no overhead because the exception is raised only once.
try:
    mydict[id] = seq
except TypeError:
    pass
Aside: if speed is your concern, avoid building the sequence by repeated string concatenation.
seq += line.rstrip('\n')
can be slow, because each concatenation may copy the whole string built so far. Instead:
define seq as a list: seq = []
append lines to seq: seq.append(line.rstrip('\n'))
in the end create the final string: seq = "".join(seq)
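Putting the "if id is not None" test and the list-based accumulation together could look like the sketch below. It is one possible arrangement, not the only one: it takes an open file-like object (an assumption, so it can be tried with io.StringIO), renames id to seq_id to avoid shadowing the built-in, and also stores the last record after the loop ends. The sample records are taken from the question.

```python
import io

def read_fasta(handle):
    """Return {id: sequence} from an iterable of FASTA lines."""
    records = {}
    seq_id = None            # None means "no record started yet"
    chunks = []              # sequence lines of the current record
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if seq_id is not None:           # skip the save on the first header
                records[seq_id] = "".join(chunks)
            seq_id = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:                   # save the final record
        records[seq_id] = "".join(chunks)
    return records

sample = io.StringIO(
    ">id_4743\n"
    "CGGGGGATCTAATGTGGCTGCCACGGGTTGAAAAATGG\n"
    ">id_5443\n"
    "ATATTTTGACGGTGCCGGCCGGCGATACTGACGTGTGAGGACTTGAATTTGTACCAGCGC\n"
    "AACACTTCCAAAGCCTGGACTAGGTTGT\n"
)
records = read_fasta(sample)
print(sorted(records))
```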

TypeError: string indices must be integers, not dict - nested values in dict

I am trying to write a code with multiple keys and value in python dict. For example, my dict will look like this:
d={"p1": {"d1":{"python":14,"Programming":15}}, "p2": {"d1":{"python":14,"Programming":15}} }
Here I have a method that populates the dictionary if a particular value does not exist.
Here is what the code looks like:
Unable to add Value to python dictionary and write to a file
I modified my function to accept 1 parameter. For example my parameter is
How do I update the dictionary in this case?
I tried :
def FunDictr(d1):
    #say the calling function passed 'foo'
    #means d1 = foo
    with io.open("fileo.txt", "r", encoding="utf-8") as filei:
        d = dict()
        for line in filei:
            words = line.strip().split()
            for word in words:
                if word in d:
                    d[d1][word] += 1
                else:
                    d[d1][word] = 1
I am expecting to see {"foo": {"d1":{"python":14}}}
When I write this, I get error:
d[d1][word] += 1
TypeError: string indices must be integers, not dict
Based on the sample you gave, it looks like you are looking for something a little like this.
import io
def FunDictr(base_dictionary,string_key):
    d = base_dictionary  # add to the dictionary that was passed in
    d[string_key] = dict()
    d[string_key]["d1"] = dict()
    with io.open("fileo.txt", "r", encoding="utf-8") as filei:
        for line in filei:
            words = line.strip().split()
            for word in words:
                if word in d[string_key]["d1"]:
                    d[string_key]["d1"][word] += 1
                else:
                    d[string_key]["d1"][word] = 1
    return d
This will accept an initial dictionary, add a new item with the key passed to the function, create a nested dictionary as the value of that key with a key of the string d1, then create ANOTHER dictionary inside of that which holds the word counts.
Out of curiosity, what is the purpose of the extra layer of nesting? Why not shoot for:
{'foo':{'python':14}}
Also, I strongly suggest more descriptive variable names. It makes life easier :)
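For what it's worth, the counting itself can be handed to collections.Counter. The following is only a sketch of that alternative: count_words and its parameters are made-up names, the "d1" level is kept only to match the shape in the question, and the input is an in-memory stand-in for fileo.txt.

```python
import io
from collections import Counter

def count_words(lines, outer_key, inner_key="d1"):
    """Return {outer_key: {inner_key: {word: count}}} for the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())      # adds 1 for every word on the line
    return {outer_key: {inner_key: dict(counts)}}

sample = io.StringIO("python Programming python\npython\n")
nested_counts = count_words(sample, "foo")
print(nested_counts)
```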
