I'm working on a short assignment where I have to read in a .txt file and create a dictionary in which the keys are the number of words in a sentence and the values are the number of sentences of that length. I've already read in the file and determined the length of each sentence, but I'm having trouble creating the dictionary.
I've already initialized the dictionary and am trying to update it (within a for loop that iterates over the sentences) using the following code:
for snt in sentences:
    words = snt.split(' ')
    sDict[len(words)]+=1
It gives me a KeyError on the very first iteration. I'm sure it has to do with my syntax but I'm not sure how else to update an existing entry in the dictionary.
When you initialize the dictionary, it starts out empty. The next thing you do is look up a key so that you can update its value, but that key doesn't exist yet, because the dictionary is empty. The smallest change to your code is probably to use the get dictionary method. Instead of this:
sDict[len(words)]+=1
Use this:
sDict[len(words)] = sDict.get(len(words), 0) + 1
The get method looks up a key, but if the key doesn't exist, it returns a default value instead of raising a KeyError. That default is None unless you pass a second argument, which is 0 in this case.
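As a rough, self-contained sketch of the get-based approach (the sample sentences are made up for illustration; in the assignment they would come from the .txt file):

```python
# Hypothetical sample data standing in for the sentences read from the file.
sentences = ["the cat sat", "hello world", "a b c", "one"]

sDict = {}
for snt in sentences:
    n = len(snt.split(' '))
    # get() returns 0 when n has not been seen yet, so no KeyError is raised.
    sDict[n] = sDict.get(n, 0) + 1

print(sDict)
```

Each sentence length becomes a key the first time it is seen, and later sentences of the same length just bump the stored count.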
The better solution is probably collections.Counter, which handles the common use case of counting occurrences:
import collections
s = map(str.split, sentences)
sDict = collections.Counter(map(len, s))
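To make the Counter version concrete, here is a small sketch with made-up sentences. It uses a generator expression rather than map, which is only a stylistic choice:

```python
import collections

# Hypothetical sample data standing in for the sentences read from the file.
sentences = ["the cat sat", "hello world", "a b c", "one"]

sDict = collections.Counter(len(snt.split()) for snt in sentences)

print(sDict[3])      # number of sentences with three words
print(sDict[10])     # a missing key reads as 0 instead of raising KeyError
print(dict(sDict))   # convert to a plain dict if the assignment requires one
```

Counter is a dict subclass, so it can be passed to anything that expects a dictionary.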
defaultdicts were invented for this purpose:
from collections import defaultdict
sDict = defaultdict(int)
for snt in sentences:
    sDict[len(snt.split())] += 1
If you are restricted to the use of pure dictionaries in the context of your assignment, then you need to test for existence of the key before incrementing its value in order to prevent a KeyError:
sDict = {}
for snt in sentences:
    num_words = len(snt.split())
    if num_words in sDict:
        sDict[num_words] += 1
    else:
        sDict[num_words] = 1
I'm trying to create a running leaderboard in which each person starts with one point and I add to the key if they accomplish something. I'm not certain a dictionary is the best way to do it so recommendations are definitely welcomed.
I tried a list to begin with but a dictionary seemed to better suit my needs as I had lists inside of lists
myDict = {'person1' : 1 , 'person2' : 1 , 'person3' : 1}
If person1 were to do something, I'd like their value to change to 2. I need to increment the values, not assign a specific value. I will also keep adding entries to the dict, and their default value needs to be 1.
edit: Chris had a super helpful suggestion to use collections.defaultdict, so that accessing a key that isn't in the dict adds it instead of raising a KeyError
A value can be added, changed, or reassigned in a Python dictionary simply by accessing it through its key:
myDict[key] = value
In your case:
myDict["person1"] = 2 # Reassigning or changing
myDict["person1"] += 1 # Incrementing
If the key doesn't exist, incrementing will be a problem. In that scenario, you need to check if the key is present or not.
if "person5" in myDict:  # membership test; myDict["person5"] would raise KeyError here
    myDict["person5"] += 1
else:
    myDict["person5"] = 1
Reference https://docs.python.org/3/tutorial/datastructures.html#dictionaries
Unless you want to do something like sorting players by score at the end, a dictionary seems like a good option. (You can still sort, but it takes a workaround, since a dictionary is indexed only by its keys.)
Either way, you can update the scores like this:
myDict = {}
person = '<person_name>'
# in case the person did something
if person in myDict:
    myDict[person] += 1
else:
    myDict[person] = 1
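If the scores do need to be sorted at the end, one possible workaround is to sort the (name, score) pairs rather than the dictionary itself. This is only a sketch with made-up scores, and the leaderboard name is an assumption:

```python
# Hypothetical scores for illustration.
myDict = {'person1': 3, 'person2': 1, 'person3': 2}

# items() yields (name, score) pairs; sort them by score, highest first.
leaderboard = sorted(myDict.items(), key=lambda item: item[1], reverse=True)

for name, score in leaderboard:
    print(name, score)
```

The dictionary stays untouched, and the sorted list can be rebuilt whenever the leaderboard needs to be displayed.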
You can update a dictionary as follows:
>>> myDict = {'person1': 1, 'person2': 1}
>>> myDict['person7'] = 2
You may also want to investigate
import collections
myDict = collections.defaultdict(lambda: 1)
myDict['person7'] += 1
as this will automatically initialize unset values to 1 the first time they are read.
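A short sketch of how that behaves in practice (person8 is a made-up entry added for illustration). Note that merely reading a missing key is enough to create it:

```python
import collections

myDict = collections.defaultdict(lambda: 1)

myDict['person7'] += 1   # person7 is created with 1, then incremented
myDict['person8']        # reading a missing key also creates it with 1

print(dict(myDict))
```

If silently creating entries on read is not wanted, test with `in` first or use get, neither of which calls the default factory.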
I have a dataframe which contains the below column:
column_name
CUVITRU 8 gram
CUVITRU 1 grams
I want to replace both gram and grams with gm, so I have created a dictionary:
dict_ = {'gram':'gm','grams':'gm'}
I am able to replace it but it is converting grams to gms. Below is the column after conversion:
column_name
CUVITRU 8 gm
CUVITRU 1 gms
How can I solve this issue?
Below is my code:
dict_ = {'gram':'gm','grams':'gm'}
for key, value in dict_.items():
    my_string = my_string.replace(key,value)
my_string = ' '.join(unique_list(my_string.split()))
def unique_list(l):
    ulist = []
    [ulist.append(x) for x in l if x not in ulist]
    return ulist
This happens because replace finds 'gram' inside 'grams'. One way around it is to use a regular expression instead of a plain string, so that replacement happens only on word boundaries, like (r"\b%s\.... Look at the answer using .sub here, for example: search-and-replace-with-whole-word-only-option
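A minimal sketch of the word-boundary idea, assuming the standard re module and a sample string based on the question. This is not the code from the linked answer:

```python
import re

dict_ = {'gram': 'gm', 'grams': 'gm'}
my_string = 'CUVITRU 8 gram CUVITRU 1 grams'  # sample text for illustration

for key, value in dict_.items():
    # \b matches only at a word boundary, so 'gram' no longer matches inside 'grams'.
    # re.escape guards against keys that contain regex metacharacters.
    my_string = re.sub(r'\b%s\b' % re.escape(key), value, my_string)

print(my_string)
```

With whole-word matching, the order in which the keys are tried no longer matters.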
You don't actually care about the dict; you care about the key/value pairs produced by its items() method, so just store that in the first place. This lets you specify the order of replacements to try regardless of your Python version.
d = [('grams', 'gm'), ('gram', 'gm')]
for key, value in d:
    my_string = my_string.replace(key,value)
You can instead make the replacements in descending order of key length, so that longer keys are tried first:
dict_ = {'gram':'gm','grams':'gm'}
for key in sorted(dict_, key=len, reverse=True):
    my_string = my_string.replace(key, dict_[key])
Put the longer string grams before the shorter one gram like this {'grams':'gm','gram':'gm'}, and it will work.
I’m using a recent Python 3 (3.7.2), which guarantees that items are retrieved in the same order in which they were inserted into the dictionary. In earlier versions that may happen to be the case, but it isn’t guaranteed (and this appears to be the problem).
I have a list of dictionaries that map different IDs to a central ID. I have a document in which these different IDs are associated with terms. I have written a function that builds a dictionary keyed by the central ID looked up from the different IDs in the document. The goFile is the document, with an ID in the first column and a GO term in the second. The mappingList is a list of dictionaries in which each ID in the goFile is mapped to a main ID.
My expected output is a dictionary with a main ID as a key and a set with the go terms associated with it as value.
def parseGO(mappingList, goFile):
    # open the file
    file = open(goFile)
    # this will be the dictionary that this function returns
    # entries will have as a key an Ensembl ID
    # and the value will be a set of GO terms
    GOdict = {}
    GOset = set()
    for line in file:
        splitline = line.split(' ')
        GO_term = splitline[1]
        value_ID = splitline[0]
        for dict in mappingList:
            if value_ID in dict:
                ENSB_term = dict[value_ID]
        #my best try
        for dict in mappingList:
            for key in GOdict.keys():
                if value_ID in dict and key == dict[value_ID]:
                    GOdict[ENSB_term].add(GO_term)
        GOdict[ENSB_term] = GOset
    return GOdict
My problem is that I now have to add, under each central ID in my GOdict, the terms that the document associates with the different IDs. To avoid duplicates I use a set (GOset). How do I do it? All my attempts end up with all the terms mapped to all the main IDs.
Some sample:
mappingList = [{'1234': 'mainID1', '456': 'mainID2'}, {'789': 'mainID2'}]
goFile:
1234 GOTERM1
1234 GOTERM2
456 GOTERM1
456 GOTERM3
789 GOTERM1
expected output:
GOdict = {'mainID1': set(['GOTERM1', 'GOTERM2']), 'mainID2': set(['GOTERM1', 'GOTERM3'])}
First off, you shouldn't use the variable name 'dict', as it shadows the built-in dict class, and will cause you problems at some point.
The following should work for you:
from collections import defaultdict

def parse_go(mapping_list, go_file):
    go_dict = defaultdict(set)
    with open(go_file) as f:  # 'with' makes sure the file gets closed
        for line in f:
            # Feel free to change the split behaviour to work better for you.
            (value_id, go_term) = line.split()
            for map_dict in mapping_list:
                if value_id in map_dict:
                    go_dict[map_dict[value_id]].add(go_term)
    return go_dict
The code is fairly straightforward, but here's a breakdown anyway.
We use a default dictionary instead of a normal dictionary so we can eliminate all that "if ... in" or setdefault() boilerplate.
For each line in the file, we check whether the first item (value_id) is a key in any of the mapping dictionaries and, if so, add the line's second item (go_term) to the set for the main ID that value_id maps to.
EDIT: Request for doing this without defaultdict(). Assuming that go_dict is just a normal dictionary (go_dict = {}), your for loop would look like:
for map_dict in mapping_list:
    if value_id in map_dict:
        esnb_entry = go_dict.setdefault(map_dict[value_id], set())
        esnb_entry.add(go_term)
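As a quick check of the setdefault variant against the sample data from the question, here is a sketch that takes the lines directly instead of opening a file. The function name parse_go_lines and its lines parameter are assumptions made for this illustration:

```python
def parse_go_lines(mapping_list, lines):
    """Map each main ID to the set of GO terms found in lines (hypothetical helper)."""
    go_dict = {}
    for line in lines:
        value_id, go_term = line.split()
        for map_dict in mapping_list:
            if value_id in map_dict:
                # setdefault returns the existing set or stores and returns a new one.
                go_dict.setdefault(map_dict[value_id], set()).add(go_term)
    return go_dict

mapping_list = [{'1234': 'mainID1', '456': 'mainID2'}, {'789': 'mainID2'}]
lines = ['1234 GOTERM1', '1234 GOTERM2', '456 GOTERM1', '456 GOTERM3', '789 GOTERM1']

result = parse_go_lines(mapping_list, lines)
print(result)
```

Because every main ID gets its own set, terms no longer leak between IDs, and the repeated GOTERM1 for mainID2 is stored only once.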
I am trying to write code that uses multiple keys and values in a Python dict. For example, my dict will look like this:
d={"p1": {"d1":{"python":14,"Programming":15}}, "p2": {"d1":{"python":14,"Programming":15}} }
Here I have a method that populates the dictionary if a particular value does not exist.
Here is what the code looks like:
Unable to add Value to python dictionary and write to a file
I modified my function to accept 1 parameter. For example my parameter is
How do I update the dictionary in this case?
I tried :
FunDictr(d1)
#say the calling function passed 'foo'
#means d1 = foo
with io.open("fileo.txt", "r", encoding="utf-8") as filei:
    d = dict()
    for line in filei:
        words = line.strip().split()
        for word in words:
            if word in d:
                d[d1][word] += 1
            else:
                d[d1][word] = 1
I am expecting to see {"foo": {"d1": {"python": 14}}}
When I write this, I get error:
d[d1][word] += 1
TypeError: string indices must be integers, not dict
Based on the sample you gave, it looks like you are looking for something a little like this.
import io

def FunDictr(base_dictionary, string_key):
    d = base_dictionary  # update the dictionary that was passed in
    d[string_key] = dict()
    d[string_key]["d1"] = dict()
    with io.open("fileo.txt", "r", encoding="utf-8") as filei:
        for line in filei:
            words = line.strip().split()
            for word in words:
                if word in d[string_key]["d1"]:
                    d[string_key]["d1"][word] += 1
                else:
                    d[string_key]["d1"][word] = 1
    return d
This will accept an initial dictionary, add a new item with the key passed to the function, create a nested dictionary as the value of that key with a key of the string d1, then create ANOTHER dictionary inside of that which holds the word counts.
Out of curiosity, what is the purpose of the extra layer of nesting? Why not shoot for:
{'foo':{'python':14}}
Also, I strongly suggest more descriptive variable names. It makes life easier :)
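For what it's worth, the flatter {'foo': {'python': 14}} shape could be built along these lines. This is only a sketch: add_word_counts and its lines parameter are made-up names, and it reads from a list of strings instead of fileo.txt so that it runs on its own:

```python
from collections import Counter

def add_word_counts(base_dictionary, string_key, lines):
    """Store a word -> count mapping under string_key (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # update() adds one to the count of every word in the list.
        counts.update(line.strip().split())
    base_dictionary[string_key] = dict(counts)
    return base_dictionary

result = add_word_counts({}, 'foo', ['python is fun', 'python programming'])
print(result)
```

A file object opened with io.open could be passed as lines, since iterating over a file also yields one line at a time.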
from collections import *
ignore = ['the','a','if','in','it','of','or']
ArtofWarCounter = Counter(ArtofWarLIST)
for word in ArtofWarCounter:
    if word in ignore:
        del ArtofWarCounter[word]
ArtofWarCounter is a Counter object containing all the words from the Art of War. I'm trying to have words in ignore deleted from the ArtofWarCounter.
Traceback:
File "<pyshell#10>", line 1, in <module>
for word in ArtofWarCounter:
RuntimeError: dictionary changed size during iteration
Don't loop over all the words of a dict to find an entry; dicts are much better at lookups.
Instead, loop over the ignore list and remove the entries that exist:
ignore = ['the','a','if','in','it','of','or']
for word in ignore:
    if word in ArtofWarCounter:
        del ArtofWarCounter[word]
For minimal code changes, use list, so that the object you are iterating over is decoupled from the Counter:
ignore = ['the','a','if','in','it','of','or']
ArtofWarCounter = Counter(ArtofWarLIST)
for word in list(ArtofWarCounter):
    if word in ignore:
        del ArtofWarCounter[word]
In Python2, you can use ArtofWarCounter.keys() instead of list(ArtofWarCounter), but when it is so simple to write code that is futureproofed, why not do it?
It is a better idea to just not count the items you wish to ignore:
ignore = {'the','a','if','in','it','of','or'}
ArtofWarCounter = Counter(x for x in ArtofWarLIST if x not in ignore)
Note that I made ignore a set, which makes the test x not in ignore much more efficient.
See the following question for why your current method is not working:
Remove items from a list while iterating
Basically you should not add or remove items from a collection while you are looping over it. collections.Counter is a subclass of dict, see the following warning in the documentation for dict.iteritems():
Using iteritems() while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
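The effect is easy to reproduce on a small scale. The sketch below uses a plain dict with made-up counts (a Counter behaves the same way) and then shows one safe alternative, which is to iterate over a snapshot of the keys:

```python
counts = {'the': 3, 'war': 2, 'of': 1}   # made-up data
ignore = {'the', 'of'}

try:
    for word in counts:          # iterating over the dict itself
        if word in ignore:
            del counts[word]     # changes the dict's size mid-iteration
except RuntimeError as exc:
    print('failed:', exc)

# Safe variant: list(counts) copies the keys first, so deleting is fine.
for word in list(counts):
    if word in ignore:
        del counts[word]

print(counts)
```

The first loop is interrupted as soon as the iterator notices the size change, so the second loop is what actually finishes the cleanup.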
Use a counter, traverse the loop backwards (last to first), remove as needed. Loop until zero.
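Assuming this suggestion is about a plain list rather than a Counter, the idea might look like the sketch below, with made-up words and an index variable i acting as the counter:

```python
words = ['the', 'art', 'of', 'war', 'the']   # made-up list
ignore = {'the', 'of'}

i = len(words) - 1           # start at the last index
while i >= 0:
    if words[i] in ignore:
        # Deleting only shifts items after position i, which were already visited.
        del words[i]
    i -= 1

print(words)
```

Walking from the end means a deletion never moves an element that still has to be examined, which is what goes wrong when looping forwards.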