Adding a new key to a nested dictionary in python

I need to add a key with a value that increases by one for every item in the nested dictionary. I have been trying to use the dict['key']='value' syntax but can't get it to work for a nested dictionary. I'm sure it's very simple.
My Dictionary:
mydict={'a':{'result':[{'key1':'value1','key2':'value2'},
{'key1':'value3','key2':'value4'}]}}
This is the code that will add the key to the main part of the dictionary:
for x in range(len(mydict)):
    number = 1+x
    str(number)
    mydict[u'index']=number
print mydict
#out: {u'index':u'1',u'a':{u'result':[...]}}
I want to add the new key and value to the small dictionaries inside the square brackets:
{'a':{'result':[{'key1':'value1',...,'index':'number'}]}}
If I try adding more layers to the last line of the for loop I get a traceback error:
Traceback (most recent call last):
  File "C:\Python27\program.py", line 34, in <module>
    main()
  File "C:\Python27\program.py", line 23, in main
    mydict['a']['result']['index']=number
TypeError: list indices must be integers, not unicode
I've tried various different ways of listing the nested items but no joy. Can anyone help me out here?

The problem is that mydict is not simply a collection of nested dictionaries. It contains a list as well. Breaking up the definition helps clarify the internal structure:
dictlist = [{'key1':'value1','key2':'value2'},
{'key1':'value3','key2':'value4'}]
resultdict = {'result':dictlist}
mydict = {'a':resultdict}
So to access the innermost values, we have to work from the outside in. This:
mydict['a']
returns resultdict. Then this:
mydict['a']['result']
returns dictlist. Then this:
mydict['a']['result'][0]
returns the first item in dictlist. Finally, this:
mydict['a']['result'][0]['key1']
returns 'value1'
So now you just have to amend your for loop to iterate correctly over mydict. There are probably better ways, but here's a first approach:
for inner_dict in mydict['a']['result']: # remember that this returns `dictlist`
    for key in inner_dict:
        do_something(inner_dict, key)
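To connect this back to the question, here is a minimal sketch of adding a running 'index' entry to each inner dictionary. It assumes the index should start at 1 and be stored as a string, as the desired output in the question suggests; the use of enumerate is a convenience chosen for this sketch, not something the answer above prescribes.

```python
mydict = {'a': {'result': [{'key1': 'value1', 'key2': 'value2'},
                           {'key1': 'value3', 'key2': 'value4'}]}}

# mydict['a']['result'] is a list, so iterate over it
# instead of indexing it with a string key.
for number, inner_dict in enumerate(mydict['a']['result'], start=1):
    inner_dict['index'] = str(number)  # assumption: index stored as a string

print(mydict['a']['result'])
```

Because each inner_dict is a reference to a dictionary stored in the list, assigning to it updates mydict in place.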

I'm not fully sure what you're trying to do, but I think itertools.count would be able to help here.
>>> c = itertools.count()
>>> c.next()
0
>>> c.next()
1
>>> c.next()
2
>>> c.next()
3
... and so on.
Using this, you can keep incrementing the value that you want to use in your dicts
Hope this helps
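A hedged sketch of how itertools.count might be applied to the dictionary from the question. The built-in next(counter) is used instead of counter.next() because it works in Python 2.6+ as well as Python 3; starting the counter at 1 is an assumption.

```python
import itertools

mydict = {'a': {'result': [{'key1': 'value1', 'key2': 'value2'},
                           {'key1': 'value3', 'key2': 'value4'}]}}

counter = itertools.count(1)  # assumption: numbering starts at 1

for inner_dict in mydict['a']['result']:
    # Each call to next() hands out the following integer.
    inner_dict['index'] = next(counter)

print([d['index'] for d in mydict['a']['result']])
```

A shared counter like this is mostly useful when the numbering has to continue across several separate loops; for a single loop, enumerate does the same job.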

Related

Comparing items through a tuple in Python

I am given an assignment when I am supposed to define a function that returns the second element of a tuple if the first element of a tuple matches with the argument of a function.
Specifically, let's say that I have a list of student registration numbers that goes by:
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
And I have defined a function that is supposed to take in the argument of reg_num, such as "S12345", and return the name of the student in this case, "John". If the number does not match at all, I need to print "Not found" as a message. In essence, I understand that I need to sort through the larger tuple, and compare the first element [0] of each smaller tuple, then return the [1] entry of each smaller tuple. Here's what I have in mind:
def get_student_name(reg_num, particulars):
    for i in records:
        if reg_num == particulars[::1][0]:
            return particulars[i][1]
        else:
            print("Not found")
I know I'm wrong, but I can't tell why. I'm not well acquainted with how to sort through a tuple. Can anyone offer some advice, especially in syntax? Thank you very much!

When you write for i in particulars, in each iteration i is an item of the collection and not an index. As such you cannot do particulars[i] (and there is no need, as you already have the item). In addition, remove the else clause and print "Not found" only after the loop has finished, so that the message is not printed for every item that fails to match:
def get_student_name(reg_num, particulars):
    for i in particulars:
        if reg_num == i[0]:
            return i[1]
    print("Not found")
If you want to iterate using indices instead, you could do this inside the function (though it is less clean):
    for i in range(len(particulars)):
        if reg_num == particulars[i][0]:
            return particulars[i][1]

Another approach, provided to help learn new tricks for manipulating python data structures:
You can turn your tuple of tuples:
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
into a dictionary:
>>> pdict = dict(particulars)
>>> pdict
{'S12345': 'John', 'S23456': 'Max', 'S34567': 'Mary'}
You can look up the value by supplying the key:
>>> r = 'S23456'
>>> pdict[r]
'Max'
The function:
def get_student_name(reg, s_data):
    try:
        return dict(s_data)[reg]
    except (KeyError, TypeError, ValueError):
        return "Not Found"
The use of try ... except will catch errors and just return Not Found in the case where the reg is not in the tuple in the first place. It will also catch the case where the supplied tuple is not a series of PAIRS, and thus cannot be converted the way you expect.
You can read more about exceptions (the basics and the docs) to learn how to respond differently to different types of error.
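As an illustration of responding differently to different error types, the sketch below separates "the data is malformed" from "the key is missing". The "Bad data" message is a hypothetical choice for this example, not part of the original assignment.

```python
def get_student_name(reg, s_data):
    # Building the dict fails with TypeError or ValueError
    # when s_data is not a sequence of pairs.
    try:
        lookup = dict(s_data)
    except (TypeError, ValueError):
        return "Bad data"  # hypothetical message for malformed input

    # A missing registration number raises KeyError.
    try:
        return lookup[reg]
    except KeyError:
        return "Not Found"


particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
print(get_student_name("S12345", particulars))
print(get_student_name("S00000", particulars))
print(get_student_name("S12345", (("a", "b", "c"),)))
```

Catching only the exceptions you expect means that an unrelated bug still surfaces as a traceback instead of being silently reported as "Not Found".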

for loops in python
Gilad Green already answered your question with a way to fix your code and a quick explanation on for loops.
Here are five loops that do more or less the same thing; I invite you to try them out.
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
for t in particulars:
    print("{} {}".format(t[0], t[1]))
for i in range(len(particulars)):
    print("{}: {} {}".format(i, particulars[i][0], particulars[i][1]))
for i, t in enumerate(particulars):
    print("{}: {} {}".format(i, t[0], t[1]))
for reg_value, student_name in particulars:
    print("{} {}".format(reg_value, student_name))
for i, (reg_value, student_name) in enumerate(particulars):
    print("{}: {} {}".format(i, reg_value, student_name))
Using dictionaries instead of lists
Most importantly, I would like to add that using an unsorted list to store your student records is not the most efficient way.
If you sort the list and maintain it in sorted order, then you can use binary search to search for reg_num much faster than browsing the list one item at a time. Think of this: when you need to look up a word in a dictionary, do you read all words one by one, starting with "aah", "aback", "abaft", "abandon", etc.? No; first, you open the dictionary somewhere in the middle; you compare the words on that page with your word; then you open it again to another page; compare again; every time you do that, the number of candidate pages diminishes greatly, and so you can find your word among 300,000 other words in very little time.
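For the curious, the sorted-list idea can be sketched with the standard bisect module. The helper name find_student and the parallel reg_nums list are assumptions made for this illustration.

```python
import bisect

particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))

records = sorted(particulars)            # sort once by registration number
reg_nums = [reg for reg, _ in records]   # parallel list of keys for bisect


def find_student(reg_num):
    """Binary search: return the name, or None if reg_num is absent."""
    i = bisect.bisect_left(reg_nums, reg_num)
    if i < len(reg_nums) and reg_nums[i] == reg_num:
        return records[i][1]
    return None


print(find_student("S23456"))
print(find_student("S99999"))
```

Each lookup halves the remaining candidates, so it needs roughly log2(n) comparisons instead of up to n.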
Instead of using a sorted list with binary search, you could use another data structure, for instance a binary search tree or a hash table.
But, wait! Python already does that very easily!
There is a data structure in python called a dictionary. See the documentation on dictionaries. This structure is perfectly adapted to most situations where you have keys associated to values. Here the key is the reg_number, and the value is the student name.
You can define a dictionary directly:
particulars = {'S12345': 'John', 'S23456': 'Max', 'S34567': 'Mary'}
Or you can convert your list of tuples to a dictionary:
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
particulars_as_dict = dict(particulars)
Then you can check whether a reg_number is in the dictionary with the keyword in; you can return the student name using square brackets or with the method get:
>>> particulars = {'S12345': 'John', 'S23456': 'Max', 'S34567': 'Mary'}
>>> 'S23456' in particulars
True
>>> 'S98765' in particulars
False
>>>
>>> particulars['S23456']
'Max'
>>> particulars.get('S23456')
'Max'
>>> particulars.get('S23456', 'not found')
'Max'
>>>
>>> particulars['S98765']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'S98765'
>>> print(particulars.get('S98765'))
None
>>> particulars.get('S98765', 'not found')
'not found'
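Putting this together, the function from the question could be sketched as below. Returning the "Not found" message instead of printing it is an assumption made here so that the caller decides what to do with it.

```python
particulars = (("S12345", "John"), ("S23456", "Max"), ("S34567", "Mary"))
particulars_as_dict = dict(particulars)  # convert once, then reuse


def get_student_name(reg_num, records):
    # dict.get returns the default when the key is missing,
    # so no loop and no explicit membership test is needed.
    return records.get(reg_num, "Not found")


print(get_student_name("S12345", particulars_as_dict))
print(get_student_name("S98765", particulars_as_dict))
```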

How to delete item in nested list if it contains keyword?

I have 2 lists. The list named "keyword" is a list I manually created, and the nested list named "mylist" is an output of a function that I have in my script. This is what they look like:
keyword = ["Physics", "Spanish", ...]
mylist = [("Jack","Math and Physics"),
("Bob","English"),
("Emily","Physics"),
("Mark","Gym and Spanish"),
("Brian", "Math and Gym"),
...]
What I am trying to do is delete each item in the nested list if that item (in parentheses) contains any of the keywords written in the "keyword" list.
For example, in this case, any items in "mylist" that contain the words "Physics" or "Spanish" should be deleted from "mylist". Then, when I print "mylist", this should be the output:
[("Bob","English"), ("Brian", "Math and Gym")]
I tried searching through the internet and many different SO posts to learn how to do this (such as this), but when I modify (because I have a nested list, instead of just a list) the code and run it, I get the following error:
Traceback (most recent call last):
  File "namelist.py", line 165, in <module>
    asyncio.get_event_loop().run_until_complete(request1())
  File "C:\Users\XXXX\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 576, in run_until_complete
    return future.result()
  File "namelist.py", line 154, in request1
    mylist.remove(a)
ValueError: list.remove(x): x not in list
Does anyone know how to fix this error, and could you share your code?
EDIT: By the way, the real "mylist" I have on my script is much longer than what I wrote here, and I have about 15 keywords. When I run it on a small scale like this, the code works well, but as soon as I have more than 5 keywords, for some reason, I keep getting this error.

You could join each of the tuples into a string and then check if any keyword is in the string to filter your list.
newlist = [m for m in mylist if not any(k in ' '.join(m) for k in keyword)]
print(newlist)
# [('Bob', 'English'), ('Brian', 'Math and Gym')]

for key in keyword:
    for tup in mylist[:]:  # iterate over a copy, since mylist is modified in the loop
        if any(key in field for field in tup):
            mylist.remove(tup)

You can start by splitting the fields on " and " and looking at the intersection between the keywords and the fields of each person. For instance, you could imagine something like this:
new_list = []
for name,fields in mylist:
    # Convert the string into a set of strings for intersection
    field_set = set(fields.split(" and "))
    field_in_keys = field_set.intersection(keyword)
    # Add to the new list if no intersection is found
    if len(field_in_keys) == 0:
        new_list.append((name,fields))
You get:
[('Bob', 'English'), ('Brian', 'Math and Gym')]
If you care for speed, then pandas might do the work more efficiently

for x in keyword:
    for i in mylist[:]:  # iterate over a copy, since mylist is modified in the loop
        if x in i[1].split(' '):
            mylist.remove(i)
If you just loop through each word I think that will work as well.
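The question does not show the failing code, so the cause of the ValueError is a guess: a likely culprit is calling mylist.remove on a tuple that an earlier keyword already removed, which becomes more probable as the keyword list grows. A sketch that avoids remove altogether, with a hypothetical helper matches_any:

```python
keyword = ["Physics", "Spanish"]
mylist = [("Jack", "Math and Physics"),
          ("Bob", "English"),
          ("Emily", "Physics"),
          ("Mark", "Gym and Spanish"),
          ("Brian", "Math and Gym")]


def matches_any(item, keywords):
    """Return True if any field of the tuple contains any keyword."""
    return any(k in field for field in item for k in keywords)


# Build the filtered list in one pass; slice assignment updates
# the existing list object instead of rebinding the name.
mylist[:] = [item for item in mylist if not matches_any(item, keyword)]

print(mylist)
# [('Bob', 'English'), ('Brian', 'Math and Gym')]
```

Since each tuple is examined exactly once and nothing is removed during iteration, neither skipped elements nor double removals can occur. Note that this is a substring test, so a keyword such as "Art" would also match "Martial Arts".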

How to build a dictionary of string lengths using a for loop?

I am pretty new to python, but I know the basic commands. I am trying to create a for loop which, when given a list of sentences, adds a key for the length of each sentence. The value of each key would be the frequency of that sentence length in the list, so that the format would look something like this:
dictionary = {length1:frequency, length2:frequency, etc.}
I can't seem to find any previously answered questions that deal with this specifically - creating keys using basic functions, then changing the key's value by the frequency of that result. Here is the code I have:
dictionary = {}
for i in sentences:
    dictionary[len(i.split())] += 1
When I try to run the code, I get this message:
Traceback (most recent call last):
  File "<pyshell#11>", line 2, in <module>
    dictionary[len(i.split())] += 1
KeyError: 27
Any help fixing my code and an explanation of where I went wrong would be so appreciated!

I think this will solve your problem, in Python 3:
sentences ='Hi my name is xyz'
words = sentences.split()
dictionary ={}
for i in words:
    if len(i) in dictionary:
        dictionary[len(i)]+=1
    else:
        dictionary[len(i)] = 1
print(dictionary)
Output (Python 3.7+, where dictionaries keep insertion order):
{2: 3, 4: 1, 3: 1}
In a dictionary, you first have to assign a value to a key before you can use that value in further calculations; dictionary[key] += 1 on a missing key raises KeyError. Alternatively, use defaultdict to give each new key a default value.
Hope this helps.
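Applied to a list of sentences as in the question, the defaultdict alternative might look like this minimal sketch. The sample sentences are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; the question's real list is not shown.
sentences = ["Hi my name is xyz",
             "Nice to meet you",
             "How are you doing today"]

dictionary = defaultdict(int)  # missing keys start at 0

for sentence in sentences:
    # Key: number of words in the sentence; value: how often that length occurs.
    dictionary[len(sentence.split())] += 1

print(dict(dictionary))
```

collections.Counter(len(s.split()) for s in sentences) would give the same counts in a single expression.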

Python - Updating value in one dictionary is updating value in all dictionaries

I have a list of dictionaries called lod. All dictionaries have the same keys but different values. I am trying to update one specific value in the list of values for the same key in all the dictionaries.
I am attempting to do it with the following for loop:
for i in range(len(lod)):
    a=lod[i][key][:]
    a[p]=a[p]+lov[i]
    lod[i][key]=a
What's happening is that each dictionary is getting updated len(lod) times: lod[0][key][p] is supposed to have lov[0] added to it, but instead it is getting lov[0]+lov[1]+... added to it.
What am I doing wrong?
Here is how I declared the list of dicts:
lod = [{} for _ in range(len(dataul))]
for j in range(len(dataul)):
    for i in datakl:
        rrdict[str.split(i,',')[0]]=list(str.split(i,',')[1:len(str.split(i,','))])
    lod[j]=rrdict

The problem is in how you created the list of dictionaries. You probably did something like this:
list_of_dicts = [{}] * 20
That's actually the same dict 20 times. Try doing something like this:
list_of_dicts = [{} for _ in range(20)]
Without seeing how you actually created it, this is only an example solution to an example problem.
To know for sure, print this:
[id(x) for x in list_of_dicts]
If you created the list with the * 20 approach, the id is the same for every dict. With the list comprehension, each id is unique.
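The difference is easy to see on a small scale. In this sketch (list length 3 chosen arbitrarily), a write through one element shows up in every element of the multiplied list, but only in one element of the comprehension-built list:

```python
shared = [{}] * 3                   # three references to ONE dict
separate = [{} for _ in range(3)]   # three distinct dicts

shared[0]['x'] = 1
separate[0]['x'] = 1

print(shared)                       # [{'x': 1}, {'x': 1}, {'x': 1}]
print(separate)                     # [{'x': 1}, {}, {}]
print(shared[0] is shared[1])       # True
print(separate[0] is separate[1])   # False
```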

This is where the trouble starts: lod[j] = rrdict. lod itself is created properly with different dictionaries. Unfortunately, afterwards the references to the original dictionaries in the list get overwritten with a reference to rrdict. So in the end, the list contains only references to one single dictionary. Here is a more Pythonic and readable way to solve your problem:
lod = [{} for _ in range(len(dataul))]
for rrdict in lod:
    for line in datakl:
        splt = line.split(',')
        rrdict[splt[0]] = splt[1:]

You created the list of dictionaries correctly, as per the other answer.
However, when you update the individual dictionaries, you overwrite every entry of the list with the same object.
Removing noise from your code snippet:
lod = [{} for _ in range(whatever)]
for j in range(whatever):
    # rrdict = lod[j] # Uncomment this as a possible fix.
    for i in range(whatever):
        rrdict[somekey] = somevalue
    lod[j] = rrdict
The assignment on the last line throws away the empty dict that was in lod[j] and inserts a reference to the object represented by rrdict.
Not sure what your code does, but see the commented-out line: it might be the fix you are looking for.

Creating (seeding) large dictionaries efficiently in Python

I have a long (500K+ rows) two column spreadsheet that looks like this:
Name Code
1234 A
1234 B
1456 C
4556 A
4556 B
4556 C
...
So there is an element (with a Name) that can have a number of Codes. But instead of one row per code, I would like a list of all codes that occur for each element. What I want is a dictionary like this:
{"1234":["A","B"],"1456":["C"],"4556":["A","B","C"], ...}
What I have tried is this (and I'm not including the file reading syntax).
codelist = {}
for row in rows:
    name,code = row.split()
    if name in codelist.keys():
        codelist[name].append(code)
    else:
        codelist[name] = [code]
This creates the right output but progress becomes incredibly slow. So I've tried priming my dictionary with keys:
allnames = [.... list of all the names ...]
codelist = dict.fromkeys(allnames)
for row in rows:
    name,code = row.split()
    if codelist[name]:
        codelist[name].append(code)
    else:
        codelist[name] = [code]
This is dramatically faster, and my question is why? Doesn't the program each time still have to search all the keys in the dict? Is there another way to speed up the dict search that doesn't include traversing a tree?
Interesting is the error I get when I use the same conditional check as before (if name in codelist.keys():) after priming my dictionary.
Traceback (most recent call last):
  File ....
    codelist[name].append(code)
AttributeError: 'NoneType' object has no attribute 'append'
Now, there is a key but no list to append to. So I use codelist[name], which is <NoneType> as well, and it appears to work. What does it mean when mydict["primed key"] is <NoneType>?

The former one is slower because .keys() has to create a list of all keys in memory first and then the in operator performs a search on it. So, it is an O(N) search for each line from the text file, hence it is slow.
On the other hand a simple key in dict search takes O(1) time.
dict.fromkeys(allnames)
The default value assigned by dict.fromkeys is None, so you can't use append on it.
>>> d = dict.fromkeys('abc')
>>> d
{'a': None, 'c': None, 'b': None}
A better solution is to use collections.defaultdict here. If that is not an option, use a normal dict with either a simple if-else check or dict.setdefault.
In Python 3, .keys() returns a view object, and a membership test on it is a hash lookup rather than a list scan, so the slowdown described above does not apply there. It is still slightly slower than a plain key in dict test because of the extra method call.
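For reference, the dict.setdefault variant mentioned above could be sketched as follows. The rows list is a stand-in for the spreadsheet lines, since the question omits the file-reading code.

```python
# Stand-in for the lines read from the spreadsheet.
rows = ["1234 A", "1234 B", "1456 C", "4556 A", "4556 B", "4556 C"]

codelist = {}
for row in rows:
    name, code = row.split()
    # setdefault returns the existing list for name, or stores
    # and returns a new empty list if name is not yet a key.
    codelist.setdefault(name, []).append(code)

print(codelist)
```

This needs no priming step and no explicit membership test, and each row costs one hash lookup on average.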

You might want to have a look at the defaultdict container to avoid checks
from collections import defaultdict
allnames = [.... list of all the names ...]
codelist = defaultdict(list)
for row in rows:
    name,code = row.split()
    codelist[name].append(code)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
