I am trying to create a python script and I am stuck with the dictionaries. I have read through some of the other forums but can't seem to get anywhere. I am a very new python programmer so please be gentle.
What I want to do:
1) set up a dictionary like this: {'Name':'userid','jobid:jobid','walltime:walltime,'nodes:nds'}
2) iterate through a list of entries created from and external function call and extract information to populate the dictionary
3) Problem: I cannot figure out how to append entries to the appropriate keys
For example, I want this:
{‘Name’:’jose’,’jobid’:’001,002,003,005’,’walltime:32:00,240:00,04:00,07:00’,’nodes’:32,32,500’}
Notice for one userid, I have multiple jobids, walltimes and nodes.
(len(jobids)==len(walltimes)==len(nodes) for any one userid but can vary across userids)
I am able to get the script to find the first value for each username, but it never appends. How can I get this to append?
Here is what I have tried
from collections import defaultdict
pdict = defaultdict(list)
start the loop:
# get new values – add these to the dictionary keyed
# on username (create a new entry or append to existing entry)
…
(jobid,userid,jobname, sessid, nds, tsk, walltime,rest)= m.groups()
...
if userid in pdict:
print "DEBUG: %s is currently in the dictionary -- appending entries" %(userid)
pdict[userid][‘jobid’] = pdict[userid][jobid].append(jobid) I
# repeat for nodes, walltime, etc
if not userid in pdict:
print "DEBUG: %s is not in the dictionary creating entry" %(userid)
pdict[userid] = {} # define a dictionary within a dictionary with key off userid
pdict[userid]['jobid'] = jobid
pdict[userid]['jobname'] = jobname
pdict[userid]['nodes'] = nds
pdict[userid]['walltime'] = walltime
I know this is wrong but can’t figure out how to get the append to work. I have tried many of the suggestions offered on this site. I need to append (to the dictionary) the most recent values from the loop keyed to userid
Here is an example of the ouput – it does not append multiple entries for each userid but rather takes only the first value for each userid
userid jmreill contains data: {'nodes': '1', 'jobname':
'A10012a_ReMig_Q', 'walltime': '230:0', 'jobid': '1365582'}
userid igorysh contains data: {'nodes': '20', 'jobname':
'emvii_Beam_fwi6', 'walltime': '06:50', 'jobid': '1398100'}
Any suggestions? This should be easy but I can’t figure it out!
from collections import defaultdict
pdict = defaultdict(dict)
start the loop:
# get new values – add these to the dictionary keyed
# on username (create a new entry or append to existing entry)
…
(jobid,userid,jobname, sessid, nds, tsk, walltime,rest)= m.groups()
...
if userid in pdict:
print "DEBUG: %s is currently in the dictionary -- appending entries" %(userid)
pdict[userid][jobid].append(jobid)
# repeat for nodes, walltime, etc
if userid not in pdict:
print "DEBUG: %s is not in the dictionary creating entry" %(userid)
pdict[userid]['jobid'] = [jobid]
pdict[userid]['jobname'] = jobname
pdict[userid]['nodes'] = nds
pdict[userid]['walltime'] = walltime
The value corresponding to key 'jobid' should be a list of strings rather than a string. If you create your dictionary this way, you can append new jobid's to the list simply by:
pdict[userid]['jobid'].append(jobid)
I cant remember the explanation why to use the lambda expression in the following code, but you have to define a defaultdict of a defaultdict:
pdict = defaultdict(lambda: defaultdict(list))
pdict[userid][‘jobid’].append('1234')
will work.
The append() method does not return the list...it is modified in place. Also, you need to initialize your elements as lists (using square brackets):
if userid in pdict:
print "DEBUG: %s is currently in the dictionary -- appending entries" %(userid)
pdict[userid][jobid].append(jobid) ## just call append here
# repeat for nodes, walltime, etc
if not userid in pdict:
print "DEBUG: %s is not in the dictionary creating entry" %(userid)
pdict[userid] = {} # define a dictionary within a dictionary with key off userid
pdict[userid]['jobid'] = [jobid,] ## initialize with lists
pdict[userid]['jobname'] = [jobname,]
pdict[userid]['nodes'] = [nds,]
pdict[userid]['walltime'] = [walltime,]
append doesn't return a value, it modifies the list in place, and you forgot to quote 'jobid' on the right of equals. So you should replace pdict[userid][‘jobid’] = pdict[userid][jobid].append(jobid) with pdict[userid]['jobid'].append(jobid). Also take into account comment from #Jasper.
You are looking for a dict of dicts? AutoVivification is the perfect solution. Implement the perl’s autovivification feature in Python.
class AutoVivification(dict):
"""Implementation of perl's autovivification feature."""
def __getitem__(self, item):
try:
return dict.__getitem__(self, item)
except KeyError:
value = self[item] = type(self)()
return value
This makes everything easier. Note that the value of pdict[userid]['jobid'] should be a list [jobid] instead of a variable jobid as you have multiple jobid.
pdict = AutoVivification()
if userid in pdict:
pdict[userid]['jobid'].append(jobid)
else:
pdict[userid]['jobid'] = [jobid] # a list
Refer to What is the best way to implement nested dictionaries in Python?.
Related
I am asking an ElasticSearch database to provide me with a list of indices and their creation dates using Python 2.7 and the Requests package. The idea is to quickly calculate which indices have exceeded the retention policy and need to be put to sleep.
The request works perfectly and the results are exactly what I want. However, when I run the code below, when I try to convert the json result to a dict, the type of theDict is correct but it reports a size of 1, when there should be at least a couple dozen entries. What am I doing wrong? I have a feeling it's something really dumb but I just can't snag it! :)
import json
import requests
esEndPoint = "https://localhost:9200"
retrieveString = "/_cat/indices?h=index,creation.date.string&format=json&s=creation.date"
# Gets the current indices and their creation dates
def retrieveIndicesAndDates():
try:
theResult = requests.get(esEndPoint+retrieveString)
print (theResult.content)
except Exception as e:
print("Unable to retrieve list of indices with creation dates.")
print("Error: "+e)
exit(3)
return theResult.content
def main():
theDict = dict(json.loads(retrieveIndicesAndDates()))
print(type(theDict)) # Reports correct type
print(len(theDict)) # Always outputs "1" ??
for index, creationdate in theDict.items():
print("Index: ",index,", Creation date: ",theDict[index])
return
The json the call returns:
[{"index":".kibana","creation.date.string":"2017-09-14T15:01:38.611Z"},{"index":"logstash-2018.07.23","creation.date.string":"2018-07-23T00:00:01.024Z"},{"index":"cwl-2018.07.23","creation.date.string":"2018-07-23T00:00:03.877Z"},{"index":"k8s-testing-internet-2018.07.23","creation.date.string":"2018-07-23T14:19:10.024Z"},{"index":"logstash-2018.07.24","creation.date.string":"2018-07-24T00:00:01.023Z"},{"index":"k8s-testing-internet-2018.07.24","creation.date.string":"2018-07-24T00:00:01.275Z"},{"index":"cwl-2018.07.24","creation.date.string":"2018-07-24T00:00:02.157Z"},{"index":"k8s-testing-internet-2018.07.25","creation.date.string":"2018-07-25T00:00:01.022Z"},{"index":"logstash-2018.07.25","creation.date.string":"2018-07-25T00:00:01.186Z"},{"index":"cwl-2018.07.25","creation.date.string":"2018-07-25T00:00:04.012Z"},{"index":"logstash-2018.07.26","creation.date.string":"2018-07-26T00:00:01.026Z"},{"index":"k8s-testing-internet-2018.07.26","creation.date.string":"2018-07-26T00:00:01.185Z"},{"index":"cwl-2018.07.26","creation.date.string":"2018-07-26T00:00:02.587Z"},{"index":"k8s-testing-internet-2018.07.27","creation.date.string":"2018-07-27T00:00:01.027Z"},{"index":"logstash-2018.07.27","creation.date.string":"2018-07-27T00:00:01.144Z"},{"index":"cwl-2018.07.27","creation.date.string":"2018-07-27T00:00:04.485Z"},{"index":"ctl-2018.07.27","creation.date.string":"2018-07-27T09:02:09.854Z"},{"index":"cfl-2018.07.27","creation.date.string":"2018-07-27T11:12:44.681Z"},{"index":"elb-2018.07.27","creation.date.string":"2018-07-27T11:13:51.340Z"},{"index":"cfl-2018.07.24","creation.date.string":"2018-07-27T11:45:23.697Z"},{"index":"cfl-2018.07.23","creation.date.string":"2018-07-27T11:45:24.646Z"},{"index":"cfl-2018.07.25","creation.date.string":"2018-07-27T11:45:25.700Z"},{"index":"cfl-2018.07.26","creation.date.string":"2018-07-27T11:45:26.341Z"},{"index":"elb-2018.07.24","creation.date.string":"2018-07-27T11:45:27.440Z"},{"index":"elb-2018.07.25","creation.date.string":"2018-07-27T11:45:29.572Z"},{"index":"elb-2018.07.26","creation.date.string":"2018-07-27T11:45:36.170Z"},{"index":"logstash-2018.07.28","creation.date.string":"2018-07-28T00:00:01.023Z"},{"index":"k8s-testing-internet-2018.07.28","creation.date.string":"2018-07-28T00:00:01.316Z"},{"index":"cwl-2018.07.28","creation.date.string":"2018-07-28T00:00:03.945Z"},{"index":"elb-2018.07.28","creation.date.string":"2018-07-28T00:00:53.992Z"},{"index":"ctl-2018.07.28","creation.date.string":"2018-07-28T00:07:19.543Z"},{"index":"k8s-testing-internet-2018.07.29","creation.date.string":"2018-07-29T00:00:01.026Z"},{"index":"logstash-2018.07.29","creation.date.string":"2018-07-29T00:00:01.378Z"},{"index":"cwl-2018.07.29","creation.date.string":"2018-07-29T00:00:04.100Z"},{"index":"elb-2018.07.29","creation.date.string":"2018-07-29T00:00:59.241Z"},{"index":"ctl-2018.07.29","creation.date.string":"2018-07-29T00:06:44.199Z"},{"index":"logstash-2018.07.30","creation.date.string":"2018-07-30T00:00:01.024Z"},{"index":"k8s-testing-internet-2018.07.30","creation.date.string":"2018-07-30T00:00:01.179Z"},{"index":"cwl-2018.07.30","creation.date.string":"2018-07-30T00:00:04.417Z"},{"index":"elb-2018.07.30","creation.date.string":"2018-07-30T00:01:01.442Z"},{"index":"ctl-2018.07.30","creation.date.string":"2018-07-30T00:08:28.936Z"},{"index":"cfl-2018.07.30","creation.date.string":"2018-07-30T06:52:16.739Z"}]
Your error is trying to convert a list of dicts to a dict:
theDict = dict(json.loads(retrieveIndicesAndDates()))
# ^^^^^ ^
That would only work for a dict of lists. It would be redundant, though.
Just use the reply directly. Each entry is a dict with the appropriate keys:
data = json.loads(retrieveIndicesAndDates())
for entry in data:
print("Index: ", entry["index"], ", Creation date: ", entry["creation.date.string"])
So what happens when you do convert that list to a dict? Why is there just one entry?
The dict understands three initialisation methods: keywords, mappings and iterables. A list fits the last one.
Initialisation from an iterable goes through it and expects key-value iterables as elements. If one were to do it manually, it would look like this:
def sequence2dict(sequence):
map = {}
for element in sequence:
key, value = element
map[key] = value
return map
Notice how each element is unpacked via iteration? In the reply each element is a dict with two entries. Iteration on that yields the two keys but ignores the values.
key, value = {"index":".kibana","creation.date.string":"2017-09-14T15:01:38.611Z"}
print(key, '=>', value) # prints "index => creation.date.string"
To the dict constructor, every element in the reply has the same key-value pair: "index" and "creation.date.string". Since keys in a dict are unique, all elements collapse to the same entry: {"index": "creation.date.string"}.
I try to append function objects to list, which is element of dictionary:
jobs = {}
job = sched.add_date_job(callback, run_at, [params])
jobs[hereCanBeRandomNumber].append(job)
But, it seems I have a problem in last line. Compiler says:KeyError: 118096950.
What's the problem?
The way you're adding to your dictionary is incorrect.
jobs[hereCanBeRandomNumber].append(job)
Translates to "Append the job to the value of the dictionary with key hereCanBeRandomNumber"
If you're trying to add to the dictionary, use:
jobs[hereCanBeRandomNumber] = job
This will add to the jobs dict, so it looks like:
jobs = {118096950: job}
So your problem here is you are trying to append to a key that does not exist yet.
Take a dictionary like
jobs = {}
and doing
jobs[123].append(foo)
will produce a KeyError as nothing exists at 123 yet.
To get around this you can do either of the following:
from collections import defaultdict
jobs = defaultdict(list)
jobs[123].append(foo)
which means if a key does not exist it is initialised to an empty list first or
jobs = {}
jobs[123] = jobs.get(123, []).append(job)
which checks jobs for the presence of the key and if it doesn't exist used an empty list
I have a list of dictionaries that maps different IDs to a central ID. I have a document with these different IDs associated with terms. I have created a function that now has a key the central ID from the different IDs in the document. The goFile is the document where in the first column there's an ID and in the second one there's a GOterm. The mappingList is a list containing dictionaries in which the ID in the goFile is mapped to a main ID.
My expected output is a dictionary with a main ID as a key and a set with the go terms associated with it as value.
def parseGO(mappingList, goFile):
# open the file
file = open(goFile)
# this will be the dictionary that this function returns
# entries will have as a key an Ensembl ID
# and the value will be a set of GO terms
GOdict = {}
GOset = set()
for line in file:
splitline = line.split(' ')
GO_term = splitline[1]
value_ID = splitline[0]
for dict in mappingList:
if value_ID in dict:
ENSB_term = dict[value_ID]
#my best try
for dict in mappingList:
for key in GOdict.keys():
if value_ID in dict and key == dict[value_ID]:
GOdict[ENSB_term].add(GO_term)
GOdict[ENSB_term] = GOset
return GOdict
My problem is that now I have to add to the central ID in my GOdict the terms that are associated in the document to the different IDs. To avoid duplicates i use a set (GOset). How do I do it? All my try end having all the terms mapped to all the main IDs.
Some sample:
mappingList = [{'1234': 'mainID1', '456': 'mainID2'}, {'789': 'mainID2'}]
goFile:
1234 GOTERM1
1234 GOTERM2
456 GOTERM1
456 GOTERM3
789 GOTERM1
expected output:
GOdict = {'mainID1': set([GOTERM1, GOTERM2]), 'mainID2': set([GOTERM1, GOTERM3])}
First off, you shouldn't use the variable name 'dict', as it shadows the built-in dict class, and will cause you problems at some point.
The following should work for you:
from collections import defaultdict
def parse_go(mapping_list, go_file):
go_dict = defaultdict(set)
with open(go_file) as f: # Better garbage handling using 'with'
for line in f:
(value_id, go_term) = line.split() # Feel free to change the split behaviour
# work better for you.
for map_dict in mapping_list:
if value_id in map_dict:
go_dict[map_dict[value_id]].add(go_term)
return go_dict
The code is fairly straightforward, but here's a breakdown anyway.
We use a default dictionary instead of a normal dictionary so we can eliminate all that if in or setdefault() boilerplate.
For each line in the file, we check if the first item (value_id) is a key in any of the mapping dictionaries, and if so, adds the lines second item (go_term) to that value_id's set in the dictionary.
EDIT: Request for doing this without defaultdict(). Assume that go_dict is just a normal dictionary (go_dict = {}), your for loop would look like:
for map_dict in mapping_list:
if value_id in map_dict:
esnb_entry = go_dict.setdefault(map_dict[value_id], set())
esnb_entry.add(go_term)
I have a list of dictionaries=
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4},...]
"ID" is a unique identifier for each dictionary. Considering the list is huge, what is the fastest way of checking if a dictionary with a certain "ID" is in the list, and if not append to it? And then update its "VALUE" ("VALUE" will be updated if the dict is already in list, otherwise a certain value will be written)
You'd not use a list. Use a dictionary instead, mapping ids to nested dictionaries:
a = {
1: {'VALUE': 2, 'foo': 'bar'},
42: {'VALUE': 45, 'spam': 'eggs'},
}
Note that you don't need to include the ID key in the nested dictionary; doing so would be redundant.
Now you can simply look up if a key exists:
if someid in a:
a[someid]['VALUE'] = newvalue
I did make the assumption that your ID keys are not necessarily sequential numbers. I also made the assumption you need to store other information besides VALUE; otherwise just a flat dictionary mapping ID to VALUE values would suffice.
A dictionary lets you look up values by key in O(1) time (constant time independent of the size of the dictionary). Lists let you look up elements in constant time too, but only if you know the index.
If you don't and have to scan through the list, you have a O(N) operation, where N is the number of elements. You need to look at each and every dictionary in your list to see if it matches ID, and if ID is not present, that means you have to search from start to finish. A dictionary will still tell you in O(1) time that the key is not there.
If you can, convert to a dictionary as the other answers suggest, but in case you you have reason* to not change the data structure storing your items, here's what you can do:
items = [{"ID":1, "VALUE":2}, {"ID":2, "VALUE":2}, {"ID":3, "VALUE":4}]
def set_value_by_id(id, value):
# Try to find the item, if it exists
for item in items:
if item["ID"] == id:
break
# Make and append the item if it doesn't exist
else: # Here, `else` means "if the loop terminated not via break"
item = {"ID": id}
items.append(id)
# In either case, set the value
item["VALUE"] = value
* Some valid reasons I can think of include preserving the order of items and allowing duplicate items with the same id. For ways to make dictionaries work with those requirements, you might want to take a look at OrderedDict and this answer about duplicate keys.
Convert your list into a dict and then checking for values is much more efficient.
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
if new_key not in d:
d[new_key] = new_value
Also need to update on key found:
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
d.setdefault(new_key, 0)
d[new_key] = new_value
Answering the question you asked, without changing the datastructure around, there's no real faster way of looking without a loop and checking every element and doing a dictionary lookup for each one - but you can push the loop down to the Python runtime instead of using Python's for loop.
I haven't tried if it ends up faster though.
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4}]
id = 2
tmp = filter(lambda d: d['ID']==id, a)
# the filter will either return an empty list, or a list of one item.
if not tmp:
tmp = {"ID":id, "VALUE":"default"}
a.append(tmp)
else:
tmp = tmp[0]
# tmp is bound to the found/new dictionary
Here is my code:
for response in responses["result"]:
ids = {}
key = response['_id'].encode('ascii')
print key
for value in response['docs']:
ids[key].append(value)
Traceback:
File "people.py", line 47, in <module>
ids[key].append(value)
KeyError: 'deanna'
I am trying to add multiple values to a key. Throws an error like above
Check out setdefault:
ids.setdefault(key, []).append(value)
It looks to see if key is in ids, and if not, sets that to be an empty list. Then it returns that list for you to inline call append on.
Docs:
http://docs.python.org/2/library/stdtypes.html#dict.setdefault
If I'm reading this correctly your intention is to map the _id of a response to its docs. In that case you can bring down everything you have above to a dict comprehension:
ids = {response['_id'].encode('ascii'): response['docs']
for response in responses['result']}
This also assumes you meant to have id = {} outside of the outermost loop, but I can't see any other reasonable interpretation.
If the above is not correct,
You can use collections.defaultdict
import collections # at top level
#then in your loop:
ids = collections.defaultdict(list) #instead of ids = {}
A dictionary whose default value will be created by calling the init argument, in this case calling list() will produce an empty list which can then be appended to.
To traverse the dictionary you can iterate over it's items()
for key, val in ids.items():
print(key, val)
The reason you're getting a KeyError is this: In the first iteration of your for loop, you look up the key in an empty dictionary. There is no such key, hence the KeyError.
The code you gave will work, if you first insert an empty list into the dictionary under to appropriate key. Then append the values to the list. Like so:
for response in responses["result"]:
ids = {}
key = response['_id'].encode('ascii')
print key
if key not in ids: ## <-- if we haven't seen key yet
ids[key] = [] ## <-- insert an empty list into the dictionary
for value in response['docs']:
ids[key].append(value)
The previous answers are correct. Both defaultdict and dictionary.setdefault are automatic ways of inserting the empty list.