I have a list of dictionaries=
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4},...]
"ID" is a unique identifier for each dictionary. Considering the list is huge, what is the fastest way of checking if a dictionary with a certain "ID" is in the list, and if not append to it? And then update its "VALUE" ("VALUE" will be updated if the dict is already in list, otherwise a certain value will be written)
You'd not use a list. Use a dictionary instead, mapping ids to nested dictionaries:
a = {
1: {'VALUE': 2, 'foo': 'bar'},
42: {'VALUE': 45, 'spam': 'eggs'},
}
Note that you don't need to include the ID key in the nested dictionary; doing so would be redundant.
Now you can simply look up if a key exists:
if someid in a:
a[someid]['VALUE'] = newvalue
I did make the assumption that your ID keys are not necessarily sequential numbers. I also made the assumption you need to store other information besides VALUE; otherwise just a flat dictionary mapping ID to VALUE values would suffice.
A dictionary lets you look up values by key in O(1) time (constant time independent of the size of the dictionary). Lists let you look up elements in constant time too, but only if you know the index.
If you don't and have to scan through the list, you have a O(N) operation, where N is the number of elements. You need to look at each and every dictionary in your list to see if it matches ID, and if ID is not present, that means you have to search from start to finish. A dictionary will still tell you in O(1) time that the key is not there.
If you can, convert to a dictionary as the other answers suggest, but in case you you have reason* to not change the data structure storing your items, here's what you can do:
items = [{"ID":1, "VALUE":2}, {"ID":2, "VALUE":2}, {"ID":3, "VALUE":4}]
def set_value_by_id(id, value):
# Try to find the item, if it exists
for item in items:
if item["ID"] == id:
break
# Make and append the item if it doesn't exist
else: # Here, `else` means "if the loop terminated not via break"
item = {"ID": id}
items.append(id)
# In either case, set the value
item["VALUE"] = value
* Some valid reasons I can think of include preserving the order of items and allowing duplicate items with the same id. For ways to make dictionaries work with those requirements, you might want to take a look at OrderedDict and this answer about duplicate keys.
Convert your list into a dict and then checking for values is much more efficient.
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
if new_key not in d:
d[new_key] = new_value
Also need to update on key found:
d = dict((item['ID'], item['VALUE']) for item in a)
for new_key, new_value in new_items:
d.setdefault(new_key, 0)
d[new_key] = new_value
Answering the question you asked, without changing the datastructure around, there's no real faster way of looking without a loop and checking every element and doing a dictionary lookup for each one - but you can push the loop down to the Python runtime instead of using Python's for loop.
I haven't tried if it ends up faster though.
a = [{"ID":1, "VALUE":2},{"ID":2, "VALUE":2},{"ID":3, "VALUE":4}]
id = 2
tmp = filter(lambda d: d['ID']==id, a)
# the filter will either return an empty list, or a list of one item.
if not tmp:
tmp = {"ID":id, "VALUE":"default"}
a.append(tmp)
else:
tmp = tmp[0]
# tmp is bound to the found/new dictionary
Related
I have 2 dictionary and an input
letter = 'd'
dict_1 = {"label_1": array(['a','b']), "label_2": array(['c','d']), ...}
dict_2 = {"label_1": array(['x','y']), "label_2": array(['z','o']), ...}
letter_translated = some_function(letter)
output desired: 'o'
What I have in mind right now is to get the index number from the array of the key "label_2" in dict_1 then searching for the same index in dict_2. I am open to other way of doing it. If you are unclear about the question, feel free to drop a comment.
Note: the arrays are numpy arrays
I propose to iterate through the first dictionary, while keeping a trace of how to get to the current element (key and i) so that we can look in the second dict in the same place:
from numpy import array
dict_1 = {"label_1": array(['a','b']), "label_2": array(['c','d'])}
dict_2 = {"label_1": array(['x','y']), "label_2": array(['z','o'])}
def look_for_corresponding(letter, d1, d2):
for key, array_of_letters in d1.items():
for position, d1_letter in enumerate(array_of_letters):
if d1_letter == letter:
return d2[key][position]
return None # Line not necessary, but added for clarity
output = look_for_corresponding('d', dict_1, dict_2)
print(output)
# o
Of course, this code will fail if dict_1 and dict_2 do not have exactly the same structure, or if the arrays are more than 1D. If those cases apply to you, please edit your question to indicate it.
Also, I am not sure what should be done if the letter is not to be found within dict_1. This code will return None, but it could also raise an exception.
What do you mean with 'index'? The number?
dictionaries don't have the concept of counted indices of their entries. You can only access data through the key (here "label_2"), or by iterating (for key in dict_1 ...).
The order is not guaranteed and can change. The order or your declaration is not kept.
If you wish to have "label_2" in both, then you need to access
key = "label_2"
item_from_1 = dict_1[key]
item_from_2 = dict_2[key]
If you need to iterate dict_1, then on each item find the appropriate item in the second, then this also needs to go over the key:
for (key,value1) in dict_1.iteritems():
value2 = dict_2[key]
.....
Note that the order the items come up in the loop may vary. Even from one run of the program to the next.
data = {
"items" : [{"potion" : 1}, {"potion2" : 1}]
}
print(data["items"][0]["potion"])
So, here's the jiz. I want to get potion2 without providing number like [0] but i can't because some variables has 5 items within the list while another one might have 3 items so providing a number might not giving me what i need. Is there a way to get potion2 without providing that number before it?
I'm assuming you don't want to provide the index because hard coding it will not work in all circumstances.
You can just pull out any items in the list which have that key.
Build a list of any items, which have that key. It might ordinarily be just one, but the container itself does not enforce that only one entry can have that key.
After that you can either iterate over the list or check if the returned value is empty and just take the first element.
>>> data = {'items': [{'potion': 1}, {'potion2': 1}]}
>>> e = filter(lambda i: 'potion' in i, data['items'])
>>> for i in e:
... print(i['potion'])
...
1
Or to pull out only the first element. I realize you said no indices, but this index is applied to the filtered list and we check that its not empty first, so it's a valid thing to do.
>>> if e:
... print(e[0]['potion'])
...
1
I was just wondering if there is a simple way to do this. I have a particular structure that is parsed from a file and the output is a list of a dict of a list of a dict. Currently, I just have a bit of code that looks something like this:
for i in xrange(len(data)):
for j, k in data[i].iteritems():
for l in xrange(len(data[i]['data'])):
for m, n in data[i]['data'][l].iteritems():
dostuff()
I just wanted to know if there was a function that would traverse a structure and internally figure out whether each entry was a list or a dict and if it is a dict, traverse into that dict and so on. I've only been using Python for about a month or so, so I am by no means an expert or even an intermediate user of the language. Thanks in advance for the answers.
EDIT: Even if it's possible to simplify my code at all, it would help.
You never need to iterate through xrange(len(data)). You iterate either through data (for a list) or data.items() (or values()) (for a dict).
Your code should look like this:
for elem in data:
for val in elem.itervalues():
for item in val['data']:
which is quite a bit shorter.
Will, if you're looking to decend an arbitrary structure of array/hash thingies then you can create a function to do that based on the type() function.
def traverse_it(it):
if (isinstance(it, list)):
for item in it:
traverse_it(item)
elif (isinstance(it, dict)):
for key in it.keys():
traverse_it(it[key])
else:
do_something_with_real_value(it)
Note that the average object oriented guru will tell you not to do this, and instead create a class tree where one is based on an array, another on a dict and then have a single function to process each with the same function name (ie, a virtual function) and to call that within each class function. IE, if/else trees based on types are "bad". Functions that can be called on an object to deal with its contents in its own way "good".
I think this is what you're trying to do. There is no need to use xrange() to pull out the index from the list since for iterates over each value of the list. In my example below d1 is therefore a reference to the current data[i].
for d1 in data: # iterate over outer list, d1 is a dictionary
for x in d1: # iterate over keys in d1 (the x var is unused)
for d2 in d1['data']: # iterate over the list
# iterate over (key,value) pairs in inner most dict
for k,v in d2.iteritems():
dostuff()
You're also using the name l twice (intentionally or not), but beware of how the scoping works.
well, question is quite old. however, out of my curiosity, I would like to respond to your question for much better answer which I just tried.
Suppose, dictionary looks like: dict1 = { 'a':5,'b': [1,2,{'a':100,'b':100}], 'dict 2' : {'a':3,'b':5}}
Solution:
dict1 = { 'a':5,'b': [1,2,{'a':100,'b':100}], 'dict 2' : {'a':3,'b':5}}
def recurse(dict):
if type(dict) == type({}):
for key in dict:
recurse(dict[key])
elif type(dict) == type([]):
for element in dict:
if type(element) == type({}):
recurse(element)
else:
print element
else:
print dict
recurse(dict1)
I have a dict that has string-type keys whose exact values I can't know (because they're generated dynamically elsewhere). However, I know that that the key I want contains a particular substring, and that a single key with this substring is definitely in the dict.
What's the best, or "most pythonic" way to retrieve the value for this key?
I thought of two strategies, but both irk me:
for k,v in some_dict.items():
if 'substring' in k:
value = v
break
-- OR --
value = [v for (k,v) in some_dict.items() if 'substring' in k][0]
The first method is bulky and somewhat ugly, while the second is cleaner, but the extra step of indexing into the list comprehension (the [0]) irks me. Is there a better way to express the second version, or a more concise way to write the first?
There is an option to write the second version with the performance attributes of the first one.
Use a generator expression instead of list comprehension:
value = next(v for (k,v) in some_dict.iteritems() if 'substring' in k)
The expression inside the parenthesis will return an iterator which you will then ask to provide the next, i.e. first element. No further elements are processed.
How about this:
value = (v for (k,v) in some_dict.iteritems() if 'substring' in k).next()
It will stop immediately when it finds the first match.
But it still has O(n) complexity, where n is the number of key-value pairs. You need something like a suffix list or a suffix tree to speed up searching.
If there are many keys but the string is easy to reconstruct from the substring, then it can be faster reconstructing it. e.g. often you know the start of the key but not the datestamp that has been appended on. (so you may only have to try 365 dates rather than iterate through millions of keys for example).
It's unlikely to be the case but I thought I would suggest it anyway.
e.g.
>>> names={'bob_k':32,'james_r':443,'sarah_p':12}
>>> firstname='james' #you know the substring james because you have a list of firstnames
>>> for c in "abcdefghijklmnopqrstuvwxyz":
... name="%s_%s"%(firstname,c)
... if name in names:
... print name
...
james_r
class MyDict(dict):
def __init__(self, *kwargs):
dict.__init__(self, *kwargs)
def __getitem__(self,x):
return next(v for (k,v) in self.iteritems() if x in k)
# Defining several dicos ----------------------------------------------------
some_dict = {'abc4589':4578,'abc7812':798,'kjuy45763':1002}
another_dict = {'boumboum14':'WSZE x478',
'tagada4783':'ocean11',
'maracuna102455':None}
still_another = {12:'jfg',45:'klsjgf'}
# Selecting the dicos whose __getitem__ method will be changed -------------
name,obj = None,None
selected_dicos = [ (name,obj) for (name,obj) in globals().iteritems()
if type(obj)==dict
and all(type(x)==str for x in obj.iterkeys())]
print 'names of selected_dicos ==',[ name for (name,obj) in selected_dicos]
# Transforming the selected dicos in instances of class MyDict -----------
for k,v in selected_dicos:
globals()[k] = MyDict(v)
# Exemple of getting a value ---------------------------------------------
print "some_dict['7812'] ==",some_dict['7812']
result
names of selected_dicos == ['another_dict', 'some_dict']
some_dict['7812'] == 798
I prefer the first version, although I'd use some_dict.iteritems() (if you're on Python 2) because then you don't have to build an entire list of all the items beforehand. Instead you iterate through the dict and break as soon as you're done.
On Python 3, some_dict.items(2) already results in a dictionary view, so that's already a suitable iterator.
I've got a list of tuples extracted from a table in a DB which looks like (key , foreignkey , value). There is a many to one relationship between the key and foreignkeys and I'd like to convert it into a dict indexed by the foreignkey containing the sum of all values with that foreignkey, i.e. { foreignkey , sumof( value ) }. I wrote something that's rather verbose:
myDict = {}
for item in myTupleList:
if item[1] in myDict:
myDict [ item[1] ] += item[2]
else:
myDict [ item[1] ] = item[2]
but after seeing this question's answer or these two there's got to be a more concise way of expressing what I'd like to do. And if this is a repeat, I missed it and will remove the question if you can provide the link.
Assuming all your values are ints, you could use a defaultdict to make this easier:
from collections import defaultdict
myDict = defaultdict(int)
for item in myTupleList:
myDict[item[1]] += item[2]
defaultdict is like a dictionary, except if you try to get a key that isn't there it fills in the value returned by the callable - in this case, int, which returns 0 when called with no arguments.
UPDATE: Thanks to #gnibbler for reminding me, but tuples can be unpacked in a for loop:
from collections import defaultdict
myDict = defaultdict(int)
for _, key, val in myTupleList:
myDict[key] += val
Here, the 3-item tuple gets unpacked into the variables _, key, and val. _ is a common placeholder name in Python, used to indicate that the value isn't really important. Using this, we can avoid the hairy item[1] and item[2] indexing. We can't rely on this if the tuples in myTupleList aren't all the same size, but I bet they are.
(We also avoid the situation of someone looking at the code and thinking it's broken because the writer thought arrays were 1-indexed, which is what I thought when I first read the code. I wasn't alleviated of this until I read the question. In the above loop, however, it's obvious that myTupleList is a tuple of three elements, and we just don't need the first one.)
from collections import defaultdict
myDict = defaultdict(int)
for _, key, value in myTupleList:
myDict[key] += value
Here's my (tongue in cheek) answer:
myDict = reduce(lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1], myTupleList, {})
It is ugly and bad, but here is how it works.
The first argument to reduce (because it isn't clear there) is lambda d, t: (d.__setitem__(t[1], d.get(t[1], 0) + t[2]), d)[1]. I will talk about this later, but for now, I'll just call it joe (no offense to any people named Joe intended). The reduce function basically works like this:
joe(joe(joe({}, myTupleList[0]), myTupleList[1]), myTupleList[2])
And that's for a three element list. As you can see, it basically uses its first argument to sort of accumulate each result into the final answer. In this case, the final answer is the dictionary you wanted.
Now for joe itself. Here is joe as a def:
def joe(myDict, tupleItem):
myDict[tupleItem[1]] = myDict.get(tupleItem[1], 0) + tupleItem[2]
return myDict
Unfortunately, no form of = or return is allowed in a Python lambda so that has to be gotten around. I get around the lack of = by calling the dicts __setitem__ function directly. I get around the lack of return in by creating a tuple with the return value of __setitem__ and the dictionary and then return the tuple element containing the dictionary. I will slowly alter joe so you can see how I accomplished this.
First, remove the =:
def joe(myDict, tupleItem):
# Using __setitem__ to avoid using '='
myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2])
return myDict
Next, make the entire expression evaluate to the value we want to return:
def joe(myDict, tupleItem):
return (myDict.__setitem__(tupleItem[1], myDict.get(tupleItem[1], 0) + tupleItem[2]),
myDict)[1]
I have run across this use-case for reduce and dict many times in my Python programming. In my opinion, dict could use a member function reduceto(keyfunc, reduce_func, iterable, default_val=None). keyfunc would take the current value from the iterable and return the key. reduce_func would take the existing value in the dictionary and the value from the iterable and return the new value for the dictionary. default_val would be what was passed into reduce_func if the dictionary was missing a key. The return value should be the dictionary itself so you could do things like:
myDict = dict().reduceto(lambda t: t[1], lambda o, t: o + t, myTupleList, 0)
Maybe not exactly readable but it should work:
fks = dict([ (v[1], True) for v in myTupleList ]).keys()
myDict = dict([ (fk, sum([ v[2] for v in myTupleList if v[1] == fk ])) for fk in fks ])
The first line finds all unique foreign keys. The second line builds your dictionary by first constructing a list of (fk, sum(all values for this fk))-pairs and turning that into a dictionary.
Look at SQLAlchemy and see if that does all the mapping you need and perhaps more