Python:
I need the length of a list that is the value for a key in a dictionary, and I have to use this length in a for loop. Is it better to fetch the length of the list associated with the key every time, or to fetch the length from a separate dictionary that has the same keys?
I am using len() in the for loop as of now.
len() is very fast - it runs in constant time (see Cost of len() function), so I would not build a new data structure just to cache its answer. Just call it each time you need it.
Building a whole extra data structure would definitely use more resources, and would most likely be slower. Just make sure you write your loop over my_dict.items(), not over the keys, so you don't unnecessarily redo the key lookups inside the loop.
E.g., use something like this for efficient looping over your dict:
my_dict = <some dict where the values are lists>
for key, value in my_dict.items():
    # use key, value (your list) and len(value) (its length) as needed
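A runnable sketch of the pattern above, with a hypothetical my_dict whose contents are made up for illustration:

```python
# Hypothetical dict mapping names to lists
my_dict = {"a": [1, 2, 3], "b": [4, 5]}

lengths = {}
for key, value in my_dict.items():
    # len() is O(1), so calling it inside the loop costs almost nothing
    lengths[key] = len(value)

print(lengths)  # {'a': 3, 'b': 2}
```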
Related
I'm working with a set that contains tuples of the form (position, name) and need to check if a value already exists in the set for the name while ignoring the position.
Is there a way that I can use the in operator, similar to value in my_set, ignoring the position variable in the tuple during comparison but still retrieving it? Something like (_, value) in my_set or (*, value) in my_set, but those don't work: the first returns an incorrect result, and the second raises a SyntaxError.
Obviously I can use a loop or a generator comprehension like value in (tup[1] for tup in my_set), but that doesn't retrieve the position variable from that tuple, and I was curious if there was some form of one-liner comprehension that would do this.
You can do this in O(n) with the existing data structure (iterating the set), but for O(1) you'll have to change the data structure. You will need to build a lookup:
from collections import defaultdict

positions = defaultdict(list)
for position, name in my_set:
    positions[name].append(position)
Now this is an O(1) operation:
name in positions
Retrieving all positions for a name:
for pos in positions[name]:
    ...
If you want this to stay in sync with my_set mutations, then you will need to add hooks that update positions at the same time as adds/deletes to my_set. It might be better to rethink the underlying data structure entirely, for example, using a dict instead of a set in the first place.
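A runnable sketch of the lookup above, using a hypothetical sample set of (position, name) tuples:

```python
from collections import defaultdict

# Hypothetical sample data
my_set = {(0, "alice"), (3, "bob"), (7, "alice")}

positions = defaultdict(list)
for position, name in my_set:
    positions[name].append(position)

print("alice" in positions)        # True -- O(1) membership test
print(sorted(positions["alice"]))  # [0, 7]
```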
I have a very large dict, and I want to del many elements from it. Perhaps I should do this:
new_dict = {key: big_dict[key] for key in big_dict if check(big_dict[key])}
However, I don't have enough memory to keep both the old dict and new_dict in RAM. Is there any way to deal with this?
Update:
I can't del elements one by one. I need to run a test on the values to find which elements I want to delete.
I also can't delete elements in a for loop like:
for key in dic:
    if test(dic[key]):
        del dic[key]
It causes an error: you can't change the size of the dict while iterating over it...
My God... I can't even build a set to remember the keys to delete; there are too many keys...
I see; if the dict class doesn't have a function to do this, perhaps the only way is to buy a new computer...
Here are some options:
Make a new 'dict' on disk, for which pickle and shelve may be helpful.
Iterate through the dict and build up a list of keys until it reaches a certain size, delete those keys, then repeat the iteration; as deletions free memory, you can afford a bigger list each time.
Store the keys to delete in terms of their index in .keys(), which can be more memory efficient. This works as long as the dictionary is not modified between calls to .keys(). If about half of the elements are to be deleted, do this with a binary sequence (1 = delete, 0 = keep). If the vast majority of elements are to be deleted (or kept), store the appropriate keys as integers in a list.
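The batched-deletion option can be sketched as follows. The `test` predicate and the dict contents are hypothetical; the point is that the temporary key list never exceeds `batch_size` entries:

```python
def prune(d, test, batch_size=1000):
    """Delete every key whose value passes `test`, in small batches."""
    while True:
        batch = []
        # Collect at most batch_size keys to delete; we only read the
        # dict here, so iterating it is safe.
        for key, value in d.items():
            if test(value):
                batch.append(key)
                if len(batch) >= batch_size:
                    break
        if not batch:
            return  # nothing left to delete
        # Deletion happens outside the iteration, so no RuntimeError
        for key in batch:
            del d[key]

d = {i: i for i in range(10)}
prune(d, lambda v: v % 2 == 0, batch_size=3)
print(sorted(d))  # [1, 3, 5, 7, 9]
```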
You could try iterating over a snapshot of the keys and deleting the elements that you do not require:
for key in list(big_dict):
    if not check(big_dict[key]):
        del big_dict[key]
This way you wouldn't be making a copy of the dictionary itself; only the key list is duplicated. (Deleting while iterating the dict directly raises an error.)
You can use
big_dict.pop("key", None)
Refer here:
How to remove a key from a python dictionary?
I have values in a list of lists.
I would like to send the whole block to a conversion function which then returns all the converted values in the same structure.
my_list = [sensor1...sensor4] = [hum1...hum3] = [value1, value2, value3, value4]
So several nested lists
def conversion(my_list):
    for sensor in my_list:
        for hum in sensor:
            for value in hum:
                map(function, value)
Is there a way to do a list comprehension as a one liner? I'm not sure how to use the map function in comprehensions especially when you have several nested iterations.
map(function, value)
Since you are just mapping a function on each value, without collecting the return value in a list, using a list comprehension is not a good idea. You could do it, but you would be collecting list items that have no value, for the sole purpose of throwing them away later—just so you can save a few lines that actually serve a much better purpose: Clearly telling what’s going on, without being in a single, long, and complicated line.
So my advice would be to keep it as it is. It makes more sense like that and clearly shows what’s going on.
I am however collecting the values. They all need to be converted and saved in the same structure as they were.
In that case, you still don’t want a list comprehension, as that would mean creating a new list (for no real reason). Instead, just update the innermost list. To do that, you need to change the way you’re iterating, though:
for sensor in my_list:
    for hum in sensor:
        for i, value in enumerate(hum):
            hum[i] = map(function, value)
This will update the inner list.
Alternatively, since value is actually a list of values, you can also replace the value list’s contents using the slicing syntax:
for sensor in my_list:
    for hum in sensor:
        for value in hum:
            value[:] = map(function, value)
Also one final note: If you are using Python 3, remember that map returns a lazy iterator, so you need to convert it to a list first using list(map(function, value)); or use a list comprehension for that part with [function(v) for v in value].
This is the right way to do it. You can use list comprehension to do that, but you shouldn't for code readability and because it's probably not faster.
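A runnable sketch of the slice-assignment variant, with a hypothetical convert function and a small made-up nested structure (sensors → humidity groups → value lists):

```python
def convert(v):
    """Hypothetical conversion: double the reading."""
    return v * 2

my_list = [                # sensors
    [                      # one sensor: list of humidity groups
        [                  # one hum group: list of value lists
            [1, 2, 3],
            [4, 5],
        ],
    ],
]

for sensor in my_list:
    for hum in sensor:
        for value in hum:
            # list(...) is needed on Python 3, where map is lazy
            value[:] = list(map(convert, value))

print(my_list)  # [[[[2, 4, 6], [8, 10]]]]
```

Slice assignment (`value[:] = ...`) replaces the list's contents in place, so the outer structure keeps referencing the same list objects.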
I have a dict that has unix epoch timestamps for keys, like so:
lookup_dict = {
    1357899: {},  # some dict of data
    1357910: {},  # some other dict of data
}
Except, you know, millions and millions and millions of entries. I'd like to subset this dict, over and over again. Ideally, I'd love to be able to write something like I can in R, like:
lookup_value = 1357900
dict_subset = lookup_dict[key >= lookup_value]
# dict_subset now contains {1357910: {}}
But I confess, I can't find any actual proof that this is something Python can do without having, one way or the other, to iterate over every row. If I understand Python correctly (and I might not), key lookup of the form key in dict uses binary search, and is thus very fast; any way to do a binary search, on dict keys?
To do this without iterating, you're going to need the keys in sorted order. Then you just need to do a binary search for the first one >= lookup_value, instead of checking each one for >= lookup_value.
If you're willing to use a third-party library, there are plenty out there. The first two that spring to mind are bintrees (which uses a red-black tree, like C++, Java, etc.) and blist (which uses a B+Tree). For example, with bintrees, it's as simple as this:
dict_subset = lookup_dict[lookup_value:]
And this will be as efficient as you'd hope—basically, it adds a single O(log N) search on top of whatever the cost of using that subset. (Of course usually what you want to do with that subset is iterate the whole thing, which ends up being O(N) anyway… but maybe you're doing something different, or maybe the subset is only 10 keys out of 1000000.)
Of course there is a tradeoff. Random access to a tree-based mapping is O(log N) instead of "usually O(1)". Also, your keys obviously need to be fully ordered, instead of hashable (and that's a lot harder to detect automatically and raise nice error messages on).
If you want to build this yourself, you can. You don't even necessarily need a tree; just a sorted list of keys alongside a dict. You can maintain the list with the bisect module in the stdlib, as JonClements suggested. You may want to wrap up bisect to make a sorted list object—or, better, get one of the recipes on ActiveState or PyPI to do it for you. You can then wrap the sorted list and the dict together into a single object, so you don't accidentally update one without updating the other. And then you can extend the interface to be as nice as bintrees, if you want.
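The stdlib approach described above can be sketched like this, with a small hypothetical lookup_dict; the sorted key list would have to be maintained (e.g. with bisect.insort) as the dict changes:

```python
import bisect

# Hypothetical data; keys are unix timestamps
lookup_dict = {1357899: {"a": 1}, 1357910: {"b": 2}}
sorted_keys = sorted(lookup_dict)

def subset_from(d, keys, lookup_value):
    # O(log N) binary search for the first key >= lookup_value
    i = bisect.bisect_left(keys, lookup_value)
    return {k: d[k] for k in keys[i:]}

print(subset_from(lookup_dict, sorted_keys, 1357900))  # {1357910: {'b': 2}}
```

Building the result dict is still O(M) in the size of the subset, but finding where the subset starts no longer touches every key.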
The following code will work:
some_time_to_filter_for = # blah unix time
# Create a new sub-dictionary
sub_dict = {key: val for key, val in lookup_dict.items()
            if key >= some_time_to_filter_for}
Basically, we just iterate through all the keys in your dictionary and, given a time to filter for, take all the keys that are greater than or equal to that value and place them into our new dictionary.
I have a default dict of dicts whose primary key is a timestamp in the string form 'YYYYMMDD HH:MM:SS.' The keys are entered sequentially. How do I access the last entered key or the key with the latest timestamp?
Use an OrderedDict from the collections module if you simply need to access the last item entered. If, however, you need to maintain continuous sorting, you need to use a different data structure entirely, or at least an auxiliary one for the purposes of indexing.
Edit: I would add that, if accessing the final element is an operation that you have to do very rarely, it may be sufficient simply to sort the dict's keys and select the maximum. If you have to do this frequently, however, repeatedly sorting would become prohibitively expensive. Depending on how your code works, the simplest approach would probably be to simply maintain a single variable that, at any given point, contains the last key added and/or the maximum value added (i.e., is updated with each subsequent addition to the dict). If you want to maintain a record of additions that extends beyond just the last item, however, and don't require continuous sorting, an OrderedDict is ideal.
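A minimal sketch of the OrderedDict approach, with hypothetical timestamp keys entered sequentially (note that on Python 3.7+ plain dicts also preserve insertion order, and are reversible from 3.8):

```python
from collections import OrderedDict

# Hypothetical entries, added in order
d = OrderedDict()
d["20120627 21:20:23"] = "first"
d["20120627 21:20:40"] = "last"

last_key = next(reversed(d))  # O(1); OrderedDict remembers insertion order
print(last_key, d[last_key])  # 20120627 21:20:40 last
```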
Use OrderedDict rather than a built-in dict
You can try something like this:
>>> import time
>>> data ={'20120627 21:20:23':'first','20120627 21:20:40':'last'}
>>> latest = lambda d: time.strftime('%Y%m%d %H:%M:%S',max(map(lambda x: time.strptime(x,'%Y%m%d %H:%M:%S'),d.keys())))
>>> data[latest(data)]
'last'
but it probably would be slow on large data sets.
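The same lookup can also be written with max() and a key function, which avoids converting the winning timestamp back to a string (same hypothetical data as above):

```python
import time

data = {'20120627 21:20:23': 'first', '20120627 21:20:40': 'last'}

# Parse each key only for comparison; max returns the original key string
latest_key = max(data, key=lambda k: time.strptime(k, '%Y%m%d %H:%M:%S'))
print(data[latest_key])  # last
```

Since this 'YYYYMMDD HH:MM:SS' format is zero-padded and fixed-width, plain string comparison (max(data)) would also order the keys correctly, with no parsing at all.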
If you want to know which entry was added last (according to its timestamp), see the example below:
import datetime

fmt = '%Y%m%d %H:%M'
Dict = {'20010203 12:00': 'Dave',
        '20000504 03:00': 'Pete',
        '20020825 23:00': 'kathy',
        '20030102 01:00': 'Ray'}

myDict = {}
for key, val in Dict.items():
    TIME = str(datetime.datetime.strptime(key, fmt))
    myDict[TIME] = val

myDict = sorted(myDict.items(), key=lambda kv: kv[0])
print(myDict[-1])