I'm working with a set that contains tuples of the form (position, name) and need to check if a value already exists in the set for the name while ignoring the position.
Is there a way I can use the in operator, similar to value in my_set, ignoring the position variable in the tuple during comparison but still retrieving it? Something similar to (_, value) in my_set or (*, value) in my_set, but those don't work: the first returns an incorrect result, and the second raises a SyntaxError.
Obviously I can use a loop or a generator comprehension like value in (tup[1] for tup in my_set), but that doesn't retrieve the position variable from that tuple, and I was curious if there was some form of one-liner comprehension that would do this.
You can do this in O(n) with the existing data structure (by iterating the set), but for O(1) you'll have to change the data structure. You will need to build a lookup:
from collections import defaultdict

# map each name to all of the positions it appears with
positions = defaultdict(list)
for position, name in my_set:
    positions[name].append(position)
Now this is an O(1) operation:
name in positions
Retrieving all positions for a name:
for pos in positions[name]:
    ...
If you want this to stay in sync with mutations of my_set, you will need to add hooks that update positions at the same time as additions to and deletions from my_set. It might be better to rethink the underlying data structure entirely, for example by using a dict instead of a set in the first place.
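For instance, here is a minimal sketch of the dict-based alternative; the helper names are illustrative, not part of the original question:

# name -> list of positions; keep it in sync by always going through these helpers
positions = {}

def add_entry(position, name):
    positions.setdefault(name, []).append(position)

def remove_entry(position, name):
    positions[name].remove(position)
    if not positions[name]:       # drop the name once no positions remain
        del positions[name]

add_entry(3, "alice")
add_entry(7, "alice")
print("alice" in positions)       # True, an O(1) membership test on the name
print(positions["alice"])         # [3, 7]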
I have to use the length of a list which is the value for a key in a dictionary. I have to use this value in a for loop. Is it better to fetch the length of the list associated with the key every time, or to fetch the length from a different dictionary which has the same keys?
I am using len() in the for loop as of now.
len() is very fast - it runs in constant time (see Cost of len() function) - so I would not build a new data structure just to cache its answer. Just use it each time you need it.
Building a whole extra data structure would definitely use more resources and most likely be slower. Just make sure you write your loop over my_dict.items(), not over the keys, so you don't unnecessarily redo the key lookups inside the loop.
E.g., use something like this for efficient looping over your dict:
my_dict = <some dict where the values are lists>
for key, value in my_dict.items():
    # use key, value (your list) and len(value) (its length) as needed
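For instance, with a small made-up dict of lists:

my_dict = {"a": [1, 2, 3], "b": [4, 5]}   # hypothetical data

for key, value in my_dict.items():
    # len() is O(1), so there is no harm in calling it inside the loop
    print(key, len(value), value)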
I have an object to store data:
Vertex(key, children)
and a list to store these objects
vertices = []
I'm using another list
key_vertices = []
to store the keys of my vertices, so I can access them easily (without looping over every object) whenever I need to check whether a vertex with a given key exists, like this:
if key not in self.key_vertices:
    # add a new key to the array
    self.key_vertices.append(key)
    # create a vertex object to store dependency information
    self.verteces.append(Vertex(key, children))
I think this is a bit complicated; maybe someone knows a better way to store multiple Vertex objects with the ability to check for and access them easily.
Thanks
Your example works fine; the only problem you could have is a performance issue with the in operator on a list, which is O(n).
If you don't care about the order of the keys (which is likely), just do this:
self.key_vertices = set()
then:
if key not in self.key_vertices:
    # add a new key to the set
    self.key_vertices.add(key)
    # create a vertex object to store dependency information
    self.verteces.append(Vertex(key, children))
You'll save a lot of time on the in operator, because membership testing on a set is much faster (average O(1)) thanks to hashing.
And if you don't care about the order in self.verteces, just use a dictionary; in that case you probably don't need the first key parameter of your Vertex structure.
self.verteces = dict()

if key not in self.verteces:
    # create a vertex object to store dependency information
    self.verteces[key] = Vertex(children)
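A minimal sketch of that dict-based approach, wrapped in a hypothetical container class (the class name and method names are illustrative, not from the question):

class Vertex:
    def __init__(self, children):
        self.children = children

class Graph:                      # hypothetical container class
    def __init__(self):
        self.verteces = {}        # key -> Vertex; O(1) membership and access

    def add(self, key, children):
        if key not in self.verteces:
            self.verteces[key] = Vertex(children)

    def get(self, key):
        return self.verteces.get(key)

g = Graph()
g.add("a", children=["b", "c"])
print("a" in g.verteces)          # True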
When you need to check for membership, a list is not the best choice as every object in the list will be checked.
If key is hashable, use a set.
If it's not hashable but is comparable, use a tree (unavailable in the standard library). Try to make it hashable.
If I understand correctly, you want to check in O(1) whether an element has already been added (i.e. without having to check every element in the list).
The easiest way to do that is to use a set. A set is an unordered collection that lets you check whether an element exists in constant time, O(1). You can think of a set as a dict with keys only, but you can iterate over it just like a list:
for value in mySet:
    print(value)

print("hello" in mySet)
If you need to preserve order (most of the time, you don't), your approach is pretty good, but I would use a set for the membership check:
self.vertece_set = set()  # Init somewhere else ;)

if key not in self.vertece_set:
    # add a new key to the set
    self.vertece_set.add(key)
    # create a vertex object to store dependency information
    self.verteces.append(Vertex(key, children))
I have values in a list of lists.
I would like to send the whole block to a conversion function which then returns all the converted values in the same structure.
my_list = [sensor1...sensor4] = [hum1...hum3] = [value1, value2, value3, value4]
So several nested lists
def conversion(my_list):
    for sensor in my_list:
        for hum in sensor:
            for value in hum:
                map(function, value)
Is there a way to do this as a one-liner list comprehension? I'm not sure how to use the map function in comprehensions, especially when there are several nested iterations.
map(function, value)
Since you are just mapping a function on each value, without collecting the return value in a list, using a list comprehension is not a good idea. You could do it, but you would be collecting list items that have no value, for the sole purpose of throwing them away later—just so you can save a few lines that actually serve a much better purpose: Clearly telling what’s going on, without being in a single, long, and complicated line.
So my advice would be to keep it as it is. It makes more sense like that and clearly shows what’s going on.
I am however collecting the values. They all need to be converted and saved in the same structure as they were.
In that case, you still don't want a list comprehension, as that would mean creating a new list (for no real reason). Instead, just update the innermost list. To do that, you need to change the way you're iterating, though:
for sensor in my_list:
    for hum in sensor:
        for i, value in enumerate(hum):
            hum[i] = map(function, value)
This will update the inner list.
Alternatively, since value is actually a list of values, you can also replace the value list’s contents using the slicing syntax:
for sensor in my_list:
    for hum in sensor:
        for value in hum:
            value[:] = map(function, value)
Also one final note: if you are using Python 3, remember that map returns a lazy iterator rather than a list, so you need to convert it first using list(map(function, value)); or use a list comprehension for that part with [function(v) for v in value].
This is the right way to do it. You could use a list comprehension for this, but you shouldn't, both for readability and because it's probably not faster.
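For completeness, here is a minimal runnable sketch of the in-place approach. The conversion function and the data are made up, and it assumes the innermost lists hold the raw numbers; adjust the nesting depth to match your actual structure:

def convert(raw):
    # hypothetical conversion: raw reading -> percentage
    return round(raw / 1023 * 100, 1)

# sensors -> humidity channels -> raw readings (made-up data)
my_list = [
    [[100, 200], [300, 400]],
    [[500, 600], [700, 800]],
]

for sensor in my_list:
    for hum in sensor:
        # replace the contents in place so the surrounding structure is untouched
        hum[:] = [convert(v) for v in hum]

print(my_list)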
What I have is a dictionary of words and I'm generating objects that contain
(1) Original word (e.g. cats)
(2) Alphabetized word (e.g. acst)
(3) Length of the word
Without knowing the length of the longest word, is it possible to create an array (or, in Python, a list) such that, as I scan through the dictionary, it will append an object with x chars into a list in array[x]?
For example, when I encounter the word "a", it will append the generated object to the list at array[1]. Next, for aardvark, it will append the generated object to the list at array[8], etc.
I thought about creating an array of size 1 and then adding on to it, but I'm not sure how it would work.
For example: for the first word, a, it will append it to the list stored in array[1]. However, for the next word, aardvark, how am I supposed to check for/generate more spots in the list until it hits 8? If I append to the array, I need to give the append function an argument. But I can't give it just any argument, since I don't want to change previously entered values (e.g. 'a' in array[1]).
I'm trying to optimize my code for an assignment, so the alternative is going through the list a second time after I've determined the longest word. However, I think it would be better to do it as I alphabetize the words and create the objects such that I don't have to go through the lengthy dictionary twice.
Also, quick question about syntax: listOfStuff[x].append(y) will initialize/append to the list within listOfStuff at the value x with the value y, correct?
Store the lengths as keys in a dict rather than as indexes in a list. This is really easy if you use a defaultdict from the collections module - your algorithm will look like this:
from collections import defaultdict
results = defaultdict(list)
for word in words:
    results[len(word)].append(word)
This ties in to your second question: listOfStuff[x].append(y) will append to a list that already exists at listOfStuff[x]. It will not create a new one if that index hasn't already been initialised to a (possibly empty) list. If x isn't a valid index into the list (e.g., x=3 for a listOfStuff of length 2), you'll get an IndexError. If the index exists but holds something other than a list, you will probably get an AttributeError.
Using a dict takes care of the first problem for you - assigning to a non-existent dict key is always valid. Using a defaultdict extends this idea to also reading from a non-existent key - it will insert a default value given by calling the function you give the defaultdict when you create it (in this case, we gave it list, so it calls it and gets an empty list) into the dict the first time you use it.
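A tiny demonstration of that behaviour (key name here is illustrative):

from collections import defaultdict

d = defaultdict(list)
d["missing"].append(1)   # no KeyError: list() is called to supply a default []
print(d["missing"])      # [1]
print(dict(d))           # {'missing': [1]}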
If you can't use collections for some reason, the next best way is still to use dicts - they have a method called setdefault that works similarly to defaultdicts. You can use it like this:
results = {}
for word in words:
    results.setdefault(len(word), []).append(word)
as you can see, setdefault takes two arguments: a key and a default value. If the key already exists in the dict, setdefault just returns its current value as if you'd done results[key]. If that would be an error, however, it inserts the second argument into the dictionary at that key, and then returns it. This is a little bit clunkier to use than defaultdict, but when your default value is an empty list it is otherwise the same (defaultdict is better to use when your default is expensive to create, however, since it only calls the factory function as needed, but you need to precompute it to pass into setdefault).
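For example (the key and values are made up):

d = {}
d.setdefault("k", []).append(1)   # "k" absent: inserts [] and returns it
d.setdefault("k", []).append(2)   # "k" present: returns the existing list
print(d)                          # {'k': [1, 2]}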
It is technically possible to do this with nested lists, but it is ugly. You have to:
Detect the case that the list isn't big enough
Figure out how many more elements the list needs
Grow the list to that size
The most Pythonic way to do the first bit is to catch the error (something you could also do with dicts if setdefault and defaultdict didn't exist). The whole thing looks like this:
results = []
for word in words:
    try:
        results[len(word)]
    except IndexError:
        # Grow the list so that the new highest index is len(word)
        new_length = len(word) + 1
        difference = new_length - len(results)
        results.extend([] for _ in range(difference))
    finally:
        results[len(word)].append(word)
Stick with dicts to avoid this kind of mess. Lists are specifically optimised for the case where the exact numeric index of an element isn't meaningful outside of the list, which doesn't match your use case. This type of code is really common when there is a mismatch between what your code needs to do and what your data structures are good at, and it is worth learning as early as possible how to avoid it.
I have a defaultdict of dicts whose primary key is a timestamp in the string form 'YYYYMMDD HH:MM:SS'. The keys are entered sequentially. How do I access the last entered key, or the key with the latest timestamp?
Use an OrderedDict from the collections module if you simply need to access the last item entered. If, however, you need to maintain continuous sorting, you need to use a different data structure entirely, or at least an auxiliary one for the purposes of indexing.
Edit: I would add that, if accessing the final element is an operation that you have to do very rarely, it may be sufficient simply to sort the dict's keys and select the maximum. If you have to do this frequently, however, repeatedly sorting would become prohibitively expensive. Depending on how your code works, the simplest approach would probably be to simply maintain a single variable that, at any given point, contains the last key added and/or the maximum value added (i.e., is updated with each subsequent addition to the dict). If you want to maintain a record of additions that extends beyond just the last item, however, and don't require continuous sorting, an OrderedDict is ideal.
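For instance, a minimal sketch of the OrderedDict approach (the keys and values here are made up):

from collections import OrderedDict

events = OrderedDict()
events['20120627 21:20:23'] = {'reading': 1}
events['20120627 21:20:40'] = {'reading': 2}

# insertion order is preserved, so the last key entered is simply the last one
last_key = next(reversed(events))
print(last_key, events[last_key])

# or pop the most recently added item
key, value = events.popitem(last=True)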
Use OrderedDict rather than a built-in dict
You can try something like this:
>>> import time
>>> data = {'20120627 21:20:23': 'first', '20120627 21:20:40': 'last'}
>>> latest = lambda d: time.strftime('%Y%m%d %H:%M:%S', max(map(lambda x: time.strptime(x, '%Y%m%d %H:%M:%S'), d.keys())))
>>> data[latest(data)]
'last'
but it probably would be slow on large data sets.
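The same idea reads a bit more clearly with max(key=...), which also avoids rebuilding the string afterwards (a sketch using the data dict from above, not from the original answer):

from datetime import datetime

fmt = '%Y%m%d %H:%M:%S'
latest_key = max(data, key=lambda k: datetime.strptime(k, fmt))
print(data[latest_key])   # 'last'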
If you want to know which entry was added last (according to its timestamp), see the example below:
import datetime

format = '%Y%m%d %H:%M'
Dict = {'20010203 12:00': 'Dave',
        '20000504 03:00': 'Pete',
        '20020825 23:00': 'kathy',
        '20030102 01:00': 'Ray'}

myDict = {}
for key, val in Dict.items():
    # normalise each key to an ISO-style string so it sorts chronologically
    TIME = str(datetime.datetime.strptime(key, format))
    myDict[TIME] = val

# sort the (timestamp, value) pairs and take the last one
myDict = sorted(myDict.items(), key=lambda item: item[0])
print(myDict[-1])