Python: how to check whether an object is in a list?

I have an object to store data:
Vertex(key, children)
and a list to store these objects:
vertices = []
I'm using another list,
key_vertices = []
to store the keys of my vertices, so I can access them easily (without looping over every object) each time I need to check whether a vertex with a given key exists, like this:
if key not in self.key_vertices:
    # add a new key to the array
    self.key_vertices.append(key)
    # create a vertex object to store dependency information
    self.verteces.append(Vertex(key, children))
I think this is a bit complicated; maybe someone knows a better way to store multiple Vertex objects with the ability to easily check for and access them.
Thanks

Your example works fine; the only problem you could have is a performance issue with the in operator for lists, which is O(n).
If you don't care about the order of the keys (which is likely), just do this:
self.key_vertices = set()
then:
if key not in self.key_vertices:
    # add a new key to the set
    self.key_vertices.add(key)
    # create a vertex object to store dependency information
    self.verteces.append(Vertex(key, children))
You'll save a lot of time on the in operator, because in on a set is much faster thanks to key hashing.
And if you don't care about the order in self.verteces either, just use a dictionary; in that case, you probably don't need the key parameter in your Vertex structure at all.
self.verteces = dict()
if key not in self.verteces:
    # create a vertex object to store dependency information
    self.verteces[key] = Vertex(children)

When you need to check for membership, a list is not the best choice, since every object in the list may have to be checked.
If key is hashable, use a set.
If it's not hashable but is comparable, you would need a tree (not available in the standard library), so try to make it hashable instead.
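For example, here is a minimal sketch of making Vertex hashable by its key (assuming keys are unique and never change), so the vertices themselves can live in a set:
class Vertex:
    def __init__(self, key, children):
        self.key = key
        self.children = children

    def __hash__(self):
        # hash by key only (assumes the key is unique and immutable)
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Vertex) and self.key == other.key

vertices = set()
v = Vertex("a", ["b", "c"])
if v not in vertices:
    vertices.add(v)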

If I understand correctly, you want to check whether an element has already been added in O(1) (i.e. without having to check every element in the list).
The easiest way to do that is to use a set. A set is an unordered collection that lets you check whether an element exists in constant time, O(1). You can think of a set as a dict with keys only, but you can iterate over it just like a list:
for value in mySet:
    print(value)
print("hello" in mySet)
If you need an ordered list (most of the time you don't), your approach is pretty good, but I would use a set for the keys instead:
self.vertece_set = set()  # Init somewhere else ;)
if key not in self.vertece_set:
    # add a new key to the set
    self.vertece_set.add(key)
    # create a vertex object to store dependency information
    self.verteces.append(Vertex(key, children))

Related

Solutions for a Dynamic Infinite Tree Structure in Python

I am trying to build a tree structure, starting at a point 1, which can branch in infinitely many directions. Every point can path into infinitely many other points (1.1, 1.2, 1.3, ...), and each of those points can also path into infinitely many points (1.1.1, 1.2.1, 1.2.2, ...).
My plan was to store an object at every point and be able to refer to it by a position such as 1.1.1. I also decided to generate every point dynamically, so the tree starts at 1 and only branches when an object is created.
Since I tend to overcomplicate things, I used a nested dictionary, so I could refer to an object with dict[1][1]["data"], but I'm struggling with the use of an infinitely nested dictionary:
How do I use a dictionary when the number of "[1]" levels varies? (Think dict[1][1][1]....[1]["data"].)
I can simply loop through the dict to find the data, like:
for i in [1.1.1]:
    point = dict[i]
But I can't find a way to open new dictionary branches, or store data, when the number of "[1]" levels is unknown.
Basically, I want to know whether a simpler solution exists, and how to deal with too many nested "[]" brackets.
You might want a different way of retrieving values than using [], since as you said it's hard to do when you don't know how deep something is.
Instead you can use a simple recursive function, and use a list for your key instead of a string:
def fetch_field(subtree, key_list):
    if not key_list:
        return subtree["data"]
    return fetch_field(subtree[key_list[0]], key_list[1:])

key = "1.2.1.3"
# Instead of using a string, split it into a list:
key = key.split(".")
fetch_field(tree, key)
You can tweak the function to accept a string instead of a list if you like; I personally prefer working with a list instead of messing around with strings.
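If you also need to create branches on the fly (the other half of the question), a similar recursive helper can be sketched. store_field below is a hypothetical companion to fetch_field that uses setdefault to open missing levels:
def store_field(subtree, key_list, value):
    # walk (and create) nested dicts until the key list is exhausted,
    # then attach the payload under "data"
    if not key_list:
        subtree["data"] = value
        return
    child = subtree.setdefault(key_list[0], {})
    store_field(child, key_list[1:], value)

tree = {}
store_field(tree, "1.2.1".split("."), "some payload")
fetch_field(tree, "1.2.1".split("."))  # -> "some payload"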

Ignoring a value in tuple comparison but still retrieving it

I'm working with a set that contains tuples of the form (position, name) and need to check if a value already exists in the set for the name while ignoring the position.
Is there a way that I can use the in operator similar to value in my_set, ignoring the position variable in the tuple during comparison, but still retrieving it? Something similar to (_, value) in my_set or (*, value) in my_set), but those don't work, first one returning an incorrect value, and the second raising a SyntaxError.
Obviously I can use a loop or a generator comprehension like value in (tup[1] for tup in my_set), but that doesn't retrieve the position variable from that tuple, and I was curious if there was some form of one-liner comprehension that would do this.
You can do this in O(n) with the existing data structure (iterating the set), but for O(1) you'll have to change the data structure. You will need to build a lookup:
from collections import defaultdict

positions = defaultdict(list)
for position, name in my_set:
    positions[name].append(position)
Now this is an O(1) operation:
name in positions
Retrieving all positions for a name:
for pos in positions[name]:
    ...
If you want this to stay in sync with mutations of my_set, then you will need to add hooks that update positions at the same time as additions to and deletions from my_set. It might be better to rethink the underlying data structure entirely, for example by using a dict instead of a set in the first place.
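As a rough illustration of that last suggestion, here is a minimal sketch (the class and method names are made up) that replaces the set with a name-keyed dict, so membership checks and position retrieval are both O(1) on average:
from collections import defaultdict

class PositionIndex:
    # hypothetical replacement for the set of (position, name) tuples
    def __init__(self):
        self._by_name = defaultdict(list)

    def add(self, position, name):
        self._by_name[name].append(position)

    def remove(self, position, name):
        self._by_name[name].remove(position)
        if not self._by_name[name]:
            del self._by_name[name]

    def __contains__(self, name):
        return name in self._by_name

    def positions(self, name):
        return self._by_name.get(name, [])

index = PositionIndex()
index.add((3, 4), "foo")
print("foo" in index)          # True
print(index.positions("foo"))  # [(3, 4)]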

How to remove many elements in a very big dict?

I have a very large dict, and I want to delete many elements from it. Perhaps I should do this:
new_dict = {key: big_dict[key] for key in big_dict if check(big_dict[key])}
However, I don't have enough memory to keep both old_dict and new_dict in RAM. Is there any way to deal with this?
Added:
I can't delete the elements one by one; I need to run a test on the values to find which elements I want to delete.
I also can't delete elements in a for loop like:
for key in dic:
    if test(dic[key]):
        del dic[key]
It causes an error: the dict's size can't change while you're iterating over it.
My God... I can't even build a set to remember the keys to delete; there are too many keys...
I see; if the dict class doesn't have a function for this, perhaps the only way is to buy a new computer...
Here are some options:
Make a new 'dict' on disk, for which pickle and shelve may be helpful.
Iterate through the dict, building up a list of keys until it reaches a certain size, delete those keys, and then repeat the iteration, allowing yourself a bigger list each time (a sketch of this follows below).
Store the keys to delete in terms of their index in .keys(), which can be more memory efficient. This is OK as long as the dictionary is not modified between calls to .keys(). If about half of the elements are to be deleted, do this with a binary sequence (1 = delete, 0 = keep). If the vast majority of elements are to be deleted (or kept), store the appropriate key positions as integers in a list.
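A rough sketch of that second option (should_delete stands in for the asker's test function, and the batch size is an arbitrary placeholder):
def delete_in_batches(big_dict, should_delete, batch_size=100000):
    # repeatedly scan the dict, collecting at most batch_size keys per
    # pass, so the temporary key list stays small
    while True:
        doomed = []
        for key, value in big_dict.items():
            if should_delete(value):
                doomed.append(key)
                if len(doomed) >= batch_size:
                    break
        if not doomed:
            return
        # mutate only after the iteration over the dict has stopped
        for key in doomed:
            del big_dict[key]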
You could try iterating through the dictionary and deleting the elements you do not require with
del big_dict[key]
Note that you cannot delete entries while iterating over the dict itself; iterate over a snapshot of its keys (e.g. list(big_dict)) instead. This way you avoid making a second copy of the whole dictionary.
You can use
big_dict.pop("key", None)
Refer here:
How to remove a key from a Python dictionary?

How can I initialize and increment an undefined value within a list in Python 3?

What I have is a dictionary of words, and I'm generating objects that contain:
(1) The original word (e.g. cats)
(2) The alphabetized word (e.g. acst)
(3) The length of the word
Without knowing the length of the longest word, is it possible to create an array (or, in Python, a list) such that, as I scan through the dictionary, it will append an object with x chars to the list at array[x]?
For example, when I encounter the word "a", it will append the generated object to the list at array[1]. Next, for "aardvark", it will append the generated object to the list at array[8], etc.
I thought about creating an array of size 1 and then adding on to it, but I'm not sure how that would work.
For example: for the first word, "a", it will append it to the list stored at array[1]. However, for the next word, "aardvark", how am I supposed to check/generate more spots in the list until it hits 8? If I append to the array, I need to give the append function an argument. But I can't give it just any argument, since I don't want to change previously entered values (e.g. 'a' at array[1]).
I'm trying to optimize my code for an assignment, so the alternative is going through the list a second time after I've determined the longest word. However, I think it would be better to do it as I alphabetize the words and create the objects, so that I don't have to go through the lengthy dictionary twice.
Also, a quick question about syntax: listOfStuff[x].append(y) will initialize/append to the list within listOfStuff at index x with the value y, correct?
Store the lengths as keys in a dict rather than as indexes in a list. This is really easy if you use a defaultdict from the collections module - your algorithm will look like this:
from collections import defaultdict

results = defaultdict(list)
for word in words:
    results[len(word)].append(word)
This ties in to your second question: listOfStuff[x].append(y) will append to a list that already exists at listOfStuff[x]. It will not create a new one if that index hasn't already been initialised to a (possibly empty) list. If x isn't a valid index into the list (e.g. x = 3 into a listOfStuff of length 2), you'll get an IndexError. If the index exists but holds something other than a list, you will probably get an AttributeError.
Using a dict takes care of the first problem for you: assigning to a non-existent dict key is always valid. Using a defaultdict extends this idea to reading from a non-existent key as well: the first time you touch a missing key, it calls the function you gave the defaultdict when you created it (here we gave it list, so it calls list() and gets an empty list), inserts the result into the dict, and returns it.
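A quick demonstration of the difference between a plain dict and a defaultdict when reading a missing key:
from collections import defaultdict

plain = {}
auto = defaultdict(list)

# plain["missing"].append(1)   # would raise KeyError
auto["missing"].append(1)      # inserts an empty list first, then appends
print(auto["missing"])         # [1]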
If you can't use collections for some reason, the next best way is still to use dicts - they have a method called setdefault that works similarly to defaultdicts. You can use it like this:
results = {}
for word in words:
    results.setdefault(len(word), []).append(word)
As you can see, setdefault takes two arguments: a key and a default value. If the key already exists in the dict, setdefault just returns its current value, as if you'd written results[key]. If that would be an error, however, it inserts the second argument into the dictionary at that key and then returns it. This is a little clunkier to use than defaultdict, but when your default value is an empty list it is otherwise the same (defaultdict is better when your default is expensive to create, since it only calls the factory function as needed, whereas you have to precompute the value to pass into setdefault).
It is technically possible to do this with nested lists, but it is ugly. You have to:
Detect the case that the list isn't big enough
Figure out how many more elements the list needs
Grow the list to that size
The most Pythonic way to do the first part is to catch the error (something you could also do with dicts if setdefault and defaultdict didn't exist). The whole thing looks like this:
results = []
for word in words:
    try:
        results[len(word)]
    except IndexError:
        # Grow the list so that the new highest index is len(word)
        new_length = len(word) + 1
        difference = new_length - len(results)
        results.extend([] for _ in range(difference))
    finally:
        results[len(word)].append(word)
Stay with dicts to avoid this kind of mess. Lists are specifically optimised for the case where the exact numeric index of an element isn't meaningful outside the list, which doesn't match your use case. This type of code is really common when there is a mismatch between what your code needs to do and what your data structures are good at, and it is worth learning as early as possible how to avoid it.

Look up python dict value by expression

I have a dict that has unix epoch timestamps for keys, like so:
lookup_dict = {
    1357899: {},  # some dict of data
    1357910: {},  # some other dict of data
}
Except, you know, millions and millions and millions of entries. I'd like to subset this dict, over and over again. Ideally, I'd love to be able to write something like I can in R, like:
lookup_value = 1357900
dict_subset = lookup_dict[key >= lookup_value]
# dict_subset now contains {1357910: {}}
But I confess, I can't find any actual proof that this is something Python can do without having, one way or another, to iterate over every entry. If I understand Python correctly (and I might not), key lookup of the form key in dict is hash-based and thus very fast; is there any way to do a binary search on dict keys?
To do this without iterating, you're going to need the keys in sorted order. Then you just need to do a binary search for the first one >= lookup_value, instead of checking each one for >= lookup_value.
If you're willing to use a third-party library, there are plenty out there. The first two that spring to mind are bintrees (which uses a red-black tree, like C++, Java, etc.) and blist (which uses a B+Tree). For example, with bintrees, it's as simple as this:
dict_subset = lookup_dict[lookup_value:]
And this will be as efficient as you'd hope—basically, it adds a single O(log N) search on top of whatever the cost of using that subset. (Of course usually what you want to do with that subset is iterate the whole thing, which ends up being O(N) anyway… but maybe you're doing something different, or maybe the subset is only 10 keys out of 1000000.)
Of course there is a tradeoff. Random access to a tree-based mapping is O(log N) instead of "usually O(1)". Also, your keys obviously need to be fully ordered, instead of hashable (and that's a lot harder to detect automatically and raise nice error messages on).
If you want to build this yourself, you can. You don't even necessarily need a tree; just a sorted list of keys alongside a dict. You can maintain the list with the bisect module in the stdlib, as JonClements suggested. You may want to wrap up bisect to make a sorted list object—or, better, get one of the recipes on ActiveState or PyPI to do it for you. You can then wrap the sorted list and the dict together into a single object, so you don't accidentally update one without updating the other. And then you can extend the interface to be as nice as bintrees, if you want.
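Here is a rough sketch of that do-it-yourself approach with bisect (the class and method names are invented for illustration):
import bisect

class SortedKeyDict:
    # minimal sketch: a dict plus a parallel sorted key list, so a range
    # query only needs an O(log n) search to find where to start
    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def from_key(self, lookup_value):
        # yield (key, value) pairs for every key >= lookup_value
        start = bisect.bisect_left(self._keys, lookup_value)
        for key in self._keys[start:]:
            yield key, self._data[key]

d = SortedKeyDict()
d[1357899] = {}
d[1357910] = {}
print(dict(d.from_key(1357900)))  # {1357910: {}}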
The following code will also work:
some_time_to_filter_for = ...  # some unix time
# Create a new sub-dictionary
sub_dict = {key: val for key, val in lookup_dict.items()
            if key >= some_time_to_filter_for}
Basically, we iterate through all the items in the dictionary and, given a time to filter for, keep every key that is greater than or equal to that value, placing those entries into the new dictionary.
