Setting values in large tuples efficiently

Setting values in large tuples efficiently - python

I am using ROS and I am currently trying to munipulate a costmap, so essentially I am wanting to change individual values in a tuple that has a length in this situation of approaching 7 digits.
See http://docs.ros.org/api/nav_msgs/html/msg/OccupancyGrid.html
Originally I tried just turning this long tuple into a list then changing the values and then turning this back into a tuple, as you can imagine this extremely inefficient. I need this to be able to run quickly as it needs to update the costmap often for dynamic object avoidance.
Is there a way that I can change individual values in a tuple efficiently?

Sadly, this is simply a limitation of ROS's python message data model. Array-like structures are always deserialized as tuple for performance reasons, except for lists of bool for some reason. And tuple is immutable.
However, if you were in C++ space, you would be receiving a const OccupancyGridConstPtr& anyway, so it would still be just as immutable. Or you could have registered the callback as OccupancyGrid message and get pass-by-value, but you're just moving the copy to method-call-time. There's no avoiding the copy if you intend on modifying the grid, wether you're in Python or C++.
There is no need to convert back to tuple however, ROS's python message serialization accepts either list or tuple.
You can gain quite a bit of efficiency as well if you can do some of your processing work during that copy (saves an iteration over the grid). Though I don't know exactly what you're trying to do, so I don't know the feasibility of that.

The fact that tuples are immutable is supposed to be a feature, not a bug.
So if you need to make changes, the data probably ought not to be in tuple form in the first place.
Can't you have a list from begin to end? (I do not know ROS)
If that is impossible, I'd say you either have a serious problem with the software design or you are trying to solve the wrong problem.

Related

Why does Python allow mixed type lists?

Is there a reason that Python allows lists to contain multiple types? If I had a mixed collection of objects I would think the safe data type to use would be a tuple. Also, I find it strange that list methods (like sort) can be called on mixed lists so I assume there must be a good reason for allowing this. It appears at first glance that this would make writing type safe functions much more difficult.

The difference between a tuple and a list is that lists are mutable and tuples are not. So its not about data type safety, its about whether or not you want the elements to be able to be changed

Understanding len function with iterators

Reading the documentation I have noticed that the built-in function len doesn't support all iterables but just sequences and mappings (and sets). Before reading that, I always thought that the len function used the iteration protocol to evaluate the length of an object, so I was really surprised reading that.
I read the already-posted questions (here and here) but I am still confused, I'm still not getting the real reason why not allow len to work with all iterables in general.
Is it a more conceptual/logical reason than an implementational one? I mean when I'm asking the length of an object, I'm asking for one property (how many elements it has), a property that objects as generators don't have because they do not have elements inside, the produce elements.
Furthermore generator objects can yield infinite elements bring to an undefined length, something that can not happen with other objects as lists, tuples, dicts, etc...
So am I right, or are there more insights/something more that I'm not considering?

The biggest reason is that it reduces type safety.
How many programs have you written where you actually needed to consume an iterable just to know how many elements it had, throwing away anything else?
I, in quite a few years of coding in Python, never needed that. It's a non-sensical operation in normal programs. An iterator may not have a length (e.g. infinite iterators or generators that expects inputs via send()), so asking for it doesn't make much sense. The fact that len(an_iterator) produces an error means that you can find bugs in your code. You can see that in a certain part of the program you are calling len on the wrong thing, or maybe your function actually needs a sequence instead of an iterator as you expected.
Removing such errors would create a new class of bugs where people, calling len, erroneously consume an iterator, or use an iterator as if it were a sequence without realizing.
If you really need to know the length of an iterator, what's wrong with len(list(iterator))? The extra 6 characters? It's trivial to write your own version that works for iterators, but, as I said, 99% of the time this simply means that something with your code is wrong, because such an operation doesn't make much sense.
The second reason is that, with that change, you are violating two nice properties of len that currently hold for all (known) containers:
It is known to be cheap on all containers ever implemented in Python (all built-ins, standard library, numpy & scipy and all other big third party libraries do this on both dynamically sized and static sized containers). So when you see len(something) you know that the len call is cheap. Making it work with iterators would mean that suddenly all programs might become inefficient due to computations of the length.
Also note that you can, trivially, implement O(1) __len__ on every container. The cost to pre-compute the length is often negligible, and generally worth paying.
The only exception would be if you implement immutable containers that have part of their internal representation shared with other instances (to save memory). However, I don't know of any implementation that does this, and most of the time you can achieve better than O(n) time anyway.
In summary: currently everybody implements __len__ in O(1) and it's easy to continue to do so. So there is an expectation for calls to len to be O(1). Even if it's not part of the standard. Python developers intentionally avoid C/C++'s style legalese in their documentation and trust the users. In this case, if your __len__ isn't O(1), it's expected that you document that.
It is known to be not destructive. Any sensible implementation of __len__ doesn't change its argument. So you can be sure that len(x) == len(x), or that n = len(x);len(list(x)) == n.
Even this property is not defined in the documentation, however it's expected by everyone, and currently, nobody violates it.
Such properties are good, because you can reason and make assumptions about code using them.
They can help you ensure the correctness of a piece of code, or understand its asymptotic complexity. The change you propose would make it much harder to look at some code and understand whether it's correct or what would be it's complexity, because you have to keep in mind the special cases.
In summary, the change you are proposing has one, really small, pro: saving few characters in very particular situations, but it has several, big, disadvantages which would impact a huge portion of existing code.
An other minor reason. If len consumes iterators I'm sure that some people would start to abuse this for its side-effects (replacing the already ugly use of map or list-comprehensions). Suddenly people can write code like:
len(print(something) for ... in ...)
to print text, which is really just ugly. It doesn't read well. Stateful code should be relagated to statements, since they provide a visual cue of side-effects.

Lists are for homogeneous data and tuples are for heterogeneous data... why?

I feel like this must have been asked before (probably more than once), so potential apologies in advance, but I can't find it anywhere (here or through Google).
Anyway, when explaining the difference between lists and tuples in Python, the second thing mentioned, after tuples being immutable, is that lists are best for homogeneous data and tuples are best for heterogeneous data. But nobody seems to think to explain why that's the case. So why is that the case?

First of all, that guideline is only sort of true. You're free to use tuples for homogenous data and lists for heterogenous data, and there may be cases where that's a fine thing to do. One important case is if you need the collection to the hashable so you can use it as a dictionary key; in this case you must use a tuple, even if all the elements are homogenous in nature.
Also note that the homogenous/heterogenous distinction is really about the semantics of the data, not just the types. A sequence of a name, occupation, and address would probably be considered heterogenous, even though all three might be represented as strings. So it's more important to think about what you're going to do with the data (i.e., will you actually treat the elements the same) than about what types they are.
That said, I think one reason lists are preferred for homogenous data is because they're mutable. If you have a list of several things of the same kind, it may make sense to add another one to the list, or take one away; when you do that, you're still left with a list of things of the same kind.
By contrast, if you have a collection of things of heterogenous kinds, it's usually because you have a fixed structure or "schema" to them (e.g., the first one is an ID number, the second is a name, the third is an address, or whatever). In this case, it doesn't make sense to add or remove an element from the collection, because the collection is an integrated whole with specified roles for each element. You can't add an element without changing your whole schema for what the elements represent.
In short, changes in size are more natural for homogenous collections than for heterogenous collections, so mutable types are more natural for homogenous collections.

The difference is philosophical more than anything.
A tuple is meant to be a shorthand for fixed and predetermined data meanings. For example:
person = ("John", "Doe")
So, this example is a person, who has a first name and last name. The fixed nature of this is the critical factor. Not the data type. Both "John" and "Doe" are strings, but that is not the point. The benefit of this is unchangeable nature:
You are never surprised to find a value missing. person always has two values. Always.
You are never surprised to find something added. Unlike a dictionary, another bit of code can't "add a new key" or attribute
This predictability is called immutability
It is just a fancy way of saying it has a fixed structure.
One of the direct benefits is that it can be used as a dictionary key. So:
some_dict = {person: "blah blah"}
works. But:
da_list = ["Larry", "Smith"]
some_dict = {da_list: "blah blah"}
does not work.
Don't let the fact that element reference is similar (person[0] vs da_list[0]) throw you off. person[0] is a first name. da_list[0] is simply the first item in a list at this moment in time.

It's not a rule, it's just a tradition.
In many languages, lists must be homogenous and tuples must be fixed-length. This is true of C++, C#, Haskell, Rust, etc. Tuples are used as anonymous structures. It is the same way in mathematics.
Python's type system, however, does not allow you to make these distinctions: you can make tuples of dynamic length, and you can make lists with heterogeneous data. So you are allowed to do whatever you want with lists and tuples in Python, it just might be surprising to other people reading your code. This is especially true if the people reading your code have a background in mathematics or are more familiar with other languages.

Lists are often used for iterating over them, and performing the same operation to every element in the list. Lots of list operations are based on that. For that reason, it's best to have every element be the same type, lest you get an exception because an item was the wrong type.
Tuples are more structured data; they're immutable so if you handle them correctly you won't run into type errors. That's the data structure you'd use if you specifically want to combine multiple types (like an on-the-fly struct).

Fastest Get Python Data Structures

I am developing AI to perform MDP, I am getting states(just integers in this case) and assigning it a value, and I am going to be doing this a lot. So I am looking for a data structure that can hold(no need for delete) that information and will have a very fast get/update function. Is there something faster than the regular dictionary? I am looking for anything really so native python, open sourced, I just need fast getting.

Using a Python dictionary is the way to go.

You're saying that all your keys are integers? In that case, it might be faster to use a list and just treat the list indices as the key values. However, you'd have to make sure that you never delete or add list items; just start with as many as you think you'll need, setting them all equal to None, as shown:
mylist = [None for i in xrange(totalitems)]
Then, when you need to "add" an item, just set the corresponding value.
Note that this probably won't actually gain you much in terms of actual efficiency, and it might be more confusing than just using a dictionary.
For 10,000 items, it turns out (on my machine, with my particular test case) that accessing each one and assigning it to a variable takes about 334.8 seconds with a list and 565 seconds with a dictionary.

If you want a rapid prototype, use python. And don't worry about speed.
If you want to write fast scientific code (and you can't build on fast native libraries, like LAPACK for linear algebra stuff) write it in C, C++ (maybe only to call from Python). If fast instead of ultra-fast is enough, you can also use Java or Scala.

List or Dictionary for storing game map

So I currently have a 2d list of objects that define the map of a game where each object represents a tile on that map. As I was repurposing the code for something else, I wondered if it would make more sense to use a dictionary to store the map data or to continue using a list. With a list, the indices represent the x and y of the map, whereas in a dictionary, a (x,y) tuple would be the keys for the dictionary.
The reason I ask is because the map changing is a rare event, so the data is fairly static, and as far as i know, the fairly constant lookups will be faster in dictionaries. It should also simplify looping through the map to draw it. Mostly I think using dictionaries will simplify accessing the data, though I'm not sure that will be the case in all cases.
Are these benefits worth the additional memory that I assume the dictionary will take up? or am I even right about the benefits being benefits?
EDIT
I know that the current method works, its was moreso to whether or not it would make sense to switch in order to have cleaner looking code and to find any potential drawbacks.
Stuff like looping through the array would go from something like
for i in range(size[0]):
for e in range(size[1]):
thing.blit(....using i and e)
to
for i, e in dict.items():
i.blit(....using i and e)
or looking up a dict item would be
def get(x, y):
if (x in range(size[0])) and (y in range(size[1])):
return self.map[x][y].tile
to
def get(item):
return self.dict.get(item)
its not much, but its somewhat cleaner, and if its not any slower and there are no other drawbacks i see no reason not to.

I would be wary of premature optimization.
Does your current approach have unacceptable performance? Does the data structure you're using make it harder to reason about or write your code?
If there isn't a specific problem you need to solve that can't be addressed with your current architecture, I would be wary about changing it.

This is a good answer to reference about the speed and memory usage of python lists vs dictionarys: https://stackoverflow.com/a/513906/1506904
Until you get to an incredibly large data set it is most likely that your current method, if it is working well for you, will be perfectly suitable.

I'm not sure that you'll get "the right" answer for this, but when I created the *Game of Life *in Python I used a dict. Realistically there should not be a substantial difference between the lookup cost of multi-dimensional lists and the lookup cost in a dict (both are O(1)), but if you're using a dict then you don't need to worry about instantiating the entire game-board. In chess, this means that you are only creating 32 pieces instead of 64 squares and 32 pieces. In the game of Go, on the other hand you can create only 1 object instead of 361 list-cells.
That said, in the case of the dict you will need to instantiate the tuples. If you can cache those (or only iterate the dict's keys) then maybe you will get the best of all worlds.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.