Does python iterable imply countable?

I've stumbled upon an interesting case - only thing I am sure about is that I will get an iterable object.
What I really, and only, want to do is to count it.
I've searched whether iterable in Python implies countable, and I found various places claiming so, except the official docs.
So 2 questions arise:
In Python, does iterable => countable (number of items)? Or is it just very common to be so?
Is there a generic pythonic way to get the count from an iterable? This seems to be answered here https://stackoverflow.com/a/3345807/1835470 i.e. not without counting, but the author provided a pythonic one-liner:
sum(1 for _ in iterableObject)

Iterables do not support the len function. To get their length you have to convert them into something that does support it, or loop through them manually (exhausting them in the process) while counting, as in the snippet you posted.
As a result, the requirement of using an iterable and needing to know its length is somewhat conflicting. Plus, as @Veedrac says in the comments, iterables can have infinite length.
To sum up, if you need to know how many items are contained, you need a different data structure.
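For illustration, a minimal sketch (the helper name count_items is made up) contrasting the one-liner above with converting to a list first; note that both exhaust a one-shot iterator:
def count_items(iterable):
    """Count items without building an intermediate list."""
    return sum(1 for _ in iterable)

squares = (x * x for x in range(10))        # a generator: no len() support
print(count_items(squares))                 # 10
print(len(list(x * x for x in range(10))))  # 10 as well, but builds a list first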

Related

Why is it allowed for a list to append itself?

I noticed by accident that in Python you can write something like the following:
l = list()
for i in range(5):
    l.append(l)
If I print out what l is it looks like this:
[[...], [...], [...], [...], [...]]
And now I can do things like:
l[0][0][0][0]...[0]
which will return still the exact same list.
The error I had made in my code was not detected by my linter and so might have been difficult to spot. So my question is: why is it allowed for a list to append itself? What use can you make of it?
If it's not useful it should probably be disallowed.
Your criterion for not allowing a list to contain itself is that it isn't useful. But it's hard to prove that. Maybe nobody knows of a use yet, but that doesn't mean someone won't find one in the future. In the meantime, there is added cost to the language, both in explaining what exceptions there are to adding an object to a list and in implementing those exceptions.
If you could prove that it is harmful to allow adding a list to itself, then you would have a stronger case for making that change to the language. But "I accidentally typed the wrong variable name" is not a strong argument for harm.
Why is it allowed for a list to append itself? Remember that in Python, everything is a reference. A list is just a sequence of references to objects. Since a list is an object, it can be referenced by another list. When a list references an object, it's not as though that object is actually contained within the list now; it's just that a single element of the sequence is now a reference to that object. Though circular references are sometimes bad, they are sometimes useful, and nothing has actually been broken here. Remember, a list containing itself is really just a list that holds a reference to itself.
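A small sketch illustrating that point (added here for illustration):
l = []
l.append(l)             # the list now holds a reference to itself
print(l[0] is l)        # True: the first element is the very same list object
print(l[0][0][0] is l)  # True: indexing any number of times returns the same object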
Why is it allowed for a list to append itself?
Your assumption that it's an error and should not be allowed is incorrect. It's up to us as engineers to decide if it's a good fit for the situation we have or not. I'll turn this question around, why do you think it should be disallowed?
What use can you make of it?
Adding arrays to arrays is actually pretty common. I could imagine a situation where you have to start with 5 identical arrays, perform different calculations on each one, then compare the results. This would be a way to accomplish that.
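As a rough, hypothetical illustration of that kind of use (all names made up):
base = [1, 2, 3, 4, 5]
variants = [list(base) for _ in range(3)]          # a list holding three independent copies
variants[0] = [x * 2 for x in variants[0]]         # double each element
variants[1] = [x ** 2 for x in variants[1]]        # square each element
variants[2] = sorted(variants[2], reverse=True)    # reverse-sort
print(variants)                                    # compare the three results side by side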

Comparison, Python deque's or indexes?

I am required to access the leftmost element of a Python list, while popping elements from the left. One example of this usage could be the merge operation of a merge sort.
There are two good ways of doing it.
collections.deque
Using indexes, keep an increasing integer as I pop.
Both these methods seem efficient in complexity terms (I should mention that at the beginning of the program I need to convert the list to a deque, which adds an extra O(n)).
So in terms of style and speed, which is best for Python usage? I did not see an official recommendation and did not encounter a similar question, which is why I am asking this.
Thanks in advance.
Related: Time complexity
Definitely collections.deque. First, it's already written, so you don't have to write it. Second, it's written in C, so it is probably going to be much faster than a Python re-implementation. Third, using an index leaves the consumed head of the list sitting unused, whereas a deque is cleverer about this. Appending to a standard list while only increasing the starting index on each left pop is very inefficient, because the list keeps growing and has to be reallocated more often.
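A minimal sketch of the two approaches, assuming the input arrives as a plain list:
from collections import deque

data = list(range(10))

# Approach 1: one O(n) conversion, then O(1) pops from the left.
dq = deque(data)
while dq:
    leftmost = dq.popleft()

# Approach 2: keep an increasing index; the consumed head stays in the list.
i = 0
while i < len(data):
    leftmost = data[i]
    i += 1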

Understanding len function with iterators

Reading the documentation I have noticed that the built-in function len doesn't support all iterables but just sequences and mappings (and sets). Before reading that, I always thought that the len function used the iteration protocol to evaluate the length of an object, so I was really surprised reading that.
I read the already-posted questions (here and here) but I am still confused; I'm still not getting the real reason why len isn't allowed to work with all iterables in general.
Is it a more conceptual/logical reason than an implementation one? I mean, when I'm asking the length of an object, I'm asking for one property (how many elements it has), a property that objects such as generators don't have, because they do not have elements inside; they produce elements.
Furthermore, generator objects can yield infinitely many elements, leading to an undefined length, something that cannot happen with other objects such as lists, tuples, dicts, etc...
So am I right, or are there more insights/something more that I'm not considering?
The biggest reason is that it reduces type safety.
How many programs have you written where you actually needed to consume an iterable just to know how many elements it had, throwing away anything else?
I, in quite a few years of coding in Python, have never needed that. It's a nonsensical operation in normal programs. An iterator may not have a length (e.g. infinite iterators, or generators that expect inputs via send()), so asking for it doesn't make much sense. The fact that len(an_iterator) produces an error means that you can find bugs in your code. You can see that in a certain part of the program you are calling len on the wrong thing, or maybe that your function actually needs a sequence instead of an iterator as you expected.
Removing such errors would create a new class of bugs where people, calling len, erroneously consume an iterator, or use an iterator as if it were a sequence without realizing.
If you really need to know the length of an iterator, what's wrong with len(list(iterator))? The extra 6 characters? It's trivial to write your own version that works for iterators, but, as I said, 99% of the time this simply means that something is wrong with your code, because such an operation doesn't make much sense.
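To make the consumption point concrete, a small example (added here for illustration):
it = iter(range(5))
print(len(list(it)))   # 5, but this exhausts the iterator
print(list(it))        # []: nothing is left to iterate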
The second reason is that, with that change, you are violating two nice properties of len that currently hold for all (known) containers:
It is known to be cheap on all containers ever implemented in Python (all built-ins, standard library, numpy & scipy and all other big third party libraries do this on both dynamically sized and static sized containers). So when you see len(something) you know that the len call is cheap. Making it work with iterators would mean that suddenly all programs might become inefficient due to computations of the length.
Also note that you can, trivially, implement O(1) __len__ on every container. The cost to pre-compute the length is often negligible, and generally worth paying.
The only exception would be if you implement immutable containers that have part of their internal representation shared with other instances (to save memory). However, I don't know of any implementation that does this, and most of the time you can achieve better than O(n) time anyway.
In summary: currently everybody implements __len__ in O(1), and it's easy to continue to do so, so there is an expectation for calls to len to be O(1), even if it's not part of the standard. Python developers intentionally avoid C/C++-style legalese in their documentation and trust the users. In this case, if your __len__ isn't O(1), it's expected that you document that.
It is known to be not destructive. Any sensible implementation of __len__ doesn't change its argument. So you can be sure that len(x) == len(x), or that n = len(x);len(list(x)) == n.
Even this property is not defined in the documentation, however it's expected by everyone, and currently, nobody violates it.
Such properties are good, because you can reason and make assumptions about code using them.
They can help you ensure the correctness of a piece of code, or understand its asymptotic complexity. The change you propose would make it much harder to look at some code and understand whether it's correct or what its complexity would be, because you would have to keep the special cases in mind.
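As a sketch of the "pre-compute the length" point above (the class is made up for illustration), a container can maintain a counter so __len__ never has to traverse anything:
class LinkedStack:
    """Toy singly linked stack; __len__ is O(1) because the size is tracked incrementally."""
    def __init__(self):
        self._head = None
        self._size = 0
    def push(self, value):
        self._head = (value, self._head)
        self._size += 1
    def pop(self):
        value, self._head = self._head
        self._size -= 1
        return value
    def __iter__(self):
        node = self._head
        while node is not None:
            value, node = node
            yield value
    def __len__(self):
        return self._size   # no traversal needed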
In summary, the change you are proposing has one, really small, pro: saving few characters in very particular situations, but it has several, big, disadvantages which would impact a huge portion of existing code.
Another minor reason: if len consumed iterators, I'm sure some people would start to abuse this for its side effects (replacing the already ugly use of map or list comprehensions for side effects). Suddenly people could write code like:
len(print(something) for ... in ...)
to print text, which is really just ugly. It doesn't read well. Stateful code should be relegated to statements, since they provide a visual cue of side effects.

Python - Why do the find and index methods work differently?

In Python, find and index are very similar methods, used to look up values in a sequence type. find is used for strings, while index is for lists and tuples. They both return the lowest index (the index furthest to the left) at which the supplied argument is found.
For example, both of the following would return 1:
"abc".find("b")
[1,2,3].index(2)
However, one thing I'm somewhat confused about is that, even though the two methods are very similar, and fill nearly the same role, just for different data types, they have very different reactions to attempting to find something not in the sequence.
"abc".find("d")
Returns -1, to signify 'not found', while
[1,2,3].index(4)
raises an exception.
Basically, why do they have different behaviors? Is there a particular reason, or is it just a weird inconsistency for no particular reason?
Now, I'm not asking how to deal with this – obviously, a try/except block, or a conditional in statement, would work. I'm simply asking what the rationale was for making the behavior in just that particular case different. To me, it would make more sense to have a particular behavior to say not found, for consistency's sake.
Also, I'm not asking for opinions on whether the reason is a good reason or not – I'm simply curious about what the reason is.
Edit: Some have pointed out that strings also have an index method, which works like the index method for lists, which I'll admit I didn't know, but that just makes me wonder why, if strings have both, lists only have index.
This has always been annoying ;-) Contrary to one answer, there's nothing special about -1 with respect to strings; e.g.,
>>> "abc"[-1]
'c'
>>> [2, 3, 42][-1]
42
The problem with find() in practice is that -1 is in fact not special as an index. So code using find() is prone to surprises when the thing being searched for is not found - it was noted even before Python 1.0.0 was released that such code often went on to do a wrong thing.
No such surprises occur when index() is used instead - an exception can't be ignored silently. But setting up try/except for such a simple operation is not only annoying, it adds major overhead (extra time) for what "should be" a fast operation. Because of that, string.find() was added in Python 0.9.9 (before then, only string.index() could be used).
So we have both, and that persists even into Python 3. Pick your poison :-)
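For illustration, the two error-handling styles look like this:
s = "abc"

# str.find(): test the -1 sentinel explicitly (easy to forget, hence the surprises).
pos = s.find("d")
if pos == -1:
    print("not found")

# str.index() / list.index(): the failure cannot pass silently, but needs try/except.
try:
    pos = s.index("d")
except ValueError:
    print("not found")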

Python documentation: iterable many times?

In documenting a Python function, I find it more Pythonic to say:
def Foo(i):
    """i: An iterable containing…"""
…rather than…
def Foo(i):
    """i: A list of …"""
when i really doesn't need to be a list. (Foo will happily operate on a set, tuple, etc.) The problem is generators. Generators typically allow only one iteration. Most functions are OK with generators or iterables that only allow a single pass, but some are not.
For those functions that cannot accept generators/things that can only be iterated once, is there a clear, consistent Python term for "thing that can be iterated more than once"?
The Python glossary for iterable and iterator seem to have a "once, but maybe more if you're lucky" definition.
I don't know of a standard term for this, at least not offhand, but I think "reusable iterable" would get the point across if you need a short phrase.
In practice, it's generally possible to structure your function so that you don't need to iterate over i more than once. Alternatively, you can create a list out of the iterable and then iterate over the list as many times as you want; or you can use itertools.tee to get multiple independent "copies" of the iterator. That lets you accept a generator even if you do need to use it more than once.
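A sketch of both workarounds (the function names are invented for the example):
import itertools

def normalized(i):
    """i: an iterable of numbers; may be a one-shot generator."""
    first, second = itertools.tee(i, 2)     # two independent iterators over the same data
    total = sum(first)                      # first pass
    return [x / total for x in second]      # second pass

def normalized_via_list(i):
    items = list(i)                         # materialize once, reuse freely
    total = sum(items)
    return [x / total for x in items]

print(normalized(x for x in range(1, 4)))   # [0.166..., 0.333..., 0.5]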
This is probably more a matter of style and preference than anything else, yet... I have a different take on my documentation: I always write the docstring according to the expected input in the context of the program.
Example: if I wrote a function that expects to go over the keys of a dictionary and ignore its values, I write:
arg : a dictionary of...
even if for e in arg: would work with other iterables. I chose to do so because, within the context of my code, I don't care whether the function would still work... I care more that whoever reads the documentation understands how that function is meant to be used.
On the other hand, if I am writing a utility function that can cope with a wide spectrum of iterables by design, I go one of these two ways:
document what kind of exception will be raised under certain conditions [e.g., "Raise TypeError if the iterable can't be iterated more than once"]
perform some pre-emptive argument handling that makes the function compatible with 'once-only' iterables (a sketch follows below).
In other words, I try to either make my function solid enough to handle edge cases, or to be very outspoken on its limitations.
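A hypothetical sketch of that second option (the function is made up):
def summarize(values):
    """values: an iterable of numbers; a once-only generator is fine."""
    values = list(values)            # normalize up front so multiple passes are safe
    if not values:
        raise ValueError("values must not be empty")
    return min(values), max(values), sum(values) / len(values)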
Again: there's nothing wrong with the approach you want to take, but I consider this one of the cases in which "explicit is better than implicit": documentation that mentions "reusable iterable" is definitely accurate, but the adjective could easily be overlooked.
HTH!
