Python iter() time complexity?

I was looking up an efficient way to retrieve an (any) element from a set in Python and came across this method:
anyElement = next(iter(SET))
What exactly happens when you generate an iterator out of a container such as a set? Does it simply create a pointer to the location of the object in memory and move that pointer whenever next is called? Or does it convert the set to a list then create an iterator out of that?
My main concern is that, if it were the latter, iter() would be an O(n) operation. At that point it would be better to just pop an item from the set, store the popped item in a variable, then re-insert it back into the set.
Thanks for any information in advance!

Sets are iterable, but they don't have a .__next__() method, so iter() calls the set instance's .__iter__() method, which returns an iterator that does have a __next__() method.
As this is a thin wrapper around an O(1) call, it operates in O(1) time: creating the iterator does not copy the set, and each call to next() is O(1).
https://wiki.python.org/moin/TimeComplexity
See also Retrieve an arbitrary key from python3 dict in O(1) time for an extended answer on .__next__()!
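To see this in practice, a minimal sketch (the exact iterator type name is a CPython implementation detail):
s = {'a', 'b', 'c'}

it = iter(s)         # calls s.__iter__(); O(1), no copy of the set is made
print(type(it))      # <class 'set_iterator'> in CPython

first = next(it)     # calls it.__next__(); O(1) per element
print(first in s)    # True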

Related

pyspark groupByKey's Iterable object (ResultIterable) what are the advantages of this?

I'm not finding anything helpful online regarding the result structure after the groupByKey transformation. What can I do with the ResultIterable object after groupByKey? I would have expected a list returned with the key. I can convert it to a list, but I'm not sure if I'm missing something.
what are the advantages of this?
Serialization
A special result iterable. This is used because the standard iterator can not be pickled
What can I do with the "ResultIterable"
The same things you can do with any Iterable object:
class ResultIterable(collections.Iterable):
specifically, you can assume that it implements the __iter__ dunder method, which means it can be iterated, converted to another collection, and used wherever an iterable object is expected.
I would have expected a list
A list requires a specific implementation of the collection. An Iterable allows other options, including larger-than-memory collections, and the specific implementation can be changed whenever needed.
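For illustration, a minimal sketch (assumes a running SparkContext named sc; the sample data is made up):
rdd = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])
grouped = rdd.groupByKey()          # each value is now a ResultIterable

# Iterate it directly, or materialize it explicitly when a list is really needed:
as_lists = grouped.mapValues(list).collect()
# e.g. [('a', [1, 2]), ('b', [3])]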

How to map a function to each dictionary element?

I have a dictionary self.what_to_build
I am iterating over each element and applying another method to each element, in the following way:
[self.typeBuild(obj_type,dest) for obj_type,dest in self.what_to_build.items()]
It is my understanding that this builds a list in memory. While there is no real impact on the program, I would like to refrain from this; I really do not need the list, just to apply the method.
How would I do this same mapping, in the most Pythonic way, without a list comprehension?
Just use a regular loop:
for obj_type, dest in self.what_to_build.items():
    self.typeBuild(obj_type, dest)
A list comprehension indeed creates a list object with the return values of the self.typeBuild() calls, which is a waste of CPU and memory if you don't need those return values.
Don't get too hung up trying to write 'compact' code; readability is found in just the right level of verbosity.

Set changed size during iteration

I'm new to Python, coming from a C++ background. I was just playing around with sets, trying to calculate prime numbers, and got a "Set changed size during iteration" error.
How internally does python know the set changed size during iteration?
Is it possible to do something similar in user defined objects?
The Pythonic way to filter sets, lists or dicts is with set/list/dict comprehensions:
your_filtered_set = {elem for elem in original_set if condition(elem)}
It's trivial to do so with a user-defined object: just set a flag each time you modify the object, and have the iterator check that flag each time it tries to retrieve an item.
Generally, you should not modify a set while iterating over it, as you risk missing an item or getting the same item twice.
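A minimal sketch of that "set a flag" idea, implemented as a version counter (Bag and BagIterator are hypothetical names):
class Bag:
    def __init__(self, items=()):
        self._items = list(items)
        self._version = 0              # bumped on every mutation

    def add(self, item):
        self._items.append(item)
        self._version += 1

    def __iter__(self):
        return BagIterator(self)

class BagIterator:
    def __init__(self, bag):
        self._bag = bag
        self._version = bag._version   # snapshot taken when iteration starts
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._version != self._bag._version:
            raise RuntimeError("Bag changed size during iteration")
        if self._index >= len(self._bag._items):
            raise StopIteration
        item = self._bag._items[self._index]
        self._index += 1
        return item

b = Bag([1, 2, 3])
for x in b:
    b.add(99)   # raises RuntimeError on the next call to __next__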

Sorting a list of lists with itertools imap in Python

I'm wondering what's happening when I execute this code, and also whether there is a better way to accomplish the same task. Is a list of lists being made in memory to perform the sort, and then bar assigned to be an iterator over foo.values()? Or possibly foo.values() is sorted in the allotted dictionary memory space (seems unlikely)?
Imagine the first value in each list, the integer, refers to a line number in a file. I want to open the file and update only the lines referenced in the foo.values() lists with the rest of the data in the list (e.g. update line 1 with the strings '123' and '097').
>>> from itertools import imap
>>> foo = {'2134':[1, '123', '097'], '6543543':[3, '1'], '12315':[2, '454']}
>>> bar = imap([].sort(), foo.values())
Thanks~
First, you're passing [].sort(), which is just None, as the first argument to imap, meaning it's doing nothing at all. As the docs explain: "If function is set to None, then imap() returns the arguments as a tuple."
To pass a callable to a higher-order function like imap, you have to pass the callable itself, not call it and pass the result.
Plus, you don't want [].sort here; that's a callable with no arguments that just sorts an empty list, which is useless.
You probably wanted list.sort, the unbound method, which is a callable with one argument that will sort whatever list it's given.
So, if you did that, what would happen is that you'd create an iterator that, if you iterated it, would generate a bunch of None values and, as a side effect, sort each list in foo.values(). No new lists would be created anywhere, because list.sort mutates the list in place and returns None.
But since you don't ever iterate it anyway, it hardly matters what you put into imap; what it actually does is effectively nothing.
Generally, abusing map/imap/comprehensions/etc. for side-effects of the expressions is a bad idea. An iterator that generates useless values, but that you have to iterate anyway, is a recipe for confusion at best.
The simple thing to do here is to just use a loop:
for value in foo.values():
    value.sort()
Or, instead of sorting in-place, generate new sorted values:
bar = imap(sorted, foo.values())
Now, as you iterate bar, each list will be sorted and given to you, so you can use it. If you iterate this, it will generate a sorted list in memory for each list… but only one will ever be alive at a time (unless you explicitly stash them somewhere).
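To make the lazy behaviour concrete, a small Python 2 sketch (imap only exists in Python 2's itertools):
from itertools import imap

foo = {'2134': [1, '123', '097'], '6543543': [3, '1'], '12315': [2, '454']}
for sorted_value in imap(sorted, foo.values()):
    # each sorted copy is built lazily, one per loop pass
    print sorted_value   # e.g. [1, '097', '123']; ints sort before strings in Python 2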

Correct way to iterate twice over a list?

What is the correct way to perform multiple iteration over a container? From python documentation:
Iterator - A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
The intention of the protocol is that once an iterator's next() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken. (This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.)
If I have this code:
slist = [1,2,3,4]
rlist = reversed(slist)
list(rlist)
#[4,3,2,1]
tuple(rlist)
#()
What would be the easiest and most correct way to iterate over 'rlist' twice?
rlist = list(reversed(slist))
Then iterate as often as you want. This trick applies more generally; whenever you need to iterate over an iterator multiple times, turn it into a list. Here's a code snippet that I keep copy-pasting into different projects for exactly this purpose:
import collections

def tosequence(it):
    """Turn iterable into a sequence, avoiding a copy if possible."""
    if not isinstance(it, collections.Sequence):
        it = list(it)
    return it
(Sequence is the abstract type of lists, tuples and many custom list-like objects.)
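Usage, for illustration:
>>> seq = tosequence(reversed([1, 2, 3]))   # iterator, so a list copy is made
>>> list(seq), tuple(seq)                   # now safe to iterate twice
([3, 2, 1], (3, 2, 1))
>>> t = (1, 2, 3)
>>> tosequence(t) is t                      # already a Sequence: no copy
True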
I wouldn't store the list twice; if you cannot combine both passes into a single iteration, then I would do this:
slist = [1, 2, 3, 4]
for element in reversed(slist):
    print element  # do first iteration stuff
for element in reversed(slist):
    print element  # do second iteration stuff
Just think of reversed() as setting up a reverse iterator over slist; creating that iterator is cheap. That being said, if you only ever need the list in reversed order, I would reverse it once and store it like that.
What is the correct way to perform multiple iteration over a container?
Just do it twice in a row. No problem.
What would be the easiest and most correct way to iterate over 'rlist' twice?
See, the reason that isn't working for you is that rlist isn't "a container".
Notice how
list(slist) # another copy of the list
tuple(slist) # still works!
So, the simple solution is to just ensure you have an actual container of items if you need to iterate multiple times:
rlist = list(reversed(slist)) # we store the result of the first iteration
# and then that result can be iterated over multiple times.
If you really must not store the items, try itertools.tee. But note that you won't really avoid storing the items if you need to complete one full iteration before starting the next. In the general case, storage is really unavoidable under those restrictions.
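A minimal sketch with itertools.tee, for completeness:
from itertools import tee

slist = [1, 2, 3, 4]
it1, it2 = tee(reversed(slist))   # two independent iterators over one pass

print list(it1)   # [4, 3, 2, 1]
print list(it2)   # [4, 3, 2, 1] -- tee buffered the items internally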
Why don't you simply reverse the original list in-place (slist.reverse()), then iterate over it as many times as you wish, and finally reverse it again to obtain the original list once again?
If this doesn't work for you, the best solution for iterating over the list in reversed order is to create a new reverse iterator every time you need to iterate:
for _ in xrange(as_many_times_as_i_wish_to_iterate_this_list_in_reverse_order):
    for x in reversed(slist):
        do_stuff(x)
