More efficient ways of doing this - python

for i in vr_world.getNodeNames():
if i != "_error_":
World[i] = vr_world.getChild(i)
vr_world.getNodeNames() returns me a gigantic list, vr_world.getChild(i) returns a specific type of object.
This is taking a long time to run, is there anyway to make it more efficient? I have seen one-liners for loops before that are supposed to be faster. Ideas?

kaloyan suggests using a generator. Here's why that may help.
If getNodeNames() builds a list, then your loop is basically going over the list twice: once to build it, and once when you iterate over the list.
If getNodeNames() is a generator, then your loop doesn't ever build the list; instead of creating the item and adding it to the list, it creates the item and yields it to the caller.
Whether or not this helps is contingent on a couple of things. First, it has to be possible to implement getNodeNames() as a generator. We don't know anything about the implementation details of that function, so it's not possible to say if that's the case. Next, the number of items you're iterating over needs to be pretty big.
Of course, none of this will have any effect at all if it turns out that the time-consuming operation in all of this is vr_world.getChild(). That's why you need to profile your code.

I don't think you can make it faster than what you have there. Yes, you can put the whole thing on one line but that will not make it any faster. The bottleneck obviously is getNodeNames(). If you can make it a generator, you will start populating the World dict with results sooner (if that matters to you) and if you make it filter out the "_error_" values, you will not have the deal with that at a later stage.

World = dict((i, vr_world.getChild(i)) for i in vr_world.getNodeNames() if i != "_error_")
This is a one-liner, but not necessarily much faster than your solution...

Maybe you can use a filter and a map, however I don't know if this would be any faster:
valid = filter(lambda i: i != "_error_", vr_world.getNodeNames())
World = map(lambda i: vr_world.getChild(i), valid)
Also, as you'll see a lot around here, profile first, and then optimize, otherwise you may be wasting time. You have two functions there, maybe they are the slow parts, not the iteration.

Related

Is using list declarations to shorthand operations a bad thing to do? [duplicate]

Think about a function that I'm calling for its side effects, not return values (like printing to screen, updating GUI, printing to a file, etc.).
def fun_with_side_effects(x):
...side effects...
return y
Now, is it Pythonic to use list comprehensions to call this func:
[fun_with_side_effects(x) for x in y if (...conditions...)]
Note that I don't save the list anywhere
Or should I call this func like this:
for x in y:
if (...conditions...):
fun_with_side_effects(x)
Which is better and why?
It is very anti-Pythonic to do so, and any seasoned Pythonista will give you hell over it. The intermediate list is thrown away after it is created, and it could potentially be very, very large, and therefore expensive to create.
You shouldn't use a list comprehension, because as people have said that will build a large temporary list that you don't need. The following two methods are equivalent:
consume(side_effects(x) for x in xs)
for x in xs:
side_effects(x)
with the definition of consume from the itertools man page:
def consume(iterator, n=None):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
Of course, the latter is clearer and easier to understand.
List comprehensions are for creating lists. And unless you are actually creating a list, you should not use list comprehensions.
So I would got for the second option, just iterating over the list and then call the function when the conditions apply.
Second is better.
Think of the person who would need to understand your code. You can get bad karma easily with the first :)
You could go middle between the two by using filter(). Consider the example:
y=[1,2,3,4,5,6]
def func(x):
print "call with %r"%x
for x in filter(lambda x: x>3, y):
func(x)
Depends on your goal.
If you are trying to do some operation on each object in a list, the second approach should be adopted.
If you are trying to generate a list from another list, you may use list comprehension.
Explicit is better than implicit.
Simple is better than complex. (Python Zen)
You can do
for z in (fun_with_side_effects(x) for x in y if (...conditions...)): pass
but it's not very pretty.
Using a list comprehension for its side effects is ugly, non-Pythonic, inefficient, and I wouldn't do it. I would use a for loop instead, because a for loop signals a procedural style in which side-effects are important.
But, if you absolutely insist on using a list comprehension for its side effects, you should avoid the inefficiency by using a generator expression instead. If you absolutely insist on this style, do one of these two:
any(fun_with_side_effects(x) and False for x in y if (...conditions...))
or:
all(fun_with_side_effects(x) or True for x in y if (...conditions...))
These are generator expressions, and they do not generate a random list that gets tossed out. I think the all form is perhaps slightly more clear, though I think both of them are confusing and shouldn't be used.
I think this is ugly and I wouldn't actually do it in code. But if you insist on implementing your loops in this fashion, that's how I would do it.
I tend to feel that list comprehensions and their ilk should signal an attempt to use something at least faintly resembling a functional style. Putting things with side effects that break that assumption will cause people to have to read your code more carefully, and I think that's a bad thing.

making python code block with loop faster

Is there a way I can implement the code block below using map or list comprehension or any other faster way, keeping it functionally the same?
def name_check(names, n_val):
lower_names = names.lower()
for item in set(n_val):
if item in lower_names:
return True
return False
Any help here is appreciated
A simple implementation would be
return any(character in names_lower for character in n_val)
A naive guess at the complexity would be O(K*2*N) where K is the number of characters in names and N is the number of characters in n_val. We need one "loop" for the call to lower*, one for the inner comprehension, and one for any. Since any is a built-in function and we're using a generator expression, I would expect it to be faster than your manual loop, but as always, profile to be sure.
To be clear, any short-circuits, so that behaviour is preserved
Notes on Your Implementation
On using a set: Your intuition to use a set to reduce the number of checks is a good one (you could add it to my form above, also), but it's a trade-off. In the case that the first element short circuits, the extra call to set is an additional N steps to produce the set expression. In the case where you wind up checking each item, it will save you some time. It depends on your expected inputs. If n_val was originally an iterable, you've lost that benefit and allocated all the memory up front. If you control the input to the function, why not just recommend it's called using lists that don't have duplicates (i.e., call set() on its input), and leave the function general?
* #Neopolitan pointed out that names_lower = names.lower() should be called out of the loop, as your original implementation called it, else it may (will?) be called repeatedly in the generator expression

how to parallelize big for loops in python

I just got to Python, and I am still in the steep phase of the learning curve. Thank you for any comments ahead.
I have a big for loop to run (big in the sense of many iterations), for example:
for i in range(10000)
for j in range(10000)
f((i,j))
I though that it would be a common question of how to parallelize it, and after hours of search on google I arrived at the solution using "multiprocessing" module, as the following:
pool=Pool()
x=pool.map(f,[(i,j) for i in range(10000) for j in range(10000)])
This works when the loop is small. However, it is really slow if the loop is large, Or sometimes a memory error occurs if the loops are too big. It seems that python would generate the list of arguments first, and then feed the list to the function "f", even using xrange. Is that correct?
So this parallelization does not work for me because I do not really need to store all arguments in a list. Is there a better way to do this? I appreciate any suggestions or references. Thank you.
It seems that python would generate the list of arguments first, and then feed the list to the function "f", even using xrange. Is that correct?
Yes, because you're using a list comprehension, which explicitly asks it to generate that list.
(Note that xrange isn't really relevant here, because you only have two ranges at a time, each 10K long; compared to the 100M of the argument list, that's nothing.)
If you want it to generate the values on the fly as needed, instead of all 100M at once, you want to use a generator expression instead of a list comprehension. Which is almost always just a matter of turning the brackets into parentheses:
x=pool.map(f,((i,j) for i in range(10000) for j in range(10000)))
However, as you can see from the source, map will ultimately just make a list if you give it a generator, so in this case, that won't solve anything. (The docs don't explicitly say this, but it's hard to see how it could pick a good chunksize to chop the iterable into if it didn't have a length…).
And, even if that weren't true, you'd still just run into the same problem again with the results, because pool.map returns a list.
To solve both problems, you can use pool.imap instead. It consumes the iterable lazily, and returns a lazy iterator of results.
One thing to note is that imap does not guess at the best chunksize if you don't pass one, but just defaults to 1, so you may need a bit of thought or trial&error to optimize it.
Also, imap will still queue up some results as they come in, so it can feed them back to you in the same order as the arguments. In pathological cases, it could end up queuing up (poolsize-1)/poolsize of your results, although in practice this is incredibly rare. If you want to solve this, use imap_unordered. If you need to know the ordering, just pass the indexes back and forth with the args and results:
args = ((i, j) for i in range(10000) for j in range(10000))
def indexed_f(index, (i, j)):
return index, f(i, j)
results = pool.imap_unordered(indexed_f, enumerate(args))
However, I notice that in your original code, you're not doing anything at all with the results of f(i, j). In that case, why even bother gathering up the results at all? In that case, you can just go back to the loop:
for i in range(10000):
for j in range(10000):
map.apply_async(f, (i,j))
However, imap_unordered may still be worth using, because it provides a very easy way to block until all of the tasks are done, while still leaving the pool itself running for later use:
def consume(iterator):
deque(iterator, max_len=0)
x=pool.imap_unordered(f,((i,j) for i in range(10000) for j in range(10000)))
consume(x)

Is it Pythonic to use list comprehensions for just side effects?

Think about a function that I'm calling for its side effects, not return values (like printing to screen, updating GUI, printing to a file, etc.).
def fun_with_side_effects(x):
...side effects...
return y
Now, is it Pythonic to use list comprehensions to call this func:
[fun_with_side_effects(x) for x in y if (...conditions...)]
Note that I don't save the list anywhere
Or should I call this func like this:
for x in y:
if (...conditions...):
fun_with_side_effects(x)
Which is better and why?
It is very anti-Pythonic to do so, and any seasoned Pythonista will give you hell over it. The intermediate list is thrown away after it is created, and it could potentially be very, very large, and therefore expensive to create.
You shouldn't use a list comprehension, because as people have said that will build a large temporary list that you don't need. The following two methods are equivalent:
consume(side_effects(x) for x in xs)
for x in xs:
side_effects(x)
with the definition of consume from the itertools man page:
def consume(iterator, n=None):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
Of course, the latter is clearer and easier to understand.
List comprehensions are for creating lists. And unless you are actually creating a list, you should not use list comprehensions.
So I would got for the second option, just iterating over the list and then call the function when the conditions apply.
Second is better.
Think of the person who would need to understand your code. You can get bad karma easily with the first :)
You could go middle between the two by using filter(). Consider the example:
y=[1,2,3,4,5,6]
def func(x):
print "call with %r"%x
for x in filter(lambda x: x>3, y):
func(x)
Depends on your goal.
If you are trying to do some operation on each object in a list, the second approach should be adopted.
If you are trying to generate a list from another list, you may use list comprehension.
Explicit is better than implicit.
Simple is better than complex. (Python Zen)
You can do
for z in (fun_with_side_effects(x) for x in y if (...conditions...)): pass
but it's not very pretty.
Using a list comprehension for its side effects is ugly, non-Pythonic, inefficient, and I wouldn't do it. I would use a for loop instead, because a for loop signals a procedural style in which side-effects are important.
But, if you absolutely insist on using a list comprehension for its side effects, you should avoid the inefficiency by using a generator expression instead. If you absolutely insist on this style, do one of these two:
any(fun_with_side_effects(x) and False for x in y if (...conditions...))
or:
all(fun_with_side_effects(x) or True for x in y if (...conditions...))
These are generator expressions, and they do not generate a random list that gets tossed out. I think the all form is perhaps slightly more clear, though I think both of them are confusing and shouldn't be used.
I think this is ugly and I wouldn't actually do it in code. But if you insist on implementing your loops in this fashion, that's how I would do it.
I tend to feel that list comprehensions and their ilk should signal an attempt to use something at least faintly resembling a functional style. Putting things with side effects that break that assumption will cause people to have to read your code more carefully, and I think that's a bad thing.

When to use "while" or "for" in Python

When I should use a while loop or a for loop in Python? It looks like people prefer using a for loop (for brevity?). Is there any specific situation which I should use one or the other? Is it a matter of personal preference? The code I have read so far made me think there are big differences between them.
Yes, there is a huge difference between while and for.
The for statement iterates through a collection or iterable object or generator function.
The while statement simply loops until a condition is False.
It isn't preference. It's a question of what your data structures are.
Often, we represent the values we want to process as a range (an actual list), or xrange (which generates the values) (Edit: In Python 3, range is now a generator and behaves like the old xrange function. xrange has been removed from Python 3). This gives us a data structure tailor-made for the for statement.
Generally, however, we have a ready-made collection: a set, tuple, list, map or even a string is already an iterable collection, so we simply use a for loop.
In a few cases, we might want some functional-programming processing done for us, in which case we can apply that transformation as part of iteration. The sorted and enumerate functions apply a transformation on an iterable that fits naturally with the for statement.
If you don't have a tidy data structure to iterate through, or you don't have a generator function that drives your processing, you must use while.
while is useful in scenarios where the break condition doesn't logically depend on any kind of sequence. For example, consider unpredictable interactions:
while user_is_sleeping():
wait()
Of course, you could write an appropriate iterator to encapsulate that action and make it accessible via for – but how would that serve readability?¹
In all other cases in Python, use for (or an appropriate higher-order function which encapsulate the loop).
¹ assuming the user_is_sleeping function returns False when false, the example code could be rewritten as the following for loop:
for _ in iter(user_is_sleeping, False):
wait()
The for is the more pythonic choice for iterating a list since it is simpler and easier to read.
For example this:
for i in range(11):
print i
is much simpler and easier to read than this:
i = 0
while i <= 10:
print i
i = i + 1
for loops is used
when you have definite itteration (the number of iterations is known).
Example of use:
Iterate through a loop with definite range: for i in range(23):.
Iterate through collections(string, list, set, tuple, dictionary): for book in books:.
while loop is an indefinite itteration that is used when a loop repeats
unkown number of times and end when some condition is met.
Note that in case of while loop the indented body of the loop should modify at least one variable in the test condition else the result is infinite loop.
Example of use:
The execution of the block of code require that the user enter specified input: while input == specified_input:.
When you have a condition with comparison operators: while count < limit and stop != False:.
Refrerences: For Loops Vs. While Loops, Udacity Data Science, Python.org.
First of all there are differences between the for loop in python and in other languages.
While in python it iterates over a list of values (eg: for value in [4,3,2,7]), in most other languages (C/C++, Java, PHP etc) it acts as a while loop, but easier to read.
For loops are generally used when the number of iterations is known (the length of an array for example), and while loops are used when you don't know how long it will take (for example the bubble sort algorithm which loops as long as the values aren't sorted)
Consider processing iterables. You can do it with a for loop:
for i in mylist:
print i
Or, you can do it with a while loop:
it = mylist.__iter__()
while True:
try:
print it.next()
except StopIteration:
break
Both of those blocks of code do fundamentally the same thing in fundamentally the same way. But the for loop hides the creation of the iterator and the handling of the StopIteration exception so that you don't need to deal with them yourself.
The only time I can think of that you'd use a while loop to handle an iterable would be if you needed to access the iterator directly for some reason, e.g. you needed to skip over items in the list under some circumstances.
For loops usually make it clearer what the iteration is doing. You can't always use them directly, but most of the times the iteration logic with the while loop can be wrapped inside a generator func. For example:
def path_to_root(node):
while node is not None:
yield node
node = node.parent
for parent in path_to_root(node):
...
Instead of
parent = node
while parent is not None:
...
parent = parent.parent
while loop is better for normal loops
for loop is much better than while loop while working with strings, like lists, strings etc.
If your data is dirty and it won't work with a for loop, you need to clean your data.
For me if your problem demands multiple pointers to be used to keep
track of some boundary I would always prefer While loop.
In other cases it's simply for loop.

Categories