Python Code Optimization LineProfiler [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
My function gets a combination of numbers, which can be small or large depending on user input, and loops through each combination to perform some operations.
Below is a line-profiling I ran on my function, and it takes 0.336 seconds to run. While this is fast, this is only a subset of a bigger framework. In this bigger framework, I will need to run this function 50-20000 times, which, when multiplied by 0.336, takes 16.8 to 6720 seconds (I hope this is right). It used to take 0.996 seconds, but I've managed to cut that roughly in half by avoiding function calls.
The major contributors to time are the two __getitem__ calls, which access a dictionary for information N times depending on the number of combinations. My dictionary is a collection of data and looks something like this:
dic = {"array1", a,
"array2", b,
"array3", c,
"listofarray", [ [list of 5 array], [list of 5 array], [list of 5 2d Array ] ]
}
I was able to cut it by another ~0.01 seconds when I moved the dictionary lookup outside of the loop.
x = dic['listofarray']['list of 5 2d array']
So when I loop to get access to the 5 different elements, I just do x[i].
Other than that, I am lost in terms of where to find more performance gains.
Note: I apologize that I haven't provided any code. I'd love to show it, but it's proprietary. I just wanted to get some thoughts on whether I am looking in the right place for speed-ups.
I am willing to learn and apply new things, so if Cython or some other data structure can speed things up, I am all ears. Thanks so much.
PS: (the line-profiler output for my first and second __getitem__ calls was included here.)
EDIT:
I am using itertools.product(xrange(10), repeat=len(food_choices)) and iterating over this. I convert everything into numpy arrays: np.array(i).astype(float).

The major contributors to time are the two __getitem__ calls, which access a dictionary for information N times depending on the number of combinations.
No, they aren't. Your two posted profile traces clearly show that they're NumPy/Pandas __getitem__ functions, not dict.__getitem__. So you're trying to optimize the wrong place.
That explains why moving all the dict stuff out of the loop made a difference of only a small fraction of a percent.
Most likely the problem is that you're looping over some NumPy object, or using some fake-vectorized function (e.g., via np.vectorize), rather than performing a NumPy-optimized broadcasting operation. That's what you need to fix.
For example, if you compare these:
np.vectorize(lambda x: x*2)(a)
a * 2
… the second one will go at least 10x faster on any sizable array, and it's mostly because of all the time spent doing __getitem__, which includes boxing up numbers to be usable by your Python function. (There's also some additional cost in not being able to use CPU-vectorized operations, cacheable tight loops, etc., but even if you arrange things to be complicated enough that those don't enter into it, you're still going to get much faster code.)
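For instance, a minimal timing sketch you could run to see the gap yourself (the array size and the factor of 2 are arbitrary):
import numpy as np
import timeit

a = np.arange(1000000, dtype=float)

fake_vectorized = np.vectorize(lambda x: x * 2)  # calls back into Python for every element
print(timeit.timeit(lambda: fake_vectorized(a), number=10))
print(timeit.timeit(lambda: a * 2, number=10))   # true broadcasting stays in C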
Meanwhile:
I am using itertools.product(xrange(10), repeat=len(food_choices)) and iterating over this. I convert everything into numpy arrays: np.array(i).astype(float).
So you're creating 10**n separate n-element arrays? That's not making sensible use of NumPy. Each array is tiny, and most likely you're spending as much time building and pulling apart the arrays as you are doing actual work. Do you have the memory to build a single giant array with an extra 10**n-long axis instead? Or, maybe, batch it up into groups of, say, 100K? Because then you could actually build and process the whole array in native NumPy-vectorized code.
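For instance, a rough sketch of the single-giant-array idea; n here is a hypothetical stand-in for len(food_choices):
import numpy as np

n = 4  # hypothetical stand-in for len(food_choices)

# One (10**n, n) float array holding every digit combination at once,
# instead of 10**n tiny arrays built from individual itertools.product tuples.
combos = np.indices((10,) * n).reshape(n, -1).T.astype(float)
print(combos.shape)  # (10000, 4)

# From here the work can be done with vectorized operations on whole columns,
# or on slices such as combos[0:100000] if memory is tight.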
However, the first thing you might want to try is to just run your code in PyPy instead of CPython. Some NumPy code doesn't work right with PyPy/NumPyPy, but there are fewer problems with each version, so you should definitely try it.
If you're lucky (and there's a pretty good chance of that), PyPy will JIT the repeated __getitem__ calls inside the loop, and make it much faster, with no change in your code.
If that helps (or if NumPyPy won't work on your code), Cython may be able to help more. But only if you do all the appropriate static type declarations, etc. And often, PyPy already helps enough that you're done.

Related

Handling large list of lists in python

I have this mathematical task in which I am supposed to find some combinations, etc. That doesn't matter; the problem is that I am trying to do it with the itertools module. It worked fine for smaller combinations (6 places), but now I want to do the same for a large combination (18 places), and here I run into a problem: I only have 8 GB of RAM, this list comes to around 5 GB, and with my system running it consumes all the RAM and the program then raises a MemoryError. So my question is: what would be a good alternative to the method I'm using (code below)?
poliedar_kom = list(itertools.combinations_with_replacement(range(0, 13), 18))
poliedar_len = len(poliedar_kom)
So when I have this list and its length, the rest of the program goes through every value in the list and checks a condition against values in another, smaller list. As I already said, that's a problem because this list gets too big for my PC, but I'm probably doing something wrong.
Note: I am using the latest Python 3.8, 64-bit.
Summary: I have a list of lists that is too big, and I have to loop through it to check values against conditions.
EDIT: I appreciate all the answers; I have to try them now. If you have any other possible solution to the problem, please post it.
EDIT 2: Thanks everyone, you helped me very much. I marked the answer that pointed me to the YouTube video because it made me realize that my code is already a generator. Thanks everyone!!!
Use generators for large data ranges; a generator yields items one at a time, so memory use does not blow up with large data sizes. Refer to the link for more details:
https://www.youtube.com/watch?v=bD05uGo_sVI
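For illustration, a minimal sketch of the lazy approach; the "smaller list" and the condition check are hypothetical placeholders:
import itertools

smaller_list = {0, 5, 12}  # hypothetical stand-in for the smaller list of values

matches = 0
# combinations_with_replacement yields one tuple at a time,
# so memory use stays flat no matter how many combinations there are.
for combo in itertools.combinations_with_replacement(range(13), 18):
    if smaller_list.issubset(combo):  # hypothetical condition check
        matches += 1
print(matches)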
For any application requiring more than, say, 1e4 items, you should refrain from using plain Python lists, which are very memory- and processor-intensive.
For such uses, I generally go to numpy arrays or pandas dataframes.
If you aren't comfortable with these, is there some way you could refactor your algorithm so that you don't hold every value in memory at once, like with a generator?
In your case:
1) store this amount of data not in RAM but in a file or a database on your HDD/SSD (say, a SQL or NoSQL database);
2) write a generator that processes each list (or group of lists, for more efficiency) within the whole collection, one after the other, until the end.
It would be good for you to use something like MongoDB or MySQL/MariaDB/PostgreSQL to store this amount of data.
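A rough sketch of such a batching generator (the batch size and the per-batch check are placeholders):
from itertools import combinations_with_replacement, islice

def batches(iterable, size=100000):
    # Yield successive groups of `size` items so the full 5 GB list
    # never has to exist in memory at once.
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

for batch in batches(combinations_with_replacement(range(13), 18)):
    pass  # check this batch against the smaller list, or bulk-insert it into a database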

How to improve Python code speed

I was solving this Python challenge http://coj.uci.cu/24h/problem.xhtml?abb=2634 and this is my answer:
c = int(input())
l = []
for j in range(c):
    i = raw_input().split()[1].split('/')
    l.append(int(i[1]))
for e in range(1, 13):
    print e, l.count(e)
But it was not the fastest Python solution, so I tried to find out how to improve the speed, and I found that xrange was faster than range. But when I tried the following code, it was actually slower:
c = int(input())
l = []
for j in xrange(c):
    i = raw_input().split()[1].split('/')[1]
    l.append(i)
for e in xrange(1, 13):
    print e, l.count(`e`)
So I have 2 questions:
How can I improve the speed of my script?
Where can I find information on how to improve Python speed?
When I was looking for this info I found sites like this one: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
but it doesn't specify, for example, whether it is faster or slower to split a string multiple times in a single line or across multiple lines, for example using part of the script mentioned above:
i = raw_input().split()[1].split('/')[1]
vs
i = raw_input().split()
i = i[1].split('/')
i = i[1]
Edit: I have tried all your suggestions but my first answer is still the fastest, and I don't know why. My first answer was 151 ms, #Bakuriu's answer was 197 ms, and my answer using collections.Counter was 188 ms.
Edit 2: Please disregard my last edit. I just found out that the method for checking your code's performance on the site mentioned above does not work: if you upload the same code multiple times, the reported performance is different each time, sometimes slower and sometimes faster.
Assuming you are using CPython, the golden rule is to push as much work as possible into built-in functions, since these are written in C and thus avoid the interpreter overhead.
This means that you should:
Avoid explicit loops when there is a function/method that already does what you want
Avoid expensive lookups in inner loops. In rare circumstances you may go as far as using local variables to store built-in functions.
Use the right data structures. Don't simply use lists and dicts. The standard library contains other data types, and there are many libraries out there. Consider which should be the efficient operations to solve your problem and choose the correct data structure
Avoid meta-programming. If you need speed you don't want a simple attribute lookup to trigger 10 method calls with complex logic behind the scenes. (However where you don't really need speed metaprogramming is really cool!)
Profile your code to find the bottleneck and optimize the bottleneck. Often what we think about performance of some concrete code is completely wrong.
Use the dis module to disassemble the bytecode. This gives you a simple way to see what the interpreter will really do. If you really want to know how the interpreter works you should try to read the source for PyEval_EvalFrameEx which contains the mainloop of the interpreter (beware: hic sunt leones!).
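For instance, a tiny illustration of dis on a throwaway function:
import dis

def double_all(values):
    return [v * 2 for v in values]

# Prints the bytecode the interpreter will actually execute for this function.
dis.dis(double_all)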
Regarding CPython, you should read An Optimization Anecdote by Guido van Rossum. It gives many insights into how performance can change with various solutions. Another example could be this answer (disclaimer: it's mine), where the fastest solution is probably very counter-intuitive for someone not used to CPython's workings.
Another good thing to do is to study the most-used built-in and stdlib data types, since each one has both positive and negative properties. In this specific case, calling list.count() is a heavy operation, since it has to scan the whole list every time it is performed. That's probably where a lot of the time is consumed in your solution.
One way to minimize interpreter overhead is to use collections.Counter, which also avoids scanning the data multiple times:
from collections import Counter
counts = Counter(raw_input().split('/')[-2] for _ in range(int(raw_input())))
for i in range(1, 13):
    print i, counts[str(i)]
Note that there is no need to convert the month to an integer, so you can avoid those function calls (assuming the months are always written the same way; no mixing of 07 and 7).
Also, I don't understand why you are splitting on whitespace and then on the / when you can simply split on the / and take the second-to-last element of the list.
Another (important) optimization could be to read all of stdin at once to avoid multiple IO calls; however, that may not work in this situation, since the fact that they tell you how many employees there are probably means they are not sending an EOF.
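A sketch of that read-everything-at-once variant, assuming the input really is terminated by an EOF:
import sys
from collections import Counter

# One IO call for the whole input instead of one raw_input() per line.
lines = sys.stdin.read().splitlines()
n = int(lines[0])
counts = Counter(line.split('/')[-2] for line in lines[1:n + 1])
for i in range(1, 13):
    print i, counts[str(i)]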
Note that different versions of Python have completely different ways of optimizing code. For example, PyPy's JIT works best when you perform simple operations in loops that the JIT is able to analyze and optimize, so it's about the opposite of what you would do in CPython.

speeding up numpy.dot inside list comprehension

I have a numpy script that is currently running quite slowly. It spends the vast majority of its time performing the following operation inside a loop:
terms=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex,Ey,Ez_av)
res=[np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms]
res=array(res)
Ex[1:Nx-1]=res[1:Nx-1,0]
Ey[1:Nx-1]=res[1:Nx-1,1]
It's the list comprehension that is really slowing this code down.
In this case, Coeff_3 and Coeff_2 are length-1000 lists whose elements are 3x3 numpy matrices, and Ex, Ey, Ez, Curl_x, etc. are all length-1000 numpy arrays.
I realize it might be faster if I did things like setting up a single 3x1000 E vector, but I have to perform a significant amount of averaging of different E vectors between steps, which would make things very unwieldy.
Curiously, however, I perform this operation twice per loop (once for Ex, Ey; once for Ez), and performing the same operation for the Ez's takes almost twice as long:
terms2=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex_av,Ey_av,Ez)
res2=array([np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms2])
Anyone have any idea what's happening? Forgive me if it's anything obvious; I'm very new to Python.
As pointed out in previous comments, use array operations. np.hstack(), np.vstack(), np.outer() and np.inner() are useful here. Your code could become something like this (not sure about your dimensions):
Cxyz = np.vstack((Curl_x,Curl_y,Curl_z))
C2xyz = np.dot(C2, Cxyz)
...
Check the shape of your resulting arrays to make sure you translated your problem correctly. Sometimes numexpr can also speed up such tasks significantly with little extra effort.
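For example, a hedged sketch of removing the list comprehension entirely with batched matrix-vector products via np.einsum; the random arrays stand in for the question's Coeff_2, Coeff_3, Curl_* and E* data:
import numpy as np

N = 1000
C2 = np.random.rand(N, 3, 3)   # stand-in for the stacked Coeff_2 matrices
C3 = np.random.rand(N, 3, 3)   # stand-in for the stacked Coeff_3 matrices
curl = np.random.rand(N, 3)    # columns would be Curl_x, Curl_y, Curl_z
E = np.random.rand(N, 3)       # columns would be Ex, Ey, Ez_av

# Batched matrix-vector products replace the Python-level list comprehension.
res = np.einsum('nij,nj->ni', C2, curl) + np.einsum('nij,nj->ni', C3, E)
print(res.shape)  # (1000, 3)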

Can I improve python runtime by compiling?

I'm writing a small toy simulation in Python. Granted, these simulations are slow. To my understanding, the major reason Python code is slow is that Python is an interpreted language. I don't want to give up Python, since the clear syntax and the available libraries cut the writing time significantly. So is there a simple way for me to "compile" my Python code?
Edit
I'll answer some questions:
Yes, I'm using numpy. It greatly simplifies the code and I don't think I can improve performance by writing the functions on my own. I use numpy for all my lists, and I add all of the beads together. Namely, I invoke
pos += V*dt + forces*0.5*dt**2
where pos, V, and forces are all np.arrays of shape (2000, 3).
I'm quite certain that the slow part is the forces calculation. This is logical, as I have to iterate over all my particles and check their positions. For my real project (Ph.D. stuff) I have code of roughly the same level of complexity, and I know that this is the expensive part.
If none of the solutions in the comment suffice, you can also take a look at cython.
For a quick tutorial & example check:
http://docs.cython.org/src/tutorial/cython_tutorial.html
Used at the correct spots (e.g. around frequently called functions) it can easily speed things up by a factor of 10 - 100.
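As a rough illustration (not the questioner's code), here is a tiny Cython module in the spirit of that tutorial; it has to be compiled first, e.g. with cythonize -i fib.pyx, before it can be imported:
# fib.pyx -- minimal Cython sketch; the static C types on the loop
# variables are what let Cython generate a tight C loop.
def fib(int n):
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a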
Python is a slightly odd language in that it is both interpreted and compiled. Well, sort of. When you run it, your code is compiled to ".pyc" bytecode, so we can quickly get bogged down in semantic details here. Hell, I don't even know if what I just said is strictly accurate. But at the end of the day you want to speed things up, so...
First, use the profiler and timeit to work out where all the time is going (see the sketch after these steps)
Second, rewrite your pure python code to improve the slow bits you've discovered
Third, see how it goes when optimised
Now, it depends on your scenario, but seriously think: "Can I run it on a bigger CPU / with more memory?"
Ok, try rewriting those slow sections in C++
Screw it, write it all in C++
If you get so far as the last option I dare say you're screwed and the savings aren't going to be significant.
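A minimal sketch of the first two steps; simulate() is just a hypothetical stand-in for the real simulation loop:
import cProfile
import timeit

def simulate():
    # Hypothetical stand-in for one step of the toy simulation.
    return sum(i * i for i in range(100000))

# Step 1: find out where the time actually goes.
cProfile.run('simulate()', sort='cumulative')

# Step 2: time candidate rewrites of the hot spot in isolation.
print(timeit.timeit(simulate, number=100))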

Increasing efficiency of Python copying large datasets

I'm having a bit of trouble with an implementation of random forests I'm working on in Python. Bear in mind, I'm well aware that Python is not intended for highly efficient number crunching. The choice was based more on wanting to get a deeper understanding of, and additional experience in, Python. I'd like to find a solution to make it "reasonable".
With that said, I'm curious if anyone here can make some performance-improvement suggestions for my implementation. Running it through the profiler, it's obvious the most time is being spent executing the list "append" command and my dataset split operation. Essentially I have a large dataset implemented as a matrix (rather, a list of lists). I'm using that dataset to build a decision tree, so I split on the columns with the highest information gain. The split consists of creating two new datasets with only the rows matching some criteria. The new datasets are generated by initializing two empty lists and appending the appropriate rows to them.
I don't know the size of the lists in advance, so I can't pre-allocate them, unless it's possible to preallocate abundant list space but then update the list size at the end (I haven't seen this referenced anywhere).
Is there a better way to handle this task in python?
Without seeing your code, it is really hard to give specific suggestions, since optimisation is a code-dependent process that varies case by case. However, there are still some general things:
Review your algorithm and try to reduce the number of loops. It seems you have a lot of loops, and some of them are deeply nested inside other loops (I guess).
If possible, use higher-performance utility modules such as itertools instead of naive code written by yourself.
If you are interested, try PyPy (http://pypy.org/); it is a performance-oriented implementation of Python.
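As a concrete (hypothetical) illustration of replacing the append-based split with a vectorized one, assuming the dataset fits in a NumPy array:
import numpy as np

# Hypothetical dataset: 10000 rows, 5 feature columns.
data = np.random.rand(10000, 5)

# Split on column 2 against a threshold in one vectorized step,
# instead of appending matching rows to two empty lists one by one.
mask = data[:, 2] >= 0.5
left, right = data[mask], data[~mask]
print(left.shape, right.shape)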
