I have python program which works on aws-lambda.
It sometimes works and sometimes not (about one time in 20 times)
No error message, nor no signal.
So I guesss this is because of memory... maybe.
For testing purpose,
I want to make the program which wastes the memory on purpose.
I searched around but can't find the good sample or explanation.
Almost all samples are for not wasting memory
Is there any good way to do this?
Related
I am working on an optimization algorithm which I need to run at least 100 times to see its performance. The entire script is a straight loop script, running a specific set of code multiple times over. The problem is that this entire thing takes up to 10 hours on a small dataset.
Is it possible to run this on a platform so that I can decrease this time? Can I run it faster on the cloud?
I suggest to you to "divide and conquer" your algo.
First, take a small sample of data in order to explore the code without lost too much time. After that you have a manageable piece of code, you can appy some kind of profiling tools to see where your time is spent.
Say there are many (about 300,000) JSON files that take much time (about 30 minutes) to load into a list of Python objects. Profiling revealed that it is in fact not the file access but the decoding, which takes most of the time. Is there a format that I can convert these files to, which can be loaded much faster into a python list of objects?
My attempt: I converted the files to ProtoBuf (aka Google's Protocol Buffers) but even though I got really small files (reduced to ~20% of their original size), the time to load them did not improve that dramatically (still more than 20 minutes to load them all).
You might be looking into the wrong direction with the conversion as it will probably not cut your loading times as much as you would like. If the decoding is taking a lot of time, it will probably take quite some time from other formats as well, assuming that the JSON decoder is not really badly written. I am assuming the standard library functions have decent implementations, and JSON is not a lousy format for data storage speed-wise.
You could try running your program with PyPy instead of the default CPython implementation that I will assume you are using. PyPy could decrease the execution time tremendously. It has a faster JSON module and uses a JIT which might speed up your program a lot.
If you are using Python 3 you could also try using ProcessPoolExecutor to run the file loading and data deserialization / decoding concurrently. You will have to experiment with the degree of concurrency, but a good starting point is the number of your CPU cores, which you can halve or double. If your program waits for I/O a lot, you should run a higher degree of concurrency, if the degree of I/O is smaller you can try and reduce the concurrency. If you write each executor so that they load the data into Python objects and simply return them, you should be able to cut your loading times significantly. Note that you must use a process-driven approach, using threads will not work with the GIL.
You could also use a faster JSON library which could speed up your execution times two or three-fold in an optimal case. In a real-world use case the speed up will probably be smaller. Do note that these might not work with PyPy since it uses an alternative CFFI implementation and will not work with CPython programs, and PyPy has a good JSON module anyway.
Try ujson, it's quite a bit faster.
"Decoding takes most of the time" can be seen as "building the Python objects takes all the time". Do you really need all these things as Python objects in RAM all the time? It must be quite a lot.
I'd consider using a proper database for e.g. querying data of such size.
If you need mass processing of a different kind, e.g. stats or matrix processing, I'd take a look at pandas.
I am writing a 3D engine in Python using Pygame. For the first 300 or so frames, it takes from about 0.004 to 0.006 seconds to render 140 polygons. But after that, it suddenly takes an average of about 0.020 seconds to do the same task. This is concerning to me because this is a small-scale test, and even though 50 FPS is decent, it cannot be sustained at 1000 polygons, for instance.
I have already done a lot of streamlining to my code. I have also done some slightly deeper profiling, and it appears that the increased time is more or less proportionately distributed, which suggests that the problem is not specific to a single piece of code.
I assume that the problem has something to do with memory usage, but I do not know exactly why this problem is happening. What specific issue is causing this to happen, and how can I optimize my code to fix it, as well as some more general practices? Since my code is very long, it is posted here.
Although I can't answer your question exactly, I would use a task manager and watch the "python" (or "pygame" depending on your OS) process, and view it's memory consumption. If that turns out to be the issue, you could check to see what variables you don't need after a certain time, and then you could clear those variables.
Edit: Some CPUs have data loss prevention systems. What I mean by this is this:
If Application X takes up 40ish % of the CPU (it doesn't have to be that high). After a certain amount of time, the CPU will throttle the amount of CPU that Application X is allowed to use. This can cause slowdown for things such as this. This doesn't happen with (most) games because they're set up to tell the CPU to expect that amount of strain.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
I've completed writing a multiclass classification algorithm that uses boosted classifiers. One of the main calculations consists of weighted least squares regression.
The main libraries I've used include:
statsmodels (for regression)
numpy (pretty much everywhere)
scikit-image (for extracting HoG features of images)
I've developed the algorithm in Python, using Anaconda's Spyder.
I now need to use the algorithm to start training classification models. So I'll be passing approximately 7000-10000 images to this algorithm, each about 50x100, all in gray scale.
Now I've been told that a powerful machine is available in order to speed up the training process. And they asked me "am I using GPU?" And a few other questions.
To be honest I have no experience in CUDA/GPU, etc. I've only ever heard of them. I didn't develop my code with any such thing in mind. In fact I had the (ignorant) impression that a good machine will automatically run my code faster than a mediocre one, without my having to do anything about it. (Apart from obviously writing regular code efficiently in terms of loops, O(n), etc).
Is it still possible for my code to get speeded up simply by virtue of being on a high performance computer? Or do I need to modify it to make use of a parallel-processing machine?
The comments and Moj's answer give a lot of good advice. I have some experience on signal/image processing with python, and have banged my head against the performance wall repeatedly, and I just want to share a few thoughts about making things faster in general. Maybe these help figuring out possible solutions with slow algorithms.
Where is the time spent?
Let us assume that you have a great algorithm which is just too slow. The first step is to profile it to see where the time is spent. Sometimes the time is spent doing trivial things in a stupid way. It may be in your own code, or it may even be in the library code. For example, if you want to run a 2D Gaussian filter with a largish kernel, direct convolution is very slow, and even FFT may be slow. Approximating the filter with computationally cheap successive sliding averages may speed things up by a factor of 10 or 100 in some cases and give results which are close enough.
If a lot of time is spent in some module/library code, you should check if the algorithm is just a slow algorithm, or if there is something slow with the library. Python is a great programming language, but for pure number crunching operations it is not good, which means most great libraries have some binary libraries doing the heavy lifting. On the other hand, if you can find suitable libraries, the penalty for using python in signal/image processing is often negligible. Thus, rewriting the whole program in C does not usually help much.
Writing a good algorithm even in C is not always trivial, and sometimes the performance may vary a lot depending on things like CPU cache. If the data is in the CPU cache, it can be fetched very fast, if it is not, then the algorithm is much slower. This may introduce non-linear steps into the processing time depending on the data size. (Most people know this from the virtual memory swapping, where it is more visible.) Due to this it may be faster to solve 100 problems with 100 000 points than 1 problem with 10 000 000 points.
One thing to check is the precision used in the calculation. In some cases float32 is as good as float64 but much faster. In many cases there is no difference.
Multi-threading
Python - did I mention? - is a great programming language, but one of its shortcomings is that in its basic form it runs a single thread. So, no matter how many cores you have in your system, the wall clock time is always the same. The result is that one of the cores is at 100 %, and the others spend their time idling. Making things parallel and having multiple threads may improve your performance by a factor of, e.g., 3 in a 4-core machine.
It is usually a very good idea if you can split your problem into small independent parts. It helps with many performance bottlenecks.
And do not expect technology to come to rescue. If the code is not written to be parallel, it is very difficult for a machine to make it parallel.
GPUs
Your machine may have a great GPU with maybe 1536 number-hungry cores ready to crunch everything you toss at them. The bad news is that making GPU code is a bit different from writing CPU code. There are some slightly generic APIs around (CUDA, OpenCL), but if you are not accustomed to writing parallel code for GPUs, prepare for a steepish learning curve. On the other hand, it is likely someone has already written the library you need, and then you only need to hook to that.
With GPUs the sheer number-crunching power is impressive, almost frightening. We may talk about 3 TFLOPS (3 x 10^12 single-precision floating-point ops per second). The problem there is how to get the data to the GPU cores, because the memory bandwidth will become the limiting factor. This means that even though using GPUs is a good idea in many cases, there are a lot of cases where there is no gain.
Typically, if you are performing a lot of local operations on the image, the operations are easy to make parallel, and they fit well a GPU. If you are doing global operations, the situation is a bit more complicated. A FFT requires information from all over the image, and thus the standard algorithm does not work well with GPUs. (There are GPU-based algorithms for FFTs, and they sometimes make things much faster.)
Also, beware that making your algorithms run on a GPU bind you to that GPU. The portability of your code across OSes or machines suffers.
Buy some performance
Also, one important thing to consider is if you need to run your algorithm once, once in a while, or in real time. Sometimes the solution is as easy as buying time from a larger computer. For a dollar or two an hour you can buy time from quite fast machines with a lot of resources. It is simpler and often cheaper than you would think. Also GPU capacity can be bought easily for a similar price.
One possibly slightly under-advertised property of some cloud services is that in some cases the IO speed of the virtual machines is extremely good compared to physical machines. The difference comes from the fact that there are no spinning platters with the average penalty of half-revolution per data seek. This may be important with data-intensive applications, especially if you work with a large number of files and access them in a non-linear way.
I am afraid you can not speed up your program by just running it on a powerful computer. I had this issue while back. I first used python (very slow), then moved to C(slow) and then had to use other tricks and techniques. for example it is sometimes possible to apply some dimensionality reduction to speed up things while having reasonable accurate result, or as you mentioned using multi processing techniques.
Since you are dealing with image processing problem, you do a lot of matrix operations and GPU for sure would be a great help. there are some nice and active cuda wrappers in python that you can easily use, by not knowing too much CUDA. I tried Theano, pycuda and scikit-cuda (there should be more than that since then).
I'm trying to read in a somewhat large dataset using pandas read_csv or read_stata functions, but I keep running into Memory Errors. What is the maximum size of a dataframe? My understanding is that dataframes should be okay as long as the data fits into memory, which shouldn't be a problem for me. What else could cause the memory error?
For context, I'm trying to read in the Survey of Consumer Finances 2007, both in ASCII format (using read_csv) and in Stata format (using read_stata). The file is around 200MB as dta and around 1.2GB as ASCII, and opening it in Stata tells me that there are 5,800 variables/columns for 22,000 observations/rows.
I'm going to post this answer as was discussed in comments. I've seen it come up numerous times without an accepted answer.
The Memory Error is intuitive - out of memory. But sometimes the solution or the debugging of this error is frustrating as you have enough memory, but the error remains.
1) Check for code errors
This may be a "dumb step" but that's why it's first. Make sure there are no infinite loops or things that will knowingly take a long time (like using something the os module that will search your entire computer and put the output in an excel file)
2) Make your code more efficient
Goes along the lines of Step 1. But if something simple is taking a long time, there's usually a module or a better way of doing something that is faster and more memory efficent. That's the beauty of Python and/or open source Languages!
3) Check The Total Memory of the object
The first step is to check the memory of an object. There are a ton of threads on Stack about this, so you can search them. Popular answers are here and here
to find the size of an object in bites you can always use sys.getsizeof():
import sys
print(sys.getsizeof(OBEJCT_NAME_HERE))
Now the error might happen before anything is created, but if you read the csv in chunks you can see how much memory is being used per chunk.
4) Check the memory while running
Sometimes you have enough memory but the function you are running consumes a lot of memory at runtime. This causes memory to spike beyond the actual size of the finished object causing the code/process to error. Checking memory in real time is lengthy, but can be done. Ipython is good with that. Check Their Document.
use the code below to see the documentation straight in Jupyter Notebook:
%mprun?
%memit?
Sample use:
%load_ext memory_profiler
def lol(x):
return x
%memit lol(500)
#output --- peak memory: 48.31 MiB, increment: 0.00 MiB
If you need help on magic functions This is a great post
5) This one may be first.... but Check for simple things like bit version
As in your case, a simple switching of the version of python you were running solved the issue.
Usually the above steps solve my issues.