I'm working on a program in Python on Windows 7 that matches features between multiple images in real time. It is intended to be the only program running.
When I run it on my laptop, it runs very slowly. However, when I check its memory usage in Task Manager, it is only using about 46,000 KB. I would like to increase the memory available to the Python process so that it can use all available memory.
Any advice would be greatly appreciated.
Python does not have a built-in mechanism for limiting memory consumption; if that's all it's using, then that's all it'll use.
If you're doing image comparisons, chances are good you are CPU-bound, not memory-bound. Unless those are gigantic images, you're probably OK.
So, check your code for performance problems, use heuristics to avoid running unnecessary code, and send what you've got out for code review for others to help you.
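As a starting point, a quick pass with the built-in cProfile module will usually show where the time is going. A minimal sketch, where match_features and the image list are placeholders for your real code:

    import cProfile
    import pstats

    def match_features(images):
        # placeholder for the real feature-matching loop
        for img in images:
            pass

    if __name__ == "__main__":
        images = []  # placeholder: load your images here
        cProfile.run("match_features(images)", "match_stats")
        pstats.Stats("match_stats").sort_stats("cumulative").print_stats(10)  # top 10 hot spots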
Each process can use as much (virtual) memory as the OS makes available; Python is not special in that regard. Perhaps you'd want to modify your program, but we'd have to see some code to comment on that.
I'm working on an NLP classification problem over a large database of emails (~1 million). I need to use spaCy to parse the texts, and I'm using the nlp.pipe() method as nlp.pipe(emails, n_process=CPU_CORES, batch_size=20) to loop over the dataset.
The code works, but I'm facing a (maybe not so) weird behavior:
the processes are created, but they are all in the SLEEP state except one; occasionally some of them go into the RUN state for a few seconds and then go back to sleep. So I find myself with a single process using one core at 100%, and of course the script is not using all the CPU cores.
It's like the processes don't get "fed" input data from the pipe.
Does anybody know how to use spaCy's nlp.pipe properly, or at least how to avoid this situation? Is there no way to use nlp.pipe with the GPU?
Thank you very much!
Sandro
EDIT: I still have no solution, but I've noticed that if I set batch_size=divmod(len(emails),CPU_CORES), the processes all start running at 100% CPU and after a few seconds they all switch to the sleeping state except one. It really looks like some element in the spaCy pipe gets locked while waiting for something to end... any idea?
EDIT 2: Setting batch_size=divmod(len(emails),CPU_CORES) while processing a large dataset inevitably leads to a spaCy memory error:
MemoryError: Unable to allocate array with shape (1232821, 288) and data type float32
(This is maybe not so weird, as my machine has 10 GB of RAM and (1232821 × 288 × 32) bits / 8 ≈ 1.4 GB, which multiplied by 6 (CPU_CORES) comes to about 8.5 GB of RAM needed. So I guess that, with other stuff already in memory, it turns out to be plausible.)
I've found that using n_process=n works well for some models, like en_core_web_lg, but fails with others, such as en_core_web_trf.
For whatever reason, en_core_web_trf seems to use all cores with just a batch_size specified, whereas en_core_web_lg uses just one unless n_process=n is given. Conversely, en_core_web_trf fails with a closure error if n_process=n is specified.
OK, I think I found an improvement, but honestly the behavior is still not really clear to me. Now far fewer processes are sleeping: most of them run steadily, with a few sleeping or switching between the two states.
What I did was clean up and speed up all the code inside the for loop, and set the nlp.pipe arguments like this:
for e in nlp.pipe(emails,n_process=CPU_CORES-1, batch_size=200):
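For context, here is a fuller hedged sketch of how that call might sit in a script; the model name, the tiny example email list, and the per-document work are placeholders, and spaCy v3 is assumed:

    import spacy

    CPU_CORES = 6                                  # assumed core count from the question

    nlp = spacy.load("en_core_web_sm")             # placeholder model name
    emails = ["first email text", "second email"]  # placeholder for the ~1M email texts

    for doc in nlp.pipe(emails, n_process=CPU_CORES - 1, batch_size=200):
        # keep the per-document work light: heavy work here stalls the main
        # process and leaves the worker processes idle (sleeping)
        lemmas = [t.lemma_ for t in doc if not t.is_stop]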
If anybody has an explanation for this behavior, or any suggestion on how to improve it even further, it is of course more than welcome :)
I read something about hyper-threading and found out how beneficial it can be for performance. I understand the idea to be roughly that when two independent instruction streams are ready to run on a core, hyper-threading lets them execute at the same time, and that it does this by splitting the core into two virtual (logical) cores.
But imagine this situation: I have a basic Python script doing some maths. It's a single-threaded process and can push a core to 100% for a long time. It is always doing the same kind of work, so it cannot benefit from hyper-threading's main feature of running two streams at once.
What will be the difference in performance with HT on and off?
With HT on, will I get only 50-60% of the process's performance, or close to 100%?
My example: I have 2 cores (4 threads), and I'm running one program at 100%. The Windows Task Manager shows around 32-33% CPU used by that process. I also tried it with Process Explorer, where I see the process capped at 25%. So, if I didn't have HT on, it would show 50%. I'm now not sure whether I'm using only half of the core's power, whether the numbers are misleading, or what else is going on. I need some explanation.
Python does not allow taking advantage of processor parallelism except for some special cases (like numpy, gzip and other C extensions). For more information, read about the GIL.
HT will not change anything unless your system has other things running as well.
PS: You can test and measure this!
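As a hedged sketch of how you might measure it: time a pure-Python, CPU-bound function run sequentially versus in two threads; because of the GIL the threaded version should not be faster (the busy-work function and iteration count are arbitrary placeholders):

    import time
    import threading

    def busy(n=10_000_000):
        # pure-Python CPU-bound work; it holds the GIL the whole time
        total = 0
        for i in range(n):
            total += i * i
        return total

    start = time.perf_counter()
    busy(); busy()
    print("sequential :", time.perf_counter() - start)

    start = time.perf_counter()
    threads = [threading.Thread(target=busy) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("two threads:", time.perf_counter() - start)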
I have a couple of Python/NumPy programs that tend to cause the PC to freeze or run very slowly when they use too much memory. When that happens (e.g. at 3.8 of 4 GB), I can't even stop the scripts or move the cursor anymore.
Therefore, I would like to quit the program automatically when it hits a critical limit of memory usage, e.g. 3GB.
I have not found a solution yet. Is there a Pythonic way to deal with this, since I run my scripts on both Windows and Linux machines?
You could limit the process's memory, but that is OS-specific.
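On Linux/Unix, for example, the standard-library resource module can cap the process's address space (it is not available on Windows); a minimal sketch, with the 3 GB limit taken from the question:

    import resource

    # Cap this process's virtual address space at 3 GB (Unix only).
    # Allocations beyond the limit raise MemoryError instead of pushing
    # the whole machine into swap.
    limit_bytes = 3 * 1024 ** 3
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))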
Another solution would be checking the value of psutil.virtual_memory() and exiting your program if it reaches some threshold.
Though OS-independent, the second solution is not Pythonic at all. Memory management is one of the things we have operating systems for.
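Still, a minimal sketch of that psutil-based check, in case it is good enough for your use case (the 3 GB threshold and the work loop are placeholders; assumes psutil is installed):

    import sys
    import psutil

    MEMORY_LIMIT_BYTES = 3 * 1024 ** 3           # assumed critical limit from the question

    def check_memory():
        # system-wide used memory, not just this process's share
        if psutil.virtual_memory().used > MEMORY_LIMIT_BYTES:
            sys.exit("Memory limit reached, aborting to keep the machine responsive.")

    for step in range(1000):                     # placeholder for the real work loop
        check_memory()                           # call this periodically between work units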
I'd agree that in general you want to do this from within the operating system, if only because there's a reliability factor in having "possibly runaway code check itself for possibly runaway behavior".
If a hard-and-fast requirement is to do this WITHIN the script, then I think we'd need to know more about what you're actually doing. If you have a single large data structure that's consuming the majority of the memory, you can use sys.getsizeof to identify how large that structure is, and raise (and catch) an error if it gets larger than you want.
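A minimal sketch of that idea, with the 3 GB threshold and the growing structure as placeholders; note that sys.getsizeof does not follow references, so for containers of objects it understates the true footprint:

    import sys

    SIZE_LIMIT_BYTES = 3 * 1024 ** 3        # assumed limit from the question

    data = []                               # placeholder for the big structure
    for i in range(1_000_000):
        data.append(i)
        if sys.getsizeof(data) > SIZE_LIMIT_BYTES:
            raise MemoryError("data structure grew past the configured limit")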
But without knowing at least a little more about the program structure, I think it'll be hard to help...
I have just written a script that is intended to be run 24/7 to update some files. However, if it takes 3 minutes to update one file, then it would take 300 minutes to update 100 files.
Is it possible to run n instances of the script to manage n separate files to speed up the turnaround time?
Yes, it is possible. Use the multiprocessing module to start several concurrent processes. This has the advantage that you do not run into problems with the Global Interpreter Lock and threads, as explained in the manual page. The manual page includes all the examples you will need to make your script execute in parallel. Of course, this works best if the processes do not have to interact, which your example suggests.
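A minimal sketch of that approach; update_file and the file list are placeholders for whatever your script actually does, and the per-file updates are assumed to be independent:

    from multiprocessing import Pool

    def update_file(path):
        # placeholder for the ~3-minute update of a single file
        print("updating", path)

    if __name__ == "__main__":
        files = ["file1.txt", "file2.txt", "file3.txt"]   # placeholder file list
        with Pool(processes=4) as pool:                   # number of worker processes
            pool.map(update_file, files)                  # one file per task, n files in flight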
I suggest you first find out if there is any way to reduce the 3 minutes in a single thread.
The method I use to discover speedup opportunities is demonstrated here.
That will also tell you if you are purely I/O bound.
If you are completely I/O bound, and all files are on a single disk, parallelism won't help.
In that case, possibly storing the files on a solid-state drive would help.
On the other hand, if you are CPU bound, parallelism will help, as @hochl said.
Regardless, find the speedup opportunities and fix them.
I've never seen any good-size program that didn't have one or several of them.
That will give you one speedup factor, and parallelism will give you another, and the total speedup will be the product of those two factors.
I have a Python program that dies with a MemoryError when I feed it a large file. Are there any tools that I could use to figure out what's using the memory?
This program ran fine on smaller input files. The program obviously needs some scalability improvements; I'm just trying to figure out where. "Benchmark before you optimize", as a wise person once said.
(Just to forestall the inevitable "add more RAM" answer: This is running on a 32-bit WinXP box with 4GB RAM, so Python has access to 2GB of usable memory. Adding more memory is not technically possible. Reinstalling my PC with 64-bit Windows is not practical.)
EDIT: Oops, this is a duplicate of Which Python memory profiler is recommended?
Heapy is a memory profiler for Python, which is the type of tool you need.
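A typical hedged usage sketch (Heapy ships with the guppy/guppy3 package; the placeholder workload just stands in for the code that eats memory):

    from guppy import hpy

    hp = hpy()
    data = [object() for _ in range(100_000)]   # placeholder workload
    print(hp.heap())                            # breakdown of live objects by type and size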
The simplest and most lightweight way would likely be to use the built-in memory query capabilities of Python, such as sys.getsizeof: just run it on your objects for a reduced problem (i.e. a smaller file) and see what takes a lot of memory.
In your case, the answer is probably very simple: do not read the whole file at once, but process the file chunk by chunk. That may be very easy or complicated depending on your usage scenario. Just for example, an MD5 checksum computation can be done much more efficiently for huge files without reading the whole file in. Such a change dramatically reduced memory consumption in some SCons usage scenarios, but was almost impossible to trace with a memory profiler.
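For illustration, a minimal sketch of the chunked approach for the MD5 example (the chunk size and file name are placeholders):

    import hashlib

    def md5_of_file(path, chunk_size=1024 * 1024):
        # read the file in 1 MB pieces instead of loading it all at once
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    print(md5_of_file("some_huge_file.bin"))    # placeholder file name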
If you still need a memory profiler: eliben already suggested sys.getsizeof. If that doesn't cut it, try Heapy or Pympler.
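For example, Pympler's SummaryTracker can show which object types grew between two points in the program; a hedged sketch, assuming Pympler is installed and using a placeholder workload:

    from pympler import tracker

    tr = tracker.SummaryTracker()
    data = [str(i) for i in range(100_000)]     # placeholder for the suspicious code
    tr.print_diff()                             # objects and sizes added since the tracker was created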
You asked for a tool recommendation:
Python Memory Validator allows you to monitor the memory usage, allocation locations, GC collections, object instances, memory snapshots, etc of your Python application. Windows only.
http://www.softwareverify.com/python/memory/index.html
Disclaimer: I was involved in the creation of this software.