How to analyse performance of python without completing it? - python

As I was writing a Python script that uses a third-party module, the workload was so big that the OS (Linux with 32 GB of memory) killed it every time before it could complete. We learned from syslog that it ran out of physical memory, so the OS killed it through the OOM killer.
Many current performance analysis tools, e.g. profile, require the script to complete and cannot go into the modules that the script uses. So I reckon this must be a common situation where completion of the script is not possible, yet performance analysis is needed desperately. Any advice?

From the original question:
Profile is an amazing tool for performance analysis; it does not require the script to complete, and it can go into the modules that the script uses. I think for this question, the best answer is to use profile.
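If the failure is something the interpreter can actually catch (a MemoryError or Ctrl-C, but not a hard OOM SIGKILL from the kernel), you can drive the profiler programmatically and dump whatever statistics were collected before the crash. A minimal sketch, where workload() is just a hypothetical stand-in for the real memory-hungry job:

    import cProfile
    import pstats

    def workload():
        # stand-in for the real, memory-hungry job (hypothetical helper)
        data = [list(range(10_000)) for _ in range(1_000)]
        return sum(len(d) for d in data)

    profiler = cProfile.Profile()
    profiler.enable()
    try:
        workload()
    except (MemoryError, KeyboardInterrupt):
        pass  # keep whatever profiling data was collected before the failure
    finally:
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats("cumulative").print_stats(20)

This way the partial profile still tells you where the time (and usually the allocations) went, even though the run never finished.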

Related

Prevent freezing when running python scripts from terminal

I am running some Python scripts in my Linux terminal that happen to be pretty resource intensive, but when I do, my system becomes pretty unresponsive until the process has completed. I know there are commands like nice and cpulimit, but I haven't found a great way to just open a terminal session that is resource limited (with some chosen percentage of resources devoted to it) and that can be used to run any scripts during that particular session.
So is there a good way to do this?
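One workaround that doesn't need a special terminal is to have the script (or a small wrapper around it) limit itself. A minimal sketch using the standard os and resource modules on Linux; the 4 GB cap is an arbitrary example value, not a recommendation:

    import os
    import resource

    # Be "nicer" to the rest of the system (higher value = lower priority).
    os.nice(19)

    # Cap the virtual address space; allocations beyond it raise MemoryError
    # instead of dragging the whole machine into swap.
    limit_bytes = 4 * 1024 ** 3
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    # ... run the resource-intensive work here ...

Anything launched from the same process (or its children) inherits these limits, which gets you most of the "limited session" behaviour without extra tooling.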

Python copying directory and reading text files remotely

I'm about to start working on a project where a Python script needs to remote into a Windows Server and read a bunch of text files in a certain directory. I was planning on using a module called WMI, as that is the only way I have been able to successfully access a Windows server remotely using Python, but upon further research I'm not sure I am going to use this module.
The only problem is that these text files are constantly updating, about every 2 seconds, and I'm afraid that the script will crash if it hits a mutex error where it tries to open a file while it is being rewritten. The only thing I can think of is creating a new directory, copying all the files (via the script) into this directory in the state they are in, and reading them from there, then constantly overwriting these copies with the new files once it finishes checking all of the old ones. Unfortunately I don't know how to execute this correctly or efficiently.
How can I go about doing this? Which python module would be best for this execution?
There is Windows support in Ansible these days. It uses winrm. There are plenty of Python libraries that utilize winrm, just google it, but Ansible is very versatile.
http://docs.ansible.com/intro_windows.html
https://msdn.microsoft.com/en-us/library/aa384426%28v=vs.85%29.aspx
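If you go the WinRM route without Ansible, the pywinrm package gives you a thin Python client. A minimal sketch; the host name, credentials, and file path below are placeholders, and the exact command you run on the remote side is up to you:

    # pip install pywinrm
    import winrm

    session = winrm.Session('windows-server.example.com', auth=('user', 'password'))

    # Run a PowerShell command on the remote server and read its output.
    result = session.run_ps(r'Get-Content C:\data\readings\latest.txt')
    if result.status_code == 0:
        print(result.std_out.decode('utf-8', errors='replace'))
    else:
        print(result.std_err.decode('utf-8', errors='replace'))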
I've done some work with WMI before (though not from Python) and I would not try to use it for a project like this. As you said, WMI tends to be obscure, and my experience says such things are hard to support long-term.
I would either work at the Windows API level, or possibly design a service that performs the desired actions and access this service as needed. Of course, you will need to install this service on each machine you need to control. Both approaches have merit. The WinAPI approach pretty much guarantees you don't invent any new security holes and is simpler initially. The service approach should make the application faster and require less network traffic. I am sure you can think of other trade-offs easily.
You still have to have the necessary permissions, network ports, etc. regardless of the approach. E.g., WMI is usually blocked by firewalls and you still run as some NT process.
Sorry, not really an answer as such -- meant as a long comment.
ADDED
Re: API programming, though you have no Windows API experience, I expect you'll find it familiar for tasks such as you describe; reading and writing files and scanning directories are nothing unique to Windows. You only need to learn about the parts of the API that interest you.
Once you create the appropriate security contexts and start your client process, there is nothing special about the rest, i.e., you can simply open and close files, etc., ignoring the fact that the files are remote, other than the server name being included in the UNC name of the file/folder location.
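A minimal sketch of the "copy first, then read" idea over a UNC path. The share name, local directory, and retry policy are assumptions for illustration, not part of the answer above:

    import shutil
    import time
    from pathlib import Path

    SOURCE = Path(r'\\windows-server\share\data')   # hypothetical UNC location
    SNAPSHOT = Path(r'C:\snapshot')                  # local working copy

    def copy_with_retry(src, dst, attempts=3, delay=0.5):
        """Copy one file, retrying if the writer currently holds it open."""
        for _ in range(attempts):
            try:
                shutil.copy2(src, dst)
                return True
            except (PermissionError, OSError):
                time.sleep(delay)   # file is mid-rewrite; wait and try again
        return False

    SNAPSHOT.mkdir(exist_ok=True)
    for src in SOURCE.glob('*.txt'):
        copy_with_retry(src, SNAPSHOT / src.name)

    # Now read the stable local copies instead of the live files.
    for path in SNAPSHOT.glob('*.txt'):
        text = path.read_text(encoding='utf-8', errors='replace')

Reading from the snapshot means a file being rewritten at the source at worst costs you one skipped copy, not a crash mid-read.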

Portable way to check if virtualized OS

Is there any portable way or library to check whether a Python script is running on a virtualized OS, and also which virtualization platform it's running on?
This question, Detect virtualized OS from an application?, discusses a C version.
I think you can call the Linux command virt-what from Python.
The description of virt-what is here: http://people.redhat.com/~rjones/virt-what/
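A minimal sketch of shelling out to virt-what from Python (Linux only, and it usually needs to run as root to see all the clues):

    import shutil
    import subprocess

    def detect_virtualization():
        """Return the facts printed by virt-what, or [] if none or unavailable."""
        if shutil.which('virt-what') is None:
            return []
        result = subprocess.run(['virt-what'], capture_output=True, text=True)
        if result.returncode != 0:
            return []
        return [line for line in result.stdout.splitlines() if line.strip()]

    facts = detect_virtualization()
    print(facts or 'no virtualization detected (or not running with enough privileges)')

virt-what prints one fact per line (e.g. the hypervisor family), so the returned list also answers the "which platform" half of the question, at the cost of not being portable beyond Linux.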
To my knowledge, there is no nice, portable way to figure this out from Python. Most of the time, the way people try to figure out if they're being virtualized or not is looking for clues left by the VM -- either some instruction isn't quite right, or some behavior is off.
However, all might not be lost if you're willing to go off-box. When you're in a VM, you will almost never have perfect native performance. Thus, if you make enough measurements against a server you trust, it might be possible to detect that you're in a VM. This is especially the case if you're running on a host shared with multiple machines. Check your own time, how much time you're actually getting scheduled, and how much wall time has passed (based on an external measurement, because you can't trust the local clock). You'll probably have better luck if you can watch how much time has passed on the local machine as a whole rather than just inside one process.
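A rough illustration of that timing heuristic, not a reliable detector (and it trusts the local clock, which the answer above says you ultimately shouldn't): compare the CPU time the process actually received with the wall-clock time that elapsed while it was spinning. A ratio well below 1.0 means something else, possibly a hypervisor or a noisy neighbour, was taking the CPU away.

    import time

    def scheduling_ratio(duration=1.0):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        while time.perf_counter() - wall_start < duration:
            pass  # busy-loop so the process wants the CPU the whole time
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        return cpu / wall  # close to 1.0 means we got almost all the CPU we asked for

    print(f"CPU/wall ratio: {scheduling_ratio():.3f}")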

Memory not released by python cherrypy application on linux

I have a long-running process that will fetch 100k rows from the db, generate a web page, and then release all the small objects (lists, tuples and dicts). On Windows, the memory is freed after each request. However, on Linux, the memory of the server keeps growing.
The following post describes what the problem is and one possible solution.
http://pushingtheweb.com/2010/06/python-and-tcmalloc/
Is there any other way to get around this problem without having to compile my own Python version that uses tcmalloc? That option is going to be very difficult, since Python is controlled by the sysadmin.
You may be able to compile Python in your own working directory rather than try to have the sysadmin replace the system Python.
First, you should confirm that the tcmalloc solution solves your problem and does not impact performance too much for your application.
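Before going as far as a custom Python build, one glibc-specific experiment worth trying (my own suggestion, not something from the posts above) is asking the allocator to hand freed memory back to the kernel with malloc_trim, and watching the resident set size to see whether that alone fixes the growth:

    import ctypes
    import gc

    def current_rss_kb():
        # current resident set size from /proc/self/status (Linux only)
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
        return -1

    data = [dict(id=i, payload='x' * 100) for i in range(1_000_000)]
    print('RSS after allocation (kB):', current_rss_kb())

    del data
    gc.collect()
    ctypes.CDLL('libc.so.6').malloc_trim(0)   # ask glibc to release freed arenas
    print('RSS after release (kB):', current_rss_kb())

If RSS drops after the malloc_trim call, the memory was merely being cached by glibc's allocator rather than leaked, and you may not need tcmalloc at all.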

Measuring CPU time per-thread on Windows

I'm developing a long-running multi-threaded Python application for Windows, and I want the process to know the CPU time that each of its threads has taken. I can get the overall times for the entire process with os.times() but I need to know the per-thread times.
I know that there are external tools such as the Sysinternals Process Explorer, but my program itself needs to have this information. If I were on Linux, I'd look in the /proc filesystem, as described here. If I were writing C code, I'd use the GetThreadTimes call, as described here.
So how can I accomplish this on Windows using Python?
win32process.GetThreadTimes
You want the Python for Windows Extensions (pywin32) to do hairy Windows things.
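A minimal sketch with pywin32, assuming its wrapper returns KernelTime/UserTime in the same 100-nanosecond units as the underlying Win32 GetThreadTimes call:

    import win32api
    import win32process

    def current_thread_cpu_seconds():
        handle = win32api.GetCurrentThread()          # pseudo-handle for the calling thread
        times = win32process.GetThreadTimes(handle)
        kernel = times['KernelTime'] / 10_000_000     # 100-ns units -> seconds (assumed)
        user = times['UserTime'] / 10_000_000
        return kernel, user

    kernel_s, user_s = current_thread_cpu_seconds()
    print(f'kernel: {kernel_s:.3f}s  user: {user_s:.3f}s')

Each thread would call this for itself (or you open a handle to another thread with the appropriate access rights) and report the numbers back to the main thread.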
Or you can simply use yappi (https://code.google.com/p/yappi/). It transparently uses GetThreadTimes() if the CPU clock type is selected for profiling.
See here also for an example: https://code.google.com/p/yappi/wiki/YThreadStats_v082
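A minimal sketch of the yappi route, assuming the set_clock_type / get_thread_stats API of the versions documented at the links above:

    import threading
    import time
    import yappi

    def busy():
        end = time.time() + 1.0
        while time.time() < end:
            pass

    yappi.set_clock_type("cpu")      # measure CPU time, not wall time
    yappi.start()

    threads = [threading.Thread(target=busy) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    yappi.stop()
    yappi.get_thread_stats().print_all()   # per-thread CPU totals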
