Obtaining profiling information on a running Python app

If you have a Python application in production (i.e. not already running under a debugger or profiler), is there any way to attach to the Python process/instance and examine it? It would be useful to know:
what code is consuming time
what is using memory
Any insight would be appreciated.
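For attaching from the outside, a sampling profiler such as py-spy can inspect a live Python process by PID without restarting it. If instead you can afford a couple of lines at startup, the standard library alone gives you on-demand stack dumps and memory snapshots. A minimal sketch (Python 3; the signal-triggered dump is Unix-only):

    # Register once at startup; afterwards `kill -USR1 <pid>` makes the
    # process print every thread's current stack to stderr, which is a
    # quick way to see where time is going.
    import faulthandler
    import signal
    import tracemalloc

    faulthandler.register(signal.SIGUSR1)
    tracemalloc.start()

    def report_top_allocations(limit=10):
        # Call this from inside the app (e.g. an admin endpoint) to see
        # which source lines are holding the most memory right now.
        snapshot = tracemalloc.take_snapshot()
        for stat in snapshot.statistics("lineno")[:limit]:
            print(stat)

Note that tracemalloc adds some overhead per allocation, so you may prefer to start it only while investigating rather than leaving it on permanently.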

Related

Service to trigger and run python scripts?

So far when dealing with web scraping projects, I've used GAppsScript, meaning that I can easily trigger the script to be run once a day.
Is there an equivalent service when dealing with Python scripts? I have a Raspberry Pi, so I guess I could keep it on 24/7 and use cron jobs to trigger the script daily. But that seems rather wasteful, since I'm talking about a few small scripts that take only a few seconds to run.
Is there any service that allows me to trigger a Python script once a day, without needing to keep a local machine on 24/7? The simpler the solution the better; I wouldn't want to overengineer such a basic use case if a ready-made system already exists.
The only service I've found so far that does this is WayScript, and here's a Python example running in the cloud. The free tier should be enough for most simple/hobby-tier use cases.

How to expose an NLTK-based ML (machine learning) Python script as a web service?

Let me explain what I'm trying to achieve. In the past, while working on the Java platform, I used to write Java code (say, to push or pull data from a MySQL database), then create a WAR file which essentially bundles all the class files, supporting files, etc., and put it under a servlet container like Tomcat; this becomes a web service and can be invoked from any platform.
In my current scenario, the majority of the work is done in Java, but the Natural Language Processing (NLP)/Machine Learning (ML) part is done in Python using the NLTK, SciPy, NumPy, etc. libraries. I'm trying to use the services of this Python engine from the existing Java code. Integrating the Python code into Java through something like Jython is not straightforward (as far as I know, Jython does not support calling Python modules that have C-based extensions), so I thought the next option would be to make it a web service, similar to what I had done with Java web services in the past.
Now comes the actual crux of the question: how do I run the ML engine as a web service and call it from any platform, which in my current scenario happens to be Java? I looked around the web for various options and found things like CherryPy and Werkzeug, but I was not able to find the right approach, or any sample code showing how to invoke an NLTK-based Python script and serve the result over the web, eventually replicating the functionality a Java web service provides.
One more constraint: in the Python/NLTK code, the ML engine trains on a large corpus (this takes 3-4 minutes), and we don't want the Python code to go through this step every time a method is invoked. If I make it a web service, the training will happen only once, when the service starts, and then the service is ready to be invoked using the already trained engine.
Coming back to the problem: I'm pretty new to web services in Python and would appreciate any pointers on how to achieve this. Also, any pointers on calling NLTK-based Python scripts from Java without the web services approach, in a way that can be deployed on production servers with good performance, would be helpful and appreciated. Thanks in advance.
Just as a note, I'm currently running all my code on a Linux machine with Python 2.6 and JDK 1.6 installed.
One method is to build an XML-RPC server, but you may wish to fork a new process for each connection to prevent the server from seizing up. I have written a detailed tutorial on how to go about this: https://speakerdeck.com/timclicks/case-studies-of-python-in-parallel?slide=68.
An NLTK-based system tends to be slow to respond to each request, but good throughput can be achieved given enough RAM.
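To make the train-once, serve-many idea concrete, here is a minimal sketch of the XML-RPC approach using only the standard library (the module is xmlrpc.server in Python 3; on the Python 2.6 mentioned in the question it is SimpleXMLRPCServer). The train_engine function is a stand-in for the real 3-4 minute NLTK training step:

    from xmlrpc.server import SimpleXMLRPCServer

    def train_engine():
        # Placeholder for the expensive corpus training, e.g. building
        # an nltk.NaiveBayesClassifier. Runs exactly once, at startup.
        return lambda text: "positive" if "good" in text else "negative"

    engine = train_engine()  # the 3-4 minute step happens here, once

    def classify(text):
        # Every RPC call reuses the already-trained engine, so it is fast.
        return engine(text)

    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_function(classify, "classify")
    server.serve_forever()

Java can then call classify through any XML-RPC client library (Apache XML-RPC, for example), so the Python process stays resident with its trained model while the Java side treats it like any other web service.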

What are some good examples of processes to automate for a Python beginner to start with?

I'm trying to get some initial bearings on useful processes that a basic working knowledge of Python can assist with or make less tedious; specifically, processes that can be executed on the command line in a Linux environment. An example or two of a tedious process, along with sample code to use as a starting point, would be greatly appreciated.
What you want to automate depends on what you do manually and what your role is. If you are, say, a system administrator and you have shell scripts written to automate some of your tasks (server management, user account creation, etc.), you can port them to Python.
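As a concrete illustration of that kind of port, here is a small sketch (the mount points and threshold are invented for the example) of a classic sysadmin chore, warning when a filesystem is nearly full, done with the standard library instead of parsing df output in a shell script:

    import shutil

    THRESHOLD = 0.9  # warn when a mount is more than 90% full

    for mount in ("/", "/home", "/var"):
        try:
            usage = shutil.disk_usage(mount)
        except FileNotFoundError:
            continue  # this mount point doesn't exist on this machine
        fraction = usage.used / usage.total
        if fraction > THRESHOLD:
            print(f"{mount} is {fraction:.0%} full")

The same pattern, a stdlib call instead of parsing command output, applies to most small shell scripts, and the Python version is usually easier to extend and test.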

Syntax for using mr.ripley for benchmarking

I have a Plone 3.3.5 site that I'm migrating to plone.app.blob for BLOB storage. I'm looking to measure the difference in performance and resource usage by replaying requests to the site, pre-migration and post-migration.
I found that mr.ripley comes with its own buildout and I used that to install it. That buildout contains a section which creates a script at bin/replay, configured by some parameters in buildout.cfg. The included parameters look like they should work for my instance, as I'm running on port 8080 as well.
I copied one of my (smaller) Apache logs into the base directory of my mr.ripley buildout and chowned it so that my zope user can read it. Then I try to run it like this:
time bin/replay mysite.com_access.log
It seems to run (it doesn't produce any errors or drop me back into the shell), but I don't see any signs that it's loading up the server. My RAM and CPU usage in top still look like the machine is idling.
Many hours later, the process still does not seem to have completed. I ran it using screen, detached and returned to the session several times, but it just seems to be stuck.
Any recommendations as to what I might be missing?
I've performed before-and-after load testing to evaluate architecture changes. To do this we used JMeter. We took Apache logs that represented the typical usage we were after; JMeter allows these to be replayed. In addition, it will simulate cookies/sessions and browser-cache responses to make the requests even more realistic.
Then we built a buildout to deploy JMeter and its configuration to several test nodes and let it run.
I know this doesn't answer your direct question, but it's an alternative approach.
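Separately, if the immediate question is whether bin/replay is sending any traffic at all, a few lines of Python can re-issue the GET requests from the same access log as a sanity check. A rough sketch, assuming a combined-format Apache log and the instance on localhost:8080 as in the question:

    import sys
    import urllib.request

    HOST = "http://localhost:8080"

    with open(sys.argv[1]) as log:
        for line in log:
            parts = line.split('"')
            if len(parts) < 2:
                continue
            request = parts[1].split()  # e.g. ['GET', '/path', 'HTTP/1.1']
            if len(request) < 2 or request[0] != "GET":
                continue  # replay only idempotent GETs
            try:
                urllib.request.urlopen(HOST + request[1], timeout=10).read()
            except Exception as exc:
                print("failed:", request[1], exc)

If this visibly loads the server while bin/replay does not, the problem is in the replay tool's configuration rather than in the instance.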

Memory not released by Python CherryPy application on Linux

I have a long-running process that will fetch 100k rows from the DB, generate a web page, and then release all the small objects (lists, tuples and dicts). On Windows, the memory is freed after each request. However, on Linux, the memory of the server keeps growing.
The following post describes the problem and one possible solution:
http://pushingtheweb.com/2010/06/python-and-tcmalloc/
Is there any other way to get around this problem without having to compile my own Python version that uses tcmalloc? That option is going to be very difficult, since Python is controlled by the sysadmin.
You may be able to compile Python in your own working directory rather than trying to have the sysadmin replace the system Python.
First, though, you should confirm that the tcmalloc solution actually solves your problem and does not impact performance too much for your application.
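One cheaper experiment before building a custom Python: on Linux the growth is often glibc's allocator holding on to freed heap pages rather than a true leak, and you can ask it to hand them back. A minimal sketch (glibc-specific, via ctypes, and only effective if allocator retention really is the cause):

    import ctypes
    import gc

    def release_memory():
        gc.collect()                      # free the small lists/tuples/dicts first
        libc = ctypes.CDLL("libc.so.6")   # Linux/glibc only
        libc.malloc_trim(0)               # return trimmed heap pages to the OS

If the process's resident size drops after calling this at the end of a request, the "leak" was allocator retention, which is the same class of problem the tcmalloc article addresses.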
