Syntax for using mr.ripley for benchmarking - python

I have a Plone 3.3.5 site that I'm migrating to plone.app.blob for BLOB storage. I'm looking to measure the difference in performance and resource usage by replaying requests to the site, pre-migration and post-migration.
I found that mr.ripley comes with its own buildout, and I used that to install it. That buildout contains a section which creates a script at bin/replay, configured by some parameters in buildout.cfg. The included parameters look like they should work for my instance, as I'm running on port 8080 as well.
I copied one of my (smaller) apache logs into the base directory of my mr.ripley buildout and chowned it so that my zope user can read it. Then I try to run it like this:
time bin/replay mysite.com_access.log
It seems to run (doesn't produce any errors or drop me back into the shell) however I don't see any signs that it's loading up the server. My RAM and CPU usage in top still look like the machine is idling.
Many hours later, the process still does not appear to have completed. I ran it in screen, detaching and reattaching to the session several times, but it just seems to be stuck.
Any recommendations as to what I might be missing?

I've performed before-and-after load testing to evaluate architecture changes. To do this we used JMeter. We took Apache logs that represented the typical use we were after; JMeter allows these to be replayed. In addition, it will simulate cookies/sessions and browser cache responses to make the requests even more realistic.
Then we built a buildout to deploy JMeter and its configuration out to several test nodes and let it run.
I know this doesn't answer your direct question but it's an alternative approach.

Related

How to expose an NLTK-based ML (machine learning) Python script as a web service?

Let me explain what I'm trying to achieve. In the past, while working on the Java platform, I used to write Java code (say, to push or pull data from a MySQL database, etc.), then create a WAR file which essentially bundles all the class files, supporting files, etc., and put it under a servlet container like Tomcat; this becomes a web service that can be invoked from any platform.
In my current scenario, the majority of the work is being done in Java, but the Natural Language Processing (NLP) / Machine Learning (ML) part is done in Python using the NLTK, SciPy, NumPy, etc. libraries. I'm trying to use the services of this Python engine from the existing Java code. Integrating the Python code with Java through something like Jython is not that straightforward (as far as I know, Jython does not support calling Python modules that have C-based extensions), so I thought the next option would be to make it a web service, similar to what I had done with Java web services in the past.
Now comes the actual crux of the question: how do I run the ML engine as a web service and call it from any platform (in my current scenario, that happens to be Java)? I looked around the web for various options to achieve this and found things like CherryPy, Werkzeug, etc., but I was not able to find the right approach, or any sample code showing how to invoke an NLTK-based Python script and serve the result over the web, eventually replicating the functionality a Java web service provides.
In the Python/NLTK code, the ML engine trains on a large corpus (this takes 3-4 minutes), and we don't want the Python code to go through this step every time a method is invoked. If I make it a web service, the training will happen only once, when the service starts, and then the service is ready to be invoked against the already-trained engine.
Coming back to the problem: I'm pretty new to web services in Python and would appreciate any pointers on how to achieve this. Also, any pointers on calling NLTK-based Python scripts from Java without the web services approach, deployable on production servers with good performance, would be helpful and appreciated. Thanks in advance.
Just for a note, I'm currently running all my code on a Linux machine with Python 2.6 and JDK 1.6 installed.
One method is to build an XML-RPC server, but you may wish to fork a new process for each connection to prevent the server from seizing up. I have written a detailed tutorial on how to go about this: https://speakerdeck.com/timclicks/case-studies-of-python-in-parallel?slide=68.
NLTK-based systems tend to be slow to respond per request, but good throughput can be achieved given enough RAM.
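To make the idea concrete, here is a minimal sketch of the XML-RPC approach using the standard library (shown with Python 3's module name; in Python 2.6 the module is called SimpleXMLRPCServer). The train_classifier stand-in and the port are assumptions for illustration; the point is that training runs once at startup, not per request:

    from xmlrpc.server import SimpleXMLRPCServer  # module SimpleXMLRPCServer in Python 2

    def train_classifier():
        # Stand-in for the real 3-4 minute NLTK training on the large corpus,
        # e.g. nltk.NaiveBayesClassifier.train(features).
        class Dummy(object):
            def classify(self, text):
                return "positive" if "good" in text else "negative"
        return Dummy()

    engine = train_classifier()  # the expensive step happens exactly once

    def classify(text):
        # Cheap per-request call against the already-trained engine.
        return engine.classify(text)

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(classify, "classify")
    server.serve_forever()

On the Java side, any XML-RPC client library (Apache XML-RPC, for instance) can then call the classify method over HTTP. On Unix, the standard library's socketserver.ForkingMixIn can be mixed into the server class to fork per connection, as suggested above, with each child sharing the already-trained engine via copy-on-write memory.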

How to remotely update Python applications

What is the best method to push changes to a program written in Python? I have a piece of software that is written in Python that will regularly be updated. What would be the best way to do this? All the machines will have Windows 7.
Also, excuse the ambiguity of my question. This will be my first time having to implement an updating procedure. Feel free to mention specifics you would like me to add.
If you're not already packaging your program with InnoSetup, I strongly recommend you switch to it, because it has facilities to make this sort of thing easier. You can specify any special situations, such as files that should not be updated if they already exist (i.e. if you have any internal configuration files or things like that), in the InnoSetup script.
Next, to allow the client machine to find out about new versions of your app, keep a very small file on your public web server that has the version number of the current release and the URL to the latest version's installer exe. For this file to be useful, whenever you release a newer version of your program you must update this file, as well as the version number in the InnoSetup script, and also some kind of APP_VERSION constant in your program.
Then, you'll need to handle these parts of the updater yourself:
Detecting when a newer version is available by retrieving the current-version file from your web server over HTTP, and comparing the version number there to the app's own APP_VERSION. Make sure to do this query in a way that fails gracefully if the client machine doesn't have Internet access, and that doesn't block the GUI while it is doing the request (in case there's a network issue that forces the query to wait a long while for a timeout). A sketch of this check appears after these steps.
If a newer version is available, asking the user if they want to update, and if they say yes downloading an updated installer to the TEMP directory. Depending on what GUI toolkit you are using, there are various mechanisms for displaying a progress dialog during the download; this is a good idea since the installer is likely to be at least an MB.
Closing your app, running a special update script in the background, then starting up the app again.
The update script will wait for the original process to die completely (easiest way to do this is to pass in the original process's PID as a command line argument and have the update script send a query signal 0 to that process every second or so until it goes away.) It can then run the installer silently in the background, perhaps while displaying a "Please Wait..." dialog to the user. Once the installer is done and reports success in its return code, the updater can restart your program.
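A minimal sketch of the version check from the first step above, assuming a plain-text file on the server whose first line is the version number and whose second line is the installer URL (the file layout, URL, and APP_VERSION value are all illustrative; shown with Python 3's urllib.request):

    import threading
    import urllib.request

    APP_VERSION = (1, 4, 0)  # hypothetical app version constant
    VERSION_URL = "https://example.com/myapp/latest.txt"  # hypothetical URL

    def check_for_update(on_update_available):
        try:
            with urllib.request.urlopen(VERSION_URL, timeout=5) as resp:
                lines = resp.read().decode("utf-8").splitlines()
            latest = tuple(int(part) for part in lines[0].split("."))
            installer_url = lines[1]
        except (OSError, ValueError, IndexError):
            return  # offline or malformed file: fail gracefully, try again later
        if latest > APP_VERSION:
            on_update_available(latest, installer_url)

    # Run the check off the GUI thread so a slow timeout never blocks the UI.
    threading.Thread(
        target=check_for_update,
        args=(lambda ver, url: print("update available:", ver, url),),
        daemon=True,
    ).start()

For the updater's wait loop, note that the Unix "signal 0" trick doesn't translate directly to Windows, where os.kill would actually terminate the target process. A sketch using the third-party psutil package (an added dependency, not part of the original recipe) is safer:

    import sys
    import time

    import psutil  # assumption: psutil is bundled with the updater

    pid = int(sys.argv[1])  # original process's PID, passed on the command line
    while psutil.pid_exists(pid):
        time.sleep(1)  # poll until the app has fully exited
    # ...now run the installer silently, then restart the application.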
Depending on how big your app is, this whole-installer approach is more wasteful of bandwidth than the method using git or another SCM. Every update with this approach would involve downloading the entire installer for the latest version of the app, whereas an SCM would only download the files that have changed. However, it has the advantage that it requires no special server facilities except a regular web server, and no special installation of the SCM client on the user's computer.
Plus, InnoSetup is just generally cool. :-)
I would suggest using a source control program such as git or subversion. Also, if you are okay with everyone seeing the code, you can post it on GitHub, where anyone can pull from it. You could make it private, but you would have to pay for it, and all the users would also have to create a GitHub account and set it up with their git install.
If you use a source control program, the other people will have to pull the edits manually by running a command, but you could make a script or batch file that does this and have it run at startup or at regular intervals (a sketch of such a script appears below the links).
Just to be clear, if you want to do this, you will have to put the code on a server with SSH support and set up git. If you don't want to go through all of the server setup, I would recommend GitHub.
git: http://git-scm.com/ (for the Windows version, go to Downloads and select msysGit)
GitHub: https://github.com/
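A minimal sketch of such a pull-at-startup script, assuming git is on the PATH and the application directory is a clone of the repository (the install path is a placeholder):

    import subprocess
    import sys

    REPO_DIR = r"C:\path\to\app"  # hypothetical install location

    def self_update():
        try:
            # Fast-forward only: never create merge commits on client machines.
            subprocess.check_call(["git", "pull", "--ff-only"], cwd=REPO_DIR)
        except (OSError, subprocess.CalledProcessError) as exc:
            # Offline or diverged history: keep running the current version.
            sys.stderr.write("update skipped: %s\n" % exc)

    if __name__ == "__main__":
        self_update()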
For those of you looking for something a little less dated: I was just looking at how to create Python applications that can be updated remotely (though not limited to Windows like the OP).
It seems like esky has been a solution for a while, though it's been deprecated since 2018.
The latest and most up-to-date solution seems to be a combination of PyInstaller and PyUpdater. Note that I don't have personal experience with it; I'm looking for a friend.
It seems to support Windows, Linux and Mac, though, and both Python 2 and 3, so it's definitely worth a look.
The basic principles of application updates are described well by DSimon's answer.
However, update security is a different matter altogether: You don't want your clients to end up installing malicious files.
PyUpdater, as suggested in jlengrand's answer, does provide some secure update functionality, but, unfortunately, PyUpdater 4.0 is broken and there has not been a new release in over half a year (as of August 2022).
There's also python-tuf, which is the reference implementation of The Update Framework (TUF).
TUF (python-tuf) does everything humanly possible to ensure your update files are distributed securely. However, it does not handle application-specific things like checking for new application versions and installation on the client side.

Python: tool to keep track of deployments

I'm looking for a tool to keep track of "what's running where". We have a bunch of servers, and on each of those a bunch of projects. These projects may be running at a specific version (hg tag/commit number) and have their requirements pinned at specific versions as well.
Fabric looks like a great start to do the actual deployments by automating the ssh part. However, once a deployment is done there is no overview of what was done.
Before reinventing the wheel I'd like to check here on SO as well (I did my best w/ Google but could be looking for the wrong keywords). Is there any such tool already?
(In practice I'm deploying Django projects, but I'm not sure that's relevant for the question; anything that keeps track of pip/virtualenv installs or server state in general should be fine)
many thanks,
Klaas
==========
EDIT FOR TEMP. SOLUTION
==========
For now, we've chosen to simply store this information in a simple key-value store (in our case: the filesystem) that we take great care to back up (in our case: using a DVCS). We keep track of this store with the same deployment tool that we use to do the actual deploys (in our case: fabric).
Passwords are stored inside a TrueCrypt volume that's stored inside our key-value store.
==========
I will still gladly accept any answer when some kind of Open Source solution to this problem pops up somewhere. I might share (part of) our solution somewhere myself in the near future.
pip freeze gives you a listing of all installed packages. Bonus: if you redirect the output to a file, you can use it as part of your deployment process to install all those packages (pip install -r can install every package listed in the file).
I see you're already using virtualenv. Good. You can run pip freeze -E myvirtualenv > myproject.reqs to generate a dependency file that doubles as a status report of the Python environment.
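A minimal sketch of how those per-host status reports could be collected into the kind of key-value store described in the question's edit. The ssh invocation, directory layout, and the assumption that pip is on the remote PATH are all illustrative; in practice this would live in the same fabric deploy task:

    import os
    import subprocess

    def record_deployment(host, project):
        # Capture the exact package set running on the remote host.
        reqs = subprocess.check_output(["ssh", host, "pip", "freeze"])
        outdir = os.path.join("deployments", host)
        os.makedirs(outdir, exist_ok=True)
        with open(os.path.join(outdir, project + ".reqs"), "wb") as f:
            f.write(reqs)  # commit this file to the DVCS afterwards

    record_deployment("web1.example.com", "myproject")  # hypothetical host/project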
Perhaps you want something like Opscode Chef.
In their own words:
Chef works by allowing you to write recipes that describe how you want a part of your server (such as Apache, MySQL, or Hadoop) to be configured. These recipes describe a series of resources that should be in a particular state - for example, packages that should be installed, services that should be running, or files that should be written. We then make sure that each resource is properly configured, only taking corrective action when it's necessary. The result is a safe, flexible mechanism for making sure your servers are always running exactly how you want them to be.
EDIT: Note Chef is not a Python tool, it is a general purpose tool, written in Ruby (it seems). But it is capable of supporting various "cookbooks", including one for installing/maintaining Python apps.

Hosting a non-trivial python program on the web?

I'm a complete novice in this area, so please excuse my ignorance.
I have three questions:
What's the best (fastest, easiest, headache-free) way of hosting a python program online?
I'm currently looking at Google App Engine and Web Frameworks for Python, but all the options are a bit overwhelming.
Which gui/viz libraries will transfer to a web app environment without problems?
I'm willing to sacrifice some performance for the sake of simplicity.
(Google App Engine can't do C libraries, so this is causing a dilemma.)
Where can I learn more about running a program locally vs. having a program continuously run on a server and taking requests from multiple users?
Currently I have a working Python program that only uses standard Python libraries. It currently uses around 2.7 GB of RAM, but as I increase my dataset, I'm predicting it will use closer to 6 GB. I can run it on my personal machine, and everything is just peachy. I'd like to continue developing the front end on my home machine and implement the web app later.
Here is a relevant, previous post of mine.
Depending on your knowledge of server administration, you should consider a dedicated server. I was running some custom Python modules with NumPy, SciPy, Pandas, etc. on some data on a shared server with GoDaddy. One program I wrote took 120 seconds to complete. Recently we switched to a dedicated server and it now takes 2 seconds. The shared environment used CGI to run Python, and I installed mod_python on the dedicated server.
Using a dedicated server allows COMPLETE control (including root access) of the server, which allows the compilation and/or installation of anything. It is a bit pricey, but if you're making money with your stuff it might be worth it.
Another option would be to use something like http://www.dyndns.com/ where you can host a domain on your own machine.
So with that said, perhaps some answers:
It depends on your requirements. ~4 GB of RAM might require a dedicated server. What you are asking is not necessarily an easy task, so don't be afraid to get your hands dirty.
Not sure what you mean here.
A server is just a computer that responds to requests. On the dedicated server (I keep mentioning) you are operating in a Unix (or Windows) environment just like you would locally. You use SOFTWARE (e.g. Apache web server) to serve client requests. My vote is mod_python.
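For what the mod_python route looks like, here is the canonical minimal handler (a sketch only; the Apache configuration that maps requests to this module is omitted, and mod_python itself has long since been superseded by mod_wsgi):

    from mod_python import apache

    def handler(req):
        # Apache/mod_python calls this for every request mapped to the module.
        req.content_type = "text/plain"
        req.write("Hello from mod_python")
        return apache.OK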
Going with an Amazon EC2 instance is a greater headache than a dedicated server, but it should be much closer to your needs.
http://aws.amazon.com/ec2/#instance
Their extra-large instance should be more than large enough for what you need to do, and you only turn the instance on when you need it, so you don't have the massive bill that comes with a dedicated server of the same size.
There are some nice JavaScript-based visualization toolkits out there, so you can model your application to return raw (JSON) data and render that on the client.
I can mention d3.js http://mbostock.github.com/d3/ and the JavaScript InfoVis Toolkit http://thejit.org/
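A minimal sketch of that split, using only the standard library to serve JSON that a client-side library like d3.js would fetch and render (the port and payload are made up for illustration):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DataHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Return raw data only; all rendering happens in the browser.
            payload = json.dumps({"values": [4, 8, 15, 16, 23, 42]})
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload.encode("utf-8"))

    HTTPServer(("localhost", 8000), DataHandler).serve_forever()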

What are some successful methods for deploying a Django application on the desktop?

I have a Django application that I would like to deploy to the desktop. I have read a little on this and see that one way is to use freeze. I have used this with varying success in the past for Python applications, but am not convinced it is the best approach for a Django application.
My questions are: what are some successful methods you have used for deploying Django applications? Is there a de facto standard method? Have you hit any dead ends? I need a cross platform solution.
I did this a couple of years ago for a Django app running as a local daemon. It was launched by Twisted and wrapped with py2app for Mac and py2exe for Windows. There were both a browser and an Air front-end hitting it. It worked pretty well for the most part, but I didn't get to deploy it out in the wild because the larger project got postponed. It's been a while and I'm a bit rusty on the details, but here are a few tips:
IIRC, the most problematic thing was Python loading C extensions. I had an Intel assembler module written with C "asm" commands that I needed to load to get low-level system data. That took a while to get working across both platforms. If you can, try to avoid C extensions.
You'll definitely need an installer. Most likely the app will end up running in the background, so you'll need to mark it as a Windows service, Unix daemon, or Mac launchd application.
In your installer you'll want to provide a way to locate a free local TCP port. You may have to write a little stub routine that the installer runs or use the installer's built-in scripting facility to find a port that hasn't been taken and save it to a config file. You then load the config file inside your settings.py and whatever front-end you're going to deploy. That's the shared port. Or you could just pick a random number and hope no other service on the desktop steps on your toes :-) (A sketch of such a port-finding stub appears after these tips.)
If your front-end and back-end are separate apps then you'll need to design an API for them to talk to each other. Make sure you provide a flag to return the data in both raw and human-readable form. It really helps in debugging.
If you want Django to be able to send notifications to the user, you'll want to integrate with something like Growl or get Python for Windows extensions so you can bring up toaster pop-up notifications.
You'll probably want to stick with SQLite for the database, in which case you'll want to make sure you use semaphores to handle multiple requests vying for the database (or any other shared resource). If your app is accessed via a browser, users can have multiple windows open and hit the app at the same time. If using a custom front-end (native, Air, etc...) then you can control how many instances are running at a given time, so it won't be as much of an issue.
You'll also want some sort of access to local system logging facilities, since the app will be running in the background; make sure you trap all your exceptions and route them into the syslog. A big hassle was debugging Windows service startup issues. It would have been impossible without system logging.
Be careful about hardcoded paths if you want to stay cross-platform. You may have to rely on the installer to write a config file entry with the actual installation path which you'll have to load up at startup.
Test actual deployment especially across a variety of firewalls. Some of the desktop firewalls get pretty aggressive about blocking access to network services that accept incoming requests.
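Here is a minimal sketch of the port-finding stub mentioned above: binding to port 0 makes the OS pick a free port, which is then saved for settings.py and the front-end to read (the config file name is a placeholder):

    import json
    import socket

    s = socket.socket()
    s.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    port = s.getsockname()[1]
    s.close()

    with open("local_config.json", "w") as f:
        json.dump({"port": port}, f)  # read back in settings.py and the front-end

Note the small race window: another process could grab the port between this check and your server's startup, so the app should still handle bind failures gracefully.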
That's all I can think of. Hope it helps.
If you want a good solution, you should give up on making it cross platform. Your code should all be portable, but your deployment - almost by definition - needs to be platform-specific.
I would recommend using py2exe on Windows, py2app on MacOS X, and building deb packages for Ubuntu with a .desktop file in the right place in the package for an entry to show up in the user's menu. Unfortunately for the last option there's no convenient 'py2deb' or 'py2xdg', but it's pretty easy to make the relevant text file by hand.
And of course, I'd recommend bundling in Twisted as your web server for making the application easily self-contained :).
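A rough sketch of what bundling Twisted as the embedded web server can look like for a Django app (the settings module path and port are assumptions for illustration):

    import os

    from twisted.internet import reactor
    from twisted.web.server import Site
    from twisted.web.wsgi import WSGIResource

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings")
    from django.core.wsgi import get_wsgi_application

    # Wrap the Django WSGI application in a Twisted resource and serve it
    # on localhost only, since this is a desktop deployment.
    resource = WSGIResource(reactor, reactor.getThreadPool(),
                            get_wsgi_application())
    reactor.listenTCP(8000, Site(resource), interface="127.0.0.1")
    reactor.run()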
