I'm tinkering with some big-data queries in the IPython shell using the Django ORM. This is on a Debian 6 VM in VMware Fusion on OS X; the VM is allowed access to 4 or 8 cores (I've played with the settings) of the 4-core hyper-threaded i7 on the host.
When I watch progress in top while running, for example, 'for result in results: do_query()' in the Python shell, it seems that Python and one of the Postgres processes are always co-located on the same physical CPU core: their total CPU usage never adds up to more than 100%, with Python usually at 65% to Postgres' 25% or so. iowait on the VM isn't excessively high.
I'm not positive they're always on the same core, but it sure looks like it. Given how I plan to scale this eventually, I'd prefer that the Python process(es) and Postgres workers be scheduled more optimally. Any insight?
Right now, if your code works the way I think it works, Postgres is always waiting for Python to send it a query, or Python is waiting for Postgres to come back with a response. There's no situation where they'd both be doing work at once, so only one ever runs at a time.
To start using your machine more heavily, you'll need to implement some sort of multithreading on the Python end. Since you haven't given many details on what your queries are, it's hard to say what that might look like.
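For illustration, a minimal sketch of that with a thread pool might look like this (do_query() and results are the placeholder names from your loop, and I'm assuming each query is independent of the others). Threads are enough here because the GIL is released while Python waits on Postgres:

    from concurrent.futures import ThreadPoolExecutor
    from django.db import connection

    def run_one(result):
        value = do_query(result)  # do_query() stands in for the per-result query in your loop
        connection.close()        # each thread gets its own DB connection; release it when done
        return value

    # issue up to 8 queries at a time instead of strictly one after another
    with ThreadPoolExecutor(max_workers=8) as pool:
        outputs = list(pool.map(run_one, results))

With something like that in place, you should see Python and several Postgres backends busy at the same time rather than taking turns.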
Related
I am developing an automation tool that is supposed to upgrade IP network devices.
I developed two totally separate scripts for the sake of simplicity - I am not an expert developer - one for the core and aggregation nodes, and one for the access nodes.
The tool executes a software upgrade on the routers and verifies the result by running a set of post-check commands. The device role implies the "size" of the router: bigger routers take much longer to finish the upgrade. Although the smaller ones come back up much earlier, their post-checks cannot be started until the bigger ones finish upgrading, because the devices are connected to each other.
I want to implement reliable signaling between the two scripts: the slower script (core devices) flips a switch once the core devices are up, while the other script keeps checking that value and then starts the checks for the access devices.
Both scripts run 200+ concurrent sessions; moreover, each access device (session) needs individual signaling, so all the sessions keep checking the same value in the DB.
First I used the keyring library, but noticed that the keys sometimes disappear. Now I am using a text file to manipulate the signal values, which looks pretty amateurish, so I would like to use MongoDB.
Would it cause any performance issues or unexpected exceptions?
The script will be running for 90+ minutes. Is it OK to connect to the DB once at the beginning of the script, set the signal to False, and then, 20-30 minutes later, keep checking it for an additional 20 minutes? Or is it advisable to establish a new connection for reading the value in each parallel session?
The server runs on the same VM as the script. What exceptions shall I expect?
Thank you!
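For reference, a minimal sketch of the signaling described above with pymongo might look like the following; the database and collection names, the flag key, and the connection details are assumptions, not anything from the setup described. A single long-lived MongoClient per script is fine, since it maintains its own connection pool that all 200+ sessions can share:

    import time
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure, ServerSelectionTimeoutError

    # assumed names; the MongoDB server runs on the same VM as the scripts
    client = MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=5000)
    flags = client["upgrade_tool"]["signals"]

    def set_core_done(done=True):
        # called by the core/aggregation script once its devices are back up
        flags.update_one({"_id": "core_done"}, {"$set": {"value": done}}, upsert=True)

    def wait_for_core(poll_seconds=30, timeout_seconds=3600):
        # called by each access session; polls the shared flag until it flips
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            try:
                doc = flags.find_one({"_id": "core_done"})
                if doc and doc.get("value"):
                    return True
            except (ConnectionFailure, ServerSelectionTimeoutError):
                pass  # DB briefly unreachable; retry on the next poll
            time.sleep(poll_seconds)
        return False

The exceptions to plan for are essentially the ones caught here (plus their parent PyMongoError) for the cases where the local server is slow or unreachable.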
I'm using a g2.2xlarge instance on Amazon EC2.
I have a function that takes 3 minutes to run on my laptop, which is very slow.
However, when I run it on EC2 it takes the same time, sometimes even more.
Looking at the statistics, I noticed the EC2 instance uses at best 25% of the CPU.
I parallelized my code; it's better, but I still get the same execution time on my laptop and on EC2.
As for my function:
I have an image as input, and I run my function twice (on the image with and without image processing); I managed to run those two calls in parallel. I then extract 8 text fields from the image using two machine learning models (Faster R-CNN for field detection plus CLSTM for text reading), and the text is displayed on my computer.
Any idea how to improve performance (processing time) in EC2?
I think you need to profile your code locally and make sure it really is CPU bound. Could it be that the time is spent on the network or accessing the disk (e.g. reading the image in the first place)?
If it is CPU bound, then explore how to exploit all the available cores (and 25% sounds suspicious - is it maxing out one core?). Python can be hard to parallelise due to the (in)famous GIL. However, only worry about this once you can prove it's a problem - profile first!
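As a rough illustration (process_image is a stand-in for your pipeline, not your actual code), you can see where the time goes with cProfile and then, if it really is CPU bound, spread the independent calls across cores with a process pool so the GIL doesn't pin you to one core:

    import cProfile
    from concurrent.futures import ProcessPoolExecutor

    def process_image(path, preprocess):
        # stand-in for your pipeline: optional preprocessing, Faster R-CNN
        # field detection, then CLSTM text reading
        ...

    if __name__ == "__main__":
        # 1) find out where the time actually goes
        cProfile.run("process_image('sample.jpg', preprocess=True)", sort="cumtime")

        # 2) if it really is CPU bound, run the two variants in separate
        #    processes so each gets its own interpreter and its own core
        with ProcessPoolExecutor(max_workers=2) as pool:
            jobs = [pool.submit(process_image, "sample.jpg", flag) for flag in (True, False)]
            results = [job.result() for job in jobs]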
I have a pipe coming from my Web server to my primary development desktop in order to have a slot open for heavy CPU processes without paying for the premium from Amazon or another cloud platform. I do however still use this machine for other personal things such as video encoding or gaming.
Is there a way to combine a nice value and a cpulimit value so that the process is capped at some maximum percentage of CPU but still has the highest priority, so it will absolutely get done when requested? Say, for example, I wanted 25% of my CPU available on demand to the process no matter what I was doing on the machine at the time.
Ideally I would also like to allow it a higher percentage during times when I am not using the machine, while keeping a minimum that is always available.
Is there a clean way to do this? The only way I have found so far is sticking the process in a separate virtual machine, but it feels like I'm making things a whole lot more complicated than they need to be just to make it run smoothly. On top of that, the ability to allow a limited virtual machine a higher percentage when the host is idle doesn't exist as far as I know.
As a side note, I'm doing all this on a Mac, so the solution will have to be Unix-based. The server I'm using is Python's CherryPy, for easy expansion onto new developments.
Thank you in advance.
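For illustration, the stop/continue trick that cpulimit uses can be sketched in a few lines of Python with psutil - this is only a sketch of the idea, not a polished tool, and the PID and the 25% cap are placeholders:

    import time
    import psutil

    def throttle(pid, limit=0.25, interval=0.1):
        # roughly cap a process at `limit` of one CPU by stop/continue cycling,
        # which is the same trick cpulimit uses; the process is also reniced so
        # it has high priority whenever it is allowed to run
        proc = psutil.Process(pid)
        proc.nice(-5)  # raise priority (negative values need privileges)
        try:
            while True:
                time.sleep(interval * limit)         # let it run for its slice
                proc.suspend()                       # SIGSTOP
                time.sleep(interval * (1 - limit))   # hold it for the rest
                proc.resume()                        # SIGCONT
        except psutil.NoSuchProcess:
            pass  # target exited

Raising the allowance when the machine is idle would then just be a matter of adjusting `limit` from whatever monitors your own activity.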
I'm writing an Oracle of Bacon type website that involves a breadth-first search on a very large directed graph (>5 million nodes with an average of perhaps 30 outbound edges each). This is also essentially all the site will do, aside from displaying a few mostly text pages (how it works, contact info, etc.). I currently have a test implementation running in Python, but even using Python arrays to represent the data efficiently, it takes >1.5 GB of RAM to hold the whole thing. Clearly Python is the wrong language for a low-level algorithmic problem like this, so I plan to rewrite most of it in C using the Python/C bindings. I estimate that this'll take about 300 MB of RAM.
Based on my current configuration, this will run through mod_wsgi in Apache 2.2.14, which is set to use mpm_worker_module. Each Apache child process will then load up the whole Python setup (which loads the C extension), thus using 300 MB, and I only have 4 GB of RAM. This'll take time to load and it seems like it'd potentially keep the number of server instances lower than it could otherwise be. If I understand correctly, data-heavy (and not client-interaction-heavy) tasks like this would typically get divorced from the server by setting up an SQL database or something of the sort that all the server processes could then query. But I don't know of a database framework that'd fit my needs.
So, how to proceed? Is it worth trying to set up a database divorced from the webserver, or in some other way move the application a step farther out than mod_wsgi, in order to maybe get a few more server instances running? If so, how could this be done?
My first impression is that the database, and not the server, is always going to be the limiting factor. It looks like the typical Apache mpm_worker_module configuration has ServerLimit 16 anyway, so I'd probably only get a few more servers. And if I did divorce the database from the server I'd have to have some way to run multiple instances of the database as well (I already know that just one probably won't cut it for the traffic levels I want to support) and make them play nice with the server. So I've perhaps mostly answered my own question, but this is an odd enough situation that I figured it'd be worth seeing if anyone's got a firmer handle on it. Anything I'm missing? Does this implementation make sense? Thanks in advance!
Technical details: it's a Django website that I'm going to serve using Apache 2.2.14 on Ubuntu 10.4.
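For reference, a rough sketch of the compact representation alluded to above - flat array('i') buffers in a CSR-style layout (the node numbering and how the arrays get built are assumed) - with a breadth-first search over it:

    from array import array
    from collections import deque

    def bfs_distance(offsets, edges, source, target):
        # offsets: array('i') of length n+1; edges: array('i') of all neighbor IDs,
        # so the neighbors of node u are edges[offsets[u]:offsets[u+1]]
        n = len(offsets) - 1
        dist = array('i', [-1]) * n          # -1 means "not visited yet"
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if u == target:
                return dist[u]
            for v in edges[offsets[u]:offsets[u + 1]]:
                if dist[v] == -1:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return -1  # no path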
First up, look at the daemon mode of mod_wsgi rather than embedded mode: it lets you control the number of Python WSGI application processes separately from the Apache child processes. Secondly, you would be better off putting the memory-hungry bits in a separate backend process. You might use XML-RPC or some other message queueing system to communicate with the backend processes, or perhaps even see if you can use Celery in some way.
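As a rough sketch of the XML-RPC option (the port and the find_path call are placeholders, not the real API), the large in-memory graph lives in one long-running backend process:

    # backend.py - a single long-running process holding the graph in memory
    from xmlrpc.server import SimpleXMLRPCServer

    def find_path(source, target):
        # placeholder for the real BFS over the C-backed graph
        return []

    server = SimpleXMLRPCServer(("127.0.0.1", 8001), allow_none=True)
    server.register_function(find_path)
    server.serve_forever()

and the lightweight WSGI processes only hold a thin proxy to it:

    # inside a Django view, running in each mod_wsgi daemon process
    import xmlrpc.client

    backend = xmlrpc.client.ServerProxy("http://127.0.0.1:8001", allow_none=True)
    path = backend.find_path("Kevin Bacon", "Some Actor")

(On the Python 2 stack this setup implies, the equivalent modules are SimpleXMLRPCServer and xmlrpclib.)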
I am running several thousand python processes on multiple servers which go off, lookup a website, do some analysis and then write the results to a central MySQL database.
It all works fine for about 8 hours and then my scripts start to wait for a MySQL connection.
On checking top it's clear that the MySQL daemon is overloaded as it is using up to 90% of most of the CPUs.
When I stop all my scripts, MySQL continues to use resources for some time afterwards.
I assume it is still updating the indexes? If so, is there any way of determining which indexes it is working on - or, if not, what it is actually doing?
Many thanks in advance.
Try enabling the slow query log: http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
Also, take a look at the output of SHOW PROCESSLIST; in the mysql shell - it should give you some more information.
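If it's easier to check from one of your worker machines than from a shell, a small sketch with a MySQL driver (pymysql here; the credentials are placeholders) can dump the same information:

    import pymysql

    # placeholder credentials; point this at the central database server
    conn = pymysql.connect(host="db-host", user="monitor", password="secret",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW FULL PROCESSLIST")
            for row in cur.fetchall():
                # long-running rows with states like 'Updating' or 'Sending data'
                # show what mysqld is still chewing on after the scripts stop
                print(row["Time"], row["State"], row["Info"])
    finally:
        conn.close()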
There are a lot of tweaks that can be done to improve the performance of MySQL. Given your workload, you would probably benefit a lot from MySQL 5.5 or higher, which improved performance on multiprocessor machines. Is the machine in question hitting virtual memory? If it is paging out, then the performance of MySQL will be horrible.
My suggestions:
Check your version of MySQL. If possible, get the latest 5.5 release.
Look at MySQL's config file, my.cnf, and make sure it makes sense for your machine. There are example config files shipped for small, medium, large, etc. machines running MySQL. I think the default setup is for a machine with < 1 GB of RAM.
As the other answer suggests, turn on slow query logging.