How to improve Trac's performance - python

I have noticed that my particular instance of Trac is not running quickly and has big lags. This is at the very onset of a project, so not much is in Trac (except for plugins and code loaded into SVN).
Setup Info: This is via a SELinux system hosted by WebFaction. It is behind Apache, and connections are over SSL. Currently the .htpasswd file is what I use to control access.
Are there any recommended ways to improve the performance of Trac?

It's hard to say without knowing more about your setup, but one easy win is to make sure that Trac is running in something like mod_python, which keeps the Python runtime in memory. Otherwise, every HTTP request will cause Python to run, import all the modules, and then finally handle the request. Using mod_python (or FastCGI, whichever you prefer) will eliminate that loading and skip straight to the good stuff.
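For what it's worth, the same principle applies to mod_wsgi: the point is to keep the interpreter and Trac's modules loaded between requests. Below is a rough sketch of the kind of long-lived WSGI entry point Trac uses for such deployments; both paths are placeholders for your own environment:

    # trac.wsgi -- hedged sketch; the paths below are placeholders
    import os
    os.environ['TRAC_ENV'] = '/path/to/trac/env'            # your Trac environment
    os.environ['PYTHON_EGG_CACHE'] = '/path/to/egg-cache'   # must be writable by the web server

    import trac.web.main
    application = trac.web.main.dispatch_request

Because the application object lives for the lifetime of the process, the import cost is paid once rather than on every request.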
Also, as your Trac database grows and you get more people using the site, you'll probably outgrow the default SQLite database. At that point, you should think about migrating the database to PostgreSQL or MySQL, because they'll be able to handle concurrent requests much faster.

We've had the best luck with FastCGI. Another critical factor was to use https only for authentication and http for all other traffic -- I was really surprised how much of a difference that made.

I have noticed that if
select distinct name from wiki
takes more than 5 seconds (for example because there are a million rows in this table - this is a true story, we had a script that filled it), browsing wiki pages becomes very slow and takes over 2*t*n, where t is the execution time of the quoted query (>5s of course) and n is the number of TracWiki links present on the viewed page.
This is because Trac has a (hardcoded) 5s cache expiry for this query, which it uses to decide what colour each wiki link should be. We re-hardcoded the value to 30s (we genuinely need that many pages, so every 30s someone has to wait 6-7s).
This may not be what is causing your problem, but it might be. Good luck speeding up your Trac instance.

Serving the chrome files statically with an Expires header could help too. See the end of this page.


Django: Concurrent access to settings.py

I am not sure whether I have to care about concurrency, but I didn't find any documentation about it.
I have some data stored in my settings.py, like IP addresses, and each user can take one or give one back. So I have read and write operations, and I want only one user to read the file at any given moment.
How could I handle this?
And yes, I want to store the data in settings.py. I also found the django-concurrency module, but I couldn't find anything about this in its documentation.
As e4c5 mentioned, conventionally settings.py is pretty light on logic. The loading mechanism for settings is pretty obscure and I, personally, like to stay away from things that are difficult to understand and interact with :)
You absolutely have to care about concurrency. How are you running your application? It's tricky because in the dev environment you have a simple server that usually handles only a handful of requests at the same time (and a couple of years ago the dev server was single-threaded).
If you're running your application under a forking server, how will you share data between processes? One process won't even see another process's changes to settings.py. I'm not even sure what it would look like with a threading server, but it would probably at least require a source-code audit of your web server to understand the specifics of how requests are handled and how memory is shared.
Using a DB is by far the easiest solution (you could also use an in-memory store such as memcached or Redis). Databases provide concurrency support out of the box, are a lot easier to reason about, and offer primitives for concurrent access to data. And in the case of Redis, which is single-threaded, you won't even have to worry about concurrent accesses to your shared IP addresses.
And yes, I want to store the data in settings.py.
No, you definitely don't want to do that. The settings.py file configures Django and any pluggable apps that you may use with it; it's not intended to be used as a place for dumping data. Data goes into a database.
And don't forget that the settings.py file is usually read only once.
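As a rough illustration of the database route (the model, field, and function names below are hypothetical, not from the question), the IP pool can live in a table and row locking can guard the take/give-back operations; this assumes a backend with real row locks such as PostgreSQL or MySQL:

    # models.py -- hedged sketch; IPAddress and its fields are made-up names
    from django.db import models, transaction

    class IPAddress(models.Model):
        address = models.GenericIPAddressField(unique=True)
        taken_by = models.CharField(max_length=150, blank=True, default="")

    def take_ip(username):
        # select_for_update() locks the chosen row until the transaction commits,
        # so two concurrent requests cannot claim the same address
        with transaction.atomic():
            ip = IPAddress.objects.select_for_update().filter(taken_by="").first()
            if ip is not None:
                ip.taken_by = username
                ip.save(update_fields=["taken_by"])
            return ip

    def give_back_ip(username):
        # a single UPDATE statement is atomic on its own
        IPAddress.objects.filter(taken_by=username).update(taken_by="")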

Google App Engine development server random (?) slowdowns

I'm doing a small web application which might need to eventually scale somewhat, and am curious about Google App Engine. However, I am experiencing a problem with the development server (dev_appserver.py):
At seemingly random times, requests will take 20-30 seconds to complete, even if there is no hard computation or data usage. One request might be really quick, even after changing a script or static file, but the next might be very slow. It seems to occur more consistently if the box has been left without activity for a while, but not always.
CPU and disk activity are low during the period. There is not a lot of data in my application either.
Does anyone know what could cause such random slowdowns? I've Googled and searched here, but need some pointers. I've also tried --clear_datastore and --use_sqlite, but the latter gives an error: DatabaseError('file is encrypted or is not a database',). Looking for the file, it does not seem to exist.
I am on Windows 8, python 2.7 and the most recent version of the App Engine SDK.
Don't worry about it. It (IIRC) keeps the whole DB (datastore) in memory using an "emulation" of the real thing. There are lots of other issues that you won't see when deployed.
I'd suggest that your hard drive is spinning down and the delay you see is it taking a few seconds to wake back up.
If this becomes a problem, develop using the deployed version. It's not so different.
Does this happen in all web browsers? I had issues like this when viewing a local app engine dev site in several browsers at the same time for cross-browser testing. IE would then struggle, with requests taking about as long as you describe.
If this is the issue, I found the problems didn't occur with IETester.
Sorry if it's not related, but I thought this was worth mentioning just in case.

Python Web Backend

I am an experienced Python developer starting to work on a web service backend system. The system feeds data (constantly) from the web to a MySQL database. This data is later displayed by a frontend side (there is no connection between the frontend and the backend). The backend system constantly downloads flight information from the web (some of the data is fetched via APIs, and some by downloading and parsing text / xls files). I already have a script that downloads the data, parses it, and inserts it into the MySQL db - all in a big loop. The frontend side is just a bunch of PHP pages that properly display the data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like the little control you get when using django ORM, I want to be able to run a more complex DB framework)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web, hopefully in some sort of organized task format. This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I used django before to write websites (aka request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a constant data-feeding backend.
I just wanted to run this by you guys to hear your ideas / comments. Any input you have / pointers to documentation and/or other libraries would be greatly greatly appreciated!
If you are about to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but as you are about to rip the models layer out, I do not see much value in using it, and I would take a look at another option (perhaps Pylons or plain old CherryPy would do).
Even more so if the front end will not run queries, but only talk to the API provider.
As for robustness, I am happier starting separate FastCGI processes with supervise and using a more lightweight web server (lighttpd / nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behavior you want: if there is a problem with the source, would you like to just skip the step, or repeat it multiple times once the source is back up?
Periodic tasks might be good for the former, while a cron job that just spawns scraping tasks is better for the latter.
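For the periodic-task option, here is a rough sketch of what the scraping loop could look like as a Celery task driven by a beat schedule; the broker URL, the module name feeder.py, and the two helper functions are hypothetical stand-ins for the existing download/insert script:

    # feeder.py -- hedged sketch, assuming Celery with a beat scheduler
    from datetime import timedelta
    from celery import Celery

    app = Celery("feeder", broker="redis://localhost:6379/0")   # broker URL is an assumption

    app.conf.beat_schedule = {
        "pull-flight-data": {
            "task": "feeder.pull_flight_data",
            "schedule": timedelta(minutes=5),
        },
    }

    def download_and_parse():
        # placeholder for the existing download-and-parse step
        return []

    def insert_into_mysql(rows):
        # placeholder for the existing MySQL insert step
        pass

    @app.task(bind=True, max_retries=3, default_retry_delay=60)
    def pull_flight_data(self):
        try:
            rows = download_and_parse()
            insert_into_mysql(rows)
        except IOError as exc:
            # if the source is down, retry the step later instead of skipping it
            raise self.retry(exc=exc)

Running a worker with an embedded beat scheduler (e.g. celery -A feeder worker -B) then executes the task every five minutes, and the retry policy covers the source-is-down case instead of silently skipping it.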

Is it more efficient to parse external XML or to hit the database?

I was wondering, when dealing with a web service API that returns XML, whether it's better (faster) to call the external service each time and parse the XML (using ElementTree) for display on your site, or to save the records into the database (after parsing the XML once or however many times you need to each day) and make database calls instead for that same information.
First off -- measure. Don't just assume that one is better or worse than the other.
Second, if you really don't want to measure, I'd guess the database is a bit faster (assuming the database is relatively local compared to the web service). Network latency usually is more than parse time unless we're talking a really complex database or really complex XML.
Everyone is being very polite in answering this question: "it depends"... "you should test"... and so forth.
True, the question does not go into great detail about the application and network topologies involved, but if the question is even being asked, then it's likely a) the DB is "local" to the application (on the same subnet, or the same machine, or in memory), and b) the webservice is not. After all, the OP uses the phrases "external service" and "display on your own site." The phrase "parsing it once or however many times you need to each day" also suggests a set of data that doesn't exactly change every second.
The classic SOA myth is that the network is always available; going a step further, I'd say it's a myth that the network is always available with low latency. Unless your own internal systems are crap, sending an HTTP query across the Internet will always be slower than a query to a local DB or DB cluster. There are any number of reasons for this: number of hops to the remote server, outage or degradation issues that you can't control on the remote end, and the internal processing time for the remote web service application to analyze your request, hit its own persistence backend (aka DB), and return a result.
Fire up your app. Run some latency and response-time measurements against your DB. Now do the same against the remote web service. Unless your DB is also across the Internet, you'll notice a huge difference.
It's not at all hard for a competent technologist to scale a DB, or to take load off the DB entirely by caching with memcached and similar approaches; the latency between servers sitting near each other in the datacentre is monumentally less than between machines over the Internet (and more secure, to boot). Even if achieving this scale requires some thought, it's under your control, unlike a remote web service whose scaling and latency are totally opaque to you. I, for one, would not be too happy with the idea that the availability and responsiveness of my site depend on someone else entirely.
Finally, what happens if the remote web service is unavailable? Imagine a world where every request to your site involves a request over the Internet to some other site. What happens if that other site is unavailable? Do your users watch a spinning cursor of death for several hours? Do they enjoy an Error 500 while your site borks on this unexpected external dependency?
If you find yourself adopting an architecture whose fundamental features depend on a remote Internet call for every request, think very carefully about your application before deciding if you can live with the consequences.
Consuming the web service is more efficient because there are a lot more things you can do to scale your web services and web server (via caching, etc.). By consuming the middle layer, you also have the option to change the returned data format (e.g. you can decide to use JSON rather than XML). Scaling a database is much harder (it involves replication, etc.), so in general, reduce hits on the DB if you can.
There is not enough information to be able to say for sure in the general case. Why don't you do some tests and find out? Since it sounds like you are using Python, you will probably want to use the timeit module (a rough sketch follows the list below).
Some things that could affect the result:
Performance of the web service you are using
Reliability of the web service you are using
Distance between servers
Amount of data being returned
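A minimal measuring sketch with timeit, assuming Python 3; the URL, table name, and local SQLite file are placeholders standing in for the real service and database:

    # hedged benchmark sketch -- every endpoint/table name here is a placeholder
    import sqlite3
    import timeit
    import urllib.request
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect("local.db")

    def fetch_and_parse():
        # hit the external service and parse the XML response
        with urllib.request.urlopen("http://example.com/api/records.xml") as resp:
            ET.fromstring(resp.read())

    def query_db():
        # pull the equivalent rows from the local database
        conn.execute("SELECT * FROM records").fetchall()

    print("web service:", timeit.timeit(fetch_and_parse, number=10))
    print("database:   ", timeit.timeit(query_db, number=10))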
I would guess that if it is cacheable, a cached version of the data will be faster, but that does not necessarily mean using a local RDBMS; it might mean something like memcached or an in-memory cache in your application.
It depends - who is calling the web service? Is the web service called every time the user hits the page? If that's the case, I'd recommend introducing a caching layer of some sort - many web service APIs throttle the number of hits you can make per hour.
Whether you choose to parse the cached XML on the fly or call the data from a database probably won't matter (unless we are talking enterprise scaling here). Personally, I'd much rather make a simple SQL call than write a DOM Parser (which is much more prone to exceptional scenarios).
It depends from case to case; you'll have to measure (or at least make an educated guess).
You'll have to consider several things.
Web service:
it might hit a database itself
it can be cached
it will introduce network latency and might be unreliable
or it could be on a local network and faster than even accessing a local disk
DB:
might be slow, since it needs to access disk (although databases have internal caches, those are usually not targeted)
should be reliable
Technology itself doesn't mean much in terms of speed - in one case the database parses SQL, in the other an XML parser parses XML, and the database is usually accessed via a socket as well, so you have both parsing and network in either case.
Caching data in your application if applicable is probably a good idea.
As a few people have said, it depends, and you should test it.
Often external services are slow, and caching them locally (in a database or in memory, e.g. with memcached) is faster. But perhaps not.
Fortunately, it's cheap and easy to test.
Definitely test. As a rule of thumb, XML is good for communicating between apps, but once you have the data inside your app, everything should go into a database table. This may not apply in all cases, but 95% of the time it has for me. Any time I tried to store data some other way (e.g. XML in a content management system) I ended up wishing I had just used good old stored procedures and SQL Server.
It sounds like you essentially want to cache results, and are wondering if it's worth it. But if so, I would NOT use a database (I assume you are thinking of a relational DB): RDBMSs are not good for caching, even though many use them that way. You don't need persistence or ACID.
If the choice were between Oracle/MySQL and the external web service, I would start with just using the service.
Instead, consider a real caching system, local or not (memcached, a simple in-memory cache, etc.).
Or if you must use a DB, use a key/value store; BDB works well. Store the response message in its serialized form (XML), try to fetch it from the cache, and if it isn't there, fetch it from the service and parse it. Or if there's a convenient and more compact serialization, store and fetch that.
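A rough sketch of that cache-aside pattern, assuming a running memcached and the python-memcached client (the URL, cache key, and TTL are illustrative, not from the question):

    # hedged sketch -- URL, key and TTL are placeholder values
    import urllib.request
    import xml.etree.ElementTree as ET
    import memcache

    cache = memcache.Client(["127.0.0.1:11211"])

    def get_records(url="http://example.com/api/records.xml", ttl=300):
        raw = cache.get("records_xml")
        if raw is None:
            # cache miss: hit the remote service once, keep the serialized XML around
            with urllib.request.urlopen(url) as resp:
                raw = resp.read()
            cache.set("records_xml", raw, time=ttl)
        # parse on the way out, whether the bytes came from the cache or the service
        return ET.fromstring(raw)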

How to Disable Django / mod_WSGI Page Caching

I have Django running in Apache via mod_wsgi. I believe Django is caching my pages server-side, which is causing some of the functionality to not work correctly.
I have a countdown timer that works by getting the current server time, determining the remaining countdown time, and outputting that number to the HTML template. A javascript countdown timer then takes over and runs the countdown for the user.
The problem arises when the user refreshes the page, or navigates to a different page with the countdown timer. The timer appears to jump around to different times sporadically, usually going back to the same time over and over again on each refresh.
Using HTTPFox, the page is not being loaded from my browser cache, so it looks like either Django or Apache is caching the page. Is there any way to disable this functionality? I'm not going to have enough traffic to worry about caching the script output. Or am I completely wrong about why this is happening?
[Edit] From the posts below, it looks like caching is disabled in Django, which means it must be happening elsewhere, perhaps in Apache?
[Edit] I have a more thorough description of what is happening: For the first 7 (or so) requests made to the server, the pages are rendered by the script and returned, although each of those 7 pages seems to be cached as it shows up later. On the 8th request, the server serves up the first page. On the 9th request, it serves up the second page, and so on in a cycle. This lasts until I restart apache, when the process starts over again.
[Edit] I have configured mod_wsgi to run only one process at a time, which causes the timer to reset to the same value in every case. Interestingly though, there's another component on my page that displays a random image on each request, using order_by('?'), and that does refresh with a different image each time, which would indicate the caching is happening in Django and not in Apache.
[Edit] In light of the previous edit, I went back and reviewed the relevant views.py file, finding that the countdown start variable was being set globally in the module, outside of the view functions. Moving that setting inside the view functions resolved the problem. So it turned out not to be a caching issue after all. Thanks everyone for your help on this.
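To illustrate that failure mode (the names below are made up, not the asker's actual code): module-level code in views.py runs once per worker process, at import time, so each Apache/mod_wsgi process ends up holding its own frozen value, and responses jump between those values depending on which process answers.

    # views.py -- hedged sketch of the bug pattern described above
    from datetime import datetime
    from django.shortcuts import render

    COUNTDOWN_START = datetime.now()          # evaluated once per process, at import

    def timer(request):
        # every request served by the same process sees the same stale start time
        elapsed = datetime.now() - COUNTDOWN_START
        return render(request, "timer.html", {"elapsed": elapsed})

    def load_countdown_start():
        # placeholder for however the real start time is stored (DB, cache, etc.)
        return datetime(2024, 1, 1)

    def timer_fixed(request):
        # recompute (or reload) inside the view, so it happens on every request
        start = load_countdown_start()
        elapsed = datetime.now() - start
        return render(request, "timer.html", {"elapsed": elapsed})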
From my experience with mod_wsgi in Apache, it is highly unlikely that they are causing caching. A couple of things to try:
It is possible that you have some proxy server between your computer and the web server that is appropriately or inappropriately caching pages. Sometimes ISPs run proxy servers to reduce bandwidth outside their network. Can you please provide the HTTP headers for a page that is getting cached (Firebug can give these to you). Headers that I would specifically be interested in include Cache-Control, Expires, Last-Modified, and ETag.
Can you post your MIDDLEWARE_CLASSES from your settings.py file? It's possible that you have a middleware class that performs caching for you.
Can you grep your code for the following items: "load cache", "django.core.cache", and "cache_page"? A grep -R "search" * will work.
Does the settings.py (or anything it imports like "from localsettings import *") include CACHE_BACKEND?
What happens when you restart Apache (e.g. sudo apachectl restart)? If a restart clears the issue, then it might be Apache doing the caching (it is possible that this could also clear out a locmem Django cache backend).
Did you specifically setup Django caching? From the docs it seems you would clearly know if Django was caching as it requires work beforehand to get it working. Specifically, you need to define where the cached files are saved.
http://docs.djangoproject.com/en/dev/topics/cache/
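For reference, enabling Django's cache is an explicit step in settings.py. A hedged example of roughly what that configuration looks like (the path is a placeholder; Django of that era used a single CACHE_BACKEND string, while newer versions use a CACHES dict), plus the middleware needed for whole-site caching:

    # settings.py -- hedged example of explicit Django cache configuration
    CACHE_BACKEND = 'file:///var/tmp/django_cache'   # placeholder path: where cached pages are saved

    # whole-site caching additionally requires the cache middleware, in this order:
    MIDDLEWARE_CLASSES = (
        'django.middleware.cache.UpdateCacheMiddleware',
        'django.middleware.common.CommonMiddleware',
        'django.middleware.cache.FetchFromCacheMiddleware',
    )

If nothing like this appears in your settings, Django itself is almost certainly not doing the caching.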
Are you using a multiprocess configuration for Apache/mod_wsgi? If you are, that will account for why different responses can have different values for the timer: the point at which the timer is initialised will likely be different for each process handling requests. That's why it can jump around.
Have a read of:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
Work out in what mode or configuration you are running Apache/mod_wsgi and perhaps post what that configuration is. Without knowing, there are too many unknowns.
I just came across this:
Support for Automatic Reloading: To help deployment tools you can activate support for automatic reloading. Whenever something changes the .wsgi file, mod_wsgi will reload all the daemon processes for us. For that, just add the following directive to your Directory section:
WSGIScriptReloading On
