Consuming RSS in Django ( / Python) - python

For a site I'm working on I would like to import a lot of RSS feeds using Django. Since I need the content of them fast I will need to cache them locally (either in the database or in some other way)
Is there a standard app to do RSS consumption in Django, or is there a standard way to do this in Python?
Of course I could implement it myself, but I'd rather reuse a good piece of code (since there's a lot of stuff to consider, like what to do when an item updates, how long to wait before checking for updates, etc, and I'd rather reuse someone elses thinking about this).
(I did google django and rss, but everything that seems to popup is feed generation; surely there must be other sites out there using Django and consuming RSS?)

Check outhttp://feedparser.org/docs/ http://code.google.com/p/feedparser/
One of the best Python libraries for parsing RSS and Atom Feeds; although it seems like you want to do a bit more (caching, auto-refresh etc.)

Related

Backtrader, using mongodb as datafeed instead CSV

I'm quite new to backtrader and since I've started I couldn't stop wondering why there's no database support for the datafeed. I've found a page on the official website where's described how to implement a custom datafeed. The implementation should be pretty easy, but on github (or more in general on the web) I couldn't fine a single one implementation of a feed with MongoDb. I understand that CSV are easier to manage and so on, but in some cases could require a lot of RAM for storing data all at once in memory. On the other hand, having a db can be "RAM friendly" but will take longer during the backtesting process even if the DB is a documental one. Does anyone have any experience with both of these two approaches? And if yes, there's some code I can take a look at?
Thanks!

Architecture question for app which serves data from SQLite in memory

I am building a dynamic <div> into my (very low-traffic) website, which I want to display only some data from a memory-based SQLite database running within a Python program on the server. Being a novice in web tech, I can't decide which technologies and principles should go into this project.
Right now, the only decided-upon technologies are Python and Apache. Python, at the very least, needs to be constantly running to fetch data from the external source, format it, and enter it into the database. Problem #1 is where this database should reside. Ideally, I would like it in RAM, since the database will update both often and around the clock. Then the question becomes, "How does one retrieve the data?". Note: the query will never change; I want the web page to receive the same JSON structure only with up-to-date values. From here, I see two options with the first, again, being ideal:
1) Perform some simple "hey someone wants the stuff" interaction with the Python program (remember that this program will be running) whenever someone loads the page, which is responded to with the JSON data. This should be fairly easy with WebSockets, but I understand they have fallen out of favor.
2) Have the Python program periodically create/update an HTML file, which the page loads with jQuery. I could do this with my current knowledge, but I find it inelegant and it would be accepting several compromises such as increased disk read/writes and possibly out-of-date data unless read/writes are increased even further, essentially rendering the benefits from a memory database useless.
So, is my ideal case feasible? Can I implement an API into my Python program to listen for requests? Would the request be made with jQuery? Node.js? PHP? Maybe even with Apache? Do I bypass Python by manipulating the VFS? The techs available feel overwhelming and most online resources only detail generating HTML with Python (Django, Flask, etc.).
Thank you!
WSGI is the tech I was looking for!

rich web client vs thin web client

I have one design decision to make.
In my web(ajax) application we need to decide where should we put user interface logic ?
Should It be completely loaded via javascript ( pure single page ) . and Only data comes and go.
or
Should server send some format (XML) which translated via javascript to dynamically create rich user interface. ( semi-ajax ). so some data and ui comes and go.
Which option is better ? ( speed, ease of development, platform independence )
Thanks.
IMO, it depends mostly on what kind of application it is. Is it used more like a desktop application? Then single page might work well. Having an Ajax-client to a large extent has the same drawbacks as using frames but that isn't a big problem in desktop-style applications.
Your second option works better if it more like a traditional website with many different pages with dfferent content, Then you want to have separate URL's to that different content. But then making an Ajax application might not give you all that much in the first place. Having some bits of Ajax on the page might be useful, but loading all the data with Ajax might not add anything to your app, except making it slower.
I faced similar dilemma few months back. As Lennart (above) says it makes sense to go for pyjamas or similar library if your app is more desktopish.
Further one of the biggest advantage of pyjamas provide is logically well separated backend and frontend code. IMO that is very important.
If your app is not like a desktop app (like ours was), then multipage offers more advantages, such as single change wont breaks entire app, easier to maintain etc. You might want to consider can have your app server serve json and other web server serves static content and js. Js would request json app server for data. That way we managed to keep out frontend and backend separate. Further we chose mootools as js lib over pyjamas. Ofcourse it is upto your taste and need of application. We did use python template server side templates but at compile time not at runtime like usual approach. This needed to change our thinking a little but the offered many advantages.
I end up telling you my story but I thought it's relevant and hope that helps.
The biggest influence is whether you are concerned about initial page load time. If you don't mind having all the UI there at page load, your app can be more responsive by just shuttling data instead of UI. If you want faster load and don't mind larger AJAX requests, sending some UI markup isn't bad. If you have the server power to pre-render UI with data and send the fully-ready marked-up data to the user, their browser will perform more quickly, and initial page-load should be fast.
Which course you choose should depend on the task at hand. Not all requests need be handled the same way.
Which option is better ? ( speed, ease of development, platform independence )
Platform independence, if you mean cross-browser compatibility, is a HUGE reason to use pyjamas because the python code includes a sane override infrastructure which handles everything for you. No more JS compatibility classes.
Anyway Pyjamas is all about loading the client app and then using json-rpc for the data only. That's because it's faster (once the app loaded up), easier to separate server and client, easier to maintain since all the UI code is in widgets in one place.
I've seen stuff like DokuWiki which use a php script to serve up javascript and my first thought was "WHY?" but it works pretty well I guess. It probably makes sense if you mostly have static pages with the occasional bit of JS for decoration.

Python templates for huge HTML/XML

Recently I needed to generate a huge HTML page containing a report with several thousand row table. And, obviously, I did not want to build the whole HTML (or the underlying tree) in memory. As result, I built the page with the old good string interpolation, but I do not like the solution.
Thus, I wonder whether there are Python templating engines that can yield resulting page content by parts.
UPD 1: I am not interested in listing all available frameworks and templating engines.
I am interested in templating solutions that I can use separately from any framework and which can yield content by portions instead of building the whole result in memory.
I understand the usability enhancements from partial content loading with client scripting, but that is out of the scope of my current question. Say, I want to generate a huge HTML/XML and stream it into a local file.
Most popular template engines have a way to generate or write rendered result to file objects with chunks. For example:
Template.generate() in Jinja2
Template.render_context() in Mako
Stream.serialize() in Genshi
It'd be more user-friendly (assuming they have javascript enabled) to build the table via javascript by using e.g. a jQuery plugin which allows automatical loading of contents as soon as you scroll down. Then only few rows are loaded initially and when the user scrolls down more rows are loaded on demand.
If that's not a solution, you could use three templates: one for everything before the rows, one for everything after the rows and a third one for the rows.
Then you first send the before-rows template, then generate the rows and send them immediately, then the after-rows template. Then you will have only one block/row in memory instead of the whole table.
There is no problem with building something like this in memory. Several thousand rows is by no means big.
For your templating needs you can use any of the:
rst http://docutils.sourceforge.net/rst.html (just needs core docutils)
markdown http://daringfireball.net/projects/markdown/ (python lib)
sphinx https://www.sphinx-doc.org (python based)
json http://www.json.org/ (python lib)
There are some tools that allow generation of HTML from these markup languages.
You don't need a streaming templating engine - I do this all the time, and long before you run into anything vaguely heavy server-side, the browser will start to choke. Rendering a 10000 row table will peg the CPU for several seconds in pretty much any browser; scrolling it will be bothersomely choppy in chrome, and the browser mem usage will rise regardless of browser.
What you can do (and I've previously implemented, even though in retrospect it turns out not to be necessary) is use client-side xslt. Printing the xslt processing instruction and the opening and closing tag using strings is easy and fairly safe; then you can stream each individual row as a standalone xml element using whatever xml writer technique you prefer.
However - you really don't need this, and likely never will - if ever your html generator gets too slow, the browser will be an order of magnitude more problematic.
So, unless you benchmarked this and have determined you really have a problem, don't waste your time. If you do have a problem, you can solve it without fundamentally changing the method - in memory generation can work just fine.
Are you using a web framework for this?
http://www.pylonshq.com includes compatibility with several templating engines.
http://www.djangoproject.com/ Django has its own templating language.
I think an answer that included lazy loading of the rows with javascript would work for web view, but I presume the report is going to need to be printed, in which case you'll have to build the whole thing at some point, right?

Pros and Cons of different approaches to web programming in Python

I'd like to do some server-side scripting using Python. But I'm kind of lost with the number of ways to do that.
It starts with the do-it-yourself CGI approach and it seems to end with some pretty robust frameworks that would basically do all the job themselves. And a huge lot of stuff in between, like web.py, Pyroxide and Django.
What are the pros and cons of the frameworks or approaches that you've worked on?
What trade-offs are there?
For what kind of projects they do well and for what they don't?
Edit: I haven't got much experience with web programing yet.
I would like to avoid the basic and tedious things like parsing the URL for parameters, etc.
On the other hand, while the video of blog created in 15 minutes with Ruby on Rails left me impressed, I realized that there were hundreds of things hidden from me - which is cool if you need to write a working webapp in no time, but not that great for really understanding the magic - and that's what I seek now.
CGI is great for low-traffic websites, but it has some performance problems for anything else. This is because every time a request comes in, the server starts the CGI application in its own process. This is bad for two reasons: 1) Starting and stopping a process can take time and 2) you can't cache anything in memory. You can go with FastCGI, but I would argue that you'd be better off just writing a straight WSGI app if you're going to go that route (the way WSGI works really isn't a whole heck of a lot different from CGI).
Other than that, your choices are for the most part how much you want the framework to do. You can go with an all singing, all dancing framework like Django or Pylons. Or you can go with a mix-and-match approach (use something like CherryPy for the HTTP stuff, SQLAlchemy for the database stuff, paste for deployment, etc). I should also point out that most frameworks will also let you switch different components out for others, so these two approaches aren't necessarily mutually exclusive.
Personally, I dislike frameworks that do too much magic for me and prefer the mix-and-match technique, but I've been told that I'm also completely insane. :)
How much web programming experience do you have? If you're a beginner, I say go with Django. If you're more experienced, I say to play around with the different approaches and techniques until you find the right one.
The simplest web program is a CGI script, which is basically just a program whose standard output is redirected to the web browser making the request. In this approach, every page has its own executable file, which must be loaded and parsed on every request. This makes it really simple to get something up and running, but scales badly both in terms of performance and organization. So when I need a very dynamic page very quickly that won't grow into a larger system, I use a CGI script.
One step up from this is embedding your Python code in your HTML code, such as with PSP. I don't think many people use this nowadays, since modern template systems have made this pretty obsolete. I worked with PSP for awhile and found that it had basically the same organizational limits as CGI scripts (every page has its own file) plus some whitespace-related annoyances from trying to mix whitespace-ignorant HTML with whitespace-sensitive Python.
The next step up is very simple web frameworks such as web.py, which I've also used. Like CGI scripts, it's very simple to get something up and running, and you don't need any complex configuration or automatically generated code. Your own code will be pretty simple to understand, so you can see what's happening. However, it's not as feature-rich as other web frameworks; last time I used it, there was no session tracking, so I had to roll my own. It also has "too much magic behavior" to quote Guido ("upvars(), bah").
Finally, you have feature-rich web frameworks such as Django. These will require a bit of work to get simple Hello World programs working, but every major one has a great, well-written tutorial (especially Django) to walk you through it. I highly recommend using one of these web frameworks for any real project because of the convenience and features and documentation, etc.
Ultimately you'll have to decide what you prefer. For example, frameworks all use template languages (special code/tags) to generate HTML files. Some of them such as Cheetah templates let you write arbitrary Python code so that you can do anything in a template. Others such as Django templates are more restrictive and force you to separate your presentation code from your program logic. It's all about what you personally prefer.
Another example is URL handling; some frameworks such as Django have you define the URLs in your application through regular expressions. Others such as CherryPy automatically map your functions to urls by your function names. Again, this is a personal preference.
I personally use a mix of web frameworks by using CherryPy for my web server stuff (form parameters, session handling, url mapping, etc) and Django for my object-relational mapping and templates. My recommendation is to start with a high level web framework, work your way through its tutorial, then start on a small personal project. I've done this with all of the technologies I've mentioned and it's been really beneficial. Eventually you'll get a feel for what you prefer and become a better web programmer (and a better programmer in general) in the process.
If you decide to go with a framework that is WSGI-based (for instance TurboGears), I would recommend you go through the excellent article Another Do-It-Yourself Framework by Ian Bicking.
In the article, he builds a simple web application framework from scratch.
Also, check out the video Creating a web framework with WSGI by Kevin Dangoor. Dangoor is the founder of the TurboGears project.
If you want to go big, choose Django and you are set. But if you want just to learn, roll your own framework using already mentioned WebOb - this can be really fun and I am sure you'll learn much more (plus you can use components you like: template system, url dispatcher, database layer, sessions, et caetera).
In last 2 years I built few large sites using Django and all I can say, Django will fill 80% of your needs in 20% of time. Remaining 20% of work will take 80% of the time, no matter which framework you'd use.
It's always worth doing something the hard way - once - as a learning exercise. Once you understand how it works, pick a framework that suits your application, and use that. You don't need to reinvent the wheel once you understand angular velocity. :-)
It's also worth making sure that you have a fairly robust understanding of the programming language behind the framework before you jump in -- trying to learn both Django and Python at the same time (or Ruby and Rails, or X and Y), can lead to even more confusion. Write some code in the language first, then add the framework.
We learn to develop, not by using tools, but by solving problems. Run into a few walls, climb over, and find some higher walls!
If you've never done any CGI programming before I think it would be worth doing one project - perhaps just a sample play site just for yourself - using the DIY approach. You'll learn a lot more about how all the various parts work than you would by using a framework. This will help in you design and debug and so on all your future web applications however you write them.
Personally I now use Django. The real benefit is very fast application deployment. The object relational mapping gets things moving fast and the template library is a joy to use. Also the admin interface gives you basic CRUD screens for all your objects so you don't need to write any of the "boring" stuff.
The downside of using an ORM based solution is that if you do want to handcraft some SQL, say for performance reasons, it much harder than it would have been otherwise, although still very possible.
If you are using Python you should not start with CGI, instead start with WSGI (and you can use wsgiref.handlers.CGIHandler to run your WSGI script as a CGI script. The result is something that is basically as low-level as CGI (which might be useful in an educational sense, but will also be somewhat annoying), but without having to write to an entirely outdated interface (and binding your application to a single process model).
If you want a less annoying, but similarly low-level interface, using WebOb would provide that. You would be implementing all the logic, and there will be few dark corners that you won't understand, but you won't have to spend time figuring out how to parse HTTP dates (they are weird!) or parse POST bodies. I write applications this way (without any other framework) and it is entirely workable. As a beginner, I'd advise this if you were interested in understanding what frameworks do, because it is inevitable you will be writing your own mini framework. OTOH, a real framework will probably teach you good practices of application design and structure. To be a really good web programmer, I believe you need to try both seriously; you should understand everything a framework does and not be afraid of its internals, but you should also spend time in a thoughtful environment someone else designed (i.e., an existing framework) and understand how that structure helps you.
OK, rails is actually pretty good, but there is just a little bit too much magic going on in there (from the Ruby world I would much prefer merb to rails). I personally use Pylons, and am pretty darn happy. I'd say (compared to django), that pylons allows you to interchange ints internal parts easier than django does. The downside is that you will have to write more stuff all by youself (like the basic CRUD).
Pros of using a framework:
get stuff done quickly (and I mean lighning fast once you know the framework)
everything is compying to standards (which is probably not that easy to achieve when rolling your own)
easier to get something working (lots of tutorials) without reading gazillion articles and docs
Cons:
you learn less
harder to replace parts (not that much of an issue in pylons, more so with django)
harder to tweak some low-level stuff (like the above mentioned SQLs)
From that you can probably devise what they are good for :-) Since you get all the code it is possible to tweak it to fit even the most bizzare situations (pylons supposedly work on the Google app engine now...).
For smaller projects, rolling your own is fairly easy. Especially as you can simply import a templating engine like Genshi and get alot happening quite quickly and easily. Sometimes it's just quicker to use a screwdriver than to go looking for the power drill.
Full blown frameworks provide alot more power, but do have to be installed and setup first before you can leverage that power. For larger projects, this is a negligible concern, but for smaller projects this might wind up taking most of your time - especially if the framework is unfamiliar.

Categories