I have the problem that for a project I need to work with a framework (Python), that has a poor documentation. I know what it does since it is the back end of a running application. I also know that no framework is good if the documentation is bad and that I should prob. code it myself. But, I have a time constraint. Therefore my question is: Is there a cooking recipe on how to understand a poorly documented framework?
What I tried until now is checking some functions and identify the organizational units in the framework but I am lacking a system to do it more effectively.
If I were you, with time constaraints, and bound to use a specific framework. I'll go in the following manner:
List down the use cases I desire to implement using the framework
Identify the APIs provided by the framework that helps me implement the use cases
Prototype the usecases based on the available documentation and reading
The prototyping is not implementing the entire use case, but to identify the building blocks around the case and implementing them. e.g., If my usecase is to fetch the Students, along with their courses, and if I were using Hibernate to implement, I would prototype the database accesss, validating how easily am I able to access the database using Hibernate, or how easily I am able to get the relational data by means of joining/aggregation etc.
The prototyping will help me figure out the possible limitations/bugs in the framework. If the limitations are more of show-stoppers, I will implement the supporting APIs myself; or I can take a call to scrap out the entire framework and write one for myself; whichever makes more sense.
You may also use python debugging library: pdb. After importing it with import pdb you may set traces in the body of functions and classes pdb.set_trace(). Then it will stop the execution of the program in the line and you may look at existing variables and processes.
I'm trying to pull similar data in from several third party APIs, all of which have slightly varying schemas, and convert them all into a unified schema to store in a DB and expose through a unified API. This is actually a re-write of a system that already does this, minus storing in the DB, but which is hard to test and not very elegant. I figured I'd turn to the community for some wisdom.
Here are some thoughts/what I'd like to achieve.
An easy way to specify schema mappings from the external APIs schema to the internal schema. I realize that some nuances in the data might be lost by converting to a unified schema, but that's life. This schema mapping might not be easy to do and perhaps overkill from the academic papers I've found on the matter.
An alternative solution would be to allow third parties to develop the interfaces to the external APIs. The code quality of these third parties may or may not be known, but could be established via thorough tests.
Therefore the system should be easy to test, I'm thinking by mocking the external API calls to have reproducible data and ensure that parsing and conversion is being done correctly.
One of the external API interfaces crashing should not bring down the rest of them.
Some sort of schema validation/way to detect if the external API schemas have changed without warning
This will end up being integrated into a Django project, so it could be written as a Django app, which would likely make unit and integration testing easier. On the other hand, I would like to keep it as decoupled as possible from Django. Although the API interfaces would have to know what format to convert to, could this be specified at runtime?
Am I missing anything in the wishlist? Unrealistic? Headed down the wrong path? Would love to get some feedback.
I'm not sure if there are libraries/OS project which already do some of this. The less wheels I have to reinvent the better. Would any part of this be valuable as an OS project?
In the previous version I spawned a bunch of threads that would handle individual requests. Although I've never used it, I've been told I should look at gevent as a way to handle this.
For your second bullet point you should check out Temboo. Temboo normalizes access to over 100 APIs, meaning that you can talk to them all using a common syntax in the language of your choice. In this case you would use the Temboo Python SDK - available here.
(Full disclosure: I work at Temboo)
I am working on a big project that involves a lot of web based and AI work. I am extremely comfortable with Python, though my only concern is with concurrent programming and scaling this project to make it work on clusters. Thus, Clojure for AI and support for Java function calls and bring about concurrent programming.
Is this a good idea to do all the web-based api work with Python and let Clojure take care of most of the concurrent AI work?
Edit:
Let me explain the interaction in detail. Python would be doing most of the dirty work (scraping, image processing, improving the database and all that.) Clojure, if possible, would either deal with the data base or get the data from Python. I except something CPython sort of linking with Python and Clojure.
Edit2:
Might be a foolish question to ask, but this being a rather long term project which will evolve quite a bit and go under several iterations, is Clojure a language here to stay? Is it portable enough?
I built an embarrassingly parallel number-crunching application with a backend in Clojure (on an arbitrary number of machines) and a frontend in Ruby on Rails. I don't particularly like RoR, but this was a zero-budget project at the time and we had a Rails programmer at hand who was willing to work for free.
The Clojure part consisted of (roughly) a controller, number crunching nodes, and a server implementing a JSON-over-HTTP API which was the interface to the Rails web app. The Clojure nodes used RabbitMQ to talk to each other. Because we defined clear APIs between different parts of the application, it was easy to later rewrite the frontend in Clojure (because that better suited our needs).
If you're working on a distributed project with a long life span and continuous development effort, it could make sense to design the application as a number of separate modules that communicate through well defined APIs (json, bson, ... over AMQP, HTTP, ... or a database). That means you can get started quickly using a language you're comfortable with, and rewrite parts in another language at a later stage if necessary.
I can't see a big problem with using Python for the web apps and Clojure for the concurrent data crunching / back end code. I assume you would use something like JSON over http for the communications between the two, which should work fine.
I'd personally use Clojure for both (using e.g. the excellent Noir as a web framework and Korma for the database stuff.), but if as you say your experience is mostly in Python then it probably makes sense to stick with Python from a productivity perspective (in the short term at least).
To answer the questions regarding the future of Clojure:
It's definitely here to stay. It has a very active community and is probably one of the "hottest" JVM languages right now (alongside Scala and Groovy). It seems to be doing particularly well in the big data / analytics space
Clojure has a particular advantage in terms of library support, since it can easily make use of any Java libraries. This is a huge advantage for a new langauge from a practical perspective, since it immediately solves what is usually one of the biggest issues in getting a new language ecosystem off the ground.
Clojure is a new language that is still undergoing quite a lot of development. If you choose to use Clojure, you should be aware that you will need to put in some effort to stay current and keep your code up to date with the latest Clojure versions. I've personally not found this to be an issue, but it may come as a surprise to people used to more "stable" languages like Java.
Clojure is very portable - it will basically run anywhere that you can get a reasonably modern JVM, which is pretty much everywhere nowadays.
If you can build both sides to use Data and Pure(ish) Functions to communicate then this should work very well. wrapping your clojure functions in web services that take and retrun JSON (or more preferably clojure forms) should make them accessible to your Python based front end will no extra fuss.
Of course it's more fun to write it in Clojure all the way through. ;)
If this is a long term project than building clean Functional (as in takes and returns values) interfaces that exchange Data becomes even more important because it will give you the ability to evolve the components independently.
In such scenarios I personally like to start in the below sequence.
Divide the system into subsystems with "very clear" definition of what each system does and that definition should follow the principle of "do one thing and keep it simple". At this stage don't think about language etc.
Choose the platform (not languages) on which these subsystems will run. Ex: JVM, Python VM, NodeJs, CLR(Mono), other VMs. Try to select few platforms or if possible just one as that does make life easier down the road in terms of complexity.
Choose the language to program those platforms. This is very subjective but for JVM you can go with Clojure or Jython (in case you like Dynamic languages as I do).
As far as Clojure future is concerned, this is a language developed by "community of amazing programmers" and not by some corporation. I hope that clears your doubt about the "long term" concern of Clojure. By the way Clojure is LISP, so you can modify the language the way you want it and fix things yourself even if someone don't do that for you.
I need a scripting language for describing very complicated workflows.
These workflows need to be paused
whenever user input is required, and
resumed after it is given (could be
months later). Seems like serializable continuations from Stackless would be a good fit.
Users also need to be able to edit
the workflows themselves. I'm not sure how serialized continuations would handle underlying code changes. I think I might need to save the Git version hash along with the continuation, and only 'upgrade' a continuation at checkpoints where no state is needed.
I prefer the Python syntax since
readability is a very high priority,
and dynamic features are key. I'm open to suggestions, though.
Eventually I'll probably write a visual flow-chart editor that maniupulates the underlying code.
I've looked in depth at Stackless and PyPy. Stackless doesn't seem to offer any promises of sandboxing, while PyPy seems to offer both stackless and sandboxed, but I can't find any mention of having both at the same time.
Any solutions? If there's an expert out there who can get me going with a good solution, I've got a paypal account and I'm willing to use it.
Your serialization requirement will be difficult in most languages with native co-routine libraries. You might need to implement co-routines in another way to allow for object graph serialization.
Lua has the Pluto library, which CAN persist threads (co-routines): http://lua-users.org/wiki/PlutoLibrary
As far as "safe" execution in a sandbox, Lua is a first choice. You can have multiple lua states in a single application with zero issues, and it supports co-routines in the language. It also has the benefit of being quite fast in its VM form, and with luajit is competitive with Java JIT in many cases.
I'd like to do some server-side scripting using Python. But I'm kind of lost with the number of ways to do that.
It starts with the do-it-yourself CGI approach and it seems to end with some pretty robust frameworks that would basically do all the job themselves. And a huge lot of stuff in between, like web.py, Pyroxide and Django.
What are the pros and cons of the frameworks or approaches that you've worked on?
What trade-offs are there?
For what kind of projects they do well and for what they don't?
Edit: I haven't got much experience with web programing yet.
I would like to avoid the basic and tedious things like parsing the URL for parameters, etc.
On the other hand, while the video of blog created in 15 minutes with Ruby on Rails left me impressed, I realized that there were hundreds of things hidden from me - which is cool if you need to write a working webapp in no time, but not that great for really understanding the magic - and that's what I seek now.
CGI is great for low-traffic websites, but it has some performance problems for anything else. This is because every time a request comes in, the server starts the CGI application in its own process. This is bad for two reasons: 1) Starting and stopping a process can take time and 2) you can't cache anything in memory. You can go with FastCGI, but I would argue that you'd be better off just writing a straight WSGI app if you're going to go that route (the way WSGI works really isn't a whole heck of a lot different from CGI).
Other than that, your choices are for the most part how much you want the framework to do. You can go with an all singing, all dancing framework like Django or Pylons. Or you can go with a mix-and-match approach (use something like CherryPy for the HTTP stuff, SQLAlchemy for the database stuff, paste for deployment, etc). I should also point out that most frameworks will also let you switch different components out for others, so these two approaches aren't necessarily mutually exclusive.
Personally, I dislike frameworks that do too much magic for me and prefer the mix-and-match technique, but I've been told that I'm also completely insane. :)
How much web programming experience do you have? If you're a beginner, I say go with Django. If you're more experienced, I say to play around with the different approaches and techniques until you find the right one.
The simplest web program is a CGI script, which is basically just a program whose standard output is redirected to the web browser making the request. In this approach, every page has its own executable file, which must be loaded and parsed on every request. This makes it really simple to get something up and running, but scales badly both in terms of performance and organization. So when I need a very dynamic page very quickly that won't grow into a larger system, I use a CGI script.
One step up from this is embedding your Python code in your HTML code, such as with PSP. I don't think many people use this nowadays, since modern template systems have made this pretty obsolete. I worked with PSP for awhile and found that it had basically the same organizational limits as CGI scripts (every page has its own file) plus some whitespace-related annoyances from trying to mix whitespace-ignorant HTML with whitespace-sensitive Python.
The next step up is very simple web frameworks such as web.py, which I've also used. Like CGI scripts, it's very simple to get something up and running, and you don't need any complex configuration or automatically generated code. Your own code will be pretty simple to understand, so you can see what's happening. However, it's not as feature-rich as other web frameworks; last time I used it, there was no session tracking, so I had to roll my own. It also has "too much magic behavior" to quote Guido ("upvars(), bah").
Finally, you have feature-rich web frameworks such as Django. These will require a bit of work to get simple Hello World programs working, but every major one has a great, well-written tutorial (especially Django) to walk you through it. I highly recommend using one of these web frameworks for any real project because of the convenience and features and documentation, etc.
Ultimately you'll have to decide what you prefer. For example, frameworks all use template languages (special code/tags) to generate HTML files. Some of them such as Cheetah templates let you write arbitrary Python code so that you can do anything in a template. Others such as Django templates are more restrictive and force you to separate your presentation code from your program logic. It's all about what you personally prefer.
Another example is URL handling; some frameworks such as Django have you define the URLs in your application through regular expressions. Others such as CherryPy automatically map your functions to urls by your function names. Again, this is a personal preference.
I personally use a mix of web frameworks by using CherryPy for my web server stuff (form parameters, session handling, url mapping, etc) and Django for my object-relational mapping and templates. My recommendation is to start with a high level web framework, work your way through its tutorial, then start on a small personal project. I've done this with all of the technologies I've mentioned and it's been really beneficial. Eventually you'll get a feel for what you prefer and become a better web programmer (and a better programmer in general) in the process.
If you decide to go with a framework that is WSGI-based (for instance TurboGears), I would recommend you go through the excellent article Another Do-It-Yourself Framework by Ian Bicking.
In the article, he builds a simple web application framework from scratch.
Also, check out the video Creating a web framework with WSGI by Kevin Dangoor. Dangoor is the founder of the TurboGears project.
If you want to go big, choose Django and you are set. But if you want just to learn, roll your own framework using already mentioned WebOb - this can be really fun and I am sure you'll learn much more (plus you can use components you like: template system, url dispatcher, database layer, sessions, et caetera).
In last 2 years I built few large sites using Django and all I can say, Django will fill 80% of your needs in 20% of time. Remaining 20% of work will take 80% of the time, no matter which framework you'd use.
It's always worth doing something the hard way - once - as a learning exercise. Once you understand how it works, pick a framework that suits your application, and use that. You don't need to reinvent the wheel once you understand angular velocity. :-)
It's also worth making sure that you have a fairly robust understanding of the programming language behind the framework before you jump in -- trying to learn both Django and Python at the same time (or Ruby and Rails, or X and Y), can lead to even more confusion. Write some code in the language first, then add the framework.
We learn to develop, not by using tools, but by solving problems. Run into a few walls, climb over, and find some higher walls!
If you've never done any CGI programming before I think it would be worth doing one project - perhaps just a sample play site just for yourself - using the DIY approach. You'll learn a lot more about how all the various parts work than you would by using a framework. This will help in you design and debug and so on all your future web applications however you write them.
Personally I now use Django. The real benefit is very fast application deployment. The object relational mapping gets things moving fast and the template library is a joy to use. Also the admin interface gives you basic CRUD screens for all your objects so you don't need to write any of the "boring" stuff.
The downside of using an ORM based solution is that if you do want to handcraft some SQL, say for performance reasons, it much harder than it would have been otherwise, although still very possible.
If you are using Python you should not start with CGI, instead start with WSGI (and you can use wsgiref.handlers.CGIHandler to run your WSGI script as a CGI script. The result is something that is basically as low-level as CGI (which might be useful in an educational sense, but will also be somewhat annoying), but without having to write to an entirely outdated interface (and binding your application to a single process model).
If you want a less annoying, but similarly low-level interface, using WebOb would provide that. You would be implementing all the logic, and there will be few dark corners that you won't understand, but you won't have to spend time figuring out how to parse HTTP dates (they are weird!) or parse POST bodies. I write applications this way (without any other framework) and it is entirely workable. As a beginner, I'd advise this if you were interested in understanding what frameworks do, because it is inevitable you will be writing your own mini framework. OTOH, a real framework will probably teach you good practices of application design and structure. To be a really good web programmer, I believe you need to try both seriously; you should understand everything a framework does and not be afraid of its internals, but you should also spend time in a thoughtful environment someone else designed (i.e., an existing framework) and understand how that structure helps you.
OK, rails is actually pretty good, but there is just a little bit too much magic going on in there (from the Ruby world I would much prefer merb to rails). I personally use Pylons, and am pretty darn happy. I'd say (compared to django), that pylons allows you to interchange ints internal parts easier than django does. The downside is that you will have to write more stuff all by youself (like the basic CRUD).
Pros of using a framework:
get stuff done quickly (and I mean lighning fast once you know the framework)
everything is compying to standards (which is probably not that easy to achieve when rolling your own)
easier to get something working (lots of tutorials) without reading gazillion articles and docs
Cons:
you learn less
harder to replace parts (not that much of an issue in pylons, more so with django)
harder to tweak some low-level stuff (like the above mentioned SQLs)
From that you can probably devise what they are good for :-) Since you get all the code it is possible to tweak it to fit even the most bizzare situations (pylons supposedly work on the Google app engine now...).
For smaller projects, rolling your own is fairly easy. Especially as you can simply import a templating engine like Genshi and get alot happening quite quickly and easily. Sometimes it's just quicker to use a screwdriver than to go looking for the power drill.
Full blown frameworks provide alot more power, but do have to be installed and setup first before you can leverage that power. For larger projects, this is a negligible concern, but for smaller projects this might wind up taking most of your time - especially if the framework is unfamiliar.