I'm working on a Python project, currently using Django, which does quite a bit of NLP work when processing a form POST. I'm using the NLTK package, and from profiling my code and experimenting I've realised that the majority of the time is spent importing NLTK and various other packages. My question is: is there a way I can have the server start up, do these imports once, and then just wait for requests, passing them to a function that uses the already-imported packages? This would be much faster and less wasteful than performing such imports on every request. If anybody has any ideas on how to avoid importing large packages on every request, I'd be grateful for the help!
Thanks,
Callum
Django, under most deployment mechanisms, does not import modules for every request. Even the development server only reloads code when it changes. I don't know how you're verifying that all the imports are re-run each time, but that certainly shouldn't be happening.
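For illustration, here is a minimal sketch of a Django view that relies on module-level imports; the analyze view and the "text" form field are hypothetical, but the point is that the heavy imports run once when the worker process starts, not once per request:

    # views.py -- a minimal sketch; the analyze view and "text" field are hypothetical
    import nltk                      # module-level imports run once per worker process,
    from nltk import word_tokenize   # not once per request

    from django.http import JsonResponse

    def analyze(request):
        # the import cost was paid at process start-up; each request only
        # pays for the actual NLP work
        tokens = word_tokenize(request.POST.get("text", ""))
        return JsonResponse({"tokens": tokens})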
I've recently implemented two-factor auth in a Django application. I used a third-party package for it, which is already well tested. I want to write unit tests for my own code, but it seems silly to test things that are really just the package's functionality. I also feel odd writing larger-scale Selenium tests for the login process, especially e.g. scanning a QR code. Is the answer that, if I'm not doing anything new with the code and am just dropping in the existing library, there's nothing for me to effectively test (because it's unnecessary)?
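The most I can see myself writing is a thin check of my own wiring rather than the package's internals, something like this (the /reports/ URL is just a hypothetical protected page in my project):

    # tests.py -- a minimal sketch that exercises my own glue (urlconf, decorators),
    # not the third-party package itself; "/reports/" is a hypothetical protected URL
    from django.test import TestCase

    class TwoFactorIntegrationTests(TestCase):
        def test_protected_view_redirects_anonymous_users(self):
            response = self.client.get("/reports/")
            self.assertEqual(response.status_code, 302)  # sent to the login flow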
Let me explain what I'm trying to achieve. In the past, while working on the Java platform, I used to write Java code (say, to push or pull data from a MySQL database), then create a WAR file, which essentially bundles all the class files, supporting files, etc., and put it under a servlet container like Tomcat; this becomes a web service and can be invoked from any platform.
In my current scenario, the majority of the work is being done in Java, but the Natural Language Processing (NLP) / Machine Learning (ML) part is being done in Python using the NLTK, SciPy, NumPy, etc. libraries. I'm trying to use the services of this Python engine from the existing Java code. Integrating the Python code with Java through something like Jython is not that straightforward (as far as I know, Jython does not support calling Python modules that have C-based extensions), so I thought the next option would be to make it a web service, similar to what I had done with Java web services in the past.
Now comes the actual crux of the question: how do I run the ML engine as a web service and call it from any platform, which in my current scenario happens to be Java? I looked around the web for various options and found things like CherryPy and Werkzeug, but I was not able to find the right approach, or any sample code showing how to invoke an NLTK-based Python script and serve the result over the web, eventually replicating the functionality a Java web service provides. In the Python/NLTK code, the ML engine trains on a large corpus (this takes 3-4 minutes), and we don't want the Python code to go through this step every time a method is invoked. If I make it a web service, the training will happen only once, when the service starts, and after that the service is ready to be invoked using the already-trained engine.
Coming back to the problem: I'm pretty new to web services in Python and would appreciate any pointers on how to achieve this. Also, any pointers on calling NLTK-based Python scripts from Java without the web service approach, in a way that can be deployed on production servers with good performance, would be helpful and appreciated. Thanks in advance.
Just as a note, I'm currently running all my code on a Linux machine with Python 2.6 and JDK 1.6 installed.
One method is to build an XML-RPC server, but you may wish to fork a new process for each connection to prevent the server from seizing up. I have written a detailed tutorial on how to go about this: https://speakerdeck.com/timclicks/case-studies-of-python-in-parallel?slide=68.
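A minimal sketch of that approach with the standard library (module names are the Python 2.6 ones, matching your setup; train_engine() and the classify logic are hypothetical stand-ins for your real NLTK training and prediction code):

    # rpc_server.py -- a minimal sketch, assuming Python 2.6 as stated in the question
    import SocketServer
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    class ForkingXMLRPCServer(SocketServer.ForkingMixIn, SimpleXMLRPCServer):
        """Forks a child process per request so one slow call cannot seize up the server."""
        pass

    def train_engine():
        # heavy imports and the 3-4 minute corpus training happen here, exactly once,
        # when the service starts -- never per request
        def engine(text):
            # placeholder: a real implementation would return the trained
            # NLTK classifier's prediction for `text`
            return len(text.split())
        return engine

    engine = train_engine()

    def classify(text):
        # invoked remotely (e.g. from Java with any XML-RPC client library);
        # reuses the engine built at start-up
        return engine(text)

    if __name__ == "__main__":
        server = ForkingXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
        server.register_function(classify, "classify")
        server.serve_forever()

The Java side can then invoke classify through any XML-RPC client library, and the expensive training cost is paid only once at start-up.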
An NLTK-based system tends to be slow to respond to individual requests, but good throughput can be achieved given enough RAM.
I've been working on a Flask app which handles SMS messages using Twilio, stores them in a database, and provides access to a frontend via JSONP GET requests. I've daemonized it using supervisord, which seems to be working pretty well, but every few days it starts to hang (i.e. all requests pend forever or time out) and I have to restart the process. (I've also tried simply running it with nohup, but same problem.) I was suspicious that sqlite3 was somehow blocking occasionally, but my most recent test was to write a request method which didn't involve database access, and that's timing out too. I'm incredibly puzzled -- hopefully you've seen something similar or know what might be causing this.
The relevant code can be found here, and it's currently running (and stalled, as of this post) on my VPS at mattnichols.net:6288
Thanks!
Update: do you think this could be an issue with Flask's dev server? I'd like to believe that wrapping my app with Tornado (or something similar) could solve the problem, but I've also run other things for much longer without problems using the dev server.
For the record, this seems to have been solved by running my app with Tornado instead of the Flask dev server. Wrapping my Flask code in a Tornado server was super easy once I decided to do so: consult http://flask.pocoo.org/docs/deploying/wsgi-standalone/#tornado if you find yourself in the same situation.
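For reference, the wrapper boils down to a few lines, essentially the recipe from the linked docs (yourapplication and the port are placeholders for your own app object and address):

    # run_tornado.py -- a minimal sketch of wrapping an existing Flask app in Tornado
    from tornado.wsgi import WSGIContainer
    from tornado.httpserver import HTTPServer
    from tornado.ioloop import IOLoop

    from yourapplication import app  # the existing Flask app object

    http_server = HTTPServer(WSGIContainer(app))
    http_server.listen(6288)
    IOLoop.instance().start()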
I am experimenting with Flask, coming from Django, and I really like it. There is just one problem that I ran into. I read the Flask docs, and the part about larger applications explains a way to divide your project into packages, each one with its own static and templates folders as well as its own views module. The thing is that I cannot find a way that works to put the models in there using SQLAlchemy with the Flask extension. It works from the interactive prompt to create the tables, but when I use it inside the code it breaks. So I wanted to know how more experienced Flask developers have solved this.
While I'm not ready to announce it because I'm still actively working on refining the samples, you would probably benefit from the flask-skeleton project that I'm developing. I got tired of reinventing the wheel when bootstrapping Flask websites, so I started a complete sample project that uses my best practices. I haven't added any unit tests yet, but this should be good enough for you to start with. Please send me feedback or suggestions if you come across any.
https://github.com/sean-/flask-skeleton/
Actually, I found out what I was looking for. Instead of importing flaskext.sqlalchemy in the main __init__, you import it in the model module. After that you import the model in the main __init__, start it with db.init_app(), and pass it the app configuration. It is not as flexible as the skeleton shown in Sean's post, but it was what I wanted to know. If I weren't just toying around, the skeleton would probably be the one I'd use.
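In case it helps someone else, a minimal sketch of the pattern (module and model names are hypothetical; the extension was imported as flaskext.sqlalchemy at the time, while current releases expose it as flask_sqlalchemy):

    # models.py -- the db object lives with the models, not in the main __init__
    from flask_sqlalchemy import SQLAlchemy

    db = SQLAlchemy()  # created without an app; bound later with init_app()

    class Note(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        body = db.Column(db.Text)

    # __init__.py of the main package
    from flask import Flask
    from yourpackage.models import db  # the db object defined in models.py

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///app.db"
    db.init_app(app)  # passes the app configuration to the extension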
I'm working with web2py, and for some reason web2py seems to fail to notice when code has changed in certain cases. I can't really narrow it down, but from time to time changes in the code are not reflected; web2py apparently has the old version cached somewhere.
The only thing that helps is quitting web2py and restarting it (I'm using the internal server).
Any hints? Thank you!
web2py does not cache your code, except on Google App Engine (where it does so for speed). That is not the problem: if you edit code in models, views or controllers, you see the effect immediately.
The problem may be modules: if you edit code in modules, you will not see the effect immediately unless you import them with local_import('module', reload=True), or restart web2py.
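For example, in a controller (mymodule and compute() are hypothetical names for your own module in the application's modules/ folder):

    # a minimal sketch inside a web2py controller
    mymodule = local_import('mymodule', reload=True)  # reload=True re-imports it on each request

    def index():
        return dict(result=mymodule.compute())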
If that is also not your problem, then your browser is caching something. Please bring this question to the web2py mailing list, where we can help more.
P.S. If you are using the latest web2py it no longer comes with cherrypy. The built-in web server is called Rocket.
web2py itself shouldn't "cache" your code, but whatever app server you're using it on surely might. But web2py can be deployed on such a huge variety of app servers that it's impossible to give completely general suggestions.
If you're using the popular cherrypy WSGI server that I believe comes bundled with web2py, for example, see, in cherrypy's own docs, the AutoReload feature. Such features are not recommended in a production deployment (they can require very significant resources), but they sure come in handy when you're just developing!-)