Copying the serialization class between two servers - python

I'm building a system with two servers and an API interface between them. One is a normal Django web server and the other is a calculation server (also Django-powered) that performs complex calculations from specific inputs. I've split the website from the calculation server to decouple the components.
I'm using Django REST Framework and I've created a serialization class on the web server. It covers the inputs that get sent to the calculation server and is populated from various DB entries. I pass the serialized data as parameters in a GET request to the calc server, and I copy that same serialization class to the calculation server to deserialize/decode the data and perform the calculation.
Is it normal to use this approach, where I'm copying the serialization class between the two servers? Usually when I find myself copying something, I'm doing it wrong.
The calculated results are then just returned to my web server using built-in Python and Django functions; I don't see a need for Django REST Framework during this step.
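For concreteness, the kind of serializer being copied might look something like this (the class and field names are hypothetical, and a common alternative to copying is to move the class into a small shared package that both servers install):
from rest_framework import serializers

# Hypothetical serializer shared (or copied) between the two servers
class CalculationInputSerializer(serializers.Serializer):
    load_factor = serializers.FloatField()
    material = serializers.CharField(max_length=50)
    iterations = serializers.IntegerField(default=100)

# Web server: build the payload from DB entries
# (some_model is a hypothetical DB object with matching attributes)
payload = CalculationInputSerializer(instance=some_model).data

# Calculation server: validate the incoming GET parameters
# (request.query_params in a DRF view; request.GET in plain Django)
serializer = CalculationInputSerializer(data=request.query_params)
serializer.is_valid(raise_exception=True)
inputs = serializer.validated_data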

Related

How to store global state in Django application outside database?

I want to implement a neural network model in a Django application so that it can communicate via a REST API with another application. The Django application iteratively (1) collects a batch of training data from the other application, (2) retrains the model on the data aggregated so far, and (3) serves predictions on demand to that other application. Time is a crucial factor here. How and where can I store an instance of the trained model between those steps?
If you don't want to use a (SQL) database, you can also use Django's caching framework to store nearly any kind of data that is somehow serializable. It offers a quite simple and convenient API (cache.set()/cache.get()), and you can use backends like memcached and redis (which can also persist to disk). For more complicated use cases you might look into using redis with its own API, which enables you to do more complicated things than when accessing it through the caching API. Using these possibilities you can also share data between multiple processes/workers.
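A minimal sketch of that caching API, assuming a hypothetical picklable model object and the default cache configured in settings:
from django.core.cache import cache

# After retraining, store the fitted model (any picklable object works);
# timeout=None keeps it until it is evicted or overwritten
cache.set('trained_model', model, timeout=None)

# On a prediction request, fetch it back
model = cache.get('trained_model')
if model is None:
    model = retrain_from_scratch()  # hypothetical fallback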

Writing a Django backend program that runs indefinitely -- what to keep in mind?

I am trying to write a Django app that queries a remote database for some data, performs some calculations on a portion of this data and stores the results (in the local database using Django models). It also filters another portion and stores the result separately. My front end then queries my Django database for these processed data and displays them to the user.
My questions are:
How do I write an agent program that continuously runs in the backend, downloads data from the remote database, does calculations/filtering and stores the result in the local Django database? In particular, what are the most important things to keep in mind when writing a program that runs indefinitely?
Is using cron for this purpose a good idea?
The data retrieved from the remote database belong to multiple users, and each user's data must be kept/stored separately in my local database as well. How do I achieve that? Using row-level/class-instance-level permissions, maybe? Remember that the backend agent does the storing, updating and deleting; the front end only reads data (through HTTP requests).
And finally, I allow creation of new users. If a new user has valid credentials for the remote database, the user should be allowed to use my app. In that case, my backend will download this particular user's data from the remote database, perform calculations/filtering and present the results to the user. How can I handle the dynamic creation of objects/database tables for new users? And how can I differentiate between users' data when retrieving them?
Would very much appreciate answers from experienced programmers with knowledge of Django. Thank you.
For:
1) The standard go-to solution for timed and background tasks is Celery, which has Django integration. There are others, like Huey (https://github.com/coleifer/huey).
2) The usual solution is that each row contains a user_id column indicating which user the data belongs to. This maps to the User model using the Django ORM's ForeignKey field (see the sketch after this list). Do your users need to query the database directly, or do they have direct database accounts? If not, this solution should be enough. It sounds like your front end has one database connection and all permission logic is handled by the front end, not by the database itself.
3) See 2.
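A minimal sketch of that per-user row approach; the model and field names are hypothetical:
from django.conf import settings
from django.db import models

# Hypothetical model holding each user's processed results
class ProcessedRecord(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    result = models.FloatField()
    created_at = models.DateTimeField(auto_now_add=True)

# The front end then reads only the requesting user's rows:
# ProcessedRecord.objects.filter(user=request.user)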

Web application: Hold large object between requests

I'm working on a web application related to genome searching. This application makes use of this suffix tree library through Cython bindings. Objects of this type are large (hundreds of MB up to ~10GB) and take as long to load from disk as it takes to process them in response to a page request. I'm looking for a way to load several of these objects once on server boot and then use them for all page requests.
I have tried using a remote manager / client setup using the multiprocessing module, modeled after this demo, but it fails when the client connects with an error message that says the object is not picklable.
I would suggest writing a small Flask application (or even raw WSGI, but it's probably simpler to use Flask, as it will be easier to get up and running quickly) which loads the genome database and then exposes a simple API. Something like this:
from flask import Flask, jsonify

app = Flask(__name__)
database = load_database()  # load the large suffix tree once, at startup

@app.route('/get_genomes')
def get_genomes():
    return jsonify(database.all_genomes())

app.run(debug=True)
Or, you know, something a bit more sensible.
Also, if you need to handle more than one request at a time (I believe app.run will only handle one at a time), start with threading, and if that's too slow, you can os.fork() after the database is loaded and run multiple request handlers from there (that way they will all share the same database in memory).
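For instance, threading is just a flag on app.run, and the main web application can then query the service over HTTP (the port, endpoint and use of the requests library here are assumptions):
# In the service process: serve requests with multiple threads
app.run(threaded=True)

# In the main web application: call the service over HTTP
import requests

genomes = requests.get('http://localhost:5000/get_genomes', timeout=30).json()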

Python Web Backend

I am an experienced Python developer starting to work on a web service backend system. The system constantly feeds data from the web into a MySQL database. This data is later displayed by a frontend side (there is no connection between the frontend and the backend). The backend system constantly downloads flight information from the web (some of the data is fetched via APIs, and some by downloading and parsing text/xls files). I already have a script that downloads the data, parses it, and inserts it into the MySQL db - all in a big loop. The frontend side is just a bunch of PHP pages that properly display the data by querying the MySQL server.
It is crucial that this web service be robust, strong and reliable.
Therefore, I have been looking into the proper ways to design it, and came across the following parts to comprise my system:
1) django as a framework (for HTTP connections and for using Piston)
2) Piston as an API provider (this is great because then my front-end can use the API instead of actually running queries)
3) SQLAlchemy as the DB layer (I don't like how little control you get when using the Django ORM; I want to be able to run a more complex DB layer)
4) Apache with mod_wsgi to run everything
5) And finally, Celery (or django-cron) to actually run my infinite loop that pulls the data off the web (hopefully in some sort of organized task format). This is the part I am least sure of, and any pointers are appreciated.
This all sounds great. I have used Django before to write websites (i.e. request handlers that return data). However, other than using Celery or django-cron, I can't really see how it fits the role of a backend that constantly feeds in data.
I just wanted to run this by you to hear your ideas/comments. Any input you have, or pointers to documentation and/or other libraries, would be greatly appreciated!
If you are about to use SQLAlchemy, I would refrain from using Django: Django is fine if you are using the whole stack, but as you are about to rip the models layer out, I do not see much value in using it, and I would take a look at another option (perhaps Pylons or plain old CherryPy would do).
Even more so if the front ends will not run queries, but only talk to the API provider.
As for robustness, I am more satisfied with starting separate FCGI processes under supervise and using a more lightweight web server (lighttpd/nginx), but that's a matter of taste.
For the "infinite loop" part, it depends on what behaviour you want: if there is a problem with the source, would you just like to skip that run, or repeat it multiple times until the source is back up?
Periodic tasks might be good for the former, while a cron job that just spawns scraping tasks is better for the latter.
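A minimal sketch of the Celery route, with the task name, schedule and fetch_flight_data() helper all hypothetical; the retry arguments give the "repeat until the source is back up" behaviour:
from celery import Celery

app = Celery('backend', broker='redis://localhost:6379/0')

# Run the scrape every five minutes via celery beat
app.conf.beat_schedule = {
    'pull-flight-data': {
        'task': 'tasks.pull_flight_data',
        'schedule': 300.0,
    },
}

@app.task(bind=True, max_retries=5, default_retry_delay=60)
def pull_flight_data(self):
    try:
        fetch_flight_data()  # hypothetical: download, parse, insert into MySQL
    except IOError as exc:
        # retry instead of silently skipping when the source is temporarily down
        raise self.retry(exc=exc)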

Can Django Test Client Be Used for API Calls in Production?

I'm building a Django app with an API built on Piston. For the sake of keeping everything as DRY as possible and the API complete, I'd like my internal applications to call the API rather than the models (kind of a proxy-view-controller a la https://github.com/raganwald/homoiconic/blob/master/2010/10/vc_without_m.md, but all on one Django install for now). So the basic setup is:
Model -> API -> Application -> User Client
I can overload some core Piston classes to create an internal client interface for the application, but I'm wondering if I could just use the Django Test Client to accomplish the same thing. So to create an article, rather than calling the model I would run:
from django.test.client import Client

c = Client()
article = c.post('/api/articles', {
    'title': 'My Title',
    'content': 'My Content',
})
Is there a reason I shouldn't use the test client to do this? (performance, for instance) Is there a better tool that's more tailored for this specific purpose?
After reviewing the code for the test client, I don't see any additional overhead related to testing; it just functions as a basic client for internal requests. I'll be using the test client as the internal client, and using Piston's DjangoEmitter to get model objects back from the API.
Only testing will tell whether the internal request mechanism is too much of a performance hit.
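As a rough sketch of that internal-client pattern (the endpoint path is hypothetical, and this assumes the API emits JSON; Piston's DjangoEmitter may return data in a different shape):
import json
from django.test.client import Client

api = Client()
response = api.get('/api/articles/1')
article = json.loads(response.content)  # assumes a JSON response body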
