Centralized object store for multiple Python processes

I'm looking for a pythonic and simple way to synchronously share a common data source across multiple Python processes.
I've been thinking about using Pyro4 or Flask to write a kind of CRUD service that I can get objects from and put objects into. But Flask seems like a lot of coding for such a simple task, and Pyro4 seems to require a name service.
Do you know of any (preferably easy to use, matured, high-level) library or package that provides centralized storage and high performance access to objects shared across multiple Python processes?

Take a look at Redis.
Redis is an in-memory key-value store.
You can install redis-py to use Redis from Python.
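For instance, here is a minimal sketch, assuming a Redis server on localhost and the redis-py package; the key name and the use of pickle are illustrative, not required:

```python
import pickle
import redis

r = redis.Redis(host='localhost', port=6379)

# Any process can store an object...
r.set('shared:config', pickle.dumps({'threshold': 0.5, 'mode': 'fast'}))

# ...and any other process can read it back.
config = pickle.loads(r.get('shared:config'))
print(config['mode'])
```

Because the data lives in the Redis server process, every Python process that connects sees the same, current value.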

Related

Fastest way of communication between multiple EC2 instances in python

I am looking for the absolute fastest way to submit a short string from one EC2 instance (type t2.nano) to several others. Example: something happens on instance 1; instances 2, 3 and 4 should (almost) instantly know about it. The target is < 5ms. For now the instances are all in the same region and the same availability zone.
What I have looked at so far:
Shared drive where instance 1 can drop the data and the rest of the instances can check it
-> Not possible, as this instance type does not support shared drives.
Redis
-> I tested this locally and it is actually pretty slow: at least XXms, and sometimes XXXms, for one read and one write (just for testing).
Any ideas how to solve this problem?
You can try AWS EFS
Multiple compute instances, including Amazon EC2, Amazon ECS, and AWS Lambda, can access an Amazon EFS file system at the same time, providing a common data source for workloads and applications running on more than one compute instance or server.
https://docs.aws.amazon.com/efs/latest/ug/whatisefs.html
Consider using Twisted for Publish/Subscribe between your clients, where clients see all messages posted by other clients.
Alternatively, consider Autobahn which builds abstraction layers on Twisted including WebSocket-based pub/sub and WAMP.
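As a rough illustration of the pub/sub idea, here is a sketch using Autobahn's asyncio flavour; it assumes a WAMP router (e.g. Crossbar.io) is reachable at the URL below, and the realm and topic names are made up:

```python
from autobahn.asyncio.wamp import ApplicationSession, ApplicationRunner

class Notifier(ApplicationSession):
    async def onJoin(self, details):
        def on_event(msg):
            # Every subscribed instance sees messages published by the others.
            print("received:", msg)
        await self.subscribe(on_event, "com.example.events")
        # Instance 1 publishes; instances 2, 3 and 4 receive almost instantly.
        self.publish("com.example.events", "something happened")

if __name__ == '__main__':
    ApplicationRunner("ws://10.0.0.1:8080/ws", "realm1").run(Notifier)
```

Each instance runs the same component; latency is then dominated by the network hop to the router.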

Using imported variable in Celery task

I have a Flask app which uses SocketIO to communicate with users currently online. I keep track of them by mapping the user ID with a session ID, which I can then use to communicate with them:
online_users = {'uid...':'sessionid...'}
I declare this in my run.py file, where the app is launched, and then import it where I need it, like this:
from app import online_users
I'm using Celery with RabbitMQ for task deployment, and I need to use this dict from within the tasks. So I import it as above, but when I use it it is empty even when I know it is populated. I realize after reading this that it is because each task is asynchronous and starts a new process with an empty dict, and so my best bet is to use some sort of database or cache.
I'd rather not run an additional service, and I only need to read from the dict (I won't be writing to it from the tasks). Is a cache/database my only option here?
That depends on what you have in the dict. If it's something you can serialize to a string, you can serialize it to JSON and pass it as an argument to the task. If it's an object you cannot serialize, then yes, you need to use a cache/database.
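For the serializable case, a rough sketch of passing the dict as a task argument (the broker URL and task name are assumptions; Celery serializes task arguments itself, JSON by default):

```python
from celery import Celery

app = Celery('tasks', broker='amqp://localhost')

@app.task
def notify_online_users(online_users):
    # `online_users` arrives as a plain dict reconstructed by Celery,
    # independent of the web process's memory.
    for uid, session_id in online_users.items():
        ...  # e.g. send something to session_id

# In the web process, at the moment the dict is populated:
# notify_online_users.delay(online_users)
```

The snapshot is taken at call time, so the worker sees whatever the web process knew when it queued the task.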
I came across this discussion which seems to be a solution for exactly what I'm trying to do.
Communication through a message queue is now implemented in the python-socketio package, using Kombu, which provides a common API for several message queues including Redis and RabbitMQ.
Supposedly an official release will be out soon, but as of now it can be done using an additional package.
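As an illustration of how this looks today with Flask-SocketIO (which exposes the message queue via a message_queue argument), here is a hedged sketch; the AMQP URL, event name and payload are assumptions:

```python
# In the web process:
#   socketio = SocketIO(app, message_queue='amqp://')

# In the Celery worker (no Flask app object is needed here):
from flask_socketio import SocketIO

external_sio = SocketIO(message_queue='amqp://')

def push_to_user(session_id, payload):
    # Emits through the queue; the web server delivers it to that client.
    external_sio.emit('server_update', payload, room=session_id)
```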

DB Connectivity from multiple modules

I am working on a system where a bunch of modules connect to a MS SQL Server DB to read/write data. Each of these modules is written in a different language (C#, Java, C++), as each language serves the purpose of that module best.
My question, however, is about the DB connectivity. As of now, all these modules use the language-specific SQL connectivity API to connect to the DB. Is this a good way of doing it?
Or alternatively, is it better to have a Python (or some other scripting language) script take over the responsibility of connecting to the DB? The modules would then send input parameters and the name of a stored procedure to the Python script, and the script would run it on the database and send the output back to the respective module.
Are there any advantages of the second method over the first ?
Thanks for helping out!
If we assume that each language you use will have an optimized set of classes to interact with databases, then there shouldn't be a real need to pass all database calls through a centralized module.
Using a "middle-ware" for database manipulation does offer a very significant advantage. You can control, monitor and manipulate your database calls from a central and single location. So, for example, if one day you wake up and decide that you want to log certain elements of the database calls, you'll need to apply the logical/code change only in a single piece of code (the middle-ware). You can also implement different caching techniques using middle-ware, so if the different systems share certain pieces of data, you'd be able to keep that data in the middle-ware and serve it as needed to the different modules.
The above is a very advanced edge-case and it's not commonly used in small applications, so please evaluate the need for the above in your specific application and decide if that's the best approach.
Doing things the way you do them now is fine (if we follow the above assumption) :)
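For reference, a minimal sketch of what such a middle-ware service could look like, assuming a small Flask app and pyodbc for SQL Server; the DSN, route and procedure names are made up:

```python
import pyodbc
from flask import Flask, jsonify, request

app = Flask(__name__)
CONN_STR = 'DSN=MySqlServer;UID=app_user;PWD=secret'  # assumed ODBC DSN

# Whitelist the callable procedures to avoid SQL injection via the URL.
ALLOWED_PROCS = {'GetOrders', 'UpdateStatus'}

@app.route('/proc/<name>', methods=['POST'])
def run_procedure(name):
    if name not in ALLOWED_PROCS:
        return jsonify({'error': 'unknown procedure'}), 404
    params = request.get_json() or []
    conn = pyodbc.connect(CONN_STR)
    try:
        cursor = conn.cursor()
        if params:
            placeholders = ', '.join('?' for _ in params)
            cursor.execute('{CALL %s (%s)}' % (name, placeholders), params)
        else:
            cursor.execute('{CALL %s}' % name)
        rows = [list(r) for r in cursor.fetchall()] if cursor.description else []
        conn.commit()
    finally:
        conn.close()
    return jsonify(rows)
```

The C#, Java and C++ modules would then POST their parameters to this one place instead of opening their own connections.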

Shared object between requests in Django

I am using a Python module (PyCLIPS) and Django 1.3.
I want to develop a thread-safe class that implements the Object Pool and Singleton patterns and whose instances have to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, does something with it, pushes it back into the pool, and then sends a response containing the object's ID.
Another request, which has that object's ID, gets the object with the given ID from the pool and repeats the steps above.
But the object's state has to be kept while it sits in the pool, for as long as the server is running.
It should be like a Singleton Session Bean in Java EE.
How should I do it? Is there something I should read?
Update:
I can't store objects from the pool in a database, because these objects are wrappers around a C library that provides the API for the CLIPS expert system engine.
Thanks!
Well, I think a different angle is necessary here. Django is not like Java; the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per object data, which should always go in the DB layer.
Otherwise, you can always save stuff on the session, which Django provides for both logged-in users as well as for anonymous ones - see the docs on Django sessions.
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app but rather accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer but rather by some sort of external service which you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.
Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single-process WSGI app (on a different port) that exposes an API to do all of the operations that you need. Whether you use a RESTful API or form posts is up to your personal preference.
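A minimal sketch of that single-process service idea, using only the standard library; the endpoints and the pool contents are placeholders for the real CLIPS wrappers:

```python
import json
import uuid
from wsgiref.simple_server import make_server

# object_id -> (non-picklable) wrapper object; lives only in this one process.
pool = {}

def app(environ, start_response):
    path = environ['PATH_INFO']
    if path == '/create':
        obj_id = str(uuid.uuid4())
        pool[obj_id] = object()  # stand-in for the real CLIPS wrapper
        body = json.dumps({'id': obj_id})
    elif path.startswith('/use/'):
        obj = pool.get(path[len('/use/'):])
        # ... call into the wrapped object here ...
        body = json.dumps({'found': obj is not None})
    else:
        body = json.dumps({'error': 'unknown endpoint'})
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [body.encode('utf-8')]

if __name__ == '__main__':
    # Single process, single thread: the pool is shared by all requests.
    make_server('127.0.0.1', 8001, app).serve_forever()
```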
Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment: Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (i.e. the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A shot in the dark, but are they serializable using pickle? If so, then perhaps memcache might work.
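If they do pickle cleanly, a sketch of the cache idea using Django's cache framework (which pickles values transparently for backends like memcached; the key scheme is just an example):

```python
from django.core.cache import cache

def check_out(obj_id):
    # Returns None if the object is not (or no longer) in the cache.
    return cache.get('pool:%s' % obj_id)

def check_in(obj_id, obj):
    cache.set('pool:%s' % obj_id, obj)
```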

Websocket Server with twisted and Python doing complex jobs in the background

I want to code a server which handles WebSocket clients while doing MySQL selects via SQLAlchemy and scraping several websites at the same time (Scrapy). The received data has to be processed, saved to the DB and then sent to the WebSocket clients.
My question is how this can be done in Python from a logical point of view. How do I need to structure the code, and which modules are the best solution for this job? At the moment I'm convinced of using Twisted with threads in which the scraping and select work runs. But can this be done in an easier way? I only find simple Twisted examples, but this obviously seems to be a more complex job. Are there similar examples? How do I start?
Cyclone, a Twisted-based 'network toolkit', based on/similar to facebook/friendfeed's Tornado server, contains support for WebSockets: https://github.com/fiorix/cyclone/blob/master/cyclone/web.py#L908
Here's example code:
https://github.com/fiorix/cyclone/blob/master/demos/websocket/websocket.tac
Here's an example of using txwebsocket:
http://www.saltycrane.com/blog/2010/05/quick-notes-trying-twisted-websocket-branch-example/
You may have a problem using SQLAlchemy with Twisted; from what I have read, they do not work well together (source). Are you married to SQLA, or would another, more compatible OR/M suffice?
Some twisted-friendly OR/Ms include Storm (a fork) and Twistar, and you can always fall back on Twisted's core db abstraction library twisted.enterprise.adbapi.
There are also async-friendly db libraries for other products, such as txMySQL, txMongo, and txRedis, and paisley (couchdb).
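To give a flavour of the core abstraction, here is a hedged twisted.enterprise.adbapi sketch; the driver module name ('pymysql') and the query are assumptions. adbapi runs the blocking DB-API calls in a thread pool and hands the results back as Deferreds:

```python
from twisted.enterprise import adbapi
from twisted.internet import reactor

dbpool = adbapi.ConnectionPool('pymysql', host='localhost',
                               user='app', passwd='secret', db='mydb')

def print_rows(rows):
    for row in rows:
        print(row)

d = dbpool.runQuery('SELECT id, url FROM scraped_pages LIMIT 10')
d.addCallback(print_rows)
d.addErrback(lambda failure: print(failure))
d.addBoth(lambda _: reactor.stop())

reactor.run()
```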
You could conceivably use both Cyclone (or txwebsockets) and Scrapy as child services of the same MultiService, running on different ports but packaged within the same Application instance. The services may communicate, either through the parent service or some RPC mechanism (like JSON-RPC, Perspective Broker, AMP, XML-RPC, etc.), or you can just write to the db from the scrapy service and read from it using websockets. Redis would be great for this IMO.
Ideally you'll want to avoid writing your own WebSockets server, but since you're running Twisted, you might not be able to do that: there are several WebSockets implementations (see this search on PyPI). Unfortunately none of them are Twisted-based. [Edit: see @JP-Calderone's comment below.]
Twisted should drive the master server, so you probably want to begin with writing something that can be run via twistd (see here if you're new to this). The WebSocket implementation mentioned by @JP-Calderone and Scrapy are both Twisted-based, so they should be reasonably trivial to drive from your master Twisted-based server. SQLAlchemy will be more difficult; I've commented on this before in this question.
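As a starting point, here is a rough skeleton of something runnable via twistd -ny master.tac; the port and the resource are placeholders for the real WebSocket, Scrapy and DB services:

```python
# master.tac
from twisted.application import internet, service
from twisted.web import resource, server

class Status(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        return b'master server is up'

application = service.Application('master')
web = internet.TCPServer(8080, server.Site(Status()))
web.setServiceParent(application)
# The WebSocket handler, the Scrapy crawler and the DB pool would be attached
# to `application` in the same way, e.g. grouped under a MultiService.
```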
