So I have User A and User B both accessing the same script, do_cool_things.py, over the network.
I'd like to ensure that the method critical_cool_things() is only accessed by one user at a time.
What would be the best approach for this?
My first thought was threading or multiprocessing, but both would require each Python instance to share some memory in order to use the same locks. That doesn't seem possible if separate machines are accessing do_cool_things.py.
I'm now thinking a simple .lock file in a common location would suffice.
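For concreteness, here's the kind of lock-file approach I have in mind, as a minimal sketch. It assumes every machine sees the same shared filesystem (the path is made up), and note that O_EXCL can be unreliable on some network filesystems such as older NFS:

import os
import time

LOCK_PATH = '/shared/do_cool_things.lock'  # hypothetical path on the common share

def critical_cool_things():
    # os.O_CREAT | os.O_EXCL makes creation atomic: exactly one process succeeds
    while True:
        try:
            fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(0.1)  # another user holds the lock; wait and retry
    try:
        ...  # do the cool things
    finally:
        os.close(fd)
        os.remove(LOCK_PATH)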
What do you think?
You can use Redis, like:
if redis.setnx(self.key, expires):
See an example here.
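In the same spirit, here is a minimal sketch with the redis-py client (the key name and timings are illustrative). Modern redis-py can do the atomic set-if-not-exists with an expiry in a single call, so the lock self-expires if its holder crashes:

import time
from contextlib import contextmanager
import redis

r = redis.Redis(host='localhost', port=6379)

@contextmanager
def dist_lock(key, ttl=10):
    # SET with nx=True is atomic: only one client can create the key,
    # and ex=ttl makes the lock expire on its own if the holder dies
    while not r.set(key, 'locked', nx=True, ex=ttl):
        time.sleep(0.1)
    try:
        yield
    finally:
        r.delete(key)

with dist_lock('critical_cool_things'):
    ...  # only one client at a time gets here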
I'm trying to create a copy of a class instance that I can simulate without affecting the original instance of the class. I've tried using copy.copy, but I run into this problem:
system.simulate(until=100)
print(system.env.now) # prints 100
copy_of_system = copy.copy(system)
copy_of_system.simulate(until=200)
print(copy_of_system.env.now) # prints 200
print(system.env.now) # prints 200, but should print 100
When I use copy.deepcopy I get TypeError: can't pickle generator objects. Is there any effective way to create an independent copy of the system object?
One way of having an existing simpy.Environment take several different paths of execution in parallel would be to use os.fork() when you're done setting up the Environment. You can then, for example, leverage the interprocess communication primitives in multiprocessing, such as Queues, for collecting the interesting results of the different simulations. This requires writing a fair amount of boilerplate associated with manual forking, but it can be more than worth it.
NB: as far as I know, os.fork() is available only under Unix-like OSes.
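A rough sketch of the fork approach, assuming the modern simpy API where env.run(until=...) advances the simulation, and relying on the fact that a multiprocessing.Queue created before the fork is inherited by the child:

import os
from multiprocessing import Queue

import simpy

results = Queue()

env = simpy.Environment()
# ... set up your model's processes on env here ...

pid = os.fork()
if pid == 0:
    # child: explore one branch of the simulation
    env.run(until=200)
    results.put(('child', env.now))
    results.close()
    results.join_thread()  # make sure the data is flushed before exiting
    os._exit(0)
else:
    # parent: explore the other branch
    env.run(until=100)
    results.put(('parent', env.now))
    os.wait()  # reap the child
    print(results.get(), results.get())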
Another way is to create two instances of your model, each in its own environment, and just run one longer than the other. You can use a common seed for any random number generators, and you can use a process with a timeout to make changes at your branch points. Of course, the longer the warm-up, the slower this method will be.
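A minimal sketch of that two-instance idea (the toy customer process and the seed value are illustrative):

import random
import simpy

def build_model(seed):
    rng = random.Random(seed)  # common seed: both copies see identical randomness
    env = simpy.Environment()

    def customer(env, rng):
        while True:
            yield env.timeout(rng.expovariate(1.0))
            # ... do something at each arrival ...

    env.process(customer(env, rng))
    return env

env_short = build_model(42)
env_short.run(until=100)   # the "original" run

env_long = build_model(42)  # identical up to t=100, then keeps going
env_long.run(until=200)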
I have several scripts. Each does some computation that is completely independent of the others. Once these computations are done, the results are saved to disk and a record is updated.
The record is maintained by an instance of a class, which saves itself to disk. I would like to have a single record instance used across the scripts (for example, record_manager = RecordManager(file_on_disk) and then record_manager.update(...)), but I can't do this right now: when updating the record, there may be concurrent writes to the same file on disk, leading to data loss. So I have a separate record manager for every script and merge the records manually later.
What is the easiest way to have a single instance used in all the scripts that solves the concurrent write access problem?
I am using macOS (High Sierra) and Linux (Ubuntu 16.04).
Thanks!
To build a custom solution, you will probably need to write a small new queuing module. This module alone would have write access to the file(s) and would be passed write actions from the existing modules in your code.
The queue logic itself should be a pretty straightforward queue architecture.
There may also be existing Python libraries that handle this problem and would save you from writing your own queue class.
Finally, it is possible that this whole thing could be handled in some way by your OS, independent of Python.
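As a sketch of that queue architecture, assuming the computations can be launched as worker processes under one parent (the file name and record format are illustrative): one writer process owns the file, and everyone else sends it updates instead of writing directly.

import json
from multiprocessing import Process, Queue

def record_writer(path, queue):
    # the single process with exclusive write access to the record file
    record = {}
    while True:
        update = queue.get()
        if update is None:  # sentinel: shut down
            break
        record.update(update)
        with open(path, 'w') as f:
            json.dump(record, f)

if __name__ == '__main__':
    q = Queue()
    writer = Process(target=record_writer, args=('record.json', q))
    writer.start()
    # worker processes push updates instead of touching the file themselves
    q.put({'job1': 'done'})
    q.put(None)
    writer.join()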
I have a use case where I have to count the usage of my application to see which feature is used most. I am storing all my statistics in a file; each time the application closes, it writes the statistics to that file.
The application is standalone and lives on a server, and multiple people can use it at the same time by running it, so multiple copies of the application may be running in different places.
So the problem is: if multiple people try to update the stats at the same time, there is a chance of ending up with false statistics (a concurrency-control problem). How can I handle such situations in Python?
I store the following data in my stats file:
stats = {user1: {feature1: count, feature2: count, ...},
         user2: {feature1: count, feature2: count, ...}}
You can use portalocker, a great module that gives you a portable way to use filesystem locks.
If you are confined to one platform, you could use platform-specific file-locking primitives, usually exposed via python standard library.
You could also use a proper database, but that seems like huge overkill for the task at hand.
EDIT: seeing that you are about to use threading locks, don't! Threading locks (mutexes or semaphores) only prevent several threads in the same process from accessing shared variables; they do not work across separate processes, let alone independent program instances! What you need is a file-locking mechanism, not thread locks.
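For instance, a minimal read-modify-write sketch with portalocker's Lock helper, whose context manager yields the open file handle (the file name, user, and feature are placeholders):

import json
import portalocker

# hold an exclusive, cross-process lock for the whole read-modify-write cycle
with portalocker.Lock('stats.json', 'a+', timeout=10) as f:
    f.seek(0)
    raw = f.read()
    stats = json.loads(raw) if raw else {}
    user = stats.setdefault('user1', {})
    user['feature1'] = user.get('feature1', 0) + 1
    f.seek(0)
    f.truncate()
    json.dump(stats, f)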
Your problem seems like a flavor of the readers-writers problem, extended to database transactions. I would create a SQL-transactional class that is primarily responsible for the CRUD operations on your database, and then a flag that can be locked and released much like a mutex. That way you can ensure that no two concurrent processes are in their critical section (performing an UPDATE or INSERT on the DB) at the same time.
There is also a readers-writer lock, which specifically deals with controlling access to a shared resource and might be of interest to you.
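If it helps, here is a minimal sketch of the transactional idea using the stdlib sqlite3 module, where the database itself serializes concurrent writers (the table and column names are illustrative, and the upsert syntax assumes SQLite 3.24+):

import sqlite3

# SQLite serializes writers itself; the timeout makes waiters block instead of failing
conn = sqlite3.connect('stats.db', timeout=10)
conn.execute('CREATE TABLE IF NOT EXISTS stats ('
             'user TEXT, feature TEXT, count INTEGER, '
             'PRIMARY KEY (user, feature))')

with conn:  # a transaction: committed on success, rolled back on error
    conn.execute(
        'INSERT INTO stats VALUES (?, ?, 1) '
        'ON CONFLICT(user, feature) DO UPDATE SET count = count + 1',
        ('user1', 'feature1'))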
Please let me know if you have any questions!
In the code where the shared resource can be used by multiple applications, use locks:

import threading

mylock = threading.Lock()

with mylock:
    # access the shared resource
    ...
I'm writing a small multithreaded client-side Python application that contains a small webserver (which only serves pages to localhost) and a daemon. The webserver loads data and puts it into a persistent "datastore", and the daemon processes this data, modifies it, and adds some more. It should also take care of synchronization with the disk.
I'd like to avoid complicated external things like SQL or other databases as much as possible.
What are good and simple ways to design the datastore? Bonus points if your solution uses only standard python.
What you're seeking isn't too Python-specific, because AFAIU you want to communicate between two different processes, which are only incidentally written in Python. If this indeed is your problem, you should look for a general solution, not a Python-specific one.
I think that a simple NoSQL key-value datastore such as Redis, for example, could be a very nice fit for your situation. Far from being "complicated", using a tool designed specifically for such a purpose will actually make your code simpler.
If you insist on a Python-only solution, then consider using the Python bindings for SQLite, which come pre-installed with Python. An SQLite DB can be used concurrently by two processes in a safe manner, as long as your semantics of data access are well defined (i.e. problems you have to solve anyway, the tool notwithstanding).
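For instance, here is a tiny key-value store on top of the stdlib sqlite3 module, as a sketch (the schema and names are illustrative):

import sqlite3

class KVStore:
    def __init__(self, path):
        # the timeout makes a process wait for a lock instead of erroring out
        self.conn = sqlite3.connect(path, timeout=10)
        self.conn.execute('CREATE TABLE IF NOT EXISTS kv '
                          '(key TEXT PRIMARY KEY, value TEXT)')
        self.conn.commit()

    def put(self, key, value):
        with self.conn:  # transaction: safe against a concurrent reader/writer
            self.conn.execute('INSERT OR REPLACE INTO kv VALUES (?, ?)',
                              (key, value))

    def get(self, key, default=None):
        row = self.conn.execute('SELECT value FROM kv WHERE key = ?',
                                (key,)).fetchone()
        return row[0] if row else default

Both the webserver and the daemon would open their own KVStore on the same file; SQLite's own locking keeps the two processes from corrupting it.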
Our server cluster consists of 20 machines, each running 10 processes of 5 threads each. We'd like some way to prevent any two threads, in any process, on any machine, from modifying the same object at the same time.
Our code's written in Python and runs on Linux, if that helps narrow things down.
Also, it's pretty rare that two such threads actually contend, so we'd prefer something that makes the "only one thread needs this object" case really fast, even if that means the "one thread has locked this object and another one needs it" case isn't great.
What are some of the best practices?
If you want to synchronize across machines you need a Distributed Lock Manager.
I did some quick googling and came up with a related Stack Overflow question. Unfortunately it only suggests a Java version, but it's a start.
If you are trying to synchronize access to files: your filesystem should already have some sort of locking service in place. If not, consider changing it.
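If it is files, here is a minimal sketch of what that native locking looks like from Python on Linux. This is advisory flock, so every cooperating process must take the lock (the file name is a placeholder):

import fcntl

with open('shared.dat', 'a+') as f:
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the exclusive lock is granted
    try:
        ...  # read/modify/write the file here
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)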
I assume you came across this blog post http://amix.dk/blog/post/19386 during your googling?
The author demonstrates a simple interface to memcachedb, which he uses as a dummy distributed lock manager. It's a great idea, and memcache is probably one of the fastest things you'll be able to interface with. Note that it uses the more recently added with statement.
Here is an example usage from his blog post:
from __future__ import with_statement
import memcache
from memcached_lock import dist_lock

client = memcache.Client(['127.0.0.1:11211'])
with dist_lock('test', client):
    print 'Is there anybody out there!?'
If you can get the complete infrastructure for a distributed lock manager, then go ahead and use that. But that infrastructure is not easy to set up! Here is a practical alternative:
- Designate the node with the lowest IP address as the master node (so if the current master hangs, the node with the new lowest IP address becomes the new master).
- Let all nodes contact the master node to get the lock on the object.
- Let the master node use native lock semantics to grant the lock.
This will simplify things, unless you need the complete clustering infrastructure and a DLM to do the job.
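A rough sketch of that master-node idea using the stdlib multiprocessing.managers (the port and authkey are made up; the proxied lock ultimately calls the master's native threading.Lock):

import threading
from multiprocessing.managers import SyncManager

# --- on the master node: serve one lock over TCP ---
lock = threading.Lock()

class MasterManager(SyncManager):
    pass

MasterManager.register('get_lock', callable=lambda: lock)

def run_master():
    manager = MasterManager(address=('', 50000), authkey=b'secret')
    manager.get_server().serve_forever()

# --- on any other node: connect and use the proxied lock ---
class NodeManager(SyncManager):
    pass

NodeManager.register('get_lock')

def run_node(master_ip):
    manager = NodeManager(address=(master_ip, 50000), authkey=b'secret')
    manager.connect()
    remote_lock = manager.get_lock()
    remote_lock.acquire()   # the master's native lock does the real work
    try:
        ...  # critical section
    finally:
        remote_lock.release()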
Write code using immutable objects. Write objects that implement the Singleton Pattern.
Use a stable distributed-messaging technology such as IPC, web services, or XML-RPC.
I would take a look at Twisted; it has plenty of solutions for tasks like this.
I wouldn't use threads in Python, especially with regard to the GIL; I would look at using processes as the working applications and one of the comms technologies described above for communication between them.
Your singleton class could then live in one of these applications and be interfaced with via the comms technology of your choice.
Not a fast solution with all the interfacing, but if done correctly it should be stable.
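As an illustrative sketch of the XML-RPC variant (the port and function name are made up): one server process owns the shared object, so all mutation is serialized inside it, and no cross-machine lock is needed at all.

import threading
from xmlrpc.server import SimpleXMLRPCServer

_lock = threading.Lock()
_state = {}

def update(key, delta):
    with _lock:  # serialize modifications inside this single process
        _state[key] = _state.get(key, 0) + delta
        return _state[key]

server = SimpleXMLRPCServer(('localhost', 8000), allow_none=True)
server.register_function(update)
server.serve_forever()

Any node would then call xmlrpc.client.ServerProxy('http://localhost:8000').update('count', 1) instead of touching the object directly.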
There may be a better way of doing this, but I would use the Lock class from the threading module to access the "protected" objects in a with statement. Here's an example:
from __future__ import with_statement
from threading import Lock

mylock = Lock()

with mylock:  # acquired here, released automatically on exit
    # do things with protected data here
    ...

# the rest of the code
For more examples of Lock usage, have a look here.
Edit: this solution isn't suitable for this question as threading.Lock is not distributed, sorry