How to realize a RW lock with multithreading gRPC?

I am trying to write a simple multithreaded client-server stock trading program in Python with gRPC and concurrent.futures.ThreadPoolExecutor.
The server will run a fixed number of threads that perform the Lookup() and Trade() requests sent by clients. The server maintains a shared list, which means Lookup() should take a read lock and Trade() should take a write lock.
However, the gRPC documentation doesn't seem to mention anything about RW locks. Is ThreadPoolExecutor thread-safe?
Any suggestion is appreciated!
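
For reference, the Python standard library has no built-in readers-writer lock, so one has to be assembled from threading primitives. Below is a minimal sketch built on threading.Condition. ReadWriteLock, StockBook, lookup and trade are hypothetical names (not gRPC API), the shared list is simplified to a dict of quantities, and the handlers are called through a plain ThreadPoolExecutor rather than a real gRPC server. This version gives writers no priority, so a steady stream of readers could starve a writer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers, or one exclusive writer (no writer priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of threads currently reading
        self._writing = False   # True while a writer holds the lock

    @contextmanager
    def read_locked(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # wake any waiting writer

    @contextmanager
    def write_locked(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()  # wake waiting readers and writers


class StockBook:
    """Hypothetical shared state behind the Lookup() and Trade() handlers."""

    def __init__(self, quantities):
        self._lock = ReadWriteLock()
        self._quantities = dict(quantities)

    def lookup(self, name):
        with self._lock.read_locked():
            return self._quantities.get(name)

    def trade(self, name, amount):
        with self._lock.write_locked():
            self._quantities[name] -= amount
            return self._quantities[name]


book = StockBook({"ACME": 1000})
with ThreadPoolExecutor(max_workers=8) as pool:
    # 100 concurrent one-share trades mixed with 100 lookups
    trades = [pool.submit(book.trade, "ACME", 1) for _ in range(100)]
    lookups = [pool.submit(book.lookup, "ACME") for _ in range(100)]
remaining = book.lookup("ACME")
```

The executor only schedules the calls onto its worker threads; any state the handlers share still needs its own protection, which is what the lock provides here.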

Related

Interprocess communication with SPSC queue in python

I have multiple write-heavy Python applications (producer1.py, producer2.py, ...) and I'd like to implement an asynchronous, non-blocking writer (consumer.py) as a separate process, so that the producers are not blocked by disk access or contention.
To make this more easily optimizable, assume I just need to expose a logging call that passes a fixed length string from a producer to the writer, and the written file does not need to be sorted by call time. And the target platform can be Linux-only. How should I implement this with minimal latency penalty on the calling thread?
This seems like an ideal setup for multiple lock-free SPSC queues but I couldn't find any Python implementations.
Edit 1
I could implement a circular buffer as a memory-mapped file on /dev/shm, but I'm not sure if I'll have atomic CAS in Python?

The simplest way would be to run an async TCP or Unix socket server in consumer.py.
HTTP would add unnecessary overhead in this case.
Each producer, acting as a TCP/Unix socket client, sends its data to the consumer, and the consumer responds right away, before writing the data to disk.
File I/O in the consumer is blocking, but it will not block the producers, because they have already received their response by then.
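
A rough sketch of that idea with asyncio Unix-socket streams, assuming Linux. RECORD_SIZE, the one-byte acknowledgement, disk_writer, handle_producer and run_demo are all made up for illustration, and the producer runs in the same process here only to keep the example self-contained.

```python
import asyncio

RECORD_SIZE = 64  # hypothetical fixed record length in bytes


async def disk_writer(pending, log_path):
    """Drain queued records to disk; the blocking write runs in a worker thread."""
    loop = asyncio.get_running_loop()
    with open(log_path, "ab") as log_file:
        while True:
            record = await pending.get()
            if record is None:  # sentinel: stop writing
                break
            await loop.run_in_executor(None, log_file.write, record)


async def run_demo(socket_path, log_path, messages):
    pending = asyncio.Queue()
    writer_task = asyncio.create_task(disk_writer(pending, log_path))

    async def handle_producer(reader, writer):
        try:
            while True:
                record = await reader.readexactly(RECORD_SIZE)
                pending.put_nowait(record)  # queue the record first...
                writer.write(b"\x01")       # ...then acknowledge before any disk I/O
                await writer.drain()
        except asyncio.IncompleteReadError:
            pass  # producer closed the connection
        finally:
            writer.close()

    server = await asyncio.start_unix_server(handle_producer, path=socket_path)

    # Producer side, normally a separate process such as producer1.py.
    reader, writer = await asyncio.open_unix_connection(socket_path)
    acks = 0
    for message in messages:
        writer.write(message.ljust(RECORD_SIZE)[:RECORD_SIZE])  # pad or cut to size
        await writer.drain()
        await reader.readexactly(1)  # wait for the acknowledgement byte
        acks += 1
    writer.close()
    await writer.wait_closed()

    server.close()
    await pending.put(None)
    await writer_task  # make sure everything reached the file
    return acks
```

Calling asyncio.run(run_demo("/tmp/consumer.sock", "/tmp/out.log", [b"hello"])) would exercise it (both paths are placeholders). Note that the producer still pays for one round trip per record; dropping the acknowledgement would lower latency further at the cost of delivery feedback.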

Python multiple processes instead of threads?

I am working on a web backend that frequently grabs realtime market data from the web, and puts the data in a MySQL database.
Currently I have my main thread push tasks into a Queue object. I then have about 20 threads that read from that queue, and if a task is available, they execute it.
Unfortunately, I am running into performance issues, and after doing a lot of research, I can't make up my mind.
As I see it, I have 3 options:
Should I take a distributed task approach with something like Celery?
Should I switch to Jython or IronPython to avoid the GIL issues?
Or should I simply spawn different processes instead of threads using processing?
If I go for the latter, how many processes is a good amount? What is a good multi process producer / consumer design?
Thanks!

Maybe you should use an event-driven approach with an event-driven framework such as Twisted (Python) or Node.js (JavaScript). These frameworks can, for example, use UNIX domain sockets: your consumer listens on a socket and your event generator pushes all the info to the consumer, so the consumer doesn't have to keep checking whether there's something in the queue.

First, profile your code to determine what is bottlenecking your performance.
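
As a starting point for the profiling step, here is a small sketch using the standard cProfile and pstats modules. fetch_and_store is a hypothetical stand-in for one of your tasks, and profile_call is just a helper name invented for this example.

```python
import cProfile
import io
import pstats


def fetch_and_store(n):
    """Hypothetical stand-in for one task (fetch market data, write to MySQL)."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, report.getvalue()


result, report = profile_call(fetch_and_store, 100_000)
print(report)
```

Profiling a single task in isolation like this is usually easier to read than profiling the whole 20-thread program at once.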
If each of your threads is frequently writing to your MySQL database, the problem may be disk I/O, in which case you should consider using an in-memory database and periodically writing it to disk.
If you discover that CPU performance is the limiting factor, then consider using the multiprocessing module instead of the threading module. Use a multiprocessing.Queue object to push your tasks. Also make sure that your tasks are big enough to keep each core busy for a while, so that the granularity of communication doesn't kill performance. If you are currently using threading, then switching to multiprocessing would be the easiest way forward for now.
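
A minimal sketch of that layout, assuming a POSIX system. crunch, worker and run_batches are hypothetical names, and the "fork" start method is picked explicitly only so the example runs as a single script; with the "spawn" start method you would need the usual if __name__ == "__main__": guard around the code that starts the processes.

```python
import multiprocessing as mp


def crunch(batch):
    """Hypothetical CPU-bound task; each batch should be big enough to be worth shipping."""
    return sum(x * x for x in batch)


def worker(tasks, results):
    while True:
        batch = tasks.get()
        if batch is None:  # sentinel: no more work
            break
        results.put(crunch(batch))


def run_batches(batches, n_workers=2):
    ctx = mp.get_context("fork")  # assumption: POSIX only
    tasks, results = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for proc in procs:
        proc.start()
    for batch in batches:
        tasks.put(batch)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    # Collect results before joining so no worker blocks on a full pipe.
    totals = [results.get() for _ in batches]
    for proc in procs:
        proc.join()
    return sorted(totals)  # completion order is not guaranteed, so sort
```

A common starting point for n_workers is the number of CPU cores, adjusted after measuring.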

Should I use epoll or just blocking recv in threads?

I'm trying to write a scalable custom web server.
Here's what I have so far:
The main loop and request interpreter are in Cython. The main loop accepts connections and assigns the sockets to one of the processes in the pool (has to be processes, threads won't get any benefit from multi-core hardware because of the GIL).
Each process has a thread pool. The process assigns the socket to a thread.
The thread calls recv (blocking) on the socket and waits for data. When some shows up, it gets piped into the request interpreter, and then sent via WSGI to the application running in that thread.
Now I've heard about epoll and am a little confused. Is there any benefit to using epoll to get socket data and then pass that directly to the processes? Or should I just go the usual route of having each thread wait on recv?
PS: What is epoll actually used for? It seems like multithreading and blocking fd calls would accomplish the same thing.

If you're already using multiple threads, epoll doesn't offer you much additional benefit.
The point of epoll is that a single thread can listen for activity on many file descriptors simultaneously (and respond to events on each as they occur), and thus provide event-driven multitasking without requiring the spawning of additional threads. Threads are relatively cheap (compared to spawning processes), but each one does require some overhead (after all, they each have to maintain a call stack).
If you wanted to, you could rewrite your pool processes to be single-threaded using epoll, which would reduce your overall thread usage count, but of course you'd have to consider whether that's something you care about or not - in general, for low numbers of simultaneous requests on each worker, the overhead of spawning threads wouldn't matter, but if you want each worker to be able to handle 1000s of open connections, that overhead can become significant (and that's where epoll shines).
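
To make the single-threaded model concrete, here is a small sketch using the standard selectors module, whose DefaultSelector is backed by epoll on Linux. serve_ready is a hypothetical helper, and socket.socketpair() stands in for real client connections so the example is self-contained.

```python
import selectors
import socket


def serve_ready(selector, timeout=1.0):
    """One event-loop pass: answer every connection that has data waiting."""
    answered = 0
    for key, _events in selector.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)  # the selector reported this socket as readable
        if data:
            conn.sendall(data.upper())  # stand-in for real request handling
            answered += 1
        else:  # empty read: the peer closed the connection
            selector.unregister(conn)
            conn.close()
    return answered


selector = selectors.DefaultSelector()
clients = []
for _ in range(3):
    client, server_side = socket.socketpair()
    server_side.setblocking(False)
    selector.register(server_side, selectors.EVENT_READ)
    clients.append(client)

for number, client in enumerate(clients):
    client.sendall(b"request %d" % number)

# A single thread services all three connections in one pass.
answered = serve_ready(selector)
replies = [client.recv(4096) for client in clients]
```

A real server would call serve_ready (or its equivalent) in a loop and keep per-connection buffers, since a request may arrive in several pieces.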
But...
What you're describing sounds suspiciously like you're basically reinventing the wheel - your:
main loop and request interpreter
pool of processes
sounds almost exactly like:
nginx (or any other load balancer/reverse proxy)
A pre-forking tornado app
Tornado is a single-threaded web server python module using epoll, and it has the capability built-in for pre-forking (meaning that it spawns multiple copies of itself as separate processes, effectively creating a process pool). Tornado is based on the tech created to power Friendfeed - they needed a way to handle huge numbers of open connections for long-polling clients looking for new real-time updates.
If you're doing this as a learning process, then by all means, reinvent away! It's a great way to learn. But if you're actually trying to build an application on top of these kinds of things, I'd highly recommend considering using the existing, stable, communally-developed projects - it'll save you a lot of time, false starts, and potential gotchas.
(P.S. I approve of your avatar. <3)

The epoll function (and the other functions in the same family, poll and select) allows you to write single-threaded networking code that manages multiple network connections. Since there is no threading, there is no need for synchronisation as would be required in a multi-threaded program (this can be difficult to get right).
On the other hand, you'll need to have an explicit state machine for each connection. In a threaded program, this state machine is implicit.
These functions just offer another way to multiplex multiple connections in a process. Sometimes it is easier not to use threads; other times you're already using threads, and thus it is easier just to use blocking sockets (which release the GIL in Python).

Python/Urllib2/Threading: Single download thread faster than multiple download threads. Why?

I am working on a project that requires me to create multiple threads to download a large remote file. I have done this already, but I cannot understand why it takes longer to download the file with multiple threads than with a single thread. I used my XAMPP localhost to run the elapsed-time test. I would like to know whether this is normal behaviour or whether it is because I have not tried downloading from a real server.
Thanks
Kennedy

9 women can't combine to make a baby in one month. If you have 10 threads, they each have only 10% the bandwidth of a single thread, and there is the additional overhead for context switching, etc.

Python threading uses something called the GIL (Global Interpreter Lock), which sometimes degrades a program's execution time.
Without going into a lot of detail here, I invite you to read this and this; maybe they can help you understand your problem. You can also watch the two conference talks here and here.
Hope this can help :)

Twisted uses non-blocking I/O: if data is not available on a socket right now, the call doesn't block the entire thread, so you can handle many socket connections waiting for I/O in one thread simultaneously. But if you do something other than I/O (such as parsing large amounts of data), you still block the thread.
The stdlib socket module does blocking I/O by default: when you call socket.recv and no data is available at the moment, it blocks the entire thread, so you need one thread per connection to handle concurrent downloads.
These are two approaches to concurrency:
Spawn a new thread for each new connection (threading + socket from the stdlib).
Multiplex I/O and handle many connections in one thread (Twisted).
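
The first approach can be sketched as follows, using only the stdlib. handle and download_all are hypothetical names, and socket.socketpair() replaces a real remote server so the example runs on its own.

```python
import socket
import threading


def handle(conn, results, index):
    """Read one connection to the end; recv blocks only this thread."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # empty read: the sender closed its end
            break
        chunks.append(data)
    conn.close()
    results[index] = b"".join(chunks)


def download_all(payloads):
    results = [None] * len(payloads)
    threads = []
    for index, payload in enumerate(payloads):
        sender, receiver = socket.socketpair()
        thread = threading.Thread(target=handle, args=(receiver, results, index))
        thread.start()  # one thread per connection
        threads.append(thread)
        sender.sendall(payload)  # plays the role of the remote server
        sender.close()
    for thread in threads:
        thread.join()
    return results
```

For example, download_all([b"first file", b"second file"]) returns the two payloads in order. The blocking recv calls release the GIL while waiting, so the threads overlap their waiting even though they do not run Python code in parallel.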

Processing High-Volume Streaming Data with Twisted or using Threads, Queue in Python

I am receiving tweets at an extremely fast rate from a long-lived connection to the Twitter Streaming API server. I do some heavy text processing on each tweet and save the tweets in my database.
I am using PyCurl for the connection, with a callback function that takes care of the text processing and saving to the db. See my approach below, which is not working properly.
I am not familiar with network programming, so I would like to know:
How can I use threads, Queue or the Twisted framework to solve this problem?
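import pycurl

# STREAMURL, TWITTER_USER and TWITTER_PASS are assumed to be defined elsewhere.

def process_tweet(data):
    # pycurl calls this with each chunk of the response body as it arrives
    # do some heavy text processing
    pass

def open_stream_connection():
    connect = pycurl.Curl()
    connect.setopt(pycurl.URL, STREAMURL)
    connect.setopt(pycurl.WRITEFUNCTION, process_tweet)
    connect.setopt(pycurl.USERPWD, "%s:%s" % (TWITTER_USER, TWITTER_PASS))
    connect.perform()  # blocks for as long as the stream stays open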

You should have a number of threads receiving the messages as they come in. That number should probably be 1 if you are using pycurl, but should be higher if you are using httplib - the idea being you want to be able to have more than one query on the Twitter API at a time, so there is a steady amount of work to process.
When each Tweet arrives, it is pushed onto a Queue.Queue. The Queue ensures that there is thread-safety in the communications - each tweet will only be handled by one worker thread.
A pool of worker threads is responsible for reading from the Queue and dealing with the Tweet. Only the interesting tweets should be added to the database.
As the database is probably the bottleneck, there is a limit to the number of threads in the pool that are worth adding - more threads won't make it process faster, it'll just mean more threads are waiting in the queue to access the database.
This is a fairly common Python idiom. This architecture will scale only to a certain degree - i.e. what one machine can process.
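
The idiom could look roughly like this in Python 3, where the module is named queue rather than Queue. process_tweet, worker and run_pipeline are hypothetical names, the "database" is just a list guarded by a lock, and the incoming tweets are passed in as a plain iterable instead of arriving from the pycurl callback.

```python
import queue
import threading


def process_tweet(text):
    """Hypothetical stand-in for the heavy text processing."""
    return text.strip().lower()


def worker(tweets, saved, lock):
    while True:
        tweet = tweets.get()
        if tweet is None:  # sentinel: shut this worker down
            break
        result = process_tweet(tweet)
        with lock:  # stand-in for the database write
            saved.append(result)


def run_pipeline(incoming, n_workers=4):
    tweets = queue.Queue(maxsize=1000)  # bounded, so a slow database applies back-pressure
    saved, lock = [], threading.Lock()
    workers = [threading.Thread(target=worker, args=(tweets, saved, lock))
               for _ in range(n_workers)]
    for thread in workers:
        thread.start()
    for tweet in incoming:  # in the real program the receiving thread does this
        tweets.put(tweet)
    for _ in workers:
        tweets.put(None)  # one sentinel per worker
    for thread in workers:
        thread.join()
    return saved
```

In the real program the pycurl write callback would only call tweets.put(...) and return, so the connection is never stalled by processing.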

Here's a simple setup if you are OK with using a single machine.
1 thread accepts connections. After a connection is accepted, it passes the accepted connection to another thread for processing.
You can, of course, use processes (e.g., using multiprocessing) instead of threads, but I'm not familiar enough with multiprocessing to give advice. The setup would be the same: 1 process accepts connections, then passes them to subprocesses.
If you need to shard the processing across multiple machines, then the simple thing to do would be to stuff the message into the database, then notify the workers about the new record (this will require some sort of coordination/locking between the workers). If you want to avoid hitting the database, then you'll have to pipe messages from your network process to the workers (and I'm not well versed enough in low level networking to tell you how to do that :))

I suggest this organization:
one process reads Twitter and stuffs tweets into a database
one or more processes read the database, process each tweet, and insert the results into a new database. Original tweets are either deleted or marked as processed.
That is, you have two or more processes/threads. The tweet database can be seen as a queue of work. Multiple worker processes take jobs (tweets) off the queue and create data in the second database.
