For a project we need to request data through an API (HTTP/1.1). Depending on what comes back, I may then send a POST request with instructions, after which the API sends back a response. I made the program multithreaded so that the main thread keeps requesting data, and whenever I want to post an instruction I spawn a thread to do that (requesting data takes only about 1 second, while posting an instruction and getting the reply can take up to 3 seconds).
However, the problem I am running into is that sometimes one of my threads hangs and only finishes if I call thread.join(). I can see that it hangs because the data I get in the main thread should reflect the instructions previously sent by the threads (I allow a 5-second period for the server to reflect the instructions I sent, so it is not the case that the server simply hasn't updated yet). If I then send the same instructions again, I find that both sets of instructions make it to the server (the hanging one and the newly issued one). So somehow sending the new instructions has the side effect that the previous instructions also get sent.
The problem looks related to threading, as my code doesn't hang when executed serially. Looking at posts like this didn't help, as I do not know in advance what my instructions need to be for my asynchronous requests. It is important that I make use of persistent connections and reuse them, as that saves a lot of time on handshakes etc.
My questions:
What is a proper way of handling a connection pool of persistent connections in a multithreaded way (so it doesn't hang)?
How can I debug/troubleshoot the thread to find out where it hangs? (See the stack-dump sketch just below.)
requests gets recommended as a package often, but maybe there are others better suited for this kind of application?
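For the second question, one low-tech way to see where a thread is stuck (a general debugging sketch, not specific to the code below) is to dump the stack of every live thread using the standard library:

# Sketch: dump the stacks of all running threads to see where one hangs.
import faulthandler
import signal
import sys
import threading
import traceback

# Option 1: print all thread stacks to stderr whenever SIGUSR1 arrives
# (POSIX only; on Windows call faulthandler.dump_traceback() directly).
faulthandler.register(signal.SIGUSR1)

# Option 2: do it by hand, e.g. from a watchdog thread.
def dump_all_stacks():
    for thread_id, frame in sys._current_frames().items():
        name = next((t.name for t in threading.enumerate()
                     if t.ident == thread_id), thread_id)
        print("--- stack of thread %s ---" % name)
        traceback.print_stack(frame)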
Example Code:
import requests
from threading import Thread

req = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10)
req.mount('', adapter)
url = 'http://python-requests.org'

def main(url=''):
    thread_list = []
    counter = 0
    while True:
        resp = req.get(url)
        interesting = 1  # placeholder for the real check on resp
        if interesting:
            instructions = {}
            a = Thread(target=send_instructions,
                       kwargs=dict(url=url, instructions=instructions))
            a.start()
            thread_list.append(a)
        tmp = []
        for x in thread_list:
            if x.is_alive():
                tmp.append(x)
        thread_list = tmp
        if counter > 10:
            break
        counter += 1

def send_instructions(url='', instructions=''):
    resp = req.post(url, headers=instructions)
    print(resp)

main(url)
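One pattern that is often suggested for the first question (a sketch with illustrative names, not the original poster's solution) is to give each worker thread its own Session via threading.local and to bound the number of workers with a ThreadPoolExecutor, so persistent connections are reused but never shared between threads:

# Sketch: one persistent Session per worker thread, plus a bounded pool of
# workers so threads are reused instead of spawned per request.
import threading
from concurrent.futures import ThreadPoolExecutor
import requests

thread_local = threading.local()

def get_session():
    # Each worker thread lazily creates and keeps its own Session,
    # so persistent connections are never shared across threads.
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()
    return thread_local.session

def post_instructions(url, instructions):
    resp = get_session().post(url, headers=instructions, timeout=10)
    return resp.status_code

executor = ThreadPoolExecutor(max_workers=10)

def main(url):
    session = requests.Session()          # session used by the polling loop
    for _ in range(10):
        resp = session.get(url, timeout=10)
        if True:                          # "interesting" condition goes here
            executor.submit(post_instructions, url, {})
    executor.shutdown(wait=True)          # joins the workers cleanly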
Related
Disclaimer: This is similar to some other questions relating to this error, but my program is not using any multi-threading/processing and I'm working with the requests module instead of raw socket commands, so none of the solutions I saw apply to my issue.
I have a basic status-checking program running Python 3.4 on Windows that uses a GET request to pull some data off a status site hosted by a number of servers I have to keep watch over. The core code is setup like this:
import requests
import time

URL_LIST = [some, list, of, the, status, sites]  # https:// sites
session = requests.Session()
previous_data = ""

while 1:
    data = ""
    for url in URL_LIST:
        headers = {'X-Auth-Token': Associated_Auth_Token}
        try:
            status = session.get(url, headers=headers).json()['status']
        except ConnectionError:
            status = "SERVER DOWN"
        data += "%s \t%s\n" % (url, status)
    if data != previous_data:
        print(data)
        previous_data = data
    time.sleep(15)
...which typically runs just fine for hours (this script is intended to run 24/7 and has additional logging built in that I left out here for simplicity and relevance), but eventually it crashes and throws the error mentioned in the title:
[WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
The servers I'm requesting from are notoriously slow at times (and sometimes go down entirely, hence the try/except), so my inclination would be that, after looping over and over, eventually a request has not fully finished before the next request comes through and Windows tries to step on itself; but I don't see how that could happen with my code, since it iterates through the URLs serially.
Also, if this is a TIME_WAIT issue as some other related posts ran into, I'd rather not have to wait for that to finish, since I'd like to update every 15 seconds or better. So then I considered closing and opening a new requests session every so often, since the script typically works fine for hours before hitting a snag, but based on Lukasa's comment here:
To avoid getting sockets in TIME_WAIT, the best thing to do is to use a single Session object at as high a scope as you can and leave it open for the lifetime of your program. Requests will do its best to re-use the sockets as much as possible, which should prevent them lapsing into TIME_WAIT
...it sounds like that is not a good idea, though when he says 'lifetime of your program' he may not intend the statement to include 24/7 use as in my case.
So instead of blindly trying things and waiting some number of hours for the program to crash again so I can see if the error changes, I wanted to consult the wealth of knowledge here first to see if anyone can see what's going wrong and knows how I should fix it.
Thanks!
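A sketch of the "single long-lived Session" advice combined with urllib3's Retry (the retry settings and timeout here are illustrative assumptions, not from the original post):

# Sketch: one Session for the lifetime of the script, with a bounded
# connection pool and automatic retries, so sockets are reused instead
# of being reopened on every loop iteration.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1, status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=retries)
session.mount("http://", adapter)
session.mount("https://", adapter)

def check(url, token):
    # A read timeout keeps one slow server from stalling the whole sweep.
    resp = session.get(url, headers={"X-Auth-Token": token}, timeout=10)
    return resp.json().get("status", "UNKNOWN")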
I have a script that ultimately executes two functions. It polls for data on a time interval (it runs as a daemon, and the data is retrieved from a shell command run on the local system) and, once it receives this data, will: 1) first write the data to a log file, and 2) inspect the data and send an email IF the data meets certain criteria.
The logging happens every time, but the alert may not. The issue is that, in cases where an alert needs to be sent, if the email connection stalls or takes a long time to connect to the server, it obviously causes the next poll of the data to stall (for an indeterminate amount of time, depending on the server), and in my case it is very important that the polling interval remains consistent (for analytics purposes).
What is the most efficient way, if any, to keep the email process working independently of the logging process while still operating within the same application and depending on the same data? I was considering creating a separate thread for the mailer, but that kind of seems like overkill in this case.
I'd rather not set a short timeout on the email connection, because I want to give the process some chance to connect to the server, while still allowing the logging to be written consistently on the given interval. Some code:
def send(self, msg_):
    """
    Send the alert message
    :param str msg_: the message to send
    """
    self.msg_ = msg_
    ar = alert.Alert()
    ar.send_message(msg_)

def monitor(self):
    """
    Post to the log file and
    send the alert message when
    applicable
    """
    read = r.SensorReading()
    msg_ = read.get_message()  # the data
    if msg_:  # if there is data in general...
        x = read.get_failed()  # store bad data
        msg_ += self.write_avg(read)
        msg_ += "==============================================="
        self.ctlog.update_templog(msg_)  # write general data to log
        if x:
            self.send(x)  # if bad data, send...
This is exactly the kind of case you want to use threading/subprocesses for. Fork off a thread for the email, which times out after a while, and keep your daemon running normally.
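A minimal sketch of that suggestion, reusing the names from the question (the daemon-thread wrapper is an assumption, not the poster's code):

# Sketch: same monitor() as in the question, but the (possibly slow) email
# is handed off to a short-lived daemon thread so the next poll and the
# logging stay on schedule.
import threading

def monitor(self):
    read = r.SensorReading()
    msg_ = read.get_message()              # the data
    if msg_:
        x = read.get_failed()              # store bad data
        msg_ += self.write_avg(read)
        self.ctlog.update_templog(msg_)    # logging stays on the main thread
        if x:
            # fire-and-forget: a stalled SMTP connection no longer blocks polling
            threading.Thread(target=self.send, args=(x,), daemon=True).start()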
Possible approaches that come to mind:
Multiprocessing
Multithreading
Parallel Python
My personal choice would be multiprocessing, as you clearly mentioned independent processes; you wouldn't want a crashing thread to interrupt the other function.
You may also refer to this before making your design choice: Multiprocessing vs Threading Python
Thanks everyone for the responses. They helped very much. I went with threading, but also updated the code to make sure it handled failing threads. I ran some regressions and found that the subsequent processes were no longer being interrupted by stalled connections and the log was being updated on a consistent schedule. Thanks again!!
import requests

while True:
    try:
        posting = requests.post(url, json=data, headers=headers, timeout=3.05)
    # If a connection error occurs, start from the beginning of the loop
    except requests.exceptions.ConnectionError as e:
        continue
    # If a read timeout occurs, start from the beginning of the loop
    except requests.exceptions.ReadTimeout as e:
        continue
a link to more code : Multiple accidental POST requests in Python
This code uses the requests library to perform POST requests indefinitely. I noticed that when the try block fails several times and the while loop restarts over and over, then, once I can finally send the POST request, I find multiple entries on the server side within the same second. I was writing to a txt file at the same time and it showed one entry only. Each entry is 5 readings. Is this an issue with the library itself? Is there a way to fix this?! No matter what conditions I put in, it still doesn't work :/ !
You can notice that the reading at 12:11:13 has 6 parameters per second, while at 12:14:30 (after the delay; it should be every 10 seconds) there are several entries within the same second!!! 3 entries that make up 18 readings in one second, instead of only 6!
It looks like the server receives your requests and acts upon them but fails to respond in time (3 s is a pretty low timeout; a load spike or paging operation can easily make the server miss it unless it employs special measures). I'd suggest you:
process requests asynchronously (e.g. spawn threads; Asynchronous Requests with Python requests discusses ways to do this with requests) and do not use timeouts (TCP has its own timeouts; let it fail instead).
reuse the connection(s) (TCP has quite a bit of overhead for establishing and tearing down connections) or use UDP instead.
include some "hints" (IDs, timestamps, etc.) so the server can avoid adding duplicate records. (I'd call this one a workaround, as the real problem is that you're not making sure your request was processed.)
From the server side, you may want to:
Respond ASAP and act upon the info later. Do not let a pending action prevent you from answering further requests.
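On the client side, a minimal sketch of the "hints" idea (the X-Request-ID header name is an assumption; the server would need to deduplicate on it):

# Sketch: tag every POST with a client-generated ID so requests that were
# received (but whose response was lost) can be deduplicated server-side.
# No read timeout: let TCP decide when the connection is really dead.
import uuid
import requests

session = requests.Session()              # reuse one connection

def post_reading(url, data):
    headers = {"X-Request-ID": str(uuid.uuid4())}
    try:
        return session.post(url, json=data, headers=headers)
    except requests.exceptions.ConnectionError:
        return None                       # caller decides whether to retry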
I have a bunch of servers with multiple instances accessing a resource that has a hard limit on requests per second.
I need a mechanism to lock the access on this resource for all servers and instances that are running.
There is a restful distributed lock manager I found on github: https://github.com/thefab/restful-distributed-lock-manager
Unfortunately there seems to be a minimum lock time of 1 second, and it's relatively unreliable: in several tests it took between 1 and 3 seconds to unlock a 1-second lock.
Is there something well tested with a python interface I can use for this purpose?
Edit: I need something that auto unlocks in under 1 second. The lock will never be released in my code.
My first idea was to use Redis. But there are other great tools, and some are even lighter, so my solution builds on zmq. For this reason you do not have to run Redis; it is enough to run a small Python script.
Requirements Review
Let me review your requirements before describing the solution.
limit the number of requests to some resource to a fixed number of requests within a fixed period of time
auto unlocking
the resource shall (auto) unlock in less than 1 second
it shall be distributed. I will assume you mean that multiple distributed servers consuming some resource shall be served, and that it is fine to have just one locker service (more on this in the Conclusions)
Concept
Limit the number of requests within a timeslot
A timeslot can be a second, several seconds, or a shorter period. The only limitation is the precision of time measurement in Python.
If your resource has a hard limit defined per second, you shall use a timeslot of 1.0 seconds.
Monitor the number of requests per timeslot until the next one starts
With the first request for access to your resource, set up the start time of the next timeslot and initialize the request counter.
With each request, increase the request counter (for the current timeslot) and allow the request unless you have reached the maximum number of allowed requests in the current timeslot.
Serve using zmq with REQ/REP
Your consuming servers could be spread across multiple computers. To provide access to the LockerServer, you will use zmq.
Sample code
zmqlocker.py:
import time
import zmq

class Locker():

    def __init__(self, max_requests=1, in_seconds=1.0):
        self.max_requests = max_requests
        self.in_seconds = in_seconds
        self.requests = 0
        now = time.time()
        self.next_slot = now + in_seconds

    def __iter__(self):
        return self

    def next(self):
        now = time.time()
        if now > self.next_slot:
            self.requests = 0
            self.next_slot = now + self.in_seconds
        if self.requests < self.max_requests:
            self.requests += 1
            return "go"
        else:
            return "sorry"

class LockerServer():

    def __init__(self, max_requests=1, in_seconds=1.0, url="tcp://*:7777"):
        locker = Locker(max_requests, in_seconds)
        cnt = zmq.Context()
        sck = cnt.socket(zmq.REP)
        sck.bind(url)
        while True:
            msg = sck.recv()
            sck.send(locker.next())

class LockerClient():

    def __init__(self, url="tcp://localhost:7777"):
        cnt = zmq.Context()
        self.sck = cnt.socket(zmq.REQ)
        self.sck.connect(url)

    def next(self):
        self.sck.send("let me go")
        return self.sck.recv()
Run your server:
run_server.py:
from zmqlocker import LockerServer
svr = LockerServer(max_requests=5, in_seconds=0.8)
From command line:
$ python run_server.py
This will start serving locker service on default port 7777 on localhost.
Run your clients
run_client.py:
from zmqlocker import LockerClient
import time
locker_cli = LockerClient()
for i in xrange(100):
    print time.time(), locker_cli.next()
    time.sleep(0.1)
From command line:
$ python run_client.py
You shall see "go", "go", "sorry"... responses printed.
Try running more clients.
A bit of stress testing
You may start the clients first and the server later on; the clients will block until the server is up and then run happily.
Conclusions
the described requirements are fulfilled
the number of requests is limited
there is no need to unlock; more requests are allowed as soon as the next timeslot becomes available
the LockerService is available over the network or via local sockets
it shall be reliable; zmq is a mature solution and the Python code is rather simple
it does not require time synchronization across all participants
performance will be very good
On the other hand, you may find that the limits of your resource are not as predictable as you assume, so be prepared to play with the parameters to find a proper balance, and always be prepared for exceptions on this side.
There is also some room for optimizing how the "locks" are handed out: e.g. if the locker runs out of allowed requests but the current timeslot is already almost over, you might consider holding back the "sorry" and, after a fraction of a second, providing a "go".
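A possible sketch of that optimization (an addition, not part of the original answer), written as a subclass of the Locker above:

# Sketch: when the current slot is exhausted but nearly over, wait for the
# next slot instead of answering "sorry" immediately.
import time
from zmqlocker import Locker

class PatientLocker(Locker):

    def __init__(self, max_requests=1, in_seconds=1.0, patience=0.2):
        Locker.__init__(self, max_requests, in_seconds)
        self.patience = patience              # how close to the slot end we still wait

    def next(self):
        answer = Locker.next(self)
        if answer == "sorry":
            remaining = self.next_slot - time.time()
            if 0 < remaining <= self.patience:
                time.sleep(remaining + 0.001)  # let the next slot start
                answer = Locker.next(self)     # now counted against the new slot
        return answer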
Extending it to real distributed lock manager
By "distributed" we might also understand multiple locker servers running together. This is more difficult to do, but is also possible. zmq allows very easy connection to multiple urls, so clients could really easily connect to multiple locker servers. There is a question, how to coordinate locker servers not to allow too many request to your resource. zmq allows inter-server communication. One model could be, that each locker server would publish each provided "go" on PUB/SUB. All other locker servers would be subscribed, and used each "go" to increase their local request counter (with a bit modified logic).
The lowest effort way to implement this is to use lockable.
It offers low-level lock semantics and comes with a Python client. Importantly, you don't need to set up any database or server; it works by storing the lock on the lockable servers.
Locks have variable TTLs, but you can also release them early:
$ pip install lockable-dev
from lockable import Lock
my_lock = Lock('my-lock-name')
# acquire the lock
my_lock.acquire()
# release the lock
my_lock.release()
For my cluster I'm using ZooKeeper with python-kazoo library for queues and locks.
Modified example from kazoo api documentation for your purpose:
http://kazoo.readthedocs.org/en/latest/api/recipe/lock.html
from kazoo.client import KazooClient

zk = KazooClient()
lock = zk.Lock("/lockpath", "my-identifier")
if lock.acquire(timeout=1):
    # code here
    lock.release()
But you need at least three nodes for ZooKeeper as I remember.
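Slightly fleshed out (the ensemble address is a placeholder; kazoo's Lock can also be used as a context manager):

# Sketch: connect to ZooKeeper, then hold the lock only for the critical section.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")    # placeholder ensemble address
zk.start()
lock = zk.Lock("/lockpath", "my-identifier")
with lock:                                  # blocks until acquired, auto-releases
    pass  # access the rate-limited resource here
zk.stop()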
Your requirements seem very specific. I'd consider writing a simple lock server, then implementing the locks client-side with a class that acquires a lock when it is created and deletes the lock when it goes out of scope.
class Lock(object):

    def __init__(self, resource):
        print "Lock acquired for", resource
        # Connect to lock server and acquire resource

    def __del__(self):
        print "Lock released"
        # Connect to lock server and unlock resource if locked

def callWithLock(resource, call, *args, **kwargs):
    lock = Lock(resource)
    return call(*args, **kwargs)

def test(asdf, something="Else"):
    return asdf + " " + something

if __name__ == "__main__":
    import sys
    print "Calling test:", callWithLock("resource.test", test, sys.argv[0])
Sample output
$ python locktest.py
Calling test: Lock acquired for resource.test
Lock released
locktest.py Else
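Relying on __del__ for the release is fragile, since it may run late or not at all; a context-manager variant of the same idea (an addition, not in the original answer) makes the release explicit:

# Sketch: the same lock-server idea, but released deterministically by "with".
class ScopedLock(object):

    def __init__(self, resource):
        self.resource = resource

    def __enter__(self):
        print("Lock acquired for %s" % self.resource)
        # Connect to lock server and acquire self.resource here
        return self

    def __exit__(self, exc_type, exc_value, tb):
        print("Lock released")
        # Connect to lock server and release self.resource here
        return False                     # do not swallow exceptions

def call_with_lock(resource, call, *args, **kwargs):
    with ScopedLock(resource):
        return call(*args, **kwargs)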
The distributed lock manager Taooka (http://taooka.com) has TTL accuracy down to nanoseconds, but it only has a Golang client library.
I have a requirement where I need to hit up to 2000 URLs per minute and save the response to a database. The URLS need to be hit within 5 seconds of the start of every minute (but the response can wait). Then, at the next minute, the same will happen and so on. So, it's time critical.
I've tried using Python multiprocessing and threading to solve the problem. However, some URLs may take up to 30 minutes to respond, which blocks all other URLs from being processed.
I'm also open to using something lower level such as C, but don't know where to start.
Any guidance in the right direction will help, thanks.
You need something lighter than a thread, since if each URL can block for a long time then you'll need to send them all simultaneously instead of via a thread pool.
gevent is a Python wrapper around the libev event loop that's good at this sort of thing. From its docs:
>>> import gevent
>>> from gevent import socket
>>> urls = ['www.google.com', 'www.example.com', 'www.python.org']
>>> jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
>>> gevent.joinall(jobs, timeout=2)
>>> [job.value for job in jobs]
['74.125.79.106', '208.77.188.166', '82.94.164.162']
I am not sure I have understood the problem correctly, but if you are using 'n' processes and all 'n' of them get stuck waiting for a response, then changing the language will not solve your issue, since the bottleneck is the server you are requesting from, not your local driver code. You can eliminate this dependency by switching to an asynchronous mechanism: do not wait for the response, let a callback handle it for you!
EDIT: You might want to have a look at https://github.com/kennethreitz/grequests
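A sketch of the grequests approach (the URL list, timeout and concurrency size are placeholders):

# Sketch: fire all URLs for the minute concurrently on gevent greenlets,
# then collect whatever has responded; slow URLs do not block the rest.
import grequests

urls = ["http://example.com/a", "http://example.com/b"]   # placeholder list

def fetch_all(urls, concurrency=200):
    pending = (grequests.get(u, timeout=60) for u in urls)
    # map() returns responses in request order; requests that failed
    # typically come back as None when no exception_handler is given.
    return grequests.map(pending, size=concurrency)

responses = fetch_all(urls)
for url, resp in zip(urls, responses):
    status = resp.status_code if resp is not None else "no response"
    print(url, status)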