Limiting the number of HTTP requests per second in Python

I've written a script that fetches URLs from a file and sends HTTP requests to all the URLs concurrently. I now want to limit the number of HTTP requests per second and the bandwidth per interface (eth0, eth1, etc.) in a session. Is there any way to achieve this in Python?

You could use a Semaphore object, which is part of the standard Python library (see the threading module documentation).
Or, if you want to work with threads directly, you could use wait([timeout]).
There is no library bundled with Python that can work at the level of the Ethernet or other network interfaces; the lowest you can go is socket.
Based on your reply, here's my suggestion. Notice the active_count() call; use it only to verify that your script runs no more than two worker threads. In this case there will actually be three, because the first one is your main script and then you have two URL requests.
import time
import requests
import threading

# Limit the number of threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    # Request the passed URL.
    r = requests.get(u)
    print(r.status_code)
    # Release the lock for other threads.
    pool.release()
    # Show the number of active threads.
    print(threading.active_count())

def req():
    # Get URLs from a text file, remove white space.
    urls = [url.strip() for url in open('urllist.txt')]
    for u in urls:
        # Thread pool.
        # Blocks other threads (more than the set limit).
        pool.acquire(blocking=True)
        # Create a new thread.
        # Pass each URL (i.e. the u parameter) to the worker function.
        t = threading.Thread(target=worker, args=(u,))
        # Start the newly created thread.
        t.start()

req()

You could use the worker pattern described in the queue documentation:
https://docs.python.org/3.4/library/queue.html
Add a sleep/wait inside your workers so they pause between requests (in the documentation's example: inside the "while True" loop, after task_done()).
For example, 5 worker threads with a waiting time of 1 second between requests will do fewer than 5 fetches per second, as in the sketch below.

Note that the solution below still sends the requests serially, but it limits the TPS (transactions per second).
TL;DR:
There is a class that keeps a count of the number of calls that can still be made in the current second. It is decremented for every call that is made and refilled every second.
import time
from multiprocessing import Process, Value

# Naive TPS regulation.
# This class holds a bucket of tokens which is refilled every second based on the expected TPS.
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        # Process to constantly refill the TPS bucket.
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second)

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
        self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        self.bucket_refresh_process.kill()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True
        return response


def test():
    tps_bucket = TPSBucket(expected_tps=1)  # Let's say I want to send 1 request per second
    tps_bucket.start()
    total_number_of_requests = 60  # Let's say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1
            print('Request', request_number)  # This is my request
            if request_number == total_number_of_requests:
                break
    print(time.time() - t0, 'time elapsed')  # Some metrics to tell me how long everything took
    tps_bucket.stop()


if __name__ == "__main__":
    test()

Related

Python multiprocessing schema for dumping data through process 1 and loading a required amount in process 2

I'd like to draw your attention to this since I am new to multiprocessing.
Here is the problem: I have two processes that I have to run on different cores using the multiprocessing module.
The first process has to collect data from a sensor line by line and append it to memory (data structure: a Python list). It is a serial connection, so the data arrives one item at a time. I already have working code for this, but since I have to collect data for as long as the sensor is connected, it runs for an indefinite time.
The second process has to take the first 140 items from the data structure filled above and print them (and do some other task).
The pseudocode looks like this:
Buffer = []  # global list

def process1():
    Obj = port.open()
    a = read_data(Obj)
    Buffer.append(a)
    port.close()
    return Buffer

def process2(Buffer):
    print('hello from process2')
    if len(Buffer) >= 140:
        print(Buffer)
        # do some task
    else:
        print(Buffer)

def interprocesscommunication():
    import multiprocessing
    while True:
        p1 = multiprocessing.Process(target=process1)
        p2 = multiprocessing.Process(target=process2, args=(Buffer,))
        p1.run()
        p2.run()
Does anyone have a better schema for how to run both processes in parallel on different cores for an indefinite time?
Note: I am also in doubt about whether both processes can share memory; if not, how should they communicate?
Processes cannot share memory in this way.
You have to synchronize state in some way.
The most common approach is to use multiprocessing.Pipe().
Also, take a look at this example:
# example of using a duplex pipe between processes
from time import sleep
from random import random
from multiprocessing import Process
from multiprocessing import Pipe

# generate and send a value
def generate_send(connection, value):
    # generate value
    new_value = random()
    # block
    sleep(new_value)
    # update value
    value = value + new_value
    # report
    print(f'>sending {value}', flush=True)
    # send value
    connection.send(value)

# ping pong between processes
def pingpong(connection, send_first):
    print('Process Running', flush=True)
    # check if this process should seed the process
    if send_first:
        generate_send(connection, 0)
    # run until limit reached
    while True:
        # read a value
        value = connection.recv()
        # report
        print(f'>received {value}', flush=True)
        # send the value back
        generate_send(connection, value)
        # check for stop
        if value > 10:
            break
    print('Process Done', flush=True)

# entry point
if __name__ == '__main__':
    # create the pipe
    conn1, conn2 = Pipe(duplex=True)
    # create players
    player1 = Process(target=pingpong, args=(conn1, True))
    player2 = Process(target=pingpong, args=(conn2, False))
    # start players
    player1.start()
    player2.start()
    # wait for players to finish
    player1.join()
    player2.join()
Source: Multiprocessing Pipe in Python
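For the sensor/consumer shape of your problem, a queue may map more directly than a pipe. Below is a minimal sketch using multiprocessing.Queue; read_sensor_line is a hypothetical stand-in for your serial read, and 140 is the batch size from your question.
from multiprocessing import Process, Queue

def read_sensor_line():
    # hypothetical stand-in for reading one value from the serial port
    return 0.0

def producer(q):
    # runs indefinitely, pushing one reading at a time
    while True:
        q.put(read_sensor_line())

def consumer(q):
    buffer = []
    while True:
        buffer.append(q.get())
        if len(buffer) >= 140:
            print(buffer)  # do some task with each batch of 140 readings
            buffer = []

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=producer, args=(q,), daemon=True)
    p2 = Process(target=consumer, args=(q,), daemon=True)
    p1.start()
    p2.start()
    p2.join()  # in this sketch the processes run until interrupted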

Turn for-loop code into multi-threading code with max number of threads

Background: I'm trying to run hundreds of Dymola simulations with the python-dymola interface. I managed to run them in a for-loop. Now I want to use multi-threading so I can run multiple models in parallel (which will be much faster). Since probably nobody else uses that interface, I wrote some simple code that also shows my problem:
1: Turn a for-loop into a function that is then called inside another for-loop, where both the function and the for-loop share the same variable 'i'.
2: Turn a for-loop into a function and use multi-threading to execute it. A for-loop runs the commands one by one; I want to run them in parallel with a maximum of x threads at the same time. The result should be the same as when executing the for-loop.
Example-code:
import os

nSim = 100
ndig = '{:01d}'

for i in range(nSim):
    os.makedirs(str(ndig.format(i)))
Note that the names of the created directories are just the numbers from the for-loop (this is important). Now instead of using the for-loop, I would love to create the directories with multi-threading (note: probably not interesting for this short code, but when calling and executing hundreds of simulation models it definitely is interesting to use multi-threading).
So I started with something I thought would be simple: turning the for-loop into a function that is then called inside another for-loop. I hoped to get the same result as with the for-loop code above, but got this error:
AttributeError: 'NoneType' object has no attribute 'start'
(Note: I just started with this because I had not used the def statement before and the threading package is also new to me. After this I would evolve towards multi-threading.)
1:
import os

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

for i in range(nSim):
    simulation(i=i).start
After that failed, I tried to evolve towards multi-threading (converting the for-loop into something that does the same but runs the calls in parallel instead of one by one, with a maximum number of threads):
2:
import os
import threading

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

if __name__ == '__main__':
    i in range(nSim)
    simulation_thread[i] = threading.Thread(target=simulation(i=i))
    simulation_thread[i].daemon = True
    simulation_thread[i].start()
Unfortunately that attempt failed as well, and now I got the error:
NameError: name 'i' is not defined
Does anybody have suggestions for issue 1 or 2?
Both examples are incomplete. Here's a complete example. Note that target gets passed the name of the function target=simulation and a tuple of its arguments args=(i,). Don't call the function target=simulation(i=i) because that just passes the result of the function, which is equivalent to target=None in this case.
import threading

nSim = 100

def simulation(i):
    print(f'{threading.current_thread().name}: {i}')

if __name__ == '__main__':
    threads = [threading.Thread(target=simulation, args=(i,)) for i in range(nSim)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
Output:
Thread-1: 0
Thread-2: 1
Thread-3: 2
.
.
Thread-98: 97
Thread-99: 98
Thread-100: 99
Note you usually don't want more threads than CPUs, which you can get from multiprocessing.cpu_count(). You can create a thread pool and use queue.Queue to post work that the threads execute. An example is in the Python queue documentation.
You cannot call .start like this:
simulation(i=i).start
on a non-threading object. Also, you have to import the threading module.
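A minimal sketch of what that implies, wrapping the call in a Thread object before starting it (the simulation body here is just a stand-in for the one from the question):
import threading

def simulation(i):
    print(i)  # stand-in for the real simulation

for i in range(100):
    t = threading.Thread(target=simulation, args=(i,))
    t.start()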
It seems like you forgot to add 'for' and to indent the code in your loop:
i in range(nSim)
simulation_thread[i] = threading.Thread(target=simulation(i=i))
simulation_thread[i].daemon = True
simulation_thread[i].start()
to
for i in range(nSim):
    simulation_thread[i] = threading.Thread(target=simulation(i=i))
    simulation_thread[i].daemon = True
    simulation_thread[i].start()
If you would like to have a maximum number of threads in a pool, and to run all items in the queue, we can continue @mark-tolonen's answer and do it like this:
import threading
import queue
import time

def main():
    size_of_threads_pool = 10
    num_of_tasks = 30
    task_seconds = 1

    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            print(my_st)
            print(f'{threading.current_thread().name}: Working on {item}')
            time.sleep(task_seconds)
            print(f'Finished {item}')
            q.task_done()

    my_st = "MY string"
    threads = [threading.Thread(target=worker, daemon=True) for i in range(size_of_threads_pool)]
    for t in threads:
        t.start()

    # send the task requests to the workers
    for item in range(num_of_tasks):
        q.put(item)

    # block until all tasks are done
    q.join()
    print('All work completed')

    # No need for this, as the threads loop forever (while True), so they never stop:
    # for t in threads:
    #     t.join()

if __name__ == '__main__':
    main()
This will run 30 tasks of 1 second each, using 10 threads, so the total time is about 3 seconds.
$ time python3 q_test.py
...
All work completed
real 0m3.064s
user 0m0.033s
sys 0m0.016s
EDIT: I found another, higher-level interface for asynchronously executing callables.
Use concurrent.futures; see the example in the docs:
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
Note max_workers=5, which sets the maximum number of threads, and the comprehension over URLS that submits one task per URL.
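Applied to the directory-creation example from the question, a minimal sketch might look like this (the pool size of 5 is just an illustrative cap):
import concurrent.futures
import os

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

if __name__ == '__main__':
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # submit one task per index; the executor runs at most 5 at a time
        futures = [executor.submit(simulation, i) for i in range(nSim)]
        for future in concurrent.futures.as_completed(futures):
            future.result()  # re-raise any exception from the worker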

Have Python wait for a function to finish before proceeding with the program

I have a Python program that I have written. This program calls a function within a module I have also written and passes it some data.
program:
def Response(Response):
    Resp = Response

def main():
    myModule.process_this("hello")  # Send string to myModule's process_this function
    # Should wait around here for Resp to contain the response
    print Resp
That function processes it and passes it back as a response to function Response in the main program.
myModule:
def process_this(data):
    # process data
    program.Response(data)
I checked, and all the data is being passed correctly. I have left out all the imports and the data processing to keep this question as concise as possible.
I need to find some way of having Python wait for Resp to actually contain the response before proceeding with the program. I've been looking at threading and using semaphores or the Queue module, but I'm not 100% sure how I would incorporate either into my program.
Here's a working solution with queues and the threading module. Note: if your tasks are CPU-bound rather than IO-bound, you should use multiprocessing instead.
import threading
import Queue

def worker(in_q, out_q):
    """ threadsafe worker """
    abort = False
    while not abort:
        try:
            # make sure we don't wait forever
            task = in_q.get(True, .5)
        except Queue.Empty:
            abort = True
        else:
            # process task
            response = task
            # return result
            out_q.put(response)
            in_q.task_done()

# one queue to pass tasks, one to get results
task_q = Queue.Queue()
result_q = Queue.Queue()

# start threads
t = threading.Thread(target=worker, args=(task_q, result_q))
t.start()

# submit some work
task_q.put("hello")

# wait for results
task_q.join()
print "result", result_q.get()

Help adding threading for GUI progress

I have an FTP function that traces the progress of a running upload, but my understanding of threading is limited and I have been unable to implement a working solution. I'd like to add a GUI progress bar to my current application by using threading. Can someone show me a basic function using asynchronous threads that can be updated from another running thread?
import ftplib
import os
import sys

def ftpUploader():
    BLOCKSIZE = 57344  # size 56 kB
    ftp = ftplib.FTP()
    ftp.connect(host)
    ftp.login(login, passwd)
    ftp.voidcmd("TYPE I")
    f = open(zipname, 'rb')
    datasock, esize = ftp.ntransfercmd(
        'STOR %s' % os.path.basename(zipname))
    size = os.stat(zipname)[6]
    bytes_so_far = 0
    print 'started'
    while 1:
        buf = f.read(BLOCKSIZE)
        if not buf:
            break
        datasock.sendall(buf)
        bytes_so_far += len(buf)
        print "\rSent %d of %d bytes %.1f%%\r" % (
            bytes_so_far, size, 100 * bytes_so_far / size)
        sys.stdout.flush()
    datasock.close()
    f.close()
    ftp.voidresp()
    ftp.quit()
    print 'Complete...'
Here's a quick overview of threading, just in case :) I won't go into too much detail on the GUI stuff, other than to say that you should check out wxWidgets. Whenever you do something that takes a long time, like:
from time import sleep
for i in range(5):
    sleep(10)
You'll notice that to the user, the entire block of code seems to take 50 seconds. In those 50 seconds, your application can't do anything else, like update the interface, so it looks like it's frozen. To solve this problem, we use threading.
Usually there are two parts to this problem: the overall set of things you want to process, and the operation that takes a while that we'd like to chop up. In this case, the overall set is the for loop and the operation we want chopped up is the sleep(10) call.
Here's a quick template for the threading code, based on our previous example. You should be able to work your code into this example.
from threading import Thread
from time import sleep

# Threading.
# The amount of seconds to wait before checking for an unpause condition.
# Sleeping is necessary because if we don't, we'll block the os and make the
# program look like it's frozen.
PAUSE_SLEEP = 5

# The number of iterations we want.
TOTAL_ITERATIONS = 5

class myThread(Thread):
    '''
    A thread used to do some stuff.
    '''

    def __init__(self, gui, otherStuff):
        '''
        Constructor. We pass in a reference to the GUI object we want
        to update here, as well as any other variables we want this
        thread to be aware of.
        '''
        # Construct the parent instance.
        Thread.__init__(self)
        # Store the gui, so that we can update it later.
        self.gui = gui
        # Store any other variables we want this thread to have access to.
        self.myStuff = otherStuff
        # Tracks the paused and stopped states of the thread.
        self.isPaused = False
        self.isStopped = False

    def pause(self):
        '''
        Called to pause the thread.
        '''
        self.isPaused = True

    def unpause(self):
        '''
        Called to unpause the thread.
        '''
        self.isPaused = False

    def stop(self):
        '''
        Called to stop the thread.
        '''
        self.isStopped = True

    def run(self):
        '''
        The main thread code.
        '''
        # The current iteration.
        currentIteration = 0
        # Keep going if the job is active.
        while self.isStopped == False:
            try:
                # Check for a pause.
                if self.isPaused:
                    # Sleep to let the os schedule other tasks.
                    sleep(PAUSE_SLEEP)
                    # Continue with the loop.
                    continue
                # Check to see if we're still processing the set of
                # things we want to do.
                if currentIteration < TOTAL_ITERATIONS:
                    # Do the individual thing we want to do.
                    sleep(10)
                    # Update the count.
                    currentIteration += 1
                    # Update the gui.
                    self.gui.update(currentIteration, TOTAL_ITERATIONS)
                else:
                    # Stop the loop.
                    self.isStopped = True
            except Exception as exception:
                # If anything bad happens, report the error. It won't
                # get written to stderr.
                print exception
                # Stop the loop.
                self.isStopped = True
        # Tell the gui we're done.
        self.gui.stop()
To call this thread, all you have to do is:
aThread = myThread(myGui,myOtherStuff)
aThread.start()
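The example assumes a gui object with update() and stop() methods; here is a minimal, purely hypothetical stand-in so the sketch can be exercised without a real GUI toolkit:
class ConsoleGui(object):
    '''Hypothetical stand-in for a real progress-bar widget.'''

    def update(self, current, total):
        # A real GUI would update a progress bar here.
        print('%d/%d done' % (current, total))

    def stop(self):
        print('finished')

myGui = ConsoleGui()
aThread = myThread(myGui, None)
aThread.start()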

Execute blocking calls in parallel in Python

I need to make blocking XML-RPC calls from my Python script to several physical servers simultaneously and perform actions based on the response from each server independently.
To explain in detail, let us assume the following pseudocode:
while True:
    response = call_to_server1()  # blocking and takes a very long time
    if response == this:
        do that
I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish):
import threading

def run_me(func):
    while not stop_event.isSet():
        response = func()  # blocking and takes a very long time
        if response == this:
            do that

def call_to_server1():
    # code to call server 1...
    return magic_server1_call()

def call_to_server2():
    # code to call server 2...
    return magic_server2_call()

# used to stop your loop
stop_event = threading.Event()

t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()

t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()

# wait for threads to return
t.join()
t2.join()

# we are done....
You can use the multiprocessing module:
import multiprocessing

def call_to_server(ip, port):
    ....
    ....

process = []
for i in xrange(server_count):
    process.append(multiprocessing.Process(target=call_to_server, args=(ip, port)))
    process[i].start()

# waiting for processes to stop
for p in process:
    p.join()
You can use multiprocessing plus queues. With one single sub-process, this is the example:
import multiprocessing
import time

def processWorker(input, result):
    def remoteRequest(params):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put(remoteRequest(work))

input = multiprocessing.Queue()
result = multiprocessing.Queue()

p = multiprocessing.Process(target=processWorker, args=(input, result))
p.start()

requestlist = ['1', '2']
for req in requestlist:
    input.put(req)
for i in xrange(len(requestlist)):
    res = result.get(block=True)
    print 'retrieved ', res

input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list object to store all the sub-processes you start.
The multiprocessing queue is a process-safe object.
Then you may keep track of which request is being executed by each sub-process simply by storing the request associated with a work id (the work id can be a counter incremented when the queue gets filled with new work). Usage of multiprocessing.Queue is robust since you do not need to rely on stdout/stderr parsing and you also avoid the related limitations.
Then, you can also set a timeout on how long you want a get call to wait at most, e.g.:
import Queue
try:
    res = result.get(block=True, timeout=10)
except Queue.Empty:
    print 'timed out waiting for a result'
Use Twisted.
It has a lot of useful tools for working with the network, and it is also very good at working asynchronously; a minimal sketch is below.
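As a rough illustration only (not from the original answer), deferToThread can run a blocking call in Twisted's thread pool and hand the result to a callback; call_to_server1 here is a hypothetical stand-in for your XML-RPC call:
from twisted.internet import reactor
from twisted.internet.threads import deferToThread

def call_to_server1():
    # hypothetical blocking XML-RPC call; takes a long time
    return 'this'

def handle(response):
    if response == 'this':
        print('do that')

def main():
    d = deferToThread(call_to_server1)   # run the blocking call off the reactor thread
    d.addCallback(handle)
    d.addBoth(lambda _: reactor.stop())  # stop once the call has been handled

reactor.callWhenRunning(main)
reactor.run()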
