I have been working on a small PoC to improve my knowledge of threading, but unfortunately I got stuck, and here I am.
import time

found_products = []
site_catalog = [
    "https://www.graffitishop.net/Sneakers",
    "https://www.graffitishop.net/T-shirts",
    "https://www.graffitishop.net/Sweatshirts",
    "https://www.graffitishop.net/Shirts"
]

def threading_feeds():
    # Create own thread for each URL as we want to run concurrently
    for get_links in site_catalog:
        monitor_feed(link=get_links)

def monitor_feed(link: str) -> None:
    old_payload = product_data(...)
    while True:
        new_payload = product_data(...)
        if old_payload != new_payload:
            for found_link in new_payload:
                if found_link not in found_products:
                    logger.info(f'Detected new link -> {found_link} | From -> {link}')
                    # Execute new thread as we don't want this function to block waiting for filtering() to be done before continuing
                    filtering(link=found_link)
        else:
            logger.info("Nothing new")
            time.sleep(60)
            continue

def filtering(found_link):
    ...
1 - I'm currently trying to build a monitor where I have multiple links to start with. My plan is that each URL should run concurrently instead of having to wait for them one by one:
def threading_feeds():
    # Create own thread for each URL as we want to run concurrently
    for get_links in site_catalog:
        monitor_feed(link=get_links)
How can I do that?
2 - If we find a new product at the given URL inside monitor_feed, how can I start a new thread to execute the call filtering(link=found_link)? I don't want to wait for it to finish before looping back to the while True; instead it should run filtering(link=found_link) in the background while monitor_feed keeps executing.
You can use ThreadPoolExecutor:

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(monitor_feed, site_catalog)
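That handles question 1: executor.map starts monitor_feed for each URL on its own worker thread. For question 2, here is a minimal sketch of one way to keep filtering off the monitoring loop, using a second shared pool; the pool size and the stub bodies are my assumptions, not from your code:

import concurrent.futures
import time

# Shared pool for background filtering jobs; max_workers=4 is an arbitrary choice
filter_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def filtering(link):
    time.sleep(2)  # stand-in for the real filtering work
    print(f'done filtering {link}')

def monitor_feed(link):
    for _ in range(3):  # stands in for your while True polling loop
        # submit() returns a Future immediately, so the loop keeps polling
        # while filtering(link) runs on one of the pool's threads
        filter_pool.submit(filtering, link)
        time.sleep(1)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(monitor_feed, ["url-1", "url-2"])  # hypothetical URLs

filter_pool.shutdown(wait=True)  # let background filtering finish before exiting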
Background: I'm trying to run hundreds of Dymola simulations with the python-dymola interface. I managed to run them in a for-loop. Now I want to run them multi-threaded so I can run multiple models in parallel (which will be much faster). Since probably nobody uses the interface, I wrote some simple code that also shows my problem:
1: Turn a for-loop into a function that is then run inside another for-loop, where both the function and the for-loop share the same variable i.
2: Turn a for-loop into a function and use multi-threading to execute it. A for-loop runs the commands one by one; I want to run them in parallel with a maximum of x threads at the same time. The result should be the same as when executing the for-loop.
Example code:
import os

nSim = 100
ndig = '{:01d}'

for i in range(nSim):
    os.makedirs(str(ndig.format(i)))
Note that the names of the created directories are just the numbers from the for-loop (this is important). Now, instead of using the for-loop, I would love to create the directories with multi-threading (note: probably not interesting for this short code, but when calling and executing hundreds of simulation models it definitely is interesting to use multi-threading).
So I started with something simple, I thought: turning the for-loop into a function that is then run inside another for-loop, hoping to get the same result as with the for-loop code above. But I got this error:
AttributeError: 'NoneType' object has no attribute 'start'
(Note: I just started with this because I had not used the def statement before and the threading package is also new to me. After this I would evolve towards multi-threading.)
1:
import os

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

for i in range(nSim):
    simulation(i=i).start
After that failed, I tried to move on to multi-threading (converting the for-loop into something that does the same but multi-threaded, running the code in parallel instead of one by one, with a maximum number of threads):
2:
import os
import threading

nSim = 100
ndig = '{:01d}'

def simulation(i):
    os.makedirs(str(ndig.format(i)))

if __name__ == '__main__':
    i in range(nSim)
    simulation_thread[i] = threading.Thread(target=simulation(i=i))
    simulation_thread[i].daemon = True
    simulation_thread[i].start()
Unfortunately that attempt failed as well and now I got the error:
NameError: name 'i' is not defined
Does anybody have suggestions for issues 1 or 2?
Both examples are incomplete. Here's a complete example. Note that target gets passed the name of the function, target=simulation, and a tuple of its arguments, args=(i,). Don't call the function (target=simulation(i=i)), because that just passes the result of the function, which is equivalent to target=None in this case.
import threading

nSim = 100

def simulation(i):
    print(f'{threading.current_thread().name}: {i}')

if __name__ == '__main__':
    threads = [threading.Thread(target=simulation, args=(i,)) for i in range(nSim)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
Output:
Thread-1: 0
Thread-2: 1
Thread-3: 2
.
.
Thread-98: 97
Thread-99: 98
Thread-100: 99
Note you usually don't want more threads than CPUs, which you can get from multiprocessing.cpu_count(). You can create a thread pool and use queue.Queue to post work that the threads execute. An example is in the Python queue documentation.
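For completeness, a minimal sketch of sizing a pool by the CPU count (my own illustration; a queue-based pool appears in the next answer):

import multiprocessing
import concurrent.futures

def simulation(i):
    return i * i  # stand-in for the real simulation work

# Cap the number of worker threads at the number of CPUs
with concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as pool:
    results = list(pool.map(simulation, range(100)))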
You cannot call .start like this
simulation(i=i).start
on a non-threading object (here simulation returns None). Also, you have to import the threading module as well.
It seems like you forgot to add 'for' and indent the code in your loop
i in range(nSim)
simulation_thread[i] = threading.Thread(target=simulation(i=i))
simulation_thread[i].daemon = True
simulation_thread[i].start()
to
for i in range(nSim):
    simulation_thread[i] = threading.Thread(target=simulation(i=i))
    simulation_thread[i].daemon = True
    simulation_thread[i].start()
If you would like to have a maximum number of threads in the pool and to run all items in the queue, we can continue @mark-tolonen's answer and do it like this:
import threading
import queue
import time

def main():
    size_of_threads_pool = 10
    num_of_tasks = 30
    task_seconds = 1

    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            print(my_st)
            print(f'{threading.current_thread().name}: Working on {item}')
            time.sleep(task_seconds)
            print(f'Finished {item}')
            q.task_done()

    my_st = "MY string"
    threads = [threading.Thread(target=worker, daemon=True) for i in range(size_of_threads_pool)]
    for t in threads:
        t.start()

    # send the task requests to the workers
    for item in range(num_of_tasks):
        q.put(item)

    # block until all tasks are done
    q.join()
    print('All work completed')

    # No need to join the threads: they loop on while True and never stop
    # for t in threads:
    #     t.join()

if __name__ == '__main__':
    main()
This will run 30 tasks of 1 second each, using 10 threads, so the total time is about 3 seconds:
$ time python3 q_test.py
...
All work completed
real 0m3.064s
user 0m0.033s
sys 0m0.016s
EDIT: I found another, higher-level interface for asynchronously executing callables.
Use concurrent.futures; see the example in the docs:
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
Note the max_workers=5, which sets the maximum number of threads, and note the for url in URLS loop that you can use.
from arcgis.gis import GIS
from IPython.display import display

gis = GIS("portal url", 'username', 'password')

# search for the feature layer named Ports along west coast
search_result = gis.content.search('title:Ports along west coast')

# access the item's feature layers
ports_item = search_result[0]
ports_layers = ports_item.layers

# query all the features and display it on a map
ports_fset = ports_layers[0].query()  # an empty query string will return all
ports_flayer = ports_layers[0]
ports_features = ports_fset.features

# select San Francisco feature
sfo_feature = [f for f in ports_features if f.attributes['port_name'] == 'SAN FRANCISCO'][0]
sfo_feature.attributes

try:
    update_result = ports_flayer.edit_features(updates=[sfo_edit])
except:
    pass
This is the example I have shown, in which I am trying to update a feature layer. Actually I am updating the records in a loop, so there are many records. The problem is that if the internet connection suddenly goes down, the script gets stuck in the edit_features function.
So there is no way it could go to except and continue the flow.
I just have to press Ctrl+C to stop the script execution, because it hangs in the edit_features() function. What can I do?
If I were in your situation, I would search the arcgis API docs for a way to set a connection timeout. If you can't find any, I suggest two options:
use the threading module to run the update function in a separate thread; it is not the most efficient way, but in case it gets stuck you can continue running your remaining code;
use the python requests library to check that some website is reachable and verify the response code before making the update (a sketch of this follows the threading code below).
The code for the threading approach will look like this:
from threading import Thread
from time import sleep

def update():
    global update_result
    update_result = ports_flayer.edit_features(updates=[sfo_edit])

try:
    update_result = None
    t1 = Thread(target=update)
    t1.daemon = True  # mark our thread as a daemon
    t1.start()
    sleep(10)  # wait 10 seconds then time out, adjust according to your needs
    if update_result is None:
        print('failed to update')
except Exception as e:
    print(e)
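For the second suggestion, here is a minimal sketch of a pre-flight connectivity check with requests; the probe URL and timeout are arbitrary assumptions, not anything from the arcgis API:

import requests

def connection_is_up(probe_url='https://www.google.com', timeout=5):
    # Return True if a known site answers, False on any request error
    try:
        return requests.get(probe_url, timeout=timeout).status_code == 200
    except requests.RequestException:
        return False

# Only attempt the update when the connection looks healthy
if connection_is_up():
    update_result = ports_flayer.edit_features(updates=[sfo_edit])
else:
    print('connection down, skipping update for now')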
I have the below code to download a file inside a loop:
import wget

try:
    wget.download(url)
except:
    pass
But if the internet goes down, it doesn't return! So my whole loop gets stuck.
I want to repeat the same download if the internet goes down, so I want to know whether any error happened.
How can I mitigate this?
One simple solution is to move your download code into a separate thread so that it can be interrupted.
You can use the Python Thread and Timer modules to achieve it.
from threading import Thread, Timer, Event
from functools import partial
import urllib.request

def check_connectivity(stop_event):
    try:
        urllib.request.urlopen("http://google.com", timeout=2)
    except Exception:
        # Thread._Thread__stop() no longer exists in Python 3, so signal
        # the download thread to stop via an Event instead
        stop_event.set()

class Download(Thread):

    def __init__(self):
        super().__init__()
        self.stop_event = Event()

    def run(self):
        print("Trying to download file....")
        con = partial(check_connectivity, self.stop_event)
        while not self.stop_event.is_set():
            t = Timer(5, con)  # checks the connectivity every 5 seconds or less
            t.start()
            # your download code....

def main():
    down = Download()
    down.start()
    down.join()
You can move your main download loop inside the thread's run method and start a timer inside it that checks the network connectivity.
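As a variant of the same idea for the wget case (my own sketch, not part of the answer above): run each wget.download call in a daemon thread and join with a timeout, retrying when the attempt appears hung. The 30-second timeout is an arbitrary assumption.

import threading
import wget

def download_with_timeout(url, timeout=30):
    # Retry until one attempt finishes within the timeout
    while True:
        t = threading.Thread(target=wget.download, args=(url,), daemon=True)
        t.start()
        t.join(timeout)  # returns after the timeout even if the thread is stuck
        if not t.is_alive():
            return  # this attempt finished
        # the stuck daemon thread is abandoned; it will not block interpreter exit
        print('download of %s seems stuck, retrying...' % url)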
I have a Python program that I have written. This program calls a function within a module I have also written and passes it some data.
program:
def Response(Response):
    Resp = Response

def main():
    myModule.process_this("hello")  # Send string to myModule's process_this function
    # Should wait around here for Resp to contain the response
    print Resp
That function processes it and passes it back as a response to the Response function in the main program.
myModule:
def process_this(data):
    # process data
    program.Response(data)
I checked and all the data is being passed correctly. I have left out all the imports and the data processing to keep this question as concise as possible.
I need to find some way of having Python wait for Resp to actually contain the response before proceeding with the program. I've been looking at threading and using semaphores or the Queue module, but I'm not 100% sure how I would incorporate either into my program.
Here's a working solution with queues and the threading module. Note: if your tasks are CPU bound rather than IO bound, you should use multiprocessing instead (see the sketch after the code below).
import threading
import Queue

def worker(in_q, out_q):
    """ threadsafe worker """
    abort = False
    while not abort:
        try:
            # make sure we don't wait forever
            task = in_q.get(True, .5)
        except Queue.Empty:
            abort = True
        else:
            # process task
            response = task
            # return result
            out_q.put(response)
            in_q.task_done()

# one queue to pass tasks, one to get results
task_q = Queue.Queue()
result_q = Queue.Queue()

# start threads
t = threading.Thread(target=worker, args=(task_q, result_q))
t.start()

# submit some work
task_q.put("hello")

# wait for results
task_q.join()
print "result", result_q.get()
I've written a script that fetches URLs from a file and sends HTTP requests to all the URLs concurrently. I now want to limit the number of HTTP requests per second and the bandwidth per interface (eth0, eth1, etc.) in a session. Is there any way to achieve this in Python?
You could use a Semaphore object, which is part of the standard Python threading library.
Or, if you want to work with threads directly, you could use wait([timeout]).
There is no library bundled with Python which can work at the level of the Ethernet or other network interfaces; the lowest you can go is socket.
Based on your reply, here's my suggestion. Notice the active_count. Use it only to test that your script runs only two threads. Well, in this case there will be three, because number one is your script and then you have two URL requests.
import time
import requests
import threading

# Limit the number of threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    # Request passed URL.
    r = requests.get(u)
    print r.status_code
    # Release lock for other threads.
    pool.release()
    # Show the number of active threads.
    print threading.active_count()

def req():
    # Get URLs from a text file, remove white space.
    urls = [url.strip() for url in open('urllist.txt')]
    for u in urls:
        # Thread pool.
        # Blocks other threads (more than the set limit).
        pool.acquire(blocking=True)
        # Create a new thread.
        # Pass each URL (i.e. u parameter) to the worker function.
        t = threading.Thread(target=worker, args=(u, ))
        # Start the newly created thread.
        t.start()

req()
You could use a worker concept like the one described in the documentation:
https://docs.python.org/3.4/library/queue.html
Add a wait (e.g. time.sleep) inside your workers so they pause between requests (in the example from the documentation: inside the while True, after task_done).
Example: 5 worker threads with a waiting time of 1 second between requests will do fewer than 5 fetches per second. A minimal sketch of this pattern follows.
Note: the solution below still sends the requests serially, but it limits the TPS (transactions per second).
TL;DR: there is a class which keeps a count of the number of calls that can still be made in the current second. It is decremented for every call that is made and refilled every second.
import time
from multiprocessing import Process, Value

# Naive TPS regulation
# This class holds a bucket of tokens which are refilled every second based on the expected TPS
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second)  # process to constantly refill the TPS bucket

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
        self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        self.bucket_refresh_process.kill()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True
        return response

def test():
    tps_bucket = TPSBucket(expected_tps=1)  # let's say I want to send 1 request per second
    tps_bucket.start()

    total_number_of_requests = 60  # let's say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1
            print('Request', request_number)  # this is my request
            if request_number == total_number_of_requests:
                break
    print(time.time() - t0, 'time elapsed')  # some metrics to tell me how long everything took
    tps_bucket.stop()

if __name__ == "__main__":
    test()