Why is urllib.request.urlopen blocking in this case?

In the following code

def sendPostRequest():
    request = urllib.request.Request(myURL, myBody, myHeaders)
    print("created POST request", request)
    response = urllib.request.urlopen(request)
    print("finished POST", response)

for i in range(5):
    t = threading.Thread(target=sendPostRequest)
    t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
    t.start()

the line print("finished POST", response) is never reached, while I can observe in the server logs that the request arrived successfully. The line print("created POST request", request) is reached, however.
Why is this the case?

The code makes the threads daemon threads.
According to the threading documentation:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left. The initial value is inherited from the creating thread. The
flag can be set through the daemon property or the daemon constructor
argument.
The program may therefore end before the response is returned from the server.
Instead of using daemon threads, use non-daemon threads, or explicitly wait for the started threads to finish using Thread.join():
threads = []
for i in range(5):
    t = threading.Thread(target=sendPostRequest)
    t.start()
    threads.append(t)

for t in threads:
    t.join()
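If you only need every request to finish before the program exits, a concurrent.futures.ThreadPoolExecutor gives the same behaviour with less bookkeeping. A minimal sketch, assuming sendPostRequest is defined as in the question:

from concurrent.futures import ThreadPoolExecutor

# The with-block does not exit until every submitted call has completed,
# so all responses are received before the program ends.
with ThreadPoolExecutor(max_workers=5) as pool:
    for _ in range(5):
        pool.submit(sendPostRequest)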

Related

How to manually start and stop a thread running blocking code within a Python asyncio event loop?

I have an app that runs in an asyncio loop. I wish to start a thread which will execute a piece of blocking code:

def run(self) -> bool:
    self._running = True
    while self._running:
        # MOVE FORWARD / BACKWARDS
        if kb.is_pressed('x'):
            self.move_car("forward")
        elif kb.is_pressed('w'):
            self.move_car("backward")
    return True
until I decide to stop it and manually set self._running = False by calling:

def stop(self):
    self._running = False

These are both methods of a class controlling the whole operation of a Raspberry Pi robot-car I made.
I want this to run on a separate thread so that my main application can still listen to my input and stop the thread while the other thread keeps running in the while loop shown above.
How can I achieve that? Note: for sending the start and stop signals I use HTTP requests, but this does not affect the core of my question.
You can run your blocking function run inside the default executor of the loop. This is documented under Executing code in thread or process pools.

async def main():
    # assuming you have a class `Interface`
    # that contains `run` and an async method `listen_for_stop`
    loop = asyncio.get_running_loop()
    inter = Interface()
    run_task = loop.run_in_executor(None, inter.run)
    results = await asyncio.gather(run_task, inter.listen_for_stop())

With asyncio.gather you await the execution of the two tasks concurrently.
You should also check Running Tasks Concurrently.
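How listen_for_stop receives its signal depends on your HTTP setup. As a rough, self-contained sketch (the asyncio.Event and the time.sleep stand-in for the keyboard loop are assumptions for illustration, not part of the question's code), it only needs to call stop() once the stop request arrives:

import asyncio
import time

class Interface:
    def __init__(self):
        self._running = False
        self._stop_event = asyncio.Event()   # set when a stop request comes in

    def run(self) -> bool:                   # blocking; runs in the executor thread
        self._running = True
        while self._running:
            time.sleep(0.1)                  # stand-in for the keyboard/motor loop
        return True

    def stop(self):
        self._running = False

    async def listen_for_stop(self):
        # Wait until the HTTP handler (or any other coroutine) sets the event,
        # then flip the flag that the blocking run() loop keeps checking.
        await self._stop_event.wait()
        self.stop()

async def main():
    loop = asyncio.get_running_loop()
    inter = Interface()
    run_task = loop.run_in_executor(None, inter.run)
    # Simulate a stop request arriving after 2 seconds:
    loop.call_later(2, inter._stop_event.set)
    await asyncio.gather(run_task, inter.listen_for_stop())

asyncio.run(main())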

python threads and Queue messages between them

I have a program that contains two parts; they work at the same time using threads and communicate using a Queue.
import kafka
import time
from queue import Queue
from threading import Thread

# A thread that consumes data from a Kafka consumer; it will stop after 40s if there are no new messages.
def consumer(in_q):
    consumer = kafka.KafkaConsumer('mytopic', bootstrap_servers=['myserver'], enable_auto_commit=True,
                                   group_id='30', auto_offset_reset='earliest', consumer_timeout_ms=40000)
    for message in consumer:
        messageStr = message.value.decode("utf-8")
        in_q.put(messageStr)
        print(messageStr)
        print(message.offset)
    print("consumer is closed ")

# A thread that modifies data
def modifier(out_q):
    while True:
        if out_q.empty() == False:
            data = out_q.get()
            print('data after some modification', data)

# Create the shared queue and launch both threads
message = Queue()
consumeMessgae = Thread(target=consumer, args=(message,))
modifyMessage = Thread(target=modifier, args=(message,))
consumeMessgae.start()
modifyMessage.start()
I want to update my modifier function to:
- replace the while loop, because it is CPU consuming, and instead keep listening to the Queue
- close the modifier function when the consumer thread is closed (the consumer function will automatically close after 40s if there are no new messages)
How can I achieve this?
You can achieve that using a threading.Event to notify the modifier thread to abort execution. The final code would be as follows:
import kafka
import time
from queue import Queue, Empty
from threading import Thread, Event

# A thread that consumes data from a Kafka consumer; it will stop after 40s if there are no new messages.
def consumer(in_q, event):
    consumer = kafka.KafkaConsumer('mytopic', bootstrap_servers=['myserver'], enable_auto_commit=True,
                                   group_id='30', auto_offset_reset='earliest', consumer_timeout_ms=40000)
    try:
        for message in consumer:
            messageStr = message.value.decode("utf-8")
            in_q.put(messageStr)
            print(messageStr)
            print(message.offset)
    finally:
        # The kafka.KafkaConsumer iterator stops after waiting 40000 ms without any new message.
        # Notify the modifier thread that it should abort execution:
        event.set()
        print("consumer is closed ")

# A thread that modifies data
def modifier(out_q, event):
    while True:
        try:
            # Block on the queue for at most 1 second waiting for a pending item,
            # so we can check whether the event was signaled at most 1 second later:
            data = out_q.get(timeout=1)
        except Empty:
            print("Queue 'out_q' is empty")
        else:
            # Executed only if there was no exception.
            # Add your additional logic here, but take care to handle any exception
            # that could be raised by your data-processing logic:
            print('data after some modification', data)
        # Wait for 1 second before the next loop iteration; this will short-circuit
        # before the second has elapsed if the event is signaled by the consumer thread:
        if event.wait(1):
            # The event was signaled by the consumer thread, so we must abort execution:
            break
    print("modifier is closed ")

# Create the shared queue and launch both threads
message = Queue()
# Create an Event; it will be used by the consumer thread to notify the modifier thread that it must abort execution:
alert_event = Event()
consumeMessgae = Thread(target=consumer, args=(message, alert_event,))
modifyMessage = Thread(target=modifier, args=(message, alert_event,))
consumeMessgae.start()
modifyMessage.start()
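If the main program should also wait for the clean shutdown (for example before exiting or restarting), you can additionally join both threads; a small sketch continuing the code above:

# The consumer ends after its 40 s timeout; the modifier ends once the
# consumer has set alert_event, so both joins eventually return.
consumeMessgae.join()
modifyMessage.join()
print("both threads finished")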

How to kill threads that are listening to message queue elegantly

In my Python application, I have a function that consumes messages from an Amazon SQS FIFO queue.
def consume_msgs():
    sqs = boto3.client('sqs',
                       region_name='us-east-1',
                       aws_access_key_id=AWS_ACCESS_KEY_ID,
                       aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
    print('STARTING WORKER listening on {}'.format(QUEUE_URL))
    while 1:
        response = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=10,
        )
        messages = response.get('Messages', [])
        for message in messages:
            try:
                print('{} > {}'.format(threading.currentThread().getName(), message.get('Body')))
                body = json.loads(message.get('Body'))
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message.get('ReceiptHandle'))
            except Exception as e:
                print('Exception in worker > ', e)
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message.get('ReceiptHandle'))
        time.sleep(10)
In order to scale up, I am using multithreading to process messages.

if __name__ == '__main__':
    for i in range(3):
        t = threading.Thread(target=consume_msgs, name='worker-%s' % i)
        t.setDaemon(True)
        t.start()
    while True:
        print('Waiting')
        time.sleep(5)
The application runs as a service. If I need to deploy a new release, it has to be restarted. Is there a way to have the threads exit gracefully when the main process is being terminated? Instead of killing the threads abruptly, they should finish with the current message first and stop receiving the next messages.
Since your threads keep looping, you cannot simply join them; you also need to signal them that it's time to break out of the loop. This docs hint might be useful:
Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
With that, I've put the following example together, which can hopefully help a bit:
from threading import Thread, Event
from time import sleep

def fce(ident, wrap_up_event):
    cnt = 0
    while True:
        print(f"{ident}: {cnt}", wrap_up_event.is_set())
        sleep(3)
        cnt += 1
        if wrap_up_event.is_set():
            break
    print(f"{ident}: Wrapped up")

if __name__ == '__main__':
    wanna_exit = Event()
    for i in range(3):
        t = Thread(target=fce, args=(i, wanna_exit))
        t.start()
    sleep(5)
    wanna_exit.set()
A single Event instance is passed to fce, which would otherwise just keep running endlessly; when done with each iteration, before going back to the top of the loop, it checks whether the event has been set. Before exiting the script, we set this event from the controlling thread. Since the threads are no longer marked as daemon threads, we do not have to explicitly join them.
Depending on how exactly you want to shut down your script, you will need to handle the incoming signal (SIGTERM perhaps) or the KeyboardInterrupt exception for SIGINT, and perform your clean-up before exiting; the mechanics remain the same. Apart from not letting Python stop execution right away, you need to let your threads know they should not re-enter the loop, and then wait for them to be joined.
SIGINT is a bit simpler, because it's exposed as a Python exception, and you could do, for instance, this for the "main" bit:
if __name__ == '__main__':
    wanna_exit = Event()
    for i in range(3):
        t = Thread(target=fce, args=(i, wanna_exit))
        t.start()
    try:
        while True:
            sleep(5)
            print('Waiting')
    except KeyboardInterrupt:
        pass
    wanna_exit.set()
You can of course send SIGINT to a process with kill and not only from the controlling terminal.
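For SIGTERM (which a service manager typically sends on restart or stop), a handler registered with the signal module can set the same event. A minimal sketch, assuming the fce function from the example above:

import signal
from threading import Thread, Event
from time import sleep

wanna_exit = Event()

def handle_sigterm(signum, frame):
    # Ask the workers to wrap up instead of being killed mid-iteration.
    wanna_exit.set()

if __name__ == '__main__':
    signal.signal(signal.SIGTERM, handle_sigterm)
    threads = [Thread(target=fce, args=(i, wanna_exit)) for i in range(3)]
    for t in threads:
        t.start()
    while not wanna_exit.is_set():
        sleep(5)
        print('Waiting')
    for t in threads:
        t.join()  # wait for every worker to notice the event and wrap up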

RQ Timeout does not kill multi-threaded jobs

I'm having problems running multithreaded tasks using python RQ (tested on v0.5.6 and v0.6.0).
Consider the following piece of code, as a simplified version of what I'm trying to achieve:
thing.py

from threading import Thread

class MyThing(object):
    def say_hello(self):
        while True:
            print "Hello World"

    def hello_task(self):
        t = Thread(target=self.say_hello)
        t.daemon = True  # seems like it makes no difference
        t.start()
        t.join()
main.py

from rq import Queue
from redis import Redis
from thing import MyThing

conn = Redis()
q = Queue(connection=conn)
q.enqueue(MyThing().say_hello, timeout=5)
When executing main.py (while rqworker is running in background), the job breaks as expected by timeout, within 5 seconds.
The problem is, when I enqueue a task containing thread(s), such as MyThing().hello_task, the thread runs forever and nothing happens when the 5-second timeout is over.
How can I run a multithreaded task with RQ, such that the timeout will kill the task, its sons, grandsons and their wives?
When you run t.join(), the hello_task thread blocks and waits until the say_hello thread returns, thus never receiving the timeout signal from rq. You can allow the job's main thread to keep running and properly receive the timeout signal by calling Thread.join with a set amount of time to wait, in a loop, while waiting for the thread to finish running. Like so:
def hello_task(self):
    t = Thread(target=self.say_hello)
    t.start()
    while t.isAlive():
        t.join(1)  # Block for 1 second
That way you could also catch the timeout exception and handle it, if you wish:
from rq.timeouts import JobTimeoutException

def hello_task(self):
    t = Thread(target=self.say_hello)
    t.start()
    try:
        while t.isAlive():
            t.join(1)  # Block for 1 second
    except JobTimeoutException:  # raised by rq when the job's timeout expires
        print "Thread killed due to timeout"
        raise
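To exercise this, a sketch reusing the question's main.py: enqueue hello_task instead of say_hello with the same timeout, and the worker should now hit the except branch and re-raise after roughly 5 seconds:

from rq import Queue
from redis import Redis
from thing import MyThing

conn = Redis()
q = Queue(connection=conn)
q.enqueue(MyThing().hello_task, timeout=5)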

Threading in python using queue

I wanted to use threading in Python to download a lot of webpages, and went through the following code, which uses queues, on one of the websites.
It uses an infinite while loop. Does each thread run continuously without ending until all of them are complete? Am I missing something?
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

queue = Queue.Queue()

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)

            #signals to queue job is done
            self.queue.task_done()

start = time.time()

def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
Setting the threads to be daemon threads causes them to exit when the main thread is done. But yes, you are correct in that your threads will run continuously for as long as there is something in the queue; otherwise they will block.
The documentation explains this detail: Queue docs.
The Python threading documentation explains the daemon part as well.
The entire Python program exits when no alive non-daemon threads are left.
So, when the queue is emptied and queue.join resumes, the interpreter exits and the threads will then die.
EDIT: Correction on default behavior for Queue
Your script works fine for me, so I assume you are asking what is going on so you can understand it better. Yes, your subclass puts each thread in an infinite loop, waiting on something to be put in the queue. When something is found, it grabs it and does its thing. Then, the critical part, it notifies the queue that it's done with queue.task_done, and resumes waiting for another item in the queue.
While all this is going on with the worker threads, the main thread is waiting (join) until all the tasks in the queue are done, which will be when the threads have sent the queue.task_done flag the same number of times as there were messages in the queue. At that point the main thread finishes and exits. Since these are daemon threads, they close down too.
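To see that counting in isolation, here is a minimal sketch of the task_done/join handshake (written with Python 3's queue module, while the question's code is Python 2):

import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()     # blocks until an item is available
        print("processing", item)
        q.task_done()      # one task_done() per completed get()

threading.Thread(target=worker, daemon=True).start()

for i in range(3):
    q.put(i)

q.join()  # returns once task_done() has been called three times,
          # i.e. once for every item that was put on the queue
print("all work done; the main thread exits and the daemon worker dies with it")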
This is cool stuff, threads and queues. It's one of the really good parts of Python. You will hear all kinds of stuff about how threading in Python is screwed up with the GIL and such. But if you know where to use them (like in this case with network I/O), they will really speed things up for you. The general rule is if you are I/O bound, try and test threads; if you are cpu bound, threads are probably not a good idea, maybe try processes instead.
good luck,
Mike
I don't think Queue is necessary in this case. Using only Thread:
import threading, urllib2, time

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]

class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, host):
        threading.Thread.__init__(self)
        self.host = host

    def run(self):
        #grabs urls of hosts and prints first 1024 bytes of page
        url = urllib2.urlopen(self.host)
        print url.read(1024)

start = time.time()

def main():
    #spawn a pool of threads
    for i in range(len(hosts)):
        t = ThreadUrl(hosts[i])
        t.start()

main()
print "Elapsed Time: %s" % (time.time() - start)
