I am trying to simulate an environment with VMs and to run an object method in a background thread. My code looks like the following.
hyper_v.py file:
import random
from threading import Thread
from virtual_machine import VirtualMachine

class HyperV(object):
    def __init__(self, hyperv_name):
        self.hyperv_name = hyperv_name
        self.vms_created = {}

    def create_vm(self, vm_name):
        if vm_name not in self.vms_created:
            vm1 = VirtualMachine({'vm_name': vm_name})
            self.vms_created[vm_name] = vm1
            vm1.boot()
        else:
            print('VM:', vm_name, 'already exists')

    def get_vm_stats(self, vm_name):
        print('vm stats of ', vm_name)
        print(self.vms_created[vm_name].get_values())

if __name__ == '__main__':
    hv = HyperV('temp')
    vm_name = 'test-vm'
    hv.create_vm(vm_name)
    print('getting vm stats')
    th2 = Thread(name='vm1_stats', target=hv.get_vm_stats(vm_name) )
    th2.start()
virtual_machine.py file in the same directory:
import random, time, uuid, json
from threading import Thread

class VirtualMachine(object):
    def __init__(self, interval = 2, *args, **kwargs):
        self.vm_id = str(uuid.uuid4())
        #self.vm_name = kwargs['vm_name']
        self.cpu_percentage = 0
        self.ram_percentage = 0
        self.disk_percentage = 0
        self.interval = interval

    def boot(self):
        print('Bootingup', self.vm_id)
        th = Thread(name='vm1', target=self.update() )
        th.daemon = True #Setting the thread as daemon thread to run in background
        print(th.isDaemon()) #This prints true
        th.start()

    def update(self):
        # This method needs to run in the background simulating an actual vm with changing values.
        i = 0
        while(i < 5 ): #Added counter for debugging, ideally this would be while(True)
            i+=1
            time.sleep(self.interval)
            print('updating', self.vm_id)
            self.cpu_percentage = round(random.uniform(0,100),2)
            self.ram_percentage = round(random.uniform(0,100),2)
            self.disk_percentage = round(random.uniform(0,100),2)

    def get_values(self):
        return_json = {'cpu_percentage': self.cpu_percentage,
                       'ram_percentage': self.ram_percentage,
                       'disk_percentage': self.disk_percentage}
        return json.dumps(return_json)
The idea is to create a thread that keeps updating the values. On request, we read the values of a VM object by calling vm_obj.get_values(). We would create multiple VM objects to simulate multiple VMs running in parallel, and we need to get the information from a particular VM on request.
The problem I am facing is that the update() function of the VM does not run in the background (even though the thread is set as a daemon thread).
The method call hv.get_vm_stats(vm_name) waits until vm_object.update() (which is called by vm_object.boot()) completes and then prints the stats. I would like to get the stats of the VM on request while vm_object.update() keeps running in the background forever.
Please share your thoughts if I am overlooking something basic. I looked into issues related to the Python threading library but could not come to any conclusion. Any help is greatly appreciated. The next step would be a REST API that calls these functions to get the data of any VM, but I am stuck on this problem.
Thanks in advance,
As pointed out by @Klaus D in the comments, my mistake was using parentheses when specifying the target function in the thread definition, which resulted in the function being called right away.
target=self.update() will call the method right away. Remove the () to
hand the method over to the thread without calling it.
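To make the difference concrete, here is a minimal sketch. The trimmed-down VirtualMachine class, the short interval and the updates counter are illustrative assumptions, not the code from the question. The bound method is passed as target without parentheses, so boot() returns immediately and update() keeps running in a daemon thread.

```python
import random
import threading
import time


class VirtualMachine:
    """Hypothetical, trimmed-down stand-in for the VirtualMachine class above."""

    def __init__(self, interval=0.05):
        self.interval = interval
        self.cpu_percentage = 0.0
        self.updates = 0  # counter added only to make the effect observable

    def boot(self):
        # Pass the bound method itself. Writing self.update() here would run
        # the loop in the calling thread and hand its return value to Thread.
        thread = threading.Thread(name="vm-updater", target=self.update, daemon=True)
        thread.start()
        return thread

    def update(self):
        while True:
            time.sleep(self.interval)
            self.cpu_percentage = round(random.uniform(0, 100), 2)
            self.updates += 1


vm = VirtualMachine()
started = time.monotonic()
updater = vm.boot()
boot_seconds = time.monotonic() - started  # boot() does not wait for update()
time.sleep(0.3)  # the main thread stays free while values change in the background
print(boot_seconds, vm.updates, vm.cpu_percentage)
```

The same applies to target=hv.get_vm_stats(vm_name) in hyper_v.py: pass the method as target=hv.get_vm_stats and supply the argument separately with args=(vm_name,).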
Related
I am pretty new to Python and have a question about threading.
I have one function that is called pretty often. This function starts another function in a new Thread.
def calledOften(id):
    t = threading.Thread(target=doit, args=(id,))
    t.start()

def doit(arg):
    while True:
        #Long running function that is using arg
        pass
Every time calledOften is called, a new thread is created. My goal is to always terminate the previously started thread, so that at all times only one doit() function is running.
What I tried:
How to stop a looping thread in Python?
def calledOften(id):
    t = threading.Thread(target=doit, args=(id,))
    t.start()
    time.sleep(5)
    t.do_run = False
This code (with a modified doit function) worked for me to stop the thread after 5 seconds.
But I cannot set t.do_run = False before I start the new thread, because t is not defined at that point.
Does somebody know how to stop the last running thread and start a new one?
Thank you ;)
I think you can decide from inside the thread itself when to terminate its execution. That should not create any problems for you. You can use a thread-manager approach, something like the code below.
import threading

class DoIt(threading.Thread):
    def __init__(self, id, stop_flag):
        super().__init__()
        self.id = id
        self.stop_flag = stop_flag

    def run(self):
        while not self.stop_flag():
            pass  # do something

class CalledOftenManager:
    __stop_run = False
    __instance = None

    @classmethod
    def _stop_flag(cls):
        return cls.__stop_run

    @classmethod
    def calledOften(cls, id):
        if cls.__instance is not None:
            cls.__stop_run = True
            cls.__instance.join()  # wait for the previous thread to terminate
            cls.__stop_run = False
        cls.__instance = DoIt(id, cls._stop_flag)
        cls.__instance.start()

# Always go through the manager
CalledOftenManager.calledOften(1)
CalledOftenManager.calledOften(2)
CalledOftenManager.calledOften(3)
What I tried here is to make a controller that manages the DoIt thread. It is one approach to achieve what you need.
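A variation on the same idea, sketched with threading.Event instead of a shared boolean. The SingleWorker class, its log list and the short wait interval are hypothetical names and values chosen for illustration. Each thread gets its own event, so stopping the old thread cannot accidentally stop the new one, and join() replaces the busy-wait loop.

```python
import threading


class SingleWorker:
    """Hypothetical helper: keeps at most one doit-style thread alive."""

    def __init__(self):
        self._thread = None
        self._stop = None
        self.log = []  # records (id, "started" / "stopped") for illustration

    def _doit(self, arg, stop):
        self.log.append((arg, "started"))
        while not stop.is_set():
            stop.wait(0.01)  # placeholder for one slice of the long-running work
        self.log.append((arg, "stopped"))

    def called_often(self, arg):
        self.stop()  # terminate the previous thread, if any
        self._stop = threading.Event()  # fresh event for the new thread
        self._thread = threading.Thread(target=self._doit, args=(arg, self._stop))
        self._thread.start()

    def stop(self):
        if self._thread is not None:
            self._stop.set()
            self._thread.join()
            self._thread = None


worker = SingleWorker()
for job_id in (1, 2, 3):
    worker.called_often(job_id)
running = [t for t in threading.enumerate() if t is worker._thread]
worker.stop()
print(worker.log)
```

Because called_often() joins the old thread before starting the next one, the log shows each job stopping before its successor starts.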
I've two classes - MessageProducer and MessageConsumer.
MessageConsumer does the following:
1. receives messages and puts them in its message list "_unprocessed_msgs"
2. on a separate worker thread, moves the messages to an internal list "_in_process_msgs"
3. on the worker thread, processes messages from "_in_process_msgs"
In my development environment, I'm facing an issue with step #2 above: after a message is added in step #1, the worker thread checks the length of "_unprocessed_msgs" and gets zero.
When step #1 is repeated, the list correctly shows 2 items on the thread that added the item. But in step #2, on the worker thread, len(_unprocessed_msgs) again returns zero.
I'm not sure why this is happening. I would really appreciate any help on this.
I'm using Ubuntu 16.04 having Python 2.7.12.
Below is the sample source code. Please let me know if more information is required.
import threading
import time

class MessageConsumerThread(threading.Thread):
    def __init__(self):
        super(MessageConsumerThread, self).__init__()
        self._unprocessed_msg_q = []
        self._in_process_msg_q = []
        self._lock = threading.Lock()
        self._stop_processing = False

    def start_msg_processing_thread(self):
        self._stop_processing = False
        self.start()

    def stop_msg_processing_thread(self):
        self._stop_processing = True

    def receive_msg(self, msg):
        with self._lock:
            LOG.info("Before: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            self._unprocessed_msg_q.append(msg)
            LOG.info("After: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))

    def _queue_unprocessed_msgs(self):
        with self._lock:
            LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            if self._unprocessed_msg_q:
                LOG.info("Moving messages from unprocessed to in_process queue")
                self._in_process_msg_q += self._unprocessed_msg_q
                self._unprocessed_msg_q = []
                LOG.info("Moved messages from unprocessed to in_process queue")

    def run(self):
        while not self._stop_processing:
            # Allow other threads to add messages to message queue
            time.sleep(1)
            # Move unprocessed listeners to in-process listener queue
            self._queue_unprocessed_msgs()
            # If nothing to process continue the loop
            if not self._in_process_msg_q:
                continue
            for msg in self._in_process_msg_q:
                self.consume_message(msg)
            # Clean up processed messages
            del self._in_process_msg_q[:]

    def consume_message(self, msg):
        print(msg)

class MessageProducerThread(threading.Thread):
    def __init__(self, producer_id, msg_receiver):
        super(MessageProducerThread, self).__init__()
        self._producer_id = producer_id
        self._msg_receiver = msg_receiver

    def start_producing_msgs(self):
        self.start()

    def run(self):
        for i in range(1,10):
            msg = "From: %s; Message:%s" %(self._producer_id, i)
            self._msg_receiver.receive_msg(msg)

def main():
    msg_receiver_thread = MessageConsumerThread()
    msg_receiver_thread.start_msg_processing_thread()
    msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
                                                msg_receiver=msg_receiver_thread)
    msg_producer_thread.start_producing_msgs()
    msg_producer_thread.join()
    msg_receiver_thread.stop_msg_processing_thread()
    msg_receiver_thread.join()

if __name__ == '__main__':
    main()
Following is the log that I get:
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
This is not a good design for your application.
I spent some time trying to debug this, but threading code is naturally complicated, so we should try to simplify it instead of making it even more confusing.
When I see threading code in Python, I usually see it written in a procedural form: a normal function that is passed to threading.Thread as the target argument and drives each thread. That way, you don't need to write code for a new class that will have a single instance.
Another thing is that, although Python's global interpreter lock itself guarantees lists won't get corrupted if modified in two separate threads, lists are not a recommended data structure for passing data between threads. You should probably look at the Queue class (in the queue module, named Queue in Python 2) for that.
The thing that looks wrong in this code at first sight is probably not the cause of your problem, given your use of locks, but it might be. Instead of
self._unprocessed_msg_q = []
which creates a new list object that the other thread momentarily has no reference to (so it might write data to the old list), you should do:
self._unprocessed_msg_q[:] = []
Or just use the del slice statement you already use in the other method.
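The difference between rebinding a name and clearing a list in place can be seen without any threads. In this small sketch the names inbox and alias are illustrative; alias stands in for a reference that another thread might already hold.

```python
inbox = ["msg-1"]
alias = inbox  # stands in for a reference another thread already holds

inbox = []  # rebinding: a new list object is created, the old one is untouched
after_rebind = (list(alias), alias is inbox)

inbox = alias  # point the name back at the original list
inbox[:] = []  # slice assignment: the same object is emptied in place
after_clear = (list(alias), alias is inbox)

print(after_rebind)  # (['msg-1'], False)
print(after_clear)   # ([], True)
```

After the rebinding, anything appended through alias lands in a list that inbox no longer refers to. After the in-place clear, both names still see the same object.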
But to be on the safer side, and to have more maintainable and less surprising code, you really should change to a procedural approach there, given how Python threading works. Treat "Thread" as the "final" object that can do its thing, and then use Queues around it:
# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
from threading import Thread
try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty
import time
import random

TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()

class Receiver(object):
    def __init__(self, queue):
        self.queue = queue
        self.in_process = []

    def receive_data(self, data):
        self.in_process.append(data)

    def consume_data(self):
        print("received data:", self.in_process)
        del self.in_process[:]

    def receiver_loop(self):
        queue = self.queue
        while True:
            try:
                data = queue.get(block=False)
            except Empty:
                print("got no data from queue")
                data = NO_DATA_SENTINEL
            if data is TERMINATE_SENTINEL:
                print("Got sentinel: exiting receiver loop")
                break
            # Only store real data, never the "no data" marker
            if data is not NO_DATA_SENTINEL:
                self.receive_data(data)
            time.sleep(random.uniform(0, 0.3))
            if queue.empty():
                # Only process data if we have nothing to receive right now:
                self.consume_data()
                print("sleeping receiver")
                time.sleep(1)
        if self.in_process:
            self.consume_data()

def producer_loop(queue):
    for i in range(10):
        time.sleep(random.uniform(0.05, 0.4))
        print("putting {0} in queue".format(i))
        queue.put(i)

def main():
    msg_queue = Queue()
    msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
    time.sleep(0.1)
    msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))
    msg_receiver_thread.start()
    msg_producer_thread.start()
    msg_producer_thread.join()
    msg_queue.put(TERMINATE_SENTINEL)
    msg_receiver_thread.join()

if __name__ == '__main__':
    main()
Note that since you want multiple methods in the receiver thread to do things with data, I used a class. But it does not inherit from Thread and does not have to worry about its workings. All its methods are called within the same thread: no need for locks, no worries about race conditions within the receiver class itself. For communicating outside the class, the Queue class is structured to handle any race conditions for us.
The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same if it had more methods.
(The random sleeps help visualize what would happen in "real world" message receiving.)
Also, you might want to take a look at something like:
https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose
Finally I was able to solve the issue. In the actual code, I've a Manager class that is responsible for instantiating MessageConsumerThread as its last thing in the initializer:
class Manager(object):
    def __init__(self):
        ...
        ...
        self._consumer = MessageConsumerThread(self)
        self._consumer.start_msg_processing_thread()
The problem seems to be with passing 'self' to the MessageConsumerThread initializer while Manager is still executing its own initializer (even though those are the last two steps). The moment I moved the creation of the consumer out of the initializer, the consumer thread was able to see the elements in "_unprocessed_msg_q".
Please note that the issue is still not reproducible with the above sample code. It manifests itself only in the production environment. Without the above fix, I tried a queue and a dictionary as well but observed the same issue. After the fix, I tried with a queue and a list and was able to execute the code successfully.
I really appreciate and thank @jsbueno and @ivan_pozdeev for their time and help! The Stack Overflow community is very helpful!
I've come across an unusual problem with updating variables. I've built a simple class to help me with some network sniffing. I wanted a parallel process that lets me run some network tests and capture the generated traffic using Python, so I can extend the program to do amazing things. I'm using Scapy's sniff function for the interface sniffing.
Scapy's sniffer lets you pass in a function that defines a 'stop sniffing' condition. In my case I've created the function stop_filter, and I wish to stop the Scapy sniff function by simply updating the self.stop_sniffing instance variable. The program output below shows self.stop_sniffing being set to True in the stop function, but it is then False again (or was never updated) when printed in stop_filter. I have no clue why this is happening, and no solution comes to mind as it's such a weird problem.
If anyone with fresh eyes can see what insane thing I've done here, it would be greatly appreciated!
from scapy.all import *
from multiprocessing import Process

class DatasetSniffer:
    def __init__(self, iface, local_dir='.'):
        self.iface = iface
        self.master = None
        self.local_dir = local_dir
        self.stop_sniffing = False # Never updates! why!?
        self.writer = PcapWriter(local_dir+"/master.pcap", append=True, sync=True)

    def stop_filter(self, p):
        # Note: 'p' gets passed in by Scapy function 'sniff'
        print self.stop_sniffing
        # Return 'True' to stop sniffer
        return self.stop_sniffing

    def sniff(self):
        sniff(store=0, prn=self.writer.write, iface=self.iface, stop_filter=self.stop_filter)

    def start(self):
        self.master = Process(target=self.sniff)
        self.master.start()

    def stop(self):
        self.stop_sniffing = True
        # Shows that self.stop_sniffing is 'True'
        print self.stop_sniffing
        self.master.join()

if __name__ == "__main__":
    interface = 'en3'
    sniffer = DatasetSniffer(interface)
    sniffer.start()
    # some process
    time.sleep(5)
    sniffer.stop()
Shell output:
sudo python sniffing.py
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
True
False
False
False
False
The Problem
You are not using multiple threads in this example code; you are using multiple processes.
Here you have two separate processes that do not share memory:
the original process
a new process, started by multiprocessing.Process.start
The new process is started by forking the original process, which creates a copy of its memory at the time of the fork. The two do not "share" memory.
Now, when you call DatasetSniffer.stop within your original process, this does not alter the value of stop_sniffing in the new ("master") process.
How to Communicate Then?
When using multiprocessing, you can communicate using a Pipe. Something like this:
readable_pipe, writable_pipe = multiprocessing.Pipe(duplex=False)
process = Process(target=do_something)
Now, our original process can send a message by writing to the pipe:
writable_pipe.send("stop")
while the new process can check for messages using:
if readable_pipe.poll():
    msg = readable_pipe.recv()
Try working this into your code.
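Put together, the pipe approach could look roughly like the sketch below. The worker function, its max_polls safety cap and the "stop" message are assumptions made for illustration. The child process polls the read end of the pipe between units of work and exits when it receives the message.

```python
import multiprocessing
import time


def worker(conn, max_polls=200):
    """Poll the read end of the pipe between units of work; return polls done."""
    polls = 0
    while polls < max_polls:  # safety cap so the sketch always terminates
        polls += 1
        if conn.poll() and conn.recv() == "stop":
            break
        time.sleep(0.01)  # placeholder for real work, e.g. handling one packet
    return polls


def demo():
    readable_pipe, writable_pipe = multiprocessing.Pipe(duplex=False)
    process = multiprocessing.Process(target=worker, args=(readable_pipe,))
    process.start()
    time.sleep(0.1)  # the original process does something else meanwhile
    writable_pipe.send("stop")  # message travels to the other process
    process.join(timeout=5)
    return process.exitcode


if __name__ == "__main__":
    print("child exit code:", demo())
```

In the sniffer from the question, one option (untested here) would be to hand the readable end to the new process and have stop_filter return the result of readable_pipe.poll().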
Thanks for all your suggestions. After a glass of inspiration I managed to knock up this script. It is probably a nicer way to approach my problem without making too many changes. This code lets the threads use a stop function defined outside the class, so all the asynchronous tasks can use the same stop_filter.
I found this information at the link below. Hopefully this post will be useful to someone else!
http://www.tutorialspoint.com/python/python_multithreading.htm
Cheers!
import os
import threading
import time
from scapy.all import *
from datetime import datetime

directory = str(datetime.now().strftime("%Y%m%d%H%M%S"))
os.makedirs(directory)
DatasetSnifferExit = 0

class DatasetSniffer(threading.Thread):
    def __init__(self, iface, local_dir='.', filename=str(datetime.now())):
        self.iface = iface
        self.filename = filename
        self.local_dir = local_dir
        self.stop_sniffing = False
        self.writer = PcapWriter(local_dir+"/"+filename+".pcap", append=True, sync=True)
        threading.Thread.__init__(self)

    def run(self):
        sniff_interface(self.writer.write, self.iface)

def stop_filter(p):
    if DatasetSnifferExit:
        return True
    else:
        return False

def sniff_interface(write, iface):
    sniff(store=0, prn=write, iface=iface, stop_filter=stop_filter)

if __name__ == "__main__":
    DatasetSnifferExit = False
    # Create new threads
    pcap1 = DatasetSniffer('en3', directory, "master")
    pcap2 = DatasetSniffer('en0', directory, "slave")
    # Start new Threads
    pcap1.start()
    pcap2.start()
    # Do stuff
    time.sleep(10)
    # Finished doing stuff
    DatasetSnifferExit = True
Note: I want to implement this without using any framework.
I have to create a web application using Python. The application should maintain a running average of the CPU usage for each process over the past 60 seconds. It should act as a web server, and when it gets a request, it should return the current average for each process. Following are the scripts I've written. record_usage.py is a script that I want to run as soon as server.py is run, so that it keeps maintaining the CPU usage data, which I intend to read whenever I get an XHR request and send back to the client.
So, my problem is: how do I wire this up? I tried running record_usage.py using subprocess.Popen after starting the server. record_usage.py then runs in the background as well. But when I try to access the data it creates, the class object I create is not the one it uses but a new one. How do I make this link?
Please ask about anything I have not made clear.
Latest changes in server.py
if __name__ == '__main__':
    RU_OBJ = RU(settings.SAMPLING_FREQ, settings.AVG_INTERVAL)
    RU_LOCK = RLock()
    # Record CPU usage in a thread.
    ru_thread = Thread(target=RU_OBJ.record, args=(RU_LOCK,))
    ru_thread.daemon = True
    ru_thread.start()
    # Run server.
    run()
Latest change in record_usage.py
def record(self, lock):
    while True:
        with lock:
            self.add_processes()
        time.sleep(self.sampling_freq)
Is this a proper way of applying locks? A similar lock is applied when I am reading the process information. Would it work?
Added the functions:
def add_processes(self,):
    for _process in psutil.process_iter():
        try:
            new_proc = _process.as_dict(attrs=['cpu_times', 'name', 'pid',
                                               'status'])
        except psutil.NoSuchProcess:
            continue
        pid, (user, _sys) = new_proc['pid'], new_proc.pop('cpu_times')
        # Get or create details object for the process.
        existing = self.processes.setdefault(pid, new_proc)
        # Get or create queue object for the CPU times of the process.
        queue_dict = self.process_queue.setdefault(pid, dict())
        # User CPU time.
        user_q = queue_dict.setdefault('user_q', PekableQueue(self.avg_interval))
        user_q.enqueue(user)
        user_avg = get_avg(user_q)
        # System CPU time.
        sys_q = queue_dict.setdefault('sys_q', PekableQueue(self.avg_interval))
        sys_q.enqueue(_sys)
        sys_avg = get_avg(sys_q)
        # Update the details object for the process.
        existing.update(user_avg=user_avg, sys_avg=sys_avg, **new_proc)

def get_curr_processes(self):
    return [self.processes[pid] for pid in psutil.get_pid_list()
            if pid in self.processes]
To collect statistics in another thread:
if __name__ == '__main__':
    from threading import Thread, Lock
    import record_usage

    lock = Lock()
    t = Thread(target=record_usage.record, args=[lock])
    t.daemon = True
    t.start()
    run(lock)
If you change some shared data in one thread and read it in another then you could protect the places where you access/change the value with a lock:
#...
with self.lock:
    existing = self.processes.setdefault(pid, new_proc)
#...
with self.lock:
    existing.update(user_avg=user_avg, sys_avg=sys_avg, **new_proc)
#...
def get_curr_processes(self):
    with self.lock:
        return [self.processes[pid] for pid in psutil.get_pid_list()
                if pid in self.processes]
It is essential that self.lock is the same object in all threads. If self.processes is a dict, then you don't need a lock in CPython for individual dict operations such as setdefault. These methods are implemented in C and the interpreter doesn't release the GIL (global interpreter lock) while calling them, i.e., only one thread at a time accesses the dict.
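As a minimal sketch of the shared-lock pattern, assuming a hypothetical UsageRecorder class with made-up pids in place of the psutil data: the writer thread and the reader use the same lock object, the sleep happens outside the lock, and the reader returns copies so it always sees a consistent snapshot.

```python
import threading
import time


class UsageRecorder:
    """Hypothetical stand-in for the recorder class; stores fake samples per pid."""

    def __init__(self, sampling_freq=0.01):
        self.sampling_freq = sampling_freq
        self.lock = threading.Lock()  # one lock object, shared by all threads
        self.processes = {}

    def record(self, rounds):
        for _ in range(rounds):
            with self.lock:  # hold the lock only while mutating shared data
                for pid in (1, 2, 3):  # made-up pids instead of psutil data
                    entry = self.processes.setdefault(pid, {"pid": pid, "samples": 0})
                    entry["samples"] += 1
            time.sleep(self.sampling_freq)  # sleep outside the lock

    def get_curr_processes(self):
        with self.lock:
            # Copy the entries so the caller never sees a half-updated round.
            return [dict(entry) for entry in self.processes.values()]


recorder = UsageRecorder()
thread = threading.Thread(target=recorder.record, args=(20,), daemon=True)
thread.start()
snapshot_during = recorder.get_curr_processes()
thread.join()
snapshot_after = recorder.get_curr_processes()
print(snapshot_after)
```

Keeping the sleep outside the with block matters: a reader would otherwise have to wait for the whole sampling interval before it could take the lock.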
I've got an event-driven chatbot and I'm trying to implement spam protection. I want to silence a user who is behaving badly for a period of time, without blocking the rest of the application.
Here's what doesn't work:
if user_behaving_badly():
    ban( user )
    time.sleep( penalty_duration ) # Bad! Blocks the entire application!
    unban( user )
Ideally, if user_behaving_badly() is true, I want to start a new thread which does nothing but ban the user, then sleep for a while, unban the user, and then the thread disappears.
According to this I can accomplish my goal using the following:
if user_behaving_badly():
    thread.start_new_thread( banSleepUnban, ( user, penalty ) )
"Simple" is usually an indicator of "good", and this is pretty simple, but everything I've heard about threads has said that they can bite you in unexpected ways. My question is: Is there a better way than this to run a simple delay loop without blocking the rest of the application?
Instead of starting a thread for each ban, put the bans in a priority queue and have a single thread do the sleeping and unbanning.
This code keeps two structures: a heapq that allows it to quickly find the soonest ban to expire, and a dict that makes it possible to quickly check by name whether a user is banned.
import time
import threading
import heapq

class Bans():
    def __init__(self):
        self.lock = threading.Lock()
        self.event = threading.Event()
        self.heap = []
        self.dict = {}
        self.thread = threading.Thread(target=self.expiration_thread)
        self.thread.daemon = True
        self.thread.start()

    def ban_user(self, user, duration):
        with self.lock:
            now = time.time()
            expiration = (now+duration)
            heapq.heappush(self.heap, (expiration, user))
            self.dict[user] = expiration
            self.event.set()

    def is_user_banned(self, user):
        with self.lock:
            now = time.time()
            return self.dict.get(user, 0) > now

    def expiration_thread(self):
        while True:
            self.event.wait()
            with self.lock:
                expires, user = self.heap[0]
                now = time.time()
                duration = expires-now
            # Sleep outside the lock so new bans can still be added
            if duration > 0:
                time.sleep(duration)
            with self.lock:
                if self.heap[0][0] == expires:
                    heapq.heappop(self.heap)
                    # Only drop the entry if the user was not re-banned meanwhile
                    if self.dict.get(user) == expires:
                        del self.dict[user]
                if not self.heap:
                    self.event.clear()
and is used like this:
B = Bans()
B.ban_user("phil", 30.0)
B.is_user_banned("phil")
Use a threading timer object, like this:
t = threading.Timer(30.0, unban)
t.start() # after 30 seconds, unban will be run
Then only unban is run in the thread.
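A slightly fuller sketch of the Timer approach follows. The banned set, its lock and the ban_temporarily helper are hypothetical names, and the 0.2 second penalty is only there to keep the demonstration short.

```python
import threading

banned = set()
banned_lock = threading.Lock()


def ban(user):
    with banned_lock:
        banned.add(user)


def unban(user):
    with banned_lock:
        banned.discard(user)


def ban_temporarily(user, penalty_duration):
    """Ban now and schedule the unban without blocking the caller."""
    ban(user)
    timer = threading.Timer(penalty_duration, unban, args=(user,))
    timer.daemon = True  # do not keep the program alive just for a pending unban
    timer.start()
    return timer


timer = ban_temporarily("phil", 0.2)
banned_now = "phil" in banned  # the caller continues right away
timer.join()  # only for the demonstration; a real application would not wait
banned_later = "phil" in banned
print(banned_now, banned_later)
```

Each Timer is its own thread, so this still creates one thread per active ban; timer.cancel() can lift a ban's pending expiry if needed.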
Why thread at all?
def do_something(user):
    if good_user(user):
        pass  # do it
    else:
        pass  # don't

def good_user(user):
    if is_user_banned(user):
        if past_time_since_ban(user):
            unban_user(user)
    elif is_user_bad(user):
        ban_user(user)
    return not is_user_banned(user)

def ban_user(user):
    pass  # add a user/start time to a hash

def is_user_banned(user):
    pass  # check hash
    # could check if expired now too, or do it separately if you care about it

def is_user_bad(user):
    pass  # check params or set more values in a hash
This is language agnostic, but consider a thread to keep track of stuff. The thread keeps a data structure that has something like "username" and "banned_until" in a table. The thread is always running in the background checking the table, if banned_until is expired, it unblocks the user. Other threads go on normally.
If you're using a GUI, most GUI modules have a timer function that abstracts away the messy multithreading details and executes code after a given time while still allowing the rest of the code to run. For instance, Tkinter has the 'after' function.