This is crazy! I heard Python threads are slow but this is beyond normal.
Here is the pseudo-code:
class ReadThread:
    v = []

    def __init__(self, threaded=True):
        self.v = MySocket('127.0.0.1')
        if threaded:
            thread.start_new_thread(self._scan, ())

    def read(self):
        t0 = datetime.now()
        self.v.read('SomeVariable')
        t = datetime.now()
        dt = (t - t0).total_seconds()
        print dt

    def _scan(self):
        while True:
            self.read()
If I run read() in a while loop in the main thread like this:
r = ReadThread(threaded=False)
while True:
    r.read()
dt is about 78 ms with small variation. Now if I run it in a new thread like this:
r = ReadThread(threaded=True)
while True:
    pass
dt is about 130 ms with ±10 ms variance!
Why is it so slow? Am I doing something really wrong? It's the same thing, just in a new thread!
MySocket() is an object that uses a socket to read/write variables to a server, and read() just gets some variable for the test.
It is hard to reproduce this problem locally without knowing what MySocket is and without the full example. However, my guess is that the problem is this loop:
while True:
    pass
It is VERY CPU-consuming: it spins constantly, grabbing CPU cycles for itself and leaving the socket no room to work.
Socket reads, by contrast, usually block and idle while waiting for data to arrive, so they consume almost no CPU.
In the first example, the socket runs while nothing else eats CPU. In the second example, the busy main thread consumes an entire CPU core.
Try replacing that loop with an ordinary idling operation, e.g. time.sleep(60). The main thread will then idle for 60 s while the socket thread reads and processes data.
r = ReadThread(threaded = True)
time.sleep(60)
What do the measurements look like in that case?
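To make the comparison easy to try, here is a minimal, self-contained sketch of the same idea using the modern threading module. MySocket here is just a stub of my own that sleeps ~78 ms to mimic the server response; only the sleeping main thread is the point:

import threading
import time
from datetime import datetime

class MySocket:
    """Stand-in for the question's socket wrapper (assumed API)."""
    def __init__(self, host):
        self.host = host
    def read(self, name):
        time.sleep(0.078)   # pretend the server answers in ~78 ms

class ReadThread:
    def __init__(self, threaded=True):
        self.v = MySocket('127.0.0.1')
        if threaded:
            threading.Thread(target=self._scan, daemon=True).start()

    def read(self):
        t0 = datetime.now()
        self.v.read('SomeVariable')
        print((datetime.now() - t0).total_seconds())

    def _scan(self):
        while True:
            self.read()

r = ReadThread(threaded=True)
time.sleep(10)   # idle instead of `while True: pass`, so the reader thread gets the CPU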
This code is supposed to control a servo from stdin
import asyncio
import sys
import threading
from multiprocessing import Process

async def connect_stdin_stdout():
    loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader()
    protocol = asyncio.StreamReaderProtocol(reader)
    await loop.connect_read_pipe(lambda: protocol, sys.stdin)
    w_transport, w_protocol = await loop.connect_write_pipe(asyncio.streams.FlowControlMixin, sys.stdout)
    writer = asyncio.StreamWriter(w_transport, w_protocol, reader, loop)
    return reader, writer

servo0ang = 90

async def main():
    reader, writer = await connect_stdin_stdout()
    while True:
        res = await reader.read(100)
        if not res:
            break
        servo0ang = int(res)

# Main program logic follows:
def runAsync():
    asyncio.run(main())

def servoLoop():
    pwm = Servo()
    while True:
        pwm.setServoPwm('0', servo0ang)

if __name__ == "__main__":
    p = Process(target=servoLoop)
    p.start()
    runAsync()
    p.join()
When I run it, the async function starts but servoLoop doesn't.
It was supposed to turn the servo to the angle specified on stdin. I'm a bit rusty at Python.
The Servo class is from an example program that came with the robot I'm working with, and it works there.
So, as I said in a comment, you are not sharing servo0ang. You have two processes, and each of them has its own variables. They have the same names and the same initial values, as with a fork in other languages, because the new process starts as a copy of the main one. But they are two different Python interpreters running, with almost nothing to do with each other (one is the parent of the other, so it can join it).
If you need to share data, you either have to send it through pipes connecting the two processes, or create shared memory that both processes can access. That sounds hard, but in Python it is quite easy. It is also easy, though, to end up with an inefficient polling scheme, as yours seems to be: an infinite loop polling servo0ang as fast as it can so as not to miss any change. Very often it would be a better idea to wait on a pipe instead. But I won't discuss the principles of your project here, just how to do what you want to do, not whether it is a good idea or not.
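For completeness, here is a minimal sketch of the pipe-based alternative mentioned above (the names are made up for illustration); the child blocks on recv() instead of polling:

from multiprocessing import Process, Pipe

def servo_loop(conn):
    while True:
        # blocks until the parent sends something; no busy polling
        angle = conn.recv()
        if angle is None:        # sentinel: time to stop
            break
        print("would move servo to", angle)

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=servo_loop, args=(child_conn,))
    p.start()
    for a in (10, 45, 90):
        parent_conn.send(a)
    parent_conn.send(None)       # tell the child to exit
    p.join()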
In Python, the multiprocessing module has a Value class that creates memory that can be shared among processes on the same machine (with Manager you can even share values among processes on different machines, but that is slower).
from multiprocessing import Process, Value
import time  # I don't like infinite loops without sleep

v = Value('i', 90)  # creates an integer, with initial value 90, in shared memory
x = 90              # just a normal integer, by comparison

print(v.value, x)   # read them
v.value = 80        # modify them
x = 80

def f(v):
    global x
    while True:
        time.sleep(1)
        v.value = (v.value + 1) % 360
        x = (x + 1) % 360

p = Process(target=f, args=(v,))
p.start()

while True:
    print("New val", v.value, x)
    time.sleep(5)
As you can see, the shared value printed by the main loop increases by about 5 each iteration, because the process running f incremented it by 1 five times in the meantime.
But x in that same loop doesn't change, because the only x that changes is the x of the process running f (the same global x, but in a different process; it is as if you ran the same program twice in two different windows).
Now, applied to your code
import asyncio
import sys
import threading
import time
from multiprocessing import Process, Value

async def connect_stdin_stdout():
    loop = asyncio.get_event_loop()
    reader = asyncio.StreamReader()
    protocol = asyncio.StreamReaderProtocol(reader)
    await loop.connect_read_pipe(lambda: protocol, sys.stdin)
    w_transport, w_protocol = await loop.connect_write_pipe(asyncio.streams.FlowControlMixin, sys.stdout)
    writer = asyncio.StreamWriter(w_transport, w_protocol, reader, loop)
    return reader, writer

servo0ang = Value('i', 90)

async def main():
    reader, writer = await connect_stdin_stdout()
    while True:
        res = await reader.read(100)
        if not res:
            break
        servo0ang.value = int(res)

# Main program logic follows:
def runAsync():
    asyncio.run(main())

class Servo:
    def setServoPwm(self, s, ang):
        time.sleep(1)
        print(f'\033[31m{ang=}\033[m')

def servoLoop():
    pwm = Servo()
    while True:
        pwm.setServoPwm('0', servo0ang.value)

if __name__ == "__main__":
    p = Process(target=servoLoop)
    p.start()
    runAsync()
    p.join()
I used a dummy Servo class that just prints the servo0ang value in red.
Note that I've changed nothing else in your code.
Which means that, no, asyncio.run was not blocking the other process. I still agree with the comments you got: it is rarely a great idea to combine asyncio and processes. Here you have no other concurrent I/O, so your async/await is roughly equivalent to a good old `while True: servo0ang.value = int(input())`. It is not as if your input could yield to something else; there is nothing else, at least not in this process (if your two processes were communicating through a pipe, that would be different).
But, however convoluted the code may be, it works, and asyncio.run is not blocking the other process. The other process was simply calling setServoPwm endlessly with the same constant 90, a value that could never change, since that process did nothing with the variable except pass it to setServoPwm; it never did anything to grab a new value from the main process.
With the shared-memory Value there is still nothing extra to do, but this time, because the memory is shared, it is no longer futile to expect the value to change even though nothing in this process changes it.
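One portability aside of my own: on platforms that use the spawn start method (e.g. Windows), a module-level Value is not inherited by the child, so it is safer to pass it to the Process explicitly. A minimal sketch of that pattern, with the servo side stubbed out:

from multiprocessing import Process, Value
import time

def servo_loop(shared_angle):
    while True:
        # read the shared integer each pass; a real program would drive the servo here
        print("angle is", shared_angle.value)
        time.sleep(1)

if __name__ == "__main__":
    servo0ang = Value('i', 90)
    p = Process(target=servo_loop, args=(servo0ang,))
    p.start()
    servo0ang.value = 45   # the child sees this change through the shared memory
    time.sleep(3)
    p.terminate()
    p.join()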
I have a memory-greedy script and I don't want to freeze my computer while running it.
I need to pause execution whenever the memory usage exceeds a limit, let's say 60%, and then resume where it left off. I realize this is probably not good programming practice.
Let's say I have this code:
while True:
    do this stuff
    and this other stuff
    and this other ...
    ...
The only solution I know is to plague the code with many:
while psutil.virtual_memory().percent > memory_limit:
    time.sleep(30)
between lines, like this:
while True:
    while psutil.virtual_memory().percent > memory_limit:
        time.sleep(30)
    do this stuff
    while psutil.virtual_memory().percent > memory_limit:
        time.sleep(30)
    and this other stuff
    while psutil.virtual_memory().percent > memory_limit:
        time.sleep(30)
    and this other ...
    while psutil.virtual_memory().percent > memory_limit:
        time.sleep(30)
    ...
which is not nice code. The closest approximation to what I want would be:
while not psutil.virtual_memory().percent > memory_limit:
    do this stuff
    and this other stuff
    and this other ...
    ...
else:
    time.sleep(30)
but this restarts the execution instead of resuming it.
Why doesn't Python have a built-in whenever to do this?
You could use multithreading. Have a function with a while loop (add a sleep(1) or something in there so it doesn't spin too hard) and run it on a separate thread from the main program.
Threading basics: https://realpython.com/intro-to-python-threading/
It could look like this:
import threading

def do_stuff():
    ...  # the main work goes here

def check():
    ...  # the memory check loop goes here

threads = []

t = threading.Thread(target=check)  # pass the function itself, don't call it
t.daemon = True
threads.append(t)
# You could do the same for the other function (I won't include that as it is just the same thing)

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
More information is needed.
Usually the only things that consume loads of memory are memory leaks or large data sets. Memory leaks can be fixed, and large data sets can be handled in ways that reduce the amount of memory in use. Pausing the program is not going to reduce how much memory it is consuming.
If you can provide the actual code being used it would be extremely helpful in order to come up with the best solution.
---EDIT---
You probably have some kind of loop that goes over the data you are reading. I do not know what size chunks of data you are dealing with or what the data is actually being read from (local file, network data...), so that makes it harder to give the proper solution.
Here are some examples of things you can do. You will have to alter them in order to use them in your code.
import threading
import psutil

memory_event = threading.Event()

MEMORY_LIMIT = 60.0

def check_memory():
    memory = psutil.virtual_memory()
    percent = ((memory.total - memory.available) / memory.total) * 100.0
    while percent > MEMORY_LIMIT:
        memory_event.wait(10)
        memory = psutil.virtual_memory()
        percent = ((memory.total - memory.available) / memory.total) * 100.0

with open('some_path_to_data', 'r') as data:
    while True:
        check_memory()
        # no need to call check_memory more often than this, because the data
        # only gets pulled into memory when readline gets called
        line = data.readline()
        if not line:
            break
Another option is to use threads. This is a more complex approach, but probably a better one, since it can check the memory use every 100 milliseconds:
import threading
import psutil

memory_event = threading.Event()
exit_event = threading.Event()
main_thread_stall = threading.Event()

MEMORY_LIMIT = 60.0

def check_memory_thread():
    while not exit_event.is_set():
        memory = psutil.virtual_memory()
        percent = ((memory.total - memory.available) / memory.total) * 100.0
        if percent > MEMORY_LIMIT:
            memory_event.clear()
        else:
            memory_event.set()
        # a 0.1 second wait doesn't cause excessive cpu usage
        exit_event.wait(0.1)

def data_parser_thread():
    # wait until there is enough memory available
    memory_event.wait()
    while not exit_event.is_set():
        with open('some_path_to_data', 'r') as f:
            data = f.read()
            # do work on data

        # finished parsing data so exit the program
        if data == '':
            exit_event.set()
            break
        # wait until there is enough memory available
        memory_event.wait()

    try:
        # join the memory thread to make sure it exits properly
        memory_thread.join()
    except threading.ThreadError:
        pass
    # stop the main thread from waiting
    main_thread_stall.set()

memory_thread = threading.Thread(target=check_memory_thread)
data_thread = threading.Thread(target=data_parser_thread)

memory_thread.start()
data_thread.start()

try:
    main_thread_stall.wait()
except KeyboardInterrupt:
    exit_event.set()
    try:
        # join the memory thread to make sure it exits properly
        memory_thread.join()
    except threading.ThreadError:
        pass
    memory_event.clear()

# make sure the data thread has exited before terminating the program
try:
    # join the data thread to make sure it exits properly
    data_thread.join()
except threading.ThreadError:
    pass
NOTE: Both of the code examples are pseudo code and have not been tested. They are for example only and they will have to be modified and tested for your specific application. Without seeing the actual code you are using I am not going to be able to provide a specific way to solve your problem.
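As noted before the edit, large data sets can often be processed in fixed-size chunks so memory never balloons in the first place. Here is a minimal, generic sketch of that idea (the file path and chunk size are placeholders, not from the question):

def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield the file one chunk at a time instead of loading it all at once."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

for chunk in read_in_chunks('some_path_to_data'):
    pass  # process each chunk, then let it be garbage collected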
I have simple code in Python 3 using schedule and socket:
import schedule
import socket
from time import sleep

def readDataFromFile():
    data = []
    with open("/tmp/tmp.txt", "r") as f:
        for singleLine in f.readlines():
            data.append(str(singleLine))
    if(len(data)>0):
        writeToBuffer(data)

def readDataFromUDP():
    udpData = []
    rcvData, addr = sock.recvfrom(256)
    udpData.append(rcvData.decode('ascii'))
    if(len(udpData)>0):
        writeToBuffer(udpData)

.
.
.

def main():
    schedule.every().second.do(readDataFromFile)
    schedule.every().second.do(readDataFromUDP)
    while(1):
        schedule.run_pending()
        sleep(1)

UDP_IP = "192.xxx.xxx.xxx"
UDP_PORT = xxxx
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))

main()
The problem is that the script hangs on the sock.recvfrom() call and waits until data arrives.
How can I force Python to run this job independently? Would it be a better idea to run this in threads?
You can use threads here, and it'll work fine, but it will require a few changes. First, the scheduler on your background thread is going to try to kick off a new recvfrom every second, no matter how long the last one took. Second, since both threads are apparently trying to call the same writeToBuffer function, you're probably going to need a Lock or something else to synchronize them.
Rewriting the whole program around an asynchronous event loop is almost certainly overkill here.
Just changing the socket to be nonblocking and doing a hybrid is probably the simplest change, e.g., by using settimeout:
# wherever you create your socket
sock.settimeout(0.8)

# ...

def readDataFromUDP():
    udpData = []
    try:
        rcvData, addr = sock.recvfrom(256)
    except socket.timeout:
        return
    udpData.append(rcvData.decode('ascii'))
    if(len(udpData)>0):
        writeToBuffer(udpData)
Now, every time you call recvfrom, if there's data available, you'll handle it immediately; if not, it'll wait up to 0.8 seconds, and then raise an exception, which means you have no data to process, so go back and wait for the next loop. (There's nothing magical about that 0.8; I just figured something a little less than 1 second would be a good idea, so there's time left to do all the other work before the next schedule time hits.)
Under the covers, this works by setting the OS-level socket to non-blocking mode and doing some implementation-specific thing to wait with a timeout. You could do the same yourself by using setblocking(False) and using the select or selectors module to wait up to 0.8 seconds for the socket to be ready, but it's easier to just let Python take care of that for you.
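For reference, a minimal sketch of that do-it-yourself variant with the selectors module; this is my own illustration rather than the answer's code, and the address, port, and the print standing in for writeToBuffer are placeholders:

import selectors
import socket

sel = selectors.DefaultSelector()
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5005))          # placeholder address and port
sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ)

def readDataFromUDP():
    # wait up to 0.8 s for the socket to become readable
    if not sel.select(timeout=0.8):
        return                         # nothing arrived in time; try again on the next tick
    rcvData, addr = sock.recvfrom(256)
    print(rcvData.decode('ascii'))     # the question would call writeToBuffer here

readDataFromUDP()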
I have a problem in Python where I want to run two loops at the same time. I feel like I need to do this because the second loop needs to be rate-limited, but the first loop really shouldn't be. Also, the second loop takes input from the first.
I'm looking for something that works like this:
for line in file:
    do some stuff
    list = []
    list.append("an_item")

Rate limited:

for x in list:
    do some stuff simultaneously
There are two basic approaches with different tradeoffs: synchronously switching between tasks, and running in threads or subprocesses. First, some common setup:
from queue import Queue   # in Python 2: from Queue import Queue

work = Queue()

def fast_task():
    """ Do the fast thing """
    if done:
        return None
    else:
        return result

def slow_task(arg):
    """ Do the slow thing """

RATE_LIMIT = 30  # seconds
Now, the synchronous approach. It has the advantage of being much simpler, and easier to debug, at the cost of being a bit slower. How much slower depends on the details of your tasks. How it works is, we run a tight loop that calls the fast job every time, and the slow job only if enough time has passed. If the fast job is no longer producing work and the queue is empty, we quit.
import time

last_call = 0
while True:
    next_job = fast_task()
    if next_job:
        work.put(next_job)
    elif work.empty():
        # nothing left to do
        break
    else:
        # fast task has done all its work - short sleep to slow the spin
        time.sleep(.1)
    now = time.time()
    if now - last_call > RATE_LIMIT:
        last_call = now
        slow_task(work.get())
If you feel like this doesn't work fast enough, you can try the multiprocessing approach. You can use the same structure for working with threads or processes, depending on whether you import from multiprocessing.dummy or multiprocessing itself. We use a multiprocessing.Queue for communication instead of queue.Queue.
def do_the_fast_loop(work_queue):
    while True:
        next_job = fast_task()
        if next_job:
            work_queue.put(next_job)
        else:
            work_queue.put(None)  # sentinel - tells slow process to quit
            break

def do_the_slow_loop(work_queue):
    next_call = time.time()
    while True:
        job = work_queue.get()
        if job is None:  # sentinel seen - no more work to do
            break
        time.sleep(max(0, next_call - time.time()))
        next_call = time.time() + RATE_LIMIT
        slow_task(job)

if __name__ == '__main__':
    # from multiprocessing.dummy import Queue, Process  # for threads
    from multiprocessing import Queue, Process  # for processes

    work = Queue()

    fast = Process(target=do_the_fast_loop, args=(work,))
    slow = Process(target=do_the_slow_loop, args=(work,))

    fast.start()
    slow.start()

    fast.join()
    slow.join()
As you can see, there's quite a lot more machinery for you to implement, but it will be somewhat faster. Again, how much faster depends a lot on your tasks. I'd try all three approaches - synchronous, threaded, and multiprocess - and see which you like best.
You need to do two things:
Put the function that requires data from the other in its own process.
Implement a way to communicate between the two processes (e.g. a Queue).
All of this is necessary because of the GIL.
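A minimal sketch of those two points, with placeholder task names; it is essentially a compressed version of the process approach shown above:

from multiprocessing import Process, Queue

def producer(q):
    for item in range(5):      # stands in for the unthrottled first loop
        q.put(item)
    q.put(None)                # sentinel: no more work

def consumer(q):
    while True:
        item = q.get()         # blocks until the producer sends something
        if item is None:
            break
        print("got", item)     # stands in for the rate-limited work

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start(); p2.start()
    p1.join(); p2.join()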
I have code that reads data from 7 devices every second for an infinite amount of time. Each loop, a thread is created which starts 7 processes. After each process is done, the program waits 1 second and starts again. Here is a snippet of the code:
def all_thread(): #function that handles the threading
    thread = threading.Thread(target=all_process) #prepares a thread for the devices
    thread.start() #starts a thread for the devices

def all_process(): #function that prepares and runs processes
    processes = [] #empty list for the processes to be stored
    while len(gas_list) > 0: #this gas list holds the connection information for my devices
        for sen in gas_list: #for each sen (sensor) in the gas list
            proc = multiprocessing.Process(target=main_reader, args=(sen, q)) #declaring a process variable that sends the gas object, value and queue information to the reading function
            processes.append(proc) #adding the process to the processes list
            proc.start() #start the process
        for sen in processes: #for each sensor in the processes list
            sen.join() #wait for all the processes to complete before starting again
        time.sleep(1) #wait one second
However, this uses 100% of my CPU. Is this by design of threading and multiprocessing or just bad coding? Is there a way I can limit the CPU usage? Thanks!
Update:
The comments mentioned the main_reader() function, so I will put it into the question. All it does is read each device, take all the data and append it to a list. Then the list is put into a queue to be displayed in the tkinter GUI.
def main_reader(data, q): #this function reads the device, which takes less than a second
    output_list = get_registry(data) #this function takes the device information, reads the registry and returns a list of data
    q.put(output_list) #put the output list into the queue
As you state in the comments, your main_reader takes only a fraction of a second to run, which means process creation overhead might cause your problem.
Here is an example with multiprocessing.Pool. This creates a pool of workers and submits your tasks to them. Processes are started only once and never shut down or joined if this is meant to be an infinite loop. If you want to shut your pool down, you can do so by joining and closing it (see documentation for that).
from multiprocessing import Pool, Manager
from time import sleep
import threading
from random import random

gas_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def main_reader(sen, rqu):
    output = "%d/%f" % (sen, random())
    rqu.put(output)

def all_processes(rq):
    p = Pool(len(gas_list) + 1)
    while True:
        for sen in gas_list:
            p.apply_async(main_reader, args=(sen, rq))
        sleep(1)

m = Manager()
q = m.Queue()

t = threading.Thread(target=all_processes, args=(q,))
t.daemon = True
t.start()

while True:
    r = q.get()
    print(r)
If this does not help, you need to start digging deeper. I would first increase the sleep in your infinite loop to 10 seconds or even longer and watch how your program behaves. If the CPU peaks for a moment and then settles down for 10 seconds or so, you know the problem is in your main_reader. If it stays at 100%, the problem must be elsewhere.
Is it possible your problem is not in this part of your program at all? You seem to launch all of this in a thread, which suggests your main program is doing something else. Could that something else be what is maxing out the CPU?