I am trying to parallelize data loading and learning in a PyTorch project. From the main thread I create 2 threads: one loads the next batch while the other learns from the current batch. The loading thread passes the loaded data through a Queue.
My problem is: the program suddenly stops at a random point of execution without any error message (whether run under the debugger or not). Sometimes in the first epoch, sometimes after 7 epochs (50 min)... Is there a way to get an error message somehow? Has anyone encountered this problem?
It makes me think of a memory leak, but I have checked all the code that touches the shared data. I have also read that prints are not thread safe, so I removed them... Please also note that the code didn't have this problem before parallelization.
I am using:
a conda environment
threading.Thread
PyTorch
Windows Server
Update: Apparently the PyTorch code that does the CUDA learning doesn't like being called from a separate thread. If I keep the CUDA learning in the main thread, the program stays alive...
Code: Since I have fewer unexpected crashes with the CUDA learning in the main thread, I only kept one thread (it also makes more sense).
part of the main:
dataQueue = queue.Queue()
dataAvailable = threading.Event()
doneComputing = threading.Event()

# Create new thread
loadThread = LoadingThread(1, dataQueue)
#learningThread.daemon = True
loadThread.daemon = True

# Add threads to thread list
threads = []
threads.append(loadThread)

print(" >> Parallel process start")

# Start new thread
loadThread.start()
doneComputing.set()

# Learning process
for i_import in range(import_init['n_import_train']):
    # Wait for data to be loaded
    dataAvailable.wait()
    data = dataQueue.get()
    dataAvailable.clear()
    # Do learning
    net, net_prev, y_all, yhat_all = doLearning(data, i_import, net, net_prev, y_all, yhat_all)
    doneComputing.set()

# Wait for all threads to complete
for t in threads:
    t.join()
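The LoadingThread class itself isn't shown above. A minimal sketch of the producer side of this Event handshake might look like the following; `load_batch` and the fixed batch count are stand-ins, not part of the original code:

```python
import queue
import threading

def load_batch(i):
    # Stand-in for the real data loading; returns a dummy batch.
    return [i] * 4

class LoadingThread(threading.Thread):
    def __init__(self, thread_id, data_queue, data_available, done_computing, n_batches):
        super().__init__()
        self.thread_id = thread_id
        self.data_queue = data_queue
        self.data_available = data_available
        self.done_computing = done_computing
        self.n_batches = n_batches

    def run(self):
        for i in range(self.n_batches):
            batch = load_batch(i)       # load while the consumer is learning
            self.done_computing.wait()  # wait until the consumer is ready
            self.done_computing.clear()
            self.data_queue.put(batch)
            self.data_available.set()   # hand the batch over

data_queue = queue.Queue()
data_available = threading.Event()
done_computing = threading.Event()

loader = LoadingThread(1, data_queue, data_available, done_computing, n_batches=3)
loader.daemon = True
loader.start()
done_computing.set()  # consumer is ready for the first batch

results = []
for _ in range(3):
    data_available.wait()        # wait for a batch
    data_available.clear()
    results.append(data_queue.get())  # "learning" would happen here
    done_computing.set()
loader.join()
print(results)  # three dummy batches, in order
```

Note that a plain `queue.Queue` with a bounded `maxsize` would give the same back-pressure with blocking `put`/`get` alone, without the two Events.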
One interesting fact is that the program seems to crash more often when the model sent to CUDA is heavy... Could that be purely a CUDA problem?
I am designing a trading strategy to work with the Binance API as my research project. The following is the basic scheme of the program.
The program needs to wait to receive a line of data (kline) from the API. The wait differs between pairs and at different times of the day (which is set in a separate trading-logic file). In short, the waiting time is not fixed (hence I am using threading.Condition() with its .wait() and .notify() functions).
Once the data is available, condition.notify() comes into action and technical analysis is performed on the line received.
Because I want the program to run forever, the thread is started in a while loop.
However, during my testing phase, after receiving around 12,000 lines of data, the script gave me the following error:
File "/usr/lib/python3.8/threading.py", line 852, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
I am thinking the system ran out of memory because it creates a new thread during every iteration of the loop. Is there a better way to make the program wait to receive data, or to better manage the thread responsible for waiting?
During the testing phase, I am reading the data from CSV files line by line.
from csv import reader

tc_kline_received = threading.Condition()
btc_busd_5_min = []

for file in list_of_files:
    with open(file, "r") as csv_file:
        file_reader = reader(csv_file)
        for line in file_reader:
            btc_busd_5_min.append(line)
The following function pops one line from the list and stores it in a separate variable. It is also used as the target for the thread responsible for waiting.
def collect_kline():
    global btc_busd_5_min
    global kline_btc_busd_5_min
    global klines_dict
    kline_btc_busd_5_min = btc_busd_5_min.pop(0)
    klines_dict["btc_busd_5_min"] = kline_btc_busd_5_min
    with tc_kline_received:
        tc_kline_received.notify()
Below is the never-ending while loop.
while True:
    with tc_kline_received:
        t_collect_kline = threading.Thread(target=collect_kline)
        t_collect_kline.start()
        tc_kline_received.wait()
    t_collect_kline.join()
    insert_kline_to_db(klines_dict)
    create_ta_db(klines_dict)
The last two functions create an SQLite database with the technical analysis.
I am thinking the system ran out of memory because it is creating a new thread during every iteration of the loop.
The loop creates a new thread on every iteration, but it cannot get past the t_collect_kline.join() line until the new thread is finished, so there will never be more than one "collect_kline()" thread running at any given moment in time. If your program is running out of memory, it's not because of too many threads.
Unfortunately, I don't know what half of the lines in that program actually do, so I can't say what might be using up a lot of memory.
About those threads though...
Each time your main loop creates a new thread, it then does nothing until the thread is finished. That doesn't seem useful. The whole point of creating a new thread is that the new thread can do one thing while the original thread does some other thing. But your original thread does nothing else. It never makes any sense to create a new thread if the very next thing you do is wait for it to end.
You could make your program smaller, simpler, and probably faster if you'd just do this in your main loop instead:
while True:
    klines_dict["btc_busd_5_min"] = btc_busd_5_min.pop(0)
    insert_kline_to_db(klines_dict)
    create_ta_db(klines_dict)
But like I said, if you've got an out-of-memory problem, then this change isn't going to fix it. It will only change when and where the out-of-memory condition gets reported.
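For the live-API case, where data genuinely arrives in the background, the usual pattern is one long-lived worker thread that pushes each line into a `queue.Queue`, with the main loop blocking on `get()`. This avoids creating a thread per line entirely. A sketch, with a stand-in data source and a sentinel object to mark the end of the stream (both assumptions, not part of the original program):

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream

def collect_klines(source, kline_queue):
    # Long-lived worker: started once, pushes every line it receives.
    for line in source:               # stand-in for the API stream
        kline_queue.put(line)         # blocks if the consumer falls behind
    kline_queue.put(SENTINEL)

kline_queue = queue.Queue(maxsize=100)  # bounded, so memory stays flat
fake_stream = [{"close": 100 + i} for i in range(5)]

worker = threading.Thread(target=collect_klines,
                          args=(fake_stream, kline_queue), daemon=True)
worker.start()

processed = []
while True:
    kline = kline_queue.get()         # blocks until a line is available
    if kline is SENTINEL:
        break
    processed.append(kline)           # insert_kline_to_db / create_ta_db go here
worker.join()
print(len(processed))  # 5
```

Only one worker thread is ever created, however many lines arrive, and the bounded queue also gives back-pressure if the analysis is slower than the feed.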
How can we clear GPU memory after finishing a deep-learning model training in a Jupyter notebook? The problem is, no matter which framework I use (TensorFlow, PyTorch), the memory allocated on the GPU is not released unless I kill the process manually or kill the kernel and restart Jupyter. Do you have any idea how we can get rid of this problem by automating the steps?
The only workaround I found was to run the training in a separate process with multiprocessing; the GPU memory is released when that process exits.
An example:
from multiprocessing import Process

def Training(arguments):
    ....
    ....
    return model

if __name__ == '__main__':
    # Pass the function defined above and its arguments.
    # Note the trailing comma in args: it tells Python this is a tuple.
    Subprocess = Process(target=Training, args=(arguments,))
    # Start the defined subprocess
    Subprocess.start()
    # Wait for the subprocess to complete
    Subprocess.join()
I am new to Python and having some problems.
I wrote an update_manager class that can communicate with the user via TCP and perform installations of different components.
My update_manager class uses 2 other classes (its members) to accomplish this: the first handles the TCP communication and the second the actual installation. The installation class runs in the main thread, and the communication runs in a thread created with threading.Thread.
My main looks like this:
if __name__ == "__main__":
    new_update = UpdateManager()
    #time.sleep(10)
    new_update.run()
and the run functions is:
def run(self):
comm_thread = threading.Thread(target=
self._comm_agent.start_server_tcp_comunication)
comm_thread.start()
while True:
if (False == self.is_recovery_required()):
self.calculate_free_storage_for_update_zip_extraction()
self.start_communication_with_client_in_state_machine()
self._comm_agent.disable_synchronized_communication()
self.start_update_install()
self._comm_agent.enable_synchronized_communication()
if (True == self.is_dry_run_requested()):
self.preform_cleanup_after_dry_run()
else:
self.reset_valid_states()
self.preform_clean_up_after_update_cycle()
I use 2 multiprocessing.Queue() objects to sync between the threads and with the user: one for incoming messages and one for outgoing messages.
At first, TCP communication is synchronous: the user provides the installation file and a few other things.
Once installation begins, TCP communication is no longer synchronous.
During the installation I use 4 different install methods, and all but one work just fine (the user can poll the update_manager process, ask progress questions and get an immediate reply).
The problematic one is the installation of rpm files. For this I tried calling os.system() and subprocess.run(), and it works, but for big rpm files I noticed the entire process with my threads freezes until the call finishes (I can see the progress bar of the rpm installation on my screen during this freeze).
What I noticed and tried:
1. There is no freeze during the other installation methods, which use Python libraries.
2. Once the user connects via TCP there are only 2 threads for the update_manager; once the first request is sent and a reply is sent back, 2 more threads appear (I assume it has something to do with the queues I use).
3. I created a third thread that prints the time (and has nothing to do with the queues), and I start it as soon as the update_manager process starts. When the 2 threads freeze, this one keeps going.
4. On some rare occasions the process will unfreeze just long enough for a message to go through from the client to the update_manager, then freeze back.
Edit: I forgot one more important point.
5. The freeze occurs when calling:
os.system("rpm --nodeps --force -ivh rpm_file_name")
But does not happen when calling:
os.system("sleep 5")
I would really appreciate some insight. Thanks.
The problem was with the incoming queue.
I used:
if (True == self._outgoing_message_queue.empty()):
    temp: dict = self._outgoing_message_queue.get()
This is a simple bug: the thread just got stuck on an empty queue.
But even if the code is changed to
if (False == self._outgoing_message_queue.empty()):
    temp: dict = self._outgoing_message_queue.get()
it might cause the same behavior, because between the moment the if statement is evaluated and the moment get() is called, a context switch might occur, the queue might become empty, and the thread would get stuck on .get() just as in my original code.
A better solution is to use get_nowait() and catch queue.Empty:
try:
    temp = self._outgoing_message_queue.get_nowait()
except queue.Empty:
    temp = None
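An equivalent option is a blocking get with a timeout, which bounds how long the thread can block instead of returning immediately; in both cases the exception raised is queue.Empty, so a bare except should be avoided. A small self-contained demonstration with a plain queue.Queue (the behavior is the same for multiprocessing.Queue):

```python
import queue

outgoing = queue.Queue()
outgoing.put({"status": "ok"})

# First attempt: an item is available, so get() returns at once.
try:
    temp = outgoing.get(timeout=0.1)  # wait at most 100 ms
except queue.Empty:
    temp = None
print(temp)  # {'status': 'ok'}

# Second attempt: the queue is now empty, so after 100 ms
# queue.Empty is raised and we fall back to None.
try:
    temp = outgoing.get(timeout=0.1)
except queue.Empty:
    temp = None
print(temp)  # None
```

Catching queue.Empty specifically (rather than a bare except) keeps genuine errors, such as a typo inside the try block, from being silently swallowed.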
Currently I'm trying to use proper threading to execute a bunch of scripts.
They are organized like this:
Main thread (runs the Flask app)
-Analysis thread (runs the analysis script which invokes all needed scripts)
-3 different functions executed as threads (divided into 3 parts so the analysis runs quicker)
My problem is that I have a global variable holding the analysis thread so that, after the call, I can determine whether the thread is running or not. The first time, it starts and runs just fine. You can then call that endpoint as often as you like and it won't do anything, because I return a 423 to state that the thread (the analysis) is still running. After all scripts are finished, the if clause with analysis_thread.isAlive() returns False as it should and tries to start the analysis again with analysis_thread.start(), but that doesn't work: it throws an exception saying the thread is already active and can't be started twice.
Is there a way to achieve that the script can be started, returns another code while it is running, and can be started again once it is finished?
Thanks for reading and for all your help
Christoph
The now hopefully working solution is to never stop the thread and just let it wait.
In the analysis script I have a global variable which indicates the status; it is set to False by default.
Inside the function it runs two while loops:
while True:
    while not thread_status:
        time.sleep(30)
    # execution of the other scripts
    thread_status = False  # to ensure the execution runs just once
I then just set the flag to True from the controller class so it starts executing.
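The sleep-and-poll flag can also be replaced with a threading.Event, which wakes the worker immediately instead of up to 30 seconds later. A sketch with a stand-in analysis function and a demo-only exit condition (the real worker would loop forever):

```python
import threading
import time

start_requested = threading.Event()
results = []

def analysis_worker():
    for _ in range(2):               # two runs, then exit (demo only)
        start_requested.wait()       # sleeps until the controller sets the event
        start_requested.clear()      # so the analysis runs once per request
        results.append("analysis done")  # stand-in for the real scripts

worker = threading.Thread(target=analysis_worker)
worker.start()

start_requested.set()                # controller triggers the first run
while len(results) < 1:
    time.sleep(0.01)                 # the endpoint would return 423 here
start_requested.set()                # second run, once the first has finished
worker.join()
print(results)
```

In the Flask endpoint, `start_requested.is_set()` (or a separate "busy" Event set by the worker) can serve as the check for returning 423, replacing the `isAlive()` test that only works for a thread started once.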
In my application, I have a process which runs on another thread, takes a few seconds to complete, and only needs to be done once. I also have a loading window which lets users know that the application is still running and lets them cancel the process. This loading window calls a function every 0.5 s to update the message: Processing., Processing.. or Processing... in a cycle.
The problem I have is that the computing time increases significantly with the loading window. Here are the 2 different implementations:
Without loading window:
processing_thread.start()
processing_thread.join()
With loading window:
processing_thread.start()
loading_window = LoadingWindow()
while processing_thread.is_alive():
    try:
        loading_window.window.update_idletasks()
        loading_window.window.update()
    except TclError:
        return
Note that I don't use mainloop but an equivalent implementation which lets me check whether my process is still running - a bit like join and mainloop merged together (Tkinter understanding mainloop). I also tested it with mainloop() and it still didn't reduce the processing time significantly.
For now, the quick fix was to slow down the loop and add more idle time in the main thread:
processing_thread.start()
loading_window = LoadingWindow()
while processing_thread.is_alive():
    try:
        loading_window.window.update_idletasks()
        loading_window.window.update()
        time.sleep(0.5)
    except TclError:
        return
This reduced the time to something similar to what I get without the loading window, but it brings 2 problems (as far as I can see):
Response time is slower (0.5 s at worst)
The application ends some time after the process ended (0.5 s at worst)
Is there a way to implement this without these drawbacks?
Would multiprocessing (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing) solve this?
Thank you