I run CNN code in a Colab notebook, and it takes a long time. However, the connection always breaks after 2 or 3 hours of running and cannot be reconnected. I was told the Colab virtual machine only breaks the connection after 12 hours without any activity, so how can I avoid restarting my code after the connection breaks, or is there an easier way to control Colab?
Colab disconnects the runtime after 2-3 hours of inactivity, and you then need to reconnect it. To avoid this, simply keep showing some activity on the runtime.
I'm trying to make a program that checks the bitcoin price every 5 minutes and emails me when certain conditions are met, using the schedule module in the PyCharm IDE, but this slows down my computer and stops whenever I need to restart it or let Windows update.
Here's what I've got so far:
import schedule
import time

def job():
    # placeholder for the real price check / email logic
    print("I'm working...")

# run job() every 5 minutes
schedule.every(5).minutes.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
Is there any way I can have this run on an external system that doesn't lose its connection to wifi and always runs? Are there other modules that run on interrupts / don't slow the computer down?
I think the best answer to your problem doesn't actually involve code.
What you might be looking for is something called a Raspberry Pi. It's a small, credit-card-sized computer that doesn't cost very much; I think it starts at about 30-40 bucks.
If you can handle the Linux system, which I actually think is easier than Windows and similar to Mac, you can set up the Raspberry Pi as a computer whose sole purpose is to run your program.
It won't slow down your computer because it'll be running on a different system. As for the wifi, there's no real way to get around that unless you have multiple service providers, or unless you set the program to also monitor its own connectivity. If you do this, there might be some way to at least send a notification over the local network to your computer or phone, depending on how comfortable you are with building local servers.
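For illustration, a connectivity check in Python could look something like this (the host, port, and interval are just example values, not anything your setup requires):
import socket
import time

def online(host="8.8.8.8", port=53, timeout=3):
    # True if a TCP connection to host:port succeeds within `timeout` seconds
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

while True:
    if not online():
        print("wifi looks down - send a local-network notification here")
    time.sleep(300)  # check every 5 minutes, like the price check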
Hope that helps!
How come after 2 hours of running a model, I get a popup window saying:
Runtime disconnected
The connection to the runtime has timed out.
CLOSE RECONNECT
I had restarted my runtime and thought I had 12 hours to train a model. Any idea how to avoid this? My other question: is it possible to find out the time left before the runtime gets disconnected using a TF or Python API?
The runtime gets disconnected when the notebook stays in "idle" mode for longer than about 90 minutes. This is an unofficial number, as Google Colab has made no official statement about it; Colab only answers the question cheekily:
An extract from the Official Colab FAQ
Where is my code executed? What happens to my execution state if I close the browser window?
Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system.
So to avoid this, keep your browser open and don't let your system sleep for a time greater than 90 minutes.
This also means that if you happen to close your browser and reopen the notebook within 90 minutes, you will still have all your running processes and session variables intact!
Also, note that currently you can run a notebook for a maximum of 12 hours (in the "non-idle" state, of course).
To answer your second question: this "idle state" business is a Colab thing, so I don't think TF or Python has anything to do with it.
So it is good practice to save your models to a folder periodically. This way, in the unfortunate event of your runtime getting disconnected, your work will not be lost, and you can simply restart your training from the latest saved model!
PS: I got the number 90 minutes from an experiment done by a fellow user
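As a concrete illustration of the periodic-saving advice above: if you happen to be using tf.keras, something like the following saves a checkpoint to Drive at the end of every epoch (the Drive path and the toy model are placeholders, and Drive is assumed to be already mounted):
import os
import numpy as np
import tensorflow as tf

save_dir = "/content/drive/My Drive/checkpoints"   # placeholder path on mounted Drive
os.makedirs(save_dir, exist_ok=True)

# toy model and data, just to make the example runnable
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
x, y = np.random.rand(64, 4), np.random.rand(64, 1)

# write a checkpoint after every epoch, with the epoch number in the filename
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(save_dir, "model_epoch_{epoch:04d}.h5"),
    save_freq="epoch",
)
model.fit(x, y, epochs=5, callbacks=[checkpoint_cb])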
I am training a neural network for Neural Machine Translation on Google Colaboratory. I know that the limit before disconnection is 12 hrs, but I am frequently disconnected earlier (after 4 or 6 hrs). The training requires more than 12 hrs, so I save a checkpoint every 5000 epochs.
What I don't understand is: when I am disconnected from the runtime (a GPU is used), is the code still executed by Google on the VM? I ask because then I could simply save the intermediate models to Drive and continue the training even while I am disconnected.
Does anyone know?
Yes, for ~1.5 hours after you close the browser window.
To keep things running longer, you'll need an active tab.
I recently started using Google Colab to train my CNN model. It always needs 10+ hours to train once. But I can't stay in the same place during those 10+ hours, so I always power off my notebook and let the process keep going.
My code saves models automatically. I figured out that when I disconnect from Colab, the process is still saving models after the disconnection.
Here are the questions:
When I try to reconnect to the Colab notebook, it always gets stuck at the "INITIALIZING" stage and can't connect. I'm sure that the process is running. How do I know if the process is OVER?
Is there any way to reconnect to the ongoing process? It would be nice to be able to observe the training losses during training.
Sorry for my poor English, thanks a lot.
Output your loss results to a log file saved in your drive, and periodically check this file.
You can run your training process like this (in Colab, assign the path in Python first; the curly braces interpolate the variable into the shell command):
log_file = "/content/drive/My Drive/path/log.log"
!python train.py > "{log_file}" 2>&1
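Then, after reconnecting, you can read back the tail of the log from a new cell; a minimal sketch (the path must match the one above):
# print the last 20 lines of the training log saved on Drive
log_file = "/content/drive/My Drive/path/log.log"
with open(log_file) as f:
    print("".join(f.readlines()[-20:]))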
First question: restart the runtime from the Runtime menu.
Second question: I think you can use TensorBoard to monitor your work.
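For example, if you write TensorBoard logs to a directory on Drive, they survive a disconnect and you can reload them after reconnecting; a minimal sketch (the log path is a placeholder, and Drive is assumed to be mounted):
# in a Colab cell: log training to Drive with the tf.keras TensorBoard callback,
# then view it with the notebook TensorBoard extension
%load_ext tensorboard

import tensorflow as tf

log_dir = "/content/drive/My Drive/tb_logs"          # placeholder path
tb_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
# pass tb_cb to model.fit(..., callbacks=[tb_cb]) in your training code

%tensorboard --logdir "/content/drive/My Drive/tb_logs"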
It seems there's no normal way to do this. But you can save your model to Google Drive with the current training epoch number in the filename, so when you see something like "my_model_epoch_1000" on your Google Drive, you will know that the process is over.
For a flood warning system, my Raspberry Pi rings the bells in near-real-time but uses a NAS drive as a postbox to output data files to a PC for slower-time graphing and reporting, and to receive various input data files back. Python on the Pi takes precisely 10 seconds to establish that the NAS drive right next to it is not currently available. I need that to happen in less than a second for each access attempt, otherwise the delays add up and the Pi fails to refresh the hardware watchdog in time. (The Pi performs tasks on a fixed cycle: every second, every 15 seconds (watchdog), every 75 seconds and every 10 minutes.)
All disc access attempts are preceded by tests with try-except. But try-except doesn't help, as tests like os.path.exists() or with open() both take 10 seconds before raising the exception, even when the NAS drive is powered down. It's as though there's a 10-second timeout way down in the comms protocol rather than up in the software.
Is there a way of telling try-except not to be so patient? If not, how can I get a more immediate indicator of whether the NAS drive is going to hold up the Pi at the next read/write, so that the Pi can give up and wait till the next cycle? I've done all the file queueing for that, but it's wasted if every check takes 10 seconds.
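For illustration, the kind of bounded check I'm after would look something like this (the mount point is a placeholder; the idea is to push the possibly slow os.path.exists() call into a worker thread and give up waiting after one second):
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError

NAS_PATH = "/mnt/nas/postbox"          # placeholder mount point
_executor = ThreadPoolExecutor(max_workers=1)

def nas_available(path=NAS_PATH, timeout=1.0):
    # True if the NAS path answers within `timeout` seconds, else False.
    # Note: a hung check still blocks the worker thread in the background,
    # but the time-critical loop no longer waits for it.
    future = _executor.submit(os.path.exists, path)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        return False

if not nas_available():
    print("NAS not reachable - queue the file and try again next cycle")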
Taking on MQTT at this stage would be a big change to this nearly-finished project. But your suggestion of decoupling the near-real-time Python from the NAS drive by using a second script is, I think, the way to go. If the Python disc interface commands wait 10 seconds for an answer, I can't help that. But I can stop that holding up the time-critical Python functions by keeping all time-critical file accesses local in Pi memory, and replicating whole files in both directions between the Pi and the NAS drive whenever they change. In fact I already have the opportunistic replicator code in Python - I just need to move it out of the main time-critical script into a separate script that will replicate the files, roughly in the shape sketched below. The replicator script will do any waiting, rather than the time-critical script, and the Pi scheduler will decouple the two scripts for me. Thanks for your help - I was beginning to despair!
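For the record, the replicator outline I have in mind is along these lines (directory names and the interval are placeholders; the real version would live in its own script run by the scheduler):
import os
import shutil
import time

LOCAL_DIR = "/home/pi/postbox"       # fast local copy the time-critical script uses
NAS_DIR = "/mnt/nas/postbox"         # may be slow or absent

def sync(src, dst):
    # copy any regular file that is missing or newer on the src side
    for name in os.listdir(src):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if not os.path.isfile(s):
            continue
        if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
            shutil.copy2(s, d)

while True:
    try:
        sync(LOCAL_DIR, NAS_DIR)     # outbound: Pi -> NAS
        sync(NAS_DIR, LOCAL_DIR)     # inbound: NAS -> Pi
    except OSError:
        pass                         # NAS unavailable; this script absorbs the wait
    time.sleep(60)                   # placeholder interval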