Python killed on GCP - python

I have been comparing how deep learning code runs on my local machine and on Google Cloud Platform.
The code implements a recurrent neural network, and it runs perfectly well on my local machine.
But in GCP Cloud Shell, when I try to run my Python file, it shows "Killed":
userID@projectID:~$ python rnn.py
Killed
Is it because I am out of memory? (I tried running the code line by line, and the second time I assigned large data to a variable, it got stuck.)
My code is somewhat like this:
imdb = np.load('imdb_word_emb.npz')
X_train = imdb['X_train']
X_test = imdb['X_test']
On the third line, the machine got stuck and showed "Killed".
I tried swapping the second and third lines, and it still got stuck at the third line.
My training data is a (25000, 80, 128) array, and so is my testing data. The data set works perfectly well on my local machine, so I am sure there is no problem with the data set itself.
Or is there another reason?
It would be awesome if anyone who knows how to solve this, or who can suggest even a few keywords to search for, could tell me how to deal with it. Thank you :D

The error occurs because Cloud Shell is not intended for computationally or network-intensive processes; see Cloud Shell limitations.
I understand you want to compare your local machine with Google Cloud Platform. As stated in the public docs:
"When you start Cloud Shell, it provisions a g1-small Google Compute Engine"
A g1-small machine type has 1.70 GB of RAM and a shared physical core. Given this, and given that Cloud Shell is limited as stated above, your local machine is likely more powerful than Cloud Shell, so you would not see any improvement.
I recommend creating a Compute Engine instance with a different machine type; you can use a custom machine type to set the number of cores and the amount of RAM you want. I assume you want your workload to run faster on Google Compute Engine, so choose a machine type with more resources than your local machine and compare how much it improves.
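To check the out-of-memory explanation against the numbers in the question, here is a rough back-of-the-envelope sketch. The element type of the arrays is not stated in the question, so float64 and float32 are both assumptions, and the 1.70 GB figure is treated as decimal gigabytes.

```python
from math import prod

# Shape reported in the question for both X_train and X_test.
SHAPE = (25000, 80, 128)

# RAM of a g1-small as quoted in the answer (assumed to mean decimal GB).
CLOUD_SHELL_RAM_BYTES = 1.70e9


def array_bytes(shape, itemsize):
    """Bytes needed for a dense array with this shape and element size."""
    return prod(shape) * itemsize


if __name__ == "__main__":
    # The dtype is an assumption: 8 bytes for float64, 4 bytes for float32.
    for dtype_name, itemsize in (("float64", 8), ("float32", 4)):
        size = array_bytes(SHAPE, itemsize)
        print(
            f"{dtype_name}: {size / 1e9:.3f} GB per array, "
            f"{2 * size / 1e9:.3f} GB for train + test"
        )
```

Under either assumption, the two arrays together need more than 1.70 GB, which would be consistent with the process being killed when the second array is loaded.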

Related

Limit number of cores used on server for tensorflow 2 and keras

I am trying to run a Python script that trains several neural networks using TensorFlow and Keras. The problem is that I cannot restrict the number of cores used on the server, even though it works on my local desktop.
The basic structure is that I have defined a function run_net that runs the neural net. This function is called with different parameters in parallel using joblib (see below). Additionally, I have tried running the function iteratively with different parameters which didn't solve the problem.
Parallel(n_jobs=1, backend="multiprocessing")(
delayed(run_net)
If I run that on my local Windows Desktop, everything works fine. However, if I try to run the same script on our institute's server with 48 cores and check CPU usage using htop command, all cores are used. I already tried setting n_jobs in joblib Parallel to 1 and it looks like CPU usage goes to 100% once the tensorflow models are trained.
I already searched for different solutions and the main one that I found is the one below. I define that before running the parallel jobs shown above. I also tried placing the code below before every fit or predict method of the model.
NUM_PARALLEL_EXEC_UNITS = 5
config = tf.compat.v1.ConfigProto(
intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
inter_op_parallelism_threads=2,
device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)
session = tf.compat.v1.Session(config=config)
K.set_session(session)
At this point, I am quite lost and have no idea how to make Tensorflow and/or Keras use a limited number of cores as the server I am using is shared across the institute.
The server is running linux. However, I don't know which exact distribution/version it is. I am very new to running code on a server.
These are the versions I am using:
python == 3.10.8
tensorflow == 2.10.0
keras == 2.10.0
If you need any other information, I am happy to provide that.
Edit 1
Neither the answer suggested in this thread nor using only these commands works:
tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)
After trying some things, I have found a solution to my problem. With the following code, I can restrict the number of CPUs used:
import os
# Set the OpenMP limit before importing TensorFlow so it is in place when the library initializes.
os.environ["OMP_NUM_THREADS"] = "5"
import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)
Note that I have no idea how many CPUs will be used in the end. I noticed that more than five cores are being used. Since I don't care about the exact number of cores, only that I don't use all of them, I am fine with this solution for now. If anybody knows how to calculate the number of cores used from the information above, let me know.
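If a hard upper bound on the number of cores matters, one alternative (a different technique from the thread settings above, not something the answer used) is to pin the process to a fixed set of CPUs with `os.sched_setaffinity`. This is a minimal sketch: the call exists only on Linux, the helper name `limit_to_cores` is hypothetical, and it is assumed to be called at the very start of the script, before TensorFlow is imported, so that threads created later inherit the restriction.

```python
import os


def limit_to_cores(n):
    """Restrict the calling process to at most n of its currently allowed CPUs.

    Returns the set of CPU ids the process may run on afterwards. On platforms
    without sched_setaffinity (anything but Linux) nothing is changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        # Fallback for non-Linux systems: report all CPUs, change nothing.
        return set(range(os.cpu_count() or 1))
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:max(1, n)])
    os.sched_setaffinity(0, chosen)  # 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("Allowed CPUs:", sorted(limit_to_cores(5)))
```

With this in place, htop should show load only on the chosen cores, regardless of how many threads the libraries start.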

Why Might Python Be Running Slow

I am trying to run a program I created that uses a neural network to predict stock prices. I am trying to run it for a number of different stocks. I am running the exact same code on both my desktop and my laptop.
At first I was running the code only on my desktop, and it was running very slowly. I thought it was just because of the number of calculations the neural network has to make. However, I then started running the code on my laptop as well, to be able to run it for two stocks at the same time.
The code runs much faster on my laptop (I would estimate about 20x faster), even though my desktop has a much better processor, GPU, etc. I am also using the same size data set for each run.
I added the lines of code:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
so that Python uses my CPU and not my GPU. I am not sure if that makes a difference.
Any idea why this might be?

tcmalloc: large alloc python in Google Colab

I was trying to apply a deep learning algorithm (CNN) in Python, but after splitting the data into training and testing sets and transforming the time series into images, my Colab notebook crashed and restarted itself.
It gives an error like "Your session crashed after using all RAM", and when I checked app.log I saw something about tcmalloc: large alloc. I didn't find anything to fix this crash.
Do you have any idea how to prevent this warning and fix this situation?
Your session ran out of available RAM. You can purchase Colab Pro to get extra RAM, or you can use a machine with more RAM and run the neural network there.

Python compiler call another python compiler to execute a script (execute a script from one independent machine to another)

I know the question title is weird!
I have two virtual machines. The first one has limited resources, while the second one has enough resources, just like a normal machine. The first machine will receive a signal from an external device. This signal will trigger a python compiler to execute a script. The script is big, and the first machine does not have enough resources to execute it.
I can copy the script to the second machine to run it there, but I can't make the second machine receive the external signal. I am wondering if there is a way to make the compiler on the first machine (once the external signal is received) call the compiler on the second machine, so that the compiler on the second machine executes the script using the second machine's resources. Please see the attached image.
Assume that the connection is established between the two machines, that they can see each other, and that the second machine has a copy of the script. I just need the commands that pass the execution to the second machine and make it use its own resources.
You should look into the microservice architecture to do this.
You can achieve this either by using flask and sending server requests between each machine, or something like nameko, which will allow you to create a "bridge" between machines and call functions between them (seems like what you are more interested in). Example for nameko:
Machine 2 (executor of resource-intensive script):
from nameko.rpc import rpc

class Stuff(object):
    # nameko services need a name; Machine 1 addresses the service by it.
    name = "Stuff"

    @rpc
    def example(self):
        return "Function running on Machine 2."
You would run the above script through the Nameko shell, as detailed in the docs.
Machine 1:
from nameko.standalone.rpc import ClusterRpcProxy

# This is the AMQP server that Machine 2 would be running (example value).
AMQP_URI = "pyamqp://guest:guest@localhost"
config = {
    'AMQP_URI': AMQP_URI
}
with ClusterRpcProxy(config) as cluster_rpc:
    cluster_rpc.Stuff.example()  # Function running on Machine 2.
More info here.
Hmm, there are many approaches to this problem.
If you want a Python-only solution, you can check out
dispy http://dispy.sourceforge.net/
or Dask: https://dask.org/
If you want a robust solution (what I use on my home computing cluster, but IMO overkill for your problem), you can use
SLURM. SLURM is basically a way to string multiple computers together into a "supercomputer". https://slurm.schedmd.com/documentation.html
For a semi-quick, hacky solution, you can write a microservice. Essentially, your "weak" computer receives the message and then sends an HTTP request to your "strong" computer. Your strong computer contains the actual program, computes the results, and passes them back to your "weak" computer.
Flask is an easy and lightweight solution for this.
All of these solutions require some type of networking. At the least, the computers need to be on the same LAN or both be reachable over the web.
There are many other approaches not mentioned. For example, you can export an NFS (Network File System) share and have one computer put a file in the shared folder and the other computer perform work on the file. I'm sure there are plenty of other contrived ways to accomplish this task :). I'd be happy to expand on a particular method if you want.
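The microservice idea above can be sketched with the standard library alone. This uses `http.server` and `http.client` rather than Flask so that it runs without extra packages; the function `heavy_job`, the `/run` path, and the JSON payload format are all hypothetical placeholders for the real script and its inputs.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def heavy_job(payload):
    """Hypothetical stand-in for the resource-intensive script."""
    return {"sum": sum(payload["numbers"])}


class JobHandler(BaseHTTPRequestHandler):
    """Runs on the "strong" machine: executes the job for each POST request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(heavy_job(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_strong_machine(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), JobHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def trigger_from_weak_machine(host, port, payload):
    """Runs on the "weak" machine when the external signal arrives."""
    conn = HTTPConnection(host, port, timeout=10)
    try:
        conn.request(
            "POST", "/run",
            body=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

In a real setup, `start_strong_machine` would run on the second machine bound to an address the first machine can reach, and the first machine would call `trigger_from_weak_machine` from whatever code handles the external signal.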

Spawning new robot in running ROS Gazebo simulation

The problem is simulating both a 'car' and a quadcopter in ROS Gazebo SITL, as mentioned in this question. Two possibilities have been considered, as depicted in the image.
(Option 1 uses 6 terminals with independent launch files and MAVproxy initiation terminals)
While searching for Option 1, I found the documentation to be sparse (the idea is to launch the simulation with ErleRover and then spawn ErleCopter on the go; I haven't found any official documentation mentioning either the possibility or the impossibility of this option). Could somebody let me know how Option 1 can be achieved, or why it is impossible, citing the corresponding official documentation?
Regarding option 2, additional options have been explored; The problem is apparently with two aspects: param vs rosparam and tf2 vs tf_prefix.
Some attempts at simulating multiple turtlebots have used tf_prefix, which is deprecated. I have been unable to find any example that uses tf2 while simulating multiple (different) robots, although tf2 works on ROS Hydro (and thus Indigo). Another possible option is to use rosparam instead of param (only). But the documentation on using it for multi-robot simulation is sparse, and I have been able to find only one example (for a single robot, Husky).
But one thing is clearer: MAVproxy can support multiple robots through the SYSID and component-ID parameters (up to 255 robots, with 0 being a broadcast ID). Thus, port numbers have to be modified (possibly 14000 and 15000, as each vehicle uses 4 consecutive ports), just like in the UCTF simulation (vehicle_base_port = VEHICLE_BASE_PORT + mav_sys_id*4).
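As a worked example of the port formula just quoted, here is a small sketch. The value 14000 for VEHICLE_BASE_PORT is an assumption taken from the "possibly 14000" guess above, not a confirmed setting of the UCTF project, and the helper names are hypothetical.

```python
# Assumed base port; the question only says "possibly 14000 and 15000".
VEHICLE_BASE_PORT = 14000
PORTS_PER_VEHICLE = 4  # each vehicle uses 4 consecutive ports


def vehicle_base_port(mav_sys_id):
    """First port for a vehicle: VEHICLE_BASE_PORT + mav_sys_id * 4."""
    if not 1 <= mav_sys_id <= 255:
        # 0 is the broadcast ID, so usable system IDs are 1..255.
        raise ValueError("mav_sys_id must be between 1 and 255")
    return VEHICLE_BASE_PORT + mav_sys_id * PORTS_PER_VEHICLE


def vehicle_ports(mav_sys_id):
    """All consecutive ports assigned to one vehicle."""
    base = vehicle_base_port(mav_sys_id)
    return list(range(base, base + PORTS_PER_VEHICLE))


if __name__ == "__main__":
    # Example: system ID 1 for the rover and 2 for the copter (arbitrary choice).
    for name, sys_id in (("rover", 1), ("copter", 2)):
        print(name, vehicle_ports(sys_id))
```

Giving each vehicle its own system ID in this way keeps the port ranges from overlapping.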
To summarise the question, the main concern is to simulate an independent car moving around and an independent quadcopter flying around in the ROS Gazebo SITL (maybe using Python nodes; C++ is fine too). Could somebody let me know the answers to the following sub-questions?
Is this kind of simulation possible? (Either by using ROS Indigo, Gazebo 7, MAVproxy 1.5.2 on Ubuntu 14.04, or by modifying the UCTF project to spawn a car like ErleRover if there is no other option)
(You are kindly requested to let me know the examples if possible and official links if this is impossible)
If on-the-go launch is not possible with two launch files, is it possible to launch two different robots with a single launch file?
This is an optional question: How to modify the listener (subscriber) of the node? (Is it to be done in the Python node?)
This simulation is taking a relatively long time, with the system software crashing about 3 times (NVIDIA instead of Nouveau, broken packages, etc.), and any help will be whole-heartedly, gratefully and greatly appreciated. Thanks for your time and consideration.
Prasad N R
