I have a question concerning access to a Ray multi-node cluster:
Given a cluster deployed on AWS or Kubernetes, how can a
separate Python process from outside the cluster run tasks on the
cluster?
e.g. a web client tries to invoke Python tasks that should run within the remote cluster.
EDIT: It comes down to the question:
Is there a Ray native API to connect to a remote cluster from a computer/server outside of the actual cluster?
https://ray.readthedocs.io/en/latest/deploy-on-kubernetes.html#running-ray-programs says you can get a shell on the ray pods and run the tasks or create a new Kubernetes job.
So the question boils down to how do I connect to the Kubernetes cluster from a web application.
You can connect to the Kubernetes cluster from a web application using one of the clients in https://github.com/kubernetes-client. Here is the Python one https://github.com/kubernetes-client/python
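For example, here is a minimal sketch using the official Python client that submits a Kubernetes Job whose container runs a Ray driver script inside the cluster (the image name, script path, Job name, and namespace below are placeholders, not anything defined in the question):
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() if the web app itself runs in-cluster

# Hypothetical Job: runs a Ray driver script baked into a Ray image.
job = client.V1Job(
    metadata=client.V1ObjectMeta(name="ray-driver-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="ray-driver",
                        image="rayproject/ray:latest",
                        command=["python", "/app/my_ray_script.py"],
                    )
                ],
            )
        )
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)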
You can connect to remote Ray servers (e.g. running in a different container or on a different machine within the same cluster [1]) on the client-server port (10001 by default) rather than the Ray Core (GCS) server port (6379 by default), using the standard ray.init(), as long as you prefix the URL or IP with ray://:
For example:
import ray
ray_cluster_ip_or_url = "<remote url redacted>"
# ray_api_server_port = 6379 # Ray API (GCS) server port
ray_client_server_port = 10001 # Ray client-server port
# caution: it is necessary to prefix the URL or IP with "ray://"
# to be able to connect to the client-server port
ray.init(
    address=f"ray://{ray_cluster_ip_or_url}:{ray_client_server_port}",
    ignore_reinit_error=True,
)
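Once connected, remote functions are submitted exactly as they would be from a driver inside the cluster; a minimal sketch continuing from the ray.init() call above:
@ray.remote
def square(x):
    return x * x

# These tasks execute on the remote cluster, not in the local process.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]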
More info
This alternative method does not require the ray:// prefix, but is deprecated:
[..]
ray.util.connect(conn_str=f"{ray_cluster_ip_or_url}:{ray_client_server_port}", secure=False)
[1] The restriction of client-server connections to the same k8s cluster is due to the binary protocols used by Ray, which cannot be passed through an HTTP proxy outside of the cluster. This is why you need to connect to service ports, not standard HTTP(S) ports.
Related
I have a business case where I need to access a clustered Redis cache in one AWS account (account A) from another account (account B).
I have used the solution described in the link below and, for the most part, it works: Base Solution
The base solution works fine if I access the clustered Redis via redis-py; however, if I try to use it with redis-py-cluster it fails.
I am testing all this in a staging environment where the Redis cluster has only one node, but in the production environment it has two nodes, so the redis-py approach will not work for me.
Below is my sample code (using redis 3.5.3 and redis-py-cluster 2.1.3):
from redis import Redis
from rediscluster import RedisCluster


def check_redis():
    respCluster = 'error'
    respRegular = 'error'
    host = "vpce-XXX.us-east-1.vpce.amazonaws.com"
    port = 6379

    # Cluster-mode client (redis-py-cluster)
    try:
        ru = RedisCluster(startup_nodes=[{"host": host, "port": port}],
                          decode_responses=True, skip_full_coverage_check=True)
        respCluster = ru.get('ABC')
    except Exception as e:
        print(e)

    # Plain single-node client (redis-py)
    try:
        ru = Redis(host=host, port=port, decode_responses=True)
        respRegular = ru.get('ABC')
    except Exception as e:
        print(e)

    return {"respCluster": respCluster, "respRegular": respRegular}
The above code works perfectly in account A but in account B the output that I got was
{'respCluster': 'error', 'respRegular': '123456789'}
And the error that I am getting is
rediscluster.exceptions.ClusterError: TTL exhausted
In account A we are running this with AWS ECS + EC2 + Docker, and
in account B we are running the code in an AWS EKS Kubernetes pod.
What should I do to make redis-py-cluster work in this case? Or is there an alternative to redis-py-cluster in Python for accessing a multi-node Redis cluster?
I know this is a highly specific case, any help is appreciated.
EDIT 1: Upon further research, it seems that "TTL exhausted" is a generic error; in the logs the initial error is
redis.exceptions.ConnectionError:
Error 101 connecting to XX.XXX.XX.XXX:6379. Network is unreachable
Here XX.XXX.XX.XXX is the IP of the Redis cluster in account A.
This is strange, since redis-py also connects to the same IP and port,
so this error should not occur.
So it turns out the issue was due to how redis-py-cluster manages hosts and ports.
When a new redis-py-cluster object is created, it gets a list of host IPs from the Redis server (i.e. the Redis cluster host IPs from account A), after which the client tries to connect to those hosts and ports.
In normal cases this works, because the initial host and the IPs in the response are one and the same (i.e. the host and port supplied at the time of object creation).
In our case, the object-creation host and port come from the DNS name of the VPC endpoint service in account B.
This leads to the code trying to access the actual IPs from account A instead of the DNS name from account B.
The issue was resolved using host-port remapping, where we mapped the IPs returned by the Redis server in account A to the DNS name of account B's endpoint service.
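For reference, a minimal sketch of what such a remapping can look like with redis-py-cluster's host_port_remap option (added in the 2.1.x releases); the key names and all hosts/IPs below are assumptions based on that option's documentation and should be checked against your installed version:
from rediscluster import RedisCluster

# Placeholder values: the IP the account-A cluster advertises internally,
# and the VPC endpoint service DNS name reachable from account B.
remap = [
    {
        "from_host": "10.0.0.12",   # IP returned by the cluster in account A (placeholder)
        "from_port": 6379,
        "to_host": "vpce-XXX.us-east-1.vpce.amazonaws.com",  # endpoint DNS name in account B
        "to_port": 6379,
    }
]

ru = RedisCluster(
    startup_nodes=[{"host": "vpce-XXX.us-east-1.vpce.amazonaws.com", "port": 6379}],
    host_port_remap=remap,
    decode_responses=True,
    skip_full_coverage_check=True,
)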
Based on your comment:
this was not possible because the VPCs in Account-A and Account-B had the same CIDR range. Peered VPCs can’t have the same CIDR range.
I think what you are looking for is impossible. Routing within a VPC always happens first - before any route tables are considered at all. Said another way, if the destination of a packet lies within the sending VPC, it will never leave that VPC: AWS will route it within its own VPC, even if the IP isn't in use there at the time.
So, if you are trying to communicate with another VPC that has the same IP range as yours, even if you specifically add a route to egress that traffic elsewhere, the rule will be silently ignored and AWS will try to deliver the packet within the originating VPC, which seems like it is not what you are trying to accomplish.
I am running a Dask script on an EC2 instance from AWS. I would like to connect to and see the dashboard provided by Dask, but I can't figure out how.
I am creating a LocalCluster on my EC2 instance; the script runs fine and I connect to the AWS instance via PuTTY. However, I would like to see the dashboard: on my PC it is enough to connect to the provided IP and port, but I am not able to do that on the AWS machine.
Once the script is running, this is my output for the "parameters" of the local cluster:
<Client: 'inproc://172.31.29.4/7475/1' processes=1 threads=8, memory=27.94 GiB>
LocalCluster(b8be08dd, 'inproc://172.31.29.4/7475/1', workers=1, threads=8, memory=27.94 GiB)
dashboard address: {'dashboard': 8787}
For example, I tried opening 172.32.29.4:8787/status in my browser, but I wasn't able to connect to the dashboard.
I already checked this question: How to view Dask dashboard when running on a virtual machine? However, I am using a LocalCluster and I would like to connect to its dashboard from a remote machine. Is it possible? If so, how?
The answer is in the comments, but I will type it out here, so that the original question looks "answered".
You need two things to connect to a port on an EC2 machine: the external IP, and access. The former is most easily found from the AWS console. For the latter, you typically need to edit the security group to add an inbound TCP rule for the port (either open to the world, or just your IP). There are other ways to do this part, depending on whether your machine is inside a VPC, has any custom gateways or routers... but if you don't know what that means, find the security group first. Both the public IP and the security group will be linked from the machine's row in the EC2 "running instances" list.
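If you also want to be explicit about where the dashboard binds, here is a minimal sketch (8787 is Dask's default dashboard port; it assumes the security group already allows inbound TCP on that port):
from dask.distributed import Client, LocalCluster

# Bind the dashboard to all interfaces on port 8787 so it is reachable at
# http://<public-ip>:8787/status once the security group permits the traffic.
cluster = LocalCluster(dashboard_address=":8787")
client = Client(cluster)
print(client.dashboard_link)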
I've set up dask-labextension visualizations to provide this type of UI.
Create the client object:
from dask.distributed import Client
client = Client()
Then click the magnifying glass provided by the extension to automatically connect with the cluster.
The detailed instructions are in this post.
I have a single-node Kubernetes cluster and a watcher which watches services, pods and endpoints. The goal of the watcher is to watch for changes in service endpoints, get the IPs from the endpoints, and register them as members in HAProxy for load balancing.
I am able to do that. Now I want to update the service's external IP with the IP on which HAProxy is listening for requests. Is it possible to update the external IP from the watcher?
Note: I have written the watcher in Python.
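Since the watcher is already in Python, one way this could look is a patch of the Service's spec via the official Kubernetes client; a minimal sketch, where the service name, namespace and IP are placeholders rather than anything from the question:
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when the watcher runs inside the cluster
v1 = client.CoreV1Api()

# Placeholder values: the Service being watched and the HAProxy listener IP.
patch = {"spec": {"externalIPs": ["10.0.0.50"]}}
v1.patch_namespaced_service(name="my-service", namespace="default", body=patch)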
I am using a library (ShareDB) for operational transformation, and the server and client side use websocket-json-stream to communicate. However, ShareDB is being run as a Node.js service (I'm using zerorpc to control my Node processes), as my main web framework is Tornado (Python). I understand from this thread that with a stateful protocol such as TCP, connections are differentiated by the client port (so only one server port is required). And according to this response regarding how websockets handle multiple incoming requests, there is no difference in the underlying transport channel between TCP and websockets.
So my question is: if I create a websocket from the client to the Python server, and then also from the client to my Node.js code (the ShareDB service), how can the server differentiate which socket goes with which? Is it the server's responsibility to have only a single socket listening for a connection at a given time (i.e. to first establish communication with the Python server and then start listening for the second websocket)?
The simplest way to run two server processes on the same physical server box is to have each of them listen on a different port and then the client connects to the appropriate port on that server to indicate which server it is trying to connect to.
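As a minimal sketch of that idea (using the third-party websockets package; the hostname, the ports 8888 and 8080, and the paths are placeholders for wherever the Tornado server and the ShareDB service actually listen), the client simply opens one socket per server, and each server only ever sees the connections made to its own port:
import asyncio
import websockets

async def main():
    # One connection per backend; the destination port is what tells them apart.
    async with websockets.connect("ws://example.com:8888/app") as tornado_ws, \
               websockets.connect("ws://example.com:8080/sharedb") as sharedb_ws:
        await tornado_ws.send("hello tornado")
        await sharedb_ws.send("hello sharedb")

asyncio.run(main())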
If you can only have one incoming port due to your server environment, then you can use something like a proxy. You still have your two servers listening on different ports, but neither one is listening on the port that is open to the outside world. The proxy listens on the one incoming port that is open to the outside world and then based on some characteristics of the incoming connection, the proxy directs that incoming connection to the appropriate server process.
The proxy can be configured to identify which process you are trying to connect to either via the URL or the DNS hostname.
I've included more detail below, but the question I'm trying to answer is in the title. I'm currently trying to figure this out, but thought I'd ask here first in case anyone knows the answer off-hand.
About my setup
I have a Kubernetes service running on a Google Compute Engine cluster (started via Google Container Engine). It consists of a service (for the front-end stable IP), a replication controller, and pods running a Python server. The server is a Python gRPC server sleep-listening on a port.
There are 2 pods (2 replicas specified in the replication controller), one rc, one service, and 4 GCE instances (set to autoscale up to 5 based on CPU).
I'd like the service to be able to handle an arbitrary number of clients that want to stream information. However, I'm currently seeing that the service only talks to 16 of the clients.
I'm hypothesizing that the number of connections is either limited by the number of GCE instances I have, or by the number of pods. I'll be doing experiments to see how changing these numbers affects things.
Figured it out:
It's not the number of GCE instances: I increased the number of GCE instances with no change in the number of streaming clients.
It's the number of pods: each pod apparently can handle 8 connections. I simply scaled my replication controller with kubectl scale rc <rc-name> --replicas=3 to support 24 clients.
I'll be looking into autoscaling (with a horizontal pod scaler?) the number of pods based on incoming HTTP requests.
Update 1:
Kubernetes doesn't currently support horizontal pod scaling based on HTTP.
Update 2:
Apparently there are other things at play here, like the size of the thread pool available to the server. With N threads and P pods, I'm able to maintain P*N open channels. This works particularly well for me because my clients only need to poll the server once every few seconds, and they sleep when inactive.
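For illustration, here is a minimal sketch of where that thread-pool size is set on a Python gRPC server (the service registration and port are hypothetical placeholders); with max_workers=8, each pod can hold roughly 8 concurrent streaming clients, matching the P*N observation above:
from concurrent import futures
import grpc

# import my_service_pb2_grpc  # generated stubs, assumed to exist

def serve(max_workers: int = 8) -> None:
    # The thread pool size bounds how many RPCs one server process handles concurrently.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers))
    # my_service_pb2_grpc.add_MyServiceServicer_to_server(MyServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()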