I'm working to set up Jupyter notebook servers on Kubernetes that are able to launch PySpark. Each user is able to have multiple servers running at once, and would access each by navigating to the appropriate host combined with a path to the server's fully-qualified name. For example: http://<hostname>/<username>/<notebook server name>.
I have a top-level function defined that allows a user to create a SparkSession that points to the Kubernetes master URL and sets their pod to be the Spark driver.
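For illustration, a minimal sketch of what such a function might look like (a hedged sketch: the master URL, executor image, and the POD_IP environment variable are assumptions, not the actual implementation):
import os
from pyspark.sql import SparkSession

def create_spark_session(app_name):
    # Hypothetical helper: point Spark at the Kubernetes API server and
    # make this notebook pod the driver. All concrete values are assumptions.
    return (SparkSession.builder
            .appName(app_name)
            .master("k8s://https://kubernetes.default.svc:443")
            .config("spark.kubernetes.container.image", "<executor image>")
            # Executors must be able to reach the driver at the pod's own IP,
            # e.g. injected via the downward API.
            .config("spark.driver.host", os.environ["POD_IP"])
            .config("spark.driver.bindAddress", "0.0.0.0")
            .getOrCreate())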
This is all well and good, but I would like to enable end users to access the URL for the Spark Web UI so that they can track their jobs. The Spark on Kubernetes documentation recommends port forwarding as the scheme for achieving this. It seems to me that for any security-minded organization, allowing any random user to set up port forwarding in this way would be unacceptable.
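For reference, the documented port-forwarding approach amounts to something like (driver pod name assumed):
kubectl port-forward <driver-pod-name> 4040:4040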
I would like to use an Ingress Kubernetes definition to allow external access to the driver's Spark Web UI. I've set up something like the following:
# Service
apiVersion: v1
kind: Service
metadata:
  namespace: <notebook namespace>
  name: <username>-<notebook server name>-svc
spec:
  type: ClusterIP
  sessionAffinity: None
  selector:
    app: <username>-<notebook server name>-notebook
  ports:
  - name: app-svc-port
    protocol: TCP
    port: 8888
    targetPort: 8888
  - name: spark-ui-port
    protocol: TCP
    port: 4040
    targetPort: 4040
# Ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: workspace
  name: <username>-<notebook server name>-ing
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: <hostname>
    http:
      paths:
      - path: /<username>/<notebook server name>
        backend:
          serviceName: <username>-<notebook server name>-svc
          servicePort: app-svc-port
      - path: /<username>/<notebook server name>/spark-ui
        backend:
          serviceName: <username>-<notebook server name>-svc
          servicePort: spark-ui-port
However, under this setup, when I navigate to http://<hostname>/<username>/<notebook server name>/spark-ui/, I'm redirected to http://<hostname>/jobs. This is because /jobs is the default entry point to Spark's Web UI. However, I don't have an ingress rule for that path, and can't set such a rule since every user's Web UI would collide with each other in the load balancer (unless I have a misunderstanding, which is totally possible).
Under the Spark UI configuration settings, there doesn't seem to be a way to set a root path for the Spark session. You can change the port on which it runs, but what I'd like to do is make the UI serve at something like: http://<hostname>/<username>/<notebook server name>/spark-ui/<jobs, stages, etc>. Is there really no way of changing what comes after the hostname of the URL and before the last part?
1: Set your Spark config:
spark.ui.proxyBase: /foo
2: Set the nginx annotations in the Ingress:
annotations:
  nginx.ingress.kubernetes.io/proxy-redirect-from: http://$host/
  nginx.ingress.kubernetes.io/proxy-redirect-to: http://$host/foo/
3: Set the annotation to rewrite the target:
annotations:
  nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
  - host: <host>
    http:
      paths:
      - backend:
          serviceName: <service>
          servicePort: <port>
        path: /foo(/|$)(.*)
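Taken together: the rewrite-target rule strips the /foo prefix before the request reaches the Spark UI, spark.ui.proxyBase makes the UI render its links under /foo, and the proxy-redirect annotations rewrite the Location header on redirects such as the initial jump to /jobs.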
Yes, you can achieve this. Specifically, you can do this by setting the spark.ui.proxyBase property within spark-defaults.conf or at run-time.
Example:
echo "spark.ui.proxyBase $SPARK_UI_PROXYBASE" >> /opt/spark/conf/spark-defaults.conf;
Then this should work.
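Alternatively, set it at run-time when building the session; a sketch in PySpark (the path is whatever prefix your proxy exposes):
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.ui.proxyBase", "/<username>/<notebook server name>/spark-ui")
         .getOrCreate())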
Related
I have a Kubernetes cluster that is making use of an Ingress to forward traffic to a frontend React app and a backend Flask app. My problem is that the React app only works if the rewrite-target annotation is not set, and the Flask app only works if it is.
How can I get my Flask app accessible without setting this value (commented out in the yaml below)?
Here is the Ingress resource:
metadata:
  name: thesis-ingress
  namespace: thesis
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/add-base-url: "true"
    # nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  tls:
  - hosts:
    - thesis
    secretName: ingress-tls
  rules:
  - host: thesis.info
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 3000
      - path: /backend
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 5000
Your question didn't specify, but I'm guessing your capture group was to rewrite /backend/(.+) to /$1; on that assumption:
Be aware that annotations are per-Ingress, but all Ingress resources are unioned across the cluster to comprise the whole of the configuration. Thus, if you need one path with a rewrite and one without, just create two Ingress resources.
metadata:
  name: thesis-frontend
  namespace: thesis
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/add-base-url: "true"
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  tls:
  - hosts:
    - thesis
    secretName: ingress-tls
  rules:
  - host: thesis.info
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 3000
---
metadata:
  name: thesis-backend
  namespace: thesis
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/add-base-url: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  tls:
  - hosts:
    - thesis
    secretName: ingress-tls
  rules:
  - host: thesis.info
    http:
      paths:
      - path: /backend/(.+)
        backend:
          service:
            name: backend
            port:
              number: 5000
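With this split, a request for /backend/api/users matches /backend/(.+) with the capture group api/users, so rewrite-target: /$1 hands the backend service /api/users, while the frontend Ingress keeps serving / untouched.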
This question might seem like a duplicate of this.
I am trying to run an Apache Beam Python pipeline using Flink on an offline instance of Kubernetes. However, since I have user code with external dependencies, I am using the Python SDK harness as an external service, which is causing errors (described below).
The Kubernetes manifests I use to launch the Beam Python SDK:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: beam-sdk
spec:
  replicas: 1
  selector:
    matchLabels:
      app: beam
      component: python-beam-sdk
  template:
    metadata:
      labels:
        app: beam
        component: python-beam-sdk
    spec:
      hostNetwork: True
      containers:
      - name: python-beam-sdk
        image: apachebeam/python3.7_sdk:latest
        imagePullPolicy: "Never"
        command: ["/opt/apache/beam/boot", "--worker_pool"]
        ports:
        - containerPort: 50000
          name: yay
---
apiVersion: v1
kind: Service
metadata:
  name: beam-python-service
spec:
  type: NodePort
  ports:
  - name: yay
    port: 50000
    targetPort: 50000
  selector:
    app: beam
    component: python-beam-sdk
When I launch my pipeline with the following options:
beam_options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_version=1.9",
    "--flink_master=10.101.28.28:8081",
    "--environment_type=EXTERNAL",
    "--environment_config=10.97.176.105:50000",
    "--setup_file=./setup.py"
])
I get the following error message (within the python sdk service):
NAME READY STATUS RESTARTS AGE
beam-sdk-666779599c-w65g5 1/1 Running 1 4d20h
flink-jobmanager-74d444cccf-m4g8k 1/1 Running 1 4d20h
flink-taskmanager-5487cc9bc9-fsbts 1/1 Running 2 4d20h
flink-taskmanager-5487cc9bc9-zmnv7 1/1 Running 2 4d20h
(base) [~]$ sudo kubectl logs -f beam-sdk-666779599c-w65g5
2020/02/26 07:56:44 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:39283', '--artifact_endpoint=localhost:41533', '--provision_endpoint=localhost:42233', '--control_endpoint=localhost:44977']
2020/02/26 09:09:07 Initializing python harness: /opt/apache/beam/boot --id=1-1 --logging_endpoint=localhost:39283 --artifact_endpoint=localhost:41533 --provision_endpoint=localhost:42233 --control_endpoint=localhost:44977
2020/02/26 09:11:07 Failed to obtain provisioning information: failed to dial server at localhost:42233
caused by:
context deadline exceeded
I have no idea what the logging or artifact endpoint (etc.) is. And by inspecting the source code, it seems that the endpoints have been hard-coded to be located at localhost.
(You said in a comment that the answer to the referenced post is valid, so I'll just address the specific error you ran into in case someone else hits it.)
Your understanding is correct; the logging, artifact, etc. endpoints are essentially hardcoded to use localhost. These endpoints are meant to be used only internally by Beam and are not configurable. So the Beam worker is implicitly assumed to be on the same host as the Flink task manager. Typically, this is accomplished by making the Beam worker pool a sidecar of the Flink task manager pod, rather than a separate service.
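A minimal sketch of that arrangement, assuming a standard Flink task manager Deployment (image tags and names are placeholders):
# Inside the task manager Deployment's pod template spec:
containers:
- name: flink-taskmanager
  image: flink:1.9
  args: ["taskmanager"]
- name: beam-worker-pool   # sidecar: shares localhost with the task manager
  image: apachebeam/python3.7_sdk
  command: ["/opt/apache/beam/boot", "--worker_pool"]
  ports:
  - containerPort: 50000
The pipeline's --environment_config then points at localhost:50000, and the hardcoded localhost endpoints resolve within the pod.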
Does anybody know how to run Beam Python pipelines with Flink when Flink is running as pods in Kubernetes?
I have successfully managed to run a Beam Python pipeline using the Portable runner and the job service pointing to a local Flink server running in Docker containers.
I was able to achieve that mounting the Docker socket in my Flink containers, and running Flink as root process, so the class DockerEnvironmentFactory can create the Python harness container.
Unfortunately, I can't use the same solution when Flink is running in Kubernetes. Moreover, I don't want to create the Python harness container using the Docker command from my pods.
It seems that the Beam runner automatically selects Docker for executing Python pipelines. However, I noticed there is an implementation called ExternalEnvironmentFactory, but I am not sure how to use it.
Is there a way to deploy a side container and use a different factory to run the Python harness process? What is the correct approach?
This is the patch for DockerEnvironmentFactory:
diff -pr beam-release-2.15.0/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java beam-release-2.15.0-1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
*** beam-release-2.15.0/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java 2019-08-14 22:33:41.000000000 +0100
--- beam-release-2.15.0-1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java 2019-09-09 16:02:07.000000000 +0100
*************** package org.apache.beam.runners.fnexecut
*** 19,24 ****
--- 19,26 ----
import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects.firstNonNull;
+ import java.net.InetAddress;
+ import java.net.UnknownHostException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
*************** public class DockerEnvironmentFactory im
*** 127,133 ****
ImmutableList.<String>builder()
.addAll(gcsCredentialArgs())
// NOTE: Host networking does not work on Mac, but the command line flag is accepted.
! .add("--network=host")
// We need to pass on the information about Docker-on-Mac environment (due to missing
// host networking on Mac)
.add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
--- 129,135 ----
ImmutableList.<String>builder()
.addAll(gcsCredentialArgs())
// NOTE: Host networking does not work on Mac, but the command line flag is accepted.
! .add("--network=flink")
// We need to pass on the information about Docker-on-Mac environment (due to missing
// host networking on Mac)
.add("--env=DOCKER_MAC_CONTAINER=" + System.getenv("DOCKER_MAC_CONTAINER"));
*************** public class DockerEnvironmentFactory im
*** 222,228 ****
private static ServerFactory getServerFactory() {
ServerFactory.UrlFactory dockerUrlFactory =
! (host, port) -> HostAndPort.fromParts(DOCKER_FOR_MAC_HOST, port).toString();
if (RUNNING_INSIDE_DOCKER_ON_MAC) {
// If we're already running in a container, we need to use a fixed port range due to
// non-existing host networking in Docker-for-Mac. The port range needs to be published
--- 224,230 ----
private static ServerFactory getServerFactory() {
ServerFactory.UrlFactory dockerUrlFactory =
! (host, port) -> HostAndPort.fromParts(getCanonicalHostName(), port).toString();
if (RUNNING_INSIDE_DOCKER_ON_MAC) {
// If we're already running in a container, we need to use a fixed port range due to
// non-existing host networking in Docker-for-Mac. The port range needs to be published
*************** public class DockerEnvironmentFactory im
*** 237,242 ****
--- 239,252 ----
}
}
+ private static String getCanonicalHostName() throws RuntimeException {
+ try {
+ return InetAddress.getLocalHost().getCanonicalHostName();
+ } catch (UnknownHostException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
/** Provider for DockerEnvironmentFactory. */
public static class Provider implements EnvironmentFactory.Provider {
private final boolean retainDockerContainer;
*************** public class DockerEnvironmentFactory im
*** 269,275 ****
public ServerFactory getServerFactory() {
switch (getPlatform()) {
case LINUX:
! return ServerFactory.createDefault();
case MAC:
return DockerOnMac.getServerFactory();
default:
--- 279,286 ----
public ServerFactory getServerFactory() {
switch (getPlatform()) {
case LINUX:
! return DockerOnMac.getServerFactory();
! // return ServerFactory.createDefault();
case MAC:
return DockerOnMac.getServerFactory();
default:
This is the Docker compose file I use to run Flink:
version: '3.4'
services:
  jobmanager:
    image: tenx/flink:1.8.1
    command: 'jobmanager'
    environment:
      JOB_MANAGER_RPC_ADDRESS: 'jobmanager'
      DOCKER_MAC_CONTAINER: 1
      FLINK_JM_HEAP: 128
    volumes:
      - jobmanager-data:/data
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - target: 8081
        published: 8081
        protocol: tcp
        mode: ingress
    networks:
      - flink
  taskmanager:
    image: tenx/flink:1.8.1
    command: 'taskmanager'
    environment:
      JOB_MANAGER_RPC_ADDRESS: 'jobmanager'
      DOCKER_MAC_CONTAINER: 1
      FLINK_TM_HEAP: 1024
      TASK_MANAGER_NUMBER_OF_TASK_SLOTS: 2
    networks:
      - flink
    volumes:
      - taskmanager-data:/data
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/folders:/var/folders
volumes:
  jobmanager-data:
  taskmanager-data:
networks:
  flink:
    external: true
This is my Python pipeline:
import apache_beam as beam
import logging

class LogElements(beam.PTransform):
    class _LoggingFn(beam.DoFn):
        def __init__(self, prefix=''):
            super(LogElements._LoggingFn, self).__init__()
            self.prefix = prefix

        def process(self, element, **kwargs):
            logging.info(self.prefix + str(element))
            yield element

    def __init__(self, label=None, prefix=''):
        super(LogElements, self).__init__(label)
        self.prefix = prefix

    def expand(self, input):
        return input | beam.ParDo(self._LoggingFn(self.prefix))

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(["--runner=PortableRunner", "--job_endpoint=localhost:8099"])

p = beam.Pipeline(options=options)
(p | beam.Create([1, 2, 3, 4, 5]) | LogElements())
p.run()
This is how I run the job service:
gradle :runners:flink:1.8:job-server:runShadow -PflinkMasterUrl=localhost:8081
Docker is automatically selected for executing the Python harness.
I can change the image used to run the Python container:
options = PipelineOptions(["--runner=PortableRunner", "--job_endpoint=localhost:8099", "--environment_type=DOCKER", "--environment_config=beam/python:latest"])
I can disable Docker and enable the ExternalEnvironmentFactory:
options = PipelineOptions(["--runner=PortableRunner", "--job_endpoint=localhost:8099", "--environment_type=EXTERNAL", "--environment_config=server"])
but I have to implement some callback answering on http://server:80.
Is there an implementation available?
To answer the question above: basically you want to add a beam-worker-pool container alongside the Flink task manager container in the same pod. So in the yaml file that you use to deploy Flink task managers, add a new container:
- name: beam-worker-pool
  image: apache/beam_python3.7_sdk:2.22.0
  args: ["--worker_pool"]
  ports:
  - containerPort: 50000
    name: pool
  livenessProbe:
    tcpSocket:
      port: 50000
    initialDelaySeconds: 30
    periodSeconds: 60
  volumeMounts:
  - name: flink-config-volume
    mountPath: /opt/flink/conf/
  securityContext:
    runAsUser: 9999
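Because the worker pool now shares the pod's network namespace with the task manager, the pipeline option --environment_config can simply point at localhost:50000, and the internal logging/artifact/provision endpoints resolve correctly.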
I found the solution. Apache Beam 2.16.0 provides an implementation to use in combination with environment type EXTERNAL. The implementation is based on worker_pool_main, which was created to support Kubernetes.
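The worker pool can be started standalone; for example, the invocation from the pod logs above is:
python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot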
I know it is a bit outdated but there is a Flink operator for Kubernetes now.
Here are examples how to run Apache Beam with Flink using an operator:
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/tree/master/examples/beam
I am running my backend using Python and Django with uWSGI. We recently migrated it to Kubernetes (GKE) and our pods are consuming a LOT of memory and the rest of the cluster is starving for resources. We think that this might be related to the uWSGI configuration.
This is our yaml for the pods:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-pod
  namespace: my-namespace
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 10
      maxUnavailable: 10
  selector:
    matchLabels:
      app: my-pod
  template:
    metadata:
      labels:
        app: my-pod
    spec:
      containers:
      - name: web
        image: my-img:{{VERSION}}
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8000
          protocol: TCP
        command: ["uwsgi", "--http", ":8000", "--wsgi-file", "onyo/wsgi.py", "--workers", "5", "--max-requests", "10", "--master", "--vacuum", "--enable-threads"]
        resources:
          requests:
            memory: "300Mi"
            cpu: 150m
          limits:
            memory: "2Gi"
            cpu: 1
        livenessProbe:
          httpGet:
            httpHeaders:
            - name: Accept
              value: application/json
            path: "/healthcheck"
            port: 8000
          initialDelaySeconds: 15
          timeoutSeconds: 5
          periodSeconds: 30
        readinessProbe:
          httpGet:
            httpHeaders:
            - name: Accept
              value: application/json
            path: "/healthcheck"
            port: 8000
          initialDelaySeconds: 15
          timeoutSeconds: 5
          periodSeconds: 30
        envFrom:
        - configMapRef:
            name: configmap
        - secretRef:
            name: secrets
        volumeMounts:
        - name: service-account-storage-credentials-volume
          mountPath: /credentials
          readOnly: true
      - name: csql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=my-project:region:backend=tcp:1234",
                  "-credential_file=/secrets/credentials.json"]
        ports:
        - containerPort: 1234
          name: sql
        securityContext:
          runAsUser: 2  # non-root user
          allowPrivilegeEscalation: false
        volumeMounts:
        - name: credentials
          mountPath: /secrets/sql
          readOnly: true
      volumes:
      - name: credentials
        secret:
          secretName: credentials
      - name: volume
        secret:
          secretName: production
          items:
          - key: APPLICATION_CREDENTIALS_CONTENT
            path: key.json
We are using the same uWSGI configuration that we had before the migration (when the backend was being executed in a VM).
Is there a best practice config for running uWSGI in K8s? Or maybe something that I am doing wrong in this particular config?
You activated 5 workers in uWSGI; that could mean 5 times the memory need if your application is using lazy-loading techniques (my advice: load everything at startup and trust pre-forking; check this). However, you could try reducing the number of workers and raising the number of threads instead.
Also, you should drop max-requests: it makes your app reload every 10 requests, which is nonsense in a production environment (doc). If you have trouble with memory leaks, use reload-on-rss instead.
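For example, in the .ini file (the 512 MB threshold is only illustrative; tune it to your pod's memory limit):
reload-on-rss = 512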
I would do something like this, with maybe fewer or more threads per worker depending on how your app uses them (adjust according to CPU usage/availability per pod in production):
command: ["uwsgi", "--http", ":8000", "--wsgi-file", "onyo/wsgi.py", "--workers", "2", "--threads", "10", "--master", "--vacuum", "--enable-threads"]
P.S.: as zerg said in a comment, you should of course ensure your app is not running in DEBUG mode, and keep logging output low.
I'm running a uwsgi+Flask application.
The app is running as a k8s pod.
When I deploy a new pod (a new version), the existing pod gets SIGTERM.
This causes the master to stop accepting new connections at the same moment, which causes issues as the LB still passes requests to the pod (for a few more seconds).
I would like the master to wait 30 sec BEFORE it stops accepting new connections (when getting SIGTERM), but I couldn't find a way. Is it possible?
My uwsgi.ini file:
[uwsgi]
;https://uwsgi-docs.readthedocs.io/en/latest/HTTP.html
http = :8080
wsgi-file = main.py
callable = wsgi_application
processes = 2
enable-threads = true
master = true
reload-mercy = 30
worker-reload-mercy = 30
log-5xx = true
log-4xx = true
disable-logging = true
stats = 127.0.0.1:1717
stats-http = true
single-interpreter= true
;https://github.com/containous/traefik/issues/615
http-keepalive=true
add-header = Connection: Keep-Alive
Seems like this is not possible to achieve using uwsgi:
https://github.com/unbit/uwsgi/issues/1974
The solution (as mentioned in this Kubernetes issue):
https://github.com/kubernetes/contrib/issues/1140
is to use the preStop hook. It's quite ugly, but it will help to achieve zero downtime:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep","5"]
The template is taken from this answer: https://stackoverflow.com/a/39493421/3659858
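For the 30-second window asked about here, you would replace 5 with 30: the pod keeps accepting connections during the sleep while the endpoint is being removed from the load balancer.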
Another option is to use the CLI option:
--hook-master-start "unix_signal:15 gracefully_kill_them_all"
or in the .ini file (remove the double quotes):
hook-master-start = unix_signal:15 gracefully_kill_them_all
which will gracefully terminate workers after receiving a SIGTERM (signal 15).
See the following for reference.
When I tried the above though, it didn't work as expected from within a docker container. Instead, you can also use uWSGI's Master FIFO file. The Master FIFO file can be specified like:
--master-fifo <filename>
or
master-fifo = /tmp/master-fifo
Then you can simply write a q character to the file and it will gracefully shut down your workers before exiting.
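In a Kubernetes deployment, a hedged way to wire this up is to write the q from a preStop hook (the FIFO path must match your master-fifo setting):
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "echo q > /tmp/master-fifo"]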