I'm trying to run a hello world Spark application on a k8s cluster. I've built my own Docker image, with the script added on top of a standard PySpark Docker image, and now I'm trying to run this image on the k8s cluster, but I get the following error. The DNS pods' logs are okay.
My current Dockerfile:
FROM semenchukou/spark-py:v2.4.1
COPY . /app
WORKDIR /app
The command I'm using to deploy the job on k8s:
bin/spark-submit \
--master k8s://https://172.20.234.174:6443 \
--deploy-mode cluster \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=semenchukou/pyspark-k8s-example:likeEx3 \
--name spark_k8s_hello_world_0 \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
local:///app/HelloWorldSpark.py
Error:
Traceback (most recent call last):
File "/app/HelloWorldSpark.py", line 10, in <module>
.appName("PythonPi")\
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 173, in getOrCreate
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 367, in getOrCreate
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: External scheduler cannot be instantiated
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [sparkk8shelloworld0-1583151334880-driver] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 13 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at okhttp3.Dns$1.lookup(Dns.java:39)
at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall.execute(RealCall.java:69)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:330)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:311)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:810)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:218)
... 20 more
What am I doing wrong?
The HelloWorld script:
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession\
        .builder\
        .appName("PythonEx")\
        .getOrCreate()

    txt = spark.sparkContext.textFile('hdfs://172.20.234.174:1515/testing/testFile.txt')
    first = txt.first()
    spark.sparkContext.parallelize(first).saveAsTextFile('hdfs://172.20.234.174:9000/testing/result.txt')

    spark.stop()
Your error stacktrace says:
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
which means that, for some reason, you either do not have the kubernetes Service in the default namespace or you have DNS-related problems in your cluster.
Also this issue has already been discussed with you here.
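As a first check, it may help to confirm from inside the cluster that the kubernetes Service exists and that DNS can resolve it. A rough sketch of such a check (the busybox image and the dns-test pod name are just examples):
# Check that the Service exists in the default namespace
kubectl get svc kubernetes -n default
# Check that the cluster DNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Resolve the API server name from a throwaway pod
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default.svc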
Related
I'm using HDP 2.6.5, and when I try to save my data (read from a JSON file) to a MongoDB database I run into this problem.
I'm using the sandbox.
My code:
# HDFS-Mongo: used to write the json file from HDFS to Mongo DB
from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("testdb") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/testdb.test1") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/testdb.test1") \
    .getOrCreate()

df = my_spark.read.option("multiline", "true").json("hdfs://sandbox-hdp.hortonworks.com:8020/user/root/output2.json")
df.count()
df.printSchema()
df.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").option("database","testdb").option("collection", "test1").save()
The errors:
Traceback (most recent call last):
File "hdfs_mongo.py", line 19, in <module>
df.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").option("database","testdb").option("collection", "test1").save()
File "/usr/local/lib/python3.6/site-packages/pyspark/sql/readwriter.py", line 738, in save
self._jwrite.save()
File "/usr/local/lib/python3.6/site-packages/py4j/java_gateway.py", line 1322, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/lib/python3.6/site-packages/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o44.save.
: java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:443)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:852)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:256)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.mongodb.spark.sql.DefaultSource.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:656)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:656)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:656)
... 16 more
I already tried uninstalling Python and installing it again (Python 3.6). I also used the following code and ran it, but it doesn't work:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://xxx.xxx.xxx.xxx:27017/sample1.zips") \
    .config("spark.mongodb.output.uri", "mongodb://xxx.xxx.xxx.xxx:27017/sample1.zips") \
    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.3.2') \
    .getOrCreate()

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()
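Side note, not a verified fix: the ClassNotFoundException above means the MongoDB Spark connector jar never made it onto the JVM classpath. One common way to supply it is at submit time rather than via the builder; a rough sketch, where the artifact and version (_2.12 / 3.0.1) are assumptions and have to match your actual Spark and Scala versions:
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 hdfs_mongo.py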
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (can't run a Python program using Spark). I am trying to run this code just to test whether Spark works:
Code
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName('sparkdf').getOrCreate()
data = [["node.js", "dbms", "integration"],
["jsp", "SQL", "trigonometry"],
["php", "oracle", "statistics"],
[".net", "db2", "Machine Learning"]]
columns = ["Web Technologies", "Data bases", "Maths"]
dataframe = spark.createDataFrame(data, columns)
dataframe.show()
I think the environment variables are set up correctly and winutils.exe is located in the Hadoop\bin directory, but I keep getting this error:
Error
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/04/26 12:48:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
File "C:\Users\ldocampo\prueba.py", line 10, in <module>
spark = SparkSession.builder.master("local").appName('sparkdf').getOrCreate()
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 384, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 146, in __init__
self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 209, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\context.py", line 321, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\java_gateway.py", line 1568, in __call__
return_value = get_return_value(
File "C:\Users\ldocampo\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x51f116b8) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x51f116b8
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:483)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
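For context: the Spark release behind this pyspark build officially supports Java 8 and 11, and the IllegalAccessError above is typical of running the driver on a newer JDK (17+), whose module system blocks access to sun.nio.ch. A minimal sketch of pointing the run at a JDK 11 install instead (the install path below is hypothetical):
rem Sketch only: assumes a JDK 11 is installed at this (hypothetical) path
set JAVA_HOME=C:\Program Files\Java\jdk-11
set PATH=%JAVA_HOME%\bin;%PATH%
python C:\Users\ldocampo\prueba.py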
It is the first time I am using PySpark. I would like to create an ETL job which extracts data from an API and puts it into a database in my local environment, but I get an error when calling the API, as shown below. Any help would be appreciated.
The error:
Traceback (most recent call last):
File "etl.py", line 9, in <module>
df = spark.read.format("org.apache.dsext.spark.datasource.rest.RestDataSource").options(**options).load()
File "/home/ubuntu/.local/lib/python3.6/site-packages/pyspark/sql/readwriter.py", line 184, in load
return self._df(self._jreader.load())
File "/home/ubuntu/.local/lib/python3.6/site-packages/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/home/ubuntu/.local/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o32.load.
: java.lang.ClassNotFoundException: Failed to find data source: org.apache.dsext.spark.datasource.rest.RestDataSource. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:679)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:248)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: org.apache.dsext.spark.datasource.rest.RestDataSource.DefaultSource
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:653)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:653)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:653)
... 14 more
My code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("{your app name here}").getOrCreate()
uri = "https://min-api.cryptocompare.com/data/histoday?fsym=BTC&tsym=JPY&limit=30&aggregate=1&e=CCCAGG"
options = { 'url' : uri, 'method' : 'GET', 'readTimeout' : '10000', 'connectionTimeout' : '2000', 'partitions' : '10'}
df = spark.read.format("org.apache.dsext.spark.datasource.rest.RestDataSource").options(**options).load()
df.printSchema()
Java version:
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.18.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.18.04, mixed mode, sharing)
I believe that this issue arose from a missing dependency.
In your code you specify org.apache.dsext.spark.datasource.rest.RestDataSource as the format. This functionality is not built into Spark; it depends on a third-party package called REST Data Source.
You need to create a jar file by building that codebase and add it to your Spark job as follows:
$SPARK_HOME/bin/spark-shell --jars spark-datasource-rest_2.11-2.1.0-SNAPSHOT.jar --packages org.scalaj:scalaj-http_2.10:2.3.0
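For a PySpark script like the one in this question, the equivalent would be to pass the same jar and package when submitting the script itself (a sketch, assuming the jar built from the REST Data Source repository sits in the current directory and etl.py is the script from the traceback):
$SPARK_HOME/bin/spark-submit --jars spark-datasource-rest_2.11-2.1.0-SNAPSHOT.jar --packages org.scalaj:scalaj-http_2.10:2.3.0 etl.py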
I am new to GCP. I created a cloud function and tried to deploy it. I encountered the following error while deploying. Can anyone help me solve this issue? Thank you!
Command:
gcloud functions deploy first_cloud_function_http --runtime python37 --trigger-http --allow-unauthenticated --verbosity debug
Error Logs:
DEBUG: Running [gcloud.functions.deploy] with arguments: [--allow-unauthenticated: "True", --runtime: "python37", --trigger-http: "True", --verbosity: "debug", NAME: "first_cloud_function_http"]
INFO: Not using ignore file.
INFO: Not using ignore file.
Deploying function (may take a while - up to 2 minutes)...failed.
DEBUG: (gcloud.functions.deploy) OperationError: code=13, message=Failed to initialize region (action ID: 78ed38913711b6cd)
Traceback (most recent call last):
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 983, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 808, in Run
resources = command_instance.Run(args)
File "/home/hasher/GN/google-cloud-sdk/lib/surface/functions/deploy.py", line 351, in Run
return _Run(args, track=self.ReleaseTrack())
File "/home/hasher/GN/google-cloud-sdk/lib/surface/functions/deploy.py", line 305, in _Run
on_every_poll=[TryToLogStackdriverURL])
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/util.py", line 318, in CatchHTTPErrorRaiseHTTPExceptionFn
return func(*args, **kwargs)
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/util.py", line 369, in WaitForFunctionUpdateOperation
on_every_poll=on_every_poll)
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py", line 151, in Wait
on_every_poll)
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py", line 121, in _WaitForOperation
sleep_ms=SLEEP_MS)
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 219, in RetryOnResult
result = func(*args, **kwargs)
File "/home/hasher/GN/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py", line 73, in _GetOperationStatus
raise exceptions.FunctionsError(OperationErrorToString(op.error))
googlecloudsdk.api_lib.functions.exceptions.FunctionsError: OperationError: code=13, message=Failed to initialize region (action ID: 78ed38913711b6cd)
ERROR: (gcloud.functions.deploy) OperationError: code=13, message=Failed to initialize region (action ID: 78ed38913711b6cd)
Notice that, as stated in the documentation of the gcloud functions deploy command, you necessarily need to set the --region flag.
To check the regions where Cloud Functions is available, refer to the locations section of the documentation.
For example, if you'd like to deploy the function in the europe-west1 region, running the following command would suffice:
gcloud functions deploy first_cloud_function_http --region europe-west1 --runtime python37 --trigger-http --allow-unauthenticated --verbosity debug
Additionally, if you'd like to avoid using the --region flag you can set a default region for Cloud Functions by running:
gcloud config set functions/region REGION
where you could change the REGION field to any of the locations mentioned above.
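If your gcloud version includes it, you can also list the regions where Cloud Functions can be deployed directly from the CLI:
gcloud functions regions list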
I'm trying to get data from Cassandra via PySpark. I got the connector from GitHub, but I failed to make it work.
The following is the code.
import pyspark_cassandra
from pyspark_cassandra import CassandraSparkContext
from pyspark import SparkConf
# from pyspark.sql import SQLContext

conf = SparkConf() \
    .setAppName("PySpark Cassandra Test") \
    .setMaster("spark://192.192.141.21:7077") \
    .set("spark.cassandra.connection.host", "192.192.141.26:9042")

sc = CassandraSparkContext(conf=conf)

sc.cassandraTable("oltpdb", "XiangWan") \
    .select("dt", "wid") \
    .where("wid='XiangWan001'", "daybucket in ('20190326')", "dt >= '2019-03-26 13:18:03'") \
    .collect()
So, with the following command:
spark-submit /root/model/connect_cannandra_via_spark.py
I got the error:
Traceback (most recent call last):
File "/root/model/connect_cannandra_via_spark.py", line 25, in <module>
df = (SQLContext
AttributeError: 'property' object has no attribute 'format'
[root@CDH21 python]# spark-submit /root/model/connect_cannandra_via_spark.py
19/04/11 14:06:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/11 14:06:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/04/11 14:06:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Traceback (most recent call last):
File "/root/model/connect_cannandra_via_spark.py", line 12, in <module>
sc.cassandraTable("oltpdb", "XiangWan") \
File "/root/anaconda3/lib/python3.6/site-packages/pyspark_cassandra-0.9.0-py3.6.egg/pyspark_cassandra/context.py", line 33, in cassandraTable
File "/root/anaconda3/lib/python3.6/site-packages/pyspark_cassandra-0.9.0-py3.6.egg/pyspark_cassandra/rdd.py", line 324, in __init__
File "/root/anaconda3/lib/python3.6/site-packages/pyspark_cassandra-0.9.0-py3.6.egg/pyspark_cassandra/rdd.py", line 213, in _helper
File "/root/anaconda3/lib/python3.6/site-packages/pyspark_cassandra-0.9.0-py3.6.egg/pyspark_cassandra/util.py", line 99, in helper
File "/root/anaconda3/lib/python3.6/site-packages/pyspark_cassandra-0.9.0-py3.6.egg/pyspark_cassandra/util.py", line 88, in load_class
File "/root/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/root/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o24.loadClass.
: java.lang.ClassNotFoundException: pyspark_cassandra.PythonHelper
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
What should I do?
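Not a verified answer, but for reference: ClassNotFoundException: pyspark_cassandra.PythonHelper usually means only the Python side of pyspark-cassandra is installed (the egg in site-packages) while its Scala/JVM half never reached the classpath. A rough sketch of submitting with the assembly jar built from the GitHub repository (the jar path and version below are hypothetical):
spark-submit --jars /path/to/pyspark-cassandra-assembly-0.9.0.jar --conf spark.cassandra.connection.host=192.192.141.26 /root/model/connect_cannandra_via_spark.py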