py4j.protocol.Py4JError: An error occurred while calling o112.save - python

I'm submitting a PySpark job on a university server.
My configuration is:
--master yarn --deploy-mode cluster --num-executors 150 --executor-cores 4 --executor-memory 28g --driver-memory 28g
My first few steps run correctly:
df = spark.read.format('csv') \
.option('header',True) \
.option('multiLine', True) \
.load(data_file)
df.show()
udf_function = udf(stamp, StringType())
new_df = df.withColumn("column_a", udf_function(struct([df[x] for x in df.columns])))
new_df.show()
When I try to run the following commands separately, I get two very similar errors:
Command 1:
new_df.select("column_a").distinct().show(100)
Error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "python_stamp.py", line 93, in <module>
main()
File "python_stamp.py", line 82, in main
new_df.select("planning_cluster_id").distinct().show(100)
File "/hadoop4/yarn/nm/usercache/apps/appcache/application_1593105789029_2249545/container_e01_1593105789029_2249545_02_000002/pyspark.zip/pyspark/sql/dataframe.py", line 380, in show
Command 2:
new_df.write.mode("overwrite").format("csv").option("delimiter", ",").option("header", "true").save(save_path)
Error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "python_stamp.py", line 91, in <module>
main()
File "python_stamp.py", line 83, in main
new_df.write.mode("overwrite").format("csv").option("delimiter", ",").option("header", "true").save(save_path)
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/pyspark.zip/pyspark/sql/readwriter.py", line 738, in save
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/hadoop1/yarn/nm/usercache/apps/appcache/application_1593105789029_2249417/container_e01_1593105789029_2249417_02_000002/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o112.save
Does anyone know the reason behind this? I'm fairly confident it is not a memory error, because the previous steps, which load and show the table, all run correctly.
Additional information: when I run all of these commands in the pyspark shell, they run perfectly well.
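One caveat on the reasoning above: `show()` typically only has to evaluate enough rows to print the first 20, whereas `distinct()` and `save()` must run the UDF over every row. So the earlier steps passing does not by itself rule out a memory problem, or a row the UDF cannot handle. The plain-Python sketch below is only an analogy for that laziness and does not use Spark; `stamp_like` is a hypothetical stand-in for the real UDF, and the row values are made up.

```python
from itertools import islice


def stamp_like(row):
    # Hypothetical stand-in for the real UDF: it fails on one "bad" row.
    if row == 5000:
        raise ValueError(f"cannot stamp row {row}")
    return f"stamp-{row % 3}"


rows = range(10_000)

# Lazy pipeline: nothing is computed until something consumes it.
stamped = map(stamp_like, rows)

# Analogue of new_df.show(): only the first 20 rows are ever evaluated.
preview = list(islice(stamped, 20))
print(preview[:3])

# Analogue of distinct()/save(): every row has to be evaluated.
full_pass_error = None
try:
    distinct_values = set(map(stamp_like, rows))
except ValueError as exc:
    full_pass_error = exc
    print("full pass failed:", exc)
```

If the real job behaves the same way, the YARN container logs for the driver and executors would be the place to look for the Java-side cause.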

Related

How to integrate and mock Redshift and S3 locally using redshift-fake-driver

I would like to run Redshift and S3 locally and use them for tasks launched from Airflow and other tools. The goal is to cut down on the CI/CD code needed when deploying to dev, and to avoid conflicts over shared resources and files.
For S3 I can currently use LocalStack. For Redshift, the only approach I have found is redshift-fake-driver combined with the Python package JayDeBeApi, but it does not seem to work properly:
import jpype # JPype1==1.4.1
import jaydebeapi # JayDeBeApi==1.2.3
jars = "/Users/trancongminh/Downloads/jars/*"
jpype.startJVM(classpath=jars)
driverName = "jp.ne.opt.redshiftfake.postgres.FakePostgresqlDriver"
print(jpype.JClass(driverName))
# as I spin up a docker container for postgresQL
connectionString = "jdbc:postgresqlredshift://localhost:5432/docker"
uid = "docker"
pwd = "docker"
driverFileName = "/Users/trancongminh/Downloads/jars/redshift-fake-driver_2.12-1.0.15.jar"
conn = jaydebeapi.connect(
jclassname=driverName,
url=connectionString,
driver_args={'user': uid, 'password': pwd},
jars=driverFileName
)
curs = conn.cursor()
curs.execute("SELECT * FROM pg_catalog.pg_tables limit 10;")
curs.fetchall()
curs.execute("copy db_table_name_v2 from 'http://localhost:4566/events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet' CREDENTIALS 'aws_access_key_id=test;aws_secret_access_key=test' ")
But I get errors such as "No such file or directory", or something like this:
Traceback (most recent call last):
File "FakeConnection.scala", line 31, in jp.ne.opt.redshiftfake.FakeConnection.prepareStatement
Exception: Java Exception
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 531, in execute
self._prep = self._connection.jconn.prepareStatement(operation)
java.lang.NoSuchMethodError: java.lang.NoSuchMethodError: 'void scala.util.parsing.combinator.Parsers.$init$(scala.util.parsing.combinator.Parsers)'
or sometimes this:
Traceback (most recent call last):
File "FakePreparedStatement.scala", line 138, in jp.ne.opt.redshiftfake.FakePreparedStatement$FakeAsIsPreparedStatement.execute
Exception: Java Exception
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 534, in execute
is_rs = self._prep.execute()
org.postgresql.util.PSQLException: org.postgresql.util.PSQLException: ERROR: could not open file "s3://events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet" for reading: No such file or directory
Hint: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 536, in execute
_handle_sql_exception()
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 165, in _handle_sql_exception_jpype
reraise(exc_type, exc_info[1], exc_info[2])
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 57, in reraise
raise value.with_traceback(tb)
File "/Users/trancongminh/Pelago/pelago-ds-env/lib/python3.9/site-packages/jaydebeapi/__init__.py", line 534, in execute
is_rs = self._prep.execute()
jaydebeapi.DatabaseError: org.postgresql.util.PSQLException: ERROR: could not open file "s3://events-streaming/traveller/v2/ym_202210/d_04/hm_131901.parquet" for reading: No such file or directory
Hint: COPY FROM instructs the PostgreSQL server process to read a file. You may want a client-side facility such as psql's \copy
If anybody has experience with this pattern, please help. Thanks.
Solutions, or keywords that would help further investigation, are welcome.
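A `NoSuchMethodError` on a trait's `$init$` method, as in the first traceback, often points to jars built for different Scala binary versions sitting on the same classpath (for example a `_2.12` driver next to a `_2.11` dependency). The sketch below is one way to check for that by grouping jar file names by their Scala suffix. The example listing is hypothetical, not the actual contents of the jars directory, and `list_jar_names` is only a convenience for reading a real directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches the Scala binary-version suffix in names such as
# "redshift-fake-driver_2.12-1.0.15.jar" -> "2.12".
# Jars without such a suffix (plain Java libraries) are ignored.
SCALA_SUFFIX = re.compile(r"_(2\.\d+|3)-\d")


def scala_versions_by_jar(jar_names):
    """Group jar file names by the Scala binary version in their name."""
    groups = defaultdict(list)
    for name in jar_names:
        match = SCALA_SUFFIX.search(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


def list_jar_names(jar_dir):
    """Return the sorted names of all .jar files directly inside jar_dir."""
    return sorted(p.name for p in Path(jar_dir).glob("*.jar"))


# Hypothetical listing, for illustration only.
example_jars = [
    "redshift-fake-driver_2.12-1.0.15.jar",
    "scala-parser-combinators_2.11-1.1.2.jar",
    "postgresql-42.5.0.jar",
]

groups = scala_versions_by_jar(example_jars)
print(groups)
if len(groups) > 1:
    print("mixed Scala binary versions:", sorted(groups))
```

On a real setup, `scala_versions_by_jar(list_jar_names("/Users/trancongminh/Downloads/jars"))` would report whether more than one Scala version is present in the directory passed to `jpype.startJVM`.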

ChannelAttribution built-in function “markov_model_local_api” not working with the sample code

I am working on a channel attribution project in Python, using the Python library "ChannelAttribution". When I followed the instructions in the official documentation and tried to use the built-in function "markov_model_local_api", it kept throwing an error. I then tried to run the sample code from https://channelattribution.io/pdf/ChannelAttributionWhitePaper.pdf:
import pandas as pd  # explicit import for pd.read_csv below
from ChannelAttribution import *
#Download data
Data = pd.read_csv("https://channelattribution.io/csv/Data.csv",sep=";")
#Generate token
generate_token(email="my_email", job="my_job", company="my_institution")
token = 'my_token'
#Train
res=markov_model_local_api(token, Data, var_path="path", var_conv="total_conversions",var_value="total_conversion_value", var_null="total_null", order=1, sep=">")
It throws the errors below:
Exception ignored in: <function Connection.__del__ at 0x7fba45595670>
Traceback (most recent call last):
File "/Users/li/opt/anaconda3/lib/python3.8/site-packages/pysftp/__init__.py", line 1013, in __del__
self.close()
File "/Users/li/opt/anaconda3/lib/python3.8/site-packages/pysftp/__init__.py", line 784, in close
if self._sftp_live:
AttributeError: 'Connection' object has no attribute '_sftp_live'
Traceback (most recent call last):
File "src/cypack/ChannelAttribution.pyx", line 1188, in ChannelAttribution.markov_model_local_api
File "src/cypack/ChannelAttribution.pyx", line 894, in ChannelAttribution.__f_initialize_connection
File "/Users/li/opt/anaconda3/lib/python3.8/site-packages/pysftp/__init__.py", line 132, in __init__
self._tconnect['hostkey'] = self._cnopts.get_hostkey(host)
File "/Users/li/opt/anaconda3/lib/python3.8/site-packages/pysftp/__init__.py", line 71, in get_hostkey
raise SSHException("No hostkey for host %s found." % host)
SSHException: No hostkey for host 13.58.174.83 found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/li/Downloads/untitled0.py", line 18, in <module>
res=markov_model_local_api(token, Data, var_path="path", var_conv="total_conversions", var_value="total_conversion_value", var_null="total_null", order=1, sep=">")
File "src/cypack/ChannelAttribution.pyx", line 1248, in ChannelAttribution.markov_model_local_api
UnboundLocalError: local variable 'filename' referenced before assignment
How can I fix all errors and exceptions?

Able to start chrome from path but can't control it in pywinauto

I tried this code:
from pywinauto import Application
app = Application().start("chrome")
but I get this error:
Traceback (most recent call last):
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 1038, in start
(h_process, _, dw_process_id, _) = win32process.CreateProcess(
pywintypes.error: (2, 'CreateProcess', 'The system cannot find the file specified.')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 1052, in start
raise AppStartError(message)
pywinauto.application.AppStartError: Could not create the process "chrome"
Error returned by CreateProcess: (2, 'CreateProcess', 'The system cannot find the file specified.')
So I tried this instead:
from pywinauto import Application
app = Application().start(r"C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe")
This launched Chrome; however, I can't control it. When I run this code:
from pywinauto import Application
app = Application().start(r"C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe")
app.NewTab.Edit.set_text("A link")
I get the following error:
Traceback (most recent call last):
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 250, in __resolve_control
ctrl = wait_until_passes(
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\timings.py", line 458, in wait_until_passes
raise err
pywinauto.timings.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 379, in __getattribute__
ctrls = self.__resolve_control(self.criteria)
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 261, in __resolve_control
raise e.original_exception
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\timings.py", line 436, in wait_until_passes
func_val = func(*args, **kwargs)
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 203, in __get_ctrl
dialog = self.backend.generic_wrapper_class(findwindows.find_element(**criteria[0]))
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\findwindows.py", line 84, in find_element
elements = find_elements(**kwargs)
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\findwindows.py", line 305, in find_elements
elements = findbestmatch.find_best_control_matches(best_match, wrapped_elems)
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\findbestmatch.py", line 536, in find_best_control_matches
raise MatchError(items = name_control_map.keys(), tofind = search_text)
pywinauto.findbestmatch.MatchError: Could not find 'NewTab' in 'dict_keys([])'
I don't understand this error at all. I searched Stack Overflow and tried this code:
from pywinauto import Application
app = Application().start("chrome.exe --force-renderer-accessibility")
app.NewTab.Edit.set_text("A link")
but it did not help. I got these errors:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'app' is not defined
>>> app = Application().start("chrome.exe --force-renderer-accessibility")
Traceback (most recent call last):
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 1038, in start
(h_process, _, dw_process_id, _) = win32process.CreateProcess(
pywintypes.error: (2, 'CreateProcess', 'The system cannot find the file specified.')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 1052, in start
raise AppStartError(message)
pywinauto.application.AppStartError: Could not create the process "chrome.exe --force-renderer-accessibility"
Error returned by CreateProcess: (2, 'CreateProcess', 'The system cannot find the file specified.')
>>> app.NewTab.Edit.set_text("A link")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'app' is not defined
>>> app = Application().start("chrome.exe --force-renderer-accessibility")
Traceback (most recent call last):
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 1038, in start
(h_process, _, dw_process_id, _) = win32process.CreateProcess(
pywintypes.error: (2, 'CreateProcess', 'The system cannot find the file specified.')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\DEVDHRITI\AppData\Local\Programs\Python\Python310\lib\site-packages\pywinauto\application.py", line 1052, in start
raise AppStartError(message)
pywinauto.application.AppStartError: Could not create the process "chrome.exe --force-renderer-accessibility"
Error returned by CreateProcess: (2, 'CreateProcess', 'The system cannot find the file specified.')
Also, I'm using Python 3.10.0. Is this a bug in my version?
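The `CreateProcess` error 2 in the first and last tracebacks means Windows could not find an executable from the bare name, which usually indicates that Chrome's install directory is not on the search path. The sketch below shows one way to resolve a full path before starting the process. `resolve_executable` is a hypothetical helper, the first candidate path is the one from the question, and the second is an assumed alternative location. `shutil.which` only approximates the lookup that `CreateProcess` performs.

```python
import os
import shutil


def resolve_executable(name, candidate_paths=()):
    """Return a full path for `name`, or None if it cannot be found.

    PATH is checked first (via shutil.which), then the explicit candidates.
    """
    found = shutil.which(name)
    if found:
        return found
    for path in candidate_paths:
        if os.path.isfile(path):
            return path
    return None


# The first location is the one used in the question; the second is an
# assumption (a common 64-bit install location).
CHROME_CANDIDATES = [
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
]

chrome_path = resolve_executable("chrome", CHROME_CANDIDATES)
print(chrome_path or "chrome not found on PATH or in the candidate locations")
```

A resolved path, with any flags such as `--force-renderer-accessibility` appended after it, could then be passed to `Application().start(...)`. The attempt with the bare `chrome.exe` plus that flag failed with the same `CreateProcess` error, so the flag itself was never exercised.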

twitterImgBot stops working after some hours

I'm trying to get https://github.com/joaquinlpereyra/twitterImgBot to work. It runs and seems fine at first,
but after some hours it stops working and this error comes up:
python3 twitterbot.py
Traceback (most recent call last):
File "/home/user/.local/lib/python3.7/site-packages/tweepy/binder.py", line 118, in build_path
value = quote(self.session.params[name])
KeyError: 'id'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "twitterbot.py", line 209, in <module>
main()
File "twitterbot.py", line 200, in main
orders()
File "twitterbot.py", line 118, in orders
timeline.delete_tweet_by_id(tweet.in_reply_to_status_id, api)
File "/home/user/Skrivebord/twitterboot/lo/bot/timeline.py", line 12, in delete_tweet_by_id
api.destroy_status(id_to_delete)
File "/home/user/.local/lib/python3.7/site-packages/tweepy/binder.py", line 245, in _call
method = APIMethod(args, kwargs)
File "/home/user/.local/lib/python3.7/site-packages/tweepy/binder.py", line 71, in __init__
self.build_path()
File "/home/user/.local/lib/python3.7/site-packages/tweepy/binder.py", line 120, in build_path
raise TweepError('No parameter value found for path variable: %s' % name)
tweepy.error.TweepError: No parameter value found for path variable: id
It seems like there is some problem with Python itself, because if I do a fresh install on another PC it also works for some hours and then stops.
Strange.
This is likely because the tweet is not a reply to another status, so its in_reply_to_status_id attribute is None, and API.destroy_status ends up being called with an id of None.
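A minimal sketch of a guard for that case, assuming the fix belongs at the call site in `orders()`. `delete_replied_to_tweet` and `RecordingAPI` are hypothetical names used for illustration; `RecordingAPI` stands in for `tweepy.API` so the example runs without network access.

```python
from types import SimpleNamespace


def delete_replied_to_tweet(tweet, api):
    """Delete the tweet that `tweet` replies to, if there is one.

    Returns True if a delete was attempted, False if it was skipped.
    """
    parent_id = getattr(tweet, "in_reply_to_status_id", None)
    if parent_id is None:
        # Not a reply: there is nothing to delete, and passing None on to
        # destroy_status is what leads to the TweepError in the traceback.
        return False
    api.destroy_status(parent_id)
    return True


class RecordingAPI:
    """Stand-in for tweepy.API that only records destroy_status calls."""

    def __init__(self):
        self.deleted = []

    def destroy_status(self, status_id):
        self.deleted.append(status_id)


api = RecordingAPI()
reply = SimpleNamespace(in_reply_to_status_id=12345)
not_a_reply = SimpleNamespace(in_reply_to_status_id=None)

reply_result = delete_replied_to_tweet(reply, api)
plain_result = delete_replied_to_tweet(not_a_reply, api)
print(reply_result, plain_result, api.deleted)
```

This would also explain why the bot runs for hours before failing: the error only appears once a tweet that is not a reply reaches that code path.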

Using a Win 10 system to access all connected devices using device manager and Py script

I am unable to access the disk drives on a Windows system using the infi.devicemanager package from PyPI.
I tried the following:
from infi.devicemanager import DeviceManager
dm = DeviceManager()
dm.root.rescan()
disks = dm.disk_drives
names = [disk.friendly_name for disk in disks]
Error messages:
Traceback (most recent call last):
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\setupapi\functions.py", line 56, in callee
yield decorated_func(*args, **kwargs)
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\setupapi\functions.py", line 70, in SetupDiEnumDeviceInfo
interface(device_info_set, index, device_info_buffer)
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\cwrap\__init__.py", line 138, in __new__
return_value = function(*args[1:], **kwargs)
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\setupapi\__init__.py", line 35, in errcheck
raise WindowsException(GetLastError())
infi.devicemanager.setupapi.WindowsException: 259, No more data is available.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\setupapi\functions.py", line 60, in callee
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\__init__.py", line 215, in disk_drives
for controller in self.storage_controllers:
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\__init__.py", line 227, in storage_controllers
return self.get_devices_from_handle(handle)
File "C:\Users\rsushmit\AppData\Local\Programs\Python\Python37-32\lib\site-packages\infi\devicemanager\__init__.py", line 198, in get_devices_from_handle
for devinfo in functions.SetupDiEnumDeviceInfo(handle):
RuntimeError: generator raised StopIteration
from infi.devicemanager import DeviceManager
dm = DeviceManager()
dm.root.rescan()
devices = dm.all_devices
for device in devices:
    print(device)
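The final `RuntimeError: generator raised StopIteration` in the traceback is the behaviour introduced by PEP 479 and enabled by default from Python 3.7: a `raise StopIteration` inside a generator is converted into a `RuntimeError` instead of silently ending iteration. The traceback shows exactly such a `raise StopIteration` in the library's `functions.py`, which suggests the installed version was written for older Python versions. The sketch below only reproduces the mechanism; both function names are hypothetical and unrelated to the library's API.

```python
def enumerate_old_style(items):
    """Pre-PEP 479 style: signal the end by raising StopIteration."""
    for item in items:
        if item is None:
            raise StopIteration  # converted to RuntimeError on Python 3.7+
        yield item


def enumerate_new_style(items):
    """PEP 479-compatible style: end the generator with a plain return."""
    for item in items:
        if item is None:
            return
        yield item


old_style_error = None
try:
    list(enumerate_old_style(["disk0", None, "disk1"]))
except RuntimeError as exc:
    old_style_error = str(exc)
    print(old_style_error)

new_style_result = list(enumerate_new_style(["disk0", None, "disk1"]))
print(new_style_result)
```

If that is what is happening here, checking for a newer release of the package, or one that explicitly supports Python 3.7, would be a reasonable next step.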
