Pyspark databricks connect library issue - python

I am currently coding pyspark pipelines using databricks connect library. The steps I followed are given here. This library has been installed in a virtual environment.
When I try to execute this code
spark.read.load(path).first()
I get this error
<class 'TypeError'>, 'JavaPackage' object is not callable, <traceback object at 0x0000017AB70ECF88>
Traceback (most recent call last):
File "D:/Friendsurance/Repository/data-ingestion/job/main.py", line 83, in <module>
run()
File "D:/Friendsurance/Repository/data-ingestion/job/main.py", line 79, in run
el_job.run()
File "D:\Friendsurance\Repository\data-ingestion\job\task\__init__.py", line 18, in run
data: DataFrame = self.extract()
File "D:\Friendsurance\Repository\data-ingestion\job\task\ELTask.py", line 14, in extract
return self.extractor.extract()
File "D:\Friendsurance\Repository\data-ingestion\job\task\extractor\BucketExtractor.py", line 26, in extract
self.spark, self.load_storage.get_path(), self.conf.partition_column
File "D:\Friendsurance\Repository\data-ingestion\job\task\extractor\__init__.py", line 14, in calculate_last_day_run
spark.read.load(path).first().show()
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1381, in first
return self.head()
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1369, in head
rs = self.head(1)
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 1371, in head
return self.take(n)
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 657, in take
return self.limit(num).collect()
File "D:\Friendsurance\Repository\data-ingestion\venv\lib\site-packages\pyspark\sql\dataframe.py", line 596, in collect
if self._sc._conf.get(self._sc._jvm.PythonSecurityUtils.USE_FILE_BASED_COLLECT()):
TypeError: 'JavaPackage' object is not callable
But when I am out of the virtual environment where I am using the pyspark library provided here, I am able to execute the same line and it gives me the output.
Can anyone please tell me where I am going wrong with this?

Related

error while generating two log.html files in robot framework by Rebot model

I am trying to generate an error log html by “rebot” package of robot framework and its getting generated successfully.
But if I use the rebot function in my module then its affect default log and report html which gets generated after script execution.
[ ERROR ] Unexpected error: AttributeError: 'NoneType' object has no attribute 'encode'
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/robot/utils/application.py", line 83, in _execute
rc = self.main(arguments, **options)
File "/usr/local/lib/python3.5/dist-packages/robot/run.py", line 445, in main
result = suite.run(settings)
File "/usr/local/lib/python3.5/dist-packages/robot/running/model.py", line 248, in run
self.visit(runner)
File "/usr/local/lib/python3.5/dist-packages/robot/model/testsuite.py", line 161, in visit
visitor.visit_suite(self)
File "/usr/local/lib/python3.5/dist-packages/robot/model/visitor.py", line 87, in visit_suite
suite.tests.visit(self)
File "/usr/local/lib/python3.5/dist-packages/robot/model/itemlist.py", line 76, in visit
item.visit(visitor)
File "/usr/local/lib/python3.5/dist-packages/robot/model/testcase.py", line 74, in visit
visitor.visit_test(self)
File "/usr/local/lib/python3.5/dist-packages/robot/running/runner.py", line 159, in visit_test
self._output.end_test(ModelCombiner(test, result))
File "/usr/local/lib/python3.5/dist-packages/robot/output/output.py", line 59, in end_test
LOGGER.end_test(test)
File "/usr/local/lib/python3.5/dist-packages/robot/output/logger.py", line 183, in end_test
logger.end_test(test)
File "/usr/local/lib/python3.5/dist-packages/robot/output/console/verbose.py", line 51, in end_test
self._writer.status(test.status, clear=True)
File "/usr/local/lib/python3.5/dist-packages/robot/output/console/verbose.py", line 114, in status
self._clear_status()
File "/usr/local/lib/python3.5/dist-packages/robot/output/console/verbose.py", line 124, in _clear_status
self._write_info()
File "/usr/local/lib/python3.5/dist-packages/robot/output/console/verbose.py", line 90, in _write_info
self._stdout.write(self._last_info)
File "/usr/local/lib/python3.5/dist-packages/robot/output/console/highlighting.py", line 51, in write
self._write(console_encode(text, stream=self.stream))
File "/usr/local/lib/python3.5/dist-packages/robot/utils/encoding.py", line 60, in console_encode
return string.encode(encoding, errors).decode(encoding)
The error message is rather clear. It looks like you're trying to encode a variable which contains a None instead of a string. You need to make sure that this variable always contains a string and handle cases where something else is inside. You can do it for example using the try ... except statement.

flake8 internal error in regular expression engine

I'm trying to run flake8 on a docker django built like described here (tutorial page)
when building the docker image I get an error from flake8 which is run in an docker-compose file with like so
$ flake8 --ignore=E501,F401 .
multiprocessing.pool.RemoteTraceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 48, in
return list(map(*
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 666, in
return checker.run_checks()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 598, in run_checks
self.run_ast_checks()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 495, in run_ast_checks
checker = self.run_check(plugin, tree=ast)
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 426, in run_check
self.processor.keyword_arguments_for( File "/usr/local/lib/python3.8/site-packages/flake8/processor.py", line 241, in keyword_arguments_for arguments[param] = getattr(self, param)
File "/usr/local/lib/python3.8/site-packages/flake8/processor.py", line 119, in file_tokens
self._file_tokens = list(
File "/usr/local/lib/python3.8/tokenize.py", line 525, in _tokenize
pseudomatch = _compile(PseudoToken).match(line, pos)
RuntimeError: internal error in regular expression engine
The above exception was the direct cause of the following exception: Traceback (most recent call last):
File "/usr/local/bin/flake8", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/flake8/main/cli.py", line 18, in main
app.run(argv)
File "/usr/local/lib/python3.8/site-packages/flake8/main/application.py", line 393, in run
self._run(argv)
File "/usr/local/lib/python3.8/site-packages/flake8/main/application.py", line 381, in _run
self.run_checks()
File "/usr/local/lib/python3.8/site-packages/flake8/main/application.py", line 300, in run_checks
self.file_checker_manager.run()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 329, in run
self.run_parallel()
File "/usr/local/lib/python3.8/site-packages/flake8/checker.py", line 293, in run_parallel
for ret in pool_map:
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 448, in <genexpr>
return (item for chunk in result for item in chunk)
File "/usr/local/lib/python3.8/multiprocessing/pool.py", line 865, in next
raise value
RuntimeError: internal error in regular expression engine
When I run flake8 with the --verbose flag, I get an error like this:
Fatal Python error: deletion of interned string failed
Python runtime state: initialized
KeyError: 'FILENAME_RE'
from the tokenizer.py
Does anyone know how to solve this?
Additional Data:
running docker-compose v1.25.4 on an raspberry 3 with buster lite.
Installed and compiled Python 3.8.2 from source with the flag --enableloadable-sqlite
Thanks for helping!
If you don't care for flake8, then just delete that line. It will work. I had the same problem.
I know this isn't an answer. I would add a comment instead if I could.
I encountered a very similar error profile when I had refactored and left a lot of data which formerly had been in the top of the project down in what became the package folder. There were 33621 files including .venv, .nox, .pytest_cache, .coverage.
File "/usr/lib/python3.8/multiprocessing/pool.py", line 448, in <genexpr>
return (item for chunk in result for item in chunk)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
raise value
IndexError: string index out of range
If you see a similar signature (the parallel processing engine being overwhelmed in some way and throwing an exception), you may want to review your project directory structure and make sure that you did not put this kind of data in your sources directories.

What is the 'rdt' object in the error: 'KeyError: "Unable to open object (Object 'rdt' doesn't exist)"?

I am trying to execute the following code
impot spacepy.time as spt
import spacepy.omni as om
ticks = spt.Ticktock(['2002-02-02T12:00:00', '2002-02-02T12:10:00'], 'ISO')
d = om.get_omni(ticks)
d.tree(levels=1)
that is the example at the spacepy documentation.
I got the error:
Traceback (most recent call last):
File "<ipython-input-28-bd1a52c0010b>", line 1, in <module>
data = om.get_omni(ticks)
File "/usr/local/lib/python2.7/dist-packages/spacepy-0.1.6-py2.7.egg/spacepy/omni.py", line 252, in get_omni
enval, stval = omnirange(dbase=ldb)[1], omnirange(dbase=ldb)[0]
File "/usr/local/lib/python2.7/dist-packages/spacepy-0.1.6-py2.7.egg/spacepy/omni.py", line 377, in omnirange
start, end = hfile['RDT'][0], hfile['RDT'][-1]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "~/.local/lib/python2.7/site-packages/h5py/_hl/group.py", line 166, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
File "h5py/h5o.pyx", line 190, in h5py.h5o.open (/tmp/pip-4rPeHA-build/h5py/h5o.c:3570)
KeyError: "Unable to open object (Object 'rdt' doesn't exist)"
I don't know how to fix this.
The same problem occur when executing other SpacePy codes.
If you run SpacePy for the first time, a special dataset of OMNI data (more details on it here) needs to be downloaded. To obtain it, simply execute:
import spacepy
spacepy.toolbox.update()
For this function to work properly, you have to make sure that all dependencies according to the Installation Guideline are met - especially the NASA CDF library is needed.

AttributeError: '_GzipStreamFile' object has no attribute '_checkReadable

I am trying to setup commonsearch and i am the point where backend imports alexa1m data to rocksdb, but it die with error( error below ).
Traceback (most recent call last):
File "urlserver/import.py", line 21, in <module>
ds.import_dump()
File "./urlserver/datasources/__init__.py", line 62, in import_dump
for i, row in self.iter_dump():
File "./urlserver/datasources/__init__.py", line 102, in iter_dump
f = self.open_dump()
File "./urlserver/datasources/__init__.py", line 144, in open_dump
return GzipStreamFile(f)
File "/cosr/back/venv/src/gzipstream/gzipstream/gzipstreamfile.py", line 62, in __init__
super(GzipStreamFile, self).__init__(self._gzipstream)
File "/usr/lib64/python2.6/io.py", line 921, in __init__
raw._checkReadable()
AttributeError: '_GzipStreamFile' object has no attribute '_checkReadable'
have been fighting with this without progress for 2 days now.. if somebody could givme somekind of insight or advice i would be more than happy!
Found the solution, it was problem in the data parsed to the tool, not the tool itself.

OSError: Result too large

I'am playing around with scapy but i cant get it to work. I tried different code's but all gave me the same output:
Traceback (most recent call last):
File "<module1>", line 7, in <module>
File "C:\Python26\lib\site-packages\scapy\sendrecv.py", line 357, in srp
s = conf.L2socket(iface=iface, filter=filter, nofilter=nofilter, type=type)
File "C:\Python26\lib\site-packages\scapy\arch\pcapdnet.py", line 313, in __init__
self.outs = dnet.eth(iface)
File "dnet.pyx", line 112, in dnet.eth.__init__
OSError: Result too large
Iam using python 2.6 with all dependencies installed for scapy.
How to fix this?

Categories