spark-nlp 'JavaPackage' object is not callable - python

I am using jupyter lab to run spark-nlp text analysis. At the moment I am just running the sample code:
import sparknlp
from pyspark.sql import SparkSession
from sparknlp.pretrained import PretrainedPipeline
#create or get Spark Session
#spark = sparknlp.start()
spark = SparkSession.builder \
.appName("ner")\
.master("local[4]")\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G") \
.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.5")\
.config("spark.kryoserializer.buffer.max", "500m")\
.getOrCreate()
print("sparknlp version", sparknlp.version(), "sparkversion", spark.version)
#download, load, and annotate a text by pre-trained pipeline
pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
result = pipeline.annotate('Harry Potter is a great movie')
I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-bfd6884be04c> in <module>
15
16 #download, load, and annotate a text by pre-trained pipeline
---> 17 pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
18 result = pipeline.annotate('Harry Potter is a great movie')
~/.pyenv/versions/3.7.9/lib/python3.7/site-packages/sparknlp/pretrained.py in __init__(self, name, lang, remote_loc, parse_embeddings, disk_location)
89 def __init__(self, name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None):
90 if not disk_location:
---> 91 self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
92 else:
93 self.model = PipelineModel.load(disk_location)
~/.pyenv/versions/3.7.9/lib/python3.7/site-packages/sparknlp/pretrained.py in downloadPipeline(name, language, remote_loc)
49 def downloadPipeline(name, language, remote_loc=None):
50 print(name + " download started this may take some time.")
---> 51 file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
52 if file_size == "-1":
53 print("Can not find the model to download please check the name!")
~/.pyenv/versions/3.7.9/lib/python3.7/site-packages/sparknlp/internal.py in __init__(self, name, language, remote_loc)
190 def __init__(self, name, language, remote_loc):
191 super(_GetResourceSize, self).__init__(
--> 192 "com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
193
194
~/.pyenv/versions/3.7.9/lib/python3.7/site-packages/sparknlp/internal.py in __init__(self, java_obj, *args)
127 super(ExtendedJavaWrapper, self).__init__(java_obj)
128 self.sc = SparkContext._active_spark_context
--> 129 self._java_obj = self.new_java_obj(java_obj, *args)
130 self.java_obj = self._java_obj
131
~/.pyenv/versions/3.7.9/lib/python3.7/site-packages/sparknlp/internal.py in new_java_obj(self, java_class, *args)
137
138 def new_java_obj(self, java_class, *args):
--> 139 return self._new_java_obj(java_class, *args)
140
141 def new_java_array(self, pylist, java_class):
~/.pyenv/versions/3.7.9/lib/python3.7/site-packages/pyspark/ml/wrapper.py in _new_java_obj(java_class, *args)
67 java_obj = getattr(java_obj, name)
68 java_args = [_py2java(sc, arg) for arg in args]
---> 69 return java_obj(*java_args)
70
71 @staticmethod
TypeError: 'JavaPackage' object is not callable
I read a few of the GitHub issues developers raised in the spark-nlp repo, but the fixes are not working for me. I am wondering if the use of pyenv is causing problems, but it works for everything else.
My jupyter lab is launched like so:
/home/myuser/.pyenv/shims/jupyter lab --no-browser --allow-root --notebook-dir /home/myuser/workdir/notebooks
My env configuration:
ubuntu: 20.10
Apache Spark: 3.0.1
pyspark: 2.4.4
spark-nlp: 2.6.5
pyenv: 1.2.21
Java:
openjdk 11.0.9 2020-10-20
OpenJDK Runtime Environment (build 11.0.9+10-post-Ubuntu-0ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.9+10-post-Ubuntu-0ubuntu1, mixed mode, sharing)
jupyter:
jupyter core : 4.7.0
jupyter-notebook : 6.1.5
qtconsole : 5.0.1
ipython : 7.19.0
ipykernel : 5.4.2
jupyter client : 6.1.7
jupyter lab : 2.2.9
nbconvert : 6.0.7
ipywidgets : 7.5.1
nbformat : 5.0.8
traitlets : 5.0.5
I appreciate your help, thank you.

Remove Spark 3.0.1 and keep just PySpark 2.4.x, as Spark NLP does not yet support Spark 3.x. Also use Java 8 instead of Java 11, because Java 11 is not supported by Spark 2.4.
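As a quick sanity check, here is a minimal sketch (assuming the pyenv-based Jupyter setup above) that prints the Java and PySpark versions the notebook kernel actually sees, so you can confirm you are on Java 8 and PySpark 2.4.x before starting the session:
import subprocess
import pyspark
# Java prints its version banner to stderr; Spark 2.4 requires Java 8 (1.8.x).
print(subprocess.run(["java", "-version"], capture_output=True, text=True).stderr)
# Should be 2.4.x to match spark-nlp 2.6.5, which is built for Scala 2.11.
print(pyspark.__version__)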

Related

ipynb `%load_ext rpy2.ipython` errors "... 3 arguments passed to .Internal(gettext) which requires 2"

I use Anaconda Jupyter notebook via VSCode or directly via browser.
I installed rpy2 via conda as:
conda install -c r r-essentials rpy2 --yes
Then in a Python cell of Jupyter Notebook, I ran:
%load_ext rpy2.ipython
And got
R[write to console]: Error in gettext(fmt, domain = domain, trim = trim) :
3 arguments passed to .Internal(gettext) which requires 2
Details
I use Windows 10. I have a separate R installation on my computer too.
In ipynb, the same error appears if I run:
import rpy2.robjects as robjects
More details on the error:
%load_ext rpy2.ipython
R[write to console]: Error in gettext(fmt, domain = domain, trim = trim) :
3 arguments passed to .Internal(gettext) which requires 2
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_18052\1837600852.py in <cell line: 1>()
----> 1 get_ipython().run_line_magic('load_ext', 'rpy2.ipython')
c:\Users\user\anaconda3\lib\site-packages\IPython\core\interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
2415 kwargs['local_ns'] = self.get_local_scope(stack_depth)
2416 with self.builtin_trap:
-> 2417 result = fn(*args, **kwargs)
2418 return result
2419
c:\Users\user\anaconda3\lib\site-packages\decorator.py in fun(*args, **kw)
230 if not kwsyntax:
231 args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)
233 fun.__name__ = func.__name__
234 fun.__doc__ = func.__doc__
c:\Users\user\anaconda3\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):
...
812
RRuntimeError: Error in gettext(fmt, domain = domain, trim = trim) :
3 arguments passed to .Internal(gettext) which requires 2
Other things I tried:
# Works:
import rpy2
print(rpy2.__version__)
# Returns:
3.5.1
import rpy2.situation
for row in rpy2.situation.iter_info():
    print(row)
# Returns:
rpy2 version:
3.5.1
Python version:
3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:50:36) [MSC v.1929 64 bit (AMD64)]
Looking for R's HOME:
Environment variable R_HOME: None
InstallPath in the registry: C:\Program Files\R\R-4.2.2
Environment variable R_USER: None
Environment variable R_LIBS_USER: None
Unable to determine R library path: Command '('C:\\Users\\user\\anaconda3\\lib\\R\\bin\\Rscript', '-e', 'cat(Sys.getenv("LD_LIBRARY_PATH"))')' returned non-zero exit status 1.
R version:
In the PATH: R version 4.1.3 (2022-03-10) -- "One Push-Up"
Loading R library from rpy2: OK
Additional directories to load R packages from:
None
C extension compilation:
include:
['C:/PROGRA~1/R/R-4.2.2/include', 'C:/PROGRA~1/R/R-4.2.2/include/x64']
libraries:
['R', 'm']
library_dirs:
['C:/PROGRA~1/R/R-4.2.2/bin/x64']
extra_compile_args:
[]
extra_link_args:
[]
A similar issue with no answer is here.
This might be related, but it concerns continuous integration: https://github.com/rpy2/rpy2/issues/874
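No answer was posted here, but the rpy2.situation output above reports two different R installations (R 4.2.2 in the registry, R 4.1.3 on the PATH, plus Anaconda's bundled R), and rpy2 binds to one of them at import time. A minimal sketch, assuming that mismatch is the cause, which pins R_HOME to a single installation before anything from rpy2.robjects is imported:
import os
# Hypothetical path: point at the one R installation you want rpy2 to load.
os.environ["R_HOME"] = r"C:\Program Files\R\R-4.2.2"
# Import only after R_HOME is set; rpy2 locates R during import.
import rpy2.robjects as robjects
print(robjects.r("R.version.string"))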

nglview installed but will not import inside Jupyter Notebook via Anaconda.Navigator

I'm having trouble importing nglview inside a Jupyter Notebook (JNb) cell. The instance of JNb is started via the base (root) environment inside the Anaconda.Navigator GUI. Inside Anaconda.Navigator, I've installed nglview. But the import continues to fail.
Versions:
Jupyter Notebook (inside Anaconda.Navigator) - 6.4.12
Anaconda.Navigator (GUI) - 2.3.2
Python - 3.9
nglview - 3.0.3 (installed but not importing)
ipython 8.5.0
This is the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [1], line 5
3 import pandas as pd
4 import matplotlib.pyplot as plt
----> 5 import nglview as nv
7 # the next line is necessary to display plots in Jupyter
8 get_ipython().run_line_magic('matplotlib', 'inline')
File ~\anaconda3\lib\site-packages\nglview\__init__.py:4
1 import warnings
3 # for doc
----> 4 from . import adaptor, datafiles, show, widget
5 from ._version import get_versions
6 from .adaptor import *
File ~\anaconda3\lib\site-packages\nglview\show.py:13
3 from . import datafiles
4 from .adaptor import (ASEStructure, ASETrajectory, BiopythonStructure,
5 FileStructure, HTMDTrajectory, IODataStructure,
6 IOTBXStructure, MDAnalysisTrajectory, MDTrajTrajectory,
(...)
11 RdkitStructure,
12 TextStructure)
---> 13 from .widget import NGLWidget
15 __all__ = [
16 'demo',
17 'show_pdbid',
(...)
40 'show_biopython',
41 ]
44 def show_pdbid(pdbid, **kwargs):
File ~\anaconda3\lib\site-packages\nglview\widget.py:19
15 from traitlets import (Bool, CaselessStrEnum, Dict, Instance, Int, Integer,
16 List, Unicode, observe, validate)
17 import traitlets
---> 19 from . import color, interpolate
20 from .adaptor import Structure, Trajectory
21 from .component import ComponentViewer
File ~\anaconda3\lib\site-packages\nglview\color.py:114
110 else:
111 raise ValueError(f"{obj} must be either list of list or string")
--> 114 ColormakerRegistry = _ColormakerRegistry()
File ~\anaconda3\lib\site-packages\nglview\base.py:10, in _singleton.<locals>.getinstance()
8 def getinstance():
9 if cls not in instances:
---> 10 instances[cls] = cls()
11 return instances[cls]
File ~\anaconda3\lib\site-packages\nglview\color.py:47, in _ColormakerRegistry.__init__(self, *args, **kwargs)
45 try:
46 get_ipython() # only display in notebook
---> 47 self._ipython_display_()
48 except NameError:
49 pass
File ~\anaconda3\lib\site-packages\nglview\color.py:54, in _ColormakerRegistry._ipython_display_(self, **kwargs)
52 if self._ready:
53 return
---> 54 super()._ipython_display_(**kwargs)
AttributeError: 'super' object has no attribute '_ipython_display_'
What is missing? I need to resolve this issue within the GUI of Anaconda.Navigator, as I need nglview as part of an exercise for students who do not have a background in computational sciences. I'm not after a solution that uses anything but the GUI. Asking a group over Zoom to start hacking around with a Mac/Windows/Linux terminal would be a nightmare. Many thanks.
UPDATE
Recent efforts have included:
closing and restarting Anaconda.Navigator GUI
"Quit" the Jupyter server (option found in the browser tab). Restarted the server.
%conda install -c conda-forge nglview at the top of the Notebook. It just informs me that it's already installed.
Closing the tab and all mentions of Anaconda and Jupyter (but not the browser window instance itself).
The fact that I haven't restarted the machine itself is a big grey elephant. Unfortunately, it's running a long quantum chemistry calculation in the background that can't be resumed after a reboot :-( Sorry. But I don't want to get hung up on restarting a machine - it shouldn't come down to that.
Check whether the version of ipywidgets in your current conda environment is 8.0.0 or above, because Jupyter Notebook is not compatible with the new major version of ipywidgets. If so, try the command below to install an older ipywidgets, after which nglview should import properly:
conda install "ipywidgets <8" -c conda-forge
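For example, a quick check in a notebook cell (assuming ipywidgets is importable in the same environment):
import ipywidgets
# nglview fails to import with the classic Notebook when this prints 8.x;
# after the downgrade it should print a 7.x version.
print(ipywidgets.__version__)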

Backend "sox_io" is not one of available backends: ['soundfile']

I am trying to study deep learning in Colab with a local runtime, but after a few steps I got an error.
Note: I use Windows 10.
link: https://colab.research.google.com/drive/14-smojGNfo-Gr_voM9AfcMiuV9YvXWU1?hl=en
I got the error from this line:
show_random_elements(common_voice_train.remove_columns(["path"]), num_examples=20)
The error:
File C:\ProgramData\Anaconda3\envs\myenv\lib\site-packages\datasets\features\audio.py:175, in Audio._decode_mp3(self, path_or_file)
174 import torchaudio.transforms as T
--> 175 torchaudio.set_audio_backend("sox_io")
176 except RuntimeError as err:
File C:\ProgramData\Anaconda3\envs\myenv\lib\site-packages\torchaudio\backend\utils.py:43, in set_audio_backend(backend)
42 if backend is not None and backend not in list_audio_backends():
---> 43 raise RuntimeError(
44 f'Backend "{backend}" is not one of '
45 f'available backends: {list_audio_backends()}.')
47 if backend is None:
To support decoding 'mp3' audio files, please install 'sox'.
It seems I need to install sox, but I already installed it. Windows can't use sox for decoding, so I also installed soundfile, but the problem is still there.
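No answer was posted, but the traceback itself calls torchaudio's list_audio_backends(), so a minimal check (assuming the same torchaudio build) shows which backends the Windows install actually has:
import torchaudio
# On Windows builds of torchaudio the sox_io backend is not available,
# so this typically prints ['soundfile'].
print(torchaudio.list_audio_backends())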

ModuleNotFoundError: No java install detected. Please install java to use language-tool-python

I would like to check the number of issues in a given sentence.
My code is:
import language_tool_python
tl = language_tool_python.LanguageTool('en-US')
txt = "good mooorning sirr and medam my namee anderen i am from amerecia !"
m = tl.check(txt)
len(m)
Instead of returning the number, I am getting the error message shown below.
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-1c4c9134d6f4> in <module>
1 import language_tool_python
----> 2 tool = language_tool_python.LanguageTool('en-US')
3
4 text = "Your the best but their are allso good !"
5 matches = tool.check(text)
E:\Anaconda\lib\site-packages\language_tool_python\server.py in __init__(self, language, motherTongue, remote_server, newSpellings, new_spellings_persist)
43 self._update_remote_server_config(self._url)
44 elif not self._server_is_alive():
---> 45 self._start_server_on_free_port()
46 if language is None:
47 try:
E:\Anaconda\lib\site-packages\language_tool_python\server.py in _start_server_on_free_port(self)
212 self._url = 'http://{}:{}/v2/'.format(self._HOST, self._port)
213 try:
--> 214 self._start_local_server()
215 break
216 except ServerError:
E:\Anaconda\lib\site-packages\language_tool_python\server.py in _start_local_server(self)
222 def _start_local_server(self):
223 # Before starting local server, download language tool if needed.
--> 224 download_lt()
225 err = None
226 try:
E:\Anaconda\lib\site-packages\language_tool_python\download_lt.py in download_lt(update)
142 ]
143
--> 144 confirm_java_compatibility()
145 version = LATEST_VERSION
146 filename = FILENAME.format(version=version)
E:\Anaconda\lib\site-packages\language_tool_python\download_lt.py in confirm_java_compatibility()
73 # found because of a PATHEXT-related issue
74 # (https://bugs.python.org/issue2200).
---> 75 raise ModuleNotFoundError('No java install detected. Please install java to use language-tool-python.')
76
77 output = subprocess.check_output([java_path, '-version'],
ModuleNotFoundError: No java install detected. Please install java to use language-tool-python.
When I run the code I get "no java install detected".
How do I solve this issue?
I think this is not an issue with the code itself; when I run the code you provided:
import language_tool_python
tl = language_tool_python.LanguageTool('en-US')
txt = "good mooorning sirr and medam my namee anderen i am from amerecia !"
m = tl.check(txt)
len(m)
I get a number as the result, in this case:
OUT: 8
The documentation of language-tool-python says:
By default, language_tool_python will download a LanguageTool server .jar and run that in the background to detect grammar errors locally. However, LanguageTool also offers a Public HTTP Proofreading API that is supported as well. Follow the link for rate-limiting details. (Running locally won't have the same restrictions.)
So you will need Java (JRE and JDK). It's also written in the requirements of the library:
Prerequisites
Python 3.5+
LanguageTool (Java 8.0 or higher)
The installation process should take care of downloading LanguageTool (it may take a few minutes). Otherwise, you can manually download LanguageTool-stable.zip and unzip it into where the language_tool_python package resides.
Source:
https://pypi.org/project/language-tool-python/
Python 2.7 - JavaError when using grammar-check 1.3.1 library
I hope I could help.
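As a quick check that Java is actually visible to Python, here is a minimal sketch (assuming a standard PATH lookup, which is what language_tool_python relies on to launch its local LanguageTool server):
import shutil
import subprocess
# 'java' must be resolvable on the PATH; None means no Java was found.
java_path = shutil.which("java")
print(java_path)
if java_path:
    # Java prints its version banner to stderr; LanguageTool needs Java 8+.
    print(subprocess.run([java_path, "-version"], capture_output=True, text=True).stderr)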

Spark-nlp Pretrained-model not loading in windows

I am trying to install pretrained pipelines for spark-nlp on Windows 10 with Python.
The following is the code I have tried so far in a Jupyter notebook on the local system:
! java -version
# should be Java 8 (Oracle or OpenJDK)
! conda create -n sparknlp python=3.7 -y
! conda activate sparknlp
! pip install --user spark-nlp==2.6.4 pyspark==2.4.5
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp
# Start Spark Session with Spark NLP
# start() function has two parameters: gpu and spark23
# sparknlp.start(gpu=True) will start the session with GPU support
# sparknlp.start(spark23=True) is for when you have Apache Spark 2.3.x installed
spark = sparknlp.start()
# Download a pre-trained pipeline
pipeline = PretrainedPipeline('explain_document_ml', lang='en')
I am getting the following error:
explain_document_ml download started this may take some time.
Approx size to download 9.4 MB
[OK!]
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~\AppData\Roaming\Python\Python37\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
~\Anaconda3\envs\py37\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline.
: java.lang.IllegalArgumentException: requirement failed: Was not found appropriate resource to download for request: ResourceRequest(explain_document_ml,Some(en),public/models,2.6.4,2.4.4) with downloader: com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader@2570f26e
at scala.Predef$.require(Predef.scala:224)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadResource(ResourceDownloader.scala:345)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:376)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:371)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:474)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)
During handling of the above exception, another exception occurred:
IllegalArgumentException Traceback (most recent call last)
<ipython-input-2-d18238e76d9f> in <module>
11
12 # Download a pre-trained pipeline
---> 13 pipeline = PretrainedPipeline('explain_document_ml', lang='en')
~\Anaconda3\envs\py37\lib\site-packages\sparknlp\pretrained.py in __init__(self, name, lang, remote_loc, parse_embeddings, disk_location)
89 def __init__(self, name, lang='en', remote_loc=None, parse_embeddings=False, disk_location=None):
90 if not disk_location:
---> 91 self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
92 else:
93 self.model = PipelineModel.load(disk_location)
~\Anaconda3\envs\py37\lib\site-packages\sparknlp\pretrained.py in downloadPipeline(name, language, remote_loc)
58 t1.start()
59 try:
---> 60 j_obj = _internal._DownloadPipeline(name, language, remote_loc).apply()
61 jmodel = PipelineModel._from_java(j_obj)
62 finally:
~\Anaconda3\envs\py37\lib\site-packages\sparknlp\internal.py in __init__(self, name, language, remote_loc)
179 class _DownloadPipeline(ExtendedJavaWrapper):
180 def __init__(self, name, language, remote_loc):
--> 181 super(_DownloadPipeline, self).__init__("com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline", name, language, remote_loc)
182
183
~\Anaconda3\envs\py37\lib\site-packages\sparknlp\internal.py in __init__(self, java_obj, *args)
127 super(ExtendedJavaWrapper, self).__init__(java_obj)
128 self.sc = SparkContext._active_spark_context
--> 129 self._java_obj = self.new_java_obj(java_obj, *args)
130 self.java_obj = self._java_obj
131
~\Anaconda3\envs\py37\lib\site-packages\sparknlp\internal.py in new_java_obj(self, java_class, *args)
137
138 def new_java_obj(self, java_class, *args):
--> 139 return self._new_java_obj(java_class, *args)
140
141 def new_java_array(self, pylist, java_class):
~\AppData\Roaming\Python\Python37\site-packages\pyspark\ml\wrapper.py in _new_java_obj(java_class, *args)
65 java_obj = getattr(java_obj, name)
66 java_args = [_py2java(sc, arg) for arg in args]
---> 67 return java_obj(*java_args)
68
69 @staticmethod
~\Anaconda3\envs\py37\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
~\AppData\Roaming\Python\Python37\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco
IllegalArgumentException: 'requirement failed: Was not found appropriate resource to download for request: ResourceRequest(explain_document_ml,Some(en),public/models,2.6.4,2.4.4) with downloader: com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader@2570f26e'
This is one of the common issues with Apache Spark & Spark NLP when Java/Spark/Hadoop are not correctly set up on Windows.
You need to follow these steps correctly to avoid the common issues, including failed pretrained() downloads:
Download OpenJDK from here: https://adoptopenjdk.net/?variant=openjdk8&jvmVariant=hotspot
Make sure it is 64-bit
Make sure you install it in the root, e.g. C:\java, since Windows doesn't like spaces in the path.
During installation, after changing the install path, select the option to set the Path.
Download winutils and put it in C:\hadoop\bin: https://github.com/cdarlint/winutils/blob/master/hadoop-2.7.3/bin/winutils.exe
Download Anaconda 3.6 from the archive (I didn't like the new 3.8; Apache Spark 2.4.x only works with Python 3.6 and 3.7): https://repo.anaconda.com/archive/Anaconda3-2020.02-Windows-x86_64.exe
Download Apache Spark 2.4.6 and extract it in C:\spark\
Set the env for HADOOP_HOME to C:\hadoop and SPARK_HOME to C:\spark
Set Paths for %HADOOP_HOME%\bin and %SPARK_HOME%\bin
Install C++ (again, the 64-bit version): https://www.microsoft.com/en-us/download/confirmation.aspx?id=14632
Create C:\temp and C:\temp\hive
Fix permissions:
C:\Users\maz>%HADOOP_HOME%\bin\winutils.exe chmod 777 /tmp/hive
C:\Users\maz>%HADOOP_HOME%\bin\winutils.exe chmod 777 /tmp/
Either create a conda env for Python 3.6, install pyspark==2.4.6, spark-nlp, and numpy, and use Jupyter or a Python console; or, in the same conda env, go to the Spark bin directory and run pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.6.5.
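After finishing these steps, a minimal sketch (assuming the directories above) to confirm the environment variables are visible to the Python process before starting Spark NLP:
import os
# These should match the directories created in the steps above.
print(os.environ.get("HADOOP_HOME"))  # expected: C:\hadoop
print(os.environ.get("SPARK_HOME"))   # expected: C:\spark
print(os.environ.get("JAVA_HOME"))    # should point at the 64-bit Java 8 install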
