How to use PyCharm to run an application on a remote Spark cluster - python

I have installed PyCharm on my local Windows system and configured it to run Spark applications in local mode.
My Spark cluster is on a remote Ubuntu box.
How can I run a Spark application on the remote Ubuntu cluster from my locally installed PyCharm on Windows?
My goal is to run the application on the remote cluster, so workarounds are also welcome.

PyCharm already supports this. Ideally, set up a deployment configuration and a remote interpreter, preferably over SSH.
This lets you upload your codebase to the cluster (so that the PySpark driver has access to it) while still launching runs from your laptop. The remote interpreter then takes care of resolving dependencies on the cluster.
Have a look here https://www.jetbrains.com/help/pycharm/configuring-remote-interpreters-via-ssh.html and here https://www.jetbrains.com/help/pycharm/creating-a-remote-server-configuration.html.
NB: Before you start configuring the remote interpreter, it's better to create a virtual environment on the cluster with venv or conda, so that your application does not pick up conflicting or outdated system packages. You then point the remote interpreter config to the Python binary of that environment, such as /app/anaconda3/envs/my_env/bin/python.
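Once the remote interpreter is configured, a quick sanity check is to run a small script through it and see which machine and which Python actually execute the code. The following is only a minimal sketch using the standard library; the function name interpreter_report is made up for illustration. If everything is wired up as described above, the reported executable should be the environment's binary on the cluster rather than a path on your Windows laptop.

```python
import platform
import socket
import sys


def interpreter_report():
    """Collect facts that show which machine and Python are running this code."""
    return {
        "host": socket.gethostname(),
        "os": platform.system(),
        "executable": sys.executable,
        "prefix": sys.prefix,
        # True inside a venv-style environment. Conda environments usually
        # report False here, so check "prefix" for those instead.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```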

Related

Remote VSCode notebook looks for python on local machine path

I am having trouble with my VS Code and SSH connection.
I have installed miniconda on my local machine at home/sam/miniconda3, and I have installed anaconda on a remote HPC machine at home/sam/anaconda3.
I have replicated the conda environments on both using a yml file. Recently, I keep getting a "python was not found" error when connecting over SSH. After digging for a while, I realized that when running a Jupyter notebook, it looks for Python under the local machine path (home/sam/miniconda3), even though I have selected the kernel from the remote machine path (/home/sam/anaconda3/envs/myenv/bin/python3.9).
How can I get the notebook in VS Code to look at the remote path when connected to the remote machine?
Thanks heaps!

Mounting a virtual environment via SSHFS on local machine using its Python3 file not working

So I have mounted a part of a development server which holds a virtual environment that is used for development testing. The reason for this is to get access to the installed packages, such as Django-rest-framework and Django itself, without setting them up locally (to be sure to use the same versions as the development server). I know that it's perhaps better to use Docker for this, but that's not the case right now.
The way I've done it is by installing SSHFS via an external brew tap (as it's no longer supported in brew core), via this link: https://github.com/gromgit/homebrew-fuse
After that I ran this command in the terminal to mount, via SSH, the specific part of the development server that holds the virtual environment:
sshfs -o ssh_command='ssh -i /Users/myusername/.ssh/id_rsa' myusername@servername:/home/myusername/projectname/env/bin ~/mnt/projectname
It works fine and I have it mounted on my local disk in mnt/projectname.
Now I go into VS Code, open the folder and select the file called "python3" as my interpreter (which I should, right?). However, this file is just an alias, being 16 bytes in size. I suspect something is wrong here, but I'm not sure how to fix it. Can someone maybe take a look and give some input? I'll attach a screenshot of the mounted directory.
Screenshot of virtualenv directory mounted on local machine
The solution to the problem was to use the VS Code extension Remote - SSH and run VS Code directly on the remote machine, where the virtual environment is accessible.
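The 16-byte "python3" file is consistent with a symbolic link: in a virtual environment, bin/python3 is usually a link to the interpreter that created the environment, and on POSIX systems a symlink's reported size is the length of its target path. For example, "/usr/bin/python3" is exactly 16 characters, although the real target in this case is not known. Since only env/bin was mounted, such a link would resolve against the local filesystem instead of the server's. The sketch below shows one way to inspect a path like this; describe_path is a hypothetical helper, and the demo target is an assumption.

```python
import os
import tempfile


def describe_path(path):
    """Report whether `path` is a symlink and, if so, where it points."""
    info = {"is_symlink": os.path.islink(path), "size": os.lstat(path).st_size}
    if info["is_symlink"]:
        info["target"] = os.readlink(path)
        # exists() follows the link, so False means the target is missing
        # on the machine where this code runs.
        info["target_exists"] = os.path.exists(path)
    return info


if __name__ == "__main__":
    # Demo with a throwaway link; /usr/bin/python3 is an assumed target.
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "python3")
        os.symlink("/usr/bin/python3", link)
        print(describe_path(link))
```

Running describe_path on the mounted ~/mnt/projectname/python3 would show whether its target exists locally, which is the thing an interpreter picker needs.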

Opening script saved on a cluster with local spyder installation

I have anaconda, hence spyder, installed on a local machine. What I am trying to do is use my local spyder installation to open a .py script saved on a remote cluster (in my office) via ssh. The issues that I am encountering are the following:
I cannot run spyder from the cluster, since there is no graphical display whatsoever. For example, we actually have anaconda installed on the cluster, but when I run spyder from the command line, I get the following error message: Could not connect to any X display
I cannot mount the (remote) drives where the .py scripts are located onto my local machine when I am working from home (I can do so when I am at work, connected to the network via cable). If I could, I would simply launch spyder on my local machine and open the scripts. From home, I can only reach the files through ssh, on drives mounted on the cluster.
However, since I can access the .py scripts saved on the cluster via ssh (I can open them with programs installed there, e.g. vim, jpico etc.), I was wondering whether it is possible to use the command line to open a script saved on a remote cluster using my local spyder installation, something like $ spyder /path/to/myScript/savedOnTheRemoteCluster.py
(Spyder maintainer here) As of May 2019 our editor is not capable of working with files on remote locations. So your best option right now is to mount your remote server with sshfs to make it appear as a local directory and then open any file present there in Spyder.

Virtualenv not recognised in ipython Notebook Server

I have iPython running from a secured server on an Ubuntu server VM running on my laptop.
Command line ipython works on the server vm from the virtualenv. I can also start the notebook server on the server vm from the virtualenv without errors.
I can access notebooks from the host laptop and execute code in cells, but if I start the notebook server after activating a virtualenv I can't import any of the Python modules I've installed in the virtualenv.
It looks like the notebook server process is running the system Python but not the version in my virtualenv. Is there a way to tell the notebook server process which virtualenv to use?
Because activating a virtualenv only prepends its own directories to the beginning of the PATH environment variable, you have two options:
a) create the correct virtualenv on the notebook server and install everything, including the notebook itself, from there
b) modify the PYTHONPATH variable so that the notebook's Python can find your libraries.
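Option b) can also be approximated from inside a running notebook by adding the virtualenv's site-packages directory to sys.path. This is only a sketch: the helper name add_venv_site_packages is made up, it assumes the POSIX layout <venv>/lib/pythonX.Y/site-packages, and it only makes sense when the virtualenv was created with the same Python version as the notebook server. Compiled packages may still break otherwise.

```python
import os
import site
import sys


def add_venv_site_packages(venv_root):
    """Append a virtualenv's site-packages directory to sys.path.

    Assumes the POSIX layout <venv>/lib/pythonX.Y/site-packages and that the
    virtualenv matches the Python version of the running process.
    """
    version_dir = f"python{sys.version_info.major}.{sys.version_info.minor}"
    site_packages = os.path.join(venv_root, "lib", version_dir, "site-packages")
    if not os.path.isdir(site_packages):
        raise FileNotFoundError(site_packages)
    # addsitedir also processes .pth files, unlike a plain sys.path.append
    site.addsitedir(site_packages)
    return site_packages
```

In a notebook cell this would be called once, before the imports that fail, with the path to the virtualenv root as the argument.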

Import a Python module/package on a remote machine

I have set up VirtualBox and installed the wxPython package in the virtual machine, where I am programming to learn wxPython and Python. We connect to the remote machine using PuTTY on Windows and ssh from the VirtualBox VM.
I want to do some experiments/analysis with existing code using wxPython, but we do not have permission to install Python packages on the remote machine. If I raise a ticket with the IT team to install the package, it requires a lot of business justification.
As it is my personal interest, I do not have any business reason.
Is it possible to access, from the remote machine, the wxPython package that is installed in the VirtualBox VM?
I am not in any way associated with the tool I am going to suggest. I have used Vagrant with VirtualBox and it has worked fine for me.
The code is in a folder which is accessible from both Virtual as well as Base machine.
Can you not install it locally on the remote machine, and not globally, then just export that python path?
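One way to read the last suggestion is a per-user install (for example with pip's --user option), which does not need administrator rights. Whether wxPython itself installs cleanly that way on the remote machine is not guaranteed, since it may depend on system libraries. The sketch below only reports where this interpreter would look for per-user packages; user_install_locations is a hypothetical helper name.

```python
import site
import sys


def user_install_locations():
    """Show where a per-user install would go for this interpreter."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # False when the user site is disabled, e.g. inside a virtualenv
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "already_on_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print(f"{key}: {value}")
```

If user_site_enabled is True, packages placed in user_site are importable without exporting anything. Otherwise that directory would have to be added to PYTHONPATH by hand.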
