NumbaPro on CUDA device over SSH connection - python

I'm using Python/NumbaPro to use my CUDA-compliant GPU on a Windows box. I use Cygwin as my shell, and from within a Cygwin console it has no problems finding my CUDA device. I test with the simple command
numbapro.check_cuda()
But when I connect to the box over OpenSSH (as part of my Cygwin setup), I get the following error:
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init:
Call to cuInit results in CUDA_ERROR_NO_DEVICE:
How can I fix this?

The primary cause of this is Windows service session 0 isolation. When you run any application via a service which runs in session 0 (so sshd, or Windows Remote Desktop, for example), the machine's native display driver is unavailable. For CUDA applications, this means that you get a no-device-available error at runtime, because the sshd you use to log in is running as a service and there is no available CUDA driver.
There are a few workarounds:
Run the sshd as a process rather than a service.
If you have a compatible GPU, use the TCC driver rather than the GPU display driver.
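For the second workaround, the driver model is normally switched with nvidia-smi followed by a reboot; the device index 0 below is an assumption for a single-GPU machine (1 selects TCC, 0 selects WDDM -- check nvidia-smi --help on your system):
nvidia-smi -i 0 -dm 1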
As for the secondary problem, the Python runtime error you are seeing comes from the multiprocessing module. From this question it appears that the root cause is probably the NUMBER_OF_PROCESSORS environment variable not being set. You can use one of the workarounds in that thread to get around that problem.
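A minimal sketch of that kind of workaround, assuming the variable really is missing from the SSH session's environment: set it from Python before multiprocessing is first imported, since older Pythons on Windows read it in cpu_count().
import os
if "NUMBER_OF_PROCESSORS" not in os.environ:
    os.environ["NUMBER_OF_PROCESSORS"] = "1"   # substitute your real core count
import multiprocessing
print(multiprocessing.cpu_count())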

Related

Send command to MacBook Terminal from Pycharm installed on virtual Windows

Background:
Through VMware Fusion installed on my MacBook, I have Windows installed virtually in the VMware Fusion environment. On Windows, I have the PyCharm IDE, through which I run an automated Python program to control bench instruments from Keysight and Tektronix. No issues.
PS: The instrument drivers are available only for Windows; that's the reason I am using Windows virtually on the MacBook.
Question:
From PyCharm (installed on the virtual Windows), I would like to send a command (say, printing "Hello World") to the Terminal of the MacBook.
How can I do this, and what would be the command syntax (or package needed)?
There is no single package to do this.
At a minimum, your Mac host would need to run a server process. Then the VM would need to be on a host network bridge such that it is remotely addressable. Then, you can write a client that sends RPC requests to the host's server process.
At a low level, you can use the socket library, but you may want something higher level, like http.server.
Related: VMware Fusion: connecting to host's web server from guest
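To make the socket idea concrete, here is a minimal sketch; the host address 192.168.1.10 and port 5000 are assumptions for illustration, and running received strings through the shell is only acceptable on a trusted private network.
# server.py -- run on the Mac host
import socket
import subprocess

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 5000))
srv.listen(1)
while True:
    conn, _ = srv.accept()
    cmd = conn.recv(4096).decode()          # command text sent by the guest
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    conn.sendall((result.stdout or "(no output)").encode())
    conn.close()

# client.py -- run from PyCharm on the Windows guest
import socket

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("192.168.1.10", 5000))         # assumed bridged address of the Mac
cli.sendall(b'echo "Hello World"')
print(cli.recv(4096).decode())
cli.close()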
The other option without external dependencies would be to communicate over a file-system share.
If you want to install external software, then you can introduce a remote message queue or database.
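For the file-system-share option, a minimal sketch of a watcher on the Mac side might look like this; the share path is hypothetical (use whatever folder you share through Fusion), and the same trusted-input caveat applies. The guest then just writes a one-line .cmd file into the share.
# watcher.py -- run on the Mac host
import subprocess
import time
from pathlib import Path

inbox = Path("/Users/me/vm_share/inbox")    # hypothetical shared folder
while True:
    for f in sorted(inbox.glob("*.cmd")):
        cmd = f.read_text().strip()
        subprocess.run(cmd, shell=True)     # execute the command from the guest
        f.unlink()                          # delete so each command runs once
    time.sleep(1)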

Maximum retries exhausted. Could not install VS Code server on [azure compute instance name]: Cannot communicate with the Jupyter endpoint

I am trying to edit my Jupyter notebook, which is in an Azure ML Workspace. When I use the 'Edit in VS Code' option and it opens VS Code, or when I try to connect to the compute instance directly from VS Code, I get the following error:
[Info - 2021-11-08 07:06:06.885] Using commit id "b3318bc0524af3d74034b8bb8a64df0ccf35549a" and quality "stable" for server
[Error - 2021-11-08 07:06:06.971] Invalid response: 405 Method Not Allowed
[Error - 2021-11-08 07:06:06.971] Cannot communicate with the Jupyter endpoint.
[Error - 2021-11-08 07:06:07.981] Maximum retries exhausted. Could not install VS Code server on saksham-dubey: Cannot communicate with the Jupyter endpoint.
Earlier it was working fine and I had no issues with VS Code; I was able to edit/run files directly. But without my changing anything, it started giving me this problem.
So far I have tried uninstalling/removing VS Code completely and reinstalling it, and deleting the Azure compute instance and creating a fresh instance, but nothing worked; I am still facing the issue.
What is the problem here and how to resolve this?
PS: saksham-dubey is my compute instance name, which is an Azure Compute Instance (NVIDIA Tesla K80 GPU).
Been dealing with the same issue since Friday; I've tried pretty much everything, to no avail.
Luckily I found a workaround by enabling SSH on a new GPU instance & connecting to the instance through Remote SSH from VS Code.
Will keep you posted if I find a solution for the UI access.
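In case anyone wants to try the same workaround, the Remote SSH connection just needs an ordinary SSH config entry. Everything below is a placeholder sketch -- the host alias, IP, user name, and key path all depend on how you enabled SSH on the instance:
Host azureml-gpu
    HostName <instance-public-ip>
    User azureuser
    IdentityFile ~/.ssh/id_rsa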
I found the fix to the problem.
The problem was caused by a backend error in the 'Azure Machine Learning' VS Code extension.
The issue was introduced in version 0.6.26.
The fix was implemented in version 0.6.27.
Updating the VS Code extension will make it work.
For any further information, you can refer to the GitHub repository: vscode-tools-for-ai
You can also open an issue on the same GitHub page if anything like this happens again.

Making a file executable on a workplace-locked Mac

I have a MacBook Pro that I received from work. I want to run my web scraper with Selenium. However, the moment I call the web scraper from my Python code, I get the following errors:
In a window: “chromedriver” cannot be opened because the developer cannot be verified.
macOS cannot verify that this app is free from malware.
Chrome downloaded this file today at 09:24 from chromedriver.storage.googleapis.com
In the terminal: selenium.common.exceptions.WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: -9
I have already:
Moved the file (chromedriver executable) to different directories (username/bin, usr/bin, usr/local/bin) that are in my PATH env.
Tried to unlock execution via the macOS interface, following the instructions in this link: https://support.apple.com/en-us/HT202491
Changed the file characteristics with "sudo chmod 777 chromedriver"
Contacted my work ICT department. They don't want to remove the lock.
Is there a way to work around this lock? Help is very much appreciated.
Maybe you can find a vulnerability in macOS or in the software that is already installed on your Mac. Check cvedetails.com; any other website like that is fine, as they synchronize their databases.
But before doing so, you should know that this policy is enforced by your network admin. It means there is an agent already installed on your Mac that is reporting the Mac's current status to the admin. If you break the policy, it will be reported. You might lose your job.

Problem submitting Pyspark jobs from Windows Driver to Ubuntu Spark Cluster

I am having trouble submitting a Pyspark job from my Windows driver machine (Win 10) to a simple Spark cluster running on Ubuntu.
There are several posts already that attempt to answer this question, most notably this one from ThatDataGuy, but none of them have helped.
Every time I try to submit the simple wordcount.py example to my remote master from my Windows box, I get the following error:
Cannot run program 'C:\apps\Python\3.6.6\python.exe': error=2, No such file or directory
This is a Java IOException generated by the Py4J jar.
My Spark cluster is a simple master/one-worker setup in VirtualBox, set up via Vagrant. All machines (my Spark driver laptop and the two VMs, master and worker) have identical Spark 2.4.2, Python 3.6.6, and Scala 2.12.8. Note that Scala programs using spark-submit against the remote cluster work fine, as does anything run in local mode. Also, the code examples work fine when run on either the master or worker nodes directly. It's only when I try to use my Windows laptop as a Spark driver in Pyspark, against the Ubuntu Spark cluster, that this issue arises. It always returns the error above.
It seems that Py4J is trying to use or instantiate Python from my Windows driver's Python path, which of course my Linux cluster can't see. I have already set the Pyspark Python path to a different value on the cluster nodes. I have set both PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the nodes' environment variables (via .bashrc), in spark-defaults.conf, AND in spark-env.sh. All values point to /usr/local/bin/python3, as that's where Python 3.6.6 is installed on the master and worker nodes.
I've also (just as a hunch) aliased "python" to point to /usr/local/bin/python3 on the nodes and then changed my Windows Python shortcut to pull up the same Python version. No luck, but I was grasping at straws. ;/ The error simply changed to:
Cannot run program 'python': error=2, No such file or directory
I did see an article saying that the Py4J 0.10.7 library does not support Python 3.7, so that caused me to drop down to Python 3.6. The error stayed the same after that, though.
The only thing I haven't done is to try to set up an additional shared/synced folder in Vagrant back to my Windows Python installation and then use /vagrant/shared/python/whatever in my remote PYSPARK settings. No idea whether that would work, though, given that I'm dealing with Windows and Linux Python flavors (all 3.6.6). Ugh. :/
Any ideas? I've got a Windows 10 machine and I like to do my Python development there. I've also got 64GB of RAM so I'd like to use it. Please don't make me switch to Scala! ;)
-- Pyspark works fine in local mode
spark-submit C:\apps\Spark\spark-2.4.2\examples\src\main\python\wordcount.py C:\Users\sitex\Desktop\p_and_p_ch1.txt
-- Pyspark fails when calling master with IOException
spark-submit --master spark://XXX.XX.XXX.XXX:7077 C:\apps\Spark\spark-2.4.2\examples\src\main\python\wordcount.py C:\Users\sitex\Desktop\p_and_p_ch1.txt
UPDATE: OK, so it looks like my workaround is to have my driver (Windows laptop) know the Python path on Linux that the worker needs to use. Fortunately for me, I can do that, as this entire setup is running on my laptop. Here is the command that gets me past the error:
spark-submit --conf spark.pyspark.driver.python=python --conf spark.pyspark.python=/usr/local/bin/python3 --master spark://172.28.128.150:7077 C:\apps\Spark\spark-2.4.2\examples\src\main\python\wordcount.py /vagrant/shared/p_and_p_ch1.txt
Now, I should add that this DOESN'T actually run wordcount.py, as I quickly realized that my cluster cannot figure out Windows paths, and my attempt to use a Vagrant synced/shared folder results in a File Not Found on the p_and_p_ch1.txt file. But it does get me past my dreaded error. I can figure out how to stash my files on a network share / S3 / etc. some other day.
This puts a lot of onus on the Spark driver knowing exactly what Python path the cluster needs to use. Fortunately I know these settings, as the setup is entirely on my laptop, but isn't the entire point of this that I should be able to submit Spark jobs to a cluster without the driver (me) knowing settings like the worker nodes' Python path? I'm wondering if this is just a Windows + Linux quirk.
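For what it's worth, the worker-side property can in principle also be set programmatically when building the session. The sketch below is an assumption based on the spark.pyspark.python property used in the command above, not a tested recipe; note that the driver-side property (spark.pyspark.driver.python) generally has to be in place before the driver process starts, so the --conf flags on spark-submit remain the more reliable route.
# Hypothetical programmatic equivalent of the --conf flags above
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://172.28.128.150:7077")
    .config("spark.pyspark.python", "/usr/local/bin/python3")  # Python on the Linux workers
    .appName("wordcount")
    .getOrCreate()
)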

Paraview python use offscreen image rendering

I'm trying to generate images using ParaView in a non-interactive PBS job on a remote Linux machine. From the command line, if I have a file called cone.py with the following contents
from paraview.simple import *
Cone()
Show()
SaveScreenshot('cone.png')
and I type pvpython cone.py at the command line, then a window pops up showing me the image of the cone, and the image is saved. I don't want the window to pop up. It pops up even if I use pvbatch cone.py or pvbatch cone.py --use-offscreen-rendering. If I try to run this code from within a PBS job, the image isn't generated (probably because it can't create the window) and the following error message is produced:
ERROR: In /home/kitware/Dashboards/buildbot/paraview-debian4dash-linux-shared-release_qt4_superbuild/source-paraview/VTK/Rendering/OpenGL/vtkXOpenGLRenderWindow.cxx, line 542
vtkXOpenGLRenderWindow (0x139559c0): bad X server connection. DISPLAY=
/var/spool/PBS/mom_priv/jobs/1443323.rrlogin.internal.SC: line 8: 21926 Aborted pvbatch cone.py
/home/kitware isn't a valid directory on my machine. Any help would be appreciated.
If your remote Linux machine has X installed, you need to set your DISPLAY variable before running pvbatch. If it doesn't have X installed, you'll need to build with OSMesa (info and directions here).
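For example, in the PBS job script -- assuming an X server is actually running on the node's display :0, which you'd need to confirm:
export DISPLAY=:0
pvbatch cone.py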
Looks like you are using the official ParaView binaries for a non-interactive PBS job which doesn't have an accessible X server. You have two options:
Check with your site admins to see if there's a way to start up an X server with your job. This would be typically the case if your remote machine has GPUs.
Build ParaView from source with OSMesa (off-screen Mesa) support. Check out this wiki page for details; also refer to this blog post if you're building ParaView 5.0 or later with the OpenGL2 backend. This will work on systems without an accessible X server.
You cannot use the binaries distributed at paraview.org on X-less systems. --use-offscreen-rendering still requires access to an X server to create the OpenGL context, unless ParaView is built with OSMesa support (in which case the command-line option is not necessary). If you are wondering what the option is for, then: it's there to avoid creating a popup window on X-enabled systems.
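One concrete variant of the first option, not mentioned above and purely an assumption about your cluster: if your admins can install Xvfb (a virtual framebuffer X server) on the compute nodes, wrapping the call in the PBS script may be enough to give pvbatch the X connection it wants:
xvfb-run -a pvbatch cone.py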
