Running spark-submit from PyCharm - python

I am trying to figure out how to develop an Apache Spark program in PyCharm.
I have followed the article in this link.
I defined SPARK_HOME and added pyspark to the Python path; there is no error
when importing pyspark modules, and autocomplete works fine.
However, I get an error when defining the SparkContext if I run the program from PyCharm.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
...
...
Exception: Java gateway process exited before sending the driver its port number
I managed to run the program from the terminal with spark-submit.
Do I need to change the configuration in PyCharm, or is there any way to run
spark-submit instead of python in PyCharm?

If you are fine with running spark-submit from the terminal, you can add a run configuration that does that for you. Otherwise you can adjust some settings in the Edit Run/Debug Configurations window as well. This post in particular can get you there.
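A common workaround for the "Java gateway process exited before sending the driver its port number" error, separate from the run-configuration route above, is to hand the spark-submit arguments to pyspark through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created; pyspark then launches the Java gateway itself, so the script can be run with PyCharm's ordinary Python configuration. A minimal sketch, assuming SPARK_HOME is set up as in the question and that a local[2] master is acceptable for development:

import os

# pyspark's Java gateway reads this variable when it starts the JVM;
# the trailing "pyspark-shell" token is required.
os.environ.setdefault("PYSPARK_SUBMIT_ARGS", "--master local[2] pyspark-shell")

from pyspark import SparkContext

sc = SparkContext(appName="pycharm-dev")  # appName is an arbitrary example
print(sc.parallelize(range(10)).sum())
sc.stop()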

Related

PyCharm run configuration only shows Python tests. How do I run it as a regular run?

I have cloned a git repository and am trying to run the code in the PyCharm IDE. When I try to run it, my usual run option is not available and only "run nosetests" is available. I read that nose is a module that helps with testing code, but I don't see an import nose or anything like that which would help me understand why my IDE automatically runs nosetests on this particular code.
Question: How can I run this like normal code, and why am I seeing this run option instead?
I found multiple questions about how people accidentally changed their IDE settings so that all code runs using nosetests, but that is not my question. I would appreciate it if you could share a link that gives more details on this.
It seems that you do not have a run configuration in the project that runs the code, only the tests. In PyCharm go to "Run" -> "Run..." (Shift + Alt + F10) and choose "Edit Configurations..."; with the plus sign you can add a new configuration that runs Python code "normally".
It is explained in detail on the JetBrains website:
https://www.jetbrains.com/help/pycharm/creating-and-editing-run-debug-configurations.html?keymap=primary_windows
From what I understand, you are not able to run the .py code. You can achieve this easily in the terminal provided within PyCharm, using the commands provided in the project README.
Alternatively, if you want to run it using the GUI, you can edit the run configuration by clicking the dropdown near the Run icon at the top.
For further information please head to https://www.jetbrains.com/help/pycharm/creating-and-editing-run-debug-configurations.html?keymap=primary_windows

Exe created on virtual server with PyInstaller fails to execute

I have created an exe using PyInstaller in a Python virtualenv. I am able to do this in the normal Python environment without any problems; however, when I create it on a virtual server, the resulting exe fails to execute (open).
I have created the exe with the following debug options:
Scripts\pyinstaller --debug=all --log-level=DEBUG
There is nothing in the log that indicates any specific error.
However, when I try to open the exe and trace the process while clicking through the different windows, the following message pops up:
[49852]: LOADER: Error activating the context: ActivateActCtx:
An attempt to set the process default activation context failed because the process default activation context was already set.
It allows me to click "OK" through the rest of the messages, and then:
I have tried to trace and debug to find the error, but I have not had much success; there are no missing modules or errors in the log.
I am using Python 3.7.4
Any help would be appreciated.
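One way to narrow a failure like this down (a debugging sketch, not a known fix for the ActivateActCtx message) is to have the frozen entry script write a log file as soon as the interpreter starts; if the log never appears, the problem occurs in the PyInstaller bootloader before any of your own code runs. sys.frozen and sys._MEIPASS are set by PyInstaller, while startup.log is just a hypothetical file name:

import os
import sys
import traceback

if getattr(sys, "frozen", False):
    # Running inside a PyInstaller bundle: log next to the exe itself.
    log_path = os.path.join(os.path.dirname(sys.executable), "startup.log")
    with open(log_path, "a") as f:
        f.write("started: exe=%s meipass=%s python=%s\n" % (
            sys.executable, getattr(sys, "_MEIPASS", "?"), sys.version))

    def _log_uncaught(exc_type, exc, tb):
        # A windowed exe has no console, so route uncaught exceptions
        # to the same log file.
        with open(log_path, "a") as f:
            traceback.print_exception(exc_type, exc, tb, file=f)

    sys.excepthook = _log_uncaught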

PyCharm is not letting me run my script 'test_splitter.py', but instead 'Nosetests in test_splitter.py'?

I see many posts on 'how to run nosetests', but none on how to make PyCharm let you run a script without nosetests. And yet, I seem to only be able to run or debug 'Nosetests in test_splitter.py' and not just 'test_splitter.py'!
I'm relatively new to PyCharm, and despite going through the documentation, I don't quite understand what nosetests are about and whether they would be preferable for testing my script. But I get an error:
ModuleNotFoundError: No module named 'nose'
Process finished with exit code 1
Empty suite
I don't have administrative access, so I cannot download nosetests, if anyone was going to suggest that. I would just like to run my script! Other scripts run just fine without nosetests!
I found the solution: I can run without nosetests from the 'Run' dropdown options in the toolbar, or Alt+Shift+F10.
It is probably because you are using an interpreter which doesn't have nosetests installed.
You can configure your project interpreter from File > Settings > Project Interpreter.

Error while registering a script to be run at start-up, how to resolve?

I am from electrical engineering and am currently working on a project using an UP board; I have attached LEDs, a switch, a webcam, and a USB flash drive to it. I have created an executable script that I want to run at startup.
When I try to run the script in the terminal using sudo /etc/init.d/testRun start, it runs perfectly. But when I run sudo update-rc.d testRun defaults in the terminal to register the script to run at startup, it gives me the following error:
insserv: warning: script 'testRun' missing LSB tags and overrides
Please guide me on how to resolve this. I come from an electrical engineering background, so I am a novice in this field of coding. Thanks a lot :)
The thing to remember is that you run the script as yourself, but startup mechanisms, like cron, do not, so you need to:
Ensure that the executable flags are set for all users and that the script is in a directory that everybody has access to.
Use the absolute path for everything, including the script.
Specify what to run it with, again with the absolute path.
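The insserv warning itself refers to the LSB comment block that init.d scripts are expected to begin with. A minimal sketch of such a header for the script in the question (the runlevels and dependencies below are common defaults, not values taken from this thread):

#!/bin/sh
### BEGIN INIT INFO
# Provides:          testRun
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Run the testRun script at boot
### END INIT INFO

With a block like that at the top of /etc/init.d/testRun, sudo update-rc.d testRun defaults should register the script without the missing-LSB-tags warning.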

Launching a Spark EC2 cluster from Windows

I'm running Windows 8 and would like to launch a Spark cluster. I'm using this tutorial. It doesn't run with the Windows CLI, so I tried installing and using Cygwin. With that I was able to change the environment variables and also run the ec2 script, but I get the error:
ERROR: The identity file must be accessible only by you.
You can fix this with: chmod 400 "SpakPlaygroundKeyPair.pem"
So I'm stuck here. I saw that in this question it was suggested to run the Python file directly, which is actually what I want to do, but I'm not sure how. For example, when you run the script, you have to specify things like
--key-pair=SpakPlaygroundKeyPair --identity-file=SpakPlaygroundKeyPair.pem --region=us-east-1 --zone=us-east-1a --instance-type=t2.micro launch my-spark-cluster
How do you tell that to the python script?
I ran into the same issue on Windows 10. Luckily, the file permission check is coded into the spark_ec2.py script and is not a fundamental limitation of the AWS Python API.
I ended up commenting out the following lines in the spark_ec2.py script:
if not (file_mode & S_IRUSR) or not oct(file_mode)[-2:] == '00':
    print("ERROR: The identity file must be accessible only by you.", file=stderr)
    print('You can fix this with: chmod 400 "{f}"'.format(f=opts.identity_file),
          file=stderr)
    sys.exit(1)
Simply run the suggested fix, like this:
$ chmod 400 "SpakPlaygroundKeyPair.pem"
This should give only you read permissions to the pem file.
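As for telling those options to the Python script directly: spark_ec2.py reads them from the command line just as the spark-ec2 wrapper does, so they can typically be appended to a plain python invocation. A sketch, assuming the script sits under ec2/ in the Spark distribution and reusing the placeholder names from the question:

python ec2/spark_ec2.py \
    --key-pair=SpakPlaygroundKeyPair \
    --identity-file=SpakPlaygroundKeyPair.pem \
    --region=us-east-1 --zone=us-east-1a \
    --instance-type=t2.micro \
    launch my-spark-cluster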
