Launching a Spark EC2 cluster from Windows - Python

I'm running Windows 8 and would like to launch a Spark cluster. I'm following this tutorial. The script doesn't run from the Windows command line, so I installed Cygwin and used that instead. With Cygwin I was able to set the environment variables and also run the ec2 script, but I get the error:
ERROR: The identity file must be accessible only by you.
You can fix this with: chmod 400 "SpakPlaygroundKeyPair.pem"
So I'm stuck here. I saw that in this question it was suggested to run the Python file directly, which is actually what I want to do, but I'm not sure how. For example, when you run the script you have to specify things like:
--key-pair=SpakPlaygroundKeyPair --identity-file=SpakPlaygroundKeyPair.pem --region=us-east-1 --zone=us-east-1a --instance-type=t2.micro launch my-spark-cluster
How do you pass those arguments to the Python script?
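What I have in mind is roughly the sketch below: calling spark_ec2.py from Python with the same flags via subprocess (the path to spark_ec2.py is an assumption; adjust it to wherever the script lives in your Spark download):

import subprocess
import sys

# Sketch only: invoke spark_ec2.py with the same arguments you would pass
# on the command line. Adjust the script path to your Spark checkout.
args = [
    sys.executable, "spark_ec2.py",
    "--key-pair=SpakPlaygroundKeyPair",
    "--identity-file=SpakPlaygroundKeyPair.pem",
    "--region=us-east-1",
    "--zone=us-east-1a",
    "--instance-type=t2.micro",
    "launch", "my-spark-cluster",
]
subprocess.check_call(args)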

I ran into the same issue on Windows 10. Luckily the file permission check is coded into the spark_ec2.py script itself; it is not a fundamental limitation of the AWS Python API.
I ended up commenting out the following lines in the spark_ec2.py script:
if not (file_mode & S_IRUSR) or not oct(file_mode)[-2:] == '00':
    print("ERROR: The identity file must be accessible only by you.", file=stderr)
    print('You can fix this with: chmod 400 "{f}"'.format(f=opts.identity_file),
          file=stderr)
    sys.exit(1)

Simply run the suggested fix, like this:
$ chmod 400 "SpakPlaygroundKeyPair.pem"
This gives read permission on the .pem file to you alone.
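If you would rather do the equivalent from Python, os.chmod can set the same mode. This is a sketch; note that on native (non-Cygwin) Windows os.chmod can only toggle the read-only flag, so the mode check in spark_ec2.py may still fail there:

import os
import stat

# Sketch: the Python equivalent of "chmod 400" -- owner read-only.
os.chmod("SpakPlaygroundKeyPair.pem", stat.S_IRUSR)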

Related

Pipenv and ModuleNotFoundError

I have spent hours looking into this issue without any success.
I've looked at various SO discussions and none seem to solve my problem, so out of pure frustration here is my question...
I'm trying to launch a script from within a Windows batch file. The problem is that when I do, the script fails because it cannot find some of the modules it uses.
After various attempts I have found that the batch file aspect, at this stage, seems to be irrelevant.
So, ignoring batch files for a minute: if I run the script like this
pipenv run python myscript.py
it works. If I run the following, it doesn't:
path-to-env\Scripts\activate
python myscript.py
it returns the error ModuleNotFoundError: No module named 'xxx'.
It activates the venv OK, but something is not right, because it can't find code used by the script.
Within my IDE (VS Code) everything works OK.
I do have quite a complicated directory structure, but given that both the IDE and "pipenv run python myscript.py" work as expected, it must be due to something else.
Any ideas or pointers on where I need to be looking? I'm afraid my understanding of pipenv isn't up to solving this ;)
EDIT
In my attempts to solve this I had added the line PYTHONPATH=. to my .env file. This seems to be responsible for allowing this line to work:
pipenv run python myscript.py
If I remove it, then the above ALSO generates the ModuleNotFoundError.
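(For anyone debugging something similar: a small sketch like the one below, run once with "pipenv run python check_paths.py" and once after activating the venv manually, shows whether the two invocations actually see the same interpreter, PYTHONPATH and module search path. check_paths.py is a made-up name.)

# check_paths.py -- hypothetical diagnostic: run it both ways and diff the output.
import os
import sys

print("interpreter:", sys.executable)
print("PYTHONPATH :", os.environ.get("PYTHONPATH"))
for p in sys.path:
    print("path entry :", p)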
OK, so after trying lots of combinations I did finally manage to get this to work, although I have no idea why this solution works and others didn't.
It requires two batch files.
One to launch the Python script, which will contain a line like this:
python myscript.py
And another to create the env via pipenv and then call the first batch file. It will have a line like this:
pipenv run \path\to\first\batchfile.bat
This combination works and can be successfully called from the Windows Task Scheduler.

Running python script via execute command from shell in Jenkins

The Jenkins job is set up so that it checks out the latest version of a git repo and executes some Python code from it. The repo is checked out onto our Linux lab PC and the code runs there.
In the script we check the status of some lab-PC network interfaces. I made a small script which executes the following lines, but it throws an error like "no file or directory". The command itself is fine, but it fails as if the Linux environment were not visible. The strange thing is that we have about 10 test cases; in 6 of them it works perfectly fine and in 4 it fails, and it always fails for just those 4. The sequence of events is exactly the same in all the test cases...
res = subprocess.check_output(['ip', 'link', 'show', 'dev', '<interface name>'])
logger.info(res)
The script works when executed locally, so a Jenkins issue seems to be behind this. Does anybody have any tips to resolve it?
The problem is solved by putting 'sudo' before the command. Even though the command doesn't require sudo rights, it does when run via Jenkins.
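In the script the fix looks roughly like this sketch ("eth0" stands in for the real interface name, and the absolute path to ip is an extra assumption to guard against a restricted PATH in the Jenkins job):

import logging
import subprocess

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Sketch of the accepted fix: prepend "sudo"; the absolute path to "ip" is an
# added precaution in case the Jenkins environment has a restricted PATH.
res = subprocess.check_output(['sudo', '/sbin/ip', 'link', 'show', 'dev', 'eth0'])
logger.info(res)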
You can also open the node in Jenkins, where you can execute Groovy commands against the server, and test there that the command works.

Run a python script from bamboo

I'm trying to run a Python script from Bamboo. I created a Script task and wrote inline "python myFile.py". Should I be listing the full path for python?
I changed the working directory to the location of myFile.py, so that is not the problem. Is there anything else I need to do within the plan configuration to properly run this script? It isn't running, but I know it should be able to run because the script works fine from the terminal on my local machine. Thanks.
I run a lot of Python tasks from Bamboo, so it is possible. Using the Script task is generally painless.
You should be able to use your Script task to run commands directly and have stdout written to the logs. Given that, you can run:
'which python' -- outputs the path of the python interpreter actually being run.
'pip list' -- outputs the list of modules installed with pip.
Verify that the output of those commands inside the task matches the output when you run them on the server yourself. I'm guessing they won't match, and once that is addressed everything will work fine.
If not, comment back and we can look at a few other things.
For the future, there are a handful of ways you can package things with Python which could help with this kind of problem (e.g. automatically installing missing modules).
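To capture the same information from inside the task in one step, a small diagnostic script like this can help (a sketch; run it from the Script task and again from a shell on the build agent, then compare the output):

# diagnose_env.py -- hypothetical helper for comparing the two environments.
import pkgutil
import sys

print("interpreter:", sys.executable)
print("version    :", sys.version)
# List importable top-level modules (works on both Python 2 and 3).
print("modules    :", sorted(name for _, name, _ in pkgutil.iter_modules()))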
You can also run Python inline in the Script task (instead of calling myFile.py), for example:
/usr/bin/python <<EOF
print "Hello, World!"
EOF
Check this page for a more complex example:
https://www.langhornweb.com/display/BAT/Run+Python+script+as+a+Bamboo+task?desktop=true&macroName=seo-metadata

Running spark-submit from PyCharm

I am trying to figure out how to develop an apache-spark program in PyCharm.
I have followed the article at this link.
I defined SPARK_HOME and added pyspark to the Python path. There is no error
importing pyspark modules, and autocomplete works fine.
However, when I run the program in PyCharm, I get an error when creating the SparkContext:
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
...
...
Exception: Java gateway process exited before sending the driver its port number
I managed to run the program from the terminal with spark-submit.
Do I need to change the configuration in PyCharm, or is there any way to run
spark-submit instead of python in PyCharm?
If you are fine with running spark-submit from a terminal, you can add a run configuration that does that for you. Otherwise you can set things up in the Edit Run/Debug Configurations window as well. This post in particular can get you there.
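Another option that often gets past both the "Must specify a primary resource" and the "Java gateway process exited" errors when using a plain Python run configuration is to set PYSPARK_SUBMIT_ARGS before creating the context. This is a sketch: SPARK_HOME still has to point at your Spark installation, and "local[2]" is just an example master:

import os

# Sketch: tell the launcher what to start when no spark-submit wrapper is used.
os.environ.setdefault("PYSPARK_SUBMIT_ARGS", "--master local[2] pyspark-shell")

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("pycharm-test")
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())
sc.stop()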

Permission denied when executing a Python file in Linux

I am working with my Raspberry Pi 2 B+ and I am using Raspbian. I have a python script located at /home/pi/Desktop/control/gpio.py
When I type /home/pi/Desktop/control/gpio.py into the command line, I get the message
bash: /home/pi/Desktop/control/gpio.py Permission denied
I have also tried running sudo -s before running that command, but that doesn't work. My Python script uses the RPi.GPIO library.
If someone could please explain why I am getting this error it would be appreciated!
You get this error because you do not have execute permission on the file. There are two ways to solve it:
Not executing the file directly: if you run python gpio.py, Python loads the file by reading it, so you do not need execute permission.
Granting yourself execute permission: you do this by running chmod u+x yourfile.py.
However, the second option will not work unless you add a shebang at the top of your Python program; it lets Linux know which interpreter to start. For instance:
#!/usr/bin/env python
This runs whichever python is found on your current $PATH. If you know exactly which interpreter you want, put it there instead, for instance:
#!/usr/bin/python3
Remember the shebang must be the very first line of your program.
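Putting the pieces together, a minimal sketch of what the top of gpio.py could look like (the body here is placeholder code, not your actual GPIO logic):

#!/usr/bin/env python
# After "chmod u+x gpio.py" this file can be started directly as ./gpio.py
import sys

def main():
    print("running under", sys.executable)  # placeholder for the real RPi.GPIO code

if __name__ == "__main__":
    main()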
Alternatively, running it like this may work:
cd /home/pi/Desktop/control/
python gpio.py
Because gpio.py is not an executable file, you should run it with python instead.
