I'm writing a data-logging program that is intended to run when the Raspberry Pi boots. I'm using LXSession's autostart to launch a shell script that contains the command to launch my Python program (my Python script requires sudo).
While I continue to debug, I would like the terminal window to stay open if/when the program encounters an error.
I had done this successfully once before but lost my work.
My autostart file is:
#!/bin/bash
#lxpanel --profile LXDE-pi
#pcmanfm --desktop --profile LXDE-pi
#lxterminal -e sudo sh /home/pi/launcher.sh
#xscreensaver -no-splash
My script file is:
#!/bin/sh
echo Script is running
sudo /usr/bin/python3 /home/pi/hms/hms5-1.py
I thought something like this (in the autostart file) would work, but it does not:
#lxterminal -e -hold sudo sh /home/pi/launcher.sh
A simple internet search turned up plenty of examples of how to execute a command at boot, even how to launch scripts, but nothing has helped so far. Thank you in advance.
So I rebuilt my Raspberry Pi and had to go through this again. After I got it to work once more, I followed my instructions from before, edited them to be clearer, and posted them here. NOTE: I think the mistake I made before was using sudo (sudo nano) when I should have just used nano.
Also note the Python program I am launching is /home/pi/hms/hms2-v2.py.
*** Setting this up is a 4-step process ***
YOU MUST HAVE XTERM
STEP 1 - INSTALL XTERM:
sudo apt-get install xterm
STEP 2 - CREATE THE autostart FILE:
Read https://www.raspberrypi.org/forums/viewtopic.php?t=227191
FIRST CREATE autostart here: /home/pi/.config/lxsession/LXDE-pi/autostart
NOTE: THE FOLDERS BELOW /home/pi/.config/ MAY NOT EXIST; IF NOT, CREATE THEM EXACTLY AS ABOVE (e.g. with mkdir -p /home/pi/.config/lxsession/LXDE-pi). NOTE: the directory must be LXDE-pi, NOT LXDE.
Then edit the autostart file by using: nano ~/.config/lxsession/LXDE-pi/autostart
NOTE: DO NOT use sudo in the above command
Put the following in the file:
#!/bin/bash
#lxpanel --profile LXDE-pi
#pcmanfm --desktop --profile LXDE-pi
sh /home/pi/launcher.sh
#xscreensaver -no-splash
STEP 3 - CREATE THE SCRIPT FILE:
Create the script (.sh) file launcher.sh in the directory /home/pi.
Include the following in launcher.sh:
#!/bin/sh
echo starting script
xterm -T "HMS" -geometry 100x70+10+35 -hold -e sudo /usr/bin/python3 /home/pi/hms/hms2-v2.py
STEP 4 - MAKE THE SCRIPT EXECUTABLE:
Make the .sh file executable with: sudo chmod +x launcher.sh
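As a complementary, purely Python-side safety net (a hypothetical sketch, not part of the recipe above), the monitored script itself can catch errors and wait before exiting, so the traceback stays visible even if the terminal was started without -hold. Since the script runs under python3, Python 3's input() does the pausing:

import sys
import traceback

def run():
    # ... the actual data-logging loop of a script like hms2-v2.py ...
    pass

# Hypothetical wrapper: print any traceback, then wait for Enter so the
# terminal window stays open long enough to read the error.
try:
    run()
except Exception:
    traceback.print_exc()
    input("Press Enter to close...")
    sys.exit(1)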
I have an application which exposes URLs using mutual authentication. I am writing a Python script which uses Popen to run the curl command to connect to the application and fetch the required data. But when I run the Python script I get the following error:
curl: (58) could not load PEM client certificate, OpenSSL error error:02001002:system library:fopen:No such file or directory, (no key found, wrong pass phrase, or wrong file format?)
I am running the application on a Windows 7 machine. I have curl and OpenSSL installed. The command that is run is given below:
curl -v https://localhost:9400/<URL> -H "Connection:close" --cacert 'C:/local_cert/root.crt' --cert 'C:/local_cert/client.crt' --key 'C:/local_cert/client.key' --pass client_key_passwd
For testing, I ran the same command in Git Bash for Windows and got the result successfully.
But when I run the same command in Git Cmd for Windows or Windows Cmd, I get the error above.
I have checked that the paths to the certs are correct and that the files are in PEM format, and I have OpenSSL and curl installed. For some reasons I cannot use the Requests or urllib3 Python packages and can only use curl. All of this makes me believe that Windows Cmd and Git Cmd are missing some setting, but I am not sure what it may be.
After trying a lot of things I finally figured out the answer. The error said: no file found, wrong passphrase, or wrong format. Since the command worked in Git Bash, I was sure it was not an issue with the file or the passphrase. Concentrating on "no file found", I found the link below:
Windows PATH to posix path conversion in bash
which gave me the idea that maybe the way I am specifying the paths is incorrect, depending on which version of curl is used. After trying various combinations I found that with plain curl in Git Bash, both of the following commands work:
curl -v https://localhost:9400/<URL> -H "Connection:close" --cacert 'C:/local_cert/root.crt' --cert 'C:/local_cert/client.crt' --key 'C:/local_cert/client.key' --pass client_key_passwd
and
curl -v https://localhost:9400/<URL> -H "Connection:close" --cacert C:/local_cert/root.crt --cert C:/local_cert/client.crt --key C:/local_cert/client.key --pass client_key_passwd
But in Windows Cmd, or when calling curl from Python, only the following command works:
curl -v https://localhost:9400/<URL> -H "Connection:close" --cacert C:/local_cert/root.crt --cert C:/local_cert/client.crt --key C:/local_cert/client.key --pass client_key_passwd
So, in a nutshell, it was an issue with quotes: depending on how your curl utility is called and which version of curl is used (compiled for Windows or not), the interpretation of quotes differs.
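Since the original problem came from invoking curl through Popen, note that passing the command as a list sidesteps the problem entirely: no shell is involved, so nothing ever strips quotes, and each argument reaches curl exactly as written. A minimal sketch reusing the question's example paths:

import subprocess

# Pass the command as a list: no shell is involved, so the cert paths must
# NOT carry extra single quotes -- curl would treat them as part of the path.
cmd = [
    "curl", "-v", "https://localhost:9400/<URL>",
    "-H", "Connection:close",
    "--cacert", "C:/local_cert/root.crt",
    "--cert", "C:/local_cert/client.crt",
    "--key", "C:/local_cert/client.key",
    "--pass", "client_key_passwd",
]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()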
I would like to include a third-party Python library when running a Hadoop streaming job.
I followed the suggestions in the post here, but it doesn't seem to work.
I submitted a command like this:
hadoop jar /usr/local/hadoop/hadoop-2.2.0/lib/hadoop-streaming-2.2.0.jar \
-input $hdfs_input_file \
-output $hdfs_output_file \
-mapper $mapper_file \
-combiner $reducer_file \
-reducer $reducer_file \
-file $mapper_file \
-file $reducer_file \
-file $packaged_file
The $packaged_file is an archive that contains the third-party library.
My script failed at this line (in $mapper_file):
xyz = importer.load_module('library_name')
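(For context, an importer like this is typically created with zipimport; the sketch below is an assumption about the elided setup, and the archive name is a hypothetical placeholder for the basename of $packaged_file, which the -file option ships into the task's working directory.)

import zipimport

# 'dependencies.zip' is a hypothetical placeholder for $packaged_file;
# -file makes the archive available in the task's working directory.
importer = zipimport.zipimporter('dependencies.zip')
xyz = importer.load_module('library_name')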
The error message is:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
However, the above line of code runs fine in IPython. I can even run the following line in IPython:
xyz.method_foo()
Any suggestions on this problem? Thanks!
I have written the mapper and reducer for the wordcount example in Python. The scripts work fine standalone, but I get an error when they run in Hadoop.
I am using Hadoop 2.2.
Here is my command:
hadoop jar share/hadoop/tools/sources/hadoop-streaming*.jar -mapper wordmapper.py -reducer wordreducer.py -file wordmapper.py -file wordreducer.py -input /data -output/output/result7
Exception in thread "main" java.lang.ClassNotFoundException: share.hadoop.tools.sources.hadoop-streaming-2.2.0-test-sources.jar
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
How do I fix this?
Can you please try it with:
hadoop jar $HADOOP_PREFIX/hadoop/tools/sources/hadoop-streaming*.jar -mapper 'wordmapper.py' -reducer 'wordreducer.py' -file $CODE_FOLDER/wordmapper.py -file $CODE_FOLDER/wordreducer.py -input /data -output /output/result7
where $HADOOP_PREFIX is the folder where Hadoop is installed on your machine,
e.g. /usr/local/ on my machine.
Manually access that location and check whether the jar is actually present there.
$CODE_FOLDER is the folder where the script files are saved.
From this guide, I have successfully run the sample exercise. But when running my MapReduce job, I am getting the following error:
ERROR streaming.StreamJob: Job not Successful!
10/12/16 17:13:38 INFO streaming.StreamJob: killJob...
Streaming Job Failed!
Error from the log file:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Mapper.py
import sys
i=0
for line in sys.stdin:
    i+=1
    count={}
    for word in line.strip().split():
        count[word]=count.get(word,0)+1
    for word,weight in count.items():
        print '%s\t%s:%s' % (word,str(i),str(weight))
Reducer.py
import sys
keymap={}
o_tweet="2323"
id_list=[]
for line in sys.stdin:
    tweet,tw=line.strip().split()
    #print tweet,o_tweet,tweet_id,id_list
    tweet_id,w=tw.split(':')
    w=int(w)
    if tweet.__eq__(o_tweet):
        for i,wt in id_list:
            print '%s:%s\t%s' % (tweet_id,i,str(w+wt))
        id_list.append((tweet_id,w))
    else:
        id_list=[(tweet_id,w)]
        o_tweet=tweet
[edit] Command to run the job:
hadoop#ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py -input my-input/* -output my-output
Input is any random sequence of sentences.
Thanks,
Your -mapper and -reducer should just be the script name.
hadoop#ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper mapper.py -file /home/hadoop/reducer.py -reducer reducer.py -input my-input/* -output my-output
When your scripts are shipped with the job, they end up in another folder within HDFS, which is relative to the attempt task executing as ".". (FYI, if you ever want to add another -file, such as a lookup table, you can open it in Python as if it were in the same directory as your scripts while your script is running in the M/R job; see the sketch below.)
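A minimal sketch of that (lookup.txt is a hypothetical file name, assumed to have been shipped with an extra -file option):

# Inside mapper.py: -file places lookup.txt in the task's working
# directory, so a bare relative path works.
with open('lookup.txt') as f:
    table = dict(line.strip().split('\t', 1) for line in f)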
Also make sure you have run chmod a+x mapper.py and chmod a+x reducer.py.
Try to add
#!/usr/bin/env python
at the top of your script.
Or,
-mapper 'python m.py' -reducer 'python r.py'
You need to explicitly instruct that the mapper and reducer are to be run as Python scripts, since there are several options for streaming. You can use either single quotes or double quotes:
-mapper "python mapper.py" -reducer "python reducer.py"
or
-mapper 'python mapper.py' -reducer 'python reducer.py'
The full command goes like this:
hadoop jar /path/to/hadoop-mapreduce/hadoop-streaming.jar \
-input /path/to/input \
-output /path/to/output \
-mapper 'python mapper.py' \
-reducer 'python reducer.py' \
-file /path/to/mapper-script/mapper.py \
-file /path/to/reducer-script/reducer.py
I ran into this error recently, and my problem turned out to be something as obvious (in hindsight) as these other solutions:
I simply had a bug in my Python code. (In my case, I was using Python v2.7 string formatting whereas the AWS EMR cluster I had was using Python v2.6).
To find the actual Python error, go to the Job Tracker web UI (in the case of AWS EMR, port 9100 for AMI 2.x and port 9026 for AMI 3.x), find the failed mapper, open its logs, and read the stderr output.
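Before digging through cluster logs, it can also help to reproduce the streaming pipeline locally, which surfaces Python errors without the Hadoop wrapping. A small sketch (the file names are illustrative assumptions):

import subprocess

# Emulates: cat input.txt | python mapper.py | sort | python reducer.py
# check=True makes any non-zero exit raise, and the children's stderr
# (i.e. any Python traceback) prints straight to the terminal.
with open('input.txt', 'rb') as f:
    mapped = subprocess.run(['python', 'mapper.py'], stdin=f,
                            stdout=subprocess.PIPE, check=True).stdout
sorted_input = b''.join(sorted(mapped.splitlines(keepends=True)))
reduced = subprocess.run(['python', 'reducer.py'], input=sorted_input,
                         stdout=subprocess.PIPE, check=True).stdout
print(reduced.decode())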
Make sure your input directory only contains the correct files.
I had the same problem too. I tried marvin W's solution, and I also installed Spark. Ensure that you have installed Spark itself, not just pyspark (the dependency) but the framework as well; follow the installation tutorial.
If you run this command on a Hadoop cluster, make sure that Python is installed on every NodeManager instance.