I've been playing around with creating a custom application inside our Ambari installation. After a bit of toying, I've successfully configured the installation and startup actions, with the appropriate log creation/output and pid creation. The piece I'm struggling with now is having Ambari maintain the status of this newly installed application. After following some of the instructions here: http://mozartanalytics.com/how-to-create-a-software-stack-for-ambari/ (specifically the Component Status section), I've been able to make some progress; however, it's not exactly what I want.
When I include the following in master.py, Ambari sees the service as momentarily active after initial startup, but then the application appears red (offline). It marks it as offline even though, when I check the server, I see the appropriate process running.
def status(self, env):
    import params
    print 'Checking status of pid file'
    check = format("{params.pid}/Application.pid")
    check_process_status(check)
However, when I modify it to look like the following, Ambari has no problem tracking the status and monitors it appropriately:
def status(self, env):
    import params
    print 'Checking status of pid file'
    dummy_master_pid_file = "/var/run/Application/Application.pid"
    check_process_status(dummy_master_pid_file)
Has anyone else run into this issue? Is there something I'm missing regarding creating this custom application inside Ambari? Any help or a pointer in the right direction would be appreciated.
FYI, this is Ambari 2.1 running on CentOS 6.7.
Recently, I solved a similar problem. The solution was to put the string {"securityState": "UNKNOWN"} into the file /var/lib/ambari-agent/data/structured-out-status.json.
I found this by watching the ambari-agent log: PythonExecutor.py:149 - {'msg': 'Unable to read structured output from /var/lib/ambari-agent/data/structured-out-status.json'}. Hope it helps.
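For reference, the JSON in question can be generated from Python rather than typed by hand. This is just an illustrative sketch; the write to the agent-side path is left as a comment since it has to happen on the agent host:

```python
# Build the minimal status document the agent expects.
import json

payload = json.dumps({"securityState": "UNKNOWN"})
print(payload)  # {"securityState": "UNKNOWN"}

# On the agent host, this string goes into:
#   /var/lib/ambari-agent/data/structured-out-status.json
```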
Maybe this is a problem with your parameter reference:
def status(self, env):
    import params
    print 'Checking status of pid file'
    pid_path = params.pid
    check = format("{pid_path}/Application.pid")
    check_process_status(check)
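The pattern above (copy the attribute into a plain local variable before calling format) can be illustrated with a small stand-in for Ambari's scope-based format(). This is a simplified sketch, not the real resource_management implementation:

```python
# Simplified stand-in for a scope-based format(): it fills {name}
# placeholders from the caller's local variables.
import inspect

def scope_format(template):
    caller_locals = inspect.currentframe().f_back.f_locals
    return template.format(**caller_locals)

def status_demo():
    class Params:                  # stand-in for the generated params module
        pid = "/var/run/Application"
    params = Params()
    pid_path = params.pid          # plain local name, resolvable by name alone
    return scope_format("{pid_path}/Application.pid")

print(status_demo())  # /var/run/Application/Application.pid
```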
I have done a lot of research on the forum and still can't figure out how to solve my issue. I am running a Python script from the Windows Task Scheduler; it runs, but stops at some point. I created a log file to see where the code stops running, and it stops when I do an HTTP GET with the requests library:
r = session.get(urlRequest, allow_redirects=True)
The code runs fine in Spyder. Any suggestions?
I created a .bat file as well, with the same issue:
@echo off
"C:\Users\NAME\Anaconda3\python.exe" "C:\Users\NAME\Documents\GTD_scheduledTasks\exchangeRate.py"
pause
In my log file, I printed several parameters:
sys.executable: C:\Users\NAME\Anaconda3\python.exe
sys.path:
['C:\Users\NAME\Documents\GTD_scheduledTasks',
'C:\Users\NAME\Anaconda3\python39.zip',
'C:\Users\NAME\Anaconda3\DLLs',
'C:\Users\NAME\Anaconda3\lib',
'C:\Users\NAME\Anaconda3',
'C:\Users\NAME\Anaconda3\lib\site-packages',
'C:\Users\NAME\Anaconda3\lib\site-packages\win32',
'C:\Users\NAME\Anaconda3\lib\site-packages\win32\lib',
'C:\Users\NAME\Anaconda3\lib\site-packages\Pythonwin']
os.getcwd(): C:\WINDOWS\system32
Thanks!
Edit: I also checked from Spyder where my Python executable is, and I used the one from sys.executable (C:\Users\NAME\Anaconda3\python.exe), both with one \ and with double \\.
When I go to the event history in the Task Scheduler, I see:
successfully completed: I don't get an error return, but some statements (the requests) are not processed
actionName is C:\WINDOWS\SYSTEM32\cmd.exe. Not sure if that's relevant.
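Since the task reports "successfully completed" while the request never runs, one way to get more information is to wrap the GET so that any exception or hang leaves a trace in the log. This is a hypothetical diagnostic sketch; fetch, the log file name, and the 30-second timeout are illustrative and not part of the original script:

```python
# Hypothetical diagnostic wrapper for the scheduled script: log the
# full traceback if the GET raises, and add a timeout so a stalled
# connection cannot hang the task silently.
import logging
import traceback

logging.basicConfig(filename="exchangeRate.log", level=logging.INFO)

def fetch(session, url):
    try:
        r = session.get(url, allow_redirects=True, timeout=30)
        logging.info("GET %s -> %s", url, r.status_code)
        return r
    except Exception:
        logging.error("GET %s failed:\n%s", url, traceback.format_exc())
        raise
```

A common difference between an interactive Spyder session and a scheduled task is the environment (working directory, proxy settings, certificate paths), so a logged traceback here usually narrows the problem quickly.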
I am trying to mount a file share (not Blob storage) during the JobPreparationTask. My node OS is Ubuntu 16.04.
To do this, I am doing the following:
job_user = batchmodels.AutoUserSpecification(
    scope=batchmodels.AutoUserScope.pool,
    elevation_level=batchmodels.ElevationLevel.admin)
start_task = batch.models.JobPreparationTask(
    command_line=start_commands,
    user_identity=batchmodels.UserIdentity(auto_user=job_user))
end_task = batch.models.JobReleaseTask(
    command_line=end_commands,
    user_identity=batchmodels.UserIdentity(auto_user=job_user))
job = batch.models.JobAddParameter(
    job_id,
    batch.models.PoolInformation(pool_id=pool_id),
    job_preparation_task=start_task,
    job_release_task=end_task)
My start_commands and end_commands are fine, but there is something wrong with the User permissions...
I get no output in the stderr.txt or in the stdout.txt file.
I do not see any logs whatsoever (where are they?). All I am able to find is a message showing this:
Exit code: 1
Retry count: 0
Failure info:
    Category: UserError
    Code: FailureExitCode
    Message: The task exited with an exit code representing a failure
Details:
    Message: The task exited with an exit code representing a failure
Very detailed error message!
Anyway, I have also tried changing AutoUserScope.pool to AutoUserScope.task, but there is no change.
Anyone have any ideas?
I had this issue, which was frustrating me because I couldn't get any logs from my application.
What I ended up doing was RDP'ing into the node my job ran on, going to %AZ_BATCH_TASK_WORKING_DIR% as specified in the Azure Batch compute environment variables, and then checking the stdout.txt and stderr.txt for my job.
The error was that I formulated my CloudTask's commandline incorrectly, so it could not launch my application in the first place.
To RDP into your machine, in Azure Portal:
Batch Account
Pools (select your pool)
Nodes
Select the node that ran your job
Select "Connect" link at the top.
I have a five-node Riak cluster and am doing some basic application testing with a Python RiakClient using PBC. The code looks something like this:
b = riakclient.bucket('test')
item = b.get('key1')
item.data = 'testdata'
item.store()
I am getting {error,locked} back as a RiakError. Once this starts to happen, I also get a lot of errors between the cluster nodes that look like this:
Handoff receiver for partition 1134123.... exited abnormally ... {error,locked}
Any ideas what this might be or how to resolve it? This is Riak 2.0.2; I'm thinking about updating, but hoping not to have to do that yet.
Update: this problem manifested itself after the Docker (did I mention that?) container I was using was restarted. After the restart, the Riak process came up but was not in a functioning state, despite being marked healthy in the cluster. An 'sv restart riak' got the cluster back to working. I'm still wondering what this means; it does not seem to be documented, although it does seem to mean the node is in a read-only state.
This was a problem in the shutdown scripts; the node was coming up in an unclean state. A restart fixed the issue.
I am running an automated test using an Android emulator driving an app with a Monkey script written in Python.
The script copies files onto the emulator, clicks buttons in the app, and reacts depending on the activities that the software triggers during its operation. The script is supposed to run the cycle a few thousand times, so I have it in a loop that runs the adb tool to copy the files, starts the activities, and checks how the software is reacting by calling the getProperty method on the device with the parameter 'am.current.comp.class'.
So here is a very simplified version of my script:
for target in targets:
    androidSDK.copyFile(emulatorName, target, '/mnt/sdcard')
    # Run the component
    device.startActivity(component='com.myPackage/com.myPackage.myactivity')
    while 1:
        if device.getProperty('am.current.comp.class') == 'com.myPackage.anotheractivity':
            time.sleep(1)  # allow the screen to display the new activity before clicking on it
            device.touch(100, 100, 'DOWN_AND_UP')
            # Log the result of the operation somewhere
            break
        time.sleep(0.1)
(androidSDK is a small class I've written that wraps some utility functions to copy and delete files using the adb tool).
On occasion the script crashes with one of a number of exceptions, for instance (I am leaving out the full stack trace):
[com.android.chimpchat.adb.AdbChimpDevice]com.android.ddmlib.ShellCommandUnresponsiveException
or
[com.android.chimpchat.adb.AdbChimpDevice] Unable to get variable: am.current.comp.class
[com.android.chimpchat.adb.AdbChimpDevice]java.net.SocketException: Software caused connection abort: socket write error
I have read that sometimes the socket connection to the device becomes unstable and may need a restart (adb start-server and adb kill-server come in useful).
The problem I'm having is that the tools throw Java exceptions (Monkey runs in Jython), but I am not sure how those can be trapped from within my Python script. I would like to be able to determine the exact cause of the failure inside the script and recover the situation so I can carry on with my iterations. Would re-establishing the connection, for instance by re-initialising my device with another call to MonkeyRunner.waitForConnection, be enough?
Any ideas?
Many thanks,
Alberto
EDIT: I thought I'd mention that I have discovered it is possible to catch Java-specific exceptions in a Jython script, should anyone need this:
from java.net import SocketException
...
try:
    ...
except SocketException:
    ...
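On the recovery question, a generic retry loop along the lines below can wrap the flaky device calls. The reconnect callback here is a placeholder for whatever re-establishes the connection in the real script (for instance another MonkeyRunner.waitForConnection call), so treat this as a sketch rather than tested MonkeyRunner code:

```python
# Generic retry helper: run an action, and on failure re-establish
# the connection before trying again.
import time

def with_retries(action, reconnect, attempts=3, delay=0.1):
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise        # give up after the last attempt
            time.sleep(delay)
            reconnect()      # e.g. MonkeyRunner.waitForConnection(...)
```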
This worked for me:
device.shell('exit')  # exit the shell
I'm attempting to start a server app (in Erlang; it opens ports and listens for HTTP requests) via the command line using pexpect (or even directly using subprocess.Popen()).
The app starts fine, logs (via pexpect) to the screen fine, and I can interact with it via the command line as well...
The issue is that the server won't listen for incoming requests. The app listens when I start it up manually, by typing commands in the command line; starting it via subprocess/pexpect somehow stops the app from listening...
When I start it manually, "netstat -tlp" displays the app as listening; when I start it via Python (subprocess/pexpect), netstat does not register the app...
I have a feeling it has something to do with the environment, the way Python forks things, etc.
Any ideas?
Thank you.
A basic example:
Notes:
"-pz" just adds ./ebin to the module search path for the erl VM (library search path).
"-run" runs moduleName without any parameters.
command_str = "erl -pz ./ebin -run moduleName"
child = pexpect.spawn(command_str)
child.interact() # Give control of the child to the user
All of this stuff works correctly, which is strange. I have logging inside my code, and all the log messages are output as they should be. The server wouldn't listen even when I started its process via a bash script, so I don't think it's the Python code that's causing it (that's why I have a feeling it's something about the way the new OS process is started).
It could be to do with the way that command-line arguments are passed to the subprocess.
Without more specific code I can't say for sure, but I had this problem working on sshsplit (https://launchpad.net/sshsplit).
To pass arguments correctly (in this example "ssh -ND 3000"), you should use something like this:
openargs = ["ssh", "-ND", "3000"]
print "Launching %s" %(" ".join(openargs))
p = subprocess.Popen(openargs, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
This will not only let you see exactly what command you are launching, but should also pass the values to the executable correctly. Although I can't say for sure without seeing some code, this seems the most likely cause of failure. (Could it also be that the program requires a specific working directory, or a configuration file?)
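The difference between a single command string and an argument list is easy to see with a tiny self-contained example, using the Python interpreter itself as a harmless child process:

```python
# Launch a child process with an explicit argument list; each element
# reaches the child exactly as written, with no shell quoting involved.
import subprocess
import sys

openargs = [sys.executable, "-c", "print('hello from child')"]
out = subprocess.check_output(openargs)
print(out.decode().strip())  # hello from child
```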