In Chef Server, how to determine when a node is fully bootstrapped - Python

From Python, I am using knife to launch a server, e.g.:
knife ec2 server create -r "role[nginx_server]" --region ap-southeast-1 -Z ap-southeast-1a -I ami-ae1a5dfc --flavor t1.micro -G nginx -x ubuntu -S sg_development -i /home/ubuntu/.ec2/sg_development.pem -N webserver1
I will then use the Chef Server API to check for when the bootstrap is complete, so I can then use boto and other tools to configure the newly created server. Pseudocode will look like this:
cmd = """knife ec2 server create -r "role[nginx_server]...."""
os.system(cmd)
boot = False
while boot==False:
chefTrigger = getStatusFromChefApi()
if chefTrigger==True:
boot=True
continue with code for further proccessing
My question is: what is the trigger in the Chef Server that will indicate when the node is fully processed by Chef? Note: I used -N to name the server and will query its properties, but what do I look for? Is there a bool? A status?
Thanks

TL;DR: Use a report/exception handler instead.
When the node has finished running chef-client successfully, it will save the node object to the Chef Server. One of the attributes automatically generated by Ohai every time Chef runs is node['ohai_time'], which is the Unix epoch timestamp of when Ohai was executed (at the beginning of the Chef run). A node that has not successfully saved itself to the server will not have ohai_time at all. However, this attribute merely tracks the time when Ohai ran, not necessarily when chef-client saved to the server (that can lag by a few seconds to minutes, depending on what your recipes are doing). Note that if the Chef run exits due to an unhandled exception, it won't save to the server by default.
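If you do poll, here is a minimal sketch of what getStatusFromChefApi could look like, assuming the PyChef library and the node name given with -N (the polling interval and broad exception handling are illustrative):

import time
from chef import autoconfigure, Node  # PyChef, assumed installed

api = autoconfigure()  # picks up your knife.rb/.chef configuration

def node_has_converged(node_name, launched_at):
    # True once the node has saved itself to the Chef Server with an
    # ohai_time newer than the moment we launched it.
    try:
        return Node(node_name, api=api)['ohai_time'] > launched_at
    except Exception:  # not registered yet, or no ohai_time attribute
        return False

launched_at = time.time()
# ... run the knife ec2 server create command here ...
while not node_has_converged('webserver1', launched_at):
    time.sleep(30)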
A more reliable way to be notified when a node has completed is to use a Report/Exception handler, which can send a message to a variety of places and APIs. See the documentation for more information.

OpenShift liveness probe to detect a long-running process

I have a Python data load service. One of the steps in the service is to refresh multiple Oracle materialized views. We have noticed that the service often gets stuck at this step, and the issue is fixed by restarting the pod.
I want to configure a command-based OpenShift liveness probe here.
The purpose is to detect if the service is stuck at this step for, say, more than x hours; if yes, the probe fails and the pod is restarted.
The service doesn't expose HTTP access.
We log extensively in the script that is run here.
Is there a way to poll the (latest) OpenShift deployment log and look for certain messages?
example:
#msg1
print("Refreshing materialized views")
.
.
.
#msg2
print("materialized view refreshed")
msg1 marks the start of the potentially problematic step. My intent is to write a command that polls the log and looks for msg2 (as it marks completion, exit status 0); if it doesn't find msg2 for more than, say, 5 hours, it must return a non-zero exit status, causing the probe to fail.
How can I implement this?
Is this the best way to do it?
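One possible shape for such a probe command, assuming each log line carries a leading timestamp and the log lives at a known path (the path, messages, and timestamp format below are illustrative):

#!/usr/bin/env python3
# liveness_check.py - hypothetical probe command; assumes the service log
# is at a known path and each line starts with "YYYY-mm-dd HH:MM:SS".
import sys
from datetime import datetime, timedelta

LOG_PATH = "/var/log/service/app.log"          # assumed log location
START_MSG = "Refreshing materialized views"    # msg1
DONE_MSG = "materialized view refreshed"       # msg2
MAX_AGE = timedelta(hours=5)

def stamp(line):
    # Parse the leading timestamp of a log line.
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

last_start = last_done = None
with open(LOG_PATH) as fh:
    for line in fh:
        if START_MSG in line:
            last_start = line
        elif DONE_MSG in line:
            last_done = line

# Healthy if no refresh has started, or the latest one completed.
if last_start is None or (last_done and stamp(last_done) >= stamp(last_start)):
    sys.exit(0)

# A refresh started and never finished; fail once it exceeds the window.
sys.exit(1 if datetime.now() - stamp(last_start) > MAX_AGE else 0)

Wired in as an exec liveness probe, a non-zero exit triggers the restart. Whether it is the best way is arguable; having the job itself write a timestamped marker file would avoid parsing a large log on every probe.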

How to add cron jobs in CloudFoundry

I have a couple of Python apps in CloudFoundry. Now I would like to schedule their execution. For example, a specific app has to be executed on the second day of each month.
I couldn't find anything on the internet. Is that even possible?
Cloud Foundry will deploy your application inside a container. You could use libraries to execute your code on a specific schedule, but either way you're paying to have that instance run the whole time.
What you're trying to do is a perfect candidate for "serverless computing" (also known as "event-driven" or "function-as-a-service" computing).
These deployment technologies execute functions in response to a trigger, e.g. a REST API call, a certain timestamp, a new database insert, etc.
You could execute your Python Cloud Foundry apps using the OpenWhisk serverless compute platform.
IBM offers a hosted version of this running on its cloud platform, Bluemix.
I don't know what your code looks like, so I'll use this sample hello world function:
import sys

def main(dict):
    if 'message' in dict:
        name = dict['message']
    else:
        name = 'stranger'
    greeting = 'Hello ' + name + '!'
    print(greeting)
    return {'greeting': greeting}
You can upload your actions (functions) to OpenWhisk using either the online editor or the CLI.
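For example, a hypothetical CLI upload of the file containing the function above (the file name is assumed):
$ wsk action create hello_world hello.py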
Once you've uploaded your actions, you can automate them on a specific schedule by using the Alarm package. To do this in the online editor, click "automate this process" and pick the alarm package.
To do this via the CLI, we first need to create a trigger:
$ wsk trigger create regular_hello_world --feed /whisk.system/alarms/alarm -p cron '0 0 9 * * *'
ok: created trigger feed regular_hello_world
This will trigger every day at 9am. We then need to link this trigger to our action by creating a rule:
$ wsk rule create regular_hello_rule regular_hello_world hello_world
ok: created rule regular_hello_rule
For more info, see the docs on creating Python actions.
The CloudFoundry platform itself does not have a scheduler (at least not at this time) and the containers where your application runs do not have cron installed (unlikely to ever happen).
If you want to schedule code to periodically run, you have a few options.
You can deploy an application that includes a scheduler. The scheduler can run your code directly in that container, or it can trigger the code to run elsewhere (ex: it sends an HTTP request to another application and that request triggers the code to run). If you trigger the code to run elsewhere, you can make the scheduler app run pretty lean (maybe with 64m of memory or less) to reduce costs (a sketch of such a scheduler app follows below).
You can look for a third party scheduler service. The availability of and cost of services like this will vary depending on your CF provider, but there are service offerings to handle scheduling. These typically function like the previous example where an HTTP request is sent to your app at a specific time and that triggers your scheduled code. Many service providers offer free tiers, which give you a small number of triggers per month at no cost.
If you have a server outside of CF with cron installed, you can use cron there to schedule the tasks and trigger the code to run on CF. You can do this like the previous examples by sending HTTP requests to your app; however, this option also gives you the possibility of using CloudFoundry's task feature.
CloudFoundry has the concept of a task, which is a one-time execution of some code. With it, you can execute the cf run-task command to trigger the task to run. Ex: cf run-task <app-name> "python my-task.py". More on that in the docs. The nice part about using tasks is that your provider will only bill you while the task is running.
To see if your provider has tasks available, run cf feature-flags and look to see if task_creation is set to enabled.
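Returning to the first option, here is a minimal sketch of a lean scheduler app, assuming the third-party schedule and requests libraries and a placeholder URL for the app being triggered:

import time
import schedule   # third-party "schedule" library, assumed installed
import requests

def trigger_job():
    # Tell the other CF app to run its work via a hypothetical endpoint.
    requests.post("https://my-other-app.example.com/run-job", timeout=30)

def maybe_run():
    # "schedule" has no day-of-month rule, so check by hand:
    # fire only on the second day of the month.
    if time.localtime().tm_mday == 2:
        trigger_job()

schedule.every().day.at("09:00").do(maybe_run)

while True:
    schedule.run_pending()
    time.sleep(60)

Deployed with a small memory allocation, this app does nothing but fire the trigger on schedule.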
Hope that helps!

'net use' command returns nothing when called by subprocess.

I'm running a daemon as a service on a Windows server that's meant to listen for triggers and create folders on a server. However, I ran into difficulty: though the command prompt recognises my 'Y:' drive mapping, the service does not.
Looking into it, I was advised that the issue was likely that the mapping was not universally set up. So I tried to get the service to run the net use command and map the same drive at that level of access.
Note: The daemon uses logger.info to write to a text file.
# Map the drive, piping stdout only (stderr is not piped, so
# result[1] from communicate() will always be None).
command = ['net', 'use', 'Y:', '\\\\REAL.IP.ADDRESS\\FOLDER',
           '/user:USER', 'password']
response = subprocess.Popen(command, stdout=subprocess.PIPE)
result = response.communicate()
logger.info("net use result:")
logger.info(result[0])
logger.info(result[1])

# List the current mappings as the service sees them.
command = ['net', 'use']
response = subprocess.Popen(command, stdout=subprocess.PIPE)
result = response.communicate()
logger.info("Current drives:")
logger.info(result[0])
logger.info(result[1])
However, when running that, I got no response at all from the first command, and then a response telling me that there are no current drives.
INFO - net use result:
INFO -
INFO - None
INFO - Current drives:
INFO - New connections will be remembered. There are no entries in the list.
INFO - None
Maybe I'm dumb but shouldn't it return something in response, especially if it's failing to execute the command? Or am I actually not able to map drives at this level?
Note: The daemon's logger module prepends every line with INFO - so for the purpose of this question you can ignore that.
By default, services run under the Local System account and cannot access network resources. If you want to be able to access your network from a service, try running the service as a user with network privileges. (Note that this may be a security concern!)
In the Services panel, go to the Properties of your service and click the Log On tab. Select This account and specify the user credentials.
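Separately from the account issue, the snippet in the question only pipes stdout, which is why result[1] logs as None; capturing stderr as well makes a failing net use show its actual error text. A small debugging sketch (Python 3.7+; the path and credentials are the question's placeholders):

import logging
import subprocess

logger = logging.getLogger(__name__)

command = ['net', 'use', 'Y:', '\\\\REAL.IP.ADDRESS\\FOLDER',
           '/user:USER', 'password']
# capture_output pipes both stdout and stderr; text decodes them to str.
result = subprocess.run(command, capture_output=True, text=True)
logger.info("exit code: %s", result.returncode)
logger.info("stdout: %s", result.stdout)
logger.info("stderr: %s", result.stderr)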

Nested Fabric Connections

The scenario is that our production servers sit in a private subnet with a NAT instance in front of them to allow maintenance via SSH. Currently we connect to the NAT instance via SSH, then SSH from there to the respective server.
What I would like to do is run deployment tasks from my machine using the NAT as a proxy without uploading the codebase to the NAT instance. Is this possible with Fabric or am I just going to end up in a world of pain?
EDIT
Just to follow up on this, as @Morgan suggested, the gateway option will indeed fix this issue.
For a bit of completeness, in my fabfile.py:
def setup_connections():
    """
    This should be called as the first task in all calls in order to setup
    the correct connections, e.g. fab setup_connections task1 task2...
    """
    env.roledefs = {}
    env.gateway = 'ec2-user@X.X.X.X'  # where all the magic happens
    tag_mgr = EC2TagManager(...)
    for role in ['web', 'worker']:
        env.roledefs[role] = ['ubuntu@%s' % ins for ins in
                              tag_mgr.get_instances(instance_attr='private_ip_address', role=role)]
    env.key_filename = '/path/to/server.pem'

@roles('web')
def test_uname_web():
    run('uname -a')
I can now run fab setup_connections test_uname_web and get the uname of my web server.
So if you have a newer version of Fabric (1.5+), you can try using the gateway option. I've never used it myself, but it seems like what you'd want.
Documentation here:
env var
cli flag
execution model notes
original ticket
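For reference, the gateway can also be supplied on the command line instead of in the fabfile; a hypothetical invocation:
$ fab -g ec2-user@X.X.X.X test_uname_web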
Also, if you run into any issues, all of us tend to idle in IRC.

Django and root processes

In my Django project I need to be able to check whether a host on the LAN is up using an ICMP ping. I found this SO question which answers how to ping something in Python, and this SO question which links to resources explaining how to use the sudoers file.
The Setting
A Device model stores an IP address for a host on the LAN, and after adding a new Device instance to the DB (via a custom view, not the admin) I envisage checking to see if the device responds to a ping using an AJAX call to an API which exposes the capability.
The Problem
However (from the docstring of a library suggested in the first SO question), "Note that ICMP messages can only be sent from processes running as root."
I don't want to run Django as the root user, since that is bad practice. However, this part of the process (sending an ICMP ping) needs to run as root. If I wish to send off a ping packet from a Django view to test the liveness of a host, then Django itself must be running as root, since that is the process which would be invoking the ping.
Solutions
These are the solutions I can think of, and my question is: are there any better ways to only execute select parts of a Django project as root, other than these?
Run Django as root (please no!)
Put a "ping request" in a queue that another processes -- run as root -- can periodically check and fulfil. Maybe something like celery.
Is there not a simpler way?
I want something like a "Django run as root" library, is this possible?
Absolutely no way, do not run the Django code as root!
I would run a daemon as root (written in Python, why not) and then use IPC between the Django instance and your daemon. As long as you're sure to validate the content and handle it properly (e.g. use subprocess.call with an argument list, etc.) and only pass in data (not commands to execute), it should be fine.
Here is an example client and server, using web.py
Server: http://gist.github.com/788639
Client: http://gist.github.com/788658
You'll need to install web.py (webpy.org), but it's worth having around anyway. If you can hard-wire the IP (or hostname) into the server and remove the argument, all the better.
What's your OS here? You might be able to write a little program that does what you want given a parameter, stick that in the sudoers file, and give your Django user permission to run it as root.
/etc/sudoers
I don't know what kind of system you're on, but on any box I've encountered, one does not have to be root to run the command-line ping program (it has the suid bit set, so it becomes root as necessary). So you could just invoke that. It's a bit more overhead, but probably negligible compared to network latency.
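Building on that last point, a minimal sketch of shelling out to the system ping binary from the Django process (Linux-style flags assumed; the helper name is illustrative):

import subprocess

def host_is_up(ip, timeout_s=2):
    # ping is setuid (or has cap_net_raw) on most systems, so this
    # needs no root. -c 1: single echo request; -W: reply timeout (Linux).
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0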
