I'm writing a script which runs in a CI/CD pipeline.
It first does some configuration (fetching credentials, pulling down files from a server).
Then, it runs an external CLI tool to analyse these files using subprocess.run(cli_args).
After that is done, the results are published to a few other systems.
My problem is that the output of the CLI appears before some or all of the previous logs. A simplified version of the code may look like this:
print("Fetching CLI configuration")
exit_code, config = fetch_cli_configuration_without_ssl_verification(server_credentials)
if exit_code != 0 or not config:
    print("Error: something went wrong")
    exit(exit_code)
print("Got CLI configuration")
print("Loading result server credentials")
res_server = os.environ.get("RES_SERVER")
res_user = os.environ.get("RES_USER")
res_pwd = os.environ.get("RES_PWD")
if not (res_server and res_user and res_pwd):
    print("Could not load result server credentials")
    exit(1)
print("Loaded credentials for result server", res_server)
print("Running CLI-Tool")
# Logs "I am the CLI"
exit_code = subprocess.run(["mycli", "do", "this", "and", "that"], cwd="somesubdir").returncode
if exit_code != 0:
    print("Error: CLI tool finished with non-zero exit code")
    exit(exit_code)
print("CLI tool finished successfully")
print("Uploading result data")
upload_result_data(read_file_text("cli_results.json"), res_server, res_user, res_pwd)
print("Done uploading result data")
The output I get looks something like this:
/opt/python-3.10.4/lib/python3.10/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'result_server.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
I am the CLI
Fetching CLI configuration
Got CLI configuration
Loading result server credentials
Loaded credentials for result server result_server.com
Running CLI-Tool
CLI tool finished successfully
Uploading result data
Done uploading result data
How can I make sure the CLI output appears after "Running CLI-Tool"?
This is most likely because Python buffers its standard output when it is not attached to a terminal (which is typical in CI/CD pipelines), while the child process writes directly to the shared stream. Flush your script's output before calling subprocess:
sys.stdout.flush()
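A minimal sketch of the relevant part, with the mycli invocation taken from the question; flush=True pushes the log line out before the child starts writing:

import subprocess

print("Running CLI-Tool", flush=True)  # forces this line out before mycli writes anything
exit_code = subprocess.run(["mycli", "do", "this", "and", "that"], cwd="somesubdir").returncode

Alternatively, set PYTHONUNBUFFERED=1 in the pipeline environment (or run the script with python -u) to disable stdout buffering for the whole run, so no explicit flushing is needed.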
I am using GCP with its Cloud Functions to execute web scrapers on a frequent basis. Locally, my script works without any problems.
I have a setup.py file in which I am initializing the connection to a Kafka Producer. This looks like this:
p = Producer(
{
"bootstrap.servers": os.environ.get("BOOTSTRAP.SERVERS"),
"security.protocol": os.environ.get("SECURITY.PROTOCOL"),
"sasl.mechanisms": os.environ.get("SASL.MECHANISMS"),
"sasl.username": os.environ.get("SASL.USERNAME"),
"sasl.password": os.environ.get("SASL.PASSWORD"),
"session.timeout.ms": os.environ.get("SESSION.TIMEOUT.MS")
}
)
def delivery_report(err, msg):
    """Called once for each message produced to indicate delivery result.
    Triggered by poll() or flush()."""
    print("Got here!")
    if err is not None:
        print("Message delivery failed: {}".format(err))
    else:
        print("Message delivered to {} [{}]".format(msg.topic(), msg.partition()))
    return "DONE."
I am importing this setup in main.py, in which my scraping functions are defined. It looks similar to this:
from setup import p, delivery_report
def scraper():
    try:
        # I won't insert my whole scraper here since it's working fine ...
        print(scraped_data_as_dict)
        p.produce(topic, json.dumps(scraped_data_as_dict), callback=delivery_report)
        p.poll(0)
    except Exception as e:
        pass  # Do sth else
The point here is: I am printing my scraped data to the console, but it doesn't do anything with the producer. It's not even logging a failed delivery message (delivery_report) to the console. It's like my script is ignoring the producer command. Also, there are no error reports in the log of the Cloud Function. What am I doing wrong, since the function is doing something except the important stuff? What do I have to be aware of when connecting Kafka with Cloud Functions?
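A hedged sketch of one common cause, assuming the confluent-kafka client and reusing p, topic and delivery_report from the question: produce() only enqueues the message, and a Cloud Function instance can be frozen as soon as it returns, before delivery ever happens. Flushing before returning rules this out:

def scraper():
    try:
        print(scraped_data_as_dict)
        p.produce(topic, json.dumps(scraped_data_as_dict), callback=delivery_report)
    finally:
        # Block until every queued message is delivered or fails, so
        # delivery_report fires before the function instance is suspended.
        p.flush(timeout=10)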
By no means do I write scripts very often, but I am trying to write a Nagios plugin to check the status of a RAID controller on a remote host. The issue is that the command to get the output requires elevated privileges. What would be the correct, and most effective way to pull this off? The goal is to run:
'/opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -a0'
on a remote host from the monitoring server,
and then follow the basic idea of this logic:
#Nagios Plugin for Testing LSI Raid Status
import os, sys
import argparse
import socket
import subprocess
#nagios exit codes do not change#
OK = 0
WARNING = 1
CRITICAL = 2
UNKNOWN = 3
DEPENDENT = 4
#nagios exit codes do not change#
#patterns to be searched
active = "Active"
online = "Online"
k = "OK"
degrade = "Degraded"
fail = "Failed"
parser = argparse.ArgumentParser(description='Py3 script for monitoring RAID status.')
#arguments
parser.add_argument("--user",
metavar = '-U',
help = "username for remote connection")
parser.add_argument("--hostname",
metavar = '-H',
help = "hostname of the remote host")
args = parser.parse_args()
print(args)
#turning args into variables
hostname = args.hostname
user = args.user
ssh = subprocess.Popen(["ssh", f"{user}@{hostname}", "/opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -a0"],
                       shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
check = ssh.stdout.read().decode()
OK_STR = "RAID is OK!"
WARN_STR = "Warning! Something is wrong with the RAID!"
CRIT_STR = "CRITICAL! THE RAID IS BROKEN"
UNK_STR = "Uh oh! Something ain't right?"
if degrade in check:
    print(WARN_STR)
    sys.exit(WARNING)
elif fail in check:
    print(CRIT_STR)
    sys.exit(CRITICAL)
elif active in check or online in check or k in check:
    print(OK_STR)
    sys.exit(OK)
else:
    print(UNK_STR)
    sys.exit(UNKNOWN)
Any thoughts? This is far from my forte (and also an unfinished script) so I apologize for the layman format and any confusion in my phrasing.
I would recommend running the script remotely over NRPE on the system in question, and then giving the user the NRPE daemon runs as (probably nagios or similar) sudo permission to run that script with some very exact parameters.
The nrpe.cfg file mentions this example:
# Usage scenario:
# Execute restricted commands using sudo. For this to work, you need to add
# the nagios user to your /etc/sudoers. An example entry for allowing
# execution of the plugins might be:
#
# nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/
...but there's no reason to be so forgiving; you can make it a lot safer by only allowing an exact command:
nagios ALL = NOPASSWD: /usr/sbin/megacli
Note that the entry above allows any parameters with that command. Pinning the exact arguments is even safer, as it will not allow any other variants (example):
nagios ALL = NOPASSWD: /usr/sbin/megacli -a foo -b bar -c5 -w1
Then configure the NRPE command to run the above with sudo in front of it, and it should work. You can verify by switching to the nagios user (su) and trying the sudo command yourself.
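For illustration, a sketch of what the NRPE command definition might look like (check_megaraid is a made-up command name; adjust the paths to your installation):

command[check_megaraid]=/usr/bin/sudo /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -a0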
Also, note that there are very likely some modules you can import for Python Nagios plugins that make it easier to get built-in support for things like thresholds and their syntax.
I have a custom python plugin that I am using to pull data into Telegraf. It prints out line protocol output, as expected.
In my Ubuntu 18.04 environment, when this plugin is run I see a single line in my logs:
2020-12-28T21:55:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py': Traceback (most recent call last):...
That is it. I can't figure out how to get the actual traceback.
If I run sudo -u telegraf /usr/bin/telegraf -config /etc/telegraf/telegraf.conf, the plugin works as expected. It polls and loads data exactly as it should.
I'm not sure how to move forward with troubleshooting this error when telegraf is executing the plugin on its own.
I have restarted the telegraf service. I have verified permissions (and I think that the execution above shows that it should work).
A few additional details based on the comments and answers received:
The plugin lives in a directory where the entire structure is owned by telegraf:telegraf. The error does not seem to indicate that it can't see the file that is being executed, but rather something within the file is failing when Telegraf executes the plugin.
The code for the plugin is below.
Plugin code (/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py):
from google.auth.transport.requests import Request
from google.oauth2 import id_token
import requests
import os
RUNTIME_URL = INTERNAL_URL
MEASUREMENT = "MY_MEASUREMENT"
CREDENTIALS = "GOOGLE_SERVICE_FILE.json"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = CREDENTIALS # ENV VAR REQUIRED BY GOOGLE CODE BELOW
CLIENT_ID = VALUE_FROM_GOOGLE
exclude_fields = ["name", "version"] # Don't try to put these into influxdb from json response
def make_iap_request(url, client_id, method="GET", **kwargs):
    # Code provided by Google docs
    # Set the default timeout, if missing
    if "timeout" not in kwargs:
        kwargs["timeout"] = 90
    # Obtain an OpenID Connect (OIDC) token from metadata server or using service
    # account.
    open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
    # Fetch the Identity-Aware Proxy-protected URL, including an
    # Authorization header containing "Bearer " followed by a
    # Google-issued OpenID Connect token for the service account.
    resp = requests.request(method, url, headers={"Authorization": "Bearer {}".format(open_id_connect_token)}, **kwargs)
    if resp.status_code == 403:
        raise Exception("Service account does not have permission to " "access the IAP-protected application.")
    elif resp.status_code != 200:
        raise Exception(
            "Bad response from application: {!r} / {!r} / {!r}".format(resp.status_code, resp.headers, resp.text)
        )
    else:
        return resp.json()
def print_results(results):
    """
    Take the results of a Dolores call and print influx line protocol results
    """
    for item in results["workflow"]:
        line_protocol_line_base = f"{MEASUREMENT},name={item['name']}"
        values = ""
        for key, value in item.items():
            if key not in exclude_fields:
                values = values + f",{key}={value}"
        values = values[1:]
        line_protocol_line = f"{line_protocol_line_base} {values}"
        print(line_protocol_line)
def main():
    current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
    print_results(current_runtime)

if __name__ == "__main__":
    main()
Relevant portion of the telegraf.conf file:
[[inputs.exec]]
## Commands array
commands = [
"/my_company/plugins-enabled/plugin-*/poll_*.py",
]
Agent section of config file
[agent]
interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = true
What do I do next?
The exec plugin is truncating your Exception message at the newline. If you wrap your call to make_iap_request in a try/except block, and then print(e, file=sys.stderr) rather than letting the Exception bubble all the way up, that should tell you more.
import sys

def main():
    """
    Query URL and print line protocol
    """
    try:
        current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
        print_results(current_runtime)
    except Exception as e:
        print(e, file=sys.stderr)
Alternately, your script could log error messages to its own log file, rather than passing them back to Telegraf. This would give you more control over what's logged.
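A minimal sketch of that approach, assuming a log path the telegraf user can write to (the path is hypothetical):

import logging

# Hypothetical log file; any location writable by the telegraf user works
logging.basicConfig(filename="/var/log/telegraf/poll_mysystem.log",
                    level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

try:
    current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
    print_results(current_runtime)
except Exception:
    logging.exception("poll_mysystem failed")  # logs the full traceback
    raise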
I suspect you're running into an environment issue, where there's something different about how you're running it. If not permissions, it could be environment variable differences.
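One hedged way to check: dump the environment the plugin actually sees to stderr and compare it with your interactive shell:

import os
import sys

# Telegraf captures stderr, so this shows up in its log
for key in sorted(os.environ):
    print(f"{key}={os.environ[key]}", file=sys.stderr)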
Please do check the permissions.
It seems like it's a permission error. Running sudo -u telegraf works because the telegraf user has the necessary permissions, but the user the plugin actually runs as doesn't have the necessary permissions to access the files in /my_company/plugins-enabled/.
So I recommend looking into them and changing the permissions so that others can access and write the files, or changing ownership to the user you run Telegraf as.
In order to fix this, run this command to go to the directory:
cd /my_company/plugins-enabled/
Then change ownership to you and only you:
sudo chown -R $(whoami) .
Then give yourself read/write permission on all files and folders:
sudo chmod -R u+w .
And if you want everyone, literally everyone on the system, to have read/write access to those files and folders:
sudo chmod -R 777 .
I'm writing a Python app which is currently hosted on Heroku. It is in an early development stage, so I'm using the free account with one web dyno. Still, I want my heavier tasks to be done asynchronously, so I'm using the IronWorker add-on. I have it all set up and it does the simplest jobs, like sending emails or anything that doesn't require any data being sent back to the application. The question is: how do I send the worker output back to my application from IronWorker? Or even better, how do I notify my app that the worker is done with the job?
I looked at other Iron.io solutions like the cache and the message queue, but the only thing I can find is that I can explicitly ask for the worker state. Obviously I don't want my web service to poll the worker, because that kind of defeats the original purpose of moving the tasks to the background. What am I missing here?
I see this question is high in Google, so in case you came here hoping to find some more details, here is what I ended up doing:
First, I prepared the endpoint on my app. My app uses Flask, so this is how the code looks:
#app.route("/worker", methods=["GET", "POST"])
def worker():
#refresh the interface or whatever is necessary
if flask.request.method == 'POST':
return 'Worker endpoint reached'
elif flask.request.method == 'GET':
worker = IronWorker()
task = worker.queue(code_name="hello", payload={"WORKER_DB_URL": app.config['WORKER_DB_URL'],
"WORKER_CALLBACK_URL": app.config['WORKER_CALLBACK_URL']})
details = worker.task(task)
flask.flash("Work queued, response: ", details.status)
return flask.redirect('/')
Note that in my case, GET is here only for testing; I don't want my users to hit this endpoint and invoke the task. But I can imagine situations where this is actually useful, specifically if you don't use any type of scheduler for your tasks.
With the endpoint ready, I started to look for a way of visiting that endpoint from the worker. I found this fantastic requests library and used it in my worker:
import sys, json
from sqlalchemy import *
import requests
print "hello_worker initialized, connecting to database..."
payload = None
payload_file = None
for i in range(len(sys.argv)):
    if sys.argv[i] == "-payload" and (i + 1) < len(sys.argv):
        payload_file = sys.argv[i + 1]
        break
f = open(payload_file, "r")
contents = f.read()
f.close()
payload = json.loads(contents)
print "contents: ", contents
print "payload as json: ", payload
db_url = payload['WORKER_DB_URL']
print "connecting to database ", db_url
db = create_engine(db_url)
metadata = MetaData(db)
print "connection to the database established"
users = Table('users', metadata, autoload=True)
s = users.select()
#def run(stmt):
# rs = stmt.execute()
# for row in rs:
# print row
#run(s)
callback_url = payload['WORKER_CALLBACK_URL']
print "task finished, sending post to ", callback_url
r = requests.post(callback_url)
print r.text
So in the end there is no real magic here; the only important thing is to send the callback URL in the payload if you need to notify your page when the task is done. Alternatively, you can place the endpoint URL in the database if you use one in your app. Btw, the snippet above also shows how to connect to a PostgreSQL database in your worker and print all the users.
One last thing you need to be aware of is how to format your .worker file; mine looks like this:
# set the runtime language. Python workers use "python"
runtime "python"
# exec is the file that will be executed:
exec "hello_worker.py"
# dependencies
pip "SQLAlchemy"
pip "requests"
This will install the latest versions of SQLAlchemy and requests; if your project depends on a specific version of a library, you should do this instead:
pip "SQLAlchemy", "0.9.1"
Easiest way: push a message to your API from the worker; it can be a log entry or anything else you need to have in your app.
We're experiencing some strange issues with Nagios. We wrote a script in Python which uses paramiko, httplib and re. When we comment out the code that uses paramiko, the script returns OK in Nagios. When we uncomment it, the status just returns (null).
This is our code:
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
#ssh.connect(self.get('hostname'),int(self.get('port')),self.get('username'),allow_agent=True)
ssh.connect('192.168.56.102' , 22 , 'oracle' ,allow_agent=True)
link = '127.0.0.1:4848'
stdin,stdout,stderr = ssh.exec_command('wget --no-proxy ' + link + ' 2>&1 | grep -i "failed\|error"')
result = stdout.readlines()
result = " ".join(result)
if result == "":
return MonitoringResult(MonitoringResult.OK,'Webservice up')
else:
return MonitoringResult(MonitoringResult.CRITICAL,'Webservice down %s' % result)
So when we comment out the part above if result == "": and add result = "" above the if, it will just return 'Webservice up' in Nagios. When we enable the code above the if, it will just return (null). Is there a conflict with paramiko or something?
When running the code in a terminal it returns the correct status, but it doesn't show when implemented in Nagios.
I found that although Nagios runs as effective user "nagios", it uses the environment settings for user "root" and is not able to read root's private key to make the connection.
Adding key_filename='/nagiosuserhomedir/.ssh/id_dsa' to the options for ssh.connect() solved this same problem for me (pass the private key for the nagios user explicitly in the code).
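For illustration, a minimal sketch of that connect call (the key path is an assumption; use wherever the nagios user's private key actually lives):

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# key_filename points at the nagios user's private key (hypothetical path)
ssh.connect('192.168.56.102', 22, 'oracle',
            key_filename='/home/nagios/.ssh/id_dsa',
            allow_agent=True)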