I have a custom python plugin that I am using to pull data into Telegraf. It prints out line protocol output, as expected.
In my Ubuntu 18.04 environment, when this plugin is run I see a single line in my logs:
2020-12-28T21:55:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py': Traceback (most recent call last):...
That is it. I can't figure out how to get the actual traceback.
If I run sudo -u telegraf /usr/bin/telegraf -config /etc/telegraf/telegraf.conf, the plugin works as expected. It polls and loads data exactly as it should.
I'm not sure how to move forward with troubleshooting this error when Telegraf is executing the plugin on its own.
I have restarted the telegraf service. I have verified permissions (and I think that the execution above shows that it should work).
A few additional details based on the comments and answers received:
The plugin lives in a directory where the entire structure is owned by telegraf:telegraf. The error does not seem to indicate that it can't see the file that is being executed, but rather something within the file is failing when Telegraf executes the plugin.
The code for the plugin is below.
Plugin code (/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py):
from google.auth.transport.requests import Request
from google.oauth2 import id_token
import requests
import os

RUNTIME_URL = INTERNAL_URL
MEASUREMENT = "MY_MEASUREMENT"
CREDENTIALS = "GOOGLE_SERVICE_FILE.json"

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = CREDENTIALS  # ENV VAR REQUIRED BY GOOGLE CODE BELOW

CLIENT_ID = VALUE_FROM_GOOGLE

exclude_fields = ["name", "version"]  # Don't try to put these into influxdb from json response


def make_iap_request(url, client_id, method="GET", **kwargs):
    # Code provided by Google docs
    # Set the default timeout, if missing
    if "timeout" not in kwargs:
        kwargs["timeout"] = 90

    # Obtain an OpenID Connect (OIDC) token from metadata server or using service
    # account.
    open_id_connect_token = id_token.fetch_id_token(Request(), client_id)

    # Fetch the Identity-Aware Proxy-protected URL, including an
    # Authorization header containing "Bearer " followed by a
    # Google-issued OpenID Connect token for the service account.
    resp = requests.request(method, url, headers={"Authorization": "Bearer {}".format(open_id_connect_token)}, **kwargs)
    if resp.status_code == 403:
        raise Exception("Service account does not have permission to "
                        "access the IAP-protected application.")
    elif resp.status_code != 200:
        raise Exception(
            "Bad response from application: {!r} / {!r} / {!r}".format(resp.status_code, resp.headers, resp.text)
        )
    else:
        return resp.json()


def print_results(results):
    """
    Take the results of a Dolores call and print influx line protocol results
    """
    for item in results["workflow"]:
        line_protocol_line_base = f"{MEASUREMENT},name={item['name']}"
        values = ""
        for key, value in item.items():
            if key not in exclude_fields:
                values = values + f",{key}={value}"
        values = values[1:]
        line_protocol_line = f"{line_protocol_line_base} {values}"
        print(line_protocol_line)


def main():
    current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
    print_results(current_runtime)


if __name__ == "__main__":
    main()
Relevant portion of the telegraf.conf file:
[[inputs.exec]]
## Commands array
commands = [
"/my_company/plugins-enabled/plugin-*/poll_*.py",
]
Agent section of the config file:
[agent]
interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = true
What do I do next?
The exec plugin is truncating your Exception message at the newline. If you wrap your call to make_iap_request in a try/except block, and then print(e, file=sys.stderr) rather than letting the Exception bubble all the way up, that should tell you more.
import sys

def main():
    """
    Query URL and print line protocol
    """
    try:
        current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
        print_results(current_runtime)
    except Exception as e:
        print(e, file=sys.stderr)
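If you want the whole traceback rather than just the exception message, one variation (a sketch, not part of the original answer) is to flatten the traceback onto a single line so the newline truncation can't cut it off:

import sys
import traceback

def main():
    try:
        current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
        print_results(current_runtime)
    except Exception:
        # Collapse the multi-line traceback onto one line so Telegraf's
        # newline truncation doesn't hide the interesting frames
        print(traceback.format_exc().replace("\n", " | "), file=sys.stderr)
        sys.exit(1)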
Alternately, your script could log error messages to its own log file rather than passing them back to Telegraf. This would give you more control over what's logged.
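For example, a minimal sketch using the standard logging module (the log path is an assumption; it must be writable by the telegraf user):

import logging

logging.basicConfig(
    filename="/var/log/telegraf/poll_mysystem.log",  # hypothetical path, adjust to taste
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

try:
    current_runtime = make_iap_request(RUNTIME_URL, CLIENT_ID, timeout=30)
    print_results(current_runtime)
except Exception:
    logging.exception("poll_mysystem failed")  # writes the full traceback to the log file
    raise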
I suspect you're running into an environment issue, where there's something different about how you're running it. If not permissions, it could be environment variable differences.
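One way to check that (a debugging sketch, not part of the original answers) is to have the plugin dump its environment to a file, then diff the file produced by the service run against the one from the interactive run:

import os

# Write the environment one variable per line so the two runs can be diffed
with open("/tmp/poll_mysystem_env.txt", "w") as f:  # throwaway path for debugging
    for key in sorted(os.environ):
        f.write("{}={}\n".format(key, os.environ[key]))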
Please do check the permissions.
It seems like it's a permission error. Since the telegraf user has the necessary permissions, running sudo -u telegraf works; but the user you're running from doesn't have the necessary permissions for accessing the files in /my_company/plugins-enabled/.
So I recommend looking into the permissions and either opening them up so others can access the files, or changing ownership to the user Telegraf runs as.
To fix this, first change into the directory:
cd /my_company/plugins-enabled/
Then change ownership to you and only you:
sudo chown -R $(whoami) .
Then give your user write permission on all files and folders:
sudo chmod -R u+w .
And if you want everyone, literally everyone on the system, to have read/write access to those files and folders:
sudo chmod -R 777 .
Related
I'm writing a script which runs in a CI/CD pipeline.
It first does some configuration (fetching credentials, pulling down files from a server).
Then, it runs an external CLI tool to analyse these files using subprocess.run(cli_args).
After that is done, the results are published to a few other systems.
My problem is that the output of the CLI appears before some or all of the previous logs. A simplified version of the code may look like this:
print("Fetching CLI configuration")
exit_code, config = fetch_cli_configuration_without_ssl_verification(server_credentials)
if exit_code != 0 || not config:
print("Error; something went wrong")
exit(exit_code)
print("Got CLI configuration")
print("Loading result server credentials")
res_server = os.environ["RES_SERVER"]
res_user = os.environ["RES_USER"]
res_pwd = os.environ["RES_PWD"]
if not (res_server && res_user && res_pwd):
print("Could not load result server credentials")
exit(1)
print("Loaded credentials for result server", res_server)
print("Running CLI-Tool")
# Logs "I am the CLI"
exit_code = subprocess.run(["mycli", "do", "this", "and", "that"], cwd="somesubdir").returncode
if exit_code != 0:
print("Error: CLI tool finished with non-zero exit code")
exit(exit_code)
print("CLI tool finished successfully")
print("Uploading result data")
upload_result_data(read_file_text("cli_results.json"), res_server, res_user, res_pwd)
print("Done uploading result data")
The output I get looks something like
/opt/python-3.10.4/lib/python3.10/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'result_server.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
I am the CLI
Fetching CLI configuration
Got CLI configuration
Loading result server credentials
Loaded credentials for result server result_server.com
Running CLI-Tool
CLI tool finished successfully
Uploading result data
Done uploading result data
How can I make sure the CLI output appears after "Running CLI-Tool"?
This is most likely because Python buffers its standard output, so flush it before calling subprocess:
sys.stdout.flush()
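Applied to the example above, that would be (a sketch):

import subprocess
import sys

print("Running CLI-Tool")
sys.stdout.flush()  # push any buffered prints out before the child writes to the same stream
exit_code = subprocess.run(["mycli", "do", "this", "and", "that"], cwd="somesubdir").returncode

Alternatively, print("Running CLI-Tool", flush=True) flushes in the same call, and setting the PYTHONUNBUFFERED=1 environment variable disables Python's stdout buffering for the whole run.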
Kubernetes client CoreV1Api connect_get_namespaced_pod_exec fails to run for Python.
I have checked the Python version (2.7), and pip freeze shows ipaddress==1.0.22, urllib3==1.24.1 and websocket-client==0.54.0, which are the versions that satisfy the requirements, as mentioned here: https://github.com/kubernetes-client/python/blob/master/README.md#hostname-doesnt-match
I followed the issue on this thread (https://github.com/kubernetes-client/python/issues/36), but it was not much help.
I tried using stream as suggested here: https://github.com/kubernetes-client/python/blob/master/examples/exec.py
Ran:
api_response = stream(core_v1_api.connect_get_namespaced_pod_exec,
                      name, namespace,
                      command=exec_command,
                      stderr=True, stdin=False,
                      stdout=True, tty=False)
Got this error:
ApiException: (0)
Reason: hostname '10.47.7.95' doesn't match either of '', 'cluster.local'
Without stream, using the CoreV1Api directly, I ran:
core_v1_api = client.CoreV1Api()
api_response = core_v1_api.connect_get_namespaced_pod_exec(name=name,namespace=namespace,command=exec_command,stderr=True, stdin=False,stdout=True, tty=False)
Got this error:
ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Sat, 05 Jan 2019 08:01:22 GMT', 'Content-Length': '139', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Upgrade request required","reason":"BadRequest","code":400}
I wrote a simple program to check that:
from kubernetes import client, config
from kubernetes.stream import stream

# create an instance of the API class
config.load_kube_config()
api_instance = client.CoreV1Api()
exec_command = [
    '/bin/sh',
    '-c',
    'echo This is Prafull Ladha and it is test function']

resp = stream(api_instance.connect_get_namespaced_pod_exec, "nginx-deployment-76bf4969df-467z2", 'default',
              command=exec_command,
              stderr=True, stdin=False,
              stdout=True, tty=False)
print("Response: " + resp)
It is working perfectly fine for me.
I believe you're using minikube for development purposes. It is not able to recognise your hostname. You can make it work by disabling assert_hostname in your program, like:
from kubernetes.client import configuration
config.load_kube_config()
configuration.assert_hostname = False
This should resolve your issue.
Adding container='name' to the call would work, especially if you have a sidecar container like istio-proxy running in the pod.
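For example (a sketch; the pod and container names are placeholders):

resp = stream(api_instance.connect_get_namespaced_pod_exec,
              "mypod", "default",
              container="app",  # explicit target container, needed when sidecars are present
              command=exec_command,
              stderr=True, stdin=False,
              stdout=True, tty=False)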
Watch out when using the stream() suggestion from Prafull Ladha's answer. It has some pitfalls:
It will not throw an exception if the command fails; it just returns an empty string. To cover that, you need to pass stream(..., _preload_content=False) and then manually check resp.returncode and manually call resp.readline_stdout in a loop.
It will time out if the command takes more than 5 minutes to run without producing any output. To solve that, you need to call resp.sock.ping() in a parallel thread.
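A minimal sketch of that non-preloaded pattern, assuming the WSClient interface the Python client returns (is_open/update/read_stdout/returncode); the pod name and command are placeholders:

import sys

resp = stream(api_instance.connect_get_namespaced_pod_exec,
              "mypod", "default",
              command=["/bin/sh", "-c", "exit 3"],
              stderr=True, stdin=False, stdout=True, tty=False,
              _preload_content=False)  # keep the websocket open instead of returning a string

while resp.is_open():
    resp.update(timeout=1)  # pump the websocket
    if resp.peek_stdout():
        sys.stdout.write(resp.read_stdout())
    if resp.peek_stderr():
        sys.stderr.write(resp.read_stderr())
resp.close()
print("exit code: {}".format(resp.returncode))  # non-zero when the command failed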
I'm afraid I don't have a better alternative solution, and I don't have the reputation to add this as a comment.
I'm trying to get authentication working with Django Channels, using a very simple websockets app that echoes back whatever the user sends over with the prefix "You said: ".
My processes:
web: gunicorn myproject.wsgi --log-file=- --pythonpath ./myproject
realtime: daphne myproject.asgi:channel_layer --port 9090 --bind 0.0.0.0 -v 2
realtime_worker: python manage.py runworker -v 2
I run all processes when testing locally with heroku local -e .env -p 8080, but you could also run them all separately.
Note I have WSGI on localhost:8080 and ASGI on localhost:9090.
Routing and consumers:
### routing.py ###
from . import consumers
channel_routing = {
'websocket.connect': consumers.ws_connect,
'websocket.receive': consumers.ws_receive,
'websocket.disconnect': consumers.ws_disconnect,
}
and
### consumers.py ###
import traceback
from django.contrib.auth.models import User
from django.http import HttpResponse
from channels.handler import AsgiHandler
from channels import Group
from channels.sessions import channel_session
from channels.auth import channel_session_user, channel_session_user_from_http
from myproject import CustomLogger

logger = CustomLogger(__name__)


@channel_session_user_from_http
def ws_connect(message):
    logger.info("ws_connect: %s" % message.user.email)
    message.reply_channel.send({"accept": True})
    message.channel_session['prefix'] = "You said"
    # message.channel_session['django_user'] = message.user  # tried doing this but it doesn't work...


@channel_session_user_from_http
def ws_receive(message, http_user=True):
    try:
        logger.info("1) User: %s" % message.user)
        logger.info("2) Channel session fields: %s" % message.channel_session.__dict__)
        logger.info("3) Anything at 'django_user' key? => %s" % (
            'django_user' in message.channel_session,))
        user = User.objects.get(pk=message.channel_session['_auth_user_id'])
        logger.info(None, "4) ws_receive: %s" % user.email)
        prefix = message.channel_session['prefix']
        message.reply_channel.send({
            'text': "%s: %s" % (prefix, message['text']),
        })
    except Exception:
        logger.info("ERROR: %s" % traceback.format_exc())


@channel_session_user_from_http
def ws_disconnect(message):
    logger.info("ws_disconnect: %s" % message.__dict__)
    message.reply_channel.send({
        'text': "%s" % "Sad to see you go :(",
    })
And then to test, I go into Javascript console on the same domain as my HTTP site, and type in:
> var socket = new WebSocket('ws://localhost:9090/')
> socket.onmessage = function(e) {console.log(e.data);}
> socket.send("Testing testing 123")
VM481:2 You said: Testing testing 123
And my local server log shows:
ws_connect: test@test.com
1) User: AnonymousUser
2) Channel session fields: {'_SessionBase__session_key': 'chnb79d91b43c6c9e1ca9a29856e00ab', 'modified': False, '_session_cache': {u'prefix': u'You said', u'_auth_user_hash': u'ca4cf77d8158689b2b6febf569244198b70d5531', u'_auth_user_backend': u'django.contrib.auth.backends.ModelBackend', u'_auth_user_id': u'1'}, 'accessed': True, 'model': <class 'django.contrib.sessions.models.Session'>, 'serializer': <class 'django.core.signing.JSONSerializer'>}
3) Anything at 'django_user' key? => False
4) ws_receive: test@test.com
Which, of course, makes no sense. A few questions:
Why would Django see message.user as an AnonymousUser but have the actual user id _auth_user_id=1 (this is my correct user ID) in the session?
I am running my local server (WSGI) on 8080 and daphne (ASGI) on 9090 (different ports). And I didn't include session_key=xxxx in my WebSocket connection, yet Django was able to read my browser's cookie for the correct user, test@test.com? According to the Channels docs, this shouldn't be possible.
Under my setup, what is the best / simplest way to carry out authentication with Django channels?
Note: This answer is explicit to channels 1.x, channels 2.x uses a different auth mechanism.
I had a hard time with Django Channels too; I had to dig into the source code to better understand the docs...
Question 1:
The docs mention this kind of long trail of decorators relying on each other (http_session, http_session_user, ...) that you can use to wrap your message consumers. In the middle of that trail it states this:
Now, one thing to note is that you only get the detailed HTTP information during the connect message of a WebSocket connection (you can read more about that in the ASGI spec) - this means we’re not wasting bandwidth sending the same information over the wire needlessly.
This also means we’ll have to grab the user in the connection handler and then store it in the session;....
It's easy to get lost in all that; at least we both did...
You just have to remember that this happens when you use channel_session_user_from_http:
1. It calls http_session_user:
   a. which calls http_session, which will parse the message and give us a message.http_session attribute;
   b. upon returning from the call, it initiates message.user based on the information it got in message.http_session (this will bite you later).
2. It calls channel_session, which will initiate a dummy session in message.channel_session and tie it to the message reply channel.
3. Now it calls transfer_user, which will move the http_session into the channel_session.
This happens during the connection handling of a websocket, so on subsequent messages you won't have access to detailed HTTP information. What's happening after the connect is that you're calling channel_session_user_from_http again, which in this situation (post-connect messages) calls http_session_user, which attempts to read the HTTP information but fails, resulting in message.http_session being set to None and message.user being overridden to AnonymousUser.
That's why you need to use channel_session_user in this case.
Question 2:
Channels can use Django sessions either from cookies (if you’re running your websocket server on the same port as your main site, using something like Daphne), or from a session_key GET parameter, which works if you want to keep running your HTTP requests through a WSGI server and offload WebSockets to a second server process on another port.
Remember http_session, the decorator that gets us the message.http_session data? It appears that if it doesn't find a session_key GET parameter, it falls back to settings.SESSION_COOKIE_NAME, which is the regular sessionid cookie. So whether you provide session_key or not, you'll still get connected if you're logged in. Of course, that happens only when your ASGI and WSGI servers are on the same domain (127.0.0.1 in this case); the port difference doesn't matter.
I think the difference the docs are trying to communicate, but didn't expand on, is that you need to set up the session_key GET parameter when your ASGI and WSGI servers are on different domains, since cookies are restricted by domain, not port.
Due to that lack of explanation, I had to test running ASGI and WSGI on the same port and on different ports, and the result was the same: I was still getting authenticated. I changed one server's domain to 127.0.0.2 instead of 127.0.0.1 and the authentication was gone; I set the session_key GET parameter and the authentication was back again.
Update: a rectification of the docs paragraph was just pushed to the channels repo; it was meant to say domain instead of port, as I mentioned.
Question 3:
My answer is the same as turbotux's but longer: you should use @channel_session_user_from_http on ws_connect and @channel_session_user on ws_receive and ws_disconnect. Nothing in what you showed suggests it won't work if you make that change. Maybe try removing http_user=True from your receive consumer? Even though I suspect it has no effect, since it's undocumented and intended only to be used by Generic Consumers...
Hope this helps!
To answer your first question you need to use the:
channel_session_user
decorator in the receive and disconnect calls.
channel_session_user_from_http
calls transfer_user during the connect to move the HTTP session into the channel session. This way all future calls may access the channel session to retrieve user information.
To your second question: I believe what you are seeing is that the default websocket library passes the browser cookies over the connection.
Third, I think your setup will work quite well once you have changed the decorators.
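A minimal sketch of the decorator change, applied to the question's consumers:

from channels.auth import channel_session_user, channel_session_user_from_http

@channel_session_user_from_http  # only the connect message carries the HTTP session
def ws_connect(message):
    ...

@channel_session_user  # later messages read the user from the channel session
def ws_receive(message):
    ...

@channel_session_user
def ws_disconnect(message):
    ...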
I ran into this problem and found it was due to a couple of issues. I'm not suggesting this will solve your issue, but it might give you some insight. Keep in mind I am using rest framework. First, I was overriding the User model. Second, when I defined the application variable in my root routing.py, I didn't use my own AuthMiddleware; I was using the docs-suggested AuthMiddlewareStack. So, per the Channels docs, I defined my own custom authentication middleware, which takes my JWT value from the cookies, authenticates it, and assigns it to scope["user"], like so:
routing.py
from channels.routing import ProtocolTypeRouter, URLRouter
import app.routing
from .middleware import JsonTokenAuthMiddleware
application = ProtocolTypeRouter(
    {
        "websocket": JsonTokenAuthMiddleware(
            (URLRouter(app.routing.websocket_urlpatterns))
        )
    }
)
middleware.py
from http import cookies
from django.contrib.auth.models import AnonymousUser
from django.db import close_old_connections
from rest_framework.authtoken.models import Token
from rest_framework_jwt.authentication import BaseJSONWebTokenAuthentication
class JsonWebTokenAuthenticationFromScope(BaseJSONWebTokenAuthentication):
    def get_jwt_value(self, scope):
        try:
            cookie = next(x for x in scope["headers"]
                          if x[0].decode("utf-8") == "cookie")[1].decode("utf-8")
            return cookies.SimpleCookie(cookie)["JWT"].value
        except:
            return None


class JsonTokenAuthMiddleware(BaseJSONWebTokenAuthentication):
    def __init__(self, inner):
        self.inner = inner

    def __call__(self, scope):
        try:
            close_old_connections()
            user, jwt_value = JsonWebTokenAuthenticationFromScope().authenticate(scope)
            scope["user"] = user
        except:
            scope["user"] = AnonymousUser()
        return self.inner(scope)
Hope this helps!
I'm writing a Python app which is currently hosted on Heroku. It is in an early development stage, so I'm using the free account with one web dyno. Still, I want my heavier tasks to be done asynchronously, so I'm using the IronWorker add-on. I have it all set up and it does the simplest jobs, like sending emails or anything that doesn't require any data being sent back to the application. The question is: how do I send the worker output back to my application from IronWorker? Or even better, how do I notify my app that the worker is done with the job?
I looked at other Iron solutions like cache and message queue, but the only thing I can find is that I can explicitly ask for the worker state. Obviously I don't want my web service to poll the worker, because that kind of defeats the original purpose of moving the tasks to the background. What am I missing here?
I see this question is high in Google so in case you came here with hopes to find some more details, here is what I ended up doing:
First, I prepared the endpoint on my app. My app uses Flask, so this is how the code looks:
#app.route("/worker", methods=["GET", "POST"])
def worker():
#refresh the interface or whatever is necessary
if flask.request.method == 'POST':
return 'Worker endpoint reached'
elif flask.request.method == 'GET':
worker = IronWorker()
task = worker.queue(code_name="hello", payload={"WORKER_DB_URL": app.config['WORKER_DB_URL'],
"WORKER_CALLBACK_URL": app.config['WORKER_CALLBACK_URL']})
details = worker.task(task)
flask.flash("Work queued, response: ", details.status)
return flask.redirect('/')
Note that in my case, GET is here only for testing, I don't want my users to hit this endpoint and invoke the task. But I can imagine situations when this is actually useful, specifically if you don't use any type of scheduler for your tasks.
With the endpoint ready, I started to look for a way of visiting that endpoint from the worker. I found this fantastic requests library and used it in my worker:
import sys, json
from sqlalchemy import *
import requests

print "hello_worker initialized, connecting to database..."

payload = None
payload_file = None
for i in range(len(sys.argv)):
    if sys.argv[i] == "-payload" and (i + 1) < len(sys.argv):
        payload_file = sys.argv[i + 1]
        break

f = open(payload_file, "r")
contents = f.read()
f.close()

payload = json.loads(contents)
print "contents: ", contents
print "payload as json: ", payload

db_url = payload['WORKER_DB_URL']
print "connecting to database ", db_url
db = create_engine(db_url)
metadata = MetaData(db)
print "connection to the database established"

users = Table('users', metadata, autoload=True)
s = users.select()

#def run(stmt):
#    rs = stmt.execute()
#    for row in rs:
#        print row

#run(s)

callback_url = payload['WORKER_CALLBACK_URL']
print "task finished, sending post to ", callback_url
r = requests.post(callback_url)
print r.text
So in the end there is no real magic here; the only important thing is to send the callback URL in the payload if you need to notify your page when the task is done. Alternatively, you can place the endpoint URL in the database if you use one in your app. By the way, the snippet above also shows how to connect to the PostgreSQL database in your worker and print all the users.
One last thing you need to be aware of is how to format your .worker file, mine looks like this:
# set the runtime language. Python workers use "python"
runtime "python"
# exec is the file that will be executed:
exec "hello_worker.py"
# dependencies
pip "SQLAlchemy"
pip "requests"
This will install the latest versions of SQLAlchemy and requests; if your project depends on a specific version of a library, you should do this instead:
pip "SQLAlchemy", "0.9.1"
Easiest way: push a message to your API from the worker, whether it's a log line or anything else you need to have in your app.
I am writing a test for a function that downloads data from a URL with Twisted (I know about twisted.web.client.getPage, but this one adds some extra functionality). Either way, I want to use nosetests, since I am using it throughout the project and it doesn't seem appropriate to use Twisted Trial only for this particular test.
So what I am trying to do is something like:
from nose.twistedtools import deferred

@deferred()
def test_download(self):
    url = 'http://localhost:8000'
    d = getPage(url)
    def callback(data):
        assert len(data) != 0
    d.addCallback(callback)
    return d
A test server listens on localhost:8000. The issue is that I always get twisted.internet.error.DNSLookupError:
DNSLookupError: DNS lookup failed: address 'localhost:8000' not found: [Errno -5] No address associated with hostname.
Is there a way I can fix this? Does anyone actually use nose.twistedtools?
Update: A more complete traceback
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/nose-0.11.2-py2.6.egg/nose/twistedtools.py", line 138, in errback
failure.raiseException()
File "/usr/local/lib/python2.6/dist-packages/Twisted-9.0.0-py2.6-linux-x86_64.egg/twisted/python/failure.py", line 326, in raiseException
raise self.type, self.value, self.tb
DNSLookupError: DNS lookup failed: address 'localhost:8000' not found: [Errno -5] No address associated with hostname.
Update 2
My bad, it seems in the implementation of getPage, I was doing something like:
obj = urlparse.urlparse(url)
netloc = obj.netloc
and passing netloc to the factory when I should've passed netloc.split(':')[0].
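In other words (a quick illustration; urlparse keeps the port in netloc but exposes the bare host as hostname):

import urlparse  # Python 2; use urllib.parse in Python 3

obj = urlparse.urlparse('http://localhost:8000')
print(obj.netloc)    # 'localhost:8000' -- includes the port, which breaks the DNS lookup
print(obj.hostname)  # 'localhost'      -- the bare host, safe to resolve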
Are you sure your getPage function is parsing the URL correctly? The error message seems to suggest that it is using the hostname and port together when doing the DNS lookup.
You say your getPage is similar to twisted.web.client.getPage, but that works fine for me when I use it in this complete script:
#!/usr/bin/env python
from nose.twistedtools import deferred
from twisted.web import client
import nose

@deferred()
def test_download():
    url = 'http://localhost:8000'
    d = client.getPage(url)
    def callback(data):
        assert len(data) != 0
    d.addCallback(callback)
    return d

if __name__ == "__main__":
    args = ['--verbosity=2', __file__]
    nose.run(argv=args)
While running a simple http server in my home directory:
$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...
The nose test gives the following output:
.
----------------------------------------------------------------------
Ran 1 test in 0.019s
OK