python stream handler for multiple servers

There are servers running in multiple locations, and I need to stream the application log data from these servers to ZMQ (ZeroMQ) using a Python stream handler. How do I use the stream handler to get this done? I have already referred to the Python logging handlers documentation: https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler

You can post your logs from the different servers to the ZMQ as JSON, iteratively. On the ZMQ side, build a PyZMQ application with a message handler that listens for the incoming JSON from these servers. The incoming JSON can then be processed as required and stored in a file (or wherever else you want to store it). That file can then be read for the incoming logs (e.g. tail -f fileName.txt or fileName.log).
Here is a link that will help you set up a PyZMQ application:
Designing and Testing PyZMQ Applications – Part 1
For logging specifically, you can use these examples:
A simple Python logging example
Logging, StreamHandler and standard streams
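A rough sketch of the sending side, assuming a PUSH/PULL socket pair; the ZMQJSONHandler name and the tcp://collector-host:5555 endpoint are made up for illustration:

import json
import logging
import zmq  # pip install pyzmq

class ZMQJSONHandler(logging.Handler):
    """Hypothetical handler: serializes each log record to JSON and pushes it to ZMQ."""
    def __init__(self, endpoint="tcp://collector-host:5555"):
        super().__init__()
        self.context = zmq.Context.instance()
        self.socket = self.context.socket(zmq.PUSH)
        self.socket.connect(endpoint)

    def emit(self, record):
        payload = {
            "logger": record.name,
            "level": record.levelname,
            "message": self.format(record),
        }
        self.socket.send_json(payload)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(ZMQJSONHandler())
logger.info("application started")

On the collector side, a PULL socket bound to the same port can call recv_json() in a loop and append each record to the file you tail. PyZMQ also ships a ready-made zmq.log.handlers.PUBHandler if a PUB/SUB pattern suits you better.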

Related

How to stream logs from elk stack to python

I have a Kafka consumer in Python to process log data (stack-trace analysis and automatic issue creation), and we are also using the ELK stack in parallel. Is there any possibility of streaming logs to Python via ELK so we can get rid of Kafka? I have no experience with ELK and can't find anything about streaming from it. It seems I can only query log data periodically, but that doesn't seem like a perfect solution.
No, you cannot stream data out of Elasticsearch on its own.
If your input is something else, you can use Logstash's various output plugins (or write your own) that can write to something a Python process can consume.
For example, the pipe, tcp, websocket/http, and exec plugins are all generic enough to be used with any language.
However, Logstash does not persist events like Kafka does, so if you want something that can handle back pressure and doesn't drop events, you'd keep Kafka around.
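As a rough sketch of what the receiving side could look like, assuming you point Logstash's tcp output (with the json_lines codec) at your Python process; the host and port below are placeholders:

import json
import socket

# Listen on a port that a hypothetical Logstash tcp output (codec => json_lines) points at
HOST, PORT = "0.0.0.0", 5000

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind((HOST, PORT))
    server.listen(1)
    conn, addr = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:  # one JSON event per line
            event = json.loads(line)
            print(event.get("message"))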

How to handle callbacks in Python 3?

I have a custom HTTP method/verb (let's say LISTEN) which allows me to listen for updates on a resource stored on a remote server. The API available for this has a blocking call which will get my client code to listen for an update until I interrupt the execution of that call. To provide an example, if I were to perform a curl as follows:
curl -X LISTEN http://<IP-Address>:<Port>/resource
Executing this creates a blocking call that provides me with updates on the resource whenever a new value for it is pushed to the server (similar to a pub-sub model). The response would look similar to this:
{"data":"value update 1","id":"id resource"}
{"data":"value update 2","id":"id resource"}
(...)
If I were to write code to handle this in Python, how do I call my URL using this custom verb and handle the blocking call/callback while ensuring that this does not block the execution of the rest of my code?
If you're using the Python requests library with a custom HTTP verb and need to read streamed content, you can do something like this:
import json
import requests  # sudo pip3 install requests

url = "http://........."
r = requests.request('LISTEN', url, stream=True)
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))
Note: by default all requests calls are blocking, so you need to run this code in a separate thread/process to avoid that.
...while ensuring that this does not block the execution of the rest of my code
Since you provided no details about your application, I will try to list some general thoughts on the question.
Your task can be solved in many ways; the right solution depends on your app's architecture.
If this is a web server, you can take a look at Tornado (see its streaming callback support) or the aiohttp streaming examples.
On the other hand, you can run the code above in a separate process or thread (see the sketch below) and communicate with other applications/services using RabbitMQ, for example, or another IPC mechanism.
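For example, here is a minimal sketch that pushes the blocking loop above into a background thread; the handle_update callback is a placeholder for whatever your application needs to do with each update:

import json
import threading
import requests

def handle_update(event):
    # placeholder callback: do whatever you need with each pushed update
    print(event)

def listen(url):
    # the blocking loop from the snippet above, kept off the main thread
    r = requests.request('LISTEN', url, stream=True)
    for line in r.iter_lines():
        if line:  # skip keep-alive newlines
            handle_update(json.loads(line.decode('utf-8')))

listener = threading.Thread(target=listen,
                            args=("http://<IP-Address>:<Port>/resource",),
                            daemon=True)
listener.start()

# ...the rest of your code keeps executing here while updates arrive...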

Logging with command line waitress-serve

Is there a way to log waitress-serve output into a file?
The current command I use is:
waitress-serve --listen=localhost:8080 --threads=1 my_app_api:app
The application we use was not written with waitress in mind, so we chose to serve it from the command line to avoid changes (for now, at least).
TL;DR: waitress-serve doesn't provide a way to do it. See the 'OK, how do I get it to log?' section below.
Background
Per the documentation for the command-line usage of waitress-serve, no - there's no way to set up logging. See the arguments docs.
waitress-serve is just an executable to make running your server more convenient. Its source code is here: runner.py. If you read it, you can see it basically just calls from waitress import serve; serve(**args) for you. (That code clip is not literally what it does, but in spirit, yes.)
The documentation for waitress says that it doesn't log HTTP traffic - that's not its job. It will, however, log its own errors and stack traces; see the logging docs. If you read the waitress source trying to find where it logs, you'll notice it doesn't seem to log HTTP traffic anywhere (GitHub log search); it primarily logs things to do with the socket layer.
Waitress does say that if you want to log http traffic, then you need another component. In particular, it points you to pastedeploy docs which is some middle-ware that can log http traffic for you.
The documentation from waitress is actually somewhat helpful in answering your question, though not directly and explicitly. It says:
The WSGI design is modular.
per the logging doc
I.e. waitress won't log HTTP traffic for you. You'll need another WSGI component to do that, and because WSGI is modular, you can probably choose from a few options.
If you want some background on how this works, there's a pretty good post here leftasexercise.com
OK, how do I get it to log?
Use tee
Basically, if you just want to capture the same output that waitress-serve prints to the console, you don't need anything special:
waitress-serve --listen=localhost:8080 --threads=1 my_app_api:app | tee -a waitress-serve.log
Python logging
But if you're actually looking for logging that comes from Python's standard logger (say your app makes logger calls, or you want to log HTTP traffic yourself), you can set that up in your application code. E.g. edit your application's source code and have it configure logging to a file:
import logging
logging.basicConfig(filename='app.log', encoding='utf-8', level=logging.DEBUG)
PasteDeploy middleware for http logs
Or if you're looking for Apache-style HTTP logging, you can use something like PasteDeploy to do it. Note that PasteDeploy is another Python dependency, so you'll need to install it, e.g.
pip install PasteDeploy
Then you need to set up a .ini file that tells PasteDeploy how to start your server and also tells it to use TransLogger to create Apache-style HTTP logs. This is explained in more detail here: logging with pastedeploy. The ini file is specific to each app, but from your question it sounds like it should look like this:
[app:wsgiapp]
use = my_app_api:app

[server:main]
use = egg:waitress#main
host = 127.0.0.1
port = 8080

[filter:translogger]
use = egg:Paste#translogger
setup_console_handler = False

[pipeline:main]
pipeline = translogger
           wsgiapp
You'll still need to edit your app's source code so that PasteDeploy loads the app with your configuration file:
from paste.deploy import loadapp
wsgi_app = loadapp('config:/path/to/config.ini')
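If you are not launching the app through a Paste-aware runner, one option (just a sketch, assuming the ini file above lives at /path/to/config.ini) is to hand the loaded pipeline straight to waitress in that same source file:

from paste.deploy import loadapp
from waitress import serve

# load the translogger -> app pipeline defined in the ini file above
wsgi_app = loadapp('config:/path/to/config.ini')
serve(wsgi_app, host='127.0.0.1', port=8080)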
Webframework-dependent roll-your-own http logging
Even if you want to log HTTP traffic, you don't necessarily need something like PasteDeploy. For example, if you are using Flask as the web framework, you can write your own HTTP logs with the after_request decorator:
from time import strftime
from flask import request

@app.after_request
def after_request(response):
    # log an Apache-style access line for each request
    # `logger` is configured elsewhere (see the full gist linked below)
    timestamp = strftime('[%Y-%b-%d %H:%M]')
    logger.error('%s %s %s %s %s %s', timestamp, request.remote_addr, request.method, request.scheme, request.full_path, response.status)
    return response
See the full gist at https://gist.github.com/alexaleluia12/e40f1dfa4ce598c2e958611f67d28966

Compress the streamed output with cherrypy

I am using cherrypy for a web server which is able to stream the output of some methods.
The server uses yield to send lines of data, and the client uses the onprogress event of the $.ajax method.
But enabling the 'tools.gzip' config of CherryPy caused the output to no longer be received incrementally by the client. In fact, the client's onprogress event is not called until the server method has finished completely. It seems the CherryPy compression tool is not able to compress the output in streaming mode (it can only compress the output once it has all of it).
Now my first question is how to fix this problem. If it is not fixable, my second question is how to disable the CherryPy compression for a specific method.
You have to enable the streaming capabilities of the request.
Set the following configuration:
{'response.stream': True}
The gzip tool inspects the current request, looks for the stream setting, and responds accordingly.
For more information: http://docs.cherrypy.org/en/latest/advanced.html#streaming-the-response-body
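A minimal sketch of setting that per handler; the Root class and stream method names are made up:

import cherrypy

class Root:
    @cherrypy.expose
    def stream(self):
        # streamed generator output; compression would otherwise buffer it
        for i in range(10):
            yield "line %d\n" % i
    # per-handler config: stream the response instead of buffering it
    stream._cp_config = {'response.stream': True}

cherrypy.quickstart(Root())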

Grabbing log files from production server

I developed a statistics system in Python for researching user behavior on an online web service; it mostly relies on reading and analyzing logs from the production server. Currently I share the log folders internally over SMB for the routine analytics program to read, but I have two questions about this data-access method:
Are there any other ways of accessing the logs besides SMB, or other strategies altogether?
I guess heavy reads may block the production server's disks and affect normal log writing; is there any way to avoid this?
I hoped I could come up with some real numbers, but I don't have any yet. Can anyone give me some guidance on doing this more gracefully?
If you are open to using a third party log aggregation tool, you have a couple of options:
http://graylog2.org/
http://www.logstash.net/
http://www.octopussy.pm/
https://github.com/facebook/scribe
In addition, if you are logging to syslog, many of the commonly used syslog daemons (e.g. syslog-ng) can be configured to forward logs from various applications to one or more of these aggregators. It is trivial to log to syslog from a Python application - there is a syslog module in the standard library, and the logging package provides a SysLogHandler (a sketch follows below).
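For example, a small sketch using the standard library's SysLogHandler; the /dev/log address assumes a typical Linux syslog socket, so adjust it for your daemon:

import logging
import logging.handlers

logger = logging.getLogger("webservice")
logger.setLevel(logging.INFO)

# /dev/log is the usual local syslog socket on Linux; adjust for your setup
handler = logging.handlers.SysLogHandler(address="/dev/log")
handler.setFormatter(logging.Formatter("webservice: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("user action logged via syslog")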
Well, if you have an HTTP server in between (IHS, OHS, I guess Apache too...) then you can expose your physical log directories via a URL: each of your files will get a URL of its own, and with this kind of code you can download them quite easily:
import os
import urllib2

# `url` holds the URL of one of the exposed log files
f = urllib2.urlopen(url)
# Open our local file for writing and copy the remote content into it
with open(os.path.basename(url), 'wb') as local_file:
    local_file.write(f.read())
