I'm experimenting (for the first time) with Prometheus. I've set up Prometheus to send metrics to a local Flask server via remote_write:
remote_write:
- url: "http://localhost:5000/metric"
I'm able to read the incoming bytes; however, I'm not able to convert the incoming messages into any meaningful data.
I'm very new to Prometheus (and Protobuf!), so I'm not sure what the best approach is. I would rather not use a third-party package; I want to learn and understand the Protobuf de/serialization myself.
I tried copying the metrics.proto definitions from the Prometheus GitHub and compiling them with protoc. I tried importing the metrics_pb2.py file and parsing the incoming message:
read_metric = metrics_pb2.Metric()
read_metric.ParseFromString(request.data)
I also tried using the remote.proto definitions (specifically WriteRequest) which also didn't work:
read_metric = remote_pb2.WriteRequest()
read_metric.ParseFromString(request.data)
This results in:
google.protobuf.message.DecodeError: Error parsing message
So I suspect that I'm using the wrong Protobuf definitions?
I would really appreciate any help & advice on this!
To provide some more context for what I'm attempting to accomplish:
I'm trying to stream data from multiple Prometheus instances to a message queue so they can be passed to a machine learning model.
I'm using online training with an active learning model, and I want the data to be (near) real-time. That's why I thought the remote_write functionality would be a better approach than continuously scraping each instance. If you have any other ideas on how I can build this system, feel free to share; I've just been playing around with it for a couple of days, so I'm open to any feedback!
ANSWER EDIT:
I had to first decompress the data using snappy (thanks, larsks!):
import snappy      # python-snappy package
import remote_pb2  # generated from Prometheus remote.proto

raw = request.data                      # snappy-compressed protobuf body
decompressed = snappy.uncompress(raw)
read_metric = remote_pb2.WriteRequest()
read_metric.ParseFromString(decompressed)
The remote.proto document contains the correct protobuf definitions. You may find this document useful; it explicitly defines the remote write protocol. That document includes the "official" protobuf specification and mentions that:
The remote write request MUST be encoded using Google Protobuf 3, and MUST use the schema defined above. Note the Prometheus implementation uses gogoproto optimisations - for receivers written in languages other than Golang the gogoproto types MAY be substituted for line-level equivalents.
The document also notes that the body of remote write requests is compressed:
The remote write request in the body of the HTTP POST MUST be compressed with Google’s Snappy. The block format MUST be used - the framed format MUST NOT be used.
So before you can parse the request body you'll need to find a Python solution for decompressing snappy-compressed data.
(I found a link to that google doc from this article that talks about the development of the remote write protocol.)
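Putting the pieces together, a minimal Flask receiver might look like the sketch below. It assumes remote.proto (and the types.proto it imports) have been compiled with protoc into remote_pb2/types_pb2, and that the python-snappy package is installed; the /metric route matches the remote_write URL from the question.

# Minimal sketch of a remote_write receiver (assumptions noted above).
import snappy
from flask import Flask, request

import remote_pb2  # generated from Prometheus remote.proto

app = Flask(__name__)

@app.route("/metric", methods=["POST"])
def receive_remote_write():
    # The body is a snappy-compressed (block format) protobuf WriteRequest.
    decompressed = snappy.uncompress(request.data)

    write_request = remote_pb2.WriteRequest()
    write_request.ParseFromString(decompressed)

    # Each WriteRequest carries a list of time series, each with labels
    # (including __name__) and one or more samples.
    for ts in write_request.timeseries:
        labels = {label.name: label.value for label in ts.labels}
        for sample in ts.samples:
            print(labels.get("__name__"), sample.value, sample.timestamp)

    return "", 200

From here, forwarding each (labels, value, timestamp) tuple to a message queue instead of printing it should get you close to the streaming setup described in the question.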
Related
I'm building a project using Python and Grafana where I'd like to generate a number of copies of certain Grafana dashboards based on specific criteria. I've downloaded the grafanalib library to help me out with that, and I've read through the Generating Dashboards From Code section of the grafanalib website, but I feel like I still need more context to understand how to use this library.
So my first question is: how do I convert a Grafana dashboard JSON model into a Python-friendly format? What method of organization do I use? I saw the dashboard generation function written in the grafanalib documentation, but it looked quite a bit different from how my JSON data is organized. I'd just like some further description of how to do the conversion.
My second question is: once I've converted my Grafana JSON into a Python format, how do I get the proper information to send that generated dashboard to my Grafana server? I see the "upload_to_grafana" function in the grafanalib documentation used to send the information, and it takes three parameters (json, server, api_key). I understand where it's getting the json parameter from, but I don't get where the server information or API key come from, or where that information is found.
This is all being developed on a Raspberry Pi 4, just to put that out there. I'm working on a personal smart-agriculture project as a way to develop my coding abilities further, as I'm self-taught. Any help that furthers my understanding is most appreciated. Thank you.
Create an API key under Grafana's Configuration section; the secret key you are shown while creating it is the api_key. The server is localhost:3000 in the case of a locally installed Grafana.
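To make that concrete, here is a rough sketch of building a dashboard with grafanalib and pushing it to Grafana's HTTP API. The dashboard contents, the localhost:3000 address, and the API key value are placeholders for illustration, and the panel classes may differ slightly between grafanalib versions, so check the library's docs against your installed version.

# Sketch only: generate a dashboard with grafanalib and POST it to Grafana.
# Dashboard contents, server address and API key below are placeholders.
import json
import requests
from grafanalib.core import Dashboard, Graph, Row, Target
from grafanalib._gen import DashboardEncoder

GRAFANA_SERVER = "localhost:3000"   # default port of a local Grafana install
GRAFANA_API_KEY = "eyJrIjoi..."     # the secret shown once when the key is created

dashboard = Dashboard(
    title="Soil moisture",
    rows=[Row(panels=[
        Graph(
            title="Moisture level",
            dataSource="default",
            targets=[Target(expr="soil_moisture_percent")],  # hypothetical metric
        ),
    ])],
).auto_panel_ids()

payload = json.dumps(
    {"dashboard": dashboard.to_json_data(), "overwrite": True},
    cls=DashboardEncoder,
)

resp = requests.post(
    f"http://{GRAFANA_SERVER}/api/dashboards/db",
    data=payload,
    headers={
        "Authorization": f"Bearer {GRAFANA_API_KEY}",
        "Content-Type": "application/json",
    },
)
print(resp.status_code, resp.text)

Generating multiple copies then becomes a loop that builds several Dashboard objects with different titles and queries and POSTs each one.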
I'm just doing a reverse-engineering exercise and have run across the application/x-protobuf content type.
I am currently sniffing network calls from Redfin using mitmproxy. I see an endpoint for a result; however, the response is unstructured JSON-formatted data with content type application/x-protobuf. After doing a bit of research, I found out that Protobuf uses a schema to map the data internally, and I am assuming the schema also sits in the client somewhere, as a .proto file.
[screenshot: mitmproxy response showing the X-ProtoBuf-Schema header]
To validate my assumption about what that screenshot shows: I can see there is a response header called X-ProtoBuf-Schema. Is that the location where the schema would be found, the same schema I can use to decode the response data? How would I go about reading that data in a more structured manner?
I am able to make a request to that endpoint using requests; it just gives me Protobuf bytes back.
PS: This is what the JSON format looks like
https://pastebin.com/LY51X9KZ
"and I am assuming the schema also sits in the client somewhere, called .proto file." - I wouldn't assume that at all; the client, once built, doesn't need the .proto - the generated code is used instead of any explicit schema. If a site is publishing a schema, it is probably a serialized FileDescriptorSet from google/protobuf/descriptor.proto, which contains the intent of the .proto, but as data.
I have a JSON file full of event data that I need to send into Snowplow in Python using an Iglu webhook, but I'm having trouble finding any solid guidance on this. Most of the documentation I've been able to find relates to tracking specific events and sending the data through, but I need to backfill historical data in the same manner I'll fill forward-looking data, hence having to send a large JSON with activity history at the outset.
Is this possible using Snowplow/Python/Iglu, or am I approaching the problem incorrectly?
This question is getting old and OP may have moved on, but I'll leave an answer for anyone else who might stumble upon it.
A Snowplow collector (e.g., the stream-collector) receives data over HTTP. Any method of sending an HTTP request should work in theory; however, there are specific SDKs that address common use cases. For Python specifically, there is the snowplow-python-tracker. You can refer to the full documentation here: Snowplow Python Tracker Docs.
You do not need to use an Iglu webhook. You can point your Python tracker instance directly at your collector via the existing request paths, which are documented here. Yes, one of these paths is for requests via the Iglu webhook adapter, but that is meant for specific situations where you don't control the environment in which the tracker is instantiated, e.g. third-party vendor systems.
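For the backfill itself, a rough sketch with the snowplow-python-tracker could look like the following. The collector endpoint, the Iglu schema URI, and the shape of the history file are all placeholders, and constructor arguments vary between tracker versions, so treat this as an outline rather than copy-paste code.

# Sketch: replay historical events from a JSON file through the Python tracker.
# Collector endpoint, schema URI and file layout below are placeholders.
import json
from snowplow_tracker import Emitter, Tracker, SelfDescribingJson

emitter = Emitter("collector.example.com")  # your collector's host
tracker = Tracker(emitter)                  # argument names vary by tracker version

with open("activity_history.json") as f:    # the historical event dump
    events = json.load(f)

for event in events:
    # Each record is sent as a self-describing event against your own schema.
    tracker.track_self_describing_event(SelfDescribingJson(
        "iglu:com.example/activity/jsonschema/1-0-0",  # placeholder schema URI
        event,
    ))

tracker.flush()  # make sure buffered events are actually sent

If the historical records carry their own timestamps, look into the tracker's support for attaching a true timestamp to each event so the backfilled data isn't stamped with the time of the replay.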
I have a server application written with Django which has a contacts database.
I wish to add a CardDAV web service in order to share my contacts with my phone. I have done a lot of searching, but I am completely lost.
I found some servers such as Radicale and some APIs which use files, but nothing has helped me.
I need to implement in my server an API which will return the list of contacts from my database to my Android phone. What output format should I use?
Thank you.
Your question seems a bit generic, and you don't list what resources you looked at or why you are lost.
This presentation is a little old, but shows the fundamentals on how the *DAV protocols work. Building a CardDAV Client is another great starting point.
CardDAV itself is specified in RFC 6352, along with the related RFCs: WebDAV, WebDAV ACL, etc.
As for "What output format should I use?": CardDAV requests and responses use WebDAV, hence XML. The actual payload is a vCard v3.
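To make the "XML plus vCard" part concrete, here is a very rough Django sketch of what a response to an addressbook REPORT could look like. The Contact model, its fields, and the URL layout are invented for illustration, and a real implementation also needs PROPFIND handling, ETags, and proper XML escaping.

# Rough sketch only: answer a CardDAV addressbook REPORT with a WebDAV
# multistatus response whose payload is vCard 3.0 text.
# The Contact model and field names are placeholders.
from django.http import HttpResponse
from myapp.models import Contact  # hypothetical contacts model

def contact_to_vcard(contact):
    # Minimal vCard 3.0 payload for one contact.
    return (
        "BEGIN:VCARD\r\n"
        "VERSION:3.0\r\n"
        f"UID:{contact.pk}\r\n"
        f"FN:{contact.full_name}\r\n"
        f"TEL;TYPE=CELL:{contact.phone}\r\n"
        f"EMAIL:{contact.email}\r\n"
        "END:VCARD\r\n"
    )

def addressbook_report(request):
    # WebDAV/CardDAV responses are XML; RFC 6352 defines the CARDDAV: elements.
    responses = []
    for contact in Contact.objects.all():
        responses.append(
            "<D:response>"
            f"<D:href>/carddav/contacts/{contact.pk}.vcf</D:href>"
            "<D:propstat><D:prop>"
            f"<C:address-data>{contact_to_vcard(contact)}</C:address-data>"  # should be XML-escaped
            "</D:prop><D:status>HTTP/1.1 200 OK</D:status></D:propstat>"
            "</D:response>"
        )
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<D:multistatus xmlns:D="DAV:" xmlns:C="urn:ietf:params:xml:ns:carddav">'
        + "".join(responses)
        + "</D:multistatus>"
    )
    return HttpResponse(body, content_type="application/xml; charset=utf-8", status=207)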
If you are looking for more complete sample code: the Apple CalendarServer is a full-fledged CalDAV/CardDAV server written in Python.
Radicale is another one, but you already found that (be more specific about why it isn't helping you; Radicale looks like a great starting point to me).
Finally: I don't think Android has CardDAV support built in. Presumably you are using a sync plugin?
I've been struggling to load big chunks of data into BigQuery for a little while now. In Google's docs, I see the insertAll method, which seems to work fine but gives me 413 "Entity too large" errors when I try to send anything over about 100 KB of data in JSON. Per Google's docs, I should be able to send up to 1 TB of uncompressed data in JSON. What gives? The example on the previous page has me building the request body manually instead of using insertAll, which is uglier and more error-prone. I'm also not sure what format the data should be in in that case.
So, all of that said, what is the clean/proper way of loading lots of data into BigQuery? An example with data would be great. If at all possible, I'd really rather not build the request body myself.
Note that for streaming data to BQ, anything above 10k rows/sec requires talking to a sales rep.
If you'd like to send large chunks directly to BigQuery, you can send them via POST. If you're using a client library, it should handle making the upload resumable for you. To do this, you'll need to make a call to jobs.insert() instead of tabledata.insertAll() and provide a description of a load job. To actually push the bytes using the Python client, you can create a MediaFileUpload or MediaInMemoryUpload and pass it as the media_body parameter.
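To illustrate that first approach, here is a rough sketch of a load job submitted through the google-api-python-client with a resumable MediaFileUpload of newline-delimited JSON. The project, dataset, table, schema, and file name are placeholders; it also assumes application default credentials are configured.

# Sketch: submit a BigQuery load job for newline-delimited JSON via jobs.insert().
# Project, dataset, table, schema and file name below are placeholders.
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

service = build("bigquery", "v2")  # assumes application default credentials

job_body = {
    "configuration": {
        "load": {
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "schema": {
                "fields": [
                    {"name": "name", "type": "STRING"},
                    {"name": "value", "type": "FLOAT"},
                ]
            },
        }
    }
}

media = MediaFileUpload(
    "data.json",                      # one JSON record per line
    mimetype="application/octet-stream",
    resumable=True,                   # robust for large uploads
)

job = service.jobs().insert(
    projectId="my-project",
    body=job_body,
    media_body=media,
).execute()

print(job["jobReference"]["jobId"])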
The other option is to stage the data in Google Cloud Storage and load it from there.
The example here uses a resumable upload to upload a CSV file. While the file used is small, it should work for virtually any size of upload, since it uses a robust media upload protocol. It sounds like you want JSON, which means you'd need to tweak the code slightly (there is a JSON example in load_json.py in the same directory). If you have a stream you want to upload instead of a file, you can use a MediaInMemoryUpload instead of the MediaFileUpload that is used in the example.
BTW, Craig's answer is correct; I just thought I'd chime in with links to sample code.