Monitoring git repositories using prometheus

Monitoring git repositories using prometheus - python

I would like to monitor our centralized git repository and visualize them in Grafana. In the end, I want to create a chart that would have repository-name on X-axis and disk space on Y-axis (i.e. bar graph).
I am writing a prometheus exporter. I am unsure about the metric type of the custom exporter.
If I design an exporter that returns following:
disk_space(name=repo1, size=10240)
disk_space(name=repo2, size=20480)
then I would have to define and add lots of Gauge. Is this the right way to go? Is there a better solution? Also, I would like to see how git repository's disk space changed over time.
Would it be better if I use Histogram metric type?
Should I define a single gauge and add labels per git-repository?

Metrics about git repository can be tricky, see this article about git-sizer. You may even reuse part of the project, it is in go.
Now, to answer your questions:
gauge is the right type because size can increase or decrease (depending on compression or garbage collection applied)
the natural way of identifying your metric is to use a meaningful name and use labels to distinguish between repo (this is the cardinality)
Histogram is better suited when you want to keep some information about what happens between two scrapes of a metric. In your case, this is not relevant because you only care about the evolution of the size and it is unlikely to spike wildly.

Related

Python Audio Analysis: What properties/values can I extract from it?

I'm currently working on a tkinter python school project where the sole purpose is to generate images from audio files, I'm going to pick audio properties and use them as values to generate unique abstract images from it, however I don't know which properties I can analyze to extract the values from. So I was looking for some guidance on which properties (audio frequency, amplitude... etc.) I can extract values from to use to generate the images with Python.

The question is very broad in it's current form.
(Bare in mind audio is not my area of expertise so do keep an eye out for the opinion of people working in audio/audiovisual/generative fields.)
You can go about it either way: figure out what kind of image(s) you'd like to create from audio and from there figure out which audio features to use. The other way around is also valid: pick an audio feature you'd like to explore, then think of how you'd best or most interestingly represent that visually.
There's a distintion between image and images.
For a single image, the simplest thing I can think of is drawing a grid of squares where a visual property of the square (e.g. square size, fill colour intensity, etc.) is mapped to the amplitude at that time. The single image would visualise a whole track's amplitude pattern. Even with such a simple example there are many choices you can make (how often you sample, how you layout the grid (cartesian, polar), how each amplitude sample is visualised (could different shapes, sizes, colours, etc.).
(Similar concept to CinemaRedux, simpler for audio only)
You can look into the field of data visualisation for inspiration.
Information is Beautiful is great place to start.
If you want to generate images that seems to go into the audiovisual territory (e.g. abstract animation, audio reactive motion graphics, etc.).
Your question originally had the tag Processing tag, which I removed, however you could be using Processing's Python Mode.
In ferms of audio visualisisation one good example I can think is Robert Hogin's work, see Magnetosphere and the Audio-generated landscape prototype. He is using frequency analysis (FFT) with a bit of smoothing/data massaging to amplify the elements useful for visualisation and dampen some of the noise:
(There are a few handy audio libraries such as Minim and beads, however I assume you're intresting in using raw Python, not Jython (which is what the official Processing Python mode uses). He is an answer on FFT analysis for visualisation (even though it's in Processing Java, the principles can be applied in Python)
Personally I've only used pyaudio so far for basic audio tasks. I would assume you could use it for amplitude analysis, but for other more complex tasks you might something extra.
Doing a quick search librosa pops up.
If what you want to achieve isn't clear, try prototyping first and start with the simplest audio analysis and visual elements you can think of (e.g. amplitude mapped to boxes over time). Constraints can be great for creativity and the minimal approach could translate into a cleaner, minimal visuals.
You can then look into FFT, MFCC, onset/ beat detection, etc.
Another tool that could be useful for prototyping is Sonic Visualiser.
You can open a track and use some of the built-in feature extractors.
(You can even get away with exporting XML or CSV data from Sonic Visualser which you can load/parse in Python and use to render image(s))
It uses a plugin system (similar to VST plugins in DAWs like Abbleton Live, Apple Logic, etc.) called Vamp plugins. You can then use the VampPy Python wrapper if you need the data at runtime.
(You might also want to draw inspiration from other languages used of audiovisual artworks like PureData + Gems , MaxMSP + Jitter, VVVV, etc.)

Time domain: Zero-crossing rate, Root mean square energy ,etc . Frequency Domain: Spectral bandwith,flux,rollof,flatness,MFCC etc. Also ,tempo, You can use librosa for Python , link : https://librosa.org/doc/latest/index.html for extraction from a .wav file , which implements Fast Fourier Transfrom and framing. And then you can apply some statistics such mean,standard deviation to the vector of the above characteristics across the whole audio file.

Providing an additional avenue for exploration: you have some tools to explore this qualitatively (as opposed to quantitatively using metrics derived from the audio signal as suggested in the great answers above)
As you mention the objective is to generate unique abstract images from sound - I would suggest an interesting angle may be to apply some Machine Learning techniques and derive some mood classification predictions from the source audio.
For instance you could use the Tensorflow models in essentia to predict the mood of the track and associate images you select with the mood scores generated. I would suggest going well beyond this and using the tkinter image creation tools to create your mappings to mood. Use pen and paper to develop your mapping strategy - are certain moods more angular or circular? What colour mappings will you select, and why? You have a great deal of freedom to create these mappings - so start simple as complexity builds naturally.
Using some some simple mood predictions may be more useful for you as someone who has more experience with the qualitative experience with sound rather than the quantitative experience as an audio engineer. I think this may be worth making central to the report you write and documenting your mapping decisions and design process for the report if this is a requirement of the task.

6 millions of markers in folium/leaflet map

With the MarkerCluster algorithm, it's possible to cluster the nearby markers together, so the map is visually very acceptable.
however, I found that the performance and the response of leaflet map decrease with the number of markers inside it.
I still don't understand it but I found people talking about Server-side clustering solution instead of client-side clustering.
This durable module project is a solution for big numbers of markers that uses this concept (Server-side clustering) in leaflet map.
My questions are:
how it is done in leaflet map?
how to make this solution in python at folium maps?

Server-side clustering can be accomplished with XHR requests.
The simplest approach would be to divide your map into squares, and have it switch between single-feature layers and substitute geoJSON/JSON layers using a MAP.on('zoomend', function(e){}); event.
In an example, if jQuery is available, you can do $.getJSON(SERVER_SIDE_URL, {VARIABLE: 'VALUE'}, function(data){}); on zoomend. Here the anonymous function will carry response data. You can use this data to either create a substitute LayerGroup, or a single Layer, while keeping track of and destroying its predicate.
The server's side will need to have access to the full dataset, and be able to either provide JSON for a single feature abstracting those nearby, or a set of features within the radius/square radius of a placeholder.
That's the abstract of one option. Alternatively, there may be market-ready solutions. But writing your own should produce a more efficient solution for such a simple task.

I find an Opensource solution for leaflet and Mapbox.
it is the SuperCluster project created by the owner of leaflet.
it is Server-Side Clustring solution with node.js and Client-Side Clustring with MapBox.
the concept of these algorithme is explained here

how to make semantic label image?

Now I want to train my own image data in caffe using SegNet.
But at the first step we need label our own image like these:
I have tried to search github but cannot find anything. So my question is anyone know which tool can make semantic label image?

Check out a tool called sloth: https://github.com/cvhciKIT/sloth, which is an open-source tool written in Python with PyQt for creating ground truth computer vision datasets for a wide array of applications, such as semantically creating data like you have above.
If you don't like sloth, you can use any image editing software, like GIMP where you would make one layer per label and use polygons and flood fill of different hues to create your data. You would then merge all of the layers together to make a final image that you would use for your purposes.
However, as user Miki mentioned (see discussion thread below), creating new datasets from the beginning will take a considerable amount of effort. It is highly advisable that you don't create this on your own as you need a lot of data to ensure your algorithms are performing correctly. You'll need the help of other (hopefully willing) PhD students, preferably those you know personally or work with you in your lab or workplace to help manually curate this data for you.
If this isn't an option, you can use crowd sourced funded places like Amazon Mechanical Turk where you can outsource the work to willing individuals where you inform them of the task at hand and you pay a small amount per image. This would be something to consider if you can't find many people to help you.
All in all, this will take a considerable amount of effort, not only in terms of time but in terms of people if you want to create a large data set within a short span of time. I would recommend you simply use established datasets, such as what you have referenced from Cambridge, or Miki suggested LabelMe by Antonio Torralba which not only is a toolbox for annotating images from his LabelMe dataset but it also allows you to do the same for your own images.
Good luck!

As answer by #rayryeng a tool called sloth is great to finish these task in simple way. However, if I have more than 20 object waiting for me to classify, sloth is not a ideal tools. Thus I develop a simple tool which call IsLabel to finish these problem with few algorithms.
And the result look like these while using IsLabel just took me 40s:
INPUT:
OUTPUT:
I know its not perfect but it work fine for me.

I would recommend using https://www.labelbox.io/. They open sourced a lot of their code and have a hosting platform to manage the whole labeling process end to end.
Here is an example of segmentation
And you can export labels with a mask.

Python: Create Nomograms from Data (using PyNomo)

I am working on Python 2.7. I want to create nomograms based on the data of various variables in order to predict one variable. I am looking into and have installed PyNomo package.
However, the from the documentation here and here and the examples, it seems that nomograms can only be made when you have equation(s) relating these variables, and not from the data. For example, examples here show how to use equations to create nomograms. What I want, is to create a nomogram from the data and use that to predict things. How do I do that? In other words, how do I make the nomograph take data as input and not the function as input? Is it even possible?
Any input would be helpful. If PyNomo cannot do it, please suggest some other package (in any language). For example, I am trying function nomogram from package rms in R, but not having luck with figuring out how to properly use it. I have asked a separate question for that here.

The term "nomogram" has become somewhat confused of late as it now refers to two entirely different things.
A classic nomogram performs a full calculation - you mark two scales, draw a straight line across the marks and read your answer from a third scale. This is the type of nomogram that pynomo produces, and as you correctly say, you need a formula. As mentioned above, producing nomograms like this is definitely a two-step process.
The other use of the term (very popular, recently) is to refer to regression nomograms. These are graphical depictions of regression models (usually logistic regression models). For these, a group of parallel predictor variables are depicted with a common scale on the bottom; for each predictor you read the 'score' from the scale and add these up. These types of nomograms have become very popular in the last few years, and thats what the RMS package will draft. I haven't used this but my understanding is that it works directly from the data.
Hope this is of some use! :-)

Python libraries for on-line machine learning MDP

I am trying to devise an iterative markov decision process (MDP) agent in Python with the following characteristics:
observable state
I handle potential 'unknown' state by reserving some state space
for answering query-type moves made by the DP (the state at t+1 will
identify the previous query [or zero if previous move was not a query]
as well as the embedded result vector) this space is padded with 0s to
a fixed length to keep the state frame aligned regardless of query
answered (whose data lengths may vary)
actions that may not always be available at all states
reward function may change over time
policy convergence should incremental and only computed per move
So the basic idea is the MDP should make its best guess optimized move at T using its current probability model (and since its probabilistic the move it makes is expectedly stochastic implying possible randomness), couple the new input state at T+1 with the reward from previous move at T and reevaluate the model. The convergence must not be permanent since the reward may modulate or the available actions could change.
What I'd like to know is if there are any current python libraries (preferably cross-platform as I necessarily change environments between Windoze and Linux) that can do this sort of thing already (or may support it with suitable customization eg: derived class support that allows redefining say reward method with one's own).
I'm finding information about on-line per-move MDP learning is rather scarce. Most use of MDP that I can find seems to focus on solving the entire policy as a preprocessing step.

Here is a python toolbox for MDPs.
Caveat: It's for vanilla textbook MDPs and not for partially observable MDPs (POMDPs), or any kind of non-stationarity in rewards.
Second Caveat: I found the documentation to be really lacking. You have to look in the python code if you want to know what it implements or you can quickly look at their documentation for a similar toolbox they have for MATLAB.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.