I'm currently working on research for a faceted search backend implementation and would like your advice on picking the right technology and approach before I run POCs.
The input
n*m matrix, where n is ~50m records, and m is ~5k columns. About half of the columns are boolean.
Size: ~10 GB
Requirements
Each of the ~5k columns should be considered an optional attribute for the faceted search. Only the relevant attributes should appear.
The types vary (Boolean, String, etc.).
Sub-second response for each search (multiple filters applied).
Each filter should provide the value counts (e.g. 100 valid options, based on the current filters).
The solution should serve ~500 concurrent users.
Solution alternatives (Backend solution to serve the UI)
Implement an in-memory structure. This might require custom optimizations and indices to be implemented in order to achieve sub-second response.
Work with a db/search engine which might provide the required latency. Among the solutions I've considered are ClickHouse and Elasticsearch/OpenSearch.
I would love to hear your thoughts.
The desired solution should be as simple as possible (e.g. using an out-of-the-box solution rather than implementing a complex custom structure) and cost-effective.
*The mentioned matrix is the input - each solution will probably require indexing / reconstructing it into the right data structure.
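Option 1 (the in-memory structure) can be sketched as a toy inverted index. The attribute names and rows below are hypothetical, and a production build over ~50M rows would use compressed bitmaps (e.g. Roaring bitmaps) rather than plain Python sets, but the facet-counting shape is the same:

```python
# Toy sketch of an in-memory faceted search index: one posting list (set of
# row ids) per attribute value; filters intersect posting lists, and facet
# counts intersect each remaining value's posting list with the result.
from collections import defaultdict

rows = [                                   # hypothetical input rows
    {"color": "red", "in_stock": True},
    {"color": "red", "in_stock": False},
    {"color": "blue", "in_stock": True},
]

# attribute -> value -> set of row ids
index = defaultdict(lambda: defaultdict(set))
for rid, row in enumerate(rows):
    for attr, val in row.items():
        index[attr][val].add(rid)

def facet_counts(filters):
    """Apply filters (attr -> value); return matching ids and per-facet counts."""
    matching = set(range(len(rows)))
    for attr, val in filters.items():
        matching &= index[attr][val]
    counts = {
        attr: {val: len(ids & matching) for val, ids in values.items()}
        for attr, values in index.items()
    }
    return matching, counts

matching, counts = facet_counts({"in_stock": True})
# rows 0 and 2 match; among them, "color" splits into red=1, blue=1
```

With ~5k sparse attributes, only attributes present in the matching set would be returned, which covers the "only relevant attributes should appear" requirement.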
I'm currently working on a tkinter Python school project whose sole purpose is to generate images from audio files. I'm going to pick audio properties and use them as values to generate unique abstract images. However, I don't know which properties I can analyze to extract the values from, so I'm looking for guidance on which properties (audio frequency, amplitude, etc.) I can extract values from to generate the images with Python.
The question is very broad in its current form.
(Bear in mind audio is not my area of expertise, so do keep an eye out for the opinions of people working in audio/audiovisual/generative fields.)
You can go about it either way: figure out what kind of image(s) you'd like to create from audio, then figure out which audio features to use. The other way around is also valid: pick an audio feature you'd like to explore, then think of how you'd best or most interestingly represent it visually.
There's a distinction between a single image and multiple images.
For a single image, the simplest thing I can think of is drawing a grid of squares where a visual property of each square (e.g. square size, fill colour intensity, etc.) is mapped to the amplitude at that time. The single image would visualise a whole track's amplitude pattern. Even with such a simple example there are many choices you can make: how often you sample, how you lay out the grid (cartesian, polar), and how each amplitude sample is visualised (different shapes, sizes, colours, etc.).
(Similar concept to Cinema Redux, just simpler and for audio only.)
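The grid-of-squares idea above can be sketched without any audio libraries. The signal here is a synthetic sine wave with a rising envelope standing in for a real audio file, and the rendering step (tkinter/PIL) is left out; each grid cell is just a 0..1 value you could map to square size or fill intensity:

```python
# Sketch: one amplitude sample per grid cell, laid out row-major, so the
# finished grid visualises the whole track's amplitude pattern in one image.
import math

SAMPLE_RATE = 8000
DURATION = 2.0          # seconds of (synthetic) audio
GRID = 8                # 8x8 grid -> 64 cells

n = int(SAMPLE_RATE * DURATION)
# stand-in for samples loaded from a file: a 440 Hz tone that fades in
signal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) * (t / n)
          for t in range(n)]

cells = GRID * GRID
window = n // cells
# peak amplitude per window, normalised to 0..1
grid = [max(abs(s) for s in signal[i * window:(i + 1) * window])
        for i in range(cells)]
peak = max(grid)
grid = [g / peak for g in grid]
# grid[k] now drives cell k's square size / colour intensity
```

Changing `window` (how often you sample) and the layout function are exactly the choices mentioned above.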
You can look into the field of data visualisation for inspiration.
Information is Beautiful is a great place to start.
If you want to generate multiple images, that seems to go into audiovisual territory (e.g. abstract animation, audio-reactive motion graphics, etc.).
Your question originally had the Processing tag, which I removed; however, you could be using Processing's Python Mode.
In terms of audio visualisation, one good example I can think of is Robert Hodgin's work; see Magnetosphere and the Audio-generated landscape prototype. He uses frequency analysis (FFT) with a bit of smoothing/data massaging to amplify the elements useful for visualisation and dampen some of the noise.
(There are a few handy audio libraries such as Minim and Beads; however, I assume you're interested in using raw Python, not Jython, which is what the official Processing Python Mode uses.) Here is an answer on FFT analysis for visualisation; even though it's in Processing Java, the principles can be applied in Python.
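To make the FFT idea concrete, here is a bare-bones naive DFT on a synthetic 440 Hz tone, with a crude moving-average smoothing pass of the kind mentioned above. In practice you would use `numpy.fft` or librosa instead of this O(n²) loop; everything here is illustrative:

```python
# Magnitude spectrum of a short synthetic tone via a naive DFT, then a
# 3-point moving average ("data massaging") before mapping bins to visuals.
import cmath
import math

N = 256
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 440 * k / SAMPLE_RATE) for k in range(N)]

# one magnitude per frequency bin up to Nyquist (bin width = 8000/256 Hz)
spectrum = [abs(sum(signal[k] * cmath.exp(-2j * math.pi * b * k / N)
                    for k in range(N)))
            for b in range(N // 2)]

# simple smoothing to dampen noise before driving visuals
smoothed = [sum(spectrum[max(0, i - 1):i + 2]) / 3
            for i in range(len(spectrum))]

peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N   # lands near 440 Hz for this tone
```

Each `smoothed` value could then drive a bar height, particle count, terrain elevation, etc.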
Personally, I've only used pyaudio so far for basic audio tasks. I would assume you could use it for amplitude analysis, but for other more complex tasks you might need something extra.
Doing a quick search, librosa pops up.
If what you want to achieve isn't clear, try prototyping first and start with the simplest audio analysis and visual elements you can think of (e.g. amplitude mapped to boxes over time). Constraints can be great for creativity, and the minimal approach could translate into cleaner, more minimal visuals.
You can then look into FFT, MFCC, onset/beat detection, etc.
Another tool that could be useful for prototyping is Sonic Visualiser.
You can open a track and use some of the built-in feature extractors.
(You can even get away with exporting XML or CSV data from Sonic Visualiser, which you can load/parse in Python and use to render image(s).)
It uses a plugin system (similar to VST plugins in DAWs like Ableton Live, Apple Logic, etc.) called Vamp plugins. You can then use the VampPy Python wrapper if you need the data at runtime.
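The CSV-export route mentioned above is trivial to consume from Python with the stdlib. The exact columns depend on which feature extractor you ran; a time/value pair per row (assumed here) is a common shape:

```python
# Parse an exported feature layer (time,value rows) into a Python list.
# The string below stands in for the exported file; in practice you would
# use open("features.csv") instead of io.StringIO.
import csv
import io

exported = "0.0,0.12\n0.5,0.48\n1.0,0.91\n"
with io.StringIO(exported) as f:
    points = [(float(t), float(v)) for t, v in csv.reader(f)]
# points is now a (time, value) series you can map to shapes/colours
```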
(You might also want to draw inspiration from other languages used for audiovisual artworks, like Pure Data + GEM, Max/MSP + Jitter, VVVV, etc.)
Time domain: zero-crossing rate, root mean square energy, etc. Frequency domain: spectral bandwidth, flux, rolloff, flatness, MFCCs, etc. Also tempo. You can use librosa for Python (https://librosa.org/doc/latest/index.html) for extraction from a .wav file; it implements the Fast Fourier Transform and framing. You can then apply statistics such as mean and standard deviation to the vector of these characteristics across the whole audio file.
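The two time-domain features named above are simple enough to compute by hand, which makes them good first targets. Below is a stdlib-only sketch on a synthetic sine wave (librosa's `zero_crossing_rate` and `rms` do the same per-frame computation on real audio), followed by the mean/std summary step:

```python
# Per-frame zero-crossing rate and RMS energy, then whole-file statistics.
import math

# stand-in for a loaded .wav: a 5 Hz sine sampled at 1 kHz for 1 second
signal = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
FRAME = 250

def zcr(frame):
    """Fraction of consecutive sample pairs that change sign."""
    return sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:])) / len(frame)

def rms(frame):
    """Root mean square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

frames = [signal[i:i + FRAME] for i in range(0, len(signal), FRAME)]
zcrs = [zcr(f) for f in frames]
rmss = [rms(f) for f in frames]

# summary statistics across the whole file: one scalar pair per feature
mean_rms = sum(rmss) / len(rmss)
std_rms = math.sqrt(sum((x - mean_rms) ** 2 for x in rmss) / len(rmss))
```

Those per-frame vectors (or just their mean/std) are the values you would feed into your image generator.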
Providing an additional avenue for exploration: you have some tools to explore this qualitatively (as opposed to quantitatively, using metrics derived from the audio signal as suggested in the great answers above).
As you mention, the objective is to generate unique abstract images from sound - I would suggest an interesting angle may be to apply some machine learning techniques and derive mood classification predictions from the source audio.
For instance, you could use the TensorFlow models in essentia to predict the mood of the track and associate the images you generate with the mood scores. I would suggest going well beyond this and using the tkinter image creation tools to create your own mappings to mood. Use pen and paper to develop your mapping strategy: are certain moods more angular or circular? What colour mappings will you select, and why? You have a great deal of freedom to create these mappings, so start simple - complexity builds naturally.
Using some simple mood predictions may be more useful for you as someone with more experience of the qualitative side of sound than the quantitative side of an audio engineer. It may be worth making this central to your write-up, documenting your mapping decisions and design process, if a report is a requirement of the task.
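The pen-and-paper mapping step above is easy to encode once you have scores. The mood names and numbers below are made up (in practice they would come from essentia's pre-trained mood models); the point is that the mapping itself is just a small, documented function:

```python
# Hypothetical mood scores -> visual parameters. Every threshold and constant
# here is a design decision you would record in your report.
moods = {"happy": 0.8, "aggressive": 0.1, "relaxed": 0.4}

def visual_params(moods):
    return {
        # warm hue when "happy" dominates "relaxed", cool otherwise
        "hue": 40 if moods["happy"] >= moods["relaxed"] else 210,
        # more aggressive -> more angular (smaller corner radius)
        "corner_radius": round(20 * (1 - moods["aggressive"])),
        # happiness drives colour saturation directly (0..1)
        "saturation": moods["happy"],
    }

params = visual_params(moods)
```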
I've been looking for a good way in Python to draw an abstract syntax tree to PNG. A combination of networkx and matplotlib seems to be able to do the job well enough to get by.
But I just noticed that https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html does much better! This applies when using sklearn to generate a random forest; it is a function specific to the resulting decision trees.
Is there a way to supply an arbitrary tree to the above function, or to some version of the code behind it, to obtain the high-quality rendering?
You could use plain graphviz. There are examples of how to draw your own data structures.
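For an arbitrary tree, one low-dependency route is to emit Graphviz DOT text yourself and render it with `dot -Tpng tree.dot -o tree.png` (or the `graphviz` Python package). The nested-dict tree shape below is an assumption; adapt the walker to whatever node type your AST uses:

```python
# Emit DOT for an arbitrary tree represented as nested dicts
# {label: {child_label: {...}, ...}}.
tree = {"Module": {"FunctionDef": {"arguments": {}, "Return": {}}}}

def to_dot(tree):
    lines, counter = ["digraph ast {"], [0]

    def walk(label, children):
        nid = counter[0]
        counter[0] += 1
        lines.append(f'  n{nid} [label="{label}", shape=box];')
        for child_label, grandchildren in children.items():
            cid = counter[0]          # id the child is about to receive
            walk(child_label, grandchildren)
            lines.append(f"  n{nid} -> n{cid};")
        return nid

    for label, children in tree.items():
        walk(label, children)
    lines.append("}")
    return "\n".join(lines)

dot = to_dot(tree)
```

Styling the nodes (fill colours, fonts) the way `export_graphviz` does is just a matter of extra node attributes in the `label=` line.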
I would like to monitor our centralized git repositories and visualize them in Grafana. In the end, I want to create a chart with repository name on the X-axis and disk space on the Y-axis (i.e. a bar graph).
I am writing a prometheus exporter. I am unsure about the metric type of the custom exporter.
If I design an exporter that returns following:
disk_space(name=repo1, size=10240)
disk_space(name=repo2, size=20480)
then I would have to define and add lots of Gauges. Is this the right way to go? Is there a better solution? Also, I would like to see how each git repository's disk space changes over time.
Would it be better if I use Histogram metric type?
Should I define a single gauge and add labels per git-repository?
Metrics about git repositories can be tricky; see this article about git-sizer. You may even reuse part of the project; it is in Go.
Now, to answer your questions:
gauge is the right type because size can increase or decrease (depending on compression or garbage collection applied)
the natural way of identifying your metric is to use a meaningful name and use labels to distinguish between repos (the label values define the cardinality)
Histogram is better suited when you want to keep some information about what happens between two scrapes of a metric. In your case, this is not relevant because you only care about the evolution of the size and it is unlikely to spike wildly.
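Concretely, "one gauge, repo as a label" yields Prometheus exposition text like the following. With `prometheus_client` you would declare `Gauge('git_repo_disk_space_bytes', ..., ['repo'])` and call `.labels(repo=...).set(size)`; here the text format is built by hand with the stdlib just to show the shape (metric name and sizes are hypothetical):

```python
# What a single labelled gauge looks like on the /metrics endpoint.
sizes = {"repo1": 10240, "repo2": 20480}  # stand-in scrape results, bytes

lines = [
    "# HELP git_repo_disk_space_bytes Disk space used by a git repository.",
    "# TYPE git_repo_disk_space_bytes gauge",
]
for repo, size in sorted(sizes.items()):
    lines.append(f'git_repo_disk_space_bytes{{repo="{repo}"}} {size}')
exposition = "\n".join(lines)
```

Grafana can then group by the `repo` label for the bar chart, and the time series per label value gives you the size-over-time view for free.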
I have a graph database with a Gremlin query engine, and I don't want to change that API. The point of the library is to be able to study graphs that cannot fully stay in memory, and to maximize speed by not falling back to virtual memory.
The query engine is lazy: it will not fetch an edge or vertex until required or requested by the user. Otherwise it only uses indices to traverse the graph.
NetworkX has another API. What can I do to reuse NetworkX graph algorithm implementations with my graph?
You are talking about extending your Graph API.
Hopefully the code translates from one implementation to another, in which case copy-pasting from the algorithms section might work for you (check the licenses first).
If you want to use existing code going forward you could make a middle layer or adapter class to help out with this.
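One shape such an adapter could take: expose your lazy store behind the small read-only mapping interface (iterate nodes, look up neighbours) that many graph algorithms need, fetching on demand. `LazyStore` below is a hypothetical stand-in for your Gremlin-backed engine; from the adapter you can also materialise a plain dict-of-sets adjacency, which `networkx.Graph` accepts directly, if loading a subgraph into memory is acceptable:

```python
# Adapter sketch: lazy store -> adjacency-mapping view.
class LazyStore:
    """Stand-in for the index-backed, lazily-fetching graph engine."""
    def __init__(self, edges):
        self._edges = edges
    def neighbours(self, v):           # pretend this hits the on-disk index
        return [b for a, b in self._edges if a == v] + \
               [a for a, b in self._edges if b == v]
    def vertices(self):
        return {v for e in self._edges for v in e}

class GraphAdapter:
    """Read-only adjacency view; each lookup defers to the lazy store."""
    def __init__(self, store):
        self._store = store
    def __iter__(self):
        return iter(self._store.vertices())
    def __getitem__(self, v):
        return set(self._store.neighbours(v))

g = GraphAdapter(LazyStore([(1, 2), (2, 3)]))
# if materialising is acceptable: networkx.Graph({v: g[v] for v in g})
```

Algorithms that only walk neighbourhoods keep the laziness; algorithms that need the whole structure force the materialisation, which is exactly the trade-off to document.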
If the source code doesn't line up, then NetworkX has copious notes about the algorithms used and the underpinning mathematics at the bottom of the help pages and in the code itself.
For the future:
Maybe you could make it open source and get some traction with others who see the traversal engine as a good piece of engineering, in which case you would have help maintaining/extending your work. Good luck.