Subtract/compare differences between two audio files using Python

My goal is to take two identical-length .wav files, one original with noise + speech and one processed with improved speech, and compare the two. Subtracting one from the other should leave me with the noise that was removed during processing.
I would like to do this to practice my Python coding skills and to test the effectiveness of speech processing programs. So far I have found programs that can do this, but I would really like to build my own simple version in Python. Some libraries that I have considered are audiodiff and librosa, but neither seems to include a subtract function.
I have a few programs that can accomplish this, but I would like to create a simple version and be able to customize it over time as my needs expand.

The task of estimating the quality of a speech clip is known as Speech Quality Estimation. A slightly simpler task formulation is Speech Intelligibility, which only measures how easy it is to understand the speech, not the subjective "quality" of the speech signal.
These are established tasks within speech signal processing, with industry standards as well as bleeding-edge research. Such methods (often referred to as objective metrics) take into account the human auditory system and perception of speech in order to give a good estimate of how a group of humans would evaluate the result in a listening test. Because of the complexity of human perception, good methods are a bit more involved than just a difference of the audio.
If you have access to the original speech (without noise), then you can use a full-reference method. An early ITU standard is PESQ, and the most recent as of 2021 is POLQA. A good metric with an open-source implementation is VISQOL.
If you do not have access to the original speech without noise, you must use a no-reference / single-ended / non-intrusive method. An ITU standard is P.563. There is currently a lot of research using deep neural networks to better solve this task; examples are NISQA and Gamper2019.
Some more details can be found on https://github.com/jonnor/machinehearing/tree/master/audio-quality
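That said, nothing stops you from computing the naive difference as a baseline. A minimal sketch, assuming both files are mono, share a sample rate, and are sample-aligned (file names are placeholders):

    import numpy as np
    from scipy.io import wavfile

    rate_a, noisy = wavfile.read("noisy.wav")        # original: speech + noise
    rate_b, enhanced = wavfile.read("enhanced.wav")  # processed: improved speech
    assert rate_a == rate_b, "sample rates must match"

    # Subtract in float to avoid integer overflow, then trim to a common length.
    noisy = noisy.astype(np.float32)
    enhanced = enhanced.astype(np.float32)
    n = min(len(noisy), len(enhanced))
    residual = noisy[:n] - enhanced[:n]              # what the processing removed

    wavfile.write("residual.wav", rate_a,
                  np.clip(residual, -32768, 32767).astype(np.int16))

Keep in mind that any gain change or sample delay introduced by the processing will leak speech into this residual, which is exactly why the perceptual metrics above exist.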

Python Audio Analysis: What properties/values can I extract from it?

I'm currently working on a tkinter Python school project whose sole purpose is to generate images from audio files. I'm going to pick audio properties and use them as values to generate unique abstract images; however, I don't know which properties I can analyze to extract values from. So I'm looking for guidance on which properties (audio frequency, amplitude, etc.) I can extract values from to generate the images with Python.
The question is very broad in its current form.
(Bear in mind audio is not my area of expertise, so do keep an eye out for the opinion of people working in audio/audiovisual/generative fields.)
You can go about it either way: figure out what kind of image(s) you'd like to create from audio and from there figure out which audio features to use. The other way around is also valid: pick an audio feature you'd like to explore, then think of how you'd best or most interestingly represent that visually.
There's a distinction between a single image and many images.
For a single image, the simplest thing I can think of is drawing a grid of squares where a visual property of each square (e.g. size, fill-colour intensity, etc.) is mapped to the amplitude at that time. The single image would visualise a whole track's amplitude pattern. Even with such a simple example there are many choices you can make: how often you sample, how you lay out the grid (cartesian, polar), and how each amplitude sample is visualised (different shapes, sizes, colours, etc.).
(A similar concept to Cinema Redux, but simpler and for audio only.)
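As a hedged sketch of that grid idea (grid dimensions, colour mapping, and the file name are arbitrary illustrative choices):

    import numpy as np
    from scipy.io import wavfile
    from PIL import Image, ImageDraw

    rate, samples = wavfile.read("track.wav")
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)
    samples = samples.astype(np.float32)
    samples /= np.abs(samples).max()

    cols, rows, cell = 16, 16, 32              # 16x16 grid of 32px cells
    blocks = np.array_split(samples, cols * rows)
    rms = np.array([np.sqrt(np.mean(b ** 2)) for b in blocks])
    rms /= rms.max()                           # normalise loudness to [0, 1]

    img = Image.new("RGB", (cols * cell, rows * cell), "black")
    draw = ImageDraw.Draw(img)
    for i, level in enumerate(rms):
        x, y = (i % cols) * cell, (i // cols) * cell
        size = int(level * cell)               # square size encodes loudness
        off = (cell - size) // 2
        draw.rectangle([x + off, y + off, x + off + size, y + off + size],
                       fill=(int(255 * level), 64, 128))
    img.save("amplitude_grid.png")

Swapping what level drives (size, hue, rotation) is exactly the kind of mapping decision mentioned above.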
You can look into the field of data visualisation for inspiration.
Information is Beautiful is a great place to start.
If you want to generate moving images, that goes into audiovisual territory (e.g. abstract animation, audio-reactive motion graphics, etc.).
Your question originally had the Processing tag, which I removed; however, you could be using Processing's Python Mode.
In terms of audio visualisation, one good example I can think of is Robert Hodgin's work; see Magnetosphere and the Audio-generated landscape prototype. He uses frequency analysis (FFT) with a bit of smoothing/data massaging to amplify the elements useful for visualisation and dampen some of the noise.
(There are a few handy audio libraries such as Minim and Beads; however, I assume you're interested in using raw Python, not Jython, which is what the official Processing Python Mode uses.) Here is an answer on FFT analysis for visualisation (even though it's in Processing Java, the principles can be applied in Python).
Personally I've only used pyaudio so far, for basic audio tasks. I would assume you could use it for amplitude analysis, but for other, more complex tasks you might need something extra.
Doing a quick search, librosa pops up.
If what you want to achieve isn't clear, try prototyping first and start with the simplest audio analysis and visual elements you can think of (e.g. amplitude mapped to boxes over time). Constraints can be great for creativity, and the minimal approach could translate into cleaner, more minimal visuals.
You can then look into FFT, MFCCs, onset/beat detection, etc.
Another tool that could be useful for prototyping is Sonic Visualiser.
You can open a track and use some of the built-in feature extractors.
(You can even get away with exporting XML or CSV data from Sonic Visualiser, which you can load/parse in Python and use to render image(s).)
It uses a plugin system (similar to VST plugins in DAWs like Ableton Live, Apple Logic, etc.) called Vamp plugins. You can then use the VamPy Python wrapper if you need the data at runtime.
(You might also want to draw inspiration from other environments used for audiovisual artworks, like Pure Data + GEM, Max/MSP + Jitter, VVVV, etc.)
Time domain: zero-crossing rate, root-mean-square energy, etc. Frequency domain: spectral bandwidth, flux, rolloff, flatness, MFCCs, etc. Also tempo. You can use librosa for Python (https://librosa.org/doc/latest/index.html) for extraction from a .wav file; it implements the Fast Fourier Transform and framing. You can then apply statistics such as the mean and standard deviation to the vectors of these characteristics across the whole audio file.
Providing an additional avenue for exploration: you have some tools to explore this qualitatively (as opposed to quantitatively, using metrics derived from the audio signal, as suggested in the great answers above).
As you mention that the objective is to generate unique abstract images from sound, I would suggest an interesting angle may be to apply some machine learning techniques and derive mood classification predictions from the source audio.
For instance, you could use the TensorFlow models in Essentia to predict the mood of the track and associate images you select with the mood scores generated. I would suggest going well beyond this and using the tkinter image creation tools to create your own mappings to mood. Use pen and paper to develop your mapping strategy: are certain moods more angular or circular? Which colour mappings will you select, and why? You have a great deal of freedom to create these mappings, so start simple, as complexity builds naturally.
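A rough sketch of the Essentia route; the algorithm names come from essentia.standard, but the exact model filename and its output layout are assumptions to verify against the models published at https://essentia.upf.edu/models/:

    from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

    # Model graph downloaded from Essentia's models page; the filename and
    # its output layout ([happy, non_happy] per patch) are assumptions.
    audio = MonoLoader(filename="track.wav", sampleRate=16000)()
    model = TensorflowPredictMusiCNN(graphFilename="mood_happy-musicnn-msd-2.pb")
    activations = model(audio)            # one prediction per audio patch
    happy = activations.mean(axis=0)[0]   # average over time
    print("happiness score:", happy)

You would compute a handful of such mood scores and feed them into whatever visual mapping you designed on paper.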
Using some simple mood predictions may be more useful for you, as someone who has more qualitative experience with sound than the quantitative experience of an audio engineer. If a report is required for the task, I think this may be worth making central to it: document your mapping decisions and design process.

Is there a way to generate real time depthmap from single camera video in python/opencv?

I'm trying to convert a single image into its depth map, but I can't find any useful tutorial or documentation.
I'd like to use opencv, but if you know a way to get the depth map using for example tensorflow, I'd be glad to hear it.
There are numerous tutorials for stereo vision but I want to make it cheaper because it's for a project to help blind people.
I'm currently using esp32 cam to stream frame by frame and receiving the images on python using opencv.
Usually, we need a photometric measurement from a different position in the world to form a geometric understanding of the world (a.k.a. a depth map). From a single image it is not possible to measure the geometry directly, but it is possible to infer depth from prior understanding.
One way to make a single image work is to use a deep-learning-based method to directly infer depth. Deep-learning approaches are usually Python-based, so if you are only familiar with Python, this is the approach you should go for. If the image is small enough, I think realtime performance is possible. There is a lot of work of this kind using Caffe, TF, Torch, etc.; you can search GitHub for more options. The one I posted here is what I used recently.
Reference:
Godard, Clément, et al. "Digging Into Self-Supervised Monocular Depth Estimation." Proceedings of the IEEE International Conference on Computer Vision, 2019.
Source code: https://github.com/nianticlabs/monodepth2
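For reference, the monodepth2 README runs single-image prediction through the repository's test_simple.py script, along these lines (model name taken from the repo's released checkpoints):

    # from inside the cloned monodepth2 repository
    python test_simple.py --image_path assets/test_image.jpg --model_name mono+stereo_640x192

On first use it should fetch the pretrained weights, then write the predicted disparity map next to the input image.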
The other way is to use a large-FOV video for single-camera SLAM. This one has various constraints, such as needing good features, a large FOV, slow motion, etc. You can find many works of this kind, such as DTAM, LSD-SLAM, DSO, etc. There are also a couple of packages from HKUST or ETH that do the mapping given the position (e.g. if you have GPS/compass); some of the famous names are REMODE+SVO, open_quadtree_mapping, etc.
One typical example of single-camera SLAM would be LSD-SLAM. It is a realtime SLAM.
It is implemented in C++ on ROS, and I remember it does publish the depth image. You can write a Python node to subscribe to the depth directly, or to the globally optimized point cloud, and project that into a depth map from any view angle.
Reference: Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-Scale Direct Monocular SLAM." European Conference on Computer Vision. Springer, Cham, 2014.
Source code: https://github.com/tum-vision/lsd_slam
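To consume that depth from Python, a hedged rospy sketch (the topic name is an assumption; check what your LSD-SLAM launch file actually publishes):

    import rospy
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def on_depth(msg):
        depth = bridge.imgmsg_to_cv2(msg)   # numpy array of per-pixel depth
        rospy.loginfo("depth frame %dx%d", msg.width, msg.height)

    rospy.init_node("depth_listener")
    rospy.Subscriber("/lsd_slam/depth", Image, on_depth)  # topic name assumed
    rospy.spin()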

coding a deconvolution using python

Before I begin I have to tell you that I have zero knowledge about DSP in python.
I want to deconvolve two sound signals using Python so that I can extract the room impulse response; the input signal is a sine sweep and the output is a recording of it.
I wrote a piece of code but it didn't work; I've been trying for too long, really without results.
Can someone please help me with code that calculates the FFT of the input and output, then calculates h, the iFFT of their quotient, and plots it?
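For reference, a minimal sketch of exactly that frequency-domain division, with a small regularisation term because raw division blows up wherever the sweep's spectrum is near zero (file names are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    rate, x = wavfile.read("sweep.wav")        # input: sine sweep
    _, y = wavfile.read("recording.wav")       # output: room recording

    n = len(x) + len(y) - 1                    # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    eps = 1e-8 * np.max(np.abs(X)) ** 2        # regularise near-zero bins
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    h = np.fft.irfft(H, n)                     # estimated room impulse response

    plt.plot(np.arange(n) / rate, h)
    plt.xlabel("time [s]")
    plt.ylabel("amplitude")
    plt.show()

As the answer below notes, this naive division is fragile in the presence of noise; regularised or Wiener-style variants are more robust.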
Deconvolution is an ill-posed, tough problem in the presence of noise and spatially-variant blurring. I assume you have a non-spatially-variant problem, since you are using FFTs, so you can use the restoration module from the skimage Python package (instead of programming the algorithm at a low level with FFTs).
Here you can study a code example with one of the methods implemented in the restoration module.
I recommend you read the O'Leary et al. book if you want to learn more. All the authors of this book have more advanced books about this great topic.

Neural Network based ranking of documents

I'm planning to implement a document ranker which uses neural networks. How can one rate a document by taking into consideration the ratings of similar articles? Any good Python libraries for doing this? Can anyone recommend a good book on AI, with Python code?
EDIT
I'm planning to make a recommendation engine which would make recommendations from similar users as well as from data clustered using tags. Users would be given the chance to vote for articles. There will be about a hundred thousand articles. Documents would be clustered based on their tags. Given a keyword, articles would be fetched based on their tags and passed through a neural network for ranking.
The problem you are trying to solve is called "collaborative filtering".
Neural Networks
One state-of-the-art neural network method is Deep Belief Networks with Restricted Boltzmann Machines. For a fast Python implementation for a GPU (CUDA) see here. Another option is PyBrain.
Academic papers on your specific problem:
This is probably the state-of-the-art of neural networks and collaborative filtering (of movies):
Salakhutdinov, R., Mnih, A., and Hinton, G. Restricted Boltzmann Machines for Collaborative Filtering. In Proceedings of the 24th International Conference on Machine Learning, 2007. PDF
A Hopfield network implemented in Python:
Huang, Z., Chen, H., and Zeng, D. Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 22(1), 116-142, 2004, ACM. PDF
A thesis on collaborative filtering with Restricted Boltzmann Machines (they say Python is not practical for the job):
G. Louppe. Collaborative filtering: Scalable approaches using restricted Boltzmann machines. Master's thesis, Université de Liège, 2010. PDF
Neural networks are not currently the state of the art in collaborative filtering, and they are not the simplest, most widespread solution. Regarding your comment that the reason for using NNs is having too little data: neural networks have no inherent advantage or disadvantage in that case. Therefore, you might want to consider simpler machine-learning approaches.
Other Machine Learning Techniques
The best methods today mix k-Nearest Neighbors and Matrix Factorization.
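As a toy illustration of the matrix-factorisation half (the ratings matrix and the rank are made up):

    import numpy as np

    # rows = users, cols = articles; 0 marks "not voted" (illustrative data)
    R = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [0, 0, 5, 4]], dtype=float)

    mask = R > 0
    filled = np.where(mask, R, R[mask].mean())  # naive imputation for the demo

    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    k = 2                                       # keep the top-k latent factors
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print(np.round(approx, 2))   # predicted scores, including unvoted cells

Real systems learn the factors from observed entries only (e.g. by SGD) instead of imputing, but the low-rank idea is the same.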
If you are locked on Python, take a look at pysuggest (a Python wrapper for the SUGGEST recommendation engine) and PyRSVD (primarily aimed at applications in collaborative filtering, in particular the Netflix competition).
If you are open to try other open source technologies look at: Open Source collaborative filtering frameworks and http://www.infoanarchy.org/en/Collaborative_Filtering.
Packages
If you're not committed to neural networks, I've had good luck with SVM, and k-means clustering might also be helpful. Both of these are provided by Milk. It also does Stepwise Discriminant Analysis for feature selection, which will definitely be useful to you if you're trying to find similar documents by topic.
God help you if you choose this route, but the ROOT framework has a powerful machine learning package called TMVA that provides a large number of classification methods, including SVM, NN, and Boosted Decision Trees (also possibly a good option). I haven't used it, but pyROOT provides python bindings to ROOT functionality. To be fair, when I first used ROOT I had no C++ knowledge and was in over my head conceptually too, so this might actually be amazing for you. ROOT has a HUGE number of data processing tools.
(NB: I've also written a fairly accurate document language identifier using chi-squared feature selection and cosine matching. Obviously your problem is harder, but consider that you might not need very hefty tools for it.)
Storage vs Processing
You mention in your question that:
...articles would be fetched based on their tags and passed through a neural network for ranking.
Just as another NB, one thing you should know about machine learning is that processes like training and evaluating tend to take a while. You should probably consider ranking all documents for each tag only once (assuming you know all the tags) and storing the results. For machine learning generally, it's much better to use more storage than more processing.
Now to your specific case. You don't say how many tags you have, so let's assume you have 1,000, for roundness. If you store the ranking of each of the 100,000 docs against each tag, that gives you 100 million floats to store (roughly 400 MB at 32 bits each). That's a lot of data, and calculating it all will take a while, but retrieving it is very fast. If instead you recalculate rankings on demand, you have to do one pass over the documents for each tag requested. Depending on the kind of operations you're doing and the size of your docs, that could take a few seconds to a few minutes. If the process is simple enough that you can wait for your code to do several of these evaluations on demand without getting bored, then go for it, but you should time this process before making any design decisions / writing code you won't want to use.
Good luck!
If I understand correctly, your task is something related to Collaborative filtering. There are many possible approaches to this problem; I suggest you follow the wikipedia page to have an overview of the main approaches you can choose.
For your project work I can suggest looking at Python based intro to Neural Networks with a simple BackProp NN implementation and a classification example. This is not "the" solution, but perhaps you can build your system out of that example without the need for a bigger framework.
You might want to check out PyBrain.
The FANN library also looks promising.
I am not really sure that neural networks are the best way to solve this. I think a Euclidean distance score or Pearson correlation score, combined with item-based or user-based filtering, would be a good start.
An excellent book on the topic is Programming Collective Intelligence by Toby Segaran.
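A toy sketch of the Pearson idea, in the spirit of that book (data is illustrative):

    import numpy as np

    def pearson(u, v):
        mask = (u > 0) & (v > 0)         # only items both users rated
        if mask.sum() < 2:
            return 0.0
        a, b = u[mask], v[mask]
        denom = a.std() * b.std()
        if denom == 0:
            return 0.0
        return float(((a - a.mean()) * (b - b.mean())).mean() / denom)

    alice = np.array([5, 3, 0, 4, 4])    # 0 = unrated
    bob = np.array([4, 0, 4, 5, 3])
    print(pearson(alice, bob))           # similarity in [-1, 1]

Users with high similarity then contribute their votes, weighted by that similarity, to the ranking shown to the current user.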

Simulation of molecular dynamics in Python

I am searching for a python package that I can use to simulate molecular dynamics in non-equilibrium situations. I need a setup that can handle a fairly large number of molecules in a primarily kinetic theory manner, and that can handle having solid surfaces present. With regards to the surfaces, I would need to be able to create arbitrary shapes and monitor pressure and other variables resulting from the molecular action. Alternatively, I could add the surface parts myself if I had molecules that could handle it.
Does anyone know of any packages that might be suitable?
Have you considered SimPy? SimPy is a rather generic Discrete Event Simulation package, but could feasibly meet your needs.
Better yet the Molecular Modelling ToolKit (MMTK) seems more specialized...
I have used neither, but this sounds like fun. Python, as a language, seems to be in a privileged position for use in simulation software, whereby people can script the specific details of their model while relying on the framework for all the common logic, such as scheduling, visualization, monitoring, etc. The unknown is how well such toolkits scale when fed agent counts commensurate with biology models (BTW, how "big" is that?).
LAMMPS and Gromacs are two well-known molecular dynamics codes. Both have some Python-based wrapper stuff, but I am not sure how much functionality the wrappers expose; they may not give you enough control over the simulation.
Google for "GromacsWrapper" or google for "lammps" and "pizza.py"
Digital material and ASE are two molecular dynamics codes that expose a lot of functionality, but last time I looked they were both fairly specialized. They may not allow you to use the force potentials that you want.
Google for "digital material" and "cornell" or google for "ase" and dtu
Note to MJV: Normal MD codes take one time step at a time, and they move all particles in each time step. Most of the time is spent calculating the total force on each atom, which involves iterating over a list of pairs of neighboring atoms. I think the best idea is to do the force calculation and a few more basics in C++ or Fortran and then wrap that functionality in Python. (But it could be fun to see how far one can get by using numpy matrices.)
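In that numpy spirit, a toy Lennard-Jones force kernel (all-pairs, no cutoff or neighbour lists, so O(N^2); fine for a demo, not for production MD):

    import numpy as np

    def lj_forces(pos, epsilon=1.0, sigma=1.0):
        d = pos[:, None, :] - pos[None, :, :]   # pairwise displacement vectors
        r2 = (d ** 2).sum(axis=-1)
        np.fill_diagonal(r2, np.inf)            # ignore self-interaction
        inv6 = (sigma ** 2 / r2) ** 3
        # |F| / r = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2
        fmag = 24.0 * epsilon * (2.0 * inv6 ** 2 - inv6) / r2
        return (fmag[:, :, None] * d).sum(axis=1)

    pos = np.random.rand(100, 3) * 10.0         # 100 particles in a 10^3 box
    print(lj_forces(pos).shape)                 # (100, 3)

A velocity-Verlet loop around this gives a working, if slow, toy MD; the established codes below exist because doing this fast is the hard part.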
The following programs can be used to run MD simulations:
Gromacs
AMBER
CHARMM
OpenMM
many others...
The following Python packages are useful for preparing and analysing MD trajectories:
MDtraj and the OMNIA ecosystem
MDAnalysis
ProDy
MMTK
Another generic simulations framework is my own GarlicSim. You can try that. I could help you get a simpack up if you're serious about it.
I don't know if this program has all the features you need, but there is Avogadro among the KDE programs. I think it is extendable, and since it is open source you could do anything with it: http://www.kde-apps.org/content/show.php/Avogadro?content=59521
It is really advanced, and programmed by a friend of mine.
I second MMTK, but take a look at VMD, which is the best MD software I'm aware of, and is Python-scriptable (in addition to Tcl). See this for examples and tutorials.
I recommend using dedicated molecular dynamics software, such as Gromacs, to run MD simulations. Such software is highly optimized for that particular purpose; you can also run on GPUs, and you will be able to run larger systems in less time.
Afterwards, you run only the analysis in Python, using the generated trajectories, with packages such as:
mdtraj
pmx
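A hedged mdtraj sketch for that analysis step (file names are placeholders for your own Gromacs output):

    import mdtraj as md

    traj = md.load("traj.xtc", top="topology.pdb")  # trajectory + topology
    print(traj)                                     # frames, atoms, timestep
    rg = md.compute_rg(traj)                        # radius of gyration per frame
    print(rg.mean())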
