Working With Audio in PyProcessing

I am learning PyProcessing. It comes with the regular Processing platform, which was originally written in Java. Many of the example projects bundled with Processing have also been written in Python, but none of the audio libraries or examples have.
I tried searching Google but haven't found anything yet.
Does anyone know of a good resource where I can learn to do basic things with the audio library in PyProcessing, such as playing audio and filtering audio?

I've used pyaudio and SWMixer for basic audio needs on a project.
Other python-audio resources I found useful at the time:
Scott W Harden's blog post on FFT analysis in Python (lots of neat things there)
PyAudioMixer
python-sounddevice
I haven't used these extensively enough, though, to advise on which one is the most stable and easiest to use.
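For the filtering half of the question, here is a minimal standard-library sketch, not tied to PyProcessing or to any of the libraries above. It synthesises a tone, runs it through a simple one-pole low-pass filter, and writes a WAV file that any of the playback libraries listed could then play. The sample rate, function names and cutoff values are my own assumptions for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed value)


def sine_tone(freq_hz, seconds, amplitude=0.5):
    """Return a list of float samples in [-1, 1] for a sine tone."""
    count = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(count)]


def low_pass(samples, cutoff_hz):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / SAMPLE_RATE
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev += alpha * (x - prev)
        out.append(prev)
    return out


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def write_wav(path, samples):
    """Write float samples as 16-bit mono PCM, clipping to [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
```

Something like write_wav("filtered.wav", low_pass(sine_tone(8000, 1.0), 500)) should give a much quieter tone than the unfiltered one, since 8 kHz is well above the 500 Hz cutoff. Pure-Python loops like this are slow for long recordings; they are only meant to show the idea.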

Related

Speech to Text API offline(Preferred) or online

I'm making a Windows desktop application that needs to transcribe videos, and I'm looking for a good free API to help me achieve that.
I have looked a lot, but most of the APIs I've found have poor accuracy.

This doesn't work with .NET Core but if you're using the legacy .NET Framework (which is supported) you can use System.Speech to both recognize and synthesize speech offline.
https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition?view=netframework-4.8
https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition.speechrecognitionengine?view=netframework-4.8
Update 3/1/21: System.Speech has now been ported to .NET Core. The NuGet package is available at: https://www.nuget.org/packages/System.Speech

Google's Speech-to-Text API has state of the art accuracy, a simple interface, and client libraries in many languages. You get 60 minutes free per month.
Link: https://cloud.google.com/speech-to-text/
If you want an online API that is totally free, you most likely will not find one.
If you are willing to go offline, you will probably have to come up with a custom solution using the weights of some openly available deep learning model. Read some papers on state-of-the-art transcription models and see if any of the weights are available on GitHub. Keep in mind that performing such a task offline is very computationally expensive, and might require a GPU to give you results in a reasonable amount of time.

Audio Domain Specific Language vs Python

I want to write some code to do acoustic analysis and I'm trying to determine the proper tool(s) for the job. I would normally write something like this in Python using numpy and scipy and possibly Cython for the analysis part. I've discovered that the world of Python audio libraries is a bit chaotic, with scads of very limited packages in various states of development.
I've also come across a bunch of audio/acoustic specific languages like SuperCollider, Faust, etc. that seem to make the audio processing easy but may be limited in terms of IO and analysis capability.
I'm currently working on Linux with ALSA and PulseAudio installed by default. I would prefer not to involve any of the various and sundry other audio packages like JACK if possible, though that is not a hard requirement.
My primary interest in this question is to determine whether there is a domain specific language that will provide for quicker prototyping and testing or whether a general language like Python is more appropriate. Thanks.

I've got a lot of experience with SuperCollider and Python (with and without Numpy). I do a lot of audio analysis, and I'm afraid the answer depends on what you want to do.
If you want to create systems that will input OR output audio in real time, then Python is not a good choice. The audio I/O libraries (as you say) are a bit sketchy. There's also a fundamental issue that Python's garbage collector is not really designed for realtime stuff. You should use a system that is designed from the ground up for realtime. SuperCollider is nice for this, and as caseyanderson notes, some of the standard building-blocks for audio analysis are right there. There are other environments too.
If you want to do hardcore work such as applying various machine learning algorithms, not necessarily in real time (i.e. if you can get away with reading/writing WAV files rather than live audio), then you should use a general-purpose programming language with wide support, and an ecosystem of good libraries for the extra things you want. Using Python with libs such as numpy and scikit-learn works great for this. It's good for quick prototyping, but not only does it lack solid realtime audio, it also has far fewer of the standard audio building-blocks. Those are two important things which hold you back when prototyping audio pipelines.
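To give a feel for the offline, WAV-file route, here is a rough sketch using only the standard library. It reads a 16-bit mono WAV file and computes two basic per-block features, RMS level and zero-crossing rate. The function names and the block size are assumptions of mine; in practice you would probably vectorise this with numpy.

```python
import math
import struct
import wave


def read_mono_wav(path):
    """Read a 16-bit mono PCM WAV; return (sample_rate, floats in [-1, 1))."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [i / 32768.0 for i in ints]


def rms(block):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in block) / len(block))


def zero_crossing_rate(block):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))
    return crossings / (len(block) - 1)


def analyse(samples, block_size=1024):
    """Return (rms, zcr) for consecutive non-overlapping blocks."""
    features = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        features.append((rms(block), zero_crossing_rate(block)))
    return features
```

Typical use would be rate, samples = read_mono_wav("recording.wav") followed by analyse(samples). Trailing samples that do not fill a whole block are ignored.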
So, then, you're caught between these two options. Depending on your application you may be able to combine the two by manipulating the audio I/O in a realtime environment, and using OSC messaging or shell scripts to communicate with an external Python process. The limitation there is that you can't really throw masses of data around between the two (you can't sensibly pipe all your audio across to some other process, that'd be silly).
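As an illustration of the OSC side of that split, here is a minimal sketch of how a Python process could send a small control value (not audio) to the realtime environment over UDP. It hand-encodes an OSC 1.0 message with float arguments only. The address "/analysis/rms" is hypothetical, and port 57120 is what I'd expect sclang to listen on by default, so treat both as assumptions to check against your own setup.

```python
import socket
import struct


def osc_string(text):
    """Encode text as an OSC-string: ASCII, NUL-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *floats):
    """Build an OSC message whose arguments are all 32-bit big-endian floats."""
    type_tags = "," + "f" * len(floats)
    payload = b"".join(struct.pack(">f", value) for value in floats)
    return osc_string(address) + osc_string(type_tags) + payload


def send_feature(address, value, host="127.0.0.1", port=57120):
    """Send one float to an OSC listener over UDP (host and port are assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message(address, value), (host, port))
```

Calling send_feature("/analysis/rms", 0.25) sends a single 32-byte datagram, which is the scale of data this kind of link is suited to.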

SuperCollider has lots of support for things along these lines, both as externals/plugins or Quarks. That said, it depends exactly what you want to do. If you are simply looking to detect events, Onsets.kr would be fine. If you are looking for frequency/pitch information, Pitch or Tartini would work (I find Tartini to be more accurate). If you are trying to track amplitude, a combination of Amplitude.ar and some simple math would also work.
Similarly, there is SpecCentroid.kr (for a kind of brightness analysis), Loudness.kr, SpecFlatness.kr, etc.
The above are all pretty general, and there are lots more (the JoshUGens externals package has some interesting FFT-related acoustics stuff). So I would recommend downloading the program, joining the mailing list (if you have further questions), which lives here, and poking around in the Externals, Quarks, and Standard UGens.
Nonetheless, since I am not sure what you are trying to do, I cannot make more concrete recommendations than the above combined with my feeling that it makes the most sense to go to SC for this, rather than writing all of your own tools in Python from scratch.

I'm not 100% sure what you want to do, but as an additional suggestion I would put forth Spear with scripting in Common Lisp. If what you are doing involves a great deal of spectral analysis, then you can do the heavy lifting in Spear and script all of this using Common Lisp with Common Music. Spear has some great tools for editing out very specific partials.

Audio/Video broadcasting via Python

I have been racking my brain trying to get an audio/video broadcasting method going with Python.
I have done a lot of research, hit Google up quite a bit, and found a few audio broadcasting methods, but those are only for mp3-based broadcasting systems.
What I am looking to do is broadcast audio and video with Python so that they play in sync to an audience (much like Ustream).
If anyone could shed some light on this situation, I would be very grateful!
Also, if there is a way that this could be done via HTML5, that would be even more fantastic.

What about GStreamer?
http://gstreamer.freedesktop.org/
I have read about a lot of projects using GStreamer to stream video and audio:
http://www.jejik.com/articles/2007/01/streaming_audio_over_tcp_with_python-gstreamer/
http://blog.abourget.net/2009/6/14/gstreamer-rtp-and-live-streaming/
GStreamer would be the server part of the application.
To receive the stream, you'll need a plugin.
I'm thinking of VLC, for example:
http://wiki.videolan.org/HowTo_Integrate_VLC_plugin_in_your_webpage
I know that the VLC plugin and GStreamer can work together; I have done it myself in the past.
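To make the server side a bit more concrete, here is an untested sketch using the GStreamer 1.0 Python bindings (PyGObject), which are newer than the bindings used in the articles linked above. It muxes a test video and a test audio source into one WebM stream and offers it on a TCP port; putting both into a single container is what keeps them in sync. The pipeline layout, the port and the function names are my assumptions, and you would swap the test sources for real capture elements.

```python
def build_pipeline(port=8080):
    """Return a gst-launch style description: test video + audio muxed as WebM."""
    return (
        "webmmux name=mux streamable=true ! "
        "tcpserversink host=0.0.0.0 port={port} "
        "videotestsrc is-live=true ! videoconvert ! vp8enc deadline=1 ! queue ! mux. "
        "audiotestsrc is-live=true ! audioconvert ! vorbisenc ! queue ! mux."
    ).format(port=port)


def serve(port=8080):
    """Run the pipeline until interrupted (requires GStreamer and PyGObject)."""
    # Imported lazily so build_pipeline() can be inspected without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(build_pipeline(port))
    pipeline.set_state(Gst.State.PLAYING)
    loop = GLib.MainLoop()
    try:
        loop.run()
    except KeyboardInterrupt:
        pass
    finally:
        # Always release the pipeline's resources on the way out.
        pipeline.set_state(Gst.State.NULL)
```

The string from build_pipeline() can also be tried directly with the gst-launch-1.0 command-line tool before involving Python at all, which makes pipeline errors easier to spot.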

Not strictly a Python example, but this blog post documents how to build a web app that streams webcam video to an HTML5 video player. Chrome especially seems to make playing streaming video very easy.

Playing audio file with Python

I've seen most of the questions on this topic but almost all of them are outdated. (This is not a dupe)
My requirement is a preferably lightweight library for simply playing audio files such as mp3, etc. from Python (2.7).
These are the libraries that I've looked into so far, along with what is stopping me from using each of them:
PyMedia: it was last updated in Feb, 2006
Mp3Play: supports only XP and was last updated in 2008.
I've also tried pyglet, but even this doesn't look good.
I've also heard that wx supports mp3, and I'm trying it. Any comments on that?
Which reliable lightweight library do others use these days?
PS: please post one library only per answer

I'm not sure what your issue is with pyglet. Playing an mp3 using that couldn't be simpler:
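import pyglet
# streaming=False decodes the whole file into memory up front
sound = pyglet.media.load('mysound.mp3', streaming=False)
sound.play()
# start pyglet's event loop so playback actually runs
pyglet.app.run()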
pyglet is well-maintained, cross-platform, and very small for a multimedia library.

I know this is late but anyway...
Try just_playback. It's a wrapper around miniaudio that can read multiple file formats including mp3 and provides playback control functionality like pausing, resuming, seeking and setting the playback volume.
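A minimal sketch of what that looks like, going by the calls I know from the just_playback README (Playback, load_file, set_volume, play, active). The helper names and the polling loop are my own, and I'm assuming volume is meant to be in the 0 to 1 range.

```python
import time


def clamp_volume(volume):
    """Keep the volume inside the assumed 0.0 to 1.0 range."""
    return max(0.0, min(1.0, float(volume)))


def play_file(path, volume=0.5):
    """Play one audio file to the end, blocking until it finishes."""
    # Imported lazily so the helper above works without the package installed.
    from just_playback import Playback

    playback = Playback()
    playback.load_file(path)
    playback.set_volume(clamp_volume(volume))
    playback.play()
    # `active` stays True while the file is playing or paused.
    while playback.active:
        time.sleep(0.1)
```

Playback runs in the background, so without the polling loop a short script would exit before you hear anything. Note that this targets Python 3, not 2.7.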

How can I represent avi video as set of matrices using Python?

I have video files written in avi format and I would like to analyze these videos using Python. For that I would like to represent every frame of the video as a 2D matrix.
How can I do that? A Google search suggests PyMedia as the way to go. Is it really the best choice, or are there other approaches that I should consider?
If PyMedia is a good choice, could anybody please give me a link to Windows installers (exe files) so I can install the module from binaries?
By the way, is it a good idea, in general, to use Python for these purposes? I like Python very much because of its simplicity and I prefer to use it, but if it is really not suitable for analysis of video, I am ready to use something else.
ADDED
Some people claim that PyMedia is "dead". Is it true?

Yeah, the latest news on the PyMedia web site is dated 01 Feb 2006. That's a pretty bad sign.
The most active and up-to-date open project for manipulating video is ffmpeg. Apparently there is a recently updated python wrapper for it: http://code.google.com/p/pyffmpeg/
In general Python is much too slow for doing any sort of pixel analysis of video. Therefore there will be practically zero libraries of any reasonable level of quality and support for helping at the pixel level of granularity. There are well supported libraries for working at an image level of granularity though. PIL seems to be a popular choice: http://www.pythonware.com/products/pil/
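For the original question of getting each frame as a 2D matrix, one possible route via ffmpeg is sketched below. It assumes the ffmpeg command-line tool is installed and on the PATH, asks it for raw 8-bit grayscale frames at a fixed size, and slices the byte stream into nested lists. The function names and the default 160x120 size are my own choices, not part of any library.

```python
import subprocess


def split_frames(raw, width, height):
    """Split 8-bit grayscale pixel bytes into a list of height x width matrices."""
    frame_size = width * height
    frames = []
    for start in range(0, len(raw) - frame_size + 1, frame_size):
        frame = raw[start:start + frame_size]
        # Each row is one scanline of pixel values (0 to 255).
        frames.append([list(frame[r * width:(r + 1) * width])
                       for r in range(height)])
    return frames


def read_video(path, width=160, height=120):
    """Decode a video file with the ffmpeg CLI into grayscale matrices."""
    command = [
        "ffmpeg", "-loglevel", "error", "-i", path,
        "-vf", "scale={}:{}".format(width, height),  # force a known frame size
        "-pix_fmt", "gray", "-f", "rawvideo", "-",   # raw bytes on stdout
    ]
    raw = subprocess.run(command, stdout=subprocess.PIPE, check=True).stdout
    return split_frames(raw, width, height)
```

read_video("clip.avi") would then return one matrix per frame, with frames[0][y][x] giving a pixel value. This loads the whole video into memory, so it only suits short clips; for anything larger you would read the pipe one frame at a time.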
