I have video files in AVI format and I would like to analyze these videos using Python. For that I would like to represent every frame of the video as a 2D matrix.
How can I do that? A Google search suggests PyMedia as the way to go. Is it really the best choice, or are there other approaches I should consider?
If PyMedia is a good choice, could anybody please give me a link where I can get exe files to install the module on Windows from binaries?
By the way, is it a good idea, in general, to use Python for these purposes? I like Python very much because of its simplicity and I prefer to use it, but if it is really not suitable for video analysis, I am ready to use something else.
ADDED
Some people claim that PyMedia is "dead". Is it true?
Yeah, the latest news on the PyMedia web site is dated 01 Feb 2006. That's a pretty bad sign.
The most active and up-to-date open project for manipulating video is ffmpeg. Apparently there is a recently updated python wrapper for it: http://code.google.com/p/pyffmpeg/
In general, Python itself is much too slow for any sort of per-pixel analysis of video, so there are practically no well-supported, good-quality libraries for working at the pixel level of granularity. There are well-supported libraries for working at the image level of granularity, though. PIL seems to be a popular choice: http://www.pythonware.com/products/pil/
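That said, if the goal is simply to get each frame into a 2D matrix, one common workaround (my own sketch, not part of pyffmpeg or PIL) is to let the ffmpeg binary do the decoding and hand the raw frames to numpy, which does the per-pixel work in C:

```python
# A rough sketch, not tied to pyffmpeg: pipe decoded grayscale frames out of
# the ffmpeg binary and wrap each one in a 2D numpy array. Assumes ffmpeg is
# on PATH; "input.avi" and the frame dimensions are placeholders you would
# adapt to your own files.
import subprocess
import numpy as np

WIDTH, HEIGHT = 640, 480  # must match (or rescale to) the video's dimensions

cmd = [
    "ffmpeg", "-i", "input.avi",
    "-f", "rawvideo", "-pix_fmt", "gray",
    "-s", "%dx%d" % (WIDTH, HEIGHT),
    "pipe:1",
]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

frame_bytes = WIDTH * HEIGHT
while True:
    raw = proc.stdout.read(frame_bytes)
    if len(raw) < frame_bytes:
        break
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(HEIGHT, WIDTH)
    # "frame" is now a 2D matrix; do your analysis here, e.g.:
    print(frame.mean())

proc.wait()
```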
Related
I am learning pyprocessing. It comes with the regular Processing platform, which was originally written in Java. Many of the example projects bundled with Processing have also been written in Python, but none of the audio libraries/examples have been.
I tried searching google but haven't found anything as of yet.
Does anyone know of a good resource where I can learn to do basic things with the audio library in pyprocessing such as playing audio and filtering audio?
I've used pyaudio and SWMixer for basic audio needs on a project.
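For reference, a minimal blocking-playback sketch with pyaudio (the filename is just a placeholder):

```python
# Minimal blocking WAV playback with pyaudio; "example.wav" is a placeholder.
import wave
import pyaudio

wf = wave.open("example.wav", "rb")
pa = pyaudio.PyAudio()
stream = pa.open(format=pa.get_format_from_width(wf.getsampwidth()),
                 channels=wf.getnchannels(),
                 rate=wf.getframerate(),
                 output=True)

chunk = 1024
data = wf.readframes(chunk)
while data:
    stream.write(data)          # blocks until this chunk has been played
    data = wf.readframes(chunk)

stream.stop_stream()
stream.close()
pa.terminate()
wf.close()
```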
Other python-audio resources I found useful at the time:
Scott W Harden's blog post on FFT analysis in Python (lots of neat things there)
PyAudioMixer
python-sounddevice
I haven't used these exhaustively enough, though, to be able to advise on which one is the most stable and easy to use.
I want to write some code to do acoustic analysis and I'm trying to determine the proper tool(s) for the job. I would normally write something like this in Python using numpy and scipy and possibly Cython for the analysis part. I've discovered that the world of Python audio libraries is a bit chaotic, with scads of very limited packages in various states of development.
I've also come across a bunch of audio/acoustic-specific languages like SuperCollider, Faust, etc. that seem to make the audio processing easy, but they may be limited in terms of I/O and analysis capability.
I'm currently working on Linux with ALSA and PulseAudio installed by default. I would prefer not to involve any of the various and sundry other audio packages like JACK if possible, though that is not a hard requirement.
My primary interest in this question is to determine whether there is a domain specific language that will provide for quicker prototyping and testing or whether a general language like Python is more appropriate. Thanks.
I've got a lot of experience with SuperCollider and Python (with and without Numpy). I do a lot of audio analysis, and I'm afraid the answer depends on what you want to do.
If you want to create systems that will input OR output audio in real time, then Python is not a good choice. The audio I/O libraries (as you say) are a bit sketchy. There's also a fundamental issue that Python's garbage collector is not really designed for realtime stuff. You should use a system that is designed from the ground up for realtime. SuperCollider is nice for this, and as caseyanderson notes, some of the standard building-blocks for audio analysis are right there. There are other environments too.
If you want to do hardcore work such as applying various machine learning algorithms, not necessarily in real time (i.e. if you can get away with reading/writing WAV files rather than live audio), then you should use a general-purpose programming language with wide support, and an ecosystem of good libraries for the extra things you want. Using Python with libs such as numpy and scikits-learn works great for this. It's good for quick prototyping, but not only does it lack solid realtime audio, it also has far fewer of the standard audio building-blocks. Those are two important things which hold you back when prototyping audio pipelines.
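To make that concrete, here is a minimal offline sketch under my own assumptions (placeholder filename, toy features, and an arbitrary clustering step; none of it is prescribed by the question): read a WAV, compute a couple of framewise features with numpy, and hand them to scikit-learn.

```python
# Sketch of non-realtime analysis: WAV in, framewise features, clustering.
# Filename, frame length, and the choice of KMeans are illustrative only.
import numpy as np
from scipy.io import wavfile
from sklearn.cluster import KMeans

rate, samples = wavfile.read("example.wav")     # placeholder input file
if samples.ndim > 1:
    samples = samples[:, 0]                     # keep one channel
samples = samples.astype(np.float64)

frame_len = 1024
n_frames = len(samples) // frame_len
frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)

# Two toy features per frame: RMS energy and spectral centroid.
rms = np.sqrt(np.mean(frames ** 2, axis=1))
spectrum = np.abs(np.fft.rfft(frames, axis=1))
freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
centroid = (spectrum * freqs).sum(axis=1) / (spectrum.sum(axis=1) + 1e-12)

features = np.column_stack([rms, centroid])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(features)
print(labels[:20])
```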
So, then, you're caught between these two options. Depending on your application you may be able to combine the two by manipulating the audio I/O in a realtime environment, and using OSC messaging or shell scripts to communicate with an external Python process. The limitation there is that you can't really throw masses of data around between the two (you can't sensibly pipe all your audio across to some other process, that'd be silly).
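The OSC side of that split can be very small in Python. One option (my assumption; any OSC library would do) is the python-osc package, with sclang listening on its default UDP port 57120:

```python
# Sketch: send an analysis result from a Python process to a realtime engine
# over OSC. Assumes the python-osc package; the address "/feature", the
# payload, and port 57120 (sclang's default) are illustrative.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 57120)
client.send_message("/feature", [0.42, 1.0])
```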
SuperCollider has lots of support for things along these lines, both as externals/plugins or Quarks. That said, it depends exactly what you want to do. If you are simply looking to detect events, Onsets.kr would be fine. If you are looking for frequency/pitch information, Pitch or Tartini would work (I find Tartini to be more accurate). If you are trying to track amplitude, a combination of Amplitude.ar and some simple math would also work.
Similarly, there is SpecCentroid.kr (for a kind of brightness analysis), Loudness.kr, SpecFlatness.kr, etc.
The above are all pretty general, and there are lots more (the JoshUGens externals package has some interesting FFT-related acoustics stuff). So I would recommend downloading the program, joining the mailing list (if you have further questions), which lives here, and poking around in the Externals, Quarks, and Standard UGens.
Nonetheless, since I am not sure what you are trying to do, I cannot make more concrete recommendations than the above combined with my feeling that it makes the most sense to go to SC for this, rather than writing all of your own tools in Python from scratch.
I'm not 100% sure what you want to do, but as an additional suggestion I would put forth: Spear with scripting in Common Lisp. If what you are doing involves a great deal of spectral analysis, then you can do the heavy lifting in Spear and script all of it using Common Lisp with Common Music. Spear has some great tools for editing out very specific partials.
I am using Python inside another application (CINEMA 4D) to create a nice connection to our issue tracker (Jira) inside the application. The rationale behind this is to make it really easy for our plugin users to report and track bugs, and to have things like machine specs, screenshots, or scene files (including textures) attached automatically.
So far it has been a really smooth ride and the integration is coming along great. I have started grabbing the icons for issue priorities, projects, issue types, etc. from Jira as well, so they can be displayed for a better overview. To read the image files I am using CINEMA 4D functionality that is available through its Python binding.
The problem now is, that most icons from Jira come in GIF format and the CINEMA 4D SDK doesn't read GIF files directly (actually it does read them, but only through a back door so users can load them, but I can't use that functionality through Python or the SDK). So I need another way to read the GIF files.
There are a few questions on Stack Overflow that go in this direction, but they all seem to recommend PIL. This doesn't feel like the right solution, for a few reasons:
While that looks nice, it's not part of the standard distribution and seems to be really only maintained for Windows (even though there are builds for Mac OS X).
It also seems to install itself into the current system installation of Python, but CINEMA 4D comes with its own, so I'd have to rip it apart and distribute it with my plugin.
And it is quite large, while I really only want a compact, self-contained solution (preferably out of the box, but that doesn't seem to be an option).
I was wondering if there is a simpler or at least more compact way. Since GIF seems to be a relatively simple file format, I am wondering if there may even be a simple parser as a python function/class.
I found a link where somebody disassembles a GIF file's embedded frames but doesn't actually grab the image contents: Python, how i can get gif frames
I'm fine with putting in some time on my own, and I would've already been coding away if the file format was something uncompressed, but I am a little reluctant since the compression seems to raise the bar slightly.
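For what it's worth, the fixed-size parts of a GIF are trivial to read with the standard struct module; the sketch below (placeholder filename) stops well before the LZW-compressed image data, which is where the real work would be:

```python
# Sketch: parse only the GIF header and logical screen descriptor using the
# standard library. This does NOT decode pixel data (that needs LZW); it just
# shows how far the uncompressed part gets you. "icon.gif" is a placeholder.
import struct

with open("icon.gif", "rb") as f:
    signature = f.read(6)                      # b"GIF87a" or b"GIF89a"
    if signature[:3] != b"GIF":
        raise ValueError("not a GIF file")

    width, height, packed, bg_index, aspect = struct.unpack("<HHBBB", f.read(7))
    has_gct = bool(packed & 0x80)              # global color table present?
    gct_size = 2 ** ((packed & 0x07) + 1)      # entries in the table, if any

    print(signature.decode("ascii"), width, "x", height,
          "global color table:", gct_size if has_gct else "none")
```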
Is there any standard method of watermarking videos of some format in Python?
And how about still images?
I'd suggest checking out pyffmpeg or pymedia, but that's about as good as it gets. Try to find a way to leverage ffmpeg proper if you can.
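If you go the ffmpeg route, the usual approach is the overlay filter; a rough sketch calling the binary from Python (assumes an ffmpeg build with -filter_complex; the filenames and the 10:10 position are placeholders):

```python
# Rough sketch: burn a PNG watermark into a video by shelling out to ffmpeg.
# Assumes ffmpeg is on PATH; filenames and overlay position are placeholders.
import subprocess

subprocess.check_call([
    "ffmpeg", "-y",
    "-i", "input.mp4",                 # source video
    "-i", "watermark.png",             # watermark image (ideally with alpha)
    "-filter_complex", "overlay=10:10",
    "output.mp4",
])
```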
For still images, simply use PIL, the Python Imaging Library.
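For example, something along these lines (filenames and offset are placeholders; the paste mask assumes the watermark is an RGBA PNG):

```python
# Sketch: paste a semi-transparent PNG watermark onto a still image with PIL.
from PIL import Image

base = Image.open("photo.jpg").convert("RGBA")
mark = Image.open("watermark.png").convert("RGBA")

# Use the watermark's own alpha channel as the paste mask.
base.paste(mark, (10, 10), mark)
base.convert("RGB").save("photo_watermarked.jpg")
```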
If you're looking for a robust (for-pay) service, I've had a very nice experience with Zencoder. The python api module is easy to use and fairly well documented.
Transloadit provides image & video conversion via web services, works well, and very cheap. If you need to do this on a large scale and don't want to buy a bunch of HW, they are great. Someone mentioned Zencoder. I don't have experience to understand all the tradeoffs between Transloadit and Zencoder. However in their current pricing models, Transloadit charges per GB of video and Zencoder charges per minute of video. If you are doing enough volume to worry about scalable pricing, for the scenarios I've looked at, Transloadit is cheaper for smaller / lower-resolutions videos. Perhaps obviously :)
I've looked around Stack Overflow for an answer to this, but nowhere seems to give the correct answer or direction...
My project will allow a user to upload a WAV, which ultimately will be converted to a low quality MP3 using FFmpeg on the server and it'll all be stored and served on Amazon S3. The next obstacle is working out how to extract a reliable waveform visualisation from this uploaded sound. I'm using Python and Django on Linux Ubuntu 10 on a VPS for this project...
I'm, at the very least, needing some sort of direction... I'm at a loss as to where to start looking for such a tool.
This one (uses audiolab, PIL and numpy) is decent: http://www.freesound.org/blog/?p=10
To make a graph or plot of the waveform, the usual Python approach is to get the waveform into a numpy array and then use matplotlib to make the plot.
The easiest way to read the data into a numpy array is scipy.io.wavfile.read, though if you prefer not to use scipy (it's a big package), it's not difficult to read and convert the data using Python's built-in wave module.
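A minimal sketch of that approach (placeholder filename; a stereo file is reduced to its first channel):

```python
# Sketch: read a WAV with scipy and plot its waveform with matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, data = wavfile.read("example.wav")
if data.ndim > 1:
    data = data[:, 0]                 # keep the first channel only

t = np.arange(len(data)) / float(rate)
plt.plot(t, data)
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.savefig("waveform.png", dpi=100)
```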
Not trying to answer my own question here, but this is a suggestion that may help others who come across this question...
After lots of searching around, I found this solution... It seems well done, but does anyone else know anything about it?
Seems to do the lot!
http://code.google.com/p/timeside/