Reading and parsing Windows video file metadata in Python

I'm working on a project and I need to read and parse video metadata (duration, date created, title, bit rate, ...).
As far as I know, there isn't any good package for Python 3.x for this.
1. I found these packages, but they are designed for Python 2.x:
enzyme
hachoir-metadata
2. I know how to use ffmpeg and other libraries that process video files, but they are too slow. I simply want to read the metadata that Windows already shows for the file.
3. I tried the exifread package, but as far as I can tell it doesn't work on video files.
4. There is another question asking for a way to retrieve only the length of videos, but it is unanswered.
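I'm looking for something like this (python_video_info_parser is a hypothetical module; video files must be opened in binary mode):
with open(path_to_video_file, "rb") as file:
    props = python_video_info_parser.get_info(file)
print(props)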
Platform:
Python 3.4
Windows 10
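For MP4/MOV files specifically, duration and creation time can be read without ffmpeg by parsing the movie header ("mvhd") box defined by the ISO base media file format. The sketch below is a minimal, stdlib-only illustration under that assumption: it does not handle other containers (AVI, MKV, WMV), and it does not return title or bit rate, which live elsewhere (or only in Windows' own property store). The function names are hypothetical; some encoders also leave the creation time at zero.

```python
import datetime
import io
import struct

# MP4/QuickTime timestamps count seconds from 1904-01-01 00:00 UTC.
MP4_EPOCH = datetime.datetime(1904, 1, 1, tzinfo=datetime.timezone.utc)

def iter_boxes(f, end):
    """Yield (type, payload_start, box_end) for each box from f.tell() up to end."""
    while f.tell() + 8 <= end:
        start = f.tell()
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            size = struct.unpack(">Q", f.read(8))[0]
            header = 16
        elif size == 0:  # the box runs to the end of its container
            size = end - start
        if size < header:
            return  # malformed box: stop instead of looping forever
        yield box_type, start + header, start + size
        f.seek(start + size)

def read_mvhd(f):
    """Return (duration_seconds, creation_time) from an MP4/MOV file object."""
    f.seek(0, io.SEEK_END)
    file_end = f.tell()
    f.seek(0)
    for box_type, start, end in iter_boxes(f, file_end):
        if box_type != b"moov":
            continue
        f.seek(start)
        for child_type, child_start, _child_end in iter_boxes(f, end):
            if child_type != b"mvhd":
                continue
            f.seek(child_start)
            version = f.read(1)[0]
            f.read(3)  # skip the 24-bit flags field
            if version == 1:  # 64-bit times and duration
                created, _modified, timescale, duration = struct.unpack(">QQIQ", f.read(28))
            else:  # version 0: 32-bit fields
                created, _modified, timescale, duration = struct.unpack(">IIII", f.read(16))
            created_at = MP4_EPOCH + datetime.timedelta(seconds=created)
            return duration / timescale, created_at
    raise ValueError("no moov/mvhd box found")
```

Usage: `with open(path_to_video_file, "rb") as f: print(read_mvhd(f))`. Only the box headers and one small header are read, so this stays fast even for large files.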

I hope you found what you were looking for. :)
But if you did not, or others are wondering: I am researching the same subject and I may have found a solution.
What I have found so far is a command-line tool called exiftool.
If you download it, you can call its command-line interface from Python with the subprocess module.
It supports a lot of file formats, as listed in its documentation.
I will update this post as soon as I have a working solution. :)
You can download the tool here.
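A minimal sketch of the subprocess approach, assuming the exiftool executable is on PATH (or that you pass its full path). exiftool's -j option prints the tags as a JSON array with one object per input file, so the output can be loaded with the json module. The helper names are hypothetical, and check_output is used rather than subprocess.run so it also runs on Python 3.4.

```python
import json
import subprocess

def parse_exiftool_json(text):
    """Return the first record from exiftool's -j (JSON) output, or {} if empty."""
    # exiftool -j prints a JSON array with one object per input file.
    records = json.loads(text)
    return records[0] if records else {}

def read_video_metadata(path, exiftool="exiftool"):
    """Run exiftool on one file and return its tags as a dict."""
    # Assumes the exiftool executable is installed; raises CalledProcessError
    # if exiftool exits with a non-zero status.
    out = subprocess.check_output([exiftool, "-j", path])
    return parse_exiftool_json(out.decode("utf-8"))
```

For example, `meta = read_video_metadata(r"C:\Videos\clip.mp4")` followed by `meta.get("Duration")` or `meta.get("CreateDate")`. The exact tag names (Duration, CreateDate, Title, AvgBitrate, ...) depend on the container format, so treat them as assumptions and inspect the returned dict for your files.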

Related

Convert mp4 to .wav or mp3 with python

Does anybody have experience with converting mp4 files to .wav or mp3 files? I am able to do this in Linux (bash), but I try to do everything in Python that I do in other languages, call me an enthusiast. I have been looking over the Pymedia library, but have not made progress as of yet.
You can use Python bindings for GStreamer, and create a pipeline to do the conversion:
More info here:
http://pygstdocs.berlios.de/pygst-tutorial/pipeline.html
Example of pipeline in another SO question:
converting wav to mp3 (and vice versa) using GStreamer
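The pygst tutorial linked above targets the old 0.10 bindings. As a swapped-in alternative, here is a minimal sketch using GStreamer 1.x through PyGObject (gi), assuming the decodebin, audioconvert, audioresample, lamemp3enc (gst-plugins-ugly) and wavenc elements are installed. The helper names are hypothetical, and paths containing quotes or backslashes may need escaping in the pipeline description.

```python
import os

def build_pipeline(src, dst):
    """Return a gst-launch style description that decodes src and encodes its audio to dst."""
    # Pick the encoder from the output extension (assumption: only .mp3 and .wav).
    encoders = {".mp3": "lamemp3enc", ".wav": "wavenc"}
    ext = os.path.splitext(dst)[1].lower()
    if ext not in encoders:
        raise ValueError("unsupported output format: " + ext)
    return ('filesrc location="{}" ! decodebin ! audioconvert ! audioresample ! {} '
            '! filesink location="{}"').format(src, encoders[ext], dst)

def convert(src, dst):
    """Run the pipeline until end-of-stream or an error."""
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(build_pipeline(src, dst))
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    # Block until the output is fully written (EOS) or something fails.
    msg = bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                                 Gst.MessageType.EOS | Gst.MessageType.ERROR)
    pipeline.set_state(Gst.State.NULL)
    if msg.type == Gst.MessageType.ERROR:
        err, _debug = msg.parse_error()
        raise RuntimeError(err.message)
```

Usage: `convert("talk.mp4", "talk.mp3")`. decodebin picks the demuxer and decoder automatically; only the audio stream can link to audioconvert, so any video stream is left unlinked.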
You might find Python Audio Tools of some use. They are designed to work from the command line, but since they are Python code you can simply import the modules and integrate them into another program. This is the API documentation. From the "About" page:
Python Audio Tools are a collection of audio handling programs which work from the command line. These include programs for CD extraction, track conversion from one audio format to another, track renaming and retagging, track identification, CD burning from tracks, and more. Supports internationalized track filenames and metadata using Unicode. Works with high-definition, multi-channel audio as well as CD-quality. Track conversion uses multiple CPUs or CPU cores if available to greatly speed the transcoding process. Track metadata can be retrieved from FreeDB, MusicBrainz or compatible servers.

Solution to convert PDFs, DOCs and DOCXs into a textual format with Python

I am developing a full-text search engine for indexing popular binary formats. I know that there are hundreds of such questions (and solutions) already, but I found it tough to find one that is:
cross platform
supports DOC, DOCX and PDF formats at once
easy to use with python
can be set up in a major shared host
For PDFs, I recommend PDFMiner.
Try the docx module (I have not used it myself)
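As a swapped-in, stdlib-only alternative to the docx module: a .docx file is a ZIP archive whose main text lives in word/document.xml, with paragraphs in w:p elements and text runs in w:t elements. This minimal sketch (hypothetical helper name) ignores headers, footers, footnotes and comments, which are stored in other parts of the archive.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by w:p (paragraph) and w:t (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(path_or_file):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Concatenate the text of every w:t run inside this paragraph.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Because it needs only the standard library, this also runs on a shared host where installing packages is not possible.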
I am not aware of any pure-Python module that can read .doc files.
There are command-line tools to extract text from .doc files: antiword and catdoc (and probably others). If they are installed on your shared host, you could use subprocess to shell out to them. Both are available on Windows via Cygwin.
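A minimal sketch of shelling out, assuming antiword or catdoc is on PATH; both print the extracted text to stdout when given a .doc path. The helper name is hypothetical, and decoding the output as UTF-8 is an assumption (the tools' output encoding depends on their character-map and locale settings).

```python
import shutil
import subprocess

def doc_to_text(path, tools=("antiword", "catdoc")):
    """Extract text from a legacy .doc file using the first available CLI tool."""
    for tool in tools:
        exe = shutil.which(tool)
        if exe is None:
            continue  # tool not installed on this host; try the next one
        # Both antiword and catdoc write the extracted text to stdout.
        out = subprocess.check_output([exe, path])
        return out.decode("utf-8", errors="replace")
    raise RuntimeError("none of these tools is installed: " + ", ".join(tools))
```

Checking with shutil.which first gives a clear error on hosts where neither tool is available, instead of a bare FileNotFoundError from subprocess.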
Apache POI is a Java library that can extract text from Office documents. If your shared host has Java installed, you could write a bit of Java (or Jython) code and execute using subprocess.
If you can run OpenOffice on the server, you can use unoconv, which converts between any document formats supported by OpenOffice.
One possible solution is to use Google Docs to extract the text contents from binary .doc files. You upload the document to Google Docs and then download the text contents. It is a fairly slow process, but it is the only "pure Python" solution I know of, since it doesn't require any external tools except network access. An external tool such as catdoc or antiword is a much better solution if you are allowed to install it on your host.
Textract wraps the usual extraction tool for each file type behind a single Python interface.
https://github.com/deanmalmgren/textract

GStreamer: status of Python bindings and encoding video with mixed audio

I am hoping to find a way to write generated video (non-real-time) from Python and simultaneously mix it with an external audio file (MP3).
What's the current status of GStreamer Python bindings, are they up-to-date?
Would it be possible to write MPEG-4 output with GStreamer and feed it raw image frames from Python?
Is it possible to construct a pipeline so that GStreamer also reads the MP3 audio and mixes it into the container, so that I do not need to reprocess the resulting video with ffmpeg or other external tools to add the audio track?
Are there any up-to-date tutorials for using GStreamer with Python? (I couldn't find anything newer than 2006-2009.)
(My old question, Writing video with OpenCV + Python + Mac, did not really give good pointers.)
Whether or not the bindings are "up-to-date" really depends on what version of Python you're using. With Python 2.7, I am using GStreamer without incident.
I have been fighting a major bug in developing with Python 2.7 and GStreamer on Windows 7 (WinBuilds installers), but I'm able to work with GStreamer just fine on Ubuntu.
GStreamer does have MP3 codecs, but their use is legally restricted in some countries. I'd do a Google search on that before using them.
As for tutorials, no luck. All the same, the existing tutorials do quite well for the modern version, especially this one and this one.
In regards to writing MPEG-4 output and feeding raw images, I do not know. That would be a good stand-alone question, in all honesty.

Read/Write LabView TDMS files in python under linux

Does anyone know of a way to read and write the National Instruments binary file type (TDMS) in Python under Linux? I know that NI has a C DLL available, but I don't know how to access it from Python, or whether I can even do so under Linux.
It looks like TDMS isn't directly supported under Linux (see here).
Your options currently are to use the G-based functions directly in LabVIEW (It's possible that you can wrap them in a .so file), calling LabVIEW from Python, or building your own file parser from the TDMS spec.
Sorry, no really easy options.
Edit: It looks like there may be an open source project to try to do this at http://sourceforge.net/projects/pytdms/. Worth a try, at least.
You have to install Python 2.7 (that's the only version that works with the TDMS package for LabVIEW, at least):
sudo pip install npTDMS
Link to the npTDMS package page
Then just follow the example on that page.

How to extract a windows cabinet file in python

Is it somehow possible to extract .cab files in Python?
This doesn't strictly answer what you asked, but if you are running on Windows you could spawn a process to do it for you.
Taken from Wikipedia:
Microsoft Windows provides two command-line tools for creation and extraction of CAB files. They are MAKECAB.EXE (included within Windows packages such as 'ie501sp2.exe' and 'orktools.msi'; also available from the SDK, see below) and EXTRACT.EXE (included on the installation CD), respectively. Windows XP also provides the EXPAND.EXE command.
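A minimal sketch of spawning EXPAND.EXE from Python (Windows only), using its documented form expand <source.cab> -F:<files> <destination>, where -F:* extracts every file. The helper names are hypothetical; the command builder is kept separate so it can be inspected on any platform.

```python
import os
import subprocess

def expand_command(cab_path, dest_dir):
    """Build the EXPAND.EXE argument list that extracts every file in cab_path."""
    # Documented syntax: expand <source.cab> -F:<files> <destination>
    return ["expand", cab_path, "-F:*", dest_dir]

def extract_cab(cab_path, dest_dir):
    """Extract all files from a .cab archive into dest_dir (Windows only)."""
    os.makedirs(dest_dir, exist_ok=True)  # make sure the destination exists
    subprocess.check_call(expand_command(cab_path, dest_dir))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.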
I had the same problem last week so I implemented this in python. Comments, additions and especially pull requests welcome: https://github.com/hughsie/python-cabarchive
Oddly, the msilib module can only create or append to .CAB files, but not extract them. :(
However, the hachoir parser module can apparently read & edit Cabinets. (I have not used it, though, so I couldn't tell you how fitting it is or not!)
