Using FFprobe in Python

How can I use FFprobe in a Python script and also export the output as a CSV file?
The command I want to perform is:
ffprobe -i file_name -show_frames -select_streams v:1 -print_format csv > filename.csv
I looked at other posts about similar problems and adapted their code a little:
def probe_file(filename):
    cmnd = ['ffprobe', '-i', filename, '-show_frames', '-select_streams', 'a', '-print_format', 'csv']
    p = subprocess.Popen(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print filename
    out, err = p.communicate()
    print "==========output=========="
    print out
    if err:
        print "========= error ========"
        print err
However, I can't get the "> filename.csv" redirection to work.
After analysing the video, I want all of the output written to a CSV file with the same name as the input file.
Does anyone know how I can approach this?
Thanks in advance.

If you want the data available to python (as a list of lists):
data = [l.split(',') for l in out.decode('ascii').splitlines()]
If you just want to dump the data to a file:
with open("{}.csv".format(filename), "w") as f:
    f.write(out.decode('ascii'))
Note that I'm using Python 3, hence the decode()s (out is bytes, and must be decoded). You're clearly using Python 2, so the decode()s probably aren't necessary.
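On Python 3, one way to reproduce the shell's "> filename.csv" is to hand ffprobe an open file as its stdout, so the output never has to be held in a Python variable. This is only a minimal sketch: it assumes ffprobe is on the PATH, and the helper names run_to_file and probe_to_csv are made up for illustration.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd and send its stdout to out_path, like `cmd > out_path` in a shell."""
    with open(out_path, "wb") as f:
        # stderr is captured separately so diagnostics do not end up in the file
        proc = subprocess.run(cmd, stdout=f, stderr=subprocess.PIPE)
    return proc.returncode, proc.stderr.decode(errors="replace")


def probe_to_csv(filename):
    """Write ffprobe's per-frame CSV for filename to '<filename>.csv'."""
    cmd = ["ffprobe", "-i", filename, "-show_frames",
           "-select_streams", "a", "-print_format", "csv"]
    return run_to_file(cmd, "{}.csv".format(filename))
```

Calling probe_to_csv("video.mp4") would then write video.mp4.csv next to the input and return ffprobe's exit code together with whatever it printed on stderr.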

Related

How to send jar file output to a json file using python

I have a jar file that prints its output to the terminal when executed. I'm not sure how to send that output to a JSON file instead.
The code below just prints the jar file's output to the terminal:
subprocess.Popen(['java', '-jar', '/home/myfolder/collect.jar'])
I'm thinking of something like the following, but I have no idea where to start:
with open('collect.json', 'w') as fp:
    xxxxxxxxxxx
Hope someone could advise further. Thank you.
This should do the trick. If you look at the subprocess documentation you can see that check_output runs a command with arguments and returns its output as a byte string.
import multiprocessing as mp
import subprocess
command = "java -jar /home/myfolder/collect.jar"
def runCommand(q):
    # Run the jar and pass its decoded output back through the queue
    commandOutput = subprocess.check_output(command.split()).decode("utf-8")
    q.put(commandOutput)
q = mp.Queue()
commandProcess = mp.Process(target=runCommand, args=(q, ))
commandProcess.start()
output = q.get()
print(output)
# The mode must be the string 'w', not a bare name
with open('collect.json', 'w') as fp:
    fp.write(output)
Didn't run the code, but should work.
You can try something like this:
with open('collect.json', 'w') as fp:
    # Pass the command as a list so it also works without shell=True
    subprocess.Popen(['java', '-jar', '/home/myfolder/collect.jar'], stdout=fp).wait()
For further information, see the question "How to get the output from .jar execution in python codes?"
I hope this helps.
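If the jar's output is supposed to be JSON, it may be worth checking that it actually parses before writing the file. Here is a sketch for Python 3, assuming the program prints a single JSON document on stdout. The name save_json_output is hypothetical and not part of any library.

```python
import json
import subprocess


def save_json_output(cmd, out_path):
    """Run cmd, check that its stdout is valid JSON, then write it to out_path."""
    raw = subprocess.check_output(cmd).decode("utf-8")
    data = json.loads(raw)  # raises ValueError if the output is not JSON
    with open(out_path, "w") as fp:
        json.dump(data, fp, indent=2)
    return data
```

For the case in the question this would be called as save_json_output(['java', '-jar', '/home/myfolder/collect.jar'], 'collect.json'). If the jar prints log lines in addition to JSON, the json.loads call fails and no file is written, which at least makes the problem visible.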

File type not recognized when using ffprobe with Python

This is my python script:
#!/usr/bin/python3
import os, sys, subprocess, shlex, re
from subprocess import call
def probe_file(filename):
    cmnd = ['ffprobe', '-show_packets', '-show_frames', '-print_format', filename]
    p = subprocess.Popen(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print filename
    out, err = p.communicate()
    print "Output:"
    print out
    if err:
        print "Error:"
        print err
probe_file(sys.argv[1])
Being run with this command:
python parse_frames_and_packets.py mux_test.ts
And I'm getting the following error output:
Unknown output format with name 'mux_test.ts'
Can you not use .ts file types when using ffprobe with Python? Because I've run ffprobe alone using this exact file and it worked correctly.
Does anyone have any tips?
Edit:
It seems not to recognize any type of video, not just transport streams (I just tried it with .avi and got the same response).
-print_format (or the -of alias) requires a format. Available formats are: default, compact, csv, flat, ini, json, xml.
Example:
ffprobe -loglevel error -show_streams -print_format csv=p=0 input.mkv
See the ffprobe documentation for more info.
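Applied to the script in the question, the fix is to put a format name between -print_format and the file name. A sketch for Python 3, with the list of formats taken from the answer above. The names build_probe_cmd and FORMATS are illustrative, and the default of json is just an assumption about what is convenient to parse.

```python
import subprocess

# Output formats listed above for -print_format
FORMATS = ("default", "compact", "csv", "flat", "ini", "json", "xml")


def build_probe_cmd(filename, fmt="json"):
    """Return the ffprobe argument list with an explicit output format."""
    # Allow option suffixes such as "csv=p=0": only the part before "=" is the name
    if fmt.split("=", 1)[0] not in FORMATS:
        raise ValueError("unknown ffprobe output format: {!r}".format(fmt))
    return ["ffprobe", "-loglevel", "error", "-show_packets", "-show_frames",
            "-print_format", fmt, filename]


def probe_file(filename, fmt="json"):
    """Run ffprobe on filename and return its decoded stdout."""
    out = subprocess.check_output(build_probe_cmd(filename, fmt))
    return out.decode("utf-8", errors="replace")
```

With this, passing the file name where the format belongs (the mistake in the question) is caught in Python before ffprobe is even started.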

Using .Popen() and .shlex() to load files to deluge-console

I am having trouble getting a script to load files via deluge-console using .Popen() and .shlex(). I am using xubuntu and byobu on gnome-terminal.
def torrentLoad(url):
    #client_run = subprocess.Popen(['deluged'])
    sourceList = torrentWrite(sortXML(url))
    print(sourceList)
    for s in sourceList:
        sleep(2)
        delugeList = ['deluge-console', 'add', s]
        load = subprocess.Popen(delugeList, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = load.communicate()
        print(out, err)
The variable sourceList is built by functions that parse XML and return a list of torrent file locations (e.g. '/home/YOURHOME/Documents/torrents/file.torrent').
In theory this should feed the command:
deluge-console add /home/YOURHOME/Documents/torrents/file.torrent
directly to the terminal. Mind you, I am running Byobu as well. I don't know if that plays a part in this or not.
The output I am getting is nada. Zilch. Thanks for any help.

python script using subprocess, redirect ALL output to file

I am writing something for static analysis of source code in different languages. Since everything has to be open source and callable from the command line, I have downloaded one tool per language. So I decided to write a Python script that lists all source files in a project folder and calls the respective tool.
So part of my code looks like this:
import os
import sys
import subprocess
from subprocess import call
from pylint.lint import Run as pylint
class Analyser:
    def __init__(self, source=os.getcwd(), logfilename=None):
        # doing initialization stuff
        self.logfilename = logfilename or 'CodeAnalysisReport.log'
        self.listFiles()
        self.analyseFiles()
    def listFiles(self):
        # lists all source files in the specified directory
    def analyseFiles(self):
        self.analysePythons()
        self.analyseCpps()
        self.analyseJss()
        self.analyseJavas()
        self.analyseCs()
if __name__ == '__main__':
    Analyser()
Let's have a look at the C++ files part (I use Cppcheck to analyse those):
def analyseCpps(self):
    for sourcefile in self.files['.cc'] + self.files['.cpp']:
        print '\n'*2, '*'*70, '\n', sourcefile
        call(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile])
The console output for one of the files (it's just a random downloaded file) is:
**********************************************************************
C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc
Checking C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc...
[C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc:18]: (style) The scope of the variable 'oldi' can be reduced.
[C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc:43]: (style) The scope of the variable 'lastbit' can be reduced.
[C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc:44]: (style) The scope of the variable 'two_to_power_i' can be reduced.
(information) Cppcheck cannot find all the include files (use --check-config for details)
Lines 1 and 2 come from my script; lines 3 to 7 come from Cppcheck.
And this is what I want to save to my log file, for all the other files too. Everything in one single file.
Of course I have searched SO and found some methods. But none is working completely.
First try:
Adding sys.stdout = open(self.logfilename, 'w') to my constructor. This causes lines 1 and 2 of the output shown above to be written to my log file. The rest is still shown on the console.
Second try:
Additionally, in analyseCpps I use:
call(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile], stdout=sys.stdout)
This makes my log file look like this:
Checking C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc...
**********************************************************************
C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc
and the console output is:
[C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc:18]: (style) The scope of the variable 'oldi' can be reduced.
[C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc:43]: (style) The scope of the variable 'lastbit' can be reduced.
[C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc:44]: (style) The scope of the variable 'two_to_power_i' can be reduced.
Not what I want.
Third try:
Using Popen with a pipe. sys.stdout is back to default.
As a preliminary step, analyseCpps is now:
for sourcefile in self.files['.cc'] + self.files['.cpp']:
    print '\n'*2, '*'*70, '\n', sourcefile
    p = subprocess.Popen(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile], stdout=subprocess.PIPE)
    p.stdout.read()
p.stdout.read() shows only the last line of my desired output (line 7 in code box 3).
Fourth try:
Using subprocess.Popen(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile], stdout=open(self.logfilename, 'a+')) just writes the one line Checking C:\CodeAnalysis\testproject\cpp\BiggestUnInt.cc... to my log file; the rest is shown on the console.
Fifth try:
Instead of subprocess.Popen I use os.system, so my calling command is:
os.system('C:\\CodeAnalysis\\cppcheck\\cppcheck --enable=all %s >> %s' % (sourcefile, self.logfilename))
This results in the same log file as my fourth try. If I type the same command directly in the Windows console the result is the same. So I guess it is not exactly a Python problem, but still:
If it is on the console there must be a way to put it in a file. Any ideas?
EDIT
Foolish me. I'm still a noob, so I forgot about stderr. That's where the important messages are going.
So now I have:
def analyseCpps(self):
    for sourcefile in self.files['.cc'] + self.files['.cpp']:
        p = subprocess.Popen(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile], stderr=subprocess.PIPE)
        with open(self.logfilename, 'a+') as logfile:
            logfile.write('%s\n%s\n' % ('*'*70, sourcefile))
            for line in p.stderr.readlines():
                logfile.write('%s\n' % line.strip())
and it's working fine.
ANOTHER EDIT
Following Didier's answer, with sys.stdout = open(self.logfilename, 'w', 0) in my constructor:
def analyseCpps(self):
    for sourcefile in self.files['.cc'] + self.files['.cpp']:
        print '\n'*2, '*'*70, '\n', sourcefile
        p = subprocess.Popen(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile], stdout=sys.stdout, stderr=sys.stdout)
There are two problems:
You should redirect both stdout and stderr.
You should use unbuffered files if you want to mix normal print output with the output of launched commands.
Something like this:
import sys, subprocess
# Note the 0 here (unbuffered file)
sys.stdout = open("mylog","w",0)
print "Hello"
print "-----"
subprocess.call(["./prog"],stdout=sys.stdout, stderr=sys.stdout)
print "-----"
subprocess.call(["./prog"],stdout=sys.stdout, stderr=sys.stdout)
print "-----"
print "End"
You need to redirect stderr too. You can either use STDOUT or pass the file object to stderr=:
from subprocess import check_call, STDOUT
with open("log.txt", "w") as f:
    for sourcefile in self.files['.cc'] + self.files['.cpp']:
        check_call(['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile],
                   stdout=f, stderr=STDOUT)
Try to redirect stdout and stderr to a logfile:
import subprocess
def analyseCpps(self):
    with open("logfile.txt", "w") as logfile:
        for sourcefile in self.files['.cc'] + self.files['.cpp']:
            print '\n'*2, '*'*70, '\n', sourcefile
            # call must be qualified, since only the subprocess module is imported here
            subprocess.call(['C:\\CodeAnalysis\\cppcheck\\cppcheck',
                             '--enable=all', sourcefile], stdout=logfile,
                            stderr=subprocess.STDOUT)
In this example the filename is hardcoded, but you should be able to change that easily (to your self.logfilename or similar).
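A note for anyone on Python 3: open(name, "w", 0) is rejected there for text files (unbuffered I/O is only allowed in binary mode), so the ordering problem is usually handled by flushing before each child process starts. Below is a sketch of that variant. The function run_logged and its arguments are illustrative and not part of the original script.

```python
import subprocess


def run_logged(commands, log_path):
    """Run each command, appending a header plus its stdout and stderr to one log."""
    with open(log_path, "a") as log:
        for cmd in commands:
            log.write("{}\n{}\n".format("*" * 70, " ".join(cmd)))
            # Flush our own buffered text first, otherwise the child's output
            # (written straight to the file descriptor) can land before it.
            log.flush()
            subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
```

In the Analyser class this could be called once per language with a list of commands such as ['C:\\CodeAnalysis\\cppcheck\\cppcheck', '--enable=all', sourcefile], so that the separator lines and the tool output end up in the same file in the expected order.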

PDFminer gives strange letters

I am using Python 2.7 and PDFMiner to extract text from PDFs. I noticed that PDFMiner sometimes gives me words with strange letters where PDF viewers do not. For some PDF documents the results returned by PDFMiner and by PDF viewers are the same (strange), but there are documents where the PDF viewers recognize the text correctly (copy-paste). Here is an example of the returned values:
from pdf viewer: ‫فتــح بـــاب ا�ستيــراد البيــ�ض والدجــــاج المجمـــد‬
from PDFMiner: óªéªdG êÉ````LódGh ¢†``«ÑdG OGô``«à°SG ÜÉH í``àa
So my question is: can I get the same result as the PDF viewer, and what is wrong with PDFMiner? Is it missing encodings? I don't know.
Yes.
This happens when custom font encodings (e.g. Identity-H, Identity-V) have been used but the fonts have not been embedded properly.
PDFMiner gives garbage output in such cases because the encoding is required to interpret the text.
Maybe the PDF file you are trying to read uses an encoding not yet supported by PDFMiner.
I had a similar problem last month and finally solved it by using a Java library named PDFBox and calling it from Python. PDFBox supported the encoding that I needed and worked like a charm!
First I downloaded PDFBox from the official site
and then referenced the path to the .jar file from my code.
Here is a simplified version of the code I used (untested, but based on my original tested code).
You will need subprocess32, which you can install by calling pip install subprocess32
import subprocess32 as subprocess
import os
import sys
import tempfile
class RunnableError(Exception):
    # Raised when PDFBox fails or times out (this class was missing from the original snippet)
    pass
def extractPdf(file_path, pdfboxPath, timeout=30, encoding='UTF-8'):
    #tempfile = temp_file(data, suffix='.pdf')
    try:
        command_args = ['java', '-jar', os.path.expanduser(pdfboxPath), 'ExtractText', '-console', '-encoding', encoding, file_path]
        status, stdout, stderr = external_process(command_args, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise RunnableError('PDFBox timed out while processing document')
    finally:
        pass#os.remove(tempfile)
    if status != 0:
        raise RunnableError('PDFBox returned error status code {0}.\nPossible error:\n{1}'.format(status, stderr))
    # We can use result from PDFBox directly, no manipulation needed
    pdf_plain_text = stdout
    return pdf_plain_text
def external_process(process_args, input_data='', timeout=None):
    process = subprocess.Popen(process_args,
                               stdout=subprocess.PIPE,
                               stdin=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    try:
        (stdout, stderr) = process.communicate(input_data, timeout)
    except subprocess.TimeoutExpired as e:
        # cleanup process
        # see https://docs.python.org/3.3/library/subprocess.html?highlight=subprocess#subprocess.Popen.communicate
        process.kill()
        process.communicate()
        raise e
    exit_status = process.returncode
    return (exit_status, stdout, stderr)
def temp_file(data, suffix=''):
    handle, file_path = tempfile.mkstemp(suffix=suffix)
    f = os.fdopen(handle, 'w')
    f.write(data)
    f.close()
    return file_path
if __name__ == '__main__':
    # Assumption: the PDF path is passed as the first command-line argument
    filename = sys.argv[1]
    text = extractPdf(filename, 'pdfbox-app-2.0.3.jar')
This code was not entirely written by me. I followed the suggestions of other Stack Overflow answers, but that was a month ago and I have lost the original sources. If anyone finds the original posts these pieces of code came from, please let me know so I can give them the credit they deserve.
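On Python 3.5 or later the standard-library subprocess module already accepts a timeout, so subprocess32 should not be needed there. A rough equivalent of external_process under that assumption is sketched below. The name run_with_timeout is invented for this example.

```python
import subprocess


def run_with_timeout(process_args, input_data=b"", timeout=None):
    """Run a command and return (exit_status, stdout, stderr) as bytes.

    Raises subprocess.TimeoutExpired if the command runs longer than timeout
    seconds; subprocess.run kills the child before re-raising.
    """
    completed = subprocess.run(process_args,
                               input=input_data,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               timeout=timeout)
    return completed.returncode, completed.stdout, completed.stderr
```

Because stdout comes back as bytes, the PDFBox output would still need a .decode() with the same encoding that was passed to -encoding.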
