I'm writing a basic Python script that uses youtube-dl to download a high-quality thumbnail from a video. On the command line, you can run "youtube-dl --list-thumbnails [LINK]" and it outputs a list of links to the thumbnail images at different qualities. Usually the highest-resolution one has 'maxresdefault' in its link. I want to download this image from the command line with wget. This is the code I have so far. I'm not familiar with regex, but according to regexr.com, my pattern should match the link containing 'maxresdefault'.
import subprocess
import sys
import re
youtubeoutput = subprocess.call(['youtube-dl', '--list-thumbnails', 'https://www.youtube.com/watch?v=t2U2mUtTnzY'], shell=True, stdout=subprocess.PIPE)
print(str(youtubeoutput))
imgurl = re.search("/maxresdefault/g", str(youtubeoutput)).group(0)
print(imgurl)
subprocess.run('wget', str(imgurl))
I put the print statements in there to see what the outputs were. When I run the code, I can see that youtube-dl doesn't recognize the link: youtube-dl: error: You must provide at least one url. Since there are no links in the output, re.search returns None and the .group(0) call raises an error. I don't know why youtube-dl won't recognize the link. I'm not even sure it recognizes --list-thumbnails. Could anyone help?
You've asked subprocess to use a shell (shell=True), so you would usually pass an entire command to call, like so:
youtubeoutput = subprocess.call("youtube-dl --list-thumbnails https://www.youtube.com/watch?v=t2U2mUtTnzY", shell=True, stdout=subprocess.PIPE)
But really, you may not need a shell. Try something like:
youtubeoutput = subprocess.check_output(['youtube-dl', '--list-thumbnails', 'https://www.youtube.com/watch?v=t2U2mUtTnzY'])
Note that call does not return the program's standard output; it returns the exit code. check_output returns the output.
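Putting the pieces together, here is a minimal sketch of the remaining steps. It assumes the listing contains plain http(s) links separated by whitespace; find_maxres_url and list_thumbnails are hypothetical helper names, and the sample text is made up for illustration rather than copied from real youtube-dl output. Note that Python regex patterns are plain strings, so the JavaScript-style "/maxresdefault/g" from the question would look for literal slashes and a trailing "g".

```python
import re
import subprocess


def find_maxres_url(listing):
    """Return the first URL containing 'maxresdefault', or None if absent."""
    # Python patterns are plain strings: no /.../ delimiters, no trailing g flag.
    match = re.search(r"https?://\S*maxresdefault\S*", listing)
    return match.group(0) if match else None


def list_thumbnails(video_url):
    """Run youtube-dl without a shell and return its stdout as text."""
    raw = subprocess.check_output(["youtube-dl", "--list-thumbnails", video_url])
    return raw.decode("utf-8", errors="replace")


# Made-up sample text for illustration only, not real youtube-dl output.
sample = "0 unknown unknown https://example.invalid/vi/abc/maxresdefault.jpg\n"
print(find_maxres_url(sample))
```

With a real listing you would call find_maxres_url(list_thumbnails(link)), check the result for None, and then pass the URL to wget as a list: subprocess.run(['wget', url]).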
I need to extract text from a PDF. I tried PyPDF2, but the extractText method returned what looked like encrypted text, even though the PDF is not encrypted according to the isEncrypted method.
So I moved on to looking for a program that does the job from the command prompt, so that I could call it from Python with the subprocess module. I found a program called textExtract, which did what I wanted with the following command line in cmd:
"textextract.exe" "download.pdf" /to "download.txt"
However, when I tried running it with subprocess I couldn't get a 0 return code.
Here is the code I tried:
textextract = shlex.split(r'"textextract.exe" "download.pdf" /to "download.txt"')
subprocess.run(textextract)
I already tried it with shell=True, but it didn't work.
Can anyone help me?
I was able to get the following script to work from the command line after installing the PDF2Text Pilot application you're trying to use:
import shlex
import subprocess
args = shlex.split(r'"textextract.exe" "download.pdf" /to "download.txt"')
print('args:', args)
subprocess.run(args)
Sample screen output of running it from a command line session:
> C:\Python3\python run-textextract.py
args: ['textextract.exe', 'download.pdf', '/to', 'download.txt']
Progress:
Text from "download.pdf" has been successfully extracted...
Text extraction has been completed!
The above output was generated using Python 3.7.0.
I don't know whether your use of Spyder on Anaconda affects things, since I'm not familiar with either. If you continue to have problems, I suggest you see if you can get things working directly, i.e., by running the Python interpreter on the script manually from the command line, similar to what's shown above. If that works but using Spyder doesn't, then you'll at least know the cause of the problem.
There's no need to build a string of quoted strings and then parse that back out to a list of strings. Just create a list and pass that:
command=["textextract.exe", "download.pdf", "/to", "download.txt"]
subprocess.run(command)
All that shlex.split is doing is creating a list by removing all of the quotes you had to add when creating the string in the first place. That's an extra step that provides no value over just creating the list yourself.
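If the return code is still non-zero, it helps to look at what the program actually reported. The sketch below uses the Python interpreter as a stand-in child process (an assumption, since textextract.exe may not be installed) that fails on purpose; the same pattern applies to the list built above. capture_output requires Python 3.7 or later.

```python
import subprocess
import sys

# Stand-in child process: writes a message to stderr and exits with status 3.
command = [sys.executable, "-c",
           "import sys; sys.stderr.write('no such file'); sys.exit(3)"]

# capture_output collects stdout and stderr; text=True decodes them to str.
result = subprocess.run(command, capture_output=True, text=True)

print("return code:", result.returncode)
print("stderr:", result.stderr)
```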
The original Problem - MCVE
The following script should use headless Chrome to print a page to PDF (I am running Windows 10 and Python 3.6):
import subprocess
from tempfile import NamedTemporaryFile
output = NamedTemporaryFile()
CHROME_PATH=r'"C:\Program Files (x86)\Google\Chrome\Application\chrome"'
chrome_args=[CHROME_PATH,
'--headless',
r'--print-to-pdf="{}"'.format(output.name),
'--disable-gpu',
'https://www.google.com/',]
subprocess.call(chrome_args,shell=True)
However, the generated file is just empty.
Attempt at debugging
To try to figure out what is going on, I adapted the script to the following:
import subprocess
CHROME_PATH=r'"C:\Program Files (x86)\Google\Chrome\Application\chrome"'
chrome_args=[CHROME_PATH,
'--headless',
r'--print-to-pdf="c:\Users\timmc\Documents\output.pdf"',
'--disable-gpu',
'https://www.google.com/',]
print(" ".join(chrome_args))  # For debugging
subprocess.call(chrome_args,shell=True)
In this case, there is just no file generated at the expected location. The result of the print is:
"C:\Program Files (x86)\Google\Chrome\Application\chrome" --headless --print-to-pdf="c:\Users\timmc\Documents\output.pdf" --disable-gpu https://www.google.com/
If I run the following (passing the whole command as a single raw string literal), everything works as expected and the file is produced.
subprocess.call(r'"C:\Program Files (x86)\Google\Chrome\Application\chrome" --headless --print-to-pdf="c:\Users\timmc\Documents\output.pdf" --disable-gpu https://www.google.com/', shell=True)
Having searched around on stack-overflow, and tried a few things, I still can’t get the original script to work. Any ideas?
Part of the problem is that I can't seem to get any meaningful debug from the subprocess call. Any help with that would also be much appreciated.
I'll try to answer instead of commenting again and again, but obviously I cannot test this.
The issue is mainly the forced double quotes combined with shell=True. Leaving the quoting to subprocess (also in CHROME_PATH) and splitting the arguments properly usually works. I've solved a lot of questions here with this technique.
Since your comments state that it does not, and that you found a workaround, let me suggest an improvement of this workaround: injecting the output filename in the command line that works:
subprocess.call(r'"C:\Program Files (x86)\Google\Chrome\Application\chrome" --headless --print-to-pdf="{}" --disable-gpu https://www.google.com/'.format(output.name), shell=True)
This is not satisfactory to me, but it has a good chance of working.
It turns out that the reason the subprocess wasn't running properly is that when Python creates a NamedTemporaryFile on Windows, it does so with a FILE_SHARE_DELETE flag, which prevents any other process from accessing the file unless that process also uses this flag. There is more discussion of this here.
Fortunately, Django comes with its own NamedTemporaryFile which was made to partially address this problem, and does so well enough for these purposes.
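As an alternative that needs no Django, one option is to let Python create only a temporary directory and have Chrome create the PDF inside it, so that Python never holds an open handle on the output file. This is an untested sketch: build_chrome_args and print_to_pdf are hypothetical helper names, the Chrome flags are the ones from the question, and the arguments are passed as a list without shell=True or manual quoting.

```python
import os
import subprocess
import tempfile


def build_chrome_args(chrome_path, pdf_path, url):
    """Build the argument list; no manual quoting is needed for a list."""
    return [
        chrome_path,
        "--headless",
        "--print-to-pdf={}".format(pdf_path),
        "--disable-gpu",
        url,
    ]


def print_to_pdf(chrome_path, url):
    """Render url to a PDF in a fresh temporary directory; return the path."""
    # Only the directory is created here, so Python keeps no handle on the PDF.
    out_dir = tempfile.mkdtemp()
    pdf_path = os.path.join(out_dir, "output.pdf")
    subprocess.run(build_chrome_args(chrome_path, pdf_path, url), check=True)
    return pdf_path
```

The caller is responsible for removing the directory afterwards, for example with shutil.rmtree.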
Using Python 2.7 on a Raspberry Pi B+, I want to call the command "raspistill -o image.jpg" from Python, and I found that this is the recommended way:
from subprocess import call
call(["raspistill","-o image.jpg"])
However, this doesn't work: image.jpg isn't created, although outside Python,
raspistill -o image.jpg
does create the file.
My next try was to create the image file first and write to it.
f = open("image.jpg","w")
call(["raspistill","-o image.jpg"], stdout = f)
Now the image file is created, but nothing is written to it: its size remains 0. So how can I get this to work?
Thank you.
You are passing -o image.jpg as a single argument. You should pass it as two separate arguments. Here is how:
call(["raspistill", "-o", "image.jpg"])
The way you did it is like calling raspistill "-o image.jpg" from the command line, which will likely result in an error.
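To see the difference, here is a small sketch that uses the Python interpreter as a stand-in for raspistill (an assumption, so that it runs anywhere) and prints the arguments the child process receives in each case:

```python
import subprocess
import sys

# Stand-in child that prints the list of arguments it received.
show_args = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

one = subprocess.check_output(show_args + ["-o image.jpg"],
                              universal_newlines=True)
two = subprocess.check_output(show_args + ["-o", "image.jpg"],
                              universal_newlines=True)

print(one.strip())  # ['-o image.jpg']
print(two.strip())  # ['-o', 'image.jpg']
```

In the first call the program sees one argument containing a space, which is not the option -o followed by a filename.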
First, you're creating and truncating the file image.jpg:
f = open("image.jpg","w")
Then you're sending raspistill's stdout to that same file:
call(["raspistill","-o image.jpg"], stdout = f)
When you eventually get around to close-ing the file in Python, now image.jpg is just going to hold whatever raspistill wrote to stdout. Or, if you never close it, it'll be that minus the last buffer, which may be nothing at all.
Meanwhile, you're also trying to get raspistill to create a file with the same name, by passing it as part of the -o argument. You're doing that wrong, as Ionut Hulub's answer explains. Some programs will take "-o image.jpg", "-oimage.jpg", and "-o", "image.jpg" as meaning the same thing; some won't. But even if this one does, at best you've now got two programs fighting over what file gets created and written as image.jpg.
If raspistill has an option to write the still to stdout, then you can use that option, together with passing stdout=f, and making sure to close the file. Or, if it has an option to write to a filename, then you can use that option. But doing both is not going to work.
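The stdout route would look roughly like the sketch below. It uses the Python interpreter as a stand-in child that writes bytes to its stdout (an assumption; substitute the real command and whatever option makes it write to stdout), and it is written for Python 3. The with block guarantees the file is closed after the child finishes.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in child that writes raw bytes to its stdout, as an image tool might.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'fake image bytes')"]

path = os.path.join(tempfile.mkdtemp(), "image.bin")

# Binary mode, since image data is not text; the with block closes the file.
with open(path, "wb") as f:
    subprocess.check_call(child, stdout=f)

print(os.path.getsize(path))
```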
If you don't know how to split the command, you can use shlex.split. For example,
>>> import shlex
>>> args = shlex.split('raspistill -o image.jpg')
>>> args
['raspistill', '-o', 'image.jpg']
>>> call(args)
I am trying to read a DICOM header tag from a DICOM file.
There are two ways to read this tag:
1) Using the pydicom package in Python, which apparently is not working well with my installed Python version (Python 3).
2) Calling the AFNI function 'dicom_hinfo' from the command line, which gives me the tag value. The syntax to call the AFNI function in a terminal is as follows:
dicom_hinfo -tag aaaa,bbbb filename.dcm
output:fgre
How should I call dicom_hinfo -tag aaaa,bbbb filename.dcm from a Python script?
I guess subprocess might work, but I'm not sure how to use it in this case.
To get output from a subprocess, you could use the check_output() function:
#!/usr/bin/env python
from subprocess import check_output
tag = check_output('dicom_hinfo -tag aaaa,bbbb filename.dcm'.split(),
                   universal_newlines=True).strip()
universal_newlines=True is used to get Unicode text on Python 3 (the data is decoded using user locale's character encoding).
check_output() assumes that dicom_hinfo prints to its standard output stream (stdout). Some utilities may print to stderr or the terminal directly instead. The code could be modified to adapt to that.
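For the stderr case, one possible adaptation is to merge stderr into the captured output with stderr=subprocess.STDOUT. The sketch below uses the Python interpreter as a stand-in child that prints to stderr (an assumption, since dicom_hinfo may not be installed); whether dicom_hinfo itself needs this is not something the code establishes.

```python
import subprocess
import sys

# Stand-in child that prints its result to stderr rather than stdout.
child = [sys.executable, "-c", "import sys; print('fgre', file=sys.stderr)"]

# stderr=subprocess.STDOUT redirects stderr into the captured stdout stream.
tag = subprocess.check_output(child, stderr=subprocess.STDOUT,
                              universal_newlines=True).strip()
print(tag)
```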
It turned out that this was due to a syntax error in how I was using pydicom.
I wanted to access the (0019,109c) tag.
The syntax should be:
ds[0x0019,0x109c].value
not ds[aaaa,bbbb].value
I am trying to run a system executable on UNIX with Python. I have used os.system() to do this, but I really need to use subprocess.call() instead. My os.system call is below:
os.system('gmsh default.msh_timestep%06d* animation_options.geo' %(timestep));
and works fine. It calls the program gmsh and gmsh reads a series of files specified in default.msh_timestep%06d*. I then try to do the equivalent thing with subprocess, but I get errors saying that the files are not there. Below is the subprocesses call:
call(["gmsh", "default.msh_timestep%06d*" %(timestep), "animation_options.geo"],shell=True);
Does anyone know what could be going on here? I'm admittedly a Python noob, so this might be a silly question.
Globbing is done by the shell for you. In Python, you need to do it yourself. You can use glob.glob to get the list of files that match the pattern:
import glob
from subprocess import call
call(["gmsh"] + glob.glob("default.msh_timestep%06d*" % (timestep,)) +
     ["animation_options.geo"])
If you want to use shell=True, pass a string instead of a list of strings:
call("gmsh default.msh_timestep%06d* animation_options.geo" % (timestep,), shell=True)
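To check what glob.glob hands to the program, the sketch below creates two empty files in a temporary directory and builds the argument list without running gmsh. The file suffixes and the timestep value are made up for illustration. glob.glob returns matches in arbitrary order, so the list is sorted here; if nothing matches it returns an empty list, which is worth checking before calling the program.

```python
import glob
import os
import tempfile

timestep = 7
workdir = tempfile.mkdtemp()

# Create two empty files whose names match the pattern from the question.
for suffix in ("_a.msh", "_b.msh"):
    name = "default.msh_timestep%06d%s" % (timestep, suffix)
    open(os.path.join(workdir, name), "w").close()

pattern = os.path.join(workdir, "default.msh_timestep%06d*" % timestep)
files = sorted(glob.glob(pattern))

# The expanded file names become separate arguments, as the shell would do.
command = ["gmsh"] + files + ["animation_options.geo"]
print(command)
```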