Python - shell command causes InterfaceError on file download

Recently, we replaced curl with aria2c in order to download files faster from our backend servers for later conversion to different formats.
Now for some reason we ran into the following issue with aria2c:
Pool callback raised exception: InterfaceError(0, '')
It's not clear to us where this InterfaceError occurs or what it actually means. Besides, we can run the same command manually without any problems.
Please also have a look at our download function:
from os import makedirs
from subprocess import run, CalledProcessError

def download_file(descriptor):
    """
    Creates the WORKING_DIR structure and downloads the descriptor.
    The descriptor should be a URI (processed via aria2c).
    Returns the created resource path.
    """
    makedirs(WORKING_DIR + 'output/', exist_ok=True)
    file_path = WORKING_DIR + decompose_uri(descriptor)['fileNameExt']
    print(file_path)
    try:
        print(descriptor)
        exec_command(f'aria2c -x16 "{descriptor}" -o "{file_path}"')
    except CalledProcessError as err:
        log('DEBUG', f'Aria2C error: {err.stderr}')
        raise VodProcessingException("Download failed. Aria2C error")
    return file_path

def exec_command(string):
    """
    Shell command interface.
    Returns returncode, stdout, stderr.
    """
    log('DEBUG', f'[Command] {string}')
    output = run(string, shell=True, check=True, capture_output=True)
    return output.returncode, output.stdout, output.stderr
Could the stdout here be misinterpreted by Python in a way that triggers this InterfaceError?
Thanks in advance

Since I only wanted aria2c to download files faster, as it supports multiple connections, I have now switched to a tool called "axel". It also supports multiple connections, but without the overhead aria2c has, at least in my situation.
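For reference, a minimal, untested sketch of how the download helper from the question might call axel instead; this assumes axel's -n (number of connections) and -o (output file) options:

def download_file(descriptor):
    """Same helper as above, but shelling out to axel instead of aria2c (sketch)."""
    makedirs(WORKING_DIR + 'output/', exist_ok=True)
    file_path = WORKING_DIR + decompose_uri(descriptor)['fileNameExt']
    try:
        # -n 16: use up to 16 connections, -o: write to the computed path
        exec_command(f'axel -n 16 -o "{file_path}" "{descriptor}"')
    except CalledProcessError as err:
        log('DEBUG', f'axel error: {err.stderr}')
        raise VodProcessingException("Download failed. axel error")
    return file_path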

Related

How should I pass username and password to Cassandra using a python script

I am new to python programming, and I am trying to build a script that takes a Cassandra metadata backup.
My script works fine when there is no authentication configured in the yaml file, but it fails when we turn authentication on.
This is the part where I am calling cqlsh.
with open(save_path + '/' + filename, 'w') as f:
    query_process = subprocess.Popen(['echo', query], stdout=subprocess.PIPE)
    cqlsh = subprocess.Popen(('/bin/cqlsh', host),
                             stdin=query_process.stdout, stdout=f)
    query_process.stdout.close()
return (save_path + filename)
It would be really helpful if anyone could help.
Depending on your configuration and deployment, there are a couple of options.
You might just pass them as command-line options to your Popen command.
Another alternative is to have them in a cqlshrc file, which is either read from the standard location (~/.cassandra/cqlshrc) or from an alternative path passed as another command-line option.
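A rough sketch of the first option, assuming the -u/--username and -p/--password flags of your cqlsh version (the dump_metadata wrapper name is just for illustration):

import subprocess

def dump_metadata(host, query, save_path, filename, username, password):
    # Pass the credentials straight on the cqlsh command line.
    with open(save_path + '/' + filename, 'w') as f:
        query_process = subprocess.Popen(['echo', query], stdout=subprocess.PIPE)
        cqlsh = subprocess.Popen(['/bin/cqlsh', host, '-u', username, '-p', password],
                                 stdin=query_process.stdout, stdout=f)
        query_process.stdout.close()
        cqlsh.wait()
    return save_path + '/' + filename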

Perfect Wrapper (in Python)

I run a configuration management tool which calls /usr/bin/dpkg, but does not show the stdout/stderr.
Something goes wrong and I want to debug the root of the problem.
I want to see all calls to dpkg and stdout/stderr.
I moved the original /usr/bin/dpkg to /usr/bin/dpkg-orig and wrote a wrapper:
#!/usr/bin/env python
import os
import sys
import datetime
import subx
import psutil

cmd = list(sys.argv)
cmd[0] = 'dpkg-orig'

def parents(pid=None):
    if pid == 1:
        return '\n'
    if pid is None:
        pid = os.getpid()
    process = psutil.Process(pid)
    lines = [parents(process.ppid())]
    lines.append('Parent: %s' % ' '.join(process.cmdline()))
    return '\n'.join(lines)

result = subx.call(cmd, assert_zero_exit_status=False)

with open('/var/tmp/dpkg-calls.log', 'ab') as fd:
    fd.write('----------- %s\n' % (datetime.datetime.now()))
    fd.write('%s\n' % parents())
    fd.write('stdout:\n%s\n\n' % result.stdout)
    sys.stdout.write(result.stdout)
    fd.write('stderr:\n%s\n' % result.stderr)
    fd.write('ret: %s\n' % result.ret)
    sys.stderr.write(result.stderr)

sys.exit(result.ret)
Then I ran the configuration management tool again and searched for non-zero "ret:" lines.
The output:
Parent: /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold -o DPkg::Options::=--force-confdef install openssl-foo-bar-aptguettler.cert
Parent: python /usr/bin/dpkg --force-confold --force-confdef --status-fd 67 --no-triggers --unpack --auto-deconfigure /var/cache/apt/archives/openssl-foo-bar-aptguettler.cert_1-2_all.deb
stdout:
stderr:
dpkg: error: unable to read filedescriptor flags for <package status and progress file descriptor>: Bad file descriptor
ret: 2
This happens because my wrapper is not perfect yet.
The tool which calls dpkg wants to read the file descriptor but this does not work with my wrapper.
My goal:
Capture all calls to dpkg and write them to a logfile (works)
Write out the parent processes (works)
The parent process of dpkg should not notice a difference and not fail like above (does not work yet).
Any idea how to achieve this?
I wrote a simple python script which solves this:
https://github.com/guettli/wrap_and_log_calls
Wrapper to log all calls to a Linux command.
Particular use case: My configuration management tool calls /usr/bin/dpkg. An error occurs, but unfortunately my configuration management tool does not show me the whole stdout/stderr. I have no clue what's wrong.
General use case: Wrap a Linux command like /usr/bin/dpkg and write out all calls to it.
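Not the script from that repository, but a minimal sketch of the fd-passing idea for Python 3: detect dpkg's --status-fd argument and keep that descriptor open in the child via pass_fds, so the caller does not hit the "Bad file descriptor" error shown above.

#!/usr/bin/env python3
import subprocess
import sys

cmd = ['/usr/bin/dpkg-orig'] + sys.argv[1:]

# Keep any descriptors named via --status-fd open for the real dpkg.
extra_fds = []
args = sys.argv[1:]
for i, arg in enumerate(args):
    if arg == '--status-fd' and i + 1 < len(args):
        extra_fds.append(int(args[i + 1]))
    elif arg.startswith('--status-fd='):
        extra_fds.append(int(arg.split('=', 1)[1]))

result = subprocess.run(cmd, capture_output=True, pass_fds=extra_fds)

with open('/var/tmp/dpkg-calls.log', 'ab') as fd:
    fd.write(b'----------- cmd: %s\n' % ' '.join(cmd).encode())
    fd.write(b'stdout:\n%s\n' % result.stdout)
    fd.write(b'stderr:\n%s\n' % result.stderr)
    fd.write(b'ret: %d\n' % result.returncode)

sys.stdout.buffer.write(result.stdout)
sys.stderr.buffer.write(result.stderr)
sys.exit(result.returncode)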

python subprocess.call not finding specified file from windows task scheduler

I have a python script that uses subprocess.call() to call openssl, and it runs fine from the command line.
However, when I run it through the Windows Task Scheduler, the job fails with
WinError 2: The specified file was not found
The same job runs fine from Task Scheduler when subprocess.call() is removed.
I tried replacing the openssl command with a simple cp copy, but it still shows the same error.
Note:
1. I'm using python v3.6
2. The job is set to run using the SYSTEM account with highest privileges.
3. After searching the net, I included shell=True as well; no luck. The only difference is that it suppressed the error message in the log.
Here is the part of the code:
infilepath = str(r'C:\Test\filename.txt.bin')
outfilepath = str(r'C:\Test\filename.txt')
deckeyfile = str(r'C:\Test\decryptionkey.key')

#decrypt the file
try:
    subprocess.call(["openssl", "cms", "-decrypt", "-inform", "DER",
                     "-in", infilepath, "-binary",
                     "-inkey", deckeyfile, "-out", outfilepath], shell=True)
    #subprocess.call(["cp", infilepath, outfilepath])
    decryptcount += 1
except Exception as e:
    module_logger.error("Failed to decrypt with error: %s", str(e), exc_info = True)
    errorcount += 1
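One thing worth checking (an assumption on my part, not something stated above) is whether the SYSTEM account's PATH even contains the OpenSSL directory; spelling out the full path to the executable sidesteps that lookup entirely. A rough sketch:

import subprocess

# Hypothetical install location; point this at wherever openssl.exe actually lives.
openssl_exe = r'C:\Program Files\OpenSSL-Win64\bin\openssl.exe'

infilepath = r'C:\Test\filename.txt.bin'
outfilepath = r'C:\Test\filename.txt'
deckeyfile = r'C:\Test\decryptionkey.key'

subprocess.call([openssl_exe, "cms", "-decrypt", "-inform", "DER",
                 "-in", infilepath, "-binary",
                 "-inkey", deckeyfile, "-out", outfilepath])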

How to check whether a shell command returned nothing or something

I am writing a script to extract something from a specified path. I am returning those values into a variable. How can I check whether the shell command has returned something or nothing?
My Code:
import ConfigParser
from fabric.api import env, run, settings, hide

def any_HE():
    global config, logger, status, file_size
    config = ConfigParser.RawConfigParser()
    config.read('config2.cfg')
    for section in sorted(config.sections(), key=str.lower):
        components = dict()  # start with empty dictionary for each section
        # Retrieving the username and password from config for each section
        if not config.has_option(section, 'server.user_name'):
            continue
        env.user = config.get(section, 'server.user_name')
        env.password = config.get(section, 'server.password')
        host = config.get(section, 'server.ip')
        print "Trying to connect to {} server.....".format(section)
        with settings(hide('warnings', 'running', 'stdout', 'stderr'),
                      warn_only=True, host_string=host):
            try:
                files = run('ls -ltr /opt/nds')
                if files != 0:
                    print '{}--Something'.format(section)
                else:
                    print '{} --Nothing'.format(section)
            except Exception as e:
                print e
I tried checking for 1 or 0 and True or False, but nothing seems to work. On some servers the path '/opt/nds/' does not exist, so in that case nothing is assigned to files. I want to differentiate between something being returned to files and nothing being returned.
First, you're hiding stdout.
If you get rid of that, you'll get a string with the output of the command on the remote host. You can then split it by os.linesep (assuming the same platform), but you should also take care of other things like SSH banners and colours in the retrieved output.
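A sketch of what that check might look like, assuming Fabric 1.x, where run() returns a string-like result carrying .succeeded and .return_code attributes (the host string below is a placeholder):

from fabric.api import run, settings, hide

host = 'user@server-from-config'  # placeholder host string

with settings(hide('warnings', 'running', 'stderr'), warn_only=True, host_string=host):
    result = run('ls -ltr /opt/nds')
    # result behaves like a string; empty output means "nothing" was found
    if result.succeeded and result.strip():
        print('Something: {} line(s)'.format(len(result.splitlines())))
    else:
        print('Nothing (path missing or empty listing)')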
As perror commented already, the python subprocess module offers the right tools.
https://docs.python.org/2/library/subprocess.html
For your specific problem you can use the check_output function.
The documentation gives the following example:
import subprocess
subprocess.check_output(["echo", "Hello World!"])
gives "Hello World"
plumbum is a great library for running shell commands from a python script. E.g.:
from plumbum.cmd import ls
from plumbum import ProcessExecutionError

cmd = ls['-ltr']['/opt/nds']  # construct the command
try:
    files = cmd().splitlines()  # run the command
    if ...:
        print ...
except ProcessExecutionError:
    # command exited with a non-zero status code
    ...
On top of this basic usage (and unlike the subprocess module), it also supports things like output redirection and command pipelining, and more, with easy, intuitive syntax (by overloading python operators, such as '|' for piping).
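For instance, a quick sketch of that piping and redirection syntax (assuming plumbum is installed and ls/grep exist on the box; a non-zero exit still raises ProcessExecutionError):

from plumbum.cmd import ls, grep

pipeline = ls['-ltr', '/opt/nds'] | grep['conf']   # build a pipeline with '|'
output = pipeline()                                # run it and capture stdout
(ls['-ltr', '/opt/nds'] > '/tmp/listing.txt')()    # redirect stdout to a file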
In order to get more control of the process you run, you need to use the subprocess module.
Here is an example of code:
import subprocess
task = subprocess.Popen(['ls', '-ltr', '/opt/nds'], stdout=subprocess.PIPE)
print task.communicate()
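To actually answer the original question with this approach, one might (as a sketch) look at both the exit status and whether anything arrived on stdout:

import subprocess

task = subprocess.Popen(['ls', '-ltr', '/opt/nds'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = task.communicate()

if task.returncode == 0 and stdout.strip():
    print('Something: {} byte(s) of output'.format(len(stdout)))
else:
    print('Nothing (non-zero exit or empty output)')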

PDFminer gives strange letters

I am using python2.7 and PDFMiner for extracting text from pdf. I noticed that sometimes PDFMiner gives me words with strange letters, but pdf viewers don't. Also, for some pdf docs the result returned by PDFMiner and other pdf viewers is the same (strange), but there are docs where pdf viewers can recognize the text (copy-paste works). Here is an example of the returned values:
from pdf viewer: ‫فتــح بـــاب ا�ستيــراد البيــ�ض والدجــــاج المجمـــد‬
from PDFMiner: óªéªdG êÉ````LódGh ¢†``«ÑdG OGô``«à°SG ÜÉH í``àa
So my question is: can I get the same result as the pdf viewer, and what is wrong with PDFMiner? Is it missing encodings I don't know about?
Yes.
This will happen when custom font encodings have been used (e.g. Identity-H, Identity-V, etc.) but the fonts have not been embedded properly.
pdfminer gives garbage output in such cases because an encoding is required to interpret the text.
Maybe the PDF file you are trying to read has an encoding not yet supported by pdfMiner.
I had a similar problem last month and finally solved it by using a Java library named "pdfBox" and calling it from python. The pdfBox library supported the encoding that I needed and worked like a charm!
First I downloaded pdfbox from the official site
and then referenced the path to the .jar file from my code.
Here is a simplified version of the code I used (untested, but based on my original tested code).
You will need subprocess32, which you can install by calling pip install subprocess32
import subprocess32 as subprocess
import os
import tempfile


class RunnableError(Exception):
    """Error type used by the original code; assumed to be a plain custom exception."""
    pass


def extractPdf(file_path, pdfboxPath, timeout=30, encoding='UTF-8'):
    #tempfile = temp_file(data, suffix='.pdf')
    try:
        command_args = ['java', '-jar', os.path.expanduser(pdfboxPath), 'ExtractText',
                        '-console', '-encoding', encoding, file_path]
        status, stdout, stderr = external_process(command_args, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise RunnableError('PDFBox timed out while processing document')
    finally:
        pass  # os.remove(tempfile)

    if status != 0:
        raise RunnableError('PDFBox returned error status code {0}.\nPossible error:\n{1}'.format(status, stderr))

    # We can use result from PDFBox directly, no manipulation needed
    pdf_plain_text = stdout
    return pdf_plain_text


def external_process(process_args, input_data='', timeout=None):
    process = subprocess.Popen(process_args,
                               stdout=subprocess.PIPE,
                               stdin=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    try:
        (stdout, stderr) = process.communicate(input_data, timeout)
    except subprocess.TimeoutExpired as e:
        # cleanup process
        # see https://docs.python.org/3.3/library/subprocess.html?highlight=subprocess#subprocess.Popen.communicate
        process.kill()
        process.communicate()
        raise e
    exit_status = process.returncode
    return (exit_status, stdout, stderr)


def temp_file(data, suffix=''):
    handle, file_path = tempfile.mkstemp(suffix=suffix)
    f = os.fdopen(handle, 'w')
    f.write(data)
    f.close()
    return file_path


if __name__ == '__main__':
    text = extractPdf(filename, 'pdfbox-app-2.0.3.jar')  # filename: path to the PDF to extract
This code was not entirely written by me. I followed the suggestions of other stack overflow answers, but it was a month ago, so I lost the original sources. If anyone finds the original posts where I got the pieces of this code, please let me know, so I can give them their deserved credit for the code.
