I have been trying to append the output of a command to a temporary file in Python and then do some operations on it, but I cannot get the data into the temporary file. Any help is appreciated! My sample code follows the error message.
I get this error:
with open(temp1 , 'r') as f:
TypeError: expected str, bytes or os.PathLike object, not _TemporaryFileWrapper
import tempfile
import os
temp1 = tempfile.NamedTemporaryFile()
os.system("echo Hello world | tee temp1")
with open(temp1 , 'r') as f:
    a = f.readlines()[-1]
print(a)
import tempfile
import os
# Opening in update-text mode to avoid encoding the data written to it
temp1 = tempfile.NamedTemporaryFile("w+")
# popen opens a pipe from the command, allowing one to capture its output
output = os.popen("echo Hello world")
# Write the command output to the temporary file
temp1.write(output.read())
# Reset the stream position at the beginning of the file, if you want to read its contents
temp1.seek(0)
print(temp1.read())
Check out subprocess.Popen for more powerful subprocess communication.
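As a rough illustration of the subprocess.Popen route mentioned above, the sketch below captures a command's output through a pipe and writes it to the temporary file. The helper name run_and_store is made up for this example, and the child command is a Python one-liner (run via sys.executable) standing in for echo so that the sketch does not depend on a shell. It assumes Python 3.7 or later for the text=True argument.

```python
import subprocess
import sys
import tempfile


def run_and_store(cmd):
    """Run cmd, write its stdout to a temporary file, and return the file's last line."""
    with tempfile.NamedTemporaryFile("w+") as tmp:
        # stdout=PIPE lets us collect the output; text=True decodes it to str
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        out, _ = proc.communicate()  # waits for the command to finish
        tmp.write(out)
        tmp.seek(0)  # rewind before reading the file back
        return tmp.readlines()[-1]


# Stand-in for "echo Hello world" (an assumption made for portability)
last_line = run_and_store([sys.executable, "-c", "print('Hello world')"])
print(last_line)
```

Note that the file object itself is used for writing and reading here; if a path is needed instead, it is available as the name attribute of the object returned by NamedTemporaryFile.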
Whatever you're trying to do isn't right. It appears that you are trying to have a system call write to a file, and then you want to read that file in your Python code. You're creating a temporary file, but then your system call is writing to a statically named file, named 'temp1' rather than to the temporary file you've opened. So it's unclear if you want/need to use a computed temporary file name or if using temp1 is OK. The easiest way to fix your code to do what I think you want is like this:
import os
os.system("echo Hello world | tee temp1")
with open('temp1' , 'r') as f:
    a = f.readlines()[-1]
print(a)
If you need to create a temporary file name in your situation, then you have to be careful if you are at all concerned about security or thread safety. What you really want to do is have the system create a temporary directory for you, and then create a statically named file in that directory. Here's your code reworked to do that:
import tempfile
import os
with tempfile.TemporaryDirectory() as tmp_dir:
    # Use a name other than "tempfile" so the module is not shadowed
    temp_path = os.path.join(tmp_dir, "temp1")
    os.system("echo Hello world > " + temp_path)
    with open(temp_path) as f:
        buf = f.read()
print(buf)
This method has the added benefit of automatically cleaning up for you.
UPDATE: I have now seen @UlisseBordingnon's answer. That's a better solution overall. Using os.system() is discouraged. I would have gone a slightly different way and used the subprocess module, but what they suggest is 100% valid, and is thread-safe and secure. I'll leave my answer here in case you or other readers need to use os.system() or otherwise need the shell process you execute to write directly to a file.
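If the output does need to land in a file on disk, the temporary-directory idea also works without os.system(). The following is only a sketch: capture_in_temp_dir is a hypothetical helper name, and the child command is a Python one-liner standing in for echo so that no shell is required.

```python
import os
import subprocess
import sys
import tempfile


def capture_in_temp_dir(cmd):
    """Run cmd with stdout sent to a file inside a fresh temporary directory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "temp1")
        with open(path, "w") as out:
            # The child process writes straight to the open file
            subprocess.run(cmd, stdout=out, check=True)
        with open(path) as f:
            return f.read()
    # The directory and the file are removed when the with block exits


output = capture_in_temp_dir([sys.executable, "-c", "print('Hello world')"])
print(output)
```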
As others have suggested, you should use the subprocess module instead of os.system. Within subprocess, you can use its most recent interface, subprocess.run (added in Python 3.5).
The neat thing about using .run is that you can pass an open file object (anything backed by a real file descriptor) as stdout, and the command's output is redirected to that file.
import tempfile
import subprocess
with tempfile.NamedTemporaryFile("w+") as f:
    subprocess.run(["echo", "hello world"], stdout=f)
    # command has finished running, let's check the file
    f.seek(0)
    print(f.read())
    # hello world
If you are using Python 3.7 or later (as most of us are), then subprocess.run with capture_output is better because you do not need a temporary file:
import subprocess
completed_process = subprocess.run(
    ["echo", "hello world"],
    capture_output=True,
    encoding="utf-8",
)
print(completed_process.stdout)
Notes
The capture_output parameter tells run() to save the output to the .stdout and .stderr attributes
The encoding parameter will convert the output from bytes to string
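To make the two notes concrete, here is a small sketch that runs the same child command with and without encoding. The child is a hypothetical Python one-liner (not part of the original answer) that writes one line to each stream.

```python
import subprocess
import sys

# Hypothetical child command that writes to both stdout and stderr
child_code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

# With encoding set, .stdout and .stderr are str
completed_process = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    encoding="utf-8",
)
stdout_text = completed_process.stdout
stderr_text = completed_process.stderr

# Without encoding, the captured output stays as bytes
raw_process = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
)

print(repr(stdout_text), repr(stderr_text), type(raw_process.stdout))
```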
Depending on your needs, if you only print your output, a quicker way (though maybe not exactly what you are looking for) is to redirect the output to a file at the command-line level.
Example (egfile.py):
import os
os.system("echo Hello world")
At command level you can simply do:
python egfile.py > file.txt
The output of the script will be redirected to the file instead of the screen.
Related
I can successfully redirect my output to a file, however this appears to overwrite the file's existing data:
import subprocess
outfile = open('test','w') #same with "w" or "a" as opening mode
outfile.write('Hello')
subprocess.Popen('ls',stdout=outfile)
will remove the 'Hello' line from the file.
I guess a workaround is to store the output elsewhere as a string or something (it won't be too long), and append this manually with outfile.write(thestring) - but I was wondering if I am missing something within the module that facilitates this.
You certainly can append the output of subprocess.Popen to a file, and I do so daily. Here's how I do it:
log = open('some file.txt', 'a') # so that data written to it will be appended
c = subprocess.Popen(['dir', '/p'], stdout=log, stderr=log, shell=True)
(of course, this is a dummy example, I'm not using subprocess to list files...)
By the way, other objects behaving like file (with write() method in particular) could replace this log item, so you can buffer the output, and do whatever you want with it (write to file, display, etc) [but this seems not so easy, see my comment below].
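Note: what may be misleading is that the subprocess output can end up in the file before the text you wrote earlier. This happens because Python buffers your write() calls, while the child process writes straight to the underlying file descriptor. So, here's the way to use this: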
log = open('some file.txt', 'a')
log.write('some text, as header of the file\n')
log.flush() # <-- here's something not to forget!
c = subprocess.Popen(['dir', '/p'], stdout=log, stderr=log, shell=True)
So the hint is: do not forget to flush the output!
Well, the problem is that if you want the header to actually come first, you need to flush before the rest of the output is written to the file :D
Is the data in the file really overwritten? On my Linux host I see the following behavior:
1) Running your code in a separate directory gives:
$ cat test
test
test.py
test.py~
Hello
2) If I add outfile.flush() after outfile.write('Hello'), the result is slightly different:
$ cat test
Hello
test
test.py
test.py~
But the output file has Hello in both cases. Without an explicit flush() call, the file's write buffer is only flushed when the Python process terminates.
Where is the problem?
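The ordering discussed in these answers can be reproduced with a short sketch. The helper name append_command_output is invented for illustration, and the child command is a Python one-liner rather than ls or dir so that it behaves the same across platforms.

```python
import os
import subprocess
import sys
import tempfile


def append_command_output(path, header, cmd):
    """Append header, then the output of cmd, to the file at path."""
    with open(path, "a") as log:
        log.write(header)
        log.flush()  # push the header out of Python's buffer before the child writes
        subprocess.run(cmd, stdout=log, check=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "test")
    child = [sys.executable, "-c", "print('child output')"]
    # Two calls in a row: the second one appends rather than overwrites
    append_command_output(log_path, "Hello\n", child)
    append_command_output(log_path, "Hello again\n", child)
    with open(log_path) as f:
        log_lines = f.read().splitlines()

print(log_lines)
```

Opening the file in "a" mode is what keeps earlier content, and the flush() call is what keeps each header ahead of the child's output.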
I am attempting to cat a CSV file into stdout and then pipe the printed output as input into a python program that also takes a system argument vector with 1 argument. I ran into an issue I think directly relates to how Python's fileinput.input() function reacts with regards to occupying the stdin file descriptor.
generic_user% cat my_data.csv | python3 my_script.py myarg1
Here is a sample Python program:
import sys, fileinput
def main(argv):
    print("The program doesn't even print this")
    data_list = []
    for line in fileinput.input():
        data_list.append(line)
if __name__ == "__main__":
    main(sys.argv)
If I attempt to run this sample program with the above terminal command and no argument myarg1, the program is able to evaluate and parse the stdin for the data output from the CSV file.
If I run the program with the argument myarg1, it will end up throwing a FileNotFoundError directly related to myarg1 not existing as a file.
FileNotFoundError: [Errno 2] No such file or directory: 'myarg1'
Would someone be able to explain in detail why this behavior takes place in Python and how to handle the logic such that a Python program can first handle stdin data before argv overwrites the stdin descriptor?
You can read from the stdin directly:
import sys
def main(argv):
    print("The program doesn't even print this")
    data_list = []
    for line in iter(sys.stdin):
        data_list.append(line)
if __name__ == "__main__":
    main(sys.argv)
fileinput treats myarg1 as a file name, and since no such file exists it cannot open it. Because you are piping the data in, you do not need fileinput at all.
This is by design. The designers of fileinput decided there were use cases where reading from stdin would make no sense, so they provided a way to add stdin to the list of files explicitly. According to the reference documentation:
import fileinput
for line in fileinput.input():
    process(line)
This iterates over the lines of all files listed in sys.argv[1:], defaulting to sys.stdin if the list is empty. If a filename is '-', it is also replaced by sys.stdin.
Just keep your code and use: generic_user% cat my_data.csv | python3 my_script.py - myarg1
to read stdin before the myarg1 file, or, if you want to read it afterwards: ... python3 my_script.py myarg1 -
fileinput implements a pattern common for Unix utilities:
If the utility is called with commandline arguments, they are files to read from.
If it is called with no arguments, read from standard input.
So fileinput works exactly as intended. It is not clear what you are using commandline arguments for, but if you don't want to stop using fileinput, you should modify sys.argv before you invoke it.
import fileinput
import sys
some_keyword = sys.argv[1]
sys.argv = sys.argv[:1] # Retain only argument 0, the command name
for line in fileinput.input():
    ...
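An alternative to editing sys.argv is to tell fileinput explicitly which inputs to read through its files parameter, so that the command-line arguments are never consulted. The sketch below simulates the piped invocation from the question: the source of a hypothetical my_script.py is run as a child process with myarg1 as its argument and CSV text on its standard input. The script body and the sample data are assumptions made for this demonstration.

```python
import subprocess
import sys

# Source of a hypothetical my_script.py: the first argument is a keyword,
# and the data always comes from standard input ("-").
child_source = """
import fileinput
import sys

keyword = sys.argv[1]
data_list = [line.rstrip("\\n") for line in fileinput.input(files=("-",))]
print(keyword, len(data_list))
"""

# Equivalent in spirit to: cat my_data.csv | python3 my_script.py myarg1
result = subprocess.run(
    [sys.executable, "-c", child_source, "myarg1"],
    input="a,b\n1,2\n",
    capture_output=True,
    text=True,
    check=True,
)
script_output = result.stdout.strip()
print(script_output)
```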
I am working with some fairly large gzipped text files that I have to unzip, edit and re-zip. I use Python's gzip module for unzipping and zipping, but I have found that my current implementation is far from optimal:
input_file = gzip.open(input_file_name, 'rb')
output_file = gzip.open(output_file_name, 'wb')
for line in input_file:
    # Edit line and write to output_file
This approach is unbearably slow – probably because there is a huge overhead involved in doing per line iteration with the gzip module: I initially also run a line-count routine where I - using the gzip module - read chunks of the file and then count the number of newline chars in each chunk and that is very fast!
So one of the optimizations should definitely be to read my files in chunks and then only do per line iterations once the chunks have been unzipped.
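A minimal sketch of that chunk-then-split idea follows. It is written for Python 3 (the question itself uses Python 2.7), the function name iter_lines_chunked is made up for this example, and whether it is actually faster than plain line iteration depends on the Python version and the data, so it should be measured rather than assumed.

```python
import gzip
import os
import tempfile


def iter_lines_chunked(path, chunk_size=1024 * 1024):
    """Yield complete lines (as bytes) from a gzip file, decompressing in chunks."""
    leftover = b""
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            pieces = (leftover + chunk).split(b"\n")
            leftover = pieces.pop()  # last piece may be an incomplete line
            for piece in pieces:
                yield piece + b"\n"
    if leftover:
        yield leftover  # final line without a trailing newline


# Small demonstration with a tiny chunk size to force lines to span chunks
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt.gz")
    with gzip.open(sample_path, "wb") as f:
        f.write(b"first\nsecond\nthird")
    chunked_lines = list(iter_lines_chunked(sample_path, chunk_size=4))

print(chunked_lines)
```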
As an additional optimization, I have seen a few suggestions to unzip in a shell command via subprocess. Using this approach, the equivalent of the first line in the above could be:
from subprocess import Popen, PIPE
file_input = Popen(["zcat", fastq_filename], stdout=PIPE)
input_file = file_input.stdout
Using this approach input_file becomes a file-like object. I don't know exactly how it is different to a real file object in terms of available attributes and methods, but one difference is that you obviously cannot use seek since it is a stream rather than a file.
This does run faster, and the claim is that it should, unless you run your script on a single-core machine. That must mean the decompression, running as a separate process, can be scheduled on a different core where possible, but I am no expert there.
So now to my current problem: I would like to zip my output in a similar fashion. That is, instead of using Python's gzip module, I would like to pipe it to a subprocess and then call the shell gzip. This way I could potentially get reading, editing and writing on separate cores, which sounds wildly effective to me.
I have made a puny attempt at this, but attempting to write to output_file resulted in an empty file. Initially, I create an empty file using the touch command because Popen fails if the file does not exist:
call('touch ' + output_file_name, shell=True)
output = Popen(["gzip", output_file_name], stdin=PIPE)
output_file = output.stdin
Any help is greatly appreciated, I am using Python 2.7 by the way. Thanks.
Here is a working example of how this can be done:
#!/usr/bin/env python
from subprocess import Popen, PIPE
output = ['this', 'is', 'a', 'test']
output_file_name = 'pipe_out_test.txt.gz'
gzip_output_file = open(output_file_name, 'wb', 0)
output_stream = Popen(["gzip"], stdin=PIPE, stdout=gzip_output_file) # If gzip is supported
for line in output:
    output_stream.stdin.write(line + '\n')
output_stream.stdin.close()
output_stream.wait()
gzip_output_file.close()
If our script only wrote to console and we wanted the output zipped, a shell command equivalent of the above could be:
script_that_writes_to_console | gzip > output.txt.gz
You meant output_file = gzip_process.stdin. After that you can use output_file as you've used gzip.open() object previously (no-seeking).
If the result file is empty then check that you call output_file.close() and gzip_process.wait() at the end of your Python script. Also, the usage of gzip may be incorrect: if gzip writes the compressed output to its stdout then pass stdout=gzip_output_file where gzip_output_file = open(output_file_name, 'wb', 0).
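For readers on Python 3, the same approach might look like the sketch below. Pipes carry bytes there, so each line is encoded before it is written. The function name write_gzip_via_subprocess is hypothetical, the -c flag asks gzip to write the compressed data to its standard output, and the demonstration is skipped when no gzip executable is found on the PATH.

```python
import gzip
import os
import shutil
import subprocess
import tempfile


def write_gzip_via_subprocess(lines, output_file_name):
    """Compress lines by piping them through an external gzip process."""
    with open(output_file_name, "wb") as gzip_output_file:
        gzip_process = subprocess.Popen(
            ["gzip", "-c"], stdin=subprocess.PIPE, stdout=gzip_output_file
        )
        for line in lines:
            gzip_process.stdin.write((line + "\n").encode("utf-8"))
        gzip_process.stdin.close()  # signals end of input to gzip
        return gzip_process.wait()  # wait until gzip has written everything


status = None
round_trip = None  # stays None when no gzip executable is found
if shutil.which("gzip"):
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = os.path.join(tmp_dir, "pipe_out_test.txt.gz")
        status = write_gzip_via_subprocess(["this", "is", "a", "test"], target)
        # Read the result back with the gzip module to check it
        with gzip.open(target, "rt") as f:
            round_trip = f.read().split()

print(status, round_trip)
```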
I have a folder containing lots of files like file_1.gz to file_250.gz and increasing.
A zgrep command which searches through them is like:
zgrep -Pi "\"name\": \"bob\"" ../../LM/DATA/file_*.gz
I want to execute this command in a python subprocess like:
out_file = os.path.join(out_file_path, file_name)
search_command = ['zgrep', '-Pi', '"name": "bob"', '../../LM/DATA/file_*.gz']
process = subprocess.Popen(search_command, stdout=out_file)
The problem is the out_file is created but it is empty and these errors are raised:
<type 'exceptions.AttributeError'>
'str' object has no attribute 'fileno'
What is the solution?
You need to pass a file object:
process = subprocess.Popen(search_command, stdout=open(out_file, 'w'))
Citing the manual, emphasis mine:
stdin, stdout and stderr specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None. PIPE indicates that a new pipe to the child should be created. With the default settings of None, no redirection will occur; the child’s file handles will be inherited from the parent.
Combined with LFJ's answer - using the convenience functions is recommended, and you need to use shell=True to make the wildcard (*) work:
subprocess.call(' '.join(search_command), stdout=open(out_file, 'w'), shell=True)
Or when you're using shell anyways, you can use the shell redirection as well:
subprocess.call("%s > %s" % (' '.join(search_command), out_file), shell=True)
There are two issues:
you should pass something with a valid .fileno() method instead of the filename
the shell expands * but subprocess does not invoke the shell unless you ask. You could use glob.glob() to expand the file patterns manually.
Example:
#!/usr/bin/env python
import os
from glob import glob
from subprocess import check_call
search_command = ['zgrep', '-Pi', '"name": "bob"']
out_path = os.path.join(out_file_path, file_name)
with open(out_path, 'wb', 0) as out_file:
    check_call(search_command + glob('../../LM/DATA/file_*.gz'),
               stdout=out_file)
If you want to execute a shell command and get its output, try subprocess.check_output(). It is very simple, and you can save the output to a file easily.
command_output = subprocess.check_output(your_search_command, shell=True)
with open(out_file, 'a') as f:
    f.write(command_output)
My problem consists of two parts:
The first part is answered by @liborm as well.
The second part is related to the files that zgrep tries to search. When we write a command like zgrep "pattern" path/to/files/*.gz, bash automatically replaces *.gz with all files ending in .gz. When I run the command in a subprocess, nothing replaces *.gz with the real files, and consequently the error gzip: ../../LM/DATA/file_*.gz: No such file or directory is raised. So I solved it with:
for file in os.listdir(archive_files_path):
    if file.endswith(".gz"):
        search_command.append(os.path.join(archive_files_path, file))
I'm making a call to a program from the shell using the subprocess module that outputs a binary file to STDOUT.
I use Popen() to call the program and then I want to pass the stream to a function in a Python package (called "pysam") that unfortunately cannot read Python file objects, but can read from STDIN. So what I'd like to do is have the output of the shell command go from STDOUT into STDIN.
How can this be done from within Popen/subprocess module? This is the way I'm calling the shell program:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
This will read "my_cmd"'s STDOUT output and get a stream to it in p. Since my Python module cannot read from "p" directly, I am trying to redirect STDOUT of "my_cmd" back into STDIN using:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True).stdout
I then call my module, which uses "-" as a placeholder for STDIN:
s = pysam.Samfile("-", "rb")
The above call just means read from STDIN (denoted "-") and read it as a binary file ("rb").
When I try this, I just get binary output sent to the screen, and it doesn't look like the Samfile() function can read it. This occurs even if I remove the call to Samfile, so I think it's my call to Popen that is the problem and not downstream steps.
EDIT: In response to answers, I tried:
sys.stdin = subprocess.Popen(tagBam_cmd, stdout=subprocess.PIPE, shell=True).stdout
print "Opening SAM.."
s = pysam.Samfile("-","rb")
print "Done?"
sys.stdin = sys.__stdin__
This seems to hang. I get the output:
Opening SAM..
but it never gets past the Samfile("-", "rb") line. Any idea why?
Any idea how this can be fixed?
EDIT 2: I am adding a link to Pysam documentation in case it helps, I really cannot figure this out. The documentation page is:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html
and the specific note about streams is here:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams
In particular:
"""
Pysam does not support reading and writing from true python file objects, but it does support reading and writing from stdin and stdout. The following example reads from stdin and writes to stdout:
infile = pysam.Samfile( "-", "r" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
It will also work with BAM files. The following script converts a BAM formatted file on stdin to a SAM formatted file on stdout:
infile = pysam.Samfile( "-", "rb" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
Note, only the file open mode needs to changed from r to rb.
"""
So I simply want to take the stream coming from Popen, which reads stdout, and redirect that into stdin, so that I can use Samfile("-", "rb") as the above section states is possible.
thanks.
I'm a little confused that you see binary on stdout if you are using stdout=subprocess.PIPE, however, the overall problem is that you need to work with sys.stdin if you want to trick pysam into using it.
For instance:
sys.stdin = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
s = pysam.Samfile("-", "rb")
sys.stdin = sys.__stdin__ # restore original stdin
UPDATE: This assumed that pysam is running in the context of the Python interpreter and thus means the Python interpreter's stdin when "-" is specified. Unfortunately, it doesn't; when "-" is specified it reads directly from file descriptor 0.
In other words, it is not using Python's concept of stdin (sys.stdin) so replacing it has no effect on pysam.Samfile(). It also is not possible to take the output from the Popen call and somehow "push" it on to file descriptor 0; it's readonly and the other end of that is connected to your terminal.
The only real way to get that output onto file descriptor 0 is to just move it to an additional script and connect the two together from the first. That ensures that the output from the Popen in the first script will end up on file descriptor 0 of the second one.
So, in this case, your best option is to split this into two scripts. The first one will invoke my_cmd and take the output of that and use it for the input to a second Popen of another Python script that invokes pysam.Samfile("-", "rb").
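The two-script arrangement can be sketched as follows. Neither my_cmd nor pysam is used here: a Python one-liner stands in for the producing command, and the second "script" is a short inline program that simply reads everything arriving on its own standard input (file descriptor 0) and reports how many lines it saw. Both stand-ins are assumptions made to keep the example self-contained.

```python
import subprocess
import sys

# Stand-in for my_cmd: writes two records to its stdout
producer_cmd = [
    sys.executable, "-c",
    "import sys; sys.stdout.write('record1\\nrecord2\\n')",
]

# Stand-in for the second script; a real one would call pysam.Samfile("-", "rb")
consumer_source = (
    "import sys; data = sys.stdin.buffer.read(); print(len(data.splitlines()))"
)

producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
consumer = subprocess.Popen(
    [sys.executable, "-c", consumer_source],
    stdin=producer.stdout,  # producer's output becomes the consumer's fd 0
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # the consumer now holds the only read end of the pipe
consumer_output, _ = consumer.communicate()
producer.wait()

print(consumer_output)
```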
In the specific case of dealing with pysam, I was able to work around the issue using a named pipe (http://docs.python.org/library/os.html#os.mkfifo), which is a pipe that can be accessed like a regular file. In general, you want the consumer (reader) of the pipe to listen before you start writing to the pipe, to ensure you don't miss anything. However, pysam.Samfile("-", "rb") will hang as you noted above if nothing is already registered on stdin.
Assuming you're dealing with a prior computation that takes a decent amount of time (e.g. sorting the bam before passing it into pysam), you can start that prior computation and then listen on the stream before anything gets output:
import os
import tempfile
import subprocess
import shutil
import pysam
# Create a named pipe
tmpdir = tempfile.mkdtemp()
samtools_prefix = os.path.join(tmpdir, "namedpipe")
fifo = samtools_prefix + ".bam"
os.mkfifo(fifo)
# The example below sorts the file 'input.bam',
# creates a pysam.Samfile object of the sorted data,
# and prints out the name of each record in sorted order
# Your prior process that spits out data to stdout/a file
# We pass samtools_prefix as the output prefix, knowing that its
# ending file will be named what we called the named pipe
subprocess.Popen(["samtools", "sort", "input.bam", samtools_prefix])
# Read from the named pipe
samfile = pysam.Samfile(fifo, "rb")
# Print out the names of each record
for read in samfile:
    print read.qname
# Clean up the named pipe and associated temp directory
shutil.rmtree(tmpdir)
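The named-pipe mechanics can be tried without samtools or pysam. In the sketch below, written for Python 3 on a POSIX system (os.mkfifo is not available on Windows), a child process plays the role of the producer and the parent reads from the pipe as if it were a regular file. The writer program and the record names are placeholders invented for this example.

```python
import os
import subprocess
import sys
import tempfile

# Placeholder producer: opens the named pipe for writing and sends two lines
writer_source = """
import sys
with open(sys.argv[1], "w") as fifo:
    fifo.write("read1\\nread2\\n")
"""

with tempfile.TemporaryDirectory() as tmpdir:
    fifo_path = os.path.join(tmpdir, "namedpipe")
    os.mkfifo(fifo_path)  # POSIX only
    # Start the producer first; it blocks until a reader opens the pipe
    writer = subprocess.Popen([sys.executable, "-c", writer_source, fifo_path])
    # Opening for reading unblocks the producer; read() returns at end of data
    with open(fifo_path) as fifo:
        record_names = fifo.read().split()
    writer.wait()

print(record_names)
```

Using tempfile.TemporaryDirectory() as a context manager takes care of the cleanup that the answer performs with shutil.rmtree().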
If your system supports it, you could use /dev/fd/# filenames:
process = subprocess.Popen(args, stdout=subprocess.PIPE)
samfile = pysam.Samfile("/dev/fd/%d" % process.stdout.fileno(), "rb")