StringIO and compatibility with 'with' statement (context manager) - python

I have some legacy code with a legacy function that takes a filename as an argument and processes the file contents. A working facsimile of the code is below.
What I want to do is not have to write to disk with some content that I generate in order to use this legacy function, so I though I could use StringIO to create an object in place of the physical filename. However, this does not work, as you can see below.
I thought StringIO was the way to go with this. Can anyone tell me if there is a way to use this legacy function and pass it something in the argument that isn't a file on disk but can be treated as such by the legacy function? The legacy function does have the with context manager doing work on the filename parameter value.
The one thing I came across in google was: http://bugs.python.org/issue1286, but that didn't help me...
Code
from pprint import pprint
import StringIO
# Legacy Function
def processFile(filename):
with open(filename, 'r') as fh:
return fh.readlines()
# This works
print 'This is the output of FileOnDisk.txt'
pprint(processFile('c:/temp/FileOnDisk.txt'))
print
# This fails
plink_data = StringIO.StringIO('StringIO data.')
print 'This is the error.'
pprint(processFile(plink_data))
Output
This is the output in FileOnDisk.txt:
['This file is on disk.\n']
This is the error:
Traceback (most recent call last):
File "C:\temp\test.py", line 20, in <module>
pprint(processFile(plink_data))
File "C:\temp\test.py", line 6, in processFile
with open(filename, 'r') as fh:
TypeError: coercing to Unicode: need string or buffer, instance found

A StringIO instance is an open file already. The open command, on the other hand, only takes filenames, to return an open file. A StringIO instance is not suitable as a filename.
Also, you don't need to close a StringIO instance, so there is no need to use it as a context manager either. While closing an instance frees the memory allocated, so does simply letting the garbage collector reap the object. At any rate, the contextlib.closing() context manager could take care of closing the object if you want to ensure freeing the memory while still holding a reference to the object.
If all your legacy code can take is a filename, then a StringIO instance is not the way to go. Use the tempfile module to generate a temporary filename instead.
Here is an example using a contextmanager to ensure the temp file is cleaned up afterwards:
import os
import tempfile
from contextlib import contextmanager
#contextmanager
def tempinput(data):
temp = tempfile.NamedTemporaryFile(delete=False)
temp.write(data)
temp.close()
try:
yield temp.name
finally:
os.unlink(temp.name)
with tempinput('Some data.\nSome more data.') as tempfilename:
processFile(tempfilename)
You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for StringIO.StringIO / cStringIO.StringIO. This object does support being used as a context manager (but still can't be passed to open()).

you could define your own open function
fopen = open
def open(fname,mode):
if hasattr(fname,"readlines"): return fname
else: return fopen(fname,mode)
however with wants to call __exit__ after its done and StringIO does not have an exit method...
you could define a custom class to use with this open
class MyStringIO:
def __init__(self,txt):
self.text = txt
def readlines(self):
return self.text.splitlines()
def __exit__(self):
pass

This one is based on the python doc of contextmanager
It's just wrapping StringIO with simple context, and when exit is called, it will return to the yield point, and properly close the StringIO. This avoids the need of making tempfile, but with large string, this will still eat up the memory, since StringIO buffer that string.
It works well on most cases where you know the string data is not going to be long
from contextlib import contextmanager
#contextmanager
def buildStringIO(strData):
from cStringIO import StringIO
try:
fi = StringIO(strData)
yield fi
finally:
fi.close()
Then you can do:
with buildStringIO('foobar') as f:
print(f.read()) # will print 'foobar'

Even if
You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for StringIO.StringIO / cStringIO.StringIO. This object does support being used as a context manager (but still can't be passed to open()
In Python3 , this works to me:
from pprint import pprint
from io import StringIO
import contextlib
#contextlib.contextmanager
def as_handle(handleish, mode="r", **kwargs):
try:
with open(handleish, mode, **kwargs) as fp:
yield fp
except TypeError:
yield handleish
def processFile(filename):
#with filename as fh: ### OK for StringIO
#with(open(filename)) as fh: #TypeError: expected str, bytes or os.PathLike #object, not _io.StringIO
with as_handle(filename) as fh:
return fh.readlines()
# This fails ## doesnt fail anymore
plink_data = StringIO('StringIO data.')
print('This is the error.')
pprint(processFile(plink_data))
output:
This is the error.
['StringIO data.']

Related

tempfile is not accessible when using subprocess.Popen

When I run the following script, the error"Command line argument error: Argument "query". File is not accessible" occurs. I'm using python 3.4.2.
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import subprocess
import tempfile
import sys
def main():
# read name file and put all identifications into a list
infile_I = open('OTU_name.txt','r')
name = infile_I.read().split('>')
infile_I.close()
# extract sequence segments to a temporary file one at a time
for i in name:
i = i.replace('\n','')
for j in SeqIO.parse("GemSIM_OTU_ids.fa","fasta"):
if str(i) == str(j.id):
f = tempfile.NamedTemporaryFile()
record = j.seq
f.write(bytes(str(record),'UTF-8'))
f.seek(0)
f = f.read().decode()
Result = subprocess.Popen(['blastn','-remote','-db','chromosome','-query',f,'-out',str(i)],stdout=subprocess.PIPE)
output = Result.communicate()[0]
if __name__== '__main__': main()
f = tempfile.NamedTemporaryFile() returns a file-like object which you're trying to provide as a command line argument. Instead, you want the actual filename which is available via its .name attribute - although I'm somewhat confused why you're creating a tempfile, writing to it, seeking back to position 0, then replacing your tempfile f object with the contents of the file? I suspect you don't want to do that replacement and use f.name for your query.
Result = subprocess.Popen(['blastn','-remote','-db','chromosome','-query',f.name,'-out',str(i)],stdout=subprocess.PIPE)
Also, there's some convenient wrapper functions around subprocess.Popen such as subprocess.check_output which are also somewhat more explicit as to your intent which could be used here instead.

Open a file in memory

(I'm working on a Python 3.4 project.)
There's a way to open a (sqlite3) database in memory :
with sqlite3.connect(":memory:") as database:
Does such a trick exist for the open() function ? Something like :
with open(":file_in_memory:") as myfile:
The idea is to speed up some test functions opening/reading/writing some short files on disk; is there a way to be sure that these operations occur in memory ?
How about StringIO:
import StringIO
output = StringIO.StringIO()
output.write('First line.\n')
print >>output, 'Second line.'
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
python3: io.StringIO
There is something similar for file-like input/output to or from a string in io.StringIO.
There is no clean way to add url-based processing to normal file open, but being Python dynamic you could monkey-patch standard file open procedure to handle this case.
For example:
from io import StringIO
old_open = open
in_memory_files = {}
def open(name, mode="r", *args, **kwargs):
if name[:1] == ":" and name[-1:] == ":":
# in-memory file
if "w" in mode:
in_memory_files[name] = ""
f = StringIO(in_memory_files[name])
oldclose = f.close
def newclose():
in_memory_files[name] = f.getvalue()
oldclose()
f.close = newclose
return f
else:
return old_open(name, mode, *args, **kwargs)
after that you can write
f = open(":test:", "w")
f.write("This is a test\n")
f.close()
f = open(":test:")
print(f.read())
Note that this example is very minimal and doesn't handle all real file modes (e.g. append mode, or raising the proper exception on opening in read mode an in-memory file that doesn't exist) but it may work for simple cases.
Note also that all in-memory files will remain in memory forever (unless you also patch unlink).
PS: I'm not saying that monkey-patching standard open or StringIO instances is a good idea, just that you can :-D
PS2: This kind of problem is solved better at OS level by creating an in-ram disk. With that you can even call external programs redirecting their output or input from those files and you also get all the full support including concurrent access, directory listings and so on.
io.StringIO provides a memory file implementation you can use to simulate a real file. Example from documentation:
import io
output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
In Python 2, this class is available instead as StringIO.StringIO.

function for loading both strings and files on disk?

I have a design question. I have a function loadImage() for loading an image file. Now it accepts a string which is a file path. But I also want to be able to load files which are not on physical disk, eg. generated procedurally. I could have it accept a string, but then how could it know the string is not a file path but file data? I could add an extra boolean argument to specify that, but that doesn't sound very clean. Any ideas?
It's something like this now:
def loadImage(filepath):
file = open(filepath, 'rb')
data = file.read()
# do stuff with data
The other version would be
def loadImage(data):
# do stuff with data
How to have this function accept both 'filepath' or 'data' and guess what it is?
You can change your loadImage function to expect an opened file-like object, such as:
def load_image(f):
data = file.read()
... and then have that called from two functions, one of which expects a path and the other a string that contains the data:
from StringIO import StringIO
def load_image_from_path(path):
with open(path, 'rb') as f:
load_image(f)
def load_image_from_string(s):
sio = StringIO(s)
try:
load_image(sio)
finally:
sio.close()
How about just creating two functions, loadImageFromString and loadImageFromFile?
This being Python, you can easily distinguish between a filename and a data string. I would do something like this:
import os.path as P
from StringIO import StringIO
def load_image(im):
fin = None
if P.isfile(im):
fin = open(im, 'rb')
else:
fin = StringIO(im)
# Read from fin like you would from any open file object
Other ways to do it would be a try block instead of using os.path, but the essence of the approach remains the same.

how to concisely create a temporary file that is a copy of another file in python

I know that it is possible to create a temporary file, and write the data of the file I wish to copy to it. I was just wondering if there was a function like:
create_temporary_copy(file_path)
There isn't one directly, but you can use a combination of tempfile and shutil.copy2 to achieve the same result:
import tempfile, shutil, os
def create_temporary_copy(path):
temp_dir = tempfile.gettempdir()
temp_path = os.path.join(temp_dir, 'temp_file_name')
shutil.copy2(path, temp_path)
return temp_path
You'll need to deal with removing the temporary file in the caller, though.
This isn't quite as concise, and I imagine there may be issues with exception safety, (e.g. what happens if 'original_path' doesn't exist, or the temporary_copy object goes out of scope while you have the file open) but this code adds a little RAII to the clean up. The difference here to using NamedTemporaryFile directly is that rather than ending up with a file object, you end up with a file, which is occasionally desirable (e.g. if you plan to call out to other code to read it, or some such.)
import os,shutil,tempfile
class temporary_copy(object):
def __init__(self,original_path):
self.original_path = original_path
def __enter__(self):
temp_dir = tempfile.gettempdir()
base_path = os.path.basename(self.original_path)
self.path = os.path.join(temp_dir,base_path)
shutil.copy2(self.original_path, self.path)
return self.path
def __exit__(self,exc_type, exc_val, exc_tb):
os.remove(self.path)
in your code you'd write:
with temporary_copy(path) as temporary_path_to_copy:
... do stuff with temporary_path_to_copy ...
# Here in the code, the copy should now have been deleted.
The following is more concise (OP's ask) than the selected answer. Enjoy!
import tempfile, shutil, os
def create_temporary_copy(path):
tmp = tempfile.NamedTemporaryFile(delete=True)
shutil.copy2(path, tmp.name)
return tmp.name
A variation on #tramdas's answer, accounting for the fact that the file cannot be opened twice on windows. This version ignores the preservation of the file extension.
import os, shutil, tempfile
def create_temporary_copy(src):
# create the temporary file in read/write mode (r+)
tf = tempfile.TemporaryFile(mode='r+b', prefix='__', suffix='.tmp')
# on windows, we can't open the the file again, either manually
# or indirectly via shutil.copy2, but we *can* copy
# the file directly using file-like objects, which is what
# TemporaryFile returns to us.
# Use `with open` here to automatically close the source file
with open(src,'r+b') as f:
shutil.copyfileobj(f,tf)
# display the name of the temporary file for diagnostic purposes
print 'temp file:',tf.name
# rewind the temporary file, otherwise things will go
# tragically wrong on Windows
tf.seek(0)
return tf
# make a temporary copy of the file 'foo.txt'
name = None
with create_temporary_copy('foo.txt') as temp:
name = temp.name
# prove that it exists
print 'exists', os.path.isfile(name) # prints True
# read all lines from the file
i = 0
for line in temp:
print i,line.strip()
i += 1
# temp.close() is implicit using `with`
# prove that it has been deleted
print 'exists', os.path.isfile(name) # prints False
A slight variation (in particular I needed the preserve_extension feature for my use case, and I like the "self-cleanup" feature):
import os, shutil, tempfile
def create_temporary_copy(src_file_name, preserve_extension=False):
'''
Copies the source file into a temporary file.
Returns a _TemporaryFileWrapper, whose destructor deletes the temp file
(i.e. the temp file is deleted when the object goes out of scope).
'''
tf_suffix=''
if preserve_extension:
_, tf_suffix = os.path.splitext(src_file_name)
tf = tempfile.NamedTemporaryFile(suffix=tf_suffix)
shutil.copy2(src_file_name, tf.name)
return tf

How to use tempfile.NamedTemporaryFile() in Python

I want to use tempfile.NamedTemporaryFile() to write some contents into it and then open that file. I have written following code:
tf = tempfile.NamedTemporaryFile()
tfName = tf.name
tf.seek(0)
tf.write(contents)
tf.flush()
but I am unable to open this file and see its contents in Notepad or similar application. Is there any way to achieve this? Why can't I do something like:
os.system('start notepad.exe ' + tfName)
at the end.
I don't want to save the file permanently on my system. I just want the contents to be opened as a text in Notepad or similar application and delete the file when I close that application.
This could be one of two reasons:
Firstly, by default the temporary file is deleted as soon as it is closed. To fix this use:
tf = tempfile.NamedTemporaryFile(delete=False)
and then delete the file manually once you've finished viewing it in the other application.
Alternatively, it could be that because the file is still open in Python Windows won't let you open it using another application.
Edit: to answer some questions from the comments:
As of the docs from 2 when using delete=False the file can be removed by using:
tf.close()
os.unlink(tf.name)
You can also use it with a context manager so that the file will be closed/deleted when it goes out of scope. It will also be cleaned up if the code in the context manager raises.
import tempfile
with tempfile.NamedTemporaryFile() as temp:
temp.write('Some data')
temp.flush()
# do something interesting with temp before it is destroyed
Here is a useful context manager for this.
(In my opinion, this functionality should be part of the Python standard library.)
# python2 or python3
import contextlib
import os
#contextlib.contextmanager
def temporary_filename(suffix=None):
"""Context that introduces a temporary file.
Creates a temporary file, yields its name, and upon context exit, deletes it.
(In contrast, tempfile.NamedTemporaryFile() provides a 'file' object and
deletes the file as soon as that file object is closed, so the temporary file
cannot be safely re-opened by another library or process.)
Args:
suffix: desired filename extension (e.g. '.mp4').
Yields:
The name of the temporary file.
"""
import tempfile
try:
f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
tmp_name = f.name
f.close()
yield tmp_name
finally:
os.unlink(tmp_name)
# Example:
with temporary_filename() as filename:
os.system('echo Hello >' + filename)
assert 6 <= os.path.getsize(filename) <= 8 # depending on text EOL
assert not os.path.exists(filename)

Categories