Open a file in memory - python

(I'm working on a Python 3.4 project.)
There's a way to open a (sqlite3) database in memory :
with sqlite3.connect(":memory:") as database:
Does such a trick exist for the open() function ? Something like :
with open(":file_in_memory:") as myfile:
The idea is to speed up some test functions opening/reading/writing some short files on disk; is there a way to be sure that these operations occur in memory ?

How about StringIO:
import StringIO
output = StringIO.StringIO()
output.write('First line.\n')
print >>output, 'Second line.'
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
python3: io.StringIO

There is something similar for file-like input/output to or from a string in io.StringIO.
There is no clean way to add url-based processing to normal file open, but being Python dynamic you could monkey-patch standard file open procedure to handle this case.
For example:
from io import StringIO
old_open = open
in_memory_files = {}
def open(name, mode="r", *args, **kwargs):
if name[:1] == ":" and name[-1:] == ":":
# in-memory file
if "w" in mode:
in_memory_files[name] = ""
f = StringIO(in_memory_files[name])
oldclose = f.close
def newclose():
in_memory_files[name] = f.getvalue()
oldclose()
f.close = newclose
return f
else:
return old_open(name, mode, *args, **kwargs)
after that you can write
f = open(":test:", "w")
f.write("This is a test\n")
f.close()
f = open(":test:")
print(f.read())
Note that this example is very minimal and doesn't handle all real file modes (e.g. append mode, or raising the proper exception on opening in read mode an in-memory file that doesn't exist) but it may work for simple cases.
Note also that all in-memory files will remain in memory forever (unless you also patch unlink).
PS: I'm not saying that monkey-patching standard open or StringIO instances is a good idea, just that you can :-D
PS2: This kind of problem is solved better at OS level by creating an in-ram disk. With that you can even call external programs redirecting their output or input from those files and you also get all the full support including concurrent access, directory listings and so on.

io.StringIO provides a memory file implementation you can use to simulate a real file. Example from documentation:
import io
output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)
# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()
# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()
In Python 2, this class is available instead as StringIO.StringIO.

Related

Reading a binary file Python (pickle) [duplicate]

I created some data and stored it several times like this:
with open('filename', 'a') as f:
pickle.dump(data, f)
Every time the size of file increased, but when I open file
with open('filename', 'rb') as f:
x = pickle.load(f)
I can see only data from the last time.
How can I correctly read file?
Pickle serializes a single object at a time, and reads back a single object -
the pickled data is recorded in sequence on the file.
If you simply do pickle.load you should be reading the first object serialized into the file (not the last one as you've written).
After unserializing the first object, the file-pointer is at the beggining
of the next object - if you simply call pickle.load again, it will read that next object - do that until the end of the file.
objects = []
with (open("myfile", "rb")) as openfile:
while True:
try:
objects.append(pickle.load(openfile))
except EOFError:
break
There is a read_pickle function as part of pandas 0.22+
import pandas as pd
obj = pd.read_pickle(r'filepath')
The following is an example of how you might write and read a pickle file. Note that if you keep appending pickle data to the file, you will need to continue reading from the file until you find what you want or an exception is generated by reaching the end of the file. That is what the last function does.
import os
import pickle
PICKLE_FILE = 'pickle.dat'
def main():
# append data to the pickle file
add_to_pickle(PICKLE_FILE, 123)
add_to_pickle(PICKLE_FILE, 'Hello')
add_to_pickle(PICKLE_FILE, None)
add_to_pickle(PICKLE_FILE, b'World')
add_to_pickle(PICKLE_FILE, 456.789)
# load & show all stored objects
for item in read_from_pickle(PICKLE_FILE):
print(repr(item))
os.remove(PICKLE_FILE)
def add_to_pickle(path, item):
with open(path, 'ab') as file:
pickle.dump(item, file, pickle.HIGHEST_PROTOCOL)
def read_from_pickle(path):
with open(path, 'rb') as file:
try:
while True:
yield pickle.load(file)
except EOFError:
pass
if __name__ == '__main__':
main()
I developed a software tool that opens (most) Pickle files directly in your browser (nothing is transferred so it's 100% private):
https://pickleviewer.com/ (formerly)
Now it's hosted here: https://fire-6dcaa-273213.web.app/
Edit: Available here if you want to host it somewhere: https://github.com/ch-hristov/Pickle-viewer
Feel free to host this somewhere.

os.path.getsize() returns "0"

Getting "0" output, when I am trying to use os.path.getsize()
Not sure what's wrong, using PyCharm, I see that the file was created and the "comments" were added to the file. But PyCharm shows the output "0" :(
Here is the code:
import os
def create_python_script(filename):
comments = "# Start of a new Python program"
with open(filename, "w") as file:
file.write(comments)
filesize = os.path.getsize(filename)
return(filesize)
print(create_python_script("program.py"))
Please, point what is the error I don't see.
You're getting the size 0, due to the peculiar behaviour of the write function.
When you call the write function, it writes the content to the internal buffer. An internal buffer is kept for performance constraints (to limit too frequent I/O calls).
So in this case, you can't ensure that the data/content has been actually dumped to the file on disk or not when you call the getsize function.
with open(filename, "w") as file:
file.write(comments)
filesize = os.path.getsize(filename)
In order to ensure that the content is dumped to the file before calling the getsize function, you can call flush method.
flush method clears the internal buffer and dumps all the content to the file on the disk.
with open(filename, "w") as file:
file.write(comments)
file.flush()
filesize = os.path.getsize(filename)
Or, a better way would be to first close the file and then call the getsize method.
with open(filename, "w") as file:
file.write(comments)
filesize = os.path.getsize(filename)

StringIO and compatibility with 'with' statement (context manager)

I have some legacy code with a legacy function that takes a filename as an argument and processes the file contents. A working facsimile of the code is below.
What I want to do is not have to write to disk with some content that I generate in order to use this legacy function, so I though I could use StringIO to create an object in place of the physical filename. However, this does not work, as you can see below.
I thought StringIO was the way to go with this. Can anyone tell me if there is a way to use this legacy function and pass it something in the argument that isn't a file on disk but can be treated as such by the legacy function? The legacy function does have the with context manager doing work on the filename parameter value.
The one thing I came across in google was: http://bugs.python.org/issue1286, but that didn't help me...
Code
from pprint import pprint
import StringIO
# Legacy Function
def processFile(filename):
with open(filename, 'r') as fh:
return fh.readlines()
# This works
print 'This is the output of FileOnDisk.txt'
pprint(processFile('c:/temp/FileOnDisk.txt'))
print
# This fails
plink_data = StringIO.StringIO('StringIO data.')
print 'This is the error.'
pprint(processFile(plink_data))
Output
This is the output in FileOnDisk.txt:
['This file is on disk.\n']
This is the error:
Traceback (most recent call last):
File "C:\temp\test.py", line 20, in <module>
pprint(processFile(plink_data))
File "C:\temp\test.py", line 6, in processFile
with open(filename, 'r') as fh:
TypeError: coercing to Unicode: need string or buffer, instance found
A StringIO instance is an open file already. The open command, on the other hand, only takes filenames, to return an open file. A StringIO instance is not suitable as a filename.
Also, you don't need to close a StringIO instance, so there is no need to use it as a context manager either. While closing an instance frees the memory allocated, so does simply letting the garbage collector reap the object. At any rate, the contextlib.closing() context manager could take care of closing the object if you want to ensure freeing the memory while still holding a reference to the object.
If all your legacy code can take is a filename, then a StringIO instance is not the way to go. Use the tempfile module to generate a temporary filename instead.
Here is an example using a contextmanager to ensure the temp file is cleaned up afterwards:
import os
import tempfile
from contextlib import contextmanager
#contextmanager
def tempinput(data):
temp = tempfile.NamedTemporaryFile(delete=False)
temp.write(data)
temp.close()
try:
yield temp.name
finally:
os.unlink(temp.name)
with tempinput('Some data.\nSome more data.') as tempfilename:
processFile(tempfilename)
You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for StringIO.StringIO / cStringIO.StringIO. This object does support being used as a context manager (but still can't be passed to open()).
you could define your own open function
fopen = open
def open(fname,mode):
if hasattr(fname,"readlines"): return fname
else: return fopen(fname,mode)
however with wants to call __exit__ after its done and StringIO does not have an exit method...
you could define a custom class to use with this open
class MyStringIO:
def __init__(self,txt):
self.text = txt
def readlines(self):
return self.text.splitlines()
def __exit__(self):
pass
This one is based on the python doc of contextmanager
It's just wrapping StringIO with simple context, and when exit is called, it will return to the yield point, and properly close the StringIO. This avoids the need of making tempfile, but with large string, this will still eat up the memory, since StringIO buffer that string.
It works well on most cases where you know the string data is not going to be long
from contextlib import contextmanager
#contextmanager
def buildStringIO(strData):
from cStringIO import StringIO
try:
fi = StringIO(strData)
yield fi
finally:
fi.close()
Then you can do:
with buildStringIO('foobar') as f:
print(f.read()) # will print 'foobar'
Even if
You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for StringIO.StringIO / cStringIO.StringIO. This object does support being used as a context manager (but still can't be passed to open()
In Python3 , this works to me:
from pprint import pprint
from io import StringIO
import contextlib
#contextlib.contextmanager
def as_handle(handleish, mode="r", **kwargs):
try:
with open(handleish, mode, **kwargs) as fp:
yield fp
except TypeError:
yield handleish
def processFile(filename):
#with filename as fh: ### OK for StringIO
#with(open(filename)) as fh: #TypeError: expected str, bytes or os.PathLike #object, not _io.StringIO
with as_handle(filename) as fh:
return fh.readlines()
# This fails ## doesnt fail anymore
plink_data = StringIO('StringIO data.')
print('This is the error.')
pprint(processFile(plink_data))
output:
This is the error.
['StringIO data.']

NameError: global name is not defined

Hello
My error is produced in generating a zip file. Can you inform what I should do?
main.py", line 2289, in get
buf=zipf.read(2048)
NameError: global name 'zipf' is not defined
The complete code is as follows:
def addFile(self,zipstream,url,fname):
# get the contents
result = urlfetch.fetch(url)
# store the contents in a stream
f=StringIO.StringIO(result.content)
length = result.headers['Content-Length']
f.seek(0)
# write the contents to the zip file
while True:
buff = f.read(int(length))
if buff=="":break
zipstream.writestr(fname,buff)
return zipstream
def get(self):
self.response.headers["Cache-Control"] = "public,max-age=%s" % 86400
start=datetime.datetime.now()-timedelta(days=20)
count = int(self.request.get('count')) if not self.request.get('count')=='' else 1000
from google.appengine.api import memcache
memcache_key = "ads"
data = memcache.get(memcache_key)
if data is None:
a= Ad.all().filter("modified >", start).filter("url IN", ['www.koolbusiness.com']).filter("published =", True).order("-modified").fetch(count)
memcache.set("ads", a)
else:
a = data
dispatch='templates/kml.html'
template_values = {'a': a , 'request':self.request,}
path = os.path.join(os.path.dirname(__file__), dispatch)
output = template.render(path, template_values)
self.response.headers['Content-Length'] = len(output)
zipstream=StringIO.StringIO()
file = zipfile.ZipFile(zipstream,"w")
url = 'http://www.koolbusiness.com/list.kml'
# repeat this for every URL that should be added to the zipfile
file =self.addFile(file,url,"list.kml")
# we have finished with the zip so package it up and write the directory
file.close()
zipstream.seek(0)
# create and return the output stream
self.response.headers['Content-Type'] ='application/zip'
self.response.headers['Content-Disposition'] = 'attachment; filename="list.kmz"'
while True:
buf=zipf.read(2048)
if buf=="": break
self.response.out.write(buf)
That is probably zipstream and not zipf. So replace that with zipstream and it might work.
i don't see where you declare zipf?
zipfile? Senthil Kumaran is probably right with zipstream since you seek(0) on zipstream before the while loop to read chunks of the mystery variable.
edit:
Almost certainly the variable is zipstream.
zipfile docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or
a file-like object. The mode parameter
should be 'r' to read an existing
file, 'w' to truncate and write a new
file, or 'a' to append to an existing
file. If mode is 'a' and file refers
to an existing ZIP file, then
additional files are added to it. If
file does not refer to a ZIP file,
then a new ZIP archive is appended to
the file. This is meant for adding a
ZIP archive to another file (such as
python.exe).
your code:
zipsteam=StringIO.StringIO()
create a file-like object using StringIO which is essentially a "memory file" read more in docs
file = zipfile.ZipFile(zipstream,w)
opens the zipfile with the zipstream file-like object in 'w' mode
url = 'http://www.koolbusiness.com/list.kml'
# repeat this for every URL that should be added to the zipfile
file =self.addFile(file,url,"list.kml")
# we have finished with the zip so package it up and write the directory
file.close()
uses the addFile method to retrieve and write the retrieved data to the file-like object and returns it. The variables are slightly confusing because you pass a zipfile to the addFile method which aliases as zipstream (confusing because we are using zipstream as a StringIO file-like object). Anyways, the zipfile is returned, and closed to make sure everything is "written".
It was written to our "memory file", which we now seek to index 0
zipstream.seek(0)
and after doing some header stuff, we finally reach the while loop that will read our "memory-file" in chunks
while True:
buf=zipstream.read(2048)
if buf=="": break
self.response.out.write(buf)
You need to declare:
global zipf
right after your
def get(self):
line. you are modifying a global variable, and this is the only way python knows what you are doing.

How to use tempfile.NamedTemporaryFile() in Python

I want to use tempfile.NamedTemporaryFile() to write some contents into it and then open that file. I have written following code:
tf = tempfile.NamedTemporaryFile()
tfName = tf.name
tf.seek(0)
tf.write(contents)
tf.flush()
but I am unable to open this file and see its contents in Notepad or similar application. Is there any way to achieve this? Why can't I do something like:
os.system('start notepad.exe ' + tfName)
at the end.
I don't want to save the file permanently on my system. I just want the contents to be opened as a text in Notepad or similar application and delete the file when I close that application.
This could be one of two reasons:
Firstly, by default the temporary file is deleted as soon as it is closed. To fix this use:
tf = tempfile.NamedTemporaryFile(delete=False)
and then delete the file manually once you've finished viewing it in the other application.
Alternatively, it could be that because the file is still open in Python Windows won't let you open it using another application.
Edit: to answer some questions from the comments:
As of the docs from 2 when using delete=False the file can be removed by using:
tf.close()
os.unlink(tf.name)
You can also use it with a context manager so that the file will be closed/deleted when it goes out of scope. It will also be cleaned up if the code in the context manager raises.
import tempfile
with tempfile.NamedTemporaryFile() as temp:
temp.write('Some data')
temp.flush()
# do something interesting with temp before it is destroyed
Here is a useful context manager for this.
(In my opinion, this functionality should be part of the Python standard library.)
# python2 or python3
import contextlib
import os
#contextlib.contextmanager
def temporary_filename(suffix=None):
"""Context that introduces a temporary file.
Creates a temporary file, yields its name, and upon context exit, deletes it.
(In contrast, tempfile.NamedTemporaryFile() provides a 'file' object and
deletes the file as soon as that file object is closed, so the temporary file
cannot be safely re-opened by another library or process.)
Args:
suffix: desired filename extension (e.g. '.mp4').
Yields:
The name of the temporary file.
"""
import tempfile
try:
f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
tmp_name = f.name
f.close()
yield tmp_name
finally:
os.unlink(tmp_name)
# Example:
with temporary_filename() as filename:
os.system('echo Hello >' + filename)
assert 6 <= os.path.getsize(filename) <= 8 # depending on text EOL
assert not os.path.exists(filename)

Categories