Why does tempfile.NamedTemporaryFile() truncate my data?

Why does tempfile.NamedTemporaryFile() truncate my data? - python

Here is a test I created to recreate a problem I was having when I used
tempfile.NamedTemporaryFile(). The problem is that when I use tempfile the
data in my CSV is truncated off the end of the file.
When you run this test script, temp2.csv will get truncated and temp1.csv
will be the same size as the original CSV.
I'm using Python 2.7.1.
You can download the sample CSV from http://explore.data.gov/Energy-and-Utilities/Residential-Energy-Consumption-Survey-RECS-Files-A/eypy-jxs2
#!/usr/bin/env python
import tempfile
import shutil
def main():
f = open('RECS05alldata.csv')
data = f.read()
f.close()
f = open('temp1.csv', 'w+b')
f.write(data)
f.close()
temp = tempfile.NamedTemporaryFile()
temp.write(data)
shutil.copy(temp.name, 'temp2.csv')
temp.close()
if __name__ == '__main__':
main()

Add temp.flush() after temp.write(data).

You copy the file before you close it. Files are buffered, which means that some of it will remain in the buffer while it is waiting to be written to the file. The close will write out all remaining data from the buffer to the file as part of the closing of the file.
This has nothing to do with NamedTemporaryFile.

I think your problem is that Python has not flushed the entire file to disk when you call shutil.copy.
Change
temp = tempfile.NamedTemporaryFile()
temp.write(data)
shutil.copy(temp.name, 'temp2.csv')
temp.close()
to
temp = tempfile.NamedTemporaryFile()
temp.write(data)
temp.close()
shutil.copy(temp.name, 'temp2.csv')

Related

Reading a binary file Python (pickle) [duplicate]

I created some data and stored it several times like this:
with open('filename', 'a') as f:
pickle.dump(data, f)
Every time the size of file increased, but when I open file
with open('filename', 'rb') as f:
x = pickle.load(f)
I can see only data from the last time.
How can I correctly read file?

Pickle serializes a single object at a time, and reads back a single object -
the pickled data is recorded in sequence on the file.
If you simply do pickle.load you should be reading the first object serialized into the file (not the last one as you've written).
After unserializing the first object, the file-pointer is at the beggining
of the next object - if you simply call pickle.load again, it will read that next object - do that until the end of the file.
objects = []
with (open("myfile", "rb")) as openfile:
while True:
try:
objects.append(pickle.load(openfile))
except EOFError:
break

There is a read_pickle function as part of pandas 0.22+
import pandas as pd
obj = pd.read_pickle(r'filepath')

The following is an example of how you might write and read a pickle file. Note that if you keep appending pickle data to the file, you will need to continue reading from the file until you find what you want or an exception is generated by reaching the end of the file. That is what the last function does.
import os
import pickle
PICKLE_FILE = 'pickle.dat'
def main():
# append data to the pickle file
add_to_pickle(PICKLE_FILE, 123)
add_to_pickle(PICKLE_FILE, 'Hello')
add_to_pickle(PICKLE_FILE, None)
add_to_pickle(PICKLE_FILE, b'World')
add_to_pickle(PICKLE_FILE, 456.789)
# load & show all stored objects
for item in read_from_pickle(PICKLE_FILE):
print(repr(item))
os.remove(PICKLE_FILE)
def add_to_pickle(path, item):
with open(path, 'ab') as file:
pickle.dump(item, file, pickle.HIGHEST_PROTOCOL)
def read_from_pickle(path):
with open(path, 'rb') as file:
try:
while True:
yield pickle.load(file)
except EOFError:
pass
if __name__ == '__main__':
main()

I developed a software tool that opens (most) Pickle files directly in your browser (nothing is transferred so it's 100% private):
https://pickleviewer.com/ (formerly)
Now it's hosted here: https://fire-6dcaa-273213.web.app/
Edit: Available here if you want to host it somewhere: https://github.com/ch-hristov/Pickle-viewer
Feel free to host this somewhere.

Undesired deletion of temporaly files

I am try to create some temporal files and make some operations on them inside a loop. Then I will access the information on all of the temporal files. And do some operations with that information. For simplicity I brought the following code that reproduces my issue:
import tempfile
tmp_files = []
for i in range(40):
tmp = tempfile.NamedTemporaryFile(suffix=".txt")
with open(tmp.name, "w") as f:
f.write(str(i))
tmp_files.append(tmp.name)
string = ""
for tmp_file in tmp_files:
with open(tmp_file, "r") as f:
data = f.read()
string += data
print(string)
ERROR:
with open(tmp_file, "r") as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpynh0kbnw.txt'
When I look on /tmp directory (with some time.sleep(2) on the loop) I see that the file is deleted and only one is preserved. And for that the error.
Of course I could handle to keep all the files with the flag tempfile.NamedTemporaryFile(suffix=".txt", delete=False). But that is not the idea. I would like to hold the temporal files just for the running time of the script. I also could delete the files with os.remove. But my question is more why this happen. Because I expected that the files hold to the end of the running. Because I don't close the file on the execution (or do I?).
A lot of thanks in advance.

tdelaney does already answer your actual question.
I just would like to offer you an alternative to NamedTemporaryFile. Why not creating a temporary folder which is removed (with all files in it) at the end of the script?
Instead of using a NamedTemporaryFile, you could use tempfile.TemporaryDirectory. The directory will be deleted when closed.
The example below uses the with statement which closes the file handle automatically when the block ends (see John Gordon's comment).
import os
import tempfile
with tempfile.TemporaryDirectory() as temp_folder:
tmp_files = []
for i in range(40):
tmp_file = os.path.join(temp_folder, f"{i}.txt")
with open(tmp_file, "w") as f:
f.write(str(i))
tmp_files.append(tmp_file)
string = ""
for tmp_file in tmp_files:
with open(tmp_file, "r") as f:
data = f.read()
string += data
print(string)

By default, a NamedTemporaryFile deletes its file when closed. its a bit subtle, but tmp = tempfile.NamedTemporaryFile(suffix=".txt") in the loop causes the previous file to be deleted when tmp is reassigned. One option is to use the delete=False parameter. Or, just keep the file open and seek to the beginning after the write.
NamedTemporaryFile is already a file object - you can write to it directly without reopening. Just make sure the mode is "write plus" and in text, not binary mode. Put the code an a try/finally block to make sure the files are really deleted at the end.
import tempfile
tmp_files = []
try:
for i in range(40):
tmp = tempfile.NamedTemporaryFile(suffix=".txt", mode="w+")
tmp.write(str(i))
tmp.seek(0)
tmp_files.append(tmp)
string = ""
for tmp_file in tmp_files:
data = tmp_file.read()
string += data
finally:
for tmp_file in tmp_files:
tmp_file.close()
print(string)

os.write() appends file instead of overwriting, but O_APPEND isn't used [duplicate]

I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the file "test.xml" is appended, i.e. I have the old content follwed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?

You need seek to the beginning of the file before writing and then use file.truncate() if you want to do inplace replace:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
data = f.read()
f.seek(0)
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
data = f.read()
with open(myfile, "w") as f:
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html

file='path/test.xml'
with open(file, 'w') as filetowrite:
filetowrite.write('new content')
Open the file in 'w' mode, you will be able to replace its current text save the file with new contents.

Using truncate(), the solution could be
import re
#open the xml file for reading:
with open('path/test.xml','r+') as f:
#convert to string:
data = f.read()
f.seek(0)
f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
f.truncate()

import os#must import this library
if os.path.exists('TwitterDB.csv'):
os.remove('TwitterDB.csv') #this deletes the file
else:
print("The file does not exist")#add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.

See from How to Replace String in File works in a simple way and is an answer that works with replace
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()

in my case the following code did the trick
with open("output.json", "w+") as outfile: #using w+ mode to create file if it not exists. and overwrite the existing content
json.dump(result_plot, outfile)

Using python3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
Similar method using different approach to backups:
from pathlib import Path
filepath = Path("/tmp/test.xml")
filepath.rename(filepath.with_suffix('.bak')) # different approach to backups
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))

Read from file while it is being written to in Python?

I followed the solution proposed here
In order to test it, I used two programs, writer.py and reader.py respectively.
# writer.py
import time
with open('pipe.txt', 'w', encoding = 'utf-8') as f:
i = 0
while True:
f.write('{}'.format(i))
print('I wrote {}'.format(i))
time.sleep(3)
i += 1
# reader.py
import time, os
#Set the filename and open the file
filename = 'pipe.txt'
file = open(filename, 'r', encoding = 'utf-8')
#Find the size of the file and move to the end
st_results = os.stat(filename)
st_size = st_results[6]
file.seek(st_size)
while 1:
where = file.tell()
line = file.readline()
if not line:
time.sleep(1)
file.seek(where)
else:
print(line)
But when I run:
> python writer.py
> python reader.py
the reader will print the lines after the writer has exited (when I kill the process)
Is there any other way around to read the contents the time they are being written ?
[EDIT]
The program that actually writes to the file is an .exe application and I don't have access to the source code.

You need to flush your writes/prints to files, or they'll default to being block-buffered (so you'd have to write several kilobytes before the user mode buffer would actually be sent to the OS for writing).
Simplest solution is to call .flush for after write calls:
f.write('{}'.format(i))
f.flush()

There are 2 different problems here:
OS and file system must allow concurrent accesses to a file. If you get no error it is the case, but on some systems it could be disallowed
The writer must flush its output to have it reach the disk so that the reader can find it. If you do not, the output stays in in memory buffer until those buffers are full which can require several kbytes
So writer should become:
# writer.py
import time
with open('pipe.txt', 'w', encoding = 'utf-8') as f:
i = 0
while True:
f.write('{}'.format(i))
f.flush()
print('I wrote {}'.format(i))
time.sleep(3)
i += 1

Not able to fix file handling issue in python

I wrote python code to search a pattern in a tcl file and replace it with a string, it prints the output but the same is not saved in the tcl file
import re
import fileinput
filename=open("Fdrc.tcl","r+")
for i in filename:
if i.find("set qa_label")!=-1:
print(i)
a=re.sub(r'REL.*','harsh',i)
print(a)
filename.close()
actual result
set qa_label
REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01
set qa_label harsh
Expected result is that in my file it should reflect the same result as above but it is not

You need to actually write your changes back to disk if you want to see them affected there. As #ImperishableNight says, you don't want to do this by trying to write to a file you're also reading from...you want to write to a new file. Here's an expanded version of your code that does that:
import re
import fileinput
fin=open("/tmp/Fdrc.tcl")
fout=open("/tmp/FdrcNew.tcl", "w")
for i in fin:
if i.find("set qa_label")!=-1:
print(i)
a=re.sub(r'REL.*','harsh',i)
print(a)
fout.write(a)
else:
fout.write(i)
fin.close()
fout.close()
Input and output file contents:
> cat /tmp/Fdrc.tcl
set qa_label REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01
> cat /tmp/FdrcNew.tcl
set qa_label harsh
If you wanted to overwrite the original file, then you would want to read the entire file into memory and close the input file stream, then open the file again for writing, and write modified content to the same file.
Here's a cleaner version of your code that does this...produces an in memory result and then writes that out using a new file handle. I am still writing to a different file here because that's usually what you want to do at least while you're testing your code. You can simply change the name of the second file to match the first and this code will overwrite the original file with the modified content:
import re
lines = []
with open("/tmp/Fdrc.tcl") as fin:
for i in fin:
if i.find("set qa_label")!=-1:
print(i)
i=re.sub(r'REL.*','harsh',i)
print(i)
lines.append(i)
with open("/tmp/FdrcNew.tcl", "w") as fout:
fout.writelines(lines)

Open a tempfile for writing the updated file contents and open the file for writing.
After modifying the lines, write it back in the file.
import re
import fileinput
from tempfile import TemporaryFile
with TemporaryFile() as t:
with open("Fdrc.tcl", "r") as file_reader:
for line in file_reader:
if line.find("set qa_label") != -1:
t.write(
str.encode(
re.sub(r'REL.*', 'harsh', str(line))
)
)
else:
t.write(str.encode(line))
t.seek(0)
with open("Fdrc.tcl", "wb") as file_writer:
file_writer.writelines(t)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why does tempfile.NamedTemporaryFile() truncate my data? - python

Add temp.flush() after temp.write(data).

Related

Reading a binary file Python (pickle) [duplicate]

Undesired deletion of temporaly files

os.write() appends file instead of overwriting, but O_APPEND isn't used [duplicate]

Read from file while it is being written to in Python?

Not able to fix file handling issue in python

Categories

Resources