I am trying to create some temporary files and perform some operations on them inside a loop. Afterwards I will read the information from all of the temporary files and do some operations with it. For simplicity, here is code that reproduces my issue:
import tempfile
tmp_files = []
for i in range(40):
    tmp = tempfile.NamedTemporaryFile(suffix=".txt")
    with open(tmp.name, "w") as f:
        f.write(str(i))
    tmp_files.append(tmp.name)
string = ""
for tmp_file in tmp_files:
    with open(tmp_file, "r") as f:
        data = f.read()
        string += data
print(string)
Error:
    with open(tmp_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpynh0kbnw.txt'
When I look at the /tmp directory (with a time.sleep(2) in the loop), I see that each file is deleted and only one is preserved. Hence the error.
Of course I could keep all the files with the flag tempfile.NamedTemporaryFile(suffix=".txt", delete=False), but that is not the idea: I would like the temporary files to exist only for the running time of the script. I could also delete the files with os.remove. My question is rather why this happens. I expected the files to persist until the end of the run, because I never close them during execution (or do I?).
Thanks a lot in advance.
tdelaney has already answered your actual question.
I would just like to offer an alternative to NamedTemporaryFile. Why not create a temporary folder that is removed (with all files in it) at the end of the script?
Instead of using a NamedTemporaryFile, you could use tempfile.TemporaryDirectory. The directory and everything in it are deleted when it is cleaned up, for example on leaving its with block.
The example below uses with statements, which close each file handle automatically when the block ends (see John Gordon's comment).
import os
import tempfile
with tempfile.TemporaryDirectory() as temp_folder:
    tmp_files = []
    for i in range(40):
        tmp_file = os.path.join(temp_folder, f"{i}.txt")
        with open(tmp_file, "w") as f:
            f.write(str(i))
        tmp_files.append(tmp_file)
    string = ""
    for tmp_file in tmp_files:
        with open(tmp_file, "r") as f:
            data = f.read()
            string += data
    print(string)
By default, a NamedTemporaryFile deletes its file when closed. It's a bit subtle, but tmp = tempfile.NamedTemporaryFile(suffix=".txt") in the loop causes the previous file to be deleted when tmp is reassigned: the old object loses its last reference, so it is closed and its file is removed. One option is to use the delete=False parameter. Or just keep the file open and seek to the beginning after the write.
NamedTemporaryFile is already a file object, so you can write to it directly without reopening. Just make sure the mode is "w+" (write plus) and text, not binary. Put the code in a try/finally block to make sure the files are really deleted at the end.
import tempfile
tmp_files = []
try:
    for i in range(40):
        tmp = tempfile.NamedTemporaryFile(suffix=".txt", mode="w+")
        tmp.write(str(i))
        tmp.seek(0)
        tmp_files.append(tmp)
    string = ""
    for tmp_file in tmp_files:
        data = tmp_file.read()
        string += data
finally:
    for tmp_file in tmp_files:
        tmp_file.close()
print(string)
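To see the deletion on reassignment for yourself, here is a minimal sketch. The helper name first_file_survives is made up for illustration, and the behaviour relies on CPython's reference counting closing the old object as soon as tmp is rebound; other interpreters may delete the file later.

```python
import os
import tempfile


def first_file_survives(keep_reference):
    """Create two named temp files in a row; report whether the first still exists."""
    kept = []
    tmp = tempfile.NamedTemporaryFile(suffix=".txt")
    first_name = tmp.name
    if keep_reference:
        kept.append(tmp)  # a second reference keeps the file object alive
    # Rebinding tmp drops the name's reference to the first object.
    tmp = tempfile.NamedTemporaryFile(suffix=".txt")
    exists = os.path.exists(first_name)
    # Clean up explicitly so nothing is left behind.
    tmp.close()
    for handle in kept:
        handle.close()
    return exists


print(first_file_survives(keep_reference=True))
print(first_file_survives(keep_reference=False))
```

Holding the file objects in a list, as the answer above does with tmp_files.append(tmp), is what keeps every file alive until you close it.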
Related
I want to check whether a text file has data in it before running some functions, so I use the following Python code:
if os.path.exists("userInformation.txt") and os.path.getsize("userInformation.txt") > 0:
    with open("userInformation.txt") as info:
        contents = info.readlines()
else:
    new_file = open("userInformation.txt", "w")
    new_file.close()
If the file doesn't exist or has nothing in it, this works fine. However, if the file contains only a newline (which is 2 bytes), I get IndexError: list index out of range when I run some function afterwards.
I know I can use try/except to catch that IndexError. Is there any other way to do it?
Thanks.
You should use fewer conditions and more idempotent logic:
from pathlib import Path
p = Path('userInformation.txt')
p.touch()
with p.open() as f:
    contents = [l for l in f.readlines() if l.strip()]
Path.touch ensures the file exists, because that is your goal either way. Testing each line with str.strip ensures empty lines, including lines with only a linebreak, are omitted. This uses minimal calls to the filesystem and not a single if..else branch to achieve your goal.
You have two easy ways to check the file size:
os.path.getsize("sample.txt")
os.stat('sample.txt').st_size
Warning: both raise an error if the file doesn't exist, so:
import os
def is_non_zero_file(file_path):
    return os.path.isfile(file_path) and os.path.getsize(file_path) > 0
if is_non_zero_file(file_path):
    with open(file_path, "r") as f:
        data = f.read()
else:
    # create file
    with open(file_path, "w") as f:
        f.write("data")
In this part of the code I create the .txt files, and it works:
import sys
for i in range(6):
    file = open('teste{:d}.txt'.format(i), 'a')
    sys.stdout = file
Now the problem: the files were created, but this part of the code doesn't work. It runs, but the files are empty:
for i in range(1,6):
    f=open('100K_Array_{:d}.txt'.format(i), 'r')
    alist = f.readlines()
    quickSort(alist)
    print(alist)
    f.close()
It appears to me that you haven't closed your output file properly. You should either use
with open('teste{:d}.txt'.format(i), 'a') as file:
    ...
in which case the with statement will handle closing the file for you, or add file.close() to your current code.
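As a rough sketch of the with-based approach, assuming the goal is to send print output to each file: contextlib.redirect_stdout from the standard library can replace the manual sys.stdout = file assignment. The function name write_numbered_files and the printed text are made up for illustration.

```python
import contextlib
import os
import tempfile


def write_numbered_files(directory, count):
    """Write one line of print output into each of `count` files."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "teste{:d}.txt".format(i))
        # Both the file and the stdout redirection end when the block exits,
        # so the data is flushed and sys.stdout is restored.
        with open(path, "a") as out, contextlib.redirect_stdout(out):
            print("contents of file", i)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as demo_dir:
    for path in write_numbered_files(demo_dir, 3):
        with open(path) as f:
            print(os.path.basename(path), "->", f.read().strip())
```

Because stdout is restored after each block, later print calls go to the console again instead of the last opened file.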
I'm using Python, and would like to insert a string into a text file without deleting or copying the file. How can I do that?
Unfortunately there is no way to insert into the middle of a file without re-writing it. As previous posters have indicated, you can append to a file or overwrite part of it using seek but if you want to add stuff at the beginning or the middle, you'll have to rewrite it.
This is an operating system thing, not a Python thing. It is the same in all languages.
What I usually do is read from the file, make the modifications and write it out to a new file called myfile.txt.tmp or something like that. This is better than reading the whole file into memory because the file may be too large for that. Once the temporary file is completed, I rename it the same as the original file.
This is a good, safe way to do it because if the file write crashes or aborts for any reason, you still have your untouched original file.
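The read, modify, write-to-temp, then rename approach described above could look roughly like this. It is only a sketch: the function name insert_line and the ".tmp" suffix are my own choices, and it assumes the file ends with a newline.

```python
import os
import tempfile


def insert_line(path, line_number, new_line):
    """Insert new_line before the 0-based line_number, rewriting via a temp file."""
    tmp_path = path + ".tmp"  # hypothetical naming scheme
    inserted = False
    with open(path, "r") as src, open(tmp_path, "w") as dst:
        # Stream line by line so the whole file never sits in memory.
        for index, line in enumerate(src):
            if index == line_number:
                dst.write(new_line + "\n")
                inserted = True
            dst.write(line)
        if not inserted:
            # line_number is past the end of the file: append instead
            dst.write(new_line + "\n")
    # Swap the finished copy into place; the original is untouched until here.
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "myfile.txt")
    with open(demo_path, "w") as f:
        f.write("first\nthird\n")
    insert_line(demo_path, 1, "second")
    with open(demo_path) as f:
        demo_result = f.read()
print(demo_result)
```

os.replace is used rather than os.rename because it overwrites an existing destination on Windows as well as on Unix.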
It depends on what you want to do. To append, you can open the file with "a":
with open("foo.txt", "a") as f:
    f.write("new line\n")
If you want to prepend something, you have to read from the file first:
with open("foo.txt", "r+") as f:
    old = f.read() # read everything in the file
    f.seek(0) # rewind
    f.write("new line\n" + old) # write the new line before
The fileinput module of the Python standard library will rewrite a file inplace if you use the inplace=1 parameter:
import sys
import fileinput
# replace all occurrences of 'sit' with 'SIT' and insert a line after the 5th
for i, line in enumerate(fileinput.input('lorem_ipsum.txt', inplace=1)):
    sys.stdout.write(line.replace('sit', 'SIT')) # replace 'sit' and write
    if i == 4: sys.stdout.write('\n') # write a blank line after the 5th line
Rewriting a file in place is often done by saving the old copy with a modified name. Unix folks add a ~ to mark the old one. Windows folks do all kinds of things -- add .bak or .old -- or rename the file entirely or put the ~ on the front of the name.
import shutil
shutil.move(aFile, aFile + "~")
destination = open(aFile, "w")
source = open(aFile + "~", "r")
for line in source:
    destination.write(line)
    if <some condition>:
        destination.write(<some additional line> + "\n")
source.close()
destination.close()
Instead of shutil, you can use the following.
import os
os.rename(aFile, aFile + "~")
Python's mmap module will allow you to insert into a file. The following sample shows how it can be done in Unix (Windows mmap may be different). Note that this does not handle all error conditions and you might corrupt or lose the original file. Also, this won't handle unicode strings: in Python 3 the data to insert must be a bytes object.
import os
from mmap import mmap
def insert(filename, data, pos):
    # data must be bytes; text has to be encoded first
    if len(data) < 1:
        # nothing to insert
        return
    f = open(filename, 'r+b')
    m = mmap(f.fileno(), os.path.getsize(filename))
    origSize = m.size()
    # or this could be an error
    if pos > origSize:
        pos = origSize
    elif pos < 0:
        pos = 0
    # grow the mapping, shift the tail right, then drop the new data in the gap
    m.resize(origSize + len(data))
    m[pos+len(data):] = m[pos:origSize]
    m[pos:pos+len(data)] = data
    m.close()
    f.close()
It is also possible to do this without mmap with files opened in 'r+' mode, but it is less convenient and less efficient as you'd have to read and temporarily store the contents of the file from the insertion position to EOF - which might be huge.
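For comparison, a minimal sketch of the 'r+' variant mentioned above. The name insert_bytes is hypothetical, and as noted, the whole tail of the file is held in memory while it is shifted.

```python
import os
import tempfile


def insert_bytes(filename, data, pos):
    """Insert bytes `data` at byte offset `pos` using a file opened in 'r+b' mode."""
    with open(filename, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        pos = max(0, min(pos, size))  # clamp, as the mmap version does
        f.seek(pos)
        tail = f.read()  # everything after the insertion point, held in memory
        f.seek(pos)
        f.write(data)
        f.write(tail)


with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"helloworld")
    insert_bytes(demo_path, b", ", 5)
    with open(demo_path, "rb") as f:
        demo_bytes = f.read()
print(demo_bytes)
```

Like the mmap version, this is not crash-safe: an interruption between the two writes leaves the file partially rewritten.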
As mentioned by Adam, you have to take your system limitations into consideration before deciding on an approach: do you have enough memory to read the whole file into memory, replace parts of it, and rewrite it?
If you're dealing with a small file or have no memory issues this might help:
Option 1)
Read entire file into memory, do a regex substitution on the entire or part of the line and replace it with that line plus the extra line. You will need to make sure that the 'middle line' is unique in the file or if you have timestamps on each line this should be pretty reliable.
import re
# open file with r+b (allow write and binary mode)
f = open("file.log", 'r+b')
# read entire content of file into memory (bytes, because of binary mode)
f_content = f.read()
# basically match middle line and replace it with itself and the extra line
f_content = re.sub(rb'(middle line)', rb'\1\nnew line', f_content)
# return pointer to top of file so we can re-write the content with replaced string
f.seek(0)
# clear file content
f.truncate()
# re-write the content with the updated content
f.write(f_content)
# close file
f.close()
Option 2)
Figure out the middle line, and replace it with that line plus the extra line.
# open file with r+b (allow write and binary mode)
f = open("file.log" , 'r+b')
# get array of lines (bytes, because of binary mode)
f_content = f.readlines()
# get middle line (integer division, so the result is a valid index)
middle_line = len(f_content) // 2
# add the new line after the middle line, which already ends with a newline
f_content[middle_line] += b"new line\n"
# return pointer to top of file so we can re-write the content with replaced string
f.seek(0)
# clear file content
f.truncate()
# re-write the content with the updated content
f.write(b''.join(f_content))
# close file
f.close()
Wrote a small class for doing this cleanly.
import tempfile
class FileModifierError(Exception):
    pass
class FileModifier(object):
    def __init__(self, fname):
        self.__write_dict = {}
        self.__filename = fname
        # text mode, so the lines can be written back to a file opened with 'w'
        self.__tempfile = tempfile.TemporaryFile(mode='w+')
        with open(fname, 'r') as fp:
            for line in fp:
                self.__tempfile.write(line)
        self.__tempfile.seek(0)
    def write(self, s, line_number = 'END'):
        if line_number != 'END' and not isinstance(line_number, (int, float)):
            raise FileModifierError("Line number %s is not a valid number" % line_number)
        try:
            self.__write_dict[line_number].append(s)
        except KeyError:
            self.__write_dict[line_number] = [s]
    def writeline(self, s, line_number = 'END'):
        self.write('%s\n' % s, line_number)
    def writelines(self, s, line_number = 'END'):
        for ln in s:
            self.writeline(ln, line_number)
    def __popline(self, index, fp):
        # write any pending insertions for this line number, if there are some
        try:
            ilines = self.__write_dict.pop(index)
            for line in ilines:
                fp.write(line)
        except KeyError:
            pass
    def close(self):
        self.__exit__(None, None, None)
    def __enter__(self):
        return self
    def __exit__(self, type, value, traceback):
        with open(self.__filename,'w') as fp:
            for index, line in enumerate(self.__tempfile.readlines()):
                self.__popline(index, fp)
                fp.write(line)
            # leftovers: line numbers past the end of the file first, then 'END'
            for index in sorted(self.__write_dict, key=lambda k: float('inf') if k == 'END' else k):
                for line in self.__write_dict[index]:
                    fp.write(line)
        self.__tempfile.close()
Then you can use it this way:
with FileModifier(filename) as fp:
    fp.writeline("String 1", 0)
    fp.writeline("String 2", 20)
    fp.writeline("String 3") # To write at the end of the file
If you know some unix you could try the following:
Notes: $ means the command prompt
Say you have a file my_data.txt with content as such:
$ cat my_data.txt
This is a data file
with all of my data in it.
Then, using the os module, you can run the usual sed commands:
import os
# Identifiers used are:
my_data_file = "my_data.txt"
command = "sed -i 's/all/none/' " + my_data_file
# Execute the command
os.system(command)
If you aren't aware of sed, check it out, it is extremely useful.
This script currently grabs specific types of IP addresses out of a file and formats them as CSV.
How do I change it to look through all files in its directory (the same directory as the script) and create a new output file? This is my first week with Python, so please keep it as simple as possible.
#!/usr/bin/python
# Extract IP address from file
# import modules
import re
# Open Source File
infile = open('stix1.xml', 'r')
# Open output file
outfile = open('ExtractedIPs.csv', 'w')
# Create a list
BadIPs = []
# search each line in doc
for line in infile:
    # ignore empty lines
    if line.isspace(): continue
    # find IP that are Indicator Titles
    IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
    # Only take finds
    if not IP: continue
    # Add each found IP to the BadIP list
    BadIPs.append(IP)
# tidy up for CSV format
data = str(BadIPs)
data = data.replace('[', '')
data = data.replace(']', '')
data = data.replace("'", "")
# Write IPs to a file
outfile.write(data)
infile.close()
outfile.close()
I think you want to have a look at glob.glob: https://docs.python.org/2/library/glob.html
This will return a list of files matching a given pattern.
Then you can do something like:
import re, glob
def do_something_with(f):
    # Open Source File
    infile = open(f, 'r')
    # Open output file
    outfile = open('ExtractedIPs.csv', 'a') ## changed 'w' to 'a' to append
    # Create a list
    BadIPs = []
    ### rest of your code
    # ...
    outfile.write(data)
    infile.close()
    outfile.close()
for f in glob.glob("*.xml"):
    do_something_with(f)
Assuming that you want to add all outputs to the same file, this would be the script:
#!/usr/bin/python
import glob
import re
for infileName in glob.glob("*.xml"):
    # Open Source File
    infile = open(infileName, 'r')
    # Append to file
    outfile = open('ExtractedIPs.csv', 'a')
    # Create a list
    BadIPs = []
    # search each line in doc
    for line in infile:
        # ignore empty lines
        if line.isspace(): continue
        # find IP that are Indicator Titles
        IP = (re.findall(r"(?:<indicator:Title>IP:) (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", line))
        # Only take finds
        if not IP: continue
        # Add each found IP to the BadIP list
        BadIPs.append(IP)
    # tidy up for CSV format
    data = str(BadIPs)
    data = data.replace('[', '')
    data = data.replace(']', '')
    data = data.replace("'", "")
    # Write IPs to a file
    outfile.write(data)
    infile.close()
    outfile.close()
You could get a list of all XML files like this:
import os
filenames = [nm for nm in os.listdir() if nm.endswith('.xml')]
And then you iterate over all the files:
for fn in filenames:
    with open(fn) as infile:
        for ln in infile:
            pass # do your thing
The with statement makes sure that the file is closed after you're done with it.
import sys
Make a function out of your current code, for example def extract(filename).
Call the script with all filenames: python myscript.py file1 file2 file3
Inside your script, loop over the filenames for filename in sys.argv[1:]:.
Call the function inside the loop: extract(filename).
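Put together, the steps above might look like the sketch below. The body of extract and the helper main are my own guess at how the question's code could be wrapped; the regex and the output file name are taken from the question.

```python
import re
import sys

# Same pattern as in the question, compiled once.
IP_PATTERN = re.compile(
    r"<indicator:Title>IP: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
)


def extract(filename):
    """Return the IP addresses found in indicator titles of one file."""
    found = []
    with open(filename, "r") as infile:
        for line in infile:
            found.extend(IP_PATTERN.findall(line))
    return found


def main(filenames, output_path="ExtractedIPs.csv"):
    """Collect IPs from every given file and write them as one CSV row."""
    all_ips = []
    for filename in filenames:
        all_ips.extend(extract(filename))
    with open(output_path, "w") as outfile:
        outfile.write(", ".join(all_ips))
    return all_ips


# Only run when file names were actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

It would be called as python myscript.py file1 file2 file3, or with a shell wildcard such as python myscript.py *.xml on systems where the shell expands it.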
I had a need to do this, and also to go into subdirectories as well. You need to import os and os.path, then can use a function like this:
def recursive_glob(rootdir='.', suffix=()):
    """ recursively traverses full path from root, returns
    paths and file names for files with suffix in tuple """
    pathlist = []
    filelist = []
    for looproot, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.endswith(suffix):
                pathlist.append(os.path.join(looproot, filename))
                filelist.append(filename)
    return pathlist, filelist
You pass the function the top level directory you want to start from and the suffix for the file type you are looking for. This was written and tested for Windows, but I believe it will work on other OS's as well, as long as you've got file extensions to work from.
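If pathlib is available (Python 3.4+), a similar recursive search can be sketched with Path.rglob. The function name find_files and its defaults are assumptions for illustration, not part of the answer above.

```python
import tempfile
from pathlib import Path


def find_files(rootdir=".", suffixes=(".xml",)):
    """Return sorted paths of files under rootdir whose extension is in suffixes."""
    return sorted(
        p for p in Path(rootdir).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


with tempfile.TemporaryDirectory() as demo_dir:
    root = Path(demo_dir)
    (root / "sub").mkdir()
    (root / "top.xml").write_text("<a/>")
    (root / "sub" / "nested.xml").write_text("<b/>")
    (root / "notes.txt").write_text("ignored")
    demo_names = [p.name for p in find_files(root)]
print(demo_names)
```

Unlike recursive_glob, this returns Path objects only; the bare file name is available as p.name.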
You could just use os.listdir() if all files in your current folder are relevant. If not, and you only want, say, the .xml files, then use glob.glob("*.xml"). But the overall program can be improved, roughly as follows.
# import modules
import csv
import os
import re
pat = re.compile(reg) # reg is your regex
with open("out.csv", "w") as fw:
    writer = csv.writer(fw)
    for f in os.listdir(): # or glob.glob("*.xml")
        with open(f) as fr:
            # skip blank lines
            lines = (line for line in fr if not line.isspace())
            # genex for all ip in that file
            ips = (ip for line in lines for ip in pat.findall(line))
            writer.writerow(ips)
You will probably have to change it to suit your exact needs. But the idea is that this version has far fewer side effects, uses much less memory, and leaves closing the files to the context manager. Please comment if it doesn't work.
Is there a way to create a text file without opening it in "w" or "a" mode? For instance, if I open a file in "r" mode and it does not exist, I want a new file to be created when I catch the IOError.
E.g.:
while flag == True:
    try:
        # opening src in a+ mode will allow me to read and append to file
        with open("Class {0} data.txt".format(classNo),"r") as src:
            # list containing all data from file, one line is one item in list
            data = src.readlines()
            for ind,line in enumerate(data):
                if surname.lower() and firstName.lower() in line.lower():
                    # overwrite the relevant item in data with the updated score
                    data[ind] = "{0} {1}\n".format(line.rstrip(),score)
                    rewrite = True
                else:
                    with open("Class {0} data.txt".format(classNo),"a") as src:
                        src.write("{0},{1} : {2}{3} ".format(surname, firstName, score,"\n"))
            if rewrite == True:
                # reopen src in write mode and overwrite all the records with the items in data
                with open("Class {} data.txt".format(classNo),"w") as src:
                    src.writelines(data)
        flag = False
    except IOError:
        print("New data file created")
        # Here I want a new file to be created and assigned to the variable src so when the
        # while loop iterates for the second time the file should successfully open
At the beginning, just check whether the file exists and create it if it doesn't:
import os
filename = "Class {0} data.txt".format(classNo)
if not os.path.isfile(filename):
    open(filename, 'w').close()
From this point on you can assume the file exists, which will greatly simplify your code.
No operating system will allow you to create a file without actually writing to it. You can encapsulate this in a library so that the creation is not visible, but it is impossible to avoid writing to the file system if you really want to modify the file system.
Here is a quick and dirty open replacement which does what you propose.
def open_for_reading_create_if_missing(filename):
    try:
        handle = open(filename, 'r')
    except IOError:
        with open(filename, 'w') as f:
            pass
        handle = open(filename, 'r')
    return handle
It would be better to create the file if it doesn't exist, e.g. something like:
import os
def ensure_file_exists(file_name):
    """ Make sure that a file with the given name exists """
    (the_dir, fname) = os.path.split(file_name)
    # the_dir is empty when file_name has no directory part
    if the_dir and not os.path.exists(the_dir):
        os.makedirs(the_dir) # This may raise an exception if the directory cannot be made.
    if not os.path.exists(file_name):
        open(file_name, 'w').close()
You could even have a safe_open function that did something similar prior to opening for read and returning the file handle.
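A possible shape for such a safe_open function, as a sketch only. The name comes from the sentence above; the implementation is my assumption.

```python
import os
import tempfile


def safe_open(file_name):
    """Open file_name for reading, creating it (and its directory) if missing."""
    the_dir = os.path.dirname(file_name)
    if the_dir:
        os.makedirs(the_dir, exist_ok=True)
    # Mode "a" creates the file when absent and never truncates existing data.
    open(file_name, "a").close()
    return open(file_name, "r")


with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "classes", "Class 1 data.txt")
    with safe_open(demo_path) as src:
        demo_lines = src.readlines()
print(demo_lines)
```

Using append mode for the creation step avoids a separate existence check, so there is no window in which an existing file could be truncated.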
The sample code provided in the question is not very clear, especially because it uses multiple variables that are not defined anywhere. Based on it, here is my suggestion: you can create a function that combines touch and open, and which is platform agnostic.
def touch_open(filename):
    try:
        connect = open(filename, "r")
    except IOError:
        # create an empty file, then reopen it for reading
        connect = open(filename, "a")
        connect.close()
        connect = open(filename, "r")
    return connect
This function opens the file if it exists. If the file doesn't exist, it creates an empty file with that name and then opens it. An additional bonus compared with import os; os.system('touch test.txt') is that it does not spawn a child process in the shell, which makes it faster.
Since it doesn't use the with open(filename) as src syntax, you should either remember to close the connection at the end with connection = touch_open(filename); connection.close(), or preferably iterate over it in a for loop. Example:
file2open = "test.txt"
for i, row in enumerate(touch_open(file2open)):
    print(i, row, end="") # print the line number and content
This option should be preferred to data = src.readlines() followed by enumerate(data), found in your code, because it avoids looping twice through the file.