Strange race condition: FileNotFoundError with mkdtemp - python

I am getting a rather bizarre race condition on Mac OS X with Python (I've only tested Python 3.3). I am making several temporary directories, writing things to them, and then clearing them. Something along the lines of:
    while running:
        (do something)
        tempdir = mkdtemp('name')
        try:
            (write some stuff to tempdir)
        finally:
            shutil.rmtree(tempdir)
However, in some of the later iterations of the (write some stuff to tempdir) step, I get errors like
    with open(os.path.join("/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmp0fh2ztname", "file"), 'w', encoding='utf-8') as fn:
    FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmpups5dpname/file'
(I've inlined the temp dir path for clarity)
Notice how the path being opened is not the same as the path that it can't find. In each case, the path in the error message is the temporary directory from the previous iteration of the loop.
The error is reproducible most of the time in the same place (after about the fourth iteration), but not every time.
EDIT: I just realized this is probably relevant. The (write some stuff to tempdir) part actually happens in a subprocess. This is how I can be sure about the tempdir path: I have to pass it on to the subprocess (I actually lied about the "clarity" bit; I am really writing out a Python file with that exact with open line). This is how I know for sure that the tempdir path is indeed different from the one being used.
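A minimal sketch of the setup described above (the file name generated.py and the written contents are my reconstruction, not the original code):

    import shutil
    import subprocess
    import sys
    from tempfile import mkdtemp

    tempdir = mkdtemp('name')
    try:
        # the same source file is rewritten on every iteration
        with open('generated.py', 'w', encoding='utf-8') as f:
            f.write(
                "import os\n"
                "with open(os.path.join(%r, 'file'), 'w', encoding='utf-8') as fn:\n"
                "    fn.write('stuff')\n" % tempdir)
        # importing (rather than running) the module is what creates a .pyc
        subprocess.check_call([sys.executable, '-c', 'import generated'])
    finally:
        shutil.rmtree(tempdir)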

I figured it out. It turns out it has nothing to do with mkdtemp (a sigh of relief that Mac OS X and Python are doing the right things there).
The problem is that I was writing out the code to a file, including the with open(os.path.join("/var/folders/yc/8wpl9rlx47qgzxqpcf003k280000gn/T/tmp0fh2ztname", "file"), 'w', encoding='utf-8') as fn: bit, and running it in a subprocess. The issue was that I was using the same file each time, and the .pyc files were not being invalidated correctly.
The error message was confusing because when Python generates a traceback, it reads the .py file (where the actual code is), but what is actually run is the .pyc file.
If I understand http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html correctly, the timestamps in .pyc files only have one-second granularity (this explains why the failure was reproducible in the same place each time: the loop reached the fourth iteration in under a second, so the rewritten source file kept the same timestamp and the stale .pyc was reused).
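You can see that granularity directly by reading the header of a 3.3-era .pyc (the path below is a hypothetical example; note the header format changed again in Python 3.7):

    import struct
    import time

    pyc_path = '__pycache__/file.cpython-33.pyc'    # hypothetical example path
    with open(pyc_path, 'rb') as f:
        f.read(4)                                   # 4-byte magic number
        mtime = struct.unpack('<I', f.read(4))[0]   # source mtime, whole seconds only
    print(time.ctime(mtime))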
The solution was to explicitly delete the .pyc files when writing out the file (in other circumstances you could also write the code out to a temporary file, but in my case I needed the file to be importable under the same name).
Something along the lines of:

    if sys.version_info >= (3,):
        cache_dir = os.path.join(path_to_file, '__pycache__')
        names = ['file.cpython-%s%s.pyc' % sys.version_info[:2],
                 'file.cpython-%s%s.pyo' % sys.version_info[:2]]
    else:
        cache_dir = path_to_file
        names = ['file.pyc', 'file.pyo']
    for name in names:
        cached = os.path.join(cache_dir, name)
        if os.path.exists(cached):  # .pyo only exists when -O was used
            os.unlink(cached)
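Another option, assuming you control how the subprocess is launched, is to keep the interpreter from writing bytecode caches in the first place (this only helps once any stale caches from earlier runs have been removed):

    import os
    import subprocess
    import sys

    env = dict(os.environ)
    env['PYTHONDONTWRITEBYTECODE'] = '1'  # child interpreter won't write .pyc files
    # launched however the subprocess is actually started, e.g.:
    subprocess.check_call([sys.executable, '-c', 'import file'], env=env)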

Related

The 'os' module in Python says "No such file or directory" even when there definitely is one

I am trying to open an .npy file. However, Python keeps saying that there exists no such file or directory, even when there is one...
To make sure that it isn't an issue of giving incorrect path names, I changed my directory to the folder that contains the .npy file that I want, then used os.listdir() and built the path passed to np.load (the code is below):
    import os
    import numpy as np

    os.chdir(filename_dir)  # filename_dir is the directory containing the .npy file I want
    the_path = os.path.join(os.getcwd(), os.listdir()[-1])  # i.e. built to make sure the path is correct
    data = np.load(the_path)
However, I got the error: [Errno 2] No such file or directory: .......
The error occurs when I call np.load. I couldn't understand this, because I explicitly built the_path from what Python says exists.
I think this is the problem...
Windows (by default) has a maximum path length of 260 characters. The absolute path to this file exceeds that limit. The filename alone is 143 characters.
It looks as though even if you try to access the file using a relative path (i.e., chdir() to the appropriate folder then specify just the filename), numpy is probably working out the absolute path then failing.
You have said that renaming the file to something much shorter solves the problem but is impractical due to the high numbers of files that you need to process.
There is a Windows Registry key that can be modified to enable long pathnames: Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled
Changing that may help.
Or you could buy a Mac
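If changing the registry isn't an option, Windows also accepts extended-length paths when a fully qualified path is prefixed with \\?\. A minimal sketch (this generally works for libraries that hand the path straight to the OS, so treat it as something to try rather than a guarantee):

    import os
    import numpy as np

    # '\\\\?\\' spells the literal \\?\ prefix, which lifts the 260-character limit
    long_path = '\\\\?\\' + os.path.abspath(the_path)
    data = np.load(long_path)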
Try adding an r prefix to make the path a raw string. For example:

    file_path = r"filePath"

This may work...

Windows - file opened by another process, still can rename it in Python

On Windows, just before doing some actions on my file, I need to know if it's in use by another process. After some serious research across all the other questions with a similar problem, I wasn't able to find a working solution.
os.rename("my_file.csv", "my_file.csv") is still working even if I have the file opened with... Notepad, let's say.
psutil took too much time, and it doesn't work (it can't find my file path in nt.path):
    import psutil

    for proc in psutil.process_iter():
        try:
            flist = proc.open_files()
            if flist:
                for nt in flist:
                    if my_file_path == nt.path:
                        print("it's here")
        except psutil.NoSuchProcess as err:
            print(err)
Is there any other solution for this?
UPDATE 1
I have to do 2 actions on this file: 1. check if the filename corresponds to a pattern; 2. copy it over SFTP.
UPDATE 2 + solution
Thanks to @Eryk Sun, I found out that Notepad "reads the contents into memory and then closes the handle". After opening my file with Word instead, os.rename and psutil worked like a (py)charm.
If the program you use opens the file by importing it (like Excel does, for example), that means it transforms your data into a readable form for itself, without keeping a handle on the actual file afterwards. If you save the file from there, it either saves it in the program's own format or exports (and transforms) the file back.
What do you want to do with the file? Maybe you can simply copy it?
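If the goal is only to detect whether another process still holds the file open, the os.rename behaviour discussed above can itself be used as a heuristic; a minimal sketch (the function name is mine):

    import os

    def is_in_use(path):
        # On Windows, renaming a file that another process holds open
        # (without delete sharing) raises an error; a same-name rename is
        # otherwise a harmless no-op. Inherently racy, so treat it as a hint.
        try:
            os.rename(path, path)
            return False
        except OSError:
            return True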

Ignore "temp" files?

I've had a script for a while that has been running without issues; however, I recently hit a "hitch" with a temporary file inside a directory.
The file in question started with '~$' on a Windows PC, so the script was erroring out on this file, as it is not a proper DOCX file. The file in question was not open, and this occurred after the files were transferred off a network drive onto an external hard drive. Checking the destination drive (with hidden files shown, etc.) did not show this file either.
I have attempted a quick fix of:

    import os

    filesList = []
    for (dirpath, dirnames, filenames) in os.walk('.'):
        for filename in filenames:
            if filename.endswith('.docx'):
                filesList.append(os.path.join(dirpath, filename))

    for file in filesList:
        if file.startswith('~$'):
            pass
        else:
            <rest of script>
However, the script appears to ignore this, proceed, and then error out again because the file is not "valid".
Does anyone know why this isn't working, or a quick solution to get it to ignore any files like this? I would attempt an 'if exists' check; however, the file technically does exist, so that wouldn't work either.
Sorry if it's a bit stupid, but I am a bit stumped as to A. why it's there and B. how to code around it.
In the second code block, your variable file contains the whole file path, not just the file name.
Instead, skip the "bad" files in your first block rather than appending them to the list:

    for (dirpath, dirnames, filenames) in os.walk('.'):
        for filename in filenames:
            if filename.endswith('.docx'):
                if not filename.startswith('~$'):
                    filesList.append(os.path.join(dirpath, filename))
The other option would be to check os.path.basename(file) in your second code block.
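A minimal sketch of that second option:

    import os

    for file in filesList:
        if os.path.basename(file).startswith('~$'):
            continue  # skip Office owner/lock files
        # <rest of script>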

Python - [Errno 2] No such file or directory

I am trying to make a minor modification to a Python script made by my predecessor, and I have bumped into a problem. I have studied programming, but coding is not my profession.
The Python script processes SQL queries and writes them to an Excel file; there is a folder where all the queries are kept in .txt format. The script creates a list of the queries found in the folder and goes through them one by one in a for loop.
My problem is that if I rename or add a query in the folder, I get a "[Errno 2] No such file or directory" error. The script uses relative paths, so I am puzzled why it keeps raising errors for non-existing files.
    import os

    queries_pathNIC = "./queriesNIC/"

    def queriesDirer():
        global filelist
        l = 0
        filelist = []
        for file in os.listdir(queries_pathNIC):
            if file.endswith(".txt"):
                l += 1
                filelist.append(file)
        return l
Where the problem arises in the main function:
    for round in range(0, queriesDirer()):
        print("\nQuery :", filelist[round])
        file_query = open(queries_pathNIC + filelist[round], 'r')  # problem on this line
        file_query = str(file_query.read())
Contents of the queriesNIC folder:

    00_1_Hardware_WelcomeNew.txt
    00_2_Software_WelcomeNew.txt
    00_3_Software_WelcomeNew_AUTORENEW.txt

The script runs without a problem, but if I change the first query's name to "00_1_Hardware_WelcomeNew_sth.txt" or anything different, I get the following error message:

    FileNotFoundError: [Errno 2] No such file or directory: './queriesNIC/00_1_Hardware_WelcomeNew.txt'
I have also tried adding new text files to the folder (example: "00_1_Hardware_Other.txt") and the script simply skips processing the ones I added altogether and only goes with the original files.
I am using Python 3.4.
Does anyone have any suggestions what might be the problem?
Thank you
The following approach would be an improvement. The glob module can produce a list of files ending with .txt quite easily without needing to create a list.
    import glob
    import os

    queries_pathNIC = "./queriesNIC/"

    def queriesDirer(directory):
        return glob.glob(os.path.join(directory, "*.txt"))

    for file_name in queriesDirer(queries_pathNIC):
        print("Query :", file_name)
        with open(file_name, 'r') as f_query:
            file_query = f_query.read()
From the sample you have given, it is not clear if you need further access to the round variable or the file list.
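If the index is still needed, a small variation with enumerate:

    for round, file_name in enumerate(queriesDirer(queries_pathNIC)):
        print("\nQuery :", file_name)
        with open(file_name, 'r') as f_query:
            file_query = f_query.read()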

Limitation to Python's glob?

I'm using glob to feed file names to a loop like so:
    import glob

    inputcsvfiles = glob.iglob('NCCCSM*.csv')
    for x in inputcsvfiles:
        csvfilename = x
        # do stuff here
The toy example that I used to prototype this script works fine with 2, 10, or even 100 input csv files, but I actually need it to loop through 10,959 files. When using that many files, the script stops working after the first iteration and fails to find the second input file.
Given that the script works absolutely fine with a "reasonable" number of entries (2-100), but not with what I need (10,959) is there a better way to handle this situation, or some sort of parameter that I can set to allow for a high number of iterations?
PS - initially I was using glob.glob, but glob.iglob fares no better.
Edit:
An expansion of the above for more context...
    import glob
    import arcpy

    # typical input files look like this: "NCCCSM20110101.csv", "NCCCSM20110102.csv", etc.
    inputcsvfiles = glob.iglob('NCCCSM*.csv')

    # loop over individual input files
    for x in inputcsvfiles:
        csvfile = x
        modelname = x[0:5]
        # ArcPy (inputshape defined elsewhere)
        arcpy.AddJoin_management(inputshape, "CLIMATEID", csvfile, "CLIMATEID", "KEEP_COMMON")
        # do more stuff after
The script fails at the ArcPy line, where the csvfile variable gets passed into the command. The error reported is that it can't find a specified csv file (e.g., "NCCSM20110101.csv") when, in fact, the csv is definitely in the directory. Could it be that you can't reuse a declared variable (x) multiple times, as I have above? Again, this works fine if the directory being glob'd only has 100 or so files, but if there's a whole lot (e.g., 10,959), it fails seemingly arbitrarily somewhere down the list.
Try doing an ls * in a shell on those 10,000 entries and the shell would fail too. How about walking the directory and yielding those files one by one for your purpose?
    # credit - @dabeaz - generators tutorial
    import os
    import fnmatch

    def gen_find(filepat, top):
        for path, dirlist, filelist in os.walk(top):
            for name in fnmatch.filter(filelist, filepat):
                yield os.path.join(path, name)

    # Example use
    if __name__ == '__main__':
        lognames = gen_find("NCCCSM*.csv", ".")
        for name in lognames:
            print name
One issue that arose was not with Python per se, but rather with ArcPy and/or MS handling of CSV files (more the latter, I think). As the loop iterates, it creates a schema.ini file to which information on each CSV file processed in the loop gets added and stored. Over time, the schema.ini gets rather large, and I believe that's when the performance issues arise.
My solution, although perhaps inelegant, was to delete the schema.ini file during each loop to avoid the issue. Doing so allowed me to process the 10k+ CSV files, although rather slowly. Truth be told, we wound up using GRASS and BASH scripting in the end.
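A minimal sketch of that workaround, assuming schema.ini is created alongside the CSV files (the arcpy call is elided):

    import glob
    import os

    for csvfile in glob.iglob('NCCCSM*.csv'):
        schema = os.path.join(os.path.dirname(os.path.abspath(csvfile)), 'schema.ini')
        if os.path.exists(schema):
            os.remove(schema)  # keep schema.ini from growing every iteration
        # arcpy.AddJoin_management(...) and the rest of the loop body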
If it works for 100 files but fails for 10000, then check that arcpy.AddJoin_management closes csvfile after it is done with it.
There is a limit on the number of open files that a process may have at any one time (which you can check by running ulimit -n).
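On Unix-like systems you can also read (and raise) that limit from inside Python with the resource module; a minimal sketch:

    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print('open-file limit:', soft, hard)
    # the soft limit can be raised up to the hard limit without extra privileges
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))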
