I plan to pass a user code script to the Python logger, run it using Python's bdb module, and then log the output to a file.
Here is my code that drives the logger:
try:
    logger._runscript(script_str)
except bdb.BdbQuit:
    pass
finally:
    logger.finalize(filename)
where logger.finalize is the finalizer(output, filename) function defined below.
bdb will spawn a new thread and call the following finalizer function after execution:
def finalizer(output, filename):
    outfile = open(filename, 'a')
    outfile.write(json.dumps(output, indent=4))
    outfile.close()
Here output is the execution result, which we write to the file named by filename.
I tested the three lines in the finalizer function, and they ran okay.
However, when they are called from the Python logger, I always get the following error message:
IOError: [Errno 24] Too many open files: filename
I only open one file, append a string to its end, then close it. Why are there "too many open files"? Can anyone kindly point me to the problem?
Here is the traceback:
Traceback (most recent call last):
File "./exec.py", line 95, in <module>
File "./exec.py", line 82, in main
File "./exec.py", line 45, in run
File "path to project/logger.py", line 1321, in exec_script_str
File "path to project/logger.py", line 1292, in finalize
File "./exec.py", line 24, in finalizer
IOError: [Errno 24] Too many open files: 'test01.py'
This doesn't look like the logging module from the standard library. Rather, it looks like you are using a third-party library that provides a sandboxed environment. One of its restrictions is that it forbids file access (see line 1233).
If you are creating the logger object yourself, you can disable these security checks by constructing it with the appropriate flag, e.g.:
def exec_str_with_user_ns(script_str, user_ns, finalizer_func):
    logger = PGLogger(False, False, False, finalizer_func, disable_security_checks=True)
    try:
        logger._runscript(script_str, user_ns)
    except bdb.BdbQuit:
        pass
    finally:
        return logger.finalize()
I am having trouble with concurrent.futures. For some short background, I was trying to do massive image manipulation with python-opencv2. I ran into a performance issue, which is a pain considering it can take hours to process only hundreds of images. I found a solution using concurrent.futures to utilize multiple CPU cores and make the processing faster (while it took a really long time to process, it only used about 16% of my 6-core processor, which is roughly a single core). So I created the code, but then I noticed that the multiprocessing actually starts from the beginning of the script instead of being isolated to the function I just created. Here's a minimal reproduction of the error:
import glob
import concurrent.futures
import cv2
import os

def convert_this(filename):
    ### Read in the image data
    img = cv2.imread(filename)
    ### Resize the image
    res = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    res.save("output/"+filename)

try:
    # create output dir
    os.mkdir("output")

    with concurrent.futures.ProcessPoolExecutor() as executor:
        files = glob.glob("../project/temp/")
        executor.map(convert_this, files)
except Exception as e:
    print("Encountered Error!")
    print(e)
    filelist = glob.glob("output")
    for f in filelist:
        os.remove(f)
    os.rmdir("output")
It gave me an error:
Encountered Error!
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
[WinError 183] Cannot create a file when that file already exists: 'output'
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\<username>\Anaconda3\envs\py37\lib\multiprocessing\spawn.py", line 105, in spawn_main
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
...
(it was repeating errors of the same "can't create file")
As you can see, os.mkdir was run even though it's outside the convert_this function I defined. I'm not that new to Python, but I am definitely new to multiprocessing and threading. Is this just how concurrent.futures behaves, or am I missing something in the documentation?
Thanks.
Yes, multiprocessing must load the file in the new processes before it can run the function (just as it does when you run the file yourself), so it runs all code you have written. So, either (1) move your multiprocessing code to a separate file with nothing extra in it and call that, or (2) enclose your top level code in a function (e.g., main()), and at the bottom of your file write
if __name__ == "__main__":
    main()
This code will only run when you start the script directly, not in the multiprocessing-spawned children. See the Python docs for details on this construction.
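For example, a minimal sketch of option (2) applied to the script above; I've assumed a "*" glob pattern so that files rather than the directory are matched, and used cv2.imwrite for saving, since OpenCV images are NumPy arrays with no save method:

import glob
import concurrent.futures
import cv2
import os

def convert_this(filename):
    # read one image and convert it to grayscale, as in the question
    img = cv2.imread(filename)
    res = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite(os.path.join("output", os.path.basename(filename)), res)

def main():
    # runs only in the parent process, so the directory is created once
    os.mkdir("output")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        files = glob.glob("../project/temp/*")
        executor.map(convert_this, files)

if __name__ == "__main__":
    main()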
import fileinput

def main():
    try:
        lines = fileinput.input()
        res = process_lines(lines)
        # ...more code
    except Exception:
        print('is your file path bad?')

if __name__ == '__main__':
    main()
When I run this code with a bad path, it doesn't throw an error, yet the docs say an OSError will be raised if there's an I/O error. How do I test for bad paths then?
fileinput.input() returns a lazy iterator, not a list:
In [1]: fileinput.input()
Out[1]: <fileinput.FileInput at 0x7fa9bea55a50>
Proper usage of this function is done via a for loop:
with fileinput.input() as files:
    for line in files:
        process_line(line)
or using a conversion to list:
lines = list(fileinput.input())
That is, the files are opened only when you actually iterate over this object.
Although I wouldn't recommend the second way, as it is counter to the philosophy of how such scripts are supposed to work.
You are supposed to parse as little as you need to output data, and then output it as soon as possible. This avoids issues with large inputs, and if your script is used within a larger pipeline, speeds up the processing significantly.
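A minimal sketch of that streaming style, where process_line stands in for whatever per-line work the script actually does:

import fileinput
import sys

def process_line(line):
    # placeholder for the real per-line transformation
    return line.upper()

with fileinput.input() as files:
    for line in files:
        # emit each result immediately instead of collecting everything first
        sys.stdout.write(process_line(line))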
As for checking whether the path is correct or not:
As soon as you iterate down to a file that doesn't exist, the iterator will raise an exception:
# script.py
import fileinput

with fileinput.input() as files:
    for line in files:
        print(repr(line))
$ echo abc > /tmp/this_exists
$ echo xyz > /tmp/this_also_exists
$ python script.py /tmp/this_exists /this/does/not /tmp/this_also_exists
'abc\n'
Traceback (most recent call last):
File "/tmp/script.py", line 6, in <module>
for line in files:
File "/home/mrmino/.pyenv/versions/3.7.7/lib/python3.7/fileinput.py", line 252, in __next__
line = self._readline()
File "/home/mrmino/.pyenv/versions/3.7.7/lib/python3.7/fileinput.py", line 364, in _readline
self._file = open(self._filename, self._mode)
FileNotFoundError: [Errno 2] No such file or directory: '/this/does/not'
I am trying to create an array of .jpg files, but the interpreter is not building the array.
More specifically, my problem is that a public folder, whose path is assigned to the variable path, is not accessible from my Python environment (Spyder). However, the folder and its files are all public, with open access for everyone. What might be the reason that my computer cannot access the images?
Code 1 is a simple attempt to find and open the path I want, and the kernel output shows what is failing.
Code 2 isolates the line where I apply the open() method in the larger program; the kernel output shows the failure.
Code 1:
import os

path = r'C:/Users/BeckerLab/Pictures/Final_Sample_Set/Right2'
try:
    os.path.exists(path)
    if (True):
        R = open(path)
        R.close()
except FileNotFoundError:
    print("file does not exist")
Kernel output for Code 1:
!runfile('C:/Users/BeckerLab/untitled6.py', wdir='C:/Users/BeckerLab')
Traceback (most recent call last):
File "C:\Users\BeckerLab\untitled6.py", line 8, in <module>
R = open(path)
PermissionError: [Errno 13] Permission denied: 'C:/Users/BeckerLab/Pictures/Final_Sample_Set/Right2'
Code 2:
import os
rightSamples = [open(file, 'r+') for file in os.listdir(r'C:/Users/Public/Right2')]
Kernel output for Code 2:
!runfile('C:/Users/BeckerLab/almost.py', wdir='C:/Users/BeckerLab')
2020-04-05 12:59:28
Traceback (most recent call last):
File "C:\Users\BeckerLab\almost.py", line 46, in <module>
rightSamples = [open(file, 'r+') for file in os.listdir(r'C:/Users/Public/Right2')]
File "C:\Users\BeckerLab\almost.py", line 46, in <listcomp>
rightSamples = [open(file, 'r+') for file in os.listdir(r'C:/Users/Public/Right2')]
FileNotFoundError: [Errno 2] No such file or directory: 'R1.JPG'
Notice that your condition is:

os.path.exists(path)
if (True):

which will always be true, because the result of os.path.exists(path) is discarded and if (True): always runs. Maybe try:

if os.path.exists(path):
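For what it's worth, a corrected sketch of Code 1 along those lines. Note that path points at a directory, so opening it directly is what raises the PermissionError, and os.listdir returns bare filenames, which is why Code 2 hits FileNotFoundError. Assuming the goal is to open the image files inside the folder:

import os

path = r'C:/Users/BeckerLab/Pictures/Final_Sample_Set/Right2'

if os.path.exists(path):
    for name in os.listdir(path):
        # join the bare filename back onto the directory before opening
        with open(os.path.join(path, name), 'rb') as f:
            data = f.read()
else:
    print("file does not exist")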
Try moving the files to another directory like 'D:/../BeckerLab/untitled6.py'
In my project, I am using 3 files throughout the whole process. The source file (.ada), a "three address code" file (.TAC), and my own temporary file for use during processing (.TACTMP).
in Caller.py:
TACFILE = open(str(sys.argv[1])[:-4] + ".TAC", 'w') # line 17
# Does a bunch of stuff
TACFILE.close() # line 653
# the below function is imported from Called.py
post_process_file_handler() # line 654
in Called.py:
TAC_FILE_NAME = str(sys.argv[1])[:-4] # line 6
TAC_lines = open(TAC_FILE_NAME + ".TAC", 'r').readlines() # line 7
If I try to run my program without already having a .TAC file (even a blank one), I get the following error:
Traceback (most recent call last):
File "Caller.py", line 8, in <module>
from Called import post_process_file_handler
File "Called.py", line 7, in <module>
TAC_lines = file(TAC_FILE_NAME + ".TAC", 'r').readlines()
IOError: [Errno 2] No such file or directory: 'test76.TAC'
Why would this be happening? This error is being thrown even if I put a breakpoint at the beginning of Caller.py, well before the post_process_file_handler() function ever gets called.
For clarity: test76.TAC should be generated by Caller.py, and then Called.py should open that file to process it further; for some reason that isn't happening.
This may be specific to my case, but I found that the issue was due to the order and manner in which I was using these files.
In short, when the import line was encountered:
from Called import post_process_file_handler
Python executed the module-level code in Called.py, and since the file object is a global variable there, it was opened before Caller.py had a chance to create the .TAC file it would read from.
Moving the import line to just before I use the function fixed my issue, as nothing in Called.py is initialized until after Caller.py is done doing its work.
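A rough sketch of that rearrangement, using the names from the question:

# Caller.py
import sys

TACFILE = open(str(sys.argv[1])[:-4] + ".TAC", 'w')
# ... write the three-address code ...
TACFILE.close()

# Import only after the .TAC file exists on disk, so the module-level
# open() in Called.py finds the file it expects.
from Called import post_process_file_handler
post_process_file_handler()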
The objective of the code is to read a sqlite3 file, process the text, and write the result to another file (gzip format). I am trying to use multiprocessing with a pool, but
it sometimes generates errors and stops with the message "Cannot create a file when that file already exists". After the failure, if I repeat the same code it usually works fine, which means this happens only occasionally.
I guess this is related to a race between the processes in the pool, but I cannot find a way to solve the problem. Normally it works fine, but sometimes it causes problems.
Also, I tried terminating all the processes at each directory level and starting new processes for the next directory.
P.S. Environment: Windows Server 64-bit, Python 2.7 64-bit
import sqlite3 as lite
import gzip
import glob
import time
import multiprocessing

def convert_txt((infile, outfile)):
    try:
        conn = lite.connect(infile)
        conn.text_factory = str
    except:
        print 'Sql Lite error:', infile
        return
    try:
        fout = gzip.open(outfile, 'wb')
    except:
        print 'File write error:', outfile
        return
    for line in conn.iterdump():
        fout.write(line.replace('abc', 'def'))
    fout.close()

for directory in directory_list:
    filenames = glob.glob('B:\\Hebercity UT\\*.txt')
    p = multiprocessing.Pool(min(10, len(filenames)))
    file_list = []
    for input_file in filenames:
        output_file = input_file.replace('.txt', '.csv')
        file_list.append([input_file, output_file])
    p.map(convert_txt, file_list)
    time.sleep(1)
    p.close()  # close the pool and start a new pool in the next directory
Traceback (most recent call last):
File "B:\gws_txt_converter_multi.py", line 100, in <module>
p.map(convert_txt_msg, fflist)
File "C:\opt\Anaconda\lib\multiprocessing\pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\opt\Anaconda\lib\multiprocessing\pool.py", line 558, in get
raise self._value
WindowsError: [Error 183] Cannot create a file when that file already exists: 'B:\\Hebercity UT'