Why does Python have a limit on the number of file handles?

I wrote a simple script to test how many files can be open at once in a Python script:
for i in xrange(2000):
    fp = open('files/file_%d' % i, 'w')
    fp.write(str(i))
    fp.close()
fps = []
for x in xrange(2000):
    h = open('files/file_%d' % x, 'r')
    print h.read()
    fps.append(h)
and I get an exception:
IOError: [Errno 24] Too many open files: 'files/file_509'

The number of open files is limited by the operating system. On Linux you can type
ulimit -n
to see what the limit is. If you are root, you can type
ulimit -n 2048
Now your program will run OK (as root), since you have raised the limit to 2048 open files.

I see the same behavior on Windows when running your code. The limit comes from the C runtime. You can use win32file to change the limit:
import win32file
print win32file._getmaxstdio()
The above should give you 512, which explains the failure at #509 (plus stdin, stderr and stdout, as others have already stated).
Execute the following and your code should run fine:
win32file._setmaxstdio(2048)
Note that 2048 is the hard limit, though (the hard limit of the underlying C stdio). As a result, calling _setmaxstdio with a value greater than 2048 fails for me.

To check or change the limit of open file handles on Linux, you can use the Python module resource:
import resource
# soft: the limit imposed by the current configuration
# hard: the limit imposed by the operating system
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print 'Soft limit is ', soft
# This works without root as long as 3000 does not exceed the hard limit;
# raising the hard limit itself requires root.
resource.setrlimit(resource.RLIMIT_NOFILE, (3000, hard))
On Windows, I do as Punit S suggested:
import platform
if platform.system() == 'Windows':
    import win32file
    win32file._setmaxstdio(2048)
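The resource calls above can be wrapped so that the soft limit is never raised past the hard limit. This is a minimal sketch in Python 3 syntax for Unix-like systems; the function name raise_nofile_soft_limit is made up for illustration and is not part of the resource module.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE towards target without exceeding the hard limit.

    Returns the soft limit in effect afterwards. The function name is
    hypothetical; only getrlimit/setrlimit come from the standard library.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    # Nothing to do if the current soft limit already covers the request.
    if soft == unlimited or soft >= target:
        return soft
    # An unprivileged process may raise the soft limit up to the hard limit.
    new_soft = target if hard == unlimited else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    print("Soft limit is now", raise_nofile_soft_limit(3000))
```

If the returned value is lower than the requested one, the hard limit is the bottleneck and only root (or a system configuration change) can lift it.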

Most likely because the operating system has a limit for the number of files that an application can have open.

On Windows one can get or set the limit with the built-in ctypes library:
import ctypes
print("Before: {}".format(ctypes.windll.msvcrt._getmaxstdio()))
ctypes.windll.msvcrt._setmaxstdio(2048)
print("After: {}".format(ctypes.windll.msvcrt._getmaxstdio()))

Since this is not a Python problem, do this:
for x in xrange(2000):
    with open('files/file_%d' % x, 'r') as h:
        print h.read()
The following is a very bad idea:
fps.append(h)

The append is needed so the garbage collector does not clean up and close the files
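When only the contents of each file are needed, the with form from the answer above keeps a single descriptor open at a time, so the limit is never reached. A minimal sketch in Python 3 syntax follows; the temporary directory and the helper name read_all are assumptions made for the example.

```python
import os
import tempfile


def read_all(directory, count):
    """Read file_0 .. file_{count-1}, holding only one file open at a time."""
    contents = []
    for i in range(count):
        path = os.path.join(directory, "file_%d" % i)
        # The file is closed as soon as the with block ends.
        with open(path, "r") as handle:
            contents.append(handle.read())
    return contents


# Create 2000 small files in a scratch directory, then read them all back.
with tempfile.TemporaryDirectory() as directory:
    for i in range(2000):
        with open(os.path.join(directory, "file_%d" % i), "w") as handle:
            handle.write(str(i))
    data = read_all(directory, 2000)
```

Keeping the strings rather than the file objects gives the same data to the rest of the program without holding 2000 handles.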

Related

Python win32com CallNotImplementedError instead of AccessDenied?

This code:
import os
from win32com.shell import shell, shellcon
tempFile = os.path.join(os.path.abspath(os.path.curdir), u'_tempfile.tmp')
# print tempFile
dest = os.path.join(r'C:\Program Files', '_tempfile.tmp')
with open(tempFile, 'wb'): pass # create the file
try: # to move it into C:\Program Files
    result, aborted = shell.SHFileOperation(
        (None, # parent window
         shellcon.FO_MOVE, tempFile, dest,
         # 0,
         shellcon.FOF_SILENT, # replace this with 0 to get a UAC prompt
         None, None))
    print result, aborted
except: # no exception raised
    import traceback
    traceback.print_exc()
Prints 120 False, 120 being CallNotImplementedError. If the flags are set to 0, you get a UAC prompt as expected. Why is the result not 5 (AccessDeniedError)? Is something really not implemented, is it a bug, or what am I missing?
Needless to say that this was hard to debug - I was expecting an access denied and I had to look very closely to see what was wrong.

You're misreading the error code. SHFileOperation uses its own set of error codes separate from the system error codes; in this case, 0x78/120 is:
DE_ACCESSDENIEDSRC: Security settings denied access to the source.
That doesn't seem like it's the likely error (the destination is the problem), but this function is deprecated (replaced in Vista by IFileOperation) and while it still exists in Vista+, they may not have been particularly careful about returning entirely accurate error codes for situations (like UAC) that didn't exist pre-Vista.
Per the docs:
These are pre-Win32 error codes and are no longer supported or defined in any public header file. To use them, you must either define them yourself or compare against the numerical value.
These error codes are subject to change and have historically done so.
These values are provided only as an aid in debugging. They should not be regarded as definitive.
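Since these codes are not defined in any public header, one way to avoid misreading them as system error codes is to keep a small lookup table of your own. The sketch below lists only a handful of values as I recall them from the SHFileOperation documentation (0x78 is the one quoted above); treat the other entries as assumptions to verify against the docs, and the helper name as hypothetical.

```python
# Partial table of SHFileOperation return codes. These are NOT Win32 system
# error codes, so 0x78 here does not mean ERROR_CALL_NOT_IMPLEMENTED.
SHFILEOPERATION_ERRORS = {
    0x71: "DE_SAMEFILE: the source and destination files are the same file",
    0x75: "DE_OPCANCELLED: the operation was canceled",
    0x78: "DE_ACCESSDENIEDSRC: security settings denied access to the source",
    0x7C: "DE_INVALIDFILES: the source or destination path was invalid",
}


def describe_shfileoperation_result(code):
    """Return a readable description for an SHFileOperation result code."""
    if code == 0:
        return "success"
    return SHFILEOPERATION_ERRORS.get(
        code, "unlisted SHFileOperation code 0x%X" % code)


print(describe_shfileoperation_result(120))
```

Codes missing from the table fall through to a generic message rather than being looked up as system errors.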

Python - How to get the start/base address of a process?

How do I get the start/base address of a process? For example, Solitaire.exe (solitaire.exe+BAFA8)
#-*- coding: utf-8 -*-
import ctypes, win32ui, win32process
PROCESS_ALL_ACCESS = 0x1F0FFF
HWND = win32ui.FindWindow(None,u"Solitär").GetSafeHwnd()
PID = win32process.GetWindowThreadProcessId(HWND)[1]
PROCESS = ctypes.windll.kernel32.OpenProcess(PROCESS_ALL_ACCESS,False,PID)
print PID, HWND,PROCESS
I would like to calculate a memory address, and for that I need the base address of solitaire.exe.
Here's a picture of what I mean:

I think the handle returned by GetModuleHandle is actually the base address of the given module. You get the handle of the exe by passing NULL.

Install pydbg
Source: https://github.com/OpenRCE/pydbg
Unofficial binaries here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pydbg
from pydbg import *
from pydbg.defines import *
import struct
dbg = pydbg()
path_exe = "C:\\windows\\system32\\calc.exe"
dbg.load(path_exe, "-u amir")
dbg.debug_event_loop()
parameter_addr = dbg.context.Esp #(+ 0x8)
print 'ESP (address) ',parameter_addr
#attach not working under Win7 for me
#pid = raw_input("Enter PID:")
#print 'PID entered %i'%int(pid)
#dbg.attach(int(pid)) #attaching to running process not working
You might want to have a look at PaiMei, although it's not very active right now https://github.com/OpenRCE/paimei
I couldn't get attach() to work and used load() instead. Pydbg has loads of functionality, such as read_process_memory, write_process_memory etc.
Note that you can't change memory arbitrarily, because the operating system protects the memory of other processes from your process (protected mode). Older processors ran everything in real mode, i.e. every program had full access to all memory. Non-malicious software usually (always?) doesn't read/write other processes' memory.

The HMODULE value returned by GetModuleHandle is the base address of the loaded module and is probably the address you need to compute the offset.
If not, that address is the start of the header of the module (DLL/EXE), which can be displayed with the dumpbin utility that comes with Visual Studio or you can interpret it yourself using the Microsoft PE and COFF Specification to determine the AddressOfEntryPoint and BaseOfCode as offsets from the base address. If the base address of the module isn't what you need, one of these two is another option.
Example:
>>> import win32api
>>> BaseAddress = win32api.GetModuleHandle(None) + 0xBAFA8
>>> print '{:08X}'.format(BaseAddress)
1D0BAFA8
If the AddressOfEntryPoint or BaseOfCode is needed, you'll have to use ctypes to call ReadProcessMemory, following the PE specification to locate the offsets, or just use dumpbin /headers solitaire.exe to learn the offsets.
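Locating those two fields by hand is short once the header bytes are available. The sketch below parses a bytes object holding the start of the image, whether read from the .exe file on disk or fetched with ReadProcessMemory; the function name is made up for illustration, and the offsets follow the PE/COFF layout (e_lfanew at 0x3C, a 20-byte COFF header after the 4-byte signature, AddressOfEntryPoint and BaseOfCode at offsets 16 and 20 of the optional header).

```python
import struct


def read_entry_offsets(image):
    """Return (AddressOfEntryPoint, BaseOfCode) from the start of a PE image.

    Both values are offsets (RVAs) relative to the module base address.
    """
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional_header = e_lfanew + 4 + 20
    entry_point, base_of_code = struct.unpack_from(
        "<II", image, optional_header + 16)
    return entry_point, base_of_code
```

For a file on disk, reading the first few kilobytes in binary mode and passing them to the function is normally enough, since the headers sit at the start of the file.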

You can use frida to do that easily.
It is very useful for hacking and for memory operations such as computing address offsets, reading memory, writing to specific memory locations, etc.
https://github.com/frida/frida
2021.08.01 update:
Thanks to @Simas Joneliunas for the reminder.
Here are the steps for using frida (Windows):
Install frida with pip
pip install frida-tools # CLI tools
pip install frida # Python bindings
Use the frida API
import sys
import frida
session = frida.attach(processName)
script = session.create_script("""yourScript""")
script.load()
sys.stdin.read() # keep the program alive
session.detach()
Edit your script (using JavaScript)
var baseAddr = Module.findBaseAddress('solitaire.exe');
var firstPointer = baseAddr.add(0xBAFA8).readPointer();
var secondPointer = firstPointer.add(0x50).readPointer();
var thirdPointer = secondPointer.add(0x14).readPointer();
// If your target pointer points to an ANSI string, you can use thirdPointer.readAnsiString() to read it
The official site https://frida.re/

Alternative to psutil.Process(pid).name

I have measured the performance of psutil.Process(pid).name, and it turns out to be more than ten times slower than, for example, psutil.Process(pid).exe. Because the latter requires different privileges, I cannot just extract the filename from the path. My question is: are there any alternatives to psutil.Process(pid).name that do the same thing?

You mentioned this is for Windows. I took a look at what psutil does on Windows. It looks like psutil.Process().name uses the Windows Tool Help API. If you look at psutil's Process code and trace .name, it goes to get_name() in process_info.c. It loops through all the pids on your system until it finds the one you're looking for. I think this may be a limitation of the Tool Help API. But this is why it's slower than .exe, which uses a different API path that (as you pointed out) requires additional privileges.
The solution I came up with is to use ctypes and ctypes.windll to call the Windows NT API directly. It only needs PROCESS_QUERY_INFORMATION, which is a narrower right than PROCESS_ALL_ACCESS:
import ctypes
import os.path
# duplicate the UNICODE_STRING structure from the Windows API
class UNICODE_STRING(ctypes.Structure):
    _fields_ = [
        ('Length', ctypes.c_ushort),
        ('MaximumLength', ctypes.c_ushort),
        ('Buffer', ctypes.c_wchar_p)
    ]
# args
pid = 8000 # put your pid here
# define some constants; from the Windows API reference
MAX_TOTAL_PATH_CHARS = 32767
PROCESS_QUERY_INFORMATION = 0x0400
PROCESS_IMAGE_FILE_NAME = 27
# open handles
ntdll = ctypes.windll.LoadLibrary('ntdll.dll')
process = ctypes.windll.kernel32.OpenProcess(PROCESS_QUERY_INFORMATION,
                                             False, pid)
# allocate a writable buffer for the UNICODE_STRING and its characters
buflen = (((MAX_TOTAL_PATH_CHARS + 1) * ctypes.sizeof(ctypes.c_wchar)) +
          ctypes.sizeof(UNICODE_STRING))
buffer = ctypes.create_string_buffer(buflen)
# query process image filename and parse for process "name"
ntdll.NtQueryInformationProcess(process, PROCESS_IMAGE_FILE_NAME, buffer,
                                buflen, None)
pustr = ctypes.cast(buffer, ctypes.POINTER(UNICODE_STRING))
print os.path.split(pustr.contents.Buffer)[-1]
# cleanup
ctypes.windll.kernel32.CloseHandle(process)
ctypes.windll.kernel32.FreeLibrary(ntdll._handle)

As of psutil 1.1.0 this problem has been fixed, see https://code.google.com/p/psutil/issues/detail?id=426

Python mmap ctypes - read only

I think I have the opposite problem as described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the 2nd process to be able to modify the contents. This is potentially a large file, and I need random access, so I'm using python's mmap module.
If I create the mmap as read/write (for the 2nd process), I have no problem creating ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the c-code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I make the mmap ACCESS_READ, throwing an exception that from_buffer requires write privileges.
I think I want to use ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of the location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory, but aren't persisted to disk), but I'd rather keep things read only.
Any suggestions?

I ran into a similar issue (unable to setup a readonly mmap) but I was using only the python mmap module. Python mmap 'Permission denied' on Linux
I'm not sure it is of any help to you since you don't want the mmap to be private?

OK, from looking at the mmap .c code, I don't believe it supports this use case. Also, I found that the performance pretty much sucks for my use case. I'd be curious what kind of performance others see, but I found that it took about 40 sec to walk through a binary file of 500 MB in Python. This is creating an mmap, then turning the location into a ctypes object with from_buffer(), and using the ctypes object to decipher the size of the object so I could step to the next object. I tried doing the same thing directly in C++ with MSVC. Obviously here I could cast directly into an object of the correct type, and it was fast: less than a second (this is with a Core 2 Quad and an SSD).
I did find that I could get a pointer with the following
firstHeader = CEL_HEADER.from_buffer(map, 0) #CEL_HEADER is a ctypes Structure
pHeader = pointer(firstHeader)
#Now I can use pHeader[ind] to get a CEL_HEADER object
#at an arbitrary point in the file
This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
I'm not sure my plan will help anyone else, but I'm going to try to create a c module specific to my needs based on the mmap code. I think I can use the fast c-code handling to index the binary file, then expose only small parts of the file at a time through calls into ctypes/python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a python long to set the file offset. This lets you use mmap for files over 4GB on 32-bit systems. See Issue #4681 here

Ran into this same problem, we needed the from_buffer interface and wanted read only access. From the python docs https://docs.python.org/3/library/mmap.html "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing, you can use ACCESS_COPY.
An example: open two cmd.exe windows or terminals, and in one terminal run:
import ctypes
import mmap
mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")
write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value
        print('updated value')
        value = int(input('enter an integer using mm_file_read: '))
        # assigning to read.value doesn't update the anonymous backing file
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')
In the other terminal run:
import mmap
import struct
import time
mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
    while True:
        new_i = struct.unpack('i', mm_file[:4])
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')
And you will see that the second process does not receive updates when the first process writes using ACCESS_COPY.
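For the strictly read-only case in the question, another option not mentioned in the answers is ctypes' from_buffer_copy, which accepts a read-only buffer such as an ACCESS_READ mmap. It copies the bytes of each structure instead of creating a view, so it trades some speed for keeping the mapping read-only. A minimal sketch, assuming a hypothetical fixed-size Record layout:

```python
import ctypes
import mmap


class Record(ctypes.Structure):
    # Hypothetical record layout, used only for this illustration.
    _fields_ = [("ident", ctypes.c_uint32), ("size", ctypes.c_uint32)]


def read_records(path):
    """Walk a file of packed Record structs through a read-only mmap."""
    records = []
    record_size = ctypes.sizeof(Record)
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            offset = 0
            while offset + record_size <= len(mm):
                # from_buffer_copy works on read-only buffers; from_buffer does not.
                rec = Record.from_buffer_copy(mm, offset)
                records.append((rec.ident, rec.size))
                offset += record_size
        finally:
            mm.close()
    return records
```

Because each Record is a copy, writing to its fields never touches the mapped file.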

In Python, how do I check if a drive exists w/o throwing an error for removable drives?

Here's what I have so far:
import os.path as op
for d in map(chr, range(98, 123)): #drives b-z
    if not op.isdir(d + ':/'): continue
The problem is that it pops up a "No Disk" error box in Windows:
maya.exe - No Disk: There is no disk in
the drive. Please insert a disk into
drive \Device\Harddisk1\DR1 [Cancel, Try Again, Continue]
I can't catch the exception because it doesn't actually throw a Python error.
Apparently, this only happens on removable drives where there is a letter assigned, but no drive inserted.
Is there a way to get around this issue without specifically telling the script which drives to skip?
In my scenario, I'm at the school labs where the drive letters change depending on which lab computer I'm at. Also, I have zero security privileges to access disk management.

Use the ctypes package to access the GetLogicalDrives function. This does not require external libraries such as pywin32, so it's portable, although it is a little clunkier to work with. For example:
import ctypes
import itertools
import os
import string
import platform
def get_available_drives():
    if 'Windows' not in platform.system():
        return []
    # kernel32 functions use the stdcall convention, so load it via windll
    drive_bitmask = ctypes.windll.kernel32.GetLogicalDrives()
    return list(itertools.compress(string.ascii_uppercase,
        map(lambda x:ord(x) - ord('0'), bin(drive_bitmask)[:1:-1])))
itertools.compress was added in Python 2.7 and 3.1; if you need to support <2.7 or <3.1, here's an implementation of that function:
def compress(data, selectors):
    for d, s in zip(data, selectors):
        if s:
            yield d
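The bitmask decoding can also be written without itertools and kept separate from the Windows call, which makes it easy to check on any platform. This is a sketch with a made-up helper name; bit 0 of the value returned by GetLogicalDrives corresponds to drive A, bit 1 to B, and so on.

```python
import string


def drives_from_bitmask(bitmask):
    """Return the drive letters whose bits are set in a GetLogicalDrives mask."""
    return [letter
            for index, letter in enumerate(string.ascii_uppercase)
            if (bitmask >> index) & 1]


# Bits 0, 2 and 3 set: drives A, C and D are present.
print(drives_from_bitmask(0b1101))
```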

Here's a way that works both on Windows and Linux, for both Python 2 and 3:
import platform,os
def hasdrive(letter):
    return "Windows" in platform.system() and os.system("vol %s: 2>nul>nul" % (letter)) == 0

If you have the win32file module, you can call GetLogicalDrives():
def does_drive_exist(letter):
    import win32file
    return (win32file.GetLogicalDrives() >> (ord(letter.upper()) - 65) & 1) != 0

To disable the error popup, you need to set the SEM_FAILCRITICALERRORS Windows error flag using pywin32:
import win32api
old_mode = win32api.SetErrorMode(0)
SEM_FAILCRITICALERRORS = 1 # not provided by PyWin, last I checked
win32api.SetErrorMode(old_mode | SEM_FAILCRITICALERRORS)
This tells Win32 not to show the retry dialog; when an error happens, it's returned to the application immediately.
Note that this is what Python calls are supposed to do. In principle, Python should be setting this flag for you. Unfortunately, since Python may be embedded in another program, it can't change process-wide flags like that, and Win32 has no way to specify this flag in a way that only affects Python and not the rest of the code.
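Because the flag is process-wide, it may be safer to set it only around the drive check and restore the previous mode afterwards. The sketch below does this with ctypes instead of pywin32; the context manager and helper names are hypothetical, and on non-Windows systems the context manager does nothing.

```python
import contextlib
import ctypes
import os
import platform

SEM_FAILCRITICALERRORS = 0x0001  # value from the Windows SDK headers


@contextlib.contextmanager
def suppress_critical_error_dialogs():
    """Temporarily add SEM_FAILCRITICALERRORS to the process error mode."""
    if platform.system() != "Windows":
        yield  # nothing to change on other platforms
        return
    kernel32 = ctypes.windll.kernel32
    # SetErrorMode returns the previous mode, which is restored on exit.
    old_mode = kernel32.SetErrorMode(SEM_FAILCRITICALERRORS)
    kernel32.SetErrorMode(old_mode | SEM_FAILCRITICALERRORS)
    try:
        yield
    finally:
        kernel32.SetErrorMode(old_mode)


def drive_root_exists(letter):
    """Check a drive root without triggering the "No Disk" dialog."""
    with suppress_critical_error_dialogs():
        return os.path.isdir(letter + ":\\")
```

Restoring the old mode keeps the change from leaking into a host application such as the one embedding Python in the question.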

As long as a little parsing is acceptable, this is one way to do it without installing win32api and without iterating through all possible drive letters.
from subprocess import check_output
def getDriveLetters():
    args = [
        'wmic',
        'logicaldisk',
        'get',
        'caption,description,providername',
        '/format:csv'
    ]
    output = check_output(args)
    results = list()
    for line in output.split('\n'):
        if line:
            lineSplit = line.split(',')
            if len(lineSplit) == 4 and lineSplit[1][1] == ':':
                results.append(lineSplit[1][0])
    return results
You could also filter for specific drive types. For example, adding "and lineSplit[2] == 'Network Connection'" to the condition gives a list of all network-mounted drive letters.
Alternatively, rather than returning a list, you could return a dictionary, where keys are drive letters and values are unc paths (lineSplit[3]). Or whatever other info you want to pull from wmic. To see more options: wmic logicaldisk get /?

import os
possible_drives_list = [chr(97 + num).upper() for num in range(26)]
for drive in possible_drives_list:
    print(drive + ' exists :' + str(os.path.exists(drive + ':\\')))

import os
def IsDriveExists(drive):
    return os.path.exists(drive + ':\\')
print(IsDriveExists('c'))
print(IsDriveExists('d'))
print(IsDriveExists('e'))
print(IsDriveExists('x'))
print(IsDriveExists('v'))
This works on any OS.
