I want to create a ramdisk in Python. I want to be able to do this in a cross-platform way, so it'll work on Windows XP-to-7, Mac, and Linux. I want to be able to read/write to the ramdisk like it's a normal drive, preferably with a drive letter/path.
The reason I want this is to write tests for a script that creates a directory with a certain structure. I want to create the directory completely in the ramdisk so I'll be sure it would be completely deleted after the tests are over. I considered using Python's tempfile, but if the test will be stopped in the middle the directory might not be deleted. I want to be completely sure it's deleted even if someone pulls the plug on the computer in the middle of a test.
How about PyFilesystem?
https://docs.pyfilesystem.org/en/latest/reference/memoryfs.html
https://docs.pyfilesystem.org/en/latest/reference/tempfs.html
The downside is that you have to access the filesystem with PyFilesystem API, but you can also access the real fs with PyFilesystem.
Because file and directory-handling is so low-level and OS dependent, I doubt anything like what you want exists (or is even possible). Your best bet might be to implement a "virtual" file-system-like set of functions, classes, and methods that keep track of the files and directory-hierarchy created and their content.
Callables in such an emulation would need to have the same signature and return the same value(s) as their counterparts in the various Python standard built-ins and modules your application uses.
I suspect this might not be as much work as it sounds -- emulating the standard Python file-system interface -- depending on how much of it you're actually using since you wouldn't necessarily have to imitate all of it. Also, if written in Pure Pythonâ„¢, it would also be portable and easy to maintain and enhance.
One option might be to inject (monkey patch) modified versions of the methods used in the os module as well as the builtins open and file that write to StringIO files instead of to disk. Obviously this substitution should only occur for the module being tested;
Please read this:
http://docs.python.org/library/tempfile.html#tempfile.TemporaryFile
"Return a file-like object that can be
used as a temporary storage area. The
file is created using mkstemp(). It
will be destroyed as soon as it is
closed (including an implicit close
when the object is garbage
collected)."
It's all handled for you. Do nothing and it already works.
Related
I'm working on a pure Python file parser for event logs, which may range in size from kilobytes to gigabytes. Is there a module that abstracts explicit .open()/.seek()/.read()/.close() calls into a simple buffer-like object? You might think of this as the inverse of StringIO. I expect it might look something like:
with FileBackedBuffer('/my/favorite/path', 'rb') as buf:
header = buf[0:0x10]
footer = buf[0x10000000:]
The mmap module may fulfill my requirements; however, I have two reservations that I'd appreciate feedback on:
It is important that the module handle files larger than available RAM/swap. I am unsure if mmap can do this well.
The mmap constructors are different depending on OS. This makes me hesitant as I am looking to write nicely cross-platform code, and would rather not muck in OS specifics. I will if I need to, but this set off a warning that I might be looking in the wrong place.
If mmap is the correct module for such as task, how does it handle these two points? If it is not, what is an appropriate module?
mmap can easily handle files larger than RAM/swap. What mmap can't do is handle files larger than the address space, which means that 32bit systems are limited in how large a file they can use.
What happens with mmap is that the OS will only have in memory as much data as it it chooses to, but you program will think it is all there. Be careful in usage patters though since if your data DOESN'T fit in RAM and you jump around too randomly, it will swap (discard pages from your file that you haven't used recently to make room for the new pages to be loaded).
If you don't need to specify anything base fileno and length, I don't believe you need to worry about the platform specific arguments for mmap. If you do need to worry about the extra arguments, then you will either have to master Windows versus Unix, or pass that on to your users. I don't know what your library will be, but it may be nice to provide reasonable defaults on both platforms while also allowing the user to tweak the options. It looks to me that it would be unlikely that you would care about the Windows tagname option, also, if you are cross platform, then just accept the Unix default for prot since you have no choice on Windows. That only leaves caring about MAP_PRIVATE and MAP_SHARED. The default is MAP_SHARED, but I'm not sure if that is the option that most closely matches Windows behavior, but accepting the default is probably fine there.
I just finished this exercise for beginners on creating and importing modules in python.
I was wondering does everything in the module get imported into the computer's memory?
Will there be implications later on with memory as the code gets longer and the modules imported become more numerous?
Will I need to know memory management to write resource-efficient code because of this?
Your modules are automatically compiled (.pyc files) which are then imported into memory, but you do not have to be affraid of getting out of memory: modules are very small; it is common to have thousands of modules loaded at a time!
You do not need to know memory management as Python does all the work for you.
edit: You can also write a lot of documentation of your code and modules in each module itself (and you should, read about docstrings here) without increasing the size or speed of the modules when loading, because the compiling-step takes out all unnecessary text, comments, etc.
I can only imagine one way that imports could be abused to leak memory; You could dynamically create and import modules of arbitrary name (say, for the purpose of creating a plugin system); use them once and stop using them. If you did this through the normal import machinery, ie with __import__(variable_module_name), those modules would be added to sys.modules and even though they will not be used any further.
The solution is well, don't do that. If you are really creating a plugin system, then dynamic imports of that sort are probably fine, since the plugins would get re-used. If you really need to use dynamically generated, single use code; use eval.
If you really, really need to use importing on dynamically generated code (say, for automated testing), then you probably do need to poke around in sys.modules to erase the modules that you imported. Here's a nice article explaining how to do something like that.
Yes and no.
Yes, the modules do get imported into the computers memory, but no you shouldn't be writing resource-efficient code because of this. Python modules are very small (a few KiB, in rare cases a few MiB) and have no significant effect on memory usage.
I'm currently writing a Python sandbox using sandboxed PyPy. Basically, the sandbox works by providing a "controller" that maps system library calls to a specified function instead. After following the instructions found at codespeak (which walk through the set up process), I realized that the default controller does not include a replacement for os.fstat(), and therefore crashes when I call open(). Specifically, the included pypy/translator/sandbox/sandlib.py does not contain a definition for do_ll_os__ll_os_fstat.
So far, I've implemented it as:
def do_ll_os__ll_os_fstat(self, fd):
return os.fstat(fd)
which seems to work fine. Is this safe? Will this create a hole in the sandbox?
The fstat call can reveal certain information which you may or may not want to keep secret. Among other things:
Whether two file descriptors are on the same filesystem
The block size of the underlying filesystem
Numeric UID/GIDs of file owners
Modification/access times of files
However, it will not modify anything, so if you don't mind this (relatively minor) information leak, no problem. You could also alter some of the results to mask information you want to hide (set owner UIDs/GIDs to 0, for example)
bdonlan's answer is good, but since there is a bounty here, what the heck :-)
You can see for yourself precisely what information fstat provides by reading the POSIX spec for struct stat.
It is definitely a "read-only" operation. And as a rule, Unix file descriptors only provide access to the single object to which they refer. For example, a (readable) file descriptor referencing a directory will allow you to list the files within the directory, but it will not allow you to access files within the directory; for that, you need to open() the file, which will perform a permission check.
Be aware that fstat can be called on non-files like directories or sockets. Here again, though, it will only provide the information you see in struct stat and it will not modify anything. (And for a socket, most of the fields will be meaningless.)
I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.
I'm setting up a web application to use IronPython for scripting various user actions and I'll be exposing various business objects ready for accessing by the script. I want to make it impossible for the user to import the CLR or other assemblies in order to keep the script's capabilities simple and restricted to the functionality I expose in my business objects.
How do I prevent the CLR and other assemblies/modules from being imported?
This would prevent imports of both python modules and .Net objects so may not be what you want. (I'm relatively new to Python so I might be missing some things as well):
Setup the environment.
Import anything you need the user to have access to.
Either prepend to their script or execute:
__builtins__.__import__ = None #Stops imports working
reload = None #Stops reloading working (specifically stops them reloading builtins
#giving back an unbroken __import___!
then execute their script.
You'll have to search the script for the imports you don't want them to use, and reject the script in toto if it contains any of them.
Basically, just reject the script if it contains Assembly.Load, import or AddReference.
You might want to implement the protection using Microsoft's Code Access Security. I myself am not fully aware of its workings (or how to make it work with IPy), but its something which I feel you should consider.
There's a discussion thread on the IPy mailing list which you might want to look at. The question asked is similar to yours.
If you'd like to disable certain built-in modules I'd suggest filing a feature request over at ironpython.codeplex.com. This should be an easy enough thing to implement.
Otherwise you could simply look at either Importer.cs and disallow the import there or you could simply delete ClrModule.cs from IronPython and re-build (and potentially remove any references to it).
In case anyone comes across this thread from google still (like i did)
I managed to disable 'import clr' in python scripts by commenting out the line
//[assembly: PythonModule("clr", typeof(IronPython.Runtime.ClrModule))]
in ClrModule.cs, but i'm not convinced this is a full solution to preventing unwanted access, since you will still need to override things like the file builtin.