I'm just curious: is it possible to dump all the variables and the current state of a program to a file, and then restore it on a different computer?
Let's say I have a little program in Python or Ruby that, given a certain condition, dumps all its current variables and state to a file.
Later, I could load that file on a different machine and pick up where the program left off.
Something like a VM's snapshot function.
I've seen a similar question here, but Java-related: saving the current JVM state and resuming it in a different JVM. Most people said there was nothing like that; only Terracotta had something, and even that wasn't perfect.
Thank you.
To clarify what I am trying to achieve:
Given two or more Raspberry Pis, I want to run my software on Pi no. 1, but when I need to do something different with that Pi, I need to move the software to Pi no. 2 without data loss and with only a brief interruption.
And so on, to an unlimited number of machines.
It seems that I was trying to re-invent the wheel.
Check these links:
http://en.wikipedia.org/wiki/Application_checkpointing#DMTCP
http://www.linuxscrew.com/2007/10/17/cryopid-freeze-and-unfreeze-processes-in-linux/
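For simpler cases, where a full process image is overkill, explicitly pickling just the variables you care about also works; a rough sketch (plain pickle, nothing Pi-specific; the variable names are only illustrative):

import pickle

# Illustrative program state; in a real program these would be your own variables.
counter = 42
results = {"pi1": "done", "pi2": "pending"}

# Dump: collect everything you care about into one picklable object.
with open("state.pkl", "wb") as f:
    pickle.dump({"counter": counter, "results": results}, f)

# Restore (possibly on a different machine running the same code):
with open("state.pkl", "rb") as f:
    state = pickle.load(f)
counter, results = state["counter"], state["results"]

Note that this captures only the data you explicitly list, not open sockets, threads, or the call stack; that is exactly the gap the checkpointing tools above fill.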
Good question.
In Smalltalk, yes.
Actually, in Smalltalk, dumping the whole program and restarting is the only way to store and share programs. There are no source files and there is no way of starting a program from square zero. So in Smalltalk you would get your feature for free.
The Smalltalk VM offers a hook where each object can register to restore its external resources after a restart, like reopening files and internet connections. Also, for example, integer arrays are registered with that hook so the endianness of their values can be changed in case the dump has been moved to a machine with a different endianness.
This might give a hint at how difficult (or not) it might turn out to be to achieve this in a language that does not support resumable dumps by design.
All other languages are, alas, much less live. Except for some Lisp implementations, I don't know of any language that supports resuming from a memory dump.
Which is a missed opportunity.
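For contrast, in Python you would have to wire such a restart hook up by hand, e.g. with pickle's __getstate__/__setstate__; a rough sketch (the class and file name are made up for illustration):

import pickle

class LogWriter:
    # Holds an open file handle, which cannot be pickled directly.
    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a")

    def __getstate__(self):
        # Drop the unpicklable external resource before dumping.
        state = self.__dict__.copy()
        del state["handle"]
        return state

    def __setstate__(self, state):
        # Reopen the external resource when the dump is loaded again.
        self.__dict__.update(state)
        self.handle = open(self.path, "a")

writer = LogWriter("app.log")
blob = pickle.dumps(writer)      # the handle is dropped on dump
restored = pickle.loads(blob)    # and reopened on load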
I've seen Mariano demonstrate that using Fuel (object serialization) in Pharo Smalltalk at a recent ESUG conference. You could continue debugging and running as long as you didn't hit objects that weren't serialized. Squeak Smalltalk runs on the Pi, and if saving an image is good enough for you, this is trivial. We're still waiting for the faster JITting VM for ARM, though (part of the Google Summer of Code programme).
Related
I am the TA for a coding class in which students will have to write Python 3 scripts to solve programming problems. An assignment consists of several problems, and for each problem the student is supposed to write a Python program that reads input from standard input and writes the output to standard output. For each problem there will be hidden test cases that we will use to evaluate their code and grade them accordingly. The idea is to automate this process as much as possible. The problem is how to implement the whole framework to run students' assignments without compromising the safety of the system the assignments will be running on, which will probably be my laptop (running Windows 10). I need to set up some kind of sandbox for Python 3: establishing limits on execution time and memory usage, disallowing access to the file system and networking, limiting imports to safe modules from Python's standard library, etc.
Conceptually speaking, I would like some kind of sandboxed service that can receive a Python script plus some test cases; the service runs the script against the test cases in a safe environment (detecting compilation errors, time limit exceeded errors, memory limit exceeded errors, attempts to use forbidden libraries, etc.) and reports the results back. Then, from Windows, I can write a simple script that iterates over all student submissions and uses this service as a black box to evaluate them.
Is anything like that possible on Windows 10? If so, how? My educated guess is that something like Docker or a Virtual Machine might be useful, but to be honest I'm not really sure because I lack enough expertise in these technologies, so I'm open to any suggestions.
Any advice on how to set up a secure system for automatically evaluating untrusted Python 3 code submissions would be very much appreciated.
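To make the idea concrete, the per-test-case logic I have in mind is roughly this (illustrative only; I know a timeout alone is not a security boundary, hence the question about Docker/VMs; submission.py is a hypothetical file name):

import subprocess, sys

def run_case(script, stdin_text, expected, time_limit=2):
    # Run the submission in a fresh interpreter, feeding the test input on stdin.
    try:
        proc = subprocess.run(
            [sys.executable, script],
            input=stdin_text, capture_output=True, text=True, timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        return "time limit exceeded"
    if proc.returncode != 0:
        return "runtime error"
    return "ok" if proc.stdout.strip() == expected.strip() else "wrong answer"

print(run_case("submission.py", "2 3\n", "5\n"))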
What you are looking for is a system that automatically evaluates code against test cases.
You can use CMS to satisfy your use case. It is mainly a system for managing programming contests, but it will work well for what you are trying to accomplish in your class.
My organization spends a lot of time processing GIS data. I have built a number of python scripts that perform different steps of the data processing. Other than the first script, all scripts rely on a different script to finish before it can start. Many of the scripts take 5+ minutes to execute (one is over an hour), so I do not want to repeat already-executed steps. I want this to work similar to Make, so that if an error occurs in "script3", I don't have to re-execute "script1" and "script2". I can just re-run "script3".
Is SCons the right tool for this? I looked at it, and it seems to be focused on compiling code rather than running scripts. I'm open to other suitable tools.
I am not sure a build system is what you want. Unless I am missing something, what you want is some kind of controlled automation to execute your processing tasks, and handle runtime errors.
Of course, 'make' and 'SCons' can do that, but it would be like using a bazooka to hammer in a nail. You're actually overlooking something that might be easier and more rewarding to invest time in learning in the long run: Python itself. Python is a full-fledged, multi-paradigm programming language with a lot of features for robust exception handling and interaction with the operating system (and it is heavily used in system administration on Unix-like platforms).
A first simple step would be to have a master script call each of your other scripts, each inside a try ... except block, and handle the exceptions according to your requirements. And you might improve on that as you go along, by refactoring your scripts into a consistent Python application.
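A minimal sketch of that master script (the script names below are placeholders for your own steps):

import subprocess, sys

STEPS = ["script1.py", "script2.py", "script3.py"]   # placeholder names

def run_pipeline(start_at=0):
    # Run the remaining steps in order, stopping at the first failure.
    for i, step in enumerate(STEPS[start_at:], start=start_at):
        try:
            subprocess.run([sys.executable, step], check=True)
        except subprocess.CalledProcessError as err:
            print(f"Step {i} ({step}) failed: {err}. Fix it and rerun with start_at={i}.")
            raise

run_pipeline()               # full run
# run_pipeline(start_at=2)   # resume from script3.py after a failure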
Here are some links to start with: link1, link2.
Clarification: As per some of the comments, I should clarify that this is intended as a simple framework to allow execution of programs that are naturally parallel (so-called embarrassingly parallel programs). It isn't, and never will be, a solution for tasks which require communication or synchronisation between processes.
I've been looking for a simple process-based parallel programming environment in Python that can execute a function on multiple CPUs on a cluster, with the major criterion being that it needs to be able to execute unmodified Python code. The closest I found was Parallel Python, but pp does some pretty funky things, which can cause the code to not be executed in the correct context (with the appropriate modules imported etc).
I finally got tired of searching, so I decided to write my own. What I came up with is actually quite simple. The problem is, I'm not sure if what I've come up with is simple because I've failed to think of a lot of things. Here's what my program does:
I have a job server which hands out jobs to nodes in the cluster.
The jobs are handed out to servers listening on nodes by passing a dictionary that looks like this:
{
'moduleName':'some_module',
'funcName':'someFunction',
'localVars': {'someVar':someVal,...},
'globalVars':{'someOtherVar':someOtherVal,...},
'modulePath':'/a/path/to/a/directory',
'customPathHasPriority':aBoolean,
'args':(arg1,arg2,...),
'kwargs':{'kw1':val1, 'kw2':val2,...}
}
moduleName and funcName are mandatory, and the others are optional.
A node server takes this dictionary and does:
sys.path.append(modulePath)
# __import__'s second and third arguments are globals and locals, in that order
globals()[moduleName] = __import__(moduleName, globalVars, localVars)
returnVal = globals()[moduleName].__dict__[funcName](*args, **kwargs)
On getting the return value, the server then sends it back to the job server which puts it into a thread-safe queue.
When the last job returns, the job server writes the output to a file and quits.
I'm sure there are niggles that need to be worked out, but is there anything obviously wrong with this approach? On first glance, it seems robust, requiring only that the nodes have access to the filesystem(s) containing the .py file and its dependencies. Using __import__ has the advantage that the code in the module is automatically run, and so the function should execute in the correct context.
Any suggestions or criticism would be greatly appreciated.
EDIT: I should mention that I've got the code-execution bit working, but the server and job server have yet to be written.
I have actually written something that probably satisfies your needs: jug. If it does not solve your problems, I promise you I'll fix any bugs you find.
The architecture is slightly different: workers all run the same code, but they effectively generate a similar dictionary and ask the central backend "has this been run?". If not, they run it (there is a locking mechanism too). The backend can simply be the filesystem if you are on an NFS system.
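Roughly, a jug script looks like this (a sketch; check the jug documentation for the exact current API):

from jug import TaskGenerator

@TaskGenerator
def process(n):
    # An expensive, independent piece of work.
    return n * n

results = [process(n) for n in range(100)]

You then start "jug execute thisfile.py" on as many machines sharing the backend (for example an NFS directory) as you like; each task runs exactly once.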
I myself have been tinkering with batch image manipulation across my computers, and my biggest problem was the fact that some things don't easily or natively pickle and transmit across the network.
For example, pygame's surfaces don't pickle. These I have to convert to strings by saving them into StringIO objects and then sending the result across the network.
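Roughly, the conversion looks like this (a sketch using pygame.image.tostring/fromstring; the StringIO route is similar):

import pickle
import pygame

def surface_to_payload(surface):
    # Turn the unpicklable Surface into plain bytes plus the metadata needed to rebuild it.
    return {"data": pygame.image.tostring(surface, "RGBA"),
            "size": surface.get_size()}

def payload_to_surface(payload):
    return pygame.image.fromstring(payload["data"], payload["size"], "RGBA")

surface = pygame.Surface((64, 64), 0, 32)              # 32-bit software surface
blob = pickle.dumps(surface_to_payload(surface))       # safe to send over the network
restored = payload_to_surface(pickle.loads(blob))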
If the data you are transmitting (e.g. your arguments) can be sent without such issues, you should not have that many problems with network data.
Another thing comes to mind: what do you plan to do if a computer suddenly "disappears" while doing a task, or while returning the data? Do you have a plan for re-sending tasks?
Let's say there is a server on the internet that one can send a piece of code to for evaluation. At some point the server takes all the code that has been submitted and starts running and evaluating it. However, at some point it will definitely bump into "os.system('rm -rf *')" sent by some evil programmer. Apart from "rm -rf", you could expect people to try using the server to send spam or DoS someone, or to fool around with "while True: pass" kinds of things.
Is there a way to cope with such unfriendly/untrusted code? In particular I'm interested in a solution for Python. However, if you have information for any other language, please share.
If you are not tied to the CPython implementation, you should consider looking at PyPy for this purpose — this Python implementation allows transparent code sandboxing.
Otherwise, you can provide fake __builtin__ and __builtins__ in the corresponding globals/locals arguments to exec or eval.
Moreover, you can provide a dictionary-like object instead of a real dictionary and trace what the untrusted code does with its namespace.
Moreover, you can actually trace that code (by calling sys.settrace() inside the restricted environment before any other code is executed), so you can break execution if something goes wrong.
If none of these solutions is acceptable, use OS-level sandboxing such as chroot and unionfs, together with the standard multiprocessing Python module, to spawn the code worker in a separate, secured process.
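To illustrate the fake __builtins__ idea (a sketch only; restrictions like this are known to be escapable, so don't rely on them as your sole defence):

# Only expose a hand-picked subset of builtins to the untrusted code.
SAFE_BUILTINS = {"abs": abs, "len": len, "range": range, "print": print}

def run_untrusted(source):
    env = {"__builtins__": SAFE_BUILTINS}
    exec(source, env)                    # no open(), __import__(), etc. available
    return env

run_untrusted("print(len(range(10)))")   # works
try:
    run_untrusted("open('/etc/passwd')")
except NameError as err:
    print("blocked:", err)               # NameError: name 'open' is not defined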
You can check out pysandbox, which does just that, though the VM route is probably safer if you can afford it.
It's impossible to provide an absolute solution for this because the definition of 'bad' is pretty hard to nail down.
Is opening and writing to a file bad or good? What if that file is /dev/ram?
You can profile signatures of behavior, or you can try to block anything that might be bad, but you'll never win. JavaScript is a pretty good example of this: people run arbitrary JavaScript code on their computers all the time -- it's supposed to be sandboxed, yet all sorts of security problems and edge cases keep cropping up.
I'm not saying don't try, you'll learn a lot from the process.
Many companies have spent millions (Intel just spent billions on McAfee) trying to understand how to detect 'bad code' -- and every day, machines running McAfee anti-virus get infected with viruses. Python code isn't any less dangerous than C. You can run system calls, bind to C libraries, etc.
I would seriously consider virtualizing the environment to run this stuff, so that exploits in whatever mechanism you implement can be firewalled one more time by the configuration of the virtual machine.
The number of users and the kind of code you expect to test/run will have considerable influence on your choices, by the way. If the programs aren't expected to touch files or databases, or to run computationally intensive tasks, and you have very low load, you could almost get by with just preventing file access entirely and imposing a time limit before the process gets killed and the submission is flagged as too expensive or malicious.
If the code you're supposed to test might be any arbitrary Django extension or page, then you're in for a lot of work probably.
You can try a generic sandbox such as Sydbox or Gentoo's sandbox. They are not Python-specific.
Both can be configured to restrict read/write to some directories. Sydbox can even sandbox sockets.
I think a fix like this is going to be really hard and it reminds me of a lecture I attended about the benefits of programming in a virtual environment.
If you're doing it virtually, it's fine if they wreck it. It won't solve a "while True: pass", but an "rm -rf /" won't matter.
Unless I'm mistaken (and I very well might be), this is much of the reason behind the way Google changed Python for the App Engine. You run Python code on their server, but they've removed the ability to write to files. All data is saved in the "nosql" database.
It's not a direct answer to your question, but an example of how this problem has been dealt with in some circumstances.
What are some best practices for prototyping a filesystem?
I've had an attempt in Python using fusepy, and now I'm curious:
In the long run, should any respectable filesystem implementation be in C? Will not being in C hamper portability, or eventually cause performance issues?
Are there other implementations like FUSE?
Evidently, core filesystem technology moves slowly (FAT32, ext3, NTFS; everything else is small fish). What debugging techniques are employed?
What is the general course filesystem development takes in arriving at a highly optimized, fully supported implementation in major OSs?
A filesystem that lives in userspace (be that in FUSE or the Mac version thereof) is a very handy thing indeed, but it will not have the same performance as a traditional one that lives in kernel space (and thus must be in C). You could say that's the reason microkernel systems (where filesystems and other things live in userspace) never really "left monolithic kernels in the dust", as Andrew Tanenbaum so assuredly predicted when he attacked Linux in a famous posting on the Minix mailing list almost twenty years ago. As a CS professor, he said he'd fail Linus for choosing a monolithic architecture for his OS; Linus of course responded spiritedly, and the whole exchange is now pretty famous and can be found in many spots on the web.
Portability's not really a problem, unless perhaps you're targeting "embedded" devices with very limited amounts of memory -- with the exception of such devices, you can run Python where you can run C (if anything it's the availability of FUSE that will limit you, not that of a Python runtime). But performance could definitely be.
In the long run, should any respectable filesystem implementation be in C? Will not being in C hamper portability, or eventually cause performance issues?
Not necessarily; there are plenty of performant languages other than C (OCaml and C++ are the first that come to mind). In fact, I expect NTFS to be written in C++. The thing is, you seem to come from a Linux background, and since the Linux kernel is written in C, any filesystem that hopes to be merged into the kernel has to be written in C as well.
Are there other implementations like FUSE?
There are a couple for Windows, for example http://code.google.com/p/winflux/ and http://dokan-dev.net/en/, at various levels of maturity.
Evidently, core filesystem technology moves slowly (FAT32, ext3, NTFS; everything else is small fish). What debugging techniques are employed?
Again, that is mostly true on Windows; on Solaris you have ZFS, and on Linux ext4 and btrfs exist. Debugging techniques usually involve turning machines off in the middle of various operations to see what state the data is left in, and storing huge amounts of data to see how performance holds up.
What is the general course filesystem development takes in arriving at a highly optimized, fully supported implementation in major OSs?
Again, this depends on which OS, but it does involve a fair amount of testing, especially making sure that failures do not lose data.
I recommend you create a mock object for the kernel block device API layer. The mock layer should use a mmap'd file as a backing store for the file system. There are a lot of benefits for doing this:
Extremely fast FS performance for running unit test cases.
Ability to insert debug code/break points into the mock layer to check for failure conditions.
Easy to save multiple copies of the file system state for study or running test cases.
Ability to deterministically introduce block device errors or other system events that the file system will have to handle.
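A minimal sketch of such a mock layer in Python, backed by an mmap'd file (the class and method names are made up for illustration):

import mmap, os

BLOCK_SIZE = 4096

class MockBlockDevice:
    # File-backed stand-in for the kernel block device API.
    def __init__(self, path, num_blocks):
        size = num_blocks * BLOCK_SIZE
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self._fd, size)
        self._map = mmap.mmap(self._fd, size)
        self.fail_on = set()           # block numbers that simulate I/O errors

    def read_block(self, n):
        if n in self.fail_on:
            raise IOError(f"simulated read error on block {n}")
        off = n * BLOCK_SIZE
        return self._map[off:off + BLOCK_SIZE]

    def write_block(self, n, data):
        if n in self.fail_on:
            raise IOError(f"simulated write error on block {n}")
        off = n * BLOCK_SIZE
        self._map[off:off + BLOCK_SIZE] = data[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")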
Respectable filesystems will be fast and efficient. For Linux, that will basically mean writing in C, because you won't be taken seriously if you're not distributed with the kernel.
As for other tools like FUSE, there's MacFUSE, which will allow you to use the same code on Macs as well as Linux.