collecting reaction times from inside a docker container - python

I develop python application for use in neuroscience and psychological research. These applications mostly present visual information and/or sounds and require input from the user (experiment subjects). Because of this, I need to solve two specific problems. First - applications frequently need to be distributed to many users, with different environments and operating systems. This is a big headache for me, because the people receiving the applications aren't necessarily very "tech savvy", so I end up spending a lot of time troubleshooting small issues. Second - because these applications are required for research, I need for them to be completely backwards compatible (like, 20 years later, compatible). This is because there are occasionally times that we need to re-run experiments from the past, or revisit some things we have done.
I've been playing around with docker lately,and I'm feeling like this might be the answer to my problems (and maybe for a lot of academics). If I could containerize my applications, with environments set up with specific versions of specific packages, I would be able to send them to anyone (which they could run from the container), and re-run things from the past in their original containers.
I feel like I'm getting conflicting information about the utility of docker for non-web (desktop) applications. Is there any reason this wouldn't work? Often I collect time sensitive input (like reaction time) - would running applications inside the docker (and somehow sharing the screen) substantially alter reaction time data? Would I lose the millisecond precision I am going for? Is this not really what docker is intended for?

Related

Package python code dependencies for remote execution on the fly

my situation is as follows, we have:
an "image" with all our dependencies in terms of software + our in-house python package
a "pod" in which such image is loaded on command (kubernetes pod)
a python project which has some uncommitted code of its own which leverages the in-house package
Also please assume you cannot work on the machine (or cluster) directly (say remote SSH interpreter). the cluster is multi-tenant and we want to optimize it as much as possible so no idle times between trials
also for now forget security issues, everything is locked down on our side so no issues there
we want to essentially to "distribute" remotely the workload -i.e. a script.py - (unfeasible in our local machines) without being constrained by git commits, therefore being able to do it "on the fly". this is necessary as all of the changes are experimental in nature (think ETL/pipeline kind of analysis): we want to be able to experiment at scale but with no bounds with regards to git.
I tried dill but I could not manage to make it work (probably due to the structure of the code). ideally, I would like to replicate the concept mleap applied to ML pipelines for Spark but on a much smaller scale, basically packaging but with little to no constraints.
What would be the preferred route for this use case?

How to run code 24/7 from cloud

My intention is to run python code 24/7 over several months to collect data through API calls and alert me if certain conditions are met.
How can I do that without keeping the code running on my laptop 24/7? Is there a way of doing it in "the cloud"?
Preferably free but would consider paying. Simplicity also a plus.
There is a plenty of free possibilities. You can use heroku servers. They have a strong documentation for python and is free (but of course with limit on megabytes and time usage). Heroku is also good because they have a lot of extensions (databases, dashboards, loggers etc.). In order to deploy your app on heroku you need to have some experience of working with git (but not that much, don't worry).
Another possibility could be Google Cloud Platform, which should be also easy to use.

Distributing jobs over multiple servers using python

I currently has an executable that when running uses all the cores on my server. I want to add another server, and have the jobs split between the two machines, but still each job using all the cores on the machine it is running. If both machines are busy I need the next job to queue until one of the two machines become free.
I thought this might be controlled by python, however I am a novice and not sure which python package would be the best for this problem.
I liked the "heapq" package for the queuing of the jobs, however it looked like it is designed for a single server use. I then looked into Ipython.parallel, but it seemed more designed for creating a separate smaller job for every core (on either one or more servers).
I saw a huge list of different options here (https://wiki.python.org/moin/ParallelProcessing) but I could do with some guidance as which way to go for a problem like this.
Can anyone suggest a package that may help with this problem, or a different way of approaching it?
Celery does exactly what you want - make it easy to distribute a task queue across multiple (many) machines.
See the Celery tutorial to get started.
Alternatively, IPython has its own multiprocessing library built in, based on ZeroMQ; see the introduction. I have not used this before, but it looks pretty straight-forward.

Encrypted and secure docker containers

We all know situations when you cannot go open source and freely distribute software - and I am in one of these situations.
I have an app that consists of a number of binaries (compiled from C sources) and Python code that wraps it all into a system. This app used to work as a cloud solution so users had access to app functions via network but no chance to touch the actual server where binaries and code are stored.
Now we want to deliver the "local" version of our system. The app will be running on PCs that our users will physically own. We know that everything could be broken, but at least want to protect the app from possible copying and reverse-engineering as much as possible.
I know that Docker is a wonderful deployment tool so I wonder: is it possible to create encrypted Docker containers where no one can see any data stored in the container's filesystem? Is there a known solution to this problem?
Also, maybe there are well known solutions not based on Docker?
The root user on the host machine (where the docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the RAM of the application as well as the file system. That makes it impossible to hide a key for decrypting the file system or protecting RAM from debugging.
Using obfuscation on a standard Linux box, you can make it harder to read the file system and RAM, but you can't make it impossible or the container cannot run.
If you can control the hardware running the operating system, then you might want to look at the Trusted Platform Module which starts system verification as soon as the system boots. You could then theoretically do things before the root user has access to the system to hide keys and strongly encrypt file systems. Even then, given physical access to the machine, a determined attacker can always get the decrypted data.
What you are asking about is called obfuscation. It has nothing to do with Docker and is a very language-specific problem; for data you can always do whatever mangling you want, but while you can hope to discourage the attacker it will never be secure. Even state-of-the-art encryption schemes can't help since the program (which you provide) has to contain the key.
C is usually hard enough to reverse engineer, for Python you can try pyobfuscate and similar.
For data, I found this question (keywords: encrypting files game).
If you want a completely secure solution, you're searching for the 'holy grail' of confidentiality: homomorphous encryption. In short, you want to encrypt your application and data, send them to a PC, and have this PC run them without its owner, OS, or anyone else being able to scoop at the data.
Doing so without a massive performance penalty is an active research project. There has been at least one project having managed this, but it still has limitations:
It's windows-only
The CPU has access to the key (ie, you have to trust Intel)
It's optimised for cloud scenarios. If you want to install this to multiple PCs, you need to provide the key in a secure way (ie just go there and type it yourself) to one of the PCs you're going to install your application, and this PC should be able to securely propagate the key to the other PCs.
Andy's suggestion on using the TPM has similar implications to points 2 and 3.
Sounds like Docker is not the right tool, because it was never intended to be used as a full-blown sandbox (at least based on what I've been reading). Why aren't you using a more full-blown VirtualBox approach? At least then you're able to lock up the virtual machine behind logins (as much as a physical installation on someone else's computer can be locked up) and run it isolated, encrypted filesystems and the whole nine yards.
You can either go lightweight and open, or fat and closed. I don't know that there's a "lightweight and closed" option.
I have exactly the same problem. Currently what I was able to discover is bellow.
A. Asylo(https://asylo.dev)
Asylo requires programs/algorithms to be written in C++.
Asylo library is integrated in docker and it seems to be feаsable to create custom dоcker image based on Asylo .
Asylo depends on many not so popular technologies like "proto buffers" and "bazel" etc. To me it seems that learning curve will be steep i.e. the person who is creating docker images/(programs) will need a lot of time to understand how to do it.
Asylo is free of charge
Asylo is bright new with all the advantages and disadvantages of being that.
Asylo is produced by Google but it is NOT an officially supported Google product according to the disclaimer on its page.
Asylo promises that data in trusted environment could be saved even from user with root privileges. However, there is lack of documentation and currently it is not clear how this could be implemented.
B. Scone(https://sconedocs.github.io)
It is binded to INTEL SGX technology but also there is Simulation mode(for development).
It is not free. It has just a small set of functionalities which are not paid.
Seems to support a lot of security functionalities.
Easy for use.
They seems to have more documentation and instructions how to build your own docker image with their technology.
For the Python part, you might consider using Pyinstaller, with appropriate options, it can pack your whole python app in a single executable file, which will not require python installation to be run by end users. It effectively runs a python interpreter on the packaged code, but it has a cipher option, which allows you to encrypt the bytecode.
Yes, the key will be somewhere around the executable, and a very savvy costumer might have the means to extract it, thus unraveling a not so readable code. It's up to you to know if your code contains some big secret you need to hide at all costs. I would probably not do it if I wanted to charge big money for any bug solving in the deployed product. I could use it if client has good compliance standards and is not a potential competitor, nor is expected to pay for more licenses.
While I've done this once, I honestly would avoid doing it again.
Regarding the C code, if you can compile it into executables and/or shared libraries can be included in the executable generated by Pyinstaller.

Python: tool to keep track of deployments

I'm looking for a tool to keep track of "what's running where". We have a bunch of servers, and on each of those a bunch of projects. These projects may be running on a specific version (hg tag/commit nr) and have their requirements at specific versions as well.
Fabric looks like a great start to do the actual deployments by automating the ssh part. However, once a deployment is done there is no overview of what was done.
Before reinventing the wheel I'd like to check here on SO as well (I did my best w/ Google but could be looking for the wrong keywords). Is there any such tool already?
(In practice I'm deploying Django projects, but I'm not sure that's relevant for the question; anything that keeps track of pip/virtualenv installs or server state in general should be fine)
many thanks,
Klaas
==========
EDIT FOR TEMP. SOLUTION
==========
For now, we've chosen to simply store this information in a simple key-value store (in our case: the filesystem) that we take great care to back up (in our case: using a DCVS). We keep track of this store with the same deployment tool that we use to do the actual deploys (in our case: fabric)
Passwords are stored inside a TrueCrypt volume that's stored inside our key-value store.
==========
I will still gladly accept any answer when some kind of Open Source solution to this problem pops up somewhere. I might share (part of) our solution somewhere myself in the near future.
pip freeze gives you a listing of all installed packages. Bonus: if you redirect the output to a file, you can use it as part of your deployment process to install all those packages (pip can programmatically install all packages from the file).
I see you're already using virtualenv. Good. You can run pip freeze -E myvirtualenv > myproject.reqs to generate a dependency file that doubles as a status report of the Python environment.
Perhaps you want something like Opscode Chef.
In their own words:
Chef works by allowing you to write
recipes that describe how you want a
part of your server (such as Apache,
MySQL, or Hadoop) to be configured.
These recipes describe a series of
resources that should be in a
particular state - for example,
packages that should be installed,
services that should be running, or
files that should be written. We then
make sure that each resource is
properly configured, only taking
corrective action when it's
neccessary. The result is a safe,
flexible mechanism for making sure
your servers are always running
exactly how you want them to be.
EDIT: Note Chef is not a Python tool, it is a general purpose tool, written in Ruby (it seems). But it is capable of supporting various "cookbooks", including one for installing/maintaining Python apps.

Categories