Redirecting disk read/writes in Python


Would it be possible for a Python application running on a server to launch another Python application and intercept all disk reads and writes made by that child application, then send them to a client application over a WebSocket so that the operations are executed on the client rather than on the server?

Intercepting real hard disk access is impossible without OS-specific changes.
An easier approach is to intercept file access.
If you're importing the Python module that does the writes, this can be done through simple monkey patching: just replace the file objects with instances of a custom class you create. You can even replace open, if you really want to.
If you're launching a separate process (such as with subprocess) and want to keep doing so, I suspect this will be impossible with pure Python (without modifying the called program).
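A minimal sketch of the monkey-patching approach, assuming the writing code runs in the same interpreter and calls the builtin `open` by name. The names `intercept_open`, `CapturingFile`, and `capture_text_writes` are hypothetical; the point where `CapturingFile.close` stores the data is where it could instead be forwarded elsewhere (for example to a client), which is not shown. Code that calls `io.open` or `os.open` directly, or does its I/O inside C extensions, bypasses this patch.

```python
import builtins
import contextlib
import io


@contextlib.contextmanager
def intercept_open(handler):
    """Temporarily replace builtins.open (hypothetical helper).

    handler(file, mode) returns a file-like object to use instead of the
    real file, or None to fall through to the original open().
    """
    real_open = builtins.open

    def patched_open(file, mode="r", *args, **kwargs):
        replacement = handler(file, mode)
        if replacement is not None:
            return replacement
        return real_open(file, mode, *args, **kwargs)

    builtins.open = patched_open
    try:
        yield
    finally:
        builtins.open = real_open  # always restore, even if the body raises


captured = {}  # path -> text written by the intercepted code


class CapturingFile(io.StringIO):
    """In-memory text file; on close its contents are stored in `captured`."""

    def __init__(self, target_name):
        super().__init__()
        self.target_name = target_name

    def close(self):
        if not self.closed:
            # Hypothetical hook: forward the data elsewhere instead of storing it
            captured[self.target_name] = self.getvalue()
        super().close()


def capture_text_writes(file, mode):
    # Redirect text-mode writes into memory; let everything else hit the disk
    if "w" in mode and "b" not in mode:
        return CapturingFile(file)
    return None


with intercept_open(capture_text_writes):
    with open("report.txt", "w") as f:
        f.write("hello")

print(captured)  # {'report.txt': 'hello'}
```

Nothing is written to disk in the example; the patch is undone as soon as the `with` block exits.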
Some possible system-level solutions on Linux:
using LD_PRELOAD to intercept library calls
writing a FUSE program to intercept filesystem access

Related

Cross-Process Memory Filesystem In Python

I have a main Python program that invokes (via Popen) another program in C++. The two programs transfer files to one another, and these files are rather huge.
I want to keep those files in RAM instead of writing them to disk from one program and then reading them in the other.
The catch is that I can't really touch the code of the C++ program, only the Python one, and all I can do is pass filesystem paths to the C++ program, so I need a filesystem abstraction over RAM.
I've seen the option of using PyFileSystem, but I'm not sure whether it is possible to use MemoryFS paths in an external program, as if it were a regular mount point. It seems to be usable only through the API of the FS object itself. (I'd be glad to know if I'm wrong here.)

Is it good practice to open the connection to a Redis database inside the function that writes to it, or globally?

I use a Python script (in a Docker container) to write to a Redis database (in another Docker container). The script's main purpose is to write to the Redis database, but other scripts also write to the same database. So where should I create the connection to the database: inside a function in the script, or globally?

If your Python project is long-running (e.g. a web app or a daemon script that runs forever) and makes repeated calls, open a single connection and reuse it.
If your Python code is a short-lived script (e.g. it runs for a few seconds and then exits), it doesn't matter as much. Even then, if it makes multiple reads/writes, it's better to open one connection and reuse it throughout the script.
From the wording of your question, it sounds as though you might be thinking of opening the connection outside the script? I'm not really sure where you're going with that, so I can't answer that part.
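The "open one connection and reuse it" advice can be sketched with redis-py (the third-party `redis` package, assumed to be installed): the client is created on first use and the same object is returned afterwards. The hostname `redis` (a typical docker-compose service name), the port, and the helper names `get_redis` and `save_value` are assumptions. redis-py's `Redis` client also manages an internal connection pool, so reusing one client avoids reconnecting on every write.

```python
import functools


@functools.lru_cache(maxsize=1)
def get_redis():
    """Create the Redis client on first use, then return the same object."""
    import redis  # redis-py, imported lazily; assumed to be installed

    # "redis" is a hypothetical docker-compose service name for the database
    return redis.Redis(host="redis", port=6379, decode_responses=True)


def save_value(key, value):
    # Every call reuses the cached client instead of opening a new connection
    get_redis().set(key, value)
```

Creating the client lazily rather than at import time also means the module can be imported (for example by tests) without a running Redis server.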

Script that can automatically download new data from the server to my local backup

I have an application running on a Linux server and I need to create a local backup of its data.
However, new data is added to the application every hour, and I want to keep my local backup in sync with the server's data.
I want to write a script (shell or Python) that automatically downloads newly added data from the Linux server to my local backup. But I am new to the Linux environment and don't know how to write a shell script to do this.
What is the best way to achieve this? And what would the script look like?

rsync -r fits your use case, and it's a single-line command:
rsync -r source destination
Add whatever other options your specific case needs.
So you don't need a Python script for this, but you can still write one and have it run the command above.
Moreover, if you want the Python script to run it automatically, you could look at the sched (event scheduler) module.
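A minimal sketch of a Python wrapper that runs the rsync command once an hour with the standard `sched` module. The source and destination paths are placeholders (assumptions), and the script assumes `rsync` is installed and that SSH access to the server is already set up.

```python
import sched
import subprocess
import time

SOURCE = "user@server:/path/to/app/data/"  # hypothetical remote location
DESTINATION = "/path/to/local/backup/"      # hypothetical local folder
INTERVAL = 60 * 60                          # one hour, in seconds


def rsync_command(source=SOURCE, destination=DESTINATION):
    # -r copies directories recursively; add options such as -a or -z as needed
    return ["rsync", "-r", source, destination]


def run_backup(scheduler):
    result = subprocess.run(rsync_command())
    if result.returncode != 0:
        print(f"rsync exited with status {result.returncode}")
    # Schedule the next run so the backup repeats every INTERVAL seconds
    scheduler.enter(INTERVAL, 1, run_backup, (scheduler,))


def main():
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    scheduler.enter(0, 1, run_backup, (scheduler,))
    scheduler.run()  # blocks forever, running one backup per hour
```

Call `main()` from the script's entry point to start the loop. Each run re-schedules the next one, so a failed rsync is reported but does not stop later backups.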

This depends on where and how your data is stored on the Linux server, but you could write a network application that pushes the data to a client, which then saves it on the local machine. You can use sockets for that.
If the data is available via an HTTP server and you know how to write RESTful APIs, you could use that as well: run a task on your local machine every hour that calls the REST API and handles its (JSON) data. Keep in mind that you need to secure the API if the server is reachable online and not only within the same LAN.
You could also write a small application that downloads the files from the server every hour over FTP (if you want to back up files stored on the system). You will need to know the exact path of the file(s) to do this, though.
All of the solutions above are meant to be written in Python. Using a shell script is possible, but a little more complicated. I would use Python for this kind of task, as it has a lot of network-related libraries available (FTP, sockets, HTTP clients, simple HTTP servers, WSGI libraries, etc.).
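A minimal standard-library sketch of the local side of the REST option: fetch JSON from an endpoint and save each response to a timestamped file, once an hour. The endpoint URL, the output folder, and the timestamped-file layout are assumptions, and the authentication the answer recommends is left out.

```python
import json
import pathlib
import time
import urllib.request

API_URL = "https://example.com/api/new-data"  # hypothetical endpoint
BACKUP_DIR = pathlib.Path("backup")           # hypothetical local folder


def fetch_json(url, timeout=30):
    """Download and decode a JSON document from `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def save_snapshot(data, backup_dir=BACKUP_DIR):
    """Write `data` to a timestamped JSON file and return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = backup_dir / time.strftime("data-%Y%m%d-%H%M%S.json")
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def poll_forever(url=API_URL, interval=3600):
    """Fetch and save once per `interval` seconds (one hour by default)."""
    while True:
        save_snapshot(fetch_json(url))
        time.sleep(interval)
```

A scheduler such as cron could replace `poll_forever` by running a single fetch-and-save each hour.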
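For the FTP option, the standard `ftplib` module is enough to fetch a single file whose exact path you know. The helper name `download_via_ftp` and its parameters are hypothetical; host, credentials, and paths would come from your setup. Plain FTP sends credentials unencrypted, so `ftplib.FTP_TLS` is worth considering outside a trusted network.

```python
import ftplib
import pathlib


def download_via_ftp(host, user, password, remote_path, local_path):
    """Download one file from an FTP server in binary mode (hypothetical helper)."""
    local_path = pathlib.Path(local_path)
    local_path.parent.mkdir(parents=True, exist_ok=True)
    with ftplib.FTP(host, timeout=60) as ftp:
        ftp.login(user, password)
        with local_path.open("wb") as f:
            # RETR streams the file; each received chunk is passed to f.write
            ftp.retrbinary(f"RETR {remote_path}", f.write)
    return local_path
```

Running this helper from the hourly loop or scheduler of your choice gives the "download every hour" behaviour described above.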

How to "redirect" filesystem read/write calls without root and performance degradation?

I have non-root access to a server shared by many users. I first develop and run some code locally, and then I want to rsync my data to a temporary location on the remote server and run my code there without changing any file paths.
I want to transparently hijack filesystem reads and writes and redirect them to different folders; for example, if I run
redirect /home/a /home/b/remote-home/a python code.py
and the code then tries to read from /home/a/a.txt, it should get the contents of /home/b/remote-home/a/a.txt, and the same goes for writes.
I am particularly interested in doing this for a Python process, if that narrows things down. I use a lot of third-party libraries that do file I/O, so just mocking builtins.open is not an option. That I/O is fairly intensive (reading and writing gigabytes of data), so a performance degradation beyond something like 200-300% would be a problem.
Options that I am aware of are:
redefining read, read64, write, etc. calls with an LD_PRELOAD library that calls the real functions with different paths under the hood
same with ptrace
unsharing and remounting parts of the filesystem, but user namespaces are disabled in my particular case for security reasons
The first two options seem not very reliable (and ptrace must be slow), unless there is some fairly stable piece of code that does exactly this, so I could be sure I haven't made any obvious buffer overflow errors. Containers like Docker are not an option because they are not installed on the remote server. Unless, of course, there are userspace containers that do not rely on Linux namespaces under the hood.
UPD: not a full answer, but Singularity manages to provide such functionality without giving everyone root privileges.

Python copying a directory and reading text files remotely

I'm about to start working on a project where a Python script remotes into a Windows Server and reads a bunch of text files in a certain directory. I was planning on using a module called WMI, as that is the only way I have been able to successfully access a Windows server remotely from Python, but after further research I'm not sure I am going to use this module.
The only problem is that these text files are constantly updating, about every 2 seconds, and I'm afraid the script will crash if it hits a mutex error when it tries to open a file while the file is being rewritten. The only thing I can think of is to create a new directory, copy all the files (via the script) into it in their current state, and read them from there, constantly overwriting these copies with new ones once the script finishes checking all of the old ones. Unfortunately, I don't know how to do this correctly or efficiently.
How can I go about doing this? Which Python module would be best for it?

There is Windows support in Ansible these days. It uses WinRM. There are plenty of Python libraries that use WinRM (just search for them), but Ansible is very versatile.
http://docs.ansible.com/intro_windows.html
https://msdn.microsoft.com/en-us/library/aa384426%28v=vs.85%29.aspx

I've done some work with WMI before (though not from Python), and I would not try to use it for a project like this. As you said, WMI tends to be obscure, and in my experience such things are hard to support long-term.
I would either work at the Windows API level, or possibly design a service that performs the desired actions and access that service as needed. Of course, you will need to install this service on each machine you need to control. Both approaches have merit. The WinAPI approach pretty much guarantees you don't introduce any new security holes and is simpler initially. The service approach should make the application faster and require less network traffic. I am sure you can easily think of other approaches.
You still need the necessary permissions, network ports, etc., regardless of the approach. E.g., WMI is usually blocked by firewalls, and you still run as some NT process.
Sorry, not really an answer as such -- meant as a long comment.
ADDED
Re: API programming, even though you have no Windows API experience, I expect you will find it familiar for tasks like the ones you describe; reading and writing files and scanning directories are nothing unique to Windows. You only need to learn about the parts of the API that interest you.
Once you create the appropriate security contexts and start your client process, there is nothing service-oriented about the rest: you can simply open and close files, etc., ignoring the fact that the files are remote, except that the server name is included in the UNC path of the file or folder.
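The copy-to-a-snapshot idea from the question can be sketched with `shutil`, reading the remote folder through its UNC path as described above. The share path, snapshot folder, file pattern, retry count, and delay are all assumptions. On Windows, a file held open by another process without sharing typically raises `PermissionError` in Python, which is what the retry loop catches; a file caught mid-rewrite may still be copied in an incomplete state, so this reduces rather than eliminates the problem.

```python
import pathlib
import shutil
import time

SOURCE_DIR = r"\\fileserver\logs"    # hypothetical UNC share
SNAPSHOT_DIR = r"C:\snapshots\logs"  # hypothetical local copy


def copy_with_retry(src, dst, attempts=5, delay=0.5):
    """Copy one file, retrying while another process holds it locked."""
    for attempt in range(attempts):
        try:
            return shutil.copy2(src, dst)
        except PermissionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


def snapshot(source_dir=SOURCE_DIR, snapshot_dir=SNAPSHOT_DIR, pattern="*.txt"):
    """Copy every matching file into snapshot_dir and return the copied paths."""
    target = pathlib.Path(snapshot_dir)
    target.mkdir(parents=True, exist_ok=True)
    return [
        pathlib.Path(copy_with_retry(path, target / path.name))
        for path in sorted(pathlib.Path(source_dir).glob(pattern))
    ]
```

The script can then read the copies in `SNAPSHOT_DIR` at its own pace and call `snapshot()` again once it has processed them, overwriting the old copies.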
