I have a main Python program that invokes (via Popen) another program written in C++. The two programs pass files back and forth, and these files are rather large.
I want to keep those files in RAM instead of writing them to disk in one program and then reading them back in the other.
The catch is that I can't really touch the code of the C++ program, only the Python one, and all I can do is pass the C++ program filesystem paths, so I need a filesystem abstraction over RAM.
I've seen the option of using PyFileSystem, but I'm not sure whether MemoryFS paths can be used by an external program as if they were under a regular mount point. It seems they are only usable via the API of the FS object itself. (I'd be glad to know whether I'm wrong here.)
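For what it's worth, if both programs run on Linux, one workaround that needs no mount point of its own is to place the files on an existing RAM-backed tmpfs such as /dev/shm and hand ordinary paths to the child process. A minimal sketch, where payload and ./cxx_program are placeholders for your data and binary:

import os
import subprocess
import tempfile

payload = b"..."  # the large input, already in memory (placeholder)

# /dev/shm is a RAM-backed tmpfs on most Linux systems, so this file
# never touches the disk, yet it has a regular filesystem path that an
# unmodified external program can open.
with tempfile.NamedTemporaryFile(dir="/dev/shm", delete=False) as f:
    f.write(payload)
    in_path = f.name

out_path = os.path.join("/dev/shm", "cxx_output.bin")
subprocess.run(["./cxx_program", in_path, out_path], check=True)

with open(out_path, "rb") as f:
    result = f.read()

os.unlink(in_path)
os.unlink(out_path)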
Related
I have non-root access to a server that is shared by many users. I first develop and run some code locally, and then I want to rsync my data to a temporary location on the remote server and run my code there without changing any file paths.
I want to transparently hijack filesystem reads and writes and redirect them to different folders, like, if I run
redirect /home/a /home/b/remote-home/a python code.py
and then the code tries to read from /home/a/a.txt, it should get the content of /home/b/remote-home/a/a.txt, and the same with writes.
I am particularly interested in doing this for a Python process, if that narrows things down. I use a lot of third-party libraries that do file IO, so just mocking builtins.open is not an option. That IO is pretty intensive (reading and writing gigabytes of data), so a performance degradation beyond something like 200-300% is an issue.
Options that I am aware of are:
redefining read, read64, write, etc. calls with an LD_PRELOAD shim that would call the real functions with different paths under the hood
same with ptrace
unshare and remount parts of the filesystem, but user namespaces are disabled in my particular case for whatever security reasons
The first two options seem not very reliable (and ptrace is bound to be slow), unless there is some fairly stable piece of code that does exactly that, so I could be sure I did not introduce any obvious buffer-overflow errors. Containers like Docker are not an option because they are not installed on the remote server; unless, of course, there are some userspace containers that do not rely on Linux namespaces under the hood.
UPD: not a full answer, but Singularity manages to provide such functionality without giving everyone root privileges.
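For example, a minimal sketch of how that could look from the Python side; image.sif is a placeholder container image, and --bind remaps /home/a inside the container onto the synced copy, so the code needs no path changes:

import subprocess

# Hypothetical invocation: a setuid-installed Singularity can bind-mount
# without relying on user namespaces.
subprocess.run([
    "singularity", "exec",
    "--bind", "/home/b/remote-home/a:/home/a",
    "image.sif",
    "python", "code.py",
], check=True)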
I have previously written a Python script that monitors a Windows directory and uploads any new files to a remote server offsite. The intent is to run it at all times and let users dump their files there to sync with the cloud directory.
When an added file is large enough that it is not transferred to the local drive all at once, Watchdog "sees" it while it is only partially written and tries to upload the incomplete file, which fails. How can I ensure that these files are "complete" before they are uploaded? Again, I am on Windows and cannot use anything but Windows to complete this task, or I would have used inotify. Is it even possible to check the "state" of a file in this way on Windows?
It looks like there is no easy way to do this. I think you can put something in place that checks the stats on the directory when the event triggers and only acts once the folder size has not changed for a given amount of time:
https://github.com/gorakhargosh/watchdog/issues/184
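In Python, a minimal sketch of that size-stability idea (the poll interval is arbitrary; this only confirms the size stopped changing between two consecutive polls):

import os
import time

def wait_until_stable(path, interval=10.0):
    # Block until two consecutive size checks, `interval` seconds
    # apart, return the same value.
    last = -1
    while True:
        size = os.path.getsize(path)
        if size == last:
            return
        last = size
        time.sleep(interval)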
As a side note, I would check out Apache NiFi. I have used it with a lot of success, and it was pretty easy to get up and running.
https://nifi.apache.org/
I'm about to start working on a project where a Python script remotes into a Windows server and reads a bunch of text files in a certain directory. I was planning on using a module called WMI, as that is the only way I have been able to successfully access a Windows server remotely from Python, but upon further research I'm not sure I am going to use this module.
The only problem is that these text files are updated about every 2 seconds, and I'm afraid the script will crash with a mutex error if it tries to open a file while it is being rewritten. The only thing I can think of is creating a new directory, copying all the files (via the script) into it in whatever state they are in, reading them from there, and constantly overwriting those copies with fresh ones once it finishes checking the old ones. Unfortunately, I don't know how to execute this correctly or efficiently.
How can I go about doing this? Which python module would be best for this execution?
There is Windows support in Ansible these days; it uses WinRM. There are plenty of Python libraries that use WinRM (just google it), but Ansible is very versatile.
http://docs.ansible.com/intro_windows.html
https://msdn.microsoft.com/en-us/library/aa384426%28v=vs.85%29.aspx
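As a rough illustration of the WinRM route, a minimal sketch using the third-party pywinrm package; the host name, credentials, and file path are placeholders:

import winrm  # pip install pywinrm

session = winrm.Session("windows-server", auth=("user", "password"))
# Read a remote text file through PowerShell; the path is a placeholder.
result = session.run_ps(r"Get-Content C:\data\log.txt")
print(result.std_out.decode())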
I've done some work with WMI before (though not from Python), and I would not try to use it for a project like this. As you said, WMI tends to be obscure, and my experience says such things are hard to support long-term.
I would either work at the Windows API level, or possibly design a service that performs the desired actions, and access that service as needed. Of course, you would need to install this service on each machine you need to control. Both approaches have merit. The WinAPI approach pretty much guarantees you don't invent any new security holes and is simpler initially. The service approach should make the application faster and require less network traffic. I am sure you can think of other trade-offs easily.
You still need the necessary permissions, network ports, etc. regardless of the approach. E.g., WMI is usually blocked by firewalls, and you still run as some NT process.
Sorry, not really an answer as such -- meant as a long comment.
ADDED
Re: API programming: though you have no Windows API experience, I expect you will find it familiar for tasks such as you describe, i.e., reading and writing files and scanning directories are nothing unique to Windows. You only need to learn about the parts of the API that interest you.
Once you create the appropriate security contexts and start your client process, there is nothing special left to do, i.e., you can simply open and close files, etc., ignoring the fact that the files are remote, other than the server name being included in the UNC name of the file/folder location.
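For illustration, reading a remote file from Python then reduces to an ordinary open on a UNC path (the server and share names below are placeholders):

# Assumes the process runs under an account with access to the share.
with open(r"\\SERVER\share\logs\status.txt", "r") as f:
    contents = f.read()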
Would it be possible for a Python application running on a server to launch another Python application and intercept all HDD reads and writes made by the child application? Then send them over a web socket to a client application so that the operation can be executed on the client rather than the server?
Intercepting real hard disk access is impossible without OS-specific changes.
An easier approach would be intercepting file access.
If you're importing the Python module that does the writes, this can be done through simple monkey patching: just replace the file objects with instances of a custom class you create. You can even replace open, if you really want to.
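A minimal sketch of that monkey-patching idea; note it only catches code that goes through builtins.open, not os.open or C extensions:

import builtins

_real_open = builtins.open

def intercepted_open(file, mode="r", *args, **kwargs):
    # Hypothetical hook: a real implementation might forward the data
    # to the client over a web socket instead of just logging.
    print(f"open({file!r}, mode={mode!r})")
    return _real_open(file, mode, *args, **kwargs)

builtins.open = intercepted_open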
If you're launching a separate process (such as with subprocess) and want to keep doing so, I suspect this will be impossible in pure Python (without modifying the called program).
Some possible system-level solutions on Linux:
using LD_PRELOAD to intercept library calls
writing a FUSE program to intercept the filesystem access (see the sketch below)
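For the FUSE route, a minimal passthrough sketch using the third-party fusepy package; the source and mount paths are placeholders, and a real implementation would need more operations (create, truncate, unlink, etc.):

import os
from fuse import FUSE, Operations  # pip install fusepy

class Passthrough(Operations):
    # Mirror `root` at the mount point; read/write are the hook points.

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def getattr(self, path, fh=None):
        st = os.lstat(self._full(path))
        return {key: getattr(st, key) for key in (
            "st_mode", "st_nlink", "st_size", "st_uid", "st_gid",
            "st_atime", "st_mtime", "st_ctime")}

    def readdir(self, path, fh):
        return [".", ".."] + os.listdir(self._full(path))

    def open(self, path, flags):
        return os.open(self._full(path), flags)

    def read(self, path, size, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.read(fh, size)  # hook: ship the data to the client here

    def write(self, path, data, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.write(fh, data)  # hook: ship the data to the client here

    def release(self, path, fh):
        return os.close(fh)

if __name__ == "__main__":
    FUSE(Passthrough("/real/data"), "/mnt/intercepted", foreground=True)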
I'm trying to write a script to handle video files (ranging from several MB to several GB) written to a shared folder on a Windows server.
Ideally, the script will run on a Linux machine, watching the Windows shared folder at an interval of something like every 15-120 seconds, and upload to an FTP site any files that have fully finished writing to the shared folder.
I haven't been able to determine any criteria that let me know for certain whether a file has been fully written to the share. It seems that Windows reserves a spot on the share for the entire size of the file (so the file size does not grow incrementally), and the modified date seems to be the time the file started writing, but it is not updated as the file continues to grow. lsof and fuser do not seem to be aware of the file, and even Samba tools don't seem to indicate it's locked, though I'm not sure whether that's because I haven't mounted with the correct options. I've tried things like opening the file or renaming it, and the best I've been able to come up with is a "Text file busy" error code, but this seems to cause major delays in file copying. Naively uploading the file without checking whether it has finished copying not only does not throw any kind of error, but actually seems to upload null or random bytes from the allocated space to the FTP, resulting in a totally corrupt file (if the network writing process is slower than the FTP).
I have zero control over the writing process. It will take place on dozens of machines and consist pretty much exclusively of Windows OS file copies to a network share.
I can control the share options on the Windows server, and I have full control over the Linux box. Is there some method of checking locks on a Windows CIFS share that would let me be sure the file has completely finished writing before I try to upload it via FTP? Or is the only possible solution to have the Linux server own the share locally?
Edit
The tl;dr: I'm really looking for the equivalent of something like lsof that works for a CIFS-mounted share. I don't care how low-level, though it would be ideal if it were something I could call from Python. I can't move the share or rename the files before they arrive.
I had this problem before. I'm not sure my way is the best way, and it's most definitely a hacky fix, but I used a sleep interval and file-size check (I would expect the file to have grown if it was being written to...).
In my case I wanted to know not only that the file was not being written to, but also that the Windows share was not being written to...
My code is:
while [ "$(ls -la "$REMOTE_CSV_DIR"; sleep 15)" != "$(ls -la "$REMOTE_CSV_DIR")" ]; do
echo "File writing seems to be ocuring, waiting for files to finish copying..."
done
(ls -la includes file sizes in bytes...)
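Since the question mentions wanting something callable from Python, roughly the same idea as a Python function; the caveat from the question still applies, though: if Windows preallocates the full file size up front, a size/mtime snapshot can look stable while writing continues:

import os
import time

def snapshot(path):
    # (name, size, mtime) for every entry in the directory.
    return sorted((e.name, e.stat().st_size, e.stat().st_mtime)
                  for e in os.scandir(path))

def wait_for_quiet(path, interval=15):
    # Block until two snapshots taken `interval` seconds apart match.
    before = snapshot(path)
    time.sleep(interval)
    after = snapshot(path)
    while after != before:
        before = after
        time.sleep(interval)
        after = snapshot(path)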
What about this:
Change the Windows share to point to an actual Linux directory reserved for the purpose. Then, with simple Linux scripts, you can readily determine whether any files there have writers. Once a file is no longer being written to, copy it to the Windows folder, if that is where it needs to be.
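Once the files live on the Linux box itself, a check along these lines could work; lsof exits with status 0 when at least one process has the path open (note this detects any open handle, not only writers):

import subprocess

def has_open_handles(path):
    # lsof returns 0 if some process has the file open, 1 otherwise.
    return subprocess.run(
        ["lsof", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode == 0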