Proftpd verify complete upload - python

I was wondering whether there was a best practice for checking if an upload to your ftp server was successful.
The system I'm working with has an upload directory which contains subdirectories for every user where the files are uploaded.
Files in these directories are only temporary, they're disposed of once handled.
The system loops through each of these subdirectories and the new files in them, and for each file checks whether it has been modified in the last 10 seconds. If it hasn't been modified for 10 seconds, the system assumes the file was uploaded successfully.
I don't like the way the system currently handles these situations, because it will try to handle the file and fail if the upload was incomplete, instead of waiting and allowing the user to resume the upload until it's complete.
It might be fine for small files which don't take a lot of time to upload, but if the file is big I'd like to be able to resume the upload.
I also don't like looping over the directories and files; the system idles at high CPU usage. So I've implemented pyinotify to trigger an action when a file is written. I haven't really looked at its source code, but I can only assume it is more efficient than the current implementation (which does more than I've described).
However I still need to check whether the file was successfully uploaded.
I know I can parse the xferlog to get all complete uploads. Like:
awk '($12 ~ /^i$/ && $NF ~ /^c$/){print $9}' /var/log/proftpd/xferlog
This would make pyinotify unnecessary, since I can get the paths of complete and incomplete uploads just by tailing the log.
So my solution would be to check the xferlog in my run-loop and only handle complete files.
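For reference, here's a rough sketch of what I have in mind, assuming the standard xferlog field layout (field 12 is the direction, the last field is the completion status, field 9 is the path); handle_file() is just a placeholder for my own processing, and filenames containing spaces would need more careful parsing:
import time

XFERLOG = "/var/log/proftpd/xferlog"

def follow(path):
    # Yield new lines appended to the log, like `tail -f`.
    with open(path) as log:
        log.seek(0, 2)  # start at the current end of the file
        while True:
            line = log.readline()
            if not line:
                time.sleep(1)
                continue
            yield line

for line in follow(XFERLOG):
    fields = line.split()
    # field 12 is the direction ("i" = incoming), the last field is the
    # completion status ("c" = complete), field 9 is the uploaded file's path
    if len(fields) >= 18 and fields[11] == "i" and fields[-1] == "c":
        handle_file(fields[8])  # placeholder for the actual processing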
Unless there's a best practice or simply a better way to do this?
What would the disadvantages be with this method?
I run my app on a Debian server, and proftpd is installed on the same server. Also, I have no control over the clients sending the files.

Looking at the proftpd docs, I see http://www.proftpd.org/docs/directives/linked/config_ref_HiddenStores.html
The HiddenStores directive enables two-step file uploads: files are
uploaded as ".in.filename." and once the upload is complete, renamed
to just "filename". This provides a degree of atomicity and helps
prevent 1) incomplete uploads and 2) files being used while they're
still in the progress of being uploaded.
This should be the "better way" to solve the problem when you have control of proftpd, as it handles all the work for you: you can assume that any file whose name doesn't start with .in. is a completed upload. You can also safely delete any orphaned .in.* files after some arbitrary period of inactivity in a tidy-up script somewhere.
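For illustration, a minimal sketch of what the consuming side could look like with HiddenStores enabled; the upload root, the 24-hour cutoff and handle_completed_upload() are assumptions, not part of proftpd:
import os
import time

UPLOAD_ROOT = "/srv/ftp/uploads"
STALE_AFTER = 24 * 60 * 60  # seconds of inactivity before an orphan is deleted

for dirpath, dirnames, filenames in os.walk(UPLOAD_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if name.startswith(".in."):
            # HiddenStores temp file: still being uploaded, or orphaned.
            if time.time() - os.path.getmtime(path) > STALE_AFTER:
                os.remove(path)
            continue
        handle_completed_upload(path)  # placeholder for the actual processing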

You can use pure-uploadscript if your pure-ftpd installation was compiled with the --with-uploadscript option.
It is used to launch a specified script after every upload is completely finished.
Set CallUploadScript to "yes"
Make a script with a command like touch /tmp/script.sh
Write the code in it. In my example the script renames the file by prepending "completed." to the file name:
#!/bin/bash
# pure-uploadscript passes the full path of the uploaded file as $1.
fullpath=$1
filename=$(basename "$1")
dirname=${fullpath%/*}
# Prepend "completed." so other processes only pick up finished uploads.
mv "$fullpath" "$dirname/completed.$filename"
Run chmod 755 /tmp/script.sh to make the script executable by pure-uploadscript.
Then run pure-uploadscript -B -r /tmp/script.sh
Now /tmp/script.sh will be launched after each completed upload.

Related

Making sure that a script does not modify files in specific folder

I'm writing a python script which copies files from a server, performs a few operations on them, and deletes the files locally after processing.
The script is not supposed to modify the files on the server in any way.
However, since bugs may occur, I would like to make sure that I'm not modifying/deleting the original server files.
Is there a way to prevent a Python script from having write permissions to a specific folder? I work on Windows OS.
That is unrelated to Python; it's about the filesystem security provided by the OS. The key is that permissions are not given to programs but to the user under which they run.
Windows provides the runas command, which allows you to run a command (whatever language it uses) under a different user. There is even a /savecred option that lets you skip providing the password on each invocation and instead saves it in the current user's profile.
So if you set up a dedicated user to run the script, give it only read permissions on the server folder, and run the script under that user, then even a bug in the script could not tamper with that folder.
BTW, if the script is run as a scheduled task, you can directly specify which user should be used and give its password at configuration time.
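For illustration, launching the processing script under such a restricted account could look something like this; the account name and script path are made up, and the first run will prompt for the password that /savecred then stores:
import subprocess

# "ftp-reader" is an assumed read-only account; the script path is made up too.
subprocess.run([
    "runas",
    "/savecred",
    "/user:ftp-reader",
    r"python C:\scripts\process_files.py",
])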

Python: Stop watchdog reacting to partially transferred files?

I have previously written a script using Python that monitors a Windows directory and uploads any new files to a remote server offsite. The intent is to run it at all times and allow users to dump their files there to sync with the cloud directory.
When a file added is large enough that it is not transferred to the local drive all at once, watchdog "sees" it while it is only partially transferred and tries to upload the partial file, which fails. How can I ensure that these files are "complete" before they are uploaded? Again, I am on Windows and cannot use anything but Windows to complete this task, or I would have used inotify. Is it even possible to check the "state" of a file in this way on Windows?
It looks like there is no easy way to do this. I think you can put something in place that checks the stats on the directory when the event triggers and only acts after the folder size hasn't changed for a given amount of time:
https://github.com/gorakhargosh/watchdog/issues/184
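Something along these lines might work; the 10-second interval, the timeout and the upload() call are placeholders:
import os
import time

def wait_until_stable(path, interval=10, timeout=3600):
    # Block until the file keeps the same size for `interval` seconds,
    # or give up after `timeout` seconds.
    deadline = time.time() + timeout
    last_size = -1
    while time.time() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            return True
        last_size = size
        time.sleep(interval)
    return False

# In the watchdog event handler:
# if wait_until_stable(event.src_path):
#     upload(event.src_path)  # placeholder for the actual upload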
As a side note, I would check out Apache NiFi. I have used it with a lot of success, and it was pretty easy to get up and running:
https://nifi.apache.org/

Where to keep application specific log file

I am using Apache.
If we keep the log file in the /var/www/ folder, permission issues are raised.
I am thinking of keeping the log files in the /tmp/ folder. Is that the right place to keep log files?
No, /tmp would not be the right place to save log files.
According to the Filesystem Hierarchy Standard (FHS), the /tmp directory serves a different purpose:
3.17.1. Purpose
The /tmp directory must be made available for programs that require temporary files.
Programs must not assume that any files or directories in /tmp are preserved between invocations of the program.
The intent of writing log files is the ability to debug errors and keep track of program activity. Therefore, non-persistent logs would be of very little use.
For logging there is the /var/log directory, as recommended by the FHS:
5.10.1. Purpose
This directory contains miscellaneous log files. Most logs must be written to this directory or an appropriate subdirectory.
For the permissions question I can only refer to WiseTechi's answer; a /var/log/mydaemon directory is the way to go.
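For example, assuming a dedicated /var/log/mydaemon directory that is writable by the user the application runs as (the file name is just an illustration):
import logging

logging.basicConfig(
    filename="/var/log/mydaemon/mydaemon.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.info("application started")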

How to tell if a file is being written to a Windows CIFS share from Linux

I'm trying to write a script to take video files (ranging from several MB to several GB) written to a shared folder on a Windows server.
Ideally, the script will run on a Linux machine watching the Windows shared folder at an interval of something like every 15-120 seconds, and upload any files that have fully finished writing to the shared folder to an FTP site.
I haven't been able to determine any criteria that lets me know for certain whether a file has been fully written to the share. It seems like Windows reserves a spot on the share for the entire size of the file (so the file size does not grow incrementally), and the modified date seems to be the time the file started writing, but it is not updated as the file continues to grow. lsof and fuser do not seem to be aware of the file, and even the Samba tools don't seem to indicate it's locked, but I'm not sure if that's because I haven't mounted with the correct options. I've tried things like opening the file or renaming it, and the best I've been able to come up with is a "Text file busy" error code, but this seems to cause major delays in file copying. Naively uploading the file without checking whether it has finished copying not only does not throw any kind of error, but actually seems to upload nul or random bytes from the allocated space to the FTP, resulting in a totally corrupt file (if the network writing process is slower than the FTP).
I have zero control over the writing process. It will take place on dozens of machines and consist pretty much exclusively of Windows OS file copies to a network share.
I can control the share options on the Windows server, and I have full control over the Linux box. Is there some method of checking locks on a Windows CIFS share that would allow me to be sure that the file has completely finished writing before I try to upload it via FTP? Or is the only possible solution to have the Linux server locally own the share?
Edit
The tl;dr: I'm really looking for the equivalent of something like lsof that works for a CIFS-mounted share. I don't care how low-level, though it would be ideal if it was something I could call from Python. I can't move the share or rename the files before they arrive.
I had this problem before. I'm not sure my way is the best way, and it's most definitely a hacky fix, but I used a sleep interval and file size check (I would expect the file to have grown if it was being written to...).
In my case I wanted to know not only that the file was not being written to, but also that the Windows share was not being written to...
My code is:
while [ "$(ls -la "$REMOTE_CSV_DIR"; sleep 15)" != "$(ls -la "$REMOTE_CSV_DIR")" ]; do
echo "File writing seems to be ocuring, waiting for files to finish copying..."
done
(ls -la includes file sizes in bits...)
What about this:
Change the Windows share to point to an actual Linux directory reserved for the purpose. Then, with simple Linux scripts, you can readily determine whether any files there have any writers. Once there is a file not being written to, copy it to the Windows folder, if that is where it needs to be.
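A rough sketch of that idea; the paths are made up, and it relies on fuser exiting with status 0 only when some process still has the file open:
import shutil
import subprocess
from pathlib import Path

INCOMING = Path("/srv/incoming")   # directory the Windows clients write to
READY = Path("/srv/ready")         # where finished files should be moved

for path in INCOMING.iterdir():
    if not path.is_file():
        continue
    # fuser -s is silent and returns 0 when at least one process has the file open.
    has_writer = subprocess.call(["fuser", "-s", str(path)]) == 0
    if not has_writer:
        shutil.move(str(path), str(READY / path.name))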

Ensuring a test case can delete the temporary directory it created

(Platform: Linux, specifically Fedora and Red Hat Enterprise Linux 6)
I have an integration test written in Python that does the following:
creates a temporary directory
tells a web service (running under apache) to run an rsync job that copies files into that directory
checks the files have been copied correctly (i.e. the configuration was correctly passed from the client through to an rsync invocation via the web service)
(tries to) delete the temporary directory
At the moment, the last step is failing because rsync is creating the files with their ownership set to that of the apache user, and so the test case doesn't have the necessary permissions to delete the files.
This Server Fault question provides a good explanation for why the cleanup step currently fails given the situation the integration test sets up.
What I currently do: I just don't delete the temporary directory in the test cleanup, so these integration tests leave dummy files around that need to be cleared out of /tmp manually.
The main solution I am currently considering is to add a setuid script specifically to handle the cleanup operation for the test suite. This should work, but I'm hoping someone else can suggest a more elegant solution. Specifically, I'd really like it if nothing in the integration test client needed to care about the uid of the apache process.
Approaches I have considered but rejected for various reasons:
Run the test case as root. This actually works, but needing to run the test suite as root is rather ugly.
Set the sticky bit on the directory created by the test suite. As near as I can tell, rsync is ignoring this because it's set to copy the flags from the remote server. However, even tweaking the settings to only copy the execute bit didn't seem to help, so I'm still not really sure why this didn't work.
Adding the test user to the apache group. As rsync is creating the files without group write permission, this didn't help.
Running up an Apache instance as the test user and testing against that. This has some advantages (in that the integration tests won't require that apache be already running), but has the downside that I won't be able to run the integration tests against an Apache instance that has been preconfigured with the production settings to make sure those are correct. So even though I'll likely add this capability to the test suite eventually, it won't be as a replacement for solving the current problem more directly.
One other thing I really don't want to do is change the settings passed to rsync just so the test suite can correctly clean up the temporary directory. This is an integration test for the service daemon, so I want to use a configuration as close to production as I can get.
Add the test user to the apache group (or httpd group, whichever has group ownership on the files).
With the assistance of the answers to that Server Fault question, I was able to figure out a solution using setfacl.
The code that creates the temporary directory for the integration test now does the following (it's part of a unittest.TestCase instance, hence the reference to addCleanup):
local_path = tempfile.mkdtemp().decode("utf-8")
self.addCleanup(shutil.rmtree, local_path)
acl = "d:u:{0}:rwX".format(os.geteuid())
subprocess.check_call(["setfacl", "-m", acl, local_path])
The first two lines just create the temporary directory and ensure it gets deleted at the end of the test.
The last two lines are the new part and set the default ACL for the directory such that the test user always has read/write access and will also have execute permissions for anything with the execute bit set.
