Python rsync script directory names mirror

I have a script that I use to push files back to my home PC using rsync. File names that are successfully pushed are added to a SQLite database so they don't get pushed again (since I only want a one-way mirror). Anyhow, the problem I have is that although the script recursively walks the source path and pushes files matching a defined extension, the files all end up in the same destination root directory.
What I am trying to do is have the destination folder structure mirror the source.
I think I have to add something to the destDir path, but I'm not sure exactly what:
import os
import subprocess

for root, dirs, files in os.walk(sourceDir):
    for file in files:
        # ... if some filtering criteria ...
        print("Syncing new file: " + file)
        cmd = ["rsync"]
        cmd.append(os.path.join(root, file))
        cmd.append(destDir + "/")
        p = subprocess.Popen(cmd, shell=False)
        if p.wait() == 0:
            rememberFile(file)

I think you should rely on the features of rsync for this as much as possible, rather than trying to reimplement it in Python. rsync has been extensively tested and is full-featured. They've fixed all of the bugs that you're encountering. For instance, in your original code snippet, you need to reconstruct the full path of your file (instead of just the filename) and add that to your destDir.
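To make that concrete, here is a minimal sketch of the fix, reusing the names from your snippet. It assumes destDir is a local path; for a remote destination you would need rsync to create the intermediate directories instead (for example with its --relative or --mkpath options):

import os
import subprocess

for root, dirs, files in os.walk(sourceDir):
    for file in files:
        # Keep the path relative to sourceDir and graft it onto destDir
        # so the destination mirrors the source tree.
        rel = os.path.relpath(root, sourceDir)
        dest = os.path.join(destDir, rel) + "/"
        os.makedirs(dest, exist_ok=True)
        p = subprocess.Popen(["rsync", os.path.join(root, file), dest], shell=False)
        if p.wait() == 0:
            rememberFile(file)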
But before you keep debugging that, consider this alternative. Instead of a SQL database, why not keep the list of files you have already pushed in a plain text file? Let's say it's called exclude_list.txt. Then your one-liner rsync command is:
rsync -r --exclude-from 'exclude_list.txt' src dst
The -r switch will cause it to traverse the file tree automatically. See topic #6 on this page for more details on this syntax.
Now you only need your Python script to maintain exclude_list.txt. I can think of two options:
Capture the output of rsync with the -v option to list the filenames that were moved, parse them, and append them to exclude_list.txt. I think this is the most elegant solution; you can probably do it in just a few lines (see the sketch after this list).
Use the script you already have to traverse the tree and add all of the files to exclude_list.txt, but remove all of the individual rsync calls. Then call rsync once at the end, as above.
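A minimal sketch of the first option. The parsing is deliberately naive: it assumes rsync's default -v output of one transferred path per line, and skips the header, directory, and summary lines:

import subprocess

result = subprocess.run(
    ["rsync", "-rv", "--exclude-from", "exclude_list.txt", "src/", "dst/"],
    capture_output=True, text=True, check=True,
)

with open("exclude_list.txt", "a") as f:
    for line in result.stdout.splitlines():
        line = line.strip()
        # Skip blanks, the "sending incremental file list" header,
        # directory entries, and the trailing transfer summary.
        if not line or line.endswith("/") or line.startswith(("sending", "sent ", "total ")):
            continue
        f.write(line + "\n")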

Related

Python shutil.move copies endlessly

I made a script that reads PDF files from a source folder, gets some info from each file, and then renames it and moves it to another location.
This is done every 5 seconds.
Since the source and destination are on different disks, I use shutil.move instead of os.rename.
Because they are on different disks, shutil.move will copy the source file and then delete it.
The script works fine, but sometimes there are permission problems in the source folder.
The source file then stays in the source folder and gets copied endlessly, because it can't be deleted.
How can I work around this? Since I don't keep the names of the original PDF files, I don't know how to solve this.
Have you considered keeping track of when you last moved files, and then only moving source files with later modification timestamps?
You can call os.scandir() to list the source directory, getting a DirEntry for each of its entries. For each entry, check entry.is_file() to see if it's a file, entry.name to see if it ends with '.pdf', and entry.stat().st_mtime to see if it's newer than your last scan.
You can still call shutil.copyfile() to copy each chosen file to the new name in the destination directory.
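A minimal sketch of that approach; the function name and arguments are made up, and error handling is left out:

import os
import shutil

def move_new_pdfs(src_dir, dest_dir, last_scan_time):
    # Copy PDFs modified since the last scan; return the newest mtime seen.
    newest = last_scan_time
    for entry in os.scandir(src_dir):
        if entry.is_file() and entry.name.endswith('.pdf'):
            mtime = entry.stat().st_mtime
            if mtime > last_scan_time:
                shutil.copyfile(entry.path, os.path.join(dest_dir, entry.name))
                newest = max(newest, mtime)
    return newest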

Python: inotify_simple getting files from other directories

I'm using inotify_simple to get notifications from a directory of directories. I'm accessing a directory that has multiple sub-directories, looping through those sub-directories and wanting to use inotify within each directory. My goal for this program is to be notified anytime anything happens (in particular, a creation of a file) in one of the sub-directories.
My file structure:
- mainDir
    - subDir1
        - file1
        - file2
    - subDir2
        - file3
        - file4
    ...etc.
I'm looping through the directories in mainDir, and setting that path as the path for inotify to search in:
for directory in os.listdir(self.path):
    new_path = os.path.join(self.path, directory)
    new_curr = self.inotify.new_curr_file_notification(new_path)
New path values are exactly what I expect:
.../mainDir/subDir1
.../mainDir/subDir2
When passing new_path into my function (which is the path given to inotify), I expect inotify to only look in that directory. However, I'm getting notifications caused by files in other directories:
path for inotify .../mainDir/subDir1
Event(wd=1, mask=256, cookie=0, name='someFileInSubDir2')
flags.CREATE
Does anyone know why this is happening? And, if anyone has any suggestions to make this process easier/better, I'm all ears! Thanks!
I'm the author of inotify_simple, and since it doesn't have a method called new_curr_file_notification, I'm guessing that's something you wrote. Without seeing that method, or some more code that demonstrates exactly how you're using the library, I unfortunately can't give you any advice; there's not enough information to see how you're using inotify_simple.
If you post a complete example, I will probably be able to tell what's going wrong.
Feel free to post a bug report on the project's GitHub if it looks like there might be a bug.
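For reference, a minimal sketch of watching several sub-directories with a single INotify instance. Note that read() returns events from every registered watch, so each event's wd has to be mapped back to the directory it came from; assuming every event belongs to one directory is an easy way to end up with mixed-up paths like the ones above. The mainDir path is a placeholder:

import os
from inotify_simple import INotify, flags

main_dir = '/path/to/mainDir'  # placeholder

inotify = INotify()
watches = {}  # watch descriptor -> directory it watches
for name in os.listdir(main_dir):
    sub = os.path.join(main_dir, name)
    if os.path.isdir(sub):
        wd = inotify.add_watch(sub, flags.CREATE)
        watches[wd] = sub

while True:
    for event in inotify.read():
        # event.name is relative to the watch identified by event.wd,
        # so resolve it against the right directory.
        print(os.path.join(watches[event.wd], event.name))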

Ubuntu/Python: how to call Ubuntu commands and 3rd-party applications through Python

Basically, I'm trying to navigate through directories and call a specific program (called galfit). The reason I navigate through the directories is because all of the files that I want to run through galfit are in that directory. However, there are dozens of files, and individually running each file through galfit would take far too long. On top of that, they take a while to process, so the overall process is incredibly slow.
Here's what the Ubuntu terminal code looks like:
vidur@vidur-VirtualBox:~$ cd Documents
vidur@vidur-VirtualBox:~/Documents$ cd XDF_Thumbnails_sci
vidur@vidur-VirtualBox:~/Documents$ ls
documents-export-2013-07-08  XDF_Images_Sci  XDF_Images_Wht  XDF_Thumbnails_Sci
vidur@vidur-VirtualBox:~/Documents$ cd XDF_Thumbnails_Sci
vidur@vidur-VirtualBox:~/Documents/XDF_Thumbnails_Sci$ ~/galfit galfit.feedme
galfit.feedme is the feedme file that I wish to process; however, there are about fifty such files in total (with different names, of course!).
So my question is, how do you approach that through Python? Eventually I'll be looping through all the files (and likely somehow auto-naming them, that's easy), but what's the process to get to the directory and then run galfit?
Take a look at os.path for directory navigation. To execute a shell command, use os.system. The example you posted could go something like this:

import os

os.chdir(os.path.expanduser('~/Documents/XDF_Thumbnails_Sci'))
for file in os.listdir('.'):
    if os.path.splitext(file)[1] == '.feedme':
        os.system('~/galfit %s' % file)
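If you'd rather not go through the shell, a minimal sketch of the same loop using the subprocess module (the paths are taken from your transcript):

import os
import subprocess

feedme_dir = os.path.expanduser('~/Documents/XDF_Thumbnails_Sci')
galfit = os.path.expanduser('~/galfit')
for name in os.listdir(feedme_dir):
    if name.endswith('.feedme'):
        # cwd makes galfit run inside the directory, as in the terminal session
        subprocess.run([galfit, name], cwd=feedme_dir, check=True)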

Python/Django: how to get files fastest (based on path and name)

My website users can upload image files, which then need to be found whenever they are to be displayed on a page (using src = ""). Currently, I put all images into one directory. What if there are many files - is it slow to find the right file? Are they indexed? Should I create subdirectories instead?
I use Python/Django. Everything is on webfaction.
The access time for an individual file is not affected by the number of files in the same directory.
Running ls -l on a directory with more files in it will of course take longer, and the same goes for viewing that directory in a file browser. It might be easier to work with these images if you store them in subdirectories defined by the user's name (a sketch follows below), but that depends on what you are going to be doing with them; there is no technical reason to do so.
Think about it like this. The full path to the image file (/srv/site/images/my_pony.jpg) is the actual address of the file. Your web server process looks there, and returns any data it finds or a 404 if there is nothing. What it doesn't do is list all the files in /srv/site/images and look through that list to see if it contains an item called my_pony.jpg.
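If you do want per-user subdirectories, a minimal sketch of how that could look in Django, using a callable for upload_to (the model and field names are made up):

# models.py
from django.db import models

def user_directory_path(instance, filename):
    # Files end up under MEDIA_ROOT/uploads/<username>/<filename>.
    return 'uploads/{0}/{1}'.format(instance.user.username, filename)

class UserImage(models.Model):
    user = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    image = models.FileField(upload_to=user_directory_path)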
If only for organizational purposes, and to help with system maintenance, you should create subdirectories. Other than that, there is very little chance you'll run into the maximum number of files that a directory can hold.
There are negligible performance implications for the web. For other applications, though (file listing, FTP, backup, etc.), there may be consequences, but only if you reach a very large number of files.

Python: Correct way to write to another directory within a package

Currently working with the following package structure:
/package
__init__.py
final.py
/write
__init__.py
write.py
/data
backup.txt
backup1.txt
backup2.txt
final.py imports write.py, which should be able to go back one directory and write a series of backup .txt files to /data.
final.py should also be able to go into /data on a later call and access the backup files, hence the need to save the information in /data.
I'm not sure whether this is the correct hierarchy for a package. How would /write create text files in a directory branch separate from itself without using absolute file paths, in case the whole project is moved, say onto a server?
Would it be wrong (once the backup.txt files are created) to (1) add a retrieve.py to /data which returns the .txt files in some sort of data structure, making /data a package, or (2) have final.py directly enter /data and retrieve the text files?
IMO you shouldn't be writing into your packages. Set your code up so that you are writing to a data directory that is potentially outside your package. Numerous code deployment strategies assume that your code will be in a directory that is not normally writable. (E.g. if it is packaged for common linux distributions, the code will go into /usr/lib/python.../yourpackage/ and the data will be written to /var/lib/yourpackage, or something similar.)
Put your retrieve.py outside of .../data, possibly in a .../read directory, or alongside final.py, depending on the organization you need.
To write to an arbitrary location, just pass the full path to open. For example, assume that you store the path to your data directory in a constant:
import os

DATA_PATH = '/var/lib/mypackage'

def backup():
    with open(os.path.join(DATA_PATH, 'backup.txt'), 'w') as f:
        f.write('some backup data...')
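If the path needs to survive the whole project being moved, one option is to derive a default location at runtime instead of hard-coding it; a minimal sketch, where the MYPACKAGE_DATA variable name is made up:

import os

# Let an environment variable override a per-user default location.
DATA_PATH = os.environ.get(
    'MYPACKAGE_DATA',
    os.path.join(os.path.expanduser('~'), '.mypackage', 'data'),
)

def backup():
    os.makedirs(DATA_PATH, exist_ok=True)  # create the directory if needed
    with open(os.path.join(DATA_PATH, 'backup.txt'), 'w') as f:
        f.write('some backup data...')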
