Python shutil.move copies endlessly - python

I made a script that reads pdf-files from a src-folder, gets some info from the file and then renames the file and moves it to another location.
This is done every 5 seconds.
Since the src and dest are on different disks, I use shutil.move instead of os.rename.
Across disks, shutil.move cannot simply rename the file, so it copies the file to the destination and then deletes the source.
The script works fine, but sometimes there are problems with permissions in the source folder.
When that happens the source file can't be deleted, so it stays in the source folder and gets copied again on every pass.
How can I work around this? Since I don't keep the name of the original pdf-files I don't know how to solve this.

Have you considered keeping track of when you last moved files, and then only moving source files with later modification timestamps?
You can call os.scandir() to list the source directory, which gives you an os.DirEntry object for each of its entries. For each entry, check entry.is_file() to see if it's a file, entry.name to see if it ends with '.pdf', and entry.stat().st_mtime to see if it's newer than your last scan.
You can still call shutil.copyfile() to copy each chosen file to the new name in the destination directory.
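As a rough sketch of that idea (the function name copy_new_pdfs and its parameters are made up for illustration, and the renaming step from your script is left out), a single scan could look something like this:

```python
import os
import shutil
import time


def copy_new_pdfs(src_dir, dest_dir, last_scan):
    """Copy PDFs modified after last_scan; return (copied names, new scan time)."""
    scan_started = time.time()
    copied = []
    with os.scandir(src_dir) as entries:
        for entry in entries:
            # Skip directories and anything that is not a PDF.
            if not entry.is_file() or not entry.name.lower().endswith(".pdf"):
                continue
            # Skip files that were already handled by an earlier scan.
            if entry.stat().st_mtime <= last_scan:
                continue
            # Assumption: the real script would compute a new name here.
            shutil.copyfile(entry.path, os.path.join(dest_dir, entry.name))
            copied.append(entry.name)
    return copied, scan_started
```

You would keep the returned scan time and pass it in as last_scan on the next pass, so a file that could not be deleted is not copied again unless it is modified.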

Related

White space in batch file and shelve saving in a different directory

I am having problems with Project Multiclipboard from Chapter 8 of the book: Automate the Boring Stuff and using Python 3.
The first issue is that, suppose my program mcb.pyw is saved in:
C:\Users\myName\folder name
where the last folder has a space in the name, my batch file:
#pyw.exe C:\Users\myName\folder name\mcb.pyw %*
doesn't seem to work properly from the command line. I can now type in
mcb save keyword
into the command line without getting an error, but it's not doing anything. After testing by changing the directory to a folder whose path has no space in it, I've concluded that the problem is because of the space, but I am unsure of how I might go about fixing this.
The second issue is that when the batch file is working, the module shelve seems to be saving the data in the wrong folder. Specifically, I noticed that if I run mcb.pyw from the command line, shelve saves the data in C:\Users\myName, which is also the default directory when you open the command window, instead of the folder C:\Users\myName\folderName, where mcb.pyw and mcb.bat are saved.
I have gotten around this by including the lines:
import os
os.chdir('C:\\Users\\myName\\folderName')
However, is there any other way to solve this issue? Why is shelve saving in C:\Users\myName instead of the folder where everything is already saved?
I apologise if I have made any etiquette or formatting mistakes. If you let me know what I did wrong I will do my best to fix it as soon as I can, thank you!
Files opened with a relative name are always created in the current working directory, so unless you give a full path you do have to change your working directory if the default one is not what you want.
You can avoid hard-coding the path name and always change your working directory to the folder where the script is located with:
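import os
import sys
# abspath() guards against sys.argv[0] being a bare file name, where dirname() would return ''
os.chdir(os.path.dirname(os.path.abspath(sys.argv[0])))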
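If you would rather not change the working directory at all, another option is to hand shelve a full path built from the script's location. This is only a sketch: shelf_path and save_keyword are hypothetical helper names, and "mcb" is assumed to be the shelf name your script uses.

```python
import os
import shelve


def shelf_path(script_file, shelf_name="mcb"):
    """Return a shelf path inside the directory that contains script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, shelf_name)


def save_keyword(script_file, keyword, text):
    """Store text under keyword in the shelf that lives next to the script."""
    with shelve.open(shelf_path(script_file)) as shelf:
        shelf[keyword] = text
```

Inside mcb.pyw you would call save_keyword(sys.argv[0], keyword, text), so the data ends up next to the script no matter which directory the command window started in.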

Python: get the list of files in a directory

I have a program in Python (Python 3 on Ubuntu 16.04) that checks for new files in a directory (.mp4 files that are the result of segmenting a live video stream). I use os.listdir(path) to get the new files in my iterations. The problem is that when a new .mp4 file is created, an empty file is created first and the contents are appended incrementally, so the file is not yet finalized/finished/playable (if you look at the folder, these files are usually shown without an extension).
Is it possible to ignore such non-finalized files at the Python level when getting the list of files in directory? Maybe some functions or API exists for that?
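If the unfinished files do not have the .mp4 extension yet, as you describe, filtering by extension with glob.glob('*.mp4', root_dir=path) should be just fine. Note that the root_dir parameter needs Python 3.10 or later; on older versions use glob.glob(os.path.join(path, '*.mp4')), which returns full paths instead of bare names.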
https://docs.python.org/3/library/glob.html
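If the extension alone turns out not to be reliable, one possible extra check is to skip files that are empty or were modified very recently. This is a heuristic that goes beyond the glob call above, not something glob provides: the name finished_mp4s and the 5 second threshold are assumptions you would need to tune for your segmenter.

```python
import glob
import os
import time


def finished_mp4s(path, min_age_seconds=5.0):
    """List .mp4 files in path that are non-empty and not modified recently."""
    now = time.time()
    ready = []
    for name in sorted(glob.glob(os.path.join(path, "*.mp4"))):
        try:
            info = os.stat(name)
        except FileNotFoundError:
            continue  # the file disappeared between listing and stat
        # Assumption: a file untouched for min_age_seconds is complete.
        if info.st_size > 0 and now - info.st_mtime >= min_age_seconds:
            ready.append(name)
    return ready
```

This works with Python versions older than 3.10 because it joins the directory into the pattern instead of using root_dir.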

iterate through folder and sub folders to find a specific file [duplicate]

I am trying to automate a search and delete operation for specific files and folder underneath a specific folder. Below is the folder structure which I have:
The primary directory is MasterFolder, which contains multiple child folders (Fol1, Fol2, Fol3, Fol4); the subdirectories may vary from folder to folder.
The child folders contain more files and subfolders. For example, Fol1 holds someFilesFolder, someText.txt and AnotherFilesFolder, and the same applies to Fol2, Fol3, etc. under the MasterFolder.
Now what I would like to do is scan the MasterFolder, go through every ChildFolder, look for 1 file named someText.txt and 1 folder named someFilesFolder under every child folder, and remove them. Ideally the folder name and file name I would want to delete is same under every ChildFolder, so the find should happen only one level down the MasterFolder. I checked multiple articles, but everything describes deleting a specific file or a directory using shutil.rmtree under one folder, whereas I believe I am looking for something that will do the find and delete recursively.
To get you started:
"Ideally the folder name and file name I would want to delete is same under every ChildFolder, so the find should happen only one level down the MasterFolder."
One easy way to go through every child folder under MasterFolder is to loop over os.listdir('/path/to/MasterFolder'). This will give you both files and child folders. You can check them each with os.path.isdir. But it's much simpler (and more efficient, and cleaner) to just try to operate on them as if they were all folders, and handle the exceptions on non-folders by doing nothing/logging/whatever seems appropriate.
The list you get back from listdir is just bare names, so you will need os.path.join to concatenate each name to /path/to/MasterFolder. And you'll need to use it to concatenate "someText.txt" and "someFilesFolder" as well, of course.
Finally, while you could listdir again on each child directory, and only delete the file and subdirectory if they exist, again, it's simpler (and cleaner and more efficient) to just try each one. You apparently already know how to shutil.rmtree and os.unlink, so… you're done.
If that "ideally" isn't actually guaranteed, instead of os.listdir, you will have to use os.walk. This is slightly more complicated, but if you look at the examples, then come back up and read the docs above the examples for the details, it's not hard to figure out.
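Putting the one-level version together, a minimal sketch could look like the following. The function name clean_children is made up here, and which exceptions you want to swallow is a judgment call: this version ignores "not there" and "not a folder" and lets anything else, such as permission errors, propagate.

```python
import os
import shutil


def clean_children(master, file_name="someText.txt", folder_name="someFilesFolder"):
    """Remove file_name and folder_name from each direct child folder of master."""
    for child in os.listdir(master):
        child_path = os.path.join(master, child)
        try:
            os.unlink(os.path.join(child_path, file_name))
        except (FileNotFoundError, NotADirectoryError):
            pass  # the file is missing, or child is not a folder
        try:
            shutil.rmtree(os.path.join(child_path, folder_name))
        except (FileNotFoundError, NotADirectoryError):
            pass  # the folder is missing, or child is not a folder
```

You would call it as clean_children('/path/to/MasterFolder'). Only the direct children of MasterFolder are visited, so a someFilesFolder nested deeper is left alone.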

Python rsync script directory names mirror

I have a script that I use to push files back to my home PC using rsync. The names of successfully pushed files are added to a sqlite database so they don't get pushed again (since I only want a one-way mirror). Anyhow, the problem I have is that although the script recursively goes down the source path and pushes files based on a defined extension, all the files end up in the same destination root directory.
What I am trying to do is to have the destination folder structure the same as the source.
I think I have to add something to the destDir path, but I'm not exactly sure what:
for root, dirs, files in os.walk(sourceDir):
    for file in files:
        # If some filtering criteria
        print("Syncing new file: " + file)
        cmd = ["rsync"]
        cmd.append(os.path.join(root, file))
        cmd.append(destDir + "/")
        p = subprocess.Popen(cmd, shell=False)
        if p.wait() == 0:
            rememberFile(file)
I think you should rely on the features of rsync for this as much as possible, rather than trying to reimplement it in Python. rsync has been extensively tested and is full-featured, and it already handles the problems you're running into. For instance, in your original code snippet, you need to reconstruct the path of each file relative to sourceDir (instead of using just the file name) and append that to your destDir.
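If you do want to keep debugging the loop, the missing piece is roughly the path computation below. This is only a sketch: dest_dir_for is a hypothetical helper, with root being the directory that os.walk is currently visiting.

```python
import os


def dest_dir_for(source_dir, root, dest_dir):
    """Map a directory found by os.walk(source_dir) to its mirror under dest_dir."""
    relative = os.path.relpath(root, source_dir)
    if relative == os.curdir:
        # root is source_dir itself, so files go straight into dest_dir.
        return dest_dir
    return os.path.join(dest_dir, relative)
```

In the loop you would then append dest_dir_for(sourceDir, root, destDir) + "/" instead of destDir + "/". Keep in mind that rsync will not necessarily create missing intermediate directories on the destination, so they may have to exist beforehand.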
But before you keep debugging that, consider this alternative. Instead of a sql db, why not keep all of the files that you have pushed in a plain text file? Let's say it's called exclude_list.txt. Then your one-liner rsync command is:
rsync -r --exclude-from 'exclude_list.txt' src dst
The -r switch will cause it to traverse the file tree automatically. See topic #6 on this page for more details on this syntax.
Now you only need your Python script to maintain exclude_list.txt. I can think of two options:
1. Capture the output of rsync with the -v option to list the filenames that were moved, parse them, and append to exclude_list.txt. I think this is the most elegant solution. You can probably do it in just a few lines.
2. Use the script you already have to traverse the tree and add all of the files to exclude_list.txt, but remove all of the individual rsync calls. Then call rsync once at the end, as above.
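Either way, the bookkeeping side can stay small. Here is a rough sketch, where add_to_exclude_list and rsync_command are made-up helper names and the pushed paths are assumed to be written relative to the source root. rsync treats each line of the file as a pattern, so names containing characters like * or [ would need extra care.

```python
import os


def add_to_exclude_list(list_path, pushed_paths):
    """Append newly pushed paths to the exclude file, skipping ones already listed."""
    known = set()
    if os.path.exists(list_path):
        with open(list_path, encoding="utf-8") as handle:
            known = {line.rstrip("\n") for line in handle if line.strip()}
    added = []
    with open(list_path, "a", encoding="utf-8") as handle:
        for path in pushed_paths:
            if path in known:
                continue  # already recorded on an earlier run
            handle.write(path + "\n")
            known.add(path)
            added.append(path)
    return added


def rsync_command(src, dst, list_path="exclude_list.txt"):
    """Build the one-liner from above as an argument list for subprocess."""
    return ["rsync", "-r", "--exclude-from", list_path, src, dst]
```

The list returned by rsync_command can be passed to subprocess.run or subprocess.Popen without shell=True, the same way your current script calls rsync.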
