I am trying to automate a search-and-delete operation for specific files and folders underneath a specific folder. Below is the folder structure I have:
The primary directory is MasterFolder, which includes multiple subdirectories, the child folders Fol1, Fol2, Fol3, Fol4; the subdirectories may vary from folder to folder.
The subfolders contain more files and subfolders. E.g., Fol1 holds someFilesFolder, someText.txt, and AnotherFilesFolder, and the same applies to Fol2, Fol3, etc. under the MasterFolder.
What I would like to do is scan the MasterFolder, go through every child folder, look for one file named someText.txt and one folder named someFilesFolder under each child folder, and remove them. Ideally, the folder name and file name I want to delete are the same under every child folder, so the find should happen only one level down from the MasterFolder. I checked multiple articles, but everything describes deleting a specific file or directory under one folder using shutil.rmtree; I am looking for something that will do the find-and-delete recursively, I believe.
To get you started:
Ideally, the folder name and file name I want to delete are the same under every child folder, so the find should happen only one level down from the MasterFolder.
One easy way to go through every child folder under MasterFolder is to loop over os.listdir('/path/to/MasterFolder'). This will give you both files and child folders. You can check each one with os.path.isdir, but it's much simpler (and more efficient, and cleaner) to just try to operate on them as if they were all folders, and handle the exceptions on non-folders by doing nothing, logging, or whatever seems appropriate.
The list you get back from listdir contains just bare names, so you will need os.path.join to join each name to /path/to/MasterFolder. And you'll need it to join on "someText.txt" and "someFilesFolder" as well, of course.
Finally, while you could listdir again on each child directory and only delete the file and subdirectory if they exist, it's again simpler (and cleaner and more efficient) to just try each one. You apparently already know how to use shutil.rmtree and os.unlink, so… you're done.
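A minimal sketch of the above, assuming the names from the question (the MasterFolder path is a placeholder):

```python
import os
import shutil

master = '/path/to/MasterFolder'  # placeholder path

for name in os.listdir(master):
    child = os.path.join(master, name)
    # Try the deletions directly; missing targets (or a non-folder child)
    # raise OSError, which we deliberately ignore.
    try:
        os.unlink(os.path.join(child, 'someText.txt'))
    except OSError:
        pass
    try:
        shutil.rmtree(os.path.join(child, 'someFilesFolder'))
    except OSError:
        pass
```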
If that "ideally" isn't actually guaranteed, instead of os.listdir, you will have to use os.walk. This is slightly more complicated, but if you look at the examples, then come back up and read the docs above the examples for the details, it's not hard to figure out.
I want to make a little Python script to check and compare the contents of two folders and all the files inside:
Cycle through the folder structure based on Folder A.
Compare every file from Folder A with Folder B.
If the file doesn't exist, or the contents are not 100% identical, copy the file to Folder C, but in the same folder structure as Folder A.
Could anyone advise on how to do such a feat?
I believe dircmp from filecmp does most of that for you:
https://docs.python.org/2/library/filecmp.html
You can just extend the basic example on that page. By using the attributes left_only, right_only, and diff_files you can easily identify missing and not-100%-identical files.
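A rough sketch of that extension (the FolderA/FolderB/FolderC paths are placeholders; note that dircmp's default comparison is shallow, based on os.stat signatures, so re-check candidates with filecmp.cmp(a, b, shallow=False) if you need true byte-for-byte comparison):

```python
import os
import shutil
from filecmp import dircmp

def copy_diff(dcmp, src_root, dest_root):
    # Mirror the current subdirectory's position relative to Folder A.
    rel = os.path.relpath(dcmp.left, src_root)
    dest_dir = os.path.normpath(os.path.join(dest_root, rel))
    for name in dcmp.left_only + dcmp.diff_files:
        src = os.path.join(dcmp.left, name)
        if os.path.isdir(src):
            # Present only in A: copy the whole subtree.
            shutil.copytree(src, os.path.join(dest_dir, name))
        else:
            if not os.path.isdir(dest_dir):
                os.makedirs(dest_dir)
            shutil.copy2(src, dest_dir)
    # Recurse into directories common to both sides.
    for sub in dcmp.subdirs.values():
        copy_diff(sub, src_root, dest_root)

a = '/path/to/FolderA'
copy_diff(dircmp(a, '/path/to/FolderB'), a, '/path/to/FolderC')
```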
I am currently working on an app that syncs one specific folder in a user's Google Drive. I need to find out when any of the files/folders in that specific folder have changed. The actual syncing process is easy, but I don't want to do a full sync every few seconds.
I am considering one of these methods:
1) Monitor the changes feed and look for any file changes.
This method is easy, but it will cause a sync if ANY file in the drive changes.
2) Frequently request all files in the whole drive, e.g. service.files().list().execute(), and look for changes within the specific tree. This is a brute-force approach; it will be too slow if the user has thousands of files in their drive.
3) Start at the specific folder and move down the folder tree looking for changes.
This method will be fast if there are only a few directories in the specific tree, but it will still lead to numerous API requests.
Are there any better ways to find out whether a specific folder and its contents have changed?
Are there any optimisations I could apply to methods 1, 2, or 3?
As you have correctly stated, you will need to keep (or work out) the file hierarchy for a changed file to know whether it changed within a given folder tree.
There is no way of knowing directly from the changes feed whether a deeply nested file within a folder has been changed. Sorry.
There are a couple of tricks that might help.
Firstly, if your app is using the drive.file scope, then it will only see its own files. Depending on your specific situation, this may equate to your folder hierarchy.
Secondly, files can have multiple parents. So, when creating a file in folder-top/folder-1/folder-1a/folder-1ai, you could declare both folder-1ai and folder-top as parents. Then you simply need to check for folder-top.
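As a sketch of that multiple-parents trick, assuming the v2 Python client and an already-authorized service object (the folder IDs and filename are hypothetical placeholders):

```python
from apiclient.http import MediaFileUpload  # v2-era google-api-python-client import

# Hypothetical placeholder IDs for the deep folder and the top folder.
FOLDER_1AI_ID = 'id-of-folder-1ai'
FOLDER_TOP_ID = 'id-of-folder-top'

media = MediaFileUpload('report.pdf', mimetype='application/pdf')
body = {
    'title': 'report.pdf',
    # Declaring both folders as parents means a query against folder-top's
    # children sees this file directly, however deep it "really" lives.
    'parents': [{'id': FOLDER_1AI_ID}, {'id': FOLDER_TOP_ID}],
}
created = service.files().insert(body=body, media_body=media).execute()
```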
I have a Django app that needs to create a file in Google Drive at FolderB/Sub1/Sub2/file.pdf. I have the ID for FolderB, but I don't know whether Sub1 or Sub2 even exist. If not, they should be created and file.pdf should be put in them.
I figure I can look at the children at each level and create the folder at each level if it's not there, but this seems like a lot of checks and API calls just to create one file. It's also a harder task trying to accommodate multiple folder structures (i.e., one Python function that can accept any path of any depth and upload a file there).
The solution you have presented is the correct one. As you have realized, the Drive file system is not exactly like a hierarchical file system, so you will have to perform these checks.
One optimization you could perform is to try to find the grandchild folder (Sub2) first; whenever it already exists, you save a number of calls.
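A hedged sketch of that idea, written against the v2 API (queries by title; the helper names and path parts are mine, not from the answer, and the "deepest first" shortcut assumes the folder title is unambiguous in this Drive):

```python
FOLDER_MIME = 'application/vnd.google-apps.folder'

def find_folder(service, title, parent_id=None):
    """Return the ID of the first matching folder, or None."""
    q = "title='%s' and mimeType='%s' and trashed=false" % (title, FOLDER_MIME)
    if parent_id:
        q += " and '%s' in parents" % parent_id
    items = service.files().list(q=q).execute().get('items', [])
    return items[0]['id'] if items else None

def create_folder(service, title, parent_id):
    body = {'title': title, 'mimeType': FOLDER_MIME, 'parents': [{'id': parent_id}]}
    return service.files().insert(body=body).execute()['id']

def ensure_path(service, root_id, parts):
    """Walk parts (e.g. ['Sub1', 'Sub2']) under root_id, creating folders as needed."""
    # The optimization above: look for the deepest folder first, and only
    # fall back to level-by-level checks/creates when it is missing.
    deepest = find_folder(service, parts[-1])
    if deepest:
        return deepest
    parent = root_id
    for part in parts:
        parent = find_folder(service, part, parent) or create_folder(service, part, parent)
    return parent
```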
I have a directory that contains files of records. I just got access to a new directory that has the same records but additional files as well; the additional files are buried deep inside other folders and I can't find them.
My solution would be to have a Python program run and delete all files that are duplicated between the two directories (and subdirectories), leaving the others intact, which will give me the "new" files I'm looking for.
I have seen a couple of programs that find duplicates, but I'm unsure how they really work, and they haven't been helpful.
Is there any way I can accomplish what I'm looking for?
Thanks!
Possible approach:
Create a set of MD5 hashes from your original folder.
Recursively MD5-hash the files in your new folder, deleting any file that generates a hash already present in your set.
The caveat to the above is that two different files can, in principle, generate the same hash. How different are the files? A sketch of the approach follows below.
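Here is that sketch (the directory paths are placeholders):

```python
import hashlib
import os

def md5_of(path, blocksize=1 << 20):
    """Hash a file in chunks so large files aren't read into memory at once."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(blocksize), b''):
            h.update(chunk)
    return h.hexdigest()

# Build the set of hashes from the original directory tree.
original_hashes = set()
for dirpath, dirnames, filenames in os.walk('/path/to/original'):
    for name in filenames:
        original_hashes.add(md5_of(os.path.join(dirpath, name)))

# Delete anything in the new tree whose hash is already known.
for dirpath, dirnames, filenames in os.walk('/path/to/new'):
    for name in filenames:
        full = os.path.join(dirpath, name)
        if md5_of(full) in original_hashes:
            os.unlink(full)
```

Comparing file sizes before hashing would cheaply narrow the candidates, and keeping a hash-to-path dict instead of a set would let you confirm matches byte-for-byte (e.g. with filecmp.cmp(a, b, shallow=False)) before deleting.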
Use fslint or some similar software. Fslint is able to, for example, give you a list of the duplicated files and hardlink the copies together, or delete the duplicates. Another option is to just use a diff-like program to diff the directories, if their internal structure is the same.
Do the duplicate files in both directories have the same name/path? If I understand correctly, you want to find duplicate filenames rather than duplicate file contents. If so, a "synchronised" walk over both trees with os.walk might be helpful.
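For instance, a small sketch that compares the two trees by relative file path (the root paths are placeholders):

```python
import os

def relative_paths(root):
    """All file paths under root, relative to root, so two trees compare by name."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths

# Files present in the new tree but not in the original, judged by path alone.
new_only = relative_paths('/path/to/new') - relative_paths('/path/to/original')
```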