I'm writing a script to reorganise a large data set so that it'll be mostly compatible with a database system we're currently implementing. Most of our data is currently not organised in any meaningful way, although some folders are already named in the scheme the script will apply. As such, I've implemented a case to catch this.
Let's call this case "INFO". In every instance but the first, my script works fine: it finds the "INFO_example[2,3,4...]" folder at the top level and moves it into a new "INFO" folder that contains all of these examples. However, for some reason, in the first instance, "INFO_example1", it instead takes the contents of the folder and dumps them directly into "INFO".
I've attempted to debug the problem, but can't see any difference between the first instance and any other. The folder path doesn't look any different either.
file_path = join(self.path, file)
try:
    move(file_path, self.info_path)
except shutil.Error:
    print("Trying to move", file_path, " didn't work")
I'm a little stumped as to what's actually going on.
I'd expect the "INFO_example1" folder to behave as all the others do, and just be moved into the top level "INFO" folder.
Currently, its contents are moved into "INFO", and the "INFO_example1" folder appears to be deleted.
My print message also never fires.
So, the issue was that shutil.move would look at, say, "INFO_example1" and try to move it to "INFO"; noticing that "INFO" did not exist, it would create it and put the contents of "INFO_example1" into it, instead of creating the folder and then moving "INFO_example1" inside it.
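This behaviour can be reproduced in isolation (a sketch with hypothetical folder names in a throwaway temp directory):

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "INFO_example1")
dst = os.path.join(base, "INFO")
os.mkdir(src)
open(os.path.join(src, "data.txt"), "w").close()

# "INFO" does not exist yet, so move() simply renames the source folder
# to "INFO": the contents end up directly inside it.
shutil.move(src, dst)
print(os.path.isdir(os.path.join(dst, "INFO_example1")))  # False

# With "INFO" already present, move() nests the source folder inside it.
src2 = os.path.join(base, "INFO_example2")
os.mkdir(src2)
shutil.move(src2, dst)
print(os.path.isdir(os.path.join(dst, "INFO_example2")))  # True
```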
Due to the nature of the script, creating "INFO" up front caused other issues. However, if I update the section that moves the "INFO_example" folders to look like this:
file_path = join(self.path, file)
try:
    if not isdir(self.info_path):
        mkdir(self.info_path)
    move(file_path, self.info_path)
except shutil.Error:
    print("Trying to move", file_path, " didn't work")
Checking whether the "INFO" directory exists, and creating it if not (but only when I'm attempting to move an "INFO_example"-style folder), gives the expected behaviour, at least for me.
I have an S3 bucket which currently has 4 folders, one of which is input/.
After my Airflow DAG runs, the last few lines of the Python code attempt to delete all files in input/.
response_keys = self._s3_hook.delete_objects(bucket=self.s3_bucket, keys=s3_input_keys)
deleted_keys = [x['Key'] for x in response_keys.get("Deleted", []) if x['Key'] not in ['input/']]
self.log.info("Deleted: %s", deleted_keys)

if "Errors" in response_keys:
    errors_keys = [x['Key'] for x in response_keys.get("Errors", [])]
    raise AirflowException("Errors when deleting: {}".format(errors_keys))
Now, this sometimes deletes all the files and sometimes deletes the directory itself. I am not sure why it deletes the directory even though I have specifically excluded it.
Is there any other way I can try to achieve the deletion?
PS: I tried using boto directly, but our AWS security setup will not let it access the buckets, so the hook is all I have. Please help.
Directories do not exist in Amazon S3. Instead, the Key (filename) of an object includes the full path. For example, the Key might be invoices/january.xls, which includes the path.
When an object is created in a path, the directory magically appears. If all objects in a directory are deleted, then the directory magically disappears (because it never actually existed).
However, if you click the Create Folder button in the Amazon S3 management console, a zero-byte object is created with the name of the directory. This forces the directory to 'appear' since there is an object in that path. However, the directory does not actually exist!
So, your Airflow job might be deleting all the objects in a given path, which causes the directory to disappear. This is quite okay and nothing to be worried about. However, if the Create Folder button was used to create the folder, then the folder will still exist when all objects are deleted (assuming that the delete operation does not also delete the zero-length object).
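If you want the zero-byte placeholder to survive, one option is to filter it out of the key list before it is passed to delete_objects, rather than filtering the logged response afterwards. A minimal sketch of the filtering step (the key names here are hypothetical examples):

```python
def deletable_keys(keys):
    """Drop zero-byte 'folder' placeholder keys (those ending in '/')
    so a bulk delete leaves the folder marker object in place."""
    return [k for k in keys if not k.endswith("/")]

# Hypothetical key list as it might come back from a bucket listing:
s3_input_keys = ["input/", "input/a.csv", "input/b.csv"]
print(deletable_keys(s3_input_keys))  # ['input/a.csv', 'input/b.csv']
```

The filtered list would then be what gets passed as keys= to the hook's delete_objects call shown in the question.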
So I'm moving a lot of files from within folders to the layer above those folders. Essentially, once a file has been moved out of a folder (to the layer above), I need to delete the folder it came from. Something like:
for file in files:
    print(file)
    moved = shutil.move(file, downloads_path)
    if moved is True:
        os.remove(downloads_folder_path)
shutil.move will return the new file name as a string if successful, and raise an exception if either the source or the target wasn't found.
So actually there is no need to check for its return value at all. You can just proceed with moving everything and then delete the original folder.
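Putting that together, one possible shape (with hypothetical paths, using a throwaway temp directory for illustration) is to move every file up a level and then remove the emptied source folder with os.rmdir:

```python
import os
import shutil
import tempfile

downloads_path = tempfile.mkdtemp()            # the layer above the folders
folder = os.path.join(downloads_path, "sub")   # folder whose files move up
os.mkdir(folder)
for name in ("a.txt", "b.txt"):
    open(os.path.join(folder, name), "w").close()

# Move everything up one level, then delete the now-empty folder.
for name in os.listdir(folder):
    shutil.move(os.path.join(folder, name), downloads_path)
os.rmdir(folder)  # os.rmdir only removes empty directories

print(sorted(os.listdir(downloads_path)))  # ['a.txt', 'b.txt']
```

If a move fails, shutil.move raises before os.rmdir runs, so a partially emptied folder is never deleted.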
I cannot figure out the best way to find a specific folder and send files to another specific folder, especially if the user's directory is slightly different from what I have coded.
I'm working on a program that has a folder of content to grab from; the user picks items, and when they're done it creates a folder full of things, including the images they chose. I've got it working (creating all the necessary folders in the user's directory works fine, but once it becomes more complex it fails some of the time), but I would like it to work every time, regardless of the user and of where they've placed my program on their computer.
An example of the relevant code I currently have, which I'm sure is redundant compared to what I could be using instead:
init python:
    import os
    import shutil

    current_dir = os.path.normpath(os.getcwd() + "../../")

    def grab_content():
        # image_choice is a separate variable within the program that
        # determines if it is image1.png, image2.png, etc.
        filetocopy = "image%s.png" % image_choice
        file_path = os.path.join(current_dir, "Program folder", "stuff", "content")
        images_path = os.path.join(file_path, "images")
        new_images_path = os.path.join(current_dir, "My Templates", anothervariable_name, "game", "template", "image_choices")
        try:
            shutil.copy(images_path + "\\" + filetocopy, new_images_path)
        except:
            print("error")
(The folders listed here have been checked for existence and created if missing, but not the new file path, since that needs to be in a specific place within the main folder.)
Either it works when I have the files set up just right (on my own machine, which defeats the purpose), or it does nothing, or I get an error saying the path doesn't exist. I have code prior to this that creates the needed folders, but I'm trying to grab images from the folder that belongs to the actual program and put only the ones I specify into a new folder I create through the program.
Would I use os.walk? I was looking at all the os code, but this is my first time dealing with any of it, so any advice is helpful.
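One robustness improvement that applies regardless of the rest of the setup: build every path with os.path.join relative to one explicitly computed base directory, rather than concatenating strings with "\\", which only works on Windows. A generic sketch (the folder names and the base directory are hypothetical; in a real program the base would come from the program's own location):

```python
import os

def image_source(base_dir, image_choice):
    """Build the full path to imageN.png with os.path.join, which uses
    the correct separator on Windows, macOS and Linux alike."""
    filetocopy = "image%s.png" % image_choice
    return os.path.join(base_dir, "Program folder", "stuff", "content",
                        "images", filetocopy)

print(image_source("/apps/demo", 2))
```

Because every piece goes through os.path.join, the result is valid on whatever machine the program lands on, instead of only on a layout that matches the hard-coded separators.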
In Python, I understand that I can delete multiple files with the same name pattern using, for example, the following:
for f in glob.glob("file_name_*.txt"):
    os.remove(f)
And that a single directory can be deleted with shutil.rmtree('/path/to/dir'), which will delete the directory even if it is not empty. On the other hand, os.rmdir() requires the directory to be empty.
I actually want to delete multiple directories matching the same name pattern, and they are not empty. So I am looking for something like
shutil.rmtree('directory_*')
Is there a way to do this with python?
You have all of the pieces: glob() iterates, and rmtree() deletes:
for path in glob.glob("directory_*"):
    shutil.rmtree(path)
This will throw OSError if one of the globbed paths names a file, or for any other reason that rmtree() can fail. You can add error handling as you see fit, once you decide how you want to handle the errors. It doesn't make sense to add error handling unless you know what you want to do with the error, so I have left error handling out.
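If you do decide you simply want to log failures and keep going, the loop might grow a try/except like this (a sketch, run here against a throwaway temp directory containing two real directories and one plain file that matches the pattern):

```python
import glob
import os
import shutil
import tempfile

# Set up throwaway entries to act on.
base = tempfile.mkdtemp()
for name in ("directory_1", "directory_2"):
    os.makedirs(os.path.join(base, name, "nested"))
open(os.path.join(base, "directory_3"), "w").close()  # a *file* matching the pattern

failed = []
for path in glob.glob(os.path.join(base, "directory_*")):
    try:
        shutil.rmtree(path)
    except OSError as exc:
        failed.append(path)  # e.g. the plain file above; log and move on
        print("Could not remove", path, "-", exc)

print(sorted(os.listdir(base)))  # only the entries rmtree refused remain
```

rmtree raises NotADirectoryError (a subclass of OSError) for the plain file, so it survives while both real directories are removed.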
I'm trying to get a homemade path navigation function working - basically I need to go through one folder, and explore every folder within it, running a function within each folder.
I reach a problem when I try to change directories within a for loop. I've got this "findDirectories" function:
def findDirectories(list):
    for files in os.listdir("."):
        print(files)
        list.append(files)
        os.chdir("y")
That last line causes the problem. If I remove it, the function just builds a list of all the folders in that folder. Unfortunately, this means I have to run it each time I go down a level; I can't just run the whole thing once. I've specified the folder "y" because that's a real folder, but the program still crashes. Doing os.chdir("y") outside of the for loop causes no issues at all.
I'm new to Python, but not to programming in general. How can I get this to work, or is there a better way? The final result I need is to run a function on every "*Response.xml" file within this folder, no matter how deeply nested it is.
Well, you don't post the traceback of the actual error, but clearly it fails because you have specified y as a relative path. It may be able to change into y on the first iteration of the loop, but on the second it will be trying to change into a subdirectory of y that is also called y, which you probably do not have.
You want to be doing something like
import os

for dirName, subDirs, fileNames in os.walk(rootPath):
    # it's not clear which files you want; I assume anything that ends with Response.xml?
    for f in fileNames:
        if f.endswith("Response.xml"):
            # this is the path you will want to use
            filePath = os.path.join(dirName, f)
            # now do something with it!
            doSomethingWithFilePath(filePath)
That's untested, but you get the idea...
As Dan said, os.walk would be better. See the example there.
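For comparison, the same traversal can also be written with pathlib's rglob (Python 3.5+), which recurses into every subdirectory for you. A self-contained sketch with a hypothetical nested layout built in a temp directory:

```python
import pathlib
import tempfile

# Build a small nested tree to walk.
root = pathlib.Path(tempfile.mkdtemp())
(root / "a" / "b").mkdir(parents=True)
(root / "a" / "fooResponse.xml").touch()
(root / "a" / "b" / "barResponse.xml").touch()
(root / "a" / "notes.txt").touch()

# rglob matches the pattern at every depth below root.
matches = sorted(p.name for p in root.rglob("*Response.xml"))
print(matches)  # ['barResponse.xml', 'fooResponse.xml']
```

Each match is a pathlib.Path, so there is no need to join directory and file names by hand before passing it to your processing function.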