In the following code, I need to iterate through files in a directory with long names and spaces in paths.
import os
import pandas as pd

def avg_dmg_acc(path):
    for d in os.listdir(path):
        sub_path = path + '/' + d
        if os.path.isdir(sub_path):
            if d.startswith('Front'):
                for f in os.listdir(sub_path):
                    fpath = r"%s" % sub_path + '/' + f
                    print(fpath)
                    print(os.path.exists(fpath))
                    df = pd.read_csv(fpath)
Then I ran the function providing the argument path:
path = r"./Mid-Con Master dd3d5c56-581c-42e0-acde-04e7feed3bb8/620138 91852327-e08d-4ed1-9774-383c888cb04e/Power End 2d41ba63-dfb9-4984-a5a5-153997fea43a"
avg_dmg_acc(path)
However, I am getting a "file does not exist" error:
File b'./Mid-Con Master dd3d5c56-581c-42e0-acde-04e7feed3bb8/620138 91852327-e08d-4ed1-9774-383c888cb04e/Power End 2d41ba63-dfb9-4984-a5a5-153997fea43a/Front c41f42ce-7158-4371-8cf6-82d1bcf04787/Damage Accumulation f907a97a-6d2d-40f6-ba02-0bc0599b773b.csv' does not exist
As you can see, I am already using r"path", since I read somewhere that it handles spaces in paths. The path is also constructed manually in this version, e.g. sub_path = path + '/' + d, but I originally tried os.path.join(path, d) and it didn't work. I also tried Path from pathlib, since it is the recommended way in Python 3, with the same result. At one point I tried os.path.abspath instead of the relative ./ path I am using now, but it still says the file does not exist.
Why is it not working? Is it because the path is too long or spaces are still not dealt with correctly?
It turns out that the length of the path is causing this problem. I shortened the name of the lowest-level folder one character at a time and reached the point where os.path.exists(fpath) changed from False to True. I think I will need to rename all the folders before processing.
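For context, this behaviour matches the classic Windows limit of 260 characters (MAX_PATH) on a full path. Assuming the script runs on Windows, a possible alternative to renaming folders is the extended-length prefix, which only works with absolute, backslash-separated paths. The helper name to_extended_length below is hypothetical, and this is a minimal sketch rather than a tested fix for the data set above:

```python
import os

# Four characters: backslash, backslash, question mark, backslash.
EXTENDED_PREFIX = "\\\\?\\"


def to_extended_length(path):
    # Hypothetical helper: the prefix requires an absolute path.
    abs_path = os.path.abspath(path)
    if os.name != "nt" or abs_path.startswith(EXTENDED_PREFIX):
        # Nothing to do outside Windows, or if the prefix is already there.
        return abs_path
    if abs_path.startswith("\\\\"):
        # A UNC share gets the prefix followed by UNC and the share path.
        return EXTENDED_PREFIX + "UNC\\" + abs_path[2:]
    return EXTENDED_PREFIX + abs_path


fpath = to_extended_length("./some folder/some file.csv")
print(len(fpath), fpath)
```

On other operating systems the helper simply returns the absolute path, so the same call can stay in cross-platform code.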
Let's say I have a folder like this.
/home/user/dev/Project/media/image_dump/images/02_car_folder
Everything after the media directory should be kept. The remaining should be removed.
/media/image_dump/images/02_car_folder
I was originally doing it this way, but as more subdirectories were added to different folders, it started generating invalid file paths:
split_absolute = [os.sep.join(os.path.normpath(y).split(os.sep)[-2:]) for y in absolute_path]
The problem this causes is that once you start going deeper, the media path is cut out of the file path altogether.
So if I went into
media/image_dump/images/02_car_folder/
The file path now becomes this, when it needs to include everything from /media onward:
/images/02_car_folder
What are some ways to actually handle this? I won't know what users' file paths will be leading up to media, but I know that everything from media onward should be kept, no matter how deep their folders go.
I think you can achieve what you want quite easily using Path.parts:
from pathlib import Path
path = "/home/user/dev/Project/media/image_dump/images/02_car_folder"
parts = Path(path).parts
stripped_path = Path(*parts[parts.index("media"):])
Result:
>>> print(stripped_path)
media/image_dump/images/02_car_folder
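One caveat: parts.index("media") raises ValueError when the path has no media component. A minimal sketch that guards against this follows. The function name strip_before and the fallback of returning the path unchanged are assumptions made for illustration, and PurePosixPath is used only so that the example behaves the same on every OS:

```python
from pathlib import PurePosixPath


def strip_before(path, marker="media"):
    # Split the path into components, e.g. ('/', 'home', 'user', ...).
    parts = PurePosixPath(path).parts
    if marker not in parts:
        # Assumed fallback: leave paths without the marker untouched.
        return PurePosixPath(path)
    # Keep the first occurrence of the marker and everything after it.
    return PurePosixPath(*parts[parts.index(marker):])


print(strip_before("/home/user/dev/Project/media/image_dump/images/02_car_folder"))
# media/image_dump/images/02_car_folder
print(strip_before("/home/user/no_marker/here"))
# /home/user/no_marker/here
```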
Actually, you don't need any path-specific libraries.
Just work with strings:
Note: the weak point of working with paths as strings is that you need to handle many edge cases yourself (for example, if the path is media/blahblah/blahblah2 or /blahblah/blahblah2/media). pathlib handles these cases out of the box.
import os
full_path1 = "/home/user/dev/Project/media/image_dump/images/02_car_folder"
full_path2 = "/home/user/dev/Project/media/image_dump/media/images/02_car_folder"
separator_dir = os.path.sep + "media" + os.path.sep
print(f'Separate by {separator_dir}')
if separator_dir in full_path1:
    separated_path1 = os.path.sep + separator_dir.join(full_path1.split(separator_dir)[1:])
else:
    separated_path1 = full_path1
if separator_dir in full_path2:
    separated_path2 = os.path.sep + separator_dir.join(full_path2.split(separator_dir)[1:])
else:
    separated_path2 = full_path2
print(f'Full path 1 is {full_path1}')
print(f'Full path 2 is {full_path2}')
print(f'Separated path 1 is {separated_path1}')
print(f'Separated path 2 is {separated_path2}')
The first path has one media folder.
The second path has two media folders, but only the first one is used for cutting the path:
Separate by /media/
Full path 1 is /home/user/dev/Project/media/image_dump/images/02_car_folder
Full path 2 is /home/user/dev/Project/media/image_dump/media/images/02_car_folder
Separated path 1 is /image_dump/images/02_car_folder
Separated path 2 is /image_dump/media/images/02_car_folder
You could also use a regex, concise and easy:
path = '/home/user/dev/Project/media/image_dump/images/02_car_folder'
import re
re.search('/media/.*', path).group(0)
Output: '/media/image_dump/images/02_car_folder'
If media might not be present in the path:
m = re.search('/media/.*', path)
m.group(0) if m else None # or any default you want
If you want the first / to be optional if media is at the beginning, use '(?:/|^)media/.*'
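Putting the optional-slash pattern and the missing-match fallback together gives something like the following sketch. The names MEDIA_RE and media_suffix are made up for this example:

```python
import re

# Match "media/" only at the start of the string or right after a slash,
# so a folder such as "multimedia/" is not picked up by accident.
MEDIA_RE = re.compile(r'(?:/|^)media/.*')


def media_suffix(path, default=None):
    m = MEDIA_RE.search(path)
    return m.group(0) if m else default


print(media_suffix('/home/user/dev/Project/media/image_dump/images/02_car_folder'))
# /media/image_dump/images/02_car_folder
print(media_suffix('media/image_dump/images'))
# media/image_dump/images
print(media_suffix('/home/user/multimedia/clip'))
# None
```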
I'm trying to write a basic backup script from one folder to another, and I got it to work - but the directory structure was not being copied over, just the files. I'm trying to copy in the subfolder as well, so that, for example, c:\temp\docs\file.txt goes to d:\temp\docs\file.txt instead of just d:\temp\file.txt
My issue is with the indentation of my if/else statement, but everything looks good to me. What am I doing wrong?
import datetime, time, string, os, shutil
COPY_FROM_LOCATION = 'C:\\xampp\\htdocs\\projects'
folder_date = time.strftime("%Y-%m-%d")
BACKUP_TO_LOCATION = 'D:\\BACKUP\\' + folder_date
#Create a new directory in D:\BACKUP based on today's date so the folder you're trying to copy to actually exists:
if not os.path.exists(BACKUP_TO_LOCATION):
    os.makedirs(BACKUP_TO_LOCATION)
#copy function
def backup(source_folder, target_folder):
    for subdir, dirs, files in os.walk(source_folder):
        if subdir == source_folder :
            new_target_folder = target_folder
        else:
            folder_name = subdir.split("C:\\xampp\\htdocs\\projects\\",1)[-1]
            new_target_folder = target_folder + "\\" + folder_name
        for file in files:
            print "backing up: " + folder_name
            shutil.copy2(os.path.join(subdir, file), new_target_folder)
backup(COPY_FROM_LOCATION,BACKUP_TO_LOCATION)
Here's the error I'm getting:
File "backup.py", line 15
new_target_folder = target_folder
^
IndentationError: expected an indented block
You're intermixing tabs and spaces.
Use one or the other, not both. Preferably spaces.
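One way to find the offending lines is to scan the leading whitespace of each line for both characters. A minimal sketch follows; the function name find_mixed_indentation is made up for this example. It only flags lines that mix both characters within their own indentation, so for a file that uses tabs on some lines and spaces on others, the standard library's tabnanny module (python -m tabnanny backup.py) is the more thorough check:

```python
def find_mixed_indentation(source):
    """Return 1-based line numbers whose indentation mixes tabs and spaces."""
    bad = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            bad.append(lineno)
    return bad


sample = "if x:\n\t    y = 1\n    z = 2\n"
print(find_mixed_indentation(sample))  # [2]
```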
This error typically means there is an error in indentation. Check that you don't mix tabs and spaces.
You can use https://www.pylint.org/ to detect them, or, if it is something simple, paste the code at http://pep8online.com, which will show you what you can improve.
What's up with the space before the colon? I've not seen it done that way before. Python accepts it, so it is not what raises the IndentationError, but the conventional style is to
change
if subdir == source_folder :
to
if subdir == source_folder:
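Independently of the indentation error, the hard-coded split on the source path can be avoided with os.path.relpath, and each target subfolder has to exist before shutil.copy2 writes into it. The following is a sketch in Python 3 syntax under those assumptions, not a drop-in replacement for the script in the question:

```python
import os
import shutil


def backup(source_folder, target_folder):
    for subdir, dirs, files in os.walk(source_folder):
        # Path of the current folder relative to the source root
        # ('.' for the root itself).
        rel = os.path.relpath(subdir, source_folder)
        new_target_folder = os.path.normpath(os.path.join(target_folder, rel))
        # Recreate the subfolder in the backup location if needed.
        os.makedirs(new_target_folder, exist_ok=True)
        for name in files:
            print("backing up: " + os.path.join(rel, name))
            shutil.copy2(os.path.join(subdir, name), new_target_folder)
```

Because the relative path is computed from whatever source_folder is passed in, the function no longer depends on the literal C:\xampp\htdocs\projects string.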
I've got a script that accurately tells me how many files are in a directory and its subdirectories. However, I'm also looking to identify how many folders there are within the same directory and its subdirectories...
My current script:
import os, getpass
from os.path import join, getsize
user = 'Copy of ' + getpass.getuser()
path = "C://Documents and Settings//" + user + "./"
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
file_counter = sum([len(files) for r, d, files in os.walk(path)])
print ' [*] ' + str(file_counter) + ' Files were found and ' + str(folder_counter) + ' folders'
This code gives me the print out of: [*] 147 Files were found and 147 folders.
Meaning that the folder_counter isn't counting the right elements. How can I correct this so the folder_counter is correct?
Python 2.7 solution
For a single directory, you can also do:
import os
print len(os.walk('dir_name').next()[1])
which only takes the first result from os.walk instead of walking the whole tree, and returns the number of directories directly inside the 'dir_name' directory.
Python 3.x solution
Since many people just want a quick and easy solution, I have edited my answer to include the exact working code for Python 3.x.
In Python 3.x, the built-in next() function replaces the .next() method. Thus, the above snippet becomes:
import os
print(len(next(os.walk('dir_name'))[1]))
where dir_name is the directory whose subdirectories you want to count.
I think you want something like:
import os
files = folders = 0
for _, dirnames, filenames in os.walk(path):
  # ^ this idiom means "we won't be using this value"
    files += len(filenames)
    folders += len(dirnames)
print "{:,} files, {:,} folders".format(files, folders)
Note that this only iterates over os.walk once, which will make it much quicker on paths containing lots of files and directories. Running it on my Python directory gives me:
30,183 files, 2,074 folders
which exactly matches what the Windows folder properties view tells me.
Note that your current code calculates the same number twice because the only change is renaming one of the returned values from the call to os.walk:
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
#                         ^ here            ^ and here
file_counter = sum([len(files) for r, d, files in os.walk(path)])
#                       ^ vs. here       ^ and here
Despite that name change, you're counting the same value (i.e. in both it's the third of the three returned values that you're using)! Python functions do not know what names (if any at all; you could do print list(os.walk(path)), for example) the values they return will be assigned to, and their behaviour certainly won't change because of it. Per the documentation, os.walk returns a three-tuple (dirpath, dirnames, filenames), and the names you use for that, e.g. whether:
for foo, bar, baz in os.walk(...):
or:
for all_three in os.walk(...):
won't change that.
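For readers on Python 3, the single-pass count can be wrapped in a function along these lines. This is a sketch, and count_entries is a name chosen for this example:

```python
import os


def count_entries(path):
    files = folders = 0
    # Each iteration yields (dirpath, dirnames, filenames) for one directory.
    for _, dirnames, filenames in os.walk(path):
        files += len(filenames)    # third element: files in this directory
        folders += len(dirnames)   # second element: its direct subdirectories
    return files, folders


files, folders = count_entries(".")
print("{:,} files, {:,} folders".format(files, folders))
```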
If interested only in the number of folders in /input/dir (and not in the subdirectories):
import os
folder_count = 0  # type: int
input_path = "/path/to/your/input/dir"  # type: str
for folders in os.listdir(input_path):  # loop over all entries
    if os.path.isdir(os.path.join(input_path, folders)):  # if it's a directory
        folder_count += 1  # increment counter
print("There are {} folders".format(folder_count))
>>> import os
>>> len(list(os.walk('folder_name')))
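According to the os.walk documentation, one (dirpath, dirnames, filenames) tuple is yielded per directory, so this counts every directory in the tree, including folder_name itself.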
I have a basic file/folder structure on the Desktop where the "Test" folder contains "Folder 1", which in turn contains 2 subfolders:
An "Original files" subfolder which contains shapefiles (.shp).
A "Processed files" subfolder which is empty.
I am attempting to write a script which looks into each parent folder (Folder 1, Folder 2 etc) and if it finds an Original Files subfolder, it will run a function and output the results into the Processed files subfolder.
I made a simple diagram to showcase this where if Folder 1 contains the relevant subfolders then the function will run; if Folder 2 does not contain the subfolders then it's simply ignored:
I looked into the following posts but having some trouble:
python glob issues with directory with [] in name
Getting a list of all subdirectories in the current directory
How to list all files of a directory?
The following script seems to run happily; the annoying thing is that it doesn't produce an error, so this real noob can't see where the problem is:
import os, sys
from os.path import expanduser
home = expanduser("~")
for subFolders, files in os.walk(home + "\Test\\" + "\*Original\\"):
    if filename.endswith('.shp'):
        output = home + "\Test\\" + "\*Processed\\" + filename
        # do_some_function, output
I guess you mixed something up in your os.walk()-loop.
I just created a simple structure as shown in your question and used this code to get what you're looking for:
import os

root_dir = '/path/to/your/test_dir'
original_dir = 'Original files'
processed_dir = 'Processed files'
for path, subdirs, files in os.walk(root_dir):
    if original_dir in path:
        for file in files:
            if file.endswith('shp'):
                print('original dir: \t' + path)
                print('original file: \t' + path + os.path.sep + file)
                print('processed dir: \t' + os.path.sep.join(path.split(os.path.sep)[:-1]) + os.path.sep + processed_dir)
                print('processed file: ' + os.path.sep.join(path.split(os.path.sep)[:-1]) + os.path.sep + processed_dir + os.path.sep + file)
                print('')
I'd suggest using wildcards in a directory-crawling script only if you are REALLY sure what your directory tree looks like. I'd rather use the full names of the folders to search for, as in my script.
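The same mapping can be collected into (input, output) pairs so that a processing function can be called on each one. The sketch below assumes the folder names from the question; collect_jobs is a hypothetical helper, and unlike the loop above it compares the folder name exactly instead of using a substring test:

```python
import os


def collect_jobs(root_dir, original_dir="Original files",
                 processed_dir="Processed files"):
    jobs = []
    for path, subdirs, files in os.walk(root_dir):
        # Only look inside folders named exactly like original_dir.
        if os.path.basename(path) != original_dir:
            continue
        parent = os.path.dirname(path)
        for name in files:
            if name.endswith(".shp"):
                source = os.path.join(path, name)
                target = os.path.join(parent, processed_dir, name)
                jobs.append((source, target))
    return jobs
```

Parent folders without an "Original files" subfolder never produce a pair, which matches the requirement that they are simply ignored.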
Update: Paths
Whenever you use paths, take care of your path separators - the slashes.
On Windows systems, the backslash is used for that:
C:\any\path\you\name
Most other systems use a normal, forward slash:
/the/path/you/want
In Python, a forward slash can be used directly, without any problem:
path_var = '/the/path/you/want'
...as opposed to backslashes. A backslash is a special character in Python strings. For example, it is used in the newline escape sequence: \n
To make clear that you want a literal backslash rather than a special character, you either have to "escape" it with another backslash: '\\'. That makes a Windows path look like this:
path_var = 'C:\\any\\path\\you\\name'
...or you can mark the string as a "raw" string with a preceding r. Note that escape sequences such as \n are then no longer interpreted in that string.
path_var = r'C:\any\path\you\name'
In your comment, you used the example root_dir = home + "\Test\\". Python tries to interpret the first backslash together with the following character: \T. That is not a recognized escape sequence, so Python leaves the backslash in place (recent versions warn about it), but a lowercase \t would be converted to a tab character. Relying on that is fragile and will not always resolve to the path you want to use.
Your other example, "C:\Users\me\Test\\", probably only works because \U and \m are not escape sequences in Python 2 byte strings, so the backslashes are kept; in a Python 3 string literal, \U starts a Unicode escape and "C:\Users" is a syntax error. You also mixed single and double backslashes there.
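A quick check of the two spellings, as a sketch that uses only string literals and therefore runs on any OS:

```python
escaped = 'C:\\any\\path\\you\\name'   # each backslash escaped by hand
raw = r'C:\any\path\you\name'          # raw string, backslashes kept as typed

print(escaped == raw)          # True
print(len('\t'), len(r'\t'))   # 1 2
print(repr('a\tb'))            # the escape sequence became a single tab character
```

The second line shows the difference: '\t' is one tab character, while r'\t' is a backslash followed by the letter t.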
That said...
Once you have sorted out your OS path separators and start experimenting with new paths, note that Python handles a lot of path-related things for you. For example, if your script reads a directory, as os.walk() does, the paths it returns already use the right separators (on my Windows system they are displayed as double backslashes). There's no need for me to check that; it's usually only in hardcoded strings that you have to take care.
Also: the Python os.path module provides a lot of functions to handle paths, separators and so on. For example, os.path.sep (and os.sep, too) holds the correct separator for the system Python is running on. You can also build paths using os.path.join().
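For instance, the folders from the question could be assembled like this. It is a small sketch; the folder names are taken from the question and the variable names are arbitrary:

```python
import os

# os.path.join inserts the separator of the current OS between components.
root_dir = os.path.join(os.path.expanduser("~"), "Test")
original = os.path.join(root_dir, "Folder 1", "Original files")

print(os.sep)                        # the separator Python uses on this system
print(original)
print(original.split(os.sep)[-2:])   # ['Folder 1', 'Original files']
```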
And finally: The home-directory
You use expanduser("~") to get the home path of the current user. That should work fine, but if you're using an old Python version, there could be a bug - see: expanduser("~") on Windows looks for HOME first
So check that the home path is resolved correctly, and then build your paths using the power of the os module :-)
Hope that helps!
I'm trying to use the current date as the file's name, but it seems either this can't be done or I'm doing something wrong. I used a variable as a name for a file before, but this doesn't seem to work.
This is what I tried:
import time
d = time.strftime("%d/%m/%Y")
with open(d + ".txt", "a+") as f:
    f.write("")
This is just to see if it creates the file. As you can see, I tried a+ because I read that it creates the file if it doesn't exist, and I still get the same error.
The problem is with how you're using the date:
d = time.strftime("%d/%m/%Y")
You can't have a / in a filename, because it is the directory separator: the name is treated as a path through directories that you haven't created yet. Try using hyphens instead:
d = time.strftime("%d-%m-%Y")
You almost certainly don't want to make directories in the structure day/month/year, so I assume that's not what you were intending.
You are including directory separators (/) in your filename, and those directories are not created for you when you try to open a file. There is either no 26/ directory or no 26/02/ directory in your current working path.
You'll either have to create those directories by other means, or if you didn't mean for the day and month to be directories, change your slashes to a different separator character:
d = time.strftime("%d-%m-%Y")
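Both options can be sketched as follows. The temporary directory from tempfile.mkdtemp() is an assumption made so that the demo does not write into your working directory, and the variable names are arbitrary:

```python
import os
import tempfile
import time

target_dir = tempfile.mkdtemp()  # assumption: scratch directory for the demo

# Option 1: a flat file name in day-month-year form, no path separators.
d = time.strftime("%d-%m-%Y")
file_path = os.path.join(target_dir, d + ".txt")
with open(file_path, "a+") as f:
    f.write("")
print(os.path.exists(file_path))  # True

# Option 2: if day/month directories really are wanted, create them first.
nested = os.path.join(target_dir, time.strftime("%d"),
                      time.strftime("%m"), time.strftime("%Y") + ".txt")
os.makedirs(os.path.dirname(nested), exist_ok=True)
with open(nested, "a+") as f:
    f.write("")
print(os.path.exists(nested))  # True
```

The "a+" mode creates a missing file, but it never creates missing parent directories, which is why the second option calls os.makedirs first.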