How can I scan through a directory in python? - python

I have a Python script that is trying to compare two files to each other and output the difference. However, I am not sure exactly what is going on, because when I run the script it gives me this error:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\\api\\API_TEST\\Apis.os\\*.*'
I don't know why it is appending *.* to the end of the file extension.
This is currently my function:
def CheckFilesLatest(self, previous_path, latest_path):
    for filename in os.listdir(latest_path):
        previous_filename = os.path.join(previous_path, filename)
        latest_filename = os.path.join(latest_path, filename)
        if self.IsValidOspace(latest_filename):
            for os_filename in os.listdir(latest_filename):
                name, ext = os.path.splitext(os_filename)
                if ext == ".os":
                    previous_os_filename = os.path.join(previous_filename, os_filename)
                    latest_os_filename = os.path.join(latest_filename, os_filename)
                    if os.path.isfile(latest_os_filename) == True:
                        # If the file exists in both directories, check if the files are different;
                        # otherwise mark the contents of the latest file as added.
                        if os.path.isfile(previous_os_filename) == True:
                            self.GetFeaturesModified(previous_os_filename, latest_os_filename)
                        else:
                            self.GetFeaturesAdded(latest_os_filename)
        else:
            if os.path.isdir(latest_filename):
                self.CheckFilesLatest(previous_filename, latest_filename)
Any thoughts on why it can't scan the directory and look for a .os file, for example?
It is failing on line:
for os_filename in os.listdir(latest_filename):
The code is first called from:
def main():
    for i in range(6, arg_length, 2):
        component = sys.argv[i]
        package = sys.argv[i+1]
        previous_source_dir = os.path.join(previous_path, component, package)
        latest_source_dir = os.path.join(latest_path, component, package)
        x.CheckFilesLatest(previous_source_dir, latest_source_dir)
        x.CheckFilesPrevious(previous_source_dir, latest_source_dir)
Thank you

os.listdir() requires that its argument be a directory, as you have stated. However, latest_path is being passed in as an argument, so you need to look at the code that actually creates latest_path to determine where the *.* is being appended. Since you are calling the function recursively, first check the original call (the first time it runs). It would appear that the base code that calls CheckFilesLatest() is trying to set up a search pattern to find all files within the directory 'C:\api\API_TEST\Apis.os'. You would need to split off the wildcard part first and then do the check.

If you want to browse a directory recursively, using os.walk would be better and simpler than your complex handling with recursive function calls. Take a look at the docs: http://docs.python.org/2/library/os.html#os.walk
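For example, a minimal sketch of the same comparison using os.walk, reusing the GetFeaturesModified/GetFeaturesAdded methods from the question (the IsValidOspace check is skipped here for brevity):

import os

def CheckFilesLatest(self, previous_path, latest_path):
    for dirpath, dirnames, filenames in os.walk(latest_path):
        # Map the current directory in the latest tree onto the previous tree.
        relative_dir = os.path.relpath(dirpath, latest_path)
        for os_filename in filenames:
            if os.path.splitext(os_filename)[1] != ".os":
                continue
            latest_os_filename = os.path.join(dirpath, os_filename)
            previous_os_filename = os.path.join(previous_path, relative_dir, os_filename)
            if os.path.isfile(previous_os_filename):
                self.GetFeaturesModified(previous_os_filename, latest_os_filename)
            else:
                self.GetFeaturesAdded(latest_os_filename)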

Related

Python on Linux: Iterating through a list with If Elif Elif Elif when the list is unknown

I am trying to make a Python script that automatically moves files from my internal drive to any USB drive that is plugged in. However, this destination path is unpredictable because I am not using the same USB drives every time. With the Raspbian Buster full version, the best I can do so far is automount into /media/pi/xxxxx, where that xxxxx part is unpredictable. I am trying to make my script account for that. I can get the drive mount points with
drives = os.listdir("/media/pi/")
but I am worried some will be invalid because they were not unmounted before being yanked out (I need to run this without a monitor, keyboard, VNC, or any user input other than replacing USB drives). So I think I'd need a series of try/except statements, perhaps in an if/elif/elif chain, to make sure that the destination is truly valid, but I don't know how to do that without hardcoding the names in. The only way I know how to iterate through a set of names I don't know in advance is
for candidate_drive in drives:
but I don't know how to make it move on to the next candidate drive only if the current one is throwing an exception.
System: Raspberry Pi 4, Raspbian Buster full, Python 3.7.
Side note: I am also trying this on Buster Lite with usbmount, which does have predictable mount names, but I can't get exFAT and NTFS to mount, and that is a question for another post.
Update: I was thinking about this more and maybe I need a try/except/else statement where the except is pass and the else is break? I am trying it out.
Update2: I rethought my problem and maybe instead of looking for the exception to determine when to try the next possible drive, perhaps I could instead look for a successful transfer and break the loop if so.
import os
import shutil

files_to_send = os.listdir("/home/outgoing_files/")
source_path = "/home/outgoing_files/"
possible_USB_drives = os.listdir("/media/")

for a_possible_drive in possible_USB_drives:
    try:
        destpath = os.path.join("/media/", a_possible_drive)
        for a_file in files_to_send:
            shutil.copy(source_path + a_file, destpath)
    except:
        pass  # transfer to this possible drive didn't work
    else:
        break  # stop the loop because the transfer worked!
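If the worry is stale entries left behind by drives that were yanked without being unmounted, one way to pre-filter the candidates before attempting any copy is to keep only entries that are real mount points. This is only a sketch, assuming the /media/pi automount location mentioned in the question:

import os

mount_root = "/media/pi"
candidate_drives = [
    os.path.join(mount_root, name)
    for name in os.listdir(mount_root)
    # A leftover directory from a yanked drive is not an active mount point,
    # so os.path.ismount() drops it before any copy is attempted.
    if os.path.ismount(os.path.join(mount_root, name))
]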
If you have an unknown number of directories inside a directory, and so on, you cannot use a fixed set of nested for loops. You need to implement a recursive function. If you have directories inside a directory and want to review all of them, you can do so by iterating over the list of directories with the same function that iterates over them, over and over, until it finds the file.
Let's see an example. Suppose you have a path structure like this:
path
    dir0
        dir2
            file3
    dir1
        file2
    file0
    file1
You have no way to know how many for loops are required to iterate over all the elements in this structure. You can iterate (with a for loop) over all the elements of a single directory, and then do the same with the elements inside those elements. In this structure, dirN (where N is a number) means a directory and fileN means a file.
You can use os.listdir() function to get the contents of a directory:
print(os.listdir("path/"))
returns:
['dir0', 'dir1', 'file0.txt', 'file1.txt']
Using this function with all the directories, you can get all the elements of the structure. You only need to iterate over the directories. Specifically directories, because if you use a file:
print(os.listdir("path/file0.txt"))
you get an error:
NotADirectoryError: [WinError 267]
But remember that Python has list comprehensions.
String work
If you have a main path, you need to access a directory inside it with a full string reference: "path/dirN". You cannot directly access a file or directory that is not within the scope of the .py script:
print(os.listdir("dir0/"))
gets an error
FileNotFoundError: [WinError 3]
So you always need to combine the initial main path with the current path; this way you can access all the elements of the structure.
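A minimal illustration of building that full reference before calling os.listdir() (os.path.join is one way to do it; the searchfile function shown below uses string formatting instead):

import os

mainpath = "path/"
for entry in os.listdir(mainpath):
    full = os.path.join(mainpath, entry)  # e.g. "path/dir0" instead of just "dir0"
    print(full, "is a directory" if os.path.isdir(full) else "is a file")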
Filtering
I said that you could use a list comprehension to get just the directories of the structure, recursively. Let's take a look at a function:
def searchfile(paths: list, search: str):
    for path in paths:
        print("Actual path: {}".format(path))
        contents = os.listdir(path)
        print("Contents: {}".format(contents))
        dirs = ["{}{}/".format(path, x) for x in contents if os.path.isdir("{}/{}/".format(path, x))]
        print("Directories: {} \n".format(dirs))
        searchfile(dirs, search)
In this function we get the contents of the current path with os.listdir() and then filter them with a list comprehension. Then we make the recursive call with the directories of the current path: searchfile(dirs, search).
This algorithm can be applied to any file structure, because the paths argument is a list. So you can iterate over directories that contain directories, which in turn contain more directories inside of them.
If you want to find a specific file you can use the second argument, search. You can also implement a conditional and get the specific path of the file found:
if search in contents:
    print("File found! \n")
    print("{}".format(os.path.abspath(search)))
    sys.exit()
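Putting it together, searching the structure above for file3 would start with a call like:

searchfile(["path/"], "file3")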
I hope this helps.

A way to create files and directories without overwriting

You know how when you download something and the downloads folder contains a file with the same name, instead of overwriting it or throwing an error, the file ends up with a number appended to the end? For example, if I want to download my_file.txt, but it already exists in the target folder, the new file will be named my_file(2).txt. And if I try again, it will be my_file(3).txt.
I was wondering if there is a way in Python 3.x to check that and get a unique name (not necessarily create the file or directory). I'm currently implementing it like this:
import os

def new_name(name, newseparator='_'):
    # name can be either a file or directory name
    base, extension = os.path.splitext(name)
    i = 2
    while os.path.exists(name):
        name = base + newseparator + str(i) + extension
        i += 1
    return name
In the example above, running new_name('my_file.txt') would return my_file_2.txt if my_file.txt already exists in the cwd. name can also contain a full or relative path; it will work as well.
I would use pathlib and do something along these lines:
from pathlib import Path

def new_fn(fn, sep='_'):
    p = Path(fn)
    if p.exists():
        if not p.is_file():
            raise TypeError
        np = p.resolve(strict=True)
        parent = np.parent
        extens = ''.join(np.suffixes)  # handle multiple extensions such as .tar.gz
        base = np.name.replace(extens, '')
        i = 2
        nf = parent / (base + sep + str(i) + extens)
        while nf.exists():
            i += 1
            nf = parent / (base + sep + str(i) + extens)
        return nf
    else:
        return p.parent.resolve(strict=True) / p
This only handles files as written but the same approach would work with directories (which you added later.) I will leave that as a project for the reader.
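For instance, if report.txt (a made-up name) already exists in the current directory but report_2.txt does not, a call would return something like:

>>> new_fn('report.txt')
PosixPath('/home/user/report_2.txt')   # hypothetical absolute path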
Another way of getting a new name would be using the built-in tempfile module:
from pathlib import Path
from tempfile import NamedTemporaryFile

def new_path(path: Path, new_separator='_'):
    prefix = str(path.stem) + new_separator
    dir = path.parent
    suffix = ''.join(path.suffixes)
    with NamedTemporaryFile(prefix=prefix, suffix=suffix, delete=False, dir=dir) as f:
        return f.name
If you execute this function from within Downloads directory, you will get something like:
>>> new_path(Path('my_file.txt'))
'/home/krassowski/Downloads/my_file_90_lv301.txt'
where the 90_lv301 part was generated internally by Python's tempfile module.
Note: with the delete=False argument, the function will create (and leave behind, undeleted) an empty file with the new name. If you do not want an empty file created that way, just remove delete=False; however, keeping it will prevent anyone else from creating a new file with that name before your next operation (though they could still overwrite it).
Simply put, having delete=False prevents concurrency issues if you (or the end-user) were to run your program twice at the same time.

Finding files in directories in Python

I've been doing some scripting where I need to access the OS to name images (saving every subsequent zoom of the Mandelbrot set upon clicking). I count all of the current files in the directory, then use %s to name them in the string after calling the function below, and then add an option to delete them all.
I realize the code below will always grab the absolute path of the file, but assuming we're always in the same directory, is there not a simpler way to grab the current working directory?
def count_files(self):
    count = 0
    for files in os.listdir(os.path.abspath(__file__)):
        if files.endswith(someext):
            count += 1
    return count

def delete_files(self):
    for files in os.listdir(os.path.abspath(__file__)):
        if files.endswith(someext):
            os.remove(files)
Since you're doing the .endswith thing, I think the glob module might be of some interest.
The following prints all files in the current working directory with the extension .py. Not only that, it returns only the filename, not the path, as you said you wanted:
import glob
for fn in glob.glob('*.py'): print(fn)
Output:
temp1.py
temp2.py
temp3.py
_clean.py
Edit: re-reading your question, I'm unsure of what you were really asking. If you wanted an easier way to get the current working directory than
os.path.abspath(__file__)
Then yes, os.getcwd()
But os.getcwd() will change if you change the working directory in your script (e.g. via os.chdir()), whereas your method will not.
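A quick illustration of the difference (the paths in the comments are just examples):

import os

print(os.getcwd())                                 # the process's current working directory
print(os.path.dirname(os.path.abspath(__file__)))  # the directory containing this script

os.chdir('/tmp')
print(os.getcwd())                                 # now '/tmp' -- getcwd() follows os.chdir()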
Using antipathy* it gets a little easier:
from antipathy import Path

def count_files(pattern):
    return len(Path(__file__).glob(pattern))

def delete_files(pattern):
    Path(__file__).unlink(pattern)
*Disclosure: I'm the author of antipathy.
You can use os.path.dirname(path) to get the parent directory of the thing path points to.
def count_files(self):
    count = 0
    for files in os.listdir(os.path.dirname(os.path.abspath(__file__))):
        if files.endswith(someext):
            count += 1
    return count

Prevent my file from being overwritten - python

I am currently creating a file on each run of my application using the simple method
file = open('myfile.dat', 'w+')
However, I have noticed that this is overwriting the file on each run. What I want to do is: if the file already exists, create a new file called myfilex.dat, where x is the number of previous copies of the file. Is there a quick and effective way of doing this?
Thanks :)
EDIT: I know how to check whether it already exists using the os.path.exists function, but what I am asking is: if it does exist, how can I easily append the version number on the end? Sorry if that does not make sense.
You could use a timestamp, so that each time you will execute the program it will write to a different file:
import time
file = open('myfile.%d.dat' % time.time(), 'w+')
You can do two things: either open with append, i.e. file = open('myfile.dat', 'a'), or check if the file exists and give the user the option to overwrite. Python has a number of options. You can check this question for enlightenment:
How do I check whether a file exists using Python?
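A minimal sketch of the second option, checking first and asking before overwriting (the prompt wording here is made up):

import os

filename = 'myfile.dat'
if os.path.exists(filename):
    answer = input("%s already exists -- overwrite? [y/N] " % filename)
    if answer.lower() != 'y':
        raise SystemExit("Aborted, keeping the existing file.")

with open(filename, 'w+') as f:
    f.write('data')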
Consider
import os

def build_filename(name, num=0):
    root, ext = os.path.splitext(name)
    return '%s%d%s' % (root, num, ext) if num else name

def find_next_filename(name, max_tries=20):
    if not os.path.exists(name):
        return name
    else:
        for i in range(max_tries):
            test_name = build_filename(name, i+1)
            if not os.path.exists(test_name):
                return test_name
        return None
If your filename doesn't exist, it'll return your filename.
If your filename does exist, it'll try rootX.extension, where root and extension are determined by os.path.splitext and X is an integer, starting at 1 and ending at max_tries (I had it default to 20, but you could change the default or pass a different argument).
If no file can be found, the function returns None.
Note, there are still race conditions present here (a file could be created by another process with a clashing name after your check), but it's what you said you wanted.
# When the files doesn't exist
print find_next_filename('myfile.dat') # myfile.dat
# When the file does exist
print find_next_filename('myfile.dat') # myfile1.dat
# When the file does exist, as does "1" and "2"
print find_next_filename('myfile.dat') # myfile3.dat
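On the race-condition note above: one way to narrow that window (a sketch, not part of the original answer; Python 3) is to create the file with exclusive mode 'x', so the existence check and the creation happen in a single step:

name = find_next_filename('myfile.dat')
try:
    # 'x' means exclusive creation: open() raises FileExistsError
    # if the file appeared between find_next_filename() and here.
    with open(name, 'x') as f:
        f.write('data')
except FileExistsError:
    pass  # another process grabbed the name first; pick a new name and retry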
Nothing particularly quick, but effective? Sure! I'm used to a backup system where I do:
filename.ext
filename-1.ext # older
filename-2.ext # older still
filename-3.ext # even older
This is slightly harder than what you want to do. You want filename-N.ext to be the NEWEST file! Let's use glob to see how many files match that name, then make a new one!
from glob import glob
import os.path

num_files = len(glob(os.path.join(root, head, filename + "*." + ext)))
# where:
# root = "C:\\"
# head = r"users\username\My Documents"
# filename = "myfile"
# ext = "dat"

if num_files == 0:
    num_files = ""  # handles the case where the file doesn't exist AT ALL yet

with open(os.path.join(root, head, filename + str(num_files) + "." + ext), 'w+'):
    do_stuff_to_file()
Here are a few solutions for anyone experiencing a similar problem.
Keep YOUR program from overwriting data:
with open('myfile.txt', 'a') as myfile:
    myfile.write('data')
Note: a+ (not a) opens the file for both reading and appending; plain a is append-only, with no reading.
Prevent ALL programs from overwriting your data (by setting it to read-only):
from os import chmod
from stat import S_IREAD

chmod('path_to_file', S_IREAD)
Note: both of these modules are built into Python (at least as of Python 3.10.4), so there is no need to use pip.
Note 2: Setting the file to read-only is not the best idea on its own, as programs can set it back. I would combine this with a hash and/or signature to verify that the file has not been tampered with, and otherwise 'invalidate' the data inside and require the user to re-generate the file (e.g. when storing temporary but very important data such as decryption keys after generating them and before deleting them).
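A minimal sketch of the hash part of that idea, using the standard hashlib module (the file name and where you store the expected digest are up to you):

import hashlib

def file_sha256(path):
    # Read in chunks so large files don't have to fit in memory.
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

expected = file_sha256('myfile.txt')   # record this right after writing the file
# ...later...
if file_sha256('myfile.txt') != expected:
    raise ValueError("myfile.txt has been modified -- regenerate it")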
Just check to see if your file already exists then?
name = "myfile"
extension =".dat"
x = 0
fileName = name + extension
while(!os.path.exists(fileName)):
x = x + 1
fileName = name + x + extension
file = open(fileName, 'w+')

Python automated file names

I want to automate the file name used when saving a spreadsheet with xlwt. Say there is a sub directory named Data in the folder where the Python program is running. I want the program to count the number of files in that folder (call the count n). Then the filename must end in n+1. If there are 0 files in the folder, the filename must be Trial_1.xls. This file must be saved in that sub directory.
I know the following:
import xlwt, os, os.path
n = len([name for name in os.listdir('.') if os.path.isfile(name)])
counts the number of files in the same folder.
a = n + 1
filename = "Trial_" + str(a) + ".xls"
book.save(filename)
this will save the properly named file into the same folder.
My question is how do I extend this in to a sub directory? Thanks.
In os.listdir('.'), the . points to the directory from which the file is executed. Change the . to point to the subdirectory you are interested in.
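For the Data sub directory from the question, that could look something like this, combining it with the counting and saving code above (book is the xlwt workbook from the question):

import os

data_dir = 'Data'
n = len([name for name in os.listdir(data_dir)
         if os.path.isfile(os.path.join(data_dir, name))])
filename = os.path.join(data_dir, "Trial_" + str(n + 1) + ".xls")
book.save(filename)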
You should give it the full path name from the root of your file system; otherwise it will be relative to the directory from which the script is executed. This might not be what you want, especially if you need to refer to the sub directory from another program.
You also need to provide the full path in the filename variable, which would include the sub directory.
To make life easier, just set the full path to a variable and refer to it when needed.
TARGET_DIR = '/home/me/projects/data/'
n = sum(1 for f in os.listdir(TARGET_DIR) if os.path.isfile(os.path.join(TARGET_DIR, f)))
new_name = "{}Trial_{}.xls".format(TARGET_DIR,n+1)
You actually want glob:
from glob import glob
DIR = 'some/where/'
existing_files = glob(DIR + '*.xls')
filename = DIR + 'stuff--%d--stuff.xls' % (len(existing_files) + 1)
Since you said Burhan Khalid's answer "Works perfectly!" you should accept it.
I just wanted to point out a different way to compute the number. The way you are doing it works, but if we imagine you were counting grains of sand or something, it would use way too much memory. Here is a more direct way to get the count:
n = sum(1 for name in os.listdir('.') if os.path.isfile(name))
For every qualifying name, we get a 1, and all these 1's get fed into sum() and you get your count.
Note that this code uses a "generator expression" instead of a list comprehension. Instead of building a list, taking its length, and then discarding the list, the above code just makes an iterator that sum() iterates to compute the count.
It's a bit sleazy, but there is a shortcut we can use: sum() will accept boolean values, and will treat True as a 1, and False as a 0. We can sum these.
# sum will treat Boolean True as a 1, False as a 0
n = sum(os.path.isfile(name) for name in os.listdir('.'))
This is sufficiently tricky that I probably would not use this without putting a comment. But I believe this is the fastest, most efficient way to count things in Python.
