I have six directories that follow the format
\home\mydir\myproject\2012-01-23_03-01-34
\home\mydir\myproject\2012-01-11_01-00-57
\home\mydir\myproject\2010-01-11_01-00-57
\home\mydir\myproject\2010-01-11_01-00-54
\home\mydir\myproject\2010-01-08_01-00-54
Note the datetime as the final directory name. It is exactly this format, and it indicates the time the directory was created. They all contain a file named myfile.xml, and I want to parse the latest and greatest myfile.xml. Does Python have any magic that can tell the latest (i.e. most up-to-date) directory from the directory name format I am using? If not, does it have any magic that can tell from the file timestamps which one is the most up to date? The OS is Windows.
Another way of looking at this is that the most up to date directory will also have the highest number.
Thanks.
If you have those directory names in a list dirs, then max(dirs) will give you the latest.
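A minimal sketch, assuming the timestamped folders all sit directly under one parent directory (the `base` path and the `latest_dir` name are hypothetical). Because the names use zero-padded YYYY-MM-DD_HH-MM-SS fields, plain string comparison gives the same order as chronological order, so max() over the names is enough:

```python
import os

def latest_dir(base):
    """Return the full path of the subdirectory of `base` with the greatest name.

    Assumes every subdirectory is named like 2012-01-23_03-01-34; with
    zero-padded fields, lexicographic order equals chronological order.
    """
    # Keep only directories; stray files in `base` are ignored.
    dirs = [d for d in os.listdir(base) if os.path.isdir(os.path.join(base, d))]
    if not dirs:
        raise FileNotFoundError("no subdirectories in %r" % base)
    return os.path.join(base, max(dirs))

# Hypothetical usage:
# xml_path = os.path.join(latest_dir(r"C:\home\mydir\myproject"), "myfile.xml")
```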
For getting OS information about the age of the files, see http://docs.python.org/release/2.5.2/lib/module-stat.html. If you really need the "most up to date" file, and there's a chance the files in the directories could be modified (and so be more up to date than files in directories with later names), going by what the OS reports is more robust. If only the creation time encoded in the folder name is relevant, then @Greg Hewgill has you covered.
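If you would rather trust the OS timestamps, a sketch along these lines picks the myfile.xml with the newest modification time. It assumes each subdirectory of a hypothetical `base` holds one myfile.xml, and `newest_file` is an illustrative name. On Windows, os.path.getctime() reports creation time, while os.path.getmtime() reports the last modification, which is usually what "most up to date" means:

```python
import glob
import os

def newest_file(base, filename="myfile.xml"):
    """Return the path to the copy of `filename` with the latest modification time.

    Looks exactly one level down: base/<any subdirectory>/filename.
    """
    # glob.escape protects any [, ] or * characters that happen to be in `base`.
    candidates = glob.glob(os.path.join(glob.escape(base), "*", filename))
    if not candidates:
        raise FileNotFoundError("no %s found under %r" % (filename, base))
    return max(candidates, key=os.path.getmtime)
```

Unlike max() over the folder names, this can pick a file in an older-named folder if that file was edited more recently.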
I am in the process of making an automated script. Basically, each month I download some shapefiles, unzip them and then make a few changes. Each month I download the same dataset.
The problem is that the dataset's path changes each month after I download it, and I'm not sure how I can point the script to it if the name changes. I don't really want to have to update the script with the new file path each month.
For example, November was:
L:\load\Ten\Nov20\NSW\FME_68185551_1604301077137_7108\GSNSWDataset
And December is:
L:\load\Ten\Dec20\NSW\FME_68185551_1606880934716_1252\GSNSWDataset
You could use glob with a wildcard in the changing number section. Something like:
import glob
from datetime import datetime  # import the class, so datetime.today() works
d = datetime.today().strftime('%b%y')  # e.g. 'Dec20'
fil = glob.glob("L:/load/Ten/%s/NSW/FME*/GSNSWDataset" % d)[0]  # IndexError if nothing matches
This should give you the correct path, and then you can read or manipulate the files however you need.
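If the wildcard could ever match more than one FME_* folder, or none at all, indexing with [0] either picks an arbitrary match or raises IndexError. A slightly more defensive sketch, assuming the same L:/load/Ten/<Mon><YY>/NSW/FME*/GSNSWDataset layout (the `find_dataset` name and its parameters are hypothetical):

```python
import glob
import os
from datetime import datetime

def find_dataset(root="L:/load/Ten", when=None):
    """Locate this month's GSNSWDataset folder, whatever its FME_* parent is called."""
    when = when or datetime.today()
    month_dir = when.strftime("%b%y")  # e.g. 'Dec20'; month abbreviations depend on the locale
    pattern = os.path.join(glob.escape(root), month_dir, "NSW", "FME*", "GSNSWDataset")
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError("nothing matches " + pattern)
    # If several downloads exist for the month, take the most recently modified one.
    return max(matches, key=os.path.getmtime)
```

Passing `when` explicitly also makes it easy to rerun the script for a previous month.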
I have a directory in which a new subfolder is created each day, and each subfolder's name always starts with the date it was created (i.e. MMDDYY). I need to prompt the user for the date of the file they need (something they'd already have) and search for a subfolder whose name has a matching prefix. The rest of the folder name can be ignored.
If a folder with the correct prefix is found, there will be a similar prompt to locate files in that folder whose names start with a 5-digit number that the user would also have. Those files just need to be copied to a new location. I'm stuck on how to locate a subfolder when I only have the prefix of the folder name, and likewise for the files inside that folder once it's found.
For example, I'm looking for a file that was generated on 1/10/2019, and the file name starts with 42333. The full folder name would be something like 01102019CHA71H2HBMNN. Two files would be found: one with a full file name like 42333aaabc.xrf and the other named 42333aaabc with no file extension. These file names could exist in several other folders, but usually I need them for specific dates.
If I understand correctly, you need an algorithm whose input is a prefix (a string).
In Python you can make "membership" tests with strings, for example:
>>> string = "A long string"
>>> "long" in string
True
Your algorithm would work with something like:
if prefix in name:  # prefix and name are strings; name is the directory or file name
    ...             # do something
But if your question is how to list the files inside a directory, you can do that with either of two libraries:
os
subprocess (by calling "ls" in Linux or "dir" in Windows)
You could also use the re library, which handles regular expressions. It's a bit more complex but far more flexible.
Good source for debugging RegEx: https://regexr.com/
For learning RegEx in Python: https://www.w3schools.com/python/python_regex.asp
Best wishes, pal
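A sketch of the whole task, under stated assumptions: the daily folders sit directly under a hypothetical `root`, the user types the date exactly as it appears at the start of the folder name (e.g. 01102019), and matching files are copied with shutil.copy2(). It uses str.startswith() instead of the `in` test, because `in` would also match the number anywhere in the middle of a name. The function name and paths are illustrative:

```python
import os
import shutil

def copy_matching_files(root, date_prefix, file_prefix, dest):
    """Copy files starting with `file_prefix` from folders starting with `date_prefix`.

    Returns the list of destination paths that were written.
    """
    os.makedirs(dest, exist_ok=True)
    copied = []
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        # Only look inside directories whose name begins with the date.
        if not (folder.startswith(date_prefix) and os.path.isdir(folder_path)):
            continue
        for name in sorted(os.listdir(folder_path)):
            src = os.path.join(folder_path, name)
            # Match both '42333aaabc.xrf' and the extension-less '42333aaabc'.
            if name.startswith(file_prefix) and os.path.isfile(src):
                copied.append(shutil.copy2(src, os.path.join(dest, name)))
    return copied

# Hypothetical usage:
# date_prefix = input("Date (MMDDYYYY): ")
# file_prefix = input("5-digit file number: ")
# copy_matching_files(r"C:\daily", date_prefix, file_prefix, r"C:\collected")
```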
I am working through the renameDates project in Automate the Boring Stuff. It's supposed to match dates formatted the American way and change them to the European format.
The thing I don't understand about this code is, how does it find the correct directory?
I can't find any code that sets the current working directory to the folder I need it to work in; the script seems to assume that the default current working directory is where it should work.
Is it something as simple as running the script from the folder I want to search with regular expressions, which makes Python use that folder as the CWD?
#! python3
# renameDates.py - Renames filenames with American MM-DD-YYYY date format
# to European DD-MM-YYYY.
import shutil, os, re
# Create a regex that matches files with the American date format.
datePattern = re.compile(r"""^(.*?) # all text before the date
    ((0|1)?\d)-                     # one or two digits for the month
    ((0|1|2|3)?\d)-                 # one or two digits for the day
    ((19|20)\d\d)                   # four digits for the year
    (.*?)$                          # all text after the date
    """, re.VERBOSE)
# Loop over the files in the working directory.
for amerFilename in os.listdir('.'):
    mo = datePattern.search(amerFilename)
    # Skip files without a date.
    if mo is None:
        continue
    # Get the different parts of the filename.
    beforePart = mo.group(1)
    monthPart = mo.group(2)
    dayPart = mo.group(4)
    yearPart = mo.group(6)
    afterPart = mo.group(8)
    # Form the European-style filename.
    euroFilename = beforePart + dayPart + '-' + monthPart + '-' + yearPart + afterPart
    # Get the full, absolute file paths.
    absWorkingDir = os.path.abspath('.')
    amerFilename = os.path.join(absWorkingDir, amerFilename)
    euroFilename = os.path.join(absWorkingDir, euroFilename)
    # Rename the files.
    print('Renaming "%s" to "%s"...' % (amerFilename, euroFilename))
    #shutil.move(amerFilename, euroFilename)  # uncomment after testing
Do you run your code from a terminal or from the interpreter? The script uses the current working directory, which is normally the directory you were in when you started the script or the Python interpreter. You can check your current working directory with this code:
import os
print(os.getcwd())
You can change the working directory with:
os.chdir(path)
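To test it, create a .txt file whose name contains an American-style date, such as 12-25-2011.txt, in your working directory. When you run the program you'll see it report the new name (25-12-2011.txt), and once the shutil.move() line is uncommented the file will actually be renamed.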
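To make the dependence on the working directory explicit, it can help to pull the renaming logic into functions that take the directory as an argument. This is a sketch, not the book's code: `to_european` and `plan_renames` are hypothetical names, and the regex is the same one used in the script. Nothing is renamed; it only returns the planned (old, new) pairs:

```python
import os
import re

DATE_PATTERN = re.compile(r"""^(.*?) # all text before the date
    ((0|1)?\d)-                      # one or two digits for the month
    ((0|1|2|3)?\d)-                  # one or two digits for the day
    ((19|20)\d\d)                    # four digits for the year
    (.*?)$                           # all text after the date
    """, re.VERBOSE)

def to_european(filename):
    """Return the DD-MM-YYYY version of `filename`, or None if it has no US-style date."""
    mo = DATE_PATTERN.search(filename)
    if mo is None:
        return None
    before, month, day, year, after = mo.group(1, 2, 4, 6, 8)
    return before + day + "-" + month + "-" + year + after

def plan_renames(directory="."):
    """Return (old_path, new_path) pairs for `directory`; '.' means the current working directory."""
    directory = os.path.abspath(directory)
    plans = []
    for name in os.listdir(directory):
        new_name = to_european(name)
        if new_name is not None:
            plans.append((os.path.join(directory, name), os.path.join(directory, new_name)))
    return plans
```

Calling plan_renames() with no argument reproduces the script's behaviour of working in '.', while passing a path makes the result independent of where the script was launched.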
I am using Python 3.5 to analyze data contained in csv files. These files are contained in a "figs" directory, which is contained in a case directory, which is contained in an overall data directory, e.g.:
/strm1/serino/DATA/06052009/figs
Or more generally:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs
The directory I am starting in is '/strm1/serino/DATA/', and each subdirectory is named after the month, day, and year of a case I am working with. Each subdirectory contains another subdirectory named 'figs', which holds that case's csv file. To be exact:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs/case_date_in_MMDDYYYY.csv
So, I would like to start in my DATA directory and go through its subdirectories to find those that use the MMDDYYYY naming. However, some case directories may have a state abbreviation at the end, like '06052009_TX'. Therefore, instead of matching the MMDDYYYY naming exactly, the check could be as simple as verifying that the directory name contains a digit.
Once I am in the first subdirectory (the case directory) I would like to move into the 'figs' subdirectory. Once there, I want to access the csv file with the same naming convention as the first subdirectory (the case directory). I will fill existing arrays with the data contained in each csv file.
Basically, my question concerns navigating through multiple subdirectories that match a certain naming convention and ultimately accessing the data file at the "end." I was naively playing around with glob, fnmatch, os.listdir, and os.walk, but I could not get anything close enough to working that I feel would be helpful to include. I am not very familiar with those modules. What I can include is what I am going for:
for dirs in data_dir that contain a number:
    go into this directory
    go into 'figs' directory
    read data from the csv file whose name matches its case directory name (or whose name format matches the case directory name format)
I have come across related questions, but I have not been able to apply their answers in the way that I would like, especially with nested directories. I really appreciate the help, and let me know if I need to clarify anything.
The following should get you going. It uses the datetime.strptime() function to attempt to convert each folder name into a valid datetime object. If the conversion fails, then you know that the folder name is not in the correct format and can be skipped. It then attempts to parse any CSV file found in the corresponding figs folder:
from datetime import datetime
import glob
import csv
import os
dirpath, dirnames, filenames = next(os.walk('/strm1/serino/DATA'))
for dirname in dirnames:
    if len(dirname) < 8:
        continue
    try:
        # Only the first 8 characters must be a valid MMDDYYYY date, so '06052009_TX' also qualifies.
        dt = datetime.strptime(dirname[:8], '%m%d%Y')
    except ValueError:
        continue  # not a case directory
    print(dt, dirname)
    csv_folder = os.path.join(dirpath, dirname)
    for csv_file in glob.glob(os.path.join(csv_folder, 'figs', '*.csv')):
        with open(csv_file, newline='') as f_input:
            csv_input = csv.reader(f_input)
            for row in csv_input:
                print(row)
You listed several problems above. Which one are you stuck on? It seems like you already know how to navigate the file system using os.path. You may not know about os.path.join(), which lets you build a file path relative to another file, like this:
os.path.abspath(os.path.join(os.path.dirname(__file__), '../..', 'Data/TrailShelters/'))
To break down the above:
os.path.dirname(__file__) returns the directory that contains the current file. '../..' means: go up two levels in the folder hierarchy. Data/TrailShelters/ is the directory I wish to navigate to, and os.path.abspath() turns the combined path into a normalized absolute path.
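A quick way to see what each piece does is to evaluate the expression on a made-up path. This uses posixpath (the POSIX flavour of os.path) so the result is the same on every OS; /home/me/project/scripts/app.py is purely hypothetical and stands in for __file__:

```python
import posixpath

fake_file = "/home/me/project/scripts/app.py"  # hypothetical stand-in for __file__
here = posixpath.dirname(fake_file)             # the folder containing the file
joined = posixpath.join(here, "../..", "Data/TrailShelters/")
print(here)                       # /home/me/project/scripts
print(joined)                     # /home/me/project/scripts/../../Data/TrailShelters/
print(posixpath.abspath(joined))  # /home/me/Data/TrailShelters
```

The '..' segments are only resolved by abspath() (which normalizes the path); join() on its own just glues the pieces together.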
How does this apply to your particular case? You will need to make some adaptations, but you can store the path of the parent directory in a variable. Then loop over its subdirectories (for example, over the names returned by os.listdir()). For every subdirectory, examine its name and extract the part you are interested in. Then you can simply use something like if 'TN' in subdirectory_name to decide whether it is a subdirectory you care about. If so, build its full path by joining the saved parent path with the subdirectory name. Does that make any sense?
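The loop described above can be sketched with os.scandir(). This assumes the layout from the question (DATA/<case>/figs/<case>.csv) and treats any directory whose name contains a digit as a case directory; `find_case_csvs` and `read_rows` are hypothetical helper names, and a case folder whose CSV is named differently (e.g. '06052009_TX' holding '06052009.csv') would be skipped:

```python
import csv
import os

def find_case_csvs(data_dir):
    """Yield (case_name, csv_path) for every DATA/<case>/figs/<case>.csv that exists."""
    with os.scandir(data_dir) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            # Case directories are recognised by containing at least one digit.
            if not entry.is_dir() or not any(ch.isdigit() for ch in entry.name):
                continue
            csv_path = os.path.join(entry.path, "figs", entry.name + ".csv")
            if os.path.isfile(csv_path):
                yield entry.name, csv_path

def read_rows(csv_path):
    """Return all rows of a CSV file as lists of strings."""
    with open(csv_path, newline="") as f:
        return list(csv.reader(f))

# Hypothetical usage:
# for case, path in find_case_csvs("/strm1/serino/DATA"):
#     rows = read_rows(path)
```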
I have some homework that I am trying to complete. I don't want the answer; I'm just having trouble getting started. The work I have tried is not working at all... Can someone please just provide a push in the right direction? I am trying to learn, but after trying and trying I need some help.
I know I can use os.path.basename() to get the base name and then add it to the file name, but I can't put it together.
Here is the assignment
In this project, write a function that takes a directory path and creates an archive of the directory only. For example, if the same path were used as in the example ("c:\\xxxx\\Archives\\archive_me"), the zipfile would contain archive_me\\groucho, archive_me\\harpo and archive_me\\chico.
The base directory (archive_me in the example above) is the final element of the input, and all paths recorded in the zipfile should start with the base directory.
If the directory contains sub-directories, the sub-directory names and any files in the sub-directories should not be included. (Hint: You can use isfile() to determine if a filename represents a regular file and not a directory.)
Thanks again; any direction would be great.
It would help to know what you tried yourself, so I'm only giving a few pointers to methods in the standard libraries:
os.listdir to get a list of files and folders under a given directory (beware, it returns only the file/folder name, not the full path!)
os.path.isfile as mentioned in the assignment to check if a given path represents a file or a folder
os.path.isdir, the opposite of os.path.isfile (thanks inspectorG4adget)
os.path.join to join a filename with the basedir without having to worry about slashes and delimiters
ZipFile for handling, well, zip files
ZipFile.write to write the files found to the zip
I'm not sure you'll need all of those, but it doesn't hurt knowing they exist.
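One detail that often trips people up is the path stored inside the archive. ZipFile.write() accepts an optional arcname argument that sets it. This small sketch deliberately stops short of the full assignment: it adds a single file so that its entry starts with the base directory name (the `add_with_base` name and the file names are hypothetical):

```python
import os
import zipfile

def add_with_base(zip_path, file_path):
    """Add file_path to zip_path, stored as <parent dir name>/<file name>."""
    base = os.path.basename(os.path.dirname(file_path))  # e.g. 'archive_me'
    arcname = os.path.join(base, os.path.basename(file_path))
    # Mode 'a' appends to an existing archive or creates a new one.
    with zipfile.ZipFile(zip_path, "a") as zf:
        zf.write(file_path, arcname=arcname)
    return arcname
```

Inside the zip, entry names always use forward slashes, so on Windows archive_me\groucho is stored as archive_me/groucho.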