Organising files by timestamp in Python

I have a task where I need to sort files into groups based on the timestamps in their names.
Suppose I have files named like "2018/12/12 11:32:34 xyz.txt".
Using Python, I need to extract the timestamp from each file name, create directories first per year and then per month, and finally place each file inside the directory for its year and month.
For example, the file above should end up at a path like
Files/2018/December/file.txt
I just need help with which libraries to use and how to approach the problem.

So, you want to copy files with timestamps in their names into directories. Our input data:
Directory with the timestamped files
Directory where we need to save the files
We ask the user to input those two directories. After that, we open directory 1 (the one with the timestamped files) and process the files one by one.
We can get all filenames in the directory, iterate over them like a list, and parse the date in each filename. We take a filename and split it on spaces; the date is now in the first list element and the time in the second.
Next we use the datetime library to turn the date and time into a datetime object. From that we can easily get the year, month, day, etc.
Then we check the year and month and build the folder path for that year and month. If the folder does not exist, we create it. After that, we use copy to copy the file into that folder.
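The steps above can be sketched as a single helper that maps a file name to its destination path. This is only a minimal sketch, assuming names of the form "2018-12-12 11.32.34 xyz.txt" (the slashes and colons shown in the question are not valid in file names on most systems). The function name destination_for and the "Files" root folder are hypothetical, chosen for illustration.

```python
from datetime import datetime
from pathlib import Path


def destination_for(filename: str, dest_root: str) -> Path:
    """Return <dest_root>/Files/<year>/<month name>/<original name>."""
    # Assumed layout: "<date> <time> <original name>", separated by single spaces.
    date_part, time_part, original_name = filename.split(" ", 2)
    stamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H.%M.%S")
    # %B is the full month name in the current locale ("December" by default).
    month = stamp.strftime("%B")
    return Path(dest_root) / "Files" / str(stamp.year) / month / original_name


# Only the path is computed here; no file or folder is touched.
target = destination_for("2018-12-12 11.32.34 xyz.txt", "out")
print(target)
```

To actually place a file, one could create the folder with target.parent.mkdir(parents=True, exist_ok=True) and then copy with shutil.copy2.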
EDIT. My solution:
import glob
import os
from datetime import datetime
from shutil import copyfile
def getListOfFilenamesInFolder(src):
    # os.path.join / os.path.basename avoid hard-coding the Windows "\\" separator
    filenamesWithPath = glob.glob(os.path.join(src, "*.txt"))
    filenames = [os.path.basename(filename) for filename in filenamesWithPath]
    return filenames
def parseFilename(filename):
    # expected name layout: "<date> <time> <original name>"
    splittedFilename = filename.split(' ')
    dateFilename = splittedFilename[0]
    timeFilename = splittedFilename[1]
    datetimeFilename = datetime.strptime(f'{dateFilename} {timeFilename}', '%Y-%m-%d %H.%M.%S')
    return datetimeFilename
def createFolderIfNotExist(dest, datetimeFilename):
    # build <dest>/File/<year>/<month>; "%b" gives "Dec", use "%B" for "December"
    path = os.path.join(dest, 'File', str(datetimeFilename.year), datetimeFilename.strftime("%b"))
    os.makedirs(path, exist_ok=True)  # creates missing parent folders too
    return path
def makeSortingByFilenames(src, dest):
    listOfFilenames = getListOfFilenamesInFolder(src)
    print(listOfFilenames)
    for filename in listOfFilenames:
        datetimeFilename = parseFilename(filename)
        path = createFolderIfNotExist(dest, datetimeFilename)
        # drop the date and time fields, keep the original file name
        copyfile(os.path.join(src, filename), os.path.join(path, ' '.join(filename.split(' ')[2:])))
if __name__ == '__main__':
    srcDirectory = input()
    destDirectory = input()
    makeSortingByFilenames(srcDirectory, destDirectory)
This matches my folder structure. If your filenames look different, change the format string in datetimeFilename = datetime.strptime(f'{dateFilename} {timeFilename}', '%Y-%m-%d %H.%M.%S') to match them.
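If several filename layouts occur in the same folder, one option is to try a list of format strings in turn. The sketch below is an illustration only: the formats in KNOWN_FORMATS and the helper name parse_timestamp are assumptions, not part of the solution above, and should be replaced by the layouts you actually have.

```python
from datetime import datetime

# Hypothetical layouts; list the ones that really occur in your folder.
KNOWN_FORMATS = (
    "%Y-%m-%d %H.%M.%S",  # 2018-12-12 11.32.34 xyz.txt
    "%Y%m%d %H%M%S",      # 20181212 113234 xyz.txt
    "%d.%m.%Y %H-%M-%S",  # 12.12.2018 11-32-34 xyz.txt
)


def parse_timestamp(filename, formats=KNOWN_FORMATS):
    """Return the datetime in the first two space-separated fields, or None."""
    fields = filename.split(" ")
    if len(fields) < 3:
        return None  # name does not follow "<date> <time> <name>"
    text = f"{fields[0]} {fields[1]}"
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not fit, try the next one
    return None
```

Returning None instead of raising lets the caller skip or log files that do not follow any known layout.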

Related

Create folder by files year

I have a lot of pictures in a folder whose file names follow a pattern; they differ only in the file type, which may be .jpg or .jpeg.
For instance:
IMG-20211127-WA0027.jpg
IMG-20211127-WA0028.jpeg
IMG-20211127-WA0029.jpg
I'm trying to find a way to create a folder for each year and move the pictures into the respective folder, given that each file name already contains its year.
How can I create a folder for each year and move the files into the right one?
I tried to adapt code from a tutorial, but I'm not getting what I need.
Please see my code below:
from distutils import extension
import os
import shutil
path = "D:\WhatsApp Images"
files = os.listdir(path)
year = os.path.getmtime(path)
for file in files:
    filename, extension = os.path.splitext(file)
    extension = extension[1:]
    if os.path.exists(path+'/'+extension):
        shutil.move(path+'/'+file, path+'/'+extension+'/'+file)
    else:
        os.makedirs(path+'/'+extension)
        shutil.move(path+'/'+file,path+'/'+extension+'/'+file)
You can try something like this: See the inline comments for an explanation.
from pathlib import Path
import shutil
import os
path = Path("D:\\WhatsApp Images") # path to images
for item in path.iterdir(): # iterate through images
    if item.suffix.lower() not in [".jpg", ".jpeg"]: # ensure each file is a jpeg
        continue
    parts = item.name.split("-")
    if len(parts) > 1 and len(parts[1]) > 5:
        year = parts[1][:4] # extract year from filename
    else:
        continue
    if not os.path.exists(path / year): # check if directory already exists
        os.mkdir(path / year) # if not create the directory
    shutil.move(item, path / year / item.name) # move the file to the directory
I like @alexpdev's answer, but you can do this all within pathlib alone:
from pathlib import Path
path_to_your_images = "D:\\WhatsApp Images"
img_types = [".jpg", ".jpeg"] # I'm assuming that all your images are jpegs. Extend this list if not.
for f in Path(path_to_your_images).iterdir():
    if not f.suffix.lower() in img_types:
        # only deal with image files
        continue
    year = f.stem.split("-")[1][:4]
    yearpath = Path(path_to_your_images) / year # create intended path
    yearpath.mkdir(exist_ok = True) # make sure the dir exists; create it if it doesn't
    f.rename(yearpath / f.name) # move the file to the new location
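Both answers assume every image name splits cleanly on "-". A slightly stricter variant is to validate the whole name with a regular expression first, so that stray files are skipped instead of raising an IndexError. This is a sketch under the assumption that all names look like IMG-YYYYMMDD-WAnnnn.jpg or .jpeg; NAME_PATTERN, year_from_name and sort_by_year are hypothetical names.

```python
import re
from pathlib import Path

# Assumed WhatsApp-style pattern: IMG-YYYYMMDD-WAnnnn.jpg / .jpeg
NAME_PATTERN = re.compile(r"IMG-(\d{4})\d{4}-WA\d+\.jpe?g", re.IGNORECASE)


def year_from_name(name):
    """Return the four-digit year embedded in the name, or None if it does not fit."""
    match = NAME_PATTERN.fullmatch(name)
    return match.group(1) if match else None


def sort_by_year(folder):
    """Move every matching image in `folder` into a sub-folder named after its year."""
    folder = Path(folder)
    # list() takes a snapshot, so folders created below are not revisited
    for item in list(folder.iterdir()):
        year = year_from_name(item.name) if item.is_file() else None
        if year is None:
            continue  # leave unrelated files and folders alone
        year_dir = folder / year
        year_dir.mkdir(exist_ok=True)
        item.rename(year_dir / item.name)
```

Calling sort_by_year("D:\\WhatsApp Images") would then process the folder from the question.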

Is there a way to loop through folder structure to find file name?

I want to run a script that looks for a file whose name contains fixed text and a date that is 3 days later, and returns a yes/no response based on what it finds. I want to call a PowerShell script that does this from a master Python script. Basically, the script should look in a subfolder called "PACP" for a file named, for example, test_%date%_deliverable.mdb and, if the name is misspelled, return a line noting the error. Are there any examples of scripts like this?
You don't have to do this by hand with Python. You can use the glob module like this:
import glob
date = "2021-01-01"
for f in glob.glob(f"PACP/**/test_{date}_deliverable.mdb", recursive=True):
    # "**" only descends into sub-directories when recursive=True is passed
    # do something with the file matching the pattern
    print(f)
Depending what you want to do with the files you can also use the pathlib module. The Path objects also support the same syntax with their glob() method.
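A pathlib version of the same search might look like the sketch below. The helper name find_deliverables is hypothetical; the folder name and file name pattern are taken from the question.

```python
from pathlib import Path


def find_deliverables(root, date):
    """Return all files named test_<date>_deliverable.mdb at or below `root`."""
    # In pathlib patterns, "**" means this directory and all sub-directories.
    return sorted(Path(root).glob(f"**/test_{date}_deliverable.mdb"))


matches = find_deliverables("PACP", "2021-01-01")
print("yes" if matches else "no")
```

The returned Path objects can be passed on directly, for example to open() or shutil.copy2.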
Something like that?
import os
import sys
from datetime import date
# get the directory containing this script
dir_path = os.path.dirname(os.path.realpath(__file__))
# get current date
today = date.today()
formattedDate = today.strftime("%d-%m-%Y")
# search for file
for root, dirs, files in os.walk(os.path.join(dir_path, "PACP")):
    for file in files:
        if file == "test_" + formattedDate + "_deliverable.mdb":
            print("found file")
            sys.exit()
print("file not found")
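The question asks for a date three days later, which timedelta can compute. A minimal sketch, assuming the dd-mm-yyyy layout used in the answer above; expected_name is a hypothetical helper:

```python
from datetime import date, timedelta


def expected_name(base_date, days_ahead=3):
    """Build the file name expected for `days_ahead` days after `base_date`."""
    target = base_date + timedelta(days=days_ahead)
    # Assumed date layout (dd-mm-yyyy), the same as in the answer above.
    return f"test_{target.strftime('%d-%m-%Y')}_deliverable.mdb"


# Month and year rollovers are handled by the date arithmetic.
print(expected_name(date(2021, 1, 29)))
```

In the search loop, the comparison would then be file == expected_name(date.today()).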
This is for checking for misspelled file names in a directory:
from pathlib import Path
import re
for path in Path('PACP').rglob('*.mdb'):
    m = re.match(r"(test)_(\d{6})_(deliverable)", path.name)
    if m is None:
        print(path.name)
For example, using a list of file names from a directory (the entries marked with # are misspelled):
files=[
'test_210127_deliverable.mdb',
'ttes_210127_derivrablle.mdb', #
'tset_2101327_deliveraxxle.mdb',#
'test_210128_deliverable.mdb',
'test_210127_seliverable.mdb',#
'test_2101324_deliverable.mdb']#
for s in files:
    m = re.match(r"(test)_(\d{6})_(deliverable)",s)
    if m is None: print(s)
misspelled output:
ttes_210127_derivrablle.mdb
tset_2101327_deliveraxxle.mdb
test_210127_seliverable.mdb
test_2101324_deliverable.mdb
Or, with much more precise date matching (yyyy mm dd) and a choice of three consistent separators: _, - or .
files=[
'test_2021-01-27_deliverable.mdb',
'test_2021.01.27_deliverable.mdb',
'test_2021_01_27_deliverable.mdb',
'test_2021-21-27_deliverable.mdb',
'test_2021-01-72_deliverable.mdb',
'tets_2021-01-27_deliverable.mdb',
'test_2021-01-27_delivvrable.mdb']
for s in files:
    m = re.match(r"(test)_(19|20)\d\d([-._])(0[1-9]|1[012])\3(0[1-9]|[12][0-9]|3[01])_(deliverable)",s)
    if m is None: print(s)
misspelled output (more precise in date matching)
test_2021-21-27_deliverable.mdb
test_2021-01-72_deliverable.mdb
tets_2021-01-27_deliverable.mdb
test_2021-01-27_delivvrable.mdb
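The regular expression above still accepts impossible dates such as 2021-02-30, because it checks the day range independently of the month. One way to close that gap is to let datetime.strptime validate the captured date. This is a sketch that assumes the single layout test_YYYY-MM-DD_deliverable.mdb; NAME_RE and is_valid_name are hypothetical names.

```python
import re
from datetime import datetime

# Assumed naming scheme: test_YYYY-MM-DD_deliverable.mdb
NAME_RE = re.compile(r"test_(\d{4}-\d{2}-\d{2})_deliverable\.mdb")


def is_valid_name(name):
    """True if the name fits the scheme and the embedded date really exists."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        return False  # wrong prefix, suffix or date layout
    try:
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False  # the digits do not form a real calendar date
    return True
```

Using fullmatch instead of match also rejects names with trailing text after .mdb.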

A purge script - saving and deleting files according to their date

I'm working on a purge script in Python. The following script works fine. It walks a list of directories and, based on each file's date, removes or keeps the file: any file older than X days or minutes is removed.
purge_files_path = ['x:/path/destination', 'x:/path/destination', ]
kwargs = {"weeks":0, "days": 0, "hours": 0, "minutes": 23}
def purge(purge_files_path, kwargs):
    import os
    from datetime import datetime, timedelta
    for folder in purge_files_path:
        for(dir, _, files) in os.walk(folder):
            for file in files:
                path = os.path.join(dir, file)
                if os.path.exists(path):
                    file_date = datetime.fromtimestamp(os.path.getmtime(path))
                    delta = datetime.now() - file_date
                    if delta >= timedelta(**kwargs):
                        os.remove(path)
Now I want to add one new feature: before removing a file, I want to save it in another folder. That would be easy, but I also want to preserve the folder structure of each file.
So if a file that has to be purged is located at:
c:/source/folder1/folder2/file.txt
it has to be backed up to:
c:/backup/folder1/folder2/file.txt
I could use shutil.copytree, but I get an error about folders that already exist.
Yes, just use os.rename instead of os.remove. You might want to use relpath to preserve the directory structure. Just replace your call to os.remove(path) with the following snippet (backup_dir is the root of the backup tree, for example c:/backup).
relative_path = os.path.relpath(path, folder)
backup_path = os.path.join(backup_dir, relative_path)
os.makedirs(os.path.dirname(backup_path), exist_ok=True)
os.rename(path, backup_path)
Note: If the directories are on different volumes, you need to use shutil.move instead of os.rename.
To get Python 2 support, you need to drop the exist_ok=True, like so:
if not os.path.isdir(os.path.dirname(backup_path)):
    os.makedirs(os.path.dirname(backup_path))
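Put together, the purge with backup could look like the sketch below. It is an illustration rather than a drop-in replacement: purge_with_backup and its parameters are hypothetical names, it uses shutil.move so that it also works across volumes, and it assumes backup_dir is not located inside folder.

```python
import os
import shutil
import time
from datetime import timedelta


def purge_with_backup(folder, backup_dir, max_age):
    """Move files older than `max_age` from `folder` to `backup_dir`, keeping sub-folders."""
    cutoff = time.time() - max_age.total_seconds()
    moved = []
    for dirpath, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > cutoff:
                continue  # recent enough, keep it in place
            # Recreate the file's position relative to `folder` under `backup_dir`.
            backup_path = os.path.join(backup_dir, os.path.relpath(path, folder))
            os.makedirs(os.path.dirname(backup_path), exist_ok=True)
            shutil.move(path, backup_path)  # also works across volumes
            moved.append(backup_path)
    return moved
```

A call matching the question's settings would be purge_with_backup("c:/source", "c:/backup", timedelta(minutes=23)).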

Python: List files in subdirectories with specific extension created in last 24 hours

First of all, I'm new to programming and to Python in particular, so I'm struggling to find the right solution.
I'm trying to recursively search for files with a specific extension that were created in the last 24 hours, and then print the result to the screen, save it to a file, and copy those files to a directory.
Below is example code that does most of what I would like to achieve, except that it finds all files with the given extension, whereas I need only files created in the last 24 hours or less.
import os
import shutil
topdir = r"C:\Docs"
dstdir = r"C:\test"
exten = ".png"
for dname, names, files in os.walk(topdir):
    for name in files:
        if name.lower().endswith(exten):
            # Prints result of walk
            print(os.path.join(dname, name))
            #copy all files with given extension to the dst folder
            path = os.path.realpath(os.path.join(dname, name))
            shutil.copy2(path, dstdir)
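Define the cut-off once, before the loops (this needs import datetime):
compare_date = datetime.datetime.today() - datetime.timedelta(hours = 24)
Inside the nested loop, you can add this code:
# st_mtime is the last-modification time; stat the full path, not the bare name
create_dt = os.stat(os.path.join(dname, name)).st_mtime
created_date = datetime.datetime.fromtimestamp(create_dt)
if created_date > compare_date:
    print(name)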
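Combining the question's walk with this check gives something like the generator below. It is a sketch only: recent_files is a hypothetical name, and it filters on modification time, since a portable "creation time" is not available on every platform.

```python
import os
import time


def recent_files(topdir, exten, hours=24):
    """Yield paths below `topdir` that end with `exten` and were modified within `hours`."""
    cutoff = time.time() - hours * 3600
    for dname, _, files in os.walk(topdir):
        for name in files:
            path = os.path.join(dname, name)
            # st_mtime is the modification time; true creation time is platform-specific
            if name.lower().endswith(exten) and os.stat(path).st_mtime > cutoff:
                yield path
```

The results can then be printed, written to a file, or copied, for example with shutil.copy2(path, dstdir) for each path in recent_files(topdir, ".png").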

Using python to filter files on disk

I am using the code below to remove files from disk.
def match_files(dir, pattern):
    for dirname, subdirs, files in os.walk(dir):
        for f in files:
            if f.endswith(pattern):
                yield os.path.join(dirname, f)
# Remove all files in the current dir matching *.txt
for f in match_files(dn, '.txt'):
    os.remove(f)
What I would like is to remove files from disk that were not updated today: list the files from today, then check against that list.
Besides os.stat you could use os.path.getmtime or os.path.getctime, the pros and cons of which are discussed in this question. You can use datetime.datetime.fromtimestamp to convert the returned timestamp into a datetime object, and then you can do whatever you want. In this example I'll remove files not modified today and create a list of the remaining files:
import os
from datetime import datetime
today = datetime.now().date()
remaining = []
for f in match_files(dn, '.txt'):
    mtime = datetime.fromtimestamp(os.path.getmtime(f)).date()
    if mtime != today:
        os.remove(f)
    else:
        remaining.append(f)
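Because the loop above deletes files immediately, it can help to list the candidates first and inspect them before removing anything. A minimal sketch of such a dry run; not_updated_today is a hypothetical helper, and it takes any iterable of paths, for example the output of match_files.

```python
import os
from datetime import date, datetime


def not_updated_today(paths, today=None):
    """Return the paths whose modification date differs from `today`."""
    today = today or date.today()
    return [
        p for p in paths
        if datetime.fromtimestamp(os.path.getmtime(p)).date() != today
    ]
```

After checking the returned list, the files can be removed with os.remove in a separate loop.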
What is "pattern"?
Otherwise, os.stat gives you the file's dates. Here is a sample using the last-modification date.
import os
import time
stats = os.stat(file)
lastmod_date = time.localtime(stats.st_mtime)  # same field as stats[8]
