Moving oldest file names with different extensions to common folder. Python - python

I have a folder that contains 15 .jpg files and 15 .pdf files. The file names are the same with just the extensions being different. Example ABC123.jpg and ABC123.pdf. I have spent the better part of the last few days trying to use shutil to move the oldest .pdf file to a new folder then finding the matching .jpg file and moving it to the same folder as the .pdf. I was able to move the oldest file or move all files of a given type. Just couldn't get the oldest of a specific type. I tried moving all .pdfs to a new folder1 and all .jpgs to a new folder2 and then moving oldest from each of those to a common folder. However, they don't always match. The oldest .jpg might be different than the oldest .pdf. I am sure there is a simple solution, I have just been working it in circles so long I can no longer see the forest through the trees.

Use the os.path.getmtime function as the key to sort your files.
import os
def oldest_file(dir, type):
return min([name for name in os.listdir(dir) if name.endswith(type)], key=lambda name: os.path.getmtime(os.path.join(dir, name)))
print(oldest('/your/folder', '.jpg'))
If you need to search the entire tree, use os.walk instead of os.listdir:
import os
from itertools import chain
def oldest_file(dir, type):
return min(list(chain(*[[os.path.join(root, file) for file in files if file.endswith(type)] for root, _, files in os.walk(dir)])), key=lambda file: os.path.getmtime(file))
print(oldest('/your/folder', '.jpg'))
I'm sure you can handle the rest of the code that deals with moving files.

I found oldest_file_in_tree from this answer.
import os
import shutil
def oldest_file_in_tree(rootfolder, extension=".avi"):
return min(
(os.path.join(dirname, filename)
for dirname, dirnames, filenames in os.walk(rootfolder)
for filename in filenames
if filename.endswith(extension)),
key=lambda fn: os.stat(fn).st_mtime)
oldest_pdf = oldest_file_in_tree('/var/somedir', '.pdf')
name = oldest_pdf[:4]
matching_jpg = '{}.jpg'.format(name)
shutil.move("/var/somedir/{}.pdf".format(name), "path/to/new/destination/{}.pdf".format(name))
shutil.move("/var/somedir/{}.jpg".format(name), "path/to/new/destination/{}.jpg".format(name))

Here is how I was able to make it work..
import os, shutil
import glob
todir = '/var/somedir/'
def oldest_file_in_tree(rootfolder, extension=".pdf"):
return min(
(os.path.join(dirname, filename)
for dirname, dirnames, filenames in os.walk(rootfolder)
for filename in filenames
if filename.endswith(extension)),
key=lambda fn: os.stat(fn).st_mtime)
oldest_g3d = oldest_file_in_tree('/var/somedir/', '.pdf')
name = oldest_pdf[:-4]
matching_jpg = '{}.jpg'.format(name)
shutil.move(oldest_pdf, todir)
shutil.move(matching_jpg, todir)

Related

Move pairs of files (.txt & .xml) into their corresponding folder using Python

I have been working this challenge for about a day or so. I've looked at multiple questions and answers asked on SO and tried to 'MacGyver' the code used for my purpose, but still having issues.
I have a directory (lets call it "src\") with hundreds of files (.txt and .xml). Each .txt file has an associated .xml file (let's call it a pair). Example:
src\text-001.txt
src\text-001.xml
src\text-002.txt
src\text-002.xml
src\text-003.txt
src\text-003.xml
Here's an example of how I would like it to turn out so each pair of files are placed into a single unique folder:
src\text-001\text-001.txt
src\text-001\text-001.xml
src\text-002\text-002.txt
src\text-002\text-002.xml
src\text-003\text-003.txt
src\text-003\text-003.xml
What I'd like to do is create an associated folder for each pair and then move each pair of files into its respective folder using Python. I've already tried working from code I found (thanks to a post from Nov '12 by Sethdd, but am having trouble figuring out how to use the move function to grab pairs of files. Here's where I'm at:
import os
import shutil
srcpath = "PATH_TO_SOURCE"
srcfiles = os.listdir(srcpath)
destpath = "PATH_TO_DEST"
# grabs the name of the file before extension and uses as the dest folder name
destdirs = list(set([filename[0:9] for filename in srcfiles]))
def create(dirname, destpath):
full_path = os.path.join(destpath, dirname)
os.mkdir(full_path)
return full_path
def move(filename, dirpath):
shutil.move(os.path.join(srcpath, filename)
,dirpath)
# create destination directories and store their names along with full paths
targets = [
(folder, create(folder, destpath)) for folder in destdirs
]
for dirname, full_path in targets:
for filename in srcfile:
if dirname == filename[0:9]:
move(filename, full_path)
I feel like it should be easy, but Python isn't something I work with everyday and it's been a while since my scripting days... Any help would be greatly appreciated!
Thanks,
WK2EcoD
Use the glob module to interate all of the 'txt' files. From that you can parse and create the folders and copy the files.
The process should be as simple as it appears to you as a human.
for file_name in os.listdir(srcpath):
dir = file_name[:9]
# if dir doesn't exist, create it
# move file_name to dir
You're doing a lot of intermediate work that seems to be confusing you.
Also, insert some simple print statements to track data flow and execution flow. It appears that you have no tracing output so far.
You can do it with os module. For every file in directory check if associated folder exists, create if needed and then move the file. See the code below:
import os
SRC = 'path-to-src'
for fname in os.listdir(SRC):
filename, file_extension = os.path.splitext(fname)
if file_extension not in ['xml', 'txt']:
continue
folder_path = os.path.join(SRC, filename)
if not os.path.exists(folder_path):
os.mkdir(folderpath)
os.rename(
os.path.join(SRC, fname),
os.path.join(folder_path, fname)
)
My approach would be:
Find the pairs that I want to move (do nothing with files without a pair)
Create a directory for every pair
Move the pair to the directory
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import os, shutil
import re
def getPairs(files):
pairs = []
file_re = re.compile(r'^(.*)\.(.*)$')
for f in files:
match = file_re.match(f)
if match:
(name, ext) = match.groups()
if ext == 'txt' and name + '.xml' in files:
pairs.append(name)
return pairs
def movePairsToDir(pairs):
for name in pairs:
os.mkdir(name)
shutil.move(name+'.txt', name)
shutil.move(name+'.xml', name)
files = os.listdir()
pairs = getPairs(files)
movePairsToDir(pairs)
NOTE: This script works when called inside the directory with the pairs.

match filenames to foldernames then move files

I have files named "a1.txt", "a2.txt", "a3.txt", "a4.txt", "a5.txt" and so on. Then I have folders named "a1_1998", "a2_1999", "a3_2000", "a4_2001", "a5_2002" and so on.
I would like to make the conection between file "a1.txt" & folder "a1_1998" for example. (I'm guessing I'll need a regular expresion to do this). then use shutil to move file "a1.txt" into folder "a1_1998", file "a2.txt" into folder "a2_1999" etc....
I've started like this but I'm stuck because of my lack of understanding of regular expresions.
import re
##list files and folders
r = re.compile('^a(?P')
m = r.match('a')
m.group('id')
##
##Move files to folders
I modified the answer below slightly to use shutil to move the files, did the trick!!
import shutil
import os
import glob
files = glob.glob(r'C:\Wam\*.txt')
for file in files:
# this will remove the .txt extension and keep the "aN"
first_part = file[7:-4]
# find the matching directory
dir = glob.glob(r'C:\Wam\%s_*/' % first_part)[0]
shutil.move(file, dir)
You do not need regular expressions for this.
How about something like this:
import glob
files = glob.glob('*.txt')
for file in files:
# this will remove the .txt extension and keep the "aN"
first_part = file[:-4]
# find the matching directory
dir = glob.glob('%s_*/' % first_part)[0]
os.rename(file, os.path.join(dir, file))
A slight alternative, taking into account Inbar Rose's suggestion.
import os
import glob
files = glob.glob('*.txt')
dirs = glob.glob('*_*')
for file in files:
filename = os.path.splitext(file)[0]
matchdir = next(x for x in dirs if filename == x.rsplit('_')[0])
os.rename(file, os.path.join(matchdir, file))

All Files in Dir & Sub-Dir

I would like to find all the files in a directory and all sub-directories.
code used:
import os
import sys
path = "C:\\"
dirs = os.listdir(path)
filename = "C.txt"
FILE = open(filename, "w")
FILE.write(str(dirs))
FILE.close()
print dirs
The problem is - this code only lists files in directories, not sub-directories. What do I need to change in order to also list files in subdirectories?
To traverse a directory tree you want to use os.walk() for this.
Here's an example to get you started:
import os
searchdir = r'C:\root_dir' # traversal starts in this directory (the root)
for root, dirs, files in os.walk(searchdir):
for name in files:
(base, ext) = os.path.splitext(name) # split base and extension
print base, ext
which would give you access to the file names and the components.
You'll find the functions in the os and os.path module to be of great use for this sort of work.
This function will help you: os.path.walk() http://docs.python.org/library/os.path.html#os.path.walk

Find all CSV files in a directory using Python

How can I find all files in directory with the extension .csv in python?
import os
import glob
path = 'c:\\'
extension = 'csv'
os.chdir(path)
result = glob.glob('*.{}'.format(extension))
print(result)
from os import listdir
def find_csv_filenames( path_to_dir, suffix=".csv" ):
filenames = listdir(path_to_dir)
return [ filename for filename in filenames if filename.endswith( suffix ) ]
The function find_csv_filenames() returns a list of filenames as strings, that reside in the directory path_to_dir with the given suffix (by default, ".csv").
Addendum
How to print the filenames:
filenames = find_csv_filenames("my/directory")
for name in filenames:
print name
By using the combination of filters and lambda, you can easily filter out csv files in given folder.
import os
all_files = os.listdir("/path-to-dir")
csv_files = list(filter(lambda f: f.endswith('.csv'), all_files))
# lambda returns True if filename (within `all_files`) ends with .csv or else False
# and filter function uses the returned boolean value to filter .csv files from list files.
use Python OS module to find csv file in a directory.
the simple example is here :
import os
# This is the path where you want to search
path = r'd:'
# this is the extension you want to detect
extension = '.csv'
for root, dirs_list, files_list in os.walk(path):
for file_name in files_list:
if os.path.splitext(file_name)[-1] == extension:
file_name_path = os.path.join(root, file_name)
print file_name
print file_name_path # This is the full path of the filter file
I had to get csv files that were in subdirectories, therefore, using the response from tchlpr I modified it to work best for my use case:
import os
import glob
os.chdir( '/path/to/main/dir' )
result = glob.glob( '*/**.csv' )
print( result )
import os
path = 'C:/Users/Shashank/Desktop/'
os.chdir(path)
for p,n,f in os.walk(os.getcwd()):
for a in f:
a = str(a)
if a.endswith('.csv'):
print(a)
print(p)
This will help to identify path also of these csv files
While solution given by thclpr works it scans only immediate files in the directory and not files in the sub directories if any. Although this is not the requirement but just in case someone wishes to scan sub directories too below is the code that uses os.walk
import os
from glob import glob
PATH = "/home/someuser/projects/someproject"
EXT = "*.csv"
all_csv_files = [file
for path, subdir, files in os.walk(PATH)
for file in glob(os.path.join(path, EXT))]
print(all_csv_files)
Copied from this blog.
Use the python glob module to easily list out the files we need.
import glob
path_csv=glob.glob("../data/subfolrder/*.csv")
You could just use glob with recursive = true, the pattern ** will match any files and zero or more directories, subdirectories and symbolic links to directories.
import glob, os
os.chdir("C:\\Users\\username\\Desktop\\MAIN_DIRECTORY")
for file in glob.glob("*/.csv", recursive = true):
print(file)
This solution uses the python function filter. This function creates a list of elements for which a function returns true. In this case, the anonymous function used is partial matching '.csv' on every element of the directory files list obtained with os.listdir('the path i want to look in')
import os
filepath= 'filepath_to_my_CSVs' # for example: './my_data/'
list(filter(lambda x: '.csv' in x, os.listdir('filepath_to_my_CSVs')))
Many (linked) answers change working directory with os.chdir(). But you don't have to.
Recursively print all CSV files in /home/project/ directory:
pathname = "/home/project/**/*.csv"
for file in glob.iglob(pathname, recursive=True):
print(file)
Requires python 3.5+. From docs [1]:
pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif)
pathname can contain shell-style wildcards.
Whether or not the results are sorted depends on the file system.
If recursive is true, the pattern ** will match any files and zero or more directories, subdirectories and symbolic links to directories
[1] https://docs.python.org/3/library/glob.html#glob.glob
You could just use glob with recursive = True, the pattern ** will match any files and zero or more directories, subdirectories and symbolic links to directories.
import glob, os
os.chdir("C:\\Users\\username\\Desktop\\MAIN_DIRECTORY")
for file in glob.glob("*/*.csv", recursive = True):
print(file)
Please use this tested working code. This function will return a list of all the CSV files with absolute CSV file paths in your specified path.
import os
from glob import glob
def get_csv_files(dir_path, ext):
os.chdir(dir_path)
return list(map(lambda x: os.path.join(dir_path, x), glob(f'*.{ext}')))
print(get_csv_files("E:\\input\\dir\\path", "csv"))

Finding most recently edited file in python

I have a set of folders, and I want to be able to run a function that will find the most recently edited file and tell me the name of the file and the folder it is in.
Folder layout:
root
Folder A
File A
File B
Folder B
File C
File D
etc...
Any tips to get me started as i've hit a bit of a wall.
You should look at the os.walk function, as well as os.stat, which can let you do something like:
import os
max_mtime = 0
for dirname,subdirs,files in os.walk("."):
for fname in files:
full_path = os.path.join(dirname, fname)
mtime = os.stat(full_path).st_mtime
if mtime > max_mtime:
max_mtime = mtime
max_dir = dirname
max_file = fname
print max_dir, max_file
It helps to wrap the built in directory walking to function that yields only full paths to files. Then you can just take the function that returns all files and pick out the one that has the highest modification time:
import os
def all_files_under(path):
"""Iterates through all files that are under the given path."""
for cur_path, dirnames, filenames in os.walk(path):
for filename in filenames:
yield os.path.join(cur_path, filename)
latest_file = max(all_files_under('root'), key=os.path.getmtime)
If anyone is looking for an one line way to do it:
latest_edited_file = max([f for f in os.scandir("path\\to\\search")], key=lambda x: x.stat().st_mtime).name
use os.walk to list files
use os.stat to get file modified timestamp (st_mtime)
put both timestamps and filenames in a list and sort it by timestamp, largest timestamp is most recently edited file.
For multiple files, if anyone came here for that:
import glob, os
files = glob.glob("/target/directory/path/*/*.mp4")
files.sort(key=os.path.getmtime)
for file in files:
print(file)
This will print all files in any folder within /path/ that have the .mp4 extension, with the most recently modified file paths at the bottom.
You can use
os.walk
See: http://docs.python.org/library/os.html
Use os.path.walk() to traverse the directory tree and os.stat().st_mtime to get the mtime of the files.
The function you pass to os.path.walk() (the visit parameter) just needs to keep track of the largest mtime it's seen and where it saw it.
I'm using path = r"C:\Users\traveler\Desktop":
import os
def all_files_under(path):
#"""Iterates through all files that are under the given path."""
for cur_path, dirnames, filenames in os.walk(path):
for filename in filenames:
yield os.path.join(cur_path, filename)
latest_file = max(all_files_under('root'), key=os.path.getmtime)
What am i missing here?

Categories