How to move files with similar filename into new folder in Python - python

I have a list of files like this in the images folder.
and How can I create a new folder if there are multiple files with a similar name and move those similar files to that folder?
I am new to python.
Here is my expectation:

Try this:
import glob
from pathlib import Path
for fn in Path("Images").glob("*"):
file_base_name = "_".join(fn.stem.split("_")[:-1])
file_count = len(glob.glob1("Images", f"{file_base_name}*"))
if file_count > 1 or Path(file_base_name).is_dir():
outdir = Path("Images") / file_base_name
outdir.mkdir(exist_ok=True)
fn.rename(outdir / fn.name)
Input:
Output:
Please ignore file names extension. I create those just to test my code

In this case you don't even need re:
from pathlib import Path
for fn in Path("Images").glob("*.jpg"):
outdir = Path("Images") / "_".join(fn.stem.split("_")[:-1])
outdir.mkdir(exist_ok=True)
fn.rename(outdir / fn.name)
What's going on here?
Pathlib is how you want to think of paths if you can. It combines most of the os.path apis. Specifically:
glob gets us all the files matching the glob in the path
mkdir makes the directory (only if it doesn't exist)
rename moves the file there

I am unable to test since I don't have your files. My suggestion would be to comment out the mkdir command and the shutil.move command and replace them with print statements to see what commands would be generated before letting it run for real. But I think it should work.
import pathlib
import os
import re
from itertools import groupby
import shutil
source_dir = 'Images'
files = [os.path.basename(f) for f in pathlib.Path(source_dir).glob('*.jpg')]
def keyfunc(file):
m = re.match('^(.*?)_\d+.jpg$', file)
return m[1]
matched_files = [file for file in files if re.search(r'_\d+.jpg$', file)]
matched_files.sort()
for k, g in groupby(matched_files, keyfunc):
new_dir = os.path.join(source_dir, k)
if not os.path.exists(new_dir):
os.mkdir(new_dir)
for file in g:
shutil.move(os.path.join(source_dir, file), new_dir)

Related

Find a file 1 folder level down

I am trying to get full_path of places.sqlite file present in '%APPDATA%\Mozilla\Firefox\Profiles\<random_folder>\places.sqlite' using Python OS module. The issue as you can see that <random_folder> has a random name and there could be multiple folders inside the Profiles folder.
How do I navigate/find the path to the places.sqlite file?
You would ideally want to go through each folder to search for this file. In terminal 'locate file_name' command would do this for you. In python file you can use the following command:
import os
db_path = os.path.join(os.getenv('APPDATA'), r'Mozilla\Firefox\Profiles')
def find_file(file_name, path):
for root_folder, directory, file_names in os.walk(path):
if file_name in file_names:
return os.path.join(root_folder, file_name)
print(find_file('places.sqlite', db_path))
os.walk gives a list of all files in a path recusivly. Use it to search for 'places.sqlite' as follows.
path = ""
for root, dirs, files in os.walk("%APPDATA%\\Mozilla\\Firefox\\Profiles\\"):
if "places.sqlite" in files:
path = os.path.join(root, 'places.sqlite')
break
Use the os module to list out all directories in %APPDATA%\Mozilla\Firefox\Profiles\
loop over the directories until you find places.sqlite file (also using os module)
A glob might be simpler as in this case one expects the file to be there in level below the Profiles folder or not there at all.
import os
import pathlib
profiles = pathlib.Path(os.environ["APPDATA"]) / "Mozilla" / "Firefox" / "Profiles"
# rglob will recursively search as well
if places := list(profiles.rglob("places.sqlite")):
print(places[0]) # will print the sqllite file path
with places[0].open() as f:
# ....

How to move from one directory to another and delete only '.html' files in python?

I attended an interview and they asked me to write a script to move from one directory to another and delete only the .html files.
Now I tried to do this at first using os.remove() . Following is the code:
def rm_files():
import os
from os import path
folder='J:\\Test\\'
for files in os.listdir(folder):
file_path=path.join(folder,files)
os.remove(file_path)
The problem I am facing here is that I cannot figure out how to delete only .html files in my directory
Then I tried using glob. Following is the code:
def rm_files1():
import os
import glob
files=glob.glob('J:\\Test\\*.html')
for f in files:
os.remove(f)
Using glob I can delete the .html files but still I cannot figure out how to implement the logic of moving from one directory to another.
And along with that can someone please help me figure out how to delete a specific file type using os.remove() ?
Thank you.
Either of these methods should work. For the first way, you could just string.endswith(suffix) like so:
def rm_files():
import os
from os import path
folder='J:\\Test\\'
for files in os.listdir(folder):
file_path=path.join(folder,files)
if file_path.endswith(".html"):
os.remove(file_path)
Or if you prefer glob, moving directories is fairly straightforward: os.chdir(path) like this:
def rm_files1():
import os
os.chdir('J:\\Test')
import glob
files=glob.glob('J:\\Test\\*.html')
for f in files:
os.remove(f)
Though it seems unnecessary since glob is taking an absolute path anyway.
Your problem can be described in the following steps.
move to specific directory. This can be done using os.chdir()
grab list of all *.html files. Use glob.glob('*.html')
remove the files. use os.remove()
Putting it all together:
import os
import glob
import sys
def remove_html_files(path_name):
# move to desired path, if it exists
if os.path.exists(path_name):
os.chdir(path_name)
else:
print('invalid path')
sys.exit(1)
# grab list of all html files in current directory
file_list = glob.glob('*.html')
#delete files
for f in file_list:
os.remove(f)
#output messaage
print('deleted '+ str(len(file_list))+' files in folder' + path_name)
# call the function
remove_html_files(path_name)
To remove all html files in a directory with os.remove() you can do like this using endswith() function
import sys
import os
from os import listdir
directory = "J:\\Test\\"
test = os.listdir( directory )
for item in test:
if item.endswith(".html"):
os.remove( os.path.join( directory, item ) )

How to delete a file by extension in Python?

I was messing around just trying to make a script that deletes items by ".zip" extension.
import sys
import os
from os import listdir
test=os.listdir("/Users/ben/downloads/")
for item in test:
if item.endswith(".zip"):
os.remove(item)
Whenever I run the script I get:
OSError: [Errno 2] No such file or directory: 'cities1000.zip'
cities1000.zip is obviously a file in my downloads folder.
What did I do wrong here? Is the issue that os.remove requires the full path to the file? If this is the issue, than how can I do that in this current script without completely rewriting it.
You can set the path in to a dir_name variable, then use os.path.join for your os.remove.
import os
dir_name = "/Users/ben/downloads/"
test = os.listdir(dir_name)
for item in test:
if item.endswith(".zip"):
os.remove(os.path.join(dir_name, item))
For this operation you need to append the file name on to the file path so the command knows what folder you are looking into.
You can do this correctly and in a portable way in python using the os.path.join command.
For example:
import os
directory = "/Users/ben/downloads/"
test = os.listdir( directory )
for item in test:
if item.endswith(".zip"):
os.remove( os.path.join( directory, item ) )
Alternate approach that avoids join-ing yourself over and over: Use glob module to join once, then let it give you back the paths directly.
import glob
import os
dir = "/Users/ben/downloads/"
for zippath in glob.iglob(os.path.join(dir, '*.zip')):
os.remove(zippath)
I think you could use Pathlib-- a modern way, like the following:
import pathlib
dir = pathlib.Path("/Users/ben/downloads/")
zip_files = dir.glob(dir / "*.zip")
for zf in zip_files:
zf.unlink()
If you want to delete all zip files recursively, just write so:
import pathlib
dir = pathlib.Path("/Users/ben/downloads/")
zip_files = dir.rglob(dir / "*.zip") # recursively
for zf in zip_files:
zf.unlink()
Just leaving my two cents on this issue: if you want to be chic you can use glob or iglob from the glob package, like so:
import glob
import os
files_in_dir = glob.glob('/Users/ben/downloads/*.zip')
# or if you want to be fancy, you can use iglob, which returns an iterator:
files_in_dir = glob.iglob('/Users/ben/downloads/*.zip')
for _file in files_in_dir:
print(_file) # just to be sure, you know how it is...
os.remove(_file)
origfolder = "/Users/ben/downloads/"
test = os.listdir(origfolder)
for item in test:
if item.endswith(".zip"):
os.remove(os.path.join(origfolder, item))
The dirname is not included in the os.listdir output. You have to attach it to reference the file from the list returned by said function.
Prepend the directory to the filename
os.remove("/Users/ben/downloads/" + item)
EDIT: or change the current working directory using os.chdir.

match filenames to foldernames then move files

I have files named "a1.txt", "a2.txt", "a3.txt", "a4.txt", "a5.txt" and so on. Then I have folders named "a1_1998", "a2_1999", "a3_2000", "a4_2001", "a5_2002" and so on.
I would like to make the conection between file "a1.txt" & folder "a1_1998" for example. (I'm guessing I'll need a regular expresion to do this). then use shutil to move file "a1.txt" into folder "a1_1998", file "a2.txt" into folder "a2_1999" etc....
I've started like this but I'm stuck because of my lack of understanding of regular expresions.
import re
##list files and folders
r = re.compile('^a(?P')
m = r.match('a')
m.group('id')
##
##Move files to folders
I modified the answer below slightly to use shutil to move the files, did the trick!!
import shutil
import os
import glob
files = glob.glob(r'C:\Wam\*.txt')
for file in files:
# this will remove the .txt extension and keep the "aN"
first_part = file[7:-4]
# find the matching directory
dir = glob.glob(r'C:\Wam\%s_*/' % first_part)[0]
shutil.move(file, dir)
You do not need regular expressions for this.
How about something like this:
import glob
files = glob.glob('*.txt')
for file in files:
# this will remove the .txt extension and keep the "aN"
first_part = file[:-4]
# find the matching directory
dir = glob.glob('%s_*/' % first_part)[0]
os.rename(file, os.path.join(dir, file))
A slight alternative, taking into account Inbar Rose's suggestion.
import os
import glob
files = glob.glob('*.txt')
dirs = glob.glob('*_*')
for file in files:
filename = os.path.splitext(file)[0]
matchdir = next(x for x in dirs if filename == x.rsplit('_')[0])
os.rename(file, os.path.join(matchdir, file))

Search files in folder using part of the name and save/copy to different folder using Python

I have 700 files in a single folder. I need to find files that have "h10v03" as part of the name and copy them to a different folder using python.
Heres an example of one of the files: MOD10A1.A2000121.h10v03.005.2007172062725.hdf
I appreciate any help.
Something like this would do the trick.
import os
import shutil
source_dir = "/some/directory/path"
target_dir = "/some/other/directory/path"
part = "h10v03"
files = [file for file in os.listdir(source_dir)
if os.path.isfile(file) and part in file]
for file in files:
shutil.copy2(os.path.join(source_dir, file), target_dir)
Does it need to be python?
A unix shell does that for you quite fine:
cp ./*h10v03* /other/directory/
In python I would suggest you take a look at os.listdir() and shutil.copy()
EDIT:
some untested code:
import os
import shutil
src_dir = "/some/path/"
target_dir = "/some/other/path/"
searchstring = "h10v03"
for f in os.listdir(src_dir):
if searchstring in f and os.path.isfile(os.path.join(src_dir, f)):
shutil.copy2(os.path.join(src_dir, f), target_dir)
print "COPY", f
with the glob module (untested):
import glob
import os
import shutil
for f in glob.glob("/some/path/*2000*h10v03*"):
print f
shutil.copy2(f, os.path.join("/some/target/dir/", os.path.basename(f)))
Firstly, find all the items in that folder with os.listdir. Then you can use the count() method of string to determine if it has your string. Then you can use shutil to copy the file.

Categories