I have a list of files like this in the images folder.
How can I create a new folder when several files share a similar name, and move those files into that folder?
I am new to python.
Here is my expectation:
Try this:
from pathlib import Path

images = Path("Images")
for fn in images.glob("*"):
    file_base_name = "_".join(fn.stem.split("_")[:-1])
    # Count the entries in Images whose names start with the same base name
    file_count = len(list(images.glob(f"{file_base_name}*")))
    # Look for an existing target folder inside Images, not in the working directory
    if file_count > 1 or (images / file_base_name).is_dir():
        outdir = images / file_base_name
        outdir.mkdir(exist_ok=True)
        fn.rename(outdir / fn.name)
Input:
Output:
Please ignore the file name extensions. I created those files just to test my code.
In this case you don't even need re:
from pathlib import Path

for fn in Path("Images").glob("*.jpg"):
    outdir = Path("Images") / "_".join(fn.stem.split("_")[:-1])
    outdir.mkdir(exist_ok=True)
    fn.rename(outdir / fn.name)
What's going on here?
pathlib is how you want to think of paths if you can. It wraps most of the os.path APIs in an object-oriented interface. Specifically:
glob gets us all the files matching the glob in the path
mkdir makes the directory (only if it doesn't exist)
rename moves the file there
I am unable to test since I don't have your files. My suggestion would be to comment out the mkdir command and the shutil.move command and replace them with print statements to see what commands would be generated before letting it run for real. But I think it should work.
import pathlib
import os
import re
from itertools import groupby
import shutil

source_dir = 'Images'
files = [os.path.basename(f) for f in pathlib.Path(source_dir).glob('*.jpg')]

def keyfunc(file):
    # Raw string, and an escaped dot so that only a literal ".jpg" matches
    m = re.match(r'^(.*?)_\d+\.jpg$', file)
    return m[1]

matched_files = [file for file in files if re.search(r'_\d+\.jpg$', file)]
matched_files.sort()  # groupby only groups adjacent items, so sort first
for k, g in groupby(matched_files, keyfunc):
    new_dir = os.path.join(source_dir, k)
    if not os.path.exists(new_dir):
        os.mkdir(new_dir)
    for file in g:
        shutil.move(os.path.join(source_dir, file), new_dir)
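The dry run suggested above can also be done without touching the disk at all. This is only a sketch: plan_moves and the example file names are made up for illustration, and it assumes the part before the last underscore is the folder name. It keeps only base names shared by more than one file.

```python
from collections import defaultdict
from pathlib import PurePath


def plan_moves(file_names):
    """Map each target folder name to the files that would be moved into it."""
    groups = defaultdict(list)
    for name in file_names:
        stem = PurePath(name).stem
        base, sep, _suffix = stem.rpartition("_")
        if sep:  # only names that contain an underscore have a base name
            groups[base].append(name)
    # Keep only base names shared by more than one file.
    return {base: sorted(names) for base, names in groups.items() if len(names) > 1}


# Hypothetical file names, used only for illustration.
example = ["cat_1.jpg", "cat_2.jpg", "dog_1.jpg", "readme.txt"]
for folder, names in plan_moves(example).items():
    for name in names:
        print(f"would move {name} -> {folder}/{name}")
```

Once the printed plan looks right, the same dictionary can drive the real mkdir and move calls.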
I'm writing a Python script that lists all files on a disk, but only files, no folders. This is my code:
import hashlib
import os

if os.name != "nt":
    print("Sorry this script works only on Windows!")

path = "C://"
dir_list = os.listdir(path)
print(dir_list)
You can use, for example, the pathlib library and build something like this:
import os
import pathlib

path = ""  # add your path here
for i in os.listdir(path):
    # Joining with "/" adds the separator, so no trailing \ is needed
    if not (pathlib.Path(path) / i).is_dir():
        print(i)
It iterates over the entries of the given directory and, for each one, creates a Path object and checks whether it is a directory. Only entries that are not directories are printed.
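The loop above only looks at one directory level. If "all files on disk" means every subfolder too, a recursive walk is one option. A minimal sketch, assuming os.walk is acceptable here; iter_files is a made-up helper name, and the demo runs in a temporary directory rather than on C:\.

```python
import os
import tempfile
from pathlib import Path


def iter_files(root):
    """Yield every non-directory entry below root, including subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield Path(dirpath) / name


# Small self-contained demo in a temporary directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("a")
    (Path(tmp) / "sub" / "b.txt").write_text("b")
    found = sorted(p.relative_to(tmp).as_posix() for p in iter_files(tmp))
    print(found)
```

On a whole drive, expect folders you are not allowed to read; os.walk skips those silently by default.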
My code should find the newest and oldest files in a folder and its subfolders. It works for the top-level folder but it doesn't include files within subfolders.
import os
import glob
mypath = 'C:/RDS/*'
print(min(glob.glob(mypath), key=os.path.getmtime))
print(max(glob.glob(mypath), key=os.path.getmtime))
How do I make it recurse into the subfolders?
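Try using pathlib. Also, getmtime gives the last modified time. If you want the time the file was created, use getctime; note that it returns the creation time on Windows, but on Unix it returns the time of the last metadata change.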
if you strictly want only files:
import os
import pathlib
mypath = 'your path'
taggedrootdir = pathlib.Path(mypath)
print(min([f for f in taggedrootdir.resolve().glob('**/*') if f.is_file()], key=os.path.getctime))
print(max([f for f in taggedrootdir.resolve().glob('**/*') if f.is_file()], key=os.path.getctime))
if results may include folders:
import os
import pathlib
mypath = 'your path'
taggedrootdir = pathlib.Path(mypath)
print(min(taggedrootdir.resolve().glob('**/*'), key=os.path.getctime))
print(max(taggedrootdir.resolve().glob('**/*'), key=os.path.getctime))
As the docs show, you can add a recursive=True keyword argument to glob.glob(). It only has an effect when the pattern contains **, so the pattern has to change as well.
Your code becomes:
import os
import glob
mypath = 'C:/RDS/**/*'
print(min(glob.glob(mypath, recursive=True), key=os.path.getmtime))
print(max(glob.glob(mypath, recursive=True), key=os.path.getmtime))
This should give you the oldest and newest file in your folder and all its subfolders.
Pay attention to the os filepath separator: "/" (on unix) vs. "\" (on windows).
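To see why the pattern matters, here is a small sketch that compares a plain * pattern with a ** pattern, both with recursive=True. The folder layout is made up and lives in a temporary directory.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "top.txt").write_text("top")
    (Path(tmp) / "sub" / "nested.txt").write_text("nested")

    # "*" only matches direct children, even with recursive=True.
    shallow = glob.glob(os.path.join(tmp, "*"), recursive=True)
    # "**" is what makes the search descend into subfolders.
    deep = glob.glob(os.path.join(tmp, "**", "*"), recursive=True)

    shallow_names = sorted(os.path.basename(p) for p in shallow)
    deep_names = sorted(os.path.basename(p) for p in deep)
    print(shallow_names)
    print(deep_names)
```

Using os.path.join to build the pattern also takes care of the separator difference mentioned above.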
You can try something like below.
It saves the list of paths in a variable, which is faster than traversing the file system twice.
One line is there only for debugging; comment it out in production.
import os
import glob
mypath = r'D:\RDS\**'  # raw string, so the backslashes are taken literally
allFilesAndFolders = glob.glob(mypath, recursive=True)
# just for debugging
print(allFilesAndFolders)
print(min(allFilesAndFolders, key=os.path.getmtime))
print(max(allFilesAndFolders, key=os.path.getmtime))
Here's a fairly efficient way of doing it. It determines the oldest and newest files by iterating through them all once. Since it uses iteration, there's no need to first create a list of them and go through it twice to determine the two extremes.
import os
import pathlib

def max_min(iterable, keyfunc=None):
    if keyfunc is None:
        keyfunc = lambda x: x  # Identity.
    iterator = iter(iterable)
    most = least = next(iterator)
    mostkey = leastkey = keyfunc(most)
    for item in iterator:
        key = keyfunc(item)
        if key > mostkey:
            most = item
            mostkey = key
        elif key < leastkey:
            least = item
            leastkey = key
    return most, least

mypath = '.'
files = (f for f in pathlib.Path(mypath).resolve().glob('**/*') if f.is_file())
# The largest modification time belongs to the newest file, so it comes first
newest, oldest = max_min(files, keyfunc=os.path.getmtime)
print(f'oldest file: {oldest}')
print(f'newest file: {newest}')
This code runs on Mac but doesn't work on Windows. I'm using PyCharm (2019.2) as my IDE and Python 3.7.
import glob
import shutil
import os
dst = '/base/a/CAR1'
alter = '/base/a/CAR2'
path = '/base/a/Tub*'
for filename in glob.glob(path + 'Finsa*.txt'):
    if '19999' in open(filename, 'r').read():
        shutil.copyfile(filename, os.path.join(dst, os.path.basename(filename)))
    elif '18888' in open(filename, 'r').read():
        shutil.copyfile(filename, os.path.join(alter, os.path.basename(filename)))
Even if I do the following, it doesn't work:
for filename in glob.glob('C:/user/base/a/CAR1*.txt'):
    print(filename)
RESULT:
process finished with exit code 0.
Is this happening because python can't read the Windows file directory? I have tried everything including back slashes, forward slashes, double slashes.
Use the os module and place the base folder in your project directory:
import glob
import os
path = os.path.join(os.getcwd(), 'base', 'a', 'Tub*')
for filename in glob.glob(os.path.join(path, 'Finsa*.txt')):
    print(filename)  # process each matching file here
You gain portability between platforms (assuming both in Windows and Linux the base folder is in the python project space).
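A loop over glob.glob() that prints nothing and exits with code 0 usually means the pattern matched no files, because glob returns an empty list instead of raising an error. A small sketch that fails loudly when the folder itself is missing; find_files is a made-up helper, and the demo uses a temporary directory instead of the real base folder.

```python
import glob
import os
import tempfile


def find_files(directory, pattern):
    """Return matches for pattern inside directory, failing loudly if it is missing."""
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"directory does not exist: {directory!r}")
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demo in a temporary directory (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "Finsa1.txt"), "w").close()
    matches = find_files(tmp, "Finsa*.txt")
    matched_names = [os.path.basename(m) for m in matches]
    print(matched_names)

    missing = os.path.join(tmp, "no_such_folder")
    try:
        find_files(missing, "*.txt")
    except FileNotFoundError as exc:
        error_message = str(exc)
        print(error_message)
```

If the folder exists and the list is still empty, the next thing to check is the file name part of the pattern.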
Suppose my Python code is executed in a directory called main and the application needs to access main/2091/data.txt.
How should I use open(location)? What should the parameter location be?
I found that below simple code will work.. does it have any disadvantages?
file = "\2091\sample.txt"
path = os.getcwd()+file
fp = open(path, 'r+');
With this type of thing you need to be careful what your actual working directory is. For example, you may not run the script from the directory the file is in. In this case, you can't just use a relative path by itself.
If you are sure the file you want is in a subdirectory beneath where the script is actually located, you can use __file__ to help you out here. __file__ is the path of the script you are running; depending on the Python version and how the script was launched, it may be relative rather than absolute.
So you can fiddle with something like this:
import os
script_dir = os.path.dirname(__file__) #<-- absolute dir the script is in
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)
This code works fine:
import os

def read_file(file_name):
    with open(file_name) as file_handle:
        print(file_handle.read())

# Directory that contains this script (note: __file__ without quotes)
file_dir = os.path.dirname(os.path.realpath(__file__))
print(file_dir)

# For accessing the file in the same folder
file_name = os.path.join(file_dir, "same.txt")
read_file(file_name)

# For accessing the file in a folder contained in the current folder
file_name = os.path.join(file_dir, 'Folder1.1/same.txt')
read_file(file_name)

# For accessing the file in the parent folder of the current folder
file_name = os.path.join(file_dir, '../same.txt')
read_file(file_name)

# For accessing the file inside a sibling folder
file_name = os.path.join(file_dir, '../Folder2/same.txt')
file_name = os.path.abspath(os.path.realpath(file_name))
print(file_name)
read_file(file_name)
I created an account just so I could clarify a discrepancy I think I found in Russ's original response.
For reference, his original answer was:
import os
script_dir = os.path.dirname(__file__)
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)
This is a great answer because it dynamically creates an absolute system path to the desired file.
Cory Mawhorter noticed that __file__ is a relative path (it is as well on my system) and suggested using os.path.abspath(__file__). os.path.abspath, however, returns the absolute path of your current script (i.e. /path/to/dir/foobar.py)
To use this method (and how I eventually got it working) you have to remove the script name from the end of the path:
import os
script_path = os.path.abspath(__file__) # i.e. /path/to/dir/foobar.py
script_dir = os.path.split(script_path)[0] #i.e. /path/to/dir/
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)
The resulting abs_file_path (in this example) becomes: /path/to/dir/2091/data.txt
It depends on what operating system you're using. If you want a solution that is compatible with both Windows and *nix something like:
from os import path
file_path = path.relpath("2091/data.txt")
with open(file_path) as f:
    data = f.read()  # do stuff with the file here
should work fine.
The os.path module formats a path for whatever operating system it's running on. Also, Python handles relative paths just fine, as long as you have the correct permissions.
Edit:
As mentioned by kindall in the comments, Python can convert between Unix-style and Windows-style paths anyway, so even simpler code will work:
with open("2091/data.txt") as f:
    data = f.read()  # do stuff with the file here
That being said, the path module still has some useful functions.
I spent a lot of time trying to discover why my code could not find my file when running Python 3 on Windows. I added . before / and everything worked fine:
import os
script_dir = os.path.dirname(__file__)
file_path = os.path.join(script_dir, './output03.txt')
print(file_path)
fptr = open(file_path, 'w')
Try this:
from pathlib import Path
data_folder = Path("relative/path")  # no leading slash, otherwise the path is absolute
file_to_open = data_folder / "file.pdf"
f = open(file_to_open)
print(f.read())
Python 3.4 introduced a new standard-library module for dealing with files and paths called pathlib. It works for me!
Code:
import os
script_path = os.path.abspath(__file__)
path_list = script_path.split(os.sep)
script_directory = path_list[0:len(path_list)-1]
rel_path = "main/2091/data.txt"
path = "/".join(script_directory) + "/" + rel_path
Explanation:
Import library:
import os
Use __file__ to attain the current script's path:
script_path = os.path.abspath(__file__)
Separates the script path into multiple items:
path_list = script_path.split(os.sep)
Remove the last item in the list (the actual script file):
script_directory = path_list[0:len(path_list)-1]
Add the relative file's path:
rel_path = "main/2091/data.txt"
Join the list items and append the relative file's path:
path = "/".join(script_directory) + "/" + rel_path
Now you are set to do whatever you want with the file, such as, for example:
file = open(path)
import os

def file_path(relative_path):
    # Folder that contains this script
    base_dir = os.path.dirname(os.path.abspath(__file__))
    split_path = relative_path.split("/")
    new_path = os.path.join(base_dir, *split_path)
    return new_path

with open(file_path("2091/data.txt"), "w") as f:
    f.write("Powerful you have become.")
If the file is in your parent folder, eg. follower.txt, you can simply use open('../follower.txt', 'r').read()
Get the path of the parent folder, then use os.path.join to append your relative file path to the end.
# get parent folder with `os.path`
import os.path
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
# now use BASE_DIR to get a file relative to the current script
os.path.join(BASE_DIR, "config.yaml")
The same thing with pathlib:
# get parent folder with `pathlib`'s Path
from pathlib import Path
BASE_DIR = Path(__file__).absolute().parent
# now use BASE_DIR to get a file relative to the current script
BASE_DIR / "config.yaml"
Python just passes the filename you give it to the operating system, which opens it. If your operating system supports relative paths like main/2091/data.txt (hint: it does), then that will work fine.
You may find that the easiest way to answer a question like this is to try it and see what happens.
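Trying it is easy enough. The catch is that a relative path is resolved against the current working directory, not against the script's location. A small sketch that shows this, assuming a made-up 2091/data.txt created inside a temporary directory:

```python
import os
import tempfile
from pathlib import Path

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "2091" / "data.txt"
    data_file.parent.mkdir()
    data_file.write_text("hello")

    relative = "2091/data.txt"
    os.chdir(tmp)  # make the temporary directory the working directory
    try:
        with open(relative) as f:  # resolved against the working directory
            content = f.read()
    finally:
        os.chdir(original_cwd)  # restore it, so the temp dir can be removed
print(content)
```

Run from any other working directory, the same open("2091/data.txt") would fail with FileNotFoundError, which is why the __file__ based answers exist.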
Not sure if this works everywhere.
I'm using ipython in ubuntu.
If you want to read file in current folder's sub-directory:
/current-folder/sub-directory/data.csv
your script is in current-folder
simply try this:
import pandas as pd
path = './sub-directory/data.csv'
pd.read_csv(path)
When I was a beginner I found these descriptions a bit intimidating. At first I would try this on Windows:
f = open('C:\Users\chidu\Desktop\Skipper New\Special_Note.txt', 'w+')
print(f)
and it would raise a syntax error. I used to get confused a lot. After some searching I found out why the error occurs, so I'm writing this for beginners.
In a normal string literal the backslash starts an escape sequence, and \U is read as the start of a Unicode escape. Doubling the backslashes fixes it:
f = open('C:\\Users\\chidu\\Desktop\\Skipper New\\Special_Note.txt', 'w+')
print(f)
Now it works. Escaping only the first backslash (C:\\Users\chidu...) also avoids the error, because \c, \D and \S are not recognized escape sequences, but recent Python versions warn about those.
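A raw string is another way to write the same path without doubling anything, and forward slashes work as well. A quick sketch to check that the spellings describe the same path (nothing is opened here, the strings are only compared):

```python
escaped = 'C:\\Users\\chidu\\Desktop\\Skipper New\\Special_Note.txt'
raw = r'C:\Users\chidu\Desktop\Skipper New\Special_Note.txt'
forward = 'C:/Users/chidu/Desktop/Skipper New/Special_Note.txt'

# The raw string and the fully escaped string are the same text.
print(escaped == raw)
# Swapping the separators gives the forward-slash spelling.
print(raw.replace('\\', '/') == forward)
```

One limitation of raw strings: they cannot end in a single backslash.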
In Python 3.4 (PEP 428), pathlib was introduced, allowing you to work with files in an object-oriented fashion:
from pathlib import Path
working_directory = Path.cwd()
path = working_directory / "2091" / "sample.txt"
with path.open('r+') as fp:
    data = fp.read()  # do magic
The with keyword also ensures that the file is closed properly, even if something goes wrong (like an unhandled exception, SIGINT or similar).
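The closing behaviour is easy to check. A minimal sketch, using a temporary file and a simulated error instead of the real 2091/sample.txt:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as fp:
        fp.write("data")

    try:
        with open(path, "r+") as fp:
            raise RuntimeError("something went wrong")  # simulated failure
    except RuntimeError:
        pass

    # The with block closed the file before the exception left it.
    closed_after_error = fp.closed
    print(closed_after_error)
```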