During my simulation, Python creates folders named __pycache__. Not just one, but many. The __pycache__ folders are almost always created next to the modules that are executed.
But these modules are scattered across my directory tree. The main folder is called LPG and has many subfolders, which in turn have further subfolders. The __pycache__ folders can appear in any of them.
At the end of my simulation I would like to clean up and delete all folders named __pycache__ within the LPG tree.
What is the best way to do this?
Currently, I am calling the function below on simulation end (and also on simulation start). However, that is a bit annoying, since I have to spell out every path where a __pycache__ folder might occur.
import shutil
from pathlib import Path

def clearCache():
    """
    Removes generic `__pycache__` folders.
    The `__pycache__` folders are automatically created by Python during the simulation.
    This function removes them on simulation start and simulation end.
    """
    try:
        shutil.rmtree(Path(f"{PATH_to_folder_X}/__pycache__"))
    except OSError:
        pass
    try:
        shutil.rmtree(Path(f"{PATH_to_folder_Y}/__pycache__"))
    except OSError:
        pass
This will remove all *.pyc/*.pyo files and __pycache__ directories recursively in the current directory:
With Python:
import os
os.system('find . | grep -E "(__pycache__|\.pyc$|\.pyo$)" | xargs rm -rf')
or manually in a (Unix-style) terminal:
find . | grep -E "(__pycache__|\.pyc$|\.pyo$)" | xargs rm -rf
(os.system is used rather than os.popen so the script actually waits for the command to finish, and the $ anchors apply to each extension.)
Bit of a frame challenge here: If you don't want the bytecode caches, the best solution is to not generate them in the first place. If you always delete them after every run, they're worse than useless. Either:
Invoke python/python3 with the -B option (affects that single launch), or...
Set the PYTHONDONTWRITEBYTECODE environment variable to affect all Python launches until it's unset, e.g. in bash, export PYTHONDONTWRITEBYTECODE=1
This does need to be set before the Python script is launched, so perhaps wrap your script with a simple bash script or the like that invokes the real Python script with the appropriate switch/environment set up.
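For instance, a minimal Python launcher sketch (the entry-point name run_simulation.py is hypothetical; substitute your real script):
import os
import subprocess
import sys

# Run the real simulation with bytecode caching disabled, both via the -B
# switch and via PYTHONDONTWRITEBYTECODE (either one alone is sufficient;
# both are shown for illustration).
env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
subprocess.run([sys.executable, "-B", "run_simulation.py"], env=env, check=True)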
Another simple solution is available if you have access to a Command-Line Interface and the find utility:
find . -type d -name __pycache__
Reading it in plain language: it finds, in the current folder (.), directories (-type d) whose name exactly matches the pattern (-name __pycache__). You can use this to identify where these folders are, and then to delete them:
find . -type d -name __pycache__ -prune -exec rm -rf {} \;
(The added -prune stops find from descending into the directories it is about to delete, which avoids harmless "No such file or directory" warnings.) The huge advantage of this solution is that it transfers easily to other tasks (finding *.pyc files?) and has become an everyday tool for me.
Here is a simple solution: if you already know where the __pycache__ folders are, just try the following.
import shutil
import os

def clearCache():
    """
    Removes generic `__pycache__` folders.
    The `__pycache__` folders are automatically created by Python during the simulation.
    This function removes the generated folders on simulation start and simulation end.
    """
    path = 'C:/Users/Yours/Desktop/LPG'
    try:
        for entry in os.listdir(path):
            # join path and name; plain string concatenation would drop the separator
            full = os.path.join(path, entry)
            if os.path.isdir(full) and entry == '__pycache__':
                shutil.rmtree(full, ignore_errors=False)
    except OSError:
        pass

clearCache()
Just modify path so it points at your actual directory.
And if you want the script to descend into the subdirectories and remove the __pycache__ folders there as well, check the following example:
import shutil
import os

path = 'C:/Users/Yours/Desktop/LPG'
for dirpath, dirnames, filenames in os.walk(path):
    if '__pycache__' in dirnames:
        # delete the cache folder, then prune it from the walk so os.walk
        # does not try to descend into the now-deleted directory
        shutil.rmtree(os.path.join(dirpath, '__pycache__'))
        dirnames.remove('__pycache__')
If you want to delete given folder names anywhere under a directory, use this function.
By default, it starts deleting from the current directory and recurses into every sub-directory.
import os
import shutil

def remove_dirs(curr_dir='./', del_dirs=('temp_folder', '__pycache__')):
    for del_dir in del_dirs:
        if del_dir in os.listdir(curr_dir):
            shutil.rmtree(os.path.join(curr_dir, del_dir))
    for entry in os.listdir(curr_dir):
        entry = os.path.join(curr_dir, entry)
        if os.path.isdir(entry):
            remove_dirs(entry, del_dirs)  # plain recursive call; there is no class here, so no `self`
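For example, to clean the LPG tree from the original question (the path is that question's placeholder):
remove_dirs(curr_dir='C:/Users/Yours/Desktop/LPG', del_dirs=['__pycache__'])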
You can use os with glob like this:
import glob
import os
import shutil

in_dir = "/path/to/your/folder"
patterns = ['__pycache__']
for p in patterns:
    for match in glob.iglob(os.path.join(in_dir, "**", p), recursive=True):
        shutil.rmtree(match)  # rmtree, not os.remove: __pycache__ is a directory
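The same idea reads a little more cleanly with pathlib's rglob (a sketch; the folder path is the placeholder from above):
import shutil
from pathlib import Path

for cache_dir in Path("/path/to/your/folder").rglob("__pycache__"):
    shutil.rmtree(cache_dir)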
I'm using a basic python script to create an archive with the contents of a directory "directoryX":
shutil.make_archive('NameOfArchive', format='gztar', root_dir=getcwd()+'/directoryX/')
Rather than just storing the contents of directoryX, the generated archive creates a . folder in the archive (and the contents of directoryX are stored in this . folder).
Interestingly, this only happens with .tar and .tar.gz, but not with .zip.
Used Python version -> 3.8.10
It seems that when using the .tar or .tar.gz formats, the default base_dir of "." gets taken literally, and a folder titled "." is created.
I tried using base_dir=os.curdir but got the same results...
Tried to also use python2 but got the same results.
Is this a bug with shutil.make_archive or am I doing something incorrectly?
It's a documented behavior, sort of, just a little odd. The base_dir argument to make_archive is documented to:
Be the directory we start archiving from (after chdiring to root_dir)
Default to the current directory (specifically, os.curdir)
os.curdir is actually a constant string, '.'. Matching the tar command-line utility, shutil.make_archive (and TarFile.add, which it's implemented in terms of) stores the complete path "given" (in this case, './' plus the rest of the relative path to the file). If you run tar -c -z -C directoryX -f NameOfArchive.tar.gz ., you'll end up with a tarball full of ./-prefixed files too (-C directoryX does the same thing as root_dir, and the . argument is the same as the default base_dir='.').
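You can see this directly by listing the archive's member names (a quick check, reusing the question's directoryX and archive name):
import shutil
import tarfile

shutil.make_archive('NameOfArchive', format='gztar', root_dir='directoryX')
with tarfile.open('NameOfArchive.tar.gz') as tar:
    print(tar.getnames())  # entries come out as '.', './foo', './subdir/spam', ...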
I don't see an easy workaround that retains the simplicity of shutil.make_archive; if you try to pass base_dir='' it dies when it tries to stat '', so that's out.
To be clear, this behavior should be fine; a tar entry named ./foo and one named foo are equivalent for most purposes. If it really bothers you, you can switch to using the tarfile module directly, e.g.:
# Imports at top of file
import os
import tarfile

# Actual code
with tarfile.open('NameOfArchive.tar.gz', 'w:gz') as tar:
    for entry in os.scandir('directoryX'):
        # Operates recursively on any directories, using the arcname as the base,
        # so you add the whole tree just by adding all the entries in the top
        # level directory. Using arcname of entry.name means it's equivalent to
        # adding os.path.basename(entry.path), omitting all directory components
        tar.add(entry.path, arcname=entry.name)

# The whole loop *could* be replaced with just:
#   tar.add('directoryX', arcname='')
# which would add all contents recursively, but it would also put an entry
# for '/' in, which is undesirable
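To verify which form you got, list the member names of the resulting archive (same archive name as above):
import tarfile

with tarfile.open('NameOfArchive.tar.gz') as tar:
    print(tar.getnames())  # now 'foo', 'bar', 'subdir', ... with no './' prefix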
For a directory structure like:
directoryX/
|
\- foo
\- bar
\- subdir/
   |
   \- spam
   \- eggs
the resulting tar's contents would be:
foo
bar
subdir/
subdir/eggs
subdir/spam
vs. the:
./foo
./bar
./subdir/
./subdir/eggs
./subdir/spam
your current code produces.
Slightly more work to code, but not that much worse; two imports and three lines of code, and with greater control over what gets added. For example, you could trivially exclude symlinks by wrapping the tar.add call in an if not entry.is_symlink(): block, or omit recursive adding of specific directories by passing recursive=False to the tar.add call for directories whose contents you don't want to include. You can even provide a filter function to the tar.add call to conditionally exclude specific entries even when deep recursion gets involved, as sketched below.
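A minimal sketch of such a filter function (the symlink rule is just an example policy, not something the question requires):
import tarfile

def drop_symlinks(tarinfo: tarfile.TarInfo):
    # returning None excludes the entry; returning the TarInfo keeps it
    return None if tarinfo.issym() else tarinfo

# then, inside the loop above:
# tar.add(entry.path, arcname=entry.name, filter=drop_symlinks)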
The current file organization looks like this:
Species_name1.asc
Species_name1.csv
Species_name1_Averages.csv
...
...
Species_name2.asc
Species_name2.csv
Species_name2_Averages.csv
I need to figure out a script that can create the new directories with the names (Species_name1, Species_name2... etc) and that can move the files from the base directory into the appropriate new directories.
import os
import glob
import shutil

base_directory = [CURRENT_WORKING_DIRECTORY]
with open("folder_names.txt", "r") as new_folders:
    for name in new_folders:
        # strip the trailing newline before building the path
        os.makedirs(os.path.join(base_directory, name.strip()))
Above is an example of what I can think of doing when creating new directories within the base directory.
I understand that I will have to utilize tools within the os, shutil, and/or glob modules if I use Python. However, the exact script is escaping me and my files remain unorganized. If there is any advice you can provide to help me complete this small task, I will be most grateful.
Also there are many file types and suffixes within this directory but the (species_name?) portion is always consistent.
Below is the expected hierarchy:
Species_name1
-- Species_name1.asc
-- Species_name1.csv
-- Species_name1_Averages.csv
Species_name2
-- Species_name2.asc
-- Species_name2.csv
-- Species_name2_Averages.csv
Thank you in advance!
Like this using simple shell tools with bash:
find . -type f -name '*Species_name*' -exec bash -c '
    dir=$(grep -oP "Species_name\d+" <<< "$1")
    echo mkdir -p "$dir"
    echo mv "$1" "$dir"
' -- {} \;
(The added -p keeps mkdir from failing once the directory already exists for a species' second file.)
Drop the echo commands when the output looks good for you.
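If you would rather stay in Python, here is a rough equivalent of the shell loop above (it assumes the same Species_name<digits> convention; adjust the regex to your real names):
import os
import re
import shutil

for name in os.listdir('.'):
    if not os.path.isfile(name):
        continue  # skip the directories we create as we go
    match = re.search(r'Species_name\d+', name)
    if match:
        os.makedirs(match.group(), exist_ok=True)
        shutil.move(name, os.path.join(match.group(), name))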
Assuming all your asc files are named like in your example:
from os import mkdir
from shutil import move
from glob import glob
fs = []
for file in glob("*.asc"):
    f = file.split('.')[0]
    fs.append(f)
    mkdir(f)

for f in fs:
    for file in glob("*.*"):
        if file.startswith(f):
            move(file, f'.\\{f}\\{file}')
UPDATE:
Assuming all your Species_name.asc files are labeled like in your example:
from os import mkdir
from shutil import move
from glob import glob
fs = [file.split('.')[0] for file in glob("Species_name*.asc")]
for f in fs:
    mkdir(f)
    for file in glob("*.*"):
        # note: startswith means 'Species_name1' would also match files of
        # 'Species_name10', if such a name exists; tighten the check if needed
        if file.startswith(f):
            move(file, f'.\\{f}\\{file}')
I have been trying to figure out how to translate this simple batch code (which deletes every empty dir in a tree) into Python, and it is taking me an unreasonable amount of time. I kindly ask for a solution with a detailed explanation; I believe it will jumpstart my understanding of the language. I'm in danger of giving up.
for /d /r %%u in (*) do rmdir "%%u"
I do have my grotesque version I am trying to fix which must be all sorts of wrong. I would prefer using the shutil module, if suitable.
for dirpath in os.walk("D:\\SOURCE"):
    os.rmdir(dirpath)
If you only want to delete the empty directories, then pathlib.Path(..).glob(..) would work:
import os
from pathlib import Path

emptydirs = [d for d in Path('.').glob('**/*')          # go through everything under '.'
             if d.is_dir() and not os.listdir(str(d))]  # keep only directories without contents

for empty in emptydirs:  # iterate over all found empty directories
    os.rmdir(empty)      # .. and remove
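Note that this only removes directories that were already empty when scanned; if you also want parents that become empty once their children are gone, a bottom-up walk handles that in one pass (a sketch, reusing the question's D:\SOURCE path):
import os

for dirpath, dirnames, filenames in os.walk('D:\\SOURCE', topdown=False):
    try:
        os.rmdir(dirpath)  # only succeeds if the directory is empty by now
    except OSError:
        pass  # non-empty directory: leave it alone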
if you want to delete everything under the directory, then the shutil.rmtree(..) function can do it in one line:
import shutil
shutil.rmtree('.')
check the docs for all the details (https://docs.python.org/2/library/shutil.html#shutil.rmtree)
I am trying to get the names of subdirectories with a Python 3 script on Windows 10.
Thus, I wrote the following code:
from pathlib2 import Path
p = "./path/to/target/dir"
[str(item) for item in Path(p).rglob(".")]
# obtained only subdirectories path names including target directory itself.
It is good for me to get this result, but I don't know why this pattern for the rglob argument returns it.
Can someone explain this?
Thanks.
Every directory in a posix-style filesystem contains two entries from the get-go: .., which refers to the parent directory, and ., which refers to the current directory:
$ mkdir tmp; cd tmp
tmp$ ls -a
. ..
tmp$ cd .
tmp$ # <-- still in the same directory
- with the notable exception of /.., which refers to the root itself, since the root has no parent.
A Path object from python's pathlib is, when it is created, just a wrapper around a string that is assumed to point somewhere into the filesystem. It will only refer to something tangible when it is resolved:
>>> Path('.')
PosixPath('.') # just a fancy string
>>> Path('.').resolve()
PosixPath('/current/working/dir') # an actual point in your filesystem
The bottom line is that
the paths /current/working/dir and /current/working/dir/. are, from the filesystem's point of view, completely equivalent, and
a pathlib.Path will also reflect that as soon as it is resolved.
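You can observe the same equivalence from Python directly (the paths here are just the answer's illustrative placeholders):
import os.path

print(os.path.normpath('/current/working/dir/.'))  # '/current/working/dir'
print(os.path.samefile('.', './.'))                # True: both point at the cwd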
By matching the glob call against ., you found, for every directory at or below the initial directory, the link pointing to that directory itself. The results from glob are resolved on return, so the . doesn't appear in them any more.
As a source for this behavior, see the section of PEP 428 (which serves as the specification for pathlib) where it briefly mentions path equivalence.
I have a script that runs on a folder to create contour lines. Since I have roughly 2700 DEMs which need to be processed, I need a way to make the script run on all folders within the parent folder, saving the results to an output folder. I am not sure how to script this, but it would be greatly appreciated if I could get some guidance.
The following is the script I currently have which works on a single folder.
import arcpy
from arcpy import env
from arcpy.sa import *
env.workspace = "C:/DATA/ScriptTesting/test"
inRaster = "1km17670"
contourInterval = 5
baseContour = 0
outContours = "C:/DATA/ScriptTesting/test/output/contours5.shp"
arcpy.CheckOutExtension("Spatial")
Contour(inRaster,outContours, contourInterval, baseContour)
You're probably looking for os.walk(), which can recursively walk through all subdirectories of the given directory. You can either use the current working directory, or calculate your own parent folder and start from there, or whatever - but it'll give you the filenames for everything beneath what it starts with. From there, you can make a subroutine to determine whether or not to perform your script on that file.
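A minimal sketch of that idea (the parent-folder path is hypothetical; adjust it to your data):
import os

root_dir = "C:/DATA/ScriptTesting"  # hypothetical parent folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    for name in filenames:
        full_path = os.path.join(dirpath, name)
        # decide here whether this file is a DEM your contour script should process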
You can get a list of all directories like this:
import arcpy
from arcpy import env
from arcpy.sa import *
import os
# pass in your root directory here
directories = os.listdir(root_dir)
Then you can iterate over these dirs:
for directory in directories:
    # join with root_dir: os.listdir returns bare names, so resolving them
    # against the current working directory would point at the wrong place
    subdir = os.path.join(root_dir, directory)
    if not os.path.isdir(subdir):
        continue
    # I assume you want the workspace attribute set to the subfolders
    env.workspace = os.path.realpath(subdir)
    inRaster = "1km17670"
    contourInterval = 5
    baseContour = 0
    # here you need to adjust the output file name (e.g. derive it from the
    # subfolder) if there is a file for every subdir
    outContours = "C:/DATA/ScriptTesting/test/output/contours5.shp"
    arcpy.CheckOutExtension("Spatial")
    Contour(inRaster, outContours, contourInterval, baseContour)
As #a625993 mentioned, os.walk could be useful too if you have recursively nested directories. But as I read your question, you have just single subdirectories which directly contain the files, without further nesting, which is why listing the dirs directly underneath your root directory should be enough.