I an pretty new to python and have been given a problem to solve in my lab, how could one possibly remove specific files ending with .gff or another ending if the file is empty? The files were all just created and are all in the same directory.
for getting files of a particular kind out of a directory glob works very well glob.glob("./file/path/*.gff") will return a list of files ending in .gff
as for finding the size of the file use os.stat("./file/path/blah.gff").st_size
Related
Since it looks like Pathlib is the future, I'm trying to refactor some of my code to change from from previous use of os to Pathlib. I'm stuck with the following problem. Since I work with a Mac, sometimes the folders contain hidden files preceded by a period (.DS_Store or names from deleted files preceded by ._). That gets me into a lot of problems when I loop through files in a directory that have certain extension. To avoid this problem using os.walk, I do the following:
for root, dirs, files in os.walk(DIR_NAME):
# iterate all files
for file in files:
if file.endswith(ext):
if file.startswith("."):
continue
do something with the file
I know we have the .stem and .suffix method to manipulate file names with Pathlib, but I don't see how they can help with this problem. The .startswith seems more intuitive but alas it does not seem to be available in Pathlib. So, the question is, how would one go about doing this in Pathlib?
Say I have a file that contains the different locations where some '.wav' files are present on a server. For example say the content of the text file location.txt containing the locations of the wav files is this
/home/user/test_audio_folder_1/audio1.wav
/home/user/test_audio_folder_2/audio2.wav
/home/user/test_audio_folder_3/audio3.wav
/home/user/test_audio_folder_4/audio4.wav
/home/user/test_audio_folder_5/audio5.wav
Now what I want to do is that I want to copy these files from different locations within the server to a particular directory within that server, for example say /home/user/final_audio_folder/ and this directory will contain all the audio files from audio1.wav to audio5.wav
I am trying to perform this task by using shutil, but the main problem with shutil that I am facing is that while copying the files, I need to name the file. I have written a demo version of what I am trying to do, but dont know how to scale it when I will be reading the paths of the '.wav' files from the txt file and copy them to my desired location using a loop.
My code for copying a single file goes as follows,
import shutil
original = r'/home/user/test_audio_folder_1/audio1.wav'
target=r'/home/user/final_audio_folder_1/final_audio1.wav'
shutil.copyfile(original,target)
Any suggestions will be really helpful. Thank you.
import shutil
i=0
with open(r'C:/Users/turing/Desktop/location.txt', "r") as infile:
for t in infile:
i+=1
x="audio"+str(i)+".wav"
t=t.rstrip('\n')
original= r'{}'.format(t)
target=r'C:/Users/turing/Desktop/audio_in/' + x
shutil.copyfile(original, target)
Use the built-in string's split() method within a for loop on the location.txt contents & split the name of the directory on the '/' character, then the last element in a new list would be your filename.
This question already has answers here:
Using os.walk() to recursively traverse directories in Python
(14 answers)
Closed 2 years ago.
I hope everyone is staying safe and sound.
Could you please help me with file name change in certain directory?
I am writing a script for RDA to download documents from this website, and when I download these files from this website, file names do not have pattern, it is completely random. So I want to rename these files into certain pattern.
For example,
1st downloaded file name: filedownload.pdf
rename to: 1234567.pdf
2nd downloaded file name: extractpages.pdf
rename to: 1234568.pdf
new names are set in parameter so that part is good but as I don't know what downloaded file name will be, I can't change this file name into new name.
So what I had in mind is for each downloaded file, put it in this folder without any files, and whenever any file is located in this folder, it renames the file and put it in different folder.
Below is the code I wrote;
filepath = "originalFilePath/downloadedfile.pdf"
if os.path.isfile(filepath):
os.rename(r'originalFilePath/downloadedfile.pdf',
'newFilePath' + str(parameter['search_number']) + '.pdf')
But I would like to change file names no matter what the downloaded file name is.
Please help, thank you very much!!
I think your idea is quite correct.
First, you will list all file name in a folder before downloading anything, store that list in a variable, let's say current_files. Then you start downloading files from the web, after that, you list all the file name again and store in the second list. Now you will compare 2 lists and know what files are new (just downloaded). Then you just simply loop over these new files and rename it in a pattern that you want. I will add code in a minute
import glob
current_files = glob.glob("/path/Downloads/*.pdf")
# ['path/Downloads/abc.pdf', 'path/Downloads/xyz.pdf']
# your code to download files from the web
new_files = set(glob.glob("/path/Downloads/*.pdf")) - set(current_files)
# ['path/Downloads/just_downloaded1.pdf', 'path/Downloads/just_downloaded2.pdf']
for file in new_files:
#do the rename code here to your new files
You question is not totally clear.
Anyway I think you have an issue with your code, you have to add / after newFilePath.
Use os.renames(old, new).
os.renames(old, new)
Recursive directory or file renaming function. Works like rename(), except creation of any intermediate directories needed to make the new pathname good is attempted first. After the rename, directories corresponding to rightmost path segments of the old name will be pruned away using removedirs().
References
https://docs.python.org/3/library/os.html
I am getting a FileNotFound error when iterating through a list of files in Python on Windows.
The specific error I get looks like:
FileNotFoundError: File b'fileName.csv' does not exist
In my code, I first ask for input on where the file is located and generate a list using os (though I also tried glob):
directory = input('In what directory are your files located?')
fileList = [s for s in os.listdir(directory) if s.endswith('.csv')]
When I print the list, it does not contain byte b before any strings, as expected (but I still checked). My code seems to break at this step, which generates the error:
for file in fileList:
pd.read_csv(file) # breaks at this line
I have tried everything I could find on Stack Overflow to solve this problem. That includes:
Putting an r or b or rb right before the path string
Using both the relative and absolute file paths
Trying different variations of path separators (/, \, \\, etc.)
I've been dealing with Windows-related issues lately (since I normally work in Mac or Linux) so that was my first suspicion. I'd love another set of eyes to help me figure out where the snag is.
A2A.
Although the list was generated correctly because the full directory path was used, the working directory when the file was being run was .. You can verify this by running os.getcwd(). When later iterating through the list of file names, the program could not find those file names in the . directory, as it shouldn't. That list is just a list of file names; there was no directory tied to it so it used the current directory.
The easiest fix here is to change the directory the file is running through after the input. So,
directory = input('In what directory are your files located?')
os.chdir(directory) # Points os to given directory to avoid byte issue
If you need to access multiple directories, you could either do this right before you switch to each directory or save the full file path into your list instead.
I am trying to match file names within a folder using python so that I can run a secondary process on the files that match. My file names are such that they begin differently but match strings at some point as below:
3322_VEGETATION_AREA_2009_09
3322_VEGETATION_LINE_2009_09
4522_VEGETATION_POINT_2009_09
4422_VEGETATION_AREA_2009_09
8722_VEGETATION_LINE_2009_09
2522_VEGETATION_POINT_2009_09
4222_VEGETATION_AREA_2009_09
3522_VEGETATION_LINE_2009_09
3622_VEGETATION_POINT_2009_09
Would regex be the right approach to matching those files after the first underscore or am I overthinking this?
import glob
files = glob.glob("*VEGETATION*")
should do the trick. It should find all files in the current directory that contain "VEGETATION" somewhere in the filename