Finding the latest file in a folder using Python

I've searched for an answer to this, but the existing answers still gave me an error message, and I couldn't ask there because I had to post a new question. So here goes...
I need my Python script to use the latest file in a folder.
I've tried several things; currently the code looks like this:
import glob
import os
list_of_files = glob.glob('/my/path/*.csv')
latest_file = max(list_of_files, key=os.path.getmtime)
But the code fails with the following error:
ValueError: max() arg is an empty sequence
Does anyone have an idea why?

Your code is fine as long as the list is not empty, but here it apparently is: max() raises that ValueError when glob finds no matching files. So first check whether the list is empty, by printing it or something similar.
I tested this code and it worked fine:
import os
import glob
mypath = "C:/Users/<Your username>/Downloads/*.*"
print(min(glob.glob(mypath), key=os.path.getmtime))
print(max(glob.glob(mypath), key=os.path.getmtime))
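Putting both suggestions together, here is a minimal sketch that checks for an empty match list before calling max(). The helper name latest_file is hypothetical, and returning None when nothing matches is only one possible choice; raising a clearer error would work too.

```python
import glob
import os


def latest_file(pattern):
    """Return the most recently modified path matching pattern, or None if nothing matches."""
    files = glob.glob(pattern)
    if not files:
        # max() raises "ValueError: max() arg is an empty sequence" on an empty list,
        # so report "no match" explicitly instead
        return None
    # os.path.getmtime gives the last-modification time of each file
    return max(files, key=os.path.getmtime)
```

For the question's folder this would be latest_file('/my/path/*.csv'). On Python 3.4 and later, max(files, key=os.path.getmtime, default=None) is an equivalent one-liner.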

glob.glob does not match files whose names start with a dot (.) unless the pattern itself starts with a dot.
So, if you want to match these files, use a pattern that starts with a dot. For example, assuming a directory containing .picture.png:
import glob
glob.glob('.p*') #assuming you're already in the directory
It is also a good idea to check how many files matched before operating on them.
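As a sketch of the dot-file rule: since only a pattern starting with a dot can match hidden names, one way to get both kinds is to run two globs and combine them. The helper name and the .png suffix are just for illustration. On Python 3.11 and later, glob.glob() also accepts include_hidden=True, which avoids the second pattern.

```python
import glob
import os


def glob_with_hidden(directory, suffix):
    """Match files ending in suffix, including ones whose names start with a dot."""
    # '*.png' skips hidden files such as '.picture.png'
    visible = glob.glob(os.path.join(directory, '*' + suffix))
    # '.*.png' starts with a dot, so it is allowed to match hidden names
    hidden = glob.glob(os.path.join(directory, '.*' + suffix))
    return sorted(visible + hidden)
```

Checking len() of the returned list before processing covers the "how many files matched" advice above.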

Related

How does this batch FOR command translate to Python?

I have been trying to figure out how to translate this simple batch code (which deletes every empty directory in a tree) into Python, and it is taking me an unreasonable amount of time. I kindly ask for a solution with a detailed explanation; I believe it will jumpstart my understanding of the language. I'm in danger of giving up.
for /d /r %%u in (*) do rmdir "%%u"
I do have my own grotesque version that I am trying to fix, which must be wrong in all sorts of ways. I would prefer to use the shutil module, if it is suitable.
for dirpath in os.walk("D:\\SOURCE")
os.rmdir(dirpath)
If you only want to delete the empty directories, then pathlib.Path(..).glob(..) would work:
import os
from pathlib import Path
emptydirs = [d for d in Path('.').glob('**/*')  # go through everything under '.'
             if d.is_dir() and not os.listdir(str(d))]  # include only directories without contents
for empty in emptydirs:  # iterate over all found empty directories
    os.rmdir(empty)  # .. and remove
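One limitation of the list above is that it is built before anything is deleted, so a directory that only becomes empty once its empty subdirectories are removed is left behind. A sketch that handles this, and which is closer to the os.walk() attempt in the question, walks the tree bottom-up with topdown=False. The function name is hypothetical, and this version deliberately keeps the top-level directory even if it ends up empty.

```python
import os


def remove_empty_dirs(root):
    """Delete every empty directory below root, deepest first; return the removed paths."""
    removed = []
    # topdown=False yields each directory only after all of its subdirectories
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # keep the top-level directory itself
        # list the directory again: dirnames was read before its children were removed
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

For the question's tree this would be called as remove_empty_dirs(r'D:\SOURCE').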
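If you want to delete a directory and everything under it, the shutil.rmtree(..) function can do it in one line. Note that it also removes the top-level directory itself, so it is not a way to empty the current directory; pass the path of the directory you want gone:
import shutil
shutil.rmtree('path/to/directory')  # hypothetical path; deletes the directory and all its contents
Check the docs for all the details (https://docs.python.org/2/library/shutil.html#shutil.rmtree).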

Python glob.glob always returns empty list

I'm trying to use glob and os to locate the most recent .zip file in a directory. Funny thing is, I had the following set up and it was working previously:
max(glob.glob('../directory/*.zip'), key=os.path.getctime)
Running this now gets me max() arg is an empty sequence, which makes sense because when I try this:
glob.glob('../directory/*.zip')
it returns nothing but an empty list. Using the full path also gets me an empty list. Trying other directories also gets me an empty list. I'm very confused about what's going on here given this worked perfectly previously. Help?
EDIT: Got it to work again using:
glob.glob('/Users/*/directory/*.zip')
You want the ** glob operator:
glob.glob('**/*.zip', recursive=True)
This matches every file ending in .zip in the current directory and in all of its subdirectories. (The recursive argument requires Python 3.5 or later.)
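Combining this with the max() call from the question gives a sketch like the following. The function name is hypothetical, and default=None (Python 3.4+) makes max() return None instead of raising ValueError when nothing matches.

```python
import glob
import os


def newest_zip(directory):
    """Return the .zip under directory (searched recursively) with the latest ctime, or None."""
    # '**' with recursive=True matches the directory itself and every subdirectory below it
    pattern = os.path.join(directory, '**', '*.zip')
    zips = glob.glob(pattern, recursive=True)
    # getctime matches the question; default=None avoids the empty-sequence ValueError
    return max(zips, key=os.path.getctime, default=None)
```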
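In my case, I had forgotten to escape the special character [ in the directory name, which glob.escape(pathname) does for you.
So instead of glob.glob(pathname), try glob.glob(glob.escape(pathname)). If the path also contains a wildcard you want to keep, escape only the literal part, for example glob.glob(os.path.join(glob.escape(directory), '*.zip')), because escaping the whole pattern turns * into a literal character too.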
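A small sketch of that approach, assuming a hypothetical directory name containing [ and ]. Only the literal directory part is escaped, so the * in the file pattern still acts as a wildcard.

```python
import glob
import os


def zips_in(directory):
    """List .zip files directly inside directory, even if its name contains [, ], * or ?."""
    # glob.escape() makes the special characters of the literal path match themselves
    return glob.glob(os.path.join(glob.escape(directory), '*.zip'))
```

Without the escape, a name like 'data [v1]' is read as the character class [v1], so the directory itself is never matched.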
I had a lot of problems with globbing on Ubuntu.
This code works fine on Windows:
import glob
import cv2
cv_img = []
for img in glob.glob('/home/itisha/Desktop/op/*.JPG'):
    print('hi')
    n = cv2.imread(img)
    cv_img.append(n)
But on Ubuntu I had to replace the for line with:
for img in glob.glob('/home/*itisha/*Desktop/*op/*.JPG'):
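One difference that often causes this kind of problem (an assumption about this particular case, not something the answer above confirms) is case sensitivity: on Linux, glob matches names case-sensitively, so *.JPG does not match photo.jpg, whereas on Windows it does. A character class per letter is one way to match the extension in either case; the helper name is hypothetical.

```python
import glob
import os


def find_jpgs(directory):
    """Return .jpg files in directory whatever the case of the extension (.jpg, .JPG, .Jpg, ...)."""
    # each [xX] class accepts either case, so this also works on case-sensitive Linux filesystems
    return sorted(glob.glob(os.path.join(directory, '*.[jJ][pP][gG]')))
```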

How to input multiple files from a directory

First and foremost, I am new to Unix. I have tried to find a solution to my question online, but could not find one.
I am running Python from my Unix terminal, and I have a program that parses XML files and writes the results to a .dat file.
My program works, but I have to pass in every single XML file (there are over 50) individually.
For example:
clamshell: python3 my_parser2.py 'items-0.xml' 'items-1.xml' 'items-2.xml' 'items-3.xml' .....
So I was wondering: is it possible to have my program read all of the files from the directory that contains them, rather than typing every XML file name individually and running the program that way?
Any help on this is greatly appreciated.
import glob
listOffiles = glob.glob('directory/*.xml')
The shell itself can expand wildcards so, if you don't care about the order of the input files, just use:
python3 my_parser2.py items-*.xml
If the numeric order is important (you want 0-9, then 10-99 and so on, in that order), you may have to adjust the wildcard arguments slightly to guarantee this, for example:
python3 my_parser2.py items-[0-9].xml items-[1-9][0-9].xml items-[1-9][0-9][0-9].xml
python3 my_parser2.py *.xml should work.
Other than the command line option, you could just use glob from within your script and bypass the need for command arguments:
import glob
filenames = glob.glob("*.xml")
This will return all .xml files (as filenames) in the directory from which you are running the script.
Then, if needed, you can simply iterate through all the files with a basic loop:
for file in filenames:
    with open(file, 'r') as f:
        pass  # do stuff to f
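If the order matters, another option is to sort inside the script instead of relying on the shell. This sketch uses the file names passed on the command line when there are any and otherwise falls back to glob, then sorts by the first number in each name so items-2.xml comes before items-10.xml. The items-*.xml pattern comes from the question; the function names are hypothetical.

```python
import glob
import os
import re


def numeric_key(name):
    """Sort key: the first run of digits in the file name, then the name itself."""
    match = re.search(r'\d+', os.path.basename(name))
    return (int(match.group()) if match else -1, name)


def input_files(argv):
    """Use the file names passed on the command line, or glob for them if none were given."""
    names = argv[1:] or glob.glob('items-*.xml')
    return sorted(names, key=numeric_key)
```

Inside my_parser2.py you would then loop over input_files(sys.argv) (after import sys), which works both for python3 my_parser2.py items-*.xml and for a bare python3 my_parser2.py.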

How to loop through the list of .tar.gz files using linux command in python

Using python 2.7
I have a list of *.tar.gz files on a Linux box. Using Python, I want to loop through the files and extract each one to a different location, under its own folder.
For example, if my file name is ~/TargetData/zip/1440198002317590001.tar.gz, then I want to untar and ungzip it to a different location under its own folder name, i.e. ~/TargetData/unzip/1440198002317590001.
I have written some code, but I am not able to loop through the files. On the command line I am able to untar using the command $ tar -czf 1440198002317590001.tar.gz 1440198002317590001, but I want to be able to loop through the .tar.gz files. The code is below. Here I'm not able to loop over just the files, or print only the files. Can you please help?
import os
inF = []
inF = str(os.system('ls ~/TargetData/zip/*.tar.gz'))
#print(inF)
if inF is not None:
    for files in inF[:-1]:
        print files
"""
os.system('tar -czf files /unzip/files[:-7]')
# This is what i am expecting here files = "1440198002317590001.tar.gz" and files[:-7]= "1440198002317590001"
"""
Have you ever worked on this type of use case? Your help is greatly appreciated!! Thank you!
I think you have misunderstood os.system(): it does run the command, but its return value is not what you expected. It returns the command's exit status (0 on success), not its output, so you cannot assign the output to a variable this way. You could consider the subprocess module instead (see its documentation). However, I DO NOT recommend listing files that way either (you would get one string rather than a list; see the docs for the details).
The best way, I think, is the glob module (see its documentation). glob.glob(pattern) puts all the files that match the pattern in a list, which you can then loop over easily.
Of course, if you are familiar with the os module, you can also do this with os.listdir(), os.path.join(), and os.path.expanduser(). (Unlike glob, os.listdir() returns only the file names, without the full path, so you need to rebuild each path yourself.)
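A sketch of that os-based alternative: expanduser() turns ~ into the home directory, and because listdir() returns bare names, each one is joined back onto the directory. The function name is hypothetical; the directory is the one from the question.

```python
import os


def list_tar_gz(directory):
    """Return full paths of the .tar.gz files directly inside directory."""
    # os.listdir() does not understand '~', so expand it first
    directory = os.path.expanduser(directory)
    return sorted(
        # listdir() gives bare file names, so rebuild the full path
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith('.tar.gz')
    )
```

Usage: list_tar_gz('~/TargetData/zip').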
By the way, for your purpose here there is no need to declare an empty list first (i.e. inF = []).
For the unzipping part, you can use os.system, but I again recommend the subprocess module instead; the subprocess documentation explains why.
Don't look at the following code until you really can't solve this by yourself.
import os
import glob
import subprocess
# glob does not expand "~" itself, so expand it to the home directory first
inF = glob.glob(os.path.expanduser('~/TargetData/zip/*.tar.gz'))
if inF:
    for files in inF:
        # ~/TargetData/zip/NAME.tar.gz -> ~/TargetData/unzip/NAME (drop the 7-character '.tar.gz')
        unzip_name = files.replace('/zip/', '/unzip/')[:-7]
        # tar -C needs the target folder to exist, so create it (and its parents) if needed
        if not os.path.isdir(unzip_name):
            os.makedirs(unzip_name)
        # subprocess.call() instead of os.system; each argument is a separate list item
        subprocess.call(['tar', '-xzf', files, '-C', unzip_name])
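If shelling out to tar is not a requirement, Python's own tarfile module can do the extraction without subprocess. This is a swapped-in technique, not what the answer above uses, and it is written for Python 3 (the question uses 2.7). The directory layout follows the question; the function name is hypothetical. Only extract archives you trust.

```python
import os
import tarfile


def extract_all(src_dir, dest_dir):
    """Extract every NAME.tar.gz in src_dir into dest_dir/NAME; return the target folders."""
    src_dir = os.path.expanduser(src_dir)
    dest_dir = os.path.expanduser(dest_dir)
    targets = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith('.tar.gz'):
            continue
        # strip the '.tar.gz' suffix to get the folder name
        target = os.path.join(dest_dir, name[:-len('.tar.gz')])
        os.makedirs(target, exist_ok=True)
        with tarfile.open(os.path.join(src_dir, name), 'r:gz') as archive:
            if hasattr(tarfile, 'data_filter'):
                # newer Pythons can reject unsafe members (absolute paths, '..', etc.)
                archive.extractall(target, filter='data')
            else:
                archive.extractall(target)
        targets.append(target)
    return targets
```

Usage: extract_all('~/TargetData/zip', '~/TargetData/unzip').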

Getting a list of files in a custom directory using glob()

I'm trying to write a program that renames files in a directory that the user types in.
I'm at a very early stage, and this is my first time using the os and glob modules.
My code is below. However, when I run it, the result is an empty list. When I type a directory path into the glob call directly it somehow works, but the result isn't what I wanted.
Hope you guys can help me.
Thanks.
import os, glob
def fileDirectory():
#Asks the user for a file root directory
fileroot = raw_input("Please input the file root directory \n\n")
#Returns a list with all the files inside the file root directory
filelist = glob.glob(fileroot)
print filelist
fileDirectory()
Python is whitespace-sensitive, so you need to make sure that everything you want inside the function is indented.
Stack Overflow has its own indentation requirements for code, which makes it hard to be sure what indentation your code originally had.
import os, glob
def fileDirectory():
    #Asks the user for a file root directory
    fileroot = raw_input("Please input the file root directory \n\n")
    #Returns a list with all the files inside the file root directory
    filelist = glob.glob(fileroot)
    print filelist
fileDirectory()
The next thing is that glob returns the paths matching a glob pattern; it doesn't list a directory, which appears to be what you're trying to do.
Either you want os.listdir or os.walk, or you should actually ask the user for a glob expression rather than a directory.
Finally, raw_input() might give you some extra whitespace that you'll have to strip off. Check what fileroot actually contains.
You might want to split up your program, so that you can investigate each function separately.
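A sketch of the "list a directory" version in Python 3 (input() instead of raw_input()): strip the user's input, escape it so characters like [ in the path are taken literally, and add a * pattern so glob lists what is inside. The function name is hypothetical; os.listdir() would work equally well here.

```python
import glob
import os


def files_in(directory):
    """Return the regular files directly inside directory, sorted by path."""
    # remove stray whitespace/newlines the user may have typed
    directory = directory.strip()
    # escape the literal directory, then add '*' so glob lists its contents
    pattern = os.path.join(glob.escape(directory), '*')
    # glob('*') also returns subdirectories; keep only files
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

Usage: files_in(input("Please input the file root directory\n\n")). Keeping the prompt separate from files_in() also follows the advice to split the program up so each part can be tested on its own.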
