Python, Copy, Rename and run Commands

I have a little task for my company.
I have multiple files whose names start with swale-randomnumber.
I want to copy them to some directory (does shutil.copy allow wildcards?).
Then I want to choose the largest file, rename it to sync.dat, and run a program.
I get the logic: I will use a loop to do each individual piece of work and then move on to the next. But I am unsure how to choose the single largest file, or a single file at all for that matter, since typing swale* will surely select them all.
Sorry I haven't written any source code yet; I am still trying to get my head around how this will work.
Thanks for any help you may provide.

The accepted answer to this question proposes a nice portable implementation of file copy with wildcard support:
from glob import iglob
from os.path import basename, join
from shutil import copy
def copy_files(src_glob, dst_folder):
    # iglob yields full source paths, so keep only the file name for the destination
    for fname in iglob(src_glob):
        copy(fname, join(dst_folder, basename(fname)))
If you want to compare file sizes, you can use either of these functions:
import os
os.path.getsize(path)
os.stat(path).st_size

This might work:
import glob
import os
import shutil
source = "My Source Path"  # Replace these variables with the appropriate data
dest = "My Dest Path"
command = "My command"
# Find the files that need to be copied
files = glob.glob(os.path.join(source, "swale-*"))
# Copy the files to the destination (shutil.copy takes one file at a time, not a wildcard)
for file in files:
    shutil.copy(file, dest)
# Sort the files by size, biggest first, and take the first item
biggest = sorted(files, key=os.path.getsize, reverse=True)[0]
# Rename the copy of that biggest file to sync.dat
shutil.move(os.path.join(dest, os.path.basename(biggest)), os.path.join(dest, "sync.dat"))
# Run the command
os.system(command)
# Only use os.system if you know your command is completely secure and you don't need the output. Use the subprocess module if you need more security or need the output.
Note: none of this is tested, but it should work.
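As a rough illustration of running the program through subprocess instead of os.system, here is a minimal sketch. The helper name run_command is made up for this example. subprocess.run with check=True raises CalledProcessError on a non-zero exit code, and capture_output=True collects stdout and stderr.

```python
import subprocess

def run_command(args):
    """Run a program given as a list of arguments and return its stdout.

    run_command is a hypothetical helper name, not part of any library.
    Passing a list (rather than one shell string) avoids shell injection.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the program and its arguments as a list means no shell is involved, which is the main safety gain over os.system.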

from os import listdir
from os.path import getsize, isfile, join
directory = '/your/directory/'
# List the files in the directory whose names start with "swale-"
fileList = [join(directory, f) for f in listdir(directory) if f.startswith("swale-") and isfile(join(directory, f))]
# Order it by file size, from big to small
fileList.sort(key=getsize, reverse=True)
# First file in the list is the biggest
biggestFile = fileList[0]
# Do whatever you want with these files, using shutil.*, os.*, or anything else
# ...
# ...
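Putting the pieces from these answers together, the copy, pick-largest, and rename steps could be sketched roughly as follows. The function name, parameter names, and defaults are assumptions made for illustration; only the swale- prefix and the sync.dat name come from the question.

```python
import glob
import os
import shutil

def copy_and_pick_largest(src_dir, dst_dir, pattern="swale-*", new_name="sync.dat"):
    """Copy files matching pattern into dst_dir, then rename the largest copy.

    All parameter names and defaults here are illustrative assumptions.
    Returns the path of the renamed file, or None if nothing matched.
    """
    copied = []
    for src in glob.glob(os.path.join(src_dir, pattern)):
        if os.path.isfile(src):
            # shutil.copy returns the path of the newly created file
            copied.append(shutil.copy(src, dst_dir))
    if not copied:
        return None
    largest = max(copied, key=os.path.getsize)
    target = os.path.join(dst_dir, new_name)
    shutil.move(largest, target)
    return target
```

Using max with key=os.path.getsize avoids sorting the whole list when only the biggest file is needed.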

Related

Iterate directory without joining file with dir (os.join)

I want to know if there is a package that lets me avoid typing os.path.join every time I open a file, so that the handle already contains this information.
From this:
import os
top = '~/folder'
for file in os.listdir(top):
    full_file_path = os.path.join(top, file)
To:
import package
top = '~/folder'
for file in package.listdir(top):
    full_file_path = file

In short: I recommend option 2.
Option 1: A potential quick and dirty approach:
If your goal is to then open the files and you don't want to have to join, a trick could be to change your working directory:
import os
os.chdir(os.path.expanduser('~/folder'))
for file in os.listdir('.'):
    print(file)  # do something with file
Note that file will not contain the full path to your file, but since you changed your working directory (the equivalent of cd), you will be able to access the file with just its name.
Option 2: Cleaner approach - pathlib:
Aside from that, you can also look into pathlib, a module that provides nice path utilities.
For example:
import pathlib
path = pathlib.Path('~/folder').expanduser()
for file in path.glob('*'):
    print(file)  # do something with file
So you get a better idea:
In [23]: path = pathlib.Path('~/Documents').expanduser()
In [24]: [i for i in path.glob('*')]
Out[24]:
[PosixPath('/home/smagnan/Documents/folder1'),
 PosixPath('/home/smagnan/Documents/folder2'),
 PosixPath('/home/smagnan/Documents/file1')]
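To make the pathlib option concrete, here is a small sketch that opens every file in a folder without any explicit join. The helper name read_all is hypothetical, and it assumes the files are text files.

```python
from pathlib import Path

def read_all(folder):
    """Return {file name: text} for every regular file directly inside folder.

    read_all is a hypothetical helper; iterdir() yields Path objects that
    already include the folder, so no os.path.join is needed to open them.
    """
    contents = {}
    for entry in Path(folder).expanduser().iterdir():
        if entry.is_file():
            contents[entry.name] = entry.read_text()
    return contents
```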

You should be able to define an alias on your import:
from os.path import join as pjoin
Or simply import it with from ... import, though join is a bit overloaded as a term, given its implication for strings.
from os.path import join

You can create an alias to simplify the usage of a function:
import os
J = os.path.join
## -or- ##
from os.path import join as J
J(top, file)

copying specific files with shutil()

I have a folder with 7500 images. I need to copy the first 600 images to a new folder using the shutil module in Python.
I tried to look for relevant stuff on the net but the usage of the paths is a bit confusing. What exactly should be my sequence of commands? I guess it will start like:
import os
import shutil
l=os.listdir(path)
for file in l[0:600]:
Edit: after getting clarification on what shutil.copy() does, I came up with:
import os
import shutil
l=os.listdir(path)
for file in l[0:600]:
    shutil.copy(file, destination, *, follow_symlinks = True)
But it's highlighting the comma after * and giving the error "iterable argument unpacking follows keyword argument unpacking". What's going wrong in the syntax?

Well, os.listdir() returns files in arbitrary order. One thing you could do is call os.stat(file).st_mtime on each file, which returns the timestamp of when that file was last modified, and then sort the files by that time to get the first or last files. But it really depends on your use case and what "first" files means for you. When it comes to the shutil library, you can just call:
for file in l[0:600]:
    shutil.copy(os.path.join(path, file), os.path.join('destination', file))
which will copy 600 files into a directory named 'destination' inside your current directory.

os.listdir(path) will list both files and subdirectories in the directory you're searching.
I'm making the assumption that all your files will be .jpg, so I would use the glob module.
import glob
import shutil
path = r"D:Pictures\*.jpg"
destination = r"E:\new_pictures"
files = glob.glob(path)
for f in sorted(files)[:600]:
    shutil.copy(f, destination)
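On the syntax error in the question: the * in the documented signature shutil.copy(src, dst, *, follow_symlinks=True) only marks follow_symlinks as keyword-only; it is not something to type in a call. A minimal sketch of the whole task follows, assuming "first" means first in sorted name order and using a made-up helper name, copy_first_n.

```python
import os
import shutil

def copy_first_n(src_dir, dst_dir, n=600):
    """Copy the first n regular files of src_dir (by sorted name) into dst_dir.

    copy_first_n and its parameters are illustrative names. Returns the
    list of copied file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    names = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    for name in names[:n]:
        # os.listdir returns bare names, so join them with the source folder
        shutil.copy(os.path.join(src_dir, name), dst_dir, follow_symlinks=True)
    return names[:n]
```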

How do I access a similar path to a file that only has a minor difference between computers?

I am trying to access a file from a Box folder while working on two different computers. The file path is pretty much the same except for the username.
I am trying to load a numpy array from a .npy file. I could easily change the path each time, but it would be nice if I could make it universal.
Here is what the line of code looks like on one computer:
y_pred_walking = np.load('C:/Users/Eric/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
And here is what the line of code looks like on the other computer:
y_pred_walking = np.load('C:/Users/erapp/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
The only difference is that the username on one computer is Eric and on the other is erapp. Is there a way to make the line universal, so that it works on any computer that has the Box folder?

You could either save the file to a path that doesn't depend on the user, e.g. 'C:/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy'
Or you could do some string formatting. One way would be to use an environment or configuration variable that indicates the relevant user, and then build the path for your load statement from it:
import os
current_user = os.environ.get("USERNAME")  # assuming you're running on the Windows box as the relevant user
# Format the path with the user name, then load it. An f-string would work just as well.
y_pred_walking = np.load('C:/Users/{user}/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy'.format(user=current_user))

Yes, there is a way. At least for the problem as it stands, the solution is pretty simple: use f-strings.
user='Eric'
y_pred_walking = np.load(f'C:/Users/{user}/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
or, more generally,
def pred_walking(user):
    return np.load(f'C:/Users/{user}/Box/CMU_MBL/Data/Calgary/4_Best_Results/Walking/Knee/bidir_lstm_50_50/predictions/y_pred_test.npy')
so on any machine you just do
y_pred_walking = pred_walking(user)
with user defined beforehand, to get the result.

Simply search the folders recursively for your file:
filename = 'y_pred_test.npy'
import os
import random
# creates 1000 directories with a 1% chance of having the file as well
for k in range(20):
    for i in range(10):
        for j in range(5):
            os.makedirs(f"./{k}/{i}/{j}")
            if random.randint(1,100) == 2:
                with open(f"./{k}/{i}/{j}/{filename}","w") as f:
                    f.write(" ")
# search the directories for your file
found_in = []
# this starts searching in your current folder - you can give it your c:\Users\ instead
for root,dirs,files in os.walk("./"):
    if filename in files:
        found_in.append(os.path.join(root,filename))
print(*found_in,sep = "\n")
File found in:
./17/3/1/y_pred_test.npy
./3/8/1/y_pred_test.npy
./16/3/4/y_pred_test.npy
./16/5/3/y_pred_test.npy
./14/2/3/y_pred_test.npy
./0/5/4/y_pred_test.npy
./11/9/0/y_pred_test.npy
./9/8/1/y_pred_test.npy
If you get read errors because of missing file/directory permissions, you can start directly in the user's home folder:
# Source: https://stackoverflow.com/a/4028943/7505395
from pathlib import Path
home = str(Path.home())
found_in = []
for root,dirs,files in os.walk(home):
    if filename in files:
        found_in.append(os.path.join(root,filename))
# use found_in[0] or break as soon as you find first file
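The "break as soon as you find the first file" variant could look something like this sketch. The function name find_first is an assumption for illustration; it returns on the first match instead of collecting every hit.

```python
import os

def find_first(filename, start_dir):
    """Return the full path of the first file named filename under start_dir.

    find_first is a hypothetical helper. It returns None when nothing is
    found, and stops walking as soon as a match turns up.
    """
    for root, dirs, files in os.walk(start_dir):
        if filename in files:
            return os.path.join(root, filename)
    return None
```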

You can use the expanduser function in the os.path module to modify a path to start from the home directory of a user
https://docs.python.org/3/library/os.path.html#os.path.expanduser
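A minimal sketch of the home-directory idea, here using pathlib.Path.home() as an equivalent of expanding "~". It assumes the Box folder sits directly inside each user's home directory, as the paths in the question suggest; the helper name box_file is made up.

```python
from pathlib import Path

def box_file(*parts):
    """Build a path below the current user's Box folder.

    Assumes Box lives directly in the home directory, as in the question;
    box_file is an illustrative name.
    """
    return Path.home().joinpath("Box", *parts)
```

The result of box_file('CMU_MBL', 'Data', 'Calgary', '4_Best_Results', 'Walking', 'Knee', 'bidir_lstm_50_50', 'predictions', 'y_pred_test.npy') could then be passed to np.load, which accepts pathlib.Path objects.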

sort filenames by their time created on linux [duplicate]

What is the best way to get a list of all files in a directory, sorted by date [created | modified], using python, on a windows machine?

I've done this in the past for a Python script to determine the last updated files in a directory:
import glob
import os
search_dir = "/mydir/"
# remove anything from the list that is not a file (directories, symlinks)
# thanks to J.F. Sebastion for pointing out that the requirement was a list
# of files (presumably not including directories)
files = list(filter(os.path.isfile, glob.glob(search_dir + "*")))
files.sort(key=lambda x: os.path.getmtime(x))
That should do what you're looking for based on file mtime.
EDIT: Note that you can also use os.listdir() in place of glob.glob() if desired - the reason I used glob in my original code was that I was wanting to use glob to only search for files with a particular set of file extensions, which glob() was better suited to. To use listdir here's what it would look like:
import os
search_dir = "/mydir/"
os.chdir(search_dir)
files = filter(os.path.isfile, os.listdir(search_dir))
files = [os.path.join(search_dir, f) for f in files] # add path to each file
files.sort(key=lambda x: os.path.getmtime(x))

Update: to sort dirpath's entries by modification date in Python 3:
import os
from pathlib import Path
paths = sorted(Path(dirpath).iterdir(), key=os.path.getmtime)
(put @Pygirl's answer here for greater visibility)
If you already have a list of filenames, files, then to sort it in place by creation time on Windows (make sure the list contains absolute paths):
files.sort(key=os.path.getctime)
You could get the list of files, for example, using glob as shown in @Jay's answer.
old answer
Here's a more verbose version of @Greg Hewgill's answer. It conforms most closely to the question's requirements. It makes a distinction between creation and modification dates (at least on Windows).
#!/usr/bin/env python
from stat import S_ISREG, ST_CTIME, ST_MODE
import os, sys, time
# path to the directory (relative or absolute)
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'
# get all entries in the directory w/ stats
entries = (os.path.join(dirpath, fn) for fn in os.listdir(dirpath))
entries = ((os.stat(path), path) for path in entries)
# leave only regular files, insert creation date
entries = ((stat[ST_CTIME], path)
           for stat, path in entries if S_ISREG(stat[ST_MODE]))
# NOTE: on Windows `ST_CTIME` is a creation date
#  but on Unix it could be something else
# NOTE: use `ST_MTIME` to sort by a modification date
for cdate, path in sorted(entries):
    print(time.ctime(cdate), os.path.basename(path))
Example:
$ python stat_creation_date.py
Thu Feb 11 13:31:07 2009 stat_creation_date.py

There is an os.path.getmtime function that gives the modification time as the number of seconds since the epoch. It calls os.stat internally, so it is a convenience rather than a speed-up.
import os
os.chdir(directory)
sorted(filter(os.path.isfile, os.listdir('.')), key=os.path.getmtime)

Here's my version:
import os
def getfiles(dirpath):
    a = [s for s in os.listdir(dirpath)
         if os.path.isfile(os.path.join(dirpath, s))]
    a.sort(key=lambda s: os.path.getmtime(os.path.join(dirpath, s)))
    return a
First, we build a list of the file names. isfile() is used to skip directories; it can be omitted if directories should be included. Then, we sort the list in place, using the modification time as the key.

Here's a one-liner:
import os
import time
from pprint import pprint
pprint([(x[0], time.ctime(x[1].st_ctime)) for x in sorted([(fn, os.stat(fn)) for fn in os.listdir(".")], key = lambda x: x[1].st_ctime)])
This calls os.listdir() to get a list of the filenames, then calls os.stat() for each one to get its st_ctime (the creation time on Windows; on Unix it is the time of the last metadata change), then sorts by that time.
Note that this method only calls os.stat() once for each file, which will be more efficient than calling it for each comparison in a sort.
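Because st_ctime means different things on different platforms (creation time on Windows, last metadata change on Unix), a best-effort sketch might prefer st_birthtime where os.stat provides it and fall back otherwise. The function names here are hypothetical, and on Linux the fallback is still not a true creation time.

```python
import os

def creation_time(path):
    """Best-effort creation time of path, in seconds since the epoch.

    Uses st_birthtime where os.stat provides it (for example on macOS),
    and otherwise falls back to st_ctime, which is the creation time on
    Windows but the last metadata change on Linux.
    """
    st = os.stat(path)
    return getattr(st, "st_birthtime", st.st_ctime)

def sorted_by_creation(dirpath):
    """Return the regular files in dirpath as full paths, oldest first."""
    paths = [os.path.join(dirpath, name) for name in os.listdir(dirpath)]
    return sorted(filter(os.path.isfile, paths), key=creation_time)
```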

In python 3.5+
from pathlib import Path
sorted(Path('.').iterdir(), key=lambda f: f.stat().st_mtime)

Without changing directory:
import os
path = '/path/to/files/'
name_list = os.listdir(path)
full_list = [os.path.join(path, i) for i in name_list]
time_sorted_list = sorted(full_list, key=os.path.getmtime)
print(time_sorted_list)
# if you want just the filenames sorted, simply remove the dir from each
sorted_filename_list = [os.path.basename(i) for i in time_sorted_list]
print(sorted_filename_list)

from pathlib import Path
import os
sorted(Path('./').iterdir(), key=lambda t: t.stat().st_mtime)
or
sorted(Path('./').iterdir(), key=os.path.getmtime)
or
sorted(os.scandir('./'), key=lambda t: t.stat().st_mtime)
where mtime is the modification time.

Here's my answer using glob without filter, if you want to read files with a certain extension in date order (Python 3).
import glob
import os
dataset_path = '/mydir/'
files = glob.glob(os.path.join(dataset_path, "morepath", "*.extension"))
files.sort(key=os.path.getmtime)

# *** the shortest and best way ***
# getmtime --> sort by modified time
# getctime --> sort by created time
import glob,os
lst_files = glob.glob("*.txt")
lst_files.sort(key=os.path.getmtime)
print("\n".join(lst_files))

import os
sorted(filter(os.path.isfile, os.listdir('.')),
       key=lambda p: os.stat(p).st_mtime)
You could use next(os.walk('.'))[-1] instead of filtering with os.path.isfile, but that leaves dead symlinks in the list, and os.stat will fail on them.

For completeness with os.scandir (2x faster over pathlib):
import os
sorted(os.scandir('/tmp/test'), key=lambda d: d.stat().st_mtime)
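A slightly fuller sketch of the os.scandir approach, which skips directories and returns plain file names. The helper name files_by_mtime and its newest_first parameter are assumptions for illustration.

```python
import os

def files_by_mtime(dirpath, newest_first=False):
    """Return names of regular files in dirpath sorted by modification time.

    files_by_mtime is a hypothetical helper. DirEntry.stat() caches its
    result on the entry, so repeated calls do not hit the filesystem again.
    """
    with os.scandir(dirpath) as entries:
        files = [e for e in entries if e.is_file()]
        files.sort(key=lambda e: e.stat().st_mtime, reverse=newest_first)
    return [e.name for e in files]
```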

This is a basic version for learning:
import os, sys
import time
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'
# change into the directory once, before listing it
os.chdir(dirpath)
for i in os.listdir('.'):
    data_001 = os.path.realpath(i)
    listdir_stat1 = os.stat(data_001)
    print(time.ctime(listdir_stat1.st_ctime), data_001)

Alex Coventry's answer will produce an exception if the file is a symlink to a nonexistent file. The following code corrects that answer:
import os
import time
sorted(filter(os.path.isfile, os.listdir('.')),
       key=lambda p: os.stat(p).st_mtime if os.path.exists(p) else time.time())
When the file doesn't exist, the current time is used, and the symlink will go at the very end of the list.

This was my version:
import os
folder_path = r'D:\Movies\extra\new\dramas'  # your path
os.chdir(folder_path)  # make the path active
x = sorted(os.listdir(), key=os.path.getctime)  # sorted using creation time
for folder in x:
    print(folder)  # print every name inside folder_path

Here are a simple couple of lines that filter by extension and also provide a sort option:
import os
import re
def get_sorted_files(src_dir, regex_ext='.*', sort_reverse=False):
    # regex_ext is a regular expression for the extension, e.g. 'txt' or 'jpe?g'
    files_to_evaluate = [os.path.join(src_dir, f) for f in os.listdir(src_dir) if re.search(r'.*\.({})$'.format(regex_ext), f)]
    files_to_evaluate.sort(key=os.path.getmtime, reverse=sort_reverse)
    return files_to_evaluate

Put the directory in path and, if you want a specific file type, add the file extension to the glob pattern; the file names then come back in chronological order.
This works for me.
import glob, os
path = os.path.expanduser(file_location+"/"+date_file)  # file_location and date_file are defined elsewhere
os.chdir(path)
saved_file=glob.glob('*.xlsx')
saved_file.sort(key=os.path.getmtime)
print(saved_file)

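os.listdir makes no guarantee about ordering, so reversing its output is not a reliable way to get the most recently modified files first. Sort explicitly instead:
import os
last_modified = sorted(os.listdir(), key=os.path.getmtime, reverse=True)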

Maybe you should use shell commands. In Unix/Linux, find piped with sort will probably be able to do what you want.

how can I save the output of a search for files matching *.txt to a variable?

I'm fairly new to Python. I'd like to save the text that is printed by this script as a variable. (The variable is meant to be written to a file later, if that matters.) How can I do that?
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)

You can store it in a variable like this:
import fnmatch
import os
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_var = file
        # do your stuff
or you can store it in a list for later use:
import fnmatch
import os
my_match = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
        my_match.append(file)  # append adds the value at the end of the list
# do stuff with my_match list

You can store it in a list:
import fnmatch
import os
matches = []
for file in os.listdir("/Users/x/y"):
    if fnmatch.fnmatch(file, '*.txt'):
        matches.append(file)

Both answers already provided are correct, but Python provides a nice alternative. Since iterating through an array and appending to a list is such a common pattern, the list comprehension was created as a one-stop shop for the process.
import fnmatch
import os
matches = [filename for filename in os.listdir("/Users/x/y") if fnmatch.fnmatch(filename, "*.txt")]

While NSU's answer and the others are all perfectly good, there may be a simpler way to get what you want.
Just as fnmatch tests whether a certain file matches a shell-style wildcard, glob lists all files matching a shell-style wildcard. In fact:
This is done by using the os.listdir() and fnmatch.fnmatch() functions in concert…
So, you can do this:
import glob
matches = glob.glob("/Users/x/y/*.txt")
But notice that in this case, you're going to get full pathnames like '/Users/x/y/spam.txt' rather than just 'spam.txt', which may not be what you want. Often, it's easier to keep the full pathnames around and os.path.basename them when you want to display them, than to keep just the base names around and os.path.join them when you want to open them… but "often" isn't "always".
Also notice that I had to manually paste the "/Users/x/y/" and "*.txt" together into a single string, the way you would at the command line. That's fine here, but if, say, the first one came from a variable, rather than hardcoded into the source, you'd have to use os.path.join(basepath, "*.txt"), which isn't quite as nice.
By the way, if you're using Python 3.4 or later, you can get the same thing out of the higher-level pathlib library:
import pathlib
matches = list(pathlib.Path("/Users/x/y/").glob("*.txt"))
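Since the question mentions writing the result to a file later, here is a small sketch that keeps only the base names and saves them one per line. Both function names are made up for this example.

```python
from pathlib import Path

def matching_names(folder, pattern="*.txt"):
    """Return the sorted base names in folder that match pattern."""
    return sorted(p.name for p in Path(folder).glob(pattern))

def save_names(names, out_file):
    """Write one name per line to out_file and return the text written."""
    text = "\n".join(names)
    Path(out_file).write_text(text)
    return text
```

Returning the joined text gives a single string variable, which is what the question asked for, while the list from matching_names stays available for further processing.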

Maybe defining a utility function is the right path to follow...
def list_ext_in_dir(e, d):
    """e=extension, d= directory => list of matching filenames.
    If the directory d cannot be listed returns None."""
    from fnmatch import fnmatch
    from os import listdir
    try:
        dirlist = listdir(d)
    except OSError:
        return None
    return [fname for fname in dirlist if fnmatch(fname, e)]
I have put the dirlist assignment inside a try/except clause to catch the possibility that we cannot list the directory (non-existent, no read permission, etc.). The treatment of errors is a bit simplistic, but...
The list of matching filenames is built using a so-called list comprehension, which is something you should investigate as soon as possible if you're going to use Python for your programs.
To close my post, a usage example:
l_txtfiles = list_ext_in_dir('*.txt', '/Users/x/y')
