I have a Python script that compares existing file names in a folder to a reference table and then determines if it needs to be renamed or not.
As it loops through each filename:
'oldname' = the current file name
'newname' = what it needs to be renamed to
I want to rename the file and move it to a new folder "..\renamedfiles".
Can I do the rename and the move at the same time as it iterates through the loop?
Yes, you can. In Python, the move function in the shutil library does this.
Let's say on Linux, you have a file in /home/user/Downloads folder named "test.txt" and you want to move it to /home/user/Documents and also change the name to "useful_name.txt". You can do both things in the same line of code:
import shutil
shutil.move('/home/user/Downloads/test.txt', '/home/user/Documents/useful_name.txt')
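In your case, pass the variables themselves (not quoted strings) and build the destination path from the folder and the new name:
import os
import shutil
shutil.move(oldname, os.path.join('..', 'renamedfiles', newname))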
os.rename (and os.replace) will fail if the source and target locations are on different partitions/drives/devices. If that's the case, you need to use shutil.move, which will use atomic renaming if possible, and fall back to copy-then-delete if the destination is not on the same file system. It's perfectly happy to both move and rename in the same operation; the call is the same either way.
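A minimal sketch of doing this inside a loop. The helper name move_and_rename and the rename_map dict (old name to new name, standing in for the reference table in the question) are hypothetical. The destination folder is created up front because shutil.move does not create missing directories.

```python
import shutil
from pathlib import Path


def move_and_rename(src_dir, dest_dir, rename_map):
    """Move each file listed in rename_map from src_dir to dest_dir under its new name.

    rename_map is a hypothetical {oldname: newname} dict standing in for
    the reference table described in the question.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # shutil.move will not create it
    moved = []
    for oldname, newname in rename_map.items():
        source = src_dir / oldname
        if not source.is_file():
            continue  # skip names that are not present in the folder
        target = dest_dir / newname
        shutil.move(str(source), str(target))  # rename and move in one call
        moved.append(target)
    return moved
```

Called as move_and_rename('.', '../renamedfiles', {'oldname.txt': 'newname.txt'}), it would return the list of new paths.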
To do both of the operations, you can use the os.rename(src, dest) function.
You need the destination directory and the new file name. You can do this for every file you come across in your loop.
For example:
import os
# In Windows
dest_dir = "tmp\\2"
new_name = "bar.txt"
current_file_name = "tmp\\1\\foo.txt"
os.rename(current_file_name, os.path.join(dest_dir, new_name))
The rename function allows you to change the name of the file and its folder at the same time.
To prevent any errors in renaming and moving of the file, use shutil.move.
Since Python 3.4, paths are easy to work with using pathlib. Moving/renaming a file is done with rename or replace (replace unconditionally overwrites an existing target). So, combining the parent attribute with the / operator, you can do:
from pathlib import Path
source = Path("path/to/file/oldname")
target = source.replace(source.parent / "renames" / "newname")
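A small sketch of the same idea that also creates the destination folder first, since replace raises an error when the target directory does not exist. The function name move_into_subfolder and its parameters are made up for illustration. The target path is returned explicitly because replace only returns the new Path on Python 3.8 and later.

```python
from pathlib import Path


def move_into_subfolder(source, subfolder, newname):
    """Rename source to newname inside a sibling subfolder, creating it if needed."""
    source = Path(source)
    target_dir = source.parent / subfolder
    target_dir.mkdir(exist_ok=True)  # replace() fails if the folder is missing
    target = target_dir / newname
    source.replace(target)  # overwrites an existing target without asking
    return target
```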
Create a Python file in your desired directory and write something like this:
import os
for filename in os.listdir("."):
    if(filename ...):
        newFilename = ...
        os.rename(filename, newFilename)
I need to add a prefix to file names within a directory. Whenever I try to do it though, it tries to add the prefix to the beginning of the file path. That won't work. I have a few hundred files that I need to change, and I've been stuck on this for a while. Have any ideas? Here's the closest I've come to getting it to work. I found this idea in this thread: How to add prefix to the files while unzipping in Python? If I could make this work inside my for loop to download and extract the files that would be cool, but it's okay if this happens outside of that loop.
import os
import glob
import pathlib
for file in pathlib.Path(r'C:\Users\UserName\Desktop\Wells').glob("*WaterWells.*"):
    dst = f"County_{file}"
    os.rename(file, os.path.join(file, dst))
That produces this error:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Users\\UserName\\Desktop\\Wells\\Alcona_WaterWells.cpg' -> 'C:\\Users\\UserName\\Desktop\\Wells\\Alcona_WaterWells.cpg\\County_C:\\Users\\UserName\\Desktop\\Wells\\Alcona_WaterWells.cpg'
I'd like to add "County_" to each file. The targeted files use this syntax: CountyName_WaterWells.ext
os.path.basename returns the file name and os.path.dirname returns the directory part. Note that these may break if your path uses a separator your platform does not expect. Used in your code, it would look like this:
import os
import glob
import pathlib
for file in pathlib.Path(r'C:\Users\UserName\Desktop\Wells').glob("*WaterWells.*"):
    dst = f"County_{os.path.basename(file)}"
    os.rename(file, os.path.join(os.path.dirname(file), dst))
The problem is your renaming variable, dst, adds 'County_' before the entire path, which is given by the file variable.
If you take file and break it up with something like str(file).split("/") (replace the slash with whatever appears between directories when you print file to the terminal), you get a list whose final element is the current filename. Modify just that element in the loop, put the whole thing back together with the same separator using "/".join(_path + [modified_dst]), and then pass the result to os.rename.
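Since the loop already yields pathlib.Path objects, a pathlib-only variant is also possible. This is a sketch, not the code from either answer: the helper name add_prefix is made up, and it uses Path.with_name to swap only the final path component. The glob results are collected before renaming so that already-renamed files, which still match the pattern, are not picked up a second time.

```python
from pathlib import Path


def add_prefix(folder, pattern, prefix):
    """Rename every file in folder matching pattern so its name starts with prefix."""
    renamed = []
    # sorted() materializes the glob results before any file is renamed
    for path in sorted(Path(folder).glob(pattern)):
        target = path.with_name(prefix + path.name)  # only the last component changes
        path.rename(target)
        renamed.append(target)
    return renamed
```

For the question's folder, the call would be add_prefix(r'C:\Users\UserName\Desktop\Wells', "*WaterWells.*", "County_").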
You know how when you download something and the downloads folder contains a file with the same name, instead of overwriting it or throwing an error, the file ends up with a number appended to the end? For example, if I want to download my_file.txt, but it already exists in the target folder, the new file will be named my_file(2).txt. And if I try again, it will be my_file(3).txt.
I was wondering if there is a way in Python 3.x to check that and get a unique name (not necessarily create the file or directory). I'm currently implementing it doing this:
import os
def new_name(name, newseparator='_'):
    # name can be either a file or directory name
    base, extension = os.path.splitext(name)
    i = 2
    while os.path.exists(name):
        name = base + newseparator + str(i) + extension
        i += 1
    return name
In the example above, running new_name('my_file.txt') would return my_file_2.txt if my_file.txt already exists in the cwd. name can also contain a full or relative path; that works as well.
I would use pathlib and do something along these lines:
from pathlib import Path
def new_fn(fn, sep='_'):
    p = Path(fn)
    if p.exists():
        if not p.is_file():
            raise TypeError
        np = p.resolve(strict=True)
        parent = np.parent
        extens = ''.join(np.suffixes)  # handle multiple ext such as .tar.gz
        # strip the extension(s) from the end of the name only
        base = np.name[:len(np.name) - len(extens)]
        i = 2
        # join with / so the directory separator is not lost
        nf = parent / (base + sep + str(i) + extens)
        while nf.exists():
            i += 1
            nf = parent / (base + sep + str(i) + extens)
        return nf
    else:
        return p.parent.resolve(strict=True) / p.name
This only handles files as written but the same approach would work with directories (which you added later.) I will leave that as a project for the reader.
Another way of getting a new name would be using the built-in tempfile module:
from pathlib import Path
from tempfile import NamedTemporaryFile
def new_path(path: Path, new_separator='_'):
    prefix = str(path.stem) + new_separator
    dir = path.parent
    suffix = ''.join(path.suffixes)
    with NamedTemporaryFile(prefix=prefix, suffix=suffix, delete=False, dir=dir) as f:
        return f.name
If you execute this function from within the Downloads directory, you will get something like:
>>> new_path(Path('my_file.txt'))
'/home/krassowski/Downloads/my_file_90_lv301.txt'
where the 90_lv301 part was generated internally by Python's tempfile module.
Note: with the delete=False argument, the function will create (and leave undeleted) an empty file with the new name. If you do not want to have an empty file created that way, just remove the delete=False, however keeping it will prevent anyone else from creating a new file with such name before your next operation (though they could still overwrite it).
Simply put, having delete=False prevents concurrency issues if you (or the end-user) were to run your program twice at the same time.
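If you want the same protection but with the numbered names from the question, one option is to create the file exclusively, so that checking for the name and claiming it happen in one step. The sketch below is an assumption-laden illustration, not part of either answer: the function name reserve_numbered_name is made up, it handles files only, and like delete=False it leaves an empty file behind.

```python
import os


def reserve_numbered_name(name, sep="_"):
    """Create an empty file under the first free numbered name and return that name."""
    base, extension = os.path.splitext(name)
    candidate, i = name, 2
    while True:
        try:
            # O_CREAT | O_EXCL makes creation fail if the name already exists,
            # so checking and claiming the name happen in a single step
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            candidate = f"{base}{sep}{i}{extension}"
            i += 1
        else:
            os.close(fd)
            return candidate
```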
If I print filename inside the for loop, it shows every file name in the directory. Yet when I call pd.ExcelFile(filename), it reports that there is no file with that name (the name of the first file that ends with '.xlsx'). What am I missing?
P.S. In my code the if is indented under the for.
for filename in os.listdir('/Users/ramikhoury/PycharmProjects/R/excel_files'):
    if filename.endswith(".xlsx"):
        month = pd.ExcelFile(filename)
        day_list = month.sheet_names
        i = 0
        for day in month.sheet_names:
            df = pd.read_excel(month, sheet_name=day, skiprows=21)
            df = df.iloc[:, 1:]
            df = df[[df.columns[0], df.columns[4], df.columns[8]]]
            df = df.iloc[1:16]
            df['Date'] = day
            df = df.set_index('Date')
            day_list[i] = df
            i += 1
        month_frame = day_list[0]
        x = 1
        while x < len(day_list):
            month_frame = pd.concat([month_frame, day_list[x]])
            x += 1
        print filename + ' created the following dataframe: \n'
        print month_frame  # month_frame is the combination of all the sheets inside the file in one dataframe
The problem is that your work directory is not the same as the directory you are listing. Since you know the absolute path of the directory, the easiest solution is to add os.chdir('/Users/ramikhoury/PycharmProjects/R/excel_files') to the top of your file.
Your "if" statement must be inside the for loop
The issue is that you are trying to open a relative file-path from a different directory than the one you are listing. Rather than using os it is probably better to use a higher level interface like pathlib:
import pathlib
for file_name in pathlib.Path("/Users/ramikhoury/PycharmProjects/R/excel_files").glob("*.xlsx"):
    # this produces full paths for you to use
    month = pd.ExcelFile(file_name)
pathlib was added in Python 3.4 so if you are using an older version of python, your best bet would be to use the much older glob module, which functions similarly:
import glob
for file_name in glob.glob("/Users/ramikhoury/PycharmProjects/R/excel_files/*.xlsx"):
    # this also produces full paths for you to use
    month = pd.ExcelFile(file_name)
If for some reason you really need to use the low-level os interface, the best way to solve this is by making use of the dir_fd optional argument to os.open (available on platforms that support it, which does not include Windows):
import os
# open the target directory
dir_fd = os.open("/Users/ramikhoury/PycharmProjects/R/excel_files", os.O_RDONLY)
try:
    # pass the open file descriptor to the os.listdir method
    for file_name in os.listdir(dir_fd):
        # you could replace this with fnmatch.fnmatch
        if file_name.endswith(".xlsx"):
            # use the open directory fd as the `dir_fd` argument
            # this opens file_name relative to your target directory
            # ("rb" because .xlsx files are binary)
            with os.fdopen(os.open(file_name, os.O_RDONLY, dir_fd=dir_fd), "rb") as file_:
                # do excel bits here
                month = pd.ExcelFile(file_)
finally:
    # close the directory
    os.close(dir_fd)
While you could accomplish this fix by changing directories at the top of your script (as suggested by another answer), this has the side-effect of changing the current working directory of your process which is often undesirable and may have negative consequences. To make this work without side-effects requires you to chdir back to the original directory:
# store cwd
original_cwd = os.getcwd()
try:
    os.chdir("/Users/ramikhoury/PycharmProjects/R/excel_files")
    # do your listdir, etc
finally:
    os.chdir(original_cwd)
Note that this introduces a race condition into your code, as original_cwd may be removed or the access controls for that directory might be changed such that you cannot chdir back to it, which is precisely why dir_fd exists.
dir_fd was added in Python 3.3, so if you are using an older version of Python I would recommend just using glob rather than the chdir solution.
For more on dir_fd see this very helpful answer.
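If you would rather keep os.listdir, another simple option is to join the directory onto each name before opening it, because listdir returns bare file names rather than paths. A minimal sketch, where the helper name list_excel_paths is made up for illustration:

```python
import os


def list_excel_paths(directory):
    """Return full paths of the .xlsx files in directory, sorted by name."""
    return [
        os.path.join(directory, filename)  # listdir yields bare names only
        for filename in sorted(os.listdir(directory))
        if filename.endswith(".xlsx")
    ]
```

Each returned path can then be passed straight to pd.ExcelFile without changing the working directory.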
Hi: I am trying to use the Pandas DataFrame.to_csv method to save a dataframe to a csv file:
filename = './dir/name.csv'
df.to_csv(filename)
However I am getting the error:
IOError: [Errno 2] No such file or directory: './dir/name.csv'
Shouldn't the to_csv method be able to create the file if it doesn't exist? This is what I am intending for it to do.
to_csv does create the file if it doesn't exist as you said, but it does not create directories that don't exist. Ensure that the subdirectory you are trying to save your file within has been created first.
I often do something like this in my work:
import os
outname = 'name.csv'
outdir = './dir'
if not os.path.exists(outdir):
    os.mkdir(outdir)
fullname = os.path.join(outdir, outname)
df.to_csv(fullname)
This can easily be wrapped up in a function if you need to do this frequently.
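One way such a wrapper might look. This is a sketch assuming Python 3, where os.makedirs accepts exist_ok; the name ensure_parent_dir is made up. It creates any missing directories and hands the path back so it can be passed directly to to_csv.

```python
import os


def ensure_parent_dir(path):
    """Create the directory that will contain path (if any) and return path unchanged."""
    parent = os.path.dirname(path)
    if parent:  # a bare file name has no directory part to create
        os.makedirs(parent, exist_ok=True)
    return path
```

Usage would then be a single line: df.to_csv(ensure_parent_dir('./dir/name.csv')).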
Here is an alternative way to do this using the excellent standard library pathlib module, which generally makes things neater.
As explained elsewhere, to_csv will create the file if it doesn't exist, but won't create any non-existent directories in the path to the file, so you need to first ensure that these exist.
from pathlib import Path
output_file = 'my_file.csv'
output_dir = Path('long_path/to/my_dir')
output_dir.mkdir(parents=True, exist_ok=True)
df.to_csv(output_dir / output_file) # can join path elements with / operator
Setting parents=True will also create any necessary parent directories, and exist_ok=True means it won't raise an error if the directory already exists, so you don't have to explicitly check that separately.
I had this error when I accidentally added file:// at the beginning of the save path. Since a search brought me here, this might also be helpful to someone.
Adding to the answer of Tim if you have the whole file path in one string you can use this modified version:
from pathlib import Path
output_file = '/tmp/long_path/to/my_dir/my_file.csv'
output_file_path = Path(output_file)
output_file_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(output_file)
By using the full path of the file in Path() it will point to that file, therefore we need to call .parent before .mkdir to not create a directory with the name of our file.
To save 'csv' file from jupyter to desktop:
df.to_csv(r'your directory\your file name.csv', index=False)
For example:
df5.to_csv(r'C:\Users\Asus\Desktop\DataSets\compounds\inactive_compounds.csv', index=False)
I want to rename a file from, say, {file1} to {file2}. I read about os.rename(file1,file2) in Python and was able to do so.
It only worked when the file was in the same folder as the Python script, so I want to ask how to rename files in other folders, i.e. a different folder from the one the script is in.
Just use the full path, instead of the relative path:
import os
oldFile = 'C:\\folder\\subfolder\\inFile.txt'
newFile = 'C:\\foo\\bar\\somewhere\\other\\outFile.txt'
os.rename(oldFile, newFile)
To get the double-slash behavior, you can do the following
import os
oldFile = r'C:\folder\subfolder\inFile.txt' # note the r character for raw string
os.path.normpath(oldFile)
Output
'C:\\folder\\subfolder\\inFile.txt'
As others have noted, you need to use the full path.
On another note, take a look at the shutil.move documentation; it can also be used for renaming.