Removing file extension from filename with file handle as input - python

I have the following code f = open('01-01-2017.csv')
From f variable, I need to remove the ".csv" and set the remaining "01-01-2017" to a variable called "date". what is the best way to accomplish this

just retrieve the name of the file using f.name and apply os.path.splitext, keep the left part:
import os
date = os.path.splitext(os.path.basename(f.name))[0]
(I've used os.path.basename in case the file has an absolute path)

Related

A way to create files and directories without overwriting

You know how when you download something and the downloads folder contains a file with the same name, instead of overwriting it or throwing an error, the file ends up with a number appended to the end? For example, if I want to download my_file.txt, but it already exists in the target folder, the new file will be named my_file(2).txt. And if I try again, it will be my_file(3).txt.
I was wondering if there is a way in Python 3.x to check that and get a unique name (not necessarily create the file or directory). I'm currently implementing it doing this:
import os
def new_name(name, newseparator='_')
#name can be either a file or directory name
base, extension = os.path.splitext(name)
i = 2
while os.path.exists(name):
name = base + newseparator + str(i) + extension
i += 1
return name
In the example above, running new_file('my_file.txt') would return my_file_2.txt if my_file.txt already exists in the cwd. name can also contain the full or relative path, it will work as well.
I would use PathLib and do something along these lines:
from pathlib import Path
def new_fn(fn, sep='_'):
p=Path(fn)
if p.exists():
if not p.is_file():
raise TypeError
np=p.resolve(strict=True)
parent=str(np.parent)
extens=''.join(np.suffixes) # handle multiple ext such as .tar.gz
base=str(np.name).replace(extens,'')
i=2
nf=parent+base+sep+str(i)+extens
while Path(nf).exists():
i+=1
nf=parent+base+sep+str(i)+extens
return nf
else:
return p.parent.resolve(strict=True) / p
This only handles files as written but the same approach would work with directories (which you added later.) I will leave that as a project for the reader.
Another way of getting a new name would be using the built-in tempfile module:
from pathlib import Path
from tempfile import NamedTemporaryFile
def new_path(path: Path, new_separator='_'):
prefix = str(path.stem) + new_separator
dir = path.parent
suffix = ''.join(path.suffixes)
with NamedTemporaryFile(prefix=prefix, suffix=suffix, delete=False, dir=dir) as f:
return f.name
If you execute this function from within Downloads directory, you will get something like:
>>> new_path(Path('my_file.txt'))
'/home/krassowski/Downloads/my_file_90_lv301.txt'
where the 90_lv301 part was generated internally by the Python's tempfile module.
Note: with the delete=False argument, the function will create (and leave undeleted) an empty file with the new name. If you do not want to have an empty file created that way, just remove the delete=False, however keeping it will prevent anyone else from creating a new file with such name before your next operation (though they could still overwrite it).
Simply put, having delete=False prevents concurrency issues if you (or the end-user) were to run your program twice at the same time.

Run only if "if " statement is true.!

So I've a question, Like I'm reading the fits file and then i'm using the information from the header of the fits to define the other files which are related to the original fits file. But for some of the fits file, the other files (blaze_file, bis_file, ccf_table) are not available. And because of that my code gives the pretty obvious error that No Such file or directory.
import pandas as pd
import sys, os
import numpy as np
from glob import glob
from astropy.io import fits
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
if filename.endswith("_e2ds_A.fits"):
e2ds_hdu = fits.open(filename)
e2ds_header = e2ds_hdu[0].header
date = e2ds_header['DATE-OBS']
date2 = date = date[0:19]
blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
bis_file = glob('HARPS.' + date2 + '*_bis_G2_A.fits')
ccf_table = glob('HARPS.' + date2 + '*_ccf_G2_A.tbl')
if not all(file in os.listdir(PATH) for file in [blaze_file,bis_file,ccf_table]):
continue
So what i want to do is like, i want to make my code run only if all the files are available otherwise don't. But the problem is that, i'm defining the other files as variable inside the for loop as i'm using the header information. So how can i define them before the for loop???? and then use something like
So can anyone help me out of this?
The filenames returned by os.listdir() are always relative to the path given there.
In order to be used, they have to be joined with this path.
Example:
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
if filename.endswith("_e2ds_A.fits"):
filepath = os.path.join(PATH, filename)
e2ds_hdu = fits.open(filepath)
…
Let the filenames be ['a', 'b', 'a_ed2ds_A.fits', 'b_ed2ds_A.fits']. The code now excludes the two first names and then prepends the file path to the remaining two.
a_ed2ds_A.fits becomes /home/Desktop/2d_spectra/a_ed2ds_A.fits and
b_ed2ds_A.fits becomes /home/Desktop/2d_spectra/b_ed2ds_A.fits.
Now they can be accessed from everywhere, not just from the given file path.
I should become accustomed to reading a question in full before trying to answer it.
The problem I mentionned is a problem if you don't start the script from any path outside the said directory. Nevertheless, applying it will make your code much more consistent.
Your real problem, however, lies somewhere else: you examine a file and then, after checking its contents, want to read files whose names depend on informations from that first file.
There are several ways to accomplish your goal:
Just extend your loop with the proper tests.
Pseudo code:
for file in files:
if file.endswith("fits"):
open file
read date from header
create file names depending on date
if all files exist:
proceed
or
for file in files:
if file.endswith("fits"):
open file
read date from header
create file names depending on date
if not all files exist:
continue # actual keyword, no pseudo code!
proceed
Put some functionality into functions (variation of 1.)
Create a loop in a generator function which yields the "interesting information" of one fits file (or alternatively nothing) and have another loop run over them to actually work with the data.
If I am still missing some points or am not detailled enough, please let me know.
Since you have to read the fits file to know the other dependant files names, there's no way you can avoid reading the fit file first. The only thing you can do is test for the dependant files existance before trying to read them and skip the rest of the loop (using continue) if not.
Edit this line
e2ds_hdu = fits.open(filename)
And replace with
e2ds_hdu = fits.open(os.path.join(PATH, filename))

python split string and list full path name

I am using glob to get a list of all PDF files in a folder (I need full path names to upload file to cloud)
also, during the upload I need to assign a "title" to the file which we be the items name in the cloud.
I need to split the last "\" and the "." and get the values in between. for example:
pdf_list = glob.glob(r'C:\User\username\Desktop\pdf\*.pdf')
a item in the list will be: "c:\User\username\Desktop\pdf\4434343434331.pdf"
I need another pythonic way to grab the pdfs file name in a separate variable while still in the for loop.
for file in pdf_list:
upload.file
file.title(file.split(".")[0]
however the above split will not return my desired results but something along those lines
I am using a for loop to upload each pdf (using file path)
Actually, there is a function for this already:
for file in pdf_list:
file_name = os.path.basename(file)
upload.file(file_name)
You can use pathlib, for example:
from pathlib import Path
p = list(Path('C:/User/username/Desktop/pdf').glob('*.pdf'))
first_filename = p[0].name

Rename and move file with Python

I have a Python script that compares existing file names in a folder to a reference table and then determines if it needs to be renamed or not.
As it loops through each filename:
'oldname' = the current file name
'newname' = what it needs to be renamed to
I want rename the file and move it to a new folder "..\renamedfiles"
Can I do the rename and the move at the same time as it iterates through the loop?
Yes you can do this. In Python you can use the move function in shutil library to achieve this.
Let's say on Linux, you have a file in /home/user/Downloads folder named "test.txt" and you want to move it to /home/user/Documents and also change the name to "useful_name.txt". You can do both things in the same line of code:
import shutil
shutil.move('/home/user/Downloads/test.txt', '/home/user/Documents/useful_name.txt')
In your case you can do this:
import shutil
shutil.move('oldname', 'renamedfiles/newname')
os.rename (and os.replace) won't work if the source and target locations are on different partitions/drives/devices. If that's the case, you need to use shutil.move, which will use atomic renaming if possible, and fallback to copy-then-delete if the destination is not on the same file system. It's perfectly happy to both move and rename in the same operation; the operation is the same regardless.
To do both of the operations, you can use the os.rename(src, dest) function.
You should have the wanted directory to save the file in, and the new file name. You can do this for every file you run across in your loop.
For example:
# In Windows
dest_dir = "tmp\\2"
new_name = "bar.txt"
current_file_name = "tmp\\1\\foo.txt"
os.rename(current_file_name, os.path.join(dest_dir, new_name))
The rename function allows you to change the name of the file and it's folder at the same time.
To prevent any errors in renaming and moving of the file, use shutil.move.
Since Python 3.4, working with paths is done easily with pathlib. Moving/renaming a file is done with rename or replace (will unconditionally do the replace). So combining with the parent attribute and the concat operator, you can do:
from pathlib import Path
source = Path("path/to/file/oldname")
target = source.replace(source.parent / "renames" / "newname")
Create a Python file in your desired directory and write something like that :
import os
for filename in os.listdir("."):
if(filename ...):
newFilename = ...
os.rename(filename, newFilename)

Changing name of file until it is unique

I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]
filename = foo.pdf
if os.path.exists(filename):
filename = filename('.')[0] + '_' + 1
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how to I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
curName = "foo_0.pdf"
while os.path.exists(curName):
num = int(curName.split('.')[0].split('_')[1])
curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to run through all the files every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers
Why not just use the tempfile module:
fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete = False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
Since you're dealing with multiple pages, this seeems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from #wnnmaw
For a global archive, I would take a different approch...
Create a directory for each filename
Store the file in the directory as "1" + extension
write the current "number" to the directory as "_files.txt"
additional files are written as 2,3,4,etc and increment the value in _files.txt
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files makes it annoying as a user, thousands of files can affect OS performance ). A bucketing system might turn a filename into a hexdigest, then drop the file into `/%s/%s/%s" % ( hex[0:3], hex[3:6], filename ). The hexdigest is used to give you a more even distribution of characters.
import os
def uniquify(path, sep=''):
path = os.path.normpath(path)
num = 0
newpath = path
dirname, basename = os.path.split(path)
filename, ext = os.path.splitext(basename)
while os.path.exists(newpath):
newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
.format(f=filename, s=sep, n=num, e=ext))
num += 1
return newpath
filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call to uniquify many many thousands of times with the same
path, each subsequent call may get a bit slower since the
while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not
exist at the time os.path.exists is called, but may exist at the
time you use the value returned by uniquify. Use
tempfile.NamedTemporaryFile to avoid this problem. You won't get
incremental numbering, but you will get files with unique names,
guaranteed not to already exist. You could use the prefix parameter to
specify the original name of the file. For example,
import tempfile
import os
def uniquify(path, sep='_', mode='w'):
path = os.path.normpath(path)
if os.path.exists(path):
dirname, basename = os.path.split(path)
filename, ext = os.path.splitext(basename)
return tempfile.NamedTemporaryFile(prefix=filename+sep, suffix=ext, delete=False,
dir=dirname, mode=mode)
else:
return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.

Categories