How to get file extension correctly? - python

I know this question has been asked many times on this website, but the existing answers miss an important point: they only consider file extensions with a single period, such as *.png or *.mp3. How do I deal with filenames that have two periods, such as .tar.gz?
The basic code is:
filename = '/home/lancaster/Downloads/a.ppt'
extention = filename.split('/')[-1]
But obviously, this code does not work with a file like a.tar.gz.
How should I deal with it? Thanks.

Python 3.4
You can now use Path from pathlib. It has many features, one of them is suffix:
>>> from pathlib import Path
>>> Path('my/library/setup.py').suffix
'.py'
>>> Path('my/library.tar.gz').suffix
'.gz'
>>> Path('my/library').suffix
''
If you want to get more than one suffix, use suffixes:
>>> from pathlib import Path
>>> Path('my/library.tar.gar').suffixes
['.tar', '.gar']
>>> Path('my/library.tar.gz').suffixes
['.tar', '.gz']
>>> Path('my/library').suffixes
[]
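As a minimal sketch building on suffixes, the compound extension the question asks about can be recovered by joining the list. The helper name full_extension is hypothetical, and PurePosixPath is used here only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath

def full_extension(path):
    """Return every suffix of the final path component, joined together."""
    # suffixes only looks at the last component, so dots in directories are ignored.
    return "".join(PurePosixPath(path).suffixes)

print(full_extension("my/library.tar.gz"))  # .tar.gz
print(full_extension("my.dir/library"))     # prints an empty line
print(full_extension("file-0.0.1.ext"))     # .0.1.ext
```

The last call shows the limitation: version numbers are counted as suffixes too, so a whitelist of known compound extensions may be safer when names contain extra dots.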

os.path has a built-in function for this: os.path.splitext.
In [1]: from os.path import splitext
In [2]: file_name, extension = splitext('/home/lancaster/Downloads/a.ppt')
In [3]: extension
Out[3]: '.ppt'
If you have to find the extension of .tar.gz or .tar.bz2 files, you have to write a function like this:
from os.path import splitext
def splitext_(path):
    # Check the known compound extensions first, then fall back to splitext.
    for ext in ['.tar.gz', '.tar.bz2']:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return splitext(path)
Result:
In [4]: file_name, ext = splitext_('/home/lancaster/Downloads/a.tar.gz')
In [5]: ext
Out[5]: '.tar.gz'
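Edit
More generally, you can use this function, which treats the last two dot-separated parts as the extension:
from os.path import splitext
def splitext_(path):
    parts = path.split('.')
    if len(parts) > 2:
        # Keep the leading dot so the result matches splitext's convention.
        return '.'.join(parts[:-2]), '.' + '.'.join(parts[-2:])
    return splitext(path)
It works for any two-part extension, but it also treats a name such as my.report.txt as having the extension .report.txt, and it is confused by dots in directory names.
Working on a list of files:
In [6]: inputs = ['a.tar.gz', 'b.tar.lzma', 'a.tar.lz', 'a.tar.lzo', 'a.tar.xz', 'a.png']
In [7]: for file_ in inputs:
   ...:     file_name, extension = splitext_(file_)
   ...:     print(extension)
   ...:
.tar.gz
.tar.lzma
.tar.lz
.tar.lzo
.tar.xz
.png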

The role of a file extension is to tell the viewer (and sometimes the computer) which application to use to handle the file.
Taking your worst-case example in your comments (a.ppt.tar.gz), this is a PowerPoint file that has been tar-balled and then gzipped. So you need to use a gzip-handling program to open it. Using PowerPoint or a tarball-handling program wouldn't work. OK, a clever program that knew how to handle both .tar and .gz files could understand both operations and work with a .tar.gz file - but note that it would do that even if the extension was simply .gz.
The fact that both tar and gzip add their extensions to the original filename, rather than replace them (as zip does) is a convenience. But the base name of the gzip file is still a.ppt.tar.
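To illustrate the layering described above, here is a small sketch that peels one extension at a time with os.path.splitext, so each step shows the base name that the corresponding tool would see. The function name peel_extensions is hypothetical.

```python
import os.path

def peel_extensions(filename):
    """Return (base name, extension) pairs, outermost layer first."""
    layers = []
    root, ext = os.path.splitext(filename)
    while ext:
        layers.append((root, ext))
        # Treat the remaining base name as a file name in its own right.
        root, ext = os.path.splitext(root)
    return layers

for base, ext in peel_extensions("a.ppt.tar.gz"):
    print(ext, "wraps", base)
# .gz wraps a.ppt.tar
# .tar wraps a.ppt
# .ppt wraps a
```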

The simplest one:
import os.path
print(os.path.splitext("/home/lancaster/Downloads/a.ppt")[1])
# .ppt

One possible way is:
Slice at "." => tmp_ext = filename.split('.')[1:]
Result is a list = ['tar', 'gz']
Join them together => extention = ".".join(tmp_ext)
Result is your extension as string = 'tar.gz'
Update: Example:
>>> test = "/test/test/test.tar.gz"
>>> t2 = test.split(".")[1:]
>>> t2
['tar', 'gz']
>>> ".".join(t2)
'tar.gz'
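One caveat with splitting the whole path on "." is that a dot in a directory name ends up in the result. A hedged sketch of the same idea that first isolates the file name (the helper name extension_after_first_dot is made up for illustration):

```python
import os.path

def extension_after_first_dot(path):
    """Return everything from the first dot of the file name onwards."""
    name = os.path.basename(path)
    # partition returns ('name', '', '') when there is no dot at all.
    _, dot, rest = name.partition(".")
    return dot + rest

path = "/test/te.st/test.tar.gz"
print(".".join(path.split(".")[1:]))    # st/test.tar.gz  (directory dot leaks in)
print(extension_after_first_dot(path))  # .tar.gz
```

Note that a hidden file such as .bashrc is returned whole by this sketch, because its only dot is the first character.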

>>> import os
>>> import re
>>> basename = os.path.basename('/home/lancaster/Downloads/a.ppt')
>>> re.findall(r'\.([^.]+)', basename)
['ppt']
>>> basename = os.path.basename('/home/lancaster/Downloads/a.ppt.tar.gz')
>>> re.findall(r'\.([^.]+)', basename)
['ppt', 'tar', 'gz']

With re.findall and Python 3.6+:
import re
filename = '/home/Downloads/abc.ppt.tar.gz'
ext = r'\.\w{1,6}'
re.findall(f'{ext}\\b | {ext}$', filename, re.X)
['.ppt', '.tar', '.gz']

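filename = '/home/lancaster/Downloads/a.tar.gz'
extention = filename.split('/')[-1]
if '.' in extention:
    # Keep only the part after the last dot, e.g. 'gz'.
    extention = extention.split('.')[-1]
    if len(extention) > 0:
        extention = '.' + extention
else:
    # No dot in the name, so there is no extension.
    extention = ''
print(extention)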

Related

Python Tkinter :How can I only display the FIle Name without the path? [duplicate]

How do I get the filename without the extension from a path in Python?
"/path/to/some/file.txt" → "file"
Getting the name of the file without the extension:
import os
print(os.path.splitext("/path/to/some/file.txt")[0])
Prints:
/path/to/some/file
Documentation for os.path.splitext.
Important Note: If the filename has multiple dots, only the extension after the last one is removed. For example:
import os
print(os.path.splitext("/path/to/some/file.txt.zip.asc")[0])
Prints:
/path/to/some/file.txt.zip
See other answers below if you need to handle that case.
Use .stem from pathlib in Python 3.4+
from pathlib import Path
Path('/root/dir/sub/file.ext').stem
will return
'file'
Note that if your file has multiple extensions .stem will only remove the last extension. For example, Path('file.tar.gz').stem will return 'file.tar'.
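If every extension should go, one possible sketch is to call with_suffix('') repeatedly until no suffix is left. The function name below is hypothetical, and PurePosixPath is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath

def stem_without_any_suffix(path):
    """Strip suffixes one at a time until the final component has none."""
    p = PurePosixPath(path)
    while p.suffix:
        p = p.with_suffix("")
    return p.name

print(stem_without_any_suffix("/root/dir/sub/file.tar.gz"))  # file
print(stem_without_any_suffix("/root/dir.d/file"))           # file
```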
You can make your own with:
>>> import os
>>> base=os.path.basename('/root/dir/sub/file.ext')
>>> base
'file.ext'
>>> os.path.splitext(base)
('file', '.ext')
>>> os.path.splitext(base)[0]
'file'
Important note: If there is more than one . in the filename, only the last one is removed. For example:
/root/dir/sub/file.ext.zip -> file.ext
/root/dir/sub/file.ext.tar.gz -> file.ext.tar
See below for other answers that address that.
>>> print(os.path.splitext(os.path.basename("/path/to/file/hemanth.txt"))[0])
hemanth
In Python 3.4+ you can use the pathlib solution
from pathlib import Path
print(Path(your_path).resolve().stem)
https://docs.python.org/3/library/os.path.html
In python 3 pathlib "The pathlib module offers high-level path objects."
so,
>>> from pathlib import Path
>>> p = Path("/a/b/c.txt")
>>> p.with_suffix('')
WindowsPath('/a/b/c')
>>> p.stem
'c'
As noted by @IceAdor in a comment on @user2902201's solution, rsplit is the simplest solution that is robust to multiple periods, because maxsplit=1 limits it to a single split from the end of the string.
Here it is spelt out:
file = 'my.report.txt'
print(file.rsplit('.', maxsplit=1)[0])
my.report
If you want to keep the path to the file and just remove the extension
>>> file = '/root/dir/sub.exten/file.data.1.2.dat'
>>> print('.'.join(file.split('.')[:-1]))
/root/dir/sub.exten/file.data.1.2
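A hedged alternative for the same task is with_suffix(''), which only touches the last path component. The sketch below (with a made-up helper name, and PurePosixPath chosen so the output is identical on all platforms) also shows where the split/join version goes wrong when the file has no extension but a directory contains a dot.

```python
from pathlib import PurePosixPath

def drop_last_extension(path):
    """Remove the last suffix of the file name while keeping the directories."""
    return str(PurePosixPath(path).with_suffix(""))

print(drop_last_extension("/root/dir/sub.exten/file.data.1.2.dat"))
# /root/dir/sub.exten/file.data.1.2
print(drop_last_extension("/root/dir/sub.exten/file"))
# /root/dir/sub.exten/file

# The split/join version truncates the second path at the directory's dot:
print(".".join("/root/dir/sub.exten/file".split(".")[:-1]))
# /root/dir/sub
```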
os.path.splitext() won't give you the bare name if the extension itself contains multiple dots.
For example, images.tar.gz:
>>> import os
>>> file_path = '/home/dc/images.tar.gz'
>>> file_name = os.path.basename(file_path)
>>> print(os.path.splitext(file_name)[0])
images.tar
You can just find the index of the first dot in the basename and then slice the basename to get just the filename without extension.
>>> import os
>>> file_path = '/home/dc/images.tar.gz'
>>> file_name = os.path.basename(file_path)
>>> index_of_dot = file_name.index('.')
>>> file_name_without_extension = file_name[:index_of_dot]
>>> print(file_name_without_extension)
images
import os
filename, file_extension = os.path.splitext(os.path.basename('/d1/d2/example.cs'))
# filename is 'example'
# file_extension is '.cs'
Answers using Pathlib for Several Scenarios
Using Pathlib, it is trivial to get the filename when there is just one extension (or none), but it can be awkward to handle the general case of multiple extensions.
Zero or One extension
from pathlib import Path
pth = Path('./thefile.tar')
fn = pth.stem
print(fn) # thefile
# Explanation:
# the `stem` attribute returns only the base filename, stripping
# any leading path if present, and strips the extension after
# the last `.`, if present.
# Further tests
eg_paths = ['thefile',
            'thefile.tar',
            './thefile',
            './thefile.tar',
            '../../thefile.tar',
            '.././thefile.tar',
            'rel/pa.th/to/thefile',
            '/abs/path/to/thefile.tar']
for p in eg_paths:
    print(Path(p).stem) # prints thefile every time
Two or fewer extensions
from pathlib import Path
pth = Path('./thefile.tar.gz')
fn = pth.with_suffix('').stem
print(fn) # thefile
# Explanation:
# Using the `.with_suffix('')` trick returns a Path object after
# stripping one extension, and then we can simply use `.stem`.
# Further tests
eg_paths += ['./thefile.tar.gz',
             '/abs/pa.th/to/thefile.tar.gz']
for p in eg_paths:
    print(Path(p).with_suffix('').stem) # prints thefile every time
Any number of extensions (0, 1, or more)
from pathlib import Path
pth = Path('./thefile.tar.gz.bz.7zip')
fn = pth.name
if len(pth.suffixes) > 0:
    s = pth.suffixes[0]
    fn = fn.rsplit(s)[0]
# or, equivalently
fn = pth.name
for s in pth.suffixes:
    fn = fn.rsplit(s)[0]
    break
# or simply run the full loop, stripping one extension per pass
fn = pth.name
for _ in pth.suffixes:
    fn = fn.rsplit('.', 1)[0]
# In any case:
print(fn) # thefile
# Explanation
#
# pth.name -> 'thefile.tar.gz.bz.7zip'
# pth.suffixes -> ['.tar', '.gz', '.bz', '.7zip']
#
# If there may be more than two extensions, we can test for
# that case with an if statement, or simply attempt the loop
# and break after rsplitting on the first extension instance.
# Alternatively, we may even run the full loop and strip one
# extension with every pass.
# Further tests
eg_paths += ['./thefile.tar.gz.bz.7zip',
             '/abs/pa.th/to/thefile.tar.gz.bz.7zip']
for p in eg_paths:
    pth = Path(p)
    fn = pth.name
    for s in pth.suffixes:
        fn = fn.rsplit(s)[0]
        break
    print(fn) # prints thefile every time
Special case in which the first extension is known
For instance, if the extension could be .tar, .tar.gz, .tar.gz.bz, etc; you can simply rsplit the known extension and take the first element:
pth = Path('foo/bar/baz.baz/thefile.tar.gz')
fn = pth.name.rsplit('.tar')[0]
print(fn) # thefile
Thought I would throw in a variation to the use of the os.path.splitext without the need to use array indexing.
The function always returns a (root, ext) pair so it is safe to use:
root, ext = os.path.splitext(path)
Example:
>>> import os
>>> path = 'my_text_file.txt'
>>> root, ext = os.path.splitext(path)
>>> root
'my_text_file'
>>> ext
'.txt'
But even when I import os, I am not able to call it path.basename. Is it possible to call it as directly as basename?
import os, and then use os.path.basename
importing os doesn't mean you can use os.foo without referring to os.
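To make the exchange above concrete, here is a short sketch: the bare name basename only exists if it is imported explicitly from os.path. The example path is illustrative.

```python
import os
from os.path import basename  # binds the function to the bare name

path = "/root/dir/sub/file.ext"

print(os.path.basename(path))  # file.ext
print(basename(path))          # file.ext, the same function under a shorter name
```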
The other methods don't remove multiple extensions. Some also have problems with filenames that don't have extensions. This snippet deals with both instances and works in both Python 2 and 3. It grabs the basename from the path, splits the value on dots, and returns the first one which is the initial part of the filename.
import os
def get_filename_without_extension(file_path):
    file_basename = os.path.basename(file_path)
    filename_without_extension = file_basename.split('.')[0]
    return filename_without_extension
Here's a set of examples to run:
example_paths = [
    "FileName",
    "./FileName",
    "../../FileName",
    "FileName.txt",
    "./FileName.txt.zip.asc",
    "/path/to/some/FileName",
    "/path/to/some/FileName.txt",
    "/path/to/some/FileName.txt.zip.asc"
]
for example_path in example_paths:
    print(get_filename_without_extension(example_path))
In every case, the value printed is:
FileName
A multiple extension aware procedure. Works for str and unicode paths. Works in Python 2 and 3.
import os
def file_base_name(file_name):
    if '.' in file_name:
        separator_index = file_name.index('.')
        base_name = file_name[:separator_index]
        return base_name
    else:
        return file_name
def path_base_name(path):
    file_name = os.path.basename(path)
    return file_base_name(file_name)
Behavior:
>>> path_base_name('file')
'file'
>>> path_base_name(u'file')
u'file'
>>> path_base_name('file.txt')
'file'
>>> path_base_name(u'file.txt')
u'file'
>>> path_base_name('file.tar.gz')
'file'
>>> path_base_name('file.a.b.c.d.e.f.g')
'file'
>>> path_base_name('relative/path/file.ext')
'file'
>>> path_base_name('/absolute/path/file.ext')
'file'
>>> path_base_name('Relative\\Windows\\Path\\file.txt')
'file'
>>> path_base_name('C:\\Absolute\\Windows\\Path\\file.txt')
'file'
>>> path_base_name('/path with spaces/file.ext')
'file'
>>> path_base_name('C:\\Windows Path With Spaces\\file.txt')
'file'
>>> path_base_name('some/path/file name with spaces.tar.gz.zip.rar.7z')
'file name with spaces'
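The Windows-style results listed above depend on os.path treating "\" as a separator, which it only does on Windows. As a sketch under that assumption, the same behaviour can be made platform-independent by calling ntpath directly, since ntpath splits on both "/" and "\". The function name portable_base_name is hypothetical.

```python
import ntpath

def portable_base_name(path):
    """Return the file name up to its first dot, for either separator style."""
    file_name = ntpath.basename(path)
    # partition leaves the name untouched when it contains no dot.
    return file_name.partition(".")[0]

print(portable_base_name("C:\\Absolute\\Windows\\Path\\file.txt"))  # file
print(portable_base_name("/absolute/path/file.tar.gz"))             # file
print(portable_base_name("relative/path/file"))                     # file
```

The trade-off is that a backslash is a legal file name character on POSIX systems, so such names would be split incorrectly.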
import os
path = "a/b/c/abc.txt"
print(os.path.splitext(os.path.basename(path))[0])
import os
filename = 'C:\\Users\\Public\\Videos\\Sample Videos\\wildlife.wmv'
This returns the path without the extension (C:\Users\Public\Videos\Sample Videos\wildlife):
temp = os.path.splitext(filename)[0]
Now you can get just the filename from temp with:
os.path.basename(temp)  # on Windows, this returns just the filename (wildlife)
Very simple, with no other modules needed:
import os
p = r"C:\Users\bilal\Documents\face Recognition python\imgs\northon.jpg"
# Get the filename only from the initial file path.
filename = os.path.basename(p)
# Use splitext() to get filename and extension separately.
(file, ext) = os.path.splitext(filename)
# Print outcome.
print("Filename without extension =", file)
print("Extension =", ext)
On Windows, my paths include the drive letter prefix as well, like:
>>> s = 'c:\\temp\\akarmi.txt'
>>> print(os.path.splitext(s)[0])
c:\temp\akarmi
Because I do not need the drive letter or the directory name, I use:
>>> print(os.path.splitext(os.path.basename(s))[0])
akarmi
Improving upon @spinup's answer:
fn = pth.name
for s in pth.suffixes:
    fn = fn.rsplit(s)[0]
    break
print(fn) # thefile
This also works for filenames without an extension.
I've read the answers, and I notice that there are many good solutions.
So, for those who want either the name or the extension, here is another solution using the os module. Both functions support files with multiple extensions.
import os
def get_file_name(path):
    # Returns None when path is an existing directory.
    if not os.path.isdir(path):
        return os.path.splitext(os.path.basename(path))[0].split(".")[0]
def get_file_extension(path):
    extensions = []
    copy_path = path
    while True:
        copy_path, result = os.path.splitext(copy_path)
        if result != '':
            extensions.append(result)
        else:
            break
    extensions.reverse()
    return "".join(extensions)
Note: on Windows, this solution does not support file names containing the "\" character.
We could do some simple split / pop magic as seen here (https://stackoverflow.com/a/424006/1250044), to extract the filename (respecting the windows and POSIX differences).
def getFileNameWithoutExtension(path):
    return path.split('\\').pop().split('/').pop().rsplit('.', 1)[0]
getFileNameWithoutExtension('/path/to/file-0.0.1.ext')
# => file-0.0.1
getFileNameWithoutExtension('\\path\\to\\file-0.0.1.ext')
# => file-0.0.1
For convenience, a simple function wrapping the two methods from os.path :
def filename(path):
    """Return file name without extension from path.
    See https://docs.python.org/3/library/os.path.html
    """
    import os.path
    b = os.path.split(path)[1] # path, *filename*
    f = os.path.splitext(b)[0] # *file*, ext
    #print(path, b, f)
    return f
Tested with Python 3.5.
import os
file_names = []
def getFileName(path):
    # Recursively collect the names of entries that have an extension.
    for file in os.listdir(path):
        try:
            base = os.path.basename(file)
            ext = os.path.splitext(base)[1]
            if ext:
                file_names.append(base)
            else:
                # No extension: assume it is a directory and descend into it.
                newpath = path + "/" + file
                getFileName(newpath)
        except OSError:
            # Not a readable directory, so skip it.
            pass
    return file_names
getFileName("/home/weexcel-java3/Desktop/backup")
print(file_names)
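The easiest way to resolve this is to use ntpath:
import ntpath
# No trailing slash: basename() returns '' for a path that ends with a separator.
print('Base name is', ntpath.basename('/path/to/the/file'))
This is a single standard-library call, and ntpath splits on both "/" and "\".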
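One thing to watch with ntpath.basename is a path that ends with a separator, for which the base name is an empty string. A minimal sketch that falls back to the last non-empty component (the name last_component is hypothetical):

```python
import ntpath

def last_component(path):
    """Return the final path component, even if the path ends with a separator."""
    head, tail = ntpath.split(path)
    # tail is '' when the path ends with a separator, so fall back to the head.
    return tail or ntpath.basename(head)

print(last_component("/path/to/the/file/"))   # file
print(last_component("/path/to/the/file"))    # file
print(last_component("C:\\temp\\akarmi.txt"))  # akarmi.txt
```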
I didn't look very hard but I didn't see anyone who used regex for this problem.
I interpreted the question as "given a path, return the basename without the extension."
e.g.
"path/to/file.json" => "file"
"path/to/my.file.json" => "my.file"
In Python 2.7, where we still live without pathlib...
import os
import re
def get_file_name_prefix(file_path):
    basename = os.path.basename(file_path)
    file_name_prefix_match = re.compile(r"^(?P<file_name_prefix>.*)\..*$").match(basename)
    if file_name_prefix_match is None:
        # No dot in the name, so return it unchanged.
        return basename
    else:
        return file_name_prefix_match.group("file_name_prefix")
get_file_name_prefix("path/to/file.json")
>> file
get_file_name_prefix("path/to/my.file.json")
>> my.file
get_file_name_prefix("path/to/no_extension")
>> no_extension
Using pathlib.Path.stem is the right way to go, but here is an ugly solution that is way more efficient than the pathlib based approach.
You have a filepath whose fields are separated by a forward slash /, slashes cannot be present in filenames, so you split the filepath by /, the last field is the filename.
The extension is always the last element of the list created by splitting the filename by dot ., so if you reverse the filename and split by dot once, the reverse of the second element is the file name without extension.
name = path.split('/')[-1][::-1].split('.', 1)[1][::-1]
Performance:
Python 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from pathlib import Path
In [2]: file = 'D:/ffmpeg/ffmpeg.exe'
In [3]: Path(file).stem
Out[3]: 'ffmpeg'
In [4]: file.split('/')[-1][::-1].split('.', 1)[1][::-1]
Out[4]: 'ffmpeg'
In [5]: %timeit Path(file).stem
6.15 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit file.split('/')[-1][::-1].split('.', 1)[1][::-1]
671 ns ± 37.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]:
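For readers without IPython, the comparison above can be approximated with the standard timeit module. This is only a sketch: the wrapper function names are made up, the loop count is arbitrary, and the absolute numbers will differ from machine to machine.

```python
import timeit
from pathlib import Path

file = "D:/ffmpeg/ffmpeg.exe"

def stem_pathlib(path):
    return Path(path).stem

def stem_split(path):
    # Raises IndexError when the file name contains no dot.
    return path.split("/")[-1][::-1].split(".", 1)[1][::-1]

# Both approaches must agree before their timings are worth comparing.
assert stem_pathlib(file) == stem_split(file) == "ffmpeg"

loops = 20_000
for fn in (stem_pathlib, stem_split):
    seconds = timeit.timeit(lambda: fn(file), number=loops)
    print(f"{fn.__name__}: {seconds / loops * 1e9:.0f} ns per call")
```

The string version is faster because it skips building a Path object, but it assumes "/" separators and at least one dot in the file name.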
What about the following?
import pathlib
filename = '/path/to/dir/stem.ext.tar.gz'
pathlib.Path(filename).name[:-len(''.join(pathlib.Path(filename).suffixes))]
# -> 'stem'
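or this equivalent?
pathlib.Path(filename).name[:-sum(map(len, pathlib.Path(filename).suffixes))]
Note that both forms return an empty string when the path has no suffix, because the slice becomes name[:-0].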
>>> import os
>>> print(os.path.splitext(os.path.basename("/path/to/file/varun.txt"))[0])
varun
Here /path/to/file/varun.txt is the path to the file and the output is varun.
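Assuming you're already using pathlib, here's a simple one-line approach that removes all extensions. It slices off the combined length of the suffixes instead of calling rstrip, because rstrip removes a set of characters rather than a suffix and can eat into the stem.
>>> from pathlib import Path
>>> pth = Path("/path/to.some/file.foo.bar.txt")
>>> pth.name[:len(pth.name) - len("".join(pth.suffixes))]
'file'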

How to create a list of files without their extensions in python 3 [duplicate]


How to get rid of extensions from file basename using python

I have got the complete path of files in a list like this:
a = ['home/robert/Documents/Workspace/datafile.xlsx', 'home/robert/Documents/Workspace/datafile2.xls', 'home/robert/Documents/Workspace/datafile3.xlsx']
what I want is to get just the file NAMES without their extensions, like:
b = ['datafile', 'datafile2', 'datafile3']
What I have tried is:
xfn = re.compile(r'(\.xls)+')
for name in a:
fp, fb = os.path.split(fp)
ofn = xfn.sub('', name)
b.append(ofn)
But it results in:
b = ['datafilex', 'datafile2', 'datafile3x']
The regex you've used is wrong. (\.xls)+ matches strings of the form .xls, .xls.xls, etc. This is why there is a remaining x in the .xlsx items. What you want is \.xls.*, i.e. a .xls followed by zero or more of any characters.
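As a rough sketch, here is that corrected pattern applied to the list from the question. The os.path.basename call is an addition here, assumed necessary so that only the file name is kept.

```python
import os.path
import re

a = [
    'home/robert/Documents/Workspace/datafile.xlsx',
    'home/robert/Documents/Workspace/datafile2.xls',
    'home/robert/Documents/Workspace/datafile3.xlsx',
]

# '\.xls.*' matches '.xls' plus anything after it, so '.xlsx' is removed whole.
xfn = re.compile(r'\.xls.*')
b = [xfn.sub('', os.path.basename(name)) for name in a]
print(b)  # ['datafile', 'datafile2', 'datafile3']
```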
You don't really need to use regex. There are specialized functions in os.path that deal with this: basename and splitext.
>>> import os.path
>>> os.path.basename('home/robert/Documents/Workspace/datafile.xlsx')
'datafile.xlsx'
>>> os.path.splitext(os.path.basename('home/robert/Documents/Workspace/datafile.xlsx'))[0]
'datafile'
so, assuming you don't really care about the .xls/.xlsx suffix, your code can be as simple as:
>>> a = ['home/robert/Documents/Workspace/datafile.xlsx', 'home/robert/Documents/Workspace/datafile2.xls', 'home/robert/Documents/Workspace/datafile3.xlsx']
>>> [os.path.splitext(os.path.basename(fn))[0] for fn in a]
['datafile', 'datafile2', 'datafile3']
(also note the list comprehension.)
Oneliner:
>>> filename = 'file.ext'
>>> '.'.join(filename.split('.')[:-1]) if '.' in filename else filename
'file'
This is a repeat of:
How to get the filename without the extension from a path in Python?
https://docs.python.org/3/library/os.path.html
In python 3 pathlib "The pathlib module offers high-level path objects." so,
>>> from pathlib import Path
>>> p = Path("/a/b/c.txt")
>>> print(p.with_suffix(''))
\a\b\c
>>> print(p.stem)
c
Why not just use the split method?
def get_filename(path):
    """ Gets a filename (without extension) from a provided path """
    filename = path.split('/')[-1].split('.')[0]
    return filename
>>> path = '/home/robert/Documents/Workspace/datafile.xlsx'
>>> filename = get_filename(path)
>>> filename
'datafile'

How to get only the last part of a path in Python?

In python, suppose I have a path like this:
/folderA/folderB/folderC/folderD/
How can I get just the folderD part?
Use os.path.normpath, then os.path.basename:
>>> os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
'folderD'
The first strips off any trailing slashes, the second gives you the last part of the path. Using only basename gives everything after the last slash, which in this case is ''.
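The difference between the two calls can be seen in a small sketch. It uses posixpath (the POSIX flavour of os.path) instead of os.path, an assumption made here only so the result is the same on every platform.

```python
import posixpath

path = '/folderA/folderB/folderC/folderD/'

# basename alone returns everything after the last slash: an empty string here.
without_normpath = posixpath.basename(path)

# normpath removes the trailing slash first, so basename sees 'folderD'.
with_normpath = posixpath.basename(posixpath.normpath(path))

print(repr(without_normpath))  # ''
print(repr(with_normpath))     # 'folderD'
```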
With python 3 you can use the pathlib module (pathlib.PurePath for example):
>>> import pathlib
>>> path = pathlib.PurePath('/folderA/folderB/folderC/folderD/')
>>> path.name
'folderD'
If you want the last folder name where a file is located:
>>> path = pathlib.PurePath('/folderA/folderB/folderC/folderD/file.py')
>>> path.parent.name
'folderD'
You could do
>>> import os
>>> os.path.basename('/folderA/folderB/folderC/folderD')
'folderD'
UPDATE1: If you give this approach /folderA/folderB/folderC/folderD/xx.py, it returns xx.py as the basename, which is probably not what you want. So you could do this:
>>> import os
>>> path = "/folderA/folderB/folderC/folderD"
>>> if os.path.isdir(path):
...     dirname = os.path.basename(path)
UPDATE2: As lars pointed out, here is a change to accommodate a trailing '/'.
>>> from os.path import normpath, basename
>>> basename(normpath('/folderA/folderB/folderC/folderD/'))
'folderD'
Here is my approach:
>>> import os
>>> print(os.path.basename(
...     os.path.dirname('/folderA/folderB/folderC/folderD/test.py')))
folderD
>>> print(os.path.basename(
...     os.path.dirname('/folderA/folderB/folderC/folderD/')))
folderD
>>> print(os.path.basename(
...     os.path.dirname('/folderA/folderB/folderC/folderD')))
folderC
I was searching for a way to get the name of the last folder where the file is located, and I just used split twice to get the right part. It's not exactly the question, but Google transferred me here.
import os
pathname = "/folderA/folderB/folderC/folderD/filename.py"
head, tail = os.path.split(os.path.split(pathname)[0])
print(head + " " + tail)
I like the parts method of Path for this:
grandparent_directory, parent_directory, filename = Path(export_filename).parts[-3:]
log.info(f'{t: <30}: {num_rows: >7} Rows exported to {grandparent_directory}/{parent_directory}/{filename}')
If you use the native python package pathlib it's really simple.
>>> from pathlib import Path
>>> your_path = Path("/folderA/folderB/folderC/folderD/")
>>> your_path.stem
'folderD'
Suppose you have the path to a file in folderD.
>>> from pathlib import Path
>>> your_path = Path("/folderA/folderB/folderC/folderD/file.txt")
>>> your_path.name
'file.txt'
>>> your_path.parent.name
'folderD'
During my current projects, I'm often passing rear parts of a path to a function and therefore use the Path module. To get the n-th part in reverse order, I'm using:
from typing import Union
from pathlib import Path
def get_single_subpath_part(base_dir: Union[Path, str], n: int) -> str:
    if n == 0:
        return Path(base_dir).name
    for _ in range(n):
        base_dir = Path(base_dir).parent
    return base_dir.name
path= "/folderA/folderB/folderC/folderD/"
# for getting the last part:
print(get_single_subpath_part(path, 0))
# yields "folderD"
# for the second last
print(get_single_subpath_part(path, 1))
#yields "folderC"
Furthermore, to pass the n-th part in reverse order of a path containing the remaining path, I use:
from typing import Union
from pathlib import Path
def get_n_last_subparts_path(base_dir: Union[Path, str], n: int) -> Path:
    return Path(*Path(base_dir).parts[-n-1:])
path= "/folderA/folderB/folderC/folderD/"
# for getting the last part:
print(get_n_last_subparts_path(path, 0))
# yields a `Path` object of "folderD"
# for second last and last part together
print(get_n_last_subparts_path(path, 1))
# yields a `Path` object of "folderC/folderD"
Note that this function returns a Path object, which can easily be converted to a string (e.g. str(path)).
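path = "/folderA/folderB/folderC/folderD/"
# Strip the trailing slash first; otherwise the last element is an empty string.
last = path.rstrip('/').split('/').pop()
path = "/folderA/folderB/folderC/folderD/"
# With a trailing slash, the folder name is the second-to-last element.
print(path.split("/")[-2])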

How do I get the filename without the extension from a path in Python?

How do I get the filename without the extension from a path in Python?
"/path/to/some/file.txt" → "file"
Getting the name of the file without the extension:
import os
print(os.path.splitext("/path/to/some/file.txt")[0])
Prints:
/path/to/some/file
Documentation for os.path.splitext.
Important Note: If the filename has multiple dots, only the extension after the last one is removed. For example:
import os
print(os.path.splitext("/path/to/some/file.txt.zip.asc")[0])
Prints:
/path/to/some/file.txt.zip
See other answers below if you need to handle that case.
Use .stem from pathlib in Python 3.4+
from pathlib import Path
Path('/root/dir/sub/file.ext').stem
will return
'file'
Note that if your file has multiple extensions .stem will only remove the last extension. For example, Path('file.tar.gz').stem will return 'file.tar'.
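If every extension should go, one option is to keep removing the last suffix until none is left. The sketch below assumes a hypothetical helper name, stem_without_any_suffix, and uses PurePosixPath so that no filesystem access or platform-specific behaviour is involved.

```python
from pathlib import PurePosixPath


def stem_without_any_suffix(path: str) -> str:
    """Strip suffixes one at a time until the name has none left."""
    p = PurePosixPath(path)
    while p.suffix:
        p = p.with_suffix('')
    return p.name


print(stem_without_any_suffix('/root/dir/sub/file.ext'))  # file
print(stem_without_any_suffix('file.tar.gz'))             # file
```

Keep in mind that this treats every dot as the start of an extension, so a name such as my.report.txt is reduced to my.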
You can make your own with:
>>> import os
>>> base=os.path.basename('/root/dir/sub/file.ext')
>>> base
'file.ext'
>>> os.path.splitext(base)
('file', '.ext')
>>> os.path.splitext(base)[0]
'file'
Important note: If there is more than one . in the filename, only the last one is removed. For example:
/root/dir/sub/file.ext.zip -> file.ext
/root/dir/sub/file.ext.tar.gz -> file.ext.tar
See below for other answers that address that.
>>> print(os.path.splitext(os.path.basename("/path/to/file/hemanth.txt"))[0])
hemanth
In Python 3.4+ you can use the pathlib solution
from pathlib import Path
print(Path(your_path).resolve().stem)
https://docs.python.org/3/library/os.path.html
In python 3 pathlib "The pathlib module offers high-level path objects."
so,
>>> from pathlib import Path
>>> p = Path("/a/b/c.txt")
>>> p.with_suffix('')
WindowsPath('/a/b/c')
>>> p.stem
'c'
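The WindowsPath in the output above appears only because the session ran on Windows. As a small sketch, the pure path classes make the flavour explicit and give the same results on any machine:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("/a/b/c.txt")
windows = PureWindowsPath("/a/b/c.txt")

# with_suffix('') drops the last suffix but keeps the directory part.
print(posix.with_suffix(''))    # /a/b/c
print(windows.with_suffix(''))  # \a\b\c

# stem is just the final component without its last suffix.
print(posix.stem, windows.stem)  # c c
```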
As noted by @IceAdor in a comment on @user2902201's solution, rsplit is the simplest solution that is robust to multiple periods: limit the number of splits with maxsplit=1, counting from the end of the string.
Here it is spelt out:
file = 'my.report.txt'
print(file.rsplit('.', maxsplit=1)[0])
my.report
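A short sketch of how this behaves on a few inputs follows. The function name strip_last_extension is hypothetical. The last call shows a pitfall: on a full path whose file has no extension, a dot in a directory name is treated as the separator, so it is safer to apply this to the basename only.

```python
def strip_last_extension(filename: str) -> str:
    """Remove everything from the last dot onward, if there is a dot."""
    return filename.rsplit('.', maxsplit=1)[0]


print(strip_last_extension('my.report.txt'))             # my.report
print(strip_last_extension('README'))                    # README
print(strip_last_extension('/root/dir/sub.exten/file'))  # /root/dir/sub
```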
If you want to keep the path to the file and just remove the extension
>>> file = '/root/dir/sub.exten/file.data.1.2.dat'
>>> print('.'.join(file.split('.')[:-1]))
/root/dir/sub.exten/file.data.1.2
os.path.splitext() won't work if there are multiple dots in the extension.
For example, images.tar.gz
>>> import os
>>> file_path = '/home/dc/images.tar.gz'
>>> file_name = os.path.basename(file_path)
>>> print(os.path.splitext(file_name)[0])
images.tar
You can just find the index of the first dot in the basename and then slice the basename to get just the filename without extension.
>>> import os
>>> file_path = '/home/dc/images.tar.gz'
>>> file_name = os.path.basename(file_path)
>>> index_of_dot = file_name.index('.')
>>> file_name_without_extension = file_name[:index_of_dot]
>>> print(file_name_without_extension)
images
import os
filename, file_extension = os.path.splitext(os.path.basename('/d1/d2/example.cs'))
# filename is 'example'
# file_extension is '.cs'
Answers using Pathlib for Several Scenarios
Using Pathlib, it is trivial to get the filename when there is just one extension (or none), but it can be awkward to handle the general case of multiple extensions.
Zero or One extension
from pathlib import Path
pth = Path('./thefile.tar')
fn = pth.stem
print(fn) # thefile
# Explanation:
# the `stem` attribute returns only the base filename, stripping
# any leading path if present, and strips the extension after
# the last `.`, if present.
# Further tests
eg_paths = ['thefile',
'thefile.tar',
'./thefile',
'./thefile.tar',
'../../thefile.tar',
'.././thefile.tar',
'rel/pa.th/to/thefile',
'/abs/path/to/thefile.tar']
for p in eg_paths:
    print(Path(p).stem) # prints thefile every time
Two or fewer extensions
from pathlib import Path
pth = Path('./thefile.tar.gz')
fn = pth.with_suffix('').stem
print(fn) # thefile
# Explanation:
# Using the `.with_suffix('')` trick returns a Path object after
# stripping one extension, and then we can simply use `.stem`.
# Further tests
eg_paths += ['./thefile.tar.gz',
'/abs/pa.th/to/thefile.tar.gz']
for p in eg_paths:
    print(Path(p).with_suffix('').stem) # prints thefile every time
Any number of extensions (0, 1, or more)
from pathlib import Path
pth = Path('./thefile.tar.gz.bz.7zip')
fn = pth.name
if len(pth.suffixes) > 0:
    s = pth.suffixes[0]
    fn = fn.rsplit(s)[0]
# or, equivalently
fn = pth.name
for s in pth.suffixes:
    fn = fn.rsplit(s)[0]
    break
# or simply run the full loop, stripping one extension per pass
fn = pth.name
for _ in pth.suffixes:
    fn = fn.rsplit('.', 1)[0]
# In any case:
print(fn) # thefile
# Explanation
#
# pth.name -> 'thefile.tar.gz.bz.7zip'
# pth.suffixes -> ['.tar', '.gz', '.bz', '.7zip']
#
# If there may be more than two extensions, we can test for
# that case with an if statement, or simply attempt the loop
# and break after rsplitting on the first extension instance.
# Alternatively, we may even run the full loop and strip one
# extension with every pass.
# Further tests
eg_paths += ['./thefile.tar.gz.bz.7zip',
'/abs/pa.th/to/thefile.tar.gz.bz.7zip']
for p in eg_paths:
    pth = Path(p)
    fn = pth.name
    for s in pth.suffixes:
        fn = fn.rsplit(s)[0]
        break
    print(fn) # prints thefile every time
Special case in which the first extension is known
For instance, if the extension could be .tar, .tar.gz, .tar.gz.bz, etc; you can simply rsplit the known extension and take the first element:
pth = Path('foo/bar/baz.baz/thefile.tar.gz')
fn = pth.name.rsplit('.tar')[0]
print(fn) # thefile
Thought I would throw in a variation to the use of the os.path.splitext without the need to use array indexing.
The function always returns a (root, ext) pair so it is safe to use:
root, ext = os.path.splitext(path)
Example:
>>> import os
>>> path = 'my_text_file.txt'
>>> root, ext = os.path.splitext(path)
>>> root
'my_text_file'
>>> ext
'.txt'
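A brief sketch of the pair that comes back in a few edge cases, which is what makes the tuple unpacking safe:

```python
import os.path

# No extension: the second element is an empty string, never missing.
print(os.path.splitext('README'))          # ('README', '')

# A leading dot is treated as part of the name, not as an extension.
print(os.path.splitext('.bashrc'))         # ('.bashrc', '')

# Only the last extension is split off.
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
```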
But even when I import os, I am not able to call it path.basename. Is it possible to call it as directly as basename?
import os, and then use os.path.basename
importing os doesn't mean you can use os.foo without referring to os.
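To make the two spellings concrete, here is a small sketch. The path is only an illustration.

```python
import os
from os.path import basename

# After "import os", the function must be reached through the module.
print(os.path.basename('/a/b/c.txt'))  # c.txt

# After "from os.path import basename", the bare name works directly.
print(basename('/a/b/c.txt'))          # c.txt
```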
The other methods don't remove multiple extensions. Some also have problems with filenames that don't have extensions. This snippet deals with both instances and works in both Python 2 and 3. It grabs the basename from the path, splits the value on dots, and returns the first one which is the initial part of the filename.
import os
def get_filename_without_extension(file_path):
    file_basename = os.path.basename(file_path)
    filename_without_extension = file_basename.split('.')[0]
    return filename_without_extension
Here's a set of examples to run:
example_paths = [
"FileName",
"./FileName",
"../../FileName",
"FileName.txt",
"./FileName.txt.zip.asc",
"/path/to/some/FileName",
"/path/to/some/FileName.txt",
"/path/to/some/FileName.txt.zip.asc"
]
for example_path in example_paths:
    print(get_filename_without_extension(example_path))
In every case, the value printed is:
FileName
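Two inputs outside that list behave differently and may be worth checking before adopting this approach. The sketch below repeats the function so it runs on its own. The extra example paths are assumptions chosen for illustration.

```python
import os


def get_filename_without_extension(file_path):
    file_basename = os.path.basename(file_path)
    return file_basename.split('.')[0]


# A dotfile has nothing before its first dot, so the result is empty.
print(repr(get_filename_without_extension('/home/user/.bashrc')))  # ''

# Dots inside the name itself (a version number) are cut off as well.
print(get_filename_without_extension('/path/to/file-0.0.1.ext'))   # file-0
```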
A multiple extension aware procedure. Works for str and unicode paths. Works in Python 2 and 3.
import os
def file_base_name(file_name):
    if '.' in file_name:
        separator_index = file_name.index('.')
        base_name = file_name[:separator_index]
        return base_name
    else:
        return file_name

def path_base_name(path):
    file_name = os.path.basename(path)
    return file_base_name(file_name)
Behavior:
>>> path_base_name('file')
'file'
>>> path_base_name(u'file')
u'file'
>>> path_base_name('file.txt')
'file'
>>> path_base_name(u'file.txt')
u'file'
>>> path_base_name('file.tar.gz')
'file'
>>> path_base_name('file.a.b.c.d.e.f.g')
'file'
>>> path_base_name('relative/path/file.ext')
'file'
>>> path_base_name('/absolute/path/file.ext')
'file'
>>> path_base_name('Relative\\Windows\\Path\\file.txt')
'file'
>>> path_base_name('C:\\Absolute\\Windows\\Path\\file.txt')
'file'
>>> path_base_name('/path with spaces/file.ext')
'file'
>>> path_base_name('C:\\Windows Path With Spaces\\file.txt')
'file'
>>> path_base_name('some/path/file name with spaces.tar.gz.zip.rar.7z')
'file name with spaces'
import os
path = "a/b/c/abc.txt"
print(os.path.splitext(os.path.basename(path))[0])
import os
filename = "C:\\Users\\Public\\Videos\\Sample Videos\\wildlife.wmv"
This returns the filename without the extension(C:\Users\Public\Videos\Sample Videos\wildlife)
temp = os.path.splitext(filename)[0]
Now you can get just the filename from the temp with
os.path.basename(temp) #this returns just the filename (wildlife)
Very, very simple, with no other modules needed!
import os
p = r"C:\Users\bilal\Documents\face Recognition python\imgs\northon.jpg"
# Get the filename only from the initial file path.
filename = os.path.basename(p)
# Use splitext() to get filename and extension separately.
(file, ext) = os.path.splitext(filename)
# Print outcome.
print("Filename without extension =", file)
print("Extension =", ext)
On Windows system I used drivername prefix as well, like:
>>> s = 'c:\\temp\\akarmi.txt'
>>> print(os.path.splitext(s)[0])
c:\temp\akarmi
So because I do not need drive letter or directory name, I use:
>>> print(os.path.splitext(os.path.basename(s))[0])
akarmi
Improving upon @spinup's answer:
fn = pth.name
for s in pth.suffixes:
    fn = fn.rsplit(s)[0]
    break
print(fn) # thefile
This also works for filenames without an extension.
I've read the answers, and I notice that there are many good solutions.
So, for those who are looking to get either (name or extension), here goes another solution, using the os module, both methods support files with multiple extensions.
import os
def get_file_name(path):
    if not os.path.isdir(path):
        return os.path.splitext(os.path.basename(path))[0].split(".")[0]

def get_file_extension(path):
    extensions = []
    copy_path = path
    while True:
        copy_path, result = os.path.splitext(copy_path)
        if result != '':
            extensions.append(result)
        else:
            break
    extensions.reverse()
    return "".join(extensions)
Note: this solution on windows does not support file names with the "\" character
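For comparison, the combined extension can also be built from pathlib's suffixes. This is only a sketch of an alternative, not the answer's own method. The name all_extensions is hypothetical, and PurePosixPath is assumed for platform-independent results.

```python
from pathlib import PurePosixPath


def all_extensions(path: str) -> str:
    """Join every suffix of the final path component into one string."""
    return ''.join(PurePosixPath(path).suffixes)


print(all_extensions('/home/dc/images.tar.gz'))  # .tar.gz
print(repr(all_extensions('/a/b.c/file')))       # ''
```

Only the last component is inspected, so a dot in a directory name (b.c above) does not count as an extension.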
We could do some simple split / pop magic as seen here (https://stackoverflow.com/a/424006/1250044), to extract the filename (respecting the windows and POSIX differences).
def getFileNameWithoutExtension(path):
    return path.split('\\').pop().split('/').pop().rsplit('.', 1)[0]
getFileNameWithoutExtension('/path/to/file-0.0.1.ext')
# => file-0.0.1
getFileNameWithoutExtension('\\path\\to\\file-0.0.1.ext')
# => file-0.0.1
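A possible alternative to the manual split and pop is pathlib.PureWindowsPath, which accepts both backslashes and forward slashes as separators. The sketch below is an assumption about what fits this use case, not part of the linked answer. The function name is hypothetical.

```python
from pathlib import PureWindowsPath


def file_name_without_extension(path: str) -> str:
    """Return the last component minus its final extension, for either separator."""
    return PureWindowsPath(path).stem


print(file_name_without_extension('/path/to/file-0.0.1.ext'))     # file-0.0.1
print(file_name_without_extension('\\path\\to\\file-0.0.1.ext'))  # file-0.0.1
```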
For convenience, a simple function wrapping the two methods from os.path :
def filename(path):
    """Return file name without extension from path.
    See https://docs.python.org/3/library/os.path.html
    """
    import os.path
    b = os.path.split(path)[1] # path, *filename*
    f = os.path.splitext(b)[0] # *file*, ext
    #print(path, b, f)
    return f
Tested with Python 3.5.
import os

file_names = []

def getFileName(path):
    for file in os.listdir(path):
        try:
            base = os.path.basename(file)
            ext = os.path.splitext(base)[1]
            if ext:
                file_names.append(base)
            else:
                # No extension: assume it is a directory and recurse into it.
                newpath = path + "/" + file
                getFileName(newpath)
        except OSError:
            # Not a directory (or unreadable): skip it.
            pass
    return file_names

getFileName("/home/weexcel-java3/Desktop/backup")
print(file_names)
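The easiest way to resolve this is to use ntpath, which treats both / and \ as separators. The path must not end with a slash, or basename returns an empty string.
import ntpath
print('Base name is ', ntpath.basename('/path/to/the/file'))
This saves you time and computation cost.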
I didn't look very hard but I didn't see anyone who used regex for this problem.
I interpreted the question as "given a path, return the basename without the extension."
e.g.
"path/to/file.json" => "file"
"path/to/my.file.json" => "my.file"
In Python 2.7, where we still live without pathlib...
import os
import re

def get_file_name_prefix(file_path):
    basename = os.path.basename(file_path)
    file_name_prefix_match = re.compile(r"^(?P<file_name_prefix>.*)\..*$").match(basename)
    if file_name_prefix_match is None:
        # No dot in the basename, so there is no extension to strip.
        return basename
    else:
        return file_name_prefix_match.group("file_name_prefix")
get_file_name_prefix("path/to/file.json")
>> file
get_file_name_prefix("path/to/my.file.json")
>> my.file
get_file_name_prefix("path/to/no_extension")
>> no_extension
Using pathlib.Path.stem is the right way to go, but here is an ugly solution that is way more efficient than the pathlib based approach.
You have a filepath whose fields are separated by a forward slash /, slashes cannot be present in filenames, so you split the filepath by /, the last field is the filename.
The extension is always the last element of the list created by splitting the filename by dot ., so if you reverse the filename and split by dot once, the reverse of the second element is the file name without extension.
name = path.split('/')[-1][::-1].split('.', 1)[1][::-1]
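One caveat with the one-liner above: if the filename contains no dot, split('.', 1) returns a single element and indexing [1] raises IndexError. Below is a hedged sketch of a variant based on str.rpartition. The name fast_stem is hypothetical, and the variant has not been benchmarked here.

```python
def fast_stem(path: str) -> str:
    """Return the last '/'-separated component without its final extension."""
    filename = path.rsplit('/', 1)[-1]
    head, dot, _ext = filename.rpartition('.')
    # Keep the whole name when there is no dot, or when the only dot is
    # the leading one of a dotfile such as '.bashrc'.
    return head if dot and head else filename


print(fast_stem('D:/ffmpeg/ffmpeg.exe'))  # ffmpeg
print(fast_stem('a/b/README'))            # README
print(fast_stem('/home/u/.bashrc'))       # .bashrc
```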
Performance:
Python 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from pathlib import Path
In [2]: file = 'D:/ffmpeg/ffmpeg.exe'
In [3]: Path(file).stem
Out[3]: 'ffmpeg'
In [4]: file.split('/')[-1][::-1].split('.', 1)[1][::-1]
Out[4]: 'ffmpeg'
In [5]: %timeit Path(file).stem
6.15 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit file.split('/')[-1][::-1].split('.', 1)[1][::-1]
671 ns ± 37.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
What about the following?
import pathlib
filename = '/path/to/dir/stem.ext.tar.gz'
pathlib.Path(filename).name[:-len(''.join(pathlib.Path(filename).suffixes))]
# -> 'stem'
or this equivalent?
pathlib.Path(filename).name[:-sum(map(len, pathlib.Path(filename).suffixes))]
>>> import os
>>> print(os.path.splitext(os.path.basename("/path/to/file/varun.txt"))[0])
varun
Here /path/to/file/varun.txt is the path to file and the output is varun
Assuming you're already using pathlib and Python 3.9 or newer, here's a simple one-line approach that removes all extensions.
>>> from pathlib import Path
>>> pth = Path("/path/to.some/file.foo.bar.txt")
>>> pth.name.removesuffix("".join(pth.suffixes))
'file'
