I want to reduce a file path such as "C:\Users\Bob\Documents\file.xlsx" to "C:\Users\Bob\Documents" only. What regular expression would replace 'file.xlsx' with an empty string? The file name can be anything, with any extension such as txt, xls, csv, etc.
I'm unable to create the regex to replace the right group.
Regex Self Try
You're not looking for regex here, you're looking for os.path.dirname.
import os
....
path = r"C:\Users\Bob\Documents\file.xlsx"
os.path.dirname(path)
Output:
'C:\\Users\\Bob\\Documents'
Extra info: I highly recommend using os.path.join to create cross platform compatible paths, instead of using a string directly.
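One caveat: os.path follows the conventions of the operating system the script runs on, so on Linux or macOS os.path.dirname does not treat backslashes as separators. The snippet below is a minimal sketch, assuming the Windows-style path has to be handled on any platform. It uses pathlib.PureWindowsPath from the standard library; the variable names and "other.csv" are purely illustrative.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\Users\Bob\Documents\file.xlsx")

# .parent drops the final component (the file name), whatever its extension
folder = str(path.parent)
print(folder)

# Build a sibling path with the "/" operator instead of hand-written separators
rebuilt = str(path.parent / "other.csv")
print(rebuilt)
```

If the script only ever runs on Windows, os.path.dirname and os.path.join behave the same way for this input.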
Related
I have a given file path. For example, "C:\Users\cobyk\Downloads\GrassyPath.jpg". I would like to pull in a separate string, the image file name.
I'm assuming the best way to do that is to start from the end of the string, find the final slash, and then take the characters following that slash. Is there already a method to do this, or will I have to search through the string with a for loop, find the last slash myself, and then copy the characters manually?
The pathlib module makes it very easy to access individual parts of a file path like the final path component:
from pathlib import Path
image_path = Path(r"C:\Users\cobyk\Downloads\GrassyPath.jpg")
print(image_path.name) # -> GrassyPath.jpg
You can certainly search manually as you've suggested. However, the Python standard library already has, as you suspected, a function which does this for you.
import os
file_name = os.path.basename(r'C:\Users\cobyk\Downloads\GrassyPath.jpg')
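As a hedged aside, os.path.basename only splits on backslashes when the code runs on Windows. The sketch below assumes the path may be processed on another OS, so it uses the standard-library ntpath module (the Windows flavour of os.path). It also shows the manual "last slash" approach from the question for comparison.

```python
import ntpath

path = r"C:\Users\cobyk\Downloads\GrassyPath.jpg"

# ntpath applies Windows path rules regardless of the host OS
file_name = ntpath.basename(path)

# The manual equivalent: keep whatever follows the last backslash
manual_name = path.rpartition("\\")[2]

print(file_name, manual_name)
```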
I want to be able to use the glob module in Python 3.9 to match filenames in a directory containing the following file names:
"MM05_awani3_StudentA.py"
"MM05_liu127.py"
Specifically, I want to be able to loop over all the files in a directory that fit a certain pattern. So I want to use a for loop like this:
for file in current_path.glob("string"):
    # do something
The glob pattern "MM05_submissions/MM05_*[a-z0-9]?(_Student[A-Z]).py" seems to work according to DigitalOcean's glob tester tool, but I'm not getting any matches inside of Python 3.9
Is the glob used on DigitalOcean's tester different from the one in Python?
Can I match optional groups in Python using round brackets ()?
If not, should I use something like RegEx to loop over files that match a certain pattern in a directory?
You can't use (...) grouping, no. The glob module uses the fnmatch module to do the matching, and it supports *, ?, [seq] and [!seq], nothing more.
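To make this concrete, here is a small sketch using fnmatch.fnmatchcase directly on the two file names from the question ("notes.txt" is a hypothetical extra). It shows that the parentheses are treated as literal characters. It also shows one possible workaround, which is an assumption on my part rather than part of the answer: test several plain patterns and merge the results.

```python
from fnmatch import fnmatchcase

names = ["MM05_awani3_StudentA.py", "MM05_liu127.py", "notes.txt"]

# Parentheses are matched literally, so the extglob-style pattern matches nothing
extglob_style = "MM05_*[a-z0-9]?(_Student[A-Z]).py"
literal_hits = [n for n in names if fnmatchcase(n, extglob_style)]

# Workaround sketch: one plain pattern per variant, results merged in a set
plain_patterns = ["MM05_*[a-z0-9].py", "MM05_*[a-z0-9]_Student[A-Z].py"]
hits = sorted({n for n in names for p in plain_patterns if fnmatchcase(n, p)})

print(literal_hits)
print(hits)
```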
However, fnmatch uses a simple pattern-to-regex conversion to test filenames. Just do the same yourself with os.scandir():
import re
import os
# raw string, so that \. reaches the regex engine unchanged
pattern = re.compile(r"MM05_[a-z0-9]*(_Student[A-Z])?\.py")
for entry in os.scandir("MM05_submissions"):
    if pattern.fullmatch(entry.name):
        # name matched the pattern
        print(entry.name)
If you need to support arbitrary depth patterns using directory names, you'll have to write something up using os.walk().
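A minimal sketch of what such an os.walk() version could look like, assuming only the file name (not the directory names) has to match the regex. The helper name find_matches and the demo directory layout are hypothetical.

```python
import os
import re
import tempfile

pattern = re.compile(r"MM05_[a-z0-9]*(_Student[A-Z])?\.py")

def find_matches(root):
    """Yield paths under root whose file name fully matches the pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if pattern.fullmatch(name):
                yield os.path.join(dirpath, name)

# Demo on a throwaway directory tree (hypothetical layout)
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "MM05_submissions", "late")
    os.makedirs(nested)
    for name in ("MM05_liu127.py", "MM05_awani3_StudentA.py", "README.txt"):
        open(os.path.join(nested, name), "w").close()
    found = sorted(os.path.basename(p) for p in find_matches(root))

print(found)
```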
I am given a variable that contains a Windows file path, and I then have to read that file. The problem is that the path contains escape characters, and I can't seem to get rid of them. I checked os.path and pathlib, but both expect correctly formatted text already, which I can't seem to construct.
For example, see below. Please note that fPath is given, so I can't prefix it with r to make it a raw string.
# this is given, I can't make it a raw string with r
fPath = "P:\python\t\temp.txt"
file = open(fPath, "r")
for line in file:
    print(line)
How can I turn fPath via some function or method from:
"P:\python\t\temp.txt"
to
"P:/python/t/temp.txt"
I've also tried .replace("\","/"), which doesn't work.
I'm using Python 3.7 for this.
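You can use os.path.abspath() to normalize the path, but only if the backslashes are still real backslashes, for example when the literal is written as a raw string:
print(os.path.abspath(r"P:\python\t\temp.txt"))
On Windows this returns the path with backslashes (P:\python\t\temp.txt), which open() accepts. It cannot restore a "\t" that has already been interpreted as a tab character.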
See the documentation of os.path here.
I've solved it.
The issue lies with how the Python interpreter reads string literals. \t and the other escape sequences are not stored as a backslash followed by a letter; they are interpreted as single non-printing characters.
So I got a bit lucky and someone else already faced the same problem and solved it with a hard brute-force method:
http://code.activestate.com/recipes/65211/
I just had to find it.
After that I have a raw string without escaped characters, and just need to run the simple replace() on it to get a workable path.
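For illustration, a minimal sketch of that kind of brute-force mapping (not the linked recipe itself). It assumes only the single-letter escapes such as \t or \n were interpreted; octal, hex, or Unicode escapes lose information and cannot be recovered this way. The helper name restore_backslashes is hypothetical.

```python
# Control characters that Python produces from single-letter escapes,
# mapped back to the two-character text that was most likely typed.
CONTROL_TO_TEXT = {
    "\a": r"\a", "\b": r"\b", "\f": r"\f", "\n": r"\n",
    "\r": r"\r", "\t": r"\t", "\v": r"\v",
}

def restore_backslashes(text):
    """Best-effort reversal of interpreted escapes (hypothetical helper)."""
    return "".join(CONTROL_TO_TEXT.get(ch, ch) for ch in text)

# Same value as "P:\python\t\temp.txt": both "\t" sequences are already tabs
f_path = "P:\\python\t\temp.txt"

fixed = restore_backslashes(f_path).replace("\\", "/")
print(fixed)
```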
You can use the Path class from the pathlib library.
from pathlib import Path
docs_folder = Path("some_folder/some_folder/")
text_file = docs_folder / "some_file.txt"
f = open(text_file)
If you would rather do the replacement yourself, call this on the path string:
.replace("\\", "/")
When using Python 3.4 or later, the Path class from the pathlib module offers a method called as_posix, which converts a path to use forward slashes in *nix style. For example, if you build a Path object via p = pathlib.Path('C:\\Windows\\SysWOW64\\regedit.exe'), calling p.as_posix() on Windows yields C:/Windows/SysWOW64/regedit.exe. The drive letter is kept as-is, so to obtain a complete *nix style path you'd need to convert the drive letter manually.
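A short sketch of the same idea, assuming the code may run on a non-Windows machine: it uses PureWindowsPath instead of Path so that backslashes are parsed as separators on any OS. The drive and the rest of the path are also exposed separately, which is where a manual drive-letter conversion could hook in.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("C:\\Windows\\SysWOW64\\regedit.exe")

# Forward-slash form; the drive letter is still present
posix_style = p.as_posix()
print(posix_style)

# The drive and the remainder, in case the drive needs separate handling
drive = p.drive
rest = p.relative_to(p.anchor).as_posix()
print(drive, rest)
```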
I came across a similar problem with Windows file paths. This is what works for me:
# read the path from the user and split it on backslashes
file = input().split('\\')
file = '/'.join(file)
This converted the input from this:
"D:\test.txt"
to this:
"D:/test.txt"
Basically, when working with a Windows path, Python tends to show each '\' as '\\'. That goes for every backslash. When working with file paths, you won't have genuine double backslashes, since backslashes only separate folder names.
This way you can list all folders in order by splitting on '\\' and then rejoining them with the .join function using a forward slash.
Hopefully this helps!
I am trying to match file names within a folder using python so that I can run a secondary process on the files that match. My file names are such that they begin differently but match strings at some point as below:
3322_VEGETATION_AREA_2009_09
3322_VEGETATION_LINE_2009_09
4522_VEGETATION_POINT_2009_09
4422_VEGETATION_AREA_2009_09
8722_VEGETATION_LINE_2009_09
2522_VEGETATION_POINT_2009_09
4222_VEGETATION_AREA_2009_09
3522_VEGETATION_LINE_2009_09
3622_VEGETATION_POINT_2009_09
Would regex be the right approach to matching those files after the first underscore or am I overthinking this?
import glob
files = glob.glob("*VEGETATION*")
should do the trick. It should find all files in the current directory that contain "VEGETATION" somewhere in the filename
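If the names are already in a list (from os.listdir, for example), the same wildcard matching is available through the fnmatch module. A small sketch, using a few names from the question plus one hypothetical non-matching name:

```python
from fnmatch import fnmatchcase

names = [
    "3322_VEGETATION_AREA_2009_09",
    "4522_VEGETATION_POINT_2009_09",
    "8722_ROADS_LINE_2009_09",  # hypothetical name that should not match
]

# Everything containing _VEGETATION_ after the leading number
vegetation = [n for n in names if fnmatchcase(n, "*_VEGETATION_*")]

# A narrower pattern for one feature type
areas = [n for n in names if fnmatchcase(n, "*_VEGETATION_AREA_*")]

print(vegetation)
print(areas)
```

fnmatchcase is case-sensitive on every platform, which keeps the result predictable.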
I have a ZIP file and I need to extract all the files (normally one) that contain the string "test" in the filename. They are all xlsx files.
I am using Python zipfile for that. This is my code that doesn't work:
zip.extract(r'*\test.*\.xlsx$', './')
The error I get:
KeyError: "There is no item named '*\\\\test.*\\\\.xlsx$' in the archive"
Any ideas?
You have multiple problems here:
The r prefix simply means the string is treated as a raw string; it looks like you might think it creates a regular expression object. (In any case, zip.extract() expects an exact member name or a ZipInfo object, not a pattern.)
The * quantifier at the start of the regex has no character before it to match
You need to manually iterate through the zip file index and match the filenames against your regex:
from zipfile import ZipFile
import re
zip = ZipFile('myzipfile.zip')
for info in zip.infolist():
    if re.match(r'.*test.*\.xlsx$', info.filename):
        print(info.filename)
        zip.extract(info)
You might also consider using shell file globbing syntax from the fnmatch module: fnmatchcase(info.filename, '*test*.xlsx'). Behind the scenes it converts the pattern to a regex, but it makes your code slightly simpler.
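A self-contained sketch of the globbing variant. It assumes an in-memory archive with hypothetical member names so that it runs without an existing zip file, and it extracts into a temporary directory.

```python
import io
import os
import tempfile
from fnmatch import fnmatchcase
from zipfile import ZipFile

# Build a small in-memory archive with hypothetical member names
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("reports/test_results.xlsx", b"dummy")
    archive.writestr("reports/summary.xlsx", b"dummy")

extracted = []
with ZipFile(buffer) as archive, tempfile.TemporaryDirectory() as target:
    for info in archive.infolist():
        # "*" also matches "/" here, so folder prefixes are covered
        if fnmatchcase(info.filename, "*test*.xlsx"):
            archive.extract(info, target)
            extracted.append(info.filename)
    # Check the file landed on disk before the temporary directory is removed
    on_disk = os.path.exists(os.path.join(target, "reports", "test_results.xlsx"))

print(extracted, on_disk)
```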