Get specific parts of a file path in Python - python

I have a path string like
'/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
Q1: How can I get only accescontrol.dat from the path.
Q2: How can I get only /path/eds/vs/accescontrol.dat from the path.

import re
url = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
match = re.search(r'^(.+/([^/]+\.dat))[^$]', url)
print(match.group(1))
# Outputs /path/eds/vs/accescontrol.dat
print(match.group(2))
# Outputs accescontrol.dat
I edited this to work in Python 2 and to answer both questions (the other regex answer only answers the first of the two).

You could use regular expressions:
import re
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
ma = re.search(r'/([^/]+\.dat)/d=', path)
print(ma.group(1))
# Outputs accescontrol.dat

A simple solution is to use .split():
Q1:
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'
[x for x in path.split('/') if x.endswith('.dat')]
gives:
['accescontrol.dat', 'file1.dat']
A similar trick will answer Q2.
For more advanced file path manipulation I would recommend reading about os.path
https://docs.python.org/2/library/os.path.html#module-os.path
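The "similar trick" for Q2 could look something like the sketch below. It assumes the component you want is the first one that ends in '.dat'; the helper name first_dat_component is made up for illustration and is not part of any library.

```python
def first_dat_component(path, suffix='.dat'):
    """Return (name, prefix) for the first path component ending in suffix.

    name answers Q1; prefix (the path up to and including name) answers Q2.
    Hypothetical helper written for this example.
    """
    parts = path.split('/')
    for i, part in enumerate(parts):
        if part.endswith(suffix):
            # Re-join everything up to and including the matching component.
            return part, '/'.join(parts[:i + 1])
    return None, None


name, prefix = first_dat_component('/path/eds/vs/accescontrol.dat/d=12520/file1.dat')
print(name)    # accescontrol.dat
print(prefix)  # /path/eds/vs/accescontrol.dat
```

If no component ends in the suffix, the helper returns (None, None) rather than raising.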

I would recommend separating each level of folder/file into strings in a list.
path = '/path/eds/vs/accescontrol.dat/d=12520/file1.dat'.split("/")
This makes path = ['', 'path', 'eds', 'vs', 'accescontrol.dat', 'd=12520', 'file1.dat'] (the leading empty string comes from the initial '/').
Then from there, you can access each of the different parts.

Why not do it this way, with pathlib?
from pathlib import Path
h = r'C:\Users\dj\Pictures\Saved Pictures'
path = Path(h)
print(path.parts)
# On Windows this prints:
# ('C:\\', 'Users', 'dj', 'Pictures', 'Saved Pictures')
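Applied to the path from the question, the pathlib approach might look like the sketch below. It uses PurePosixPath so the result is the same on any operating system, and it assumes the directory you want is the first component with a .dat suffix. The function name first_with_suffix is hypothetical.

```python
from pathlib import PurePosixPath


def first_with_suffix(path, suffix='.dat'):
    """Return the shortest leading part of path whose last component has suffix."""
    p = PurePosixPath(path)
    # p.parents runs from the closest parent up to the root, so reverse it
    # to walk from the root downwards, then check the full path last.
    for candidate in list(p.parents)[::-1] + [p]:
        if candidate.suffix == suffix:
            return candidate
    return None


hit = first_with_suffix('/path/eds/vs/accescontrol.dat/d=12520/file1.dat')
print(hit.name)  # accescontrol.dat                (Q1)
print(hit)       # /path/eds/vs/accescontrol.dat   (Q2)
```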

Related

How to replace multiple forward slashes in a directory by a single slash?

My path:
'/home//user////document/test.jpg'
I want this to be converted into:
'/home/user/document/test.jpg'
How to do this?
Use os.path.abspath or normpath to canonicalise the path:
>>> import os.path
>>> os.path.abspath('/home//user////document/test.jpg')
'/home/user/document/test.jpg'
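To illustrate how the two functions differ, here is a small sketch. It uses posixpath (the POSIX flavour of os.path) so that the printed results are the same on every platform; on Linux and macOS, os.path behaves identically.

```python
import posixpath

messy = '/home//user////document/test.jpg'

# normpath only cleans up the string; it never touches the filesystem.
print(posixpath.normpath(messy))  # /home/user/document/test.jpg

# It also resolves '.' and '..' purely textually.
print(posixpath.normpath('/home/user/../user/./document'))  # /home/user/document

# Exactly two leading slashes are preserved, because POSIX allows
# a path starting with '//' to be treated specially.
print(posixpath.normpath('//home//user'))  # //home/user

# abspath is normpath plus making a relative path absolute against the
# current working directory, so its result depends on where you run it.
print(posixpath.abspath('document//test.jpg'))
```

So normpath is enough when you only want to tidy the string, and abspath is the one to use when relative paths should also be anchored.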
Solution:
This code snippet should solve your issue:
import re
x = '/home//user////document/test.jpg'
re.sub('/+','/', x)
Output:
'/home/user/document/test.jpg'
This solution is very simple using a regex.
You can use the 're' module of the Python standard library.
import re
old_path = '/home//user////document/test.jpg'
converted_path = re.sub('/+', '/', old_path)
I'm sorry not to speak English fluently ;)
Instantiating a pathlib.Path object from your string will remove redundant slashes automatically for you:
from pathlib import Path
path = Path('/home//user////document/test.jpg')
print(path)
# /home/user/document/test.jpg
I think the easiest way is to replace '//' with '/' twice (this is enough for runs of up to four slashes, as in the example):
a = '/home//user////document/test.jpg'
a = a.replace('//', '/').replace('//', '/')
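If the number of repeated slashes is not known in advance, the replacement can simply be repeated until no doubled slash is left. A minimal sketch; collapse_slashes is a made-up name:

```python
def collapse_slashes(path):
    """Repeat the '//' -> '/' replacement until no doubled slash remains."""
    while '//' in path:
        path = path.replace('//', '/')
    return path


print(collapse_slashes('/home//user////document/test.jpg'))  # /home/user/document/test.jpg
print(collapse_slashes('/a/////////b'))                      # /a/b
```

Each pass roughly halves the length of every run of slashes, so the loop finishes after only a few iterations even for long runs.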

Grab part of filename with Python

Newbie here.
I've just been working with Python/coding for a few days, but I want to create a script that grabs parts of filenames corresponding to a certain pattern, and outputs it to a textfile.
So in my case, let's say I have four .pdf like this:
aaa_ID_8423.pdf
bbbb_ID_8852.pdf
ccccc_ID_7413.pdf
dddddd_ID_4421.pdf
(Note that they are of variable length.)
I want the script to go through these filenames, grab the string after "ID_" and before the filename extension.
Can you point me in the direction to which Python modules and possibly guides that could assist me?
Here's a simple solution using the re module as mentioned in other answers.
# Libraries
import re
# Example filenames. Use glob as described below to grab your pdf filenames
file_list = ['name_ID_123.pdf', 'name2_ID_456.pdf']  # glob.glob("*.pdf")
for fname in file_list:
    res = re.findall(r"ID_(\d+)\.pdf", fname)
    if not res:
        continue
    print(res[0])  # You can append the result to a list
And below should be your output. You should be able to adapt this to other patterns.
# Output
123
456
Good luck!
Here's another alternative, using re.split(), which is probably closer to the spirit of exactly what you're trying to do (although solutions with re.match() and re.search(), among others, are just as valid, useful, and instructive):
>>> import re
>>> re.split("[_.]", "dddddd_ID_4421.pdf")[-2]
'4421'
>>>
If the numbers are variable length, you'll want the regex module "re":
import re
# create and compile a regex pattern
pattern = re.compile(r"_([0-9]+)\.[^\.]+$")
pattern.search("abc_ID_8423.pdf").group(1)
# '8423'
Regex is generally used to match variable strings. The regex I just wrote says:
Find an underscore ("_"), followed by one or more digits ("[0-9]+", captured as group 1), followed by a period and the extension that runs to the end of the string ("\.[^\.]+$").
You can use the os module in Python and do a listdir to get a list of the filenames present in a directory, like so:
import os
path = '.'  # directory containing the PDFs; replace with your own
filenames = os.listdir(path)
Now you can iterate over the filenames list and look for the pattern you need using regular expressions:
import re
for filename in filenames:
    m = re.search(r'(?<=ID_)\w+', filename)
    if m:  # skip names that do not contain "ID_"
        print(m.group(0))
The above snippet prints the part of the filename following ID_. Because \w does not match a period, the match stops before the extension, so for your example it prints 8423, 8852, and so on.
You probably want to use glob, which is a python module for file globbing. From the python help page the usage is as follows:
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
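Putting the pieces from the answers above together (listing the files, matching the pattern, writing a text file), an end-to-end version might look like the following sketch. The function names, the output file name, and the exact pattern are assumptions made for illustration; adapt them to your real filenames.

```python
import re
from pathlib import Path

# Digits between "_ID_" and the ".pdf" extension (assumed naming scheme).
ID_PATTERN = re.compile(r"_ID_(\d+)\.pdf$")


def extract_ids(filenames):
    """Return the ID strings found in the given filenames, in order."""
    ids = []
    for name in filenames:
        m = ID_PATTERN.search(name)
        if m:  # silently skip names that do not follow the pattern
            ids.append(m.group(1))
    return ids


def write_ids(directory, out_file):
    """Collect IDs from the PDFs in directory and write one per line to out_file."""
    names = sorted(p.name for p in Path(directory).glob("*.pdf"))
    ids = extract_ids(names)
    Path(out_file).write_text("\n".join(ids) + "\n")
    return ids


print(extract_ids(["aaa_ID_8423.pdf", "bbbb_ID_8852.pdf", "notes.txt"]))
# ['8423', '8852']
```

Calling write_ids('.', 'ids.txt') would then scan the current directory and write the IDs it finds to ids.txt.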

replace part of path - python

Is there a quick way to replace part of the path in python?
for example:
old_path='/abc/dfg/ghi/f.txt'
I don't know the beginning of the path (/abc/dfg/), so what I'd really like to tell python to keep everything that comes after /ghi/ (inclusive) and replace everything before /ghi/ with /jkl/mno/:
>>> new_path
'/jkl/mno/ghi/f.txt/'
If you're using Python 3.4+, or willing to install the backport, consider using pathlib instead of os.path:
import pathlib
path = pathlib.Path(old_path)
index = path.parts.index('ghi')
new_path = pathlib.Path('/jkl/mno').joinpath(*path.parts[index:])
If you just want to stick with the 2.7 or 3.3 stdlib, there's no direct way to do this, but you can get the equivalent of parts by looping over os.path.split. For example, peeling components off the end of the path until you reach ghi, and then tacking on the new prefix, will replace everything before the last ghi (if you want to replace everything before the first ghi, it's not hard to change things):
import os.path
path = old_path
new_path = ''
while True:
    path, base = os.path.split(path)
    # prepend the component; avoid a trailing separator on the first pass
    new_path = os.path.join(base, new_path) if new_path else base
    if base == 'ghi' or not base:  # stop at ghi, or at the root if ghi is absent
        break
new_path = os.path.join('/jkl/mno', new_path)
This is a bit clumsy, so you might want to consider writing a simple function that gives you a list or tuple of the path components, so you can just use index, then join it all back together, as with the pathlib version.
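Such a helper could be sketched as follows. The names split_all and replace_prefix are invented for this example, and posixpath (the POSIX flavour of os.path) is used so the demo gives the same result on every platform.

```python
import posixpath


def split_all(path):
    """Split a path into a list of its components, similar to pathlib's .parts."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if tail:
            parts.append(tail)
        if head == path:  # reached the root ('/') or an empty string
            if head:
                parts.append(head)
            break
        path = head
    return parts[::-1]


def replace_prefix(path, marker, new_prefix):
    """Replace everything before the last component equal to marker."""
    parts = split_all(path)
    # Index of the last occurrence of marker; raises ValueError if it is absent.
    idx = len(parts) - 1 - parts[::-1].index(marker)
    return posixpath.join(new_prefix, *parts[idx:])


print(split_all('/abc/dfg/ghi/f.txt'))                         # ['/', 'abc', 'dfg', 'ghi', 'f.txt']
print(replace_prefix('/abc/dfg/ghi/f.txt', 'ghi', '/jkl/mno'))  # /jkl/mno/ghi/f.txt
```

Because the marker is compared against whole components, a directory such as 'ghij' will not be mistaken for 'ghi'.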
>>> import os.path
>>> old_path='/abc/dfg/ghi/f.txt'
First grab the relative path from the starting directory of your choice using os.path.relpath
>>> rel = os.path.relpath(old_path, '/abc/dfg/')
>>> rel
'ghi\\f.txt'
Then add the new first part of the path to this relative path using os.path.join
>>> new_path = os.path.join(r'jkl\mno', rel)
>>> new_path
'jkl\\mno\\ghi\\f.txt'
You can use the index of ghi:
>>> old_path.replace(old_path[:old_path.index("ghi")], "/jkl/mno/")
'/jkl/mno/ghi/f.txt'
A rather naive approach, but does the job:
Function:
def replace_path(path, frm, to):
    pre, match, post = path.rpartition(frm)
    return ''.join((to if match else pre, match, post))
Example:
>>> s = '/abc/dfg/ghi/f.txt'
>>> replace_path(s, '/ghi/', '/jkl/mno')
'/jkl/mno/ghi/f.txt'
>>> replace_path(s, '/whatever/', '/jkl/mno')
'/abc/dfg/ghi/f.txt'
The following is useful when you want to replace some known base directory in your path.
from pathlib import Path
old_path = Path('/abc/dfg/ghi/f.txt')
old_root = Path('/abc/dfg')
new_root = Path('/jkl/mno')
new_path = new_root / old_path.relative_to(old_root)
# Result: /jkl/mno/ghi/f.txt
I understand that the OP specifically mentioned that the path to the base directory is not known. However, since it is a common task to remove the path to the base directory, and the title of the question ("replace part of the path") is certainly bringing some folks with this subtype of problem here, I am posting it anyway.
I needed to replace an arbitrary number of occurrences of an arbitrary string in a path,
e.g. replace 'package' with 'foo' in
from pathlib import Path
VERSION_FILE = Path(f'{Path.home()}', 'projects', 'package', 'package', '_version.py')
So I use this call:
_replace_path_text(VERSION_FILE, 'package', 'foo')
with this helper:
def _replace_path_text(path, text, replacement):
    parts = list(path.parts)
    new_parts = [part.replace(text, replacement) for part in parts]
    return Path(*new_parts)

Python cp regex: copy an unknown dir

Currently in a program I have, I use:
os.system('cp /run/user/1000/gvfs/*/store_00010001/DCIM/100ND610/* /home/$USER/Pictures/$(date +%F)/')
But I want to use shutil.copy to do the copying for me. I have one problem, though. In Bash I can have a path that contains a wildcard like * or ?, e.g. /run/user/1000/gvfs/*/store_000?0001/, but how do I do that in Python for a path?
I'm using Python 3.3
The following is a clean way to use glob when you are expecting only one match:
import glob
unknown_dir = "/run/user/1000/gvfs/gphoto*/"
matches = glob.glob(unknown_dir)
if len(matches) == 0:
    raise Exception('No matches')
if len(matches) > 1:
    raise Exception('Too many matches')
known_dir = matches[0]
If you are expecting zero or more matches, you can simply iterate over the list of matches and handle each one in turn:
import glob
unknown_dir = "/run/user/1000/gvfs/gphoto*/"
for match in glob.glob(unknown_dir):
    print(match)  # use match here
@eryksun answered my question by recommending glob. For future readers of this question, here is how I used it:
import glob
unknown_dir = "/run/user/1000/gvfs/gphoto*/"
known_dir = str(glob.glob(unknown_dir)).strip("[]'")
I need the path as a string, hence the str() and strip(). (This only works when there is exactly one match; glob.glob() already returns a list of strings, so glob.glob(unknown_dir)[0] is a more robust way to get the same result.)
I'll now be able to use shutil.copy. Thank you, @eryksun!
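For completeness, the original cp command could be approximated with glob and shutil.copy along these lines. This is a sketch under assumptions: copy_matching is a made-up helper, and the commented usage assumes that /home/$USER is your home directory and that the date should be formatted like date +%F (YYYY-MM-DD).

```python
import glob
import os
import shutil


def copy_matching(pattern, dest_dir):
    """Copy every regular file matching the glob pattern into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target, like mkdir -p
    copied = []
    for src in sorted(glob.glob(pattern)):
        if os.path.isfile(src):  # skip directories, which plain cp would refuse
            copied.append(shutil.copy(src, dest_dir))
    return copied


# Hypothetical usage with the paths from the question (not run here):
# import datetime
# dest = os.path.expanduser('~/Pictures/' + datetime.date.today().isoformat())
# copy_matching('/run/user/1000/gvfs/*/store_00010001/DCIM/100ND610/*', dest)
```

The function returns the list of destination paths, which is handy for logging or for checking that anything was copied at all.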

python:extract certain part of string

I have a string from which I would like to extract certain part. The string looks like :
E:/test/my_code/content/dir/disp_temp_2.hgx
This is a path on a machine for a specific file with extension hgx
I would like to capture exactly "disp_temp_2". The problem is that the strip function I tried does not work correctly for me, as there are many '/'. Another problem is that the location above will always change on the computer.
Is there any method to capture the exact string between the last '/' and the '.'?
My code looks like:
path = path.split('.')
.. now I cannot split based on the last '/'.
Any ideas how to do this?
Thanks
Use the os.path module:
import os.path
filename = "E:/test/my_code/content/dir/disp_temp_2.hgx"
name = os.path.basename(filename).split('.')[0]
Python comes with the os.path module, which gives you much better tools for handling paths and filenames:
>>> import os.path
>>> p = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> head, tail = os.path.split(p)
>>> tail
'disp_temp_2.hgx'
>>> os.path.splitext(tail)
('disp_temp_2', '.hgx')
Standard libs are cool:
>>> from os import path
>>> f = "E:/test/my_code/content/dir/disp_temp_2.hgx"
>>> path.split(f)[1].rsplit('.', 1)[0]
'disp_temp_2'
Try this:
path=path.rsplit('/',1)[1].split('.')[0]
path = path.split('/')[-1].split('.')[0] works.
You can use the split on the other part :
path = path.split('/')[-1].split('.')[0]
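The answers above agree on the example path, but they behave differently once a filename contains more than one dot. The sketch below compares them; the archive.tar.gz path is a made-up example, and PurePosixPath and posixpath are used so the output does not depend on the operating system.

```python
import posixpath
from pathlib import PurePosixPath

p = "E:/test/my_code/content/dir/disp_temp_2.hgx"

print(PurePosixPath(p).stem)                         # disp_temp_2
print(posixpath.splitext(posixpath.basename(p))[0])  # disp_temp_2

# The variants differ once the name contains more than one dot:
q = "E:/test/archive.tar.gz"
print(q.split('/')[-1].split('.')[0])                # archive      (cuts at the first dot)
print(PurePosixPath(q).stem)                         # archive.tar  (cuts at the last dot)
print(posixpath.splitext(posixpath.basename(q))[0])  # archive.tar  (cuts at the last dot)
```

Which behaviour is right depends on what you consider the extension to be, so it is worth picking one deliberately.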
